Similar documents
1.
Pseudogenes in the human genome   Cited by: 5 (self-citations: 0, by others: 5)
Zhou Guangjin, Yu Long, Zhao Shouyuan. Chinese Bulletin of Life Sciences (《生命科学》), 2004, 16(4): 210-214, 230
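A pseudogene is a nonfunctional genomic DNA copy whose sequence closely resembles that of a coding gene; pseudogenes are generally not transcribed and have no clearly established physiological role. By origin, pseudogenes are classified as duplicated pseudogenes or processed pseudogenes. Most of the human pseudogenes unambiguously identified to date are processed pseudogenes, numbering about 8000. Of the nearly 25,500 protein-coding gene sequences in Swiss-Prot/TrEMBL, about 10% have one or more near-full-length processed pseudogenes in the genome; the remaining functional genes have none. Ribosomal protein genes have the largest number of processed pseudogenes, about 1700 (22% of all processed pseudogenes), and a few other genes, such as cyclophilin A, actin, keratin, GAPDH, cytochrome c and nucleophosmin, also have many processed copies. Overall, pseudogenes are distributed across the human chromosomes in proportion to chromosome length, but processed pseudogenes reach their highest density in chromosomal regions with 41%–46% GC content. The copy number of processed pseudogenes is highly concordant with the expression of the corresponding functional genes in reproductive organs, suggesting that many pseudogenes arose at the embryonic stage; copy number is also closely related to the GC content and size of the parent gene. Accurate identification of pseudogenes is important for research on genome evolution and molecular medicine and for medical applications.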
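
The review reports that processed pseudogenes are densest in regions with 41%–46% GC content. A minimal Python sketch of one way to check such a pattern is to split a chromosome into fixed windows and compare the mean number of pseudogenes per window inside and outside that GC band. The window size, the input format (a chromosome string plus 0-based pseudogene start positions) and the function names are illustrative assumptions, not the authors' method.

```python
def gc_fraction(seq):
    """G+C fraction among A/C/G/T bases; ambiguous bases such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def density_by_gc_band(chrom_seq, pseudogene_starts, window=100_000, band=(0.41, 0.46)):
    """Mean pseudogenes per window for windows inside vs outside a GC band.

    chrom_seq: chromosome sequence; pseudogene_starts: 0-based start positions.
    The window size and input format are hypothetical choices for illustration."""
    n_windows = (len(chrom_seq) + window - 1) // window
    per_window = [0] * n_windows
    for start in pseudogene_starts:
        per_window[start // window] += 1  # assign each pseudogene to the window holding its start

    totals = {"in_band": [0, 0], "out_of_band": [0, 0]}  # [pseudogenes, windows]
    for w in range(n_windows):
        gc = gc_fraction(chrom_seq[w * window:(w + 1) * window])
        key = "in_band" if band[0] <= gc <= band[1] else "out_of_band"
        totals[key][0] += per_window[w]
        totals[key][1] += 1
    return {key: (count / n if n else 0.0) for key, (count, n) in totals.items()}
```

A real analysis would also need to account for gene density and assembly gaps per window; this sketch only illustrates the binning step.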

5.
We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the ‘current’ proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences (‘the orfome’). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes (‘dead’ genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orfome)
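
The ‘orfome’ described above is the set of all possible coding sequences in a genome. A minimal sketch of enumerating it for one DNA sequence scans all six reading frames for ATG-to-stop open reading frames, assuming the standard genetic code's stop codons and a hypothetical default minimum of 100 codons (a cutoff often associated with yeast annotation; treat it as an assumption here). Function names and the coordinate convention are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}          # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) for ATG..stop ORFs in all six frames.

    min_codons counts sense codons including the ATG but not the stop.
    Coordinates are 0-based, half-open, on that strand's own sequence
    (for '-' this is the reverse complement, not the input)."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # close the ORF and look for the next ATG
    return orfs
```

Under this framing, an annotated proteome would be a curated subset of `find_orfs` output, which is where questionable ORFs (qORFs) become an explicit decision rather than an artifact of the cutoff.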

6.
Eleven daughters of NANOG   Cited by: 6 (self-citations: 0, by others: 6)
Booth HA, Holland PW. Genomics, 2004, 84(2): 229-238
Nanog is a recently discovered ANTP class homeobox gene. Mouse Nanog is expressed in the inner cell mass and in embryonic stem cells and has roles in self-renewal and maintenance of pluripotency. Here we describe the location, genomic organization, and relative ages of all human NANOG pseudogenes, comprising ten processed pseudogenes and one tandem duplicate. These are compared to the original, intact human NANOG gene. Eleven is an unusually high number of pseudogenes for a homeobox gene and must reflect expression in the human germ line. A pseudogene orthologous to NANOGP4 was found in chimpanzee and an expressed pseudogene in macaque. Examining pseudogenes of differing ages gives insight into pseudogene decay, which involves an excess of deletion mutations over insertions. The mouse genome has two processed pseudogenes, which are not clear orthologues of the primate pseudogenes.
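
An excess of deletions over insertions can be quantified from a pairwise alignment of a pseudogene with its intact parent gene. The sketch below assumes a precomputed gapped alignment ('-' marks a gap) and counts each contiguous gap run as one event; the function names and the run-based event definition are assumptions, not the procedure used in the paper.

```python
from itertools import groupby


def classify_column(parent_base, pseudo_base):
    """Label one alignment column relative to the parent gene."""
    if pseudo_base == "-" and parent_base != "-":
        return "deletion"   # base present in parent, missing in pseudogene
    if parent_base == "-" and pseudo_base != "-":
        return "insertion"  # extra base in pseudogene
    return "other"          # match, mismatch, or gap in both rows


def indel_spectrum(parent_aln, pseudo_aln):
    """Lengths of deletion and insertion events; each contiguous run is one event."""
    if len(parent_aln) != len(pseudo_aln):
        raise ValueError("aligned rows must have equal length")
    kinds = (classify_column(p, q) for p, q in zip(parent_aln, pseudo_aln))
    spectrum = {"deletion": [], "insertion": []}
    for kind, run in groupby(kinds):  # groupby merges adjacent identical labels
        if kind in spectrum:
            spectrum[kind].append(sum(1 for _ in run))
    return spectrum
```

Comparing `len(spectrum["deletion"])` with `len(spectrum["insertion"])` (guarding against zero insertions) across pseudogenes of different ages gives a simple view of the decay pattern.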

9.
Homma K, Fukuchi S, Kawabata T, Ota M, Nishikawa K. Gene, 2002, 294(1-2): 25-33
Pseudogenes are open reading frames (ORFs) encoding dysfunctional proteins with high homology to known protein-coding genes. Although pseudogenes have been reported in the genomes of many eukaryotes and bacteria, no systematic search for pseudogenes in the Escherichia coli genome has been carried out. Genome comparisons of E. coli strains K-12 and O157 revealed that many protein-coding sequences have prematurely terminated orthologs encoding unstable proteins. To systematically screen for pseudogenes, we selected ORFs generated by premature termination of the orthologous protein-coding genes and subsequently excluded those possibly arising from sequence errors. Lastly, we eliminated those with close homologs in this and other species, as these shortened ORFs may actually have functions. The process produced 95 and 101 pseudogene candidates in K-12 and O157, respectively. The assigned three-dimensional structures suggest that most of the encoded proteins cannot fold properly and thus are dysfunctional, indicating that they are probably pseudogenes. Therefore, a significant number of probable pseudogenes is predicted to exist in E. coli, awaiting experimental verification. Most of them were found to be genes with paralogs, horizontally transferred genes, or both. We suggest that pseudogenes constitute a small fraction of the genomes of free-living bacteria in general, reflecting faster elimination than production of pseudogenes.
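
The first screening step, selecting ORFs that are prematurely terminated relative to their orthologs, can be sketched as a simple length comparison. The 0.8 length-ratio cutoff and the ID-to-length dictionaries are hypothetical; the later filters described in the abstract (excluding sequence errors, removing ORFs with close homologs, assigning 3D structures) are not modeled.

```python
def truncated_candidates(orf_lengths, ortholog_lengths, max_ratio=0.8):
    """Return sorted IDs whose ORF is shorter than max_ratio x its ortholog.

    Both arguments map a shared gene ID to protein length in residues.
    IDs without an ortholog entry are skipped; max_ratio is an assumed cutoff."""
    candidates = []
    for gene_id, length in orf_lengths.items():
        ortho = ortholog_lengths.get(gene_id)
        if ortho and length < max_ratio * ortho:
            candidates.append(gene_id)  # shortened relative to the ortholog
    return sorted(candidates)
```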

10.
Pseudogenes are nonfunctional copies of protein-coding genes that are presumed to evolve without selective constraints on their coding function. They are of considerable utility in evolutionary genetics because, in the absence of selection, different types of mutations in pseudogenes should have equal probabilities of fixation. This theoretical inference justifies the estimation of patterns of spontaneous mutation from the analysis of patterns of substitutions in pseudogenes. Although it is possible to test whether pseudogene sequences evolve without constraints for their protein-coding function, it is much more difficult to ascertain whether pseudogenes may affect fitness in ways unrelated to their nucleotide sequence. Consider the possibility that a pseudogene affects fitness merely by increasing genome size. If a larger genome is deleterious (for example, because of increased energetic costs associated with genome replication and maintenance), then deletions, which decrease the length of a pseudogene, should be selectively advantageous relative to insertions or nucleotide substitutions. In this article we examine the implications of selection for genome size relative to small (1-400 bp) deletions, in light of empirical evidence pertaining to the size distribution of deletions observed in Drosophila and mammalian pseudogenes. There is a large difference in the deletion spectra between these organisms. We argue that this difference cannot easily be attributed to selection for overall genome size, since the magnitude of selection is unlikely to be strong enough to significantly affect the probability of fixation of small deletions in Drosophila.
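
Whether selection on genome size can affect the fate of a small deletion depends on the product of population size N and selection coefficient s. A sketch using Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, u = (1 - exp(-2s)) / (1 - exp(-4Ns)), compared with the neutral value 1/(2N), assuming effective size equals census size. Any values of N and s plugged in are hypothetical and are not taken from the article.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new mutation at initial frequency 1/(2N).

    Assumes a diploid population with effective size equal to N and a
    selection coefficient s >= 0 (e.g. an advantageous deletion)."""
    if s < 0:
        raise ValueError("this sketch only handles s >= 0")
    if s == 0:
        return 1 / (2 * n)  # neutral limit
    # 1 - exp(-x) is computed as -expm1(-x) to stay accurate for tiny s
    return -math.expm1(-2 * s) / -math.expm1(-4 * n * s)


def relative_to_neutral(s, n):
    """Fixation probability divided by the neutral value 1/(2N)."""
    return fixation_probability(s, n) * 2 * n
```

When 4Ns is much smaller than 1, the ratio stays close to 1, so the deletion behaves as effectively neutral; selection noticeably raises its fixation probability only once 4Ns approaches or exceeds 1.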

13.
The aim of this article is to demonstrate possible recombination-associated evolutionary forces affecting the genomic distribution of processed pseudogenes. The relationship between recombination rate and the distribution of processed pseudogenes is analysed in the human genome. The results show that processed pseudogenes preferentially accumulate in regions of low recombination rate and that this correlation cannot be explained by indirect relationships with GC content and gene density. Several explanatory models for the observation are discussed. A model of selection against ectopic recombination is tested based on the difference in distribution pattern between two classes of processed pseudogenes, which differ in their potential to stimulate ectopic recombination. Our results indicate that the correlation between processed pseudogene density and recombination rate probably results, in part, from selection against ectopic recombination between closely located homologous processed pseudogenes. We also found a length effect in processed pseudogene distribution: long processed pseudogenes are more strongly concentrated in regions of low recombination rate than short ones.
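
One way to test such an association, assuming per-window values of processed pseudogene density, recombination rate and GC content are already available, is Spearman rank correlation plus a first-order partial rank correlation that controls for one covariate such as GC content. The abstract does not say which statistic was used; this choice and all names below are assumptions.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    if var == 0:
        raise ValueError("constant input")
    return cov / math.sqrt(var)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y after removing the linear effect of z's ranks."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

Controlling for gene density as well would need a second-order partial correlation or a regression model; this sketch handles a single covariate only.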

14.
Based on searches for disabled homologs to known proteins, we have identified a large population of pseudogenes in four sequenced eukaryotic genomes—the worm, yeast, fly and human (chromosomes 21 and 22 only). Each of our nearly 2500 pseudogenes is characterized by one or more mid-domain disablements, such as premature stops and frameshifts. Here, we perform a comprehensive survey of the amino acid and nucleotide composition of these pseudogenes in comparison to that of functional genes and intergenic DNA. We show that pseudogenes invariably have an amino acid composition intermediate between genes and translated intergenic DNA. Although the degree of intermediacy varies among the four organisms, in all cases, it is most evident for amino acid types that differ most in occurrence between genes and intergenic regions. The same intermediacy also applies to codon frequencies, especially in the worm and human. Moreover, the intermediate composition of pseudogenes applies even though the composition of the genes in the four organisms is markedly different, showing a strong correlation with the overall A/T content of the genomic sequence. Pseudogenes can be divided into ‘ancient’ and ‘modern’ subsets, based on the level of sequence identity with their closest matching homolog (within the same genome). Modern pseudogenes usually have a much closer sequence composition to genes than ancient pseudogenes. Collectively, our results indicate that the composition of pseudogenes that are under no selective constraints progressively drifts from that of coding DNA towards non-coding DNA. Therefore, we propose that the degree to which pseudogenes approach a random sequence composition may be useful in dating different sets of pseudogenes, as well as to assess the rate at which intergenic DNA accumulates mutations. Our compositional analyses with the interactive viewer are available over the web at http://genecensus.org/pseudogene.
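
The compositional intermediacy described above can be expressed as the position of a pseudogene's amino acid composition along the axis running from the gene composition to the translated-intergenic composition (0 = gene-like, 1 = intergenic-like). The projection-based score, the 90% identity cutoff separating ‘modern’ from ‘ancient’ pseudogenes and all names are assumptions for illustration, not the authors' definitions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(protein_seqs):
    """Pooled amino acid frequencies over the 20 standard residues.

    Stop symbols ('*') and other non-standard characters are ignored."""
    counts = Counter()
    for seq in protein_seqs:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def intermediacy(pseudo, genes, intergenic):
    """Project pseudogene composition onto the gene -> intergenic axis.

    Returns 0.0 for gene-like and 1.0 for intergenic-like composition."""
    axis = [i - g for g, i in zip(genes, intergenic)]
    offset = [p - g for g, p in zip(genes, pseudo)]
    norm_sq = sum(a * a for a in axis)
    if norm_sq == 0:
        raise ValueError("gene and intergenic compositions are identical")
    return sum(a * o for a, o in zip(axis, offset)) / norm_sq


def split_by_age(identities, cutoff=0.9):
    """Split pseudogenes into (ancient, modern) by identity to their closest homolog.

    `identities` maps a pseudogene name to fractional identity; the cutoff is hypothetical."""
    modern = sorted(name for name, ident in identities.items() if ident >= cutoff)
    ancient = sorted(name for name, ident in identities.items() if ident < cutoff)
    return ancient, modern
```

If the abstract's conclusion holds, the mean intermediacy score of the ancient subset should exceed that of the modern subset.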

17.
The structure of the human gene encoding the mitochondrial outer membrane receptor Tom20 has been determined from overlapping clones obtained using PCR-based techniques. The 20-kb human Tom20 gene (hTom20) consists of five exons separated by four introns. The 5' flanking region presents features common with other nuclear genes encoding mitochondrial proteins. Comparison with its homologs and putative homologs in other species has revealed common features in their TPR motifs and other relevant protein domains. Aspects concerning the evolutionary origins of the family of processed pseudogenes of hTom20 are also discussed.

19.
Duplicated pseudogenes in the human genome are disabled copies of functioning parent genes. They result from block duplication events occurring throughout evolutionary history. Relatively recent duplications (with sequence similarity ≥90% and length ≥1 kb) are termed segmental duplications (SDs); here, we analyze the interrelationship of SDs and pseudogenes. We present a decision-tree approach to classify pseudogenes based on their (and their parents’) characteristics in relation to SDs. The classification identifies 140 novel pseudogenes and makes possible improved annotation for the 3172 pseudogenes located in SDs. In particular, it reveals that many pseudogenes in SDs likely did not arise directly from parent genes, but are the result of a multi-step process. In these cases, the initial duplication or retrotransposition of a parent gene gives rise to a ‘parent pseudogene’, followed by further duplication creating duplicated–duplicated or duplicated–processed pseudogenes, respectively. Moreover, we can precisely identify these parent pseudogenes by overlap with ancestral SD loci. Finally, a comparison of nucleotide substitutions per site in a pseudogene with its surrounding SD region allows us to estimate the time difference between duplication and disablement events, and this suggests that most duplicated pseudogenes in SDs were likely disabled around the time of the original duplication.
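
One way to frame the final comparison in the abstract: if a duplicated copy stayed functional for a while, its coding sequence would have diverged more slowly than the neutral flanking SD sequence, so a distance gap near zero is consistent with disablement around the time of duplication. This sketch assumes pre-aligned, equal-length sequence pairs and uses the Jukes-Cantor (JC69) correction; both the correction and the interpretation of the gap are assumptions, and the paper's actual model may differ.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of differing sites among columns where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguous bases
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p):
    """JC69 substitutions per site; undefined when p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_gap(pseudo_pair, flank_pair):
    """K(flanking SD) - K(pseudogene) for two (copy_a, copy_b) aligned pairs.

    A value near zero is read here, as an assumption, as disablement
    close to the time of the duplication."""
    k_pseudo = jukes_cantor(p_distance(*pseudo_pair))
    k_flank = jukes_cantor(p_distance(*flank_pair))
    return k_flank - k_pseudo
```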
