Similar articles
1.
Based on searches for disabled homologs to known proteins, we have identified a large population of pseudogenes in four sequenced eukaryotic genomes: the worm, yeast, fly and human (chromosomes 21 and 22 only). Each of our nearly 2500 pseudogenes is characterized by one or more disablements mid-domain, such as premature stops and frameshifts. Here, we perform a comprehensive survey of the amino acid and nucleotide composition of these pseudogenes in comparison to that of functional genes and intergenic DNA. We show that pseudogenes invariably have an amino acid composition intermediate between genes and translated intergenic DNA. Although the degree of intermediacy varies among the four organisms, in all cases, it is most evident for amino acid types that differ most in occurrence between genes and intergenic regions. The same intermediacy also applies to codon frequencies, especially in the worm and human. Moreover, the intermediate composition of pseudogenes applies even though the composition of the genes in the four organisms is markedly different, showing a strong correlation with the overall A/T content of the genomic sequence. Pseudogenes can be divided into ‘ancient’ and ‘modern’ subsets, based on the level of sequence identity with their closest matching homolog (within the same genome). Modern pseudogenes usually have a much closer sequence composition to genes than ancient pseudogenes. Collectively, our results indicate that the composition of pseudogenes that are under no selective constraints progressively drifts from that of coding DNA towards non-coding DNA. Therefore, we propose that the degree to which pseudogenes approach a random sequence composition may be useful in dating different sets of pseudogenes, as well as in assessing the rate at which intergenic DNA accumulates mutations. Our compositional analyses with the interactive viewer are available over the web at http://genecensus.org/pseudogene.
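The notion of an "intermediate" composition can be illustrated with a small sketch. This is not the authors' procedure; the function names `aa_composition` and `drift_fraction`, and the `min_diff` threshold, are hypothetical choices made for illustration. For each amino acid whose frequency differs between genes and translated intergenic DNA, the sketch reports where the pseudogene frequency falls on the line from the gene value (0) to the intergenic value (1).

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> Dict[str, float]:
    """Return the fractional frequency of each of the 20 standard amino acids."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def drift_fraction(
    gene: Dict[str, float],
    pseudo: Dict[str, float],
    intergenic: Dict[str, float],
    min_diff: float = 0.01,  # hypothetical cutoff: skip residues that barely differ
) -> Dict[str, float]:
    """Position of the pseudogene frequency between gene (0) and intergenic (1).

    Values between 0 and 1 indicate an intermediate composition for that residue.
    """
    result = {}
    for aa in AMINO_ACIDS:
        span = intergenic[aa] - gene[aa]
        if abs(span) >= min_diff:
            result[aa] = (pseudo[aa] - gene[aa]) / span
    return result
```

With toy inputs such as `aa_composition("AAAK")` for genes, `aa_composition("AAKK")` for pseudogenes and `aa_composition("AKKK")` for intergenic DNA, both A and K sit halfway between the two reference compositions.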

2.
3.
4.
5.
6.
7.
We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the ‘current’ proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences (‘the orfome’). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes (‘dead’ genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orfome)

8.
Yuan JD  Shi JX  Meng GX  An LG  Hu GX 《Cell research》1999,9(4):281-290
INTRODUCTION Nuclear pseudogenes of mitochondrial (mt) DNA were initially discovered in the early 80's [1-6]. However, mechanisms for the generation of mtDNA pseudogenes are still not clear and may vary in different cases. Both RNA- [7-8] and DNA-mediated [9-11] processes have been sugges...

9.
10.
The aim of this article is to demonstrate possible recombination-associated evolutionary forces affecting the genomic distribution of processed pseudogenes. The relationship between recombination rate and the distribution of processed pseudogenes is analysed in the human genome. The results show that processed pseudogenes preferentially accumulate in regions of low recombination rates and this correlation cannot be explained by indirect relationships with GC content and gene density. Several explanatory models for the observation are discussed. A model of selection against ectopic recombination is tested based on the difference in distribution pattern between two classes of processed pseudogenes, which differ in the possibility of stimulating ectopic recombination. Our results indicate that the correlation between processed pseudogene density and recombination rate probably results, in part, from the selection against ectopic recombination between closely located homologous processed pseudogenes. We also found a length effect in processed pseudogene distribution, namely that long processed pseudogenes are located more preferentially in regions of low recombination rates than short ones.

11.
12.
MOTIVATION: Mammalian genomes contain many 'genomic fossils', i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or retrotransposition events. Pseudogenes are important resources in understanding the evolutionary history of genes and genomes. RESULTS: We have developed a homology-based computational pipeline ('PseudoPipe') that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential "parent" proteins against the intergenic regions of the genome and then processing the resulting "raw hits", i.e. eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and existence of stop codons and frameshifts.
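The hit-processing step described above (dropping redundant hits and clustering neighbours) can be sketched as an interval merge. This is only an illustration and not PseudoPipe's actual code: the `Hit` record, the `cluster_hits` function and the `max_gap` threshold of 50 bases are all hypothetical, and the real pipeline's criteria for merging and for choosing a unique parent are not given in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str
    parent: str  # hypothetical ID of the matching parent protein


def cluster_hits(hits: List[Hit], max_gap: int = 50) -> List[Hit]:
    """Merge hits to the same parent that overlap or lie within max_gap bases."""
    ordered = sorted(hits, key=lambda h: (h.chrom, h.strand, h.parent, h.start))
    clusters: List[Hit] = []
    for hit in ordered:
        last = clusters[-1] if clusters else None
        same_group = last is not None and (
            (last.chrom, last.strand, last.parent)
            == (hit.chrom, hit.strand, hit.parent)
        )
        if same_group and hit.start <= last.end + max_gap:
            # Overlapping or neighbouring hit: extend the current cluster.
            last.end = max(last.end, hit.end)
        else:
            # Start a new cluster with a copy so the input is not mutated.
            clusters.append(Hit(hit.chrom, hit.start, hit.end, hit.strand, hit.parent))
    return clusters
```

Sorting by chromosome, strand and parent before scanning keeps the merge a single linear pass; a production pipeline would likely also weigh alignment scores when hits to different parents overlap.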

13.
14.
Eleven daughters of NANOG   Total citations: 6 (self-citations: 0, citations by others: 6)
Booth HA  Holland PW 《Genomics》2004,84(2):229-238
Nanog is a recently discovered ANTP class homeobox gene. Mouse Nanog is expressed in the inner cell mass and in embryonic stem cells and has roles in self-renewal and maintenance of pluripotency. Here we describe the location, genomic organization, and relative ages of all human NANOG pseudogenes, comprising ten processed pseudogenes and one tandem duplicate. These are compared to the original, intact human NANOG gene. Eleven is an unusually high number of pseudogenes for a homeobox gene and must reflect expression in the human germ line. A pseudogene orthologous to NANOGP4 was found in chimpanzee and an expressed pseudogene in macaque. Examining pseudogenes of differing ages gives insight into pseudogene decay, which involves an excess of deletion mutations over insertions. The mouse genome has two processed pseudogenes, which are not clear orthologues of the primate pseudogenes.

15.
We present a new likelihood method for detecting constrained evolution at synonymous sites and other forms of nonneutral evolution in putative pseudogenes. The model is applicable whenever the DNA sequence is available from a protein-coding functional gene, a pseudogene derived from the protein-coding gene, and an orthologous functional copy of the gene. Two nested likelihood ratio tests are developed to test the hypotheses that (1) the putative pseudogene has equal rates of silent and replacement substitutions; and (2) the rate of synonymous substitution in the functional gene equals the rate of substitution in the pseudogene. The method is applied to a data set containing 74 human processed-pseudogene loci, 25 mouse processed-pseudogene loci, and 22 rat processed-pseudogene loci. Using the informatics resources of the Human Genome Project, we localized 67 of the human-pseudogene pairs in the genome and estimated the GC content of a large surrounding genomic region for each. We find that, for pseudogenes deposited in GC regions similar to those of their paralogs, the assumption of equal rates of silent and replacement site evolution in the pseudogene is upheld; in these cases, the rate of silent site evolution in the functional genes is approximately 70% of the rate of evolution in the pseudogene. On the other hand, for pseudogenes located in genomic regions of much lower GC than their functional gene, we see a sharp increase in the rate of silent site substitutions, leading to a large rate of rejection for the pseudogene equality likelihood ratio test.
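The general shape of a nested likelihood ratio test can be sketched as follows. This is a generic illustration rather than the authors' model: it assumes the maximized log-likelihoods of the constrained (null) and unconstrained (alternative) models have already been computed elsewhere, that the models differ by exactly one free parameter, and that the usual asymptotic chi-square approximation with one degree of freedom applies. The function name `likelihood_ratio_test` is hypothetical.

```python
import math
from typing import Tuple


def likelihood_ratio_test(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Return (test statistic, p-value) for two nested models, 1 degree of freedom.

    lnl_null: maximized log-likelihood of the constrained model
              (e.g. equal silent and replacement rates in the pseudogene).
    lnl_alt:  maximized log-likelihood of the model with the constraint relaxed.
    """
    # The statistic is twice the log-likelihood gain; clamp tiny negative
    # values that can arise from numerical optimisation error.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, a log-likelihood gain of 2.0 units gives a statistic of 4.0 and a p-value of about 0.046, so the constrained model would be rejected at the 5% level under these assumptions.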

16.
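We analyzed the chromosomal distribution of human processed pseudogenes and found that processed pseudogene density is negatively correlated with recombination rate and positively correlated with gene density. The accumulation of processed pseudogenes in regions of low recombination is consistent with both the deleterious-insertion model and the ectopic-recombination model. Under the deleterious-insertion model, selection is weaker in low-recombination regions because of Hill-Robertson interference, so more processed pseudogenes are inserted into these regions. Under the ectopic-recombination model, ectopic recombination may occur within a family of homologous processed pseudogenes (including the homologous parent gene) and harm the organism, so insertions of processed pseudogenes into high-recombination regions are subject to stronger negative selection, which leaves more processed pseudogenes in low-recombination regions. Besides these two models, processed pseudogenes may also contribute to the negative correlation between processed pseudogene density and recombination rate by themselves lowering the recombination rate. Processed pseudogenes are preferentially located in gene-dense regions, which may be related to ectopic recombination occurring less often there.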

17.
18.
We present a simple but powerful procedure to extract Gene Ontology (GO) terms that are significantly over- or under-represented in sets of genes within the context of a genome-scale experiment (DNA microarray, proteomics, etc.). Said procedure has been implemented as a web application, FatiGO, allowing for easy and interactive querying. FatiGO, which takes the multiple-testing nature of statistical contrast into account, currently includes GO associations for diverse organisms (human, mouse, fly, worm and yeast) and the TrEMBL/Swissprot GOAnnotations@EBI correspondences from the European Bioinformatics Institute.

19.
We sequenced three argininosuccinate-synthetase-processed pseudogenes (ΨAS-A1, ΨAS-A3, ΨAS-3) and their noncoding flanking sequences in human, orangutan, baboon, and colobus. Our data showed that these pseudogenes were incorporated into the genome of the Old World monkeys after the divergence of the Old World and New World monkey lineages. These pseudogene flanking regions show variable mutation rates and patterns. The variation in the G/C to A/T mutation rate (u) can account for the unequal GC contents at equilibrium: 34.9, 36.9, and 41.7% in the pseudogene ΨAS-A1, ΨAS-A3, and ΨAS-3 flanking regions, respectively. The A/T to G/C mutation rate (v) seems stable and the u/v ratios equal 1.9, 1.7, and 1.4 in the flanking regions of ΨAS-A1, ΨAS-A3, and ΨAS-3, respectively. These "regional" variations of the mutation rate affect the evolution of the pseudogenes, too. The ratio u/v being greater than 1.0 in each case, the overall mutation rate in the GC-rich pseudogenes is, as expected, higher than in their GC-poor flanking regions. Moreover, a "sequence effect" has been found. In the three cases examined u and v are higher (at least 20%) in the pseudogene than in its flanking region; i.e., the pseudogene appears as mutation "hot" spots embedded in "cold" regions. This observation could be partly linked to the fact that the pseudogene flanking regions are long-standing unconstrained DNA sequences, whereas the pseudogenes were relieved of selection on their coding functions only around 30–40 million years ago. We suspect that relatively more mutable sites maintained unchanged during the evolution of the argininosuccinate gene are able to change in the pseudogenes, such sites being eliminated or rare in the flanking regions which have been void of strong selective constraints over a much longer period.
Our results shed light on (1) the multiplicity of factors that tune the spontaneous mutation rate and (2) the impact of the genomic position of a sequence on its evolution. Received: 10 February 1997 / Accepted: 21 April 1997
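The link between the u/v ratios and the equilibrium GC contents quoted in this abstract can be checked with a short calculation. The sketch below assumes a simple two-state model in which each site is either G/C or A/T, with u the G/C to A/T rate and v the A/T to G/C rate; at equilibrium the GC fraction is then v / (u + v) = 1 / (1 + u/v). The model choice and the function name `equilibrium_gc` are assumptions for illustration, not a statement of how the authors derived their figures.

```python
def equilibrium_gc(u_over_v: float) -> float:
    """Equilibrium GC fraction under a two-state model, given the ratio u/v.

    At equilibrium the flux out of G/C equals the flux in:
        u * GC = v * (1 - GC)  =>  GC = v / (u + v) = 1 / (1 + u/v)
    """
    if u_over_v < 0:
        raise ValueError("u/v must be non-negative")
    return 1.0 / (1.0 + u_over_v)


# u/v ratios reported for the flanking regions of the three pseudogenes.
for name, ratio in [("ΨAS-A1", 1.9), ("ΨAS-A3", 1.7), ("ΨAS-3", 1.4)]:
    print(f"{name}: u/v = {ratio}, equilibrium GC = {100 * equilibrium_gc(ratio):.1f}%")
```

The rounded ratios give roughly 34.5%, 37.0% and 41.7%, close to the 34.9%, 36.9% and 41.7% reported; the small differences are presumably due to rounding of the published u/v values.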

20.