Similar literature
20 similar articles found (search time: 31 ms)
1.
2.
Complete sequences of multiple strains of the same microbial species provide an invaluable source for studying the evolutionary dynamics between orthologous genes over a relatively short time scale. Usually the intensity of the selection pressure is inferred from a comparison between the nonsynonymous substitution rate and the synonymous substitution rate. In this paper, we propose an alternative method for detecting genes with one or more fast-evolving regions from pairwise comparisons of orthologous genes. Our method looks for regions with overrepresented nonsynonymous mutations along the alignment, and requires a higher nonsynonymous evolution rate in those regions than the neutral evolution rate. It identifies gene targets under intensive selection pressure that are not detected by the conventional rate comparison analysis. Most of the identified genes with known annotations have a clear role in processes such as bacterial defense and host–pathogen interactions. Gene sets reported from our method provide a measure of the phenotypic divergence between two closely related genomes.
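
The idea of scanning an alignment for regions with overrepresented nonsynonymous mutations can be illustrated with a minimal sketch. This is not the authors' method: the function names, the fixed window size and the count threshold are hypothetical, and a simple count stands in for the statistical test against the neutral rate that the abstract describes. The sketch assumes two gap-free, in-frame coding sequences and the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_codons(seq_a, seq_b):
    """Label each aligned codon pair: '-' identical, 'S' synonymous,
    'N' nonsynonymous, '?' if either codon is not a standard codon."""
    labels = []
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if ca not in CODON_TABLE or cb not in CODON_TABLE:
            labels.append("?")
        elif ca == cb:
            labels.append("-")
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            labels.append("S")
        else:
            labels.append("N")
    return labels


def nonsyn_windows(labels, window=10, min_nonsyn=4):
    """Return (start, end, count) for every window of `window` codons
    that holds at least `min_nonsyn` nonsynonymous differences.
    Both parameters are illustrative assumptions."""
    hits = []
    for start in range(len(labels) - window + 1):
        count = labels[start:start + window].count("N")
        if count >= min_nonsyn:
            hits.append((start, start + window, count))
    return hits


labels = classify_codons("ATGGCTGCTAAA", "ATGGCCGATGAA")
print(labels)                                          # ['-', 'S', 'N', 'N']
print(nonsyn_windows(labels, window=2, min_nonsyn=2))  # [(2, 4, 2)]
```

A real analysis would replace the fixed threshold with a significance test and would compare the nonsynonymous rate inside each window with an estimate of the neutral rate.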

3.
4.
Plasmodium parasites, the causal agents of malaria, cause more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ∼23 Mb genomes encoding ∼5200 protein-coding genes. The protein-coding genes comprise about half of these genomes. Although evolutionary processes have a significant impact on malaria control, the selective pressures within Plasmodium genomes are poorly understood, particularly in the non-protein-coding portion of the genome. We use evolutionary methods to describe selective processes in both the coding and non-coding regions of these genomes. Based on genome alignments of seven Plasmodium species, we show that protein-coding, intergenic and intronic regions are all subject to purifying selection and we identify 670 conserved non-genic elements. We then use genome-wide polymorphism data from P. falciparum to describe short-term selective processes in this species and identify some candidate genes for balancing (diversifying) selection. Our analyses suggest that there are many functional elements in the non-genic regions of these genomes and that adaptive evolution has occurred more frequently in the protein-coding regions of the genome.

5.
Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.

6.
7.
8.
We present a survey for non-coding RNAs and other structured RNA motifs in the genomes of Caenorhabditis elegans and Caenorhabditis briggsae using the RNAz program. This approach explicitly evaluates comparative sequence information to detect stabilizing selection acting on RNA secondary structure. We detect 3,672 structured RNA motifs, of which only 678 are known non-translated RNAs (ncRNAs) or clear homologs of known C. elegans ncRNAs. Most of these signals are located in introns or at a distance from known protein-coding genes. With an estimated false positive rate of about 50% and a sensitivity on the order of 50%, we estimate that the nematode genomes contain between 3,000 and 4,000 RNAs with evolutionarily conserved secondary structures. Only a small fraction of these belong to the known RNA classes, including tRNAs, snoRNAs, snRNAs, or microRNAs. A relatively small class of ncRNA candidates is associated with previously observed RNA-specific upstream elements.
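
The estimate of 3,000 to 4,000 structured RNAs is consistent with a simple back-of-the-envelope calculation, sketched below. The sketch assumes that "false positive rate" here means the fraction of predictions that are false, and that "sensitivity" means the fraction of true elements that are detected. Both readings are assumptions, the function name is hypothetical, and the authors' actual estimation procedure may differ.

```python
def estimate_total(predicted, false_fraction, sensitivity):
    """Estimate the number of true elements from a prediction count.

    predicted      -- number of predicted motifs
    false_fraction -- assumed fraction of predictions that are false
    sensitivity    -- assumed fraction of true elements that are detected
    """
    true_hits = predicted * (1.0 - false_fraction)
    return true_hits / sensitivity


# 3,672 predictions, about 50% false, about 50% sensitivity
print(estimate_total(3672, 0.5, 0.5))  # 3672.0
```

Under these assumptions the two 50% figures cancel, so the estimated total equals the number of predictions and falls inside the quoted range.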

9.
Recent studies have demonstrated that non-coding RNAs (ncRNAs) play important roles during development and evolution. Chicken, the first genome-sequenced non-mammalian amniote, possesses unique features for developmental and evolutionary studies. However, apart from microRNAs, information on chicken ncRNAs has mainly been obtained from computational predictions without experimental validation. In the present study, we performed a systematic identification of intermediate-size ncRNAs (50–500 nt) by ncRNA library construction and identified 125 chicken ncRNAs. Importantly, through bioinformatics and expression analysis, we found that the chicken ncRNAs have several novel features: (i) Comparative genomic analysis against 18 sequenced vertebrate genomes revealed that the majority of the newly identified ncRNA candidates are not conserved and most are potentially bird/chicken specific, suggesting that ncRNAs play roles in lineage/species specification during evolution. (ii) The expression pattern analysis of intronic snoRNAs and their host genes suggested coordinated expression between snoRNAs and their host genes. (iii) Several spatio-temporal specific expression patterns suggest involvement of ncRNAs in tissue development. Together, these findings provide new clues for future functional study of ncRNAs during development and evolution.

10.
Several previous comparisons of the human genome with other primate and vertebrate genomes identified genomic regions that are highly conserved in vertebrate evolution but fast-evolving on the human lineage. These human accelerated regions (HARs) may be regions of past adaptive evolution in humans. Alternatively, they may be the result of non-adaptive processes, such as biased gene conversion. We captured and sequenced DNA from a collection of previously published HARs using DNA from an Iberian Neandertal. Combining these new data with shotgun sequence from the Neandertal and Denisova draft genomes, we determine at least one archaic hominin allele for 84% of all positions within HARs. We find that 8% of HAR substitutions are not observed in the archaic hominins and are thus recent in the sense that the derived allele had not come to fixation in the common ancestor of modern humans and archaic hominins. Further, we find that recent substitutions in HARs tend to have come to fixation faster than substitutions elsewhere in the genome and that substitutions in HARs tend to cluster in time, consistent with an episodic rather than a clock-like process underlying HAR evolution. Our catalog of sequence changes in HARs will help prioritize them for functional studies of genomic elements potentially responsible for modern human adaptations.

11.
Plant genomes have undergone multiple rounds of duplications that contributed massively to the growth of gene families. The structure of resulting families has been studied in depth for protein-coding genes. However, little is known about the impact of duplications on noncoding RNA (ncRNA) genes. Here we perform a systematic analysis of duplicated regions in the rice genome in search of such ncRNA repeats. We observe that, just like their protein counterparts, most ncRNA genes have undergone multiple duplications that left visible sequence conservation footprints. The extent of ncRNA gene duplication in plants is such that these sequence footprints can be exploited for the discovery of novel ncRNA gene families on a large scale. We developed an SVM model that is able to retrieve likely ncRNA candidates among the 100,000+ repeat families in the rice genome, with a reasonably low false-positive discovery rate. Among the nearly 4000 ncRNA families predicted by this means, only 90 correspond to putative snoRNA or miRNA families. About half of the remaining families are classified as structured RNAs. New candidate ncRNAs are particularly enriched in UTR and intronic regions. Interestingly, 89% of the putative ncRNA families do not produce a detectable signal when their sequences are compared to another grass genome such as maize. Our results show that a large fraction of rice ncRNA genes are present in multiple copies and are species-specific or of recent origin. Intragenome comparison is a unique and potent source for the computational annotation of this major class of ncRNA.

12.
13.
14.
15.
16.
17.
In the human HOXA locus a number of ncRNAs are transcribed from the intergenic regions in the opposite direction to HOXA mRNAs. We observed that the genomic organization of genes for the ncRNAs and HOXA proteins is highly conserved between human and mouse. We examined the expression profiles of these ncRNAs and HOXA mRNAs in various human tissues. The expression patterns of ncRNAs in human tissues coincide with those of the adjacent HOXA mRNAs that are collinearly expressed along the anteroposterior axis. This coordinated expression was observed even in transformed tumors and cancer cell lines, suggesting that the expression of ncRNAs is a prerequisite for the regulated expression of HOXA genes. HIT18844 ncRNA transcribed from the most upstream position of the HOXA cluster possesses an ultra-conserved short stretch which potentially forms an evolutionarily conserved secondary structure. Our data suggest a critical role for ncRNAs in the regulation of HOXA gene expression.

18.
19.
Human gene catalogs are fundamental to the study of human biology and medicine. But they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs, they can also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average, 10% of these genes are located in genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
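
How a single polymorphism can create a long ORF that is absent from the reference can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' pipeline: the function names, the forward-strand three-frame scan, the ATG start rule and the `min_codons` cutoff are all hypothetical, and the sequences are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(mrna, min_codons=100):
    """Return (start, end) of forward-strand ORFs that run from an ATG to the
    first in-frame stop codon and contain at least `min_codons` codons
    (stop codon excluded from the count, included in `end`)."""
    seq = mrna.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def apply_snp(seq, pos, alt):
    """Return `seq` with the base at 0-based position `pos` replaced by `alt`."""
    return seq[:pos] + alt + seq[pos + 1:]


reference = "ATGAAATAGCCCTAA"            # the in-frame TAG stops the ORF early
variant = apply_snp(reference, 6, "C")   # TAG -> CAG removes that stop codon

print(find_orfs(reference, min_codons=3))  # []
print(find_orfs(variant, min_codons=3))    # [(0, 15)]
```

In the toy reference the ORF has only two codons before the stop and is filtered out. The single-base change removes the premature stop, so the variant sequence yields a four-codon ORF that passes the cutoff.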

20.