Similar literature
1.
It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.

2.
Sakai H, Tanaka T, Itoh T. Gene, 2007, 392(1-2): 59-63
Despite a wide distribution of transposable elements (TEs) in the genomes of higher eukaryotes, much of their evolutionary significance remains unclear. Recent studies have indicated that TEs are involved in biological processes such as gene regulation and the generation of new exons in mammals. In addition, the completion of genome sequencing in Arabidopsis thaliana and Oryza sativa has permitted scientists to describe a genome-wide overview in plants. In this study, we examined the positions of TEs in the genome of O. sativa. Although we found that more than 10% of the structural genes contained TEs, they were underrepresented in exons compared with non-exonic regions. TEs also appeared to be inserted preferentially in 3'-untranslated regions in exons. These results suggested that purifying selection against TE insertion has played a major role during evolution. Moreover, our comparison of the numbers of TEs in the protein-coding regions between single copy genes and duplicate genes showed that TEs were more frequent in duplicate than single copy genes. This observation indicated that gene duplication events created a large number of functionally redundant genes. Subsequently, many of them were destroyed by TEs because the redundant copies were released from purifying selection. Another biological role of TEs was found to be the recruitment of new exons. We found that approximately 2% of protein-coding genes contained TEs in their coding regions. Insertion of TEs in genic regions may have the potential to be an evolutionary driving force for the creation of new biological functions.

3.
Apicomplexan parasites of the genus Plasmodium, pathogens causing malaria, and the genera Babesia and Theileria, aetiological agents of piroplasmosis, are closely related. However, their mitochondrial (mt) genome structures are highly divergent: Plasmodium has a concatemer of 6-kb units and Babesia/Theileria a monomer of 6.6 to 8.2 kb with terminal inverted repeats. Fragmentation of ribosomal RNA (rRNA) genes and gene arrangements are remarkably distinctive. To elucidate the evolutionary origin of this structural divergence, we determined the mt genome of Eimeria tenella, a pathogen causing coccidiosis in domestic fowl. Analysis revealed that the E. tenella mt genome was concatemeric, with protein-coding genes and rRNA gene fragments similar to those of Plasmodium. The copy number was 50-fold that of the nuclear genome. Evolution of structural divergence in the apicomplexan mt genomes is discussed.

4.
Biologists routinely use molecular markers to identify conservation units, to quantify genetic connectivity, to estimate population sizes, and to identify targets of selection. Many imperiled eagle populations require such efforts and would benefit from enhanced genomic resources. We sequenced, assembled, and annotated the first eagle genome using DNA from a male golden eagle (Aquila chrysaetos) captured in western North America. We constructed genomic libraries that were sequenced using Illumina technology and assembled the high-quality data to a depth of ∼40x coverage. The genome assembly includes 2,552 scaffolds >10 Kb and 415 scaffolds >1.2 Mb. We annotated 16,571 genes that are involved in myriad biological processes, including such disparate traits as beak formation and color vision. We also identified repetitive regions spanning 92 Mb (∼6% of the assembly), including LINEs, SINEs, LTR-RTs and DNA transposons. The mitochondrial genome encompasses 17,332 bp and is ∼91% identical to the Mountain Hawk-Eagle (Nisaetus nipalensis). Finally, the data reveal that several anonymous microsatellites commonly used for population studies are embedded within protein-coding genes and thus may not have evolved in a neutral fashion. Because the genome sequence includes ∼800,000 novel polymorphisms, markers can now be chosen based on their proximity to functional genes involved in migration, carnivory, and other biological processes.

5.
Identification of the full complement of genes and other functional elements in any virus is crucial to fully understand its molecular biology and guide the development of effective control strategies. RNA viruses have compact multifunctional genomes that frequently contain overlapping genes and non-coding functional elements embedded within protein-coding sequences. Overlapping features often escape detection because it can be difficult to disentangle the multiple roles of the constituent nucleotides via mutational analyses, while high-throughput experimental techniques are often unable to distinguish functional elements from incidental features. However, RNA viruses evolve very rapidly so that, even within a single species, substitutions rapidly accumulate at neutral or near-neutral sites providing great potential for comparative genomics to distinguish the signature of purifying selection. Computationally identified features can then be efficiently targeted for experimental analysis. Here we analyze alignments of protein-coding virus sequences to identify regions where there is a statistically significant reduction in the degree of variability at synonymous sites, a characteristic signature of overlapping functional elements. Having previously tested this technique by experimental verification of discoveries in selected viruses, we now analyze sequence alignments for ∼700 RNA virus species to identify hundreds of such regions, many of which have not been previously described.

6.
Journal of Genetics and Genomics (《遗传学报》), 2020, 47(1): 49-60
Noncoding RNAs (ncRNAs) play important roles in many biological processes and provide materials for evolutionary adaptations beyond protein-coding genes, such as in the arms race between the host and pathogen. However, a comprehensive high-resolution analysis of primate genomes that includes the latest annotated ncRNAs is not currently available. Here, we developed a computational pipeline to estimate the selection acting on noncoding regions based on comparisons with a large number of reference sequences in introns adjacent to the regions of interest. Our method yields results comparable with those of the established codon-based method and phyloP method for coding genes; thus, it provides a holistic framework for estimating the selection on the entire genome. We further showed that fast-evolving protein-coding genes and their corresponding 5′ UTRs have a significantly lower frequency of CpG dinucleotides than those evolving at an average pace, and these fast-evolving genes are enriched in the processes of immunity and host defense. We also identified fast-evolving miRNAs with antiviral functions in cells. Our results provide a resource for high-resolution evolution analysis of the primate genomes.

7.
The unprecedented pace of the sequencing of the SARS-CoV-2 virus genomes provides us with unique information about the genetic changes in a single pathogen during an ongoing pandemic. By the analysis of close to 200,000 genomes we show that the patterns of the SARS-CoV-2 virus mutations along its genome are closely correlated with the structural and functional features of the encoded proteins. Requirements of foldability of proteins’ 3D structures and the conservation of their key functional regions, such as protein-protein interaction interfaces, are the dominant factors driving evolutionary selection in protein-coding genes. At the same time, avoidance of the host immunity leads to the abundance of mutations in other regions, resulting in high variability of the missense mutation rate along the genome. “Unexplained” peaks and valleys in the mutation rate provide hints on function for yet uncharacterized genomic regions and specific protein structural and functional features they code for. Some of these observations have immediate practical implications for the selection of target regions for PCR-based COVID-19 tests and for evaluating the risk of mutations in epitopes targeted by specific antibodies and vaccine design strategies.

8.
Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of possible functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations that predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes that were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologs in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.

9.
The current knowledge on genomes of non-falciparum malaria species and the potential of model malaria parasites for functional analyses are reviewed and compared with those of the most pathogenic human parasite, Plasmodium falciparum. There are remarkable similarities in overall genome composition among the different species at the level of chromosome organisation and chromosome number, conserved order of individual genes, and even conserved functions of specific gene domains and regulatory control elements. With the initiative taken to sequence the genome of P. falciparum, a wealth of information is already becoming available to the scientific community. In order to exploit the biological information content of a complete genome sequence, simple storage of the bulk of sequence data will be inadequate. The requirement for functional analyses to determine the biological role of the open reading frames is commonly accepted and knowledge of the genomes of the animal model malaria species will facilitate these analyses. Detailed comparative genome information and sequencing of additional Plasmodium genomes will provide a deeper insight into the evolutionary history of the species, the biology of the parasite, and its interactions with the mammalian host and mosquito vector. Therefore, an extended and integrated approach will enhance our knowledge of malaria and will ultimately lead to a more rational approach that identifies and evaluates new targets for anti-malarial drug and vaccine development.

10.
Charles Darwin believed that all traits of organisms have been honed to near perfection by natural selection. The empirical basis underlying Darwin's conclusions consisted of numerous observations made by him and other naturalists on the exquisite adaptations of animals and plants to their natural habitats and on the impressive results of artificial selection. Darwin fully appreciated the importance of heredity but was unaware of the nature and, in fact, the very existence of genomes. A century and a half after the publication of the "Origin", we have the opportunity to draw conclusions from the comparisons of hundreds of genome sequences from all walks of life. These comparisons suggest that the dominant mode of genome evolution is quite different from that of phenotypic evolution. The genomes of vertebrates, those purported paragons of biological perfection, turned out to be veritable junkyards of selfish genetic elements where only a small fraction of the genetic material is dedicated to encoding biologically relevant information. In sharp contrast, genomes of microbes and viruses are incomparably more compact, with most of the genetic material assigned to distinct biological functions. However, even in these genomes, the specific genome organization (gene order) is poorly conserved. The results of comparative genomics lead to the conclusion that the genome architecture is not a straightforward result of continuous adaptation but rather is determined by the balance between the selection pressure, which is itself dependent on the effective population size and mutation rate, the level of recombination, and the activity of selfish elements. Although genes and, in many cases, multigene regions of genomes possess elaborate architectures that ensure regulation of expression, these arrangements are evolutionarily volatile and typically change substantially even on short evolutionary scales when gene sequences diverge minimally. Thus, the observed genome architectures are, mostly, products of neutral processes or epiphenomena of more general selective processes, such as selection for genome streamlining in successful lineages with large populations. Selection for specific gene arrangements (elements of genome architecture) seems only to modulate the results of these processes.

11.
12.
The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor “silent” germline micronuclear genome by a process of “unscrambling” and fragmentation. The tiny macronuclear “nanochromosomes” typically encode single, protein-coding genes (a small portion, 10%, encode 2–8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report the high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that vary from 469 bp to 66 kb long (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes of eukaryotes. Comparison to other ciliates with nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease.

13.
Popescu CE, Lee RW. Genetics, 2007, 175(2): 819-826
The mitochondrial genomes of the Chlorophyta exhibit significant diversity with respect to gene content and genome compactness; however, quantitative data on the rates of nucleotide substitution in mitochondrial DNA, which might help explain the origin of this diversity, are lacking. To gain insight into the evolutionary forces responsible for mitochondrial genome diversification, we sequenced to near completion the mitochondrial genome of the chlorophyte Chlamydomonas incerta, estimated the evolutionary divergence between Chlamydomonas reinhardtii and C. incerta mitochondrial protein-coding genes and rRNA-coding regions, and compared the relative evolutionary rates in mitochondrial and nuclear genes. Synonymous and nonsynonymous substitution rates do not differ significantly between the mitochondrial and nuclear protein-coding genes. The mitochondrial rRNA-coding regions, however, are evolving much faster than their nuclear counterparts, and this difference might be explained by relaxed functional constraints on the mitochondrial translational apparatus due to the small number of proteins synthesized in Chlamydomonas mitochondria. Substitution rates at synonymous sites in a nonstandard mitochondrial gene (rtl) and at intronic and synonymous sites in nuclear genes expressed at low levels suggest that the mutation rate is similar in these two genetic compartments. Potential evolutionary forces shaping mitochondrial genome evolution in Chlamydomonas are discussed.

14.
Genes underlying important phenotypic differences between Plasmodium species, the causative agents of malaria, are frequently found in only a subset of species and cluster at dynamically evolving subtelomeric regions of chromosomes. We hypothesized that chromosome-internal regions of Plasmodium genomes harbour additional species subset-specific genes that underlie differences in human pathogenicity, human-to-human transmissibility, and human virulence. We combined sequence similarity searches with synteny block analyses to identify species subset-specific genes in chromosome-internal regions of six published Plasmodium genomes, including Plasmodium falciparum, Plasmodium vivax, Plasmodium knowlesi, Plasmodium yoelii, Plasmodium berghei, and Plasmodium chabaudi. To improve comparative analysis, we first revised incorrectly annotated gene models using homology-based gene finders and examined putative subset-specific genes within syntenic contexts. Confirmed subset-specific genes were then analyzed for their role in biological pathways and examined for molecular functions using publicly available databases. We identified 16 genes that are well conserved in the three primate parasites but not found in rodent parasites, including three key enzymes of the thiamine (vitamin B1) biosynthesis pathway. Thirteen genes were found to be present in both human parasites but absent in the monkey parasite P. knowlesi, including genes specifically upregulated in sporozoites or gametocytes that could be linked to parasite transmission success between humans. Furthermore, we propose 15 chromosome-internal P. falciparum-specific genes as new candidate genes underlying increased human virulence and detected a currently uncharacterized cluster of P. vivax-specific genes on chromosome 6 likely involved in erythrocyte invasion. In conclusion, Plasmodium species harbour many chromosome-internal differences in the form of protein-coding genes, some of which are potentially linked to human disease and thus promising leads for future laboratory research.

15.
Malaria has been one of the strongest selective pressures on our species. Many of the best-characterized cases of adaptive evolution in humans are in genes tied to malaria resistance. However, the complex evolutionary patterns at these genes are poorly captured by standard scans for nonneutral evolution. Here, we present three new statistical tests for selection based on population genetic patterns that are observed more than once among key malaria resistance loci. We assess these tests using forward-time evolutionary simulations and apply them to global whole-genome sequencing data from humans, and thus we show that they are effective at distinguishing selection from neutrality. Each test captures a distinct evolutionary pattern, here called Divergent Haplotypes, Repeated Shifts, and Arrested Sweeps, associated with a particular period of human prehistory. We clarify the selective signatures at known malaria-relevant genes and identify additional genes showing similar adaptive evolutionary patterns. Among our top outliers, we see a particular enrichment for genes involved in erythropoiesis and for genes previously associated with malaria resistance, consistent with a major role for malaria in shaping these patterns of genetic diversity. Polymorphisms at these genes are likely to impact resistance to malaria infection and contribute to ongoing host–parasite coevolutionary dynamics.

16.
The genome of model malaria parasites, and comparative genomics (total citations: 1; self-citations: 0; citations by others: 1)
The field of comparative genomics of malaria parasites has recently come of age with the completion of the whole genome sequences of the human malaria parasite Plasmodium falciparum and a rodent malaria model, Plasmodium yoelii yoelii. With several other genome sequencing projects of different model and human malaria parasite species underway, comparing genomes from multiple species has necessitated the development of improved informatics tools and analyses. Results from initial comparative analyses reveal striking conservation of gene synteny between malaria species within conserved chromosome cores, in contrast to reduced homology within subtelomeric regions, in line with previous findings on a smaller scale. Genes that elicit a host immune response are frequently found to be species-specific, although a large variant multigene family is common to many rodent malaria species and Plasmodium vivax. Sequence alignment of syntenic regions from multiple species has revealed the similarity between species in coding regions to be high relative to non-coding regions, and phylogenetic footprinting studies promise to reveal conserved motifs in the latter. Comparison of non-synonymous substitution rates between orthologous genes is proving a powerful technique for identifying genes under selection pressure, and may be useful for vaccine design. This is a stimulating time for comparative genomics of model and human malaria parasites, which promises to produce useful results for the development of antimalarial drugs and vaccines.

17.
Although sequences containing regulatory elements located close to protein-coding genes are often only weakly conserved during evolution, comparisons of rodent genomes have implied that these sequences are subject to some selective constraints. Evolutionary conservation is particularly apparent upstream of coding sequences and in first introns, regions that are enriched for regulatory elements. By comparing the human and chimpanzee genomes, we show here that there is almost no evidence for conservation in these regions in hominids. Furthermore, we show that gene expression is diverging more rapidly in hominids than in murids per unit of neutral sequence divergence. By combining data on polymorphism levels in human noncoding DNA and the corresponding human–chimpanzee divergence, we show that the proportion of adaptive substitutions in these regions in hominids is very low. It therefore seems likely that the lack of conservation and increased rate of gene expression divergence are caused by a reduction in the effectiveness of natural selection against deleterious mutations because of the low effective population sizes of hominids. This has resulted in the accumulation of a large number of deleterious mutations in sequences containing gene control elements and hence a widespread degradation of the genome during the evolution of humans and chimpanzees.

18.
19.
The apicoplast, a nonphotosynthetic plastid of secondary symbiotic origin, is essential for the survival of malaria parasites of the genus Plasmodium. Elucidation of the evolution of the apicoplast genome in Plasmodium species is important to better understand the functions of the organelle. However, the complete apicoplast genome is available for only the most virulent human malaria parasite, Plasmodium falciparum. Here, we obtained the near-complete apicoplast genome sequences from eight Plasmodium species that infect a wide variety of vertebrate hosts and performed structural and phylogenetic analyses. We found that gene repertoire, gene arrangement, and other structural attributes were highly conserved. Phylogenetic reconstruction using 30 protein-coding genes of the apicoplast genome inferred, for the first time, a close relationship between P. ovale and rodent parasites. This close relatedness was robustly supported using multiple evolutionary assumptions and models. The finding suggests that an ancestral host switch occurred between rodent and human Plasmodium parasites.

20.
The three green algal mitochondrial genomes completely sequenced to date (those of Chlamydomonas reinhardtii Dangeard, Chlamydomonas eugametos Gerloff, and Prototheca wickerhamii Soneda & Tubaki) revealed very different mitochondrial genome organizations and sequence affiliations. The Chlamydomonas genomes resemble the ciliate/fungal/animal counterparts, and the Prototheca genome resembles land plant homologues. This review points out that all the green algal mitochondrial genomes examined to date resemble either the Chlamydomonas or the Prototheca mitochondrial genome; the Chlamydomonas-like mitochondrial genomes are small and have a reduced gene content (no ribosomal protein or 5S rRNA genes and only a few protein-coding and tRNA genes) and fragmented and scrambled rRNA coding regions, whereas the Prototheca-like mitochondrial genomes are larger and have a larger set of protein-coding genes (including ribosomal protein genes), more tRNA genes, and 5S rRNA and conventional continuous small-subunit (SSU) and large-subunit (LSU) rRNA coding regions. It appears, therefore, that the differences previously observed between the mitochondrial genomes of C. reinhardtii and P. wickerhamii extend to the two green algal mitochondrial lineages to which they belong and are significant enough to raise questions about the causes and mechanisms responsible for such contrasting evolutionary strategies among green algae. This review suggests an integrative approach in explaining the occurrence of distinct evolutionary strategies and apparent phylogenetic affiliations among the known green algal mitochondrial lineages. The observed differences could be the result of distinct genetic potentials differentiated during the previous evolutionary history of the flagellate ancestors and/or of subsequent changes in habitat and life history of the more advanced green algal lineages.
