Similar literature
 20 similar articles found (search time: 31 ms)
1.
Long terminal repeat retrotransposons (LTR‐RTs) represent a major fraction of plant genomes, but the processes leading to transposition bursts remain elusive. Polyploidy is expected to lead to LTR‐RT proliferation, as the merging of divergent diploids provokes a genome shock activating LTR‐RTs and/or genetic redundancy supports the accumulation of active LTR‐RTs through relaxation of selective constraints. Available evidence supports interspecific hybridization as the main trigger of genome dynamics, but few studies have addressed the consequences of intraspecific polyploidy (i.e. autopolyploidy), where the genome shock is expected to be minimized. The dynamics of LTR‐RTs were therefore evaluated here through low coverage 454 sequencing of three closely related diploid progenitors and three independent autotetraploids from the young Biscutella laevigata species complex. Genomes from this early diverging Brassicaceae lineage presented a minimum of 40% repeats and a large diversity of transposable elements. Differential abundances and patterns of sequence divergence among genomes for 37 LTR‐RT families revealed contrasting dynamics during species diversification. Quiescent LTR‐RT families with limited genetic variation among genomes were distinguished from active families (37.8%) that had proliferated in specific taxa. Specific families proliferated in autopolyploids only, but most transpositionally active families in polyploids were also differentiated among diploids. Low expression levels of transpositionally active LTR‐RT families in autopolyploids further supported that genome shock and redundancy are non‐mutually exclusive triggers of LTR‐RT proliferation. Although reputed to be stable, autopolyploid genomes show LTR‐RT fractions presenting analogies with polyploids between widely divergent genomes.

2.
In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein coding DNA harbors important genetic elements directing the development and the physiology of the organisms, like promoters, enhancers, insulators, and micro-RNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important non-coding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences suffered in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences among the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray finned fish, Polypterus senegalus; the amphibian, Xenopus tropicalis; as well as three mammalian species, human, rat and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, suggesting a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals. 
We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative cis-regulatory elements.

3.
We propose a method to engineer the genome of bacteriophages to increase their effectiveness as antibacterial agents. Specifically, we exploit the redundancy of the triplet code to design genomes that avoid restriction sites while producing the same proteins as wild-type phages. We give an efficient algorithm to minimize the number of restriction sites against sets of cutter sequences, and demonstrate that phage genomes can be significantly protected against surprisingly large sets of enzymes with no loss of function. Finally, we develop a model to explain why evolution has failed to eliminate many possible restriction sites despite selective pressure, thus motivating the need for genome-level sequence engineering.
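
The idea of recoding a gene with synonymous codons so that restriction sites disappear can be illustrated with a small sketch. This is not the efficient algorithm the abstract refers to; it is a simple greedy hill climb, offered only as an assumption-laden illustration. The function names (`translate`, `count_sites`, `remove_sites`) and the toy sequence are hypothetical, the input is assumed to be an in-frame coding sequence, and only the given strand is scanned (sufficient for palindromic sites such as the EcoRI site `GAATTC`).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Map each amino acid (or stop, "*") to all codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame coding sequence into a protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_sites(dna, sites):
    """Count (possibly overlapping) occurrences of every site in dna."""
    return sum(dna.startswith(site, i) for site in sites for i in range(len(dna)))


def remove_sites(dna, sites):
    """Greedy sketch: swap single codons for synonyms while the site count drops."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    best = count_sites(dna, sites)
    improved = True
    while improved and best > 0:
        improved = False
        for idx in range(len(codons)):
            for alt in SYNONYMS[CODON_TABLE[codons[idx]]]:
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                score = count_sites("".join(trial), sites)
                if score < best:  # keep only strict improvements
                    codons, best, improved = trial, score, True
                    break
    return "".join(codons)


# Hypothetical toy gene (Met-Glu-Phe-stop) containing one EcoRI site.
wild_type = "ATGGAATTCTAA"
recoded = remove_sites(wild_type, ["GAATTC"])
print(recoded, translate(recoded))
```

Because every substitution is synonymous, the encoded protein is unchanged by construction. A greedy search can stall in a local minimum when sites overlap, which is one reason an exact method would be preferred on real genomes.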

4.

Background

Prolyl oligopeptidases (POPs) are proteolytic enzymes, widely distributed in all the kingdoms of life. Bacterial POPs are pharmaceutically important enzymes, yet their functional and evolutionary details are not fully explored. Therefore, the current analysis is aimed at understanding the distribution, domain architecture, probable biological functions and gene family expansion of POPs in bacterial and archaeal lineages.

Results

Exhaustive sequence analysis of 1,202 bacterial and 91 archaeal genomes revealed ~3,000 POP homologs, of which only 638 were annotated as POPs. We observed a wide distribution of POPs in all the analysed bacterial lineages. Phylogenetic analysis and co-clustering of POPs of different phyla suggested their common functions in all the prokaryotic species. Further, on the basis of unique sequence motifs we could classify bacterial POPs into eight subtypes. Analysis of coexisting domains in POPs highlighted their involvement in protein-protein interactions and cellular signaling. We propose a significant extension of this gene family by characterizing 39 new POPs and 158 new α/β hydrolase members.

Conclusions

Our study reflects the diversity and functional importance of POPs in bacterial species. Many genomes with multiple POPs were identified with high sequence variations and different cellular localizations. Such anomalous distribution of POP genes in different bacterial genomes shows differential expansion of the POP gene family, primarily by multiple horizontal gene transfer events.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-985) contains supplementary material, which is available to authorized users.

5.
There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.

7.
Transposable elements make up a significant fraction of many eukaryotic genomes. Although both classes of transposable elements, the DNA transposons and the retrotransposons, show substantial expansion in plants and invertebrates, the DNA transposons are thought to have become inactive in mammalian genomes long ago. Here, we report the first evidence for recent activity of DNA transposons in a mammalian lineage, the bat genus Myotis. Six recently active families of nonautonomous hobo/Activator/TAM transposons were identified in the Myotis lucifugus genome using computational tools. Low sequence divergence among the individual sequences and between individual sequences and their respective consensus sequences suggests their recent expansion in the M. lucifugus genome. Furthermore, amplification and sequencing of polymorphic insertion loci in a related taxon, M. austroriparius, confirm their recent activity. Myotis is one of the largest mammalian genera with 103 species. The discovery of DNA transposon activity in this genus may therefore influence our understanding of genome evolution and diversification in bats and in mammals in general. Furthermore, the identification of a likely autonomous element may lead to new approaches for mammalian genetic manipulation.

8.
Unmethylated CpG islands associated with genes in higher plant DNA   Total citations: 16 (self-citations: 0; citations by others: 16)
The genomes of many higher plant species are the most highly methylated among eukaryotes. We report here that in spite of their heavy methylation, genomic DNAs from four plant species contain a fraction that is very rich in non-methylated sites. The fraction was characterized in maize where it represents about 2.5% of the total nuclear genome. In order to establish the genomic origin of the fraction, three maize genes containing clustered CpG were tested for methylation and were found to be non-methylated in the CpG-rich regions. By contrast, tested CpGs were methylated in a gene whose sequence showed no clustering of CpG. These observations suggest that the CpG-rich fraction of plants is at least partially derived from non-methylated regions that are associated with genes. A similar phenomenon has been described in vertebrate genomes. We discuss the evolution of CpG islands in both groups of organisms, and their possible uses in mapping and gene isolation in plants.

9.
The concept of the genome tree depends on the potential evolutionary significance of clustering species according to similarities in the gene content of their genomes. In this respect, genome trees have often been identified with species trees. With the rapid expansion of genome sequence data it becomes increasingly important to develop accurate methods for grasping global trends in the phylogenetic signals that mutually link the various genomes. We therefore derive here the methodological concept of genome trees based on protein conservation profiles in multiple species. The basic idea in this derivation is that the multi-component "presence-absence" protein conservation profiles permit tracking of common evolutionary histories of genes across multiple genomes. We show that a significant reduction in informational redundancy is achieved by considering only the subset of distinct conservation profiles. Beyond these basic ideas, we point out various pitfalls and limitations associated with the data handling, paving the way for further improvements. As an illustration of the methods, we analyze a genome tree based on the above principles, along with a series of other trees derived from the same data and based on pair-wise comparisons (ancestral duplication-conservation and shared orthologs). In all trees we observe a sharp discrimination between the three primary domains of life: Bacteria, Archaea, and Eukarya. The new genome tree, based on conservation profiles, displays a significant correspondence with classically recognized taxonomical groupings, along with a series of departures from such conventional clusterings.
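
The notion of "presence-absence" conservation profiles, and the redundancy reduction obtained by keeping only distinct profiles, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the input format (a mapping from protein family to the set of genomes containing it), the function names, the toy data, and in particular the distance measure (fraction of distinct profiles in which two genomes differ) are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def conservation_profiles(family_presence, genomes):
    """Turn {family: set of genomes} into {family: 0/1 tuple ordered like genomes}."""
    return {
        family: tuple(int(g in present) for g in genomes)
        for family, present in family_presence.items()
    }


def distinct_profiles(profiles):
    """Collapse identical profiles; the count records how many families share each."""
    return Counter(profiles.values())


def genome_distances(distinct, genomes):
    """Assumed distance: share of distinct profiles in which two genomes differ."""
    n = len(distinct)
    dist = {}
    for (i, a), (j, b) in combinations(enumerate(genomes), 2):
        differing = sum(p[i] != p[j] for p in distinct)
        dist[(a, b)] = differing / n
    return dist


# Hypothetical toy data: four protein families across three genomes.
genomes = ["A", "B", "C"]
family_presence = {
    "fam1": {"A", "B"},
    "fam2": {"A", "B"},
    "fam3": {"C"},
    "fam4": {"A", "B", "C"},
}
profiles = conservation_profiles(family_presence, genomes)
distinct = distinct_profiles(profiles)
distances = genome_distances(distinct, genomes)
print(len(profiles), "families ->", len(distinct), "distinct profiles")
print(distances)
```

A pairwise distance table of this kind could then be passed to any standard tree-building method; which method suits such data is a separate question that the sketch does not address.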

11.
Mobile genetic elements (MGEs) account for a significant fraction of eukaryotic genomes and are implicated in altered gene expression and disease. We present an efficient computational protocol for MGE insertion site analysis. ELAN, the suite of tools described here, uses standard techniques to identify different MGEs and their distribution on the genome. One component, DNASCANNER, analyses known insertion sites of MGEs for the presence of signals that are based on a combination of local physical and chemical properties. ISF (insertion site finder) is a machine-learning tool that incorporates information derived from DNASCANNER. ISF permits classification of a given DNA sequence as a potential insertion site or not, using a support vector machine. We have studied the genomes of Homo sapiens, Mus musculus, Drosophila melanogaster and Entamoeba histolytica via a protocol whereby DNASCANNER is used to identify a common set of statistically important signals flanking the insertion sites in the various genomes. These are used in ISF for insertion site prediction, and the current accuracy of the tool is over 65%. We find similar signals at gene boundaries and splice sites. Together, these data are suggestive of a common insertion mechanism that operates in a variety of eukaryotes.

12.
There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution.

13.
Facing the ever-growing list of newly discovered classes of functional RNAs, it can be expected that further types of functional RNAs are still hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data of closely related species, comparative studies seem to be the most promising approach. Here, we show that prediction of consensus structures of aligned sequences can be a significant measure to detect functional RNAs. We report a new method to test multiple sequence alignments for the existence of an unusually structured and conserved fold. We show for alignments of six types of well-known functional RNA that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single sequence predictions. We further test our method on a number of non-coding RNAs from Caenorhabditis elegans/Caenorhabditis briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can be used readily to score single alignments and discuss how the methods described here can be extended to allow for efficient genome-wide screens.

15.
Exploring the plant transcriptome through phylogenetic profiling   Total citations: 5 (self-citations: 0; citations by others: 5)
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous number of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger number of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than in Arabidopsis, whereas the opposite seems true for small gene families.
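
The categories named in this abstract (species-specific, lineage-specific, and conserved core gene families) follow directly from a family's presence pattern across species. The sketch below shows one way such a classification might look. The decision rules, the function name `classify_family`, and the four-species toy data are assumptions made for illustration; the abstract does not state the criteria actually used.

```python
def classify_family(species_present, lineage_of, all_species):
    """Label a gene family from the set of species in which it occurs.

    Assumed rules: present in every species -> "core"; present in exactly
    one species -> "species-specific"; present only within one lineage ->
    "lineage-specific"; anything else -> "other".
    """
    present = set(species_present)
    if present == set(all_species):
        return "core"
    if len(present) == 1:
        return "species-specific"
    if len({lineage_of[s] for s in present}) == 1:
        return "lineage-specific"
    return "other"


# Hypothetical species set and lineage assignment.
lineage_of = {
    "rice": "monocot",
    "maize": "monocot",
    "arabidopsis": "eudicot",
    "tomato": "eudicot",
}
all_species = list(lineage_of)

# Hypothetical gene families and the species where each was detected.
families = {
    "famA": {"rice", "maize", "arabidopsis", "tomato"},
    "famB": {"rice"},
    "famC": {"rice", "maize"},
    "famD": {"maize", "tomato"},
}
labels = {
    name: classify_family(present, lineage_of, all_species)
    for name, present in families.items()
}
print(labels)
```

With EST-derived data, absence from a species may simply reflect incomplete sampling, so labels such as "species-specific" would need to be treated as provisional in practice.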

16.
Plant genomics projects involving model species and many agriculturally important crops are resulting in a rapidly increasing database of genomic and expressed DNA sequences. The publicly available collection of expressed sequence tags (ESTs) from several grass species can be used in the analysis of both structural and functional relationships in these genomes. We analyzed over 260,000 EST sequences from five different cereals for their potential use in developing simple sequence repeat (SSR) markers. The frequency of SSR-containing ESTs (SSR-ESTs) in this collection varied from 1.5% for maize to 4.7% for rice. In addition, we identified several ESTs that are related to the SSR-ESTs by BLAST analysis. The SSR-ESTs and the related sequences were clustered within each species in order to reduce the redundancy and to produce a longer consensus sequence. The consensus and singleton sequences from each species were pooled and clustered to identify cross-species matches. Overall, a reduction in redundancy of 85% was observed when the resulting consensus and singleton sequences (3,569) were compared to the total number of SSR-EST and related sequences analyzed (24,606). This information can be useful for the development of SSR markers that can amplify across the grass genera for comparative mapping and genetics. Functional analysis may reveal their role in plant metabolism and gene evolution.
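
Two steps in this abstract lend themselves to a short sketch: scanning a sequence for simple sequence repeats, and checking the reported 85% redundancy reduction from the two counts given (3,569 and 24,606). The SSR definition used here (a motif of 2 to 6 bases repeated at least five times in tandem) is an assumption, since the abstract does not state its thresholds, and the names `SSR_PATTERN` and `find_ssrs` as well as the example sequence are hypothetical.

```python
import re

# Assumed SSR definition: a 2-6 bp motif followed by at least 4 more copies
# (5 tandem copies in total). The lazy quantifier prefers the shortest motif.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_ssrs(seq):
    """Return (start, motif, copy_number) for each SSR found in seq."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs reported as e.g. "AA" repeats
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical EST fragment carrying a (CA)6 repeat.
est = "GGT" + "CA" * 6 + "TTG"
ssrs = find_ssrs(est)
print(ssrs)

# Redundancy reduction implied by the counts quoted in the abstract.
total_sequences = 24606   # SSR-ESTs and related sequences analyzed
after_clustering = 3569   # consensus and singleton sequences
reduction_percent = (1 - after_clustering / total_sequences) * 100
print(f"{reduction_percent:.1f}% reduction")
```

The computed value is roughly 85.5%, consistent with the "85%" stated in the abstract.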

17.
The current knowledge on genomes of non-falciparum malaria species and the potential of model malaria parasites for functional analyses are reviewed and compared with those of the most pathogenic human parasite, Plasmodium falciparum. There are remarkable similarities in overall genome composition among the different species at the level of chromosome organisation and chromosome number, conserved order of individual genes, and even conserved functions of specific gene domains and regulatory control elements. With the initiative taken to sequence the genome of P. falciparum, a wealth of information is already becoming available to the scientific community. In order to exploit the biological information content of a complete genome sequence, simple storage of the bulk of sequence data will be inadequate. The requirement for functional analyses to determine the biological role of the open reading frames is commonly accepted and knowledge of the genomes of the animal model malaria species will facilitate these analyses. Detailed comparative genome information and sequencing of additional Plasmodium genomes will provide a deeper insight into the evolutionary history of the species, the biology of the parasite, and its interactions with the mammalian host and mosquito vector. Therefore, an extended and integrated approach will enhance our knowledge of malaria and will ultimately lead to a more rational approach that identifies and evaluates new targets for anti-malarial drug and vaccine development.

18.
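Primers were designed from the known cDNA sequence of the wheat homologous gene TaDEP1, and the genomic sequence of wheat TaDEP1 was successfully cloned; the gene was found to contain five exons and four introns. By comparing the differences in this gene among the A, B, and D genomes of hexaploid common wheat, a molecular marker, Ta956, that can distinguish the A, B, and D genomes was identified. Using Chinese Spring nullisomic-tetrasomic lines as material, this marker was used to map the TaDEP1 gene to wheat 5A, 5B, and 5...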

19.
Phaeoviruses infect the brown algae, which are major contributors to primary production of coastal waters and estuaries. They exploit a Persistent evolutionary strategy akin to a K-selected life strategy via genome integration and are the only known representatives to do so within the giant algal viruses, which are typified by r-selected Acute lytic viruses. In screening the genomes of five species within the filamentous brown algal lineage, here we show an unprecedented diversity of viral gene sequence variants, especially amongst the smaller phaeoviral genomes. Moreover, one variant shares features of both major sub-groups within the phaeoviruses. These phaeoviruses have exploited the reduction of their giant dsDNA genomes and accompanying loss of DNA proofreading capability, typical of an Acute life strategist, but uniquely retain a Persistent life strategy.

20.
All vertebrate genomes have been colonized by retroviruses along their evolutionary trajectory. Although endogenous retroviruses (ERVs) can contribute important physiological functions to contemporary hosts, such benefits are attributed to long-term coevolution of ERV and host because germline infections are rare and expansion is slow, and because the host effectively silences them. The genomes of several outbred species including mule deer (Odocoileus hemionus) are currently being colonized by ERVs, which provides an opportunity to study ERV dynamics at a time when few are fixed. We previously established the locus-specific distribution of cervid ERV (CrERV) in populations of mule deer. In this study, we determine the molecular evolutionary processes acting on CrERV at each locus in the context of phylogenetic origin, genome location, and population prevalence. A mule deer genome was de novo assembled from short- and long-insert mate pair reads and CrERV sequence generated at each locus. We report that CrERV composition and diversity have recently measurably increased by horizontal acquisition of a new retrovirus lineage. This new lineage has further expanded CrERV burden and CrERV genomic diversity by activating and recombining with existing CrERV. Resulting interlineage recombinants then endogenize and subsequently expand. CrERV loci are significantly closer to genes than expected if integration were random and gene proximity might explain the recent expansion of one recombinant CrERV lineage. Thus, in mule deer, retroviral colonization is a dynamic period in the molecular evolution of CrERV that also provides a burst of genomic diversity to the host population.
