1.
Genome annotation produces a considerable number of putative proteins lacking sequence similarity to known proteins. These are referred to as "orphans." The proportion of orphan genes varies among genomes, and is independent of genome size. In the present study, we show that the proportion of orphan genes roughly correlates with the isolation index of organisms (IIO), an indicator introduced in the present study, which represents the degree of isolation of a given genome as measured by sequence similarity. However, there are outlier genomes with respect to the linear correlation, consisting of those genomes that may contain excess amounts of orphan genes. Comparisons of genome sequences among closely related strains revealed that some of the annotated genes are not conserved, suggesting that they are ORFs occurring by chance. Exclusion of these non-conserved ORFs within closely related genomes improved the correlation between the proportion of orphan genes and the IIO values. Assuming that the correlation holds in general, this relationship was used to estimate the number of "authentic" orphan genes in a genome. Using this definition of authentic orphan genes, the anomalies arising from over-assignments, e.g., the percentages of structural annotations, were corrected for 16 genomes, including those of five archaea.
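The estimation step described in this abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares line relating IIO to orphan fraction; the abstract does not say how IIO is computed or how the fit was performed, so the function names (`fit_line`, `authentic_orphans`) and all numbers below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def authentic_orphans(iio, total_genes, slope, intercept):
    """Predict an orphan count for a genome from its IIO value."""
    # Clamp the predicted fraction to [0, 1] so the count stays meaningful.
    fraction = min(max(slope * iio + intercept, 0.0), 1.0)
    return round(fraction * total_genes)


# Hypothetical reference genomes: (IIO, orphan fraction after removing
# non-conserved ORFs). These values are invented for illustration only.
iio_values = [0.1, 0.2, 0.3, 0.4]
orphan_fractions = [0.05, 0.10, 0.15, 0.20]

slope, intercept = fit_line(iio_values, orphan_fractions)
estimate = authentic_orphans(0.25, 2000, slope, intercept)
```

Under this reading, a genome annotated with many more orphans than `estimate` would be a candidate for over-assignment.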
2.
3.
Kathrin M. Seibt, Thomas Schmidt, Tony Heitkam 《The Plant journal : for cell and molecular biology》2020,101(3):681-699
Repetitive sequences are ubiquitous components of eukaryotic genomes affecting genome size and evolution as well as gene regulation. Among them, short interspersed nuclear elements (SINEs) are non‐coding retrotransposons usually shorter than 1000 bp. They contain only few short conserved structural motifs, in particular an internal promoter derived from cellular RNAs and a mostly AT‐rich 3′ tail, whereas the remaining regions are highly variable. SINEs emerge and vanish during evolution, and often diversify into numerous families and subfamilies that are usually specific for only a limited number of species. In contrast, at the 3′ end of multiple plant SINEs we detected the highly conserved ‘Angio‐domain’. This 37 bp segment defines the Angio‐SINE superfamily, which encompasses 24 plant SINE families widely distributed across 13 orders within the plant kingdom. We retrieved 28 433 full‐length Angio‐SINE copies from genome assemblies of 46 plant species, frequently located in genes. Compensatory mutations in and adjacent to the Angio‐domain imply selective restraints maintaining its RNA structure. Angio‐SINE families share segmental sequence similarities, indicating a modular evolution with strong Angio‐domain preservation. We suggest that the conserved domain contributes to the evolutionary success of Angio‐SINEs through either structural interactions between SINE RNA and proteins increasing their transpositional efficiency, or by enhancing their accumulation in genes.
4.
Peter Sarkies 《Wiley interdisciplinary reviews. RNA》2024,15(2):e1849
Small non-coding RNAs are key regulators of gene expression across eukaryotes. Piwi-interacting small RNAs (piRNAs) are a specific type of small non-coding RNAs, conserved across animals, which are best known as regulators of genome stability through their ability to target transposable elements for silencing. Despite the near ubiquitous presence of piRNAs in animal lineages, there are some examples where the piRNA pathway has been lost completely, most dramatically in nematodes where loss has occurred in at least four independent lineages. In this perspective I will provide an evaluation of the presence of piRNAs across animals, explaining how it is known that piRNAs are missing from certain organisms. I will then consider possible explanations for why the piRNA pathway might have been lost and evaluate the evidence in favor of each possible mechanism. While it is still impossible to provide definitive answers, these theories will prompt further investigations into why such a highly conserved pathway can nevertheless become dispensable in certain lineages.
5.
6.
Genome scans using large numbers of randomly selected markers have revealed a small proportion of loci that deviate from neutral expectations and so may mark genomic regions that contribute to local adaptation. Measurements of sequence differentiation and identification of genes in these regions are important but difficult, especially in organisms with limited genetic information available. We have followed up a genome scan in the marine gastropod, Littorina saxatilis, by searching a bacterial artificial chromosome library with differentiated and undifferentiated markers, sequencing four bacterial artificial chromosomes and then analysing sequence variation in population samples for fragments at, and close to, the original marker polymorphisms. We show that sequence differentiation follows the patterns expected from the original marker frequencies, that differentiated markers identify independent and highly localized sites, and that these sites fall outside coding regions. Two differentiated loci are characterized by insertions of putative transposable elements that appear to have increased in frequency recently and which might influence expression of downstream genes. These results provide strong candidate loci for the study of local adaptation in Littorina. They demonstrate an approach that can be applied to follow up genome scans in other taxa and they show that the genome scan approach can lead rapidly to candidate genes in nonmodel organisms.
7.
The human genome gives rise to different epigenomic landscapes that define each cell type and can be deregulated in disease. Recent efforts by ENCODE, the NIH Roadmap and the International Human Epigenome Consortium (IHEC) have made significant advances towards assembling reference epigenomic maps of various tissues. Notably, these projects have found that approximately 80% of human DNA was biochemically active in at least one epigenomic assay while only approximately 10% of the sequence displayed signs of purifying selection. Given that transposable elements (TEs) make up at least 50% of the human genome and can be actively transcribed or act as regulatory elements, either for their own purposes or co‐opted for the benefit of their host, we are interested in exploring their overall contribution to the “functional” genome. Traditional methods used to identify functional DNA have relied on comparative genomics, conservation analysis and low throughput validation assays. To discover co‐opted TEs, and distinguish them from noisy genomic elements, we argue that comparative epigenomic methods will also be important.
8.
9.
Jolien J.E. van Hooff, Eelco Tromer, Teunis J.P. van Dam, Geert J.P.L. Kops, Berend Snel 《BioEssays : news and reviews in molecular, cellular and developmental biology》2019,41(5)
Comparative genomics has proven a fruitful approach to acquire many functional and evolutionary insights into core cellular processes. Here it is argued that in order to perform accurate and interesting comparative genomics, one first and foremost has to be able to recognize, postulate, and revise different evolutionary scenarios. After all, these studies lack a simple protocol, due to different proteins having different evolutionary dynamics and demanding different approaches. The authors here discuss this challenge from a practical (what are the observations?) and conceptual (how do these indicate a specific evolutionary scenario?) viewpoint, with the aim to guide investigators who want to analyze the evolution of their protein(s) of interest. By sharing how the authors draft, test, and update such a scenario and how it directs their investigations, the authors hope to illuminate how to execute molecular evolution studies and how to interpret them. Also see the video abstract here https://youtu.be/VCt3l2pbdbQ .
10.
11.
12.
Yu Q, Guyot R, de Kochko A, Byers A, Navajas-Pérez R, Langston BJ, Dubreuil-Tranchant C, Paterson AH, Poncet V, Nagai C, Ming R 《The Plant journal : for cell and molecular biology》2011,67(2):305-317
Arabica coffee (Coffea arabica L.) is a self-compatible perennial allotetraploid species (2n=4x=44), whereas Robusta coffee (C. canephora L.) is a self-incompatible perennial diploid species (2n=2x=22). C. arabica (CᵃCᵃEᵃEᵃ) is derived from a spontaneous hybridization between two closely related diploid coffee species, C. canephora (CC) and C. eugenioides (EE). To investigate the patterns and degree of DNA sequence divergence between the Arabica and Robusta coffee genomes, we identified orthologous bacterial artificial chromosomes (BACs) from C. arabica and C. canephora, and compared their sequences to trace their evolutionary history. Although a high level of sequence similarity was found between BACs from C. arabica and C. canephora, numerous chromosomal rearrangements were detected, including inversions, deletions and insertions. DNA sequence identity between C. arabica and C. canephora orthologous BACs ranged from 93.4% (between Eᵃ and Cᵃ) to 94.6% (between Cᵃ and C). Analysis of eight orthologous gene pairs resulted in estimated ages of divergence between 0.046 and 0.665 million years, indicating a recent origin of the allotetraploid species C. arabica. Analysis of transposable elements revealed differential insertion events that contributed to the size increase in the Cᵃ sub-genome compared to its diploid relative. In particular, we showed that insertion of a Ty1-copia LTR retrotransposon occurred specifically in C. arabica, probably shortly after allopolyploid formation. The two sub-genomes of C. arabica, Cᵃ and Eᵃ, showed sufficient sequence differences, and a whole-genome shotgun approach could be suitable for sequencing the allotetraploid genome of C. arabica.
13.
Bousios A, Kourmpetis YA, Pavlidis P, Minga E, Tsaftaris A, Darzentas N 《The Plant journal : for cell and molecular biology》2012,69(3):475-488
Sireviruses are one of the three genera of Copia long terminal repeat (LTR) retrotransposons, exclusive to and highly abundant in plants, and with a genome structure unique among retrotransposons. Yet, perhaps due to the few references to the Sirevirus origin of some families, compounded by the difficulty in correctly assigning retrotransposon families into genera, Sireviruses have hardly featured in recent research. As a result, analysis at this key level of classification and details of their colonization and impact on plant genomes are currently lacking. Recently, however, it became possible to accurately assign elements from diverse families to this genus in one step, based on highly conserved sequence motifs. Hence, Sirevirus dynamics in the relatively obese maize genome can now be comprehensively studied. Overall, we identified >10 600 intact and approximately 28 000 degenerate Sirevirus elements from a plethora of families, some brought into the genus for the first time. Sireviruses make up approximately 90% of the Copia population and it is the only genus that has successfully infiltrated the genome, possibly by experiencing intense amplification during the last 600 000 years, while being constantly recycled by host mechanisms. They accumulate in chromosome-distal gene-rich areas, where they insert in between gene islands, mainly in preferred zones within their own genomes. Sirevirus LTRs are heavily methylated, while there is evidence for a palindromic consensus target sequence. This work brings Sireviruses into the spotlight, elucidating their lifestyle and history, and suggesting their crucial role in the current genomic make-up of maize, and possibly other plant hosts.
14.
《Genetics》2013,195(1):275-287
Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast.
15.
《Expert review of proteomics》2013,10(1):65-77
This review describes how intimately proteogenomics and systems biology are intertwined. Quantitative cell-wide monitoring of cellular processes and the analysis of this information is the basis for systems biology. Establishing the most comprehensive protein-parts list is an essential prerequisite prior to analysis of the cell-wide dynamics of proteins, their post-translational modifications, their complex network interactions and interpretation of these data as a whole. High-quality genome annotation is, thus, a crucial basis. Proteogenomics consists of high-throughput identification and characterization of proteins by extra-large shotgun MS/MS approaches and the integration of these data with genomic data. Discovery of the remaining unannotated genes, defining translational start sites, listing signal peptide processing events and post-translational modifications, are tasks that can currently be carried out at a full-genomic scale as soon as the genomic sequence is available. Proteomics is increasingly being used at the primary stage of genome annotation and such an approach may become standard in the near future for genome projects. Advantageously, the same experimental proteomic datasets may be used to characterize the specific metabolic traits of the organism under study. Undoubtedly, comparative genomics will experience a renaissance taking into account this new dimension. Synthetic biology aimed at re-engineering living systems will also benefit from this significant progress.
16.
17.
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the process of identifying clusters of orthologous genes from precomputed phylogenetic trees and classifying gene families. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can be used to inform comparative genomics and gene family evolution analyses.
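As a rough illustration of the species overlap idea mentioned in this abstract, the sketch below labels an internal node of a rooted gene tree as a duplication when its two subtrees share a species, and as a speciation otherwise; gene pairs that meet at a speciation node are reported as orthologs. This is not Possvm's code or API. The nested-tuple tree, the strictly binary topology, the `<species>_<gene>` naming convention and the function names are all assumptions made for the example, and the subsequent MCL clustering step is omitted.

```python
from itertools import product


def leaves(tree):
    """Return the leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(gene):
    # Assumed naming convention: "<species>_<gene id>".
    return gene.split("_", 1)[0]


def ortholog_pairs(tree):
    """Collect gene pairs whose last common ancestor is a speciation node."""
    if isinstance(tree, str):
        return set()
    left, right = tree  # assumes a strictly binary, rooted tree
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    left_genes, right_genes = leaves(left), leaves(right)
    left_species = {species_of(g) for g in left_genes}
    right_species = {species_of(g) for g in right_genes}
    # No shared species between the subtrees: treat the node as a speciation.
    if not (left_species & right_species):
        pairs |= {tuple(sorted(p)) for p in product(left_genes, right_genes)}
    return pairs


# Hypothetical gene tree: a duplication (A1/A2) in the human-mouse ancestor,
# with a single-copy fly gene as the outgroup.
toy_tree = ((("human_A1", "mouse_A1"), ("human_A2", "mouse_A2")), "fly_A")
pairs = ortholog_pairs(toy_tree)
```

In this toy tree, `human_A1` and `mouse_A1` are paired as orthologs, while `human_A1` and `mouse_A2` are not because they are separated by the duplication node. The resulting pairs form the edges of an orthology graph that a clustering step such as MCL could then partition.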
18.
19.
Major histocompatibility complex (MHC) genes in vertebrates are vital in defending against pathogenic infections. To gain new insights into the evolution of MHC Class I (MHCI) genes and test competing hypotheses on the origin of the MHCI region in eutherian mammals, we studied available genome assemblies of nine species in Afrotheria, Xenarthra, and Laurasiatheria, and successfully characterized the MHCI region in six species. The following numbers of putatively functional genes were detected: in the elephant, four, one, and eight in the extended class I region, and κ and β duplication blocks, respectively; in the tenrec, one in the κ duplication block; and in the four bat species, one or two in the β duplication block. Our results indicate that MHCI genes in the κ and β duplication blocks may have originated in the common ancestor of eutherian mammals. In the elephant, tenrec, and all four bats, some MHCI genes occurred outside the MHCI region, suggesting that eutherians may have a more complex MHCI genomic organization than previously thought. Bat‐specific three‐ or five‐amino‐acid insertions were detected in the MHCI α1 domain in all four bats studied, suggesting that pathogen defense in bats relies on MHCIs having a wider peptide‐binding groove, as previously assayed by a bat MHCI gene with a three‐amino‐acid insertion showing a larger peptide repertoire than in other mammals. Our study adds to knowledge on the diversity of eutherian MHCI genes, which may have been shaped in a taxon‐specific manner.
20.
Many genes are involved in the mammalian cell apoptosis pathway. These apoptosis genes often contain characteristic functional domains, and can be classified into at least 15 functional groups, according to previous reports. Using an integrated bioinformatics platform for motif or domain search from three public mammalian proteomes (International Protein Index database for human, mouse, and rat), we systematically cataloged all of the proteins involved in the mammalian apoptosis pathway. By localizing those proteins onto the genomes, we obtained a gene locus centric apoptosis gene catalog for human, mouse and rat. Further phylogenetic analysis showed that most of the apoptosis related gene loci are conserved among these three mammals. Interestingly, about one-third of apoptosis gene loci form gene clusters on mammalian chromosomes, and exist in the three species, which indicated that mammalian apoptosis gene orders are also conserved. In addition, some tandem duplicated gene loci were revealed by comparing gene loci clusters in the three species. All data produced in this work were stored in a relational database and may be viewed at http://pcas.cbi.pku.edu.cn/database/apd.php.