Found 20 similar articles (search time: 15 ms)
1.
Background
Recent genomic-scale surveys of epigenetic states in mammalian genomes have shown that promoters and enhancers are correlated with distinct chromatin signatures, providing a pragmatic way to map these regulatory elements systematically in the genome. With the rapid accumulation of chromatin modification profiles in the genomes of various organisms and cell types, this chromatin-based approach promises to uncover many new regulatory elements, but computational methods to effectively extract information from these datasets are still limited.
2.
Exploring plant genomes by RNA-induced gene silencing (total citations: 2; self-citations: 0; citations by others: 2)
The nucleotide sequences of several animal, plant and bacterial genomes are now known, but the functions of many of the proteins that they are predicted to encode remain unclear. RNA interference is a gene-silencing technology that is being used successfully to investigate gene function in several organisms, for example Caenorhabditis elegans. We argue here that RNA-induced gene silencing approaches are also likely to be effective for investigating plant gene function in a high-throughput, genome-wide manner.
3.
A major goal of post-genomic biology is to reconstruct and model in silico the metabolic networks of entire organisms. Work on bacteria is well advanced, and is now under way for plants and other eukaryotes. Genome-scale modelling in plants is much more challenging than in bacteria. The challenges come from features characteristic of higher organisms (subcellular compartmentation, tissue differentiation) and also from the particular severity in plants of a general problem: genome content whose functions remain undiscovered. This problem results in thousands of genes for which no function is known ('undiscovered genome content') and hundreds of enzymatic and transport functions for which no gene is yet identified. The severity of the undiscovered genome content problem in plants reflects their genome size and complexity. To bring the challenges of plant genome-scale modelling into focus, we first summarize the current status of plant genome-scale models. We then highlight the challenges - and ways to address them - in three areas: identifying genes for missing processes, modelling tissues as opposed to single cells, and finding metabolic functions encoded by undiscovered genome content. We also discuss the emerging view that a significant fraction of undiscovered genome content encodes functions that counter damage to metabolites inflicted by spontaneous chemical reactions or enzymatic mistakes.
4.
Background
Microbial genomes contain an abundance of genes with conserved proximity forming clusters on the chromosome. However, the conservation can be a result of many factors such as vertical inheritance, or functional selection. Thus, identification of conserved gene clusters that are under functional selection provides an effective channel for gene annotation, microarray screening, and pathway reconstruction. The problem of devising a robust method to identify these conserved gene clusters and to evaluate the significance of the conservation in multiple genomes has a number of implications for comparative, evolutionary and functional genomics as well as synthetic biology.
5.
Ancient origin of elicitin gene clusters in Phytophthora genomes (total citations: 1; self-citations: 0; citations by others: 1)
Jiang RH Tyler BM Whisson SC Hardham AR Govers F 《Molecular biology and evolution》2006,23(2):338-351
The genus Phytophthora belongs to the oomycetes in the eukaryotic stramenopile lineage and is comprised of over 65 species that are all destructive plant pathogens on a wide range of dicotyledons. Phytophthora produces elicitins (ELIs), a group of extracellular elicitor proteins that cause a hypersensitive response in tobacco. Database mining revealed several new classes of elicitin-like (ELL) sequences with diverse elicitin domains in Phytophthora infestans, Phytophthora sojae, Phytophthora brassicae, and Phytophthora ramorum. ELIs and ELLs were shown to be unique to Phytophthora and Pythium species. They are ubiquitous among Phytophthora species and belong to one of the most highly conserved and complex protein families in the Phytophthora genus. Phylogeny construction with elicitin domains derived from 156 ELIs and ELLs showed that most of the diversified family members existed prior to divergence of Phytophthora species from a common ancestor. Analysis to discriminate diversifying and purifying selection showed that all 17 ELI and ELL clades are under purifying selection. Within highly similar ELI groups there was no evidence for positively selected amino acids suggesting that purifying selection contributes to the continued existence of this diverse protein family. Characteristic cysteine spacing patterns were found for each phylogenetic clade. Except for the canonical clade ELI-1, ELIs and ELLs possess C-terminal domains of variable length, many of which have a high threonine, serine, or proline content suggesting an association with the cell wall. In addition, some ELIs and ELLs have a predicted glycosylphosphatidylinositol site suggesting anchoring of the C-terminal domain to the cell membrane. The eli and ell genes belonging to different clades are clustered in the genomes. Overall, eli and ell genes are expressed at different levels and in different life cycle stages but those sharing the same phylogenetic clade appear to have similar expression patterns. 
6.
High rate of chimeric gene origination by retroposition in plant genomes (total citations: 17; self-citations: 0; citations by others: 17)
Wang W Zheng H Fan C Li J Shi J Cai Z Zhang G Liu D Zhang J Vang S Lu Z Wong GK Long M Wang J 《The Plant cell》2006,18(8):1791-1802
Retroposition is widely found to play essential roles in origination of new mammalian and other animal genes. However, the scarcity of retrogenes in plants has led to the assumption that plant genomes rarely evolve new gene duplicates by retroposition, despite abundant retrotransposons in plants and a reported long terminal repeat (LTR) retrotransposon-mediated mechanism of retroposing cellular genes in maize (Zea mays). We show extensive retropositions in the rice (Oryza sativa) genome, with 1235 identified primary retrogenes. We identified 27 of these primary retrogenes within LTR retrotransposons, confirming a previously observed role of retroelements in generating plant retrogenes. Substitution analyses revealed that the vast majority are subject to negative selection, suggesting, along with expression data and evidence of age, that they are likely functional retrogenes. In addition, 42% of these retrosequences have recruited new exons from flanking regions, generating a large number of chimerical genes. We also identified young chimerical genes, suggesting that gene origination through retroposition is ongoing, with a rate an order of magnitude higher than the rate in primates. Finally, we observed that retropositions have followed an unexpected spatial pattern in which functional retrogenes avoid centromeric regions, while retropseudogenes are randomly distributed. These observations suggest that retroposition is an important mechanism that governs gene evolution in rice and other grass species.
7.
Viruses are a driving force of microbial evolution. Despite their importance, the evolutionary dynamics that shape diversity in viral populations are not well understood. One of the primary factors that define viral population structure is coevolution with microbial hosts. Experimental models predict that the trajectory of coevolution will be determined by the relative migration rates of viruses and their hosts; however, there are no natural microbial systems in which both have been examined. The biogeographic distribution of viruses that infect Sulfolobus islandicus is investigated using genome comparisons among four newly identified, integrated, Sulfolobus spindle-shaped viruses and previously sequenced viral strains. Core gene sequences show a biogeographic distribution where viral genomes are specifically associated with each local population. In addition, signatures of host–virus interactions recorded in the sequence-specific CRISPR (clustered regularly interspaced short palindromic repeats) system show that hosts have interacted with viral communities that are more closely related to local viral strains than to foreign ones. Together, both proviral and CRISPR sequences show a clear biogeographic structure for Sulfolobus viral populations. Our findings demonstrate that virus–microbe coevolution must be examined in a spatially explicit framework. The combination of host and virus biogeography suggests a model for viral diversification driven by host immunity and local adaptation.
8.
Complete archaeal genomes were probed for the presence of long (≥ 25 bp) oligonucleotide repeats (words). We detected the presence of many words distributed in tandem with narrow ranges of periodicity (i.e., spacer length between repeats). Similar words were not identified in genomes of non-archaeal species, namely Escherichia coli, Bacillus subtilis, Haemophilus influenzae, Mycoplasma genitalium and Mycoplasma pneumoniae. BLAST similarity searches against the GenBank nucleotide sequence database revealed that these words were archaeal species-specific, indicating that they are of a signature character. Sequence analysis and genome viewing tools showed these repeats to be restricted to non-coding regions. Thus, archaea appear to possess a non-coding genomic signature that is absent in bacterial species. The identification of a species-specific genomic signature would be of great value to archaeal genome mapping, evolutionary studies and analyses of genome complexity.
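As an illustration of the kind of search this abstract describes, the following is a minimal sketch that indexes every exact word of length k in a sequence and reports the spacer lengths between consecutive occurrences. The function names `repeated_words` and `spacer_lengths` are hypothetical, exact matching is an assumption, and the abstract does not say which tool or matching criterion the authors actually used.

```python
from collections import defaultdict
from typing import Dict, List


def repeated_words(seq: str, k: int = 25) -> Dict[str, List[int]]:
    """Map each exact k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only words seen at least twice (assumed meaning of "repeat").
    return {word: starts for word, starts in positions.items() if len(starts) > 1}


def spacer_lengths(starts: List[int], k: int) -> List[int]:
    """Gap in bases between the end of one occurrence and the start of the next.

    A negative value means two occurrences overlap.
    """
    return [b - (a + k) for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    # Toy sequence with a short word so the result is easy to read;
    # real use would keep k at 25 or more, as in the abstract.
    toy = "ACGTAC" + "TTT" + "ACGTAC" + "GGG" + "ACGTAC"
    for word, starts in repeated_words(toy, k=6).items():
        print(word, starts, spacer_lengths(starts, 6))
```

A narrow range of values returned by `spacer_lengths` for a given word would correspond to the regular periodicity the abstract mentions.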
9.
10.
The first comprehensive comparison of gene content between higher plant species provided the unexpected conclusions that rice contained about twice as many genes as Arabidopsis, and that about half of the rice genes had no obvious homologs in any other organism. Our subsequent analyses indicate that most of these "extra, novel" rice genes are mis-annotated segments of transposable elements, especially retrotransposons. Aggressive annotation of a randomly selected subset of the rice genome suggests that the gene number is less than 40000. The five fantasies of automated plant gene discovery are described and a protocol is provided to minimize (or at least predict) the inaccuracy of future plant genome annotations.
11.
Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping (total citations: 2; self-citations: 0; citations by others: 2)
We previously reported two graph algorithms for analysis of genomic information: a graph comparison algorithm to detect locally similar regions called correlated clusters and an algorithm to find a graph feature called P-quasi complete linkage. Based on these algorithms we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, the graph comparison is applied to pairwise genome comparisons, where the genome is considered as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarities are identified. In the next step, the P-quasi complete linkage analysis is applied to grouping of related clusters and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established among each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter contained at least two genes that appeared in the metabolic and regulatory pathways in the KEGG database. This collection of conserved gene clusters is used to refine and augment ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.
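The abstract does not define the completeness parameter P. One plausible reading, assumed here and not confirmed by the text, is that a group is P-quasi complete when every member is linked to at least a fraction P of the other members. The sketch below only checks that condition for a given group; it is not the authors' grouping algorithm, and the name `is_p_quasi_complete` is hypothetical.

```python
import math
from typing import Hashable, Iterable, Tuple


def is_p_quasi_complete(
    nodes: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    p: float,
) -> bool:
    """Check the assumed condition: each node links to >= p * (n - 1) other nodes."""
    node_set = set(nodes)
    if len(node_set) < 2:
        return True  # a single node has no other members to be linked to
    neighbours = {n: set() for n in node_set}
    for a, b in edges:
        # Ignore self-loops and edges that leave the group.
        if a != b and a in node_set and b in node_set:
            neighbours[a].add(b)
            neighbours[b].add(a)
    needed = math.ceil(p * (len(node_set) - 1))
    return all(len(neighbours[n]) >= needed for n in node_set)


if __name__ == "__main__":
    # Four clusters linked in a ring: every node has 2 of 3 possible links.
    ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    print(is_p_quasi_complete("ABCD", ring, 0.4))  # P = 40%, as in the abstract
    print(is_p_quasi_complete("ABCD", ring, 0.7))
```

Under this reading, P = 100% would require a fully connected group, while lower values tolerate missing links between related clusters.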
12.
13.
Identifying clusters of functionally related genes in genomes (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: An increasing body of literature shows that genomes of eukaryotes can contain clusters of functionally related genes. Most approaches to identify gene clusters utilize microarray data or metabolic pathway databases to find groups of genes on chromosomes that are linked by common attributes. A generalized method that can find gene clusters regardless of the mechanism of origin would provide researchers with an unbiased method for finding clusters and studying the evolutionary forces that give rise to them. RESULTS: We present an algorithm to identify gene clusters in eukaryotic genomes that utilizes functional categories defined in graph-based vocabularies such as the Gene Ontology (GO). Clusters identified in this manner need only have a common function and are not constrained by gene expression or other properties. We tested the algorithm by analyzing genomes of a representative set of species. We identified species-specific variation in percentage of clustered genes as well as in properties of gene clusters including size distribution and functional annotation. These properties may be diagnostic of the evolutionary forces that lead to the formation of gene clusters. AVAILABILITY: A software implementation of the algorithm and example output files are available at http://fcg.tamu.edu/C_Hunter/.
14.
Ming-Ying Leung Kwok Pui Choi Aihua Xia Louis H Y Chen 《Journal of computational biology》2005,12(3):331-354
Palindromes are symmetrical words of DNA in the sense that they read exactly the same as their reverse complementary sequences. Representing the occurrences of palindromes in a DNA molecule as points on the unit interval, the scan statistics can be used to identify regions of unusually high concentration of palindromes. These regions have been associated with the replication origins on a few herpesviruses in previous studies. However, the use of scan statistics requires the assumption that the points representing the palindromes are independently and uniformly distributed on the unit interval. In this paper, we provide a mathematical basis for this assumption by showing that in randomly generated DNA sequences, the occurrences of palindromes can be approximated by a Poisson process. An easily computable upper bound on the Wasserstein distance between the palindrome process and the Poisson process is obtained. This bound is then used as a guide to choose an optimal palindrome length in the analysis of a collection of 16 herpesvirus genomes. Regions harboring significant palindrome clusters are identified and compared to known locations of replication origins. This analysis brings out a few interesting extensions of the scan statistics that can help formulate an algorithm for more accurate prediction of replication origins.
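To make the two ingredients of this abstract concrete, here is a small sketch: it finds fixed-length words that equal their own reverse complement, then slides a window over their start positions and reports the densest one. The names `find_palindromes` and `densest_window` are hypothetical, the fixed word length and window width are assumed parameters, and no significance test is included; the paper's Poisson approximation and Wasserstein bound are not reproduced.

```python
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_palindromes(seq: str, length: int) -> List[int]:
    """Start positions of words of `length` bases equal to their reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - length + 1):
        word = seq[i:i + length]
        # Skip ambiguous bases such as N, which would otherwise match themselves.
        if set(word) <= set("ACGT") and word == reverse_complement(word):
            hits.append(i)
    return hits


def densest_window(positions: List[int], window: int) -> Tuple[int, Optional[int]]:
    """Largest number of points in any half-open window [start, start + window)."""
    pts = sorted(positions)
    best_count, best_start = 0, None
    left = 0
    for right, p in enumerate(pts):
        while p - pts[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_count, best_start = count, pts[left]
    return best_count, best_start


if __name__ == "__main__":
    dna = "GAATTCGGATCC"  # two restriction-site-style palindromes back to back
    starts = find_palindromes(dna, 6)
    print(starts, densest_window(starts, window=10))
```

The count returned by `densest_window` is the raw scan statistic; judging whether it is unusually high is where the Poisson approximation discussed in the abstract would come in.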
15.
16.
17.
Dinucleotide usage is known to vary in the genomes of organisms. The dinucleotide usage profiles or genome signatures are similar for sequence samples taken from the same genome, but are different for taxonomically distant species. This concept of genome signatures has been used to study several organisms including viruses, to elucidate the signatures of evolutionary processes at the genome level. Genome signatures assume greater importance in the case of host–pathogen interactions, where molecular interactions between the two species take place continuously, and can influence their genomic composition. In this study, analyses of whole genome sequences of HIV-1 subtype B, a retrovirus that caused the global AIDS pandemic, have been carried out to analyse the variation in genome signatures of the virus from 1983 to 2007. We show statistically significant temporal variations in some dinucleotide patterns highlighting the selective evolution of the dinucleotide profiles of HIV-1 subtype B, possibly a consequence of host specific selection.
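The abstract does not state how the dinucleotide profile is computed. A widely used definition, assumed here, is the odds ratio ρ(xy) = f(xy) / (f(x) · f(y)), where f(xy) is the observed dinucleotide frequency and f(x), f(y) are the mononucleotide frequencies. The sketch below computes this single-strand version; the function name `dinucleotide_signature` is hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict


def dinucleotide_signature(seq: str) -> Dict[str, float]:
    """Odds ratio f(xy) / (f(x) * f(y)) for all 16 dinucleotides of one strand."""
    seq = seq.upper()
    mono = Counter(base for base in seq if base in "ACGT")
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    signature = {}
    for x, y in product("ACGT", repeat=2):
        if n_di == 0 or mono[x] == 0 or mono[y] == 0:
            signature[x + y] = math.nan  # undefined when a base never occurs
            continue
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        signature[x + y] = (di[x + y] / n_di) / (f_x * f_y)
    return signature


if __name__ == "__main__":
    sig = dinucleotide_signature("ACACACAC")
    print(round(sig["AC"], 4), round(sig["CA"], 4), sig["AA"])
```

Values above 1 indicate a dinucleotide that is over-represented relative to its base composition, and values below 1 indicate under-representation; comparing such profiles across sampling years is one way the temporal variation described in the abstract could be examined.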
18.
We have extended to about 75 the number of genes mapped on the Chlamydomonas moewusii and Chlamydomonas reinhardtii chloroplast DNAs (cpDNAs) by partial sequencing of the very closely related C. eugametos and C. moewusii cpDNAs and by hybridizations with Chlamydomonas chloroplast gene-specific sequences. Only four of these genes (tscA and three reading frames) have not been identified in any other algal cpDNAs and thus may be specific to Chlamydomonas. Although the C. moewusii and C. reinhardtii cpDNAs differ by complex sequence rearrangements, 38 genes scattered throughout the genome define 12 conserved clusters of closely linked loci. Aside from the rRNA operon, four of these gene clusters share similarity to evolutionarily primitive operons found in other cpDNAs, representing in fact remnants of these operons. Our results thus indicate that most of the ancestral bacterial operons that characterize the chloroplast genome organization of land plants and early-diverging photosynthetic eukaryotes have been disrupted before the emergence of the polyphyletic genus Chlamydomonas. All gene rearrangements between the C. moewusii and C. reinhardtii cpDNAs, with the exception of those accounting for the relocations of atpA, psbI and rbcL, occurred within corresponding regions of the genome. One of these rearrangements seems to have led to disruption of the ancestral region containing rpl23, rpl2, rps19, rpl16, rpl14, rpl5, rps8 and the psaA exon 1. This gene cluster, which bears striking similarity to the Escherichia coli S10 and spc operons, spans a continuous DNA segment in C. reinhardtii, while it maps to two separate fragments in C. moewusii.
19.
One of the key challenges in computational genomics is annotating coding genes and identifying regulatory RNAs in complete genomes. This study uses the regulatory RNA locations and their conserved flanking genes identified within the genomic backbone of a template genome to search for similar RNA locations in query genomes. The search is based on the recently reported coexistence of small RNAs and their conserved flanking genes in related genomes. Based on our study, 54 additional sRNA locations and functions of 96 uncharacterized genes are predicted in two draft genomes, viz., Serratia marcescens Db1 and Yersinia enterocolitica 8081. Although most of the identified additional small RNA regions and their corresponding flanking genes are homologous in nature, the proposed anchoring technique also successfully identified four non-homologous small RNA regions in the Y. enterocolitica genome. The KEGG Orthology (KO) based automated functional predictions confirm the predicted functions of 65 flanking genes having defined KO numbers, out of the total 96 predictions made by this method. This coexistence-based method shows more sensitivity than controlled vocabularies in locating orthologous gene pairs even in the absence of defined Orthology numbers. All functional predictions made by this study in Y. enterocolitica 8081 were confirmed by the recently published complete genome sequence and annotations. This study also reports the possible regions of gene rearrangements in these two genomes; further characterization of such RNA regions could shed more light on their possible role in genome evolution.
20.
SC Parker J Gartner I Cardenas-Navia X Wei H Ozel Abaan SS Ajay NF Hansen L Song UK Bhanot JK Killian Y Gindin RL Walker PS Meltzer JC Mullikin TS Furey GE Crawford SA Rosenberg Y Samuels EH Margulies 《PLoS genetics》2012,8(8):e1002871
Much emphasis has been placed on the identification, functional characterization, and therapeutic potential of somatic variants in tumor genomes. However, the majority of somatic variants lie outside coding regions and their role in cancer progression remains to be determined. In order to establish a system to test the functional importance of non-coding somatic variants in cancer, we created a low-passage cell culture of a metastatic melanoma tumor sample. As a foundation for interpreting functional assays, we performed whole-genome sequencing and analysis of this cell culture, the metastatic tumor from which it was derived, and the patient-matched normal genomes. When comparing somatic mutations identified in the cell culture and tissue genomes, we observe concordance at the majority of single nucleotide variants, whereas copy number changes are more variable. To understand the functional impact of non-coding somatic variation, we leveraged functional data generated by the ENCODE Project Consortium. We analyzed regulatory regions derived from multiple different cell types and found that melanocyte-specific regions are among the most depleted for somatic mutation accumulation. Significant depletion in other cell types suggests the metastatic melanoma cells de-differentiated to a more basal regulatory state. Experimental identification of genome-wide regulatory sites in two different melanoma samples supports this observation. Together, these results show that mutation accumulation in metastatic melanoma is nonrandom across the genome and that a de-differentiated regulatory architecture is common among different samples. Our findings enable identification of the underlying genetic components of melanoma and define the differences between a tissue-derived tumor sample and the cell culture created from it. Such information helps establish a broader mechanistic understanding of the linkage between non-coding genomic variations and the cellular evolution of cancer.