Similar articles
Found 20 similar articles (search time: 31 ms)
1.
2.
Gene identification in novel eukaryotic genomes by self-training algorithm   Cited by: 8 (self-citations: 0, citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene finding tools tuned for previously studied species are rarely suitable for efficacious gene hunting in DNA sequences of a new genome. Gene identification methods based on cDNA and expressed sequence tag (EST) mapping to genomic DNA, or those using alignments to closely related genomes, rely either on the existence of abundant cDNA and EST data or on the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes for estimating gene model parameters. In practice, none of these types of data may be available in sufficient amounts until rather late stages of the novel genome sequencing. Nevertheless, we have shown that gene finding in eukaryotic genomes could be carried out in parallel with statistical model estimation directly from yet anonymous genomic DNA. The suggested method of parallelizing gene prediction with model parameter estimation follows the path of iterative Viterbi training. Rounds of genomic sequence labeling into coding and non-coding regions are followed by rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods where the supervised model training precedes the gene prediction step. Several novel genomes have been analyzed and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.
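
The alternation described in this abstract (label the sequence, re-estimate the model, repeat) can be illustrated with a toy example. The following is a minimal sketch, not the authors' method: it assumes a two-state model (N = non-coding, C = coding) with single-nucleotide emissions, invents all names and starting parameters for illustration, and omits the dynamically changing parameter restrictions the abstract mentions.

```python
import math

STATES = ("N", "C")   # hypothetical toy states: non-coding, coding
ALPHABET = "ACGT"

def viterbi(seq, start, trans, emit):
    """Most probable state path for seq, computed in log space."""
    score = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(trans[p][s]))
            cur[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][ch])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):          # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]

def normalize(row):
    total = sum(row.values())
    return {k: v / total for k, v in row.items()}

def reestimate(seq, path, pseudo=1.0):
    """Count transitions and emissions along one labeling (with pseudocounts)."""
    t = {p: {s: pseudo for s in STATES} for p in STATES}
    e = {s: {c: pseudo for c in ALPHABET} for s in STATES}
    for i, (ch, s) in enumerate(zip(seq, path)):
        e[s][ch] += 1
        if i > 0:
            t[path[i - 1]][s] += 1
    return ({p: normalize(t[p]) for p in STATES},
            {s: normalize(e[s]) for s in STATES})

def viterbi_training(seq, start, trans, emit, max_rounds=20):
    """Alternate labeling and re-estimation until the labeling stops changing."""
    path = None
    for _ in range(max_rounds):
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
        trans, emit = reestimate(seq, path)
    return path, trans, emit

# Toy input: a GC-rich block inside an AT-rich background (made-up data).
demo_seq = "ATATTAATAT" + "GCGGCCGCGG" + "TATAATTATA"
demo_start = {"N": 0.5, "C": 0.5}
demo_trans = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
demo_emit = {"N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
             "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}}
demo_path, fit_trans, fit_emit = viterbi_training(
    demo_seq, demo_start, demo_trans, demo_emit)
print("".join(demo_path))
```

Real gene finders use far richer models (codon structure, splice sites, length distributions); the sketch only shows why no labeled training set is needed once a rough initial model exists.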

3.
Analysis of the evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. For the species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce the mechanisms of genome rearrangement that formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units, likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed the importance of recombination involving long conserved sequences in the evolution of paralogous genes in the genome.

4.
We present the first comprehensive analysis of RNA polymerase III (Pol III) transcribed genes in ten yeast genomes. This set includes all tRNA genes (tDNA) and genes coding for SNR6 (U6), SNR52, SCR1 and RPR1 RNA in the nine hemiascomycetes Saccharomyces cerevisiae, Saccharomyces castellii, Candida glabrata, Kluyveromyces waltii, Kluyveromyces lactis, Eremothecium gossypii, Debaryomyces hansenii, Candida albicans, Yarrowia lipolytica and the archiascomycete Schizosaccharomyces pombe. We systematically analysed sequence specificities of tRNA genes, polymorphism, variability of introns, gene redundancy and gene clustering. Analysis of decoding strategies showed that yeasts close to S. cerevisiae use bacterial decoding rules to read the Leu CUN and Arg CGN codons, in contrast to all other known eukaryotes. In D. hansenii and C. albicans, we identified a novel tDNA-Leu (AAG), reading the Leu CUU/CUC/CUA codons with an unusual G at position 32. A systematic 'p-distance tree' using the 60 variable positions of the tRNA molecule revealed that most tDNAs cluster into amino acid-specific sub-trees, suggesting that, within hemiascomycetes, orthologous tDNAs are more closely related than paralogs. We finally determined the bipartite A- and B-box sequences recognized by TFIIIC. These minimal sequences are nearly conserved throughout hemiascomycetes and were satisfactorily retrieved at appropriate locations in other Pol III genes.
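
For readers unfamiliar with the 'p-distance' mentioned in this abstract: it is the proportion of aligned positions at which two sequences differ. The sketch below is an illustration only; the function name and the choice to skip gap positions are assumptions, and the abstract does not say how the authors handled gaps or which 60 positions they used.

```python
def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are ignored
    (an assumption made for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)

# Made-up aligned fragments: one mismatch out of four compared sites.
print(p_distance("ACGT", "ACGA"))
```

A matrix of such pairwise distances is the usual input to a distance-based tree-building method such as neighbor joining.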

5.
Cai J, Zhao R, Jiang H, Wang W. Genetics 2008, 179(1): 487-496
Origination of new genes is an important mechanism generating genetic novelties during the evolution of an organism. Processes of creating new genes using preexisting genes as the raw materials are well characterized, such as exon shuffling, gene duplication, retroposition, gene fusion, and fission. However, the process by which a new gene is created de novo from noncoding sequence is largely unknown. On the basis of genome comparison among yeast species, we have identified a new de novo protein-coding gene, BSC4, in Saccharomyces cerevisiae. The BSC4 gene has an open reading frame (ORF) encoding a 132-amino-acid-long peptide, while there is no homologous ORF in any of the sequenced genomes of other fungal species, including closely related species such as S. paradoxus and S. mikatae. The functional protein-coding feature of the BSC4 gene in S. cerevisiae is supported by population genetics, expression, proteomics, and synthetic lethal data. The evidence suggests that BSC4 may be involved in the DNA repair pathway during the stationary phase of S. cerevisiae and contribute to the robustness of S. cerevisiae when shifted to a nutrient-poor environment. Because the corresponding noncoding sequences in S. paradoxus, S. mikatae, and S. bayanus are also transcribed, we propose that a new de novo protein-coding gene may have evolved from a previously expressed noncoding sequence.

6.
This is the year of the chimpanzee genome. Chimpanzee chromosome 22 has been sequenced and soon will be followed by the whole genome, and thousands of chimpanzee cDNA sequences are available for comparative analysis. This genomic information allows us not only to identify human-specific changes in particular genes that are potentially under selection, but also to understand the molecular evolutionary dynamics characterizing the two most closely related mammalian genomes sequenced so far. Studies comparing gene expression in chimpanzees and other closely related primates reveal significant species differences in brain, liver and fibroblasts. New empirical data, in combination with models of speciation, are giving insight into how humans and chimpanzees speciated.

7.
Summary: Mitochondrial genomes from yeasts in the Dekkera/Brettanomyces/Eeniella group vary in size from 28 to 101 kb. Mapping of genes has shown that the three smallest genomes, of 28–42 kb, have the same gene order, whereas the three larger mitochondrial DNAs of 57–101 kb are rearranged relative to the smaller molecules and between themselves. To examine the relationships between these genomes, a phylogenetic tree has been constructed by sequence comparison of the mitochondrial-encoded cytochrome oxidase subunit 2 gene (COX2) from the six species. Contrary to expectation, the tree shows that the larger rearranged genomes are more closely related than the smaller mtDNAs. This result indicates that the gene order of the smaller mtDNAs (28–42 kb) is ancestral and that larger mtDNA molecules (57–101 kb) are more prone to rearrangement than smaller forms. Offprint requests to: G.D. Clark-Walker

8.
Ashbya gossypii is a riboflavin-overproducing filamentous fungus that is closely related to unicellular yeasts such as Saccharomyces cerevisiae. With its close ties to yeast and the ease of genetic manipulation in this fungal species, A. gossypii is well suited as a model to elucidate the regulatory networks that govern the functional differences between filamentous growth and yeast growth, especially now that the A. gossypii genome sequence has been completed. Understanding these networks could be relevant to related dimorphic yeasts such as the human fungal pathogen Candida albicans, in which a switch in morphology from the yeast to the filamentous form in response to specific environmental stimuli is important for virulence.

9.
Comparative genomics of yeast species: new insights into their biology   Cited by: 2 (self-citations: 0, citations by others: 2)
The genomes of two hemiascomycetous yeasts (Saccharomyces cerevisiae and Candida albicans) and one archiascomycete (Schizosaccharomyces pombe) have been completely sequenced and the genes have been annotated. In addition, the genomes of 13 more hemiascomycetes have been partially sequenced. The amount of data thus obtained provides information on the evolutionary relationships between yeast species. In addition, the differential genetic characteristics of the microorganisms explain a number of distinctive biological traits. Gene order conservation is observed between phylogenetically close species and is lost in distantly related species, probably due to rearrangements of short regions of DNA. However, gene function is much more conserved throughout evolution. Compared to S. cerevisiae and S. pombe, C. albicans has a larger number of specific genes, i.e., genes not found in other organisms, a fact that can account for the biological characteristics of this pathogenic dimorphic yeast, which is able to colonize a large variety of environments.

10.
We describe the sequence of a gene encoding a high molecular weight glutenin subunit (HMW-GS) expressed in the endosperm of the wheat relative Australopyrum retrofractum. Although the subunit has a primary structure similar to that of the HMW-GS present in other Triticeae species, its N-terminal domain is shorter, its central repetitive domain includes a unique dodecameric motif, and its C-terminal domain contains an extra cysteine residue. A phylogenetic analysis showed that the Glu-W1 gene is neither a true x- nor a true y-type subunit gene, although it is more closely related to the y-type genes present in the K and E genomes than to any other published HMW-GS gene. All these results indicate that this novel subunit may have undergone a special evolutionary process different from that in other Triticeae species. A flour supplementation experiment showed that the Glu-W1 subunit has a negative effect on dough quality, which might be the result of interaction between the two closely placed cysteine residues in the C-terminal region.

11.
The biochemical characterization of sugar uptake in yeasts started five decades ago and led to the early production of abundant kinetic and mechanistic data. However, the first accurate overview of the underlying sugar transporter genes was obtained relatively late, due mainly to the genetic complexity of hexose uptake in the model yeast Saccharomyces cerevisiae. The genomic era in turn generated a massive amount of information, allowing the identification of a multitude of putative sugar transporter and sensor-encoding genes in yeast genomes, many of which are phylogenetically related. This review aims to briefly summarize our current knowledge of the biochemical and molecular features of the transporters of hexoses and pentoses in yeasts, where possible establishing links between previous kinetic studies and the genomic data currently available. Emphasis is given to recent developments concerning the identification of D-xylose and L-arabinose transporter genes, which are thought to be key players in the optimization of S. cerevisiae strains for bioethanol production from lignocellulose hydrolysates.

12.
Investigations into the phylogenetics of closely related animal species are dominated by the use of mitochondrial DNA (mtDNA) sequence data. However, the near-ubiquitous use of mtDNA to infer phylogeny among closely related animal lineages is tempered by an increasing number of studies that document high rates of transfer of mtDNA genomes among closely related species through hybridization, leading to substantial discordance between phylogenies inferred from mtDNA and nuclear gene sequences. In addition, the recent development of methods that simultaneously infer a species phylogeny and estimate divergence times, while accounting for incongruence among individual gene trees, has ushered in a new era in the investigation of phylogeny among closely related species. In this study we assess whether DNA sequence data sampled from a modest number of nuclear genes can resolve relationships of a species-rich clade of North American freshwater teleost fishes, the darters. We articulate and expand on a recently introduced method to infer a time-calibrated multi-species coalescent phylogeny using the computer program *BEAST. Our analyses result in a well-resolved and strongly supported time-calibrated darter species tree. Contrary to the expectation that mtDNA will provide greater phylogenetic resolution than nuclear gene data, the darter species tree inferred exclusively from nuclear genes exhibits a higher frequency of strongly supported nodes than the mtDNA time-calibrated gene tree.

13.
The pentatricopeptide repeat (PPR) gene family, with hundreds of members in land plant genomes, has been recognized as a tremendous resource for plant phylogenetic studies based on publicly available genomic data from model organisms. However, whether this appealing nuclear gene marker system can be readily applied to non-model organisms remains questionable, particularly given the potential uncertainties in designing specific primers to only amplify the locus of interest from the sea of PPR genes. Here we demonstrate empirically the use of PPR genes in the family Verbenaceae and the Verbena complex. We also lay out a general scheme to design locus-specific primers to amplify and sequence PPR genes in non-model organisms. Intergeneric relationships within the family Verbenaceae were fully resolved with strong support. Relationships among the closely related genera within the Verbena complex and among some species groups within each genus were also well resolved, but resolution among very closely related species was limited. Our results suggest that PPR genes can be readily employed in non-model organisms. They may be best used to resolve relationships in a spectrum from among distantly related genera to among not-so-closely related congeneric species, but may have limited use among very closely related species.

14.
GH Liu, SY Wang, WY Huang, GH Zhao, SJ Wei, HQ Song, MJ Xu, RQ Lin, DH Zhou, XQ Zhu. PLoS ONE 2012, 7(7): e42172
Complete mitochondrial (mt) genomes and their gene rearrangements are increasingly used as molecular markers for investigating phylogenetic relationships. Contributing to the complete mt genomes of Gastropoda, especially Pulmonata, we determined the mt genome of the freshwater snail Galba pervia, which is an important intermediate host for Fasciola spp. in China. The complete mt genome of G. pervia is 13,768 bp in length. Its genome is circular and consists of 37 genes, including 13 genes for proteins, 2 genes for rRNA and 22 genes for tRNA. The mt gene order of G. pervia showed a novel arrangement (tRNA-His, tRNA-Gly and tRNA-Tyr change positions and directions) when compared with mt genomes of Pulmonata species sequenced to date, indicating divergence among different species within the Pulmonata. A total of 3655 amino acids were deduced to be encoded by the 13 protein-coding genes. The most frequently used amino acid is Leu (15.05%), followed by Phe (11.24%), Ser (10.76%) and Ile (8.346%). Phylogenetic analyses using the concatenated amino acid sequences of the 13 protein-coding genes, with three different computational algorithms (maximum parsimony, maximum likelihood and Bayesian analysis), all revealed that the families Lymnaeidae and Planorbidae are two closely related snail families, consistent with previous classifications based on morphological and molecular studies. The complete mt genome sequence of G. pervia showed a novel gene arrangement, and it represents the first sequenced high-quality mt genome of the family Lymnaeidae. These novel mtDNA data provide additional genetic markers for studying the epidemiology, population genetics and phylogeographics of freshwater snails, as well as for understanding the interplay between the intermediate snail hosts and the intra-mollusca stages of Fasciola spp.

15.
We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the understanding of the genomic basis of Brucella pathogenicity and host specificity.

16.
We studied the 3D structural organization of the fission yeast genome, which emerges from the tethering of heterochromatic regions in otherwise randomly configured chromosomes represented as flexible polymer chains in a nuclear environment. This model is sufficient to explain in a statistical manner many experimentally determined distinctive features of the fission yeast genome, including chromatin interaction patterns from Hi-C experiments and the co-locations of functionally related and co-expressed genes, such as genes expressed by Pol-III. Our findings demonstrate that some previously described structure-function correlations can be explained as a consequence of random chromatin collisions driven by a few geometric constraints (mainly due to centromere-SPB and telomere-NE tethering) combined with the specific gene locations in the chromosome sequence. We also performed a comparative analysis between the fission and budding yeast genome structures, for which we previously detected a similar organizing principle. However, due to the different chromosome sizes and numbers, substantial differences are observed in the 3D structural genome organization between the two species, most notably in the nuclear locations of orthologous genes, and the extent of nuclear territories for genes and chromosomes. However, despite those differences, remarkably, functional similarities are maintained, which is evident when comparing spatial clustering of functionally related genes in both yeasts. Functionally related genes show a similar spatial clustering behavior in both yeasts, even though their nuclear locations are largely different between the yeast species.

17.
The Saccharomyces cerevisiae genome sequence, augmented by new data on gene expression and function, continues to yield new findings about eukaryote genome evolution. Analysis of the duplicate gene pairs formed by whole-genome duplication indicates that selection for increased levels of gene expression was a significant factor determining which genes were retained as duplicates and which were returned to a single-copy state, possibly in addition to selection for novel gene functions. Proteome comparisons between worm and yeast show that genes for core metabolic processes are shared among eukaryotes and unchanging in function, while comparisons between different yeast species identify 'orphan' genes as the most rapidly evolving fraction of the proteome. Natural hybridisation among yeast species is frequent, but its long-term evolutionary significance is unknown.

18.
DNA-DNA hybridization has been established as an important technology in bacterial species taxonomy and phylogenetic analysis. In this study, we analyzed how the efficiency with which the genomic DNA from one species hybridizes to the genomic DNA of another species (DNA-DNA hybridization) in microarray analysis relates to the similarity between two genomes. We found that the predicted DNA-DNA hybridization based on genome sequence similarity correlated well with the experimentally determined microarray hybridization. Between closely related strains, significant numbers of highly divergent genes (<55% identity) and/or the accumulation of mismatches between conserved genes lowered the DNA-DNA hybridization signal, and this reduced the hybridization signals to below 70% even for bacterial strains with over 97% 16S rRNA gene identity. In addition, our results suggest that a DNA-DNA hybridization signal intensity of over 40% indicates that two genomes share at least 30% conserved genes (>60% gene identity). This study may expand our knowledge of DNA-DNA hybridization based on genomic sequence similarity comparison and further provide insights for bacterial phylogeny analyses.

19.
Mitochondrial genomes are useful tools for inferring evolutionary history. However, many taxa are poorly represented by available data. Thus, to further understand the phylogenetic potential of complete mitochondrial genome sequence data in Annelida (segmented worms), we examined the complete mitochondrial sequence for Clymenella torquata (Maldanidae) and an estimated 80% of the sequence of Riftia pachyptila (Siboglinidae). These genomes have remarkably similar gene orders to previously published annelid genomes, suggesting that gene order is conserved across annelids. This result is interesting, given the high variation seen in the closely related Mollusca and Brachiopoda. Phylogenetic analyses of DNA sequence, amino acid sequence, and gene order all support the recent hypothesis that Sipuncula and Annelida are closely related. Our findings suggest that gene order data is of limited utility in annelids but that sequence data holds promise. Additionally, these genomes show AT bias (approximately 66%) and codon usage biases but have a typical gene complement for bilaterian mitochondrial genomes.

20.
During alcoholic fermentations, yeast cells are subjected to several stress conditions, and yeasts have therefore developed molecular mechanisms to resist these adverse conditions. The mechanisms involved in the stress response have been studied in Saccharomyces cerevisiae laboratory strains. However, a better understanding of these mechanisms in wine yeasts could open the possibility of improving the fermentation process. In this work, an analysis of the stress response in three wine yeasts has been carried out by studying the expression of several representative genes under several stress conditions that occur during fermentation. We propose a simplified method to study how these stress conditions affect the viability of yeast cells. Using this approach, an inverse correlation between stress resistance and stuck fermentations has been found. We also have preliminary data about the use of the HSP12 gene as a molecular marker for stress resistance in wine yeasts.
