Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
The use and development of post-genomic tools naturally depend on large-scale genome sequencing projects. The usefulness of post-genomic applications depends on the accuracy of genome annotations, for which the correct identification of intron-exon borders in complex genomes of eukaryotic organisms is often an error-prone task. Although automated algorithms for predicting intron-exon structures are available, supporting exon evidence is necessary to achieve comprehensive genome annotation. Besides cDNA and EST support, peptides identified via MS/MS can be used as extrinsic evidence in a proteogenomic approach. We describe an improved version of the Genomic Peptide Finder (GPF), which aligns de novo predicted amino acid sequences to the genomic DNA sequence of an organism while correcting for peptide sequencing errors and accounting for the possibility of splicing. We have coupled GPF and the gene finding program AUGUSTUS in a way that provides automatic structural annotations of the Chlamydomonas reinhardtii genome, using highly unbiased GPF evidence. A comparison of the AUGUSTUS gene set incorporating GPF evidence to the standard JGI FM4 (Filtered Models 4) gene set reveals 932 GPF peptides that are not contained in the Filtered Models 4 gene set. Furthermore, the GPF evidence improved the AUGUSTUS gene models by altering 65 gene models and adding three previously unidentified genes.

2.
3.
In susceptible strains of mice, leukemia is caused by the somatic integration of murine leukemia retroviruses into the host genome. Integration sites that are common to several tumors are likely to affect genes that are important in oncogenesis. Here we present the analysis of a common site of retroviral integration on mouse chromosome 15, which includes the genomic structure of three genes near the integration site. One of the genes misexpressed at the insertion site has recently been characterized as a B-cell receptor, Tnfrsf13c (formerly Baffr), indicating that this approach is useful in defining genes that function in lymphocyte development and tumor progression. Current genome databases provide powerful resources for the rapid identification of genes at common proviral insertion sites. The characterization of these genes in tumor samples will allow a function to be assigned to many novel loci identified by the genome sequencing projects.

4.
Gene identification in novel eukaryotic genomes by self-training algorithm   Cited by: 8 (self-citations: 0, citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene finding tools tuned for previously studied species are rarely suitable for efficacious gene hunting in DNA sequences of a new genome. Gene identification methods based on cDNA and expressed sequence tag (EST) mapping to genomic DNA, or those using alignments to closely related genomes, rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes for estimating gene model parameters. In practice, none of these types of data may be available in sufficient amounts until rather late stages of the sequencing of a novel genome. Nevertheless, we have shown that gene finding in eukaryotic genomes can be carried out in parallel with the estimation of statistical models directly from as yet anonymous genomic DNA. The suggested method of running gene prediction in parallel with model parameter estimation follows the path of iterative Viterbi training. Rounds of labeling genomic sequence into coding and non-coding regions are followed by rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training precedes the gene prediction step. Several novel genomes have been analyzed and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.

5.
There are ∼1.4 million organisms on this planet that have been described morphologically but there is no comparable coverage of biodiversity at the molecular level. Little more than 1% of the known species have been subject to any molecular scrutiny and eukaryotic genome projects have focused on a group of closely related model organisms. The past year, however, has seen an ∼80% increase in the number of species represented in sequence databases and the completion of the sequencing of three prokaryotic genomes. Large-scale sequencing projects seem set to begin coverage of a wider range of the eukaryotic diversity, including green plants, microsporidians and diplomonads.

6.
Impact of genomics on microbial food safety   Cited by: 3 (self-citations: 0, citations by others: 3)
Genome sequences are now available for many of the microbes that cause food-borne diseases. The information contained in pathogen genome sequences, together with the development of themed and whole-genome DNA microarrays and improved proteomics techniques, might provide tools for the rapid detection and identification of such organisms, for assessing their biological diversity and for understanding their ability to respond to stress. The genomic information also provides insight into the metabolic capacity and versatility of microbes; for example, specific metabolic pathways might contribute to the growth and survival of pathogens in a range of niches, such as food-processing environments and the human host. New concepts are emerging about how pathogens function, both within foods and in interactions with the host. The future should bring the first practical benefits of genome sequencing to the field of microbial food safety, including strategies and tools for the identification and control of emerging pathogens.

7.
The expansion of genome sequencing projects has produced accumulating evidence for lateral transfer of genes between prokaryotic and eukaryotic genomes. However, it remains controversial whether these genes are of functional importance in their recipient host. Nikoh and Nakabachi, in a recent paper in BMC Biology, take a first step and show that two genes of bacterial origin are highly expressed in the pea aphid Acyrthosiphon pisum. Active gene expression of transferred genes is supported by three other recent studies. Future studies should reveal whether functional proteins are produced and whether and how these are targeted to the appropriate compartment. We argue that the transfer of genes between host and symbiont may occasionally be of great evolutionary importance, particularly in the evolution of the symbiotic interaction itself.

8.
Assessment of phylogenetic positions of predicted gene and protein sequences is a routine step in any genome project, useful for validating the species' taxonomic position and for evaluating hypotheses about genome evolution and function. Several recent eukaryotic genome projects have reported multiple gene sequences that were much more similar to homologues in bacteria than to any eukaryotic sequence. In the spirit of the times, horizontal gene transfer from bacteria to eukaryotes has been invoked in some of these cases. Here, we show, using comparative sequence analysis, that some of those bacteria-like genes indeed appear likely to have been horizontally transferred from bacteria to eukaryotes. In other cases, however, the evidence strongly indicates that the eukaryotic DNA sequenced in the genome project contains a sample of non-integrated DNA from the actual bacteria, possibly providing a window into the host microbiome. Recent literature also suggests that common reagents, kits and laboratory equipment may be systematically contaminated with bacterial DNA, which appears to be sampled by metagenome projects non-specifically. We review several bioinformatic criteria that help to distinguish putative horizontal gene transfers from the admixture of genes from autonomously replicating bacteria in their hosts' genome databases or from reagent contamination.

9.
Stevens TJ, Arkin IT. Proteins 2000, 39(4):417-420
One may speculate that higher organisms require a proportionately greater abundance of membrane proteins within their genomes in order to furnish the requirements of differentiated cell types, compartmentalization, and intercellular signalling. With the recent availability of several complete prokaryotic genome sequences and sufficient progress in many eukaryotic genome sequencing projects, we seek to test this hypothesis. Using optimized hydropathy analysis of proteins in several diverse proteomes, we show that organisms of the three domains of life (Eukarya, Eubacteria, and Archaea) have similar proportions of alpha-helical membrane proteins within their genomes and that these are matched by the complexity of the aqueous components.

10.
The human genome initiative has provided the motivating force for launching sequencing projects suitable for testing various DNA-sequencing strategies, as well as motivating the development of mapping and sequencing technologies. In addition to projects targeting selected regions of the human genome, other projects are based on model organisms such as yeast, nematode and mouse. The sequencing of homologous regions of human and mouse genomes is a new approach to genome analysis, and is providing insights into gene evolution, function and regulation which could not be determined so easily from the analysis of just one species.

11.
Liu H, Fu Y, Xie J, Cheng J, Ghabrial SA, Li G, Peng Y, Yi X, Jiang D. Journal of Virology 2011, 85(19):9863-9876
Parvoviruses infect humans and a broad range of animals, from mammals to crustaceans, and generally are associated with a variety of acute and chronic diseases. However, many others cause persistent infections and are not known to be associated with any disease. Viral persistence is likely related to the ability to integrate into the chromosomal DNA and to establish a latent infection. However, there is little evidence for genome integration of parvoviral DNA except for Adeno-associated virus (AAV). Here we performed a systematic search for homologs of parvoviral proteins in publicly available eukaryotic genome databases followed by experimental verification and phylogenetic analysis. We conclude that parvoviruses have frequently invaded the germ lines of diverse animal species, including mammals, fishes, birds, tunicates, arthropods, and flatworms. The identification of orthologous endogenous parvovirus sequences in the genomes of humans and other mammals suggests that parvoviruses have coexisted with mammals for at least 98 million years. Furthermore, some of the endogenized parvoviral genes were expressed in eukaryotic organisms, suggesting that these viral genes are also functional in the host genomes. Our findings may provide novel insights into parvovirus biology, host interactions, and evolution.

12.
Identifying bacterial genes and endosymbiont DNA with Glimmer   Cited by: 11 (self-citations: 0, citations by others: 11)
MOTIVATION: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host. RESULTS: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella. AVAILABILITY: Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer.

13.
14.
BLAST (Basic Local Alignment Search Tool) searches against DNA and protein sequence databases have become an indispensable tool for biomedical research. The proliferation of genome sequencing projects is steadily increasing the fraction of genome-derived sequences in the public databases and their importance as a public resource. We report here the availability of Genomic BLAST, a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences. This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes. The organisms for such a database can be selected using either a graphic taxonomy-based tree or an alphabetical list of organism-specific sequences. The first option is designed to help explore the evolutionary relationships among organisms within a certain taxonomy group when performing BLAST searches. The use of an alphabetical list allows the user to perform a more elaborate set of selections, assembling any given number of organism-specific databases from unfinished or complete genomes. This tool, available at the NCBI web site http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi, currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes.

15.
Genome compaction and stability in microsporidian intracellular parasites   Cited by: 13 (self-citations: 0, citations by others: 13)
Microsporidian genomes are extraordinary among eukaryotes for their extreme reduction: although they are similar in form to other eukaryotic genomes, they are typically smaller than many prokaryotic genomes. At the same time, their rates of sequence evolution are among the highest for eukaryotic organisms. To explore the effects of compaction on nuclear genome evolution, we sequenced 685,000 bp of the Antonospora locustae genome (formerly Nosema locustae) and compared its organization with the recently completed genome of the human parasite Encephalitozoon cuniculi. Despite being very distantly related, the genomes of these two microsporidian species have retained an unexpected degree of synteny: 13% of genes are in the same context, and 30% of the genes were separated by a small number of short rearrangements. Microsporidian genomes are, therefore, paradoxically composed of rapidly evolving sequences harbored within a slowly evolving genome, although these two processes are sometimes considered to be coupled. Microsporidian genomes show that eukaryotic genomes (like genes) do not evolve in a clock-like fashion, and genome stability may result from compaction in addition to a lack of recombination, as has been traditionally thought to occur in bacterial and organelle genomes.

16.
The budding yeast, Saccharomyces cerevisiae, is an excellent model system for the study of DNA polymerases and their roles in DNA replication, repair, and recombination. Presently ten DNA polymerases have been purified and characterized from S. cerevisiae. Rapid advances in genome sequencing projects for yeast and other organisms have greatly facilitated and accelerated the identification of yeast enzymes and their homologues in other eukaryotic species. This article reviews currently available research on yeast DNA polymerases and their functional roles in DNA metabolism. Relevant information about eukaryotic homologues of these enzymes will also be discussed.

17.
18.
Heterodera glycines, the soybean cyst nematode (SCN), is a damaging agricultural pest that could be effectively managed if critical phenotypes, such as virulence and host range, could be understood. While SCN is amenable to genetic analysis, lack of DNA sequence data prevents the use of such methods to study this pathogen. Fortunately, new methods of DNA sequencing that produce large amounts of data and permit whole genome comparative analyses have become available. In this study, 400 million bases of genomic DNA sequence were collected from two inbred biotypes of SCN using 454 micro-bead DNA sequencing. Comparisons to a BAC, sequenced by Sanger sequencing, showed that the micro-bead sequences could identify low and high copy number regions within the BAC. Potential single nucleotide polymorphisms (SNPs) between the two SCN biotypes were identified by comparing the two sets of sequences. Selected resequencing revealed that up to 84% of the SNPs were correct. We conclude that the quality of the micro-bead sequence data was sufficient for de novo SNP identification and should be applicable to organisms with similar genome sizes and complexities. The SNPs identified will be an important starting point in associating phenotypes with specific regions of the SCN genome.

19.
Eukaryotes arose from an endosymbiotic association of an alpha-proteobacterium-like organism (the ancestor of mitochondria) with a host cell (lacking mitochondria or plastids). Plants arose by the addition of a cyanobacterium-like endosymbiont (the ancestor of plastids) to the two-member association. Each member of the association brought a unique internal environment and a unique genome. Analyses of recently acquired genomic sequences with newly developed algorithms have revealed (a) that the number of endosymbiont genes that remain in eukaryotic cells (principally in the nucleus) is surprisingly large, (b) that protein products of a large number of genes (or their descendants) that entered the association in the genome of the host are now directed to an organelle derived from an endosymbiont, and (c) that protein products of genes traceable to endosymbiont genomes are directed to the nucleo-cytoplasmic compartment. Consideration of these remarkable findings has led to the present suggestion that contemporary eukaryotic cells evolved through continual chance relocation and testing of genes as well as combinations of gene products and biochemical processes in each unique cell compartment derived from a member of the eukaryotic association. Most of these events occurred during the 300 million years or so before contemporary forms of eukaryotic cells appear in the fossil record; they continue today.

20.
Gene-array technologies have been applied in a wide range of organisms to study gene expression profiles under several physiological and experimental conditions. Gene-array implementations combined with the information arising from emerging genome sequencing projects are expected to become a major tool for characterizing genes involved in different processes in the near future. So far, gene expression profile studies in trypanosomatids have been performed in microarrays that use a glass support to immobilize fragments of genomic DNA followed by fluorescent detection. Here, we wanted to test the potential of genomic DNA macroarrays of Leishmania infantum using nylon membranes and radioactive detection. Nylon macroarrays present a number of advantages since the processing of the membranes is based on standard Southern blotting protocols familiar to molecular biologists, and the data acquisition equipment is available to most research institutions. Nylon macroarrays were employed to search for genes showing increased mRNA abundance during an axenic differentiation of L. infantum promastigotes to amastigotes. Several clones were rescued and, after validation by Northern blot assays, these L. infantum sequences were used to screen the Leishmania major gene database. The L. major contigs with high homology to the L. infantum sequences allowed a consistent identification of the regulated genes.
