Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
The ultimate goal of functional genomics is to define the function of all the genes in the genome of an organism. A large body of information on the biological roles of genes has been accumulated and aggregated in the past decades of research, both from traditional experiments detailing the role of individual genes and proteins, and from newer experimental strategies that aim to characterize gene function on a genomic scale. It is clear that the goal of functional genomics can only be achieved by integrating information and data sources from the variety of these different experiments. Integration of different data is thus an important challenge for bioinformatics. The integration of different data sources often helps to uncover non-obvious relationships between genes, but there are also two further benefits. First, it is likely that whenever information from multiple independent sources agrees, it should be more valid and reliable. Second, by looking at the union of multiple sources, one can cover larger parts of the genome. This is obvious for integrating results from multiple single gene or protein experiments, but also necessary for many of the results from genome-wide experiments since they are often confined to certain (although sizable) subsets of the genome. In this paper, we explore an example of such a data integration procedure. We focus on the prediction of membership in protein complexes for individual genes. For this, we recruit six different data sources that include expression profiles, interaction data, essentiality and localization information. Each of these data sources individually contains some weakly predictive information with respect to protein complexes, but we show how this prediction can be improved by combining all of them. Supplementary information is available at http://bioinfo.mbb.yale.edu/integrate/interactions/. Abbreviations: TP: true positive; TN: true negative; FP: false positive; FN: false negative; Y2H: yeast two-hybrid.
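One common way to combine several weakly predictive sources is to multiply their likelihood ratios under a naive conditional-independence assumption. The sketch below illustrates that idea together with the TP/TN/FP/FN counts named in the abstract. The abstract does not say which integration method the authors used, so the function names, the independence assumption, and the example numbers are all hypothetical.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with per-source likelihood ratios.

    Assumes the sources are conditionally independent (naive Bayes),
    which is an illustrative assumption, not the paper's stated method.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


def confusion_counts(predicted, actual):
    """Return (TP, TN, FP, FN) for paired boolean predictions and labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    return tp, tn, fp, fn


if __name__ == "__main__":
    # Two hypothetical sources, each only weakly informative on its own.
    print(posterior_probability(0.5, [2.0, 3.0]))
    print(confusion_counts([True, True, False, False],
                           [True, False, True, False]))
```

Each likelihood ratio above 1 shifts the odds toward complex membership, so two modest sources that agree yield a stronger prediction than either alone.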

2.
Kwon KH  Kim M  Kim JY  Kim KW  Kim SI  Park YM  Yoo JS 《Proteomics》2003,3(12):2305-2309
We compared peptide identification by database (DB) search methods with de novo sequencing results for a proteomics study in an organism without genome sequence information. When the former was done by searching the Expressed Sequence Tag (EST) DB of the sample organism or the NCBI nonredundant (nr) protein DB of green plants using either the MASCOT or SEQUEST software program, it was confirmed that the former is as accurate as the latter. Twice as many peptides were identified from the EST DB as from the nr protein DB, in spite of the fact that the EST DB has less data (26 222 ESTs) than the NCBI nr protein DB (224 238). This study demonstrates that EST DB with tandem mass spectra can be used reliably for high-throughput proteomics studies in an organism without genome information.

3.
It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: “What have we learned from this vast amount of new genomic data?” Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity, even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism is important for interpretation of the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information.

4.
Complete genome doubling has long-term consequences for the genome structure and the subsequent evolution of an organism. It has been suggested that two genome duplications occurred at the origin of vertebrates (known as the 2R hypothesis). However, there has been considerable debate as to whether these were two successive duplications, or whether a single duplication occurred, followed by large-scale segmental duplications. In this article, we review and compare the evidence for the 2R duplications from vertebrate genomes with similar data from other more recent polyploids.

5.

Background

Three complete genomes of Prochlorococcus species, the smallest and most abundant photosynthetic organism in the ocean, have recently been published. Comparative genome analyses reveal that genome shrinkage has occurred within this genus, associated with a sharp reduction in G+C content. As all examples of genome reduction characterized so far have been restricted to endosymbionts or pathogens, with a host-dependent lifestyle, the observed genome reduction in Prochlorococcus is the first documented example of such a process in a free-living organism.

Results

Our results clearly indicate that genome reduction has been accompanied by an increased rate of protein evolution in P. marinus SS120 that is even more pronounced in P. marinus MED4. This acceleration has affected every functional category of protein-coding genes. In contrast, the 16S rRNA gene seems to have evolved clock-like in this genus. We observed that MED4 and SS120 have lost several DNA-repair genes, the absence of which could be related to the mutational bias and the acceleration of amino-acid substitution.

Conclusions

We have examined the evolutionary mechanisms involved in this process, which are different from those known from host-dependent organisms. Indeed, most substitutions that have occurred in Prochlorococcus have to be selectively neutral, as the large size of populations imposes low genetic drift and strong purifying selection. We assume that the major driving force behind genome reduction within the Prochlorococcus radiation has been a selective process favoring the adaptation of this organism to its environment. A scenario is proposed for genome evolution in this genus.

6.
We have sequenced and characterized the complete mitochondrial genome of the sea slug, Aplysia californica, an important model organism in experimental biology and a representative of Anaspidea (Opisthobranchia, Gastropoda). The mitochondrial genome of Aplysia is at the small end of the observed sizes of animal mitochondrial genomes (14,117 bp, NCBI Accession No. NC_005827). The Aplysia genome, like most other mitochondrial genomes, encodes genes for 2 ribosomal subunit RNAs (small and large rRNAs), 22 tRNAs, and 13 protein subunits (cytochrome c oxidase subunits 1-3, cytochrome b apoenzyme, ATP synthase subunits 6 and 8, and NADH dehydrogenase subunits 1-6 and 4L). The gene order is virtually identical between opisthobranchs and pulmonates, with the majority of differences arising from tRNA translocations. In contrast, the gene order from representatives of basal gastropods and other molluscan classes is significantly different from opisthobranchs and pulmonates. The Aplysia genome was compared to all other published molluscan mitochondrial genomes and phylogenetic analyses were carried out using a concatenated protein alignment. Phylogenetic analyses using maximum likelihood based analyses of the well aligned regions of the protein sequences support both monophyly of Euthyneura (a group including both the pulmonates and opisthobranchs) and Opisthobranchia (as a more derived group). The Aplysia mitochondrial genome sequenced here will serve as an important platform in both comparative and neurobiological studies using this model organism.

7.
MOTIVATION: Since the newly developed Grid platform has been considered a powerful tool to share resources in the Internet environment, it is of interest to demonstrate an efficient methodology to process massive biological data on Grid environments at low cost. This paper presents an efficient and economical method based on a Grid platform to predict secondary structures of all proteins in a given organism, which normally requires a long computation time through sequential execution, by means of processing a large amount of protein sequence data simultaneously. From the prediction results, a genome scale protein fold space can be pursued. RESULTS: Using the improved Grid platform, the secondary structure prediction on genomic scale and protein topology derived from the new scoring scheme for four different model proteomes were presented. This protein fold space was compared with structures from the Protein Data Bank database, and it showed a similarly aligned distribution. Therefore, the fold space approach based on this new scoring scheme could be a guideline for predicting a folding family in a given organism.

8.
Comparisons of Two Large Phaeoviral Genomes and Evolutionary Implications   Cited by: 1 (self-citations: 0; citations by others: 1)
The evolution of viral genomes has recently attracted considerable attention. We compare the sequences of two large viral genomes, EsV-1 and FirrV-1, belonging to the family of phaeoviruses which infect different species of marine brown algae. Although their genomes differ substantially in size, these viruses share similar morphologies and similar latent infection cycles. In fact, sequence comparisons show that the viruses have more than 60% of their genes in common. However, the order of genes is completely different in the two genomes, suggesting that extensive recombinational events in addition to several large deletions occurred during the separate evolutionary routes from a common ancestor. We investigated genes encoding components of signal transduction pathways and genes encoding replicative functions in more detail. We found that the two genomes possess different, although overlapping, sets of genes in both classes, suggesting that different genes from each class were lost, perhaps randomly, after the separate evolution from an ancestral genome. Random loss would also account for the fact that more than one-third of the genes in one viral genome have no counterparts in the other genome. We speculate that the ancestral genome belonged to a cellular organism that had once invaded a primordial brown algal host.

9.
Rao Y  Wu G  Wang Z  Chai X  Nie Q  Zhang X 《DNA research》2011,18(6):499-512
Synonymous codons are used with different frequencies both among species and among genes within the same genome and are controlled by neutral processes (such as mutation and drift) as well as by selection. Up to now, a systematic examination of the codon usage for the chicken genome has not been performed. Here, we carried out a whole genome analysis of the chicken genome by the use of the relative synonymous codon usage (RSCU) method and identified 11 putative optimal codons, all of them ending with uracil (U), which departs significantly from the pattern observed in other eukaryotes. Optimal codons in the chicken genome are most likely the ones corresponding to highly expressed transfer RNAs (tRNAs) or tRNA gene copy numbers in the cell. Codon bias, measured as the frequency of optimal codons (Fop), is negatively correlated with the G + C content and recombination rate, but positively correlated with gene expression, protein length, gene length and intron length. The positive correlation between codon bias and protein, gene and intron length is quite different from other multi-cellular organisms, as this trend has been only found in unicellular organisms. Our data showed that regional G + C content explains a large proportion of the variance of codon bias in chicken. Stepwise selection model analyses indicate that G + C content of coding sequence is the most important factor for codon bias. It appears that variation in the G + C content of CDSs accounts for over 60% of the variation of codon bias. This study suggests that both mutation bias and selection contribute to codon bias. However, mutation bias is the driving force of the codon usage in the Gallus gallus genome. Our data also provide evidence that the negative correlation between codon bias and recombination rates in G. gallus is determined mostly by recombination-dependent mutational patterns.
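The RSCU statistic named in the abstract above is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a single in-frame coding sequence, assuming the standard genetic code; the function name `rscu` and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame DNA or RNA coding sequence.

    RSCU = codon count * number of synonymous codons / total count for
    that amino acid. Amino acids absent from the sequence are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    values = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons are not scored
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    scores = rscu("GCTGCTGCTGCC")  # four alanine codons
    print(scores["GCT"], scores["GCC"], scores["GCA"])
```

A value above 1 means the codon is used more often than uniform synonymous usage would predict, and a value below 1 means it is used less often.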

10.
11.
The onset of the genome era means different things to different people, but it is clear that this new age brings with it paradigm shifts that will forever affect biological research. Less clear is just how these shifts are changing the scope and scale of research. Are gigabases of raw data more useful than a single well-understood gene? Do we really need a full genome to understand the physiology of a single organism? The photosynthetic field is poised at the periphery of the bulk of genome sequencing work--understandably skewed toward health-related disciplines--and, as such, is subject to different motivations, limitations, and primary focus for each new genome. To understand some of these differences, we focus here on various indicators of the impact that genomics has had on the photosynthetic community, now a full decade since the publication of the first photosynthetic genome. Many useful indicators are indexed in public databases, providing pre- and post-genome sequence snapshots of changes in factors such as publication rate, number of proteins characterized, and sequenced genome coverage versus known diversity. As more genomes are sequenced and metagenomic projects begin to pour out billions of bases, it becomes crucial to understand how to harness this data in order to accumulate possible benefits and avoid possible pitfalls, especially as resources become increasingly directed toward natural environments governed by photosynthetic activity, ranging from hot springs to tropical forest ecosystems to the open ocean.

12.
Analysing proteomic data   Cited by: 5 (self-citations: 0; citations by others: 5)
The rapid growth of proteomics has been made possible by the development of reproducible 2D gels and biological mass spectrometry. However, despite technical improvements 2D gels are still less than perfectly reproducible and gels have to be aligned so spots for identical proteins appear in the same place. Gels can be warped by a variety of techniques to make them concordant. When gels are manipulated to improve registration, information is lost, so direct methods for gel registration which make use of all available data for spot matching are preferable to indirect ones. In order to identify proteins from gel spots, a property or combination of properties that is unique to that protein is required. These can then be used to search databases for possible matches. Molecular mass, pI, amino acid composition and short sequence tags can all be used in database searches. Currently the method of choice for protein identification is mass spectrometry. Proteins are eluted from the gels and cleaved with specific endoproteases to produce a series of peptides of different molecular mass. In peptide mass fingerprinting, the peptide profile of the unknown protein is compared with theoretical peptide libraries generated from sequences in the different databases. Tandem mass spectrometry (MS/MS) generates short amino acid sequence tags for the individual peptides. These partial sequences combined with the original peptide masses are then used for database searching, greatly improving specificity. Increasingly protein identification from MS/MS data is being fully or partially automated. When working with organisms that do not have sequenced genomes (the case with most helminths), protein identification by database searching becomes problematic. A number of approaches to cross species protein identification have been suggested, but if the organism being studied is only distantly related to any organism with a sequenced genome then the likelihood of protein identification remains small. The dynamic nature of the proteome means that there really is no such thing as a single representative proteome and a complete set of metadata (data about the data) is going to be required if the full potential of database mining is to be realised in the future.
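The peptide mass fingerprinting step described in the abstract above can be sketched as an in silico digest followed by a mass lookup. The example assumes trypsin as the endoprotease (cleavage after K or R, but not before P), no missed cleavages, unmodified residues, and neutral monoisotopic masses; the abstract names no specific enzyme, and all function names here are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(observed, peptides, tol=0.01):
    """Pair each observed mass with theoretical peptides within tol daltons."""
    hits = []
    for mass in observed:
        for pep in peptides:
            if abs(peptide_mass(pep) - mass) <= tol:
                hits.append((mass, pep))
    return hits


if __name__ == "__main__":
    peptides = tryptic_digest("AKRPGKA")
    print(peptides)
    print(match_peaks([217.143], peptides))
```

In a real search the digest would be run over every sequence in the database and the candidate proteins ranked by how many observed masses they explain, which is why identification degrades when no closely related sequence is available.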

13.
Ribosomal 5S RNA is the only identified target for proteins of the CTC family. All known proteins of this family, except for CTC from Aquifex aeolicus, contain a full-sized 5S rRNA-binding domain. In the present study a mistake in the published A. aeolicus genome is corrected. It has been demonstrated that the ctc gene of this organism encodes a protein with a full-length 5S rRNA-binding domain. This protein binds specifically to the bacterial 5S rRNA. Thus, our data show that CTC from A. aeolicus is not an exception among the known CTC proteins.

14.
15.
Because ambient temperature affects biochemical reactions, organisms living in extreme temperature conditions adapt protein composition and structure to maintain biochemical functions. While it is not feasible to experimentally determine optimal growth temperature (OGT) for every known microbial species, organisms adapted to different temperatures have measurable differences in DNA, RNA and protein composition that allow OGT prediction from genome sequence alone. In this study, we built a ‘tRNA thermometer’ model using tRNA sequence to predict OGT. We used sequences from 100 archaea and 683 bacteria species as input to train two Convolutional Neural Network models. The first pairs individual tRNA sequences from different species to predict which comes from a more thermophilic organism, with accuracy ranging from 0.538 to 0.992. The second uses the complete set of tRNAs in a species to predict optimal growth temperature, achieving a maximum of 0.86, comparable with other prediction accuracies in the literature despite a significant reduction in the quantity of input data. This model improves on previous OGT prediction models by providing a model with minimum input data requirements, removing laborious feature extraction and data preprocessing steps and widening the scope of valid downstream analyses.

16.
With the availability of the nearly complete genomic sequence of C. elegans, the first multicellular organism to be sequenced, molecular biology has definitely entered the postgenomic era. Annotation of the genomic sequence, which refers to identifying the genes and other biologically relevant sections of the genome, is an important and nontrivial next step. A first-pass annotation will be necessarily incomplete but will drive further biological experiments, which in turn will help to annotate the genome better. Given the scale of the genome sequence analysis, it is clear that the annotation should be automated as much as possible without sacrificing the quality of analysis. In this work, we outline our approach to identifying the protein kinases of C. elegans from the genomic sequence. We describe new tools we have developed for analysis, management and visualization of genomic data. By developing modular and scalable solutions, this study has provided a framework for future analysis of the Drosophila and human genomes.

17.
18.
We propose two-dimensional gel electrophoresis (2-DE) and mass spectrometry to define the protein components of regulons and stimulons in bacteria, including those organisms where genome sequencing is still in progress. The basic 2-DE protocol allows high resolution and reproducibility and enables the direct comparison of hundreds or even thousands of proteins simultaneously. To identify proteins that comprise stimulons and regulons, peptide mass fingerprint (PMF) with matrix-assisted laser desorption ionization/time-of-flight mass spectrometry (MALDI-TOF-MS) analysis is the first option and, if results from this tool are insufficient, complementary data obtained with electrospray ionization tandem-MS (ESI-MS/MS) may permit successful protein identification. ESI-MS/MS and MALDI-TOF-MS provide complementary data sets, and so a more comprehensive coverage of a proteome can be obtained using both techniques with the same sample, especially when few sequenced proteins of a particular organism exist or genome sequencing is still in progress.

19.
20.
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.
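The core idea behind de novo identification, reading residues from the mass differences between neighbouring fragment peaks, can be shown with a toy ladder reader. This sketch assumes a clean, complete series of singly charged fragment ions of one type and unmodified residues. Real graph-based tools instead build a spectrum graph over all peak pairs and score competing paths, so the function name and the simplifications here are hypothetical and do not describe any tool covered by the review.

```python
# Monoisotopic residue masses in daltons (standard values).
# Isoleucine is omitted: it has the same mass as leucine, so "L" means L or I.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Infer residues from gaps between consecutive fragment peaks.

    Each gap is matched to the closest residue mass; gaps that match no
    residue within tol daltons are reported as "?".
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        name, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - gap))
        residues.append(name if abs(mass - gap) <= tol else "?")
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical b-ion series b1..b3 for the peptide GAS.
    print(read_ladder([58.02874, 129.06585, 216.09788]))
```

The first residue is not recovered from the gaps alone, because only differences between peaks are read; a fuller treatment would also use the precursor mass and the complementary ion series.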
