Similar Articles
1.

Background  

Studying the evolution of the function of duplicated genes usually involves estimating the extent of functional conservation or divergence between duplicates by comparing their sequences. This reveals only the possible molecular function of the genes, without taking their cellular function(s) into account. We took this latter dimension of gene function into consideration and approached the functional evolution of duplicated genes by analyzing the protein-protein interaction network in which their products are involved. To do so, we derived a functional classification of the proteins using PRODISTIN, a bioinformatics method for comparing protein function. Our work focused on duplicated yeast genes that are remnants of an ancient whole-genome duplication.
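PRODISTIN compares proteins through their interaction partners. A minimal sketch of that idea, assuming the Czekanowski-Dice distance used in the original PRODISTIN description (each protein's interactor set includes the protein itself): proteins that share most partners get a distance near 0, and proteins that share none get 1. The toy network and protein names are hypothetical, and the tree-building and class-assignment steps that PRODISTIN applies afterwards are not shown.

```python
from itertools import combinations

def czekanowski_dice(graph, a, b):
    """Czekanowski-Dice distance between proteins a and b (0 = same partners, 1 = none shared).

    `graph` maps each protein to the set of its interaction partners; each
    protein is added to its own set, as assumed here.
    """
    set_a = graph[a] | {a}
    set_b = graph[b] | {b}
    # Partners unique to one protein, normalized by union size plus shared size.
    return len(set_a ^ set_b) / (len(set_a | set_b) + len(set_a & set_b))

def pairwise_distances(graph):
    """Distance for every unordered protein pair; a tree or clustering would be built from these."""
    return {(a, b): czekanowski_dice(graph, a, b)
            for a, b in combinations(sorted(graph), 2)}

# Hypothetical toy network (undirected: each edge is listed in both directions).
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(pairwise_distances(toy))
```

Here A and B have identical interactor sets (distance 0), while A and D share only C (distance 0.6).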

2.
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. (3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. 
The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs. Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.
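One kind of conservation criterion for candidate motifs can be sketched as the fraction of a motif's occurrences in the reference genome that are preserved at the same aligned position in every other species. This is a minimal illustration, assuming a precomputed multiple alignment of intergenic regions; the function name and toy alignment are hypothetical, and the paper's three criteria and their thresholds are not reproduced here.

```python
def conserved_fraction(alignment, motif):
    """Fraction of a motif's reference occurrences conserved in every aligned species.

    `alignment` is a list of equal-length aligned strings (gaps as '-');
    the first string is treated as the reference genome.
    """
    ref = alignment[0]
    k = len(motif)
    total = conserved = 0
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] == motif:
            total += 1
            # Conserved only if the exact motif occupies the same aligned columns in all other species.
            if all(seq[i:i + k] == motif for seq in alignment[1:]):
                conserved += 1
    return conserved / total if total else 0.0

# Hypothetical three-species alignment of one intergenic region.
aln = ["ACGTGACGCGTG", "ACGTGATTCGTG", "ACGTGACGCCTG"]
print(conserved_fraction(aln, "CGTG"))  # 0.5
```

On its own such a rate says little; it would be compared against a background, such as the conservation rate of random k-mers of the same length.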

3.
J. H. Nadeau, D. Sankoff. Genetics 1997, 147(3):1259-1266
Duplicated genes are an important source of new protein functions and novel developmental and physiological pathways. Whereas most models for the fate of duplicated genes show that they tend to be rapidly lost, models for pathway evolution suggest that many duplicated genes rapidly acquire novel functions. Little empirical evidence is available, however, on the relative rates of gene loss vs. divergence that could help resolve these contradictory expectations. Gene families resulting from genome duplications provide an opportunity to address this apparent contradiction. With genome duplication, the number of duplicated genes in a gene family is at most 2^n, where n is the number of duplications. The size of each gene family (1, 2, 3, ..., 2^n) reflects the patterns of gene loss vs. functional divergence after duplication. We focused on gene families in humans and mice that arose from genome duplications in early vertebrate evolution, and we analyzed the frequency distribution of gene family size, i.e., the number of families with two, three or four members. All the models that we evaluated showed that duplicated genes are almost as likely to acquire a new and essential function as to be lost through the acquisition of mutations that compromise protein function. An explanation for the unexpectedly high rate of functional divergence is that duplication allows genes to accumulate more neutral than disadvantageous mutations, thereby providing more opportunities to acquire diversified functions and pathways.
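To see how loss and retention shape a family-size distribution bounded by 2^n, consider a toy model, assuming each of the 2^n copies is retained independently with the same probability and that families which lost every copy are unobservable. This is not one of the models the authors evaluated; it only shows how the expected shares of one- to four-member families follow from a single retention parameter.

```python
from math import comb

def family_size_distribution(n_duplications, p_retain):
    """Expected share of observable families of each size under independent retention.

    Each of the 2**n copies is kept with probability p_retain; families that
    lost every copy leave no trace, so sizes are conditioned on >= 1 survivor.
    """
    copies = 2 ** n_duplications
    raw = {k: comb(copies, k) * p_retain ** k * (1 - p_retain) ** (copies - k)
           for k in range(1, copies + 1)}
    observable = sum(raw.values())
    return {k: v / observable for k, v in raw.items()}

# Two rounds of genome duplication, so families have at most 2**2 = 4 members.
for size, share in family_size_distribution(2, 0.5).items():
    print(size, round(share, 3))
```

Fitting a model like this to observed family counts would give an estimate of the retention probability, which is the kind of comparison between loss and divergence the abstract describes.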

4.
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach we proposed previously, is based on gene expression similarity and on the concept similarity of functional classes defined in the Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function(s) in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects, during the learning process, the most appropriate functional classes, as specific as possible, and uses genes annotated to nearby classes to support predictions for small, specific classes in GO. Using yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the GO “biological process” category with three measures designed specifically for functional classes organized in GO. The results show that our method can predict gene functions broadly while assigning very specific functional terms. Based on the GO database published in December 2004, we predicted functions for some proteins whose functions were unknown at that time, and some of these predictions have been confirmed by the new SGD annotation data published in April 2006.
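The core idea of predicting an unannotated protein's function from its interaction neighbors can be sketched with plain neighbor counting (majority vote), a simpler, well-known baseline swapped in here for illustration. It does not implement GESTs' expression similarity, neighbor filtering, or selection of maximally specific GO classes; the network, protein names, and term labels are hypothetical.

```python
from collections import Counter

def predict_by_neighbors(graph, annotations, protein, top=1):
    """Rank functional terms by how many annotated interaction partners carry them."""
    votes = Counter()
    for partner in graph.get(protein, ()):
        votes.update(annotations.get(partner, ()))
    return [term for term, _ in votes.most_common(top)]

# Hypothetical network: protein "P" of unknown function and three annotated partners.
graph = {"P": {"A", "B", "C"}}
annotations = {"A": {"DNA repair", "transcription"}, "B": {"DNA repair"}, "C": {"transport"}}
print(predict_by_neighbors(graph, annotations, "P"))  # ['DNA repair']
```

A protein with no annotated partners gets an empty list, which is the gap that methods combining interaction and expression data aim to narrow.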

6.
The identification of over 500 protein kinases encoded by the human genome sequence offers one measure of the importance of protein kinase networks in cell biology. High-throughput technologies for inactivating genes are producing an awe-inspiring amount of data on the cellular and organismal effects of reducing the levels of individual protein kinases. Despite these technical advances, our understanding of kinase networks remains imprecise. Major challenges include correctly assigning kinases to particular networks, understanding how they are regulated, and identifying the relevant in vivo substrates. Genetic methods provide a way of addressing these questions, but applying them requires understanding the nuances of how different types of mutations can affect protein kinases. The goal of this article is to provide a brief primer on these issues using examples from yeast MAPK cascades, and to motivate future systematic genetic analysis focused on individual residues of protein kinases.

7.
The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes because of the complexity of the genotype-phenotype map. One approach to this problem is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. As a first step, we newly sequenced or re-sequenced, at high quality and using advanced sequencing technology, the genomes of 25 strains that are commonly used in the yeast research community. We also developed a pipeline for automated pan-genome analysis that integrates assembly, annotation, and variant calling. To assign strain-specific functional annotations, we identified genes that are not present in the reference genome. We classified these genes according to their presence or absence across strains and characterized each group with known functional and phenotypic features. The functional roles of novel genes that are absent from the reference genome and associated with particular strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae and will be available to the yeast genetics and molecular biology community.
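The presence/absence classification step can be sketched as follows, assuming gene presence per strain has already been called by the assembly and annotation pipeline. The three categories (core, accessory, strain-specific) are common pan-genome terminology used here as an assumption; the study's own grouping and cutoffs may differ, and the gene and strain names are hypothetical.

```python
def classify_genes(presence):
    """Label each gene core, accessory, or strain-specific from its presence across strains.

    `presence` maps a gene name to the set of strains whose genome contains it.
    """
    all_strains = set().union(*presence.values())
    categories = {}
    for gene, strains in presence.items():
        if strains == all_strains:
            categories[gene] = "core"             # present in every strain
        elif len(strains) == 1:
            categories[gene] = "strain-specific"  # found in a single strain
        else:
            categories[gene] = "accessory"        # shared by some, not all, strains
    return categories

presence = {"geneA": {"S1", "S2", "S3"}, "geneB": {"S1", "S3"}, "geneC": {"S2"}}
print(classify_genes(presence))
```

Grouping by exact strain set rather than by these three coarse labels would expose lineage-specific patterns more directly, at the cost of many small groups.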

8.
Protein function is a complex notion, which is now receiving renewed attention from a bioinformatics and genomics perspective. After a general discussion of the principles of experimental methods employed to decipher gene/protein function, the contributions made by new, high-throughput methods in terms of function discovery are discussed. Recent work on functional ontologies and the necessity to describe function within the context of hierarchical levels of complexity are presented. The concepts of molecular interactions and genetic networks are then discussed, leading to a useful new framework with which to describe protein function using new tools such as 2D interaction maps. Finally, it is proposed that interaction data could be used to develop new methods for the functional classification of proteins. An example of functional comparisons on a real data set of yeast chromosomal proteins is presented.

9.
The ends of eukaryotic chromosomes are protected by telomeres, nucleoprotein structures that are essential for chromosomal stability and integrity. Understanding how telomere length is controlled has significant medical implications, especially in the fields of aging and cancer. Two recent systematic genome‐wide surveys measuring the telomere length of deletion mutants in the yeast Saccharomyces cerevisiae have identified hundreds of telomere length maintenance (TLM) genes, which span a large array of functional categories and different localizations within the cell. This study presents a novel general method that integrates large‐scale mutant screening data with protein–protein interaction information to rigorously chart the cellular subnetwork underlying the function investigated. Applying this method to the yeast telomere length control data, we identify pathways that connect the TLM proteins to the telomere‐processing machinery, and predict new TLM genes and their effect on telomere length. We experimentally validate some of these predictions, demonstrating that our method is remarkably accurate. Our results both uncover the complex cellular network underlying TLM and validate a new method for inferring such networks.
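The simplest way to connect screen hits to the telomere machinery through a protein-protein interaction network is a shortest-path search. The breadth-first sketch below is an illustration under that assumption, not the integration method the authors describe, which the abstract does not detail; the network and protein names are hypothetical.

```python
from collections import deque

def path_to_targets(graph, start, targets):
    """Breadth-first search: shortest interaction path from `start` to any protein in `targets`."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in targets:
            # Walk parent links back to `start`, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # no connection in this network

graph = {"hit1": {"x"}, "x": {"hit1", "telo1"}, "telo1": {"x"}, "hit2": set()}
print(path_to_targets(graph, "hit1", {"telo1"}))  # ['hit1', 'x', 'telo1']
print(path_to_targets(graph, "hit2", {"telo1"}))  # None
```

Intermediate proteins such as "x" that appear on many hit-to-target paths are natural candidates for new genes in the pathway, which is the kind of prediction the abstract reports validating.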

10.
11.
The existence of genes that, when knocked out, result in no obvious phenotype has puzzled biologists for many years. The phenomenon is often ascribed to redundancy in regulatory networks, caused by duplicated genes. However, a recent systematic analysis of data from the yeast genome projects does not support a link between gene duplications and redundancies. An alternative explanation suggests that genes might also evolve by very weak selection, which would mean that their true function cannot be studied in normal laboratory experiments. This problem is comparable to Heisenberg's uncertainty relationship in physics. It is possible to formulate an analogous relationship for biology, which, at its extreme, predicts that the understanding of the full function of a gene might require experiments on an evolutionary scale, involving the entire effective population size of a given species.

12.
Protein-protein interaction networks are useful in contextual annotation of protein function and in general to achieve a system-level understanding of cellular behavior. This work reports on the social behavior of the yeast protein-protein interaction network and concludes that it is non-random. This work, while providing an analysis of organization of genes into functional societies, can potentially be useful in assessing the accuracy of contextual gene annotation based on such interaction networks.

13.
Study of mutant phenotypes is a fundamental method for understanding gene function. The construction of a near-complete collection of yeast knockouts (YKO), together with the unique molecular barcodes (or TAGs) that identify each strain, has enabled quantitative functional profiling of Saccharomyces cerevisiae. Using these TAGs and the SGA reporter MFA1pr-HIS3, which facilitates conversion of heterozygous diploid YKO strains into haploid mutants, we have developed a set of highly efficient microarray-based techniques, collectively referred to as dSLAM (diploid-based synthetic lethality analysis on microarrays), to probe genome-wide gene-chemical and gene-gene interactions. Direct comparison revealed that these techniques are more robust than existing methods for functional profiling of the yeast genome. Widespread application of these tools will elucidate a comprehensive yeast genetic network.
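Barcode-based profiling compares each strain's TAG signal between a control pool and an experimental pool (for example, a drug-treated pool or a pool carrying an additional query mutation); strains whose signal drops sharply are candidate interactions. A minimal sketch, assuming intensities are already normalized; the fold-change threshold, intensity floor, and strain names are hypothetical, and dSLAM's actual normalization and statistics are not reproduced.

```python
from math import log2

def depleted_strains(control, experiment, threshold=-1.0, floor=1.0):
    """Strains whose barcode signal falls to 2**threshold of control or lower.

    With threshold=-1.0 a strain must drop at least two-fold. `floor` keeps
    near-zero intensities from producing extreme or undefined ratios.
    """
    hits = {}
    for strain, ctrl in control.items():
        exp = experiment.get(strain, 0.0)
        ratio = log2(max(exp, floor) / max(ctrl, floor))
        if ratio <= threshold:
            hits[strain] = ratio
    return hits

# Hypothetical normalized barcode intensities.
control = {"yfg1": 800.0, "yfg2": 600.0, "yfg3": 500.0}
experiment = {"yfg1": 100.0, "yfg2": 580.0, "yfg3": 240.0}
print(depleted_strains(control, experiment))
```

Working in log2 space makes depletion and enrichment symmetric around zero, which is why log ratios are a common choice for two-channel comparisons.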

14.
15.
Theoretical and practical advances in genome halving
MOTIVATION: Duplication of an organism's entire genome is a rare but spectacular event, enabling the rapid emergence of multiple new gene functions. Over time, the parallel linkage of duplicated genes across chromosomes may be disrupted by reciprocal translocations, while the intra-chromosomal order of genes may be shuffled by inversions and transpositions. Some duplicate genes may evolve beyond recognition or be deleted. As a consequence, the only detectable signature of an ancient duplication event in a modern genome may be the presence of various chromosomal segments containing parallel paralogous genes, with each segment appearing exactly twice in the genome. The problem of reconstructing the linkage structure of an ancestral genome before duplication is known as genome halving with unordered chromosomes. RESULTS: In this paper, we derive a new upper bound on the genome halving distance that is tighter than the best known bound, and a new lower bound that is almost always tighter than the best known bound. We also define the notion of genome halving diameter and obtain both upper and lower bounds for it. Our tighter bounds on genome halving distance yield a new algorithm for reconstructing an ancestral duplicated genome. We created a software package, GenomeHalving, based on this new algorithm and tested it on the yeast genome, identifying a sequence of translocations for halving the yeast genome that is shorter than previously conjectured possible.

16.
The KEGG databases at GenomeNet
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service (http://www.genome.ad.jp/) for understanding higher-order functional meanings and utilities of the cell or the organism from its genome information. KEGG consists of the PATHWAY database for the computerized knowledge on molecular interaction networks such as pathways and complexes, the GENES database for the information about genes and proteins generated by genome sequencing projects, and the LIGAND database for the information about chemical compounds and chemical reactions that are relevant to cellular processes. In addition to these three main databases, limited amounts of experimental data for microarray gene expression profiles and yeast two-hybrid systems are stored in the EXPRESSION and BRITE databases, respectively. Furthermore, a new database, named SSDB, is available for exploring the universe of all protein coding genes in the complete genomes and for identifying functional links and ortholog groups. The data objects in the KEGG databases are all represented as graphs, and various computational methods have been developed to detect graph features that can be related to biological functions. For example, the correlated clusters are graph similarities which can be used to predict a set of genes coding for a pathway or a complex, as summarized in the ortholog group tables, and the cliques in the SSDB graph are used to annotate genes. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

17.
Population genetic theory of gene duplication suggests that the preservation of duplicate copies requires functional divergence after duplication. Genes that can be readily modified to produce new gene expression patterns may thus be duplicated often. In yeast, genes exhibit dichotomous expression patterns based on their promoter architectures. The expression of genes that contain a TATA box or an occupied proximal nucleosome (OPN) tends to be variable and responsive to external signals. In contrast, genes without a TATA box or with a depleted proximal nucleosome (DPN) are expressed constitutively. We find that recent duplicates in the yeast genome are heavily biased toward TATA-box-containing genes and away from DPN genes. This suggests that variably expressed genes, owing to the functional organization of their promoters, have higher duplicability than constitutively expressed genes.
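Whether recent duplicates are enriched for TATA-box-containing genes can be framed as a 2x2 contingency table (duplicated vs. not, TATA vs. no TATA). A minimal one-sided Fisher exact test sketch using only the standard library; the abstract does not state which statistical test the authors used, and the counts below are hypothetical.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment in the 2x2 table [[a, b], [c, d]].

    Rows: duplicated vs. non-duplicated genes; columns: TATA vs. no TATA.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1))
    return tail / comb(n, row1)

# Hypothetical counts, for illustration only.
print(fisher_right_tail(3, 0, 0, 3))  # 0.05
```

The exact test avoids the large-sample approximation of a chi-square test, which matters when some cells (such as recent duplicates) contain few genes.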

18.
Being in the right location at the right time
Pepperkok R, Simpson JC, Wiemann S. Genome Biology 2001, 2(9):reviews1024.1-reviews1024.4
Taking each coding sequence from the human genome in turn and identifying the subcellular localization of the corresponding protein would be a significant contribution to understanding the function of each of these genes and to deciphering functional networks. This article highlights current approaches aimed at achieving this goal.

19.
The function of the cell division cycle gene, CDC4, is required in Saccharomyces cerevisiae for progression beyond the G1 phase of the cell cycle. The wild-type gene was isolated from a plasmid library by selection for complementation of a recessive, temperature-sensitive allele. Hybridization of genomic sequences with the cloned gene revealed the presence of a duplicated sequence. Both CDC4 and the duplicated sequence were subjected to DNA sequence analysis. These analyses revealed (1) that CDC4 contains a large open reading frame encoding a protein of 779 amino acids, and (2) that the duplicated sequence bears strong homology with the carboxy-terminal segment of this open reading frame. Presence of a nonsense codon within the duplicated sequence suggested that it does not encode a functional product. Disruption of the duplicated sequence within the yeast genome provided a more critical test for function. The absence of any detectable phenotype for this disruption confirms that the sequence should be considered a pseudogene. The marker inserted to disrupt the sequence also served to map the duplication and to establish that it is not genetically linked to CDC4. The structural features determined suggest evolutionary relationships between these genes as well as between the CDC4 product and other proteins.
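Locating an open reading frame like the one reported for CDC4 amounts to scanning each frame for an ATG followed by an in-frame stop codon; a premature nonsense codon, as in the duplicated sequence, truncates the frame. A minimal sketch, assuming an intron-free coding sequence on the forward strand only; the toy sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Longest ATG...stop open reading frame on the forward strand, across all three frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG downstream
    return best

orf = longest_orf("CCATGAAATTTTAGGG")
print(orf, len(orf) // 3 - 1)  # ATGAAATTTTAG 3
```

The second printed value is the protein length, excluding the stop codon. For scale, a 779-residue protein corresponds to (779 + 1) × 3 = 2,340 nucleotides including the stop codon.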

20.
Patterns of network connection of members of multigene families were examined for two biological networks: a genetic network from the yeast Saccharomyces cerevisiae and a protein–protein interaction network from Caenorhabditis elegans. In both networks, genes belonging to gene families represented by a single member in the genome (“singletons”) were disproportionately represented among the nodes having large numbers of connections. Of 68 single-member yeast families with 25 or more network connections, 28 (44.4%) were located in duplicated genomic segments believed to have originated from an ancient polyploidization event; each of these 28 loci was thus presumably duplicated along with the genomic segment to which it belongs, but one of the two duplicates was subsequently deleted. Nodes connected to major “hubs” (nodes with a large number of connections) tended to be relatively sparsely interconnected among themselves. Furthermore, duplicated genes, even those arising from recent duplication, rarely shared many network connections, suggesting that network connections are remarkably labile over evolutionary time. These factors serve to explain well-known general properties of biological networks, including their scale-free and modular nature. [Reviewing Editor: Dr. Manyuan Long]
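The comparison of singletons among highly connected nodes can be sketched as two proportions: the share of single-member families among hubs and among all nodes. The 25-connection cutoff follows the abstract; the degree and family-size tables are hypothetical, and no significance test is included.

```python
def singleton_share(degree, family_size, hub_min=25):
    """Share of single-member families among hubs vs. among all nodes.

    `degree` maps gene -> number of network connections;
    `family_size` maps gene -> number of members in its gene family.
    """
    def share(genes):
        return sum(family_size[g] == 1 for g in genes) / len(genes) if genes else 0.0

    hubs = [g for g, k in degree.items() if k >= hub_min]
    return share(hubs), share(list(degree))

degree = {"a": 30, "b": 26, "c": 3, "d": 2, "e": 1}
family_size = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 3}
print(singleton_share(degree, family_size))  # (1.0, 0.6)
```

A hub share well above the overall share is what "disproportionately represented" means here; whether the gap is significant would need a separate test.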
