Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Large-scale protein interaction networks (PINs) have typically been discerned using affinity purification followed by mass spectrometry (AP/MS) and yeast two-hybrid (Y2H) techniques. It is generally recognized that Y2H screens detect direct binary interactions while the AP/MS method captures co-complex associations; however, the latter technique is known to yield prevalent false positives arising from a number of effects, including abundance. We describe a novel approach to compute the propensity for two proteins to co-purify in an AP/MS data set, thereby allowing us to assess the detected level of interaction specificity by analyzing the corresponding distribution of interaction scores. We find that two recent AP/MS data sets of yeast contain enrichments of specific, or high-scoring, associations as compared to commensurate random profiles, and that curated, direct physical interactions in two prominent data bases have consistently high scores. Our scored interaction data sets are generally more comprehensive than those of previous studies when compared against four diverse, high-quality reference sets. Furthermore, we find that our scored data sets are more enriched with curated, direct physical associations than Y2H sets. A high-confidence protein interaction network (PIN) derived from the AP/MS data is revealed to be highly modular, and we show that this topology is not the result of misrepresenting indirect associations as direct interactions. In fact, we propose that the modularity in Y2H data sets may be underrepresented, as they contain indirect associations that are significantly enriched with false negatives. The AP/MS PIN is also found to contain significant assortative mixing; however, in line with a previous study we confirm that Y2H interaction data show weak disassortativeness, thus revealing more clearly the distinctive natures of the interaction detection methods. 
We expect that our scored yeast data sets are ideal for further biological discovery and that our scoring system will prove useful for other AP/MS data sets.

2.
MOTIVATION: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expressions change between up- and down-regulation from one condition to another. In order to discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. RESULTS: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden to traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and competitive with bi-clustering techniques.
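The definition of a co-regulated gene profile given in this abstract can be illustrated with a small check. The sketch below is not the MAP mining algorithm, which the abstract does not describe in detail; it only tests whether a candidate pair of gene sets satisfies the stated definition. The function name `is_coregulated_profile`, the +1/-1/0 encoding of up, down, and unchanged expression, and the toy data are all assumptions made for illustration.

```python
def is_coregulated_profile(expr, set_a, set_b, conditions):
    """Check the profile definition on discretized expression data.

    expr maps gene -> {condition: +1 (up), -1 (down), 0 (unchanged)}.
    Returns True when, in every listed condition, all genes in set_a
    share one direction and all genes in set_b share the opposite one.
    (Hypothetical helper; the encoding is an assumption.)
    """
    for cond in conditions:
        signs_a = {expr[g][cond] for g in set_a}
        signs_b = {expr[g][cond] for g in set_b}
        # Genes within the same set must behave identically.
        if len(signs_a) != 1 or len(signs_b) != 1:
            return False
        (a,) = signs_a
        (b,) = signs_b
        # The two sets must show contrary, non-zero behavior.
        if a == 0 or b != -a:
            return False
    return True


# Toy data: g1 and g2 move together, g3 moves the opposite way,
# and the direction flips between conditions c1 and c2.
toy_expr = {
    "g1": {"c1": 1, "c2": -1},
    "g2": {"c1": 1, "c2": -1},
    "g3": {"c1": -1, "c2": 1},
}
print(is_coregulated_profile(toy_expr, {"g1", "g2"}, {"g3"}, ["c1", "c2"]))
```

Note that the direction of each set may change from one condition to the next, which is the kind of pattern the abstract says traditional APD methods miss.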

3.
4.
5.
The yeast genetics community has embraced genomic biology, and there is a general understanding that obtaining a full encyclopedia of functions of the approximately 6000 genes is a worthwhile goal. The yeast literature comprises over 40,000 research papers, and the number of yeast researchers exceeds the number of genes. There are mutated and tagged alleles for virtually every gene, and hundreds of high-throughput data sets and computational analyses have been described. Why, then, are there >1000 genes still listed as uncharacterized on the Saccharomyces Genome Database, 10 years after sequencing the genome of this powerful model organism? Examination of the currently uncharacterized gene set suggests that while some are small or newly discovered, the vast majority were evident from the initial genome sequence. Most are present in multiple genomics data sets, which may provide clues to function. In addition, roughly half contain recognizable protein domains, and many of these suggest specific metabolic activities. Notably, the uncharacterized gene set is highly enriched for genes whose only homologs are in other fungi. Achieving a full catalog of yeast gene functions may require a greater focus on the life of yeast outside the laboratory.

6.
A neural network has been used to reduce the dimensionality of multivariate data sets to produce two-dimensional (2D) displays of these sets. The data consisted of physicochemical properties for sets of biologically active molecules calculated by computational chemistry methods. Previous work has demonstrated that these data contain sufficient relevant information to classify the compounds according to their biological activity. The plots produced by the neural network are compared with results from two other techniques for linear and nonlinear dimension reduction, and are shown to give comparable and, in one case, superior results. Advantages of this technique are discussed.

7.
We examine the translated open reading frames (ORFs) of the yeast Saccharomyces cerevisiae, focusing on those that have FASTA matches in phyletically defined sets of completely sequenced genomes. On this basis, we identify archaeal yeast, bacterial yeast, universal yeast, and yeast ORFs that do not have a match in any of nine prokaryote genomes. Similarly, we examine the yeast mitochondrial genome and the subset of the yeast nuclear ORFs identified as being involved in mitochondrial biogenesis. For the yeast ORFs that match one or more ORFs in these prokaryote genomes, we examine the phyletic and functional distributions of these matches as a function of match strength. These results provide genome level insights into the origin of the eukaryotic cell and the origin of mitochondria. More generally, they exemplify how the growing database of prokaryote genome sequences can help us understand eukaryote genomes.

8.
Unprecedented global surveillance of viruses will result in massive sequence data sets that require new statistical methods. These data sets press the limits of Bayesian phylogenetics as the high-dimensional parameters that comprise a phylogenetic tree increase the already sizable computational burden of these techniques. This burden often results in partitioning the data set, for example, by gene, and inferring the evolutionary dynamics of each partition independently, a compromise that results in stratified analyses that depend only on data within a given partition. However, parameter estimates inferred from these stratified models are likely strongly correlated, considering they rely on data from a single data set. To overcome this shortfall, we exploit the existing Monte Carlo realizations from stratified Bayesian analyses to efficiently estimate a nonparametric hierarchical wavelet-based model and learn about the time-varying parameters of effective population size that reflect levels of genetic diversity across all partitions simultaneously. Our methods are applied to complete genome influenza A sequences that span 13 years. We find that broad peaks and trends, as opposed to seasonal spikes, in the effective population size history distinguish individual segments from the complete genome. We also address hypotheses regarding intersegment dynamics within a formal statistical framework that accounts for correlation between segment-specific parameters.

9.
Although still poorly understood, the universal reverse complement symmetry in genomes may contain much information about the genome. In this article, under the hypothesis that recombination rate variations may be related to the high order DNA structure, we studied the association between local recombination rates and local symmetry levels in mouse, rat and human. We found significant negative correlations between recombination rates and reverse complement compositional symmetries in these three organisms. This negative correlation pattern also held at the level of individual chromosomes, when only data from each chromosome were analyzed.
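Reverse complement compositional symmetry refers to a single DNA strand containing each short word about as often as that word's reverse complement. The abstract does not state which symmetry measure the authors used, so the sketch below uses an assumed score: one minus the normalized absolute difference between k-mer counts and the counts of their reverse complements. The names `reverse_complement` and `symmetry_level`, the default `k=2`, and the scoring formula are illustrative choices, not the published method.

```python
from collections import Counter

# Translation table for complementing the four standard bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def symmetry_level(seq, k=2):
    """Assumed symmetry score in [0, 1] for one sequence window.

    1.0 means every k-mer occurs exactly as often as its reverse
    complement on the same strand; 0.0 means no k-mer has a matching
    reverse complement count at all.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Include reverse complements that never occur, so each word is
    # compared against its partner in both directions.
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = 0
    total = 0
    for word in words:
        n = counts.get(word, 0)
        m = counts.get(reverse_complement(word), 0)
        diff += abs(n - m)
        total += n + m
    return 1.0 - diff / total if total else 0.0


print(symmetry_level("ACGTACGTTTGCAA", k=2))
```

Computing such a score in windows along a chromosome would give the "local symmetry levels" that could then be correlated with local recombination rates.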

10.
Hepatitis E virus (HEV) is a major human pathogen in much of the developing world. It is a plus-strand RNA virus with a 7.2-kb polyadenylated genome consisting of three open reading frames, ORF1, ORF2, and ORF3. Of these, ORF2 encodes the major capsid protein of the virus and ORF3 encodes a small protein of unknown function. Using the yeast three-hybrid system and traditional biochemical techniques, we have studied the RNA binding activities of ORF2 and ORF3, two proteins encoded in the 3' structural part of the genome. Since the genomic RNA from HEV has been postulated to contain secondary structures at the 5' and 3' ends, we used these two terminal regions, besides other regions within the genome, in this study. Experiments were designed to test for interactions between the genomic RNA fusion constructs with ORF2 and ORF3 hybrid proteins in a yeast cellular environment. We show here that the ORF2 protein contains RNA binding activity. The ORF2 protein specifically bound the 5' end of the HEV genome. Deletion analysis of this protein showed that its RNA binding activity was lost when deletions were made beyond the N-terminal 111 amino acids. Finer mapping of the interacting RNA revealed that a 76-nucleotide (nt) region at the 5' end of the HEV genome was responsible for binding the ORF2 protein. This 76-nt region included the 51-nt HEV sequence, conserved across alphaviruses. Our results support the requirement of this conserved sequence for interaction with ORF2 and also indicate an increase in the strength of the RNA-protein interaction when an additional 44 bases downstream of this 76-nt region were included. Secondary-structure predictions and the location of the ORF2 binding region within the HEV genome indicate that this interaction may play a role in viral encapsidation.

11.
High-throughput methods for detecting protein interactions, such as mass spectrometry and yeast two-hybrid assays, continue to produce vast amounts of data that may be exploited to infer protein function and regulation. As this article went to press, the pool of all published interaction information on Saccharomyces cerevisiae was 15,143 interactions among 4,825 proteins, and power-law scaling supports an estimate of 20,000 specific protein interactions. To investigate the biases, overlaps, and complementarities among these data, we have carried out an analysis of two high-throughput mass spectrometry (HMS)-based protein interaction data sets from budding yeast, comparing them to each other and to other interaction data sets. Our analysis reveals 198 interactions among 222 proteins common to both data sets, many of which reflect large multiprotein complexes. It also indicates that a "spoke" model that directly pairs bait proteins with associated proteins is roughly threefold more accurate than a "matrix" model that connects all proteins. In addition, we identify a large, previously unsuspected nucleolar complex of 148 proteins, including 39 proteins of unknown function. Our results indicate that existing large-scale protein interaction data sets are nonsaturating and that integrating many different experimental data sets yields a clearer biological view than any single method alone.
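The difference between the "spoke" and "matrix" models mentioned in this abstract can be made concrete with a minimal sketch. It expands one purification (a bait plus the proteins found with it) into binary interactions under each model. The function names and the toy protein labels are hypothetical, and the abstract's accuracy comparison is not reproduced here.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each co-purified protein."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: connect every pair among bait and co-purified proteins."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# One toy purification: bait "A" pulled down with "B", "C", and "D".
toy_bait, toy_preys = "A", ["B", "C", "D"]
spoke = spoke_pairs(toy_bait, toy_preys)
matrix = matrix_pairs(toy_bait, toy_preys)
print(len(spoke), len(matrix))
```

For a purification with n co-purified proteins, the spoke model yields n pairs while the matrix model yields (n + 1)n/2, so the matrix model always contains the spoke pairs plus every prey-prey pair.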

12.
13.
Summary: The genus Avena consists of at least 23 species composed of three ploidy levels. Cytogenetic analysis has characterised four distinct karyotypes. These are the A, B, C and D genomes. We have isolated a repeated sequence clone that can be used for the detection of the C genome in Avena by filter hybridization techniques. This clone, termed RS-1, is a genomic DNA clone containing at least one highly repeated sequence that is abundant in Avena species containing the C genome. This sequence or a related sequence is also present, but at much reduced levels, in species that do not contain the C genome. Because of its abundance and the characteristic Southern blot pattern, we have termed this clone a C genome specific clone. We have also done similar analysis of the Avena genus using a rDNA clone from wheat. The results of these experiments demonstrate that clearly definable C genome-specific markers can be identified with both probes. These molecular probes can be useful in studying the genomic relationships of Avena and can provide some clues as to the origin of the cultivated Avena species. These results can, therefore, provide breeders with directions for the efficient transfer of desirable traits of wild Avena species into commercial varieties.

14.
Experimental protein-protein interaction (PPI) networks are increasingly being exploited in diverse ways for biological discovery. Accordingly, it is vital to discern their underlying natures by identifying and classifying the various types of deterministic (specific) and probabilistic (nonspecific) interactions detected. To this end, we have analyzed PPI networks determined using a range of high-throughput experimental techniques with the aim of systematically quantifying any biases that arise from the varying cellular abundances of the proteins. We confirm that PPI networks determined using affinity purification methods for yeast and Escherichia coli incorporate a correlation between protein degree, or number of interactions, and cellular abundance. The observed correlations are small but statistically significant and occur in both unprocessed (raw) and processed (high-confidence) data sets. In contrast, the yeast two-hybrid system yields networks that contain no such relationship. While this difference was previously noted on the basis of mRNA abundance, our more extensive analysis based on protein abundance confirms a systematic difference between PPI networks determined from the two technologies. We additionally demonstrate that the centrality-lethality rule, which implies that higher-degree proteins are more likely to be essential, may be misleading, as protein abundance measurements identify essential proteins to be more prevalent than nonessential proteins. In fact, we generally find that when there is a degree/abundance correlation, the degree distributions of nonessential and essential proteins are also disparate. Conversely, when there is no degree/abundance correlation, the degree distributions of nonessential and essential proteins are not different. However, we show that essentiality manifests itself as a biological property in all of the yeast PPI networks investigated here via enrichments of interactions between essential proteins. 
These findings provide valuable insights into the underlying natures of the various high-throughput technologies utilized to detect PPIs and should lead to more effective strategies for the inference and analysis of high-quality PPI data sets.

15.
Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.

16.
Eucaryotic cells contain at least two general classes of oxygen-regulated nuclear genes: aerobic genes and hypoxic genes. Hypoxic genes are induced upon exposure to anoxia while aerobic genes are down-regulated. Recently, it has been reported that induction of some hypoxic nuclear genes in mammals and yeast requires mitochondrial respiration and that cytochrome-c oxidase functions as an oxygen sensor during this process. In this study, we have examined the role of the mitochondrion and cytochrome-c oxidase in the expression of yeast aerobic nuclear COX genes. We have found that the down-regulation of these genes in anoxic cells is reflected in reduced levels of their subunit polypeptides and that cytochrome-c oxidase subunits I, II, III, Vb, VI, VII, and VIIa are present in promitochondria from anoxic cells. By using nuclear cox mutants and mitochondrial rho(0) and mit(-) mutants, we have found that neither respiration nor cytochrome-c oxidase is required for the down-regulation of these genes in cells exposed to anoxia but that a mitochondrial genome is required for their full expression under both normoxic and anoxic conditions. This requirement for a mitochondrial genome is unrelated to the presence or absence of a functional holocytochrome-c oxidase. We have also found that the down-regulation of these genes in cells exposed to anoxia and the down-regulation that results from the absence of a mitochondrial genome are independent of one another. These findings indicate that the mitochondrial genome, acting independently of respiration and oxidative phosphorylation, affects the expression of the aerobic nuclear COX genes and suggest the existence of a signaling pathway from the mitochondrial genome to the nucleus.

17.
Reconstructions of phylogenetic relationships in the flowering plant family Rubiaceae have up until now relied heavily on single‐ or multi‐gene data, primarily from the plastid compartment. With the availability of cost‐ and time‐efficient techniques for generating complete genome sequences, the opportunity arises to resolve some of the relationships that, up until now, have proven problematic. Here, we contribute new data from 58 complete plastid genome sequences, representing 55 of the 65 currently recognized tribes of the Rubiaceae. Also contributed are new data from the nuclear rDNA cistrons for corresponding taxa. Phylogenetic analyses are conducted on two plastid data sets, one including data from the protein coding genes only, and a second where protein coding data are combined with non‐coding regions, and on a nuclear rDNA data set. Our results clearly show that simply adopting a “more characters” approach does not resolve the relationships in the Rubiaceae. More importantly, we identify conflicting phylogenetic signals in the data. Analyses of the same plastid data, treated as nucleotides or as codon‐degenerated data, resolve and support conflicting topologies in the subfamily Cinchonoideae. As these analyses use the same data, we interpret the conflict to result from erroneous assumptions in the models used to reconstruct our phylogenies. Conflicting signals are also identified in the analyses of the plastid versus the nuclear rDNA data sets. These analyses use data from different genomic compartments, with different inheritance patterns, and we interpret the conflicts as representing “real” conflicts, reflecting biological processes of the past.

18.
19.
The ultimate goal of functional genomics is to define the function of all the genes in the genome of an organism. A large body of information on the biological roles of genes has been accumulated and aggregated in the past decades of research, both from traditional experiments detailing the role of individual genes and proteins, and from newer experimental strategies that aim to characterize gene function on a genomic scale. It is clear that the goal of functional genomics can only be achieved by integrating information and data sources from the variety of these different experiments. Integration of different data is thus an important challenge for bioinformatics. The integration of different data sources often helps to uncover non-obvious relationships between genes, but there are also two further benefits. First, it is likely that whenever information from multiple independent sources agrees, it should be more valid and reliable. Secondly, by looking at the union of multiple sources, one can cover larger parts of the genome. This is obvious for integrating results from multiple single gene or protein experiments, but also necessary for many of the results from genome-wide experiments since they are often confined to certain (although sizable) subsets of the genome. In this paper, we explore an example of such a data integration procedure. We focus on the prediction of membership in protein complexes for individual genes. For this, we recruit six different data sources that include expression profiles, interaction data, essentiality and localization information. Each of these data sources individually contains some weakly predictive information with respect to protein complexes, but we show how this prediction can be improved by combining all of them. Supplementary information is available at http://bioinfo.mbb.yale.edu/integrate/interactions/. Abbreviations: TP: true positive; TN: true negative; FP: false positive; FN: false negative; Y2H: yeast two-hybrid.

20.