Similar articles
20 similar articles found (search time: 554 ms)
1.
2.
3.
GeneRAGE: a robust algorithm for sequence clustering and domain detection   (Total citations: 9; self-citations: 0; citations by others: 9)
MOTIVATION: Efficient, accurate and automatic clustering of large protein sequence datasets, such as complete proteomes, into families, according to sequence similarity. Detection and correction of false positive and negative relationships with subsequent detection and resolution of multi-domain proteins. RESULTS: A new algorithm for the automatic clustering of protein sequence datasets has been developed. This algorithm represents all similarity relationships within the dataset in a binary matrix. Removal of false positives is achieved through subsequent symmetrification of the matrix using a Smith-Waterman dynamic programming alignment algorithm. Detection of multi-domain protein families and further false positive relationships within the symmetrical matrix is achieved through iterative processing of matrix elements with successive rounds of Smith-Waterman dynamic programming alignments. Recursive single-linkage clustering of the corrected matrix allows efficient and accurate family representation for each protein in the dataset. Initial clusters containing multi-domain families are split into their constituent clusters using the information obtained by the multi-domain detection step. This algorithm can hence quickly and accurately cluster large protein datasets into families. Problems due to the presence of multi-domain proteins are minimized, allowing more precise clustering information to be obtained automatically. AVAILABILITY: GeneRAGE (version 1.0) executable binaries for most platforms may be obtained from the authors on request. The system is available to academic users free of charge under license.
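
The two core steps of the abstract above (symmetrifying a binary similarity matrix, then single-linkage clustering) can be sketched roughly as follows. This is not GeneRAGE itself: the Smith-Waterman re-check of one-directional hits is replaced by simply discarding them, the multi-domain step is omitted, and the names `symmetrize`, `single_linkage` and the toy protein IDs are hypothetical.

```python
from collections import defaultdict

def symmetrize(hits):
    """Keep only reciprocal hits. Assumption: one-directional hits are dropped
    instead of being re-checked with a Smith-Waterman alignment."""
    return {(a, b) for (a, b) in hits if a != b and (b, a) in hits}

def single_linkage(proteins, sym_pairs):
    """Single-linkage clustering of a binary relation, i.e. connected components."""
    neighbours = defaultdict(set)
    for a, b in sym_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, clusters = set(), []
    for start in proteins:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters

# Toy similarity hits: P4 -> P5 has no reciprocal hit and is discarded.
proteins = ["P1", "P2", "P3", "P4", "P5"]
hits = {("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "P2"), ("P4", "P5")}
clusters = single_linkage(proteins, symmetrize(hits))
```

Because P1 and P3 are both linked to P2, single linkage places all three in one family even though P1 and P3 share no direct hit. This chaining is the reason multi-domain proteins need the separate detection step the abstract describes.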

4.
5.
6.
An algorithm is presented for the automatic detection of dynamic moiety pools within steady state metabolic subnetworks which may be embedded within larger dynamic networks. This is an aid in the quantitative development of the hierarchical structure of metabolic networks. The algorithm can also be used to test for the physical readability of a stoichiometric matrix of a closed metabolic network.

7.
A major challenge in systems biology is to understand how complex and highly connected metabolic networks are organized. The structure of these networks is investigated here by identifying sets of metabolites that have a similar biosynthetic potential. We measure the biosynthetic potential of a particular compound by determining all metabolites that can be produced from it and, following a terminology introduced previously, call this set the scope of the compound. To identify groups of compounds with similar scopes, we apply a hierarchical clustering method. We find that compounds within the same cluster often display similar chemical structures and appear in the same metabolic pathway. For each cluster we define a consensus scope by determining a set of metabolites that is most similar to all scopes within the cluster. This allows for a generalization from scopes of single compounds to scopes of a chemical family. We observe that most of the resulting consensus scopes overlap or are fully contained in others, revealing a hierarchical ordering of metabolites according to their biosynthetic potential. Our investigations show that this hierarchy is not only determined by the chemical complexity of the metabolites, but also strongly by their biological function. As a general tendency, metabolites which are necessary for essential cellular processes exhibit a larger biosynthetic potential than those involved in secondary metabolism. A central result is that chemically very similar substances with different biological functions may differ significantly in their biosynthetic potentials. Our studies provide an important step towards understanding fundamental design principles of metabolic networks determined by the structural and functional complexity of metabolites.
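
The notion of a scope in the abstract above can be illustrated with a small sketch: starting from a seed compound, reactions are fired repeatedly until no new metabolite appears. The reaction list is a toy example, and using the Jaccard distance to compare scopes before hierarchical clustering is an assumption made here, not something the abstract specifies.

```python
def scope(seed, reactions):
    """All metabolites producible from `seed`: repeatedly fire every reaction
    whose substrates are all available until nothing new is produced."""
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def jaccard_distance(a, b):
    """Dissimilarity of two scopes (assumed measure for the clustering step)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

# Toy network: A -> B -> C, and C + D -> E (D is never available here).
reactions = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B"}), frozenset({"C"})),
    (frozenset({"C", "D"}), frozenset({"E"})),
]
scope_a = scope({"A"}, reactions)
scope_b = scope({"B"}, reactions)
distance = jaccard_distance(scope_a, scope_b)
```

Here the scope of B is fully contained in the scope of A, a miniature version of the nesting of consensus scopes reported in the abstract.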

8.
Biological networks are a topic of great current interest, particularly with the publication of a number of large genome-wide interaction datasets. They are globally characterized by a variety of graph-theoretic statistics, such as the degree distribution, clustering coefficient, characteristic path length and diameter. Moreover, real protein networks are quite complex and can often be divided into many sub-networks through systematic selection of different nodes and edges. For instance, proteins can be sub-divided by expression level, length, amino-acid composition, solubility, secondary structure and function. A challenging research question is to compare the topologies of sub-networks, looking for global differences associated with different types of proteins. TopNet is an automated web tool designed to address this question, calculating and comparing topological characteristics for different sub-networks derived from any given protein network. It provides reasonable solutions to the calculation of network statistics for sub-networks embedded within a larger network and gives simplified views of a sub-network of interest, allowing one to navigate through it. After constructing TopNet, we applied it to the interaction networks and protein classes currently available for yeast. We were able to find a number of potential biological correlations. In particular, we found that soluble proteins had more interactions than membrane proteins. Moreover, amongst soluble proteins, those that were highly expressed, had many polar amino acids, and had many alpha helices, tended to have the most interaction partners. Interestingly, TopNet also turned up some systematic biases in the current yeast interaction network: on average, proteins with a known functional classification had many more interaction partners than those without. This phenomenon may reflect the incompleteness of the experimentally determined yeast interaction network.
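
Two of the statistics named in the abstract above, degree and clustering coefficient, can be computed for a sub-network with a few lines of code. The sketch below assumes the simplest possible definition of a sub-network (the induced subgraph on the selected proteins); how TopNet itself treats edges that leave the sub-network is not stated in the abstract, and all function names are hypothetical.

```python
from itertools import combinations

def induced_subnetwork(edges, nodes):
    """Adjacency sets restricted to `nodes` (assumption: induced subgraph)."""
    keep = set(nodes)
    adjacency = {n: set() for n in keep}
    for a, b in edges:
        if a in keep and b in keep and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def average_degree(adjacency):
    """Mean number of interaction partners per protein."""
    if not adjacency:
        return 0.0
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)

def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adjacency[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))

# Toy interaction network; node "e" is excluded from the sub-network.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
sub = induced_subnetwork(edges, {"a", "b", "c", "d"})
mean_degree = average_degree(sub)
cc_c = clustering_coefficient(sub, "c")
```

Running the same functions on two protein classes (for example soluble versus membrane proteins) gives the kind of side-by-side comparison the abstract describes.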

9.
Primarily used for metabolic engineering and synthetic biology, genome-scale metabolic modeling shows tremendous potential as a tool for fundamental research and curation of metabolism. Through a novel integration of flux balance analysis and genetic algorithms, a strategy to curate metabolic networks and facilitate identification of metabolic pathways that may not be directly inferable solely from genome annotation was developed. Specifically, metabolites involved in unknown reactions can be determined, and potentially erroneous pathways can be identified. The procedure developed allows for new fundamental insight into metabolism, as well as acting as a semi-automated curation methodology for genome-scale metabolic modeling. To validate the methodology, a genome-scale metabolic model for the bacterium Mycoplasma gallisepticum was created. Several reactions not predicted by the genome annotation were postulated and validated via the literature. The model predicted an average growth rate of 0.358±0.12, closely matching the experimentally determined growth rate of M. gallisepticum of 0.244±0.03. This work presents a powerful algorithm for facilitating the identification and curation of previously known and new metabolic pathways, as well as presenting the first genome-scale reconstruction of M. gallisepticum.

10.
Mass peak alignment (ion-wise alignment) has recently become a popular method for unsupervised data analysis in untargeted metabolic profiling. Here we present MSClust, a software tool for the analysis of GC-MS and LC-MS datasets derived from untargeted profiling. MSClust performs data reduction using unsupervised clustering and extraction of putative metabolite mass spectra from ion-wise chromatographic alignment data. The algorithm is based on the subtractive fuzzy clustering method that allows unsupervised determination of a number of metabolites in a data set and can deal with uncertain memberships of mass peaks in overlapping mass spectra. This approach is based purely on the actual information present in the data and does not require any prior metabolite knowledge. MSClust can be applied to both GC-MS and LC-MS alignment data sets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-011-0368-2) contains supplementary material, which is available to authorized users.

11.
In this article, we introduce metabolite concentration coupling analysis (MCCA) to study conservation relationships for metabolite concentrations in genome-scale metabolic networks. The analysis allows the global identification of subsets of metabolites whose concentrations are always coupled within common conserved pools. Also, the minimal conserved pool identification (MCPI) procedure is developed for elucidating conserved pools for targeted metabolites without computing the entire basis of conservation relationships. The approaches are demonstrated on genome-scale metabolic reconstructions of Helicobacter pylori, Escherichia coli, and Saccharomyces cerevisiae. Despite significant differences in the size and complexity of the examined organisms' models, we find that the concentrations of nearly all metabolites are coupled within a relatively small number of subsets. These correspond to the overall exchange of carbon molecules into and out of the networks, interconversion of energy and redox cofactors, and the transfer of nitrogen, sulfur, phosphate, coenzyme A, and acyl carrier protein moieties among metabolites. The presence of large conserved pools can be viewed as global biophysical barriers protecting cellular systems from stresses, maintaining coordinated interconversions between key metabolites, and providing an additional mode of global metabolic regulation. The developed approaches thus provide novel and versatile tools for elucidating coupling relationships between metabolite concentrations with implications in biotechnological and medical applications.
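
Conservation relationships of the kind discussed above correspond to vectors in the left null space of the stoichiometric matrix S: any weight vector l with lᵀS = 0 defines a weighted sum of concentrations that no reaction can change. The sketch below computes such a basis by exact Gauss-Jordan elimination on a toy four-metabolite network. It illustrates the full-basis computation that MCPI is designed to avoid and is not the MCCA or MCPI procedure itself; the network and the function name `null_space` are assumptions.

```python
from fractions import Fraction

def null_space(rows):
    """Basis of {x : M x = 0}, via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free variable
        m[r], m[pivot] = m[pivot], m[r]
        pivot_value = m[r][c]
        m[r] = [v / pivot_value for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -m[row_idx][free]
        basis.append(vec)
    return basis

# Toy closed network (rows: ATP, ADP, Glc, G6P; columns: R1, R2, R3).
# R1: Glc + ATP -> G6P + ADP, R2: ADP -> ATP, R3: G6P -> Glc (both simplified).
S = [
    [-1,  1,  0],
    [ 1, -1,  0],
    [-1,  0,  1],
    [ 1,  0, -1],
]
S_transposed = [list(col) for col in zip(*S)]
conserved_pools = null_space(S_transposed)  # left null space of S
```

The two basis vectors recovered here say that ATP + ADP and Glc + G6P are each constant, which are the adenylate and glucose-moiety pools of the toy network.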

12.
The increasing interest in systems biology has resulted in extensive experimental data describing networks of interactions (or associations) between molecules in metabolism, protein-protein interactions and gene regulation. Comparative analysis of these networks is central to understanding biological systems. We report a novel method (PHUNKEE: Pairing subgrapHs Using NetworK Environment Equivalence) by which similar subgraphs in a pair of networks can be identified. Like other methods, PHUNKEE explicitly considers the graphical form of the data and allows for gaps. However, it is novel in that it includes information about the context of the subgraph within the adjacent network. We also explore a new approach to quantifying the statistical significance of matching subgraphs. We report similar subgraphs in metabolic pathways and in protein-protein interaction networks. The most similar metabolic subgraphs were generally found to occur in processes central to all life, such as purine, pyrimidine and amino acid metabolism. The most similar pairs of subgraphs found in the protein-protein interaction networks of Drosophila melanogaster and Saccharomyces cerevisiae also include central processes such as cell division but, interestingly, also include protein sub-networks involved in pre-mRNA processing. The inclusion of network context information in the comparison of protein interaction networks increased the number of similar subgraphs found consisting of proteins involved in the same functional process. This could have implications for the prediction of protein function.

13.
14.
BNArray is a systemized tool developed in R. It facilitates the construction of gene regulatory networks from DNA microarray data using Bayesian networks. Significant sub-modules of regulatory networks with high confidence are reconstructed by using our extended sub-network mining algorithm of directed graphs. BNArray can handle microarray datasets with missing data. To evaluate the statistical features of generated Bayesian networks, re-sampling procedures are utilized to yield collections of candidate 1st-order network sets for mining dense coherent sub-networks. AVAILABILITY: The R package and the supplementary documentation are available at http://www.cls.zju.edu.cn/binfo/BNArray/.

15.
Emerging evidence indicates that gene products implicated in human cancers often cluster together in “hot spots” in protein-protein interaction (PPI) networks. Additionally, small sub-networks within PPI networks that demonstrate synergistic differential expression with respect to tumorigenic phenotypes were recently shown to be more accurate classifiers of disease progression when compared to single targets identified by traditional approaches. However, many of these studies rely exclusively on mRNA expression data, a useful but limited measure of cellular activity. Proteomic profiling experiments provide information at the post-translational level, yet they generally screen only a limited fraction of the proteome. Here, we demonstrate that integration of these complementary data sources with a “proteomics-first” approach can enhance the discovery of candidate sub-networks in cancer that are well-suited for mechanistic validation in disease. We propose that small changes in the mRNA expression of multiple genes in the neighborhood of a protein-hub can be synergistically associated with significant changes in the activity of that protein and its network neighbors. Further, we hypothesize that proteomic targets with significant fold change between phenotype and control may be used to “seed” a search for small PPI sub-networks that are functionally associated with these targets. To test this hypothesis, we select proteomic targets having significant expression changes in human colorectal cancer (CRC) from two independent 2-D gel-based screens. Then, we use random walk based models of network crosstalk and develop novel reference models to identify sub-networks that are statistically significant in terms of their functional association with these proteomic targets. Subsequently, using an information-theoretic measure, we evaluate synergistic changes in the activity of identified sub-networks based on genome-wide screens of mRNA expression in CRC. Cross-classification experiments to predict disease class show excellent performance using only a few sub-networks, underwriting the strength of the proposed approach in discovering relevant and reproducible sub-networks.

16.
Precision mapping of the metabolome   (Total citations: 6; self-citations: 0; citations by others: 6)
The global study of the structure and dynamics of metabolic networks has been hindered by a lack of techniques that identify metabolites and their biochemical relationship in complex mixtures. The recent application of Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) to metabolomic analysis suggests a way to tackle the problem. A lower-cost alternative to high-field FTICR-MS, the Orbitrap mass analyzer, promises accelerated activity in this area. Here, we show how the ultra-high mass accuracy and resolution provided by this new generation of mass spectrometers can help to identify metabolites and connect them into metabolic networks. Data from perturbation studies and isotope-tracking experiments can complement this information to create metabolic maps de novo and chart unexplored areas of metabolism.

17.
Fourier transform mass spectrometry has recently been introduced into the field of metabolomics as a technique that enables the mass separation of complex mixtures at very high resolution and with ultra high mass accuracy. Here we show that this enhanced mass accuracy can be exploited to predict large metabolic networks ab initio, based only on the observed metabolites without recourse to predictions based on the literature. The resulting networks are highly information-rich and clearly non-random. They can be used to infer the chemical identity of metabolites and to obtain a global picture of the structure of cellular metabolic networks. This represents the first reconstruction of metabolic networks based on unbiased metabolomic data and offers a breakthrough in the systems-wide analysis of cellular metabolism.
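
The abstract above does not say how the networks are built from accurate masses. One plausible reading, assumed here purely for illustration, is to connect two observed masses whenever their difference matches the mass shift of a common biochemical transformation within a parts-per-million tolerance. The transformation table, the tolerance and the function name are hypothetical; the mass values are monoisotopic masses computed from standard atomic masses.

```python
# Assumed transformation table: label -> monoisotopic mass shift in Da.
TRANSFORMATIONS = {
    "phosphorylation (+HPO3)": 79.966331,
    "hydroxylation (+O)": 15.994915,
    "hydrogenation (+H2)": 2.015650,
}

def mass_difference_edges(masses, tolerance_ppm=2.0):
    """Link metabolite pairs whose mass difference matches a known shift."""
    edges = []
    names = sorted(masses, key=masses.get)  # lightest first
    for i, light in enumerate(names):
        for heavy in names[i + 1:]:
            delta = masses[heavy] - masses[light]
            tolerance = masses[heavy] * tolerance_ppm * 1e-6
            for label, shift in TRANSFORMATIONS.items():
                if abs(delta - shift) <= tolerance:
                    edges.append((light, heavy, label))
    return edges

# Toy peak list (monoisotopic neutral masses in Da).
observed = {
    "glucose": 180.063388,
    "glucose 6-phosphate": 260.029719,
    "pyruvate": 88.016044,
}
edges = mass_difference_edges(observed)
```

The tolerance is the key design parameter: at a few ppm only one shift fits the glucose pair, whereas a low-resolution instrument would admit many spurious links.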

18.
Mass spectrometry in combination with tracer experiments based on 13C substrates can serve as a powerful tool for the modeling and analysis of intracellular fluxes and the investigation of biochemical networks. The theoretical background for the application of mass spectrometry to metabolic flux analysis is discussed. Mass spectrometry methods are especially useful to determine mass distribution of metabolites. Additional information gained from fragmentation of metabolites, e.g., by electron impact ionization, allows further localization of labeling positions, up to complete resolution of isotopomer pools. To effectively handle mass distributions in simulation experiments, a matrix based general methodology is formulated. The natural isotope distribution of carbon, oxygen, hydrogen and nitrogen in the target metabolites is considered by introduction of correction matrices. It is shown by simulation results for the central carbon metabolism that neglecting natural isotope distributions causes significant errors in intracellular flux distributions. Varying the relative fluxes into the pentose phosphate pathway and the pyruvate carboxylation reaction produces marked changes in the mass distributions of metabolites, which are illustrated for pyruvate, oxaloacetate, and alpha-ketoglutarate. In addition, mass distributions of metabolites are significantly influenced over a broad range by the degree of reversibility of transaldolase and transketolase reactions in the pentose phosphate pathway. The mass distribution of metabolites is very sensitive towards intracellular flux patterns and can be measured with high accuracy by routine mass spectrometry methods. Copyright 1999 John Wiley & Sons, Inc.
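
The idea of a correction matrix mentioned above can be sketched for carbon alone. If k of a metabolite's n carbons carry tracer label, each of the remaining n − k carbons is still 13C with the natural abundance p, so the observed mass shift is k plus a binomially distributed extra. The sketch assumes p ≈ 0.0107 and ignores oxygen, hydrogen and nitrogen, which the original work also corrects for; the function names are hypothetical.

```python
from math import comb

P13C = 0.0107  # assumed natural abundance of 13C

def carbon_correction_matrix(n_carbons, p=P13C):
    """matrix[j][k] = probability of observing mass shift j when k carbons
    carry tracer label and the other n - k carbons follow natural abundance."""
    size = n_carbons + 1
    matrix = [[0.0] * size for _ in range(size)]
    for k in range(size):
        for j in range(k, size):
            extra = j - k  # naturally occurring 13C atoms among unlabelled carbons
            matrix[j][k] = (comb(n_carbons - k, extra)
                            * p ** extra * (1 - p) ** (n_carbons - j))
    return matrix

def observed_distribution(matrix, true_distribution):
    """Mass distribution as measured, given the tracer-derived distribution."""
    return [sum(c * x for c, x in zip(row, true_distribution)) for row in matrix]

# Three-carbon metabolite such as pyruvate, with no tracer label at all.
C = carbon_correction_matrix(3)
unlabelled = observed_distribution(C, [1.0, 0.0, 0.0, 0.0])
```

Even without any tracer, roughly 3% of a three-carbon metabolite appears at M+1 or higher, which illustrates why the abstract reports significant flux errors when natural isotope distributions are neglected.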

19.
Baxter CJ, Liu JL, Fernie AR, Sweetlove LJ. Phytochemistry 2007, 68(16-18): 2313-2319
Estimation of fluxes through metabolic networks from redistribution patterns of 13C has become a well developed technique in recent years. However, the approach is currently limited to systems at metabolic steady-state; dynamic changes in metabolic fluxes cannot be assessed. This is a major impediment to understanding the behaviour of metabolic networks, because steady-state is not always experimentally achievable and a great deal of information about the control hierarchy of the network can be derived from the analysis of flux dynamics. To address this issue, we have developed a method for estimating non-steady-state fluxes based on the mass-balance of mass isotopomers. This approach allows multiple mass-balance equations to be written for the change in labelling of a given metabolite pool and thereby permits over-determination of fluxes. We demonstrate how linear regression methods can be used to estimate non-steady-state fluxes from these mass balance equations. The approach can be used to calculate fluxes from both mass isotopomer and positional isotopomer labelling information and thus has general applicability to data generated from common spectrometry- or NMR-based analytical platforms. The approach is applied to a GC-MS time-series dataset of 13C-labelling of metabolites in a heterotrophic Arabidopsis cell suspension culture. Threonine biosynthesis is used to demonstrate that non-steady-state fluxes can be successfully estimated from such data while organic acid metabolism is used to highlight some common issues that can complicate flux estimation. These include multiple pools of the same metabolite that label at different rates and carbon skeleton rearrangements.

20.
Large-scale metabolic profiling is expected to develop into an integral part of functional genomics and systems biology. The metabolome of a cell or an organism is chemically highly complex. Therefore, comprehensive biochemical phenotyping requires a multitude of analytical techniques. Here, we describe a profiling approach that combines separation by capillary liquid chromatography with the high resolution, high sensitivity, and high mass accuracy of quadrupole time-of-flight mass spectrometry. About 2000 different mass signals can be detected in extracts of Arabidopsis roots and leaves. Many of these originate from Arabidopsis secondary metabolites. Detection based on retention times and exact masses is robust and reproducible. The dynamic range is sufficient for the quantification of metabolites. Assessment of the reproducibility of the analysis showed that biological variability exceeds technical variability. Tools were optimized or established for the automatic data deconvolution and data processing. Subtle differences between samples can be detected as tested with the chalcone synthase deficient tt4 mutant. The accuracy of time-of-flight mass analysis allows elemental compositions to be calculated and metabolites to be tentatively identified. In-source fragmentation and tandem mass spectrometry can be used to gain structural information. This approach has the potential to significantly contribute to establishing the metabolome of Arabidopsis and other model systems. The principles of separation and mass analysis of this technique, together with its sensitivity and resolving power, greatly expand the range of metabolic profiling.
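
How an exact mass narrows down elemental compositions, as mentioned above, can be sketched with a brute-force search over C, H, N and O counts. This is an illustration under simplifying assumptions, not the tool chain used in the study: only four elements are considered, the count limits and the 5 ppm tolerance are arbitrary, no chemical plausibility rules are applied, and the target mass is computed from the formula C15H12O5 chosen purely as an example.

```python
# Monoisotopic atomic masses in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

def format_formula(counts):
    """Render e.g. [("C", 2), ("H", 6), ("N", 0), ("O", 1)] as 'C2H6O'."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts if n > 0)

def candidate_formulas(target_mass, tolerance_ppm=5.0,
                       max_c=20, max_h=40, max_n=5, max_o=10):
    """Enumerate CHNO formulas whose monoisotopic mass lies within tolerance."""
    tolerance = target_mass * tolerance_ppm * 1e-6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = (c * MONOISOTOPIC["C"] + n * MONOISOTOPIC["N"]
                         + o * MONOISOTOPIC["O"])
                if heavy > target_mass + tolerance:
                    break  # adding more oxygen only increases the mass
                # Only the nearest hydrogen count can fall inside the tolerance.
                h = round((target_mass - heavy) / MONOISOTOPIC["H"])
                if not 0 <= h <= max_h:
                    continue
                mass = heavy + h * MONOISOTOPIC["H"]
                if abs(mass - target_mass) <= tolerance:
                    counts = [("C", c), ("H", h), ("N", n), ("O", o)]
                    hits.append((format_formula(counts), mass))
    return hits

candidates = candidate_formulas(272.068475)
formulas = [formula for formula, _ in candidates]
```

The search may return several formulas for one mass, which is consistent with the abstract calling the resulting identifications tentative and pointing to fragmentation data for structural confirmation.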
