Similar articles
20 similar articles found.
1.
RATIONALE: Modern molecular biology is generating data of unprecedented quantity and quality. Particularly exciting for biochemical pathway modeling and proteomics are comprehensive, time-dense profiles of metabolites or proteins that are measurable, for instance, with mass spectrometry, nuclear magnetic resonance or protein kinase phosphorylation. These profiles contain a wealth of information about the structure and dynamics of the pathway or network from which the data were obtained. The retrieval of this information requires a combination of computational methods and mathematical models, which are typically represented as systems of ordinary differential equations. RESULTS: We show that, for the purpose of structure identification, the substitution of differentials with estimated slopes in non-linear network models reduces the coupled system of differential equations to several sets of decoupled algebraic equations, which can be processed efficiently in parallel or sequentially. The estimation of slopes for each time series of the metabolic or proteomic profile is accomplished with a 'universal function' that is computed directly from the data by cross-validated training of an artificial neural network (ANN). CONCLUSIONS: Without preprocessing, the inverse problem of determining structure from metabolic or proteomic profile data is challenging and computationally expensive. The combination of system decoupling and data fitting with universal functions simplifies this inverse problem very significantly. Examples show successful estimations and current limitations of the method. AVAILABILITY: A preliminary Web-based application for ANN smoothing is accessible at http://bioinformatics.musc.edu/webmetabol/. S-systems can be interactively analyzed with the user-friendly freeware PLAS (http://correio.cc.fc.ul.pt/~aenf/plas.html) or with the MATLAB module BSTLab (http://bioinformatics.musc.edu/bstlab/), which is currently being beta-tested.
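The decoupling step described in this abstract can be illustrated with a minimal sketch. It assumes the standard S-system form, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, and it swaps in plain central differences for the ANN-based slope estimation, purely to keep the example self-contained. The function names and the toy profile are hypothetical and do not come from the paper.

```python
import math

def central_slopes(t, x):
    """Estimate dx/dt at interior time points with central differences
    (a simple stand-in for the ANN-smoothed slopes; an assumption here)."""
    return [(x[k + 1] - x[k - 1]) / (t[k + 1] - t[k - 1])
            for k in range(1, len(t) - 1)]

def s_system_rhs(state, alpha, g, beta, h):
    """Right-hand side of one S-system equation:
    alpha * prod(X_j ** g_j) - beta * prod(X_j ** h_j)."""
    production = alpha * math.prod(xj ** gj for xj, gj in zip(state, g))
    degradation = beta * math.prod(xj ** hj for xj, hj in zip(state, h))
    return production - degradation

def decoupled_sse(slopes, states, alpha, g, beta, h):
    """Sum of squared residuals of the algebraic equations for ONE variable.
    Other variables enter only as measured values, never as unknowns."""
    return sum((s - s_system_rhs(st, alpha, g, beta, h)) ** 2
               for s, st in zip(slopes, states))

# Toy profile from dX/dt = 2 - X, X(0) = 0, i.e. X(t) = 2 * (1 - exp(-t)).
t = [0.1 * k for k in range(1, 42)]
x = [2.0 * (1.0 - math.exp(-tk)) for tk in t]
slopes = central_slopes(t, x)
states = [[xk] for xk in x[1:-1]]  # interior points, matching the slopes

# Residuals at the true parameters versus a wrong degradation rate.
sse_true = decoupled_sse(slopes, states, 2.0, [0.0], 1.0, [1.0])
sse_wrong = decoupled_sse(slopes, states, 2.0, [0.0], 0.5, [1.0])
print(sse_true, sse_wrong)
```

Because each variable's residual depends only on its own parameters and on measured profile values, the fit for each variable can be run independently, which is what makes parallel or sequential processing possible. No numerical integration of the differential equations is needed inside the fitting loop.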

2.
TH Chueh, HH Lu. PLoS ONE 2012, 7(8): e42095
One great challenge of genomic research is to efficiently and accurately identify complex gene regulatory networks. The development of high-throughput technologies provides abundant experimental data, such as DNA sequences, protein sequences, and RNA expression profiles, which makes it possible to study interactions and regulation among genes or other substances in an organism. However, it is crucial for systems biology to infer genetic regulatory networks from gene expression profiles and protein interaction data. This study will develop a new approach to reconstruct time delay Boolean networks as a tool for exploring biological pathways. In the inference strategy, we will compare all pairs of input genes in those basic relationships by their corresponding [Formula: see text]-scores for every output gene. Then, we will combine those consistent relationships to reveal the most probable relationship and reconstruct the genetic network. Specifically, we will prove that [Formula: see text] state transition pairs are sufficient and necessary to reconstruct the time delay Boolean network of [Formula: see text] nodes with high accuracy if the number of input genes to each gene is bounded. We have also implemented this method on simulated and empirical yeast gene expression data sets. The test results show that the proposed method is extensible to realistic networks.

3.

Background

Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes.

Methodology

A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed as well as existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target.

Conclusions/Significance

Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is implemented in an R package (Text S1).

4.
We developed PathAct, a novel method for pathway analysis to investigate the biological and clinical implications of gene expression profiles. The advantage of PathAct over conventional pathway analysis methods is that it can quantitatively estimate pathway activity levels for individual patients in the form of a pathway-by-sample matrix. This matrix can be used for further analysis such as hierarchical clustering and other analysis methods. To evaluate the feasibility of PathAct, a comparison with frequently used gene-enrichment analysis methods was conducted using two public microarray datasets. Dataset #1 was that of breast cancer patients, and we investigated pathways associated with triple-negative breast cancer by PathAct, compared with those obtained by gene set enrichment analysis (GSEA). Dataset #2 was another breast cancer dataset with disease-free survival (DFS) of each patient. The contribution of each pathway to prognosis was investigated by our method as well as by the Database for Annotation, Visualization and Integrated Discovery (DAVID) analysis. In dataset #1, four out of the six pathways that satisfied p < 0.05 and FDR < 0.30 by GSEA were also included in those obtained by the PathAct method. For dataset #2, two pathways (“Cell Cycle” and “DNA replication”) out of four pathways identified by PathAct were also identified by DAVID analysis. Thus, we confirmed a good degree of agreement between PathAct and conventional methods. Moreover, several further statistical analyses, such as hierarchical cluster analysis by pathway activity, correlation analysis and survival analysis between pathways, were conducted.

5.
6.
An improved algorithm for clustering gene expression data   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, typical properties of which are its inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm, which employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm, is proposed. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. RESULTS: The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real life data sets. The biological relevance of the clustering solutions is also analyzed.

7.
8.

Background  

Inferring gene networks from time-course microarray experiments with a vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, while VAR modeling with the elastic net succeeds in increasing the number of true positives, it also increases the number of false positives.

9.

Background

Protein interaction networks (PINs) are known to be useful for detecting protein complexes. However, most available PINs are static and cannot reflect the dynamic changes in real networks. At present, some researchers have tried to construct dynamic networks by incorporating time-course (dynamic) gene expression data with PINs. However, gene expression arrays inevitably contain background noise, which could degrade the quality of dynamic networks. Therefore, it is necessary to filter out contaminated gene expression data before further data integration and analysis.

Results

Firstly, we adopt a dynamic model-based method to filter noisy data from dynamic expression profiles. Then a new method is proposed for identifying active proteins from dynamic gene expression profiles. An active protein at a time point is defined as a protein whose corresponding gene has an expression level at that time point higher than a threshold determined by a threshold function involving the standard variance. Furthermore, a noise-filtered active protein interaction network (NF-APIN) is constructed. To demonstrate the efficiency of our method, we detect protein complexes from the NF-APIN and compare them with those from other dynamic PINs.

Conclusion

A dynamic model-based method can effectively filter out noise in dynamic gene expression data. Our method for computing a threshold that determines the active time points of noise-filtered genes makes the dynamic construction more accurate and provides a high-quality framework for network analysis, such as protein complex prediction.
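The idea of marking a protein as active at the time points where its gene's expression exceeds a variance-dependent threshold can be sketched as follows. The abstract does not give the exact threshold function, so this sketch assumes a simple stand-in, threshold = mean + k * standard deviation. The function name and the parameter `k` are hypothetical.

```python
from statistics import mean, pstdev

def active_time_points(expression, k=1.0):
    """Return the indices of time points where the expression level is
    strictly above mean + k * (population standard deviation).

    The threshold form is an assumption for illustration only; the
    abstract does not state the authors' actual threshold function.
    """
    mu = mean(expression)
    sigma = pstdev(expression)
    threshold = mu + k * sigma
    return [i for i, level in enumerate(expression) if level > threshold]

# One gene measured at five time points, with a single expression peak.
profile = [1.0, 1.0, 1.0, 1.0, 9.0]
print(active_time_points(profile))
```

A flat profile has zero standard deviation, so no time point lies strictly above the threshold and the protein is never marked active. For a constitutively expressed gene that may or may not be the desired behaviour.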

10.

Background  

Application of phenetic methods to gene expression analysis has proved to be a successful approach. Visualizing the results in a 3-dimensional space may further enhance these techniques.

11.
12.
MOTIVATION: A number of community profiling approaches have been widely used to study the microbial community composition and its variations in environmental ecology. Automated Ribosomal Intergenic Spacer Analysis (ARISA) is one such technique. ARISA has been used to study microbial communities using 16S-23S rRNA intergenic spacer length heterogeneity at different times and places. Owing to errors in sampling, random mutations in PCR amplification, and probably mostly variations in readings from the equipment used to analyze fragment sizes, the data read directly from the fragment analyzer should not be used for downstream statistical analysis. No optimal data preprocessing methods are available. A commonly used approach is to bin the reading lengths of the 16S-23S intergenic spacer. We have developed a binning method for ARISA data analysis, based on a dynamic programming algorithm, which minimizes the overall differences between replicates from the same sampling location and time. RESULTS: In a test example from an ocean time series sampling program, data preprocessing identified several outliers which upon re-examination were found to be due to systematic errors. Clustering analysis of the ARISA data from different times, binned with the dynamic programming algorithm, revealed important features of the biodiversity of the microbial communities.

13.
In the medical domain, it is very important to develop rule-based classification models, because they produce comprehensible and understandable models that account for their predictions. Moreover, it is desirable to know not only the classification decisions but also what leads to these decisions. In this paper, we propose a novel dynamic quantitative rule-based classification model, namely DQB, which integrates quantitative association rule mining and the Artificial Bee Colony (ABC) algorithm to provide users with more convenience in terms of understandability and interpretability via an accurate class quantitative association rule-based classifier model. As far as we know, this is the first attempt to apply the ABC algorithm in mining for quantitative rule-based classifier models. In addition, this is the first attempt to use quantitative rule-based classification models for classifying microarray gene expression profiles. In this research we also developed a new dynamic local search strategy named DLS, which improves the local search of the ABC algorithm. The performance of the proposed model has been compared with well-known quantitative-based classification methods and bio-inspired meta-heuristic classification algorithms, using six gene expression profiles for binary and multi-class cancer datasets. From the results, it can be concluded that DQB obtains a considerable increase in classification accuracy compared to other available algorithms in the literature, and that it is able to provide an interpretable model for biologists. This confirms the significance of the proposed algorithm in constructing a rule-based classifier model, and accordingly shows that these rules capture high-quality and meaningful knowledge extracted from the training set, where all subsets of quantitative rules report close to 100% classification accuracy with a minimum number of genes. Notably, to the best of our knowledge, several new genes were discovered that have not been seen in any past studies. Regarding applicability, based on the results acquired from microarray gene expression analysis, we can conclude that DQB can be adopted in different real-world applications with some modifications.

14.
In systems biology, molecular interactions are typically modelled using white-box methods, usually based on mass action kinetics. Unfortunately, problems with dimensionality can arise when the number of molecular species in the system is very large, which makes the system modelling and behavior simulation extremely difficult or computationally too expensive. As an alternative, this paper investigates the identification of two molecular interaction pathways using a black-box approach. This type of method creates a simple linear-in-the-parameters model using regression of data, where the output of the model at any time is a function of previous system states of interest. One of the main objectives in building black-box models is to produce an optimal sparse nonlinear one to effectively represent the system behavior. In this paper, it is achieved by applying an efficient iterative approach, where the terms in the regression model are selected and refined using a forward and backward subset selection algorithm. The method is applied to model identification for the MAPK signal transduction pathway and the Brusselator using noisy data of different sizes. Simulation results confirm the efficacy of the black-box modelling method which offers an alternative to the computationally expensive conventional approach.

15.
Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selection of tumor-specific genes and analysis of their differential expressions in tumor tissues. First, a variation of the Relief algorithm, “RFE_Relief algorithm” was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues and their counterparts. After tissue-specific genes were removed, cross validation experiments were employed to demonstrate the common deregulated expressions of the selected gene in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared in different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.

16.
MOTIVATION: Various studies have shown that cancer tissue samples can be successfully detected and classified by their gene expression patterns using machine learning approaches. One of the challenges in applying these techniques for classifying gene expression data is to extract accurate, readily interpretable rules providing biological insight as to how classification is performed. Current methods generate classifiers that are accurate but difficult to interpret. This is the trade-off between credibility and comprehensibility of the classifiers. Here, we introduce a new classifier in order to address these problems. It is referred to as k-TSP (k-Top Scoring Pairs) and is based on the concept of 'relative expression reversals'. This method generates simple and accurate decision rules that only involve a small number of gene-to-gene expression comparisons, thereby facilitating follow-up studies. RESULTS: In this study, we have compared our approach to other machine learning techniques for class prediction in 19 binary and multi-class gene expression datasets involving human cancers. The k-TSP classifier performs as efficiently as Prediction Analysis of Microarray and support vector machine, and outperforms other learning methods (decision trees, k-nearest neighbour and naïve Bayes). Our approach is easy to interpret as the classifier involves only a small number of informative genes. For these reasons, we consider the k-TSP method to be a useful tool for cancer classification from microarray gene expression data. AVAILABILITY: The software and datasets are available at http://www.ccbm.jhu.edu CONTACT: actan@jhu.edu.
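The 'relative expression reversal' idea behind k-TSP can be sketched in a few lines for the binary case. The sketch assumes the usual pair score |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, a greedy choice of k gene-disjoint pairs, and an unweighted majority vote. It omits the tie-breaking and the cross-validated choice of k that a full implementation would need. All function names are hypothetical and are not taken from the authors' software.

```python
from itertools import combinations

def pair_scores(samples, labels):
    """Score every gene pair by how strongly the ordering X_i < X_j
    flips between class 0 and class 1; best-scoring pairs come first."""
    n_genes = len(samples[0])
    n0, n1 = labels.count(0), labels.count(1)
    scored = []
    for i, j in combinations(range(n_genes), 2):
        p0 = sum(1 for s, y in zip(samples, labels) if y == 0 and s[i] < s[j]) / n0
        p1 = sum(1 for s, y in zip(samples, labels) if y == 1 and s[i] < s[j]) / n1
        # Last field: True if observing X_i < X_j points towards class 0.
        scored.append((abs(p0 - p1), i, j, p0 > p1))
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored

def select_top_pairs(scored, k):
    """Greedily keep the k best-scoring pairs that share no gene."""
    used, chosen = set(), []
    for _score, i, j, lt_means_class0 in scored:
        if i in used or j in used:
            continue
        chosen.append((i, j, lt_means_class0))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen

def predict(sample, pairs):
    """Each pair casts one vote; a strict majority for class 0 wins."""
    votes_for_0 = sum(1 for i, j, lt0 in pairs if (sample[i] < sample[j]) == lt0)
    return 0 if 2 * votes_for_0 > len(pairs) else 1

# Toy data: gene 0 < gene 1 in class 0, and the reverse in class 1.
train = [[1, 5, 3], [2, 6, 3], [5, 1, 3], [6, 2, 3]]
classes = [0, 0, 1, 1]
rule = select_top_pairs(pair_scores(train, classes), k=1)
print(rule, predict([0, 9, 3], rule), predict([9, 0, 3], rule))
```

Each decision rule is only a comparison between two expression values within the same sample, so it is unaffected by any monotone per-sample normalization. This is part of why such rules are easy to interpret and to follow up experimentally.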

17.
Tumor-specific gene expression patterns with gene expression profiles   Total citations: 1, self-citations: 0, citations by others: 1
Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selection of tumor-specific genes and analysis of their differential expressions in tumor tissues. First, a variation of the Relief algorithm, "RFE_Relief algorithm" was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues and their counterparts. After tissue-specific genes were removed, cross validation experiments were employed to demonstrate the common deregulated expressions of the selected gene in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared in different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.

18.
ABSTRACT: BACKGROUND: Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision making processes. Advancement in high throughput experimental techniques has initiated innovative data driven analysis of gene regulatory networks. However, inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, there are few robust algorithms that directly exploit basic biological traits. Such algorithms are expected to be efficient in their performance and robust in their prediction. RESULTS: We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formalism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subjected to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the trait of sparsity of biological interactions. To that end, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function is targeted to minimize the network connections subject to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters. Furthermore, in both the in silico and experimental case studies, the predicted gene expression profiles are in very close agreement with the dynamics of the input data. CONCLUSIONS: Our integer programming algorithm effectively utilizes bootstrapping to identify robust gene regulatory networks from noisy, non-linear time-series gene expression data. With significant noise and non-linearities being inherent to biological systems, the present formalism, with the incorporation of network sparsity, is extremely relevant to gene regulatory networks, and while the formulation has been validated against in silico and E. coli data, it can be applied to any biological system.

19.
20.
Although metastasis is the principal cause of death for colorectal cancer (CRC) patients, the molecular mechanisms underlying CRC metastasis are still not fully understood. In an attempt to identify metastasis-related genes in CRC, we obtained gene expression profiles of 55 early stage primary CRCs, 56 late stage primary CRCs, and 34 metastatic CRCs from the expression project in Oncology (http://www.intgen.org/expo/). We developed a novel gene selection algorithm (SVM-T-RFE), which extends the support vector machine recursive feature elimination (SVM-RFE) algorithm by incorporating the T-statistic. We achieved the highest classification accuracy (100%) with smaller gene subsets (10 and 6, respectively) when classifying between early and late stage primary CRCs, as well as between metastatic CRCs and late stage primary CRCs. We also compared the performance of the SVM-T-RFE and SVM-RFE gene selection algorithms on another large-scale CRC dataset and five public microarray datasets. SVM-T-RFE outperformed the SVM-RFE algorithm in identifying more differentially expressed genes and in achieving the highest prediction accuracy using an equal or smaller number of selected genes. A fraction of the selected genes have been reported to be associated with CRC development or metastasis.
