Similar literature
1.

Background  

Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. Selecting the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes performs better than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes.

2.

Background

Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, biomarkers inferred from different datasets suffer from a lack of reproducibility, owing to the heterogeneity of data generated on different platforms or in different laboratories. This motivates us to develop robust biomarker identification methods that integrate multiple datasets.

Methods

In this paper, we developed an integrative method for classification based on logistic regression. Different constant terms are set in the logistic regression model to measure the heterogeneity of the samples. By minimizing the differences of the constant terms within the same dataset, both the homogeneity within each dataset and the heterogeneity across datasets can be preserved. The model is formulated as an optimization problem with a network penalty measuring the differences of the constant terms. The L1 penalty, the elastic net penalty and network-related penalties are added to the objective function for biomarker discovery. Algorithms based on the proximal Newton method are proposed to solve the optimization problem.
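As a rough illustration of this kind of penalized fit, the sketch below runs proximal gradient descent on an L1-penalized logistic regression with one constant term per dataset. Proximal gradient is swapped in here as a simpler relative of the proximal Newton method named in the abstract; it is not the authors' algorithm. The network penalty on the constant terms is omitted, and the function names, step size and penalty weight are all assumptions made for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(samples, labels, dataset_ids, lam=0.05, step=0.5, iters=500):
    """Proximal gradient descent for L1-penalized logistic regression.

    Hypothetical setup: every dataset gets its own unpenalized constant
    term, while the feature weights are shared across datasets and
    shrunk by soft-thresholding.
    """
    n = len(samples)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    intercepts = {d: 0.0 for d in set(dataset_ids)}
    for _ in range(iters):
        grad_w = [0.0] * n_features
        grad_b = {d: 0.0 for d in intercepts}
        for x, y, d in zip(samples, labels, dataset_ids):
            z = intercepts[d] + sum(w * xi for w, xi in zip(weights, x))
            err = sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi / n
            grad_b[d] += err / n
        # Gradient step on the smooth loss, then the L1 proximal step.
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad_w)]
        for d in intercepts:
            intercepts[d] -= step * grad_b[d]
    return weights, intercepts
```

Calling `fit_l1_logistic(samples, labels, dataset_ids)` with rows of feature values, 0/1 labels and a dataset tag per row returns the shared weights and a dictionary of per-dataset constant terms; weights driven exactly to zero correspond to features dropped by the L1 penalty.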

Results

We first applied the proposed method to simulated datasets, where both the prediction AUC and the biomarker identification accuracy improved. We then applied the method to two breast cancer gene expression datasets. Integrating both datasets improved the prediction AUC over directly merging the datasets and over MetaLasso, and the result is comparable to the best AUC obtained when identifying biomarkers in an individual dataset. The biomarkers identified using the network-related penalty for variables were further analyzed, and meaningful subnetworks enriched for breast cancer were identified.

Conclusion

A network-based integrative logistic regression model is proposed in the paper. It improves both the prediction and biomarker identification accuracy.

3.
4.
In Africa, more than 4 million people suffer from active tuberculosis (TB), resulting in an estimated 650,000 deaths every year. The etiologic agent of TB, Mycobacterium tuberculosis, survives in resting macrophages, which control the pathogen after activation by specific T lymphocytes. Here, we describe the basic mechanisms underlying the host response to TB with an emphasis on immunity and discuss diagnostics, drugs, and vaccines for TB. Moreover, we outline our attempts to develop biomarkers, which could help monitor TB clinical trials, provide the basis for new diagnostics, and allow prognosis of the outcome of infection and of drug treatment.

5.
Proteomic analysis is not limited to the analysis of serum or tissues. Synovial, peritoneal, pericardial and cerebrospinal fluid represent unique proteomes for disease diagnosis and prognosis. In particular, cerebrospinal fluid serves as a rich source of putative biomarkers that are not limited to neurologic disorders. Peptides, proteolytic fragments and antibodies are capable of crossing the blood-brain barrier, thus providing a repository of pathologic information. Proteomic technologies such as immunoblotting, isoelectric focusing, 2D gel electrophoresis and mass spectrometry have proven useful for deciphering this unique proteome. Cerebrospinal fluid proteins are generally less abundant than their corresponding serum counterparts, necessitating the development and use of sensitive analytical techniques. This review highlights some of the promising areas of cerebrospinal fluid proteomic research and their clinical applications.

6.
7.
Serum samples from non-Hodgkin lymphoma (NHL) patients who had not undergone chemotherapy, lymphadenitis patients, and healthy adults were analyzed using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) to detect differentially expressed serum proteins. Models were developed to distinguish the healthy adult group from the NHL group, with a sensitivity of 69% and a specificity of 90%, and the lymphadenitis group from the NHL group, with a sensitivity of 74% and a specificity of 84%. A protein with the m/z of M10 197.91 u was expressed at a significantly higher level in the NHL group than in the other groups. Differences were also significant among different stages of NHL and among samples with different International Prognostic Index (IPI) scores or lactate dehydrogenase (LDH) levels. The three identified proteins may offer a new serological approach for early diagnosis, differential diagnosis, and pathogenic investigation of NHL, and the protein with the m/z of M10 197.91 u may be a new serological biomarker for monitoring treatment response and evaluating the prognosis of patients with NHL.
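For readers less familiar with how sensitivity and specificity figures like those quoted above are derived, the following minimal sketch computes both from true and predicted class labels. The label strings ("NHL", "healthy") and the function name are hypothetical, and no data from the study is reproduced here.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="NHL"):
    """Return (sensitivity, specificity) for a two-class prediction.

    Sensitivity = TP / (TP + FN): share of true positives that are detected.
    Specificity = TN / (TN + FP): share of true negatives that are cleared.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    # Undefined (NaN) when one of the two classes is absent from the truth.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

A model with high specificity but moderate sensitivity, as reported above, rarely flags non-NHL samples but misses a noticeable share of NHL cases.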

8.
Ovarian cancer recurs at a rate of 75% within a few months to several years after therapy. Early recurrence, though it responds better to treatment, is difficult to detect. Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry has shown the potential to accurately identify disease biomarkers that help early diagnosis. A major challenge in the interpretation of SELDI-TOF data is the high dimensionality of the feature space. To tackle this problem, we have developed a multi-step data processing method composed of a t-test, binning and backward feature selection. A new algorithm, support vector machine-Markov blanket/recursive feature elimination (SVM-MB/RFE), is presented for the backward feature selection. This method integrates minimum-weight feature elimination by SVM-RFE with information-theory-based removal of redundant or irrelevant features by Markov blanket. Subsequently, SVM was used for classification. We applied the biomarker selection algorithm to 113 serum samples to identify early relapse in ovarian cancer patients after primary therapy. To validate the performance of the proposed algorithm, experiments were carried out in comparison with several other feature selection and classification algorithms.
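The first step of the pipeline described above, filtering features with a t-test, can be sketched as follows. This is only an illustration of that filtering idea: it uses Welch's unequal-variance t-statistic (an assumption, since the abstract does not say which t-test variant was used), it does not implement binning or SVM-MB/RFE, and the function names are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples.

    Assumes each group has at least two values and that the two
    groups are not both constant (otherwise the standard error is 0).
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


def rank_features_by_t(samples, labels, top_k):
    """Return indices of the top_k features with the largest |t|."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        positives = [x[j] for x, y in zip(samples, labels) if y == 1]
        negatives = [x[j] for x, y in zip(samples, labels) if y == 0]
        scores.append((abs(welch_t(positives, negatives)), j))
    scores.sort(reverse=True)  # largest absolute t-statistic first
    return [j for _, j in scores[:top_k]]
```

In a setting like the one described, each row of `samples` would hold peak intensities for one serum sample and `labels` would mark relapse (1) versus no relapse (0); the surviving feature indices would then be passed on to the later selection steps.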

9.
Biomarkers predict World Trade Center-Lung Injury (WTC-LI); however, multicollinearity remains unaddressed in the serum cytokine, chemokine, and high-throughput platform datasets we use to phenotype WTC disease. To address this concern, we used automated, machine-learning-based, high-dimensional data pruning and validated the identified biomarkers. The parent cohort consisted of male, never-smoking firefighters with WTC-LI (FEV1, %Pred < lower limit of normal (LLN); n = 100) and controls (n = 127) whose biomarkers had been assessed. Cases and controls (n = 15/group) underwent untargeted metabolomics, followed by feature selection on metabolites, cytokines, chemokines, and clinical data. Cytokines, chemokines, and clinical biomarkers were validated in the non-overlapping parent cohort via binary logistic regression with 5-fold cross-validation. Random forests of metabolites (n = 580), clinical biomarkers (n = 5), and previously assayed cytokines and chemokines (n = 106) identified that the top 5% of biomarkers important to class separation included pigment epithelium-derived factor (PEDF), macrophage-derived chemokine (MDC), systolic blood pressure, macrophage inflammatory protein-4 (MIP-4), growth-regulated oncogene protein (GRO), monocyte chemoattractant protein-1 (MCP-1), apolipoprotein-AII (Apo-AII), cell membrane metabolites (sphingolipids, phospholipids), and branched-chain amino acids. Models validated via confounder-adjusted (age on 9/11, BMI, exposure, and pre-9/11 FEV1, %Pred) binary logistic regression had an AUCROC of 0.90 (0.84–0.96). Decreased PEDF and MIP-4 and increased Apo-AII were associated with increased odds of WTC-LI. Increased GRO and MCP-1, together with decreased MDC, were associated with decreased odds of WTC-LI. In conclusion, automated data pruning identified novel WTC-LI biomarkers; performance was validated in an independent cohort.
One biomarker, PEDF (an antiangiogenic agent), is a novel, predictive biomarker of particulate-matter-related lung disease. Other biomarkers (GRO, MCP-1, MDC, MIP-4) reveal immune cell involvement in WTC-LI pathogenesis. The findings of our automated biomarker identification warrant further investigation into these potential pharmacotherapy targets.
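The AUCROC reported above summarizes how well predicted risk scores separate cases from controls. A minimal sketch of that quantity, assuming scores where higher means "more likely a case" and 0/1 labels, is the fraction of case/control pairs in which the case receives the higher score (ties count one half). The function name is hypothetical and no study data is used.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen control (label 0).
    Assumes both classes are present.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise form is quadratic in the number of samples, which is fine for cohorts of a few hundred subjects; a rank-based formulation would be preferable for much larger data.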

10.
Discovery of prognostic and diagnostic biomarker gene signatures for diseases such as cancer is seen as a major step toward better personalized medicine. During the last decade, various methods have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinical diagnosis is their typically low reproducibility, combined with the difficulty of achieving a clear biological interpretation. For this reason, there has been growing interest in recent years in approaches that integrate information from molecular interaction networks. Most of these methods focus on classification problems, that is, learning a model from data that assigns patients to distinct clinical groups. Far less has been published on approaches that predict a patient's event risk. In this paper, we investigate eight methods that integrate network information into multivariable Cox proportional hazards models for risk prediction in breast cancer. We compare the prediction performance of the tested algorithms via cross‐validation as well as across different datasets. In addition, we highlight the stability and interpretability of the obtained gene signatures. In conclusion, we find GeneRank‐based filtering to be a simple, computationally cheap and highly predictive technique for integrating network information into event time prediction models. Signatures derived via this method are highly reproducible.
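To give a feel for GeneRank-style scoring, the sketch below iterates a PageRank-like update in which each gene's rank mixes its own expression evidence with rank passed along network edges. It is a simplified illustration, not the implementation evaluated in the paper: the damping value, the normalization of the expression vector, and the function and parameter names are assumptions, and the subsequent filtering and Cox modeling steps are not shown.

```python
def generank(adjacency, expression, damping=0.5, iters=100):
    """Iterative GeneRank-style scores on an undirected gene network.

    adjacency:  dict mapping each gene to a collection of neighbor genes
                (assumed symmetric: if h is in adjacency[g], g is in
                adjacency[h]).
    expression: dict mapping each gene to a non-negative evidence value,
                e.g. an absolute expression change (assumed, not all zero).
    """
    genes = list(expression)
    total = sum(expression.values())
    evidence = {g: expression[g] / total for g in genes}
    degree = {g: len(adjacency.get(g, ())) for g in genes}
    rank = dict(evidence)
    for _ in range(iters):
        new_rank = {}
        for g in genes:
            # Each neighbor h passes on an equal share of its current rank.
            received = sum(rank[h] / degree[h] for h in adjacency.get(g, ()))
            new_rank[g] = (1 - damping) * evidence[g] + damping * received
        rank = new_rank
    return rank
```

With `damping=0` the scores reduce to the normalized expression evidence alone; larger values let network connectivity pull up genes whose neighbors also carry strong evidence. A filtering step would then keep only the top-ranked genes as candidate predictors.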

11.
We are studying variable selection in multiple regression models in which molecular markers and/or gene-expression measurements as well as intensity measurements from protein spectra serve as predictors for the outcome variable (i.e., trait or disease state). Finding genetic biomarkers and searching for genetic–epidemiological factors can be formulated as a statistical problem of variable selection, in which a small number of trait-associated predictors are identified from a large set of candidates. We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS). CFS is a complex disease in several respects; for example, it is difficult to diagnose and difficult to quantify. To identify biomarkers we used microarray data and SELDI-TOF-based proteomics data. We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals. The objectives of the analyses were to identify markers specific to fatigue that are also possibly exclusive to CFS. The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and for measures of response to therapy. Generally, we use Bayesian hierarchical modeling and Markov chain Monte Carlo computation for this purpose.

12.
13.
We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. Selecting the optimal graph, which gives the best representation of the system among genes, is still a problem to be solved. We theoretically derive a new graph selection criterion from a Bayesian approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.

14.
Significant efforts are underway to develop new biomarkers from pancreatic cyst fluid. Previous research has made use of cyst fluid collected from surgically removed cysts, but the clinical implementation of biomarkers would use cyst fluid collected by endoscopic ultrasound-guided fine-needle aspiration (EUS-FNA). The purpose of this study was to investigate the clinical applicability of cyst fluid research obtained using surgical specimens. Matched pairs of operating room-collected (OR) and EUS-FNA samples from 12 patients were evaluated for the levels of three previously described biomarkers: CA 19-9, CEA, and glycan levels detected by wheat germ agglutinin on MUC5AC (MUC5AC-WGA). CA 19-9 and MUC5AC-WGA correlated well between the sample types, although CEA was more variable between the sample types for certain patients. The variability was not due to the time delay between EUS-FNA and OR collection or to differences in total protein concentrations but may be caused by contamination of the cyst fluid with blood proteins. The classification of each patient based on thresholds for each marker was perfectly consistent between sample types for CA 19-9 and MUC5AC-WGA and mostly consistent for CEA. Therefore, results obtained using OR-collected pancreatic cyst fluid samples should reliably transfer to the clinical setting using EUS-FNA samples.

15.
A key step in network analysis is to partition a complex network into dense modules. Currently, modularity is one of the most popular benefit functions used to partition network modules. However, recent studies suggested that it has an inherent limitation in detecting dense network modules. In this study, we observed that despite the limitation, modularity has the advantage of preserving the primary network structure of the undetected modules. Thus, we have developed a simple iterative Network Partition (iNP) algorithm to partition a network. The iNP algorithm provides a general framework in which any modularity-based algorithm can be implemented in the network partition step. Here, we tested iNP with three modularity-based algorithms: multi-step greedy (MSG), spectral clustering and Qcut. Compared with the original three methods, iNP achieved a significant improvement in the quality of network partition in a benchmark study with simulated networks, identified more modules with significantly better enrichment of functionally related genes in both the yeast protein complex network and the breast cancer gene co-expression network, and discovered more cancer-specific modules in the cancer gene co-expression network. As such, iNP should have broad application as a general method to assist in the analysis of biological networks.
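The modularity benefit function mentioned above can be made concrete with a short sketch. The code below evaluates the standard Newman modularity of a given partition of an unweighted, undirected network; it does not implement iNP, MSG, spectral clustering or Qcut, and the function and argument names are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition.

    edges:        list of (u, v) pairs, one per undirected edge
                  (assumed non-empty, no self-loops, no duplicates).
    community_of: dict mapping every node to its community label.

    Q = sum over communities c of [ e_c / m - (d_c / (2 m))^2 ], where
    m is the number of edges, e_c the number of edges inside c and
    d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    internal_edges = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())
```

A modularity-based partitioner searches for the assignment in `community_of` that maximizes this value; placing all nodes in a single community always gives Q = 0, which serves as a useful baseline.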

16.
Background: Identifying biomarkers for accurate diagnosis and prognosis of diseases is important for the prevention of disease development. The molecular networks that describe the functional relationships among molecules provide a global view of the complex biological systems. With the molecular networks, the molecular mechanisms underlying diseases can be unveiled, which helps identify biomarkers in a systematic way. Results: In this survey, we report the recent progress on identifying biomarkers based on the topology of molecular networks, and we categorize those biomarkers into three groups, including node biomarkers, edge biomarkers and network biomarkers. These distinct types of biomarkers can be detected under different conditions depending on the data available. Conclusions: The biomarkers identified based on molecular networks can provide more accurate diagnosis and prognosis. The pros and cons of different types of biomarkers as well as future directions to improve the methods for identifying biomarkers are also discussed.

17.
Xing H, Gardner TS. Nature Protocols, 2006, 1(6): 2551-2554
This protocol details the use of the mode-of-action by network identification (MNI) algorithm to identify the gene targets of a drug treatment based on gene-expression data. Investigators might also use the MNI algorithm to identify the gene mediators of a disease or the physiological state of cells and tissues. The MNI algorithm uses a training data set of hundreds of expression profiles to construct a statistical model of gene-regulatory networks in a cell or tissue. The model describes combinatorial influences of genes on one another. The algorithm then uses the model to filter the expression profile of a particular experimental treatment and thereby distinguish the molecular targets or mediators of the treatment response from hundreds of additional genes that also exhibit expression changes. It takes approximately 1 h per run, although run time is significantly affected by the size of the genome and data set.

18.

Background

Glutathione metabolism can determine an individual's ability to detoxify drugs. To increase understanding of the dynamics of cellular glutathione homeostasis, we have developed an experiment-based mathematical model of the kinetics of the glutathione network. This model was used to simulate perturbations observed when human liver-derived THLE cells, transfected with human cytochrome P4502E1 (THLE-2E1 cells), were exposed to paracetamol (acetaminophen).

Methods

Human liver-derived cells containing extra human cytochrome P4502E1 were treated with paracetamol at various levels of methionine and in the presence and absence of an inhibitor of glutamyl-cysteine synthetase (GCS). GCS activity was also measured in extracts. Intracellular and extracellular concentrations of substances involved in glutathione metabolism were measured, as was damage to mitochondria and proteins. A bottom-up mathematical model was made of the metabolic pathways around and including glutathione.

Results

Our initial model described some, but not all, of the metabolite-concentration and flux data obtained when THLE-2E1 cells were exposed to paracetamol at concentrations high enough to affect glutathione metabolism. We hypothesized that the lack of correspondence could be due to upregulation of expression of glutamyl-cysteine synthetase, one of the enzymes controlling glutathione synthesis, and confirmed this experimentally. A modified model that incorporated this adaptive response adequately described the observed changes in the glutathione pathway. Use of the adaptive model to analyze the functioning of the glutathione network revealed that a threshold input concentration of methionine may be required for effective detoxification of reactive metabolites by glutathione conjugation. The analysis also provided evidence that 5-oxoproline and ophthalmic acid are more useful biomarkers of glutathione status when analyzed together than when analyzed in isolation, especially in a new, model-assisted integrated biomarker strategy.

Conclusion

A robust mathematical model of the dynamics of cellular changes in glutathione homeostasis in cells has been developed and tested in vitro.

General significance

Mathematical models of the glutathione pathway that help examine, and monitor, the mechanisms of cellular protection against xenobiotic toxicity can now be made.

19.

Background  

The discovery of biomarkers is an important step towards the development of criteria for early diagnosis of disease status. Recently, electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry have been used to identify biomarkers in both proteomics and metabonomics studies. Data sets generated from such studies are generally very large and thus require sophisticated statistical techniques to glean useful information. Most recent attempts to process these types of data model each compound's intensity either discretely, by positional (mass-to-charge ratio) clustering, or through each compound's own intensity distribution. Traditionally, data processing steps such as noise removal, background elimination and m/z alignment are carried out separately, resulting in unsatisfactory propagation of signals in the final model.

20.