Similar literature (20 results)
1.
Ghosh D  Poisson LM 《Genomics》2009,93(1):13-16
With the development of new technologies for assaying biological activity on a global basis in experimental samples, various new "-omics" signatures have been developed to predict disease progression. Such signatures hold the potential to alter the nature of clinical management of human disease. In this article, we describe some necessary statistical considerations needed to take these signatures from the discovery phase to a clinically useful assay. Much of the work discussed is in the area of cancer.

2.

Background

The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease, on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data-analytics is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to vastly stricter requirements. Closing the gap between standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and developing strategies to avoid them.

Methodology and Principal Findings

Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of the poorly studied biases of the data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work are aimed at expanding the understanding of these data-analytic biases that affect development of clinically robust molecular signatures.

Conclusions and Significance

Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution.

3.
Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting “disease map” network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.
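
The idea of rate covariation can be illustrated with a minimal sketch. It assumes that each gene already has one relative evolutionary rate per branch of a shared species tree and that covariation is scored with a Pearson correlation; the abstract does not state the exact statistic, and the gene names and rate values below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant rate vector has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical relative rates of three genes on the same five tree branches.
rates = {
    "geneA": [0.8, 1.2, 1.0, 1.5, 0.6],
    "geneB": [0.7, 1.3, 1.1, 1.4, 0.5],
    "geneC": [1.3, 0.9, 1.0, 0.7, 1.2],
}

# A high positive value suggests the two genes speed up and slow down together.
erc_ab = pearson(rates["geneA"], rates["geneB"])
erc_ac = pearson(rates["geneA"], rates["geneC"])
print(f"A-B: {erc_ab:.2f}, A-C: {erc_ac:.2f}")
```

In this toy data geneA and geneB covary strongly while geneA and geneC do not, which is the kind of contrast a candidate-gene ranking could exploit.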

4.
Fröhlich H 《PloS one》2011,6(10):e25364
Diagnostic and prognostic biomarkers for cancer based on gene expression profiles are viewed as a major step towards better personalized medicine. Many studies using various computational approaches have been published in this direction during the last decade. However, when comparing different gene signatures for related clinical questions, often only a small overlap is observed. This can have various reasons, such as technical differences of platforms, differences in biological samples or their treatment in the lab, or statistical reasons because of the high dimensionality of the data combined with small sample size, leading to unstable selection of genes. As a consequence, retrieved gene signatures are often hard to interpret from a biological point of view. We here demonstrate that it is possible to construct a consensus signature from a set of seemingly different gene signatures by mapping them on a protein interaction network. Common upstream proteins of close gene products, which we identified via our developed algorithm, show a very clear and significant functional interpretation in terms of overrepresented KEGG pathways, disease associated genes and known drug targets. Moreover, we show that such a consensus signature can serve as prior knowledge for predictive biomarker discovery in breast cancer. Evaluation on different datasets shows that signatures derived from the consensus signature reveal a much higher stability than signatures learned from all probesets on a microarray, while at the same time being at least as predictive. Furthermore, they are clearly interpretable in terms of enriched pathways, disease associated genes and known drug targets. In summary, we believe that network based consensus signatures are not only a way to relate seemingly different gene signatures to each other in a functional manner, but also to establish prior knowledge for highly stable and interpretable predictive biomarkers.

5.

Background

In recent years real-time PCR has become a leading technique for nucleic acid detection and quantification. These assays have the potential to greatly enhance efficiency in the clinical laboratory. Choice of primer and probe sequences is critical for accurate diagnosis in the clinic, yet current primer/probe signature design strategies are limited, and signature evaluation methods are lacking.

Methods

We assessed the quality of a signature by predicting the number of true positive, false positive and false negative hits against all available public sequence data. We found real-time PCR signatures described in recent literature and used a BLAST search based approach to collect all hits to the primer-probe combinations that should be amplified by real-time PCR chemistry. We then compared our hits with the sequences in the NCBI taxonomy tree that the signature was designed to detect.
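
The counting step described above can be sketched as follows. This is a minimal illustration, assuming the BLAST-based hit collection has already produced a set of sequence identifiers; the function name and identifiers are hypothetical. Because true negatives are not counted in this scheme, the sketch reports precision as a stand-in for specificity.

```python
def evaluate_signature(hits, targets):
    """Compare sequences a signature is predicted to amplify with its intended targets.

    hits:    identifiers of sequences predicted to be amplified
    targets: identifiers of sequences the signature was designed to detect
    """
    hits, targets = set(hits), set(targets)
    tp = len(hits & targets)   # intended sequences that are detected
    fp = len(hits - targets)   # detected sequences outside the target group
    fn = len(targets - hits)   # intended sequences that are missed
    sensitivity = tp / (tp + fn) if targets else float("nan")
    precision = tp / (tp + fp) if hits else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Hypothetical identifiers: four target strains, three predicted hits.
targets = {"seq1", "seq2", "seq3", "seq4"}
hits = {"seq1", "seq2", "seqX"}
report = evaluate_signature(hits, targets)
print(report)
```

A signature with few false positives but many missed strains, as in this toy case, scores well on precision and poorly on sensitivity.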

Results

We found that many published signatures have high specificity (almost no false positives) but low sensitivity (high false negative rate). Where high sensitivity is needed, we offer a revised methodology for signature design which may designate that multiple signatures are required to detect all sequenced strains. We use this methodology to produce new signatures that are predicted to have higher sensitivity and specificity.

Conclusion

We show that current methods for real-time PCR assay design have unacceptably low sensitivities for most clinical applications. Additionally, as new sequence data becomes available, old assays must be reassessed and redesigned. A standard protocol for both generating and assessing the quality of these assays is therefore of great value. Real-time PCR has the capacity to greatly improve clinical diagnostics. The improved assay design and evaluation methods presented herein will expedite adoption of this technique in the clinical lab.

6.
Changes in the glycosylation process appear early in carcinogenesis and evolve with the growth and spread of cancer. The correlation of the characteristic glycosylation signature with the tumor stage and the appropriate therapy choice is an important issue in translational medicine. Oncologists also pay attention to extracellular vesicles as reservoirs of new cancer glycomarkers that can be potent for cancer diagnosis/prognosis. In this review, we recall glycomarkers used in oncology and show their new glycoforms of improved clinical relevance. We summarize current knowledge on the biological functions of glycoepitopes in cancer-derived extracellular vesicles and their potential use in clinical practice. Is glycomics a future of cancer diagnosis? It may be, but in combination with other omics analyses rather than alone.

7.
8.

Background  

Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control error rates such as the false discovery rate (FDR) in signature discovery. Moreover, signatures for cancer gene expression have been shown to be unstable, that is, difficult to replicate in independent studies, casting doubts on their reliability.
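
For orientation, the FDR named above is conventionally controlled in ordinary multiple testing with the Benjamini-Hochberg step-up procedure. The sketch below shows that generic procedure only; it is not the signature-specific method of the abstracted paper, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Find the largest rank k whose p-value satisfies p_(k) <= k * alpha / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values, one per candidate gene.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(pvals, alpha=0.05)
print(selected)
```

The procedure assumes independent or positively dependent tests, which is one reason its direct use for multivariate signature discovery is not straightforward.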

9.
10.
The scientific techniques used in molecular biological research and drug discovery have changed dramatically over the past 10 years due to the influence of genomics, proteomics and bioinformatics. Furthermore, genomics and functional genomics are now merging into a new scientific approach called chemogenomics. Advancements in the study of molecular cell biology are dependent upon "omics" researchers realizing the importance of and using the experimental tools currently available to cell biologists. For example, novel microscopic techniques utilizing advanced computer imaging allow for the examination of live specimens in a fourth dimension, viz., time. Yet, molecular biologists have not taken full advantage of these and other traditional and novel cell biology techniques for the further advancement of genomic and proteomic-oriented research. The application of traditional and novel cellular biological techniques will enhance the science of genomics. The authors hypothesize that a stronger interdisciplinary approach must be taken between cell biology (and its closely related fields) and genomics, proteomics and bio-chemoinformatics. Since there is a lot of confusion regarding many of the "omics" definitions, this article also clarifies some of the basic terminology used in genomics, and related fields. It also reviews the current status and future potential of chemogenomics and its relationship to cell biology. The authors also discuss and expand upon the differences between chemogenomics and the relatively new term--chemoproteomics. We conclude that the advances in cell biology methods and approaches and their adoption by "omics" researchers will allow scientists to maximize our knowledge about life.

11.
Although ovarian cancer is often initially chemotherapy-sensitive, the vast majority of tumors eventually relapse and patients die of increasingly aggressive disease. Cancer stem cells are believed to have properties that allow them to survive therapy and may drive recurrent tumor growth. Cancer stem cells or cancer-initiating cells are a rare cell population and difficult to isolate experimentally. Genes that are expressed by stem cells may characterize a subset of less differentiated tumors and aid in prognostic classification of ovarian cancer. The purpose of this study was the genomic identification and characterization of a subtype of ovarian cancer that has stem cell-like gene expression. Using human and mouse gene signatures of embryonic, adult, or cancer stem cells, we performed an unsupervised bipartition class discovery on expression profiles from 145 serous ovarian tumors to identify a stem-like and more differentiated subgroup. Subtypes were reproducible and were further characterized in four independent, heterogeneous ovarian cancer datasets. We identified a stem-like subtype characterized by a 51-gene signature, which is significantly enriched in tumors with properties of Type II ovarian cancer; high grade, serous tumors, and poor survival. Conversely, the differentiated tumors share properties with Type I, including lower grade and mixed histological subtypes. The stem cell-like signature was prognostic within high-stage serous ovarian cancer, classifying a small subset of high-stage tumors with better prognosis, in the differentiated subtype. In multivariate models that adjusted for common clinical factors (including grade, stage, age), the subtype classification was still a significant predictor of relapse. 
The prognostic stem-like gene signature yields new insights into prognostic differences in ovarian cancer, provides a genomic context for defining Type I/II subtypes, and offers potential gene targets that, following further validation, may be valuable in the clinical management or treatment of ovarian cancer.

12.
In the area of omics profiling in toxicology, i.e. toxicogenomics, characteristic molecular profiles have previously been incorporated into prediction models for early assessment of a carcinogenic potential and mechanism-based classification of compounds. Traditionally, the biomarker signatures used for model construction were derived from individual high-throughput techniques, such as microarrays designed for monitoring global mRNA expression. In this study, we built predictive models by integrating omics data across complementary microarray platforms and introduced new concepts for modeling of pathway alterations and molecular interactions between multiple biological layers. We trained and evaluated diverse machine learning-based models, differing in the incorporated features and learning algorithms on a cross-omics dataset encompassing mRNA, miRNA, and protein expression profiles obtained from rat liver samples treated with a heterogeneous set of substances. Most of these compounds could be unambiguously classified as genotoxic carcinogens, non-genotoxic carcinogens, or non-hepatocarcinogens based on evidence from published studies. Since mixed characteristics were reported for the compounds Cyproterone acetate, Thioacetamide, and Wy-14643, we reclassified these compounds as either genotoxic or non-genotoxic carcinogens based on their molecular profiles. Evaluating our toxicogenomics models in a repeated external cross-validation procedure, we demonstrated that the prediction accuracy of our models could be increased by joining the biomarker signatures across multiple biological layers and by adding complex features derived from cross-platform integration of the omics data. Furthermore, we found that adding these features resulted in a better separation of the compound classes and a more confident reclassification of the three undefined compounds as non-genotoxic carcinogens.

13.
14.
15.
Background

Recent developments in neuroimaging and genetic testing technologies have made it possible to measure pathological features associated with Alzheimer's disease (AD) in vivo. Mining potential molecular markers of AD from high-dimensional, multi-modal neuroimaging and omics data will provide a new basis for early diagnosis and intervention in AD. In order to discover the real pathogenic mutation and even understand the pathogenic mechanism of AD, many machine learning methods have been designed and successfully applied to the analysis and processing of large-scale AD biomedical data.

Objective

To introduce and summarize the applications and challenges of machine learning methods in Alzheimer's disease multi-source data analysis.

Methods

The literature selected in the review is obtained from Google Scholar, PubMed, and Web of Science. The keywords of literature retrieval include Alzheimer's disease, bioinformatics, image genetics, genome-wide association research, molecular interaction network, multi-omics data integration, and so on.

Conclusion

This study comprehensively introduces machine learning-based processing techniques for AD neuroimaging data and then shows the progress of computational analysis methods in omics data, such as the genome, proteome, and so on. Subsequently, machine learning methods for AD imaging analysis are also summarized. Finally, we elaborate on the current emerging technology of multi-modal neuroimaging, multi-omics data joint analysis, and present some outstanding issues and future research directions.

16.

Background

Signatures are short sequences that are unique within a database, with no similar sequence elsewhere in it, and can therefore serve as the basis for identifying different species. Although several signature discovery algorithms have been proposed, they require the entire database to be loaded into memory, which restricts the amount of data they can process and leaves them unable to handle large databases. These algorithms also use sequential models and discover signatures slowly, so their efficiency can be improved.

Results

In this research, we introduce a divide-and-conquer strategy to signature discovery and propose a parallel signature discovery algorithm for a computer cluster. The divide-and-conquer strategy removes the inability of existing algorithms to process large databases, and a parallel computing mechanism improves the efficiency of signature discovery. Even with only the memory of a regular personal computer, the algorithm can process large databases, such as the human whole-genome EST database, that existing algorithms could not process.
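
A divide-and-conquer approach to signature discovery can be sketched in miniature. The sketch makes two simplifying assumptions that the abstract does not confirm: a signature is taken to be a k-mer that occurs in exactly one sequence (exact matching only, ignoring similarity), and the work is divided by k-mer prefix so that each partition fits in memory and could run on a separate node. All function names and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_kmers_in_partition(db, k, prefix):
    """Map each k-mer starting with `prefix` that occurs in exactly one sequence to that sequence."""
    owners = defaultdict(set)
    for name, seq in db.items():
        for kmer in kmers(seq, k):
            if kmer.startswith(prefix):
                owners[kmer].add(name)
    return {kmer: next(iter(names))
            for kmer, names in owners.items() if len(names) == 1}


def discover_signatures(db, k, prefixes=("A", "C", "G", "T")):
    """Process one prefix partition at a time and merge the partial results."""
    signatures = {}
    for prefix in prefixes:  # each partition is independent of the others
        signatures.update(unique_kmers_in_partition(db, k, prefix))
    return signatures


# Hypothetical toy database of three sequences.
db = {"s1": "ACGTAC", "s2": "CGTACG", "s3": "TTTACG"}
found = discover_signatures(db, k=4)
print(found)
```

Only the k-mers of one partition are held in memory at a time, at the cost of scanning the database once per partition; longer prefixes give smaller partitions.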

Conclusions

The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.

17.

Background

Gene signatures are important for representing the molecular changes in disease genomes or in cells under specific conditions, and have often been used to separate samples into different groups for better research or clinical treatment. While many methods and applications are available in the literature, powerful methods that can account for complex data and detect the most informative signatures are still lacking.

Methods

In this article, we present a new framework for identifying gene signatures using Pareto-optimal cluster size identification for RNA-seq data. We first performed pre-filtering steps and normalization, then utilized the empirical Bayes test in the Limma package to identify the differentially expressed genes (DEGs). Next, we used a multi-objective optimization technique, “Multi-objective optimization for collecting cluster alternatives” (the MOCCA R package), on these DEGs to find the Pareto-optimal cluster size, and then applied k-means clustering to the RNA-seq data based on the optimal cluster size. The best cluster was obtained by computing the average pairwise Spearman correlation score among all the genes belonging to each cluster. The best cluster is treated as the signature for the respective disease or cellular condition.
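
The final selection step, choosing the cluster whose genes are most mutually correlated, can be sketched as below. This is an illustration in plain Python rather than the R packages named above; it assumes the clusters have already been formed, that each gene is a non-constant expression profile over the same samples, and that every cluster holds at least two genes. The cluster names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


def mean_pairwise_spearman(profiles):
    """Average Spearman correlation over all gene pairs in one cluster."""
    pairs = list(combinations(profiles, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def best_cluster(clusters):
    """Name of the cluster with the highest average pairwise correlation."""
    return max(clusters, key=lambda name: mean_pairwise_spearman(clusters[name]))


# Hypothetical clusters: each row is one gene's expression across four samples.
clusters = {
    "c1": [[1, 2, 3, 4], [2, 3, 5, 9], [0, 1, 2, 8]],
    "c2": [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]],
}
winner = best_cluster(clusters)
print(winner, mean_pairwise_spearman(clusters[winner]))
```

Because Spearman correlation works on ranks, the score rewards genes that rise and fall together across samples regardless of their absolute expression scale.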

Results

We applied our framework to a cervical cancer RNA-seq dataset, which included 253 squamous cell carcinoma (SCC) samples and 22 adenocarcinoma (ADENO) samples. We identified a total of 582 DEGs by Limma analysis of SCC versus ADENO samples. Among them, 260 are up-regulated genes and 322 are down-regulated genes. Using MOCCA, we obtained seven Pareto-optimal clusters. The best cluster has a total of 35 DEGs, all of which are up-regulated. For validation, we ran the PAMR (prediction analysis for microarrays) classifier on the selected best cluster and assessed the classification performance. Our evaluation, measured by sensitivity, specificity, precision, and accuracy, showed high confidence.

Conclusions

Our framework identified a multi-objective-based cluster, treated as a signature, that classifies the disease and control groups of samples with high classification performance (accuracy 0.935) for the corresponding disease. Our method can be used to find signatures in any RNA-seq or microarray data.

18.
19.
YY Park  ES Park  SB Kim  SC Kim  BH Sohn  IS Chu  W Jeong  GB Mills  LA Byers  JS Lee 《PloS one》2012,7(9):e44225
Although several prognostic signatures have been developed in lung cancer, their application in clinical practice has been limited because they have not been validated in multiple independent data sets. Moreover, the lack of common genes between the signatures makes it difficult to know what biological process may be reflected or measured by the signature. By using a classical data exploration approach with gene expression data from patients with lung adenocarcinoma (n = 186), we uncovered two distinct subgroups of lung adenocarcinoma and identified a prognostic 193-gene expression signature associated with the two subgroups. The signature was validated in 4 independent lung adenocarcinoma cohorts, including 556 patients. In multivariate analysis, the signature was an independent predictor of overall survival (hazard ratio, 2.4; 95% confidence interval, 1.2 to 4.8; p = 0.01). An integrated analysis of the signature revealed that E2F1 plays key roles in regulating genes in the signature. Subset analysis demonstrated that the gene signature could identify high-risk patients in early stage (stage I disease), and patients who would benefit from adjuvant chemotherapy. Thus, our study provided evidence for the molecular basis of two clinically relevant, distinct subtypes of lung adenocarcinoma.
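
The reported hazard ratio, confidence interval, and p-value can be checked for internal consistency. The sketch below assumes the interval is a Wald interval that is symmetric on the log scale, which is the usual output of a Cox model but is not stated in the abstract; the function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_check(hr, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided p-value
    from a hazard ratio and its 95% confidence interval."""
    se = (log(upper) - log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))                     # two-sided normal p-value
    return se, z, p


se, z, p = wald_check(2.4, 1.2, 4.8)
print(f"se={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under that assumption the interval implies a p-value of about 0.013, which rounds to the reported 0.01, and log(2.4) lies midway between log(1.2) and log(4.8), as expected for an interval of this form.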

20.