Similar Articles
20 similar articles retrieved.
1.

Background

Large-scale collaborative precision medicine initiatives (e.g., The Cancer Genome Atlas (TCGA)) are yielding rich multi-omics data. Integrative analyses of the resulting multi-omics data, such as somatic mutation, copy number alteration (CNA), DNA methylation, miRNA, gene expression, and protein expression, offer tantalizing possibilities for realizing the promise of precision medicine in cancer prevention, diagnosis, and treatment, both by substantially improving our understanding of underlying mechanisms and by enabling the discovery of novel biomarkers for different types of cancer. However, such analyses present a number of challenges, including the heterogeneity and high dimensionality of omics data.

Methods

We propose a novel framework for multi-omics data integration using multi-view feature selection. We introduce a novel multi-view feature selection algorithm, MRMR-mv, an adaptation of the well-known Min-Redundancy and Maximum-Relevance (MRMR) single-view feature selection algorithm to the multi-view setting.
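A minimal sketch of what a greedy multi-view MRMR-style selector could look like, assuming mutual information is used for both relevance and redundancy and that the redundancy penalty is shared across views (the paper's exact MRMR-mv objective is not given in the abstract; names and toy data are illustrative):

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

    def mrmr_multiview(views, y, k):
        """views: list of (n_samples, n_features_v) arrays; returns [(view, column), ...]."""
        X = np.hstack(views)                               # pool features across views
        owner = [(v, j) for v, V in enumerate(views) for j in range(V.shape[1])]
        relevance = mutual_info_classif(X, y, random_state=0)
        selected, remaining = [], list(range(X.shape[1]))
        for _ in range(k):
            def score(f):
                if not selected:                           # first pick: relevance only
                    return relevance[f]
                redundancy = np.mean([mutual_info_regression(
                    X[:, [s]], X[:, f], random_state=0)[0] for s in selected])
                return relevance[f] - redundancy           # max relevance, min redundancy
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return [owner[f] for f in selected]

    rng = np.random.default_rng(0)
    views = [rng.normal(size=(100, 5)), rng.normal(size=(100, 4))]  # e.g., CNA and mRNA views
    y = (views[0][:, 0] + views[1][:, 1] > 0).astype(int)           # toy survival label
    print(mrmr_multiview(views, y, k=3))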

Results

We report results of experiments using an ovarian cancer multi-omics dataset derived from the TCGA database on the task of predicting ovarian cancer survival. Our results suggest that multi-view models outperform both view-specific models (i.e., models trained and tested using a single type of omics data) and models based on two baseline data fusion methods.

Conclusions

Our results demonstrate the potential of multi-view feature selection in integrative analyses and predictive modeling from multi-omics data.

2.

Background

The integration of high-quality, genome-wide analyses offers a robust approach to elucidating the genetic factors involved in complex human diseases. Even though several methods exist to integrate heterogeneous omics data, most biologists still select candidate genes manually, by intersecting candidate lists from analyses of different omics data types that were generated by imposing hard (strict) thresholds on quantitative variables such as P-values and fold changes. This practice increases the chance of missing potentially important candidates.

Methods

To better facilitate the unbiased integration of heterogeneous omics data collected from diverse platforms and samples, we propose a desirability function framework for identifying candidate genes with strong evidence across data types as targets for follow-up functional analysis. Our approach is targeted towards disease systems with sparse, heterogeneous omics data, so we tested it on one such pathology: spontaneous preterm birth (sPTB).
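As a rough illustration of the desirability idea (not integRATE's actual transformations), the sketch below maps P-values and fold changes onto [0, 1] with Derringer-Suich-style ramps and combines them by a geometric mean, so evidence accumulates softly instead of passing or failing hard cutoffs; the cutoffs used here are invented:

    import numpy as np

    def d_low(x, lo, hi):
        """Desirability for smaller-is-better variables (e.g., P-values)."""
        return np.clip((hi - x) / (hi - lo), 0.0, 1.0)

    def d_high(x, lo, hi):
        """Desirability for larger-is-better variables (e.g., |fold change|)."""
        return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

    pvals = np.array([1e-6, 0.03, 0.20])                  # three hypothetical genes
    folds = np.array([3.1, 1.8, 1.1])
    D = np.sqrt(d_low(pvals, 0.0, 0.05) * d_high(np.abs(folds), 1.0, 2.0))  # geometric mean
    print(D)   # soft evidence scores in [0, 1] rather than in/out threshold calls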

Results

We developed the software integRATE, which uses desirability functions to rank genes both within and across studies, identifying well-supported candidate genes according to the cumulative weight of biological evidence rather than hard thresholds on key variables. Integrating 10 sPTB omics studies identified both genes in pathways previously suspected to be involved in sPTB and novel genes never before linked to this syndrome. integRATE is available as an R package on GitHub (https://github.com/haleyeidem/integRATE).

Conclusions

Desirability-based data integration is a solution most applicable in biological research areas where omics data is especially heterogeneous and sparse, allowing for the prioritization of candidate genes that can be used to inform more targeted downstream functional analyses.

3.
4.
Xia Xiaoxuan, Weng Haoyi, Men Ruoting, Sun Rui, Zee Benny Chung Ying, Chong Ka Chun, Wang Maggie Haitian. BMC Genetics, 2018, 19(1):67-37

Background

Association studies using a single type of omics data have been successful in identifying disease-associated genetic markers, but the underlying mechanisms are unaddressed. To provide a possible explanation of how these genetic factors affect the disease phenotype, integration of multiple omics data is needed.

Results

We propose a novel method, LIPID (likelihood inference proposal for indirect estimation), that uses both single nucleotide polymorphism (SNP) and DNA methylation data jointly to analyze the association between a trait and SNPs. The total effect of SNPs is decomposed into direct and indirect effects, where the indirect effects are the focus of our investigation. Simulation studies show that LIPID performs better in various scenarios than existing methods. Application to the GAW20 data also leads to encouraging results, as the genes identified appear to be biologically relevant to the phenotype studied.
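LIPID itself is likelihood-based and its model is not detailed in the abstract; the sketch below only illustrates the classical product-of-coefficients view of the same decomposition, where the indirect effect of a SNP acting through methylation is a·b, on simulated data:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    snp = rng.integers(0, 3, n).astype(float)        # genotype coded 0/1/2
    meth = 0.4 * snp + rng.normal(0, 1, n)           # methylation partly driven by the SNP
    trait = 0.3 * snp + 0.5 * meth + rng.normal(0, 1, n)

    def ols(y, *cols):
        """Least-squares slopes (intercept dropped)."""
        X = np.column_stack([np.ones(n), *cols])
        return np.linalg.lstsq(X, y, rcond=None)[0][1:]

    (a,) = ols(meth, snp)                            # SNP -> methylation
    c_direct, b = ols(trait, snp, meth)              # direct effect; methylation effect
    (c_total,) = ols(trait, snp)                     # total SNP effect
    print(f"direct={c_direct:.3f}  indirect={a * b:.3f}  total={c_total:.3f}")
    # for linear models, total ≈ direct + indirect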

Conclusions

The proposed LIPID method is shown to be meritorious in extensive simulations and in real-data analyses.

5.

Introduction

Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.

Objectives

(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.

Methods

A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.

Results

Journals that support data sharing are not necessarily those with the most papers associated with open metabolomics data.

Conclusion

Further efforts are required to improve data sharing in metabolomics.

6.

Background

The post-genome era is producing diverse categories of omics data. Inference and analysis of genetic regulatory networks play a prominent role in extracting inherent mechanisms, discovering and interpreting the biological principles that underlie complex phenomena, and, ultimately, promoting human well-being.

Results

A supervised combinatorial-optimization approach based on information and signal-processing theories is introduced for the inference and analysis of genetic regulatory networks. An associativity measure is proposed to define regulatory strength/connectivity, and a phase-shift metric determines the regulatory directions among components of the reconstructed networks, resolving the lack of directionality that limits most current linear/nonlinear relevance methods. To control computational and topological redundancy, we constrain the size of classified groups of candidate pairs within a multiobjective combinatorial optimization (MOCO) scheme.
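The associativity and phase-shift measures are not defined in the abstract; as a stand-in, the sketch below uses lagged Pearson correlation to show how a phase shift can orient an edge, reading the lag with the strongest correlation as an indication of which gene leads:

    import numpy as np

    def best_lag(x, y, max_lag):
        """Lag maximizing |Pearson r| between x[t] and y[t + lag]; the sign gives direction."""
        best = (0, np.corrcoef(x, y)[0, 1])
        for lag in range(1, max_lag + 1):
            for l, a, b in ((lag, x[:-lag], y[lag:]), (-lag, x[lag:], y[:-lag])):
                r = np.corrcoef(a, b)[0, 1]
                if abs(r) > abs(best[1]):
                    best = (l, r)
        return best

    t = np.arange(80, dtype=float)
    g1 = np.sin(t / 5.0)                                         # putative regulator
    g2 = np.roll(g1, 3) + 0.05 * np.random.default_rng(1).normal(size=80)
    lag, r = best_lag(g1, g2, max_lag=10)
    print(lag, round(r, 3))   # a positive lag is read as g1 leading (regulating) g2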

Conclusions

We test the proposed approach on two real-world microarray datasets with different statistical characteristics, revealing the inherent design mechanisms of genetic networks by quantitative means and thereby facilitating further theoretical analysis and experimental design for diverse research purposes. Qualitative comparisons with other methods, and related issues requiring further work, are presented in the discussion section.

7.

Introduction

Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.

Objectives

Assessment of the quality of raw data processing in untargeted metabolomics.

Methods

Five published untargeted metabolomics studies were reanalyzed.

Results

Omissions of at least 50 relevant compounds from the original results, as well as examples of representative mistakes, were reported for each study.

Conclusion

Incomplete raw data processing reveals unexplored potential in current and legacy data.

8.

Background

An accumulation of evidence has revealed the important role of epigenetic factors in explaining the etiopathogenesis of human diseases. Several empirical studies have successfully incorporated methylation data into models for disease prediction. However, it is still a challenge to integrate different types of omics data into prediction models, and the contribution of methylation information to prediction remains to be fully clarified.

Results

A stratified drug-response prediction model was built based on an artificial neural network to predict the change in the circulating triglyceride level after fenofibrate intervention. Associated single-nucleotide polymorphisms (SNPs), methylation of selected cytosine-phosphate-guanine (CpG) sites, age, sex, and smoking status were included as predictors. The model with selected SNPs achieved a mean 5-fold cross-validation prediction error rate of 43.65%. After adding methylation information into the model, the error rate dropped to 41.92%. The combination of significant SNPs, CpG sites, age, sex, and smoking status achieved the lowest prediction error rate of 41.54%.
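A minimal sketch of such a stratified predictor, assuming a small feed-forward network over concatenated SNP, CpG, and covariate blocks (the paper's architecture, feature counts, and data are not given; everything below is toy data for illustration):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n = 300
    snps = rng.integers(0, 3, (n, 20)).astype(float)      # selected SNP genotypes
    cpgs = rng.uniform(0, 1, (n, 10))                     # methylation beta values
    covs = np.column_stack([rng.uniform(20, 70, n),       # age
                            rng.integers(0, 2, (n, 2))])  # sex, smoking status
    X = np.hstack([snps, cpgs, covs])
    y = (X[:, 0] + 2 * X[:, 20] + rng.normal(0, 1, n) > 1.5).astype(int)  # responder label

    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
    err = 1 - cross_val_score(clf, X, y, cv=5).mean()     # 5-fold CV error, as in the abstract
    print(f"CV error rate: {err:.2%}")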

Conclusions

Compared to using SNP data only, adding methylation data to prediction models slightly reduced the error rate; a further reduction in prediction error is achieved by combining genomic, methylation, and environmental factors.

9.

Introduction

Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need for multiple software packages for each step of the processing workflow.

Objectives

To merge the steps required for metabolomics data processing into a single platform.

Methods

KniMet is a workflow for the processing of mass spectrometry metabolomics data based on the KNIME Analytics Platform.

Results

The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.
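KniMet is a KNIME workflow rather than a script, but the listed steps can be made concrete with a toy miniature of the same pipeline (the filtering, imputation, and batch-correction rules below are common conventions, not necessarily KniMet's defaults):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.lognormal(size=(8, 5)),
                      columns=[f"mz_{i}" for i in range(5)])   # samples x features
    df.iloc[1, 2] = np.nan                                     # a missing intensity
    batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    df = df.loc[:, df.isna().mean() < 0.5]         # 1. feature filtering (<50% missing)
    df = df.fillna(df.min() / 2)                   # 2. imputation (half-minimum rule)
    df = df.div(df.sum(axis=1), axis=0)            # 3. normalization (total intensity)
    df = df - df.groupby(batch).transform("mean") + df.mean()  # 4. simple batch centering
    print(df.round(3))                             # 5. annotation would follow here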

Conclusion

KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.

10.

Background

In recent years, the visualization of biomagnetic measurement data by so-called pseudo current density maps, or Hosaka-Cohen (HC) transformations, has become popular.

Methods

The physical basis of these intuitive maps is clarified by means of analytically solvable problems.
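A toy numerical illustration, assuming the commonly quoted form of the transformation in which the pseudo current is the in-plane rotation of the gradient of the normal field component, c = ∇Bz × êz:

    import numpy as np

    x, y = np.meshgrid(np.linspace(-1, 1, 50), np.linspace(-1, 1, 50))
    Bz = x * np.exp(-8 * (x**2 + y**2))       # toy map of the normal field component
    dBz_dy, dBz_dx = np.gradient(Bz)          # axis 0 = rows = y, axis 1 = columns = x
    cx, cy = dBz_dy, -dBz_dx                  # pseudo current density c = grad(Bz) x ez
    print(float(np.hypot(cx, cy).max()))      # the arrow map (cx, cy) would be plotted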

Results

Examples in magnetocardiography, magnetoencephalography and magnetoneurography demonstrate the usefulness of this method.

Conclusion

Hardware realizations of the HC transformation and some similar transformations are discussed; these could advantageously support cross-platform comparability of biomagnetic measurements.

11.

Background

Security concerns have been raised since big data became a prominent tool in data analysis. For instance, many machine learning algorithms aim to generate prediction models using training data that contain sensitive information about individuals. The cryptography community regards secure computation as a solution for privacy protection, and practical requirements have triggered research on the efficiency of cryptographic primitives.

Methods

This paper presents a method to train a logistic regression model without information leakage. We apply the homomorphic encryption scheme of Cheon et al. (ASIACRYPT 2017) for efficient arithmetic over real numbers, and devise a new encoding method to reduce the storage requirements of the encrypted database. In addition, we adapt Nesterov's accelerated gradient method to reduce the number of iterations, and thus the computational cost, while maintaining the quality of the output classifier.
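The encrypted arithmetic is the paper's contribution and is omitted here, but the optimization piece can be sketched in the clear: Nesterov's accelerated gradient for logistic regression, which matters under homomorphic encryption because fewer iterations mean fewer costly encrypted operations (plaintext toy data; not the iDASH dataset):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nesterov_logreg(X, y, lr=0.5, iters=30):
        """y in {0, 1}; X is assumed to include a bias column."""
        w = np.zeros(X.shape[1])
        v, t = w.copy(), 1.0
        for _ in range(iters):
            grad = X.T @ (sigmoid(X @ v) - y) / len(y)
            w_next = v - lr * grad
            t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
            v = w_next + ((t - 1) / t_next) * (w_next - w)   # momentum look-ahead
            w, t = w_next, t_next
        return w

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(200), rng.normal(size=(200, 18))])  # 18 features + bias
    y = (sigmoid(X @ rng.normal(size=19)) > rng.uniform(size=200)).astype(float)
    w = nesterov_logreg(X, y)
    print(f"training accuracy: {((sigmoid(X @ w) > 0.5) == y).mean():.2%}")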

Results

Our method shows state-of-the-art performance for a homomorphic encryption system in a real-world application. The submission based on this work was selected as the best solution of Track 3 at the iDASH privacy and security competition 2017. For example, it took about six minutes to obtain a logistic regression model from a dataset of 1579 samples, each with 18 features and a binary outcome variable.

Conclusions

We present a practical solution for outsourcing analysis tools such as logistic regression analysis while preserving the data confidentiality.

12.

Background

Recently, measuring phenotype similarity has begun to play an important role in disease diagnosis, and researchers have turned their attention to developing phenotype similarity measures. However, existing methods ignore the interactions between phenotype-associated proteins, which may lead to inaccurate phenotype similarity estimates.

Results

We propose a network-based method, PhenoNet, to calculate the similarity between phenotypes. We localize phenotypes in the network and calculate the similarity between phenotype-associated modules by modeling both inter- and intra-module similarity.
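PhenoNet's exact inter- and intra-similarity terms are not given in the abstract; the sketch below shows one plausible reading, scoring two phenotype modules by their mean transformed shortest-path proximity in the network (inter term) normalized by the cohesion of each module (intra term):

    import networkx as nx
    import numpy as np

    def set_sim(G, A, B):
        """Mean proximity 1/(1+d) over all node pairs drawn from sets A and B."""
        d = dict(nx.all_pairs_shortest_path_length(G))
        return float(np.mean([1.0 / (1 + d[a].get(b, np.inf)) for a in A for b in B]))

    def phenotype_sim(G, A, B):
        inter = set_sim(G, A, B)
        intra = 0.5 * (set_sim(G, A, A) + set_sim(G, B, B))
        return inter / intra if intra else 0.0   # normalize inter by intra cohesion

    G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (5, 6)])  # toy protein network
    print(round(phenotype_sim(G, {1, 2}, {5, 6}), 3))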

Conclusions

PhenoNet was evaluated on two independent evaluation datasets: Gene Ontology and gene expression data. The results show that PhenoNet performs better than state-of-the-art methods on all evaluation tests.

13.

Introduction

Collecting feces is easy, and feces offer direct access to endogenous and microbial metabolites.

Objectives

Given the lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol for untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol was developed, involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a filtration step (10 kDa).

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.

14.

Background

In recent years, knowledge of drugs, disease phenotypes, and proteins has accumulated rapidly, and more and more scientists have turned their attention to inferring drug-disease associations by computational methods. Developing an integrated approach that systematically discovers drug-disease associations from these data is therefore an important issue.

Methods

We combine three different networks, covering drug, genomic, and disease-phenotype data, and assign weights to the edges based on available experimental data and knowledge. Given a specific disease, we use our network propagation approach to infer drug-disease associations.
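A minimal sketch of network propagation in the random-walk-with-restart form (the paper's heterogeneous network construction and edge weighting are not reproduced; W is any symmetric weighted adjacency matrix and p0 marks the query disease's seed node):

    import numpy as np

    def rwr(W, p0, restart=0.5, tol=1e-8):
        """Random walk with restart; returns steady-state visiting probabilities."""
        P = W / W.sum(axis=0, keepdims=True)     # column-normalized transition matrix
        p = p0.copy()
        while True:
            p_next = (1 - restart) * P @ p + restart * p0
            if np.abs(p_next - p).sum() < tol:
                return p_next
            p = p_next

    # toy combined network: nodes 0-1 drugs, 2-3 genes, 4 the query disease
    W = np.array([[0, 0, 1, 0, 0],
                  [0, 0, 1, 1, 0],
                  [1, 1, 0, 1, 1],
                  [0, 1, 1, 0, 1],
                  [0, 0, 1, 1, 0]], dtype=float)
    p0 = np.zeros(5); p0[4] = 1.0                # seed at the disease node
    print(rwr(W, p0)[:2])                        # propagation scores for the two drugs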

Results

We use prostate cancer and colorectal cancer as our test cases, with the manually curated drug-disease associations from the Comparative Toxicogenomics Database as our benchmark. The ranked results show that our proposed method obtains higher specificity and sensitivity, clearly outperforming previous methods. Our results also show that the method achieves higher performance with off-target information than with only primary drug targets on both test cases.

Conclusions

We clearly demonstrate the feasibility and benefits of using network-based analyses of chemical, genomic, and phenotype data to reveal drug-disease associations. The potential associations inferred by our method provide new perspectives for toxicogenomics and drug repositioning evaluation.

15.

Introduction

New platforms are emerging that enable more data providers to publish life cycle inventory data.

Background

Providing datasets that are not complete LCA models results in fragments that are difficult for practitioners to integrate and use for LCA modeling. Additionally, when proxies are used to provide a technosphere input to a process that the process authors did not originally intend, most LCA software requires modifying the original process.

Results

The use of a bridge process, which is a process created to link two existing processes, is proposed as a solution.

Discussion

Benefits of bridge processes include increasing model transparency, facilitating dataset sharing and integration without compromising the integrity and independence of the original datasets, providing a structure for making the data quality associated with process linkages explicit, and increasing model flexibility when multiple bridges are provided. A drawback is that they add processes to existing LCA models, which increases model size.

Conclusions

Bridge processes enable users to integrate new datasets without modifying them to link to background databases or other available processes. They may not be the ideal long-term solution, but they work within the existing LCA data model.

16.
Gao S, Xu S, Fang Y, Fang J. Proteome Science, 2012, 10(Z1):S7

Background

Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.

Methods

A multitask learning framework that learns four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in this study. The framework includes two multitask classification methods: Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and Multi-Task Feature Selection (MT-Feat3).
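MTLS-SVMs and MT-Feat3 are not reproduced here, but the joint row-sparsity principle they rely on (features kept or dropped for all kinase families at once) can be sketched with scikit-learn's MultiTaskLasso as a stand-in:

    import numpy as np
    from sklearn.linear_model import MultiTaskLasso

    rng = np.random.default_rng(0)
    n, p, tasks = 200, 50, 4                     # four kinase families as four tasks
    X = rng.normal(size=(n, p))
    W_true = np.zeros((p, tasks))
    W_true[:8] = rng.normal(size=(8, tasks))     # 8 features shared by all tasks
    Y = X @ W_true + 0.1 * rng.normal(size=(n, tasks))

    model = MultiTaskLasso(alpha=0.1).fit(X, Y)  # L2,1 penalty: whole features zeroed
    shared = np.where(np.any(model.coef_ != 0, axis=0))[0]
    print(shared)                                # indices of jointly selected features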

Results

Using the multitask learning framework, we successfully identify 18 common features shared by the four kinase families of phosphorylation sites. The reliability of the selected features is demonstrated by their consistent performance across the two multitask learning methods.

Conclusions

The selected features can be used to build efficient multitask classifiers with good performance, suggesting that they are important to protein phosphorylation across the four kinase families.

17.

Introduction

Mass spectrometry imaging (MSI) experiments result in complex multi-dimensional datasets, which require specialist data analysis tools.

Objectives

We have developed massPix—an R package for analysing and interpreting data from MSI of lipids in tissue.

Methods

massPix produces single ion images, performs multivariate statistics and provides putative lipid annotations based on accurate mass matching against generated lipid libraries.

Results

Classification of tissue regions with high spectral similarity can be carried out by principal components analysis (PCA) or k-means clustering.
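massPix is an R package, so the sketch below only illustrates the segmentation concept rather than its API: k-means applied to the pixel spectra of a toy MSI cube:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    cube = rng.poisson(5, size=(40, 40, 120)).astype(float)  # x, y, m/z channels
    pixels = cube.reshape(-1, cube.shape[2])                 # one spectrum per pixel
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pixels)
    segmentation = labels.reshape(cube.shape[:2])            # tissue-region map
    print(np.bincount(labels))                               # pixels per spectral cluster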

Conclusion

massPix is an open-source tool for the analysis and statistical interpretation of MSI data, and is particularly useful for lipidomics applications.

18.
Fan Yetian, Tang Xiwei, Hu Xiaohua, Wu Wei, Ping Qing. BMC Bioinformatics, 2017, 18(13):470-21

Background

Essential proteins are indispensable to the survival and development of living organisms. To understand the functional mechanisms of essential proteins, which can be applied to the analysis of disease and the design of drugs, it is important first to identify essential proteins from a set of proteins. As traditional experimental methods for identifying essential proteins are usually expensive and laborious, computational methods, which utilize biological and topological features of proteins, have attracted more attention in recent years. Protein-protein interaction networks, together with other biological data, have been explored to improve the performance of essential protein prediction.

Results

The proposed method SCP is evaluated on Saccharomyces cerevisiae datasets and compared with five other methods. The results show that our method SCP outperforms the other five methods in terms of accuracy of essential protein prediction.

Conclusions

In this paper, we propose a novel algorithm named SCP, which combines a ranking by a modified PageRank algorithm based on subcellular compartment information with a ranking by the Pearson correlation coefficient (PCC) calculated from gene expression data. Experiments show that subcellular localization information is promising for boosting essential protein prediction.
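A minimal sketch of the two ingredients SCP combines, with invented details: a personalized PageRank whose prior favors proteins in weighted subcellular compartments, plus a co-expression score from the Pearson correlation with network neighbors, merged by a simple (uncalibrated) weighted sum:

    import networkx as nx
    import numpy as np

    G = nx.karate_club_graph()                          # stand-in for a PPI network
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(G.number_of_nodes(), 12))   # toy gene-expression profiles

    prior = {n: 2.0 if n % 3 == 0 else 1.0 for n in G}  # mock compartment weighting
    pr = nx.pagerank(G, personalization=prior)

    def pcc_score(n):
        """Mean |PCC| between a protein's expression and its network neighbors'."""
        return float(np.mean([abs(np.corrcoef(expr[n], expr[m])[0, 1]) for m in G[n]]))

    alpha = 0.5                                          # illustrative mixing weight
    score = {n: alpha * pr[n] + (1 - alpha) * pcc_score(n) for n in G}
    print(sorted(score, key=score.get, reverse=True)[:5])  # top candidate proteins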

19.

Background

Centrifugation is an indispensable procedure for plasma sample preparation, but the conditions applied can vary between labs.

Aim

To determine whether routinely used plasma centrifugation protocols (1500×g, 10 min; 3000×g, 5 min) influence non-targeted metabolomic analyses.

Methods

Nuclear magnetic resonance spectroscopy (NMR) and high-resolution mass spectrometry (HRMS) data were evaluated with sparse partial least squares discriminant analysis and compared with cell count measurements.

Results

Besides significant differences in platelet count, we identified substantial alterations in NMR and HRMS data related to the different centrifugation protocols.

Conclusion

Even minor differences in plasma centrifugation can significantly influence metabolomic patterns and potentially bias metabolomics studies.

20.

Background

Combinations of different agents are widely used in the clinic to combat complex diseases, improving therapy and reducing side effects. However, identifying effective drug combinations remains a challenging task because the huge number of possible combinations among candidate drugs makes screening putative combinations impractical.

Results

In this work, we construct a 'drug cocktail network' using all the known effective drug combinations extracted from the Drug Combination Database (DCDB), and propose a network-based approach to investigate drug combinations. Our results show that the agents in an effective combination tend to have more similar therapeutic effects and to share more interaction partners. Based on these observations, we further develop a statistical approach termed DCPred (Drug Combination Predictor) to predict possible drug combinations by exploiting the topological features of the drug cocktail network. Validated on the known drug combinations, DCPred achieves an overall AUC (area under the receiver operating characteristic curve) score of 0.92, indicating the predictive power of our approach.
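The statistical model behind DCPred is richer than this, but its network intuition (drugs in effective combinations share interaction partners) can be sketched by scoring a candidate pair with the Jaccard index of its neighborhoods in the cocktail network:

    import networkx as nx

    combos = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
    G = nx.Graph(combos)                                  # toy drug cocktail network

    def pair_score(G, u, v):
        """Jaccard overlap of the two drugs' interaction partners."""
        Nu, Nv = set(G[u]) - {v}, set(G[v]) - {u}
        union = Nu | Nv
        return len(Nu & Nv) / len(union) if union else 0.0

    for u, v in [("A", "D"), ("B", "E"), ("A", "E")]:     # candidate combinations
        print(u, v, round(pair_score(G, u, v), 2))        # higher = more promising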

Conclusions

The drug cocktail network constructed in this work provides useful insights into the underlying rules of effective drug combinations and offers important clues for accelerating the future discovery of new drug combinations.
