Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Over the last decade, the field of Artificial Intelligence (AI) has evolved extensively. Modern radiation oncology relies on advanced computational methods aimed at personalization and high diagnostic and therapeutic precision. The quantity of available imaging data and rapid developments in Machine Learning (ML), particularly Deep Learning (DL), have triggered research into uncovering “hidden” biomarkers and quantitative features from anatomical and functional medical images. Deep Neural Networks (DNNs) have achieved outstanding performance and broad adoption in image processing tasks. Lately, DNNs have been considered for radiomics, and their potential for explainable AI (XAI) may help classification and prediction in clinical practice. However, most of these studies use limited datasets and lack general applicability. In this study we review the basics of radiomics feature extraction, DNNs in image analysis, and the major interpretability methods that help enable explainable AI. Furthermore, we discuss the crucial need for multicenter recruitment of large datasets, which increases biomarker variability, in order to establish the potential clinical value of radiomics and to develop robust explainable AI models.

2.
Deep learning (DL) is one of the most powerful data-driven machine-learning techniques in artificial intelligence (AI). It can automatically learn from raw data without manual feature selection. DL models have led to remarkable advances in data extraction and analysis for medical imaging. Magnetic resonance imaging (MRI) has proven useful in delineating the characteristics and extent of breast lesions and tumors. This review summarizes the current state-of-the-art applications of DL models in breast MRI. Many recent DL models in this field are examined, along with several advanced learning approaches and methods for data normalization and for breast and lesion segmentation. For clinical applications, DL-based breast MRI models have proven useful in five aspects: diagnosis of breast cancer, classification of molecular types, classification of histopathological types, prediction of neoadjuvant chemotherapy response, and prediction of lymph node metastasis. For subsequent studies, further improvement in data acquisition and preprocessing is necessary, additional DL techniques in breast MRI should be investigated, and wider clinical applications need to be explored.

3.
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has brought tremendous progress in residue contact prediction, so a comprehensive assessment of current methods on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods have different application scenarios: (1) DL methods based on multiple categories of inputs and large training sets are the best choices for low-contact-density proteins, such as intrinsically disordered ones, and for proteins with shallow multiple sequence alignments (MSAs). (2) With at least 5L (L is sequence length) effective sequences in the MSA, all the methods show their best performance, and methods that rely only on the MSA as input can reach performance comparable to methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods predict more hydrophobic interactions, while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods detect more secondary structure interactions, while DL methods accurately uncover more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods, there is still much room for improvement: (1) With shallow MSAs, performance is greatly affected. (2) Current methods show lower precision for inter-domain than for intra-domain contact predictions, as well as very high imbalances in precision among intra-domain predictions. (3) Strong prediction similarities between DL methods indicate that more feature types and more diversified models need to be developed. (4) The runtime of most methods can be further optimized.
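The "top L/5" and "top L/2" figures mentioned in this abstract refer to the precision of the highest-scoring L/k predicted contacts, where L is the sequence length. The following is a minimal sketch of that metric, not the evaluation code used in the study. The function name, the dictionary-based input format, and the minimum sequence separation of 6 residues are assumptions made for illustration.

```python
def top_k_precision(scores, true_contacts, seq_len, divisor=5, min_separation=6):
    """Precision of the top L/divisor predicted contacts.

    scores: dict mapping residue pairs (i, j), with i < j, to predicted scores.
    true_contacts: set of (i, j) pairs that are contacts in the reference structure.
    min_separation: assumed cutoff that ignores pairs close together in sequence.
    """
    # Keep only pairs that are far enough apart along the sequence.
    candidates = [
        (pair, score)
        for pair, score in scores.items()
        if pair[1] - pair[0] >= min_separation
    ]
    # Rank the remaining pairs from highest to lowest predicted score.
    candidates.sort(key=lambda item: item[1], reverse=True)

    k = max(1, seq_len // divisor)
    top = candidates[:k]
    if not top:
        return 0.0

    hits = sum(1 for pair, _ in top if pair in true_contacts)
    return hits / len(top)


# Toy example with a hypothetical 10-residue protein.
example_scores = {(0, 8): 0.9, (1, 9): 0.8, (2, 9): 0.7, (0, 1): 0.99}
example_truth = {(0, 8)}
precision_l5 = top_k_precision(example_scores, example_truth, seq_len=10, divisor=5)
```

Calling the function with `divisor=2` gives the top L/2 precision on the same inputs.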

4.
Purpose

Artificial intelligence (AI) models are playing an increasing role in biomedical research and healthcare services. This review focuses on challenging points that need to be clarified in order to develop AI applications as clinical decision support systems in a real-world context.

Methods

A narrative review was performed, including a critical assessment of articles published between 1989 and 2021 that guided the sections on these challenges.

Results

We first illustrate the architectural characteristics of machine learning (ML)/radiomics and deep learning (DL) approaches. For ML/radiomics, the phases of feature selection and of training, validation, and testing are described. DL models are presented as multi-layered artificial/convolutional neural networks that can process images directly. The data curation section includes technical steps such as image labelling, image annotation (with segmentation as a crucial step in radiomics), data harmonization (enabling compensation for differences in imaging protocols that typically generate noise in non-AI imaging studies) and federated learning. Thereafter, we dedicate specific sections to: sample size calculation, considering multiple testing in AI approaches; procedures for data augmentation to work with limited and unbalanced datasets; and the interpretability of AI models (the so-called black box issue). Finally, the pros and cons of choosing ML versus DL to implement AI applications in medical imaging are presented in a synoptic way.

Conclusions

Biomedicine and healthcare systems are among the most important fields for AI applications, and medical imaging is probably the most suitable and promising domain. Clarifying specific challenging points facilitates the development of such systems and their translation to clinical practice.

5.
《IRBM》2022,43(1):62-74
Background

The prediction of breast cancer subtypes plays a key role in the diagnosis and prognosis of breast cancer. In recent years, deep learning (DL) has shown good performance in the intelligent prediction of breast cancer subtypes. However, most traditional DL models use single-modality data, from which only a few features can be extracted, so they cannot establish a stable relationship between patient characteristics and breast cancer subtypes.

Dataset

We used the TCGA-BRCA dataset as a sample set for molecular subtype prediction of breast cancer. It is a public dataset that can be obtained through the following link: https://portal.gdc.cancer.gov/projects/TCGA-BRCA

Methods

In this paper, a Hybrid DL model based on multimodal data is proposed. We combine the patient's gene modality data with image modality data to construct a multimodal fusion framework. Because the two modalities differ in form and state, we set up a separate feature extraction network for each, and then fuse the outputs of the two feature networks based on the idea of weighted linear aggregation. Finally, the fused features are used to predict breast cancer subtypes. In particular, we use principal component analysis to reduce the dimensionality of the high-dimensional gene modality data and filter the image modality data. In addition, we improve the traditional feature extraction network so that it performs better.

Results

The results show that, compared with the traditional DL model, the Hybrid DL model proposed in this paper is more accurate and efficient in predicting breast cancer subtypes. Our model achieved a prediction accuracy of 88.07% in 10 repetitions of 10-fold cross-validation. We ran a separate AUC test for each subtype, and the average AUC value obtained was 0.9427. In terms of subtype prediction accuracy, our model is about 7.45% higher than the previous average.
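The fusion step described in this abstract, weighted linear aggregation of two feature vectors, can be illustrated with a short sketch. This is not the authors' implementation: the function name, the single `gene_weight` parameter, and the requirement that both feature vectors have the same length are assumptions made to keep the example minimal.

```python
def weighted_linear_fusion(gene_features, image_features, gene_weight=0.5):
    """Fuse two equal-length feature vectors as a convex combination.

    gene_weight is a hypothetical mixing coefficient in [0, 1];
    the image modality receives the complementary weight 1 - gene_weight.
    """
    if len(gene_features) != len(image_features):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= gene_weight <= 1.0:
        raise ValueError("gene_weight must lie in [0, 1]")

    image_weight = 1.0 - gene_weight
    # Element-wise weighted sum of the two modality embeddings.
    return [
        gene_weight * g + image_weight * m
        for g, m in zip(gene_features, image_features)
    ]


# Toy embeddings standing in for the outputs of the two feature networks.
fused = weighted_linear_fusion([1.0, 0.0], [0.0, 1.0], gene_weight=0.25)
```

In practice the mixing weight would more likely be learned or tuned on validation data than fixed by hand.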

6.
Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. While high-dimensional genomic data provide valuable resources for investigating disease risk, their large amount of noise and the complex relationships between predictors and outcomes bring tremendous analytical challenges. Deep learning models are state-of-the-art methods for many prediction tasks, and they are a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their applications. In this work, we have developed a deep neural network (DNN) based prediction modeling framework. We first propose a group-wise feature importance score for feature selection, with which genes harboring genetic variants with both linear and non-linear effects are efficiently detected. We then design an explainable transfer-learning based DNN method, which can directly incorporate information from feature selection and accurately capture complex predictive effects. The proposed DNN framework is biologically interpretable, as it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. Through extensive simulations and real data analyses, we demonstrate that, compared with many existing methods, our proposed method can not only efficiently detect predictive features but also accurately predict disease risk.

7.
Crop health monitoring and weed removal are two crucial elements of efficient, productive and resilient cultivation. Frequent attacks by pests and pathogens cause crops to become diseased, degrading the quality and quantity of production. Continuous monitoring of crop health is challenging and requires the involvement of information and communication technologies (ICT). The outcome is precision agriculture, in which the Internet of Things (IoT) and Artificial Intelligence (AI) techniques are vital ingredients. This work discusses the design of an integrated approach to precision agriculture based on IoT and AI, which is tailored for real-time crop health monitoring and performs various other operations such as weed detection, ambient air sensing, automatic watering of the vegetation at regular intervals, and spraying of pesticides. The proposed system combines an IoT formed using sensors and devices with image processing and machine learning (ML)/deep learning (DL) techniques, and is confined to the cultivation of fifteen varieties of beans found in India. The work involves two intelligent learning models configured to capture spatio-temporal attributes of image samples and sensor inputs for real-time discrimination between healthy and diseased bean leaves, for detection of weeds growing around the cultivation land, and for process control. The first approach employs a DL structure named EfficientNetB7 along with a Bidirectional Long Short Term Memory (BiLSTM), while the second adopts a VGG16 with an integrated attention mechanism. Experiments have also been carried out using benchmark ML classifiers such as Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP) and Time Delay Neural Network (TDNN) combined with feature extraction techniques. Segmentation methods have been used to separate out the diseased sections of the leaves, which are then used as a priori labels for the classifiers to reinforce the previously known details of the bean varieties. Subsequently, the trained networks are tested with bean leaf samples collected from cultivation farms. Results show that our proposed DL models could accurately predict the health state of the bean leaves with less computation time. With an automated approach to bean leaf health discrimination, weed detection and process control, the cost effectiveness of the overall effort is enhanced. Further, the sensor pack also provides precise thresholds at which water sprinkling could be initiated, resulting in water conservation.

8.
9.
Huo  Zhiguang  Zhu  Li  Ma  Tianzhou  Liu  Hongcheng  Han  Song  Liao  Daiqing  Zhao  Jinying  Tseng  George 《Statistics in biosciences》2020,12(1):1-22

Disease subtype discovery is an essential step in delivering personalized medicine. Disease subtyping via omics data has become a common approach for this purpose. With the advancement of technology and the lower price for generating omics data, multi-level and multi-cohort omics data are prevalent in the public domain, providing unprecedented opportunities to decrypt disease mechanisms. How to fully utilize multi-level/multi-cohort omics data and incorporate established biological knowledge toward disease subtyping remains a challenging problem. In this paper, we propose a meta-analytic integrative sparse Kmeans (MISKmeans) algorithm for integrating multi-cohort/multi-level omics data and prior biological knowledge. Compared with previous methods, MISKmeans shows better clustering accuracy and feature selection relevancy. An efficient R package, “MIS-Kmeans”, calling C++ is freely available on GitHub (https://github.com/Caleb-Huo/MIS-Kmeans).


10.
The evolution of omics and computational competency has accelerated discoveries of the underlying biological processes in an unprecedented way. High-throughput methodologies, such as flow cytometry, can reveal deeper insights into cell processes, thereby allowing opportunities for scientific discoveries related to health and diseases. However, working with cytometry data often imposes complex computational challenges due to the high dimensionality, large size, and nonlinearity of the data structure. In addition, cytometry data frequently exhibit diverse patterns across biomarkers and suffer from substantial class imbalances, which can further complicate the problem. The existing methods of cytometry data analysis either predict cell populations or perform feature selection. Through this study, we propose a “wisdom of the crowd” approach to simultaneously predict rare cell populations and perform feature selection by integrating a pool of modern machine learning (ML) algorithms. Because our approach integrates top-performing ML models across different normalization techniques based on entropy and rank, our method can detect diverse patterns existing across the model features. Furthermore, the method identifies a dynamic biomarker structure that divides the features into persistently selected, unselected, and fluctuating assemblies, indicating the role of each biomarker in rare cell prediction, which can subsequently aid in studies of disease progression.
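The three-way division of biomarkers described in this abstract (persistently selected, unselected, fluctuating) can be sketched as a simple tally over the feature sets chosen by each model in the pool. This is an illustrative assumption about how such a grouping could be computed, not the authors' procedure: the function name, the all-or-none thresholds, and the marker names in the example are hypothetical.

```python
def categorize_features(all_features, selections):
    """Group features by how consistently a pool of models selected them.

    all_features: iterable of feature names.
    selections: list of sets, one per model run, holding the selected features.
    Assumed rule: selected by every run = persistent, by no run = unselected,
    anything in between = fluctuating.
    """
    if not selections:
        raise ValueError("at least one selection set is required")

    n_runs = len(selections)
    # Count in how many runs each feature was selected.
    counts = {f: sum(1 for s in selections if f in s) for f in all_features}

    return {
        "persistent": sorted(f for f, c in counts.items() if c == n_runs),
        "unselected": sorted(f for f, c in counts.items() if c == 0),
        "fluctuating": sorted(f for f, c in counts.items() if 0 < c < n_runs),
    }


# Hypothetical marker names and selections from three models.
groups = categorize_features(
    ["CD3", "CD4", "CD8", "CD19"],
    [{"CD3", "CD4"}, {"CD3", "CD8"}, {"CD3"}],
)
```

A softer rule, for example treating a feature as persistent when it is selected in most runs, would only require changing the two thresholds.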

11.
The recent increase in the high-throughput capacity of ‘omics datasets, together with advances and interest in machine learning (ML), has created great opportunities for systems metabolic engineering. In this regard, data-driven modeling methods have become increasingly valuable to metabolic strain design. In this review, the nature of ‘omics is discussed and a broad introduction is provided to the ML algorithms that combine these datasets into predictive models of metabolism and metabolic rewiring. Next, this review highlights recent work in the literature that uses such data-driven methods to inform various metabolic engineering efforts for different classes of application, including product maximization, understanding and profiling phenotypes, de novo metabolic pathway design, and creation of robust system-scale models for biotechnology. Overall, this review aims to highlight the potential and promise of using ML algorithms with metabolic engineering and systems biology related datasets.

12.
13.
Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

14.

Background

Large-scale collaborative precision medicine initiatives (e.g., The Cancer Genome Atlas (TCGA)) are yielding rich multi-omics data. Integrative analyses of the resulting multi-omics data, such as somatic mutation, copy number alteration (CNA), DNA methylation, miRNA, gene expression, and protein expression, offer tantalizing possibilities for realizing the promise and potential of precision medicine in cancer prevention, diagnosis, and treatment by substantially improving our understanding of underlying mechanisms as well as the discovery of novel biomarkers for different types of cancers. However, such analyses present a number of challenges, including the heterogeneity and high dimensionality of omics data.

Methods

We propose a novel framework for multi-omics data integration using multi-view feature selection. We introduce a novel multi-view feature selection algorithm, MRMR-mv, an adaptation of the well-known Min-Redundancy and Maximum-Relevance (MRMR) single-view feature selection algorithm to the multi-view setting.

Results

We report results of experiments using an ovarian cancer multi-omics dataset derived from the TCGA database on the task of predicting ovarian cancer survival. Our results suggest that multi-view models outperform both view-specific models (i.e., models trained and tested using a single type of omics data) and models based on two baseline data fusion methods.

Conclusions

Our results demonstrate the potential of multi-view feature selection in integrative analyses and predictive modeling from multi-omics data.
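For readers unfamiliar with the single-view MRMR criterion that MRMR-mv adapts, the sketch below shows a greedy selection that trades relevance to the target against redundancy with already selected features. It is not the MRMR-mv algorithm from this abstract, whose multi-view details are not given here. As a simplifying assumption, absolute Pearson correlation stands in for mutual information, and the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(features, target, n_select):
    """Greedy max-relevance, min-redundancy selection.

    features: dict mapping feature name to a list of values.
    Score = |corr(feature, target)| - mean |corr(feature, selected)|.
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" nearly duplicates "a", while "c" carries independent signal.
toy_features = {
    "a": [1, -1, 1, -1, 1, -1, 1, -1],
    "b": [1, -1, 1, -1, 1, -1, 1, -0.5],
    "c": [1, 1, -1, -1, 1, 1, -1, -1],
}
toy_target = [3, -1, 1, -3, 3, -1, 1, -3]  # equals 2 * a + c
chosen = mrmr_select(toy_features, toy_target, n_select=2)
```

In this toy case the redundancy penalty leads the procedure to prefer "c" over the near-duplicate "b", even though "b" is more strongly correlated with the target on its own.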

15.
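Microbial lipids are an important potential resource for future fuels and edible oils. In recent years, with the rapid development of systems biology technologies, understanding the physiological metabolism and lipid accumulation characteristics of oleaginous microorganisms from a global perspective has become a research focus. As important tools in systems biology, omics technologies have been widely used to reveal the mechanisms underlying efficient lipid production in oleaginous microorganisms, providing a basis for rational genetic engineering and fermentation process control of these organisms. This article reviews the application of omics technologies to oleaginous microorganisms. It introduces the sample pretreatment and data analysis methods commonly used in omics analyses of oleaginous microorganisms; reviews research that uses a range of omics technologies, including genomics, transcriptomics, proteomics (and modification proteomics) and metabolomics (and lipidomics), as well as mathematical models built on omics data, to reveal the mechanisms of efficient lipid production; and discusses prospects for future development and application.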

16.
Genetic epidemiology is a rapidly advancing field due to the recent availability of large amounts of omics data. In recent years, it has become possible to obtain omics information at the single-cell level, so genetic epidemiological models need to be updated to integrate with single-cell expression data. In this perspective paper, we propose a cell population-based framework for genetic epidemiology in the single-cell era. In this framework, genetic diversity influences phenotypic diversity through the diversity of cell population profiles, which are defined as high-dimensional probability distributions of the state spaces of biomolecules of each omics layer. We discuss how biomolecular experimental measurement data can capture the different properties of this distribution. In particular, single-cell data constitute a sample from this population distribution where only some coordinate values are observable. From a data analysis standpoint, we introduce methodology for feature extraction from cell population profiles. Finally, we discuss how this framework can be applied not only to genetic epidemiology but also to systems biology.

17.
Advancements in sequencing have led to the proliferation of multi-omic profiles of human cells under different conditions and perturbations. In addition, many databases have amassed information about pathways and gene “signatures”, that is, patterns of gene expression associated with specific cellular and phenotypic contexts. An important current challenge in systems biology is to leverage such knowledge about gene coordination to maximize the predictive power and generalization of models applied to high-throughput datasets. However, few such integrative approaches exist that also provide interpretable results quantifying the importance of individual genes and pathways to model accuracy. We introduce AKLIMATE, a first kernel-based stacked learner that seamlessly incorporates multi-omics feature data with prior information in the form of pathways for either regression or classification tasks. AKLIMATE uses a novel multiple-kernel learning framework where individual kernels capture the prediction propensities recorded in random forests, each built from a specific pathway gene set that integrates all omics data for its member genes. AKLIMATE has comparable or improved performance relative to state-of-the-art methods on diverse phenotype learning tasks, including predicting microsatellite instability in endometrial and colorectal cancer, survival in breast cancer, and cell line response to gene knockdowns. We show how AKLIMATE is able to connect feature data across data platforms through their common pathways to identify examples of several known and novel contributors to cancer and synthetic lethality.

18.
In the area of omics profiling in toxicology, i.e. toxicogenomics, characteristic molecular profiles have previously been incorporated into prediction models for early assessment of a carcinogenic potential and mechanism-based classification of compounds. Traditionally, the biomarker signatures used for model construction were derived from individual high-throughput techniques, such as microarrays designed for monitoring global mRNA expression. In this study, we built predictive models by integrating omics data across complementary microarray platforms and introduced new concepts for modeling of pathway alterations and molecular interactions between multiple biological layers. We trained and evaluated diverse machine learning-based models, differing in the incorporated features and learning algorithms on a cross-omics dataset encompassing mRNA, miRNA, and protein expression profiles obtained from rat liver samples treated with a heterogeneous set of substances. Most of these compounds could be unambiguously classified as genotoxic carcinogens, non-genotoxic carcinogens, or non-hepatocarcinogens based on evidence from published studies. Since mixed characteristics were reported for the compounds Cyproterone acetate, Thioacetamide, and Wy-14643, we reclassified these compounds as either genotoxic or non-genotoxic carcinogens based on their molecular profiles. Evaluating our toxicogenomics models in a repeated external cross-validation procedure, we demonstrated that the prediction accuracy of our models could be increased by joining the biomarker signatures across multiple biological layers and by adding complex features derived from cross-platform integration of the omics data. Furthermore, we found that adding these features resulted in a better separation of the compound classes and a more confident reclassification of the three undefined compounds as non-genotoxic carcinogens.

19.
Over the last few years, deep learning (DL) approaches have been shown to outperform state-of-the-art machine learning (ML) techniques in many applications, such as vegetation forecasting, sales forecasting, weather prediction, crop yield prediction, landslide detection and even COVID-19 spread prediction. Several DL algorithms have been employed to facilitate vegetation forecasting research using Remotely Sensed (RS) data. Vegetation is an extremely important component of our global ecosystem and a necessary indicator of land cover dynamics and productivity. Vegetation phenology is influenced by lifecycle patterns, seasonality and weather conditions, leading to changes in spectral reflectance. Various kinds of relevant information, such as vegetation indices (VIs), can be extracted from RS data for vegetation forecasting. Among these, the Normalized Difference Vegetation Index (NDVI) is one of the most widely recognized indices for vegetation-related studies. This paper reviews related work on DL-based spatio-temporal vegetation forecasting using RS data over the period between 2015 and 2021. In this review, we present several DL-based studies and discuss the DL algorithms and the various sources of data used in these studies. The purpose of this work is to highlight open challenges such as spatio-temporal prediction issues, spatial and temporal non-stationarity, data fusion, hybrid approaches, deep transfer learning and large parameter requirements. We also attempt to identify the future directions and limits of DL for vegetation forecasting.
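Since the NDVI is central to this review, a short sketch of the standard definition may help: NDVI = (NIR - Red) / (NIR + Red), computed from near-infrared and red reflectance. The function name and the choice to return NaN when both bands are zero are assumptions made for this illustration.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir, red: reflectance in the near-infrared and red bands.
    Returns NaN when both bands are zero (an assumed convention for no-data pixels).
    """
    denominator = nir + red
    if denominator == 0:
        return math.nan
    return (nir - red) / denominator


# Dense green vegetation reflects strongly in NIR and weakly in red.
vegetated_pixel = ndvi(0.5, 0.1)
bare_pixel = ndvi(0.2, 0.2)
```

For non-negative reflectance values the index lies between -1 and 1, with higher values indicating denser green vegetation.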

20.