Similar Literature
20 similar documents found (search time: 31 ms)
1.
Among the main learning methods reviewed in this study and used in synthetic biology and metabolic engineering are supervised learning, reinforcement and active learning, and in vitro or in vivo learning. In the context of biosynthesis, supervised machine learning is being exploited to predict biological sequence activities, predict structures and engineer sequences, and optimize culture conditions. Active and reinforcement learning methods use training sets acquired through an iterative process generally involving experimental measurements. They are applied to design, engineer, and optimize metabolic pathways and bioprocesses. The nascent but promising developments with in vitro and in vivo learning comprise molecular circuits performing simple tasks such as pattern recognition and classification.

2.
With the rapid progress in metabolomics and sequencing technologies, more data on the metabolome of single microbes and their communities become available, revealing the potential of microorganisms to metabolize a broad range of chemical compounds. The analysis of microbial metabolomics datasets remains challenging since it inherits the technical challenges of metabolomics analysis, such as compound identification and annotation, while harboring challenges in data interpretation, such as distinguishing metabolite sources in mixed samples. This review outlines the recent advances in computational methods to analyze primary microbial metabolism: knowledge-based approaches that take advantage of metabolic and molecular networks and data-driven approaches that employ machine/deep learning algorithms in combination with large-scale datasets. These methods aim at improving metabolite identification and disentangling reciprocal interactions between microbes and metabolites. We also discuss the perspective of combining these approaches and further developments required to advance the investigation of primary metabolism in mixed microbial samples.

3.
Enzymatic substrate promiscuity is more widespread than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. To demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds achieved maximum accuracies of ~80% using ~33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. The application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.
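A minimal sketch of the kind of workflow described above, assuming a hypothetical precomputed fingerprint matrix X and binary activity labels y (both stand-ins, not the study's data): an SVM is trained on the compounds tested so far, and the untested compounds that lie closest to the decision boundary are proposed as the most informative ones to assay next.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((200, 166))                    # hypothetical fingerprint matrix (MACCS-like bits)
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)     # stand-in activity labels

    labeled = list(range(30))                     # compounds already tested
    unlabeled = list(range(30, 200))              # candidate compounds

    for round_ in range(3):
        clf = SVC(kernel="rbf").fit(X[labeled], y[labeled])
        # uncertainty sampling: smallest |decision_function| = closest to the margin
        margins = np.abs(clf.decision_function(X[unlabeled]))
        query = [unlabeled[i] for i in np.argsort(margins)[:5]]
        # in the real workflow these 5 compounds would be assayed in the lab;
        # here we simply reveal the stored labels to emulate that step
        labeled += query
        unlabeled = [i for i in unlabeled if i not in query]

In practice the query step is where the experimental measurement happens, which is what distinguishes active learning from ordinary retraining.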

4.
Machine and deep learning approaches can leverage the increasingly available massive datasets of protein sequences, structures, and mutational effects to predict variants with improved fitness. Many different approaches are being developed, but systematic benchmarking studies indicate that even though the specifics of the machine learning algorithms matter, the more important constraint comes from the availability and quality of the data used during training. In cases where little experimental data are available, unsupervised and self-supervised pre-training with generic protein datasets can still perform well after subsequent refinement via hybrid or transfer learning approaches. Overall, recent progress in this field has been staggering, and machine learning approaches will likely play a major role in future breakthroughs in protein biochemistry and engineering.
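One common pattern mentioned above is to reuse a representation learned on generic protein data and refine it with the few labels available. The sketch below is purely illustrative: embed() is a hypothetical stand-in for a frozen pre-trained sequence encoder, the toy variants and fitness values are invented, and a simple ridge regression acts as the supervised head fitted on the small labeled set.

    import numpy as np
    from sklearn.linear_model import Ridge

    def embed(seqs, dim=64):
        # hypothetical frozen encoder: replace with embeddings from a real
        # pre-trained protein language model
        rng = np.random.default_rng(0)
        table = {aa: rng.normal(size=dim) for aa in "ACDEFGHIKLMNPQRSTVWY"}
        return np.array([np.mean([table[a] for a in s], axis=0) for s in seqs])

    variants = ["MKTAYIAK", "MKTAYIAR", "MKTAYLAK", "MKSAYIAK"]   # toy sequences
    fitness  = np.array([1.0, 0.7, 0.4, 0.9])                     # toy measurements

    X = embed(variants)                        # frozen features, no fine-tuning here
    head = Ridge(alpha=1.0).fit(X, fitness)    # supervised head on the few labels
    print(head.predict(embed(["MKTAYIAA"])))   # score a new, unmeasured variant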

5.
Traditional laboratory experiments, rehabilitation clinics, and wearable sensors offer biomechanists a wealth of data on healthy and pathological movement. To harness the power of these data and make research more efficient, modern machine learning techniques are starting to complement traditional statistical tools. This survey summarizes the current usage of machine learning methods in human movement biomechanics and highlights best practices that will enable critical evaluation of the literature. We carried out a PubMed/Medline database search for original research articles that used machine learning to study movement biomechanics in patients with musculoskeletal and neuromuscular diseases. Most studies that met our inclusion criteria focused on classifying pathological movement, predicting risk of developing a disease, estimating the effect of an intervention, or automatically recognizing activities to facilitate out-of-clinic patient monitoring. We found that research studies build and evaluate models inconsistently, which motivated our discussion of best practices. We provide recommendations for training and evaluating machine learning models and discuss the potential of several underutilized approaches, such as deep learning, to generate new knowledge about human movement. We believe that cross-training biomechanists in data science and a cultural shift toward sharing of data and tools are essential to maximize the impact of biomechanics research.
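One evaluation practice of the kind such surveys recommend can be illustrated with subject-wise cross-validation: trials from the same participant should never appear in both the training and the test fold, otherwise performance is overestimated. The feature matrix, labels and subject IDs below are hypothetical placeholders, not data from any cited study.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold, cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(120, 20))              # e.g. stride features from wearable sensors
    y = rng.integers(0, 2, size=120)            # e.g. pathological vs. healthy movement
    subjects = np.repeat(np.arange(12), 10)     # 12 participants, 10 trials each

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    # GroupKFold keeps all trials of a participant inside a single fold
    scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=4), groups=subjects)
    print(scores.mean())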

6.
For many years, psychiatrists have tried to understand factors involved in response to medications or psychotherapies, in order to personalize their treatment choices. There is now a broad and growing interest in the idea that we can develop models to personalize treatment decisions using new statistical approaches from the field of machine learning and applying them to larger volumes of data. In this pursuit, there has been a paradigm shift away from experimental studies to confirm or refute specific hypotheses towards a focus on the overall explanatory power of a predictive model when tested on new, unseen datasets. In this paper, we review key studies using machine learning to predict treatment outcomes in psychiatry, ranging from medications and psychotherapies to digital interventions and neurobiological treatments. Next, we focus on some new sources of data that are being used for the development of predictive models based on machine learning, such as electronic health records, smartphone and social media data, and on the potential utility of data from genetics, electrophysiology, neuroimaging and cognitive testing. Finally, we discuss how far the field has come towards implementing prediction tools in real-world clinical practice. Relatively few retrospective studies to date include appropriate external validation procedures, and there are even fewer prospective studies testing the clinical feasibility and effectiveness of predictive models. Applications of machine learning in psychiatry face some of the same ethical challenges posed by these techniques in other areas of medicine or computer science, which we discuss here. In short, machine learning is a nascent but important approach to improve the effectiveness of mental health care, and several prospective clinical studies suggest that it may be working already.

7.
Purpose: Artificial intelligence (AI) models are playing an increasing role in biomedical research and healthcare services. This review focuses on challenging points to be clarified about how to develop AI applications as clinical decision support systems in the real-world context. Methods: A narrative review was performed, including a critical assessment of articles published between 1989 and 2021 that informed the sections on these challenging points. Results: We first illustrate the architectural characteristics of machine learning (ML)/radiomics and deep learning (DL) approaches. For ML/radiomics, the phases of feature selection and of training, validation, and testing are described. DL models are presented as multi-layered artificial/convolutional neural networks, allowing us to directly process images. The data curation section includes technical steps such as image labelling, image annotation (with segmentation as a crucial step in radiomics), data harmonization (enabling compensation for differences in imaging protocols that typically generate noise in non-AI imaging studies) and federated learning. Thereafter, we dedicate specific sections to: sample size calculation, considering multiple testing in AI approaches; procedures for data augmentation to work with limited and unbalanced datasets; and the interpretability of AI models (the so-called black box issue). Pros and cons of choosing ML versus DL to implement AI applications in medical imaging are finally presented in a synoptic way. Conclusions: Biomedicine and healthcare systems are among the most important fields for AI applications, and medical imaging is probably the most suitable and promising domain. Clarification of specific challenging points facilitates the development of such systems and their translation to clinical practice.
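A condensed sketch of the ML/radiomics branch described above, with hypothetical radiomic features rather than any real imaging dataset: scaling and univariate feature selection are wrapped inside a single pipeline so that both steps are refit within each training fold, and a held-out test split is reserved for the final estimate.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 300))        # hypothetical radiomic features per lesion
    y = rng.integers(0, 2, size=150)       # hypothetical benign/malignant labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)

    model = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(f_classif, k=20)),   # feature selection inside the pipeline
        ("clf", SVC(kernel="linear")),
    ])
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)   # validation phase
    model.fit(X_train, y_train)
    test_acc = model.score(X_test, y_test)                       # final testing phase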

8.
The production of recombinant therapeutic proteins from animal or human cell lines entails the risk of endogenous viral contamination from cell substrates and adventitious agents from raw materials and environment. One of the approaches to control such potential viral contamination is to ensure the manufacturing process can adequately clear the potential viral contaminants. Viral clearance for production of human monoclonal antibodies is achieved by dedicated unit operations, such as low pH inactivation, viral filtration, and chromatographic separation. The process development of each viral clearance step for a new antibody production requires significant effort and resources invested in wet laboratory experiments for process characterization studies. Machine learning methods have the potential to help streamline the development and optimization of viral clearance unit operations for new therapeutic antibodies. The current work focuses on evaluating the usefulness of machine learning methods for process understanding and predictive modeling for viral clearance via a case study on low pH viral inactivation.
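As a rough illustration of the kind of predictive modeling such a case study involves (not the authors' actual model), the sketch below regresses a log reduction value for low pH inactivation against hypothetical process parameters; the parameter ranges and the synthetic response are assumptions chosen only to make the example run.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 80
    # hypothetical process-characterization runs: pH, hold time (min), temperature (C)
    X = np.column_stack([
        rng.uniform(3.4, 3.9, n),
        rng.uniform(30, 120, n),
        rng.uniform(18, 25, n),
    ])
    # hypothetical measured log reduction values (LRV) for a model virus
    y = 2.0 + 4.0 * (3.9 - X[:, 0]) + 0.01 * X[:, 1] + rng.normal(0, 0.3, n)

    model = RandomForestRegressor(n_estimators=300, random_state=0)
    print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())
    model.fit(X, y)
    # predicted LRV for a new candidate condition (pH 3.5, 60 min hold, 20 C)
    print(model.predict([[3.5, 60.0, 20.0]]))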

9.
Increasingly, experimental data on biological systems are obtained from several sources and computational approaches are required to integrate this information and derive models for the function of the system. Here, we demonstrate the power of a logic-based machine learning approach to propose hypotheses for gene function integrating information from two diverse experimental approaches. Specifically, we use inductive logic programming that automatically proposes hypotheses explaining the empirical data with respect to logically encoded background knowledge. We study the capsular polysaccharide biosynthetic pathway of the major human gastrointestinal pathogen Campylobacter jejuni. We consider several key steps in the formation of capsular polysaccharide consisting of 15 genes of which 8 have assigned function, and we explore the extent to which functions can be hypothesised for the remaining 7. Two sources of experimental data provide the information for learning: the results of knockout experiments on the genes involved in capsule formation and the absence/presence of capsule genes in a multitude of strains of different serotypes. The machine learning uses the pathway structure as background knowledge. We propose assignments of specific genes to five previously unassigned reaction steps. For four of these steps, there was an unambiguous optimal assignment of gene to reaction, and to the fifth, there were three candidate genes. Several of these assignments were consistent with additional experimental results. We therefore show that the logic-based methodology provides a robust strategy to integrate results from different experimental approaches and propose hypotheses for the behaviour of a biological system.
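The hypothesis generation in this work was done with inductive logic programming; the toy sketch below is not ILP but only illustrates the underlying consistency reasoning in plain Python, using a made-up linear pathway: candidate assignments of unassigned genes to reaction steps are kept only if they explain which knockouts block synthesis at which step.

    from itertools import permutations

    # hypothetical linear pathway with three reaction steps that still lack a gene;
    # knocking out the gene for step i blocks synthesis from that step onwards
    steps = ["step1", "step2", "step3"]
    unassigned_genes = ["geneA", "geneB", "geneC"]

    # hypothetical knockout observations: the earliest step that fails in each mutant
    # (None = mutant not yet characterized)
    observed_block = {"geneA": "step2", "geneB": "step1", "geneC": None}

    hypotheses = []
    for genes in permutations(unassigned_genes):
        assignment = dict(zip(steps, genes))              # step -> gene, one-to-one
        consistent = all(observed_block[g] in (s, None) for s, g in assignment.items())
        if consistent:
            hypotheses.append(assignment)

    print(hypotheses)    # candidate gene-function assignments compatible with the data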

10.
Accurate retention time (RT) prediction is important for spectral library-based analysis in data-independent acquisition mass spectrometry-based proteomics. The deep learning approach has demonstrated superior performance over traditional machine learning methods for this purpose. The transformer architecture is a recent development in deep learning that delivers state-of-the-art performance in many fields such as natural language processing, computer vision, and biology. We assess the performance of the transformer architecture for RT prediction using datasets from five deep learning models: Prosit, DeepDIA, AutoRT, DeepPhospho, and AlphaPeptDeep. The experimental results on holdout datasets and independent datasets exhibit state-of-the-art performance of the transformer architecture. The software and evaluation datasets are publicly available for future development in the field.
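A minimal PyTorch sketch of a transformer-based RT predictor of the kind benchmarked above (not any of the cited models): peptide sequences are embedded per residue, passed through a small transformer encoder, mean-pooled over non-padded positions and regressed onto retention time. The vocabulary, dimensions and toy peptides are assumptions.

    import torch
    import torch.nn as nn

    AA = "ACDEFGHIKLMNPQRSTVWY"
    vocab = {a: i + 1 for i, a in enumerate(AA)}          # 0 is reserved for padding

    class RTTransformer(nn.Module):
        def __init__(self, d_model=64, nhead=4, nlayers=2, max_len=50):
            super().__init__()
            self.emb = nn.Embedding(len(vocab) + 1, d_model, padding_idx=0)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
            self.head = nn.Linear(d_model, 1)

        def forward(self, tokens):                         # tokens: (batch, seq_len)
            positions = torch.arange(tokens.size(1), device=tokens.device)
            x = self.emb(tokens) + self.pos(positions)
            mask = tokens.eq(0)                            # ignore padded positions
            h = self.encoder(x, src_key_padding_mask=mask)
            h = h.masked_fill(mask.unsqueeze(-1), 0.0).sum(1) / (~mask).sum(1, keepdim=True)
            return self.head(h).squeeze(-1)                # predicted retention time

    def encode(pep, max_len=50):
        ids = [vocab[a] for a in pep][:max_len]
        return ids + [0] * (max_len - len(ids))

    peptides = ["LGEYGFQNALIVR", "VATVSLPR"]               # toy peptides
    rts = torch.tensor([78.2, 25.4])                       # toy retention times (min)
    x = torch.tensor([encode(p) for p in peptides])

    model = RTTransformer()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(5):                                     # tiny illustrative training loop
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), rts)
        loss.backward()
        opt.step()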

11.
Progress in deep learning, more specifically in using convolutional neural networks (CNNs) for the creation of classification models, has been tremendous in recent years. Within bioacoustics research, there has been a large number of recent studies that use CNNs. Designing CNN architectures from scratch is non-trivial and requires knowledge of machine learning. Furthermore, hyper-parameter tuning associated with CNNs is extremely time consuming and requires expensive hardware. In this paper we assess whether it is possible to build good bioacoustic classifiers by adapting and re-using existing CNNs pre-trained on the ImageNet dataset, instead of designing them from scratch, a strategy known as transfer learning that has proved highly successful in other domains. This study is a first attempt to conduct a large-scale investigation on how transfer learning can be used for passive acoustic monitoring (PAM), to simplify the implementation of CNNs and the design decisions when creating them, and to remove time consuming hyper-parameter tuning phases. We compare 12 modern CNN architectures across 4 passive acoustic datasets that target calls of the Hainan gibbon Nomascus hainanus, the critically endangered black-and-white ruffed lemur Varecia variegata, the vulnerable Thyolo alethe Chamaetylas choloensis, and the Pin-tailed whydah Vidua macroura. We focus our work on data scarcity issues by training PAM binary classification models on very small datasets, with as few as 25 verified examples. Our findings reveal that transfer learning can result in up to 82% F1 score while keeping CNN implementation details to a minimum, thus rendering this approach accessible, easier to design, and speeding up further vocalisation annotation to create robust PAM models.
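A sketch of the transfer-learning recipe evaluated above, assuming a recent torchvision (0.13 or later; older versions use pretrained=True) and assuming the call spectrograms have already been converted to 3-channel image tensors: an ImageNet-pretrained ResNet-18 is reused as a frozen feature extractor and only a new binary classification head is trained, which is what makes very small PAM datasets workable.

    import torch
    import torch.nn as nn
    from torchvision import models

    # load an ImageNet-pretrained backbone and freeze it
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False

    # replace the final layer with a 2-class head (call / no call)
    backbone.fc = nn.Linear(backbone.fc.in_features, 2)

    # toy batch standing in for spectrogram "images" (batch, 3, 224, 224)
    x = torch.randn(8, 3, 224, 224)
    y = torch.randint(0, 2, (8,))

    opt = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
    for _ in range(3):                       # illustrative training of the head only
        opt.zero_grad()
        loss = nn.functional.cross_entropy(backbone(x), y)
        loss.backward()
        opt.step()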

12.
With the availability of public databases that store compound-target/compound-toxicity information, and Traditional Chinese medicine (TCM) databases, in silico approaches are used in toxicity studies of TCM herbal medicine. Here, three in silico approaches for toxicity studies were reviewed: machine learning, network toxicology and molecular docking. For each method, its application and implementation (e.g., single classifier vs. multiple classifiers, single compound vs. multiple compounds, validation vs. screening) were explored. While these methods provide data-driven toxicity prediction that is validated in vitro and/or in vivo, they are still limited to single-compound analysis. In addition, these methods are limited to only a few types of toxicity, with hepatotoxicity being the most dominant. Future studies involving the testing of combinations of compounds on the front end (i.e., to generate data for in silico modeling) and the back end (i.e., to validate findings from prediction models) will advance the in silico toxicity modeling of TCM compounds.

13.
Background: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. The prediction of transposon-derived piRNAs can enrich research on small ncRNAs as well as help to further understand the generation mechanism of gametes. Methods: In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features by using machine learning methods. We explore six sequence-derived features, i.e. spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performances for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate useful features and achieve high-accuracy prediction models. Results: We construct three datasets, covering three species: Human, Mouse and Drosophila, and evaluate the performances of prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUC of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUC of 0.922, 0.926 and 0.994 on the three datasets. Conclusions: Compared with other state-of-the-art methods, our methods can lead to better performances. In conclusion, the proposed methods are promising for transposon-derived piRNA prediction. The source codes and datasets are available in S1 File.
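A reduced sketch of the feature and ensemble ideas described above, with only the k-mer spectrum profile implemented and toy random sequences in place of the real piRNA datasets: spectrum features feed a soft-voting ensemble of an SVM and a random forest, evaluated by 10-fold cross-validated AUC.

    from itertools import product
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def spectrum(seq, k=3):
        # k-mer spectrum profile: frequencies of all 4**k overlapping k-mers
        kmers = ["".join(p) for p in product("ACGU", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
        total = max(len(seq) - k + 1, 1)
        return np.array([counts[m] / total for m in kmers])

    # toy sequences standing in for piRNA / non-piRNA training data
    rng = np.random.default_rng(0)
    seqs = ["".join(rng.choice(list("ACGU"), 28)) for _ in range(100)]
    y = rng.integers(0, 2, size=100)

    X = np.array([spectrum(s) for s in seqs])
    ens = VotingClassifier(
        estimators=[("svm", SVC(probability=True)),
                    ("rf", RandomForestClassifier(n_estimators=200))],
        voting="soft")
    auc = cross_val_score(ens, X, y, cv=10, scoring="roc_auc").mean()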

14.
The recent increase in high-throughput capacity of ‘omics datasets combined with advances and interest in machine learning (ML) have created great opportunities for systems metabolic engineering. In this regard, data-driven modeling methods have become increasingly valuable to metabolic strain design. In this review, the nature of ‘omics is discussed and a broad introduction to the ML algorithms combining these datasets into predictive models of metabolism and metabolic rewiring is provided. Next, this review highlights recent work in the literature that utilizes such data-driven methods to inform various metabolic engineering efforts for different classes of application including product maximization, understanding and profiling phenotypes, de novo metabolic pathway design, and creation of robust system-scale models for biotechnology. Overall, this review aims to highlight the potential and promise of using ML algorithms with metabolic engineering and systems biology related datasets.

15.
The world's population is increasing day by day, which makes providing healthy food for everyone a severe problem. Farmers are becoming progressively conscious of the need to control numerous essential factors such as crop health, water or fertilizer use, and harmful diseases in the field. However, it is challenging to monitor agricultural activities. Therefore, precision agriculture is an important decision support system for food production and decision-making. Several methods and approaches have been used to support precision agricultural practices. The present study performs a systematic literature review on hyperspectral imaging technology and the most advanced deep learning and machine learning algorithms used in agricultural applications, in order to extract and synthesize the significant datasets and algorithms. We carefully reviewed the relevant studies, highlighted hyperspectral datasets, focused on the methods most used for hyperspectral applications in the agricultural sector, and gained insight into the critical problems and challenges of hyperspectral data processing. According to our study, the Hyperion hyperspectral, Landsat-8, and Sentinel 2 multispectral datasets were mainly used for agricultural applications. The most widely applied machine learning methods were support vector machines and random forests. In addition, deep learning-based convolutional neural network (CNN) models are mainly used for crop classification due to their high performance with hyperspectral datasets. The present review will be helpful to new researchers working in the field of hyperspectral remote sensing for agricultural applications with machine and deep learning methods.

16.
Species Distribution Modelling (SDM) determines habitat suitability of a species across geographic areas using macro-climatic variables; however, micro-habitats can buffer or exacerbate the influence of macro-climatic variables, requiring links between physiology and species persistence. Experimental approaches linking species physiology to micro-climate are complex, time consuming and expensive. For example, which combination of exposure time and temperature is important for a species' thermal tolerance is difficult to judge a priori. We tackled this problem using an active learning approach that utilized machine learning methods to guide thermal tolerance experimental design for three kissing-bug species: Triatoma infestans, Rhodnius prolixus, and Panstrongylus megistus (Hemiptera: Reduviidae: Triatominae), vectors of the parasite causing Chagas disease. As with other pathogen vectors, triatomines are well known to utilize micro-habitats and the associated shift in microclimate to enhance survival. Using a limited literature-collected dataset, our approach showed that temperature, followed by exposure time, were the strongest predictors of mortality; species played a minor role, and life stage was the least important. Further, we identified complex but biologically plausible nonlinear interactions between temperature and exposure time in shaping mortality, together setting the potential thermal limits of triatomines. The results from these data led to the design of new experiments whose laboratory results produced novel insights into the effects of temperature and exposure on the triatomines. These results, in turn, can be used to better model the micro-climatic envelope for the species. Here we demonstrate the power of an active learning approach to explore experimental space and design laboratory studies testing species' thermal limits. Our analytical pipeline can be easily adapted to other systems and we provide code to allow practitioners to perform similar analyses. Not only does our approach have the potential to save time and money: it can also increase our understanding of the links between species physiology and climate, a topic of increasing ecological importance.
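A compressed sketch of the active-learning idea described here, with invented numbers rather than the published dataset: a random-forest model of mortality as a function of temperature and exposure time is fitted to the literature-collected points, and the untested temperature-exposure combinations on which the individual trees disagree most are proposed as the next laboratory experiments.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    # hypothetical literature data: temperature (C), exposure time (h), observed mortality (0-1)
    X_obs = np.column_stack([rng.uniform(25, 45, 60), rng.uniform(1, 24, 60)])
    y_obs = np.clip((X_obs[:, 0] - 30) / 15 * (X_obs[:, 1] / 24), 0, 1)

    forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_obs, y_obs)

    # candidate grid of untested temperature x exposure combinations
    temps, times = np.meshgrid(np.linspace(25, 45, 21), np.linspace(1, 24, 24))
    candidates = np.column_stack([temps.ravel(), times.ravel()])

    # disagreement among the individual trees serves as an uncertainty score
    per_tree = np.stack([t.predict(candidates) for t in forest.estimators_])
    uncertainty = per_tree.std(axis=0)
    next_experiments = candidates[np.argsort(uncertainty)[-5:]]   # 5 most informative conditions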

17.
18.
Background: A newly emerging novel coronavirus appeared and rapidly spread worldwide, and the World Health Organization declared a pandemic on March 11, 2020. The roles and characteristics of coronaviruses have captured much attention due to their ability to cause a wide variety of infectious diseases in humans, from mild to severe. Detecting the lethality of human coronaviruses is key to estimating viral toxicity and providing perspectives for treatment. Methods: We developed an alignment-free framework that utilizes machine learning approaches for ultra-fast and highly accurate prediction of the lethality of human-adapted coronaviruses using genomic sequences. We performed extensive experiments with six different feature transformations and machine learning algorithms, combined with digital signal processing, to identify the lethality of possible future novel coronaviruses from existing strains. Results: The results tested on SARS-CoV, MERS-CoV and SARS-CoV-2 datasets show an average 96.7% prediction accuracy. We also provide a preliminary analysis validating the effectiveness of our models on other human coronaviruses. Our framework achieves high levels of prediction performance that is alignment-free and based on RNA sequences alone, without genome annotations or specialized biological knowledge. Conclusion: The results demonstrate that, for any novel human coronavirus strain, this study can offer a reliable real-time estimation of its viral lethality.
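A toy sketch of the alignment-free pipeline outlined above; the sequences, the numeric mapping and the classifier choice are all assumptions rather than the authors' exact design. Each genome is mapped to a numeric signal, a fixed-length magnitude spectrum is taken as the feature vector via the FFT, and a standard classifier predicts the lethality class.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    MAP = {"A": 1.0, "C": -1.0, "G": 0.5, "T": -0.5}    # one possible numeric mapping

    def dsp_features(seq, n_coeff=128):
        # map a sequence to a signal and keep the first n_coeff FFT magnitudes
        signal = np.array([MAP.get(b, 0.0) for b in seq])
        spectrum = np.abs(np.fft.rfft(signal))
        spectrum = spectrum / (np.linalg.norm(spectrum) + 1e-9)   # normalization
        out = np.zeros(n_coeff)
        out[:min(n_coeff, spectrum.size)] = spectrum[:n_coeff]
        return out

    # toy stand-ins for coronavirus genomes and their lethality labels (0 = low, 1 = high)
    rng = np.random.default_rng(0)
    genomes = ["".join(rng.choice(list("ACGT"), 3000)) for _ in range(40)]
    labels = rng.integers(0, 2, size=40)

    X = np.array([dsp_features(g) for g in genomes])
    print(cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean())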

19.
Bringing a drug from research and development (R&D) to clinical application takes a long time, and the investment during R&D can run to over a billion yuan. With the combination of pharmaceutical R&D and artificial intelligence, and the rapid development of bioinformatics, data related to drug activity have grown sharply, and traditional experimental approaches to activity prediction can no longer keep up with the needs of drug development. Using algorithms to assist drug R&D and to solve its various problems can greatly accelerate the development process. Traditional machine learning methods, especially random forests, support vector machines and artificial neural networks, can reach fairly high prediction accuracy for drug activity. Deep learning, with its multi-layer neural networks, can accept high-dimensional input variables without manually specified input features and can fit relatively complex functions, so applying it to drug development can further improve the efficiency of each stage. The deep learning models most widely applied to drug activity prediction are deep neural networks (DNN), recurrent neural networks (RNN) and autoencoders (AE), while generative adversarial networks (GAN) are often combined with other models for data augmentation owing to their ability to generate data. Recent research on and applications of deep learning for predicting the activity of drug molecules show that deep learning models exceed both traditional experimental methods and traditional machine learning methods in accuracy and efficiency. Deep learning models are therefore expected to become the most important auxiliary computational models in drug development over the next decade.
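A minimal sketch of the DNN setup the review refers to, with hypothetical data and no specific published architecture: molecules represented as binary fingerprint vectors are fed to a small fully connected network that predicts active versus inactive.

    import torch
    import torch.nn as nn

    rng = torch.Generator().manual_seed(0)
    X = torch.randint(0, 2, (256, 1024), generator=rng).float()   # hypothetical 1024-bit fingerprints
    y = torch.randint(0, 2, (256,), generator=rng).float()        # hypothetical active/inactive labels

    dnn = nn.Sequential(
        nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.2),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(dnn.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(20):                        # illustrative training loop
        opt.zero_grad()
        loss = loss_fn(dnn(X).squeeze(-1), y)
        loss.backward()
        opt.step()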

20.
Vegetation maps are models of the real vegetation patterns and are considered important tools in conservation and management planning. Maps created through traditional methods can be expensive and time-consuming; thus, new, more efficient approaches are needed. The prediction of vegetation patterns using machine learning shows promise, but many factors may impact on its performance. One important factor is the nature of the vegetation–environment relationship assessed and ecological redundancy. We used two datasets with known ecological redundancy levels (strength of the vegetation–environment relationship) to evaluate the performance of four machine learning (ML) classifiers (classification trees, random forests, support vector machines, and nearest neighbor). These models used climatic and soil variables as environmental predictors with pretreatment of the datasets (principal component analysis and feature selection) and involved three spatial scales. We show that the ML classifiers produced more reliable results in regions where the vegetation–environment relationship is stronger as opposed to regions characterized by redundant vegetation patterns. The pretreatment of datasets and reduction in prediction scale had a substantial influence on the predictive performance of the classifiers. The use of ML classifiers to create potential vegetation maps shows promise as a more efficient way of vegetation modeling. The difference in performance between areas with poorly versus well-structured vegetation–environment relationships shows that some level of understanding of the ecology of the target region is required prior to their application. Even in areas with poorly structured vegetation–environment relationships, it is possible to improve classifier performance by either pretreating the dataset or reducing the spatial scale of the predictions.
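A condensed sketch of the comparison described above, with hypothetical climate/soil predictors and vegetation classes in place of the real plots: the same four classifiers are evaluated with and without a PCA pretreatment so the effect of the pretreatment on predictive performance can be inspected.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 25))            # hypothetical climatic + soil variables per plot
    y = rng.integers(0, 4, size=400)          # hypothetical vegetation classes

    classifiers = {
        "tree": DecisionTreeClassifier(random_state=0),
        "rf": RandomForestClassifier(n_estimators=200, random_state=0),
        "svm": SVC(),
        "knn": KNeighborsClassifier(),
    }
    for name, clf in classifiers.items():
        raw = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5).mean()
        pca = cross_val_score(make_pipeline(StandardScaler(), PCA(n_components=10), clf),
                              X, y, cv=5).mean()
        print(f"{name}: raw={raw:.2f}  pca={pca:.2f}")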

