Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Machine learning methods without tears: a primer for ecologists   Total citations: 1, self-citations: 0, citations by others: 1
Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machine learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.

2.
The discovery of novel bioactive molecules advances our systems‐level understanding of biological processes and is crucial for innovation in drug development. For this purpose, the emerging field of chemical genomics is currently focused on accumulating large assay data sets describing compound–protein interactions (CPIs). Although new target proteins for known drugs have recently been identified through mining of CPI databases, using these resources to identify novel ligands remains unexplored. Herein, we demonstrate that machine learning of multiple CPIs can not only assess drug polypharmacology but can also efficiently identify novel bioactive scaffold‐hopping compounds. Through a machine‐learning technique that uses multiple CPIs, we have successfully identified novel lead compounds for two pharmaceutically important protein families, G‐protein‐coupled receptors and protein kinases. These novel compounds were not identified by existing computational ligand‐screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.

3.
Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.

4.
Extremozymes   Total citations: 10, self-citations: 0, citations by others: 10
Extremozymes offer new opportunities for biocatalysis and biotransformations as a result of their extreme stability. From recent work, major approaches to extending the range of applications of extremozymes have emerged. Both the discovery of new extremophilic species and the determination of genome sequences provide a route to new enzymes, with the possibility that these will lead to novel applications. Of equal importance, protein engineering and directed evolution provide approaches to improve enzyme stability and modify specificity in ways that may not exist in the natural world.

5.
Improving predictions of restoration outcomes is increasingly important to resource managers for accountability and adaptive management, yet there is limited guidance for selecting a predictive model from the multitude available. The goal of this article was to identify an optimal predictive framework for restoration ecology using 11 modeling frameworks (including machine learning, inferential, and ensemble approaches) and three data groups (field data, geographic data [GIS], and a combination thereof). We test this approach with a dataset from a large postfire sagebrush reestablishment project in the Great Basin, U.S.A. Predictive power varied among models and data groups, ranging from 58% to 79% accuracy. Finer‐scale field data generally had the greatest predictive power, although GIS data were present in the best models overall. An ensemble prediction computed from the 10 models parameterized to field data was well above average for accuracy but was outperformed by others that prioritized model parsimony by selecting predictor variables based on rankings of their importance among all candidate models. The variation in predictive power among a suite of modeling frameworks underscores the importance of a model comparison and refinement approach that evaluates multiple models and data groups, and selects variables based on their contribution to predictive power. The enhanced understanding of factors influencing restoration outcomes accomplished by this framework has the potential to aid the adaptive management process for improving future restoration outcomes.

6.
Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. To demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were discovered to achieve maximum accuracies of ~80% using ~33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. The application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.
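
As an illustration of what "selecting the most informative substrates to test next" can mean in practice, the sketch below uses margin-based uncertainty sampling, a common query strategy for SVM active learning. The abstract does not state which strategy the authors used, so this choice is an assumption, and the linear decision function with hand-set weights is a hypothetical stand-in for a trained SVM. The names `decision_value` and `select_most_informative` are illustrative only.

```python
from typing import List, Sequence


def decision_value(weights: Sequence[float], bias: float,
                   features: Sequence[float]) -> float:
    """Signed score of a linear classifier: w . x + b.

    Stand-in for a trained SVM's decision function (an assumption;
    the abstract does not describe the kernel or features).
    """
    return sum(w * x for w, x in zip(weights, features)) + bias


def select_most_informative(weights: Sequence[float], bias: float,
                            candidates: Sequence[Sequence[float]],
                            k: int = 1) -> List[int]:
    """Return indices of the k candidates closest to the decision boundary.

    Candidates with a small |w . x + b| are the ones the current model is
    least sure about, so labelling them is expected to be most informative.
    """
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: abs(decision_value(weights, bias, candidates[i])),
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical two-feature descriptors for four untested compounds.
    pool = [[3.0, 0.0], [1.0, 0.5], [0.0, 2.0], [2.0, 2.0]]
    print(select_most_informative([1.0, -1.0], 0.0, pool, k=2))
```

In a full loop, the selected compounds would be assayed, added to the training set, and the model refitted before the next round of selection.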

7.
Chymotrypsin family serine proteases play essential roles in key biological and pathological processes and are frequently targets of drug discovery efforts. This large enzyme family is also among the most advanced model systems for detailed studies of enzyme mechanism and structure/function relationships. Productive interactions between these enzymes and their substrates are widely believed to mimic the "canonical" interactions between serine proteases and "standard" inhibitors observed in numerous protease-inhibitor complexes. To test this central hypothesis we have synthesized and characterized a series of peptide analogs, based on model substrates and inhibitors of trypsin, that contain unnatural main chains. These results call into question a long accepted theory regarding the interaction of chymotrypsin family serine proteases with substrates and suggest that the canonical interactions observed between these enzymes and standard inhibitors may represent nonproductive rather than productive, substrate-like interactions.

8.
Cytochrome P450 2C9 (CYP2C9) is a major drug-metabolizing enzyme that represents 20% of the hepatic CYPs and is responsible for the metabolism of 15% of drugs. A general concern in drug discovery is to avoid the inhibition of CYP leading to toxic drug accumulation and adverse drug–drug interactions. However, the prediction of CYP inhibition remains challenging due to its complexity. We developed an original machine learning approach for the prediction of drug-like molecules inhibiting CYP2C9. We created new predictive models by integrating CYP2C9 protein structure and dynamics knowledge, an original selection of physicochemical properties of CYP2C9 inhibitors, and machine learning modeling. We tested the machine learning models on publicly available data and demonstrated that our models successfully predicted CYP2C9 inhibitors with an accuracy, sensitivity and specificity of approximately 80%. We experimentally validated the developed approach and provided the first identification of the drugs vatalanib, piriqualone, ticagrelor and cloperidone as strong inhibitors of CYP2C9 with IC50 values <18 μM and sertindole, asapiprant, duvelisib and dasatinib as moderate inhibitors with IC50 values between 40 and 85 μM. Vatalanib was identified as the strongest inhibitor with an IC50 value of 0.067 μM. Metabolism assays allowed the characterization of specific metabolites of abemaciclib, cloperidone, vatalanib and tarafenacin produced by CYP2C9. The obtained results demonstrate that such a strategy could improve the prediction of drug–drug interactions in clinical practice and could be utilized to prioritize drug candidates in drug discovery pipelines.

9.
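Nature's most effective molecules are produced by enzyme-catalyzed reactions, and these products have undergone natural selection that optimizes their physiological activity. Combinatorial biocatalysis exploits the diversity of enzymatic reactions to carry out the iterative synthesis of organic libraries. These iterative reactions can be performed with isolated enzymes or whole cells, in natural or non-natural environments, and with substrates in solution or on solid phase. Combinatorial biocatalysis is a powerful complement to combinatorial methods for generating and optimizing lead compounds in drug discovery and development.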

10.
Extremophilic microorganisms have developed a variety of molecular strategies in order to survive in harsh conditions. For the utilization of natural polymeric substrates such as starch, a number of extremophiles, belonging to different taxonomic groups, produce amylolytic enzymes. This class of enzyme is important not only for the study of biocatalysis and protein stability at extreme conditions but also for the many biotechnological opportunities they offer. In this review, we report on the different molecular properties of thermostable archaeal and bacterial enzymes including alpha-amylase, alpha-glucosidase, glucoamylase, pullulanase, and cyclodextrin glycosyltransferase. Comparison of the primary sequence of the pyrococcal pullulanase with other members of the glucosyl hydrolase family revealed that significant differences are responsible for the mode of action of these enzymes.

11.
Ensembles are a well-established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase of the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles with sampling domain-specific knowledge instead of sampling data. We apply the proposed method to and evaluate its performance on a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.
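
For readers unfamiliar with the data-sampling baseline this abstract compares against, the following is a minimal sketch of bagging: each base model is fitted to a bootstrap resample of the observations, and the ensemble prediction is the average of the base predictions. It is not the paper's method (which samples domain knowledge rather than data), and the base learner here, a one-variable least-squares line, is an assumption chosen only to keep the example self-contained. All function names are illustrative.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least squares for y = a*x + b.

    Falls back to a constant model when the sample has no spread in x,
    which can happen in a bootstrap resample.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def bagged_models(xs: Sequence[float], ys: Sequence[float],
                  n_models: int = 25, seed: int = 0) -> List[Line]:
    """Fit one base model per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def ensemble_predict(models: Sequence[Line], x: float) -> float:
    """Average the predictions of all base models."""
    return mean(a * x + b for a, b in models)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # made-up observations for illustration
    print(ensemble_predict(bagged_models(xs, ys), 5.0))
```

The cost the abstract mentions is visible here: every additional ensemble member requires a full refit on a resampled data set.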

12.
Novel approaches for discovering industrial enzymes.   Total citations: 8, self-citations: 0, citations by others: 8
New technologies for enzyme discovery are changing the rules of the game for industrial biocatalysis. More kinds of enzymes are available, their hardiness is increasing, and their costs are coming down. These changes are the key drivers for a rebirth of interest in industrial applications of enzymes. The major enabling discovery approaches include screening of biodiversity, genomic sequencing, directed evolution and phage display.

13.
14.
We explore humans’ rule-based category learning using analytic approaches that highlight their psychological transitions during learning. These approaches confirm that humans show qualitatively sudden psychological transitions during rule learning. These transitions contribute to the theoretical literature contrasting single vs. multiple category-learning systems, because they seem to reveal a distinctive learning process of explicit rule discovery. A complete psychology of categorization must describe this learning process, too. Yet extensive formal-modeling analyses confirm that a wide range of current (gradient-descent) models cannot reproduce these transitions, including influential rule-based models (e.g., COVIS) and exemplar models (e.g., ALCOVE). It is an important theoretical conclusion that existing models cannot explain humans’ rule-based category learning. The problem these models have is the incremental algorithm by which learning is simulated. Humans descend no gradient in rule-based tasks. Very different formal-modeling systems will be required to explain humans’ psychology in these tasks. An important next step will be to build a new generation of models that can do so.

15.
Histone deacetylases (HDACs) are an important class of enzymes that deacetylate the ε-amino group of the lysine residues in the histone tails to form a closed chromatin configuration, resulting in the regulation of gene expression. Inhibition of these HDAC enzymes has been identified as one of the promising approaches for cancer treatment. The type-specific inhibition of class I HDAC enzymes is known to elicit improved therapeutic effects, and thus the search for promising type-specific HDAC inhibitor compounds remains an ongoing research interest in cancer drug discovery. Several different strategies are employed to identify the isoform specificity factors in these HDAC enzymes. This study combines in silico docking and energy-optimized pharmacophore (e-pharmacophore) mapping of several known HDAC inhibitors (HDACi) to identify the structural variants that are significant for the interactions against each of the four class I HDAC enzymes. Our hybrid approach shows that all the inhibitors with at least one aromatic ring in their linker regions hold higher affinities against the target enzymes, while those without any aromatic rings remain poor binders. We propose e-pharmacophore models for the HDACi against all four class I HDAC enzymes, which have not been reported elsewhere. The results from this work will be useful in the rational design and virtual screening of more isoform-specific HDACi against the class I HDAC family of proteins.

16.
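With the steady depletion of non-renewable resources such as petroleum and the worsening of environmental pollution, using industrial biocatalysis to retrofit or replace traditional chemical processes has become a research focus for the sustainable development of the chemical industry in the new century. Industrial biocatalysis studies biocatalysts and the processes they catalyze. Recently, the use of bioinformatics in industrial biocatalysis research has attracted growing attention. As industrial biocatalysis develops, bioinformatics will directly guide and accelerate the discovery and functional engineering of new, efficient biocatalysts.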

17.
Research into how and what families learn in science museums and other informal science learning settings suggests that parent-child interactions play an important role in shaping children’s learning experiences. Our exploratory case study set out to discover and analyse learning happening within family groups during a visit to a traditional museum natural history gallery. Research methods were influenced by a growing body of literature that looks for learning in family visitor talk. Conversations of 18 families were recorded as they explored a gallery after being introduced to six learning games which fostered a ‘climate of inquiry’ and which were designed to spark family dialogue. Our findings indicate that families adopt a range of interactional approaches for building meaning together in a museum gallery. These approaches fell along a spectrum that varied according to the level of co-investigation and co-operation between group members. We suggest that family learning could be supported in informal learning contexts through simple, low-cost learning strategies that encourage dialogue and co-investigatory behaviours.

18.
Haloalkane dehalogenases (HLDs) have recently been discovered in a number of bacteria, including symbionts and pathogens of both plants and humans. However, the biological roles of HLDs in these organisms are unclear. The development of efficient HLD inhibitors serving as molecular probes to explore their function would represent an important step toward a better understanding of these interesting enzymes. Here we report the identification of inhibitors for this enzyme family using two different approaches. The first builds on the structures of the enzymes' known substrates and led to the discovery of less potent nonspecific HLD inhibitors. The second approach involved the virtual screening of 150,000 potential inhibitors against the crystal structure of an HLD from the human pathogen Mycobacterium tuberculosis H37Rv. The best inhibitor exhibited high specificity for the target structure, with an inhibition constant of 3 μM and a molecular architecture that clearly differs from those of all known HLD substrates. The new inhibitors will be used to study the natural functions of HLDs in bacteria, to probe their mechanisms, and to achieve their stabilization.

19.
Power management in large-scale computational environments can significantly benefit from predictive models. Such models provide information about the power consumption behavior of workloads prior to running them. Power consumption depends on the characteristics of both the machine and the workload. However, combinational features such as the cache miss rate cannot be considered due to their unavailability before running the workload. Therefore, pre-execution power modeling requires both machine-independent workload characteristics and workload-independent machine characteristics. In this paper the predictive modeling problem is tackled by the proposal of a two-stage modeling framework. In the first stage, a machine learning approach is taken to predict single-threaded workload power consumption at a specific frequency. The second stage analytically scales this output to any intended thread/frequency configuration. Experimental results show that the proposed approach can yield highly accurate predictions about workload power consumption with an average error of 3.7 % on six different test platforms.

20.
Crystallization of proteins is a nontrivial task, and despite the substantial efforts in robotic automation, crystallization screening is still largely based on trial-and-error sampling of a limited subset of suitable reagents and experimental parameters. Funding of high throughput crystallography pilot projects through the NIH Protein Structure Initiative provides the opportunity to collect crystallization data in a comprehensive and statistically valid form. Data mining and machine learning algorithms thus have the potential to deliver predictive models for protein crystallization. However, the underlying complex physical reality of crystallization, combined with a generally ill-defined and sparsely populated sampling space, and inconsistent scoring and annotation make the development of predictive models non-trivial. We discuss the conceptual problems, and review strengths and limitations of current approaches towards crystallization prediction, emphasizing the importance of comprehensive and valid sampling protocols. In view of limited overlap in techniques and sampling parameters between the publicly funded high throughput crystallography initiatives, exchange of information and standardization should be encouraged, aiming to effectively integrate data mining and machine learning efforts into a comprehensive predictive framework for protein crystallization. Similar experimental design and knowledge discovery strategies should be applied to valid analysis and prediction of protein expression, solubilization, and purification, as well as crystal handling and cryo-protection.
