Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Application of learning techniques to splicing site recognition   Cited by: 2 (self-citations: 0, by others: 2)
J Quinqueton  J Moreau 《Biochimie》1985,67(5):541-547
Most genes of eukaryotic genomes are disrupted by introns. The application of a learning technique that uses both statistical and syntactic analysis led to the establishment of logical rules enabling the recognition of intron/exon junctions between noncoding and coding sequences. The rules were tested on rat actin gene sequences containing some or all of the introns and 50 exon nucleotides on either side of the intron. The results show good recognition of the excision site. This recognition is more ambiguous when the sequence is short; for the acceptor sequence it presents a good selection. The learning achieved with both the donor and acceptor sequence does not lead to recognition. This result indicates that it is not the relationship between donor and acceptor sites in the same intron which determines sequence selection or the splicing mechanism.

2.
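Proteins are indispensable compounds in living organisms and play many important roles in life processes; understanding protein function supports research in medicine, drug development, and related fields. In addition, the application of enzymes in green synthesis has long attracted attention, but because enzymes are highly diverse in type and function, obtaining an enzyme with a specific function is costly, which limits their wider use. At present, the specific function of a protein is determined mainly by experimental characterization, which is laborious and time-consuming. Meanwhile, with the rapid development of bioinformatics and sequencing technology, the number of sequenced protein sequences far exceeds the number with functional annotation, so efficient prediction of protein function has become crucial. With the rapid progress of computer technology, data-driven machine learning methods have become an effective solution to these challenges. This article outlines protein function and its annotation methods, as well as the history and workflow of machine learning, focuses on applications of machine learning to enzyme function prediction, and offers an outlook on future directions for efficient, artificial-intelligence-assisted research on protein function.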

3.
Search for promoter sites of prokaryotic DNA using learning techniques   Cited by: 1 (self-citations: 0, by others: 1)
J Sallantin  J Haiech  F Rodier 《Biochimie》1985,67(5):549-553
Using learning techniques previously described in this journal, we have built an expert system able to point to the start point of a DNA sequence and therefore to recognize a promoter. To build this system, however, we have focused on the TATA box and its environment. We have used this expert system to look for new promoters and also to construct new promoters. The results obtained are discussed.

4.
赵学彤  杨亚东  渠鸿竹  方向东 《遗传》2018,40(9):693-703
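With the continuing development of omics technologies, methods for acquiring biological data of different levels and types are increasingly mature. Disease diagnosis and treatment generate large amounts of data, and it is essential to use artificial intelligence methods such as machine learning to analyze complex, multidimensional, multiscale disease big data, build clinical decision support tools, and help physicians find fast and effective diagnosis and treatment plans. In this process, the choice of machine learning and other artificial intelligence methods is particularly important. This article therefore first briefly reviews the machine learning and related methods commonly used in clinical decision support from the perspectives of type and algorithm, introducing support vector machines, logistic regression, clustering algorithms, Bagging, random forests, and deep learning. It then summarizes and classifies the applications of these methods in clinical decision support and discusses their respective strengths and weaknesses, providing a useful reference for choosing machine learning and other artificial intelligence methods in clinical decision support.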

5.
6.
J Haiech  J Sallantin 《Biochimie》1985,67(5):555-560
Using a learning set of 28 sequences able to bind calcium (each sequence is 12 residues long), we have built two filters by learning on this set. The first filter uses a pattern-matching technique and the second one takes into account the environment of amino acids. These two filters have been used to find new calcium-binding proteins in a data bank. The results are discussed.
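
As a minimal sketch of what a pattern-matching filter over 12-residue windows could look like: slide a window along a protein sequence and report the windows whose constrained positions all match. The per-position pattern below is a hypothetical illustration, not the filter learned in the paper, and the names `PATTERN`, `matches` and `scan` are assumptions made for this example.

```python
# Hypothetical per-position pattern for a 12-residue window.
# Each entry is the set of residues allowed at that position;
# None means any residue is accepted. This is an illustrative
# assumption, not the pattern learned by the authors.
PATTERN = [
    set("D"), None, set("DNS"), None, set("DNSG"), set("G"),
    None, None, None, None, None, set("ED"),
]


def matches(window, pattern=PATTERN):
    """Return True if the window satisfies every constrained position."""
    if len(window) != len(pattern):
        return False
    return all(
        allowed is None or residue in allowed
        for residue, allowed in zip(window, pattern)
    )


def scan(sequence, pattern=PATTERN):
    """Return the 0-based start index of every matching window."""
    width = len(pattern)
    return [
        i
        for i in range(len(sequence) - width + 1)
        if matches(sequence[i:i + width], pattern)
    ]
```

Calling `scan` on each entry of a sequence data bank would give candidate sites; a second filter that scores the neighbouring residues could then be applied to those candidates only.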

7.
Optical coherence tomography angiography (OCTA) offers a noninvasive, label-free solution for imaging the retinal vasculature at capillary-level resolution. In principle, improved resolution implies a better chance to reveal subtle microvascular distortions associated with eye diseases that are asymptomatic in early stages. However, massive screening requires experienced clinicians to manually examine retinal images, which may result in human error and hinder objective screening. Recently, quantitative OCTA features have been developed to standardize and document retinal vascular changes. The feasibility of using quantitative OCTA features for machine learning classification of different retinopathies has been demonstrated. Deep learning-based applications have also been explored for automatic OCTA image analysis and disease classification. In this article, we summarize recent developments of quantitative OCTA features, machine learning image analysis, and classification.

8.
Proteins play important roles in living organisms, and their function is directly linked with their structure. Due to the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the evolution of the features used by these algorithms, from classical physicochemical properties and amino acid composition up to text-derived features from biomedical literature and learned feature representations using autoencoders, is also reviewed, together with feature selection and dimensionality reduction techniques. The success stories in the application of these techniques to both general and specific protein function prediction are discussed.

9.
To predict rice blast, many machine learning methods have been proposed. As the quality and quantity of input data are essential for machine learning techniques, this study develops three artificial neural network (ANN)-based rice blast prediction models by combining two ANN models, the feed-forward neural network (FFNN) and long short-term memory (LSTM), with diverse input datasets, and compares their performance. The Blast_Weather_FFNN model had the highest recall score (66.3%) for rice blast prediction. This model requires two types of input data: blast occurrence data for the last 3 years and weather data (daily maximum temperature, relative humidity, and precipitation) between January and July of the prediction year. This study showed that the performance of an ANN-based disease prediction model was improved by applying suitable machine learning techniques together with the optimization of hyperparameter tuning involving input data. Moreover, we highlight the importance of the systematic collection of long-term disease data.
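
For readers unfamiliar with the metric, recall is the fraction of actual positive cases (here, blast occurrences) that the model flags: TP / (TP + FN). The sketch below is a generic illustration; the function name and the convention that label 1 means "blast occurred" are assumptions, and no data from the study are used.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    # Actual positives that the model also predicted as positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # Actual positives that the model missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive cases")
    return tp / (tp + fn)
```

A high recall matters in disease forecasting because a missed outbreak (a false negative) is usually more costly than a false alarm.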

10.
In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often only resort to one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have a different interpretation, comparisons across techniques can be difficult. In this work, we study a variable importance measure that can be used with any regression technique, and whose interpretation is agnostic to the technique used. This measure is a property of the true data‐generating mechanism. Specifically, we discuss a generalization of the analysis of variance variable importance measure and discuss how it facilitates the use of machine learning techniques to flexibly estimate the variable importance of a single feature or group of features. The importance of each feature or group of features in the data can then be described individually, using this measure. We describe how to construct an efficient estimator of this measure as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of risk factors for cardiovascular disease in South Africa.
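
One way to write an ANOVA-type importance for a feature group s is the mean squared difference between the fitted values of a regression on all features and a regression that leaves out s, divided by the variance of the response. The sketch below computes a naive plug-in version of that ratio from fitted values produced by any regression technique. It is an assumption-laden illustration: the abstract does not state the formula, the function name is hypothetical, and this plug-in is not the efficient estimator or the confidence interval that the authors describe.

```python
from statistics import pvariance


def anova_importance(y, full_preds, reduced_preds):
    """Naive plug-in estimate of an ANOVA-type variable importance.

    y             -- observed responses
    full_preds    -- fitted values from a model using all features
    reduced_preds -- fitted values from a model without the feature(s)
                     of interest
    """
    if not (len(y) == len(full_preds) == len(reduced_preds)):
        raise ValueError("inputs must have the same length")
    var_y = pvariance(y)  # population variance of the response
    if var_y == 0:
        raise ValueError("response has zero variance")
    # Mean squared difference between full and reduced fitted values.
    msd = sum(
        (f - r) ** 2 for f, r in zip(full_preds, reduced_preds)
    ) / len(y)
    return msd / var_y
```

Because the function only takes fitted values, the two regressions can come from any learner, which is the technique-agnostic property the abstract emphasizes.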

11.
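As the world population keeps growing, food demand keeps rising, and the climate keeps changing, raising crop yields has become a major challenge for humanity. Traditional design breeding is slow and inefficient and can no longer meet the breeding needs of the new era. As the cost of genotype and phenotype data continues to fall and omics data of all kinds grow explosively, artificial intelligence, as a tool for efficiently mining information from big data, has attracted wide attention in biology. Design breeding guided by artificial intelligence will greatly improve breeding efficiency and bring revolutionary change to breeding. This article introduces the applications of artificial intelligence, especially deep learning, in crop genomics and genetic improvement, and offers a summary and outlook, with the aim of providing new ideas for intelligent design breeding.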

12.

13.
Developments in biotechnology are increasingly dependent on the extensive use of big data, generated by modern high‐throughput instrumentation technologies, and stored in thousands of databases, public and private. Future developments in this area depend, critically, on the ability of biotechnology researchers to master the skills required to effectively integrate their own contributions with the large amounts of information available in these databases. This article offers a perspective on the relations that exist between the fields of big data and biotechnology, including the related technologies of artificial intelligence and machine learning, and describes how data integration, data exploitation, and process optimization correspond to three essential steps in any future biotechnology project. The article also lists a number of application areas where the ability to use big data will become a key factor, including drug discovery, drug recycling, drug safety, functional and structural genomics, proteomics, pharmacogenetics, and pharmacogenomics, among others.

14.
Auscultation plays an important role in the clinic, and the research community has been exploring machine learning (ML) to enable remote and automatic auscultation for respiratory condition screening via sounds. To give an overview of this field, in this narrative review we describe publicly available audio databases that can be used for experiments, illustrate the ML methods proposed to date, and flag some under-considered issues which still need attention. Compared to existing surveys on the topic, we cover the latest literature, especially the audio-based COVID-19 detection studies which have gained extensive attention in the last two years. This work can help to facilitate the application of artificial intelligence in the respiratory auscultation field.

15.
Attaining personalized healthy aging requires accurate monitoring of physiological changes and identifying subclinical markers that predict accelerated or delayed aging. Classic biostatistical methods mostly rely on supervised variables to estimate physiological aging and do not capture the full complexity of inter-parameter interactions. Machine learning (ML) is promising, but its black-box nature eludes direct understanding, substantially limiting physician confidence and clinical usage. Using a broad population dataset from the National Health and Nutrition Examination Survey (NHANES) study including routine biological variables and after selection of XGBoost as the most appropriate algorithm, we created an innovative explainable ML framework to determine a Personalized physiological age (PPA). PPA predicted both chronic disease and mortality independently of chronological age. Twenty-six variables were sufficient to predict PPA. Using SHapley Additive exPlanations (SHAP), we implemented a precise quantitative associated metric for each variable explaining physiological (i.e., accelerated or delayed) deviations from age-specific normative data. Among the variables, glycated hemoglobin (HbA1c) displays a major relative weight in the estimation of PPA. Finally, clustering profiles of identical contextualized explanations reveal different aging trajectories, opening opportunities for specific clinical follow-up. These data show that PPA is a robust, quantitative and explainable ML-based metric that monitors personalized health status. Our approach also provides a complete framework applicable to different datasets or variables, allowing precision physiological age estimation.

16.
17.
F Rodier  J Sallantin 《Biochimie》1985,67(5):533-539
Learning processes are applied to the recognition of protein coding regions in prokaryotes. Non-contradictory, statistical and logical rules are deduced from a set of known examples of coding sequences. These rules make it possible to build characteristic patterns on the mRNA upstream of the initiating codon. These rules are applied with success to recognize more than 180 coding sequences and to detect and/or eliminate hypothetical reading frames or unknown genes.

18.
《Cell》2022,185(21):4008-4022.e14

19.
Camera traps often produce massive numbers of images, and empty images that contain no animals are usually overwhelming in number. Deep learning is a machine‐learning approach that is widely used to identify empty camera trap images automatically. Existing high-accuracy methods are based on millions of training samples (images) and require a lot of time and personnel to label the training samples manually. Reducing the number of training samples can save the cost of manually labeling images. However, deep learning models trained on a small dataset produce a large omission error for animal images: many animal images tend to be identified as empty images, which may lead to lost opportunities to discover and observe species. Therefore, it is still a challenge to build a deep convolutional neural network (DCNN) model with small errors on a small dataset. Using deep convolutional neural networks and a small dataset, we proposed an ensemble learning approach based on conservative strategies to identify and remove empty images automatically. Furthermore, we proposed three schemes for automatically identifying empty images, for users who accept different omission errors for animal images. Our experimental results showed that these three schemes automatically identified and removed 50.78%, 58.48%, and 77.51% of the empty images in the dataset when the omission errors were 0.70%, 1.13%, and 2.54%, respectively. The analysis showed that using our scheme to automatically identify empty images did not omit species information; it only slightly changed the frequency of species occurrence. When only a small dataset was available, our approach gave users an alternative way to automatically identify and remove empty images, which can significantly reduce the time and personnel costs required to remove empty images manually. The cost savings were comparable to the percentage of empty images removed by the models.
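
A conservative ensemble rule can be sketched as follows: an image is removed as empty only when every model in the ensemble assigns it an "empty" probability at or above a threshold, and any disagreement keeps the image for human review. The abstract does not spell out the exact strategy, so the unanimity rule, the default threshold, and the function names here are all assumptions made for illustration.

```python
def is_empty_conservative(empty_probs, threshold=0.9):
    """Return True only if every model is confident the image is empty."""
    if not empty_probs:
        raise ValueError("need at least one model score")
    return all(p >= threshold for p in empty_probs)


def split_images(scores, threshold=0.9):
    """Split images into (removed, kept) lists.

    scores -- dict mapping an image id to the list of per-model
              probabilities that the image is empty
    """
    removed, kept = [], []
    for image_id, probs in scores.items():
        if is_empty_conservative(probs, threshold):
            removed.append(image_id)  # unanimous: treat as empty
        else:
            kept.append(image_id)     # any doubt: keep for review
    return removed, kept
```

Under this kind of rule, raising the threshold would be expected to lower the omission error for animal images while removing fewer empty images, which is the sort of trade-off the three reported schemes offer.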

20.
M Nanard  J Nanard 《Biochimie》1985,67(5):429-432
Learning methods developed by artificial intelligence research teams are very efficient for biological sequence analysis, but they need to run on large computers accessed through terminals. These computers are interfaced with standard displays involving long and unpleasant alphanumerical data handling. The "biological work station" is a personal computer with a color graphic screen providing a user-friendly interface for the artificial intelligence learning programs running on large computers. It gives the biologist a convenient graphical tool for sequence analysis, built with efficient man-machine communication methods such as multiple windows, icons and mouse selection. It allows the biologist to edit and display sequences in an efficient and natural way, showing the data and the results of learning programs directly in color pictures.
