20 similar documents found.
1.
Genetic relationships and population structure of 8 horse breeds in the Czech and Slovak Republics were investigated using classification methods for breed discrimination. To demonstrate genetic differences among these breeds, we used genetic information (genotype data of microsatellite markers) and classification algorithms to perform a probabilistic prediction of an individual's breed. In total, 932 unrelated animals were genotyped for 17 microsatellite markers recommended by the ISAG for parentage testing (AHT4, AHT5, ASB2, HMS3, HMS6, HMS7, HTG4, HTG10, VHL20, HTG6, HMS2, HTG7, ASB17, ASB23, CA425, HMS1, LEX3). Several classification algorithms, namely J48 (decision trees), Naive Bayes and Bayes Net (probability predictors), IB1 and IB5 (instance-based machine learning methods), and JRip (decision rules), were applied to this genotype dataset, and their classification performance and results were analysed. Selected classification methods (Naive Bayes, Bayes Net, IB1), based on machine learning and principles of artificial intelligence, appear usable for these tasks.
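To make the probabilistic breed prediction more concrete, here is a minimal Python sketch of a categorical Naive Bayes classifier over microsatellite genotypes. It assumes each locus genotype is treated as one categorical value (e.g. "140/146") and uses Laplace smoothing; the function names, the smoothing constant and the toy genotypes are hypothetical and do not reproduce the study's own configuration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Count genotype frequencies per breed and locus.

    samples: list of tuples, one genotype string per locus (e.g. "140/146").
    labels:  breed label for each sample.
    alpha:   Laplace smoothing constant (an illustrative choice).
    """
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[breed][locus][genotype]
    seen = defaultdict(set)                             # distinct genotypes per locus
    for genotypes, breed in zip(samples, labels):
        for locus, g in enumerate(genotypes):
            counts[breed][locus][g] += 1
            seen[locus].add(g)
    return class_counts, counts, seen, alpha


def predict_proba(model, genotypes):
    """Return P(breed | genotypes) under the naive conditional-independence assumption."""
    class_counts, counts, seen, alpha = model
    total = sum(class_counts.values())
    log_post = {}
    for breed, n_breed in class_counts.items():
        lp = math.log(n_breed / total)  # log prior
        for locus, g in enumerate(genotypes):
            k = len(seen[locus]) + 1  # observed genotypes plus one slot for unseen ones
            lp += math.log((counts[breed][locus][g] + alpha) / (n_breed + alpha * k))
        log_post[breed] = lp
    # Normalize in log space to avoid underflow across many loci.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {breed: math.exp(v - m) / z for breed, v in log_post.items()}
```

With 17 loci, each sample would be a 17-tuple of genotypes, and the predicted breed is the key with the largest posterior probability.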
2.
MOTIVATION: With the increasing number of sequences submitted to public databases, their curators are no longer able to cope with the amount of information. The motivation of this work is to build a system for automated annotation of the data we are particularly interested in, namely proteins related to the Mycoplasmataceae family. Following previous work on automatic annotation using symbolic machine learning techniques, the present work proposes a method for the automatic annotation of keywords (part of the SWISS-PROT annotation procedure), together with validation of the generated annotation rules by an expert. The aim of this procedure is twofold: to complete the keyword annotation of these proteins, which is currently far from adequate, and to produce a prototype of the validation environment, aimed at an expert who does not have deep knowledge of the structure of the current databases containing the information s/he needs. RESULTS: Regarding the first objective, a correct keyword annotation rate of 60% has been reported in the literature. Our preliminary results show that with a slightly different method, applied only to data related to Mycoplasmataceae, we are able to increase that rate of correct annotation.
3.
Automatic particle selection from electron micrographs using machine learning techniques (total citations: 1; self-citations: 0; citations by others: 1)
C.O.S. Sorzano E. Recarte M. Alcorlo J.R. Bilbao-Castro C. San-Martín R. Marabini J.M. Carazo 《Journal of structural biology》2009,167(3):252-260
The 3D reconstruction of biological specimens using electron microscopy is currently capable of achieving subnanometer resolution. Unfortunately, reaching this resolution requires gathering tens of thousands of projection images, which are frequently selected manually from micrographs. In this paper we introduce a new automatic particle-selection method that learns from the user which particles are of interest. The training phase is semi-supervised, so the user can correct the algorithm during picking and specifically identify incorrectly picked particles. By treating such errors specially, the algorithm attempts to minimize the number of false positives. We show that our algorithm produces datasets with fewer wrongly selected particles than previously reported methods. Another advantage is that, by learning from the user which particles to pick, we avoid the need for an initial reference volume from which to generate picking projections. The method has been made publicly available in the open-source package Xmipp.
4.
In the last decade, bacterial taxonomy has expanded enormously. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only profiles obtained under standard growth conditions were retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Using machine learning techniques in a supervised strategy, different computational models were built for genus and species identification. Three techniques were considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification was achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models gave good species identification results for all three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests yielded sensitivity values of 0.847, 0.901 and 0.708, respectively. The random forest models outperformed those built with the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning is very useful for FAME-based bacterial species identification.
Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy.
5.
6.
In drug delivery, there is often a trade-off between effective killing of the pathogen and the harmful side effects associated with the treatment. Because it is difficult to test every dosing scenario experimentally, a computational approach can help predict effective drug delivery methods. In this paper, we have developed a data-driven predictive system that uses machine learning techniques to determine, in silico, the effectiveness of drug dosing. The system framework is scalable, autonomous and robust, and can predict the effectiveness of the current drug treatment and the subsequent drug-pathogen dynamics. The system consists of a dynamic model that represents the drug concentration and the pathogen population as distinct states. These states are then analyzed with a temporal model that describes the drug-cell interactions over time. The dynamic drug-cell interactions are learned adaptively and used to make sequential predictions of the effectiveness of the dosing strategy. The system also allows the sensitivity and specificity of the learned models to be adjusted through a threshold level set by the operator for the specific application. As a proof of concept, the system was validated experimentally in vitro using the pathogen Giardia lamblia and the drug metronidazole.
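An operator-set decision threshold trades sensitivity against specificity. The abstract does not describe the learned model itself, so the following is only a minimal sketch, assuming a model that outputs one score per dosing scenario where higher means "more likely effective"; the scores, labels and function names are hypothetical.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP/FP/TN/FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical scores for six dosing scenarios; True = dose was effective.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]
for t in (0.35, 0.5, 0.7):
    print(t, sensitivity_specificity(scores, labels, t))
```

On this toy data, raising the threshold from 0.35 to 0.7 lowers sensitivity from 1.0 to about 0.67 while raising specificity from about 0.67 to 1.0.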
7.
This article offers a novel sequence-based approach to discriminating outer membrane proteins (OMPs). The first step uses a new representation approach, factor analysis scales of generalized amino acid information (FASGAI), representing hydrophobicity, alpha and turn propensities, bulky properties, compositional characteristics, local flexibility, electronic properties, etc., to characterize sequences of OMPs and non-OMPs. The resulting data are then transformed into a uniform matrix by auto cross covariance (ACC). The second step develops predictors that discriminate OMPs from non-OMPs using a support vector machine (SVM). In a fivefold cross-validation test, the SVM predictors achieve a high Matthews correlation coefficient (MCC) of 0.916 in discriminating 208 OMPs from non-OMPs, comprising 206 α-helical membrane proteins and 673 globular proteins. Overall MCC values of 0.923 and 0.930 are obtained for discriminating OMPs from the α-helical membrane proteins and from the globular proteins, respectively. The results demonstrate that the combined FASGAI-ACC-SVM approach holds great promise for application in bioinformatics and proteomics studies.
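The ACC step turns a variable-length matrix of per-residue property scores into a fixed-length vector, which is what allows an SVM to compare proteins of different lengths. Below is a minimal pure-Python sketch, assuming the common lag-based formulation with per-property mean-centering; the FASGAI scale values are not given in the abstract, so any input matrix here is hypothetical, and the paper's exact normalization may differ.

```python
def acc_transform(props, max_lag):
    """Auto cross covariance (ACC) of a per-residue property matrix.

    props:   list of n rows, one per residue, each holding d property scores.
    max_lag: largest sequence separation considered (1 <= max_lag < n).
    Returns max_lag * d * d values, independent of the sequence length n.
    """
    n, d = len(props), len(props[0])
    if not 1 <= max_lag < n:
        raise ValueError("need 1 <= max_lag < sequence length")
    # Mean-center each property over the sequence (assumed normalization).
    means = [sum(row[j] for row in props) / n for j in range(d)]
    xc = [[row[j] - means[j] for j in range(d)] for row in props]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                # j == k: auto covariance; j != k: cross covariance between properties.
                s = sum(xc[i][j] * xc[i + lag][k] for i in range(n - lag))
                features.append(s / (n - lag))
    return features
```

With d property scales and a maximum lag L, every protein maps to L × d² features regardless of its length; the lag value is a tunable choice, not one stated in the abstract.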
8.
9.
Perales Gómez Ángel Luis López-de-Teruel Pedro E. Ruiz Alberto García-Mateos Ginés Bernabé García Gregorio García Clemente Félix J. 《Cluster computing》2022,25(3):2163-2178
The race for automation has reached farms and agricultural fields. Many of these facilities use Internet of Things technologies to automate processes and increase...
10.
11.
Identifying the subcellular localization of proteins is particularly helpful in the functional annotation of gene products. In this study, we use Machine Learning and Exploratory Data Analysis (EDA) techniques to examine and characterize amino acid sequences of human proteins localized in nine cellular compartments. A dataset of 3,749 protein sequences representing human proteins was extracted from the SWISS-PROT database. Feature vectors were created to capture specific amino acid sequence characteristics. Relative to a Support Vector Machine, a Multi-layer Perceptron, and a Naive Bayes classifier, the C4.5 Decision Tree algorithm was the most consistent performer across all nine compartments in reliably predicting the subcellular localization of proteins based on their amino acid sequences (average Precision=0.88; average Sensitivity=0.86). Furthermore, EDA graphics characterized essential features of proteins in each compartment. As examples, proteins localized to the plasma membrane had higher proportions of hydrophobic amino acids; cytoplasmic proteins had higher proportions of neutral amino acids; and mitochondrial proteins had higher proportions of neutral amino acids and lower proportions of polar amino acids. These data showed that the C4.5 classifier and EDA tools can be effective for characterizing and predicting the subcellular localization of human proteins based on their amino acid sequences.
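The abstract reports average precision and sensitivity over the nine compartments without stating how the averages were formed. One plausible reading, sketched here as an assumption, is an unweighted (macro) mean of one-vs-rest metrics per compartment; the function names, the zero-division convention and any compartment labels used with it are hypothetical.

```python
def per_class_metrics(y_true, y_pred, classes):
    """One-vs-rest precision and sensitivity (recall) for each class."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Convention chosen here: an undefined ratio counts as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = (precision, sensitivity)
    return metrics


def macro_average(metrics):
    """Unweighted mean over classes, so each compartment counts equally."""
    n = len(metrics)
    avg_precision = sum(p for p, _ in metrics.values()) / n
    avg_sensitivity = sum(s for _, s in metrics.values()) / n
    return avg_precision, avg_sensitivity
```

An average weighted by the number of proteins per compartment can give different numbers when compartments are imbalanced, so the averaging scheme matters when comparing reported figures.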
12.
Huang-Wen Chen Sunayan Bandyopadhyay Dennis E Shasha Kenneth D Birnbaum 《BMC evolutionary biology》2010,10(1):357
Background
Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here.
13.
DNA-binding proteins (DNA-BPs) play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Attempts have been made to identify DNA-BPs based on their sequence and structural information, with moderate accuracy. Here we develop a machine learning protocol for the prediction of DNA-BPs in which the classifier is a Support Vector Machine (SVM). The information used for classification is derived from characteristics that include surface and overall composition, overall charge and positive potential patches on the protein surface. In total, 121 DNA-BPs and 238 non-binding proteins are used to build and evaluate the protocol. In a self-consistency test, an accuracy of 100% was achieved. For cross-validation (CV) optimization over the entire dataset, we report an accuracy of 90%. Using leave-1-pair holdout evaluation, an accuracy of 86.3% was achieved. When the dataset is restricted to proteins sharing less than 20% sequence identity, the holdout accuracy is 85.8%. Furthermore, seven DNA-BPs with unbound structures are all correctly predicted. This performance is better than previously published results. The higher accuracy achieved here stems from two factors: the ability of the SVM to handle features that show a wide range of discriminatory power, and a different definition of the positive patch. Since our protocol does not rely on sequence or structural homology, it can be used to identify or predict proteins with DNA-binding function(s) regardless of their homology to known ones.
14.
We report our finding of linear clustering of signal sequences at the N-terminus of M.tb membrane proteins, directing membrane localization. Although it is widely accepted that membrane proteins have signal peptides at the N-terminus, statistical ensemble analysis of Support Vector Machine prediction results indicates that M.tb membrane proteins have embedded N-terminal sequence patterns beyond the signal peptides previously identified in E. coli. These additional N-terminal patterns in M.tb membrane proteins may be correlated with their unique enzymatic functions and unusual characteristics, such as membrane interaction in pathogens.
15.
Shriyaa Mittal 《Molecular simulation》2018,44(11):891-904
Molecular dynamics (MD) simulations are critical to understanding the movements of proteins over time. Yet MD simulations are limited by the availability of high-resolution protein structures, the accuracy of the underlying force field, computational expense, and the difficulty of analysing large data sets. Machine learning algorithms are now routinely used to circumvent many of these limitations, and computational biophysicists continue to make progress in developing novel applications. Here, we discuss some of these methods, ranging from traditional dimensionality reduction approaches to more recent abstractions such as transfer learning and reinforcement learning, and how they have been used to deal with the challenges in MD. We conclude with prospective issues in applying machine learning methods to MD to increase the accuracy and efficiency of protein dynamics studies in general.
16.
Guido van Mierlo Jurriaan R.G. Jansen Jie Wang Ina Poser Simon J. van Heeringen Michiel Vermeulen 《Cell reports》2021,34(5):108705
17.
Ludis Morales Orlando Acevedo María Martínez Dmitry Gokhman Carlos Corredor 《Central European Journal of Biology》2009,4(1):41-49
One of the most important goals in structural biology is the identification of functional relationships between the structures of proteins and peptides. The purpose of this study was to (1) generate a model, based on theoretical and computational considerations, of amino acid sequences within selected neurotoxin peptides, and (2) compare the relationship of these values to the various toxins tested. As our model, we employed isolated neurotoxins from sea anemones with an established specific potential to act on voltage-dependent sodium and potassium channel activity. Values were assigned to each amino acid in the peptide sequence of the neurotoxins tested using the Number of Lareo and Acevedo algorithm (NULA). Each resulting NULA number was then plotted in three-dimensional space coordinates. The results of this study allow us to report, for the first time, that there is a distinct numerical and functional relationship between the amino acid sequences of sea anemone neurotoxins, and that the resulting numerical value for each peptide, or NULA number, has a unique location in three-dimensional space.
18.
A new strategy for the functional reconstitution of membrane proteins is described. This approach introduces a new class of protein-stabilizing agents, osmolytes, whose presence at high concentration (10-20%) during detergent solubilization prevents the inactivation that normally occurs when proteins are extracted from natural membranes. Osmolytes that act in this way include glycerol and higher polyols (erythritol, xylitol, sorbitol), sugars (glucose, trehalose), and certain amino acids (glycine, proline, betaine). The beneficial effects of osmolytes are documented by reconstitution of a variety of prokaryotic and eukaryotic membrane proteins, including several proton- and calcium-motive ATPases, cation- and anion-linked solute carriers (symport and antiport), and a membrane-bound hydrolase from the endoplasmic reticulum. In all cases, the presence of 20% glycerol or another osmolyte during detergent solubilization led to a 10-fold or greater increase in specific activity in proteoliposomes. These positive effects did not depend on the use of any specific detergent for protein solubilization, nor on any particular method of reconstitution, but for convenience most of the work reported here used octylglucoside as the solubilizing agent, followed by detergent dilution to form proteoliposomes. The overall approach outlined by these experiments is simple and flexible. It is now feasible to use reconstitution as an analytical tool to study the biochemical and physiological properties of membrane proteins.
19.
Membrane transporters set the framework organising the complexity of plant metabolism in cells, tissues and organisms. Their substrate specificity and controlled activity in different cells are crucial for plant metabolism to run pathways in concert. Transport proteins catalyse the uptake and exchange of ions, substrates, intermediates, products and cofactors across membranes. Given the large number of metabolites, a wide spectrum of transporters is required. The vast majority of membrane transporters annotated in silico in plant genomes, however, have not yet been functionally characterised. Hence, to understand the metabolic network as a whole, it is important to understand how transporters connect and control the metabolic pathways of plant cells. Heterologous expression and in vitro activity studies of recombinant transport proteins have greatly improved their functional analysis in the last two decades. This review provides a comprehensive overview of recent advances in membrane protein expression and functional characterisation using various host systems and transport assays.
20.
Fall prevention is a critical component of health care; falls are a common source of injury in the elderly and are associated with significant levels of mortality and morbidity. Automatically detecting falls can allow rapid response to potential emergencies; in addition, knowing the cause or manner of a fall can benefit prevention studies or enable a more tailored emergency response. The purpose of this study is to demonstrate techniques that not only reliably detect a fall but also automatically classify its type. We asked 15 subjects to simulate four different types of falls (left and right lateral falls, forward trips, and backward slips) while wearing mobile phones and previously validated, dedicated accelerometers. Nine subjects also wore the devices for ten days to provide data for comparison with the simulated falls. We applied five machine learning classifiers to a large time-series feature set to detect falls. Support vector machines and regularized logistic regression were able to identify a fall with 98% accuracy and classify the type of fall with 99% accuracy. This work demonstrates how current machine learning approaches can simplify data collection for fall-related prevention research and improve rapid response to potential injuries due to falls.
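The abstract does not list its time-series features, so the following only illustrates the general kind of preprocessing such accelerometer pipelines often use: the tri-axial signal is reduced to its magnitude and summarized over sliding windows before classification. The window length, step, function names and feature names are hypothetical choices, not the authors' feature set.

```python
import math


def magnitude(ax, ay, az):
    """Orientation-independent acceleration magnitude per sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def window_features(signal, window, step):
    """Summary statistics over sliding windows of a 1-D signal."""
    features = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        mean = sum(w) / window
        std = math.sqrt(sum((v - mean) ** 2 for v in w) / window)  # population std
        features.append({
            "mean": mean,
            "std": std,
            "min": min(w),
            "max": max(w),
            "range": max(w) - min(w),
        })
    return features
```

Each window's feature dictionary could then be passed to a classifier such as a support vector machine or regularized logistic regression, the two models the study found most accurate.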