Found 20 similar articles (search time: 15 ms)
1.
《Animal : an international journal of animal bioscience》2020,14(2):223-232
Single nucleotide polymorphisms (SNPs) able to describe population differences can be used for important applications in livestock, including breed assignment of individual animals, authentication of mono-breed products and parentage verification, among several other applications. To identify the most discriminating SNPs among thousands of markers in the available commercial SNP chip tools, several methods have been used. Random forest (RF) is a machine learning technique that has been proposed for this purpose. In this study, we used RF to analyse PorcineSNP60 BeadChip array genotyping data obtained from a total of 2737 pigs of 7 Italian pig breeds (3 cosmopolitan-derived breeds: Italian Large White, Italian Duroc and Italian Landrace, and 4 autochthonous breeds: Apulo-Calabrese, Casertana, Cinta Senese and Nero Siciliano) to identify breed informative and reduced SNP panels using the Mean Decrease in the Gini Index and the Mean Decrease in Accuracy parameters with stability evaluation. Other reduced informative SNP panels were obtained using Delta, Fixation index and principal component analysis statistics, and their performances were compared with those obtained using the RF-defined panels using the RF classification method and its derived Out Of Bag rates and correct prediction proportions. Therefore, the performances of a total of six reduced panels were evaluated. The correct assignment of the animals to their breeds was close to 100% for all tested approaches. Porcine chromosome 8 harboured the largest number of selected SNPs across all panels. Many SNPs were included in genomic regions in which previous studies identified signatures of selection or genes (e.g. ESR1, KITL and LCORL) that could help explain, at least in part, phenotypically or economically relevant traits that might differentiate cosmopolitan and autochthonous pig breeds.
Random forest used as a preselection statistic highlighted informative SNPs that were not the same as those identified by other methods. This might be due to specific features of this machine learning methodology. It will be interesting to explore whether adapting RF methods to the identification of selection signature regions could describe population-specific features that are not captured by other approaches.
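As an illustration of one of the non-RF statistics named in this abstract, the sketch below ranks SNPs by Delta, taken here as the absolute difference in allele frequency between two populations. It is a minimal sketch under stated assumptions, not the authors' pipeline: the genotype coding (0, 1 or 2 copies of the counted allele), the function names, and the toy data are all hypothetical, and the study compared seven breeds whereas this example handles a single pair.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele; genotypes are coded as 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def delta(genotypes_a, genotypes_b):
    """Absolute allele-frequency difference between two populations."""
    return abs(allele_frequency(genotypes_a) - allele_frequency(genotypes_b))


def rank_snps_by_delta(breed_a, breed_b):
    """Return SNP ids sorted from most to least discriminating.

    breed_a and breed_b map a SNP id to the list of genotypes observed
    in that breed (hypothetical input layout).
    """
    scores = {snp: delta(breed_a[snp], breed_b[snp]) for snp in breed_a}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented for illustration): three SNPs, four animals per breed.
breed_a = {"snp1": [2, 2, 2, 1], "snp2": [1, 1, 0, 1], "snp3": [0, 0, 1, 0]}
breed_b = {"snp1": [0, 0, 1, 0], "snp2": [1, 0, 1, 1], "snp3": [0, 1, 1, 0]}

ranking = rank_snps_by_delta(breed_a, breed_b)
panel = ranking[:2]  # a reduced panel keeps only the top-ranked SNPs
```

A reduced panel built this way could then be passed to any classifier; the abstract reports that all tested panels assigned animals to breeds with close to 100% accuracy.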
2.
Background
Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence based search methods have been exploited for protein family prediction, less effort has been devoted to the prediction of OBPs from sequence data and this area is more challenging due to poor sequence identity between these proteins.
3.
Leimarembi Devi Naorem Mathavan Muthaiyan Amouda Venkatesan 《Journal of cellular biochemistry》2019,120(4):6154-6167
Triple-negative breast cancer (TNBC) has attracted more attention than other breast cancer subtypes because of its aggressive nature and poor prognosis, and because chemotherapy remains the mainstay of treatment, with no other approved targeted therapy. Therefore, the study aimed to discover more promising therapeutic targets and to gain new insights into the biological mechanisms of TNBC. Six microarray data sets consisting of 463 non-TNBC and 405 TNBC samples were mined from Gene Expression Omnibus. The data sets were integrated by meta-analysis, which identified 1075 differentially expressed genes. A protein-protein interaction network consisting of 486 nodes and 1932 edges was constructed, from which 29 hub genes with high topological measures were obtained. Further, 16 features (hub genes), 12 upregulated (AURKB, CCNB2, CDC20, DDX18, EGFR, ENO1, MYC, NUP88, PLK1, PML, POLR2F, and SKP2) and four downregulated (CCND1, GLI3, SKP1, and TGFB3), were selected through a machine learning correlation-based feature selection method on the training data set. A naïve Bayes classifier built using the expression profiles of the 16 features (hub genes) accurately and reliably classified TNBC from non-TNBC samples in the validation test data set, with a receiver operating curve of 0.93 to 0.98. Subsequently, Gene Ontology analysis revealed that the hub genes were enriched in mitotic cell cycle processes, and Kyoto Encyclopedia of Genes and Genomes pathway analysis showed that they were enriched in cell cycle pathways. Thus, the key hub genes and pathways identified in the study would enhance the understanding of the molecular mechanism of TNBC and may serve as potential therapeutic targets.
4.
The accuracy of the secondary structure element (SSE) identification from volumetric protein density maps is critical for de-novo backbone structure derivation in electron cryo-microscopy (cryoEM). It is still challenging to detect the SSE automatically and accurately from the density maps at medium resolutions (~5-10 Å). We present a machine learning approach, SSELearner, to automatically identify helices and β-sheets by using the knowledge from existing volumetric maps in the Electron Microscopy Data Bank. We tested our approach using 10 simulated density maps. The averaged specificity and sensitivity for the helix detection are 94.9% and 95.8%, respectively, and those for the β-sheet detection are 86.7% and 96.4%, respectively. We have developed a secondary structure annotator, SSID, to predict the helices and β-strands from the backbone Cα trace. With the help of SSID, we tested our SSELearner using 13 experimentally derived cryo-EM density maps. The machine learning approach shows the specificity and sensitivity of 91.8% and 74.5%, respectively, for the helix detection and 85.2% and 86.5% respectively for the β-sheet detection in cryoEM maps of Electron Microscopy Data Bank. The reduced detection accuracy reveals the challenges in SSE detection when the cryoEM maps are used instead of the simulated maps. Our results suggest that it is effective to use one cryoEM map for learning to detect the SSE in another cryoEM map of similar quality.
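The specificity and sensitivity figures quoted in this abstract follow the standard definitions, which can be sketched as below. This is a minimal illustration, assuming a per-voxel binary labelling (1 = helix, 0 = not helix); the labelling scheme, function names, and toy labels are hypothetical and are not taken from SSELearner.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of true positives that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that were correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Toy labels (invented): 4 helix voxels and 6 non-helix voxels.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(predicted, actual)
helix_sensitivity = sensitivity(tp, fn)
helix_specificity = specificity(tn, fp)
```

Reporting both numbers matters here because the two can move in opposite directions, as in the drop in helix sensitivity from 95.8% to 74.5% between simulated and experimental maps while specificity stays above 90%.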
5.
6.
7.
De Bruyne K Slabbinck B Waegeman W Vauterin P De Baets B Vandamme P 《Systematic and applied microbiology》2011,34(1):20-29
At present, there is much variability between MALDI-TOF MS methodology for the characterization of bacteria through differences in e.g., sample preparation methods, matrix solutions, organic solvents, acquisition methods and data analysis methods. After evaluation of the existing methods, a standard protocol was developed to generate MALDI-TOF mass spectra obtained from a collection of reference strains belonging to the genera Leuconostoc, Fructobacillus and Lactococcus. Bacterial cells were harvested after 24 h of growth at 28 °C on the media MRS or TSA. Mass spectra were generated, using the CHCA matrix combined with a 50:48:2 acetonitrile:water:trifluoroacetic acid matrix solution, and analyzed by the cell smear method and the cell extract method. After a data preprocessing step, the resulting high quality data set was used for PCA, distance calculation and multi-dimensional scaling. Using these analyses, species-specific information in the MALDI-TOF mass spectra could be demonstrated. As a next step, the spectra, as well as the binary character set derived from these spectra, were successfully used for species identification within the genera Leuconostoc, Fructobacillus, and Lactococcus. Using MALDI-TOF MS identification libraries for Leuconostoc and Fructobacillus strains, 84% of the MALDI-TOF mass spectra were correctly identified at the species level. Similarly, the same analysis strategy within the genus Lactococcus resulted in 94% correct identifications, taking species and subspecies levels into consideration. Finally, two machine learning techniques were evaluated as alternative species identification tools. The two techniques, support vector machines and random forests, resulted in accuracies between 94% and 98% for the identification of Leuconostoc and Fructobacillus species, respectively.
8.
Krishna Kumar Kandaswamy Ganesan Pugalenthi Kai-Uwe Kalies P.N. Suganthan 《Biochemical and biophysical research communications》2010,391(3):1306-1311
Eukaryotic protein secretion generally occurs via the classical secretory pathway that traverses the ER and Golgi apparatus. Secreted proteins usually contain a signal sequence with all the essential information required to target them for secretion. However, some proteins like fibroblast growth factors (FGF-1, FGF-2), interleukins (IL-1 alpha, IL-1 beta), galectins and thioredoxin are exported by an alternative pathway. This is known as leaderless or non-classical secretion and works without a signal sequence. Most computational methods for the identification of secretory proteins use the signal peptide as indicator and are therefore not able to identify substrates of non-classical secretion. In this work, we report a random forest method, SPRED, to identify secretory proteins from protein sequences irrespective of N-terminal signal peptides, thus also allowing correct classification of non-classical secretory proteins. Training was performed on a dataset containing 600 extracellular proteins and 600 cytoplasmic and/or nuclear proteins. The algorithm was tested on 180 extracellular proteins and 1380 cytoplasmic and/or nuclear proteins. We obtained 85.92% accuracy from training and 82.18% accuracy from testing. Since SPRED does not use N-terminal signals, it can detect non-classical secreted proteins by filtering those secreted proteins with an N-terminal signal by using SignalP. SPRED predicted 15 out of 19 experimentally verified non-classical secretory proteins. By scanning the entire human proteome we identified 566 protein sequences potentially undergoing non-classical secretion. The dataset and standalone version of the SPRED software are available at http://www.inb.uni-luebeck.de/tools-demos/spred/spred.
9.
10.
11.
Vieira J Mendes MV Albuquerque P Moradas-Ferreira P Tavares F 《Letters in applied microbiology》2007,44(5):506-512
AIMS: To develop and establish a methodology for an oriented and fast identification of species taxa-specific molecular markers useful for the identification of micro-organisms. METHODS AND RESULTS: From the complete microbial genomes available in Pfam database, taxa-specific protein domains were identified, which led to the selection of taxa-specific loci. This strategy was used to identify six genetic markers: four specific for Pseudomonas syringae pv. tomato, one specific for P. syringae pv. syringae and one specific for P. putida. The discriminatory potential of these loci was evaluated by Southern hybridization using several pseudomonad species and pathovars, by dot-blot hybridization and by multiplex PCR optimized for the simultaneous detection of P. putida, P. syringae pv. syringae and P. syringae pv. tomato. Sensitivity assays indicated a detection limit of approximately 10 pg of chromosomal DNA template needed for each bacterium. CONCLUSIONS: The proposed methodology was efficient in the selection of six Pseudomonas-specific markers able to discriminate Pseudomonas at the species and pathovar level. SIGNIFICANCE AND IMPACT OF THE STUDY: The oriented search of taxa-specific molecular probes described in this work, which can be easily extended to other groups of bacteria, will improve the accuracy and expedite the identification of micro-organisms by DNA-based molecular methods.
12.
Bielach A Duclercq J Marhavý P Benková E 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2012,367(1595):1469-1478
Phytohormones are important plant growth regulators that control many developmental processes, such as cell division, cell differentiation, organogenesis and morphogenesis. They regulate a multitude of apparently unrelated physiological processes, often with overlapping roles, and they mutually modulate their effects. These features imply important synergistic and antagonistic interactions between the various plant hormones. Auxin and cytokinin are central hormones involved in the regulation of plant growth and development, including processes determining root architecture, such as root pole establishment during early embryogenesis, root meristem maintenance and lateral root organogenesis. Thus, to control root development both pathways put special demands on the mechanisms that balance their activities and mediate their interactions. Here, we summarize recent knowledge on the role of auxin and cytokinin in the regulation of root architecture with special focus on lateral root organogenesis, discuss the latest findings on the molecular mechanisms of their interactions, and present a forward genetic screen as a tool to identify novel molecular components of the auxin and cytokinin crosstalk.
13.
Junior Barrera Roberto M CesarJr Carlos HumesJr David C MartinsJr Diogo FC Patrão Paulo JS Silva Helena Brentani 《BMC bioinformatics》2007,8(1):169
Background
One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements.
14.
15.
Proteins often undergo conformational changes when binding to each other. A major fraction of backbone conformational changes involves motion on the protein surface, particularly in loops. Accounting for the motion of protein surface loops represents a challenge for protein-protein docking algorithms. A first step in addressing this challenge is to distinguish protein surface loops that are likely to undergo backbone conformational changes upon protein-protein binding (mobile loops) from those that are not (stationary loops). In this study, we developed a machine learning strategy based on support vector machines (SVMs). Our SVM uses three features of loop residues in the unbound protein structures (Ramachandran angles, crystallographic B-factors, and relative accessible surface area) to distinguish mobile loops from stationary ones. This method yields an average prediction accuracy of 75.3% compared with a random prediction accuracy of 50%, and an average of 0.79 area under the receiver operating characteristic (ROC) curve using cross-validation. Testing the method on an independent dataset, we obtained a prediction accuracy of 70.5%. Finally, we applied the method to 11 complexes that involve members from the Ras superfamily and achieved prediction accuracy of 92.8% for the Ras superfamily proteins and 74.4% for their binding partners.
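The area under the ROC curve reported in this abstract can be computed directly from classifier scores, without drawing the curve, as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below shows that rank-based calculation; it is a generic illustration, and the scores and labels (1 = mobile loop, 0 = stationary loop) are invented rather than taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    example scores higher; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy decision values (invented): three mobile and three stationary loops.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to the random baseline mentioned in the abstract and 1.0 to perfect separation. The pairwise loop is quadratic in the number of examples, which is fine for a sketch but would be replaced by a sort-based version on large datasets.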
16.
17.
Background
A better understanding of the mechanisms involved in gas-phase fragmentation of peptides is essential for the development of more reliable algorithms for high-throughput protein identification using mass spectrometry (MS). Current methodologies depend predominantly on the use of derived m/z values of fragment ions, and the knowledge provided by the intensity information present in MS/MS spectra has not been fully exploited. Indeed spectrum intensity information is very rarely utilized in the algorithms currently in use for high-throughput protein identification.
18.
19.
High-throughput genotyping and sequencing techniques are rapidly and inexpensively providing large amounts of human genetic variation data. Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability and have been implicated in several human diseases, including cancer. Amino acid mutations resulting from non-synonymous SNPs in coding regions may generate protein functional changes that affect cell proliferation. In this study, we developed a machine learning approach to predict cancer-causing missense variants. We present a Support Vector Machine (SVM) classifier trained on a set of 3163 cancer-causing variants and an equal number of neutral polymorphisms. The method achieves 93% overall accuracy, a correlation coefficient of 0.86, and an area under the ROC curve of 0.98. When compared with other previously developed algorithms such as SIFT and CHASM, our method results in higher prediction accuracy and correlation coefficient in identifying cancer-causing variants.
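The relationship between the accuracy and correlation coefficient quoted in this abstract can be illustrated with a short calculation. The sketch assumes that "correlation coefficient" means the Matthews correlation coefficient (MCC), which the abstract does not state, and uses hypothetical confusion counts for a balanced set of 1000 positives and 1000 negatives with the same error rate in each class; none of these counts come from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: balanced classes, 93% correct in each class.
tp, fn = 930, 70   # cancer-causing variants: detected / missed
tn, fp = 930, 70   # neutral polymorphisms: rejected / false alarms

acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Under these assumptions the two reported figures are mutually consistent: on a balanced set with equal per-class error rates, MCC reduces to sensitivity plus specificity minus one, so 93% accuracy corresponds to an MCC of 0.86.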
20.
This paper investigated the application of a machine learning approach (support vector machine, SVM) to the automatic recognition of gait changes due to ageing using three types of gait measures: basic temporal/spatial, kinetic and kinematic. The gaits of 12 young and 12 elderly participants were recorded and analysed using a synchronized PEAK motion analysis system and a force platform during normal walking. Altogether, 24 gait features describing the three types of gait characteristics were extracted for developing gait recognition models and later testing of generalization performance. Test results indicated an overall accuracy of 91.7% by the SVM in its capacity to distinguish the two gait patterns. The classification ability of the SVM was found to be unaffected across six kernel functions (linear, polynomial, radial basis, exponential radial basis, multi-layer perceptron and spline). Gait recognition rate improved when features were selected from different gait data types. A feature selection algorithm demonstrated that as few as three gait features, one selected from each data type, could effectively distinguish the age groups with 100% accuracy. These results demonstrate considerable potential in applying SVMs in gait classification for many applications.