20 similar documents found; search took 15 ms.
1.
As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates is an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings and better than a DL classifier based on LSTM with one-hot encoding. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integrating it with a traditional machine learning (ML) classifier. Accordingly, we developed an integrated approach called LEMP, which combines LSTMWE with a random forest classifier that uses a novel encoding of enhanced amino acid content. LEMP not only outperforms the individual classifiers but is also superior to currently available malonylation predictors. Additionally, it shows promising performance at a low false positive rate, which is highly useful in practical prediction applications. Overall, LEMP is a useful tool for identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
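The random forest half of LEMP relies on an "enhanced amino acid content" encoding. The abstract does not define it, so the following is only a minimal sketch of the widely used sliding-window amino acid composition idea (as in EAAC-style encodings in feature toolkits): a fixed-size window slides along the peptide and the 20 amino-acid frequencies of each window are concatenated. The window size of 5 and the example peptide are assumptions, not LEMP's actual settings.

```python
from collections import Counter

# The 20 standard amino acids; the order fixes the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(peptide: str, window: int = 5) -> list[float]:
    """Sliding-window amino acid composition for one peptide (assumed EAAC-style).

    Returns (len(peptide) - window + 1) blocks of 20 frequencies, flattened.
    Non-standard residues (e.g. 'X' padding) count toward the window length
    but are not assigned to any of the 20 amino-acid bins.
    """
    if window < 1 or window > len(peptide):
        raise ValueError("window must be between 1 and len(peptide)")
    features: list[float] = []
    for start in range(len(peptide) - window + 1):
        counts = Counter(peptide[start:start + window])
        # Frequency of each standard residue inside this window.
        features.extend(counts[aa] / window for aa in AMINO_ACIDS)
    return features
```

For example, `eaac("AKKGASK", window=5)` yields three windows and therefore 60 features; a real malonylation-site window would be a longer peptide centred on the candidate lysine.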
2.
Neural stem cells (NSCs) are capable of self-renewal and of differentiating into neurons, astrocytes and oligodendrocytes under specific local microenvironments. Here, we present a set of methods for the three-dimensional (3D) differentiation and miRNA analysis of a clonal human neural stem cell (hNSC) line that is currently in clinical trials for stroke disability (NCT02117635 and NCT01151124, ClinicalTrials.gov). The hNSCs were derived from ethically approved first-trimester human fetal cortex and conditionally immortalized by retroviral integration of a single copy of the c-mycERTAM construct. We describe how to measure axon process outgrowth of hNSCs differentiated on 3D scaffolds and how to quantify the associated changes in miRNA expression using a PCR array. Furthermore, we illustrate a computational analysis for selecting putative miRNA targets. SOX5 and NR4A3 were identified as suitable putative targets of selected, significantly down-regulated miRNAs in differentiated hNSCs. miRNA target validation was performed on the SOX5 and NR4A3 3′UTRs by dual-reporter plasmid transfection and a dual-luciferase assay.
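The abstract does not say how PCR-array Ct values were converted into expression changes. A common choice for this kind of array is the comparative Ct (2^-ΔΔCt) method, so the sketch below assumes it: the target miRNA's Ct is normalized to a reference small RNA in each condition, and the difference between conditions becomes a fold change. The reference RNA, the replicate Ct values, and the perfect-doubling efficiency are all illustrative assumptions.

```python
from statistics import mean


def fold_change_ddct(target_sample, ref_sample, target_control, ref_control,
                     efficiency=2.0):
    """Relative expression of a target miRNA by the comparative Ct (2^-ddCt) method.

    Each argument is a list of replicate Ct values; replicates are averaged first.
    efficiency=2.0 assumes perfect doubling per PCR cycle.
    Values below 1 indicate down-regulation in the sample relative to control.
    """
    delta_sample = mean(target_sample) - mean(ref_sample)     # normalize to reference RNA
    delta_control = mean(target_control) - mean(ref_control)
    delta_delta = delta_sample - delta_control
    return efficiency ** (-delta_delta)
```

With hypothetical Ct values of 28 (target) and 20 (reference) in differentiated hNSCs versus 26 and 20 in undifferentiated controls, ΔΔCt is 2 and the fold change is 0.25, i.e. a four-fold down-regulation.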
8.
Abdominal aortic aneurysm (AAA) is frequently lethal and has no effective pharmaceutical treatment, posing a serious threat to human health. Previous bioinformatics studies of the mechanisms underlying AAA relied largely on detecting direct protein-protein interactions (level-1 PPI) between the products of reported AAA-related genes. Consequently, proteins of pivotal importance to AAA that are not directly linked to the products of previously reported genes might have been missed. In this study, we constructed an indirect protein-protein interaction (level-2 PPI) network based on the proteins that commonly interact with the products of known AAA-related genes, and successfully used this network to predict previously unreported AAA-related genes. We used four methods to test and verify the performance of this level-2 PPI network: cross-validation, comparison with human AAA mRNA chip array data, literature mining, and verification in a mouse CaPO4 AAA model. We confirmed that the new level-2 PPI network is superior to the original level-1 PPI network and showed that the top 100 candidate genes predicted by the level-2 PPI network share GO functions and KEGG pathways similar to those of the positive genes.
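The level-2 idea can be made concrete with a small graph sketch. Assuming that "level-2" means a candidate protein and a known AAA protein (a seed) share at least one common interactor (seed, mediator, candidate), each non-seed protein can be scored by how many distinct seeds it reaches this way. The scoring rule and the toy edge list are assumptions for illustration; the study's actual network construction and ranking may differ.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs; self-loops ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def level2_scores(edges, seeds):
    """Score each non-seed protein by the number of distinct seeds it shares an interactor with.

    A candidate c is linked to seed s at level 2 when some mediator m interacts
    with both (s - m - c). Returns (candidate, score) pairs, highest score first,
    ties broken alphabetically.
    """
    adj = build_adjacency(edges)
    seed_set = set(seeds)
    linked = defaultdict(set)  # candidate -> seeds reached through a shared interactor
    for s in seed_set:
        for m in adj.get(s, ()):
            for c in adj[m]:
                if c not in seed_set:
                    linked[c].add(s)
    return sorted(((c, len(ss)) for c, ss in linked.items()),
                  key=lambda t: (-t[1], t[0]))
```

In a toy network where seeds `S1` and `S2` both bind mediator `M1`, which also binds `C1` and `C2`, both candidates score 2 even though neither interacts directly with a seed; a level-1 search would miss them.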
10.
Single amino acid variants (SAVs) are the most abundant form of known genetic variation associated with human disease. Successfully predicting the functional impact of SAVs from sequences can therefore improve our understanding of why a SAV may be associated with a particular disease. In this work, we constructed a high-quality structural dataset of 679 protein structures carrying 2,048 SAVs by collecting human genetic variant data from multiple resources and dividing the variants into two categories, i.e., disease-associated and neutral. We built a two-stage random forest (RF) model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with additional features not explored in previous studies. Importantly, we proposed a two-step feature selection procedure to select the most important and informative features contributing to the prediction of the disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved an area under the curve (AUC) of 0.882, which is competitive with, and in some cases better than, existing tools including SIFT, SNAP, Polyphen2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.
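To make the reported AUC of 0.882 concrete, here is a minimal sketch of how an AUC can be computed from a classifier's scores, assuming disease-associated SAVs are labelled 1 and neutral SAVs 0. It uses the rank interpretation: the probability that a randomly chosen disease-associated variant receives a higher score than a randomly chosen neutral one, with ties counted as one half. The toy labels and scores are invented and are not FunSAV output.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2.

    labels: 1 for disease-associated, 0 for neutral (assumed coding).
    scores: predicted probability of being disease-associated.
    O(n_pos * n_neg), which is fine for a sketch but not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.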
11.
Identifying the tissues in which a microRNA is expressed could enhance our understanding of the functions, biological processes, and diseases associated with that microRNA. However, the mechanisms of microRNA biogenesis and expression remain largely unclear, and the identification of the tissues in which a microRNA is expressed is limited. Here, we present a machine learning-based approach to predict whether an intronic microRNA shows high co-expression with its host gene; by doing so, we can infer the tissues in which the microRNA is highly expressed from the expression profile of its host gene. Our approach achieves an accuracy of 79% in leave-one-out cross-validation and 95% on an independent test dataset. We further evaluated our method by comparing the tissue-specific microRNAs it predicted with those identified by biological experiments. This study presents a valuable tool for predicting the co-expression patterns between human intronic microRNAs and their host genes, which should also help elucidate the mechanisms of microRNA expression and regulation. Finally, this framework can be easily extended to other species.
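The abstract does not state how "high co-expression" is quantified. As a minimal sketch, assume it is the Pearson correlation between the miRNA and host-gene expression profiles across matched tissues, with a hypothetical cutoff of 0.5 defining the positive class. Once a miRNA is deemed co-expressed (in the study, by the trained classifier), its tissues can be inferred from host-gene expression alone; the expression cutoff and tissue values below are also hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def infer_tissues(host_expression, co_expressed, min_expression):
    """Tissues where the miRNA is inferred to be highly expressed, via its host gene.

    host_expression: {tissue: expression level of the host gene}.
    Returns an empty list when co-expression is not supported.
    """
    if not co_expressed:
        return []
    return sorted(t for t, level in host_expression.items() if level >= min_expression)
```

For instance, `pearson([1, 2, 3, 4], [2, 4, 6, 8])` is 1.0, so under the assumed 0.5 cutoff the pair counts as co-expressed, and a host gene expressed at 8.0 in brain, 6.0 in heart and 2.0 in liver with `min_expression=5.0` points to brain and heart.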
12.
Identifying genes that drive the mechanism of a disease from expression data is extremely useful for understanding how that mechanism works. This in turn may lead to better diagnoses and could potentially lead to a cure for the disease. The task becomes extremely challenging when the data are characterised by a small number of samples and a high number of dimensions, as is often the case with gene expression data. Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation, which are key to robustly identifying the most predictive features in such data. Within this framework, we propose a simple selective naive Bayes classifier discovered using a global search technique, and combine it with data perturbation to increase its robustness for small sample sizes. An extensive validation of the method was carried out using two applied microarray datasets and a simulated dataset, all confounded by small sample sizes and high dimensionality. The method was shown to be capable of selecting genes known to be associated with prostate cancer and viral infections.
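The data-perturbation idea can be illustrated independently of the classifier. The sketch below repeatedly draws class-stratified bootstrap resamples, runs a feature selector on each, and reports how often every gene is selected; genes chosen consistently are the robust ones. Note that the selector here is a deliberately simple stand-in (ranking genes by the absolute difference in class means), not the selective naive Bayes with global search described in the abstract, and `k`, `n_boot` and the toy data are assumptions.

```python
import random
from collections import Counter


def _mean(values):
    return sum(values) / len(values)


def top_k_by_mean_diff(X, y, k):
    """Stand-in selector: rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(_mean(a) - _mean(b)))
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]


def selection_frequency(X, y, k=1, n_boot=200, seed=0):
    """Fraction of class-stratified bootstrap resamples in which each feature is selected."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    counts = Counter()
    for _ in range(n_boot):
        # Resample within each class so every resample keeps both classes.
        idx = [rng.choice(pos) for _ in pos] + [rng.choice(neg) for _ in neg]
        counts.update(top_k_by_mean_diff([X[i] for i in idx], [y[i] for i in idx], k))
    return {j: counts[j] / n_boot for j in range(len(X[0]))}
```

With six samples where gene 0 separates the classes by roughly five units and genes 1 and 2 barely differ, gene 0 is selected in every resample (frequency 1.0) when `k=1`.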
17.
Spatially continuous predictions of seabed hardness are important baseline environmental information for the sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy, and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models of seabed hardness using random forest (RF), based on point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. The effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach of pre-selecting predictive variables by excluding highly correlated variables needs to be re-examined; 4) identifying the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) rather than the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency, and caution should be taken when applying filter FS methods to select predictive models.
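Conclusion 3) questions the common practice of dropping highly correlated predictors before modelling. For clarity about what that practice does, here is a minimal sketch of a typical greedy correlation filter: predictors are visited in order, and each is kept only if its absolute Pearson correlation with every already-kept predictor is at or below a threshold. The 0.9 threshold, the visiting order, and the predictor names (`bathy`, `slope`, `bathy_smoothed`) are hypothetical; this is the pre-selection step the study recommends re-examining, not one of its five FS methods.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length predictor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant predictor has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Greedy pre-filter: keep a predictor only if |r| <= threshold with all kept ones.

    columns: {name: values}, visited in insertion order, so earlier predictors win.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

With `bathy = [10, 20, 30, 40]`, `slope = [1, 3, 2, 5]` (r ≈ 0.83 with `bathy`) and `bathy_smoothed = [11, 21, 29, 41]` (r ≈ 0.997 with `bathy`), the filter keeps `bathy` and `slope` and discards `bathy_smoothed`, even though, as the study notes, such a discarded predictor could still improve RF accuracy.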