Similar Documents
20 similar documents found.
1.
W Zhang  Y Niu  Y Xiong  M Zhao  R Yu  J Liu 《PloS one》2012,7(8):e43575

Motivation

Conformational B-cell epitopes are the specific sites on antigens that have immune functions. Identifying conformational B-cell epitopes is of great importance to immunologists because it facilitates the design of peptide-based vaccines. To narrow the search for experimental validation, various computational models have been developed to predict epitopes from antigen structures. However, the application of these models is limited by the small number of available antigen structures. In contrast to most available structure-based methods, here we attempt to accurately predict conformational B-cell epitopes from antigen sequences.

Methods

In this paper, we explore various sequence-derived features that have been observed to be associated with epitope locations or have been used in similar tasks. These features are evaluated and ranked by their discriminative performance on the benchmark datasets. From the perspective of information science, combining various features usually leads to better results than using individual features. To build a robust model, we adopt an ensemble learning approach to incorporate various features, and develop an ensemble model to predict conformational epitopes from antigen sequences.
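One simple way to picture the feature-combination step is score averaging. The sketch below is a minimal illustration under stated assumptions, not the authors' ensemble: it assumes each feature-specific model already outputs one epitope score per residue, and the feature names and scores in the usage lines are hypothetical.

```python
from typing import Dict, List, Optional

def ensemble_scores(per_feature_scores: Dict[str, List[float]],
                    weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Average per-residue epitope scores from several feature-specific models.

    per_feature_scores maps a model (feature) name to one score per residue.
    With weights=None every model counts equally; otherwise a weighted mean is used.
    """
    names = list(per_feature_scores)
    n = len(per_feature_scores[names[0]])
    if any(len(per_feature_scores[name]) != n for name in names):
        raise ValueError("every model must score the same residues")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return [sum(weights[name] * per_feature_scores[name][i] for name in names) / total
            for i in range(n)]

# Hypothetical feature names and scores for a 3-residue antigen fragment.
combined = ensemble_scores({"hydrophobicity": [0.2, 0.8, 0.5],
                            "accessibility": [0.4, 0.6, 0.9]})
```

Equal weights give a plain average; learned weights (for example from a held-out set) are one assumed way to let stronger features dominate.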

Results

Evaluated by leave-one-out cross-validation, the proposed method achieves mean AUC scores of 0.687 and 0.651 on two datasets compiled from bound and unbound structures, respectively. Compared with publicly available servers on an independent dataset, our method yields better or comparable performance. The results demonstrate that the proposed method is useful for sequence-based conformational epitope prediction.
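The mean AUC can be computed from the rank interpretation of the ROC curve: the probability that a randomly chosen epitope residue outscores a randomly chosen non-epitope residue. This sketch assumes binary residue labels (1 = epitope) and, as an assumption about the averaging, one AUC per held-out antigen averaged over the leave-one-out folds.

```python
from typing import List, Sequence, Tuple

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; a tied positive/negative pair counts as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_auc(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average AUC over held-out antigens, one (labels, scores) pair per fold."""
    values = [auc(labels, scores) for labels, scores in folds]
    return sum(values) / len(values)
```

The double loop is quadratic in the number of residues, which is acceptable for single antigens; a rank-based formula scales better for large sets.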

Availability

The web server and datasets are freely available at http://bcell.whu.edu.cn.

2.
3.

Background

Many computational microRNA target prediction tools focus on several key features, including complementarity to the 5′ seed of miRNAs and evolutionary conservation. While these features allow for successful target identification, not all miRNA target sites are conserved or adhere to canonical seed complementarity. Several studies have advocated energy features of mRNA:miRNA duplexes as an alternative. However, independent evaluations have reported conflicting results on the reliability of energy-based predictions. Here, we reassess the usefulness of energy features for mammalian target prediction, aiming to relax or eliminate the need for perfect seed matches and the conservation requirement.

Methodology/Principal Findings

We detect significant differences in energy features at experimentally supported human miRNA target sites and at genome-wide sites of AGO protein interaction. This trend is confirmed on datasets that assay the effect of miRNAs on mRNA and protein expression changes, and a simple linear regression model yields a significant correlation between predicted and observed expression changes. Compared with 6-mer seed matches as a baseline, our energy-based model achieves ∼3–5-fold enrichment of highly down-regulated targets and allows strictly imperfect targets to be predicted with enrichment above baseline.
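The 6-mer seed match used as the baseline is a site in the mRNA that pairs with miRNA nucleotides 2-7. A minimal sketch, assuming RNA-alphabet input (T is converted to U) and exact Watson-Crick pairing only, with no G:U wobble; it does not implement the energy-based model itself.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna: str) -> str:
    """mRNA 6-mer (written 5'->3') that pairs with miRNA nucleotides 2-7."""
    seed = mirna.upper().replace("T", "U")[1:7]  # 1-based positions 2..7
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_6mer_sites(mirna: str, utr: str) -> List[int]:
    """0-based start positions of every exact 6-mer seed site in a UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - 5) if utr[i:i + 6] == site]

let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a
sites = find_6mer_sites(let7a, "AAUACCUCGGUACCUC")
```

For let-7a the seed GAGGUA gives the site UACCUC, so the toy UTR above contains two sites.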

Conclusions/Significance

In conclusion, our results indicate significant promise for energy-based miRNA target prediction that includes a broader range of targets without having to use conservation or impose stringent seed match rules.

4.

Background

MicroRNA (miRNA) sponges with multiple tandem miRNA binding sequences can sequester miRNAs from their endogenous target mRNAs. Therefore, miRNA sponges acting as decoys are extremely important for long-term loss-of-function studies both in vivo and in silico. Recently, a growing number of in silico methods have been used as an effective technique to generate hypotheses for in vivo studies of the biological functions and regulatory mechanisms of miRNA sponges. However, most existing in silico methods focus only on miRNA sponge interactions or networks in cancer, and the module-level properties of miRNA sponges in cancer are still largely unknown.

Results

We propose a novel in silico method, called miRSM (miRNA Sponge Module), to infer miRNA sponge modules in breast cancer. We apply miRSM to the breast invasive carcinoma (BRCA) dataset provided by The Cancer Genome Atlas (TCGA) and perform functional validation of the computational results. We discover that most miRNA sponge interactions are module-conserved across two modules, while a minority are module-specific, existing only in a single module. Through functional annotation and differential expression analysis, we also find that the modules discovered by miRSM are functional miRNA sponge modules associated with BRCA. Moreover, the module-specific miRNA sponge interactions may be involved in the progression and development of BRCA. Our experimental results show that miRSM is comparable to the benchmark methods in recovering experimentally confirmed miRNA sponge interactions, and outperforms them in identifying interactions related to breast cancer.

Conclusions

Altogether, the functional validation results demonstrate that miRSM is a promising method to identify miRNA sponge modules and interactions, and may provide new insights for understanding the roles of miRNA sponges in cancer progression and development.

5.
6.

Background

MicroRNAs (miRNAs) are a class of endogenous small regulatory RNAs. Identifying dysregulated or perturbed miRNAs and their key target genes is important for understanding the regulatory networks associated with the cellular processes under study. Several computational methods have been developed to infer perturbed miRNA regulatory networks by integrating genome-wide gene expression data and sequence-based miRNA-target predictions. However, most of them use only the expression information of direct miRNA targets and rarely consider the secondary effects of miRNA perturbation on the global gene regulatory network.

Results

We proposed a network propagation-based method to infer the perturbed miRNAs and their key target genes by integrating gene expression and global gene regulatory network information. The method used random walk with restart in gene regulatory networks to model the network effects of miRNA perturbation. It then evaluated the significance of the correlation between these network effects and gene differential expression levels with a forward searching strategy. Results show that our method outperformed several compared methods in rediscovering the experimentally perturbed miRNAs in cancer cell lines. We then applied it to a gene expression dataset of clinical colorectal cancer patient samples and inferred the perturbed miRNA regulatory networks of colorectal cancer, including several known oncogenic or tumor-suppressive miRNAs such as miR-17, miR-26 and miR-145.
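Random walk with restart propagates a perturbation signal from a miRNA's target genes through the network. The sketch below uses the standard iteration p ← (1 − r)·W·p + r·p0 with a column-normalised weight matrix W; the restart probability 0.3, the tolerance and the toy graph are assumptions, and the paper's forward searching and significance testing are not shown.

```python
from typing import List, Set

def random_walk_with_restart(adj: List[List[float]], seeds: Set[int],
                             restart: float = 0.3, tol: float = 1e-10,
                             max_iter: int = 1000) -> List[float]:
    """Steady-state visiting probabilities of a walk that restarts at `seeds`.

    adj[i][j] is the non-negative weight of the edge from gene j to gene i.
    Columns are normalised to sum to 1, then we iterate
    p <- (1 - restart) * W p + restart * p0 until the L1 change drops below tol.
    Genes with no outgoing edges simply lose the probability mass they hold.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            spread = sum(adj[i][j] * p[j] / col_sums[j]
                         for j in range(n) if col_sums[j] > 0)
            new_p.append((1 - restart) * spread + restart * p0[i])
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p

# Toy undirected chain 0-1-2-3; gene 0 is the (hypothetical) miRNA target.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = random_walk_with_restart(chain, seeds={0})
```

Scores decay with network distance from the seed, which is the "network effect" that can then be correlated with differential expression.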

Conclusions

Our network propagation based method takes advantage of the network effect of the miRNA perturbation on its target genes. It is a useful approach to infer the perturbed miRNAs and their key target genes associated with the studied biological processes using gene expression data.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-255) contains supplementary material, which is available to authorized users.

7.

Background

Experimental identification of microRNA (miRNA) targets is a difficult and time-consuming process. As a consequence, several computational methods have been devised to predict targets for follow-up experimental validation. Current computational target prediction methods use only the miRNA sequence as input. With an increasing number of experimentally validated targets becoming available, using this additional information in the search for further targets may help improve the specificity of computational target site prediction.

8.
9.
10.
11.
12.
W Zhang  X Zhu  Y Fu  J Tsuji  Z Weng 《BMC bioinformatics》2017,18(13):464-11

Background

Alternative splicing is a critical process in gene coding that removes introns and joins exons, and splicing branchpoints are indicators of alternative splicing. Wet-lab experiments have identified a large number of human splicing branchpoints, but many branchpoints remain unknown. To guide wet-lab experiments, we develop computational methods to predict human splicing branchpoints.

Results

Because an intron may have multiple branchpoints, we formulate branchpoint prediction as a multi-label learning problem and attempt to predict branchpoint sites from intron sequences. First, we investigate a variety of features derived from intron sequences, such as the sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile and polypyrimidine tract profile. Second, we consider several multi-label learning methods (partial least squares regression, canonical correlation analysis and regularized canonical correlation analysis) and use them as the base classification engines. Third, we propose two ensemble learning schemes that integrate different features and different classifiers to build ensemble learning systems for branchpoint prediction: one is a genetic algorithm-based weighted average ensemble method, and the other is a logistic regression-based ensemble method.
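The multi-label formulation and the simplest feature, the sparse profile, can be sketched as follows. This assumes the common meaning of "sparse profile" (one-hot encoding per nucleotide) and 0-based branchpoint positions; the paper's exact encodings may differ.

```python
from typing import Iterable, List

def sparse_profile(seq: str, alphabet: str = "ACGT") -> List[int]:
    """One-hot ("sparse") encoding: 4 binary values per position, concatenated.
    Characters outside the alphabet (e.g. N) are encoded as all zeros."""
    vector: List[int] = []
    for base in seq.upper():
        vector.extend(1 if base == letter else 0 for letter in alphabet)
    return vector

def branchpoint_labels(intron_length: int, branchpoints: Iterable[int]) -> List[int]:
    """Multi-label target: one 0/1 label per intron position (0-based),
    so an intron with several branchpoints simply has several 1s."""
    labels = [0] * intron_length
    for pos in branchpoints:
        labels[pos] = 1
    return labels
```

Each intron then contributes one feature vector and one label vector, which is the input/output shape that multi-label regressors such as partial least squares expect.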

Conclusions

In the computational experiments, both ensemble learning methods outperform benchmark branchpoint prediction methods and produce high-accuracy results on the benchmark dataset.

13.

Background

Predicting protein subnuclear localization is a challenging problem. Previous methods based on non-sequence information, such as Gene Ontology annotations and kernel fusion, each have their own limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance using comprehensive information represented as a high-dimensional feature vector obtained by 11 feature extraction methods.

Methodology/Principal Findings

A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It considers only feature extraction methods based on amino acid classifications and physicochemical properties. To speed up the system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. In leave-one-out cross-validation, the overall prediction accuracy is 75.2% for 6 localizations on the Lei dataset and 72.1% for 9 localizations on the SNL9 dataset; it is 71.7% for the multi-localization dataset and 69.8% for the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis.
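A feature "based on amino acid classifications" can be as simple as the composition of residue classes. The abstract does not list its classes, so this sketch assumes the widely used three-class hydrophobicity grouping (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW); it is an illustration, not the paper's feature set.

```python
from typing import Dict

# Assumed grouping; covers all 20 standard amino acids exactly once.
HYDROPHOBICITY_CLASSES = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}

def class_composition(protein: str) -> Dict[str, float]:
    """Fraction of standard residues falling in each class (others are ignored)."""
    residues = [aa for aa in protein.upper()
                if any(aa in group for group in HYDROPHOBICITY_CLASSES.values())]
    n = len(residues)
    return {name: (sum(aa in group for aa in residues) / n if n else 0.0)
            for name, group in HYDROPHOBICITY_CLASSES.items()}
```

Several such groupings (charge, polarity, secondary-structure propensity) would each yield a short vector, and concatenating them gives the kind of high-dimensional input an SVM can use.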

Conclusions

Our method is effective and valuable for predicting protein subnuclear localizations. A web server implementing the proposed method is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.

14.

Motivation

Ischemic stroke, triggered by an obstruction in the cerebral blood supply, leads to infarction of the affected brain tissue. An accurate and reproducible automatic segmentation is of high interest, since the lesion volume is an important end-point for clinical trials. However, various factors, such as the high variance in lesion shape, location and appearance, render it a difficult task.

Methods

In this article, nine classification methods (e.g. Generalized Linear Models, Random Decision Forests and Convolutional Neural Networks) are evaluated and compared in terms of their accuracy and reliability for ischemic stroke lesion segmentation, using 37 multiparametric MRI datasets of ischemic stroke patients in the sub-acute phase. Within this context, a multi-spectral classification approach is compared against mono-spectral classification using only FLAIR MRI datasets, and two sets of expert segmentations are used to evaluate inter-observer agreement.
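Agreement between an automatic segmentation and an expert one (or between two experts) is commonly summarised by an overlap score. The abstract does not name its measure; this sketch assumes the Dice coefficient, 2|A∩B| / (|A| + |B|), on flattened binary lesion masks.

```python
from typing import Sequence

def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice overlap of two binary masks of equal size (flattened voxels).
    Two empty masks are treated as perfect agreement (1.0)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 1.0 if size == 0 else 2.0 * overlap / size
```

Computing the same score between the two expert segmentations gives the inter-observer reference that automatic methods are compared against.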

Results and Conclusion

The results of this study reveal that high-level machine learning methods lead to significantly better segmentation results than rather simple classification methods, pointing towards a difficult non-linear problem. The best overall segmentation results were achieved by a Random Decision Forest and a Convolutional Neural Network classification approach, even outperforming all previously published results. However, none of the methods tested in this work achieves results in the range of human inter-observer agreement, and automatic ischemic stroke lesion segmentation remains a complicated problem that needs to be explored in more detail to improve the segmentation results.

15.

Background

Natural or endogenous sense/antisense miRNAs, located on the sense and antisense strands of the same genomic region, respectively, have been detected recently. However, little is known about these miRNA pairs, especially their distributions in different animal species. Here we present a systematic analysis of them in human, mouse and rat miRNAs, together with their expression patterns based on deep sequencing datasets.

Methods and results

The phenomenon of miRNA–miRNA interaction could be detected in different animal species, and common miRNA pairs were found across species. These miRNA pairs could form miRNA:miRNA duplexes with fully complementary structures and tended to be located on specific chromosomes. They might be homologous miRNA genes (especially in human), be clustered in a gene cluster (especially in rat), or be simultaneously detected in different genomic regions owing to multicopy pre-miRNAs. Remarkably, some miRNA pairs located in different genomic regions also showed complementarity, as endogenous sense/antisense miRNAs do. Based on published deep sequencing datasets, one member of each miRNA pair was always abundantly expressed, whereas the other was quite rare. Few common target mRNAs of these miRNA pairs were predicted.
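Whether two mature miRNAs can form a perfect miRNA:miRNA duplex reduces to a reverse-complement comparison. This is a simplified sketch: it assumes equal-length sequences, Watson-Crick pairs only (no G:U wobble) and no overhangs, whereas real duplexes may tolerate both.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A-U and G-C pairs only)."""
    return "".join(PAIRS[base] for base in reversed(rna.upper()))

def fully_complementary(mir_a: str, mir_b: str) -> bool:
    """True if the two mature miRNAs pair base-for-base along their whole length."""
    return len(mir_a) == len(mir_b) and reverse_complement(mir_a) == mir_b.upper()
```

Relaxing the check (allowing wobble pairs or a minimum number of paired bases) would be the natural next step for scanning pairs that lie in different genomic regions.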

Conclusions

The interactions between miRNAs and their significant expression divergence imply a potentially complex mutual regulatory pattern in the miRNA world. This study enriches the miRNA regulatory network.

16.

Motivation

The two-locus model is a typical and important disease model to be identified in genome-wide association studies (GWAS). Owing to the heavy computational burden and the diversity of disease models, existing methods suffer from low detection power, high computational cost, and a preference for certain types of disease models.

Method

In this study, two scoring functions (the Bayesian network-based K2-score and the Gini-score) are used to characterize a pair of SNP loci as a candidate model; the two criteria are adopted simultaneously to improve identification power and to tackle the preference for particular disease models. The harmony search algorithm (HSA) is improved to quickly find the most likely candidate models among all two-locus models, and a local search algorithm with a two-dimensional tabu table is introduced to avoid repeatedly evaluating disease models that have strong marginal effects. Finally, the G-test statistic is used to further test the candidate models.
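For a two-locus model, the 3 × 3 genotype combinations of two SNPs form nine rows against case and control columns. The sketch below computes the G statistic, G = 2 Σ O·ln(O/E), with expected counts E taken from the row and column totals; converting G to a p-value (about 8 degrees of freedom for nine non-empty rows) needs a chi-square distribution function and is omitted. The counts in the tests are made up.

```python
import math
from typing import Sequence

def g_test_statistic(case_counts: Sequence[int], control_counts: Sequence[int]) -> float:
    """G statistic for a genotype-combination x (case, control) contingency table.

    case_counts[k] / control_counts[k] are the numbers of cases / controls with
    genotype combination k (9 entries for a two-locus model).
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("case and control tables must have the same rows")
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    n = n_case + n_ctrl
    g = 0.0
    for a, b in zip(case_counts, control_counts):
        row = a + b
        for observed, col_total in ((a, n_case), (b, n_ctrl)):
            expected = row * col_total / n
            if observed > 0:  # cells with O = 0 contribute 0 to the sum
                g += observed * math.log(observed / expected)
    return 2.0 * g
```

A larger G means the genotype distribution differs more between cases and controls, which is what the final filtering step tests.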

Results

We evaluate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical, recently developed methods based on swarm intelligence search algorithms (MACOED and CSE). The results of the simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, number of evaluations, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). On the AMD dataset, our method identified two SNPs (rs3775652 and rs10511467) that may also be associated with the disease.
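The four classification measures listed above follow from confusion-matrix counts. This sketch assumes the standard definitions (TPR = TP/(TP+FN), SPC = TN/(TN+FP), PPV = TP/(TP+FP), ACC = (TP+TN)/total) and returns NaN when a denominator is zero; detection power, which depends on the simulation protocol, is not covered.

```python
from typing import Dict

def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, positive predictive value and accuracy."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")
    return {
        "TPR": ratio(tp, tp + fn),
        "SPC": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
    }
```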

17.
C Zhao  Y Qiu  S Zhou  S Liu  W Zhang  Y Niu 《BMC genomics》2020,21(13):1-12

Background

Researchers have discovered that lncRNA-miRNA regulatory paradigms modulate gene expression patterns and drive major cellular processes. Identifying lncRNA-miRNA interactions (LMIs) is critical to revealing the mechanisms of biological processes and complicated diseases. Because conventional wet experiments are time-consuming, labor-intensive and costly, a few computational methods have been proposed to expedite the identification of lncRNA-miRNA interactions. However, little attention has been paid to fully exploiting the structural and topological information of the lncRNA-miRNA interaction network.

Results

In this paper, we propose novel lncRNA-miRNA interaction prediction methods using graph embedding and ensemble learning. First, we calculate lncRNA-lncRNA sequence similarity and miRNA-miRNA sequence similarity, and combine them with the known lncRNA-miRNA interactions to construct a heterogeneous network. Second, we adopt several graph embedding methods to learn embedded representations of lncRNAs and miRNAs from the heterogeneous network, and construct ensemble models using two ensemble strategies. For the first strategy, we treat individual graph embedding-based models as base predictors and integrate their predictions, yielding a method named GEEL-PI. For the second, we construct a deep attention neural network (DANN) to integrate various graph embeddings, yielding an ensemble method named GEEL-FI. The experimental results demonstrate that both GEEL-PI and GEEL-FI outperform other state-of-the-art methods, and further experiments validate the effectiveness of the two ensemble strategies. Moreover, case studies show that GEEL-PI and GEEL-FI can find novel lncRNA-miRNA associations.
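The heterogeneous network described in the first step can be represented as one block adjacency matrix, [[L, A], [Aᵀ, M]], where L and M are the lncRNA and miRNA similarity matrices and A holds the known interactions. This is a minimal sketch assuming dense list-of-lists matrices; any similarity thresholding and the graph embedding step itself are not shown.

```python
from typing import List

Matrix = List[List[float]]

def heterogeneous_network(lnc_sim: Matrix, mi_sim: Matrix, interactions: Matrix) -> Matrix:
    """Build the block adjacency matrix [[L, A], [A^T, M]].

    lnc_sim:      n_l x n_l lncRNA-lncRNA sequence similarity (L)
    mi_sim:       n_m x n_m miRNA-miRNA sequence similarity (M)
    interactions: n_l x n_m known interactions, 1 = known LMI (A)
    Rows/columns 0..n_l-1 are lncRNAs, the remaining n_m are miRNAs.
    """
    n_l, n_m = len(lnc_sim), len(mi_sim)
    top = [list(lnc_sim[i]) + list(interactions[i]) for i in range(n_l)]
    bottom = [[interactions[i][j] for i in range(n_l)] + list(mi_sim[j])
              for j in range(n_m)]
    return top + bottom
```

Any graph embedding method that accepts a weighted adjacency matrix can then learn one vector per node, covering lncRNAs and miRNAs in the same space.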

Conclusion

The study reveals that methods based on graph embedding and ensemble learning are efficient for integrating heterogeneous information derived from the lncRNA-miRNA interaction network and achieve better performance on the LMI prediction task. In conclusion, GEEL-PI and GEEL-FI are promising for lncRNA-miRNA interaction prediction.


18.
19.
MiRNAs play important roles in many diseases, including cancers. However, computational prediction of miRNA target genes is challenging, and the accuracies of existing methods remain poor. We report mirMark, a new machine learning-based method of miRNA target prediction at the site and UTR levels. This method uses experimentally verified miRNA targets from miRecords and miRTarBase as training sets and considers over 700 features. By combining Correlation-based Feature Selection with a variety of statistical or machine learning methods for the site- and UTR-level classifiers, mirMark significantly improves overall predictive performance compared with existing publicly available methods. MirMark is available from https://github.com/lanagarmire/MirMark.
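Correlation-based Feature Selection ranks feature subsets by a merit score that rewards correlation with the class and penalises redundancy among features: Merit = k·r̄_cf / √(k + k(k−1)·r̄_ff). This sketch assumes absolute Pearson-style correlations are already computed (the original CFS formulation can also use symmetric uncertainty) and omits the subset search.

```python
import math
from typing import List, Sequence

def cfs_merit(feature_class_corr: Sequence[float],
              feature_feature_corr: List[List[float]]) -> float:
    """CFS merit of a subset of k features.

    feature_class_corr[i]: correlation of feature i with the class label.
    feature_feature_corr[i][j]: correlation between features i and j.
    """
    k = len(feature_class_corr)
    r_cf = sum(abs(r) for r in feature_class_corr) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [abs(feature_feature_corr[i][j])
                 for i in range(k) for j in range(i + 1, k)]
        r_ff = sum(pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Two equally predictive but perfectly redundant features score no better than one alone, which is why CFS can prune a 700-feature space without losing signal.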

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0500-5) contains supplementary material, which is available to authorized users.

20.
Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417