
1.

Effective sample selection for classification of pre-miRNAs

Han K 《Genetics and molecular research : GMR》2011,10(1):506-518

To solve the class imbalance problem in the classification of pre-miRNAs with the ab initio method, we developed a novel sample selection method according to the characteristics of pre-miRNAs. Real/pseudo pre-miRNAs are clustered based on their stem similarity and their distribution in high dimensional sample space, respectively. The training samples are selected according to the sample density of each cluster. Experimental results are validated by cross-validation and on other testing datasets composed of human real/pseudo pre-miRNAs. When compared with the previous method, microPred, our classifier miRNAPred is nearly 12% more accurate. The selected training samples could also be used to train other SVM classifiers, such as triplet-SVM, MiPred, miPred, and microPred, to improve their classification performance. The sample selection algorithm is useful for constructing a more efficient classifier for the classification of real pre-miRNAs and pseudo hairpin sequences.

2.

PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs

Xuan P Guo M Liu X Huang Y Li W Huang Y 《Bioinformatics (Oxford, England)》2011,27(10):1368-1376


3.

Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine (total citations: 2; self-citations: 0; citations by others: 2)

Chenghai Xue Fei Li Tao He Guo-Ping Liu Yanda Li Xuegong Zhang 《BMC bioinformatics》2005,6(1):310

Background

MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.

4.

MaturePred: efficient identification of microRNAs within novel plant pre-miRNAs

Xuan P Guo M Huang Y Li W Huang Y 《PloS one》2011,6(11):e27422


5.

miRLocator: Machine Learning-Based Prediction of Mature MicroRNAs within Plant Pre-miRNA Sequences

Haibo Cui Jingjing Zhai Chuang Ma 《PloS one》2015,10(11)


6.

De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures (total citations: 5; self-citations: 0; citations by others: 5)

Ng KL Mishra SK 《Bioinformatics (Oxford, England)》2007,23(11):1321-1330


7.

Predicting human microRNA precursors based on an optimized feature subset generated by GA-SVM

Wang Y Chen X Jiang W Li L Li W Yang L Liao M Lian B Lv Y Wang S Wang S Li X 《Genomics》2011,98(2):73-78


8.

Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine

Jun Meng Dong Liu Chao Sun Yushi Luan 《BMC bioinformatics》2014,15(1)


9.

MicroRNA Prediction Using a Fixed-Order Markov Model Based on the Secondary Structure Pattern

Wei Shen Ming Chen Guo Wei Yan Li 《PloS one》2012,7(10)

Predicting miRNAs is an arduous task, due to the diversity of the precursors and the complexity of the enzymatic processes. Although several prediction approaches have reached impressive performance, few of them achieve a full-function recognition of the mature miRNA directly from the candidate hairpins across species. Therefore, researchers continue to seek a more powerful model that is closer to the biological recognition of miRNA structure. In this report, we describe a novel miRNA prediction algorithm, known as FOMmiR, using a fixed-order Markov model based on the secondary structural pattern. For a training dataset containing 809 human pre-miRNAs and 6441 human pseudo-miRNA hairpins, the model's parameters were defined and evaluated. The results showed that FOMmiR reached 91% accuracy on the human dataset through 5-fold cross-validation. Moreover, on the independent test datasets, FOMmiR showed outstanding prediction performance in human and other species including vertebrates, Drosophila, worms and viruses, and even plants, in contrast to the well-known algorithms and models. In particular, FOMmiR was able not only to distinguish the miRNA precursors from the hairpins, but also to locate the position and strand of the mature miRNA. Therefore, this study provides a new generation of miRNA prediction algorithm, which successfully realizes a full-function recognition of the mature miRNAs directly from the hairpin sequences. It also presents a new understanding of the biological recognition based on the location of the strongest signal detected by FOMmiR, which might be closely associated with the enzyme cleavage mechanism during miRNA maturation.
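The abstract does not give FOMmiR's model details, so the following is only a minimal sketch of the general idea of a fixed-order Markov model scored on secondary-structure strings. The dot-bracket alphabet, the order of 2, the add-one smoothing, and the function names `train_markov`, `log_likelihood`, and `log_odds` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def train_markov(sequences, order=2, alphabet="().", pseudocount=1.0):
    """Estimate P(symbol | previous `order` symbols) with additive smoothing.

    `sequences` are structure strings (assumed here to be dot-bracket notation).
    Returns a function prob(context, symbol).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, symbol):
        ctx = counts.get(context, {})  # unseen contexts fall back to uniform
        total = sum(ctx.values()) + pseudocount * len(alphabet)
        return (ctx.get(symbol, 0.0) + pseudocount) / total

    return prob


def log_likelihood(prob, seq, order=2):
    """Sum of log transition probabilities along one structure string."""
    return sum(math.log(prob(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))


def log_odds(pos_model, neg_model, seq, order=2):
    """Positive values favor the model trained on real precursors."""
    return (log_likelihood(pos_model, seq, order)
            - log_likelihood(neg_model, seq, order))
```

One model would be trained on structures of real pre-miRNAs and one on pseudo hairpins; a candidate is then classified by the sign of `log_odds`. Locating the mature miRNA, as the abstract describes, would need a position-wise signal that this sketch does not attempt.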

10.

iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach

Bin Liu Longyun Fang Fule Liu Xiaolong Wang Kuo-Chen Chou 《Journal of biomolecular structure & dynamics》2016,34(1):223-235


11.

Correlation between sequence conservation and structural thermodynamics of microRNA precursors from human, mouse, and chicken genomes

Ming Ni Wenjie Shu Xiaochen Bo Shengqi Wang Songgang Li 《BMC evolutionary biology》2010,10(1):329

Background

Previous studies have shown that microRNA precursors (pre-miRNAs) have considerably more stable secondary structures than other native RNAs (tRNA, rRNA, and mRNA) and artificial RNA sequences. However, pre-miRNAs with ultra stable secondary structures have not been investigated. It is not known whether pre-miRNA sequences tend towards or against ultra stable structures. Furthermore, the relationship between the structural thermodynamic stability of pre-miRNAs and their evolution remains unclear.

Results

We investigated the correlation between pre-miRNA sequence conservation and structural stability as measured by adjusted minimum folding free energies in pre-miRNAs isolated from human, mouse, and chicken. The analysis revealed that conserved and non-conserved pre-miRNA sequences had structures with similar average stabilities. However, the relatively ultra stable and unstable pre-miRNAs were more likely to be non-conserved than pre-miRNAs with moderate stability. Non-conserved pre-miRNAs had more G+C than A+U nucleotides, while conserved pre-miRNAs contained more A+U nucleotides. Notably, the U content of conserved pre-miRNAs was especially higher than that of non-conserved pre-miRNAs. Further investigations showed that conserved and non-conserved pre-miRNAs exhibited different structural element features, even though they had comparable levels of stability.

Conclusions

We proposed that there is a correlation between structural thermodynamic stability and sequence conservation for pre-miRNAs from human, mouse, and chicken genomes. Our analyses suggested that pre-miRNAs with relatively ultra stable or unstable structures were less favoured by natural selection than those with moderately stable structures. Comparison of nucleotide compositions between non-conserved and conserved pre-miRNAs indicated the importance of U nucleotides in the pre-miRNA evolutionary process. Several characteristic structural elements were also detected in conserved pre-miRNAs.


12.

Prediction of tyrosine sulfation with mRMR feature selection and analysis

Niu S Huang T Feng K Cai Y Li Y 《Journal of proteome research》2010,9(12):6490-6497

Protein tyrosine sulfation is a ubiquitous post-translational modification (PTM) of secreted and transmembrane proteins that pass through the Golgi apparatus. In this study, we developed a new method for protein tyrosine sulfation prediction based on a nearest neighbor algorithm with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). We incorporated features of sequence conservation, residual disorder, and amino acid factor, 229 features in total, to predict tyrosine sulfation sites. From these 229 features, 145 were selected as the optimal feature set for the prediction. The prediction model achieved a prediction accuracy of 90.01% using the optimal 145-feature set. Feature analysis showed that conservation, disorder, and physicochemical/biochemical properties of amino acids all contributed to the sulfation process. Site-specific feature analysis showed that the features derived from the surrounding sites contributed profoundly to sulfation site determination, in addition to features derived from the sulfation site itself. The detailed feature analysis in this paper might help to better understand the sulfation mechanism and guide the related experimental validation.
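As a rough illustration of the mRMR-then-IFS pipeline named in this abstract, the sketch below ranks discrete features greedily by relevance minus mean redundancy, then picks the best-scoring prefix of that ranking. It is a simplified sketch under assumptions: features are discrete, mutual information is estimated from raw counts, and `evaluate` is a hypothetical callback standing in for the cross-validated nearest neighbor accuracy used in the study.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_rank(features, labels):
    """Greedy mRMR ranking.

    `features` maps a feature name to its list of discrete values, one per sample.
    Each step picks the feature with the highest relevance to the labels minus
    its mean mutual information with the already selected features.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], sorted(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(ranked, evaluate):
    """Score every prefix of the ranked list and return the best-scoring one."""
    prefixes = [ranked[:k] for k in range(1, len(ranked) + 1)]
    return max(prefixes, key=evaluate)
```

In the study's setting, `evaluate` would train and validate the classifier on each candidate prefix; the 145-of-229 result corresponds to the prefix length at which that score peaked.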

13.

Analysis of Nearly One Thousand Mammalian Mirtrons Reveals Novel Features of Dicer Substrates

Jiayu Wen Erik Ladewig Sol Shenker Jaaved Mohammed Eric C. Lai 《PLoS computational biology》2015,11(9)


14.

A combined computational and microarray-based approach identifies novel microRNAs encoded by human gamma-herpesviruses (total citations: 14; self-citations: 0; citations by others: 14)

Grundhoff A Sullivan CS Ganem D 《RNA (New York, N.Y.)》2006,12(5):733-750

We have developed an approach to identify microRNAs (miRNAs) that is based on bioinformatics and array-based technologies, without the use of cDNA cloning. The approach, designed for use on genomes of small size (<2 Mb), was tested on cells infected by either of two lymphotropic herpesviruses, KSHV and EBV. The viral genomes were scanned computationally for pre-miRNAs using an algorithm (VMir) we have developed. Candidate hairpins suggested by this analysis were then synthesized as oligonucleotides on microarrays, and the arrays were hybridized with small RNAs from infected cells. Candidate miRNAs that scored positive on the arrays were then subjected to confirmatory Northern blot analysis. Using this approach, 10 of the known KSHV pre-miRNAs were identified, as well as a novel pre-miRNA that had earlier escaped detection. This method also led to the identification of seven new EBV-encoded pre-miRNAs; by using additional computational approaches, we identified a total of 18 new EBV pre-miRNAs that produce 22 mature miRNA molecules, thereby more than quadrupling the total number of hitherto known EBV miRNAs. The advantages and limitations of the approach are discussed.

15.

Heterogeneous ensemble approach with discriminative features and modified-SMOTEbagging for pre-miRNA classification

Supatcha Lertampaiporn Chinae Thammarongtham Chakarida Nukoolkit Boonserm Kaewkamnerdpong Marasri Ruengjitchatchawalya 《Nucleic acids research》2013,41(1):e21

An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was proposed based upon combining a set of heterogeneous algorithms including support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF), then aggregating their predictions through a voting system. In addition to the proposed algorithm, the classification performance was improved using discriminative features, self-containment and its derivatives, which have shown unique structural robustness characteristics of pre-miRNAs. These are applicable across different species. By applying two preprocessing methods, a correlation-based feature selection (CFS) with genetic algorithm (GA) search method and a modified-Synthetic Minority Oversampling Technique (SMOTE) bagging rebalancing method, improvement in the performance of this ensemble was observed. The overall prediction accuracy obtained via 10 runs of 5-fold cross validation (CV) was 96.54%, with sensitivity of 94.8% and specificity of 98.3%. This is a better trade-off between sensitivity and specificity than that of other state-of-the-art methods. The ensemble model was applied to animal, plant and virus pre-miRNAs and achieved high accuracy, >93%. Exploiting the discriminative set of selected features also suggests that pre-miRNAs possess high intrinsic structural robustness as compared with other stem loops. Our heterogeneous ensemble method gave a relatively more reliable prediction than those using single classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html.
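The voting step and the reported sensitivity/specificity figures can be illustrated with a small sketch. It assumes simple plurality voting over per-classifier label lists and binary labels with 1 as the positive class; the abstract does not specify the voting rule, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by plurality vote.

    `predictions` is a list of equal-length label lists, one per classifier
    (for example SVM, kNN and RF outputs on the same samples).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)
```

With an odd number of binary classifiers, as in a three-member SVM/kNN/RF ensemble, plurality voting never ties.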

16.

Predicting transcriptional activity of multiple site p53 mutants based on hybrid properties

Huang T Niu S Xu Z Huang Y Kong X Cai YD Chou KC 《PloS one》2011,6(8):e22940


17.

CBFS: high performance feature selection algorithm based on feature clearness

M Seo S Oh 《PloS one》2012,7(7):e40419

Background

The goal of feature selection is to select useful features and simultaneously exclude garbage features from a given dataset for classification purposes. This is expected to bring reduction of processing time and improvement of classification accuracy.

Methodology

In this study, we devised a new feature selection algorithm (CBFS) based on clearness of features. Feature clearness expresses separability among classes in a feature. Highly clear features contribute towards obtaining high classification accuracy. CScore is a measure to score clearness of each feature and is based on clustered samples to centroid of classes in a feature. We also suggest combining CBFS and other algorithms to improve classification accuracy.

Conclusions/Significance

The experiments confirm that CBFS outperforms recent feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.

18.

Selection of relevant features for EEG signal classification of schizophrenic patients

M. Sabeti R. Boostani S.D. Katebi G.W. Price 《Biomedical signal processing and control》2007,2(2):122-134

In this paper, EEG signals of 20 schizophrenic patients and 20 age-matched control participants are analyzed with the objective of determining the more informative channels and finally distinguishing the two groups. For each case, 22 channels of EEG were recorded. A two-stage feature selection algorithm is designed such that the more informative channels are first selected to enhance the discriminative information. Two methods, bidirectional search and plus-L minus-R (LRS) techniques, are employed to select these informative channels. The interesting point is that most of the selected channels are located in the temporal lobes (containing the limbic system), which confirms the neuro-psychological differences in these areas between the schizophrenic and normal participants. After channel selection, a genetic algorithm (GA) is employed to select the best features from the selected channels. In this case, in addition to elimination of the less informative channels, the redundant and less discriminant features are also eliminated. A computationally fast algorithm with excellent classification results is obtained. Implementation of this efficient approach involves several features including autoregressive (AR) model parameters, band power, fractal dimension and wavelet energy. To test the performance of the final subset of features, classifiers including linear discriminant analysis (LDA) and support vector machine (SVM) are employed to classify the reduced feature set of the two groups. Using the bidirectional search for channel selection, classification accuracies of 84.62% and 99.38% are obtained for LDA and SVM, respectively. Using the LRS technique for channel selection, classification accuracies of 88.23% and 99.54% are obtained for LDA and SVM, respectively.
Finally, the results are compared and contrasted with two well-known methods, namely the single-stage feature selection (evolutionary feature selection) and principal component analysis (PCA)-based feature selection. The results show improved classification accuracy in relatively low computational time with the two-stage feature selection.
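The plus-L minus-R (LRS) search used for channel selection can be sketched generically as follows. This is an illustrative version under assumptions, not the paper's implementation: `score` is a hypothetical callback that rates a candidate subset (for example a classifier's validation accuracy on those channels), and the requirement that `target_size + r` not exceed the number of candidates is added here so that every forward phase can add exactly L items.

```python
def plus_l_minus_r(candidates, score, target_size, l=2, r=1):
    """Plus-L minus-R search: repeatedly add L items greedily, then drop R.

    Each forward step adds the candidate that maximizes score(selected + [c]);
    each backward step removes the item whose removal leaves the best score.
    """
    if not (l > r >= 0):
        raise ValueError("need l > r >= 0")
    if target_size + r > len(candidates):
        raise ValueError("need target_size + r <= len(candidates)")

    def add_best(selected):
        pool = [c for c in candidates if c not in selected]
        selected.append(max(pool, key=lambda c: score(selected + [c])))

    def drop_worst(selected):
        worst = max(selected,
                    key=lambda c: score([s for s in selected if s != c]))
        selected.remove(worst)

    selected = []
    # Full rounds, as long as a round's net gain (l - r) does not overshoot.
    while len(selected) + (l - r) <= target_size:
        for _ in range(l):
            add_best(selected)
        for _ in range(r):
            drop_worst(selected)
    # Finish with plain forward steps if the target was not hit exactly.
    while len(selected) < target_size:
        add_best(selected)
    return selected
```

The backward steps let the search discard an item that looked good early but became redundant once others were added, which plain forward selection cannot do.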

19.

MicroRNAome of Porcine Pre- and Postnatal Development

Mingzhou Li Youlin Xia Yiren Gu Kai Zhang Qiulei Lang Lei Chen Jiuqiang Guan Zonggang Luo Haosi Chen Yang Li Qinghai Li Xiang Li An-an Jiang Surong Shuai Jinyong Wang Qi Zhu Xiaochuan Zhou Xiaolian Gao Xuewei Li 《PloS one》2010,5(7)

The domestic pig is of enormous agricultural significance and is a valuable model for many human diseases. Information concerning the pig microRNAome (miRNAome) has been long overdue, and elucidation of this information will permit an atlas of microRNA (miRNA) regulation functions and networks to be constructed. Here we performed a comprehensive search for porcine miRNAs on ten small RNA sequencing libraries prepared from a mixture of tissues obtained during the entire pig lifetime, from the fetal period through adulthood. The sequencing results were analyzed using mammalian miRNAs, the precursor hairpins (pre-miRNAs), the first release of the high-coverage porcine genome assembly (Sscrofa9, April 2009) and the available expressed sequence tag (EST) sequences. Our results extend the repertoire of the pig miRNAome to 867 pre-miRNAs (623 with genomic coordinates) encoding 1,004 miRNAs, of which 777 are unique. We performed real-time quantitative PCR (q-PCR) experiments for 30 selected miRNAs in 47 tissue-specific samples and found agreement between the sequencing and q-PCR data. This broad survey provides detailed information about multiple variants of mature sequences, precursors, chromosomal organization, development-specific expression, and conservation patterns. Our data mining produced a broad view of the pig miRNAome, consisting of miRNAs and isomiRs, and a wealth of information on pig miRNA characteristics. These results are a prelude to advances in pig biology as well as the use of pigs as model organisms for human biological and biomedical studies.

20.

SecProMTB: Support Vector Machine‐Based Classifier for Secretory Proteins Using Imbalanced Data Sets Applied to Mycobacterium tuberculosis

Chaolu Meng Leyi Wei Quan Zou 《Proteomics》2019,19(17)

Secretory proteins of Mycobacterium tuberculosis have attracted growing concern, given their dominant immunogenicity and role in pathogenesis. Because traditional biochemical experiments are expensive and time‐consuming, an advanced support vector machine model named SecProMTB is constructed in this study and the proteins are identified by a bioinformatic approach. First, an improved pseudo‐amino acid composition (PseAAC) algorithm is used to extract features from all entities. Second, a novel imbalanced‐data strategy is proposed and adopted to divide the original data set into a training set and a test set. Third, to overcome the overfitting problem, feature‐ranking algorithms are applied with incremental feature selection. Finally, the model is trained and optimized. Consequently, a model is obtained with an area under the curve of 0.862 and an average accuracy of 86% in the independent test. For the convenience of users, SecProMTB and related data are openly accessible at http://server.malab.cn/SecProMTB/index.jsp.