Similar articles
 20 similar articles found (search time: 31 ms)
1.
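Pseudouridine (ψ) is a chemical modification of RNA that is formed by enzymatic catalysis during gene transcription. It is the most abundant RNA modification discovered to date and plays an important role in normal biological function. Identifying pseudouridine modification sites is therefore an important research area. With the rapid growth of RNA sequence data, machine-learning methods for identifying pseudouridine sites have been proposed one after another, but their accuracy still needs improvement. This paper therefore proposes a new sequence encoding that fuses nucleotide chemical properties, nucleotide density, and position-specific mononucleotide, dinucleotide, and trinucleotide propensity features. Based on this encoding and the Kernel Extreme Learning Machine (KELM) algorithm, a new pseudouridine site predictor, called "KELMPSP", was constructed. Jackknife tests and independent dataset tests show that KELMPSP clearly outperforms existing pseudouridine site predictors. KELMPSP is available at the website http://39.10577.161:8890/KELMPSP.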
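
The nucleotide chemical property and nucleotide density parts of such an encoding can be sketched as follows. This is a minimal illustration, not the KELMPSP implementation: the three-bit property table (ring structure, functional group, hydrogen bonding) is the one commonly used in RNA site predictors and is an assumption here, and the function name `encode_ncp_nd` is hypothetical. The position-specific propensity features are not covered.

```python
# Assumed chemical-property triplets: (ring structure, functional group, hydrogen bonding).
# This table is a common convention, not confirmed as the one KELMPSP uses.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp_nd(seq):
    """Encode an RNA sequence as 4 numbers per position: 3 property bits + density."""
    counts = {}
    features = []
    for position, nucleotide in enumerate(seq.upper(), start=1):
        counts[nucleotide] = counts.get(nucleotide, 0) + 1
        # Density: occurrences of this nucleotide so far, divided by the prefix length.
        density = counts[nucleotide] / position
        features.extend([*NCP[nucleotide], density])
    return features


if __name__ == "__main__":
    print(encode_ncp_nd("AUA"))
```

Each position contributes four values, so a window of length n yields a vector of length 4n that can be concatenated with other feature groups before training a classifier.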

2.
3.
4.
Tropical forests are significant carbon sinks and their soils’ carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas, whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms, including the model tuning and predictor selection, were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models’ predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward over backward selection procedures. Choosing predictors by their individual performance was outperformed by the two procedures that accounted for predictor interaction.
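
The evaluation scheme named in this abstract, five repetitions of a tenfold cross-validation, can be sketched as an index generator. This is a generic illustration using only the standard library; the function name `repeated_kfold` and the seed handling are assumptions, and the study's actual software is not stated in the abstract.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_indices, test_indices) for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
            yield train, test


if __name__ == "__main__":
    splits = list(repeated_kfold(n_samples=23))
    print(len(splits), "train/test splits")
```

Any tuning or predictor selection step belongs inside the training part of each split, so that the held-out fold stays unseen when performance is measured.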

5.
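Recognition of functional sites in DNA sequences is a current research focus in bioinformatics, and splice site recognition is one such problem. To make full use of the characteristic patterns of splice sites and thereby recognize them better, a splice site recognition system based on an improved Winnow algorithm was built. Compared with other methods, the improved Winnow algorithm is more robust, is suited to high-dimensional feature spaces, can fuse multiple kinds of pattern information, and performs well even when many irrelevant features are present. In addition, the feature set was pruned during training to remove features that contribute almost nothing to recognition; this has a negligible effect on the results and improves the algorithm's efficiency. Experiments confirm that the improved Winnow algorithm recognizes splice sites well, with several performance metrics matching or exceeding those of currently popular international splice site recognition software.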
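
For orientation, the classic Winnow update rule that such a system builds on can be sketched as below. This shows only the textbook multiplicative-update variant (often called Winnow2) on Boolean features; the abstract does not describe the paper's improvements or its feature pruning, so none of that is reproduced. The function names and the threshold choice of n/2 are assumptions.

```python
def predict_winnow(weights, threshold, x):
    """Predict 1 if the summed weights of the active features reach the threshold."""
    return 1 if sum(w for w, xi in zip(weights, x) if xi) >= threshold else 0


def train_winnow(samples, labels, alpha=2.0, epochs=20):
    """Train on Boolean feature vectors with multiplicative promotion/demotion."""
    n = len(samples[0])
    weights = [1.0] * n
    threshold = n / 2  # assumed threshold; other choices are possible
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if predict_winnow(weights, threshold, x) == y:
                continue
            mistakes += 1
            # Promote active features on a missed positive, demote on a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
        if mistakes == 0:
            break
    return weights, threshold


if __name__ == "__main__":
    from itertools import product

    data = list(product([0, 1], repeat=4))
    target = [1 if x[0] or x[2] else 0 for x in data]  # features 1 and 3 are irrelevant
    w, theta = train_winnow(data, target)
    print(w, theta)
```

Because only the weights of active features change on a mistake, irrelevant features are driven down quickly, which is the property that makes Winnow attractive in high-dimensional feature spaces.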

6.
7.
IRBM, 2023, 44(1): 100732
Objective: Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a powerful genome editing technology. Guide RNA (gRNA) plays an essential guiding role in the CRISPR system by complementary base pairing with target DNA. Since the CRISPR targeting mechanism problem has not yet been fully resolved, it remains a challenge to predict gRNA on-target efficiency. Current gRNA design tools often lack efficient information extraction and cannot learn the target efficiency patterns thoroughly. Material and methods: In this study, CRISPR-OTE is proposed to consider both multi-dimensional sequence information and important complementary prior knowledge based on a simple but effective framework. CRISPR-OTE consists of the local-contextual information branch and the prior knowledge branch. The local-contextual information branch extracts multi-dimensional sequence features from the DNA primary sequence by a parallel framework of Convolutional Neural Networks (CNN) and bidirectional Long Short-Term Memory networks (biLSTM). The prior knowledge branch selects the optimal subset of physicochemical features to provide the neural network with complementary knowledge, such as complex secondary structures. A simple feature fusion strategy is also adopted to fully utilize multi-modal data from the two branches. Results: The experimental results show that the optimal subset of physicochemical features (RNA secondary structure and melting temperature of 34nt target) can effectively improve the prediction performance. Additionally, combining multi-dimensional sequence features and multi-modal features can extract information more comprehensively. Through transfer learning, CRISPR-OTE trained on the CRISPR-Cpf1 system can also be successfully applied to the CRISPR-Cas9 system. Conclusion: The performance of CRISPR-OTE is superior to other methods in different CRISPR systems and species. Therefore, CRISPR-OTE is a simple on-target efficiency prediction framework with better accuracy and generalization performance.

8.
The synthesis of a 5′-O-BzH–2′-O-ACE-protected pseudouridine phosphoramidite is reported [BzH, benzhydryloxy-bis(trimethylsilyloxy)silyl; ACE, bis(2-acetoxyethoxy)methyl]. The availability of the phosphoramidite allows for reliable and efficient syntheses of hairpin RNAs containing single or multiple pseudouridine modifications in the stem or loop regions. Five 19-nt hairpin RNAs representing the 1920-loop region (G1906–C1924) of Escherichia coli 23S rRNA were synthesized with pseudouridine residues located at positions 1911, 1915 and 1917. Thermodynamic parameters, circular dichroism spectra and NMR data are presented for all five RNAs. Overall, three different structural contexts for the pseudouridine residues were examined and compared with the unmodified RNA. Our main findings are that pseudouridine modifications exhibit a range of effects on RNA stability and structure, depending on their locations. More specifically, pseudouridines in the single-stranded loop regions of the model RNAs are slightly destabilizing, whereas a pseudouridine at the stem–loop junction is stabilizing. Furthermore, the observed effects on stability are approximately additive when multiple pseudouridine residues are present. The possible relationship of these results to RNA function is discussed.

9.
10.
Box C/D small nucleolar RNAs (snoRNAs) are a conserved class of RNA known for their role in guiding ribosomal RNA 2′-O-ribose methylation. Recently, C/D snoRNAs were also implicated in regulating the expression of non-ribosomal genes through different modes of binding. Large scale RNA–RNA interaction datasets detect many snoRNAs binding messenger RNA, but are limited by specific experimental conditions. To enable a more comprehensive study of C/D snoRNA interactions, we created snoGloBe, a human C/D snoRNA interaction predictor based on a gradient boosting classifier. SnoGloBe considers the target type, position and sequence of the interactions, enabling it to outperform existing predictors. Interestingly, for specific snoRNAs, snoGloBe identifies strong enrichment of interactions near gene expression regulatory elements including splice sites. Abundance and splicing of predicted targets were altered upon the knockdown of their associated snoRNA. Strikingly, the predicted snoRNA interactions often overlap with the binding sites of functionally related RNA binding proteins, reinforcing their role in gene expression regulation. SnoGloBe is also an excellent tool for discovering viral RNA targets, as shown by its capacity to identify snoRNAs targeting the heavily methylated SARS-CoV-2 RNA. Overall, snoGloBe is capable of identifying experimentally validated binding sites and predicting novel sites with shared regulatory function.

11.
Wang Hao, Xi Qilemuge, Liang Pengfei, Zheng Lei, Hong Yan, Zuo Yongchun. Amino Acids, 2021, 53(2): 239-251

Enzymes have been proven to play considerable roles in disease diagnosis and biological functions. Feature extraction that truly reflects the intrinsic properties of proteins is the most critical step for the automatic identification of enzymes. Although many feature extraction methods have been proposed, some challenges remain. In this study, we developed a predictor called IHEC_RAAC, which can identify whether a protein is a human enzyme and distinguish the function of the human enzyme. To improve the feature representation ability, protein sequences were encoded by a new feature vector called ‘reduced amino acid cluster’. We calculated 673 amino acid reduction alphabets to determine the optimal feature representation scheme. The tenfold cross-validation test showed that IHEC_RAAC identified human enzymes with an accuracy of 74.66% and further discriminated human enzyme classes with an accuracy of 54.78%, which were 2.06% and 8.68% higher than the state-of-the-art predictors, respectively. Additionally, the results from the independent dataset indicated that IHEC_RAAC can effectively predict human enzymes and human enzyme classes to further provide guidance for protein research. A user-friendly web server, IHEC_RAAC, is freely accessible at http://bioinfor.imu.edu.cn/ihecraac.
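
The idea of a reduced amino acid alphabet can be sketched as follows: the 20 residues are collapsed into a few clusters, and the sequence is then described by the composition of the reduced letters. The six-cluster grouping below is purely hypothetical and chosen for illustration; it is not one of the 673 alphabets evaluated for IHEC_RAAC, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical six-cluster reduction (illustrative only, not taken from the paper).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every amino acid to the first letter of its cluster.
REDUCE = {aa: cluster[0] for cluster in CLUSTERS for aa in cluster}


def reduce_sequence(protein):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in protein.upper())


def reduced_composition(protein):
    """Return the fraction of residues falling into each cluster, in CLUSTERS order."""
    reduced = reduce_sequence(protein)
    counts = Counter(reduced)
    return [counts[cluster[0]] / len(reduced) for cluster in CLUSTERS]


if __name__ == "__main__":
    print(reduce_sequence("MKVD"))
    print(reduced_composition("MKVD"))
```

A smaller alphabet shortens the feature vector (6 values here instead of 20), which is the trade-off that a search over many candidate alphabets is meant to optimize.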


12.
There are 10 known putative pseudouridine synthase genes in Escherichia coli. The products of six have been previously assigned, one to formation of the single pseudouridine in 16S RNA, three to the formation of seven pseudouridines in 23S RNA, and three to the formation of three pseudouridines in tRNA (one synthase makes pseudouridine in 23S RNA and tRNA). Here we show that the remaining four putative synthase genes make bona fide pseudouridine synthases and identify which pseudouridines they make. RluB (formerly YciL) and RluE (formerly YmfC) make pseudouridine2605 and pseudouridine2457, respectively, in 23S RNA. RluF (formerly YjbC) makes the newly discovered pseudouridine2604 in 23S RNA, and TruC (formerly YqcB) makes pseudouridine65 in tRNA(Ile1) and tRNA(Asp). Deletion of each of these synthase genes individually had no effect on exponential growth in rich media at 25 °C, 37 °C, or 42 °C. A strain lacking RluB and RluF also showed no growth defect under these conditions. Mutation of a conserved aspartate in a common sequence motif, previously shown to be essential for the other six E. coli pseudouridine synthases and several yeast pseudouridine synthases, also caused a loss of in vivo activity in all four of the synthases studied in this work.

13.
Infectious disease surveillance systems provide vital data for guiding disease prevention and control policies, yet the formalization of methods to optimize surveillance networks has largely been overlooked. Decisions surrounding surveillance design parameters—such as the number and placement of surveillance sites, target populations, and case definitions—are often determined by expert opinion or deference to operational considerations, without formal analysis of the influence of design parameters on surveillance objectives. Here we propose a simulation framework to guide evidence-based surveillance network design to better achieve specific surveillance goals with limited resources. We define evidence-based surveillance design as an optimization problem, acknowledging the many operational constraints under which surveillance systems operate, the many dimensions of surveillance system design, the multiple and competing goals of surveillance, and the complex and dynamic nature of disease systems. We describe an analytical framework—the Disease Surveillance Informatics Optimization and Simulation (DIOS) framework—for the identification of optimal surveillance designs through mathematical representations of disease and surveillance processes, definition of objective functions, and numerical optimization. We then apply the framework to the problem of selecting candidate sites to expand an existing surveillance network under alternative objectives of: (1) improving spatial prediction of disease prevalence at unmonitored sites; or (2) estimating the observed effect of a risk factor on disease. Results of this demonstration illustrate how optimal designs are sensitive to both surveillance goals and the underlying spatial pattern of the target disease. The findings affirm the value of designing surveillance systems through quantitative and adaptive analysis of network characteristics and performance. 
The framework can be applied to the design of surveillance systems tailored to setting-specific disease transmission dynamics and surveillance needs, and can yield improved understanding of tradeoffs between network architectures.

14.
15.
16.
Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific as part of degradation during protein catabolism or highly specific as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm using the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). The features of physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were utilized to represent the peptides concerned. Here, we compared our method with existing tools for predicting possible cleavage sites in candidate substrates. It is shown that our method makes much more reliable predictions in terms of the overall prediction accuracy. In addition, this predictor allows the use of a wide range of proteinases.

17.
Computational prediction of RNA‐binding residues is helpful in uncovering the mechanisms underlying protein‐RNA interactions. Traditional algorithms individually applied feature‐ or template‐based prediction strategies to recognize these crucial residues, which could restrict their predictive power. To improve RNA‐binding residue prediction, herein we propose the first integrative algorithm termed RBRDetector (RNA‐Binding Residue Detector) by combining these two strategies. We developed a feature‐based approach that is an ensemble learning predictor comprising multiple structure‐based classifiers, in which well‐defined evolutionary and structural features in conjunction with sequential or structural microenvironment were used as the inputs of support vector machines. Meanwhile, we constructed a template‐based predictor to recognize the putative RNA‐binding regions by structurally aligning the query protein to the RNA‐binding proteins with known structures. The final RBRDetector algorithm is an ingenious fusion of our feature‐ and template‐based approaches based on a piecewise function. By validating our predictors with diverse types of structural data, including bound and unbound structures, native and simulated structures, and protein structures binding to different RNA functional groups, we consistently demonstrated that RBRDetector not only had clear advantages over its component methods, but also significantly outperformed the current state‐of‐the‐art algorithms. Nevertheless, the major limitation of our algorithm is that it performed relatively well on DNA‐binding proteins and thus incorrectly predicted the DNA‐binding regions as RNA‐binding interfaces. Finally, we implemented the RBRDetector algorithm as a user‐friendly web server, which is freely accessible at http://ibi.hzau.edu.cn/rbrdetector . Proteins 2014; 82:2455–2471. © 2014 Wiley Periodicals, Inc.

18.
Hydroxylation of proline or lysine residues in proteins is a common post-translational modification event, and such modifications are found in many physiological and pathological processes. Nonetheless, the exact molecular mechanism of hydroxylation remains under investigation. Because experimental identification of hydroxylation is time-consuming and expensive, bioinformatics tools with high accuracy represent desirable alternatives for large-scale rapid identification of protein hydroxylation sites. In view of this, we developed a support vector machine-based tool, OH-PRED, for the prediction of protein hydroxylation sites using the adapted normal distribution bi-profile Bayes feature extraction in combination with the physicochemical property indexes of the amino acids. In a jackknife cross-validation, OH-PRED yields an accuracy of 91.88% and a Matthews correlation coefficient (MCC) of 0.838 for the prediction of hydroxyproline sites, and an accuracy of 97.42% and an MCC of 0.949 for the prediction of hydroxylysine sites. These results demonstrate that OH-PRED significantly increased the prediction accuracy of hydroxyproline and hydroxylysine sites by 7.37 and 14.09%, respectively, when compared with the latest predictor PredHydroxy. In independent tests, OH-PRED also outperforms previously published methods.
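
The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient, are computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function names are assumptions, and the example counts are made up for illustration rather than taken from the OH-PRED results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Illustrative counts only.
    print(accuracy(45, 47, 3, 5), matthews_corrcoef(45, 47, 3, 5))
```

Unlike accuracy, the MCC stays informative when positive and negative sites are unbalanced, which is why site predictors usually report both.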

19.
20.