Similar Articles
1.
2.

3.
For many RNA molecules, the secondary structure is essential for correct function. Predicting RNA secondary structure from nucleotide sequence is a long-standing problem in genomics, but prediction performance has plateaued over time. Traditional RNA secondary structure prediction algorithms are based primarily on thermodynamic models and free energy minimization, an approach that imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold introduces a novel image-like representation of RNA sequences that can be processed efficiently by fully convolutional networks (FCNs). We benchmark UFold on both within-family and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving performance similar to that of traditional methods when trained and tested on distinct RNA families. UFold also predicts pseudoknots accurately. Prediction is fast, with an inference time of about 160 ms per sequence for sequences up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.
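The image-like input can be sketched as follows. This is a simplified illustration with two assumptions. First, each position pair (i, j) gets 16 channels from the outer product of the one-hot vectors of bases i and j. Second, one extra channel marks pairs allowed by base-pairing rules (Watson-Crick and G-U wobble). The binary pairing channel and the base order `AUCG` are assumptions made here; UFold's own extra channel may be computed differently.

```python
BASES = "AUCG"  # assumed channel order
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def one_hot(seq):
    """One 4-element indicator vector per nucleotide."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def image_like(seq):
    """Return an L x L x 17 nested list for an RNA sequence of length L.

    Channels 0..15: outer product of one-hot(i) and one-hot(j); channel
    4*a + b is 1 when base i is BASES[a] and base j is BASES[b].
    Channel 16: hypothetical binary flag for pairs allowed by ALLOWED_PAIRS.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    oh = one_hot(seq)
    length = len(seq)
    image = []
    for i in range(length):
        row = []
        for j in range(length):
            channels = [oh[i][a] * oh[j][b] for a in range(4) for b in range(4)]
            channels.append(1.0 if (seq[i], seq[j]) in ALLOWED_PAIRS else 0.0)
            row.append(channels)
        image.append(row)
    return image
```

Because every pixel (i, j) describes a candidate base pair, a fully convolutional network can read this tensor like a multi-channel image and output an L x L pairing map of the same spatial size.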

4.
Nagata K, Randall A, Baldi P. Proteins, 2012, 80(1): 142-153
Accurate protein side-chain conformation prediction is crucial for protein modeling, and existing methods for the task are widely used; however, faster and more accurate methods are still needed. Here we present a new machine learning approach in which an energy function for each rotamer in a structure is computed additively over pairs of contacting atoms. A family of 156 neural networks, indexed by amino acid and contacting atom types, computes these rotamer energies as a function of atomic contact distances. Although direct energy targets are not available for training, the neural networks can still be optimized by converting the energies to probabilities and optimizing these probabilities using Markov chain Monte Carlo methods. The resulting predictor, SIDEpro, first sets the rotamer probabilities for each residue from a backbone-dependent rotamer library and then iteratively updates these probabilities using the trained neural networks. After the probabilities converge, each side chain is set to its highest-probability rotamer. Finally, a post-processing clash-reduction step is applied to the models. Compared with SCWRL4, the state-of-the-art method for rapid side-chain prediction, SIDEpro offers a significant improvement in speed and a modest, but statistically significant, improvement in accuracy on the following datasets: (1) the 379-protein test set of SCWRL4; (2) 94 proteins from CASP9; (3) a set of seven large protein-only complexes; and (4) a ribosome with and without the RNA. On the SCWRL4 test set, SIDEpro's accuracy (χ1 86.14%, χ1+2 74.15%) is slightly better than that of the SCWRL4 flexible rotamer model (FRM) (χ1 85.43%, χ1+2 73.47%), and SIDEpro is 7.0 times faster. On the same test set, SIDEpro is clearly more accurate than the SCWRL4 rigid rotamer model (RRM) (χ1 84.15%, χ1+2 71.24%) and 2.4 times faster.
Evaluation on the additional test sets yields similar accuracy results, with SIDEpro slightly more accurate than SCWRL4-FRM and clearly more accurate than SCWRL4-RRM; however, the gap in CPU time is much larger when the methods are applied to large protein complexes. SIDEpro is part of the SCRATCH suite of predictors and is available from http://scratch.proteomics.ics.uci.edu/.
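The prediction loop described above starts from library probabilities, updates them iteratively, and then picks the highest-probability rotamer. It can be sketched as a generic mean-field update. This is a minimal illustration under stated assumptions, not SIDEpro's actual update rule. `pair_energy` is a hypothetical stand-in for the trained neural-network energies. Each rotamer's probability is taken as its library prior times a Boltzmann weight of its mean-field energy.

```python
import math


def update_rotamer_probs(prior, pair_energy, n_iter=100, tol=1e-9):
    """Iteratively refine rotamer probabilities (generic mean-field sketch).

    prior[i][r]: library probability of rotamer r at residue i.
    pair_energy(i, r, j, s): hypothetical energy between rotamer r of
    residue i and rotamer s of residue j (stand-in for learned energies).
    Returns (probabilities, index of the most probable rotamer per residue).
    """
    probs = [list(p) for p in prior]
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(len(probs)):
            # Mean-field energy: pair energies weighted by neighbours' current probabilities.
            energies = [
                sum(probs[j][s] * pair_energy(i, r, j, s)
                    for j in range(len(probs)) if j != i
                    for s in range(len(probs[j])))
                for r in range(len(probs[i]))
            ]
            m = min(energies)  # shift for numerical stability
            weights = [prior[i][r] * math.exp(-(e - m)) for r, e in enumerate(energies)]
            z = sum(weights)
            new_row = [w / z for w in weights]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, probs[i])))
            probs[i] = new_row  # sequential (Gauss-Seidel style) update
        if max_change < tol:
            break
    # Final assignment: highest-probability rotamer for each residue.
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, best
```

Sequential updates are used here because synchronous mean-field updates can oscillate. The clash-reduction post-processing step is not shown.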

5.
We briefly review results of other authors on the analysis of systems with time hierarchy, especially the Tikhonov theorem. A theorem recently proved by the authors, which makes possible a rigorous analysis of systems with complex fast dynamics, is stated and discussed. A model example of a simple enzymatic reaction with product activation and slow (genetically driven) enzyme turnover is studied rigorously. It is shown that even in such a simple model there exist regions of parameter space in which the fast variables oscillate. The classical Tikhonov theorem is therefore not applicable, and another method must be used: for example, the theorem presented by the authors, or a purely numerical solution. These two methods are compared.
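For reference, the setting of the classical Tikhonov theorem can be written as below. Here x denotes the fast variables, y the slow variables, and ε a small parameter. The statement is paraphrased in its standard textbook form; the authors' own theorem is not reproduced here.

```latex
\begin{aligned}
\varepsilon \frac{dx}{dt} &= f(x, y), \qquad \frac{dy}{dt} = g(x, y), \qquad 0 < \varepsilon \ll 1, \\
\text{reduced system:}\quad 0 &= f(x, y) \;\Rightarrow\; x = \varphi(y), \qquad \frac{dy}{dt} = g\bigl(\varphi(y), y\bigr), \\
\text{fast (boundary-layer) system:}\quad \frac{dx}{d\tau} &= f(x, y), \qquad \tau = t/\varepsilon, \quad y \ \text{held fixed}.
\end{aligned}
```

The theorem requires x = φ(y) to be an isolated root that is an asymptotically stable equilibrium of the fast system, with the initial x in its basin of attraction. Under these conditions, as ε → 0 the full solution approaches the reduced one on finite time intervals, apart from an initial layer in x. When the fast variables settle onto an oscillation instead of an equilibrium, this hypothesis fails, which is the situation described in the abstract.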

6.
7.
Numerical differentiation is known to be one of the numerical methods for which reliable values are most difficult to obtain consistently. A simple numerical differentiation method that combines finite-difference formulas, derived by approximating Taylor-series expansions, is investigated as a way to perform sensitivity analysis of large-scale metabolic reaction systems efficiently. Application to four basic mathematical functions shows that the eight-point differentiation formula with a non-dimensionalized step size close to 0.01 in most cases gives numerical derivatives accurate to more than 14 digits in double precision. Moreover, application to a modified TCA cycle model shows that the method yields steady-state metabolite concentrations accurate to within round-off error. It also makes it possible to transform the Michaelis-Menten equations into S-system equations whose kinetic orders are, in most cases, accurate to more than 14 significant digits. Because the differentiation formula has a simple structure and promises high accuracy, the present method is useful for analyzing large-scale metabolic reaction systems according to the systematic procedure of biochemical systems theory (BST).
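The eight-point formula can be written as the eighth-order central difference using samples at x ± h, x ± 2h, x ± 3h and x ± 4h. A minimal sketch follows. It assumes that the "non-dimensionalized stepsize" means h = 0.01 * |x|, with an absolute fallback at x = 0; this is an assumption, not the paper's stated definition. The same derivative gives the S-system kinetic order g = d ln v / d ln S, which can be checked against the exact Michaelis-Menten value Km / (Km + S).

```python
import math

# Eighth-order central-difference weights for f'(x) at offsets 1h..4h
# (the matching negative offsets use the same weights with opposite sign).
_COEFFS = (4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0)


def derivative_8pt(f, x, rel_step=0.01):
    """Eight-point central difference with an assumed relative step h = rel_step * |x|."""
    h = rel_step * abs(x) if x != 0 else rel_step
    total = 0.0
    for k, c in enumerate(_COEFFS, start=1):
        total += c * (f(x + k * h) - f(x - k * h))
    return total / h


def kinetic_order(rate, s, rel_step=0.01):
    """S-system kinetic order g = d ln v / d ln S = (S / v) * dv/dS."""
    return s / rate(s) * derivative_8pt(rate, s, rel_step)


# Usage: Michaelis-Menten rate with hypothetical Vmax = 10 and Km = 1.
mm = lambda s: 10.0 * s / (1.0 + s)
print(derivative_8pt(math.exp, 1.0) - math.e)   # tiny round-off-level error
print(kinetic_order(mm, 2.0), 1.0 / 3.0)        # numerical vs exact Km / (Km + S)
```

A relative step keeps the truncation error small without making h so tiny that round-off dominates; with h near 0.01 * |x| both error sources stay far below 1e-11 for smooth functions like these.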

8.
In this paper, we introduce a fast and accurate side-chain modeling method named OPUS-Rota. In a benchmark comparison with SCWRL, NCN, LGA, SPRUCE, Rosetta, and SCAP, OPUS-Rota is much faster than all of these methods except SCWRL, which is comparably fast. In overall χ1 and χ1+2 accuracy, however, OPUS-Rota is 5.4 and 8.8 percentage points better than SCWRL, respectively. Compared with NCN, which has the best accuracy reported in the literature, OPUS-Rota is 1.6 percentage points better in overall χ1+2 accuracy but 0.3 percentage points worse in overall χ1 accuracy. Hence, our algorithm is much more accurate than SCWRL at similar execution speed, and its accuracy is comparable to or better than that of the most accurate methods in the literature, with a runtime one to two orders of magnitude shorter. In addition, OPUS-Rota consistently outperforms SCWRL on the Wallner and Elofsson homology-modeling benchmark set when sequence identity is greater than 40%. We hope that OPUS-Rota will contribute to high-accuracy structure refinement; the program is freely available to academic users.
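Side-chain benchmarks such as those above report χ1 and χ1+2 accuracy. A χ angle is commonly counted as correct when it lies within 40° of the experimental value, and χ1+2 is counted as correct only when both χ1 and χ2 are. The 40° cutoff is an assumption in this sketch: it is a common convention, but the abstracts do not state it. The sketch also ignores symmetric side chains whose χ2 is equivalent under a 180° rotation.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (periodic in 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(pred, true, cutoff=40.0):
    """Return (chi1 %, chi1+2 %) for per-residue tuples of chi angles in degrees.

    Residues with no chi angles (empty tuple) are skipped; chi1+2 is computed
    only over residues that have at least two chi angles.
    """
    n1 = n12 = ok1_count = ok12_count = 0
    for p, t in zip(pred, true):
        if not t:
            continue  # e.g. Gly or Ala: no side-chain dihedrals
        n1 += 1
        ok1 = angular_diff(p[0], t[0]) <= cutoff
        ok1_count += int(ok1)
        if len(t) >= 2:
            n12 += 1
            ok12_count += int(ok1 and angular_diff(p[1], t[1]) <= cutoff)
    chi1 = 100.0 * ok1_count / n1 if n1 else 0.0
    chi12 = 100.0 * ok12_count / n12 if n12 else 0.0
    return chi1, chi12
```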

9.
Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, toward the eventual goal of estimating a Tree of Life that may contain as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail even on data sets at the low end of this range. One approach to estimating a large species tree is to apply phylogenetic estimation methods (such as maximum likelihood) to a supermatrix produced by concatenating multiple sequence alignments for a collection of markers. However, the most accurate of these methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble a phylogenetic tree from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation when analyses based on maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that uses a novel two-step procedure to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the best-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.
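SuperFine is described above as boosting supertree methods such as MRP. The MRP encoding step itself can be sketched as below. Each non-trivial clade of each source tree becomes one binary character, and `?` marks taxa absent from that source tree. This is a minimal sketch that assumes source trees are given as a leaf set plus a list of clades, which is a hypothetical input format. The subsequent parsimony search and SuperFine's two-step procedure are not shown.

```python
def mrp_matrix(source_trees, taxa):
    """Build an MRP character matrix.

    source_trees: list of (leaf_set, clades); each clade is the set of leaves
    on one side of an internal edge of that source tree.
    Returns a dict mapping each taxon to a string over {'0', '1', '?'}.
    """
    rows = {t: [] for t in taxa}
    for leaves, clades in source_trees:
        for clade in clades:
            # Skip trivial splits (fewer than two taxa on either side).
            if not 2 <= len(clade) <= len(leaves) - 2:
                continue
            for t in taxa:
                if t not in leaves:
                    rows[t].append("?")  # taxon missing from this source tree
                elif t in clade:
                    rows[t].append("1")
                else:
                    rows[t].append("0")
    return {t: "".join(chars) for t, chars in rows.items()}
```

The resulting matrix is then analysed with a parsimony method. The missing-data symbols are what allow trees on different taxon subsets to be combined.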

10.

Background  

The use of exogenous small interfering RNAs (siRNAs) for gene silencing has quickly become a widespread molecular tool, providing a powerful means for gene function studies and new drug target identification. Although considerable progress has been made recently in understanding how the RNAi pathway mediates gene silencing, the design of potent siRNAs remains challenging.

11.
The unparalleled growth in the availability of genomic data offers both a challenge and an opportunity. The challenge is to develop orthology detection methods that are simultaneously accurate and high-throughput; the opportunity is to improve orthology detection by leveraging the evolutionary evidence in the accumulated sequenced genomes. Here, we report a novel orthology detection method, termed QuartetS, that exploits evolutionary evidence in a computationally efficient manner. QuartetS builds on the well-established evolutionary concept that gene duplication events can be used to discriminate homologous genes. It uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and to discriminate paralogous from orthologous genes. We used function- and phylogeny-based metrics to perform a large-scale, systematic comparison of the orthology predictions of QuartetS with those of four other methods [bi-directional best hit (BBH), outgroup, OMA, and QuartetS-C (QuartetS followed by clustering)], involving 624 bacterial genomes and more than 2 million genes. We found that QuartetS slightly, but consistently, outperformed the highly specific OMA method. While consuming only 0.5% additional computational time, QuartetS predicted 50% more orthologs with a 50% lower false positive rate than the widely used BBH method. We conclude that, for large-scale phylogenetic and functional analysis, QuartetS should be preferred in applications where high accuracy is required, and QuartetS-C in applications where high throughput is required.
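The BBH baseline used in the comparison above pairs gene a of genome A with gene b of genome B when each is the other's highest-scoring hit. A minimal sketch, assuming similarity scores (for example, alignment bit scores) are already available as hypothetical `(query, subject) -> score` dictionaries for both search directions:

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(scores_ab, scores_ba):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    ab = best_hits(scores_ab)
    ba = best_hits(scores_ba)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

BBH uses no phylogenetic information. QuartetS instead examines quartet gene trees for duplication events, which is what lets it separate paralogs that BBH may report as orthologs.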

12.

Background

Haplotype assembly, the reconstruction of haplotypes from sequence data, is one of the major computational problems in bioinformatics. Most current haplotype assembly methods are designed for diploid individuals. In recent years, polyploid genomes, which have more than two sets of homologous chromosomes, have attracted the interest of many research groups working on the genomics of disease, phylogenetics, botany, and evolution. However, methods for reconstructing polyploid haplotypes are still lacking.

Results

In this work, the minimum error correction with genotype information (MEC/GI) model, an important combinatorial model for haplotyping a single individual, is used to study the triploid individual haplotype reconstruction problem. A fast and accurate enumeration-based algorithm, enumeration haplotyping triploid with least difference (EHTLD), is proposed for solving the MEC/GI model. EHTLD reconstructs the three haplotypes site by site, following the order of the single nucleotide polymorphism (SNP) loci along them. When reconstructing a given SNP site, EHTLD enumerates the three possible assignments of SNP values allowed by that site's genotype value. It then fills the site with the assignment that minimizes the difference between the reconstructed haplotypes and the sequenced fragments covering that site.
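The per-site enumeration step described above can be sketched as a greedy loop. This is a simplified illustration, not the published EHTLD implementation, and it rests on three assumptions. Genotypes are given as the number of `1` alleles at each site. Fragments are hypothetical `{site: allele}` dictionaries. The difference score is an MEC-style count in which each covering fragment is compared with its best-matching haplotype over the sites filled so far. A heterozygous triploid genotype (one or two `1` alleles) yields exactly three candidate assignments; a homozygous site yields one.

```python
from itertools import permutations


def candidate_columns(alt_count):
    """Distinct 0/1 allele assignments to three haplotypes at one SNP site,
    given the number of '1' alleles (0..3) in the triploid genotype."""
    base = [1] * alt_count + [0] * (3 - alt_count)
    return sorted(set(permutations(base)))


def fragment_cost(fragment, haplotypes, upto):
    """Mismatches between a fragment and its best-matching haplotype,
    counted only over sites 0..upto (the sites already reconstructed)."""
    return min(
        sum(1 for site, allele in fragment.items() if site <= upto and hap[site] != allele)
        for hap in haplotypes
    )


def reconstruct_triploid(genotypes, fragments):
    """Greedy site-by-site reconstruction of three haplotypes (simplified sketch)."""
    n = len(genotypes)
    haps = [[None] * n for _ in range(3)]
    for j, g in enumerate(genotypes):
        covering = [f for f in fragments if j in f]  # fragments covering site j
        best_col, best_cost = None, None
        for col in candidate_columns(g):
            for i in range(3):
                haps[i][j] = col[i]
            cost = sum(fragment_cost(f, haps, j) for f in covering)
            if best_cost is None or cost < best_cost:  # ties keep the first candidate
                best_col, best_cost = col, cost
        for i in range(3):
            haps[i][j] = best_col[i]
    return haps
```

The order of the returned haplotypes is arbitrary, since any permutation of the three rows explains the same fragments equally well.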

Conclusion

Extensive experimental comparisons were performed between EHTLD and the well-known algorithms HapCompass and HapTree. In these experiments, EHTLD reconstructed more accurate haplotypes than both HapCompass and HapTree.

13.
We previously developed SoftCADS, software for calculating dynamic sensitivities with high accuracy simply by specifying the differential equations for metabolite concentrations. However, SoftCADS did not always provide values at the machine accuracy of the computer, even though a Taylor series method was used to solve the differential equations numerically. This is because numerical derivatives calculated from an approximate formula were used directly in deriving the differential equations for the sensitivities from those for the metabolite concentrations. The present work therefore attempts to further enhance the performance of SoftCADS, in terms of both the accuracy of the calculated values and the calculation time. To overcome the problem, the approximate formula is expanded into a Taylor series in time. The first-term value of this series is then replaced by the exact coefficient of the second term of the flux function, expanded into a Taylor series in an independent or dependent variable. The results show that this replacement provides both numerical derivatives and dynamic sensitivities with very high accuracy, comparable to machine accuracy, regardless of the degree of stiffness of the differential equations. Moreover, a comparison indicates that the improved SoftCADS shortens the calculation time of the dynamic sensitivities without reducing their accuracy, even when the simplest approximate derivative formula is used.
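A dynamic sensitivity is the derivative of a time-dependent concentration with respect to a parameter, obtained by integrating a sensitivity equation alongside the model equation. As a minimal illustration, the sketch below uses a hypothetical one-metabolite decay model x' = -k x. For this model, s = dx/dk obeys s' = -k s - x and has the exact solution s(t) = -t x0 exp(-k t). The sketch uses a classical fourth-order Runge-Kutta step, not the Taylor series method used by SoftCADS.

```python
import math


def rk4_sensitivity(k, x0, t_end, dt=1e-3):
    """Integrate x' = -k x together with its sensitivity s = dx/dk.

    The sensitivity equation is s' = (df/dx) s + df/dk = -k s - x, with s(0) = 0.
    Returns (x(t_end), s(t_end)) computed with classical RK4 steps.
    """
    def rhs(x, s):
        return -k * x, -k * s - x

    n = max(1, round(t_end / dt))
    h = t_end / n  # step that lands exactly on t_end
    x, s = x0, 0.0
    for _ in range(n):
        a1 = rhs(x, s)
        a2 = rhs(x + 0.5 * h * a1[0], s + 0.5 * h * a1[1])
        a3 = rhs(x + 0.5 * h * a2[0], s + 0.5 * h * a2[1])
        a4 = rhs(x + h * a3[0], s + h * a3[1])
        x += h * (a1[0] + 2 * a2[0] + 2 * a3[0] + a4[0]) / 6
        s += h * (a1[1] + 2 * a2[1] + 2 * a3[1] + a4[1]) / 6
    return x, s


# Usage: compare with the analytic sensitivity -t * x0 * exp(-k t) at t = 1.
x, s = rk4_sensitivity(k=0.5, x0=2.0, t_end=1.0)
print(s, -1.0 * 2.0 * math.exp(-0.5))
```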

14.
15.
SUMMARY: Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and to reliable analysis at the level of microbial species rather than higher-level taxa such as phyla. Here we present Genometa, a computationally undemanding program with a graphical user interface. It enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short-read sequencing technologies. Our approach was first verified on two simulated metagenomic short-read datasets, detecting 100% and 94% of the bacterial species included, with few false positives or false negatives. Subsequent comparative benchmarking against three popular metagenomic algorithms on an Illumina human gut dataset showed that Genometa attributed the most reads to bacteria at the species level (i.e. including all strains of that species), with similar or better accuracy than the other programs. Lastly, Genometa was shown to be many times faster than BLAST because it uses modern short-read aligners. Our method is highly accurate if the bacteria in the sample are represented by genomes in the reference sequence, but it cannot find species absent from the reference. It is one of the most user-friendly and resource-efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. AVAILABILITY: The Genometa program, a step-by-step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and http://code.google.com/p/genometa/. The program has been tested on Ubuntu Linux and Windows XP/7.
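Attributing reads at the species level, pooling all strains of a species, can be sketched as a simple aggregation over short-read aligner output. This is a hypothetical illustration, not Genometa's code. The read-to-strain and strain-to-species dictionaries are assumed inputs. The `min_reads` threshold is an assumed guard against spurious low-count hits and is not documented Genometa behaviour.

```python
from collections import Counter


def species_read_counts(read_hits, strain_to_species, min_reads=1):
    """Count reads per species by pooling all reference strains of that species.

    read_hits: read id -> best-matching reference strain (None if unmapped).
    strain_to_species: reference strain -> species name.
    Returns species -> read count for species with at least min_reads reads.
    """
    counts = Counter()
    for strain in read_hits.values():
        if strain is None:
            # Unmapped read: its species may simply be absent from the reference.
            continue
        species = strain_to_species.get(strain)
        if species is not None:
            counts[species] += 1
    return {sp: n for sp, n in counts.items() if n >= min_reads}
```

The unmapped-read branch mirrors the limitation stated above: a species with no genome in the reference cannot receive any reads.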

16.
Chandonia JM, Karplus M. Proteins, 1999, 35(3): 293-306
A primary and a secondary neural network are applied to secondary structure and structural class prediction for a database of 681 non-homologous protein chains. A new method of decoding the outputs of the secondary structure prediction network is used to estimate the probability of finding each type of secondary structure at every position in the sequence. In addition to providing a reliable estimate of prediction accuracy, this method gives a higher Q3 (74.6%) than the commonly used cutoff method. Using these predictions in jury methods improves Q3 to 74.8%, the best available at present. On a database of 126 proteins commonly used to compare prediction methods, the jury predictions are 76.6% accurate. The overall Q3 for a given sequence is estimated by averaging the estimated accuracy of the prediction over all residues in the sequence. As an example, the analysis is applied to the target beta-cryptogein, which was a difficult target for ab initio prediction in the CASP2 study. The analysis shows that the prediction made with the present method (62% of residues correct) is close to the expected accuracy (66%) for this protein. The larger database and a new network training protocol also improve structural class prediction accuracy to 86%, up from the 80% obtained previously. Secondary structure content is predicted with accuracy comparable to that obtained with spectroscopic methods such as vibrational or electronic circular dichroism and Fourier transform infrared spectroscopy.
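Q3 is the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. The sketch below shows probability-based decoding and the per-sequence accuracy estimate described above. It assumes the estimated accuracy at each residue is the probability of the predicted state; this is an assumption, and the paper's exact estimator may differ.

```python
STATES = "HEC"  # helix, strand, coil


def decode(prob_rows):
    """Pick the most probable state at each residue from (H, E, C) probability triples."""
    return "".join(STATES[max(range(3), key=row.__getitem__)] for row in prob_rows)


def q3(pred, true):
    """Percentage of residues whose predicted state matches the observed state."""
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)


def expected_q3(prob_rows):
    """Estimated Q3: mean probability of the predicted state over all residues (assumed estimator)."""
    return 100.0 * sum(max(row) for row in prob_rows) / len(prob_rows)
```

Comparing `expected_q3` with the measured `q3` for a known structure is the kind of check reported above for beta-cryptogein.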

17.
18.
Random mutagenesis and selection, the approaches traditionally used to develop industrial strains, have largely been complemented by metabolic engineering, which allows purposeful modification of metabolic and cellular characteristics using recombinant DNA and other molecular biological techniques. Systems biology is advancing as a new research paradigm, driven by genome-scale computational tools and high-throughput experimental technologies including omics. As a result, systems metabolic engineering, which modifies the metabolic, regulatory and signaling networks of the cell at the systems level, is becoming possible. In silico genome-scale metabolic models and their simulation play an increasingly important role in providing systematic strategies for metabolic engineering. An in silico genome-scale metabolic model is developed using genome annotation, metabolic reactions, literature information, and experimental data. The advent of such models has led to the development of various algorithms that simulate the metabolic status of the cell as a whole. In this paper, we review the algorithms developed for system-wide simulation and perturbation of cellular metabolism, discuss their characteristics, and suggest future research directions.
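Flux balance analysis (FBA) is one widely used example of the genome-scale simulation algorithms discussed in such reviews; it is named here as an illustration, not taken from the abstract. FBA is a linear program over the following quantities:

- S, the stoichiometric matrix of m metabolites by n reactions;
- v, the flux vector;
- c, the objective weights (for example, selecting a biomass reaction);
- the lower and upper bound on each flux.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\, v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1, \dots, n.
\end{aligned}
```

The equality constraint encodes the steady-state assumption (no net accumulation of any internal metabolite). Perturbations such as gene knockouts are typically simulated by setting the bounds of the affected reactions to zero.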

19.
We report a very fast and accurate physics-based method to calculate pH-dependent electrostatic effects in protein molecules and to predict the pK values of individual titratable sites. In addition, a CHARMm-based algorithm is included to construct and refine the spatial coordinates of all hydrogen atoms at a given pH. The method combines electrostatic energy calculations based on the Generalized Born approximation with an iterative mobile clustering approach to calculate the equilibria of proton binding to multiple titration sites in protein molecules. The use of the GBIM (Generalized Born with Implicit Membrane) CHARMm module makes it possible to model membrane proteins as well as water-soluble proteins. The method includes a novel algorithm for preliminary refinement of hydrogen coordinates. Another difference from existing approaches is that a set of relaxed pentapeptide structures, rather than monopeptides, is used as model compounds. Tests on a set of 24 proteins demonstrate the high accuracy of the method. On average, the RMSD between predicted and experimental pK values is close to 0.5 pK units on this data set, and this accuracy is achieved at very low computational cost. The pH-dependent assignment of hydrogen atoms also agrees very well with the protonation states and hydrogen-bond networks observed in neutron-diffraction structures. The method is implemented as a computational protocol in Accelrys Discovery Studio. It provides a fast and easy way to study the effect of pH on many important mechanisms, such as enzyme catalysis, ligand binding, protein-protein interactions, and protein stability.
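The multi-site proton-binding problem can be illustrated with an exact Boltzmann enumeration over all protonation states. This sketch assumes a simplified model with hypothetical intrinsic pK values and pairwise interaction terms expressed in pK units. It does not use Generalized Born energies or the paper's mobile clustering. Because it enumerates 2^N states, it is practical only for a handful of sites, which shows why approximations are needed when many sites are coupled.

```python
import itertools


def protonation_fractions(ph, pk_int, w):
    """Average protonation of each site by exact enumeration of 2**N states.

    pk_int[i]: hypothetical intrinsic pK of site i.
    w[i][j]: hypothetical interaction (in pK units) penalizing simultaneous
    protonation of sites i and j (only i < j entries are used).
    The free energy of a state, in units of kT ln 10, is
    sum_i x_i (pH - pK_i) + sum_{i<j} w_ij x_i x_j, with x_i = 1 if protonated.
    """
    n = len(pk_int)
    z = 0.0
    totals = [0.0] * n
    for state in itertools.product((0, 1), repeat=n):
        g = sum(x * (ph - pk) for x, pk in zip(state, pk_int))
        g += sum(w[i][j] * state[i] * state[j] for i in range(n) for j in range(i + 1, n))
        weight = 10.0 ** (-g)
        z += weight
        for i, x in enumerate(state):
            totals[i] += x * weight
    return [t / z for t in totals]
```

For a single site this reduces to the Henderson-Hasselbalch relation: the site is half protonated when pH equals its pK. Positive coupling terms lower the protonation of interacting sites, shifting their apparent pK values.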

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (ICP filing no. 京ICP备09084417号)