Similar documents
20 similar documents found
1.
Next-generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen’s genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that have so far escaped capture by laboratory techniques. One aim is to identify patterns of prediction inaccuracy common to gene finders as a whole, irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation, but only four offer high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen; consequently, the evaluation results are inextricably linked to the availability and quality of those data. The pathogen Toxoplasma gondii is used to illustrate the evaluation methods. The results support the current view that exons predicted by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results also reveal some patterns of inaccuracy that are common to all gene finders, and these may provide a focus area for future gene finder developers.
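Evaluations of ab initio gene finders typically score predicted exons against a reference annotation. The following is a minimal sketch of exon-level sensitivity and specificity, assuming exons are hypothetical (start, end) coordinate pairs on a single strand and that a prediction counts as correct only when both boundaries match exactly; the scoring scheme used in the paper may differ.

```python
def exon_accuracy(predicted, reference):
    """Exon-level sensitivity and specificity under exact-boundary matching.

    predicted, reference: iterables of (start, end) tuples (hypothetical format).
    """
    pred = set(predicted)
    ref = set(reference)
    true_pos = len(pred & ref)  # exons whose start and end both match exactly
    sensitivity = true_pos / len(ref) if ref else 0.0    # share of real exons recovered
    specificity = true_pos / len(pred) if pred else 0.0  # share of predictions that are real
    return sensitivity, specificity


reference = [(100, 250), (400, 520), (700, 880)]
predicted = [(100, 250), (400, 530), (700, 880), (950, 1010)]
sn, sp = exon_accuracy(predicted, reference)
print(f"Sn={sn:.2f} Sp={sp:.2f}")  # Sn=0.67 Sp=0.50
```

Here "specificity" follows the gene-finding convention TP / (TP + FP), which other fields call precision. A single shifted boundary, as in (400, 530), makes the whole exon count as wrong under this strict rule.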

2.
A statistical method is described for determining low-resolution 3-D reconstructions of virus particles from cryoelectron microscope images by an ab initio algorithm. The method begins with a novel linear reconstruction method that generates a spherically symmetric reconstruction. This is followed by a nonlinear reconstruction method that implements an expectation-maximization procedure, uses the spherically symmetric reconstruction as its initial condition, and yields a reconstruction with icosahedral symmetry. An important characteristic of the complete method is that very little needs to be known about the particle before the reconstruction is computed: only the type of symmetry and the inner and outer radii. The method is demonstrated on synthetic cowpea mosaic virus data, and numerical experiments show that it is robust to 5% errors in the contrast transfer function, 5% errors in the location of the particle centers in the images, and 5% distortion in the 3-D structure from which the images are derived.

3.

Background

Protein structures are critical for understanding the mechanisms of biological systems and, in turn, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. We therefore present a standalone protein structure prediction software package, suitable for high-throughput structural genomics applications, that performs all three classes of prediction methodology: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.

Methodology/Principal Findings

The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.

Conclusions

The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.

4.
The reaction mechanism of CF3CH2NH2 (TFEA) with the OH radical is investigated theoretically using ab initio and DFT methods. Electronic structure information on the potential energy surface of each reaction channel is obtained at the MPWB1K/6-31+G(d,p) level, and the energetics are further refined with the Gaussian-2 method G2(MP2). The transition states on the corresponding potential energy surfaces are verified by intrinsic reaction coordinate (IRC) calculations. Our calculations indicate that H abstraction from the –NH2 group is the dominant reaction channel because of its lower energy barrier. The rate constants of the reaction were calculated with canonical transition state theory (CTST) using the ab initio data, and the agreement between theoretical and experimental rate constants is good at the measured temperature. Comparison with CH3CH2NH2 shows that fluorine substitution decreases the reactivity of the C-H bond.
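Canonical transition state theory relates a rate constant to the activation free energy through the Eyring expression k(T) = κ (k_B T / h) exp(-ΔG‡ / RT). The following is a minimal sketch, assuming the first-order form with transmission coefficient κ = 1 and hypothetical barrier heights (not the paper's values). A bimolecular reaction such as TFEA + OH additionally needs a standard-state concentration factor, and published CTST work often adds tunneling corrections; both are omitted here.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_rate(delta_g_act_kj_mol, temperature_k, kappa=1.0):
    """First-order CTST rate constant (s^-1) from an activation free energy in kJ/mol."""
    prefactor = kappa * K_B * temperature_k / H  # about 6.2e12 s^-1 near room temperature
    boltzmann = math.exp(-delta_g_act_kj_mol * 1000.0 / (R * temperature_k))
    return prefactor * boltzmann


# Hypothetical barriers for two competing H-abstraction channels.
k_nh2 = eyring_rate(40.0, 298.15)
k_ch2 = eyring_rate(50.0, 298.15)
ratio = k_nh2 / k_ch2
print(f"k_NH2 / k_CH2 = {ratio:.1f}")
```

Because the barrier enters exponentially, a channel that is only 10 kJ/mol lower is roughly fifty times faster at room temperature, which is why a lower barrier is enough to make one abstraction channel dominant.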

5.
The conformational energy surfaces of analogues of the dipeptide unit of polypeptides and proteins are calculated by ab initio methods using extended basis sets. The calculations are not particularly sensitive to the choice of (extended) basis set and are shown to support a particular empirical method parameterized with respect to crystal data. Non-hydrogen-bonded conformations agree to within 3 kcal mol⁻¹, even for conformations with considerable atomic overlap. Hydrogen-bonded conformations, however, are in less satisfactory agreement, and it is the ab initio calculations that appear to be at fault. A simple correction is applied to the ab initio energy of hydrogen-bonded conformations, and, with the use of the empirical energy surface, a full quantum mechanical conformational energy map is interpolated for the alanyl dipeptide. The effect of flexibility in the peptide backbone is taken into account, and the results support recent empirical findings that distortions in valence angles must be considered in calculations of the conformational behaviour of peptides.

6.

Background

A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.

Results

AUGUSTUS can be used as an ab initio program, that is, as a program that uses only one single genomic sequence as input information. In addition, it is able to combine information from the genomic sequence under study with external hints from various sources of information. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, was taken into account.

Conclusion

AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover it is very flexible because it can take information from several sources simultaneously into consideration.

7.
The thermostable direct hemolysin (TDH) is a major virulence factor of Vibrio parahaemolyticus. We have characterized the conformational properties of TDH by small-angle X-ray scattering (SAXS), ultracentrifugation and transmission electron microscopy. Sedimentation equilibrium and velocity studies revealed that the protein is tetrameric in aqueous solvents. The Guinier plot derived from the SAXS data gave a radius of gyration of 29.0 Å. The pair distance distribution function derived from the SAXS data had an elongated profile with a shoulder, suggesting molecules with an anisotropic shape and a maximum diameter of 98 Å. Electron microscopic image analysis of the negatively stained TDH oligomer showed C4-symmetric particles with edge and diagonal lengths of 65 Å and 80 Å, respectively. Shape reconstruction was carried out by ab initio calculations from the SAXS data under a C4-symmetry approximation. These results suggest that tetrameric TDH adopts an oblate structure. The hydrodynamic parameters predicted from the ab initio model differed slightly from the experimental values, suggesting the presence of flexible segments.
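A radius of gyration like the 29.0 Å value above is read off a Guinier plot. The following is a minimal sketch of that fit, assuming the Guinier approximation I(q) ≈ I(0) exp(-q²Rg²/3) holds over the chosen low-q range (commonly q·Rg below about 1.3); the data are synthetic and noise-free, not the TDH measurements.

```python
import math


def guinier_rg(q_values, intensities):
    """Estimate Rg and I(0) by fitting ln I(q) = ln I(0) - (Rg^2 / 3) q^2.

    Uses ordinary least squares on (q^2, ln I). The caller should restrict the
    data to the Guinier region (q * Rg below roughly 1.3).
    """
    xs = [q * q for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data do not follow a Guinier decay")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic data for Rg = 29.0 Å (q in 1/Å); the largest q*Rg here is about 1.25.
rg_true, i0_true = 29.0, 1000.0
qs = [0.005 + 0.002 * k for k in range(20)]
intensity = [i0_true * math.exp(-(q * rg_true) ** 2 / 3.0) for q in qs]
rg, i0 = guinier_rg(qs, intensity)
print(f"Rg = {rg:.1f} Å, I(0) = {i0:.0f}")  # Rg = 29.0 Å, I(0) = 1000
```

With real data the fit range matters: including points beyond the Guinier region biases Rg, and upward curvature at the lowest q usually signals aggregation.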

8.
The Crystallography and NMR System (CNS) is currently widely used for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints, as opposed to predictive modeling driven by interaction map data. As such, adapting the CNS experimental structure determination protocol to ab initio protein folding is intrinsically anomalous, which may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales to much higher folding accuracy than CNS as the contact threshold increases, ultimately approaching near-optimal reconstruction accuracy for higher-threshold contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by its better performance in folding free-modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments, compared with popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared with a CNS-based approach.
These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS.

9.
Inorganica Chimica Acta, 1986, 125(2): 107-110
The conformation of N-acetyl alanine methyl ester, CH3CONHCH(CH3)COOCH3, is determined by CNDO/2 and ab initio calculations with minimal GLO basis sets. The binding sites of small monovalent cations on the ligand are investigated by the ab initio method. The chelate geometry involving the peptide and ester carbonyl groups was found to be the most preferred conformation.

10.

Background

Protein inter-residue contact maps provide a translation and rotation invariant topological representation of a protein. They can be used as an intermediary step in protein structure predictions. However, the prediction of contact maps represents an unbalanced problem as far fewer examples of contacts than non-contacts exist in a protein structure. In this study we explore the possibility of completely eliminating the unbalanced nature of the contact map prediction problem by predicting real-value distances between residues. Predicting full inter-residue distance maps and applying them in protein structure predictions has been relatively unexplored in the past.

Results

We first demonstrate that native-like distance maps can reproduce 3D structures almost identical to the targets, with an average RMSD of 0.5 Å. In addition, physical maps corrupted with a random error of ±6 Å still reconstruct the targets to within an average RMSD of 2 Å. After demonstrating the reconstruction potential of distance maps, we develop two classes of predictors using two-dimensional recursive neural networks: an ab initio predictor that relies only on the protein sequence and evolutionary information, and a template-based predictor that is additionally provided with structural homology information. We find that the ab initio predictor reproduces distances with an RMSD of 6 Å, regardless of the evolutionary content provided. Furthermore, we show that the template-based predictor exploits both sequence and structure information, even in cases of dubious homology, and outperforms the best template hit by a clear margin of up to 3.7 Å. Lastly, we demonstrate that the two predictors can reconstruct CASP9 targets shorter than 200 residues, producing results similar to those of the state-of-the-art machine learning approach implemented in the Distill server.

Conclusions

The methodology presented here, if complemented by more complex reconstruction protocols, may offer a path to improving machine learning algorithms for 3D protein structure prediction. Moreover, it can be used as an intermediary step in protein structure prediction, either on its own or complemented by NMR restraints.
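The rotation and translation invariance of distance maps, and the RMSD used to score reconstructions, can be illustrated in a few lines. The following is a minimal sketch, assuming hypothetical Cα coordinates given as (x, y, z) tuples in Å and, for the RMSD, two structures that are already optimally superposed; a full comparison would first superpose them (for example with the Kabsch algorithm), which is omitted here.

```python
import math


def distance_map(coords):
    """Full inter-residue distance matrix from a list of (x, y, z) coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rmsd(coords_a, coords_b):
    """RMSD between two equally long coordinate sets, assumed already superposed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets differ in length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def rotate_z_90_and_shift(coords, shift=(10.0, -5.0, 3.0)):
    """Rigid motion: a 90-degree rotation about z followed by a translation."""
    return [(-y + shift[0], x + shift[1], z + shift[2]) for x, y, z in coords]


ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (4.0, 5.5, 2.9)]  # toy trace
moved = rotate_z_90_and_shift(ca)

# The distance map is unchanged by the rigid motion (up to rounding) ...
same = all(math.isclose(d1, d2, abs_tol=1e-9)
           for r1, r2 in zip(distance_map(ca), distance_map(moved))
           for d1, d2 in zip(r1, r2))
print(same)  # True
# ... while the coordinate RMSD stays large until the two copies are superposed.
print(round(rmsd(ca, moved), 2))
```

This is why a distance map can be predicted without choosing a reference frame, and why reconstructed models must be superposed onto the target before an RMSD is meaningful.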

11.

Background

Gene prediction is a challenging but crucial part of most genome analysis pipelines. Various methods have evolved that predict genes either ab initio from reference sequences or evidence-based with the help of additional information, such as RNA-Seq reads or EST libraries. However, none of these strategies is bias-free, and no single method necessarily provides a complete set of accurate predictions.

Results

We present IPred (Integrative gene Prediction), a method that integrates ab initio and evidence-based gene identifications to combine the advantages of different prediction strategies. IPred builds on the output of gene finders and generates a new combined set of gene identifications, representing the integrated evidence of the single-method predictions.

Conclusion

We evaluate IPred in simulations and real-data experiments on Escherichia coli and human data. We show that IPred improves prediction accuracy compared with single-method predictions and with existing methods for combining predictions.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1315-9) contains supplementary material, which is available to authorized users.

12.

Background

Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector.

Results

We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes 4,681 novel CDSs that are not represented in the Ensembl annotation but have EST support, and another 4,031 Ensembl-supported genes that undergo major structural, and therefore probably functional, changes in the reannotated set. The quality and accuracy of the reannotation were assessed by comparison with end sequences from 20,249 full-length cDNA clones, and an evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high-quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analyzing the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download.

Conclusion

Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms.
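The reannotation above relies on an open-reading-frame selection step and reports how many CDSs lack start and/or stop codons. The following is a minimal sketch of a simpler related operation, assuming a forward-strand DNA string, ATG as the only start codon, TAA/TAG/TGA as stops, and "longest ORF wins" as a hypothetical selection rule; the authors' actual selection algorithm is not described in enough detail here to reproduce.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Returns None when no complete ORF (with both start and stop) exists.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ATG downstream
    return best


seq = "CCATGAAATTTGGGTAACCATGCCCTGA"
start, end = longest_orf(seq)
print(seq[start:end])  # ATGAAATTTGGGTAA
```

A candidate for which this returns None lacks a start or stop codon in every frame, which is the kind of incomplete CDS the reannotation reports reducing to about 4%.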

13.
This study investigates technetium ligands and their complexes with [TcO]3+ using ab initio population analysis and molecular mechanics conformational searching. Calculated atomic electronic populations on the technetium atom in complexes with a number of ligands gauge the degree of covalent bonding between technetium and these ligands; here, a reduction in the positive charge on the [TcO]3+ moiety upon complexation with a given ligand is correlated with covalent bonding. Our ab initio results suggest that ligands with more sulphur atoms bond more covalently to technetium than other ligands do. A conformational analysis of the uncomplexed ligands indicates that the conformational reorganization required before complexation correlates inversely with stable complex formation. This analysis shows that ligands with ethylene carbonyl bridges have low-energy conformations closer to the final complexation geometries than ligands with ethylene, propylene or propylene carbonyl bridges. The presence of these low-energy conformations facilitates faster complexation of the [TcO]3+ moiety by the ethylene carbonyl bridged ligands. This result provides a kinetic explanation of why ethylene carbonyl bridged ligands form stable complexes while many other ligands do not [1]. The conclusion is that both kinetic and thermodynamic considerations play a role in stable complex formation between these ligands and technetium.

14.
Human metapneumovirus (HMPV) of the family Paramyxoviridae is a major cause of respiratory illness worldwide. Phosphoproteins (P) from Paramyxoviridae are essential co-factors of the viral RNA polymerase that form tetramers and possess long intrinsically disordered regions (IDRs). Using disorder analysis, we located the central region of HMPV P (Pced), which is involved in tetramerization, and modeled its 3D structure ab initio using Rosetta fold-and-dock. We characterized the solution structure of Pced using small-angle X-ray scattering (SAXS) and carried out direct fitting to the scattering data to filter out incorrect models. Molecular dynamics simulations (MDS) and ensemble optimization were employed to select correct models and capture the dynamic character of Pced. Our analysis revealed that oligomerization involves a compact central core located between residues 169 and 194 (Pcore), surrounded by flexible regions with α-helical propensity. We crystallized this fragment and solved its structure at 3.1 Å resolution by molecular replacement, using the folded core from our SAXS-validated ab initio model. The RMSD between the modeled and experimental tetramers is as low as 0.9 Å, demonstrating the accuracy of the approach. A comparison of the structure of HMPV P with existing Mononegavirales Pced structures suggests that Pced evolved under weak selective pressure. Finally, we discuss the advantages of combining SAXS with ab initio modeling and MDS to solve the structures of small, homo-oligomeric protein complexes.

15.
16.
17.
Gene identification in novel eukaryotic genomes by self-training algorithm   Times cited: 8 (self-citations: 0; by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene finding tools tuned for previously studied species are rarely suitable for effective gene hunting in the DNA sequences of a new genome. Gene identification methods based on mapping cDNA and expressed sequence tags (ESTs) to genomic DNA, or on alignments to closely related genomes, rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes to estimate gene model parameters. In practice, neither type of data may be available in sufficient amounts until rather late stages of sequencing a novel genome. Nevertheless, we have shown that gene finding in eukaryotic genomes can be carried out in parallel with estimating the statistical model directly from yet-anonymous genomic DNA. The proposed method of running gene prediction in parallel with model parameter estimation follows the path of iterative Viterbi training: rounds of labeling the genomic sequence into coding and non-coding regions alternate with rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training precedes the gene prediction step. Several novel genomes have been analyzed, and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.
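The iterative Viterbi training described above alternates between labeling the sequence with the current model and re-estimating the model from those labels. The following is a toy sketch of that loop, assuming a hypothetical two-state hidden Markov model ("C" coding, "N" non-coding) with single-nucleotide emissions, hypothetical starting parameters, and add-one pseudocounts. Real eukaryotic gene models use codon-position-dependent emissions, exon/intron/splice-site states, and the parameter restrictions mentioned above, none of which are modeled here; GC richness merely stands in for a coding signal.

```python
import math

STATES = ("C", "N")  # hypothetical labels: coding / non-coding
ALPHABET = "ACGT"


def viterbi(seq, start, trans, emit):
    """Most probable state path for a DNA string, computed in log space."""
    scores = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # best predecessor of state s at this position
            prev = max(STATES, key=lambda p: scores[-1][p] + math.log(trans[p][s]))
            row[s] = scores[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][ch])
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last position
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def reestimate(seq, path, pseudo=1.0):
    """Count transitions and emissions along a labeled path, with pseudocounts."""
    trans_counts = {p: {s: pseudo for s in STATES} for p in STATES}
    emit_counts = {s: {b: pseudo for b in ALPHABET} for s in STATES}
    for i, (state, ch) in enumerate(zip(path, seq)):
        emit_counts[state][ch] += 1
        if i > 0:
            trans_counts[path[i - 1]][state] += 1
    return ({p: normalize(c) for p, c in trans_counts.items()},
            {s: normalize(c) for s, c in emit_counts.items()})


def viterbi_training(seq, start, trans, emit, max_rounds=10):
    """Alternate Viterbi labeling and re-estimation until the labels stop changing."""
    path = viterbi(seq, start, trans, emit)
    for _ in range(max_rounds):
        trans, emit = reestimate(seq, path)
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
    return path, trans, emit


# Toy input: an AT-rich flank, a GC-rich block, another AT-rich flank.
seq = "ATATATTAATAT" + "GCGGCGCCGCGG" + "TTATAATATA"
start = {"C": 0.5, "N": 0.5}
trans0 = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
emit0 = {"C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
         "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
path, trans, emit = viterbi_training(seq, start, trans0, emit0)
print(seq)
print(path)
```

The pseudocounts keep every probability positive, so a state that temporarily receives no labels cannot collapse to zero; this is a much cruder safeguard than the dynamically changing parameter restrictions the abstract describes.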

18.

Background

Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is, without exploiting information about potential homologues of known structure.

Results

We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, relying on the sequence and on evolutionary information, and one template-based, in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper, we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.

Conclusion

Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at http://distill.ucd.ie/.
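Binary 8 Å contact maps and multi-class distance maps are both discretizations of the same inter-residue distance matrix. The following is a minimal sketch, assuming a precomputed residue distance matrix in Å. The 8 Å contact cutoff is the threshold stated above, whereas the class boundaries used for the 4-class map (8, 13 and 19 Å) are hypothetical placeholders, since the abstract does not give the ones the authors used.

```python
import bisect


def contact_map(dist, cutoff=8.0, min_separation=1):
    """Binary contact map: 1 if residues i and j are closer than `cutoff` Å.

    Pairs closer in sequence than `min_separation` (including i == j) are set to 0.
    """
    n = len(dist)
    return [[1 if abs(i - j) >= min_separation and dist[i][j] < cutoff else 0
             for j in range(n)] for i in range(n)]


def class_map(dist, boundaries=(8.0, 13.0, 19.0)):
    """Multi-class map: class k means the distance lies in the k-th bin.

    Three hypothetical boundaries give classes 0..3 (a 4-class map). Class 0
    corresponds exactly to d < 8 Å, so it agrees with the binary contact map.
    """
    return [[bisect.bisect_right(boundaries, d) for d in row] for row in dist]


dist = [[0.0, 3.8, 9.5, 21.0],
        [3.8, 0.0, 6.1, 14.2],
        [9.5, 6.1, 0.0, 7.9],
        [21.0, 14.2, 7.9, 0.0]]
print(contact_map(dist))  # [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(class_map(dist))    # [[0, 0, 1, 3], [0, 0, 0, 2], [1, 0, 0, 0], [3, 2, 0, 0]]
```

Because the binary map is recoverable from class 0 of the multi-class map, a 4-class predictor can also report 8 Å contacts, as the abstract describes.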

19.
Classical calculations of conformational potential surfaces, based on simple analytical functions of the interactions between atomic centres, continue to be of considerable importance. However, it has become apparent that not all important interactions can be represented as interactions between positions that are effectively those of the nuclei. There has therefore been recent interest in including lone pairs and special functions for interactions involving excited-state orbitals. A particularly interesting test case is the COP(O2)OC fragment of the nucleic acid backbone, which appears to be the most flexible "hinge point" of polynucleotides when the classical type of calculation is carried out. In contrast, ab initio quantum mechanical calculations show the conformational space of this fragment to be much more restricted. The disagreement is large enough to call into doubt the validity of bench-top modelling of nucleotide behaviour. In the following study, a variety of ab initio calculations are carried out to localise non-core orbitals in an objective manner. Coulombic interactions are introduced between these localised orbitals, with charge parameters optimised to reproduce the total ab initio potential surface. The results imply an interesting disagreement with other authors concerning the importance of lone pair interactions in the nucleotide backbone, and the origins of this disagreement are analysed in some detail.
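The model described above adds Coulombic terms between charge centres placed on localised orbitals. The following is a minimal sketch of such a pairwise Coulomb sum, assuming point charges in units of e, positions in Å, a vacuum dielectric, and a conversion factor of about 332.06 kcal·Å/(mol·e²). The sites, charges and geometry are hypothetical, and the step of optimising charges against the ab initio surface is not shown.

```python
import itertools
import math

COULOMB_KCAL = 332.06  # approx. e^2 / (4 pi eps0) in kcal*Å/mol


def coulomb_energy(sites, exclude=frozenset()):
    """Sum of q_i * q_j / r_ij over all site pairs not listed in `exclude`.

    sites: list of (charge_e, (x, y, z)) tuples.
    exclude: set of (i, j) index pairs with i < j, e.g. bonded neighbours.
    """
    energy = 0.0
    for (i, (qi, ri)), (j, (qj, rj)) in itertools.combinations(enumerate(sites), 2):
        if (i, j) in exclude:
            continue
        energy += COULOMB_KCAL * qi * qj / math.dist(ri, rj)
    return energy


# Hypothetical sites: one positive centre and two negative lone-pair-like centres.
sites = [(0.5, (0.0, 0.0, 0.0)),
         (-0.5, (1.5, 0.0, 0.0)),
         (-0.5, (0.0, 1.5, 0.0))]
energy = coulomb_energy(sites)
print(f"{energy:.2f} kcal/mol")
```

Since the energy is linear in each charge product, fitting charge parameters to a reference surface, as the study does, can be posed as a least-squares problem over conformations once the site geometries are fixed.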

20.
Adenovirus virus-associated RNA (VAI) provides protection against the host antiviral response in part by inhibiting the interferon-induced double-stranded RNA-activated protein kinase (PKR). VAI consists of three base-paired regions: the apical stem, responsible for the interaction with the double-stranded RNA binding motifs (dsRBMs) of PKR; the central stem, required for inhibition; and the terminal stem. The solution conformations of VAI and of VAI lacking the terminal stem were determined using SAXS, which suggested extended conformations in agreement with their secondary structures. Solution conformations of VAI lacking the terminal stem in complex with the dsRBMs of PKR indicated that the apical stem interacts with both dsRNA-binding motifs, whereas the central stem does not. Hydrodynamic properties calculated from ab initio models were compared with experimentally determined parameters for model validation. Furthermore, SAXS envelopes were used as a constraint for in silico modeling of the tertiary structure of the RNA and the RNA–protein complex. Finally, full-length PKR was also studied, but concentration-dependent changes in hydrodynamic parameters prevented ab initio shape determination. Taken together, the results provide an improved structural framework that furthers our understanding of the role VAI plays in evading host innate immune responses.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417