
1.

Ontology-oriented retrieval of putative microRNAs in Vitis vinifera via GrapeMiRNA: a web database of de novo predicted grape microRNAs

Barbara Lazzari Andrea Caprera Alessandro Cestaro Ivan Merelli Marcello Del Corvo Paolo Fontana Luciano Milanesi Riccardo Velasco Alessandra Stella 《BMC plant biology》2009,9(1):82

Background

Two complete genome sequences are available for Vitis vinifera Pinot noir. Based on the sequence and gene predictions produced by the IASMA, we performed an in silico detection of putative microRNA genes and of their targets, and collected the most reliable microRNA predictions in a web database. The application is available at .

2.

How reliably can we predict the reliability of protein structure predictions?

István Miklós Ádám Novák Balázs Dombai Jotun Hein 《BMC bioinformatics》2008,9(1):137

Background

Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been made to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences.

3.

An efficient method for the prediction of deleterious multiple-point mutations in the secondary structure of RNAs using suboptimal folding solutions

Alexander Churkin Danny Barash 《BMC bioinformatics》2008,9(1):222

Background

RNAmute is an interactive Java application which, given an RNA sequence, calculates the secondary structure of all single point mutations and organizes them into categories according to their similarity to the predicted structure of the wild type. The secondary structure predictions are performed using the Vienna RNA package. A more efficient implementation of RNAmute is needed, however, to extend from the case of single point mutations to the general case of multiple point mutations, which may often be desired for computational predictions alongside mutagenesis experiments. But analyzing multiple point mutations, a process that requires traversing all possible mutations, becomes highly expensive since the running time is O(n^m) for a sequence of length n with m-point mutations. Using Vienna's RNAsubopt, we present a method that selects only those mutations, based on stability considerations, which are likely to be conformational rearranging. The approach is best examined using the dot plot representation for RNA secondary structure.
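To make the O(n^m) growth concrete, the following minimal sketch counts how many distinct mutants differ from a wild-type sequence at exactly m positions. It is not part of RNAmute; the function name `count_mutants` and the 4-letter alphabet default are illustrative assumptions. The count is C(n, m) · 3^m, which grows on the order of n^m for fixed m.

```python
from math import comb


def count_mutants(n: int, m: int, alphabet_size: int = 4) -> int:
    """Number of sequences differing from a length-n wild type at exactly m positions.

    Choose the m mutated positions (C(n, m) ways), then pick one of the
    (alphabet_size - 1) alternative letters at each of them.
    """
    return comb(n, m) * (alphabet_size - 1) ** m


if __name__ == "__main__":
    # Hypothetical 100-nucleotide RNA: the search space explodes with m.
    for m in (1, 2, 3):
        print(m, count_mutants(100, m))
```

Each of these mutants would need its own folding run in an exhaustive scan, which is why a pre-selection step based on suboptimal structures is attractive.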

4.

Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases

Jan Charles Biro 《Theoretical biology & medical modelling》2006,3(1):28-11

Background

All the information necessary for protein folding is supposed to be present in the amino acid sequence. It is still not possible to provide specific ab initio structure predictions by bioinformatical methods. It is suspected that additional folding information is present in protein coding nucleic acid sequences, but this is not represented by the known genetic code.

5.

(PS)²-v2: template-based protein structure prediction server

Chih-Chieh Chen Jenn-Kang Hwang Jinn-Moon Yang 《BMC bioinformatics》2009,10(1):366

Background

Template selection and target-template alignment are critical steps for template-based modeling (TBM) methods. Identifying a template in the twilight zone of 15~25% sequence similarity between targets and templates is still difficult for template-based protein structure prediction. This study presents the (PS)²-v2 server, based on our original server with numerous enhancements and modifications, to improve reliability and applicability.

6.

Identification of similar regions of protein structures using integrated sequence and structure analysis tools

Brandon Peters Charles Moad Eunseog Youn Kris Buffington Randy Heiland Sean Mooney 《BMC structural biology》2006,6(1):4-8

Background

Understanding protein function from its structure is a challenging problem. Sequence-based approaches for finding homology are broadly used for annotation of both structure and function. 3D structural information on protein domains and their interactions provides a view of structure-function relationships that complements sequence information. We have developed a web site and an API of web services that enable users to submit protein structures and identify statistically significant neighbors and the underlying structural environments that make that match, using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer based superfamily predictions to give a unique integrated view of the prediction of SCOP superfamilies, EC number, and GO term, as well as identification of the protein structural environments that are associated with that prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest.

7.

Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks

Predrag Kukic Claudio Mirabello Giuseppe Tradigo Ian Walsh Pierangelo Veltri Gianluca Pollastri 《BMC bioinformatics》2014,15(1):1-15

Background

Protein inter-residue contact maps provide a translation and rotation invariant topological representation of a protein. They can be used as an intermediary step in protein structure predictions. However, the prediction of contact maps represents an unbalanced problem as far fewer examples of contacts than non-contacts exist in a protein structure. In this study we explore the possibility of completely eliminating the unbalanced nature of the contact map prediction problem by predicting real-value distances between residues. Predicting full inter-residue distance maps and applying them in protein structure predictions has been relatively unexplored in the past.

Results

We initially demonstrate that the use of native-like distance maps is able to reproduce 3D structures almost identical to the targets, giving an average RMSD of 0.5Å. In addition, the corrupted physical maps with an introduced random error of ±6Å are able to reconstruct the targets within an average RMSD of 2Å. After demonstrating the reconstruction potential of distance maps, we develop two classes of predictors using two-dimensional recursive neural networks: an ab initio predictor that relies only on the protein sequence and evolutionary information, and a template-based predictor in which additional structural homology information is provided. We find that the ab initio predictor is able to reproduce distances with an RMSD of 6Å, regardless of the evolutionary content provided. Furthermore, we show that the template-based predictor exploits both sequence and structure information even in cases of dubious homology and outperforms the best template hit with a clear margin of up to 3.7Å. Lastly, we demonstrate the ability of the two predictors to reconstruct the CASP9 targets shorter than 200 residues, producing results similar to those of the state-of-the-art machine learning approach implemented in the Distill server.

Conclusions

The methodology presented here, if complemented by more complex reconstruction protocols, can represent a possible path to improve machine learning algorithms for 3D protein structure prediction. Moreover, it can be used as an intermediary step in protein structure predictions either on its own or complemented by NMR restraints.
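The two properties this abstract relies on, invariance of inter-residue distances under translation and the scarcity of contacts relative to non-contacts, can be illustrated with a small sketch. Everything below is an assumption for illustration: the toy coordinates, the 8 Å contact threshold and the helper names are not taken from the paper.

```python
from math import dist


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (in Å)."""
    return [[dist(a, b) for b in coords] for a in coords]


def contact_fraction(dmap, threshold=8.0):
    """Fraction of residue pairs (i < j) closer than the threshold.

    The threshold is an illustrative assumption, not a value from the abstract.
    """
    n = len(dmap)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    contacts = sum(1 for i, j in pairs if dmap[i][j] < threshold)
    return contacts / len(pairs)


# Toy chain of 10 residues spaced 3.8 Å apart on a line (hypothetical data).
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
# The same chain moved rigidly in space.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in coords]

dmap = distance_map(coords)
dmap_shifted = distance_map(shifted)

if __name__ == "__main__":
    same = all(
        abs(a - b) < 1e-9
        for row_a, row_b in zip(dmap, dmap_shifted)
        for a, b in zip(row_a, row_b)
    )
    print("distance map unchanged by translation:", same)
    print("fraction of pairs in contact:", contact_fraction(dmap))
```

Predicting the real-valued entries of `dmap` directly, rather than the thresholded contacts, is what removes the class imbalance described in the Background.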

8.

Structural genomics of human proteins – target selection and generation of a public catalogue of expression clones

Konrad Büssow Christoph Scheich Volker Sievert Ulrich Harttig Jörg Schultz Bernd Simon Peer Bork Hans Lehrach Udo Heinemann 《Microbial cell factories》2005,4(1):21

Background

The availability of suitable recombinant protein is still a major bottleneck in protein structure analysis. The Protein Structure Factory, part of the international structural genomics initiative, targets human proteins for structure determination. It has implemented high throughput procedures for all steps from cloning to structure calculation. This article describes the selection of human target proteins for structure analysis, our high throughput cloning strategy, and the expression of human proteins in Escherichia coli host cells.

9.

Structural characterization of CA1462, the Candida albicans thiamine pyrophosphokinase

Sébastien Santini Vincent Monchois Nicolas Mouz Cécile Sigoillot Tristan Rousselle Jean-Michel Claverie Chantal Abergel 《BMC structural biology》2008,8(1):33

Background

In search of new antifungal targets of potential interest for pharmaceutical companies, we initiated a comparative genomics study to identify the most promising protein-coding genes in fungal genomes. One criterion was the protein sequence conservation between reference pathogenic genomes. A second criterion was that the corresponding gene in Saccharomyces cerevisiae should be essential. Since thiamine pyrophosphate is an essential product involved in a variety of metabolic pathways, proteins responsible for its production satisfied these two criteria.

10.

An ensemble micro neural network approach for elucidating interactions between zinc finger proteins and their target DNA

Dutta Shayoni Madan Spandan Parikh Harsh Sundar Durai 《BMC genomics》2016,17(13):1033-107

Background

The ability to engineer zinc finger proteins binding to a DNA sequence of choice is essential for targeted genome editing to be possible. Experimental techniques and molecular docking have been successful in predicting protein-DNA interactions; however, they are highly time and resource intensive. Here, we present a novel algorithm designed for high throughput prediction of the optimal zinc finger protein for 9 bp DNA sequences of choice. In accordance with the principles of information theory, a subset identified by using K-means clustering was used as a representative for the space of all possible 9 bp DNA sequences. The modeling and simulation results assuming a synergistic mode of binding obtained from this subset were used to train an ensemble micro neural network. The synergistic mode of binding is the closest to the DNA-protein binding seen in nature, and gives much higher quality predictions, while the time and resources increase exponentially in the trade off. Our algorithm is inspired by an ensemble machine learning approach, and incorporates the predictions made by 100 parallel neural networks, each with a different hidden layer architecture designed to pick up different features from the training dataset to predict optimal zinc finger proteins for any 9 bp target DNA.

Results

The model gave an average accuracy of 83% sequence identity for the testing dataset. The BLAST e-values are well within the statistical confidence interval of E-05 for 100% of the testing samples. The geometric mean and median value for the BLAST e-values were found to be 1.70E-12 and 7.00E-12 respectively. For final validation of the approach, we compared our predictions against optimal ZFPs reported in the literature for a set of experimentally studied DNA sequences. The accuracy, as measured by the average string identity between our predictions and the optimal zinc finger protein reported in the literature for a 9 bp DNA target, was found to be as high as 81% for DNA targets with a consensus sequence GCNGNNGCN reported in the literature. Moreover, the average string identity of our predictions for a catalogue of over 100 9 bp DNA targets for which the optimal zinc finger protein has been reported in the literature was found to be 71%.

Conclusions

Validation with experimental data shows that our tool is capable of domain adaptation and thus scales well to datasets other than the training set with high accuracy. As synergistic binding comes the closest to the ideal mode of binding, our algorithm predicts biologically relevant results in sync with the experimental data present in the literature. While there have been disjointed attempts to approach this problem synergistically reported in literature, there is no work covering the whole sample space. Our algorithm allows designing zinc finger proteins for DNA targets of the user’s choice, opening up new frontiers in the field of targeted genome editing. This algorithm is also available as an easy to use web server, ZifNN, at http://web.iitd.ac.in/~sundar/ZifNN/.
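The abstract does not define "string identity" precisely. One plausible reading, assumed here purely for illustration, is the fraction of positions at which two equal-length sequences carry the same letter, averaged over all prediction/reference pairs. The sequences in the example are made up and do not come from the paper.

```python
def string_identity(predicted: str, reference: str) -> float:
    """Fraction of aligned positions with identical characters.

    Assumes both strings have the same, non-zero length (no gaps).
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have equal length")
    if not predicted:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


def average_string_identity(pairs) -> float:
    """Mean identity over an iterable of (predicted, reference) pairs."""
    scores = [string_identity(p, r) for p, r in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical recognition-helix strings, invented for this example.
    pairs = [("RSDHLTR", "RSDELTR"), ("QSSNLVR", "QSSNLVR")]
    print(average_string_identity(pairs))
```

If the authors' metric allows gaps or unequal lengths, an alignment step would be needed before counting matches.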


11.

A kingdom-specific protein domain HMM library for improved annotation of fungal genomes

Intikhab Alam Simon J Hubbard Stephen G Oliver Magnus Rattray 《BMC genomics》2007,8(1):1-12

Background

Mimivirus isolated from A. polyphaga is the largest virus discovered so far. It is unique among all the viruses in having genes related to translation, DNA repair and replication which bear close homology to eukaryotic genes. Nevertheless, only a small fraction of the proteins (33%) encoded in this genome has been assigned a function. Furthermore, a large fraction of the unassigned protein sequences bear no sequence similarity to proteins from other genomes. These sequences are referred to as ORFans. Because of their lack of sequence similarity to other proteins, they can not be assigned putative functions using standard sequence comparison methods. As part of our genome-wide computational efforts aimed at characterizing Mimivirus ORFans, we have applied fold-recognition methods to predict the structure of these ORFans and further functions were derived based on conservation of functionally important residues in sequence-template alignments.

Results

Using fold recognition, we have identified highly confident computational 3D structural assignments for 21 Mimivirus ORFans. In addition, highly confident functional predictions for 6 of these ORFans were derived by analyzing the conservation of functional motifs between the predicted structures and proteins of known function. This analysis allowed us to classify these 6 previously unannotated ORFans into their specific protein families: carboxylesterase/thioesterase, metal-dependent deacetylase, P-loop kinases, 3-methyladenine DNA glycosylase, BTB domain and eukaryotic translation initiation factor eIF4E.

Conclusion

Using stringent fold recognition criteria we have assigned three-dimensional structures for 21 of the ORFans encoded in the Mimivirus genome. Further, based on the 3D models and an analysis of the conservation of functionally important residues and motifs, we were able to derive functional attributes for 6 of the ORFans. Our computational identification of important functional sites in these ORFans can be the basis for a subsequent experimental verification of our predictions. Further computational and experimental studies are required to elucidate the 3D structures and functions of the remaining Mimivirus ORFans.

12.

RNA mutagenesis yields highly diverse mRNA libraries for in vitro protein evolution

George Kopsidas Rachael K Carman Emma L Stutt Anna Raicevic Anthony S Roberts Mary-Anne V Siomos Nada Dobric Luisa Pontes-Braz Greg Coia 《BMC biotechnology》2007,7(1):18

Background

In protein drug development, in vitro molecular optimization or protein maturation can be used to modify protein properties. One basic approach to protein maturation is the introduction of random DNA mutations into the target gene sequence to produce a library of variants that can be screened for the preferred protein properties. Unfortunately, the capability of this approach has been restricted by deficiencies in the methods currently available for random DNA mutagenesis and library generation. Current DNA-based methodologies generally suffer from nucleotide substitution biases: they preferentially mutate particular base pairs or show significant bias with respect to transitions or transversions. In this report, we describe a novel RNA-based random mutagenesis strategy that utilizes Qβ replicase to manufacture complex mRNA libraries with a mutational spectrum that is close to the ideal.

13.

The STAR RNA binding proteins GLD-1, QKI, SAM68 and SLM-2 bind bipartite RNA motifs

André Galarneau Stéphane Richard 《BMC molecular biology》2009,10(1):47

Background

SAM68, SAM68-like mammalian protein 1 (SLM-1) and 2 (SLM-2) are members of the K homology (KH) and STAR (signal transduction activator of RNA metabolism) protein family. The function of these RNA binding proteins has been difficult to elucidate mainly because of lack of genetic data providing insights about their physiological RNA targets. In comparison, genetic studies in mice and C. elegans have provided evidence as to the physiological mRNA targets of QUAKING and GLD-1 proteins, two other members of the STAR protein family. The GLD-1 binding site is defined as a hexanucleotide sequence (NACUCA) that is found in many, but not all, physiological GLD-1 mRNA targets. Previously by using Systematic Evolution of Ligands by EXponential enrichment (SELEX), we defined the QUAKING binding site as a hexanucleotide sequence with an additional half-site (UAAY). This sequence was identified in QKI mRNA targets including the mRNAs for myelin basic proteins.
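Motifs such as NACUCA and UAAY use IUPAC ambiguity codes (N = any base, Y = C or U). As a minimal sketch, assuming a plain regular-expression scan is adequate, the code below locates every occurrence of such a motif in an RNA string. The helper names and the test sequence are hypothetical, and no spacing rule between the hexanucleotide and the half-site is encoded because the abstract does not state one.

```python
import re

# Subset of IUPAC nucleotide codes needed for the motifs quoted above.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "N": "[ACGU]",  # any base
    "Y": "[CU]",    # pyrimidine
    "R": "[AG]",    # purine
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC_RNA[symbol] for symbol in motif)
    return re.compile("(?=(" + body + "))")


def find_motif(motif: str, rna: str) -> list:
    """Return the 0-based start index of every (possibly overlapping) match."""
    return [match.start() for match in motif_to_regex(motif).finditer(rna)]


if __name__ == "__main__":
    rna = "GGUACUCAUUUAACGG"  # made-up sequence for illustration
    print("NACUCA at", find_motif("NACUCA", rna))
    print("UAAY at", find_motif("UAAY", rna))
```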

14.

Characterization of a variant vlhA gene of Mycoplasma synoviae, strain WVU 1853, with a highly divergent haemagglutinin region

Awatef Béjaoui Khiari Ibtissem Guériri Radhia Ben Mohammed Boutheina Ben Abdelmoumen Mardassi 《BMC microbiology》2010,10(1):6

Background

In Mycoplasma synoviae, type strain WVU 1853, a single member of the haemagglutinin vlhA gene family has been previously shown to be expressed. Variants of vlhA are expressed from the same unique vlhA promoter by recruiting pseudogene sequences via site-specific recombination events, thus generating antigenic variability. Using a bacterial stock of M. synoviae WVU 1853 that had been colony purified thrice and maintained in our laboratory at low passage level, we previously identified a vlhA gene-related partial coding sequence, referred to as MS2/28.1. The E. coli-expressed product of this partial coding sequence was found to be immunodominant, suggesting that it might be expressed.

15.

Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

Jonathan G Lees Robert W Janes 《BMC bioinformatics》2008,9(1):24

Background

A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have compared the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data, on datasets of proteins for which both crystal structures and spectroscopic data are available.

16.

A Systems Biology Approach to Transcription Factor Binding Site Prediction

Xiang Zhou Pavel Sumazin Presha Rajbhandari Andrea Califano 《PloS one》2010,5(3)


17.

Customised fragments libraries for protein structure prediction based on structural class annotations

Jad Abbass Jean-Christophe Nebel 《BMC bioinformatics》2015,16(1)

Background

Since experimental techniques are time and cost consuming, in silico protein structure prediction is essential to produce conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still has many issues including poor performance when targets’ lengths are above 100 residues, excessive running times and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints in their fragment selection process.

Results

Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. Using either CATH or SCOP-based structural class annotations, enhancement of structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (−0.4, p-values < 0.005). Although CATH and SCOP classifications are different, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis also shows that methods relying on class-based fragments produce conformations which are more relevant to the user and converge quicker towards the best model as estimated by GDT_TS (up to 10% on average). This substantiates our hypothesis that usage of structurally relevant templates leads not only to a reduction in the size of the conformation space to be explored, but also to a focus on a more relevant area.

Conclusions

Since our methodology produces models the quality of which is up to 7% higher on average than those generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction. Despite such progress, ab initio prediction remains a challenging task, especially for proteins of average and large sizes. Apart from improving search strategies and energy functions, integration of additional constraints seems a promising route, especially if they can be accurately predicted from sequence alone.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0576-2) contains supplementary material, which is available to authorized users.
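The two quality measures quoted in the Results of this entry, GDT_TS and RMSD, can be sketched from per-residue deviations between a model and the native structure. This is a simplified illustration, not the evaluation code used by the authors: it assumes the model has already been superimposed once on the native structure, whereas the standard GDT_TS searches for the best superposition at each cutoff. The function names and the deviations are hypothetical.

```python
from math import sqrt


def rmsd(deviations):
    """Root-mean-square deviation over per-residue distances (Å)."""
    return sqrt(sum(d * d for d in deviations) / len(deviations))


def gdt_ts_fixed_superposition(deviations, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within 1, 2, 4 and 8 Å of their native position.

    Simplification: one fixed superposition is assumed, so this value can
    only underestimate the true GDT_TS.
    """
    n = len(deviations)
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Made-up per-residue deviations in Å.
    deviations = [0.5, 1.5, 3.0, 6.0, 10.0]
    print("RMSD:", rmsd(deviations))
    print("GDT_TS (fixed superposition):", gdt_ts_fixed_superposition(deviations))
```

Because GDT_TS counts residues under distance cutoffs while RMSD averages squared deviations, a single badly placed loop affects RMSD much more strongly, which is why the two are reported together.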

18.

An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem

Alena Shmygelska Holger H Hoos 《BMC bioinformatics》2005,6(1):30

Background

The protein folding problem is a fundamental problem in computational molecular biology and biochemical physics. Various optimisation methods have been applied to formulations of the ab-initio folding problem that are based on reduced models of protein structure, including Monte Carlo methods, Evolutionary Algorithms, Tabu Search and hybrid approaches. In our work, we have introduced an ant colony optimisation (ACO) algorithm to address the non-deterministic polynomial-time hard (NP-hard) combinatorial problem of predicting a protein's conformation from its amino acid sequence under a widely studied, conceptually simple model – the 2-dimensional (2D) and 3-dimensional (3D) hydrophobic-polar (HP) model.
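For readers unfamiliar with the HP model, the objective that any optimiser (ACO included) tries to minimise can be sketched as follows. In the usual formulation, each pair of hydrophobic (H) residues that sit on neighbouring lattice sites without being consecutive in the chain contributes -1 to the energy. The function below is an illustrative scoring routine only, not the authors' ACO implementation, and its name and input format are assumptions.

```python
def hp_energy(sequence: str, coords) -> int:
    """Energy of a lattice conformation in the HP model (works in 2D and 3D).

    sequence: string over {'H', 'P'}
    coords:   list of integer tuples, one lattice site per residue
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(x - y) for x, y in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, site in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the +1 direction of each axis so every pair is counted once.
        for axis in range(len(site)):
            neighbour = list(site)
            neighbour[axis] += 1
            j = index_at.get(tuple(neighbour))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    # Four-residue toy chain: folded into a square versus fully extended.
    print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))
    print(hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)]))
```

A search algorithm would propose conformations, score them with a function like this one, and bias later proposals towards low-energy folds.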

19.

Comparison of protein structures by growing neighborhood alignments

Sourangshu Bhattacharya Chiranjib Bhattacharyya Nagasuma R Chandra 《BMC bioinformatics》2007,8(1):77

Background

The design of protein structure comparison algorithms is an important research issue with far-reaching implications. In this article, we describe a protein structure comparison scheme which is capable of detecting correct alignments even in difficult cases, e.g. non-topological similarities. The proposed method computes protein structure alignments by comparing small substructures, called neighborhoods. Two different types of neighborhoods, sequence and structure, are defined, and two algorithms arising out of the scheme are detailed. A new method for computing equivalences having non-topological similarities from pairwise similarity scores is described. A novel and fast technique for comparing sequence neighborhoods is also developed.

20.

Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

Ian Walsh Davide Baù Alberto JM Martin Catherine Mooney Alessandro Vullo Gianluca Pollastri 《BMC structural biology》2009,9(1):5-20

Background

Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is not exploiting information about potential homologues of known structure.

Results

We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, or relying on the sequence and on evolutionary information; one template-based, or in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab-initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper, we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.

Conclusion

Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the url http://distill.ucd.ie/.
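A multi-class distance map can be pictured as a distance matrix whose entries have been binned into a few ordered classes, from which a binary contact map falls out by keeping only the closest class. The sketch below assumes bin edges of 8, 13 and 19 Å for a 4-class map; only the 8 Å contact threshold is stated in the abstract, so the other two edges and the function names are illustrative assumptions.

```python
from bisect import bisect_right


def to_classes(dmap, edges=(8.0, 13.0, 19.0)):
    """Map each distance to a class index 0..len(edges).

    Class 0 holds distances below the first edge, the last class holds
    distances at or beyond the final edge. Edges other than 8 Å are assumed.
    """
    return [[bisect_right(edges, d) for d in row] for row in dmap]


def to_contact_map(class_map):
    """Binary contact map: True where the pair falls in the closest class."""
    return [[c == 0 for c in row] for row in class_map]


if __name__ == "__main__":
    # Hypothetical symmetric distance matrix (Å) for four residues.
    dmap = [
        [0.0, 3.8, 9.5, 20.1],
        [3.8, 0.0, 6.2, 14.0],
        [9.5, 6.2, 0.0, 3.8],
        [20.1, 14.0, 3.8, 0.0],
    ]
    classes = to_classes(dmap)
    for row in classes:
        print(row)
    for row in to_contact_map(classes):
        print(row)
```

Keeping several classes instead of a single threshold preserves coarse distance information for non-contacting pairs, which is the extra restraint the reconstruction step can exploit.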