Similar Literature
1.
Phylogenetic studies of ciliates are mainly based on the primary structure of nuclear genes. Some regions of the small subunit ribosomal RNA (SSU‐rRNA) gene have distinctive secondary structures, which have demonstrated value as phylogenetic/taxonomic characters. In the current work, we predict the secondary structures of four variable regions (V2, V4, V7 and V9) in the SSU‐rRNA gene of 45 urostylids. Structure comparisons indicate that the V4 region is the most effective in revealing interspecific relationships, while the V9 region appears suitable at the family level or higher. The V2 region also offers some taxonomic information, but is too conserved to reflect phylogenetic relationships at or below the family level, at least for urostylids. The V7 region is the least informative. We constructed several phylogenetic trees, based both on the primary sequence alignment and on an improved alignment guided by the secondary structures. The results suggest that including secondary structure information in phylogenetic analyses provides additional insight into phylogenetic relationships. Using urostylid ciliates as an example, we show that secondary structure information leads to a better understanding of their relationships, for example the generic relationships within the family Pseudokeronopsidae.
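As an illustration only (not the authors' pipeline), predicted secondary structures of regions such as V4 are often written in dot-bracket notation. Assuming that format, this minimal Python sketch extracts base pairs with a stack and scores two aligned structures by the Jaccard similarity of their pair sets, one simple way to compare structures across taxa:

```python
def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return the (i, j) base pairs (0-based, i < j) encoded in a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)                 # remember the opening partner
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))     # close the most recent open base
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def shared_pair_fraction(a: str, b: str) -> float:
    """Jaccard similarity of the base-pair sets of two equal-length structures."""
    if len(a) != len(b):
        raise ValueError("structures must be aligned to the same length")
    pairs_a, pairs_b = parse_dot_bracket(a), parse_dot_bracket(b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0
```

Structures taken from a gapped alignment would need their gap columns handled before comparison; that step is omitted here.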

2.
Protein eight-state secondary structure prediction is challenging, but it is necessary for determining protein structure and function. Here, we report the development of a novel approach, SPSSM8, to predict eight-state secondary structures of proteins accurately from sequence, based on the structural position-specific scoring matrix (SPSSM). The SPSSM has previously been used successfully to predict three-state secondary structures. We now employ an eight-state SPSSM as a feature, obtained from sequence-structure alignment against a large database of 9 million sequences with putative structural information. SPSSM8 uses a low-sequence-identity dataset (9062 entries) as its training set and a conditional random field as the classification algorithm. SPSSM8 achieved an average eight-state secondary structure accuracy (Q8) of 71.7% (Q3, 81.6%) on an independent test set (463 entries), an improvement of 10.1% and 4.6% over SSPro8 and CNF, respectively. On the CASP 9 dataset (92 entries), SPSSM8 achieved a Q8 accuracy of 80.1% (Q3, 83.0%). These results confirm SPSSM8 as an outstanding predictor of eight-state protein secondary structure. SPSSM8 is freely available at http://cal.tongji.edu.cn/SPSSM8.
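Q8 and Q3 are per-residue accuracies over eight and three states. The sketch below assumes one common reduction of the eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to coil); the abstract does not say which reduction SPSSM8 used, and whether accuracies are averaged per chain or per residue should be checked against the paper:

```python
# Assumed 8-to-3 reduction (one common DSSP convention; other schemes exist):
# helices H, G, I -> H; strands E, B -> E; T, S and coil (C or '-') -> C.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
                  "T": "C", "S": "C", "C": "C", "-": "C"}


def q_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def to_three_state(states: str) -> str:
    """Collapse an eight-state string to three states with the mapping above."""
    return "".join(EIGHT_TO_THREE[s] for s in states)


def q8_and_q3(predicted: str, observed: str) -> tuple[float, float]:
    """Q8 on the raw strings and Q3 after reduction, for a single chain."""
    q8 = q_accuracy(predicted, observed)
    q3 = q_accuracy(to_three_state(predicted), to_three_state(observed))
    return q8, q3
```

For example, predicting `HHGEEBTS` against an observed `HHHEEECC` gives Q8 = 50.0 but Q3 = 100.0, because every disagreement falls within a single three-state class; this is why Q3 is always at least Q8.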

3.
MOTIVATION: The success of the consensus approach to the protein structure prediction problem has led to the development of several different consensus methods. Most of them rely only on a structural comparison of a number of different models. However, other types of information might also be useful, such as the score from the server and structural evaluation. RESULTS: Pcons5 is a new and improved version of the consensus predictor Pcons. Pcons5 integrates information from three different sources: the consensus analysis, structural evaluation and the score from the fold recognition servers. We show that Pcons5 is better than the previous version of Pcons and that it performs better than the consensus analysis alone. In addition, we present a version of Pmodeller based on Pcons5, which performs significantly better than Pcons5. AVAILABILITY: Pcons5 is the first Pcons version available as a standalone program, from http://www.sbc.su.se/~bjorn/Pcons5. It should be easy to implement in local meta-servers.
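Consensus analysis typically scores each model by how similar it is to the other models in the pool. A minimal sketch, assuming a precomputed pairwise similarity matrix in [0, 1] (for example from a structural comparison program) and server and evaluation scores already scaled to [0, 1]; the linear combination and its weights are hypothetical and are not Pcons5's fitted model:

```python
def consensus_scores(similarity: list[list[float]]) -> list[float]:
    """Average similarity of each model to every other model (diagonal ignored)."""
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models for a consensus")
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def combined_scores(similarity: list[list[float]],
                    server_scores: list[float],
                    evaluation_scores: list[float],
                    weights: tuple[float, float, float] = (0.6, 0.2, 0.2)) -> list[float]:
    """Hypothetical weighted sum of consensus, server and evaluation scores.

    All inputs are assumed to be pre-scaled to [0, 1]; the default weights
    are illustrative only.
    """
    w_cons, w_server, w_eval = weights
    consensus = consensus_scores(similarity)
    return [w_cons * c + w_server * s + w_eval * e
            for c, s, e in zip(consensus, server_scores, evaluation_scores)]
```

For example, `combined_scores([[1, 0.8, 0.6], [0.8, 1, 0.4], [0.6, 0.4, 1]], [0.5, 0.9, 0.1], [0.5, 0.5, 0.5])` gives roughly 0.62, 0.64 and 0.42, so the server score lifts model 1 above model 0 even though model 0 has the highest consensus score.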

4.
5.
Protein structure alignment using a genetic algorithm (cited 3 times: 0 self-citations, 3 by others)
Szustakowski JD, Weng Z. Proteins, 2000, 38(4): 428-440
We have developed a novel, fully automatic method for aligning the three-dimensional structures of two proteins. The basic approach is to first align the proteins' secondary structure elements and then extend the alignment to include any equivalent residues found in loops or turns. The initial secondary structure element alignment is determined by a genetic algorithm. After refinement of the secondary structure element alignment, the protein backbones are superposed and a search is performed to identify any additional equivalent residues in a convergent process. Alignments are evaluated using intramolecular distance matrices. Alignments can be performed with or without sequential connectivity constraints. We have applied the method to proteins from several well-studied families: globins, immunoglobulins, serine proteases, dihydrofolate reductases, and DNA methyltransferases. Agreement with manually curated alignments is excellent. A web-based server and additional supporting information are available at http://engpub1.bu.edu/-josephs.
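The abstract states that alignments are evaluated using intramolecular distance matrices. One superposition-free way to do this, shown as a sketch rather than the authors' exact scoring function, is the root-mean-square difference between the internal Cα distances of residue pairs that the alignment declares equivalent. The coordinate input and the alignment format (a list of index pairs) are assumptions:

```python
import math

Coord = tuple[float, float, float]


def distance_matrix(coords: list[Coord]) -> list[list[float]]:
    """All pairwise Euclidean distances within one structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_matrix_rmsd(coords_a: list[Coord], coords_b: list[Coord],
                         alignment: list[tuple[int, int]]) -> float:
    """RMS difference between intramolecular distances of aligned residues.

    `alignment` holds (i, j) pairs meaning residue i of A is equivalent to
    residue j of B. Only internal distances are compared, so the relative
    placement of the two structures does not matter.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two aligned pairs")
    dist_a, dist_b = distance_matrix(coords_a), distance_matrix(coords_b)
    sq_sum, count = 0.0, 0
    for k, (i1, j1) in enumerate(alignment):
        for i2, j2 in alignment[k + 1:]:          # each unordered pair once
            sq_sum += (dist_a[i1][i2] - dist_b[j1][j2]) ** 2
            count += 1
    return math.sqrt(sq_sum / count)
```

Because only internal distances enter the score, translating or rotating either protein leaves it unchanged; a genetic algorithm could use such a score (negated) as part of its fitness.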

6.
MOTIVATION: Modern comparative genomics is not restricted to sequences but also involves the comparison of metabolic pathways and protein-protein interactions. Central to this approach is the concept of neighbourhood between entities (genes, proteins, chemical compounds). There is therefore a growing need for new methods that merge connectivity information from different biological sources in order to infer functional coupling. RESULTS: We present a generic approach to merging the information from two or more graphs representing biological data. The method is based on two concepts. The first, the correspondence multigraph, precisely defines how correspondence is established between the primary data graphs. The second, the common connected components, defines which property of the multigraph is sought. Although this problem has been stated informally over the past few years, we give here a formal and general statement together with an exact algorithm to solve it. AVAILABILITY: The algorithm presented in this paper has been implemented in C. Source code is freely available for download at: http://www.inrialpes.fr/helix/people/viari/cccpart.
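Once the correspondence multigraph has been reduced to two or more graphs over a shared vertex set (an assumption made here for simplicity), the common connected components are the maximal vertex sets that induce a connected subgraph in every graph. This sketch computes them exactly by repeatedly splitting vertex sets along the connected components of each graph until no graph splits them further; it is a straightforward illustration, not necessarily the algorithm implemented in the authors' C code:

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency dict built from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(vertices: frozenset, adj: dict) -> list[frozenset]:
    """Connected components of the subgraph induced by `vertices` (BFS)."""
    seen, result = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adj.get(v, ()):
                if w in vertices and w not in seen:   # stay inside the block
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        result.append(frozenset(comp))
    return result


def common_connected_components(vertices, *graphs) -> list[frozenset]:
    """Maximal vertex sets that are connected in every graph."""
    pending, done = [frozenset(vertices)], []
    while pending:
        block = pending.pop()
        for adj in graphs:
            parts = components(block, adj)
            if len(parts) > 1:          # some graph splits the block: refine it
                pending.extend(parts)
                break
        else:                           # no graph splits it: it is final
            done.append(block)
    return sorted(done, key=lambda s: sorted(s))
```

For instance, `common_connected_components({1, 2, 3, 4, 5}, adjacency([(1, 2), (2, 3), (4, 5)]), adjacency([(1, 2), (3, 4), (4, 5)]))` returns the blocks {1, 2}, {3} and {4, 5}: vertex 3 joins {1, 2} only in the first graph and joins {4, 5} only in the second, so it ends up alone. Any set connected in all graphs is never split by this refinement, which is why the final blocks are maximal.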

7.
8.
Nawrocki, A. M., Schuchert, P. & Cartwright, P. (2009). Phylogenetics and evolution of Capitata (Cnidaria: Hydrozoa), and the systematics of Corynidae. Zoologica Scripta, 39, 290–304. Generic- and family-level classifications in Hydrozoa have historically been problematic owing to the limited morphological characters available for phylogenetic analyses, and thus for taxonomy, as well as disagreement over the relative importance of polyp vs. medusa characters. Within the recently redefined suborder Capitata (Cnidaria: Hydrozoa: Hydroidolina), which includes 15 families and almost 200 valid species, family-level relationships based on morphology alone have proven elusive, and there are numerous conflicting proposals for the relationships of component species. Relationships within the speciose capitate family Corynidae also remain uncertain, for similar reasons. Here, we combine mitochondrial 16S and nuclear 18S and 28S sequences from capitate hydrozoans representing 12 of the 15 valid capitate families to examine family-level relationships within Capitata. We further sample densely within Corynidae to investigate the validity of several generic-level classification schemes that rely heavily on the presence/absence of a medusa, a character whose utility in generic-level classification has been questioned. We recover largely congruent tree topologies from all three markers, with 28S and the combined dataset providing the most resolution. Our study confirms the monophyly of the redefined Capitata and resolves family-level relationships for most sampled families within the suborder. These analyses reveal Corynidae as paraphyletic and suggest that the limits of the family have been underestimated. Our results contradict all available generic-level classification schemes for Corynidae.
As classification schemes for this family have been largely based on reproductive characters such as the presence/absence of a medusa, our results suggest that these are not valid generic‐level characters for the clade. We suggest a new taxonomic structure for the lineage that includes all members of the newly redefined Corynidae, based on molecular and morphological synapomorphies for recovered clades within the group.

9.
Cardiolipins (CL) are unique phospholipids of bacteria and eukaryotic mitochondria, with four acyl chains and two phosphate groups, that have been implicated in numerous functions from energy metabolism to apoptosis. Many proteins are known to interact with CL, and several cocrystal structures of protein-CL complexes exist. In this work, we describe the collection of the first systematic and, to the best of our knowledge, comprehensive gold-standard data set of all known CL-binding proteins. The data set contains 62 proteins, 21 of which have nonredundant crystal structures with bound CL molecules available. Using binding-patch analysis of amino acid frequencies, secondary structures and loop supersecondary structures, considering phosphate and acyl chain binding regions both together and separately, we gained a detailed understanding of the general structural and dynamic features involved in CL binding to proteins. Exhaustive docking of CL to all known structures of proteins experimentally shown to interact with CL demonstrated the validity of the docking approach and provides a rich source of information for experimentalists who may wish to validate predictions.

10.
MOTIVATION: Because of the importance of considering secondary structure when aligning functional RNAs, several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems remain for the multiple alignment step: first, how to combine pairwise sequence-structure alignments into a multiple alignment, and second, how to generate secondary structure information for sequences whose explicit structural information is missing. RESULTS: We describe a novel approach for multiple alignment of RNAs (MARNA) that takes into consideration both the primary and the secondary structures. It is based on pairwise sequence-structure comparisons of RNAs. From these sequence-structure alignments, libraries of weighted alignment edges are generated. The weights reflect sequential and structural conservation. For sequences whose secondary structures are missing, the libraries are generated by sampling low-energy conformations. The libraries are then processed by the T-Coffee system, a consistency-based multiple alignment method. Furthermore, we are able to extract a consensus sequence and structure from a multiple alignment. We have successfully tested MARNA on several datasets taken from the Rfam database.
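MARNA hands its libraries of weighted alignment edges to T-Coffee, whose consistency step extends each pairwise weight with support from third sequences. A minimal sketch of that triplet extension, assuming residues are identified as (sequence name, position) and each unordered residue pair appears at most once in the primary library; in MARNA the weights would come from the pairwise sequence-structure alignments rather than being typed in by hand:

```python
from collections import defaultdict

Residue = tuple[str, int]  # (sequence name, 0-based position)
Library = dict[tuple[Residue, Residue], float]


def extend_library(library: Library) -> Library:
    """T-Coffee-style triplet extension of a primary alignment library.

    The extended weight of a pair (x, y) from different sequences is its
    direct weight plus, for every residue z of a third sequence aligned to
    both, min(w(x, z), w(z, y)). Keys are returned with x < y.
    """
    # Symmetric neighbour map: residue -> {aligned residue: weight}
    neighbours: dict[Residue, dict[Residue, float]] = defaultdict(dict)
    for (x, y), w in library.items():
        neighbours[x][y] = neighbours[y][x] = w

    extended: Library = {}
    for x in list(neighbours):
        for z, w_xz in neighbours[x].items():
            for y, w_zy in neighbours[z].items():
                # skip residues from x's own sequence; count each pair once
                if y[0] == x[0] or not x < y:
                    continue
                extended[(x, y)] = extended.get((x, y), 0.0) + min(w_xz, w_zy)

    # Add the direct (primary) weights.
    for (x, y), w in library.items():
        key = (x, y) if x < y else (y, x)
        extended[key] = extended.get(key, 0.0) + w
    return extended
```

In a toy library where A0-B0 has weight 0.9, B0-C0 has 0.8 and A0-C0 has 0.5, the direct A0-C0 weight gains min(0.9, 0.8) = 0.8 of support through sequence B, reaching 1.3, so pairs that agree with the rest of the library are rewarded before the progressive alignment step.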

11.
MOTIVATION: The prediction of beta-turns is an important element of protein secondary structure prediction. Recently, a highly accurate neural-network-based method, Betatpred2, has been developed for predicting beta-turns in proteins using position-specific scoring matrices (PSSM) generated by PSI-BLAST and secondary structure information predicted by PSIPRED. However, the major limitation of Betatpred2 is that it predicts only beta-turn and non-beta-turn residues and does not provide any information about the different beta-turn types. There is thus a need to predict beta-turn types using an approach based on multiple sequence alignment, which will be useful in overall tertiary structure prediction. RESULTS: In the present work, a method has been developed for the prediction of beta-turn types I, II, IV and VIII. For each turn type, two consecutive feed-forward back-propagation networks with a single hidden layer have been used, where the first, sequence-to-structure network has been trained on single sequences as well as on PSI-BLAST PSSMs. The output of the first network, along with PSIPRED-predicted secondary structure, has been used as input for the second-level, structure-to-structure network. The networks have been trained and tested on a non-homologous dataset of 426 protein chains by 7-fold cross-validation. The prediction performance for each turn type is improved significantly by using multiple sequence alignment, and is improved further by using the second-level structure-to-structure network and PSIPRED-predicted secondary structure information. Type I and II beta-turns are predicted better than Type IV and VIII beta-turns. The final network yields overall accuracies of 74.5, 93.5, 67.9 and 96.5%, with MCC values of 0.29, 0.29, 0.23 and 0.02, for Type I, II, IV and VIII beta-turns, respectively, and is better than random prediction.
AVAILABILITY: A web server for prediction of beta-turn types I, II, IV and VIII based on the above approach is available at http://www.imtech.res.in/raghava/betaturns/ and http://bioinformatics.uams.edu/mirror/betaturns/ (mirror site).
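The gap between accuracy and MCC for Type VIII turns (96.5% accuracy, MCC 0.02) is typical of a rare class: accuracy rewards correctly rejecting the many non-turn residues, while the Matthews correlation coefficient (MCC) does not. The counts used with this sketch are illustrative and are not taken from the paper:

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a marginal total is zero (MCC is then undefined).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Percentage of residues classified correctly."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)
```

A predictor that never predicts the rare class scores 97% accuracy on 1000 residues with 30 positives, `accuracy(0, 0, 970, 30)`, yet its MCC is 0, whereas a perfect predictor such as `mcc(10, 0, 10, 0)` reaches 1.0.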

12.
Nucleic acids are particularly amenable to structural characterization using chemical and enzymatic probes. Each individual structure mapping experiment reveals specific information about the structure and/or dynamics of the nucleic acid. Currently, there is no simple approach for making these data publicly available in a standardized format. We therefore developed a standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments (SNRNASMs). We propose a schema for sharing nucleic acid chemical probing data that uses generic public servers for storing, retrieving, and searching the data. We have also developed a consistent nomenclature (ontology) within the Ontology of Biomedical Investigations (OBI), which provides unique identifiers (termed persistent URLs, or PURLs) for classifying the data. Links to standardized data sets shared using our proposed format, along with a tutorial and links to templates, can be found at http://snrnasm.bio.unc.edu.

13.
Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Accurate identification of these disordered regions therefore has significant implications for proper functional annotation, induced fold prediction and drug design against critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with an RBF kernel and novel features for reliable characterization of protein structure. DisPredict performs well: in addition to 10-fold cross-validation, it was trained and tested with independent test datasets, and the results were consistent, with both training and test errors minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions, as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0.
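DisPredict's classifier is a support vector machine with a radial basis function (RBF) kernel. The sketch below shows only how such a trained model scores a feature vector; the support vectors, coefficients, bias, gamma and the sign convention (positive meaning disordered) are all hypothetical, and DisPredict's actual features are not reproduced:

```python
import math
from collections.abc import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_decision(x: Sequence[float],
                 support_vectors: Sequence[Sequence[float]],
                 dual_coefs: Sequence[float],
                 bias: float,
                 gamma: float) -> float:
    """Decision value sum_i (alpha_i * y_i) * K(sv_i, x) + b of a trained SVM.

    `dual_coefs` holds the products alpha_i * y_i. Under the labelling
    assumed here, a positive value would be read as 'disordered'.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias
```

In practice these parameters would come from training, for example with an SVM library, and gamma would be tuned by cross-validation, as in the 10-fold scheme the abstract mentions.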

14.
Independent of the platform and the analysis methods used, the result of a microarray experiment is, in most cases, a list of differentially expressed genes. An automatic ontological analysis approach has recently been proposed to help with the biological interpretation of such results. Currently, this approach is the de facto standard for the secondary analysis of high-throughput experiments, and a large number of tools have been developed for this purpose. We present a detailed comparison of 14 such tools using the following criteria: scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, installation issues and sources of annotation data. This detailed analysis of the capabilities of these tools will help researchers choose the most appropriate tool for a given type of analysis. More importantly, although this type of analysis has been widely adopted, the approach has several important intrinsic drawbacks. These drawbacks are associated with all the tools discussed and represent conceptual limitations of the current state of the art in ontological analysis. We propose these as challenges for the next generation of secondary data analysis tools.
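Many ontological analysis tools test each term for over-representation in the gene list with a hypergeometric (one-sided Fisher's exact) test and then correct for multiple comparisons. As a sketch of those two statistical ingredients (not the method of any particular tool among the 14 compared), assuming gene counts are already available:

```python
from math import comb


def hypergeom_upper_tail(k: int, n_drawn: int, n_annotated: int, n_total: int) -> float:
    """P(X >= k): probability of at least k term-annotated genes in a list of
    n_drawn genes drawn without replacement from n_total genes, of which
    n_annotated carry the term."""
    upper = min(n_drawn, n_annotated)
    hits = sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_drawn)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from the largest p to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min          # enforce monotonicity
    return adjusted
```

For example, `hypergeom_upper_tail(2, 2, 2, 4)` is the probability of drawing both annotated genes when picking 2 of 4, namely 1/6, and `benjamini_hochberg([0.01, 0.04, 0.03, 0.20])` returns about [0.04, 0.053, 0.053, 0.20].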

15.
Multiple sequence alignment of eight proteases from the Flaviviridae family was performed using ClustalW to reveal conserved domains. Two prediction approaches were applied and the results compared. The first approach performed secondary structure prediction using available structure prediction servers. The second approach used secondary structure information extracted from structure prediction servers, threading techniques and the DSSP database entries of some of the templates used in threading. For each approach, a consensus one-dimensional secondary structure of the Den2 protease was obtained and evaluated against the recently crystallised Den2 NS2B/NS3 structure from the Protein Data Bank (PDB). The results indicated that the second approach was more accurate than the use of prediction servers alone. It is therefore plausible that this approach is applicable to the initial stage of structural studies of proteins with low amino acid sequence homology to other proteins available in the PDB.
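A consensus one-dimensional secondary structure can be formed from several aligned predictions by a per-position vote. A minimal sketch, assuming three-state strings (H, E, C) of equal length; the tie-breaking order is an illustrative assumption, since the abstract does not say how disagreements were resolved:

```python
from collections import Counter


def consensus_secondary_structure(predictions: list[str]) -> str:
    """Per-position majority vote over aligned three-state strings.

    Ties are broken by the fixed preference H > E > C (an assumption);
    unknown symbols rank below all three.
    """
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more predictions of equal length")
    tie_order = {"H": 0, "E": 1, "C": 2}
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        # Highest count wins; on a tie, the lower tie_order value wins.
        state, _ = max(counts.items(),
                       key=lambda kv: (kv[1], -tie_order.get(kv[0], 99)))
        consensus.append(state)
    return "".join(consensus)
```

For example, `consensus_secondary_structure(["HHEC", "HEEC", "CHEE"])` returns `"HHEC"`; scoring the consensus against the DSSP assignment of the crystal structure would then use a per-residue accuracy.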

16.
Ribosomes are the only cell organelles occurring in all organisms. E. coli ribosomes, which are the best characterized particles, consist of three RNAs and 53 proteins. All components have been isolated and characterized by chemical, physical and immunological methods. The primary structures of the RNAs and of all the proteins are known. Information about the secondary structure of the proteins derives from circular dichroism measurements and from secondary structure prediction methods. The tertiary structure is being studied by limited proteolysis, proton magnetic resonance and crystallization followed by X-ray analysis. Various methods are being used to elucidate the architecture of the ribosomal particle: three-dimensional image reconstruction of crystals of bacterial ribosomes and/or their subunits; immune electron microscopy; neutron scattering; protein-protein, protein-RNA and RNA-RNA crosslinking; total reconstitution of ribosomal subunits. The results from these studies yield valuable information on the architecture of the ribosomal particle. Many mutants have been isolated in which one or a few ribosomal proteins are altered or even deleted. The genetic and biochemical characterization of these mutants allows conclusions about the importance of these proteins for the function of the ribosome. Ribosomal proteins from various prokaryotic and eukaryotic species have been compared by two-dimensional gel electrophoresis, immunological methods, reconstitution and amino acid sequence analysis. These studies show a strong homology among prokaryotic ribosomal proteins but only a weak homology between proteins from prokaryotic and eukaryotic ribosomes. Comparison of the primary and secondary structures of the ribosomal RNAs from various organisms shows that the secondary structure of the RNA molecules has been strongly conserved throughout evolution.

17.
We develop a new approach to the design of neural networks that uses a collaborative framework of knowledge-driven experience. In contrast to the "standard" way of developing neural networks, which exploits experimental data explicitly, this approach incorporates a mechanism of knowledge-driven experience. The essence of the proposed learning scheme is to take advantage of the parameters (connections) of neural networks built in the past for the same phenomenon (which might also exhibit some variability over time or space) for which we are interested in constructing a network on the basis of currently available data. We establish a conceptual and algorithmic framework to reconcile these two essential sources of information (data and knowledge) in the development of the network. To keep the presentation focused and to provide a detailed quantification of the resulting architecture, we concentrate on the experience-based design of radial basis function neural networks (RBFNNs). We introduce several performance indexes to quantify the effect of using the knowledge residing in the connections of the networks and establish an optimal level of its use. Experimental results are presented for low-dimensional synthetic data and selected datasets available at the Machine Learning Repository.
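One simple way to let a previously built RBF network inform a new one, offered only as an assumption-laden sketch and not as the authors' scheme, is to fit the output weights by least squares while penalizing their distance from the old network's weights. With design matrix Φ, targets y, prior weights w₀ and trade-off λ ≥ 0, the minimizer of ‖y − Φw‖² + λ‖w − w₀‖² solves (ΦᵀΦ + λI)w = Φᵀy + λw₀, so λ acts as the level of knowledge use. The centers and width are assumed fixed, and NumPy is assumed to be available:

```python
import numpy as np


def rbf_design_matrix(x: np.ndarray, centers: np.ndarray, width: float) -> np.ndarray:
    """Phi[i, j] = exp(-||x_i - c_j||^2 / (2 * width^2)); x is (n, d), centers (m, d)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def fit_output_weights(phi: np.ndarray, y: np.ndarray,
                       prior_weights: np.ndarray, lam: float) -> np.ndarray:
    """Minimise ||y - phi @ w||^2 + lam * ||w - prior_weights||^2.

    lam = 0 uses only the current data (ordinary least squares);
    a very large lam keeps the weights of the previously built network.
    """
    n_basis = phi.shape[1]
    lhs = phi.T @ phi + lam * np.eye(n_basis)
    rhs = phi.T @ y + lam * prior_weights
    return np.linalg.solve(lhs, rhs)
```

Sweeping λ and scoring each fit on held-out data is one way to pick a level of knowledge use; the performance indexes proposed in the paper may differ.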

18.
The genetic algorithm exploits the principles of natural evolution: solution trials are evolved by mutation, recombination and selection until they reach near-optimal solutions [1]. Following a general overview of its application potential for protein structure analysis [3], our own approach [2] has now been developed into a tool that delineates the three-dimensional main-chain topology of small proteins [4], whether they are largely helical, mixed or β-strand rich [5]. Results on several protein examples for these different modelling tasks are presented and compared with the experimentally observed structures (RMSDs are around 4.5–5.5 Å). To start a modelling trial, only the protein sequence and knowledge of its secondary structure are required. The fittest folds obtained at the end of the simulated evolution yield the three-dimensional models of the fold. Current limitations are protein size (generally fewer than 100 amino acids), the number of secondary structure elements [7-8] and irregular topologies (e.g. ferredoxins). Preliminary results from current simulations are also illustrated. We now want to use simple experimental or other information, which is available long before the three-dimensional structure of the protein becomes known, to refine the modelling of the protein fold and to tackle more difficult modelling examples with our tool. Supplementary material to this paper is available in electronic form at http://dx.doi.org/10.1007/s0089460020304
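The mutation, recombination and selection loop described above can be sketched on a toy problem. Here individuals are bit strings and fitness is supplied by the caller; in the modelling tool described, individuals would instead encode candidate fold topologies scored against secondary-structure constraints, which this sketch does not attempt. Population size, mutation rate, tournament selection and elitism are illustrative choices:

```python
import random


def genetic_algorithm(fitness, length, pop_size=30, generations=60,
                      mutation_rate=0.02, seed=0):
    """Evolve bit strings by selection, recombination and mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(population, key=fitness)
        offspring = [best[:]]                      # elitism: keep the current best
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, length)         # one-point recombination
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)
```

Calling `genetic_algorithm(sum, 20)` evolves 20-bit strings towards all ones (the classic OneMax toy problem); the fixed seed makes runs reproducible, and elitism ensures the best fitness never decreases between generations.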

19.

Background

Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor.

Results

Over the past several years, we have constructed a standalone protein structure prediction system, MULTICOM, that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process: template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and performed very well. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction.

Conclusions

Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.

20.
MOTIVATION: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparing high-quality alignments for distantly related sequences remains a difficult task. RESULTS: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. The PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with the improvement most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent. AVAILABILITY: The PROMALS web server is available at: http://prodata.swmed.edu/promals/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
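PROMALS scores alignments with a hidden Markov model that combines amino acids and secondary structures. A much simpler illustration of the same idea, and explicitly not the PROMALS algorithm, is Needleman-Wunsch global alignment in which each match score adds an amino acid term and a weighted secondary-structure term. The toy identity scores, gap penalty and weight are assumptions; a real implementation would use a substitution matrix and profiles:

```python
def combined_score(a_res: str, a_ss: str, b_res: str, b_ss: str,
                   ss_weight: float) -> float:
    """Toy match score: residue identity term plus weighted SS agreement term."""
    aa = 2.0 if a_res == b_res else -1.0      # stand-in for a substitution matrix
    ss = 1.0 if a_ss == b_ss else -1.0        # reward matching predicted states
    return aa + ss_weight * ss


def align(seq_a: str, ss_a: str, seq_b: str, ss_b: str,
          gap: float = -2.0, ss_weight: float = 0.5):
    """Needleman-Wunsch global alignment with a combined residue + SS score."""
    n, m = len(seq_a), len(seq_b)

    def match(i: int, j: int) -> float:
        return combined_score(seq_a[i - 1], ss_a[i - 1],
                              seq_b[j - 1], ss_b[j - 1], ss_weight)

    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + match(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner, preferring diagonal moves.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + match(i, j):
            top.append(seq_a[i - 1]); bottom.append(seq_b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(seq_a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(seq_b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]
```

Aligning `ACDE` (states `HHEE`) with `ACE` (states `HHE`) gives `ACDE` over `AC-E` with score 5.5, placing the gap opposite D.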


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417