Similar articles
1.
Hering JA, Innocent PR, Haris PI. Proteomics 2003, 3(8):1464-1475
Fourier transform infrared (FTIR) spectroscopy is a very flexible technique for characterization of protein secondary structure. Measurements can be carried out rapidly in a number of different environments based on only small quantities of proteins. For this technique to become more widely used for protein secondary structure characterization, however, further developments in methods to accurately quantify protein secondary structure are necessary. Here we propose a structural classification of proteins (SCOP) class-specialized neural network architecture combining an adaptive neuro-fuzzy inference system (ANFIS) with SCOP class-specialized backpropagation neural networks for improved protein secondary structure prediction. Our study shows that proteins can be accurately classified into two main classes, "all alpha proteins" and "all beta proteins", merely based on the amide I band maximum position of their FTIR spectra. ANFIS is employed to perform the classification task to demonstrate the potential of this architecture with moderately complex problems. Based on studies using a reference set of 17 proteins and an evaluation set of 4 proteins, improved predictions were achieved compared to a conventional neural network approach, where structure-specialized neural networks are trained based on protein spectra of both "all alpha" and "all beta" proteins. The standard errors of prediction (SEPs) in % structure were improved by 4.05% for helix structure, by 5.91% for sheet structure, by 2.68% for turn structure, and by 2.15% for bend structure. For other structure, an increase of SEP by 2.43% was observed. Those results were confirmed by a "leave-one-out" run with the combined set of 21 FTIR spectra of proteins.

2.
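Study on the structural evolution of serine protease superfamily molecules   Total citations: 5 (self-citations: 0; citations by others: 5)
Protein structures were compared using a rigid-body structure comparison method, and a molecular evolutionary tree was constructed from the structure comparison scores to study the evolutionary patterns of molecules in the serine protease superfamily. A preliminary analysis of the molecular evolutionary tree yielded some meaningful results. Based on protein evolution, the evolutionary position of a species can be determined fairly precisely, which is important for species classification. Studying the molecular evolution of a superfamily reveals the relatedness and the evolutionary divergence among the different proteins of that superfamily, supports molecular design in protein engineering, and has some value for protein structure prediction.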

3.
Kurgan LA, Zhang T, Zhang H, Shen S, Ruan J. Amino Acids 2008, 35(3):551-564
Structural class categorizes proteins based on the amount and arrangement of the constituent secondary structures. The knowledge of structural classes is applied in numerous important predictive tasks that address structural and functional features of proteins. We propose novel structural class assignment methods that use one-dimensional (1D) secondary structure as the input. The methods are designed based on a large set of low-identity sequences for which secondary structure is predicted from their sequence (PSSAsc model) or assigned based on their tertiary structure (SSAsc). The secondary structure is encoded using a comprehensive set of features describing count, content, and size of secondary structure segments, which are fed into a small decision tree that uses ten features to perform the assignment. The proposed models were compared against seven secondary structure-based and ten sequence-based structural class predictors. Using the 1D secondary structure, SSAsc and PSSAsc can assign proteins to the four main structural classes, while the existing secondary structure-based assignment methods can predict only three classes. Empirical evaluation shows that the proposed models are quite promising. Using the structure-based assignment performed in SCOP (structural classification of proteins) as the gold standard, the accuracy of SSAsc and PSSAsc is 76% and 75%, respectively. We show that the use of the secondary structure predicted from the sequence as an input does not have a detrimental effect on the quality of structural class assignment when compared with using secondary structure derived from tertiary structure. Therefore, PSSAsc can be used to perform the automated assignment of structural classes based on the sequences.
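The abstract above describes encoding a 1D secondary structure by the count, content, and size of its segments. As a rough illustration only, the Python sketch below computes a few such features from a three-state string. The function name, the H/E/C alphabet, and the particular features are assumptions made for this example; they are not the ten features or the decision tree used by SSAsc or PSSAsc.

```python
from itertools import groupby


def secondary_structure_features(ss: str) -> dict:
    """Summarize a 3-state secondary structure string.

    Assumed alphabet (not taken from the abstract): H = helix,
    E = strand, C = coil.
    """
    if not ss:
        raise ValueError("secondary structure string is empty")
    n = len(ss)
    # Collapse the string into (state, segment_length) runs.
    segments = [(state, len(list(run))) for state, run in groupby(ss)]
    features = {}
    for state in "HEC":
        lengths = [length for s, length in segments if s == state]
        features[f"content_{state}"] = ss.count(state) / n      # fraction of residues
        features[f"segments_{state}"] = len(lengths)            # number of segments
        features[f"max_segment_{state}"] = max(lengths, default=0)  # longest segment
    return features


example = secondary_structure_features("CCHHHHHHCCEEEECCHHHHCC")
print(example["segments_H"], example["max_segment_H"])
```

A feature dictionary of this kind could then be passed to any classifier; which features matter and how they are thresholded is exactly what the published decision tree determines, and that is not reproduced here.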

4.
Using a maximum-likelihood formalism, we have developed a method with which to reconstruct the sequences of ancestral proteins. Our approach allows the calculation of not only the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution on the amino acid level, we are better able to include effects of evolutionary pressure and take advantage of structural information about the protein through the use of mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.

5.
Classification is central to many studies of protein structure, function, and evolution. This article presents a strategy for classifying protein three-dimensional structures. Methods for and issues related to secondary structure, domain, and class assignment are discussed, in addition to methods for the comparison of protein three-dimensional structures. Strategies for assigning protein domains to particular folds and homologous superfamilies are then described in the context of the currently available classification schemes. Two examples (adenylate cyclase/DNA polymerase and glycogen phosphorylase/β-glucosyltransferase) are presented to illustrate problems associated with protein classification.

6.
Benthic marine cyanobacteria are known for their prolific biosynthetic capacities to produce structurally diverse secondary metabolites with biomedical application and their ability to form cyanobacterial harmful algal blooms. In an effort to provide taxonomic clarity to better guide future natural product drug discovery investigations and harmful algal bloom monitoring, this study investigated the taxonomy of tropical and subtropical natural product-producing marine cyanobacteria on the basis of their evolutionary relatedness. Our phylogenetic inferences of marine cyanobacterial strains responsible for over 100 bioactive secondary metabolites revealed an uneven taxonomic distribution, with a few groups being responsible for the vast majority of these molecules. Our data also suggest a high degree of novel biodiversity among natural product-producing strains that was previously overlooked by traditional morphology-based taxonomic approaches. This unrecognized biodiversity is primarily due to a lack of proper classification systems since the taxonomy of tropical and subtropical, benthic marine cyanobacteria has only recently been analyzed by phylogenetic methods. This evolutionary study provides a framework for a more robust classification system to better understand the taxonomy of tropical and subtropical marine cyanobacteria and the distribution of natural products in marine cyanobacteria.

7.
Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins and error-prone using simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for the identification of evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. The problem of different rates of structural divergence between different families is solved by first using structural similarities to produce a global map of folds in protein space and then further subdividing fold neighborhoods into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, the discrimination between analogy and homology of close structural neighbors will lead to functional predictions while avoiding overprediction.

8.
Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.

9.
10.
Folding rates of small single-domain proteins that fold through simple two-state kinetics can be estimated from details of the three-dimensional protein structure. Previously, predictions of secondary structure had been exploited to predict folding rates from sequence. Here, we estimate two-state folding rates from predictions of internal residue-residue contacts in proteins of unknown structure. Our estimate is based on the correlation between the folding rate and the number of predicted long-range contacts normalized by the square of the protein length. It is well known that long-range order derived from known structures correlates with folding rates. The surprise was that estimates based on very noisy contact predictions were almost as accurate as the estimates based on known contacts. On average, our estimates were similar to those previously published from secondary structure predictions. The combination of these methods that exploit different sources of information improved performance. It appeared that the combined method reliably distinguished fast from slow two-state folders.
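The quantity named in the abstract above, the number of long-range contacts normalized by the square of the protein length, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the sequence-separation cutoff of 12 residues is a placeholder chosen for the example, since the abstract does not state which cutoff the authors used.

```python
def normalized_long_range_contacts(contacts, length, min_separation=12):
    """Count long-range contacts and divide by the squared protein length.

    contacts: iterable of (i, j) residue index pairs, predicted or observed.
    length: number of residues in the protein.
    min_separation: assumed cutoff; pairs with |i - j| below it are ignored.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # Store each pair once, regardless of the order in which it was listed.
    long_range = {
        (min(i, j), max(i, j))
        for i, j in contacts
        if abs(i - j) >= min_separation
    }
    return len(long_range) / length ** 2


# (1, 20) and (20, 1) are the same contact; (5, 8) is too local to count.
value = normalized_long_range_contacts([(1, 20), (20, 1), (5, 8), (3, 40)], length=50)
print(value)
```

Turning this descriptor into a folding-rate estimate would additionally require the fitted correlation reported in the paper, which the abstract does not give.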

11.
Replacement of conserved amino acid residues during evolution of proteins can lead to divergence and the formation of new families with novel functions, but is often deleterious to both protein structure and function. Using the WW domain, we experimentally examined whether and to what degree second-site mutations can compensate for the reduction of function and loss of structure that accompany substitution of a strictly conserved amino acid residue. The W17F mutant of the WW domain, with substitution of the most strictly conserved Trp residue, is known to lack a specific three-dimensional structure and shows reduced binding affinity in comparison to the wild type. To obtain second-site revertants, we performed a selection experiment based on the proline-rich peptide (PY ligand) binding affinity using the W17F mutant as the initial sequence. After selection by ribosome display, we were able to select revertants that exhibited a maximum ninefold higher affinity to the PY ligand than the W17F mutant and showed an even better affinity than the wild type. In addition, we found that the functional restoration resulted in increased binding specificity in selected revertants, and the structures were more compact, with increased amounts of secondary structure, in comparison to the W17F mutant. Our results suggest that the defective structure and function of the proteins caused by mutations in highly conserved residues occurring through divergent evolution not only can be restored but can be further improved by compensatory mutations.

12.
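Comparative anatomical study of the secondary xylem of 16 species of lianas and trees in the Celastraceae   Total citations: 2 (self-citations: 0; citations by others: 2)
This paper presents comparative anatomical observations of the secondary xylem of 16 species of lianas and trees of the Celastraceae growing in different habitats, and compares their similarities and differences. Within the family, Celastrus L., Monocelastrus Wang et Tang, and Tripterygium Hook.f. have a typical liana structure, with a relatively high level of specialization of the secondary xylem; Microtropis Wall. et Meisn., Dipentodon Dunn, and Pleurostylia Wight et Arn. are trees, with a relatively low level of specialization of the secondary xylem; Euonymus L. includes both lianas and trees, and its lianas resemble the trees in structure, showing a series of primitive and conservative features. This study makes a preliminary exploration of the relationships among xylem structure, life habit, and habitat in lianas and trees, and provides some anatomical evidence for further study of the systematic evolution of the Celastraceae.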

13.
Viruses are the most abundant life form and infect practically all organisms. Consequently, these obligate parasites are a major cause of human suffering and economic loss. The Rossmann-like fold is the most populated fold among α/β-folds in the Protein Data Bank, and proteins containing a Rossmann-like fold constitute 22% of all known protein 3D structures. Thus, analysis of viral proteins containing Rossmann-like domains could provide an understanding of viral biology and evolution and could suggest possible targets for antiviral therapy. We provide functional and evolutionary analysis of viral proteins containing a Rossmann-like fold found in the evolutionary classification of protein domains (ECOD) database developed in our lab. We identified 81 protein families of bacterial, archaeal, and eukaryotic viruses in light of their evolution-based ECOD classification and Pfam taxonomy. We defined their functional significance using enzymatic EC number assignments as well as domain-level family annotations.

14.
MOTIVATION: The Bayesian network approach is a framework which combines graphical representation and probability theory, which includes, as a special case, hidden Markov models. Hidden Markov models trained on amino acid sequence or secondary structure data alone have been shown to have potential for addressing the problem of protein fold and superfamily classification. RESULTS: This paper describes a novel implementation of a Bayesian network which simultaneously learns amino acid sequence, secondary structure and residue accessibility for proteins of known three-dimensional structure. An awareness of the errors inherent in predicted secondary structure may be incorporated into the model by means of a confusion matrix. Training and validation data have been derived for a number of protein superfamilies from the Structural Classification of Proteins (SCOP) database. Cross validation results using posterior probability classification demonstrate that the Bayesian network performs better in classifying proteins of known structural superfamily than a hidden Markov model trained on amino acid sequences alone.

15.
In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, where the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon is given a firm theoretical foundation based on prediction and it is given a clear-cut interpretation. We formulate an algorithm for cumulative classification and apply it to a large database of bacteria belonging to the family Enterobacteriaceae. The resulting taxonomy makes microbiological sense.

16.
The secondary structure of an RNA molecule is of great importance and influences, e.g., the interaction of tRNA molecules with proteins or the stabilization of mRNA molecules. The classification of secondary structures by means of their order has proved useful in numerous applications. In 1978, Waterman, who gave the first precise formal framework for the topic, suggested determining the number a(n,p) of secondary structures of size n and given order p. Since then, no satisfactory result has been found. Based on an observation due to Viennot et al., we will derive generating functions for the secondary structures of order p from generating functions for binary tree structures with Horton-Strahler number p. These generating functions enable us to compute a precise asymptotic equivalent for a(n,p). Furthermore, we will determine the related number of structures when the number of unpaired bases shows up as an additional parameter. Our approach proves to be general enough to compute the average order of a secondary structure together with all the r-th moments and to enumerate substructures such as hairpins or bulges in dependence on the order of the secondary structures considered.
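The abstract above relies on the Horton-Strahler number of a binary tree. The Python sketch below computes that number recursively, as an illustration of the definition only. The nested-tuple tree encoding and the convention that a leaf has number 0 are assumptions made for this example (some texts start at 1), and the mapping from RNA secondary structures to binary trees used in the paper is not reproduced.

```python
def horton_strahler(tree):
    """Horton-Strahler number of a binary tree.

    Assumed encoding: a leaf is None, an internal node is a (left, right) tuple.
    Assumed convention: a leaf has number 0.
    """
    if tree is None:
        return 0
    left, right = tree
    a, b = horton_strahler(left), horton_strahler(right)
    # Equal subtree numbers increase the value by one; otherwise the larger wins.
    return a + 1 if a == b else max(a, b)


leaf = None
cherry = (leaf, leaf)        # one internal node with two leaves
caterpillar = (cherry, leaf) # unbalanced: the number stays at 1
balanced = (cherry, cherry)  # balanced: the number rises to 2
print(horton_strahler(caterpillar), horton_strahler(balanced))
```

The recursion makes the key property visible: only balanced branching increases the number, which is why it serves as a measure of branching complexity.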

17.
An algorithm for modeling the evolution of the regulatory signals involving the interaction with RNA secondary structure is proposed. The algorithm requires that the species phylogenetic tree is known and is based on the assumption that the considered signals have a conserved secondary structure. The input data are the extant primary structures of a signal for all leaves of the phylogenetic tree; the algorithm computes the signal primary and secondary structures at all the nodes. Concurrently, the algorithm constructs a multiple alignment of the extant (in leaves) sites of a regulatory signal taking into account its secondary structure. The results of successful testing of the algorithm for three main types of attenuation regulation in bacteria are described: classic attenuation (threonine and leucine biosyntheses in Gammaproteobacteria), T-box regulation (in Actinobacteria), and RFN-mediated regulation (in Eubacteria).

18.
Carbohydrates, or glycans, are one of the most abundant and structurally diverse biopolymers and constitute the third major class of biomolecules, following DNA and proteins. However, the study of carbohydrate sugar chains has lagged behind that of DNA and proteins, mainly due to their inherent structural complexity. Their analysis is nevertheless important because they serve various important roles in biological processes, including signal transduction and cellular recognition. In order to shed some light on glycan function based on carbohydrate structure, kernel methods have been developed in the past, in particular to extract potential glycan biomarkers by classifying glycan structures found in different tissue samples. The recently developed weighted q-gram method (LK-method) exhibits good performance on glycan structure classification while having limitations in feature selection. That is, it was unable to extract biologically meaningful features from the data. Therefore, we propose a biochemically weighted tree kernel (BioLK-method) which is based on a glycan similarity matrix and also incorporates biochemical information of individual q-grams in constructing the kernel matrix. We further applied our new method for the classification and recognition of motifs on publicly available glycan data. Our novel tree kernel (BioLK-method) using a Support Vector Machine (SVM) is capable of detecting biologically important motifs accurately while the LK-method failed to do so. It was tested on three glycan data sets from the Consortium for Functional Glycomics (CFG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) GLYCAN and showed that the results are consistent with the literature. The newly developed BioLK-method also maintains comparable classification performance with the LK-method.
Our results obtained here indicate that the incorporation of biochemical information of q-grams further shows the flexibility and capability of the novel kernel in feature extraction, which may aid in the prediction of glycan biomarkers.

19.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures were measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure similarity is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.

20.

Background  

Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification.
