Similar literature
1.
We describe a general, modular method for developing protocols to identify the amino acid residues that most likely define the division of a protein superfamily into two subsets. As one possibility, we use PROBE to gather superfamily members and perform an ungapped alignment. We then use a modified BLOSUM62 substitution matrix to determine the discriminating power of each column of aligned residues. The overall method is particularly useful for predicting amino acids responsible for substrate or binding specificity when no structures are available. We apply our method to three pairs of protein classes in three different superfamilies, and present our results, some of which have been experimentally verified. This approach may accelerate the elucidation of enzymic substrate specificity, which is critical for both mechanistic insights into biocatalysis and ultimate application.

2.
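Clustal W: software for protein and nucleic acid sequence analysis   Total citations: 2 (self-citations: 1, citations by others: 2)
Protein and nucleic acid sequence analysis plays an important role in modern biology and bioinformatics, and new algorithms and software appear constantly. This paper introduces ClustalW, a completely free multiple sequence alignment program that runs on a PC. It can align multiple protein or nucleic acid sequences, analyze the similarity relationships among sequences, and draw phylogenetic trees. Thanks to its flexible input and output formats, convenient parameter settings and options, detailed online help, and good portability, ClustalW is widely used in protein and nucleic acid sequence analysis.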

3.
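A new nucleic acid sequence alignment algorithm and its application to global sequence alignment   Total citations: 1 (self-citations: 0, citations by others: 1)
The dynamic programming algorithm widely used for sequence alignment today yields an optimal alignment, but its high computational complexity, O(N^2), greatly reduces computational efficiency. A multi-stage dynamic programming decision algorithm was applied to pairwise sequence alignment and implemented in Visual BASIC. The new algorithm reduced the computational complexity to O(N) while still achieving fairly satisfactory accuracy, and it is expected to play an important role in global sequence alignment.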
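
For orientation, the quadratic-time dynamic programming baseline that this abstract contrasts itself with can be sketched as follows. This is a minimal Needleman-Wunsch style scorer, not the authors' multi-stage O(N) algorithm, which the abstract does not describe in enough detail to reproduce. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    Time is O(len(a) * len(b)); only two rows are kept in memory.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Every cell of the N by N table is visited once, which is where the O(N^2) cost mentioned in the abstract comes from.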

4.
A novel method has been developed for acquiring the correct alignment of a query sequence against remotely homologous proteins by extracting structural information from profiles of multiple structure alignment. A systematic search algorithm combined with a group of score functions based on sequence information and structural information has been introduced in this procedure. A limited number of top solutions (15,000) with high scores were selected as candidates for further examination. On a test-set comprising 301 proteins from 75 protein families with sequence identity less than 30%, the proportion of proteins with completely correct alignment as first candidate was improved to 39.8% by our method, whereas the typical performance of existing sequence-based alignment methods was only between 16.1% and 22.7%. Furthermore, multiple candidates for possible alignment were provided in our approach, which dramatically increased the possibility of finding correct alignment, such that completely correct alignments were found amongst the top-ranked 1000 candidates in 88.3% of the proteins. With the assistance of a sequence database, completely correct alignment solutions were achieved amongst the top 1000 candidates in 94.3% of the proteins. From such a limited number of candidates, it would become possible to identify more correct alignment using a more time-consuming but more powerful method with more detailed structural information, such as side-chain packing and energy minimization, etc. The results indicate that the novel alignment strategy could be helpful for extending the application of highly reliable methods for fold identification and homology modeling to a huge number of homologous proteins of low sequence similarity. Details of the methods, together with the results and implications for future development are presented.

5.
We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is aimed at, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.

6.
7.
Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based entirely on structural alignments alone have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary and tertiary structure information into hybrid profiles. Sequence-based profiles are profiles whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles that use combined sequence-and-structure-based profiles, where sequence-based profiles are used in loop/motif regions and structural information is used in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.

8.
Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs is made under the statistical independence assumption of probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thus further extending the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in protein structures make this fuzzy-architecture-based model a suitable candidate for building profiles of a given family, since fuzzy sets can handle uncertainties better than classical methods.

9.
Premise of the study: The redundancies in expressed sequence tags (ESTs) in the National Center for Biotechnology Information sequence database were used to identify and develop polymorphic simple sequence repeat (SSR) markers for pepper (Capsicum annuum). Methods and Results: Sixty-eight polymorphic SSR loci were identified in the contigs (containing redundant ESTs) generated by assembling 118060 pepper ESTs from the public sequence database. Thirty-three SSR markers exhibited polymorphism among 31 pepper varieties, with alleles per SSR marker ranging from two to six. The mean observed and expected heterozygosity were 0.28 and 0.39, respectively. There were 18 SSR markers with a motif repeat number of less than five, accounting for 55% of the total. Conclusions: We demonstrated the value of mining the redundant sequences in public sequence databases for the development of polymorphic SSR markers, which can be used for marker-assisted breeding in pepper.

10.
Banerjee M, Roy D, Bhattacharyya B, Basu G. FEBS Letters 2007, 581(26): 5019-5023
Colchicine-tubulin interaction, responsible for the disruption of microtubule formation, has immense pharmacological importance but is poorly understood in terms of its biological significance. The interaction is characterized by a markedly higher affinity of colchicine for animal tubulins compared to tubulins from plants, fungi and protists. From an analysis of tubulin sequences and the colchicine-tubulin crystal structure, we propose that Pro268beta and Ala248beta (270beta and 250beta in the crystal structure 1SA0) in animal tubulin are crucial for the observed differential binding. We also suggest that, mediated by the binding of endogenous molecules to the colchicine-binding site, microtubule assembly in eukaryotes may be modulated in a family-specific manner.

11.
With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods. These methods require easy-to-use computer programs. One such effort has been to produce Molecular Evolutionary Genetics Analysis (MEGA) software, with its focus on facilitating the exploration and analysis of the DNA and protein sequence variation from an evolutionary perspective. Currently in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of the phylogenetic trees, estimation of evolutionary distances and testing evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA.

12.
The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.

13.
The co-variance of amino acid positions within a multiple alignment of 294 protein kinases from mammals, plants, and bacteria was studied. Applying mutual information (MI), characteristic amino acid sites were identified that markedly discriminate between the different organisms. The relation between the surface accessibility of these sites in the 3D structure of a kinase and their MI content was studied. We extended the method to score a predicted phosphorylation site of this highly conserved catalytic protein kinase region. Based on this score, mammalian and plant protein kinases were grouped together, apart from the bacterial kinases. Thus, the presented method allows us to analyse putative phosphorylation sites in the context of their organism-specific origin.
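
The mutual information measure this abstract relies on can be illustrated with a small sketch. It computes MI in bits between two equally long label sequences, for example two alignment columns, or one column and a per-sequence organism label. The function name is a hypothetical choice, and treating gaps as ordinary symbols and using raw frequencies without small-sample correction are simplifying assumptions; the abstract does not state how the authors handled either.

```python
from collections import Counter
from math import log2

def column_mutual_information(col_x, col_y):
    """Mutual information (bits) between two aligned label sequences.

    col_x and col_y hold one symbol per sequence, in the same sequence order.
    Frequencies are plain counts; no pseudocounts or bias correction (assumption).
    """
    n = len(col_x)
    if n == 0 or n != len(col_y):
        raise ValueError("columns must be non-empty and of equal length")
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

Two columns that vary in lockstep give a high value, while columns that vary independently give a value near zero.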

14.
SNUFER is a software tool for the automatic localization of single nucleotide polymorphisms (SNPs) and the generation of tables used to present them. After input of a FASTA file containing the sequences to be analyzed, a multiple sequence alignment is generated using ClustalW run inside SNUFER. The ClustalW output file is then used to generate a table which displays the SNPs detected in the aligned sequences and their degree of similarity. This table can be exported to Microsoft Word, Microsoft Excel or as a single text file, permitting further editing for publication. The software was written using Delphi 7 for programming and FireBird 2.0 for sequence database management. It is freely available for noncommercial use and can be downloaded from http://www.heranza.com.br/bioinformatica2.htm.

15.
16.
Das R, Gerstein M. Proteins 2004, 55(2): 455-463
We have introduced a method to identify functional shifts in protein families. Our method is based on the calculation of an active-site conservation ratio, which we call the "ASC ratio." For a structurally based alignment of a protein family, this ratio is the average sequence similarity of the active-site region compared to the full-length protein. The active-site region is defined as all the residues within a certain radius of the known functionally important groups. Using our method, we have analyzed enzymes of central metabolism from a large number of genomes (35). We found that for most of the enzymes, the active-site region is more highly conserved than the full-length sequence. However, for three tricarboxylic acid (TCA)-cycle enzymes, active-site sequences are considerably more diverged than full-length ones. In particular, we were able to identify in six pathogens a novel isocitrate dehydrogenase that has very low sequence similarity around the active site. Detailed sequence-structure analysis indicates that while the active-site structure of isocitrate dehydrogenase is most likely similar between pathogens and nonpathogens, the unusual sequence divergence could result from an extra domain added at the N-terminus. This domain has a leucine-rich motif similar to one in the Yersinia pestis cytotoxin and may therefore confer additional pathogenic functions.
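
One way to read the ASC ratio defined in this abstract is sketched below: mean pairwise identity over the active-site columns divided by mean pairwise identity over all columns. Using percent identity as the similarity measure, averaging over all sequence pairs, ignoring gap handling, and the function names themselves are assumptions made for illustration; the abstract does not specify these details.

```python
from itertools import combinations

def mean_pairwise_identity(seqs, columns):
    """Average fraction of identical residues over the given alignment columns."""
    columns = list(columns)
    pairs = list(combinations(seqs, 2))
    if not pairs or not columns:
        raise ValueError("need at least two sequences and one column")
    total = 0.0
    for s, t in pairs:
        total += sum(s[c] == t[c] for c in columns) / len(columns)
    return total / len(pairs)

def asc_ratio(seqs, active_site_columns):
    """Active-site identity divided by full-length identity (illustrative reading)."""
    full_length = range(len(seqs[0]))
    return (mean_pairwise_identity(seqs, active_site_columns)
            / mean_pairwise_identity(seqs, full_length))

# Hypothetical toy alignment: columns 0-2 stand in for the active-site region.
toy_alignment = ["ACDEFG", "ACDKLM", "ACDEFM"]
toy_ratio = asc_ratio(toy_alignment, [0, 1, 2])
```

Under this reading, a ratio above 1 means the active-site region is more conserved than the protein as a whole, and a ratio below 1 flags the kind of divergence the abstract reports for three TCA-cycle enzymes.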

17.
In this paper, we present an updated classification of the ubiquitous MIP (Major Intrinsic Protein) family proteins, including 153 fully or partially sequenced members available in public databases. Presently, about 30 of these proteins have been functionally characterized, exhibiting essentially two distinct types of channel properties: (1) specific water transport by the aquaporins, and (2) transport of small neutral solutes, such as glycerol by the glycerol facilitators. Sequence alignments were used to predict amino acids and motifs that discriminate channel specificity. The protein sequences were also analyzed using statistical tools (comparisons of means and correspondence analysis). Five key positions were clearly identified where the residues are specific for each functional subgroup and exhibit highly dissimilar physico-chemical properties. Moreover, we have found that the putative channels for small neutral solutes clearly differ from the aquaporins in the amino acid content and the length of predicted loop regions, suggesting a substrate filter function for these loops. From these results, we propose a signature pattern for water transport.

18.
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics.

19.
Crystal and NMR structures of helical cytokines, namely interleukin-4 (IL-4), granulocyte-macrophage colony-stimulating factor (GM-CSF), and interleukin-2 (IL-2), have been compared. Root mean square deviations in the Cα coordinates for the conserved regions of the helices were 1-2 Å between different cytokines, about twice the differences observed for independently determined crystal and solution structures of IL-4. Considerable similarity in amino acid sequence in the areas expected to interact with the receptors was detected, and the available mutagenesis data for these cytokines were correlated with structure conservation. Models of cytokine-receptor interactions were postulated for IL-4 based on its structure as well as on the published structure of human growth hormone interacting with its receptors (de Vos, A.M., Ultsch, M., & Kossiakoff, A.A., 1992, Science 255, 306-312). Patches of positively charged residues on the surfaces of helices C and D of IL-4 may be responsible for the interactions with the negatively charged residues found in the complementary parts of the IL-4 receptors.
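
The root mean square deviation quoted in this abstract can be computed as in the sketch below. It assumes the two sets of Cα coordinates are already superimposed and matched residue by residue; the optimal superposition step (for example the Kabsch algorithm) is deliberately omitted, and the function name is a hypothetical choice.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points.

    Assumes the structures are already superimposed (no fitting is performed).
    Units follow the input, typically angstroms.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))
```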

20.
The electrophoretic patterns of dehydrins extracted from mature seeds of a range of pea (Pisum) species revealed extensive variation in dehydrin polypeptide mobility. Variation was also observed among lines of P. sativum. Crosses between lines with different dehydrin electrophoretic patterns produced F1 seeds with additive patterns, and segregation in the F2 generation was consistent with a 1 : 2 : 1 ratio, indicating allelic variation at each of two dehydrin loci (Dhn2, Dhn3). Genetic linkage was observed between Dhn2 and Dhn3, and the segregation ratios indicated preferential transmission of one allele at the Dhn3 locus. Dehydrin cDNA clones were characterised that encoded the allelic variants at Dhn2 and Dhn3. Their deduced amino-acid sequences were very similar to each other as well as to the product of the Dhn1 locus reported previously. Comparisons were made between the sequences of allelic variants at a single locus, and between the products of different loci. Differences in the electrophoretic mobilities between allelic variants at Dhn2 and Dhn3 were associated with differences in polypeptide length resulting principally from tandem duplications of 21 (Dhn2) or 24 (Dhn3) amino-acid residues. These duplications accounted for much of the difference in length between dehydrins encoded by the different loci. The conserved core of one of the duplicated regions varied in copy number, and small insertions/deletions of amino acids near this core also contributed to length variation both between allelic forms and between loci. Dehydrins possess characteristic highly conserved amino-acid sequence motifs, yet vary considerably in length. Mechanisms involving sequence duplication appear to be responsible for generating the length differences observed between allelic variants as well as between the products of different loci. Received: 12 June 1997 / Accepted: 29 October 1997
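
Whether F2 segregation is "consistent with a 1 : 2 : 1 ratio" is commonly checked with a chi-square goodness-of-fit test, sketched below. The abstract does not say which test the authors used, so this is an assumption, as is the function name; any counts passed to it here are made-up illustration values, not data from the paper.

```python
def chi_square_1_2_1(n_aa, n_ab, n_bb):
    """Chi-square statistic for observed F2 genotype counts against 1:2:1.

    Returns the statistic only; with 2 degrees of freedom, values below
    5.991 are not significant at the 5% level.
    """
    total = n_aa + n_ab + n_bb
    if total <= 0:
        raise ValueError("need at least one observation")
    expected = (total / 4, total / 2, total / 4)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A locus with preferential transmission of one allele, as reported for Dhn3, would tend to produce a larger statistic than a locus segregating in the expected proportions.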
