Similar literature
 20 similar articles found.
1.
We present an algorithm to detect protein sub-structural motifs from primary sequence. The input to the algorithm is a set of aligned multiple protein sequences. It uses wavelet transforms to decompose protein sequences represented numerically by different indices (such as polarity, accessible surface area or electron-ion interaction potentials of the amino acids). The numerical representation of a protein sequence has significant correlation with its biological activity, thus common motifs are expected to be observable from the wavelet spectrum. The decomposed signals are then up-sampled and similarity search techniques are used to identify similar regions across all the proteins at multiple scales. Results indicate that wavelet transform techniques are a promising approach for rapid motif detection.
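As a rough illustration of the first stage described in this abstract (numerical encoding followed by wavelet decomposition), the sketch below encodes a sequence and applies a multi-level Haar transform. It is not the authors' implementation: the Kyte-Doolittle hydropathy scale stands in for the indices the abstract names, the Haar wavelet is an assumed choice, and the function names `encode`, `haar_step` and `haar_decompose` are hypothetical. The up-sampling and similarity-search stages are not shown.

```python
import math

# Kyte-Doolittle hydropathy values, used here only as a stand-in index
# (assumption: the abstract names polarity, accessible surface area and EIIP).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map a protein sequence to a list of numbers, one per residue."""
    return [HYDROPATHY[residue] for residue in seq]


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal."""
    root2 = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (coarsest approximation, [detail coefficients per level])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:
            # Simple padding choice (assumption): repeat the last sample.
            approx.append(approx[-1])
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Each entry of `details` describes variation at one scale, so motifs shared across sequences would be expected to show up as similar coefficient patterns at the same level.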

2.
Sequence alignment is a common method for finding protein structurally conserved/similar regions. However, sequence alignment is often not accurate if sequence identities between to-be-aligned sequences are less than 30%. This is because, for these sequences, different residues may play similar structural roles and are incorrectly aligned when a substitution matrix covering all 20 residue types is used. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to recognition of protein structurally conserved/similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9.
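To make the idea of a reduced alphabet concrete, the sketch below maps residues onto group labels and compares sequence identity before and after the mapping. The nine groups are a hand-picked physicochemical grouping chosen for illustration only; they are not the DAPS-derived clusters the abstract refers to, and the names `GROUPS`, `reduce_sequence` and `identity` are hypothetical.

```python
# Illustrative 9-group alphabet (assumption: NOT the clustering from the paper).
GROUPS = ["AG", "ST", "C", "P", "DE", "NQ", "KRH", "ILVM", "FWY"]

# Each residue is represented by the first letter of its group.
RESIDUE_TO_GROUP = {res: group[0] for group in GROUPS for res in group}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet."""
    return "".join(RESIDUE_TO_GROUP[res] for res in seq)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For example, `ILKD` and `VMRE` share no identical residues, yet both reduce to `IIKD`, which is the kind of hidden similarity a reduced alphabet is meant to expose.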

3.
Sequence alignment is a common method for finding protein structurally conserved/similar regions. However, sequence alignment is often not accurate if sequence identities between to-be-aligned sequences are less than 30%. This is because, for these sequences, different residues may play similar structural roles and are incorrectly aligned when a substitution matrix covering all 20 residue types is used. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to recognition of protein structurally conserved/similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9. Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear Project (973) of the NSM

4.
Conserved segments in DNA or protein sequences are strong candidates for functional elements and thus appropriate methods for computing them need to be developed and compared. We describe five methods and computer programs for finding highly conserved blocks within previously computed multiple alignments, primarily for DNA sequences. Two of the methods are already in common use; these are based on good column agreement and high information content. Three additional methods find blocks with minimal evolutionary change, blocks that differ in at most k positions per row from a known center sequence and blocks that differ in at most k positions per row from a center sequence that is unknown a priori. The center sequence in the latter two methods is a way to model potential binding sites for known or unknown proteins in DNA sequences. The efficacy of each method was evaluated by analysis of three extensively analyzed regulatory regions in mammalian beta-globin gene clusters and the control region of bacterial arabinose operons. Although all five methods have quite different theoretical underpinnings, they produce rather similar results on these data sets when their parameters are adjusted to best approximate the experimental data. The optimal parameters for the method based on information content varied little for different regulatory regions of the beta-globin gene cluster and hence may be extrapolated to many other regulatory regions. The programs based on maximum allowed mismatches per row have simple parameters whose values can be chosen a priori and thus they may be more useful than the other methods when calibration against known functional sites is not available.
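One of the two common approaches mentioned here, scoring columns by information content, can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the five programs described: the background is taken as uniform over four bases, gap characters are simply counted as an extra symbol, and the names `column_information` and `conserved_blocks` as well as the thresholds `min_bits` and `min_len` are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def conserved_blocks(alignment, min_bits=1.5, min_len=3):
    """Return half-open (start, end) column ranges where every column
    scores at least min_bits and the run is at least min_len columns long.
    Assumes a non-empty alignment whose rows all have the same length."""
    ncols = len(alignment[0])
    scores = [column_information([row[j] for row in alignment])
              for j in range(ncols)]
    blocks = []
    start = None
    # The trailing -inf sentinel closes a run that reaches the last column.
    for j, score in enumerate(scores + [float("-inf")]):
        if score >= min_bits:
            if start is None:
                start = j
        elif start is not None:
            if j - start >= min_len:
                blocks.append((start, j))
            start = None
    return blocks
```

As the abstract notes, a threshold like `min_bits` has to be calibrated against known functional sites, whereas the mismatch-based methods take a parameter k that can be fixed in advance.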

5.
CTL constitute an essential part of the immune response against HIV. CTL recognize peptides derived from viral proteins together with the MHC class I molecules on the surface of infected cells. The CTL response could be important in prevention or control of infection with HIV by destroying virus-producing cells. In this study we have attempted to identify peptide epitopes recognized by HIV-specific CTL. Using synthetic peptides, we have identified six conserved peptidic epitopes on the gp120 envelope glycoprotein recognized by polyclonal human CTL in association with the HLA-A2 class I transplantation antigen. These results were confirmed by two approaches: i) blocking of CTL activities with antibodies specific for three of these conserved peptides; and ii) construction of doubly transfected P815-A2 target cells, using deletions of the HIV env gene. Vaccination or immunotherapy in HLA-A2 individuals can thus be considered using highly conserved HIV env peptidic sequences.

6.
A biologically realistic method was used to simulate evolutionary trees. The method uses a real DNA coding sequence as the starting point, simulates mutation according to the mutational spectrum of Escherichia coli (including base substitutions, insertions, and deletions) and separates the processes of mutation and selection. Trees of 8, 16, 32, and 64 taxa were simulated with average branch lengths of 50, 100, 150, 200, and 250 changes per branch. The resulting sequences were aligned with ClustalX, and trees were estimated by Neighbor Joining, Parsimony, Maximum Likelihood, and Bayesian methods from both DNA sequences and the corresponding protein sequences. The estimated trees were compared with the true trees, and both topological and branch length accuracies were scored. Over the variety of conditions tested, Bayesian trees estimated from DNA sequences that had been aligned according to the alignment of the corresponding protein sequences were the most accurate, followed by Maximum Likelihood trees estimated from DNA sequences and Parsimony trees estimated from protein sequences.

7.
A general searching method for comparing multiple sequence alignments was developed to detect sequence relationships between conserved protein regions. Multiple alignments are treated as sequences of amino acid distributions and aligned by comparing pairs of such distributions. Four different comparison measures were tested and the Pearson correlation coefficient chosen. The method is sensitive, detecting weak sequence relationships between protein families. Relationships are detected beyond the range of conventional sequence database searches, illustrating the potential usefulness of the method. The previously undetected relation between flavoprotein subunits of two oxidoreductase families points to the potential active site in one of the families. The similarity between the bacterial RecA, DnaA and Rad51 protein families reveals a region in DnaA and Rad51 proteins likely to bind and unstack single-stranded DNA. Helix-turn-helix DNA binding domains from diverse proteins are readily detected and shown to be similar to each other. Glycosylasparaginase and gamma-glutamyltransferase enzymes are found to be similar in their proteolytic cleavage sites. The method has been fully implemented on the World Wide Web at URL: http://blocks.fhcrc.org/blocks-bin/LAMAvsearch.

8.
Pattern recognition in several sequences: Consensus and alignment   (Cited by: 12 in total; self-citations: 0; citations by others: 12)
The comparison of several sequences is central to many problems of molecular biology. Finding consensus patterns that define genetic control regions or that determine structural or functional themes are examples of these problems. Previously proposed methods, such as dynamic programming, are not adequate for solving problems of realistic size. This paper gives a new and practical solution for finding unknown patterns that occur imperfectly above a preset frequency. Algorithms for finding the patterns are given as well as estimates of statistical significance. This author supported by a grant from the System Development Foundation. This author supported by NSF grant MCS-8301960 and by a grant from the System Development Foundation. This author supported by NIH grant GM19036.

9.

Background  

Conserved protein sequence regions are extremely useful for identifying and studying functionally and structurally important regions. By means of an integrated analysis of large-scale protein structure and sequence data, structural features of conserved protein sequence regions were identified.

10.
Knowledge of protein and domain interactions provides crucial insights into their function within a cell. Several computational methods have been proposed to detect interactions between proteins and their constitutive domains. In this work, we focus on approaches based on correlated evolution (coevolution) of sequences of interacting proteins. In this type of approach, often referred to as the mirrortree method, a high correlation of evolutionary histories of two proteins is used as an indicator to predict protein interactions. Recently, it has been observed that subtracting the underlying speciation process by separating coevolution due to common speciation divergence from that due to common function of interacting pairs greatly improves the predictive power of the mirrortree approach. In this article, we investigate possible improvements and limitations of this method. In particular, we demonstrate that the performance of the mirrortree method can be further improved by restricting the coevolution analysis to the relatively conserved regions in the protein domain sequences (disregarding highly divergent regions). We provide a theoretical validation of our results leading to new insights into the interplay between coevolution and speciation of interacting proteins.
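The basic mirrortree score referred to in this abstract is commonly computed as the correlation between the pairwise evolutionary distance matrices of two protein families over the same set of species. The sketch below shows only that basic step, assuming symmetric matrices with matching species order; it does not implement the speciation correction or the restriction to conserved regions discussed above, and the names `upper_triangle`, `pearson` and `mirrortree_score` are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mirrortree_score(dist_a, dist_b):
    """Correlation between two distance matrices built over the same
    species, listed in the same order (assumption)."""
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

A score near 1 means the two families have similar evolutionary histories, which the method treats as a hint of interaction; part of that correlation comes from shared speciation alone, which is why the correction described in the abstract matters.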

11.
A general protein sequence alignment methodology for detecting a priori unknown common structural and functional regions is described. The method proposed in this paper is based on two basic requirements for a meaningful alignment. First, each sequence or segment of a sequence is characterized by a multivariate physicochemical profile. Second, the alignment is performed by considering all the sequences simultaneously, and the algorithm detects those regions that form a set of similar profiles. In order to test the structural meaning of the alignment obtained from the sequences, quantitative comparisons are performed with structurally conserved regions (SCR) determined from the X-ray structures of three serine proteases. Results suggest that the limits of the SCR may be predicted from the similarities between the physicochemical profiles of the sequences. The procedures are not completely automated. The final step requires a visual screening of alternative pathways in order to determine an optimal alignment.

12.
Oligonucleotide microarrays have been applied to microbial surveillance and discovery where highly multiplexed assays are required to address a wide range of genetic targets. Although printing density continues to increase, the design of comprehensive microbial probe sets remains a daunting challenge, particularly in virology where rapid sequence evolution and database expansion confound static solutions. Here, we present a strategy for probe design based on protein sequences that is responsive to the unique problems posed in virus detection and discovery. The method uses the Protein Families database (Pfam) and motif finding algorithms to identify oligonucleotide probes in conserved amino acid regions and untranslated sequences. In silico testing using an experimentally derived thermodynamic model indicated near complete coverage of the viral sequence database.

13.
Xanthomonadales comprises one of the largest phytopathogenic bacterial groups, and is currently classified within the gamma-proteobacteria. However, the phylogenetic placement of this group is not clearly resolved, and the results of different studies contradict one another. In this work, the evolutionary position of Xanthomonadales was determined by analyzing the presence of shared insertions and deletions (INDELs) in highly conserved proteins. Several distinctive insertions found in most of the members of the gamma-proteobacteria are absent in Xanthomonadales and groups such as Legionellales, Chromatiales, Methylococcales, Thiotrichales and Cardiobacteriales. These INDELs were most likely introduced after the branching of Xanthomonadales from most of the gamma-proteobacteria and provide evidence for the phylogenetic placement of the early gamma-proteobacteria. Moreover, other proteins contain insertions exclusive to the Xanthomonadales order, confirming that this is a monophyletic group and providing important specific genetic markers. Thus, the data presented clearly support the Xanthomonadales group as an independent subdivision that constitutes one of the deepest-branching lineages within the gamma-proteobacteria clade.

14.
The signal recognition particle (SRP) is a ribonucleoprotein complex responsible for targeting proteins to the endoplasmic reticulum in eukarya or to the inner membrane in prokarya. The crystal structure of the universally conserved RNA-protein core of the Escherichia coli SRP, refined here to 1.5 Å resolution, revealed minor groove recognition of the 4.5S RNA component by the M domain of the Ffh protein. Within the RNA, nucleotides comprising two phylogenetically conserved internal loops create a unique surface for protein recognition. To determine the energetic importance of conserved nucleotides for SRP assembly, we measured the affinity of the M domain for a series of RNA mutants. This analysis reveals how conserved nucleotides within the two internal loop motifs establish the architecture of the macromolecular interface and position essential functional groups for direct recognition by the protein.

15.
16.

Background  

Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.

17.
The eukaryotic porin superfamily consists of two families, voltage-dependent anion channel (VDAC) and Tom40, which are both located in the mitochondrial outer membrane. In Trypanosoma brucei, only a single member of the VDAC family has been described. We report the detection of two additional eukaryotic porin-like sequences in T. brucei. By bioinformatic means, we classify both as putative VDAC isoforms.

18.
Bacillus thuringiensis bacteria produce different insecticidal proteins known as Cry and Cyt toxins. Among them the Cyt toxins represent a special and interesting group of proteins. Cyt toxins are able to affect insect midgut cells and are also able to increase the insecticidal damage of certain Cry toxins. Furthermore, the Cyt toxins are able to overcome resistance to Cry toxins in mosquitoes. There is an increasing potential for the use of Cyt toxins in insect control. However, we still need to learn more about their mechanism of action in order to define it at the molecular level. In this review we summarize important aspects of Cyt toxins produced by Bacillus thuringiensis, including current knowledge of their mechanism of action against mosquitoes, and we present a primary sequence and structural comparison with related proteins found in other pathogenic bacteria and fungi that may indicate that Cyt toxins have been selected by several pathogenic organisms to exert their virulence phenotypes.

19.
P McCaldon, P Argos. Proteins, 1988, 4(2): 99-122
We have examined oligopeptides with lengths ranging from 2 to 11 residues in protein sequences that show no obvious evolutionary relationship. All sequences in the Protein Identification Resource database were carefully classified by sensitive homology searches into superfamilies to obtain unbiased oligopeptide counts. The results, contrary to previous studies, show clear prejudices in protein sequences. The oligopeptide preferences were used to help decide the significance of sequence homologies and to improve the more general methods for detecting protein coding regions within nucleotide sequences.

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号