20 similar articles found (search time: 0 ms)
1.
We present an algorithm to detect protein sub-structural motifs from primary sequence. The input to the algorithm is a set of multiply aligned protein sequences. It uses wavelet transforms to decompose protein sequences represented numerically by different indices (such as polarity, accessible surface area or electron-ion interaction potentials of the amino acids). The numerical representation of a protein sequence has significant correlation with its biological activity, thus common motifs are expected to be observable from the wavelet spectrum. The decomposed signals are then up-sampled and similarity search techniques are used to identify similar regions across all the proteins at multiple scales. Results indicate that wavelet transform techniques are a promising approach for rapid motif detection.
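As a rough illustration of the encoding and decomposition steps this abstract describes, the sketch below maps a sequence to numbers and applies one level of a Haar wavelet transform. The Kyte-Doolittle hydropathy scale and the Haar wavelet are stand-ins chosen for brevity (the abstract names neither), and `encode` and `haar_step` are hypothetical helper names.

```python
import math

# Kyte-Doolittle hydropathy values, used only as a stand-in numerical index
# (the abstract mentions polarity, accessible surface area and EIIP instead).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map a protein sequence to a list of numbers (hypothetical helper)."""
    return [KD[aa] for aa in seq]


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail). An odd-length signal is padded
    by repeating its last value.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


signal = encode("MKVLAAGIVG")
approx, detail = haar_step(signal)
```

Applying `haar_step` again to `approx` would give the next coarser scale; the up-sampling and cross-protein similarity search mentioned in the abstract are not attempted here.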
2.
《Science in China Series C: Life Sciences》2007,(3)
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence alignment is often inaccurate if the sequence identity between the sequences to be aligned is less than 30%. This is because, for these sequences, different residues may play similar structural roles and are incorrectly aligned when a substitution matrix covering all 20 residue types is used. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to recognition of protein structurally conserved/similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9.
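To make the idea of a reduced alphabet concrete, here is a minimal sketch that maps the 20 residues onto 9 groups and compares sequence identity before and after the mapping. The grouping shown is a hypothetical one based on broad physicochemical classes; it is not the clustering derived from DAPS in the paper, which the abstract does not list.

```python
# Hypothetical 9-group alphabet for illustration only; NOT the grouping
# derived in the paper, which is not given in the abstract.
GROUPS = ["AG", "ST", "C", "P", "DE", "NQ", "KRH", "ILVM", "FYW"]

# Each residue is represented by the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(seq):
    """Rewrite a sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


seq1 = "ILKDFS"
seq2 = "VMREYT"
full_identity = identity(seq1, seq2)
reduced_identity = identity(reduce_seq(seq1), reduce_seq(seq2))
```

In this toy pair no position matches in the 20-letter alphabet, yet every position matches after reduction, which is the effect the abstract relies on for low-identity sequences.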
3.
Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. Total citations: 10 (self-citations: 0; citations by others: 10)
N Stojanovic L Florea C Riemer D Gumucio J Slightom M Goodman W Miller R Hardison 《Nucleic acids research》1999,27(19):3899-3910
Conserved segments in DNA or protein sequences are strong candidates for functional elements and thus appropriate methods for computing them need to be developed and compared. We describe five methods and computer programs for finding highly conserved blocks within previously computed multiple alignments, primarily for DNA sequences. Two of the methods are already in common use; these are based on good column agreement and high information content. Three additional methods find blocks with minimal evolutionary change, blocks that differ in at most k positions per row from a known center sequence and blocks that differ in at most k positions per row from a center sequence that is unknown a priori. The center sequence in the latter two methods is a way to model potential binding sites for known or unknown proteins in DNA sequences. The efficacy of each method was evaluated by analysis of three extensively analyzed regulatory regions in mammalian beta-globin gene clusters and the control region of bacterial arabinose operons. Although all five methods have quite different theoretical underpinnings, they produce rather similar results on these data sets when their parameters are adjusted to best approximate the experimental data. The optimal parameters for the method based on information content varied little for different regulatory regions of the beta-globin gene cluster and hence may be extrapolated to many other regulatory regions. The programs based on maximum allowed mismatches per row have simple parameters whose values can be chosen a priori and thus they may be more useful than the other methods when calibration against known functional sites is not available.
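Two of the criteria named in this abstract can be sketched in a few lines: per-column information content, and the test that every row differs from a known center sequence in at most k positions. This is an illustrative sketch, not the authors' programs; the function names are hypothetical, the toy alignment is invented, and the information measure used here (log2 of the alphabet size minus the Shannon entropy, with no small-sample correction) is one common definition that may differ from the paper's.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content in bits: log2(alphabet_size) minus Shannon entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def within_k_of_center(rows, center, k):
    """True if every row differs from the center sequence in at most k positions."""
    return all(sum(a != b for a, b in zip(row, center)) <= k for row in rows)


# Invented gap-free DNA block, one string per alignment row.
alignment = ["ACGTAC", "ACGTTC", "ACGAAC", "ACGTAG"]
columns = ["".join(row[i] for row in alignment) for i in range(len(alignment[0]))]
info = [column_information(col) for col in columns]
block_ok = within_k_of_center(alignment, "ACGTAC", 1)
```

A fully conserved DNA column scores 2 bits and a column with all four bases equally frequent scores 0, so thresholding `info` picks out conserved columns, while `within_k_of_center` needs only the single parameter k.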
4.
G Dadaglio A Leroux P Langlade-Demoyen E M Bahraoui F Traincard R Fisher F Plata 《Journal of immunology (Baltimore, Md. : 1950)》1991,147(7):2302-2309
CTL constitute an essential part of the immune response against HIV. CTL recognize peptides derived from viral proteins together with the MHC class I molecules on the surface of infected cells. The CTL response could be important in prevention or control of infection with HIV by destroying virus-producing cells. In this study we have attempted to identify peptide epitopes recognized by HIV-specific CTL. Using synthetic peptides, we have identified six conserved peptidic epitopes on the gp120 envelope glycoprotein recognized by polyclonal human CTL in association with the HLA-A2 class I transplantation antigen. These results were confirmed by two approaches: i) blocking of CTL activities with antibodies specific for three of these conserved peptides; and ii) construction of doubly transfected P815-A2 target cells, using deletions of the HIV env gene. Vaccination or immunotherapy in HLA-A2 individuals can thus be considered using highly conserved HIV env peptidic sequences.
5.
Comparison of the accuracies of several phylogenetic methods using protein and DNA sequences. Total citations: 3 (self-citations: 0; citations by others: 3)
Hall BG 《Molecular biology and evolution》2005,22(3):792-802
A biologically realistic method was used to simulate evolutionary trees. The method uses a real DNA coding sequence as the starting point, simulates mutation according to the mutational spectrum of Escherichia coli (including base substitutions, insertions, and deletions), and separates the processes of mutation and selection. Trees of 8, 16, 32, and 64 taxa were simulated with average branch lengths of 50, 100, 150, 200, and 250 changes per branch. The resulting sequences were aligned with ClustalX, and trees were estimated by Neighbor Joining, Parsimony, Maximum Likelihood, and Bayesian methods from both DNA sequences and the corresponding protein sequences. The estimated trees were compared with the true trees, and both topological and branch length accuracies were scored. Over the variety of conditions tested, Bayesian trees estimated from DNA sequences that had been aligned according to the alignment of the corresponding protein sequences were the most accurate, followed by Maximum Likelihood trees estimated from DNA sequences and Parsimony trees estimated from protein sequences.
6.
Searching databases of conserved sequence regions by aligning protein multiple-alignments. Total citations: 14 (self-citations: 2; citations by others: 14)
S Pietrokovski 《Nucleic acids research》1996,24(19):3836-3845
A general searching method for comparing multiple sequence alignments was developed to detect sequence relationships between conserved protein regions. Multiple alignments are treated as sequences of amino acid distributions and aligned by comparing pairs of such distributions. Four different comparison measures were tested and the Pearson correlation coefficient chosen. The method is sensitive, detecting weak sequence relationships between protein families. Relationships are detected beyond the range of conventional sequence database searches, illustrating the potential usefulness of the method. The previously undetected relation between flavoprotein subunits of two oxidoreductase families points to the potential active site in one of the families. The similarity between the bacterial RecA, DnaA and Rad51 protein families reveals a region in DnaA and Rad51 proteins likely to bind and unstack single-stranded DNA. Helix-turn-helix DNA binding domains from diverse proteins are readily detected and shown to be similar to each other. Glycosylasparaginase and gamma-glutamyltransferase enzymes are found to be similar in their proteolytic cleavage sites. The method has been fully implemented on the World Wide Web at URL: http://blocks.fhcrc.org/blocks-bin/LAMAvsearch.
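The column-comparison step described here can be sketched as follows: each alignment column becomes a 20-element amino acid frequency vector, and two columns are scored by their Pearson correlation coefficient. This is a minimal sketch that assumes gap-free columns and raw frequencies; the published method may weight sequences or add pseudocounts, which the abstract does not specify, and the helper names and example columns are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column):
    """Relative frequency of each amino acid in one gap-free alignment column."""
    n = len(column)
    return [column.count(aa) / n for aa in AMINO_ACIDS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Invented columns: two hydrophobic ones and one acidic one.
col_a = column_distribution("LLLIV")
col_b = column_distribution("LLIIV")
col_c = column_distribution("DDEDE")

similar = pearson(col_a, col_b)
dissimilar = pearson(col_a, col_c)
```

Columns with overlapping residue preferences score close to 1, while columns with disjoint preferences score below zero; aligning two multiple alignments then amounts to aligning sequences of such column scores.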
7.
Background
Conserved protein sequence regions are extremely useful for identifying and studying functionally and structurally important regions. By means of an integrated analysis of large-scale protein structure and sequence data, structural features of conserved protein sequence regions were identified.
8.
Jabado OJ Liu Y Conlan S Quan PL Hegyi H Lussier Y Briese T Palacios G Lipkin WI 《Nucleic acids research》2008,36(1):e3
Oligonucleotide microarrays have been applied to microbial surveillance and discovery where highly multiplexed assays are required to address a wide range of genetic targets. Although printing density continues to increase, the design of comprehensive microbial probe sets remains a daunting challenge, particularly in virology where rapid sequence evolution and database expansion confound static solutions. Here, we present a strategy for probe design based on protein sequences that is responsive to the unique problems posed in virus detection and discovery. The method uses the Protein Families database (Pfam) and motif finding algorithms to identify oligonucleotide probes in conserved amino acid regions and untranslated sequences. In silico testing using an experimentally derived thermodynamic model indicated near-complete coverage of the viral sequence database.
9.
A general protein sequence alignment methodology for detecting a priori unknown common structural and functional regions is described. The method proposed in this paper is based on two basic requirements for a meaningful alignment. First, each sequence or segment of a sequence is characterized by a multivariate physicochemical profile. Second, the alignment is performed by considering all the sequences simultaneously, and the algorithm detects those regions that form a set of similar profiles. In order to test the structural meaning of the alignment obtained from the sequences, quantitative comparisons are performed with structurally conserved regions (SCR) determined from the X-ray structures of three serine proteases. Results suggest that the limits of the SCR may be predicted from the similarities between the physicochemical profiles of the sequences. The procedures are not completely automated. The final step requires a visual screening of alternative pathways in order to determine an optimal alignment.
10.
The signal recognition particle (SRP) is a ribonucleoprotein complex responsible for targeting proteins to the endoplasmic reticulum in eukarya or to the inner membrane in prokarya. The crystal structure of the universally conserved RNA-protein core of the Escherichia coli SRP, refined here to 1.5 Å resolution, revealed minor groove recognition of the 4.5 S RNA component by the M domain of the Ffh protein. Within the RNA, nucleotides comprising two phylogenetically conserved internal loops create a unique surface for protein recognition. To determine the energetic importance of conserved nucleotides for SRP assembly, we measured the affinity of the M domain for a series of RNA mutants. This analysis reveals how conserved nucleotides within the two internal loop motifs establish the architecture of the macromolecular interface and position essential functional groups for direct recognition by the protein.
11.
12.
Ania M. Cutiño-Jiménez Marinalva Martins-Pinheiro Wanessa C. Lima Alexander Martín-Tornet Osleidys G. Morales Carlos F.M. Menck 《Molecular phylogenetics and evolution》2010,54(2):524-534
Xanthomonadales comprises one of the largest phytopathogenic bacterial groups, and is currently classified within the gamma-proteobacteria. However, the phylogenetic placement of this group is not clearly resolved, and the results of different studies contradict one another. In this work, the evolutionary position of Xanthomonadales was determined by analyzing the presence of shared insertions and deletions (INDELs) in highly conserved proteins. Several distinctive insertions found in most of the members of the gamma-proteobacteria are absent in Xanthomonadales and groups such as Legionellales, Chromatiales, Methylococcales, Thiotrichales and Cardiobacteriales. These INDELs were most likely introduced after the branching of Xanthomonadales from most of the gamma-proteobacteria and provide evidence for the phylogenetic placement of the early gamma-proteobacteria. Moreover, other proteins contain insertions exclusive to the Xanthomonadales order, confirming that this is a monophyletic group and providing important specific genetic markers. Thus, the data presented clearly support the Xanthomonadales group as an independent subdivision that constitutes one of the deepest-branching lineages within the gamma-proteobacteria clade.
13.
Alain Coletta John W Pinney David Y Weiss Solís James Marsh Steve R Pettifer Teresa K Attwood 《BMC systems biology》2010,4(1):43
Background
Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.
14.
Cell-free protein synthesis by the postmitochondrial supernatant from chicken cerebrum was twofold greater than protein synthesis by the cerebellum or optic lobes. Ribosomal aggregation of mRNA and ribonuclease activity of the postmitochondrial supernatant from the three brain regions were not statistically different. The higher protein synthetic activity of the cerebral postmitochondrial supernatant was associated with both the postribosomal supernatant (cell sap) and microsomal fractions. Cerebral monomeric ribosomes were more active in polyuridylic acid directed polyphenylalanine synthesis than monomeric ribosomes from either the cerebellum or optic lobes. The ability of cerebral cell sap to support polyuridylic acid directed polyphenylalanine synthesis was 1.6 to 2 times greater than that of cell sap from the other two regions. Cell sap factors other than tRNAPhe or phenylalanyl-tRNA synthetases appear to be responsible for the higher protein synthetic activity of the cerebral cell sap.
15.
The eukaryotic porin superfamily consists of two families, voltage-dependent anion channel (VDAC) and Tom40, which are both located in the mitochondrial outer membrane. In Trypanosoma brucei, only a single member of the VDAC family has been described. We report the detection of two additional eukaryotic porin-like sequences in T. brucei. By bioinformatic means, we classify both as putative VDAC isoforms.
16.
Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. Total citations: 16 (self-citations: 0; citations by others: 16)
We have examined oligopeptides with lengths ranging from 2 to 11 residues in protein sequences that show no obvious evolutionary relationship. All sequences in the Protein Identification Resource database were carefully classified by sensitive homology searches into superfamilies to obtain unbiased oligopeptide counts. The results, contrary to previous studies, show clear biases in protein sequences. The oligopeptide preferences were used to help decide the significance of sequence homologies and to improve the more general methods for detecting protein coding regions within nucleotide sequences.
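The counting step underlying this analysis can be sketched as a sliding window over each sequence. This is only an illustration under simple assumptions: the function name and the two example sequences are hypothetical, and the superfamily classification that the abstract uses to avoid counting related sequences more than once is not reproduced.

```python
from collections import Counter


def oligopeptide_counts(sequences, k):
    """Count every overlapping oligopeptide of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Invented sequences, assumed to come from unrelated superfamilies.
dipeptides = oligopeptide_counts(["MKKLLA", "AKKLG"], 2)
```

Comparing such observed counts with the counts expected from single-residue frequencies is one way to expose the preferences the abstract reports; running the function for k from 2 to 11 covers the length range it mentions.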
17.
The MC1 protein is a chromosomal protein likely involved in the DNA compaction of some methanogenic archaea. This small and monomeric protein, structurally unrelated to other DNA binding proteins, bends DNA sharply. By studying the protein binding to various kinds of kinked DNA, we have previously shown that MC1 is able to discriminate between different deformations of the DNA helix. Here we investigate its capacity to recognize particular DNA sequences by using a SELEX procedure. We find that MC1 is able to preferentially bind to a 15 base pair motif [AAAAACACAC(A/C)CCCC]. The structural parameters of this sequence are characterized by molecular dynamics simulation experiments, and the binding mode of the protein to the DNA is studied by footprinting experiments. Our results strongly suggest that the protein realizes an indirect readout of the DNA sequence by binding to the DNA minor groove.
18.
19.
Several protein phylogeny reconstruction methods were compared on a set of natural protein sequences. The programs of the PHYLIP package and the FastME, PhyML and TreeTop programs were tested. In contrast to several studies that used simulated sequences, our results demonstrate the superiority of distance methods over the maximum likelihood method.
20.
Computer-assisted analysis of envelope protein sequences of seven human immunodeficiency virus isolates: prediction of antigenic epitopes in conserved and variable regions. Total citations: 35 (self-citations: 48; citations by others: 35)
Independent isolates of human immunodeficiency virus (HIV) exhibit a striking genomic diversity, most of which is located in the viral envelope gene. Since this property of the HIV group of viruses may play an important role in the pathobiology of the virus, we analyzed the predicted amino acid sequences of the envelope proteins of seven different HIV strains, three of which represent sequential isolates from a single patient. By using a computer program that predicts the secondary protein structure and superimposes values for hydrophilicity, surface probability, and flexibility, we identified several potential antigenic epitopes in the envelope proteins of the seven different viruses. Interestingly, the majority of the predicted epitopes in the exterior envelope protein (gp120) were found in regions of high sequence variability which are interspersed with highly conserved regions among the independent viral isolates. A comparison of the sequential viral isolates revealed that changes concerning the secondary structure of the protein occurred only in regions which were predicted to be antigenic, predominantly in highly variable regions. The membrane-associated protein gp41 contains no highly variable regions; about 80% of the amino acids were found to be conserved, and only one hydrophilic area was identified as likely to be accessible to antibody recognition. These findings give insight into the secondary and possible tertiary structure of variant HIV envelope proteins and should facilitate experimental approaches directed toward the identification and fine mapping of HIV envelope proteins.