Found 20 similar articles; search took 46 ms
1.
A. A. Mironov 《Molecular Biology》2007,41(4):642-649
RNA secondary structure prediction is one of the classic problems of bioinformatics. The most efficient approaches to solving this problem are based on comparative analysis. As a rule, multiple RNA sequence alignment and subsequent determination of a common secondary structure are used. A new algorithm was developed to obviate the need for preliminary multiple sequence alignment. The algorithm is based on a multilevel MEME-like iterative search for a generalized profile. The search for common blocks in RNA sequences is carried out at the first level. Then the algorithm refines the chains consisting of these blocks. Finally, the search for sets of common helices, matched with alignment blocks, is carried out. The algorithm was tested with a tRNA set containing additional junk sequences and with RFN riboswitches. The algorithm is available at http://bioinf.fbb.msu.ru/RNAAlign.
2.
The level of conservation between two homologous sequences often varies among sequence regions; functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used in alignment of homologous sequences with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm to allow dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, that in its predecessor GAP3, and an ordinary local alignment program SIM were evaluated on 257716 pairs of homologous sequences from 100 protein families. On 168475 of the 257716 pairs (a rate of 65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.
3.
The alignment of sets of sequences and the construction of phyletic trees: An integrated method  Total citations: 7, self-citations: 0, other citations: 7
Summary In this paper we argue that the alignment of sets of sequences and the construction of phyletic trees cannot be treated separately. The concept of good alignment is meaningless without reference to a phyletic tree, and the construction of phyletic trees presupposes alignment of the sequences. We propose an integrated method that generates both an alignment of a set of sequences and a phyletic tree. In this method a putative tree is used to align the sequences and the alignment obtained is used to adjust the tree; this process is iterated. As a demonstration we apply the method to the analysis of the evolution of 5S rRNA sequences in prokaryotes.
4.
A system for pattern matching applications on biosequences  Total citations: 5, self-citations: 0, other citations: 5
ANREP is a system for finding matches to patterns composed of (i) spacing constraints called spacers, and (ii) approximate matches to motifs that are, recursively, patterns composed of atomic symbols. A user specifies such patterns via a declarative, free-format and strongly typed language called A that is presented here in a tutorial style through a series of progressively more complex examples. The sample patterns are for protein and DNA sequences, the application domain for which ANREP was specifically created. ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences. The performance of ANREP is discussed and an appendix gives concise specification of syntax and semantics. A portable C software package implementing ANREP is available via anonymous remote file transfer.
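The idea of a pattern built from approximate motifs joined by a spacer can be illustrated with a small Python sketch. This is not ANREP and does not use its language A; the function names, the Hamming-distance notion of "approximate", and the example sequence are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def hamming_matches(seq: str, motif: str, max_mismatches: int) -> List[int]:
    """Return start positions where motif matches seq with at most
    max_mismatches substitutions (no insertions or deletions)."""
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def find_pattern(seq: str, motif_a: str, motif_b: str,
                 min_gap: int, max_gap: int,
                 max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Find (start_a, start_b) pairs where an approximate match to motif_a
    is followed by an approximate match to motif_b, separated by a spacer
    of min_gap..max_gap symbols."""
    starts_b = set(hamming_matches(seq, motif_b, max_mismatches))
    results = []
    for start_a in hamming_matches(seq, motif_a, max_mismatches):
        end_a = start_a + len(motif_a)
        for gap in range(min_gap, max_gap + 1):
            if end_a + gap in starts_b:
                results.append((start_a, end_a + gap))
    return results


# Hypothetical example: two hexamer motifs separated by a 3-6 symbol spacer.
print(find_pattern("AATTGACACCCCCTATAATGG", "TTGACA", "TATAAT", 3, 6, 0))
```

A real pattern language would also need recursive composition of motifs and indel-tolerant matching, which this sketch leaves out.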
5.
Mass spectrometric analysis of genetic and post-translational heterogeneity in the lectins Jacalin and Maclura pomifera agglutinin  Total citations: 1, self-citations: 0, other citations: 1
Jacalin and M. pomifera agglutinin are T-antigen specific lectins with 44 structures that show far greater microheterogeneity than plant lectins from other families, due to multiple genetic isoforms and post-translational processing. Electrospray mass spectrometry and combined liquid chromatography-electrospray mass spectrometry were used to characterize the various forms. For both lectins, the mass data were consistent with previous protein sequencing of the major -chain species of 133 residues and three -chain species of 20 or 21 residues. In addition, for jacalin the mass of one minor -chain species was consistent with a second of the four reported gene sequences. However, the glycopeptide -chain form and one -chain form did not match any of the genes, suggesting a fifth gene remains to be found. For M. pomifera agglutinin, three more -chain forms were found, but all six could arise from only two genes, with additional post-translational proteolysis and post-translational substitution with an unidentified component of 106 Da creating the set of six forms. Only two -chain forms were found also, with no glycosylation.
6.
We analyzed nucleotide and deduced amino acid sequence heterogeneity of sheep T-cell receptor β-chain cDNAs isolated from
an anchored-polymerase chain reaction library. Evaluation of 34 individual rearrangements has defined 18 new β-chain variable
region sequences which have been clustered into 13 families. Presumptive allelic polymorphisms of four of these variable regions
have been defined, as well as ten distinct β-chain joining region sequences. The present analysis indicates that sheep T-cell
receptor β-chains are composed of characteristic leader, variable, joining, and constant region sequences, and that imprecise
joining and N-region addition contribute significantly to diversity in the third hypervariable region. Thus, it appears that
sheep, like all other mammals studied to date, employ somatic rearrangement of multiple germline genes to create β-chain heterogeneity.
These findings have allowed us to estimate the diversity of the sheep T-cell receptor β-chain variable region repertoire,
and they provide information that will permit the evaluation of the role that specific T-cell populations play in naturally
occurring and experimental diseases of sheep.
Received: 20 October 1997 / Revised: 20 April 1998
7.
Kitamura Kenichiro; Shiraishi Naoki; Singer William D.; Handlogten Mary E.; Tomita Kimio; Miller R. Tyler 《American journal of physiology. Cell physiology》1999,276(4):C930
Endothelin (ET) receptors activate heterotrimeric G proteins that are members of the Gi, Gq, and Gs families but may also activate members of other families such as G12/13. G13 has multiple complex cellular effects that are similar to those of ET. We studied the ability of ET receptors to activate G13 using an assay for G protein -chain activation that is based on the fact that an activated (GTP-bound) -chain is resistant to trypsinization compared with an inactive (GDP-bound) -chain. Nonhydrolyzable guanine nucleotides and AlMgF protected G13 from degradation by trypsin. In membranes from human embryonic kidney 293 cells that coexpress ETB receptors and 13, ET-3 and 5'-guanylylimidodiphosphate [Gpp(NH)p] increased the protection of 13 compared with Gpp(NH)p alone. The specificity of ETB receptor-13 coupling was documented by showing that 2 receptors and isoproterenol or ETA receptors and ET-1 did not activate 13 and that a specific antagonist for ETB receptors blocked ET-3-dependent activation of 13.
8.
Amrita Banerjee Arijit Jana Bikash R. Pati Keshab C. Mondal Pradeep K. Das Mohapatra 《The protein journal》2012,31(4):306-327
The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from NCBI database. Among them only 77 bacterial
and 31 fungal tannase sequences were taken which have different amino acid compositions. These sequences were analysed for
different physical and chemical properties, superfamily search, multiple sequence alignment, phylogenetic tree construction
and motif finding to find out the functional motif and the evolutionary relationship among them. The superfamily search for
these tannase exposed the occurrence of proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase,
carboxylesterase/thioesterase 1, carbon–carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and
mycobacterial antigens families and alpha/beta hydrolase superfamily. Some bacterial and fungal sequence showed similarity
with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved
regions at different stretches with maximum homology from amino acid residues 389–469 and 482–523 which could be used for
designing degenerate primers or probes specific for tannase producing bacterial and fungal species. Phylogenetic tree showed
two different clusters; one has only bacteria and another have both fungi and bacteria showing some relationship between these
different genera. Although in second cluster near about all fungal species were found together in a corner which indicates
the sequence level similarity among fungal genera. The distributions of fourteen motifs analysis revealed Motif 1 with a signature
amino acid sequence of 29 amino acids, i.e. GCSTGGREALKQAQRWPHDYDGIIANNPA, was uniformly observed in 83.3 % of studied tannase
sequences representing its participation with the structure and enzymatic function.
9.
Robert K. Bradley Adam Roberts Michael Smoot Sudeep Juvekar Jaeyoung Do Colin Dewey Ian Holmes Lior Pachter 《PLoS computational biology》2009,5(5)
We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
10.
Modern computational methods for protein structure prediction have been used to study the structure of the 33 kDa extrinsic membrane protein associated with the oxygen evolving complex of photosynthetic organisms. A multiple alignment of 14 sequences of this protein from cyanobacteria, algae and plants is presented. The alignment allows the identification of fully conserved residues and the recognition of one deletion and one insertion present in the plant sequences but not in cyanobacteria. A tree of similarity, deduced from pair-wise comparison and cluster analysis of the sequences, is also presented. The alignment and the consensus sequence derived are used for predicting the secondary structure of the protein. This prediction indicates that it is a mainly-beta protein (25–38% of -strands) with no more than 4% of -helix. Fold recognition by threading is applied to obtain a topological 2D model of the protein. In this model the secondary structure elements are located, including several highly conserved loops. Some of these conserved loops are suggested to be important for the binding of the 33 kDa protein to Photosystem II and for the stability of the manganese cluster. These structural predictions are in good agreement with experimental data reported by several authors.
11.
Three HLA-DR genes were isolated from a Swedish HLA-DR3/4 insulin-dependent diabetes mellitus (IDDM) patient and characterized by restriction endonuclease mapping and nucleotide sequence analysis. Two out of the three DNA sequences differed from those of published DR-chain sequences. A DR-gene probe prepared from exon 4 and flanking sequences was used in a Southern blot analysis of blood donors' DNA and DNA from HLA-DR3/4 IDDM patients and HLA-DR-matched healthy control subjects. This probe differentiated HLA-DR3/4 IDDM patients and HLA-DR-matched controls in the Scandinavian population but not in the North American Caucasoid population.
12.
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA constantly yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/∼bjyoon/picxaa/.
13.
Background
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The rapid growth of next-generation sequencing has led to a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types.
Methods
Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction.
Results
The experiments on large-scale DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources.
Conclusions
HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
14.
LacdiNAc- and LacNAc-containing glycans induce granulomas in an in vivo model for schistosome egg-induced hepatic granuloma formation  Total citations: 4, self-citations: 0, other citations: 4
Schistosomes, major parasitic helminths, express numerous glycoconjugates that provoke humoral and cellular immune responses in the infected human host. The main pathology in schistosomiasis is due to the formation of granulomas around tissue-trapped eggs and the resulting organ damage. By using a mouse model of induction of granulomas by hepatic implantation of antigen-coated beads, it has been determined that the glycan part of schistosomal soluble egg antigens (SEA) initiates granulomogenesis. To identify which individual glycan elements in this complex SEA mixture are granulomogenic, we have tested in the same mouse model conjugates of various synthetic oligosaccharides characteristic for schistosome eggs, including GalNAcβ1-4GlcNAc (LacdiNAc, LDN), Galβ1-4(Fuc1-3)GlcNAc (Lewisx), Fuc1-2Fuc1-3GlcNAc (DF-Gn), and Fuc1-3GalNAcβ1-4(Fuc1-3)GlcNAc (F-LDN-F). Ribonuclease (RNase) A and B, and different fetuin glycoforms were included as controls. Only beads that carry glycoconjugates with terminal LacdiNAc or Galβ1-4GlcNAc (LacNAc, LN) elements gave rise to granulomas, with macrophage, lymphocyte, and eosinophil levels similar to the granulomatous lesions caused by schistosome eggs in a natural infection. Uncoated beads, and beads coated with fucosylated glycoconjugates or glycoconjugates lacking terminally exposed Gal or GalNAc, only attracted a monolayer of macrophages. These results indicate that the formation of hepatic granulomas is triggered specifically by glycoconjugates which carry terminal LacNAc or LacdiNAc, both constituents of the schistosome egg.
15.
T. C. Elleman 《Journal of molecular evolution》1978,11(2):143-161
Summary A method for detecting homology between two protein or nucleic acid sequences which require insertions or deletions for optimum alignment has been devised for use with a computer. Sequences are assessed for possible relationship by Monte Carlo methods involving comparisons between the alignment of the real sequences and alignments of randomly scrambled sequences of the same composition as the real sequences, each alignment having the optimum number of gaps. As each gap is successively introduced into a comparison (real or random) a maximum score is determined from the similarity of the aligned residues. From the distribution of the maximum alignment scores of randomly scrambled sequences having the same number of gaps, the percentage of random comparisons having higher scores is determined, and the smallest of these percentage levels for each pair of sequences (real or random) indicates the optimum alignment. The fraction of the comparisons of random sequences having percentage levels at their optimum alignment below that of the real sequence comparison at its optimum estimates the probability that such an alignment might have arisen by chance. Related sequences are detected since their optimum alignment score, by virtue of a contribution from ancestral homology in addition to optimised random considerations, occupies a more extreme position in the appropriate frequency distribution of scores than do the majority of optimum scores of randomly scrambled sequences in their appropriate distributions.
Application of this optimum match method of sequence comparison shows that the sensitivity of the maximum match method of Needleman and Wunsch (1970) decreases quite dramatically with sequence comparisons which require only a few gaps for a reasonable alignment, or when sequences differ greatly in length. The maximum match method as applied by Barker and Dayhoff (1972) has the additional disadvantage that deletions which have occurred in the longer of two homologous protein sequences further decrease the sensitivity of detection of relationship. The constrained match method of Sankoff and Cedergren (1973) is seen to be misleading since large increments in the alignment score from added gaps do not necessarily result in a high total alignment score required to demonstrate sequence homology.
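The core idea of judging an alignment score against scores from scrambled sequences of the same composition can be sketched in Python. This is a deliberately simplified illustration, not the optimum match method described above: it uses a plain Needleman-Wunsch global score with a fixed linear gap penalty instead of choosing an optimum number of gaps, and the scoring values, function names and trial count are assumptions.

```python
import random


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -2) -> int:
    """Global alignment score (Needleman-Wunsch, linear gap penalty),
    computed row by row so only two rows are kept in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def shuffle_p_value(a: str, b: str, trials: int = 200, seed: int = 0) -> float:
    """Estimate how often a scrambled copy of b (same composition)
    aligns to a at least as well as the real b does."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    letters = list(b)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if nw_score(a, "".join(letters)) >= real:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_good + 1) / (trials + 1)


seq = "ACGTTGCAAGCTAGCTTACG"
print(shuffle_p_value(seq, seq))
```

A small estimate suggests the real alignment scores better than chance alignments of sequences with the same composition; the resolution of the estimate is limited by the number of trials.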
16.
Background
Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.
Results
Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.
Conclusion
We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
17.
Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0688-z) contains supplementary material, which is available to authorized users.
18.
Post-translational proteolytic processing and the isolectins of lentil and other Viciae seed lectins
Electrospray mass spectrometry was used to identify precisely the proteolytic cleavage points within, and at the C-termini of, the proprotein forms of four Viciae lectins that give rise to their two-chain forms. The lectins examined were the pea and lentil lectins, favin and the Lathyrus odoratus lectin, which represent each of the four genera in this tribe. The molecular mass data showed single -chain forms for each lectin, with masses consistent with the available sequence and glycopeptide data, indicating that each came from a single proprotein. In contrast, the pea, lentil and L. odoratus -chains occurred in as many as five forms, due to multiple C-terminal cleavage points. Only favin showed a single -chain form. The -chain mass data were again consistent with the sequence information available, except for the lentil lectin -chain which was re-determined by protein sequencing. The two isolectin forms of this protein were shown to arise from -chain species with and without residue Lys53. The mass spectrum of concanavalin A was also examined and both the single-chain form and the two fragment forms showed no evidence of C-terminal heterogeneity.
19.
An approach to systematic detection of protein structural motifs  Total citations: 2, self-citations: 0, other citations: 2
A procedure to detect similar local structures of proteins from C coordinates is presented. First, the conformations of seven-residue peptide segments are approximated by a limited number of representatives, each of which is assigned a symbol. Thus, the overall conformation of a protein is represented by a symbol string. The comparison of these symbol strings using a sequence alignment technique then gives pairs of similar local structures. These pairs are considered candidates of structural motifs. The application of the procedure to the analysis of 93 proteins gave 858 pairs of similar local structures, which included several well-known structural motifs such as the nucleotide-binding ββ-unit and the calcium-binding EF hand. The characterization of amino acid patterns of similar local structures given by the procedure should be useful for the development of protein structure prediction based on the acquisition of empirical rules from a large-scale database.
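Once each protein is encoded as a symbol string, finding similar local structures reduces to local alignment of two strings. The Python sketch below shows that step with a basic Smith-Waterman alignment. The one-letter structural symbols, the scoring values and the function name are assumptions for illustration; the encoding of seven-residue segments into symbols is not reproduced here, and the abstract does not say which alignment technique the authors used.

```python
from typing import Tuple


def local_align(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> Tuple[int, str, str]:
    """Smith-Waterman local alignment of two symbol strings.
    Returns the best score and the two aligned substrings (without gaps)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:best_pos[0]], b[j:best_pos[1]]


# Hypothetical symbol strings for two proteins sharing one local structure.
print(local_align("AAAHHEEHHBBB", "CCCCHHEEHHDD"))
```

In practice a substitution table reflecting how close two representative conformations are would replace the flat match and mismatch scores used here.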
20.