Similar literature
1.
RNA secondary structure prediction is one of the classic problems of bioinformatics. The most efficient approaches to solving this problem are based on comparative analysis. As a rule, multiple RNA sequence alignment and subsequent determination of a common secondary structure are used. A new algorithm was developed to obviate the need for preliminary multiple sequence alignment. The algorithm is based on a multilevel MEME-like iterative search for a generalized profile. The search for common blocks in RNA sequences is carried out at the first level. Then the algorithm refines the chains consisting of these blocks. Finally, the search for sets of common helices, matched with alignment blocks, is carried out. The algorithm was tested with a tRNA set containing additional junk sequences and with RFN riboswitches. The algorithm is available at http://bioinf.fbb.msu.ru/RNAAlign.

2.
The level of conservation between two homologous sequences often varies among sequence regions; functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used in alignment of homologous sequences with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm to allow dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, that in its predecessor GAP3, and an ordinary local alignment program SIM were evaluated on 257716 pairs of homologous sequences from 100 protein families. On 168475 of the 257716 pairs (a rate of 65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.

3.
Summary In this paper we argue that the alignment of sets of sequences and the construction of phyletic trees cannot be treated separately. The concept of good alignment is meaningless without reference to a phyletic tree, and the construction of phyletic trees presupposes alignment of the sequences. We propose an integrated method that generates both an alignment of a set of sequences and a phyletic tree. In this method a putative tree is used to align the sequences and the alignment obtained is used to adjust the tree; this process is iterated. As a demonstration we apply the method to the analysis of the evolution of 5S rRNA sequences in prokaryotes.

4.
A system for pattern matching applications on biosequences (total citations: 5; self-citations: 0; citations by others: 5)
ANREP is a system for finding matches to patterns composed of (i) spacing constraints called ‘spacers’, and (ii) approximate matches to ‘motifs’ that are, recursively, patterns composed of ‘atomic’ symbols. A user specifies such patterns via a declarative, free-format and strongly typed language called A that is presented here in a tutorial style through a series of progressively more complex examples. The sample patterns are for protein and DNA sequences, the application domain for which ANREP was specifically created. ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences. The performance of ANREP is discussed and an appendix gives a concise specification of syntax and semantics. A portable C software package implementing ANREP is available via anonymous remote file transfer.
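The idea of a pattern built from motifs and spacers can be illustrated with a minimal Python sketch. This is not ANREP's language A or its matching engine: the `Motif` and `Spacer` classes, the use of Hamming distance as the notion of approximate match, and the `find_matches` function are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Motif:
    """A literal motif matched with at most `max_mismatches` substitutions (assumed model)."""
    text: str
    max_mismatches: int = 0


@dataclass(frozen=True)
class Spacer:
    """A run of arbitrary symbols whose length lies in [min_len, max_len]."""
    min_len: int
    max_len: int


Element = Union[Motif, Spacer]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_at(seq: str, pos: int, pattern: List[Element]) -> Set[int]:
    """Return every end position at which `pattern` can finish when started at `pos`."""
    if not pattern:
        return {pos}
    head, rest = pattern[0], pattern[1:]
    ends: Set[int] = set()
    if isinstance(head, Motif):
        end = pos + len(head.text)
        if end <= len(seq) and hamming(seq[pos:end], head.text) <= head.max_mismatches:
            ends |= match_at(seq, end, rest)
    else:
        # Try every allowed spacer length that still fits inside the sequence.
        for gap in range(head.min_len, head.max_len + 1):
            if pos + gap <= len(seq):
                ends |= match_at(seq, pos + gap, rest)
    return ends


def find_matches(seq: str, pattern: List[Element]) -> List[Tuple[int, int]]:
    """All (start, end) half-open intervals of `seq` that match `pattern`."""
    return sorted(
        (start, end)
        for start in range(len(seq) + 1)
        for end in match_at(seq, start, pattern)
    )
```

For example, `find_matches("CCTACAGGGCTT", [Motif("TATA", 1), Spacer(2, 4), Motif("GC")])` looks for a TATA-like motif with one allowed substitution, followed by 2 to 4 arbitrary bases, followed by an exact `GC`.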

5.
Jacalin and M. pomifera agglutinin are T-antigen specific lectins with 44 structures that show far greater microheterogeneity than plant lectins from other families, due to multiple genetic isoforms and post-translational processing. Electrospray mass spectrometry and combined liquid chromatography-electrospray mass spectrometry were used to characterize the various forms. For both lectins, the mass data were consistent with previous protein sequencing of the major -chain species of 133 residues and three -chain species of 20 or 21 residues. In addition, for jacalin the mass of one minor -chain species was consistent with a second of the four reported gene sequences. However, the glycopeptide -chain form and one -chain form did not match any of the genes, suggesting a fifth gene remains to be found. For M. pomifera agglutinin, three more -chain forms were found, but all six could arise from only two genes, with additional post-translational proteolysis and post-translational substitution with an unidentified component of 106 Da creating the set of six forms. Only two -chain forms were found also, with no glycosylation.

6.
We analyzed nucleotide and deduced amino acid sequence heterogeneity of sheep T-cell receptor β-chain cDNAs isolated from an anchored-polymerase chain reaction library. Evaluation of 34 individual rearrangements has defined 18 new β-chain variable region sequences which have been clustered into 13 families. Presumptive allelic polymorphisms of four of these variable regions have been defined, as well as ten distinct β-chain joining region sequences. The present analysis indicates that sheep T-cell receptor β-chains are composed of characteristic leader, variable, joining, and constant region sequences, and that imprecise joining and N-region addition contribute significantly to diversity in the third hypervariable region. Thus, it appears that sheep, like all other mammals studied to date, employ somatic rearrangement of multiple germline genes to create β-chain heterogeneity. These findings have allowed us to estimate the diversity of the sheep T-cell receptor β-chain variable region repertoire, and they provide information that will permit the evaluation of the role that specific T-cell populations play in naturally occurring and experimental diseases of sheep. Received: 20 October 1997 / Revised: 20 April 1998

7.
Endothelin (ET) receptors activate heterotrimeric G proteins that are members of the Gi, Gq, and Gs families but may also activate members of other families such as G12/13. G13 has multiple complex cellular effects that are similar to those of ET. We studied the ability of ET receptors to activate G13 using an assay for G protein α-chain activation that is based on the fact that an activated (GTP-bound) α-chain is resistant to trypsinization compared with an inactive (GDP-bound) α-chain. Nonhydrolyzable guanine nucleotides and AlMgF protected G13 from degradation by trypsin. In membranes from human embryonic kidney 293 cells that coexpress ETB receptors and α13, ET-3 and 5'-guanylylimidodiphosphate [Gpp(NH)p] increased the protection of α13 compared with Gpp(NH)p alone. The specificity of ETB receptor-α13 coupling was documented by showing that β2 receptors and isoproterenol or ETA receptors and ET-1 did not activate α13 and that a specific antagonist for ETB receptors blocked ET-3-dependent activation of α13.

8.
The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with distinct amino acid compositions were retained. These sequences were analysed for physical and chemical properties, superfamily membership, multiple sequence alignment, phylogenetic tree construction and motif finding, in order to identify functional motifs and the evolutionary relationships among them. The superfamily search for these tannases revealed the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon–carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389–469 and 482–523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two clusters: one contains only bacteria, and the other contains both fungi and bacteria, indicating some relationship between these genera. In the second cluster, however, nearly all fungal species grouped together, which indicates sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3% of the studied tannase sequences, suggesting its involvement in the structure and enzymatic function.

9.
We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment (previously available only with alignment programs which use computationally expensive Markov Chain Monte Carlo approaches), yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.

10.
Modern computational methods for protein structure prediction have been used to study the structure of the 33 kDa extrinsic membrane protein associated with the oxygen evolving complex of photosynthetic organisms. A multiple alignment of 14 sequences of this protein from cyanobacteria, algae and plants is presented. The alignment allows the identification of fully conserved residues and the recognition of one deletion and one insertion present in the plant sequences but not in cyanobacteria. A tree of similarity, deduced from pair-wise comparison and cluster analysis of the sequences, is also presented. The alignment and the derived consensus sequence are used to predict the secondary structure of the protein. This prediction indicates that it is a mainly-beta protein (25–38% of β-strands) with no more than 4% of α-helix. Fold recognition by threading is applied to obtain a topological 2D model of the protein. In this model the secondary structure elements are located, including several highly conserved loops. Some of these conserved loops are suggested to be important for the binding of the 33 kDa protein to Photosystem II and for the stability of the manganese cluster. These structural predictions are in good agreement with experimental data reported by several authors.

11.
Three HLA-DR genes were isolated from a Swedish HLA-DR3/4 insulin-dependent diabetes mellitus (IDDM) patient and characterized by restriction endonuclease mapping and nucleotide sequence analysis. Two out of the three DNA sequences differed from those of published DR-chain sequences. A DR-gene probe prepared from exon 4 and flanking sequences was used in a Southern blot analysis of blood donors' DNA and DNA from HLA-DR3/4 IDDM patients and HLA-DR-matched healthy control subjects. This probe differentiated HLA-DR3/4 IDDM patients and HLA-DR-matched controls in the Scandinavian population but not in the North American Caucasoid population.

12.
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively captures the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA consistently yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.

13.

Background

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The sharp increase in next-generation sequencing data has exposed a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types.

Methods

Distributed and parallel computing is a crucial technique for accelerating ultra-large (e.g. files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement HAlign-II, a highly cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction.

Results

Experiments on large-scale DNA and protein data sets (files of more than 1 GB) showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources.

Conclusions

HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.

14.
Schistosomes, major parasitic helminths, express numerous glycoconjugates that provoke humoral and cellular immune responses in the infected human host. The main pathology in schistosomiasis is due to the formation of granulomas around tissue-trapped eggs and the resulting organ damage. By using a mouse model of induction of granulomas by hepatic implantation of antigen-coated beads, it has been determined that the glycan part of schistosomal soluble egg antigens (SEA) initiates granulomogenesis. To identify which individual glycan elements in this complex SEA mixture are granulomogenic, we have tested in the same mouse model conjugates of various synthetic oligosaccharides characteristic for schistosome eggs, including GalNAcβ1-4GlcNAc (LacdiNAc, LDN), Galβ1-4(Fucα1-3)GlcNAc (Lewis x), Fucα1-2Fucα1-3GlcNAc (DF-Gn), and Fucα1-3GalNAcβ1-4(Fucα1-3)GlcNAc (F-LDN-F). Ribonuclease (RNase) A and B, and different fetuin glycoforms were included as controls. Only beads that carry glycoconjugates with terminal LacdiNAc or Galβ1-4GlcNAc (LacNAc, LN) elements gave rise to granulomas, with macrophage, lymphocyte, and eosinophil levels similar to the granulomatous lesions caused by schistosome eggs in a natural infection. Uncoated beads, and beads coated with fucosylated glycoconjugates or glycoconjugates lacking terminally exposed Gal or GalNAc, only attracted a monolayer of macrophages. These results indicate that the formation of hepatic granulomas is triggered specifically by glycoconjugates which carry terminal LacNAc or LacdiNAc, both constituents of the schistosome egg.

15.
Summary A method for detecting homology between two protein or nucleic acid sequences which require insertions or deletions for optimum alignment has been devised for use with a computer. Sequences are assessed for possible relationship by Monte Carlo methods involving comparisons between the alignment of the real sequences and alignments of randomly scrambled sequences of the same composition as the real sequences, each alignment having the optimum number of gaps. As each gap is successively introduced into a comparison (real or random) a maximum score is determined from the similarity of the aligned residues. From the distribution of the maximum alignment scores of randomly scrambled sequences having the same number of gaps, the percentage of random comparisons having higher scores is determined, and the smallest of these percentage levels for each pair of sequences (real or random) indicates the optimum alignment. The fraction of the comparisons of random sequences having percentage levels at their optimum alignment below that of the real sequence comparison at its optimum estimates the probability that such an alignment might have arisen by chance. Related sequences are detected since their optimum alignment score, by virtue of a contribution from ancestral homology in addition to optimised random considerations, occupies a more extreme position in the appropriate frequency distribution of scores than do the majority of optimum scores of randomly scrambled sequences in their appropriate distributions. Application of this optimum match method of sequence comparison shows that the sensitivity of the maximum match method of Needleman and Wunsch (1970) decreases quite dramatically with sequence comparisons which require only a few gaps for a reasonable alignment, or when sequences differ greatly in length. The maximum match method as applied by Barker and Dayhoff (1972) has the additional disadvantage that deletions which have occurred in the longer of two homologous protein sequences further decrease the sensitivity of detection of relationship. The constrained match method of Sankoff and Cedergren (1973) is seen to be misleading since large increments in the alignment score from added gaps do not necessarily result in a high total alignment score required to demonstrate sequence homology.
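The general idea of judging an alignment score against scores of randomly scrambled sequences of the same composition can be sketched in Python. This is a deliberately simplified illustration, not the optimum match procedure of the abstract: it uses a single Needleman-Wunsch global score with a linear gap penalty rather than a gap-by-gap analysis, and the scoring values, the function names `nw_score` and `shuffle_p_value`, and the number of trials are all assumptions.

```python
import random
from typing import Sequence, Tuple


def nw_score(a: Sequence[str], b: Sequence[str],
             match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_p_value(a: str, b: str, trials: int = 200, seed: int = 0) -> Tuple[int, float]:
    """Return the real score and the estimated chance of a scrambled pair scoring at least as high.

    Both sequences are shuffled, so composition and length are preserved.
    The +1 terms keep the estimate away from an impossible p-value of zero.
    """
    rng = random.Random(seed)
    real = nw_score(a, b)
    hits = 0
    for _ in range(trials):
        sa, sb = list(a), list(b)
        rng.shuffle(sa)
        rng.shuffle(sb)
        if nw_score(sa, sb) >= real:
            hits += 1
    return real, (hits + 1) / (trials + 1)
```

A small estimated probability suggests that the real score is unlikely to arise from composition alone; the resolution of the estimate is limited by the number of trials.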

16.

Background

Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.

Results

Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.

Conclusion

We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.
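The anchoring step described above relies on selecting a collinear chain of local similarities. A minimal Python sketch of that generic chaining idea follows; it is not the CHAOS algorithm itself. The `Seed` record, its fields, and the quadratic-time dynamic programme in `best_chain` are assumptions chosen for clarity, and seeds are assumed to be ungapped with length of at least 1.

```python
from typing import List, NamedTuple


class Seed(NamedTuple):
    """An ungapped local similarity: seq A from a_start, seq B from b_start (hypothetical record)."""
    a_start: int
    b_start: int
    length: int
    score: float


def best_chain(seeds: List[Seed]) -> List[Seed]:
    """Highest-scoring chain of seeds that are collinear and non-overlapping in both sequences."""
    if not seeds:
        return []
    order = sorted(seeds, key=lambda s: (s.a_start, s.b_start))
    best = [s.score for s in order]   # best chain score ending at each seed
    back = [-1] * len(order)          # predecessor index, -1 for chain start
    for i, q in enumerate(order):
        for j in range(i):
            p = order[j]
            # p may precede q only if it ends before q begins in both sequences.
            if p.a_start + p.length <= q.a_start and p.b_start + p.length <= q.b_start:
                if best[j] + q.score > best[i]:
                    best[i] = best[j] + q.score
                    back[i] = j
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = back[i]
    return chain[::-1]
```

The seeds in the returned chain can then serve as anchor points, so that a slower and more sensitive aligner only has to process the regions between consecutive anchors.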

17.
Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0688-z) contains supplementary material, which is available to authorized users.

18.
Electrospray mass spectrometry was used to identify precisely the proteolytic cleavage points within, and at the C-termini of, the proprotein forms of four Viciae lectins that give rise to their two-chain forms. The lectins examined were the pea and lentil lectins, favin and the Lathyrus odoratus lectin, which represent each of the four genera in this tribe. The molecular mass data showed single -chain forms for each lectin, with masses consistent with the available sequence and glycopeptide data, indicating that each came from a single proprotein. In contrast, the pea, lentil and L. odoratus -chains occurred in as many as five forms, due to multiple C-terminal cleavage points. Only favin showed a single -chain form. The -chain mass data were again consistent with the sequence information available, except for the lentil lectin -chain which was re-determined by protein sequencing. The two isolectin forms of this protein were shown to arise from -chain species with and without residue Lys53. The mass spectrum of concanavalin A was also examined and both the single-chain form and the two fragment forms showed no evidence of C-terminal heterogeneity.

19.
An approach to systematic detection of protein structural motifs (total citations: 2; self-citations: 0; citations by others: 2)
A procedure to detect similar local structures of proteins from Cα coordinates is presented. First, the conformations of seven-residue peptide segments are approximated by a limited number of representatives, each of which is assigned a symbol. Thus, the overall conformation of a protein is represented by a symbol string. The comparison of these symbol strings using a sequence alignment technique then gives pairs of similar local structures. These pairs are considered candidates of structural motifs. The application of the procedure to the analysis of 93 proteins gave 858 pairs of similar local structures, which included several well-known structural motifs such as the nucleotide-binding βαβ-unit and the calcium-binding EF hand. The characterization of amino acid patterns of similar local structures given by the procedure should be useful for the development of protein structure prediction based on the acquisition of empirical rules from a large-scale database.

20.

Background

Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem may be efficiently solved when only two sequences are considered, the exact inference of the optimal alignment easily gets computationally intractable for the multiple sequence alignment case. To cope with the high computational expenses, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. These methods however are not flexible to change the alignment of an already aligned group of sequences in the view of new data, resulting thus in compromises on the quality of the deriving alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve on the quality of the deriving alignments from popular aligners. We call ReformAlign a meta-aligner as it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is quite straightforward: at first, an existing alignment is used to construct a standard profile which summarizes the initial alignment and then all sequences are individually re-aligned against the formed profile. From each sequence-profile comparison, the alignment of each sequence against the profile is recorded and the final alignment is indirectly inferred by merging all the individual sub-alignments into a unified set. The employment of ReformAlign may often result in alignments which are significantly more accurate than the starting alignments.
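The first step described above, summarizing an existing alignment as a profile, can be sketched in Python. This is only an illustration under stated assumptions and not ReformAlign's implementation: the profile is taken to be a per-column table of symbol frequencies, the `column_score` scheme (expected match/mismatch score with gap symbols ignored) is hypothetical, and the subsequent re-alignment and merging steps are omitted.

```python
from collections import Counter
from typing import Dict, List

Profile = List[Dict[str, float]]


def build_profile(alignment: List[str]) -> Profile:
    """Per-column symbol frequencies (gap symbols included) of an existing alignment."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = len(alignment)
    profile: Profile = []
    for col in range(width):
        counts = Counter(row[col] for row in alignment)
        profile.append({sym: n / total for sym, n in counts.items()})
    return profile


def consensus(profile: Profile) -> str:
    """Most frequent symbol per column; ties are broken alphabetically."""
    return "".join(max(sorted(col), key=col.get) for col in profile)


def column_score(column: Dict[str, float], residue: str,
                 match: float = 1.0, mismatch: float = -1.0, gap: str = "-") -> float:
    """Expected score of placing `residue` in a column, ignoring the column's gap fraction."""
    return sum(
        freq * (match if sym == residue else mismatch)
        for sym, freq in column.items()
        if sym != gap
    )
```

A sequence-to-profile aligner could use `column_score` in place of the usual residue-to-residue substitution score when each sequence is re-aligned against the profile.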

Results

We evaluated the effect of ReformAlign on the alignments generated by ten leading alignment methods using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to statistically significantly more accurate alignments. Furthermore, we show that ReformAlign results in more substantial improvement in cases where the starting alignment is of relatively inferior quality or when the input sequences are harder to align.

Conclusions

The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-265) contains supplementary material, which is available to authorized users.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  ICP registration: 京ICP备09084417号