Found 20 similar records (search time: 0 ms).
2.
MOTIVATION: Multiple sequence alignments are essential tools for establishing homology relationships between proteins. Amino acids essential for function and/or structure are generally conserved, which provides key evidence for protein characterization. For distant proteins, however, it is more difficult to establish reliably the homology relationships that may exist between them. In this article, we show that secondary structure prediction is a valuable way to validate protein families at low sequence identity. RESULTS: We show that analysing secondary-structure compatibility is a reliable way to discard unrelated proteins from low-identity multiple alignments. AVAILABILITY: This validation is available through our NPS@ server (http://npsa-pbil.ibcp.fr).
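As a rough illustration of the idea, the Python sketch below scores how well predicted secondary structures agree across an alignment and flags rows that look incompatible with the rest. It is a toy built on assumptions, not the NPS@ procedure: each row is taken to be a three-state prediction (H, E, C) already projected onto the alignment with `-` for gaps, and the function names `ss_agreement` and `flag_incompatible` and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def ss_agreement(ss_a: str, ss_b: str) -> float:
    """Fraction of columns, gapped in neither row, where the predicted states agree."""
    if len(ss_a) != len(ss_b):
        raise ValueError("rows must have the same aligned length")
    pairs = [(a, b) for a, b in zip(ss_a, ss_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def flag_incompatible(ss_rows: list[str], threshold: float = 0.5) -> list[int]:
    """Indices of rows whose mean agreement with every other row is below threshold."""
    n = len(ss_rows)
    if n < 2:
        return []
    totals = [0.0] * n
    # Accumulate each pairwise agreement score for both rows of the pair.
    for i, j in combinations(range(n), 2):
        score = ss_agreement(ss_rows[i], ss_rows[j])
        totals[i] += score
        totals[j] += score
    return [i for i in range(n) if totals[i] / (n - 1) < threshold]
```

For example, `flag_incompatible(["HHHCC", "HHHCC", "EEECC"])` flags only the third row, whose mean agreement with the other two is 0.4.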
3.
Darío Guerrero Rocío Bautista David P Villalobos Francisco R Cantón M Gonzalo Claros 《Algorithms for molecular biology : AMB》2010,5(1):24
Background
Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses.
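AlignMiner's own scoring methods are not described in this abstract; the sketch below only illustrates the general notion of locating divergent rather than conserved regions. It assumes an alignment given as equal-length strings, uses the frequency of the most common character as a per-column conservation score (gaps are counted as ordinary characters), and reports windows whose mean score is at or below a threshold. `column_conservation`, `divergent_windows`, the window length and the threshold are all hypothetical.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of rows carrying the most common character in each column."""
    nseq = len(alignment)
    return [
        Counter(row[i] for row in alignment).most_common(1)[0][1] / nseq
        for i in range(len(alignment[0]))
    ]


def divergent_windows(alignment: list[str], window: int = 4,
                      max_conservation: float = 0.4) -> list[int]:
    """Start columns of windows whose mean conservation is at most the threshold."""
    cons = column_conservation(alignment)
    return [
        start
        for start in range(len(cons) - window + 1)
        if sum(cons[start:start + window]) / window <= max_conservation
    ]
```

On `["AAAAACGT", "AAAATGCA", "AAAAGTAC"]` only the last four columns vary in every row, so the only reported window starts at column 4.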
6.
Chakrabarti S Lanczycki CJ Panchenko AR Przytycka TM Thiessen PA Bryant SH 《Nucleic acids research》2006,34(9):2598-2606
Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and provide an understanding of phylogenetic history of domain families, their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserves the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/REFINER) and will be incorporated into the next release of the Cn3D structure/alignment viewer.
7.
Background
In 2004, Bejerano et al. announced the startling discovery of hundreds of "ultraconserved elements", long genomic sequences perfectly conserved across human, mouse, and rat. Their announcement stimulated a flurry of subsequent research.
8.
Höchsmann M Voss B Giegerich R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(1):53-62
In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Structure comparison is, in analogy to sequence comparison, the essential technique to infer related function. We provide a method for computing multiple alignments of RNA secondary structures under the tree alignment model, which is suitable to cluster RNA molecules purely on the structural level, i.e., sequence similarity is not required. We give a systematic generalization of the profile alignment method from strings to trees and forests. We introduce a tree profile representation of RNA secondary structure alignments which allows reasonable scoring in structure comparison. Besides the technical aspects, an RNA profile is a useful data structure to represent multiple structures of RNA sequences. Moreover, we propose a visualization of RNA consensus structures that is enriched by the full sequence information.
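Tree-based structure comparison starts from a tree (or forest) view of a secondary structure. The sketch below shows only that preliminary step: it converts a dot-bracket string into an ordered forest in which each base pair is an inner node whose children are the structure enclosed by that pair, and each unpaired base is a leaf. The dot-bracket input, the tuple encoding and the function names are assumptions for illustration; the sketch does not implement the paper's tree profile alignment, and pseudoknots cannot be represented this way.

```python
def base_pairs(dot_bracket: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs for a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def to_forest(dot_bracket: str) -> list[tuple]:
    """Ordered forest: ('P', i, j, children) for a pair, ('U', i) for an unpaired base."""
    partner = {}
    for i, j in base_pairs(dot_bracket):
        partner[i] = j
        partner[j] = i

    def build(lo: int, hi: int) -> list[tuple]:
        forest, k = [], lo
        while k < hi:
            if k in partner:  # with proper nesting, k is an opening position here
                j = partner[k]
                forest.append(("P", k, j, build(k + 1, j)))
                k = j + 1
            else:
                forest.append(("U", k))
                k += 1
        return forest

    return build(0, len(dot_bracket))
```

For `"((.))."` this yields a pair (0, 4) enclosing a pair (1, 3) that encloses unpaired base 2, followed by unpaired base 5.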
9.
Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis (cited 34 times; 0 self-citations, 34 by others)
Castresana J 《Molecular biology and evolution》2000,17(4):540-552
The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
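A simplified block-selection pass in this spirit is sketched below. It is loosely modelled on the criteria listed in the abstract (conserved positions, no gaps, limited runs of non-conserved positions, highly conserved flanks, a minimum block length) but does not reproduce Castresana's exact rules or default parameters; the thresholds and the names `classify_columns`, `select_blocks` and `trim_alignment` are hypothetical.

```python
from collections import Counter


def classify_columns(alignment, conserved=0.5, highly=0.85):
    """Label each column 'gap', 'high', 'cons' or 'non' by majority-residue frequency."""
    nseq = len(alignment)
    labels = []
    for i in range(len(alignment[0])):
        column = [row[i] for row in alignment]
        if "-" in column:
            labels.append("gap")  # this sketch allows no gapped positions in blocks
            continue
        frac = Counter(column).most_common(1)[0][1] / nseq
        labels.append("high" if frac >= highly else "cons" if frac >= conserved else "non")
    return labels


def select_blocks(alignment, max_nonconserved=2, min_len=4):
    """Return half-open (start, end) column ranges of blocks to keep."""
    labels = classify_columns(alignment)
    blocks, i, n = [], 0, len(labels)
    while i < n:
        if labels[i] == "gap":
            i += 1
            continue
        # Extend the candidate until a gap or too many contiguous non-conserved columns.
        j, run = i, 0
        while j < n and labels[j] != "gap":
            run = run + 1 if labels[j] == "non" else 0
            if run > max_nonconserved:
                break
            j += 1
        # Trim both ends so that each block is flanked by highly conserved columns.
        s, e = i, j
        while s < e and labels[s] != "high":
            s += 1
        while e > s and labels[e - 1] != "high":
            e -= 1
        if e - s >= min_len:
            blocks.append((s, e))
        i = max(j, i + 1)
    return blocks


def trim_alignment(alignment, blocks):
    """Concatenate the selected blocks of every row."""
    return ["".join(row[s:e] for s, e in blocks) for row in alignment]
```

The design keeps the per-column classification separate from block assembly, so the thresholds can be tuned without touching the block logic.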
10.
MOTIVATION: The Dss statistic was proposed by McGuire et al. (Mol. Biol. Evol., 14, 1125-1131, 1997) for scanning data sets for the presence of recombination, an important step in some phylogenetic analyses. The statistic, however, could not distinguish well between among-site rate variation and recombination, and had no statistical test for significant values. This paper addresses these shortcomings. RESULTS: A modification to the Dss statistic is proposed that largely accounts for rate variation. A statistical test based on parametric bootstrapping is also suggested. AVAILABILITY: The TOPAL package (version 2) may be accessed from http://www.bioss.sari.ac.uk/frank/Genetics and by anonymous FTP from ftp://ftp.bioss.sari.ac.uk in the directory pub/phylogeny/topal. CONTACT: frank@bioss.sari.ac.uk
11.
O'Sullivan O Suhre K Abergel C Higgins DG Notredame C 《Journal of molecular biology》2004,340(2):385-395
Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00 and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, a collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue improves alignment accuracy on our HOMSTRAD dataset by four percentage points when only one structure per dataset is used. Using two structures yields an improvement of ten percentage points. Measurements on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of the number of provided structures to the total number of sequences. Our results suggest that, for distantly related sequences, a single structure may not be enough to compute an accurate multiple sequence alignment.
12.
Background
A β-turn is a type of protein secondary structure that plays a significant role in protein folding, stability and molecular recognition. Several methods for predicting β-turns from protein sequences have been developed to date, but their prediction quality is relatively poor. The novelty of the proposed sequence-based β-turn predictor lies in its use of window-based information extracted from four predicted three-state secondary structures, which, together with a selected set of position-specific scoring matrix (PSSM) values, serves as input to a support vector machine (SVM) predictor.
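The window-based encoding described above might be assembled roughly as follows. This is an illustrative assumption, not the authors' feature set: each position in a window contributes a one-hot H/E/C vector from every secondary-structure prediction plus that residue's PSSM row, and positions beyond either terminus are zero-padded. The SVM itself is omitted, and `window_features` and its parameters are hypothetical.

```python
STATES = "HEC"  # three-state secondary structure alphabet


def window_features(predictions, pssm, center, half_window=2):
    """Feature vector for residue `center` built from a window of neighbouring positions.

    predictions: predicted three-state strings (one per predictor), each as long
                 as the protein.
    pssm:        one row of PSSM values per residue.
    """
    length = len(pssm)
    width = len(STATES) * len(predictions) + len(pssm[0])
    features = []
    for offset in range(-half_window, half_window + 1):
        pos = center + offset
        if 0 <= pos < length:
            # One-hot encode the state from every predictor, then append the PSSM row.
            for ss in predictions:
                features.extend(1.0 if ss[pos] == state else 0.0 for state in STATES)
            features.extend(float(v) for v in pssm[pos])
        else:
            features.extend([0.0] * width)  # zero-pad beyond the termini
    return features
```

With four predictions, as in the abstract, each window position contributes 12 one-hot values plus the PSSM columns.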
14.
Given a family of related sequences, one can first determine alignments between various pairs of those sequences, then construct a simultaneous alignment of all the sequences that is determined in a natural manner by the set of pairwise alignments. This approach is sometimes effective for exposing the existence and locations of conserved regions, which can then be aligned by more sensitive multiple-alignment methods. This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments.
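The abstract does not spell out its algorithm, so the sketch below shows one well-known way to build a multiple alignment that agrees with a set of pairwise alignments: a center-star style merge. Every pairwise alignment is assumed to contain the same center sequence, and the gap columns inserted relative to the center are padded to the widest insertion seen at each center position. Treat it as an illustration of the general idea under that assumption; `star_merge` is a hypothetical name, and left-justifying inserted residues from different pairs is one arbitrary choice.

```python
def _project(center_row: str, other_row: str, n: int):
    """Split a pairwise alignment into the residue matched to each center position
    and the characters inserted before each center position (index n = after the end)."""
    inserts = [[] for _ in range(n + 1)]
    matched = []
    j = 0
    for c, s in zip(center_row, other_row):
        if c == "-":
            inserts[j].append(s)
        else:
            matched.append(s)
            j += 1
    return inserts, matched


def star_merge(center: str, pairwise: list[tuple[str, str]]) -> list[str]:
    """Merge pairwise alignments (center_row, other_row) into one multiple alignment."""
    n = len(center)
    projected = []
    for center_row, other_row in pairwise:
        if center_row.replace("-", "") != center or len(center_row) != len(other_row):
            raise ValueError("each pair must align the center sequence to another row")
        projected.append(_project(center_row, other_row, n))
    # Widest insertion observed before each center position across all pairs.
    widths = [max((len(ins[j]) for ins, _ in projected), default=0) for j in range(n + 1)]

    def render(inserts, matched):
        parts = []
        for j in range(n + 1):
            parts.append("".join(inserts[j]).ljust(widths[j], "-"))
            if j < n:
                parts.append(matched[j])
        return "".join(parts)

    rows = [render([[] for _ in range(n + 1)], list(center))]
    rows += [render(ins, m) for ins, m in projected]
    return rows
```

For example, merging `("AC-GT", "ACTGT")` and `("ACGT-", "A-GTA")` around the center `"ACGT"` gives `["AC-GT-", "ACTGT-", "A--GTA"]`, and each pairwise alignment can be read back from its two rows.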
15.
Mironov AA 《Molekuliarnaia biologiia》2007,41(4):711-718
RNA secondary structure prediction is a classical problem in bioinformatics. The most efficient approach to this problem is based on comparative analysis: the algorithms use a multiple alignment of the RNA sequences and find a common RNA structure. This paper describes a new algorithm for this task that does not require a predefined multiple alignment. The main idea of the algorithm is a MEME-like iterative search for an abstract profile at different levels. At the first level, the algorithm searches for common blocks in the RNA sequences and creates a chain of these blocks. At the next step, it refines the chain of common blocks. At the last stage, it searches for sets of common helices whose locations are consistent relative to the common blocks. The algorithm was tested on sets of tRNAs with a subset of junk sequences and on RFN riboswitches. It is implemented as a web server (http://bioinf.fbb.msu.ru/RNAAlign/).
16.
We present a novel method for multiple alignment of protein structures and detection of structural motifs. To date, only a few methods are available for addressing this task. Most of them are based on a series of pairwise comparisons. In contrast, MASS (Multiple Alignment by Secondary Structures) considers all the given structures at the same time. Exploiting the secondary structure representation aids in filtering out noisy results and in making the method highly efficient and robust. MASS disregards the sequence order of the secondary structure elements. Thus, it can find non-sequential and even non-topological structural motifs. An important novel feature of MASS is subset alignment detection: It does not require that all the input molecules be aligned. Rather, MASS is capable of detecting structural motifs shared only by a subset of the molecules. Given its high efficiency and capability of detecting subset alignments, MASS is suitable for a broad range of challenging applications: It can handle large-scale protein ensembles (on the order of tens) that may be heterogeneous, noisy, topologically unrelated and contain structures of low resolution.
17.
Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. (cited 10 times; 0 self-citations, 10 by others)
N Stojanovic L Florea C Riemer D Gumucio J Slightom M Goodman W Miller R Hardison 《Nucleic acids research》1999,27(19):3899-3910
Conserved segments in DNA or protein sequences are strong candidates for functional elements and thus appropriate methods for computing them need to be developed and compared. We describe five methods and computer programs for finding highly conserved blocks within previously computed multiple alignments, primarily for DNA sequences. Two of the methods are already in common use; these are based on good column agreement and high information content. Three additional methods find blocks with minimal evolutionary change, blocks that differ in at most k positions per row from a known center sequence and blocks that differ in at most k positions per row from a center sequence that is unknown a priori. The center sequence in the latter two methods is a way to model potential binding sites for known or unknown proteins in DNA sequences. The efficacy of each method was evaluated by analysis of three extensively analyzed regulatory regions in mammalian beta-globin gene clusters and the control region of bacterial arabinose operons. Although all five methods have quite different theoretical underpinnings, they produce rather similar results on these data sets when their parameters are adjusted to best approximate the experimental data. The optimal parameters for the method based on information content varied little for different regulatory regions of the beta-globin gene cluster and hence may be extrapolated to many other regulatory regions. The programs based on maximum allowed mismatches per row have simple parameters whose values can be chosen a priori and thus they may be more useful than the other methods when calibration against known functional sites is not available.
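The "at most k differences per row from a known center sequence" criterion can be sketched as a greedy left-to-right scan. This is an assumption-based illustration rather than the authors' program: blocks are half-open column ranges, reported blocks do not overlap, a gap counts as a difference, and `k_mismatch_blocks`, `k` and `min_len` are hypothetical names.

```python
def k_mismatch_blocks(alignment: list[str], center: str, k: int = 1,
                      min_len: int = 5) -> list[tuple[int, int]]:
    """Greedy, non-overlapping blocks in which every row differs from `center`
    in at most k columns; returns half-open (start, end) column ranges."""
    ncols = len(center)
    blocks = []
    start = 0
    while start < ncols:
        mismatches = [0] * len(alignment)
        end = start
        # Extend the block while every row stays within k differences of the center.
        while end < ncols:
            extended = [m + (row[end] != center[end]) for m, row in zip(mismatches, alignment)]
            if max(extended) > k:
                break
            mismatches = extended
            end += 1
        if end - start >= min_len:
            blocks.append((start, end))
        # Resume after an accepted block, otherwise slide by one column.
        start = max(end, start + 1) if end - start >= min_len else start + 1
    return blocks
```

Because `k` and `min_len` have a direct interpretation, they can be fixed before looking at the data, which matches the abstract's remark that these parameters can be chosen a priori.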
18.
Marchler-Bauer A Anderson JB DeWeese-Scott C Fedorova ND Geer LY He S Hurwitz DI Jackson JD Jacobs AR Lanczycki CJ Liebert CA Liu C Madej T Marchler GH Mazumder R Nikolskaya AN Panchenko AR Rao BS Shoemaker BA Simonyan V Song JS Thiessen PA Vasudevan S Wang Y Yamashita RA Yin JJ Bryant SH 《Nucleic acids research》2003,31(1):383-387
The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST(R) at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.
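For readers unfamiliar with PSSMs, the sketch below builds a simple log-odds matrix from a protein alignment and slides it along a query to find the best ungapped placement. It is a deliberately naive stand-in: uniform background frequencies and a flat pseudocount are assumed, whereas the PSSMs used by RPS-BLAST and CD-Search are constructed differently. `build_pssm` and `best_hit` are hypothetical names.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(alignment, pseudocount=1.0):
    """Log2-odds matrix (one dict per column) against uniform background frequencies."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for i in range(len(alignment[0])):
        column = [row[i] for row in alignment if row[i] in ALPHABET]  # gaps are skipped
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def best_hit(pssm, sequence):
    """(start, score) of the highest-scoring ungapped placement, or None if too short."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

For example, a matrix built from `["WCW", "WCW", "WCF"]` places best at position 3 of `"AAAWCWAAA"`.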
19.
Muth T García-Martín JA Rausell A Juan D Valencia A Pazos F 《Bioinformatics (Oxford, England)》2012,28(4):584-586
We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions, as well as positions with a family-dependent conservation pattern, in multiple sequence alignments. Among other things, the program can run different methods for extracting these positions, combine their results and visualize them in protein 3D structures and sequence spaces. Availability and implementation: JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
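Xdet and S3Det use their own scoring schemes. As a much stricter toy definition, the sketch below reports columns that are identical in every sequence, and columns that are invariant within each user-supplied subfamily but differ between subfamilies. The subfamily labels, the treatment of gaps and the names `fully_conserved` and `family_dependent` are assumptions for illustration.

```python
def fully_conserved(alignment: list[str]) -> list[int]:
    """Columns holding the same non-gap character in every row."""
    return [
        i for i in range(len(alignment[0]))
        if alignment[0][i] != "-" and all(row[i] == alignment[0][i] for row in alignment)
    ]


def family_dependent(alignment: list[str], groups: list[str]) -> list[int]:
    """Columns invariant within each subfamily but not identical across subfamilies.

    groups[r] is the (user-supplied) subfamily label of row r.
    """
    positions = []
    for i in range(len(alignment[0])):
        per_group: dict[str, set[str]] = {}
        for row, label in zip(alignment, groups):
            per_group.setdefault(label, set()).add(row[i])
        # Every subfamily must use a single character in this column.
        if all(len(chars) == 1 for chars in per_group.values()):
            residues = {next(iter(chars)) for chars in per_group.values()}
            if len(residues) > 1 and "-" not in residues:
                positions.append(i)
    return positions
```

With rows `["ACDE", "ACDF", "ACKG", "ACKG"]` in subfamilies `["a", "a", "b", "b"]`, columns 0 and 1 are fully conserved, and only column 2 (D in one subfamily, K in the other) is family-dependent.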
20.
Saccharomyces SRP RNA secondary structures: a conserved S-domain and extended Alu-domain (cited 2 times; 0 self-citations, 2 by others)
The contribution made by the RNA component of signal recognition particle (SRP) to its function in protein targeting is poorly understood. We have generated a complete secondary structure for Saccharomyces cerevisiae SRP RNA, scR1. The structure conforms to that of other eukaryotic SRP RNAs. It is rod-shaped with, at opposite ends, binding sites for proteins required for the SRP functions of signal sequence recognition (S-domain) and translational elongation arrest (Alu-domain). Micrococcal nuclease digestion of purified S. cerevisiae SRP separated the S-domain of the RNA from the Alu-domain as a discrete fragment. The Alu-domain resolved into several stable fragments indicating a compact structure. Comparison of scR1 with SRP RNAs of five yeast species related to S. cerevisiae revealed the S-domain to be the most conserved region of the RNA. Extending data from nuclease digestion with phylogenetic comparison, we built the secondary structure model for scR1. The Alu-domain contains large extensions, including a sequence with hallmarks of an expansion segment. Evolutionarily conserved bases are placed in the Alu- and S-domains as in other SRP RNAs, the exception being an unusual GU(4)A loop closing the helix onto which the signal sequence binding Srp54p assembles (domain IV). Surprisingly, several mutations within the predicted Srp54p binding site failed to disrupt SRP function in vivo. However, the strength of the Srp54p-scR1 and, to a lesser extent, Sec65p-scR1 interaction was decreased in these mutant particles. The availability of a secondary structure for scR1 will facilitate interpretation of data from genetic analysis of the RNA.