Similar literature
Found 20 similar articles (search time: 281 ms)
1.
A new version of the RDP (Ribosomal Database Project).   Total citations: 69, self-citations: 0, citations by others: 69
The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.

2.
Substitution matrices have been useful for sequence alignment and protein sequence comparisons. The BLOSUM series of matrices, which had been derived from a database of alignments of protein blocks, improved the accuracy of alignments previously obtained from the PAM-type matrices estimated from only closely related sequences. Although BLOSUM matrices are scoring matrices now widely used for protein sequence alignments, they do not describe an evolutionary model. BLOSUM matrices do not permit the estimation of the actual number of amino acid substitutions between sequences by correcting for multiple hits. The method presented here uses the Blocks database of protein alignments, along with the additivity of evolutionary distances, to approximate the amino acid substitution probabilities as a function of actual evolutionary distance. The PMB (Probability Matrix from Blocks) defines a new evolutionary model for protein evolution that can be used for evolutionary analyses of protein sequences. Our model is directly derived from, and thus compatible with, the BLOSUM matrices. The model has the additional advantage of being easily implemented.

3.
Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model fitting procedure to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV, and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error.
We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.

4.
5.
The pseudouridine synthases catalyze the isomerization of uridine to pseudouridine at particular positions in certain RNA molecules. Genomic database searches and sequence alignments using the first four identified pseudouridine synthases led Koonin (Koonin, E. V. (1996) Nucleic Acids Res. 24, 2411-2415) and, independently, Santi and co-workers (Gustafsson, C., Reid, R., Greene, P. J., and Santi, D. V. (1996) Nucleic Acids Res. 24, 3756-3762) to group this class of enzyme into four families, which display no statistically significant global sequence similarity to each other. Upon further scrutiny (Huang, H. L., Pookanjanatavip, M., Gu, X. G., and Santi, D. V. (1998) Biochemistry 37, 344-351), the Santi group discovered that a single aspartic acid residue is the only amino acid present in all of the aligned sequences; they then demonstrated that this aspartic acid residue is catalytically essential in one pseudouridine synthase. To test the functional significance of the sequence alignments in light of the global dissimilarity between the pseudouridine synthase families, we changed the aspartic acid residue in representatives of two additional families to both alanine and cysteine: the mutant enzymes are catalytically inactive but retain the ability to bind tRNA substrate. We have also verified that the mutant enzymes do not release uracil from the substrate at a rate significant relative to turnover by the wild-type pseudouridine synthases. Our results clearly show that the aligned aspartic acid residue is critical for the catalytic activity of pseudouridine synthases from two additional families of these enzymes, supporting the predictive power of the sequence alignments and suggesting that the sequence motif containing the aligned aspartic acid residue might be a prerequisite for pseudouridine synthase function.

6.
Multiple sequence alignments are powerful tools for understanding the structures, functions, and evolutionary histories of linear biological macromolecules (DNA, RNA, and proteins), and for finding homologs in sequence databases. We address several ontological issues related to RNA sequence alignments that are informed by structure. Multiple sequence alignments are usually shown as two-dimensional (2D) matrices, with rows representing individual sequences, and columns identifying nucleotides from different sequences that correspond structurally, functionally, and/or evolutionarily. However, the requirement that sequences and structures correspond nucleotide-by-nucleotide is unrealistic and hinders representation of important biological relationships. High-throughput sequencing efforts are also rapidly making 2D alignments unmanageable because of vertical and horizontal expansion as more sequences are added. Solving the shortcomings of traditional RNA sequence alignments requires explicit annotation of the meaning of each relationship within the alignment. We introduce the notion of “correspondence,” which is an equivalence relation between RNA elements in sets of sequences as the basis of an RNA alignment ontology. The purpose of this ontology is twofold: first, to enable the development of new representations of RNA data and of software tools that resolve the expansion problems with current RNA sequence alignments, and second, to facilitate the integration of sequence data with secondary and three-dimensional structural information, as well as other experimental information, to create simultaneously more accurate and more exploitable RNA alignments.

7.
The European large subunit ribosomal RNA database   Total citations: 5, self-citations: 1, citations by others: 4
The European Large Subunit (LSU) Ribosomal RNA (rRNA) database is accessible via the rRNA WWW Server at URL http://rrna.uia.ac.be/lsu/. It is a curated database that compiles complete or nearly complete LSU rRNA sequences in aligned form, and also incorporates secondary structure information for each sequence. Taxonomic information, literature references and other information about the sequences are also available, and can be searched via the WWW interface.

8.
Ribosomal DNA internal transcribed spacers (ITS) and partial external transcribed spacers (ETSf) are widely used to infer evolutionary hypotheses. However, there is generally little consideration given to the secondary structures of these small RNA molecules and their potential effects on sequence alignment and phylogenetic analyses. Intergeneric relationships amongst three of the four major lineages in the Sapindaceae (the Dodonaeoideae, Hippocastanoideae and Xanthoceroideae) were assessed in two steps. First, secondary structure predictions were generated for ITS and partial ETSf sequences and used to assist alignment of the sequences. Second, the alignment was analyzed using RNA-specific models of sequence evolution that account for the variation in nucleotide evolution in the independent loop and covarying stem regions of the ribosomal spacers. These models and the phylogeny drawn from these analyses were compared with those from analyses using ‘traditional’ 4-state models and previous plastid analyses. These analyses showed that paired-site models developed to deal specifically with stem structures in RNA-encoding sequences account for the evolutionary history of the sequences more appropriately than traditional 4-state substitution models.

9.
The European Large Subunit Ribosomal RNA Database compiles all complete or nearly complete large subunit ribosomal RNA sequences available from public sequence databases. These are provided in aligned format and the secondary structure, as derived by comparative sequence analysis, is included. Additional information about the sequences such as literature references and taxonomic information is also included. The database is available from our WWW server at http://rrna.uia.ac.be/lsu/.

10.
Sequence alignment is a standard method to infer evolutionary, structural, and functional relationships among sequences. The quality of alignments depends on the substitution matrix used. Here we derive matrices based on superimpositions from protein pairs of similar structure, but of low or no sequence similarity. In a performance test the matrices are compared with 12 other previously published matrices. It is found that the structure-derived matrices are applicable for comparisons of distantly related sequences. We investigate the influence of evolutionary relationships of protein pairs on the alignment accuracy.

11.
We introduce the PSSH ('Protein Sequence-to-Structure Homologies') database derived from HSSP2, an improved version of the HSSP ('Homology-derived Secondary Structure of Proteins') database [Dodge et al. (1998) Nucleic Acids Res., 26, 313-315]. Whereas each HSSP entry lists all protein sequences related to a given 3D structure, PSSH is the 'inverse', with each entry listing all structures related to a given sequence. In addition, we introduce two other derived databases: HSSPchain, in which each entry lists all sequences related to a given PDB chain, and HSSPalign, in which each entry gives details of one sequence aligned onto one PDB chain. This re-organization makes it easier to navigate from sequence to structure, and to map sequence features onto 3D structures. Currently (September 2002), PSSH provides structural information for over 400 000 protein sequences, covering 48% of SWALL and 61% of SWISS-PROT sequences; HSSPchain provides sequence information for over 25 000 PDB chains, and HSSPalign gives over 14 million sequence-to-structure alignments. The databases can be accessed via SRS 3D, an extension to the SRS system, at http://srs3d.ebi.ac.uk/.
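The 'inverse' relationship between an HSSP-style listing and a PSSH-style listing can be pictured as inverting a one-to-many index. The following is only a minimal sketch of that idea, not the databases' actual format or build procedure; the function name `invert_index` and all identifiers in the example data are hypothetical.

```python
from collections import defaultdict


def invert_index(structure_to_sequences):
    """Turn {structure_id: [sequence_id, ...]} into {sequence_id: [structure_id, ...]}."""
    sequence_to_structures = defaultdict(list)
    for structure_id, sequence_ids in structure_to_sequences.items():
        for sequence_id in sequence_ids:
            sequence_to_structures[sequence_id].append(structure_id)
    # Sort each list so the output is stable and reproducible.
    return {seq: sorted(structs) for seq, structs in sequence_to_structures.items()}


# Hypothetical identifiers, for illustration only (not real PDB or SWISS-PROT entries).
hssp_like = {
    "struct_A": ["seq_1", "seq_2"],
    "struct_B": ["seq_2", "seq_3"],
}
pssh_like = invert_index(hssp_like)
print(pssh_like)
```

Each key of `pssh_like` is a sequence and each value lists the structures related to it, which is the direction of lookup the abstract describes for PSSH.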

12.
On the basis of sequence alignments, the pseudouridine synthases were grouped into four families that share no statistically significant global sequence similarity, though some common sequence motifs were discovered [Koonin, E. V. (1996) Nucleic Acids Res. 24, 2411-2415; Gustafsson, C., Reid, R., Greene, P. J., and Santi, D. V. (1996) Nucleic Acids Res. 24, 3756-3762]. We have investigated the functional significance of these alignments by substituting the nearly invariant lysine and proline residues in Motif I of RluA and TruB, pseudouridine synthases belonging to different families. Contrary to our expectations, the altered enzymes display only very mild kinetic impairment. Substitution of the aligned lysine and proline residues does, however, reduce structural stability, consistent with a temperature sensitive phenotype that results from substitution of the cognate proline residue in Cbf5p, a yeast homologue of TruB [Zebarjadian, Y., King, T., Fournier, M. J., Clarke, L., and Carbon, J. (1999) Mol. Cell. Biol. 19, 7461-7472]. Together, our data support a functional role for Motif I, as predicted by sequence alignments, though the effect of substituting the highly conserved residues was milder than we anticipated. By extrapolation, our findings also support the assignment of pseudouridine synthase function to certain physiologically important eukaryotic proteins that contain Motif I, including the human protein dyskerin, alteration of which leads to the disease dyskeratosis congenita.

13.
High-Density Microarray of Small-Subunit Ribosomal DNA Probes   Total citations: 19, self-citations: 3, citations by others: 16
Ribosomal DNA sequence analysis, originally conceived as a way to provide a universal phylogeny for life forms, has proven useful in many areas of biological research. Some of the most promising applications of this approach are presently limited by the rate at which sequences can be analyzed. As a step toward overcoming this limitation, we have investigated the use of photolithography chip technology to perform sequence analyses on amplified small-subunit rRNA genes. The GeneChip (Affymetrix Corporation) contained 31,179 20-mer oligonucleotides that were complementary to a subalignment of sequences in the Ribosomal Database Project (RDP) (B. L. Maidak et al., Nucleic Acids Res. 29:173-174, 2001). The chip and standard Affymetrix software were able to correctly match small-subunit ribosomal DNA amplicons with the corresponding sequences in the RDP database for 15 of 17 bacterial species grown in pure culture. When bacteria collected from an air sample were tested, the method compared favorably with cloning and sequencing amplicons in determining the presence of phylogenetic groups. However, the method could not resolve the individual sequences comprising a complex mixed sample. Given these results and the potential for future enhancement of this technology, it may become widely useful.

14.
High-density microarray of small-subunit ribosomal DNA probes   Total citations: 18, self-citations: 0, citations by others: 18

15.
MOTIVATION: We review proposed syntheses of probabilistic sequence alignment, profiling and phylogeny. We develop a multiple alignment algorithm for Bayesian inference in the links model proposed by Thorne et al. (1991, J. Mol. Evol., 33, 114-124). The algorithm, described in detail in Section 3, samples from and/or maximizes the posterior distribution over multiple alignments for any number of DNA or protein sequences, conditioned on a phylogenetic tree. The individual sampling and maximization steps of the algorithm require no more computational resources than pairwise alignment. METHODS: We present a software implementation (Handel) of our algorithm and report test results on (i) simulated data sets and (ii) the structurally informed protein alignments of BAliBASE (Thompson et al., 1999, Nucleic Acids Res., 27, 2682-2690). RESULTS: We find that the mean sum-of-pairs score (a measure of residue-pair correspondence) for the BAliBASE alignments is only 13% lower for Handel than for CLUSTALW (Thompson et al., 1994, Nucleic Acids Res., 22, 4673-4680), despite the relative simplicity of the links model (CLUSTALW uses affine gap scores and increased penalties for indels in hydrophobic regions). With reference to these benchmarks, we discuss potential improvements to the links model and implications for Bayesian multiple alignment and phylogenetic profiling. AVAILABILITY: The source code to Handel is freely distributed on the Internet at http://www.biowiki.org/Handel under the terms of the GNU Public License (GPL, 2000, http://www.fsf.org./copyleft/gpl.html).
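The sum-of-pairs score mentioned above is described only as "a measure of residue-pair correspondence". One common reading, assumed here, is the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The sketch below implements that reading; it is not taken from Handel or from the BAliBASE scoring program, and the function names and example alignments are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    Each pair is ((row_i, residue_index_i), (row_j, residue_index_j)) with row_i < row_j.
    """
    next_residue = [0] * len(alignment)  # index of the next ungapped residue in each row
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row, char in enumerate(column):
            if char != gap:
                present.append((row, next_residue[row]))
                next_residue[row] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    reference_pairs = aligned_pairs(reference)
    if not reference_pairs:
        return 0.0
    return len(aligned_pairs(test) & reference_pairs) / len(reference_pairs)


# Toy alignments of the same two sequences (ACGT and ACTGT), for illustration only.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sum_of_pairs_score(test, reference))
```

Under this definition a test alignment identical to the reference scores 1.0, and every reference pair that the test alignment misplaces lowers the score.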

16.
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.

17.
Goonesekere NC, Lee B. Proteins, 2008, 71(2): 910-919
Sequence homology detection relies on score matrices, which reflect the frequency of amino acid substitutions observed in a dataset of homologous sequences. The substitution matrices in popular use today are usually constructed without consideration of the structural context in which the substitution takes place. Here, we present amino acid substitution matrices specific for the particular polar-nonpolar environment of the amino acid. As expected, these matrices [context-specific substitution matrices (CSSMs)] show striking differences from the popular BLOSUM62 matrix, which does not include structural information. When incorporated into BLAST and PSI-BLAST, CSSM outperformed BLOSUM matrices as assessed by ROC curve analyses of the number of true and false hits and by the accuracy of the sequence alignments to the hit sequences. These findings are also of relevance to profile-profile-based methods of homology detection, since CSSMs may help build a better profile. Profiles generated for protein sequences in PDB using CSSM-PSI-BLAST will be made available for searching via RPSBLAST through our web site http://lmbbi.nci.nih.gov/.

18.
A widely used algorithm for computing an optimal local alignment between two sequences requires a parameter set with a substitution matrix and gap penalties. It is recognized that a proper parameter set should be selected to suit the level of conservation between sequences. We describe an algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences. In the algorithm, a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances. The evolutionary distance of the selected substitution matrix is defined as the distance of the computed alignment. To show the effects of gap penalties on alignments and their distances and help select appropriate gap penalties, alignments and their distances are computed at various gap penalties. The algorithm has been implemented as a computer program named SimDist. The SimDist program was compared with an existing local alignment program named SIM for finding reciprocally best-matching pairs (RBPs) of sequences in each of 100 protein families, where RBPs are commonly used as an operational definition of orthologous sequences. SimDist produced more accurate results than SIM on 50 of the 100 families, whereas both programs produced the same results on the other 50 families. SimDist was also used to compare three types of substitution matrices in scoring 444,461 pairs of homologous sequences from the 100 families.
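The selection step described above (score the pair under each matrix in a series, keep the matrix with the highest local alignment score, and report its distance) can be sketched as follows. This is not the SimDist implementation: it assumes a Smith-Waterman score with a single linear gap penalty, and it replaces real substitution matrices with toy match/mismatch scorers. The distance labels 20, 120 and 250 and all function names are hypothetical.

```python
def local_alignment_score(a, b, score, gap_penalty):
    """Best Smith-Waterman local alignment score, using a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            current[j] = max(
                0,                                            # start a new local alignment
                previous[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                previous[j] - gap_penalty,                    # gap in b
                current[j - 1] - gap_penalty,                 # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


def select_matrix(a, b, matrices, gap_penalty):
    """Return (distance, score) for the scorer that gives the highest local score."""
    scored = [(local_alignment_score(a, b, scorer, gap_penalty), distance)
              for distance, scorer in matrices.items()]
    best_score, best_distance = max(scored)
    return best_distance, best_score


def make_scorer(match, mismatch):
    """Toy stand-in for a substitution matrix: one match score, one mismatch score."""
    return lambda x, y: match if x == y else mismatch


# Hypothetical series: shorter distances reward matches more and punish mismatches harder.
toy_matrices = {20: make_scorer(5, -8), 120: make_scorer(3, -2), 250: make_scorer(2, -1)}

print(select_matrix("ACGTACGT", "ACGTACGT", toy_matrices, gap_penalty=10))
print(select_matrix("ACACACAC", "AGAGAGAG", toy_matrices, gap_penalty=10))
```

With these toy scorers the identical pair selects the short-distance scorer, while the pair that mismatches at every second position selects the intermediate one, which mirrors the idea that the chosen matrix should suit the level of conservation.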

19.
Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing, compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST.

20.
Here we advocate the use of 2-dimensional data representation in the context of the informational approach of sequence analysis (Claverie & Bougueleret (1986) Nucleic Acids Research 14, 179-196) by applying these methods to the problem of intron/exon discrimination. Two main findings are reported: i) oligonucleotide patterns complementary to the U1 small nuclear RNA are specifically avoided in exon sequences, ii) vertebrate intron sequences, to the exclusion of other eukaryotic phyla, are characterized by a peculiar distribution of CpG-containing patterns.
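Both findings rest on counting how often short oligonucleotide patterns occur in a sequence. The sketch below shows plain overlapping pattern counts together with a simple observed/expected CpG ratio. It is not the informational method of Claverie and Bougueleret, which the abstract does not spell out; the ratio used here is a generic composition-based measure swapped in for illustration, and the function names and example sequence are hypothetical.

```python
from collections import Counter


def count_oligos(sequence, k):
    """Count every overlapping length-k pattern in a nucleotide sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cpg_observed_expected(sequence):
    """Observed CpG count divided by the count expected from C and G frequencies."""
    sequence = sequence.upper()
    length = len(sequence)
    c_count, g_count = sequence.count("C"), sequence.count("G")
    if length == 0 or c_count == 0 or g_count == 0:
        return 0.0
    observed = count_oligos(sequence, 2)["CG"]
    return observed * length / (c_count * g_count)


# Toy sequence, for illustration only.
example = "ACGCGT"
print(count_oligos(example, 2))
print(cpg_observed_expected(example))
```

Comparing such counts between sets of intron and exon sequences is one simple way to look for patterns that are avoided or enriched in either class.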
