首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
2.
MODBASE is a queryable database of annotated comparative protein structure models. The models are derived by MODPIPE, an automated modeling pipeline relying on the programs PSI-BLAST and MODELLER. The database currently contains 3D models for substantial portions of approximately 17 000 proteins from 10 complete genomes, including those of Caenorhabditis elegans, Saccharomyces cerevisiae and Escherichia coli, as well as all the available sequences from Arabidopsis thaliana and Homo sapiens. The database also includes fold assignments and alignments on which the models were based. In addition, special care is taken to assess the quality of the models. ModBase is accessible through a web interface at http://guitar.rockefeller.edu/modbase/  相似文献   

3.
Predicting the oxidation state of cysteines by multiple sequence alignment   总被引:4,自引:0,他引:4  
MOTIVATION: Protein sequences found in databanks usually do not report post translational covalent modifications such as the oxidation state of cystein (Cys) residues. Accurate prediction of whether a functionally or structurally important Cys occurs in the oxidized or thiol form would be helpful for molecular biology experiments and structure prediction. RESULTS: A new method is presented for predicting the oxidation state of Cys residues based on multiple sequence alignments and on the observation that Cys tends to occur in the same oxidation state within the same protein. The prediction of the redox state of Cys performs above 82%. The oxidation state of Cys correlates with the cellular location of the given protein within the cell, but the correlation is not perfect (up to 70%). We also perform a statistical analysis of the different redox states of Cys found in secondary structures and buried positions, and of the secondary structures linked by disulfide bonds. The results suggest that the natural borderline lies between the different oxidation states of Cys rather than between the half cystines and cysteins. AVAILABILITY: A web server implementing the prediction method is available at http://guitar.rockefeller.edu/approximately andras/cyspred.html CONTACT: fisera@rockefeller.edu  相似文献   

4.
The Conserved Key Amino Acid Positions DataBase (CKAAPs DB) provides access to an analysis of structurally similar proteins with dissimilar sequences where key residues within a common fold are identified. The derivation and significance of CKAAPs starting from pairwise structure alignments is described fully in Reddy et al. [Reddy,B.V.B., Li,W.W., Shindyalov,I.N. and Bourne,P.E. (2000) PROTEINS:, in press]. The CKAAPs identified from this theoretical analysis are provided to experimentalists and theoreticians for potential use in protein engineering and modeling. It has been suggested that CKAAPs may be crucial features for protein folding, structural stability and function. Over 170 substructures, as defined by the Combinatorial Extension (CE) database, which are found in approximately 3000 representative polypeptide chains have been analyzed and are available in the CKAAPs DB. CKAAPs DB also provides CKAAPs of the representative set of proteins derived from the CE and FSSP databases. Thus the database contains over 5000 representative poly-peptide chains, covering all known structures in the PDB. A web interface to a relational database permits fast retrieval of structure-sequence alignments, CKAAPs and associated statistics. Users may query by PDB ID, protein name, function and Enzyme Classification number. Users may also submit protein alignments of their own to obtain CKAAPs. An interface to display CKAAPs on each structure from a web browser is also being implemented. CKAAPs DB is maintained by the San Diego Supercomputer Center and accessible at the URL http://ckaaps.sdsc.edu.  相似文献   

5.
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10–4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server.  相似文献   

6.
SUMMARY: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user. Availability and Implementation: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org CONTACT: andreas@sdsc.edu; pbourne@ucsd.edu.  相似文献   

7.
A database comprising all ligand-binding sites of known structure aligned with all related protein sequences and structures is described. Currently, the database contains approximately 50000 ligand-binding sites for small molecules found in the Protein Data Bank (PDB). The structure-structure alignments are obtained by the Combinatorial Extension (CE) program (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998) and sequence-structure alignments are extracted from the ModBase database of comparative protein structure models for all known protein sequences (Sanchez et al., Nucleic Acids Res., 28, 250-253, 2000). It is possible to search for binding sites in LigBase by a variety of criteria. LigBase reports summarize ligand data including relevant structural information from the PDB file, such as ligand type and size, and contain links to all related protein sequences in the TrEMBL database. Residues in the binding sites are graphically depicted for comparison with other structurally defined family members. LigBase provides a resource for the analysis of families of related binding sites.  相似文献   

8.
SUMMARY: A public server for evaluating the accuracy of protein sequence alignment methods is presented. CASA is an implementation of the alignment accuracy benchmark presented by Sauder et al. (Proteins, 40, 6-22, 2000). The benchmark currently contains 39321 pairwise protein structure alignments produced with the CE program from SCOP domain definitions. The server produces graphical and tabular comparisons of the accuracy of a user's input sequence alignments with other commonly used programs, such as BLAST, PSI-BLAST, Clustal W, and SAM-T99. AVAILABILITY: The server is located at http://capb.dbi.udel.edu/casa.  相似文献   

9.
SUMMARY: A website that plots power and sample size calculations over a range of up to eight parameters (including diagnostic misclassification error parameters) for two commonly used statistical tests of genetic association, the linear trend test and the genotypic test of association. AVAILABILITY: This method is made available via the website http://linkage.rockefeller.edu/pawe3d/ CONTACT: pawe3d@linkage.rockefeller.edu.  相似文献   

10.
Cheng H  Kim BH  Grishin NV 《Proteins》2008,70(4):1162-1166
We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup.  相似文献   

11.
Rfam is a collection of multiple sequence alignments and covariance models representing non-coding RNA families. Rfam is available on the web in the UK at http://www.sanger.ac.uk/Software/Rfam/ and in the US at http://rfam.wustl.edu/. These websites allow the user to search a query sequence against a library of covariance models, and view multiple sequence alignments and family annotation. The database can also be downloaded in flatfile form and searched locally using the INFERNAL package (http://infernal.wustl.edu/). The first release of Rfam (1.0) contains 25 families, which annotate over 50 000 non-coding RNA genes in the taxonomic divisions of the EMBL nucleotide database.  相似文献   

12.
The EMOTIF database is a collection of more than 170 000 highly specific and sensitive protein sequence motifs representing conserved biochemical properties and biological functions. These protein motifs are derived from 7697 sequence alignments in the BLOCKS+ database (released on June 23, 2000) and all 8244 protein sequence alignments in the PRINTS database (version 27.0) using the emotif-maker algorithm developed by Nevill-Manning et al. (Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865-5871; Nevill-Manning,C.G., Sethi,K.S., Wu,T. D. and Brutlag,D.L. (1997) ISMB-97, 5, 202-209). Since the amino acids and the groups of amino acids in these sequence motifs represent critical positions conserved in evolution, search algorithms employing the EMOTIF patterns can identify and classify more widely divergent sequences than methods based on global sequence similarity. The emotif protein pattern database is available at http://motif.stanford.edu/emotif/.  相似文献   

13.
The Pfam Protein Families Database   总被引:17,自引:0,他引:17       下载免费PDF全文
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgb.ki.se/Pfam/, in France at http://pfam.jouy.inra.fr/ and in the US at http://pfam.wustl.edu/. The latest version (6.6) of Pfam contains 3071 families, which match 69% of proteins in SWISS-PROT 39 and TrEMBL 14. Structural data, where available, have been utilised to ensure that Pfam families correspond with structural domains, and to improve domain-based annotation. Predictions of non-domain regions are now also included. In addition to secondary structure, Pfam multiple sequence alignments now contain active site residue mark-up. New search tools, including taxonomy search and domain query, greatly add to the functionality and usability of the Pfam resource.  相似文献   

14.
MOTIVATION: Sequence alignment techniques have been developed into extremely powerful tools for identifying the folding families and function of proteins in newly sequenced genomes. For a sufficiently low sequence identity it is necessary to incorporate additional structural information to positively detect homologous proteins. We have carried out an extensive analysis of the effectiveness of incorporating secondary structure information directly into the alignments for fold recognition and identification of distant protein homologs. A secondary structure similarity matrix based on a database of three-dimensionally aligned proteins was first constructed. An iterative application of dynamic programming was used which incorporates linear combinations of amino acid and secondary structure sequence similarity scores. Initially, only primary sequence information is used. Subsequently contributions from secondary structure are phased in and new homologous proteins are positively identified if their scores are consistent with the predetermined error rate. RESULTS: We used the SCOP40 database, where only PDB sequences that have 40% homology or less are included, to calibrate homology detection by the combined amino acid and secondary structure sequence alignments. Combining predicted secondary structure with sequence information results in a 8-15% increase in homology detection within SCOP40 relative to the pairwise alignments using only amino acid sequence data at an error rate of 0.01 errors per query; a 35% increase is observed when the actual secondary structure sequences are used. Incorporating predicted secondary structure information in the analysis of six small genomes yields an improvement in the homology detection of approximately 20% over SSEARCH pairwise alignments, but no improvement in the total number of homologs detected over PSI-BLAST, at an error rate of 0.01 errors per query. However, because the pairwise alignments based on combinations of amino acid and secondary structure similarity are different from those produced by PSI-BLAST and the error rates can be calibrated, it is possible to combine the results of both searches. An additional 25% relative improvement in the number of genes identified at an error rate of 0.01 is observed when the data is pooled in this way. Similarly for the SCOP40 dataset, PSI-BLAST detected 15% of all possible homologs, whereas the pooled results increased the total number of homologs detected to 19%. These results are compared with recent reports of homology detection using sequence profiling methods. AVAILABILITY: Secondary structure alignment homepage at http://lutece.rutgers.edu/ssas CONTACT: anders@rutchem.rutgers.edu; ronlevy@lutece.rutgers.edu Supplementary Information: Genome sequence/structure alignment results at http://lutece.rutgers.edu/ss_fold_predictions.  相似文献   

15.
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/ SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program. In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling.  相似文献   

16.
Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods involve the comparison of evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of ‘vertical’ MSA context (substitution constraints at individual sequence positions) and ‘horizontal’ context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures for the similarity of horizontal MSA patterns, the method achieves the quality of homology detection comparable to a more complex advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function predictions is illustrated by the detection of subtle homology between evolutionary distant yet structurally similar protein domains. ProCAIn, relevant databases and tools can be downloaded from: http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.  相似文献   

17.

Background

Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate.

Results

We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software.

Conclusions

SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.  相似文献   

18.
MOTIVATION: The process of determining the functional sequence content of an organism is confounded by several factors. Large protein coding sequences are relatively easy to find by statistical methods. Smaller proteins however may escape detection due to their size falling below some arbitrary researcher-defined minimum cutoff, or the inability to precisely define a promoter, or translational start (Delcher et al., Nucleic Acids Res., 27, 4636-4641, 1999). Promoter and regulatory sequences themselves are difficult to define due to a significant amount of allowable sequence variation, as well as a probable lack of any completely accurate whole-organismal gene catalogs to date. Finally, certain genes coding functional RNAs may have insufficient structural or sequence constraints to be detectable by normal sequence structure/pattern searching methods (Eddy and Rivas, Bioinformatics, 16, 583-605, 2000). In those cases where there are multiple closely related organisms that have been sequenced, there is additional information that may be used in the investigation of sequence content-that being the possible conserved nature of functional sequences between the organisms. We present a method for the utilization of this conserved information to detect genes and other potentially functional sequences that may be missed by standard ORF-calling, RNA finding, and pattern matching software. The tricross programs produce a multi-way cross comparison of three sets of sequences, determine which are conserved in all three sets, and produce a graphical (Virtual Reality Modelling Language-VRML; (ISO/IEC 14772-1: 1997, VDC), 1997) representation as well as alignments of all sequence triples found. The software can also be applied to a pair of sequence sets, though the noise in the results increases. RESULTS: Tricross has been used to examine the intergenic-sequence content of the three archaeal Pyrococcus genomes to determine the most highly related sequences remaining between the annotated protein and RNA coding sequences. Set to relatively stringent similarity requirements for the search, tricross found 101 intergenic sequences conserved among the three organisms. Interestingly, 29 of these appear to contain members of a family of small RNA molecules (Kiss-Laszlo et al., EMBO J., 17, 797-807, 1998) only recently discovered in the Archaea (Armbruster, OSU, Diss., 1988; Omer et al., Science, 288, 517-522, 2000; Gaspin et al., J. Mol. Biol., 297, 895-906, 2000). While some of the remaining 72 appear to be individual highly conserved promoter sequences, others have no currently known biological significance. Although originally developed to facilitate the examination of intergenic sequences, none of the tricross logic is inherently specific to intergenic sequences. The software can also be applied to gene sequences, and has been used to produce inter-genomic gene order dot-plots for Haemophilus influenzae (Fleischmann et al., Science, 269, 496-512, 1995) versus H.ducreyi (unpublished data), and Neisseria meningiditis Z2491 (serogroup A) (Parkhill et al., Nature, 404, 502-506, 2000) versus Neisseria meningiditis Z58 (serogroup B) (Tettelin et al., Science, 287, 1809-1815, 2000) versus Neisseria gonorrhoeae (Lewis et al., http://micro-gen.ouhsc.edu/, 2000). AVAILABILITY: The tricross software package is available from http://www.biosci.ohio-state.edu/~ray/bioinformatics/tricross.html. CONTACT: ray@biosci.ohio-state.edu; daniels.7@osu.edu; munsonr@pediatrics.ohio-state.edu Supplementary information: Additional data from the cross-genomic comparisons examined in the discussion section are linked from http://www.biosci.ohio-state.edu/~ray/bioinformatics/tricross.html.  相似文献   

19.
The PredictProtein server   总被引:6,自引:0,他引:6       下载免费PDF全文
Rost B  Liu J 《Nucleic acids research》2003,31(13):3300-3304
PredictProtein (PP, http://cubic.bioc.columbia.edu/pp/) is an internet service for sequence analysis and the prediction of aspects of protein structure and function. Users submit protein sequence or alignments; the server returns a multiple sequence alignment, PROSITE sequence motifs, low-complexity regions (SEG), ProDom domain assignments, nuclear localisation signals, regions lacking regular structure and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions and disulfide-bonds. Upon request, fold recognition by prediction-based threading is available. For all services, users can submit their query either by electronic mail or interactively from World Wide Web.  相似文献   

20.
SUMMARY: A web-based application to analyze protein amino acids conservation-Consensus Sequence (ConSSeq) is presented. ConSSeq graphically represents information about amino acid conservation based on sequence alignments reported in homology-derived structures of proteins. Beyond the relative entropy for each position in the alignment, ConSSeq also presents the consensus sequence and information about the amino acids, which are predominant at each position of the alignment. ConSSeq is part of the STING Millennium Suite and is implemented as a Java Applet. AVAILABILITY: http://sms.cbi.cnptia.embrapa.br/SMS/STINGm/consseq/, http://trantor.bioc.columbia.edu/SMS/STINGm/consseq/, http://mirrors.rcsb.org//SMS/STINGm/consseq/, http://www.es.embnet.org/SMS/STINGm/consseq/ and http://www.ar.embnet.org/SMS/STINGm/consseq/  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号