Similar articles
20 similar articles found.
1.
SBASE 5.0 is the fifth release of SBASE, a collection of annotated protein domain sequences that represent various structural, functional, ligand-binding and topogenic segments of proteins. SBASE was designed to facilitate the detection of functional homologies and can be searched with standard database-search programs. The present release contains over 79863 entries provided with standardized names and is cross-referenced to all major sequence databases and sequence pattern collections. The information is assigned to individual domains rather than to entire protein sequences, thus SBASE contains substantially more cross-references and links than do the protein sequence databases. The entries are clustered into >16 000 groups in order to facilitate the detection of distant similarities. SBASE 5.0 is freely available by anonymous 'ftp' file transfer from <ftp.icgeb.trieste.it>. Automated searching of SBASE with BLAST can be carried out with the WWW-server <http://www.icgeb.trieste.it/sbase/> and with the electronic mail server <sbase@icgeb.trieste.it>, which now also provides a graphic representation of the homologies. A related WWW-server <http://www.abc.hu/blast.html> and e-mail server <domain@hubi.abc.hu> predict SBASE domain homologies on the basis of SWISS-PROT searches.

2.
SBASE 3.0 is the third release of SBASE, a collection of annotated protein domain sequences. SBASE entries represent various structural, functional, ligand-binding and topogenic segments of proteins as defined by their publishing authors. SBASE can be used for establishing domain homologies using different database-search tools such as FASTA [Lipman and Pearson (1985) Science, 227, 1436-1441] and BLAST3 [Altschul and Lipman (1990) Proc. Natl. Acad. Sci. USA, 87, 5509-5513], which is especially useful in the case of loosely defined domain types for which efficient consensus patterns cannot be established. The present release contains 41,749 entries provided with standardized names and cross-referenced to the major protein and nucleic acid databanks as well as to the PROSITE catalogue of protein sequence patterns. The entries are clustered into 2285 groups using the BLAST algorithm for computing similarity measures. SBASE 3.0 is freely available on request to the authors or by anonymous 'ftp' file transfer from <ftp.icgeb.trieste.it>. Individual records can be retrieved with the gopher server at <icgeb.trieste.it> and with a WWW server at <http://www.icgeb.trieste.it>. Automated searching of SBASE by BLAST can be carried out with the electronic mail server <sbase@icgeb.trieste.it>. Another mail server <domain@hubi.abc.hu> assigns SBASE domain homologies on the basis of SWISS-PROT searches. A comparison of pertinent search strategies is presented.

3.
SBASE 7.0 is the seventh release of the SBASE library of protein domain sequences, which contains 237 937 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to all major sequence databases and sequence pattern collections. The entries are clustered into over 1811 groups and are provided with two WWW-based search facilities for on-line use. SBASE 7.0 is freely available by anonymous 'ftp' file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE with BLAST can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.

4.
SBASE 2.0 is the second release of SBASE, a collection of annotated protein domain sequences. SBASE entries represent various structural, functional, ligand-binding and topogenic segments of proteins [Pongor, S. et al. (1993) Prot. Eng., in press]. This release contains 34,518 entries provided with standardized names and it is cross-referenced to the major protein and nucleic acid databanks as well as to the PROSITE catalog of protein sequence patterns [Bairoch, A. (1992) Nucl. Acids Res., 20 suppl, 2013-2018]. SBASE can be used for establishing domain homologies using different database-search tools such as FASTA [Lipman and Pearson (1985) Science, 227, 1436-1441], FASTDB [Brutlag et al. (1990) Comp. Appl. Biosci., 6, 237-245] or BLAST3 [Altschul and Lipman (1990) Proc. Natl. Acad. Sci. USA, 87, 5509-5513], which is especially useful in the case of loosely defined domain types for which efficient consensus patterns cannot be established. SBASE 2.0 and a set of search and retrieval tools are freely available on request to the authors or by anonymous 'ftp' file transfer from <ftp.icgeb.trieste.it>.

5.
SBASE 8.0 is the eighth release of the SBASE library of protein domain sequences that contains 294 898 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to most major sequence databases and sequence pattern collections. The entries are clustered into over 2005 statistically validated domain groups (SBASE-A) and 595 non-validated groups (SBASE-B), provided with several WWW-based search and browsing facilities for online use. A domain-search facility was developed, based on non-parametric pattern recognition methods, including artificial neural networks. SBASE 8.0 is freely available by anonymous 'ftp' file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.

6.
SBASE (http://www.icgeb.trieste.it/sbase) is an on-line collection of protein domain sequences and related computational tools designed to facilitate detection of domain homologies based on simple database search. The 10th 'jubilee release' of the SBASE library of protein domain sequences contains 1 052 904 protein sequence segments annotated by structure, function, ligand-binding or cellular topology, clustered into over 6000 domain groups. Domain identification and functional prediction are based on a comparison of BLAST search outputs with a knowledge base of biologically significant similarities extracted from known domain groups. The knowledge base is generated automatically for each domain group from the comparison of within-group ('self') and out-of-group ('non-self') similarities. This is a memory-based approach wherein group-specific similarity functions are automatically learned from the database.

7.
RESULTS: A WWW server for protein domain homology prediction, based on BLAST search and a simple data-mining algorithm (Hegyi,H. and Pongor,S. (1993) Comput. Appl. Biosci., 9, 371-372), was constructed providing a tabulated list and a graphic plot of similarities. AVAILABILITY: http://www.icgeb.trieste.it/domain. Mirror site is available at http://sbase.abc.hu/domain. A standalone programme will be available on request. SUPPLEMENTARY INFORMATION: A series of help files is available at the above addresses.

8.
MOTIVATION: A key goal of genomics is to assign function to genes, especially for orphan sequences. RESULTS: We compared the clustered functional domains in the SBASE database to each protein sequence using BLASTP. This representation for a protein is a vector, where each of the non-zero entries in the vector indicates a significant match between the sequence of interest and the SBASE domain. Two machine learning methods, the nearest neighbour algorithm (NNA) and support vector machines, are used for predicting protein functional classes from this information. We find that the best results are obtained using the SBASE-A database and the NNA, namely 72% accuracy for 79% coverage. We tested assigning function by searching for InterPro sequence motifs and by taking the most significant BLAST match within the dataset. We applied the functional domain composition method to predict the functional class of 2018 currently unclassified yeast open reading frames. AVAILABILITY: A program for the prediction method that uses the NNA, called Functional Class Prediction based on Functional Domains (FCPFD), is available and can be obtained by contacting Y.D.Cai at y.cai@umist.ac.uk
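The domain-composition idea in this abstract can be illustrated with a minimal sketch. It is not the FCPFD program: the binary encoding, the cosine similarity measure, the domain names and the class labels are all assumptions made for illustration, since the abstract does not specify them. A query with no domain in common with any training protein gets no prediction, which is one way coverage can fall below 100%.

```python
from math import sqrt

def domain_vector(hits, domain_ids):
    """Binary vector: 1 where the protein has a significant hit to that domain group."""
    return [1 if d in hits else 0 for d in domain_ids]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_class(query_vec, training):
    """Nearest-neighbour prediction over (vector, label) pairs.

    Returns the label of the most similar training protein, or None when
    the query shares no domain with any training protein.
    """
    best_label, best_sim = None, 0.0
    for vec, label in training:
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Hypothetical domain groups and functional classes, for illustration only.
domain_ids = ["kinase", "SH2", "SH3", "zinc_finger"]
training = [
    (domain_vector({"kinase", "SH2"}, domain_ids), "signalling"),
    (domain_vector({"zinc_finger"}, domain_ids), "transcription"),
]
print(predict_class(domain_vector({"kinase"}, domain_ids), training))
```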

9.
10.
SUMMARY: A WWW server is described for creating 3D models of canonical or bent DNA starting from sequence data. Predicted DNA trajectory is first computed based on a choice of di- and tri-nucleotide models (M.G. Munteanu et al., Trends Biochem. Sci. 23, 341-347, 1998); an atomic model is then constructed and optionally energy-minimized with constrained molecular dynamics. The data are presented as a standard PDB file, directly viewable on the user's PC using any molecule manipulation program. AVAILABILITY: The model.it server is freely available at http://www.icgeb.trieste.it/dna/ CONTACT: kristian@icgeb.trieste.it; pongor@icgeb.trieste.it SUPPLEMENTARY INFORMATION: a series of help files is available at the above address.

11.
The ProDom database of protein domain families. Cited by: 12 (self-citations: 1; citations by others: 11)
F Corpet, J Gouzy, D Kahn. Nucleic Acids Research 1998, 26(1): 323-326
The ProDom database contains protein domain families generated from the SWISS-PROT database by automated sequence comparisons. It can be searched on the World Wide Web (http://protein.toulouse.inra.fr/prodom.html) or by E-mail (prodom@toulouse.inra.fr) to study domain arrangements within known families or new proteins. Strong emphasis has been put on the graphical user interface which allows for interactive analysis of protein homology relationships. Recent improvements to the server include: ProDom search by keyword; links to PROSITE and PDB entries; more sensitive ProDom similarity search with BLAST or WU-BLAST; alignments of query sequences with homologous ProDom domain families; and links to the SWISS-MODEL server (http://www.expasy.ch/swissmod/SWISS-MODEL.html) for homology based 3-D domain modelling where possible.

12.
MOTIVATION: A simple and fast algorithm is described that calculates a measure of protrusion (cx) for atoms in protein structures, directly useable with the common molecular graphics programs. RESULTS: A sphere of predetermined radius is centered around each non-hydrogen atom, and the volume occupied by the protein and the free volume within the sphere (internal and external volumes, respectively) are calculated. Atoms in protruding regions have a high ratio (cx) between the external and the internal volume. The program reads a PDB file, and writes the output in the same format, with cx values in the B factor field. Output structure files can be directly displayed with standard molecular graphics programs like RASMOL, MOLMOL, Swiss-PDB Viewer and colored according to cx values. We show the potential use of this program in the analysis of two protein-protein complexes and in the prediction of limited proteolysis sites in native proteins. AVAILABILITY: The algorithm is implemented in a standalone program written in C and its source is freely available at ftp.icgeb.trieste.it/pub/CX or on request from the authors.
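The cx calculation described above can be sketched in a few lines. This is an illustration, not the authors' C program. It assumes that the internal volume is estimated as the number of non-hydrogen atoms inside the sphere times a fixed mean atomic volume, and the default radius (10 Å) and atomic volume (20.1 Å³) are assumed values. PDB parsing and writing cx into the B factor field are left out.

```python
from math import pi

def cx_values(coords, radius=10.0, atom_volume=20.1):
    """Return one cx value per atom.

    coords      -- list of (x, y, z) tuples for non-hydrogen atoms, in angstroms
    radius      -- sphere radius (assumed default)
    atom_volume -- mean volume per atom (assumed default)
    """
    sphere_volume = 4.0 / 3.0 * pi * radius ** 3
    r2 = radius * radius
    result = []
    for (x, y, z) in coords:
        # Count atoms inside the sphere; the central atom is always included,
        # so the internal volume is never zero.
        n_inside = sum(
            1 for (a, b, c) in coords
            if (a - x) ** 2 + (b - y) ** 2 + (c - z) ** 2 <= r2
        )
        v_int = n_inside * atom_volume   # volume occupied by the protein
        v_ext = sphere_volume - v_int    # free volume within the sphere
        result.append(v_ext / v_int)
    return result

# Toy coordinates: three clustered atoms and one isolated atom.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
cx = cx_values(coords)
```

In the toy input the isolated atom has the fewest neighbours inside its sphere and therefore the highest cx, matching the statement that protruding atoms have a high external-to-internal volume ratio. The double loop is quadratic in the number of atoms; a spatial grid would be the usual way to speed it up for large structures.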

13.
SUMMARY: A web server has been established for the statistical evaluation of introns in various taxonomic groups and the comparison of taxonomic groups in terms of intron type, length, base composition, etc. The options include the graphic analysis of splice sites and a probability test for exon-shuffling within the selected group. AVAILABILITY: introns.abc.hu, http://www.icgeb.trieste.it/introns

14.
Vlahovicek K, Munteanu MG, Pongor S. Genetica 1999, 106(1-2): 63-73
Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna). This revised version was published online in October 2005 with corrections to the Cover Date.

15.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The family of DNA-binding proteins is one of the most populated and studied amongst the various genomes of bacteria, archaea and eukaryotes and the Web-based system presented here is an approach to their classification. The DnaProt resource is an annotated and searchable collection of protein sequences for the families of DNA-binding proteins. The database contains 3238 full-length sequences (retrieved from the SWISS-PROT database, release 38) that include, at least, a DNA-binding domain. Sequence entries are organized into families defined by PROSITE patterns, PRINTS motifs and de novo excised signatures. Combining global similarities and functional motifs into a single classification scheme, DNA-binding proteins are classified into 33 unique classes, which helps to reveal comprehensive family relationships. To maximize family information retrieval, DnaProt contains a collection of multiple alignments for each DNA-binding family while the recognized motifs can be used as diagnostically functional fingerprints. All available structural class representatives have been referenced. The resource was developed as a Web-based management system for online free access of customized data sets. Entries are fully hyperlinked to facilitate easy retrieval of the original records from the source databases while functional and phylogenetic annotation will be applied to newly sequenced genomes. The database is freely available for online search of a library containing specific patterns of the identified DNA-binding protein classes and retrieval of individual entries from our WWW server (http://kronos.biol.uoa.gr/~mariak/dbDNA.html).

16.
The WWW servers at http://www.icgeb.trieste.it/dna/ are dedicated to the analysis of user-submitted DNA sequences; plot.it creates parametric plots of 45 physicochemical, as well as statistical, parameters; bend.it calculates DNA curvature according to various methods. Both programs provide 1D as well as 2D plots that allow localisation of peculiar segments within the query. The server model.it creates 3D models of canonical or bent DNA starting from sequence data and presents the results in the form of a standard PDB file, directly viewable on the user's PC using any molecule manipulation program. The recently established introns server allows statistical evaluation of introns in various taxonomic groups and the comparison of taxonomic groups in terms of length, base composition, intron type etc. The options include the analysis of splice sites and a probability test for exon-shuffling.

17.
MOTIVATION: The blastp and tblastn modules of BLAST are widely used methods for searching protein queries against protein and nucleotide databases, respectively. One heuristic used in BLAST is to consider only database sequences that contain a high-scoring match of length at most 5 to the query. We implemented the capability to use words of length 6 or 7. We demonstrate an improved trade-off between running time and retrieval accuracy, controlled by the score threshold used for short word matches. For example, the running time can be reduced by 20-30% while achieving ROC (receiver operator characteristic) scores similar to those obtained with current default parameters. AVAILABILITY: The option to use long words is in the NCBI C and C++ toolkit code for BLAST, starting with version 2.2.16 of blastall. A Linux executable used to produce the results herein is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/protein_longwords
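The effect of word length on seeding can be shown with a deliberately simplified sketch. It uses exact word matches only, whereas BLAST scores neighbourhood words against a threshold with a substitution matrix, so this is not the BLAST heuristic itself. The function names and the toy sequences are hypothetical.

```python
def words(seq, w):
    """All distinct substrings of length w in seq (empty set if seq is shorter than w)."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}

def seed_candidates(query, database, w=3):
    """Names of database sequences sharing at least one exact word of length w with the query."""
    query_words = words(query, w)
    return [name for name, seq in database.items() if query_words & words(seq, w)]

# Toy database, for illustration only.
database = {"a": "MKTAYIAKQR", "b": "GGGGGGGG"}
print(seed_candidates("TAYLAK", database, w=3))  # shares the word TAY with sequence "a"
print(seed_candidates("TAYLAK", database, w=6))  # no shared word of length 6
```

Longer words admit fewer candidate sequences into the expensive alignment stage, which is where a running-time saving can come from. A real implementation offsets the lost sensitivity by also accepting similar, non-identical words above a score threshold.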

18.
PISCES: a protein sequence culling server. Cited by: 21 (self-citations: 0; citations by others: 21)
PISCES is a public server for culling sets of protein sequences from the Protein Data Bank (PDB) by sequence identity and structural quality criteria. PISCES can provide lists culled from the entire PDB or from lists of PDB entries or chains provided by the user. The sequence identities are obtained from PSI-BLAST alignments with position-specific substitution matrices derived from the non-redundant protein sequence database. PISCES therefore provides better lists than servers that use BLAST, which is unable to identify many relationships below 40% sequence identity and often overestimates sequence identity by aligning only well-conserved fragments. PDB sequences are updated weekly. PISCES can also cull non-PDB sequences provided by the user as a list of GenBank identifiers, a FASTA format file, or BLAST/PSI-BLAST output.

19.
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison. The datasets are available at http://hydra.icgeb.trieste.it/benchmark.
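The supervised cross-validation principle described in this abstract can be sketched as follows. This is a minimal illustration, not the benchmark's actual construction: the record layout (identifier, class, subtype) and the function name are assumptions, and the handling of negative examples and the random-sampling step are left out. For each class with at least two known subtypes, one whole subtype is held out as the positive test set and the remaining subtypes form the positive training set.

```python
from collections import defaultdict

def supervised_splits(records):
    """Yield (class, held_out_subtype, train_positives, test_positives).

    records -- iterable of (protein_id, class_label, subtype_label) tuples
    """
    by_class = defaultdict(lambda: defaultdict(list))
    for protein_id, cls, subtype in records:
        by_class[cls][subtype].append(protein_id)

    for cls, subtypes in by_class.items():
        if len(subtypes) < 2:
            continue  # a class with one subtype cannot be split this way
        for held_out, test_pos in subtypes.items():
            train_pos = [
                p
                for subtype, members in subtypes.items()
                if subtype != held_out
                for p in members
            ]
            yield cls, held_out, train_pos, test_pos

# Hypothetical records, for illustration only.
records = [
    ("p1", "globin", "alpha"),
    ("p2", "globin", "alpha"),
    ("p3", "globin", "beta"),
    ("p4", "kinase", "tyr"),
]
splits = list(supervised_splits(records))
```

Because the test subtype never appears in training, the classifier is evaluated on a distantly related group, which is the reason such splits tend to give lower performance estimates than random k-fold splits.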


20.
PRIDE-NMR is a fast novel method to relate known protein folds to NMR distance restraints. It can be used to obtain a first guess about a structure being determined, as well as to estimate the completeness or verify the correctness of NOE data. AVAILABILITY: The PRIDE-NMR server is available at http://www.icgeb.org/pride
