
1.

EMBOPRO--an automatically generated protein sequence database

Stulich R.; Rohde K. 《Bioinformatics (Oxford, England)》1989,5(1):15-18

For the identification of newly sequenced proteins it is necessary to have a large stock of known proteins for comparison. In this paper we present an automatically generated protein sequence database. The translation program introduced allows a periodical translation of every new release of the EMBL database. Possible errors of the translation are discussed as well as the reliability of the nucleotide sequence data, which turns out to be quite good. A comparison of our translated database with some established ones is given. Received on December 15, 1987; accepted on April 19, 1988.
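
As a rough illustration of the translation step this abstract describes, the sketch below translates a nucleotide sequence into protein using the standard genetic code. It is not the EMBOPRO translation program: the helper name `translate`, the `frame` parameter, and the choice to stop at the first stop codon are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a coding sequence (hypothetical helper).

    Translation starts at offset `frame` and stops at the first stop codon.
    Codons containing ambiguous bases are reported as "X".
    """
    seq = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)
```

Calling `translate` with `frame` set to 0, 1 and 2 gives the three forward reading frames, which is one simple way to see how a frame error in an annotation would change the resulting protein.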

2.

OWL--a non-redundant composite protein sequence database. Total citations: 4; self-citations: 1; citations by others: 4

A J Bleasby D Akrigg T K Attwood 《Nucleic acids research》1994,22(17):3574-3577

A comprehensive, non-redundant composite protein sequence database is described. The database, OWL, is an amalgam of data from six publicly-available primary sources, and is generated using strict redundancy criteria. The database is updated monthly and its size has increased almost eight-fold in the last six years: the current version contains > 76,000 entries. For added flexibility, OWL is distributed with a tailor-made query language, together with a number of programs for database exploration, information retrieval and sequence analysis, which together form an integrated database and software resource for protein sequences.

3.

Extracting protein alignment models from the sequence database. Total citations: 14; self-citations: 2; citations by others: 14

A F Neuwald J S Liu D J Lipman C E Lawrence 《Nucleic acids research》1997,25(9):1665-1677

Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences.

4.

SWISS-PROT: the curated protein sequence database on Internet. Total citations: 2; self-citations: 0; citations by others: 2

Watanabe K Harayama S 《Tanpakushitsu kakusan koso. Protein, nucleic acid, enzyme》2001,46(1):80-86

5.

Searching for frameshift evolutionary relationships between protein sequence families

Pellegrini M Yeates TO 《Proteins》1999,37(2):278-283

The protein sequence database was analyzed for evidence that some distinct sequence families might be distantly related in evolution by changes in frame of translation. Sequences were compared using special amino acid substitution matrices for the alternate frames of translation. The statistical significance of alignment scores was computed in the true database and in shuffled versions of the database that preserve any potential codon bias. The comparison of results from these two databases provides a very sensitive method for detecting remote relationships. We find a weak but measurable relatedness within the database as a whole, supporting the notion that some proteins may have evolved from others through changes in frame of translation. We also quantify residual homology in the ordinary sense within a database of generally unrelated sequences.

6.

3MOTIF: visualizing conserved protein sequence motifs in the protein structure database

Bennett SP Nevill-Manning CG Brutlag DL 《Bioinformatics (Oxford, England)》2003,19(4):541-542

SUMMARY: 3MOTIF is a web application that visually maps conserved sequence motifs onto three-dimensional protein structures in the Protein Data Bank (PDB; Berman et al., Nucleic Acids Res., 28, 235-242, 2000). Important properties of motifs such as conservation strength and solvent accessible surface area at each position are visually represented on the structure using a variety of color shading schemes. Users can manipulate the displayed motifs using the freely available Chime plugin. AVAILABILITY: http://motif.stanford.edu/3motif/

7.

Searching the protein structure databank with weak sequence patterns and structural constraints

Jonassen I Eidhammer I Grindhaug SH Taylor WR 《Journal of molecular biology》2000,304(4):599-619

A method is described in which proteins that match PROSITE patterns are filtered by the root-mean-square deviation of the local 3D structures of the probe and target over the pattern components. This was found to increase the discrimination between true and false members of the protein family but was dependent on how unique the structural features in the pattern were compared to equivalent fragments extracted from the structure databank (for example, if the pattern fell in an alpha-helix, then discrimination was poor). We then generalised the sequence patterns (by widening the range of amino acid residues allowed at each position) and monitored how well the structural information helped retain specificity. While the discrimination of the pure sequence pattern had generally disappeared at information content values less than ten bits, the discrimination of the combined sequence structure probe remained high at this point before following a similar decay. The displacement between these curves indicates that the structural component is, on average, equivalent to about ten bits. The sequence patterns were also filtered using the structure comparison program SAP, giving a global, rather than local "view" of the proteins. This allowed the information content of the sequence patterns to become even less specific but raised problems of whether some proteins encountered with the same fold but no PROSITE pattern should constitute family members.
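
To make the notion of pattern information content in bits concrete, the sketch below scores a pattern as the sum of log2(20 / k) over positions, where k is the number of residues allowed at a position. This assumes a uniform background over the 20 amino acids, which is a simplification and not necessarily the measure used in the paper; the function name and the example patterns are hypothetical.

```python
import math


def pattern_information_bits(pattern, alphabet_size=20):
    """Information content of a sequence pattern, in bits.

    `pattern` is a list of sets, one per position, each holding the
    residues allowed there. Assumes a uniform background distribution
    (an illustrative simplification). A position that allows every
    residue contributes 0 bits.
    """
    return sum(math.log2(alphabet_size / len(allowed)) for allowed in pattern)


# Hypothetical three-position pattern, then a widened version of it.
strict = [{"C"}, {"A", "G"}, {"H"}]
widened = [{"C", "S"}, {"A", "G", "S", "T"}, {"H"}]
```

Widening the set of residues allowed at a position lowers the score, which mirrors the generalisation step described in the abstract.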

8.

PIR-ALN: a database of protein sequence alignments.

G Y Srinivasarao L S Yeh C R Marzec B C Orcutt W C Barker 《Bioinformatics (Oxford, England)》1999,15(5):382-390

MOTIVATION: The Protein Information Resource (PIR) maintains a database of annotated and curated alignments in order to visually represent interrelationships among sequences in the PIR-International Protein Sequence Database, to spread and standardize protein names, features and keywords among members of a family or superfamily, and to aid us in classifying sequences, in identifying conserved regions, and in defining new homology domains. RESULTS: Release 22.0 (December 1998) of the PIR-ALN database contains a total of 3806 alignments, including 1303 superfamily, 2131 family and 372 homology domain alignments. This is an appropriate dataset to develop and extract patterns, test profiles, train neural networks or build Hidden Markov Models (HMMs). These alignments can be used to standardize and spread annotation to newer members by homology, as well as to understand the modular architecture of multidomain proteins. PIR-ALN includes 529 alignments that can be used to develop patterns not represented in the PROSITE, Blocks, PRINTS and Pfam databases. The ATLAS information retrieval system can be used to browse and query the PIR-ALN alignments. AVAILABILITY: PIR-ALN is currently being distributed as a single ASCII text file along with the title, member, species, superfamily and keyword indexes. The quarterly and weekly updates can be accessed via the WWW at pir.georgetown.edu. The quarterly updates can also be obtained by anonymous FTP from the PIR FTP site at NBRF.Georgetown.edu, directory [ANONYMOUS.PIR.ALIGNMENT].

9.

PFDB: a generic protein family database integrating the CATH domain structure database with sequence based protein family resources

Shepherd AJ Martin NJ Johnson RG Kellam P Orengo CA 《Bioinformatics (Oxford, England)》2002,18(12):1666-1672

MOTIVATION: The PFDB (Protein Family Database) is a new database designed to integrate protein family-related data with relevant functional and genomic data. It currently manages biological data for three projects: the CATH protein domain database (Orengo et al., 1997; Pearl et al., 2001), the VIDA virus domains database (Albà et al., 2001) and the Gene3D database (Buchan et al., 2001). The PFDB has been designed to accommodate protein families identified by a variety of sequence based or structure based protocols and provides a generic resource for biological research by enabling mapping between different protein families and diverse biochemical and genetic data, including complete genomes. RESULTS: A characteristic feature of the PFDB is that it has a number of meta-level entities (for example aggregation, collection and inclusion) represented as base tables in the final design. The explicit representation of relationships at the meta-level has a number of advantages, including flexibility, both in terms of the range of queries that can be formulated and the ability to integrate new biological entities within the existing design. A potential drawback with this approach, poor performance caused by the number of joins across meta-level tables, is avoided by implementing the PFDB with materialized views using the mature relational database technology of Oracle 8i. The resultant database is both fast and flexible. This paper presents the principles on which the database has been designed and implemented, and describes the current status of the database and query facilities supported.

10.

Searching databases of conserved sequence regions by aligning protein multiple-alignments. Total citations: 14; self-citations: 2; citations by others: 14

S Pietrokovski 《Nucleic acids research》1996,24(19):3836-3845

A general searching method for comparing multiple sequence alignments was developed to detect sequence relationships between conserved protein regions. Multiple alignments are treated as sequences of amino acid distributions and aligned by comparing pairs of such distributions. Four different comparison measures were tested and the Pearson correlation coefficient chosen. The method is sensitive, detecting weak sequence relationships between protein families. Relationships are detected beyond the range of conventional sequence database searches, illustrating the potential usefulness of the method. The previously undetected relation between flavoprotein subunits of two oxidoreductase families points to the potential active site in one of the families. The similarity between the bacterial RecA, DnaA and Rad51 protein families reveals a region in DnaA and Rad51 proteins likely to bind and unstack single-stranded DNA. Helix-turn-helix DNA binding domains from diverse proteins are readily detected and shown to be similar to each other. Glycosylasparaginase and gamma-glutamyltransferase enzymes are found to be similar in their proteolytic cleavage sites. The method has been fully implemented on the World Wide Web at URL: http://blocks.fhcrc.org/blocks-bin/LAMAvsearch.
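
The idea of comparing two alignment columns as amino acid distributions can be sketched as follows. This is a minimal illustration, not the published implementation: it uses raw residue frequencies without sequence weighting or pseudocounts, and the helper names `column_distribution` and `pearson` are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column):
    """Frequencies of the 20 standard amino acids in one alignment column."""
    counts = Counter(column.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("column contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a flat distribution
    return cov / math.sqrt(var_x * var_y)
```

Scoring every pair of columns this way yields a column-versus-column score matrix that a standard dynamic programming alignment could then use in place of a residue substitution matrix.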

11.

CHIKVPRO - a protein sequence annotation database for chikungunya virus

Mishra AK Jain CK Agrawal A Jain SJ Dudha N Kumar K Sharma SK Gupta S 《Bioinformation》2010,5(1):4-6

In the recent past, there has been a resurgence of interest in Chikungunya virus (CHIKV) attributed to massive outbreaks of Chikungunya fever in the South-East Asia Region. This is reflected in a substantial increase in submissions of CHIKV genome sequences to the NCBI (National Center for Biotechnology Information) database. Here we present a database, "CHIKVPRO", containing structural and functional annotation of Chikungunya virus proteins (25 strains) submitted to the NCBI repository. The CHIKV genome encodes 9 proteins: 4 non-structural and 5 structural. The CHIKVPRO database aims to provide the virology community with a single accession authoritative resource for the CHIKV proteome, with reference to physicochemical and molecular properties, proteolytic cleavage sites, hydrophobicity, transmembrane prediction, and classification into functional families using SVMProt and other Expasy tools. AVAILABILITY: The database is freely available at http://www.chikvpro.info/

12.

Mass accuracy and sequence requirements for protein database searching.

M K Green M V Johnston B S Larsen 《Analytical biochemistry》1999,275(1):39-46

To elucidate the role of high mass accuracy in mass spectrometric peptide mapping and database searching, selected proteins were subjected to tryptic digestion and the resulting mixtures were analyzed by electrospray ionization on a 7 Tesla Fourier transform mass spectrometer with a mass accuracy of 1 ppm. Two extreme cases were examined in detail: equine apomyoglobin, which digested easily and gave very few spurious masses, and bovine alpha-lactalbumin, which, under the conditions used, gave many spurious masses. The effectiveness of accurate mass measurements in minimizing false protein matches was examined by varying the mass error allowed in the search over a wide range (2-500 ppm). For the "clean" data obtained from apomyoglobin, very few masses were needed to return valid protein matches, and the mass error allowed in the search had little effect up to 500 ppm. However, in the case of alpha-lactalbumin more mass values were needed, and low mass errors increased the search specificity. Mass errors below 30 ppm were particularly useful in eliminating false protein matches when few mass values were used in the search. Collision-induced dissociation of an unassigned peak in the alpha-lactalbumin digest provided sufficient data to unambiguously identify the peak as a fragment from alpha-lactalbumin and eliminate a large number of spurious proteins found in the peptide mass search. The results show that even with a relatively high mass error (0.8 Da for mass differences between singly charged product ions), collision-induced dissociation can help identify proteins in cases where unfavorable digest conditions or modifications render digest peaks unidentifiable by a simple mass mapping search.
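
The role of the allowed mass error can be illustrated with a short sketch of ppm-tolerance matching between measured and theoretical peptide masses. The function names and the mass values used in the usage note are invented for illustration; this is not the search software used in the study.

```python
def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_masses(measured, theoretical, tolerance_ppm):
    """Pair every measured mass with every theoretical mass within tolerance.

    Returns a list of (measured, theoretical) tuples. A wider tolerance
    admits more pairs, including more chance matches.
    """
    return [
        (m, t)
        for m in measured
        for t in theoretical
        if abs(ppm_error(m, t)) <= tolerance_ppm
    ]
```

With the made-up masses `[1000.001, 1500.3]` searched against `[1000.0, 1500.0]`, a 30 ppm tolerance keeps only the first pair, while a 500 ppm tolerance also accepts the second, whose error is about 200 ppm.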

13.

The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Total citations: 65; self-citations: 2; citations by others: 65

Bairoch A Apweiler R 《Nucleic acids research》2000,28(1):45-48

14.

Mining the schistosome DNA sequence database

Oliveira G Johnston DA 《Trends in parasitology》2001,17(10):501-503

15.

Increased coverage obtained by combination of methods for protein sequence database searching

Webber C Barton GJ 《Bioinformatics (Oxford, England)》2003,19(11):1397-1403

MOTIVATION: Sequence alignment methods that compare two sequences (pairwise methods) are important tools for the detection of biological sequence relationships. In genome annotation, multiple methods are often run and agreement between methods taken as confirmation. In this paper, we assess the advantages of combining search methods by comparing seven pairwise alignment methods, including three local dynamic programming algorithms (PRSS, SSEARCH and SCANPS), two global dynamic programming algorithms (GSRCH and AMPS) and two heuristic approximations (BLAST and FASTA), individually and by pairwise intersection and union of their result lists at equal p-value cut-offs. RESULTS: When applied singly, the dynamic programming methods SCANPS and SSEARCH gave significantly better coverage (p=0.01) compared to AMPS, GSRCH, PRSS, BLAST and FASTA. Results ranked by BLAST p-values gave significantly better coverage compared to ranking by BLAST e-values. Of 56 combinations of eight methods considered, 19 gave significant increases in coverage at low error compared to the parent methods at an equal p-value cutoff. The union of results by BLAST (p-value) and FASTA at an equal p-value cutoff gave significantly better coverage than either method individually. The best overall performance was obtained from the intersection of the results from SSEARCH and the GSRCH62 global alignment method. At an error level of five false positives, this combination found 444 true positives, a significant 12.4% increase over SSEARCH applied alone.
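
Combining two result lists by intersection and union at an equal p-value cutoff can be sketched as below. The data structure (a mapping from target identifier to p-value) and the function names are assumptions for illustration, and the paper's evaluation of coverage against error is not reproduced here.

```python
def hits_below_cutoff(results, cutoff):
    """Targets whose p-value is at or below the cutoff.

    `results` maps a target identifier to the p-value one method reported.
    """
    return {target for target, p_value in results.items() if p_value <= cutoff}


def combine(results_a, results_b, cutoff):
    """Return (intersection, union) of two methods' hits at the same cutoff."""
    hits_a = hits_below_cutoff(results_a, cutoff)
    hits_b = hits_below_cutoff(results_b, cutoff)
    return hits_a & hits_b, hits_a | hits_b


# Hypothetical result lists from two search methods.
method_a = {"seq1": 1e-8, "seq2": 1e-3, "seq3": 0.2}
method_b = {"seq2": 1e-5, "seq3": 1e-4, "seq4": 0.5}
both, either = combine(method_a, method_b, cutoff=0.01)
```

The intersection keeps only hits that both methods support, trading coverage for fewer false positives, while the union does the opposite.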

16.

Local structure-based sequence profile database for local and global protein structure predictions

Yang AS Wang LY 《Bioinformatics (Oxford, England)》2002,18(12):1650-1657

MOTIVATION: A large body of evidence suggests that protein structural information is frequently encoded in local sequences; sequence-structure relationships derived from local structure/sequence analyses could significantly enhance the capacities of protein structure prediction methods. In this paper, the prediction capacity of a database (LSBSP2) that organizes local sequence-structure relationships encoded in local structures with two consecutive secondary structure elements is tested with two computational procedures for protein structure prediction. The goal is twofold: to test the folding hypothesis that local structures are determined by local sequences, and to enhance our capacity in predicting protein structures from their amino acid sequences. RESULTS: The LSBSP2 database contains a large set of sequence profiles derived from exhaustive pair-wise structural alignments for local structures with two consecutive secondary structure elements. One computational procedure makes use of the PSI-BLAST alignment program to predict local structures for testing sequence fragments by matching the testing sequence fragments onto the sequence profiles in the LSBSP2 database. The results show that 54% of the test sequence fragments were predicted with local structures that match closely with their native local structures. The other computational procedure is a filter system that is capable of removing as many false positives as possible from a set of PSI-BLAST hits. An assessment with a large set of non-redundant protein structures shows that the PSI-BLAST + filter system improves the prediction specificity by up to two-fold over the prediction specificity of the PSI-BLAST program for distantly related protein pairs. Tests with the two computational procedures above demonstrate that local sequence-structure relationships can indeed enhance our capacity in protein structure prediction. The results also indicate that local sequences encoded with strong local structure propensities play an important role in determining the native state folding topology.

17.

The EMBL nucleotide sequence database

Stoesser G Baker W van den Broek A Camon E Garcia-Pastor M Kanz C Kulikova T Lombard V Lopez R Parkinson H Redaschi N Sterk P Stoehr P Tuli MA 《Nucleic acids research》2001,29(1):17-21

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

18.

The EMBL nucleotide sequence database. Total citations: 14; self-citations: 0; citations by others: 14

Baker W van den Broek A Camon E Hingamp P Sterk P Stoesser G Tuli MA 《Nucleic acids research》2000,28(1):19-23

The European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/index.html) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank (USA). Data is exchanged amongst the collaborative databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. WEBIN is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via Internet and WWW interfaces. EBI's Sequence Retrieval System (SRS) is a network browser for databanks in molecular biology, integrating and linking the main nucleotide and protein databases plus many specialised databases. For sequence similarity searching a variety of tools (e.g., BLITZ, FASTA, BLAST) are available which allow external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

19.

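Bioinformatics databases and sequence analysis

Ouyang Ping (欧阳平) 《Bulletin of Biology (生物学通报)》2007,42(1):24-25

With the continuing development of bioinformatics and biotechnology, the data held in bioinformatics databases is growing exponentially. Understanding the biological knowledge these data contain and revealing the underlying laws of biology will become an important topic in future natural science research. This article briefly introduces the use of bioinformatics databases commonly used abroad in recent years and describes in some detail how to carry out sequence analysis.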

20.

The stem cell sequence database

《Trends in molecular medicine》2001,7(5):200
