Similar Documents
20 similar documents found (search time: 0 ms)
1.
The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. iProClass currently consists of more than 200,000 non-redundant PIR and SWISS-PROT proteins organized with more than 28,000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. iProClass is implemented in the Oracle 8i object-relational system and available for sequence search and report retrieval at http://pir.georgetown.edu/iproclass/.

2.
The apple (Malus domestica) is one of the most economically important fruit crops in the world, due to its importance to human nutrition and health. To analyze the function and evolution of different apple genes, we developed the apple gene function and gene family database (AppleGFDB) for collecting, storing, arranging, and integrating functional genomics information of the apple. AppleGFDB provides several layers of information about the apple genes, including nucleotide and protein sequences, chromosomal locations, gene structures, and any publications related to these annotations. To further analyze the functional genomics data of apple genes, AppleGFDB was designed to enable users to easily retrieve information through a suite of interfaces, including gene ontology, protein domain and InterPro. In addition, the database provides tools for analyzing the expression profiles and microRNAs of the apple. Moreover, all of the analyzed and collected data can be downloaded from the database. The database can also be accessed using a convenient web server that supports a full-text search, a BLAST sequence search, and database browsing. Furthermore, to facilitate cooperation among apple researchers, AppleGFDB is presented in a user-interactive platform, which provides users with the opportunity to modify apple gene annotations and submit publication information for related genes. AppleGFDB is available at http://www.applegene.org or http://gfdb.sdau.edu.cn/.

3.
MOTIVATION: The PFDB (Protein Family Database) is a new database designed to integrate protein family-related data with relevant functional and genomic data. It currently manages biological data for three projects: the CATH protein domain database (Orengo et al., 1997; Pearl et al., 2001), the VIDA virus domains database (Albà et al., 2001) and the Gene3D database (Buchan et al., 2001). The PFDB has been designed to accommodate protein families identified by a variety of sequence-based or structure-based protocols and provides a generic resource for biological research by enabling mapping between different protein families and diverse biochemical and genetic data, including complete genomes. RESULTS: A characteristic feature of the PFDB is that it has a number of meta-level entities (for example aggregation, collection and inclusion) represented as base tables in the final design. The explicit representation of relationships at the meta-level has a number of advantages, including flexibility, both in terms of the range of queries that can be formulated and the ability to integrate new biological entities within the existing design. A potential drawback of this approach, poor performance caused by the number of joins across meta-level tables, is avoided by implementing the PFDB with materialized views using the mature relational database technology of Oracle 8i. The resultant database is both fast and flexible. This paper presents the principles on which the database has been designed and implemented, and describes the current status of the database and the query facilities supported.

4.
Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with a BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

5.
6.
MycDB: an integrated mycobacterial database   (total citations: 6; self-citations: 0; citations by others: 6)
As part of ongoing efforts to investigate the molecular biology of the human pathogens in the genus Mycobacterium, a customized database was developed specifically for these organisms and implemented in the ACEDB database manager software. The data loaded include the IMMYC Antigen List, details of reagents available from the CDC/WHO Antibody Bank, more than 1 Mb of sequences of mycobacterial genes and proteins from public databases, the physical maps of Mycobacterium leprae and Mycobacterium tuberculosis developed at the Institut Pasteur, as well as a subset of the references found in MEDLINE. The ACEDB software allows both quick and intuitive access to the data and to connections between facts by a simple mouse-driven interface, as well as by more powerful query mechanisms.

7.
ProClass is a protein family database that organizes non-redundant sequence entries into families defined collectively by PIR superfamilies and PROSITE patterns. By combining global similarities and functional motifs into a single classification scheme, ProClass helps to reveal domain and family relationships and classify multi-domain proteins. The database currently consists of >155 000 sequence entries retrieved from both PIR-International and SWISS-PROT databases. Approximately 92 000 or 60% of the ProClass entries are classified into approximately 6000 families, including a large number of new members detected by our GeneFIND family identification system. The ProClass motif collection contains approximately 72 000 motif sequences and >1300 multiple alignments for all PROSITE patterns, including >21 000 matches not listed in PROSITE and mostly detected from unique PIR sequences. To maximize family information retrieval, the database provides links to various protein family, domain, alignment and structural class databases. With its high classification rate and comprehensive family relationships, ProClass can be used to support full-scale genomic annotation. The database, now being implemented in an object-relational database management system, is available for online sequence search and record retrieval from our WWW server at http://pir.georgetown.edu/gfserver/proclass.html

8.

Background  

Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures.

9.
As a result of genome, EST and cDNA sequencing projects, there are huge numbers of predicted and/or partially characterised protein sequences compared with a relatively small number of proteins with experimentally determined function and structure. Thus, considerable attention is focused on the accurate prediction of gene function and structure from sequence by using bioinformatics. In the course of our analysis of genomic sequence from Fugu rubripes, we identified a novel gene, SAND, with significant sequence identity to hypothetical proteins predicted in Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, a Drosophila melanogaster gene, and mouse and human cDNAs. Here we identify a further SAND homologue in human and Arabidopsis thaliana by use of standard computational tools. We describe the genomic organisation of SAND in these evolutionarily divergent species and identify sequence homologues from EST database searches confirming the expression of SAND in over 20 different eukaryotes. We confirm the expression of two different SAND paralogues in mammals and determine the expression of a single SAND gene in other vertebrates and eukaryotes. Furthermore, we predict structural properties of SAND, and characterise conserved sequence motifs in this protein family.

10.
Protocadherin family: diversity, structure, and function   (total citations: 1; self-citations: 0; citations by others: 1)
Protocadherins are predominantly expressed in the nervous system, and constitute the largest subgroup within the cadherin superfamily. The recent structural elucidation of the amino-terminal cadherin domain in an archetypal protocadherin revealed unique and remarkable features: the lack of an interface for homophilic adhesiveness found in classical cadherins, and the presence of loop structures specific to the protocadherin family. The unique features of protocadherins extend to their genomic organization. Recent findings have revealed unexpected allelic and combinatorial gene regulation for clustered protocadherins, a major subgroup in the protocadherin family. The unique structural repertoire and unusual gene regulation of the protocadherin family may provide the molecular basis for the extraordinary diversity of the nervous system.

11.
Rfam is a collection of multiple sequence alignments and covariance models representing non-coding RNA families. Rfam is available on the web in the UK at http://www.sanger.ac.uk/Software/Rfam/ and in the US at http://rfam.wustl.edu/. These websites allow the user to search a query sequence against a library of covariance models, and view multiple sequence alignments and family annotation. The database can also be downloaded in flatfile form and searched locally using the INFERNAL package (http://infernal.wustl.edu/). The first release of Rfam (1.0) contains 25 families, which annotate over 50 000 non-coding RNA genes in the taxonomic divisions of the EMBL nucleotide database.

12.
BioSilico is a web-based database system that facilitates the search and analysis of metabolic pathways. Heterogeneous metabolic databases including LIGAND, ENZYME, EcoCyc and MetaCyc are integrated in a systematic way, thereby allowing users to efficiently retrieve the relevant information on enzymes, biochemical compounds and reactions. In addition, it provides well-designed view pages for more detailed summary information. BioSilico is developed as an extensible system with a robust systematic architecture.

13.
14.
This paper describes the first maize database of proteins separated by two-dimensional electrophoresis. Fifty-six coleoptile proteins and 18 leaf proteins from two maize lines were partially microsequenced. Thirty-six proteins (49%) displayed high similarity with database proteins. Nine of these proteins, representing five different functions, had never been described in maize. No conclusive function could be found for 45 polypeptides (61% of the microsequenced proteins). In addition, an alternative identification method, based on amino acid analysis, allowed candidates to be proposed for 17 proteins out of 44 additional proteins analyzed in the coleoptiles. These results are stored in a database which also includes, when available, genetic information about the chromosomal location of structural genes and regulatory factors of proteins. This database is being used in the context of a project on the genetic mapping of the expressed genome in maize.

15.
MOTIVATION: As the sizes of three-dimensional (3D) protein structure databases grow rapidly, exhaustive database searching, in which a 3D query structure is compared to each and every structure in the database, becomes inefficient. We propose a rapid 3D protein structure retrieval system named 'ProtDex2', in which we adopt the techniques used in information retrieval systems in order to perform rapid database searching without having access to every 3D structure in the database. The retrieval process is based on the inverted-file index constructed on the feature vectors of the relationships between the secondary structure elements (SSEs) of all the 3D protein structures in the database. ProtDex2 is a significant improvement, both in terms of speed and accuracy, upon its predecessor system, ProtDex. RESULTS: The experimental results show that ProtDex2 is much faster than two well-known protein structure comparison methods, DALI and CE, without sacrificing the accuracy of the comparison. Compared with a similar SSE-based method, TopScan, ProtDex2 is much faster with a comparable degree of accuracy. AVAILABILITY: The software is available at: http://xena1.ddns.comp.nus.edu.sg/~genesis/PD2.htm

16.
Tau protein: an update on structure and function   (total citations: 2; self-citations: 0; citations by others: 2)

17.
The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB), contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind CATHEDRAL is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domain boundaries could be completely automatically assigned. This is an improvement on a previous consensus approach for which only 10-20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates.

18.
Microbes utilize enzymes to perform a variety of functions. Enzymes are biocatalysts working as highly efficient machines at the molecular level. In the past, enzymes have been viewed as static entities and their function has been explained on the basis of direct structural interactions between the enzyme and the substrate. A variety of experimental and computational techniques, however, continue to reveal that proteins are dynamically active machines, with various parts exhibiting internal motions at a wide range of time-scales. Increasing evidence also indicates that these internal protein motions play a role in promoting protein function such as enzyme catalysis. Moreover, the thermodynamic fluctuations of the solvent surrounding the protein have an impact on internal protein motions and, therefore, on enzyme function. In this review, we describe recent biochemical and theoretical investigations of internal protein dynamics linked to enzyme catalysis. In the enzyme cyclophilin A, investigations have led to the discovery of a network of protein vibrations promoting catalysis. Cyclophilin A catalyzes peptidyl-prolyl cis/trans isomerization in a variety of peptide and protein substrates. Recent studies of cyclophilin A are discussed in detail and other enzymes (dihydrofolate reductase and liver alcohol dehydrogenase) where similar discoveries have been reported are also briefly discussed. The detailed characterization of the discovered networks indicates that protein dynamics plays a role in the rate enhancement achieved by enzymes. An integrated view of enzyme structure, dynamics and function has wide implications in understanding allosteric and co-operative effects, as well as protein engineering of more efficient enzymes and novel drug design.

19.
20.
The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
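Entry 3 (PFDB) stores meta-level relationships such as inclusion as base tables and avoids the resulting join cost with materialized views in Oracle 8i. The following is a minimal sketch of that pattern using Python's built-in sqlite3 module; it is not the PFDB schema. The table and column names (family, protein, inclusion, mv_family_members) and the sample rows are hypothetical. Because SQLite has no CREATE MATERIALIZED VIEW, the join result is materialized by hand into an ordinary table that must be rebuilt after the base tables change.

```python
import sqlite3

# Hypothetical schema: two entity tables plus a meta-level "inclusion"
# relationship stored as its own base table.
SCHEMA = """
CREATE TABLE family    (family_id  TEXT PRIMARY KEY, source   TEXT);
CREATE TABLE protein   (protein_id TEXT PRIMARY KEY, organism TEXT);
CREATE TABLE inclusion (family_id  TEXT REFERENCES family(family_id),
                        protein_id TEXT REFERENCES protein(protein_id));
"""

def build_db():
    """Create an in-memory database filled with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO family VALUES (?, ?)",
                     [("F1", "CATH"), ("F2", "VIDA")])
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [("P1", "E. coli"), ("P2", "E. coli"), ("P3", "HIV-1")])
    conn.executemany("INSERT INTO inclusion VALUES (?, ?)",
                     [("F1", "P1"), ("F1", "P2"), ("F2", "P3")])
    return conn

def materialize(conn):
    """Precompute the three-way join once and store it as a table."""
    conn.executescript("""
    DROP TABLE IF EXISTS mv_family_members;
    CREATE TABLE mv_family_members AS
      SELECT f.family_id, f.source, p.organism, COUNT(*) AS n_members
      FROM inclusion i
      JOIN family  f ON f.family_id  = i.family_id
      JOIN protein p ON p.protein_id = i.protein_id
      GROUP BY f.family_id, f.source, p.organism;
    """)

conn = build_db()
materialize(conn)
# Later queries read the precomputed table instead of repeating the joins.
rows = conn.execute(
    "SELECT * FROM mv_family_members ORDER BY family_id").fetchall()
print(rows)
```

Queries that would otherwise join three tables then read one precomputed table. Oracle's materialized views add automatic or scheduled refresh, which this sketch leaves out.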
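Entry 15 (ProtDex2) retrieves structures through an inverted-file index over features of SSE relationships, so a query only touches structures that share a feature with it. Here is a minimal sketch of that indexing idea, under stated assumptions: the feature encoding (SSE type pair plus hypothetical distance and angle bins), the structure IDs, and the scoring rule (a plain count of shared features) are illustrative choices, not ProtDex2's actual features or similarity measure.

```python
from collections import Counter, defaultdict

# Hypothetical feature: one discretized SSE pair, written as
# (type of SSE 1, type of SSE 2, distance bin, angle bin),
# with "H" = helix and "E" = strand. The binning is assumed.

def build_index(db):
    """Map each feature to a Counter of {structure id: occurrences}."""
    index = defaultdict(Counter)
    for sid, features in db.items():
        for feature in features:
            index[feature][sid] += 1
    return index

def search(index, query_features, top_k=3):
    """Score only structures that share at least one feature with the query.

    The score is the number of shared features (multiset intersection),
    a simplification chosen for illustration.
    """
    scores = Counter()
    for feature, q_count in Counter(query_features).items():
        # index.get() avoids creating empty postings for unseen features.
        for sid, count in index.get(feature, {}).items():
            scores[sid] += min(q_count, count)
    return scores.most_common(top_k)

db = {
    "1abc": [("H", "H", 1, 0), ("H", "E", 2, 1), ("E", "E", 1, 2)],
    "2xyz": [("E", "E", 1, 2), ("E", "E", 1, 2), ("H", "E", 3, 1)],
    "3def": [("H", "H", 4, 3)],
}
index = build_index(db)
query = [("H", "H", 1, 0), ("E", "E", 1, 2), ("H", "E", 2, 1)]
hits = search(index, query)
print(hits)  # [('1abc', 3), ('2xyz', 1)]
```

Structure "3def" shares no feature with the query, so its entry is never read during the search; this is the property that lets index-based retrieval avoid comparing the query against every structure.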
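Entry 20 (PANTHER) classifies new sequences automatically by scoring them against HMMs built for each family and subfamily. A minimal sketch of the underlying idea follows, under simplifying assumptions. The models are tiny generic two-state HMMs over a hypothetical two-letter alphabet (H = hydrophobic, P = polar), not the profile HMMs with match, insert and delete states that PANTHER builds. The model names are hypothetical. A sequence is assigned to whichever model gives it the highest forward-algorithm log-likelihood, without the null-model log-odds scores or thresholds a production classifier would apply.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(hmm, seq):
    """Forward algorithm in log space: returns log P(seq | hmm)."""
    start, trans, emit = hmm["start"], hmm["trans"], hmm["emit"]
    states = list(start)
    # alpha[s] = log P(first symbols, current state = s)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: logsumexp([alpha[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][symbol])
            for t in states
        }
    return logsumexp(list(alpha.values()))

def classify(models, seq):
    """Return the name of the model that gives seq the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(models[name], seq))

# Hypothetical toy models: family_A favours H, family_B favours P.
models = {
    "family_A": {
        "start": {"a1": 0.5, "a2": 0.5},
        "trans": {"a1": {"a1": 0.7, "a2": 0.3}, "a2": {"a1": 0.4, "a2": 0.6}},
        "emit": {"a1": {"H": 0.9, "P": 0.1}, "a2": {"H": 0.8, "P": 0.2}},
    },
    "family_B": {
        "start": {"b1": 0.5, "b2": 0.5},
        "trans": {"b1": {"b1": 0.7, "b2": 0.3}, "b2": {"b1": 0.4, "b2": 0.6}},
        "emit": {"b1": {"H": 0.1, "P": 0.9}, "b2": {"H": 0.2, "P": 0.8}},
    },
}
print(classify(models, "HHHHPH"))  # family_A
print(classify(models, "PPPPHP"))  # family_B
```

Working in log space with logsumexp keeps long sequences from underflowing to a probability of zero, which matters for protein-length inputs.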


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417