Similar literature
Found 20 similar articles (search time: 31 ms)
1.
TargetDB: a target registration database for structural genomics projects   Total citations: 2; self-citations: 0; citations by others: 2
TargetDB is a centralized target registration database that includes protein target data from the NIH structural genomics centers and a number of international sites. TargetDB, which is hosted by the Protein Data Bank (RCSB PDB), provides status information on target sequences and tracks their progress through the various stages of protein production and structure determination. A simple search form permits queries based on contributing site, target ID, protein name, sequence, status and other data. The progress of individual targets or entire structural genomics projects may be tracked over time, and target data from all contributing centers may also be downloaded in the XML format. AVAILABILITY: TargetDB is available at http://targetdb.pdb.org/

2.
It has been previously shown that protein sequences containing a quasi-repetitive assortment of amino acids are common in genomes and databases such as Swiss-Prot but are under-represented in the structure-based Protein Data Bank (PDB). Structural genomics groups have been using the absence of these “low-complexity” sequences for several years as a way to select proteins that have a good chance of successful structure determination. In this study, we examine the data deposited in the PDB as well as the available data from structural genomics groups in TargetDB and PepcDB to reveal interesting trends that could be taken into consideration when using low-complexity sequences as part of the target selection process.

3.
The Protein Data Bank (PDB; http://www.pdb.org/) continues to be actively involved in various aspects of the informatics of structural genomics projects: developing and maintaining the Target Registration Database (TargetDB), organizing data dictionaries that will define the specification for the exchange and deposition of data with the structural genomics centers, and creating software tools to capture data from standard structure determination applications.

4.
Membrane Protein Structure Initiative (MPSI) exploits laboratory competencies to work collaboratively and distribute work among the different sites. This is possible as protein structure determination requires a series of steps, starting with target selection, through cloning, expression, purification, crystallization and finally structure determination. Distributed sites create a unique set of challenges for integrating and passing on information on the progress of targets. This role is played by the Protein Information Management System (PIMS), which is a laboratory information management system (LIMS), serving as a hub for MPSI, allowing collaborative structural proteomics to be carried out in a distributed fashion. It holds key information on the progress of cloning, expression, purification and crystallization of proteins. PIMS is employed to track the status of protein targets and to manage constructs, primers, experiments, protocols, sample locations and their detailed histories: thus playing a key role in MPSI data exchange. It also serves as the centre of a federation of interoperable information resources such as local laboratory information systems and international archival resources, like PDB or NCBI. During the challenging task of PIMS integration, within the MPSI, we discovered a number of prerequisites for successful PIMS integration. In this article we share our experiences and provide invaluable insights into the process of LIMS adaptation. This information should be of interest to partners who are thinking about using LIMS as a data centre for their collaborative efforts.

6.
We present version 2 of the SPINE system for structural proteomics. SPINE is available over the web at http://nesg.org. It serves as the central hub for the Northeast Structural Genomics Consortium, allowing collaborative structural proteomics to be carried out in a distributed fashion. The core of SPINE is a laboratory information management system (LIMS) for key bits of information related to the progress of the consortium in cloning, expressing and purifying proteins and then solving their structures by NMR or X-ray crystallography. Originally, SPINE focused on tracking constructs, but, in its current form, it is able to track target sample tubes and store detailed sample histories. The core database comprises a set of standard relational tables and a data dictionary that form an initial ontology for proteomic properties and provide a framework for large-scale data mining. Moreover, SPINE sits at the center of a federation of interoperable information resources. These can be divided into (i) local resources closely coupled with SPINE that enable it to handle less standardized information (e.g. integrated mailing and publication lists), (ii) other information resources in the NESG consortium that are inter-linked with SPINE (e.g. crystallization LIMS local to particular laboratories) and (iii) international archival resources that SPINE links to and passes on information to (e.g. TargetDB at the PDB).

7.
Introduction of a LIMS can bring immediate cost-savings and efficiency improvements to a Laboratory. However, peripheral problems can often prevent the full benefits from being realised. For example, in our case, reports generated on paper are subject to postal delays. They are also sent initially to an intermediary, for comments and recommendations to be added (this often means re-typing the report) causing a further delay. Within the Laboratory itself, telephone requests from clients for information on sample progress can consume a considerable amount of staff time. We have addressed both of these problems by use of the Internet. An e-mailing program takes ASCII reports produced from the LIMS and attaches them automatically to e-mail messages. Each can be configured to the specific requirements of the recipient (e.g., the use of encryption, digital signatures, and the document format). World-Wide-Web access to appropriate LIMS databases allows clients to determine progress of samples without involving Laboratory personnel. Read-only access is available to a limited sub-set of data determined by the LIMS manager. Both applications have been created in portable languages, in a way that is suitable for many different environments.

8.

Background  

High-throughput laboratory techniques generate huge quantities of scientific data. Laboratory Information Management Systems (LIMS) are a necessary requirement, dealing with sample tracking, data storage and data reporting. Commercial LIMS solutions are available, but these can be both costly and overly complex for the task. The development of bespoke LIMS solutions offers a number of advantages, including the flexibility to fulfil all a laboratory's requirements at a fraction of the price of a commercial system. The programming language Perl is a perfect development solution for LIMS applications because of its powerful but simple-to-use database and web interaction; it is also well known for enabling rapid application development and deployment, and boasts a very active and helpful developer community. The development of an in-house LIMS from scratch, however, can take considerable time and resources, so programming tools that enable the rapid development of LIMS applications are essential, but there are currently no LIMS development tools for Perl.

9.
10.
Workflow Information Storage Toolkit (WIST) is a set of application programming interfaces and web applications that allow for the rapid development of customized laboratory information management systems (LIMS). WIST provides common LIMS input components, and allows them to be arranged and configured using a flexible language that specifies each component's visual and semantic characteristics. WIST includes a complete set of web applications for adding, editing and viewing data, as well as a powerful setup tool that can build new LIMS modules by analyzing existing database schema. Availability and implementation: WIST is implemented in Perl and may be obtained from http://vimss.sf.net under the BSD license.

11.
Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum.

12.
13.
GeneCards 2002: towards a complete, object-oriented, human gene compendium   Total citations: 3; self-citations: 0; citations by others: 3
MOTIVATION: In the post-genomic era, functional analysis of genes requires a sophisticated interdisciplinary arsenal. Comprehensive resources are challenged to provide consistently improving, state-of-the-art tools. RESULTS: GeneCards (Rebhan et al., 1998) has made innovative strides: (a) regular updates and enhancements incorporating new genes enriched with sequences, genomic locations, cDNA assemblies, orthologies, medical information, 3D protein structures, gene expression, and focused SNP summaries; (b) restructured software using object-oriented Perl and migration to schema-driven XML; and (c) pilot studies introducing methods to produce cards for novel and predicted genes.

14.
Rajavel M  Warrier T  Gopal B 《Proteins》2006,64(4):923-930
The advent of structural genomics has led to a dramatic increase in the number of structures deposited in the Protein Data Bank. The number of new folds, however, still remains a very small fraction of the total number of deposited structures. Recent data on the progress of the structural genomics initiative reveal that more than 85% of target proteins that progress to the stage of data collection and structure determination have a known fold. Enzymes, which tend to exploit reaction space while adopting a common stable scaffold, contribute significantly to this observation. Herein, we evaluate a method to examine the "old fold in a new dataset" scenario likely to be encountered in the structural genomics pipeline. We demonstrate that a fold detection strategy based on secondary structure signatures followed by molecular replacement using a minimalist model can be effectively used to solve the phase problem in X-ray crystallography without further recourse to heavy atom derivatives or multiple anomalous dispersion techniques. Three common folds were examined using this approach: the triosephosphate isomerase (TIM), adenine nucleotide alpha hydrolase-like (HUP), and RNA recognition motif (RRM) folds. The results presented herein also provide an estimate of the extent of phase information that can be derived from a single domain in a large multidomain structure.

15.
Chandonia JM  Kim SH  Brenner SE 《Proteins》2006,62(2):356-370
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb.

16.
Integral membrane proteins (iMPs) are challenging targets for structure determination because of the substantial experimental difficulties involved in their sample preparation. Accordingly, success rates of large-scale structural genomics consortia are much lower for this class of molecules compared to globular targets, underscoring the pressing need for predictive strategies to identify iMPs that are more likely to overcome laboratory bottlenecks. On the basis of the target status information available in the TargetDB repository, we describe the first large-scale analysis of experimental behavior of iMPs. Using information on recalcitrant and propagating iMP targets as negative and positive sets, respectively, we present naive Bayes classifiers capable of predicting, from sequence alone, those proteins that are more amenable to cloning, expression, and solubilization studies. Protein sequences are represented in the space of 72 features, including amino acid composition, occurrence of amino acid groups, ratios between residue groups, and hydrophobicity measures. Taking into account the unequal representation of main taxonomic groups in the TargetDB sequence database had a beneficial effect on the prediction results. The classifiers achieve accuracies of 70%, 63-70%, and 61% in predicting the amenability of iMPs for cloning, expression, and solubilization, respectively, thus making them useful tools in target selection for structure determination. Our assessment of prediction results clearly demonstrates that classifiers based on single features do not possess acceptable discriminative power and that the experimental behavior of iMPs is imprinted in their primary sequence through relationships between a restricted set of key properties. In most cases, sets of 10-20 protein features were found to be actually relevant, most notably the content of isoleucine, valine, and positively-charged residues.
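
The sequence-derived features named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' feature set or code: it computes only amino acid composition and three of the properties the abstract singles out (isoleucine, valine, and positively charged residue content). The function names and the choice of lysine and arginine as the "positively charged" group are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: "positively charged" is taken as lysine (K) and arginine (R);
# the abstract does not say whether histidine is included.
POSITIVE_RESIDUES = frozenset("KR")


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def key_features(sequence):
    """Return an illustrative three-feature subset of the composition."""
    comp = composition(sequence)
    return {
        "isoleucine": comp["I"],
        "valine": comp["V"],
        "positive": sum(comp[aa] for aa in POSITIVE_RESIDUES),
    }
```

A feature dictionary of this kind would be one row of the input to a classifier such as naive Bayes; the remaining features mentioned in the abstract (residue-group ratios, hydrophobicity measures) are not reproduced here.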

17.
A robust Laboratory Information Management System (LIMS) is required for the efficient handling of data generated from large-scale insertional mutagenesis projects. The Rice Gene Machine Information Management System (RGMIMS), a web-based modular LIMS, developed in a rice functional genomics laboratory at CSIRO, currently has four core modules: Seed Management, Transformation Management, Plant/Progeny Management, Phenotype Management, and an ad hoc querying module. RGMIMS manages, preserves and tracks large inventories of transgenic germplasm and enables efficient and accurate record keeping of the large quantities of experimental data. RGMIMS automates and seamlessly integrates multi-step experimental processes. A web user interface, incorporating barcoding utilities, enables rapid data capture and tracking of biological resources. Ontologies from Gramene and Plant Ontology consortium are used to describe mutant phenotypes. RGMIMS supports generic research processes in plant mutagenesis and could readily be adapted to general LIMS for high-throughput plant research.

18.
MOTIVATION: Obtaining soluble proteins in sufficient concentrations is a recurring limiting factor in various experimental studies. Solubility is an individual trait of proteins which, under a given set of experimental conditions, is determined by their amino acid sequence. Accurate theoretical prediction of solubility from sequence is instrumental for setting priorities on targets in large-scale proteomics projects. RESULTS: We present a machine-learning approach called PROSO to assess the chance of a protein to be soluble upon heterologous expression in Escherichia coli based on its amino acid composition. The classification algorithm is organized as a two-layered structure in which the output of primary support vector machine (SVM) classifiers serves as input for a secondary Naive Bayes classifier. Experimental progress information from the TargetDB database as well as previously published datasets were used as the source of training data. In comparison with previously published methods our classification algorithm possesses improved discriminatory capacity characterized by the Matthews Correlation Coefficient (MCC) of 0.434 between predicted and known solubility states and the overall prediction accuracy of 72% (75 and 68% for positive and negative class, respectively). We also provide experimental verification of our predictions using solubility measurements for 31 mutational variants of two different proteins.
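
For readers unfamiliar with the metrics quoted in the abstract above, the following Python sketch shows how the Matthews Correlation Coefficient, overall accuracy, and per-class accuracy are conventionally computed from a binary confusion matrix. It uses the standard textbook definitions; the function names are illustrative and nothing here is taken from the PROSO implementation.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional fallback when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def overall_accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def per_class_accuracy(tp, tn, fp, fn):
    """Return (positive-class accuracy, negative-class accuracy)."""
    return tp / (tp + fn), tn / (tn + fp)
```

Unlike overall accuracy, the MCC stays near zero for a classifier that simply predicts the majority class, which is why it is often reported alongside accuracy for unbalanced data sets.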

19.
Structural genomics projects are providing large quantities of new 3D structural data for proteins. To monitor the quality of these data, we have developed the protein structure validation software suite (PSVS), for assessment of protein structures generated by NMR or X-ray crystallographic methods. PSVS is broadly applicable for structure quality assessment in structural biology projects. The software integrates under a single interface analyses from several widely-used structure quality evaluation tools, including PROCHECK (Laskowski et al., J Appl Crystallog 1993;26:283-291), MolProbity (Lovell et al., Proteins 2003;50:437-450), Verify3D (Luthy et al., Nature 1992;356:83-85), ProsaII (Sippl, Proteins 1993;17: 355-362), the PDB validation software, and various structure-validation tools developed in our own laboratory. PSVS provides standard constraint analyses, statistics on goodness-of-fit between structures and experimental data, and knowledge-based structure quality scores in standardized format suitable for database integration. The analysis provides both global and site-specific measures of protein structure quality. Global quality measures are reported as Z scores, based on calibration with a set of high-resolution X-ray crystal structures. PSVS is particularly useful in assessing protein structures determined by NMR methods, but is also valuable for assessing X-ray crystal structures or homology models. Using these tools, we assessed protein structures generated by the Northeast Structural Genomics Consortium and other international structural genomics projects, over a 5-year period. Protein structures produced from structural genomics projects exhibit quality score distributions similar to those of structures produced in traditional structural biology projects during the same time period. However, while some NMR structures have structure quality scores similar to those seen in higher-resolution X-ray crystal structures, the majority of NMR structures have lower scores. Potential reasons for this "structure quality score gap" between NMR and X-ray crystal structures are discussed.
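
The idea of reporting a quality measure as a Z score calibrated against a reference set can be sketched in a few lines of Python. This is a generic illustration, not PSVS code: the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not describe the calibration procedure in detail.

```python
import statistics


def z_score(raw_score, reference_scores):
    """Express raw_score in standard deviations from the reference mean.

    reference_scores stands in for the raw scores of a calibration set,
    such as a set of high-resolution X-ray crystal structures.
    """
    if len(reference_scores) < 2:
        raise ValueError("need at least two reference scores")
    mean = statistics.mean(reference_scores)
    # Assumption: sample standard deviation (n - 1 denominator).
    spread = statistics.stdev(reference_scores)
    if spread == 0:
        raise ValueError("reference scores have zero spread")
    return (raw_score - mean) / spread
```

Whether a negative Z score indicates a worse-than-reference structure depends on the direction of the underlying raw score, which this sketch does not assume.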

20.
Liu J  Hegyi H  Acton TB  Montelione GT  Rost B 《Proteins》2004,56(2):188-200
A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families from currently five entirely sequenced eukaryotic target organisms based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common fold-like region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, utilizing homology to PrISM, Pfam-A, and SWISS-PROT, chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared suitable targets that were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.
