Similar literature
 20 similar documents found
1.
A procedure that automatically provides an evaluation of the diagnostic ability of a protein sequence functional pattern is described. The procedure relies on the identification of the closest definable set in terms of a (protein sequence) database functional annotation to the set of database instances containing a given pattern. Assuming annotation correctness and completeness in the protein sequence database, the degree of statistical association between these sets provides an appropriate measure of the diagnostic ability of the pattern. An experimental implementation of the procedure, using the NBRF/PIR protein database, has been applied to a diverse collection of published sequence patterns. Results obtained reveal that frequently it is not possible to define (in NBRF/PIR database terminology) the set of database instances containing a given pattern, suggesting either lack of pattern diagnostic ability or protein database annotation incompleteness and/or inconsistencies. Received on November 30, 1989; accepted on July 20, 1990

2.
MOTIVATION: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than are searches that use simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST). RESULTS: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated PSSMs. We illustrate the use of IMPALA to search a database of PSSMs for protein folds, and one for protein domains involved in signal transduction. IMPALA's sensitivity to distant biological relationships is very similar to that of PSI-BLAST. However, IMPALA employs a more refined analysis of statistical significance and, unlike PSI-BLAST, guarantees the output of the optimal local alignment by using the rigorous Smith-Waterman algorithm. Also, it is considerably faster when run with a large database of PSSMs than is BLAST or PSI-BLAST when run against the complete non-redundant protein database.
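To make the sequence-versus-PSSM comparison concrete, here is a minimal Python sketch of Smith-Waterman local alignment scoring against a position-specific score matrix. It is not IMPALA's implementation: the function name, the linear gap penalty, the default score of -1 for residues missing from a column, and the toy matrix are all assumptions made for illustration.

```python
def smith_waterman_pssm(query, pssm, gap=-2):
    """Return the best local alignment score of `query` against `pssm`.

    `pssm` is a list of dicts, one per profile column, mapping a residue
    letter to its score in that column (hypothetical layout). Residues
    absent from a column score -1. A linear gap penalty is assumed here
    purely to keep the sketch short.
    """
    cols = len(pssm)
    prev = [0] * (cols + 1)  # previous row of the dynamic-programming table
    best = 0
    for residue in query:
        cur = [0] * (cols + 1)
        for j in range(1, cols + 1):
            diagonal = prev[j - 1] + pssm[j - 1].get(residue, -1)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diagonal, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Toy three-column profile that favours the motif A-C-D.
toy_pssm = [{"A": 3}, {"C": 4}, {"D": 2}]
score = smith_waterman_pssm("XACDX", toy_pssm)
```

Because every cell is floored at zero, the flanking X residues do not reduce the score of the embedded A-C-D match.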

3.
We have developed a procedure to predict the peptide binding specificity of an SH3 domain from its sequence. The procedure utilizes information extracted from position-specific contacts derived from six SH3/peptide or SH3/protein complexes of known structure. The framework of SH3/peptide contacts defined on the structure of the complexes is used to build a residue-residue interaction database derived from ligands obtained by panning peptide libraries displayed on filamentous phage. The SH3-specific interaction database is a multidimensional array containing frequencies of position-specific contacts. As input, SH3-SPOT requires the sequence of an SH3 domain and of a query decapeptide ligand. The array, which we call the SH3-specific matrix, is then used to evaluate the probability that the peptide would bind the given SH3 domain. This procedure is fast enough to be applied to the entire protein sequence database. Panning experiments were performed to search for putative specific ligands of different SH3 domains in a database of decapeptides, or in a database of protein sequences. The procedure ranked some of the natural interaction partners of a number of SH3 domains among the best ligands of the approximately 5.6 x 10^9 different decapeptides in the SWISSPROT database. We expect the predictive power of the method to increase with the enrichment of the SH3-specific matrix by interaction data derived from new complex structures or from the characterization of new ligands. The procedure was developed using the SH3 domain family as a test case, but its application can easily be extended to other families of protein domains (such as SH2, MHC, EH, PDZ, etc.).

4.
Identification of novel kinases based on their sequence conservation within the kinase catalytic domain has relied so far on two major approaches: low-stringency hybridization of cDNA libraries, and PCR using degenerate primers. Both of these approaches are at times technically difficult and time-consuming. We have developed a procedure that can significantly reduce the time and effort involved in searching for novel kinases and increase the sensitivity of the analysis. This procedure exploits the computer analysis of a vast resource of human cDNA sequences represented in the expressed sequence tag (EST) database. Seventeen novel human cDNA clones showing significant homology to serine/threonine kinases, including STE-20, CDK- and YAK-related family kinases, were identified by searching the EST database. Further sequence analysis of these novel kinases, obtained either directly from EST clones or from PCR-RACE products, confirmed their identity as protein kinases. Given the rapid accumulation of the EST database and the advent of powerful computer analysis software, this approach provides a fast, sensitive, and economical way to identify novel kinases as well as other genes from the EST database.

5.
Estimated breeding values (EBVs) and genomic enhanced breeding values (GEBVs) for milk production of young genotyped Holstein bulls were predicted using a conventional BLUP – Animal Model, a method fitting regression coefficients for loci (RRBLUP), a method utilizing the realized genomic relationship matrix (GBLUP), by a single-step procedure (ssGBLUP) and by a one-step blending procedure. Information sources for prediction were the nation-wide database of domestic Czech production records in the first lactation combined with deregressed proofs (DRP) from Interbull files (August 2013) and domestic test-day (TD) records for the first three lactations. Data from 2627 genotyped bulls were used, of which 2189 were already proven under domestic conditions. Analyses were run that used Interbull values for genotyped bulls only or that used Interbull values for all available sires. Resultant predictions were compared with GEBV of 96 young foreign bulls evaluated abroad and whose proofs were from Interbull method GMACE (August 2013) on the Czech scale. Correlations of predictions with GMACE values of foreign bulls ranged from 0.33 to 0.75. Combining domestic data with Interbull EBVs improved prediction of both EBV and GEBV. Predictions by Animal Model (traditional EBV) using only domestic first lactation records and GMACE values were correlated by only 0.33. Combining the nation-wide domestic database with all available DRP for genotyped and un-genotyped sires from Interbull resulted in an EBV correlation of 0.60, compared with 0.47 when only Interbull data were used. In all cases, GEBVs had higher correlations than traditional EBVs, and the highest correlations were for predictions from the ssGBLUP procedure using combined data (0.75), or with all available DRP from Interbull records only (one-step blending approach, 0.69). The ssGBLUP predictions using the first three domestic lactation records in the TD model were correlated with GMACE predictions by 0.69, 0.64 and 0.61 for milk yield, protein yield and fat yield, respectively.

6.
Pan XM. Proteins 2001, 43(3): 256-259
In the present work, a novel method was proposed for prediction of secondary structure. Over a database of 396 proteins (CB396) with a three-state-defining secondary structure, this method with jackknife procedure achieved an accuracy of 68.8% and SOV score of 71.4% using single sequence and an accuracy of 73.7% and SOV score of 77.3% using multiple sequence alignments. Combination of this method with DSC, PHD, PREDATOR, and NNSSP gives Q3 = 76.2% and SOV = 79.8%.
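For readers unfamiliar with the Q3 figure quoted above, the following Python sketch shows the standard per-residue three-state accuracy calculation. The function name and the example strings are hypothetical, and the SOV score, which rewards segment overlap rather than residue identity, is not reproduced here.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches
    the observed state. Both arguments are equal-length strings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical five-residue example: four of five states agree.
example_q3 = q3("HHEEC", "HHECC")
```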

7.
Hartmann C, Antes I, Lengauer T. Proteins 2009, 74(3): 712-726
We describe a scoring and modeling procedure for docking ligands into protein models that have either modeled or flexible side-chain conformations. Our methodological contribution comprises a procedure for generating new potentials of mean force for the ROTA scoring function, which we have introduced previously for optimizing side-chain conformations with the tool IRECS. The ROTA potentials are specially trained to tolerate small-scale positional errors of atoms that are characteristic of (i) side-chain conformations that are modeled using a sparse rotamer library and (ii) ligand conformations that are generated using a docking program. We generated both rigid and flexible protein models with our side-chain prediction tool IRECS and docked ligands to proteins using the scoring function ROTA and the docking programs FlexX (for rigid side chains) and FlexE (for flexible side chains). We validated our approach on the forty screening targets of the DUD database. The validation shows that the ROTA potentials are especially well suited for estimating the binding affinity of ligands to proteins. The results also show that our procedure can compensate for the performance decrease in screening that occurs when using protein models with side chains modeled with a rotamer library instead of using X-ray structures. The average runtime per ligand of our method is 168 seconds on an Opteron V20z, which is fast enough to allow virtual screening of compound libraries for drug candidates.

8.
Glycogen synthase kinase-3 (GSK-3beta) has been emerging as a key therapeutic target for type 2 diabetes, Alzheimer's disease, cancer, and chronic inflammation. To find biologically active, novel compounds and to provide new ideas for drug design, we performed virtual screening using a commercially available database. A three-dimensional common feature pharmacophore model was developed using the HipHop program provided in the Catalyst software and was used as a query for screening the database. A recursive partitioning (RP) model was developed as a filtering system able to classify active and inactive compounds. Eventually, a sequential virtual screening procedure (SQSP) was conducted by applying the common feature pharmacophore and RP model in succession to discover novel potent GSK-3beta inhibitors. The final 56 hit compounds were carefully selected considering the predicted docking mode in crystal structures. A subsequent enzyme assay for human GSK-3beta protein confirmed that three of these hit compounds exhibit micromolar inhibitory activity. Here, we report novel hit compounds and their binding mode in the active site of the GSK-3beta crystal structure.

9.
Chemokine receptor 2 (CCR2) is a G-protein coupled receptor (GPCR) and a crucial target for various inflammatory and autoimmune diseases. Structure-based antagonist design for many GPCRs, including CCR2, is restricted by the lack of an experimental three-dimensional structure. Homology modeling is widely used for the study of GPCR-ligand binding. Since there is substantial diversity in the ligand binding pocket and binding modes among GPCRs, receptor-ligand binding mode predictions should be derived from homology modeling supported by ligand information. Thus, we modeled the binding of our proprietary CCR2 antagonist using ligand-supported homology modeling, followed by consensus scoring of the docking evaluation based on all modeled binding sites. The protein-ligand model was then validated by visual inspection of the receptor-ligand interaction for consistency with published site-directed mutagenesis data and by virtual screening of a decoy compound database. This model was able to successfully identify active compounds within the decoy database. Finally, additional hit compounds were identified through a docking-based virtual screening of a commercial database, followed by a biological assay to validate CCR2 inhibitory activity. Thus, this procedure can be employed to screen a large database of compounds to identify new CCR2 antagonists.

10.
Eriksson J, Fenyö D. Proteomics 2002, 2(3): 262-270
A rapid and accurate method for testing the significance of protein identities determined by mass spectrometric analysis of protein digests and genome database searching is presented. The method is based on direct computation using a statistical model of the random matching of measured and theoretical proteolytic peptide masses. Protein identification algorithms typically rank the proteins of a genome database according to a score based on the number of matches between the masses obtained by mass spectrometry analysis and the theoretical proteolytic peptide masses of a database protein. The random matching of experimental and theoretical masses can cause false results. A result is significant only if the score characterizing the result deviates significantly from the score expected from a false result. A distribution of the score (number of matches) for random (false) results is computed directly from our model of the random matching, which allows significance testing under any experimental and database search constraints. In order to mimic protein identification data quality in large-scale proteome projects, low-to-high quality proteolytic peptide mass data were generated in silico and subsequently submitted to a database search program designed to include significance testing based on direct computation. This simulation procedure demonstrates the usefulness of direct significance testing for automatically screening for samples that must be subjected to peptide sequence analysis by e.g. tandem mass spectrometry in order to determine the protein identity.
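The idea of scoring a result against the distribution expected from random matching can be illustrated with a deliberately simplified Python sketch. It assumes, purely for illustration, that each measured mass matches a given database protein at random and independently with the same probability, so the number of false matches is binomial. This is not the authors' statistical model, and the function and parameter names are hypothetical.

```python
from math import comb


def random_match_pvalue(n_masses, k_matches, p_random):
    """Probability of seeing at least `k_matches` matches among `n_masses`
    measured peptide masses if every match were a random coincidence.

    Assumes independent matches with equal probability `p_random`
    (a simplification chosen for this sketch, not taken from the paper).
    """
    return sum(
        comb(n_masses, i) * p_random**i * (1 - p_random) ** (n_masses - i)
        for i in range(k_matches, n_masses + 1)
    )


# Hypothetical numbers: 20 measured masses, 8 matched, 5% random match rate.
p_value = random_match_pvalue(20, 8, 0.05)
is_significant = p_value < 0.01
```

Under this toy model, an identification would be kept only when the tail probability falls below a chosen threshold; otherwise the sample would be flagged for further analysis.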

11.
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS and on a database that contains different domain definitions than those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.

12.
With established ampelographic techniques for grapevine identification it is often difficult to achieve a satisfactory, objective result. We have developed a DNA typing system using sequence-tagged microsatellite site markers as a means of differentiating cultivars of grapevine. A semi-automated analysis procedure was linked to an electronic database and found to be an objective and reliable system for cultivar identification using this simple marker type. The accumulated DNA typing data from over eighty cultivars demonstrated that cultivars that are difficult to differentiate phenotypically using ampelographic techniques can be distinguished by DNA typing. Parentage analysis uncovered errors in parent assignment of cultivar identification in specific cases. The electronic database has a conservative format to take into account the occurrence of null alleles and the possibility of missed alleles. Computer-assisted comparisons of cultivars in the database can be performed, and various approaches for estimating the match probability that two unrelated cultivars have the same genotype simply due to chance are discussed. We suggest that further development of the database through international co-operation using standardised sequence-tagged site markers offers the possibility of achieving a universal grapevine identification system.

13.
Trait‐based approaches are increasingly used as a proxy for understanding the relationship between biodiversity and ecosystem functioning. Macrobenthic fauna are considered one of the major providers of ecosystem functions in marine soft sediments; however, several gaps persist in the knowledge of their trait classification, limiting the potential use of functional assessments. While trait databases are available for the well‐studied North Atlantic benthic fauna, no such trait classification system exists for Australia. Here, we present the South Australian Macrobenthic Traits (SAMT) database, the first comprehensive assessment of macrobenthic fauna traits in temperate Australian waters. The SAMT database includes 13 traits and 54 trait‐modalities (e.g., life history, morphology, physiology, and behavior), and is based on records of macrobenthic fauna from South Australia. We provide trait information for more than 250 macrobenthic taxa, including outcomes from a fuzzy coding procedure, as well as an R package for using and analyzing the SAMT database. The establishment of the SAMT constitutes the foundation for a comprehensive macrobenthic trait database for the wider southern Australian region that could facilitate future research on functional perspectives, such as assessments of functional diversity and changes to ecosystem functioning.

14.
We have evaluated 271 accessions corresponding to 118 European cultivars, 96 from Spain, 16 from Italy, four from France and two from Portugal with the following objectives: (1) to provide a European database based on reference simple sequence repeats (SSRs) and (2) to define a core collection. A set of 24 highly polymorphic SSRs were used for the genetic analysis. Two main clusters were identified using a model-based Bayesian procedure, which correspond to Spanish and Italian cultivar clusters, with the latter showing a higher genetic diversity. An additional genetic substructure was observed among five different groups of cultivars. A core collection with a minimum of 37 cultivars was selected. We provided a database including 132 European accessions with unique genotypes evaluated with 24 SSRs as a reference for distinction, registering and traceability. Finally, we found that a core collection based on 14% of the total accessions conserves all allelic diversity.
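One way to picture how a small core collection can conserve all allelic diversity is a greedy selection that repeatedly adds the accession contributing the most alleles not yet covered. The Python sketch below is only an illustration of that idea: the abstract does not say which selection method was used, and the function name, data layout, and toy genotypes are all hypothetical.

```python
def greedy_core_collection(genotypes):
    """Pick accessions until every observed allele is represented.

    `genotypes` maps an accession name to the set of (locus, allele)
    pairs it carries. Ties are broken alphabetically so the result is
    deterministic. Greedy selection is a heuristic and does not
    guarantee the smallest possible core.
    """
    remaining = set().union(*genotypes.values())
    core = []
    while remaining:
        best = max(sorted(genotypes), key=lambda name: len(genotypes[name] & remaining))
        core.append(best)
        remaining -= genotypes[best]
    return core


# Toy data: three accessions typed at a single SSR locus.
toy_genotypes = {
    "acc_A": {("ssr1", 120), ("ssr1", 124)},
    "acc_B": {("ssr1", 124), ("ssr1", 130)},
    "acc_C": {("ssr1", 130)},
}
core = greedy_core_collection(toy_genotypes)
```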

15.
The Tropical Biominer Project is a recent initiative from the Federal University of Minas Gerais (UFMG) and the Oswaldo Cruz Foundation, with the participation of the Biominas Foundation (Belo Horizonte, Minas Gerais, Brazil) and the start-up Homologix. The main objective of the project is to build a new resource for chemogenomics research on chemical compounds, with a strong emphasis on natural molecules. Adopted technologies include the search of information from structured, semi-structured, and non-structured documents (the last two from the web) and datamining tools in order to gather information from different sources. The database is the support for developing applications to find new potential treatments for parasitic infections by using virtual screening tools. We present here the midpoint of the project: the conception and implementation of the Tropical Biominer Database. This is a Federated Database designed to store data from different resources. Connected to the database, a web crawler is able to gather information from distinct, patented web sites and store them after automatic classification using datamining tools. Finally, we demonstrate the interest of the approach by formulating new hypotheses on specific targets of a natural compound, violacein, using inferences from a Virtual Screening procedure.

16.
The E-MSD macromolecular structure relational database (http://www.ebi.ac.uk/msd) is designed to be a single access point for protein and nucleic acid structures and related information. The database is derived from Protein Data Bank (PDB) entries. Relational database technologies are used in a comprehensive cleaning procedure to ensure data uniformity across the whole archive. The search database contains an extensive set of derived properties, goodness-of-fit indicators, and links to other EBI databases including InterPro, GO, and SWISS-PROT, together with links to SCOP, CATH, PFAM and PROSITE. A generic search interface is available, coupled with a fast secondary structure domain search tool.

17.
Flavonoids are polyphenolic compounds that occur ubiquitously in foods of plant origin. Some of these molecules exhibit various physiological activities. Among existing drugs, there are a huge number of compounds bearing a flavonoid-related skeleton. Because of the relevance for pharmaceutical research, it would be beneficial to collect these compounds into a database. Recently, various databases of chemicals were compiled to help biological and/or chemical research, but no comprehensive database of flavonoids with chemical structures and physicochemical parameters, supposedly related to their activity, is available yet. The aim of this research was to merge the information about flavonoids of plant origin and flavonoids used as medicines into a database. Moreover, predictions of activities against various targets were performed using a virtual screening procedure to demonstrate a possible application of the database for pharmaceutical research.

18.
We propose a self-consistent approach to analyze knowledge-based atom-atom potentials used to calculate protein-ligand binding energies. Ligands complexed to actual protein structures were first built using the SMoG growth procedure (DeWitte & Shakhnovich, 1996) with a chosen input potential. These model protein-ligand complexes were used to construct databases from which knowledge-based protein-ligand potentials were derived. We then tested several different modifications to such potentials and evaluated their performance on their ability to reconstruct the input potential using the statistical information available from a database composed of model complexes. Our data indicate that the most significant improvement resulted from properly accounting for the following key issues when estimating the reference state: (1) the presence of significant nonenergetic effects that influence the contact frequencies and (2) the presence of correlations in contact patterns due to chemical structure. The most successful procedure was applied to derive an atom-atom potential for real protein-ligand complexes. Despite the simplicity of the model (pairwise contact potential with a single interaction distance), the derived binding free energies showed a statistically significant correlation (approximately 0.65) with experimental binding scores for a diverse set of complexes.

19.
A four-step procedure for the efficient and systematic mining of whole EST libraries for differentially expressed genes is presented. After eliminating redundant entries from the EST library under investigation (step 1), contigs of maximal length are built upon each remaining EST using about 4 000 000 public and proprietary ESTs (step 2). These putative genes are compared against a database comprising ESTs from 16 different tissues (both normal and tumour affected) to determine whether or not they are differentially expressed (step 3; electronic northern). Fisher's exact test is used to assess the significance of differential expression. In step 4, an attempt is made to characterise the contigs obtained in the assembly through database comparison. A case study of the CGAP library NCI_CGAP_Br1.1, a library made from three (well, moderately, and poorly differentiated) invasive ductal breast tumours (2126 ESTs in total) was carried out. Of the maximal contigs, 139 were found to be significantly (alpha = 0.05) over-expressed in breast tumour tissue, while 13 appeared to be down-regulated.
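The electronic northern in step 3 amounts to testing a 2x2 table of EST counts. The Python sketch below computes a one-sided Fisher's exact test for over-expression using only the standard library. Whether the authors used a one-sided or two-sided test is not stated, so the one-sided form is an assumption, as are the function name and the example counts.

```python
from math import comb


def fisher_over_expression(a, b, c, d):
    """One-sided Fisher's exact test p-value for a 2x2 table.

    a: ESTs of the contig in the tumour library
    b: all other ESTs in the tumour library
    c: ESTs of the contig in the normal library
    d: all other ESTs in the normal library

    Returns the probability, under the hypergeometric null, of seeing
    `a` or more contig ESTs in the tumour library.
    """
    row1 = a + b    # size of the tumour library
    col1 = a + c    # total ESTs belonging to the contig
    total = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, x) * comb(total - row1, col1 - x) for x in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


# Hypothetical counts: 3 contig ESTs, all found in the tumour library.
p_over = fisher_over_expression(3, 0, 0, 3)
```

A contig would be reported as over-expressed when this p-value falls below the chosen alpha (0.05 in the case study); down-regulation can be tested by swapping the two libraries.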

20.
A novel database and modified alignment program are described which provide a fast and accurate procedure for assigning nucleotide sequences to allele types for multi-locus sequence analysis (MLSA). The database has between 40 and 160 alleles per organism, including Neisseria meningitidis, Streptococcus pneumoniae, Staphylococcus aureus and Haemophilus influenzae. The database directly compares the query nucleotide sequence against all alleles within the database, and this system reduces the time taken for the analysis of nucleotide sequence data and assignment of alleles for subsequent sequence analysis.
