Similar articles
20 similar articles found
1.
A wealth of bioinformatics tools and databases has been created over the last decade and most are freely available to the general public. However, these valuable resources live a shadow existence compared to experimental results and methods that are widely published in journals and relatively easily found through publication databases such as PubMed. For the general scientist as well as bioinformaticists, these tools can deliver great value to the design and analysis of biological and medical experiments, but there is no inventory presenting an up-to-date and easily searchable index of all these resources. To remedy this, the BioWareDB search engine has been created. BioWareDB is an extensive and current catalog of software and databases of relevance to researchers in the fields of biology and medicine, and presently consists of 2800 validated entries. AVAILABILITY: BioWareDB is freely available over the Internet at http://www.biowaredb.org/

2.
Stockmarr A. Biometrics 1999, 55(3): 671-677
A crime has been committed, and a DNA profile of the perpetrator is obtained from the crime scene. A suspect with a matching profile is found. The problem of evaluating this DNA evidence in a forensic context, when the suspect is found through a database search, is analysed through a likelihood approach. The recommendations of the National Research Council of the U.S. are derived in this setting as the proper way of evaluating the evidence when finiteness of the population of possible perpetrators is not taken into account. When a finite population of possible perpetrators may be assumed, it is possible to take account of the sampling process that resulted in the actual database, so one can deal with the problem where a large proportion of the possible perpetrators belongs to the database in question. It is shown that the last approach does not in general result in a greater weight being assigned to the evidence, though it does when a sufficiently large proportion of the possible perpetrators is in the database. The value of the likelihood ratio corresponding to the probable cause setting constitutes an upper bound for this weight, and the upper bound is only attained when all but one of the possible perpetrators are in the database.

3.
Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a 'fingerprint' that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution.
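
The second PMF stage described above (searching a peptide mass list against theoretical fingerprints) can be sketched in software as follows. This is a minimal illustration, not the authors' FPGA design or their reference C routine: the function names, the toy sequences, the 0.2 Da tolerance, and the "count of matched masses" score are all assumptions made for the example. The residue masses are the standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(observed_masses, protein_sequence, tolerance=0.2):
    """Number of observed masses within `tolerance` Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein_sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.2):
    """Return the database entry whose theoretical fingerprint matches best."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance),
    )
```

Each database entry is scored independently of the others, which is the property that makes this kind of search amenable to the parallel hardware the abstract describes.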

4.
Halperin I, Ma B, Wolfson H, Nussinov R. Proteins 2002, 47(4): 409-443
The docking field has come of age. The time is ripe to present the principles of docking, reviewing the current state of the field. Two reasons are largely responsible for the maturity of the computational docking area. First, the early optimism that the very presence of the "correct" native conformation within the list of predicted docked conformations signals a near solution to the docking problem, has been replaced by the stark realization of the extreme difficulty of the next scoring/ranking step. Second, in the last couple of years more realistic approaches to handling molecular flexibility in docking schemes have emerged. As in folding, these derive from concepts abstracted from statistical mechanics, namely, populations. Docking and folding are interrelated. From the purely physical standpoint, binding and folding are analogous processes, with similar underlying principles. Computationally, the tools developed for docking will be tremendously useful for folding. For large, multidomain proteins, domain docking is probably the only rational way, mimicking the hierarchical nature of protein folding. The complexity of the problem is huge. Here we divide the computational docking problem into its two separate components. As in folding, solving the docking problem involves efficient search (and matching) algorithms, which cover the relevant conformational space, and selective scoring functions, which are both efficient and effectively discriminate between native and non-native solutions. It is universally recognized that docking of drugs is immensely important. However, protein-protein docking is equally so, relating to recognition, cellular pathways, and macromolecular assemblies. Proteins function when they are bound to other molecules. Consequently, we present the review from both the computational and the biological points of view. 
Although large, the review only partially covers the extensive body of literature on small-molecule (drug) docking and large-molecule protein-protein docking, both rigid and flexible. Unfortunately, when reviewing these, a major difficulty in assessing the results is the non-uniformity in the formats in which they are presented in the literature. Consequently, we further propose a way to rectify it here.

5.
Data produced from the MudPIT analysis of yeast (S. cerevisiae) and rice (O. sativa) were used to develop a technique to validate single-peptide protein identifications using complementary database search algorithms. This results in a considerable reduction of overall false-positive rates for protein identifications; the overall false discovery rates in yeast are reduced from near 25% to less than 1%, and the false discovery rate of yeast single-peptide protein identifications becomes negligible. This technique can be employed by laboratories utilizing a SEQUEST-based proteomic analysis platform, incorporating the XTandem algorithm as a complementary tool for verification of single-peptide protein identifications. We have achieved this using open-source software, including several data-manipulation software tools developed in our laboratory, which are freely available to download.
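
The idea of verifying single-peptide identifications with a second search engine can be illustrated with a small sketch. The abstract does not state the exact acceptance rule, so the rule below is an assumption: a protein supported by a single peptide in the primary search is kept only if the complementary search reports the same peptide for that protein. The function name and the dictionary layout are hypothetical and do not correspond to SEQUEST or XTandem output formats.

```python
def validate_single_peptide_hits(primary, secondary):
    """Filter single-peptide protein identifications using a second engine.

    primary, secondary: dict mapping protein ID -> set of peptide sequences
    reported by each search engine (hypothetical, simplified inputs).
    """
    validated = {}
    for protein, peptides in primary.items():
        if len(peptides) > 1:
            # Multi-peptide identifications are kept as they are.
            validated[protein] = set(peptides)
        elif peptides & secondary.get(protein, set()):
            # Single-peptide hit confirmed by the complementary engine.
            validated[protein] = set(peptides)
    return validated
```

For example, with `primary = {"P1": {"AAK", "GGR"}, "P2": {"LLK"}, "P3": {"MMR"}}` and `secondary = {"P2": {"LLK"}, "P3": {"TTK"}}`, the function keeps P1 and P2 and drops P3.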

6.
Protein identification is important in proteomics. Proteomic analyses based on mass spectra (MS) constitute innovative ways to identify the components of protein complexes. Instruments can obtain the mass spectrum to an accuracy of 0.01 Da or better, but identification errors are inevitable. This study presents a novel tool, MultiProtIdent, which can identify proteins using additional information about protein-protein interactions and protein functional associations. Both single and multiple Peptide Mass Fingerprints (PMFs) are input to MultiProtIdent, which matches the PMFs to a theoretical peptide mass database. The relationships or interactions among proteins are considered to reduce false positives in PMF matching. Experiments to identify protein complexes reveal that MultiProtIdent is highly promising. The website associated with this study is http://dbms104.csie.ncu.edu.tw/.

7.
Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.

8.
9.
10.
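Development of microsatellite markers for shiitake mushroom by database searching and ISSR-suppression PCR   Total citations: 1, self-citations: 1, citations by others: 1
Microsatellite markers for shiitake mushroom were developed by database searching and by ISSR-suppression PCR. Database searching yielded 21 primer pairs, 11 of which were polymorphic, with an average of 3.3 alleles per locus; ISSR-suppression PCR yielded 8 primer pairs, 5 of which were polymorphic, with an average of 3 alleles per locus. The results show that both methods are effective for SSR development in shiitake.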

11.
Lipid mediators (LMs) derived from PUFAs play important roles in health and disease. Databases and search algorithms are crucial, but currently unavailable, for accurate and prompt analysis of LMs via liquid chromatography-ultraviolet-tandem mass spectrometry (LC-UV-MS/MS). A novel algorithm and databases, cognoscitive-contrast-angle algorithm and databases (COCAD), were developed for the identification of LMs based on the integration of standard MS/MS spectra with chromatograms and UV spectra. Segment naming and empirical fragmentation rules were introduced to determine MS/MS ion identities, along with ion intensities used by COCAD in matching the unknown to those of authentic standards. The structures of potential LMs without synthetic and/or authentic products as standards were identified by developing theoretical databases and algorithms based on virtual LC-UV-MS/MS spectra and chromatograms. The performance of these databases and algorithms was tested by identifying LMs in murine tissues. These results indicate that COCAD has many advantages for profiling and identification of LMs compared with the conventional dot-product algorithm.
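
For orientation, the conventional dot-product comparison mentioned at the end of the abstract, and the spectral contrast angle derived from it, can be sketched as below. This is the generic textbook form, not COCAD itself, whose segment naming and fragmentation rules are not described here. The sketch assumes both spectra have already been binned onto the same m/z grid, so each is a plain list of intensities; the function names are illustrative.

```python
import math


def dot_product_score(a, b):
    """Normalized dot product (cosine) of two intensity vectors."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator if denominator else 0.0


def contrast_angle_degrees(a, b):
    """Spectral contrast angle: 0 for identical shapes, 90 for no shared peaks."""
    cosine = max(-1.0, min(1.0, dot_product_score(a, b)))  # guard rounding error
    return math.degrees(math.acos(cosine))
```

Because the angle depends only on the direction of the intensity vectors, scaling one spectrum by a constant factor leaves the score unchanged.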

12.

Background  

A cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see if the genes in the same clusters can be functionally correlated. While past successes of such analyses have often been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus the Pearson's correlation coefficient as a measure of dissimilarity), such groupings can often be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary since they also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species.
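
The dissimilarity measure named in the abstract, one minus the Pearson correlation coefficient, can be computed as in the following sketch. The function names are illustrative, and the code assumes every expression profile has the same length and is not constant (a constant profile has zero variance and would cause a division by zero).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def dissimilarity_matrix(profiles):
    """Symmetric matrix of 1 - r for every pair of expression profiles."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - pearson(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

The values range from 0 (perfectly correlated profiles) to 2 (perfectly anti-correlated). A matrix like this is the usual input to an average-linkage (UPGMA) hierarchical clustering routine.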

13.
A new method based on a mathematically natural local search framework for max cut is developed to uncover functionally coherent module and BPM motifs in high-throughput genetic interaction data. Unlike previous methods, which also consider physical protein-protein interaction data, our method utilizes genetic interaction data only; this becomes increasingly important as high-throughput genetic interaction data is becoming available in settings where less is known about physical interaction data. We compare modules and BPMs obtained to previous methods and across different datasets. Despite needing no physical interaction information, the BPMs produced by our method are competitive with previous methods. Biological findings include a suggested global role for the prefoldin complex and a SWR subcomplex in pathway buffering in the budding yeast interactome.
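
As background for the local search framework mentioned above, the classic one-flip local search for max cut is sketched below. It is the generic heuristic only; how the authors adapt it to extract modules and BPMs is not described in the abstract, and the data layout (edge weights keyed by `frozenset` pairs) and function names are assumptions for the example.

```python
def local_search_max_cut(weights, nodes):
    """One-flip local search for max cut.

    weights: dict mapping frozenset({u, v}) -> edge weight.
    Returns a dict mapping each node to side 0 or 1.
    """
    side = {v: 0 for v in nodes}

    def gain(v):
        # Flipping v turns same-side edges into cut edges and vice versa.
        total = 0.0
        for u in nodes:
            if u == v:
                continue
            w = weights.get(frozenset((u, v)), 0.0)
            total += w if side[u] == side[v] else -w
        return total

    improved = True
    while improved:
        improved = False
        for v in nodes:
            if gain(v) > 1e-12:  # flip only on a strict improvement
                side[v] ^= 1
                improved = True
    return side


def cut_weight(weights, side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(
        w for edge, w in weights.items()
        if len({side[v] for v in edge}) == 2
    )
```

Every flip strictly increases the cut weight, so the loop terminates at a partition that no single move can improve. Such a local optimum is not guaranteed to be the global maximum.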

14.
MOTIVATION: It is widely recognized that homology search and ortholog clustering are very useful for analyzing biological sequences. However, recent growth of sequence database size makes homolog detection difficult, and rapid and accurate methods are required. RESULTS: We present a novel method for fast and accurate homology detection, assuming that the Smith-Waterman (SW) scores between all similar sequence pairs in a target database are computed and stored. In this method, SW alignment is computed only if the upper bound, which is derived from our novel inequality, is higher than the given threshold. In contrast to other methods such as FASTA and BLAST, this method is guaranteed to find all sequences whose scores against the query are higher than the specified threshold. Results of computational experiments suggest that the method is dozens of times faster than SSEARCH if genome sequence data of closely related species are available.
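
The filter-then-align pattern described above can be illustrated with a small sketch. The authors' inequality is not given in the abstract, so the code substitutes a trivial bound as a stand-in: with a positive match score, no local alignment can score more than `match` times the length of the shorter sequence. The scoring scheme (match 2, mismatch -1, linear gap -2) and the function names are likewise assumptions for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def search(query, database, threshold, match=2, mismatch=-1, gap=-2):
    """Return every entry scoring above `threshold`, skipping hopeless ones."""
    hits = {}
    for name, sequence in database.items():
        # Stand-in bound (NOT the paper's inequality): each aligned
        # column contributes at most `match`.
        upper_bound = match * min(len(query), len(sequence))
        if upper_bound <= threshold:
            continue  # the score cannot exceed the threshold
        score = smith_waterman_score(query, sequence, match, mismatch, gap)
        if score > threshold:
            hits[name] = score
    return hits
```

Because an entry is skipped only when its upper bound rules it out, the search still returns every sequence scoring above the threshold, which is the guarantee the abstract contrasts with heuristic tools. A tighter bound skips more alignments without changing the result.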

15.
16.
Location of functional binding pockets of bioactive ligands on protein molecules is essential in structural genomics and drug design projects. If the experimental determination of ligand-protein complex structures is complicated, blind docking (BD) and pocket search (PS) calculations can help in the prediction of atomic resolution binding mode and the location of the pocket of a ligand on the entire protein surface. Whereas the number of successful predictions by these methods is increasing even for the complicated cases of exosites or allosteric binding sites, their reliability has not been fully established. For a critical assessment of reliability, we use a set of ligand-protein complexes, which were found to be problematic in previous studies. The robustness of BD and PS methods is addressed in terms of success of the selection of truly functional pockets from among the many putative ones identified on the surfaces of ligand-bound and ligand-free (holo and apo) protein forms. Issues related to BD such as effect of hydration, existence of multiple pockets, and competition of subsidiary ligands are considered. Practical cases of PS are discussed and categorized, and strategies are recommended for handling the different situations. PS can be used in conjunction with BD, as we find that a consensus approach combining the techniques improves predictive power.

17.

Background  

In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task.
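
To make the cost concrete, a straightforward PSSM search slides the matrix along the sequence and sums one score per motif position at every offset. The sketch below shows this baseline only; the representation of the matrix as a list of per-position dictionaries and the function name are assumptions for the example, and any faster method the article goes on to describe is not shown.

```python
def scan_pssm(pssm, sequence, threshold):
    """Report (start, score) for every window scoring at least `threshold`.

    pssm: list with one dict per motif position, mapping symbol -> score.
    """
    motif_length = len(pssm)
    hits = []
    for start in range(len(sequence) - motif_length + 1):
        score = sum(
            column[sequence[start + offset]]
            for offset, column in enumerate(pssm)
        )
        if score >= threshold:
            hits.append((start, score))
    return hits
```

For a motif of length m and a sequence of length n, this baseline performs about m × n score lookups, which is why scanning complete genomes with many matrices becomes expensive.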

18.
The identification of protein biochemical functions based on their three-dimensional structures is strongly required in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrostatic surface of functional-site in proteins). Using this system, functional sites similar to those of phosphoenolpyruvate carboxykinase were detected in several mononucleotide-binding proteins, which have different folds. We also applied our method to a hypothetical protein, MJ0226 from Methanococcus jannaschii, and detected the mononucleotide binding site from the similarity to other proteins having different folds.

19.
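The dbSNP database provides researchers with a wealth of SNP information, and making full use of its resources can substantially reduce research costs and improve research efficiency. Drawing on the work of our laboratory, we explored how to search and use the chicken dbSNP database. We conclude that, depending on the research objective, searching and using dbSNP needs to be combined with other databases.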

20.
The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records as well as the relationships between them are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb, that utilizes a SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO database accessible with the full power of SQL-based queries from within R. AVAILABILITY: The web interface and SQLite databases are available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website.
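
The kind of relational query that a SQLite export of metadata makes possible can be sketched with Python's standard `sqlite3` module. The example builds a tiny in-memory database so that it runs on its own. The table names, column names, and records below are hypothetical toy data and are not the GEOmetadb schema; against a real export, the table and column names would need to be taken from the package documentation.

```python
import sqlite3

# Toy, hypothetical schema: series, samples, and the links between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series (series_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE sample (sample_id TEXT PRIMARY KEY, organism TEXT);
    CREATE TABLE series_sample (series_id TEXT, sample_id TEXT);

    INSERT INTO series VALUES ('S1', 'Toy series one'), ('S2', 'Toy series two');
    INSERT INTO sample VALUES ('M1', 'Homo sapiens'),
                              ('M2', 'Homo sapiens'),
                              ('M3', 'Mus musculus');
    INSERT INTO series_sample VALUES ('S1', 'M1'), ('S1', 'M2'), ('S2', 'M3');
""")


def series_for_organism(connection, organism):
    """Series that contain at least one sample from the given organism."""
    query = """
        SELECT DISTINCT s.series_id, s.title
        FROM series AS s
        JOIN series_sample AS link ON link.series_id = s.series_id
        JOIN sample AS m ON m.sample_id = link.sample_id
        WHERE m.organism = ?
        ORDER BY s.series_id
    """
    return connection.execute(query, (organism,)).fetchall()
```

Joining records through their stored relationships in a single statement is the sort of query the abstract says is not available through the NCBI tools.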
