Similar articles
20 similar articles found
1.
MOTIVATION: Large-scale gene discovery projects depend on highly accurate data. The data must not only be trustworthy but also be correctly annotated for the features they contain. Sequencing errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing, and these errors further complicate the automated identification of EST sequence features. A tool is therefore required to prepare the data prior to advanced annotation processing and submission to public databases. RESULTS: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows a sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and in the fidelity of results submitted to GenBank. AVAILABILITY: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html
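The quality-gating idea can be illustrated with a toy filter (a generic sketch, not ESTprep's actual criteria; the poly-A trimming rule and the length and ambiguity thresholds are invented for illustration):

```python
def passes_quality(seq, min_len=100, max_n_frac=0.05):
    """Toy single-pass EST filter: crude poly-A tail trim, then reject
    reads that are too short or contain too many ambiguous bases (N)."""
    seq = seq.upper().rstrip("A")  # crude poly-A tail removal
    if len(seq) < min_len:
        return False
    return seq.count("N") / len(seq) <= max_n_frac
```

A real preprocessor would also screen for vector and adapter contamination before length checks.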

2.
MOTIVATION: Rapid, automated means of organizing biological data are required if we hope to keep abreast of the flood of data emanating from sequencing, microarray and similar high-throughput analyses. Faced with the need to validate the annotation of thousands of sequences and to generate biologically meaningful classifications based on the sequence data, we turned to statistical methods in order to automate these processes. RESULTS: An algorithm for automated classification based on evolutionary distance data was written in S. The algorithm was tested on a dataset of 1436 small subunit ribosomal RNA sequences and was able to classify the sequences according to an extant scheme, use statistical measurements of group membership to detect sequences that were misclassified within this scheme and produce a new classification. In this study, the use of the algorithm to address problems in prokaryotic taxonomy is discussed. AVAILABILITY: S-Plus is available from Insightful, Inc. An S-Plus implementation of the algorithm and the associated data are available at http://taxoweb.mmg.msu.edu/datasets
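The distance-based group assignment can be sketched minimally (in Python rather than S; the cutoff and example data are invented, not taken from the paper):

```python
def classify_by_distance(dist_to_groups, cutoff=0.2):
    """Assign a sequence to the group with the smallest mean evolutionary
    distance to that group's members; return None (unclassified) when even
    the best group exceeds the cutoff, mimicking the misclassification
    detection described above."""
    means = {g: sum(d) / len(d) for g, d in dist_to_groups.items()}
    best = min(means, key=means.get)
    return best if means[best] <= cutoff else None
```

Applying this per sequence against each candidate group yields both a classification and a list of outliers that fit no group well.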

3.
SUMMARY: AGML Central is a web-based open-source public infrastructure for dissemination of two-dimensional gel electrophoresis (2-DE) proteomics data in AGML (Annotated Gel Markup Language) format. It includes a growing collection of converters from proprietary formats such as those produced by PDQUEST (BioRad), PHORETIX 2-D (Nonlinear Dynamics) and Melanie (GenBio SA). The resulting unified AGML-formatted entry, with or without the raw gel images, is optionally stored in a database for future reference. AGML Central was developed to provide a common platform for data dissemination and for the development of 2-DE data analysis tools. The resource responds to the increasing use of AGML for public 2-DE data representation, which requires automated tools for conversion from proprietary formats. Conversion and short-term storage are publicly available; permanent storage requires prior registration. A Java applet visualizer was developed to display the AGML data with cross-reference links. To facilitate automated access, a SOAP web service is also included in the AGML Central infrastructure. AVAILABILITY: http://bioinformatics.musc.edu/agmlcentral.

4.
ABSTRACT: BACKGROUND: Downstream applications in metabolomics, as well as mathematical modelling, require data in a quantitative format, which may also necessitate the automated and simultaneous quantification of numerous metabolites. Although numerous applications have been developed for metabolomics data handling, automated calibration and calculation of concentrations in terms of μmol have not been carried out. Moreover, most metabolomics applications are designed for GC-MS and are not suitable for LC-MS, since in LC the deviation in retention time is not linear, which these applications do not take into account. Furthermore, few are web-based applications, which could improve on stand-alone software in terms of compatibility, sharing capabilities and hardware requirements, although a high-bandwidth connection is required. Finally, none of them incorporates asynchronous communication to allow real-time interaction with pre-processed results. FINDINGS: Here, we present EasyLCMS (http://www.easylcms.es/), a new application for automated quantification, which was validated against manual operation using more than 1000 concentration comparisons in real samples. The results showed that only 1% of the quantifications had a relative error higher than 15%. Using clustering analysis, the metabolites with the highest relative error distributions were identified and studied to resolve recurrent mistakes. CONCLUSIONS: EasyLCMS is a new web application designed to quantify numerous metabolites simultaneously, integrating LC distortions and asynchronous web technology into a visual interface with dynamic interaction, which allows checking and correction of LC-MS raw data pre-processing results. Quantified data obtained with EasyLCMS are fully compatible with numerous downstream applications, as well as with mathematical modelling in the systems biology field.
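The validation statistic quoted above, the share of quantifications whose relative error exceeds 15%, is simple to reproduce; the function below is a generic sketch, not EasyLCMS code, and the example pairs are invented:

```python
def outlier_fraction(pairs, cutoff=0.15):
    """Fraction of (automated, manual) concentration pairs whose relative
    error, |auto - manual| / |manual|, exceeds the cutoff."""
    errors = [abs(auto - manual) / abs(manual) for auto, manual in pairs]
    return sum(e > cutoff for e in errors) / len(errors)
```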

5.
Advances in dinucleotide-based genetic maps open possibilities for large-scale genotyping at high resolution. The current rate-limiting steps in the use of these dense maps are data interpretation (allele definition), data entry, and statistical calculations. We have recently reported automated allele identification methods. Here we show that a 10-cM framework map of the human X chromosome can be analyzed on two lanes of an automated sequencer per individual (10–12 loci per lane). We use this map and analysis strategy to generate allele data for an X-linked recessive spastic paraplegia family with a known PLP mutation. We analyzed 198 genotypes in a single gel and used the data to test three methods of data analysis: manual meiotic breakpoint mapping, automated concordance analysis, and whole-chromosome multipoint linkage analysis. All methods pinpointed the correct location of the gene. We propose that multipoint exclusion mapping may permit valid inflation of LOD scores using the quantity (max LOD) - (next-best LOD).
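The proposed statistic, the best LOD score minus the next-best score across candidate map locations, is trivial to compute; a sketch with invented scores:

```python
def lod_support(lods):
    """Exclusion-mapping support statistic: the margin by which the
    best-supported location beats the runner-up."""
    ranked = sorted(lods, reverse=True)
    return ranked[0] - ranked[1]
```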

6.
SUMMARY: The Helmholtz Network for Bioinformatics (HNB) is a joint venture of eleven German bioinformatics research groups that offers convenient access to numerous bioinformatics resources through a single web portal. The 'Guided Solution Finder', which is available through the HNB portal, helps users to locate the appropriate resources to answer their queries by employing a detailed, tree-like questionnaire. Furthermore, automated complex tool cascades ('tasks'), involving resources located on different servers, have been implemented, allowing users to perform comprehensive data analyses without further manual intervention for data transfer and re-formatting. Currently, automated cascades for the analysis of regulatory DNA segments as well as for the prediction of protein functional properties are provided. AVAILABILITY: The HNB portal is available at http://www.hnbioinfo.de

7.
We present a software solution that enables faster and more accurate analysis of 2DE/MALDI-TOF MS data. The software supports data analysis through a number of automated data selection functions and advanced graphical tools. Once protein identities are determined using MALDI-TOF MS, automated data retrieval from online databases provides biological information. The software, called 2DDB, reduces analysis time to a fraction of that required by more manual approaches, without any loss of quality. The database contains over 100,000 data entries, and selected parts can be reached at http://2ddb.org.

8.
OBJECTIVE: To analyze the sources of yeasts statistically and provide baseline data on the microbial species of Daqu starter and changes in their origins. METHODS: Isolates were identified with the Biolog automated microbial identification system, and the sources of the strains were determined by statistical analysis. RESULTS: Seven yeast strains (J-1 to J-7) were isolated and purified from Fuqu samples and identified with the Biolog system as Trichosporon beigelii A, Candida famata, Debaryomyces hansenii C, a Cryptococcus species (海隐球酵母), Issatchenkia orientalis, Yarrowia lipolytica and Candida rugosa. These species originated mainly from the raw materials and the air; the raw materials contributed strains J-1, J-2 and J-3 to the incoming Daqu, in proportions of 25.45%, 19.23% and 7.5%, respectively. CONCLUSION: The air contained all of the yeast species found in the Daqu, and the air of the Qu room harbored yeasts in both greater numbers and greater variety.

9.
CRANK is a novel suite for automated macromolecular structure solution that uses recently developed programs for substructure detection, refinement, and phasing. CRANK combines these methods with existing crystallographic programs for density modification and automated model building in a convenient, easy-to-use CCP4i graphical interface. The data model conforms to the XML (eXtensible Markup Language) specification and serves as a common language for communicating data between applications inside and outside the suite. Application of CRANK to various test cases has yielded promising results: with minimal user input, CRANK can produce better-quality solutions than currently available programs.

10.
MOTIVATION: Many biomedical and clinical research problems involve discovering causal relationships between observations gathered from temporal events. Dynamic Bayesian networks are a powerful modeling approach to describe causal or apparently causal relationships, and support complex medical inference, such as future response prediction, automated learning, and rational decision making. Although many engines exist for creating Bayesian networks, most require a local installation and significant data manipulation to be practical for a general biologist or clinician. No software pipeline currently exists for interpretation and inference of dynamic Bayesian networks learned from biomedical and clinical data. RESULTS: miniTUBA is a web-based modeling system that allows clinical and biomedical researchers to perform complex medical/clinical inference and prediction using dynamic Bayesian network analysis with temporal datasets. The software allows users to choose different analysis parameters (e.g. Markov lags and prior topology), and continuously update their data and refine their results. miniTUBA can make temporal predictions to suggest interventions based on an automated learning process pipeline using all data provided. Preliminary tests using synthetic data and laboratory research data indicate that miniTUBA accurately identifies regulatory network structures from temporal data. AVAILABILITY: miniTUBA is available at http://www.minituba.org.
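The notion of a Markov lag mentioned above can be made concrete: learning transition structure at a chosen lag reduces to tabulating lagged state co-occurrences. The sketch below is a generic illustration, not miniTUBA code:

```python
def transition_counts(series, lag=1):
    """Count (state at t-lag, state at t) pairs in a discretized time
    series; such counts are the sufficient statistics used to score
    candidate dynamic Bayesian network structures."""
    counts = {}
    for t in range(lag, len(series)):
        key = (series[t - lag], series[t])
        counts[key] = counts.get(key, 0) + 1
    return counts
```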

11.
12.
A high-throughput crystallization-to-structure pipeline for structural genomics was recently developed at the Advanced Protein Crystallography Research Group of the RIKEN SPring-8 Center in Japan. The structure determination pipeline includes three newly developed technologies for automating X-ray protein crystallography: the automated crystallization and observation robot system "TERA", the SPring-8 Precise Automatic Cryosample Exchanger "SPACE" for automated data collection, and the Package of Expert Researcher's Operation Network "PERON" for automated crystallographic computation from phasing to model checking. During the five years following April 2002, this pipeline was used by seven researchers to determine 138 independent crystal structures (from 437 purified proteins, 234 cryoloop-mountable crystals, and 175 diffraction data sets). The protocols used in the high-throughput pipeline are described in this paper.

13.
BACKGROUND: The human eye and image sensors differ in spectral sensitivity and therefore perceive fluorescence signals differently. Image acquisition is thus an important step in image analysis, because all subsequent processing depends on it. METHODS: We developed a method to determine image parameters that allows an objective appraisal of image data quality as well as a separation of object and background. RESULTS: The calculated parameters can be used for automated adjustment of camera settings in image analysis systems. DISCUSSION: Our approach to objectively adjusted image acquisition improves analysis quality.
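One standard way to derive an object/background separation parameter from the image data alone is iterative mean (ISODATA-style) thresholding; this is a stand-in sketch, since the paper does not specify its parameter calculation:

```python
def isodata_threshold(pixels, eps=1e-6):
    """Iterative threshold: start at the global mean intensity, then
    repeatedly move the threshold to the midpoint of the background and
    object class means until it stabilizes."""
    t = sum(pixels) / len(pixels)
    while True:
        lo = [p for p in pixels if p <= t]  # background candidates
        hi = [p for p in pixels if p > t]   # object candidates
        if not lo or not hi:
            return t
        new_t = (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t
```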

14.
MOTIVATION: The living cell is a complex machine that depends on the proper functioning of its numerous parts, including proteins. Understanding protein functions and how they modify and regulate each other is the next great challenge for life-sciences researchers. The collective knowledge about protein functions and pathways is scattered throughout numerous publications in scientific journals, and bringing the relevant information together has become a bottleneck in the research and discovery process. The volume of such information grows exponentially, which renders manual curation impractical. As a viable alternative, automated literature processing tools can be employed to extract and organize biological data into a knowledge base, making it amenable to computational analysis and data mining. RESULTS: We present MedScan, a completely automated natural language processing-based information extraction system. We have used MedScan to extract 2976 interactions between human proteins from MEDLINE abstracts dated after 1988. The precision of the extracted information was found to be 91%. Comparison with the existing protein interaction databases BIND and DIP revealed that 96% of the extracted information is novel. The recall of MedScan was found to be 21%. Additional experiments suggest that MEDLINE is a unique source of diverse protein function information, which can be extracted in a completely automated way with reasonably high precision. Further directions for improving the MedScan technology are discussed. AVAILABILITY: MedScan is available for commercial licensing from Ariadne Genomics, Inc.
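The precision and recall figures above compare an extracted interaction set against a reference set; as a generic sketch (the example pairs are invented):

```python
def precision_recall(extracted, reference):
    """Precision = correct extractions / all extractions;
    recall = correct extractions / all reference interactions."""
    tp = len(extracted & reference)  # true positives: shared pairs
    return tp / len(extracted), tp / len(reference)
```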

15.
High-throughput mutation screening in an automated environment generates large data sets that have to be organized and stored reliably. Complex multistep workflows require strict process management and careful data tracking. We have developed a Laboratory Information Management System (LIMS) tailored to high-throughput candidate gene mutation scanning and resequencing that respects these requirements. Designed with a client/server architecture, our system is platform independent and based on open-source tools, from the database to the web application development strategy. Flexible, expandable and secure, the LIMS communicates with most of the laboratory instruments and robots, tracking samples and laboratory information and capturing data at every step of our automated mutation screening workflow. An important feature of our LIMS is that it tracks information through a laboratory workflow in which the process at one step is contingent on results from a previous step. AVAILABILITY: The script for MySQL database table creation and the source code of the whole JSP application are freely available on our website: http://www-gcs.iarc.fr/lims/. SUPPLEMENTARY INFORMATION: System server configuration, database structure and additional details on the LIMS and the mutation screening workflow are available on our website: http://www-gcs.iarc.fr/lims/

16.
Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate databases and knowledge bases. Formal ontologies make the semantics of terms and relations explicit so that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large-scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies, and we identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.

17.
SUMMARY: We describe a tool, called aCGH-Smooth, for the automated identification of breakpoints and smoothing of microarray comparative genomic hybridization (array CGH) data. aCGH-Smooth is written in Visual C++ and has a user-friendly interface, including a visualization of the results and user-defined parameters that adapt the performance of data smoothing and breakpoint recognition. aCGH-Smooth can handle array CGH data generated by all array CGH platforms: BAC, PAC, cosmid, cDNA and oligo CGH arrays. The tool has been successfully applied to real-life data. AVAILABILITY: aCGH-Smooth is free for researchers at academic and non-profit institutions at http://www.few.vu.nl/~vumarray/.
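Smoothing followed by breakpoint calling on log2 ratios can be sketched as a running median plus a jump detector. This is a generic illustration, not the aCGH-Smooth algorithm; the window and jump sizes are invented:

```python
import statistics

def running_median(ratios, window=3):
    """Median-smooth a profile of log2 ratios along the genome."""
    half = window // 2
    return [statistics.median(ratios[max(0, i - half):i + half + 1])
            for i in range(len(ratios))]

def call_breakpoints(smoothed, jump=0.5):
    """Report positions where the smoothed profile shifts by at least `jump`."""
    return [i for i in range(1, len(smoothed))
            if abs(smoothed[i] - smoothed[i - 1]) >= jump]
```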

18.
We describe the design of a database and software for managing and organizing protein crystallization data. We also outline the considerations behind the design of a fast web interface linking protein production data, crystallization images, and automated image analysis. The database and associated interfaces underpin the Oxford Protein Production Facility (OPPF) crystallization laboratory, collecting, in a routine and automatic manner, up to 100,000 images per day; over 17 million separate images are currently held in the database. We discuss the substantial scientific benefits that automated tracking, imaging, and analysis of crystallization trials offer to the structural biologist: analysis of the time course of a trial and easy comparison of trials with related crystallization conditions. Features of this system address requirements common to many crystallographic laboratories that are currently setting up (semi-)automated crystallization imaging systems.

19.
OBJECTIVE: To design and analyze an automated diagnostic system for breast carcinoma based on fine needle aspiration (FNA). STUDY DESIGN: FNA is a noninvasive alternative to surgical biopsy for the diagnosis of breast carcinoma. Widespread clinical use of FNA is limited by the relatively poor interobserver reproducibility of the visual interpretation of FNA images. To overcome the reproducibility problem, past research has focused on the development of automated diagnosis systems that yield accurate, reproducible results. While automated diagnosis is, by definition, reproducible, it has yet to achieve diagnostic accuracy comparable to that of surgical biopsy. In this article we describe a sophisticated new diagnostic system whose mean FNA-diagnosis sensitivity approaches that of surgical biopsy. The system analyzes digital data extracted from FNA images. To achieve high sensitivity, it must repeatedly solve large, equality-constrained, integer nonlinear optimization problems; powerful techniques from the theory of Lie groups and a novel optimization technique are built into the system to solve these problems effectively. The system is trained using digital data from FNA samples with a confirmed diagnosis. To analyze the diagnostic accuracy of the system, more than 8,000 computational experiments were performed using digital FNA data from the Wisconsin Breast Cancer Database. RESULTS: The system has a mean sensitivity of 99.62% and a mean specificity of 93.31%. Statistical analysis shows that, at the 95% confidence level, the system can be trusted to correctly diagnose new malignant FNA samples with an accuracy of 99.44-99.8% and new benign FNA samples with an accuracy of 92.43-93.93%. CONCLUSION: The diagnostic system is robust and has higher sensitivity than all other systems reported in the literature. Its specificity needs to be improved.

20.
MOTIVATION: High-throughput NMR structure determination is a goal that will require progress on many fronts, one of which is rapid resonance assignment. An important rate-limiting step in the resonance assignment process is accurate identification of resonance peaks in the NMR spectra. Peak-picking schemes range from incomplete (which lose essential assignment connectivities) to noisy (which obscure true connectivities with many false ones). We introduce an automated preassignment process that removes false peaks from noisy peak lists by requiring consensus between multiple NMR experiments and exploiting a priori information about NMR spectra. This process is designed to accept multiple input formats and generate multiple output formats, in an effort to be compatible with a variety of user preferences. RESULTS: Automated preprocessing with APART rapidly identifies and removes false peaks from initial peak lists, reduces the burden of manual data entry, and documents and standardizes the peak filtering process. Successful preprocessing is demonstrated by the increased number of correct assignments obtained when data are submitted to an automated assignment program. AVAILABILITY: APART is available from http://sir.lanl.gov/NMR/APART.htm CONTACT: npawley@lanl.gov; rmichalczyk@lanl.gov SUPPLEMENTARY INFORMATION: Manual pages with installation instructions, procedures and screen shots can also be found at http://sir.lanl.gov/NMR/APART_Manual1.pdf.
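The consensus filtering idea, keeping only peaks corroborated by more than one experiment, can be sketched as follows (a generic illustration, not APART code; the matching tolerance and vote threshold are invented):

```python
def consensus_peaks(peak_lists, tol=0.05, min_votes=2):
    """Keep peaks from the first experiment's list that are matched
    (within `tol`) in at least `min_votes` of the experiments' lists."""
    kept = []
    for p in peak_lists[0]:
        votes = sum(any(abs(p - q) <= tol for q in lst) for lst in peak_lists)
        if votes >= min_votes:
            kept.append(p)
    return kept
```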

