Similar Documents
20 similar documents retrieved (search time: 15 ms).
1.
2.

Background  

Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources, or when querying a data provider with one flavour of protein identifier while the source database uses another. Partial solutions for protein identifier mapping exist, but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs.
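The core of any identifier-mapping solution is a lookup from database-specific identifiers to a shared canonical accession. A minimal sketch of that idea follows; the identifiers and the mapping table are illustrative (a real system would populate them from a mapping service such as UniProt's), not taken from any of the tools described above.

```python
# Toy cross-database protein identifier mapping. The mapping table is
# hypothetical; real entries would come from a service such as UniProt's
# ID mapping.

# (source_db, source_id) -> canonical accession
ID_MAP = {
    ("RefSeq", "NP_000508.1"): "P69905",
    ("Ensembl", "ENSP00000322421"): "P69905",
    ("IPI", "IPI00410714"): "P69905",
}

def to_canonical(source_db, source_id):
    """Resolve a database-specific identifier to a canonical accession,
    or None if no mapping is known."""
    return ID_MAP.get((source_db, source_id))

def same_protein(a, b):
    """Two (db, id) pairs denote the same protein if both resolve to
    the same canonical accession."""
    ca, cb = to_canonical(*a), to_canonical(*b)
    return ca is not None and ca == cb
```

The `None` return for unknown identifiers is the important design point: an unmapped identifier is reported as unresolvable rather than silently treated as a distinct protein.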

3.

Background  

The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways.

4.

Background  

Researchers involved in the annotation of large numbers of gene, clone or protein identifiers are usually required to perform a one-by-one conversion for each identifier. When the field of research is one such as microarray experiments, this number may be around 30,000.

5.

Background  

Oligonucleotide probes that are sequence-identical may carry different identifiers between manufacturers, and even between different versions of the same company's microarray. Sometimes the same identifier is reused for a completely different oligonucleotide, resulting in ambiguity and potentially in mis-identification of the genes hybridizing to that probe.
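One way to sidestep vendor-assigned identifiers entirely is to key each probe by its sequence. The sketch below illustrates that idea with invented probe identifiers and sequences; it is not the method of the paper above, just a minimal demonstration of sequence-based identity.

```python
import hashlib

def probe_key(sequence):
    """Derive a manufacturer-independent key from the probe sequence
    itself, so that sequence-identical probes compare equal regardless
    of the identifier printed on the array."""
    return hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()[:12]

# Hypothetical probes: the first two share a sequence but have
# different vendor identifiers.
probes = {
    "AFFX_0001": "ACGTACGTACGTACGTACGTACGT",
    "ILMN_9999": "ACGTACGTACGTACGTACGTACGT",
    "AGI_1234":  "TTTTGGGGCCCCAAAATTTTGGGG",
}
```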

6.

Background  

Significant inconsistencies in probe-to-gene annotations across different releases of probe set identifiers from commercial microarray platforms have been reported. Such inconsistencies lead to misleading or ambiguous interpretation of published gene expression results.

7.

Background  

Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers (interactor normalization task or INT) and then to return a list of interaction pairs for each article (interaction pair task or IPT). These two tasks are evaluated in terms of the area under the interpolated precision/recall curve (AUC iP/R), because the order of identifiers in the output list is important for ease of curation.
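The AUC iP/R metric rewards systems that place correct identifiers near the top of the ranked list. A minimal computation over a ranked list of hit/miss flags might look like this (a sketch of the standard interpolated-precision idea, not the evaluation scripts of the challenge itself):

```python
def auc_ipr(ranked_hits, total_positives):
    """Area under the interpolated precision/recall curve for a ranked
    list of predictions (True = correct identifier). Interpolated
    precision at a hit is the best precision at that rank or any later
    one, so ranking correct identifiers early raises the score."""
    precisions = []
    tp = 0
    for k, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            precisions.append(tp / k)  # precision at each correct hit
    # interpolate from the end: take the max of current and later values
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum(precisions) / total_positives
```

A perfect ranking of all gold identifiers scores 1.0; pushing a correct identifier further down the list lowers the area.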

8.

Background  

Life Science Identifiers (LSIDs) are persistent, globally unique identifiers for biological objects. The decentralised nature of LSIDs makes them attractive for identifying distributed resources. Data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) are distributed over many different providers, and this community has adopted LSIDs as the identifier of choice.

9.

Purpose

Clinical trials data from National Cancer Institute (NCI)-funded cooperative oncology group trials could be enhanced by merging with external data sources. Merging without direct patient identifiers would provide additional patient privacy protections. We sought to develop and validate a matching algorithm that uses only indirect patient identifiers.

Methods

We merged the data from two Phase III Children’s Oncology Group (COG) trials for de novo acute myeloid leukemia (AML) with the Pediatric Health Information Systems (PHIS). We developed a stepwise matching algorithm that used indirect identifiers including treatment site, gender, birth year, birth month, enrollment year and enrollment month. Results from the stepwise algorithm were compared against the direct merge method that used date of birth, treatment site, and gender. The indirect merge algorithm was developed on AAML0531 and validated on AAML1031.
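The stepwise idea above (try strict criteria first, then relax) can be sketched as follows. The field names, the relaxation order, and the records are hypothetical illustrations, not the published COG/PHIS algorithm.

```python
# Stepwise merge on indirect identifiers. Each step relaxes the match
# criteria; a trial patient is matched only if exactly one candidate
# administrative record survives a step. Field names are hypothetical.
STEPS = [
    ("site", "sex", "birth_year", "birth_month", "enroll_year", "enroll_month"),
    ("site", "sex", "birth_year", "enroll_year", "enroll_month"),
    ("site", "sex", "birth_year", "enroll_year"),
]

def stepwise_match(trial_rec, admin_recs):
    """Return the single admin record matching on the strictest
    possible step, or None if every step is ambiguous or empty."""
    for fields in STEPS:
        candidates = [a for a in admin_recs
                      if all(a[f] == trial_rec[f] for f in fields)]
        if len(candidates) == 1:
            return candidates[0]
    return None
```

Requiring a unique candidate at each step is what protects against false matches: an ambiguous step falls through to the next rather than guessing.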

Results

Of 415 patients enrolled on the AAML0531 trial at PHIS centers, we successfully matched 378 (91.1%) patients using the indirect stepwise algorithm. Comparison to the direct merge result suggested that 362 (95.7%) matches identified by the indirect merge algorithm were concordant with the direct merge result. When validating the indirect stepwise algorithm using the AAML1031 trial, we successfully matched 157 out of 165 patients (95.2%) and 150 (95.5%) of the indirectly merged matches were concordant with the directly merged matches.

Conclusions

These data demonstrate that patients enrolled on COG clinical trials can be successfully merged with PHIS administrative data using a stepwise algorithm based on indirect patient identifiers. The merged data sets can be used as a platform for comparative effectiveness and cost effectiveness studies.  相似文献   

10.

Background:

The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%.

Results:

Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers.
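A simple voting scheme of the kind mentioned above keeps an identifier when enough systems agree on it, and its quality can be scored with the balanced F-measure. The sketch below is a generic illustration with invented gene identifiers, not the actual BioCreative composite system.

```python
from collections import Counter

def vote_combine(system_outputs, min_votes=2):
    """Combine gene-identifier predictions from several systems by
    simple voting: keep an identifier returned by at least
    `min_votes` systems."""
    counts = Counter(gid for output in system_outputs for gid in set(output))
    return {gid for gid, n in counts.items() if n >= min_votes}

def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(predicted), tp / len(gold)
    return 2 * p * r / (p + r)
```

Raising `min_votes` trades recall for precision; a "maximum recall" pool corresponds to `min_votes=1`.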

Conclusion:

Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.

11.
12.

Background  

One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId ranking problem. In addition to baseline approaches based on combining named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method which combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method.

13.

Background

The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in establishing shared identity and shared meaning across heterogeneous biomedical data sources.

Results

We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license.
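Forward chaining over triples simply means re-applying rules until no new facts are derived. The toy sketch below shows the mechanism with an invented vocabulary; it is not KaBOB's rule language or ontology.

```python
# Minimal forward chaining over (subject, predicate, object) triples.
# The predicates are invented for the example.

def forward_chain(triples, rules):
    """Apply each rule to each fact until no new triples are derived.
    Each rule maps one matching triple to a new triple (or None)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for t in list(facts):
                new = rule(t)
                if new and new not in facts:
                    facts.add(new)
                    changed = True
    return facts

def record_denotes_concept(t):
    """Rule: a database record about a gene denotes a gene concept,
    making the record/concept distinction explicit."""
    s, p, o = t
    if p == "recordAbout":
        return (s, "denotes", ("geneConcept", o))
    return None
```

The fixed-point loop is what makes the rules declarative: the order in which facts arrive does not change the final knowledge base.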

Conclusions

KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0559-3) contains supplementary material, which is available to authorized users.

14.
15.

Background  

In bioinformatics and genomics, there are many applications designed to investigate the common properties of a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time-consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another, or to recreate a published set. In this paper, we present a simple online tool which, with the help of the gene catalogs Ensembl and GeneLynx, can help researchers build and annotate sets of genes quickly and easily.

16.

Background

Linkage of risk-factor data for blood-stream infection (BSI) in paediatric intensive care (PICU) with bacteraemia surveillance data to monitor risk-adjusted infection rates in PICU is complicated by a lack of unique identifiers and under-ascertainment in the national surveillance system. We linked, evaluated and performed preliminary analyses on these data to provide a practical guide on the steps required to handle linkage of such complex data sources.

Methods

Data on PICU admissions in England and Wales for 2003-2010 were extracted from the Paediatric Intensive Care Audit Network. Records of all positive isolates from blood cultures taken for children <16 years and captured by the national voluntary laboratory surveillance system for 2003-2010 were extracted from the Public Health England database, LabBase2. "Gold-standard" datasets with unique identifiers were obtained directly from three laboratories, containing microbiology reports that were eligible for submission to LabBase2 (defined as "clinically significant" by laboratory microbiologists). Reports in the gold-standard datasets were compared to those in LabBase2 to estimate ascertainment in LabBase2. Linkage was evaluated by comparing results from two classification methods (highest-weight classification of match weights and prior-informed imputation using match probabilities) with linked records in the gold-standard data. BSI rate was estimated as the proportion of admissions associated with at least one BSI.
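Match weights of the kind used for highest-weight classification are typically sums of per-field log likelihood ratios (the Fellegi-Sunter formulation). The sketch below illustrates the computation with invented field names and m/u probabilities; it is not the study's actual linkage model.

```python
import math

# For each field, m is the probability the field agrees for true
# matches and u for non-matches. Values here are invented.
FIELDS = {
    "surname_soundex": (0.95, 0.01),
    "birth_date": (0.98, 0.002),
    "hospital": (0.90, 0.10),
}

def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over fields: log2(m/u) when a
    field agrees, log2((1-m)/(1-u)) when it disagrees. Higher totals
    mean the pair is more likely a true match."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total
```

Highest-weight classification then links each record to the candidate with the largest total weight, subject to a threshold; prior-informed imputation instead carries the match probabilities into the analysis rather than forcing a hard link.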

Results

Reporting gaps were identified in 548/2596 lab-months of LabBase2. Ascertainment of clinically significant BSI in the remaining months was approximately 80-95%. Prior-informed imputation provided the least biased estimate of BSI rate (5.8% of admissions). Adjusting for ascertainment, the estimated BSI rate was 6.1-7.3%.

Conclusion

Linkage of PICU admission data with national BSI surveillance provides the opportunity for enhanced surveillance, but analyses based on these data need to take account of biases due to ascertainment and linkage error. This study provides a generalisable guide for linkage, evaluation and analysis of complex electronic healthcare data.

17.
18.

Background

Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts.

Results

We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially on reaction level, where the databases agree on 3% of the 6968 reactions they have combined. Even for the well-established tricarboxylic acid cycle the databases agree on only 5 out of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison.
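The headline overlap figure above is a set computation: the fraction of the combined (union) reaction set on which every database agrees (intersection). A minimal sketch, with invented database contents:

```python
# Overlap across several databases, each represented as a set of
# reaction identifiers. A real comparison must first reconcile
# metabolite names and identifiers before sets can be intersected.

def overlap_stats(databases):
    """Given {db_name: set_of_reactions}, return
    (n_shared_by_all, n_in_union, fraction agreed on by every db)."""
    sets = list(databases.values())
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared), len(union), len(shared) / len(union)
```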

Conclusions

Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. Our comparison provides a stepping stone for such an endeavor.

19.

Background:

Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as the yeast two-hybrid screening has produced an explosion in data relating to interactions. Since manual curation is intensive in terms of time and cost, there is an urgent need for text-mining tools to facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches.

Results:

During the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering articles irrelevant to physical protein interactions, our method achieved a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing mentions to molecule identifiers, our method is competitive among runs submitted, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these findings are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: three-quarters of false negatives were due to protein normalization problems (532/698), and about one-quarter were due to problems with correctly extracting interactions for this system.
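The precision, recall, and F1 figures above all derive from true-positive, false-positive, and false-negative counts. For reference, a minimal computation (generic definitions, not the challenge's official scorer):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```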

Conclusion:

We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.

20.

Background  

New technologies are enabling the measurement of many types of genomic and epigenomic information at scales ranging from the atomic to the nuclear. Much of this new data is increasingly structural in nature, and is often difficult to coordinate with other data sets. There is a legitimate need for integrating and visualizing these disparate data sets to reveal structural relationships not apparent when looking at these data in isolation.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)