首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
caCORE: a common infrastructure for cancer informatics   总被引:4,自引:0,他引:4  
MOTIVATION:Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets requires annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications. RESULTS: We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources. AVAILABILITY: caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads. SUPPLEMENTARY INFORMATION: http://ncicb.nci.nih.gov/core/publications contains links to the caBIO 1.0 class diagram and the caCORE 1.0 Technical Guide, which provide detailed information on the present caCORE architecture, data sources and APIs. Updated information appears on a regular basis on the caCORE web site (http://ncicb.nci.nih.gov/core).  相似文献   

2.
The online service PROSIT (Pseudo-Rotational Online Service and Interactive Tool) is a free service available at http://cactus.nci.nih.gov/prosit/ that performs pseudorotational analysis of nucleosides(tides). PROSIT reads the 3D coordinates of nucleosides and returns the pseudorotational phase angle P, puckering amplitude νmax, and other related information. As examples, the sugar conformations in a parallel-stranded guanine tetraplex and a four-way Holliday junction are presented here.  相似文献   

3.
MOTIVATION: The complexity of cancer is prompting researchers to find new ways to synthesize information from diverse data sources and to carry out coordinated research efforts that span multiple institutions. There is a need for standard applications, common data models, and software infrastructure to enable more efficient access to and sharing of distributed computational resources in cancer research. To address this need the National Cancer Institute (NCI) has initiated a national-scale effort, called the cancer Biomedical Informatics Grid (caBIGtrade mark), to develop a federation of interoperable research information systems. RESULTS: At the heart of the caBIG approach to federated interoperability effort is a Grid middleware infrastructure, called caGrid. In this paper we describe the caGrid framework and its current implementation, caGrid version 0.5. caGrid is a model-driven and service-oriented architecture that synthesizes and extends a number of technologies to provide a standardized framework for the advertising, discovery, and invocation of data and analytical resources. We expect caGrid to greatly facilitate the launch and ongoing management of coordinated cancer research studies involving multiple institutions, to provide the ability to manage and securely share information and analytic resources, and to spur a new generation of research applications that empower researchers to take a more integrative, trans-domain approach to data mining and analysis. AVAILABILITY: The caGrid version 0.5 release can be downloaded from https://cabig.nci.nih.gov/workspaces/Architecture/caGrid/. The operational test bed Grid can be accessed through the client included in the release, or through the caGrid-browser web application http://cagrid-browser.nci.nih.gov.  相似文献   

4.
MOTIVATION: A plugin for the Java-based PathVisio pathway editor has been developed to help users draw diagrams of bioregulatory networks according to the Molecular Interaction Map (MIM) notation. Together with the core PathVisio application, this plugin presents a simple to use and cross-platform application for the construction of complex MIM diagrams with the ability to annotate diagram elements with comments, literature references and links to external databases. This tool extends the capabilities of the PathVisio pathway editor by providing both MIM-specific glyphs and support for a MIM-specific markup language file format for exchange with other MIM-compatible tools and diagram validation. AVAILABILITY: The PathVisio-MIM plugin is freely available and works with versions of PathVisio 2.0.11 and later on Windows, Mac OS X and Linux. Information about MIM notation and the MIMML format is available at http://discover.nci.nih.gov/mim. The plugin, along with diagram examples, instructions and Java source code, may be downloaded at http://discover.nci.nih.gov/mim/mim_pathvisio.html.  相似文献   

5.
MOTIVATION: Microarray techniques provide a valuable way of characterizing the molecular nature of disease. Unfortunately expense and limited specimen availability often lead to studies with small sample sizes. This makes accurate estimation of variability difficult, since variance estimates made on a gene by gene basis will have few degrees of freedom, and the assumption that all genes share equal variance is unlikely to be true. RESULTS: We propose a model by which the within gene variances are drawn from an inverse gamma distribution, whose parameters are estimated across all genes. This results in a test statistic that is a minor variation of those used in standard linear models. We demonstrate that the model assumptions are valid on experimental data, and that the model has more power than standard tests to pick up large changes in expression, while not increasing the rate of false positives. AVAILABILITY: This method is incorporated into BRB-ArrayTools version 3.0 (http://linus.nci.nih.gov/BRB-ArrayTools.html). SUPPLEMENTARY MATERIAL: ftp://linus.nci.nih.gov/pub/techreport/RVM_supplement.pdf  相似文献   

6.
Combined analysis of the microarray and drug-activity datasets has the potential of revealing valuable knowledge about various relations among gene expressions and drug activities in the malignant cell. In this paper, we apply Bayesian networks, a tool for compact representation of the joint probability distribution, to such analysis. For the alleviation of data dimensionality problem, the huge datasets were condensed using a feature abstraction technique. The proposed analysis method was applied to the NCI60 dataset (http://discover.nci.nih.gov) consisting of gene expression profiles and drug activity patterns on human cancer cell lines. The Bayesian networks, learned from the condensed dataset, identified most of the salient pairwise correlations and some known relationships among several features in the original dataset, confirming the effectiveness of the proposed feature abstraction method. Also, a survey of the recent literature confirms the several relationships appearing in the learned Bayesian network to be biologically meaningful.  相似文献   

7.
Annotation features from the 1.9-fold whole-genome shotgun (WGS) sequences of domestic cat have been organized into an interactive web application, Genome Annotation Resource Fields (GARFIELD) (http://lgd.abcc.ncifcrf.gov) at the Laboratory of Genomic Diversity and Advanced Biomedical Computing Center (ABCC) at The National Cancer Institute (NCI). The GARFIELD browser allows the user to view annotations on a per chromosome basis with unplaced contigs provided on placeholder chromosomes. Various tracks on the browser allow display of annotations. A Genes track on the browser includes 20 285 regions that align to genes annotated in other mammalian genomes: Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, Bos taurus, and Canis familiaris. Also available are tracks that display the contigs that make up the chromosomes and representations of their GC content and repetitive elements as detected using the RepeatMasker (http://www.repeatmasker.org). Data from the browser can be downloaded in FASTA and GFF format, and users can upload their own data to the display. The Felis catus sequences and their chromosome assignments and additional annotations incorporate data analyzed and produced by a multicenter collaboration between NCI, ABCC, Agencourt Biosciences Corporation, Broad Institute of Harvard and Massachusetts Institute of Technology, National Human Genome Research Institute, National Center for Biotechnology and Information, and Texas A&M.  相似文献   

8.
9.
Gene Set Expression Comparison kit for BRB-ArrayTools   总被引:1,自引:0,他引:1  
  相似文献   

10.
SUMMARY: VISDA (Visual Statistical Data Analyzer) is a caBIG analytical tool for cluster modeling, visualization and discovery that has met silver-level compatibility under the caBIG initiative. Being statistically principled and visually interfaced, VISDA exploits both hierarchical statistics modeling and human gift for pattern recognition to allow a progressive yet interactive discovery of hidden clusters within high dimensional and complex biomedical datasets. The distinctive features of VISDA are particularly useful for users across the cancer research and broader research communities to analyze complex biological data. AVAILABILITY: http://gforge.nci.nih.gov/projects/visda/  相似文献   

11.
SUMMARY: Bambino is a variant detector and graphical alignment viewer for next-generation sequencing data in the SAM/BAM format, which is capable of pooling data from multiple source files. The variant detector takes advantage of SAM-specific annotations, and produces detailed output suitable for genotyping and identification of somatic mutations. The assembly viewer can display reads in the context of either a user-provided or automatically generated reference sequence, retrieve genome annotation features from a UCSC genome annotation database, display histograms of non-reference allele frequencies, and predict protein-coding changes caused by SNPs. AVAILABILITY: Bambino is written in platform-independent Java and available from https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html, along with documentation and example data. Bambino may be launched online via Java Web Start or downloaded and run locally.  相似文献   

12.
One of the challenges to the effective utilization of cDNA microarray analysis in mouse models of oncogenesis is the choice of a critical set of probes that are informative for human disease. Given the thousands of genes with a potential role in human oncogenesis and the hundreds of thousands of mouse sequences available for use as probes, selection of an informative set of mouse probes can be an overwhelming task. We have developed a web based sequence mining tool using DataBase Independent (DBI) Perl to annotate publicly available sequences. The Mouse Oncochip Design Tool uses the Mouse Genome Database (MGD) developed and maintained by the Jackson Laboratories for mouse DNA sequences. There are over 380 000 sequences in their database. The output list has been ordered to present the genes more likely to be informative in a mouse model of human cancer using a candidate set of oncogenes to order the list. Mouse sequences that represent genes that are homologous with a member of a human oncogene set are listed first. In addition it provides a set of links for information on clone source gene function. Contact: http://nciarray.nci.nih.gov/cgi-bin/me/mouse_design.cgi  相似文献   

13.
Goonesekere NC  Lee B 《Proteins》2008,71(2):910-919
The sequence homology detection relies on score matrices, which reflect the frequency of amino acid substitutions observed in a dataset of homologous sequences. The substitution matrices in popular use today are usually constructed without consideration of the structural context in which the substitution takes place. Here, we present amino acid substitution matrices specific for particular polar-nonpolar environment of the amino acid. As expected, these matrices [context-specific substitution matrices (CSSMs)] show striking differences from the popular BLOSUM62 matrix, which does not include structural information. When incorporated into BLAST and PSI-BLAST, CSSM outperformed BLOSUM matrices as assessed by ROC curve analyses of the number of true and false hits and by the accuracy of the sequence alignments to the hit sequences. These findings are also of relevance to profile-profile-based methods of homology detection, since CSSMs may help build a better profile. Profiles generated for protein sequences in PDB using CSSM-PSI-BLAST will be made available for searching via RPSBLAST through our web site http://lmbbi.nci.nih.gov/.  相似文献   

14.
NCBI's LocusLink and RefSeq   总被引:1,自引:0,他引:1       下载免费PDF全文
  相似文献   

15.
MOTIVATION: The generation of large amounts of microarray data and the need to share these data bring challenges for both data management and annotation and highlights the need for standards. MIAME specifies the minimum information needed to describe a microarray experiment and the Microarray Gene Expression Object Model (MAGE-OM) and resulting MAGE-ML provide a mechanism to standardize data representation for data exchange, however a common terminology for data annotation is needed to support these standards. RESULTS: Here we describe the MGED Ontology (MO) developed by the Ontology Working Group of the Microarray Gene Expression Data (MGED) Society. The MO provides terms for annotating all aspects of a microarray experiment from the design of the experiment and array layout, through to the preparation of the biological sample and the protocols used to hybridize the RNA and analyze the data. The MO was developed to provide terms for annotating experiments in line with the MIAME guidelines, i.e. to provide the semantics to describe a microarray experiment according to the concepts specified in MIAME. The MO does not attempt to incorporate terms from existing ontologies, e.g. those that deal with anatomical parts or developmental stages terms, but provides a framework to reference terms in other ontologies and therefore facilitates the use of ontologies in microarray data annotation. AVAILABILITY: The MGED Ontology version.1.2.0 is available as a file in both DAML and OWL formats at http://mged.sourceforge.net/ontologies/index.php. Release notes and annotation examples are provided. The MO is also provided via the NCICB's Enterprise Vocabulary System (http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do). CONTACT: Stoeckrt@pcbi.upenn.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

16.
The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the WorldWideWeb at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST(R) at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.  相似文献   

17.
MOTIVATION: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variant in humans. SNPs causing amino acid substitutions are of particular interest as candidates for loci affecting susceptibility to complex diseases, such as diabetes and hypertension. To efficiently screen SNPs for disease association, it is important to distinguish neutral variants from deleterious ones. RESULTS: We describe the use of Pfam protein motif models and the HMMER program to predict whether amino acid changes in conserved domains are likely to affect protein function. We find that the magnitude of the change in the HMMER E-value caused by an amino acid substitution is a good predictor of whether it is deleterious. We provide internet-accessible display tools for a genomewide collection of SNPs, including 7391 distinct non-synonymous coding region SNPs in 2683 genes. AVAILABILITY: http://lpgws.nci.nih.gov/cgi-bin/GeneViewer.cgi  相似文献   

18.
Validation of clinical biomarkers and response to therapy is a challenging topic in cancer research. An important source of information for virtual validation is the datasets generated from multi-center cancer research projects such as The Cancer Genome Atlas project (TCGA). These data enable investigation of genetic and epigenetic changes responsible for cancer onset and progression, response to cancer therapies, and discovery of the molecular profiles of various cancers. However, these analyses often require bulk download of data and substantial bioinformatics expertise, which can be intimidating for investigators. Here, we report on the development of a new resource available to scientists: a data base called Glioblastoma Bio Discovery Portal (GBM-BioDP). GBM-BioDP is a free web-accessible resource that hosts a subset of the glioblastoma TCGA data and enables an intuitive query and interactive display of the resultant data. This resource provides visualization tools for the exploration of gene, miRNA, and protein expression, differential expression within the subtypes of GBM, and potential associations with clinical outcome, which are useful for virtual biological validation. The tool may also enable generation of hypotheses on how therapies impact GBM molecular profiles, which can help in personalization of treatment for optimal outcome. The resource can be accessed freely at http://gbm-biodp.nci.nih.gov (a tutorial is included).  相似文献   

19.
Histone Sequence Database: new histone fold family members.   总被引:2,自引:0,他引:2       下载免费PDF全文
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih. gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih. gov/Baxevani/HISTONES  相似文献   

20.
In the Cancer Genome Atlas (TCGA) project, gene expression of the same set of samples is measured multiple times on different microarray platforms. There are two main advantages to combining these measurements. First, we have the opportunity to obtain a more precise and accurate estimate of expression levels than using the individual platforms alone. Second, the combined measure simplifies downstream analysis by eliminating the need to work with three sets of expression measures and to consolidate results from the three platforms.We propose to use factor analysis (FA) to obtain a unified gene expression measure (UE) from multiple platforms. The UE is a weighted average of the three platforms, and is shown to perform well in terms of accuracy and precision. In addition, the FA model produces parameter estimates that allow the assessment of the model fit.The R code is provided in File S2. Gene-level FA measurements for the TCGA data sets are available from http://tcga-data.nci.nih.gov/docs/publications/unified_expression/.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号