Similar articles
 20 similar articles found
1.
The Stanford Microarray Database (SMD; http://genome-www.stanford.edu/microarray/) serves as a microarray research database for Stanford investigators and their collaborators. In addition, SMD functions as a resource for the entire scientific community, by making freely available all of its source code and providing full public access to data published by SMD users, along with many tools to explore and analyze those data. SMD currently provides public access to data from 3500 microarrays, including data from 85 publications, and this total is increasing rapidly. In this article, we describe some of SMD's newer tools for accessing public data, assessing data quality and analyzing data.

2.
Microarray technology plays an important role in drawing useful biological conclusions by analyzing the expression of thousands of genes simultaneously. Image analysis in particular is a key step in microarray analysis, and its accuracy depends strongly on segmentation. Pioneering work on clustering-based segmentation has shown that the k-means clustering algorithm and the moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they often give unsatisfactory results because real microarray images contain noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combined clustering-based segmentation approach that may be more reliable and able to segment spots automatically. First, the method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, within each spot region, moving k-means clustering is first conducted to separate the spot from the background, and k-means clustering is then combined with it for those spots whose entire boundary could not be obtained. Finally, a refinement step is used to replace the false segmentation and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and four other segmentation algorithms (edge detection, thresholding, k-means clustering and moving k-means clustering) are carried out on cDNA microarray images from six different data sets.
Experiments on the six data sets, 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi’s individual tiff files (DeRisi), and 6) University of California, San Francisco (UCSF), indicate that the improved approach is more robust and more sensitive to weak spots. More importantly, it obtains higher segmentation accuracy in the presence of noise, artifacts and weakly expressed spots than the other four methods.
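
The clustering step described in this abstract can be illustrated with a minimal sketch. It applies plain two-cluster k-means to the pixel intensities of a single spot region. It is not the authors' combined method: the contrast enhancement, gridding, moving k-means and refinement steps are omitted, and the function names `two_means` and `segment_spot` are hypothetical.

```python
def two_means(pixels, iterations=50):
    """Plain k-means with k=2 on a flat list of intensities.

    Returns (background_centre, foreground_centre). Centres are
    initialised at the darkest and brightest pixel (an assumption).
    """
    c_bg, c_fg = min(pixels), max(pixels)
    for _ in range(iterations):
        # Assign each pixel to the nearer centre; ties go to background.
        bg = [p for p in pixels if abs(p - c_bg) <= abs(p - c_fg)]
        fg = [p for p in pixels if abs(p - c_bg) > abs(p - c_fg)]
        if not fg:  # uniform region: nothing to separate
            break
        new_bg, new_fg = sum(bg) / len(bg), sum(fg) / len(fg)
        if new_bg == c_bg and new_fg == c_fg:  # converged
            break
        c_bg, c_fg = new_bg, new_fg
    return c_bg, c_fg


def segment_spot(region):
    """Return a boolean mask (True = spot pixel) for a 2-D list of intensities."""
    pixels = [p for row in region for p in row]
    c_bg, c_fg = two_means(pixels)
    return [[abs(p - c_fg) < abs(p - c_bg) for p in row] for row in region]


if __name__ == "__main__":
    region = [[10, 12, 11], [11, 200, 210], [12, 205, 10]]
    for row in segment_spot(region):
        print(row)
```

On real images this simple version would be expected to suffer from exactly the noise and weak-spot problems the abstract mentions, which is the motivation given for combining several clustering steps.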

3.
KMD     
The Keck Microarray Database (KMD) is a port of the ArrayExpress database from Oracle to the MySQL environment. The requirements for a locally available, open-source microarray database solution based on ArrayExpress are analysed in this article. The differences between the Oracle and MySQL environments are identified and the method to port to MySQL is described, providing a unified relational database management system (RDBMS) platform for both MIAMExpress and ArrayExpress. AVAILABILITY: The software and documentation are available from the Keck Graduate Institute of Applied Life Sciences website at http://public.kgi.edu/~jmainguy/applied-bioinformatics.htm.

4.
The MUSC DNA Microarray Database   Total citations: 1, self-citations: 0, citations by others: 1
SUMMARY: The Medical University of South Carolina (MUSC) DNA Microarray Database is a web-accessible archive of DNA microarray data. The database was developed using the DNA microarray project/data management system, micro ArrayDB. Annotations for each DNA microarray project and associated cRNA target information are stored in a MySQL relational database and linked to array hybridization data (raw and normalized). At the discretion of investigators, data are placed into the public domain where they can be interrogated and downloaded through a web browser. In addition to serving as an online resource of gene expression data, the MUSC DNA Microarray Database is a model for other academic DNA microarray data repositories. AVAILABILITY: Browsing and downloading of MUSC DNA Microarray Database information can be done after registration at http://proteogenomics.musc.edu/pss/home.php.

5.
MOTIVATION: Microarrays are an important research tool for the advancement of basic biological sciences. However, this technology has yet to be integrated with clinical decision making. We have implemented an information framework based on the Microarray Gene Expression Markup Language (MAGE-ML) specification. We are using this framework to develop a test-bed integrated database application to identify genomic and imaging markers for diagnosis of breast cancer. RESULTS: We developed an extensible software architecture for retrieving data from different microarray databases using MAGE-ML and for combining microarray data with breast cancer image analysis and clinical data for correlation studies. The framework we developed will provide the necessary data integration to move microarray research from basic biological sciences to clinical applications. AVAILABILITY: Open source software will be available from SourceForge (http://sourceforge.net/projects/microsoap/).

6.
MOTIVATION: The lack of microarray data management systems and databases is still one of the major problems faced by many life sciences laboratories. While developing ArrayExpress, the public repository for microarray data, we had to find novel solutions to many non-trivial software engineering problems. Our experience will be both relevant and useful for most bioinformaticians involved in developing information systems for a wide range of high-throughput technologies. RESULTS: ArrayExpress has been online since February 2002, growing exponentially to well over 10,000 hybridizations (as of September 2004). It has been demonstrated that our chosen design and implementation work for databases aimed at storage, access and sharing of high-throughput data. AVAILABILITY: The ArrayExpress database is available at http://www.ebi.ac.uk/arrayexpress/. The software is open source. CONTACT: ugis@ebi.ac.uk.

7.
SUMMARY: Large volumes of microarray data are generated and deposited in public databases. Most of these data are in the form of tab-delimited text files or Excel spreadsheets. Combining data from several of these files to reanalyze the data sets is time consuming. Microarray Data Assembler is specifically designed to simplify this task. The program can list files and data sources, convert selected text files into Excel files and assemble data across multiple Excel worksheets and workbooks. This program thus makes data assembly easy, saves time and helps avoid manual error. AVAILABILITY: The program is freely available for non-profit use, via email request from the author, after signing a Material Transfer Agreement with Johns Hopkins University.
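
The kind of assembly task described in this abstract can be sketched as follows. This is not Microarray Data Assembler itself, which works with Excel workbooks; it is a minimal illustration that merges tab-delimited tables on a shared identifier column. The function name `assemble` and the key column name `ID` are assumptions.

```python
import csv
import io


def assemble(sources, key="ID"):
    """Merge tab-delimited tables on a shared key column.

    sources: mapping of label -> open text handle (hypothetical layout:
    one header row, one key column, any number of value columns).
    Returns (column_names, {key_value: {column_name: value}}).
    """
    merged = {}
    columns = []
    for label, handle in sources.items():
        reader = csv.DictReader(handle, delimiter="\t")
        value_fields = [f for f in reader.fieldnames if f != key]
        # Prefix each column with its source label to avoid name clashes.
        names = [f"{label}:{f}" for f in value_fields]
        columns.extend(names)
        for row in reader:
            target = merged.setdefault(row[key], {})
            for field, name in zip(value_fields, names):
                target[name] = row[field]
    return columns, merged


if __name__ == "__main__":
    a = io.StringIO("ID\tratio\ng1\t1.5\ng2\t0.8\n")
    b = io.StringIO("ID\tratio\ng2\t2.0\ng3\t1.1\n")
    cols, table = assemble({"a": a, "b": b})
    print(cols)
    for gene in sorted(table):
        print(gene, table[gene])
```

Genes missing from one source simply lack that column in the merged record, so downstream code has to decide how to treat absent values.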

8.
9.
A robust bioinformatics capability is widely acknowledged as central to realizing the promises of toxicogenomics. Successful application of toxicogenomic approaches, such as DNA microarray, inextricably relies on appropriate data management, the ability to extract knowledge from massive amounts of data and the availability of functional information for data interpretation. At the FDA's National Center for Toxicological Research (NCTR), we are developing public microarray data management and analysis software called ArrayTrack. ArrayTrack is Minimum Information About a Microarray Experiment (MIAME) supportive for storing both microarray data and experiment parameters associated with a toxicogenomics study. A quality control mechanism is implemented to assure the fidelity of entered expression data. ArrayTrack also provides a rich collection of functional information about genes, proteins and pathways drawn from various public biological databases for facilitating data interpretation. In addition, several data analysis and visualization tools are available with ArrayTrack, and more tools will be available in the next released version. Importantly, gene expression data, functional information and analysis methods are fully integrated so that the data analysis and interpretation process is simplified and enhanced. ArrayTrack is publicly available online and the prospective user can also request a local installation version by contacting the authors.

10.
Microarray technology has resulted in an explosion of complex, valuable data. Integrating data analysis tools with a comprehensive underlying database would allow efficient identification of common properties among differentially regulated genes. In this study we sought to compare the utility of various databases in microarray analysis. The Proteome BioKnowledge Library (BKL), a manually curated, proteome-wide compilation of the scientific literature, was used to generate a list of Gene Ontology (GO) Biological Process (BP) terms enriched among proteins involved in cardiovascular disease. Analysis of DNA microarray data generated in a study of rat vascular smooth muscle cell responses revealed significant enrichment in a number of GO BPs that were also enriched among cardiovascular disease-related proteins. Using annotation from LocusLink and chip annotation from the Gene Expression Omnibus yielded fewer enriched cardiovascular disease-associated GO BP terms. Data sets of orthologous genes from mouse and human were generated using the BKL Retriever. Analysis of these sets, focusing on BKL Disease annotation, revealed a significant association of these genes with cardiovascular disease. These results, and the extensive presence of experimental evidence for BKL GO and Disease features, underscore the benefits of using this database for microarray analysis.

11.
We have developed a publicly accessible database (ALFRED, the ALlele FREquency Database) that catalogues allele frequency data for a wide range of population samples and DNA polymorphisms. This database is web-accessible through our laboratory (Kidd Lab) Web site: http://info.med.yale.edu/genetics/kkidd. ALFRED currently contains data on 60 populations and 156 genetic systems including single nucleotide polymorphisms (SNPs), short tandem repeat polymorphisms (STRPs), variable number of tandem repeats (VNTRs) and insertion-deletion polymorphisms. While data are not available for all population-DNA polymorphism combinations, over 2000 allele frequency tables have been entered. Our database is designed (i) to address our specific research requirements as well as broader scientific objectives; (ii) to allow researchers and interested educators to easily navigate and retrieve data of interest to them; and (iii) to integrate links to other related public databases such as dbSNP, GenBank and PubMed.

12.
ADAM: another database of abbreviations in MEDLINE   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: Abbreviations are an important type of terminology in the biomedical domain. Although several groups have already created databases of biomedical abbreviations, these are either not public, or are not comprehensive, or focus exclusively on acronym-type abbreviations. We have created another abbreviation database, ADAM, which covers commonly used abbreviations and their definitions (or long-forms) within MEDLINE titles and abstracts, including both acronym and non-acronym abbreviations. RESULTS: A model of recognizing abbreviations and their long-forms from titles and abstracts of MEDLINE (2006 baseline) was employed. After grouping morphological variants, 59 405 abbreviation/long-form pairs were identified. ADAM shows high precision (97.4%) and includes most of the frequently used abbreviations contained in the Unified Medical Language System (UMLS) Lexicon and the Stanford Abbreviation Database. Conversely, one-third of abbreviations in ADAM are novel insofar as they are not included in either database. About 19% of the novel abbreviations are non-acronym-type and these cover at least seven different types of short-form/long-form pairs. AVAILABILITY: A free, public query interface to ADAM is available at http://arrowsmith.psych.uic.edu, and the entire database can be downloaded as a text file.

13.
Web Tools for Rice Transcriptome Analyses   Total citations: 1, self-citations: 0, citations by others: 1
Gene expression databases provide profiling data for the expression of thousands of genes to researchers worldwide. Oligonucleotide microarray technology is a useful tool that has been employed to produce gene expression profiles in most species. In rice, there are five genome-wide DNA microarray platforms: NSF 45K, BGI/Yale 60K, Affymetrix, Agilent Rice 44K, and NimbleGen 390K. Presently, more than 1,700 hybridizations of microarray gene expression data are available from public microarray depositing databases such as the NCBI Gene Expression Omnibus and ArrayExpress at EBI. More processing or reformatting of public gene expression data is required for further applications or analyses. Web-based databases for expression meta-analyses are useful for guiding researchers in designing relevant research schemes. In this review, we summarize various databases for expression meta-analyses of rice genes and web tools for further applications, such as the development of co-expression networks or functional gene networks.

14.
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77-80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73-76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10-14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332-333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45-48] and can be accessed at http://genome-www.stanford.edu/microarray.

15.
ArrayExpress is a new public database of microarray gene expression data at the EBI, which is a generic gene expression database designed to hold data from all microarray platforms. ArrayExpress uses the annotation standard Minimum Information About a Microarray Experiment (MIAME) and the associated XML data exchange format Microarray Gene Expression Markup Language (MAGE-ML) and it is designed to store well annotated data in a structured way. The ArrayExpress infrastructure consists of the database itself, data submissions in MAGE-ML format or via the online submission tool MIAMExpress, an online database query interface, and the Expression Profiler online analysis tool. ArrayExpress accepts three types of submission: arrays, experiments and protocols, each of which is assigned an accession number. Help on data submission and annotation is provided by the curation team. The database can be queried on parameters such as author, laboratory, organism, experiment or array types. With an increasing number of organisations adopting the MAGE-ML standard, the volume of submissions to ArrayExpress is increasing rapidly. The database can be accessed at http://www.ebi.ac.uk/arrayexpress.

16.
Bioinformatics approaches in the study of cancer   Total citations: 1, self-citations: 0, citations by others: 1
A revolution is underway in the approach to studying the genetic basis of cancer. Massive amounts of data are now being generated via high-throughput techniques such as DNA microarray technology, and new computational algorithms have been developed to aid in analysis. At the same time, standards-based repositories, including the Stanford Microarray Database and the Gene Expression Omnibus, have been developed to store and disseminate the results of microarray experiments. Bioinformatics, the convergence of biology, information science, and computation, has played a key role in these developments. Recently developed techniques include Module Maps, SLAMS (Stepwise Linkage Analysis of Microarray Signatures), and COPA (Cancer Outlier Profile Analysis). What these techniques have in common is the application of novel algorithms to find high-level gene expression patterns across heterogeneous microarray experiments. Large-scale initiatives are underway as well. The Cancer Genome Atlas (TCGA) project is a logical extension of the Human Genome Project and is meant to produce a comprehensive atlas of genetic changes associated with cancer. The Cancer Biomedical Informatics Grid (caBIG), led by the NCI, also represents a colossal initiative involving virtually all aspects of cancer research and may help to transform the way cancer research is conducted and data are shared.

17.
Candida albicans is an important fungal model organism of noteworthy clinical interest in modern medicine. Different initiatives addressing its sequencing and physical mapping have been carried out. The C. albicans genome sequence is currently near to completion at Stanford University, heralding new challenges in proteomic research and functional analyses of its gene products. This review presents an update of the most relevant data resources that are available through the World Wide Web to scientists working in the area of the analysis of the C. albicans proteome. An overview of the current status of the main universal protein sequence databases and specialized data collections for C. albicans is given. Various issues of the single public C. albicans 2D-PAGE database are also described, highlighting the significance of setting up graphical query interface-based databanks to visualize 2D-PAGE images through the Net. Finally, we also emphasize the pressing need to create a "cyber-bioknowledge library" that will integrate all the databases developed at the different levels for the understanding of life processes as well as bioinformatic tools for interpreting this deluge of data generated through the Internet.

18.
19.
With advances in robotics, computational capabilities, and the fabrication of high quality glass slides coinciding with increased genomic information being available on public databases, microarray technology is increasingly being used in laboratories around the world. In fact, fields as varied as toxicology, evolutionary biology, drug development and production, disease characterization, diagnostics development, cellular physiology and stress responses, and forensics have benefited from its use. However, for many researchers not familiar with microarrays, current articles and reviews often address neither the fundamental principles behind the technology nor the proper design of experiments. Although microarray technology is conceptually relatively simple, its practice does require careful planning and a detailed understanding of its inherent limitations. Without these considerations, it can be exceedingly difficult to ascertain valuable information from microarray data. Therefore, this text aims to outline key features in microarray technology, paying particular attention to current applications as outlined in recent publications, experimental design, statistical methods, and potential uses. Furthermore, this review is not meant to be comprehensive, but rather substantive, highlighting important concepts and detailing steps necessary to conduct and interpret microarray experiments. Collectively, the information included in this text will highlight the versatility of microarray technology and provide a glimpse of what the future may hold.

20.
Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from TCGA LAML data set we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package “prebs.”
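
The first step this abstract describes, counting RNA-seq reads that overlap microarray probe regions, can be sketched as below. This is an illustration only and does not use the prebs package: the function name `count_overlaps`, the half-open coordinate convention and the single-chromosome input are all assumptions, and the subsequent microarray summarisation step is not shown.

```python
def count_overlaps(probes, reads):
    """Count reads overlapping each probe region.

    probes: dict mapping probe name -> (start, end)
    reads:  list of (start, end) alignments on the same chromosome
    Intervals are treated as half-open [start, end) (an assumption).
    """
    counts = {}
    for name, (p_start, p_end) in probes.items():
        counts[name] = sum(
            1
            for r_start, r_end in reads
            if r_start < p_end and p_start < r_end  # intervals intersect
        )
    return counts


if __name__ == "__main__":
    probes = {"p1": (100, 125), "p2": (300, 325)}
    reads = [(90, 140), (120, 170), (125, 175), (310, 360)]
    print(count_overlaps(probes, reads))
```

The nested scan is quadratic in the worst case; for genome-scale data an interval index or sorted sweep would be the usual choice.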

