Similar Articles
Found 20 similar articles (search time: 15 ms)
1.
Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for a few platforms, based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales linearly with sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters through sequential hyperparameter updates over small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.
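The core idea of learning probe-level parameters from small consecutive batches can be sketched as follows. This is a minimal illustration in Python, not the paper's actual probabilistic model: the streaming mean/variance recursion (Welford-style) and the Gaussian toy data are assumptions standing in for the real hyperparameter updates.

```python
import random

class StreamingProbeStats:
    """Track per-probe mean and variance incrementally across small batches
    of arrays (Welford-style updates), so the full atlas never has to sit
    in memory at once."""
    def __init__(self, n_probes):
        self.n = 0
        self.mean = [0.0] * n_probes
        self.m2 = [0.0] * n_probes

    def update_batch(self, batch):
        # batch: list of arrays, each a list of per-probe intensities
        for array in batch:
            self.n += 1
            for j, x in enumerate(array):
                d = x - self.mean[j]
                self.mean[j] += d / self.n
                self.m2[j] += d * (x - self.mean[j])

    def variance(self, j):
        return self.m2[j] / (self.n - 1) if self.n > 1 else 0.0

# Feed a toy "atlas" in 100 consecutive batches of 10 arrays each.
random.seed(0)
stats = StreamingProbeStats(n_probes=3)
for _ in range(100):
    batch = [[random.gauss(m, s) for m, s in ((5, 1), (8, 2), (2, 0.5))]
             for _ in range(10)]
    stats.update_batch(batch)

# A noisy probe stands out through its variance estimate.
print([round(stats.variance(j), 2) for j in range(3)])
```

Memory stays proportional to the number of probes, independent of how many arrays are processed, which is what makes the sequential-update strategy scale linearly with sample size.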

2.
Computational analysis of microarray data   (Total citations: 1; self-citations: 0; citations by others: 1)
Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been enthusiastically developed and applied in many biological contexts, the management and analysis of the millions of data points that result from these experiments has received less attention. Sophisticated computational tools are available, but the methods that are used to analyse the data can have a profound influence on the interpretation of the results. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis.

3.

Background  

Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interactions between these genes. Because of cost and resource constraints, most time series datasets contain fewer than nine time points, and few tools are available geared towards the analysis of this type of data.

4.
Independent of the platform and the analysis methods used, the result of a microarray experiment is, in most cases, a list of differentially expressed genes. An automatic ontological analysis approach has recently been proposed to help with the biological interpretation of such results. Currently, this approach is the de facto standard for the secondary analysis of high-throughput experiments, and a large number of tools have been developed for this purpose. We present a detailed comparison of 14 such tools using the following criteria: scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, installation issues, and sources of annotation data. This detailed analysis of the capabilities of these tools will help researchers choose the most appropriate tool for a given type of analysis. More importantly, although this type of analysis has been widely adopted, it has several important intrinsic drawbacks. These drawbacks affect all of the tools discussed and represent conceptual limitations of the current state of the art in ontological analysis. We propose these as challenges for the next generation of secondary data analysis tools.
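The statistical core shared by most of the ontological analysis tools compared above is an over-representation test: given a list of differentially expressed genes, is a given category hit more often than chance would predict? A minimal sketch, assuming a hypergeometric model and Bonferroni correction (the gene counts and the number of categories are illustrative, not from any specific tool):

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when drawing n genes from a universe of N genes of which
    K are annotated to the category (one-sided over-representation test)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Toy universe: 1000 genes, a 40-gene category, and a 50-gene
# differentially expressed list that hits the category 10 times
# (expected by chance: 50 * 40 / 1000 = 2 hits).
N, K, n, k = 1000, 40, 50, 10
p = hypergeom_pvalue(k, K, n, N)

# Correct for testing many categories in parallel (Bonferroni).
n_categories = 200
p_bonferroni = min(1.0, p * n_categories)
print(f"raw p = {p:.2e}, Bonferroni-corrected = {p_bonferroni:.2e}")
```

The choice of correction for multiple comparisons (Bonferroni here, FDR in many tools) is exactly one of the criteria on which the 14 tools differ.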

5.
Nonlinearity is important and ubiquitous in ecology. Though detectable in principle, nonlinear behavior is often difficult to characterize, analyze, and incorporate mechanistically into models of ecosystem function. One obvious reason is that quantitative nonlinear analysis tools are data intensive (require long time series), and time series in ecology are generally short. Here we demonstrate a useful method that circumvents data limitation and reduces sampling error by combining ecologically similar multispecies time series into one long time series. With this technique, individual ecological time series containing as few as 20 data points can be mined for such important information as (1) significantly improved forecast ability, (2) the presence and location of nonlinearity, and (3) the effective dimensionality (the number of relevant variables) of an ecological system.
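The pooling idea can be illustrated with a bare-bones sketch: delay-embed several short, dynamically similar segments, pool the embedded states into one library, and score one-step nearest-neighbour forecasts. This is a toy stand-in for the paper's actual method (which builds on simplex projection); the logistic-map segments and the single-neighbour forecast are illustrative assumptions.

```python
import math

def pairs_from_segment(series, E):
    """Delay-embed one short series (lag 1) and pair each embedded state
    with the value that follows it."""
    pts = [tuple(series[i - j] for j in range(E))
           for i in range(E - 1, len(series))]
    return [(pts[i], pts[i + 1][0]) for i in range(len(pts) - 1)]

def loo_forecast_error(segments, E):
    """Pool embeddings from several short, similar segments into one
    library, then score leave-one-out nearest-neighbour forecasts."""
    library = [p for seg in segments for p in pairs_from_segment(seg, E)]
    errs = []
    for i, (state, target) in enumerate(library):
        j = min((k for k in range(len(library)) if k != i),
                key=lambda k: math.dist(state, library[k][0]))
        errs.append(abs(library[j][1] - target))
    return sum(errs) / len(errs)

def logistic_segment(x0, n, r=3.8):
    """Chaotic toy dynamics standing in for a short ecological series."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Five 20-point segments: too short individually, useful when pooled.
segments = [logistic_segment(x0, 20) for x0 in (0.11, 0.37, 0.52, 0.73, 0.88)]
err = loo_forecast_error(segments, E=2)
print(f"pooled-library one-step forecast error (E=2): {err:.3f}")
```

Because the segments share the same underlying dynamics, the pooled library fills in the attractor that no single 20-point series could resolve, which is what makes short-series forecasting workable.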

6.
7.
Whole-genome copy number analysis platforms, such as array comparative genomic hybridization (aCGH) and single nucleotide polymorphism (SNP) arrays, are transformative research discovery tools. In cancer, the identification of genomic aberrations with these approaches has generated important diagnostic and prognostic markers, and critical therapeutic targets. While robust for basic research studies, reliable whole-genome copy number analysis has been unsuccessful in routine clinical practice due to a number of technical limitations. Most important, aCGH results have been suboptimal because of the poor integrity of DNA derived from formalin-fixed paraffin-embedded (FFPE) tissues. Using self-hybridizations of a single DNA sample we observed that aCGH performance is significantly improved by accurate DNA size determination and the matching of test and reference DNA samples so that both possess similar fragment sizes. Based on this observation, we developed a novel DNA fragmentation simulation method (FSM) that allows customized tailoring of the fragment sizes of test and reference samples, thereby lowering array failure rates. To validate our methods, we combined FSM with Universal Linkage System (ULS) labeling to study a cohort of 200 tumor samples using Agilent 1 M feature arrays. Results from FFPE samples were equivalent to results from fresh samples and those available through the glioblastoma Cancer Genome Atlas (TCGA). This study demonstrates that rigorous control of DNA fragment size improves aCGH performance. This methodological advance will permit the routine analysis of FFPE tumor samples for clinical trials and in daily clinical practice.

8.
Protein microarray technology is rapidly growing and has the potential to accelerate the discovery of targets of serum antibody responses in cancer, autoimmunity and infectious disease. Analytical tools for interpreting this high-throughput array data, however, are not well-established. We developed a concentration-dependent analysis (CDA) method which normalizes protein microarray data based on the concentration of spotted probes. We show that this analysis samples a data space that is complementary to other commonly employed analyses, and demonstrate experimental validation of 92% of hits identified by the intersection of CDA with other tools. These data support the use of CDA either as a preprocessing step for a more complete proteomic microarray data analysis or as a stand-alone analysis method.
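The general notion of concentration-dependent normalization can be sketched as follows. This toy version simply z-scores each spot against the other spots printed at the same probe concentration; it is an illustrative assumption, not the paper's actual CDA model, whose details the abstract does not give.

```python
from statistics import mean, stdev
from collections import defaultdict

def concentration_normalize(spots):
    """Toy concentration-dependent normalization: score each spot's signal
    relative to the other spots printed at the same probe concentration."""
    by_conc = defaultdict(list)
    for conc, signal in spots:
        by_conc[conc].append(signal)
    stats = {c: (mean(v), stdev(v)) for c, v in by_conc.items()}
    return [(signal - stats[conc][0]) / stats[conc][1]
            for conc, signal in spots]

# Hypothetical spots as (spotted concentration, raw signal); the last spot
# is a genuine hit hiding at a low concentration, invisible to a global
# threshold but obvious once compared within its own concentration group.
spots = [(100, 5.0), (100, 5.2), (100, 4.8), (100, 5.1),
         (10, 1.0), (10, 1.2), (10, 0.9), (10, 4.0)]
scores = concentration_normalize(spots)
print([round(s, 2) for s in scores])
```

Conditioning on spotted concentration is what lets this kind of analysis sample a data space complementary to global-intensity methods.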

9.
Tiling arrays of high-density oligonucleotide probes spanning the entire genome are powerful tools for the discovery of new genes. However, it is difficult to determine the structure of the spliced product of a structurally unknown gene from noisy array signals only. Here we introduce a statistical method that estimates the precise splicing points and the exon/intron structure of a structurally unknown gene by maximizing the odds or the ratio of posterior probabilities of the structure under the observation of array signal intensities and nucleic acid sequences. Our method more accurately predicted the gene structures than the simple threshold-based method, and more correctly estimated the expression values of structurally unknown genes than the window-based method. It was observed that the Markov model contributed to the precision of splice points, and that the statistical significance of expression (P-value) represented the reliability of the estimated gene structure and expression value well. We have implemented the method as a program ARTADE (ARabidopsis Tiling Array-based Detection of Exons) and applied it to the Arabidopsis thaliana whole-genome array data analysis. The database of the predicted results and the ARTADE program are available at http://omicspace.riken.jp/ARTADE/.
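The odds-maximization idea can be sketched in miniature: score each probe with the log posterior odds of "exon" versus "intron", then pick the boundary that maximizes the total odds of the implied structure. Everything here is a toy assumption (two Gaussian intensity models, equal priors, a single boundary); ARTADE's full model additionally scores the nucleic acid sequence with a Markov model.

```python
def log_odds_exon(intensity, mu_exon=8.0, mu_intron=4.0, sigma=1.5):
    """Per-probe log posterior odds of 'expressed exon' vs 'intron',
    assuming Gaussian intensity models and equal priors (toy stand-in)."""
    def loglik(mu):
        return -0.5 * ((intensity - mu) / sigma) ** 2
    return loglik(mu_exon) - loglik(mu_intron)

def best_boundary(intensities):
    """Choose the split maximizing the total odds of exon-left/intron-right."""
    def score(b):
        return (sum(log_odds_exon(x) for x in intensities[:b])
                - sum(log_odds_exon(x) for x in intensities[b:]))
    return max(range(1, len(intensities)), key=score)

# Hypothetical tiling signal: four high (exonic) probes followed by
# three low (intronic) probes.
signal = [8.2, 7.9, 8.4, 7.7, 4.1, 3.8, 4.3]
print("estimated splice point after probe", best_boundary(signal))
```

Maximizing a posterior-odds score over candidate structures, rather than thresholding each probe independently, is what makes the boundary estimate robust to individual noisy probes.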

10.
Given the current trends, it seems inevitable that all biological documents will eventually exist in a digital format and be distributed across the internet. New network services and tools need to be developed to increase retrieval rates for documents and to refine data recovery. Biological data have traditionally been well managed using taxonomic principles. As part of a larger initiative to build an array of names-based network services that emulate taxonomic principles for managing biological information, we undertook the digitization of a major taxonomic reference text, Nomenclator Zoologicus. The process involved replicating the text to a high level of fidelity, parsing the content for inclusion within a database, developing tools to enable expert input into the product, and integrating the metadata and factual content within taxonomic network services. The result is a high-quality and freely available web application (http://uio.mbl.edu/NomenclatorZoologicus/) capable of being exploited in an array of biological informatics services.

11.
12.
T Conway  B Kraus  D L Tucker  D J Smalley  A F Dorman  L McKibben 《BioTechniques》2002,32(1):110, 112-4, 116, 118-9
Microsoft Windows-based computers have evolved to the point that they provide sufficient computational and visualization power for robust analysis of DNA array data. In fact, smaller laboratories might prefer to carry out some or all of their analyses and visualization in a Windows environment, rather than alternative platforms such as UNIX. We have developed a series of manually executed Visual Basic macros for Microsoft Excel spreadsheets that allow for rapid and comprehensive gene expression data analysis. The first macro assigns gene names to spots on the DNA array and normalizes individual hybridizations by expressing the signal intensity for each gene as a percentage of the sum of all gene intensities. The second macro streamlines statistical consideration of the confidence in individual gene measurements for sets of experimental replicates by calculating probability values with Student's t-test. The third macro introduces a threshold value, calculates expression ratios between experimental conditions, and calculates the standard deviation of the mean of the log ratio values. Selected columns of data are copied by a fourth macro to create a processed data set suitable for entry into a Microsoft Access database. An Access database structure is described that allows simple queries across multiple experiments and export of data into third-party data visualization software packages. These analysis tools can be used in their present form by others working with commercial E. coli membrane arrays, or they may be adapted for use with other systems. The Excel spreadsheets with embedded Visual Basic macros and detailed instructions for their use are available at http://www.ou.edu/microarray.  
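The arithmetic behind the first two macros is simple enough to sketch outside Excel. A minimal Python rendition (the replicate values are invented; in practice a library routine such as scipy.stats.ttest_ind would also supply the p-value the second macro reports):

```python
from statistics import mean, stdev

def normalize(intensities):
    """Macro 1's normalization: express each gene's signal as a
    percentage of the summed intensity on that hybridization."""
    total = sum(intensities)
    return [100.0 * x / total for x in intensities]

def t_statistic(a, b):
    """Macro 2's core: Student's t for two sets of replicate measurements
    (pooled, equal-variance form)."""
    na, nb = len(a), len(b)
    sp2 = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
           / (na + nb - 2))
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical percent-normalized values for one gene, three replicate
# hybridizations per condition.
cond1 = [0.42, 0.45, 0.44]
cond2 = [0.21, 0.19, 0.23]
print(f"t = {t_statistic(cond1, cond2):.2f}")
```

Percent-of-total normalization makes hybridizations comparable before any between-condition ratio is computed, which is why it comes first in the macro pipeline.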

13.
Chemical genetic screening and DNA and protein microarrays are among a number of increasingly important and widely used biological research tools that involve large numbers of parallel experiments arranged in a spatial array. It is often difficult to ensure that uniform experimental conditions are present throughout the entire array, and as a result, one often observes systematic spatially correlated errors, especially when array experiments are performed using robots. Here, the authors apply techniques based on the discrete Fourier transform to identify and quantify spatially correlated errors superimposed on a spatially random background. They demonstrate that these techniques are effective in identifying common spatially systematic errors in high-throughput 384-well microplate assay data. In addition, the authors employ a statistical test to allow for automatic detection of such errors. Software tools for using this approach are provided.
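The principle is easy to demonstrate: a spatially periodic artifact that is invisible in the raw wells shows up as a sharp peak in the Fourier spectrum of the plate's column means. A minimal sketch (the plate values and the period-4 column bias are simulated assumptions, and the naive DFT stands in for numpy.fft.rfft):

```python
import cmath
import math
import random

def dft_magnitudes(xs):
    """Magnitude spectrum of a real sequence via a naive DFT
    (numpy.fft.rfft would be the practical choice on real data)."""
    n = len(xs)
    m = sum(xs) / n
    centered = [x - m for x in xs]      # drop the DC component
    return [abs(sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered)))
            for k in range(n // 2 + 1)]

# Simulated 384-well plate (16 rows x 24 columns): random signal plus a
# period-4 sinusoidal column bias, as a robotic dispenser might leave.
random.seed(1)
n_rows, n_cols = 16, 24
plate = [[random.gauss(1.0, 0.05) + 0.2 * math.cos(2 * math.pi * c / 4)
          for c in range(n_cols)] for _ in range(n_rows)]

col_means = [sum(row[c] for row in plate) / n_rows for c in range(n_cols)]
spec = dft_magnitudes(col_means)
peak = max(range(1, len(spec)), key=lambda k: spec[k])
print(f"dominant column frequency index: {peak} "
      f"(period {n_cols // peak} columns)")
```

Averaging over rows before transforming concentrates the column-wise artifact into a single spectral bin, which is what makes an automatic statistical test on the spectrum feasible.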

14.
DNA microarray technology has captured the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. This paper surveys several databases which are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.

15.
B. Patterson  J. A. Spudich 《Genetics》1996,143(2):801-810
Dictyostelium provides a powerful environment for characterization of myosin II function. It provides well-established biochemical methods for in vitro analysis of myosin's properties as well as an array of molecular genetic tools. The absence of myosin function results in an array of phenotypes that can be used to genetically manipulate myosin function. We have previously reported methods for the isolation and identification of rapid-effect cold-sensitive myosin II mutations in Dictyostelium. Here, we report the development and utilization of a rapid method for localizing these point mutations. We have also sequenced 19 mutants. The mutations show distinct clustering with respect to three-dimensional location and biochemically characterized functional domains of the protein. We conclude that these mutants represent powerful tools for understanding the mechanisms driving this protein motor.

16.
M Puech  F Giroud 《Cytometry》1999,36(1):11-17
BACKGROUND: DNA image analysis is frequently performed in clinical practice as a prognostic tool and to improve diagnosis. The precision of prognosis and diagnosis depends on the accuracy of analysis and particularly on the quality of image analysis systems. It has been reported that image analysis systems used for DNA quantification differ widely in their characteristics (Thunissen et al.: Cytometry 27: 21-25, 1997). This induces inter-laboratory variations when the same sample is analysed in different laboratories. In microscopic image analysis, the principal instrumentation errors arise from the optical and electronic parts of systems. They bring about problems of instability, non-linearity, and shading and glare phenomena. METHODS: The aim of this study is to establish tools and standardised quality control procedures for microscopic image analysis systems. Specific reference standard slides have been developed to control instability, non-linearity, shading and glare phenomena, and segmentation efficiency. RESULTS: Several systems have been controlled with these tools and quality control procedures. Interpretation criteria and accuracy limits for these procedures are proposed according to the conclusions of a European project called PRESS (Prototype Reference Standard Slide). Beyond these limits, tested image analysis systems are not qualified to perform precise DNA analysis. CONCLUSIONS: The procedures presented in this work determine whether an image analysis system is qualified to deliver sufficiently precise DNA measurements for cancer case analysis. If a tested system falls outside the defined limits, recommendations are given for correcting the problem.

17.
Modern biomedical research is evolving with the rapid growth of diverse data types, biophysical characterization methods, computational tools and extensive collaboration among researchers spanning various communities and having complementary backgrounds and expertise. Collaborating researchers are increasingly dependent on shared data and tools made available by other investigators with common interests, thus forming communities that transcend the traditional boundaries of the single research laboratory or institution. Barriers, however, remain to the formation of these virtual communities, usually due to the steep learning curve associated with becoming familiar with new tools, or to the difficulties associated with transferring data between tools. Recognizing the need for shared reference data and analysis tools, we are developing an integrated knowledge environment that supports productive interactions among researchers. Here we report on our current collaborative environment, which focuses on bringing together structural biologists working in the area of mass spectrometry-based methods for the analysis of tertiary and quaternary macromolecular structures (MS3D), called the Collaboratory for MS3D (C-MS3D). C-MS3D is a Web portal designed to provide collaborators with a shared work environment that integrates data storage and management with data analysis tools. Files are stored and archived along with pertinent metadata in such a way as to allow file handling to be tracked (data provenance) and data files to be searched using keywords and modification dates. While at this time the portal is designed around a specific application, the shared work environment is a general approach to building collaborative work groups. The goal is not only to provide a common data sharing and archiving system, but also to assist in the building of new collaborations and to spur the development of new tools and technologies.

18.
Geometric morphometrics involves defining landmark points to generate a discrete representation of an object. This crucial step is strongly influenced by the biological question guiding the analysis, and even more so when curve and surface semi-landmark methods are used, because these require generating a reference template. We exemplify these constraints using two datasets from projects with very different backgrounds. The Theropod Dataset is a functional morphometric analysis of different extinct and extant theropod pelves. The Shrew Dataset is a populational morphometric analysis of the white-toothed shrew with very small variations in skull shape. We propose a novel procedure to generate a regular template configuration using polygonal modelling tools. This method allows us to control the template geometry and adapt its complexity to the morphological variation in the sample. More studies are necessary to assess the morphometric and statistical importance of template design in curve and surface analyses.

19.
An integrated software system for analyzing ChIP-chip and ChIP-seq data   (Total citations: 1; self-citations: 0; citations by others: 1)
Ji H  Jiang H  Ma W  Johnson DS  Myers RM  Wong WH 《Nature biotechnology》2008,26(11):1293-1300

20.
Immunoprecipitation of RNA binding proteins (RBPs) after in vivo crosslinking, coupled with sequencing of associated RNA footprints (HITS-CLIP, CLIP-seq), is a method of choice for the identification of RNA targets and binding sites for RBPs. Compared with RNA-seq, CLIP-seq analysis is widely diverse, and depending on the RBPs that are analyzed, the approaches vary significantly, necessitating the development of flexible and efficient informatics tools. In this study, we present CLIPSeqTools, a novel, highly flexible computational suite that can perform analysis from raw sequencing data with minimal user input. It contains a wide array of tools to provide an in-depth view of CLIP-seq datasets. It supports extensive customization and promotes improvisation, a critical virtue, since CLIP-seq analysis is rarely well defined a priori. To highlight CLIPSeqTools' capabilities, we used the suite to analyze Ago-miRNA HITS-CLIP datasets that we prepared from human brains.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号