Similar Articles
20 similar articles found.
1.
MOTIVATION: For DNA microarrays, the gain in certainty from performing multiple experimental repeats is offset by the high cost of each experiment. In a typical experiment, two independent measurements (that is, data from two separate arrays) are combined to yield a single comparative index per gene. Thus, although one uses 2n arrays and performs 2n independent measurements, one obtains only n comparative measurements. We addressed two questions: how many of the potential n² comparisons derivable from such data are actually independent, and what effect do these additional comparisons have on the false positive rate? RESULTS: We show that there are precisely 2n - 1 independent comparisons available from among the n² possibilities. Applying these additional n - 1 independent comparisons to experimental and simulated data reduced the false positive rate by as much as 10-fold, with excellent agreement between experimental and theoretical false positive rates.
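A quick way to check the independence count claimed here: each cross-comparison between a treatment array and a control array is a linear combination of the 2n array values, so the number of independent comparisons equals the rank of the matrix of those combinations. The sketch below is a Python illustration, not the authors' code; encoding each comparison as a simple difference a_i - b_j is an assumption.

```python
# Verify the abstract's count: the n^2 possible cross-comparisons between
# n "treatment" arrays and n "control" arrays span only a (2n - 1)-
# dimensional space, i.e. exactly 2n - 1 of them are independent.
import numpy as np

n = 5
rows = []
for i in range(n):          # treatment array i
    for j in range(n):      # control array j
        v = np.zeros(2 * n)
        v[i] = 1.0          # coefficient of a_i
        v[n + j] = -1.0     # coefficient of b_j
        rows.append(v)
M = np.vstack(rows)         # n^2 x 2n comparison matrix

print(M.shape[0], "possible comparisons")        # 25
print(np.linalg.matrix_rank(M), "independent")   # 9 == 2n - 1
```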

2.
MOTIVATION: Although several recently proposed analysis packages for microarray data can cope with heavy-tailed noise, many applications rely on Gaussian assumptions. Gaussian noise models foster computational efficiency, but at the expense of increased sensitivity to outlying observations. Assessing potential insufficiencies of Gaussian noise in microarray data analysis is thus important and of general interest. RESULTS: To this end, we propose assessing different noise models on a large number of microarray experiments. The goodness of fit of noise models is quantified by a hierarchical Bayesian analysis-of-variance model, which predicts normalized expression values as a mixture of a Gaussian density and t-distributions with adjustable degrees of freedom. Inference of differentially expressed genes is taken into consideration at a second mixing level. To attain far-reaching validity, our investigations cover a wide range of analysis platforms and experimental settings. Most strikingly, we find in all experiments, irrespective of the chosen preprocessing and normalization method, that a heavy-tailed noise model is a better fit than a simple Gaussian. Further investigation revealed that the choice of noise model has a considerable influence on the biological interpretations drawn at the level of inferred genes and gene ontology terms. We conclude that neglecting the overdispersed noise in microarray data can mislead scientific discovery, and suggest that the convenience of Gaussian-based modelling be replaced by non-parametric approaches or other methods that account for heavy-tailed noise.
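To illustrate the kind of conclusion reported here, the following sketch (not the paper's hierarchical Bayesian model; a plain maximum-likelihood comparison on toy residuals) shows how a Student-t with adjustable degrees of freedom beats a Gaussian by AIC on heavy-tailed data.

```python
# Minimal model-comparison sketch: fit Gaussian and Student-t to
# microarray-like residuals and compare by AIC. Heavier tails => t wins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
resid = stats.t.rvs(df=3, scale=0.2, size=5000, random_state=rng)  # toy residuals

mu, sigma = stats.norm.fit(resid)
df, loc, scale = stats.t.fit(resid)

ll_norm = stats.norm.logpdf(resid, mu, sigma).sum()
ll_t = stats.t.logpdf(resid, df, loc, scale).sum()

# AIC = 2k - 2*loglik; the Gaussian has k=2 parameters, the t has k=3
print("AIC Gaussian :", 2 * 2 - 2 * ll_norm)
print("AIC Student-t:", 2 * 3 - 2 * ll_t)
print("fitted degrees of freedom:", df)   # small df = heavy tails
```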

3.
MOTIVATION: Principal Component Analysis (PCA) is one of the most popular dimensionality-reduction techniques for the analysis of high-dimensional datasets. In its standard form, however, it does not account for any error measures associated with the data points beyond standard spherical noise. This indiscriminate nature is one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene- and experiment-specific, and can be propagated through an appropriate probabilistic downstream analysis. RESULTS: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of the new model. The model provides significantly better results than standard PCA while remaining computationally reasonable. We show how the model can be used to 'denoise' a microarray dataset, leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is obtained automatically.
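The paper's EM algorithm is not reproduced here, but the core idea of letting per-gene, per-experiment variances reweight the analysis can be sketched with an inverse-variance-weighted PCA. This is a crude stand-in for the authors' model, using toy data and variances assumed known.

```python
# Crude sketch: down-weight each measurement by its variance before PCA,
# so noisy gene/experiment entries influence the components less.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))            # genes x experiments (toy data)
V = rng.uniform(0.1, 2.0, size=X.shape)  # per-entry variances (assumed known)
W = 1.0 / V                              # inverse-variance weights

mean = (W * X).sum(axis=0) / W.sum(axis=0)   # weighted column means
Xc = X - mean
Xw = np.sqrt(W) * Xc                     # scale entries by sqrt-weight
C = Xw.T @ Xw / W.sum()                  # weighted covariance (8 x 8)

evals, evecs = np.linalg.eigh(C)         # eigh returns ascending order
components = evecs[:, ::-1][:, :2]       # top-2 directions
scores = Xc @ components
print(scores.shape)                      # (200, 2) reduced representation
```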

4.
5.
6.
Statistical methods and microarray data (cited: 1; self: 0; others: 1)
Klebanov L, Qiu X, Welle S, Yakovlev A. Nature Biotechnology 2007, 25(1):25-6; author reply 26-7.

7.
A key modality of non-surgical cancer management is DNA-damaging therapy, which causes DNA double-strand breaks (DSBs) that are preferentially toxic to rapidly dividing cancer cells. DSB repair capacity is recognized as an important mechanism of drug resistance and is therefore a potential target for adjuvant chemotherapy. Additionally, spontaneous and environmentally induced DSBs are known to promote cancer, making DSB evaluation important as a tool in epidemiology, clinical evaluation and the development of novel pharmaceuticals. Currently available assays for detecting DSBs are limited in throughput and specificity and offer minimal information concerning the kinetics of repair. Here, we present the CometChip, a 96-well platform that enables assessment of DSB levels and repair capacity of multiple cell types and conditions in parallel, and that integrates with standard high-throughput screening and analysis technologies. We demonstrate the ability to detect multiple genetic deficiencies in DSB repair and evaluate a set of clinically relevant chemical inhibitors of non-homologous end-joining, one of the major DSB repair pathways. While other high-throughput repair assays measure residual damage or indirect markers of damage, the CometChip detects physical DSBs, providing a direct measurement of damage induction and repair capacity, which may be useful in developing and implementing treatment strategies with reduced side effects.

8.
Single-particle analysis is a 3-D structure determination method using electron microscopy (EM). It requires a large number of projections to create a 3-D reconstruction. To enable fully automatic particle picking without a matching template or a training data set, we established a new method in which the frames used to pick up particles are randomly shifted and rotated over the electron micrograph and, using the total average image of the framed images as an index, each frame converges on a particle. In this process, shifts are selected so as to increase the contrast of the average. Through iterated shifts and further selection, the frames are induced to move so as to surround particles. In this algorithm, hundreds of frames are initially distributed randomly over an electron micrograph in which multi-particle images are dispersed. Starting from these positions, one frame is selected and shifted randomly, and acceptance or rejection of its new position is judged by the simulated annealing (SA) method, with the contrast score of the total average image adopted as the index. After iteration of this process, the position of each frame converges so as to surround a particle, and the framed images are picked up. This is the first unsupervised, fully automatic particle-picking method applicable to EM of various kinds of proteins, and especially to low-contrast cryo-EM protein images.
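The core accept/reject loop can be illustrated in one dimension. This is a toy sketch, not the authors' 2-D implementation with rotations, and all signal parameters are made up: frames drift toward "particles" because shifts that raise the contrast of the average framed window are preferentially accepted under a Metropolis criterion with a decreasing temperature.

```python
# 1-D toy of frame-shift particle picking by simulated annealing.
import numpy as np

rng = np.random.default_rng(2)
L, w = 1000, 20
signal = rng.normal(0, 0.3, L)                     # background noise
for c in rng.choice(np.arange(50, L - 50), 15, replace=False):
    signal[c - 5:c + 5] += 2.0                     # embed "particles"

frames = rng.integers(0, L - w, size=40)           # random frame starts

def contrast(starts):
    avg = np.mean([signal[s:s + w] for s in starts], axis=0)
    return avg.std()                               # contrast of the average

T, score = 1.0, contrast(frames)
for step in range(20000):
    k = rng.integers(len(frames))                  # pick one frame
    trial = frames.copy()
    trial[k] = np.clip(frames[k] + rng.integers(-5, 6), 0, L - w)
    s = contrast(trial)
    if s > score or rng.random() < np.exp((s - score) / T):
        frames, score = trial, s                   # Metropolis acceptance
    T *= 0.9997                                    # cooling schedule

print("final contrast of average frame:", round(float(score), 3))
```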

9.
Microarray technology has been widely adopted by researchers using both home-made microarrays and microarrays purchased from commercial vendors. The adoption of this technology has brought a deluge of complex data, both from the microarrays themselves and in the form of associated metadata, such as gene annotation, the properties and treatment of biological samples, and the data transformation and analysis steps taken downstream. In addition, standards for annotation and data exchange have been proposed and are now being adopted by journals and funding agencies alike. The coupling of large quantities of complex data with extensive and complex standards requires all but the smallest-scale microarray users to have access to a robust and scalable database with various tools. In this review, we discuss some of the desirable properties of such a database and survey the features of several freely available alternatives.

10.
Data analysis and management represent a major challenge for gene expression studies using microarrays. Here, we compare different methods of analysis and demonstrate the utility of a personal microarray database. Gene expression during HIV infection of cell lines was studied using Affymetrix U133 A and B chips. The data were analyzed using Affymetrix Microarray Suite and Data Mining Tool, Silicon Genetics GeneSpring, and dChip from the Harvard School of Public Health. A small-scale database was established with FileMaker Pro Developer to manage and analyze the data. There was great variability among the programs in the lists of significantly changed genes constructed from the same data. Similarly, choices of different parameters for normalization, comparison, and standardization greatly affected the outcome. Because many probe sets on the U133 chip target the same UniGene clusters, the UniGene information can be used as an internal control to confirm and interpret the probe-set results. Algorithms used for determining changes in gene expression require further refinement and standardization. The use of a personal database enriched with UniGene information can enhance the analysis of gene expression data.
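The UniGene internal-control idea can be sketched as a within-cluster concordance check. Column names and values below are hypothetical, not the authors' database schema.

```python
# Probe sets mapping to the same UniGene cluster should agree in the
# direction of change; low within-cluster concordance flags shaky calls.
import pandas as pd

df = pd.DataFrame({
    "probe_set": ["a1", "a2", "b1", "b2", "b3"],
    "unigene":   ["Hs.1", "Hs.1", "Hs.2", "Hs.2", "Hs.2"],
    "log2_fc":   [1.2, 0.9, -0.8, -1.1, 0.7],
})

def concordance(fc):
    frac_up = (fc > 0).mean()          # fraction of probe sets going up
    return max(frac_up, 1 - frac_up)   # agreement on majority direction

summary = df.groupby("unigene")["log2_fc"].apply(concordance)
print(summary)   # Hs.1 fully concordant; Hs.2 has a dissenting probe set
```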

11.
MOTIVATION: Most supervised classification methods are limited by the requirement for more cases than variables. In microarray data the number of variables (genes) far exceeds the number of cases (arrays), so filtering and pre-selection of genes is normally required. We describe the application of Between Group Analysis (BGA) to microarray data. A feature of BGA is that it can be used when the number of variables (genes) exceeds the number of cases (arrays). BGA is based on carrying out an ordination of groups of samples, using a standard method such as Correspondence Analysis (COA), rather than an ordination of the individual microarray samples; as such, it can be viewed as a method of carrying out COA on grouped data. RESULTS: We illustrate the power of the method using two cancer data sets. In both cases, we can quickly and accurately classify test samples from any number of a priori specified groups and identify the genes that characterize these groups. We obtained very high rates of correct classification, as determined by jack-knife or validation experiments with training and test sets. The results are comparable to those from other methods in terms of accuracy, but the power and flexibility of BGA make it an especially attractive method for the analysis of microarray cancer data.
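As a rough illustration of the BGA idea, the ordination can be fitted to the handful of group centroids rather than to the individual arrays, which is what makes it usable when genes far outnumber arrays. The sketch below substitutes plain SVD on centroids for correspondence analysis, so it is a simplification of the method, run on synthetic data.

```python
# BGA-like classification: ordinate group centroids, then assign a test
# sample to the nearest centroid in the reduced space.
import numpy as np

rng = np.random.default_rng(3)
genes, per_group = 2000, 10
# three a priori groups with shifted expression (toy data)
groups = [rng.normal(loc=g, size=(per_group, genes)) for g in range(3)]

means = np.vstack([X.mean(axis=0) for X in groups])   # 3 x genes centroids
grand = means.mean(axis=0)
# Axes come from the (few) centroids, so genes >> arrays is no problem.
U, S, Vt = np.linalg.svd(means - grand, full_matrices=False)
axes = Vt[:2]                                         # 2 discriminating axes

proj_means = (means - grand) @ axes.T
test = rng.normal(loc=1, size=genes)                  # sample from group 1
proj_test = (test - grand) @ axes.T
pred = np.argmin(((proj_means - proj_test) ** 2).sum(axis=1))
print("predicted group:", pred)                       # expect 1
```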

12.
Normalization of cDNA microarray data (cited: 43; self: 0; others: 43)
Normalization means adjusting microarray data for effects that arise from variation in the technology rather than from biological differences between the RNA samples or between the printed probes. This paper describes normalization methods based on the fact that dye balance typically varies with spot intensity and with spatial position on the array. Print-tip loess normalization provides a well-tested, general-purpose normalization method that has given good results on a wide range of arrays. The method may be refined by using quality weights for individual spots, and is best combined with diagnostic plots of the data that display the spatial and intensity trends. When diagnostic plots show that biases remain in the data after normalization, further normalization steps, such as plate-order normalization or scale normalization between arrays, may be undertaken. Composite normalization may be used when control spots are available that are known not to be differentially expressed. Variations on loess normalization include global loess normalization and two-dimensional normalization. Detailed commands are given for implementing the normalization techniques using freely available software.
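A minimal sketch of the loess idea on an MA-plot follows. It shows global loess only (the paper's print-tip loess repeats the fit within each print-tip group) and is not the paper's own code; the simulated bias curve is an assumption.

```python
# Global loess normalization: fit a smooth curve of M = log2(R/G) against
# A = mean log-intensity, then subtract it to remove the dye bias.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(4)
A = rng.uniform(6, 14, 3000)                        # mean log2 intensity
M = 0.3 * np.sin(A / 2) + rng.normal(0, 0.2, 3000)  # dye bias + noise

trend = lowess(M, A, frac=0.3, return_sorted=False) # fitted bias at each A
M_norm = M - trend                                  # normalized log-ratios

print("mean M before:", round(float(M.mean()), 3),
      "after:", round(float(M_norm.mean()), 3))
```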

13.
14.
Computational analysis of microarray data (cited: 1; self: 0; others: 1)
Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although the technique has been enthusiastically developed and applied in many biological contexts, the management and analysis of the millions of data points that result from these experiments have received less attention. Sophisticated computational tools are available, but the methods used to analyse the data can have a profound influence on the interpretation of the results. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis.

15.
16.

17.
Simulated annealing approach to the study of protein structures (cited: 1; self: 0; others: 1)
One of the most difficult problems in predicting the three-dimensional structure of proteins is how to deal with the local-minimum problem. In many cases of practical interest this reduces to selecting an appropriate set of starting conformations for energy minimization. How these starting conformations are selected, however, is often based on the physical intuition of the person doing the calculations, and so some degree of arbitrariness is hard to avoid. To improve this situation, we introduced the simulated annealing Monte Carlo algorithm to locate optimal starting conformations for energy minimization. The method developed here is valid for both single- and multiple-polypeptide-chain systems. The annealing process can be conducted with respect to either the internal dihedral angles of a polypeptide chain or the external rotations and translations of the constituent polypeptide chains, and is hence particularly useful for studying the packing arrangements of secondary structures in proteins, such as helix/helix, helix/sheet and sheet/sheet packing. A number of comparative calculations showed that the final structures obtained through the annealing process not only had lower energies than the corresponding energy-minimized structures reported previously, but also assumed forms closer to those observed in proteins. All these results indicate that better low-energy structures of proteins can be found by incorporating the simulated annealing approach. (ABSTRACT TRUNCATED AT 250 WORDS)
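The dihedral-angle annealing loop can be sketched with a made-up two-variable "energy" standing in for a real force field; the toy landscape and cooling schedule below are assumptions, not the paper's potential function.

```python
# Simulated annealing over two torsion-like angles of a rugged toy
# landscape: high temperature lets the search escape local minima before
# conventional minimization would be applied to the final conformation.
import math
import random

random.seed(5)

def energy(phi, psi):                       # toy rugged landscape
    return math.sin(3 * phi) * math.cos(2 * psi) + 0.1 * (phi ** 2 + psi ** 2)

phi, psi = 2.5, -2.5
E = energy(phi, psi)
T = 2.0
while T > 1e-3:
    p = phi + random.gauss(0, 0.3)          # propose a random angle move
    q = psi + random.gauss(0, 0.3)
    Enew = energy(p, q)
    if Enew < E or random.random() < math.exp(-(Enew - E) / T):
        phi, psi, E = p, q, Enew            # Metropolis acceptance
    T *= 0.999                              # geometric cooling schedule

print(f"annealed minimum: E={E:.3f} at phi={phi:.2f}, psi={psi:.2f}")
```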

18.
Transformation and normalization of oligonucleotide microarray data (cited: 3; self: 0; others: 3)
MOTIVATION: Most methods for analyzing microarray data or performing power calculations assume constant variance across all levels of gene expression. The most common transformation, the logarithm, yields data with constant variance at high expression levels but not at low levels. Rocke and Durbin showed that data from spotted arrays fit a two-component model, and Durbin, Hardin, Hawkins and Rocke; Huber et al.; and Munson provided a transformation that stabilizes the variance as well as symmetrizing and normalizing the error structure. We evaluate the applicability of this transformation to the error structure of Affymetrix GeneChip microarrays. RESULTS: We demonstrate in an example study a simple way to use the two-component model of Rocke and Durbin and the data transformation of Durbin, Hardin, Hawkins and Rocke, Huber et al. and Munson on Affymetrix GeneChip data. In addition, we provide a method for normalizing Affymetrix GeneChips simultaneously with the determination of the transformation, producing a data set without chip or slide effects but with constant variance and symmetric errors. This transformation/normalization process can be thought of as a machine calibration, in that it requires a few biologically constant replicates of one sample to determine the constant that specifies the transformation and normalization. It is hypothesized that this constant need be found only once for a given technology in a lab, perhaps with periodic updates; extensive replication is not required in each study. Furthermore, the variance of the transformed pilot data can be used for power calculations with standard power-analysis programs. AVAILABILITY: S-PLUS code for the transformation/normalization for four replicates is available from the first author upon request. A program written in C is available from the last author.
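The variance-stabilizing transformation in this line of work is commonly written as the generalized log, glog(x) = ln(x + sqrt(x^2 + c)). The sketch below demonstrates the stabilization on simulated two-component data; the direct calibration of c from the two noise parameters is a toy shortcut, not the paper's estimation procedure.

```python
# glog behaves like ln(2x) for large x and stays linear near zero, so the
# variance of transformed intensities is roughly constant at all levels.
import numpy as np

def glog(x, c):
    return np.log(x + np.sqrt(x ** 2 + c))   # defined for all real x

rng = np.random.default_rng(6)
sd_add, sd_mult = 40.0, 0.15                  # two-component noise levels
c = (sd_add / sd_mult) ** 2                   # toy calibration of the constant

for mu in (50.0, 5000.0):                     # low- and high-expressed gene
    # replicates: multiplicative error at high signal, additive at low
    reps = mu * np.exp(rng.normal(0, sd_mult, 5000)) \
           + rng.normal(0, sd_add, 5000)
    print(f"mu={mu:>6}: sd of glog replicates = {glog(reps, c).std():.3f}")
# both standard deviations come out near sd_mult: variance stabilized
```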

19.
20.

Background  

Many researchers are concerned with the comparability and reliability of microarray gene expression data. The recent completion of the MicroArray Quality Control (MAQC) project provides a unique opportunity to assess reproducibility across multiple sites and comparability across multiple platforms. The analysis the MAQC project presented in support of its conclusion of inter- and intra-platform comparability/reproducibility of microarray gene expression measurements is inadequate. We evaluate the reproducibility/comparability of the MAQC data for 12,901 common genes in four titration samples generated from five high-density one-color microarray platforms and the TaqMan technology. We discuss some of the problems with using the correlation coefficient as a metric for inter- and intra-platform reproducibility, and with the percent of overlapping genes (POG) as a measure for evaluating the MAQC gene selection procedure.
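Both criticized metrics are easy to compute, and a toy simulation (hypothetical data, not the MAQC data) shows why a high cross-platform correlation can coexist with a modest POG: shared between-gene variation inflates the correlation even when platform-specific noise scrambles the top-ranked lists.

```python
# Correlation vs. percent of overlapping genes (POG) on simulated
# two-platform measurements of the same 12,901 genes.
import numpy as np

rng = np.random.default_rng(7)
true = rng.normal(0, 2, 12901)                 # shared between-gene signal
plat1 = true + rng.normal(0, 1, true.size)     # platform 1 measurement
plat2 = true + rng.normal(0, 1, true.size)     # platform 2 measurement

r = np.corrcoef(plat1, plat2)[0, 1]
print(f"cross-platform correlation: {r:.3f}")  # high, driven by `true`

k = 500                                        # top-k "selected" genes
top1 = set(np.argsort(-np.abs(plat1))[:k])
top2 = set(np.argsort(-np.abs(plat2))[:k])
print(f"POG of top-{k} lists: {100 * len(top1 & top2) / k:.1f}%")
```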
