Similar Literature
20 similar documents retrieved
1.
The conceptual simplicity of DNA microarray technology often belies the complex nature of the measurement errors inherent in the methodology. As the technology has developed, the importance of understanding the sources of uncertainty in the measurements and developing ways to control their influence on the conclusions drawn has become apparent. In this review, strategies for modeling measurement errors and minimizing their effect on the outcome of experiments using a variety of techniques are discussed in the context of spotted, dual-color microarrays. First, methods designed to reduce the influence of random variability through data filtering, replication, and experimental design are introduced. This is followed by a review of data analysis methods that partition the variance into random effects and one or more systematic effects, specifically two-sample significance testing and analysis of variance (ANOVA) methods. Finally, the current state of measurement error models for spotted microarrays and their role in variance stabilizing transformations are discussed.
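As an illustration of the variance-stabilizing transformations mentioned in this review, the sketch below applies a generalized-log (glog) transform to simulated two-color spot intensities; the noise levels and the transform constant `c` are illustrative assumptions, not values taken from the article.

```python
import numpy as np

def glog(x, c):
    """Generalized-log transform: behaves like log2(x) for large x,
    but stays nearly linear (and finite) near zero, which stabilizes
    the variance of low-intensity spots."""
    return np.log2((x + np.sqrt(x**2 + c**2)) / 2.0)

# Toy red/green spot intensities with multiplicative + additive noise (assumed model)
rng = np.random.default_rng(0)
true_signal = rng.uniform(50, 5000, size=1000)
red   = true_signal * rng.lognormal(0, 0.1, 1000) + rng.normal(0, 30, 1000)
green = true_signal * rng.lognormal(0, 0.1, 1000) + rng.normal(0, 30, 1000)

# Plain log-ratios blow up at low intensity; glog-ratios are better behaved
log_ratio  = np.log2(np.clip(red, 1, None) / np.clip(green, 1, None))
glog_ratio = glog(red, c=90.0) - glog(green, c=90.0)

low = true_signal < 200
print("SD of log-ratio  (low-intensity spots):", log_ratio[low].std().round(3))
print("SD of glog-ratio (low-intensity spots):", glog_ratio[low].std().round(3))
```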

2.
Do JH, Choi DK. Molecules and Cells, 2006, 22(3): 254-261
DNA microarray is a powerful tool for high-throughput analysis of biological systems. Various computational tools have been created to facilitate the analysis of the large volume of data produced in DNA microarray experiments. Normalization is a critical step for obtaining data that are reliable and usable for subsequent analyses such as identification of differentially expressed genes and clustering. A variety of normalization methods have been proposed over the past few years, but none of them is perfect. Various assumptions are made in the process of normalization, so knowledge of the underlying assumptions and principles of normalization is helpful for correct analysis of microarray data. We present a review of normalization techniques, from single-labeled platforms such as the Affymetrix GeneChip array to dual-labeled platforms such as spotted arrays, focusing on their principles and assumptions.
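A minimal sketch of one widely used single-channel normalization method discussed in such reviews, quantile normalization; the genes × arrays matrix below is invented for illustration, and the implementation ignores tie-handling refinements.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile normalization: force every array (column) to share the same
    empirical intensity distribution.  Assumes most genes are not
    differentially expressed across arrays."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)     # rank of each value per column
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)      # reference distribution
    return mean_quantiles[ranks]

# Toy data: 5 genes x 3 arrays with different overall intensity scales
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0],
              [1.0, 3.0, 1.0]])
print(quantile_normalize(X))
```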

3.
The uropathogenic Escherichia coli strain UPEC CFT073 was completely sequenced and annotated in 2002. Study of its genome nevertheless remains far from complete, reflected first of all in systematic errors and outdatedness of the genome annotation. Using a series of bioinformatic methods and tools, the authors systematically corrected and supplemented the genome annotation in the RefSeq database with respect to protein-coding genes, RNA-coding genes and other features, and on this basis identified a set of new candidate virulence-factor genes. Further analysis showed that the resulting genome annotation gives a more accurate and complete description of several important regulatory relationships and mechanisms relevant to CFT073 pathogenicity.

4.
Several systematic errors may occur during the analysis of uninhibited enzyme kinetic data using commercially available multiwell plate reader software. A MATLAB program is developed to remove these systematic errors from the data analysis process for a single substrate-enzyme system conforming to Michaelis-Menten kinetics. Three experimental designs that may be used to validate a new enzyme preparation or assay methodology and to characterize an enzyme-substrate system, while capitalizing on the ability of multiwell plate readers to perform multiple reactions simultaneously, are also proposed. These experimental designs are used to (i) test for enzyme inactivation and the quality of data obtained from an enzyme assay using Selwyn's test, (ii) calculate the limit of detection of the enzyme assay, and (iii) calculate Km and Vm values. If replicates that reflect the overall error in performing a measurement are used, the latter two experiments may be performed with internal estimation of the error structure. The need to correct for the systematic errors discussed and the utility of the proposed experimental designs were confirmed by numerical simulation. The proposed experiments were conducted using recombinant inducible nitric oxide synthase preparations and the oxyhemoglobin assay.
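The Km/Vm estimation step can be illustrated with a nonlinear least-squares fit of the Michaelis-Menten equation. This is a generic Python sketch, not the MATLAB program described above, and the substrate concentrations and rates are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(S, Vm, Km):
    """Initial rate v = Vm * S / (Km + S) for a single-substrate enzyme."""
    return Vm * S / (Km + S)

# Hypothetical substrate concentrations (µM) and measured initial rates
S = np.array([1, 2, 5, 10, 20, 50, 100, 200], dtype=float)
v = np.array([0.9, 1.7, 3.4, 5.1, 6.9, 8.6, 9.2, 9.6])

# Nonlinear least-squares fit; p0 gives rough starting guesses
(Vm_hat, Km_hat), cov = curve_fit(michaelis_menten, S, v, p0=[v.max(), np.median(S)])
Vm_se, Km_se = np.sqrt(np.diag(cov))
print(f"Vm = {Vm_hat:.2f} ± {Vm_se:.2f}")
print(f"Km = {Km_hat:.1f} ± {Km_se:.1f} µM")
```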

5.
Biophotonics techniques, especially those involving fluorescence, are widely used in proteomics to characterize the in vitro interactions between proteins in high-throughput mode. On the other hand, fluorescence-based imaging studies often show that protein activity is regulated through large protein complexes that transiently form at specific sites in the cell. One could therefore argue that a systematic functional analysis of the human proteome requires technologies that are capable of time and spatially resolved, multiplexed analysis of protein interactions within cells.

6.
Large scale cell biological experiments are beginning to be applied as a systems-level approach to decipher mechanisms that govern cellular function in health and disease. The use of automated microscopes combined with digital imaging, machine learning and other analytical tools has enabled high-content screening (HCS) in a variety of experimental systems. Successful HCS screens demand careful attention to assay development, data acquisition methods and available genomic tools. In this minireview, we highlight developments in this field pertaining to yeast cell biology and discuss how we have combined HCS with methods for automated yeast genetics (synthetic genetic array (SGA) analysis) to enable systematic analysis of cell biological phenotypes in a variety of genetic backgrounds.

7.
Roark DE. Biophysical Chemistry, 2004, 108(1-3): 121-126
Biophysical chemistry experiments, such as sedimentation-equilibrium analyses, require computational techniques to reduce the effects of random errors of the measurement process. Existing approaches have primarily relied on the assumption of polynomial models and least-squares approximation. By constraining the data to remove random fluctuations, such models may distort the data and cause loss of information: the better the removal of random errors, the greater the likelihood that the constraining fit itself introduces systematic errors. An alternative technique, reverse smoothing, is suggested that makes use of a more model-free approach of exponential smoothing of the first derivative. Exponential smoothing approaches have generally been unsatisfactory because they introduce significant data lag. The approach given here compensates for the lag defect and appears promising for the smoothing of many experimental data sequences, including the macromolecular concentration data generated by sedimentation-equilibrium experiments. Test results on simulated sedimentation-equilibrium data indicate that a 4-fold reduction in error may be typical over standard analysis techniques.
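The abstract does not spell out the lag-compensation scheme, so the sketch below illustrates one simple model-free possibility, forward-backward (zero-phase) exponential smoothing of a noisy first derivative; the simulated concentration profile and the smoothing constant are assumptions, not the author's exact method.

```python
import numpy as np

def ema(y, alpha):
    """Ordinary exponential smoothing; each point lags behind the signal."""
    s = np.empty_like(y)
    s[0] = y[0]
    for i in range(1, len(y)):
        s[i] = alpha * y[i] + (1 - alpha) * s[i - 1]
    return s

def zero_lag_ema(y, alpha):
    """Forward-backward exponential smoothing: averaging a forward and a
    backward pass cancels the phase lag of a single forward pass."""
    return 0.5 * (ema(y, alpha) + ema(y[::-1], alpha)[::-1])

# Toy sedimentation-equilibrium concentration profile c(r) plus noise
r = np.linspace(6.9, 7.2, 400)                     # radial position (cm)
c_true = 0.2 * np.exp(1.5 * (r**2 - r[0]**2))      # assumed equilibrium gradient
c_noisy = c_true + np.random.default_rng(1).normal(0, 0.5, r.size)

dc_dr = np.gradient(c_noisy, r)                    # noisy first derivative
dc_dr_smooth = zero_lag_ema(dc_dr, alpha=0.1)      # smoothed, lag-compensated

true_deriv = np.gradient(c_true, r)
print("SD of derivative error, raw:     ", np.std(dc_dr - true_deriv).round(1))
print("SD of derivative error, smoothed:", np.std(dc_dr_smooth - true_deriv).round(1))
```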

8.
Incorporating DEM Uncertainty in Coastal Inundation Mapping
Coastal managers require reliable spatial data on the extent and timing of potential coastal inundation, particularly in a changing climate. Most sea level rise (SLR) vulnerability assessments are undertaken using the easily implemented bathtub approach, where areas adjacent to the sea and below a given elevation are mapped using a deterministic line dividing potentially inundated from dry areas. This method only requires elevation data, usually in the form of a digital elevation model (DEM). However, inherent errors in the DEM and in the spatial analysis of the bathtub model propagate into the inundation mapping. The aim of this study was to assess the impacts of spatially variable and spatially correlated elevation errors in high-spatial-resolution DEMs for mapping coastal inundation. Elevation errors were best modelled using regression-kriging. This geostatistical model takes the spatial correlation in elevation errors into account, which has a significant impact on analyses that include spatial interactions, such as inundation modelling. The spatial variability of elevation errors was partially explained by land cover and terrain variables. Elevation errors were simulated using sequential Gaussian simulation, a Monte Carlo probabilistic approach. 1,000 error simulations were added to the original DEM and reclassified using a hydrologically correct bathtub method. The probability of inundation under a scenario combining a 1-in-100-year storm event with a 1 m SLR was calculated by counting the proportion of times out of the 1,000 simulations that a location was inundated. This probabilistic approach can be used in a risk-averse decision-making process by planning for scenarios with different probabilities of occurrence. For example, results showed that when considering a 1% exceedance probability, the inundated area was approximately 11% larger than that mapped using the deterministic bathtub approach. The probabilistic approach provides visually intuitive maps that convey the uncertainties inherent to spatial data and analysis.
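A stripped-down sketch of the probabilistic mapping idea: add simulated elevation-error fields to a DEM many times and count how often each cell falls below the flood level. Spatial correlation is mimicked here with a Gaussian filter instead of regression-kriging/sequential Gaussian simulation, hydrological connectivity to the sea is omitted, and the DEM, error SD and flood level are invented.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(42)

# Toy 100 x 100 coastal DEM (metres): a gentle slope rising away from the sea
dem = np.tile(np.linspace(0, 3, 100), (100, 1))

flood_level = 1.0     # e.g. 1 m SLR plus a storm-surge height (illustrative)
error_sd    = 0.15    # assumed vertical error SD of the DEM
n_sim       = 1000

inundated_count = np.zeros_like(dem)
for _ in range(n_sim):
    # Spatially correlated error field (a stand-in for sequential Gaussian simulation)
    noise = gaussian_filter(rng.normal(0, 1, dem.shape), sigma=5)
    noise *= error_sd / noise.std()
    inundated_count += (dem + noise) <= flood_level

prob_inundation = inundated_count / n_sim           # per-cell inundation probability
deterministic   = dem <= flood_level                # plain bathtub result
print("Deterministic inundated fraction:        ", deterministic.mean())
print("Fraction with >1% inundation probability:", (prob_inundation > 0.01).mean())
```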

9.
The process of identifying active targets (hits) in high-throughput screening (HTS) usually involves 2 steps: first, removing or adjusting for systematic variation in the measurement process so that extreme values represent strong biological activity instead of systematic biases such as plate effect or edge effect and, second, choosing a meaningful cutoff on the calculated statistic to declare positive compounds. Both false-positive and false-negative errors are inevitable in this process. Common control or estimation of error rates is often based on an assumption of normal distribution of the noise. The error rates in hit detection, especially false-negative rates, are hard to verify because in most assays, only compounds selected in primary screening are followed up in confirmation experiments. In this article, the authors take advantage of a quantitative HTS experiment in which all compounds are tested 42 times over a wide range of 14 concentrations, so true positives can be found through a dose-response curve. Using the activity status defined by the dose curve, the authors analyzed the effect of various data-processing procedures on the sensitivity and specificity of hit detection, the control of error rate, and hit confirmation. A new summary score is proposed and demonstrated to perform well in hit detection and to be useful in confirmation rate estimation. In general, adjusting for positional effects is beneficial, but a robust test can prevent overadjustment. Error rates estimated under the normal assumption do not agree with actual error rates, because the tails of the noise distribution deviate from normality. However, the false discovery rate based on an empirically estimated null distribution is very close to the observed false discovery proportion.
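A sketch of two ingredients touched on above: a robust (median/MAD) per-plate score that adjusts for systematic effects without being dominated by heavy tails, and a false-discovery estimate built from an empirical null rather than a normal assumption. The plate data and the use of a compound-free plate as the null are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def robust_z(plate):
    """Median/MAD z-score within a plate: resistant to the heavy tails that
    make normal-theory error rates unreliable."""
    med = np.median(plate)
    mad = np.median(np.abs(plate - med)) * 1.4826
    return (plate - med) / mad

rng = np.random.default_rng(7)
plate = rng.standard_t(df=4, size=(16, 24))      # heavy-tailed assay noise (assumed)
plate[3, 5] -= 8.0                               # one strong inhibitor ("hit")
scores = robust_z(plate).ravel()

# Empirical false-discovery estimate at a cutoff, using wells known to
# contain no active compound to build the null distribution
null_scores = robust_z(rng.standard_t(df=4, size=(16, 24))).ravel()
cutoff = -3.0
fdr = (null_scores <= cutoff).mean() / max((scores <= cutoff).mean(), 1e-12)
print("Wells called active:", int((scores <= cutoff).sum()),
      " estimated FDR:", round(fdr, 3))
```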

10.
Studies of the Receiver Operating Characteristic (ROC) for taste are reviewed and new data on its shape are presented. What evidence there is suggests that ROCs for taste conform to the normal-normal equal variance model of signal detection theory. Few ROCs for taste have been reported, probably because the large number of trials required by detection theory makes the task arduous for subjects in taste experiments. However, pooling ratings from several subjects and estimating the parameters of the pooled ROC by jackknife techniques circumvents that problem to some extent. Because experiments on taste are often based on a small number of trials, it is especially useful to determine the standard errors of ROC parameters. Methods for estimating these standard errors, including that of the area measure, p(A), are therefore presented.
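Under the equal-variance normal model referred to above, the ROC area follows directly from d′, and jackknifing over subjects gives a standard error; the hit and false-alarm rates below are invented for illustration, and this is a generic sketch rather than the authors' procedure.

```python
import numpy as np
from scipy.stats import norm

def roc_area_equal_variance(hit_rate, false_alarm_rate):
    """Equal-variance normal signal-detection model:
    d' = z(H) - z(F), and the ROC area is p(A) = Phi(d'/sqrt(2))."""
    d_prime = norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)
    return d_prime, norm.cdf(d_prime / np.sqrt(2))

# Hypothetical pooled taste-detection data: 80% hits, 30% false alarms
d, pA = roc_area_equal_variance(0.80, 0.30)
print(f"d' = {d:.2f}, area p(A) = {pA:.2f}")

# Jackknife standard error of p(A) over subjects (per-subject rates are invented)
subjects = np.array([[0.85, 0.25], [0.75, 0.35], [0.80, 0.30],
                     [0.78, 0.28], [0.82, 0.33]])

def pooled_pA(rows):
    h, f = rows.mean(axis=0)
    return roc_area_equal_variance(h, f)[1]

n = len(subjects)
loo = np.array([pooled_pA(np.delete(subjects, i, axis=0)) for i in range(n)])
se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
print(f"jackknife SE of p(A) = {se:.3f}")
```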

11.
Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all-per-read alignments to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers.

12.
MOTIVATION: Normalization of microarray data is essential for multiple-array analyses. Several normalization protocols have been proposed based on different biological or statistical assumptions. A fundamental question is whether they have effectively normalized the arrays. In addition, for a given array, the question arises of how to choose the method that most effectively normalizes the microarray data. RESULTS: We propose several techniques to compare the effectiveness of different normalization methods. We approach the problem by constructing statistics to test whether there are any systematic biases in the expression profiles among duplicated spots within an array. The test statistics involve estimating the genewise variances. This is accomplished by using several novel methods, including empirical Bayes methods for moderating the genewise variances and smoothing methods for aggregating variance information. P-values are estimated based on a normal or chi approximation. With estimated P-values, we can choose the most appropriate method to normalize a specific array and assess the extent to which the systematic biases due to variations in experimental conditions have been removed. The effectiveness and validity of the proposed methods are convincingly illustrated by a carefully designed simulation study. The method is further illustrated by applications to human placenta cDNAs comprising a large number of clones with replications, a customized microarray experiment carrying just a few hundred genes for the study of the molecular roles of interferons in tumors, and the Agilent microarrays carrying tens of thousands of total RNA samples in the MAQC project on the study of reproducibility, sensitivity and specificity of the data. AVAILABILITY: Code to implement the method in the statistical package R is available from the authors.

13.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and how this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results; centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes do not differentially express. Methods based on spatial location and/or intensity also require that the non-differentially expressing genes are distributed at random with respect to location and intensity. Spotting designs should be carefully done so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both the efficiency and the validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
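The variance-reduction argument can be made concrete with a toy two-component variance model; the component sizes and allocations below are illustrative assumptions, not the article's detailed model.

```python
# Variance of an estimated group mean under a simple two-component model:
# biological variance sigma_b^2 and technical (array) variance sigma_t^2.
# The values are illustrative only.
sigma_b2, sigma_t2 = 1.0, 0.25

def var_of_mean(n_arrays, samples_per_pool=1, tech_reps_per_sample=1):
    """Each array measures a pool of `samples_per_pool` independent samples;
    `tech_reps_per_sample` arrays are hybridized per (pooled) sample."""
    n_biol_units = n_arrays // tech_reps_per_sample
    per_unit_var = sigma_b2 / samples_per_pool + sigma_t2 / tech_reps_per_sample
    return per_unit_var / n_biol_units

print("6 arrays, 1 sample each:          ", var_of_mean(6))
print("6 arrays, pools of 3 samples each:", var_of_mean(6, samples_per_pool=3))
print("6 arrays, 3 samples x 2 tech reps:", var_of_mean(6, tech_reps_per_sample=2))
```

With these assumed components, pooling lowers the variance of the group-mean estimate while technical replication of whole arrays raises it, in line with the qualitative conclusions above.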

14.
A number of marine populations exhibit diurnal variations in their behavioral pattern, and this phenomenon has been studied by several authors looking at a variety of species. But to our knowledge, fully adequate statistical tools have not been used in a comprehensive and systematic way. It is the goal of this article to bring forward relevant statistical techniques and to demonstrate how they can be used. Both parametric and nonparametric methods are employed, and we concentrate on such basic statistical issues as testing for the presence of diurnal variations using a nonparametric test and on estimating and testing the shape of diurnal oscillations. We indicate how this can be used to examine the effect of light on diurnal behavior. Our methods are illustrated using data from bottom trawl catches of cod (Gadus morhua) collected during winter surveys in the Barents Sea in the period 1985-1999.
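One common parametric way to test for and describe diurnal variation, not necessarily the authors' exact procedure, is harmonic (cosinor) regression of log catch on 24-hour sine and cosine terms; the haul times and catches below are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
hour = rng.uniform(0, 24, 300)                      # time of day of each trawl haul
log_catch = 2.0 + 0.6 * np.cos(2 * np.pi * (hour - 14) / 24) + rng.normal(0, 0.5, 300)

# Harmonic (cosinor) regression: log-catch ~ cos + sin terms with a 24 h period
X = sm.add_constant(np.column_stack([np.cos(2 * np.pi * hour / 24),
                                     np.sin(2 * np.pi * hour / 24)]))
fit = sm.OLS(log_catch, X).fit()

# Joint F-test of the two harmonic coefficients = test for diurnal variation
print(fit.f_test("x1 = 0, x2 = 0"))

amplitude = np.hypot(fit.params[1], fit.params[2])
peak_hour = (np.arctan2(fit.params[2], fit.params[1]) * 24 / (2 * np.pi)) % 24
print(f"estimated amplitude {amplitude:.2f}, peak at ~{peak_hour:.1f} h")
```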

15.
Chinese cabbage is an important vegetable crop, and accurate identification of Chinese cabbage cultivars is of great significance for germplasm resource management, new-variety testing and seed quality testing. In this study, 30 markers that are evenly distributed across the linkage groups, amplify stably by PCR and give simple banding patterns were selected from 205 SSR markers previously mapped to the 10 linkage groups of Chinese cabbage, for DNA fingerprinting of Chinese cabbage cultivars. The selected primers were labelled with four fluorescent dyes, and the SSR amplification products were detected on DNA analyzers based on capillary electrophoresis with fluorescence detection. Comparison of fragment-size data from three DNA analyzers of two different models showed that clear systematic errors generally exist between data obtained on different DNA analyzers. The magnitude of the systematic error depends on the primer and is generally between 1 and 4 bp. The different alleles at each SSR locus were named by the fragment length of the amplification product. By using a set of reference cultivars, systematic errors between different runs and different DNA analyzer models were eliminated, ensuring the repeatability and reproducibility of the data. On this basis, DNA molecular data were collected for 184 Chinese cabbage cultivars.

16.
Here we present a methodology for the normalization of element signal intensities to a mean intensity calculated locally across the surface of a DNA microarray. These methods allow the detection and/or correction of spatially systematic artifacts in microarray data. These include artifacts that can be introduced during the robotic printing, hybridization, washing, or imaging of microarrays. Using array element signal intensities alone, this local mean normalization process can correct for such artifacts because they vary across the surface of the array. The local mean normalization can be used for quality control and data correction purposes in the analysis of microarray data. These algorithms assume that array elements are not spatially ordered with regard to sequence or biological function and require that this spatial mapping is identical between the two sets of intensities to be compared. The tool described in this report was developed in the R statistical language and is freely available on the Internet as part of a larger gene expression analysis package. This Web implementation is interactive and user-friendly and allows the easy use of the local mean normalization tool described here, without programming expertise or downloading of additional software.
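A sketch of the local mean normalization idea (not the authors' R implementation): divide each spot by the mean intensity in a local window so that spatially smooth artifacts cancel. The grid size, window width and artificial gradient below are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_mean_normalize(intensity_grid, window=15):
    """Divide each spot by the mean intensity of a local (window x window)
    neighbourhood on the array, then rescale to the global mean.  Assumes
    genes are spotted in an order unrelated to sequence or function."""
    local_mean = uniform_filter(intensity_grid, size=window, mode="nearest")
    return intensity_grid / local_mean * intensity_grid.mean()

# Toy 64 x 64 spot grid with a smooth spatial artifact (e.g. a washing gradient)
rng = np.random.default_rng(0)
spots = rng.lognormal(7, 1, (64, 64))
gradient = np.linspace(0.5, 1.5, 64)[:, None]    # top of slide dimmer than bottom
observed = spots * gradient

corrected = local_mean_normalize(observed)
print("row means before:", observed.mean(axis=1)[[0, 63]].round(0))
print("row means after :", corrected.mean(axis=1)[[0, 63]].round(0))
```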

17.
Acquisition of microarray data is prone to systematic errors. A correction, called normalisation, must be applied to the data before further analysis is performed. With many normalisation techniques published and in use, the best way of executing this correction remains an open question. In this study, a variety of single-slide normalisation techniques, and different parameter settings for these techniques, were compared over many replicated microarray experiments. Different normalisation techniques were assessed through the distribution of the standard deviation of replicates from one biological sample across different slides. It is shown that local normalisation outperformed global normalisation, and intensity-based 'LOWESS' outperformed trimmed mean and median normalisation techniques. Overall, the top performing normalisation technique was a print-tip-based LOWESS with zero robust iterations. Lastly, we validated this evaluation methodology by examining the ability to predict oestrogen receptor-positive and -negative breast cancer samples with data that had been normalised using different techniques.
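A sketch of print-tip LOWESS normalization with zero robust iterations, together with the evaluation idea of comparing the spread of replicate log-ratios; the spot intensities, tip assignments and bias model are simulated, and the smoothing fraction is an assumption.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def printtip_lowess(M, A, tip, frac=0.4):
    """Within each print-tip group, subtract a LOWESS fit of M (log-ratio)
    on A (average log-intensity) so that the normalised M is centred on
    zero at every intensity.  it=0 means zero robust iterations."""
    M_norm = np.empty_like(M)
    for t in np.unique(tip):
        idx = tip == t
        fit = lowess(M[idx], A[idx], frac=frac, it=0, return_sorted=False)
        M_norm[idx] = M[idx] - fit
    return M_norm

rng = np.random.default_rng(5)
n, tips = 4000, 16
A   = rng.uniform(6, 14, n)
tip = rng.integers(0, tips, n)
# Intensity- and tip-dependent dye bias on top of true log-ratios near zero
M = 0.3 * (A - 10) / 4 + 0.1 * (tip - tips / 2) / tips + rng.normal(0, 0.2, n)

M_norm = printtip_lowess(M, A, tip)
# Evaluation in the spirit of the study: a smaller spread of log-ratios from
# the same biological sample indicates a better normalisation
print("SD before:", M.std().round(3), " SD after:", M_norm.std().round(3))
```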

18.
Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
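As a stand-in for step (2) of the work-flow, the sketch below checks whether simulated log-ratios are skewed using an ordinary sample-skewness test rather than the authors' DSE-test; the mixture proportions and effect sizes are invented.

```python
import numpy as np
from scipy.stats import skewtest

rng = np.random.default_rng(11)

# Simulated log2 treatment/reference ratios: 60% unchanged probes and
# 40% truly increased, i.e. a skewed alteration pattern as in ChIP-chip
unchanged = rng.normal(0.0, 0.3, 6000)
increased = rng.normal(1.5, 0.5, 4000)
m_values = np.concatenate([unchanged, increased])

# Step (2): is the distribution of changes skewed?
stat, pvalue = skewtest(m_values)
print(f"skewness z = {stat:.1f}, p = {pvalue:.2g}")
if pvalue < 0.01:
    print("Skewed: standard global normalization would bias fold-change estimates;")
    print("a procedure that models altered and unaltered regions (e.g. HMM-assisted)")
    print("should be used instead.")
```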

19.
Quantitative analysis in microbial proteomics
The growing volume of microbial genome sequence data has created favourable conditions for the systematic study of gene regulation and function. Because proteins are the molecules that carry out biological functions, proteomics has emerged and developed rapidly in the functional study of microbial genomes. The basic principle of microbial proteomics is to use comparative studies to elucidate and understand gene expression levels between different microorganisms or under different growth conditions. Clearly, quantitative analysis is the core technology that urgently needs development in comparative proteomics. This article reviews progress in quantitative proteomic analysis techniques in microbial proteome research.

20.
Particle tracking techniques are often used to assess the local mechanical properties of cells and biological fluids. The extracted trajectories are exploited to compute the mean-squared displacement that characterizes the dynamics of the probe particles. Limited spatial resolution and statistical uncertainty are the limiting factors that alter the accuracy of the mean-squared displacement estimation. We precisely quantified the effect of localization errors in the determination of the mean-squared displacement by separating the sources of these errors into two separate contributions. A "static error" arises in the position measurements of immobilized particles. A "dynamic error" comes from the particle motion during the finite exposure time that is required for visualization. We calculated the propagation of these errors on the mean-squared displacement. We examined the impact of our error analysis on theoretical model fluids used in biorheology. These theoretical predictions were verified for purely viscous fluids using simulations and a multiple-particle tracking technique performed with video microscopy. We showed that the static contribution can be confidently corrected in dynamics studies by using static experiments performed at a similar noise-to-signal ratio. This groundwork allowed us to achieve higher resolution in the mean-squared displacement, and thus to increase the accuracy of microrheology studies.
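For pure diffusion, the static and dynamic contributions enter the measured two-dimensional MSD approximately as MSD_meas(t) = 4D(t − t_E/3) + 4ε², where t_E is the exposure time and ε the static localization error, so both can be recovered from a straight-line fit. The sketch below simulates a blurred, noisy trajectory and applies this correction; the diffusion coefficient, exposure time and error level are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(2)
D_true, dt, n_steps = 0.05, 0.01, 20000   # µm²/s and s (illustrative values)
eps, exposure = 0.03, 0.01                # static error (µm) and exposure time (s)

# Simulated 2-D Brownian motion, sampled finely, then averaged over the exposure
# window (dynamic error) and corrupted by localization noise (static error)
sub = 10
fine = np.cumsum(rng.normal(0, np.sqrt(2 * D_true * dt / sub), (n_steps * sub, 2)), axis=0)
blurred = fine.reshape(n_steps, sub, 2).mean(axis=1)
track = blurred + rng.normal(0, eps, (n_steps, 2))

def msd(traj, max_lag):
    """Time-averaged mean-squared displacement for lags 1..max_lag frames."""
    return np.array([np.mean(np.sum((traj[k:] - traj[:-k]) ** 2, axis=1))
                     for k in range(1, max_lag + 1)])

lags = dt * np.arange(1, 51)
msd_meas = msd(track, 50)

# For pure diffusion the measured MSD is linear in lag time:
#   MSD_meas(t) = 4 D (t - exposure/3) + 4 eps^2
# so a straight-line fit recovers D (slope) and the static error (intercept).
slope, intercept = np.polyfit(lags, msd_meas, 1)
D_hat = slope / 4
eps_hat = np.sqrt(max(intercept + 4 * D_hat * exposure / 3, 0) / 4)
print(f"true D = {D_true}, recovered D = {D_hat:.4f}")
print(f"true static error = {eps}, recovered = {eps_hat:.4f}")
```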
