Similar articles
 20 similar articles found (search time: 31 ms)
1.
MOTIVATION: To study lowly expressed genes in microarray experiments, it is useful to increase the photometric gain in the scanning. However, a large gain may cause some pixels for highly expressed genes to become saturated. Spatial statistical models that model spot shapes on the pixel level may be used to infer information about the saturated pixel intensities. Other possible applications for spot shape models include data quality control and accurate determination of spot centres and spot diameters. RESULTS: Spatial statistical models for spotted microarrays are studied including pixel level transformations and spot shape models. The models are applied to a dataset from 50mer oligonucleotide microarrays with 452 selected Arabidopsis genes. Logarithmic, Box-Cox and inverse hyperbolic sine transformations are compared in combination with four spot shape models: a cylindric plateau shape, an isotropic Gaussian distribution and a difference of two-scaled Gaussian distribution suggested in the literature, as well as a proposed new polynomial-hyperbolic spot shape model. A substantial improvement is obtained for the dataset studied by the polynomial-hyperbolic spot shape model in combination with the Box-Cox transformation. The spatial statistical models are used to correct spot measurements with saturation by extrapolating the censored data. AVAILABILITY: Source code for R is available at http://www.matfys.kvl.dk/~ekstrom/spotshapes/
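
The three pixel-level transformations named in this abstract, together with an isotropic Gaussian spot shape, can be illustrated with a minimal sketch. This is not the authors' R code. The function names, the `scale` parameter, and the 16-bit saturation level of 65535 are assumptions made for illustration only.

```python
import math

def log_transform(x):
    """Natural-log transform of a positive pixel intensity."""
    return math.log(x)

def box_cox(x, lam):
    """Box-Cox transform; it reduces to the log transform as lam approaches 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def asinh_transform(x, scale=1.0):
    """Inverse hyperbolic sine transform; `scale` is a hypothetical tuning parameter."""
    return math.asinh(x / scale)

def gaussian_spot(r, amplitude, sigma, background=0.0):
    """Expected intensity at distance r from the centre of an isotropic Gaussian spot."""
    return background + amplitude * math.exp(-r * r / (2.0 * sigma * sigma))

def censor(intensity, saturation=65535):
    """Model scanner saturation: values above the limit are recorded as the limit.
    The 16-bit limit is an assumption, not a value taken from the abstract."""
    return min(intensity, saturation)
```

A saturated pixel is then a censored observation: only a lower bound on its intensity is known, which is why a fitted spot shape is needed to extrapolate it.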

2.
3.
Data preprocessing including proper normalization and adequate quality control before complex data mining is crucial for studies using the cDNA microarray technology. We have developed a simple procedure that integrates data filtering and normalization with quantitative quality control of microarray experiments. Previously we have shown that data variability in a microarray experiment can be very well captured by a quality score q(com) that is defined for every spot, and the ratio distribution depends on q(com). Utilizing this knowledge, our data-filtering scheme allows the investigator to decide on the filtering stringency according to desired data variability, and our normalization procedure corrects the q(com)-dependent dye biases in terms of both the location and the spread of the ratio distribution. In addition, we propose a statistical model for false positive rate determination based on the design and the quality of a microarray experiment. The model predicts that a lower limit of 0.5 for the replicate concordance rate is needed in order to be certain of true positives. Our work demonstrates the importance and advantages of having a quantitative quality control scheme for microarrays.

4.
Automatic analysis of DNA microarray images using mathematical morphology   Cited by: 10 (self-citations: 0; citations by others: 10)
MOTIVATION: DNA microarrays are an experimental technology which consists in arrays of thousands of discrete DNA sequences that are printed on glass microscope slides. Image analysis is an important aspect of microarray experiments. The aim of this step is to reduce an image of spots into a table with a measure of the intensity for each spot. Efficient, accurate and automatic analysis of DNA spot images is essential in order to use this technology in laboratory routines. RESULTS: We present an automatic non-supervised set of algorithms for a fast and accurate spot data extraction from DNA microarrays using morphological operators which are robust to both intensity variation and artefacts. The approach can be summarised as follows. Initially, a gridding algorithm yields the automatic segmentation of the microarray image into spot quadrants which are later individually analysed. Then the analysis of the spot quadrant images is achieved in five steps. First, a pre-quantification, the spot size distribution law is calculated. Second, the background noise extraction is performed using a morphological filtering by area. Third, an orthogonal grid provides the first approach to the spot locus. Fourth, the spot segmentation or spot boundaries definition is carried out using the watershed transformation. And fifth, the outline of detected spots allows the signal quantification or spot intensities extraction; in this respect, a noise model has been investigated. The performance of the algorithm has been compared with two packages: ScanAlyze and Genepix, showing its robustness and precision.

5.
The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. 
In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results.

6.
One of the most commonly used methods for protein separation is 2‐DE. After 2‐DE gel scanning, images with a plethora of spot features emerge that are usually contaminated by inherent noise. The objective of the denoising process is to remove noise to the extent that the true spots are recovered correctly and accurately i.e. without introducing distortions leading to the detection of false‐spot features. In this paper we propose and justify the use of the contourlet transform as a tool for 2‐DE gel images denoising. We compare its effectiveness with state‐of‐the‐art methods such as wavelets‐based multiresolution image analysis and spatial filtering. We show that contourlets not only achieve better average S/N performance than wavelets and spatial filters, but also preserve better spot boundaries and faint spots and alter less the intensities of informative spot features, leading to more accurate spot volume estimation and more reliable spot detection, operations that are essential to differential expression proteomics for biomarkers discovery.

7.
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalization procedures. Two signal quality estimates that capture the reliability of each spot printed on a microarray are described. A parametric estimate of within-spot variance, referred to here as σ²_spot, assumes that pixels follow a normal distribution and are spatially correlated. A non-parametric estimate of error, called the mean square prediction error (MSPE), assumes that spots of high quality possess pixels that are similar to their neighbors. This paper will provide a framework to use either spot quality measure in downstream analysis, specifically as weights in regression models. Using these spot quality estimates as weights can result in greater efficiency, in a statistical sense, when modeling microarray data.

8.
A new integrated image analysis package with quantitative quality control schemes is described for cDNA microarray technology. The package employs an iterative algorithm that utilizes both intensity characteristics and spatial information of the spots on a microarray image for signal–background segmentation and defines five quality scores for each spot to record irregularities in spot intensity, size and background noise levels. A composite score qcom is defined based on these individual scores to give an overall assessment of spot quality. Using qcom we demonstrate that the inherent variability in intensity ratio measurements is closely correlated with spot quality, namely spots with higher quality give less variable measurements and vice versa. In addition, gauging data by qcom can improve data reliability dramatically and efficiently. We further show that the variability in ratio measurements drops exponentially with increasing qcom and, for the majority of spots at the high quality end, this improvement is mainly due to an improvement in correlation between the two dyes. Based on these studies, we discuss the potential of quantitative quality control for microarray data and the possibility of filtering and normalizing microarray data using a quality metrics-dependent scheme.

9.
The 3D spatial organization of genes and other genetic elements within the nucleus is important for regulating gene expression. Understanding how this spatial organization is established and maintained throughout the life of a cell is key to elucidating the many layers of gene regulation. Quantitative methods for studying nuclear organization will lead to insights into the molecular mechanisms that maintain gene organization as well as serve as diagnostic tools for pathologies caused by loss of nuclear structure. However, biologists currently lack automated and high throughput methods for quantitative and qualitative global analysis of 3D gene organization. In this study, we use confocal microscopy and fluorescence in-situ hybridization (FISH) as a cytogenetic technique to detect and localize the presence of specific DNA sequences in 3D. FISH uses probes that bind to specific targeted locations on the chromosomes, appearing as fluorescent spots in 3D images obtained using fluorescence microscopy. In this article, we propose an automated algorithm for segmentation and detection of 3D FISH spots. The algorithm is divided into two stages: spot segmentation and spot detection. Spot segmentation consists of 3D anisotropic smoothing to reduce the effect of noise, top-hat filtering, and intensity thresholding, followed by 3D region-growing. Spot detection uses a Bayesian classifier with spot features such as volume, average intensity, texture, and contrast to detect and classify the segmented spots as either true or false spots. Quantitative assessment of the proposed algorithm demonstrates improved segmentation and detection accuracy compared to other techniques.
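
The thresholding and region-growing stage described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it omits the anisotropic smoothing, the top-hat filtering and the Bayesian classifier, and it assumes 6-connectivity and a nested-list volume indexed as `volume[z][y][x]`. The names `grow_regions` and `spot_features` are hypothetical.

```python
from collections import deque

# The six face-adjacent neighbours of a voxel (6-connectivity is an assumption).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def grow_regions(volume, threshold):
    """Group voxels with intensity >= threshold into 6-connected regions.

    Returns a list of regions, each a list of (z, y, x) coordinates.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    regions = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] < threshold or (z, y, x) in seen:
                    continue
                # Breadth-first growth from this seed voxel.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                region = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    region.append((cz, cy, cx))
                    for dz, dy, dx in STEPS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and n not in seen and volume[n[0]][n[1]][n[2]] >= threshold:
                            seen.add(n)
                            queue.append(n)
                regions.append(region)
    return regions

def spot_features(volume, region):
    """Two of the features named in the abstract: volume and average intensity."""
    values = [volume[z][y][x] for z, y, x in region]
    return {"volume": len(region), "mean_intensity": sum(values) / len(values)}
```

Features such as these would then be passed to a classifier that separates true spots from false ones.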

10.
Detecting spatial hot spots in landscape ecology   Cited by: 2 (self-citations: 0; citations by others: 2)
Hot spots are typically locations of abundant phenomena. In ecology, hot spots are often detected with a spatially global threshold, where a value for a given observation is compared with all values in a data set. When spatial relationships are important, spatially local definitions – those that compare the value for a given observation with locations in the vicinity, or the neighbourhood of the observation – provide a more explicit consideration of space. Here we outline spatial methods for hot spot detection: kernel estimation and local measures of spatial autocorrelation. To demonstrate these approaches, hot spots are detected in landscape level data on the magnitude of mountain pine beetle infestations. Using kernel estimators, we explore how selection of the neighbourhood size (τ) and hot spot threshold impact hot spot detection. We found that as τ increases, hot spots are larger and fewer; as the hot spot threshold increases, hot spots become larger and more plentiful and hot spots will reflect coarser scale spatial processes. The impact of spatial neighbourhood definitions on the delineation of hot spots identified with local measures of spatial autocorrelation was also investigated. In general, the larger the spatial neighbourhood used for analysis, the larger the area, or greater the number of areas, identified as hot spots.
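
Kernel-based hot spot detection with a neighbourhood size τ and a threshold can be sketched as below. The abstract does not say which kernel was used; the quartic kernel here is a common choice and is an assumption, as are the function names and the representation of observations as `(x, y, value)` tuples.

```python
import math

def kernel_intensity(location, observations, tau):
    """Quartic-kernel estimate at `location` from (x, y, value) observations.

    Only observations closer than the bandwidth tau contribute.
    """
    x0, y0 = location
    total = 0.0
    for x, y, value in observations:
        d2 = ((x - x0) ** 2 + (y - y0) ** 2) / (tau * tau)
        if d2 < 1.0:
            total += value * (3.0 / (math.pi * tau * tau)) * (1.0 - d2) ** 2
    return total

def hot_spots(grid, observations, tau, threshold):
    """Return the grid locations whose kernel estimate reaches the threshold."""
    return [cell for cell in grid if kernel_intensity(cell, observations, tau) >= threshold]
```

Rerunning `hot_spots` over a range of `tau` and `threshold` values is one simple way to explore the sensitivity that the abstract reports.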

11.
Tracy L. Bergemann. 《遗传学报》 (Journal of Genetics and Genomics), 2010, 37(4): 265-279
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalization procedures. Two signal quality estimates that capture the reliability of each spot printed on a microarray are described. A parametric estimate of within-spot variance, referred to here as σ²_spot, assumes that pixels follow a normal distribution and are spatially correlated. A non-parametric estimate of error, called the mean square prediction error (MSPE), assumes that spots of high quality possess pixels that are similar to their neighbors. This paper will provide a framework to use either spot quality measure in downstream analysis, specifically as weights in regression models. Using these spot quality estimates as weights can result in greater efficiency, in a statistical sense, when modeling microarray data.
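
The idea behind a neighbour-based prediction error can be illustrated with a minimal sketch. The abstract does not specify the predictor, so predicting each interior pixel by the mean of its four adjacent pixels is an assumption, as are the function names. `weighted_mean` only shows how a spot-quality weight would enter a simple estimate.

```python
def mspe(pixels):
    """Mean squared prediction error of a spot given as a 2D list of intensities.

    Each interior pixel is predicted by the mean of its four neighbours
    (an assumed predictor, chosen for illustration).
    """
    rows, cols = len(pixels), len(pixels[0])
    errors = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            predicted = (pixels[r - 1][c] + pixels[r + 1][c]
                         + pixels[r][c - 1] + pixels[r][c + 1]) / 4.0
            errors.append((pixels[r][c] - predicted) ** 2)
    return sum(errors) / len(errors)

def weighted_mean(values, weights):
    """Weighted average; weights could be, for example, inverse spot-error estimates."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

A smooth spot yields a small MSPE and therefore a large weight, while a speckled spot is down-weighted.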

12.
MOTIVATION: The accumulation of sequence-related and other biological data for basic research and application purposes invites disaster. It appears very likely that neither traditional thinking nor current technologies (including their foreseeable evolutionary developments) will be able to cope with this ever intensifying situation. RESULTS: We present the detailed theoretical background for applying signal theory, as known from speech recognition and image analysis, to the analysis of biomolecules. The general scheme is as follows: biochemical and biophysical properties of biomolecules are used to model an n-dimensional signal which represents the entire information-bearing biomolecule. Such signals are used to search for biological principles, analogies or similarities between biomolecules. In a series of simple experiments (bacterial DNA, generation of real signals using melting enthalpies, detection filtering by convolution of signals) we have shown that the novel system for comparative analysis of the properties of information-bearing biomolecules works as in theory. SUPPLEMENTARY INFORMATION: http://genome.gbf.de/wavepaper.

13.
We have systematically compared copy number variant (CNV) detection on eleven microarrays to evaluate data quality and CNV calling, reproducibility, concordance across array platforms and laboratory sites, breakpoint accuracy and analysis tool variability. Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. Nevertheless, these findings should not preclude detection of large CNVs for clinical diagnostic purposes because large CNVs with poor reproducibility are found primarily in complex genomic regions and would typically be removed by standard clinical data curation. The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics. The CNV resource presented here allows independent data evaluation and provides a means to benchmark new algorithms.
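
One way to make "concordance between CNV call sets" concrete is sketched below. The abstract does not define its concordance measure. The 50% reciprocal-overlap criterion used here is a commonly applied convention and is an assumption, as are the function names and the `(chrom, start, end)` half-open interval representation.

```python
def reciprocal_overlap(a, b, min_fraction=0.5):
    """True if calls a and b, given as (chrom, start, end), overlap by at least
    min_fraction of the length of each call."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a[2] - a[1])
            and overlap >= min_fraction * (b[2] - b[1]))

def concordance(calls_a, calls_b, min_fraction=0.5):
    """Fraction of calls in calls_a matched by at least one call in calls_b."""
    if not calls_a:
        return 0.0
    matched = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b, min_fraction) for b in calls_b)
    )
    return matched / len(calls_a)
```

This measure is asymmetric: `concordance(a, b)` and `concordance(b, a)` can differ when the two call sets differ in size.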

14.
MOTIVATION: High-throughput microarray technologies enable measurements of the expression levels of thousands of genes in parallel. However, microarray printing, hybridization and washing may create substantial variability in the quality of the data. As erroneous measurements may have a drastic impact on the results by disturbing the normalization schemes and by introducing expression patterns that lead to incorrect conclusions, it is crucial to discard low quality observations in the early phases of a microarray experiment. A typical microarray experiment consists of tens of thousands of spots on a microarray, making manual extraction of poor quality spots impossible. Thus, there is a need for a reliable and general microarray spot quality control strategy. RESULTS: We suggest a novel strategy for spot quality control by using Bayesian networks, which contain many appealing properties in the spot quality control context. We illustrate how a non-linear least squares based Gaussian fitting procedure can be used in order to extract features for a spot on a microarray. The features we used in this study are: spot intensity, size of the spot, roundness of the spot, alignment error, background intensity, background noise, and bleeding. We conclude that Bayesian networks are a reliable and useful model for microarray spot quality assessment. SUPPLEMENTARY INFORMATION: http://sigwww.cs.tut.fi/TICSP/SpotQuality/.

15.
Despite advances in metabolic and postmetabolic labeling methods for quantitative proteomics, there remains a need for improved label-free approaches. This need is particularly pressing for workflows that incorporate affinity enrichment at the peptide level, where isobaric chemical labels such as isobaric tags for relative and absolute quantitation and tandem mass tags may prove problematic or where stable isotope labeling with amino acids in cell culture labeling cannot be readily applied. Skyline is a freely available, open source software tool for quantitative data processing and proteomic analysis. We expanded the capabilities of Skyline to process ion intensity chromatograms of peptide analytes from full scan mass spectral data (MS1) acquired during HPLC MS/MS proteomic experiments. Moreover, unlike existing programs, Skyline MS1 filtering can be used with mass spectrometers from four major vendors, which allows results to be compared directly across laboratories. The new quantitative and graphical tools now available in Skyline specifically support interrogation of multiple acquisitions for MS1 filtering, including visual inspection of peak picking and both automated and manual integration, key features often lacking in existing software. In addition, Skyline MS1 filtering displays retention time indicators from underlying MS/MS data contained within the spectral library to ensure proper peak selection. The modular structure of Skyline also provides well defined, customizable data reports and thus allows users to directly connect to existing statistical programs for post hoc data analysis. To demonstrate the utility of the MS1 filtering approach, we have carried out experiments on several MS platforms and have specifically examined the performance of this method to quantify two important post-translational modifications: acetylation and phosphorylation, in peptide-centric affinity workflows of increasing complexity using mouse and human models.  相似文献   

16.
We have studied the detection, by human observers, of suprathreshold bandlimited signals embedded at various locations in non-white, Gaussian filtered noise. Detection models based upon the direct cross-correlation between the signal and the noise image (matched filtering) cannot account for the results of our experiments. Our findings point instead at a detection process occurring at the level of signal decomposition, and jointly determined by: (a) the differential outputs of discrete, bandlimited spatial analyzers selectively responsive to different components of the signal; and (b) variable detection rules adaptively related to such outputs and to the type of signal information available to the observer.

17.
Understanding the interaction between sexual and natural selection within variable environments is crucial to our understanding of evolutionary processes. The handicap principle predicts females will prefer males with exaggerated traits provided those traits are indicators of male quality to ensure direct or indirect female benefits. Spatial variability in ecological factors is expected to alter the balance between sexual and natural selection that defines the evolution of such traits. Male and female blackspotted topminnows (Fundulidae: Fundulus olivaceus) display prominent black dorsolateral spots that are variable in number across its broad range. We investigated variability in spot phenotypes at 117 sites across 13 river systems and asked if the trait was sexually dimorphic and positively correlated with measures of fitness (condition and gonadosomatic index [GSI]). Laboratory and mesocosm experiments assessed female mate choice and predation pressure on spot phenotypes. Environmental and community data collected at sampling locations were used to assess predictive models of spot density at the individual, site, and river system level. Greater number of spots was positively correlated with measures of fitness in males. Males with more spots were preferred by females and suffered greater mortality due to predation. Water clarity (turbidity) was the best predictor of spot density on the drainage scale, indicating that sexual and natural selection for the trait may be mediated by local light environments.

18.
MOTIVATION: The numerical values of gene expression measured using microarrays are usually presented to the biological end-user as summary statistics of spot pixel data, such as the spot mean, median and mode. Much of the subsequent data analysis reported in the literature, however, uses only one of these spot statistics. This results in sub-optimal estimates of gene expression levels and a need for improvement in quantitative spot variation surveillance. RESULTS: This paper develops a maximum-likelihood method for estimating gene expression using spot mean, variance and pixel number values available from typical microarray scanners. It employs a hierarchical model of variation between and within microarray spots. The hierarchical maximum-likelihood estimate (MLE) is shown to be a more efficient estimator of the mean than the 'conventional' estimate using solely the spot mean values (i.e. without spot variance data). Furthermore, under the assumptions of our model, the spot mean and spot variance are shown to be sufficient statistics that do not require the use of all pixel data. The hierarchical MLE method is applied to data from both Monte Carlo (MC) simulations and a two-channel dye-swapped spotted microarray experiment. The MC simulations show that the hierarchical MLE method leads to improved detection of differential gene expression particularly when 'outlier' spots are present on the arrays. Compared with the conventional method, the MLE method applied to data from the microarray experiment leads to an increase in the number of differentially expressed genes detected for low cut-off P-values of interest.
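
The benefit of combining spot means with spot variances and pixel counts can be seen in a simplified sketch. It is not the paper's full hierarchical MLE. It assumes a normal model in which replicate spot i has mean distributed with variance `between_var + spot_vars[i] / pixel_counts[i]`, and it treats all variances as known, in which case the maximum-likelihood estimate of the common mean is the precision-weighted average. The function name and parameters are hypothetical.

```python
def hierarchical_mean(spot_means, spot_vars, pixel_counts, between_var):
    """Precision-weighted estimate of expression from replicate spots.

    Each spot is weighted by 1 / (between_var + within-spot variance / pixel count),
    so noisy spots (large variance, few pixels) contribute less.
    Returns the estimate and its variance under the assumed model.
    """
    weights = [1.0 / (between_var + v / n) for v, n in zip(spot_vars, pixel_counts)]
    estimate = sum(w * m for w, m in zip(weights, spot_means)) / sum(weights)
    variance = 1.0 / sum(weights)
    return estimate, variance
```

With equal weights this reduces to the conventional average of spot means. An outlier spot with a very large pixel variance is pulled toward zero weight, which is consistent with the robustness to outlier spots that the abstract reports.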

19.
Genomics, 2020, 112(2): 1245-1256
Genetic laboratories use custom-commercial targeted next-generation sequencing (tg-NGS) assays to identify disease-causing variants. Although the high coverage achieved with these tests allows for the detection of copy number variants (CNVs), which account for an important proportion of the genetic burden in human diseases, an easy-to-use tool for automatic CNV detection is still lacking. This article presents a new CNV detection tool optimized for tg-NGS data: PattRec. PattRec was evaluated using a wide range of data, and its performance compared with those of other CNV detection tools. The software includes features for selecting optimal controls, discarding polymorphic CNVs prior to analysis, and filtering out deletions based on SNV zygosity, and automatically creates an in-house CNV database. There is no need for high level bioinformatic expertise and users can choose color-coded xlsx output that helps to prioritize potentially pathogenic CNVs. PattRec is presented as a Java based GUI, freely available online: https://github.com/irotero/PattRec.

20.
SUMMARY: GAAS, Gene Array Analyzer Software supports multi-user efficient management and suitable analyses of large amounts of gene expression data across replicated experiments. Its management framework handles input data generated by different technologies. A multi-user environment allows each user to store his/her own data visualization scheme, analysis parameters used, values and formats of the output data. The analysis engine performs: background and spot quality evaluation, data normalization, differential gene expression analyses in single and multiple replica experiments. Results of expression profiles can be interactively navigated through graphical interfaces and stored into output databases.
