Similar Documents
20 similar documents found.
1.
Automatic registration of microarray images. II. Hexagonal grid (total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: In the first part of this paper, the author presented an efficient, robust and fully automated algorithm for spot and block indexing in microarray images with rectangular grids. Although the rectangular grid is currently the most common way of arranging probes on microarray slides, another microarray technology is based on bundles of optical fibers, in which the probes are packed in hexagonal grids. The hexagonal grid has both advantages and drawbacks compared with standard rectangular packing, and it requires adapting or modifying the spot-indexing algorithm presented in the first part of the paper. RESULTS: In this second part, the author presents a version of the spot-indexing algorithm adapted to microarray images whose spots are packed in hexagonal structures. The algorithm is fully automated and works with hexagonal grids of different types, with varying grid spacing, rotation and spot size. It can trace local and global distortions of the grid, including non-orthogonal transformations. Like the algorithm from Part I, it scales linearly with grid size: the time complexity is O(M), where M is the total number of points in the hexagonal grid. The algorithm has been tested on both CCD and scanned images with spot expression rates as low as 2%. Processing an image with about 50 000 hexagonal grid points took less than a second; for images with high expression rates (approximately 90%), registration time is even shorter, around a quarter of a second. SUPPLEMENTARY INFORMATION: http://fleece.ucsd.edu/~vit/Registration_Supplement.pdf
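A minimal sketch of the final indexing step on a hexagonal lattice, assuming the grid spacing, rotation and origin have already been estimated (the paper's algorithm also estimates these and traces local distortions, which this sketch does not attempt). Each detected spot centre is snapped to its nearest lattice point using axial coordinates and cube rounding, so the cost is one pass over the spots, in line with the O(M) scaling reported. All function names are hypothetical.

```python
import math

SQRT3_2 = math.sqrt(3) / 2  # row height of a unit-spacing hexagonal lattice


def hex_to_xy(q, r, spacing, theta, origin=(0.0, 0.0)):
    """Axial lattice index (q, r) -> image position (x, y).

    Unrotated basis vectors are (1, 0) and (1/2, sqrt(3)/2), scaled by the
    nearest-neighbour `spacing` and rotated by `theta` radians (assumed known).
    """
    u = spacing * (q + 0.5 * r)
    v = spacing * SQRT3_2 * r
    c, s = math.cos(theta), math.sin(theta)
    return (origin[0] + c * u - s * v, origin[1] + s * u + c * v)


def cube_round(qf, rf):
    """Round fractional axial coordinates to the nearest lattice point."""
    sf = -qf - rf
    q, r, s = round(qf), round(rf), round(sf)
    dq, dr, ds = abs(q - qf), abs(r - rf), abs(s - sf)
    # Recompute the component with the largest rounding error so q + r + s == 0
    if dq > dr and dq > ds:
        q = -r - s
    elif dr > ds:
        r = -q - s
    return (q, r)


def xy_to_hex(x, y, spacing, theta, origin=(0.0, 0.0)):
    """Snap an observed spot centre to the index of its nearest lattice point."""
    dx, dy = x - origin[0], y - origin[1]
    c, s = math.cos(theta), math.sin(theta)
    # Undo the rotation, then the scaling
    u = (c * dx + s * dy) / spacing
    v = (-s * dx + c * dy) / spacing
    rf = v / SQRT3_2
    return cube_round(u - 0.5 * rf, rf)


def index_spots(spots, spacing, theta, origin=(0.0, 0.0)):
    """Single pass over the detected spots: O(M) for M spots."""
    return {xy_to_hex(x, y, spacing, theta, origin): (x, y) for x, y in spots}
```

Any spot displaced by less than half the spacing from its lattice point is still assigned the correct index, which gives some tolerance to small local distortions.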

2.
MOTIVATION: In this paper, we propose a fully automatic block- and spot-indexing algorithm for microarray image analysis. A microarray is a device that enables parallel experiments on tens to hundreds of thousands of test genes in order to measure gene expression. Because of the huge volume of experimental data, automated image analysis is becoming increasingly important in microarray image processing systems. Currently, most automated microarray image processing systems require manual block indexing and, in some cases, manual spot indexing, which is very laborious when the microarray image is large and noisy. We show that the addresses of blocks and spots can be located by applying the Nearest Neighbors Graph Model. We also propose an analytic model for the feasibility of block addressing, which is validated by a large body of experimental results. RESULTS: We demonstrate automatic block detection, automatic spot addressing, and correction of the distortion and skew of each microarray image.
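The Nearest Neighbors Graph Model is not specified in detail in the abstract, so the following is only a generic illustration of how a nearest-neighbour graph over spot centres can yield the spot pitch and grid skew, which are then used to assign (row, col) addresses. It assumes a single rectangular block of already-detected spot centres; the function names are hypothetical, and the brute-force neighbour search is O(n²).

```python
import math
from statistics import median


def nearest_neighbour_graph(points):
    """Brute-force 1-nearest-neighbour graph over spot centres (sketch only)."""
    edges = []
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        edges.append((i, j))
    return edges


def estimate_pitch_and_angle(points):
    """Median edge length gives the pitch; the median edge direction, folded
    into [-45 deg, 45 deg) because a rectangular grid has 90-degree symmetry,
    gives the skew angle in radians."""
    lengths, angles = [], []
    for i, j in nearest_neighbour_graph(points):
        dx = points[j][0] - points[i][0]
        dy = points[j][1] - points[i][1]
        lengths.append(math.hypot(dx, dy))
        a = math.atan2(dy, dx) % (math.pi / 2)
        if a >= math.pi / 4:
            a -= math.pi / 2
        angles.append(a)
    return median(lengths), median(angles)


def assign_addresses(points, pitch, angle):
    """De-rotate the spot centres and quantise them to (row, col) addresses."""
    c, s = math.cos(angle), math.sin(angle)
    uv = [(c * x + s * y, -s * x + c * y) for x, y in points]
    u0 = min(u for u, _ in uv)
    v0 = min(v for _, v in uv)
    return [(round((v - v0) / pitch), round((u - u0) / pitch)) for u, v in uv]
```

Using medians rather than means keeps a few spurious edges (from noise or missing spots) from dominating the pitch and skew estimates.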

3.
An improved algorithm for clustering gene expression data (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Recent advances in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, which are typically characterized by inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm is proposed that employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means algorithm is also used for clustering. RESULTS: The proposed two-stage clustering algorithm is shown to be significantly superior to the average linkage method, the Self-Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), all widely used for clustering gene expression data, on a variety of artificial and publicly available real-life data sets. The biological relevance of the clustering solutions is also analyzed.
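For reference, a sketch of standard fuzzy c-means on small in-memory data; it is not the iterated variant or the genetic stages used in the paper. Each sample receives a membership in every cluster, which is what makes "significant membership to multiple classes" measurable. Parameter names and defaults (fuzzifier m = 2, random initial memberships) are assumptions.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means.

    data: list of equal-length tuples; c: number of clusters; m > 1: fuzzifier.
    Returns (centres, memberships) with memberships[i][k] = u_ik.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    u = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(max_iter):
        # Centre update: mean of the data weighted by u_ik ** m
        centres = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centres.append(tuple(sum(w[i] * data[i][d] for i in range(n)) / sw
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        new_u = []
        for i in range(n):
            d = [max(math.dist(data[i], centres[k]), 1e-12) for k in range(c)]
            new_u.append([1.0 / sum((d[k] / d[j]) ** (2 / (m - 1)) for j in range(c))
                          for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centres, u
```

A point whose two largest memberships are both high (for example 0.45 and 0.40) is the kind of ambiguous point the two-stage scheme treats specially.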

4.
5.
MOTIVATION: Consensus clustering, also known as cluster ensemble, is an important technique for microarray data analysis and is particularly useful for class discovery. Compared with traditional clustering algorithms, consensus clustering approaches can integrate multiple partitions from different clustering solutions to improve the robustness, stability, scalability and parallelization of clustering. Consensus clustering can reveal the underlying classes of the samples in gene expression data. RESULTS: In addition to exploring a graph-based consensus clustering (GCC) algorithm to estimate the underlying classes of the samples in microarray data, we design a new validation index to determine the number of classes. To our knowledge, this is the first application of GCC to class discovery for microarray data. Given a prespecified maximum number of classes (denoted K_max in this article), our algorithm can discover the true number of classes for the samples according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most existing algorithms, (ii) correctly identify the number of classes in real cancer datasets, and (iii) discover biologically meaningful classes of samples. AVAILABILITY: Matlab source code for the GCC algorithm is available upon request from Zhiwen Yu.
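A generic illustration of graph-based consensus, not the authors' GCC algorithm or their Modified Rand Index: several base partitions are summarised as a co-association matrix, samples that share a cluster in more than a threshold fraction of partitions are linked, and the connected components of that graph become the consensus classes. The standard Rand index is included for scoring agreement between two labelings. The threshold of 0.5 and all function names are assumptions.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    h = len(partitions)
    return [[v / h for v in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link samples whose co-association exceeds `threshold`; connected
    components of the resulting graph are the consensus classes."""
    m = co_association(partitions)
    n = len(m)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]  # depth-first traversal of one component
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and m[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def rand_index(a, b):
    """Standard Rand index: fraction of sample pairs on which two labelings agree."""
    n = len(a)
    agree = pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / pairs
```

Because the co-association matrix ignores the arbitrary label names in each base partition, partitions produced by different algorithms can be combined directly.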

6.
From its conception, bioinformatics has been a multidisciplinary field that blends domain expert knowledge with new and existing processing techniques, all focused on a common goal. Typically, these techniques have been applied directly to raw microarray image data. Unfortunately, this fails to exploit the image's full potential and, in practice, leaves the lab technician having to guide the analysis algorithms. This paper presents a dynamic framework that aims to automate microarray image analysis using a variety of techniques. An overview of the entire framework is presented, and its robustness is tested throughout on a selection of real examples containing varying degrees of noise. The results show the framework's potential to determine slide layout accurately and to perform analysis without prior structural knowledge. The algorithm achieves a peak signal-to-noise ratio approximately 1 to 3 dB higher than conventional processing techniques, such as those implemented in GenePix, used by a trained operator. As far as the authors are aware, this is the first time such a comprehensive framework concept has been applied directly to microarray image analysis.
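Peak signal-to-noise ratio can be computed as below. The sketch assumes 16-bit scanner images (maximum value 65535) and an available reference image; it is not tied to the paper's own evaluation code.

```python
import math


def psnr(reference, test, max_value=65535):
    """PSNR in dB between two equal-sized 2-D images (lists of rows).

    max_value defaults to 65535, assuming 16-bit microarray scans.
    """
    sq_err, count = 0.0, 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            sq_err += (r - t) ** 2
            count += 1
    mse = sq_err / count
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
```

Because PSNR is 10·log10(MAX²/MSE), a 1 to 3 dB gain corresponds to reducing the mean squared error by a factor of roughly 1.26 to 2.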

7.
We have implemented a Fast Fourier Summation algorithm for tomographic reconstruction of three-dimensional biological data sets obtained via transmission electron microscopy. The fast algorithm was designed to reproduce the results of the direct summation algorithm (also known as filtered or R-weighted backprojection). For two-dimensional images, the new algorithm requires O(N_θ M log M) + O(M N log N) operations, where N_θ is the number of projection angles and M × N is the size of the reconstructed image. Three-dimensional reconstructions are built from sequences of two-dimensional reconstructions. We demonstrate the algorithm on real data sets. For typical data set sizes, the new algorithm is 1.5 to 2.5 times faster than direct summation in the space domain, and the speed advantage grows with the size of the data sets. The new algorithm also allows higher-order spline interpolation of the data at no additional computational cost. It has been incorporated into a commonly used package for tomographic reconstruction.
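To make the baseline concrete, here is a minimal sketch of direct summation (filtered, or R-weighted, backprojection) for one square 2-D slice, using the spatial-domain Ram-Lak ramp kernel and linear interpolation. This is the O(N_θ·M·N) computation that the Fast Fourier Summation algorithm is designed to reproduce faster, not the fast algorithm itself. It assumes parallel-beam projections with unit detector spacing, centred on the image; function names are illustrative.

```python
import math


def ram_lak_kernel(half_width):
    """Spatial-domain ramp (Ram-Lak) filter for unit detector spacing."""
    kernel = {}
    for n in range(-half_width, half_width + 1):
        if n == 0:
            kernel[n] = 0.25
        elif n % 2:
            kernel[n] = -1.0 / (math.pi ** 2 * n ** 2)
        else:
            kernel[n] = 0.0
    return kernel


def filter_projection(p, kernel):
    """Discrete convolution of one projection with the ramp kernel."""
    n_bins = len(p)
    return [sum(w * p[i - n] for n, w in kernel.items() if 0 <= i - n < n_bins)
            for i in range(n_bins)]


def backproject(sinogram, angles, size):
    """sinogram[a][t]: projection at angles[a] (radians), detector bin t.
    Returns a size x size reconstruction by direct summation over angles."""
    n_bins = len(sinogram[0])
    centre_t, centre_xy = (n_bins - 1) / 2, (size - 1) / 2
    kernel = ram_lak_kernel(n_bins)
    filtered = [filter_projection(p, kernel) for p in sinogram]
    image = [[0.0] * size for _ in range(size)]
    for q, theta in zip(filtered, angles):
        c, s = math.cos(theta), math.sin(theta)
        for y in range(size):
            for x in range(size):
                # Detector coordinate of this pixel, then linear interpolation
                t = (x - centre_xy) * c + (y - centre_xy) * s + centre_t
                i = math.floor(t)
                if 0 <= i < n_bins - 1:
                    f = t - i
                    image[y][x] += (1 - f) * q[i] + f * q[i + 1]
    scale = math.pi / len(angles)
    return [[v * scale for v in row] for row in image]
```

The triple loop over angles and pixels is where the direct method spends its time; swapping the linear interpolation for higher-order splines would make it slower still, which is the cost the abstract says the Fourier approach avoids.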

8.
The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. Although data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique for analyzing microarrays. In genomics, for example, clustering the genes in a set of microarray data groups together genes whose expression levels behave similarly across the samples, while clustering the samples offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it has remained highly problematic. Choosing a clustering algorithm and validation index is not trivial, especially for high-throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem relative to the available computational power. In some cases a very simple algorithm may be adequate, but many situations call for a more complex and powerful algorithm better suited to the job at hand. In this paper, we cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
Key Words: Clustering, genomics, profiling, microarray, validation index.
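As one example of a classical validation index, the mean silhouette width compares each point's average distance to its own cluster (a) with its average distance to the nearest other cluster (b), using s = (b - a) / max(a, b); values near 1 indicate compact, well-separated clusters. Which indices the paper covers is not stated in the abstract, so this is only an illustration.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width over all points; requires at least two clusters.

    points: list of coordinate tuples; labels: list of cluster labels.
    Points in singleton clusters contribute s = 0 by convention.
    """
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [math.dist(points[i], points[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(own) / len(own)
        # Mean distance to the closest other cluster
        b = min(sum(math.dist(points[i], points[j])
                    for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        total += (b - a) / max(a, b)
    return total / n
```

Computing the index for several candidate numbers of clusters and keeping the best-scoring one is a common way to pick the number of clusters.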

9.
Many image analysis systems are available for processing the images produced by laser scanning of DNA microarrays. An image processing system takes pixel-level intensity data and converts them into a set of gene-level expression or copy-number summaries that are used in further analyses. The image analysis systems currently in use differ in the specific algorithms they implement, ease of use and cost, so an objective means of comparing them is desirable. Here we describe a systematic method for comparing the image processing results of different image analysis systems using a series of replicate microarray experiments. We demonstrate the method by comparing cDNA microarray data generated by the UCSF Spot and GenePix image processing systems.
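The abstract does not give the comparison statistic, so the following is only one plausible sketch under stated assumptions: for each image analysis system, compute per-spot log2(R/G) ratios on every replicate array and report the mean pairwise Pearson correlation between replicates, treating higher agreement as more reproducible output. The input layout and function names are hypothetical.

```python
import math


def log_ratios(red, green):
    """Per-spot log2(R/G) from background-corrected intensities (assumed > 0)."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_concordance(system_output):
    """system_output: list of (red, green) intensity lists, one per replicate
    array, all produced by the same image analysis system (hypothetical layout).
    Returns the mean pairwise correlation of log-ratios across replicates."""
    ratios = [log_ratios(r, g) for r, g in system_output]
    pairs = [(i, j) for i in range(len(ratios)) for j in range(i + 1, len(ratios))]
    return sum(pearson(ratios[i], ratios[j]) for i, j in pairs) / len(pairs)
```

Running the same function on the outputs of two systems for the same replicate images gives directly comparable scores.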

10.
Currently used joint-surface models require the measurements to be structured on a grid. With currently available tracking devices, a large number of unstructured surface points can be measured in a relatively short time. This paper presents a method for fitting polynomial functions to unstructured three-dimensional data points. To test the method, spherical, cylindrical, parabolic, hyperbolic, exponential, logarithmic and sellar (saddle-shaped) surfaces with different undulations were used, and the resulting polynomials were compared with the original shapes. The results show that even complex joint surfaces can be modelled with polynomial functions. The influence of noise and of the number of data points was also analyzed. For a surface 20 mm in diameter measured with a precision of 0.2 mm, a model can be constructed with a precision of 0.02 mm.
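A minimal least-squares sketch, assuming the joint surface can be expressed as a height field z = f(x, y) in a suitable local frame; the paper's exact parameterisation is not given in the abstract. All terms x^i y^j with i + j ≤ degree are fitted through the normal equations, which requires no grid structure in the input points. Function names are illustrative.

```python
def monomials(degree):
    """Exponent pairs (i, j) of x**i * y**j with i + j <= degree."""
    return [(i, total - i) for total in range(degree + 1) for i in range(total, -1, -1)]


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_polynomial_surface(points, degree):
    """Least-squares fit of z = sum c_ij x^i y^j to unstructured (x, y, z) points.
    Returns {(i, j): c_ij}."""
    terms = monomials(degree)
    k = len(terms)
    ata = [[0.0] * k for _ in range(k)]
    atz = [0.0] * k
    for x, y, z in points:
        row = [x ** i * y ** j for i, j in terms]
        for p in range(k):
            atz[p] += row[p] * z
            for q in range(k):
                ata[p][q] += row[p] * row[q]
    return dict(zip(terms, solve(ata, atz)))
```

For higher degrees or noisy data, a QR-based solver is numerically safer than forming the normal equations; they are used here only to keep the sketch dependency-free.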

11.
We describe PerlMAT, a Perl microarray toolkit that provides easy-to-use object-oriented methods for the simplified manipulation, management and analysis of microarray data. The toolkit provides objects encapsulating microarray spots and reporters, several common microarray data file formats, and GAL files. In addition, an analysis object provides methods for data processing, and an image object enables visualisation of microarray data. This important addition to the Perl developer's library will facilitate wider use of Perl for microarray application development within the bioinformatics community. The coherent interface and well-documented code enable rapid analysis even by inexperienced Perl developers. AVAILABILITY: Software is available at http://sourceforge.net/projects/perlmat

12.
Linden R, Bhaya A. Bio Systems, 2007, 88(1-2): 76-91
This paper develops an algorithm that uses genetic programming (GP) and fuzzy logic to extract explanatory rules from microarray data, which we treat as time series. Reverse Polish notation (RPN) is used to describe the rules and to facilitate the GP approach. The algorithm also allows prior knowledge to be inserted, making it possible to find rule sets that include already-known relationships between genes. The proposed algorithm is applied to problems arising in the construction of gene regulatory networks, using two sets of real data from biological experiments on the Arabidopsis thaliana cold response and on the rat central nervous system, respectively. The results show that the proposed technique can fit the data to a predefined precision even when the data set has thousands of features but only a limited number of time points, a situation in which traditional statistical alternatives struggle because of the scarcity of time points.
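To show why RPN suits genetic programming, here is a sketch that evaluates a fuzzy rule antecedent stored as a flat token list. It assumes the min/max/complement (Zadeh) operators for AND/OR/NOT, which may differ from the paper's choice; token names such as "geneA_high" are hypothetical linguistic terms.

```python
def eval_fuzzy_rpn(tokens, memberships):
    """Evaluate a fuzzy rule antecedent written in Reverse Polish notation.

    Operand tokens are looked up in `memberships` (degrees in [0, 1]);
    AND/OR/NOT are assumed to be min/max/complement.
    """
    stack = []
    for tok in tokens:
        if tok == "AND":
            b, a = stack.pop(), stack.pop()
            stack.append(min(a, b))
        elif tok == "OR":
            b, a = stack.pop(), stack.pop()
            stack.append(max(a, b))
        elif tok == "NOT":
            stack.append(1.0 - stack.pop())
        else:
            stack.append(memberships[tok])
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]
```

Because every subtree of a postfix expression occupies a contiguous run of tokens, GP crossover and mutation can operate on list slices without parsing, and known gene relationships can be inserted as fixed token runs.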

13.
14.
MOTIVATION: Current Self-Organizing Map (SOM) approaches to gene expression pattern clustering require the user to predefine the expected number of clusters, and the hierarchical clustering methods used in this area do not provide a unique partitioning of the data. We describe an unsupervised dynamic hierarchical self-organizing approach that suggests an appropriate number of clusters and performs class discovery and marker gene identification in microarray data. During class discovery, the proposed algorithm identifies the sets of predictor genes that best distinguish each class from the others. The approach combines the merits of hierarchical clustering with the robustness against noise known from self-organizing approaches. RESULTS: Applied to DNA microarray data sets of two types of cancer, the proposed algorithm demonstrated its ability to produce the most suitable number of clusters. Furthermore, the marker genes identified by the unsupervised algorithm have a strong biological relationship to the specific cancer class. On leukemia microarray data containing three leukemia types, the algorithm determined three major clusters and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, so it is labelled an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided, and the subdivisions are related to two of the original clusters. A second test on colon cancer microarray data automatically derived two clusters, consistent with the number of classes in the data (cancerous and normal). AVAILABILITY: Java software for the dynamic SOM tree algorithm is available upon request for academic use. SUPPLEMENTARY INFORMATION: A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
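For orientation, a sketch of a basic one-dimensional (chain) self-organizing map. It is not the dynamic SOM tree or GSOM from the paper, which additionally grow the map and build a hierarchy. Linear initialisation along the diagonal of the data's bounding box, the linearly decaying learning rate and the Gaussian neighbourhood are assumptions chosen to keep the sketch short and deterministic.

```python
import math
import random


def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest to sample x."""
    return min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))


def train_som(data, n_nodes, epochs=30, lr0=0.5, seed=0):
    """Train a chain SOM on a list of equal-length tuples; returns node weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Assumed initialisation: nodes evenly spaced from the min corner to the max corner
    weights = [[lo[d] + (hi[d] - lo[d]) * k / (n_nodes - 1) for d in range(dim)]
               for k in range(n_nodes)]
    radius0 = max(n_nodes / 2, 1.0)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            step += 1
            lr = lr0 * (1 - frac)                    # learning rate decays to 0
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks
            bmu = best_matching_unit(x, weights)
            for k in range(n_nodes):
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[k][d] += lr * h * (x[d] - weights[k][d])
    return weights
```

Note that the number of nodes is fixed up front here, which is exactly the limitation the dynamic approach in the abstract is designed to remove.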

15.
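[Objective] To quantitatively evaluate the effectiveness of the rust fungus Puccinia spegazzinii in controlling Mikania micrantha, an efficient and accurate method based on image recognition was developed for calculating the relative lesion area of M. micrantha leaves. [Methods] Three methods of calculating relative area (image recognition, the grid method and the photocopy-weighing method) were each used to calculate the relative lesion area of M. micrantha after infection with the rust. Taking manual segmentation results as the standard, the absolute accuracy and absolute error of each method were calculated and used as evaluation indices; finally, the three...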
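The abstract is truncated before its results, so the following only sketches how relative lesion area and absolute error could be computed from binary masks, assuming segmented leaf and lesion masks are available. The grid-sampling function is a hypothetical digital analogue of the grid method, not the paper's procedure.

```python
def relative_lesion_area(lesion_mask, leaf_mask):
    """Pixel-counting estimate: lesion pixels inside the leaf / leaf pixels.
    Masks are equal-sized 2-D lists of 0/1 values."""
    leaf = lesion = 0
    for lesion_row, leaf_row in zip(lesion_mask, leaf_mask):
        for is_lesion, is_leaf in zip(lesion_row, leaf_row):
            if is_leaf:
                leaf += 1
                lesion += bool(is_lesion)
    return lesion / leaf


def grid_point_estimate(lesion_mask, leaf_mask, step):
    """Hypothetical grid-method analogue: only pixels on a lattice of
    spacing `step` are counted."""
    def subsample(mask):
        return [row[::step] for row in mask[::step]]
    return relative_lesion_area(subsample(lesion_mask), subsample(leaf_mask))


def absolute_error(estimate, manual_reference):
    """Absolute difference from the manual-segmentation reference value."""
    return abs(estimate - manual_reference)
```

Coarser grids are faster to count but can miss small lesions entirely, which is one reason to compare each method against a manual reference.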

16.
Bayesian hierarchical error model for analysis of gene expression data (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Analysis of genome-wide microarray data requires estimating a large number of genetic parameters for individual genes and their interaction expression patterns under multiple biological conditions. The sources of microarray error variability comprise various biological and experimental factors, such as biological and individual replication, sample preparation, hybridization and image processing. Moreover, the same gene often shows quite heterogeneous error variability under different biological and experimental conditions, which must be estimated separately to evaluate the statistical significance of differential expression patterns. Widely used linear modeling approaches are limited because they do not allow simultaneous modeling and inference on the large number of these genetic parameters and on the heterogeneous error components of different genes, different biological and experimental conditions, and varying intensity ranges in microarray data. RESULTS: We propose a Bayesian hierarchical error model (HEM) to overcome these restrictions. HEM accounts for heterogeneous error variability in an oligonucleotide microarray experiment. When both biological and experimental replicates are available, the error variability is decomposed into two components (experimental and biological error). HEM inference uses Markov chain Monte Carlo to estimate a large number of parameters from a single likelihood function for all genes. An F-like summary statistic based on the HEM estimates is proposed to identify genes differentially expressed under multiple conditions. The performance of HEM and its F-like statistic was examined with simulated data and two published microarray datasets (primate brain data and mouse B-cell development data). HEM was also compared with ANOVA on simulated data. AVAILABILITY: The software for HEM is available from the authors upon request.
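HEM itself relies on MCMC and is beyond a short example. For comparison, here is the classical per-gene one-way ANOVA F statistic, the kind of baseline HEM was compared against; it is not HEM's F-like statistic. Note that it pools a single within-condition error variance per gene, the sort of homogeneity assumption HEM is designed to relax.

```python
def anova_f(groups):
    """Classical one-way ANOVA F statistic for a single gene.

    groups: one list of (log-)expression values per condition.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    # One pooled error term across all conditions (homogeneous-variance assumption)
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Applying this gene by gene treats every gene and condition independently, whereas HEM shares information across genes through a single likelihood.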

17.
Improving gene quantification by adjustable spot-image restoration (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: One of the major factors complicating microarray image analysis is that microarray images are distorted by various types of noise. In this study, a robust framework is proposed that takes into account the effect of noise in microarray images in order to assist the demanding task of microarray image analysis. The proposed framework incorporates into the microarray image processing pipeline a novel combination of spot-adjustable image analysis and processing techniques, and consists of the following stages: (1) gridding to facilitate spot identification, (2) clustering (unsupervised discrimination between spot and background pixels) applied to the spot image for automatic local noise assessment, (3) modeling of the local image restoration process for spot image conditioning (adjustable Wiener restoration using an empirically determined degradation function), (4) automatic spot segmentation employing seeded region growing, (5) intensity extraction, and (6) assessment of the reproducibility (real data) and the validity (simulated data) of the extracted gene expression levels. RESULTS: Both simulated and real microarray images were used to assess the performance of the proposed framework against well-established methods implemented in publicly available software packages (Scanalyze and SPOT). On simulated images, the novel combination of techniques introduced in the proposed framework made the detection of spot areas and the extraction of spot intensities more accurate. On real images, the proposed framework proved more stable across replicates. The results indicate that the proposed framework improves spot segmentation and, consequently, the quantification of gene expression levels. AVAILABILITY: All algorithms were implemented in the Matlab (The Mathworks, Inc., Natick, MA, USA) environment. The code implementing microarray gridding, adaptive spot restoration and segmentation/intensity extraction is available upon request. Supplementary results and the simulated microarray images used in this study are available for download from: ftp://users:bioinformatics@mipa.med.upatras.gr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
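A simplified sketch of seeded region growing for a single spot, assuming the seed is the grid-estimated spot centre: 4-connected neighbours are added while their intensity stays within `tolerance` of the region's running mean, and the spot intensity is then taken as the mean over the region. The classic seeded-region-growing formulation processes candidate pixels in order of similarity with a priority queue; this breadth-first version, the tolerance parameter and the function names are simplifications, not the paper's implementation.

```python
from collections import deque


def seeded_region_grow(image, seed, tolerance):
    """Grow a 4-connected region from `seed` (row, col), accepting neighbours
    whose intensity is within `tolerance` of the current region mean."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - total / len(region)) <= tolerance:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    queue.append((nr, nc))
    return region


def spot_intensity(image, region):
    """Mean foreground intensity of the segmented spot."""
    return sum(image[r][c] for r, c in region) / len(region)
```

In a pipeline like the one described, running this on the restored (Wiener-filtered) spot image rather than the raw one would make the tolerance test less sensitive to noise.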

18.
Ho SY, Hsieh CH, Chen HM, Huang HL. Bio Systems, 2006, 85(3): 165-176
An accurate classifier with linguistic interpretability that uses a small number of relevant genes is beneficial for microarray data analysis and for developing inexpensive diagnostic tests. Several techniques frequently used to design classifiers for microarray data, such as support vector machines, neural networks, k-nearest neighbors and logistic regression models, suffer from low interpretability. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be optimized simultaneously: maximal classification accuracy, minimal number of rules, and minimal number of genes used. An "intelligent" genetic algorithm (IGA) is used to efficiently solve this design problem, which has a large number of tuning parameters. The performance of iGEC is evaluated on eight commonly used data sets. On average, iGEC yields an accurate, concise and interpretable rule base, with a test classification accuracy of 87.9%, 3.9 rules (1.1 rules per class) and 5.0 genes used. Moreover, iGEC not only outperforms the existing fuzzy rule-based classifier on these objectives but is also more accurate than some existing non-rule-based classifiers.
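A sketch of how a compact fuzzy rule base classifies a sample, assuming triangular membership functions, min for the AND within a rule, and single-winner inference. iGEC's actual membership functions and the IGA that tunes them are not reproduced here; gene names and rule parameters are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def classify(sample, rules):
    """Single-winner inference.

    rules: list of (antecedents, label), where antecedents maps a gene name to
    its (a, b, c) membership parameters. Each rule fires with the minimum of
    its antecedent memberships; the strongest rule's label wins, or None if
    no rule fires.
    """
    best_class, best_strength = None, 0.0
    for antecedents, label in rules:
        strength = min(triangular(sample[g], *abc) for g, abc in antecedents.items())
        if strength > best_strength:
            best_class, best_strength = label, strength
    return best_class
```

Each rule reads as a sentence such as "if geneA is high and geneB is low then tumour", which is where the linguistic interpretability comes from; fewer rules and genes make the rule base shorter to read.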

19.
MOTIVATION: Although numerous algorithms have been developed for microarray segmentation, extensive comparisons between them have received far less attention. In this study, we evaluate the performance of nine microarray segmentation algorithms. Using both simulated and real microarray experiments, we overcome the challenges in performance evaluation that arise from the lack of ground-truth information. Simulated experiments allow us to analyze segmentation accuracy at the single-pixel level, as is commonly done in traditional image processing studies. With real experiments, we measure segmentation performance indirectly, identify significant differences between the algorithms, and study the characteristics of the resulting gene expression data. RESULTS: Overall, our results show clear differences between the algorithms. They demonstrate how segmentation performance depends on image quality, which algorithms operate at significantly different performance levels, and how the choice of segmentation algorithm affects the identification of differentially expressed genes. AVAILABILITY: Supplementary results and the microarray images used in this study are available at the companion web site http://www.cs.tut.fi/sgn/csb/spotseg/
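On simulated images the true spot mask is known, so segmentation can be scored pixel by pixel. Below is a sketch using two common measures, overall pixel accuracy and the Jaccard index of the spot region; the specific metrics used in the study are not stated in the abstract.

```python
def segmentation_scores(predicted, truth):
    """Pixel-level comparison of a binary spot mask with simulated ground truth.
    Both masks are equal-sized 2-D lists of 0/1 values."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    union = tp + fp + fn
    return {"accuracy": (tp + tn) / total,
            "jaccard": tp / union if union else 1.0}
```

Pixel accuracy is dominated by the large background area, so an overlap measure such as the Jaccard index is usually the more discriminating of the two for small spots.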

20.
In the construction of smart cities, the Global Positioning System (GPS) produces large volumes of vehicle trajectory data for tracking vehicles' real-time locations. Once these GPS data are processed by map matching, the results can support a large number of intelligent transportation system (ITS) applications, such as real-time road condition calculation, traffic event inspection and emergency response. However, with the rapid, explosive growth in the number of monitored vehicles, massive GPS data pose overwhelming challenges for map matching, and traditional map-matching algorithms can hardly meet the high demands for matching speed and accuracy. Therefore, a real-time map-matching algorithm for massive GPS data is proposed to guarantee high matching accuracy and efficiency while meeting the GPS data processing demands of monitoring numerous vehicles across a city. The main contributions of the method are: (1) a Kalman filter-based correction algorithm that improves the matching accuracy of the traditional topological algorithm on complicated road sections such as intersections and parallel roads; (2) conversion of the serial map-matching algorithm into a parallel one based on the Spark Streaming framework, which significantly improves map-matching efficiency; and (3) a gridding method suited to the parallel algorithm, in which GPS data in the same grid cell are allocated to the same computing unit to improve the efficiency of the parallel computation. Experimental results show that the proposed algorithm increases matching accuracy by 10% and that its matching efficiency is 25% higher than that of the same number of stand-alone computers. A cluster of 15 computers running the proposed algorithm can perform real-time map matching of GPS data produced by 800 000 vehicles, effectively supporting the continually growing demand for processing large volumes of GPS data.
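A sketch of the gridding idea in contribution (3): each GPS fix is keyed by the grid cell it falls in, and every cell is assigned to exactly one worker, so fixes in the same cell are processed by the same computing unit. The cell size in degrees, the round-robin assignment policy and the (vehicle_id, lat, lon) record layout are assumptions. In a Spark Streaming job the cell key would typically drive the partitioning, but no Spark API is shown here, and trajectories that cross cell boundaries are not handled.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg):
    """Integer (row, col) of the grid cell containing a GPS fix."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def partition_by_grid(fixes, cell_deg, n_workers):
    """Group fixes by grid cell and give each cell to exactly one worker.

    fixes: iterable of (vehicle_id, lat, lon) tuples (hypothetical layout).
    Returns {worker_index: [fixes]}.
    """
    cells = defaultdict(list)
    for fix in fixes:
        cells[grid_cell(fix[1], fix[2], cell_deg)].append(fix)
    workers = defaultdict(list)
    # Illustrative policy: deterministic round-robin over sorted cell keys
    for idx, key in enumerate(sorted(cells)):
        workers[idx % n_workers].extend(cells[key])
    return dict(workers)
```

Keeping a cell on one worker means the road-network candidates for that area only need to be loaded once per worker; a load-aware assignment would balance dense downtown cells better than round-robin.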


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417