Found 20 similar articles (search took 15 ms)
1.
2.
The power of structured exploratory data analysis (SEDA) to discriminate among major genic, polygenic, and nongenetic determination of phenotypes was investigated using computer simulation. Three classes of SEDA indices (the major gene index, the offspring between parents function, and the midparent-child correlation coefficient) were evaluated. These three statistics, in combination, were reasonably sensitive in detecting the presence of a major locus and in discriminating between phenotypes with genetic effects and those with no genetic component. However, they were unable to discriminate between major genic and polygenically determined phenotypic models.
3.
There are many data mining techniques for processing and general learning of multivariate data. However, we believe the wavelet transformation and latent variable projection methods are particularly useful for spectroscopic and chromatographic data. Projection-based methods are designed to handle the highly multivariate nature of such data effectively. For the actual analysis, we used latent variable projection methods such as principal component analysis (PCA) and partial least squares projection to latent structures based discriminant analysis (PLS-DA) to analyze the raw data presented to the participants of the First Duke Proteomics Data Mining Conference. PCA was used to solve problem #1 (the clustering problem) and PLS-DA was used to solve problem #2 (the classification problem). Internal and external cross-validation were used to validate the model obtained from the classification analysis. The simple two-component PLS-DA model obtained from the analysis performed well: it completely separated the two groups when built on all the data. The same model built on two-thirds of the data performed well under external validation with an independent test set of the remaining 13 specimens, obtained by setting aside the spectra of every third specimen (accuracy of 85%).
4.
Statistical practice in high-throughput screening data analysis (cited by: 1; self-citations: 0; citations by others: 1)
High-throughput screening is an early critical step in drug discovery. Its aim is to screen a large number of diverse chemical compounds to identify candidate 'hits' rapidly and accurately. Few statistical tools are currently available, however, to detect quality hits with a high degree of confidence. We examine statistical aspects of data preprocessing and hit identification for primary screens. We focus on concerns related to positional effects of wells within plates, choice of hit threshold and the importance of minimizing false-positive and false-negative rates. We argue that replicate measurements are needed to verify assumptions of current methods and to suggest data analysis strategies when assumptions are not met. The integration of replicates with robust statistical methods in primary screens will facilitate the discovery of reliable hits, ultimately improving the sensitivity and specificity of the screening process.
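The abstract above argues for combining replicates with robust statistics when choosing a hit threshold. A minimal sketch of one common way to do this is shown below: summarise each compound by the median of its replicates, then score it with a median/MAD-based robust z-score. The compound names, measurements, the threshold of 3, and the function names are all hypothetical, and this is not necessarily the procedure the authors recommend.

```python
import statistics


def robust_z_scores(values):
    """Return (x - median) / (1.4826 * MAD) for every value.

    The factor 1.4826 makes the MAD comparable to a standard deviation
    for normally distributed data.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / scale for v in values]


def call_hits(replicates, threshold=3.0):
    """Flag compounds whose replicate median has |robust z| >= threshold.

    replicates: dict mapping compound name -> list of replicate readings.
    Returns a dict of hit name -> robust z-score.
    """
    names = list(replicates)
    summaries = [statistics.median(replicates[name]) for name in names]
    scores = robust_z_scores(summaries)
    return {n: z for n, z in zip(names, scores) if abs(z) >= threshold}


# Hypothetical plate: three replicate readings per compound.
plate = {
    "c1": [100, 102, 98],
    "c2": [101, 99, 100],
    "c3": [97, 103, 99],
    "c4": [40, 42, 38],   # strong inhibitor in this made-up example
    "c5": [102, 100, 101],
    "c6": [99, 98, 100],
}
hits = call_hits(plate)
print(sorted(hits))
```

Because the median and MAD are insensitive to a few extreme wells, the strong outlier does not inflate the scale estimate the way it would inflate a mean and standard deviation. Positional (row and column) effects, which the abstract also raises, are not corrected in this sketch.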
5.
Chalini D. Wijetunge Zhaoping Li Isaam Saeed Jairus Bowne Arthur L. Hsu Ute Roessner Antony Bacic Saman K. Halgamuge 《Metabolomics : Official journal of the Metabolomic Society》2013,9(6):1311-1320
To make sense of the sheer volume of metabolomic data that current technology can generate, robust data analysis tools are essential. We propose the use of the growing self-organizing map (GSOM) algorithm and demonstrate, on simulated and real-world time-course metabolomic datasets, that it allows a deeper analysis of metabolomics data than the widely used batch-learning self-organizing map, hierarchical cluster analysis and partitioning around medoids algorithms. We then applied GSOM to a recently published dataset representing metabolome response patterns of three wheat cultivars subjected to a field-simulated cyclic drought stress. The novel and information-rich analysis provided by the proposed GSOM framework can be easily extended to other high-throughput metabolomics studies.
6.
Approximate geodesic distances reveal biologically relevant structures in microarray data (cited by: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Genome-wide gene expression measurements, as currently determined by microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, restricting the cellular gene expression profiles to a certain manifold, or surface, in gene expression space. To obtain knowledge about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on curved manifolds, a sensible distance measure would be the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance. RESULTS: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive, biologically relevant visualizations when applying multidimensional scaling. This suggests the Isomap algorithm as a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
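The distance step that Isomap relies on can be illustrated with a small sketch: connect each point to its nearest neighbours, then take shortest-path lengths through that graph as approximate geodesic distances. The example below uses made-up points on a semicircle rather than expression profiles, and the helper names and the choice of k = 2 are assumptions for illustration only.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph weighted by Euclidean distance."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            d = math.dist(points[i], points[j])
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_from(graph, source):
    """Dijkstra shortest-path lengths from source to every node."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy manifold: 21 points along a unit semicircle (a curved 1-D manifold in 2-D).
points = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
graph = knn_graph(points, k=2)

straight = math.dist(points[0], points[-1])   # Euclidean distance between the ends
along_curve = geodesic_from(graph, 0)[20]     # graph approximation of the arc length
print(round(straight, 3), round(along_curve, 3))
```

The two endpoints are close in Euclidean terms but far apart along the curve, which is the kind of discrepancy the abstract attributes to nonlinear structure. Isomap then feeds such graph distances into multidimensional scaling, a step not shown here.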
7.
Max Bylesjö Daniel Eriksson Andreas Sjödin Stefan Jansson Thomas Moritz Johan Trygg 《BMC bioinformatics》2007,8(1):207
Background
During generation of microarray data, various forms of systematic bias are frequently introduced, which limit the accuracy and precision of the results. In order to properly estimate biological effects, these biases must be identified and discarded.
8.
Background
With the rapid advancement of array-based genotyping techniques, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with common complex diseases. However, it has been shown that only a small proportion of the genetic etiology of complex diseases can be explained by the genetic factors identified from GWAS. This missing heritability could possibly be explained by gene-gene interaction (epistasis) and rare variants. There has been an exponential growth of gene-gene interaction analysis for common variants in terms of methodological developments and practical applications. Also, the recent advancement of high-throughput sequencing technologies makes it possible to conduct rare variant analysis. However, little progress has been made in gene-gene interaction analysis for rare variants.
Results
Here, we propose GxGrare, a new gene-gene interaction method for rare variants in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of three steps: 1) collapsing the rare variants, 2) MDR analysis of the collapsed rare variants, and 3) detecting the top candidate interaction pairs. GxGrare can be used to detect not only gene-gene interactions, but also interactions within a single gene. The proposed method is illustrated with 1080 whole-exome sequencing data of the Korean population in order to identify causal gene-gene interactions for rare variants in type 2 diabetes.
Conclusion
The proposed GxGrare performs well for gene-gene interaction detection with collapsing of rare variants. GxGrare is available at http://bibs.snu.ac.kr/software/gxgrare, which contains simulation data and documentation. Supported operating systems include Linux and OS X.
9.
Larissa A. Munishkina 《Biochimica et Biophysica Acta (BBA) - Biomembranes》2007,1768(8):1862-1885
Amyloidogenesis is a characteristic feature of the 40 or so known protein deposition diseases, and accumulating evidence strongly suggests that self-association of misfolded proteins into either fibrils, protofibrils, or soluble oligomeric species is cytotoxic. The most likely mechanism for toxicity is through perturbation of membrane structure, leading to increased membrane permeability and eventual cell death. There have been a rather limited number of investigations of the interactions of amyloidogenic polypeptides and their aggregated states with membranes; these are briefly reviewed here. Amyloidogenic proteins discussed include A-beta from Alzheimer's disease, the prion protein, α-synuclein from Parkinson's disease, transthyretin (FAP, SSA amyloidosis), immunoglobulin light chains (primary (AL) amyloidosis), serum amyloid A (secondary (AA) amyloidosis), amylin or IAPP (Type 2 diabetes) and apolipoproteins. This review highlights the significant role played by fluorescence techniques in unraveling the nature of amyloid fibrils and their interactions and effects on membranes. Fluorescence spectroscopy is a valuable and versatile method for studying the complex mechanisms of protein aggregation, amyloid fibril formation and the interactions of amyloidogenic proteins with membranes. Commonly used fluorescent techniques include intrinsic and extrinsic fluorophores, fluorescent probes incorporated in the membrane, steady-state and lifetime measurements of fluorescence emission, fluorescence correlation spectroscopy, fluorescence anisotropy and polarization, fluorescence resonance energy transfer (FRET), fluorescence quenching, and fluorescence microscopy.
10.
Analysing microarray data using modular regulation analysis (cited by: 3; self-citations: 0; citations by others: 3)
MOTIVATION: Microarray experiments measure complex changes in the abundance of many mRNAs under different conditions. Current analysis methods cannot distinguish between direct and indirect effects on expression, or calculate the relative importance of mRNAs in effecting responses. RESULTS: Application of modular regulation analysis to microarray data reveals and quantifies which mRNA changes are important for cellular responses. The mRNAs are clustered, and then we calculate how perturbations alter each cluster and how strongly those clusters affect an output response. The product of these values quantifies how an input changes a response through each cluster. Two published datasets are analysed. Two mRNA clusters transmit most of the response of yeast doubling time to galactose; one contains mainly galactose metabolic genes, and the other a regulatory gene. Analysis of the response of yeast relative fitness to 2-deoxy-D-glucose reveals that control is distributed between several mRNA clusters, but experimental error limits statistical significance.
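The product rule described in this abstract can be shown with a small worked example. For each cluster, the change the input causes in the cluster is multiplied by the sensitivity of the output to that cluster, giving the part of the response routed through it. The cluster names and all numbers below are hypothetical, and the sketch leaves out how the two factors are estimated from microarray data.

```python
def partial_responses(cluster_response, output_sensitivity):
    """Multiply, per cluster, the input->cluster effect by the cluster->output effect."""
    return {
        name: cluster_response[name] * output_sensitivity[name]
        for name in cluster_response
    }


# Hypothetical values: how strongly the input shifts each mRNA cluster ...
cluster_response = {"A": 2.0, "B": -0.5, "C": 0.1}
# ... and how strongly each cluster affects the measured output.
output_sensitivity = {"A": 0.4, "B": 0.6, "C": 0.0}

contributions = partial_responses(cluster_response, output_sensitivity)
total = sum(contributions.values())
for name, value in contributions.items():
    print(name, round(value, 3))
print("total", round(total, 3))
```

In this made-up case cluster A carries most of the response, cluster B partly opposes it, and cluster C changes with the input but does not affect the output at all.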
11.
12.
13.
Multivariate exploratory tools for microarray data analysis (cited by: 2; self-citations: 0; citations by others: 2)
Szabo A Boucher K Jones D Tsodikov AD Klebanov LB Yakovlev AY 《Biostatistics (Oxford, England)》2003,4(4):555-567
The ultimate success of microarray technology in basic and applied biological sciences depends critically on the development of statistical methods for gene expression data analysis. The most widely used tests for differential expression of genes are essentially univariate. Such tests disregard the multidimensional structure of microarray data. Multivariate methods are needed to utilize the information hidden in gene interactions and hence to provide more powerful and biologically meaningful methods for finding subsets of differentially expressed genes. The objective of this paper is to develop methods of multidimensional search for biologically significant genes, considering expression signals as mutually dependent random variables. To attain these ends, we consider the utility of a pertinent distance between random vectors and its empirical counterpart constructed from gene expression data. The distance furnishes exploratory procedures aimed at finding a target subset of differentially expressed genes. To determine the size of the target subset, we resort to successive elimination of smaller subsets resulting from each step of a random search algorithm based on maximization of the proposed distance. Different stopping rules associated with this procedure are evaluated. The usefulness of the proposed approach is illustrated with an application to the analysis of two sets of gene expression data.
14.
15.
Boscolo R Liao JC Roychowdhury VP 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2008,5(1):15-24
In this article, we introduce an exploratory framework for learning patterns of conditional co-expression in gene expression data. The main idea behind the proposed approach consists of estimating how the information content shared by a set of M nodes in a network (where each node is associated with an expression profile) varies upon conditioning on a set of L conditioning variables (in the simplest case represented by a separate set of expression profiles). The method is non-parametric and is based on the concept of statistical co-information, which, unlike conventional correlation-based techniques, is not restricted in scope to linear conditional dependency patterns. Moreover, such conditional co-expression relationships can potentially indicate regulatory interactions that do not manifest themselves when only pair-wise relationships are considered. A moment-based approximation of the co-information measure is derived that efficiently gets around the problem of estimating high-dimensional multivariate probability density functions from the data, a task usually not viable due to the intrinsic sample size limitations that characterize expression level measurements. By applying the proposed exploratory method, we analyzed a whole-genome microarray assay of the eukaryote Saccharomyces cerevisiae and were able to learn statistically significant patterns of conditional co-expression. A selection of such interactions that carry a meaningful biological interpretation is discussed.
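To make the idea of co-information concrete, here is a minimal sketch for three discrete variables, using the definition I(X;Y) - I(X;Y|Z) with plug-in entropy estimates. This is not the moment-based approximation derived in the article, which targets continuous expression data. The function names and the XOR toy data are assumptions chosen for illustration, and sign conventions for this quantity differ between authors.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))


def co_information(x, y, z):
    """How much the information shared by X and Y changes when conditioning on Z."""
    return mutual_information(x, y) - conditional_mutual_information(x, y, z)


# Toy data: X and Y are independent bits, Z is their XOR.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

pairwise = mutual_information(x, y)   # no pairwise dependence between X and Y
coinfo = co_information(x, y, z)      # dependence appears only given Z
print(pairwise, coinfo)
```

X and Y look unrelated pairwise, yet become fully dependent once Z is known. This is the kind of relationship the abstract says is missed when only pair-wise relationships are considered.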
16.
Andrea C Pfeifer Daniel Kaschek Julie Bachmann Ursula Klingmüller Jens Timmer 《BMC systems biology》2010,4(1):106
Background
High-quality quantitative data is a major limitation in systems biology. The experimental data used in systems biology can be assigned to one of the following categories: assays yielding average data of a cell population, high-content single cell measurements and high-throughput techniques generating single cell data for large cell populations. For modeling purposes, a combination of data from different categories is highly desirable in order to increase the number of observable species and processes and thereby maximize the identifiability of parameters.
17.
18.
Kamentsky L Jones TR Fraser A Bray MA Logan DJ Madden KL Ljosa V Rueden C Eliceiri KW Carpenter AE 《Bioinformatics (Oxford, England)》2011,27(8):1179-1180
There is a strong and growing need in the biology research community for accurate, automated image analysis. Here, we describe CellProfiler 2.0, which has been engineered to meet the needs of its growing user base. It is more robust and user friendly, with new algorithms and features to facilitate high-throughput work. ImageJ plugins can now be run within a CellProfiler pipeline. AVAILABILITY AND IMPLEMENTATION: CellProfiler 2.0 is free and open source, available at http://www.cellprofiler.org under the GPL v. 2 license. It is available as a packaged application for Macintosh OS X and Microsoft Windows and can be compiled for Linux. CONTACT: anne@broadinstitute.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
19.
A method for high-throughput gene expression signature analysis (cited by: 1; self-citations: 2; citations by others: 1)
20.
The last decade has witnessed a remarkable increase in the number of mutations identified both in human disease-related genes and in mutation reporter genes, including those in mammalian cells and transgenic animals. This has led to the curation of a number of computerised databases, which make mutation data freely available for analysis. A primary interest of both the clinical researcher and the genetic toxicologist is determining the location and types of mutation within a gene of interest. Collections of mutation data observed for a disease-related gene, or for a gene exposed to a particular chemical, permit discovery of regions of sequence along the gene prone to mutagenesis and may provide clues to the origin of a mutation. The principal tool for visualising the distribution pattern of mutant data along a gene is the mutation spectrum: the distribution and frequency of mutations along a nucleotide sequence. In genetic toxicology, the current wealth of mutation data allows us to construct many mutation spectra of interest in order to investigate the mutagenic mechanisms and mutational sites for one or a group of mutagens. We have tested the ability of the multivariate statistical methods principal components analysis (PCA) and cluster analysis (CA) to establish the underlying patterns within and between 60 UV-induced, mitomycin C-induced and spontaneous mutations in the supF gene. The spectra were derived from human, monkey and mouse cells, including both repair-efficient and repair-deficient cell lines. We demonstrate and support the successful application of multivariate statistical methods for exploring large sets of mutation spectra to reveal underlying patterns, groupings and similarities. The methods clearly demonstrate how different patterns of spontaneous and UV-induced supF mutation spectra can result from variation in plasmid, culture medium, species origin of cell line and whether mutations arose in vivo or in vitro.
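As an illustration of how PCA can expose groupings among mutation spectra, the sketch below turns per-site mutation counts into frequency spectra and extracts the first principal component by power iteration. The four spectra, their labels and the five-site layout are invented for the example and are not supF data. The pure-Python power iteration stands in for whatever PCA software the authors used, and the cluster analysis step is not shown.

```python
import math


def to_frequencies(counts):
    """Convert per-site mutation counts into a frequency spectrum."""
    total = sum(counts)
    return [c / total for c in counts]


def first_principal_component(rows, iterations=200):
    """Return (loading vector, per-row scores) for the leading principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Start from a unit vector on the first site. An all-ones start would be
    # mapped to zero, because centred frequency spectra sum to zero per row.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("power iteration collapsed; choose another start vector")
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores


# Hypothetical counts at five sites for four spectra (labels are illustrative).
labels = ["uv_1", "uv_2", "spont_1", "spont_2"]
counts = [
    [8, 6, 1, 0, 1],
    [7, 7, 2, 1, 0],
    [1, 0, 2, 7, 8],
    [0, 1, 1, 8, 6],
]
spectra = [to_frequencies(row) for row in counts]
loadings, scores = first_principal_component(spectra)
for label, score in zip(labels, scores):
    print(label, round(score, 3))
```

Spectra with similar hotspots receive scores of the same sign on the first component, so the two invented groups fall on opposite sides of zero. The sign of a principal component is arbitrary, so only the relative positions are meaningful.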