Similar Documents
20 similar documents found (search time: 15 ms)
1.
2.
MOTIVATION: We face the absence of optimized standards to guide normalization, comparative analysis, and interpretation of data sets. One aspect of this is that current methods of statistical analysis do not adequately use the information inherent in the large data sets generated in a microarray experiment and require a tradeoff between detection sensitivity and specificity. RESULTS: We present a multistep procedure for analyzing mRNA expression data obtained from cDNA array methods. To identify and classify differentially expressed genes, results from a standard paired t-test of normalized data are compared with those from a novel method, denoted associative analysis. This method expresses experimental gene expression as residuals from a regression against averaged control expression and associates these residuals with a common standard: the family of similarly computed residuals for low-variability genes derived from control experiments. By associating changes in expression of a given gene with a large family of equally expressed genes of the control group, this method uses the large data sets inherent in microarray experiments to increase both specificity and sensitivity. The overall procedure is illustrated by tabulating genes whose expression differs significantly between Snell dwarf mice (dw/dw) and their phenotypically normal littermates (dw/+, +/+). Of the 2,352 genes examined, only 450-500 were expressed above the background levels observed in nonexpressed genes, and of these, 120 were established as differentially expressed in dwarf mice at a significance level that rules out false positive determinations.
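A minimal Python sketch of the associative idea follows. It assumes ordinary least squares for the experiment-versus-control-average regression and a single reference family of residuals; the abstract matches each gene to equally expressed low-variability control genes, which this sketch does not model. The function names `ols_residuals` and `associative_z` are hypothetical.

```python
from statistics import mean, stdev


def ols_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def associative_z(experimental, control_avg, reference_residuals):
    """Standardize each gene's residual against a reference family of residuals.

    experimental: one expression value per gene in the treated sample
    control_avg: averaged control expression for the same genes
    reference_residuals: residuals computed the same way for low-variability
        control genes (assumed to be supplied by the caller)
    """
    resid = ols_residuals(control_avg, experimental)
    mu, sd = mean(reference_residuals), stdev(reference_residuals)
    return [(r - mu) / sd for r in resid]
```

Genes whose |z| is large relative to the reference family would be candidates for differential expression; the threshold is left to the analyst in this sketch.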

3.
Sharing of microarray data has many advantages for the scientific and biomedical community, and should be advocated by neuroscience journals. The goals of sharing are manifold, and include improving analysis and confidence in results, and facilitating global comparisons between experiments, while at the same time, not penalizing those who share. The sharing of microarray data poses unique challenges relative to more generic data such as DNA sequences. These challenges are surmountable, and various sharing formats are possible. Centralized non-commercial databases are being developed to facilitate this process.

4.
5.
We consider an extension of linear mixed models by assuming a multivariate skew t distribution for the random effects and a multivariate t distribution for the error terms. The proposed model provides flexibility in capturing the effects of skewness and heavy tails simultaneously among continuous longitudinal data. We present an efficient alternating expectation‐conditional maximization (AECM) algorithm for the computation of maximum likelihood estimates of parameters on the basis of two convenient hierarchical formulations. The techniques for the prediction of random effects and intermittent missing values under this model are also investigated. Our methodologies are illustrated through an application to schizophrenia data.

6.
7.
An objective of many functional genomics studies is to estimate treatment-induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By-gene hypothesis tests evaluate the evidence regarding the null hypothesis of no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p-values from these tests order the genes such that a p-value cutoff divides the genes into two sets. Ideally one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will contain false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p-values (1 - p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive rates and false negative rates associated with any p-value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes than in protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends on the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision-theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value.
Two functional genomics studies designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to choose a cutoff suited to their research goals.
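The estimation steps can be sketched in Python under stated assumptions; this is not the authors' procedure. It estimates the number of true nulls m0 from the count of p-values above a hypothetical tuning value `lam` (a Schweder-Spjøtvoll-style estimator, used here as a stand-in for reading m0 off the plot), and it replaces the ROC-style decision-theoretic rule with a plain minimization of estimated misclassification cost over a list of candidate cutoffs. All function names are hypothetical.

```python
def estimate_null_count(pvalues, lam=0.5):
    """Estimate m0, the number of true null hypotheses (unaffected genes).

    Null p-values are Uniform[0, 1], so roughly m0 * (1 - lam) of them exceed
    lam, while affected genes contribute few p-values above lam.
    """
    above = sum(1 for p in pvalues if p > lam)
    return min(len(pvalues), above / (1.0 - lam))


def cutoff_summary(pvalues, alpha, lam=0.5):
    """Estimated errors and rates when genes with p <= alpha are called affected."""
    m = len(pvalues)
    m0 = estimate_null_count(pvalues, lam)
    selected = sum(1 for p in pvalues if p <= alpha)
    false_pos = min(selected, m0 * alpha)       # nulls expected below the cutoff
    true_pos = selected - false_pos
    false_neg = max(0.0, (m - m0) - true_pos)   # affected genes left unselected
    not_selected = m - selected
    return {
        "m0": m0,
        "selected": selected,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "fdr": false_pos / selected if selected else 0.0,
        "fndr": false_neg / not_selected if not_selected else 0.0,
    }


def choose_cutoff(pvalues, cutoffs, cost_fp=1.0, cost_fn=1.0, lam=0.5):
    """Pick the candidate cutoff with the smallest estimated misclassification cost."""
    def cost(alpha):
        s = cutoff_summary(pvalues, alpha, lam)
        return cost_fp * s["false_pos"] + cost_fn * s["false_neg"]
    return min(cutoffs, key=cost)
```

Raising `cost_fn` relative to `cost_fp` pushes the chosen cutoff upward, which matches the stated preference for capturing most affected genes at the price of some false positives.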

8.
9.
MOTIVATION: Analysis of gene expression data can provide insights into the time-lagged co-regulation of genes/gene clusters. However, existing methods such as the Event Method and the Edge Detection Method are inefficient because they compare only two genes at a time. More importantly, they neglect some important information due to their scoring criterion. In this paper, we propose an efficient algorithm to identify time-lagged co-regulated gene clusters. The algorithm facilitates localized comparison and processes several genes simultaneously to generate detailed and complete time-lagged information for genes/gene clusters. RESULTS: We experimented with the time-series yeast gene dataset and compared our algorithm with the Event Method. Our results show that our algorithm is not only efficient, but also delivers more reliable and detailed information on time-lagged co-regulation between genes/gene clusters. AVAILABILITY: The software is available upon request. CONTACT: jiliping@comp.nus.edu.sg SUPPLEMENTARY INFORMATION: Supplementary tables and figures for this paper can be found at http://www.comp.nus.edu.sg/~jiliping/p2.htm.
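To make "time-lagged co-regulation" concrete, here is a simple pairwise notion of it in Python: shift one profile against another and keep the lag with the highest Pearson correlation. This is neither the Event Method nor the authors' multi-gene algorithm, and the names `best_lag` and `max_lag` are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def best_lag(leader, follower, max_lag):
    """Return (lag, r) maximizing the correlation of leader[t] with follower[t + lag].

    max_lag must be smaller than len(leader) - 1 so every shifted window
    still holds at least two time points.
    """
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        r = pearson(leader[:-lag], follower[lag:])
        if r > best[1]:
            best = (lag, r)
    return best
```

Because each call compares only two genes, this baseline shares the limitation the abstract criticizes; it is shown only to fix the notion of a lag.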

10.
Recent work has used graphs to model expression data from microarray experiments, with a view to partitioning the genes into clusters. In this paper, we introduce the use of a decomposition by clique separators. Our aim is to improve the classical clustering methods in two ways: first, we want to allow overlap between clusters, as this seems biologically sound; second, we want the structure of the graph to guide the choice of the number of clusters. We test this approach on a well-known yeast (Saccharomyces cerevisiae) database. Our results are good: the expression profiles of the clusters we find are very coherent. Moreover, we are able to organize the clusters we find into another graph and to order them in a way that turns out to respect the chronological order defined by the sporulation process.

11.
MOTIVATION: Chromosomal copy number changes (aneuploidies) are common in cell populations that undergo multiple cell divisions, including yeast strains, cell lines and tumor cells. Identification of aneuploidies is critical in evolutionary studies, where changes in copy number serve an adaptive purpose, as well as in cancer studies, where amplifications and deletions of chromosomal regions have been identified as a major pathogenetic mechanism. Aneuploidies can be studied at the whole-genome level using array CGH (a microarray-based method that measures DNA content), but their presence also affects gene expression. In gene expression microarray analysis, identification of copy number changes is especially important in preventing aberrant biological conclusions based on spurious gene expression correlation or masked phenotypes that arise due to aneuploidies. Previously suggested approaches for aneuploidy detection from microarray data mostly focus on array CGH, address only whole-chromosome or whole-arm copy number changes, and rely on thresholds or other heuristics, making them unsuitable for fully automated general application to gene expression datasets. There is a need for a general and robust method for identification of aneuploidies of any size from both array CGH and gene expression microarray data. RESULTS: We present ChARM (Chromosomal Aberration Region Miner), a robust and accurate expectation-maximization based method for identification of segmental aneuploidies (partial chromosome changes) from gene expression and array CGH microarray data. Systematic evaluation of the algorithm on synthetic and biological data shows that the method is robust to noise, aneuploidal segment size and P-value cutoff. Using our approach, we identify known chromosomal changes and predict novel potential segmental aneuploidies in commonly used yeast deletion strains and in breast cancer.
ChARM can be routinely used to identify aneuploidies in array CGH datasets and to screen gene expression data for aneuploidies or array biases. Our methodology is sensitive enough to detect statistically significant and biologically relevant aneuploidies even when expression or DNA content changes are subtle, as in mixed populations of cells. AVAILABILITY: Code is available from the authors on request and in the Web supplement at http://function.cs.princeton.edu/ChARM/

12.
MOTIVATION: Temporal gene expression profiles provide an important characterization of gene function, as biological systems are predominantly developmental and dynamic. We propose a method of classifying collections of temporal gene expression curves in which individual expression profiles are modeled as independent realizations of a stochastic process. The method uses a recently developed functional logistic regression tool based on functional principal components, aimed at classifying gene expression curves into known gene groups. The number of eigenfunctions in the classifier can be chosen by leave-one-out cross-validation with the aim of minimizing the classification error. RESULTS: We demonstrate that this methodology provides low-error-rate classification for both yeast cell-cycle gene expression profiles and Dictyostelium cell-type-specific gene expression patterns. It also works well in simulations. We compare our functional principal components approach with a B-spline implementation of functional discriminant analysis on the yeast cell-cycle data and in simulations. This comparison indicates advantages of our approach, which uses fewer eigenfunctions/basis functions. The proposed methodology is promising for the analysis of temporal gene expression data and beyond. AVAILABILITY: MATLAB programs are available upon request.

13.
The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in-house gene expression databases that benefit from coherent data sets and are constructed to facilitate a particular analytic method, but rather to complement them by acting as a tertiary, central data distribution hub. The three central data entities of GEO (platforms, samples and series) were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
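The three GEO entities can be pictured as a small data model. The Python sketch below is hypothetical: the class names, field names and example accessions are chosen for illustration and do not reproduce GEO's actual schema or submission format. It encodes only the relationships stated above: a platform lists probes, a sample references exactly one platform, and a series groups samples.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Platform:
    """A list of probes defining which molecules can be detected."""
    accession: str
    probes: List[str]


@dataclass
class Sample:
    """Abundance measurements for one probed specimen, tied to exactly one platform."""
    accession: str
    platform: Platform
    values: Dict[str, float]  # probe ID -> measured abundance

    def __post_init__(self):
        # Every measured probe must be defined by the referenced platform.
        unknown = set(self.values) - set(self.platform.probes)
        if unknown:
            raise ValueError(
                f"probes not on platform {self.platform.accession}: {sorted(unknown)}")


@dataclass
class Series:
    """Groups samples into the data set that makes up one experiment."""
    accession: str
    title: str
    samples: List[Sample] = field(default_factory=list)

    def platforms(self):
        """Accessions of all platforms used by the samples in this series."""
        return {s.platform.accession for s in self.samples}
```

Keeping the platform as a shared object referenced by each sample mirrors the one-platform-per-sample rule, while a series may still span several platforms.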

14.
J-Express is a Java application that allows the user to analyze gene expression (microarray) data in a flexible way giving access to multidimensional scaling, clustering, and visualization methods in an integrated manner. Specifically, J-Express includes implementations of hierarchical clustering, k-means, principal component analysis, and self-organizing maps. At present, it does not include methods for comparing two or more experiments for differentially expressed genes. The application is completely portable and requires only that a Java runtime environment 1.2 is installed on the system. Its efficiency allows interactive clustering of thousands of expression profiles on standard personal computers.
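J-Express itself is written in Java; purely to illustrate one of the methods it lists, here is a minimal k-means (Lloyd's algorithm) sketch over expression profiles in Python. The function names are hypothetical, and the naive initialization from the first k profiles is an assumption made for reproducibility; practical tools usually use random or k-means++ starts.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Cluster profiles into k groups; returns (labels, centroids)."""
    centroids = [list(p) for p in profiles[:k]]  # naive init: first k profiles
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, six two-point profiles forming two tight groups are separated into two clusters after a single update.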

15.
16.
SUMMARY: GAAS (Gene Array Analyzer Software) supports efficient multi-user management and analysis of large amounts of gene expression data across replicated experiments. Its management framework handles input data generated by different technologies. A multi-user environment allows each user to store his/her own data visualization scheme, the analysis parameters used, and the values and formats of the output data. The analysis engine performs background and spot quality evaluation, data normalization, and differential gene expression analysis in single- and multiple-replica experiments. Expression profile results can be navigated interactively through graphical interfaces and stored in output databases.

17.
Analysis of gene expression data using self-organizing maps. Cited 29 times (self-citations: 0; citations by others: 29)
DNA microarray technologies, together with rapidly increasing genomic sequence information, are leading to an explosion in available gene expression data. Currently there is a great need for efficient methods to analyze and visualize these massive data sets. A self-organizing map (SOM) is an unsupervised neural network learning algorithm that has been successfully used for the analysis and organization of large data files. Here we have applied the SOM algorithm to analyze published yeast gene expression data and show that SOM is an excellent tool for the analysis and visualization of gene expression profiles.
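A minimal one-dimensional SOM in Python illustrates the technique. It assumes a chain of nodes, a Gaussian neighborhood, and linearly decaying learning rate and radius; these choices and all names are illustrative and are not the configuration used in the study.

```python
import math


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(nodes)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(nodes[j], x)))


def train_som(data, n_nodes, epochs=100, lr0=0.5, radius0=None):
    """Train a 1-D self-organizing map: a chain of n_nodes weight vectors."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Deterministic init: nodes evenly spaced along the bounding-box diagonal.
    nodes = [[lo[d] + (hi[d] - lo[d]) * j / max(1, n_nodes - 1) for d in range(dim)]
             for j in range(n_nodes)]
    if radius0 is None:
        radius0 = n_nodes / 2
    for t in range(epochs):
        frac = 1.0 - t / epochs            # decays linearly from 1 toward 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)  # floor keeps the Gaussian well defined
        for x in data:
            bmu = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighborhood on the chain: nodes near the BMU move most.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes
```

After training, `best_matching_unit(nodes, profile)` assigns each expression profile to a node, so profiles mapped to the same or neighboring nodes can be inspected together.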

18.
The family Hydrocharitaceae, with 15 genera and ca. 80 species, shows a remarkable morphological diversity, which presumably developed as an adaptation to an aquatic habitat. This is particularly true of its many different pollination mechanisms. To gather more basic information on the adaptive evolution of Hydrocharitaceae, we carried out a phylogenetic analysis based on the sequences of rbcL and matK. The resulting neighbor-joining distance tree provides the following insights: (1) none of the previous classification systems is supported by the molecular phylogenetic tree; (2) Najas (Najadaceae), which has never been included in Hydrocharitaceae except in Shaffer-Fehre's (1991) system based on seed coat structures, is an ingroup of Hydrocharitaceae; (3) Limnocharitaceae and Alismataceae are sister groups of Hydrocharitaceae; (4) the three marine genera, Halophila, Enhalus and Thalassia, are monophyletic; and (5) a peculiar pollination mechanism specific to Hydrocharitaceae (Hydrocharitaceae-epihydrophily) underwent parallel evolution.

19.
Detailed studies of individual genes have shown that gene expression divergence often results from adaptive evolution of regulatory sequence. Genome-wide analyses, however, have yet to unite patterns of gene expression with polymorphism and divergence to infer the population genetic mechanisms underlying expression evolution. Here, we combined genomic expression data, analyzed in a phylogenetic context, with whole-genome light-shotgun sequence data from six Drosophila simulans lines and reference sequences from D. melanogaster and D. yakuba. These data allowed us to use molecular population genetics to test for neutral versus adaptive gene expression divergence on a genomic scale. We identified recent and recurrent adaptive evolution along the D. simulans lineage by contrasting sequence polymorphism within D. simulans with divergence from D. melanogaster and D. yakuba. Genes that evolved higher levels of expression in D. simulans have experienced adaptive evolution of the associated 3′ flanking and amino acid sequence. Concomitantly, these genes are also decelerating in their rates of protein evolution, which agrees with the finding that highly expressed genes evolve slowly. Interestingly, adaptive evolution in 5′ cis-regulatory regions did not correspond strongly with expression evolution. Our results provide a genomic view of the intimate link between selection acting on a phenotype and associated genic evolution.
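The abstract does not name its test, but the classic way to contrast within-species polymorphism with between-species divergence is the McDonald-Kreitman framework. As a swapped-in illustration only, the Python sketch below computes the neutrality index NI = (Pn/Ps)/(Dn/Ds) and the estimated fraction of adaptive substitutions alpha = 1 - NI from counts of nonsynonymous (n) and synonymous (s) polymorphisms (P) and fixed differences (D). The function name `mk_summary` is hypothetical.

```python
def mk_summary(dn, ds, pn, ps):
    """Neutrality index and alpha from McDonald-Kreitman counts.

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive")
    ni = (pn / ps) / (dn / ds)
    return ni, 1.0 - ni
```

NI below 1 (alpha above 0) indicates an excess of nonsynonymous divergence, as expected under adaptive evolution; in practice significance is assessed with a 2x2 contingency test such as Fisher's exact test.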

20.
Quantitative trait loci (QTLs), as determined in crossbred studies, are a valuable resource for identifying genes responsible for the corresponding phenotypic variance. Because QTLs typically extend over some dozens of megabases, further steps are necessary to reduce the number of candidate genes underlying the detected effects to a manageable size. We use a set of 13,370 SNPs to identify informative haplotype blocks in 22 mouse QTLs for fatness. About half of the genes in a typical QTL overlap with haplotype blocks that differ between the two base mouse lines and thus qualify for further analysis. For these genes we collect four more pieces of evidence for association with fat accumulation, namely (1) homology to genes identified as fat-decreasing or fat-increasing in a Caenorhabditis elegans knock-out experiment, (2) overexpression of the genes in mouse fat, liver, muscle, or hypothalamus tissue, (3) occurrence of a gene in several independently found QTLs, and (4) the information provided by Gene Ontology, to obtain a ranked list of 131 candidate genes. Ten genes fulfill three or four of these criteria and are discussed briefly; the 121 further genes fulfilling two criteria are provided as online material. Viewing the genomic regions of fatness-related QTLs from several different aspects is an appropriate way to assess the many thousands of genes that reside in such QTLs and to produce lists of more robust candidate genes.
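Once genes in differing haplotype blocks have been selected, the ranking by number of supporting criteria can be sketched in Python. The criterion keys and gene names are hypothetical placeholders for the four lines of evidence, and weighting all criteria equally is an assumption of this sketch.

```python
# Hypothetical keys standing in for the four lines of evidence in the abstract.
CRITERIA = ("celegans_homolog", "tissue_overexpressed", "multiple_qtls", "go_fat_related")


def rank_candidates(evidence, min_hits=2):
    """Rank genes by how many criteria they satisfy.

    evidence: gene name -> dict mapping criterion key -> bool
    Returns (hits, gene) pairs with hits >= min_hits, most hits first,
    ties broken alphabetically by gene name.
    """
    scored = [(sum(bool(flags.get(c)) for c in CRITERIA), gene)
              for gene, flags in evidence.items()]
    kept = [(n, g) for n, g in scored if n >= min_hits]
    kept.sort(key=lambda ng: (-ng[0], ng[1]))
    return kept
```

Calling `rank_candidates(evidence, min_hits=3)` would isolate the top tier analogous to the ten genes discussed in the text, while the default threshold of two also returns the second tier.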

