Similar literature
20 similar articles found
1.
2.
The determination of a list of differentially expressed genes is a basic objective in many cDNA microarray experiments. We present a statistical approach that allows direct control over the percentage of false positives in such a list and, under certain reasonable assumptions, improves on existing methods with respect to the percentage of false negatives. The method accommodates a wide variety of experimental designs and can simultaneously assess significant differences between multiple types of biological samples. Two interconnected mixed linear models are central to the method and provide a flexible means to properly account for variability both across and within genes. The mixed model also provides a convenient framework for evaluating the statistical power of any particular experimental design and thus enables a researcher to a priori select an appropriate number of replicates. We also suggest some basic graphics for visualizing lists of significant genes. Analyses of published experiments studying human cancer and yeast cells illustrate the results.

3.

Background  

A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes, or one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined.

4.
5.
Quantitative high-throughput mass spectrometry has become an established tool to measure relative gene expression proteome-wide. The output of such an experiment usually consists of a list of expression ratios (fold changes) for several thousand proteins between two conditions. However, we observed that individual peptide fold changes may behave significantly differently from those of other peptides from the same protein and that these differences cannot be explained by imprecise measurements. Such outlier peptides can have several technical (misidentifications, misquantifications) or biological (post-translational modifications, differential regulation of isoforms) causes. We developed a method to detect outlier peptides in mass spectrometry data which is able to distinguish imprecise measurements from real outlier peptides with high accuracy when the true difference is as small as 1.4-fold. We applied our method to experimental data and investigated the different technical and biological effects that result in outlier peptides. Our method will assist future research to reduce technical bias and can help to identify genes with differentially regulated protein isoforms in high-throughput mass spectrometry data.

6.
Peak lists are commonly used in NMR as input data for various software tools such as automatic assignment and structure calculation programs. Inconsistencies of chemical shift referencing among different peak lists or between peak and chemical shift lists can cause severe problems during peak assignment. Here we present a simple and robust tool to achieve self-consistency of the chemical shift referencing among a set of peak lists. The Peakmatch algorithm matches a set of peak lists to a specified reference peak list, neither of which has to be assigned. The chemical shift referencing offset between two peak lists is determined by optimizing an assignment-free match score function using either a complete grid search or downhill simplex optimization. It is shown that peak lists from many different types of spectra can be matched reliably as long as they contain at least two corresponding dimensions. Using a simulated peak list, the Peakmatch algorithm can also be used to obtain the optimal agreement between a chemical shift list and experimental peak lists. Combining these features makes Peakmatch a useful tool that can be applied routinely before automatic assignment or structure calculation in order to obtain an optimized input data set.
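The grid-search variant described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual Peakmatch score function, so the Gaussian nearest-peak score below, and the names `match_score` and `grid_search_offset`, are assumptions made purely for illustration.

```python
import math
from itertools import product


def match_score(peaks, reference, offset, tolerance):
    """Assignment-free score (hypothetical form, not the Peakmatch definition).

    Each shifted peak contributes exp(-d^2 / 2), where d is the
    tolerance-scaled distance to its nearest reference peak.
    """
    total = 0.0
    for peak in peaks:
        shifted = [p + o for p, o in zip(peak, offset)]
        nearest_sq = min(
            sum(((s - r) / t) ** 2 for s, r, t in zip(shifted, ref, tolerance))
            for ref in reference
        )
        total += math.exp(-0.5 * nearest_sq)
    return total


def grid_search_offset(peaks, reference, max_shift, step, tolerance):
    """Try every offset on a regular grid and return the best-scoring one."""
    axes = []
    for limit, size in zip(max_shift, step):
        n = int(round(limit / size))
        axes.append([i * size for i in range(-n, n + 1)])
    return max(
        product(*axes),
        key=lambda off: match_score(peaks, reference, off, tolerance),
    )


# Toy 2D example: the peak list is the reference displaced by a known offset.
reference = [(8.0, 120.0), (7.5, 115.0), (9.1, 125.5)]
peaks = [(h - 0.04, n + 0.6) for h, n in reference]
best = grid_search_offset(
    peaks, reference, max_shift=(0.1, 1.0), step=(0.02, 0.2), tolerance=(0.02, 0.2)
)
```

A complete grid search is exhaustive and therefore robust to local optima, at the cost of evaluating the score at every grid point; a simplex optimizer would trade that robustness for speed.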

7.
8.
We present a Bayesian hierarchical model for detecting differentially expressing genes that includes simultaneous estimation of array effects, and show how to use the output for choosing lists of genes for further investigation. We give empirical evidence that expression-level dependent array effects are needed, and explore different nonlinear functions as part of our model-based approach to normalization. The model includes gene-specific variances but imposes some necessary shrinkage through a hierarchical structure. Model criticism via posterior predictive checks is discussed. Modeling the array effects (normalization) simultaneously with differential expression gives fewer false positive results. To choose a list of genes, we propose to combine various criteria (for instance, fold change and overall expression) into a single indicator variable for each gene. The posterior distribution of these variables is used to pick the list of genes, thereby taking into account uncertainty in parameter estimates. In an application to mouse knockout data, Gene Ontology annotations over- and underrepresented among the genes on the chosen list are consistent with biological expectations.

9.
SUMMARY: OrderedList is a Bioconductor-compliant package for meta-analysis based on ordered gene lists like those resulting from differential gene expression analysis. Our package quantifies the similarity between gene lists. The significance of the similarity score is estimated from random scores computed on perturbed data. OrderedList illustrates list similarity in intuitive plots and determines the score-driving genes for further analysis. AVAILABILITY: http://www.bioconductor.org CONTACT: claudio.lottaz@molgen.mpg.de SUPPLEMENTARY INFORMATION: Please visit our webpage on http://compdiag.molgen.mpg.de/software.

10.
MOTIVATION: Many applications of microarray technology in clinical cancer studies aim at detecting molecular features for refined diagnosis. In this paper, we follow an opposite rationale: we try to identify common molecular features shared by phenotypically distinct types of cancer using a meta-analysis of several microarray studies. We present a novel algorithm to uncover that two lists of differentially expressed genes are similar, even if these similarities are not apparent to the eye. The method is based on the ordering in the lists. RESULTS: In a meta-analysis of five clinical microarray studies we were able to detect significant similarities in five of the ten possible comparisons of ordered gene lists. We included studies in which not a single gene can be significantly associated with outcome. The detection of significant similarities of gene lists from different microarray studies is a novel and promising approach. It has the potential to improve upon specialized cancer studies by exploring the power of several studies in one single analysis. Our method is complementary to previous methods in that it does not rely on strong effects of differential gene expression in a single study but on consistent ones across multiple studies.
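To make the idea of an ordering-based similarity concrete, here is a minimal sketch. The abstract does not define the score, so the exponentially weighted top-n overlap used here, the parameter `alpha`, and the shuffle-based null distribution are all assumptions for illustration, not the published algorithm.

```python
import math
import random


def top_overlap_score(list_a, list_b, alpha=0.1):
    """Weighted sum of top-n overlaps between two ranked lists (hypothetical score).

    Overlaps near the top of the lists receive the largest weights exp(-alpha * n).
    """
    depth = min(len(list_a), len(list_b))
    score = 0.0
    for n in range(1, depth + 1):
        overlap = len(set(list_a[:n]) & set(list_b[:n]))
        score += math.exp(-alpha * n) * overlap
    return score


def similarity_p_value(list_a, list_b, alpha=0.1, n_perm=200, seed=0):
    """Empirical p-value from random reorderings of list_b (a simplification)."""
    rng = random.Random(seed)
    observed = top_overlap_score(list_a, list_b, alpha)
    shuffled = list(list_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if top_overlap_score(list_a, shuffled, alpha) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


genes = [f"gene{i}" for i in range(10)]
observed, p_value = similarity_p_value(genes, genes)
```

Shuffling one list is the simplest possible null model; a null built from perturbed expression data would account for gene-gene correlation, which this sketch ignores.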

11.
Ma J, Zhang X, Ung CY, Chen YZ, Li B. Molecular BioSystems, 2012, 8(4): 1179-1186
Interest in essential genes has arisen recently given their importance in antimicrobial drug development. Although knockouts of essential genes are commonly known to cause lethal phenotypes, there is insufficient understanding of the intermediate changes that follow genetic perturbation and of the extent to which essential genes correlate with other genes. Here, we characterized the gene knockout effects by using a list of affected genes, termed 'damage lists'. These damage lists were identified through a refined cascading failure approach that was based on a previous topological flux balance analysis. Using an Escherichia coli metabolic network, we incorporated essentiality information into damage lists and revealed that the knockout of an essential gene mainly affects a large range of other essential genes whereas knockout of a non-essential gene only interrupts other non-essential genes. Also, genes sharing common damage lists tend to have the same essentiality. We extracted 72 core functional modules from the common damage lists of essential genes and demonstrated their ability to halt production of essential metabolites. Overall, our network analysis revealed that essential and non-essential genes propagated their deletion effects via distinct routes, providing a mechanistic explanation for the observed lethality phenotypes of essential genes.

12.
The DAVID Gene Functional Classification Tool uses a novel agglomeration algorithm to condense a list of genes or associated biological terms into organized classes of related genes or biology, called biological modules. This organization is accomplished by mining the complex biological co-occurrences found in multiple sources of functional annotation. It is a powerful method to group functionally related genes and terms into a manageable number of biological modules for efficient interpretation of gene lists in a network context.

13.
The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or to a meta-analysis comparison, it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained, instead of just one list. Here we introduce a method, based on permutations, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated by finding and comparing gene profiles on a large prostate cancer dataset, consisting of two cohorts of patients from different countries, for a total of 455 samples.

14.
Recent advances in experimental technologies allow for the detection of a complete cell proteome. Proteins that are expressed at a particular cell state or in a particular compartment as well as proteins with differential expression between various cell states are commonly delivered by many proteomics studies. Once a list of proteins is derived, a major challenge is to interpret the identified set of proteins in the biological context. Protein–protein interaction (PPI) data represent abundant information that can be employed for this purpose. However, these data have not yet been fully exploited due to the absence of a methodological framework that can integrate this type of information. Here, we propose to infer a network model from an experimentally identified protein list based on the available information about the topology of the global PPI network. We propose to use a Monte Carlo simulation procedure to compute the statistical significance of the inferred models. The method has been implemented as a freely available web-based tool, PPI spider ( http://mips.helmholtz-muenchen.de/proj/ppispider ). To support the practical significance of PPI spider, we collected several hundred recently published experimental proteomics studies that reported lists of proteins in various biological contexts. We reanalyzed them using PPI spider and demonstrated that in most cases PPI spider could provide statistically significant hypotheses that are helpful for understanding the protein list.

15.

Background  

Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated at the top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values.
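The permutation procedure mentioned above can be sketched as follows. This is a minimal illustration assuming an unweighted, Kolmogorov-Smirnov-like running-sum statistic and a null distribution obtained by shuffling the gene order; the function names are hypothetical and the weighting schemes used by GSEA implementations are omitted.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic: largest deviation from zero."""
    members = set(gene_set)
    n = len(ranked_genes)
    n_hit = sum(1 for g in ranked_genes if g in members)
    if n_hit == 0 or n_hit == n:
        return 0.0
    hit_step = 1.0 / n_hit
    miss_step = 1.0 / (n - n_hit)
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a category member, down otherwise.
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Estimate a two-sided p-value by shuffling the ranked list."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            extreme += 1
    # The estimate cannot be smaller than 1 / (n_perm + 1).
    return observed, (extreme + 1) / (n_perm + 1)


ranked = [f"g{i}" for i in range(20)]
es, p = permutation_p_value(ranked, ranked[:5], n_perm=500, seed=1)
```

The sketch also shows why permutation p-values are only estimates: their resolution is bounded by `1 / (n_perm + 1)`, so resolving very small p-values requires proportionally more permutations.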

16.
17.
Since the available microarray data of BOEC (human blood outgrowth endothelial cells), large vessel, and microvascular endothelial cells were from two different platforms, a working cross-platform normalization method was needed to make these data comparable. With six HUVEC (human umbilical vein endothelial cells) samples hybridized on two-channel cDNA arrays and six HUVEC samples on Affymetrix arrays, 64 possible combinations of a three-step normalization procedure were investigated to search for the best normalization method, which was selected based on two criteria measuring the extent to which expression profiles of biological samples of the same cell type arrayed on two platforms were indistinguishable. Next, three discriminative gene lists between the large vessel and the microvascular endothelial cells were obtained by SAM (significance analysis of microarrays), PAM (prediction analysis for microarrays), and a combination of SAM and PAM lists. The final discriminative gene list was selected by SVM (support vector machine). Based on this discriminative gene list, SVM classification analysis with the best tuning parameters and 10,000 validation runs showed that BOEC were far from large vessel cells: they either formed their own class or fell into the microvascular class. Based on all the common genes between the two platforms, SVM analysis further confirmed this conclusion.

18.
The model plant Arabidopsis has been well studied using high-throughput genomics technologies, which usually generate lists of differentially expressed genes under various conditions. Our group recently collected 1065 gene lists from 397 gene expression studies as a knowledgebase for pathway analysis. Here we systematically analyzed these gene lists by computing overlaps in all-vs.-all comparisons. We identified 16,261 statistically significant overlaps, represented by an undirected network in which nodes correspond to gene lists and edges indicate significant overlaps. The network highlights the correlation across the gene expression signatures of the diverse biological processes. We also partitioned the main network into 20 sub-networks, representing groups of highly similar expression signatures. These are common sets of genes that were co-regulated under different treatments or conditions and are often related to specific biological themes. Overall, our results suggest that diverse gene expression signatures are highly interconnected in a modular fashion.
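An all-vs.-all overlap analysis of this kind can be sketched as below. The abstract does not say which significance test was used, so the one-sided hypergeometric test, the Bonferroni correction, and the names `overlap_p_value` and `significant_overlap_edges` are assumptions for illustration only.

```python
from itertools import combinations
from math import comb


def overlap_p_value(list_a, list_b, universe_size):
    """P(overlap >= observed) under a hypergeometric null (assumed test)."""
    a, b = set(list_a), set(list_b)
    k = len(a & b)
    total = comb(universe_size, len(b))
    tail = sum(
        comb(len(a), i) * comb(universe_size - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    return k, tail / total


def significant_overlap_edges(gene_lists, universe_size, alpha=0.05):
    """Return network edges (name_a, name_b, overlap) for significant pairs."""
    names = sorted(gene_lists)
    n_tests = len(names) * (len(names) - 1) // 2
    edges = []
    for name_a, name_b in combinations(names, 2):
        k, p = overlap_p_value(gene_lists[name_a], gene_lists[name_b], universe_size)
        # Bonferroni correction for the number of pairwise comparisons.
        if p * n_tests <= alpha:
            edges.append((name_a, name_b, k))
    return edges


# Toy data: lists A and B share 15 genes, list C is unrelated.
toy_lists = {
    "A": list(range(0, 20)),
    "B": list(range(5, 25)),
    "C": list(range(500, 520)),
}
edges = significant_overlap_edges(toy_lists, universe_size=1000)
```

The resulting edge list is exactly the undirected network described in the abstract: nodes are gene lists and an edge marks a significant overlap.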

19.
20.
Meta-analysis combines Affymetrix microarray results across laboratories   Total citations: 3, self-citations: 0, citations by others: 3
With microarray technology becoming more prevalent in recent years, it is now common for several laboratories to employ the same microarray technology to identify differentially expressed genes that are related to the same phenomenon in the same species. Although experimental specifics may be similar, each laboratory will typically produce a slightly different list of statistically significant genes, which calls into question the validity of each gene list (i.e. which list is best). A statistically-based meta-analytic approach to microarray analysis systematically combines results from the different laboratories to provide a single estimate of the degree of differential expression for each gene. This approach provides a more precise view of genes that are of significant interest, while simultaneously allowing for differences between laboratories. The widely-used Affymetrix oligonucleotide array and its software are of particular interest because the results are naturally suited to a meta-analysis. A simulation model based on the Affymetrix platform is developed to examine the adaptive nature of the meta-analytic approach and to illustrate the utility of such an approach in combining microarray results across laboratories.
