20 similar documents found (search time: 15 ms)
1.
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association study (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still at an early stage of development: their results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns, and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedure, along with the parameter choices and the particular methodological challenges at each stage. In addition to surveying recently developed tools, we classify the analysis methods into broader categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
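One simple way to "combine association signals from multiple genes" is Fisher's combined probability test over gene-level p-values. The Python sketch below is an illustration chosen by the editor, not one of the reviewed tools: it assumes the gene-level p-values are independent, which linkage disequilibrium and overlapping genes (the biases listed above) usually violate, so practical methods correct for this. The gene names and p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under H0, assuming k independent p-values."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


# Hypothetical gene-level p-values for the genes of one gene set.
gene_pvalues = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.30}
print(f"{fisher_combined_pvalue(list(gene_pvalues.values())):.4g}")
```

With a single gene the function returns that gene's own p-value, which is a quick sanity check on the closed-form survival function.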
2.
Rongheng Lin, Shuangshuang Dai, Richard D Irwin, Alexandra N Heinloth, Gary A Boorman, Leping Li 《BMC bioinformatics》2008,9(1):481
Background
Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. These gene-specific local statistics are then combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.
3.
Microarrays have revolutionized gene expression analysis because they allow highly parallel monitoring of the mRNA levels of thousands of genes in a single experiment. Since their introduction some 15 years ago, substantial progress has been made toward faster and more sensitive analyses. This review highlights interesting new approaches for more sensitive detection of specific mRNAs and, in particular, discusses the potential of electrical DNA chip formats that allow faster mRNA analysis.
6.
Guro Dørum, Lars Snipen, Margrete Solheim, Solve Sæbø 《Biometrical journal. Biometrische Zeitschrift》2014,56(6):1055-1075
Gene set analysis methods are popular tools for identifying differentially expressed gene sets in microarray data. Most existing methods use a permutation test to assess the significance of each gene set. The permutation test's assumption of exchangeable samples is often not satisfied for time-series data and complex experimental designs, and the test also requires a certain number of samples to compute p-values accurately. The method presented here uses a rotation test rather than a permutation test to assess significance. The rotation test can compute accurate p-values even for very small sample sizes. The method can handle complex designs and is particularly suited to longitudinal microarray data, where the samples may have complex correlation structures. Dependencies between genes, modeled using gene networks, are incorporated into the estimation of correlations between samples. In addition, the method can test both for gene sets that are differentially expressed and for gene sets that show strong time trends. We show on simulated longitudinal data that the ability to identify important gene sets may be improved by taking the correlation structure between samples into account. Applied to real data, the method identifies both gene sets with constant expression and gene sets with strong time trends.
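To show the core idea of a rotation test, here is a minimal NumPy sketch for the simplest case only: a one-sample test that every gene in a set has mean zero, with samples assumed i.i.d. multivariate normal. It is not the authors' method, which handles complex designs, sample correlations, and gene networks; the set statistic (mean squared t-statistic) and the data are the editor's assumptions. Instead of shuffling sample labels, the data matrix is multiplied by random orthogonal matrices, which leaves its null distribution unchanged even with very few samples.

```python
import numpy as np


def random_rotation(n, rng):
    """Draw an n x n orthogonal matrix uniformly (Haar) via QR of a Gaussian matrix."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Flip column signs so the result is uniform over orthogonal matrices.
    return q * np.sign(np.diag(r))


def set_statistic(y):
    """Mean squared one-sample t-statistic over the genes (rows) of y."""
    n = y.shape[1]
    t = y.mean(axis=1) / (y.std(axis=1, ddof=1) / np.sqrt(n))
    return float(np.mean(t ** 2))


def rotation_test(y, n_rot=999, seed=0):
    """Rotation-test p-value for H0: every gene in the set has mean zero.

    y is a genes x samples matrix. Samples (columns) are assumed i.i.d.
    multivariate normal with arbitrary gene-gene covariance, so under H0
    y @ Q has the same distribution as y for any orthogonal Q.
    """
    rng = np.random.default_rng(seed)
    observed = set_statistic(y)
    n = y.shape[1]
    exceed = sum(
        set_statistic(y @ random_rotation(n, rng)) >= observed
        for _ in range(n_rot)
    )
    return (exceed + 1) / (n_rot + 1)


# Hypothetical data: 10 genes x 6 samples, all shifted away from zero.
data_rng = np.random.default_rng(1)
shifted = data_rng.normal(3.0, 1.0, size=(10, 6))
print(rotation_test(shifted))
```

Because all genes are rotated by the same matrix, the gene-gene correlation inside the set is preserved, which is one advantage over testing genes separately.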
9.
Wu TD 《Briefings in bioinformatics》2002,3(1):7-17
The accumulation of DNA microarray data has now made it possible to analyse expression data using gene expression profiles. A gene expression profile contains the expression data for a given gene across various samples and can be contrasted with an expression signature, which contains the expression data for a single sample. Gene expression profiles are most revealing when samples are grouped appropriately, either by standard clinical or pathological categories or by categories discovered through cluster analysis techniques. Expression profiles can exist at various levels of abstraction, yielding information across tissues or across diseases within a particular tissue. Hypothesis tests may be applied to expression profiles on a large scale to identify candidate genes of interest.
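Applying hypothesis tests to expression profiles "on a large scale" means running one test per gene, thousands at once, so some control of false positives is usually needed before calling candidates. The abstract does not name a procedure; the plain-Python sketch below assumes the Benjamini-Hochberg false discovery rate adjustment as one common choice, and the per-gene p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from comparing two groups of samples.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
qvals = benjamini_hochberg(pvals)
candidates = [i for i, q in enumerate(qvals) if q < 0.05]
print(candidates)
```

Genes whose adjusted value falls below the chosen FDR level (0.05 here, also an assumption) become the candidate list.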
10.
Recent developments in microarray technology make it possible to capture the gene expression profiles for thousands of genes at once. With this data researchers are tackling problems ranging from the identification of 'cancer genes' to the formidable task of adding functional annotations to our rapidly growing gene databases. Specific research questions suggest patterns of gene expression that are interesting and informative: for instance, genes with large variance or groups of genes that are highly correlated. Cluster analysis and related techniques are proving to be very useful. However, such exploratory methods alone do not provide the opportunity to engage in statistical inference. Given the high dimensionality (thousands of genes) and small sample sizes (often <30) encountered in these datasets, an honest assessment of sampling variability is crucial and can prevent the over-interpretation of spurious results. We describe a statistical framework that encompasses many of the analytical goals in gene expression analysis; our framework is completely compatible with many of the current approaches and, in fact, can increase their utility. We propose the use of a deterministic rule, applied to the parameters of the gene expression distribution, to select a target subset of genes that are of biological interest. In addition to subset membership, the target subset can include information about relationships between genes, such as clustering. This target subset presents an interesting parameter that we can estimate by applying the rule to the sample statistics of microarray data. The parametric bootstrap, based on a multivariate normal model, is used to estimate the distribution of these estimated subsets and relevant summary measures of this sampling distribution are proposed. We focus on rules that operate on the mean and covariance. 
Using Bernstein's inequality, we obtain consistency of the subset estimates under the assumption that the sample size grows to infinity faster than the logarithm of the number of genes. We also provide a conservative sample size formula guaranteeing that the sample mean and sample covariance matrix are uniformly within a distance epsilon > 0 of the population mean and covariance. The practical performance of the method using a cluster-based subset rule is illustrated with a simulation study, and the method is applied to a publicly available leukemia data set.
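The estimation scheme described above (apply a deterministic rule to the sample statistics, then use a parametric bootstrap from a multivariate normal model to see how stable the estimated subset is) can be sketched in NumPy as follows. The rule here, "genes whose mean exceeds a threshold in absolute value", is a hypothetical example of a rule on the mean; the cluster-based rules from the abstract are not reproduced. The summary returned is the fraction of bootstrap subsets containing each gene.

```python
import numpy as np


def subset_rule(mean, threshold):
    """Hypothetical deterministic rule: genes whose mean exceeds the threshold in absolute value."""
    return frozenset(np.flatnonzero(np.abs(mean) > threshold).tolist())


def bootstrap_subset_frequencies(x, threshold, n_boot=500, seed=0):
    """Parametric bootstrap of the estimated target subset.

    x is a samples x genes matrix. The rule is applied to the sample mean;
    bootstrap data sets of the same size are drawn from
    N(sample mean, sample covariance). Returns the estimated subset and,
    for each gene, the fraction of bootstrap subsets that contain it.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    mu_hat = x.mean(axis=0)
    sigma_hat = np.cov(x, rowvar=False)
    estimate = subset_rule(mu_hat, threshold)
    counts = np.zeros(p)
    for _ in range(n_boot):
        xb = rng.multivariate_normal(mu_hat, sigma_hat, size=n)
        for g in subset_rule(xb.mean(axis=0), threshold):
            counts[g] += 1
    return estimate, counts / n_boot


# Hypothetical data: 20 samples x 5 genes; genes 0 and 1 have shifted means.
data_rng = np.random.default_rng(2)
x = data_rng.normal(size=(20, 5))
x[:, :2] += 3.0
estimate, freq = bootstrap_subset_frequencies(x, threshold=1.5)
print(sorted(estimate), np.round(freq, 2))
```

Inclusion frequencies near 0 or 1 indicate genes whose membership is stable under sampling variability; intermediate values flag the borderline genes the abstract warns against over-interpreting.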
11.
Summary. Multiple outcomes are often used to properly characterize an effect of interest. This article discusses model-based statistical methods for classifying units into one of two or more groups when, for each unit, repeated measurements over time are obtained on each outcome. We relate the observed outcomes using multivariate nonlinear mixed-effects models that describe their evolution in the different groups. Owing to its flexibility, the random-effects approach for jointly modeling multiple outcomes can be used to estimate population parameters for a discriminant model that classifies units into distinct predefined groups or populations. Parameters are estimated via the expectation-maximization algorithm with a linear approximation step. We conduct a simulation study that sheds light on the effect of the linear approximation on classification results. We present an example using data from a study of 161 pregnant women in Santiago, Chile, where the main interest is predicting normal versus abnormal pregnancy outcomes.
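Once group-specific population parameters are estimated, classification reduces to comparing posterior group probabilities for a unit's stacked measurements. The NumPy sketch below assumes those estimates are already available and, as a simplification, uses the marginal moments of a linear mixed model (mean X·beta, covariance Z·D·Z' + sigma²·I) in place of the paper's linearized nonlinear multivariate model. All times, coefficients, and variances are hypothetical.

```python
import numpy as np


def log_mvn_density(y, mean, cov):
    """Log density of a multivariate normal distribution evaluated at y."""
    diff = y - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)


def classify(y, groups, priors):
    """Posterior probability of each group for one unit's measurement vector y.

    groups maps a group name to (marginal mean vector, marginal covariance);
    priors maps a group name to its prior probability.
    """
    names = list(groups)
    logpost = np.array(
        [np.log(priors[g]) + log_mvn_density(y, *groups[g]) for g in names]
    )
    post = np.exp(logpost - logpost.max())  # stabilize before normalizing
    post /= post.sum()
    return dict(zip(names, post))


# Hypothetical marginal moments from a random intercept-and-slope model.
t = np.array([10.0, 20.0, 30.0])             # hypothetical measurement times
Z = np.column_stack([np.ones_like(t), t])    # random intercept and slope
D = np.diag([1.0, 0.0025])                   # hypothetical random-effects covariance
cov = Z @ D @ Z.T + 0.5 * np.eye(3)          # Z D Z' + sigma^2 I
groups = {
    "normal": (Z @ np.array([1.0, 0.30]), cov),
    "abnormal": (Z @ np.array([1.0, 0.15]), cov),
}
post = classify(np.array([4.1, 7.0, 10.2]), groups, {"normal": 0.5, "abnormal": 0.5})
print(post)
```

Working on the log scale and subtracting the maximum before exponentiating avoids underflow when the measurement vectors are long.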
12.
Tsai MH, Yan H, Chen X, Chandramouli GV, Zhao S, Coffin D, Coleman CN, Mitchell JB, Chuang EY 《Molecular biotechnology》2005,29(3):221-224
We compared different hybridization conditions for oligonucleotide-based DNA microarrays to obtain optimized and reliable microarray data. Several parameters were evaluated under the different hybridization conditions, including signal-to-background (S:B) ratios, signal dynamic range, usable spots, and reproducibility. Statistical analysis showed that better results were obtained when spotted, presynthesized long oligonucleotide arrays were blocked with succinic anhydride and hybridized at 42°C in the presence of 50% formamide.
13.
In the past several years, oligonucleotide microarrays have emerged as a widely used tool for the simultaneous, non-biased measurement of expression levels for thousands of genes. Several challenges exist in successfully using this biotechnology; the principal one is the analysis of microarray data. An experiment to measure differential gene expression can consist of a dozen microarrays, each containing over a hundred thousand data points. We previously described a novel algorithm for analyzing oligonucleotide microarrays and assessing changes in gene expression [J. Mol. Biol. 317 (2002) 225]. This algorithm expresses changes in expression as the statistical significance of change (the S-score), which combines signals detected by multiple probe pairs according to an error model characteristic of oligonucleotide arrays. Software is available that simplifies applying this algorithm to the analysis of oligonucleotide microarray data. The application of this method to problems of the central nervous system is discussed.
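The general shape of a score that "combines signals detected by multiple probe pairs according to an error model" is sketched below in Python. This is not the published S-score error model: the per-pair error function and its constants `a` and `b` are hypothetical placeholders. Each probe pair contributes a difference between experiment and baseline (using PM minus MM) scaled by its estimated error, and the standardized differences are summed and divided by the square root of the number of pairs.

```python
import math


def probe_pair_error(c, a=20.0, b=0.1):
    """Hypothetical error model: additive noise a plus noise proportional to |c|."""
    return math.sqrt(a ** 2 + (b * abs(c)) ** 2)


def s_score(pm_exp, mm_exp, pm_base, mm_base):
    """Combine per-probe-pair standardized differences into one score.

    Each pair contributes d_j = (c_exp - c_base) / sqrt(err_exp^2 + err_base^2),
    where c = PM - MM; the d_j are summed and divided by sqrt(number of pairs).
    """
    d = []
    for pe, me, pb, mb in zip(pm_exp, mm_exp, pm_base, mm_base):
        c_exp, c_base = pe - me, pb - mb
        err = math.hypot(probe_pair_error(c_exp), probe_pair_error(c_base))
        d.append((c_exp - c_base) / err)
    return sum(d) / math.sqrt(len(d))


# Hypothetical intensities for one probe set with three probe pairs.
pm_base, mm_base = [500, 800, 650], [200, 300, 250]
pm_up = [900, 1400, 1200]
print(round(s_score(pm_up, mm_base, pm_base, mm_base), 2))
```

Dividing by the square root of the number of pairs keeps the score on a roughly standard-normal scale when the per-pair differences are independent and standardized.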
14.
Carissa M. Soto, Kate M. Blaney, Mubasher Dar, Manzer Khan, Baochuan Lin, Anthony P. Malanoski, Cherise Tidd, Mayrim V. Rios, Darlah M. Lopez, Banahalli R. Ratna 《Biosensors & bioelectronics》2009,25(1):48-54
Previous studies have shown that a functionalized viral nanoparticle can serve as a fluorescent signal-generating element and enhance detection sensitivity in immunoassays and low-density microarrays. In this study, we further tested this ability on commercial DNA microarrays, including the Affymetrix high-density resequencing microarray. Under optimum conditions for coupling NeutrAvidin and dye to a double-cysteine mutant of cowpea mosaic virus (CPMV), the nanoparticles performed comparably to the commonly used streptavidin-phycoerythrin (SAPE) on the high-density resequencing microarray. A 3-fold signal enhancement relative to Cy5-dCTP controls was obtained when using the nanoparticles on control scorecard expression microarrays. Hybridization results from commercially available 8000 rat expression arrays indicate a 14% increase in detected features when the virus complex was used as the staining reagent, compared with Cy5-dCTP controls. The current work shows the utility of CPMV-dye nanoparticles as a detection reagent on well-established detection platforms.
15.
The speed of sound (SOS) value is an indicator of bone mineral density (BMD). Previous genome-wide association (GWA) studies have identified a number of genes whose variations may affect BMD levels, but their biological implications have remained elusive. We re-analyzed the GWA study dataset for SOS values at two skeletal sites in 4,659 Korean women using the gene-set analysis software GSA-SNP. We identified 10 common representative GO terms and 17 candidate genes shared by these two traits (PGS < 0.05). The involvement of these GO terms and genes in bone biology is well supported by a literature survey. Interestingly, in several gene sets shared by the two skeletal sites, the significance levels of some member genes were inversely related. This implies that the biological process, rather than the SNP or the gene, is the substantive unit of genetic association for SOS in bone. In conclusion, our findings may provide new insights into the biological mechanisms underlying BMD. [BMB Reports 2014; 47(6): 348-353]
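A gene-set analysis of GWA data such as the one above typically turns SNP p-values into gene scores and then tests whether a gene set's scores are higher than expected. The Python sketch below is an assumption-labeled illustration, not GSA-SNP's documented implementation: each gene is scored as -log10 of its k-th smallest SNP p-value (k = 2 is an assumed default), and a set is summarized with a PAGE-style Z-statistic comparing the set's mean score with the mean and standard deviation over all genes. The SNP p-values are hypothetical.

```python
import math
import statistics


def gene_score(snp_pvalues, k=2):
    """-log10 of the k-th smallest SNP p-value mapped to a gene
    (uses the largest available one if the gene has fewer than k SNPs)."""
    ranked = sorted(snp_pvalues)
    return -math.log10(ranked[min(k, len(ranked)) - 1])


def gene_set_z(scores, gene_set):
    """Z = (mean score in set - mean over all genes) * sqrt(m) / SD over all genes."""
    members = [scores[g] for g in gene_set if g in scores]
    m = len(members)
    mu = statistics.fmean(scores.values())
    sd = statistics.pstdev(scores.values())
    return (statistics.fmean(members) - mu) * math.sqrt(m) / sd


# Hypothetical SNP p-values grouped by gene.
snp_p = {
    "G1": [0.0004, 0.002, 0.3], "G2": [0.001, 0.01], "G3": [0.2, 0.6],
    "G4": [0.05, 0.4, 0.7], "G5": [0.5], "G6": [0.3, 0.9],
}
scores = {g: gene_score(p) for g, p in snp_p.items()}
print(round(gene_set_z(scores, {"G1", "G2"}), 2))
```

Using the k-th best SNP rather than the single best one is a common way to reduce the advantage that genes with many SNPs would otherwise have.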
16.
Summary. Gene expression microarray experiments are intrinsically two-phase experiments. The messenger RNA (mRNA) required for the microarray experiment must first be derived from plants or animals that are exposed to a set of treatments in a preceding experiment (Phase 1). The mRNA is then used in the subsequent laboratory-based microarray experiment (Phase 2), from which gene expression is measured and ultimately analyzed. We show that obtaining a valid test for the effects of treatments on gene expression depends on the design of both the Phase 1 and Phase 2 experiments. Examples show that, in the absence of prior knowledge of the relative sizes of variation in the Phase 1 and Phase 2 experiments, the multiple dye-swap design at Phase 2 is more robust than the alternating loop design.
18.
G. Morota, F. Peñagaricano, J. L. Petersen, D. C. Ciobanu, K. Tsuyuzaki, I. Nikaido 《Animal genetics》2015,46(4):381-387
An integral part of functional genomics studies is assessing the enrichment of specific biological terms in lists of genes found to play an important role in biological phenomena. Contrasting the observed frequency of annotated terms with that in the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay of ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary, including additional categories not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics has yet to be explored. In this study, MeSH ORA was evaluated to discern the biological properties of identified genes, and the results were contrasted with those obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome-wide prediction in swine, and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with GO terms. Moreover, MeSH yielded unique annotations not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as an additional choice of annotation for drawing biological inferences from genes identified via experimental analyses. Our results indicate that, when used in combination with GO terms, MeSH can enhance functional interpretation of specific biological conditions or of the genetic basis of complex traits in livestock species.
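The core ORA calculation, contrasting how often a term annotates the study genes with how often it annotates the background, is commonly done with a one-sided hypergeometric (Fisher's exact) test; the study above does not state its exact test, so this choice is an assumption. The Python sketch works the same whether the annotations come from GO or MeSH, and uses exact integer arithmetic. The counts are hypothetical.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: annotated background genes, K: background genes carrying the term,
    n: genes in the study list, k: study genes carrying the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 3 of 4 study genes carry a term that annotates 5 of 20 background genes.
print(round(ora_pvalue(k=3, n=4, K=5, N=20), 4))
```

In practice this test is repeated for every GO or MeSH term, so the resulting p-values are usually adjusted for multiple testing.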
19.
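Advances in research on DNA methylation and the regulation of gene expression (times cited: 4; self-citations: 0; citations by others: 4)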
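Epigenetic modifications are heritable chemical modifications of bases and histones that do not alter the DNA sequence; they mainly include DNA methylation, histone modification, chromatin remodeling, and non-coding RNAs. Epigenetic modification is a higher-level mechanism for regulating gene expression. DNA methylation is an important epigenetic modification that participates in key biological processes such as the regulation of gene expression, genomic imprinting, transposon silencing, X-chromosome inactivation, and carcinogenesis. In recent years, with advances in research methods and technology, genome-wide studies of DNA methylation have become widespread, and genome-wide methylation maps have been deciphered for multiple species. Studying DNA methylation at the global level not only helps reveal its characteristics and patterns at the macro level but also lays the foundation for in-depth analysis of its biological functions and regulation. Drawing on the latest research, this review summarizes the distribution patterns of DNA methylation in the genome, the rules governing it, and its relationship with gene transcription.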
20.
Until recently, the approach to understanding the molecular basis of complex syndromes such as cancer, coronary artery disease, and diabetes was to study the behavior of individual genes. However, it is now generally recognized that the expression of many genes is coordinated both spatially and temporally, and that this coordination changes during the development and progression of disease. Newly developed functional genomic approaches, such as serial analysis of gene expression (SAGE) and DNA microarrays, have enabled researchers to determine the expression patterns of thousands of genes simultaneously. One attractive feature of SAGE compared with microarrays is its ability to quantify gene expression without prior sequence information or knowledge of which genes are expected to be expressed. SAGE has been successfully applied to gene expression profiling in a number of human diseases. In this review, we first discuss the SAGE technique and contrast it with microarrays. We then highlight new biological insights that have emerged from its application to the study of human diseases.
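SAGE can quantify expression without prior sequence information because it counts short sequence tags, each taken to represent one transcript, instead of measuring hybridization to predesigned probes. The Python sketch below shows only that counting step and assumes the tags have already been extracted from sequenced concatemers (that extraction is not shown). Counts are scaled to tags per million so that libraries of different sizes can be compared. The tag sequences are hypothetical.

```python
from collections import Counter


def tags_per_million(tags):
    """Count SAGE tags and scale each count to tags per million sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: c * 1_000_000 / total for tag, c in counts.items()}


# Hypothetical 10-bp tags already extracted from one SAGE library.
library = ["GATCAAGAGC"] * 3 + ["TTTCAGGCTA"]
print(tags_per_million(library))
```

Because any abundant tag is counted whether or not its gene is known, previously unannotated transcripts show up in the profile as well, which is the advantage over microarrays noted above.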