20 similar documents found.
1.
Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: The field of microarray data analysis is shifting emphasis from methods for identifying differentially expressed genes to methods for identifying differentially expressed gene categories. The latter approaches utilize a priori information about genes to group genes into categories and enhance the interpretation of experiments aimed at identifying expression differences across treatments. While almost all of the existing approaches for identifying differentially expressed gene categories are practically useful, they suffer from a variety of drawbacks. Perhaps most notably, many popular tools are based exclusively on gene-specific statistics that cannot detect many types of multivariate expression change. RESULTS: We have developed a nonparametric multivariate method for identifying gene categories whose multivariate expression distribution differs across two or more conditions. We illustrate our approach and compare its performance to several existing procedures via the analysis of a real data set and a unique data-based simulation study designed to capture the challenges and complexities of practical data analysis. We show that our method has good power for differentiating between differentially expressed and non-differentially expressed gene categories, and we utilize a resampling based strategy for controlling the false discovery rate when testing multiple categories. AVAILABILITY: R code (www.r-project.org) for implementing our approach is available from the first author by request.
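As a rough illustration of a resampling-based multivariate two-sample test for one gene category, the Python sketch below computes a Monte Carlo permutation p-value for the Euclidean distance between the two groups' mean expression vectors. This is a deliberately simple stand-in, not the authors' method: the statistic only responds to shifts in mean vectors, whereas the approach described above targets any difference in the multivariate distribution, and it omits the FDR step across categories. The names mean_distance and permutation_pvalue are illustrative.

```python
import math
import random
from typing import List, Sequence

Sample = Sequence[float]  # expression values of the genes in one category


def mean_distance(x: List[Sample], y: List[Sample]) -> float:
    """Euclidean distance between the mean expression vectors of two groups."""
    mx = [sum(col) / len(x) for col in zip(*x)]
    my = [sum(col) / len(y) for col in zip(*y)]
    return math.dist(mx, my)


def permutation_pvalue(x: List[Sample], y: List[Sample],
                       n_perm: int = 999, seed: int = 0) -> float:
    """Monte Carlo permutation p-value for H0: both groups share one distribution.

    Group labels are shuffled n_perm times; the p-value is the share of
    labellings whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean_distance(x, y)
    pooled = list(x) + list(y)
    n1 = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of samples to the two groups
        # small tolerance so exact ties are not lost to floating-point rounding
        if mean_distance(pooled[:n1], pooled[n1:]) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    control = [[0.1 * i, 0.2 * i] for i in range(5)]
    treated = [[5 + 0.1 * i, 5 - 0.1 * i] for i in range(5)]
    print(permutation_pvalue(control, treated))  # small: groups are well separated
```

With five samples per group there are only 252 distinct splits, so the smallest attainable p-value is limited by the sample size, not only by n_perm.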
2.
Lim J Kim J Kim SC Yu D Kim K Kim BS 《Statistical applications in genetics and molecular biology》2012,11(3):Article 5
Partially paired data sets often occur in microarray experiments (Kim et al., 2005; Liu, Liang and Jang, 2006). Discussions of testing with partially paired data are found in the literature (Lin and Stivers 1974; Ekbohm, 1976; Bhoj, 1978). Bhoj (1978) initially proposed a test statistic that uses a convex combination of paired and unpaired t statistics. Kim et al. (2005) later proposed the t3 statistic, which is a linear combination of paired and unpaired t statistics, and then used it to detect differentially expressed (DE) genes in colorectal cancer (CRC) cDNA microarray data. In this paper, we extend Kim et al.'s t3 statistic to a Hotelling's T²-type statistic Tp for detecting DE gene sets of size p. We employ Efron's empirical null principle to incorporate inter-gene correlation in the estimation of the false discovery rate. Then, the proposed Tp statistic is applied to Kim et al.'s CRC data to detect DE gene sets of sizes p=2 and p=3. Our results show that for small p, particularly for p=2 and marginally for p=3, the proposed Tp statistic complements the univariate procedure by detecting additional DE genes that were missed by the univariate test procedure. We also conduct a simulation study to demonstrate that Efron's empirical null principle is robust to departures from the normality assumption.
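For orientation, the classical two-sample Hotelling's T² statistic for a gene set of size p, the statistic that Tp is modelled on, can be sketched in pure Python as below. The sketch assumes two independent (fully unpaired) groups and a nonsingular pooled covariance matrix; it does not implement the partially paired Tp statistic or the empirical-null FDR estimation, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mean_vector(samples: Matrix) -> List[float]:
    """Column means: one mean per gene in the set."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def scatter(samples: Matrix, mean: List[float]) -> Matrix:
    """Sum of outer products of centred observations (a p x p matrix)."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in samples:
        d = [v - m for v, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hotelling_t2(x: Matrix, y: Matrix) -> float:
    """Two-sample Hotelling's T^2 = n1*n2/(n1+n2) * d' S^{-1} d.

    d is the difference of mean vectors and S the pooled covariance with
    n1 + n2 - 2 degrees of freedom; rows are samples, columns are genes.
    """
    n1, n2 = len(x), len(y)
    mx, my = mean_vector(x), mean_vector(y)
    sx, sy = scatter(x, mx), scatter(y, my)
    p = len(mx)
    pooled = [[(sx[i][j] + sy[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    d = [a - b for a, b in zip(mx, my)]
    w = solve(pooled, d)  # S^{-1} d without forming the inverse explicitly
    return n1 * n2 / (n1 + n2) * sum(di * wi for di, wi in zip(d, w))
```

Under multivariate normality, (n1 + n2 - p - 1) / (p(n1 + n2 - 2)) · T² follows an F distribution with p and n1 + n2 - p - 1 degrees of freedom; with p = 1, T² reduces to the square of the pooled two-sample t statistic.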
3.
Giulio Galla Gianni Barcaccia Angelo Ramina Silvio Collani Fiammetta Alagna Luciana Baldoni Nicolò GM Cultrera Federico Martinelli Luca Sebastiani Pietro Tonutti 《BMC plant biology》2009,9(1):128-17
Background
Olea europaea L. is a traditional tree crop of the Mediterranean basin with high economic impact worldwide. Unlike for other fruit tree species, little is known about the physiological and molecular basis of olive fruit development, and only a few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and their subsequent computational annotation using different software tools.
4.
Discovering tightly regulated and differentially expressed gene sets in whole genome expression data
MOTIVATION: Recently, a new type of expression data is being collected which aims to measure the effect of genetic variation on gene expression in pathways. In these datasets, expression profiles are constructed for multiple strains of the same model organism under the same condition. The goal of analyses of these data is to find differences in regulatory patterns due to genetic variation between strains, often without a phenotype of interest in mind. We present a new method based on notions of tight regulation and differential expression to look for sets of genes which appear to be significantly affected by genetic variation. RESULTS: When we use categorical phenotype information, as in the Alzheimer's and diabetes datasets, our method finds many of the same gene sets as gene set enrichment analysis. In addition, our notion of correlated gene sets allows us to focus our efforts on biological processes subjected to tight regulation. In murine hematopoietic stem cells, we are able to discover significant gene sets independent of a phenotype of interest. Some of these gene sets are associated with several blood-related phenotypes. AVAILABILITY: The programs are available by request from the authors.
5.
An important problem addressed using cDNA microarray data is the detection of genes differentially expressed in two tissues of interest. Currently used approaches ignore the multidimensional structure of the data. However, it is well known that correlation among covariates can enhance the ability to detect less pronounced differences. We use the Mahalanobis distance between vectors of gene expressions as a criterion for simultaneously comparing a set of genes and develop an algorithm for maximizing it. To overcome the instability of estimated covariance matrices, we propose a new method of combining data from small-scale random search experiments. We show that, by utilizing the correlation structure, the multivariate method finds not only the genes found by the one-dimensional criteria but also genes whose differential expression is not detectable marginally.
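A minimal sketch of the criterion described above, assuming the pooled within-group covariance is used: the squared Mahalanobis distance D² between the two group mean vectors, restricted to a candidate gene subset, plus a plain random search over subsets of a fixed size k. This is a simplification; it does not reproduce the authors' scheme for combining small-scale random search experiments, and all names (mahalanobis_sq, random_search, _solve) are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mahalanobis_sq(x: Matrix, y: Matrix, genes: Sequence[int]) -> float:
    """Squared Mahalanobis distance between group means on the chosen genes.

    Uses the pooled within-group covariance (n1 + n2 - 2 degrees of freedom).
    Rows of x and y are samples; columns are genes.
    """
    xs = [[row[g] for g in genes] for row in x]
    ys = [[row[g] for g in genes] for row in y]
    n1, n2, p = len(xs), len(ys), len(genes)
    mx = [sum(c) / n1 for c in zip(*xs)]
    my = [sum(c) / n2 for c in zip(*ys)]
    cov = [[0.0] * p for _ in range(p)]
    for rows, mean in ((xs, mx), (ys, my)):
        for row in rows:
            d = [v - m for v, m in zip(row, mean)]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += d[i] * d[j] / (n1 + n2 - 2)
    diff = [a - b for a, b in zip(mx, my)]
    w = _solve(cov, diff)  # S^{-1} (mean difference)
    return sum(di * wi for di, wi in zip(diff, w))


def random_search(x: Matrix, y: Matrix, k: int, n_trials: int = 200,
                  seed: int = 0) -> Tuple[Tuple[int, ...], float]:
    """Return the k-gene subset with the largest D^2 among random draws."""
    rng = random.Random(seed)
    n_genes = len(x[0])
    best: Tuple[int, ...] = ()
    best_score = float("-inf")
    for _ in range(n_trials):
        genes = tuple(sorted(rng.sample(range(n_genes), k)))
        score = mahalanobis_sq(x, y, genes)
        if score > best_score:
            best, best_score = genes, score
    return best, best_score
```

The pooled covariance becomes singular once the subset size exceeds n1 + n2 - 2 and is poorly estimated well before that point, which is one reason covariance-based criteria are unstable on small microarray samples.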
6.
GOAnno: GO annotation based on multiple alignment (total citations: 2; self-citations: 0; citations by others: 2)
Chalmel F Lardenois A Thompson JD Muller J Sahel JA Léveillard T Poch O 《Bioinformatics (Oxford, England)》2005,21(9):2095-2096
SUMMARY: GOAnno is a web tool that automatically annotates proteins according to the Gene Ontology (GO) using evolutionary information available in hierarchized multiple alignments. GO terms present in the aligned functional subfamily can be cross-validated and propagated to obtain highly reliable predicted GO annotation based on the GOAnno algorithm. AVAILABILITY: The web tool and a reduced version for local installation are freely available at http://igbmc.u-strasbg.fr/GOAnno/GOAnno.html SUPPLEMENTARY INFORMATION: The website supplies a detailed explanation and illustration of the algorithm at http://igbmc.u-strasbg.fr/GOAnno/GOAnnoHelp.html.
7.
We propose 'CorScor', a novel approach for identifying gene pairs with joint differential expression. This is defined as a situation with good phenotype discrimination in the bivariate, but not in the two marginal distributions. CorScor can be used to detect phenotype-related dependencies and interactions among genes. Our easily interpretable approach is scalable to current microarray dimensions and yields promising results on several cancer-gene-expression datasets.
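The definition of joint differential expression can be illustrated with a toy example. The data below are hypothetical and the rule is not the CorScor score itself: in the two groups each gene takes exactly the same set of values, so neither marginal distribution discriminates the groups, yet the sign of the product of the two expression values separates them perfectly.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical toy data: each tuple is one sample, holding the expression
# values of gene 1 and gene 2 in that sample.
group_a: List[Point] = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0), (-2.0, -2.0)]
group_b: List[Point] = [(1.0, -1.0), (-1.0, 1.0), (2.0, -2.0), (-2.0, 2.0)]


def marginal(points: List[Point], gene: int) -> List[float]:
    """Sorted values of one gene, i.e. its empirical marginal distribution."""
    return sorted(p[gene] for p in points)


def classify(point: Point) -> str:
    """Bivariate rule: same-sign expression -> group A, opposite sign -> group B."""
    g1, g2 = point
    return "A" if g1 * g2 > 0 else "B"


def accuracy() -> float:
    """Fraction of samples the bivariate rule assigns to their true group."""
    labelled = [(p, "A") for p in group_a] + [(p, "B") for p in group_b]
    correct = sum(classify(p) == label for p, label in labelled)
    return correct / len(labelled)


if __name__ == "__main__":
    for gene in (0, 1):
        # identical marginals: no rule using one gene alone separates the groups
        print(gene, marginal(group_a, gene) == marginal(group_b, gene))
    print("bivariate accuracy:", accuracy())
```

Running it prints True for both genes and a bivariate accuracy of 1.0: the group difference lives entirely in the dependence between the two genes.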
8.
Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) (total citations: 20; self-citations: 2; citations by others: 20)
Selina S. Dwight Midori A. Harris Kara Dolinski Catherine A. Ball Gail Binkley Karen R. Christie Dianna G. Fisk Laurie Issel-Tarver Mark Schroeder Gavin Sherlock Anand Sethuraman Shuai Weng David Botstein J. Michael Cherry 《Nucleic acids research》2002,30(1):69-72
9.
Background
To identify differentially expressed genes, it is standard practice to test a two-sample hypothesis for each gene with a proper adjustment for multiple testing. Such tests are essentially univariate and disregard the multidimensional structure of microarray data. A more general two-sample hypothesis is formulated in terms of the joint distribution of any sub-vector of expression signals.
Results
By building on an earlier proposed multivariate test statistic, we propose a new algorithm for identifying differentially expressed gene combinations. The algorithm includes an improved random search procedure designed to generate candidate gene combinations of a given size. Cross-validation is used to provide replication stability of the search procedure. A permutation two-sample test is used for significance testing. We design a multiple testing procedure to control the family-wise error rate (FWER) when selecting significant combinations of genes that result from a successive selection procedure. A target set of genes is composed of all significant combinations selected via random search.
Conclusions
A new algorithm has been developed to identify differentially expressed gene combinations. The performance of the proposed search-and-testing procedure has been evaluated by computer simulations and analysis of replicated Affymetrix gene array data on age-related changes in gene expression in the inner ear of CBA mice.
10.
Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to gene-related information at different levels of biological organization, granularity and data format. This information is being used to assess and interpret the results from high-throughput experiments. To improve keyword extraction for annotational clustering and other types of analyses, we have developed a novel text mining approach, which is based on keywords identified at the level of gene annotation sentences (in particular sentences characterizing biological function) instead of entire abstracts. Further, to improve the expressiveness and usefulness of gene annotation terms, we investigated the combination of sentence-level keywords with terms from the Medical Subject Headings (MeSH) and Gene Ontology (GO) resources. We find that sentence-level keywords combined with MeSH terms outperform the typical 'baseline' set-up (term frequencies at the level of abstracts) by a significant margin, whereas the addition of GO terms improves matters only marginally. We validated our approach on the basis of a manually annotated corpus of 200 abstracts generated on the basis of 2 cancer categories and 10 genes per category. We applied the method in the context of three sets of differentially expressed genes obtained from pediatric brain tumor samples. This analysis suggests novel interpretations of discovered gene expression patterns.
11.
12.
Recent advances in high-throughput methods and the application of computational tools for automatic classification of proteins have made it possible to carry out large-scale proteomic analyses. Biological analysis and interpretation of sets of proteins is a time-consuming undertaking carried out manually by experts. We have developed PANDORA (Protein ANnotation Diagram ORiented Analysis), a web-based tool that provides an automatic representation of the biological knowledge associated with any set of proteins. PANDORA uses a unique approach of keyword-based graphical analysis that focuses on detecting subsets of proteins that share unique biological properties and the intersections of such sets. PANDORA currently supports SwissProt keywords, NCBI Taxonomy, InterPro entries and the hierarchical classification terms from ENZYME, SCOP and GO databases. The integrated study of several annotation sources simultaneously allows a representation of biological relations of structure, function, cellular location, taxonomy, domains and motifs. PANDORA is also integrated into the ProtoNet system, allowing thousands of automatically generated clusters to be tested. We illustrate how PANDORA enhances the biological understanding of large, non-uniform sets of proteins originating from experimental and computational sources, without the need for prior biological knowledge of individual proteins.
13.
Microarrays allow researchers to examine the expression of thousands of genes simultaneously. However, identification of genes differentially expressed in microarray experiments is challenging. With an optimal test statistic, we rank genes and estimate a threshold above which genes are considered differentially expressed (DE). This overcomes an awkward shortcoming of many statistical methods, namely the difficulty of determining cut-off values in ranking analysis. Experiments demonstrate that our method performs well and avoids the problems with graphical examination and multiple hypothesis testing that affect alternative approaches. Compared with well-known methods, our method is more sensitive on data sets with small differential expression values and is not biased in favor of data sets that follow particular distribution models.
14.
Protein profiling is frequently used to elucidate disease-specific or differentially expressed proteins. While recent developments have resulted in improved differential profiling, alternative expression platforms that complement existing techniques are continually being explored. We developed a novel method utilizing the amplification and selection capabilities of random peptide-expressing M13 bacteriophage to accentuate differentially expressed proteins in biologic specimens. While the current study used this method to demonstrate differentially expressed proteins in lung cancer tissue in comparison to normal lung tissue, this approach is applicable to a wide range of sample types.
15.
Although many statistical methods have been proposed for identifying differentially expressed genes, the question of which approach is optimal remains unresolved. It is therefore necessary to develop more efficient methods of finding differentially expressed genes while accounting for noise and the false discovery rate (FDR). We propose a method based on multi-resolution wavelet transformation analysis combined with SAM for identifying differentially expressed genes by adjusting Δ and computing the FDR. This method was applied to a microarray expression dataset from adenoma patients and normal subjects. The number of differentially expressed genes decreased gradually as Δ increased, and the FDR was reduced after wavelet transformation. At a given Δ value, the FDR was likewise lower after wavelet transformation than before it. In conclusion, more differentially expressed genes, and genes of higher quality, were detected with this method than with non-transformed data, and the FDR was notably better controlled and lower.
16.
Identification of differentially expressed genes in mouse kidney after irradiation using microarray analysis (total citations: 3; self-citations: 0; citations by others: 3)
Kruse JJ te Poele JA Velds A Kerkhoven RM Boersma LJ Russell NS Stewart FA 《Radiation research》2004,161(1):28-38
Irradiation of the kidney induces dose-dependent, progressive renal functional impairment, which is partly mediated by vascular damage. The molecular mechanisms underlying the development of radiation-induced nephropathy are unclear. Given the complexity of radiation-induced responses, microarrays may offer new opportunities to identify a wider range of genes involved in the development of radiation injury. The aim of the present study was to determine whether microarrays are a useful tool for identifying time-related changes in gene expression and potential mechanisms of radiation-induced nephropathy. Microarray experiments were performed using amplified RNA from irradiated mouse kidneys (1 x 16 Gy) and from sham-irradiated control tissue at different intervals (1-30 weeks) after irradiation. After normalization procedures (using information from straight-color, color-reverse and self-self experiments), the differentially expressed genes were identified. Control and repeat experiments were done to confirm that the observations were not artifacts of the array procedure (RNA amplification, probe synthesis, hybridizations and data analysis). To provide independent confirmation of microarray data, semi-quantitative PCR was performed on a selection of genes. At 1 week after irradiation (before the onset of vascular and functional damage), 16 genes were significantly up-regulated and 9 genes were down-regulated. During the period of developing nephropathy (10 to 20 weeks), 31 and 42 genes were up-regulated and 9 and 4 genes were down-regulated. At the later time of 30 weeks, the vast majority of differentially expressed genes (191 out of 203) were down-regulated. Potential genes of interest included TSA-1 (also known as Ly6e) and Jagged 1 (Jag1). Increased expression of TSA-1, a member of the Ly-6 family, has previously been reported in response to proteinuria. 
Jagged 1, a ligand for the Notch receptor, is known to play a role in angiogenesis, and is particularly interesting in the context of radiation-induced vascular injury. The present study demonstrates the potential of microarrays to identify changing patterns of gene expression in irradiated kidney. Further studies will be required to evaluate functional involvement of these genes in vascular-mediated normal tissue injury.
17.
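Analyzing alopecia-associated differentially expressed genes with bioinformatics methods may help clarify the molecular mechanisms underlying the onset and progression of alopecia. In this study, the gene expression profile datasets GSE45512 and GSE45513 were selected from GEO, a sub-database of NCBI, and the R package limma was used to screen for significantly differentially expressed genes shared by the two species when comparing alopecia areata samples with normal samples. These shared genes were subjected to functional annotation and protein-protein interaction network analysis, and gene set enrichment analysis was performed on all differentially expressed genes. In total, 225 differentially expressed genes were identified in human scalp alopecia areata samples and 337 in skin samples from C3H/HeJ mice with spontaneous alopecia areata; 23 significantly differentially expressed genes were common to both species. GO functional enrichment and protein-protein interaction network analyses showed that these shared genes were significantly enriched in immune-related functions and that their products interact with one another. Gene set enrichment analysis showed that the differentially expressed genes of both species were significantly enriched in the chemokine signaling pathway, cytokine-cytokine receptor interaction, Staphylococcus aureus infection, and antigen processing and presentation pathways. Moreover, the down-regulated human genes mapped not only to the alopecia phenotype in a human phenotype database but also to phenotypes related to skin appendage pathology. In summary, this study used bioinformatics methods to analyze differentially expressed genes between alopecia and normal skin tissue and identified 23 significantly differentially expressed genes common to humans and mice. The analysis further indicated that alopecia is closely associated with immune processes and skin appendage lesions. These results offer new directions for the diagnosis and treatment of alopecia.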
18.
Background
Extracting biological information from high-density Affymetrix arrays is a multi-step process that begins with the accurate annotation of microarray probes. Shortfalls in the original Affymetrix probe annotation have been described; however, few studies have provided rigorous solutions for routine data analysis.
19.
20.
Chandan Badapanda 《Bioinformation》2013,9(4):216-221
The suppression subtractive hybridization (SSH) approach, a PCR-based approach that amplifies differentially expressed cDNAs (complementary DNAs) while simultaneously suppressing amplification of common cDNAs, was employed to identify immune-inducible genes in insects. This technique has been used as a suitable tool for the experimental identification of novel genes in eukaryotes as well as prokaryotes, both in species whose genomes have been sequenced and in those whose genomes have yet to be sequenced. In this article, I propose a method for in silico functional characterization of immune-inducible genes from insects. Apart from immune-inducible genes from insects, this method can be applied to the analysis of genes from other species, ranging from bacteria to plants and animals. This article provides background on the SSH-based method, taking specific examples from innate immune-inducible genes in insects, and then proposes a bioinformatics pipeline for the functional characterization of newly sequenced genes. The proposed workflow can also be applied to data from any newly sequenced species generated on Next Generation Sequencing (NGS) platforms.