Similar articles
1.
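Objective: To study the statistical power of the mixed effects model in the analysis of tumor expression-profile gene microarray data, and to assess how well it performs. Methods: A mixed effects model was used to analyze gene microarray data from a tumor case study, with gene set enrichment analysis (GSEA) as a reference for comparing the validity and soundness of the results. Results: Both methods were applied to the tumor microarray data and compared. Beyond the differentially expressed pathways identified by both methods, the mixed effects model identified 8 additional differentially expressed pathways that GSEA did not detect, and these are supported by the literature. Of the top 10 differentially expressed pathways identified by the mixed effects model, 6 have existing biological evidence, whereas only 4 of the top 10 identified by GSEA do. Conclusion: As a typical representative of top-down methods, the mixed effects model achieves dimension reduction by constructing latent variables, which effectively reduces multiple complex sources of variation and thereby supports accurate and sound results. Its statistical power is superior to that of GSEA, and it is an effective method for screening tumor gene microarray data.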

2.
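CONSTANS (CO) is one of the key genes in the photoperiod-induced flowering pathway in plants. To investigate the molecular regulatory mechanism of BdCO in the photoperiod pathway, this study performed transcriptome sequencing on wild-type Brachypodium distachyon Bd21 plants, BdCO-overexpressing plants (CO_OX), and BdCO knockout plants (CO_A3), carried out GO and KEGG functional enrichment analysis of the differentially expressed genes, and finally observed the flowering phenotypes of the three plant types. Comparing gene expression in Bd21 vs CO_OX and Bd21 vs CO_A3 detected 1,382 and 773 differentially expressed genes, respectively. GO enrichment analysis found that the differentially expressed genes in Bd21 vs CO_OX were mainly enriched in small nucleolar ribonucleoprotein complex, snoRNA binding, and rRNA processing, whereas those in Bd21 vs CO_A3 were mainly enriched in thylakoid, pigment binding, and photosynthesis. KEGG pathway enrichment analysis found that the differentially expressed genes in Bd21 vs CO_OX were mainly enriched in pathways including plant hormone signal transduction, ribosome biogenesis in eukaryotes, photosynthesis - antenna proteins, and circadian rhythm - plant, whereas those in Bd21 vs CO_A3 were mainly enriched in MAPK signaling pathway - plant, ribosome biogenesis in eukaryotes, and photosynth...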

3.
Prediction and bioinformatic analysis of miR-15a target genes   Citations: 3 total (0 self-citations, 3 by others)
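Objective: To predict the target genes of the widely studied miR-15a and to perform related bioinformatic analyses, in order to provide data support for experimental validation of miR-15a target genes, and to lay a foundation and provide theoretical guidance for in-depth study of the regulatory mechanism and biological function of miR-15a. Methods: The intersection of the miR-15a target genes predicted by two computational methods, TargetScan5.1 and PicTar, was taken as the gene set for analysis. GO annotation, GO enrichment analysis, and biological pathway enrichment analysis were then performed on this set. Results and conclusion: The predicted target gene set was enriched in biological processes such as transcriptional regulation, protein modification, and cell cycle, and in molecular functions such as protein kinase activity (P < 0.01). The canonical miR-15a predicted target gene set was significantly enriched in 5 signal transduction pathways in the KEGG pathway database, including the Wnt signaling pathway, cell cycle, and p53 signaling pathway, and in 7 disease pathways, including prostate cancer, chronic myeloid leukemia, and melanoma (P < 0.05).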
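
The workflow in this abstract (intersect two prediction lists, then test the consensus set for pathway enrichment) can be sketched as follows. The abstract does not say which enrichment test was used; a one-sided hypergeometric test is assumed here because it is a common choice. The gene names, pathway membership, and universe size are hypothetical placeholders, not data from the study.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, target_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    target_size genes are drawn without replacement from a universe of
    universe_size genes, of which pathway_size belong to the pathway.
    """
    upper = min(pathway_size, target_size)
    total = comb(universe_size, target_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, target_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical predictions from two tools (placeholder gene names).
tool_a = {"G1", "G2", "G3", "G4", "G5"}
tool_b = {"G2", "G3", "G4", "G6"}
targets = tool_a & tool_b  # consensus target set: genes predicted by both tools

# Hypothetical pathway and background universe size.
pathway = {"G2", "G3", "G7"}
universe_size = 20

p = hypergeom_enrichment_p(
    universe_size, len(pathway), len(targets), len(targets & pathway)
)
print(sorted(targets), round(p, 4))
```

Taking the intersection trades sensitivity for specificity: only genes supported by both predictors enter the enrichment test.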

4.
向虹, 阳小胡, 艾亮霞, 潘燕平, 胡勇. 《遗传》, 2020, (2): 172-182, I0002, I0003
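Analyzing hair-loss-related differentially expressed genes with bioinformatic methods may help clarify the molecular mechanisms behind the onset and progression of hair loss. This study selected the gene expression profile datasets GSE45512 and GSE45513 from GEO, a sub-database of NCBI, and used the R package limma to identify the significantly differentially expressed genes shared by two species when comparing alopecia areata samples with normal samples. Functional annotation and protein interaction network analysis were performed on these shared genes, and gene set enrichment analysis was performed on all differentially expressed genes. In total, 225 differentially expressed genes were identified in human scalp alopecia areata samples and 337 in skin samples from C3H/HeJ mice with spontaneous alopecia areata; 23 significantly differentially expressed genes were common to both species. GO enrichment analysis and protein interaction network analysis showed that these shared genes are significantly enriched in immune-related functions and that their proteins interact with one another. Gene set enrichment analysis showed that the differential genes of both species were significantly enriched in the chemokine signaling pathway, cytokine-cytokine receptor interaction, Staphylococcus aureus infection, and antigen processing and presentation pathways. In addition, the down-regulated human genes mapped not only to the hair-loss phenotype in the human phenotype database but also to phenotypes related to pathology of the skin appendages. In summary, by analyzing genes differentially expressed between hair-loss skin tissue and normal skin tissue, this study identified 23 significantly differentially expressed genes common to human and mouse, and found that hair loss is closely related to immune processes and to lesions of the skin appendages. These results offer new ideas for the diagnosis and treatment of hair loss.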

5.
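D-type cyclins regulate the G1/S transition of the cell cycle and play an important role in plant growth and development. Transgenic poplar plants overexpressing PtoCYCD2;1 (OE-PtoCYCD2;1) show clear phenotypic changes: reduced plant height, thinner stems, and curled leaves. Using OE-PtoCYCD2;1 transgenic poplar as material, this study analyzed the function of PtoCYCD2;1 in plant growth and development through transcriptome sequencing and changes in physiological indices, combined with the plants' phenotypic characteristics, to provide a theoretical basis for studying the function of D-type cyclins in woody plants. The results showed the following. (1) A total of 1,269 differentially expressed genes were identified in OE-PtoCYCD2;1, of which 700 were up-regulated and 569 down-regulated. Of the up-regulated genes, 26 belonged to the AP2/ERF transcription factor family; 8 down-regulated genes were enriched in the xylem synthesis pathway; and 27 down-regulated genes were enriched in the carbon metabolism pathway, 8 of them in the Calvin cycle pathway. (2) qRT-PCR results for 9 differentially expressed genes agreed with the expression trends measured by RNA-seq, indicating that the RNA-seq results are reliable. (3) Compared with the wild type (WT), total chlorophyll content in the young and mature leaves of OE-PtoCYCD2;1 increased by 57.36% and 78.22%, respectively; soluble sugar content in mature leaves decreased by 12.72%; lignin content in young and mature leaves decreased by 4.48% and 8.03%, respectively; and lignin content in young and mature stems decreased by 20.03% and 31.63%, respectively. The study concludes that OE-PtoCYCD2;1 alters the expression of genes involved in poplar carbon metabolism and lignin synthesis, which reduces the content of the corresponding metabolites and ultimately changes the plant phenotype and lowers overall biomass.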

6.
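After the mouse intestine was stimulated with Lactobacillus casei, high-throughput sequencing was used to analyze intestinal tissue from the L. casei-fed group and the blank control group, in order to investigate and verify the effect of L. casei on intestinal immunity. Bioinformatic analysis of the transcriptome data found 751 differentially expressed genes. GO enrichment analysis found 14 genes related to cellular immunity, concentrated in functions such as T cell activation, cell differentiation, and negative regulation of immune regulation. KEGG pathway enrichment pointed to the PPAR signaling pathway, the B cell receptor signaling pathway, and the chemokine signaling pathway. Analysis of basic gene characteristics showed that the 14 genes are located on 10 chromosomes, that the molecular weights of their proteins range from 11.37 to 83.45 kDa with isoelectric points (pI) from 4.42 to 11.36, and that all are unstable lipophilic proteins with relevant functional domains. Conserved motif and structure analysis found that the motif distribution is conserved and that most of the genes contain 2 introns. qRT-PCR validation showed that 6 of the 14 genes (Ces1d, Lzts1, Paqr7, Aloxe3, Zbtb16, OX40) had relatively high overall expression levels across treatments of different durations. The validation results agree with the transcriptome sequencing results, and the functions of these genes are related to cellular immunity. These results provide a theoretical basis for studying the immune effects of L. casei on the host.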

7.
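Bioinformatic methods were used to study potential target genes, signaling pathways, and molecular mechanisms in HER-2-negative breast cancer. Gene microarray data for HER-2-negative breast cancer patients and normal female epithelial cells were downloaded from the public database Gene Expression Omnibus (GEO), preprocessed with the RMA algorithm, and screened for differentially expressed genes with the R package LIMMA. The differentially expressed genes were uploaded to the DAVID online website for enrichment analysis, KEGG was used for signaling pathway analysis, and STRING was used for protein interaction network analysis. With a fold change greater than 2 and a p value less than 0.01 as thresholds, 72 differentially expressed genes were identified (10 up-regulated and 62 down-regulated). Enrichment analysis showed that the highly enriched differential genes are closely related to transcriptional regulation, apoptosis, and response to extracellular stimuli. Signaling pathway analysis showed that the differential genes are mainly associated with the MAPK signaling pathway. Protein network analysis identified FOS, JUN, KLF6, ATF3, HIST2H2AA4, and HIST2H2BE as key proteins. In patients with early HER-2-negative breast cancer, the differentially expressed genes are predominantly down-regulated. HIST2H2AA4 and HIST2H2BE may be potential target genes and may act downstream of the FOS gene in the ERK1/2 branch of the MAPK signaling pathway; this requires further experimental verification.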
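
The screening thresholds quoted in this abstract (fold change greater than 2, p value below 0.01) can be illustrated with a minimal sketch. It assumes each gene already has a tumor/normal expression ratio and a p value, for example from a limma-style analysis, and it assumes that "down-regulated" means a ratio below 1/2. The function name and the toy values are hypothetical and are not taken from the study.

```python
def select_degs(results, min_fold=2.0, max_p=0.01):
    """Split genes into up- and down-regulated lists.

    results maps gene name -> (fold_change, p_value), where fold_change is
    the tumor/normal expression ratio (assumed to be on a linear scale).
    """
    up, down = [], []
    for gene, (fold, p_value) in results.items():
        if p_value >= max_p:
            continue  # not significant at the chosen threshold
        if fold > min_fold:
            up.append(gene)
        elif fold < 1.0 / min_fold:
            down.append(gene)
    return up, down


# Toy input with placeholder gene names and values.
toy = {
    "GENE_A": (4.0, 0.001),   # passes both thresholds, up-regulated
    "GENE_B": (0.2, 0.005),   # passes both thresholds, down-regulated
    "GENE_C": (3.0, 0.2),     # large change but not significant
    "GENE_D": (1.5, 0.0001),  # significant but change too small
}
up, down = select_degs(toy)
print(up, down)
```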

8.
9.
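To screen for the differentially expressed genes and differential metabolites that promote the growth of Dendrobium officinale, transcriptome, metabolome, and joint two-omics analyses were performed on the lateral roots formed after symbiosis between an Epulorhiza mycorrhizal fungus (瘤菌根菌) and aseptic potted D. officinale seedlings. Transcriptome analysis found 262 differentially expressed genes enriched in 35 pathways; the protein processing in endoplasmic reticulum pathway contained the most differential genes, followed by amino sugar and nucleotide sugar metabolism. Metabolome analysis detected 194 differential metabolites enriched in 33 KEGG pathways; metabolic pathways contained the most differential metabolites (133), followed by microbial metabolism in diverse environments (70). The joint analysis showed that differential expression of 9 genes changed the accumulation of metabolites such as serine, glutamate, D-mannose, and hormones, which may be an important reason why the fungus promotes the growth of D. officinale. The growth-promoting effect is therefore inferred to be related to the accumulation of amino acids, sugars, and plant hormones and to expression changes in the related genes.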

10.
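Dunaliella viridis is a model green alga for studying salt tolerance mechanisms. Glucose is both a nutrient and a signaling molecule. The transcriptome of D. viridis, the genes differentially expressed after sugar treatment, and the expression of key genes in the β-carotene biosynthesis pathway are not yet well understood. This study used Illumina HiSeqTM 2000 high-throughput sequencing to obtain transcriptome information for glucose-treated and untreated D. viridis. Differential expression analysis based on P value and fold change found 111 differentially expressed transcripts, of which 3 were up-regulated and 108 down-regulated. RT-qPCR was used to check the accuracy of the differential expression analysis, and the transcript expression results agreed with the transcriptome analysis. GO enrichment showed that 71 down-regulated transcripts are related to metabolism, accounting for 65.74% of all down-regulated transcripts. KEGG enrichment showed that 21 KEGG pathways contain 89 down-regulated transcripts and that 14 of these pathways are related to metabolism. Within metabolism, energy metabolism has the most pathways (6), containing 63 down-regulated transcripts. Within energy metabolism, photosynthesis has the most down-regulated transcripts (29). The analysis identified 2 pathways related to β-carotene biosynthesis (the MVA/MEP pathway and the β-carotene synthesis pathway) and their key genes hmgs, dxs, dxr, psy, pds, and chyb; none of these genes was differentially expressed. The study shows that glucose inhibits photosynthesis in D. viridis and impedes metabolism, but does not affect the β-carotene biosynthesis pathways or their key genes.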

11.
Prostate cancer is one of the most common male malignant neoplasms; however, its causes are not completely understood. A few recent studies have used gene expression profiling of prostate cancer to identify differentially expressed genes and possible relevant pathways. However, few studies have examined the genetic mechanisms of prostate cancer at the pathway level to search for such pathways. We used gene set enrichment analysis and a meta-analysis of six independent studies after standardized microarray preprocessing, which increased concordance between these gene datasets. Based on gene set enrichment analysis, there were 12 down- and 25 up-regulated mixing pathways in more than two tissue datasets, while there were two down- and two up-regulated mixing pathways in three cell datasets. Based on the meta-analysis, there were 46 and nine common pathways in the tissue and cell datasets, respectively. Three up- and 10 down-regulated crossing pathways were detected with combined gene set enrichment analysis and meta-analysis. We found that genes with small changes are difficult to detect by classic univariate statistics; they can more easily be identified by pathway analysis. After standardized microarray preprocessing, we applied gene set enrichment analysis and a meta-analysis to increase the concordance in identifying biological mechanisms involved in prostate cancer. The gene pathways that we identified could provide insight concerning the development of prostate cancer.
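
Gene set enrichment analysis, used throughout these abstracts, rests on a running-sum statistic over a ranked gene list. The sketch below shows the unweighted form of that statistic (equivalent to a Kolmogorov-Smirnov-style deviation); it is an illustration, not the configuration used in this study, which the abstract does not specify. Gene names are placeholders, and permutation-based significance testing is omitted.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up by 1/n_hit at each gene in the
    set and down by 1/n_miss otherwise; return the signed deviation from
    zero with the largest magnitude.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder ranking, most up-regulated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})     # set sits at the top
es_bottom = enrichment_score(ranked, {"g5", "g6"})  # set sits at the bottom
print(es_top, es_bottom)
```

Because every gene contributes a step, a set whose members each shift only slightly can still produce a large score, which is consistent with the abstract's remark that small changes are easier to detect at the pathway level.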

12.

Background  

Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statistics are combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.

13.
It has been suggested that pathway analysis can complement single-SNP analysis in exploring genomewide association data. Pathway analysis incorporates the available biological knowledge of genes and SNPs and is expected to improve the chances of revealing the underlying genetic architecture of complex traits. Methods for pathway analysis can be classified as competitive (enrichment) or self-contained (association) according to the hypothesis tested. Although association tests are statistically more powerful than enrichment tests they can be difficult to calibrate because biases in analysis accumulate across multiple SNPs or genes. Furthermore, enrichment tests can be more scientifically relevant than association tests, as they detect pathways with relatively more evidence for association than the remaining genes. Here we show how some well known association tests can be simply adapted to test for enrichment, and compare their performance to some established enrichment tests. We propose versions of the Adaptive Rank Truncated Product (ARTP), Tail Strength Measure and Fisher's combination of p-values for testing the enrichment null hypothesis. We compare the behaviour of these proposed methods with the established Hypergeometric Test and Gene-Set Enrichment Analysis (GSEA). The results of the simulation study show that the modified version of the ARTP method has generally the best performance across the situations considered. The methods were also applied for finding enriched pathways for body mass index (BMI) and platelet function phenotypes. The pathway analysis of BMI identified the Vasoactive Intestinal Peptide pathway as significantly associated with BMI. This pathway has been previously reported as associated with BMI and the risk of obesity. The ARTP method was the method that identified the largest number of enriched pathways across all tested pathway databases and phenotypes. 
The simulation and data application results are in agreement with previous work on association tests and suggest that the ARTP should be preferred for both enrichment and association testing.
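
Of the tests named in this abstract, Fisher's combination of p-values is the simplest to write down. The sketch below implements the classical form, which tests the self-contained (association) null hypothesis under the assumption of independent p-values. It is not the enrichment-adapted version the authors propose, whose details the abstract does not give. The function name is hypothetical.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Classical Fisher combination of independent p-values.

    The statistic -2 * sum(ln p_i) follows a chi-squared distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    stat = -2.0 * sum(log(p) for p in p_values)
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, exp(-half) * total


stat, combined = fisher_combined_p([0.5, 0.5])
print(round(stat, 4), round(combined, 4))
```

Correlation between SNPs or genes in the same pathway violates the independence assumption, which is one reason the abstract notes that association tests can be difficult to calibrate.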

14.
15.
High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to efficiently integrate genomic data and metadata. Such NPR models consider multiple pathways simultaneously and allow complex interactions among genes within the pathways and can be applied to identify pathways and genes that are related to variations of the phenotypes. These methods also provide an alternative to mediating the problem of a large number of potential interactions by limiting analysis to biologically plausible interactions between genes in related pathways. Our simulation studies indicate that the proposed boosting procedure can indeed identify relevant pathways. Application to a gene expression data set on breast cancer distant metastasis identified that Wnt, apoptosis, and cell cycle-regulated pathways are more likely related to the risk of distant metastasis among lymph-node-negative breast cancer patients. Results from analysis of two other breast cancer gene expression data sets indicate that the pathways of Metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance are important to breast cancer relapse and survival. We also observed that by incorporating the pathway information, we achieved better prediction for cancer recurrence.

16.
Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the maximum and the sum of chi-squared statistics, which measure the strongest and the average association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.

17.
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of activities of pathways from microarray gene expression data based on the Sparse Probabilistic Principal Component Analysis (SPPCA). A pathway activity logistic regression model is then formulated for cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for the parameter estimation and derive a model selection criterion for selecting tuning parameters included in the model estimation. Our proposed method can also reverse-engineer gene networks based on the identified multiple pathways, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through the analysis of breast cancer gene expression data.

18.
MOTIVATION: Gene expression profiling experiments in cell lines and animal models characterized by specific genetic or molecular perturbations have yielded sets of genes annotated by the perturbation. These gene sets can serve as a reference base for interrogating other expression datasets. For example, a new dataset in which a specific pathway gene set appears to be enriched, in terms of multiple genes in that set evidencing expression changes, can then be annotated by that reference pathway. We introduce in this paper a formal statistical method to measure the enrichment of each sample in an expression dataset. This allows us to assay the natural variation of pathway activity in observed gene expression data sets from clinical cancer and other studies. RESULTS: Validation of the method and illustrations of biological insights gleaned are demonstrated on cell line data, mouse models, and cancer-related datasets. Using oncogenic pathway signatures, we show that gene sets built from a model system are indeed enriched in the model system. We employ ASSESS for the use of molecular classification by pathways. This provides an accurate classifier that can be interpreted at the level of pathways instead of individual genes. Finally, ASSESS can be used for cross-platform expression models where data on the same type of cancer are integrated over different platforms into a space of enrichment scores. AVAILABILITY: Versions are available in Octave and Java (with a graphical user interface). Software can be downloaded at http://people.genome.duke.edu/assess.

19.
Analysis of variance components in gene expression data   Citations: 5 total (0 self-citations, 5 by others)
MOTIVATION: A microarray experiment is a multi-step process, and each step is a potential source of variation. There are two major sources of variation: biological variation and technical variation. This study presents a variance-components approach to investigating animal-to-animal, between-array, within-array and day-to-day variations for two data sets. The first data set involved estimation of technical variances for pooled control and pooled treated RNA samples. The variance components included between-array, and two nested within-array variances: between-section (the upper- and lower-sections of the array are replicates) and within-section (two adjacent spots of the same gene are printed within each section). The second experiment was conducted on four different weeks. Each week there were reference and test samples with a dye-flip replicate in two hybridization days. The variance components included week-to-week, animal-to-animal and between-array and within-array variances. RESULTS: We applied the linear mixed-effects model to quantify different sources of variation. In the first data set, we found that the between-array variance is greater than the between-section variance, which, in turn, is greater than the within-section variance. In the second data set, for the reference samples, the week-to-week variance is larger than the between-array variance, which, in turn, is slightly larger than the within-array variance. For the test samples, the week-to-week variance has the largest variation. The animal-to-animal variance is slightly larger than the between-array and within-array variances. However, in a gene-by-gene analysis, the animal-to-animal variance is smaller than the between-array variance in four out of five housekeeping genes. In summary, the largest variation observed is the week-to-week effect. Another important source of variability is the animal-to-animal variation. 
Finally, we describe the use of variance-component estimates to determine optimal numbers of animals, arrays per animal and sections per array in planning microarray experiments.
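
To make the variance-components idea concrete, the sketch below estimates a between-group and a within-group variance for a balanced one-way layout (for example, arrays nested within animals) using ANOVA method-of-moments estimators. This is a deliberate simplification: the study fits a linear mixed-effects model with several nested components, and its estimation method is not stated in the abstract. The second function shows how such estimates feed into design planning. All names and numbers are hypothetical.

```python
from statistics import mean


def one_way_variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random-effects layout.

    groups is a list of equally sized lists (e.g. replicate arrays per animal).
    Returns (between_group_variance, within_group_variance).
    """
    n = len(groups[0])
    a = len(groups)
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")

    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Mean squares between and within groups.
    msb = n * sum((m - grand) ** 2 for m in group_means) / (a - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (a * (n - 1))
    # E[MSB] = within + n * between, so solve for between; truncate at zero.
    between = max((msb - msw) / n, 0.0)
    return between, msw


def variance_of_mean(between, within, n_groups, n_per_group):
    """Variance of the overall mean for a candidate experimental design."""
    return between / n_groups + within / (n_groups * n_per_group)


# Toy log-expression values: 3 animals, 2 arrays each (placeholder numbers).
between, within = one_way_variance_components([[1, 3], [5, 7], [9, 11]])
print(between, within, variance_of_mean(between, within, 3, 2))
```

When the between-animal component dominates, as in this toy example, adding animals reduces the variance of the mean far more than adding arrays per animal does.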

20.
MOTIVATION: The numerical values of gene expression measured using microarrays are usually presented to the biological end-user as summary statistics of spot pixel data, such as the spot mean, median and mode. Much of the subsequent data analysis reported in the literature, however, uses only one of these spot statistics. This results in sub-optimal estimates of gene expression levels and a need for improvement in quantitative spot variation surveillance. RESULTS: This paper develops a maximum-likelihood method for estimating gene expression using spot mean, variance and pixel number values available from typical microarray scanners. It employs a hierarchical model of variation between and within microarray spots. The hierarchical maximum-likelihood estimate (MLE) is shown to be a more efficient estimator of the mean than the 'conventional' estimate using solely the spot mean values (i.e. without spot variance data). Furthermore, under the assumptions of our model, the spot mean and spot variance are shown to be sufficient statistics that do not require the use of all pixel data. The hierarchical MLE method is applied to data from both Monte Carlo (MC) simulations and a two-channel dye-swapped spotted microarray experiment. The MC simulations show that the hierarchical MLE method leads to improved detection of differential gene expression, particularly when 'outlier' spots are present on the arrays. Compared with the conventional method, the MLE method applied to data from the microarray experiment leads to an increase in the number of differentially expressed genes detected for low cut-off P-values of interest.
