Similar Articles
1.
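Yang Min, Zhang Jing. Chinese Journal of Bioinformatics (《生物信息学》), 2014, 12(1): 65-71
Transcriptional regulation is the main process by which gene expression is regulated. Differences in how genes use transcriptional regulatory motifs may be one of the factors behind tissue-specific gene expression. This paper proposes a statistical method for analyzing differences in gene regulation across tissues. First, the Poisson distribution is combined with principal component analysis to extract over-represented motifs in gene promoters as potential transcription factor binding sites. Based on these sites, the Wilcoxon rank-sum test is used to identify differences in gene structure between tissues. The hypergeometric distribution is then used to select motifs with significant occurrence counts as tissue-specific motifs. The base composition of these specific motifs and their positional distribution within promoter sequences are analyzed. Comparing the specific motifs against the TRANSFAC database yields potential transcription factor binding sites that regulate tissue-specific genes. Human housekeeping genes and genes specific to 30 tissues were analyzed, yielding information on how regulatory motif usage differs across tissues.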
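
The abstract says the hypergeometric distribution is used to decide whether a motif occurs significantly often in a tissue's gene set. Below is a minimal sketch of that upper-tail test, written with only the Python standard library. The framing is a hypothetical assumption and not taken from the paper: M genes in total, K of which carry the motif, n genes in the tissue set, and k of those carrying the motif.

```python
from math import comb

def hypergeom_upper_tail(k, M, K, n):
    """P(X >= k) for X ~ Hypergeometric(M, K, n).

    Hypothetical framing: M genes in total (tissue set plus background), K of
    which contain the motif; n genes in the tissue set, k of which contain it.
    """
    upper = min(K, n)
    # Sum exact integer counts of favourable draws, then divide once.
    hits = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return hits / comb(M, n)

# Hypothetical counts: 200 genes, 40 carry the motif; 30 tissue genes, 15 carry it.
p = hypergeom_upper_tail(15, 200, 40, 30)
print(f"P(X >= 15) = {p:.3g}")
```

A small p-value flags the motif as a candidate tissue-specific motif. With many motifs tested, some multiple-testing correction would normally be applied on top of this.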

2.
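Statistical Analysis of Combinatorial Transcriptional Regulatory Sites in Yeast Ribosomal Protein Genes   Citations: 1 (self-citations: 1; by others: 0)
Tian Ruiqin, Zhang Jing, Hu Jun. Chinese Journal of Bioinformatics (《生物信息学》), 2010, 8(2): 127-133
Transcriptional regulation of eukaryotic genes is one of the central problems of the post-genomic era. Its foundation is identifying transcription factor binding sites (motifs) on DNA and how they are distributed. Using a Markov chain model, we counted motif occurrences in the upstream promoter sequences of yeast ribosomal protein genes. We then used the Z-score statistic to extract over-represented and under-represented motifs; 95% of these motifs agree with experimentally determined transcription factor binding sites. Next, the extracted motifs were paired and compared with background sequences. This identified motif pairs in yeast ribosomal protein genes whose occurrence probability and distance distribution are both statistically significant. These non-randomly occurring motif pairs potentially have combinatorial transcriptional regulatory functions, and the combinatorial regulatory roles of some of them have experimental support. We also analyzed where the extracted motif pairs lie in the sequences. Nearly 94% of the pairs lie upstream of the transcription start site. For more than half of the pairs, the shortest distance between the two motifs is 0-100 bp, and nearly 30% of pairs are less than 30 bp apart. Such short spacing favors cooperative action between the two motifs. These results will contribute to a deeper understanding of the transcriptional regulatory mechanisms of yeast ribosomal protein genes.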
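
The abstract describes counting motif occurrences, comparing them with a Markov chain background, and scoring the difference with a Z-score. The sketch below illustrates that idea under stated assumptions:

- It uses a first-order Markov model and the classical expected-count estimator.
- It uses a simple Poisson-style variance.

The paper's model order and exact variance formula are not given in the abstract, so these are illustrative choices.

```python
from collections import Counter
import math

def count_overlapping(seq, word):
    """Observed number of (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return sum(seq[i:i + k] == word for i in range(len(seq) - k + 1))

def markov1_expected(seq, word):
    """Expected count of word under a first-order Markov model fitted to seq.

    Classical estimator: E(w) = prod N(w_i w_{i+1}) / prod N(w_i), with the
    denominator taken over the interior letters of w (an approximation).
    """
    n1 = Counter(seq)                                        # letter counts
    n2 = Counter(seq[i:i + 2] for i in range(len(seq) - 1))  # dinucleotide counts
    num = math.prod(n2[word[i:i + 2]] for i in range(len(word) - 1))
    den = math.prod(n1[c] for c in word[1:-1])
    return num / den if den else float("nan")

def z_score(seq, word):
    """Z = (observed - expected) / sqrt(expected); Poisson-style approximation."""
    obs = count_overlapping(seq, word)
    exp = markov1_expected(seq, word)
    if not exp > 0:  # also catches NaN
        return float("nan")
    return (obs - exp) / math.sqrt(exp)

# Toy sequence (hypothetical); real use would pool many promoter sequences.
print(z_score("ACGTACGTACGT", "ACG"))
```

Large positive Z marks over-represented words and large negative Z marks under-represented ones. An exact treatment would replace sqrt(expected) with the word's true variance, which accounts for self-overlap.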

3.
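Zhou Rongge, Zhang Jing. Chinese Journal of Bioinformatics (《生物信息学》), 2011, 9(2): 120-124, 130
Identifying transcription factor binding sites (motifs) of eukaryotic genes is a major task of the post-genomic era. Analyzing co-expressed or co-regulated genes together can improve the accuracy of motif identification. This paper uses a log-linear model of 2×2 contingency tables, with counts given by the number of genes in which a motif occurs. With it, the paper analyzes the transcriptional regulatory motifs commonly used by yeast ribosomal protein (RP) genes. It then applies the u-test to further select motifs that are over-represented relative to background sequences. These motifs are potential transcriptional regulatory elements of yeast RP genes, and 90% of them agree with experimentally determined transcription factor binding sites. The method's advantage is that it applies a strict statistical criterion when searching a set of gene promoters for commonly used motifs. This avoids the vague judgments about how widely a motif is used that earlier analyses relied on. The method can also effectively search co-expressed gene families for combinatorial regulatory motif pairs. The study also found a clear correlation between two quantities: the Pearson correlation coefficient, which measures association between the attributes of a 2×2 contingency table, and the interaction effect of the log-linear model. This suggests that the log-linear interaction effect can be used to assess the association between two attributes.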
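
The abstract reports that the Pearson (phi) correlation of a 2×2 table tracks the interaction effect of the log-linear model. The sketch below computes both quantities for a table.

- Under sum-to-zero (effect) coding of the saturated model, the interaction term is one quarter of the log odds ratio.
- The mapping of rows and columns is a hypothetical assumption: rows are motif present/absent, columns are target genes/background genes.
- The optional pseudocount for zero cells is an added convenience and is not from the paper.

```python
import math

def loglinear_interaction(n11, n12, n21, n22, pseudo=0.0):
    """Interaction lambda^{AB}_{11} of the saturated log-linear model with
    sum-to-zero coding, i.e. 0.25 * log odds ratio. Use pseudo > 0 for zero cells."""
    a, b, c, d = (x + pseudo for x in (n11, n12, n21, n22))
    return 0.25 * math.log((a * d) / (b * c))

def phi_coefficient(n11, n12, n21, n22):
    """Pearson correlation between the two binary attributes of a 2x2 table."""
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    return (n11 * n22 - n12 * n21) / math.sqrt(r1 * r2 * c1 * c2)

# Hypothetical tables with increasing association: both measures rise together.
for table in [(10, 10, 10, 10), (12, 8, 8, 12), (15, 5, 5, 15)]:
    print(table, round(loglinear_interaction(*table), 3), round(phi_coefficient(*table), 3))
```

Both measures are zero exactly when n11·n22 = n12·n21, and they share the same sign. This is consistent with the correlation the authors observed.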

4.
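Uncovering the mechanisms of eukaryotic transcriptional regulation is an important research topic in bioinformatics. A key feature of transcriptional regulation is that genes are regulated combinatorially by multiple transcription factors. Systems biology and mathematical modeling can be used to identify the transcription factor binding sites involved in such combinatorial regulation. In that process, testing the distances between over-represented motif pairs is one of the key steps. This paper reviews three methods for testing the distances of combinatorially regulating motif pairs and gives the mathematical model and concrete testing procedure for each. It provides theoretical support for studying combinatorial gene regulation and for detecting potential over-represented motif pairs.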
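
The abstract does not name the three distance tests it reviews. As a hedged illustration of what a motif-pair distance test does, here is a generic Monte Carlo test. It may not match any of the reviewed methods.

- **Statistic:** the number of sequences in which the closest A–B pair lies within `d_max` bp.
- **Null model:** B sites are relocated uniformly at random while A sites stay fixed.

The parameter names, the start-position distance definition, and the uniform null are all assumptions.

```python
import random

def min_distance(pos_a, pos_b):
    """Smallest start-to-start distance between any A site and any B site."""
    return min(abs(a - b) for a in pos_a for b in pos_b)

def distance_test(pairs, seq_len, width, d_max=30, n_sim=2000, seed=0):
    """Monte Carlo test of whether motifs A and B lie unusually close together.

    pairs: list of (A start positions, B start positions), one per sequence.
    Returns (observed statistic, Monte Carlo p-value).
    """
    rng = random.Random(seed)

    def stat(ps):
        return sum(min_distance(a, b) <= d_max for a, b in ps)

    observed = stat(pairs)
    hits = 0
    for _ in range(n_sim):
        # Null: keep A fixed, place each B site uniformly among valid starts.
        shuffled = [(a, [rng.randrange(seq_len - width + 1) for _ in b]) for a, b in pairs]
        if stat(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)

# Hypothetical data: in 20 promoters of length 1000, B sits 5 bp from A.
obs, p = distance_test([([100], [105])] * 20, seq_len=1000, width=8, n_sim=500)
print(obs, p)
```

The add-one correction in the p-value keeps the estimate strictly positive, which is the usual convention for Monte Carlo tests.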

5.
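Studies have suggested that the first intron may take part in transcriptional regulation of genes. Statistical methods were used to extract potential combinatorial transcriptional regulatory motifs from the region spanning the upstream sequence through the first intron of human housekeeping genes. Features such as the distances between motifs and their regional distribution were then analyzed to explore whether, and how, introns take part in transcriptional regulation. A total of 960 potential regulatory motif pairs were obtained in housekeeping genes. Of these, 57% match experimentally known factor pairs with transcriptional interactions, involving 12 groups of factor pairs. The great majority of motif pairs (80%) are biased toward the upstream region and the "upstream-intron" region. This further supports the hypothesis that introns take part in transcriptional regulation and suggests transcriptional synergy between introns and upstream sequences. Motifs are relatively concentrated near the transcription start site (TSS). The two motifs of a pair also lie close together: about 60% are within 200 bp of each other, and 65% of motif pairs have a characteristic distance within 100 bp. Such short spacing favors cooperation between transcription factors. These results will contribute to a deeper understanding of human transcriptional regulatory mechanisms and intron function.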

6.
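Mathematical statistics is the scientific method for analyzing statistical data, processing experimental data, and revealing underlying regularities behind chance phenomena. Statistical processing involves large amounts of data and tedious, error-prone calculations, so electronic computers are now widely used for all kinds of problems in mathematical statistics. The ST-1 software recently released by the Technical Office of the Institute of Genetics, Chinese Academy of Sciences, was designed to provide programs for commonly used statistical methods. The programs cover data organization, numerical computation of common distribution functions, interval estimation of parameters, statistical tests (u-test, t-test, χ² test, F-test), analysis of variance, regression analysis, and the other main …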

7.
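Effects of Spawning Stock Size and Environment on the Recruitment Success of Small Yellow Croaker in the East China Sea   Citations: 4 (self-citations: 0; by others: 4)
Recruitment success can usually be explained by several hypothesized mechanisms, and model selection supports a particular hypothesis by choosing the best model. However, model selection ignores model uncertainty, so applying the results of a single model to the management of a declining species may not be effective. This study used spawning stock and recruitment abundance data for small yellow croaker from 1992-2012. These data came from East China Sea marine fishery statistics, fishery resource monitoring, and synoptic fishery resource surveys. The study also used hydrological and environmental data for the northern East China Sea in May-August of the same years: sea surface temperature (SST), meridional wind stress (MWS), zonal wind stress (ZWS), sea level pressure (SSP), and Yangtze River runoff (RCR). Three independent model selection methods (AIC, maximum adjusted R², and variable significance) were used to optimize competing models. The selection results were then used to identify the significant factors affecting the croaker's recruitment success. In parallel, Bayesian model averaging (BMA) was used to probabilistically combine the candidate variables under an assumption of model uncertainty. The overall performance of the BMA-based and the standard model-selection forecasting systems was evaluated with three probabilistic verification measures: mean absolute error, mean squared prediction error, and the continuous ranked probability score.

The three model selection methods produced different model forms:
- AIC selected spawning stock and meridional wind stress as predictors.
- The variable-significance method selected spawning stock only.
- Maximum adjusted R² selected spawning stock, meridional wind stress, and Yangtze River runoff.

Spawning stock had a significant negative linear relationship with recruitment success (P<0.01). This indicates that the population may regulate recruitment success through overcompensation effects such as cannibalism and competition for food. Meridional wind stress and Yangtze River runoff had marginally significant positive (P=0.06) and negative (P=0.07) effects on recruitment success, respectively. On mean absolute error and the continuous ranked probability score, BMA scored lowest and the variable-significance method highest. On mean squared prediction error, the maximum adjusted R² model was the most accurate. A BMA-based stock-recruitment ensemble forecast not only provides accurate forecast means but also quantifies forecast uncertainty through probability distributions.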
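
The following is a minimal sketch of Bayesian model averaging over linear regressions. It assumes the common BIC approximation to posterior model probabilities, w_k ∝ exp(-BIC_k/2), with equal prior model probabilities. The abstract does not say which BMA implementation or priors the study used. The variable names and the data below are purely synthetic stand-ins, not the survey data.

```python
import itertools
import numpy as np

def ols_bic(X, y):
    """OLS with an intercept; returns (coefficients, BIC) under Gaussian errors."""
    n = len(y)
    design = np.hstack([np.ones((n, 1)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
    return beta, bic

def bma(X, y, names):
    """Enumerate all predictor subsets and weight each model by exp(-BIC/2)."""
    p = X.shape[1]
    models, bics = [], []
    for r in range(p + 1):
        for subset in itertools.combinations(range(p), r):
            _, bic = ols_bic(X[:, list(subset)], y)
            models.append(subset)
            bics.append(bic)
    bics = np.array(bics)
    w = np.exp(-(bics - bics.min()) / 2.0)  # subtract min for numerical stability
    w /= w.sum()
    # Posterior inclusion probability: total weight of models containing each variable.
    incl = {names[j]: float(sum(wi for s, wi in zip(models, w) if j in s)) for j in range(p)}
    return models, w, incl

# Synthetic example: 21 "years", recruitment driven only by the first predictor.
rng = np.random.default_rng(0)
n = 21
X = rng.normal(size=(n, 3))
y = -1.5 * X[:, 0] + rng.normal(scale=0.5, size=n)
models, w, incl = bma(X, y, ["stock", "MWS", "RCR"])
print(incl)
```

Model-averaged predictions would be the weight-averaged predictions of the individual fitted models. Exhaustive enumeration is feasible here because five predictors give only 32 models.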

8.
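Hereditas (Beijing) (《遗传》), 1985, 7(5): 22
Mathematical statistics is the scientific method for analyzing statistical data, processing experimental data, and revealing underlying regularities behind chance phenomena. Statistical processing involves large amounts of data and tedious, error-prone calculations, so electronic computers are now widely used for all kinds of problems in mathematical statistics. The ST-1 software recently released by the Technical Office of the Institute of Genetics, Chinese Academy of Sciences, was designed to provide programs for commonly used statistical methods. The programs cover data organization, numerical computation of common distribution functions, interval estimation of parameters, statistical tests (u-test, t-test, χ² test, F-test), analysis of variance, regression analysis, and the other main topics of mathematical statistics. There are 31 programs in total, all written in BASIC. To use them, load the required program from floppy disk and enter the data as prompted to obtain the results.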

9.
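Tang Zaixiang, Wang Xuefeng, Wu Wenwen, Xu Chenwu. Hereditas (Beijing) (《遗传》), 2006, 28(9): 1117-1122
The Bayesian school is an important school of statistics distinct from classical mathematical statistics, and the Bayesian methods it has developed are widely applied in many fields of modern science. This paper explores applications of Bayesian statistics to genetic linkage analysis. These include Bayesian estimation of the recombination fraction, Bayes factor tests for genetic linkage, and construction of genetic linkage maps based on Markov chain Monte Carlo methods. Simulation studies and case analyses were carried out with purpose-written SAS/IML programs. They verified the effectiveness and practicality of Bayesian methods in genetic linkage analysis.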
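
As an illustration of the first two applications, here is a sketch of Bayesian estimation of the recombination fraction r and a Bayes factor for linkage. It rests on several assumptions not stated in the abstract:

- A backcross design with k recombinants among n progeny.
- A Uniform(0, 0.5) prior on r.
- A point null of no linkage, r = 0.5.

It is written in Python with midpoint-rule integration rather than the authors' SAS/IML programs.

```python
import math

def loglik(r, k, n):
    """Backcross log-likelihood (up to a constant): k recombinants among n progeny."""
    return k * math.log(r) + (n - k) * math.log1p(-r)

def bayes_recombination(k, n, grid=20000):
    """Posterior mean of r under a Uniform(0, 0.5) prior, and log Bayes factor
    for linkage (H1: r ~ U(0, 0.5)) against no linkage (H0: r = 0.5)."""
    h = 0.5 / grid
    rs = [(i + 0.5) * h for i in range(grid)]   # midpoint rule on (0, 0.5)
    logs = [loglik(r, k, n) for r in rs]
    m = max(logs)
    w = [math.exp(v - m) for v in logs]          # rescaled to avoid underflow
    s = sum(w)
    post_mean = sum(r * wi for r, wi in zip(rs, w)) / s
    # Marginal likelihood under H1 is the integral of 2 * L(r); note 2 * h = 1 / grid.
    log_m1 = m + math.log(s / grid)
    log_bf10 = log_m1 - loglik(0.5, k, n)
    return post_mean, log_bf10

# Hypothetical data: 10 recombinants among 100 backcross progeny.
print(bayes_recombination(10, 100))
```

With a uniform prior, the posterior for r is a Beta distribution truncated to (0, 0.5). For k = 10 and n = 100 its mean is very close to 11/102. A positive log Bayes factor favors linkage.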

10.
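As marker information is increasingly applied in livestock breeding, many genomic selection (GS) methods let breeders select animals early using their genotype data. Combining pedigree, phenotype, and genotype data can shorten the generation interval of livestock and improve the accuracy of genetic value estimates, which accelerates genetic improvement. Multi-step genomic selection strategies are widely used, but in recent years the field has increasingly favored single-step genetic evaluation, which adds genomic information to the pedigree relationship matrix. Common GS methods such as GBLUP are still multi-step, yet genomic evaluation based on single-step genomic models can give more accurate results. This study introduces a single-step Bayesian method that uses a Bayesian regression model to directly estimate single nucleotide polymorphism (SNP) effects, and it evaluates the model's performance by simulation. The number of QTL had no effect on the accuracy of the single-step Bayesian method, but heritability did affect it. Accuracy also increased with the number of sequenced individuals. We also discuss issues in using the single-step Bayesian method and describe in detail some of the related statistical theory and algorithms.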

11.
Kenneth Lange. Genetica, 1995, 96(1-2): 107-117
The Dirichlet distribution provides a convenient conjugate prior for Bayesian analyses involving multinomial proportions. In particular, allele frequency estimation can be carried out with a Dirichlet prior. If data from several distinct populations are available, the parameters of the Dirichlet prior can be estimated by maximum likelihood and then used for allele frequency estimation in each separate population. This empirical Bayes procedure tends to moderate extreme multinomial estimates based on sample proportions. The Dirichlet distribution can also be used to model the contributions of different ancestral populations when computing forensic match probabilities. If the ancestral populations are in genetic equilibrium, the product rule for computing match probabilities is valid conditional on the ancestral contributions to a typical person of the reference population. This fact simplifies computation of match probabilities and of tight upper bounds on them.

Editor's comments: The author continues the formal Bayesian analysis introduced by Gjertson & Morris in this volume. He invokes Dirichlet distributions, and so brings rigor to the discussion of the effects of population structure on match probabilities. The increased computational burden this approach entails should not be regarded as a hindrance.
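
The empirical Bayes procedure described above can be sketched in two steps:

1. Estimate the Dirichlet parameters α from allele counts in several populations by maximum likelihood.
2. Shrink each population's frequencies toward the pooled pattern with the posterior mean (n_k + α_k) / (N + Σα).

The maximization step uses Minka's fixed-point iteration for the Dirichlet-multinomial likelihood. That is a swapped-in optimizer chosen for brevity; the abstract does not say how the author maximizes the likelihood. The counts in the usage lines are hypothetical.

```python
import numpy as np
from scipy.special import digamma

def fit_dirichlet_prior(counts, n_iter=1000, tol=1e-10):
    """Maximum-likelihood Dirichlet parameters from allele counts (rows are
    populations, columns are alleles), via Minka's fixed-point iteration
    for the Dirichlet-multinomial likelihood."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    alpha = np.ones(counts.shape[1])
    for _ in range(n_iter):
        a0 = alpha.sum()
        num = (digamma(counts + alpha) - digamma(alpha)).sum(axis=0)
        den = (digamma(totals + a0) - digamma(a0)).sum()
        new = alpha * num / den
        converged = np.max(np.abs(new - alpha)) < tol
        alpha = new
        if converged:
            break
    return alpha

def eb_allele_freqs(counts, alpha):
    """Posterior-mean allele frequencies for each population given the prior."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha.sum())

# Hypothetical two-allele counts in three populations; population 0 never shows allele 2.
counts = [[20, 0], [15, 5], [10, 10]]
alpha = fit_dirichlet_prior(counts)
print(alpha, eb_allele_freqs(counts, alpha))
```

The shrinkage moves population 0's sample estimate of exactly 0 for the second allele to a small positive value. This is the moderation of extreme multinomial estimates the abstract describes.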

12.
In this work we present a web-based tool for estimating multiple alignment quality using Bayesian hypothesis testing. The proposed method is very simple, easy to implement, and fast, with linear time complexity. We evaluated the method against a series of different alignments (a set of random and biologically derived alignments) and compared the results with tools based on classical statistical methods (such as sFFT and csFFT). Taking the correlation coefficient as an objective criterion of true quality, we found that Bayesian hypothesis testing performed better on average than the classical methods we tested. This approach may be used independently or as a component of any computational biology tool that relies on statistical estimation of alignment quality. AVAILABILITY: http://www.fmi.ch/groups/functional.genomics/tool.htm. SUPPLEMENTARY INFORMATION: Supplementary data are available from http://www.fmi.ch/groups/functional.genomics/tool-Supp.htm.

13.
Segmentation aims to divide sequential data into homogeneous segments and plays a central role in data mining. Its applications range from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application areas. In this paper, we present a novel application of segmentation: locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requiring user-given parameters. To perform the segmentation within a reasonable time, we use heuristics. Most heuristic segmentation algorithms require a decision on the number of segments. This decision is usually made with asymptotic model selection methods such as the Bayesian information criterion. Such methods rely on simplifications that can limit their use. In this paper, we propose a Bayesian model selection method to choose the most appropriate result from heuristic segmentation. Our Bayesian model uses a simple prior over segmentation solutions with various numbers of segments and a modified Dirichlet prior for modeling multinomial data. We show on various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. Applying our method to yeast cell-cycle gene expression data reveals potentially active and passive regions of the genome.

14.
The restricted mean survival time (RMST) is the expected survival time truncated at a prespecified time point. It is used because the mean survival time is typically not estimable in the presence of censoring. Frequentist inference for RMST has been widely advocated for comparing two survival curves, while research from the Bayesian perspective is rather limited. For the RMST of both right- and interval-censored data, we propose Bayesian nonparametric estimation and inference procedures. By assigning a mixture of Dirichlet processes (MDP) prior to the distribution function, we can estimate the posterior distribution of RMST. We also explore another Bayesian nonparametric approach using the Dirichlet process mixture model and compare both with the frequentist nonparametric method. Simulation studies show that the Bayesian nonparametric RMST gives robust estimates under diffuse MDP priors. Under informative priors, it can incorporate prior knowledge into the nonparametric estimator. Analysis of real trial examples demonstrates the flexibility and interpretability of the Bayesian nonparametric RMST for both right- and interval-censored data.
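
For orientation, the frequentist nonparametric comparator mentioned above computes RMST as the area under the Kaplan-Meier curve from 0 to τ. Here is a minimal standard-library sketch for right-censored data only. It is not the Bayesian MDP or Dirichlet process mixture procedure proposed in the abstract, and it does not handle interval censoring.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau].

    times: observed times; events: 1 if the event occurred, 0 if right-censored.
    """
    data = sorted(zip(times, events), key=lambda te: te[0])
    at_risk = len(data)
    surv, prev_t, area, i = 1.0, 0.0, 0.0, 0
    while i < len(data):
        t = data[i][0]
        if t > tau:
            break
        area += surv * (t - prev_t)      # rectangle under the current KM step
        prev_t = t
        d = c = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1.0 - d / at_risk    # KM update uses events only
        at_risk -= d + c                 # events and censorings both leave the risk set
    area += surv * (tau - prev_t)        # final step up to tau
    return area

# Hypothetical data with one censored observation at t = 2.
print(km_rmst([1, 2, 2, 3], [1, 0, 1, 1], tau=3))
```

Without censoring, the result equals the sample mean of min(T, τ). For example, times 1, 2, 3, 4 with τ = 5 give 2.5.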

15.
We present a Bayesian approach to analyzing matched "case-control" data with multiple disease states. The probability of disease development is described by a multinomial logistic regression model. The exposure distribution depends on the disease state and could vary across strata. In such a model, the number of stratum effect parameters grows in direct proportion to the sample size. This leads to inconsistent MLEs for the parameters of interest even when a retrospective conditional likelihood is used. We adopt a semiparametric Bayesian framework instead, assuming a Dirichlet process prior with a mixing normal distribution on the distribution of the stratum effects. We also account for possible missingness in the exposure variable in our model. Estimation is carried out with a Markov chain Monte Carlo numerical integration scheme. The proposed methodology is illustrated through simulation and through a matched study of low birth weight in newborns (Hosmer, D. W. and Lemeshow, S., 2000, Applied Logistic Regression), with two possible disease groups matched with a control group.

16.
A Model for Analysis of Population Structure   Citations: 5 (self-citations: 3; by others: 2)
Arguments are presented for the appropriateness of a multinomial Dirichlet distribution for describing single-locus genotypic frequencies in a subdivided population. This distribution is defined as a function of three quantities: the allele frequency, the average inbreeding coefficient over the entire population, and the correlation between genotypes within a subdivision. Alternative parameterizations and their genetic interpretations are given. We then show how information from a sample drawn from this subdivided population, in the absence of pedigrees, can be combined with the multinomial Dirichlet model to form a likelihood function. This likelihood function is then used as the basis for estimating, and testing hypotheses about, the genetic parameters of the model. The approach is compared with the alternative procedure of Cockerham (1969, 1973) using human data from Tecumseh, Michigan, and Monte Carlo simulations. Finally, implications of these results for statistical inference and for mutation rates are presented.
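
The core of the likelihood described above is the multinomial Dirichlet (Dirichlet-multinomial) probability of the counts observed in one subdivision. Below is a sketch for allele counts. It assumes the common parameterization α_k = p_k(1 - θ)/θ, under which θ = 1/(1 + Σα) is the within-subdivision correlation. The paper's genotype-level model, which also involves the inbreeding coefficient, is richer than this, and the function name is hypothetical.

```python
from math import lgamma, exp

def dm_logpmf(x, p, theta):
    """Log-probability of allele counts x in one subdivision under a
    Dirichlet-multinomial with mean frequencies p and correlation theta,
    using alpha_k = p_k * (1 - theta) / theta (assumed parameterization)."""
    a = [pk * (1 - theta) / theta for pk in p]
    a0 = sum(a)
    n = sum(x)
    out = lgamma(a0) - lgamma(n + a0) + lgamma(n + 1)
    for xk, ak in zip(x, a):
        out += lgamma(xk + ak) - lgamma(ak) - lgamma(xk + 1)
    return out

# Hypothetical subdivisions: summing log-probabilities gives a sample log-likelihood.
samples = [[7, 3], [2, 8], [5, 5]]
print(sum(dm_logpmf(x, [0.4, 0.6], theta=0.05) for x in samples))
```

Maximizing this sum over p and θ gives likelihood estimates in the spirit of the abstract. For a single draw (n = 1), the probability reduces to p_k, as it should.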

17.
We propose a Bayesian hypothesis testing procedure for comparing the distributions of paired samples. The procedure is based on a flexible model for the joint distribution of both samples. The flexibility is given by a mixture of Dirichlet processes. Our proposal uses a spike-slab prior specification for the base measure of the Dirichlet process and a particular parametrization for the kernel of the mixture in order to facilitate comparisons and posterior inference. The joint model allows us to derive the marginal distributions and test whether they differ or not. The procedure exploits the correlation between samples, relaxes the parametric assumptions, and detects possible differences throughout the entire distributions. A Monte Carlo simulation study comparing the performance of this strategy to other traditional alternatives is provided. Finally, we apply the proposed approach to spirometry data collected in the United States to investigate changes in pulmonary function in children and adolescents in response to air polluting factors.

18.
We modified the phylogenetic program MrBayes 3.1.2 to incorporate the compound Dirichlet priors for branch lengths proposed recently by Rannala, Zhu, and Yang (2012. Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Mol. Biol. Evol. 29:325-335.) as a solution to the problem of branch-length overestimation in Bayesian phylogenetic inference. The compound Dirichlet prior specifies a fairly diffuse prior on the tree length (the sum of branch lengths) and uses a Dirichlet distribution to partition the tree length into branch lengths. Six problematic data sets originally analyzed by Brown, Hedtke, Lemmon, and Lemmon (2010. When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates. Syst. Biol. 59:145-161) are reanalyzed using the modified version of MrBayes to investigate properties of Bayesian branch-length estimation using the new priors. While the default exponential priors for branch lengths produced extremely long trees, the compound Dirichlet priors produced posterior estimates that are much closer to the maximum likelihood estimates. Furthermore, the posterior tree lengths were quite robust to changes in the parameter values in the compound Dirichlet priors, for example, when the prior mean of tree length changed over several orders of magnitude. Our results suggest that the compound Dirichlet priors may be useful for correcting branch-length overestimation in phylogenetic analyses of empirical data sets.
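
The structure of the compound Dirichlet prior described above can be sketched as two steps:

1. Draw a tree length T from a gamma distribution.
2. Split T among the 2s - 3 branches of an unrooted binary tree with s taxa using a Dirichlet distribution.

This is a simplified sketch with NumPy. The gamma shape/rate defaults and the symmetric concentration are hypothetical illustration values, not MrBayes defaults. The full Rannala, Zhu, and Yang prior also includes a parameter controlling the ratio of internal to external branch lengths, which is omitted here.

```python
import numpy as np

def sample_compound_dirichlet(n_taxa, shape_t=1.0, rate_t=0.1, conc=1.0, size=1, seed=None):
    """Draw branch-length vectors from a simplified compound Dirichlet prior.

    Tree length T ~ Gamma(shape_t, rate_t); T is then divided among the
    2 * n_taxa - 3 branches of an unrooted binary tree by a symmetric
    Dirichlet(conc, ..., conc). Parameter values here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n_branches = 2 * n_taxa - 3
    tree_len = rng.gamma(shape_t, 1.0 / rate_t, size=size)   # NumPy takes scale = 1 / rate
    props = rng.dirichlet(np.full(n_branches, conc), size=size)
    return tree_len, tree_len[:, None] * props

T, branches = sample_compound_dirichlet(10, size=20000, seed=1)
print(T.mean(), branches.shape)
```

The prior is placed directly on the sum of branch lengths, so its tree-length distribution stays fixed regardless of the number of taxa. Independent exponential branch-length priors, by contrast, imply a tree-length prior whose mean grows with the number of branches.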

19.
In this paper we analyze the fraction of non-disjunction in Meiosis I assuming reference (non-informative) priors. We use Jeffreys's approach to build a non-informative prior (Jeffreys's prior) for the fraction of non-disjunction in Meiosis I. We prove that Jeffreys's prior is a proper distribution. We perform Monte Carlo studies to compare Bayes estimates obtained under Jeffreys's and uniform priors. We consider the full Bayesian significance test (FBST) and the Bayes factor (BF) for testing precise hypotheses on the fraction of non-disjunction in Meiosis I. The ultimate goal of this paper is to compare these two test procedures through simulation studies using both prior specifications. An application to Down syndrome data is also presented.

20.
The distribution found by compounding the multinomial distribution with the Dirichlet distribution has been suggested as a basis for the estimation of parameters in subdivided populations, in particular of the "correlation between genotypes" within subpopulations. It is shown that the estimators deriving from these procedures perform poorly when the data are generated by the classical Wright drift model of subdivided populations. This conclusion suggests that the compound distribution estimation approach does not provide a good estimation procedure for real populations which are reasonably described by the Wright model.
