Similar literature
1.
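Randomization tests for the significance of biodiversity and evenness, and the accompanying computational software   Total citations: 6, self-citations: 0, citations by others: 6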
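Diversity indices and evenness measures are widely used in community biology and biodiversity research because they are simple and easy to apply. However, for lack of suitable statistical tests, among other reasons, analyses based on them are often of low reliability, which limits their use. Because subjective, direct comparison is so widely used in biodiversity research, more rigorous statistical tests of diversity need to be established and used. This study established and applied the following randomization tests: a significance test for the diversity index and evenness of a single community, confidence intervals for the diversity index and evenness of a single community, and a test of the significance of differences in diversity and evenness between communities. Randomization methods have been applied successfully in community ecology. The principle is to randomly reorder the elements of a vector, or to randomly exchange corresponding elements of two vectors, compute the diversity and evenness of the randomized data, repeat the process many times, and compute the p-value of the significance test from the results. The statistical significance of diversity and of differences in diversity can thereby be determined. Corresponding Internet software, BiodiversityTest, was also developed. The software consists of 7 Java classes and 1 HTML file, runs on a variety of operating systems and web browsers, and can read many types of ODBC database files, such as Access, Excel, FoxPro, and Dbase. It includes the Shannon-Wiener, Simpson, McIntosh, Berger-Parker, Hurlbert, and Brillouin diversity indices. Using the Shannon-Wiener and Berger-Parker diversity indices, BiodiversityTest was used to compare and analyze the diversity of arthropod communities in rice fields (15 sites, 17 functional groups, 125 arthropod species). The results show that both sets of results reflect the significance of differences in rice arthropod community diversity well, and that these tests effectively reflect changes in diversity indices and evenness. Compared with direct comparison of diversity between rice-field arthropod communities, the randomization tests gave more objective results. The algorithm and software help improve some of the less rigorous analytical methods used in biodiversity research and provide a usable tool for the further application of randomization tests in biodiversity research.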
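The randomization principle described in this abstract (randomly exchange corresponding elements of two vectors, recompute the index, repeat, and count) can be sketched roughly as follows. This is not the BiodiversityTest code, which is written in Java and is not reproduced here; the function names `shannon` and `swap_test`, the absolute-difference statistic, and the `(hits + 1) / (n_iter + 1)` p-value convention are all assumptions made for illustration.

```python
import math
import random

def shannon(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # empty community: define H' as 0 (assumption)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def swap_test(a, b, stat=shannon, n_iter=2000, seed=0):
    """Randomization test for a difference in `stat` between communities a and b.

    a and b are abundance vectors over the same taxa, in the same order.
    Each iteration swaps a[i] and b[i] with probability 0.5 and recomputes
    the absolute difference in the statistic.
    """
    rng = random.Random(seed)
    observed = abs(stat(a) - stat(b))
    hits = 0
    for _ in range(n_iter):
        ra, rb = [], []
        for x, y in zip(a, b):
            if rng.random() < 0.5:
                x, y = y, x
            ra.append(x)
            rb.append(y)
        if abs(stat(ra) - stat(rb)) >= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return observed, (hits + 1) / (n_iter + 1)

# Hypothetical abundance data for two communities over four taxa.
obs, p = swap_test([50, 1, 1, 1], [13, 13, 13, 14])
print(f"|H'(a) - H'(b)| = {obs:.3f}, p = {p:.3f}")
```

Any other index (Simpson, Berger-Parker, an evenness measure) could be passed as `stat` without changing the test loop.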

2.
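In metabolic engineering and systems biology, computer simulation is being applied more effectively than ever to the analysis and optimization of biological processes. Intracellular metabolic fluxes can be estimated by metabolic flux analysis and elementary mode analysis. Because measured data are scarce and error-prone, and because elementary pathways are redundant, it is often difficult to obtain an accurate metabolic flux distribution. This study proposes an algorithm based on the maximum entropy principle for computing elementary mode coefficients. Under underdetermined and uncertain conditions, it estimates the intracellular metabolic flux distribution from extracellular metabolic flux data. To test the feasibility of the algorithm, intracellular flux distributions were estimated for hybridoma cells, Bacillus subtilis, and Escherichia coli. The proposed maximum-entropy-based optimization algorithm avoids physiological assumptions about the state of the cell. Compared with other objective functions, it estimates intracellular metabolic flux distributions more reliably and feasibly.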

3.
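A pathway enrichment method based on variability in gene expression   Total citations: 1, self-citations: 0, citations by others: 1
Current pathway enrichment methods are mainly based on differential gene expression; few perform enrichment analysis from the perspective of pathway variability (variance). We noticed that, when pathway variability is described with a suitable statistic, the variability of some pathways rises or falls markedly under a disease phenotype. This study therefore hypothesizes that the degree of pathway variability differs between phenotypes. We designed 14 statistics and tests describing pathway variability, detected pathways whose variability differs between phenotypes (that is, enriched pathways), and compared the enrichment results with results from a literature search; we also analyzed how different microarray preprocessing methods affect the data and the results. The results show that, of the 5 preprocessing methods, the robust multi-array average (RMA) algorithm is the best for data preprocessing; that pathway variability differs between phenotypes; and that, judged against the pathways found in the literature search, among the 14 variability-based enrichment methods a permutation test using the variance of the Euclidean distances of the genes in a pathway as the statistic (method 11) identifies significant pathways effectively, with enrichment results better than those of gene set enrichment analysis (GSEA). In summary, a pathway enrichment strategy based on pathway variability is feasible; it offers some theoretical guidance for pathway enrichment analysis and provides a new perspective for research on human disease.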

4.
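For the problem of testing the significance of sequence motifs in bioinformatics, a Bayesian hypothesis test based on the maximum likelihood criterion is proposed. The significance test of a motif is recast as a goodness-of-fit test for a multinomial distribution; a Dirichlet distribution is chosen as the prior of the multinomial, and the Newton-Raphson algorithm is used to estimate the hyperparameters of the Dirichlet distribution so that the predictive distribution of the data is maximized. Bayes' theorem yields a Bayes factor for model selection, which is used to assess the statistical significance of the motif. This approach avoids the difficulty, in traditional multinomial tests, of constructing a test statistic and computing its exact distribution under the null hypothesis. Experiments were run on 107 transcription factor binding sites from the JASPAR database and 100 sets of randomly simulated data, with the Pearson product-moment correlation coefficient as one criterion of test quality; the results were better than those of some traditional motif tests.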
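As a rough illustration of the Bayes factor in this abstract, the sketch below compares a Dirichlet-multinomial model against a fixed background multinomial for the base counts of a single motif column. It is a simplified reading, not the authors' procedure: the hyperparameters `alpha` are supplied by the caller rather than estimated by Newton-Raphson, the background probabilities `p0` are assumed strictly positive, and the function name `log_bayes_factor` is hypothetical.

```python
from math import lgamma, log

def log_bayes_factor(counts, alpha, p0):
    """Log Bayes factor of a Dirichlet-multinomial model against a fixed
    multinomial with probabilities p0, for one column of base counts.

    The multinomial coefficient is the same under both models and cancels.
    """
    n = sum(counts)
    a = sum(alpha)
    # Log marginal likelihood under the Dirichlet(alpha) prior.
    log_marginal = lgamma(a) - lgamma(a + n)
    for c, al in zip(counts, alpha):
        log_marginal += lgamma(al + c) - lgamma(al)
    # Log likelihood under the fixed background distribution.
    log_null = sum(c * log(p) for c, p in zip(counts, p0) if c > 0)
    return log_marginal - log_null

# A strongly conserved column (hypothetical counts of A, C, G, T)
# against a uniform background and a flat Dirichlet(1, 1, 1, 1) prior.
print(log_bayes_factor([10, 0, 0, 0], [1, 1, 1, 1], [0.25] * 4))
```

A positive value favors the Dirichlet-multinomial model (the column departs from background); summing the per-column values over a motif assumes the columns are independent, which is a further simplification.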

5.
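Predicting the structural class of non-homologous proteins from their primary sequences   Total citations: 8, self-citations: 1, citations by others: 7
Protein structural class prediction algorithms that extract features from amino acid composition, the autocorrelation function, and the autocovariance function are analyzed and compared, and prediction algorithms that combine amino acid composition with the autocorrelation function, or amino acid composition with the autocovariance function, are studied. The results show that, for non-homologous proteins, the method combining amino acid composition with the autocorrelation function, using the hydrophobicity values of Miyazawa and Jernigan, achieves an overall accuracy of 95.34% in the self-consistency test on the training set, 81.92% in the jackknife test on the training set, and 86.61% in the independent test on the test set. The method combining amino acid composition with the autocovariance function, using the hydrophobicity values of Wold et al., achieves an overall accuracy of 96.71% in the self-consistency test on the training set, 82.18% in the jackknife test, and 86.88% in the independent test on the test set. This indicates that both combined methods can effectively improve the accuracy of structural class prediction, and that extracting more useful sequence information is the key to improving classification accuracy.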
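The feature extraction in this abstract can be sketched as below: a 20-dimensional amino acid composition vector concatenated with autocorrelation values of a per-residue hydrophobicity profile. The abstract does not give the exact definition used, so the lag-`k` average product `r_k = (1 / (N - k)) * sum(h_i * h_{i+k})` is an assumption, and the two-letter `toy_scale` is a made-up placeholder, not the Miyazawa-Jernigan or Wold values.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def autocorrelation(seq, scale, max_lag):
    """Autocorrelation of the hydrophobicity profile for lags 1..max_lag.

    `scale` maps each residue letter to a hydrophobicity value.
    Requires max_lag < len(seq).
    """
    h = [scale[a] for a in seq]
    n = len(h)
    return [
        sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]

def features(seq, scale, max_lag=3):
    """Combined feature vector: composition followed by autocorrelation."""
    return composition(seq) + autocorrelation(seq, scale, max_lag)

# Placeholder scale covering only the residues in the toy sequence.
toy_scale = {"A": 1.0, "C": -1.0}
print(features("AACCA", toy_scale, max_lag=2))
```

The resulting vectors would then be passed to whatever classifier is used for structural class prediction; the abstract does not say which one, so none is shown.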

6.
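Using polyacrylamide gel electrophoresis, this study analyzed the peroxidase (POD) isozymes of the branch bark and leaves of 17 biotypes from 8 species of the subfamily Prunoideae and made a preliminary determination of the POD isozyme zymogram characteristics of the bark and leaves of each sample. Fuzzy cluster analysis was performed on the samples according to zymogram similarity coefficients. Through an F-test on the zymogram product values of the samples, the optimal threshold for assigning Prunoideae plants to genera was selected, providing a new method for studying the effectiveness of isozyme zymogram clustering.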

7.
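In proteomics, the focus of processing liquid chromatography-mass spectrometry (LC-MS) spectra is to discover and analyze differences between complex peptide or protein samples in search of biomarkers, and aligning the elution-time peak signals (LC peaks) produced by peptides across replicate runs of the same sample is key to quantification and differential analysis. At present, replicate runs are usually aligned by identifying the time features of LC peaks in the replicate data sets from liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments and then aligning those time features with a warping function. Because elution-time errors across replicates arise at random, applying a single warping function uniformly produces large errors. To address this, this study focuses on time alignment algorithms for LC peaks across replicate runs. We took two replicate data sets and, following a machine learning approach, used the peptides detected repeatedly by LC-MS/MS in both data sets as trusted data, some as a training set and some as a test set, to build a statistical model and propose a new alignment algorithm. Testing the model on the test set showed an accuracy above 95%. The model was then applied to all LC-MS/MS peptide identifications in the two data sets to raise the coverage of identifications across data sets, and the coverage reached more than 85%.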

8.
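Objective: To use an electronic nose and an electronic tongue to detect and distinguish the flavor of different brands of brown sugar, and to establish an effective method for discriminating brown sugar of different brands, grades, and origins. Methods: A rank concordance test was applied to the sensory evaluation results for the brown sugar. Principal component analysis was performed on the data with the electronic tongue's built-in software; principal component analysis, sensor discrimination analysis, and linear discriminant analysis were performed with the electronic nose's built-in software. DPS was used to run analysis of variance and least significant difference (LSD) multiple comparisons on the electronic tongue data and analysis of variance on the electronic nose data. Taste radar charts were drawn to examine differences in sensor response intensity among the brown sugar samples. Results: The electronic tongue and electronic nose distinguished brown sugar of different brands, grades, and origins. The taste differences among samples were mainly in richness and saltiness, and the aroma differences were mainly in nitrogen oxides, short-chain alkanes such as methane, organic sulfides, and inorganic sulfides. The rank concordance test of the sensory evaluation also showed significant differences among the samples. Conclusion: Electronic tongue and electronic nose technology can be used to distinguish brown sugar of different brands, origins, and grades, and provides technical support for brown sugar quality identification.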

9.
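Objective: To study the relative merits and accuracy of two classes of methods for microarray data analysis, those based on a self-contained null hypothesis and those based on a competitive null hypothesis, by taking a representative of each, GAGE (Generally Applicable Gene-set Enrichment) and GSEA (Gene Set Enrichment Analysis), comparing their power to screen for enriched gene sets, and examining their screening performance. Methods: The two methods were applied to real gene expression profile data; the accuracy and soundness of the screening results were compared, and the effectiveness of the two methods in screening for enriched gene sets was examined. Results: Applying the two methods to known gene expression profile data showed that GAGE was superior to GSEA both in power and in the biological relevance of the gene sets it selected. Conclusion: As a gene set analysis method based on a self-contained null hypothesis, GAGE makes full use of the expression profile data, analyzes the expression data separately as experimental sets and pathway sets, and accounts for both up-regulation and down-regulation of gene sets; its power exceeds that of GSEA, which is based on a competitive null hypothesis, and it yields more accurate and sound results.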

10.
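A comparison of methods for screening differentially expressed genes from microarrays   Total citations: 1, self-citations: 0, citations by others: 1
Shan Wenjuan (单文娟), Tong Chunfa (童春发), Shi Jisen (施季森). 《遗传》 (Hereditas), 2008, 30(12): 1640-1646
Abstract: Using computer-simulated data and real microarray data, 8 methods for screening differentially expressed genes were compared, with the aim of comparing how well different methods screen microarray data. Analysis of the simulated data showed that all 8 methods identified and detected uniformly distributed differentially expressed genes well. In terms of algorithm, SAM and the Wilcoxon rank-sum test performed better; in terms of data distribution, identification was better for normally distributed data and worse for chi-square and exponentially distributed data. Analysis of a poplar cDNA microarray showed that SAM, Samroc, and the regression model method gave similar results, whereas the Wilcoxon rank-sum test differed considerably from them.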

11.
12.
Abstract. Hypothesis testing in phytocoenological applications is likely to be hindered when based on conventional statistical methods. The problem created by unrealistic assumptions can, however, be overcome by randomization. This paper discusses the general idea of randomization testing, describes a method and interprets its application in group comparisons. Two sets of variables are involved, the vegetation set on the basis of which the groups are compared and the environmental factors which delimit the groups under different analytical designs. Although simple partitioning of sum of squares is at the core of the test, the method has versatility of testing uni- or multifactor designs, which is novel in phytocoenological applications. The algorithm has been implemented in programs SYNCSA and MULTIV by V.P. Data from the Campos of southern Brazil are used for illustration.

13.
The permutation test is a popular technique for testing a hypothesis of no effect when the distribution of the test statistic is unknown. To test the equality of two means, a permutation test might use a test statistic which is the difference of the two sample means in the univariate case. In the multivariate case, it might use a test statistic which is the maximum of the univariate test statistics. A permutation test then estimates the null distribution of the test statistic by permuting the observations between the two samples. We will show that, for such tests, if the two distributions are not identical (as for example when they have unequal variances, correlations or skewness), then a permutation test for equality of means based on difference of sample means can have an inflated Type I error rate even when the means are equal. Our results illustrate that permutation testing should be confined to testing for non-identical distributions. CONTACT: calian@raunvis.hi.is.
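The test described in this abstract can be sketched as follows, with the maximum absolute difference of sample means across variables as the statistic (the univariate case is simply one variable per observation). The function names and the `(hits + 1) / (n_perm + 1)` p-value convention are assumptions for illustration, not taken from the paper.

```python
import random
from statistics import mean

def max_abs_mean_diff(xs, ys):
    """Largest absolute difference in sample means across variables.

    xs and ys are lists of observations; each observation is a tuple
    with one value per variable.
    """
    d = len(xs[0])
    return max(
        abs(mean(x[j] for x in xs) - mean(y[j] for y in ys)) for j in range(d)
    )

def permutation_test(xs, ys, n_perm=2000, seed=0):
    """Estimate the null distribution by reshuffling observations between
    the two samples, keeping the sample sizes fixed."""
    rng = random.Random(seed)
    observed = max_abs_mean_diff(xs, ys)
    pooled = list(xs) + list(ys)
    n_x = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if max_abs_mean_diff(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical univariate samples.
obs, p = permutation_test([(0.0,), (1.0,), (2.0,)], [(10.0,), (11.0,), (12.0,)])
print(f"statistic = {obs:.2f}, p = {p:.3f}")
```

Shuffling whole observations treats the two samples as exchangeable, which holds only if the two distributions are identical; that is why, as the abstract argues, a small p-value is evidence of non-identical distributions rather than of unequal means specifically.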

14.
Null models exploring species co-occurrence and trait-based limiting similarity are increasingly used to explore the influence of competition on community assembly; however, assessments of common models have not thoroughly explored the influence of variation in matrix size on error rates, in spite of the fact that studies have explored community matrices that vary considerably in size. To determine how smaller matrices, which are of greatest concern, perform statistically, we generated biologically realistic presence-absence matrices ranging in size from 3–50 species and sites, as well as associated trait matrices. We examined co-occurrence tests using the C-Score statistic and independent swap algorithm. For trait-based limiting similarity null models, we used the mean nearest neighbour trait distance (NN) and the standard deviation of nearest neighbour distances (SDNN) as test statistics, and considered two common randomization algorithms: abundance independent trait shuffling (AITS), and abundance weighted trait shuffling (AWTS). Matrices as small as three × three resulted in acceptable type I error rates (p < 0.05) for both the co-occurrence and trait-based limiting similarity null models when exclusive p-values were used. The commonly used inclusive p-value (≤ or ≥, as opposed to exclusive p-values; < or >) was associated with increased type I error rates, particularly for matrices with fewer than eight species. Type I error rates increased for limiting similarity tests using the AWTS randomization scheme when community matrices contained more than 35 sites; a similar randomization used in null models of phylogenetic dispersion has previously been viewed as robust. Notwithstanding other potential deficiencies related to the use of small matrices to represent communities, the application of both classes of null model should be restricted to matrices with 10 or more species to avoid the possibility of type II errors. 
Additionally, researchers should restrict the use of the AWTS randomization to matrices with fewer than 35 sites to avoid type I errors when testing for trait-based limiting similarity. The AITS randomization scheme performed better in terms of type I error rates, and therefore may be more appropriate when considering systems for which traits are not clustered by abundance.
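A minimal sketch of the co-occurrence test named in this abstract: the C-score of a species-by-site presence-absence matrix, a null distribution built by swapping 2 × 2 checkerboard submatrices (which preserves row and column totals), and the exclusive and inclusive upper-tail p-values. The number of swap attempts, the number of null matrices, and the function names are assumptions; the trait-based AITS and AWTS schemes are not shown.

```python
import random
from itertools import combinations

def c_score(m):
    """Mean over species pairs of (R_i - S)(R_j - S), where R is the number
    of sites a species occupies and S the number of sites the pair shares.
    Rows are species, columns are sites, entries are 0 or 1."""
    pairs = list(combinations(range(len(m)), 2))
    total = 0
    for i, j in pairs:
        shared = sum(a & b for a, b in zip(m[i], m[j]))
        total += (sum(m[i]) - shared) * (sum(m[j]) - shared)
    return total / len(pairs)

def independent_swap(m, n_swaps, rng):
    """Return a randomized copy of m after n_swaps swap attempts.
    Only checkerboard submatrices are swapped, so row and column totals
    are unchanged."""
    m = [row[:] for row in m]
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(len(m)), 2)
        c1, c2 = rng.sample(range(len(m[0])), 2)
        if (m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1]
                and m[r1][c1] != m[r1][c2]):
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m

def c_score_test(m, n_null=500, n_swaps=200, seed=0):
    """Observed C-score with exclusive (>) and inclusive (>=) p-values."""
    rng = random.Random(seed)
    obs = c_score(m)
    null = [c_score(independent_swap(m, n_swaps, rng)) for _ in range(n_null)]
    exclusive = sum(s > obs for s in null) / n_null
    inclusive = sum(s >= obs for s in null) / n_null
    return obs, exclusive, inclusive

# Hypothetical 4-species by 5-site matrix.
matrix = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 0], [1, 1, 0, 0, 1], [0, 0, 1, 1, 0]]
print(c_score_test(matrix))
```

With small matrices, many null matrices tie with the observed value, so the inclusive p-value can be noticeably larger than the exclusive one; this is the distinction the abstract draws for matrices with few species.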

15.
MOTIVATION: A common task in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Recently several statistical methods have been proposed to accomplish this goal when there are replicated samples under each condition. However, it may not be clear how these methods compare with each other. Our main goal here is to compare three methods, the t-test, a regression modeling approach (Thomas et al., Genome Res., 11, 1227-1236, 2001) and a mixture model approach (Pan et al., http://www.biostat.umn.edu/cgi-bin/rrs?print+2001,2001a,b) with particular attention to their different modeling assumptions. RESULTS: It is pointed out that all the three methods are based on using the two-sample t-statistic or its minor variation, but they differ in how to associate a statistical significance level to the corresponding statistic, leading to possibly large difference in the resulting significance levels and the numbers of genes detected. In particular, we give an explicit formula for the test statistic used in the regression approach. Using the leukemia data of Golub et al. (Science, 285, 531-537, 1999), we illustrate these points. We also briefly compare the results with those of several other methods, including the empirical Bayesian method of Efron et al. (J. Am. Stat. Assoc., to appear, 2001) and the Significance Analysis of Microarray (SAM) method of Tusher et al. (Proc. Natl Acad. Sci. USA, 98, 5116-5121, 2001).
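The per-gene two-sample t-statistic that, according to this abstract, underlies all three methods can be sketched as below. The unequal-variance (Welch) form is an assumption, since the abstract does not say which variant each method uses, and the step that attaches a significance level to the statistic, which is where the methods differ, is deliberately left out. The names `welch_t` and `rank_genes` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Two-sample t-statistic with separate variance estimates.

    x and y are replicate expression values for one gene under the two
    conditions; each needs at least two values and some variation.
    """
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

def rank_genes(cond_a, cond_b):
    """Rank genes by |t|, largest first.

    cond_a and cond_b map gene names to lists of replicate values.
    """
    t = {g: welch_t(cond_a[g], cond_b[g]) for g in cond_a}
    return sorted(t.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical expression values for two genes, three replicates each.
a = {"g1": [1.0, 2.0, 3.0], "g2": [1.0, 2.0, 3.0]}
b = {"g1": [4.0, 5.0, 6.0], "g2": [1.0, 2.0, 4.0]}
for gene, t in rank_genes(a, b):
    print(gene, round(t, 3))
```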

16.
MOTIVATION: An important goal in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Various parametric tests, such as the two-sample t-test, have been used, but their possibly too strong parametric assumptions or large sample justifications may not hold in practice. As alternatives, a class of three nonparametric statistical methods, including the empirical Bayes method of Efron et al. (2001), the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2001), have been proposed. All the three methods depend on constructing a test statistic and a so-called null statistic such that the null statistic's distribution can be used to approximate the null distribution of the test statistic. However, relatively little effort has been directed toward assessment of the performance or the underlying assumptions of the methods in constructing such test and null statistics. RESULTS: We point out a problem of a current method to construct the test and null statistics, which may lead to largely inflated Type I errors (i.e. false positives). We also propose two modifications that overcome the problem. In the context of MMM, the improved performance of the modified methods is demonstrated using simulated data. In addition, our numerical results also provide evidence to support the utility and effectiveness of MMM.

17.
The affected-pedigree-member (APM) method of linkage analysis is a nonparametric statistic that tests for nonrandom cosegregation of a disease and marker loci. The APM statistic is based on the observation that if a marker locus is near a disease-susceptibility locus, then affected individuals within a family should be more similar at the marker locus than is expected by chance. The APM statistic measures marker similarity in terms of identity by state (IBS) of marker alleles; that is, two alleles are IBS if they are the same, regardless of their ancestral origin. Since the APM statistic measures increased marker similarity, it makes no assumptions concerning how the disease is inherited; this can be an advantage when dealing with complex diseases for which the mode of inheritance is difficult to determine. We investigate here the power of the APM statistic to detect linkage in the context of a genomewide search. In such a search, the APM statistic is evaluated at a grid of markers. Then regions with high APM statistics are investigated more thoroughly by typing more markers in the region. Using simulated data, we investigate various search strategies and recommend an optimal search strategy that maximizes the power to detect linkage while minimizing the false-positive rate and number of markers. We determine an optimal series of three increasing cut-points and an independent criterion for significance.

18.
A statistical goodness-of-fit test, based on representing the sample observations by linked vectors, is developed. The direction and the length of the linked vectors are defined as functions of the expected values of the order statistics and sample order statistics, respectively. The underlying method can be used to test distributional assumptions for any location-scale family. A test statistic Qn is introduced and some of its properties are studied. It is shown that the proposed test can be generalized to test if two or more independent samples come from the same distribution. The test procedure provides a graphical method of identifying the true distribution when the null hypothesis is rejected.

19.
We present a class of likelihood-based score statistics that accommodate genotypes of both unrelated individuals and families, thereby combining the advantages of case-control and family-based designs. The likelihood extends the one proposed by Schaid and colleagues (Schaid and Sommer 1993, 1994; Schaid 1996; Schaid and Li 1997) to arbitrary family structures with arbitrary patterns of missing data and to dense sets of multiple markers. The score statistic comprises two component test statistics. The first component statistic, the nonfounder statistic, evaluates disequilibrium in the transmission of marker alleles from parents to offspring. This statistic, when applied to nuclear families, generalizes the transmission/disequilibrium test to arbitrary numbers of affected and unaffected siblings, with or without typed parents. The second component statistic, the founder statistic, compares observed or inferred marker genotypes in the family founders with those of controls or those of some reference population. The founder statistic generalizes the statistics commonly used for case-control data. The strengths of the approach include both the ability to assess, by comparison of nonfounder and founder statistics, the potential bias resulting from population stratification and the ability to accommodate arbitrary family structures, thus eliminating the need for many different ad hoc tests. A limitation of the approach is the potential power loss and/or bias resulting from inappropriate assumptions on the distribution of founder genotypes. The systematic likelihood-based framework provided here should be useful in the evaluation of both the relative merits of case-control and various family-based designs and the relative merits of different tests applied to the same design. It should also be useful for genotype-disease association studies done with the use of a dense set of multiple markers.
