1.

Using weighted permutation scores to detect differential gene expression with microarray data

Guo X Pan W 《Journal of bioinformatics and computational biology》2005,3(4):989-1006

A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM), has been proposed to detect differential gene expression for replicated microarray experiments. They all depend on constructing a test statistic, for example, a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, using standard permutation scores may not estimate the null distribution of the test statistic well, leading to possibly too conservative inferences. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.
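
The weighting idea in this abstract can be sketched roughly as follows: permutation t-statistics from all genes are pooled into one null distribution, and each gene's scores count in proportion to a supplied probability that the gene is not differentially expressed. The function names, the Welch-type statistic and the `weights` input are assumptions made for illustration; in the paper the weights come from the EB method, which is not reproduced here.

```python
import numpy as np

def t_stats(x, labels):
    """Welch-type t-statistic per gene (rows = genes, columns = arrays)."""
    a, b = x[:, labels == 0], x[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def weighted_permutation_pvalues(x, labels, weights, n_perm=200, seed=0):
    """P-values from a pooled permutation null in which gene g counts with weights[g].

    weights[g] stands in for the posterior probability that gene g is NOT
    differentially expressed (hypothetical input; non-negative values).
    """
    x, labels = np.asarray(x, dtype=float), np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.abs(t_stats(x, labels))
    # n_perm x genes matrix of permutation scores (array labels are shuffled)
    null = np.abs(np.stack([t_stats(x, rng.permutation(labels)) for _ in range(n_perm)]))
    flat = null.ravel()
    flat_w = np.tile(weights, n_perm)        # weight of the gene behind each score
    order = np.argsort(flat)
    cum_w = np.concatenate(([0.0], np.cumsum(flat_w[order])))
    # total weight of null scores strictly below each observed score
    below = cum_w[np.searchsorted(flat[order], observed, side="left")]
    return 1.0 - below / cum_w[-1]
```

With all weights equal this reduces to the ordinary pooled permutation p-value; down-weighting genes that are likely to be differentially expressed keeps their permutation scores from inflating the tails of the estimated null.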

2.

Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments (total citations: 3; self-citations: 0; citations by others: 3)

Zhao Y Pan W 《Bioinformatics (Oxford, England)》2003,19(9):1046-1054

MOTIVATION: An important goal in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Various parametric tests, such as the two-sample t-test, have been used, but their possibly too strong parametric assumptions or large sample justifications may not hold in practice. As alternatives, a class of three nonparametric statistical methods, including the empirical Bayes method of Efron et al. (2001), the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2001), has been proposed. All three methods depend on constructing a test statistic and a so-called null statistic such that the null statistic's distribution can be used to approximate the null distribution of the test statistic. However, relatively little effort has been directed toward assessment of the performance or the underlying assumptions of the methods in constructing such test and null statistics. RESULTS: We point out a problem with a current method for constructing the test and null statistics, which may lead to largely inflated Type I errors (i.e. false positives). We also propose two modifications that overcome the problem. In the context of MMM, the improved performance of the modified methods is demonstrated using simulated data. In addition, our numerical results also provide evidence to support the utility and effectiveness of MMM.

3.

A mixture model approach to detecting differentially expressed genes with microarray data (total citations: 4; self-citations: 0; citations by others: 4)

Pan W Lin J Le CT 《Functional & integrative genomics》2003,3(3):117-124

An exciting biological advancement over the past few years is the use of microarray technologies to measure simultaneously the expression levels of thousands of genes. The bottleneck now is how to extract useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes with altered expression under two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), to handle the problem when there are a small number of replicates under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and its null statistic using finite normal mixture models. A comparison of these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, can identify genes with significantly changed expression. Several methods are proposed to effectively control the false positives. The methodology is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle ear infection.

4.

The t-mixture model approach for detecting differentially expressed genes in microarrays

Jiao S Zhang S 《Functional & integrative genomics》2008,8(3):181-186

The finite mixture model approach has attracted much attention in analyzing microarray data due to its robustness to the excessive variability that is common in microarray data. Pan (2003) proposed using the normal mixture model method (MMM) to estimate the distribution of a test statistic and its null distribution. However, considering that the test statistic is often of t-type, our studies find that the rejection region from MMM is often significantly larger than the correct rejection region, resulting in an inflated type I error. This motivates us to propose the t-mixture model (TMM) approach. In this paper, we demonstrate that TMM provides significantly more accurate control of the probability of making type I errors (hence of the familywise error rate) than MMM. Finally, TMM is applied to the well-known leukemia data of Golub et al. (1999). The results are compared with those obtained from MMM.

5.

A Nonparametric Likelihood Ratio Test to Identify Differentially Expressed Genes from Microarray Data

Bokka S Mathur SK 《Applied bioinformatics》2006,5(4):267-276

Microarray experiments contribute significantly to the progress in disease treatment by enabling a precise and early diagnosis. One of the major objectives of microarray experiments is to identify differentially expressed genes under various conditions. The statistical methods currently used to analyse microarray data are inadequate, mainly due to the lack of understanding of the distribution of microarray data. We present a nonparametric likelihood ratio (NPLR) test to identify differentially expressed genes using microarray data. The NPLR test is highly robust against extreme values and does not assume the distribution of the parent population. Simulation studies show that the NPLR test is more powerful than some of the commonly used methods, such as the two-sample t-test, the Mann-Whitney U-test and significance analysis of microarrays (SAM). When applied to microarray data, we found that the NPLR test identifies more differentially expressed genes than its competitors. The asymptotic distribution of the NPLR test statistic and the p-value function is presented. The application of the NPLR method is shown, using both synthetic and real-life data. The biological significance of some of the genes detected only by the NPLR method is discussed.

6.

A Nonparametric Approach for Mapping Quantitative Trait Loci (total citations: 23; self-citations: 3; citations by others: 20)

L. Kruglyak E. S. Lander 《Genetics》1995,139(3):1421-1428

Genetic mapping of quantitative trait loci (QTLs) is performed typically by using a parametric approach, based on the assumption that the phenotype follows a normal distribution. Many traits of interest, however, are not normally distributed. In this paper, we present a nonparametric approach to QTL mapping applicable to any phenotypic distribution. The method is based on a statistic Z(w), which generalizes the nonparametric Wilcoxon rank-sum test to the situation of whole-genome search by interval mapping. We determine the appropriate significance level for the statistic Z(w), by showing that its asymptotic null distribution follows an Ornstein-Uhlenbeck process. These results provide a robust, distribution-free method for mapping QTLs.

7.

Construction of null statistics in permutation-based multiple testing for multi-factorial microarray experiments

Gao X 《Bioinformatics (Oxford, England)》2006,22(12):1486-1494

MOTIVATION: The parametric F-test has been widely used in the analysis of factorial microarray experiments to assess treatment effects. However, the normality assumption is often untenable for microarray experiments with few replicates. Therefore, permutation-based methods are called upon to assess statistical significance. The distribution of the F-statistics across all the genes on the array can be regarded as a mixture distribution, with one proportion of the statistics generated from the null distribution of no differential gene expression and the remaining proportion generated from the alternative distribution of differentially expressed genes. As a result, the permutation distribution of the F-statistics may not approximate the true null distribution of the F-statistics well. Therefore, the construction of a proper null statistic to better approximate the null distribution of the F-statistic is of great importance to permutation-based multiple testing in microarray data analysis. RESULTS: In this paper, we extend the idea of constructing null statistics based on pairwise differences, which removes the treatment effects, from the two-sample comparison problem to multifactorial balanced or unbalanced microarray experiments. A null statistic based on a subpartition method is proposed and its distribution is employed to approximate the null distribution of the F-statistic. The proposed null statistic is able to accommodate unbalance in the design and is also corrected for the undue correlation between its numerator and denominator. In the simulation studies and real biological data analysis, the number of true positives and the false discovery rate (FDR) of the proposed null statistic are compared with those of the permuted version of the F-statistic. It is shown that our proposed method has better control of the FDR and higher power than the standard permutation method to detect differentially expressed genes because of the better approximated tail probabilities.

8.

Bootstrap Restricted Likelihood Ratio Test for the Detection of Rare Variants

Ping Zeng Ting Wang 《Current Genomics》2015,16(3):194-202

In this paper the detection of rare-variant association with continuous phenotypes of interest is investigated via the likelihood-ratio-based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component is located on the boundary of its parameter space. In this situation the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To circumvent the derivation of the null distribution we resort to the bootstrap method because it is generically applicable and easy to implement. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are implemented to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test we propose an effective approximation mixture for the bootstrap null distribution. The GAW17 data is used to illustrate the proposed test.

9.

A permutation-based multiple testing method for time-course microarray experiments

Insuk Sohn Kouros Owzar Stephen L George Sujong Kim Sin-Ho Jung 《BMC bioinformatics》2009,10(1):336

Background

Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness of fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often correlated in complicated ways. Accurate type I error control that adjusts for multiple testing requires the joint null distribution of the test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of their computational ease and intuitive interpretation.

10.

A novel Mixture Model Method for identification of differentially expressed genes from DNA microarray data

Kayvan Najarian Maryam Zaheri Ali A Rad Siamak Najarian Javad Dargahi 《BMC bioinformatics》2004,5(1):201

Background

The main goal in analyzing microarray data is to determine the genes that are differentially expressed across two types of tissue samples or samples obtained under two experimental conditions. The mixture model method (MMM hereafter) is a nonparametric statistical method often used for microarray processing applications, but it is known to over-fit the data if the number of replicates is small. In addition, the results of the MMM may not be repeatable when dealing with a small number of replicates. In this paper, we propose a new version of MMM to ensure the repeatability of the results in different runs, and to reduce the sensitivity of the results to the parameters.

11.

The Baumgartner-Weiss-Schindler test for the detection of differentially expressed genes in replicated microarray experiments

Neuhäuser M Senske R 《Bioinformatics (Oxford, England)》2004,20(18):3553-3564

MOTIVATION: An important application of microarray experiments is to identify differentially expressed genes. Because microarray data are often not normally distributed, nonparametric methods have been suggested for their statistical analysis. Here, the Baumgartner-Weiss-Schindler test, a novel and powerful test based on ranks, is investigated and compared with the parametric t-test as well as with two other nonparametric tests (Wilcoxon rank sum test, Fisher-Pitman permutation test) recently recommended for the analysis of gene expression data. RESULTS: Simulation studies show that an exact permutation test based on the Baumgartner-Weiss-Schindler statistic B is preferable to the other three tests. It is less conservative than the Wilcoxon test and more powerful, in particular in the case of asymmetric or heavy-tailed distributions. When the underlying distribution is symmetric, the differences in power between the tests are relatively small. Thus, the Baumgartner-Weiss-Schindler test is recommended for the usual situation in which the underlying distribution is a priori unknown. AVAILABILITY: SAS code available on request from the authors.
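
To make the notion of an exact permutation test concrete, here is a minimal sketch of the Fisher-Pitman permutation test named in this abstract as a comparator. It enumerates every split of the pooled observations and uses the absolute difference in group means as the statistic. This is not the Baumgartner-Weiss-Schindler statistic B, and the function name is an assumption for illustration.

```python
from itertools import combinations

import numpy as np

def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = np.concatenate([x, y]).astype(float)
    n, m, total = len(x), len(y), pooled.sum()
    observed = abs(np.mean(x) - np.mean(y))
    hits = count = 0
    # every way of choosing which n pooled values form the first group
    for idx in combinations(range(n + m), n):
        s = pooled[list(idx)].sum()
        stat = abs(s / n - (total - s) / m)
        hits += int(stat >= observed - 1e-12)   # tolerance guards float ties
        count += 1
    return hits / count
```

For two groups of three arrays there are only 20 splits, so the smallest attainable two-sided p-value is 2/20 = 0.1. This discreteness is one reason the choice of statistic matters so much when the number of replicates is small.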

12.

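Comparison of methods for screening differentially expressed genes from gene chip data (total citations: 1; self-citations: 0; citations by others: 1)

单文娟 童春发 施季森 《遗传》2008,30(12):1640-1646

Abstract: Using computer-simulated data and real microarray data, eight methods for screening differentially expressed genes were compared, with the aim of assessing how well the different methods perform on gene chip data. Analysis of the simulated data showed that all eight methods identified and detected uniformly distributed differentially expressed genes well. In terms of algorithms, SAM and the Wilcoxon rank-sum test performed better; in terms of data distribution, identification was better for normally distributed data and worse for chi-square and exponentially distributed data. Analysis of a poplar cDNA microarray showed that SAM, Samroc and the regression model method gave similar results, whereas the Wilcoxon rank-sum test differed considerably from them.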

13.

Estimating the null distribution to adjust observed confidence levels for genome-scale screening

Bickel DR 《Biometrics》2011,67(2):363-370

In a novel approach to the multiple testing problem, Efron (2004, Journal of the American Statistical Association 99, 96-104; 2007a, Journal of the American Statistical Association 102, 93-103; 2007b, Annals of Statistics 35, 1351-1377) formulated estimators of the distribution of test statistics or nominal p-values under a null distribution suitable for modeling the data of thousands of unaffected genes, nonassociated single-nucleotide polymorphisms, or other biological features. Estimators of the null distribution can improve not only the empirical Bayes procedure for which it was originally intended, but also many other multiple-comparison procedures. Such estimators in some cases improve the proposed multiple-comparison procedure (MCP) based on a recent non-Bayesian framework of minimizing expected loss with respect to a confidence posterior, a probability distribution of confidence levels. The flexibility of that MCP is illustrated with a nonadditive loss function designed for genomic screening rather than for validation. The merit of estimating the null distribution is examined from the vantage point of the confidence-posterior MCP (CPMCP). In a generic simulation study of genome-scale multiple testing, conditioning the observed confidence level on the estimated null distribution as an approximate ancillary statistic markedly improved conditional inference. Specifically simulating gene expression data, however, indicates that estimation of the null distribution tends to exacerbate the conservative bias that results from modeling heavy-tailed data distributions with the normal family. To enable researchers to determine whether to rely on a particular estimated null distribution for inference or decision making, an information-theoretic score is provided. As the sum of the degree of ancillarity and the degree of inferential relevance, the score reflects the balance conditioning would strike between the two conflicting terms. The CPMCP and other methods introduced are applied to gene expression microarray data.

14.

A nonparametric procedure for the two-factor mixed model with missing data

Gao X 《Biometrical journal. Biometrische Zeitschrift》2007,49(5):774-788

We develop a nonparametric imputation technique to test for the treatment effects in a nonparametric two-factor mixed model with incomplete data. Within each block, an arbitrary covariance structure of the repeated measurements is assumed without the explicit parametrization of the joint multivariate distribution. The number of repeated measurements is uniformly bounded whereas the number of blocks tends to infinity. The essential idea of the nonparametric imputation is to replace the unknown indicator functions of pairwise comparisons by the corresponding empirical distribution functions. The proposed nonparametric imputation method remains valid under the missing completely at random (MCAR) mechanism. We apply the nonparametric imputation to Brunner and Dette's method for the nonparametric two-factor mixed model, and this extension results in a weighted partial rank transform statistic. Asymptotic relative efficiency of the nonparametric imputation method with the complete data versus the incomplete data is derived to quantify the efficiency loss due to the missing data. Monte Carlo simulation studies are conducted to demonstrate the validity and power of the proposed method in comparison with other existing methods. A migraine severity score data set is analyzed to demonstrate the application of the proposed method in the analysis of missing data.

15.

Sample size calculation for multiple testing in microarray data analysis

Jung SH Bang H Young S 《Biostatistics (Oxford, England)》2005,6(1):157-169

Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.
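
The contrast between a Bonferroni adjustment and a resampling-based single-step adjustment can be sketched as below. The second function is a generic single-step max-statistic procedure, in which array labels are permuted as a whole so that dependence among genes is preserved. It is offered only as an assumption about the general flavour of such methods and may differ in detail from the procedures compared in the paper; all function names are hypothetical.

```python
import numpy as np

def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(1.0, p * p.size)

def abs_t(x, labels):
    """Absolute Welch-type t-statistic per gene (rows = genes, columns = arrays)."""
    a, b = x[:, labels == 0], x[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return np.abs(a.mean(axis=1) - b.mean(axis=1)) / se

def single_step_maxt(x, labels, n_perm=500, seed=0):
    """Single-step adjusted p-values from the permutation distribution of max |t|."""
    x, labels = np.asarray(x, dtype=float), np.asarray(labels)
    rng = np.random.default_rng(seed)
    observed = abs_t(x, labels)
    # permuting array labels keeps each array's gene vector intact
    max_null = np.array([abs_t(x, rng.permutation(labels)).max() for _ in range(n_perm)])
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    return (1 + exceed) / (1 + n_perm)
```

Because the maximum is taken over all genes within each permutation, the adjustment adapts to the correlation among genes instead of assuming the worst case, which is why such methods tend to be less conservative than Bonferroni.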

16.

Testing equality of survival functions of quality-adjusted lifetime

Zhao H Tsiatis AA 《Biometrics》2001,57(3):861-867

We present a method for comparing the survival functions of quality-adjusted lifetime from two treatments. This test statistic becomes the ordinary log-rank test when quality-adjusted lifetime is the same as the survival time. Simulation experiments are conducted to examine the behavior of our proposed test statistic under both null and alternative hypotheses. In addition, we apply our method to a breast cancer trial for comparing the distribution of quality-adjusted lifetime between two treatment regimes.

17.

Assessing the adequacy of variance function in heteroscedastic regression models

Wang L Zhou XH 《Biometrics》2007,63(4):1218-1225

Heteroscedastic data arise in many applications. In heteroscedastic regression analysis, the variance is often modeled as a parametric function of the covariates or the regression mean. We propose a kernel-smoothing type nonparametric test for checking the adequacy of a given parametric variance structure. The test does not need to specify a parametric distribution for the random errors. It is shown that the test statistic has an asymptotically normal distribution under the null hypothesis and is powerful against a large class of alternatives. We suggest a simple bootstrap algorithm to approximate the distribution of the test statistic in finite samples. Numerical simulations demonstrate the satisfactory performance of the proposed test. We also illustrate the application by the analysis of a radioimmunoassay data set.

18.

A Scan Statistic for Binary Outcome Based on Hypergeometric Probability Model, with an Application to Detecting Spatial Clusters of Japanese Encephalitis

Xing Zhao Xiao-Hua Zhou Zijian Feng Pengfei Guo Hongyan He Tao Zhang Lei Duan Xiaosong Li 《PloS one》2013,8(6)

As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for a binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff’s methods, we adopt a Monte Carlo test to assess significance. Both methods are applied to detect spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff’s statistics for clusters of high population density or large size; otherwise Kulldorff’s statistics are superior.
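
A toy one-dimensional version of a Monte Carlo scan test under a hypergeometric null might look like the sketch below. Regions are arranged on a line, candidate clusters are windows of adjacent regions, and the statistic is taken to be the smallest hypergeometric probability of any window's case count. That choice of statistic, the window scheme and all function names are assumptions made for illustration, not the paper's exact construction.

```python
from math import comb

import numpy as np

def hypergeom_pmf(c, n, C, N):
    """P(c cases among n individuals | C cases among N individuals in total)."""
    return comb(C, c) * comb(N - C, n - c) / comb(N, n)

def min_window_probability(cases, pops, width):
    """Smallest null probability over windows of `width` adjacent regions."""
    C, N = int(cases.sum()), int(pops.sum())
    return min(
        hypergeom_pmf(int(cases[i:i + width].sum()), int(pops[i:i + width].sum()), C, N)
        for i in range(len(cases) - width + 1)
    )

def monte_carlo_scan_pvalue(cases, pops, width=2, n_sim=999, seed=0):
    """Monte Carlo p-value: rank of the observed statistic among simulated ones."""
    rng = np.random.default_rng(seed)
    cases, pops = np.asarray(cases), np.asarray(pops)
    observed = min_window_probability(cases, pops, width)
    # under the null, the C cases are a random sample of individuals without replacement
    sims = rng.multivariate_hypergeometric(pops, int(cases.sum()), size=n_sim)
    null = np.array([min_window_probability(s, pops, width) for s in sims])
    return (1 + np.sum(null <= observed)) / (1 + n_sim)
```

The p-value is the usual Monte Carlo rank formula, so with 999 simulations the smallest attainable value is 0.001.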

19.

Goodness-of-fit tests in proportional hazards models with random effects

Wenceslao González-Manteiga María Dolores Martínez-Miranda Ingrid Van Keilegom 《Biometrical journal. Biometrische Zeitschrift》2023,65(1):2000353

This paper deals with testing the functional form of the covariate effects in a Cox proportional hazards model with random effects. We assume that the responses are clustered and incomplete due to right censoring. The estimation of the model under the null (parametric covariate effect) and the alternative (nonparametric effect) is performed using the full marginal likelihood. Under the alternative, the nonparametric covariate effects are estimated using orthogonal expansions. The test statistic is the likelihood ratio statistic, and its distribution is approximated using a bootstrap method. The performance of the proposed testing procedure is studied through simulations. The method is also applied to two real data sets, one from biomedical research and one from veterinary medicine.

20.

FiSSE: A simple nonparametric test for the effects of a binary character on lineage diversification rates

Daniel L. Rabosky Emma E. Goldberg 《Evolution; international journal of organic evolution》2017,71(6):1432-1442

It is widely assumed that phenotypic traits can influence rates of speciation and extinction, and several statistical approaches have been used to test for correlations between character states and lineage diversification. Recent work suggests that model‐based tests of state‐dependent speciation and extinction are sensitive to model inadequacy and phylogenetic pseudoreplication. We describe a simple nonparametric statistical test (“FiSSE”) to assess the effects of a binary character on lineage diversification rates. The method involves computing a test statistic that compares the distributions of branch lengths for lineages with and without a character state of interest. The value of the test statistic is compared to a null distribution generated by simulating character histories on the observed phylogeny. Our tests show that FiSSE can reliably infer trait‐dependent speciation on phylogenies of several hundred tips. The method has low power to detect trait‐dependent extinction but can infer state‐dependent differences in speciation even when net diversification rates are constant. We assemble a range of macroevolutionary scenarios that are problematic for likelihood‐based methods, and we find that FiSSE does not show similarly elevated false positive rates. We suggest that nonparametric statistical approaches, such as FiSSE, provide an important complement to formal process‐based models for trait‐dependent diversification.