Found 20 similar articles (search time: 0 ms)
1.
Summary. We develop formulae to calculate sample sizes for ranking and selection of differentially expressed genes among different clinical subtypes or prognostic classes of disease in genome-wide screening studies with microarrays. The formulae aim to control the probability that a selected subset of genes with fixed size contains enough truly top-ranking informative genes, which can be assessed on the basis of the distribution of ordered statistics from independent genes. We provide strategies for conservative designs to cope with issues of unknown number of informative genes and unknown correlation structure across genes. Application of the formulae to a clinical study for multiple myeloma is given.
2.
SUMMARY: Cluster randomized clinical trials (cluster-RCT), where the community entities serve as clusters, often yield data with three hierarchy levels. For example, interventions are randomly assigned to the clusters (level three unit). Health care professionals (level two unit) within the same cluster are trained with the randomly assigned intervention to provide care to subjects (level one unit). In this study, we derived a closed form power function and formulae for sample size determination required to detect an intervention effect on outcomes at the subject's level. In doing so, we used a test statistic based on maximum likelihood estimates from a mixed-effects linear regression model for three level data. A simulation study follows and verifies that theoretical power estimates based on the derived formulae are nearly identical to empirical estimates based on simulated data. Recommendations at the design stage of a cluster-RCT are discussed.
3.
A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments (total citations: 3; self-citations: 0; citations by others: 3)
Motivation: The proliferation of public data repositories creates a need for meta-analysis methods to efficiently evaluate, integrate and validate related datasets produced by independent groups. A t-based approach has been proposed to integrate effect size from multiple studies by modeling both intra- and between-study variation. Recently, a non-parametric rank product method, which is derived based on biological reasoning of fold-change criteria, has been applied to directly combine multiple datasets into one meta study. Fisher's Inverse χ² method, which only depends on P-values from individual analyses of each dataset, has been used in a couple of medical studies. While these methods address the question from different angles, it is not clear how they compare with each other. Results: We comparatively evaluate the three methods: t-based hierarchical modeling, rank products and Fisher's Inverse χ² test with P-values from either the t-based or the rank product method. A simulation study shows that the rank product method, in general, has higher sensitivity and selectivity than the t-based method in both individual and meta-analysis, especially in the setting of small sample size and/or large between-study variation. Not surprisingly, Fisher's χ² method highly depends on the method used in the individual analysis. Application to real datasets demonstrates that meta-analysis achieves more reliable identification than an individual analysis, and rank products are more robust in gene ranking, which leads to a much higher reproducibility among independent studies. Though t-based meta-analysis greatly improves over the individual analysis, it suffers from a potentially large amount of false positives when P-values serve as threshold. We conclude that careful meta-analysis is a powerful tool for integrating multiple array studies. Contact: fxhong{at}jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: David Rocke
Present address: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, 44 Binney Street, Boston, MA 02115, USA.
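For reference, Fisher's inverse χ² method named in the abstract above combines k independent P-values through the statistic X = −2 Σ ln p_i, which follows a χ² distribution with 2k degrees of freedom under the null hypothesis. The following is a minimal Python sketch of that standard calculation; the function name `fisher_combined_pvalue` is an illustrative assumption and is not taken from the paper.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's inverse chi-square method.

    Hypothetical helper for illustration; not code from the cited paper.
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p_i); we keep X / 2 for convenience.
    half_x = -sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k the chi-square upper tail has a closed form:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    tail = 1.0
    for j in range(1, k):
        term *= half_x / j
        tail += term
    return math.exp(-half_x) * tail
```

With a single P-value the function returns that value unchanged, which is a quick sanity check on the closed-form tail. The sketch assumes the individual tests are independent, as the method itself does.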
4.
MOTIVATION: An important application of microarray experiments is to identify differentially expressed genes. Because microarray data are often not distributed according to a normal distribution, nonparametric methods were suggested for their statistical analysis. Here, the Baumgartner-Weiss-Schindler test, a novel and powerful test based on ranks, is investigated and compared with the parametric t-test as well as with two other nonparametric tests (Wilcoxon rank sum test, Fisher-Pitman permutation test) recently recommended for the analysis of gene expression data. RESULTS: Simulation studies show that an exact permutation test based on the Baumgartner-Weiss-Schindler statistic B is preferable to the other three tests. It is less conservative than the Wilcoxon test and more powerful, in particular in case of asymmetric or heavy-tailed distributions. When the underlying distribution is symmetric the differences in power between the tests are relatively small. Thus, the Baumgartner-Weiss-Schindler test is recommended for the usual situation that the underlying distribution is a priori unknown. AVAILABILITY: SAS code available on request from the authors.
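As an illustration of one of the comparison tests named above, the Fisher-Pitman permutation test can be sketched by enumerating every reassignment of the pooled observations to the two groups and counting how often the absolute difference in means is at least as large as the observed one. The Python sketch below is an assumption-laden illustration (the function name is hypothetical), it does not compute the Baumgartner-Weiss-Schindler statistic B, and full enumeration is only practical for small samples.

```python
from itertools import combinations


def permutation_test_mean_diff(x, y):
    """Exact two-sided permutation P-value for a difference in group means.

    Hypothetical helper for illustration; not the authors' SAS code.
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    total = sum(pooled)

    def abs_mean_diff(sum_first):
        # Absolute difference between the mean of group 1 and the mean of group 2.
        return abs(sum_first / n1 - (total - sum_first) / n2)

    observed = abs_mean_diff(sum(x))
    extreme = 0
    count = 0
    # Enumerate every way of assigning n1 of the pooled values to group 1.
    for idx in combinations(range(len(pooled)), n1):
        count += 1
        stat = abs_mean_diff(sum(pooled[i] for i in idx))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / count
```

For two fully separated groups of three observations each, only 2 of the 20 possible splits are as extreme as the observed one, so the smallest attainable two-sided P-value is 0.1. This is one reason exact tests can be conservative with few replicates.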
5.
A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments (total citations: 10; self-citations: 0; citations by others: 10)
Pan W. Bioinformatics (Oxford, England), 2002, 18(4): 546-554
MOTIVATION: A common task in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Recently several statistical methods have been proposed to accomplish this goal when there are replicated samples under each condition. However, it may not be clear how these methods compare with each other. Our main goal here is to compare three methods, the t-test, a regression modeling approach (Thomas et al., Genome Res., 11, 1227-1236, 2001) and a mixture model approach (Pan et al., http://www.biostat.umn.edu/cgi-bin/rrs?print+2001,2001a,b) with particular attention to their different modeling assumptions. RESULTS: It is pointed out that all three methods are based on using the two-sample t-statistic or its minor variation, but they differ in how to associate a statistical significance level to the corresponding statistic, leading to possibly large difference in the resulting significance levels and the numbers of genes detected. In particular, we give an explicit formula for the test statistic used in the regression approach. Using the leukemia data of Golub et al. (Science, 285, 531-537, 1999), we illustrate these points. We also briefly compare the results with those of several other methods, including the empirical Bayesian method of Efron et al. (J. Am. Stat. Assoc., to appear, 2001) and the Significance Analysis of Microarray (SAM) method of Tusher et al. (Proc. Natl Acad. Sci. USA, 98, 5116-5121, 2001).
6.
7.
Background
Most microarray experiments are carried out with the purpose of identifying genes whose expression varies in relation with specific conditions or in response to environmental stimuli. In such studies, genes showing similar mean expression values between two or more groups are considered as not differentially expressed, even if hidden subclasses with different expression values may exist. In this paper we propose a new method for identifying differentially expressed genes, based on the area between the ROC curve and the rising diagonal (ABCR). ABCR represents a more general approach than the standard area under the ROC curve (AUC), because it can identify both proper (i.e., concave) and not proper ROC curves (NPRC). In particular, NPRC may correspond to those genes that tend to escape standard selection methods.
8.
9.
In DNA microarray analysis, there is often interest in isolating a few genes that best discriminate between tissue types. This is especially important in cancer, where different clinicopathologic groups are known to vary in their outcomes and response to therapy. The identification of a small subset of gene expression patterns distinctive for tumor subtypes can help design treatment strategies and improve diagnosis. Toward this goal, we propose a methodology for the analysis of high-density oligonucleotide arrays. The gene expression measures are modeled as censored data to account for the quantification limits of the technology, and two gene selection criteria based on contrasts from an analysis of covariance (ANCOVA) model are presented. The model is formulated in a hierarchical Bayesian framework, which in addition to making the fit of the model straightforward and computationally efficient, allows us to borrow strength across genes. The elicitation of hierarchical priors, as well as issues related to parameter identifiability and posterior propriety, are discussed in detail. We examine the performance of our proposed method on simulated data, then present a detailed case study of an endometrial cancer dataset.
10.
Weng Kee Wong, Peter A. Lachenbruch, Philip J. Clements. Biometrical journal. Biometrische Zeitschrift, 1996, 38(7): 767-777
We consider the problem of choosing the number of replicates and number of subjects in a components of variance problem which optimizes various criteria. The case study here involves patients suffering from systemic sclerosis (Scleroderma), a form of rheumatic disease that is potentially disabling. Under the physical constraints imposed on the study, we find that using 2 or 3 replicates with as many patients as possible is the optimal strategy for several criteria.
11.
Proanthocyanidins are dimeric or polymeric condensation products of the flavonoids, including catechin, epicatechin or gallocatechin with leucocyanidin, leucopelargonidin or leucodelphinidin [1]. They are prominent colorless compounds, and are widely found in the bark of trees, leaves, fruits, flowers and seed coats. They have many natural functions, such as antioxidant properties [2] and insect resistance [3]. In forage, they can bind and precipitate dietary proteins, thus protect the anim…
12.
13.
Null alleles represent a common artefact of microsatellite-based analyses. Rapid methods for their detection and frequency estimation have been proposed to replace the existing time-consuming laboratory methods. The objective of this paper is to assess the power and accuracy of these statistical tools using both simulated and real datasets. Our results revealed that none of the tests developed to detect null alleles are perfect. However, combining tests allows the detection of null alleles with high confidence. Comparison of the estimators of null allele frequency indicated that those that account for unamplified individuals, such as the Brookfield2 estimator, are more accurate than those that do not. Altogether, the use of statistical tools appeared more appropriate than testing with alternative primers as null alleles often remain undetected following this laborious work. Based on these results, we propose recommendations to detect and correct datasets with null alleles.
14.
We consider sample size calculations for testing differences in means between two samples and allowing for different variances in the two groups. Typically, the power functions depend on the sample size and a set of parameters assumed known, and the sample size needed to obtain a prespecified power is calculated. Here, we account for two sources of variability: we allow the sample size in the power function to be a stochastic variable, and we consider estimating the parameters from preliminary data. An example of the first source of variability is nonadherence (noncompliance). We assume that the proportion of subjects who will adhere to their treatment regimen is not known before the study, but that the proportion is a stochastic variable with a known distribution. Under this assumption, we develop simple closed form sample size calculations based on asymptotic normality. The second source of variability is in parameter estimates that are estimated from prior data. For example, we account for variability in estimating the variance of the normal response from existing data which are assumed to have the same variance as the study for which we are calculating the sample size. We show that we can account for the variability of the variance estimate by simply using a slightly larger nominal power in the usual sample size calculation, which we call the calibrated power. We show that the calculation of the calibrated power depends only on the sample size of the existing data, and we give a table of calibrated power by sample size. Further, we consider the calculation of the sample size in the rarer situation where we account for the variability in estimating the standardized effect size from some existing data. This latter situation, as well as several of the previous ones, is motivated by sample size calculations for a Phase II trial of a malaria vaccine candidate.
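The "usual sample size calculation" that the abstract above takes as its starting point can be sketched with the textbook normal-approximation formula for comparing two means with unequal variances and equal allocation: n per group = (z_{1−α/2} + z_{power})² (σ₁² + σ₂²) / δ². The Python sketch below assumes known standard deviations and full adherence; the function name and default values are illustrative assumptions, and it does not implement the paper's calibrated power or its adjustment for nonadherence.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd1, sd2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Textbook normal approximation with equal allocation; hypothetical helper,
    not the calibrated-power method described in the abstract.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    n = (z_alpha + z_power) ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2
    # Round up so the achieved power is at least the nominal power.
    return math.ceil(n)
```

In this sketch, raising the `power` argument slightly above the target is the only knob that would change when moving from nominal to calibrated power; the calibrated values themselves come from the paper's table and are not reproduced here.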
15.
In some occupational health studies, observations occur in both exposed and unexposed individuals. If the levels of all exposed individuals have been detected, a two-part zero-inflated log-normal model is usually recommended, which assumes that the data has a probability mass at zero for unexposed individuals and a continuous response for values greater than zero for exposed individuals. However, many quantitative exposure measurements are subject to left censoring due to values falling below assay detection limits. A zero-inflated log-normal mixture model is suggested in this situation since unexposed zeros are not distinguishable from those exposed with values below detection limits. In the context of this mixture distribution, the information contributed by values falling below a fixed detection limit is used only to estimate the probability of unexposed. We consider sample size and statistical power calculation when comparing the median of exposed measurements to a regulatory limit. We calculate the required sample size for the data presented in a recent paper comparing the benzene TWA exposure data to a regulatory occupational exposure limit. A simulation study is conducted to investigate the performance of the proposed sample size calculation methods.
16.
Nam JM. Biometrics, 2003, 59(4): 1027-1035
When the intraclass correlation coefficient or the equivalent version of the kappa agreement coefficient have been estimated from several independent studies or from a stratified study, we have the problem of comparing the kappa statistics and combining the information regarding the kappa statistics in a common kappa when the assumption of homogeneity of kappa coefficients holds. In this article, using the likelihood score theory extended to nuisance parameters (Tarone, 1988, Communications in Statistics-Theory and Methods 17(5), 1549-1556) we present an efficient homogeneity test for comparing several independent kappa statistics and, also, give a modified homogeneity score method using a noniterative and consistent estimator as an alternative. We provide the sample size using the modified homogeneity score method and compare it with that using the goodness-of-fit method (GOF) (Donner, Eliasziw, and Klar, 1996, Biometrics 52, 176-183). A simulation study for small and moderate sample sizes showed that the actual level of the homogeneity score test using the maximum likelihood estimators (MLEs) of parameters is satisfactorily close to the nominal and it is smaller than those of the modified homogeneity score and the goodness-of-fit tests. We investigated statistical properties of several noniterative estimators of a common kappa. The estimator (Donner et al., 1996) is essentially efficient and can be used as an alternative to the iterative MLE. An efficient interval estimation of a common kappa using the likelihood score method is presented.
17.
José J. Reina-Pinto, Derry Voisin, Roxana Teodor, Alexander Yephremov. The Plant journal : for cell and molecular biology, 2010, 61(1): 166-175
High-density oligonucleotide arrays are widely used for analysis of gene expression on a genomic scale, but the generated data remain largely inaccessible for comparative analysis purposes. Similarity searches in databases with differentially expressed gene (DEG) lists may be used to assign potential functions to new genes and to identify potential chemical inhibitors/activators and genetic suppressors/enhancers. Although this is a very promising concept, it requires the compatibility and validity of the DEG lists to be significantly improved. Using Arabidopsis and human datasets, we have developed guidelines for the performance of similarity searches against databases that collect microarray data. We found that, in comparison with many other methods, a rank-product analysis achieves a higher degree of inter- and intra-laboratory consistency of DEG lists, and is advantageous for assessing similarities and differences between them. To support this concept, we developed a tool called MASTA (microarray overlap search tool and analysis), and re-analyzed over 600 Arabidopsis microarray expression datasets. This revealed that large-scale searches produce reliable intersections between DEG lists that prove to be useful for genetic analysis, thus aiding in the characterization of cellular and molecular mechanisms. We show that this approach can be used to discover unexpected connections and to illuminate unanticipated interactions between individual genes.
18.
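Objective: Most neonatal woodchucks infected with woodchuck hepatitis virus go on to develop chronic hepatitis, whereas adult woodchucks mostly develop acute self-limiting hepatitis after infection. This study aimed to identify key genes in liver tissue that may account for this difference in prognosis. Methods: Whole-genome expression microarrays were used to compare gene expression in the liver tissue of neonatal and adult mice and to select target genes. Degenerate primers were then designed from multi-species sequence alignments and used to amplify the corresponding genes from woodchuck liver cDNA; the products were sequenced, new primers were designed, and real-time fluorescent quantitative PCR was performed. Results: Compared with neonatal woodchucks, expression of the calcium reabsorption-related genes DNM1 (Dynamin 1), DNM3 (Dynamin 3) and Prkcc (protein kinase C, gamma) was markedly higher in the hepatocytes of adult woodchucks, increasing 2.65±0.25-fold, 1.90±0.34-fold and 2.94±0.54-fold, respectively. Conclusion: The three genes in the calcium reabsorption pathway whose expression differed most between the livers of the two groups of mice also differed markedly between neonatal and adult woodchucks. Such genes produce differences in intracellular calcium concentration in hepatocytes, which indirectly affect replication of the hepatitis virus in these cells. This difference in expression is likely to be one of the reasons why animals of the two age groups have different outcomes after infection with woodchuck hepatitis virus.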
19.