Similar Literature
 20 similar documents found (search time: 15 ms)
1.
As much of the focus of genetics and molecular biology has shifted toward the systems level, it has become increasingly important to accurately extract biologically relevant signal from thousands of related measurements. The common property among these high-dimensional biological studies is that the measured features have a rich and largely unknown underlying structure. One example of much recent interest is identifying differentially expressed genes in comparative microarray experiments. We propose a new approach aimed at optimally performing many hypothesis tests in a high-dimensional study. This approach estimates the optimal discovery procedure (ODP), which has recently been introduced and theoretically shown to perform multiple significance tests optimally. Whereas existing procedures essentially use data from only one feature at a time, the ODP approach uses the relevant information from the entire data set when testing each feature. In particular, we propose a generally applicable estimate of the ODP for identifying differentially expressed genes in microarray experiments. This microarray method consistently shows favorable performance over five widely used existing methods. For example, in testing for differential expression between two breast cancer tumor types, the ODP provides increases from 72% to 185% in the number of genes called significant at a false discovery rate of 3%. Our proposed microarray method is freely available to academic users in the open-source, point-and-click EDGE software package.
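The false discovery rate criterion used above can be made concrete with a short sketch. This is not the ODP itself (which pools information across genes) but the standard per-gene baseline it is compared against: Benjamini-Hochberg control at a chosen FDR level. The function name and data are illustrative.

```python
# Benjamini-Hochberg FDR control over a list of per-gene p-values.
# Hypothetical helper; the ODP replaces the per-gene statistics, not this step.
import numpy as np

def bh_reject(pvals, fdr=0.03):
    """Return a boolean mask of hypotheses rejected at the given FDR."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # find the largest k with p_(k) <= (k/m) * fdr
    thresh = fdr * np.arange(1, m + 1) / m
    below = ranked <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True  # reject everything up to the cutoff
    return reject
```

At a 3% FDR, the procedure rejects the largest set of genes whose ordered p-values stay under the BH line.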

2.
Common heritable diseases ("complex traits") are assumed to be due to multiple underlying susceptibility genes. While genetic mapping methods for Mendelian disorders have been very successful, the search for genes underlying complex traits has been difficult and often disappointing. One of the reasons may be that most current gene-mapping approaches are still based on the conventional methodology of testing one or a few SNPs at a time. Here, we demonstrate a simple strategy that allows for the joint analysis of multiple disease-associated SNPs in different genomic regions. Our set-association method combines information over SNPs by forming sums of relevant single-marker statistics. As previously hypothesized, we show here that this approach successfully addresses the "curse of dimensionality" problem: too many variables must be estimated from a comparatively small number of observations. We also report results of simulation studies showing that our method furnishes unbiased and accurate significance levels. Power calculations demonstrate good power even in the presence of large numbers of non-disease-associated SNPs. We extended our method to microarray expression data, where expression levels for large numbers of genes must be compared between two tissue types. In applications to such data, our approach turned out to be highly efficient.
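The core of the set-association idea, summing single-marker statistics over a set of SNPs and judging the sum by permutation of the phenotype labels, can be sketched as follows. The marker statistic, the simulated data, and the seed are illustrative assumptions, not the authors' exact choices.

```python
# Sketch of a set-association test: sum per-SNP statistics, assess by permutation.
import numpy as np

rng = np.random.default_rng(0)

def marker_stats(genotypes, phenotype):
    """Squared case/control mean-difference statistic for each marker."""
    g1 = genotypes[phenotype == 1]
    g0 = genotypes[phenotype == 0]
    return (g1.mean(axis=0) - g0.mean(axis=0)) ** 2

def set_association_pvalue(genotypes, phenotype, n_perm=999):
    observed = marker_stats(genotypes, phenotype).sum()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(phenotype)  # break genotype-phenotype link
        if marker_stats(genotypes, perm).sum() >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy example: 100 subjects, 10 SNPs, a strong signal at SNP 0.
phen = np.repeat([0, 1], 50)
geno = rng.binomial(2, 0.3, size=(100, 10)).astype(float)
geno[phen == 1, 0] += 1.5  # hypothetical association at the first SNP
p = set_association_pvalue(geno, phen)
```

The summed statistic needs only one permutation test for the whole SNP set, which is how the method sidesteps per-SNP multiplicity.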

3.
Selection studies involving multiple intercorrelated independent variables have employed multiple regression analysis as a means to estimate and partition the direct and indirect effects of natural and sexual selection. These statistical models assume that the independent variables are measured without error, which is rarely the case in the field studies for which these methods are employed. We demonstrate that the distortion of estimates resulting from error variance is not trivial; when independent variables are intercorrelated, extreme distortions may occur. We propose using structural equation models (SEM) to estimate error variance and produce highly accurate coefficients for the formulation of selection gradients. This method is particularly appropriate when selection is viewed as acting at the level of the latent variables.

4.
Forest canopies and tree crown structures are of high ecological importance. Measuring canopies and crowns by direct inventory methods is time-consuming and of limited accuracy. High-resolution inventory tools, in particular terrestrial laser scanning (TLS), can overcome these limitations and obtain three-dimensional (3D) structural information about the canopy with a very high level of detail. The main objective of this study was to introduce a novel method to analyze spatiotemporal dynamics in canopy occupancy at the individual tree and local neighborhood level using high-resolution 3D TLS data. For the analyses, a voxel grid approach was applied. The tree crowns were modeled through the combination of two approaches: the encasement of all crown points with a 3D α-shape, which was then converted into a voxel grid, and the direct voxelization of the crown points. We show that canopy occupancy at the individual tree level can be quantified as the crown volume occupied only by the respective tree or shared with neighboring trees. At the local neighborhood level, our method enables the precise determination of the extent of canopy space filling, the identification of tree–tree interactions, and the analysis of complementary space use. Using multitemporal TLS data recordings, this method allows the precise detection and quantification of changes in canopy occupancy through time. The method is applicable to a wide range of investigations in forest ecology research, including the study of tree diversity effects on forest productivity or growing-space analyses for optimal tree growth. Due to its high accuracy, the method facilitates precise analysis even of highly plastic individual tree crowns and, thus, a realistic representation of forest canopies. Moreover, our voxel grid framework is flexible enough to allow for the inclusion of further biotic and abiotic variables relevant to complex analyses of forest canopy dynamics.
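The direct-voxelization step mentioned above can be sketched in a few lines: snap each 3D crown point to a cubic grid cell and collect the occupied cells. This is a minimal illustration, not the full α-shape pipeline; the function name and voxel size are assumptions.

```python
# Minimal voxelization of a 3D point cloud: map points to cubic grid cells.
import numpy as np

def occupied_voxels(points, voxel_size=0.2):
    """Return the set of occupied voxel indices for an (n, 3) point cloud."""
    idx = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    return set(map(tuple, idx))
```

Occupied-voxel sets from two trees (or two scan dates) can then be intersected or differenced to quantify shared canopy space and occupancy change.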

5.
We consider the problem of comparing two treatments on multiple endpoints where the goal is to identify the endpoints that have treatment effects, while controlling the familywise error rate. Two current approaches for this are (i) applying a global test within a closed testing procedure, and (ii) adjusting individual endpoint p‐values for multiplicity. We propose combining the two current methods. We compare the combined method with several competing methods in a simulation study. It is concluded that the combined approach maintains higher power under a variety of treatment effect configurations than the other methods and is thus more power‐robust.
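Holm's step-down method is the textbook case where the two ingredients coincide: it arises from closed testing with a Bonferroni global test, and it can equally be phrased as a p-value adjustment. A minimal sketch of that baseline (not the authors' combined procedure) with hypothetical endpoint p-values:

```python
# Holm step-down adjustment: closed testing with Bonferroni global tests,
# expressed as adjusted p-values.
def holm_adjust(pvals):
    """Return Holm-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted
```

An endpoint shows a treatment effect at familywise level α when its adjusted p-value is at most α.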

6.
L. Finos & A. Farcomeni, Biometrics 2011, 67(1):174-181
Summary: We show a novel approach for k‐FWER control which does not involve any correction, but only tests the hypotheses along a (possibly data-driven) order until a suitable number of p-values are found above the uncorrected α level. The p-values can arise from any linear model in a parametric or nonparametric setting. The approach is not only very simple and computationally undemanding; the data-driven order also enhances power when the sample size is small (and also when k and/or the number of tests is large). We illustrate the method on an original study about gene discovery in multiple sclerosis, which involved a small number of twin pairs discordant for the disease. The methods are implemented in an R package (someKfwer), freely available on CRAN.
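The stopping rule can be sketched as below. This is one illustrative reading of the ordered procedure described above, not the someKfwer package code; the p-values are hypothetical and assumed to arrive in the chosen order.

```python
# Ordered k-FWER sketch: walk the hypotheses in a fixed (possibly data-driven)
# order, rejecting at the UNADJUSTED level alpha, and stop once k p-values
# above alpha have been encountered.
def ordered_kfwer(pvals_in_order, k=2, alpha=0.05):
    rejected, exceedances = [], 0
    for i, p in enumerate(pvals_in_order):
        if p > alpha:
            exceedances += 1
            if exceedances >= k:
                break  # stop: k "acceptances" seen
        else:
            rejected.append(i)
    return rejected
```

No multiplicity correction is applied to the individual tests; control comes from the ordering and the early stop.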

7.
In experiments involving many variables, investigators typically use multiple comparisons procedures to determine differences that are unlikely to be the result of chance. However, investigators rarely consider how the magnitude of the greatest observed effect sizes may have been subject to bias resulting from multiple testing. These questions of bias become important to the extent that investigators focus on the magnitude of the observed effects. For example, such bias can lead to problems in attempting to validate results if a biased effect size is used to power a follow-up study. An important associated consequence is that confidence intervals constructed using standard distributions may be badly biased. A bootstrap approach is used to estimate and adjust for the bias in the effect sizes of those variables showing the strongest differences. This bias is not always present; some principles showing what factors may lead to greater bias are given, and a proof of the convergence of the bootstrap distribution is provided.
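The general idea, re-running the selection inside each bootstrap resample and comparing the selected estimate with its full-sample value, can be sketched on simulated null data. This is an illustration of selection bias ("winner's curse") estimation, not the authors' exact algorithm; all names and data are hypothetical.

```python
# Bootstrap estimate of the selection bias in the top observed effect.
import numpy as np

rng = np.random.default_rng(1)
true_effects = np.zeros(50)                            # no real effects at all
data = true_effects + rng.normal(0, 1, size=(20, 50))  # 20 obs, 50 variables

obs_means = data.mean(axis=0)
winner = int(np.argmax(obs_means))  # variable with the largest observed effect

boot_bias = []
for _ in range(500):
    idx = rng.integers(0, 20, size=20)       # resample observations
    bm = data[idx].mean(axis=0)
    w = int(np.argmax(bm))                   # redo the selection
    # bias of the selected maximum relative to its full-sample mean
    boot_bias.append(bm[w] - obs_means[w])
bias_hat = float(np.mean(boot_bias))
corrected = obs_means[winner] - bias_hat     # bias-adjusted top effect
```

Even with no true effects, the selected maximum is inflated, and the bootstrap recovers a positive bias estimate to subtract.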

8.

Background

Gene set testing has become an important analysis technique in high-throughput microarray and next-generation sequencing studies for uncovering patterns of differential expression of various biological processes. Often, the large number of gene sets tested simultaneously requires some form of multiplicity correction. This work provides a substantial computational improvement to an existing familywise error rate controlling multiplicity approach (the Focus Level method) for gene set testing in high-throughput microarray and next-generation sequencing studies using Gene Ontology graphs, which we call the Short Focus Level.

Results

The Short Focus Level procedure, which performs a shortcut of the full Focus Level procedure, is achieved by extending the reach of graphical weighted Bonferroni testing to closed testing situations where restricted hypotheses are present, such as in the Gene Ontology graphs. The Short Focus Level multiplicity adjustment can perform the full top-down approach of the original Focus Level procedure, overcoming a significant disadvantage of the otherwise powerful Focus Level multiplicity adjustment. The computational and power differences of the Short Focus Level procedure as compared to the original Focus Level procedure are demonstrated both through simulation and using real data.

Conclusions

The Short Focus Level procedure shows a significant increase in computation speed over the original Focus Level procedure (as much as ∼15,000 times faster). The Short Focus Level should be used in place of the Focus Level procedure whenever the logical assumptions of the Gene Ontology graph structure are appropriate for the study objectives and when either no a priori focus level of interest can be specified or the focus level is selected at a higher level of the graph, where the Focus Level procedure is computationally intractable.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0349-3) contains supplementary material, which is available to authorized users.
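The graphical weighted Bonferroni ingredient named in the Results can be stated in two lines: each hypothesis i receives a weight w_i (the weights summing to one) and is tested at level w_i·α. The weights and p-values below are hypothetical; the Short Focus Level's contribution is propagating such weights through the Gene Ontology graph, which this fragment does not attempt.

```python
# Weighted Bonferroni test: hypothesis i is rejected when p_i <= w_i * alpha.
def weighted_bonferroni(pvals, weights, alpha=0.05):
    assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to one
    return [p <= w * alpha for p, w in zip(pvals, weights)]
```

With equal weights this reduces to the ordinary Bonferroni correction; unequal weights shift power toward the gene sets of greatest interest.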

9.
In the comparison of various dose levels it can often be assumed that the parameters to be tested follow an order restriction. Two closed multiple test procedures for detecting the highest dose level still providing a shift in the response distribution relative to the adjacent lower dose level are proposed. One is based on one-sided comparisons between neighbouring doses; the other uses Helmert-type contrast statistics. If a sequence of testing is fixed in advance, the multiple test can be suitably modified. The power of the procedures is simulated under the assumption of normally distributed responses for various constellations of the dose means, and is compared with the power of a general Holm-type procedure discussed in BUDDE & BAUER (1989).

10.
11.
Inbreeding depression, the reduced fitness of offspring of related individuals, is a central theme in evolutionary biology. Inbreeding effects are influenced by the genetic makeup of a population, which is driven by any history of genetic bottlenecks and genetic drift. The Chatham Island black robin represents a case of extreme inbreeding following two severe population bottlenecks. We tested whether inbreeding measured by a 20‐year pedigree predicted variation in fitness among individuals, despite the high mean level of inbreeding and low genetic diversity in this species. We found that paternal and maternal inbreeding reduced fledgling survival and individual inbreeding reduced juvenile survival, indicating that inbreeding depression affects even this highly inbred population. Close inbreeding also reduced survival for fledglings with less‐inbred mothers, but unexpectedly improved survival for fledglings with highly inbred mothers. This counterintuitive interaction could not be explained by various potentially confounding variables. We propose a genetic mechanism, whereby a highly inbred chick with a highly inbred parent inherits a “proven” genotype and thus experiences a fitness advantage, which could explain the interaction. The positive and negative effects we found emphasize that continuing inbreeding can have important effects on individual fitness, even in populations that are already highly inbred.

12.
Testing for unequal variances is usually performed to check the validity of the assumptions that underlie standard tests for differences between means (the t-test and ANOVA). However, existing methods for testing for unequal variances (Levene's test and Bartlett's test) are notoriously non-robust to normality assumptions, especially for small sample sizes. Although these methods were designed to deal with one hypothesis at a time, modern applications (such as microarrays and fMRI experiments) often involve parallel testing over a large number of levels (genes or voxels); in these settings a shift in variance may be biologically relevant, perhaps even more so than a change in the mean. This paper proposes a parsimonious model for parallel testing of the equal-variance hypothesis. It is designed to work well when the number of tests is large, typically much larger than the sample sizes. The tests are implemented using an empirical Bayes estimation procedure which 'borrows information' across levels. The method is shown to be quite robust to deviations from normality, and to substantially increase the power to detect differences in variance over the more traditional approaches even when the normality assumption is valid.
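The flavor of "borrowing information" across levels can be sketched by shrinking each gene's per-group sample variance toward the across-gene average before forming the variance ratio. This is illustrative only; the paper's empirical Bayes model is more refined, and the prior, shrinkage weight, and data here are assumptions.

```python
# Moderated variance ratio: shrink per-gene variances toward the across-gene
# mean (a crude empirical Bayes prior) before comparing the two groups.
import numpy as np

def moderated_variance_ratio(x, y, d0=5.0):
    """Per-gene ratio of shrunken variances; x, y are (samples, genes) arrays."""
    vx, vy = x.var(axis=0, ddof=1), y.var(axis=0, ddof=1)
    s0x, s0y = vx.mean(), vy.mean()          # prior = across-gene average
    dx, dy = x.shape[0] - 1, y.shape[0] - 1  # residual degrees of freedom
    vx_mod = (d0 * s0x + dx * vx) / (d0 + dx)
    vy_mod = (d0 * s0y + dy * vy) / (d0 + dy)
    return vx_mod / vy_mod
```

Shrinkage stabilizes the thousands of noisy per-gene variance estimates, which is what makes parallel variance testing feasible at small sample sizes.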

13.
In ophthalmologic studies, measurements obtained from both eyes of an individual are often highly correlated, and ignoring the correlation can lead to incorrect inferences. An asymptotic method was proposed by Tang and others (2008) for testing equality of proportions between two groups under Rosner's model. In this article, we investigate three testing procedures for general g ≥ 2 groups. Our simulation results show that the score testing procedure usually provides satisfactory type I error control and has reasonable power, and that the three test procedures converge as the sample size increases. Examples from ophthalmologic studies are used to illustrate the proposed methods.

14.
In clinical trials with an active control, therapeutic equivalence of a new treatment is usually investigated by looking at a location parameter of the distributions of the primary efficacy variable. But even if the location parameters are close to each other, existing differences in variability may be connected with different risks of under- or over-treatment in an individual patient. Assuming normally distributed responses, a multiple test procedure applying two shifted one-sided t-tests for the means and, accordingly, two one-sided F-tests for the variances is proposed. Equivalence in location and variability is established if all four tests lead to a rejection at the (one-sided) level α. A conservative procedure "correcting" the t-tests for heteroscedasticity is derived. The choice of a design in terms of the global level α, the global power, the relevant deviations in the population means and variances, as well as the sample size, is outlined. Numerical calculations of the actual level and power for the proposed designs show that for balanced sample sizes the classical uncorrected one-sided t-tests can be used safely without exaggerating the global type I error probability. Finally, an example is given.
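The two shifted one-sided t-tests (the TOST part of the procedure) can be sketched as follows. The critical value is passed in to keep the sketch dependency-free; in practice it is the t quantile with nx + ny − 2 degrees of freedom. The two one-sided F-tests for the variances are analogous and omitted. Data and margin are hypothetical.

```python
# Two one-sided t-tests (TOST): equivalence at margin delta is declared
# when BOTH shifted one-sided tests reject.
import math

def tost_t_statistics(x, y, delta):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    diff = mx - my
    t_lower = (diff + delta) / se  # tests H0: diff <= -delta
    t_upper = (diff - delta) / se  # tests H0: diff >= +delta
    return t_lower, t_upper

def tost_equivalent(x, y, delta, t_crit):
    tl, tu = tost_t_statistics(x, y, delta)
    return tl > t_crit and tu < -t_crit
```

Requiring all four tests (two for the means, two for the variances) to reject is what establishes equivalence in both location and variability.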

15.
Aims:  To investigate the effectiveness of pooled sampling methods for detection of Salmonella in turkey flocks.
Methods and Results:  Individual turkey droppings were taken from 43 flocks, with half of each dropping tested for Salmonella as an individual sample and the other half included in a pool of five. A pair of boot swabs and a dust sample were also taken from each flock. The results were analysed using Bayesian methods in the absence of a gold standard. This showed a dilution effect of mixing true-positive with negative samples, but even so the pooled faecal samples were found to be a highly efficient method of testing compared with individual faecal samples. The more samples included in the pool, the more sensitive the pooled sampling method was predicted to be. Dust sampling was much more sensitive than faecal sampling at low prevalence.
Conclusions:  Pooled faecal sampling is an efficient method of Salmonella detection in turkey flocks. The additional testing of a dust sample greatly increased the effectiveness of sampling, especially at low prevalence.
Significance and Impact of the Study:  This is the first study to relate the sensitivity of the sampling methods to the within-flock prevalence.
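A back-of-envelope calculation shows why larger pools are more sensitive at low prevalence: a pool is positive when at least one of its droppings is truly positive. This deliberately ignores the dilution effect reported above; the function and parameters are illustrative.

```python
# Probability a pool of n droppings tests positive, given within-flock
# prevalence `prev` and per-positive-sample test sensitivity `se`
# (dilution ignored for this sketch).
def pool_detection_prob(prev, n, se=1.0):
    return se * (1 - (1 - prev) ** n)
```

At 5% prevalence a pool of five already has more than a fourfold better chance of containing a positive dropping than a single sample.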

16.
Recently, structural variation in the genome has been implicated in many complex diseases. Using genomewide single nucleotide polymorphism (SNP) arrays, researchers are able to investigate the impact not only of SNP variation, but also of copy-number variants (CNVs) on the phenotype. The most common analytic approach involves estimating, at the level of the individual genome, the underlying number of copies present at each location. Once this is completed, tests are performed to determine the association between copy number state and phenotype. An alternative approach is to carry out association testing first, between phenotype and raw intensities from the SNP array at the level of the individual marker, and then aggregate neighboring test results to identify CNVs associated with the phenotype. Here, we explore the strengths and weaknesses of these two approaches using both simulations and real data from a pharmacogenomic study of the chemotherapeutic agent gemcitabine. Our results indicate that pooled marker-level testing is capable of offering a dramatic increase in power (> 12-fold) over CNV-level testing, particularly for small CNVs. However, CNV-level testing is superior when CNVs are large and rare; understanding these tradeoffs is an important consideration in conducting association studies of structural variation.

17.
Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5–20% power at α = 2.5×10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.

18.
Most variables of interest in laboratory medicine show predictable changes with several frequencies in the span of time investigated. The waveform of such nonsinusoidal rhythms can be well described by multiple-components rhythmometry, a method that fits a linear model with several cosine functions. The method, originally described for the analysis of longitudinal time series, is here extended to allow the analysis of hybrid data (time series sampled from a group of subjects, each represented by an individual series). Given k individual series, we can fit the same linear model with m different frequencies (whether or not harmonics of one fundamental period) to each series. This fit provides estimates for 2m + 1 parameters, namely the amplitude and acrophase of each component, as well as the rhythm-adjusted mean. Assuming that the set of parameters obtained for each individual is a random sample from a multivariate normal population, the corresponding population parameter estimates can be based on the means of the estimates obtained from the individuals in the sample. Their confidence intervals depend on the variability among the individual parameter estimates. The variance-covariance matrix can then be estimated on the basis of the sample covariances. Confidence intervals for the rhythm-adjusted mean, as well as for the amplitude-acrophase pair of each component, can then be computed using the estimated covariance matrix. The p-values for testing the zero-amplitude assumption for each component, as well as for the global model, can finally be derived using those confidence intervals and the t and F distributions. The method, validated by a simulation study and illustrated by an example of modeling the circadian variation of heart rate, represents a new step in the development of statistical procedures in chronobiology.
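The individual-series fit reduces to ordinary least squares: each component A·cos(2πt/τ − φ) is rewritten as β_c·cos(2πt/τ) + β_s·sin(2πt/τ), which is linear in the β's; amplitude and acrophase are then recovered as A = √(β_c² + β_s²) and φ = atan2(β_s, β_c). A sketch on simulated heart-rate-like data with 24 h and 12 h components (data and parameter values are hypothetical):

```python
# Multiple-components rhythmometry as OLS on cosine/sine regressors.
import numpy as np

def cosinor_fit(t, y, periods):
    """Fit rhythm-adjusted mean (MESOR) plus (amplitude, acrophase) per period."""
    cols = [np.ones_like(t)]
    for tau in periods:
        w = 2 * np.pi * t / tau
        cols.extend([np.cos(w), np.sin(w)])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    mesor = beta[0]
    comps = []
    for j in range(len(periods)):
        bc, bs = beta[1 + 2 * j], beta[2 + 2 * j]
        comps.append((np.hypot(bc, bs), np.arctan2(bs, bc)))  # (amplitude, acrophase)
    return mesor, comps

t = np.arange(0.0, 48.0, 0.5)  # two days, 30-minute sampling
y = 70 + 8 * np.cos(2 * np.pi * t / 24 - 1.0) + 3 * np.cos(2 * np.pi * t / 12 - 0.5)
mesor, comps = cosinor_fit(t, y, periods=[24.0, 12.0])
```

Fitting this model to each of the k individual series yields the per-subject parameter vectors whose means and covariances drive the hybrid-data inference described above.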

19.
Food web framework for size-structured populations
We synthesise traditional unstructured food webs, allometric body size scaling, trait-based modelling, and physiologically structured modelling to provide a novel and ecologically relevant tool for size-structured food webs. The framework allows food web models to include ontogenetic growth and life-history omnivory at the individual level by resolving the population structure of each species as a size-spectrum. Each species is characterised by the trait ‘size at maturation’, and all model parameters are made species independent through scaling with individual body size and size at maturation. Parameter values are determined from cross-species analysis of fish communities as life-history omnivory is widespread in aquatic systems, but may be reparameterised for other systems. An ensemble of food webs is generated and the resulting communities are analysed at four levels of organisation: community level, species level, trait level, and individual level. The model may be solved analytically by assuming that the community spectrum follows a power law. The analytical solution provides a baseline expectation of the results of complex food web simulations, and agrees well with the predictions of the full model on biomass distribution as a function of individual size, biomass distribution as a function of size at maturation, and relation between predator-prey mass ratio of preferred and eaten food. The full model additionally predicts the diversity distribution as a function of size at maturation.

20.
The currently practiced methods of significance testing in microarray gene expression profiling are highly unstable and tend to be very low in power. These undesirable properties are due to the nature of multiple testing procedures, as well as extremely strong and long-ranged correlations between gene expression levels. In an earlier publication, we identified a special structure in gene expression data that produces a sequence of weakly dependent random variables. This structure, termed the delta-sequence, lies at the heart of a new methodology for selecting differentially expressed genes in nonoverlapping gene pairs. The proposed method has two distinct advantages: (1) it leads to dramatic gains in terms of the mean numbers of true and false discoveries, and in the stability of the results of testing; and (2) its outcomes are entirely free from the log-additive array-specific technical noise. We demonstrate the usefulness of this approach in conjunction with the nonparametric empirical Bayes method. The proposed modification of the empirical Bayes method leads to significant improvements in its performance. The new paradigm arising from the existence of the delta-sequence in biological data offers considerable scope for future developments in this area of methodological research.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号