Similar Literature
20 similar documents retrieved (search time: 31 ms).
1.
Gene expression signatures from microarray experiments promise to provide important prognostic tools for predicting disease outcome or response to treatment. A number of microarray studies in various cancers have reported such gene signatures. However, the overlap of gene signatures in the same disease has been limited so far, and some reported signatures have not been reproduced in other populations. Clearly, the methods used for verifying novel gene signatures need improvement. In this article, we describe an experiment in which microarrays and sample hybridization are designed according to the statistical principles of randomization, replication and blocking. Our results show that such designs provide unbiased estimation of differential expression levels as well as powerful tests for them.
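As an illustration of those three principles, the sketch below (hypothetical sample IDs, not the authors' code) treats each two-colour array as a block holding one sample from each condition, randomizes which samples are paired, and randomizes the dye assignment within each array.

```python
# A minimal sketch of a randomized block design for a two-colour
# microarray experiment; sample names and array count are invented.
import random

conditions = {"tumour": ["T1", "T2", "T3", "T4"],
              "normal": ["N1", "N2", "N3", "N4"]}
dyes = ["Cy3", "Cy5"]

random.seed(1)
random.shuffle(conditions["tumour"])   # randomize which samples are paired
random.shuffle(conditions["normal"])

design = []
for array, (t, n) in enumerate(zip(conditions["tumour"], conditions["normal"]), 1):
    d = random.sample(dyes, 2)         # randomize dye-to-condition assignment
    design.append({"array": array, t: d[0], n: d[1]})

for row in design:
    print(row)
```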

2.
Experiments involving neonates should follow the same basic principles as most other experiments. They should be unbiased, be powerful, have a good range of applicability, not be excessively complex, and be statistically analyzable to show the range of uncertainty in the conclusions. However, investigation of growth and development in neonatal multiparous animals poses special problems associated with the choice of "experimental unit" and differences between litters: the "litter effect." Two main types of experiments are described, with recommendations regarding their design and statistical analysis: First, the "between litter design" is used when females or whole litters are assigned to a treatment group. In this case the litter, rather than the individuals within a litter, is the experimental unit and should be the unit for the statistical analysis. Measurements made on individual neonatal animals need to be combined within each litter. Counting each neonate as a separate observation may lead to incorrect conclusions. The number of observations for each outcome ("n") is based on the number of treated females or whole litters. Where litter sizes vary, it may be necessary to use a weighted statistical analysis because means based on more observations are more reliable than those based on a few observations. Second, the more powerful "within-litter design" is used when neonates can be individually assigned to treatment groups so that individuals within a litter can have different treatments. In this case, the individual neonate is the experimental unit, and "n" is based on the number of individual pups, not on the number of whole litters. However, variation in litter size means that it may be difficult to perform balanced experiments with equal numbers of animals in each treatment group within each litter. This increases the complexity of the statistical analysis. A numerical example using a general linear model analysis of variance is provided in the Appendix. The use of isogenic strains should be considered in neonatal research. These strains are like immortal clones of genetically identical individuals (i.e., they are uniform, stable, and repeatable), and their use should result in more powerful experiments. Inbred females mated to males of a different inbred strain will produce F1 hybrid offspring that will be uniform, vigorous, and genetically identical. Different strains may develop at different rates and respond differently to experimental treatments.
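A minimal sketch of the between-litter analysis described above, on hypothetical data and assuming statsmodels is available: the litter mean is the unit of analysis, weighted by litter size so that means based on more pups carry more weight.

```python
# Between-litter design: aggregate pups to litter means, then run a
# weighted analysis with litter size as the weight. Data are invented.
import pandas as pd
import statsmodels.formula.api as smf

pups = pd.DataFrame({
    "litter":    ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D"],
    "treatment": ["ctl"] * 5 + ["trt"] * 6,
    "weight_g":  [6.1, 5.8, 6.4, 5.5, 5.9, 6.8, 7.0, 6.6, 7.1, 6.9, 7.2],
})

litters = (pups.groupby(["litter", "treatment"])["weight_g"]
               .agg(litter_mean="mean", n="size").reset_index())

# "n" for the test is the number of litters (4), not the number of pups (11)
fit = smf.wls("litter_mean ~ treatment", data=litters,
              weights=litters["n"]).fit()
print(fit.summary())
```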

3.
When conducting field studies, it is common for ecologists to choose the locations of sampling units arbitrarily at the time sampling occurs, rather than using a properly randomised sampling design. Unfortunately, this ‘haphazard’ sampling approach cannot provide formal statistical inference from the sample to the population without making untestable assumptions. Here, we argue that two recent technological developments remove the need for haphazard sampling in many situations. A general approach to simple randomised sampling designs is outlined, and some examples demonstrate that even complicated designs can be implemented easily using software that is widely used among ecologists. We consider that more rigorous, randomised sampling designs would strengthen the validity of the conclusions drawn from ecological studies, to the benefit of the discipline as a whole.
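In the simplest case, a randomised design takes only a few lines of code; the sketch below (hypothetical study-area extent) draws plot coordinates by simple random sampling, giving locations that can then be found in the field.

```python
# Simple random sampling of plot locations in a rectangular study area;
# the extent and plot count are placeholders.
import numpy as np

rng = np.random.default_rng(seed=42)   # seeding makes the design reproducible
n_plots = 30
xmin, xmax, ymin, ymax = 0.0, 500.0, 0.0, 200.0   # metres

x = rng.uniform(xmin, xmax, n_plots)
y = rng.uniform(ymin, ymax, n_plots)

for i, (xi, yi) in enumerate(zip(x, y), 1):
    print(f"plot {i:02d}: x={xi:7.1f} m, y={yi:6.1f} m")
```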

4.
5.
Although genetic association studies using unrelated individuals may be subject to bias caused by population stratification, alternative methods that are robust to population stratification, such as family-based association designs, may be less powerful. Furthermore, it is often more feasible and less expensive to collect unrelated individuals. Recently, several statistical methods have been proposed for case-control association tests in a structured population; these methods may be robust to population stratification. In the present study, we propose a quantitative similarity-based association test (QSAT) to identify association between a candidate marker and a quantitative trait of interest, through use of unrelated individuals. For the QSAT, we first determine whether two individuals are from the same subpopulation or from different subpopulations, using genotype data at a set of independent markers. We then perform an association test between the candidate marker and the quantitative trait, through incorporation of such information. Simulation results based on either coalescent models or empirical population genetics data show that the QSAT has a correct type I error rate in the presence of population stratification and that the power of the QSAT is higher than that of family-based association designs.

6.

Background  

cDNA microarrays are a powerful means to screen for biologically relevant gene expression changes, but their ability to detect small changes accurately is often limited by "noise" from random and systematic errors. While experimental designs and statistical analysis methods have been proposed to reduce these errors, few studies have tested their accuracy and ability to identify small, but biologically important, changes. Here, we have compared two cDNA microarray experimental design methods, with northern blot confirmation, to reveal changes in gene expression that could contribute to the early antiproliferative effects of neuregulin on MCF10AT human breast epithelial cells.

7.
Extreme discordant sibling pairs (EDSPs) are theoretically powerful for the mapping of quantitative-trait loci (QTLs) in humans. EDSPs have not been used much in practice, however, because of the need to screen very large populations to find enough pairs that are extreme and discordant. Given appropriate statistical methods, another alternative is to use moderately discordant sibling pairs (MDSPs): pairs that are discordant but not at the far extremes of the distribution. Such pairs can be powerful yet far easier to collect than extreme discordant pairs. Recent work on statistical methods for QTL mapping in humans has included a number of methods that, though not developed specifically for discordant pairs, may well be powerful for MDSPs and possibly even EDSPs. In the present article, we survey the new statistics and discuss their applicability to discordant pairs. We then use simulation to study the type I error and the power of various statistics for EDSPs and for MDSPs. We conclude that the best statistics for discordant pairs, moderate or extreme, are to be found among the new statistics. We suggest that the new statistics are appropriate for many other designs as well, and that, in fact, they open the way for the exploration of entirely novel designs.

8.
The view is widely held that experimental methods (randomised controlled trials) are the "gold standard" for evaluation and that observational methods (cohort and case control studies) have little or no value. This ignores the limitations of randomised trials, which may prove unnecessary, inappropriate, impossible, or inadequate. Many of the problems of conducting randomised trials could often, in theory, be overcome, but the practical implications for researchers and funding bodies mean that this is often not possible. The false conflict between those who advocate randomised trials in all situations and those who believe observational data provide sufficient evidence needs to be replaced with mutual recognition of the complementary roles of the two approaches. Researchers should be united in their quest for scientific rigour in evaluation, regardless of the method used.

9.
10.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and how this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance-reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost-effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results; centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes are not differentially expressed. Methods based on spatial location and/or intensity also require that the non-differentially expressed genes are distributed at random with respect to location and intensity. Spotting designs should be constructed carefully so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both the efficiency and the validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
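The pooling-versus-technical-replication result can be seen in a simple two-component variance model; the variance components below are assumed for illustration and are not values from the article.

```python
# Variance of an estimated expression mean under (a) pooling k biological
# samples onto one array versus (b) r technical replicate arrays of a
# single sample. Pooling shrinks the biological component; technical
# replication only shrinks the (smaller) technical component.
sigma2_bio, sigma2_tech = 4.0, 1.0   # assumed variance components

def var_pooled(k):            # one array, k biological samples pooled
    return sigma2_bio / k + sigma2_tech

def var_tech_reps(r):         # r arrays, one biological sample
    return sigma2_bio + sigma2_tech / r

for n in (1, 2, 4, 8):
    print(f"n={n}: pooled={var_pooled(n):.2f}  tech reps={var_tech_reps(n):.2f}")
```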

11.
Case-control studies of association in structured or admixed populations
Case-control tests for association are an important tool for mapping complex-trait genes. But population structure can invalidate this approach, leading to apparent associations at markers that are unlinked to disease loci. Family-based tests of association can avoid this problem, but such studies are often more expensive and in some cases--particularly for late-onset diseases--are impractical. In this review article we describe a series of approaches published over the past 2 years which use multilocus genotype data to enable valid case-control tests of association, even in the presence of population structure. These tests can be classified into two categories. "Genomic control" methods use the independent marker loci to adjust the distribution of a standard test statistic, while "structured association" methods infer the details of population structure en route to testing for association. We discuss the statistical issues involved in the different approaches and present results from simulations comparing the relative performance of the methods under a range of models.
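A minimal sketch of the genomic-control idea on simulated statistics: the inflation factor lambda is estimated from the median of the null-marker chi-square statistics and used to deflate the candidate-marker test.

```python
# Genomic control on simulated 1-df test statistics (not data from the
# review): estimate lambda from null markers, deflate the candidate test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
null_chisq = rng.chisquare(df=1, size=1000) * 1.3   # artificially inflated

lam = np.median(null_chisq) / 0.456   # 0.456 ~ median of chi-square(1)
candidate = 9.2                       # hypothetical candidate-marker statistic
adjusted = candidate / max(lam, 1.0)  # lambda is not allowed below 1

print(f"lambda = {lam:.2f}")
print(f"adjusted p = {stats.chi2.sf(adjusted, df=1):.4f}")
```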

12.
Extreme discordant sibling-pair (EDSP) designs have been shown in theory to be very powerful for mapping quantitative-trait loci (QTLs) in humans. However, their practical applicability has been somewhat limited by the need to phenotype very large populations to find enough pairs that are extremely discordant. In this paper, we demonstrate that there is also substantial power in pairs that are only moderately discordant, and that designs using moderately discordant pairs can yield a more practical balance between phenotyping and genotyping efforts. The power we demonstrate for moderately discordant pairs stems from a new statistical result. Statistical analysis in discordant-pair studies is generally done by testing for reduced identity-by-descent (IBD) sharing in the pairs. By contrast, the most commonly used statistical methods for more standard QTL mapping are Haseman-Elston regression and variance-components analysis. Both of these use statistics that are functions of the trait values given IBD information for the pedigree. We show that IBD sharing statistics and "trait value given IBD" statistics contribute complementary rather than redundant information, and thus that statistics of the two types can be combined to form more powerful tests of linkage. We propose a simple composite statistic, and test it with simulation studies. The simulation results show that our composite statistic increases power only minimally for extremely discordant pairs. However, it boosts the power of moderately discordant pairs substantially and makes them a very practical alternative. Our composite statistic is straightforward to calculate with existing software; we give a practical example of its use by applying it to a Genetic Analysis Workshop (GAW) data set.

13.
Selective genotyping (i.e., genotyping only those individuals with extreme phenotypes) can greatly improve the power to detect and map quantitative trait loci in genetic association studies. Because selection depends on the phenotype, the resulting data cannot be properly analyzed by standard statistical methods. We provide appropriate likelihoods for assessing the effects of genotypes and haplotypes on quantitative traits under selective-genotyping designs. We demonstrate that the likelihood-based methods are highly effective in identifying causal variants and are substantially more powerful than existing methods.
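The key correction is conditioning each trait density on having been selected. Below is a sketch of such a likelihood under an assumed normal trait model with hypothetical cut-points and simulated data; it illustrates the idea rather than the authors' implementation.

```python
# Conditional likelihood for selective genotyping: trait values are
# observed only outside the cut-points (c_lo, c_hi), so each normal
# density is divided by the genotype-specific selection probability.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

c_lo, c_hi, sigma = -1.0, 1.0, 1.0          # assumed design constants
y = np.array([-1.8, -1.4, 1.3, 1.9, 2.2])   # simulated extreme phenotypes
g = np.array([0, 0, 1, 2, 1])               # genotype coded as allele count

def neg_loglik(par):
    mu0, beta = par                          # additive genotype effect
    mu = mu0 + beta * g
    p_select = norm.cdf((c_lo - mu) / sigma) + norm.sf((c_hi - mu) / sigma)
    return -np.sum(norm.logpdf(y, mu, sigma) - np.log(p_select))

fit = minimize(neg_loglik, x0=[0.0, 0.0])
print("mu0, beta =", fit.x)
```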

14.
Background: Since biological systems are complex and often involve multiple types of genomic relationships, tensor analysis methods can be utilized to elucidate these hidden complex relationships. There is a pressing need for this, as the interpretation of the results of high-throughput experiments has advanced at a much slower pace than the accumulation of data. Results: In this review we provide an overview of some tensor analysis methods for biological systems. Conclusions: Tensors are natural and powerful generalizations of vectors and matrices to higher dimensions, and they play a fundamental role in physics, mathematics and many other areas. Tensor analysis methods can be used to provide the foundations of systematic approaches to distinguishing significant higher-order correlations among the elements of a complex system, by finding ensembles of a small number of reduced systems that provide a concise and representative summary of these correlations.
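As a toy illustration of the tensor view (random placeholder data, not a method from the review), a three-way expression array can be unfolded along each mode, and an SVD of each unfolding gives the factor matrices that underlie a higher-order SVD.

```python
# Mode-n unfolding of a 3-way (gene x condition x time) array, with an
# SVD of each unfolding -- the building block of a higher-order SVD.
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((50, 4, 6))   # genes x conditions x time points

for mode in range(T.ndim):
    unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    print(f"mode {mode}: unfolding {unfolded.shape}, "
          f"top singular values {np.round(s[:3], 2)}")
```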

15.
Using statistical methods, the designs of multifraction experiments which are likely to give the most precise estimate of the alpha-beta ratio in the linear-quadratic model are investigated. The aim of the investigation is to try to understand what features of an experimental design make it efficient for estimating alpha/beta rather than to recommend a specific design. A plot of the design on an nd² versus nd graph is suggested, and this graph is called the design plot. The best designs are those which have a large spread in the isoeffect direction in the design plot, which means that a wide range of doses per fraction should be used. For binary response assays, designs with expected response probabilities near to 0.5 are most efficient. Furthermore, dose points with expected response probabilities outside the range 0.1 to 0.9 contribute negligibly to the efficiency with which alpha/beta can be estimated. For "top-up" experiments, the best designs are those which replace as small a portion as possible of the full experiment with the top-up scheme. In addition, from a statistical viewpoint, it makes no difference whether a single large top-up dose or several smaller top-up doses are used; however, other considerations suggest that two or more top-up doses may be preferable. The practical realities of designing experiments as well as the somewhat idealized statistical considerations are discussed.
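A tiny sketch of the suggested design plot for hypothetical fractionation schemes (n fractions of dose d): each scheme maps to the point (nd, nd²), so a wide range of doses per fraction spreads the points in the isoeffect direction.

```python
# Design-plot coordinates for invented fractionation schemes: each
# scheme of n fractions of dose d is the point (nd, nd^2).
schemes = [(1, 16.0), (2, 9.5), (4, 5.5), (8, 3.2), (16, 1.8)]  # (n, d in Gy)

for n, d in schemes:
    print(f"n={n:2d}, d={d:4.1f} Gy  ->  (nd, nd^2) = ({n*d:5.1f}, {n*d*d:6.1f})")
```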

16.
Simple and concise representations of protein-folding patterns provide powerful abstractions for visualizations, comparisons, classifications, searching and aligning structural data. Structures are often abstracted by replacing standard secondary structural features, that is, helices and strands of sheet, with vectors or linear segments. Relying solely on standard secondary structure may result in a significant loss of structural information. Further, traditional methods of simplification depend crucially on the consistency and accuracy of external methods to assign secondary structures to protein coordinate data. Although many methods exist to identify secondary structure automatically, the imprecision of definitions, along with errors and inconsistencies in experimental structure data, drastically limits their applicability for generating reliable simplified representations, especially for structural comparison. This article introduces a mathematically rigorous algorithm to delineate protein structure using the elegant statistical and inductive inference framework of minimum message length (MML). Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. The delineation is completely independent of the hydrogen-bonding patterns and local substructural geometry on which current methods rely. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to the existing programs, which are often beset by them. The analysis of results over a large number of proteins suggests that the method produces consistent delineations of structures that encompass, among others, the segments corresponding to standard secondary structure. AVAILABILITY: http://www.csse.monash.edu.au/~karun/pmml

17.
One of the most important steps in biomedical longitudinal studies is choosing a good experimental design that can provide high accuracy in the analysis of results with a minimum sample size. Several methods for constructing efficient longitudinal designs have been developed based on power analysis and the statistical model used for analyzing the final results. However, this technology has not been made available to practitioners through user-friendly software. In this paper we introduce LADES (Longitudinal Analysis and Design of Experiments Software) as an alternative and easy-to-use tool for conducting longitudinal analysis and constructing efficient longitudinal designs. LADES incorporates methods for creating cost-efficient longitudinal designs, unequal longitudinal designs, and simple longitudinal designs. In addition, LADES includes different methods for analyzing longitudinal data, such as linear mixed models and generalized estimating equations, among others. A study of European eels is reanalyzed to show LADES' capabilities. Three treatments, each applied to an aquarium of five eels, were analyzed. Data were collected for every eel from week 0 through week 12 post-treatment (a complete design); the response under evaluation was sperm volume. A linear mixed model was fitted to the results using LADES. The complete design had a power of 88.7% using 15 eels. With LADES we propose the use of an unequal design with only 14 eels and 89.5% efficiency. LADES was developed as a powerful and simple tool to promote the use of statistical methods for analyzing and creating longitudinal experiments in biomedical research.
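The sketch below fits a model of the kind described, a random-intercept linear mixed model, to simulated data laid out like the eel study; the variable names and effect sizes are assumptions, and this uses statsmodels rather than LADES itself.

```python
# Random-intercept linear mixed model on simulated longitudinal data:
# 3 treatments x 5 eels, measured weekly from week 0 to week 12.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
rows = []
for i in range(15):
    trt = ["A", "B", "C"][i // 5]
    for week in range(13):
        rows.append({"eel": f"eel{i}", "treatment": trt, "week": week,
                     "volume": 1.0 + 0.1 * week + rng.normal(scale=0.3)})
df = pd.DataFrame(rows)

model = smf.mixedlm("volume ~ treatment * week", df, groups=df["eel"])
print(model.fit().summary())
```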

18.
We present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. Such ordering projects require a large investment of effort in many repetitious experiments. Our primary purpose here is to provide methods of maximizing the efficiency of such efforts. To this end, we adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. The main advantages of this approach are that (1) it allows the relatively straightforward incorporation of the observed statistical properties of the data; (2) it permits the efficiency of a particular experimental method for detecting overlap to be quantitatively defined, so that alternative experimental designs may be compared and optimized; and (3) it yields a direct estimate of the probability that any two library members overlap. This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called "contigs." These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome.
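A deliberately simplified toy version of the likelihood-ratio idea (the match probabilities below are invented, and the paper's model is considerably richer): score m matching fragment lengths out of n under an overlap model against a chance-match model.

```python
# Toy likelihood ratio for clone overlap: binomial probability of m of n
# fragment-length matches under overlap vs chance. Parameters are assumed.
from math import comb, log

def log_lr(n, m, p_chance=0.05, p_overlap=0.6):
    """log10 likelihood ratio for m of n fragments matching."""
    def log10_binom(p):
        return log(comb(n, m) * p**m * (1 - p)**(n - m), 10)
    return log10_binom(p_overlap) - log10_binom(p_chance)

for m in (0, 2, 5, 8):
    print(f"{m:2d}/10 matches: log10 LR = {log_lr(10, m):+.2f}")
```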

19.
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. Theory of statistical models is well-established if the set of independent variables to consider is fixed and small. Hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and often we are confronted with the number of candidate variables in the range 10-30. This number is often too large to be considered in a statistical model. We provide an overview of various available variable selection methods that are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise stability of a final model, unbiasedness of regression coefficients, and validity of p-values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on application of variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities based on resampling the entire variable selection process to be routinely reported by software packages offering automated variable selection algorithms.
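As one concrete instance of a selection method from this overview, the sketch below runs backward elimination on AIC with ordinary least squares; the data and variable names are invented, and the article itself covers many more criteria.

```python
# Backward elimination by AIC: repeatedly drop the variable whose removal
# lowers AIC the most, stopping when no removal helps. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
X = pd.DataFrame(rng.standard_normal((n, 6)),
                 columns=[f"x{i}" for i in range(1, 7)])
y = 2.0 * X["x1"] - 1.5 * X["x3"] + rng.standard_normal(n)

selected = list(X.columns)
while True:
    base_aic = sm.OLS(y, sm.add_constant(X[selected])).fit().aic
    drops = {v: sm.OLS(y, sm.add_constant(X[[c for c in selected if c != v]]))
                  .fit().aic for v in selected}
    best = min(drops, key=drops.get)
    if drops[best] >= base_aic:        # no removal improves AIC: stop
        break
    selected.remove(best)
print("selected:", selected)
```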

20.