Similar Articles (20 results)
1.
Association studies are traditionally performed in the case-control framework. As a first step in the analysis process, comparing allele frequencies using Pearson's chi-square statistic is often invoked. However, such an approach assumes the independence of alleles under the hypothesis of no association, which may not always be the case. Consequently this method introduces a bias that makes the type I error rate deviate from its expected level. In this article we first propose an unbiased and exact test as an alternative to the biased allelic test. Available data require thousands of such tests to be performed, so we focused on fast execution. Since the biased allelic test is still widely used in the community, we illustrate its pitfalls in the context of genome-wide association studies and particularly in the case of low-level tests. Finally, we compare the unbiased and exact test with the Cochran-Armitage test for trend and show it performs similarly in terms of power. The fast, unbiased and exact allelic test code is available in R, C++ and Perl at: http://stat.genopole.cnrs.fr/software/fueatest.
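A minimal sketch of the two tests being contrasted, in Python rather than the authors' published R/C++/Perl implementation (which is not reproduced here): the allelic chi-square that treats the 2N alleles as independent, next to the Cochran-Armitage trend test computed directly on genotype counts. The genotype counts are illustrative assumptions.

```python
# Sketch: allelic chi-square (assumes allele independence) versus the
# Cochran-Armitage trend test on genotype counts. Not the paper's exact test.
import numpy as np
from scipy.stats import chi2

# Hypothetical genotype counts (AA, Aa, aa) for cases and controls.
cases = np.array([30, 50, 20])
controls = np.array([45, 40, 15])

def allelic_chisq(cases, controls):
    """Pearson chi-square on the 2x2 allele-count table."""
    case_alleles = np.array([2 * cases[0] + cases[1], 2 * cases[2] + cases[1]])
    ctrl_alleles = np.array([2 * controls[0] + controls[1], 2 * controls[2] + controls[1]])
    table = np.vstack([case_alleles, ctrl_alleles]).astype(float)
    exp = np.outer(table.sum(1), table.sum(0)) / table.sum()
    stat = ((table - exp) ** 2 / exp).sum()
    return stat, chi2.sf(stat, df=1)

def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage test for trend with additive genotype scores."""
    x = np.asarray(scores, float)
    r = cases.astype(float)
    n = r + controls.astype(float)
    N, R = n.sum(), r.sum()
    num = (x * (r - R * n / N)).sum()
    var = R * (N - R) / N * ((x ** 2 * n).sum() - (x * n).sum() ** 2 / N) / N
    stat = num ** 2 / var
    return stat, chi2.sf(stat, df=1)

print("allelic chi-square:", allelic_chisq(cases, controls))
print("Cochran-Armitage:  ", cochran_armitage_trend(cases, controls))
```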

2.
In animal vaccination experiments with a binary outcome (diseased/non-diseased), the comparison of the vaccinated and control groups is often based on the Fisher exact test. A tool for the evaluation of different designs is proposed, based on the expected power of the Fisher exact test. The expected power can sometimes unexpectedly increase with decreasing sample size and/or increasing imbalance. The reasons for these peculiar results are explained and compared to the results of two other types of tests: the unconditional test and the randomisation test. In a vaccination experiment with a restricted number of animals, it is shown to be important to consider expected power in order to choose the most appropriate design.
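A sketch of the quantity being compared across designs: the power of the one-sided Fisher exact test averaged over every possible outcome of the two binomial groups (its "expected" or unconditional power). The infection probabilities and group sizes are illustrative assumptions, not values from the paper.

```python
# Sketch: expected (unconditional) power of the one-sided Fisher exact test,
# obtained by enumerating all possible outcomes of the two groups.
from itertools import product
from scipy.stats import binom, fisher_exact

def fisher_expected_power(n_vac, n_ctl, p_vac, p_ctl, alpha=0.05):
    power = 0.0
    for x_vac, x_ctl in product(range(n_vac + 1), range(n_ctl + 1)):
        table = [[x_vac, n_vac - x_vac], [x_ctl, n_ctl - x_ctl]]
        # one-sided: vaccinated animals expected to be diseased less often
        _, p = fisher_exact(table, alternative="less")
        if p <= alpha:
            power += binom.pmf(x_vac, n_vac, p_vac) * binom.pmf(x_ctl, n_ctl, p_ctl)
    return power

# Compare balanced and unbalanced designs with the same total of 20 animals.
for n_vac, n_ctl in [(10, 10), (8, 12), (12, 8)]:
    pw = fisher_expected_power(n_vac, n_ctl, p_vac=0.1, p_ctl=0.8)
    print(f"vaccinated={n_vac:2d} controls={n_ctl:2d}  expected power={pw:.3f}")
```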

3.
Inferring gene regulatory networks from expression data is difficult, but it is common and often useful. Most network problems are under-determined (there are more parameters than data points), and therefore data or parameter set reduction is often necessary. Correlation between variables in the model also confounds the inference of network coefficients. In this paper, we present an algorithm that uses integrated, probabilistic clustering to ease the problems of under-determination and correlated variables within a fully Bayesian framework. Specifically, ours is a dynamic Bayesian network with integrated Gaussian mixture clustering, which we fit using variational Bayesian methods. We show, using public, simulated time-course data sets from the DREAM4 Challenge, that our algorithm outperforms non-clustering methods in many cases (7 out of 25) with fewer samples, rarely underperforms (1 out of 25), and often selects a non-clustering model if it better describes the data. Source code (GNU Octave) for BAyesian Clustering Over Networks (BACON) and sample data are available at: http://code.google.com/p/bacon-for-genetic-networks.

4.
The origin of the genetic code marked a major transition from a plausible RNA world to the world of DNA and proteins and is an important milestone in our understanding of the origin of life. We examine the efficacy of the physico-chemical hypothesis of code origin by carrying out simulations of code-sequence coevolution in finite populations in stages, leading first to the emergence of ten amino acid code(s) and subsequently to 14 amino acid code(s). We explore two different scenarios of primordial code evolution. In one scenario, competition occurs between populations of equilibrated code-sequence sets, while in the other, new codes compete with existing codes as they are gradually introduced into the population with a finite probability. In either case, we find that natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. The code whose structure is most consistent with the standard genetic code is often not among the codes that have a high fixation probability. However, we find that the composition of the code population affects the code fixation probability. A physico-chemically optimized code gets fixed with a significantly higher probability if it competes against a set of randomly generated codes. Our results suggest that physico-chemical optimization may not be the sole driving force in ensuring the emergence of the standard genetic code.

5.
OBJECTIVES: Genetic association studies are usually based upon restricted sets of 'tag' markers selected to represent the total sequence variation. Tag selection is often determined by some threshold for the r² coefficient of linkage disequilibrium (LD) between tag and untyped markers, it being widely assumed that power to detect an effect at the untyped sites is retained by typing the tag marker in a sample scaled by the inverse of the selected threshold (1/r²). However, unless only a single causal variant occurs at a locus, it has been shown [Eur J Hum Genet 2006;14:426-437] that significant power loss can occur if this principle is applied. We sought to investigate whether unexpected loss of power might be an exceptional case or a more general concern. In the absence of detailed knowledge about the genetic architecture at complex disease loci, we developed a mathematical approach to test all possible situations. METHODS: We derived mathematical formulae allowing the calculation of all possible odds ratios (OR) at a tag marker locus given the effect size that would be observed by typing a second locus and the r² between the two loci. For a range of allele frequencies, r² between loci, and strengths of association at the causal locus (OR from 0.5 to 2) that we consider realistic for complex disease loci, we next determined the sample sizes that would be necessary to give equivalent power to detect association by genotyping tag and causal loci, and compared these with the sample sizes predicted by applying 1/r². RESULTS: Under most of the hypothetical scenarios we examined, the calculated sample sizes required to maintain power by typing markers that tag the causal locus at even moderately high r² (0.8) were greater than those calculated by applying 1/r². Even in populations with apparently similar measurements of allele frequency, LD structure, and effect size at the susceptibility allele, the required sample size to detect association with a tag marker can vary substantially. We also show that in apparently similar populations, associations with either allele at the tag site are possible. CONCLUSIONS: Indirect tests of association are less powerful than predicted by applying 1/r² in the majority of hypothetical scenarios we examined. Our findings hold even for what we consider likely to be larger than average effect sizes in complex diseases (OR = 1.5-2) and even for moderately high r² values between the markers. Until a substantial number of disease genes have been identified through methods that are not based on tagging (and therefore not biased towards those situations most favourable to tagging), it is impossible to know how the true scenarios are distributed across the range of possible scenarios. Nevertheless, while association designs based upon tag marker selection are by necessity the tool of choice for de novo gene discovery, our data suggest that the power to initially detect association may often be less than assumed. Moreover, our data suggest that to avoid genuine findings being subsequently discarded because of unpredictable losses of power, follow-up studies in other samples should be based upon more detailed analyses of the gene rather than simply on the tag SNPs showing association in the discovery study.
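For orientation, here is a sketch of the conventional 1/r² rule whose adequacy the paper questions: compute a standard two-proportion (allelic) sample size for the causal variant, then naively divide by r² for the tag marker. The allele frequency, odds ratio, power and significance level are illustrative assumptions.

```python
# Sketch of the naive 1/r^2 scaling applied to a standard two-proportion
# sample size formula (the rule the paper argues can understate requirements).
from scipy.stats import norm

def n_per_group_allelic(p_ctrl, odds_ratio, alpha=5e-8, power=0.8):
    """Alleles per group needed to detect the allele-frequency difference
    implied by `odds_ratio` at control allele frequency `p_ctrl`."""
    odds_case = odds_ratio * p_ctrl / (1 - p_ctrl)
    p_case = odds_case / (1 + odds_case)
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    num = (z_a + z_b) ** 2 * (p_case * (1 - p_case) + p_ctrl * (1 - p_ctrl))
    return num / (p_case - p_ctrl) ** 2

n_causal = n_per_group_allelic(p_ctrl=0.3, odds_ratio=1.5)
for r2 in (1.0, 0.8, 0.5):
    print(f"r^2={r2:.1f}: naive tag sample size = {n_causal / r2:,.0f} alleles per group")
```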

6.
SUMMARY: Maynard Smith's maximum chi² method is a useful tool for detecting recombination. Significance testing is usually carried out by random permutation, which may require a large amount of computer time. I describe an exact algorithm. AVAILABILITY: Matlab source code from http://www.bioc.cam.ac.uk/UTOs/howe/
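The exact algorithm itself is not reproduced here; the sketch below shows the slow procedure it replaces, a Monte Carlo permutation test on a simplified two-sequence maximum chi² scan. The toy sequences and the number of permutations are illustrative assumptions.

```python
# Sketch: permutation version of a simplified maximum chi-square scan for
# recombination between two aligned sequences (the procedure the paper's
# exact algorithm is designed to replace).
import numpy as np

rng = np.random.default_rng(0)

def max_chisq(match):
    """Maximum 2x2 chi-square over all breakpoints of a 0/1 site-identity vector."""
    match = np.asarray(match, dtype=float)
    L = len(match)
    best = 0.0
    for k in range(1, L):                      # breakpoint after site k
        left, right = match[:k], match[k:]
        table = np.array([[left.sum(),  k - left.sum()],
                          [right.sum(), (L - k) - right.sum()]])
        exp = np.outer(table.sum(1), table.sum(0)) / table.sum()
        if (exp == 0).any():                   # degenerate margin, skip
            continue
        best = max(best, ((table - exp) ** 2 / exp).sum())
    return best

def permutation_pvalue(seq_a, seq_b, n_perm=2000):
    match = np.array([a == b for a, b in zip(seq_a, seq_b)], dtype=int)
    observed = max_chisq(match)
    hits = sum(max_chisq(rng.permutation(match)) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)

seq_a = "ACGTACGTACGTACGTAAAA"
seq_b = "ACGTACGTACGTACGTGGGG"   # identical left half, divergent right tail
print(permutation_pvalue(seq_a, seq_b))
```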

7.
SUMMARY: Developing a quantitative understanding of intracellular networks requires simulations and computational analyses. However, traditional differential equation modeling tools are often inadequate due to the stochasticity of intracellular reaction networks, which can potentially influence the phenotypic characteristics. Unfortunately, stochastic simulations are computationally too intensive for most biological systems. Herein, we have utilized the recently developed binomial tau-leap method to carry out stochastic simulations of the epidermal growth factor receptor-induced mitogen-activated protein kinase cascade. Results indicate that the binomial tau-leap method is computationally 100-1000 times more efficient than the exact stochastic simulation algorithm of Gillespie. Furthermore, the binomial tau-leap method avoids negative populations and accurately captures the species populations along with their fluctuations, despite the large differences in their size. AVAILABILITY: http://www.dion.che.udel.edu/multiscale/Introduction.html. Fortran 90 code available for academic use by email. SUPPLEMENTARY INFORMATION: Details about the binomial tau-leap algorithm, software and a manual are available at the above website.
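A minimal sketch of the binomial tau-leap idea on a toy two-step cascade, not the EGFR/MAPK network of the paper: the number of firings of each reaction in a leap is drawn from a binomial bounded by the reactant count, which is what prevents negative populations. Rate constants, initial counts and the leap size are illustrative assumptions.

```python
# Sketch: binomial tau-leap for a toy cascade A -> B -> 0 with first-order rates.
import numpy as np

rng = np.random.default_rng(1)

def binomial_tau_leap(a0=1000, b0=0, k1=0.1, k2=0.05, tau=0.1, t_end=60.0):
    t, A, B = 0.0, a0, b0
    traj = [(t, A, B)]
    while t < t_end:
        # Per-molecule firing probabilities in this leap, capped at 1.
        p1 = min(1.0, k1 * tau)            # A -> B
        p2 = min(1.0, k2 * tau)            # B -> 0
        n1 = rng.binomial(A, p1) if A > 0 else 0
        n2 = rng.binomial(B, p2) if B > 0 else 0
        A -= n1                            # draws are bounded by the counts,
        B += n1 - n2                       # so populations cannot go negative
        t += tau
        traj.append((t, A, B))
    return np.array(traj)

traj = binomial_tau_leap()
print("final state (t, A, B):", traj[-1])
```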

8.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the significance of the data. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and to check the approximate results against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which a pooled design becomes preferable to a non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
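The paper's exact formulas are not reproduced here; the sketch below shows the standard normal-approximation power calculation they refine, in which pooling shrinks the biological variance component per array but not the technical one. The variance components, effect size and design parameters are illustrative assumptions.

```python
# Sketch: approximate power of a two-class comparison with pooled samples.
import numpy as np
from scipy.stats import norm

def approx_power(delta, sigma_bio, sigma_tech, subjects_per_pool,
                 arrays_per_group, alpha=0.001):
    """Approximate power to detect a per-gene log-ratio difference `delta`."""
    # Variance of one array measurement: biological variance is averaged over
    # the pool; technical variance is added per array.
    var_array = sigma_bio ** 2 / subjects_per_pool + sigma_tech ** 2
    se = np.sqrt(2.0 * var_array / arrays_per_group)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - abs(delta) / se) + norm.cdf(-z_crit - abs(delta) / se)

for pool_size, n_arrays in [(1, 10), (3, 10), (3, 4)]:
    pw = approx_power(delta=1.0, sigma_bio=0.8, sigma_tech=0.3,
                      subjects_per_pool=pool_size, arrays_per_group=n_arrays)
    print(f"pool size {pool_size}, arrays/group {n_arrays}: power = {pw:.3f}")
```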

9.
Using techniques from optimization theory, we have developed a computer program that approximates a desired probability distribution for amino acids by imposing a probability distribution on the four nucleotides in each of the three codon positions. These base probabilities allow for the generation of biased codons for use in mutational studies and in the design of biologically encoded libraries. The dependencies between codons in the genetic code often make the exact generation of the desired probability distribution for amino acids impossible. Compromises are often necessary. The program, therefore, not only solves for the "optimal" approximation to the desired distribution (where the definition of "optimal" is influenced by several types of parameters entered by the user), but also solves for a number of "sub-optimal" solutions that are classified into families of similar solutions. A representative of each family is presented to the program user, who can then choose the type of approximation that is best for the intended application. The Combinatorial Codons program is available for use over the web from http://www.wi.mit.edu/kim/computing.html.
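A sketch of the underlying optimization, not the Combinatorial Codons program itself: choose per-position nucleotide probabilities so that the induced amino-acid distribution approximates a target. The uniform target, the squared-error objective and the stop-codon penalty are illustrative assumptions.

```python
# Sketch: fit three independent base distributions to approximate a target
# amino-acid distribution under the standard genetic code.
import numpy as np
from scipy.optimize import minimize

BASES = "TCAG"
# Standard genetic code with codon positions ordered T, C, A, G.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
AMINO_ACIDS = sorted(set(CODE) - {"*"})

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def aa_distribution(params):
    """Amino-acid distribution induced by three independent base distributions."""
    p = [softmax(params[4 * i:4 * i + 4]) for i in range(3)]
    dist, stop = dict.fromkeys(AMINO_ACIDS, 0.0), 0.0
    for i1 in range(4):
        for i2 in range(4):
            for i3 in range(4):
                prob = p[0][i1] * p[1][i2] * p[2][i3]
                aa = CODE[16 * i1 + 4 * i2 + i3]
                if aa == "*":
                    stop += prob
                else:
                    dist[aa] += prob
    return {aa: v / (1.0 - stop) for aa, v in dist.items()}, stop

def objective(params, target):
    dist, stop = aa_distribution(params)
    err = sum((dist[aa] - target[aa]) ** 2 for aa in AMINO_ACIDS)
    return err + 0.1 * stop                      # small penalty for stop codons

target = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}   # uniform target
best = min((minimize(objective, np.random.default_rng(s).normal(size=12),
                     args=(target,), method="Nelder-Mead",
                     options={"maxiter": 5000}) for s in range(5)),
           key=lambda r: r.fun)                  # keep best of several restarts
dist, stop = aa_distribution(best.x)
print("objective:", round(best.fun, 5), " stop-codon mass:", round(stop, 3))
```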

10.
The origin of the genetic code is a central open problem regarding the early evolution of life. Here, we consider two undeveloped but important aspects of possible scenarios for the evolutionary pathway of the translation machinery: the role of unassigned codons in early stages of the code and the incorporation of tRNA anticodon modifications. As the first codons started to encode amino acids, the translation machinery likely was faced with a large number of unassigned codons. Current molecular scenarios for the evolution of the code usually assume the very rapid assignment of all codons before all 20 amino acids became encoded. We show that the phenomenon of nonsense suppression as observed in current organisms allows for a scenario in which many unassigned codons persisted throughout most of the evolutionary development of the code. In addition, we demonstrate that incorporation of anticodon modifications at a late stage is feasible. The wobble rules allow a set of 20 tRNAs fully lacking anticodon modifications to encode all 20 canonical amino acids. These observations have implications for the biochemical plausibility of early stages in the evolution of the genetic code predating tRNA anticodon modifications and allow for effective translation by a relatively small and simple early tRNA set.

11.
The conditional exact tests of homogeneity of two binomial proportions are often used in small samples, because the exact tests guarantee that the size stays under the nominal level. Fisher's exact test, the exact chi-squared test and the exact likelihood ratio test are popular and can be implemented in the software StatXact. In this paper we investigate which test is best in small samples in terms of unconditional exact power. In the equal-sample case it is proved that the three tests produce the same unconditional exact power. A symmetry of the unconditional exact power is also found. In unequal-sample cases the unconditional exact powers of the three tests are computed and compared. In most cases Fisher's exact test turns out to be best, but we characterize some cases in which the exact likelihood ratio test has the highest unconditional exact power.

12.
13.
MOTIVATION: An important application of microarray experiments is to identify differentially expressed genes. Because microarray data are often not normally distributed, nonparametric methods have been suggested for their statistical analysis. Here, the Baumgartner-Weiss-Schindler test, a novel and powerful test based on ranks, is investigated and compared with the parametric t-test as well as with two other nonparametric tests (the Wilcoxon rank sum test and the Fisher-Pitman permutation test) recently recommended for the analysis of gene expression data. RESULTS: Simulation studies show that an exact permutation test based on the Baumgartner-Weiss-Schindler statistic B is preferable to the other three tests. It is less conservative than the Wilcoxon test and more powerful, in particular in the case of asymmetric or heavy-tailed distributions. When the underlying distribution is symmetric, the differences in power between the tests are relatively small. Thus, the Baumgartner-Weiss-Schindler test is recommended for the usual situation in which the underlying distribution is a priori unknown. AVAILABILITY: SAS code available on request from the authors.
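The authors' SAS code is not reproduced here; the sketch below computes the B statistic following the Baumgartner et al. (1998) formulation and uses a Monte Carlo permutation rather than full enumeration, so it is an approximation to the exact permutation test. The two expression vectors are illustrative assumptions.

```python
# Sketch: Baumgartner-Weiss-Schindler statistic B with a Monte Carlo
# permutation p-value for a two-group comparison of one gene.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(2)

def bws_statistic(x, y):
    n, m = len(x), len(y)
    N = n + m
    ranks = rankdata(np.concatenate([x, y]))
    rx, ry = np.sort(ranks[:n]), np.sort(ranks[n:])
    i, j = np.arange(1, n + 1), np.arange(1, m + 1)
    bx = np.mean((rx - N / n * i) ** 2 /
                 ((i / (n + 1)) * (1 - i / (n + 1)) * m * N / n))
    by = np.mean((ry - N / m * j) ** 2 /
                 ((j / (m + 1)) * (1 - j / (m + 1)) * n * N / m))
    return 0.5 * (bx + by)

def bws_permutation_test(x, y, n_perm=10000):
    pooled = np.concatenate([x, y])
    observed = bws_statistic(x, y)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if bws_statistic(perm[:len(x)], perm[len(x):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical log-expression values for one gene in two groups.
group1 = np.array([7.1, 7.4, 6.9, 8.0, 7.6, 7.3])
group2 = np.array([8.2, 8.9, 8.4, 9.1, 8.0, 8.7])
print(bws_permutation_test(group1, group2))
```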

14.
For sample size calculation in clinical trials with survival endpoints, the logrank test, which is the optimal method under the proportional hazards (PH) assumption, is predominantly used. In reality, the PH assumption may not hold. For example, in immuno-oncology trials, delayed treatment effects are often expected. A sample size calculated without considering the potential violation of the PH assumption may lead to an underpowered study. In recent years, combination tests such as the maximum weighted logrank test have received great attention because of their robust performance in various hazard scenarios. In this paper, we propose a flexible simulation-free procedure to calculate the sample size using combination tests. The procedure extends Lakatos' Markov model and allows for complex situations encountered in a clinical trial, such as staggered entry and dropout. We evaluate the procedure using two maximum weighted logrank tests, one projection-type test, and three other commonly used tests under various hazard scenarios. The simulation studies show that the proposed method can achieve the target power for all compared tests in most scenarios. The combination tests exhibit robust performance under both correct specification and misspecification scenarios and are highly recommended when the hazard-changing patterns are unknown beforehand. Finally, we demonstrate our method using two clinical trial examples and provide suggestions about sample size calculations under nonproportional hazards.
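Neither the combination test nor the extended Lakatos model is reproduced here; the sketch below shows the kind of brute-force simulation check that a simulation-free procedure is designed to replace, using a single Fleming-Harrington(0,1) weighted logrank test under a delayed treatment effect. Hazards, delay, follow-up, sample size and the number of simulations are illustrative assumptions.

```python
# Sketch: simulated power of a Fleming-Harrington(0,1) weighted logrank test
# under a delayed treatment effect (no staggered entry or dropout modeled).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def sim_arm(n, lam_before, lam_after, delay, followup):
    """Event times with hazard lam_before until `delay`, lam_after afterwards."""
    e = rng.exponential(size=n)                    # unit-rate exponentials
    t = np.where(e < lam_before * delay, e / lam_before,
                 delay + (e - lam_before * delay) / lam_after)
    event = t <= followup                          # administrative censoring only
    return np.minimum(t, followup), event

def weighted_logrank_z(time, event, group, rho=0.0, gamma=1.0):
    order = np.argsort(time)
    time, event, group = time[order], event[order], group[order]
    n = len(time)
    at_risk = n - np.arange(n)                     # subjects still at risk
    s = 1.0                                        # pooled KM, left-continuous
    num = var = 0.0
    for i in range(n):
        if not event[i]:
            continue
        n_tot, n_trt = at_risk[i], group[i:].sum()
        w = s ** rho * (1 - s) ** gamma            # FH weight at this event
        num += w * (group[i] - n_trt / n_tot)
        var += w ** 2 * (n_trt / n_tot) * (1 - n_trt / n_tot)
        s *= 1 - 1 / n_tot                         # update KM after the event
    return num / np.sqrt(var)

def power(n_per_arm=150, n_sim=500, alpha=0.05):
    crit = norm.ppf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        t0, e0 = sim_arm(n_per_arm, 0.10, 0.10, delay=6.0, followup=24.0)  # control
        t1, e1 = sim_arm(n_per_arm, 0.10, 0.05, delay=6.0, followup=24.0)  # delayed benefit
        time = np.concatenate([t0, t1])
        event = np.concatenate([e0, e1])
        group = np.concatenate([np.zeros(n_per_arm), np.ones(n_per_arm)])
        if abs(weighted_logrank_z(time, event, group)) > crit:
            rejections += 1
    return rejections / n_sim

print("simulated power:", power())
```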

15.
STAGES IN THE ORIGIN OF VERTEBRATES: ANALYSIS BY MEANS OF SCENARIOS
Vertebrates lack an epidermal nerve plexus. This feature is common to many invertebrates, from which vertebrates differ by an extensive set of shared-derived characters (synapomorphies) derived from the neural crest and epidermal neurogenic placodes. Hence the hypothesis that the developmental precursor of the epidermal nerve plexus may be homologous to the neural crest and epidermal neurogenic placodes. This account attempts to generate a nested set of scenarios for the prevertebrate-vertebrate transition, associating a presumed sequence of behavioural and environmental changes with the observed phenotypic ones. Toward this end, it integrates morphological, developmental, functional (physiological/behavioural) and some ecological data, as many phenotypic shifts apparently involved associated transitions in several aspects of the animals. The scenarios deal with the origin of embryonic and adult tissues and such major organs as the notochord, the CNS, gills and kidneys, and propose a sequence of associated changes. Alternative scenarios are stated where the evidence remains insufficient for a decision. The analysis points to gaps in our comprehension of the biology of the animals and therefore suggests further research.

16.
Multidimensional omic datasets often have correlated features, leading to the possibility of discovering multiple biological signatures with similar predictive performance for a phenotype. However, their exploration is limited by low sample sizes and the exponential nature of the combinatorial search, which leads to high computational cost. To address these issues, we have developed an algorithm, muSignAl (multiple signature algorithm), which selects multiple signatures with similar predictive performance while systematically bypassing the requirement of exploring all combinations of features. We demonstrate the workflow of this algorithm with an example proteomics dataset. muSignAl is applicable in various bioinformatics-driven explorations, such as understanding the relationship between multiple biological feature sets and phenotypes, and the discovery and development of biomarker panels, while providing the opportunity to optimise their development cost with the help of equally good multiple signatures. Source code of muSignAl is freely available at https://github.com/ShuklaLab/muSignAl.
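As a generic illustration of the idea only (this is not the published muSignAl algorithm), the sketch below finds several feature subsets with similar cross-validated performance by repeatedly fitting a sparse model and setting aside the features it used, instead of searching all combinations. The synthetic data, model and tolerance are illustrative assumptions.

```python
# Sketch: collect several disjoint "signatures" whose cross-validated accuracy
# stays within a tolerance of the best one (illustrative, not muSignAl).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=80, n_features=40, n_informative=10,
                           n_redundant=10, random_state=0)
available = list(range(X.shape[1]))
signatures, tolerance, best_score = [], 0.05, None

while len(available) >= 5:
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    score = cross_val_score(model, X[:, available], y, cv=5).mean()
    model.fit(X[:, available], y)
    used = [available[i] for i in np.flatnonzero(model.coef_[0])]
    if not used:
        break
    if best_score is None:
        best_score = score
    if score < best_score - tolerance:       # remaining features no longer competitive
        break
    signatures.append((sorted(used), round(score, 3)))
    available = [f for f in available if f not in used]

for features, score in signatures:
    print(f"signature (CV accuracy {score}): features {features}")
```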

17.
Chan IS, Zhang Z. Biometrics 1999;55(4):1202-1209
Confidence intervals are often provided to estimate a treatment difference. When the sample size is small, as is typical in early phases of clinical trials, confidence intervals based on large-sample approximations may not be reliable. In this report, we propose test-based methods of constructing exact confidence intervals for the difference between two binomial proportions. These exact confidence intervals are obtained from the unconditional distribution of two binomial responses, and they guarantee the level of coverage. We compare the performance of these confidence intervals to ones based on the observed difference alone. We show that a large improvement can be achieved by using the standardized Z test with a constrained maximum likelihood estimate of the variance.
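A sketch of the building block of such test-based exact intervals: the unconditional exact p-value of a Z statistic for H0: p1 = p2, maximized over the nuisance parameter. The full interval would come from inverting tests of this kind over a grid of differences; here only the null difference is shown, and the simple pooled-variance Z is used in place of the constrained-MLE variance the paper recommends. The counts are illustrative.

```python
# Sketch: unconditional exact p-value for comparing two binomial proportions,
# maximizing over the nuisance parameter p on a grid.
import numpy as np
from scipy.stats import binom

def pooled_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    var = p * (1 - p) * (1 / n1 + 1 / n2)
    return 0.0 if var == 0 else (p1 - p2) / np.sqrt(var)

def exact_unconditional_pvalue(x1, n1, x2, n2, grid=999):
    z_obs = abs(pooled_z(x1, n1, x2, n2))
    a, b = np.arange(n1 + 1), np.arange(n2 + 1)
    z_all = np.abs(np.array([[pooled_z(i, n1, j, n2) for j in b] for i in a]))
    extreme = z_all >= z_obs - 1e-12
    p_max = 0.0
    for p in np.linspace(0.001, 0.999, grid):   # scan the nuisance parameter
        prob = np.outer(binom.pmf(a, n1, p), binom.pmf(b, n2, p))
        p_max = max(p_max, prob[extreme].sum())
    return p_max

print("exact unconditional p-value:", round(exact_unconditional_pvalue(3, 15, 9, 15), 4))
```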

18.
Case–control designs are commonly employed in genetic association studies. In addition to the case–control status, data on secondary traits are often collected. Directly regressing secondary traits on genetic variants from a case–control sample often leads to biased estimation. Several statistical methods have been proposed to address this issue; the inverse probability weighting (IPW) approach and the semiparametric maximum-likelihood (SPML) approach are the most commonly used. A new weighted estimating equation (WEE) approach is proposed to provide unbiased estimation of genetic associations with secondary traits by combining observed and counterfactual outcomes. Compared to the existing approaches, WEE is more robust against biased sampling and disease model misspecification. We conducted simulations to evaluate the performance of the WEE under various models and sampling schemes. The WEE demonstrated robustness in all scenarios investigated, had appropriate type I error, and was as powerful as or more powerful than the IPW and SPML approaches. We applied the WEE to an asthma case–control study to estimate the associations between the thymic stromal lymphopoietin gene and two secondary traits: overweight status and serum IgE level. The WEE identified two SNPs associated with overweight in logistic regression, three SNPs associated with serum IgE levels in linear regression, and, in quantile regression, an additional four SNPs associated with the 75th quantile of IgE that were missed by linear regression. The WEE approach provides a general and robust secondary analysis framework, which complements the existing approaches and should serve as a valuable tool for identifying new associations with secondary traits.
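The WEE itself is not reproduced here; the sketch below shows the IPW baseline it is compared against: each subject is weighted by the inverse of its probability of being sampled given case/control status, which requires an assumed population disease prevalence, and the secondary trait is regressed on genotype with those weights. The prevalence and simulated data are illustrative; in practice sandwich standard errors would also be needed.

```python
# Sketch: IPW analysis of a secondary binary trait from a case-control sample,
# contrasted with the naive (unweighted) regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_case, n_ctrl, prevalence = 500, 500, 0.05

# Simulate a case-control sample: genotype (0/1/2) and a secondary binary trait
# that depends on both genotype and disease status.
geno = rng.choice([0, 1, 2], size=n_case + n_ctrl, p=[0.49, 0.42, 0.09])
disease = np.r_[np.ones(n_case), np.zeros(n_ctrl)]
trait = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.4 * geno + 0.3 * disease))))

# Sampling fractions relative to the population composition implied by the
# assumed prevalence; only their ratio matters for the weights.
frac_case = (n_case / (n_case + n_ctrl)) / prevalence
frac_ctrl = (n_ctrl / (n_case + n_ctrl)) / (1 - prevalence)
weights = np.where(disease == 1, 1 / frac_case, 1 / frac_ctrl)

# Weighted vs naive logistic regression of the secondary trait on genotype.
ipw = LogisticRegression(C=1e6).fit(geno.reshape(-1, 1), trait, sample_weight=weights)
naive = LogisticRegression(C=1e6).fit(geno.reshape(-1, 1), trait)
print("IPW genotype log-odds:  ", ipw.coef_[0][0])
print("naive genotype log-odds:", naive.coef_[0][0])
```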

19.
Several formulae exist for sample sizes for testing the equivalence of binomial proportions, based on approximations by the normal distribution. Quite often these formulae produce drastically different results. In this paper the validity of the approximate sample sizes is investigated with respect to the exact distributions.
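For illustration, here is one of the standard normal-approximation formulas of the kind the paper examines: the per-group sample size for showing equivalence of two binomial proportions via two one-sided tests. The proportions and equivalence margin are illustrative assumptions, and the exact-distribution check the paper performs is not reproduced.

```python
# Sketch: approximate per-group sample size for equivalence of two proportions
# (two one-sided tests with margin `margin` and true difference p1 - p2).
import math
from scipy.stats import norm

def n_equivalence(p1, p2, margin, alpha=0.05, power=0.8):
    z_a = norm.ppf(1 - alpha)
    z_b = norm.ppf(1 - (1 - power) / 2)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (margin - abs(p1 - p2)) ** 2)

print("n per group:", n_equivalence(p1=0.65, p2=0.65, margin=0.15))
print("n per group:", n_equivalence(p1=0.65, p2=0.60, margin=0.15))
```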

20.
The chi-squared test has been a popular approach to the analysis of a 2 × 2 table when the sample sizes for the four cells are large. When the large-sample assumption does not hold, however, we need an exact testing method such as Fisher's test. When the study population is heterogeneous, we often partition the subjects into multiple strata, so that each stratum consists of homogeneous subjects and hence the stratified analysis has improved testing power. While the Mantel–Haenszel test has been widely used as an extension of the chi-squared test for stratified 2 × 2 tables with a large-sample approximation, an extension of Fisher's test for stratified exact testing has been lacking. In this paper, we discuss an exact testing method for stratified 2 × 2 tables that reduces to the standard Fisher's test in the single-table case, and propose a corresponding sample size calculation method that can be useful for designing a study with rare cell frequencies.
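The stratified exact test itself is not reproduced here; the sketch below computes the large-sample Cochran–Mantel–Haenszel statistic it is meant to replace when cell counts are small, directly from the textbook formula. The two strata are illustrative.

```python
# Sketch: Cochran-Mantel-Haenszel test for stratified 2x2 tables.
import numpy as np
from scipy.stats import chi2

def cmh_test(tables):
    """tables: list of 2x2 arrays [[a, b], [c, d]], one per stratum."""
    num = var = 0.0
    for t in tables:
        t = np.asarray(t, float)
        a = t[0, 0]
        n1, n2 = t[0].sum(), t[1].sum()          # row totals
        m1, m2 = t[:, 0].sum(), t[:, 1].sum()    # column totals
        N = t.sum()
        num += a - n1 * m1 / N                   # observed minus expected
        var += n1 * n2 * m1 * m2 / (N ** 2 * (N - 1))
    stat = (abs(num) - 0.5) ** 2 / var           # with continuity correction
    return stat, chi2.sf(stat, df=1)

strata = [[[12, 18], [8, 22]],    # stratum 1: exposure x outcome
          [[5, 9], [3, 15]]]      # stratum 2
print(cmh_test(strata))
```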
