Similar Documents
20 similar documents found (search time: 46 ms)
1.
The present study assesses the effects of genotyping errors on the type I error rate of a particular transmission/disequilibrium test (TDT_std), which assumes that data are errorless, and introduces a new transmission/disequilibrium test (TDT_ae) that allows for random genotyping errors. We evaluate the type I error rate and power of the TDT_ae under a variety of simulations and perform a power comparison between the TDT_std and the TDT_ae for errorless data. Both the TDT_std and the TDT_ae statistics are computed as two times a log-likelihood difference, and both are asymptotically distributed as χ² with 1 df. Genotype data for trios are simulated under a null hypothesis and under an alternative (power) hypothesis. For each simulation, errors are introduced randomly via a computer algorithm with different probabilities (called "allelic error rates"). The TDT_std statistic is computed on all trios that show Mendelian consistency, whereas the TDT_ae statistic is computed on all trios. The results indicate that TDT_std shows a significant increase in type I error when applied to data in which inconsistent trios are removed. This type I error increases both with an increase in sample size and with an increase in the allelic error rates. TDT_ae always maintains correct type I error rates for the simulations considered. Factors affecting the power of the TDT_ae are discussed. Finally, the power of TDT_std is at least that of TDT_ae for simulations with errorless data. Because data are rarely error free, we recommend that researchers use methods, such as the TDT_ae, that allow for errors in genotype data.
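For readers who want to experiment with the quantities described above, the sketch below computes the standard TDT in its likelihood-ratio form (two times a log-likelihood difference, asymptotically χ² with 1 df) from the transmission counts of the two alleles. It is a minimal illustration only: the error-aware TDT_ae requires the genotyping-error model from the paper, which is not reproduced here, and the function and variable names are our own.

import numpy as np
from scipy.stats import chi2

def tdt_std(b, c):
    """Likelihood-ratio form of the standard TDT.

    b, c: counts of transmissions of allele 1 and allele 2 from
    heterozygous parents to affected offspring.
    Returns (statistic, p-value); statistic ~ chi2(1 df) under H0: p = 0.5.
    """
    n = b + c
    p_hat = b / n                                   # MLE of transmission probability
    # log-likelihood at the MLE (boundary cases b == 0 or b == n contribute 0)
    ll_alt = b * np.log(p_hat) + c * np.log(1.0 - p_hat) if 0 < b < n else 0.0
    ll_null = n * np.log(0.5)                       # log-likelihood under H0
    stat = 2.0 * (ll_alt - ll_null)
    return stat, chi2.sf(stat, df=1)

# example: 60 transmissions of allele 1 vs 40 of allele 2
stat, p = tdt_std(60, 40)
print(f"TDT_std = {stat:.2f}, p = {p:.4f}")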

2.
We develop methods for competing risks analysis when individual event times are correlated within clusters. Clustering arises naturally in clinical genetic studies and other settings. We develop a nonparametric estimator of cumulative incidence, and obtain robust pointwise standard errors that account for within-cluster correlation. We modify the two-sample Gray and Pepe–Mori tests for correlated competing risks data, and propose a simple two-sample test of the difference in cumulative incidence at a landmark time. In simulation studies, our estimators are asymptotically unbiased, and the modified test statistics control the type I error. The power of the respective two-sample tests is differentially sensitive to the degree of correlation; the optimal test depends on the alternative hypothesis of interest and the within-cluster correlation. For purposes of illustration, we apply our methods to a family-based prospective cohort study of hereditary breast/ovarian cancer families. For women with BRCA1 mutations, we estimate the cumulative incidence of breast cancer in the presence of competing mortality from ovarian cancer, accounting for significant within-family correlation.
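As a point of reference for the estimator discussed above, the following sketch computes a plain nonparametric cumulative incidence function (an Aalen–Johansen-type estimate) for independent observations; the cluster-robust standard errors and the modified Gray and Pepe–Mori tests of the paper are not reproduced, and the event coding and toy data are assumptions of this example.

import numpy as np

def cumulative_incidence(time, event, cause=1):
    """Nonparametric cumulative incidence of `cause` in the presence of
    competing events. `event` codes: 0 = censored, 1, 2, ... = cause of failure.
    Returns (sorted event times, CIF evaluated at those times)."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]

    surv = 1.0                                   # overall KM survival just before t
    cif = 0.0
    times_out, cif_out = [], []
    for t in np.unique(time):
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_all = np.sum((time == t) & (event > 0))
        cif += surv * d_cause / at_risk          # increment for the cause of interest
        surv *= 1.0 - d_all / at_risk            # update overall survival
        times_out.append(t)
        cif_out.append(cif)
    return np.array(times_out), np.array(cif_out)

# toy example: cause 1 = breast cancer, cause 2 = competing mortality
t = [2, 3, 3, 5, 7, 8, 10, 12]
e = [1, 0, 2, 1, 1, 0, 2, 1]
times, cif = cumulative_incidence(t, e, cause=1)
print(np.column_stack([times, cif]))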

3.
Individual‐based landscape genetic methods have become increasingly popular for quantifying fine‐scale landscape influences on gene flow. One complication for individual‐based methods is that gene flow and landscape variables are often correlated with geography. Partial statistics, particularly Mantel tests, are often employed to control for these inherent correlations by removing the effects of geography while simultaneously correlating measures of genetic differentiation and landscape variables of interest. Concerns about the reliability of Mantel tests prompted this study, in which we use simulated landscapes to evaluate the performance of partial Mantel tests and two ordination methods, distance‐based redundancy analysis (dbRDA) and redundancy analysis (RDA), for detecting isolation by distance (IBD) and isolation by landscape resistance (IBR). Specifically, we describe the effects of suitable habitat amount, fragmentation and resistance strength on metrics of accuracy (frequency of correct results, type I/II errors and strength of IBR according to underlying landscape and resistance strength) for each test using realistic individual‐based gene flow simulations. Mantel tests were very effective for detecting IBD, but exhibited higher error rates when detecting IBR. Ordination methods were overall more accurate in detecting IBR, but had high type I errors compared to partial Mantel tests. Thus, no one test outperformed another completely. A combination of statistical tests, for example partial Mantel tests to detect IBD paired with appropriate ordination techniques for IBR detection, provides the best characterization of fine‐scale landscape genetic structure. Realistic simulations of empirical data sets will further increase power to distinguish among putative mechanisms of differentiation.
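A minimal sketch of a partial Mantel test is given below, assuming three precomputed pairwise distance matrices (genetic, landscape resistance, geographic); significance comes from permuting the rows and columns of the genetic matrix. It illustrates the statistic being evaluated above, not the dbRDA/RDA workflow or the gene-flow simulations of the study.

import numpy as np

def _upper(m):
    """Flatten the upper triangle (excluding the diagonal) of a square matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def partial_mantel(gen, land, geo, n_perm=999, seed=1):
    """Partial Mantel correlation of gen vs land, controlling for geo.
    gen, land, geo: square symmetric distance matrices.
    P-value from permuting rows/columns of the genetic matrix."""
    def partial_r(a, b, c):
        r_ab = np.corrcoef(a, b)[0, 1]
        r_ac = np.corrcoef(a, c)[0, 1]
        r_bc = np.corrcoef(b, c)[0, 1]
        return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

    g, l, d = _upper(gen), _upper(land), _upper(geo)
    obs = partial_r(g, l, d)
    rng = np.random.default_rng(seed)
    n, count = gen.shape[0], 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        g_perm = _upper(gen[np.ix_(p, p)])       # permute individuals, keep structure
        if partial_r(g_perm, l, d) >= obs:
            count += 1
    return obs, (count + 1) / (n_perm + 1)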

4.
Computer simulation was used to test Smith's (1994) correction for phylogenetic nonindependence in comparative studies. Smith's method finds effective N, which is computed using nested analysis of variance, and uses this value in place of observed N as the baseline degrees of freedom (df) for calculating statistical significance levels. If Smith's formula finds the correct df, distributions of computer-generated statistics from simulations with observed N nonindependent species should match theoretical distributions (from statistical tables) with the df based on effective N. The computer program developed to test Smith's method simulates character evolution down user-specified phylogenies. Parameters were systematically varied to discover their effects on Smith's method. In simulations in which the phylogeny and taxonomy were identical (tests of narrow-sense validity), Smith's method always gave conservative statistical results when the taxonomy had fewer than five levels. This conservative departure gave way to a liberal deviation in type I error rates in simulations using more than five taxonomic levels, except when species values were nearly independent. Reducing the number of taxonomic levels used in the analysis, and thereby eliminating available information regarding evolutionary relationships, also increased type I error rates (broad-sense validity), indicating that this may be inappropriate under conditions shown to have high type I error rates. However, the use of taxonomic categories over more accurate phylogenies did not create a liberal bias in all cases in the analysis performed here.

5.
6.
OBJECTIVE: The potential value of haplotypes has attracted widespread interest in the mapping of complex traits. Haplotype sharing methods take the linkage disequilibrium information between multiple markers into account, and may have good power to detect predisposing genes. We present a new approach based on Mantel statistics for space-time clustering, which is developed in order to improve the power of haplotype sharing analysis for gene mapping in complex disease. METHODS: The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes for case-only and case-control studies. The genetic similarity is measured as the shared length between haplotypes around a putative disease locus. The phenotypic similarity is measured as the mean-corrected cross-product based on the respective phenotypes. We analyzed two tests for statistical significance with respect to type I error: (1) assuming asymptotic normality, and (2) using a Monte Carlo permutation procedure. The results were compared to the χ² test for association based on 3-marker haplotypes. RESULTS: The type I error rates obtained with the permutation procedure showed that the Mantel statistics yield pointwise valid tests, whereas the approach based on the assumption of asymptotic normality was seriously liberal. CONCLUSION: Power comparisons showed that the Mantel statistics were better than or equal to the χ² test for all simulated disease models.

7.
Although habitat fragmentation is one of the greatest threats to biodiversity worldwide, virtually no attention has been paid to the quantification of error in fragmentation statistics. Landscape pattern indices (LPIs), such as mean patch size and number of patches, are routinely used to quantify fragmentation and are often calculated using remote-sensing imagery that has been classified into different land-cover classes. No classified map is ever completely correct, so we asked if different maps with similar misclassification rates could result in widely different errors in pattern indices. We simulated landscapes with varying proportions of habitat and clumpiness (autocorrelation) and then simulated classification errors on the same maps. We simulated higher misclassification at patch edges (as is often observed), and then used a smoothing algorithm routinely used on images to correct salt-and-pepper classification error. We determined how well classification errors (and smoothing) corresponded to errors seen in four pattern indices. Maps with low misclassification rates often yielded errors in LPIs of much larger magnitude and substantial variability. Although smoothing usually improved classification error, it sometimes increased LPI error and reversed the direction of error in LPIs introduced by misclassification. Our results show that classification error is not always a good predictor of errors in LPIs, and some types of image postprocessing (for example, smoothing) might result in the underestimation of habitat fragmentation. Furthermore, our results suggest that there is potential for large errors in nearly every landscape pattern analysis ever published, because virtually none quantify the errors in LPIs themselves.
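To make the fragility of landscape pattern indices concrete, the toy sketch below computes two of them (number of patches and mean patch size) with scipy.ndimage.label and shows how a modest rate of random misclassification shifts both. The habitat proportion, error rate, and map size are illustrative choices rather than values from the study, and the edge-weighted error and smoothing steps are omitted.

import numpy as np
from scipy import ndimage

def pattern_indices(binary_map):
    """Two simple landscape pattern indices: number of habitat patches
    and mean patch size (in cells), using 4-neighbour connectivity."""
    labeled, n_patches = ndimage.label(binary_map)
    sizes = np.bincount(labeled.ravel())[1:]          # drop background label 0
    mean_size = sizes.mean() if n_patches > 0 else 0.0
    return n_patches, mean_size

rng = np.random.default_rng(0)
truth = (rng.random((200, 200)) < 0.3).astype(int)    # 30% habitat, unclumped

# inject 5% random misclassification (bit flips)
flip = rng.random(truth.shape) < 0.05
observed = np.where(flip, 1 - truth, truth)

print("truth:   ", pattern_indices(truth))
print("observed:", pattern_indices(observed))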

8.
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low-frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze the collective frequency differences of multiple variants between cases and controls have shifted the variant-by-variant analysis paradigm of common-variant GWAS toward collective tests of multiple variants for the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data in the 1000 Genomes Project to calculate the type I error rates and power of seven alternative statistics: the genome-information content-based statistic, the generalized T², the collapsing method, the combined multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable-threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type I error rates and higher power than the other six statistics in both simulated and empirical datasets.
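Of the seven statistics compared above, the collapsing method is the simplest to write down; a sketch is given below, assuming genotype matrices of minor-allele counts for cases and controls. The genome-information content-based statistic itself is not reproduced, and the toy data are invented.

import numpy as np
from scipy.stats import chi2_contingency

def collapsing_test(geno_cases, geno_controls):
    """CAST-style collapsing test for rare variants.
    geno_*: (individuals x variants) arrays of minor-allele counts (0/1/2).
    Each individual is collapsed to a single indicator: carries >= 1 rare allele."""
    carrier_cases = geno_cases.sum(axis=1) > 0
    carrier_controls = geno_controls.sum(axis=1) > 0
    table = np.array([
        [carrier_cases.sum(), (~carrier_cases).sum()],
        [carrier_controls.sum(), (~carrier_controls).sum()],
    ])
    stat, p, _, _ = chi2_contingency(table)
    return stat, p

# toy data: 500 cases, 500 controls, 20 rare variants with different frequencies
rng = np.random.default_rng(2)
cases = rng.binomial(2, 0.01, size=(500, 20))
controls = rng.binomial(2, 0.005, size=(500, 20))
print(collapsing_test(cases, controls))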

9.
Nonlinear mixed effects models allow investigating individual differences in drug concentration profiles (pharmacokinetics) and responses. Pharmacogenetics focuses on the genetic component of this variability. Two tests often used to detect a gene effect on a pharmacokinetic parameter are (1) the Wald test, assessing whether estimates for the gene effect are significantly different from 0, and (2) the likelihood ratio test comparing models with and without the genetic effect. Because those asymptotic tests show inflated type I error with small sample sizes and/or unevenly distributed genotypes, we develop two alternatives and evaluate them by means of a simulation study. First, we assess the performance of the permutation test using the Wald and the likelihood ratio statistics. Second, for the Wald test we propose the use of the F-distribution with four different values for the denominator degrees of freedom. We also explore the influence of the estimation algorithm, using both the first-order conditional estimation with interaction linearization-based algorithm and the stochastic approximation expectation maximization algorithm. We apply these methods to the analysis of the pharmacogenetics of indinavir in HIV patients recruited in the COPHAR2-ANRS 111 trial. Results of the simulation study show that the permutation test seems appropriate, but at the cost of an additional computational burden. One of the four F-distribution-based approaches provides a correct type I error estimate for the Wald test and should be further investigated.
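The permutation scheme evaluated above can be sketched generically: shuffle the genotype labels, recompute the test statistic, and compare. In the sketch below the statistic is a plain squared t-statistic for a linear genotype effect, standing in for the Wald statistic from a nonlinear mixed-effects fit, which is far more involved; the sample size and data are illustrative.

import numpy as np

def wald_like(g, y):
    """Squared t-statistic for a simple linear genotype effect (illustration only;
    the paper's statistic comes from a nonlinear mixed-effects fit)."""
    g_c, y_c = g - g.mean(), y - y.mean()
    beta = g_c @ y_c / (g_c @ g_c)
    resid = y_c - beta * g_c
    sigma2 = resid @ resid / (len(y) - 2)
    return beta**2 / (sigma2 / (g_c @ g_c))

def permutation_pvalue(stat_fn, genotype, response, n_perm=1000, seed=0):
    """Permutation p-value: shuffle genotype labels across individuals to break
    the genotype-parameter association while keeping the response distribution."""
    rng = np.random.default_rng(seed)
    obs = stat_fn(genotype, response)
    perm = np.array([stat_fn(rng.permutation(genotype), response)
                     for _ in range(n_perm)])
    return obs, (np.sum(perm >= obs) + 1) / (n_perm + 1)

rng = np.random.default_rng(5)
g = rng.integers(0, 3, 40).astype(float)   # small sample, as in the paper's concern
y = rng.normal(0, 1, 40)                   # no true gene effect
print(permutation_pvalue(wald_like, g, y))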

10.
The genetic basis of complex diseases is expected to be highly heterogeneous, with complex interactions among multiple disease loci and environmental factors. Because of the multi-dimensional nature of interactions among large numbers of genetic loci, efficient statistical approaches for handling high-order epistatic complexity have not been well developed. In this article, we introduce a new approach for testing genetic epistasis at multiple loci using an entropy-based statistic for a case-only design. The entropy-based statistic asymptotically follows a χ² distribution. Computer simulations show that the entropy-based approach has better control of type I error and higher power than the standard χ² test. Motivated by a schizophrenia data set, we propose a method for measuring and testing the relative entropy of a clinical phenotype, through which one can test the contribution or interaction of multiple disease loci to a clinical phenotype. A sequential forward selection procedure is proposed to construct a genetic interaction network, which is illustrated through a tree-based diagram. The network information clearly shows the relative importance of a set of genetic loci for a clinical phenotype. To show the utility of the new entropy-based approach, we apply it to analyze two real data sets, a schizophrenia data set and a published malaria data set. Our approach provides a fast and testable framework for genetic epistasis studies in a case-only design.
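One common way to build such an entropy-based case-only statistic is the mutual information between two loci among cases, scaled so that 2·N·MI (in nats) equals the G statistic and is asymptotically χ² distributed. The sketch below is an assumption-laden stand-in for the paper's statistic; the relative-entropy phenotype test and the network-construction procedure are not reproduced.

import numpy as np
from scipy.stats import chi2

def case_only_entropy_test(g1, g2):
    """Case-only test of interaction between two loci via mutual information.
    g1, g2: genotype codes (0/1/2) observed in the same set of cases."""
    g1, g2 = np.asarray(g1), np.asarray(g2)
    n = len(g1)
    joint = np.zeros((3, 3))
    for a, b in zip(g1, g2):
        joint[a, b] += 1
    p = joint / n
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        mi = np.nansum(p * np.log(p / (px * py)))   # empty cells contribute 0
    stat = 2.0 * n * mi                             # G statistic, in nats
    df = (np.count_nonzero(px) - 1) * (np.count_nonzero(py) - 1)
    return stat, chi2.sf(stat, df)

rng = np.random.default_rng(1)
g1, g2 = rng.integers(0, 3, 500), rng.integers(0, 3, 500)   # independent loci
print(case_only_entropy_test(g1, g2))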

11.
Li M, Boehnke M, Abecasis GR, Song PX. Genetics 2006, 173(4):2317-2327
Mapping and identifying variants that influence quantitative traits is an important problem for genetic studies. Traditional QTL mapping relies on a variance-components (VC) approach with the key assumption that the trait values in a family follow a multivariate normal distribution. Violation of this assumption can lead to inflated type I error, reduced power, and biased parameter estimates. To accommodate nonnormally distributed data, we developed and implemented a modified VC method, which we call the "copula VC method," that directly models the nonnormal distribution using Gaussian copulas. The copula VC method allows the analysis of continuous, discrete, and censored trait data, and the standard VC method is a special case when the data are distributed as multivariate normal. Through the use of link functions, the copula VC method can easily incorporate covariates. We use computer simulations to show that the proposed method yields unbiased parameter estimates, correct type I error rates, and improved power for testing linkage with a variety of nonnormal traits as compared with the standard VC and the regression-based methods.

12.
An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregated within the population and rare variants, inherited from recent ancestors, that segregated mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from remarkable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure, and a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based functional principal-component analysis (FPCA) with or without smoothing, a generalized T², combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power than other population-based or family-based association analysis methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA has a much smaller p value than the other statistics.

13.
Keightley PD, Halligan DL. Genetics 2011, 188(4):931-940
Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
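The binomial-sampling-plus-error model mentioned above can be written down directly as per-genotype read likelihoods; the sketch below does this for a single biallelic site. The full site-frequency-spectrum likelihood, which combines these quantities across individuals and sites, is not reproduced, and the depth and error rate shown are illustrative.

import numpy as np
from scipy.stats import binom

def genotype_likelihoods(n_ref, n_reads, error_rate):
    """P(read data | genotype) for genotypes RR, RA, AA at a biallelic site.

    n_ref:      number of reads carrying the reference base
    n_reads:    total read depth at the site
    error_rate: probability that a read reports the wrong base
    """
    p_ref_given_genotype = {
        "RR": 1.0 - error_rate,   # only errors produce alternative reads
        "RA": 0.5,                # heterozygote: each chromosome sampled equally
        "AA": error_rate,         # only errors produce reference reads
    }
    return {g: binom.pmf(n_ref, n_reads, p)
            for g, p in p_ref_given_genotype.items()}

# depth 10, 4 reference reads, 1% per-base error rate
print(genotype_likelihoods(4, 10, 0.01))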

14.
The simultaneous testing of a large number of hypotheses in a genome scan, using individual thresholds for significance, inherently leads to inflated genome-wide false positive rates. There exist various approaches to approximating the correct genome-wide p-values under various assumptions, either by way of asymptotics or simulations. We explore a philosophically different criterion, recently proposed in the literature, which controls the false discovery rate. The test statistics are assumed to arise from a mixture of distributions under the null and non-null hypotheses. We fit the mixture distribution using both a nonparametric approach and commingling analysis, and then apply the local false discovery rate to select cut-off points for regions to be declared interesting. Another criterion, the minimum total error, is also explored. Both criteria seem to be sensible alternatives to controlling the classical type I and type II error rates.
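A bare-bones version of the local false discovery rate is sketched below, assuming z-scores as input, a theoretical N(0,1) null, and a kernel estimate of the mixture density. The paper's nonparametric mixture fit, commingling analysis, and minimum-total-error criterion are not reproduced, and the π0 estimator here is a crude placeholder.

import numpy as np
from scipy.stats import norm, gaussian_kde

def local_fdr(z, pi0=None):
    """Local FDR: lfdr(z) = pi0 * f0(z) / f(z), with f0 = N(0,1) (theoretical null)
    and f estimated by a Gaussian kernel density over all z-scores."""
    z = np.asarray(z, float)
    f = gaussian_kde(z)(z)                     # mixture density estimate
    f0 = norm.pdf(z)                           # theoretical null density
    if pi0 is None:
        # crude conservative estimate: proportion of |z| < 1 relative to the null
        pi0 = min(1.0, np.mean(np.abs(z) < 1.0) / (norm.cdf(1) - norm.cdf(-1)))
    return np.clip(pi0 * f0 / f, 0.0, 1.0)

# declare regions "interesting" when lfdr < 0.2, say
rng = np.random.default_rng(3)
z = np.concatenate([rng.normal(0, 1, 950), rng.normal(3, 1, 50)])
print(np.sum(local_fdr(z) < 0.2), "of", len(z), "tests flagged")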

15.
Interim analyses in clinical trials are planned for ethical as well as economic reasons. General results have been published in the literature that allow the use of standard group sequential methodology if one uses an efficient test statistic, e.g., when Wald-type statistics are used in random-effects models for ordinal longitudinal data. These models often assume that the random effects are normally distributed. However, this is not always the case. We will show that, when the random-effects distribution is misspecified in ordinal regression models, the joint distribution of the test statistics over the different interim analyses is still a multivariate normal distribution, but a sandwich-type correction to the covariance matrix is needed in order to obtain the correct covariance matrix. The independent increment structure is also investigated. A bias in estimation will occur due to the misspecification. However, we will also show that the treatment effect estimate will be unbiased under the null hypothesis, thus maintaining the type I error. Extensive simulations based on a toenail dermatophyte onychomycosis trial are used to illustrate our results.

16.
Using a computerized phylogenetic analysis of the Isopoda (Crustacea: Peracarida) as a source of typical errors and misunderstandings, problems that may occur in computer cladistics are reviewed. It is concluded that in addition to the errors that are possible in a conventional Hennigian analysis some specific methodological problems exist in computer cladistics. It is recommended that the OTU be replaced by the groundpattern concept. Tree statistics are not useful for comparing different competing hypotheses. Arguments ought to concentrate on the hypothetico-deductive steps of the analysis, i.e. on character analysis. The use of computers does not add objectivity to character analysis. Single outgroup taxa should not be used in assessing the character states of ingroups. Concerning isopod phylogeny, it is argued here that the tail fan of the Isopoda can probably be derived from the eumalacostracan groundpattern and did not evolve de novo within the Isopoda.

17.
Kierepka EM, Latch EK. Heredity 2016, 116(1):33-43
Landscape genetics is a powerful tool for conservation because it identifies landscape features that are important for maintaining genetic connectivity between populations within heterogeneous landscapes. However, using landscape genetics in poorly understood species presents a number of challenges, namely, limited life history information for the focal population and spatially biased sampling. Both obstacles can reduce power in statistics, particularly in individual-based studies. In this study, we genotyped 233 American badgers in Wisconsin at 12 microsatellite loci to identify alternative statistical approaches that can be applied to poorly understood species in an individual-based framework. Badgers are protected in Wisconsin owing to an overall lack of life history information, so our study utilized partial redundancy analysis (RDA) and spatially lagged regressions to quantify how three landscape factors (Wisconsin River, Ecoregions and land cover) impacted gene flow. We also performed simulations to quantify errors created by spatially biased sampling. Statistical analyses first found that geographic distance was an important influence on gene flow, mainly driven by fine-scale positive spatial autocorrelations. After controlling for geographic distance, both RDA and regressions found that Wisconsin River and Agriculture were correlated with genetic differentiation. However, only Agriculture had an acceptable type I error rate (3–5%) to be considered biologically relevant. Collectively, this study highlights the benefits of combining robust statistics and error assessment via simulations and provides a method for hypothesis testing in individual-based landscape genetics.

18.
When movement outcome differs consistently from the intended movement, errors are used to correct subsequent movements (e.g., adaptation to displacing prisms or force fields) by updating an internal model of motor and/or sensory systems. Here, we examine changes to an internal model of the motor system under changes in the variance structure of movement errors lacking an overall bias. We introduced a horizontal visuomotor perturbation to change the statistical distribution of movement errors anisotropically, while monetary gains/losses were awarded based on movement outcomes. We derive predictions for simulated movement planners, each differing in its internal model of the motor system. We find that humans optimally respond to the overall change in error magnitude, but ignore the anisotropy of the error distribution. Through comparison with simulated movement planners, we found that aimpoints corresponded quantitatively to an ideal movement planner that updates a strictly isotropic (circular) internal model of the error distribution. Aimpoints were planned in a manner that ignored the direction-dependence of error magnitudes, despite the continuous availability of unambiguous information regarding the anisotropic distribution of actual motor errors.
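The notion of an ideal movement planner can be illustrated by brute force: score candidate aimpoints by Monte Carlo expected gain under an assumed Gaussian error distribution and pick the best one. The reward geometry, error covariances, and search grid below are invented for illustration and are not the experimental values.

import numpy as np

def gain(points):
    """Illustrative payoff: +100 inside a target circle (radius 1 at the origin),
    -500 inside a penalty circle of radius 1 centred at (-1, 0)."""
    g = np.zeros(len(points))
    g += 100 * (np.linalg.norm(points, axis=1) < 1.0)
    g -= 500 * (np.linalg.norm(points - np.array([-1.0, 0.0]), axis=1) < 1.0)
    return g

def expected_gain(aim, cov, n=20000, seed=0):
    """Monte Carlo expected gain for an aimpoint under Gaussian motor error."""
    rng = np.random.default_rng(seed)
    endpoints = aim + rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return gain(endpoints).mean()

def best_aim(cov):
    """Grid search along the x-axis for the aimpoint maximizing expected gain."""
    candidates = [np.array([x, 0.0]) for x in np.linspace(-0.5, 1.5, 41)]
    return max(candidates, key=lambda a: expected_gain(a, cov))

true_cov = np.diag([0.3**2, 0.1**2])      # anisotropic motor error
isotropic = np.eye(2) * 0.2**2            # planner's (mistaken) isotropic model
print("aim under true anisotropic model:", best_aim(true_cov))
print("aim under isotropic model:       ", best_aim(isotropic))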

19.
This paper examines the consequences of observation errors for the "random walk with drift", a model that incorporates density independence and is frequently used in population viability analysis. Exact expressions are given for biases in estimates of the mean, variance and growth parameters under very general models for the observation errors. For other quantities, such as the finite rate of increase, and probabilities about population size in the future we provide and evaluate approximate expressions. These expressions explain the biases induced by observation error without relying exclusively on simulations, and also suggest ways to correct for observation error. A secondary contribution is a careful discussion of observation error models, presented in terms of either log-abundance or abundance. This discussion recognizes that the bias and variance in observation errors may change over time, the result of changing sampling effort or dependence on the underlying population being sampled.
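The core bias can be seen in a few lines of simulation: with observation error of variance τ², the variance of the observed first differences of log-abundance is inflated to σ² + 2τ², while the drift estimate remains approximately unbiased. The parameter values below are illustrative only.

import numpy as np

rng = np.random.default_rng(4)
T, mu, sigma, tau = 40, 0.02, 0.10, 0.15   # years, drift, process SD, observation SD

# true log-abundance follows a random walk with drift
x = np.cumsum(rng.normal(mu, sigma, T))
# observed log-abundance adds independent observation error
y = x + rng.normal(0.0, tau, T)

d_true, d_obs = np.diff(x), np.diff(y)
print("drift estimate    (true / observed):", d_true.mean(), d_obs.mean())
print("variance estimate (true / observed):", d_true.var(ddof=1), d_obs.var(ddof=1))
# E[var of observed differences] = sigma^2 + 2*tau^2, so the process variance
# (and hence extinction-risk metrics) is overestimated, while the drift estimate
# stays approximately unbiased.
print("expected inflated variance:", sigma**2 + 2 * tau**2)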

20.

Background

There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.

Methodology/Principal Findings

We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.

Conclusions/Significance

Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because they emphasize pairwise covariation over group covariation.
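For orientation, the sketch below computes the most basic covariation statistic, the mutual information between two alignment columns; the two misalignment-tolerant non-parametric statistics introduced by the paper are not reproduced, and the toy columns are invented.

import numpy as np
from collections import Counter

def column_mi(col_a, col_b):
    """Mutual information (nats) between two alignment columns, a basic
    covariation statistic; gaps are kept as their own character state."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        p_joint = c / n
        mi += p_joint * np.log(p_joint * n * n / (pa[a] * pb[b]))
    return mi

# two columns from a toy alignment of six sequences
print(column_mi("AADDEE", "LLKKRR"))   # perfectly covarying -> high MI
print(column_mi("AAAAAA", "LKRLKR"))   # conserved vs variable -> MI = 0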

