Similar Articles (20 results)
1.
The problem of ascertainment for linkage analysis.
It is generally believed that ascertainment corrections are unnecessary in linkage analysis, provided individuals are selected for study solely on the basis of trait phenotype and not on the basis of marker genotype. The theoretical rationale for this is that standard linkage analytic methods involve conditioning likelihoods on all the trait data, which may be viewed as an application of the ascertainment assumption-free (AAF) method of Ewens and Shute. In this paper, we show that when the observed pedigree structure depends on which relatives within a pedigree happen to have been the probands (proband-dependent, or PD, sampling) conditioning on all the trait data is not a valid application of the AAF method and will result in asymptotically biased estimates of genetic parameters (except under single ascertainment). Furthermore, this result holds even if the recombination fraction R is the only parameter of interest. Since the lod score is proportional to the likelihood of the marker data conditional on all the trait data, this means that when data are obtained under PD sampling the lod score will yield asymptotically biased estimates of R, and that so-called mod scores (i.e., lod scores maximized over both R and parameters theta of the trait distribution) will yield asymptotically biased estimates of R and theta. Furthermore, the problem appears to be intractable, in the sense that it is not possible to formulate the correct likelihood conditional on observed pedigree structure. In this paper we do not investigate the numerical magnitude of the bias, which may be small in many situations. On the other hand, virtually all linkage data sets are collected under PD sampling. Thus, the existence of this bias will be the rule rather than the exception in the usual applications.

2.
Stoklosa J  Hwang WH  Wu SH  Huggins R 《Biometrics》2011,67(4):1659-1665
In practice, when analyzing data from a capture-recapture experiment it is tempting to apply modern advanced statistical methods to the observed capture histories. However, unless the analysis takes into account that the data have only been collected from individuals who have been captured at least once, the results may be biased. Without the development of new software packages, methods such as generalized additive models, generalized linear mixed models, and simulation-extrapolation cannot be readily implemented. In contrast, the partial likelihood approach allows the analysis of a capture-recapture experiment to be conducted using commonly available software. Here we examine the efficiency of this approach and apply it to several data sets.
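The key point of the abstract — that the likelihood must condition on an animal having been caught at least once — can be sketched for the simplest constant-capture-probability (M0) model. This is an illustrative sketch with made-up capture counts, not the authors' implementation:

```python
import math

def conditional_loglik(p, histories, T):
    """Log-likelihood of capture counts conditional on >= 1 capture under
    the M0 model: constant capture probability p over T occasions."""
    ll = 0.0
    denom = 1.0 - (1.0 - p) ** T  # P(captured at least once)
    for x in histories:  # x = number of captures for one observed individual
        ll += x * math.log(p) + (T - x) * math.log(1.0 - p) - math.log(denom)
    return ll

# Toy data: 5 occasions; capture counts for each animal that was seen.
histories = [1, 1, 2, 1, 3, 2, 1, 1, 4, 2]
T = 5
# Crude grid search for the conditional MLE of p.
p_hat = max((i / 1000 for i in range(1, 1000)),
            key=lambda p: conditional_loglik(p, histories, T))
```

The conditional MLE lands below the naive binomial estimate sum(x)/(nT), because treating the observed animals as a random sample ignores the never-captured ones and overstates capture probability.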

3.
To obtain accurate estimates of activity budget parameters, samples must be unbiased and precise. Many researchers have considered how biased data may affect their ability to draw conclusions and examined ways to decrease bias in sampling efforts, but few have addressed the implications of not considering estimate precision. We propose a method to assess whether the number of instantaneous samples collected is sufficient to obtain precise activity budget parameter estimates. We draw on sampling theory to determine the number of observations per animal required to reach a desired bound on the error of estimation based on a stratified random sample, with individual animals acting as strata. We also discuss the optimal balance between the number of individuals sampled and the number of observations sampled per individual for a variety of sampling conditions. We present an empirical dataset on pronghorn (Antilocapra americana) as an example of the utility of the method. The required number of observations to reach precise estimates for pronghorn varied between common and rare behaviors, but precise estimates were achieved with <255 observations per individual for common behaviors. The two most apparent factors affecting the required number of observations for precise estimates were the number of individuals sampled and the complexity of the activity budget. This technique takes into account variation associated with individual activity budgets and population variation in activity budget parameter estimates, and helps to ensure that estimates are precise. The method can also be used for planning future sampling efforts.
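A minimal version of the sample-size calculation can be written down directly from binomial sampling theory. This simplifies the paper's stratified scheme to a single animal (one stratum) and assumes a 95% confidence bound; the numbers are purely illustrative:

```python
import math

def required_scans(p, bound, z=1.96):
    """Number of instantaneous scan samples needed so that an approximate
    95% CI for the proportion of time p spent in a behavior has
    half-width <= bound (simple-random-sample binomial approximation)."""
    return math.ceil(z * z * p * (1.0 - p) / (bound * bound))

common = required_scans(0.5, 0.05)   # common behavior, +/-0.05 bound
rare = required_scans(0.05, 0.02)    # rare behavior, tighter +/-0.02 bound
```

A common behavior (p near 0.5) with a ±0.05 bound needs 385 scans, while pinning a rare behavior (p near 0.05) to ±0.02 needs 457 — echoing the abstract's point that precision requirements differ between common and rare behaviors.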

4.
In clinical and epidemiological studies information on the primary outcome of interest, that is, the disease status, is usually collected at a limited number of follow-up visits. The disease status can often only be retrieved retrospectively in individuals who are alive at follow-up, but will be missing for those who died before. Right-censoring the death cases at the last visit (ad-hoc analysis) yields biased hazard ratio estimates of a potential risk factor, and the bias can be substantial and occur in either direction. In this work, we investigate three different approaches that use the same likelihood contributions derived from an illness-death multistate model in order to more adequately estimate the hazard ratio by including the death cases into the analysis: a parametric approach, a penalized likelihood approach, and an imputation-based approach. We investigate to which extent these approaches allow for an unbiased regression analysis by evaluating their performance in simulation studies and on a real data example. In doing so, we use the full cohort with complete illness-death data as reference and artificially induce missing information due to death by setting discrete follow-up visits. Compared to an ad-hoc analysis, all considered approaches provide less biased or even unbiased results, depending on the situation studied. In the real data example, the parametric approach is seen to be too restrictive, whereas the imputation-based approach could almost reconstruct the original event history information.

5.
The occurrence of decompression sickness in animals and humans is characterized by the extreme variability of individual response. Nevertheless, models and analyses of decompression results have generally used a critical value approach to separate safe and unsafe decompression procedures. Application of the principle of maximum likelihood provides a formal and consistent way to quantify decompression risk and to apply models to data on decompression outcome. By use of the maximum likelihood principle, a number of models were fit to data from dose-response and maximum pressure-reduction experiments with both rats and men. Several different formulations of two- and three-parameter models described the data well. In addition to summarizing data sets, the analyses provide a way to maximize the value of experimental observations, test theoretical predictions, estimate uncertainty in conclusions, and recommend safe practices.
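As a sketch of what fitting a two-parameter dose-response model by maximum likelihood involves, a logistic risk model can be fitted to toy binary outcomes by direct search. The data, the logistic form, and the grid optimizer are illustrative stand-ins, not the study's models or data:

```python
import math

def loglik(a, b, doses, outcomes):
    """Bernoulli log-likelihood for a two-parameter logistic dose-response
    model: P(event) = 1 / (1 + exp(-(a + b * dose)))."""
    ll = 0.0
    for d, y in zip(doses, outcomes):
        p = 1.0 / (1.0 + math.exp(-(a + b * d)))
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll

doses = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
hits = [0, 0, 0, 1, 1, 1, 1, 1]  # 1 = event occurred
# Crude grid search over (a, b); a real analysis would use a proper optimizer.
a_hat, b_hat = max(((a / 10, b / 10)
                    for a in range(-80, 1) for b in range(0, 41)),
                   key=lambda ab: loglik(ab[0], ab[1], doses, hits))
```

Because the likelihood is an explicit function of the parameters, the same machinery that finds the fit also yields likelihood-ratio tests and confidence regions — the "estimate uncertainty in conclusions" part of the abstract.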

6.
Missing genotype data arise in association studies when the single-nucleotide polymorphisms (SNPs) on the genotyping platform are not assayed successfully, when the SNPs of interest are not on the platform, or when total sequence variation is determined only on a small fraction of individuals. We present a simple and flexible likelihood framework to study SNP-disease associations with such missing genotype data. Our likelihood makes full use of all available data in case-control studies and reference panels (e.g., the HapMap), and it properly accounts for the biased nature of the case-control sampling as well as the uncertainty in inferring unknown variants. The corresponding maximum-likelihood estimators for genetic effects and gene-environment interactions are unbiased and statistically efficient. We developed fast and stable numerical algorithms to calculate the maximum-likelihood estimators and their variances, and we implemented these algorithms in a freely available computer program. Simulation studies demonstrated that the new approach is more powerful than existing methods while providing accurate control of the type I error. An application to a case-control study on rheumatoid arthritis revealed several loci that deserve further investigation.

7.
Aim Propagule size and output are critical for the ability of a plant species to colonize new environments. If invasive species have a greater reproductive output than native species (via more and/or larger seeds), then they will have a greater dispersal and establishment ability. Previous comparisons within plant genera, families or environments have conflicted over the differences in reproductive traits between native and invasive species. We went beyond a genus-, family- or habitat-specific approach and analysed data for plant reproductive traits from the global literature, to investigate whether: (1) seed mass and production differ between the original and introduced ranges of invasive species; (2) seed mass and production differ between invasives and natives; and (3) invasives produce more seeds per unit seed mass than natives.
Location Global.
Methods We combined an existing data set of native plant reproductive data with a new data compilation for invasive species. We used t-tests to compare original and introduced range populations, two-way ANOVAs to compare natives and invasives, and an ANCOVA to examine the relationship between seed mass and production for natives and invasives. The ANCOVA was performed again incorporating phylogenetically independent contrasts to overcome any phylogenetic bias in the data sets.
Results Neither seed mass nor seed production of invasive species differed between their introduced and original ranges. We found no significant difference in seed mass between invasives and natives after growth form had been accounted for. Seed production was greater for invasive species overall and within herb and woody growth forms. For a given seed mass, invasive species produced 6.7-fold (all species), 6.9-fold (herbs only) and 26.1-fold (woody species only) more seeds per individual per year than native species. The phylogenetic ANCOVA verified that this trend did not appear to be influenced by phylogenetic bias within either data set.
Main conclusions This study provides the first global examination of both seed mass and production traits in native and invasive species. Invasive species express a strategy of greater seed production both overall and per unit seed mass compared with natives. The consequent increased likelihood of establishment from long-distance seed dispersal may significantly contribute to the invasiveness of many exotic species.

8.
In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller intraclass correlations (ICCs) lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random, and cases in which data are missing at random are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared.

9.
Although multiple gene sequences are becoming increasingly available for molecular phylogenetic inference, the analysis of such data has largely relied on inference methods designed for single genes. One of the common approaches to analyzing data from multiple genes is concatenation of the individual gene data to form a single supergene to which traditional phylogenetic inference procedures - e.g., maximum parsimony (MP) or maximum likelihood (ML) - are applied. Recent empirical studies have demonstrated that concatenation of sequences from multiple genes prior to phylogenetic analysis often results in inference of a single, well-supported phylogeny. Theoretical work, however, has shown that the coalescent can produce substantial variation in single-gene histories. Using simulation, we combine these ideas to examine the performance of the concatenation approach under conditions in which the coalescent produces a high level of discord among individual gene trees and show that it leads to statistically inconsistent estimation in this setting. Furthermore, use of the bootstrap to measure support for the inferred phylogeny can result in moderate to strong support for an incorrect tree under these conditions. These results highlight the importance of incorporating variation in gene histories into multilocus phylogenetics.

10.
Dorazio RM  Royle JA 《Biometrics》2003,59(2):351-364
We develop a parameterization of the beta-binomial mixture that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals. Three classes of mixture models (beta-binomial, logistic-normal, and latent-class) are fitted to recaptures of snowshoe hares for estimating abundance and to counts of bird species for estimating species richness. In both sets of data, rates of detection appear to vary more among individuals (animals or species) than among sampling occasions or locations. The estimates of population size and species richness are sensitive to model-specific assumptions about the latent distribution of individual rates of detection. We demonstrate using simulation experiments that conventional diagnostics for assessing model adequacy, such as deviance, cannot be relied on for selecting classes of mixture models that produce valid inferences about population size. Prior knowledge about sources of individual heterogeneity in detection rates, if available, should be used to help select among classes of mixture models that are to be used for inference.

11.
In many biometrical applications, the count data encountered often contain extra zeros relative to the Poisson distribution. Zero-inflated Poisson regression models are useful for analyzing such data, but parameter estimates may be seriously biased if the nonzero observations are over-dispersed and simultaneously correlated due to the sampling design or the data collection procedure. In this paper, a zero-inflated negative binomial mixed regression model is presented to analyze a set of pancreas disorder length of stay (LOS) data that comprised mainly same-day separations. Random effects are introduced to account for inter-hospital variations and the dependency of clustered LOS observations. Parameter estimation is achieved by maximizing an appropriate log-likelihood function using an EM algorithm. Alternative modeling strategies, namely the finite mixture of Poisson distributions and the non-parametric maximum likelihood approach, are also considered. The determination of pertinent covariates would assist hospital administrators and clinicians to manage LOS and expenditures efficiently.
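In its simplest form — plain zero-inflated Poisson, without the paper's negative binomial dispersion or random effects — the zero-inflation idea is just a two-part likelihood: a zero either is structural (probability pi) or comes from the Poisson part. A hedged sketch of that likelihood kernel:

```python
import math

def zip_loglik(pi, lam, counts):
    """Log-likelihood of a zero-inflated Poisson: with probability pi a count
    is a structural zero; otherwise it is drawn from Poisson(lam)."""
    ll = 0.0
    for y in counts:
        if y == 0:
            # Zero can arise from either component.
            ll += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            # Nonzero counts must come from the Poisson component.
            ll += (math.log(1.0 - pi) - lam + y * math.log(lam)
                   - math.lgamma(y + 1))
    return ll

# Toy counts with an excess of zeros relative to Poisson(2.5).
counts = [0] * 8 + [2, 3, 1, 4]
```

For zero-heavy data like this, allowing pi > 0 raises the log-likelihood relative to a plain Poisson (pi = 0), which is exactly the signal the zero-inflated model exploits.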

12.
Parker CB  Delong ER 《Biometrics》2000,56(4):996-1001
Changes in maximum likelihood parameter estimates due to deletion of individual observations are useful statistics, both for regression diagnostics and for computing robust estimates of covariance. For many likelihoods, including those in the exponential family, these delete-one statistics can be approximated analytically from a one-step Newton-Raphson iteration on the full maximum likelihood solution. But for general conditional likelihoods and the related Cox partial likelihood, the one-step method does not reduce to an analytic solution. For these likelihoods, an alternative analytic approximation that relies on an appropriately augmented design matrix has been proposed. In this paper, we extend the augmentation approach to explicitly deal with discrete failure-time models. In these models, an individual subject may contribute information at several time points, thereby appearing in multiple risk sets before eventually experiencing a failure or being censored. Our extension also allows the covariates to be time dependent. The new augmentation requires no additional computational resources while improving results.

13.
Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, toward the eventual goal of the estimation of a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets in the low end of this range. One approach to estimate a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure in order to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the most well-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.

14.
BACKGROUND: Linkage analysis is a useful tool for detecting genetic variants that regulate a trait of interest, especially genes associated with a given disease. Although penetrance parameters play an important role in determining gene location, they are assigned arbitrary values according to the researcher's intuition or are estimated by the maximum likelihood principle. Several methods exist for evaluating the maximum likelihood estimates of penetrance, although not all of these are supported by software packages and some are biased by marker genotype information, even when disease development is due solely to the genotype of a single allele.
FINDINGS: Programs for exploring the maximum likelihood estimates of penetrance parameters were developed using the R statistical programming language supplemented by external C functions. The software returns a vector of polynomial coefficients of penetrance parameters, representing the likelihood of pedigree data. From the likelihood polynomial supplied by the proposed method, the likelihood value and its gradient can be computed precisely. To reduce the effect of the supplied dataset on the likelihood function, feasible parameter constraints can be introduced into maximum likelihood estimates, thus enabling flexible exploration of the penetrance estimates. An auxiliary program generates a perspective plot allowing visual validation of the model's convergence. The functions are collectively available as the MLEP R package.
CONCLUSIONS: Linkage analysis using penetrance parameters estimated by the MLEP package enables feasible localization of a disease locus. This is shown through a simulation study and by demonstrating how the package is used to explore maximum likelihood estimates. Although the input dataset tends to bias the likelihood estimates, the method yields accurate results superior to an analysis using intuitive penetrance values for diseases with low allele frequencies. MLEP is part of the Comprehensive R Archive Network and is freely available at http://cran.r-project.org/web/packages/MLEP/index.html.

15.
Dispersal is one of the most important factors determining the genetic structure of a population, but good data on dispersal distances are rare because it is difficult to observe a large sample of dispersal events. However, genetic data contain unbiased information about the average dispersal distances in species with a strong sex bias in their dispersal rates. By plotting the genetic similarity between members of the philopatric sex against some measure of the distance between them, the resulting regression line can be used for estimating how far dispersing individuals of the opposite sex have moved before settling. Dispersers showing low genetic similarity to members of the opposite sex will on average have originated from further away. Applying this method to a microsatellite dataset from lions (Panthera leo) shows that their average dispersal distance is 1.3 home ranges with a 95% confidence interval of 0.4-3.0 home ranges. These results are consistent with direct observations of dispersal from our study population and others. In this case, direct observations of dispersal distance were not detectably biased by a failure to detect long-range dispersal, which is thought to be a common problem in the estimation of dispersal distance.
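The regression-inversion step described above can be sketched in a few lines: fit similarity against distance within the philopatric sex, then solve that line for the distance implied by the dispersing sex's mean similarity. The similarity and distance numbers below are invented for illustration:

```python
def mean(xs):
    return sum(xs) / len(xs)

def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Hypothetical data: genetic similarity of philopatric-sex pairs against
# distance between them (in home-range units); similarity declines with distance.
dist = [0, 1, 2, 3, 4]
sim = [0.30, 0.25, 0.21, 0.15, 0.10]
a, b = fit_line(dist, sim)

# Mean similarity of dispersing-sex individuals to the philopatric sex at
# their settlement site; invert the regression to estimate distance moved.
r_bar = 0.24
d_hat = (r_bar - a) / b
```

The inversion only makes sense because the slope is negative: lower similarity maps to a larger estimated dispersal distance, exactly as the abstract argues.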

16.
A common task in microbiology involves determining the composition of a mixed population of individuals by drawing a sample from the population and using some procedure to identify the individuals in the sample. There may be a significant probability that the identification procedure misidentifies some members of the sample (for example, because the available data are insufficient to identify an individual unambiguously), which makes finding the proportions in the underlying population non-trivial. A further complication arises where individuals are present in the population that do not belong to any of the subpopulations recognised by use of the identification procedure. A simple algorithm is presented to address these problems and construct a maximum likelihood estimate of the proportions, together with confidence limits. The technique is illustrated using an example drawn from flow cytometry in which phytoplankton cells are identified from flow cytometry data by an RBF neural network, and the limitations of the approach are discussed.
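A minimal EM implementation of the mixture-proportion estimate, assuming the misclassification (confusion) probabilities of the identification procedure are known. This is a sketch of the general technique, not the paper's algorithm or software:

```python
def em_proportions(counts, confusion, iters=500):
    """ML estimate of true subpopulation proportions pi when the classifier
    reports class j for a true-class-i individual with probability
    confusion[i][j]. Plain EM on the multinomial likelihood."""
    k = len(confusion)
    pi = [1.0 / k] * k
    for _ in range(iters):
        # E-step: expected number of true-class-i individuals.
        expected = [0.0] * k
        for j, n_j in enumerate(counts):
            denom = sum(pi[i] * confusion[i][j] for i in range(k))
            for i in range(k):
                expected[i] += n_j * pi[i] * confusion[i][j] / denom
        # M-step: re-normalize the expected counts.
        total = sum(expected)
        pi = [e / total for e in expected]
    return pi

# Two classes; classifier is right 90% of the time.
confusion = [[0.9, 0.1], [0.1, 0.9]]
observed = [60, 40]  # observed classification counts
pi_hat = em_proportions(observed, confusion)
```

With a 90%-accurate classifier and 60/40 observed counts, the MLE corrects the raw proportion 0.60 up to 0.625, since the observed fraction satisfies 0.60 = 0.9·pi + 0.1·(1 − pi).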

17.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
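The selection scheme the abstract describes — offspring numbers proportional to an individual's rank likelihood score — can be illustrated with a toy genetic algorithm. Bit-strings and a stand-in fitness replace trees and likelihoods here; this is a sketch of the search strategy, not the author's phylogenetic implementation:

```python
import random

def ga_maximize(fitness, dim, pop_size=30, gens=200, seed=1):
    """Toy genetic algorithm: rank-proportional reproduction, single-bit
    mutation, and elitism, applied to bit-string individuals."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        ranked = sorted(pop, key=fitness)        # worst ... best
        weights = list(range(1, pop_size + 1))   # rank-proportional weights
        parents = rng.choices(ranked, weights=weights, k=pop_size - 1)
        pop = [ranked[-1][:]]                    # elitism: keep current best
        for p in parents:
            child = p[:]
            child[rng.randrange(dim)] ^= 1       # single point mutation
            pop.append(child)
    return max(pop, key=fitness)

# "Ones-max" stand-in for a likelihood score.
best = ga_maximize(sum, dim=20)
```

Rank-based (rather than raw-score) weighting keeps selection pressure stable even when fitness values differ by many orders of magnitude, which is the usual situation for tree likelihoods.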

18.
Home-range models implicitly assume equal observation rates across the study area. Because this assumption is frequently violated, we describe methods for correcting home-range models for observation bias. We suggest corrections for 3 general types of home-range models including those for which parameters are estimated using least-squares theory, models utilizing maximum likelihood for parameter estimation, and models based on kernel smoothing techniques. When applied to mule deer (Odocoileus hemionus) location data, we found that uncorrected estimates of the utilization distribution were biased low by as much as 18.4% and biased high by 19.2% when compared to corrected estimates. Because the magnitude of bias is related to several factors, future research should determine the relative influence of each of these factors on home-range bias.
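The simplest correction in this spirit is inverse-probability weighting of location counts: zones where fixes are harder to obtain get their counts scaled up before the utilization distribution is formed. A sketch with hypothetical zone-specific detection rates (not the paper's kernel or likelihood corrections):

```python
def corrected_use(counts, detect_prob):
    """Utilization estimates corrected for unequal observation rates:
    weight each zone's location count by the inverse of its relative
    detection probability, then renormalize to proportions."""
    w = [c / p for c, p in zip(counts, detect_prob)]
    total = sum(w)
    return [x / total for x in w]

counts = [50, 30, 20]     # raw relocations per habitat zone
detect = [1.0, 0.5, 0.8]  # relative probability that a fix is obtained
use = corrected_use(counts, detect)
```

Zone 2, where only half the fixes are obtained, jumps from a raw share of 0.30 to a corrected share of about 0.44 — the direction and rough magnitude of bias the abstract reports for uncorrected estimates.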

19.
Comparative methods analyses have usually assumed that the species phenotypes are the true means for those species. In most analyses, the actual values used are means of samples of modest size. The covariances of contrasts then involve both the covariance of evolutionary changes and a fraction of the within-species phenotypic covariance, the fraction depending on the sample size for that species. Ives et al. have shown how to analyze data in this case when the within-species phenotypic covariances are known. The present model allows them to be unknown and to be estimated from the data. A multivariate normal statistical model is used for multiple characters in samples of finite size from species related by a known phylogeny, under the usual Brownian motion model of change and with equal within-species phenotypic covariances. Contrasts in each character can be obtained both between individuals within a species and between species. Each contrast can be taken for all of the characters. These sets of contrasts, each the same contrast taken for different characters, are independent. The within-set covariances are unequal and depend on the unknown true covariance matrices. An expectation-maximization algorithm is derived for making a reduced maximum likelihood estimate of the covariances of evolutionary change and the within-species phenotypic covariances. It is available in the Contrast program of the PHYLIP package. Computer simulations show that the covariances are biased when the finiteness of sample size is not taken into account and that using the present model corrects the bias. Sampling variation reduces the power of inference of covariation in evolution of different characters. An extension of this method to incorporate estimates of additive genetic covariances from a simple genetic experiment is also discussed.

20.
Beyer J  May B 《Molecular ecology》2003,12(8):2243-2250
We present an algorithm to partition a single generation of individuals into full-sib families using single-locus co-dominant marker data. Pairwise likelihood ratios are used to create a graph that represents the full-sib relationships within the data set. Connected-component and minimum-cut algorithms from graph theory are then employed to find the full-sib families within the graph. The results of a large-scale simulation study show that the algorithm is able to produce accurate partitions when applied to data sets with eight or more loci. Although the algorithm performs best when the distribution of allele frequencies and family sizes in a data set is uniform, the inclusion of more loci or alleles per locus allows accurate partitions to be created from data sets in which these distributions are highly skewed.
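The first stage — thresholding pairwise likelihood ratios and taking connected components of the resulting graph — can be sketched with union-find. The pairwise scores below are hypothetical, and the paper's minimum-cut refinement of over-merged components is omitted:

```python
def sibling_components(n, pair_scores, threshold):
    """Group n individuals into putative full-sib families: connect every
    pair whose likelihood ratio exceeds threshold, then return the
    connected components (union-find with path halving)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), lr in pair_scores.items():
        if lr > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical full-sib vs. unrelated likelihood ratios for 5 individuals.
scores = {(0, 1): 8.2, (1, 2): 5.9, (2, 3): 0.3, (3, 4): 7.1, (0, 4): 0.1}
families = sibling_components(5, scores, threshold=1.0)
```

Here individuals 0-1-2 are joined through shared high-scoring edges even though pair (0, 2) was never scored above threshold directly, which is exactly why a component can over-merge and why the paper follows up with minimum cuts.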
