Similar Documents
20 similar documents retrieved.
1.
The aggregation of lipids [total cholesterol (CH) and triglyceride (TG)] and lipoproteins [high-density lipoprotein cholesterol (HDL) and low-density lipoprotein cholesterol (LDL)] in families ascertained through random and nonrandom probands in the Iowa Lipid Research Clinics family study was examined. Nonrandom probands were selected because their lipid levels (at a prior screening visit) exceeded a certain pre-specified threshold. The statistical method conditions the likelihood function on the actual event that the proband's value is beyond the threshold. This method allows for estimation of the path model parameters in randomly and nonrandomly ascertained families jointly and separately, thus enabling tests of heterogeneity between the two types of samples. Marked heterogeneity between the random and the hyperlipidemic samples is detected in the multifactorial transmission for TG and HDL, and moderate heterogeneity is detected for CH and LDL, with a pattern of higher genetic heritability estimates in the random than nonrandom samples. The observed pattern of heterogeneity is compatible with a higher prevalence in the random sample of certain dyslipoproteinemias that are associated with nonelevated lipids. For the random samples, genetic heritabilities are higher for CH and HDL (about 60%) than for TG and LDL (about 50%). For the nonrandom samples those estimates are about 45, 40, 35 and 30% for HDL, CH, LDL and TG, respectively. Little to no cultural (familial environmental) heritability is evident for CH and LDL, although 10-20% of the phenotypic variance is due to cultural factors for TG and HDL. These results suggest that the etiologies for lipids and lipoproteins may be quite different in random versus hyperlipidemic samples.
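
A minimal sketch of the ascertainment correction described above, assuming a single normally distributed trait and one proband per family: the likelihood contribution of an ascertained proband is divided by the probability that the trait exceeds the screening threshold. The one-parameter setup, threshold value, and variable names are illustrative, not the path model actually fitted in the study.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(params, probands, threshold):
    """Normal likelihood conditioned on the proband exceeding the screening threshold."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                    # keep sigma positive
    log_f = norm.logpdf(probands, mu, sigma)     # unconditional density
    log_tail = norm.logsf(threshold, mu, sigma)  # P(trait > threshold)
    return -np.sum(log_f - log_tail)             # ascertainment-corrected log-likelihood

# toy data: probands ascertained because their lipid value exceeded 200
rng = np.random.default_rng(0)
population = rng.normal(150.0, 40.0, size=5000)
probands = population[population > 200.0]

fit = minimize(neg_loglik, x0=[200.0, np.log(30.0)],
               args=(probands, 200.0), method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(f"corrected: mu={mu_hat:.1f}, sigma={sigma_hat:.1f}")
print(f"naive:     mu={probands.mean():.1f}, sigma={probands.std():.1f}")
```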

2.
Growing interest in adaptive evolution in natural populations has spurred efforts to infer genetic components of variance and covariance of quantitative characters. Here, I review difficulties inherent in the usual least-squares methods of estimation. A useful alternative approach is that of maximum likelihood (ML). Its particular advantage over least squares is that estimation and testing procedures are well defined, regardless of the design of the data. A modified version of ML, REML, eliminates the bias of ML estimates of variance components. Expressions for the expected bias and variance of estimates obtained from balanced, fully hierarchical designs are presented for ML and REML. Analyses of data simulated from balanced, hierarchical designs reveal differences in the properties of ML, REML, and F-ratio tests of significance. A second simulation study compares properties of REML estimates obtained from a balanced, fully hierarchical design (within-generation analysis) with those from a sampling design including phenotypic data on parents and multiple progeny. It also illustrates the effects of imposing nonnegativity constraints on the estimates. Finally, it reveals that predictions of the behavior of significance tests based on asymptotic theory are not accurate when sample size is small and that constraining the estimates seriously affects properties of the tests. Because of their great flexibility, likelihood methods can serve as a useful tool for estimation of quantitative-genetic parameters in natural populations. Difficulties involved in hypothesis testing remain to be solved.
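
For a balanced one-way random-effects design (a groups of size n), the ML and REML variance-component estimates have closed forms that make the bias discussed above directly visible: ML divides the between-group sum of squares by a rather than a - 1. A minimal numpy sketch under that assumed design, using the interior-solution formulas and ignoring the nonnegativity constraints the review also discusses:

```python
import numpy as np

def variance_components(y):
    """ML and REML variance components for a balanced one-way random-effects model.
    y: (a, n) array of a groups (e.g., sires) with n observations each."""
    a, n = y.shape
    group_means = y.mean(axis=1)
    ssb = n * np.sum((group_means - y.mean()) ** 2)   # between-group sum of squares
    sse = np.sum((y - group_means[:, None]) ** 2)     # within-group sum of squares
    msw = sse / (a * (n - 1))                         # within-group mean square
    reml_between = (ssb / (a - 1) - msw) / n          # REML (= ANOVA estimator when interior)
    ml_between = (ssb / a - msw) / n                  # ML: divisor a instead of a - 1, biased down
    return {"within": msw, "between_REML": reml_between, "between_ML": ml_between}

rng = np.random.default_rng(1)
group_effects = rng.normal(0.0, np.sqrt(2.0), size=50)                 # true between-group variance 2
y = group_effects[:, None] + rng.normal(0.0, np.sqrt(8.0), (50, 10))   # true within-group variance 8
print(variance_components(y))
```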

3.
Mandel M, Betensky RA. Biometrics 2007, 63(2):405-412.
Several goodness-of-fit tests of a lifetime distribution have been suggested in the literature; many take into account censoring and/or truncation of event times. In some contexts, a goodness-of-fit test for the truncation distribution is of interest. In particular, better estimates of the lifetime distribution can be obtained when knowledge of the truncation law is exploited. In cross-sectional sampling, for example, there are theoretical justifications for the assumption of a uniform truncation distribution, and several studies have used it to improve the efficiency of their survival estimates. The duality of lifetime and truncation in the absence of censoring enables methods for testing goodness of fit of the lifetime distribution to be used for testing goodness of fit of the truncation distribution. However, under random censoring, this duality does not hold and different tests are required. In this article, we introduce several goodness-of-fit tests for the truncation distribution and investigate their performance in the presence of censored event times using simulation. We demonstrate the use of our tests on two data sets.

4.
Prosopis represents a valuable forest resource in arid and semiarid regions. Management of promising species requires information about genetic parameters, mainly the heritability (h2) of quantitative profitable traits. This parameter is traditionally estimated from progeny tests or half-sib analysis conducted in experimental stands. Such an approach estimates h2 from the ratio of between-family to total phenotypic variance. These analyses are difficult to apply to natural populations of species with a long life cycle, overlapping generations, and a mixed mating system, without genealogical information. A promising alternative is the use of molecular marker information to infer relatedness between individuals and to estimate h2 from the regression of phenotypic similarity on inferred relatedness. In the current study we compared h2 estimates for 13 quantitative traits obtained by these two methods in an experimental stand of P. alba, where genealogical information was available. We inferred pairwise relatedness by Ritland's method using six microsatellite loci. Relatedness and heritability estimates from molecular information were highly correlated with the values obtained from genealogical data. Although Ritland's method yields lower h2 estimates and tends to overestimate genetic correlations between traits, this approach is useful to predict the expected relative gain of different quantitative traits under selection without genealogical information.
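
A minimal sketch in the spirit of the marker-based approach above, assuming pairwise relatedness estimates are already in hand: pairwise phenotypic similarity (the product of standardized deviations) is regressed on relatedness, and the slope is read as an estimate of h2. Ritland's correction, which rescales by the actual rather than estimated variance of relatedness, is omitted, so this is illustrative only.

```python
import numpy as np
from itertools import combinations

def marker_based_h2(phenotype, relatedness):
    """Slope of pairwise phenotypic similarity on pairwise relatedness (rough h2 estimate).
    phenotype: (N,) trait values; relatedness: (N, N) matrix of inferred relatedness."""
    z = (phenotype - phenotype.mean()) / phenotype.std()
    pairs = list(combinations(range(len(phenotype)), 2))
    similarity = np.array([z[i] * z[j] for i, j in pairs])   # phenotypic similarity per pair
    r = np.array([relatedness[i, j] for i, j in pairs])      # inferred relatedness per pair
    return np.polyfit(r, similarity, 1)[0]                   # regression slope

# hypothetical usage, with trait values and a microsatellite-based relatedness matrix:
# h2_hat = marker_based_h2(trait_values, r_hat)
```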

5.
6.
The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people, concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series for which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of obtaining haplotype-specific risk estimates for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of methods, based on haplotype imputation, that have been recently described (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002) as providing score tests of the null hypothesis of no effect of SNP haplotypes upon risk, which may be used for more complex tasks, such as providing confidence intervals, and tests of equivalence of haplotype-specific risks in two or more separate populations. In order to do so we (1) develop a cohort approach towards odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach, to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection, and (3) finally, in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study) we compare likelihood-based confidence interval estimates from the two methods with each other, and with the use of the single-imputation approach of Zaykin et al. applied under both null and alternative hypotheses. We conclude that so long as haplotypes are well predicted by SNP genotypes (we use the Rh2 criterion of Stram et al. [1]) the differences between the three methods are very small and in particular that the single-imputation method may be expected to work extremely well.
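
The E-M machinery referred to above can be illustrated with the textbook two-SNP case, where only doubly heterozygous subjects have ambiguous phase. The sketch below estimates the four haplotype frequencies only; carrying haplotype-specific odds ratios through the M-step and weighting by case/control selection probabilities, as in points (1)-(2), is the part specific to the paper and is not shown.

```python
import numpy as np

def em_haplotype_freqs(genotypes, n_iter=200):
    """EM estimates of two-SNP haplotype frequencies from unphased genotypes.
    genotypes: (N, 2) minor-allele counts (0/1/2) at SNP1 and SNP2.
    Returned frequencies are ordered as haplotypes 00, 01, 10, 11."""
    genotypes = np.asarray(genotypes)
    p = np.full(4, 0.25)                            # start from equal frequencies
    for _ in range(n_iter):
        counts = np.zeros(4)
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # double heterozygote: phase unknown, split by current frequencies (E-step)
                w_cis, w_trans = p[0] * p[3], p[1] * p[2]
                total = w_cis + w_trans
                counts[[0, 3]] += w_cis / total     # expected 00/11 resolution
                counts[[1, 2]] += w_trans / total   # expected 01/10 resolution
            else:
                # at most one heterozygous SNP: the two haplotypes are fully determined
                a = (1, 1) if g1 == 2 else (0, 0) if g1 == 0 else (1, 0)
                b = (1, 1) if g2 == 2 else (0, 0) if g2 == 0 else (1, 0)
                counts[2 * a[0] + b[0]] += 1
                counts[2 * a[1] + b[1]] += 1
        p = counts / (2 * len(genotypes))           # M-step: renormalize expected counts
    return p

# example: unphased genotypes for eight subjects
print(em_haplotype_freqs([(0, 0), (2, 2), (1, 1), (1, 1), (2, 1), (1, 0), (0, 1), (2, 2)]))
```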

7.
Susan Murray. Biometrics 2001, 57(2):361-368.
This research introduces methods for nonparametric testing of weighted integrated survival differences in the context of paired censored survival designs. The current work extends work done by Pepe and Fleming (1989, Biometrics 45, 497-507), which considered similar test statistics directed toward independent treatment group comparisons. An asymptotic closed-form distribution of the proposed family of tests is presented, along with variance estimates constructed under null and alternative hypotheses using nonparametric maximum likelihood estimates of the closed-form quantities. The described method allows for additional information from individuals with no corresponding matched pair member to be incorporated into the test statistic in sampling scenarios where singletons are not prone to selection bias. Simulations presented over a range of potential dependence in the paired censored survival data demonstrate substantial power gains associated with taking into account the dependence structure. Consequences of ignoring the paired nature of the data include overly conservative tests in terms of power and size. In fact, simulation results using tests for independent samples in the presence of positive correlation consistently undershot both size and power targets that would have been attained in the absence of correlation. This additional worrisome effect on operating characteristics highlights the need for accounting for dependence in this popular family of tests.
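
A sketch of the numerator of such a test, assuming the lifelines package is available: the (here unweighted) integrated difference between two Kaplan-Meier curves up to a common follow-up time tau. The paired variance estimator, which is the substantive contribution of the paper, is not reproduced.

```python
import numpy as np
from lifelines import KaplanMeierFitter

def integrated_km_difference(t1, e1, t2, e2, tau, n_grid=500):
    """Integral over [0, tau] of S1(t) - S2(t), each estimated by Kaplan-Meier.
    t*, e*: event/censoring times and event indicators for the two treatment arms."""
    grid = np.linspace(0.0, tau, n_grid)
    km1, km2 = KaplanMeierFitter(), KaplanMeierFitter()
    km1.fit(t1, event_observed=e1)
    km2.fit(t2, event_observed=e2)
    s1 = km1.survival_function_at_times(grid).to_numpy()
    s2 = km2.survival_function_at_times(grid).to_numpy()
    return np.trapz(s1 - s2, grid)   # survival time "gained" by arm 1 over [0, tau]

# usage: stat = integrated_km_difference(time_A, event_A, time_B, event_B, tau=5.0)
```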

8.
In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that the choice of model-selection method has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for approximately 80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies approximately 50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with K2P, indicating that not all models perform in an equivalent manner. Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although the choice of model-selection method has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.
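
The information criteria compared above are just penalized log-likelihoods, so given the maximized log-likelihood, free-parameter count, and alignment length for each candidate substitution model, selection reduces to an argmin. A minimal sketch with hypothetical values (parameter counts exclude branch lengths; the hLRT and DT procedures are not shown):

```python
import math

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n_sites):
    return -2.0 * loglik + k * math.log(n_sites)

# hypothetical maximized log-likelihoods and substitution-parameter counts
candidates = {"JC69": (-5321.4, 0), "HKY85": (-5220.7, 4), "GTR+G": (-5199.2, 9)}
n_sites = 1200   # alignment length used as the BIC sample size

for name, (ll, k) in candidates.items():
    print(f"{name:7s} AIC={aic(ll, k):9.1f}  BIC={bic(ll, k, n_sites):9.1f}")

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print("AIC selects", best_aic, "| BIC selects", best_bic)
```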

9.
Characterizing dispersal kernels from truncated data is important for managing and predicting population dynamics. We used mark-recapture data from 10 previously published replicated experiments at three host plant development stages (seedling, tillering, and heading) to estimate parameters of the normal and exponential dispersal kernels for green rice leafhopper, Nephotettix cincticeps (Uhler). We compared two classic statistical methods for estimating untruncated distribution parameters from truncated data, maximum likelihood estimation (MLE) and the method of statistical moments, using simulated and empirical data. Simulations showed that both methods provided accurate parameter estimates with similar precision. The method of moments is algebraically complex, but simple to calculate, while the MLE methods require numerical solutions of nonlinear equations. Simulations also showed that accurate, precise estimates of the parameters of the untruncated distributions could be attained even under severe truncation with sufficient numbers of recaptures. Both diffusivity and the exponential mean were higher with later plant growth stage, showing that insects moved farther and faster at the heading stage. Precision of the estimates was not strongly related to percent capture, size of the experimental field, or the number of leafhoppers captured. The leptokurtic exponential kernel fit the data better than the normal kernel for all the experiments. These results support an alternative explanation for the strong density-dependent population regulation of this species at the heading stage. Instead of leafhopper density per se, the increase in movement at this stage could integrate the populations in the separate fields, leveling densities throughout the landscape.
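
A minimal sketch of the maximum likelihood route for the exponential kernel, assuming the only data are recapture distances within a known truncation radius: the exponential density is renormalized by the probability of landing inside the observable window, and the mean dispersal distance is found numerically. Field geometry, recapture effort, and the normal kernel are ignored.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def truncated_exp_mle(distances, r_max):
    """MLE of the exponential mean dispersal distance from distances truncated at r_max."""
    d = np.asarray(distances)

    def neg_loglik(mean):
        log_f = -np.log(mean) - d / mean              # exponential log-density
        log_norm = np.log1p(-np.exp(-r_max / mean))   # log P(distance <= r_max)
        return -np.sum(log_f - log_norm)              # truncation-corrected log-likelihood

    return minimize_scalar(neg_loglik, bounds=(1e-3, 10 * r_max), method="bounded").x

rng = np.random.default_rng(2)
true_mean, r_max = 12.0, 20.0                  # metres; severe truncation
observed = rng.exponential(true_mean, 2000)
observed = observed[observed <= r_max]         # only recaptures inside the field are seen
print("naive mean:", round(observed.mean(), 2),
      " corrected MLE:", round(truncated_exp_mle(observed, r_max), 2))
```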

10.
Comparative methods analyses have usually assumed that the species phenotypes are the true means for those species. In most analyses, the actual values used are means of samples of modest size. The covariances of contrasts then involve both the covariance of evolutionary changes and a fraction of the within-species phenotypic covariance, the fraction depending on the sample size for that species. Ives et al. have shown how to analyze data in this case when the within-species phenotypic covariances are known. The present model allows them to be unknown and to be estimated from the data. A multivariate normal statistical model is used for multiple characters in samples of finite size from species related by a known phylogeny, under the usual Brownian motion model of change and with equal within-species phenotypic covariances. Contrasts in each character can be obtained both between individuals within a species and between species. Each contrast can be taken for all of the characters. These sets of contrasts, each the same contrast taken for different characters, are independent. The within-set covariances are unequal and depend on the unknown true covariance matrices. An expectation-maximization algorithm is derived for making a reduced maximum likelihood estimate of the covariances of evolutionary change and the within-species phenotypic covariances. It is available in the Contrast program of the PHYLIP package. Computer simulations show that the covariances are biased when the finiteness of sample size is not taken into account and that using the present model corrects the bias. Sampling variation reduces the power of inference of covariation in evolution of different characters. An extension of this method to incorporate estimates of additive genetic covariances from a simple genetic experiment is also discussed.

11.
Phylogenetic analyses of DNA sequences were conducted to evaluate four alternative hypotheses of phrynosomatine sand lizard relationships. Sequences comprising 2871 aligned base pair positions representing the regions spanning ND1-COI and cyt b-tRNA(Thr) of the mitochondrial genome from all recognized sand lizard species were analyzed using unpartitioned parsimony and likelihood methods, likelihood methods with assumed partitions, Bayesian methods with assumed partitions, and Bayesian mixture models. The topology (Uma, (Callisaurus, (Cophosaurus, Holbrookia))) and thus monophyly of the "earless" taxa, Cophosaurus and Holbrookia, is supported by all analyses. Previously proposed topologies in which Uma and Callisaurus are sister taxa and those in which Holbrookia is the sister group to all other sand lizard taxa are rejected using both parsimony and likelihood-based significance tests with the combined, unpartitioned data set. Bayesian hypothesis tests also reject those topologies using six assumed partitioning strategies, and the two partitioning strategies presumably associated with the most powerful tests also reject a third previously proposed topology, in which Callisaurus and Cophosaurus are sister taxa. For both maximum likelihood and Bayesian methods with assumed partitions, those partitions defined by codon position and tRNA stems and nonstems explained the data better than other strategies examined. Bayes factor estimates comparing results of assumed partitions versus mixture models suggest that mixture models perform better than assumed partitions when the latter were not based on functional characteristics of the data, such as codon position and tRNA stems and nonstems. However, assumed partitions performed better than mixture models when functional differences were incorporated. We reiterate the importance of accounting for heterogeneous evolutionary processes in the analysis of complex data sets and emphasize the importance of implementing mixed model likelihood methods.

12.
Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider use of SNPs to estimate the parameter Theta = 4Neμ (the scaled product of effective population size and per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of Theta. With finite amounts of data the estimates are accurate when Theta is high, but tend to be biased upward when Theta is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of Theta than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of Theta. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.

13.
The spatial scan statistic is a widely applied tool for detecting geographical clusters of events and plays an increasingly important role in many fields. The classic version of the spatial scan statistic for binary outcomes was developed by Kulldorff and is based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, this null-hypothesis likelihood offers an alternative, indirect way to identify the potential cluster, with the test statistic defined as the extreme value of the likelihood function. As in Kulldorff's methods, we use a Monte Carlo test to assess significance. Both methods are applied to detect spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation using independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise, Kulldorff's statistics are superior.
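
A schematic of the construction described above, under simplifying assumptions: each region has an integer population and case count, candidate clusters are supplied as lists of region indices, each window is scored by the hypergeometric log-likelihood of its case count under the null of no clustering, and significance of the most extreme (least likely) window comes from a Monte Carlo redistribution of all cases. The window-generation and scoring details of the paper are not reproduced.

```python
import numpy as np
from scipy.stats import hypergeom

def scan_statistic(cases, pops, windows):
    """Score candidate windows by the hypergeometric null log-likelihood of their case counts.
    cases, pops: integer arrays per region; windows: list of lists of region indices."""
    C, N = cases.sum(), pops.sum()
    scores = [hypergeom.logpmf(cases[w].sum(), N, C, pops[w].sum()) for w in windows]
    best = int(np.argmin(scores))        # least likely window = candidate cluster
    return best, scores[best]

def monte_carlo_pvalue(cases, pops, windows, n_rep=999, seed=0):
    rng = np.random.default_rng(seed)
    _, observed = scan_statistic(cases, pops, windows)
    C = int(cases.sum())
    exceed = 0
    for _ in range(n_rep):
        sim = rng.multivariate_hypergeometric(pops, C)   # cases redistributed at random
        _, stat = scan_statistic(sim, pops, windows)
        exceed += stat <= observed
    return (exceed + 1) / (n_rep + 1)
```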

14.
D C Thomas, M Blettner, N E Day. Biometrics 1992, 48(3):781-794.
A method is proposed for analysis of nested case-control studies that combines the matched comparison of covariate values between cases and controls and a comparison of the observed numbers of cases in the nesting cohort with expected numbers based on external rates and average relative risks estimated from the controls. The former comparison is based on the conditional likelihood for matched case-control studies and the latter on the unconditional likelihood for Poisson regression. It is shown that the two likelihoods are orthogonal and that their product is an estimator of the full survival likelihood that would have been obtained on the total cohort, had complete covariate data been available. Parameter estimation and significance tests follow in the usual way by maximizing this product likelihood. The method is illustrated using data on leukemia following irradiation for cervical cancer. In this study, the original cohort study showed a clear excess of leukemia in the first 15 years after exposure, but it was not feasible to obtain dose estimates on the entire cohort. However, the subsequent nested case-control study failed to demonstrate significant differences between alternative dose-response relations and effects of time-related modifiers. The combined analysis allows much clearer discrimination between alternative dose-time-response models.
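
A schematic of the product likelihood described above, assuming matched sets with one case each and a single cohort stratum with an externally derived expected count; the cohort relative risk is averaged over controls, as the abstract indicates. The single-covariate log-linear dose-response model and all variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_combined_loglik(beta, matched_sets, expected, observed, control_x):
    """Conditional-logistic likelihood for matched sets times a Poisson cohort likelihood.
    matched_sets: list of (x_case, array_of_x_controls); expected/observed: cohort counts;
    control_x: covariate values for all controls, used for the average relative risk."""
    b = np.atleast_1d(beta)[0]
    cond = 0.0
    for x_case, x_controls in matched_sets:
        eta = b * np.concatenate(([x_case], np.asarray(x_controls)))
        cond += b * x_case - np.log(np.sum(np.exp(eta)))          # conditional logistic term
    mu = expected * np.mean(np.exp(b * np.asarray(control_x)))    # external expectation x avg RR
    poisson = observed * np.log(mu) - mu                          # Poisson cohort term
    return -(cond + poisson)

# usage sketch with hypothetical dose data:
# fit = minimize(neg_combined_loglik, x0=[0.0],
#                args=(sets, expected_cases, observed_cases, control_doses),
#                method="Nelder-Mead")
```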

15.
Background: Linkage analysis is a useful tool for detecting genetic variants that regulate a trait of interest, especially genes associated with a given disease. Although penetrance parameters play an important role in determining gene location, they are assigned arbitrary values according to the researcher's intuition or as estimated by the maximum likelihood principle. Several methods exist by which to evaluate the maximum likelihood estimates of penetrance, although not all of these are supported by software packages and some are biased by marker genotype information, even when disease development is due solely to the genotype of a single allele. Findings: Programs for exploring the maximum likelihood estimates of penetrance parameters were developed using the R statistical programming language supplemented by external C functions. The software returns a vector of polynomial coefficients of penetrance parameters, representing the likelihood of pedigree data. From the likelihood polynomial supplied by the proposed method, the likelihood value and its gradient can be precisely computed. To reduce the effect of the supplied dataset on the likelihood function, feasible parameter constraints can be introduced into maximum likelihood estimates, thus enabling flexible exploration of the penetrance estimates. An auxiliary program generates a perspective plot allowing visual validation of the model's convergence. The functions are collectively available as the MLEP R package. Conclusions: Linkage analysis using penetrance parameters estimated by the MLEP package enables feasible localization of a disease locus. This is shown through a simulation study and by demonstrating how the package is used to explore maximum likelihood estimates. Although the input dataset tends to bias the likelihood estimates, the method yields accurate results superior to the analysis using intuitive penetrance values for diseases with low allele frequencies. MLEP is part of the Comprehensive R Archive Network and is freely available at http://cran.r-project.org/web/packages/MLEP/index.html.

16.
Diagnostic studies in ophthalmology frequently involve binocular data where pairs of eyes are evaluated, through some diagnostic procedure, for the presence of certain diseases or pathologies. The simplest approach to estimating measures of diagnostic accuracy, such as sensitivity and specificity, treats eyes as independent, consequently yielding incorrect estimates, especially of the standard errors. Approaches that account for the inter-eye correlation include regression methods using generalized estimating equations and likelihood techniques based on various correlated binomial models. The paper proposes a simple alternative statistical methodology for jointly estimating measures of diagnostic accuracy for binocular tests based on a flexible model for correlated binary data. Method-of-moments estimation of the model parameters is outlined and asymptotic inference is discussed. The resulting estimates are straightforward and easy to obtain, requiring no special statistical software but only elementary calculations. Results of simulations indicate that large-sample and bootstrap confidence intervals based on the estimates have relatively good coverage properties when the model is correctly specified. The computation of the estimates and their standard errors is illustrated with data from a study on diabetic retinopathy.
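
A minimal sketch of the moment-based idea for paired binary outcomes, assuming both eyes of each person are diseased so that the eye-level positive rate estimates sensitivity: the inter-eye correlation comes from the excess of concordant positive pairs, and a person-level bootstrap supplies the confidence interval. The specific correlated-binomial model of the paper is not reproduced.

```python
import numpy as np

def paired_sensitivity(results, n_boot=2000, seed=0):
    """results: (n_persons, 2) array of 0/1 test outcomes for the two eyes of each person."""
    results = np.asarray(results)

    def estimates(r):
        p = r.mean()                                   # sensitivity, pooling both eyes
        both_pos = np.mean(r[:, 0] * r[:, 1])          # P(both eyes test positive)
        rho = (both_pos - p ** 2) / (p * (1 - p))      # inter-eye correlation (moment estimate)
        return p, rho

    p_hat, rho_hat = estimates(results)
    rng = np.random.default_rng(seed)
    boot = np.array([estimates(results[rng.integers(0, len(results), len(results))])
                     for _ in range(n_boot)])          # resample persons, not eyes
    ci = tuple(np.percentile(boot[:, 0], [2.5, 97.5])) # bootstrap CI for sensitivity
    return p_hat, rho_hat, ci
```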

17.
We derive the nonparametric maximum likelihood estimate (NPMLE) of the cumulative incidence functions for competing risks survival data subject to interval censoring and truncation. Since the cumulative incidence function NPMLEs give rise to an estimate of the survival distribution which can be undefined over a potentially larger set of regions than the NPMLE of the survival function obtained ignoring failure type, we consider an alternative pseudolikelihood estimator. The methods are then applied to data from a cohort of injecting drug users in Thailand susceptible to infection from HIV-1 subtypes B and E.

18.
Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had higher coverage than the coalescent model, i.e. it contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infected individuals. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population.

19.
Approximate nonparametric maximum likelihood estimation of the tumor incidence rate and comparison of tumor incidence rates between treatment groups are examined in the context of animal carcinogenicity experiments that have interval sacrifice data but lack cause-of-death information. The estimation procedure introduced by Malani and Van Ryzin (1988), which can result in a negative estimate of the tumor incidence rate, is modified by employing a numerical method to maximize the likelihood function iteratively, under the constraint that the tumor incidence rate is nonnegative. With the new procedure, estimates can be obtained even if sacrifices occur anywhere within an interval. The resulting estimates have reduced standard error and give more power to the test of two heterogeneous groups. Furthermore, a linear contrast of more than two groups can be tested using our procedure. The proposed estimation and testing methods are illustrated with an experimental data set.

20.
Summary: Many major genes have been identified that strongly influence the risk of cancer. However, there are typically many different mutations that can occur in the gene, each of which may or may not confer increased risk. It is critical to identify which specific mutations are harmful, and which ones are harmless, so that individuals who learn from genetic testing that they have a mutation can be appropriately counseled. This is a challenging task, since new mutations are continually being identified, and there is typically relatively little evidence available about each individual mutation. In an earlier article, we employed hierarchical modeling (Capanu et al., 2008, Statistics in Medicine 27, 1973–1992) using the pseudo-likelihood and Gibbs sampling methods to estimate the relative risks of individual rare variants using data from a case–control study and showed that one can draw strength from the aggregating power of hierarchical models to distinguish the variants that contribute to cancer risk. However, further research is needed to validate the application of asymptotic methods to such sparse data. In this article, we use simulations to study in detail the properties of the pseudo-likelihood method for this purpose. We also explore two alternative approaches: pseudo-likelihood with correction for the variance component estimate as proposed by Lin and Breslow (1996, Journal of the American Statistical Association 91, 1007–1016) and a hybrid pseudo-likelihood approach with Bayesian estimation of the variance component. We investigate the validity of these hierarchical modeling techniques by looking at the bias and coverage properties of the estimators as well as at the efficiency of the hierarchical modeling estimates relative to that of the maximum likelihood estimates. The results indicate that the estimates of the relative risks of very sparse variants have small bias, and that the estimated 95% confidence intervals are typically anti-conservative, though the actual coverage rates are generally above 90%. The widths of the confidence intervals narrow as the residual variance in the second-stage model is reduced. The results also show that the hierarchical modeling estimates have shorter confidence intervals relative to estimates obtained from conventional logistic regression, and that these relative improvements increase as the variants become more rare.
