Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
A population average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine-Gray proportional hazards model for the subdistribution to situations where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study shows that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry.

2.
We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently about 98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8-10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12-15 highly variable markers and only 15-20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.

3.
The Yellowstone National Park bison herd is 1 of only 2 populations known to have continually persisted on their current landscape since pre-Columbian times. Over the last century, the census size of this herd has fluctuated from around 100 individuals to over 3000 animals. Previous studies involving radiotelemetry, tooth wear, and parturition timing provide evidence of at least 2 distinct groups of bison within Yellowstone National Park. To better understand the biology of Yellowstone bison, we investigated the potential for limited gene flow across this population using multilocus Bayesian clustering analysis. Two genetically distinct and clearly defined subpopulations were identified based on both genotypic diversity and allelic distributions. Genetic cluster assignments were highly correlated with sampling locations for a subgroup of live capture individuals. Furthermore, a comparison of the cluster assignments to the 2 principal winter cull sites revealed critical differences in migration patterns across years. The 2 Yellowstone subpopulations display levels of differentiation that are only slightly less than that between populations which have been geographically and reproductively isolated for over 40 years. The identification of cryptic population subdivision and genetic differentiation of this magnitude highlights the importance of this biological phenomenon in the management of wildlife species.

4.
Choi SC, Hey J. Genetics 2011, 189(2): 561-577
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy-Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets.

5.
Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. 
The performance of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use, makes this method a promising solution to infer fine-scale genetic patterns.

6.
Dunson DB, Chen Z, Harry J. Biometrics 2003, 59(3): 521-530
In applications that involve clustered data, such as longitudinal studies and developmental toxicity experiments, the number of subunits within a cluster is often correlated with outcomes measured on the individual subunits. Analyses that ignore this dependency can produce biased inferences. This article proposes a Bayesian framework for jointly modeling cluster size and multiple categorical and continuous outcomes measured on each subunit. We use a continuation ratio probit model for the cluster size and underlying normal regression models for each of the subunit-specific outcomes. Dependency between cluster size and the different outcomes is accommodated through a latent variable structure. The form of the model facilitates posterior computation via a simple and computationally efficient Gibbs sampler. The approach is illustrated with an application to developmental toxicity data, and other applications, to joint modeling of longitudinal and event time data, are discussed.

7.
Faubet P, Gaggiotti OE. Genetics 2008, 178(3): 1491-1504
We present a new multilocus genotype method that makes inferences about recent immigration rates and identifies the environmental factors that are more likely to explain observed gene flow patterns. It also estimates population-specific inbreeding coefficients, allele frequencies, and local population F(ST) values and performs individual assignments. We generate synthetic data sets to determine the region of the parameter space where our method is and is not able to provide accurate estimates. Our simulation study indicates that reliable results can be obtained when the global level of genetic differentiation (F(ST)) is >1%, the number of loci is only 10, and sample sizes are of the order of 50 individuals per population. We illustrate our method by applying it to Pakistani human data, considering altitude and geographic distance as explanatory factors. Our results suggest that altitude explains the genetic data better than geographic distance. Additionally, they show that southern low-altitude populations have higher migration rates than northern high-altitude ones.

8.
A method is proposed that aims at identifying clusters of individuals that show similar patterns when observed repeatedly. We consider linear mixed models, which are widely used for the modeling of longitudinal data. In contrast to the classical assumption of a normal distribution for the random effects, a finite mixture of normal distributions is assumed. Typically, the number of mixture components is unknown and has to be chosen, ideally by data-driven tools. For this purpose, an EM algorithm-based approach is considered that uses a penalized normal mixture as random effects distribution. The penalty term shrinks the pairwise distances of cluster centers based on the group lasso and the fused lasso method. The effect is that individuals with similar time trends are merged into the same cluster. The strength of regularization is determined by one penalization parameter. For finding the optimal penalization parameter, a new model choice criterion is proposed.

9.
Association studies are one of the major strategies for identifying genetic factors underlying complex traits. In samples of related individuals, conventional statistical procedures are not valid for testing association, and maximum likelihood (ML) methods have to be used, but they are computationally demanding and are not necessarily robust to violations of their assumptions. Estimating equations (EE) offer an alternative to ML methods, for estimating association parameters in correlated data. We studied through simulations the behavior of EE in a large range of practical situations, including samples of nuclear families of varying sizes and mixtures of related and unrelated individuals. For a quantitative phenotype, the power of the EE test was comparable to that of a conventional ML test and close to the power expected in a sample of unrelated individuals. For a binary phenotype, the power of the EE test decreased with the degree of clustering, as did the power of the ML test. This result might be partly explained by a modeling of the correlations between responses that is less efficient than that in the quantitative case. In small samples (< 50 families), the variance of the EE association parameter tended to be underestimated, leading to an inflation of the type I error. The heterogeneity of cluster size induced a slight loss of efficiency of the EE estimator, by comparison with balanced samples. The major advantages of the EE technique are its computational simplicity and its great flexibility, easily allowing investigation of gene-gene and gene-environment interactions. It constitutes a powerful tool for testing genotype-phenotype association in related individuals.

10.
Parentage analysis in natural populations is a powerful tool for addressing a wide range of ecological and evolutionary questions. However, identifying parent–offspring pairs in samples collected from natural populations is often more challenging than simply resolving the Mendelian pattern of shared alleles. For example, large numbers of pairwise comparisons and limited numbers of genetic markers can contribute to incorrect assignments, whereby unrelated individuals are falsely identified as parent–offspring pairs. Determining which parentage methods are the least susceptible to making false assignments is an important challenge facing molecular ecologists. In a recent paper, Harrison et al. (2013a) address this challenge by comparing three commonly used parentage methods, including a Bayesian approach, in order to explore the effects of varied proportions of sampled parents on the accuracy of parentage assignments. Unfortunately, Harrison et al. made a simple error in using the Bayesian approach, which led them to incorrectly conclude that this method could not control the rate of false assignment. Here, I briefly outline the basic principles behind the Bayesian approach, identify the error made by Harrison et al., and provide detailed guidelines as to how the method should be correctly applied. Furthermore, using the exact data from Harrison et al., I show that the Bayesian approach actually provides greater control over the number of false assignments than either of the other tested methods. Lastly, I conclude with a brief introduction to solomon, a recently updated version of the Bayesian approach that can account for genotyping error, missing data and false matching.

11.
12.
King R, Brooks SP, Coulson T. Biometrics 2008, 64(4): 1187-1195
SUMMARY: We consider the issue of analyzing complex ecological data in the presence of covariate information and model uncertainty. Several issues can arise when analyzing such data, not least the need to account for missing covariate values. This is most acutely observed in the presence of time-varying covariates. We consider mark-recapture-recovery data, where the corresponding recapture probabilities are less than unity, so that individuals are not always observed at each capture event. This often leads to a large amount of missing time-varying individual covariate information, because the covariate cannot usually be recorded if an individual is not observed. In addition, we address the problem of model selection over these covariates with missing data. We consider a Bayesian approach, where we are able to deal with large amounts of missing data, by essentially treating the missing values as auxiliary variables. This approach also allows a quantitative comparison of different models via posterior model probabilities, obtained via the reversible jump Markov chain Monte Carlo algorithm. To demonstrate this approach we analyze data relating to Soay sheep, which pose several statistical challenges in fully describing the intricacies of the system.

13.
Inference of population structure using multilocus genotype data   Total citations: 243, self-citations: 0, citations by others: 243
We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci, e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.

14.
Nathoo FS, Dean CB. Biometrics 2008, 64(1): 271-279
Summary. Follow-up medical studies often collect longitudinal data on patients. Multistate transitional models are useful for analysis in such studies where at any point in time, individuals may be said to occupy one of a discrete set of states and interest centers on the transition process between states. For example, states may refer to the number of recurrences of an event, or the stage of a disease. We develop a hierarchical modeling framework for the analysis of such longitudinal data when the processes corresponding to different subjects may be correlated spatially over a region. Continuous-time Markov chains incorporating spatially correlated random effects are introduced. Here, joint modeling of both spatial dependence as well as dependence between different transition rates is required and a multivariate spatial approach is employed. A proportional intensities frailty model is developed where baseline intensity functions are modeled using parametric Weibull forms, piecewise-exponential formulations, and flexible representations based on cubic B-splines. The methodology is developed within the context of a study examining invasive cardiac procedures in Quebec. We consider patients admitted for acute coronary syndrome throughout the 139 local health units of the province and examine readmission and mortality rates over a 4-year period.

15.
Landscape heterogeneity plays an important role in population structure and divergence, particularly for species with limited vagility. Here, we used a landscape genetic approach to identify how landscape and environmental variables affect genetic structure and color morph frequency in a polymorphic salamander. The eastern red-backed salamander, Plethodon cinereus, is widely distributed in northeastern North America and contains two common color morphs, striped and unstriped, that are divergent in ecology, behavior, and physiology. To quantify population structure, rates of gene flow, and genetic drift, we amplified 10 microsatellite loci from 648 individuals across 28 sampling localities. This study was conducted in northern Ohio, where populations of P. cinereus exhibit an unusually wide range of morph frequency variation. To test whether genetic distance was more correlated with morph frequency, elevation, canopy cover, waterways, ecological niche or geographic distance, we used resistance distance and least cost path analyses. We then examined whether landscape and environmental variables, genetic distance or geographic distance were correlated with variation in morph frequency. Tests for population structure revealed three genetic clusters across our sampling range, with one cluster monomorphic for the striped morph. Rates of gene flow and genetic drift were low to moderate across sites. Genetic distance was most correlated with ecological niche, elevation and a combination of landscape and environmental variables. In contrast, morph frequency variation was correlated with waterways and geographic distance. Thus, our results suggest that selection is also an important evolutionary force across our sites, and that a balance between gene flow, genetic drift and selection maintains the two color morphs.

16.
Many studies in the fields of genetic epidemiology and applied population genetics are predicated on, or require, an assessment of the genetic background diversity of the individuals chosen for study. A number of strategies have been developed for assessing genetic background diversity. These strategies typically focus on genotype data collected on the individuals in the study, based on a panel of DNA markers. However, many of these strategies are either rooted in cluster analysis techniques, and hence suffer from problems inherent to the assignment of the biological and statistical meaning to resulting clusters, or have formulations that do not permit easy and intuitive extensions. We describe a very general approach to the problem of assessing genetic background diversity that extends the analysis of molecular variance (AMOVA) strategy introduced by Excoffier and colleagues some time ago. As in the original AMOVA strategy, the proposed approach, termed generalized AMOVA (GAMOVA), requires a genetic similarity matrix constructed from the allelic profiles of individuals under study and/or allele frequency summaries of the populations from which the individuals have been sampled. The proposed strategy can be used to either estimate the fraction of genetic variation explained by grouping factors such as country of origin, race, or ethnicity, or to quantify the strength of the relationship of the observed genetic background variation to quantitative measures collected on the subjects, such as blood pressure levels or anthropometric measures. Since the formulation of our test statistic is rooted in multivariate linear models, sets of variables can be related to genetic background in multiple regression-like contexts. GAMOVA can also be used to complement graphical representations of genetic diversity such as tree diagrams (dendrograms) or heatmaps. 
We examine features, advantages, and power of the proposed procedure and showcase its flexibility by using it to analyze a wide variety of published data sets, including data from the Human Genome Diversity Project, classical anthropometry data collected by Howells, and the International HapMap Project.

17.
Hidden Markov modeling (HMM) can be applied to extract single channel kinetics at signal-to-noise ratios that are too low for conventional analysis. There are two general HMM approaches: traditional Baum's reestimation and direct optimization. The optimization approach has the advantage that it optimizes the rate constants directly. This allows setting constraints on the rate constants, fitting multiple data sets across different experimental conditions, and handling nonstationary channels where the starting probability of the channel depends on the unknown kinetics. We present here an extension of this approach that addresses the additional issues of low-pass filtering and correlated noise. The filtering is modeled using a finite impulse response (FIR) filter applied to the underlying signal, and the noise correlation is accounted for using an autoregressive (AR) process. In addition to correlated background noise, the algorithm allows for excess open channel noise that can be white or correlated. To maximize the efficiency of the algorithm, we derive the analytical derivatives of the likelihood function with respect to all unknown model parameters. The search of the likelihood space is performed using a variable metric method. Extension of the algorithm to data containing multiple channels is described. Examples are presented that demonstrate the applicability and effectiveness of the algorithm. Practical issues such as the selection of appropriate noise AR orders are also discussed through examples.

18.
Here we describe a new algorithm for automatically determining the mainchain sequential assignment of NMR spectra for proteins. Using only the customary triple resonance experiments, assignments can be quickly found for not only small proteins having rather complete data, but also for large proteins, even when only half the residues can be assigned. The result of the calculation is not the single best assignment according to some criterion, but rather a large number of satisfactory assignments that are summarized in such a way as to help the user identify portions of the sequence that are assigned with confidence, vs. other portions where the assignment has some correlated alternatives. Thus very imperfect initial data can be used to suggest future experiments.

19.
Statistical species delimitation usually relies on singular data, primarily genetic, for detecting putative species and individual assignment to putative species. Given the variety of speciation mechanisms, singular data may not adequately represent the genetic, morphological and ecological diversity relevant to species delimitation. We describe a methodological framework combining multivariate and clustering techniques that uses genetic, morphological and ecological data to detect and assign individuals to putative species. Our approach recovers a similar number of species recognized using traditional, qualitative taxonomic approaches that are not detected when using purely genetic methods. Furthermore, our approach detects groupings that traditional, qualitative taxonomic approaches do not. This empirical test suggests that our approach to detecting and assigning individuals to putative species could be useful in species delimitation despite varying levels of differentiation across genetic, phenotypic and ecological axes. This work highlights a critical, and often overlooked, aspect of the process of statistical species delimitation: species detection and individual assignment. Irrespective of the species delimitation approach used, all downstream processing relies on how individuals are initially assigned, and the practices and statistical issues surrounding individual assignment warrant careful consideration.

20.
MOTIVATION: The analysis of genetic data poses statistical problems in the form of high dimensionality with small sample sizes. The construction of a composite gene region (sequence pair) heterogeneity measure is one technique for reducing the dimensionality of the problem. This approach however is not without cost, since the contribution of locations to observed gene region differences between groups becomes entangled in this summary measure. This is problematic since it is of scientific interest to identify locations that together depict phenotype. RESULTS: A method is proposed for relating observed gene region heterogeneity back to the location level. In the spirit of a factor analysis-type setting, the approach focuses on identifying a latent variable structure among locations to explain within and between group genetic differences associated with phenotype. The method is flexible for identifying either the additive contribution from individual locations or the additive contribution from a group of locations, to observed gene region heterogeneity, depending upon the weighting scheme used in constructing a gene region heterogeneity measure. The approach is illustrated with clinical trial data, where the problem of altered HIV drug susceptibility is examined through characterizing location contributions to HIV protease gene region differences associated with a phenotypic treatment response. AVAILABILITY: The Splus (MathSoft, Inc. S-Plus 2000, Seattle, WA, 1999) developed menu-driven functions for obtaining results, GENE_ S (J.Kowalski, Harvard School of Public Health, Boston, MA 2001), is available from the author upon request.
