Similar literature
20 similar articles found
1.
A critical decision in landscape genetic studies is whether to use individuals or populations as the sampling unit. This decision affects the time and cost of sampling and may affect ecological inference. We analyzed 334 Columbia spotted frogs at 8 microsatellite loci across 40 sites in northern Idaho to determine how inferences from landscape genetic analyses would vary with sampling design. At all sites, we compared a proportion available sampling scheme (PASS), in which all samples were used, to resampled datasets of 2–11 individuals. Additionally, we compared a population sampling scheme (PSS) to an individual sampling scheme (ISS) at 18 sites with sufficient sample size. We applied an information theoretic approach with both restricted maximum likelihood and maximum likelihood estimation to evaluate competing landscape resistance hypotheses. We found that PSS supported low-density forest when restricted maximum likelihood was used, but a combination model of most variables when maximum likelihood was used. Model support also differed between AIC and BIC. ISS supported this model as well as additional models when testing hypotheses of land cover types that create the greatest resistance to gene flow for Columbia spotted frogs. Increased sampling density and study extent, seen by comparing PSS to PASS, showed a change in model support. As the number of individuals increased, model support under ISS converged on that under PSS at 7–9 individuals. ISS may be useful to increase study extent and sampling density, but may lack power to provide strong support for the correct model with microsatellite datasets. Our results highlight the importance of additional research on sampling design effects on landscape genetics inference.
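
As a rough illustration of why AIC and BIC can rank the same candidate models differently, the sketch below computes both criteria from a maximized log-likelihood. The model labels, log-likelihood values, parameter counts, and number of observations are all hypothetical and are not taken from the study.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
models = {"simple": (-105.0, 2), "complex": (-100.0, 5)}
n_obs = 153  # hypothetical number of observations

# Lower values are better for both criteria.
best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n_obs))
print(best_aic, best_bic)
```

With these made-up numbers AIC prefers the richer model while BIC, whose per-parameter penalty grows with ln(n), prefers the simpler one.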

2.
Microsatellite loci have become important in population genetics because of their high level of polymorphism in natural populations, very frequent occurrence throughout the genome, and apparently high mutation rate. Observed repeat numbers (allele sizes) in natural populations and expectations based on computer simulations suggest that the range of repeat numbers at a microsatellite locus is restricted. This range is a key parameter that should be properly estimated in order to proceed with calculations of divergence times in phylogenetic studies and to better investigate the within- and between-population variability. The 'plug-in' estimate of range based on the minimum and maximum value observed in a sample is not satisfactory because of the relatively large number of alleles in comparison with typical sample sizes. In this paper, a set of data from 30 dinucleotide microsatellite loci is analysed under the assumption of independence among loci. Bayesian inference on range for one locus is obtained by assuming that constraints on range values exist as sharp bounds. Closed-form calculations and robustness revealed by our analysis suggest that the proposed Bayesian approach might be routinely used by researchers to classify microsatellite loci according to the estimated value of their allelic range.

3.
The extent to which natural selection shapes diversity within populations is a key question for population genetics. Thus, there is considerable interest in quantifying the strength of selection. A full likelihood approach for inference about selection at a single site within an otherwise neutral fully linked sequence of sites is described here. A coalescent model of evolution is used to model the ancestry of a sample of DNA sequences which have the selected site segregating. The mutation model, for the selected and neutral sites, is the infinitely many-sites model where there is no back or parallel mutation at sites. A unique perfect phylogeny, a gene tree, can be constructed from the configuration of mutations on the sample sequences under this model of mutation. The approach is general and can be used for any bi-allelic selection scheme. Selection is incorporated through modelling the frequency of the selected and neutral allelic classes stochastically back in time, then using a subdivided population model considering the population frequencies through time as variable population sizes. An importance sampling algorithm is then used to explore over coalescent tree space consistent with the data. The method is applied to a simulated data set and the gene tree presented in Verrelli et al. (2002).

4.
A generalized case-control (GCC) study, like the standard case-control study, leverages outcome-dependent sampling (ODS) to extend to nonbinary responses. We develop a novel, unifying approach for analyzing GCC study data using the recently developed semiparametric extension of the generalized linear model (GLM), which is substantially more robust to model misspecification than existing approaches based on parametric GLMs. For valid estimation and inference, we use a conditional likelihood to account for the biased sampling design. We describe analysis procedures for estimation and inference for the semiparametric GLM under a conditional likelihood, and we discuss problems with estimation and inference under a conditional likelihood when the response distribution is misspecified. We demonstrate the flexibility of our approach over existing ones through extensive simulation studies, and we apply the methodology to an analysis of the Asset and Health Dynamics Among the Oldest Old study, which motivates our research. The proposed approach yields a simple yet versatile solution for handling ODS in a wide variety of possible response distributions and sampling schemes encountered in practice.

5.
6.
Outcome-dependent sampling (ODS) schemes can be a cost-effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies. However, when the outcome is measured on a continuous scale, dichotomizing the outcome could lead to a loss of efficiency. Recent epidemiologic studies have used ODS schemes where, in addition to an overall random sample, there are also a number of supplemental samples that are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator is asymptotically normal. The likelihood ratio statistic using the semiparametric empirical likelihood function has Wilks-type properties in that, under the null, it follows a chi-square distribution asymptotically and is independent of the nuisance parameters. Our simulation results indicate that, for data obtained using an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability weighted pseudolikelihood estimators and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to analyze a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.

7.
Within-population variation at the DNA level will rarely be studied by sequencing of loci of randomly chosen individuals. Instead, individuals will usually be chosen for sequencing based on some knowledge of their genotype. Data collected in this way require new sampling theory. Motivated by these observations, we have examined the sampling properties of a finite population model with two mutation processes and with no selection or recombination. One mutation process generates new alleles according to an infinite-alleles model, and the other generates polymorphisms at sites according to an infinite-sites model. A sample of n genes is considered. The stationary distribution of the number of segregating sites in a subsample from one of the allelic classes in the sample conditional on the allelic configuration of the sample is studied. A recursive scheme is developed to compute the moments of this distribution, and it is shown that the distribution is functionally independent of the number of additional alleles in the sample and their respective frequencies in the sample. For the case in which the sample contains only two alleles, the distribution of the number of segregating sites in a subsample containing both alleles conditional on the sample frequencies of the alleles is studied. The results are applied to the analysis of DNA sequences of two alleles found at the Adh locus of Drosophila melanogaster. No significant departure from the neutral model is detected.

8.
The relationship between neutral and adaptive genetic diversity is important to understand in assessing the implications of a population bottleneck. Fitness-related genes, such as those of the major histocompatibility complex (MHC), may be influenced by selection, and so retain diversity even when it is lost at neutral markers. We measured MHC class I variation in an archaic reptile species Sphenodon guntheri [North Brother Island (NBI) tuatara], which naturally occurs on one 4 ha island in Cook Strait, New Zealand, and has low levels of microsatellite diversity. MHC variation in S. guntheri was compared with microsatellite DNA variation, and with MHC variation in a large population of Sphenodon punctatus (Cook Strait tuatara) on Stephens Island. The NBI population shows significantly decreased levels of genetic diversity compared with the Stephens Island population. Only three different MHC sequences and three genotypes were found on NBI, compared with 15 sequences and 21 genotypes in a similar sample size from Stephens Island. Two sequences appear to be unique to the NBI population. The assortment of sequence variants into genotypes suggests strong gametic disequilibrium between two MHC class I loci in S. guntheri, and only two haplotypes that were present in Hardy–Weinberg proportions were identified. MHC diversity in NBI tuatara appears to be largely influenced by genetic drift, consistent with a recent population bottleneck. This may compromise the ability of this population to respond to novel disease threats.

9.
I J Wilson, D J Balding. Genetics, 1998, 150(1): 499-510
Ease and accuracy of typing, together with high levels of polymorphism and widespread distribution in the genome, make microsatellite (or short tandem repeat) loci an attractive potential source of information about both population histories and evolutionary processes. However, microsatellite data are difficult to interpret, in particular because of the frequency of back-mutations. Stochastic models for the underlying genetic processes can be specified, but in the past they have been too complicated for direct analysis. Recent developments in stochastic simulation methodology now allow direct inference about both historical events, such as genealogical coalescence times, and evolutionary parameters, such as mutation rates. A feature of the Markov chain Monte Carlo (MCMC) algorithm that we propose here is that the likelihood computations are simplified by treating the (unknown) ancestral allelic states as auxiliary parameters. We illustrate the algorithm by analyzing microsatellite samples simulated under the model. Our results suggest that a single microsatellite usually does not provide enough information for useful inferences, but that several completely linked microsatellites can be informative about some aspects of genealogical history and evolutionary processes. We also reanalyze data from a previously published human Y chromosome microsatellite study, finding evidence for an effective population size for human Y chromosomes in the low thousands and a recent time since their most recent common ancestor: the 95% interval runs from approximately 15,000 to 130,000 years, with most likely values around 30,000 years.

10.
Summary: A statistical model is presented for dealing with genotypic frequency data obtained from a single population observed over a run of consecutive generations. This model takes into account possible correlations that exist between generations by conditioning the marginal probability distribution of any one generation on the previously observed generation. Maximum likelihood estimates of the fitness parameters are derived and a hypothesis testing framework developed. The model is very general, and in this paper is applied to random-mating, selfing, parthenogenetic and mixed random-mating and selfing populations with respect to a single locus, g-allele model with constant genotypic fitness differences with all selection occurring either before or after sampling. The assumptions behind this model are contrasted with those of alternative techniques such as minimum chi-square or unconditional maximum likelihood estimation when the marginal likelihoods for any one generation are conditioned only on the initial conditions and not the previous generation. The conditional model is most appropriate when the sample size per generation is large either in an absolute sense or in relation to the total population size. Minimum chi-square and the unconditional likelihood are most appropriate when the population size is effectively infinite and the samples are small. Both models are appropriate when the samples are large and the population size is effectively infinite. Under these last conditions, the conditional model may be preferred because it has greater robustness with respect to small deviations from the underlying assumptions and has a greater simplicity of form. Furthermore, if any genetic drift occurs in the experiment, the minimum chi-square and unconditional likelihood approaches can create spurious evidence for selection while the conditional approach will not. Worked examples are presented. This study was supported in part by the U.S. Atomic Energy Commission, Contract AT(11-1)-1552 to the Department of Human Genetics (CFS), University of Michigan, and by National Science Foundation Grant BMS 74-17453 awarded to the author.

11.
The problem of exact conditional inference for discrete multivariate case-control data has two forms. The first is grouped case-control data, where Monte Carlo computations can be done using the importance sampling method of Booth and Butler (1999, Biometrika 86, 321-332), or a proposed alternative sequential importance sampling method. The second form is matched case-control data. For this analysis we propose a new exact sampling method based on the conditional-Poisson distribution for conditional testing with one binary and one integral ordered covariate. This method makes computations on data sets with large numbers of matched sets fast and accurate. We provide detailed derivation of the constraints and conditional distributions for conditional inference on grouped and matched data. The methods are illustrated on several new and old data sets.

12.
We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing a homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. All in all, we provide a versatile, general purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
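
The idea of combining a binomial likelihood with a discrete uniform prior can be sketched by brute-force enumeration over candidate population sizes. This is only an illustration under stated assumptions: it treats each count as an independent binomial draw, it is not the closed-form formula of the paper, and it is not the dupiR API. The function names, counts, sampling fractions, and prior bounds are hypothetical.

```python
import math


def log_binom_pmf(k: int, n: int, r: float) -> float:
    """Log of the Binomial(n, r) probability of observing k."""
    if k > n:
        return float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(r) + (n - k) * math.log(1 - r))


def posterior_population_size(counts, fractions, n_min, n_max):
    """Posterior over population size n under a uniform prior on
    {n_min, ..., n_max}, assuming independent binomial counts."""
    if n_max < max(counts):
        raise ValueError("n_max must be at least the largest observed count")
    if any(not 0 < r < 1 for r in fractions):
        raise ValueError("sampling fractions must lie strictly between 0 and 1")
    sizes = range(n_min, n_max + 1)
    log_post = [sum(log_binom_pmf(k, n, r) for k, r in zip(counts, fractions))
                for n in sizes]
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return {n: w / total for n, w in zip(sizes, weights)}


# Hypothetical data: 3 objects counted in 10% of the volume, 5 in 20%.
post = posterior_population_size([3, 5], [0.1, 0.2], n_min=0, n_max=500)
mode = max(post, key=post.get)
print(mode)
```

Population sizes smaller than the largest observed count receive zero posterior mass, and the normalization over the finite prior support replaces any conjugacy argument.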

13.

Background

The scalloped hammerhead shark, Sphyrna lewini, is a large endangered predator with a circumglobal distribution, observed in the open ocean but linked ontogenetically to coastal embayments for parturition and juvenile development. A previous survey of maternal (mtDNA) markers demonstrated strong genetic partitioning overall (global ΦST = 0.749) and significant population separations across oceans and between discontinuous continental coastlines.

Methodology/Principal Findings

We surveyed the same global range with increased sample coverage (N = 403) and 13 microsatellite loci to assess the male contribution to dispersal and population structure. Biparentally inherited microsatellites reveal low or absent genetic structure across ocean basins and global genetic differentiation (FST = 0.035) over an order of magnitude lower than the corresponding measures for maternal mtDNA lineages (ΦST = 0.749). Nuclear allelic richness and heterozygosity are high throughout the Indo-Pacific, while genetic structure is low. In contrast, allelic diversity is low while population structure is higher for populations at the ends of the range in the West Atlantic and East Pacific.

Conclusions/Significance

These data are consistent with the proposed Indo-Pacific center of origin for S. lewini, and indicate that females are philopatric or adhere to coastal habitats while males facilitate gene flow across oceanic expanses. This study includes the largest sampling effort and the most molecular loci ever used to survey the complete range of a large oceanic predator, and findings emphasize the importance of incorporating mixed-marker analysis into stock assessments of threatened and endangered shark species.

14.
Estimating dispersal distances from population genetic data provides an important alternative to logistically taxing methods for directly observing dispersal. Although methods for estimating dispersal rates between a modest number of discrete demes are well developed, methods of inference applicable to "isolation-by-distance" models are much less established. Here, we present a method for estimating ρσ², the product of population density (ρ) and the variance of the dispersal displacement distribution (σ²). The method is based on the assumption that low-frequency alleles are identical by descent. Hence, the extent of geographic clustering of such alleles, relative to their frequency in the population, provides information about ρσ². We show that a novel likelihood-based method can infer this composite parameter with a modest bias in a lattice model of isolation-by-distance. For calculating the likelihood, we use an importance sampling approach to average over the unobserved intraallelic genealogies, where the intraallelic genealogies are modeled as a pure birth process. The approach also leads to a likelihood-ratio test of isotropy of dispersal, that is, whether dispersal distances on two axes are different. We test the performance of our methods using simulations of new mutations in a lattice model and illustrate their use with a dataset from Arabidopsis thaliana.

15.
Microsatellite primers are often developed in one species and used to assess neutral variability in related species. Such analyses may be confounded by ascertainment bias (i.e. a decline in amplification success and allelic variability with increasing genetic distance from the source of the microsatellites). In addition, other factors, such as the size of the microsatellite, whether it consists of perfect or interrupted tandem repeats, and whether it is autosomal or X-linked, can affect variation. To test the relative importance of these factors on microsatellite variation, we examine patterns of amplification and allelic diversity in 52 microsatellite loci amplified from five individuals in each of six populations of Cyrtodiopsis stalk-eyed flies that range from 2.2% to 11.2% mitochondrial DNA sequence divergence from the population used for microsatellite development. We find that amplification success and most measures of allelic diversity declined with genetic distance from the source population, in some cases an order of magnitude faster than in birds or mammals. The median and range of the repeat array length did not decline with genetic distance. In addition, for loci on the X chromosome, we find evidence of lower observed heterozygosity compared with loci on autosomes. The differences in variability between X-linked and autosomal loci are not adequately explained by differences in effective population sizes of the chromosomes. We suggest, instead, that periodic selection events associated with X-chromosome meiotic drive, which is present in many of these populations, reduce X-linked variation.

16.
Anderson EC. Genetics, 2005, 170(2): 955-967
This article presents an efficient importance-sampling method for computing the likelihood of the effective size of a population under the coalescent model of Berthier et al. Previous computational approaches, using Markov chain Monte Carlo, required many minutes to several hours to analyze small data sets. The approach presented here is orders of magnitude faster and can provide an approximation to the likelihood curve, even for large data sets, in a matter of seconds. Additionally, confidence intervals on the estimated likelihood curve provide a useful estimate of the Monte Carlo error. Simulations show the importance sampling to be stable across a wide range of scenarios and show that the N(e) estimator itself performs well. Further simulations show that the 95% confidence intervals around the N(e) estimate are accurate. User-friendly software implementing the algorithm for Mac, Windows, and Unix/Linux is available for download. Applications of this computational framework to other problems are discussed.
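
The general mechanism, estimating a likelihood as a weighted average over draws of an unobserved quantity and reporting the Monte Carlo error alongside it, can be sketched on a toy model. The model below (a latent z ~ N(0, 1) and an observation x | z ~ N(z, 1)) is an assumption chosen because its exact likelihood is known; it is not the coalescent model of Berthier et al., and the function names are hypothetical.

```python
import math
import random


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def is_likelihood(x: float, n_draws: int, rng: random.Random):
    """Importance-sampling estimate of p(x) = integral of p(x | z) p(z) dz,
    with its Monte Carlo standard error. Proposal: q(z) = N(x / 2, 1)."""
    weights = []
    for _ in range(n_draws):
        z = rng.gauss(x / 2, 1.0)  # draw the latent variable from the proposal
        joint = normal_pdf(x, z, 1.0) * normal_pdf(z, 0.0, 1.0)
        weights.append(joint / normal_pdf(z, x / 2, 1.0))
    estimate = sum(weights) / n_draws
    variance = sum((w - estimate) ** 2 for w in weights) / (n_draws - 1)
    return estimate, math.sqrt(variance / n_draws)


estimate, std_err = is_likelihood(1.5, 20000, random.Random(0))
exact = normal_pdf(1.5, 0.0, 2.0)  # marginally, x ~ N(0, 2) in this toy model
print(estimate, std_err, exact)
```

An approximate 95% interval for the likelihood at this parameter value is the estimate plus or minus 1.96 standard errors; repeating the calculation over a grid of parameter values would trace out an estimated likelihood curve with an error band.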

17.
Leung Lai T, Shih MC, Wong SP. Biometrics, 2006, 62(1): 159-167
To circumvent the computational complexity of likelihood inference in generalized mixed models that assume linear or more general additive regression models of covariate effects, Laplace's approximations to multiple integrals in the likelihood have been commonly used without addressing the issue of adequacy of the approximations for individuals with sparse observations. In this article, we propose a hybrid estimation scheme to address this issue. The likelihoods for subjects with sparse observations use Monte Carlo approximations involving importance sampling, while Laplace's approximation is used for the likelihoods of other subjects that satisfy a certain diagnostic check on the adequacy of Laplace's approximation. Because of its computational tractability, the proposed approach allows flexible modeling of covariate effects by using regression splines and model selection procedures for knot and variable selection. Its computational and statistical advantages are illustrated by simulation and by application to longitudinal data from a fecundity study of fruit flies, for which overdispersion is modeled via a double exponential family.

18.
Methods for the analysis of unmatched case-control data based on a finite population sampling model are developed. Under this model, and the prospective logistic model for disease probabilities, a likelihood for case-control data that accommodates very general sampling of controls is derived. This likelihood has the form of a weighted conditional logistic likelihood. The flexibility of the methods is illustrated by providing a number of control sampling designs and a general scheme for their analyses. These include frequency matching, counter-matching, case-base, randomized recruitment, and quota sampling. A study of risk factors for childhood asthma illustrates an application of the counter-matching design. Some asymptotic efficiency results are presented and computational methods discussed. Further, it is shown that a 'marginal' likelihood provides a link to unconditional logistic methods. The methods are examined in a simulation study that compares frequency and counter-matching using conditional and unconditional logistic analyses and indicates that the conditional logistic likelihood has superior efficiency. Extensions that accommodate sampling of cases and multistage designs are presented. Finally, we compare the analysis methods presented here to other approaches, compare counter-matching and two-stage designs, and suggest areas for further research.

19.
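Yan Luna (闫路娜), Zhang Dexing (张德兴). Acta Zoologica Sinica (《动物学报》), 2004, 50(2): 279-290
Using microsatellite genetic data from migratory locust populations in China as an example, we assessed the effect of sampling on indices of population genetic diversity. The results show that sample size is significantly positively correlated with the observed number of alleles per locus, the mean number of alleles, and the allelic richness index, but is not significantly correlated with expected heterozygosity. The level of polymorphism at a microsatellite locus directly affects the observed allelic richness of a population and the sample size needed to detect it. For most population genetic and molecular ecology studies, 30-50 individuals is the minimum sample size required for microsatellite DNA analysis. After correction by rarefaction or by repeated random subsampling, allelic richness can be used to detect historical changes in population size such as bottlenecks. In addition, studies should avoid possible effects on population genetic structure caused by differences in collection time and by the sex ratio of the samples.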
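
The rarefaction correction mentioned above is commonly computed as the expected number of distinct alleles in a random subsample of g gene copies: the sum over alleles of 1 - C(N - N_i, g) / C(N, g), where N_i is the count of allele i and N the total number of copies. The sketch below implements that standard formula; the function name and the allele counts are hypothetical, and the abstract does not state which implementation the authors used.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a subsample of g gene copies
    drawn without replacement from the observed sample."""
    total = sum(allele_counts)
    if not 0 < g <= total:
        raise ValueError("g must be between 1 and the total number of gene copies")
    denom = comb(total, g)
    # comb(total - c, g) / denom is the chance that allele i is absent
    # from the subsample; comb returns 0 when g exceeds total - c.
    return sum(1 - comb(total - c, g) / denom for c in allele_counts)


# Hypothetical locus: four alleles observed among 20 gene copies.
counts = [10, 6, 3, 1]
richness_at_10 = rarefied_allelic_richness(counts, 10)
print(richness_at_10)
```

Rarefying every population to the same g (usually the smallest sample in the study) makes richness values comparable across unequal sample sizes, which is the dependence on sample size the abstract reports.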

20.
This paper studies gene trees in subdivided populations which are constructed as perfect phylogenies from the pattern of mutations in a sample of DNA sequences and presents a new recursion for the probability distribution of such gene trees. The underlying evolutionary model is the coalescent process in a subdivided population. The infinitely-many-sites model of mutation is assumed. Ancestral inference questions that are discussed are maximum likelihood estimation of migration and mutation rates; detection of population growth by likelihood techniques; determining the distribution of the time to the most recent common ancestor of a sample of sequences; determining the distribution of the age of the mutations on the gene tree; determining in which subpopulation the most recent common ancestor of all the sequences was; determining subpopulation ancestors, where they were, and times to them; and determining in which subpopulations mutations occurred. The computational technique used, due to Griffiths and Tavaré, is a computer-intensive Markov chain simulation, which simulates gene trees conditional on their topology implied by the mutation pattern in the sample of DNA sequences. The software GENETREE, which implements these ancestral inference techniques, is available.
