Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
A method for estimating the number of founding chromosomes in an isolated population is introduced. The method assumes that n/2 diploid individuals are sampled from a population and that alleles are identified at L unlinked loci. The population is assumed to have been founded T generations in the past by individuals carrying c chromosomes drawn randomly from a known source population, which has also been sampled. If c is small and the population grew rapidly after it was founded, accurate estimates of c can be obtained and those estimates are not sensitive to details of the history of population sizes. If c is larger or the population remained small after it was founded, then estimates of c depend on the history of population sizes. We test the performance of our method on simulated data and demonstrate its use on data from a rainbow trout (Oncorhynchus mykiss) population.

2.
MOTIVATION: With the availability of large-scale, high-density single-nucleotide polymorphism markers and information on haplotype structures and frequencies, a great challenge is how to take advantage of haplotype information in the association mapping of complex diseases in case-control studies. RESULTS: We present a novel approach for association mapping based on directly mining haplotypes (i.e. phased genotype pairs) produced from case-control data or case-parent data via a density-based clustering algorithm, which can be applied to whole-genome screens as well as candidate-gene studies in small genomic regions. The method directly explores the sharing of haplotype segments in affected individuals that are rarely present in normal individuals. The measure of sharing between two haplotypes is defined by a new similarity metric that combines the length of the shared segments and the number of common alleles around any marker position of the haplotypes, which is robust against recent mutations/genotype errors and recombination events. The effectiveness of the approach is demonstrated by using both simulated datasets and real datasets. The results show that the algorithm is accurate for different population models and for different disease models, even for genes with small effects, and it outperforms some recently developed methods.

3.
It was shown recently using experimental data that it is possible under certain conditions to determine whether a person with known genotypes at a number of markers was part of a sample from which only allele frequencies are known. Using population genetic and statistical theory, we show that the power of such identification is, approximately, proportional to the number of independent SNPs divided by the size of the sample from which the allele frequencies are available. We quantify the limits of identification and propose likelihood and regression analysis methods for the analysis of data. We show that these methods have similar statistical properties and have more desirable properties, in terms of type-I error rate and statistical power, than test statistics suggested in the literature.

4.
MOTIVATION: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population. To this end, we present HaploPool, a method of estimating haplotype frequencies from blocks of consecutive SNPs. RESULTS: HaploPool leverages the efficiency of DNA pools and estimates the population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals as compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm to three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster), and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (of between 5 and 25 SNPs).

5.
Incorporating genotypes of relatives into a test of linkage disequilibrium.   Total citations: 3 (self-citations: 0; citations by others: 3)
Genetic data from autosomal loci in diploids generally consist of genotype data for which no phase information is available, making it difficult to implement a test of linkage disequilibrium. In this paper, we describe a test of linkage disequilibrium based on an empirical null distribution of the likelihood of a sample. Information on the genotypes of related individuals is explicitly used to help reconstruct the gametic phase of the independent individuals. Simulation studies show that the present approach improves on estimates of linkage disequilibrium gathered from samples of completely independent individuals but only if some offspring are sampled together with their parents. The failure to incorporate some parents sharply decreases the sensitivity and accuracy of the test. Simulations also show that for multiallelic data (more than two alleles) our testing procedure is not as powerful as an exact test based on known haplotype frequencies, owing to the interaction between departure from Hardy-Weinberg equilibrium and linkage disequilibrium.

6.
In genetic studies the haplotype structure of the population under study is expected to carry important information. Experimental methods to derive haplotypes, however, are expensive and none of them has yet become standard methodology. On the other hand, maximum likelihood haplotype estimation from unphased individual genotypes may incur inaccuracies. We therefore investigated the relative efficiency of haplotype frequency estimation when nuclear family information is included compared to estimation from experimentally derived haplotypes. Efficiency was measured in terms of variance ratios of the estimates. The variances were derived from the binomial distribution for experimentally derived haplotypes, and from the Fisher information matrix corresponding to the general likelihood function of the haplotype frequency parameters, including family information. We subsequently compared these variance ratios to the variance ratios for the case of estimation from individual genotypes. We found that the information gained from a single child compensates for missing phase information to a high degree, resulting in estimates almost as reliable as those derived from observed haplotypes. Thus, if children have already been genotyped for other reasons, it is highly advisable to include them in the estimation. If child information is not already present, whether it is useful to genotype a single child just to reduce phase ambiguity depends on the number of loci and the haplotype diversity. In general, if the number of loci is less than or equal to three or if the number of haplotypes with a frequency >5% is less than or equal to four, haplotype estimation from individuals is already quite good and the improvement gained from a single child cannot compensate for the effort of genotyping it. On the other hand, under scenarios with many loci and high haplotype diversity, haplotype frequency estimation from trios can be more efficient than haplotype frequency estimation from individuals, even on a per-genotype basis.

7.
The wolverine is an endangered carnivore that in northwestern Europe is restricted to the mountain range along the border between Sweden and Norway. The Scandinavian wolverine population experienced a severe decline in numbers due to human persecution during the 20th century, although with legislative protection the population appears to have slowly started to recover recently (current population size estimate of 800 individuals). In the mid 1990s, wolverines appeared in two new and isolated areas east of the mountain range, in the forest landscape close to the Gulf of Bothnia. Using non-invasive, DNA-based monitoring, we show here that these new subpopulations were likely founded by as few as 2 and 2–4 individuals, respectively, and that little, if any, genetic contact with the main population has been established since colonisation. A high degree of genetic similarity among individuals in the two areas indicates inbreeding. We estimate the minimum number of wolverines known to be alive in these areas during the period of 2001–2005 at 5 and 17, respectively, with one subpopulation showing decreasing (currently 2) numbers and the other increasing (10). For the somewhat larger population, we infer a tentative pedigree from relatedness values and parentage tests, which indicates the occurrence of brother–sister matings. This study illustrates the usefulness of non-invasive monitoring in the management of endangered carnivore populations.

8.
We introduce a Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design. The joint posterior distribution of the substructure and allele frequencies of the respective populations is available in an analytical form when the number of populations is small, whereas an approximation based on a Markov chain Monte Carlo simulation approach can be obtained for a moderate or large number of populations. Using the joint posterior distribution, posteriors can also be derived for any evolutionary population parameters, such as the traditional fixation indices. A major advantage compared to most earlier methods is that the number of populations is treated here as an unknown parameter. What is traditionally considered as two genetically distinct populations, either recently founded or connected by considerable gene flow, is here considered as one panmictic population with a certain probability based on marker data and prior information. Analyses of previously published data on the Moroccan argan tree (Argania spinosa) and of simulated data sets suggest that our method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist. The software (BAPS) used for the computations is freely available from http://www.rni.helsinki.fi/~mjs.

9.
The brown anole, Anolis sagrei, is one of the most widespread and successful colonisers of the diverse Anolis genus, which comprises c. 400 species occurring naturally in Central and South America and the Caribbean. Based on extensive between- and within-population sampling from a previously published study (334 mitochondrial DNA sequences) and sampling for this study (37 mtDNA sequences), we reconstruct a phylogeny and produce a haplotype network to assign a recently introduced population in St Vincent, Lesser Antilles to its geographic origin. A single haplotype was present in the St Vincent population, which was identical to a haplotype from Tampa, FL. We show that genetic diversity within native range populations, combined with low frequencies of introduced haplotypes in native ranges, may impair attempts to identify source populations, despite intensive sampling effort. The absence of mtDNA haplotype diversity suggests a significant genetic founder effect within the St Vincent population.

10.
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as the problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference/hypothesis testing can be performed; prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm is linear in the number of loci, rather than NP-hard as is the case for many such algorithms. We illustrate the applications of our method to both simulated and real data sets.

11.
The probability distribution of haplotype frequencies in a population, and the way it is influenced by genetic forces such as recombination, selection, and random drift, is a question of fundamental interest in population genetics. For large populations, the distribution of haplotype frequencies for two linked loci under the classical Wright-Fisher model is almost impossible to compute for numerical reasons. However, the Wright-Fisher process can in such cases be approximated by a diffusion process and the transition density can then be deduced from the Kolmogorov equations. As no exact solution has been found for these equations, we developed a numerical method based on finite differences to solve them. It applies to transient states and models including selection or mutations. We show by several tests that this method is accurate for computing the conditional joint density of haplotype frequencies given that no haplotype has been lost. We also show that it is far less time consuming than other methods such as Monte Carlo simulations.
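For orientation, the Monte Carlo approach that the abstract uses as a point of comparison can be sketched as below. This is a minimal illustration of a neutral two-locus Wright-Fisher model with recombination and drift only, not the authors' finite-difference method; the function names, the haplotype ordering (AB, Ab, aB, ab), and the parameter values are assumptions made for the example.

```python
import random


def recombine(x, r):
    """Expected haplotype frequencies after one round of recombination.

    x = (x_AB, x_Ab, x_aB, x_ab); r is the recombination fraction.
    Linkage disequilibrium D = x_AB * x_ab - x_Ab * x_aB decays by (1 - r).
    """
    d = x[0] * x[3] - x[1] * x[2]
    return (x[0] - r * d, x[1] + r * d, x[2] + r * d, x[3] - r * d)


def wright_fisher_step(x, r, n_chrom, rng):
    """One generation: recombination, then multinomial sampling (drift)."""
    expected = recombine(x, r)
    draws = rng.choices(range(4), weights=expected, k=n_chrom)
    return tuple(draws.count(i) / n_chrom for i in range(4))


def simulate(x0, r, n_chrom, generations, seed=0):
    """Simulate one trajectory of haplotype frequencies (no selection, no mutation)."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    x = tuple(x0)
    for _ in range(generations):
        x = wright_fisher_step(x, r, n_chrom, rng)
    return x


if __name__ == "__main__":
    print(simulate((0.5, 0.0, 0.0, 0.5), r=0.1, n_chrom=200, generations=50))
```

Estimating a full transition density this way requires many independent trajectories, which is consistent with the abstract's remark that simulation is comparatively time consuming.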

12.
RFLP haplotypes at the alpha-globin gene complex have been examined in 190 individuals from the Niokolo Mandenka population of Senegal: haplotypes were assigned unambiguously for 210 chromosomes. The Mandenka share with other African populations a sample size-independent haplotype diversity that is much greater than that in any non-African population: the number of haplotypes observed in the Mandenka is typically twice that seen in the non-African populations sampled to date. Of these haplotypes, 17.3% had not been observed in any previous surveys, and a further 19.1% have previously been reported only in African populations. The haplotype distribution shows clear differences between African and non-African peoples, but this is on the basis of population-specific haplotypes combined with haplotypes common to all. The relationship of the newly reported haplotypes to those previously recorded suggests that several mutation processes, particularly recombination as homologous exchange or gene conversion, have been involved in their production. A computer program based on the expectation-maximization (EM) algorithm was used to obtain maximum-likelihood estimates of haplotype frequencies for the entire data set: good concordance between the unambiguous and EM-derived sets was seen for the overall haplotype frequencies. Some of the low-frequency haplotypes reported by the estimation algorithm differ greatly, in structure, from those haplotypes known to be present in human populations, and they may not represent haplotypes actually present in the sample.
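The EM approach to haplotype frequency estimation mentioned in this abstract can be illustrated in its simplest textbook form: two biallelic loci, where only double heterozygotes are phase-ambiguous. The sketch below is not the program used in the study; the genotype encoding (counts of allele 1 at each locus), the function name, and the assumption of random mating are choices made for the example.

```python
from collections import Counter

# Haplotypes are (allele at locus A, allele at locus B), with alleles coded 0/1.
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Estimate haplotype frequencies from unphased two-locus genotypes.

    genotypes: list of (gA, gB), each the count (0, 1 or 2) of allele 1.
    """
    freqs = {h: 0.25 for h in HAPS}  # uniform starting point
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = Counter()
        for ga, gb in genotypes:
            if ga == 1 and gb == 1:
                # E-step: split the double heterozygote between the two
                # possible phases, 11/00 (cis) and 10/01 (trans).
                w_cis = freqs[(1, 1)] * freqs[(0, 0)]
                w_trans = freqs[(1, 0)] * freqs[(0, 1)]
                total = w_cis + w_trans
                p_cis = w_cis / total if total > 0 else 0.5
                counts[(1, 1)] += p_cis
                counts[(0, 0)] += p_cis
                counts[(1, 0)] += 1 - p_cis
                counts[(0, 1)] += 1 - p_cis
            else:
                # Phase is unambiguous when at most one locus is heterozygous.
                a = (1 if ga >= 1 else 0, 1 if ga == 2 else 0)
                b = (1 if gb >= 1 else 0, 1 if gb == 2 else 0)
                counts[(a[0], b[0])] += 1
                counts[(a[1], b[1])] += 1
        # M-step: new frequencies are the expected haplotype counts per chromosome.
        new_freqs = {h: counts[h] / n_chrom for h in HAPS}
        converged = max(abs(new_freqs[h] - freqs[h]) for h in HAPS) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


if __name__ == "__main__":
    sample = [(2, 2), (0, 0), (1, 1), (1, 0), (0, 1)]
    for hap, f in em_haplotype_freqs(sample).items():
        print(hap, round(f, 4))
```

With more loci the number of compatible haplotype pairs per individual grows quickly, and rare haplotypes can receive small nonzero estimates even when they are absent from the sample, which is the caveat raised in the abstract's final sentence.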

13.
Population bottlenecks and founder events reduce genetic diversity through stochastic processes associated with the sampling of alleles at the time of the bottleneck, and the recombination of alleles that are identical by descent. At the same time bottlenecks and founder events can structure populations through the stochastic distortion of allele frequencies. Here we undertake an empirical assessment of the impact of two independent bottlenecks of known size from a known source, and consider inference about evolutionary process in the context of simulations and theoretical expectations. We find a similar level of reduced variation in the parallel bottleneck events, with the greater impact on the population that began with the smaller number of females. The level of diversity remaining was consistent with model predictions, but only if re-growth of the population was essentially exponential and polygeny was minimal at the early stages. There was a high level of differentiation seen compared to the source population and between the two bottlenecked populations, reflecting the stochastic distortion of allele frequencies. We provide empirical support for the theoretical expectations that considerable diversity can remain following a severe bottleneck event, given rapid demographic recovery, and that populations founded from the same source can become quickly differentiated. These processes may be important during the evolution of population genetic structure for species affected by rapid changes in available habitat.

14.
Population geneticists and community ecologists have long recognized the importance of sampling design for uncovering patterns of diversity within and among populations and in communities. Invasion ecologists increasingly have utilized phylogeographical patterns of mitochondrial or chloroplast DNA sequence variation to link introduced populations with putative source populations. However, many studies have ignored lessons from population genetics and community ecology and are vulnerable to sampling errors owing to insufficient field collections. A review of published invasion studies that utilized mitochondrial or chloroplast DNA markers reveals that insufficient sampling could strongly influence results and interpretations. Sixty per cent of studies sampled an average of fewer than six individuals per source population, vs. only 45% for introduced populations. Typically, far fewer introduced than source populations were surveyed, although they were sampled more intensively. Simulations based on published data forming a comprehensive mtDNA haplotype data set highlight and quantify the impact of the number of individuals surveyed per source population and number of putative source populations surveyed for accurate assignment of introduced individuals. Errors associated with sampling a low number of individuals are most acute when rare source haplotypes are dominant or fixed in the introduced population. Accuracy of assignment of introduced individuals is also directly related to the number of source populations surveyed and to the degree of genetic differentiation among them (F_ST). Incorrect interpretations resulting from sampling errors can be avoided if sampling design is considered before field collections are made.
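The sampling concern raised here can be made concrete with a back-of-the-envelope calculation. The sketch below is not the simulation procedure of the study; it assumes a haploid marker such as mtDNA, independent random draws from a large source population, and a source haplotype at a hypothetical frequency p. Under those assumptions the chance of missing the haplotype entirely in n sampled individuals is (1 - p)^n.

```python
import math


def prob_missed(p, n):
    """Probability that a haplotype at frequency p is absent from n random draws."""
    return (1.0 - p) ** n


def min_sample_size(p, detect_prob=0.95):
    """Smallest n for which the haplotype is sampled at least once with
    probability >= detect_prob (requires 0 < p < 1 and 0 < detect_prob < 1)."""
    return math.ceil(math.log(1.0 - detect_prob) / math.log(1.0 - p))


if __name__ == "__main__":
    # Illustrative values only: a source haplotype at 10% frequency.
    print(prob_missed(0.10, 6))
    print(min_sample_size(0.10, 0.95))
```

At six individuals per source population, the sampling intensity the review reports as typical, a haplotype at 10% frequency would be missed more often than not under these assumptions.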

15.
A commonly used tool in disease association studies is the search for discrepancies between the haplotype distribution in the case and control populations. In order to find this discrepancy, the haplotype frequencies in each of the populations are estimated from the genotypes. We present a new method, HAPLOFREQ, to estimate haplotype frequencies over a short genomic region given the genotypes or haplotypes with missing data or sequencing errors. Our approach incorporates a maximum likelihood model based on a simple random generative model which assumes that the genotypes are independently sampled from the population. We first show that if the phased haplotypes are given, possibly with missing data, we can estimate the frequency of the haplotypes in the population by finding the global optimum of the likelihood function in polynomial time. If the haplotypes are not phased, finding the maximum value of the likelihood function is NP-hard. In this case, we define an alternative likelihood function which can be thought of as a relaxed likelihood function. We show that the maximum relaxed likelihood can be found in polynomial time and that the optimal solution of the relaxed likelihood asymptotically approaches the haplotype frequencies in the population. In contrast to previous approaches, our algorithms are guaranteed to converge in polynomial time to a global maximum of the different likelihood functions. We compared the performance of our algorithm to the widely used program PHASE, and we found that our estimates are at least 10% more accurate than those of PHASE and that our method is about ten times faster. Our techniques involve new algorithms in convex optimization. These algorithms may be of independent interest. In particular, they may be helpful in other maximum likelihood problems arising from survey sampling.

16.
The decreasing cost of whole-genome and whole-exome sequencing has resulted in a renaissance for identifying Mendelian disease mutations, and for the first time it is possible to survey the distribution and characteristics of these mutations in large population samples. We conducted carrier screening for all autosomal-recessive (AR) mutations known to be present in members of a founder population and revealed surprisingly high carrier frequencies for many of these mutations. By utilizing the rich demographic, genetic, and phenotypic data available on these subjects and simulations in the exact pedigree that these individuals belong to, we show that the majority of mutations were most likely introduced into the population by a single founder and then drifted to the high carrier frequencies observed. We further show that although there is an increased incidence of AR diseases overall, the mean carrier burden is likely to be lower in the Hutterites than in the general population. Finally, on the basis of simulations, we predict the presence of 30 or more undiscovered recessive mutations among these subjects, and this would at least double the number of AR diseases that have been reported in this isolated population.

17.
Although rarely assessed, the population genetics of hibernating colonies can help to understand some aspects of population structure, even when samples from nursery or mating colonies are not available, or in studies of migration when both types of samples are available and can be compared. Here we illustrate both points in a survey of mitochondrial DNA (mtDNA) control region sequences used to study the population genetics of hibernating colonies of a migrating species, the noctule bat (Nyctalus noctula). Lacking samples from Scandinavian nursery colonies, we use a North European hibernaculum to suggest that Scandinavian populations are isolated from Central and East European colonies. Then, we compare genetic diversities of nursery and hibernating colonies. We find a significantly higher haplotype diversity in hibernacula, confirming that they consist of individuals from different nursery colonies. Finally, we show that pairwise comparisons of the haplotype frequencies of nursery and hibernating colonies contain some information on the migration direction of the noctule bat.
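Haplotype diversity, the quantity compared between nursery and hibernating colonies above, is commonly computed with Nei's unbiased estimator, h = n/(n - 1) * (1 - sum of squared haplotype frequencies). The abstract does not state which estimator the authors used, so the sketch below is an assumption, and the haplotype labels in the usage example are made up.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    nursery = ["H1", "H1", "H1", "H2"]       # hypothetical colony dominated by one haplotype
    hibernaculum = ["H1", "H2", "H3", "H4"]  # hypothetical mix from several colonies
    print(haplotype_diversity(nursery), haplotype_diversity(hibernaculum))
```

A colony that pools individuals from several source colonies would be expected to score closer to 1, matching the direction of the difference reported in the abstract.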

18.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration. In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.

19.
Inference of population structure using multilocus genotype data   Total citations: 243 (self-citations: 0; citations by others: 243)
We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci, e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.

20.
Effectiveness of computational methods in haplotype prediction   Total citations: 11 (self-citations: 0; citations by others: 11)
Haplotype analysis has been used for narrowing down the location of disease-susceptibility genes and for investigating many population processes. Computational algorithms have been developed to estimate haplotype frequencies and to predict haplotype phases from genotype data for unrelated individuals. However, the accuracy of such computational methods needs to be evaluated before their applications can be advocated. We have experimentally determined the haplotypes at two loci, the N-acetyltransferase 2 gene (NAT2, 850 bp, n=81) and a 140-kb region on chromosome X (n=77), each consisting of five single nucleotide polymorphisms (SNPs). We empirically evaluated and compared the accuracy of the subtraction method, the expectation-maximization (EM) method, and the PHASE method in haplotype frequency estimation and in haplotype phase prediction. Where there was near complete linkage disequilibrium (LD) between SNPs (the NAT2 gene), all three methods provided effective and accurate estimates for haplotype frequencies and individual haplotype phases. For a genomic region in which marked LD was not maintained (the chromosome X locus), the computational methods were adequate in estimating overall haplotype frequencies. However, none of the methods was accurate in predicting individual haplotype phases. The EM and the PHASE methods provided better estimates for overall haplotype frequencies than the subtraction method for both genomic regions.
