Similar literature
1.
An estimator for pairwise relatedness using molecular markers   Total citations: 21 (self-citations: 0, citations by others: 21)
Wang J 《Genetics》2002,160(3):1203-1215
I propose a new estimator for jointly estimating two-gene and four-gene coefficients of relatedness between individuals from an outbreeding population with data on codominant genetic markers and compare it, by Monte Carlo simulations, to previous ones in precision and accuracy for different distributions of population allele frequencies, numbers of alleles per locus, actual relationships, sample sizes, and proportions of relatives included in samples. In contrast to several previous estimators, the new estimator is well behaved and applies to any number of alleles per locus and any allele frequency distribution. The estimates for two- and four-gene coefficients of relatedness from the new estimator are unbiased irrespective of the sample size and have sampling variances decreasing consistently with an increasing number of alleles per locus to the minimum asymptotic values determined by the variation in identity-by-descent among loci per se, regardless of the actual relationship. The new estimator is also robust for small sample sizes and for unknown relatives being included in samples for estimating allele frequencies. Compared to previous estimators, the new one is generally advantageous, especially for highly polymorphic loci and/or small sample sizes.

2.
Interest in searching for genetic linkage between diseases and marker loci has been greatly increased by the recent introduction of DNA polymorphisms. However, even for the most well-behaved Mendelian disorders, those with clear-cut mode of inheritance, complete penetrance, and no phenocopies, genetic heterogeneity may exist; that is, in the population there may be more than one locus that can determine the disease, and these loci may not be linked. In such cases, two questions arise: (1) What sample size is necessary to detect linkage for a genetically heterogeneous disease? (2) What sample size is necessary to detect heterogeneity given linkage between a disease and a marker locus? We have answered these questions for the most important types of matings under specified conditions: linkage phase known or unknown, number of alleles involved in the cross at the marker locus, and different numbers of affected and unaffected children. In general, the presence of heterogeneity increases the recombination value at which lod scores peak, by an amount that increases with the degree of heterogeneity. There is a corresponding increase in the number of families necessary to establish linkage. For the specific case of backcrosses between disease and marker loci with two alleles, linkage can be detected at recombination fractions up to 20% with reasonable numbers of families, even if only half the families carry the disease locus linked to the marker. The task is easier if more than two informative children are available or if phase is known. For recessive diseases, highly polymorphic markers with four different alleles in the parents greatly reduce the number of families required.

3.
We have analyzed the allele frequency distribution at the highly polymorphic variable number of tandem repeat (VNTR) locus D1S80 (pMCT118) in seven ethnic populations (namely, New Guinea Highlanders of Papua New Guinea, Dogrib Indians of Canada, Pehuenche Indians of Chile, American and Western Samoans, Kacharis of Northeast India, and German Caucasians) using the polymerase chain reaction (PCR) technique. In the pooled sample of 443 unrelated individuals 20 segregating alleles were detected. A trimodal pattern of allelic distribution is present in the majority of populations and is indicative of the evolutionary antiquity of the polymorphism at this locus. In spite of the observed high degree of polymorphism (expected heterozygosity 56%–86%), the genotype distributions in all populations conform to their respective Hardy-Weinberg expectations, with a single exception: the marginally significant P value (0.04) of the exact test in American Samoans. Summary statistics indicate that, in general, the allele frequency distribution at this locus may be approximated by the infinite allele model. The data also demonstrate that alleles that are shared by all populations have the highest average frequency within populations. Furthermore, the kinship bioassay analysis demonstrates that the extensive variation observed at the D1S80 locus is at the interindividual within population level, which dwarfs any interpopulation allele frequency variation, consistent with the population dynamics of hypervariable polymorphisms. These characteristics of the D1S80 locus make it a very useful marker for population genetic research, genetic linkage studies, forensic identification of individuals, and for determination of biological relatedness of individuals.

4.
Kang SJ  Finch SJ  Haynes C  Gordon D 《Human heredity》2004,58(3-4):139-144
Kang et al. [Genet Epidemiol 2004;26:132-141] addressed the question of which genotype misclassification errors are most costly, in terms of minimum percentage increase in sample size necessary (%MSSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association in a genetic model-free setting. They answered the question for single nucleotide polymorphisms (SNPs) using the 2 × 3 χ² test of independence. We address the same question here for a genetic model-based framework. The genetic model parameters considered are: disease model (dominant, recessive), genotypic relative risk, SNP (marker) and disease allele frequency, and linkage disequilibrium. %MSSN coefficients of each of the six possible error rates are determined by expanding the non-centrality parameter of the asymptotic distribution of the 2 × 3 χ² test under a specified alternative hypothesis to approximate %MSSN using a linear Taylor series in the error rates. In this work we assume errors misclassifying one homozygote as another homozygote are 0, since these errors are thought to rarely occur in practice. Our findings are that there are settings of the genetic model parameters that lead to large total %MSSN for both dominant and recessive models. As the SNP minor allele frequency approaches 0, total %MSSN increases without bound, independent of other genetic model parameters. In general, %MSSN is a complex function of the genetic model parameters. Use of SNPs with small minor allele frequency requires careful attention to frequency of genotyping errors to ensure that power specifications are met. Software to perform these calculations for study design is available, and an example of its use to study a disease is given.

5.
M. Slatkin  B. Rannala 《Genetics》1997,147(4):1855-1861
A theory is developed that provides the sampling distribution of low frequency alleles at a single locus under the assumption that each allele is the result of a unique mutation. The number of copies of each allele is assumed to follow a linear birth-death process with sampling. If the population is of constant size, standard results from theory of birth-death processes show that the distribution of numbers of copies of each allele is logarithmic and that the joint distribution of numbers of copies of k alleles found in a sample of size n follows the Ewens sampling distribution. If the population from which the sample was obtained was increasing in size, if there are different selective classes of alleles, or if there are differences in penetrance among alleles, the Ewens distribution no longer applies. Likelihood functions for a given set of observations are obtained under different alternative hypotheses. These results are applied to published data from the BRCA1 locus (associated with early onset breast cancer) and the factor VIII locus (associated with hemophilia A) in humans. In both cases, the sampling distribution of alleles allows rejection of the null hypothesis, but relatively small deviations from the null model can account for the data. In particular, roughly the same population growth rate appears consistent with both data sets.
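
For orientation, the Ewens sampling distribution named in this abstract can be evaluated directly. The following is a minimal sketch, not the authors' code: the function name `ewens_probability`, the encoding of a sample as counts `a_j` (the number of alleles seen exactly j times), and the value of `theta` used in the usage line are all illustrative assumptions.

```python
from math import factorial


def ewens_probability(partition, theta):
    """Ewens sampling formula for an allelic partition.

    partition[j-1] = a_j, the number of distinct alleles that appear
    exactly j times in the sample (hypothetical encoding for this sketch).
    theta is the scaled mutation parameter.
    """
    # Sample size implied by the partition: n = sum_j j * a_j.
    n = sum(j * a for j, a in enumerate(partition, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a in enumerate(partition, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


# Usage: the three possible partitions of a sample of size 3.
theta = 1.5  # arbitrary illustrative value
all_different = ewens_probability([3, 0, 0], theta)
one_pair = ewens_probability([1, 1, 0], theta)
all_same = ewens_probability([0, 0, 1], theta)
```

The probabilities over all partitions of a given sample size sum to one, which is a convenient sanity check before using the function inside a likelihood comparison.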

6.
Single nucleotide polymorphisms (SNPs) are currently being developed for use in disequilibrium analyses. These SNPs consist of two alleles with varying degrees of polymorphism. A natural design for use with SNPs is the 'haplotype relative risk' sampling design in which a father, mother, and child are typed at an SNP locus. Given such a trio of genotypes, we ask: what is the probability that a pedigree error (a change from one allele to the other) at an SNP locus will be detected using only Mendel's laws as a check? We calculate the probability of detecting such errors for a hypothetical SNP locus with varying degrees of polymorphism and for various true error rates. For the sets of allele frequencies considered, we find that the detection rates range between 25 and 30%, the detection rate being lowest when the two alleles have equal frequencies and the highest when one allele has a frequency of 10%. Based on this detection rate, we determine that the true error rate is roughly 3.3-4 times that of the apparent error rate at an SNP locus. The greatest discrepancy between true and apparent error rates occurs when allele frequencies are equal.
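
The kind of calculation described in this abstract can be illustrated by exact enumeration. The sketch below rests on simplifying assumptions that are mine, not necessarily the authors': parents are drawn under Hardy-Weinberg proportions, exactly one of the six alleles in the trio is flipped, and each position is equally likely to be hit. It is therefore not expected to reproduce the 25 to 30% figures exactly. All function names are hypothetical. The last helper only restates the arithmetic in the abstract: if a fraction d of errors is detected, the true error rate is 1/d times the apparent one (1/0.30 ≈ 3.3, 1/0.25 = 4).

```python
from itertools import product


def mendel_consistent(father, mother, child):
    """True if the child's two alleles can come one from each parent."""
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def detection_probability(p):
    """Probability that a single flipped allele in a father/mother/child
    trio produces a Mendelian inconsistency (assumed error model: one of
    the six alleles, chosen uniformly, is changed to the other allele)."""
    total = 0.0
    for f1, f2, m1, m2 in product((0, 1), repeat=4):
        # Hardy-Weinberg weight of this ordered parental configuration.
        weight = 1.0
        for allele in (f1, f2, m1, m2):
            weight *= p if allele == 1 else 1.0 - p
        # Each parent transmits either of its alleles with probability 1/2.
        for i, j in product((0, 1), repeat=2):
            alleles = [f1, f2, m1, m2, (f1, f2)[i], (m1, m2)[j]]
            for k in range(6):
                flipped = alleles.copy()
                flipped[k] = 1 - flipped[k]
                father = tuple(flipped[0:2])
                mother = tuple(flipped[2:4])
                child = tuple(flipped[4:6])
                if not mendel_consistent(father, mother, child):
                    total += weight * 0.25 / 6.0
    return total


def true_to_apparent_ratio(detection_rate):
    """Ratio of true to apparent error rate when only a fraction
    detection_rate of errors is visible as Mendelian inconsistencies."""
    return 1.0 / detection_rate


# Usage: detection probability at a few allele frequencies.
rates = {p: detection_probability(p) for p in (0.1, 0.3, 0.5)}
```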

7.
We typed the Sardinian population at the D1S80 VNTR locus. Nineteen alleles were detected in a sample of 92 unrelated individuals, allele frequency distribution showing a modal pattern mostly in agreement with other Caucasoid populations. A high degree of heterozygosity (observed value=80.4%) was present. Goodness-of-fit tests demonstrated no departure from Hardy-Weinberg expectations. Data regarding heterozygosity, number of alleles and singletons appeared in accordance with the IAM mutation-drift equilibrium model and showed no evidence of hidden substructuring. Allele 34 exhibited in Sardinians the highest frequency ever observed in Caucasians. Nonetheless, the comparison with other European populations did not disclose Sardinian genetic peculiarity. Indeed, measures of genetic divergence among Europeans demonstrated definitely smaller values at the D1S80 locus in comparison with those calculated over a high number of (pre-DNA) polymorphic loci. High mutation rate and selective neutrality typical of VNTRs could account for the observed moderate genetic divergence. Isolation and genetic drift, on the other hand, may have determined certain deviations in allele frequency distribution, as occurred for allele 34 in the Sardinian population.

8.
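Yan Luna  Zhang Dexing 《Acta Zoologica Sinica》2004,50(2):279-290
Using microsatellite genetic data from populations of the migratory locust in China as an example, we assessed how sampling affects measures of population genetic diversity. The results show that sample size is significantly and positively correlated with the observed number of alleles per locus, the mean number of alleles, and allelic richness, but is not significantly correlated with expected heterozygosity. The level of polymorphism at a microsatellite locus directly affects both the allelic richness observed in a population and the sample size needed to detect it. For most population genetic and molecular ecological studies, 30-50 individuals is the minimum sample size required for microsatellite DNA analysis. Once corrected by rarefaction or by repeated random resampling, allelic richness can be used to detect historical changes in population size such as bottlenecks. In addition, studies should avoid the effects that differences in collection time and in the sex ratio of the sample may have on the inferred population genetic structure.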
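
The rarefaction correction mentioned in this abstract can be sketched as follows. This is the standard hypergeometric rarefaction of allelic richness, written under the assumption that the input is a list of allele copy counts at one locus; the function name and the example counts are hypothetical and do not come from the locust data.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene
    copies drawn without replacement from the observed sample."""
    n = sum(allele_counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the total number of gene copies")
    # For each allele, 1 - P(allele is absent from the subsample).
    # math.comb returns 0 when fewer than g copies of other alleles exist.
    return sum(1.0 - comb(n - c, g) / comb(n, g) for c in allele_counts)


# Usage with made-up counts: compare populations at a common subsample size.
counts = [5, 3, 2]
richness_at_4 = rarefied_allelic_richness(counts, 4)
```

Rarefying every population to the smallest sample size in the study puts their allelic richness values on a common footing, which is why the raw allele count alone is not comparable across unequal samples.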

9.
Protein variants in Hiroshima and Nagasaki: tales of two cities.   Total citations: 8 (self-citations: 5, citations by others: 3)
The results of 1,465,423 allele product determinations based on blood samples from Hiroshima and Nagasaki, involving 30 different proteins representing 32 different gene products, are analyzed in a variety of ways, with the following conclusions: (1) Sibships and their parents are included in the sample. Our analysis reveals that statistical procedures designed to reduce the sample to equivalent independent genomes do not in population comparisons compensate for the familial cluster effect of rare variants. Accordingly, the data set was reduced to one representative of each sibship (937,427 allele products). (2) Both χ²-type contrasts and a genetic distance measure (delta) reveal that rare variants (P < .01) are collectively as effective as polymorphisms in establishing genetic differences between the two cities. (3) We suggest that rare variants that individually exhibit significant intercity differences are probably the legacy of tribal private polymorphisms that occurred during prehistoric times. (4) Despite the great differences in the known histories of the two cities, both the overall frequency of rare variants and the number of different rare variants are essentially identical in the two cities. (5) The well-known differences in locus variability are confirmed, now after adjustment for sample size differences for the various locus products; in this large series we failed to detect variants at only three of 29 loci for which sample size exceeded 23,000. (6) The number of alleles identified per locus correlates positively with subunit molecular weight. (7) Loci supporting genetic polymorphisms are characterized by more rare variants than are loci at which polymorphisms were not encountered. (8) Loci whose products do not appear to be essential for health support more variants than do loci the absence of whose product is detrimental to health. (9) There is a striking excess of rare variants over the expectation under the neutral mutation/drift/equilibrium theory. We suggest that this finding is primarily due to the relatively recent (in genetic time) agglomeration of previously separated tribal populations; efforts to test for agreement with the expectations of this theory by using data from modern cosmopolitan populations are exercises in futility. (10) All of these findings should characterize DNA variants in exons as more data become available, since the findings are the protein expression of such variants.

10.
The sample frequency spectrum of a segregating site is the probability distribution of a sample of alleles from a genetic locus, conditional on observing the sample to be polymorphic. This distribution is widely used in population genetic inferences, including statistical tests of neutrality in which a skew in the observed frequency spectrum across independent sites is taken as a signature of departure from neutral evolution. Theoretical aspects of the frequency spectrum have been well studied and several interesting results are available, but they are usually under the assumption that a site has undergone at most one mutation event in the history of the sample. Here, we extend previous theoretical results by allowing for at most two mutation events per site, under a general finite allele model in which the mutation rate is independent of current allelic state but the transition matrix is otherwise completely arbitrary. Our results apply to both nested and nonnested mutations. Only the former has been addressed previously, whereas here we show it is the latter that is more likely to be observed except for very small sample sizes. Further, for any mutation transition matrix, we obtain the joint sample frequency spectrum of the two mutant alleles at a triallelic site, and derive a closed-form formula for the expected age of the younger of the two mutations given their frequencies in the population. Several large-scale resequencing projects for various species are presently under way and the resulting data will include some triallelic polymorphisms. The theoretical results described in this paper should prove useful in population genomic analyses of such data.
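
As a baseline for the extension described in this abstract, the classical single-mutation case can be written in a few lines. The sketch assumes a neutral, constant-size population with at most one mutation per site, under which the probability that the mutant allele appears i times in a sample of n, given that the site is polymorphic, is proportional to 1/i. It does not implement the two-mutation results of the paper; the function name is hypothetical.

```python
def neutral_sfs(n):
    """Sample frequency spectrum under the standard neutral model with at
    most one mutation per site: entry i-1 is P(mutant count = i | site is
    polymorphic) for i = 1, ..., n-1."""
    if n < 2:
        raise ValueError("need a sample of at least two alleles")
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)  # harmonic number H_{n-1}
    return [w / total for w in weights]


# Usage: spectrum for a sample of 10 alleles.
spectrum = neutral_sfs(10)
```

Neutrality tests of the kind mentioned in the abstract compare an observed spectrum with this expectation; an excess of singletons, for example, shifts weight toward the first entry.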

11.
In this paper, we investigated the genetic structure and distribution of allelic frequencies at the gametophytic self-incompatibility locus in three populations of Prunus avium L. In line with theoretical predictions under balancing selection, genetic structure at the self-incompatibility locus was almost three times lower than at seven unlinked microsatellites. Furthermore, we found that S-allele frequencies in wild cherry populations departed significantly from the expected isoplethic distribution towards which balancing selection is expected to drive allelic frequencies (i.e. identical frequency equal to the inverse of the number of alleles in the population). To assess whether this departure could be caused either by drift alone or by population structure, we used numerical simulations to compare our observations with allelic frequency distributions expected: (1) within a single deme from a subdivided population with various levels of differentiation; and (2) within a finite panmictic population with identical allelic diversity. We also investigated the effects of sample size and degree of population structure on tests of departure from isoplethic equilibrium. Overall, our results showed that the observed allele frequency distributions were consistent with a model of subdivided population with demes linked by moderate migration rate.

12.
The potential of association studies for fine-mapping loci with common disease susceptibility alleles for complex genetic diseases in outbred populations is unclear. For a battery of tightly linked anonymous genetic markers spanning a candidate region centered around a disease locus, simulation methods based on a coalescent process with mutation, recombination, and genetic drift were used to study the spatial distribution of markers with large noncentrality parameters in a case-control study design. Simulations with a disease allele at intermediate frequency, presumably representing an old mutation, tend to exhibit the largest noncentrality parameter values at markers near the disease locus. In contrast, simulations with a disease allele at low frequency, presumably representing a young mutation, often exhibit the largest noncentrality parameter values at markers scattered over the candidate region. In the former case, sample sizes or marker densities sufficient to detect association are likely to lead to useful localization, whereas, in the latter case, localization of the disease locus within the candidate region is much less likely, regardless of the sample size or density of the map. The simulations suggest that for a single marker analysis, the simple strategy of choosing the marker with smallest associated P value to begin a laboratory search for the disease locus performs adequately for a common disease allele.

13.
To identify possible genetic factors affecting human longevity we compared allele pools at two candidate loci for longevity between a sample of 143 centenarians (S) and a control sample of 158 individuals (C). The candidate loci were APOB and TPO, which code for apolipoprotein B and thyroid peroxidase, respectively. Both restriction fragment length (RFL) (XbaI2488 and EcoRI4154) and variable number of tandem repeat (VNTR) (3′APOB-VNTR) polymorphisms were analysed at the APOB locus; the TPO-VNTR polymorphism (intron 10) was analysed at the TPO locus. The main result of the investigation was that there is an association between the APOB locus and longevity that is revealed only when multiallelic polymorphisms are considered. In particular: (i) the frequency of 3′APOB-VNTR alleles with fewer than 35 repeats is significantly lower in cases than in controls; (ii) the linkage disequilibrium between the XbaI-RFLP and the EcoRI-RFLP is significantly different from 0 in cases but not in controls; (iii) the EcoRI-RFLP and XbaI-RFLP allele frequencies do not discriminate between cases and controls. The differences observed between case and control allele pools are specific to the APOB locus, since no significant difference was observed at the TPO locus. Received: 27 November 1995 / Revised: 24 July 1996

14.
A new method is proposed to adjust allele frequencies when allelic drop‐out is common. This method assumes Hardy–Weinberg equilibrium (HWE), and treats the problematic alleles as a one‐locus two‐allele system with dominance. By assuming that the homozygote frequency of the ‘recessive’ allele is measured correctly, we can back calculate the allele frequency of the ‘dominant’ allele, and adjust the heterozygote frequency accordingly. The drawback is that multilocus genotypes cannot be constructed and tests that use deviations from Hardy–Weinberg such as tests for bottlenecks become impossible. An example is given where a large homozygote excess (FIS = 0.44) is adjusted to a reasonable level (FIS = 0.046). The effect of scoring error was set in relation to sampling error and while FIS values can be seriously biased, FST values are not necessarily so, if scoring error and sample size are both low. As sample size increases, the effect of scoring error increases.
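
The back-calculation outlined in this abstract follows from Hardy-Weinberg proportions in a two-allele system with dominance. The sketch below is one reading of that description, not the published procedure: it assumes the observed frequency of 'recessive' homozygotes equals q², and the function name and dictionary keys are hypothetical.

```python
from math import sqrt


def adjust_for_dropout(recessive_homozygote_freq):
    """Back-calculate allele and genotype frequencies under HWE from the
    frequency of 'recessive' homozygotes, assumed to be scored correctly."""
    if not 0.0 <= recessive_homozygote_freq <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    q = sqrt(recessive_homozygote_freq)  # 'recessive' allele frequency
    p = 1.0 - q                          # 'dominant' allele frequency
    return {
        "p_dominant": p,
        "q_recessive": q,
        "hom_dominant": p * p,
        "het": 2.0 * p * q,              # adjusted heterozygote frequency
        "hom_recessive": q * q,
    }


# Usage: 25% of individuals scored as 'recessive' homozygotes.
adjusted = adjust_for_dropout(0.25)
```

Because the adjusted genotype frequencies are generated from Hardy-Weinberg proportions, they cannot then be used to test for departures from those proportions, which is the drawback the abstract points out.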

15.
Replication has become the gold standard for assessing statistical results from genome-wide association studies. Unfortunately this replication requirement may cause real genetic effects to be missed. A real result can fail to replicate for numerous reasons including inadequate sample size or variability in phenotype definitions across independent samples. In genome-wide association studies the allele frequencies of polymorphisms may differ due to sampling error or population differences. We hypothesize that some statistically significant independent genetic effects may fail to replicate in an independent dataset when allele frequencies differ and the functional polymorphism interacts with one or more other functional polymorphisms. To test this hypothesis, we designed a simulation study in which case-control status was determined by two interacting polymorphisms with heritabilities ranging from 0.025 to 0.4 with replication sample sizes ranging from 400 to 1600 individuals. We show that the power to replicate the statistically significant independent main effect of one polymorphism can drop dramatically with a change of allele frequency of less than 0.1 at a second interacting polymorphism. We also show that differences in allele frequency can result in a reversal of allelic effects where a protective allele becomes a risk factor in replication studies. These results suggest that failure to replicate an independent genetic effect may provide important clues about the complexity of the underlying genetic architecture. We recommend that polymorphisms that fail to replicate be checked for interactions with other polymorphisms, particularly when samples are collected from groups with distinct ethnic backgrounds or different geographic regions.

16.
Summary: The main purpose of germplasm banks is to preserve the genetic variability existing in crop species. The effectiveness of the regeneration of collections stored in gene banks is affected by factors such as sample size, random genetic drift, and seed viability. The objective of this paper is to review probability models and population genetics theory to determine the choice of sample size used for seed regeneration. A number of conclusions can be drawn from the results. First, the size of the sample depends largely on the frequency of the least common allele or genotype. Genotypes or alleles occurring at frequencies of more than 10% can be preserved with a sample size of 40 individuals. A sample size of 100 individuals will preserve genotypes (alleles) that occur at frequencies of 5%. If the frequency of rare genotypes (alleles) drops below 5%, larger sample sizes are required. A second conclusion is that for two, three, and four alleles per locus the sample size required to include a copy of each allele depends more on the frequency of the rare allele or alleles than on the number. Samples of 300 to 400 are required to preserve alleles that are present at a frequency of 1%. Third, if seed is bulked, the expected number of parents involved in any sample drawn from the bulk will be less than the number of parents included in the bulk. Fourth, to maintain a rate of breeding (F) of 1%, the effective population size (Ne) should be at least 150 for three alleles, and 300 for four alleles. Fifth, equalizing the reproductive output of each family to two progeny doubles the effective size of the population. Based on the results presented here, a practical option is considered for regenerating maize seed in a program constrained by limited funds. Part of this paper was presented at the Global Maize Germplasm Workshop, CIMMYT, El Batan, Mexico, March 6–12, 1988
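
The dependence of sample size on the frequency of the rarest variant can be illustrated with the simplest possible probability model. The sketch assumes each sampled unit independently carries the variant with probability equal to its frequency, so the chance of retaining at least one copy in n units is 1 - (1 - f)^n. This is an assumption made for illustration and may differ from the models reviewed in the paper; the function names and the 95% target are hypothetical.

```python
from math import ceil, log


def prob_retained(freq, n):
    """Probability that a variant at frequency freq appears at least once
    among n independently sampled units."""
    return 1.0 - (1.0 - freq) ** n


def min_sample_size(freq, target):
    """Smallest n for which prob_retained(freq, n) reaches the target."""
    if not (0.0 < freq < 1.0 and 0.0 < target < 1.0):
        raise ValueError("freq and target must lie strictly between 0 and 1")
    return ceil(log(1.0 - target) / log(1.0 - freq))


# Usage: sample sizes for variants at 10%, 5% and 1%, with a 95% target.
sizes = {f: min_sample_size(f, 0.95) for f in (0.10, 0.05, 0.01)}
```

Under this simple model a 1% variant needs roughly 300 units for a 95% chance of retention, which is of the same order as the 300 to 400 quoted in the abstract; the required size grows roughly in inverse proportion to the frequency of the rarest variant.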

17.
The highly polymorphic minisatellites contain a variable number of tandemly repeated (VNTR) DNA sequences. They are extremely useful and informative markers to study genetic variation among human populations. We have analysed the allele frequency distribution at the highly polymorphic apolipoprotein B (Apo B) VNTR locus in order to obtain the population data for the Cukurova region in Turkey by using the polymerase chain reaction and polyacrylamide gel electrophoresis. We observed 10 different alleles and 21 genotypes in a sample of 100 unrelated individuals. The allele frequencies ranged from 0.01 to 0.4, with an expected heterozygosity of 0.69 for the Apo B locus. Alleles 37 (frequency = 0.4) and 35 (frequency = 0.17) were the most common in the Cukurova population. There was a significant deviation from the Hardy-Weinberg equilibrium (HWE) for genotype frequencies (χ² = 29.12; df = 1; p = 0.000). This study is novel in that it is the first DNA polymorphism study conducted in the Cukurova population using an Apo B minisatellite locus.
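
Expected heterozygosity, as reported in this abstract, is computed from allele frequencies as one minus the sum of their squares. The sketch below shows that calculation with an optional small-sample correction factor 2n/(2n - 1). Only the two largest frequencies (0.40 and 0.17) come from the abstract; the other eight values in `example_freqs` are invented so that the list sums to one, so the result is not expected to equal the published 0.69.

```python
def expected_heterozygosity(freqs, n_individuals=None):
    """Expected heterozygosity 1 - sum(p_i^2); if n_individuals is given,
    apply the small-sample correction 2n / (2n - 1)."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    h = 1.0 - sum(p * p for p in freqs)
    if n_individuals is not None:
        two_n = 2 * n_individuals
        h *= two_n / (two_n - 1)
    return h


# Hypothetical ten-allele distribution; only 0.40 and 0.17 are from the abstract.
example_freqs = [0.40, 0.17, 0.15, 0.10, 0.08, 0.05, 0.02, 0.01, 0.01, 0.01]
h_example = expected_heterozygosity(example_freqs, n_individuals=100)
```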

18.
The green–brown polymorphism of grasshoppers and bush-crickets represents one of the most penetrant polymorphisms in any group of organisms. This poses the question of why the polymorphism is shared across species and how it is maintained. There is mixed evidence for whether and in which species it is environmentally or genetically determined in Orthoptera. We report breeding experiments with the steppe grasshopper Chorthippus dorsatus, a polymorphic species for the presence and distribution of green body parts. Morph ratios did not differ between sexes, and we find no evidence that the rearing environment (crowding and habitat complexity) affected the polymorphism. However, we find strong evidence for genetic determination for the presence/absence of green and its distribution. Results are most parsimoniously explained by three autosomal loci with two alleles each and simple dominance effects: one locus influencing the ability to show green color, with a dominant allele for green; a locus with a recessive allele suppressing green on the dorsal side; and a locus with a recessive allele suppressing green on the lateral side. Our results contribute to the emerging contrast between the simple genetic inheritance of green–brown polymorphisms in the subfamily Gomphocerinae and environmental determination in other subfamilies of grasshoppers. In three out of four species of Gomphocerinae studied so far, the results suggest one or a few loci with a dominance of alleles allowing the occurrence of green. This supports the idea that brown individuals differ from green individuals by homozygosity for loss-of-function alleles preventing green pigment production or deposition. Subject terms: Quantitative trait loci, Quantitative trait

19.
The dinucleotide (TG)n interspersed repetitive sequences are the most abundant microsatellites in the human genome. Using the polymerase chain reaction to amplify a (TG)n(AG)m microsatellite in the first intron of the apo C-II gene, we have detected 15 different alleles in 242 unrelated individuals of French ancestry. The heterozygosity index was 0.85 and codominant Mendelian inheritance of the alleles was observed in individuals from 121 nuclear families. We report that polymorphism at this locus is attributable to length variation at both (TG)n and (AG)m motifs, although the (AG)m motif contains only two alleles differing by one repeat unit. A quadrimodal allele frequency distribution was observed at the (TG)n(AG)m locus. Each of the first three modes comprises one frequent allele and one very rare allele adjacent in size. No alleles of intermediate size were found between the first three modes. The fourth mode encompasses nine alleles that span from 27 to 35 repeat units. We suggest that this distribution reflects the molecular mechanisms by which alleles give rise to one another.

20.
S P Huang  B S Weir 《Genetics》2001,159(3):1365-1373
Previously reported methods for estimating the number of different alleles at a single locus in a population have not described a useful general result. Using the number of alleles observed in a sample gives an underestimate for the true number of alleles. The similar problem of estimating the number of species in a population was first investigated in 1943. In this article we use the sample coverage method proposed by Chao and Lee in 1992 to estimate the number of alleles in a population when there are unequal allele frequencies. Simulation studies under the recurrent mutation model show that, for reasonable sample sizes, a significantly better estimate of the true number can be obtained than that using only the observed alleles. Results under the stepwise mutation model and infinite-allele model are presented. Possible applications include improving the characterization of the prior distribution for the allele frequencies, adjusting the estimates of genetic diversity, and estimating the range of microsatellite alleles.
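
The sample coverage idea referred to in this abstract can be sketched as follows. The code implements a coverage-based estimator in the form in which the Chao and Lee approach is commonly written: coverage is estimated as one minus the proportion of singletons, and the observed allele count is inflated by the coverage and by an estimated squared coefficient of variation of the allele frequencies. The exact form should be checked against the original 1992 paper before any serious use; the function name and the example counts are hypothetical.

```python
from collections import Counter


def coverage_estimate(allele_counts):
    """Coverage-based estimate of the total number of alleles in the
    population from the copy counts of alleles seen in a sample."""
    counts = [c for c in allele_counts if c > 0]
    n = sum(counts)   # number of gene copies sampled
    d = len(counts)   # number of distinct alleles observed
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    freq_of_freq = Counter(counts)        # f_i: alleles seen exactly i times
    singletons = freq_of_freq.get(1, 0)
    coverage = 1.0 - singletons / n
    if coverage <= 0.0:
        raise ValueError("every allele is a singleton; coverage is zero")
    base = d / coverage
    # Estimated squared coefficient of variation, truncated at zero.
    spread = sum(i * (i - 1) * f for i, f in freq_of_freq.items())
    gamma_sq = max(base * spread / (n * (n - 1)) - 1.0, 0.0)
    return base + n * (1.0 - coverage) / coverage * gamma_sq


# Usage with made-up counts: four alleles observed, two of them only once.
estimate = coverage_estimate([10, 5, 1, 1])
```

When no allele is seen only once, the estimated coverage is one and the estimate reduces to the observed number of alleles; singletons are what push the estimate above the observed count.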
