Similar literature
1.
Multilocus genotype probabilities, estimated using the assumption of independent association of alleles within and across loci, are subject to sampling fluctuation, since allele frequencies used in such computations are derived from samples drawn from a population. We derive exact sampling variances of estimated genotype probabilities and provide a simple approximation of the sampling variances. Computer simulations conducted using real DNA typing data indicate that, while the sampling distribution of estimated genotype probabilities is not symmetric around the point estimate, the confidence interval of estimated (single-locus or multilocus) genotype probabilities can be obtained from the sampling distribution of a logarithmic transformation of the estimated values. This, in turn, allows an examination of heterogeneity of estimators derived from data on different reference populations. Applications of this theory to DNA typing data at VNTR loci suggest that use of different reference population data may yield significantly different estimates. However, significant differences generally occur with rare (less than 1 in 40,000) genotype probabilities. Conservative estimates of five-locus DNA profile probabilities are always less than 1 in 1 million in an individual from the United States, irrespective of the racial/ethnic origin.

2.
Independence of VNTR Alleles Defined as Fixed Bins   Cited by: 22 (self-citations: 0; citations by others: 22)
B. S. Weir 《Genetics》1992,130(4):873-887
An analysis is presented of data collected by the Federal Bureau of Investigation at six unlinked variable number of tandem repeats (VNTR) loci for the United States population. Databases have been constructed of VNTR profiles of Caucasians, Blacks and Hispanics from Florida, Texas and California. There was very little evidence for correlations between lengths for pairs of VNTR fragments, within or between loci. When the fragment lengths were amalgamated into discrete bins, there was also little evidence for disequilibrium over all genotypes, within or between loci, for the Caucasian database, although some disequilibrium was found for the Black and Hispanic databases. No disequilibrium was found for the Caucasian or Black databases when tests were confined to heterozygous individuals. In cases of global disequilibrium, local tests can be applied to specific genotypes. The results suggest that, at the bin level, frequencies of VNTR profiles can generally be estimated as the products of the frequencies of the constituent elements. This overcomes the problem of estimating population frequencies when any particular profile does not exist in the database. There is some evidence for different frequencies, at the individual bin level, between geographic samples within each of the Caucasian, Black and Hispanic databases, and considerable evidence for differences between the three databases. These differences are less evident for the frequencies of four-locus profiles.  相似文献   
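The statement that profile frequencies can be estimated as products of the frequencies of their constituent elements can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions within each locus and independence between loci; the bin frequencies and the function names are hypothetical, and the treatment of single-band patterns as true homozygotes is a simplification.

```python
from math import prod

def locus_frequency(p_a, p_b=None):
    """Genotype frequency at one locus from bin frequencies.

    Two bins -> heterozygote, 2 * p_a * p_b.
    One bin  -> treated here as a homozygote, p_a ** 2 (a simplifying assumption).
    """
    if p_b is None:
        return p_a ** 2
    return 2.0 * p_a * p_b

def profile_frequency(loci):
    """Multiply single-locus genotype frequencies across unlinked loci."""
    return prod(locus_frequency(*bins) for bins in loci)

# Hypothetical three-locus profile: heterozygote, single-band, heterozygote.
profile = [(0.10, 0.05), (0.20,), (0.02, 0.08)]
print(profile_frequency(profile))
```

Because the estimate is a product of bin frequencies, it is defined even for a profile that never appears in the database, which is the practical point made in the abstract.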

3.
Many previous studies have fit lumped parameter models to respiratory input (Zin) and transfer (Ztr) impedance data. For frequency ranges higher than 4-32 Hz, a six-element model may be required in which an airway branch (with a resistance and inertance) is separated from a tissue branch (with a resistance, inertance, and compliance) by a shunt compliance. A sensitivity analysis is applied to predict the effects of frequency range on the accuracy of parameter estimates in this model obtained from Zin or Ztr data. Using a parameter set estimated from experimental data between 4 and 64 Hz in dogs, both Zin and Ztr were simulated from 4 to 200 Hz. Impedance sensitivity to each parameter was also calculated over this frequency range. The simulation predicted that for Zin a second resonance occurs near 80 Hz and that the impedance is considerably more sensitive to several of the parameters at frequencies surrounding this resonance than at any other frequencies. Also, unless data is obtained at very high frequencies (where the model is suspect), Zin data provides more accurate estimates than Ztr data. After adding random noise to the simulated Zin data, we attempted to extract the original parameters by using a nonlinear regression applied to three frequency ranges: 4-32, 4-64, and 4-110 Hz. Estimated parameters were substantially incorrect when using only 4- to 32-Hz or 4- to 64-Hz data, but nearly correct when fitting 4- to 110-Hz data. These results indicate that respiratory system parameters can be more accurately extracted from Zin than Ztr, and to make physiological inferences from parameter estimates based on Zin impedance data in dogs, the data must include frequencies surrounding the second resonance.  相似文献   
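The six-element topology described above (an airway branch in series with the parallel combination of a shunt compliance and a tissue branch) can be sketched as an input impedance calculation. This is an illustrative sketch only: the function name, the units, and the parameter values are assumptions, not the values estimated in the study.

```python
import math

def zin_six_element(freq_hz, r_aw, i_aw, c_g, r_t, i_t, c_t):
    """Input impedance of a six-element lumped model at one frequency.

    Airway branch (r_aw, i_aw) in series with the parallel combination of
    the shunt compliance c_g and the tissue branch (r_t, i_t, c_t).
    """
    w = 2.0 * math.pi * freq_hz
    z_aw = complex(r_aw, w * i_aw)                   # airway resistance + inertance
    z_g = 1.0 / (1j * w * c_g)                       # shunt compliance
    z_t = complex(r_t, w * i_t - 1.0 / (w * c_t))    # tissue R, I and C in series
    return z_aw + (z_g * z_t) / (z_g + z_t)

# Hypothetical parameter values, chosen only to exercise the function.
params = dict(r_aw=1.5, i_aw=0.01, c_g=0.002, r_t=1.0, i_t=0.005, c_t=0.05)
for f in (4, 32, 64, 110, 200):
    z = zin_six_element(f, **params)
    print(f, round(z.real, 3), round(z.imag, 3))
```

Evaluating the function on a frequency grid and perturbing one parameter at a time gives a simple numerical version of the sensitivity analysis the abstract refers to.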

4.
Using striped bass (Morone saxatilis) and six multiplexed microsatellite markers, we evaluated procedures for estimating allele frequencies by pooling DNA from multiple individuals, a method suggested as cost-effective relative to individual genotyping. Using moment-based estimators, we estimated allele frequencies in experimental DNA pools and found that the three primary laboratory steps, DNA quantitation and pooling, PCR amplification, and electrophoresis, accounted for 23, 48, and 29%, respectively, of the technical variance of estimates in pools containing DNA from 2-24 individuals. Exact allele-frequency estimates could be made for pools of sizes 2-8, depending on the locus, by using an integer-valued estimator. Larger pools of size 12 and 24 tended to yield biased estimates; however, replicates of these estimates detected allele frequency differences among pools with different allelic compositions. We also derive an unbiased estimator of Hardy-Weinberg disequilibrium coefficients that uses multiple DNA pools and analyze the cost-efficiency of DNA pooling. DNA pooling yields the most potential cost savings when a large number of loci are employed using a large number of individuals, a situation becoming increasingly common as microsatellite loci are developed in increasing numbers of taxa.  相似文献   

5.
The development of molecular typing techniques applied to the study of population genetic diversity generates data of increasing precision but at the cost of some ambiguities. As distinct techniques may produce distinct kinds of ambiguities, a crucial issue is to assess the differences between frequency distributions estimated from data produced by alternative techniques for the same sample. To that aim, we developed a resampling scheme that makes it possible to evaluate, by statistical means, the significance of the difference between two frequency distributions. The same approach is then shown to be applicable to test selective neutrality when only sample frequencies are known. The use of these original methods is presented here through an application to the genetic study of a Munda human population sample, where three different HLA loci were typed using two different molecular methods (reverse PCR-SSO typing on microbeads arrays based on Luminex technology and PCR-SSP typing), as described in detail in the companion article by Riccio et al. [The Austroasiatic Munda population from India and its enigmatic origin: An HLA diversity study. Hum. Biol. 38:405-435 (2011)]. The differences between the frequency estimates of the two typing techniques were found to be smaller than those resulting from sampling. Overall, we show that using a resampling scheme in validating frequency estimates is effective when alternative frequency estimates are available. Moreover, resampling appears to be the only way to test selective neutrality when only frequency data are available to describe the genetic structure of populations.
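The idea of comparing the difference between two frequency estimates with the differences produced by sampling alone can be sketched as follows. This is not the authors' scheme, whose details are not given here: the distance measure, the function names, and the choice to resample from the first distribution are all assumptions made for illustration.

```python
import random

def l1_distance(p, q):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def resampling_p_value(freq_a, freq_b, n_genes, n_resamples=2000, seed=0):
    """Share of resamples whose distance to freq_a is at least the observed one.

    Each resample draws n_genes gene copies from freq_a, so a large value
    means the two estimates differ by less than sampling alone would produce.
    """
    rng = random.Random(seed)
    observed = l1_distance(freq_a, freq_b)
    alleles = range(len(freq_a))
    hits = 0
    for _ in range(n_resamples):
        draw = rng.choices(alleles, weights=freq_a, k=n_genes)
        resampled = [draw.count(i) / n_genes for i in alleles]
        if l1_distance(freq_a, resampled) >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical frequency estimates from two typing techniques on one sample.
technique_1 = [0.50, 0.30, 0.20]
technique_2 = [0.48, 0.31, 0.21]
print(resampling_p_value(technique_1, technique_2, n_genes=100))
```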

6.
Because primary data collection can be expensive, researchers are increasingly using information collected in medical administrative databases for scientific purposes. This information, however, is typically collected for reasons other than research, and many such databases have been shown to contain substantial proportions of misclassification errors. For example, many administrative databases contain fields for patient diagnostic codes, but these are often missing or inaccurate, in part because physician reimbursement schemes depend on medical acts performed rather than any diagnosis. Errors in ascertaining which individuals have a given disease bias not only prevalence estimates, but also estimates of associations between the disease and other variables, such as medication use. We attempt to estimate the prevalence of osteoarthritis (OA) among elderly Quebeckers using a government administrative database. We compare a naive estimate relying solely on the physician diagnoses of OA listed in the database to estimates from several different Bayesian latent class models which adjust for misclassified physician diagnostic codes via use of other available diagnostic clues. We find that the prevalence estimates vary widely, depending on the model used and assumptions made. We conclude that any inferences from these databases need to be interpreted with great caution, until further work estimating the reliability of database items is carried out.  相似文献   

7.
The GPR120 gene (also known as FFAR4 or O3FAR1) encodes a functional omega-3 fatty acid receptor/sensor that mediates potent insulin-sensitizing effects by repressing macrophage-induced tissue inflammation. For its functional role, GPR120 could be considered a potential target gene in animal nutrigenetics. In this work we resequenced the porcine GPR120 gene by high throughput Ion Torrent semiconductor sequencing of amplified fragments obtained from 8 DNA pools derived, on the whole, from 153 pigs of different breeds/populations (two Italian Large White pools, Italian Duroc, Italian Landrace, Casertana, Pietrain, Meishan, and wild boars). Three single nucleotide polymorphisms (SNPs), two synonymous substitutions and one in the putative 3′-untranslated region (g.114765469C > T), were identified and their allele frequencies were estimated from sequencing read counts. The g.114765469C > T SNP was also genotyped by PCR-RFLP, confirming the estimated frequency in Italian Large White pools. Then, this SNP was analyzed in two Italian Large White cohorts using a selective genotyping approach based on extreme and divergent pigs for back fat thickness (BFT) estimated breeding value (EBV) and average daily gain (ADG) EBV. Significant differences in allele and genotype frequency distributions were observed between the extreme ADG-EBV groups (P < 0.001), whereas this marker was not associated with BFT-EBV.

8.
Sequencing pools of individuals rather than individuals separately reduces the costs of estimating allele frequencies at many loci in many populations. Theoretical and empirical studies show that sequencing pools comprising a limited number of individuals (typically fewer than 50) provides reliable allele frequency estimates, provided that the DNA pooling and DNA sequencing steps are carefully controlled. Unequal contributions of different individuals to the DNA pool and the mean and variance in sequencing depth both can affect the standard error of allele frequency estimates. To our knowledge, no study separately investigated the effect of these two factors on allele frequency estimates; so that there is currently no method to a priori estimate the relative importance of unequal individual DNA contributions independently of sequencing depth. We develop a new analytical model for allele frequency estimation that explicitly distinguishes these two effects. Our model shows that the DNA pooling variance in a pooled sequencing experiment depends solely on two factors: the number of individuals within the pool and the coefficient of variation of individual DNA contributions to the pool. We present a new method to experimentally estimate this coefficient of variation when planning a pooled sequencing design where samples are either pooled before or after DNA extraction. Using this analytical and experimental framework, we provide guidelines to optimize the design of pooled sequencing experiments. Finally, we sequence replicated pools of inbred lines of the plant Medicago truncatula and show that the predictions from our model generally hold true when estimating the frequency of known multilocus haplotypes using pooled sequencing.  相似文献   

9.
We provide experimental evidence showing that, during the restriction-enzyme digestion of DNA samples, some of the HaeIII-digested DNA fragments are small enough to prevent their reliable sizing on a Southern gel. As a result of such nondetectability of DNA fragments, individuals who show a single-band DNA profile at a VNTR locus may not necessarily be true homozygotes. In a population database, when the presence of such nondetectable alleles is ignored, we show that a pseudodependence of alleles within as well as across loci may occur. Using a known statistical method, under the hypothesis of independence of alleles within loci, we derive an efficient estimate of null allele frequency, which may be subsequently used for testing allelic independence within and across loci. The estimates of null allele frequencies, thus derived, are shown to agree with direct experimental data on the frequencies of HaeIII-null alleles. Incorporation of null alleles into the analysis of the forensic VNTR database suggests that the assumptions of allelic independence within and between loci are appropriate. In contrast, a failure to incorporate the occurrence of null alleles would provide a wrong inference regarding the independence of alleles within and between loci.  相似文献   

10.
Measurement of temporal change in allele frequencies represents an indirect method for estimating the genetically effective size of populations. When allele frequencies are estimated for gene markers that display dominant gene expression, such as random amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) markers, the estimates can be seriously biased. We quantify bias for previous allele frequency estimators and present a new expression that is generally less biased and provides a more precise assessment of temporal allele frequency change. We further develop an estimator for effective population size that is appropriate when dealing with dominant gene markers. Comparison with estimates based on codominantly expressed genes, such as allozymes or microsatellites, indicates that about twice as many loci or sampled individuals are required when using dominant markers to achieve the same precision.
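Why dominant markers give biased allele frequency estimates can be seen with the familiar square-root estimator. The sketch below assumes Hardy-Weinberg proportions, so that band-absent (recessive) individuals occur with probability q², and computes the exact expectation of the estimator sqrt(x/n) over the binomial distribution of x. It is offered as an illustration of the bias and is not necessarily one of the estimators examined in the paper; the function name is hypothetical.

```python
from math import comb, sqrt

def expected_sqrt_estimate(q, n):
    """Exact expectation of sqrt(x / n), where x ~ Binomial(n, q**2).

    x is the number of band-absent individuals among n sampled,
    assuming Hardy-Weinberg proportions at a dominant marker.
    """
    p_null = q * q
    return sum(
        comb(n, k) * p_null ** k * (1.0 - p_null) ** (n - k) * sqrt(k / n)
        for k in range(n + 1)
    )

# The estimator falls short of the true q, most visibly in small samples.
for n in (10, 50, 200):
    print(n, round(expected_sqrt_estimate(0.3, n), 4))
```

The shortfall follows from the concavity of the square root: the expectation of sqrt(x/n) cannot exceed the square root of the expectation of x/n, which is q.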

11.
Microarray-based pooled DNA methods overcome the cost bottleneck of simultaneously genotyping more than 100000 markers for numerous study individuals. The success of such methods relies on the proper adjustment of preferential amplification/hybridization to ensure accurate and reliable allele frequency estimation. We performed a hybridization-based genome-wide single nucleotide polymorphisms (SNPs) genotyping analysis to dissect preferential amplification/hybridization. The majority of SNPs had less than 2-fold signal amplification or suppression, and the lognormal distributions adequately modeled preferential amplification/hybridization across the human genome. Comparative analyses suggested that the distributions of preferential amplification/hybridization differed among genotypes and the GC content. Patterns among different ethnic populations were similar; nevertheless, there were striking differences for a small proportion of SNPs, and a slight ethnic heterogeneity was observed. To fulfill appropriate and gratuitous adjustments, databases of preferential amplification/hybridization for African Americans, Caucasians and Asians were constructed based on the Affymetrix GeneChip Human Mapping 100 K Set. The robustness of allele frequency estimation using this database was validated by a pooled DNA experiment. This study provides a genome-wide investigation of preferential amplification/hybridization and suggests guidance for the reliable use of the database. Our results constitute an objective foundation for theoretical development of preferential amplification/hybridization and provide important information for future pooled DNA analyses.  相似文献   

12.
Various conventional methods to estimate the mean and median power spectral frequencies, and amplitude of the surface electromyogram during 30-90 min, cyclic, force-varying, constant-posture contractions were cross-compared in an experimental trial. The aim was to determine the most appropriate algorithm implementations and reduce the total number of algorithms that need to be considered when monitoring time trends. Subjects produced hand-grip contractions in a repeated intermittent pattern until exhaustion. For all estimated parameters: analysis of contraction levels below 25% maximum voluntary contraction produced poor estimates due to high relative measurement noise; parameter reproducibility was best when comparisons were aligned to the actual force produced rather than the target force and when the biomechanics of the contraction were more consistent; and estimates were not greatly influenced by the rate of change of the force trajectory. For frequency parameters: estimates based on the short-time Fourier transform were similar to those based on time-varying autoregressive methods; longer duration analysis windows exhibited better repeatability; and simple frequency-domain noise filters were not effective in reducing the impact of measurement noise. For amplitude estimates: whitening reduced the variance of the amplitude estimate; and the best analysis window duration was a trade-off between bias (decreased with a short duration window) and variance (decreased with a long duration window).  相似文献   
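The two frequency parameters compared in this study have simple definitions on a discrete power spectrum: the mean frequency is the power-weighted average of the bin frequencies, and the median frequency is the frequency that splits the total power in half. The sketch below assumes the spectrum has already been estimated (for example by a short-time Fourier transform) and uses made-up values; the median is returned at bin resolution, without interpolation.

```python
def mean_frequency(freqs, power):
    """Power-weighted mean of the bin frequencies."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)

def median_frequency(freqs, power):
    """First bin frequency at which cumulative power reaches half the total."""
    half = 0.5 * sum(power)
    cumulative = 0.0
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= half:
            return f
    return freqs[-1]  # guard against floating-point shortfall

# Hypothetical four-bin spectrum (frequencies in Hz, arbitrary power units).
freqs = [10.0, 20.0, 30.0, 40.0]
power = [1.0, 1.0, 4.0, 2.0]
print(mean_frequency(freqs, power), median_frequency(freqs, power))
```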

13.
Allele-rich VNTR loci provide valuable information for forensic inference. Interpretation of this information is complicated by measurement error, which renders discrete alleles difficult to distinguish. Two methods have been used to circumvent this difficulty: binning methods and direct evaluation of allele frequencies, the latter achieved by modeling the data as a mixture distribution. We use this modeling approach to estimate the allele frequency distributions for two loci, D17S79 and D2S44, for black, Caucasian, and Hispanic samples from the Lifecodes and FBI data bases. The data bases are differentiated by the restriction enzyme used: PstI (Lifecodes) and HaeIII (FBI). Our results show that alleles common in one ethnic group are almost always common in all ethnic groups, and likewise for rare alleles; this pattern holds for both loci. Gene diversity, or heterozygosity, measured as one minus the sum of the squared allele frequencies, is greater for D2S44 than for D17S79, in both data bases. The average gene diversity across ethnic groups when PstI (HaeIII) is used is .918 (.918) for D17S79 and is .985 (.983) for D2S44. The variance in gene diversity among ethnic groups is greater for D17S79 than for D2S44. The number of alleles, like the gene diversity, is greater for D2S44 than for D17S79. The mean numbers of alleles across ethnic groups, estimated from the PstI (HaeIII) data, are 40.25 (41.5) for D17S79 and 104 (103) for D2S44. The number of alleles is correlated with sample size. We use the estimated allele frequency distributions for each ethnic group to explore the effects of unwittingly mixing populations and thereby violating independence assumptions. We show that, even in extreme cases of mixture, the estimated genotype probabilities are good estimates of the true probabilities, contradicting recent claims. Because the binning methods currently used for forensic inference show even less differentiation among ethnic groups, we conclude that mixture has little or no impact on the use of VNTR loci for forensics.
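Gene diversity as defined in this abstract, one minus the sum of the squared allele frequencies, is straightforward to compute. The sketch below uses hypothetical frequencies and a hypothetical function name; it also shows the value for k equally frequent alleles, 1 - 1/k, which is the largest diversity attainable with k alleles.

```python
def gene_diversity(freqs):
    """One minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

# Hypothetical three-allele locus.
print(gene_diversity([0.5, 0.3, 0.2]))

# k equally frequent alleles give the maximum for that k, namely 1 - 1/k.
k = 40
print(gene_diversity([1.0 / k] * k))
```

Since diversity cannot exceed 1 - 1/k, a value such as .985 implies at least 67 alleles, which is in line with the much larger allele count reported for D2S44 than for D17S79.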

14.
Carrier frequencies for the allele(s) causing Sandhoff disease have been estimated for the U.S. Jewish and non-Jewish populations. The estimates have been made directly, with data from 22,043 Jewish and 32,342 non-Jewish individuals measured for total serum hexosaminidase activity and the heat-labile fraction. These values have been shown to identify potential carriers of the Sandhoff allele(s) with 95% sensitivity. Subsequent leukocyte assays of total hexosaminidase activity and the heat-labile fraction in those identified in serum tests have been shown to provide a much finer discrimination between those who carry the allele(s) and those who do not. Results from such assays were used to generate these carrier frequency estimates. Carrier frequency estimates have also been made indirectly from Sandhoff disease incidence data collected during the period 1979-84. These estimates are in agreement with data for the Jewish population under analysis, but in the non-Jewish population the estimate derived from data on screened individuals is greater than the estimate derived from incidence figures. The possible causes for such a difference are discussed. In a study of non-Jewish individuals each of whose grandparents derives from a single country of origin, the distribution of countries among Sandhoff disease carriers differs significantly from that in the non-Jewish sample under analysis, indicating possible ethnic groups with increased or decreased carrier frequencies. These analyses suggest an increased Sandhoff disease carrier frequency among Mexican and Central-American populations and a decreased carrier frequency among non-Jewish German populations.  相似文献   

15.
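Accurate calculation of farmland soil phosphorus storage is important for sustainable agricultural development and for the control of non-point source pollution, but previous studies of phosphorus storage have not considered the estimation errors caused by different soil data sources and mapping scales. Taking about 393×10^4 hm² of upland in 29 counties (cities) of northern Jiangsu as an example, this paper analyzes how soil databases at scales of 1:50,000, 1:250,000, 1:500,000, 1:1,000,000, 1:4,000,000 and 1:10,000,000, built from the soil profile data recorded in the County Soil Species Records, the Prefecture-level City Soil Species Records, the Provincial Soil Species Records and the Soil Species of China, respectively, affect the estimation of soil total phosphorus storage. The results show that, compared with the total phosphorus density and storage obtained at the 1:50,000 scale from the County Soil Species Records, the most detailed source with 983 recorded soil profiles, the relative deviations of total phosphorus density and storage estimated from the databases built at each scale from the other soil data sources ranged from 4.8% to 48.9% and from 1.6% to 48.4%, respectively. Total phosphorus densities at the different scales built from most of the county-level and prefecture-level data sources differed highly significantly or significantly from that of the county-level 1:50,000 database, and those built from the provincial and national data sources all differed highly significantly from it. This indicates that choosing an appropriate mapping scale and soil data source is essential in studies estimating upland phosphorus storage.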

16.
There is interest in general population screening for hemochromatosis and other primary iron overload disorders, although not all persons are at equal risk. We developed a model to estimate the numbers of persons in national, racial, or ethnic population subgroups in Jefferson County, Alabama, who would be detected using transferrin saturation (phenotype) or HFE mutation analysis (genotype) screening. Approximately 62% are Caucasians, 37% are African Americans, and the remainder are Hispanics, Asians, or Native Americans. The predicted phenotype frequencies are greatest in a Caucasian subgroup, ethnicity unspecified, which consists predominantly of persons of Scottish and Irish descent (0.0065 men, 0.0046 women), and in African Americans (0.0089 men, 0.0085 women). Frequencies of the HFE genotype C282Y/C282Y ≥ 0.0001 are predicted to occur only among Caucasians; the greatest frequency (0.0080) was predicted to occur in the ethnicity-unspecified Caucasian population. C282Y/C282Y frequency estimates were lower in Italian, Greek, and Jewish subgroups. There is excellent agreement in the numbers of the ethnicity-unspecified Caucasians who would be detected using phenotype and genotype criteria. Our model also indicates that phenotyping would identify more persons with primary iron overload than would genotyping in our Italian Caucasian, Hispanic, and African American subgroups. This is consistent with previous observations that indicate that primary iron overload disorders in persons of southern Italian descent and African Americans are largely attributable to non-HFE alleles. Because the proportions of population subgroups and their genetic constitution may differ significantly in other geographic regions, we suggest that models similar to the present one be constructed to predict optimal screening strategies for primary iron overload disorders.

17.
Population genetic studies in Australian, Assamese, Cambodian, Chinese, Caucasian and Melanesian populations were performed with several highly polymorphic DNA loci. Results showed that the Caucasian and Chinese populations had the highest level of heterozygosity. The size range of the majority of the polymorphic DNA fragments of a locus was the same in the different populations. The distinguishing feature of each ethnic group was the relative frequency of a particular set or group of alleles. For example, alleles greater than 9.0 kb in size, in D14S13, or from 4.5 to 4.7 kb, in D18S27, were less than half as frequent in Caucasians as in the other populations. Overall, there were groups of alleles, at one or more loci, whose frequencies were different among some of the ethnic groups and therefore could be used to differentiate one group from the other.

18.
T(1;13)70H/+ translocation heterozygous mice were used for assessing heritability values for chiasma frequencies and the epididymal sperm count. The chiasma frequency estimates were based on 15 son-sire pairs, the translocation heterozygotes being maintained in a Swiss random-bred genetic background. The chiasma frequencies were scored separately for the T70H/+ derived multivalent, specific pairing segments within the multivalent and the remaining bivalents. Chiasma counts within these specified parts of the genome were positively correlated. The heritability estimates, significantly greater than zero, ranged from 0.78 to 0.98, depending on the chromosome segments included. These results indicate a strong genetic control on a cellular basis for the formation of chiasmata in the mouse. Despite significantly positive correlations and regressions between the various chiasma frequencies and the sperm count (for which 29 pairs of observations were available), no significant heritability estimate for the sperm count was obtained. The relation between the chiasma frequency and the sperm count was weakest when the chiasma count was confined to a region of the translocation-caused multivalent in which the absence of a chiasma almost always resulted in the production of a univalent. This indicates that in the translocation heterozygotes used, the overall chiasma frequency has a greater predictive value for the sperm count than autosomal univalence alone.

19.
We estimated the frequencies of serum butyrylcholinesterase (BChE) alleles in three tribes of Mapuche Indians from southern Chile, using enzymatic methods, and we estimated the frequency of allele BCHE*K in one tribe using primer-introduced restriction analysis (PCR-PIRA). The three tribes have different degrees of European admixture, which is reflected in the observed frequencies of the atypical allele BCHE*A: 1.11% in Huilliches, 0.89% in Cuncos, and 0% in Pehuenches. This result is evidence in favor of the hypothesis that BCHE*A is absent in native Amerindians. The frequencies of BCHE*F were higher than in most reported studies (3.89%, 5.78%, and 4.41%, respectively). These results are probably due to an overestimation of the frequency of allele BCHE*F, since none of the 20 individuals typed as BCHE UF by the enzymatic test showed either of the two DNA base substitutions associated with this allele. Although enzymatic methods rarely detect the presence of allele BCHE*K, PCR-PIRA found the allele at an appreciable frequency (5.76%), although lower than that found in other ethnic groups. Since observed frequencies of unusual alleles correspond to estimated percentages of European admixture, it is likely that none of these unusual alleles were present in Mapuche Indians before the arrival of Europeans.

20.
Pei J  Grishin NV 《Proteins》2004,56(4):782-794
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. Mixtures of predicted structural frequencies and evolutionary frequencies improve the quality of local profile-to-profile alignment by COMPASS.
