Similar Articles
10 similar articles found (search time: 265 ms).
1.
The strategy of bulk DNA sampling has been a valuable method for studying large numbers of individuals with genetic markers. Its application to discriminating among germplasm sources was analyzed through information theory, considering polymorphic alleles scored binarily for their presence or absence in DNA pools. We defined the informativeness of a set of marker loci in bulks as the mutual information between genotype and population identity, composed of two terms: diversity and noise. The first term is the entropy of bulk genotypes, whereas the noise term is the conditional entropy of bulk genotypes given germplasm sources. Thus, optimizing marker information implies increasing diversity and reducing noise. Simple formulas were devised to estimate marker information per allele from a set of estimated allele frequencies across populations. As an example, they allowed optimization of bulk size for SSR genotyping in maize, using allele frequencies estimated in a sample of 56 maize populations. A sample of 30 plants from a random-mating population was found to be adequate for SSR characterization of maize germplasm. We analyzed the use of divided bulks to overcome the allele dilution problem in DNA pools and concluded that samples of 30 plants divided into three bulks of 10 plants characterize maize germplasm sources efficiently through SSRs while keeping dilution under good control. We estimated the informativeness of 30 SSR loci from the estimated allele frequencies in maize populations and found wide variation in marker informativeness, which correlated positively with the number of alleles per locus.
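The diversity-minus-noise decomposition can be illustrated for a single allele. The sketch below assumes that a bulk of n diploid plants shows the allele whenever at least one of its 2n sampled chromosomes carries it (no dilution or detection threshold), and gives every population equal prior weight. It is an illustration of the mutual-information idea, not necessarily the authors' exact formulas.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy (bits) of a presence/absence variable with P(present) = q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def presence_prob(p: float, n_plants: int) -> float:
    """P(allele present in a bulk of n diploid plants); assumes no dilution."""
    return 1.0 - (1.0 - p) ** (2 * n_plants)


def allele_information(freqs: list[float], n_plants: int) -> float:
    """Mutual information (bits) between bulk presence/absence of one allele
    and population identity, with equal prior weight on each population."""
    probs = [presence_prob(p, n_plants) for p in freqs]
    diversity = binary_entropy(sum(probs) / len(probs))      # H(bulk genotype)
    noise = sum(binary_entropy(q) for q in probs) / len(probs)  # H(genotype | source)
    return diversity - noise


# Hypothetical allele frequencies in three populations, scored in bulks of several sizes.
for n in (5, 10, 30):
    print(n, round(allele_information([0.0, 0.05, 0.3], n), 3))
```

Larger bulks raise the chance of detecting rare alleles but also push common alleles toward being present in every bulk, so information per allele does not grow without limit as bulk size increases; this is the trade-off a bulk-size optimization has to balance.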

2.
Robust estimation of allele frequencies in pools of DNA has the potential to reduce genotyping costs and/or increase the number of individuals contributing to a study in which hundreds of thousands of genetic markers must be genotyped in very large population sample sets, such as genome-wide association studies. To make accurate allele frequency estimates from pooled samples, a correction for unequal allele representation must be applied. We have developed the polynomial-based probe-specific correction (PPC), a novel correction algorithm for accurate estimation of allele frequencies in data from high-density microarrays. The algorithm was validated by comparing allele frequencies from a set of 10 individually genotyped DNAs with frequencies estimated from pools of these 10 DNAs using GeneChip 10K Mapping Xba 131 arrays. Our results demonstrate that using the PPC to correct for allelic bias dramatically increases the accuracy of allele frequency estimates.

3.
Molecular markers produced by next-generation sequencing (NGS) technologies are revolutionizing genetic research. However, the cost of analysing large numbers of individual genomes remains prohibitive for most population genetics studies. Here, we present results based on mathematical derivations showing that, under many realistic experimental designs, NGS of DNA pools from diploid individuals allows estimation of allele frequencies at single nucleotide polymorphisms (SNPs) with at least the same accuracy as individual-based analyses, for considerably lower library construction and sequencing effort. These findings remain true when taking into account the possibility of substantially unequal contributions of each individual to the final pool of sequence reads. We propose the intuitive notion of effective pool size to account for unequal pooling and derive a Bayesian hierarchical model to estimate this parameter directly from the data. We provide a user-friendly application assessing the accuracy of allele frequency estimation from both pool- and individual-based NGS population data under various sampling, sequencing depth and experimental error designs. We illustrate our findings with theoretical examples and real data sets corresponding to SNP loci obtained using restriction site–associated DNA (RAD) sequencing in pool- and individual-based experiments carried out on the same population of the pine processionary moth (Thaumetopoea pityocampa). NGS of DNA pools might not be optimal for all types of studies, but it provides a cost-effective approach for estimating allele frequencies at very large numbers of SNPs. It thus allows comparison of genome-wide patterns of genetic variation for large numbers of individuals in multiple populations.

4.
Goldringer I, Bataillon T. Genetics, 2004, 168(1): 563-568
The effective population size (Ne) is frequently estimated from temporal changes in allele frequencies at neutral markers. Such temporal changes are usually quantified by the standardized variance in allele frequencies (Fc). We simulate Wright-Fisher populations to generate expected distributions of Fc and of F̄c (Fc averaged over several loci). We examine how well these simulated Fc distributions fit a chi-square distribution and evaluate the resulting precision of Ne estimates for various scenarios. Next, we outline a procedure to test the homogeneity of individual Fc values across loci and to identify markers exhibiting extreme Fc values compared with the rest of the genome. Such loci are likely to lie in genomic regions undergoing selection, which drives Fc to values greater (or smaller) than expected under drift alone. Our procedure assigns a P-value to each locus under the null hypothesis (drift is homogeneous throughout the genome) and simultaneously controls the rate of false positives among loci declared to depart significantly from the null. The procedure is illustrated using two published data sets: (i) an experimental wheat population subject to natural selection and (ii) a maize population undergoing recurrent selection.

5.
Sequencing of pooled samples (Pool-Seq) using next-generation sequencing technologies has become increasingly popular because it is a rapid and cost-effective method to determine allele frequencies for single nucleotide polymorphisms (SNPs) in population pools. Validation of allele frequencies determined by Pool-Seq has been attempted using individual genotyping, but these studies tend to use samples from existing model organism databases or DNA stores and do not validate a realistic setup for sampling natural populations. Here we used pyrosequencing to validate allele frequencies determined by Pool-Seq in three natural populations of Arabidopsis halleri (Brassicaceae). Allele frequency estimates for the pooled population samples (each consisting of 20 individual plant DNA samples) were determined after mapping Illumina reads to (i) the publicly available, high-quality reference genome of a closely related species (Arabidopsis thaliana) and (ii) our own de novo draft genome assembly of A. halleri. We then pyrosequenced nine selected SNPs using the same individuals from each population, giving a total of 540 samples. Our results show a highly significant and accurate relationship between pooled and individually determined allele frequencies, irrespective of the reference genome used. Allele frequencies differed on average by less than 4%. Neither Pool-Seq nor the individual-based approach tended to give systematically higher or lower allele frequency estimates. Moreover, the rather high coverage in the mappings to the two reference genomes, ranging from 55 to 284x, had no significant effect on the accuracy of Pool-Seq. A resampling analysis showed that only very low coverage (below 10-20x) would substantially reduce the precision of the method. We therefore conclude that a pooled re-sequencing approach is well suited for analyses of genetic variation in natural populations.
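A coverage resampling analysis of this kind can be approximated by simulation. The sketch below assumes each read independently carries the allele with probability equal to the pool frequency (no sequencing error and no sampling of individuals into the pool), then reports the mean absolute error of the read-based estimate at a given coverage. The function name and defaults are illustrative, not the authors' procedure.

```python
import random


def mean_abs_error(true_freq: float, coverage: int, reps: int = 2000, seed: int = 1) -> float:
    """Average |estimate - true_freq| when the allele frequency is estimated
    from `coverage` reads drawn independently from a pool at true_freq."""
    rng = random.Random(seed)  # seeded for reproducible output
    total = 0.0
    for _ in range(reps):
        hits = sum(rng.random() < true_freq for _ in range(coverage))
        total += abs(hits / coverage - true_freq)
    return total / reps


for cov in (5, 10, 20, 55, 284):
    print(cov, round(mean_abs_error(0.3, cov), 3))
```

Because the binomial standard error scales with 1 / sqrt(coverage), error falls steeply between roughly 5x and 20x and only slowly beyond that, which is consistent with low coverage being the regime where precision suffers.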

6.
Single-nucleotide polymorphisms (SNPs) are considered useful polymorphic markers for genetic studies of polygenic traits. A new, practical approach to high-throughput SNP genotyping in large numbers of individuals is needed for association studies and other studies of the relationships between genes and diseases. We have developed an accurate, high-throughput method for determining allele frequencies by pooling DNA samples and applying DNA microarray hybridization analysis. In this method, the combination of microarrays, DNA pooling, probe-pair hybridization and fluorescence ratio analysis addresses the dual problems of parallel analysis of multiple samples and parallel multiplex SNP genotyping for association studies. Multiple DNA samples are immobilized on a slide, and a single hybridization is performed with a pool of allele-specific oligonucleotide probes. The results of this study show that microarray hybridization of pooled DNA samples can accurately estimate absolute allele frequencies in a sample pool. The method can also be used to identify differences in allele frequencies between distinct populations. It is amenable to automation and suitable for immediate use in high-throughput SNP genotyping.

7.
Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, high-throughput sequencing has seen little application to populations of polyploids. This is due in large part to the difficulty of determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction-enzyme-based techniques [e.g. GBS, RADseq]) to allele frequency estimation in a unified inferential framework, using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences beyond allele frequency estimation for autopolyploids, by providing assessments of model adequacy and estimates of heterozygosity.
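The key idea of summing over unknown dosage can be shown with a much simpler estimator. The sketch below assumes Hardy-Weinberg genotype frequencies Binomial(ploidy, p) for an autopolyploid, reads that sample allele copies in proportion to dosage, and no sequencing error, and it finds p by grid-search maximum likelihood. It is not the hierarchical Bayesian model in polyfreqs.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def individual_likelihood(ref_reads: int, total_reads: int, p: float, ploidy: int) -> float:
    """P(read counts | p), summing over the unknown dosage g = 0..ploidy."""
    return sum(
        binom_pmf(g, ploidy, p) * binom_pmf(ref_reads, total_reads, g / ploidy)
        for g in range(ploidy + 1)
    )


def estimate_frequency(data: list[tuple[int, int]], ploidy: int, grid_size: int = 1001) -> float:
    """Grid-search maximum-likelihood p from (ref_reads, total_reads) pairs."""
    best_p, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        p = i / (grid_size - 1)
        ll = 0.0
        for ref, total in data:
            lik = individual_likelihood(ref, total, p, ploidy)
            if lik == 0.0:  # p incompatible with this individual's reads
                ll = -math.inf
                break
            ll += math.log(lik)
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p


# Hypothetical tetraploid read counts: (reference reads, total reads) per individual.
print(estimate_frequency([(10, 12), (3, 9), (0, 8), (6, 11)], ploidy=4))
```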

8.

Objective

The purpose of this study was to develop a novel approach, requiring no control population, to examine the relationship between the presence of specific allele combinations at different loci and the onset of gastric cancer.

Methods

DNA samples were collected from patients with gastric cancer. Alleles at short tandem repeat (STR) loci were determined using the STR Profiler Plus PCR amplification kit (15 STR loci). The observed and expected frequencies of specific allele combinations were calculated, and statistically significant allele combinations were identified by comparing the observed with the expected frequency. The age at disease onset of patients carrying a specific allele combination was then analysed; by cross-validating these two sets of results, allele combinations related to gastric cancer were identified from the large number of possible combinations.

Results

A total of 2209 pairwise combinations were enumerated by computer, of which 11 pairs showed significant differences between observed and expected frequencies (p < 0.05). For 2 pairs of alleles (D8S1179-16 and D5S818-13; D2S1338-23 and D6S1043-11), the p value in the cross-validation was also below 0.05.

Conclusion

Gastric cancer onset may be associated with these allele combinations. This new methodology, which requires no control group, may enable further discoveries about the relationship between specific allele combinations at different loci and the onset of complex diseases.
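The observed-versus-expected comparison in the Methods can be sketched as follows, under stated assumptions: the expected frequency of carrying both alleles is taken as the product of the two carrier frequencies (independence between loci within the patient group), and excess co-occurrence is scored with an exact one-sided binomial tail. The abstract does not specify the test; this sketch also ignores multiple testing across the 2209 combinations and the age-at-onset cross-validation.

```python
import math


def carrier_freq(samples: list[dict], locus: str, allele: int) -> float:
    """Fraction of patients carrying `allele` at `locus` (either copy)."""
    return sum(allele in s[locus] for s in samples) / len(samples)


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def score_combination(samples: list[dict], locus1: str, allele1: int,
                      locus2: str, allele2: int) -> tuple[int, float, float]:
    """Observed count, expected count and one-sided p-value for co-carriage."""
    n = len(samples)
    p_both = carrier_freq(samples, locus1, allele1) * carrier_freq(samples, locus2, allele2)
    observed = sum(allele1 in s[locus1] and allele2 in s[locus2] for s in samples)
    return observed, n * p_both, upper_tail(observed, n, p_both)
```

Each sample is assumed to be a dictionary mapping a locus name to the set of alleles (repeat numbers) a patient carries, for example {"D8S1179": {13, 16}, "D5S818": {11, 13}}.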

9.
The usual method to locate and compare loci regulating quantitative traits (QTLs) requires a segregating population of plants, each genotyped with molecular markers. However, plants from such segregating populations can also be grouped according to the phenotypic expression of a trait, and the resulting bulks tested for differences in allele frequency: bulk segregant analysis (BSA). The same probes used to construct a genetic map (e.g. isozyme, RFLP, RAPD) can be used for BSA. A molecular marker that is polymorphic between the parents of the population and closely linked to a major QTL regulating a particular trait will mostly co-segregate with that QTL, i.e. segregate according to the phenotype if the QTL has a large effect. Thus, if plants are grouped according to expression of the trait and the extreme groups are tested with that polymorphic marker, the frequencies of the two marker alleles within each of the two bulks should deviate significantly from the 1:1 ratio expected for most populations. Because the chromosomal locations of many molecular markers have now been determined in many species, the map location of closely linked QTLs can be deduced without genotyping every individual in a segregating population. This approach has been used successfully with composite populations of maize to locate QTLs associated with yield under severe drought. An inbred line derived from one of the populations selected for higher yield under drought has been crossed with a drought-susceptible inbred line to produce a mapping population for QTL analysis of physiological and developmental traits likely to regulate yield under drought.
Future work to identify traits whose QTLs have flanking markers showing significant allele frequency differences in the BSA studies will indicate which traits are likely to be important in determining yield under drought.
Key words: Bulk segregant analysis (BSA), drought resistance, genetic maps, maize, molecular markers, Zea mays (L.).
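The 1:1 deviation test described above can be sketched as a chi-square goodness-of-fit test with one degree of freedom, assuming counts of the two marker alleles within a bulk are available (for a dominant marker such as RAPD, a different model would be needed). For 1 df the chi-square survival function equals erfc(sqrt(x / 2)), so no statistics library is required.

```python
import math


def one_to_one_chi2(count_a: int, count_b: int) -> tuple[float, float]:
    """Chi-square statistic and p-value for marker allele counts vs. a 1:1 ratio."""
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical bulk of extreme high-yield plants: 30 parent-A alleles, 10 parent-B alleles.
print(one_to_one_chi2(30, 10))
```

A marker far from any QTL should give a ratio near 1:1 in both bulks, whereas a marker tightly linked to a large-effect QTL should be skewed toward opposite parental alleles in the two extreme bulks.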

10.
In order to compare the potential of enzyme and DNA markers to investigate genetic diversity within and among populations, ten maize populations were characterized for (1) 20 isozyme loci and (2) restriction fragment length polymorphism (RFLP) for 35 probe-enzyme combinations. Each population was represented by a sample of at least 30 individuals. The average number of alleles detected per locus was clearly higher for RFLPs (6.3) than for isozymes (2.4). Similarly, total diversity was higher for RFLPs (0.60) than for isozymes (0.23). This difference is consistent with observations on inbred-line collections and can be related to the fact that many variations at the DNA level do not change the amino-acid composition or the global charge of proteins. By contrast, the magnitude of population differentiation, relative to total diversity, was similar for isozymes (23%) and RFLPs (22%). This suggests that the isozyme and RFLP loci considered in this study are subject to similar evolutionary forces and that both are mostly neutral. However, RFLPs proved clearly superior to isozymes both for (1) identifying the origin of a given individual and (2) revealing a relevant genetic structure among populations. The higher polymorphism of RFLP loci and the greater number of these loci contributed to the superior discriminative ability of the RFLP data. Received: 1 June 1997 / Accepted: 3 November 1997


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号