Similar literature
1.
STRUCTURE is the most widely used clustering software for detecting population genetic structure. The latest version of this software (STRUCTURE 2.1) was recently enhanced to take into account linkage disequilibrium (LD) caused by admixture between populations. This version, however, still does not consider the effects of strong background LD caused by genetic drift, which may produce spurious results. The STRUCTURE authors have therefore suggested a rough threshold distance (1.0 cM) between two loci, below which the pair of loci should not be used. Because LD is sensitive to demographic events, the distance between loci is not always a good indicator of the strength of LD. In this study, we examine the link between genomic distance and the strength of the correlation between loci (r(LD)) in a free-ranging population of mouflon (Ovis aries), and we present an empirical test of the effect of r(LD) on the clustering results provided by the linkage model in STRUCTURE. We show that a high r(LD) value increases the probability of detecting spurious clustering. We propose using r(LD) as an index for deciding whether or not to use a pair of loci in a clustering analysis.
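The screening idea in this abstract can be illustrated with a minimal sketch. It assumes biallelic markers coded as allele counts (0, 1, 2) and uses the plain Pearson correlation between genotypes as a stand-in for r(LD); the abstract does not state how r(LD) was computed, and the function names and the threshold value are hypothetical.

```python
from itertools import combinations
from math import sqrt


def genotype_correlation(locus_a, locus_b):
    """Pearson correlation between allele counts (0, 1, 2) at two loci."""
    if len(locus_a) != len(locus_b) or len(locus_a) < 2:
        raise ValueError("need two equally long genotype vectors")
    n = len(locus_a)
    mean_a = sum(locus_a) / n
    mean_b = sum(locus_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(locus_a, locus_b))
    var_a = sum((a - mean_a) ** 2 for a in locus_a)
    var_b = sum((b - mean_b) ** 2 for b in locus_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic locus has no defined correlation")
    return cov / sqrt(var_a * var_b)


def pairs_to_exclude(genotypes, threshold):
    """Return locus pairs whose absolute correlation reaches the threshold.

    genotypes maps a locus name to its list of allele counts.
    The threshold is an assumption chosen by the analyst, not a published value.
    """
    flagged = []
    for name_a, name_b in combinations(sorted(genotypes), 2):
        r = genotype_correlation(genotypes[name_a], genotypes[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b))
    return flagged
```

Screening on the measured correlation rather than on map distance follows the abstract's argument that distance alone is not always a good indicator of the strength of LD.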

2.

Background  

Combining data from different ethnic populations in a study can increase the efficacy of methods designed to identify expression quantitative trait loci (eQTL) compared with analyzing each population independently. In such studies, however, the diversity of minor allele frequencies among populations has rarely been taken into account. Because populations differ in both allele frequencies and population-level expression, no consensus has been reached on the optimal statistical approach for eQTL analysis of data that combine different populations.

3.

Background

The inference of the hidden structure of a population is an essential issue in population genetics, and several methods have recently been proposed for this purpose.

Methods

In this study, a new method to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach does not make any assumption on Hardy-Weinberg and linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches: STRUCTURE and BAPS, using simulated data and also a real human data set.

Results

The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and yields a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as that of STRUCTURE. In cases of Hardy-Weinberg and linkage disequilibrium, BAPS also performs incorrectly. In these situations, STRUCTURE and the new method behave equivalently with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, the new method (MGD) performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found.

Conclusion

This new method used to infer the hidden structure in a population, based on the maximisation of the genetic distance and not taking into consideration any assumption about Hardy-Weinberg and linkage equilibrium, performs well under different simulated scenarios and with real data. Therefore, it could be a useful tool to determine genetically homogeneous groups, especially in those situations where the number of clusters is high, with complex population structure and where Hardy-Weinberg and/or linkage equilibrium are present.

4.

Background

Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results

We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.

Conclusions

Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
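The idea of integrating over genotype uncertainty, rather than calling genotypes first, can be sketched as a small expectation-maximisation loop. This is an illustration under stated assumptions and not the authors' implementation: each individual is summarised by three genotype likelihoods, the prior on genotypes is Hardy-Weinberg, and the function name and default values are hypothetical.

```python
def em_allele_frequency(genotype_likelihoods, start=0.2, tol=1e-8, max_iter=500):
    """Estimate an allele frequency from per-individual genotype likelihoods.

    genotype_likelihoods is a list of (L0, L1, L2) tuples, where Lg is the
    probability of the read data given g copies of the alternative allele.
    """
    if not genotype_likelihoods:
        raise ValueError("no individuals supplied")
    p = start
    for _ in range(max_iter):
        expected_alt = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # Weight each genotype by its likelihood times the
            # Hardy-Weinberg prior at the current frequency (assumption).
            w0 = l0 * (1 - p) ** 2
            w1 = l1 * 2 * p * (1 - p)
            w2 = l2 * p ** 2
            total = w0 + w1 + w2
            if total == 0:
                raise ValueError("likelihoods incompatible with current estimate")
            # Expected number of alternative alleles carried by this individual.
            expected_alt += (w1 + 2 * w2) / total
        new_p = expected_alt / (2 * len(genotype_likelihoods))
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p
```

When every individual has one genotype with likelihood 1 and the others 0, the estimate reduces to simple allele counting; with flatter likelihoods, as expected at low coverage, each individual contributes a fractional allele count instead of a hard call.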

5.

Background

The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy.

Methodology/Principal Findings

We have derived simple deterministic formulae to predict the accuracy of predicted genetic risk from population or case control studies using a genome-wide approach and assuming a dichotomous disease phenotype with an underlying continuous liability. We show that the prediction equations are special cases of the more general problem of predicting the accuracy of estimates of genetic values of a continuous phenotype. Our predictive equations are responsive to all parameters that affect accuracy and they are independent of allele frequency and effect distributions. Deterministic prediction errors when tested by simulation were generally small. The common link among the expressions for accuracy is that they are best summarized as the product of the ratio of number of phenotypic records per number of risk loci and the observed heritability.

Conclusions/Significance

This study advances the understanding of the relative power of case control and population studies of disease. The predictions represent an upper bound of accuracy which may be achievable with improved effect estimation methods. The formulae derived will help researchers determine an appropriate sample size to attain a certain accuracy when predicting genetic risk.

6.

Background  

Haplotype-based linkage disequilibrium (LD) mapping has become a powerful and cost-effective method for performing genetic association studies, particularly in the search for genetic markers in linkage disequilibrium with complex disease loci. Various methods (e.g. Monte-Carlo (Gibbs sampling); EM (expectation maximization); and Clark's method) have been used to estimate haplotype frequencies from routine genotyping data.

7.

Background  

The frequency of a haplotype comprising one allele at each of two loci can be expressed as a cubic equation (the 'Hill equation'), the solution of which gives that frequency. Most haplotype and linkage disequilibrium analysis programs use iteration-based algorithms which substitute an estimate of haplotype frequency into the equation, producing a new estimate which is repeatedly fed back into the equation until the values converge to a maximum likelihood estimate (expectation-maximisation).
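The iterative scheme described above can be sketched for two biallelic loci. The sketch assumes genotype counts in a 3 x 3 table and random mating; only double heterozygotes have ambiguous phase, so each iteration re-estimates how they split between the AB/ab and Ab/aB phases. The function name, table layout and haplotype labels are hypothetical.

```python
def em_two_locus_haplotypes(counts, tol=1e-10, max_iter=1000):
    """Estimate two-locus haplotype frequencies by expectation-maximisation.

    counts[i][j] is the number of individuals carrying i copies of allele A
    at locus 1 and j copies of allele B at locus 2 (i, j in 0..2).
    """
    total_haplotypes = 2 * sum(sum(row) for row in counts)
    if total_haplotypes == 0:
        raise ValueError("no genotypes supplied")
    # Haplotypes that can be counted directly because phase is unambiguous.
    known = {
        "AB": 2 * counts[2][2] + counts[2][1] + counts[1][2],
        "Ab": 2 * counts[2][0] + counts[2][1] + counts[1][0],
        "aB": 2 * counts[0][2] + counts[0][1] + counts[1][2],
        "ab": 2 * counts[0][0] + counts[0][1] + counts[1][0],
    }
    double_hets = counts[1][1]
    freqs = {h: 0.25 for h in known}
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab, not Ab/aB.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        x = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add the expected haplotypes from double heterozygotes.
        new = {
            "AB": (known["AB"] + double_hets * x) / total_haplotypes,
            "ab": (known["ab"] + double_hets * x) / total_haplotypes,
            "Ab": (known["Ab"] + double_hets * (1 - x)) / total_haplotypes,
            "aB": (known["aB"] + double_hets * (1 - x)) / total_haplotypes,
        }
        converged = max(abs(new[h] - freqs[h]) for h in new) < tol
        freqs = new
        if converged:
            break
    return freqs
```

Like any iteration of this kind, the result can depend on the starting values; the uniform start used here is one common choice and not a recommendation from the abstract.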

8.
Since Mexican mestizos are an admixed population, it is necessary to determine the effects that population substructure has on genetic and forensic parameters. With this aim, a study was performed with 15 STR loci (CODIS plus D2S1338 and D19S433) on 1,640 unrelated Mexican mestizos. We determined allele and genotype frequencies and observed departures from Hardy–Weinberg expectation (12 out of 15 loci, with an excess of homozygotes, Fis > 0), as well as pairs of loci in apparent linkage disequilibrium (13 of 92 loci). We conducted a test for genetic population stratification; the results show that the Mexican mestizo population is substructured into three subgroups, which are in HW and linkage equilibrium. The combination of the 15 loci in the whole population has high forensic efficiency, with the capacity to genetically discriminate one individual in one quintillion (1/10^18). Our data potentially validate the use of these 15 STR loci to establish forensic identity and parentage testing for legal purposes, and offer a powerful tool for genetic variation analysis. However, given that the population is stratified, we highly recommend applying a correction with the inbreeding coefficient in calculations of paternity and forensic studies to avoid erroneous assumptions.

9.
Variance component modeling for linkage analysis of quantitative traits is a powerful tool for detecting and locating genes affecting a trait of interest, but the presence of genetic heterogeneity will decrease the power of a linkage study and may even give biased estimates of the location of the quantitative trait loci. Many complex diseases are believed to be influenced by multiple genes and therefore genetic heterogeneity is likely to be present for many real applications of linkage analysis. We consider a mixture of multivariate normals to model locus heterogeneity by allowing only a proportion of the sampled pedigrees to segregate trait-influencing allele(s) at a specific locus. However, for mixtures of normals the classical asymptotic distribution theory of the maximum likelihood estimates does not hold, so tests of linkage and/or heterogeneity are evaluated using resampling methods. It is shown that allowing for genetic heterogeneity leads to an increase in power to detect linkage. This increase is more prominent when the genetic effect of the locus is small or when the percentage of pedigrees not segregating trait-influencing allele(s) at the locus is high.

10.

Background

Haplotype analysis of closely associated markers has proven to be a powerful tool in kinship analysis, especially when short tandem repeats (STR) fail to resolve uncertainty in relationship analysis. STR located on the X chromosome show stronger linkage disequilibrium compared with autosomal STR. So, it is necessary to estimate the haplotype frequencies directly from population studies as linkage disequilibrium is population-specific.

Methodology and Findings

Twenty-six X-STR loci, including six clusters of linked markers, DXS6807-DXS8378-DXS9902 (Xp22), DXS7132-DXS10079-DXS10074-DXS10075-DXS981 (Xq12), DXS6801-DXS6809-DXS6789-DXS6799 (Xq21), DXS7424-DXS101-DXS7133 (Xq22), DXS6804-GATA172D05 (Xq23) and DXS8377-DXS7423 (Xq28), and the loci DXS6800, DXS6803, DXS9898, GATA165B12, DXS6854, HPRTB and GATA31E08, were typed in samples from four ethnic groups (Han, Uigur, Kazakh and Mongol) from China (n = 1522, 876 males and 646 females). Allele and haplotype frequencies as well as linkage disequilibrium data for kinship calculation were obtained. The allele frequency distribution among different populations was compared. A total of 5–20 alleles for each locus were observed, and altogether 289 alleles for all the selected loci were found. The allele frequency distribution for most X-STR loci differs between populations. A total of 876 male samples were investigated by haplotype analysis and for linkage disequilibrium. A total of 89, 703, 335, 147, 39 and 63 haplotypes were observed, and haplotype diversity was 0.9584, 0.9994, 0.9935, 0.9736, 0.9427 and 0.9571 for clusters I, II, III, IV, V and VI, respectively. Eighty-two percent of the haplotypes of cluster II were found only once, and 94% of the haplotypes of cluster III show a frequency of <1%.

Conclusions

These results indicate that the allele frequency distribution for most X-STR loci is population-specific and that haplotypes of the six clusters provide a powerful tool for kinship testing and relationship investigation. It is therefore necessary to obtain allele frequency and haplotype data for the linked loci for forensic application.
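The haplotype statistics reported in this abstract can be reproduced from a list of observed male haplotypes with a short sketch. It assumes Nei's unbiased estimator of haplotype diversity, h = n/(n-1) * (1 - sum of squared haplotype frequencies); the abstract does not state which estimator was used, and the function names are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def singleton_fraction(haplotypes):
    """Fraction of distinct haplotypes that were observed exactly once."""
    counts = Counter(haplotypes)
    if not counts:
        raise ValueError("no haplotypes supplied")
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)
```

A haplotype can be passed as any hashable label, for example a tuple of allele sizes across the linked loci of one cluster.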

11.
Genetic individuals, or genets, of Armillaria and other root-infecting basidiomycetes are usually found in discrete patches that often include the root systems of several adjacent trees. Each diploid individual is thought to arise in a unique mating event and then grow vegetatively in an expanding territory over a long period of time. Our objective in this study was to describe the population from which such genetic individuals are drawn. In a sample including 274 collections representing 121 genetic individuals of A. gallica (synonym A. bulbosa) from two sites in each of four regions of eastern North America, genotype frequencies at seven nuclear loci were not significantly different from Hardy-Weinberg expectations. Furthermore, allele frequencies at the seven loci were not significantly different between regions. Additional allelic data from four non-contiguous regions of mitochondrial DNA showed little or no population subdivision over the four regions. Analysis of the distribution of multilocus mtDNA haplotypes revealed some clonal transmission of mtDNAs between genets and nonrandom mating within sites. Despite the sharing of mtDNA types by some individuals, the overall sample contained a high level of genotypic diversity. The apparent linkage equilibrium between some pairs of loci and the high level of phylogenetic inconsistency among all four loci suggest the occurrence of heteroplasmy and recombination among mtDNAs of A. gallica in nature. In laboratory matings of two haploid strains with different mtDNA types, a low frequency of recombination in mtDNA was detected.

12.
The limpet Patella ferruginea is one of the most endangered marine invertebrates on western Mediterranean rocky shores. We have isolated and characterised 11 polymorphic microsatellite markers to provide new tools to investigate genetic structure and gather information necessary for the proper management of this severely threatened species. The number of alleles per locus ranged from 2 to 16 (mean Na = 8.37) with an average observed heterozygosity of 0.64 (He = 0.66). The levels of polymorphism uncovered at these loci suggest that they should be useful for population genetic studies, parentage analysis and assessment of connectivity among protected areas. None of the pairwise comparisons among loci showed significant linkage disequilibrium after sequential Bonferroni correction. All but one locus (Pf-31IF1) conformed to HW equilibrium. Further investigation revealed that departures at Pf-31IF1 were not caused by null alleles. Results from cross-species amplifications suggest that some of these loci may also be useful for Patella tenuis (two loci), P. ulyssiponensis (one locus) and P. piperata (three loci).

13.
Polymorphic Admixture Typing in Human Ethnic Populations
A panel of 257 RFLP loci was selected on the basis of high heterozygosity in Caucasian DNA surveys and equivalent spacing throughout the human genome. Probes from each locus were used in a Southern blot survey of allele frequency distribution for four human ethnic groups: Caucasian, African American, Asian (Chinese), and American Indian (Cheyenne). Nearly all RFLP loci were polymorphic in each group, albeit with a broad range of differing allele frequencies (δ). The distribution of frequency differences (δ values) was used for three purposes: (1) to provide estimates for genetic distance (differentiation) among these ethnic groups, (2) to revisit with a large data set the proportion of human genetic variation attributable to differentiation within ethnic groups, and (3) to identify loci with high δ values between recently admixed populations of use in mapping by admixture linkage disequilibrium (MALD). Although most markers display significant allele frequency differences between ethnic groups, the overall genetic distances between ethnic groups were small (.066–.098), and <10% of the measured overall molecular genetic diversity in these human samples can be attributed to “racial” differentiation. The median δ values for pairwise comparisons between groups fell between .15 and .20, permitting identification of highly informative RFLP loci for MALD disease association studies.

14.

Background  

Single nucleotide polymorphisms (SNPs) may be correlated due to linkage disequilibrium (LD). Association studies look for both direct and indirect associations with disease loci. In a Random Forest (RF) analysis, correlation between a true risk SNP and SNPs in LD may lead to diminished variable importance for the true risk SNP. One approach to address this problem is to select SNPs in linkage equilibrium (LE) for analysis. Here, we explore alternative methods for dealing with SNPs in LD: change the tree-building algorithm by building each tree in an RF only with SNPs in LE, modify the importance measure (IM), and use haplotypes instead of SNPs to build an RF.

15.

Background

Over the past years, reports have indicated that honey bee populations are declining and that infestation by an ecto-parasitic mite (Varroa destructor) is one of the main causes. Selective breeding of resistant bees can help to prevent losses due to the parasite, but it requires that a robust breeding program and genetic evaluation are implemented. Genomic selection has emerged as an important tool in animal breeding programs and simulation studies have shown that it yields more accurate breeding value estimates, higher genetic gain and low rates of inbreeding. Since genomic selection relies on marker data, simulations conducted on a genomic dataset are a pre-requisite before selection can be implemented. Although genomic datasets have been simulated in other species undergoing genetic evaluation, simulation of a genomic dataset specific to the honey bee is required since this species has a distinct genetic and reproductive biology. Our software program was aimed at constructing a base population by simulating a random mating honey bee population. A forward-time population simulation approach was applied since it allows modeling of genetic characteristics and reproductive behavior specific to the honey bee.

Results

Our software program yielded a genomic dataset for a base population in linkage disequilibrium. In addition, information was obtained on (1) the position of markers on each chromosome, (2) allele frequency, (3) χ² statistics for Hardy-Weinberg equilibrium, (4) a sorted list of markers with a minor allele frequency less than or equal to the input value, (5) average r² values of linkage disequilibrium between all simulated marker loci pairs for all generations and (6) average r² value of linkage disequilibrium in the last generation for selected markers with the highest minor allele frequency.

Conclusion

We developed a software program that takes into account the genetic and reproductive biology specific to the honey bee and that can be used to constitute a genomic dataset compatible with the simulation studies necessary to optimize breeding programs. The source code together with an instruction file is freely accessible at http://msproteomics.org/Research/Misc/honeybeepopulationsimulator.html
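One of the outputs listed in the results, the χ² statistic for Hardy-Weinberg equilibrium, can be illustrated with a minimal sketch for a single biallelic marker in diploid individuals. This is a generic textbook calculation and not code from the simulator described above; the function name and argument order are hypothetical.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing genotype counts with HWE expectations.

    n_aa, n_ab and n_bb are the observed counts of the two homozygotes
    and the heterozygote (n_ab) at one biallelic marker.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For a biallelic marker the statistic is usually compared with a χ² distribution with one degree of freedom.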

16.

Background

Kelp (Saccharina japonica) has been intensively cultured in China for almost a century. Its genetic improvement is comparable with that of rice. However, molecular tools for kelp remain extremely limited, as does knowledge of its genes, genetics and genomics. Kelp has an alternating life cycle in which the sporophyte generation alternates with the gametophyte generation. The gametophytes of kelp can be cloned and crossed. Due to these characteristics, kelp may serve as a reference for the biological and genetic studies of Volvox, mosses and ferns.

Results

We constructed a high density single nucleotide polymorphism (SNP) linkage map for kelp by restriction site associated DNA (RAD) sequencing. In total, 4,994 SNP-containing physical (tag-defined) RAD loci were mapped on 31 linkage groups. The map spanned a total genetic distance of 1,782.75 cM, covering 98.66% of the expected length (1,806.94 cM). The RAD tags (85 bp) were extended to 400–500 bp with the MiSeq method, which makes it easier to develop SNP chips and to move SNP genotyping to a high-throughput track. The number of linkage groups agreed with that documented by cytological methods. In addition, we identified a set of microsatellites (99 in total) from the extended RAD tags. A gametophyte sex determining locus was mapped on linkage group 2 in a window about 9.0 cM wide, 2.66 cM up to marker_40567 and 6.42 cM down to marker_23595.

Conclusions

A high density SNP linkage map was constructed for kelp, an intensively cultured brown alga in China. The RAD tags were also extended so that a SNP chip could be developed. In addition, a set of microsatellites were identified among mapped loci, and a gametophyte sex determining locus was mapped. This map will facilitate the genetic studies of kelp including for example the evaluation of germplasm and the decipherment of the genetic bases of economic traits.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1371-1) contains supplementary material, which is available to authorized users.

17.
Isolated populations that have recently been derived from small homogeneous groups of founders should have low genetic diversity and high levels of linkage disequilibrium and should be ideal for mapping ancestral polymorphisms that influence complex genetic disease susceptibility. Populations that fulfill these criteria have been difficult to identify. We have been looking for Polynesian populations with these characteristics, because Polynesians have high rates of complex genetic diseases. In Niue Islanders all ancestral female (mitochondrial HVS-I sequence) and 90.4% of ancestral male (Y-chromosome haplogroup) lineages are of Southeast Asian origin. The frequency of European Y-chromosome haplogroups is 7.2%. The diversities of mitochondrial HVS-I sequences (h = 0.18 ± 0.05) and Y-chromosome haplogroups (h = 0.18 ± 0.05) are lower than values published for any other population. Ten autosomal microsatellites spaced over 5.8 cM show low allele numbers in Niue Islanders relative to Europeans (55 vs. 88 total alleles, respectively) and a modest reduction in heterozygous loci (0.71 ± 0.02 vs. 0.78 ± 0.02, p = 0.04). The higher linkage disequilibrium (d2) between these loci in Niue Islanders relative to Europeans (p = 0.001) is negatively correlated (r = -0.47, p = 0.01) with genetic distance. In summary, Niue Islanders are genetically isolated and have a homogeneous Southeast Asian ancestry. They have reduced autosomal genetic diversity and high levels of linkage disequilibrium that are consistent with the influence of genetic drift mechanisms, such as a founder effect or bottlenecks. High-powered linkage disequilibrium studies designed to map ancestral polymorphisms that influence complex genetic disease susceptibility may be feasible in this population.

18.
Although many studies have shown that animal-associated bacterial species exhibit linkage disequilibrium at chromosomal loci, recent studies indicate that both animal-associated and soil-borne bacterial species can display a nonclonal genetic structure in which alleles at chromosomal loci are in linkage equilibrium. To examine the situation in soil-borne species further, we compared genetic structure in two soil populations of Rhizobium leguminosarum bv. trifolii and two populations of R. leguminosarum bv. viciae from two sites in Oregon, with genetic structure in R. leguminosarum bv. viciae populations recovered from peas grown at a site in Washington, USA, and at a site in Norfolk, UK. A total of 234 chromosomal types (ET) were identified among 682 strains analysed for allelic variation at 13 enzyme-encoding chromosomal loci by multilocus enzyme electrophoresis (MLEE). Chi-square tests for heterogeneity of allele frequencies showed that the populations were not genetically uniform. A comparison of the genetic diversity within combined and individual populations confirmed that the Washington population was the primary cause of genetic differentiation between the populations. Each individual population exhibited linkage disequilibrium, with the magnitude of the disequilibrium being greatest in the Washington population and least in the UK population of R. leguminosarum bv. viciae. Linkage disequilibrium in the UK population was created between two clusters of 9 and 23 ETs, which, individually, were in linkage equilibrium. Strong linkage disequilibrium between the two major clusters of 8 and 12 ETs in the Washington population was caused by the low genetic diversity of the ETs within each cluster relative to the inter-cluster genetic distance. 
Because neither the magnitude of genetic diversity nor of linkage disequilibrium increased as hierarchical combinations of the six local populations were analysed, we conclude that the populations have not been isolated from each other for sufficient time, nor have they been exposed to enough selective pressure to develop unique multilocus genetic structure.

19.

Background  

Worldwide, coral reefs are in decline due to a range of anthropogenic disturbances, and are now also under threat from global climate change. Virtually nothing is currently known about the genetic factors that might determine whether corals adapt to the changing climate or continue to decline. Quantitative genetics studies aiming to identify the adaptively important genomic loci will require a high-resolution genetic linkage map. The phylogenetic position of corals also suggests important applications for a coral genetic map in studies of ancestral metazoan genome architecture.

20.

Background  

Association mapping using abundant single nucleotide polymorphisms is a powerful tool for identifying disease susceptibility genes for complex traits and exploring possible genetic diversity. Genotyping large numbers of SNPs individually is performed routinely but is cost prohibitive for large-scale genetic studies. DNA pooling is a reliable and cost-saving alternative genotyping method. However, no software has been developed for complete pooled-DNA analyses, including data standardization, allele frequency estimation, and single/multipoint DNA pooling association tests. This motivated the development of the software, 'PDA' (Pooled DNA Analyzer), to analyze pooled DNA data.
