Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations have been suggested to predict the accuracy of genomic breeding values in a given design; these are based on training set size, reliability of phenotypes, and the number of independent chromosome segments. The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets of 5,698 Holstein Friesian bulls genotyped with 50 K SNPs and 1,332 Brown Swiss bulls genotyped with 50 K SNPs and imputed to ∼600 K SNPs were available. Different k-fold (k = 2–10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained when using a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is below 1. The proportion of genetic variance captured by the complete SNP sets was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When modifying the number of SNPs, w was found to be proportional to the log of the marker density up to a limit which is population and trait specific and was found to be reached with ∼20,000 SNPs in the Brown Swiss population studied.
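
To make the kind of deterministic equation discussed in this abstract concrete, here is a minimal sketch. It assumes the commonly cited Daetwyler-style form, accuracy = sqrt(N·r² / (N·r² + Me)), multiplied by a weighting factor w. The exact parameterization fitted by the authors is not given in the abstract, so the function name, the argument names, the symbol Me and the example values are all illustrative assumptions.

```python
import math


def expected_accuracy(n_train, reliability, m_e, w=1.0):
    """Daetwyler-style expected accuracy of genomic breeding values.

    n_train     -- training set size (N)
    reliability -- reliability of the phenotypes (between 0 and 1)
    m_e         -- number of independent chromosome segments (assumed symbol Me)
    w           -- hypothetical weighting factor capping the achievable accuracy
    """
    if n_train < 0 or m_e <= 0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("invalid design parameters")
    signal = n_train * reliability
    return w * math.sqrt(signal / (signal + m_e))


# Illustrative values only, not taken from the study.
for n in (500, 1000, 5000):
    print(n, round(expected_accuracy(n, reliability=0.8, m_e=1000, w=0.9), 3))
```

With w = 1 the expression reduces to the unweighted form; with w below 1 the accuracy approaches w rather than 1 as the training set grows.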

3.
The distribution of fitness effects (DFE) of new mutations has been of interest to evolutionary biologists since the concept of mutations arose. Modern population genomic data enable us to quantify the DFE empirically, but few studies have examined how data processing, sample size and cryptic population structure might affect the accuracy of DFE inference. We used simulated and empirical data (from Arabidopsis lyrata) to show the effects of missing data filtering, sample size, number of single nucleotide polymorphisms (SNPs) and population structure on the accuracy and variance of DFE estimates. Our analyses focus on three filtering methods (downsampling, imputation and subsampling) with sample sizes of 4–100 individuals. We show that (1) the choice of missing-data treatment directly affects the estimated DFE, with downsampling performing better than imputation and subsampling; (2) the estimated DFE is less reliable in small samples (<8 individuals), and becomes unpredictable with too few SNPs (<5000, the sum of 0- and 4-fold SNPs); and (3) population structure may skew the inferred DFE towards more strongly deleterious mutations. We suggest that future studies should consider downsampling for small data sets, and use samples larger than 4 (ideally larger than 8) individuals, with more than 5000 SNPs in order to improve the robustness of DFE inference and enable comparative analyses.
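
The abstract does not spell out how downsampling was implemented. One common interpretation, assumed here, is hypergeometric projection of a site frequency spectrum (SFS) to a smaller number of sampled chromosomes. The sketch below shows that projection for an unfolded SFS; the function name and the toy counts are hypothetical.

```python
from math import comb


def project_sfs(sfs, n_new):
    """Project an unfolded SFS down to n_new chromosomes.

    sfs[i] is the number of sites with i derived alleles among
    n_old = len(sfs) - 1 sampled chromosomes.
    """
    n_old = len(sfs) - 1
    if not 0 < n_new <= n_old:
        raise ValueError("n_new must be between 1 and the original sample size")
    projected = [0.0] * (n_new + 1)
    total_ways = comb(n_old, n_new)
    for i, count in enumerate(sfs):
        # j derived alleles can be drawn only if enough ancestral alleles remain.
        low = max(0, i - (n_old - n_new))
        high = min(i, n_new)
        for j in range(low, high + 1):
            prob = comb(i, j) * comb(n_old - i, n_new - j) / total_ways
            projected[j] += count * prob
    return projected


# Toy spectrum for 4 chromosomes, projected to 2 chromosomes.
print(project_sfs([0, 10, 5, 2, 0], 2))
```

Projection preserves the total number of sites, so sites with missing genotypes can be retained at a reduced sample size instead of being discarded or imputed.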

4.
Nielsen R 《Genetics》2000,154(2):931-942
Some general likelihood and Bayesian methods for analyzing single nucleotide polymorphisms (SNPs) are presented. First, an efficient method for estimating demographic parameters from SNPs in linkage equilibrium is derived. The method is applied in the estimation of growth rates of a human population based on 37 SNP loci. It is demonstrated how ascertainment biases, due to biased sampling of loci, can be avoided, at least in some cases, by appropriate conditioning when calculating the likelihood function. Second, a Markov chain Monte Carlo (MCMC) method for analyzing linked SNPs is developed. This method can be used for Bayesian and likelihood inference on linked SNPs. The utility of the method is illustrated by estimating recombination rates in a human data set containing 17 SNPs and 60 individuals. Both methods are based on assumptions of low mutation rates.

5.
Heritability is a central element in quantitative genetics. New molecular markers to assess genetic variance and heritability are continually under development. Molecular single nucleotide polymorphism (SNP) markers can be used to estimate variance components and heritability in populations where relationship information is unknown. In this study, we evaluated the capabilities of two Bayesian genomic models to estimate heritability in simulated populations. The populations comprised different family structures of either no or a limited number of relatives, a single quantitative trait, and one of two densities of SNP markers. All individuals were both genotyped and phenotyped. The results showed that the two models were capable of estimating heritability when true heritability was 0.15 or higher and populations had a sample size of 400 or higher. For heritabilities of 0.05, all models had difficulties in estimating the true heritability. The two Bayesian models were compared with a restricted maximum likelihood (REML) approach using a genomic relationship matrix. The comparison showed that the Bayesian approaches performed as well as the REML approach. Differences in family structure were in general not found to influence the estimation of the heritability. For the sample sizes used in this study, a 10-fold increase of SNP density did not improve the precision of the estimates compared with set-ups with a less dense distribution of SNPs. The methods used in this study showed that it was possible to estimate heritabilities on the basis of SNPs in animals with direct measurements. This conclusion is valuable in cases when quantitative traits are either difficult or expensive to measure.

6.
Accurately resolving population structure in a sample is important for both linkage and association studies. In this study we investigated the power of single-nucleotide polymorphisms (SNPs) in detecting population structure in a sample of 286 unrelated individuals. We varied the number of SNPs to determine how many are required to approach the degree of resolution obtained with the Collaborative Study on the Genetics of Alcoholism (COGA) short tandem repeat polymorphisms (STRPs). In addition, we selected SNPs with varying minor allele frequencies (MAFs) to determine whether low or high frequency SNPs are more efficient in resolving population structure. We conclude that a set of at least 100 evenly spaced SNPs with MAFs of 40-50% is required to resolve population structure in this dataset. If SNPs with lower MAFs are used, then more than 250 SNPs may be required to obtain reliable results.

7.

Background

Genomic selection estimates genetic merit based on dense SNP (single nucleotide polymorphism) genotypes and phenotypes. This requires that SNPs explain a large fraction of the genetic variance. The objectives of this work were: (1) to estimate the fraction of genetic variance explained by dense genome-wide markers using 54 K SNP chip genotyping, and (2) to evaluate the effect of alternative marker-based relationship matrices and corrections for the base population on the fraction of the genetic variance explained by markers.

Methods

Two alternative marker-based relationship matrices were estimated using 35 706 SNPs on 1086 dairy bulls. Both pedigree- and marker-based relationship matrices were fitted simultaneously or separately in an animal model to estimate the fraction of variance not explained by the markers, i.e. the fraction explained by the pedigree. The phenotypes considered in the analysis were the deregressed estimated breeding values (dEBV) for milk, fat and protein yield and for somatic cell score (SCS).

Results

When dEBV were not sufficiently accurate (50 or 70%), the estimated fraction of the genetic variance explained by the markers was around 65% for yield traits and 45% for SCS. Scaling marker genotypes with locus-specific frequencies of heterozygotes slightly increased the variance explained by markers, compared with scaling with the average frequency of heterozygotes across loci. When the two relationship matrices were fitted separately, the estimated fraction of the genetic variance explained by the markers followed the same trends but was underestimated. With less accurate dEBV estimates, the fraction of the genetic variance explained by markers was underestimated, which is probably an artifact due to the dEBV being estimated by a pedigree-based animal model.

Conclusions

When using only highly accurate dEBV, the proportion of the genetic variance explained by the Illumina 54 K SNP chip was approximately 80% for Brown Swiss cattle. These results depend on the SNP chip used and the family structure of the population, i.e. denser SNP panels and closer family relationships are expected to result in a higher fraction of the variance explained by the SNPs.

8.
Genetic definition of subpopulations has traditionally relied on data from microsatellite markers to investigate population structure; however, single-nucleotide polymorphisms (SNPs) have emerged as a tool for detection of fine-scale structure. In Hudson Bay, Canada, three polar bear (Ursus maritimus) subpopulations (Foxe Basin (FB), Southern Hudson Bay (SH), and Western Hudson Bay (WH)) have been delineated based on mark–recapture studies, radiotelemetry and satellite telemetry, return of marked animals in the subsistence harvest, and population genetics using microsatellites. We used SNPs to detect fine-scale population structure in polar bears from the Hudson Bay region and compared our results to the current designations using 414 individuals genotyped at 2,603 SNPs. Analyses based on discriminant analysis of principal components (DAPC) and STRUCTURE support the presence of four genetic clusters: (i) Western: individuals sampled in WH, SH (excluding Akimiski Island in James Bay), and southern FB (south of Southampton Island); (ii) Northern: individuals sampled in northern FB (Baffin Island) and Davis Strait (DS) (Labrador coast); (iii) Southeast: individuals from SH (Akimiski Island in James Bay); and (iv) Northeast: individuals from DS (Baffin Island). Population structure differed from microsatellite studies and current management designations, demonstrating the value of using SNPs for fine-scale population delineation in polar bears.

9.
Soybean cyst nematode (SCN) (Heterodera glycines Ichinohe) is a highly recalcitrant endoparasite of soybean roots, causing more yield loss than any other pest. To identify quantitative trait loci (QTL) controlling resistance to SCN (HG type 2.5.7, race 1), a genome-wide association study (GWAS) was performed. The association panel, consisting of 120 Chinese soybean cultivars, was genotyped with 7189 single nucleotide polymorphisms (SNPs). A total of 6204 SNPs with minor allele frequency >0.05 were used to estimate linkage disequilibrium (LD) and population structure. The mean level of LD measured by r² declined very rapidly to half its maximum value (0.51) at 220 kb. The overall population structure was approximately coincident with geographic origin. The GWAS results identified 13 SNPs in 7 different genomic regions significantly associated with SCN resistance. Of these, three SNPs were localized in previously mapped QTL intervals, including rhg1 and Rhg4. The GWAS results also detected 10 SNPs in 5 different genomic regions associated with SCN resistance. The identified loci explained an average of 95.5% of the phenotypic variance. The proportion of phenotypic variance was due to additive genetic variance of the validated SNPs. The present study identified multiple new loci and refined chromosomal regions of known loci associated with SCN resistance. The loci and trait-associated SNPs identified in this study can be used for developing soybean cultivars with durable resistance against SCN.
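
The two quantities used for filtering and LD estimation in this abstract, minor allele frequency and r², can be sketched as follows. This assumes genotypes coded as allele dosages (0, 1 or 2 copies of one allele) and computes r² as the squared correlation between dosages at two markers, which is one common approximation for unphased data. The abstract does not say which estimator was used, and the function names and toy genotypes are hypothetical.

```python
def minor_allele_frequency(dosages):
    """Minor allele frequency from 0/1/2 allele dosages of diploid individuals."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def genotype_r2(x, y):
    """Squared Pearson correlation between dosages at two markers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


# Toy genotypes for three markers in six individuals.
markers = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 2],
    "snp_c": [0, 0, 0, 0, 0, 0],
}
kept = {name: g for name, g in markers.items() if minor_allele_frequency(g) > 0.05}
print(sorted(kept), genotype_r2(kept["snp_a"], kept["snp_b"]))
```

A decay curve like the one summarized in the abstract would come from averaging such pairwise values within bins of physical distance.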

10.
Argentine population genetic structure was examined using a set of 78 ancestry informative markers (AIMs) to assess the contributions of European, Amerindian, and African ancestry in 94 individual members of this population. Using the Bayesian clustering algorithm STRUCTURE, the mean European contribution was 78%, the Amerindian contribution was 19.4%, and the African contribution was 2.5%. Similar results were found using a weighted least mean square method: European, 80.2%; Amerindian, 18.1%; and African, 1.7%. Consistent with previous studies, the current results showed very few individuals (four of 94) with greater than 10% African admixture. Notably, when individual admixture was examined, the Amerindian and European admixture showed a very large variance, and individual Amerindian contribution ranged from 1.5 to 84.5% in the 94 individual Argentine subjects. These results indicate that admixture must be considered when clinical epidemiology or case control genetic analyses are studied in this population. Moreover, the current study provides a set of informative SNPs that can be used to ascertain or control for this potentially hidden stratification. In addition, the large variance in admixture proportions in individual Argentine subjects shown by this study suggests that this population is appropriate for future admixture mapping studies.

11.
Liu N  Chen L  Wang S  Oh C  Zhao H 《BMC genetics》2005,6(Z1):S26
Single-nucleotide polymorphisms (SNPs) are a class of attractive genetic markers for population genetic studies and for identifying genetic variations underlying complex traits. However, the usefulness and efficiency of SNPs in comparison to microsatellites in different scientific contexts, e.g., population structure inference or association analysis, still must be systematically evaluated through large empirical studies. In this article, we use the Collaborative Studies on Genetics of Alcoholism (COGA) data from Genetic Analysis Workshop 14 (GAW14) to compare the performance of microsatellites and SNPs in the whole human genome in the context of population structure inference. A total of 328 microsatellites and 15,840 SNPs are used to infer population structure in 236 unrelated individuals. We find that, on average, the informativeness of random microsatellites is four to twelve times that of random SNPs for various population comparisons, which is consistent with previous studies. Our results also indicate that for the combined set of microsatellites and SNPs, SNPs constitute the majority among the most informative markers and the use of these SNPs leads to better inference of population structure than the use of microsatellites. We also find that the inclusion of less informative markers may add noise and worsen the results.

12.
Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described dataset (10,805 SNPs, 11 populations), we demonstrate that a very small set of PCA-correlated SNPs can be effectively employed to assign individuals to particular continents or populations, using a simple clustering algorithm. We validate our methods on the HapMap populations and achieve perfect intercontinental differentiation with 14 PCA-correlated SNPs. The Chinese and Japanese populations can be easily differentiated using less than 100 PCA-correlated SNPs ascertained after evaluating 1.7 million SNPs from HapMap. We show that, in general, structure informative SNPs are not portable across geographic regions. However, we manage to identify a general set of 50 PCA-correlated SNPs that effectively assigns individuals to one of nine different populations. Compared to analysis with the measure of informativeness, our methods, although unsupervised, achieved similar results. We proceed to demonstrate that our algorithm can be effectively used for the analysis of admixed populations without having to trace the origin of individuals. Analyzing a Puerto Rican dataset (192 individuals, 7,257 SNPs), we show that PCA-correlated SNPs can be used to successfully predict structure and ancestry proportions. We subsequently validate these SNPs for structure identification in an independent Puerto Rican dataset. 
The algorithm that we introduce runs in seconds and can be easily applied on large genome-wide datasets, facilitating the identification of population substructure, stratification assessment in multi-stage whole-genome association studies, and the study of demographic history in human populations.

13.
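Cao Z  Ma C  Wang L  Cai B 《遗传 (Yi Chuan)》2010,32(9):921-928
In genome-wide association studies of complex diseases, population stratification increases the false-positive rate, so it is necessary to account for population genetic structure and to control for stratification. How well randomly selected SNPs perform in studies of population stratification, however, remains to be explored. Using Affymetrix SNP 6.0 array genotypes of unrelated individuals from the HapMap Phase 2 populations, we selected different numbers of SNPs randomly and evenly across the genome, and also screened for ancestry informative markers (AIMs) using the f value and Fisher's exact test. We then used data from unrelated individuals in HapMap Phase 3 to evaluate how well the different SNP sets distinguished populations, applying two methods: F-statistics and STRUCTURE analysis. We found that SNPs distributed randomly and evenly across the genome can be used to identify genetic structure within populations. The results further suggest that, in genome-wide association studies where no AIMs are available for the specific population, more than 3000 evenly distributed SNPs selected at random across the genome can be used to control for population stratification.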

14.

Background

Genetic isolates such as the Ashkenazi Jews (AJ) potentially offer advantages in mapping novel loci in whole genome disease association studies. To analyze patterns of genetic variation in AJ, genotypes of 101 healthy individuals were determined using the Affymetrix EAv3 500 K SNP array and compared to 60 CEPH-derived HapMap (CEU) individuals. 435,632 SNPs overlapped and met annotation criteria in the two groups.

Results

A small but significant global difference in allele frequencies between AJ and CEU was demonstrated by a mean F_ST of 0.009 (P < 0.001); large regions that differed were found on chromosomes 2 and 6. Haplotype blocks inferred from pairwise linkage disequilibrium (LD) statistics (Haploview) as well as by expectation-maximization haplotype phase inference (HAP) showed a greater number of haplotype blocks in AJ compared to CEU by Haploview (50,397 vs. 44,169) or by HAP (59,269 vs. 54,457). Average haplotype blocks were smaller in AJ compared to CEU (e.g., 36.8 kb vs. 40.5 kb HAP). Analysis of global patterns of local LD decay for closely-spaced SNPs in CEU demonstrated more LD, while for SNPs further apart, LD was slightly greater in the AJ. A likelihood ratio approach showed that runs of homozygous SNPs were approximately 20% longer in AJ. A principal components analysis was sufficient to completely resolve the CEU from the AJ.

Conclusion

LD in the AJ versus CEU was lower than expected by some measures and higher by others. Any putative advantage in whole genome association mapping using the AJ population will be highly dependent on regional LD structure.
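
For readers unfamiliar with the F_ST statistic reported in the Results, the sketch below computes the textbook heterozygosity-based definition, F_ST = (H_T - H_S) / H_T, for one biallelic SNP in two equally weighted populations. The abstract does not state which estimator was used, and this version ignores sample-size corrections; the function name and the example frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Textbook F_ST for one biallelic locus in two equally weighted populations.

    p1, p2 -- frequency of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                   # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # locus is fixed for the same allele in both populations
    return (h_total - h_within) / h_total


# Illustrative allele frequencies only.
print(round(wright_fst(0.6, 0.4), 4))
print(round(wright_fst(0.52, 0.48), 4))
```

A genome-wide mean as small as the one reported corresponds to allele-frequency differences of only a few percentage points at a typical SNP.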

15.
16.
Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider use of SNPs to estimate the parameter Θ = 4N_eμ (the scaled product of effective population size and per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of Θ. With finite amounts of data the estimates are accurate when Θ is high, but tend to be biased upward when Θ is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of Θ than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of Θ. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.

17.
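To study polymorphism of the myostatin (MSTN) gene in the Chinese mitten crab (Eriocheir sinensis) and its association with growth traits, MSTN polymorphisms were screened in 321 individuals from three populations (a breeding population, a contest population and a wild population). Three polymorphic SNP sites were found in exon 1 of the gene (S1: C714T; S2: G729A; S3: G753T), all of them moderately or highly polymorphic and in Hardy-Weinberg equilibrium (P>0.05). A general linear model was used to analyze the association of the three sites and their genotype combinations with growth traits. The S1 site had a significant effect on growth traits such as body weight and carapace length (P≤0.05), whereas the other two sites showed no significant association with growth traits. The results indicate that the TT genotype at S1 is the most favorable for growth in the Chinese mitten crab and can serve as a candidate marker for marker-assisted breeding.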
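
A Hardy-Weinberg check of the kind reported for the three SNP sites can be sketched with a chi-square goodness-of-fit test on genotype counts. The abstract does not say which test the authors used, so the choice of test, the function name and the counts below are assumptions made for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for the observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, not taken from the study.
chi2, p_value = hwe_chi_square(30, 50, 20)
print(round(chi2, 3), round(p_value, 3))
```

A P value above 0.05, as reported for S1 to S3, means the genotype counts give no evidence of departure from equilibrium.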

18.
Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker, where gene content is the number of copies of a particular reference allele in a genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate heritability of gene content at each SNP and also builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, estimates of the allele frequencies of markers at the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) if markers with heritability lower than 0.975 are discarded. Checking of Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The proposed method is further illustrated with a real data set with genotypes from 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip and a pedigree of 6473 individuals; those markers underwent very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded based on our method, with associated estimates of heritability as low as 0.12. Contrary to other techniques, our method uses all information in the population simultaneously, can be used in any population with markers and pedigree recordings, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.

19.
Zhang J 《PloS one》2010,5(11):e13734
Identification of a small panel of population structure informative markers can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics and evolutionary theory in population genetics. Traditional methods to ascertain ancestry informative markers usually require prior knowledge of individual ancestry and have difficulty with admixed populations. Recently, Principal Components Analysis (PCA) has been employed with success to select SNPs which are highly correlated with the top significant principal components (PCs) without use of individual ancestry information. The approach is also applicable to admixed populations. Here we propose a novel approach based on our recent result on summarizing population structure by graph Laplacian eigenfunctions, which differs from PCA in that it is geometric and robust to outliers. Our approach also takes advantage of the a priori sparseness of informative markers in the genome. Through simulation of a ring population and the real global population sample HGDP of 650K SNPs genotyped in 940 unrelated individuals, we validate the proposed algorithm for selecting the most informative markers, a small fraction of which can efficiently recover a similar underlying population structure. Employing a standard Support Vector Machine (SVM) to predict individuals' continental memberships on the HGDP dataset of seven continents, we demonstrate that the SNPs selected by our method are more informative and less redundant than those selected by PCA. Our algorithm is a promising tool in genome-wide association studies and population genetics, facilitating the selection of structure informative markers, efficient detection of population substructure and ancestral inference.

20.
Sex in Oreochromis niloticus (Nile tilapia) is principally determined by an XX/XY locus but other genetic and environmental factors also influence sex ratio. Restriction Associated DNA (RAD) sequencing was used in two families derived from crossing XY males with females from an isogenic clonal line, in order to identify Single Nucleotide Polymorphisms (SNPs) and map the sex-determining region(s). We constructed a linkage map with 3,802 SNPs, which corresponded to 3,280 informative markers, and identified a major sex-determining region on linkage group 1, explaining nearly 96% of the phenotypic variance. This sex-determining region was mapped in a 2 cM interval, corresponding to approximately 1.2 Mb in the O. niloticus draft genome. In order to validate this, a diverse family (4 families; 96 individuals in total) and population (40 broodstock individuals) test panel were genotyped for five of the SNPs showing the highest association with phenotypic sex. From the expanded data set, SNPs Oni23063 and Oni28137 showed the highest association, which persisted both in the case of family and population data. Across the entire dataset all females were found to be homozygous for these two SNPs. Males were heterozygous, with the exception of five individuals in the population and two in the family dataset. These fish possessed the homozygous genotype expected of females. Progeny sex ratios (over 95% females) from two of the males with the “female” genotype indicated that they were neomales (XX males). Sex reversal induced by elevated temperature during sexual differentiation also resulted in phenotypic males with the “female” genotype. This study narrows down the region containing the main sex-determining locus, and provides genetic markers tightly linked to this locus, with an association that persisted across the population. These markers will be of use in refining the production of genetically male O. niloticus for aquaculture.
