Similar literature
 20 similar articles found (search time: 31 ms)
1.
We studied several methods for selecting single-nucleotide polymorphisms (SNPs) in a disease association study. The two major categories of analytical strategy are the univariate and the set selection approaches. The univariate approach evaluates each SNP marker one at a time, while the set selection approach tests the disease association of a set of SNP markers simultaneously. We examined various test statistics that can be used in testing disease association and also reviewed several multiple testing procedures that can properly control the family-wise error rate when the univariate approach is applied to multiple markers. The set association methods were then briefly reviewed. Finally, we applied these methods to the data from the Collaborative Study on the Genetics of Alcoholism (COGA).
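The abstract above does not say which multiple testing procedures were reviewed. As an illustration only, the following minimal sketch shows Holm's step-down procedure, one standard way to control the family-wise error rate when each SNP is tested separately. The function name and the example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's step-down procedure rejects.

    Illustrative only; the abstract does not name Holm's method specifically.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for four SNPs tested one at a time.
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

Holm's procedure is uniformly at least as powerful as a plain Bonferroni correction while giving the same family-wise guarantee, which is why it is a common default for the univariate strategy.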

2.
Modern genomics approaches rely on the availability of high-throughput and high-density genotyping platforms. A major breakthrough in wheat genotyping was the development of an SNP array. In this study, we used a diverse panel of 172 elite European winter wheat lines to evaluate the utility of the SNP array for genomic analyses in wheat germplasm derived from breeding programs. We investigated population structure and genetic relatedness and found that the results obtained with SNP and SSR markers differ. This suggests that additional research is required to determine the optimum approach for the investigation of population structure and kinship. Our analysis of linkage disequilibrium (LD) showed that LD decays within approximately 5–10 cM. Moreover, we found that LD is variable along chromosomes. Our results suggest that the number of SNPs needs to be increased further to obtain a higher coverage of the chromosomes. Taken together, SNPs can be a valuable tool for genomics approaches and for a knowledge-based improvement of wheat.
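The LD analysis mentioned above rests on a pairwise LD statistic. The abstract does not state which one was used, so the sketch below assumes the common r² measure computed from phased haplotypes coded 0/1. The function name and the toy data are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation r^2 between two biallelic loci.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per phased
    haplotype (an assumption; the abstract does not describe the input data).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                               # undefined for a monomorphic locus
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([0, 1, 0, 1], [0, 1, 0, 1]))  # loci in complete LD
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # independent loci
```

An LD decay curve is then typically obtained by plotting r² for marker pairs against their genetic distance in cM.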

3.
《Genomics》2020,112(5):3238-3246
Knowledge of population structure and genetic diversity is a focal point for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large-scale SNP detection and genotyping of genetic resources. Here we used the GBS approach for the genome-wide identification of SNPs in a collection of Cynoglossus semilaevis and for the assessment of the level of genetic diversity in C. semilaevis genotypes. GBS analysis generated a total of 55.12 Gb of high-quality sequence data, with an average of 0.63 Gb per sample. The total number of SNP markers was 563,109. To explore the genetic diversity of C. semilaevis and to select a minimal core set representing most of the total genetic variation with minimum redundancy, C. semilaevis sequences were analyzed using high-quality SNPs. Based on hierarchical clustering, it was possible to divide the collection into 2 clusters. The marine fishing populations clustered together and were clearly separated from the cultured populations, and the cultured population from Hebei was also distinct from the other two local populations. These analyses showed that genotypes were clustered based on species-related features. Differentially significant SNPs were also captured and validated by GBS and SNaPshot. With linkage disequilibrium and haplotype analysis, seven SNPs were confirmed to show obvious differentiation between the two populations; these could be used in future as characteristic evaluation sites for sea-captured and cultured C. semilaevis populations. The SNP markers and information on population structure developed in this study will support genome-wide association mapping studies and marker-assisted selection programs.

4.
The implications of transitioning to single nucleotide polymorphisms (SNPs) from microsatellite markers (MSs) have been investigated in a number of population genetics studies, but the effect of genomic location on the amount of information each type of marker reveals has not been explored in detail. We developed novel SNP markers flanking 1 kb regions of 13 genic (within gene or <1 kb away from gene) and 13 nongenic (>10 kb from annotated gene) MSs in the threespine stickleback genome to obtain comparable data for both types of markers. We analysed patterns of genetic diversity and divergence on various geographic scales after converting the SNP loci within each genomic region into haplotypes. Marker type (SNP haplotype or MS) and location (genic or nongenic) significantly affected most estimates of population diversity and divergence. Between‐lineage divergence was significantly higher in SNP haplotypes (genic and nongenic); however, within‐lineage divergence was similar between marker types. Most divergence and diversity measures were uncorrelated between markers, except for population differentiation, which was correlated between MSs and SNP haplotypes (both genic and nongenic). Broad‐scale population structure and assignment were similarly resolved by both marker types; however, only the MSs were able to delimit fine‐scale population structuring, particularly when genic and nongenic markers were combined. These results demonstrate that estimates of genetic variability and differentiation among populations can be strongly influenced by marker type, by genomic location in relation to genes, and by the interaction of these two factors. This highlights the importance of being aware of the inherent strengths and limitations of different molecular tools in order to select the most appropriate methods for accurately addressing various ecological and evolutionary questions.

5.
Unaccounted-for population stratification can lead to spurious associations in genome-wide association studies (GWAS), and several methods have been proposed to deal with this problem. An alternative line of research uses whole-genome random regression (WGRR) models that fit all markers simultaneously. Important objectives in WGRR studies are to estimate the proportion of variance accounted for by the markers and the effects of individual markers, and to predict genetic values for complex traits and genetic risk of diseases. Proposals to account for stratification in this context are unsatisfactory. Here we address this problem and describe a reparameterization of a WGRR model, based on an eigenvalue decomposition, for simultaneous inference of parameters and unobserved population structure. This allows estimation of genomic parameters with and without inclusion of marker-derived eigenvectors that account for stratification. The method is illustrated with grain yield in wheat typed for 1279 genetic markers, and with height, HDL cholesterol and systolic blood pressure from the British 1958 cohort study typed for 1 million SNP genotypes. Both sets of data show signs of population structure but with different consequences for inferences. The method is compared to an advocated approach consisting of including eigenvectors as fixed-effect covariates in a WGRR model. We show that this approach, used in the context of WGRR models, is ill posed and illustrate the advantages of the proposed model. In summary, our method permits a unified approach to the study of population structure and inference of parameters, is computationally efficient, and is easy to implement.

6.
Heritability is a central element in quantitative genetics. New molecular markers to assess genetic variance and heritability are continually under development. Molecular single nucleotide polymorphism (SNP) markers can be used to estimate variance components and heritability in populations where relationship information is unknown. In this study, we evaluated the capabilities of two Bayesian genomic models to estimate heritability in simulated populations. The populations comprised different family structures with either no or a limited number of relatives, a single quantitative trait, and one of two densities of SNP markers. All individuals were both genotyped and phenotyped. Results illustrated that the two models were capable of estimating heritability when true heritability was 0.15 or higher and populations had a sample size of 400 or higher. For heritabilities of 0.05, all models had difficulties in estimating the true heritability. The two Bayesian models were compared with a restricted maximum likelihood (REML) approach using a genomic relationship matrix. The comparison showed that the Bayesian approaches performed as well as the REML approach. Differences in family structure were in general not found to influence the estimation of heritability. For the sample sizes used in this study, a 10-fold increase in SNP density did not improve the precision of estimates compared with set-ups with a less dense distribution of SNPs. The methods used in this study showed that it was possible to estimate heritabilities on the basis of SNPs in animals with direct measurements. This conclusion is valuable in cases where quantitative traits are either difficult or expensive to measure.

7.
High density genotyping panels have been used in a wide range of applications. From population genetics to genome-wide association studies, this technology still offers the lowest cost and the most consistent solution for generating SNP data. However, regardless of the application, part of the generated data is always discarded from final datasets based on quality control criteria used to remove unreliable markers. Some discarded data consist of markers that failed to generate genotypes, labeled as missing genotypes. A subset of missing genotypes that occur in the whole population under study may be caused by technical issues but can also be explained by the presence of genomic variations that are in the vicinity of the assayed SNP and that prevent genotyping probes from annealing. The latter case may contain relevant information because these missing genotypes might be used to identify population-specific genomic variants. In order to assess which case is more prevalent, we used Illumina HD Bovine chip genotypes from 1,709 Nelore (Bos indicus) samples. We found 3,200 missing genotypes among the whole population. NGS re-sequencing data from 8 sires were used to verify the presence of genomic variations within their flanking regions in 81.56% of these missing genotypes. Furthermore, we discovered 3,300 novel SNPs/Indels, 31% of which are located in genes that may affect traits of importance for the genetic improvement of cattle production.

8.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetic Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets affected the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with densities of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with an average density of 0.25 cM and then, using a sub-sample of these markers, created maps with densities of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions, but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.

9.
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.

10.
Genome-wide association and genomic selection in animal breeding
Hayes B  Goddard M 《Génome》2010,53(11):876-883
Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs using only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of an alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted as the sum of the effects of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.
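The abstract above mentions a genomic relationship matrix without giving its formula. The minimal sketch below assumes one widely used construction, G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centred by twice the allele frequency. Both that choice and the function name are assumptions, not details taken from the abstract.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = ZZ' / (2 * sum(p * (1 - p))) from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per SNP.
    The scaling shown is an assumed, commonly used form.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP, estimated from the sample.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each allele count by twice the allele frequency.
    z = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    # Scaled cross-products between every pair of individuals.
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Two hypothetical individuals, opposite homozygotes at two SNPs.
    for row in genomic_relationship_matrix([[0, 2], [2, 0]]):
        print(row)
```

In a genomic prediction model this matrix takes the place of the pedigree-based relationship matrix, which is what makes the approach simple to fit with existing mixed-model software.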

11.
Sexual compatibility limits the production of cacao plantations and is an important selection criterion in breeding programs. However, the current method for characterizing compatibility, based on the frequency of flower setting after controlled pollination, is time consuming, so identifying self-compatible individuals takes a long time. The identification of molecular markers in genomic regions can be an alternative that allows early selection of self-compatible plants. The present study aimed to identify SNP markers associated with sexual compatibility in cacao by means of genome-wide association (GWAS) mapping. A population of 295 individuals, mostly from third-generation breeding populations but also including founder clones, was used. This population was phenotypically characterized by hand pollinating 8199 flowers and evaluating flower retention 15 days after pollination. In addition, leaf samples of each individual were collected and DNA was extracted for genotyping by sequencing, generating 5301 SNP markers after cleaning. Genome-wide association mapping analysis was performed using the Synbreed, GCTA, and TASSEL software packages. Significant markers associated with incompatibility, likely in strong linkage disequilibrium, were found within a region of 196 kb at the proximal end of chromosome 4, suggesting the existence of a major gene in that region. However, this result should be validated in a larger population, considering that only 295 trees were used here. When the SNP effects were treated as random in the estimation process, many other regions in the genome appeared to be involved in sexual incompatibility in cacao. Candidate genes were found not only at the proximal end of chromosome 4 but also spread across several other regions of the genome.

12.
Open-pollinated progeny of Corymbia citriodora established in replicated field trials were assessed for stem diameter, wood density, and pulp yield prior to genotyping single nucleotide polymorphisms (SNPs) and testing the significance of associations between markers and assessment traits. Multiple individuals within each family were genotyped and phenotyped, which facilitated a comparison of standard association testing methods and an alternative method developed to relate markers to additive genetic effects. Narrow-sense heritability estimates indicated there was significant additive genetic variance within this population for assessment traits ($\hat{h}^2 = 0.28$ to $0.44$), and genetic correlations between the three traits were negligible to moderate ($r_G = 0.08$ to $0.50$). The significance levels of association tests (p values) were compared for four different analyses based on two different approaches: (1) two software packages were used to fit standard univariate mixed models that include SNP-fixed effects; (2) bivariate and multivariate mixed models including each SNP as an additional selection trait were used. Within either the univariate or multivariate approach, correlations between the tests of significance approached +1; however, correspondence between the two approaches was less strong, although between-approach correlations remained significantly positive. Similar SNP markers would be selected using multivariate analyses and standard marker-trait association methods, where the former facilitates integration into the existing genetic analysis systems of applied breeding programs and may be used with either single markers or indices of markers created with genomic selection processes.

13.
Chen L  Liu N  Wang S  Oh C  Carriero NJ  Zhao H 《BMC genetics》2005,6(Z1):S130
Alcoholism is a complex disease. As with other common diseases, genetic variants underlying alcoholism have been elusive, possibly due to the small effect of each individual susceptibility variant, gene × environment and gene × gene interactions, and complications in phenotype definition. We conducted association tests, the family-based association tests (FBAT) and the backward haplotype transmission association (BHTA), on the Collaborative Study on the Genetics of Alcoholism (COGA) data provided by Genetic Analysis Workshop (GAW) 14. Efron's local false discovery rate method was applied to control the proportion of false discoveries. For FBAT, we compared the results based on different types of genetic markers (single-nucleotide polymorphisms (SNPs) versus microsatellites) and different phenotype definitions (clinical diagnoses versus electrophysiological phenotypes). For SNPs, significant associations were found only with clinical diagnoses. In contrast, for microsatellites, significant associations were found only with electrophysiological phenotypes. In addition, we obtained the association results for SNPs and microsatellites using COGA diagnosis as the phenotype based on BHTA. In this case, the results for SNPs and microsatellites are more consistent. Compared to FBAT, more significant markers are detected with BHTA.

14.
Because of increasing litter size in Western pig breeds, additional teats are desirable to increase the capacity for nursing offspring. We applied genome‐wide SNP markers to detect QTL regions that affect teat number in a Duroc population. We phenotyped 1024 animals for total teat number. A total of 36 588 SNPs on autosomes were used in the analysis. The estimated heritability for teat number was 0.34 ± 0.05 on the basis of a genomic relationship matrix constructed from all SNP markers. Using a BayesC method, we identified a total of 18 QTL regions that affected teat number in Duroc pigs; 9 of the 18 regions were newly detected.

15.
Association mapping of salt tolerance in barley (Hordeum vulgare L.)
A spring barley collection of 192 genotypes from a wide geographical range was used to identify quantitative trait loci (QTLs) for salt tolerance traits by means of an association mapping approach using a set of one thousand SNP markers. Linkage disequilibrium (LD) decay was found with marker distances spanning 2–8 cM, depending on the methods used to account for population structure and genetic relatedness between genotypes. The association panel showed large variation for traits that were highly heritable under salt stress, including biomass production, chlorophyll content, plant height, tiller number, leaf senescence and shoot Na+, shoot Cl− and shoot and root Na+/K+ contents. The significant correlations between these traits and salt tolerance (defined as the biomass produced under salt stress relative to the biomass produced under control conditions) indicate that these traits contribute to (components of) salt tolerance. Association mapping was performed using several methods to account for population structure and minimize false-positive associations. This resulted in the identification of a number of genomic regions that strongly influenced salt tolerance and ion homeostasis, with a major QTL controlling salt tolerance on chromosome 6H, and a strong QTL for ion contents on chromosome 4H.

16.
Ignacy Misztal 《Genetics》2016,202(2):401-409
Many computations with SNP data including genomic evaluation, parameter estimation, and genome-wide association studies use an inverse of the genomic relationship matrix. The cost of a regular inversion is cubic and is prohibitively expensive for large matrices. Recent studies in cattle demonstrated that the inverse can be computed in almost linear time by recursion on any subset of ∼10,000 individuals. The purpose of this study is to present a theory of why such a recursion works and its implication for other populations. Assume that, because of a small effective population size, the additive information in a genotyped population has a small dimensionality, even with a very large number of SNP markers. That dimensionality is visible as a limited number of effective SNP effects, independent chromosome segments, or the rank of the genomic relationship matrix. Decompose a population arbitrarily into core and noncore individuals, with the number of core individuals equal to that dimensionality. Then, breeding values of noncore individuals can be derived by recursions on breeding values of core individuals, with coefficients of the recursion computed from the genomic relationship matrix. A resulting algorithm for the inversion called “algorithm for proven and young” (APY) has a linear computing and memory cost for noncore animals. Noninfinitesimal genetic architecture can be accommodated through a trait-specific genomic relationship matrix, possibly derived from Bayesian regressions. For populations with small effective population size, the inverse of the genomic relationship matrix can be computed inexpensively for a very large number of genotyped individuals.
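The core/noncore recursion described above can be sketched on a toy matrix. The abstract does not spell out the block formula. The version below assumes the commonly cited form, in which the core block of the inverse is G_cc⁻¹ + P'M⁻¹P, the off-diagonal block is −P'M⁻¹, and the noncore block is the diagonal M⁻¹, with P = G_nc G_cc⁻¹ and M holding each noncore individual's residual variance given the core. The function names are hypothetical, and plain Python lists are used only to keep the sketch self-contained. For clarity it stores the full dense result, so it does not show the linear memory cost the abstract refers to.

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(a)
    # Augment a copy of the matrix with the identity.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def apy_inverse(g, core):
    """APY-style inverse of G; rows and columns are ordered core first, then noncore."""
    noncore = [i for i in range(len(g)) if i not in core]
    c, n = len(core), len(noncore)
    gcc_inv = invert([[g[i][j] for j in core] for i in core])
    # Recursion coefficients P = G_nc G_cc^{-1}, one row per noncore individual.
    p = [[sum(g[i][core[k]] * gcc_inv[k][j] for k in range(c)) for j in range(c)]
         for i in noncore]
    # Inverse residual variance of each noncore individual given the core (diagonal M^{-1}).
    m_inv = [1.0 / (g[i][i] - sum(p[r][j] * g[core[j]][i] for j in range(c)))
             for r, i in enumerate(noncore)]
    inv = [[0.0] * (c + n) for _ in range(c + n)]
    for a in range(c):
        for b in range(c):
            # Core block: G_cc^{-1} + P' M^{-1} P.
            inv[a][b] = gcc_inv[a][b] + sum(p[r][a] * m_inv[r] * p[r][b] for r in range(n))
    for r in range(n):
        inv[c + r][c + r] = m_inv[r]  # noncore block is diagonal
        for a in range(c):
            # Off-diagonal blocks: -P' M^{-1} and its transpose.
            inv[a][c + r] = inv[c + r][a] = -p[r][a] * m_inv[r]
    return inv


if __name__ == "__main__":
    # Toy G in which noncore individuals 1 and 2 depend on core individual 0
    # plus independent residuals, so the APY inverse is exact here.
    g = [[2.0, 1.0, 1.0], [1.0, 1.5, 0.5], [1.0, 0.5, 2.5]]
    for row in apy_inverse(g, core=[0]):
        print(row)
```

Only the core block requires a dense inversion. Each noncore individual contributes one row of recursion coefficients and one diagonal element, which is where the near-linear cost described in the abstract comes from.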

17.
Accurately resolving population structure in a sample is important for both linkage and association studies. In this study we investigated the power of single-nucleotide polymorphisms (SNPs) in detecting population structure in a sample of 286 unrelated individuals. We varied the number of SNPs to determine how many are required to approach the degree of resolution obtained with the Collaborative Study on the Genetics of Alcoholism (COGA) short tandem repeat polymorphisms (STRPs). In addition, we selected SNPs with varying minor allele frequencies (MAFs) to determine whether low or high frequency SNPs are more efficient in resolving population structure. We conclude that a set of at least 100 evenly spaced SNPs with MAFs of 40-50% is required to resolve population structure in this dataset. If SNPs with lower MAFs are used, then more than 250 SNPs may be required to obtain reliable results.
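Selecting SNPs by minor allele frequency, as described above, can be sketched as follows. Genotypes are assumed to be coded as 0/1/2 counts of one allele. The 40-50% window matches the range quoted in the abstract, while the function names, SNP identifiers, and data are hypothetical.

```python
def minor_allele_frequency(allele_counts):
    """MAF for one SNP from per-individual allele counts coded 0, 1 or 2."""
    n = len(allele_counts)
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = sum(allele_counts) / (2.0 * n)  # frequency of the counted allele
    return min(p, 1.0 - p)              # the minor allele is the rarer of the two


def select_by_maf(genotypes_by_snp, low=0.4, high=0.5):
    """Return the names of SNPs whose MAF lies in [low, high]."""
    return [name for name, counts in genotypes_by_snp.items()
            if low <= minor_allele_frequency(counts) <= high]


if __name__ == "__main__":
    # Hypothetical genotypes for three SNPs in four individuals.
    snps = {
        "snp1": [0, 1, 2, 1],
        "snp2": [0, 0, 0, 1],
        "snp3": [2, 2, 1, 0],
    }
    print(select_by_maf(snps))
```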

18.
Kang HM  Zaitlen NA  Wade CM  Kirby A  Heckerman D  Daly MJ  Eskin E 《Genetics》2008,178(3):1709-1723
Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.

19.
The use of single nucleotide polymorphism (SNP) molecular markers has advanced the selection methodologies used in breeding programs of different crops, reducing the cost and time of cultivar release. Despite the great economic and social importance of Coffea arabica, studies with SNP markers are scarce and only a small number of SNPs are available for this species compared with other crops of agronomic importance. Thus, the objective of this study was to identify and validate SNP molecular markers for Coffea arabica and to introduce these markers to genetic breeding by means of an accurate analysis of the diversity and genetic structure of breeding populations of this species. After quality filtering, 11,187 SNP markers were selected from the coffee population obtained from crosses between the genotypes Catuaí and Híbrido de Timor. A large number of markers were distributed across the 11 chromosomes, within transcribed regions, and were used to estimate the genetic dissimilarity among the individuals of the breeding population. Dendrogram analysis and a Bayesian approach demonstrated the formation of two groups and the discrimination of all genotypes evaluated. The large number of SNP molecular markers distributed throughout the C. arabica genome was sufficient to discriminate all the accessions evaluated in the experiment, clustering them according to their genealogies. This work identified mixtures within the progenies. The genotyping data also provided detailed information about the parental genotypes and led to the identification of new candidate parents to be introduced to the breeding program. The study discusses population structure and its consequences for obtaining improved varieties of C. arabica.

20.
We conducted genome-wide linkage scans using both microsatellite and single-nucleotide polymorphism (SNP) markers. Regions showing the strongest evidence of linkage to alcoholism susceptibility genes were identified. Haplotype analyses using a sliding-window approach for SNPs in these regions were performed. In addition, we performed a genome-wide association scan using SNP data. SNPs in these regions with evidence of association (P
