Similar literature
1.

Background

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation. Identification of large numbers of SNPs is helpful for genetic diversity analysis, map-based cloning, genome-wide association analyses and marker-assisted breeding. Recently, identifying genome-wide SNPs in allopolyploid Brassica napus (rapeseed, canola) by resequencing many accessions has become feasible, due to the availability of reference genomes of Brassica rapa (2n = AA) and Brassica oleracea (2n = CC), which are the progenitor species of B. napus (2n = AACC). Although many SNPs in B. napus have been released, the objective in the present study was to produce a larger, more informative set of SNPs for large-scale and efficient genotypic screening. Hence, short-read genome sequencing was conducted on ten elite B. napus accessions for SNP discovery. A subset of these SNPs was randomly selected for sequence validation and for genotyping efficiency testing using the Illumina GoldenGate assay.

Results

A total of 892,536 bi-allelic SNPs were discovered throughout the B. napus genome. A total of 36,458 putative amino acid variants were located in 13,552 protein-coding genes, which were predicted to be enriched for binding and catalytic activities. Using the GoldenGate genotyping platform, 94 of the 96 sampled SNPs effectively distinguished the genotypes of 130 lines from two mapping populations, with an average call rate of 92%.

Conclusions

Despite the polyploid nature of B. napus, nearly 900,000 simple SNPs were identified by whole-genome resequencing. These SNPs are predicted to perform well in high-throughput genotyping assays (51% polymorphic SNPs and a 92% average call rate in the GoldenGate assay, implying an estimated >450,000 useful SNPs). Hence, the development of a much larger genotyping array of informative SNPs is feasible. SNPs identified in this study as causing non-synonymous amino acid substitutions can also be used to identify causal genes directly in association studies.
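The abstract does not spell out how the >450,000 figure was derived. One reading that reproduces it is to multiply the number of discovered SNPs by the 51% polymorphism rate seen in the GoldenGate sample; this short check assumes exactly that and nothing more.

```python
# Back-of-envelope check, assuming the estimate is simply
# (SNPs discovered) x (fraction polymorphic in the GoldenGate sample).
total_snps = 892_536          # bi-allelic SNPs discovered by resequencing
polymorphic_fraction = 0.51   # share of sampled SNPs that were polymorphic

useful_snps = total_snps * polymorphic_fraction
print(f"Estimated useful SNPs: {useful_snps:,.0f}")
```

This gives roughly 455,000, consistent with the stated >450,000; the abstract does not say whether the 92% call rate entered the estimate.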

3.

Introduction

Vitamin D deficiency has been reported to be common in patients with rheumatoid arthritis (RA), who have a higher prevalence of osteoporosis and hip fracture than healthy individuals. Genetic variants affecting the serum concentration of 25-hydroxyvitamin D (25(OH)D), an indicator of vitamin D status, were recently identified by genome-wide association studies in Caucasian populations. The purpose of this study was to validate these associations and to test whether the serum 25(OH)D-linked genetic variants were associated with the occurrence of hip fracture in Japanese RA patients.

Methods

DNA samples of 1,957 Japanese RA patients were obtained from the Institute of Rheumatology, Rheumatoid Arthritis (IORRA) cohort DNA collection. First, five single nucleotide polymorphisms (SNPs) that were reported to be associated with serum 25(OH)D concentration by genome-wide association studies were genotyped. The SNPs that showed a significant association with serum 25(OH)D level in the cross-sectional study were used in the longitudinal analysis of hip fracture risk. The genetic risk for hip fracture was determined by a multivariate Cox proportional hazards model in 1,957 patients with a maximum follow-up of 10 years (median, 8 years).

Results

Multivariate linear regression analyses showed that rs2282679 in the GC locus (the gene encoding group-specific component (vitamin D binding protein)) was significantly associated with lower serum 25(OH)D concentration (P = 8.1 × 10⁻⁵). A Cox proportional hazards model indicated that rs2282679 in GC was significantly associated with the occurrence of hip fracture in a recessive model (hazard ratio (95% confidence interval) = 2.52 (1.05-6.05), P = 0.039).
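As a consistency check on the reported hazard ratio, the standard error and p-value can be recovered from the 95% confidence interval, assuming (as is usual for Cox model output) a Wald interval computed on the log-hazard scale. This is an illustrative sketch, not the authors' analysis.

```python
import math

# Reported Cox model result for rs2282679 (recessive model)
hr, ci_low, ci_high = 2.52, 1.05, 6.05

# A Wald 95% CI is log(HR) +/- 1.96 * SE on the log scale,
# so the width of the interval gives back the standard error.
log_hr = math.log(hr)
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Two-sided Wald p-value: P(|Z| > z) = erfc(z / sqrt(2))
z = log_hr / se
p = math.erfc(z / math.sqrt(2))

print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The recovered p-value agrees with the reported P = 0.039, which suggests the interval and p-value come from the same Wald test.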

Conclusions

A two-stage analysis demonstrated that rs2282679 in GC was associated with serum 25(OH)D concentration and could be a risk factor for hip fracture in Japanese RA patients.

4.

Background

As the availability of primary cells can be limited for genetic studies of human disease, lymphoblastoid cell lines (LCL) are a common source of genomic DNA. LCL are created by a transformation process that entails in vitro infection of human B lymphocytes with the Epstein-Barr virus (EBV).

Methodology/Principal Findings

To test for genotypic errors potentially induced by the Epstein-Barr virus transformation process, we compared single nucleotide polymorphism (SNP) genotype calls in peripheral blood mononuclear cells (PBMC) and LCL from the same individuals. The average mismatch rate across 19 comparisons was 0.12% for SNPs with a population call rate of at least 95%, and 0.03% for SNPs with a call rate of at least 99%. Mismatch rates were not correlated across the genotype subarrays run on all sample pairs.
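A minimal sketch of the mismatch-rate calculation, assuming a mismatch rate is the share of discordant genotype calls among SNPs that pass the population call-rate filter and are called in both samples of a pair. The function names and toy genotypes are hypothetical; the abstract does not describe the study's exact pipeline.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # "AA", "AB", "BB", or None for a no-call


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with a genotype call at one SNP."""
    return sum(c is not None for c in calls) / len(calls)


def mismatch_rate(sample_a: Sequence[Genotype], sample_b: Sequence[Genotype],
                  population: Sequence[Sequence[Genotype]],
                  min_call_rate: float = 0.95) -> float:
    """Share of discordant calls between two samples from one individual.

    population[s] holds the calls of all samples at SNP s and is used only
    for the call-rate filter. SNPs missing in either sample are skipped.
    """
    compared = mismatched = 0
    for s, (a, b) in enumerate(zip(sample_a, sample_b)):
        if call_rate(population[s]) < min_call_rate:
            continue  # poorly performing SNP
        if a is None or b is None:
            continue  # a no-call cannot be compared
        compared += 1
        mismatched += a != b
    return mismatched / compared if compared else float("nan")


# Toy data: four SNPs, each called in four samples of a hypothetical panel
population = [
    ["AA", "AB", "BB", "AA"],
    ["AB", "AB", "AA", "BB"],
    ["BB", None, "AB", "BB"],  # call rate 0.75, removed at the 95% filter
    ["AA", "AA", "AB", "AB"],
]
pbmc = ["AA", "AB", "BB", "AB"]
lcl = ["AA", "BB", "AB", "AB"]
print(mismatch_rate(pbmc, lcl, population))        # 1 of 3 kept SNPs differs
print(mismatch_rate(pbmc, lcl, population, 0.70))  # 2 of 4 SNPs differ
```

Raising the call-rate threshold removes unreliable SNPs before comparison, which is consistent with the lower mismatch rate the abstract reports at the 99% filter.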

Conclusions/Significance

Genotypic discrepancies found in PBMC and LCL pairs were not significantly different from those in control pairs, and were not correlated across subarrays. These results suggest that mismatch rates are minimal with stringent quality control, and that most genotypic discrepancies are due to technical artifacts rather than to the EBV transformation process. Thus, LCL are likely a reliable DNA source for host genotype analysis.

6.

Background

Numerous efforts have been made to elucidate the etiology and improve the treatment of lung cancer, but the overall five-year survival rate is still only 15%. Although cigarette smoking is the primary risk factor for lung cancer, only 7% of female lung cancer patients in Taiwan have a history of smoking. Since cancer results from progressive accumulation of genetic aberrations, genomic rearrangements may be early events in carcinogenesis.

Results

To identify biomarkers of early-stage adenocarcinoma, genome-wide DNA aberrations in 60 pairs of lung adenocarcinoma and adjacent normal lung tissue from non-smoking women were examined using Affymetrix Genome-Wide Human SNP 6.0 arrays. Common copy number variation (CNV) regions were defined as runs of at least 100 consecutive single nucleotide polymorphism (SNP) loci at which ≥30% of patients had a copy number outside 2 ± 0.5. SNPs associated with lung adenocarcinoma were identified by McNemar’s test. Loss of heterozygosity (LOH) SNPs were defined as loci showing LOH in ≥18% of patients. Concurrent analyses of CNVs, SNPs, and LOH identified an aberration at SNP rs10248565 in HDAC9 on chromosome 7p21.1.
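One reading of the common-CNV rule, sketched in Python: a SNP counts as aberrant when at least 30% of patients have a copy number outside 2 ± 0.5 at it, and a common CNV region is a run of at least 100 consecutive aberrant SNPs. Pooling gains and losses into a single flag, and the function name, are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def common_cnv_regions(copy_numbers: Sequence[Sequence[float]],
                       min_patient_fraction: float = 0.30,
                       tolerance: float = 0.5,
                       min_run: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) SNP index ranges, end exclusive.

    copy_numbers[p][s] is the copy number of patient p at SNP s, with SNPs
    in genomic order along one chromosome. Gains and losses are pooled.
    """
    n_patients = len(copy_numbers)
    n_snps = len(copy_numbers[0])
    aberrant = []
    for s in range(n_snps):
        hits = sum(abs(cn[s] - 2.0) > tolerance for cn in copy_numbers)
        aberrant.append(hits / n_patients >= min_patient_fraction)

    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for s, flag in enumerate(aberrant + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = s
        elif not flag and start is not None:
            if s - start >= min_run:
                regions.append((start, s))
            start = None
    return regions


# Toy data: 10 patients x 300 SNPs, diploid everywhere except that
# 4 patients (40%) carry a gain at SNPs 100-219 and a short loss at 250-279.
cn = [[2.0] * 300 for _ in range(10)]
for p in range(4):
    for s in range(100, 220):
        cn[p][s] = 3.0
    for s in range(250, 280):
        cn[p][s] = 1.0
print(common_cnv_regions(cn))  # the 30-SNP loss is too short to count
```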

Conclusion

These results contribute to understanding the genetic etiology of lung adenocarcinoma in non-smoking women and suggest that SNP rs10248565 may be a biomarker of cancer susceptibility.

7.

Background

Large epidemiologic studies can make valuable contributions to the assessment of gene-environment interactions because they have prospectively collected detailed exposure data. Some of these studies, however, have only serum or plasma samples, which are a low-quantity source of DNA.

Methods

We examined whether DNA isolated from serum can be used to genotype single nucleotide polymorphisms (SNPs) reliably and accurately using Sequenom multiplex SNP genotyping technology. We genotyped 81 SNPs using samples from 158 participants in the NYU Women’s Health Study. For comparison, each participant had DNA from serum and at least one paired DNA sample isolated from a high-quality source, i.e., clots and/or cell precipitates.

Results

We observed that 60 of the 81 SNPs (74%) had high call frequencies (≥95%) using DNA from serum, only slightly lower than the 85% of SNPs with high call frequencies in DNA from clots or cell precipitates. Of the 57 SNPs with high call frequencies for serum, clot, and cell precipitate DNA, 54 (95%) had highly concordant (>98%) genotype calls across all three sample types. High DNA purity was not a critical factor for successful genotyping.

Conclusions

Our results suggest that this multiplex SNP genotyping method can be used reliably on DNA from serum in large-scale epidemiologic studies.

8.

Background

Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. Because complex diseases are caused by the joint effects of multiple genes, each individual gene or SNP having only a modest effect, a method that considers the joint effects of multiple SNPs can be more powerful than testing SNPs individually. Multi-SNP analysis tests association for a SNP set, usually defined from biological knowledge such as a gene or pathway, which may contain only a portion of SNPs with effects on the disease. A challenge for multi-SNP analysis is therefore how to select, from the SNP set, a subset of SNPs with promising association signals.

Results

We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT), which can be applied to general nuclear families. A variable p-value threshold algorithm determines an optimal p-value threshold for selecting a subset of SNPs, and a permutation procedure assesses the significance of the test. Simulations verified that the OPTPDT has correct type I error rates, and our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value-based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p = 2.5 × 10⁻⁶).
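The abstract does not give the OPTPDT statistic itself, so the following is only a generic sketch of the idea it describes: score a SNP set at several candidate p-value thresholds, keep the best threshold, and calibrate that choice by permutation. It assumes per-SNP p-values have already been computed for the real data and for each permuted replicate (for family data, for example, by permuting transmitted and untransmitted alleles), and it uses a sum of -ln(p) over SNPs at or below the threshold, a truncated-product-style statistic rather than the pedigree disequilibrium statistic of the OPTPDT. The threshold grid and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence

# Hypothetical grid of candidate p-value thresholds.
THRESHOLDS = (0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0)


def truncated_stat(pvals: Sequence[float], t: float) -> float:
    """Sum of -ln(p) over the SNPs whose p-value is at or below threshold t."""
    return sum(-math.log(p) for p in pvals if p <= t)


def rank_p(values: Sequence[float], i: int) -> float:
    """Share of data sets (set i included) whose statistic is >= that of set i."""
    return sum(v >= values[i] for v in values) / len(values)


def adaptive_threshold_test(observed: Sequence[float],
                            permuted: Sequence[Sequence[float]]) -> float:
    """Permutation p-value of the best-threshold statistic for one SNP set.

    observed: per-SNP p-values from the real data.
    permuted: per-SNP p-values from each permuted replicate (same SNP order).
    """
    sets: List[Sequence[float]] = [observed, *permuted]
    # stats[k][i]: truncated statistic of data set i at threshold k
    stats = [[truncated_stat(s, t) for s in sets] for t in THRESHOLDS]
    # each data set's best (smallest) p-value across thresholds
    min_p = [min(rank_p(row, i) for row in stats) for i in range(len(sets))]
    # how often any data set does at least as well as the real data
    return sum(m <= min_p[0] for m in min_p) / len(min_p)


rng = random.Random(1)
n_snps, n_perm = 20, 199
# 1 - random() lies in (0, 1], so log() never sees zero
permuted = [[1 - rng.random() for _ in range(n_snps)] for _ in range(n_perm)]
null_obs = [1 - rng.random() for _ in range(n_snps)]
signal_obs = [1e-5, 1e-4, 1e-3] + [1 - rng.random() for _ in range(n_snps - 3)]
print(adaptive_threshold_test(null_obs, permuted))
print(adaptive_threshold_test(signal_obs, permuted))
```

Because the real data and every permuted replicate pass through the same threshold search, the final p-value already accounts for having tried several thresholds.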

Conclusions

Our simulation results suggest that the OPTPDT is a valid and powerful test that will be helpful for gene-based or pathway-based association analysis. The method is well suited to secondary analysis of existing GWAS datasets, where it may identify sets of SNPs with joint effects on the disease.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1620-3) contains supplementary material, which is available to authorized users.

9.

Background

Genome-wide profiling of single-nucleotide polymorphisms is receiving increasing attention as a method for pre-implantation genetic diagnosis in humans and for commercial genotyping of pre-transfer embryos in cattle. However, the very small quantity of genomic DNA in biopsy material from early embryos poses daunting technical challenges, which a reliable whole-genome amplification (WGA) procedure would greatly help to overcome.

Results

Several PCR-based and non-PCR-based WGA technologies, namely multiple displacement amplification, quasi-random primed library synthesis followed by PCR, ligation-mediated PCR, and single-primer isothermal amplification, were tested in combination with different DNA extraction protocols and various quantities of input genomic DNA. The efficiency of each method was evaluated by comparing the genotypes obtained from 15 cultured cells (representative of an embryonic biopsy) with those from unamplified reference gDNA. The gDNA input, gDNA extraction method and amplification technology were all found to be critical for successful genome-wide genotyping. The selected WGA platform was then tested on embryo biopsies (n = 226), and the results were compared with those from biopsies collected after birth. Although WGA inevitably leads to random loss of information and introduces erroneous genotypes, after genomic imputation the genetic indexes obtained from the two sources of DNA were highly correlated (r = 0.99, P<0.001).

Conclusion

It is possible to generate high-quality DNA in sufficient quantities for successful genome-wide genotyping starting from an early embryo biopsy. However, imputation from parental and population genotypes is required to complete and correct the genotypic data. Judicious selection of the WGA platform, careful handling of the samples and genomic imputation together make it possible to perform extremely reliable genomic evaluations of pre-transfer embryos.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-889) contains supplementary material, which is available to authorized users.

10.

Background

The sensitivity of genome-wide association studies for the detection of quantitative trait loci (QTL) depends on the density of markers examined and the statistical models used. This study compares the performance of three marker densities to refine six previously detected QTL regions for mastitis traits: 54 k markers of a medium-density SNP (single nucleotide polymorphism) chip (MD), imputed 777 k markers of a high-density SNP chip (HD), and imputed whole-genome sequencing data (SEQ). Each dataset contained data for 4496 Danish Holstein cattle. Comparisons were performed using a linear mixed model (LM) and a Bayesian variable selection model (BVS).

Results

After quality control, 587, 7825, and 78 856 SNPs in the six targeted regions remained for MD, HD, and SEQ data, respectively. In general, the association patterns between SNPs and traits were similar for the three marker densities when tested using the same statistical model. With the LM model, 120 (MD), 967 (HD), and 7209 (SEQ) SNPs were significantly associated with mastitis, whereas with the BVS model, 43 (MD), 131 (HD), and 1052 (SEQ) significant SNPs (Bayes factor > 3.2) were observed. A total of 26 (MD), 75 (HD), and 465 (SEQ) significant SNPs were identified by both models. In addition, one, 16, and 33 QTL peaks for MD, HD, and SEQ data were detected according to the QTL intensity profile of SNP bins by post-analysis of the BVS model.
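The abstract does not state how the Bayes factors were computed. In Bayesian variable selection models a common definition is the posterior odds that a SNP is included in the model divided by its prior odds of inclusion; on Kass and Raftery's scale, values above about 3.2 are read as substantial evidence. A sketch under that assumption, with a hypothetical prior inclusion probability:

```python
def bayes_factor(pip: float, prior_inclusion: float) -> float:
    """Posterior odds of inclusion divided by prior odds of inclusion."""
    posterior_odds = pip / (1.0 - pip)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds


def pip_needed(bf: float, prior_inclusion: float) -> float:
    """Posterior inclusion probability at which a SNP reaches Bayes factor bf."""
    posterior_odds = bf * prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / (1.0 + posterior_odds)


prior = 0.01  # hypothetical prior probability that any one SNP is included
threshold_pip = pip_needed(3.2, prior)
print(f"PIP needed for BF > 3.2 at prior {prior}: {threshold_pip:.4f}")
```

With a small prior inclusion probability, even a modest posterior inclusion probability can clear the BF 3.2 threshold, which is why the threshold is stated as a Bayes factor rather than a raw probability.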

Conclusions

The power to detect significant associations increased with increasing marker density. The BVS model resulted in clearer boundaries between linked QTL than the LM model. Using SEQ data, the six targeted regions were refined to 33 candidate QTL regions for udder health. The comparison between these candidate QTL regions and known genes suggested that NPFFR2, SLC4A4, DCK, LIFR, and EDN3 may be considered as candidate genes for mastitis susceptibility.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0129-1) contains supplementary material, which is available to authorized users.

11.

Background

To better understand the genetic determination of udder health, we performed a genome-wide association study (GWAS) on a population of 2354 German Holstein bulls for which daughter yield deviations (DYD) for somatic cell score (SCS) were available. For this study, we used genetic information of 44 576 informative single nucleotide polymorphisms (SNPs) and 11 725 inferred haplotype blocks.

Results

When accounting for the sub-structure of the analyzed population, 16 SNPs and 10 haplotypes in six genomic regions were significant at the Bonferroni threshold of P ≤ 1.14 × 10⁻⁶. The size of the identified regions ranged from 0.05 to 5.62 Mb. Genomic regions on chromosomes 5, 6, 18 and 19 coincided with known QTL affecting SCS, while additional genomic regions were found on chromosomes 13 and X. Of particular interest is the region on chromosome 6 between 85 and 88 Mb, where QTL for mastitis traits and significant SNPs for SCS in different Holstein populations coincide with our results. In all identified regions, except for the region on chromosome X, significant SNPs were present in significant haplotypes. The minor alleles of identified SNPs on chromosomes 18 and 19, and the major alleles of SNPs on chromosomes 6 and X, were favorable for a lower SCS. Differences in somatic cell count (SCC) between alternative SNP alleles reached 14 000 cells/mL.
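SCS is a log transformation of SCC. Assuming the widely used linear score, SCS = log2(SCC / 100,000) + 3 with SCC in cells/mL (the abstract does not state which transformation was applied), the two scales convert as follows; the baseline SCC in the example is hypothetical.

```python
import math


def scs_from_scc(scc_cells_per_ml: float) -> float:
    """Linear somatic cell score from SCC in cells/mL (assumed formula)."""
    return math.log2(scc_cells_per_ml / 100_000) + 3


def scc_from_scs(scs: float) -> float:
    """Inverse of scs_from_scc: cells/mL from a somatic cell score."""
    return 100_000 * 2 ** (scs - 3)


# Hypothetical baseline of 100,000 cells/mL plus the 14,000 cells/mL difference
baseline = 100_000
delta_scs = scs_from_scc(baseline + 14_000) - scs_from_scc(baseline)
print(f"{delta_scs:.3f} SCS units")
```

Under this transformation one SCS unit corresponds to a doubling of SCC, so the same difference in cells/mL maps to a smaller SCS difference at higher baseline counts.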

Conclusions

The results support the polygenic nature of the genetic determination of SCS, confirm the importance of previously reported QTL, and provide evidence for the segregation of additional QTL for SCS in Holstein cattle. The small size of the regions identified here will facilitate the search for causal genetic variations that affect gene functions.

12.

Background

Twin studies have shown that anxiety in a general population sample of children involves both domain-general and trait-specific genetic effects. For this reason, in an attempt to identify genes responsible for these effects, we investigated domain-general and trait-specific genetic associations in the first genome-wide association (GWA) study on anxiety-related behaviours (ARBs) in childhood.

Methods

The sample included 2810 7-year-olds drawn from the Twins Early Development Study (TEDS) with data available for parent-rated anxiety and genome-wide DNA markers. The measure was the Anxiety-Related Behaviours Questionnaire (ARBQ), which assesses four anxiety traits and also yields a general anxiety composite. Affymetrix GeneChip 6.0 DNA arrays were used to genotype nearly 700,000 single-nucleotide polymorphisms (SNPs), and IMPUTE v2 was used to impute more than 1 million SNPs. Several GWA associations from this discovery sample were followed up in another TEDS sample of 4804 children. In addition, Genome-wide Complex Trait Analysis (GCTA) was applied to the discovery sample to estimate the total amount of variance in ARBs that can be accounted for by SNPs on the array.

Results

No SNP association met the demanding criterion of genome-wide significance that corrects for multiple testing across the genome (p<5×10−8). Attempts to replicate the top associations did not yield significant results. In contrast to the substantial twin-study heritability estimates, which ranged from 0.50 (0.03) to 0.61 (0.01), the GCTA estimates of phenotypic variance accounted for by the SNPs were much lower, ranging from 0.01 (0.11) to 0.19 (0.12).
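For intuition on the twin-based side of this comparison, Falconer's classic approximation derives heritability from the difference between monozygotic (MZ) and dizygotic (DZ) twin correlations. The abstract does not say the twin estimates were obtained this way (model fitting is more usual), and the correlations in the example are hypothetical. GCTA, by contrast, captures only the variance tagged by the genotyped SNPs.

```python
from typing import Tuple


def falconer_ace(r_mz: float, r_dz: float) -> Tuple[float, float, float]:
    """Falconer-style split of variance from MZ and DZ twin correlations.

    Returns (a2, c2, e2): additive genetic, shared environment, and
    non-shared environment (including measurement error) proportions.
    """
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


# Hypothetical twin correlations, not taken from the study
print(falconer_ace(0.70, 0.40))
```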

Conclusions

Taken together, these GWAS and GCTA results suggest that anxiety, like height, weight and intelligence, is affected by many genetic variants of small effect but that, unlike these other prototypical polygenic traits, genetic influence on anxiety is not well tagged by common SNPs.

13.

Background

Low birth weight has been extensively linked to poor adult health outcomes. Birth weight can be seen as a proxy for environmental conditions during prenatal development. Identical twin pairs discordant for birth weight provide an extraordinary model for investigating the association between birth weight and adult health while controlling not only for genetics but also for the postnatal rearing environment. We performed epigenome-wide profiling of blood samples from 150 pairs of adult monozygotic twins discordant for birth weight to look for molecular evidence of epigenetic signatures associated with birth weight discordance.

Results

Our association analysis revealed no CpG site with genome-wide statistical significance (FDR < 0.05) for either qualitative (larger or smaller) or quantitative discordance in birth weight. Even in a selected subsample of twin pairs with extreme birth weight discordance, no significant site was found except for 3 CpGs that displayed age-dependent intra-pair differential methylation, with FDRs of 0.014 (cg26856578, p = 3.42e-08), 0.0256 (cg15122603, p = 1.25e-07) and 0.0258 (cg16636641, p = 2.05e-07). Among the three sites, intra-pair differential methylation increased with age for cg26856578 but decreased with age for cg15122603 and cg16636641. No sex-dependent effect on intra-pair differential methylation reached genome-wide significance, either in the whole sample or in the extremely discordant twins.
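The abstract reports FDR values without naming the procedure. Assuming the common Benjamini-Hochberg method, each p-value is scaled by m/rank and the results are then made monotone from the largest p-value downwards; a minimal sketch with hypothetical p-values:

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-values get smaller (and never exceed 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

On an epigenome-wide scale m is in the hundreds of thousands, so a raw p-value near 10⁻⁸ can still translate into an FDR of order 10⁻², as in the values reported above.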

Conclusions

Genome-wide DNA methylation profiling did not reveal epigenetic signatures of birth weight discordance although some sites displayed age-dependent intra-pair differential methylation in the extremely discordant twin pairs.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1062) contains supplementary material, which is available to authorized users.

14.

Background

Plasmodium falciparum resistance to artemisinins, the first-line treatment for malaria worldwide, has been reported in western Cambodia. Resistance is characterized by significantly delayed clearance of parasites following artemisinin treatment. Artemisinin resistance has not previously been reported in Myanmar, which has the highest falciparum malaria burden among Southeast Asian countries.

Methods

A non-randomized, single-arm, open-label clinical trial of artesunate monotherapy (4 mg/kg daily for seven days) was conducted in adults with acute blood-smear positive P. falciparum malaria in Kawthaung, southern Myanmar. Parasite density was measured every 12 hours until two consecutive negative smears were obtained. Participants were followed weekly at the study clinic for three additional weeks. Co-primary endpoints included parasite clearance time (the time required for complete clearance of initial parasitemia), parasite clearance half-life (the time required for parasitemia to decrease by 50% based on the linear portion of the parasite clearance slope), and detectable parasitemia 72 hours after commencement of artesunate treatment. Drug pharmacokinetics were measured to rule out delayed clearance due to suboptimal drug levels.
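The clearance half-life defined here follows from a log-linear model: if parasitemia declines exponentially, ln(density) falls linearly with time at rate k, and the half-life is ln(2)/k. A minimal sketch, assuming the caller has already selected the linear segment of the curve (dedicated tools also detect the lag phase and tail, which this does not); the counts are hypothetical.

```python
import math
from typing import Sequence


def clearance_half_life(times_h: Sequence[float],
                        densities: Sequence[float]) -> float:
    """Half-life in hours from a least-squares fit of ln(density) on time.

    Assumes every point lies on the log-linear part of the clearance curve
    and that all densities are positive.
    """
    y = [math.log(d) for d in densities]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times_h, y))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    k = -sxy / sxx  # clearance rate constant per hour (positive when clearing)
    return math.log(2) / k


# Hypothetical 12-hourly counts (parasites per microlitre), falling fourfold per 12 h
times = [0, 12, 24, 36]
counts = [80_000, 20_000, 5_000, 1_250]
half_life = clearance_half_life(times, counts)
print(f"half-life = {half_life:.1f} h, slow (> 6.2 h): {half_life > 6.2}")
```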

Results

The median (range) parasite clearance half-life and time were 4.8 (2.1–9.7) and 60 (24–96) hours, respectively. The frequency distributions of parasite clearance half-life and time were bimodal, with very slow parasite clearance characteristic of the slowest-clearing Cambodian parasites (half-life longer than 6.2 hours) in approximately 1/3 of infections. Fourteen of 52 participants (26.9%) had a measurable parasitemia 72 hours after initiating artesunate treatment. Parasite clearance was not associated with drug pharmacokinetics.

Conclusions

A subset of P. falciparum infections in southern Myanmar displayed markedly delayed clearance following artemisinin treatment, suggesting either emergence of artemisinin resistance in southern Myanmar or spread to this location from its site of origin in western Cambodia. Resistance containment efforts are underway in Myanmar.

Trial Registration

Australian New Zealand Clinical Trials Registry ACTRN12610000896077

15.

Background

Single nucleotide polymorphism (SNP) markers have a wide range of applications in crop genetics and genomics. Because of their polyploid nature, many important crops, such as wheat, cotton and rapeseed, contain large amounts of repetitive and homoeologous sequence in their genomes, which poses a major challenge for high-throughput genotyping with sequencing and/or array technologies. Allotetraploid Brassica napus (AACC, 2n = 4x = 38) comprises two highly homoeologous sub-genomes derived from its progenitor species B. rapa (AA, 2n = 2x = 20) and B. oleracea (CC, 2n = 2x = 18), and is an ideal species in which to develop methods for reducing the interference of extensive inter-homoeologue polymorphisms (mHemi-SNPs and Pseudo-simple SNPs) between closely related sub-genomes.

Results

Based on a recent B. napus 6K SNP array, we developed a bi-filtering procedure to identify unauthentic lines in a DH population, as well as mHemi-SNPs and Pseudo-simple SNPs in an array data matrix. The procedure used both monomorphic and polymorphic SNPs in the DH population and effectively distinguished the mHemi-SNPs and Pseudo-simple SNPs that result from superposition of the signals from multiple SNPs. Compared with the conventional procedure for array data processing, the bi-filtering method minimized the pseudo-linkage caused by mHemi-SNPs and Pseudo-simple SNPs, thus improving the quality of the SNP genetic map. Furthermore, the improved genetic map increased the accuracy of QTL mapping, as demonstrated by the elimination of spurious QTLs in the mapping population.

Conclusions

The bi-filtering analysis of the SNP array data represents a novel approach to effectively assigning the multi-loci SNP genotypes in polyploid B. napus and may find wide applications to SNP analyses in polyploid crops.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1559-4) contains supplementary material, which is available to authorized users.

17.

Objectives

Genome-wide association studies have facilitated the identification of over 30 susceptibility loci for rheumatoid arthritis (RA). However, evidence for a number of potential susceptibility genes has not yet reached genome-wide significance in studies of Caucasian RA.

Methods

A cohort of 4286 RA patients from across Europe and 5642 population-matched controls were genotyped for 25 SNPs, and the results were combined in a meta-analysis with previously published data.
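The abstract does not specify the meta-analysis model. A common choice for combining case-control results is an inverse-variance fixed-effect meta-analysis of log odds ratios; a sketch under that assumption, with hypothetical inputs:

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(odds_ratios: Sequence[float],
                      log_or_ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling of per-study odds ratios.

    log_or_ses are standard errors of ln(OR). Returns the pooled OR, the
    standard error of the pooled ln(OR), and a two-sided Wald p-value.
    """
    weights = [1.0 / se ** 2 for se in log_or_ses]
    log_ors = [math.log(o) for o in odds_ratios]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(pooled), pooled_se, p


# Hypothetical: a new European cohort plus one previously published study
print(fixed_effect_meta([1.15, 1.25], [0.06, 0.08]))
```

Pooling shrinks the standard error, which is how SNPs that are only nominally significant in one cohort can reach genome-wide significance in the combined analysis.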

Results

Significant evidence of association was detected for nine SNPs within the European samples. When meta-analysed with previously published data, 21 SNPs were associated with RA susceptibility. Although SNPs in the PTPN2 gene were previously reported to be associated with RA in both Japanese and European populations, we show genome-wide evidence for a different SNP within this gene associated with RA susceptibility in an independent European population (rs7234029, P = 4.4×10−9).

Conclusions

This study provides further genome-wide evidence for the association of the PTPN2 locus (encoding T cell protein tyrosine phosphatase) with Caucasian RA susceptibility. This finding adds to the growing evidence that PTPN2 is a pan-autoimmune susceptibility gene.

18.

Background

Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. Combining different types of next-generation sequencers to analyze individual genomes may be an efficient means of validating multiple single nucleotide variation calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

19.

Background

The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive approach to obtaining whole-genome sequence genotypes for a large number of individuals, and is less expensive than sequencing all of them. Our objective was to investigate the accuracy of imputation from lower-density SNP panels to whole-genome sequence data in a typical cattle dataset.

Methods

Whole-genome sequence data of chromosome 1 (1737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios, S40, S60 and S80, with 40%, 60%, and 80% of the individuals as reference individuals, respectively, were investigated.
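A sketch of the accuracy measure and the five-fold split described here, assuming genotypes are coded 0/1/2 and imputed values are dosages. The imputation run itself (Beagle) is not reproduced, and the abstract does not say how the S40 and S60 reference subsets were drawn within each fold, so only the fold assignment and the per-SNP correlation are shown; all names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def k_fold_splits(ids: Sequence[str], k: int = 5, seed: int = 0) -> List[List[str]]:
    """Shuffle individuals and deal them into k roughly equal folds."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def mean_accuracy(observed: Sequence[Sequence[float]],
                  imputed: Sequence[Sequence[float]]) -> float:
    """Mean per-SNP correlation; observed[s][i] is the 0/1/2 genotype of
    validation individual i at SNP s. SNPs with undefined r are skipped."""
    rs = [pearson(o, p) for o, p in zip(observed, imputed)]
    rs = [r for r in rs if not math.isnan(r)]
    return sum(rs) / len(rs) if rs else float("nan")


ids = [f"bull_{i}" for i in range(114)]
folds = k_fold_splits(ids)
print([len(f) for f in folds])
```

Skipping SNPs that are monomorphic in a validation fold avoids undefined correlations; such SNPs are typically those with a low minor allele frequency, which the abstract notes are the hardest to impute.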

Results

Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs.

Conclusions

Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.

20.

Background

A large single nucleotide polymorphism (SNP) dataset was used to analyze genome-wide diversity in a diverse collection of watermelon cultivars representing the genetic diversity of globally cultivated watermelon. The marker density required for successful association mapping depends on the extent of linkage disequilibrium (LD) within a population. Genotyping by sequencing reveals large numbers of SNPs, which in turn create opportunities for genome-wide association mapping and marker-assisted selection, even in crops such as watermelon for which few genomic resources are available. In this paper, we used genome-wide genetic diversity to study LD, selective sweeps, and pairwise FST distributions among worldwide cultivated watermelons to track signals of domestication.

Results

We examined 183 Citrullus lanatus var. lanatus accessions representing domesticated watermelon and generated a set of 11,485 SNP markers using genotyping by sequencing. With a diverse panel of worldwide cultivated watermelons, we identified a set of 5,254 SNPs with a minor allele frequency of ≥ 0.05, distributed across the genome. All ancestries were traced to Africa and an admixture of various ancestries constituted secondary gene pools across various continents. A sliding window analysis using pairwise FST values was used to resolve selective sweeps. We identified strong selection on chromosomes 3 and 9 that might have contributed to the domestication process. Pairwise analysis of adjacent SNPs within a chromosome as well as within a haplotype allowed us to estimate genome-wide LD decay. LD was also detected within individual genes on various chromosomes. Principal component and ancestry analyses were used to account for population structure in a genome-wide association study. We further mapped important genes for soluble solid content using a mixed linear model.
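A sketch of the pairwise-LD step for adjacent SNPs, assuming genotypes coded as 0/1/2 allele counts with no missing calls. The abstract does not state which r² estimator was used; the squared correlation of genotype dosages shown here is a common stand-in when phase is unknown. SNPs are first filtered at the same minor allele frequency threshold of 0.05, and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF from 0/1/2 allele counts (diploid, no missing data)."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> Optional[float]:
    """Squared Pearson correlation of genotype dosages at two SNPs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # monomorphic SNP: r^2 undefined
    return sxy * sxy / (sxx * syy)


def adjacent_ld(positions: Sequence[int], genotypes: Sequence[Sequence[int]],
                min_maf: float = 0.05) -> List[Tuple[int, float]]:
    """(distance in bp, r^2) for neighbouring SNPs that pass the MAF filter.

    positions must be sorted; genotypes[s] holds all accessions at SNP s.
    """
    kept = [(pos, g) for pos, g in zip(positions, genotypes)
            if minor_allele_frequency(g) >= min_maf]
    pairs = []
    for (p1, g1), (p2, g2) in zip(kept, kept[1:]):
        r2 = genotype_r2(g1, g2)
        if r2 is not None:
            pairs.append((p2 - p1, r2))
    return pairs


# Toy data: 4 accessions, 4 SNPs; the SNP at 500 bp is monomorphic (MAF 0)
positions = [100, 250, 500, 900]
genotypes = [[0, 0, 2, 2], [0, 0, 2, 2], [0, 0, 0, 0], [2, 0, 0, 2]]
print(adjacent_ld(positions, genotypes))
```

Plotting r² against distance across many such pairs, and fitting a decay curve, gives the genome-wide LD decay described above.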

Conclusions

Information concerning the SNP resources, population structure, and LD developed in this study will help in identifying agronomically important candidate genes from the genomic regions underlying selection and for mapping quantitative trait loci using a genome-wide association study in sweet watermelon.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-767) contains supplementary material, which is available to authorized users.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417