Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
Liu W  Zhao W  Chase GA 《Human heredity》2006,61(1):31-44
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes or genotyping errors on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, in the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have a severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.

2.
Estimating haplotype frequencies becomes increasingly important in the mapping of complex disease genes, as millions of single nucleotide polymorphisms (SNPs) are being identified and genotyped. When genotypes at multiple SNP loci are gathered from unrelated individuals, haplotype frequencies can be accurately estimated using expectation-maximization (EM) algorithms (Excoffier and Slatkin, 1995; Hawley and Kidd, 1995; Long et al., 1995), with standard errors estimated using bootstraps. However, because the number of possible haplotypes increases exponentially with the number of SNPs, handling data with a large number of SNPs poses a computational challenge for the EM methods and for other haplotype inference methods. To solve this problem, Niu and colleagues, in their Bayesian haplotype inference paper (Niu et al., 2002), introduced a computational algorithm called progressive ligation (PL). But their Bayesian method has a limitation on the number of subjects (no more than 100 subjects in the current implementation of the method). In this paper, we propose a new method in which we use the same likelihood formulation as in Excoffier and Slatkin's EM algorithm and apply the estimating equation idea and the PL computational algorithm with some modifications. Our proposed method can handle data sets with large numbers of SNPs as well as large numbers of subjects. Simultaneously, our method estimates standard errors efficiently, using the sandwich estimate from the estimating equation, rather than the bootstrap method. Additionally, our method admits missing data and produces valid estimates of parameters and their standard errors under the assumption that the missing genotypes are missing at random in the sense defined by Rubin (1976).
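The EM idea this abstract builds on can be illustrated with a toy two-SNP case. This is a minimal sketch only, not the authors' estimating-equation or progressive-ligation method. The function name, the 0/1/2 genotype coding and the fixed iteration count are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    Each genotype is a pair (g1, g2); g counts copies of allele 1 (0, 1 or 2)
    at one SNP. Hypothetical helper, written for illustration only.
    """
    haps = list(product((0, 1), repeat=2))   # 00, 01, 10, 11
    freqs = {h: 0.25 for h in haps}          # uniform starting values
    counts = Counter(genotypes)
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        expected = dict.fromkeys(haps, 0.0)
        for (g1, g2), n in counts.items():
            # E-step: all ordered haplotype pairs compatible with the genotype
            pairs = [(h1, h2) for h1 in haps for h2 in haps
                     if h1[0] + h2[0] == g1 and h1[1] + h2[1] == g2]
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                expected[h1] += n * w / total
                expected[h2] += n * w / total
        # M-step: expected haplotype counts become the new frequencies
        freqs = {h: expected[h] / n_chrom for h in haps}
    return freqs
```

Only double heterozygotes, coded (1, 1), are ambiguous in phase. The E-step splits them between the 00/11 and 01/10 resolutions in proportion to the current frequency estimates. With many SNPs the number of candidate haplotypes grows exponentially, which is the computational problem the abstract describes.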

3.
Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach; the markers were simultaneously identified and scored in a genome-wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome-wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long-term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
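The step from nucleotide diversity to effective population size can be sketched with the standard neutral expectation π = 4·Ne·μ for a diploid population. The abstract does not state the mutation rates used. The two values below (1e-8 and 1e-9 per site per generation) are assumptions that happen to reproduce the reported range, and the function name is hypothetical.

```python
def effective_population_size(pi, mu):
    """Long-term Ne from nucleotide diversity, assuming pi = 4 * Ne * mu."""
    return pi / (4.0 * mu)


PI = 0.00529  # average nucleotide diversity reported in the abstract

# Assumed per-site, per-generation mutation rates (not given in the abstract).
low = effective_population_size(PI, 1e-8)
high = effective_population_size(PI, 1e-9)
print(f"Ne between {low:,.0f} and {high:,.0f}")
```

Under these assumed rates the result is about 132,000 to 1,320,000, which matches the range quoted above to three significant figures.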

4.
Single-nucleotide polymorphisms in soybean   Total citations: 36 (self-citations: 0; citations by others: 36)

5.
Single nucleotide polymorphism (SNP) markers have become a genetic technology of choice because of their automation and high precision of allele calls. In this study, our goal was to develop 94 SNPs and test them across well-chosen common bean (Phaseolus vulgaris L.) germplasm. We validated and assessed SNP diversity at 84 gene-based and 10 non-genic loci using KASPar technology in a panel of 70 genotypes that have been used as parents of mapping populations and have been previously evaluated for SSRs. SNPs exhibited high levels of genetic diversity, an excess of middle frequency polymorphism, and a within-genepool mismatch distribution as expected for populations affected by sudden demographic expansions after domestication bottlenecks. This set of markers was useful for distinguishing Andean and Mesoamerican genotypes but less useful for distinguishing within each gene pool. In summary, slightly greater polymorphism and race structure was found within the Andean gene pool than within the Mesoamerican gene pool but polymorphism rate between genotypes was consistent with genepool and race identity. Our survey results represent a baseline for the choice of SNP markers for future applications because gene-associated SNPs could themselves be causative SNPs for traits. Finally, we suggest that the ideal genetic marker combination for diversity, mapping and association studies in common bean is a mix of both SNP and SSR markers.

6.
Drought often delays developmental events so that plant height and above-ground biomass are reduced, resulting in yield loss due to inadequate photosynthate. In this study, plant height and biomass measured by the Normalized Difference Vegetation Index (NDVI) were used as criteria for drought tolerance. A total of 305 lines representing temperate, tropical and subtropical maize germplasm were genotyped using two single nucleotide polymorphism (SNP) chips each containing 1536 markers, from which 2052 informative SNPs and 386 haplotypes each constructed with two or more SNPs were used for linkage disequilibrium (LD) or association mapping. Single SNP- and haplotype-based LD mapping identified two significant SNPs and three haplotype loci [a total of four quantitative trait loci (QTL)] for plant height under well-watered and water-stressed conditions. For biomass, 32 SNPs and 12 haplotype loci (30 QTL) were identified using NDVIs measured at seven stages under the two water regimes. Some significant SNP and haplotype loci for NDVI were shared by different stages. Comparing significant loci identified by single SNP- and haplotype-based LD mapping, we found that six out of the 14 chromosomal regions defined by haplotype loci each included at least one significant SNP for the same trait. Significant SNP haplotype loci explained much higher phenotypic variation than individual SNPs. Moreover, we found that two significant SNPs (two QTL) and one haplotype locus were shared by plant height and NDVI. The results indicate the power of comparative LD mapping using single SNPs and SNP haplotypes with QTL shared by plant height and biomass as secondary traits for drought tolerance in maize.

7.
In a de novo genotyping-by-sequencing (GBS) analysis of short, 64-base tag-level haplotypes in 4657 accessions of cultivated oat, we discovered 164741 tag-level (TL) genetic variants containing 241224 SNPs. From this, the marker density of an oat consensus map was increased by the addition of more than 70000 loci. The mapped TL genotypes of a 635-line diversity panel were used to infer chromosome-level (CL) haplotype maps. These maps revealed differences in the number and size of haplotype blocks, as well as differences in haplotype diversity between chromosomes and subsets of the diversity panel. We then explored potential benefits of SNP vs. TL vs. CL GBS variants for mapping, high-resolution genome analysis and genomic selection in oats. A combined genome-wide association study (GWAS) of heading date from multiple locations using both TL haplotypes and individual SNP markers identified 184 significant associations. A comparative GWAS using TL haplotypes, CL haplotype blocks and their combinations demonstrated the superiority of using TL haplotype markers. Using a principal component-based genome-wide scan, genomic regions containing signatures of selection were identified. These regions may contain genes that are responsible for the local adaptation of oats to North American conditions. Genomic selection for heading date using TL haplotypes or SNP markers gave comparable and promising prediction accuracies of up to r = 0.74. Genomic selection carried out in an independent calibration and test population for heading date gave promising prediction accuracies that ranged between r = 0.42 and 0.67. In conclusion, TL haplotype GBS-derived markers facilitate genome analysis and genomic selection in oat.

8.
9.
Single nucleotide polymorphisms (SNPs) are predicted to supersede microsatellites as the marker of choice for population genetic studies in the near future. To date, however, very few studies have directly compared both marker systems in natural populations, particularly in non-model organisms. In the present study, we compared the utility of SNPs and microsatellites for population genetic analysis of the red seaweed Chondrus crispus (Florideophyceae). Six SNP loci yielded very different patterns of intrapopulation genetic diversity compared to those obtained using seven moderately (mean 5.2 alleles) polymorphic microsatellite loci, although Bayesian clustering analysis gave largely congruent results between the two marker classes. A weak but significant pattern of isolation-by-distance was observed across scales from a few hundred metres to approximately 200 km using the combined SNP and microsatellite data set of 13 loci. Over larger scales, however, there was little correlation between genetic divergence and geographical distance. Our findings suggest that even a moderate number of SNPs is sufficient to determine patterns of genetic diversity across natural populations, and also highlight the fact that patterns of genetic variation in seaweeds arise through a complex interplay of short- and long-term natural processes, as well as anthropogenic influence. © 2012 The Linnean Society of London, Biological Journal of the Linnean Society, 2012, 108, 251–262.

10.
The manner in which organisms adapt to climate change informs a broader understanding of the evolution of biodiversity as well as conservation and mitigation plans. We apply common garden and association mapping approaches to quantify genetic variance and identify loci affecting bud flush and bud set, traits that define a tree's season for height growth, in the boreal forest tree Populus balsamifera L. (balsam poplar). Using data from 478 genotypes grown in each of two common gardens, one near the southern edge and another near the northern edge of P. balsamifera's range, we found that broad-sense heritability for bud flush and bud set was generally high (H² > 0.5 in most cases), suggesting that abundant genetic variation exists for phenological response to changes in the length of the growing season. To identify the molecular genetic basis of this variation, we genotyped trees for 346 candidate single nucleotide polymorphisms (SNPs) from 27 candidate genes for the CO/FT pathway in poplar. Mixed-model analyses of variance identified SNPs in 10 genes to be associated with variation in either bud flush or bud set. Multiple SNPs within FRIGIDA were associated with bud flush, whereas multiple SNPs in LEAFY and GIGANTEA 5 were associated with bud set. Although there was strong population structure in stem phenology, the geographic distribution of multilocus association SNP genotypes was widespread except at the most northern populations, indicating that geographic regions may harbour sufficient diversity in functional genes to facilitate adaptation to future climatic conditions in many sites.

11.
Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species.

12.
Rapid expansion of available data, both phenotypic and genotypic, for multiple strains of mice has enabled the development of new methods to interrogate the mouse genome for functional genetic perturbations. In silico mapping provides an expedient way to associate the natural diversity of phenotypic traits with ancestrally inherited polymorphisms for the purpose of dissecting genetic traits. In mouse, the current single nucleotide polymorphism (SNP) data have lacked the density across the genome and coverage of enough strains to properly achieve this goal. To remedy this, 470,407 allele calls were produced for 10,990 evenly spaced SNP loci across 48 inbred mouse strains. Use of the SNP set with statistical models that considered unique patterns within blocks of three SNPs as an inferred haplotype could successfully map known single gene traits and a cloned quantitative trait gene. Application of this method to high-density lipoprotein and gallstone phenotypes reproduced previously characterized quantitative trait loci (QTL). The inferred haplotype data also facilitates the refinement of QTL regions such that candidate genes can be more easily identified and characterized as shown for adenylate cyclase 7.
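The "blocks of three SNPs as an inferred haplotype" idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's statistical model. It assumes fully homozygous inbred strains with one allele call per SNP, and it uses overlapping windows, which the abstract does not specify. The function name and strain labels are hypothetical.

```python
from collections import defaultdict


def infer_haplotype_groups(calls, window=3):
    """Group strains by their allele pattern in each window of adjacent SNPs.

    calls maps a strain name to a string of allele calls, one character per
    SNP, with all strings in the same SNP order.
    """
    n_snps = len(next(iter(calls.values())))
    blocks = []
    for start in range(n_snps - window + 1):
        groups = defaultdict(list)
        for strain, alleles in calls.items():
            # Strains sharing the same pattern share an inferred haplotype.
            groups[alleles[start:start + window]].append(strain)
        blocks.append(dict(groups))
    return blocks


# Toy input: three hypothetical strains typed at four SNPs.
example = {"strainA": "ACGT", "strainB": "ACGA", "strainC": "TCGT"}
for i, block in enumerate(infer_haplotype_groups(example)):
    print(i, block)
```

Each block's strain grouping can then be tested against a phenotype, for example with a one-way ANOVA across haplotype groups. That downstream test is not shown here.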

13.
Kim JJ  Kim HH  Park JH  Ryu HJ  Kim J  Moon S  Gu H  Kim HT  Lee JY  Han BG  Park C  Kimm K  Park CS  Lee JK  Oh B 《Immunogenetics》2005,57(9):636-643
Asthma is a chronic inflammatory disorder of the airways, and a number of genetic loci are associated with the disease. Candidate gene association studies have been regarded as effective tools to study complex traits. Knowledge of the sequence variation and structure of the candidate genes is required for association studies. Thus, we investigated the genetic variants of 32 asthma candidate genes selected by colocalization of positional and functional candidate genes. We screened all exons and promoter regions of those genes using 12 healthy individuals and 12 asthma patients and identified a total of 418 single nucleotide polymorphisms (SNPs), including 270 known SNPs and 148 novel SNPs. Levels of nucleotide diversity varied from gene to gene (0.72×10⁻⁴–14.53×10⁻⁴), but the average nucleotide diversity between coding SNPs (cSNPs) and noncoding SNPs was roughly equivalent (4.63×10⁻⁴ vs 4.69×10⁻⁴). However, nucleotide diversity of cSNPs was strongly correlated to codon degeneracy. Nucleotide diversity was much higher at fourfold degenerate sites than at nondegenerate sites (9.42×10⁻⁴ vs 3.14×10⁻⁴). Gene-based haplotype analysis of asthma-associated genes in this study revealed that common haplotypes (frequency >5%) represented 90.5% of chromosomes, and they could be uniquely identified with five or fewer haplotype-tagging SNPs per gene. Therefore, our results may have important implications for the selection of asthma candidate genes and SNP markers for comprehensive association studies using large sample populations.

14.
Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic variation that can be used as molecular markers. The SNPs that are hidden in sequence databases can be unlocked using bioinformatic tools. For efficient application of these SNPs, the sequence set should be as error-free as possible, targeting single loci and suitable for the SNP scoring platform of choice. We have developed a pipeline to effectively mine SNPs from public EST databases with or without quality information using QualitySNP software, select reliable SNPs and prepare the loci for analysis on the Illumina GoldenGate genotyping platform. The applicability of the pipeline was demonstrated using publicly available potato EST data, genotyping individuals from two diploid mapping populations and subsequently mapping the SNP markers (putative genes) in both populations. Over 7000 reliable SNPs were identified that met the criteria for genotyping on the GoldenGate platform. Of the 384 SNPs on the SNP array approximately 12% dropped out. For the two potato mapping populations 165 and 185 segregating SNP loci could be mapped on the respective genetic maps, illustrating the effectiveness of our pipeline for SNP selection and validation.

15.
EST (expressed sequence tags) sequencing, SNP (single nucleotide polymorphisms) development and haplotype assessment are powerful tools for the support of marker-assisted selection. The grapevine genome is currently being scavenged in our laboratory using an EST-SNP approach. Nine parental genotypes, used to create five inter- or intra-specific hybrids, have been tested to evaluate the degree of polymorphism between Vitis vinifera, Vitis riparia and a further intraspecific hybrid, measuring their nucleotide diversity. The SNPs were analysed on cDNA sequences of 4 functional classes of genes based on homology with genes present in a public database: sugar metabolism, cell signalling, anthocyanin metabolism and defence related. Primer pairs were deduced and used to amplify corresponding genomic sequences. Almost 12,000 bp of DNA have been scanned revealing differences among genotypes of up to 247 SNPs, with the highest rate of one SNP occurring every 78 bp when clones of different Vitis species are compared. Re-sequencing allowed the definition of haplotypes in the nine genotypes studied and these were confirmed by analysing segregating populations. The efficiency of SSCP, in comparison with re-sequencing, was considered for 25 gene fragments of the same 9 genotypes.

16.
Genomic tools are lacking for invasive and native populations of sea lamprey (Petromyzon marinus). Our objective was to discover single nucleotide polymorphism (SNP) loci to conduct pedigree analyses to quantify reproductive contributions of adult sea lampreys and dispersion of sibling larval sea lampreys of different ages in Great Lakes tributaries. Additional applications of data were explored using additional geographically expansive samples. We used restriction site-associated DNA sequencing (RAD-Seq) to discover genetic variation in Duffins Creek (DC), Ontario, Canada, and the St. Clair River (SCR), Michigan, USA. We subsequently developed RAD capture baits to genotype 3,446 RAD loci that contained 11,970 SNPs. Based on RAD capture assays, estimates of variance in SNP allele frequency among five Great Lakes tributary populations (mean FST 0.008; range 0.00–0.018) were concordant with previous microsatellite-based studies; however, outlier loci were identified that contributed substantially to spatial population genetic structure. At finer scales within streams, simulations indicated that accuracy in genetic pedigree reconstruction was high when 200 or 500 independent loci were used, even in situations of high spawner abundance (e.g., 1,000 adults). Based on empirical collections of larval sea lamprey genotypes, we found that age-1 and age-2 families of full and half-siblings were widely but nonrandomly distributed within stream reaches sampled. Using the genomic scale set of SNP loci developed in this study, biologists can rapidly genotype sea lamprey in non-native and native ranges to investigate questions pertaining to population structuring and reproductive ecology at previously unattainable scales.

17.
Watermelon (Citrullus lanatus var. lanatus) is one of the most important vegetable crops in the world. Molecular markers have become the tools of choice for resolving watermelon taxonomic relationships and evolution. Increased numbers of single nucleotide polymorphism (SNP) markers together with simple sequence repeat (SSR) markers would be useful for phylogenetic analyses of germplasm accessions and for linkage mapping for marker-assisted breeding with quantitative trait loci and single genes. We aimed to construct a genetic map based on SNPs (generated by Illumina Veracode multiplex assays for genotyping) and SSR markers and evaluate relationships inferred from SNP genotypes between 130 watermelon accessions collected throughout the world. We incorporated 282 markers (232 SNPs and 50 SSRs) into the linkage map. The genetic map consisted of 11 linkage groups spanning 924.72 cM with an average distance of 3.28 cM between markers. Because all of the SNP-containing sequences were assembled with the whole-genome sequence draft for watermelon, chromosome numbers could be readily assigned for all the linkage groups. We found that 134 SNPs were polymorphic in 130 watermelon accessions chosen for diversity studies. The current 384-plex SNP set is a powerful tool for characterizing genetic relatedness and for developing medium-resolution genetic maps.

18.
We have developed a computer-based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertion/deletions identified using this approach represent true genetic variation in maize.
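The redundancy idea can be sketched in a few lines. This is a simplified illustration and not the published tool. It assumes reads that are already aligned to equal length, and it takes the redundancy score to be the number of reads carrying the second most common base in a column, which is one plausible reading of the abstract. The function name and the gap/N handling are assumptions.

```python
from collections import Counter


def candidate_snps(reads, min_redundancy=2):
    """Return (position, minor_allele, score) for each candidate SNP column.

    reads are equal-length aligned strings; anything other than A, C, G or T
    (for example '-' for a gap or 'N') is ignored.
    """
    hits = []
    for pos, column in enumerate(zip(*reads)):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic column
        _, (minor, score) = counts.most_common(2)
        # A variant seen in a single read is treated as a likely sequencing
        # error; one seen in several reads is kept as a candidate.
        if score >= min_redundancy:
            hits.append((pos, minor, score))
    return hits


reads = ["ATGTA", "ATGTA", "ATGTA", "ACGTA", "ACGTC"]
print(candidate_snps(reads))
```

In this toy alignment the C at position 1 appears in two reads and is reported, while the single C at position 4 is discarded. The cosegregation measure described in the abstract would additionally check whether the same reads share alleles across several candidate positions. It is not implemented here.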

19.

Background  

Whole genome association studies using highly dense single nucleotide polymorphisms (SNPs) are a set of methods to identify DNA markers associated with variation in a particular complex trait of interest. One of the main outcomes from these studies is a subset of statistically significant SNPs. Finding the potential biological functions of such SNPs can be an important step towards further use in human and agricultural populations (e.g., for identifying genes related to susceptibility to complex diseases or genes playing key roles in development or performance). The current challenge is that the information holding the clues to SNP functions is distributed across many different databases. Efficient bioinformatics tools are therefore needed to seamlessly integrate up-to-date functional information on SNPs. Many web services have arisen to meet the challenge but most work only within the framework of human medical research. Although we acknowledge the importance of human research, we see a need for SNP annotation tools for other organisms.

20.
The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralog loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a very variable number of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: a low threshold decreased the chances of detecting minor alleles while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs, suitable for use in population genetics applications.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号