Similar Articles
1.

Background  

Single nucleotide polymorphisms (SNPs) are the most common type of polymorphism in the human genome. Effective genetic association studies require identifying sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to data compression in information theory: in Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to a constraint on the number of SNPs. This approach requires an appropriate probabilistic model. Compared with simple measures of linkage disequilibrium (LD), a good model of haplotype sequences can account for LD structure more accurately. It also provides machinery for predicting the tagged SNPs, and thereby for assessing the performance of a tag set by its ability to predict a larger set of SNPs.
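Entropy-based tag selection can be sketched with a greedy heuristic. This is an illustration under simplifying assumptions, not the model-based method described above: it uses empirical haplotype frequencies in place of a probabilistic haplotype model, assumes phased haplotypes coded as strings of 0/1, and adds one SNP at a time rather than searching for the globally optimal set. Function names and data are hypothetical.

```python
from collections import Counter
from math import log2

def joint_entropy(haplotypes, columns):
    """Empirical Shannon entropy (bits) of the allele pattern at `columns`."""
    counts = Counter(tuple(h[c] for c in columns) for h in haplotypes)
    n = len(haplotypes)
    return -sum((k / n) * log2(k / n) for k in counts.values())

def greedy_entropy_tags(haplotypes, k):
    """Add, one at a time, the SNP that gives the largest joint entropy."""
    n_snps = len(haplotypes[0])
    tags = []
    for _ in range(min(k, n_snps)):
        candidates = [j for j in range(n_snps) if j not in tags]
        best = max(candidates, key=lambda j: joint_entropy(haplotypes, tags + [j]))
        tags.append(best)
    return tags

# Hypothetical phased haplotypes: one string per chromosome, one character per SNP.
haps = ["0011", "0101", "1001", "1111", "0011", "1100"]
tags = greedy_entropy_tags(haps, 2)
```

Greedy selection is not guaranteed to find the maximum-entropy set, but each step is cheap, which is why heuristics of this kind are common when the number of candidate SNPs is large.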

2.

Background

The most efficient method to maintain genetic diversity in populations under conservation programmes is to optimize, for each potential parent, the number of offspring left to the next generation by minimizing the global coancestry. Coancestry is usually calculated from genealogical data but molecular markers can be used to replace genealogical coancestry with molecular coancestry. Recent studies showed that optimizing contributions based on coancestry calculated from a large number of SNP markers can maintain higher levels of diversity than optimizing contributions based on genealogical data. In this study, we investigated how SNP density and effective population size impact the use of molecular coancestry to maintain diversity.
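The abstract does not give the formulas. As a sketch, assuming genotypes coded as 0/1/2 copies of one allele, molecular coancestry between two individuals can be taken as the probability that alleles drawn at random from each are identical by state, averaged over SNPs, and the coancestry of the next generation for contribution proportions c as c'Fc. Function names and toy genotypes are hypothetical; an actual optimization would minimize c'Fc under constraints such as c ≥ 0 and sum(c) = 1, which is not shown here.

```python
import numpy as np

def molecular_coancestry(genotypes):
    """Pairwise molecular coancestry matrix F for an (individuals x SNPs)
    matrix of allele counts (0, 1 or 2 copies of one allele).
    F[x, y] = mean over SNPs of P(an allele drawn from x is identical by state
    to an allele drawn from y)."""
    p = np.asarray(genotypes, dtype=float) / 2.0   # within-individual allele frequency
    q = 1.0 - p
    return (p @ p.T + q @ q.T) / p.shape[1]

def global_coancestry(contributions, coancestry):
    """Expected coancestry of the next generation, c' F c, for proportions c."""
    c = np.asarray(contributions, dtype=float)
    return float(c @ coancestry @ c)

# Hypothetical toy data: individuals 0 and 1 are identical, 2 is opposite homozygous.
G = [[0, 2, 1],
     [0, 2, 1],
     [2, 0, 1]]
F = molecular_coancestry(G)
equal = global_coancestry([1/3, 1/3, 1/3], F)
avoid_relatives = global_coancestry([0.5, 0.0, 0.5], F)
```

In this toy case, using only one of the two identical individuals gives a lower global coancestry than equal contributions, which is the behaviour the optimization exploits.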

Results

At low SNP densities, the genetic diversity maintained using genealogical coancestry for optimization was higher than that maintained using molecular coancestry. The performance of molecular coancestry improved with increasing marker density and, for the scenarios evaluated, it was as efficient as genealogical coancestry once SNP density reached at least 3 times the effective population size. However, increasing SNP density yielded diminishing returns in maintained diversity: a benefit of 12% was achieved when marker density increased from 10 to 100 SNP/Morgan, but only 2% when it increased from 100 to 500 SNP/Morgan.

Conclusions

The marker density of most SNP chips already available for farm animals is sufficient for molecular coancestry to outperform genealogical coancestry in conservation programmes aimed at maintaining genetic diversity. For effectively maintaining genetic diversity, a marker density of around 500 SNPs/Morgan can be considered the most cost-effective density when developing SNP chips for new species. Since the cost of developing SNP chips is decreasing, chips with 500 SNPs/Morgan should become available for non-domestic species in the short term.

3.

Background  

In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be co-inherited. It is therefore possible to choose a much smaller number of SNPs as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. To reduce cost and effort, a minimum set of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, and exact brute-force algorithms are not feasible for large data sets.
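Because the exact problem is NP-complete, practical tools fall back on heuristics. A minimal greedy sketch, assuming haplotypes are given as equal-length 0/1 strings (the function name and data are hypothetical), repeatedly adds the SNP that separates the most still-indistinguishable haplotype pairs. It is a generic set-cover style heuristic, not the authors' algorithm.

```python
def greedy_index_snps(haplotypes):
    """Greedily pick SNP columns until every pair of distinct haplotypes
    differs at one or more chosen columns. Heuristic only: the result
    may be larger than the true minimum."""
    distinct = sorted(set(haplotypes))
    # Pairs of haplotypes that still look identical at the chosen SNPs.
    unresolved = {(a, b) for i, a in enumerate(distinct) for b in distinct[i + 1:]}
    n_snps = len(distinct[0]) if distinct else 0
    chosen = []
    while unresolved:
        # Column separating the most unresolved pairs; always a new column,
        # because already chosen columns separate none of the remaining pairs.
        best = max(range(n_snps),
                   key=lambda j: sum(a[j] != b[j] for a, b in unresolved))
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved if a[best] == b[best]}
    return sorted(chosen)

haps = ["0000", "0011", "1100", "1111", "0011"]   # hypothetical haplotypes
index = greedy_index_snps(haps)
```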

4.

Background  

Single nucleotide polymorphisms (SNPs) are important tools for studying complex genetic traits and genome evolution. Computational strategies for SNP discovery exploit the large number of sequences in public databases (in most cases expressed sequence tags, ESTs) and are considered faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For most public EST sequences, trace or quality files are lacking, which makes reliable SNP detection even more difficult because it must rely on sequence comparison alone.
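Without quality files, one common-sense safeguard is redundancy: trust a variant only if it is seen in more than one read. This is a simplified sketch, not a published pipeline. It assumes the ESTs of one cluster are already aligned to equal-length strings (with '-' or 'N' ignored), and the minimum minor-allele count of 2 is a hypothetical parameter. It does not separate true alleles from paralogous differences, which needs additional evidence.

```python
from collections import Counter

def candidate_snps(aligned_ests, min_minor_count=2):
    """Report alignment columns where a second base appears in at least
    `min_minor_count` reads, as (position, major, minor, minor_count)."""
    snps = []
    for pos in range(len(aligned_ests[0])):
        bases = Counter(s[pos] for s in aligned_ests if s[pos] in "ACGT")
        if len(bases) < 2:
            continue                      # monomorphic or gap-only column
        (major, _), (minor, minor_count) = bases.most_common(2)
        # A minor base seen only once is more likely a sequencing error.
        if minor_count >= min_minor_count:
            snps.append((pos, major, minor, minor_count))
    return snps

ests = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGTCC", "ACGTAC"]   # hypothetical alignment
found = candidate_snps(ests)
```

Here column 4 (a single C among A's) is discarded as a probable error, while column 1 (C/T, with T in two reads) is kept.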

5.

Background

Ziziphus Mill. (jujube), the most valued genus of Rhamnaceae, comprises a number of economically and ecologically important species such as Z. jujuba Mill., Z. acidojujuba Cheng et Liu and Z. mauritiana Lam. Single nucleotide polymorphism (SNP) markers and a high-density genetic map are of great benefit for crop improvement, mapping quantitative trait loci (QTL) and analyzing genome structure. However, no such high-density map yet exists for the genus Ziziphus, or even for the family Rhamnaceae. The recently developed restriction-site associated DNA (RAD) marker has proven highly effective for genetic map construction. The objective of this study was to construct a high-density linkage map using RAD tags generated by next-generation sequencing.

Results

An interspecific F1 population and its parents (Z. jujuba Mill. ‘JMS2’ × Z. acidojujuba Cheng et Liu ‘Xing 16’) were genotyped using a mapping-by-sequencing approach to generate RAD-based SNP markers. A total of 42,784 putative high-quality SNPs were identified between the parents, and 2,872 high-quality RAD markers were assigned to the genetic maps. Of these 2,872 RAD markers, 1,307 were placed on the female genetic map, 1,336 on the male map, and 2,748 on the integrated map, which spans 913.87 centimorgans (cM) with an average marker interval of 0.34 cM. The integrated map contains 12 linkage groups (LGs), consistent with the haploid chromosome number of the two parents.

Conclusion

We generated the first high-density genetic linkage map for jujube, with 2,748 RAD markers, and also developed a large number of SNPs. The map provides a useful tool for both marker-assisted breeding and a variety of genome investigations in jujube, such as sequence assembly, gene localization, QTL detection and genome structure comparison.

6.

Background

Restriction-site associated DNA sequencing (RADseq) was recently used to identify large numbers of single nucleotide polymorphisms (SNPs) for linkage mapping of North American and East Asian Populus species. However, high-density genetic linkage maps are also needed for the European aspen (P. tremula) as a tool for further mapping of quantitative trait loci (QTLs) and for marker-assisted selection in Populus species native to Europe.

Results

We established a hybrid F1 population by crossing two aspen parental genotypes that differ in phenological and morphological traits. RADseq of 122 F1 progeny and the two parents yielded 15,732 high-quality SNPs, identified using the reference genome of P. trichocarpa. A total of 2,055 SNPs were used to construct maternal and paternal linkage maps. The maternal linkage map was assembled from 1,000 SNPs in 19 linkage groups spanning 3,054.9 cM, with an average distance of 3.05 cM between adjacent markers. The paternal map consisted of 1,055 SNPs in the same number of linkage groups, with a total length of 3,090.56 cM and an average interval of 2.93 cM. The linkage maps were used for QTL mapping of height variation in one-year-old seedlings. The most significant QTL (LOD = 5.73) was localized to LG5 (96.94 cM) of the male linkage map and explained 18% of the phenotypic variation.
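The abstract reports a LOD score and the percentage of phenotypic variation explained but does not state the mapping model. As a hedged illustration only, the simplest version is single-marker regression: compare a model with one mean per marker genotype class against a model with one overall mean. With normally distributed residuals, LOD = (n/2)·log10(RSS0/RSS1) and the proportion of variance explained is 1 - RSS1/RSS0. Interval mapping, which scans between markers, is not shown. Names and data are hypothetical.

```python
import numpy as np

def single_marker_lod(genotype, phenotype):
    """LOD score and proportion of phenotypic variance explained (PVE) for a
    one-marker regression of phenotype on marker genotype class."""
    g = np.asarray(genotype)
    y = np.asarray(phenotype, dtype=float)
    rss0 = np.sum((y - y.mean()) ** 2)          # null model: one overall mean
    fitted = np.empty_like(y)
    for cls in np.unique(g):                    # full model: one mean per genotype class
        fitted[g == cls] = y[g == cls].mean()
    rss1 = np.sum((y - fitted) ** 2)
    lod = len(y) / 2 * np.log10(rss0 / rss1)
    return lod, 1 - rss1 / rss0

# Hypothetical pseudo-testcross marker (0/1) and seedling heights.
lod, pve = single_marker_lod([0, 0, 0, 1, 1, 1], [10, 11, 12, 14, 15, 16])
```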

Conclusions

The set of 15,732 SNPs polymorphic in aspen and high-density genetic linkage maps constructed for the P. tremula intra-specific cross will provide a valuable source for QTL mapping and identification of candidate genes facilitating marker-assisted selection in European aspen.

7.

Key message

A new time- and cost-effective strategy was developed for medium-density SNP genotyping of rice biparental populations, using GoldenGate assays based on parental resequencing.

Abstract

Since the advent of molecular markers, crop researchers and breeders have devoted enormous effort to detecting quantitative trait loci (QTL) in biparental populations for genetic analysis and marker-assisted selection (MAS). In this study, we developed a new time- and cost-effective strategy for genotyping a population of progeny from a rice cross using medium-density single nucleotide polymorphisms (SNPs). Using this strategy, 728,362 “high quality” SNPs were identified by resequencing Teqing and Lemont, the parents of the population. We selected 384 informative SNPs evenly distributed across the genome and genotyped the biparental population with the Illumina GoldenGate assay. Of these, 335 (87.2%) were validated and used for further genetic analyses. After removal of markers showing segregation distortion, 321 SNPs were used for linkage map construction and QTL mapping. This strategy produced SNP markers more evenly distributed across the genome than previous SSR assays. Taking the grain-shape gene GW5 as an example, our strategy gave higher accuracy (0.8 Mb) and significance (LOD 5.5 and 10.1) in QTL mapping than SSR analysis. Our study thus provides a rapid and efficient strategy for genetic studies and QTL mapping using SNP genotyping assays.
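Choosing SNPs evenly across the genome can be sketched per chromosome: place evenly spaced target positions and take the candidate SNP nearest each target. This assumes the candidate list already contains only informative (parent-polymorphic) SNPs sorted by position; how the 384 markers were allocated among chromosomes is not stated, and the function name and positions are hypothetical.

```python
import bisect

def pick_evenly_spaced(positions, n_markers):
    """From sorted SNP positions on one chromosome, take the SNP closest to
    each of n_markers evenly spaced targets (duplicates are dropped)."""
    if n_markers <= 0 or not positions:
        return []
    start, end = positions[0], positions[-1]
    step = (end - start) / max(n_markers - 1, 1)
    chosen = []
    for i in range(n_markers):
        target = start + i * step
        k = bisect.bisect_left(positions, target)
        # The nearest SNP is one of the neighbours of the insertion point.
        neighbours = [positions[j] for j in (k - 1, k) if 0 <= j < len(positions)]
        best = min(neighbours, key=lambda p: abs(p - target))
        if best not in chosen:
            chosen.append(best)
    return chosen

snp_positions = [0, 5, 9, 20, 31, 40, 41, 60, 79, 100]   # hypothetical, e.g. in kb
selected = pick_evenly_spaced(snp_positions, 5)
```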

8.
Dong C, Qian Z, Jia P, Wang Y, Huang W, Li Y. PLoS ONE, 2007, 2(12): e1262

Background

High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies aimed at identifying novel disease-susceptibility single nucleotide polymorphisms (SNPs). High-density chips are designed using two different SNP selection approaches: the direct, gene-centric approach, and the indirect approaches based on quasi-random SNPs or linkage disequilibrium (LD)-based tag SNPs. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent they capture common genic variants. It is also important to characterize and compare the differences between these approaches.

Methodology/Principal Findings

Using both the Phase II HapMap data and disease variants extracted from OMIM, we first performed a gene-centric evaluation of the ability of these approaches to capture disease variants in the Caucasian population. We then characterized the distribution patterns of SNPs in genic regions, evolutionarily conserved introns and nongenic regions, and across ontologies and pathways. The results show that, no matter which SNP selection approach is used, current high-density SNP chips provide very high coverage of genic regions and can capture most known common disease variants within the HapMap framework. The results also show that the differences between the direct and indirect approaches are relatively small; both have similar SNP distribution patterns with respect to these gene-centric characteristics.
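Coverage of a target variant by a chip is commonly assessed through LD: a variant counts as captured if at least one chip SNP has r² at or above a threshold with it. This sketch assumes phased 0/1 haplotypes (rows = haplotypes, columns = SNPs) and the commonly used r² ≥ 0.8 cut-off; the study's exact criterion, the data and the function names are assumptions here.

```python
import numpy as np

def r_squared(a, b):
    """r^2 between two biallelic SNPs from 0/1-coded phased haplotypes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.mean(a * b) - a.mean() * b.mean()     # D = p_AB - p_A * p_B
    denom = a.mean() * (1 - a.mean()) * b.mean() * (1 - b.mean())
    return d * d / denom if denom > 0 else 0.0

def coverage(haplotypes, chip_snps, target_snps, threshold=0.8):
    """Fraction of target SNPs with r^2 >= threshold to one or more chip SNPs."""
    h = np.asarray(haplotypes)
    captured = sum(
        any(r_squared(h[:, t], h[:, c]) >= threshold for c in chip_snps)
        for t in target_snps)
    return captured / len(target_snps)

# Hypothetical haplotypes: SNP 1 is in perfect LD with chip SNP 0, SNP 2 is not.
h = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]
cov = coverage(h, chip_snps=[0], target_snps=[1, 2])
```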

Conclusions/Significance

This study suggests that the indirect approaches not only have the advantage of high coverage but also are useful for studies focusing on various functional SNPs either in genes or in the conserved regions that the direct approach supports. The study and the annotation of characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.

9.

Background  

Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses. However, good indices for assessing data quality of SNP arrays have not yet been developed.
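For context, one conventional and widely used quality metric is the call rate, the fraction of successful genotype calls per sample or per SNP. It is not the index the authors set out to develop; it only shows the kind of per-array summary such indices build on. A minimal sketch with hypothetical data and a hypothetical 95% cut-off:

```python
import numpy as np

def call_rates(calls, missing=-1):
    """Per-sample and per-SNP call rates for a (samples x SNPs) genotype
    matrix in which `missing` marks a failed call."""
    called = np.asarray(calls) != missing
    return called.mean(axis=1), called.mean(axis=0)

# Hypothetical genotype calls: 0/1/2 allele counts, -1 = no call.
calls = [[0, 1, 2, -1],
         [1, 1, -1, -1],
         [2, 0, 1, 0]]
sample_cr, snp_cr = call_rates(calls)
low_quality_samples = np.where(sample_cr < 0.95)[0]   # hypothetical cut-off
```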

10.

Background

There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C).

Methodology/Principal Findings

A 384-SNP GoldenGate genotyping array was built from (i) 184 SNPs originally detected in a set of 40 resequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium, and (ii) 200 SNPs screened from ESTs (in silico SNPs), selected on the basis of the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of the SNP flanking sequences. The overall success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). Reproducibility was 100%, and the genotyping error rate was very low (0.54%, falling to 0.06% after removing four SNPs with elevated error rates).

Conclusions/Significance

This study demonstrates that ESTs provide a resource for SNP identification in non-model species that requires no additional bench work and little bioinformatic analysis. However, the time and cost benefits of in silico SNPs are offset by a lower conversion rate than that of in vitro SNPs. This drawback is acceptable for population-based experiments but could be serious in experiments involving samples from narrow genetic backgrounds. In addition, we showed that both visual inspection of genotyping clusters and estimation of a per-SNP error rate should help identify markers that are unsuitable for the GoldenGate technology in species with a large and complex genome.
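A per-SNP error rate can be estimated from replicated samples as the fraction of discordant genotype calls. This sketch assumes two genotype matrices (samples × SNPs, allele counts, -1 for a failed call) for the same replicated samples; the flagging threshold and the data are hypothetical, and the study's exact error-rate definition is not given in the abstract.

```python
import numpy as np

def replicate_error_rates(rep1, rep2, missing=-1):
    """Per-SNP and overall discordance between replicate genotype matrices.
    Pairs in which either call is missing are not compared."""
    a, b = np.asarray(rep1), np.asarray(rep2)
    compared = (a != missing) & (b != missing)
    discordant = compared & (a != b)
    per_snp = discordant.sum(axis=0) / np.maximum(compared.sum(axis=0), 1)
    overall = discordant.sum() / compared.sum()
    return per_snp, overall

# Hypothetical replicate genotypes for four samples at three SNPs.
rep1 = [[0, 1, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1]]
rep2 = [[0, 1, 2], [1, 2, 0], [2, 2, -1], [0, 1, 1]]
per_snp, overall = replicate_error_rates(rep1, rep2)
flagged = np.where(per_snp > 0.05)[0]   # hypothetical threshold for unreliable SNPs
```

Reporting the overall rate both with and without the flagged SNPs mirrors the way the abstract quotes 0.54% and 0.06%.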

11.

Background  

We compared the complete genomes of 38 SARS-CoV isolates. The goal was twofold: first, to analyze and compare the nucleotide sequences and identify the positions of single nucleotide polymorphisms (SNPs), insertions and deletions; and second, to group the isolates by sequence similarity, ultimately pointing to the phylogeny of SARS-CoV isolates. The comparison is based on genome polymorphisms such as insertions and deletions, and on the number and positions of SNPs.
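Locating SNPs and indels in aligned genomes reduces to a column-by-column comparison against a reference isolate. This sketch assumes the genomes have already been aligned to equal length with '-' for gaps; positions are 0-based alignment columns, not genome coordinates, and the sequences and function names are hypothetical. A matrix of pairwise SNP distances could then feed a clustering or tree-building method.

```python
def compare_to_reference(reference, isolate):
    """List differences between two pre-aligned, equal-length sequences.
    '-' marks a gap; columns where both sequences have a gap are ignored."""
    snps, insertions, deletions = [], [], []
    for pos, (r, s) in enumerate(zip(reference, isolate)):
        if r == s:
            continue
        if r == "-":
            insertions.append(pos)       # base present in the isolate only
        elif s == "-":
            deletions.append(pos)        # base missing from the isolate
        else:
            snps.append((pos, r, s))     # substitution
    return snps, insertions, deletions

def snp_distance(a, b):
    """Number of aligned columns where both sequences have a base and they differ."""
    return sum(x != y and x != "-" and y != "-" for x, y in zip(a, b))

ref = "ATGC-GTA"   # hypothetical aligned fragments
iso = "ATCCAG-A"
diffs = compare_to_reference(ref, iso)
```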

12.
13.

Background

Single nucleotide polymorphisms (SNPs) have been used extensively in genetic and epidemiological studies. Traditionally, SNPs that failed the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have examined possible causes of departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variation in the human genome, including copy number variations (CNVs). This suggests that a considerable number of SNPs lie within these regions, which may cause deviation from HWE.
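The HWE filter mentioned above is typically a chi-square goodness-of-fit test comparing observed genotype counts with the expected proportions p², 2pq and q². A minimal sketch using only the standard library (for 1 degree of freedom the upper-tail probability equals erfc(√(x/2))); the counts are hypothetical, and for SNPs with a small minor allele frequency an exact test is generally preferred because expected counts are small.

```python
from math import erfc, sqrt

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg proportions at a biallelic SNP
    (1 degree of freedom). Returns (statistic, p_value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)                     # frequency of allele A
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

in_hwe = hwe_chi_square(25, 50, 25)            # hypothetical counts matching HWE
het_deficit = hwe_chi_square(40, 20, 40)       # hypothetical heterozygote deficit
```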

Results

We performed a Bayesian analysis of the potential effects of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major cause of HWE violation for SNPs with a small minor allele frequency when the sample size is large and the genotyping error rate is 0 to 1%.

Conclusions

Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.

14.

Background  

Single Nucleotide Polymorphisms (SNPs) are an increasingly important tool for genetic and biomedical research. Although current genomic databases contain information on several million SNPs and are growing at a very fast rate, the true value of a SNP in this context is a function of the quality of the annotations that characterize it. Retrieving and analyzing such data for a large number of SNPs often represents a major bottleneck in the design of large-scale association studies.

15.
A set of EST-SNPs for map saturation and cultivar identification in melon

Background

Few genomic tools are available for melon (Cucumis melo L.), a member of the Cucurbitaceae, despite its importance as a crop. Among these tools, genetic maps have been constructed in different mapping populations mainly using marker types such as simple sequence repeats (SSRs), restriction fragment length polymorphisms (RFLPs) and amplified fragment length polymorphisms (AFLPs). There is a growing need to saturate the genetic map with single nucleotide polymorphisms (SNPs), which are more amenable to high-throughput analysis, especially if these markers lie in gene coding regions and can therefore serve as functional markers. Melon expressed sequence tags (ESTs) are available in public databases, and resequencing ESTs or validating SNPs detected in silico are excellent ways to discover SNPs.

Results

EST-based SNPs were discovered either by resequencing ESTs between the parental lines of the PI 161375 (SC) × 'Piel de sapo' (PS) genetic map or by using in silico SNP information from EST databases. In total, 200 EST-based SNPs were mapped on the melon genetic map using a bin-mapping strategy, increasing the map density to 2.35 cM/marker. A subset of 45 SNPs was used to study variation in a panel of 48 melon accessions covering a wide range of the species' genetic diversity. The SNP analysis reflected genetic relationships consistently with other marker systems and distinguished all of the accessions and cultivars.
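Distinguishing accessions comes down to checking that every pair has a different multilocus SNP profile. A minimal sketch, assuming genotypes are stored per accession as lists of strings with None for missing data (accession names and genotypes are hypothetical). The proportion of differing loci can also serve as a simple distance for clustering.

```python
from itertools import combinations

def pairwise_differences(profiles):
    """Proportion of loci at which each pair of accessions differs,
    ignoring loci where either genotype is missing (None)."""
    result = {}
    for (a, ga), (b, gb) in combinations(profiles.items(), 2):
        scored = [(x, y) for x, y in zip(ga, gb) if x is not None and y is not None]
        result[(a, b)] = (sum(x != y for x, y in scored) / len(scored)
                          if scored else float("nan"))
    return result

def all_distinguishable(profiles):
    """True if every pair of accessions differs at one or more scored loci
    (a pair with no scored loci counts as not distinguishable)."""
    return all(d > 0 for d in pairwise_differences(profiles).values())

profiles = {
    "acc1": ["AA", "AG", "GG"],
    "acc2": ["AA", "AA", "GG"],
    "acc3": ["GG", "AG", None],
}
dist = pairwise_differences(profiles)
```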

Conclusion

This is the first example of a genetic map in a cucurbit species that includes a major set of SNP markers discovered using ESTs. The PI 161375 × 'Piel de sapo' melon genetic map has around 700 markers, of which more than 500 are gene-based markers (SNP, RFLP and SSR). This genetic map will be a central tool for the construction of the melon physical map, the step prior to sequencing the complete genome. Using the set of SNP markers, it was possible to define the genetic relationships within a collection of forty-eight melon accessions as efficiently as with SSR markers, and these markers may also be useful for cultivar identification in Occidental melon varieties.

16.

Background

Recently developed high-resolution single nucleotide polymorphism (SNP) arrays allow detailed assessment of genome-wide human genetic variation. The importance of SNPs for medicine and developmental biology is increasingly recognized. However, a SNP data set typically contains a very large number of SNPs (e.g., 400,000 in a genome-wide Parkinson disease data set) but only a few hundred samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data.

Results

In this paper, we use a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems. Examples using HapMap data and Parkinson disease (PD) data demonstrate the effectiveness of the proposed method and illustrate its potential as a useful analysis tool for SNP data sets. Using the PD data as an example, we performed a whole-genome analysis. Of the 367,440 SNPs with less than 1% missing data across all 22 chromosomes, we selected 357. For the unique genes in which these SNPs are located, gene-gene similarity values were computed with GOSemSim, and gene pairs with a similarity above a threshold were selected to form several groups of genes. For the SNPs in these gene groups, the statistical software PLINK was used to compute pairwise SNP-SNP interactions, and SNPs with P < 0.01 were used to identify SNP networks based on their P values. Because the SNP networks are built on Gene Ontology knowledge, each network can be associated with a biological process. Further analysis shows that these networks are related, directly or indirectly, to Parkinson disease.
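The network-building step can be sketched as a graph problem, assuming the pairwise interaction P values (computed with PLINK in the study) are already available as a dictionary keyed by SNP pairs: link SNPs whose P value is below 0.01 and report the connected components. The SNP IDs and P values are hypothetical, and the Gene Ontology-based gene grouping that precedes this step is not shown.

```python
from collections import defaultdict

def snp_networks(pair_pvalues, alpha=0.01):
    """Connect two SNPs when their interaction P value is below alpha and
    return the connected components (each with two or more SNPs)."""
    graph = defaultdict(set)
    for (a, b), p in pair_pvalues.items():
        if p < alpha:
            graph[a].add(b)
            graph[b].add(a)
    seen, networks = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                         # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        networks.append(sorted(component))
    return networks

pvals = {("rs1", "rs2"): 0.001, ("rs2", "rs3"): 0.005,
         ("rs4", "rs5"): 0.002, ("rs1", "rs4"): 0.2}   # hypothetical
nets = snp_networks(pvals)
```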

Conclusions

Experimental results show that our approach is suitable for handling genetic variation and provides useful knowledge in genome-wide SNP studies.

17.

Background

The identification of disease-associated genes using single nucleotide polymorphisms (SNPs) has been increasingly reported. In particular, the Affymetrix Mapping 10 K SNP microarray platform uses one PCR primer to amplify DNA samples and determines the genotypes of more than 10,000 SNPs in the human genome. This enables large-scale, rapid and cost-effective genotyping for linkage analysis. However, analyzing such data sets is nontrivial because of the large number of markers, and visualizing linkage scores in the context of genome maps is still poorly automated in current linkage analysis software packages. For example, haplotyping results are commonly presented in plain-text format.

Results

Here we report the development of a novel software tool called CompareLinkage for automated formatting of the Affymetrix Mapping 10 K genotype data into the "Linkage" format and the subsequent analysis with multi-point linkage software programs such as Merlin and Allegro. The new software has the ability to visualize the results for all these programs in dChip in the context of genome annotations and cytoband information. In addition we implemented a variant of the Lander-Green algorithm in the dChipLinkage module of dChip software (V1.3) to perform parametric linkage analysis and haplotyping of SNP array data. These functions are integrated with the existing modules of dChip to visualize SNP genotype data together with LOD score curves. We have analyzed three families with recessive and dominant diseases using the new software programs and the comparison results are presented and discussed.
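Parametric linkage analysis rests on the LOD score. The Lander-Green algorithm computes multipoint likelihoods with a hidden Markov model, which is beyond a short example; this sketch shows only the classic two-point LOD for phase-known meioses with R recombinants and NR non-recombinants, LOD(θ) = log10[θ^R (1 - θ)^NR / 0.5^(R+NR)]. It is not the dChipLinkage implementation, and the function name and counts are hypothetical.

```python
from math import log10

def two_point_lod(recombinants, nonrecombinants, theta):
    """Two-point LOD score at recombination fraction theta (0 <= theta < 0.5)
    against the null hypothesis of free recombination (theta = 0.5)."""
    n = recombinants + nonrecombinants
    if theta == 0:
        if recombinants:
            return float("-inf")         # any recombinant is impossible at theta = 0
        return n * log10(2)
    return (recombinants * log10(theta)
            + nonrecombinants * log10(1 - theta)
            - n * log10(0.5))

# Scan a grid of theta values for hypothetical counts: 2 recombinants in 20 meioses.
grid = [i / 100 for i in range(50)]
best_theta = max(grid, key=lambda t: two_point_lod(2, 18, t))
```

The maximum lies at θ = R/(R + NR), here 0.1, as expected for a binomial likelihood.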

Conclusions

The CompareLinkage and dChipLinkage software packages are freely available. They provide the visualization tools for high-density oligonucleotide SNP array data, as well as the automated functions for formatting SNP array data for the linkage analysis programs Merlin and Allegro and calling these programs for linkage analysis. The results can be visualized in dChip in the context of genes and cytobands. In addition, a variant of the Lander-Green algorithm is provided that allows parametric linkage analysis and haplotyping.

18.

Background

We analyzed 143 pedigrees (364 nuclear families) from the Collaborative Study on the Genetics of Alcoholism (COGA) data provided to participants in Genetic Analysis Workshop 14 (GAW14), with the goal of comparing genome-wide linkage results obtained with microsatellite markers against those obtained with SNP markers for two measures of alcoholism: maximum number of drinks (MAXDRINK) and an electrophysiological measure from EEG (TTTH1). First, we constructed haplotype blocks using the full set of single-nucleotide polymorphisms (SNPs) on chromosomes 1, 4, and 7, which had shown linkage signals for MAXDRINK or EEG-TTTH1 in previous reports. Second, we randomly selected one, two, three, four, and five SNPs from each block (referred to as Rep1 to Rep5, respectively) and conducted linkage analysis using a variance-component approach. Finally, the results of all SNP analyses were compared with those obtained using microsatellite markers.
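The Rep1 to Rep5 marker sets can be sketched as random sampling within each haplotype block. The abstract does not say what was done for blocks containing fewer SNPs than requested; this sketch assumes the whole block is kept. SNP identifiers and block membership are hypothetical, and the seed only makes the example reproducible.

```python
import random

def sample_per_block(blocks, k, seed=None):
    """Randomly choose k SNPs from each haplotype block (all SNPs if the
    block has fewer than k) and return them block by block."""
    rng = random.Random(seed)
    chosen = []
    for block in blocks:
        chosen.extend(sorted(rng.sample(block, min(k, len(block)))))
    return chosen

blocks = [["rs1", "rs2", "rs3"], ["rs4"], ["rs5", "rs6", "rs7", "rs8"]]
rep_sets = {f"Rep{k}": sample_per_block(blocks, k, seed=k) for k in range(1, 6)}
```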

Results

The LOD scores obtained from SNPs were slightly higher, but the curves were not radically different from those obtained from microsatellite analyses. The linkage peaks from the SNP sets were shifted slightly to the left relative to those from microsatellite markers. The reduced SNP sets gave signals in the same linkage regions but with smaller LOD scores, suggesting that the loss of information content has a substantial impact on linkage results. The 1-LOD support intervals of linkage regions from SNP sets were narrower than those from microsatellite markers. However, two linkage regions detected by the microsatellite analysis on chromosome 7 for the log of TTTH1 were not detected in the SNP-based analyses.
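A 1-LOD support interval is the region around a linkage peak where the LOD score stays within one unit of its maximum. A minimal sketch on a LOD curve evaluated at discrete positions (hypothetical values; no interpolation between grid points, which real software may do):

```python
def one_lod_interval(positions, lods, drop=1.0):
    """Return (left, right, peak) positions of the contiguous run around the
    highest LOD score whose values stay within `drop` of that maximum.
    `positions` must be sorted and aligned with `lods`."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right], positions[peak]

cm = [0, 10, 20, 30, 40, 50, 60]                 # hypothetical map positions (cM)
lod = [0.5, 1.2, 2.4, 3.1, 2.6, 1.9, 0.7]        # hypothetical LOD curve
interval = one_lod_interval(cm, lod)
```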

Conclusion

The linkage results from SNPs showed narrower linkage regions and slightly higher LOD scores than those from microsatellite markers. The different builds of the genetic maps used for microsatellite and SNP markers, and/or genotyping errors, may account for the microsatellite linkage signals on chromosome 7 that were not detected with SNPs. Unresolved map discrepancies between SNP and microsatellite markers may also be partly responsible for the shifted linkage peaks when the two marker types are compared.

19.
Although numerous linkage maps have been constructed in the genus Populus, they are typically sparse and thus of limited use, owing to the low throughput of traditional molecular markers. Restriction-site associated DNA sequencing (RADSeq) makes it possible to identify large numbers of single nucleotide polymorphisms (SNPs) across the genomes of many individuals quickly and cost-effectively, and hence to construct high-density genetic linkage maps. We performed RADSeq on 299 progeny and their two parents from an F1 hybrid population generated by crossing female Populus deltoides ‘I-69’ with male Populus simonii ‘L3’. A total of 2,545 high-quality SNP markers were obtained and two parent-specific linkage maps were constructed. The female map contained 1,601 SNPs in 20 linkage groups, spanning 4,249.12 cM with an average distance of 2.69 cM between adjacent markers, while the male map consisted of 940 SNPs, also in 20 linkage groups, with a total length of 3,816.24 cM and an average marker interval of 4.15 cM. Finally, our analysis revealed that synteny and collinearity are highly conserved between the parental linkage maps and the reference genome of P. trichocarpa. We demonstrated that RAD sequencing is a powerful technique for rapidly generating large numbers of SNPs for genetic map construction in outbred forest trees. The high-quality linkage maps constructed here provide reliable genetic resources for locating quantitative trait loci (QTLs) that control growth and wood quality traits in this hybrid population.
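The reported average interval depends on how intervals are counted. Assuming each linkage group with m markers contributes m - 1 adjacent-marker intervals, the mean interval is the total map length divided by (markers - linkage groups). This convention reproduces the 2.69 cM and 4.15 cM values above from the reported totals; whether the authors computed it this way is not stated, and the function name is hypothetical.

```python
def average_interval(total_length_cm, n_markers, n_linkage_groups):
    """Mean distance between adjacent markers: each linkage group with m
    markers has m - 1 intervals, giving n_markers - n_linkage_groups in total."""
    return total_length_cm / (n_markers - n_linkage_groups)

print(round(average_interval(4249.12, 1601, 20), 2))   # female map: 2.69
print(round(average_interval(3816.24, 940, 20), 2))    # male map: 4.15
```

Dividing by the number of markers instead (one interval per marker) would give slightly smaller values, so it is worth stating the convention when comparing maps.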

20.

Background  

The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs are important for understanding the association between genetic variation and human disease. Many SNPs have correlated genotypes, i.e., they are in linkage disequilibrium (LD), so it is not necessary to genotype every SNP in an association study. Many algorithms have been developed to find a small subset of SNPs, called tag SNPs, that suffices to infer all the other SNPs. Algorithms based on the r² LD statistic have become popular because r² is directly related to the statistical power to detect disease associations. Most existing r²-based algorithms use pairwise LD. Recent studies show that multi-marker LD can further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time- and memory-intensive; they cannot handle chromosomes containing more than 100,000 SNPs when length-3 tagging rules are used.
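Pairwise r²-based tagging, the baseline that multi-marker methods improve on, can be sketched as a greedy cover: repeatedly choose the SNP that tags the most still-untagged SNPs at r² ≥ threshold. This is a generic illustration assuming phased 0/1 haplotypes and a hypothetical 0.8 threshold; it does not implement the multi-marker (length-3) rules discussed above.

```python
import numpy as np

def r_squared_matrix(haplotypes):
    """Pairwise r^2 between SNP columns of a 0/1 haplotype matrix (rows = haplotypes)."""
    h = np.asarray(haplotypes, dtype=float)
    p = h.mean(axis=0)
    d = h.T @ h / h.shape[0] - np.outer(p, p)      # D_ij = p_ij - p_i * p_j
    v = p * (1 - p)
    denom = np.outer(v, v)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, d * d / denom, 0.0)   # monomorphic SNPs get 0

def greedy_pairwise_tags(haplotypes, threshold=0.8):
    """Repeatedly pick the SNP that tags the most untagged SNPs (r^2 >= threshold)."""
    r2 = r_squared_matrix(haplotypes)
    untagged = set(range(r2.shape[0]))
    tags = []
    while untagged:
        best = max(untagged, key=lambda j: sum(r2[j, k] >= threshold for k in untagged))
        tags.append(best)
        untagged -= {k for k in untagged if r2[best, k] >= threshold}
        untagged.discard(best)                     # guarantees progress
    return sorted(tags)

# Hypothetical haplotypes: SNPs 0 and 1 are perfectly correlated; 2 and 3 are independent.
h = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1]]
tags = greedy_pairwise_tags(h)
```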
