Similar articles
 20 similar articles found (search time: 31 ms)
1.
We have developed a computer-based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertion/deletions identified using this approach represent true genetic variation in maize.

2.
3.
AutoSNP is a program to detect single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) in expressed sequence tag (EST) data. The program uses d2cluster and cap3 to cluster and align EST sequences, and uses redundancy to differentiate between candidate SNPs and sequence errors. Candidate polymorphisms are identified as occurring in multiple reads within an alignment. For each candidate SNP, two measures of confidence are calculated: the redundancy of the polymorphism at a SNP locus and the cosegregation of the candidate SNP with other SNPs in the alignment. AVAILABILITY: The program was written in Perl and is freely available to non-commercial users by request from the authors.
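
The redundancy idea in the abstract above can be illustrated with a minimal sketch. This is not the AutoSNP code (which is available only from its authors); the function name `candidate_snps`, the `min_redundancy` parameter, and the rule that at least two alleles must each be seen in that many reads are assumptions made here for illustration. Reads are assumed to be already aligned, with `-` marking gaps.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_redundancy=2):
    """Return {column: {allele: count}} for alignment columns in which at
    least two different bases are each seen in >= min_redundancy reads.

    Hypothetical helper: a base seen only once is treated as a possible
    sequencing error and does not support a candidate SNP.
    """
    candidates = {}
    length = max(len(read) for read in aligned_reads)
    for col in range(length):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(
            read[col]
            for read in aligned_reads
            if col < len(read) and read[col] in "ACGT"
        )
        supported = {a: n for a, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            candidates[col] = supported
    return candidates


reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGT",
    "ACCTACGA",  # the final A is seen once only, so it is not a candidate
    "ACGTACGT",
]
print(candidate_snps(reads))
```

In this toy alignment only column 2 qualifies, because both G and C occur in at least two reads; the singleton A in the last column is discarded as a likely error.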

4.

Background

The most efficient method to maintain genetic diversity in populations under conservation programmes is to optimize, for each potential parent, the number of offspring left to the next generation by minimizing the global coancestry. Coancestry is usually calculated from genealogical data but molecular markers can be used to replace genealogical coancestry with molecular coancestry. Recent studies showed that optimizing contributions based on coancestry calculated from a large number of SNP markers can maintain higher levels of diversity than optimizing contributions based on genealogical data. In this study, we investigated how SNP density and effective population size impact the use of molecular coancestry to maintain diversity.

Results

At low SNP densities, the genetic diversity maintained using genealogical coancestry for optimization was higher than that maintained using molecular coancestry. The performance of molecular coancestry improved with increasing marker density, and, for the scenarios evaluated, it was as efficient as genealogical coancestry if SNP density reached at least 3 times the effective population size. However, increasing SNP density resulted in reduced returns in terms of maintained diversity. While a benefit of 12% was achieved when marker density increased from 10 to 100 SNP/Morgan, the benefit was only 2% when it increased from 100 to 500 SNP/Morgan.

Conclusions

The marker density of most SNP chips already available for farm animals is sufficient for molecular coancestry to outperform genealogical coancestry in conservation programmes aimed at maintaining genetic diversity. For the purpose of effectively maintaining genetic diversity, a marker density of around 500 SNPs/Morgan can be considered the most cost-effective density when developing SNP chips for new species. Since the costs of developing SNP chips are decreasing, chips with 500 SNPs/Morgan should become available in the short term for non-domestic species.

5.
6.
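Objective: To obtain high-density single nucleotide polymorphism (SNP) genotype data by whole genome sequencing (WGS), evaluate the genotyping accuracy, and establish a method for using WGS data in forensic SNP genealogical inference. Methods: Samples were sequenced by WGS at 30× depth on the BGI MGISEQ-200RS sequencing platform. The 645 199 autosomal SNP loci of the Wegene GSA chip were extracted from the sequencing data; after quality-control filtering, IBS/IBD algorithms were used to predict kinship, and the population ancestry of the samples was analysed. Results: The concordance between the SNP genotypes extracted from the sequencing data and the Wegene GSA chip genotypes was greater than 99.62%. With the sequencing-derived SNP data, the IBS algorithm could predict 1st- to 4th-degree kinship, with 100% accuracy of the confidence interval for 4th-degree kinship prediction; the IBD algorithm could predict 1st- to 7th-degree kinship, with 100% accuracy in predicting 7th-degree relatives as related. The genealogical inference ability of SNPs obtained from high-depth WGS data did not differ significantly from that of the chip-based predictions. In addition, ancestry inference from the WGS data was consistent with the survey results. Conclusion: WGS can be applied to forensic SNP genealogical inference and can provide leads for solving cases.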
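
The abstract above does not say how its IBS score is computed. The sketch below shows one common definition, offered only as an assumption: genotypes are coded as 0, 1 or 2 copies of a reference allele, a pair of samples shares `2 - |g1 - g2|` alleles identical by state at each locus, and the score is the shared fraction over all loci genotyped in both samples. The function name `ibs_proportion` and the use of `None` for missing calls are illustrative choices.

```python
def ibs_proportion(g1, g2):
    """Mean proportion of alleles shared identical by state (IBS).

    g1, g2: equal-length sequences of genotypes coded 0/1/2, with None for
    a missing call. Loci missing in either sample are skipped.
    """
    shared = 0
    usable = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this locus
        usable += 1
    if usable == 0:
        raise ValueError("no loci genotyped in both samples")
    return shared / (2 * usable)


sample_a = [0, 1, 2, 2, None]
sample_b = [0, 2, 0, 2, 1]
print(ibs_proportion(sample_a, sample_b))  # 5 shared alleles / 8 = 0.625
```

Turning such a score into a kinship degree requires thresholds or reference distributions, which the abstract does not give, so none are assumed here.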

7.
8.

Background

The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle.

Methods

Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios, S40, S60 and S80, with respectively 40%, 60%, and 80% of the individuals as reference individuals, were investigated.

Results

Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs.

Conclusions

Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.
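
The accuracy measure used in this abstract, the correlation between observed and imputed genotypes for a SNP, can be sketched as follows. This is a plain Pearson correlation written out by hand; it is not part of Beagle, and the function name `imputation_accuracy`, the 0/1/2 genotype coding, and the choice to return NaN for a monomorphic SNP are assumptions for illustration.

```python
import math


def imputation_accuracy(observed, imputed):
    """Pearson correlation between observed and imputed genotypes of one SNP.

    Genotypes are assumed to be coded as allele counts (0, 1, 2), one value
    per validation individual. Returns NaN when either vector has no
    variation, because the correlation is then undefined.
    """
    n = len(observed)
    if n == 0 or n != len(imputed):
        raise ValueError("genotype vectors must be non-empty and equal in length")
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    s_oi = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_ii = sum((i - mean_i) ** 2 for i in imputed)
    if s_oo == 0 or s_ii == 0:
        return float("nan")
    return s_oi / math.sqrt(s_oo * s_ii)


observed = [0, 1, 2, 1, 0]
imputed = [0, 1, 2, 2, 0]  # one heterozygote imputed as homozygous
print(round(imputation_accuracy(observed, imputed), 3))  # 0.896
```

Averaging this value over SNPs and over cross-validation folds would give a mean accuracy per SNP of the kind reported in the Results. The undefined case matters in practice because a SNP with a low minor allele frequency can easily be monomorphic in a small validation set.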

9.

Background  

DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies demonstrated the feasibility of utilizing high density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with the two-color array-based comparative genomic hybridization (array-CGH), the SNP arrays offer much higher probe density and lower signal-to-noise ratio at the single SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required.

10.
We performed linkage and linkage disequilibrium (LD) mapping analyses to compare the power between microsatellite and single nucleotide polymorphism (SNP) markers. Chromosome-wide analyses were performed for a quantitative electrophysiological phenotype, ttth1, on chromosome 7. Multipoint analysis of microsatellite markers using the variance component (VC) method showed the highest LOD score of 4.20 at 162 cM, near D7S509 (163.7 cM). Two-point analysis of SNPs using the VC method yielded the highest LOD score of 3.98 in the Illumina SNP data and 3.45 in the Affymetrix SNP data around 152-153 cM. In family-based single SNP and SNP haplotype LD analysis, we identified seven SNPs associated with ttth1. We searched for any potential candidate genes in the location of the seven SNPs. The SNPs rs1476640 and rs768055 are located in the FLJ40852 gene (a hypothetical protein), and SNP rs1859646 is located in the TAS2R5 gene (a taste receptor). The other four SNPs are not located in any known or annotated genes. We found the high density SNP scan to be superior to microsatellites because it is effective in downstream fine mapping due to a better defined linkage region. Our study proves the utility of high density SNP in genome-wide mapping studies.

11.
The rapid development and application of molecular marker assays have facilitated genomic selection and genome‐wide linkage and association studies in wheat breeding. Although PCR‐based markers (e.g. simple sequence repeats and functional markers) and genotyping by sequencing have contributed greatly to gene discovery and marker‐assisted selection, the release of a more accurate and complete bread wheat reference genome has resulted in the design of single‐nucleotide polymorphism (SNP) arrays based on different densities or application targets. Here, we evaluated seven types of wheat SNP arrays in terms of their SNP number, distribution, density, associated genes, heterozygosity and application. The results suggested that the Wheat 660K SNP array contained the highest percentage (99.05%) of genome‐specific SNPs with reliable physical positions. SNP density analysis indicated that the SNPs were almost evenly distributed across the whole genome. In addition, 229 266 SNPs in the Wheat 660K SNP array were located in 66 834 annotated gene or promoter intervals. The annotated genes revealed by the Wheat 660K SNP array covered almost all genes revealed by the Wheat 35K (97.44%), 55K (99.73%), 90K (86.9%) and 820K (85.3%) SNP arrays. Therefore, the Wheat 660K SNP array could act as a substitute for the other six arrays and shows promise for a wide range of possible applications. In summary, the Wheat 660K SNP array is reliable and cost‐effective and may be the best choice for targeted genotyping and marker‐assisted selection in wheat genetic improvement.

12.
A popular way to represent clustered binary, count, or other data is via the generalized linear mixed model framework, which accommodates correlation through incorporation of random effects. A standard assumption is that the random effects follow a parametric family such as the normal distribution; however, this may be unrealistic or too restrictive to represent the data. We relax this assumption and require only that the distribution of random effects belong to a class of 'smooth' densities and approximate the density by the seminonparametric (SNP) approach of Gallant and Nychka (1987). This representation allows the density to be skewed, multi-modal, fat- or thin-tailed relative to the normal and includes the normal as a special case. Because an efficient algorithm to sample from an SNP density is available, we propose a Monte Carlo EM algorithm using a rejection sampling scheme to estimate the fixed parameters of the linear predictor, variance components and the SNP density. The approach is illustrated by application to a data set and via simulation.

13.
Genotyping sheep for genome‐wide SNPs at lower density and imputing to a higher density would enable cost‐effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low‐density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits including carcass and meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50–475 kb. SNPs for parentage and horned or polled tests also were represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single‐breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. The mean GBLUP accuracies with imputed 50k data more than doubled to 0.21. Accuracies of genomic prediction were very similar for imputed and real 50k genotypes. There was no apparent impact on accuracy of GEBVs as a result of using imputed rather than real 50k genotypes, provided imputation accuracy was >90%.

14.

Background  

A variety of diseases are caused by chromosomal abnormalities such as aneuploidies (having an abnormal number of chromosomes), microdeletions, microduplications, and uniparental disomy. High density single nucleotide polymorphism (SNP) microarrays provide information on chromosomal copy number changes, as well as genotype (heterozygosity and homozygosity). SNP array studies generate multiple types of data for each SNP site, some with more than 100,000 SNPs represented on each array. The identification of different classes of anomalies within SNP data has been challenging.

15.
High‐density SNP microarrays (“SNP chips”) are a rapid, accurate and efficient method for genotyping several hundred thousand polymorphisms in large numbers of individuals. While SNP chips are routinely used in human genetics and in animal and plant breeding, they are less widely used in evolutionary and ecological research. In this article, we describe the development and application of a high‐density Affymetrix Axiom chip with around 500,000 SNPs, designed to perform genomics studies of great tit (Parus major) populations. We demonstrate that the per‐SNP genotype error rate is well below 1% and that the chip can also be used to identify structural or copy number variation. The chip is used to explore the genetic architecture of exploration behaviour (EB), a personality trait that has been widely studied in great tits and other species. No SNPs reached genomewide significance, including at DRD4, a candidate gene. However, EB is heritable and appears to have a polygenic architecture. Researchers developing similar SNP chips may note: (i) SNPs previously typed on alternative platforms are more likely to be converted to working assays; (ii) detecting SNPs by more than one pipeline, and in independent data sets, ensures a high proportion of working assays; (iii) allele frequency ascertainment bias is minimized by performing SNP discovery in individuals from multiple populations; and (iv) samples with the lowest call rates tend to also have the greatest genotyping error rates.

16.
We report on the comparative utilities of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers for characterizing maize germplasm in terms of their informativeness, levels of missing data, repeatability and the ability to detect expected alleles in hybrids and DNA pools. Two different SNP chemistries were compared; single-base extension detected by Sequenom MassARRAY, and invasive cleavage detected by Invader chemistry with PCR. A total of 58 maize inbreds and four hybrids were genotyped with 80 SSR markers, 69 Invader SNP markers and 118 MassARRAY SNP markers, with 64 SNP loci being common to the two SNP marker chemistries. Average expected heterozygosity values were 0.62 for SSRs, 0.43 for SNPs (pre-selected for their high level of polymorphism) and 0.63 for the underlying sequence haplotypes. All individual SNP markers within the same set of sequences had an average expected heterozygosity value of 0.26. SNP marker data had more than a fourfold lower level of missing data (2.1-3.1%) compared with SSRs (13.8%). Data repeatability was higher for SNPs (98.1% for MassARRAY SNPs and 99.3% for Invader) than for SSRs (91.7%). Parental alleles were observed in hybrid genotypes in 97.0% of the cases for MassARRAY SNPs, 95.5% for Invader SNPs and 81.9% for SSRs. In pooled samples with mixtures of alleles, SSRs, MassARRAY SNPs and Invader SNPs were equally capable of detecting alleles at mid to high frequencies. However, at low frequencies, alleles were least likely to be detected using Invader SNP markers, and this technology had the highest level of missing data. Collectively, these results showed that SNP technologies can provide increased marker data quality and quantity compared with SSRs. The relative loss in polymorphism compared with SSRs can be compensated by increasing SNP numbers and by using SNP haplotypes. Determining the most appropriate SNP chemistry will be dependent upon matching the technical features of the method within the context of application, particularly in consideration of whether genotypic samples will be pooled or assayed individually.
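
The expected heterozygosity values compared in this abstract follow the standard definition He = 1 - Σ p_i², where p_i is the frequency of allele i at a locus. The sketch below assumes one allele call per inbred line and a hypothetical helper name `expected_heterozygosity`; it shows why a biallelic SNP cannot exceed 0.5 while a multi-allelic SSR or haplotype can.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Expected heterozygosity He = 1 - sum(p_i ** 2) at one locus.

    alleles: iterable of allele labels, one per sampled chromosome (for
    inbred lines, assumed here to be one label per line).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls supplied")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


snp_calls = ["A", "A", "G", "G"]          # biallelic, equal frequencies
ssr_calls = ["152", "154", "158", "160"]  # four equally frequent alleles
print(expected_heterozygosity(snp_calls))  # 0.5, the biallelic maximum
print(expected_heterozygosity(ssr_calls))  # 0.75
```

This ceiling is consistent with the abstract's remark that the loss in polymorphism per SNP can be offset by typing more SNPs or by combining them into haplotypes, which behave as multi-allelic loci.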

17.
Molecular markers are used to provide the link between genotype and phenotype, for the production of molecular genetic maps and to assess genetic diversity within and between related species. Single nucleotide polymorphisms (SNPs) are the most abundant molecular genetic marker. SNPs can be identified in silico, but care must be taken to ensure that the identified SNPs reflect true genetic variation and are not a result of errors associated with DNA sequencing. The SNP detection method autoSNP has been developed to identify SNPs from sequence data for any species. Confidence in the predicted SNPs is based on sequence redundancy, and haplotype co-segregation scores are calculated for a further independent measure of confidence. We have extended the autoSNP method to produce autoSNPdb, which integrates SNP and gene annotation information with a graphical viewer. We have applied this software to public barley expressed sequences, and the resulting database is available over the Internet. SNPs can be viewed and searched by sequence, functional annotation or predicted synteny with a reference genome, in this case rice. The correlation between SNPs and barley cultivar, expressed tissue type and development stage has been collated for ease of exploration. An average of one SNP per 240 bp was identified, with SNPs more prevalent in the 5' regions and simple sequence repeat (SSR) flanking sequences. Overall, autoSNPdb can provide a wealth of genetic polymorphism information for any species for which sequence data are available.

18.
Genetic association studies of common disease often rely on linkage disequilibrium (LD) along the human genome and in the population under study. Although understanding the characteristics of this correlation has been the focus of many large-scale surveys (culminating in genomewide haplotype maps), the results of different studies have yielded wide-ranging estimates. Since understanding these differences (and whether they can be reconciled) has important implications for whole-genome association studies, in this article we dissect biases in these estimations that are due to known aspects of study design and analytic methodology. In particular, we document in the empirical data that the long-known complicating effects of allele frequency, marker density, and sample size largely reconcile all large-scale surveys. Two exceptions are an underappraisal of redundancy among single-nucleotide polymorphisms (SNPs) when evaluation is limited to short regions (as in candidate-gene resequencing studies) and an inflation in the extent of LD in HapMap phase I, which is likely due to oversampling of specific haplotypes in the creation of the public SNP map. Understanding these factors can guide the understanding of empirical LD surveys and has implications for genetic association studies.

19.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetics Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with density of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with average density of 0.25 cM and then, using a sub-sample of these markers, created maps with density of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.

20.
Vainrub A, Pettitt BM. Biopolymers 2004, 73(5): 614-620
We present a theoretical model for a typical microarray-based single nucleotide polymorphism (SNP) assay of small amounts of genomic DNA. We derived the adsorption isotherm expressing the on-array hybridization efficiency in terms of genomic target sequence and concentration, oligonucleotide probe sequence and surface density, hybridization buffer, and temperature. This isotherm correctly describes the surface probe density effects, the sensitivity peak, and the melting temperature depression, and is in accord with published experiments. We discuss optimization of parallel SNP genotyping. Our estimates show that SNP detection at a single temperature in aqueous hybridization buffer is restricted by DNA regions that differ by less than 20% in GC content. We predict that the variety of genotyped SNPs could be substantially extended using an assay design with high probe density and a large fraction of probes hybridized.
