
1.

Automated SNP genotype clustering algorithm to improve data completeness in high-throughput SNP genotyping datasets from custom arrays

Smith EM Littrell J Olivier M 《Genomics, Proteomics & Bioinformatics》2007,5(3-4):256-259

High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.

2.

Simultaneous SNP identification and assessment of allele-specific bias from ChIP-seq data

Y Ni AW Hall A Battenhouse VR Iyer 《BMC genetics》2012,13(1):79


3.

Effects of ascertainment bias on recovering human demographic history.

E Eller 《Human biology; an international record of research》2001,73(3):411-427

In recent years multilocus data sets have been used to study the demographic history of human populations. In this paper (1) analyses previously done on 60 short tandem repeat (STR) loci are repeated on 30 restriction site polymorphism (RSP) markers; (2) relative population weights are estimated from the RSP data set and compared to previously published estimates from STR and craniometric data sets; and (3) computer simulations are performed to show the effects of ascertainment bias on relative population weight estimates. Not surprisingly, given that the RSP markers were originally identified in a small panel of Caucasians, estimates of relative population weights are biased and the European population weight is artificially inflated. However, the effects of ascertainment bias are not apparent in a principal components plot or estimates of FST. Ascertainment bias can have a large effect in other genetic systems with inherently low heterozygosity such as Alus or single nucleotide polymorphisms (SNPs), and care must be taken to have prior knowledge of how polymorphic markers in a given data set were originally identified. Otherwise, results can be skewed and interpretations faulty.

4.

Evaluating SNP ascertainment bias and its impact on population assignment in Atlantic cod, Gadus morhua

Bradbury IR Hubert S Higgins B Bowman S Paterson IG Snelgrove PV Morris CJ Gregory RS Hardie DC Borza T Bentzen P 《Molecular ecology resources》2011,11(Z1):218-225

The increasing use of single nucleotide polymorphisms (SNPs) in studies of nonmodel organisms accentuates the need to evaluate the influence of ascertainment bias on accurate ecological or evolutionary inference. Using a panel of 1641 expressed sequence tag-derived SNPs developed for northwest Atlantic cod (Gadus morhua), we examined the influence of ascertainment bias and its potential impact on assignment of individuals to populations ranging widely in origin. We hypothesized that reductions in assignment success would be associated with lower diversity in geographical regions outside the location of ascertainment. Individuals were genotyped from 13 locations spanning much of the contemporary range of Atlantic cod. Diversity, measured as average sample heterozygosity and number of polymorphic loci, declined (c. 30%) from the western (H(e) = 0.36) to eastern (H(e) = 0.25) Atlantic, consistent with a signal of ascertainment bias. Assignment success was examined separately for pools of loci representing differing degrees of reductions in diversity. SNPs displaying the largest declines in diversity produced the most accurate assignment in the ascertainment region (c. 83%) and the lowest levels of correct assignment outside the ascertainment region (c. 31%). Interestingly, several isolated locations showed no effect of assignment bias and consistently displayed 100% correct assignment. Contrary to expectations, estimates of accurate assignment range-wide using all loci displayed remarkable similarity despite reductions in diversity. Our results support the use of large SNP panels in assignment studies of high geneflow marine species. However, our evidence of significant reductions in assignment success using some pools of loci suggests that ascertainment bias may influence assignment results and should be evaluated in large-scale assignment studies.
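
The diversity figures quoted in this abstract can be checked with a small sketch. It assumes the textbook definition of expected heterozygosity (one minus the sum of squared allele frequencies, averaged over loci); the abstract does not state which estimator the authors used, and the function names below are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs)


def mean_heterozygosity(loci):
    """Average expected heterozygosity over a list of loci (one frequency list per locus)."""
    return sum(expected_heterozygosity(freqs) for freqs in loci) / len(loci)


def relative_decline(h_reference, h_other):
    """Fractional drop in diversity relative to the reference region."""
    return (h_reference - h_other) / h_reference


# Values reported in the abstract: western H(e) = 0.36, eastern H(e) = 0.25.
decline = relative_decline(0.36, 0.25)
print(f"relative decline: {decline:.1%}")
```

With the two reported values, the relative decline comes out slightly above 30%, which agrees with the "c. 30%" stated in the abstract.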

5.

Estimating population divergence time and phylogeny from single-nucleotide polymorphisms data with outgroup ascertainment bias

Wang Y Nielsen R 《Molecular ecology》2012,21(4):974-986

The inference of population divergence times and branching patterns is of fundamental importance in many population genetic analyses. Many methods have been developed for estimating population divergence times, and recently, there has been particular attention towards genome-wide single-nucleotide polymorphisms (SNP) data. However, most SNP data have been affected by an ascertainment bias caused by the SNP selection and discovery protocols. Here, we present a modification of an existing maximum likelihood method that will allow approximately unbiased inferences when ascertainment is based on a set of outgroup populations. We also present a method for estimating trees from the asymmetric dissimilarity measures arising from pairwise divergence time estimation in population genetics. We evaluate the methods by simulations and by applying them to a large SNP data set of seven East Asian populations.

6.

Dynamic variable selection in SNP genotype autocalling from APEX microarray data

Mohua Podder William J Welch Ruben H Zamar Scott J Tebbutt 《BMC bioinformatics》2006,7(1):521

Background

Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide – adenine (A), thymine (T), cytosine (C) or guanine (G) – is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay consisting of multiple probes with signals from multiple channels for a single SNP, based on arrayed primer extension (APEX). This mini-sequencing method is a powerful combination of a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. Using this microarray platform, our current genotype calling system (known as SNP Chart) is capable of calling single SNP genotypes by manual inspection of the APEX data, which is time-consuming and exposed to user subjectivity bias.

7.

A high-throughput SNP marker system for parental polymorphism screening, and diversity analysis in common bean (Phaseolus vulgaris L.)

Matthew W. Blair Andrés J. Cortés R. Varma Penmetsa Andrew Farmer Noelia Carrasquilla-Garcia Doug R. Cook 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2013,126(2):535-548

Single nucleotide polymorphism (SNP) detection has become a marker system of choice, because of the high abundance of source polymorphisms and the ease with which allele calls are automated. Various technologies exist for the evaluation of SNP loci and previously we validated two medium throughput technologies. In this study, our goal was to utilize a 768 feature, Illumina GoldenGate assay for common bean (Phaseolus vulgaris L.) developed from conserved legume gene sequences and to use the new technology for (1) the evaluation of parental polymorphisms in a mini-core set of common bean accessions and (2) the analysis of genetic diversity in the crop. A total of 736 SNPs were scored on 236 diverse common bean genotypes with the GoldenGate array. Missing data and heterozygosity levels were low and 94 % of the SNPs were scorable. With the evaluation of the parental polymorphism genotypes, we estimated the utility of the SNP markers in mapping for inter-genepool and intra-genepool populations, the latter being of lower polymorphism than the former. When we performed the diversity analysis with the diverse genotypes, we found Illumina GoldenGate SNPs to provide equivalent evaluations as previous gene-based SNP markers, but less fine-distinctions than with previous microsatellite marker analysis. We did find, however, that the gene-based SNPs in the GoldenGate array had some utility in race structure analysis despite the low polymorphism. Furthermore the SNPs detected high heterozygosity in wild accessions which was probably a reflection of ascertainment bias. The Illumina SNPs were shown to be effective in distinguishing between the genepools, and therefore were most useful in saturation of inter-genepool genetic maps. The implications of these results for breeding in common bean are discussed as well as the advantages and disadvantages of the GoldenGate system for SNP detection.

8.

SNP identification and marker assay development for high-throughput selection of soybean cyst nematode resistance

Zi Shi Shiming Liu James Noe Prakash Arelli Khalid Meksem Zenglu Li 《BMC genomics》2015,16(1)

Background

Soybean cyst nematode (SCN) is the most economically devastating pathogen of soybean. Two resistance loci, Rhg1 and Rhg4 primarily contribute resistance to SCN race 3 in soybean. Peking and PI 88788 are the two major sources of SCN resistance with Peking requiring both Rhg1 and Rhg4 alleles and PI 88788 only the Rhg1 allele. Although simple sequence repeat (SSR) markers have been reported for both loci, they are linked markers and limited to be applied in breeding programs due to accuracy, throughput and cost of detection methods. The objectives of this study were to develop robust functional marker assays for high-throughput selection of SCN resistance and to differentiate the sources of resistance.

Results

Based on the genomic DNA sequences of 27 soybean lines with known SCN phenotypes, we have developed Kompetitive Allele Specific PCR (KASP) assays for two Single nucleotide polymorphisms (SNPs) from Glyma08g11490 for the selection of the Rhg4 resistance allele. Moreover, the genomic DNA of Glyma18g02590 at the Rhg1 locus from 11 soybean lines and cDNA of Forrest, Essex, Williams 82 and PI 88788 were fully sequenced. Pairwise sequence alignment revealed seven SNPs/insertion/deletions (InDels), five in the 6th exon and two in the last exon. Using the same 27 soybean lines, we identified one SNP that can be used to select the Rhg1 resistance allele and another SNP that can be employed to differentiate Peking and PI 88788-type resistance. These SNP markers have been validated and a strong correlation was observed between the SNP genotypes and reactions to SCN race 3 using a panel of 153 soybean lines, as well as a bi-parental population, F₅–derived recombinant inbred lines (RILs) from G00-3213 x LG04-6000.

Conclusions

Three functional SNP markers (two for Rhg1 locus and one for Rhg4 locus) were identified that could provide genotype information for the selection of SCN resistance and differentiate Peking from PI 88788 source for most germplasm lines. The robust KASP SNP marker assays were developed. In most contexts, use of one or two of these markers is sufficient for high-throughput marker-assisted selection of plants that will exhibit SCN resistance.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1531-3) contains supplementary material, which is available to authorized users.

9.

Integrated study of copy number states and genotype calls using high-density SNP arrays

Wei Sun Fred A. Wright Zhengzheng Tang Silje H. Nordgard Peter Van Loo Tianwei Yu Vessela N. Kristensen Charles M. Perou 《Nucleic acids research》2009,37(16):5365-5377

We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.

10.

SNP calling, genotype calling, and sample allele frequency estimation from new-generation sequencing data

R Nielsen T Korneliussen A Albrechtsen Y Li J Wang 《PloS one》2012,7(7):e37558

We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
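
As a rough illustration of the general idea of Bayesian genotype calling mentioned in this abstract, the sketch below combines per-genotype likelihoods with a Hardy-Weinberg prior built from an allele frequency. It is a generic textbook calculation, not the authors' method: the function names, the 0.95 posterior cutoff, and the example likelihoods are all assumptions made for illustration.

```python
def genotype_posteriors(likelihoods, alt_freq):
    """Posterior probabilities for genotypes (0, 1, 2 copies of the alternate allele).

    likelihoods: P(data | genotype) for 0, 1 and 2 alternate alleles.
    alt_freq:    alternate allele frequency used to build a Hardy-Weinberg prior.
    """
    q = alt_freq
    priors = ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all genotypes have zero posterior mass")
    return [j / total for j in joint]


def call_genotype(likelihoods, alt_freq, min_posterior=0.95):
    """Return the most probable genotype, or None if its posterior is below the cutoff."""
    post = genotype_posteriors(likelihoods, alt_freq)
    best = max(range(3), key=lambda g: post[g])
    return best if post[best] >= min_posterior else None


# Hypothetical likelihoods favouring the heterozygote.
print(genotype_posteriors((0.1, 0.8, 0.1), alt_freq=0.5))
print(call_genotype((0.1, 0.8, 0.1), alt_freq=0.5, min_posterior=0.8))
```

Returning None for low-confidence sites mirrors the common practice of leaving a genotype uncalled rather than forcing a call; the cutoff is a tunable assumption.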

11.

Systematic bias in high-throughput sequencing data and its correction by BEADS

Cheung MS Down TA Latorre I Ahringer J 《Nucleic acids research》2011,39(15):e103

Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample to sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.

12.

Features of SNP and SSR diversity in a set of ICARDA barley germplasm collection (total citations: 1; self-citations: 0; citations by others: 1)

R. K. Varshney M. Baum P. Guo S. Grando S. Ceccarelli A. Graner 《Molecular breeding : new strategies in plant improvement》2010,26(2):229-242

Detection and utilization of genetic variation available in the germplasm collection for crop improvement have been the prime activities of breeders. Here a set of ICARDA barley germplasm collection comprising of 185 cultivated (Hordeum vulgare L.) and 38 wild (H. spontaneum L.) genotypes originated from 30 countries of four continents was genotyped with 68 single nucleotide polymorphism (SNP) and 45 microsatellite or simple sequence repeat (SSR) markers derived from genes (expressed sequence tags, ESTs). As two SNP markers provided 2 and 3 datapoints, a total of 71 SNPs were surveyed that yielded a total of 143 alleles. The number of SSR alleles per locus ranged from 3 to 22 with an average of 7.9 per marker. Average PIC (polymorphism information content) value for SSR and SNP markers were recorded as 0.63 and 0.38, respectively. Heterogeneity was recorded at both SNP and SSR loci in an average of 5.72 and 12.42% accessions, respectively. Genetic similarity matrices for SSR and SNP allelic data were highly correlated (r = 0.75, P < 0.005) and therefore allelic data for both markers were combined and analyzed for understanding the genetic relationships among the germplasm surveyed. Majority of clusters/subclusters were found to contain genotypes from the same geographic origins. While comparing the genetic diversity, the accessions coming from Middle East Asia and North East Asia showed more diversity as compared to that of other geographic regions. Majority of countries representing Africa, Middle East Asia, North East Asia and Arabian Peninsula included the genotypes that contained rare alleles. As expected, spontaneum accessions, as compared to vulgare accessions, showed a higher number of total alleles, higher number of alleles per locus, higher effective number of alleles and higher allelic richness and a higher number of rare alleles were observed. 
In summary, the examined ICARDA germplasm set showed ample natural genetic variation that can be harnessed for future breeding of barley as climate change and sustainability have become important throughout all growing areas of the world, drought/heat tolerance being the most important ones.
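
For readers unfamiliar with the PIC statistic reported in this abstract, the sketch below computes it from allele frequencies using the standard Botstein et al. (1980) definition. Whether the authors used exactly this form is an assumption, and the function name and example frequencies are hypothetical.

```python
from itertools import combinations


def polymorphism_information_content(allele_freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in allele_freqs)
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - pair_term


# A biallelic marker with equal allele frequencies.
print(polymorphism_information_content([0.5, 0.5]))
# A multi-allelic marker with four equally frequent alleles.
print(polymorphism_information_content([0.25, 0.25, 0.25, 0.25]))
```

Under this definition a biallelic marker cannot exceed the value reached at equal allele frequencies, while multi-allelic markers such as SSRs can score higher, which is in line with the higher average PIC the abstract reports for SSRs than for SNPs.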

13.

An application of high-throughput SNP genotyping for barley genome mapping and characterization of recombinant chromosome substitution lines

Kazuhiro Sato Kazuyoshi Takeda 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2009,119(4):613-619


14.

Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium

Nielsen R Signorovitch J 《Theoretical population biology》2003,63(3):245-255

As large-scale sequencing efforts turn from single genome sequencing to polymorphism discovery, single nucleotide polymorphisms (SNPs) are becoming an increasingly important class of population genetic data. But because of the ascertainment biases introduced by many methods of SNP discovery, most SNP data cannot be analyzed using classical population genetic methods. Statistical methods must instead be developed that can explicitly take into account each method of SNP discovery. Here we review some of the current methods for analyzing SNPs and derive sampling distributions for single SNPs and pairs of SNPs for some common SNP discovery schemes. We also show that the ascertainment scheme has a large effect on the estimation of linkage disequilibrium and recombination, and describe some methods of correcting for ascertainment biases when estimating recombination rates from SNP data.

15.

Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data (total citations: 3; self-citations: 0; citations by others: 3)

Carvalho B Bengtsson H Speed TP Irizarry RA 《Biostatistics (Oxford, England)》2007,8(2):485-499

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists, and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression is the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of the gene expression measurements, relative to ad hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper, we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular, we describe a methodology useful for preprocessing Affymetrix single-nucleotide polymorphism chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves existing approaches using data from 3 relatively large studies including the one in which large numbers of independent calls are available. The proposed methods are implemented in the package oligo available from Bioconductor.

16.

Identification of linked regions using high-density SNP genotype data in linkage analysis (total citations: 1; self-citations: 0; citations by others: 1)

Lin G Wang Z Wang L Lau YL Yang W 《Bioinformatics (Oxford, England)》2008,24(1):86-93

MOTIVATION: With the knowledge of large number of SNPs in human genome and the fast development in high-throughput genotyping technologies, identification of linked regions in linkage analysis through allele sharing status determination will play an ever important role, while consideration of recombination fractions becomes unnecessary. RESULTS: In this study, we have developed a rule-based program that identifies linked regions for underlined diseases using allele sharing information among family members. Our program uses high-density SNP genotype data and works in the face of genotyping errors. It works on nuclear family structures with two or more siblings. The program graphically displays allele sharing status for all members in a pedigree and identifies regions that are potentially linked to the underlined diseases according to user-specified inheritance mode and penetrance. Extensive simulations based on the χ² model for recombination show that our program identifies linked regions with high sensitivity and accuracy. Graphical display of allele sharing status helps to detect misspecification of inheritance mode and penetrance, as well as mislabeling or misdiagnosis. Allele sharing determination may represent the future direction of linkage analysis due to its better adaptation to high-density SNP genotyping data. AVAILABILITY: http://paed.hku.hk/uploadarea/yangwl/html/index.html

17.

Genetic diversity analysis of elite European maize (Zea mays L.) inbred lines using AFLP, SSR, and SNP markers reveals ascertainment bias for a subset of SNPs

Elisabetta Frascaroli Tobias A. Schrag Albrecht E. Melchinger 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2013,126(1):133-141

Recent advances in high-throughput sequencing technologies have triggered a shift toward single-nucleotide polymorphism (SNP) markers. A systematic bias can be introduced if SNPs are ascertained in a small panel of genotypes and then used for characterizing a larger population (ascertainment bias). With the objective of evaluating a potential ascertainment bias of the Illumina MaizeSNP50 array with respect to elite European maize dent and flint inbred lines, we compared the genetic diversity among these materials based on 731 amplified fragment length polymorphisms (AFLPs), 186 simple sequence repeats (SSRs), 41,434 SNPs of the MaizeSNP50 array (SNP-A), and two subsets of it, i.e., 30,068 Panzea (SNP-P) and 11,366 Syngenta markers (SNP-S). We evaluated the bias effects on major allele frequency, allele number, gene diversity, modified Roger’s distance (MRD), and on molecular variance (AMOVA). We revealed ascertainment bias in SNP-A, compared to AFLPs and SSRs. It affected especially European flint lines analyzed with markers (SNP-S) specifically developed to maximize differences among North American dent germplasm. The bias affected all genetic parameters, but did not substantially alter the relative distances between inbred lines within groups. For these reasons, we conclude that the SNP markers of the MaizeSNP50 array can be employed for breeding purposes in the investigated material. However, attention should be paid in case of comparisons between genotypes belonging to different heterotic groups. In this case, it is advisable to prefer a marker subset with potentially low ascertainment bias, like in our case the SNP-P marker set.

18.

StructHDP: automatic inference of number of clusters and population structure from admixed genotype data

Shringarpure S Won D Xing EP 《Bioinformatics (Oxford, England)》2011,27(13):i324-i332


19.

An optimization framework for unsupervised identification of rare copy number variation from SNP array data

Gökhan Yavaş Mehmet Koyutürk Meral Özsoyoğlu Meetha P Gould Thomas LaFramboise 《Genome biology》2009,10(10):R119

Copy number variants (CNVs) have roles in human disease, and DNA microarrays are important tools for identifying them. In this paper, we frame CNV identification as an objective function optimization problem. We apply our method to data from hundreds of samples, and demonstrate its ability to detect CNVs at a high level of sensitivity without sacrificing specificity. Its performance compares favorably with currently available methods and it reveals previously unreported gains and losses.

20.

An optimization framework for unsupervised identification of rare copy number variation from SNP array data

Gökhan Yavaş Mehmet Koyutürk Meral Özsoyoğlu Meetha P Gould Thomas LaFramboise 《Genome biology》2009,10(10):1-18

Background

Specific chromatin characteristics, especially the modification status of the core histone proteins, are associated with active and inactive genes. There is growing evidence that genes that respond to environmental or developmental signals may possess distinct chromatin marks. Using a T cell model and both genome-wide and gene-focused approaches, we examined the chromatin characteristics of genes that respond to T cell activation.

Results

To facilitate comparison of genes with similar basal expression levels, we used expression-profiling data to bin genes according to their basal expression levels. We found that inducible genes in the lower basal expression bins, especially rapidly induced primary response genes, were more likely than their non-responsive counterparts to display the histone modifications of active genes, have RNA polymerase II (Pol II) at their promoters and show evidence of ongoing basal elongation. There was little or no evidence for the presence of active chromatin marks in the absence of promoter Pol II on these inducible genes. In addition, we identified a subgroup of genes with active promoter chromatin marks and promoter Pol II but no evidence of elongation. Following T cell activation, we find little evidence for a major shift in the active chromatin signature around inducible gene promoters but many genes recruit more Pol II and show increased evidence of elongation.

Conclusions

These results suggest that the majority of inducible genes are primed for activation by having an active chromatin signature and promoter Pol II with or without ongoing elongation.