1.
Background
Runs of homozygosity are long, uninterrupted stretches of homozygous genotypes that enable reliable estimation of levels of inbreeding (i.e., autozygosity) based on high-throughput, chip-based single nucleotide polymorphism (SNP) genotypes. While the theoretical definition of runs of homozygosity is straightforward, their empirical identification depends on the type of SNP chip used to obtain the data and on a number of factors, including the number of heterozygous calls allowed to account for genotyping errors. We analyzed how SNP chip density and genotyping errors affect estimates of autozygosity based on runs of homozygosity in three cattle populations, using genotype data from an SNP chip with 777 972 SNPs and a 50 k chip.
Results
Data from the 50 k chip led to overestimation of the number of runs of homozygosity that are shorter than 4 Mb, since the analysis could not identify heterozygous SNPs that were present on the denser chip. Conversely, data from the denser chip led to underestimation of the number of runs of homozygosity that were longer than 8 Mb, unless the presence of a small number of heterozygous SNP genotypes was allowed within a run of homozygosity.
Conclusions
We have shown that SNP chip density and genotyping errors introduce patterns of bias in the estimation of autozygosity based on runs of homozygosity. SNP chips with 50 000 to 60 000 markers are frequently available for livestock species, and their information leads to a conservative prediction of autozygosity from runs of homozygosity longer than 4 Mb. Not allowing heterozygous SNP genotypes to be present in a run of homozygosity, as has been advocated for human populations, is not adequate for livestock populations, because they have much higher levels of autozygosity and therefore longer runs of homozygosity. When a small number of heterozygous calls is allowed, current software does not distinguish between cases where these calls are adjacent, and therefore indicative of an actual break in the run, and cases where they are scattered across the length of the homozygous segment. The simple graphical tests used in this paper are a current, if tedious, solution.
2.
Hiroshi Fujii Takehiko Shimada Keisuke Nonaka Masayuki Kita Takeshi Kuniga Tomoko Endo Yoshinori Ikoma Mitsuo Omura 《Tree Genetics & Genomes》2013,9(1):145-153
We developed a 384-SNP multiplexed array, named CitSGA-1, for genotyping Citrus cultivars, and evaluated the performance and reliability of the genotyping. SNPs were surveyed by direct sequence comparison of sequence tagged site (STS) fragments amplified from genomic DNA of cultivars representing the genetic diversity of citrus breeding in Japan. From 1497 SNP candidates, 384 SNPs for a high-throughput genotyping array were selected based on the physical parameters of Illumina's bead array criteria. The CitSGA-1 assay was applied to a hybrid population of 88 progeny and to 103 citrus accessions used for breeding in Japan, which resulted in 73,726 SNP calls. A total of 351 SNPs (91 %) could call different genotypes among the DNA samples, a success rate comparable to previously reported rates for other plant species. To confirm the reliability of the SNP genotype calls, parentage analysis was applied, and it indicated that the numbers of reliable SNPs and corresponding STSs were 276 and 213, respectively. The multiplexed SNP genotyping array reported here will be useful for the efficient construction of linkage maps, for the detection of markers for marker-assisted breeding, and for the identification of cultivars.
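The parentage check described above can be illustrated with a simple opposing-homozygote count. This is a minimal sketch, not the CitSGA-1 pipeline: it assumes genotypes coded 0/1/2 (copies of one allele) with `None` for a failed call, and the function name `snp_conflict_counts` is hypothetical. A true parent and offspring must share at least one allele, so a locus where they are homozygous for different alleles points to a genotyping error (or a wrong pedigree); SNPs that accumulate such conflicts across known pairs are candidates for exclusion.

```python
def snp_conflict_counts(pairs):
    """Count, per SNP, the parent-offspring pairs that are opposing homozygotes.

    pairs: list of (parent, offspring) genotype lists, coded 0/1/2 with None = no call.
    (Hypothetical helper for illustration.)
    """
    n_snps = len(pairs[0][0])
    counts = [0] * n_snps
    for parent, offspring in pairs:
        for j, (p, o) in enumerate(zip(parent, offspring)):
            if p is None or o is None:
                continue  # skip missing calls
            if {p, o} == {0, 2}:  # homozygous for different alleles: no shared allele
                counts[j] += 1
    return counts


pairs = [
    ([0, 2, 1, None], [2, 2, 0, 0]),
    ([2, 0, 0, 1], [0, 2, 0, 2]),
]
print(snp_conflict_counts(pairs))  # [2, 1, 0, 0]
```

Counting per SNP rather than per pair separates unreliable markers (conflicts in many pairs) from a single mislabelled sample (conflicts at many SNPs in one pair).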
3.
4.
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%-94% of genotyping errors depending on the pedigree topology.
5.
SUMMARY: Three recent publications have examined the quality and completeness of the public single nucleotide polymorphism database (dbSNP) and have come to dramatically different conclusions regarding dbSNP's false positive rate and the proportion of dbSNP entries that are expected to be common. These studies employed different genotyping technologies and different protocols for determining minimum acceptable genotyping quality thresholds. Because heterozygous sites typically have lower quality scores than homozygous sites, a higher minimum quality threshold reduces the number of false positive SNPs but yields fewer heterozygotes and therefore fewer confirmed SNPs. To account for the different confirmation rates and distributions of minor allele frequencies, we propose that the three confirmation studies have different false positive and false negative rates. We developed a mathematical model to predict SNP confirmation rates and the apparent distribution of minor allele frequencies under user-specified false positive and false negative rates. We applied this model to the three published studies and to our own resequencing effort. We conclude that the dbSNP false positive rate is approximately 15-17% and that the reported confirmation studies have vastly different genotyping error rates and patterns.
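To make the threshold trade-off above concrete, here is a deliberately simplified confirmation model of our own; it is not the model developed in the paper. It assumes Hardy-Weinberg proportions, treats a candidate as "confirmed" when at least one heterozygote is called, lets each true heterozygote be missed with probability `fn`, and lets a non-polymorphic dbSNP entry produce a spurious heterozygote call with per-sample probability `fp_call`. All function and parameter names are hypothetical.

```python
def p_confirm_real(maf, n_samples, fn):
    """P(at least one heterozygote is called in n diploid samples) for a real SNP.

    Assumes Hardy-Weinberg proportions; each true heterozygote is missed with probability fn.
    """
    p_called_het = 2 * maf * (1 - maf) * (1 - fn)
    return 1 - (1 - p_called_het) ** n_samples


def expected_confirmation_rate(mafs, n_samples, fp_db, fn, fp_call):
    """Expected fraction of dbSNP candidates 'confirmed' by a resequencing study.

    mafs: sample of minor allele frequencies for the real SNPs
    fp_db: fraction of dbSNP entries that are not real polymorphisms
    fp_call: per-sample probability of a spurious heterozygote call at a monomorphic site
    """
    real = sum(p_confirm_real(q, n_samples, fn) for q in mafs) / len(mafs)
    spurious = 1 - (1 - fp_call) ** n_samples
    return (1 - fp_db) * real + fp_db * spurious


# Hypothetical inputs: a stricter quality threshold raises fn and lowers fp_call.
lenient = expected_confirmation_rate([0.05, 0.2, 0.4], 24, fp_db=0.15, fn=0.05, fp_call=0.01)
strict = expected_confirmation_rate([0.05, 0.2, 0.4], 24, fp_db=0.15, fn=0.4, fp_call=0.0)
```

In this toy setting, two studies of the same dbSNP sample can report different confirmation rates purely because their thresholds imply different `fn` and `fp_call`, which is the kind of discrepancy the summary attributes to the three studies.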
6.
7.
8.
Bráulio Fabiano Xavier de Moraes Rodrigo Furtado dos Santos Bruno Marco de Lima Aurélio Mendes Aguiar Alexandre Alves Missiaggia Donizete da Costa Dias Gabriel Dehon Peçanha Sampaio Rezende Flávia Maria Avelar Gonçalves Juan J. Acosta Matias Kirst Márcio F. R. Resende Jr Patricio R. Muñoz 《Molecular breeding : new strategies in plant improvement》2018,38(9):115
The successful application of genomic selection (GS) approaches is dependent on genetic markers derived from high-throughput and low-cost genotyping methods. Recent GS studies in trees have predominantly relied on SNP arrays as the source of genotypes, though this technology has a high entry cost. The recent development of alternative genotyping platforms, tailored to specific species and with low entry cost, has become possible due to advances in next-generation sequencing and genome complexity reduction methods such as sequence capture. However, the performance of these new platforms in GS models has not yet been evaluated or compared to that of models developed from SNP arrays. Here, we evaluate the impact of these genotyping technologies on the development of GS prediction models for a Eucalyptus breeding population composed of 739 trees phenotyped for 13 wood quality and growth traits. Genotyping data obtained with both methods were compared for linkage disequilibrium, minor allele frequency, and missing data. The prediction methods RR-BLUP and BayesB were employed, and predictive ability estimated by cross-validation was used to evaluate the performance of GS models derived from the different genotyping platforms. Differences in linkage disequilibrium patterns, minor allele frequency, missing data, and marker distribution were detected between sequence capture and SNP arrays. However, RR-BLUP and BayesB GS models resulted in similar predictive abilities. These results demonstrate that both genotyping methods are equivalent for genomic prediction of the traits evaluated. Sequence capture offers an alternative for species where SNP arrays are not available, or when the initial development cost is too high.
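As a rough illustration of RR-BLUP and cross-validated predictive ability, the sketch below fits ridge-regression marker effects and reports the mean correlation between predicted and observed phenotypes across folds. It is a minimal sketch on simulated data, not the authors' pipeline: the shrinkage parameter `lam` (standing in for the ratio of residual to marker-effect variance) is assumed known rather than estimated, BayesB is not shown, and the helper names are hypothetical.

```python
import numpy as np


def rr_blup_fit(X, y, lam):
    """Ridge-regression (RR-BLUP-style) marker effects.

    Solves (Xc'Xc + lam*I) b = Xc'(y - mean(y)), where Xc is the column-centred
    marker matrix and lam plays the role of sigma_e^2 / sigma_b^2.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    b = np.linalg.solve(A, Xc.T @ (y - y.mean()))
    return y.mean(), x_mean, b


def predictive_ability(X, y, lam, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu, x_mean, b = rr_blup_fit(X[train], y[train], lam)
        pred = mu + (X[test] - x_mean) @ b
        rs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(rs))


# Simulated data (hypothetical): 300 individuals, 100 SNPs coded 0/1/2, heritability 0.8.
rng = np.random.default_rng(42)
X = rng.binomial(2, 0.5, size=(300, 100)).astype(float)
g = X @ rng.normal(0.0, 1.0, size=100)
y = g + rng.normal(0.0, np.sqrt(0.25 * g.var()), size=300)
print(round(predictive_ability(X, y, lam=10.0), 2))
```

Comparing two platforms in this framework would amount to building `X` from each platform's markers for the same trees and comparing the resulting predictive abilities.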
9.
Single nucleotide polymorphisms (SNPs) have been increasingly utilized to investigate somatic genetic abnormalities in premalignancy and cancer. Loss of heterozygosity (LOH) is a common alteration observed during cancer development, and SNP assays have been used to identify LOH at specific chromosomal regions. The design of such studies requires consideration of the resolution for detecting LOH throughout the genome and identification of the number and location of SNPs required to detect genetic alterations in specific genomic regions. Our study evaluated SNP distribution patterns and used probability models, Monte Carlo simulation, and real human subject genotype data to investigate the relationships between the number of SNPs, SNP heterozygosity (HET) rates, and the sensitivity (resolution) for detecting LOH. We report that the variance of the SNP heterozygosity rate in dbSNP is high for a large proportion of SNPs. Two statistical methods proposed for directly inferring SNP heterozygosity rates require much smaller (intermediate) sample sizes and are feasible for practical use in SNP selection or verification. Using HapMap data, we showed that a region of LOH greater than 200 kb can be reliably detected, whereas losses smaller than 50 kb have a substantially lower detection probability when using all SNPs currently in the HapMap database. Higher densities of SNPs may exist in certain local chromosomal regions, providing some opportunity to reliably detect LOH of segments smaller than 50 kb. These results suggest that the interpretation of genome-wide LOH scans using commercial arrays needs to consider the relationships among inter-SNP distance, detection probability, and sample size for a specific study. New experimental designs for LOH studies would also benefit from considering the power of detection and the sample sizes required to accomplish the proposed aims.
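The link between segment size, SNP density, heterozygosity rate and detection probability can be sketched with a Poisson model and a Monte Carlo check. This is an illustrative sketch, not the probability model of the paper: it assumes LOH is called when the segment contains at least `min_informative` SNPs that are heterozygous in constitutional DNA, and all names and numbers are hypothetical.

```python
import math
import random


def p_detect_poisson(length_kb, snps_per_kb, het_rate, min_informative):
    """P(an LOH segment contains >= min_informative heterozygous SNPs).

    Assumes SNPs fall as a Poisson process of rate snps_per_kb and each is
    heterozygous in the constitutional DNA with probability het_rate.
    """
    lam = length_kb * snps_per_kb * het_rate
    p_fewer = sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(min_informative))
    return 1 - p_fewer


def p_detect_mc(positions_kb, het_rates, length_kb, min_informative, trials=10_000, seed=1):
    """Monte Carlo version using actual SNP positions and per-SNP heterozygosity rates."""
    rng = random.Random(seed)
    lo, hi = min(positions_kb), max(positions_kb) - length_kb
    hits = 0
    for _ in range(trials):
        start = rng.uniform(lo, hi)  # random placement of the LOH segment
        informative = sum(
            1
            for pos, h in zip(positions_kb, het_rates)
            if start <= pos < start + length_kb and rng.random() < h
        )
        hits += informative >= min_informative
    return hits / trials


# Hypothetical density and heterozygosity: compare a 50 kb and a 200 kb segment.
for size in (50, 200):
    print(size, round(p_detect_poisson(size, 0.1, 0.3, 3), 3))
```

The Monte Carlo version is the one to use with real marker maps, because it keeps local clustering of SNPs and per-SNP heterozygosity rates that the Poisson approximation averages away.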
10.
One well-known approach for the analysis of transmission disequilibrium is the investigation of single nucleotide polymorphisms (SNPs) in trios consisting of an affected child and its parents. Results may be biased by incorrectly assigned genotypes. Various problems, among them sample swaps or a wrong pedigree structure, are possible sources of biased results. Because these can be partly ruled out by good study conditions together with checks for correct pedigree structure using a series of independent markers, the main remaining source of error is genotyping error. Some genotyping errors can be detected by Mendelian checks, whereas others are compatible with the pedigree structure. The extent of genotyping errors can be estimated from the rate of errors detected by Mendelian checks. In many studies, only one SNP of a specific genomic region is investigated by the transmission/disequilibrium test (TDT), which leaves Mendelian checks as the only tool to control genotyping errors. From the rate of detected errors, the true error rate can be estimated. Gordon et al. [Hum Hered 1999;49:65-70] considered the case of genotyping errors that occur randomly and independently with some fixed probability for the wrong ascertainment of an allele. In practice, however, SNP genotypes rather than single alleles are determined. Therefore, we study the proportion of detected errors (detection rate) based on genotypes. In contrast to Gordon et al., who reported detection rates between 25 and 30%, we obtain higher detection rates ranging from 39 up to 61% when considering likely error structures in the data. We conclude that detection rates are probably substantially higher than those reported by Gordon et al.
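A genotype-based detection rate can be computed by enumerating all error-free trios and all ways of corrupting one genotype. The sketch below does this for a biallelic SNP under Hardy-Weinberg proportions. Its error model is an assumption made only for illustration (the wrong member and the wrong genotype are chosen uniformly), not one of the "likely error structures" analysed in the paper, so it is not expected to reproduce the figures reported above. Function names are hypothetical.

```python
from itertools import product

# Alleles (1 = allele A) that a parent with 0, 1 or 2 copies of A can transmit.
CAN_TRANSMIT = {0: {0}, 1: {0, 1}, 2: {1}}


def mendel_consistent(f, m, c):
    """True if child genotype c can arise from father f and mother m."""
    return any(x + y == c for x in CAN_TRANSMIT[f] for y in CAN_TRANSMIT[m])


def trio_probs(p):
    """Probability of each error-free (father, mother, child) genotype triple.

    Parents are drawn under Hardy-Weinberg with allele-A frequency p.
    """
    hwe = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    probs = {}
    for f, m in product(range(3), repeat=2):
        tf, tm = f / 2, m / 2  # probability that each parent transmits A
        for x, y in product((0, 1), repeat=2):
            pt = (tf if x else 1 - tf) * (tm if y else 1 - tm)
            key = (f, m, x + y)
            probs[key] = probs.get(key, 0.0) + hwe[f] * hwe[m] * pt
    return probs


def detection_rate(p):
    """P(Mendelian check flags the trio | exactly one genotype in the trio is wrong).

    Assumed error model: the wrong member is chosen uniformly among father,
    mother and child, and the wrong genotype uniformly among the two other
    genotypes, so each of the 6 error cases has weight 1/6.
    """
    detected = 0.0
    for trio, w in trio_probs(p).items():
        for who in range(3):
            for wrong in range(3):
                if wrong == trio[who]:
                    continue
                err = list(trio)
                err[who] = wrong
                if not mendel_consistent(*err):
                    detected += w / 6
    return detected


for p in (0.1, 0.3, 0.5):
    print(p, round(detection_rate(p), 3))
```

Swapping in a different error model only requires changing the weights in `detection_rate`, for example giving heterozygote-to-homozygote errors more weight than homozygote-to-homozygote errors.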
11.
C. MAUDET G. LUIKART D. DUBRAY A. VON HARDENBERG P. TABERLET 《Molecular ecology resources》2004,4(4):772-775
We show that Alpine ibex (Capra ibex) and Corsican mouflon (Ovis musimon) faeces yield useful DNA for microsatellite analysis; however, we detected higher genotyping error rates for spring faeces than for winter faeces. We quantified the genotyping error rate by repeatedly genotyping four microsatellites. With winter samples, 99% and 95% of mouflon and ibex genotyping repetitions, respectively, provided a correct genotype, whereas spring samples provided only 52% and 59% correct genotypes. Thus, before starting a noninvasive study, we recommend that researchers conduct a pilot study to quantify genotyping error rates for each season, population and species to be studied.
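Quantifying error rates from repeated genotyping, as in the pilot study recommended above, can be sketched as follows. This is a minimal sketch under an assumption the abstract does not state: the most frequent call among the repetitions is taken as the correct genotype (a study may instead use a reference genotype, e.g. from tissue). Allele sizes and function names are hypothetical.

```python
from collections import Counter


def replicate_error_rate(replicates):
    """Consensus genotype and error rate for repeated genotyping of one sample at one locus.

    replicates: list of genotype calls, e.g. (120, 124) for a heterozygote.
    The most frequent call is taken as the reference genotype (an assumption).
    """
    consensus, n_agree = Counter(replicates).most_common(1)[0]
    return consensus, 1 - n_agree / len(replicates)


def proportion_correct(all_replicates):
    """Pooled fraction of repetitions matching their consensus, over samples and loci."""
    total = sum(len(r) for r in all_replicates)
    correct = sum(Counter(r).most_common(1)[0][1] for r in all_replicates)
    return correct / total


# A heterozygote at a hypothetical locus; one repetition shows allelic dropout.
reps = [(120, 124), (120, 124), (120, 120), (120, 124)]
print(replicate_error_rate(reps))  # ((120, 124), 0.25)
```

Computing `proportion_correct` separately for winter and spring samples gives the kind of seasonal comparison reported above.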
12.
13.
Jones FC Chan YF Schmutz J Grimwood J Brady SD Southwick AM Absher DM Myers RM Reimchen TE Deagle BE Schluter D Kingsley DM 《Current biology : CB》2012,22(1):83-90
Genes underlying repeated adaptive evolution in natural populations are still largely unknown. Stickleback fish (Gasterosteus aculeatus) have undergone a recent dramatic evolutionary radiation, generating numerous examples of marine-freshwater species pairs and a small number of benthic-limnetic species pairs found within single lakes [1]. We have developed a new genome-wide SNP genotyping array to study patterns of genetic variation in sticklebacks over a wide geographic range, and to scan the genome for regions that contribute to repeated evolution of marine-freshwater or benthic-limnetic species pairs. Surveying 34 global populations with 1,159 informative markers revealed substantial genetic variation, with predominant patterns reflecting demographic history and geographic structure. After correcting for geographic structure and filtering for neutral markers, we detected large repeated shifts in allele frequency at some loci, identifying both known and novel loci likely contributing to marine-freshwater and benthic-limnetic divergence. Several novel loci fall close to genes implicated in epithelial barrier or immune functions, which have likely changed as sticklebacks adapt to contrasting environments. Specific alleles differentiating sympatric benthic-limnetic species pairs are shared in nearby solitary populations, suggesting an allopatric origin for adaptive variants and selection pressures unrelated to sympatry in the initial formation of these classic vertebrate species pairs.
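The idea of a "repeated shift in allele frequency" can be illustrated by flagging loci whose freshwater-minus-marine frequency difference points the same way in every population pair. This is only an illustrative sketch, not the statistic used in the study: it applies no correction for geographic structure and no neutral-marker filtering (both of which the study performed), and the threshold `min_mean_shift` and the function name are hypothetical.

```python
def repeated_shift_loci(pairs, min_mean_shift=0.5):
    """Loci with a consistent, large allele-frequency shift across population pairs.

    pairs: list of (marine_freqs, freshwater_freqs), each a per-locus list of
    allele frequencies. A locus is flagged when its freshwater-minus-marine shift
    has the same sign in every pair and a mean absolute size >= min_mean_shift.
    """
    n_loci = len(pairs[0][0])
    hits = []
    for j in range(n_loci):
        shifts = [fw[j] - mar[j] for mar, fw in pairs]
        same_sign = all(s > 0 for s in shifts) or all(s < 0 for s in shifts)
        # With a shared sign, |sum| / n equals the mean absolute shift.
        if same_sign and abs(sum(shifts)) / len(shifts) >= min_mean_shift:
            hits.append(j)
    return hits


pairs = [
    ([0.1, 0.5, 0.2], [0.9, 0.5, 0.3]),
    ([0.2, 0.4, 0.8], [0.8, 0.6, 0.1]),
]
print(repeated_shift_loci(pairs))  # [0]
```

Requiring a shared direction across independent pairs is what distinguishes repeated (parallel) divergence from large but idiosyncratic differences in single pairs.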
14.
15.
de Leeuw N Hehir-Kwa JY Simons A Geurts van Kessel A Smeets DF Faas BH Pfundt R 《Cytogenetic and genome research》2011,135(3-4):212-221
Array-based comparative genomic hybridization analysis of genomic DNA was first applied in postnatal diagnosis for patients with intellectual disability (ID) and/or congenital anomalies (CA). Genome-wide single-nucleotide polymorphism (SNP) array analysis was subsequently implemented as the first line diagnostic test for ID/CA patients in our laboratory in 2009, because its diagnostic yield is significantly higher than that of routine cytogenetic analysis. In addition to the detection of copy number variations, the genotype information obtained with SNP array analysis enables the detection of stretches of homozygosity and thereby the possible identification of recessive disease genes, mosaic aneuploidy, or uniparental disomy. Patient-parent (trio) information analysis is used to screen for the presence of any form of uniparental disomy in the patient and can determine the parental origin of a de novo copy number variation. Moreover, the outcome of a genotype analysis is used as a final quality control by ruling out potential sample mismatches due to non-paternity or sample mix-up. SNP array analysis is now also used in our laboratory for patients with disorders for which locus heterogeneity is known (homozygosity pre-screening), in prenatal diagnosis in case of structural ultrasound anomalies, and for patients with leukemia. In this report, we summarize our array findings and experiences in the various diagnostic applications and demonstrate the power of a SNP-based array platform for molecular karyotyping, because it not only significantly improves the diagnostic yield in both constitutional and cancer genome diagnostics, but it also enhances the quality of the diagnostic laboratory workflow.
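The trio screening described above rests on a simple per-SNP test: could the child have received an allele from each parent? The sketch below counts, for each parent, the SNPs at which that contribution is excluded. It is a generic illustration, not the laboratory's software; genotypes are assumed to be coded 0/1/2 with `None` for a failed call, and the function names are hypothetical. Exclusions of one parent confined to a single chromosome are a pattern worth following up for uniparental disomy, whereas exclusions spread across the whole genome point instead to non-paternity or a sample mix-up.

```python
# Alleles (1 = allele B) that a parent with 0, 1 or 2 copies of B can transmit.
CAN_TRANSMIT = {0: {0}, 1: {0, 1}, 2: {1}}


def shares_allele_with(parent, child):
    """Could the child (0/1/2) have received one allele from this parent?"""
    return any(child - x in (0, 1) for x in CAN_TRANSMIT[parent])


def exclusion_counts(father, mother, child):
    """Count SNPs at which the child's genotype excludes each parent's contribution."""
    paternal = maternal = 0
    for f, m, c in zip(father, mother, child):
        if None in (f, m, c):
            continue  # skip missing calls
        if not shares_allele_with(f, c):
            paternal += 1
        if not shares_allele_with(m, c):
            maternal += 1
    return paternal, maternal


# Hypothetical segment: the father is excluded twice, the mother never.
print(exclusion_counts([0, 0, 2, 1], [2, 1, 2, 1], [2, 2, 2, 1]))  # (2, 0)
```

In practice the counts would be tallied per chromosome, so that a localized excess can be told apart from a genome-wide one.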
16.
A simple and robust TDT-type test against genotyping error with error rates varying across families
The transmission/disequilibrium test (TDT), a family-based test of linkage and association, is a popular test for studies of complex inheritance, as it is nonparametric and robust against spurious conclusions induced by hidden genetic structure, such as stratification or admixture. However, the TDT may be biased by genotyping errors. Undetected genotyping errors may be contributing to an inflated type I error rate among reported TDT-derived associations. To adjust for this bias, a popular approach is to assume a genotyping error model describing the pattern of errors and to propose association tests based on likelihood methods. However, all model-based approaches tend to perform unsatisfactorily if the genotyping error rates are not identical across all families. In this paper, we propose a TDT-type association test which is not only simple and robust against population stratification (so that the assumption of Hardy-Weinberg equilibrium is not required), but also robust against genotyping error with error rates varying across families. Simulation studies confirm that the new test has very reasonable performance.
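For reference, the classic TDT that the abstract starts from can be computed from trio genotypes as below. This is a sketch of the standard statistic only; the robust test proposed in the paper is not reproduced. Genotypes are assumed to be coded as copies of allele A (0/1/2), function names are hypothetical, and trios with a Mendelian inconsistency are simply skipped, which is exactly the kind of handling that leaves undetected (Mendel-compatible) errors in the counts.

```python
import math


def count_transmissions(trios):
    """Count transmissions of allele A (b) and allele a (c) from heterozygous parents.

    trios: (father, mother, child) genotypes coded as copies of A (0/1/2).
    Trios with no heterozygous parent or with a Mendelian inconsistency are skipped.
    """
    b = c = 0
    for f, m, child in trios:
        n_het = (f == 1) + (m == 1)
        fixed = sum(g // 2 for g in (f, m) if g != 1)  # A copies sent by homozygous parents
        a_from_het = child - fixed  # A copies that must have come from heterozygous parents
        if n_het == 0 or not 0 <= a_from_het <= n_het:
            continue
        b += a_from_het
        c += n_het - a_from_het
    return b, c


def tdt(b, c):
    """Classic TDT statistic (b - c)^2 / (b + c) and its 1-df chi-square p-value."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


trios = [(1, 0, 1), (1, 0, 0), (1, 1, 2), (1, 2, 2), (1, 0, 2), (0, 0, 0)]
b, c = count_transmissions(trios)
print(b, c, tdt(b, c))
```

The p-value uses the identity that a 1-df chi-square exceeds x with probability erfc(sqrt(x / 2)), which avoids a dependency on a statistics library.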
17.
Efficient inference of haplotypes from genotypes on a pedigree
We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large scale haplotype inference projects.
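Solving a system of linear equations over Z2 (equivalently GF(2), i.e. arithmetic modulo 2) by Gaussian elimination, and describing all of its solutions, can be sketched as follows. This is a generic solver, not PedPhase: in the haplotyping setting the unknowns would be binary phase or inheritance indicators, but building those constraints from a pedigree is not shown, and the small system in the usage example is hypothetical.

```python
def solve_gf2(A, b):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.

    Returns (particular_solution, null_space_basis), or None if inconsistent.
    Every solution is the particular solution XOR any subset of the basis vectors.
    """
    n_rows, n_cols = len(A), len(A[0])
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    pivots = []
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][col]), None)
        if pivot is None:
            continue  # no pivot: this column becomes a free variable
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(n_rows):
            if i != r and M[i][col]:
                M[i] = [x ^ y for x, y in zip(M[i], M[r])]  # row addition mod 2 is XOR
        pivots.append(col)
        r += 1
        if r == n_rows:
            break
    # A row reading 0 ... 0 | 1 means the constraints contradict each other.
    if any(not any(row[:-1]) and row[-1] for row in M):
        return None
    x = [0] * n_cols
    for i, col in enumerate(pivots):
        x[col] = M[i][-1]  # free variables set to 0
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [0] * n_cols
        v[f] = 1
        for i, col in enumerate(pivots):
            v[col] = M[i][f]
        basis.append(v)
    return x, basis


# Hypothetical constraints: x0 + x1 = 1 and x1 + x2 = 0 (mod 2).
print(solve_gf2([[1, 1, 0], [0, 1, 1]], [1, 0]))  # ([1, 0, 0], [[1, 1, 1]])
```

With k basis vectors there are exactly 2^k feasible assignments, which is how elimination yields all feasible configurations rather than just one.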
18.
Van Bers NE Crooijmans RP Groenen MA Dibbits BW Komen J 《Molecular ecology resources》2012,12(5):932-941
We have generated a unique resource consisting of nearly 175 000 short contig sequences and 3569 SNP markers from the widely cultured GIFT (Genetically Improved Farmed Tilapia) strain of Nile tilapia (Oreochromis niloticus). In total, 384 SNPs were selected to test the wider applicability of the SNPs by genotyping tilapia individuals from different strains and different geographical locations. In all strains and species tested (O. niloticus, O. aureus and O. mossambicus), the genotyping assay worked for a similar number of SNPs (288–305 SNPs). As expected, the number of polymorphic SNPs was highest for individuals from the GIFT population (255 SNPs). In individuals from an Egyptian strain and in individuals caught in the wild in the basin of the river Volta, 197 and 163 SNPs were polymorphic, respectively. A pairwise calculation of Nei's genetic distance allowed discrimination of the individual strains and species based on the genotypes determined with the SNP set. We expect that this set will be widely applicable in tilapia aquaculture, e.g. for pedigree reconstruction. In addition, this set is currently used to assay the genetic diversity of native Nile tilapia in areas where tilapia is, or will be, introduced in aquaculture projects. This allows escapees from aquaculture to be traced and the effects of introgression and hybridization to be monitored.
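A pairwise Nei distance from SNP genotypes can be computed as below. The abstract does not state which variant of Nei's distance was used, so this sketch assumes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx * Jy)), for biallelic SNPs; genotypes are assumed to be coded 0/1/2 with `None` for a failed call, and the function names and frequencies are hypothetical.

```python
import math


def allele_freq(genotypes):
    """Reference-allele frequency from 0/1/2 genotype codes, ignoring missing (None) calls."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called))


def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)) for biallelic SNPs.

    Jx, Jy and Jxy are averaged over loci; each locus contributes both alleles.
    """
    n = len(freqs_x)
    jx = sum(p * p + (1 - p) ** 2 for p in freqs_x) / n
    jy = sum(q * q + (1 - q) ** 2 for q in freqs_y) / n
    jxy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freqs_x, freqs_y)) / n
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical per-SNP frequencies for two strains.
print(round(nei_distance([0.9, 0.2, 0.5], [0.4, 0.3, 0.6]), 4))
```

Applying `nei_distance` to every pair of strains gives the distance matrix on which strains and species can be compared or clustered.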
19.
Matthews AG Haynes C Liu C Ott J 《Statistical applications in genetics and molecular biology》2008,7(1):Article23
Genome-wide association studies are now widely used tools to identify genes and/or regions which may contribute to the development of various diseases. With case-control data a 2x3 contingency table can be constructed for each SNP to perform genotype-based tests of association. An increasingly common technique to increase the power to detect an association is to collapse each 2x3 table into a table assuming either a dominant or recessive mode of inheritance (2x2 table). We consider three different methods of determining which genetic model to choose and show that each of these methods of collapsing genotypes increases the type I error rate (i.e., the rate of false positives). However, one of these methods does lead to an increase in power compared with the usual genotype- and allele-based tests for most genetic models.
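The type I error inflation caused by choosing a genetic model after seeing the data can be reproduced with a small null simulation. In this sketch, each simulated 2x3 table is collapsed into dominant and recessive 2x2 tables, and the larger Pearson chi-square statistic is compared with the single-test 5% critical value (3.841, 1 df). Picking the larger statistic is one simple selection rule chosen for illustration; it is not necessarily one of the three methods evaluated in the paper. Sample sizes, allele frequency and function names are hypothetical.

```python
import numpy as np

CRIT_1DF = 3.841  # 5% critical value of chi-square with 1 df


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def collapsed_stats(case, ctrl):
    """case/ctrl: genotype counts [n_aa, n_Aa, n_AA] (A = putative risk allele)."""
    dom = chi2_2x2(case[1] + case[2], case[0], ctrl[1] + ctrl[2], ctrl[0])
    rec = chi2_2x2(case[2], case[0] + case[1], ctrl[2], ctrl[0] + ctrl[1])
    return dom, rec


def null_rejection_rates(n_case=500, n_ctrl=500, p=0.3, reps=20000, seed=0):
    """Rejection rates under no association: dominant test alone vs. max(dominant, recessive)."""
    rng = np.random.default_rng(seed)
    hwe = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    rej_dom = rej_max = 0
    for _ in range(reps):
        case = rng.multinomial(n_case, hwe)
        ctrl = rng.multinomial(n_ctrl, hwe)
        dom, rec = collapsed_stats(case, ctrl)
        rej_dom += dom > CRIT_1DF
        rej_max += max(dom, rec) > CRIT_1DF
    return rej_dom / reps, rej_max / reps


print(null_rejection_rates(reps=5000))
```

The pre-specified dominant test should reject close to 5% of the time, while the "best of two" rule rejects noticeably more often, because the dominant and recessive statistics are only partly correlated.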