The number of polymorphisms identified with next‐generation sequencing approaches depends directly on the sequencing depth and therefore on the experimental cost. Although greater depth yields more sensitive and more specific SNP calls, economic constraints limit the depth achievable in whole‐genome resequencing (WGS). For this reason, capture resequencing is used for studies focusing on only specific regions of the genome. However, several biases in capture resequencing are known to reduce the sensitivity of SNP detection. Within this framework, the aim of this study was to compare the accuracy of SNP detection and genotype calling between WGS and capture resequencing, two approaches that differ in both sequencing depth and biases. We evaluated SNP calling and genotyping accuracy in a WGS dataset (13X) and in a capture resequencing dataset (87X), both performed on 11 individuals. Using a down‐sampling procedure on the capture dataset, the percentage of SNPs missed because of a sevenfold decrease in sequencing depth was estimated at 7.8%. Comparing the 87X capture dataset with the WGS dataset revealed that capture‐related biases led to the loss of 5.2% of the SNPs detected by WGS. Nevertheless, for SNPs detected by both approaches, capture sequencing appears to achieve far better genotyping: about 4.4% of WGS genotypes can be considered erroneous, rising to 10% for heterozygous genotypes. In conclusion, WGS and deep capture sequencing can be considered equivalent strategies for SNP detection, since the rate of SNPs missed because of low sequencing depth in the former is similar to the rate of SNPs missed because of method biases in the latter. On the other hand, deep capture sequencing is clearly better suited to studies that require highly accurate genotyping.
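The down-sampling comparison can be illustrated with a small simulation. This is a minimal sketch, not the study's pipeline: it assumes down-sampling means keeping each read independently with probability 13/87 (binomial thinning), and it uses a hypothetical toy caller (`toy_call`, with made-up `min_alt` and `min_depth` thresholds) in place of a real variant caller. The read counts are invented for illustration.

```python
import random

def thin_reads(ref_reads, alt_reads, keep_fraction, rng):
    """Binomial thinning: each read is kept independently with probability keep_fraction."""
    kept_ref = sum(1 for _ in range(ref_reads) if rng.random() < keep_fraction)
    kept_alt = sum(1 for _ in range(alt_reads) if rng.random() < keep_fraction)
    return kept_ref, kept_alt

def toy_call(ref_reads, alt_reads, min_alt=2, min_depth=4):
    """Hypothetical SNP caller: enough alternate reads at sufficient total depth."""
    return alt_reads >= min_alt and ref_reads + alt_reads >= min_depth

def fraction_missed(sites, keep_fraction, seed=1):
    """Share of SNPs called at full depth that are no longer called after thinning."""
    rng = random.Random(seed)
    called = [site for site in sites if toy_call(*site)]
    if not called:
        return 0.0
    missed = sum(1 for ref, alt in called
                 if not toy_call(*thin_reads(ref, alt, keep_fraction, rng)))
    return missed / len(called)

# Hypothetical (ref, alt) read counts at candidate sites in an ~87X dataset.
sites = [(44, 43), (40, 47), (80, 7), (85, 2), (87, 0)]
print(fraction_missed(sites, keep_fraction=13 / 87))
```

Thinning existing reads, rather than re-sequencing, keeps every other source of error fixed, so any SNP lost in the sketch is attributable to depth alone, which is the logic behind the down-sampling comparison.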

ABSTRACT: BACKGROUND: The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world's poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can reduce individual fitness and the response to selection. Genomic diversity in domestic livestock species is therefore of great importance: it is a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments and facilitates rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers, such as single nucleotide polymorphisms (SNPs), the most abundant source of genetic variation within the genome. RESULTS: Alignment of next generation sequencing data from 32 individual turkeys of different populations was used to discover 5.49 million SNPs, which were subsequently used to analyze genetic diversity among the populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican (SM) turkey population. Individual heterozygosity across all populations ranged from 0.17 to 2.73 SNPs/kb, while population heterozygosity ranged from 0.73 to 1.64 SNPs/kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys; these regions appear fixed for alleles that differ from the wild alleles. CONCLUSION: The turkey genome is much less diverse, with a relatively low frequency of heterozygous SNPs, than the genomes of other livestock species such as chicken and pig. 
The whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey.
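Heterozygosity expressed in SNPs/kb is a simple rate. The sketch below assumes it is computed as the number of heterozygous SNPs divided by the callable sequence length in kilobases; the abstract does not state the denominator, and the bird names and counts are hypothetical.

```python
def het_snps_per_kb(n_het_snps, callable_bp):
    """Heterozygous SNPs per kilobase of callable sequence (assumed definition)."""
    return n_het_snps / (callable_bp / 1_000)

# Hypothetical per-individual counts: (heterozygous SNPs, callable bases).
individuals = {
    "bird_1": (1_070_000, 1_000_000_000),
    "bird_2": (730_000, 1_000_000_000),
}
rates = {name: het_snps_per_kb(*counts) for name, counts in individuals.items()}
mean_rate = sum(rates.values()) / len(rates)  # unweighted mean over individuals
print(rates, mean_rate)
```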



Obtaining chloroplast genome sequences is important to increase knowledge about the fundamental biology of plastids, to understand evolutionary and ecological processes in the evolution of plants, to develop biotechnological applications (e.g. plastid engineering) and to improve the efficiency of breeding schemes. Extraction of pure chloroplast DNA is required for efficient sequencing of chloroplast genomes. Unfortunately, most protocols for extracting chloroplast DNA were developed for eudicots and do not yield DNA pure enough for shotgun sequencing of whole plastid genomes from the monocot grasses.

Methodology/Principal Findings

We have developed a simple and inexpensive method to obtain chloroplast DNA from grass species by modifying and extending protocols optimized for use in eudicots. Many protocols for extracting chloroplast DNA require an ultracentrifugation step to efficiently separate chloroplast DNA from nuclear DNA. The new method uses two more centrifugation steps than previously reported protocols and does not require an ultracentrifuge.


The described method delivered chloroplast DNA of very high quality from two grass species belonging to highly different taxonomic subfamilies within the grass family (Lolium perenne, Pooideae; Miscanthus×giganteus, Panicoideae). The DNA from Lolium perenne was used for whole chloroplast genome sequencing and detection of SNPs. The sequence is publicly available on EMBL/GenBank.

We use the patterns of homozygosity at multiple loci to distinguish between excess homozygosity caused by consanguineous mating and that due to undetected population subdivision (the Wahlund effect). Clarifying the underlying causes of excess homozygosity is of practical importance in explaining the occurrence of recessive genetic disorders and in forensic match probability calculations. We calculated a likelihood surface for two parameters: C, the proportion of the population practicing consanguinity, and theta, the genetic correlation due to population subdivision. To illustrate the method, we applied it to multilocus genotype data from two U.K. Asian populations, one practicing a high frequency of cousin marriage and another in which caste endogamy was suspected. The method successfully distinguished the different patterns of relatedness. It also returned accurate estimates of C and theta on simulated data sets. We show how our method can be extended to allow for degrees of inbreeding closer than cousin unions, including selfing. With closer inbreeding, the relatedness of recent ancestors beyond the parents becomes an issue.
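Multilocus patterns can separate the two causes because consanguinity acts on whole individuals (an offspring of a cousin union is inbred at every locus at once), whereas theta acts on everyone. A minimal sketch, assuming a mixture model in which a fraction C of individuals carries the first-cousin inbreeding coefficient 1/16 combined with theta and the rest carry theta alone; this is an illustrative model, not necessarily the authors' exact likelihood, and the allele frequencies and genotypes are hypothetical.

```python
import math

F_COUSIN = 1 / 16  # inbreeding coefficient of the offspring of first cousins

def genotype_prob(genotype, freqs, f):
    """P(genotype) at one locus given allele frequencies and inbreeding coefficient f."""
    a, b = genotype
    if a == b:
        p = freqs[a]
        return p * p * (1 - f) + p * f
    return 2 * freqs[a] * freqs[b] * (1 - f)

def individual_likelihood(genotypes, freqs_per_locus, c, theta):
    """Mixture: with prob c the individual is a cousin-union offspring (all loci inbred)."""
    f_consang = 1 - (1 - F_COUSIN) * (1 - theta)  # cousin inbreeding on top of theta
    like_consang = like_other = 1.0
    for g, freqs in zip(genotypes, freqs_per_locus):
        like_consang *= genotype_prob(g, freqs, f_consang)
        like_other *= genotype_prob(g, freqs, theta)
    return c * like_consang + (1 - c) * like_other

def log_likelihood_surface(sample, freqs_per_locus, grid):
    """Log-likelihood over a grid of (C, theta) values; theta values must be < 1."""
    return {
        (c, theta): sum(math.log(individual_likelihood(ind, freqs_per_locus, c, theta))
                        for ind in sample)
        for c in grid for theta in grid
    }

# Hypothetical data: three biallelic loci, alleles coded 0 and 1.
grid = [i / 10 for i in range(6)]  # 0.0, 0.1, ..., 0.5
freqs = [{0: 0.5, 1: 0.5}] * 3
sample = [[(0, 0), (1, 1), (0, 0)], [(0, 1), (0, 1), (1, 1)]]
surface = log_likelihood_surface(sample, freqs, grid)
best_c, best_theta = max(surface, key=surface.get)
```

Because the consanguinity term multiplies across loci within an individual, a few fully homozygous individuals among otherwise typical ones push the surface toward C, whereas a uniform homozygote excess spread across everyone pushes it toward theta.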

With an increased emphasis on genotyping single nucleotide polymorphisms (SNPs) in disease association studies, the genotyping platform of choice is constantly evolving. In addition, developing more specific SNP assays and appropriate genotype validation applications is becoming increasingly critical for resolving ambiguous genotypes. In this study, we used SNP-specific Locked Nucleic Acid (LNA) hybridization probes on a real-time PCR platform to genotype an association cohort, and we propose three criteria to address ambiguous genotypes. Based on the kinetic properties of PCR amplification, the three criteria address PCR amplification efficiency, the net fluorescence difference between maximal and minimal fluorescent signals, and the beginning of the exponential growth phase of the reaction. Initially observed SNP allelic discrimination curves were confirmed by DNA sequencing (n = 50), and applying our three genotype criteria corroborated both the sequencing and the observed real-time PCR results. In addition, the tested Caucasian association cohort was in Hardy-Weinberg equilibrium, and the observed allele frequencies were very similar to those of two independently tested Caucasian association cohorts for the same SNP. We present here a novel approach to effectively resolve ambiguous genotypes generated on a real-time PCR platform. Our three novel criteria provide an easy-to-use, semi-automated genotype confirmation protocol.
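The three kinetic criteria can each be computed from a single fluorescence curve. The sketch below is an illustration under stated assumptions, not the authors' protocol: onset is taken as the first cycle exceeding the baseline mean plus 10 baseline standard deviations, efficiency is estimated from a log-linear fit over three cycles starting at the onset, and the acceptance thresholds in `passes_criteria` are hypothetical.

```python
import math

def curve_metrics(fluorescence, baseline_cycles=5, k_sd=10.0, window=3):
    """Return (net_delta, onset_cycle, efficiency) for one amplification curve.

    net_delta:   maximal minus minimal fluorescence.
    onset_cycle: first cycle (1-based) above baseline mean + k_sd * baseline SD.
    efficiency:  exp(slope) - 1 from a log-linear fit of the baseline-subtracted
                 signal over `window` cycles starting at the onset.
    """
    base = fluorescence[:baseline_cycles]
    mean = sum(base) / len(base)
    sd = math.sqrt(sum((x - mean) ** 2 for x in base) / (len(base) - 1))
    threshold = mean + k_sd * max(sd, 1e-9)
    net_delta = max(fluorescence) - min(fluorescence)
    start = next((i for i, x in enumerate(fluorescence) if x > threshold), None)
    efficiency = None
    if start is not None and start + window <= len(fluorescence):
        signal = [fluorescence[start + j] - mean for j in range(window)]
        if all(s > 0 for s in signal):
            ys = [math.log(s) for s in signal]
            x_bar = (window - 1) / 2
            y_bar = sum(ys) / window
            slope = (sum((j - x_bar) * (y - y_bar) for j, y in enumerate(ys))
                     / sum((j - x_bar) ** 2 for j in range(window)))
            efficiency = math.exp(slope) - 1.0
    onset_cycle = None if start is None else start + 1
    return net_delta, onset_cycle, efficiency

def passes_criteria(metrics, min_delta=200.0, eff_range=(0.8, 1.2), max_onset=35):
    """Hypothetical acceptance rule combining the three criteria."""
    net_delta, onset_cycle, efficiency = metrics
    return (net_delta >= min_delta
            and onset_cycle is not None and onset_cycle <= max_onset
            and efficiency is not None
            and eff_range[0] <= efficiency <= eff_range[1])

# Synthetic curve: noisy baseline, then a signal that doubles every cycle.
curve = [100.0, 101.0, 99.0, 100.0, 100.0] + [100.0 + 2.0 ** i for i in range(1, 11)]
print(curve_metrics(curve), passes_criteria(curve_metrics(curve)))
```

A flat curve from a failed reaction never crosses the threshold, so it fails the onset criterion regardless of the other two.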

There is increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays to profile chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity at high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinctive feature of GPHMM is that these issues are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and showed that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the global parameters estimated by GPHMM provide information about the biological characteristics of tumor samples and the quality of the genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies.
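To make the role of global parameters concrete, the sketch below decodes copy-number states from log R ratios with a plain Viterbi HMM in which tumor purity and a baseline shift enter every emission mean. It is a generic illustration, not GPHMM: it ignores B-allele frequencies and GC bias, does not run EM, and the state set, emission standard deviation and transition probability are assumptions.

```python
import math

STATES = [1, 2, 3, 4]  # hypothetical copy-number states

def expected_lrr(copy_number, purity, shift):
    """Expected log R ratio of a tumor/normal mixture, plus a global baseline shift."""
    mixed = purity * copy_number + (1 - purity) * 2
    return math.log2(mixed / 2) + shift

def viterbi(lrr, purity, shift, sd=0.15, stay=0.99):
    """Most likely copy-number path given log R ratios (Gaussian emissions)."""
    n = len(STATES)
    log_trans = [[math.log(stay if i == j else (1 - stay) / (n - 1)) for j in range(n)]
                 for i in range(n)]
    means = [expected_lrr(s, purity, shift) for s in STATES]

    def log_emit(x, j):
        return -0.5 * ((x - means[j]) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    score = [math.log(1 / n) + log_emit(lrr[0], j) for j in range(n)]
    backpointers = []
    for x in lrr[1:]:
        prev = [max(range(n), key=lambda i: score[i] + log_trans[i][j]) for j in range(n)]
        score = [score[prev[j]] + log_trans[prev[j]][j] + log_emit(x, j) for j in range(n)]
        backpointers.append(prev)
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for prev in reversed(backpointers):
        state = prev[state]
        path.append(state)
    return [STATES[j] for j in reversed(path)]

# Hypothetical 50%-purity sample with a one-copy loss in the middle segment.
lrr = [0.0] * 10 + [math.log2(0.75)] * 10 + [0.0] * 10
print(viterbi(lrr, purity=0.5, shift=0.0))
```

Because every state mean depends on the same `purity` and `shift`, mis-specifying either one distorts all segments at once; this shared dependence is the motivation for estimating them jointly as global parameters.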



Arguably, genotypes and phenotypes may be linked in functional forms that are not well addressed by the linear additive models that are standard in quantitative genetics. It is therefore important to develop statistical learning models that predict phenotypic values from all available molecular information and can capture complex genetic network architectures. Bayesian kernel ridge regression is a nonparametric prediction model proposed for this purpose. Its essence is to create a spatial, distance-based relationship matrix called a kernel. Although the set of all single nucleotide polymorphism genotype configurations on which a model is built is finite, past research has mainly used a Gaussian kernel.
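The kernel idea can be sketched as ordinary kernel ridge regression with a Gaussian kernel on 0/1/2 genotype codes. The ridge prediction equals the posterior mean of a Bayesian model with a Gaussian prior when `lam` is the ratio of residual to genetic variance, but this sketch fixes `lam` and the bandwidth `h` by hand (both hypothetical values) rather than sampling them, and the small linear solver is included only to keep the example dependency-free.

```python
import math

def gaussian_kernel(x, z, h):
    """K(x, z) = exp(-||x - z||^2 / h) for marker vectors coded 0/1/2."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / h)

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def krr_predict(x_train, y_train, x_new, h=1.0, lam=0.1, kernel=gaussian_kernel):
    """Kernel ridge regression: y_hat = mean + K_new (K + lam I)^-1 (y - mean)."""
    mu = sum(y_train) / len(y_train)
    k = [[kernel(a, b, h) for b in x_train] for a in x_train]
    for i in range(len(k)):
        k[i][i] += lam
    alpha = solve(k, [y - mu for y in y_train])
    return [mu + sum(a_i * kernel(x, xt, h) for a_i, xt in zip(alpha, x_train))
            for x in x_new]

# Hypothetical training set: three individuals, three SNPs.
x_train = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
y_train = [1.0, 2.0, 3.0]
print(krr_predict(x_train, y_train, [[1, 1, 2]]))
```

Small `lam` makes the fit interpolate the training phenotypes, while large `lam` shrinks every prediction toward the mean; in the Bayesian version this trade-off is learned from the data.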


We sought to investigate the performance of a diffusion kernel, which was specifically developed to model discrete marker inputs, using Holstein cattle and wheat data. This kernel can be viewed as a discretization of the Gaussian kernel. The predictive ability of the diffusion kernel was similar to that of non-spatial distance-based additive genomic relationship kernels in the Holstein data, but outperformed the latter in the wheat data. However, the difference in performance between the diffusion and Gaussian kernels was negligible.
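The diffusion kernel for discrete inputs has a closed form. A minimal sketch, assuming each locus is treated as a complete graph on its three genotype codes (the construction of Kondor and Lafferty) and loci are combined by a product: the kernel then depends only on the Hamming distance d, K = r^d with r = (1 - exp(-3β)) / (1 + 2·exp(-3β)), which is the same as K = exp(-θd) with θ = -ln r. That exponential-of-distance form is one way to read the remark that the diffusion kernel is a discretization of the Gaussian kernel. The genotypes and β are hypothetical.

```python
import math

def diffusion_kernel(x, z, beta, n_states=3):
    """Diffusion kernel for discrete markers with n_states codes per locus.

    Product over loci of the complete-graph diffusion kernel, normalized so
    K(x, x) = 1: each locus where x and z differ multiplies the value by `ratio`.
    """
    e = math.exp(-n_states * beta)
    ratio = (1 - e) / (1 + (n_states - 1) * e)
    hamming = sum(1 for a, b in zip(x, z) if a != b)
    return ratio ** hamming

def kernel_matrix(genotypes, beta):
    return [[diffusion_kernel(a, b, beta) for b in genotypes] for a in genotypes]

genotypes = [[0, 1, 2, 2], [0, 1, 2, 0], [2, 1, 0, 0]]
K = kernel_matrix(genotypes, beta=0.5)
print(K)
```

Unlike the Gaussian kernel on Euclidean distance, this kernel treats a 0 vs 2 mismatch the same as a 0 vs 1 mismatch; with two states per locus, `ratio` reduces to tanh(β).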


It is concluded that the ability of a diffusion kernel to capture the total genetic variance is not better than that of a Gaussian kernel, at least for these data. Although the diffusion kernel as a choice of basis function may have potential for use in whole-genome prediction, our results imply that embedding genetic markers into a non-Euclidean metric space has very little impact on prediction. Our results suggest that use of the black-box Gaussian kernel is justified, given its connection to the diffusion kernel and its similar predictive performance.

Pear (Pyrus; 2n = 34), the third most important temperate fruit crop, has great nutritional and economic value. Despite the availability of many genomic resources in pear, it is challenging to genotype novel germplasm resources and breeding progeny in a timely and cost‐effective manner. Genotyping arrays can provide fast, efficient and high‐throughput genetic characterization of diverse germplasm, genetic mapping and breeding populations. We present here 200K AXIOM® PyrSNP, a large‐scale single nucleotide polymorphism (SNP) genotyping array to facilitate genotyping of Pyrus species. SNPs were discovered in a diverse panel of 113 re‐sequenced pear genotypes to promote broad adoption of the array. A set of 188 diverse accessions and an F1 population of 98 individuals from ‘Cuiguan’ × ‘Starkrimson’ were genotyped with the array to assess its effectiveness. A large majority of SNPs (166 335, or 83%) are of high quality. The high density and uniform distribution of the array SNPs facilitated prediction of centromeric regions on 17 pear chromosomes and, based on genetic mapping, significantly improved the genome assembly from 75.5% to 81.4%. Identification of a gene associated with flowering time and of candidate genes linked to the size of the fruit core via genome-wide association studies showed the usefulness of the array in pear genetic research. The newly developed high‐density SNP array is an important tool for rapid, high‐throughput genotyping in pear for genetic map construction, QTL identification and genomic selection.



The risk of common diseases is likely determined by a complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). High-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time, but traditional methods of data analysis are poorly suited to detecting complex interactions because of data sparseness in high dimensions, which often arises when data are available for a large number of SNPs but a relatively small number of samples. Investigating associations for each individual SNP, or interactions between SNPs, with traditional approaches is inefficient and prone to false positives. Associations should therefore be validated with multiple methods to minimize the likelihood of false-positive findings.
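The scale of the false-positive problem is easy to quantify: with m SNPs there are m(m - 1)/2 pairwise interaction tests. The sketch below counts them and applies two standard corrections, Bonferroni and the Benjamini-Hochberg false discovery rate procedure. These are named here as generic examples; the passage does not say which correction, if any, should be used, and the p-values are hypothetical.

```python
def n_pairwise_tests(n_snps):
    """Number of distinct SNP pairs, m(m - 1)/2."""
    return n_snps * (n_snps - 1) // 2

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error rate."""
    return alpha / n_tests

def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank  # largest rank whose p-value is under its BH bound
    return sorted(order[:k_max])

print(n_pairwise_tests(1000))  # 499500
print(bonferroni_threshold(0.05, n_pairwise_tests(1000)))
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.5]))
```

Even 1000 SNPs imply roughly half a million pairwise tests, which is why per-test thresholds become extremely stringent and why replication with independent methods is attractive.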

Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across traits, limited access to individual-level data for training and parameter tuning, and the demand for computational resources. To overcome the limitations of most existing methods, which make explicit assumptions about the underlying genetic architecture and need a separate validation dataset for parameter tuning, we develop a summary statistics-based nonparametric method that does not rely on validation datasets to tune parameters. In our implementation, we refine the commonly used likelihood assumption to deal with the discrepancy between summary statistics and the external reference panel. We also leverage the block structure of the reference linkage disequilibrium matrix to implement a parallel algorithm. Through simulations and applications to twelve traits, we show that our method is adaptive to different genetic architectures, statistically robust, and computationally efficient. Our method is available at https://github.com/eldronzhou/SDPR.
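Once a method of this kind has produced per-SNP effect sizes, predicting an individual's trait value is typically a weighted sum of allele counts. The sketch assumes 0/1/2 allele-count genotypes and a dict of posterior effect sizes; the SNP identifiers and weights are hypothetical, and the model fitting itself (the part SDPR addresses) is not shown.

```python
def polygenic_score(genotypes, weights):
    """Sum of effect size x allele count over SNPs present in both dicts."""
    return sum(w * genotypes[snp] for snp, w in weights.items() if snp in genotypes)

# Hypothetical posterior effect sizes and one individual's 0/1/2 allele counts.
weights = {"snp_1": 0.10, "snp_3": -0.20, "snp_4": 0.05}
person = {"snp_1": 2, "snp_2": 0, "snp_3": 1}
score = polygenic_score(person, weights)
print(score)
```

SNPs missing from either dictionary are silently skipped here; in practice the overlap between the weight set and the genotyped SNPs should be checked, since a low overlap degrades prediction.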

To decipher the genetic architecture of human disease, various types of omics data are generated. Two common types are genotypes and gene expression. For biological and technical reasons, genotype data are often generated for a large number of individuals but gene expression data for only a few, leading to unequal sample sizes across omics data types. The lack of a standard statistical procedure for integrating such datasets motivates us to propose a two-step multi-locus association method using latent variables. Our method is more powerful than single or separate omics data analyses, and it comprehensively uncovers deep-seated signals through a single statistical model. Extensive simulations confirm that it is robust to various genetic models and that its power increases with sample size and with the number of associated loci. It computes p-values very quickly. Application to a real psoriasis dataset identified 17 novel SNPs, functionally related to psoriasis-associated genes, at a much smaller sample size than a standard GWAS.

MOTIVATION: The wealth of single nucleotide polymorphism (SNP) data within candidate genes, and anticipated across the genome, poses enormous analytical problems for studies of genotype-to-phenotype relationships, and modern data mining methods may be particularly well suited to meet these growing challenges. In this paper, we introduce the method of Belief (Bayesian) networks to the domain of genotype-to-phenotype analyses and provide an example application. RESULTS: A Belief network is a probabilistic graphical model that represents a joint multivariate probability distribution and reflects conditional independencies between variables. Given the data, the optimal network topology can be estimated with the assistance of heuristic search algorithms and scoring criteria. Statistical significance of edge strengths can be evaluated using Bayesian methods and bootstrapping. As an example application, the method of Belief networks was applied to 20 SNPs in the apolipoprotein (apo) E gene and plasma apoE levels in a sample of 702 individuals from Jackson, MS. Plasma apoE level was the primary target variable. These analyses indicate that the edge between SNP 4075, which codes for the well-known epsilon2 allele, and plasma apoE level was strong. Belief networks can effectively describe complex uncertain processes and can both learn from data and incorporate prior knowledge. AVAILABILITY: Various alternative and supplemental networks (not given in the text), as well as source code extensions, are available from the authors. SUPPLEMENTARY INFORMATION: http://bioinformatics.oxfordjournals.org.
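Score-based structure search can be illustrated with a single local score. The sketch computes a BIC score for a discrete node given a candidate parent set from counts; a heuristic search such as greedy hill climbing would compare such scores across edge additions and deletions. BIC is an assumed choice (the abstract does not name its scoring criterion), and the variable names `snp_4075` and `apoE_level` (discretized into low/high) and the data are hypothetical.

```python
import math
from collections import Counter

def bic_local_score(data, child, parents):
    """BIC score of a discrete node given its parent set (higher is better)."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    # Maximized log-likelihood of the child's conditional probability table.
    log_lik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    child_states = len({r[child] for r in data})
    parent_configs = 1
    for p in parents:
        parent_configs *= len({r[p] for r in data})
    n_params = parent_configs * (child_states - 1)
    return log_lik - 0.5 * n_params * math.log(n)

# Hypothetical data in which the SNP fully determines the discretized apoE level.
data = ([{"snp_4075": 0, "apoE_level": "low"}] * 50
        + [{"snp_4075": 1, "apoE_level": "high"}] * 50)
with_edge = bic_local_score(data, "apoE_level", ["snp_4075"])
without_edge = bic_local_score(data, "apoE_level", [])
print(with_edge > without_edge)
```

The penalty term grows with the number of parent configurations, so an edge is kept only when the gain in likelihood outweighs the extra parameters.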

The objective of the present study was to estimate genetic parameters for body weight at different ages in Arabi sheep using data collected from 1999 to 2009. The traits investigated were birth weight (N = 2776), weaning weight (N = 2002) and weight at six months of age (N = 1885). The data were analyzed by restricted maximum likelihood, fitting univariate and multivariate animal models. All three weight traits were significantly influenced by birth year, sex and birth type, whereas age of dam significantly affected only birth weight. Log-likelihood ratio tests were conducted to determine the most suitable model for each growth trait in the univariate analyses. Direct and total heritability estimates (based on the best model) were 0.42 and 0.16 for birth weight (model 4), 0.38 and 0.13 for weaning weight (model 4), and 0.14 and 0.14 for weight at six months of age (model 1). Maternal heritability estimates for birth weight and weaning weight were 0.22 and 0.18, respectively. Genetic and phenotypic correlations among these traits were all positive; phenotypic correlations were low to moderate, and genetic correlations were higher than the corresponding phenotypic correlations. Weaning weight had a strong and significant correlation with weight at six months of age (0.99). We conclude that selection can be based on weaning weight instead of the present practice of selecting on weight at six months.
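The gap between direct and total heritability comes from the maternal terms. A minimal sketch of Willham's total heritability, h²_T = (σ²_a + 0.5σ²_m + 1.5σ_am) / σ²_p, using hypothetical variance components (not the study's estimates) to show how a negative direct-maternal covariance pulls h²_T below the direct heritability.

```python
def direct_heritability(var_a, var_p):
    return var_a / var_p

def total_heritability(var_a, var_m, cov_am, var_p):
    """Willham's total heritability: (var_a + 0.5 var_m + 1.5 cov_am) / var_p."""
    return (var_a + 0.5 * var_m + 1.5 * cov_am) / var_p

# Hypothetical variance components, phenotypic variance scaled to 1.
var_a, var_m, cov_am, var_p = 0.42, 0.22, -0.20, 1.0
print(direct_heritability(var_a, var_p), total_heritability(var_a, var_m, cov_am, var_p))
```

With no maternal variance or covariance, h²_T reduces to the direct heritability, which is consistent with the identical 0.14 estimates for weight at six months if, as in the usual model numbering, model 1 contains only the direct additive effect.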

A large F2 cross with 920 Japanese quail was used to map QTL for phosphorus utilization, calcium utilization, feed per gain and body weight gain. In addition, four bone ash traits were included, because it is known that they are genetically correlated with the focal trait of phosphorus utilization. Trait recording was done at the juvenile stage of the birds. The individuals were genotyped genome‐wide for about 4k SNPs and a linkage map constructed, which agreed well with the reference genome. QTL linkage mapping was performed using multimarker regression analysis in a line cross model. Single marker association mapping was done within the mapped QTL regions. The results revealed several genome‐wide significant QTL. For the focal trait phosphorus utilization, a QTL on chromosome CJA3 could be detected by linkage mapping, which was substantiated by the results of the SNP association mapping. Four candidate genes were identified for this QTL, which should be investigated in future functional studies. Some overlap of QTL regions for different traits was detected, which is in agreement with the corresponding genetic correlations. It seems that all traits investigated are polygenic in nature with some significant QTL and probably many other small‐effect QTL that were not detectable in this study.
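The single-marker step can be sketched as a least-squares regression of the phenotype on the allele count at each SNP in a region. This is a simplification for illustration: it ignores the F2 family structure and the line-cross coding of additive and dominance effects, and the marker names, genotypes and phenotypes are hypothetical.

```python
import math

def single_marker_test(genotypes, phenotypes):
    """OLS of phenotype on allele count (0/1/2); returns (slope, t statistic).

    Assumes the marker is polymorphic (non-zero genotype variance).
    """
    n = len(genotypes)
    g_bar = sum(genotypes) / n
    y_bar = sum(phenotypes) / n
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

def scan(markers, phenotypes):
    """Test every marker in a region; markers maps marker name -> allele counts."""
    return {name: single_marker_test(g, phenotypes) for name, g in markers.items()}

phenotypes = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
markers = {"snp_a": [0, 0, 1, 1, 2, 2], "snp_b": [0, 1, 0, 1, 0, 1]}
results = scan(markers, phenotypes)
print(results)
```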

Genomic approaches hold great promise for resolving unanswered questions about transmission patterns and responses to control efforts for schistosomiasis and other neglected tropical diseases. However, the cost of generating genomic data and the challenges associated with obtaining sufficient DNA from individual schistosome larvae (miracidia) from mammalian hosts have limited the application of genomic data for studying schistosomes and other complex macroparasites. Here, we demonstrate the feasibility of utilizing whole genome amplification and sequencing (WGS) to analyze individual archival miracidia. As an example, we sequenced whole genomes of 22 miracidia from 11 human hosts representing two villages in rural Sichuan, China, and used these data to evaluate patterns of relatedness and genetic diversity. We also down-sampled our dataset to test how lower coverage sequencing could increase the cost effectiveness of WGS while maintaining power to accurately infer relatedness. Collectively, our results illustrate that population-level WGS datasets are attainable for individual miracidia and represent a powerful tool for ultimately providing insight into overall genetic diversity, parasite relatedness, and transmission patterns for better design and evaluation of disease control efforts.

Biologists frequently face the problem of dealing with data sets that contain a small amount of data and a high proportion of missing information. We were particularly interested in analysing fragmentary data sets generated by the application of molecular methods in palaeoanthropology in order to determine whether individuals are genetically related. In this note, we announce the release of the software burial (version 1.0) to test the null hypothesis that the observed grouping of individuals at a particular burial site reflects random placement of genotypes. The proposed test, however, can also be applied to data sets whose objects can be grouped according to nongenetic criteria such as the style of clothing, the kind of burial gifts or cultural artefacts. The C++ source code and binary executables for Windows and Linux are available for download at: http://www.uni‐tuebingen.de/uni/bcm/BURIAL/index.html .
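The null hypothesis of random placement lends itself to a label-permutation test. A generic sketch, not the algorithm implemented in burial: the statistic is the mean similarity of individuals sharing a group label, similarity is the proportion of identical calls among loci typed in both individuals (`None` marks missing data), and the p-value counts permutations at least as extreme as the observed value. The genotypes and group labels are hypothetical.

```python
import random

def shared_fraction(a, b):
    """Proportion of identical calls among loci observed in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)

def within_group_similarity(profiles, groups, similarity):
    """Mean pairwise similarity over pairs of individuals sharing a group label."""
    total, pairs = 0.0, 0
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if groups[i] == groups[j]:
                total += similarity(profiles[i], profiles[j])
                pairs += 1
    return total / pairs if pairs else 0.0

def permutation_test(profiles, groups, n_perm=999, seed=0, similarity=shared_fraction):
    """Return (observed statistic, permutation p-value) under random placement."""
    rng = random.Random(seed)
    observed = within_group_similarity(profiles, groups, similarity)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if within_group_similarity(profiles, labels, similarity) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical burial site: two graves whose occupants share genotypes.
profiles = [[1, 1, 1, 1]] * 4 + [[2, 2, 2, 2]] * 4
graves = ["A"] * 4 + ["B"] * 4
print(permutation_test(profiles, graves))
```

Because `similarity` is a parameter, the same test applies to nongenetic attributes such as burial gifts, and permuting labels rather than genotypes leaves the pattern of missing data untouched.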

