Found 20 similar articles (search time: 15 ms)
1.
Many plants and some animal species are polyploids. Nondisomically inherited markers (e.g. microsatellites) in such species cannot be analysed directly by standard population genetics methods developed for diploid species. One solution is to transform the polyploid codominant genotypes to pseudodiploid‐dominant genotypes, which can then be analysed by standard methods for various purposes such as spatial genetic structure, individual relatedness and relationship. Although this data transformation approach has been used repeatedly in the literature, no systematic study has been conducted to investigate how efficient it is, how much marker information is lost and thus how much analysis accuracy is reduced. More specifically, it is unknown whether or not the transformed data can be used to infer parentage and sibship jointly, and how different sampling schemes (number and polymorphism of markers, number of individuals) and ploidy level affect the inference accuracy. This study analyses both simulated and empirical data to examine the effects of ploidy level, actual pedigree structure, and marker number and polymorphism on the accuracy of joint parentage and sibship assignments in polyploid species. We show that sibship, parentage and selfing rates in polyploids can be inferred accurately from a typical set of microsatellite loci. We also show that inferences can be substantially improved by allowing for a small genotyping error rate to accommodate the distortion in assumed Mendelian inheritance of the converted markers when large sibship groups are involved. The results are discussed in the context of polyploid data analysis in molecular ecology.
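The transformation described in this abstract can be illustrated with a minimal sketch. It assumes one common reading of "pseudodiploid‐dominant": every allele observed at a codominant locus becomes its own dominant pseudo-locus, scored 1 if the allele is present in an individual (dosage unknown) and 0 if absent. The function name `to_dominant` and the sample data are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def to_dominant(genotypes: Dict[str, Set[int]]) -> Dict[str, Dict[int, int]]:
    """Recode one codominant locus as dominant presence/absence markers.

    `genotypes` maps an individual ID to the set of alleles observed at the
    locus. Each distinct allele becomes one pseudo-locus scored 1 (present,
    copy number unknown) or 0 (absent).
    """
    alleles = sorted(set().union(*genotypes.values()))
    return {
        ind: {allele: int(allele in observed) for allele in alleles}
        for ind, observed in genotypes.items()
    }


# Hypothetical tetraploid microsatellite phenotypes (allele sizes in bp).
locus = {
    "ind1": {152, 156, 160},
    "ind2": {152},
    "ind3": {156, 164},
}
dominant = to_dominant(locus)
```

The recoding discards allele dosage, which is the information loss the abstract sets out to quantify.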
2.
Sorting duplicated loci disentangles complexities of polyploid genomes masked by genotyping by sequencing
Many plants and animals of polyploid origin are currently enjoying a genomics explosion enabled by modern sequencing and genotyping technologies. However, routine filtering of duplicated loci in most studies using genotyping by sequencing introduces an unacceptable, but often overlooked, bias when detecting selection. Retained duplicates from ancient whole‐genome duplications (WGDs) may be found throughout genomes, whereas retained duplicates from recent WGDs are concentrated at distal ends of some chromosome arms. Additionally, segmental duplicates can be found at distal ends or nearly anywhere in a genome. Evidence shows that these duplications facilitate adaptation through one of two pathways: neo‐functionalization or increased gene expression. Filtering duplicates removes distal ends of some chromosomes, and distal ends are especially known to harbour adaptively important genes. Thus, filtering of duplicated loci impoverishes the interpretation of genomic data as signals from contiguous duplicated genes are ignored. We review existing strategies to genotype and map duplicated loci; we focus in detail on an overlooked strategy of using gynogenetic haploids (1N) as a part of new genotyping by sequencing studies. We provide guidelines on how to use this haploid strategy for studies on polyploid‐origin vertebrates including how it can be used to screen duplicated loci in natural populations. We conclude by discussing areas of research that will benefit from better inclusion of polyploid loci; we particularly stress the sometimes overlooked fact that basing genomic studies on dense maps adds value by locating and annotating outlier loci or colocating outliers into islands of divergence.
3.
4.
Ivone de Bem Oliveira Rodrigo Rampazo Amadeu Luis Felipe Ventorim Ferro Patricio R. Muñoz 《Heredity》2020,125(6):437
Blueberry (Vaccinium spp.) is an important autopolyploid crop with significant benefits for human health. Apart from its genetic complexity, the feasibility of genomic prediction has been proven for blueberry, enabling a reduction in the breeding cycle time and increasing genetic gain. However, as for other polyploid crops, sequencing costs still hinder the implementation of genome-based breeding methods for blueberry. This motivated us to evaluate the effect of training population size and composition, as well as the impact of marker density and sequencing depth, on phenotype prediction for the species. For this, data from a large real breeding population of 1804 individuals were used. Genotypic data from 86,930 markers and three traits with different genetic architecture (fruit firmness, fruit weight, and total yield) were evaluated. Our results suggest that marker density, sequencing depth, and training population size can be substantially reduced with no significant impact on model accuracy. Our results can help guide decisions toward resource allocation (e.g., genotyping and phenotyping) in order to maximize prediction accuracy. These findings have the potential to allow for a faster and more accurate release of varieties with a substantial reduction of resources for the application of genomic prediction in blueberry. We anticipate that the benefits and pipeline described in our study can be applied to optimize genomic prediction for other diploid and polyploid species.
5.
DNA extracted from hair or faeces shows increasing promise for censusing populations whose individuals are difficult to locate. To date, the main problem with this approach has been that genotyping errors are common. If these errors are not identified, counting genotypes is likely to overestimate the number of individuals in a population. Here, we describe an algorithm that uses maximum likelihood estimates of genotyping error rates to calculate the evidence that samples came from the same individual. We test this algorithm with a hypothetical model of genotyping error and show that this algorithm works well with substantial rates of genotyping error and reasonable amounts of data. Additional work is necessary to develop statistical models of error in empirical data.
6.
A genotype calling algorithm for the Illumina BeadArray platform (cited 2 times: 0 self-citations, 2 by others)
Teo YY Inouye M Small KS Gwilliam R Deloukas P Kwiatkowski DP Clark TG 《Bioinformatics (Oxford, England)》2007,23(20):2741-2746
MOTIVATION: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated and easy-to-use software for calling genotypes. RESULTS: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy. AVAILABILITY: The C++ executable for the algorithm described here is available by request from the authors.
7.
Bevan E. Huang Marco Maccaferri Silvio Salvi Sara G. Milner Luigi Cattivelli Anna M. Mastrangelo Alex Whan Stuart Stephen Gary Barker Ralf Wieseke Joerg Plieske International Wheat Genome Sequencing Consortium Morten Lillemo Diane Mather Rudi Appels Rudy Dolferus Gina Brown‐Guedira Abraham Korol Alina R. Akhunova Catherine Feuillet Jerome Salse Michele Morgante Curtis Pozniak Ming‐Cheng Luo Jan Dvorak Matthew Morell Jorge Dubcovsky Martin Ganal Roberto Tuberosa Cindy Lawley Ivan Mikoulitch Colin Cavanagh Keith J. Edwards Matthew Hayden Eduard Akhunov 《Plant biotechnology journal》2014,12(6):787-796
High‐density single nucleotide polymorphism (SNP) genotyping arrays are a powerful tool for studying genomic patterns of diversity, inferring ancestral relationships between individuals in populations and studying marker–trait associations in mapping experiments. We developed a genotyping array including about 90 000 gene‐associated SNPs and used it to characterize genetic variation in allohexaploid and allotetraploid wheat populations. The array includes a significant fraction of common genome‐wide distributed SNPs that are represented in populations of diverse geographical origin. We used density‐based spatial clustering algorithms to enable high‐throughput genotype calling in complex data sets obtained for polyploid wheat. We show that these model‐free clustering algorithms provide accurate genotype calling in the presence of multiple clusters including clusters with low signal intensity resulting from significant sequence divergence at the target SNP site or gene deletions. Assays that detect low‐intensity clusters can provide insight into the distribution of presence–absence variation (PAV) in wheat populations. A total of 46 977 SNPs from the wheat 90K array were genetically mapped using a combination of eight mapping populations. The developed array and cluster identification algorithms provide an opportunity to infer detailed haplotype structure in polyploid wheat and will serve as an invaluable resource for diversity studies and investigating the genetic basis of trait variation in wheat.
8.
Relatedness between individuals is central to ecological genetics. Multiple methods are available to quantify relatedness from molecular data, including method-of-moment and maximum-likelihood estimators. We describe a maximum-likelihood estimator for autopolyploids, and quantify its statistical performance under a range of biologically relevant conditions. The statistical performances of five additional polyploid estimators of relatedness were also quantified under identical conditions. When comparing truncated estimators, the maximum-likelihood estimator exhibited lower root mean square error under some conditions and was more biased for non-relatives, especially when the number of alleles per locus was low. However, even under these conditions, this bias was reduced to be statistically insignificant with more robust genetic sampling. We also considered ambiguity in polyploid heterozygote genotyping and developed a weighting methodology for candidate genotypes. The statistical performances of three polyploid estimators under both ideal and actual conditions (including inbreeding and double reduction) were compared. The software package POLYRELATEDNESS is available to perform this estimation and supports a maximum ploidy of eight.
9.
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%-94% of genotyping errors depending on the pedigree topology.
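The abstract builds on systems of linear equations over GF(2). As a rough illustration of that building block only (not the authors' ILP or their HCME heuristic), the sketch below solves such a system by Gaussian elimination, where addition is XOR. The function name `solve_gf2` and the reading of the binary variables as phase indicators are assumptions made for this example.

```python
from typing import List, Optional


def solve_gf2(rows: List[List[int]], rhs: List[int]) -> Optional[List[int]]:
    """Solve A x = b over GF(2) by Gaussian elimination.

    Returns one solution with free variables set to 0, or None when the
    equations are inconsistent.
    """
    n_vars = len(rows[0])
    # Build an augmented matrix; copying leaves the caller's data untouched.
    aug = [row[:] + [b] for row, b in zip(rows, rhs)]
    pivot_cols = []
    r = 0
    for c in range(n_vars):
        # Look for a row at or below r that has a 1 in column c.
        pivot = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        # Clear column c in every other row (addition in GF(2) is XOR).
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivot_cols.append(c)
        r += 1
    # A row reading 0 = 1 means the constraints cannot all hold.
    if any(row[-1] and not any(row[:-1]) for row in aug):
        return None
    x = [0] * n_vars
    for i, c in enumerate(pivot_cols):
        x[c] = aug[i][-1]
    return x


# Hypothetical constraints: x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (mod 2).
solution = solve_gf2([[1, 1, 0], [0, 1, 1], [1, 0, 1]], [1, 0, 1])
# Contradictory constraints: x0 + x1 = 1 and x0 + x1 = 0.
conflict = solve_gf2([[1, 1], [1, 1]], [1, 0])
```

An inconsistent system is the interesting case in this setting: in a formulation like the one the abstract describes, it would signal that some genotype must be explained by an error or mutation.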
10.
High-resolution melting analysis for SNP genotyping and mapping in tetraploid alfalfa (Medicago sativa L.) (cited 1 time: 0 self-citations, 1 by others)
Han Y Khu DM Monteros MJ 《Molecular breeding : new strategies in plant improvement》2012,29(2):489-501
Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic polymorphism in plant genomes. SNP markers are valuable tools for genetic analysis of complex traits of agronomic importance, linkage and association mapping, genome-wide selection, map-based cloning, and marker-assisted selection. Current challenges for SNP genotyping in polyploid outcrossing species include multiple alleles per locus and lack of high-throughput methods suitable for variant detection. In this study, we report on a high-resolution melting (HRM) analysis system for SNP genotyping and mapping in outcrossing tetraploid genotypes. The sensitivity and utility of this technology are demonstrated by identification of the parental genotypes and segregating progeny in six alfalfa populations based on unique melting curve profiles due to differences in allelic composition at one or multiple loci. HRM using a 384-well format is a fast, consistent, and efficient approach for SNP discovery and genotyping, useful in polyploid species with uncharacterized genomes. Possible applications of this method include variation discovery, analysis of candidate genes, genotyping for comparative and association mapping, and integration of genome-wide selection in breeding programs.
11.
A statistical framework for quantitative trait mapping (cited 39 times: 0 self-citations, 39 by others)
We describe a general statistical framework for the genetic analysis of quantitative trait data in inbred line crosses. Our main result is based on the observation that, by conditioning on the unobserved QTL genotypes, the problem can be split into two statistically independent and manageable parts. The first part involves only the relationship between the QTL and the phenotype. The second part involves only the location of the QTL in the genome. We developed a simple Monte Carlo algorithm to implement Bayesian QTL analysis. This algorithm simulates multiple versions of complete genotype information on a genomewide grid of locations using information in the marker genotype data. Weights are assigned to the simulated genotypes to capture information in the phenotype data. The weighted complete genotypes are used to approximate quantities needed for statistical inference of QTL locations and effect sizes. One advantage of this approach is that only the weights are recomputed as the analyst considers different candidate models. This device allows the analyst to focus on modeling and model comparisons. The proposed framework can accommodate multiple interacting QTL, nonnormal and multivariate phenotypes, covariates, missing genotype data, and genotyping errors in any type of inbred line cross. A software tool implementing this procedure is available. We demonstrate our approach to QTL analysis using data from a mouse backcross population that is segregating multiple interacting QTL associated with salt-induced hypertension.
12.
Wensheng Zhu Anthony Y. C. Kuk Jianhua Guo 《Biometrical journal. Biometrische Zeitschrift》2009,51(4):644-658
Inference of haplotypes is important in genetic epidemiology studies. However, all large genotype data sets have errors due to the use of inexpensive genotyping machines that are fallible and shortcomings in genotype scoring software, which can have an enormous impact on haplotype inference. In this article, we propose two novel strategies to reduce the impact of genotyping errors on haplotype inference. The first method makes use of double sampling. For each individual, the “GenoSpectrum”, which consists of all possible genotypes and their corresponding likelihoods, is computed. The second method is a genotype clustering algorithm based on multi‐genotyping data, which also assigns a “GenoSpectrum” to each individual. We then describe two hybrid EM algorithms (called DS‐EM and MG‐EM) that perform haplotype inference based on the “GenoSpectrum” of each individual obtained by double sampling and multi‐genotyping data. Both simulated data sets and a quasi‐real data set demonstrate that our proposed methods perform well in different situations and outperform the conventional EM algorithm and the HMM algorithm proposed by Sun, Greenwood, and Neal (2007, Genetic Epidemiology 31, 937–948) when the genotype data sets have errors.
13.
Eduard Akhunov Charles Nicolet Jan Dvorak 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2009,119(3):507-517
Single nucleotide polymorphisms (SNPs) are indispensable in such applications as association mapping and construction of high-density genetic maps. These applications usually require genotyping of thousands of SNPs in a large number of individuals. Although a number of SNP genotyping assays are available, most of them are designed for SNP genotyping in diploid individuals. Here, we demonstrate that the Illumina GoldenGate assay could be used for SNP genotyping of homozygous tetraploid and hexaploid wheat lines. Genotyping reactions could be carried out directly on genomic DNA without the necessity of preliminary PCR amplification. A total of 53 tetraploid and 38 hexaploid homozygous wheat lines were genotyped at 96 SNP loci. The genotyping error rate estimated after removal of low-quality data was 0 and 1% for tetraploid and hexaploid wheat, respectively. Developed SNP genotyping assays were shown to be useful for genotyping wheat cultivars. This study demonstrated that the GoldenGate assay is a very efficient tool for high-throughput genotyping of polyploid wheat, opening new possibilities for the analysis of genetic variation in wheat and dissection of the genetic basis of complex traits using an association mapping approach.
14.
15.
Wei Liu Jinhui Ding Jesse Raphael Gibbs Sue Jane Wang John Hardy Andrew Singleton 《Molecular systems biology》2009,5(1)
Here we propose a simple statistical algorithm for rapidly scoring loci associated with disease or traits due to recessive mutations or deletions using genome‐wide single nucleotide polymorphism genotyping case–control data in unrelated individuals. This algorithm identifies loci by defining homozygous segments of the genome present at significantly different frequencies between cases and controls. We found that false positive loci could be effectively removed from the output of this procedure by applying different physical size thresholds for the homozygous segments. This procedure is then conducted iteratively using random sub‐datasets until the number of selected loci converges. We demonstrate this method in a publicly available data set for Alzheimer's disease and identify 26 candidate risk loci in the 22 autosomes. In this data set, these loci can explain 75% of the genetic risk variability of the disease.
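The first step this abstract mentions, finding homozygous segments and discarding those below a physical size threshold, can be sketched as follows. This is a simplified illustration under stated assumptions: genotypes are two-letter strings such as "AA" or "AG", anything else is treated as missing and ends a run, and the function name `homozygous_segments` is hypothetical. The case–control frequency comparison and the iterative resampling described in the abstract are not covered.

```python
from typing import List, Tuple


def homozygous_segments(
    genotypes: List[str], positions: List[int], min_bp: int
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for runs of consecutive homozygous SNPs.

    Only runs spanning at least `min_bp` base pairs are reported.
    """
    segments = []
    start = None  # index of the first SNP in the current run
    # The empty-string sentinel closes a run that reaches the last SNP.
    for i, g in enumerate(genotypes + [""]):
        is_hom = len(g) == 2 and g[0] == g[1] and g[0] in "ACGT"
        if is_hom and start is None:
            start = i
        elif not is_hom and start is not None:
            first, last = positions[start], positions[i - 1]
            if last - first >= min_bp:
                segments.append((first, last))
            start = None
    return segments


# Hypothetical SNP calls along one chromosome, with positions in bp.
calls = ["AA", "GG", "CC", "AG", "TT", "TT"]
coords = [100, 600, 1500, 2000, 2500, 2700]
long_runs = homozygous_segments(calls, coords, min_bp=1000)
all_runs = homozygous_segments(calls, coords, min_bp=100)
```

Raising `min_bp` drops the short second run, which mirrors how a size threshold can filter out segments that are homozygous by chance.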
16.
Oilseed rape: learning about ancient and recent polyploid evolution from a recent crop species
Oilseed rape (Brassica napus) is one of our youngest crop species, arising several times under cultivation in the last few thousand years and completely unknown in the wild. Oilseed rape originated from hybridisation events between progenitor diploid species B. rapa and B. oleracea, both important vegetable species. The diploid progenitors are also ancient polyploids, with remnants of two previous polyploidisation events evident in the triplicated genome structure. This history of polyploid evolution and human agricultural selection makes B. napus an excellent model with which to investigate processes of genomic evolution and selection in polyploid crops. The ease of de novo interspecific hybridisation, responsiveness to tissue culture, and the close relationship of oilseed rape to the model plant Arabidopsis thaliana, coupled with the recent availability of reference genome sequences and suites of molecular cytogenetic and high‐throughput genotyping tools, allow detailed dissection of genetic, genomic and phenotypic interactions in this crop. In this review we discuss the past and present uses of B. napus as a model for polyploid speciation and evolution in crop species, along with current and developing analysis tools and resources. We further outline unanswered questions that may now be tractable to investigation.
17.
Giancola S McKhann HI Bérard A Camilleri C Durand S Libeau P Roux F Reboud X Gut IG Brunel D 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2006,112(6):1115-1124
The application of high-throughput SNP genotyping is a great challenge for many research projects in the plant genetics domain. The GOOD assay for mass spectrometry, Amplifluor and TaqMan are three methods that rely on different principles for allele discrimination and detection, specifically, primer extension, allele-specific PCR and hybridization, respectively. First, with the goal of assessing allele frequencies by means of SNP genotyping, we compared these methods on a set of three SNPs present in the herbicide resistance genes CSR, AXR1 and IXR1 of Arabidopsis thaliana. In this comparison, we obtained the best results with TaqMan based on PCR specificity, flexibility in primer design and success rate. We also used mass spectrometry for genotyping polyploid species. Finally, a combination of the three methods was used for medium- to high-throughput genotyping in a number of different plant species. Here, we show that all three genotyping technologies are successful in discriminating alleles in various plant species and discuss the factors that must be considered in assessing which method to use for a given application.
18.
A new statistical method for haplotype reconstruction from population data (cited 151 times: 0 self-citations, 151 by others)
Current routine genotyping methods typically do not provide haplotype information, which is essential for many analyses of fine-scale molecular-genetics data. Haplotypes can be obtained, at considerable cost, experimentally or (partially) through genotyping of additional family members. Alternatively, a statistical method can be used to infer phase and to reconstruct haplotypes. We present a new statistical method, applicable to genotype data at linked loci from a population sample, that improves substantially on current algorithms; often, error rates are reduced by > 50%, relative to its nearest competitor. Furthermore, our algorithm performs well in absolute terms, suggesting that reconstructing haplotypes experimentally or by genotyping additional family members may be an inefficient use of resources.
19.
Allelic dropout is a commonly observed source of missing data in microsatellite genotypes, in which one or both allelic copies at a locus fail to be amplified by the polymerase chain reaction. Especially for samples with poor DNA quality, this problem causes a downward bias in estimates of observed heterozygosity and an upward bias in estimates of inbreeding, owing to mistaken classifications of heterozygotes as homozygotes when one of the two copies drops out. One general approach for avoiding allelic dropout involves repeated genotyping of homozygous loci to minimize the effects of experimental error. Existing computational alternatives often require replicate genotyping as well. These approaches, however, are costly and are suitable only when enough DNA is available for repeated genotyping. In this study, we propose a maximum-likelihood approach together with an expectation-maximization algorithm to jointly estimate allelic dropout rates and allele frequencies when only one set of nonreplicated genotypes is available. Our method considers estimates of allelic dropout caused by both sample-specific factors and locus-specific factors, and it allows for deviation from Hardy–Weinberg equilibrium owing to inbreeding. Using the estimated parameters, we correct the bias in the estimation of observed heterozygosity through the use of multiple imputations of alleles in cases where dropout might have occurred. With simulated data, we show that our method can (1) effectively reproduce patterns of missing data and heterozygosity observed in real data; (2) correctly estimate model parameters, including sample-specific dropout rates, locus-specific dropout rates, and the inbreeding coefficient; and (3) successfully correct the downward bias in estimating the observed heterozygosity. We find that our method is fairly robust to violations of model assumptions caused by population structure and by genotyping errors from sources other than allelic dropout. Because the data sets imputed under our model can be investigated in additional subsequent analyses, our method will be useful for preparing data for applications in diverse contexts in population genetics and molecular ecology.
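The downward bias in observed heterozygosity that this abstract describes can be made concrete with a small worked calculation. The sketch assumes a deliberately simplified model that is not the authors' likelihood: every allele copy fails to amplify independently with one shared probability, a genotype is missing only when both copies fail, and a heterozygote with one failed copy is scored as a homozygote. The function name `observed_heterozygosity` is hypothetical.

```python
def observed_heterozygosity(h_true: float, dropout: float) -> float:
    """Expected heterozygosity among non-missing genotypes under dropout.

    `h_true` is the true proportion of heterozygotes and `dropout` is the
    per-copy probability of amplification failure (simplifying assumption:
    the same rate for every sample and locus).
    """
    both_amplify = (1 - dropout) ** 2  # heterozygote is scored correctly
    not_missing = 1 - dropout ** 2     # at least one copy amplifies
    return h_true * both_amplify / not_missing


# Hypothetical values: true heterozygosity 0.6, per-copy dropout rate 0.2.
biased = observed_heterozygosity(0.6, 0.2)
```

Under these assumptions the ratio simplifies to h_true × (1 − dropout) / (1 + dropout), so a 20% per-copy dropout rate lowers an expected heterozygosity of 0.6 to about 0.4.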