Similar literature
 20 similar documents found
1.
Haplotype inference in random population samples   Cited 16 times (self-citations: 0; citations by others: 16)
Contemporary genotyping and sequencing methods do not provide information on linkage phase in diploid organisms. The application of statistical methods to infer and reconstruct linkage phase in samples of diploid sequences is a potentially time- and labor-saving method. The Stephens-Smith-Donnelly (SSD) algorithm is one such method, which incorporates concepts from population genetics theory in a Markov chain-Monte Carlo technique. We applied a modified SSD method, as well as the expectation-maximization and partition-ligation algorithms, to sequence data from eight loci spanning >1 Mb on the human X chromosome. We demonstrate that the accuracy of the modified SSD method is better than that of the other algorithms and is superior in terms of the number of sites that may be processed. Also, we find phase reconstructions by the modified SSD method to be highly accurate over regions with high linkage disequilibrium (LD). If only polymorphisms with a minor allele frequency >0.2 are analyzed and scored according to the fraction of neighbor relations correctly called, reconstructions are 95.2% accurate over entire 100-kb stretches and are 98.6% accurate within blocks of high LD.
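The scoring rule mentioned in this abstract ("the fraction of neighbor relations correctly called") can be illustrated with a small sketch. This is only one plausible reading of that phrase, not the paper's own scoring code: haplotypes are assumed to be strings of `0`/`1`, and the function name `neighbor_phase_accuracy` is hypothetical. For each pair of adjacent heterozygous sites, it checks whether the inferred haplotype places the two alleles in the same relative phase as the true haplotype.

```python
def neighbor_phase_accuracy(true_pair, inferred_pair):
    """Fraction of adjacent heterozygous site pairs phased correctly.

    true_pair, inferred_pair: tuples of two equal-length '0'/'1' strings.
    Returns None when fewer than two heterozygous sites exist.
    (Hypothetical helper; the abstract does not define the score exactly.)
    """
    t1, t2 = true_pair
    i1, _ = inferred_pair
    # Only heterozygous sites carry phase information.
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het) < 2:
        return None
    correct = 0
    for a, b in zip(het, het[1:]):
        true_same = t1[a] == t1[b]      # alleles on the same strand agree?
        inferred_same = i1[a] == i1[b]
        if true_same == inferred_same:
            correct += 1
    return correct / (len(het) - 1)


if __name__ == "__main__":
    score = neighbor_phase_accuracy(("0101", "1010"), ("0110", "1001"))
    print(round(score, 3))
```

In the usage example, one of the three neighbor relations is phased incorrectly, so the score is 2/3.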

2.
Haplotype inference by maximum parsimony   Cited 5 times (self-citations: 0; citations by others: 5)
MOTIVATION: Haplotypes have been attracting increasing attention because of their importance in the analysis of many fine-scale molecular-genetics data. Since direct sequencing of haplotypes via experimental methods is both time-consuming and expensive, haplotype inference methods that infer haplotypes from genotype samples have become attractive alternatives. RESULTS: (1) We design and implement an algorithm for an important computational model of haplotype inference that has been suggested before in several places. The model finds a minimum-size set of haplotypes that explains the genotype samples. (2) Strong support for this computational model is given by computational results on both real and simulated data. (3) We also carried out a comparative study to show the strengths and weaknesses of this computational model using our program. AVAILABILITY: The software HAPAR is free for non-commercial uses. Available upon request (lwang@cs.cityu.edu.hk).
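The parsimony model described in this abstract (find the smallest set of haplotypes that explains every genotype) can be sketched with a brute-force search. This is not the HAPAR algorithm, whose details the abstract does not give; it is an exponential-time illustration that only works on tiny inputs. The genotype encoding (`0` and `1` for homozygous sites, `2` for heterozygous sites) and both function names are assumptions made for this example.

```python
from itertools import combinations, product


def compatible_pairs(genotype):
    """All unordered haplotype pairs that could produce this genotype."""
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for pos, b in zip(het, bits):
            h1[pos] = b
            h2[pos] = "1" if b == "0" else "0"  # complementary allele
        pairs.add(frozenset(("".join(h1), "".join(h2))))
    return pairs


def min_haplotype_set(genotypes):
    """Smallest haplotype set explaining all genotypes (brute force)."""
    options = [compatible_pairs(g) for g in genotypes]
    candidates = sorted({h for pairs in options for pair in pairs for h in pair})
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            # Every genotype needs at least one pair drawn from the subset.
            if all(any(pair <= chosen for pair in pairs) for pairs in options):
                return chosen
    return set()


if __name__ == "__main__":
    print(sorted(min_haplotype_set(["22", "00", "11"])))
```

Here the doubly heterozygous genotype `22` could be explained by `00`/`11` or by `01`/`10`; the parsimony criterion picks `00`/`11` because those haplotypes are already required by the two homozygous genotypes.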

3.
The common assumption in quantitative trait locus (QTL) linkage mapping studies that parents of multiple connected populations are unrelated is unrealistic for many plant breeding programs. We remove this assumption and propose a Bayesian approach that clusters the alleles of the parents of the current mapping populations from locus-specific identity by descent (IBD) matrices that capture ancestral marker and pedigree information. Moreover, we demonstrate how the parental IBD data can be incorporated into a QTL linkage analysis framework by using two approaches: a Threshold IBD model (TIBD) and a Latent Ancestral Allele Model (LAAM). The TIBD and LAAM models are empirically tested via numerical simulation based on the structure of a commercial maize breeding program. The simulations included a pilot dataset with closely linked QTL on a single linkage group and 100 replicated datasets with five linkage groups harboring four unlinked QTL. The simulation results show that including parental IBD data (similarly for TIBD and LAAM) significantly improves the power and particularly accuracy of QTL mapping, e.g., position, effect size and individuals’ genotype probability without significantly increasing computational demand.

4.
The inference of haplotype pairs directly from unphased genotype data is a key step in the analysis of genetic variation in relation to disease and pharmacogenetically relevant traits. Most popular methods, such as Phase and PL, require either the coalescence assumption or the assumption of linkage between the single-nucleotide polymorphisms (SNPs). We have now developed novel approaches that are independent of these assumptions. First, we introduce a new optimization criterion in combination with a block-wise evolutionary Monte Carlo algorithm. Based on this criterion, the 'haplotype likelihood', we develop two kinds of estimators, the maximum haplotype-likelihood (MHL) estimator and its empirical Bayesian (EB) version. Using both real and simulated data sets, we demonstrate that our proposed estimators allow substantial improvements over both the expectation-maximization (EM) algorithm and Clark's procedure in terms of capacity/scalability and error rate. Thus, hundreds or more ambiguous loci and potentially very large sample sizes can be processed. Moreover, applying our proposed EB estimator can result in significant reductions of error rate in the case of unlinked or only weakly linked SNPs.

5.
Albers CA  Heskes T  Kappen HJ 《Genetics》2007,177(2):1101-1116
We present CVMHAPLO, a probabilistic method for haplotyping in general pedigrees with many markers. CVMHAPLO reconstructs the haplotypes by assigning in every iteration a fixed number of the ordered genotypes with the highest marginal probability, conditioned on the marker data and ordered genotypes assigned in previous iterations. CVMHAPLO makes use of the cluster variation method (CVM) to efficiently estimate the marginal probabilities. We focused on single-nucleotide polymorphism (SNP) markers in the evaluation of our approach. In simulated data sets where exact computation was feasible, we found that the accuracy of CVMHAPLO was high and similar to that of maximum-likelihood methods. In simulated data sets where exact computation of the maximum-likelihood haplotype configuration was not feasible, the accuracy of CVMHAPLO was similar to that of state-of-the-art Markov chain Monte Carlo (MCMC) maximum-likelihood approximations when all ordered genotypes were assigned and higher when only a subset of the ordered genotypes was assigned. CVMHAPLO was faster than the MCMC approach and provided more detailed information about the uncertainty in the inferred haplotypes. We conclude that CVMHAPLO is a practical tool for the inference of haplotypes in large complex pedigrees.

6.
In diploid species, many multiparental populations have been developed to increase genetic diversity and quantitative trait loci (QTL) mapping resolution. In these populations, haplotype reconstruction has been used as a standard practice to increase the power of QTL detection in comparison with the marker-based association analysis. However, such software tools for polyploid species are few and limited to a single biparental F1 population. In this study, a statistical framework for haplotype reconstruction has been developed and implemented in the software PolyOrigin for connected tetraploid F1 populations with shared parents, regardless of the number of parents or mating design. Given a genetic or physical map of markers, PolyOrigin first phases parental genotypes, then refines the input marker map, and finally reconstructs offspring haplotypes. PolyOrigin can utilize single nucleotide polymorphism (SNP) data coming from arrays or from sequence-based genotyping; in the latter case, bi-allelic read counts can be used (and are preferred) as input data to minimize the influence of genotype calling errors at low depth. With extensive simulation we show that PolyOrigin is robust to the errors in the input genotypic data and marker map. It works well for various population designs with 30 offspring per parent and for sequences with read depth as low as 10x. PolyOrigin was further evaluated using an autotetraploid potato dataset with a 3 × 3 half-diallel mating design. In conclusion, PolyOrigin opens up exciting new possibilities for haplotype analysis in tetraploid breeding populations.

7.
Haplotypes, as they specify the linkage patterns between dispersed genetic variations, provide important information for understanding the genetics of human traits. However, haplotypes are not directly obtainable from current genotyping platforms, which has prompted extensive investigation of computational methods to recover such information. Two major computational challenges arising in current family-based disease studies are large family sizes and many ungenotyped family members. Traditional haplotyping methods can handle neither large families nor families with missing members. In this article, we propose a method that addresses these issues by integrating multiple novel techniques. The method consists of three major components: pairwise identical-by-descent (IBD) inference, global IBD reconstruction, and haplotype restoring. By reconstructing the global IBD of a family from pairwise IBD and then restoring the haplotypes based on the inferred IBD, this method can scale to large pedigrees, and more importantly it can handle families with missing members. Compared with existing approaches, this method demonstrates much higher power to recover haplotype information, especially in families with many untyped individuals. Availability: http://sites.google.com/site/xinlishomepage/pedibd.

8.
Efficient inference of haplotypes from genotypes on a pedigree   Cited 1 time (self-citations: 0; citations by others: 1)
We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large scale haplotype inference projects.
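The zero-recombinant algorithm in this abstract reduces haplotyping to a linear system over Z2 that is solved by Gaussian elimination. The sketch below shows only that final, generic step: solving XOR equations and enumerating every solution. How the paper derives its constraints from a pedigree is not stated in the abstract, so the variables here are hypothetical binary phase indicators and `solve_gf2` is an illustrative name, not part of PedPhase.

```python
from itertools import product


def solve_gf2(equations, n_vars):
    """Enumerate all solutions of a linear system over Z2.

    equations: list of (coefficients, rhs), coefficients a list of 0/1.
    Returns a list of solutions (lists of 0/1), or None if inconsistent.
    """
    rows = [list(coeffs) + [rhs] for coeffs, rhs in equations]
    pivots, rank = [], 0
    for col in range(n_vars):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue  # no pivot here: this column is a free variable
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                # Row addition over Z2 is element-wise XOR.
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
        rank += 1
    # A row reading 0 = 1 means the constraints contradict each other.
    if any(row[-1] and not any(row[:-1]) for row in rows):
        return None
    free = [c for c in range(n_vars) if c not in pivots]
    solutions = []
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n_vars
        for c, b in zip(free, bits):
            x[c] = b
        for r, col in enumerate(pivots):
            value = rows[r][-1]
            for c in free:
                value ^= rows[r][c] & x[c]
            x[col] = value
        solutions.append(x)
    return solutions


if __name__ == "__main__":
    # x0 + x1 = 1 and x1 + x2 = 0 (mod 2)
    print(solve_gf2([([1, 1, 0], 1), ([0, 1, 1], 0)], 3))
```

Each free variable doubles the number of feasible configurations, so full enumeration is practical only when few variables remain unconstrained.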

9.
Analysis of haplotypes based on multiple single-nucleotide polymorphisms (SNP) is becoming common for both candidate gene and fine-mapping studies. Before embarking on studies of haplotypes from genetically distinct populations, however, it is important to consider variation both in linkage disequilibrium (LD) and in haplotype frequencies within and across populations, as both vary. Such diversity will influence the choice of "tagging" SNPs for candidate gene or whole-genome association studies because some markers will not be polymorphic in all samples and some haplotypes will be poorly represented or completely absent. Here we analyze 11 genes, originally chosen as candidate genes for oral clefts, where multiple markers were genotyped on individuals from four populations. Estimated haplotype frequencies, measures of pairwise LD, and genetic diversity were computed for 135 European-Americans, 57 Chinese-Singaporeans, 45 Malay-Singaporeans, and 46 Indian-Singaporeans. Patterns of pairwise LD were compared across these four populations and haplotype frequencies were used to assess genetic variation. Although these populations are fairly similar in allele frequencies and overall patterns of LD, both haplotype frequencies and genetic diversity varied significantly across populations. Such haplotype diversity has implications for designing studies of association involving samples from genetically distinct populations.

10.
Management of certain populations requires the preservation of their pure genetic background. When, for different reasons, undesired alleles are introduced, the original genetic conformation must be recovered. The present study tested, through computer simulations, the power of recovery (the ability to remove the foreign information) from genealogical data. Simulated scenarios comprised different numbers of exogenous individuals forming part of the founder population and different numbers of unmanaged generations before the removal program started. Strategies were based on variables arising from classical pedigree analyses such as founders’ contribution and partial coancestry. The efficiency of the different strategies was measured as the proportion of native genetic information remaining in the population. Consequences on the inbreeding and coancestry levels of the population were also evaluated. Minimisation of the exogenous founders’ contributions was the most powerful method, removing the largest amount of genetic information in just one generation. However, as a side effect, it led to the highest values of inbreeding. Scenarios with a large amount of initial exogenous alleles (i.e. a high percentage of non-native founders), or many generations of mixing, became very difficult to recover, pointing out the importance of being careful about introgression events in populations where these are undesired.

11.
In a genetic analysis of a polymorphic system, differences between the observed type of an individual and that expected from the parental types can arise either from an incorrect model or from pedigree errors. Such pedigree errors can cause severe difficulties in studies of the mode of inheritance of a novel polymorphic system. A method is proposed which overcomes the problem by including sire and dam error rates explicitly in the genetic model. The error rates are estimated by maximum likelihood, and likelihood ratio tests are used to compare different models or estimates from different data sets. The proposals are applied to a study of the inheritance of the bovine serum AmI amylases.

12.

Background

The accuracy of genomic prediction depends largely on the number of animals with phenotypes and genotypes. In some industries, such as sheep and beef cattle, data are often available from a mixture of breeds, multiple strains within a breed or from crossbred animals. The objective of this study was to compare the accuracy of genomic prediction for several economically important traits in sheep when using data from purebreds, crossbreds or a combination of those in a reference population.

Methods

The reference populations were purebred Merinos, crossbreds of Border Leicester (BL), Poll Dorset (PD) or White Suffolk (WS) with Merinos and combinations of purebred and crossbred animals. Genomic breeding values (GBV) were calculated based on genomic best linear unbiased prediction (GBLUP), using a genomic relationship matrix calculated based on 48 599 Ovine SNP (single nucleotide polymorphisms) genotypes. The accuracy of GBV was assessed in a group of purebred industry sires based on the correlation coefficient between GBV and accurate estimated breeding values based on progeny records.

Results

The accuracy of GBV for Merino sires increased with a larger purebred Merino reference population, but decreased when a large purebred Merino reference population was augmented with records from crossbred animals. The GBV accuracy for BL, PD and WS breeds based on crossbred data was the same or tended to decrease when more purebred Merinos were added to the crossbred reference population. The prediction accuracy for a particular breed was close to zero when the reference population did not contain any haplotypes of the target breed, except for some low accuracies that were obtained when predicting PD from WS and vice versa.

Conclusions

This study demonstrates that crossbred animals can be used for genomic prediction of purebred animals using 50 k SNP marker density and GBLUP, but crossbred data provided lower accuracy than purebred data. Including data from distant breeds in a reference population had a neutral to slightly negative effect on the accuracy of genomic prediction. Accounting for differences in marker allele frequencies between breeds had only a small effect on the accuracy of genomic prediction from crossbred or combined crossbred and purebred reference populations.
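The Methods above rely on a genomic relationship matrix built from SNP genotypes, but the abstract does not say which formula was used. As an illustration only, the sketch below assumes the widely used first method of VanRaden (2008), G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts (0, 1, 2) centered by twice the allele frequency p. The function name and the toy genotypes are hypothetical, and allele frequencies are estimated from the sample itself, which is a simplification.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden-style G matrix from allele counts (assumed formula).

    genotypes: list of rows, one per animal, each a list of 0/1/2 counts.
    Assumes at least one SNP is polymorphic, so the scale is non-zero.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency per SNP, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(ri, rk)) / scale for rk in centered]
        for ri in centered
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
    for row in genomic_relationship_matrix(toy):
        print([round(v, 3) for v in row])
```

With breeds that differ in allele frequency, the choice of p (pooled or breed-specific) changes the centering, which is the adjustment the Conclusions report as having only a small effect.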

13.
Polymorphism of mtDNA was examined in five ethnic populations that belong to the Turkic language group and inhabit the territory of the Altai-Sayan upland (N = 1007). Most of the haplogroups identified in the examined populations belonged to East Eurasian lineages. In all five populations, only three haplogroups, C, D, and F, were prevailing. The frequencies of the other six haplogroups (A, B, G, M, Y, and Z) varied in the range from 1.1 to 6.5%. Among West Eurasian haplogroups, the most common were haplogroups H, J, T, and U. An analysis of Y-chromosome haplogroups in 407 individuals showed that only two haplogroups, N* and R1a1, were present in all five populations examined. Moreover, in different ethnic groups, the highest frequencies were observed for C-M130, N-P43, and N-Tat haplogroups. The differences revealed in the distribution patterns of ancient West Eurasian and East Eurasian haplotypes from Gorny Altai in the present-day populations from the northern part of Eurasia can be explained in terms of the multistage expansion of humans across these territories. The ubiquity of haplotypes from haplogroup H and cluster U across the wide territory from the Yenisei River basin to the Atlantic Ocean can indicate directional human expansion, which most likely occurred out of Central Asia as early as in the Paleolithic era, and took place in several waves with the glacier retreat.

14.
To overcome limitations of diversity measures applied to livestock breeds, marker-based estimations of kinship within and between populations were proposed. This concept was extended from the single locus consideration to chromosomal segments of a given length in Morgan. Algorithms for the derivation of haplotype kinship were suggested and the behaviour of marker-based haplotype kinship was investigated theoretically. In the present study the results of the first practical application of this concept are presented. Full sib pairs of three sub-populations of the Goettingen minipig were genotyped for six chromosome segments. After haplotype reconstruction the haplotypes were compared and mean haplotype kinships were estimated within and between populations. Based on haplotype kinships a distance measure is proposed which is approximately linear with the number of generations since fission. The haplotype kinship distances, the respective standard errors and the pedigree-based expected values are presented and are shown to reflect the true population history better than distances based on single-locus kinships. However, the marker-estimated haplotype kinship proves variable among segments. This leads to high standard errors of the respective distances. Possible reasons for this phenomenon are discussed and a pedigree-based approach to correct for identical haplotypes which are not identical by descent is proposed.

15.
Knowledge of the parentage of individuals is required to address a variety of questions concerning the evolutionary dynamics of wild populations. A major advance in parentage inference in natural populations has been the use of molecular markers and the development of statistical methods to analyse these data. Cervus, one of the most widely used parentage inference programs, uses molecular data to determine parent–offspring relationships. However, Cervus does not make use of all available information: additional phenotypic information may exist predicting parent–offspring relationships, and additional genetic information may be exploited by simultaneously considering multiple types of relationships rather than just pairwise or just parent–offspring relationships. Here we reanalyse data from a wild red deer population using two programs capable of using this additional information, MasterBayes and COLONY2, and quantify the impact of these alternative approaches by comparison with a ‘known pedigree’ estimated using a larger suite of microsatellite markers for a subset of the population. The use of phenotypic information and multiple relationships increased the number of correct assignments. We highlight the differences between programs, particularly the use of population‐ rather than individual‐level statistical confidence in Cervus. We conclude that the use of additional information allows MasterBayes and COLONY2 to assign more correct paternities, whereas their use of individual‐ rather than population‐level confidence generates fewer erroneous assignments. We suggest that maximal information may be gained by combining outputs from different programs. Higher accuracy and completeness of pedigree information will improve parameters estimated from pedigree information in studies of natural populations.

16.
Baruch E  Weller JI  Cohen-Zinder M  Ron M  Seroussi E 《Genetics》2006,172(3):1757-1765
We present a simple algorithm for reconstruction of haplotypes from a sample of multilocus genotypes. The algorithm is aimed specifically at the analysis of very large pedigrees for small chromosomal segments, where recombination frequency within the chromosomal segment can be assumed to be zero. The algorithm was tested both on simulated pedigrees of 155 individuals in a family structure of three generations and on real data of 1149 animals from the Israeli Holstein dairy cattle population, including 406 bulls with genotypes, but no females with genotypes. The rate of haplotype resolution for the simulated data was >91% with a standard deviation of 2%. With 20% missing data, the rate of haplotype resolution was 67.5% with a standard deviation of 1.3%. In both cases all recovered haplotypes were correct. In the real data, allele origin was resolved for 22% of the heterozygous genotypes, even though 70% of the genotypes were missing. Haplotypes were resolved for 36% of the males. Computing time was insignificant for both data sets. Despite the intricacy of large-scale real pedigree genotypes, the proposed algorithm provides a practical rule-based solution for resolving haplotypes for small chromosomal segments in commercial animal populations.

17.
18.
Usually, genetic correlations are estimated from breeding designs in the laboratory or greenhouse. However, estimates of the genetic correlation for natural populations are lacking, mostly because pedigrees of wild individuals are rarely known. Recently Lynch (1999) proposed a formula to estimate the genetic correlation in the absence of data on pedigree. This method has been shown to be particularly accurate provided a large sample size and a minimum (20%) proportion of relatives. Lynch (1999) proposed the use of the bootstrap to estimate standard errors associated with genetic correlations, but did not test the reliability of such a method. We tested the bootstrap and showed that the jackknife can provide valid estimates of the genetic correlation calculated with the Lynch formula. The occurrence of undefined estimates, combined with the high number of replicates involved in the bootstrap, means there is a high probability of obtaining an upward-biased, incomplete bootstrap, even when there is a high fraction of related pairs in a sample. It is easier to obtain complete jackknife estimates for which all the pseudovalues have been defined. We therefore recommend the use of the jackknife to estimate the genetic correlation with the Lynch formula. Provided data can be collected for more than two individuals at each location, we propose a group sampling method that produces low standard errors associated with the jackknife, even when there is a low fraction of relatives in a sample.
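The jackknife procedure recommended in this abstract can be sketched generically. The code below implements the standard delete-one jackknife with pseudovalues for an arbitrary estimator; it does not implement the Lynch (1999) formula, which the abstract does not reproduce. The `estimator` argument is a placeholder for any function that maps a sample to a number, and the sample mean is used in the usage example purely as a stand-in.

```python
import math


def jackknife(estimator, sample):
    """Delete-one jackknife estimate and standard error.

    estimator: function taking a list and returning a float
               (placeholder for e.g. a genetic-correlation estimator).
    sample:    list of observations (at least two).
    """
    n = len(sample)
    full = estimator(sample)
    # Pseudovalue i: n * full estimate - (n - 1) * leave-one-out estimate.
    pseudo = [
        n * full - (n - 1) * estimator(sample[:i] + sample[i + 1:])
        for i in range(n)
    ]
    mean = sum(pseudo) / n
    variance = sum((p - mean) ** 2 for p in pseudo) / (n * (n - 1))
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    def sample_mean(xs):
        return sum(xs) / len(xs)

    estimate, std_error = jackknife(sample_mean, [1.0, 2.0, 3.0, 4.0])
    print(round(estimate, 3), round(std_error, 3))
```

For the sample mean, the jackknife standard error coincides with the usual standard error of the mean, which makes it a convenient sanity check. An estimator that can be undefined for some leave-one-out samples, as the abstract describes, would need extra handling that this sketch omits.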

19.
Multitrait detection of QTL applied to a mixture of full- and half-sib families requires specific strategies. Indeed, the number of parameters estimated by the multivariate methods is excessive compared with the size of the population. Thus, only multitrait methods based on a univariate analysis of a linear combination (LC) of the traits can be extensively performed. We compared three strategies to obtain the LC of the traits. Two linear transformations were performed on the overall population. The last one was performed within each half-sib family. Their powers were compared on simulated data depending on the frequency of the two QTL alleles in each of the grandparental populations of an intercross design. The transformations from the whole population did not lead to a large loss of power even though the frequency of the QTL alleles was similar in the two grandparental populations. In these cases, applying the within-sire family transformation improved the detection when the number of progeny per sire was greater than 100.

20.
Pedigree data are useful for a wealth of research purposes in human population biology and genetics. The collection of extended pedigrees represents the most powerful sampling design for quantitative genetic and linkage studies of both normal and disease-related quantitative traits. In this paper we outline an approach for collecting pedigree data in stable isolate populations. As an example, the pedigree for the Jirel population, which was obtained using the methods presented, is described. The Jirel pedigree contains 2,000 study participants and more than 62,000 pairwise relationships that are informative for genetic analysis. Once such pedigrees are genetically characterized by a genome scan for a given trait, they become an invaluable resource for future genetic studies of any quantitative trait.
