Similar Documents
1.
Multipoint quantitative-trait linkage analysis in general pedigrees. Times cited: 49 (self-citations: 12; citations by others: 37)
Multipoint linkage analysis of quantitative-trait loci (QTLs) has previously been restricted to sibships and small pedigrees. In this article, we show how variance-component linkage methods can be used in pedigrees of arbitrary size and complexity, and we develop a general framework for multipoint identity-by-descent (IBD) probability calculations. We extend the sib-pair multipoint mapping approach of Fulker et al. to general relative pairs. This multipoint IBD method uses the proportion of alleles shared identical by descent at genotyped loci to estimate IBD sharing at arbitrary points along a chromosome for each relative pair. We have derived correlations in IBD sharing as a function of chromosomal distance for relative pairs in general pedigrees and provide a simple framework whereby these correlations can be easily obtained for any relative pair related by a single line of descent or by multiple independent lines of descent. Once calculated, the multipoint relative-pair IBDs can be utilized in variance-component linkage analysis, which considers the likelihood of the entire pedigree jointly. Examples are given that use simulated data, demonstrating both the accuracy of QTL localization and the increase in power provided by multipoint analysis with 5-, 10-, and 20-cM marker maps. The general pedigree variance component and IBD estimation methods have been implemented in the SOLAR (Sequential Oligogenic Linkage Analysis Routines) computer package.
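
The regression idea behind the Fulker et al. multipoint approach can be sketched for its simplest case. This minimal sketch makes three assumptions: full-sib pairs, fully informative markers (so each marker's IBD proportion is known exactly), and Haldane's map function. Under those assumptions, the correlation of sib-pair IBD proportions at two loci with recombination fraction theta is (1 - 2*theta)^2. The function names are illustrative, and this is not the SOLAR implementation.

```python
import math

def haldane_theta(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def sib_ibd_corr(d_cm):
    """Correlation of full-sib IBD proportions at two loci d_cm apart: (1 - 2*theta)^2."""
    return (1.0 - 2.0 * haldane_theta(d_cm)) ** 2

def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def multipoint_pi(marker_pos, marker_pi, t):
    """Regression estimate of the sib-pair IBD proportion at position t (cM).

    pi_t = mu + sum_i beta_i * (pi_i - mu), with beta = R^{-1} c, where R holds
    marker-marker IBD correlations and c the marker-to-t correlations
    (the common variance of 1/8 cancels out).
    """
    mu = 0.5  # prior mean IBD proportion for full sibs
    r = [[sib_ibd_corr(abs(p - q)) for q in marker_pos] for p in marker_pos]
    c = [sib_ibd_corr(abs(p - t)) for p in marker_pos]
    beta = solve(r, c)
    return mu + sum(b * (pi - mu) for b, pi in zip(beta, marker_pi))

# Flanking markers 20 cM apart, both with complete sharing; estimate at the midpoint
midpoint_estimate = multipoint_pi([0.0, 20.0], [1.0, 1.0], 10.0)
```

The midpoint estimate falls between the prior mean of 0.5 and the observed 1.0. This reflects the information lost between markers. Denser maps raise the correlations and pull the estimate toward the marker values.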

2.
We propose an analytical approximation method for the estimation of multipoint identity by descent (IBD) probabilities in pedigrees containing a moderate number of distantly related individuals. We show that in large pedigrees where cases are related through untyped ancestors only, it is possible to formulate the hidden Markov model of the Lander-Green algorithm in terms of the IBD configurations of the cases. We use a first-order Markov approximation to model the changes in this IBD-configuration variable along the chromosome. In simulated and real data sets, we demonstrate that estimates of parametric and nonparametric linkage statistics based on the first-order Markov approximation are accurate. The computation time is exponential in the number of cases instead of in the number of meioses separating the cases. We have implemented our approach in the computer program ALADIN (accurate linkage analysis of distantly related individuals). ALADIN can be applied to general pedigrees and marker types and has the ability to model marker-marker linkage disequilibrium with a clustered-markers approach. Using ALADIN is straightforward: It requires no parameters to be specified and accepts standard input files.

3.
We present here four nonparametric statistics for linkage analysis that test whether pairs of affected relatives share marker alleles more often than expected. These statistics are based on simulating the null distribution of a given statistic conditional on the unaffecteds' marker genotypes. Each statistic uses a different measure of marker sharing: the SimAPM statistic uses the simulation-based affected-pedigree-member measure based on identity-by-state (IBS) sharing. The SimKIN (kinship) measure is 1.0 for identity-by-descent (IBD) sharing, 0.0 for no IBD sharing, and the kinship coefficient when the IBD status is ambiguous. The simulation-based IBD (SimIBD) statistic uses a recursive algorithm to determine the probability of two affecteds sharing a specific allele IBD. The SimISO statistic is identical to SimIBD, except that it also measures marker similarity between unaffected pairs. We evaluated our statistics on data simulated under different two-locus disease models, comparing our results to those obtained with several other nonparametric statistics. Use of IBD information produces dramatic increases in power over the SimAPM method, which uses only IBS information. The power of our best statistic in most cases meets or exceeds the power of the other nonparametric statistics. Furthermore, our statistics perform comparisons between all affected relative pairs within general pedigrees and are not restricted to sib pairs or nuclear families.
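
SimKIN falls back on the kinship coefficient when IBD status is ambiguous. That coefficient can be computed with the standard recursive definition. This minimal sketch assumes a pedigree given as a dict from individual to (father, mother), with founders as (None, None) and parents listed before their children. `simkin_score` is a hypothetical helper that mirrors the scoring rule stated above; it is not the authors' code.

```python
from functools import lru_cache

def make_kinship(pedigree):
    """Return phi(a, b), the kinship coefficient, for a pedigree.

    pedigree maps id -> (father, mother); founders use (None, None).
    Parents must appear before their children in insertion order.
    """
    order = {ind: k for k, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:
            return 0.0  # unknown parent: treated as an unrelated founder
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + phi(father, mother))
        # Recurse on the later-listed individual, who cannot be an ancestor of the other
        if order[a] < order[b]:
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    return phi

def simkin_score(ibd_status, kinship):
    """SimKIN-style pair score: 1.0 if IBD, 0.0 if not, kinship if ambiguous (None)."""
    if ibd_status is None:
        return kinship
    return 1.0 if ibd_status else 0.0

# Usage: sibs s1/s2, a half sib h1 via gf, and first cousins c1/c2
ped = {
    "gf": (None, None), "gm": (None, None), "x": (None, None),
    "s1": ("gf", "gm"), "s2": ("gf", "gm"), "h1": ("gf", "x"),
    "w1": (None, None), "w2": (None, None),
    "c1": ("s1", "w1"), "c2": ("s2", "w2"),
}
phi = make_kinship(ped)
```

Full sibs and parent-offspring pairs both have kinship 1/4, half sibs 1/8, and first cousins 1/16. These are the values a SimKIN-style score uses for pairs whose IBD status cannot be resolved.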

4.
Computations for genome scans need to adapt to the increasing use of dense diallelic markers as well as of full-chromosome multipoint linkage analysis with either diallelic or multiallelic markers. Whereas suitable exact-computation tools are available for use with small pedigrees, equivalent exact computation for larger pedigrees remains infeasible. Markov chain Monte Carlo (MCMC)-based methods currently provide the only computationally practical option. To date, no systematic comparison of the performance of MCMC-based programs is available, nor have these programs been systematically evaluated for use with dense diallelic markers. Using simulated data, we evaluate the performance of two MCMC-based linkage-analysis programs, lm_markers from the MORGAN package and SimWalk2, under a variety of analysis conditions. Pedigrees consisted of 14, 52, or 98 individuals in 3, 5, or 6 generations, respectively, with increasing amounts of missing data in larger pedigrees. One hundred replicates of markers and trait data were simulated on a 100-cM chromosome, with up to 10 multiallelic and up to 200 diallelic markers used simultaneously for computation of multipoint LOD scores. Exact computation was available for comparison in most situations, and comparison with a perfectly informative marker or interprogram comparison was available in the remaining situations. Our results confirm the accuracy of both programs in multipoint analysis with multiallelic markers on pedigrees of varied sizes and missing-data patterns, but there are some computational differences. In contrast, for large numbers of dense diallelic markers, only the lm_markers program was able to provide accurate results within a computationally practical time. Thus, programs in the MORGAN package are the first available to provide a computationally practical option for accurate linkage analyses in genome scans with both large numbers of diallelic markers and large pedigrees.

5.
GENEHUNTER and SimWalk2 are among the most commonly used software packages for parametric multipoint linkage analysis. In the context of extended kindred analysis, GENEHUNTER is limited in the number of individuals it can handle; one solution is to manually split the kindred into smaller pedigrees. SimWalk2 can handle a much larger number of individuals, but its major drawback is the time it takes to process the data compared with GENEHUNTER. Beyond the limitations of each program, researchers studying extended kindreds are typically confronted with missing data. In this work we used simulated genotype data based on the structure of a real extended pedigree to compare the results obtained with GENEHUNTER and SimWalk2, to evaluate the effect of discarding individuals and of splitting the kindred on the logarithm of odds (lod) score, and to assess how missing data affect the performance of each program. Our results show that (1) for pedigrees of moderate size, GENEHUNTER and SimWalk2 produce nearly the same results; (2) when using GENEHUNTER, either splitting the kindred into smaller sub-pedigrees or discarding individuals has an adverse effect compared with the results obtained using SimWalk2 on the whole pedigree; and (3) the performance of both programs is qualitatively similar in the missing-data scenario. These conclusions are based on the sample distributions of the lod score values and of the estimates of the recombination fraction.
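
In the simplest setting, the lod score and the recombination-fraction estimate compared above have a closed form. This sketch assumes phase-known, fully informative meioses: R recombinants out of N. Real extended pedigrees with missing data need the likelihood machinery of programs such as GENEHUNTER or SimWalk2, which this code does not reproduce.

```python
import math

def lod(theta, recombinants, meioses):
    """Two-point lod score log10[L(theta) / L(0.5)] for phase-known meioses."""
    nonrec = meioses - recombinants
    log_l = 0.0
    if recombinants:  # skip 0 * log10(0) when theta = 0 and R = 0
        log_l += recombinants * math.log10(theta)
    if nonrec:
        log_l += nonrec * math.log10(1.0 - theta)
    return log_l - meioses * math.log10(0.5)

def max_lod(recombinants, meioses):
    """MLE theta_hat = R/N, restricted to [0, 0.5], and the lod score at theta_hat."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod(theta_hat, recombinants, meioses)

# Ten informative meioses with no recombinants give the maximum lod of 10*log10(2)
theta_hat, best_lod = max_lod(0, 10)
```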

6.
Single-nucleotide polymorphisms (SNPs) are rapidly replacing microsatellites as the markers of choice for genetic linkage studies and many other studies of human pedigrees. Here, we describe an efficient approach for modeling linkage disequilibrium (LD) between markers during multipoint analysis of human pedigrees. Using a gene-counting algorithm suitable for pedigree data, our approach enables rapid estimation of allele and haplotype frequencies within clusters of tightly linked markers. In addition, with the use of a hidden Markov model, our approach allows for multipoint pedigree analysis with large numbers of SNP markers organized into clusters of markers in LD. Simulation results show that our approach resolves previously described biases in multipoint linkage analysis with SNPs that are in LD. An updated version of the freely available Merlin software package uses the approach described here to perform many common pedigree analyses, including haplotyping and haplotype frequency estimation, parametric and nonparametric multipoint linkage analysis of discrete traits, variance-components and regression-based analysis of quantitative traits, calculation of identity-by-descent or kinship coefficients, and case selection for follow-up association studies. To illustrate the possibilities, we examine a data set that provides evidence of linkage of psoriasis to chromosome 17.
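
The classic EM (gene-counting) algorithm for haplotype frequencies illustrates the gene-counting step. This is a deliberately simplified sketch. It assumes unrelated individuals and two biallelic SNPs, with genotypes coded as 0, 1, or 2 copies of allele 1. Merlin's gene counting, by contrast, works on pedigree data and larger marker clusters. The function name is illustrative.

```python
from itertools import product

HAPS = list(product((0, 1), repeat=2))  # (00), (01), (10), (11)

def em_haplotype_freqs(genotypes, n_iter=100):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes."""
    freqs = {h: 0.25 for h in HAPS}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for g1, g2 in genotypes:
            # E-step: ordered haplotype pairs consistent with this genotype
            pairs = [(a, b) for a in HAPS for b in HAPS
                     if a[0] + b[0] == g1 and a[1] + b[1] == g2]
            weights = [freqs[a] * freqs[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of haplotypes
        n_haps = 2 * len(genotypes)
        freqs = {h: counts[h] / n_haps for h in HAPS}
    return freqs

# One double heterozygote (ambiguous phase) plus two 00/00 homozygotes
freqs = em_haplotype_freqs([(1, 1), (0, 0), (0, 0)])
```

Only double heterozygotes are phase-ambiguous here. EM resolves them toward the phase that is better supported by the haplotypes seen in the rest of the sample.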

7.
Gao G, Hoeschele I. Genetics. 2005;171(1):365-376.
Identity-by-descent (IBD) matrix calculation is an important step in quantitative trait loci (QTL) analysis using variance component models. To calculate IBD matrices efficiently for large pedigrees with large numbers of loci, an approximation method based on the reconstruction of haplotype configurations for the pedigrees is proposed. The method uses a subset of haplotype configurations with high likelihoods identified by a haplotyping method. The new method is compared with a Markov chain Monte Carlo (MCMC) method (Loki) in terms of QTL mapping performance on simulated pedigrees. Both methods yield almost identical results for the estimation of QTL positions and variance parameters, while the new method is much more computationally efficient than the MCMC approach for large pedigrees and large numbers of loci. The proposed method is also compared with an exact method (Merlin) in small simulated pedigrees, where both methods produce nearly identical estimates of position-specific kinship coefficients. The new method can be used for fine mapping with joint linkage disequilibrium and linkage analysis, which improves the power and accuracy of QTL mapping.

8.
Haplotyping in pedigrees provides valuable information for genetic studies (e.g., linkage analysis and association studies). In previous work, to identify a set of haplotype configurations with the highest likelihoods for a large pedigree with a large number of linked loci, we proposed a conditional enumeration haplotyping method. That method sets a threshold on the conditional probabilities of the possible ordered genotypes at every unordered individual-marker genotype. It deletes ordered genotypes with low conditional probabilities and thereby eliminates haplotype configurations with low likelihoods. In this article we present a rapid haplotyping algorithm that modifies the previous method by setting an additional threshold. This threshold applies to the ratio of the conditional probability of a haplotype configuration to the largest conditional probability among all haplotype configurations, and it eliminates configurations with relatively low conditional probabilities. The new algorithm is much more efficient than our previous method and than the widely used software SimWalk2.
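
The additional threshold described above compares each configuration's conditional probability with the largest one, which amounts to a simple filter. This minimal sketch assumes the configurations have already been enumerated with their (unnormalized) likelihoods. The function name, the labels, and the threshold value are illustrative and are not taken from the paper.

```python
def prune_configurations(configs, ratio_threshold):
    """Keep configurations whose probability is at least ratio_threshold times the best.

    configs: dict mapping a configuration label to its (unnormalized) likelihood.
    Returns the survivors with probabilities renormalized to sum to 1.
    """
    best = max(configs.values())
    kept = {c: p for c, p in configs.items() if p / best >= ratio_threshold}
    total = sum(kept.values())
    return {c: p / total for c, p in kept.items()}

configs = {"A": 0.50, "B": 0.30, "C": 0.004, "D": 0.001}
pruned = prune_configurations(configs, ratio_threshold=0.01)
# "C" (ratio 0.008) and "D" (ratio 0.002) are dropped; "A" and "B" are renormalized.
```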

9.
In this paper, we present a unified mathematical model for linkage analysis that allows for inbreeding among founders in all families. The identical-by-descent (IBD) configuration of each pedigree is modeled as a Markov process containing two parameters: the inverse inbreeding and kinship coefficient, and a rate parameter proportional to the inverse expected length of chromosome segments shared IBD by two different founder haplotypes. We use hidden Markov models and define a forward-backward algorithm for computing the conditional IBD distribution given marker data, thereby extending the multipoint method of Lander and Green [1987. Construction of multilocus genetic maps in humans, Proc. Natl. Acad. Sci. USA 84, 2363-2367] to situations where founders are inbred. Our methodology is valid for arbitrary pedigree structures. Simulation and theoretical approximations for nonparametric linkage (NPL) analysis based on affected sib pairs reveal that NPL scores are inflated and type I error rates increase when the inbreeding coefficient or rate parameter is underestimated. When the parents are genotyped, we present a general way of modifying the score function to drastically reduce this effect.
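
The forward-backward computation can be illustrated in its simplest case: one pair of haplotypes whose status (IBD or not) follows a stationary two-state Markov process along the chromosome. This sketch assumes a stationary IBD probability `f` and a total switching rate `lam` per unit map distance. Per-marker emission probabilities are supplied by the caller. It is not the paper's full multi-founder model, and the parameter names are illustrative.

```python
import math

def transition(f, lam, d):
    """Two-state transition matrix over map distance d.

    State 0 = IBD (stationary probability f), state 1 = not IBD.
    P_ij(d) = pi_j + (delta_ij - pi_j) * exp(-lam * d) for a stationary two-state chain.
    """
    pi = (f, 1.0 - f)
    e = math.exp(-lam * d)
    return [[pi[j] + ((1.0 if i == j else 0.0) - pi[j]) * e for j in range(2)]
            for i in range(2)]

def posterior_ibd(positions, emissions, f, lam):
    """Posterior P(IBD) at each marker via forward-backward.

    positions: sorted marker positions;
    emissions[k] = (P(data_k | IBD), P(data_k | not IBD)).
    """
    n = len(positions)
    trans = [transition(f, lam, positions[k + 1] - positions[k]) for k in range(n - 1)]

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass, rescaled at each marker to avoid underflow
    alpha = [normalize([f * emissions[0][0], (1.0 - f) * emissions[0][1]])]
    for k in range(1, n):
        t = trans[k - 1]
        alpha.append(normalize([
            sum(alpha[-1][i] * t[i][j] for i in range(2)) * emissions[k][j]
            for j in range(2)
        ]))

    # Backward pass, also rescaled; the scaling constants cancel in the posterior
    beta = [[1.0, 1.0] for _ in range(n)]
    for k in range(n - 2, -1, -1):
        t = trans[k]
        beta[k] = normalize([
            sum(t[i][j] * emissions[k + 1][j] * beta[k + 1][j] for j in range(2))
            for i in range(2)
        ])

    post = []
    for a, b in zip(alpha, beta):
        joint = [a[0] * b[0], a[1] * b[1]]
        post.append(joint[0] / sum(joint))
    return post
```

With uninformative markers the posterior stays at the prior `f`. A marker that strongly favors IBD raises the posterior at that marker and, through the Markov dependence, at its neighbors.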

10.
Multipoint linkage analysis is commonly used to evaluate linkage of a disease to multiple markers in a small region. Multipoint analysis is particularly powerful when the IBD relations of family members at the trait locus are ambiguous. The increased power arises because, unlike single-marker analyses, multipoint analysis uses haplotype information from several markers to infer the IBD relations. We wish to temper this advantage with a cautionary note: multipoint analysis is sensitive to power loss due to misspecification of intermarker distances. Such misspecification is especially problematic when dealing with closely spaced markers. We present computer simulations comparing the power of single-point and multipoint analyses, both when IBD relations are ambiguous, and when the intermarker distances are misspecified. We conclude that when evaluating markers in a small region to confirm or refute previous findings, a situation in which p values of modest statistical significance are important, single marker analyses may provide more reliable measures of the strength of support for linkage than multipoint statistics.
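
Intermarker distances enter a multipoint analysis through a map function that converts map distance into recombination fraction. A misspecified distance therefore directly changes the recombination fractions the analysis uses. The short sketch below uses two standard map functions, Haldane and Kosambi, with distances in centimorgans. The choice of map function and the misspecification numbers are illustrative assumptions; the abstract names neither.

```python
import math

def haldane_theta(d_cm):
    """Haldane map function (no crossover interference); d_cm in centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def kosambi_theta(d_cm):
    """Kosambi map function (models positive interference); d_cm in centimorgans."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

# Hypothetical misspecification: the true gap is 0.5 cM, but 2 cM is assumed
theta_true = haldane_theta(0.5)
theta_assumed = haldane_theta(2.0)
inflation = theta_assumed / theta_true  # roughly 4x more recombination assumed
```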

11.
The calculation of multipoint likelihoods is computationally challenging, with the exact calculation of multipoint probabilities only possible on small pedigrees with many markers or large pedigrees with few markers. This paper explores the utility of calculating multipoint likelihoods using data on markers flanking a hypothesized position of the trait locus. The calculation of such likelihoods is often feasible, even on large pedigrees with missing data and complex structures. Performance characteristics of the flanking marker procedure are assessed through the calculation of multipoint heterogeneity LOD scores on data simulated for Genetic Analysis Workshop 14 (GAW14). Analysis is restricted to data on the Aipotu population on chromosomes 1, 3, and 4, where chromosomes 1 and 3 are known to contain disease loci. The flanking marker procedure performs well, even when missing data and genotyping errors are introduced.

12.
We study the problem of reconstructing haplotype configurations from genotypes on pedigree data with missing alleles under the Mendelian law of inheritance and the minimum-recombination principle, which is important for the construction of haplotype maps and for genetic linkage/association analyses. Our previous results show that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This paper presents an effective integer linear programming (ILP) formulation of the MRHC problem with missing data and a branch-and-bound strategy that uses a partial order relationship and some other special relationships among variables to decide the branching order. Nontrivial lower and upper bounds on the optimal number of recombinants are introduced at each branching node to effectively prune the search tree. When multiple solutions exist, a best haplotype configuration is selected using a maximum likelihood approach. The paper also shows for the first time how to incorporate marker interval distance into a rule-based haplotyping algorithm. Our results on simulated data show that the algorithm can recover haplotypes with 50 loci from a pedigree of size 29 in seconds on a Pentium IV computer. Its accuracy, measured as correctly recovered phase information at each marker locus, is more than 99.8% for data with no missing alleles and 98.3% for data with 20% missing alleles. A comparison with the statistical approach SimWalk2 on simulated data shows that the ILP algorithm runs much faster than SimWalk2 and, on average, reports haplotypes better than or comparable to those from the first and second runs of SimWalk2. As an application of the algorithm to real data, we present test results on reconstructing haplotypes from a genome-scale SNP dataset consisting of 12 pedigrees with 0.8% to 14.5% missing alleles.
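
The minimum-recombination principle can be made concrete for a single parent-to-child transmission. Given the parent's two phased haplotypes and the haplotype the child received, the fewest recombinants equals the fewest switches in grandparental origin along the chromosome. This small dynamic program is an illustrative sketch that assumes complete, phased data at biallelic loci. It is not the paper's ILP or branch-and-bound formulation, which handles whole pedigrees and missing alleles.

```python
def min_recombinants(parent_hap0, parent_hap1, transmitted):
    """Fewest origin switches explaining `transmitted` from the parent's two haplotypes.

    At loci where the parent is homozygous, either origin is consistent.
    Returns None if some transmitted allele matches neither parental haplotype.
    """
    INF = float("inf")
    cost = [0.0, 0.0]  # min switches so far, ending in origin 0 or origin 1
    for locus, allele in enumerate(transmitted):
        ok = (parent_hap0[locus] == allele, parent_hap1[locus] == allele)
        if locus == 0:
            cost = [0.0 if ok[0] else INF, 0.0 if ok[1] else INF]
        else:
            cost = [
                min(cost[0], cost[1] + 1) if ok[0] else INF,
                min(cost[1], cost[0] + 1) if ok[1] else INF,
            ]
    best = min(cost)
    return None if best == INF else int(best)
```

Homozygous parental loci leave the origin ambiguous. The recursion lets either origin continue there, so the switch can be placed wherever it costs least.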

13.
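Joint probability distribution of the genotypes of related individuals and its application to personal identification. Times cited: 1 (self-citations: 0; citations by others: 1)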
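Starting from joint paternal gene probabilities, we derive the joint genotype probabilities of multiple individuals in the same generation. We discuss two pedigree structures that reflect conditions in China and obtain the joint genotype probabilities among only children in the m-th generation of the same family. The corresponding method can be used to compute joint genotype probabilities among only children across multiple families and multiple generations. Two cases are presented to illustrate the application of the joint genotype probabilities of related individuals to personal identification.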
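
A common way to obtain the joint genotype probability of two relatives is to condition on the number of alleles they share IBD, weighting by the relationship's IBD coefficients (k0, k1, k2). This is a swapped-in standard technique, not necessarily the derivation used in this paper. The minimal sketch below assumes Hardy-Weinberg equilibrium, known allele frequencies, and no mutation. The function names are illustrative.

```python
from itertools import product

def _g(a, b):
    """Unordered genotype as a sorted tuple."""
    return tuple(sorted((a, b)))

def joint_genotype_prob(g1, g2, freqs, k):
    """P(G1 = g1, G2 = g2) for two relatives with IBD coefficients k = (k0, k1, k2)."""
    g1, g2 = _g(*g1), _g(*g2)
    alleles = list(freqs)
    p0 = p1 = p2 = 0.0
    # 0 alleles IBD: four independent alleles
    for a, b, c, d in product(alleles, repeat=4):
        if _g(a, b) == g1 and _g(c, d) == g2:
            p0 += freqs[a] * freqs[b] * freqs[c] * freqs[d]
    # 1 allele IBD: one shared allele s plus one independent allele each
    for s, a, b in product(alleles, repeat=3):
        if _g(s, a) == g1 and _g(s, b) == g2:
            p1 += freqs[s] * freqs[a] * freqs[b]
    # 2 alleles IBD: both individuals carry the same two alleles
    for s, t in product(alleles, repeat=2):
        if _g(s, t) == g1 and _g(s, t) == g2:
            p2 += freqs[s] * freqs[t]
    return k[0] * p0 + k[1] * p1 + k[2] * p2

def likelihood_ratio(g1, g2, freqs, k):
    """LR of the stated relationship versus unrelated individuals, k = (1, 0, 0)."""
    unrelated = joint_genotype_prob(g1, g2, freqs, (1.0, 0.0, 0.0))
    return joint_genotype_prob(g1, g2, freqs, k) / unrelated

freqs = {"A": 0.1, "B": 0.9}
lr_parent_child = likelihood_ratio(("A", "A"), ("A", "B"), freqs, (0.0, 1.0, 0.0))
```

For the alleged parent-child pair AA/AB, the ratio reduces to 1/(2 p_A), which is 5 when p_A = 0.1. In identification casework, ratios like this are typically multiplied across independent loci.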

14.
A fast, partly recursive deterministic method for calculating identity-by-descent (IBD) probabilities was developed with the objective of using IBD in quantitative trait locus (QTL) mapping. The method combined a recursive method for a single marker locus with a method to estimate IBD between sibs using multiple markers. Simulated data were used to compare the deterministic method developed in the present paper with a stochastic method (LOKI) for precision in estimating IBD probabilities and performance in the task of QTL detection with the variance component approach. This comparison was made in a variety of situations by varying family size and degree of polymorphism among marker loci. The following were observed for the deterministic method relative to MCMC: (i) it was an order of magnitude faster; (ii) its estimates of IBD probabilities agreed closely, even though it does not extract information when haplotypes are not known with certainty; (iii) the shape of the profile of the QTL test statistic as a function of location was similar, although the magnitude of the test statistic was slightly smaller; and (iv) the estimates of QTL variance were similar. It was concluded that the proposed method provides a rapid means of calculating the IBD matrix with only a small loss in precision, making it an attractive alternative to stochastic MCMC methods. Furthermore, developments in marker technology providing denser maps would enhance the relative advantage of this method.

15.
The accurate estimation of the probability of identity by descent (IBD) at loci or genome positions of interest is paramount to the genetic study of quantitative and disease resistance traits. We present a Markov chain Monte Carlo method to compute IBD probabilities between individuals conditional on DNA markers and on pedigree information. The IBDs can be obtained in a completely general pedigree at any genome position of interest, and all available marker and pedigree information is used. Each iteration consists of two steps: first, phases are sampled using the current genotypic configurations of relatives; second, crossover events are simulated conditional on phases. All founder origins and crossovers are tracked internally, so that IBD probabilities averaged over replicates are obtained rapidly. We illustrate the method with some examples. First, we show that all pedigree information should be used to obtain line origin probabilities in F2 crosses. Second, the distribution of genetic relationships between half and full sibs is analysed in both simulated data and in real data from an F2 cross in pigs.
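
Tracking founder origins can be illustrated with unconditional gene dropping at a single locus. Each founder allele receives a unique label, the labels are passed down by simulated Mendelian segregation, and IBD is read directly off the labels. This sketch ignores marker data; the paper's sampler conditions on markers and simulates crossovers between loci. The pedigree encoding is an assumption: a dict from individual to (father, mother), with founders as (None, None), non-founders having both parents listed, and parents appearing before children.

```python
import random

def gene_drop(pedigree, rng):
    """One replicate: assign each individual two founder-allele labels at one locus."""
    labels = {}
    next_label = 0
    for ind, (father, mother) in pedigree.items():
        if father is None and mother is None:
            labels[ind] = (next_label, next_label + 1)  # fresh, distinct founder alleles
            next_label += 2
        else:
            # Each parent transmits one of its two alleles with probability 1/2
            labels[ind] = (rng.choice(labels[father]), rng.choice(labels[mother]))
    return labels

def estimate_kinship(pedigree, a, b, n_reps=20000, seed=1):
    """Monte Carlo kinship: P(a random allele of a is IBD with a random allele of b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        labels = gene_drop(pedigree, rng)
        hits += rng.choice(labels[a]) == rng.choice(labels[b])
    return hits / n_reps

ped = {"f": (None, None), "m": (None, None), "s1": ("f", "m"), "s2": ("f", "m")}
sib_kinship = estimate_kinship(ped, "s1", "s2")  # close to the exact value 1/4
```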

16.
An empirical comparison of three different methods for estimating pairwise identity-by-descent (IBD) sharing at marker loci was conducted in order to quantify the resulting differences in power and localization precision in variance components-based linkage analysis. On the simulated, error-free data set examined, an increase in the accuracy of allele sharing calculation resulted in an increase in power to detect linkage. Linkage analysis based on approximate multi-marker IBD matrices computed by a Markov chain Monte Carlo approach was much more powerful than linkage analysis based on exact single-marker IBD probabilities. A "multiple two-point" approximation to true "multipoint" IBD computation was found to be roughly intermediate in power. Both multi-marker approaches were similar to each other in accuracy of localization of the quantitative trait locus and far superior to the single-marker approach. The overall conclusions of this study with respect to power are expected to also hold for different data structures and situations, even though the degree of superiority of one approach over another depends on the specific circumstances. It should be kept in mind, however, that an increase in computational accuracy is expected to go hand in hand with a decrease in robustness to various sources of errors.

17.
Abney M. Genetics. 2008;179(3):1577-1590.
Computing identity-by-descent sharing between individuals connected through a large, complex pedigree is a computationally demanding task that often cannot be done using exact methods. I present a rapid computational method for estimating, in large complex pedigrees, the probability that pairs of alleles are IBD given the single-point genotype data at that marker for all individuals. The method can be used on pedigrees of essentially arbitrary size and complexity without the need to divide the individuals into separate subpedigrees. I apply the method to qualitative-trait linkage mapping using the nonparametric sharing statistic S_pairs. The validity of the method is demonstrated via simulation studies on a 13-generation, 3028-person pedigree with 700 genotyped individuals. An analysis of an asthma data set of individuals in this pedigree finds four loci with P-values < 10^-3 that were not detected in prior analyses. The mapping method is fast and can complete analyses of approximately 150 affected individuals within this pedigree for thousands of markers in a matter of hours.

18.
Detection and Integration of Genotyping Errors in Statistical Genetics. Times cited: 15 (self-citations: 0; citations by others: 15)
Detection of genotyping errors and integration of such errors in statistical analysis are relatively neglected topics, given their importance in gene mapping. A few inopportunely placed errors, if ignored, can tremendously affect evidence for linkage. The present study takes a fresh look at the calculation of pedigree likelihoods in the presence of genotyping error. To accommodate genotyping error, we present extensions to the Lander-Green-Kruglyak deterministic algorithm for small pedigrees and to the Markov chain Monte Carlo stochastic algorithm for large pedigrees. These extensions can accommodate a variety of error models and refrain from simplifying assumptions, such as allowing at most one error per pedigree. In principle, almost any statistical genetic analysis can be performed taking errors into account, without actually correcting or deleting suspect genotypes. Three examples illustrate the possibilities. These examples make use of the full pedigree data, multiple linked markers, and a prior error model. The first example is the estimation of genotyping error rates from pedigree data. The second (and currently most useful) example is the computation of posterior mistyping probabilities. These probabilities cover both Mendelian-consistent and Mendelian-inconsistent errors. The third example is the selection of the true pedigree structure connecting a group of people from among several competing pedigree structures. Paternity testing and twin zygosity testing are typical applications.
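
In its simplest form, a posterior mistyping probability is Bayes' rule applied to one genotype. This sketch assumes a single biallelic locus and a child whose parents' genotypes are known without error, so the prior comes from Mendelian transmission. It also assumes a hypothetical error model in which an observed genotype is wrong with probability `eps`, spread evenly over the two other genotypes. The extensions described above handle full pedigrees, linked markers, and more general error models.

```python
def child_prior(g_father, g_mother):
    """Mendelian distribution of the child's genotype (copies of allele 1: 0, 1, 2)."""
    pf, pm = g_father / 2.0, g_mother / 2.0  # prob. each parent transmits allele 1
    return [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm]

def posterior_error(observed, g_father, g_mother, eps):
    """P(the observed child genotype is a mistyping | observation, parents)."""
    prior = child_prior(g_father, g_mother)

    def p_obs(true):
        # Correct call with prob. 1 - eps; otherwise uniform over the other two genotypes
        return 1.0 - eps if true == observed else eps / 2.0

    joint = [prior[t] * p_obs(t) for t in range(3)]
    return sum(joint[t] for t in range(3) if t != observed) / sum(joint)
```

A Mendelian-inconsistent observation, such as genotype 2 in a child of two 0/0 parents, receives posterior error probability 1. A consistent observation receives a small but nonzero probability that depends on `eps`.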

19.
Stewart WC, Thompson EA. Biometrics. 2006;62(3):728-734.
As a result of previous large, multipoint linkage studies there is a substantial amount of existing marker data. Due to the increased sample size, genetic maps estimated from these data could be more accurate than publicly available maps. However, current methods for map estimation are restricted to data sets containing pedigrees with a small number of individuals, or cannot make full use of marker data that are observed at several loci on members of large, extended pedigrees. In this article, a maximum likelihood (ML) method for map estimation that can make full use of the marker data in a large, multipoint linkage study is described. The method is applied to replicate sets of simulated marker data involving seven linked loci, and pedigree structures based on the real multipoint linkage study of Abkevich et al. (2003, American Journal of Human Genetics 73, 1271-1281). The variance of the ML estimate is accurately estimated, and tests of both simple and composite null hypotheses are performed. An efficient procedure for combining map estimates over data sets is also suggested.

20.
Meuwissen TH, Goddard ME. Genetics. 2007;176(4):2551-2560.
A novel multipoint method, based on an approximate coalescence approach, to analyze multiple linked markers is presented. Unlike other approximate coalescence methods, it considers all markers simultaneously but only two haplotypes at a time. We demonstrate the use of this method for linkage disequilibrium (LD) mapping of QTL and estimation of effective population size. The method estimates identity-by-descent (IBD) probabilities between pairs of marker haplotypes. Both LD and combined linkage and LD mapping rely on such IBD probabilities. The method is approximate in that it considers only the information on a pair of haplotypes, whereas a full modeling of the coalescence process would simultaneously consider all haplotypes. However, full coalescence modeling is computationally feasible only for few linked markers. Using simulations of the coalescence process, the method is shown to give almost unbiased estimates of the effective population size. Compared to direct marker and haplotype association analyses, IBD-based QTL mapping clearly showed higher power to detect a QTL and a more realistic confidence interval for its position. The modeling of LD could be extended to estimate other LD-related parameters such as recombination rates.

