Similar articles
 Found 20 similar articles (search time: 62 ms)
1.
Thomas SC, Hill WG. Genetics 2000, 155(4): 1961-1972
Previous techniques for estimating quantitative genetic parameters, such as heritability in populations where exact relationships are unknown but are instead inferred from marker genotypes, have used data from individuals on a pairwise level only. At this level, families are weighted according to the number of pairs within which each family appears, hence by size rather than information content, and information from multiple relationships is lost. Estimates of parameters are therefore not the most efficient achievable. Here, Markov chain Monte Carlo techniques have been used to partition the population into complete sibships, including, if known, prior knowledge of the distribution of family sizes. These pedigrees have then been used with restricted maximum likelihood under an animal model to estimate quantitative genetic parameters. Simulations to compare the properties of parameter estimates with those of existing techniques indicate that the use of sibship reconstruction is superior to earlier methods, having lower mean square errors and showing nonsignificant downward bias. In addition, sibship reconstruction allows the estimation of population allele frequencies that account for the relationships within the sample, so prior knowledge of allele frequencies need not be assumed. Extensions to these techniques allow reconstruction of half sibships when some or all of the maternal genotypes are known.

2.
Can we find the family trees, or pedigrees, that relate the haplotypes of a group of individuals? Collecting the genealogical information for how individuals are related is a very time-consuming and expensive process. Methods for automating the construction of pedigrees could streamline this process. While constructing single-generation families is relatively easy given whole genome data, reconstructing multi-generational, possibly inbred, pedigrees is much more challenging. This article addresses the important question of reconstructing monogamous, regular pedigrees, where pedigrees are regular when individuals mate only with other individuals at the same generation. This article introduces two multi-generational pedigree reconstruction methods: one for inbreeding relationships and one for outbreeding relationships. In contrast to previous methods that focused on the independent estimation of relationship distances between every pair of typed individuals, here we present methods that aim at the reconstruction of the entire pedigree. We show that both our methods outperform the state-of-the-art and that the outbreeding method is capable of reconstructing pedigrees at least six generations back in time with high accuracy. The two programs are available at http://cop.icsi.berkeley.edu/cop/.

3.
Using genetic marker data, we have developed a general methodology for estimating genetic relationships between a set of individuals. The purpose of this paper is to illustrate the practical utility of these methods as applied to the problem of paternity testing. Bayesian methods are used to compute the posterior probability distribution of the genetic relationship parameters. Use of an interval-estimation approach rather than a hypothesis-testing one avoids the problem of the specification of an appropriate null hypothesis in calculating the probability of paternity. Monte Carlo methods are used to evaluate the utility of two sets of genetic markers in obtaining suitably precise estimates of genetic relationship as well as the effect of the prior distribution chosen. Results indicate that with currently available markers a "true" father may be reliably distinguished from any other genetic relationship to the child and that with a reasonable number of markers one can often discriminate between an unrelated individual and one with a second-degree relationship to the child.

4.
Methods for detecting genetic linkage are more powerful when they fully use all of the data collected from pedigrees. We first discuss a method for obtaining the probability that a pedigree member has a given genotype, conditional on the phenotypes of his relatives. We then develop a rapid method to obtain the conditional probabilities of identity-by-descent sharing of marker alleles for all related pairs of individuals from extended pedigrees. The method assumes that the individuals are noninbred and that the relationship between genotype and phenotype is known for the marker locus studied. The probabilities of identity-by-descent sharing among relative pairs, conditional on marker phenotype information, can then be used in any of the model-free tests for linkage between a trait locus and a marker locus.

5.
Summary: We examine situations where interest lies in the conditional association between outcome and exposure variables, given potential confounding variables. Concern arises that some potential confounders may not be measured accurately, whereas others may not be measured at all. Some form of sensitivity analysis might be employed, to assess how this limitation in available data impacts inference. A Bayesian approach to sensitivity analysis is straightforward in concept: a prior distribution is formed to encapsulate plausible relationships between unobserved and observed variables, and posterior inference about the conditional exposure–disease relationship then follows. In practice, though, it can be challenging to form such a prior distribution in both a realistic and simple manner. Moreover, it can be difficult to develop an attendant Markov chain Monte Carlo (MCMC) algorithm that will work effectively on a posterior distribution arising from a highly nonidentified model. In this article, a simple prior distribution for acknowledging both poorly measured and unmeasured confounding variables is developed. It requires that only a small number of hyperparameters be set by the user. Moreover, a particular computational approach for posterior inference is developed, because application of MCMC in a standard manner is seen to be ineffective in this problem.

6.
The coefficient of relationship is defined as the correlation between the additive genetic values of two individuals. This coefficient can be defined specifically for a single quantitative trait locus (QTL) and may deviate considerably from the overall expectation if it is taken conditional on information from linked marker loci. Conditional half-sib correlations are derived under a simple genetic model with a biallelic QTL linked to a biallelic marker locus. The conditional relationship coefficients are shown to depend on the recombination rate between the marker and the QTL and the population frequency of the marker alleles, but not on parameters of the QTL (i.e. number and frequency of QTL alleles, degree of dominance, etc.), nor on the (usually unknown) QTL genotype of the sire. Extensions to less simplified cases (multiple alleles at the marker locus and the QTL, two marker loci flanking the QTL) are given. For arbitrary pedigrees, conditional relationship coefficients can also be derived from the conditional gametic covariance matrix suggested by Fernando and Grossman (1989). The connection of these two approaches is discussed. The conditional relationship coefficient can be used for marker-assisted genetic evaluation as well as for the detection of QTL and the estimation of their effects.

7.
Best linear unbiased allele-frequency estimation in complex pedigrees   (Total citations: 4; self-citations: 0; citations by others: 4)
McPeek MS, Wu X, Ober C. Biometrics 2004, 60(2): 359-367
Many types of genetic analyses depend on estimates of allele frequencies. We consider the problem of allele-frequency estimation based on data from related individuals. The motivation for this work is data collected on the Hutterites, an isolated founder population, so we focus particularly on the case in which the relationships among the sampled individuals are specified by a large, complex pedigree for which maximum likelihood estimation is impractical. For this case, we propose to use the best linear unbiased estimator (BLUE) of allele frequency. We derive this estimator, which is equivalent to the quasi-likelihood estimator for this problem, and we describe an efficient algorithm for computing the estimate and its variance. We show that our estimator has certain desirable small-sample properties in common with the maximum likelihood estimator (MLE) for this problem. We treat both the case when parental origin of each allele is known and when it is unknown. The results are extended to prediction of allele frequency in some set of individuals S based on genotype data collected on a set of individuals R. We compare the mean-squared error of the BLUE, the commonly used naive estimator (sample frequency) and the MLE when the latter is feasible to calculate. The results indicate that although the MLE performs the best of the three, the BLUE is close in performance to the MLE and is substantially easier to calculate, making it particularly useful for large complex pedigrees in which MLE calculation is impractical or infeasible. We apply our method to allele-frequency estimation in a Hutterite data set.
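
Editorial illustration: the idea of a best linear unbiased estimator for a common mean can be sketched with generalized least squares, assuming the covariance of the per-individual allele dosages is proportional to a known relationship matrix K. Under that assumption the estimate is (1'K^-1 1)^-1 1'K^-1 Y. The function names, the three-person example and the matrix values below are hypothetical and are not taken from the paper; the paper's own efficient algorithm for large pedigrees is not reproduced here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def blue_allele_frequency(y, k):
    """GLS estimate of a common mean: (1' K^-1 1)^-1 1' K^-1 y.

    y[i] is half the number of copies of the allele carried by individual i.
    k is an assumed relationship matrix (twice the kinship coefficient
    off the diagonal, 1 on the diagonal for noninbred individuals).
    """
    w = solve(k, [1.0] * len(y))  # w = K^-1 1; K is symmetric, so 1' K^-1 y = w . y
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Hypothetical sample: two full sibs (relationship 0.5) and one unrelated person.
K = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
Y = [1.0, 0.5, 0.0]  # genotypes AA, Aa, aa

naive = sum(Y) / len(Y)
blue = blue_allele_frequency(Y, K)
print(naive, blue)
```

In this toy sample the two sibs are down-weighted relative to the unrelated individual, so the estimate moves from the naive sample frequency of 0.5 towards the unrelated individual's dosage.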

8.
Knowledge of relatedness between pairs of individuals plays an important role in many research areas including evolutionary biology, quantitative genetics, and conservation. Pairwise relatedness estimation methods based on genetic data from highly variable molecular markers are now used extensively as a substitute for pedigrees. Although the sampling variance of the estimators has been intensively studied for the most common simple genetic relationships, such as unrelated, half- and full-sib, or parent-offspring, little attention has been paid to the average performance of the estimators, by which we mean the performance across all pairs of individuals in a sample. Here we apply two measures to quantify the average performance: first, misclassification rates between pairs of genetic relationships and, second, the proportion of variance explained in the pairwise relatedness estimates by the true population relatedness composition (i.e., the frequencies of different relationships in the population). Using simulated data derived from exceptionally good quality marker and pedigree data from five long-term projects of natural populations, we demonstrate that the average performance depends mainly on the population relatedness composition and may be improved by the marker data quality only within the limits of the population relatedness composition. Our five examples of vertebrate breeding systems suggest that due to the remarkably low variance in relatedness across the population, marker-based estimates may often have low power to address research questions of interest.

9.
Inferring the parentage of a sample of individuals is often a prerequisite for many types of analysis in molecular ecology, evolutionary biology and quantitative genetics. In all but a few cases, the method of parentage assignment is divorced from the methods used to estimate the parameters of primary interest, such as mate choice or heritability. Here we present a Bayesian approach that simultaneously estimates the parentage of a sample of individuals and a wide range of population-level parameters in which we are interested. We show that joint estimation of parentage and population-level parameters increases the power of parentage assignment, reduces bias in parameter estimation, and accurately evaluates uncertainty in both. We illustrate the method by analysing a number of simulated test data sets, and through a re-analysis of parentage in the Seychelles warbler, Acrocephalus sechellensis. A combination of behavioural, spatial and genetic data is used in the analyses and, importantly, the method does not require strong prior information about the relationship between nongenetic data and parentage.

10.
Sun L, Wilder K, McPeek MS. Human Heredity 2002, 54(2): 99-110
Accurate information on the relationships among individuals in a study is critical for valid linkage analysis. We extend the MLLR, EIBD, AIBS and IBS tests for detection of misspecified relationships to a broader range of relative pairs, and we improve the two-stage screening procedure for analyzing large data sets. We have developed software, PREST, which calculates the test statistics and performs the corresponding hypothesis tests for relationship misclassification in general outbred pedigrees. When a potential pedigree error is detected, our companion program, ALTERTEST, can be used to determine which relationships are compatible with the genotype data. Both programs are now freely available on the web.

11.
An extraordinarily large number of single nucleotide polymorphisms (SNPs) are now available in humans as well as in other model organisms. Technological advancements may soon make it feasible to assay hundreds of SNPs in virtually any organism of interest. One potential application of SNPs is the determination of pairwise genetic relationships in populations without known pedigrees. Although microsatellites are currently the marker of choice for this purpose, the number of independently segregating microsatellite markers that can be feasibly assayed is limited. Thus, it can be difficult to distinguish reliably some classes of relationship (e.g. full-sibs from half-sibs) with microsatellite data alone. We assess, via Monte Carlo computer simulation, the potential for using a large panel of independently segregating SNPs to infer genetic relationships, following the analytical approach of Blouin et al. (1996). We have explored a 'best case scenario' in which 100 independently segregating SNPs are available. For discrimination among single-generation relationships or for the identification of parent-offspring pairs, it appears that such a panel of moderately polymorphic SNPs (minor allele frequency of 0.20) will provide discrimination power equivalent to only 16-20 independently segregating microsatellites. Although newly available analytical methods that can account for tight genetic linkage between markers will, in theory, allow improved estimation of relationships using thousands of SNPs in highly dense genomic scans, in practice such studies will only be feasible in a handful of model organisms. Given the comparable amount of effort required for the development of both types of markers, it seems that microsatellites will remain the marker of choice for relationship estimation in nonmodel organisms, at least for the foreseeable future.

12.
Analysing the impact of anthropogenic and natural river barriers on the dispersal of aquatic and semi-aquatic species may be critical for their conservation. Knowledge of kinship relationships between individuals and reconstructions of pedigrees obtained using genomic data can be extremely useful, not only for studying the social organization of animals, but also inferring contemporary dispersal and quantifying the effect of specific barriers on current connectivity. In this study, we used kinship data to analyse connectivity patterns in a small semi-aquatic mammal, the Pyrenean desman (Galemys pyrenaicus), in an area comprising two river systems with close headwaters and dams of various heights and types. Using a large SNP dataset from 70 specimens, we obtained kinship categories and reconstructed pedigrees. To quantify the barrier effect of specific obstacles, we built kinship networks and devised a method based on the assortativity coefficient, which measures the proportion between observed and expected kinship relationships across a barrier. The estimation of this parameter enabled us to infer that the most important barrier in the area was the watershed divide between the rivers, followed by a dam on one of the rivers. Other barriers did not significantly reduce the expected number of kinship relationships across them. This strategy and the information obtained with it may be crucial in determining the most important connectivity problems in an area and help develop conservation plans aimed at improving genetic exchange between populations of threatened species.
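
Editorial illustration: the comparison of observed and expected kinship links across a barrier can be sketched with a plain observed-to-expected ratio. This is a simplified stand-in, not the authors' assortativity-based estimator. It assumes that, without a barrier effect, kin pairs would cross the barrier in the same proportion as all possible pairs of sampled individuals. The function name, the labels and the toy data are hypothetical.

```python
from itertools import combinations


def cross_barrier_ratio(side, kin_pairs):
    """Ratio of observed to expected kin pairs that span a barrier.

    side maps each individual to the side of the barrier where it was sampled.
    kin_pairs lists pairs of individuals inferred to be close kin.
    The expectation assumes kin pairs cross the barrier as often as
    randomly chosen pairs of sampled individuals do.
    """
    all_pairs = list(combinations(sorted(side), 2))
    crossing_fraction = sum(side[a] != side[b] for a, b in all_pairs) / len(all_pairs)
    observed = sum(side[a] != side[b] for a, b in kin_pairs)
    expected = len(kin_pairs) * crossing_fraction
    return observed / expected  # undefined if no possible pair crosses the barrier


# Toy data: three individuals sampled on each side of a dam.
side = {"a": "up", "b": "up", "c": "up", "d": "down", "e": "down", "f": "down"}
kin_pairs = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("c", "d")]

ratio = cross_barrier_ratio(side, kin_pairs)
print(ratio)
```

A ratio well below 1 means fewer kin pairs span the barrier than random mixing would produce. Judging whether such a reduction is significant needs a resampling or permutation step, which this sketch omits.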

13.
Abney M. Genetics 2008, 179(3): 1577-1590
Computing identity-by-descent sharing between individuals connected through a large, complex pedigree is a computationally demanding task that often cannot be done using exact methods. What I present here is a rapid computational method for estimating, in large complex pedigrees, the probability that pairs of alleles are IBD given the single-point genotype data at that marker for all individuals. The method can be used on pedigrees of essentially arbitrary size and complexity without the need to divide the individuals into separate subpedigrees. I apply the method to do qualitative trait linkage mapping using the nonparametric sharing statistic $S_{\mathrm{pairs}}$. The validity of the method is demonstrated via simulation studies on a 13-generation 3028-person pedigree with 700 genotyped individuals. An analysis of an asthma data set of individuals in this pedigree finds four loci with P-values $< 10^{-3}$ that were not detected in prior analyses. The mapping method is fast and can complete analyses of approximately 150 affected individuals within this pedigree for thousands of markers in a matter of hours.

14.
Use of variance-component estimation for mapping of quantitative-trait loci in humans is a subject of great current interest. When only trait values, not genotypic information, are considered, variance-component estimation can also be used to estimate heritability of a quantitative trait. Inbred pedigrees present special challenges for variance-component estimation. First, there are more variance components to be estimated in the inbred case, even for a relatively simple model including additive, dominance, and environmental effects. Second, more identity coefficients need to be calculated from an inbred pedigree in order to perform the estimation, and these are computationally more difficult to obtain in the inbred than in the outbred case. As a result, inbreeding effects have generally been ignored in practice. We describe here the calculation of identity coefficients and estimation of variance components of quantitative traits in large inbred pedigrees, using the example of HDL in the Hutterites. We use a multivariate normal model for the genetic effects, extending the central-limit theorem of Lange to allow for both inbreeding and dominance under the assumptions of our variance-component model. We use simulated examples to give an indication of under what conditions one has the power to detect the additional variance components and to examine their impact on variance-component estimation. We discuss the implications for mapping and heritability estimation by use of variance components in inbred populations.

15.
Molecular marker data collected from natural populations allows information on genetic relationships to be established without referencing an exact pedigree. Numerous methods have been developed to exploit the marker data. These fall into two main categories: method of moment estimators and likelihood estimators. Method of moment estimators are essentially unbiased, but utilise weighting schemes that are only optimal if the analysed pair is unrelated. Thus, they differ in their efficiency at estimating parameters for different relationship categories. Likelihood estimators show smaller mean squared errors but are much more biased. Both types of estimator have been used in variance component analysis to estimate heritability. All marker-based heritability estimators require that adequate levels of the true relationship be present in the population of interest and that adequate amounts of informative marker data are available. I review the different approaches to relationship estimation, with particular attention to optimizing the use of this relationship information in subsequent variance component estimation.

16.
The accurate estimation of the probability of identity by descent (IBD) at loci or genome positions of interest is paramount to the genetic study of quantitative and disease resistance traits. We present a Monte Carlo Markov Chain method to compute IBD probabilities between individuals conditional on DNA markers and on pedigree information. The IBDs can be obtained in a completely general pedigree at any genome position of interest, and all marker and pedigree information available is used. The method can be split into two steps at each iteration. First, phases are sampled using current genotypic configurations of relatives and second, crossover events are simulated conditional on phases. Internal track is kept of all founder origins and crossovers such that the IBD probabilities averaged over replicates are rapidly obtained. We illustrate the method with some examples. First, we show that all pedigree information should be used to obtain line origin probabilities in F2 crosses. Second, the distribution of genetic relationships between half and full sibs is analysed in both simulated data and in real data from an F2 cross in pigs.

17.
Detection and Integration of Genotyping Errors in Statistical Genetics   (Total citations: 15; self-citations: 0; citations by others: 15)
Detection of genotyping errors and integration of such errors in statistical analysis are relatively neglected topics, given their importance in gene mapping. A few inopportunely placed errors, if ignored, can tremendously affect evidence for linkage. The present study takes a fresh look at the calculation of pedigree likelihoods in the presence of genotyping error. To accommodate genotyping error, we present extensions to the Lander-Green-Kruglyak deterministic algorithm for small pedigrees and to the Markov-chain Monte Carlo stochastic algorithm for large pedigrees. These extensions can accommodate a variety of error models and refrain from simplifying assumptions, such as allowing, at most, one error per pedigree. In principle, almost any statistical genetic analysis can be performed taking errors into account, without actually correcting or deleting suspect genotypes. Three examples illustrate the possibilities. These examples make use of the full pedigree data, multiple linked markers, and a prior error model. The first example is the estimation of genotyping error rates from pedigree data. The second (and currently most useful) example is the computation of posterior mistyping probabilities. These probabilities cover both Mendelian-consistent and Mendelian-inconsistent errors. The third example is the selection of the true pedigree structure connecting a group of people from among several competing pedigree structures. Paternity testing and twin zygosity testing are typical applications.

18.
A heuristic algorithm for finding gene transmission patterns on large and complex pedigrees with partially observed genotype data is proposed. The method can be used to generate an initial point for a Markov chain Monte Carlo simulation or to check that the given pedigree and the genotype data are consistent. In small pedigrees, the algorithm is exact by exhaustively enumerating all possibilities, but, in large pedigrees, with a considerable amount of unknown data, only a subset of promising configurations can actually be checked. For that purpose, the configurations are ordered by combining the approximative conditional probability distribution of the unknown genotypes with the information on the relationships between individuals. We also introduce a way to divide the task into subparts, which has been shown to be useful in large pedigrees. The algorithm has been implemented in a program called APE (Allelic Path Explorer) and tested in three different settings with good results.

19.
This paper investigates effects on lod scores when one individual in a data set changes diagnostic or recombinant status. First we examine the situation in which a single offspring in a nuclear family changes status. The nuclear-family situation, in addition to being of interest in its own right, also has general theoretical importance, since nuclear families are "transparent"; that is, one can track genetic events more precisely in nuclear families than in complex pedigrees. We demonstrate that in nuclear families $\log_{10}[(1-\theta)/\theta]$ gives an upper limit on the impact that a single offspring's change in status can have on the lod score at that recombination fraction ($\theta$). These limits hold for a fully penetrant dominant condition and fully informative marker, in either phase-known or phase-unknown matings. Moreover, $\log_{10}[(1-\hat{\theta})/\hat{\theta}]$ (where $\hat{\theta}$ denotes the value of $\theta$ at which Zmax occurs) gives an upper limit on the impact of a single offspring's status change on the maximum lod score (Zmax). In extended pedigrees, in contrast to nuclear families, no comparable limit can be set on the impact of a single individual on the lod score. Complex pedigrees are subject to both stabilizing and destabilizing influences, and these are described. Finally, we describe a "sensitivity analysis," in which, after all linkage analysis is completed, every informative individual in the data set is changed, one at a time, to see the effect which each separate change has on the lod scores. The procedure includes identifying "critical individuals," i.e., those who would have the greatest impact on the lod scores, should their diagnostic status in fact change. To illustrate use of the sensitivity analysis, we apply it to the large bipolar pedigree reported by Egeland et al. and Kelsoe et al. We show that the changes in lod scores observed there, on the order of 1.1-1.2 per person, are not unusual. We recommend that investigators include a sensitivity analysis as a standard part of reporting the results of a linkage analysis.
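
Editorial illustration: the stated limit can be checked numerically in the simplest setting, a phase-known mating with a fully informative marker, where the lod score for k recombinants among n offspring is log10 of theta^k (1 - theta)^(n - k) divided by 0.5^n. This is a minimal sketch under those assumptions; the function names and the numbers are hypothetical, and the phase-unknown case (which averages over the two phases) is not covered.

```python
import math


def lod_phase_known(n, k, theta):
    """Lod score for k recombinants among n scored offspring, phase known.

    Assumes a fully informative marker and 0 < theta <= 0.5.
    """
    return (k * math.log10(theta)
            + (n - k) * math.log10(1.0 - theta)
            - n * math.log10(0.5))


def single_change_limit(theta):
    """Upper limit on the lod change caused by one offspring changing status."""
    return math.log10((1.0 - theta) / theta)


theta = 0.1
before = lod_phase_known(10, 1, theta)  # 1 recombinant among 10 offspring
after = lod_phase_known(10, 2, theta)   # one nonrecombinant rescored as recombinant

change = abs(after - before)
limit = single_change_limit(theta)
print(change, limit)
```

In this phase-known model the change equals the limit exactly, because rescoring one offspring swaps a factor of (1 - theta) for a factor of theta in the likelihood.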

20.
Large-scale, multilocus genetic association studies require powerful and appropriate statistical-analysis tools that are designed to relate genotype and haplotype information to phenotypes of interest. Many analysis approaches consider relating allelic, haplotypic, or genotypic information to a trait through use of extensions of traditional analysis techniques, such as contingency-table analysis, regression methods, and analysis-of-variance techniques. In this work, we consider a complementary approach that involves the characterization and measurement of the similarity and dissimilarity of the allelic composition of a set of individuals' diploid genomes at multiple loci in the regions of interest. We describe a regression method that can be used to relate variation in the measure of genomic dissimilarity (or "distance") among a set of individuals to variation in their trait values. Weighting factors associated with functional or evolutionary conservation information of the loci can be used in the assessment of similarity. The proposed method is very flexible and is easily extended to complex multilocus-analysis settings involving covariates. In addition, the proposed method actually encompasses both single-locus and haplotype-phylogeny analysis methods, which are two of the most widely used approaches in genetic association analysis. We showcase the method with data described in the literature. Ultimately, our method is appropriate for high-dimensional genomic data and anticipates an era when cost-effective exhaustive DNA sequence data can be obtained for a large number of individuals, over and above genotype information focused on a few well-chosen loci.
