Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The calculation of multipoint likelihoods of pedigree data is crucial for extracting the full available information needed for both parametric and nonparametric linkage analysis. Recent mathematical advances in both the Elston-Stewart and Lander-Green algorithms for computing exact multipoint likelihoods of pedigree data have enabled researchers to analyze data sets containing more markers and more individuals both faster and more efficiently. This paper presents novel algorithms that further extend the computational boundary of the Elston-Stewart algorithm. They have been implemented in the software package VITESSE v. 2 and are shown to be several orders of magnitude faster than the original implementation of the Elston-Stewart algorithm in VITESSE v. 1 on a variety of real pedigree data. VITESSE v. 2 was faster by a factor ranging from 168 to over 1,700 on these data sets, thus making a qualitative difference in the analysis. The main algorithm is based on the faster computation of the conditional probability of a component nuclear family within the pedigree by summing over the joint genotypes of the children instead of the parents, as is done in VITESSE v. 1. This change in summation allows the parent-child transmission part of the calculation to be computed not only for each parent separately, but also for each locus separately, by using inheritance vectors as is done in the Lander-Green algorithm. Computing both of these separately can lead to substantial computational savings. The use of inheritance vectors in the nuclear family calculation represents a partial synthesis of the techniques of the Lander-Green algorithm into the Elston-Stewart algorithm. In addition, the technique of local set recoding is introduced to further reduce the complexity of the nuclear family computation. These new algorithms, however, are not universally faster on all types of pedigree data compared to the method implemented in VITESSE v. 1 of summing over the parents. 
Therefore, a hybrid algorithm is introduced which combines the strength of both summation methods by using a numerical heuristic to decide which of the two to use for a given nuclear family within the pedigree and is shown to be faster than either method on its own. Finally, this paper discusses various complexity issues regarding both the Elston-Stewart and Lander-Green algorithms and possible future directions of further synthesis.

2.
The Ts65Dn mouse is trisomic for orthologs of about half the genes on Hsa21. A number of phenotypes in these trisomic mice parallel those in humans with trisomy 21 (Down syndrome), including cognitive deficits due to hippocampal malfunction that are sufficiently similar to those in humans that “therapies” developed in Ts65Dn mice are making their way to human clinical trials. However, the impact of the model is limited by availability. Ts65Dn cannot be completely inbred and males are generally considered to be sterile. Females have few, small litters and they exhibit poor care of offspring, frequently abandoning entire litters. Here we report identification and selective breeding of rare fertile males from two working colonies of Ts65Dn mice. Trisomic offspring can be propagated by natural matings or by in vitro fertilization (IVF) to produce large cohorts of closely related siblings. The use of a robust euploid strain as recipients of fertilized embryos in IVF or as the female in natural matings greatly improves husbandry. Extra zygotes cultured to the blastocyst stage were used to create trisomic and euploid embryonic stem (ES) cells from littermates. We developed parameters for cryopreserving sperm from Ts65Dn males and used it to produce trisomic offspring by IVF. Use of cryopreserved sperm provides additional flexibility in the choice of oocyte donors from different genetic backgrounds, facilitating rapid production of complex crosses. This approach greatly increases the power of this important trisomic model to interrogate modifying effects of trisomic or disomic genes that contribute to trisomic phenotypes.

3.
The calculation of multipoint likelihoods is computationally challenging, with the exact calculation of multipoint probabilities only possible on small pedigrees with many markers or large pedigrees with few markers. This paper explores the utility of calculating multipoint likelihoods using data on markers flanking a hypothesized position of the trait locus. The calculation of such likelihoods is often feasible, even on large pedigrees with missing data and complex structures. Performance characteristics of the flanking marker procedure are assessed through the calculation of multipoint heterogeneity LOD scores on data simulated for Genetic Analysis Workshop 14 (GAW14). Analysis is restricted to data on the Aipotu population on chromosomes 1, 3, and 4, where chromosomes 1 and 3 are known to contain disease loci. The flanking marker procedure performs well, even when missing data and genotyping errors are introduced.

4.
Sib-pair analysis is an increasingly important tool for genetic dissection of complex traits. Current methods for sib-pair analysis are primarily based on studying individual genetic markers one at a time and thus fail to use the full inheritance information provided by multipoint linkage analysis. In this paper, we describe how to extract the complete multipoint inheritance information for each sib pair. We then describe methods that use this information to map loci affecting traits, thereby providing a unified approach to both qualitative and quantitative traits. Specifically, complete multipoint approaches are presented for (1) exclusion mapping of qualitative traits; (2) maximum-likelihood mapping of qualitative traits; (3) information-content mapping, showing the extent to which all inheritance information has been extracted at each location in the genome; and (4) quantitative-trait mapping, by two parametric methods and one nonparametric method. In addition, we explore the effects of marker density, marker polymorphism, and availability of parents on the information content of a study. We have implemented the analysis methods in a new computer package, MAPMAKER/SIBS. With this computer package, complete multipoint analysis with dozens of markers in hundreds of sib pairs can be carried out in minutes.

5.
Model-free linkage analysis using likelihoods.   (Cited 6 times: 2 self-citations, 4 citations by others)
Misspecification of transmission model parameters can produce artifactually negative lod scores at small recombination fractions and in multipoint analysis. To avoid this problem, we have tried to devise a test that aims to detect a genetic effect at a particular locus, rather than attempting to estimate the map position of a locus with specified effect. Maximizing likelihoods over transmission model parameters, as well as linkage parameters, can produce seriously biased parameter estimates and so yield tests that lack power for the detection of linkage. However, constraining the transmission model parameters to produce the correct population prevalence largely avoids this problem. For computational convenience, we recommend that the likelihoods under linkage and non-linkage be independently maximized over a limited set of transmission models, ranging from Mendelian dominant to null effect and from null effect to Mendelian recessive. In order to test for a genetic effect at a given map position, the likelihood under linkage is maximized over admixture, the proportion of families linked. Application to simulated data for a wide range of transmission models in both affected sib pairs and pedigrees demonstrates that the new method is well behaved under the null hypothesis and provides a powerful test for linkage when it is present. This test requires no specification of transmission model parameters, apart from an approximate estimate of the population prevalence. It can be applied equally to sib pairs and pedigrees, and, since it does not diminish the lod score at test positions very close to a marker, it is suitable for application to multipoint data.
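The step of maximizing the likelihood over admixture (the proportion of families linked) can be illustrated with a minimal sketch. It assumes the standard admixture form, in which each family's likelihood ratio is mixed with the unlinked value of 1, and it assumes a simple grid search; the function names and the example per-family lod scores are hypothetical and are not taken from the paper.

```python
import math

def hlod(family_lods, alpha):
    """Heterogeneity lod score for a fixed admixture proportion alpha in [0, 1].

    Each family contributes log10(alpha * LR_i + (1 - alpha)), where
    LR_i = 10 ** lod_i is that family's likelihood ratio under linkage.
    """
    return sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha)) for lod in family_lods)

def max_hlod(family_lods, steps=100):
    """Grid-search alpha over [0, 1] and return (best_alpha, best_hlod)."""
    best_alpha, best = 0.0, 0.0  # alpha = 0 always gives a score of exactly 0
    for i in range(steps + 1):
        alpha = i / steps
        value = hlod(family_lods, alpha)
        if value > best:
            best_alpha, best = alpha, value
    return best_alpha, best

# Hypothetical per-family lod scores at one test position.
example_lods = [1.2, 0.8, -1.5, -0.9]
alpha_hat, score = max_hlod(example_lods)
print(f"estimated proportion linked = {alpha_hat:.2f}, maximized score = {score:.3f}")
```

Because alpha = 0 is always allowed, the maximized score in this sketch can never be negative, which is one reason admixture-based statistics do not penalize a position for a few strongly unlinked families.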

6.
In complex disease studies, it is crucial to perform multipoint linkage analysis with many markers and to use robust nonparametric methods that take account of all pedigree information. Currently available methods fall short in both regards. In this paper, we describe how to extract complete multipoint inheritance information from general pedigrees of moderate size. This information is captured in the multipoint inheritance distribution, which provides a framework for a unified approach to both parametric and nonparametric methods of linkage analysis. Specifically, the approach includes the following: (1) Rapid exact computation of multipoint LOD scores involving dozens of highly polymorphic markers, even in the presence of loops and missing data. (2) Non-parametric linkage (NPL) analysis, a powerful new approach to pedigree analysis. We show that NPL is robust to uncertainty about mode of inheritance, is much more powerful than commonly used nonparametric methods, and loses little power relative to parametric linkage analysis. NPL thus appears to be the method of choice for pedigree studies of complex traits. (3) Information-content mapping, which measures the fraction of the total inheritance information extracted by the available marker data and points out the regions in which typing additional markers is most useful. (4) Maximum-likelihood reconstruction of many-marker haplotypes, even in pedigrees with missing data. We have implemented NPL analysis, LOD-score computation, information-content mapping, and haplotype reconstruction in a new computer package, GENEHUNTER. The package allows efficient multipoint analysis of pedigree data to be performed rapidly in a single user-friendly environment.

7.
Model misspecification and multipoint linkage analysis.   (Cited 9 times: 0 self-citations, 9 citations by others)
Pairwise linkage analysis is robust to genetic model misspecification provided dominance is correctly specified, the primary effect being inflation of the recombination fraction. By contrast, we show that multipoint analysis under misspecified models is not robust when a putative disease locus is placed between close flanking markers, with potentially spuriously negative multipoint lod scores being produced. The problem is due to incorrect attribution of segregation of a disease allele and the consequent conclusion of (unlikely) double crossovers between flanking markers. As a possible solution, we propose the use of high disease allele frequencies, as this allows probabilistically for nonsegregation (through parental homozygosity or dual matings). We show analytically and through analysis of pedigree data simulated under a two-locus heterogeneity model that using a disease allele frequency of 0.05 in the dominant case and 0.25 in the recessive case is quite robust in producing positive multipoint lod scores with close flanking markers across a broad range of conditions including varying allele frequencies, epistasis, genetic heterogeneity and phenocopies.

8.
Complete pedigree information is a prerequisite for modern breeding and the ranking of parents and offspring for selection and deployment decisions. DNA fingerprinting and pedigree reconstruction can substitute for artificial matings, by allowing parentage delineation of naturally produced offspring. Here, we report on the efficacy of a breeding concept called "Breeding without Breeding" (BwB) that circumvents artificial matings, focusing instead on a subset of randomly sampled, maternally known but paternally unknown offspring to delineate their paternal parentage. We then generate the information needed to rank those offspring and their paternal parents, using a combination of complete (full-sib: FS) and incomplete (half-sib: HS) analyses of the constructed pedigrees. Using a random sample of wind-pollinated offspring from 15 females (seed donors), growing in a 41-parent western larch population, BwB is evaluated and compared to two commonly used testing methods that rely on either incomplete (maternal half-sib, open-pollinated: OP) or complete (FS) pedigree designs. BwB produced results superior to those from the incomplete design and virtually identical to those from the complete pedigree methods. The combined use of complete and incomplete pedigree information permitted evaluating all parents, both maternal and paternal, as well as all offspring, a result that could not have been accomplished with either the OP or FS methods alone. We also discuss the optimum experimental setting, in terms of the proportion of fingerprinted offspring, the size of the assembled maternal and paternal half-sib families, the role of external gene flow, and selfing, as well as the number of parents that could be realistically tested with BwB.

9.
Computation of LOD scores is a valuable tool for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However, computation of exact multipoint likelihoods of large inbred pedigrees with extensive missing data is often beyond the capabilities of a single computer. We present a distributed system called "SUPERLINK-ONLINE," for the computation of multipoint LOD scores of large inbred pedigrees. It achieves high performance via the efficient parallelization of the algorithms in SUPERLINK, a state-of-the-art serial program for these tasks, and through the use of the idle cycles of thousands of personal computers. The main algorithmic challenge has been to efficiently split a large task for distributed execution in a highly dynamic, nondedicated running environment. Notably, the system is available online, which allows computationally intensive analyses to be performed with no need for either the installation of software or the maintenance of a complicated distributed environment. As the system was being developed, it was extensively tested by collaborating medical centers worldwide on a variety of real data sets, some of which are presented in this article.
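For readers unfamiliar with the quantity being computed, the following is a minimal sketch of a LOD score in the simplest possible setting: two loci, phase-known and fully informative meioses, and no missing data. This toy case is an assumption chosen for illustration only; it is far simpler than the exact multipoint pedigree likelihoods described above and is not the algorithm used by SUPERLINK. The function name and the example counts are hypothetical.

```python
import math

def two_point_lod(n_meioses, n_recombinants, theta):
    """LOD score log10[L(theta) / L(0.5)] for phase-known, fully informative meioses.

    L(theta) = theta^k * (1 - theta)^(n - k), where k recombinants are observed
    in n meioses and theta is the recombination fraction in [0, 0.5].
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("n_recombinants must lie in [0, n_meioses]")
    k, n = n_recombinants, n_meioses
    likelihood = theta ** k * (1.0 - theta) ** (n - k)
    if likelihood == 0.0:
        # theta = 0 is impossible once a recombinant has been observed
        return float("-inf")
    return math.log10(likelihood) - n * math.log10(0.5)

# Hypothetical data: 1 recombinant in 10 meioses, scored over a grid of theta values.
for theta in (0.01, 0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta = {theta:.2f}  LOD = {two_point_lod(10, 1, theta):.3f}")
```

In this toy case the likelihood has a closed form; in real pedigrees with missing genotypes and many markers it must be summed over unobserved genotypes or inheritance patterns, which is what makes the exact computation expensive.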

10.
Testing for deviations from Hardy-Weinberg equilibrium (HWE) is a common practice for quality control in genetic studies. Variable sites violating HWE may be identified as technical errors in the sequencing or genotyping process, or they may be of particular evolutionary interest. Large-scale genetic studies based on next-generation sequencing (NGS) methods have become more prevalent as cost is decreasing, but these methods are still associated with statistical uncertainty. The large-scale studies usually consist of samples from diverse ancestries that make the existence of some degree of population structure almost inevitable. Precautions are therefore needed when analysing these data sets, as population structure causes deviations from HWE. Here we propose a method that takes population structure into account in the testing for HWE, such that other factors causing deviations from HWE can be detected. We show the effectiveness of PCAngsd in low-depth NGS data, as well as in genotype data, for both simulated and real data sets, where the use of genotype likelihoods enables us to model the uncertainty.
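As a point of comparison, the classical single-population HWE test can be sketched as follows. It assumes called genotypes at a biallelic site in one homogeneous population, so it ignores both population structure and genotype uncertainty, which are exactly the complications the abstract addresses. It is not the PCAngsd method, and the function name and example counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of HWE from genotype counts at a biallelic site.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic sites) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts showing a heterozygote deficit, as a mixture of
# differentiated populations would produce (the Wahlund effect).
chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A site can fail this test simply because the sample pools populations with different allele frequencies, which is why a structure-aware test is needed before a deviation is read as a genotyping error or as a sign of selection.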

11.
Several programs are currently available for the detection of genotyping errors, whether or not those errors produce Mendelian inconsistencies. However, no systematic study exists that evaluates their performance under varying pedigree structures and sizes, marker spacing, and allele frequencies. Our simulation study compares four multipoint methods: Merlin, Mendel4, SimWalk2, and Sibmed. We look at empirical thresholds, power, and false-positive rates on 7 small pedigree structures that included sibships with and without genotyped parents, and a three-generation pedigree, using 11 microsatellite markers with 3 different map spacings. The simulated data include 5,000 replicates of each pedigree structure and marker map, with random genotyping errors in about 4% of the middle marker's genotypes. We found that the default thresholds used by these programs provide low power (47-72%). Power is improved more by adding genotyped siblings than by using more closely spaced markers. Some mistyping methods are sensitive to the frequencies of the observed alleles. Siblings of mistyped individuals have elevated false-positive rates, as do markers close to the mistyped marker. We conclude that thresholds should be decided based on the pedigree and marker data and that greater focus should be placed on modeling genotyping error when computing likelihoods, rather than on detecting and eliminating genotyping errors.

12.
In order to increase the size of declining salmonid populations, supplementation programmes intentionally release fish raised in hatcheries into the wild. Because hatchery-born fish often have lower fitness than wild-born fish, estimating rates of gene flow from hatcheries into wild populations is essential for predicting the fitness cost to wild populations. Steelhead trout (Oncorhynchus mykiss) have both freshwater resident and anadromous (ocean-going) life history forms, known as rainbow trout and steelhead, respectively. Juvenile hatchery steelhead that 'residualize' (become residents rather than go to sea as intended) provide a previously unmeasured route for gene flow from hatchery into wild populations. We apply a combination of parentage and grandparentage methods to a three-generation pedigree of steelhead from the Hood River, Oregon, to identify the missing parents of anadromous fish. For fish with only one anadromous parent, 83% were identified as having a resident father while 17% were identified as having a resident mother. Additionally, we documented that resident hatchery males produced more offspring with wild anadromous females than with hatchery anadromous females. One explanation is the high fitness cost associated with matings between two hatchery fish. After accounting for all of the possible matings involving steelhead, we find that only 1% of steelhead genes come from residualized hatchery fish, while 20% of steelhead genes come from wild residents. A further 23% of anadromous steelhead genes come from matings between two resident parents. If these matings mirror the proportion of matings between residualized hatchery fish and anadromous partners, then closer to 40% of all steelhead genes come from wild trout each generation. 
These results suggest that wild resident fish contribute substantially to endangered steelhead 'populations' and highlight the need for conservation and management efforts to fully account for interconnected Oncorhynchus mykiss life histories.

13.
Biological invasions reshape environments and affect the ecological and economic welfare of states and communities. Such invasions advance on multiple spatial scales, complicating their control. When modeling stochastic dispersal processes, intractable likelihoods and autocorrelated data complicate parameter estimation. As with other approaches, the recent synthetic likelihood framework for stochastic models uses summary statistics to reduce this complexity; however, it additionally provides usable likelihoods, facilitating the use of existing likelihood-based machinery. Here, we extend this framework to parameterize multi-scale spatio-temporal dispersal models and compare existing and newly developed spatial summary statistics to characterize dispersal patterns. We provide general methods to evaluate potential summary statistics and present a fitting procedure that accurately estimates dispersal parameters on simulated data. Finally, we apply our methods to quantify the short and long range dispersal of Chagas disease vectors in urban Arequipa, Peru, and assess the feasibility of a purely reactive strategy to contain the invasion.

14.
Alpha 1-antitrypsin (alpha 1AT) of Pi type Z is associated with two diseases: pulmonary emphysema and cirrhosis of the liver. We report 23 families with both parents heterozygous for the PiZ allele, characterized from our own analysis and from world literature sources. All families were identified through members expressing disease. From the extended pedigrees, 18 backcross families (parents with Pi types MM and MZ) were identified. Analysis of the backcross families reveals a significant increase in Pi MZ offspring (0.73) among families where the male is heterozygous. The distortion is not detected among families where the female is heterozygous. Among the matings where both parents are heterozygous, we found 0.43 Pi ZZ from families where one or more members expressed hepatic cirrhosis, and 0.40 Pi ZZ for total families studied. This contrasts with the 0.25 Pi ZZ expected, but is consistent with the distortion observed in backcross matings. The implications of various statistical approaches are discussed, and we point out why our findings differ from previous reports. We suggest a possible biological explanation residing in the fertilization process.

15.
Multipoint quantitative-trait linkage analysis in general pedigrees.   (Cited 49 times: 12 self-citations, 37 citations by others)
Multipoint linkage analysis of quantitative-trait loci (QTLs) has previously been restricted to sibships and small pedigrees. In this article, we show how variance-component linkage methods can be used in pedigrees of arbitrary size and complexity, and we develop a general framework for multipoint identity-by-descent (IBD) probability calculations. We extend the sib-pair multipoint mapping approach of Fulker et al. to general relative pairs. This multipoint IBD method uses the proportion of alleles shared identical by descent at genotyped loci to estimate IBD sharing at arbitrary points along a chromosome for each relative pair. We have derived correlations in IBD sharing as a function of chromosomal distance for relative pairs in general pedigrees and provide a simple framework whereby these correlations can be easily obtained for any relative pair related by a single line of descent or by multiple independent lines of descent. Once calculated, the multipoint relative-pair IBDs can be utilized in variance-component linkage analysis, which considers the likelihood of the entire pedigree jointly. Examples are given that use simulated data, demonstrating both the accuracy of QTL localization and the increase in power provided by multipoint analysis with 5-, 10-, and 20-cM marker maps. The general pedigree variance component and IBD estimation methods have been implemented in the SOLAR (Sequential Oligogenic Linkage Analysis Routines) computer package.

16.
Modern genetic parentage methods reveal that alternative reproductive strategies are common in both males and females. Under ideal conditions, genetic methods accurately connect the parents to offspring produced by extra-pair matings or conspecific brood parasitism. However, some breeding systems and sampling scenarios present significant complications for accurate parentage assignment. We used simulated genetic pedigrees to assess the reliability of parentage assignment for a series of challenging sampling regimes that reflect realistic conditions for many brood-parasitic birds: absence of genetic samples from sires, absence of samples from brood parasites and female kin-structured populations. Using 18 microsatellite markers and empirical allele frequencies from two populations of a conspecific brood parasite, the wood duck (Aix sponsa), we simulated brood parasitism and determined maternity using two widely used programs, CERVUS and COLONY. Errors in assignment were generally modest for most sampling scenarios but differed by program: CERVUS suffered from false assignment of parasitic offspring, whereas COLONY sometimes failed to assign offspring to their known mothers. Notably, COLONY was able to accurately infer unsampled parents. Reducing the number of markers (nine loci rather than 18) caused the assignment error to slightly worsen with COLONY but balloon with CERVUS. One potential error with important biological implications was rare in all cases: few nesting females were incorrectly excluded as the mother of their own offspring, an error that could falsely indicate brood parasitism. We consider the implications of our findings for both a retrospective assessment of previous studies and suggestions for best practices for future studies.

17.
The HapMap Project is providing a great deal of new information on high-resolution haplotype structure in various human populations. This information has the potential to greatly increase the power of association mapping for a fixed amount of genotyping. A number of methods have been proposed for the identification of haplotype blocks, common haplotypes, and tagging single-nucleotide polymorphisms. Here, we build on this work by developing novel methods for case-control multipoint linkage-disequilibrium (LD) mapping that gain power and speed by making explicit use of the inferred block structure. Specifically, we developed a virtual-variant approach that uses the haplotype-block information to greatly increase power for detection of untyped common variants associated with a trait. Because full multipoint LD mapping can be slow, we exploited the haplotype-block information to develop a fast single-block multipoint mapping method. Our methods are appropriate for genotype data and take into account the uncertainty in phase. We describe the methods in the context of case-parents trios, although they are also applicable to unrelated cases and controls. Our simulations indicate that the most important gains from taking into account the haplotype-block structure at the analysis stage of multipoint LD mapping come from (1) greatly increased power to detect association with untyped variants and (2) greatly improved localization of untyped variants associated with the trait. More-modest gains are obtained in improving power to detect association with a variant that is typed with a moderate amount of missing data. The methods are applied to a Crohn disease data set.

18.
We recently showed genomewide linkage of centrotemporal sharp waves (CTS) in classic Rolandic epilepsy (RE) families to chromosome 11p13, and fine‐mapped this locus to variants in the ELP4 gene. Speech sound disorder (SSD) is a common comorbidity in RE subjects, of unknown etiology, which co‐aggregates in family members in a manner that could hypothetically be explained by shared underlying genetic risk with CTS. Furthermore, the neural mechanism of SSD is unknown, although individuals with rare, Mendelian forms of RE are described with severe verbal and oromotor apraxia. We therefore first performed genomewide linkage analysis for SSD, operationally defined as clinical history consistent with ICD‐10 speech articulation disorder, in 38 families singly ascertained through a proband with RE. We tested the hypothesis of shared genetic risk with CTS at the 11p13 locus. In the second part of the study we used computerized acoustic analysis of recorded speech to test the hypothesis of dyspraxia as a mechanism for SSD in a smaller subset of RE probands and relatives. In two‐point and multipoint LOD score analysis, we found that evidence for linkage to the 11p13 locus increased substantially when the phenotype was broadened from CTS to CTS/SSD. In multipoint analysis, the LOD score rose by 3.2 to HLOD 7.54 at D11S914 for CTS/SSD, the same marker at which multipoint linkage maximized for CTS alone. Non‐parametric, affected‐only methods in a sub‐set of the data provide further confirmatory evidence for pleiotropy. In acoustic analysis there were voice‐onset time abnormalities in 10/18 RE probands, 8/16 siblings and 5/15 parents, providing evidence of breakdown in the spatial/temporal properties of speech articulation consistent with a dyspraxic mechanism. The results from genetic and physiological studies suggest a pleiotropic role for the 11p13 locus in the development of both SSD and CTS, and also indicate a dyspraxic mechanism for the SSD linked to 11p13. 
Taken together, these data strongly support a neurodevelopmental origin for classic RE.

19.
Understanding and characterising biochemical processes inside single cells requires experimental platforms that allow one to perturb and observe the dynamics of such processes as well as computational methods to build and parameterise models from the collected data. Recent progress with experimental platforms and optogenetics has made it possible to expose each cell in an experiment to an individualised input and automatically record cellular responses over days with fine time resolution. However, methods to infer parameters of stochastic kinetic models from single-cell longitudinal data have generally been developed under the assumption that experimental data is sparse and that responses of cells to at most a few different input perturbations can be observed. Here, we investigate and compare different approaches for calculating parameter likelihoods of single-cell longitudinal data based on approximations of the chemical master equation (CME) with a particular focus on coupling the linear noise approximation (LNA) or moment closure methods to a Kalman filter. We show that, as long as cells are measured sufficiently frequently, coupling the LNA to a Kalman filter allows one to accurately approximate likelihoods and to infer model parameters from data even in cases where the LNA provides poor approximations of the CME. Furthermore, the computational cost of filtering-based iterative likelihood evaluation scales advantageously in the number of measurement times and different input perturbations and is thus ideally suited for data obtained from modern experimental platforms. To demonstrate the practical usefulness of these results, we perform an experiment in which single cells, equipped with an optogenetic gene expression system, are exposed to various different light-input sequences and measured at several hundred time points and use parameter inference based on iterative likelihood evaluation to parameterise a stochastic model of the system.
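The idea of evaluating a likelihood iteratively with a Kalman filter can be sketched for the simplest case. The block below assumes a scalar, time-invariant linear Gaussian model with state update x_t = a * x_{t-1} + noise of variance q and measurement y_t = x_t + noise of variance r; it stands in for the LNA-derived dynamics, which the abstract does not specify. All names and parameter values are hypothetical.

```python
import math

def kalman_loglik(ys, a, q, r, m0, p0):
    """Log-likelihood of observations ys under a scalar linear Gaussian model.

    m0 and p0 are the mean and variance of the state before the first
    transition. The likelihood is accumulated one measurement at a time from
    the one-step-ahead prediction errors, so the cost is linear in len(ys).
    """
    m, p = m0, p0
    loglik = 0.0
    for y in ys:
        # Predict the state at the next measurement time.
        m_pred = a * m
        p_pred = a * a * p + q
        # Predictive distribution of the measurement: N(m_pred, s).
        s = p_pred + r
        innovation = y - m_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
        # Update the state estimate with the new measurement.
        gain = p_pred / s
        m = m_pred + gain * innovation
        p = (1.0 - gain) * p_pred
    return loglik

# Hypothetical trajectory and a crude scan over the decay parameter a.
trajectory = [0.9, 0.7, 0.65, 0.4, 0.35, 0.3]
for a in (0.5, 0.8, 0.95):
    print(f"a = {a:.2f}  log-likelihood = {kalman_loglik(trajectory, a, 0.01, 0.01, 1.0, 0.1):.3f}")
```

Each measurement corrects the state estimate before the next prediction, so approximation error in the dynamics has little time to accumulate between closely spaced measurements, which is consistent with the abstract's point about frequent sampling.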

20.
Advances in dinucleotide-based genetic maps open possibilities for large scale genotyping at high resolution. The current rate-limiting steps in the use of these dense maps are data interpretation (allele definition), data entry, and statistical calculations. We have recently reported automated allele identification methods. Here we show that a 10-cM framework map of the human X chromosome can be analyzed on two lanes of an automated sequencer per individual (10-12 loci per lane). We use this map and analysis strategy to generate allele data for an X-linked recessive spastic paraplegia family with a known PLP mutation. We analyzed 198 genotypes in a single gel and used the data to test three methods of data analysis: manual meiotic breakpoint mapping, automated concordance analysis, and whole chromosome multipoint linkage analysis. All methods pinpointed the correct location of the gene. We propose that multipoint exclusion mapping may permit valid inflation of LOD scores using the equation max LOD − (next best LOD).
