Similar articles
20 similar articles found.
1.
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%-94% of genotyping errors depending on the pedigree topology.
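For contrast with the ILP approach, the simple per-locus Mendelian check that the abstract describes as insufficient can be sketched as follows. This is an illustrative baseline only, not the HCME algorithm; the function names and the genotype encoding (one unordered allele pair per locus) are assumptions made for the example.

```python
from itertools import product

def mendelian_consistent(father, mother, child):
    """True if the child's unordered genotype can be formed by taking
    one allele from the father and one allele from the mother."""
    child_sorted = sorted(child)
    return any(sorted((f, m)) == child_sorted
               for f, m in product(father, mother))

def inconsistent_loci(father, mother, child):
    """Indices of loci at which the trio violates Mendelian inheritance."""
    return [i for i, (f, m, c) in enumerate(zip(father, mother, child))
            if not mendelian_consistent(f, m, c)]

# Hypothetical two-locus trio: at locus 0 the child is 2/2 although the
# father is 1/1, so locus 0 is flagged; locus 1 is consistent.
father = [(1, 1), (1, 2)]
mother = [(1, 2), (2, 2)]
child = [(2, 2), (1, 2)]
print(inconsistent_loci(father, mother, child))  # [0]
```

A check of this kind looks at one locus and one trio at a time, so an error that yields a locally valid genotype passes unnoticed, which is the gap the abstract says the haplotype-based approach addresses.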

2.
With the widespread availability of SNP genotype data, there is great interest in analyzing pedigree haplotype data. Intermarker linkage disequilibrium for microsatellite markers is usually low due to their physical distance; however, for dense maps of SNP markers, there can be strong linkage disequilibrium between marker loci. Linkage analysis (parametric and nonparametric) and family-based association studies are currently being carried out using dense maps of SNP marker loci. Monte Carlo methods are often used for both linkage and association studies; however, to date there are no programs available which can generate haplotype and/or genotype data consisting of a large number of loci for pedigree structures. SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for greater than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed is useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium and estimating empirical p values for linkage and family-based association studies.
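The general idea of generating pedigree haplotypes from founder haplotype frequencies and map distances is often called gene dropping. The sketch below illustrates that idea only; it is not SimPed's implementation, and the function names, the pedigree format and the use of recombination fractions (rather than map distances) between adjacent markers are all assumptions for the example.

```python
import random

def make_gamete(parent_haps, rec_fracs, rng):
    """Form one gamete from a parent's two haplotypes, switching strands
    between adjacent markers with the given recombination fractions."""
    strand = rng.randrange(2)
    gamete = [parent_haps[strand][0]]
    for i, theta in enumerate(rec_fracs, start=1):
        if rng.random() < theta:
            strand = 1 - strand
        gamete.append(parent_haps[strand][i])
    return tuple(gamete)

def gene_drop(pedigree, hap_freqs, rec_fracs, seed=0):
    """Simulate a (paternal, maternal) haplotype pair for every member.

    pedigree  : list of (id, father_id, mother_id), parents listed before
                their children; founders have None for both parents.
    hap_freqs : dict mapping founder haplotype (tuple of alleles) to frequency.
    rec_fracs : recombination fraction between each pair of adjacent markers.
    """
    rng = random.Random(seed)
    haps, weights = zip(*hap_freqs.items())
    result = {}
    for pid, father, mother in pedigree:
        if father is None and mother is None:
            # Founders draw two haplotypes from the population frequencies.
            result[pid] = tuple(rng.choices(haps, weights=weights, k=2))
        else:
            result[pid] = (make_gamete(result[father], rec_fracs, rng),
                           make_gamete(result[mother], rec_fracs, rng))
    return result

# Hypothetical three-marker nuclear family with no recombination.
pedigree = [("F", None, None), ("M", None, None), ("C", "F", "M")]
freqs = {(1, 1, 2): 0.5, (2, 1, 1): 0.3, (1, 2, 2): 0.2}
print(gene_drop(pedigree, freqs, rec_fracs=[0.0, 0.0], seed=1))
```

Because founder haplotypes are drawn as whole units, linkage disequilibrium present in the supplied haplotype frequencies carries over into the simulated data.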

3.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration. In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.
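The EM step mentioned in the third stage can be illustrated with a minimal sketch. It assumes, for simplicity, unrelated individuals in Hardy-Weinberg proportions, each with a list of compatible haplotype pairs; it does not include the partition-and-ligation technique or the pedigree-level configurations that HAPLORE works with, and the function name and data layout are hypothetical.

```python
from collections import defaultdict

def em_haplotype_freqs(compat, n_iter=50):
    """Estimate haplotype frequencies by EM.

    compat : one entry per individual, each a list of the (hap1, hap2)
             pairs compatible with that individual's genotype.
    """
    haps = {h for pairs in compat for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in compat:
            # E-step: posterior weight of each compatible pair
            # (heterozygous pairs can arise in two orders, hence the 2).
            weights = [freq[a] * freq[b] * (1 if a == b else 2)
                       for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        n_chrom = 2 * len(compat)
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq

# Hypothetical two-locus data: two unambiguous individuals and one double
# heterozygote whose phase is resolved by the frequencies of the others.
compat = [
    [("AB", "AB")],
    [("ab", "ab")],
    [("AB", "ab"), ("Ab", "aB")],
]
freqs = em_haplotype_freqs(compat)
print({h: round(f, 4) for h, f in sorted(freqs.items())})
```

In this toy data set the ambiguous individual is pulled toward the AB/ab phase because those haplotypes are already supported by the unambiguous individuals.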

4.
Efficient inference of haplotypes from genotypes on a pedigree (total citations: 1; self-citations: 0; citations by others: 1)
We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large scale haplotype inference projects.
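The linear-algebra core of the zero-recombinant algorithm, solving a system over Z2 by Gaussian elimination and enumerating every solution, can be sketched as below. How the Mendelian constraints are turned into equations is not given in the abstract, so the example system is hypothetical, as are the function names; each unknown is simply assumed to be one binary phase or inheritance variable.

```python
from itertools import product

def solve_gf2(rows, rhs):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.

    Returns (particular_solution, null_space_basis), or None if the
    system is inconsistent. Entries are the integers 0 and 1.
    """
    n = len(rows[0]) if rows else 0
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivots = []
    r = 0
    for c in range(n):
        p = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if p is None:
            continue  # no pivot in this column: x_c is a free variable
        aug[r], aug[p] = aug[p], aug[r]
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    if any(row[n] for row in aug[r:]):
        return None  # a row reads 0 = 1
    particular = [0] * n
    for i, c in enumerate(pivots):
        particular[c] = aug[i][n]
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [0] * n
        v[f] = 1
        for i, c in enumerate(pivots):
            v[c] = aug[i][f]
        basis.append(v)
    return particular, basis

def all_solutions(particular, basis):
    """Enumerate every solution: particular + any combination of the basis."""
    solutions = []
    for coeffs in product((0, 1), repeat=len(basis)):
        x = list(particular)
        for k, v in zip(coeffs, basis):
            if k:
                x = [a ^ b for a, b in zip(x, v)]
        solutions.append(x)
    return solutions

# Hypothetical constraints: x0 + x1 = 1 and x1 + x2 = 0 (mod 2).
particular, basis = solve_gf2([[1, 1, 0], [0, 1, 1]], [1, 0])
print(all_solutions(particular, basis))  # [[1, 0, 0], [0, 1, 1]]
```

Elimination runs in polynomial time, and each free variable doubles the number of feasible configurations, which is why all solutions can be described compactly by one particular solution plus a basis.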

5.
Linkage analysis identifies markers that appear to be co-inherited with a trait within pedigrees. The inheritance of a chromosomal segment may be probabilistically reconstructed, with missing data complicating inference. Inheritance patterns are further obscured in the analysis of complex traits, where variants in one or more genes may contribute to phenotypic variation within a pedigree. In this case, determining which relatives share a trait variant is not simple. We describe how to represent these patterns of inheritance for marker loci. We summarize how to sample patterns of inheritance consistent with genotypic and pedigree data using gl_auto, available in MORGAN v3.0. We describe identification of classes of equivalent inheritance patterns with the program IBDgraph. We finally provide an example of how these programs may be used to simplify interpretation of linkage analysis of complex traits in general pedigrees. We borrow information across loci in a parametric linkage analysis of a large pedigree. We explore the contribution of each equivalence class to a linkage signal, illustrate estimated patterns of identity-by-descent sharing, and identify a haplotype tagging the chromosomal segment driving the linkage signal. Haplotype carriers are more likely to share the linked trait variant, and can be prioritized for subsequent DNA sequencing.

6.
The genotyping of mother–father–child trios is a very useful tool in disease association studies, as trios eliminate population stratification effects and increase the accuracy of haplotype inference. Unfortunately, the use of trios for association studies may reduce power, since it requires the genotyping of three individuals where only four independent haplotypes are involved. We describe here a method for genotyping a trio using two DNA pools, thus reducing the cost of genotyping trios to that of genotyping two individuals. Furthermore, we present extensions to the method that exploit the linkage disequilibrium structure to compensate for missing data and genotyping errors. We evaluated our method on trios from CEPH pedigree 66 of the Coriell Institute. We demonstrate that the error rates in the genotype calls of the proposed protocol are comparable to those of standard genotyping techniques, although the cost is reduced considerably. The approach described is generic and it can be applied to any genotyping platform that achieves a reasonable precision of allele frequency estimates from pools of two individuals. Using this approach, future trio-based association studies may be able to increase the sample size by 50% for the same cost and thereby increase the power to detect associations.
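The 50% figure follows from simple cost arithmetic. The sketch below assumes, which the abstract does not state explicitly, that one pooled assay costs the same amount $c$ as one individual assay and that the total budget $B$ is fixed; the symbols are introduced here for illustration.

```latex
\[
N_{\text{standard}} = \frac{B}{3c}, \qquad
N_{\text{pooled}} = \frac{B}{2c}, \qquad
\frac{N_{\text{pooled}}}{N_{\text{standard}}} = \frac{3}{2}
\]
```

That is, two assays per trio instead of three allow 1.5 times as many trios, or 50% more, for the same spend.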

7.
Liu PY, Lu Y, Deng HW. Genetics, 2006, 174(1): 499-509
Sibships are commonly used in genetic dissection of complex diseases, particularly for late-onset diseases. Haplotype-based association studies have been advocated as powerful tools for fine mapping and positional cloning of complex disease genes. Existing methods for haplotype inference using data from relatives were originally developed for pedigree data. In this study, we proposed a new statistical method for haplotype inference for multiple tightly linked single-nucleotide polymorphisms (SNPs), which is tailored for extensively accumulated sibship data. This new method was implemented via an expectation-maximization (EM) algorithm without the usual assumption of linkage equilibrium among markers. Our EM algorithm does not incur extra computational burden for haplotype inference using sibship data when compared with using unrelated parental data. Furthermore, its computational efficiency is not affected by increasing sibship size. We examined the robustness and statistical performance of our new method in simulated data created from an empirical haplotype data set of human growth hormone gene 1. The utility of our method was illustrated with an application to the analyses of haplotypes of three candidate genes for osteoporosis.

8.
We describe a four-generation family with fully penetrant, autosomal dominant, congenital cataracts (ADCC), presenting with morphologically homogeneous "zonular pulverulent" cataracts (CZP) and typical early-onset phenotype. Linkage analysis was performed with a panel of polymorphic markers mapped to all genomic regions of ADCC susceptibility. Contiguous significant two-point lod scores were generated at autosomal region 13q11-q13 and further linkage and haplotype studies confined the disease locus to 13q11, supporting a previous linkage of CZP (specifically CZP3) to 13q11. Mutations in a gap-junction protein, connexin 46 (α3 subunit or GJA3), have recently been reported as being linked to the 13q11 region. Mutational analysis of connexin 46 in our family revealed a C→T at position 560 (P187L) of the cDNA sequence creating a novel MnlI restriction site that segregated with affected members of the pedigree. This family represents a second report of CZP3 linkage to 13q and is associated with a novel mutation in the connexin 46 (GJA3) gene.

9.
We have studied a four-generation family with features of Weyers acrofacial dysostosis, in which the proband has a more severe phenotype, resembling Ellis-van Creveld syndrome. Weyers acrofacial dysostosis is an autosomal dominant condition with dental anomalies, nail dystrophy, postaxial polydactyly, and mild short stature. Ellis-van Creveld syndrome is a similar condition, with autosomal recessive inheritance and the additional features of disproportionate dwarfism, thoracic dysplasia, and congenital heart disease. Linkage and haplotype analysis determined that the disease locus in this pedigree resides on chromosome 4p16, distal to the genetic marker D4S3007 and within a 17-cM region flanking the genetic locus D4S2366. This region includes the Ellis-van Creveld syndrome locus, which previously was reported to map within a 3-cM region between genetic markers D4S2957 and D4S827. Either the genes for the condition in our family and for Ellis-van Creveld syndrome are near one another or these two conditions are allelic with mutations in the same gene. These data also raise the possibility that Weyers acrofacial dysostosis is the heterozygous expression of a mutation that, in homozygous form, causes the autosomal recessive disorder Ellis-van Creveld syndrome.

10.
Effective identification of disease-causing gene locations can have significant impact on patient management decisions that will ultimately increase survival rates and improve the overall quality of health care. Linkage disequilibrium mapping is the process of finding disease gene locations through comparisons of haplotype frequencies between disease chromosomes and normal chromosomes. This work presents a new method for linkage disequilibrium mapping. The main advantage of the proposed algorithm, called LinkageTracker, is its consistency in producing good predictive accuracy under different conditions, including extreme conditions where the occurrence of disease samples with the mutation of interest is very low and there is presence of error or noise. We compared our method with some leading methods in linkage disequilibrium mapping such as HapMiner, Blade, GeneRecon, and Haplotype Pattern Mining (HPM). Experimental results show that for a substantial class of problems, our method has good predictive accuracy while taking reasonably short processing time. Furthermore, LinkageTracker does not require any population ancestry information about the disease and the genealogy of the haplotypes. Therefore, it is useful for linkage disequilibrium mapping when the users do not have such information about their datasets.

11.
Wu CH, Drummond AJ. Genetics, 2011, 188(1): 151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally we apply our method to a red colobus monkey data set as an example.
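In generic notation, Bayesian model averaging of the kind described can be written as follows. The symbols are assumptions introduced for illustration and are not taken from the paper: $\theta$ denotes the population history parameters, $D$ the microsatellite data, and $M_1, \dots, M_K$ the candidate mutation models.

```latex
\[
p(\theta \mid D) = \sum_{k=1}^{K} p(\theta \mid D, M_k)\, p(M_k \mid D),
\qquad
p(M_k \mid D) \propto p(D \mid M_k)\, p(M_k)
\]
```

A transdimensional MCMC sampler that moves between models visits each model in proportion to its posterior probability, so pooling the sampled values of $\theta$ across models approximates this sum.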

12.
13.
Gene mapping by linkage and association analysis (total citations: 3; self-citations: 0; citations by others: 3)
Genetic analysis is used to map genes, including disease loci, to positions within the human genome. Linkage analysis depends on the co-segregation of a gene (locus) and a phenotype through a pedigree, while association analysis, or linkage disequilibrium mapping, depends on measuring deviation from the random occurrence of alleles in a haplotype in unrelated individuals or nuclear families. Complex computer programs may be used in both forms of analysis. In recent years most interest has focused on identifying genes involved in common, multifactorial diseases. Here I review some current and developing techniques of genetic analysis and give references to where further information can be obtained.

14.
Congenital pseudomyotonia in Chianina cattle is a muscle function disorder very similar to that of Brody disease in humans. Mutations in the human ATP2A1 gene, encoding SERCA1, cause Brody myopathy. The analysis of the collected Chianina pedigree data suggested monogenic autosomal recessive inheritance and revealed that all 17 affected individuals traced back to a single founder. A deficiency of SERCA1 function in skeletal muscle of pseudomyotonia-affected Chianina cattle was observed, as SERCA1 activity in affected animals was decreased by about 70%. Linkage analysis showed that the mutation was located in the ATP2A1 gene region on BTA25 and subsequent mutation analysis of the ATP2A1 exons revealed a perfectly associated missense mutation in exon 6 (c.491G>A) leading to a p.Arg164His substitution. Arg164 represents a functionally important and strongly conserved residue of SERCA1. This study provides a suitable large animal model for human Brody disease.

15.
Paget disease of bone (PDB) is characterized by increased osteoclast activity and localized abnormal bone remodeling. PDB has a significant genetic component, with evidence of linkage to chromosomes 6p21.3 (PDB1) and 18q21-22 (PDB2) in some pedigrees. There is evidence of genetic heterogeneity, with other pedigrees showing negative linkage to these regions. TNFRSF11A, a gene that is essential for osteoclast formation and that encodes receptor activator of nuclear factor-kappa B (RANK), has been mapped to the PDB2 region. TNFRSF11A mutations that segregate in pedigrees with either familial expansile osteolysis or familial PDB have been identified; however, linkage studies and mutation screening have excluded the involvement of RANK in the majority of patients with PDB. We have excluded linkage, both to PDB1 and to PDB2, in a large multigenerational pedigree with multiple family members affected by PDB. We have conducted a genomewide scan of this pedigree, followed by fine mapping and multipoint analysis in regions of interest. The peak two-point LOD scores from the genomewide scan were 2.75, at D7S507, and 1.76, at D18S70. Multipoint and haplotype analysis of markers flanking D7S507 did not support linkage to this region. Haplotype analysis of markers flanking D18S70 demonstrated a haplotype segregating with PDB in a large subpedigree. This subpedigree had a significantly lower age at diagnosis than the rest of the pedigree (51.2±8.5 vs. 64.2±9.7 years; P=.0012). Linkage analysis of this subpedigree demonstrated a peak two-point LOD score of 4.23, at marker D18S1390 (θ=0), and a peak multipoint LOD score of 4.71, at marker D18S70. Our data are consistent with genetic heterogeneity within the pedigree and indicate that 18q23 harbors a novel susceptibility gene for PDB.
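For readers unfamiliar with the statistic, a two-point LOD score in its simplest textbook form (phase-known, fully informative meioses) can be computed as sketched below. This is not the pedigree likelihood calculation used in the study, and the counts in the example are hypothetical; the function name is an assumption.

```python
import math

def two_point_lod(n_meioses, n_recombinants, theta):
    """LOD = log10[L(theta) / L(0.5)] for phase-known informative meioses,
    where L(theta) = theta**k * (1 - theta)**(n - k)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n, k = n_meioses, n_recombinants

    def log_term(p, exponent):
        # log10(p**exponent), treating 0**0 as 1 and 0**positive as 0.
        if exponent == 0:
            return 0.0
        if p == 0.0:
            return -math.inf
        return exponent * math.log10(p)

    log_l_theta = log_term(theta, k) + log_term(1.0 - theta, n - k)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null

# Hypothetical example: 10 informative meioses, no recombinants, theta = 0.
print(round(two_point_lod(10, 0, 0.0), 3))
```

With no recombinants, each informative meiosis contributes log10(2), about 0.3, to the score at theta = 0, which is why a LOD above 3 requires on the order of ten or more such meioses.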

16.
A user-friendly Hypercard interface for human linkage analysis (total citations: 3; self-citations: 0; citations by others: 3)
The availability of a large number of highly informative genetic markers has made human linkage analysis faster and easier to perform. However, current linkage analysis software does not provide an organizational database into which a large body of linkage data can be easily stored and manipulated. This manual entry and editing of linkage data is often time consuming and prone to typing errors. In addition, the large number of alleles in many of these markers must be reduced in order to perform linkage analysis with multiple loci across large genetic distances. This reduction in allele number is often difficult and confusing, especially in large pedigrees. We have taken advantage of the Macintosh-based Hypercard program to develop an interface with which linkage data can be easily stored, retrieved and edited. For each family, the components of the pedigree, including ID numbers, sex and affection status, only need to be entered once. The program (Linkage Interface) retrieves this information each time the data from a new polymorphic marker is entered. Linkage Interface has flexible editing capabilities that allow the user to change any portion of the pedigree, including the addition or deletion of family members, without affecting previously entered genotype data. Linkage Interface can also analyze both the pedigree and marker data and will detect any inconsistencies in inheritance patterns. In addition, the program can reduce the number of alleles for a polymorphic marker. Linkage Interface will then compare the 'reduced' data to the original marker data and assist in maintaining all informative meioses by pointing out which meioses have become non-informative. Once polymorphic marker data are entered, the pedigree data, including the marker genotypes, are easily exported to a text file. This text file can be transferred to an IBM-compatible computer for direct use with DOS-based linkage programs.

17.

Background

Current methods for haplotype inference without pedigree information assume random mating populations. In animal and plant breeding, however, mating is often not random. A particular form of nonrandom mating occurs when parental individuals of opposite sex originate from distinct populations. In animal breeding this is called crossbreeding, and in plant breeding, hybridization. In these situations, association between marker and putative gene alleles might differ between the founding populations, and the origin of alleles should be accounted for in studies which estimate breeding values with marker data. The sequence of alleles from one parent constitutes one haplotype of an individual. Haplotypes thus reveal allele origin in data of crossbred individuals.

Results

We introduce a new method for haplotype inference without pedigree that allows nonrandom mating and that can use genotype data of the parental populations and of a crossbred population. The aim of the method is to estimate line origin of alleles. The method has a Bayesian setup with a Dirichlet process (DP) as prior for the haplotypes in the two parental populations. The basic idea is that only a subset of the complete set of possible haplotypes is present in the population.

Conclusion

Line origin of approximately 95% of the alleles at heterozygous sites was assessed correctly in both simulated and real data. Comparing accuracy of haplotype frequencies inferred with the new algorithm to the accuracy of haplotype frequencies inferred with PHASE, an existing algorithm for haplotype inference, showed that the DP algorithm outperformed PHASE in situations of crossbreeding and that PHASE performed better in situations of random mating.

18.
Patterns of polymorphism and linkage disequilibrium for cystic fibrosis (total citations: 33; self-citations: 0; citations by others: 33)
Four polymorphic markers that map within 80 kb of an HTF island which is genetically very close to the cystic fibrosis locus have been identified. We have analyzed the linkage disequilibrium between each of these markers and the cystic fibrosis mutation in 89 families from four European countries, Denmark, Finland, Spain, and Great Britain. Strong linkage disequilibrium between three polymorphic sites and cystic fibrosis was observed. The markers on the J3.11 (D7S8) side of the HTF island show stronger disequilibrium than those on the met side. Linkage disequilibrium between markers and disease alters the probability that a person of a given haplotype is a carrier in some populations and helps to identify regions of a sequence that are most likely to contain the cystic fibrosis mutation.
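Linkage disequilibrium between a marker and a disease mutation is commonly summarized with the coefficients D, D' and r². The sketch below computes these standard quantities from phased haplotype counts; the abstract does not say which measure the study used, and the function name and the counts in the example are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic loci from haplotype counts.

    A/a are the alleles at the first locus (e.g. marker), B/b at the
    second (e.g. disease mutation vs. normal).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    # D' rescales D by its largest attainable magnitude given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical counts: complete association, then no association.
print(ld_stats(50, 0, 0, 50))   # D = 0.25, D' = 1, r^2 = 1
print(ld_stats(25, 25, 25, 25)) # D = 0, D' = 0, r^2 = 0
```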

19.
Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.

20.
Haplotype inference by maximum parsimony (total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: Haplotypes have been attracting increasing attention because of their importance in the analysis of many fine-scale molecular-genetics data. Since direct sequencing of haplotypes via experimental methods is both time-consuming and expensive, haplotype inference methods that infer haplotypes based on genotype samples become attractive alternatives. RESULTS: (1) We design and implement an algorithm for an important computational model of haplotype inference that has been suggested before in several places. The model finds a minimum-size set of haplotypes that explains the genotype samples. (2) Strong support for this computational model is given based on the computational results on both real data and simulated data. (3) We also carried out comparative studies to show the strengths and weaknesses of this computational model using our program. AVAILABILITY: The software HAPAR is free for non-commercial uses. Available upon request (lwang@cs.cityu.edu.hk).
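The parsimony model itself (find the smallest set of haplotypes such that every genotype is explained by some pair from the set) can be made concrete with an exhaustive search. This brute-force sketch is exponential and is not the HAPAR algorithm; the function names and the genotype encoding ('0' and '1' for the two homozygotes, '2' for a heterozygous site) are assumptions for the example.

```python
from itertools import combinations, product

def explains(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) yields the given genotype."""
    return all((a == b == g) if g in "01" else (a != b)
               for a, b, g in zip(h1, h2, genotype))

def candidate_haplotypes(genotypes):
    """All haplotypes compatible with at least one genotype."""
    candidates = set()
    for g in genotypes:
        options = [("0", "1") if site == "2" else (site,) for site in g]
        candidates.update("".join(h) for h in product(*options))
    return sorted(candidates)

def min_parsimony(genotypes):
    """Smallest haplotype set explaining every genotype (exhaustive)."""
    candidates = candidate_haplotypes(genotypes)
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(any(explains(h1, h2, g)
                       for h1 in subset for h2 in subset)
                   for g in genotypes):
                return set(subset)
    return None

# Hypothetical sample: the double heterozygote "22" could be 00/11 or
# 01/10, but 00 and 11 are already needed for the two homozygotes.
print(sorted(min_parsimony(["22", "00", "11"])))  # ['00', '11']
```

Restricting the search to haplotypes compatible with at least one genotype loses nothing, since any haplotype that takes part in explaining a genotype must be compatible with it.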
