Similar literature
Found 20 similar articles.
1.
A deductive method of haplotype analysis in pedigrees.   Total citations: 13 (self-citations: 4; citations by others: 9)
Derivation of haplotypes from pedigree data by means of likelihood techniques requires large computational resources and is thus highly limited in terms of the complexity of problems that can be analyzed. This paper presents 20 rules of logic that are both necessary and sufficient for deriving haplotypes by means of nonstatistical techniques. As a result, automated haplotype analysis that uses these rules is fast and efficient, requiring computer memory that increases only linearly (rather than exponentially) with family size and the number of factors under analysis. Some error analysis is also possible. The rules are completely general with regard to any system of completely linked, discrete genetic markers that are autosomally inherited. There are no limitations on pedigree structure or the amount of missing data, although the existence of incomplete data usually reduces the fraction of haplotypes that can be completely determined.
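
The idea of deriving phase by logic rather than likelihood can be illustrated with a minimal sketch for a single father-mother-child trio. This is not the paper's set of 20 rules; it applies only one elementary Mendelian rule per locus, and the function names `phase_child_locus` and `phase_child` are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles at one locus


def phase_child_locus(father: Genotype, mother: Genotype,
                      child: Genotype) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele) for the child at one locus,
    or None if logic alone cannot decide (or the trio is inconsistent)."""
    a, b = child
    if a == b:
        # Homozygous child: both parents must have transmitted the same allele.
        return (a, a)
    # Keep only the parental assignments that each parent could have transmitted.
    options = [(p, m) for p, m in ((a, b), (b, a))
               if p in father and m in mother]
    return options[0] if len(options) == 1 else None


def phase_child(father: List[Genotype], mother: List[Genotype],
                child: List[Genotype]) -> List[Optional[Tuple[str, str]]]:
    """Apply the single-locus rule independently at every locus."""
    return [phase_child_locus(f, m, c)
            for f, m, c in zip(father, mother, child)]
```

A locus where both parents and the child are heterozygous for the same two alleles stays unresolved (`None`), which mirrors the abstract's remark that some haplotypes cannot be completely determined.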

2.
We present a correlation of molecular genetic data (mutations) and genetic data (dinucleotide-repeat polymorphisms) for a cohort of seven hyperkalemic periodic paralysis (HyperPP) and two paramyotonia congenita (PC) families from diverse ethnic backgrounds. We found that each of three previously identified point mutations of the adult skeletal muscle sodium-channel gene occurred on two different dinucleotide-repeat haplotypes. These results indicate that dinucleotide-repeat haplotypes are not predictive of allelic heterogeneity in sodium channelopathies, contrary to previous suggestions. In addition, we identified a HyperPP pedigree in which the dominant disorder was not linked to the sodium-channel gene. Thus, a second locus can give rise to a similar clinical phenotype. Some individuals in this pedigree exhibited a base change causing the nonconservative substitution of an evolutionarily conserved amino acid. Because this change was not present in 240 normal chromosomes and was near another HyperPP mutation, it fulfilled the most commonly used criteria for being a mutation rather than a polymorphism. However, linkage studies using single-strand conformation polymorphism–derived and sequence-derived haplotypes excluded this base change as a causative mutation: these data serve as a cautionary example of potential pitfalls in the delineation of change-of-function point mutations.

3.
Statistical estimation and pedigree analysis of CCR2-CCR5 haplotypes   Total citations: 4 (self-citations: 0; citations by others: 4)
As more SNP marker data become available, researchers have used haplotypes of markers, rather than individual polymorphisms, for association analysis of candidate genes. In order to perform haplotype analysis in a population-based case-control study, haplotypes must be determined by estimation in the absence of family information or laboratory methods for establishing phase. Here, we test the accuracy of the Expectation-Maximization (EM) algorithm for estimating haplotype state and frequency in the CCR2-CCR5 gene region by comparison with haplotype state and frequency determined by pedigree analysis. To do this, we have characterized haplotypes comprising alleles at seven biallelic loci in the CCR2-CCR5 chemokine receptor gene region, a span of 20 kb on chromosome 3p21. Three-generation CEPH families (n=40), totaling 489 individuals, were genotyped by the 5' nuclease assay (TaqMan). Haplotype states and frequencies were compared in 103 grandparents who were assumed to have mated at random. Both pedigree analysis and the EM algorithm yielded the same small number of haplotypes for which linkage disequilibrium was nearly maximal. The haplotype frequencies generated by the two methods were nearly identical. These results suggest that the EM algorithm estimation of haplotype states, frequency, and linkage disequilibrium analysis will be an effective strategy in the CCR2-CCR5 gene region. For genetic epidemiology studies, CCR2-CCR5 allele and haplotype frequencies were determined in African-American (n=30), Hispanic (n=24) and European-American (n=34) populations.
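
As a rough illustration of EM-based haplotype frequency estimation from unphased genotypes, the sketch below handles just two biallelic loci (the study itself used seven) and assumes random mating. Genotypes are coded as the count of allele 1 at each locus; the names `HAPS`, `compatible_pairs` and `em_haplotype_frequencies` are hypothetical and not taken from the paper.

```python
from itertools import product

# The four possible haplotypes over two biallelic loci (alleles coded 0/1).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """All ordered haplotype pairs whose per-locus allele sums equal the genotype."""
    return [(h1, h2) for h1, h2 in product(HAPS, repeat=2)
            if tuple(x + y for x, y in zip(h1, h2)) == tuple(genotype)]


def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate haplotype frequencies from unphased two-locus genotypes."""
    freq = {h: 1.0 / len(HAPS) for h in HAPS}  # uniform starting point
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: weight each compatible pair by its probability under
            # random mating with the current frequency estimates.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        n_chromosomes = 2 * len(genotypes)
        freq = {h: c / n_chromosomes for h, c in counts.items()}
    return freq
```

Only the double heterozygote `(1, 1)` is ambiguous in this two-locus setting; every other genotype has a single compatible haplotype pair, so the EM iterations simply redistribute the double heterozygotes between the two possible phasings.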

4.
The genotyping of mother–father–child trios is a very useful tool in disease association studies, as trios eliminate population stratification effects and increase the accuracy of haplotype inference. Unfortunately, the use of trios for association studies may reduce power, since it requires the genotyping of three individuals where only four independent haplotypes are involved. We describe here a method for genotyping a trio using two DNA pools, thus reducing the cost of genotyping trios to that of genotyping two individuals. Furthermore, we present extensions to the method that exploit the linkage disequilibrium structure to compensate for missing data and genotyping errors. We evaluated our method on trios from CEPH pedigree 66 of the Coriell Institute. We demonstrate that the error rates in the genotype calls of the proposed protocol are comparable to those of standard genotyping techniques, although the cost is reduced considerably. The approach described is generic and it can be applied to any genotyping platform that achieves a reasonable precision of allele frequency estimates from pools of two individuals. Using this approach, future trio-based association studies may be able to increase the sample size by 50% for the same cost and thereby increase the power to detect associations.
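
The 50% figure follows from simple arithmetic. As a sketch, let $B$ (a symbol introduced here, not in the abstract) be the budget measured in single-individual genotyping assays, and assume a pooled assay costs the same as an individual one:

```latex
\[
N_{\text{standard}} = \frac{B}{3}, \qquad
N_{\text{pooled}} = \frac{B}{2}, \qquad
\frac{N_{\text{pooled}}}{N_{\text{standard}}} = \frac{3}{2} = 1.5 .
\]
```

Under that assumption, the same budget covers 1.5 times as many trios, which is the 50% increase in sample size stated in the abstract.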

5.
The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference in our model. The overall result is a flexible Bayesian method, referred to as DP-Haplotyper, that is reminiscent of parsimony methods in its preference for small haplotype pools. We further generalize the model to treat pedigree relationships (e.g., trios) between the population's genotypes. We apply DP-Haplotyper to the analysis of both simulated and real genotype data, and compare to extant methods.
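
The preference of a Dirichlet process prior for a small number of mixture components can be seen in its Chinese restaurant process representation. The sketch below samples only from that prior; it does not reproduce DP-Haplotyper's likelihood or its MCMC sampler, and the function name `crp_assignments` and the parameter names are hypothetical.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter: larger values make new
    clusters (here, new founder haplotypes) more likely.
    """
    rng = random.Random(seed)
    cluster_sizes = []   # number of items currently in each cluster
    assignments = []     # cluster index chosen for each item
    for _ in range(n_items):
        # An existing cluster is chosen in proportion to its size;
        # a brand-new cluster is chosen in proportion to alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments, cluster_sizes
```

Because large clusters attract new items, the number of occupied clusters grows slowly with the number of items, which is the "rich get richer" behaviour behind the parsimony-like preference for small haplotype pools.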

6.
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM algorithm, which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplotypes. The method has been implemented using Matlab and the code is available upon request from the authors.

7.
8.
Global mitochondrial DNA (mtDNA) data indicate that the dog originates from domestication of wolf in Asia South of Yangtze River (ASY), with minor genetic contributions from dog-wolf hybridisation elsewhere. Archaeological data and autosomal single nucleotide polymorphism data have instead suggested that dogs originate from Europe and/or South West Asia but, because these datasets lack data from ASY, evidence pointing to ASY may have been overlooked. Analyses of additional markers for global datasets, including ASY, are therefore necessary to test if mtDNA phylogeography reflects the actual dog history and not merely stochastic events or selection. Here, we analyse 14,437 bp of Y-chromosome DNA sequence in 151 dogs sampled worldwide. We found 28 haplotypes distributed in five haplogroups. Two haplogroups were universally shared and included three haplotypes carried by 46% of all dogs, but two other haplogroups were primarily restricted to East Asia. Highest genetic diversity and virtually complete phylogenetic coverage was found within ASY. The 151 dogs were estimated to originate from 13-24 wolf founders, but there was no indication of post-domestication dog-wolf hybridisations. Thus, Y-chromosome and mtDNA data give strikingly similar pictures of dog phylogeography, most importantly that roughly 50% of the gene pools are shared universally but only ASY has nearly the full range of genetic diversity, such that the gene pools in all other regions may derive from ASY. This corroborates that ASY was the principal, and possibly sole region of wolf domestication, that a large number of wolves were domesticated, and that subsequent dog-wolf hybridisation contributed modestly to the dog gene pool.

9.
Sequencing of the mtDNA control region (385 or 695 bp) of 212 Lipizzans from eight studs revealed 37 haplotypes. Distribution of haplotypes among studs was biased, including many private haplotypes, but only one haplotype was present in all the studs. According to historical data, numerous Lipizzan maternal lines originating from founder mares of different breeds have been established during the breed's history, so the broad genetic base of the Lipizzan maternal lines was expected. A comparison of Lipizzan sequences with 136 sequences of domestic and wild horses from GenBank showed a clustering of Lipizzan haplotypes in the majority of haplotype subgroups present in other domestic horses. We assume that haplotypes identical to haplotypes of early domesticated horses can be found in several Lipizzan maternal lines as well as in other breeds. Therefore, domestic horses could arise either from a single large population or from several populations provided there were strong migrations during the early phase after domestication. A comparison of Lipizzan haplotypes with 56 maternal lines (according to the pedigrees) showed a disagreement of biological parentage with pedigree data for at least 11% of the Lipizzans. The distribution of haplotype frequencies was unequal (0.2%–26%), mainly due to pedigree errors and haplotype sharing among founder mares.

10.
An immotile short tail sperm defect has recently been identified as a hereditary disorder present within the Finnish Yorkshire pig population. The syndrome is inherited as an autosomal recessive disease exclusively expressed in male individuals as shorter sperm tail length and immotile spermatozoa. Based on the assumption of a recent common origin of the disease-causing mutation, a genome-wide search was performed with 228 evenly spaced microsatellites by homozygosity mapping of affected and unaffected DNA pools. One locus, SW2411 on Chr 16, demonstrated a significantly skewed allele distribution between the two pools. Linkage analysis of five markers in this region mapped the disease-causing gene within a 6-cM confidence interval region with a highest LOD score of 7.7 at marker SW419. It appears that three-marker haplotypes can be used for marker-assisted selection within analyzed pedigrees. Furthermore, future fine mapping may reveal a more precise population-wide associated haplotype and facilitate identification of a new gene affecting sperm tail development. Received: 5 July 2001 / Accepted: 13 September 2001
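
For readers less familiar with the scale, the standard definition of the LOD score compares the likelihood of the data at recombination fraction $\theta$ with the likelihood under free recombination. The abstract does not spell this out; the formula below is the textbook definition, applied to the reported value:

```latex
\[
Z(\theta) = \log_{10} \frac{L(\theta)}{L(\theta = 1/2)},
\qquad
Z = 7.7 \;\Longrightarrow\; \frac{L(\theta)}{L(1/2)} = 10^{7.7} \approx 5 \times 10^{7} .
\]
```

A LOD score of 7.7 therefore corresponds to odds of roughly fifty million to one in favour of linkage over no linkage.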

11.
We present a method to identify molecular markers linked to a genomic interval in outbred pedigrees. Using information from fully informative RFLP markers on a single linkage group containing a quantitative trait locus for wood specific gravity, we constructed four DNA pools from nonrecombinant progeny of a three-generation outbred pedigree. The four pools were screened to identify linked RAPD markers. The phase and zygosity of a linked RAPD marker could be determined directly from the array of RAPD bands present or absent in the four pools. Two hundred fifty-six primers were tested on the four DNA pools, revealing 61 putatively linked loci. Nine RAPD loci were linked to the genomic interval. The approach developed here could be generally applied to saturation mapping in outbred pedigrees where fully informative markers have previously been mapped.

12.
Multipoint linkage analysis is a powerful method for mapping a rare disease gene on the human gene map despite limited genotype and pedigree data. However, there is no standard procedure for determining a confidence interval for gene location by using multipoint linkage analysis. A genetic counselor needs to know the confidence interval for gene location in order to determine the uncertainty of risk estimates provided to a consultant on the basis of DNA studies. We describe a resampling, or "bootstrap," method for deriving an approximate confidence interval for gene location on the basis of data from a single pedigree. This method was used to define an approximate confidence interval for the location of a gene causing nonsyndromal X-linked mental retardation in a single pedigree. The approach seemed robust in that similar confidence intervals were derived by using different resampling protocols. Quantitative bounds for the confidence interval were dependent on the genetic map chosen. Once an approximate confidence interval for gene location was determined for this pedigree, it was possible to use multipoint risk analysis to estimate risk intervals for women of unknown carrier status. Despite the limited genotype data, the combination of the resampling method and multipoint risk analysis had a dramatic impact on the genetic advice available to consultants.
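
The general shape of a bootstrap confidence interval can be sketched as follows. This is a generic percentile bootstrap over a list of numbers, not the paper's pedigree resampling protocol or its multipoint location estimator; `bootstrap_interval` and its parameters are hypothetical names, and `estimator` stands in for whatever statistic is being located.

```python
import random
from typing import Callable, List, Sequence, Tuple


def bootstrap_interval(data: Sequence[float],
                       estimator: Callable[[List[float]], float],
                       n_resamples: int = 2000,
                       alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval with nominal coverage 1 - alpha."""
    rng = random.Random(seed)
    n = len(data)
    # Re-estimate the statistic on n_resamples datasets, each drawn from
    # the observed data with replacement and of the same size.
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval off the empirical quantiles of the re-estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]
```

For example, `bootstrap_interval(values, lambda xs: sum(xs) / len(xs))` gives an approximate 95% interval for a mean. The abstract's observation that different resampling protocols gave similar intervals corresponds to varying how the resampled datasets are drawn.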

13.
Single-copy nuclear DNA sequences have high potential as a source of genetic markers for population analyses. However, the difficulties that arise when haplotypes that are the product of recombinational rearrangements are present require additional consideration. Two statistical methods for identifying potential recombinants by detecting anomalies in the distribution of variable sites along sequences were used to screen sequences from a single-copy nuclear DNA fragment, cpnl-1, of the European meadow grasshopper (Chorthippus parallelus). Five of the 71 haplotypes in the cpnl-1 data set showed nonrandom distribution of polymorphic sites using both methods. The second method pinpointed an additional four haplotypes. Estimates of the rate of recombination in the entire data set were obtained using standard methods. It is concluded that cpnl-1 haplotypes have been involved in recombination or gene conversion events at a rate more than twice the mutation rate. This confirms that recombination and gene conversion are significant factors in the generation of haplotype variation in nuclear gene sequences. The cpnl-1 haplotypes identified by the tests were present only in populations that have had recent contact: the Balkan and Turkish refugial populations and their post-glacial colonies to the north. This is discussed in relation to the phylogenetic inferences drawn from the same data in a previous report.

14.
beta-Globin gene haplotypes obtained in Polynesian Samoans were similar to those described in Southern Chinese. An atypical HindIII restriction fragment length polymorphism detected with pRK29, a 3' beta-globin gene probe, was present at a gene frequency of 7% in Samoans. Haplotype patterns suggest that this polymorphism may have arisen by 1 or 2 mutational events. DNA haplotypes derived from the beta-globin gene cluster confirm nuclear and mitochondrial DNA data that Polynesian precursor populations were East Asian in origin.

15.
Using computer simulations, we generated and analyzed genetic distances among selectively neutral haplotypes transmitted through gene genealogies with random-mating organismal pedigrees. Constraints and possible biases on haplotype distances due to correlated ancestry were evaluated by comparing observed distributions of distances to those predicted from an inbreeding theory that assumes independence among haplotype pairs. Results suggest that: 1) mean time to common ancestry of neutral haplotypes can be a reasonably good predictor of evolutionary effective population size; 2) the nonindependence of haplotype paths of descent within a given gene genealogy typically produces significant departures from the theoretical probability distributions of haplotype distances; 3) frequency distributions of distances between haplotypes drawn from “replicate” organismal pedigrees or from multiple unlinked loci within an organismal pedigree exhibit very close agreement with the theory for independent haplotypes. These results are relevant to interpretations of current molecular data on genetic distances among nonrecombining haplotypes at either nuclear or cytoplasmic loci.

16.
The aims of this research were to assess the genetic structure of wild Phaseolus lunatus L. in the Americas and to test the hypothesis of a relatively recent Andean origin of the species. For this purpose, nuclear and non-coding chloroplast DNA markers were analyzed in a collection of 59 wild Lima bean accessions and six allied species. Twenty-three chloroplast and 28 nuclear DNA haplotypes were identified and shown to be geographically structured. Three highly divergent wild Lima bean gene pools, AI, MI, and MII, with mostly non-overlapping geographic ranges, are proposed. The results support an Andean origin of wild Lima beans during Pleistocene times and an early divergence of the three gene pools at an age that is posterior to completion of the Isthmus of Panama and major Andean orogeny. Gene pools would have evolved and reached their current geographic distribution mainly in isolation and therefore are of high priority for conservation and breeding programs.

17.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration. In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.

18.
MOTIVATION: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population. To this end, we present HaploPool, a method of estimating haplotype frequencies from blocks of consecutive SNPs. RESULTS: HaploPool leverages the efficiency of DNA pools and estimates the population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals as compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm to three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster), and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (of between 5 and 25 SNPs).

19.
Subtelomeric regions of human chromosomes are the sites of increased meiotic recombination and have a male-to-female recombination ratio that is higher than elsewhere in the genome. We isolated two novel, polymorphic CA repeat markers from the distal part of the immunoglobulin heavy chain gene cluster, approximately 90 and 200 kb from the telomere of chromosome 14q. The 14q telomere was unambiguously located by physical mapping of telomeric YACs and Bal31 exonuclease digestion of genomic DNA. We then constructed haplotypes using genotype data from these markers and data from sCAW1 (D14S826) for use as a highly polymorphic genetic marker. Linkage analysis using the 40-pedigree CEPH reference panel and genotype data from these and other loci physically mapped to the terminal 1.5 Mb of chromosome 14q revealed an apparent increase in meiotic recombination within this region, relative to the average rate for the genome. Further, we found that recombination was higher in females than in males, indicating that the subtelomeric region of 14q differs from other human subtelomeric regions.

20.
A population genetic study of the polymorphism in the first hypervariable segment (HVSI) of the mitochondrial DNA control region was carried out for three ethnic populations of the Volga-Ural region: Bashkirs, Russians, and Komi-Permyaks. This analysis showed that most of the mtDNA HVSI haplotypes detected in the populations of Bashkirs, Russians and Komi-Permyaks contained the combinations of nucleotide substitutions detected earlier in Asian, European, and Finno-Ugric populations. These findings are consistent with historical, anthropological, and ethnographical data suggesting the presence of European and Mongoloid components of different geographical descent in the gene pool of the contemporary population of the Volga-Ural region. The data on the genetic structure and the phylogenetic relationships between populations of the Volga-Ural region based on modern molecular genetic methods of mitochondrial genome investigation would be a substantial addition to the already existing information for some other regions of Europe and Asia. These data would provide a more complete examination of the development of interethnic diversity of mitochondrial gene pools of contemporary ethnic populations with the purpose of reconstructing the genetic demographic processes that accompanied peopling of the Middle Ural and Volga region.
