Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Haplotype block structure is conserved across mammals   Total citations: 2 (self-citations: 0; citations by others: 2)
Genetic variation in genomes is organized in haplotype blocks, and species-specific block structure is defined by differential contribution of population history effects in combination with mutation and recombination events. Haplotype maps characterize the common patterns of linkage disequilibrium in populations and have important applications in the design and interpretation of genetic experiments. Although evolutionary processes are known to drive the selection of individual polymorphisms, their effect on haplotype block structure dynamics has not been shown. Here, we present a high-resolution haplotype map for a 5-megabase genomic region in the rat and compare it with the orthologous human and mouse segments. Although the size and fine structure of haplotype blocks are species dependent, there is a significant interspecies overlap in structure and a tendency for blocks to encompass complete genes. Extending these findings to the complete human genome using haplotype map phase I data reveals that linkage disequilibrium values are significantly higher for equally spaced positions in genic regions, including promoters, as compared to intergenic regions, indicating that a selective mechanism exists to maintain combinations of alleles within potentially interacting coding and regulatory regions. Although this characteristic may complicate the identification of causal polymorphisms underlying phenotypic traits, conservation of haplotype structure may be employed for the identification and characterization of functionally important genomic regions.

2.
There is considerable interest in identifying and characterizing block-like patterns of linkage disequilibrium (LD; haplotype blocks) in the human genome as these may facilitate the identification of complex disease genes via genome-wide association studies. Although recombination hot-spots have been suggested as the primary mechanism to explain the block-like pattern of LD, other forces, such as genetic drift, may also be important. To this end, we have studied the effect of various recombination models on patterns of LD by using extensive simulations. As expected, haplotype blocks were observed under a model allowing recombination hot-spots. However, we also observed similar block-like patterns in the models where recombination crossovers are randomly and uniformly distributed, and we demonstrate that these blocks are generated by genetic drift. We caution that genetic drift may be an alternative mechanism (in addition to recombination hot-spots) that can lead to block-like patterns of LD. Our findings highlight the necessity of characterizing haplotype blocks in world-wide populations.

3.
Genome-wide association (GWA) studies represent a powerful strategy for identifying susceptibility genes for complex diseases in human populations but results must be confirmed and replicated. Because of the close homology between mouse and human genomes, the mouse can be used to add evidence to genes suggested by human studies. We used the mouse quantitative trait loci (QTL) map to interpret results from a GWA study for genes associated with plasma HDL cholesterol levels. We first positioned single nucleotide polymorphisms (SNPs) from a human GWA study on the genomic map for mouse HDL QTL. We then used mouse bioinformatics, sequencing, and expression studies to add evidence for one well-known HDL gene (Abca1) and three newly identified genes (Galnt2, Wwox, and Cdh13), thus supporting the results of the human study. For GWA peaks that occur in human haplotype blocks with multiple genes, we examined the homologous regions in the mouse to prioritize the genes using expression, sequencing, and bioinformatics from the mouse model, showing that some genes were unlikely candidates and adding evidence for candidate genes Mvk and Mmab in one haplotype block and Fads1 and Fads2 in the second haplotype block. Our study highlights the value of mouse genetics for evaluating genes found in human GWA studies.

4.
Patterns of polymorphism and linkage disequilibrium in cultivated barley   Total citations: 1 (self-citations: 0; citations by others: 1)
We carried out a genome-wide analysis of polymorphism (4,596 SNP loci across 190 elite cultivated accessions) chosen to represent the available genetic variation in current elite North West European and North American barley germplasm. Population sub-structure, patterns of diversity and linkage disequilibrium varied considerably across the seven barley chromosomes. Gene-rich and rarely recombining haplotype blocks that may represent up to 60% of the physical length of barley chromosomes extended across the ‘genetic centromeres’. By positioning 2,132 bi-parentally mapped SNP markers with minimum allele frequencies higher than 0.10 by association mapping, 87.3% were located to within 5 cM of their original genetic map position. We show that at this current marker density genetically diverse populations of relatively small size are sufficient to fine map simple traits, providing they are not strongly stratified within the sample, fall outside the genetic centromeres and population sub-structure is effectively controlled in the analysis. Our results have important implications for association mapping, positional cloning, physical mapping and practical plant breeding in barley and other major world cereals including wheat and rye that exhibit comparable genome and genetic features.

5.
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that haplotype blocks that are characterized well will make it relatively easier to quickly map all the genes carrying human diseases. To study the inference of haplotype block systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as the problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference/hypothesis testing can be performed; prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm is linear in the number of loci, instead of NP-hard for many such algorithms. We illustrate the applications of our method to both simulated and real data sets.

6.
A significant proportion of the human genome is contained within haplotype blocks across which pairwise linkage disequilibrium (LD) is very high. However, LD is also often high between markers at more remote distances, and within different haplotype blocks. Here, we evaluate the origins of haplotype block structure in the three genes for alpha1 adrenergic receptors (alpha1-AR) in the human genome (ADRA1A, ADRA1B and ADRA1D) by genotyping dense single-nucleotide polymorphism (SNP) marker maps, and show that LD signals between distant markers are due to the presence of extended haplotype superblocks in individuals with ancient chromosomes which have escaped historic recombination. ARs mediate the physiological effects of epinephrine and norepinephrine, and are targets of many therapeutic drugs. This work has identified haplotype backgrounds of alpha1-AR missense variants, haplotype block structures in US Caucasians and African Americans, and haplotype tag SNPs for each block, and we present strong evidence for ancient haplotype block superstructure at these genes which has been partially disrupted by recombination, and evidence for reinstatement of linkage disequilibrium by subsequent recombination events. ADRA1A comprises four haplotype blocks in US Caucasians, while in African Americans Block 1 is split. ADRA1B has four blocks in US Caucasians, but in African Americans only the first two blocks are present. ADRA1D has two blocks in US Caucasians, and the first block is replaced by two smaller blocks in African Americans. For both ADRA1A and ADRA1B, haplotype superstructures may represent a novel, higher-level hierarchy in the human genome, which may reduce redundancy of testing by further aggregation of genotype data. Electronic Supplementary Material: supplementary material is available in the online version of this article. Communicated by W. R. McCombie.

7.
With the completion of the first draft of the human genome sequencing project, a new challenge is to characterize patterns of linkage disequilibrium and haplotype structure across genomic regions to identify mutations associated with complex disease. Recent work shows considerable linkage disequilibrium heterogeneity, where genomic regions of extended haplotype blocks are punctuated by recombination hotspots. In this review we explore some of the current approaches to defining and characterizing 'hapblocks', mechanisms by which hapblocks may be generated, and the implications this block-like structure may have for successfully mapping mutations associated with complex disease.

8.
Recent studies have suggested that a significant fraction of the human genome is contained in blocks of strong linkage disequilibrium, ranging from ~5 to >100 kb in length, and that within these blocks a few common haplotypes may account for >90% of the observed haplotypes. Furthermore, previous studies have suggested that common haplotypes in candidate genes are generally shared across populations and represent the majority of chromosomes in each population. The conclusions drawn from these preliminary studies, however, are based on an incomplete knowledge of the variation in the regions examined. To bridge this gap in knowledge, we have completely resequenced 100 candidate genes in a population of African descent and one of European descent. Although these genes have been well studied because of their medical importance, we demonstrate that a large amount of sequence variation has not yet been described. We also report that the average number of inferred haplotypes per gene, when complete data is used, is higher than in previous reports and that the number and proportion of all haplotypes represented by common haplotypes per gene is variable. Furthermore, we demonstrate that haplotypes shared between the two populations constitute only a fraction of the total number of haplotypes observed and that these shared haplotypes represent fewer of the African-descent chromosomes than was expected from previous studies. Finally, we show that restricting variation discovery to coding regions does not adequately describe all common haplotypes or the true haplotype block structure observed when all common variation is used to infer haplotypes. These data, derived from complete knowledge of genetic variation in these genes, suggest that the haplotype architecture of candidate genes across the human genome is more complex than previously suggested, with important implications for candidate gene and genomewide association studies.

9.
There is currently a great deal of interest in using linkage disequilibrium (LD) mapping to locate both disease and quantitative-trait loci on a genomewide scale. Recent findings suggest that much of the human genome is organized in discrete "blocks" of low haplotype diversity, but the utility of such blocks in identifying genes influencing complex traits is not yet known and must ultimately be tested empirically through use of real data. We recently identified a putative functional polymorphism (-1021C→T) in the 5' upstream region of the DBH gene that accounted for 35%-52% of the total phenotypic variance in plasma dopamine beta-hydroxylase (DBH) activity in samples from three distinct populations. In the present study, we genotyped 11 diallelic markers at the DBH locus surrounding -1021C→T in 386 unrelated individuals of European origin. We identified a single 10-kb block containing -1021C→T, in which four haplotypes comprised 93% of the observed chromosomes. Only markers within the block were highly associated with phenotype (P ≤ 2.2 × 10⁻¹⁰), with one exception. In general, association with phenotype was strongly correlated with the degree of LD between each marker and -1021C→T. Of four LD measures assessed, d² was the best predictor of this relationship. Had one attempted to map quantitative-trait loci for plasma DBH activity on a genomewide basis without prior knowledge of candidate regions and not included (by chance) markers within this haplotype block, the DBH locus might have been missed entirely. These results provide a direct example of the potential value of constructing a haplotype map of the human genome prior to embarking on large-scale association studies.

10.
Autoimmune, inflammatory, and infectious diseases present a major burden to human health and are frequently associated with loci in the human major histocompatibility complex (MHC). Here, we report a high-resolution (1.9 kb) linkage-disequilibrium (LD) map of a 4.46-Mb fragment containing the MHC in U.S. pedigrees with northern and western European ancestry collected by the Centre d'Etude du Polymorphisme Humain (CEPH) and the first generation of haplotype tag single-nucleotide polymorphisms (tagSNPs) that provide up to a fivefold increase in genotyping efficiency for all future MHC-linked disease-association studies. The data confirm previously identified recombination hotspots in the class II region and allow the prediction of numerous novel hotspots in the class I and class III regions. The region of longest LD maps outside the classic MHC to the extended class I region spanning the MHC-linked olfactory-receptor gene cluster. The extended haplotype homozygosity analysis for recent positive selection shows that all 14 outlying haplotype variants map to a single extended haplotype, which most commonly bears HLA-DRB1*1501. The SNP data, haplotype blocks, and tagSNPs analysis reported here have been entered into a multidimensional Web-based database (GLOVAR), where they can be accessed and viewed in the context of relevant genome annotation. This LD map allowed us to give coordinates for the extremely variable LD structure underlying the MHC.

11.
Analysis of data on 1000 Holstein-Friesian bulls genotyped for 15,036 single-nucleotide polymorphisms (SNPs) has enabled genomewide identification of haplotype blocks and tag SNPs. A final subset of 9195 SNPs in Hardy-Weinberg equilibrium and mapped on autosomes on the bovine sequence assembly (release Btau 3.1) was used in this study. The average intermarker spacing was 251.8 kb. The average minor allele frequency (MAF) was 0.29 (0.05-0.5). Following recent precedents in human HapMap studies, a haplotype block was defined where 95% of combinations of SNPs within a region are in very high linkage disequilibrium. A total of 727 haplotype blocks consisting of ≥3 SNPs were identified. The average block length was 69.7 ± 7.7 kb, which is approximately 5-10 times larger than in humans. These blocks comprised a total of 2964 SNPs and covered 50,638 kb of the sequence map, which constitutes 2.18% of the length of all autosomes. A set of tag SNPs, which will be useful for further fine-mapping studies, has been identified. Overall, the results suggest that as many as 75,000-100,000 tag SNPs would be needed to track all important haplotype blocks in the bovine genome. This would require approximately 250,000 SNPs in the discovery phase.

12.
With the availability of the HapMap, a resource that describes common patterns of linkage disequilibrium (LD) in four different human population samples, we now have a powerful tool to help dissect the role of genetic variation in the biology of the genome. HapMap is entirely complementary to the human genome map and so it is particularly fitting that it should be viewed in a full genomic context. However, characterization of high resolution LD across the genome can be a challenging task, owing in part to the sheer volume of data and the inherent dimensionality that its analysis entails. A number of tools are now available to make this task easier for researchers. This review will examine tools for viewing and analysing haplotype and LD data, enabling a number of tasks, including identification of optimal sets of haplotype tagging single nucleotide polymorphisms (SNPs); drawing links between associated SNPs and putative causal alleles; or simply viewing LD and haplotypes across a gene or region of interest. The data generated by the HapMap also have other important applications, informing, for example, on the demographic history and evidence of selection in human populations and on previously undetected regulatory relationships and gene networks. All of these properties make the HapMap no less important a resource than the human genome sequence itself, and so this makes it essential viewing for all in the field of human biology.

13.
Variation in gene expression may give rise to a significant fraction of inter-individual phenotypic variation. Studies searching for the underlying genetic controls for such variation have been conducted in model organisms and humans in recent years. In our previous effort of assessing conserved underlying haplotype patterns across ethnic populations, we constructed common haplotypes using SNPs having conserved linkage disequilibrium (LD) across ethnic populations. These common haplotypes cluster into a simple evolutionary structure based on their frequencies, defining only up to three conserved clusters termed 'haplotype frameworks'. One intriguing preliminary finding was that a significant portion of reported variants strongly associated with cis-regulation tags these globally conserved haplotype frameworks. Here we expand the investigation by collecting genes showing stringently determined cis-association between genotypes and expression phenotypes from major studies. We conducted phylogenetic analysis of current major haplotypes along with the corresponding haplotypes derived from chimpanzee reference sequences. Our analysis reveals that, for the vast majority of such cis-regulatory genes, the tagging SNPs showing the strongest association also tag the haplotype lineages directly separated from ancestry, inferred from either chimpanzee reference sequences or the allele frequency-derived haplotype frameworks, suggesting that the differentially expressed phenotypes were evolved relatively early in human history. Such evolutionary signatures provide keys for a more effective identification of globally-conserved candidate regulatory haplotypes across human genes in future epidemiologic and pharmacogenetic studies.

14.
Recently, genomic data have revealed a "block-like" structure of haplotype diversity on human chromosomes. This structure is anticipated to facilitate gene mapping studies, because strong associations among loci within a block may allow haplotype variation to be tagged with a limited number of markers. But its usefulness to mapping efforts depends on the consistency of the block structure within and among populations, which in turn depends on how the block structure arises. Recombination hot spots are generally thought to underlie the block structure, but haplotype blocks can also develop stochastically under random recombination, in which case the block structure will show limited consistency among populations. Using coalescent models, which we upscaled to simulate the evolution of haplotypes with many markers at fixed distances, we show that the relationship between block boundaries and historic recombination intensity may be surprisingly weak. The majority of historic recombinations do not leave a footprint in present-day linkage disequilibrium patterns, and the block structure is sensitive to factors that affect the timing of recombination relative to marker mutation events in the genealogy, such as marker frequency bias and historic population size changes. Our results give insight into the potential of stochastic events to affect haplotype block structure, which can limit the usefulness of the block structure to mapping studies.

15.
The majority of complete hydatidiform moles (CHMs) harbor duplicated haploid genomes that originate from sperm. This makes CHMs more advantageous than conventional diploid cells for determining haplotypes of SNPs and copy-number variations (CNVs), because all of the genetic variants in a CHM genome are homozygous. Here we report SNP and CNV haplotype structures determined by analysis of 100 CHMs from Japanese subjects via high-density DNA arrays. The obtained haplotype map should be useful as a reference for the haplotype structure of Asian populations. We resolved common CNV regions (merged CNV segments across the examined samples) into CNV events (clusters of CNV segments) on the basis of mutual overlap and found that the haplotype backgrounds of different CNV events within the same CNV region were predominantly similar, perhaps because of inherent structural instability.

16.
OBJECTIVE: The presence of linkage disequilibrium (LD) forms the basis for a range of uses, including the fine-mapping of diseases and studies on human genealogy. Recent findings indicate that single nucleotide polymorphisms (SNP) can occur in blocks of limited haplotypic diversity with high degrees of LD. Commonly used measures for LD, such as r² and D', consider only two loci and might miss information to appropriately describe LD in larger haplotypic structures. METHODS: We introduce the Normalized Entropy Difference, ε, as a new multilocus measure for LD. A related quantity, ΔS, provides an approximate χ² test for the significance of LD. The ability of the measure to detect haplotype blocks is investigated using simulated data sets as well as a real data set previously analyzed by Daly et al. (2001). RESULTS: ε allows for arbitrary numbers of loci, describes LD with regard to the loci sequence, and can be interpreted as a multilocus extension of r². The application of ε to the data sets demonstrated the measure's ability to appropriately describe simultaneous multilocus LD and to detect haplotype blocks. CONCLUSIONS: ε is a reasonable multilocus LD measure and might be of potential use in the construction of the human haplotype map.
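
For orientation, the two-locus measures that this abstract contrasts with its multilocus statistic can be sketched as follows. This is a minimal illustration using the standard textbook definitions of D, |D'| and r² for two biallelic loci; it is not the Normalized Entropy Difference, whose formula the abstract does not give. The function name `pairwise_ld` and the "AB"/"Ab"/"aB"/"ab" string encoding are assumptions made for the example.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic loci.

    `haplotypes` is a list of two-character strings such as "AB", "Ab",
    "aB", "ab" (a hypothetical encoding: uppercase marks the reference
    allele at each locus).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts["AB"] / n                          # frequency of the A-B haplotype
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # allele A frequency, locus 1
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # allele B frequency, locus 2

    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0   # normalized to [0, 1]

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0         # squared allelic correlation
    return d, d_prime, r2


# Two loci in complete LD: only the "AB" and "ab" haplotypes are present.
print(pairwise_ld(["AB"] * 5 + ["ab"] * 5))  # (0.25, 1.0, 1.0)
```

Both statistics are functions of a single pair of loci, which is the limitation the abstract raises: describing a block of many SNPs requires a whole matrix of such pairwise values.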

17.
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.

18.
Phosphatase and tensin homolog deleted on chromosome 10 (PTEN) encodes a tumor-suppressor phosphatase frequently mutated in both sporadic and heritable forms of human cancer. Germline mutations are associated with a number of heritable cancer syndromes that are jointly referred to as the "PTEN hamartoma tumor syndrome" (PHTS) and include Cowden syndrome, Bannayan-Riley-Ruvalcaba syndrome, Proteus syndrome, and Proteus-like syndrome. Germline PTEN mutations have been identified in a significant proportion of patients with PHTS; however, there are still many individuals with classic diagnostic features for whom mutations have yet to be identified. To address this, we took a haplotype-based approach and investigated the association of specific genomic regions of the PTEN locus with PHTS. We found this locus to be characterized by three distinct haplotype blocks 33 kb, 65 kb, and 43 kb in length. Comparisons of the haplotype distributions for all three blocks differed significantly among patients with PHTS and controls (P=.0098, P<.0001, and P<.0001 for blocks 1, 2, and 3, respectively). "Rare" haplotype blocks and extended haplotypes account for two-to-threefold more PHTS chromosomes than control chromosomes. PTEN mutation-negative patients are strongly associated with a haplotype block spanning a region upstream of PTEN and the gene's first intron (P=.0027). Furthermore, allelic combinations contribute to the phenotypic complexity of this syndrome. Taken together, these data suggest that specific haplotypes and rare alleles underlie the disease etiology in these sample populations; constitute low-penetrance, modifying loci; and, specifically in the case of patients with PHTS for whom traditional mutations have yet to be identified, may harbor pathogenic variant(s) that have escaped detection by standard PTEN mutation-scanning methodologies.

19.
Haplotyping as perfect phylogeny: a direct approach.   Total citations: 4 (self-citations: 0; citations by others: 4)
A full haplotype map of the human genome will prove extremely valuable as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A haplotype map project has been announced by NIH. The biological key to that project is the surprising fact that some human genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). In this paper we explore the algorithmic implications of the no-recombination-in-long-blocks observation for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, motivates a model of haplotype evolution where the haplotypes in a population are assumed to evolve along a coalescent, which as a rooted tree is a perfect phylogeny. We consider the following algorithmic problem, called the perfect phylogeny haplotyping problem (PPH), which was introduced by Gusfield (2002): given n genotypes of length m each, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? The approach taken by Gusfield (2002) to solve this problem reduces it to established, deep results and algorithms from matroid and graph theory. Although that reduction is quite simple and the resulting algorithm nearly optimal in speed, taken as a whole that approach is quite involved, and in particular, challenging to program. Moreover, anyone wishing to fully establish, by reading existing literature, the correctness of the entire algorithm would need to read several deep and difficult papers in graph and matroid theory. However, as stated by Gusfield (2002), many simplifications are possible and the list of "future work" in Gusfield (2002) began with the task of developing a simpler, more direct, yet still efficient algorithm. This paper accomplishes that goal, for both the rooted and unrooted PPH problems. It establishes a simple, easy-to-program, O(nm²)-time algorithm that determines whether there is a PPH solution for input genotypes and produces a linear-space data structure to represent all of the solutions. The approach allows complete, self-contained proofs. In addition to algorithmic simplicity, the approach here makes the representation of all solutions more intuitive than in Gusfield (2002), and solves another goal from that paper, namely, to prove a nontrivial upper bound on the number of PPH solutions, showing that that number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed.
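
The problem statement in this abstract can be made concrete with a small checker. The sketch below does not implement the paper's O(nm²) algorithm; it only verifies whether a proposed set of haplotype pairs is a valid PPH solution. It relies on the well-known four-gamete condition for binary sites (a set of 0/1 haplotypes fits an unrooted perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10, 11). The genotype coding (0 or 1 for homozygous, 2 for heterozygous) and all function names are assumptions made for the example.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on a list of equal-length 0/1 tuples."""
    if not haplotypes:
        return True
    m = len(haplotypes[0])
    for i, j in combinations(range(m), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all of 00, 01, 10, 11 observed at sites i, j
            return False
    return True


def explains(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) generates the genotype.

    Assumed coding: 0 or 1 = homozygous for that allele, 2 = heterozygous.
    """
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:
                return False
        elif not (a == b == g):
            return False
    return True


def is_pph_solution(pairs, genotypes):
    """Check a candidate solution: one haplotype pair per genotype."""
    if len(pairs) != len(genotypes):
        return False
    if not all(explains(h1, h2, g) for (h1, h2), g in zip(pairs, genotypes)):
        return False
    used = {h for pair in pairs for h in pair}
    return admits_perfect_phylogeny(list(used))


genotypes = [(2, 1), (0, 2)]
candidate = [((0, 1), (1, 1)), ((0, 0), (0, 1))]
print(is_pph_solution(candidate, genotypes))  # True
```

Verifying a candidate is easy; the hard part the paper addresses is finding such a set, and representing all of them, without enumerating the exponentially many ways to resolve heterozygous sites.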

20.
Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains unanswered except for the fact that there exist a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome whether a selection signal shared across different populations is the consequence of a single mutation process followed subsequently by gene flow between populations or of convergent evolution due to the occurrence of multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event.
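
The existing statistics named in this abstract, iHS and XP-EHH, are both built on extended haplotype homozygosity (EHH). As background only, and not as the authors' long-range-haplotype method, EHH can be sketched as the probability that two randomly chosen chromosomes carrying the same core allele are identical over an interval. The function name `ehh` and the representation of chromosomes as 0/1 strings starting at the core site are assumptions made for the example.

```python
from collections import Counter
from math import comb


def ehh(carriers, start, stop):
    """Extended haplotype homozygosity over sites [start, stop).

    `carriers` holds the chromosomes that carry the core allele, each as
    a string of 0/1 alleles (hypothetical encoding). The value is the
    fraction of chromosome pairs that are identical across the interval.
    """
    n = len(carriers)
    if n < 2:
        return 0.0
    groups = Counter(h[start:stop] for h in carriers)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


carriers = ["0101", "0101", "0111", "0101"]
# EHH decays as the interval extends away from the core at position 0.
print([ehh(carriers, 0, k) for k in range(1, 5)])  # [1.0, 1.0, 0.5, 0.5]
```

A haplotype under recent positive selection tends to keep EHH high over unusually long distances for its frequency, which is the signal that integrating this curve (iHS) or comparing it between populations (XP-EHH) is designed to pick up.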
