Similar Literature
20 similar articles found.
1.
Population-genetic basis of haplotype blocks in the 5q31 region   (Total citations: 3; self-citations: 0; citations by others: 3)
We investigated patterns of nucleotide variation in the 5q31 region identified by Daly et al. as containing haplotype blocks, to determine whether the blocklike pattern requires the assumption of hotspots in recombination. Using extensive simulations that generate data matched to the Daly et al. data set in (a) the method of ascertainment of single-nucleotide polymorphisms, (b) the heterozygosity of ascertained markers, (c) the number of block boundaries, and (d) the diversity of haplotypes within blocks, we show that the patterns found in the Daly et al. data are not consistent with the assumption of uniform recombination in a population of constant size but are consistent either with the presence of hotspots in a population of constant size or with the absence of hotspots if there was a period of rapid population growth. We further show that estimates of local recombination rate can distinguish between population growth and hotspots as the primary cause of a blocklike pattern. Estimates of local recombination rates for the Daly et al. data do not indicate the presence of recombination hotspots.

2.
MOTIVATION: Missing data at genotyped single nucleotide polymorphism (SNP) sites are common, and high-throughput genotyping methods usually have a high rate of missing data. For example, the published human chromosome 21 data of Patil et al. contain about 20% missing SNPs. Inferring missing SNPs from haplotype block structure is promising but difficult because haplotype block boundaries are not well defined. Here we propose a global algorithm to overcome this difficulty. RESULTS: First, we propose entropy as a measure of haplotype diversity and show that this measure, combined with a dynamic programming algorithm, produces better haplotype block partitions than other measures. Second, based on the entropy measure, we propose a two-step iterative partition-inference algorithm for inferring missing SNPs. In the first step, we apply the dynamic programming algorithm to partition haplotypes into blocks. In the second step, we use an iterative process similar to the expectation-maximization algorithm to infer the missing SNPs in each haplotype block so as to minimize the block entropy. The algorithm iterates these two steps until the total block entropy is minimized. We tested the algorithm on several experimental data sets; the results show that the global approach significantly improves the accuracy of the inference. AVAILABILITY: Upon request.
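A minimal Python sketch of an entropy-scored dynamic-programming block partition follows, assuming phased haplotypes stored as equal-length 0/1 strings. The abstract does not give the exact objective. Because entropy is subadditive, simply minimizing the summed block entropy would always favor one giant block, so this sketch assumes a hypothetical cap `max_entropy` on each block's entropy and finds the fewest blocks under that cap, breaking ties by lower total entropy. It is not the authors' implementation and omits the missing-SNP inference step.

```python
import math
from collections import Counter

def block_entropy(haplotypes, start, end):
    """Shannon entropy (bits) of the distinct haplotype patterns over SNPs [start, end)."""
    n = len(haplotypes)
    counts = Counter(h[start:end] for h in haplotypes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def partition_blocks(haplotypes, max_entropy):
    """Fewest blocks with entropy <= max_entropy (single-SNP blocks always allowed),
    ties broken by lower total entropy. Returns half-open (start, end) intervals."""
    m = len(haplotypes[0])
    worst = (math.inf, math.inf)
    best = [worst] * (m + 1)  # best[j] = (number of blocks, total entropy) for SNPs [0, j)
    back = [0] * (m + 1)      # back[j] = start of the last block in the best partition of [0, j)
    best[0] = (0, 0.0)
    for j in range(1, m + 1):
        for i in range(j):
            h = block_entropy(haplotypes, i, j)
            if h > max_entropy and j - i > 1:
                continue  # block too diverse under the (hypothetical) cap
            candidate = (best[i][0] + 1, best[i][1] + h)
            if candidate < best[j]:
                best[j], back[j] = candidate, i
    # Walk the back-pointers to recover the blocks from right to left.
    blocks, j = [], m
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    return blocks[::-1]
```

For example, `partition_blocks(["00", "01", "10", "11"], 1.0)` splits the two SNPs into separate blocks, because the joint pattern has 2 bits of entropy.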

3.
Recent studies suggest that haplotypes are arranged into discrete blocklike structures throughout the human genome. Here, we present an alternative haplotype block definition that assumes no recombination within each block but allows for recombination between blocks, and we use it to study the combined effects of demographic history and various population genetic parameters on haplotype block characteristics. Through extensive coalescent simulations and analysis of published haplotype data on chromosome 21, we find that (1) the combined effects of population demographic history, recombination, and mutation dictate haplotype block characteristics and (2) haplotype blocks can arise in the absence of recombination hot spots. Finally, we provide practical guidelines for designing and interpreting studies investigating haplotype block structure.

4.
Haplotype reconstruction from genotype data using Imperfect Phylogeny   (Total citations: 13; self-citations: 0; citations by others: 13)
Modeling human variation is critical to understanding the genetic basis of complex diseases. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine each individual's haplotypes, that is, which nucleotide occurs at each of these common SNP positions on each chromosome. In this paper, we present results for a highly accurate method of haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes, which shows that SNPs are organized in highly correlated 'blocks'. In a few recent studies, considerable parts of the human genome were partitioned into blocks such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks, and for each block it predicts the common haplotypes and each individual's haplotypes. We evaluated our method on biological data. It predicts the common haplotypes perfectly and has a very low error rate (<2% over the data) when the predictions for the uncommon haplotypes are also taken into account. Our method is extremely efficient compared with previous methods such as PHASE and HAPLOTYPER; this efficiency allows it to find the block partition of the haplotypes, to cope with missing data and to work with large datasets. AVAILABILITY: The algorithm is available via a Web server at http://www.calit2.net/compbio/hap/
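The final per-block step, choosing which pair of common haplotypes explains an individual's genotype, can be illustrated with a simplified Python sketch. It assumes the common haplotypes of a block and their frequencies are already known (the hypothetical inputs `common` and `freq`), encodes genotypes as alt-allele counts per SNP (0, 1 or 2), and picks the most probable pair under Hardy-Weinberg proportions. It is not the imperfect-phylogeny method described above.

```python
from itertools import combinations_with_replacement

def genotype_of(h1, h2):
    """Alt-allele count (0, 1 or 2) at each SNP for two 0/1 haplotype strings."""
    return "".join(str(int(a) + int(b)) for a, b in zip(h1, h2))

def resolve(genotype, common, freq):
    """Most probable unordered pair of common haplotypes that explains `genotype`,
    or None if no pair does. `freq` maps each common haplotype to its frequency."""
    best, best_score = None, 0.0
    for h1, h2 in combinations_with_replacement(common, 2):
        if genotype_of(h1, h2) != genotype:
            continue
        # Hardy-Weinberg probability of the unordered pair
        score = freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
        if score > best_score:
            best, best_score = (h1, h2), score
    return best
```

Genotypes that no pair of common haplotypes explains return `None`; in a full method these would fall through to the handling of uncommon haplotypes.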

5.
The problem of inferring haplotype phase from a population of genotypes has received a lot of attention recently. This is partly due to the observation that there are many regions of human genomic DNA where genetic recombination is rare (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). A Haplotype Map project has been announced by NIH to identify and characterize populations in terms of these haplotypes. Recently, Gusfield introduced the perfect phylogeny haplotyping (PPH) problem as an algorithmic implication of the observation of no recombination in long blocks, together with the standard population-genetic assumption of infinite sites. Gusfield's solution, based on matroid theory, was followed by direct Θ(nm²) solutions that use simpler techniques (Bafna et al., 2003; Eskin et al., 2003) and also bound the number of solutions to the PPH problem. In this short note, we address two questions that were left open. First, can the algorithms of Bafna et al. (2003) and Eskin et al. (2003) be sped up to O(nm + m²) time, which would imply an O(nm) time bound for the PPH problem? Second, if there are multiple solutions, can we find one that is most parsimonious in terms of the number of distinct haplotypes? We give reductions that suggest that the answer to both questions is "no." For the first problem, we show that computing the output of the first step (in either method) is equivalent to Boolean matrix multiplication. Therefore, the best bound we can presently achieve is O(nm^(ω−1)), where ω ≤ 2.52 is the exponent of matrix multiplication. Thus, any linear-time solution to the PPH problem likely requires a different approach. For the second problem, computing a PPH solution that minimizes the number of distinct haplotypes, we show that the problem is NP-hard using a reduction from Vertex Cover (Garey and Johnson, 1979).

6.
Haplotyping as perfect phylogeny: a direct approach.   (Total citations: 4; self-citations: 0; citations by others: 4)
A full haplotype map of the human genome will prove extremely valuable, as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex, genetically influenced diseases. A haplotype map project has been announced by NIH. The biological key to that project is the surprising fact that some human genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). In this paper we explore the algorithmic implications of the no-recombination-in-long-blocks observation for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, motivates a model of haplotype evolution in which the haplotypes in a population are assumed to evolve along a coalescent, which, as a rooted tree, is a perfect phylogeny. We consider the following algorithmic problem, called the perfect phylogeny haplotyping problem (PPH), which was introduced by Gusfield (2002): given n genotypes, each of length m, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? The approach taken by Gusfield (2002) to solve this problem reduces it to established, deep results and algorithms from matroid and graph theory. Although that reduction is quite simple and the resulting algorithm nearly optimal in speed, taken as a whole that approach is quite involved and, in particular, challenging to program. Moreover, anyone wishing to fully establish the correctness of the entire algorithm by reading the existing literature would need to read several deep and difficult papers in graph and matroid theory.
However, as stated by Gusfield (2002), many simplifications are possible, and the list of "future work" in Gusfield (2002) began with the task of developing a simpler, more direct, yet still efficient algorithm. This paper accomplishes that goal for both the rooted and unrooted PPH problems. It establishes a simple, easy-to-program, O(nm²)-time algorithm that determines whether there is a PPH solution for the input genotypes and produces a linear-space data structure representing all of the solutions. The approach allows complete, self-contained proofs. In addition to being algorithmically simpler, the approach makes the representation of all solutions more intuitive than in Gusfield (2002), and it achieves another goal from that paper, namely, proving a nontrivial upper bound on the number of PPH solutions, which shows that this number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed.
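For binary haplotypes under the infinite-sites assumption, a classical result states that a set of haplotypes can be displayed on an unrooted perfect phylogeny exactly when no pair of sites shows all four gametes (00, 01, 10 and 11); adding a known ancestral haplotype to the set before the test gives the rooted version. The Python sketch below only checks that condition on already-phased 0/1 haplotype strings. It does not solve PPH, which starts from genotypes, and it is not the O(nm²) algorithm of this paper.

```python
from itertools import combinations

def four_gametes(haplotypes, i, j):
    """True if sites i and j show all four gametes 00, 01, 10 and 11."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4

def fits_perfect_phylogeny(haplotypes, root=None):
    """Four-gamete test on phased 0/1 haplotype strings. If an ancestral
    haplotype `root` is given, it is added to the set (rooted version)."""
    haps = list(haplotypes) + ([root] if root is not None else [])
    m = len(haps[0])
    return not any(four_gametes(haps, i, j) for i, j in combinations(range(m), 2))
```

The set `["01", "10", "11"]` passes the unrooted test but fails when rooted at `"00"`, which is why the rooted and unrooted PPH variants are treated separately.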

7.
HaploBlockFinder: haplotype block analyses   (Total citations: 8; self-citations: 0; citations by others: 8)
Recent studies have revealed discrete block-like structures of linkage disequilibrium (LD) in the human genome. We have developed a set of computer programs to analyze these block-like LD structures (haplotype blocks) from haplotype data. Three definitions of haplotype block are supported: minimal LD range, no historic recombination, and chromosome coverage. Tagged SNPs that uniquely distinguish the common haplotypes are identified. A greedy algorithm is used to improve efficiency. Two separate utilities are also provided to assist visual inspection of haplotype block structure and of the pattern of linkage disequilibrium. AVAILABILITY: A web interface for HaploBlockFinder is available at http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi; the source code is also freely available on the web site.
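Tag SNP selection can be sketched as a greedy set cover: repeatedly pick the SNP that separates the most still-indistinguishable pairs of haplotypes. The Python sketch below assumes the input has already been restricted to the common haplotypes of one block, given as 0/1 strings; HaploBlockFinder's actual greedy criterion may differ.

```python
def greedy_tag_snps(haplotypes):
    """Greedily choose SNP indices until every pair of distinct haplotypes
    differs at at least one chosen SNP."""
    haps = sorted(set(haplotypes))
    m = len(haps[0])
    # pairs of haplotypes that the chosen tags do not yet distinguish
    pairs = {(a, b) for idx, a in enumerate(haps) for b in haps[idx + 1:]}
    tags = []
    while pairs:
        # SNP separating the most unresolved pairs (first index wins ties)
        best = max(range(m), key=lambda s: sum(a[s] != b[s] for a, b in pairs))
        tags.append(best)
        pairs = {(a, b) for a, b in pairs if a[best] == b[best]}
    return tags
```

The loop always terminates: any remaining pair differs at some SNP that has not been chosen yet, so each round resolves at least one pair. Greedy set cover is not guaranteed to find the smallest tag set.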

8.
The major histocompatibility complex (MHC) consists of polymorphic frozen blocks (PFBs) that are linked to form megabase haplotypes. These blocks consist of polymorphic sequences and define regions where recombination appears to be inhibited. Using a highly polymorphic sequence centromeric of HLA-B (within the beta block), we have shown that PFBs are conserved and contain specific insertions/deletions and substitutions that are the same in individuals with the same MHC haplotype but differ between most, if not all, different haplotypes. A sequence comparison between ethnic-specific haplotypes shows that these sequences have remained stable and predate the formation of these haplotypes. To determine whether the same conserved block has been involved in the generation of multiple haplotypes, we compared the block typing profiles of different ethnic-specific haplotypes. Block typing profiles have previously been shown to be identical in individuals with the same MHC haplotype but generally to differ between different haplotypes. We found that some PFBs are common to more than one haplotype, implying a common ancestry; haplotypes have subsequently been generated by the shuffling and exchange of these PFBs. The regions between PFBs appear to contain the recombination sites and could therefore be expected to exhibit either low polymorphism or a localized "hotspot." Received: 20 January 1997 / Accepted: 11 March 1997

9.
Recently, genomic data have revealed a "block-like" structure of haplotype diversity on human chromosomes. This structure is anticipated to facilitate gene mapping studies, because strong associations among loci within a block may allow haplotype variation to be tagged with a limited number of markers. But its usefulness to mapping efforts depends on the consistency of the block structure within and among populations, which in turn depends on how the block structure arises. Recombination hot spots are generally thought to underlie the block structure, but haplotype blocks can also develop stochastically under random recombination, in which case the block structure will show limited consistency among populations. Using coalescent models, which we upscaled to simulate the evolution of haplotypes with many markers at fixed distances, we show that the relationship between block boundaries and historic recombination intensity may be surprisingly weak. The majority of historic recombinations do not leave a footprint in present-day linkage disequilibrium patterns, and the block structure is sensitive to factors that affect the timing of recombination relative to marker mutation events in the genealogy, such as marker frequency bias and historic population size changes. Our results give insight into the potential of stochastic events to affect haplotype block structure, which can limit the usefulness of the block structure to mapping studies.

10.
11.
Haplotype inference from phase-ambiguous multilocus genotype data is an important task for both disease-gene mapping and studies of human evolution. We report a novel haplotype-inference method based on a coalescence-guided hierarchical Bayes model. In this model, a hierarchical structure is imposed on the prior haplotype frequency distributions to capture the similarities among modern-day haplotypes attributable to their common ancestry. As a consequence, the model both allows distinct haplotypes to have different a priori probabilities according to the inferred hierarchical ancestral structure and results in a proper joint posterior distribution for all the parameters of interest. A Markov chain Monte Carlo (MCMC) scheme is designed to draw from this posterior distribution. Using coalescence-based simulation and empirically generated data sets (Whitehead Institute's inflammatory bowel disease data sets and HapMap data sets), we demonstrate the merits of the new method in comparison with HAPLOTYPER and PHASE, with or without the presence of recombination hotspots and missing genotypes.
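The coalescent-guided hierarchical Bayes model is too involved for a short example, but the underlying phase-inference problem can be illustrated with the classical expectation-maximization (EM) estimator of haplotype frequencies, a simpler, non-Bayesian technique swapped in here for illustration only. The Python sketch assumes unrelated individuals, Hardy-Weinberg equilibrium, no missing data, and genotypes encoded as strings of alt-allele counts (0, 1, 2). It enumerates every phasing (2^(k−1) pairs for k heterozygous sites), so it is only practical for a few heterozygous sites.

```python
from collections import defaultdict
from itertools import product

def phasings(genotype):
    """All unordered haplotype pairs consistent with an alt-allele-count genotype."""
    het = [i for i, g in enumerate(genotype) if g == "1"]
    base = ["1" if g == "2" else "0" for g in genotype]
    if not het:
        h = "".join(base)
        return [(h, h)]
    pairs = []
    # fixing the first heterozygous site to "0" on h1 avoids listing each pair twice
    for bits in product("01", repeat=len(het) - 1):
        h1, h2 = base[:], base[:]
        for site, b in zip(het, ("0",) + bits):
            h1[site] = b
            h2[site] = "1" if b == "0" else "0"
        pairs.append(("".join(h1), "".join(h2)))
    return pairs

def em_haplotype_freqs(genotypes, iters=100):
    options = [phasings(g) for g in genotypes]
    haps = {h for opts in options for pair in opts for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(iters):
        counts = defaultdict(float)
        for opts in options:
            # E-step: posterior weight of each phasing under Hardy-Weinberg
            w = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in opts]
            total = sum(w)
            for (a, b), wi in zip(opts, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts -> frequencies
        n = 2 * len(genotypes)
        freq = {h: counts[h] / n for h in haps}
    return freq
```

With genotypes `["00", "00", "22", "22", "11"]`, the double heterozygote is resolved as `00/11` rather than `01/10`, because only `00` and `11` are supported by the homozygous individuals.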

12.
Recombination is an important evolutionary mechanism responsible for creating the patterns of haplotype variation observable in human populations. Recently, there has been extensive research on understanding the fine-scale variation in recombination across the human genome using DNA polymorphism data. Historical recombination events leave signature patterns in haplotype data. A nonparametric approach for estimating the number of historical recombination events is to compute the minimum number of recombination events in the history of a set of haplotypes. In this paper, we provide new and improved methods for computing lower bounds on the minimum number of recombination events. These methods are shown to detect a higher number of recombination events for a haplotype dataset from a region in the lipoprotein lipase gene than previous lower bounds. We apply our methods to two datasets for which recombination hotspots have been experimentally determined and demonstrate a high density of detectable recombination events in the regions annotated as recombination hotspots. The programs implementing the methods in this paper are available at www.cs.ucsd.edu/users/vibansal/RecBounds/.
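For context, the classical baseline that such methods improve on is the Hudson-Kaplan bound R_m: every pair of sites showing all four gametes (00, 01, 10, 11) requires at least one recombination between them, and the maximum number of such intervals that do not overlap is a lower bound on the number of recombination events. The Python sketch assumes phased 0/1 haplotype strings with sites in physical order; it implements only this baseline, not the improved bounds of the paper.

```python
from itertools import combinations

def incompatible(haplotypes, i, j):
    """Four-gamete test: sites i and j cannot both be explained without recombination."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4

def hudson_kaplan_bound(haplotypes):
    m = len(haplotypes[0])
    intervals = [(i, j) for i, j in combinations(range(m), 2)
                 if incompatible(haplotypes, i, j)]
    # Greedy interval scheduling by right endpoint. Intervals that only share an
    # endpoint site count as disjoint, since each required recombination lies
    # strictly between the two sites of its pair.
    count, last_end = 0, -1
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count
```

Choosing intervals by earliest right endpoint is the standard greedy method for a maximum set of non-overlapping intervals.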

13.
The haplotype block structure of SNP variation in human DNA has been demonstrated by several recent studies. The presence of haplotype blocks can be used to dramatically increase the statistical power of genetic mapping. Several criteria have already been proposed for identifying these blocks, all of which require haplotypes as input. We propose a comprehensive statistical model of haplotype block variation and show how the parameters of this model can be learned from haplotypes and/or unphased genotype data. Using real-world SNP data, we demonstrate that our approach can be used to resolve genotypes into their constituent haplotypes with greater accuracy than previously known methods.

14.
Bayesian logistic regression using a perfect phylogeny   (Total citations: 1; self-citations: 0; citations by others: 1)
Haplotype data capture the genetic variation among individuals in a population and among populations. An understanding of this variation and of the ancestral history of haplotypes is important in genetic association studies of complex disease. We introduce a method for detecting associations between disease and haplotypes in a candidate gene region or candidate block with little or no recombination. A perfect phylogeny represents the evolutionary relationship between the single-nucleotide polymorphisms (SNPs) in the haplotype blocks. Our approach extends the logic regression technique of Ruczinski and others (2003) to a Bayesian framework and constrains the model space to that of a perfect phylogeny. Environmental factors, as well as their interactions with SNPs, may be incorporated into the regression framework. We demonstrate our method on simulated data from a coalescent model, as well as on data from a candidate gene study of sarcoidosis.

15.
We present a new stochastic model for genotype generation. The model offers a compromise between rigid block structure and no structure at all: it reflects a general blocky structure of haplotypes but also allows for "exchange" of haplotypes at nonboundary SNP sites; it also accommodates rare haplotypes and mutations. We use a hidden Markov model and infer its parameters with an expectation-maximization algorithm. The algorithm was implemented in a software package called HINT (haplotype inference tool) and tested on 58 genotype datasets. To evaluate the utility of the model in association studies, we used biological human data to create a simple disease association search scenario. Compared with three other models, HINT predicted association most accurately.
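The hidden Markov model idea can be illustrated with a generic forward algorithm for a "copying" HMM, in which each haplotype is modelled as a mosaic of a few founder haplotypes. This is a hypothetical simplification, not HINT's model: it scores a single phased haplotype rather than a genotype, the founders and the `switch` and `error` rates are fixed inputs rather than EM-estimated parameters, and sites are assumed biallelic (0/1).

```python
def forward_prob(haplotype, founders, switch=0.05, error=0.01):
    """P(haplotype) under a simple copying HMM whose hidden state is the founder
    being copied at each SNP (all strings are 0/1 and of equal length)."""
    k = len(founders)

    def emit(f, site):
        # copy the founder's allele with prob 1 - error, otherwise the other allele
        return 1 - error if founders[f][site] == haplotype[site] else error

    # uniform start over founders
    alpha = [emit(f, 0) / k for f in range(k)]
    for site in range(1, len(haplotype)):
        total = sum(alpha)
        # stay with prob 1 - switch, or jump to a uniformly chosen founder with prob switch
        alpha = [((1 - switch) * alpha[f] + switch * total / k) * emit(f, site)
                 for f in range(k)]
    return sum(alpha)
```

Because each site's emission probabilities sum to one over the two alleles, the probabilities of all 2^m haplotypes of length m sum to one, which is a convenient sanity check. For long sequences the products underflow, so a real implementation would rescale alpha at each site or work in log space.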

16.
Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site, and no recombination occurred at the given region. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results. Our webserver (http://www.cs.columbia.edu/compbio/hap/) is publicly available for predicting haplotypes from genotype data and partitioning genotype data into blocks.
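The abstract does not spell out its reduction to 2-SAT, so the Python sketch below shows only the generic 2-SAT step any such reduction relies on: build the implication graph (each clause a OR b gives the edges NOT a → b and NOT b → a), compute strongly connected components with Kosaraju's algorithm, and report unsatisfiability if a variable and its negation share a component. Encoding literals as +k / -k is a hypothetical convention, and the parent-child constraints themselves are not modelled.

```python
def two_sat(n_vars, clauses):
    """Solve 2-SAT. Each clause (a, b) means (a OR b); literal +k / -k is
    variable k (1..n_vars) true / false. Returns {k: bool} or None if unsatisfiable."""
    def node(lit):
        # variable k -> node 2(k-1); its negation -> node 2(k-1)+1
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    n = 2 * n_vars
    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for a, b in clauses:
        # (a OR b) is equivalent to (NOT a -> b) AND (NOT b -> a); node ^ 1 negates
        for u, v in ((node(a) ^ 1, node(b)), (node(b) ^ 1, node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju pass 1: iterative DFS recording vertices in order of finishing time.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, neighbours = stack[-1]
            for v in neighbours:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(graph[v])))
                    break
            else:
                stack.pop()
                order.append(u)

    # Pass 2: label components on the reversed graph in decreasing finishing time;
    # labels then follow a topological order of the component graph.
    comp, label = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s], stack = label, [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    assignment = {}
    for k in range(1, n_vars + 1):
        pos, neg = comp[node(k)], comp[node(-k)]
        if pos == neg:
            return None  # x_k and NOT x_k imply each other: contradiction
        assignment[k] = pos > neg  # true if x_k comes later in topological order
    return assignment
```

The whole check runs in time linear in the number of variables and clauses, which is what makes 2-SAT an attractive target for a reduction.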

17.
Shifting from the analysis of single nucleotide polymorphisms to the reconstruction of selected haplotypes greatly facilitates the interpretation of evolve and resequence (E&R) experiments. Merging highly correlated hitchhiker SNPs into haplotype blocks reduces thousands of candidates to a few selected regions. Current methods of haplotype reconstruction from Pool‐seq data need a variety of data‐specific parameters that are typically defined ad hoc and require haplotype sequences for validation. Here, we introduce haplovalidate, a tool that detects selected haplotypes in Pool‐seq time series data without the need for sequenced haplotypes. Haplovalidate makes data‐driven choices of two key parameters for the clustering procedure: the minimum correlation between SNPs constituting a cluster and the window size. Applied to simulated E&R data, haplovalidate reliably detects selected haplotype blocks with low false discovery rates. Importantly, our analyses identified a limitation of the haplotype block‐based approach for describing the genomic architecture of adaptation: we detected a substantial fraction of haplotypes containing multiple selection targets. These blocks were considered as one region of selection and therefore led to underestimation of the number of selection targets. We demonstrate that separate analysis of earlier time points can significantly improve the separation of selection targets into individual haplotype blocks. We conclude that the analysis of selected haplotype blocks has great potential for characterizing the adaptive architecture with E&R experiments.
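The two clustering parameters named above, a minimum correlation between SNPs in a cluster and a window size, can be illustrated with a simple single-linkage clustering of allele-frequency trajectories in Python. This is a hypothetical sketch, not haplovalidate's code: both parameters are fixed inputs here rather than chosen from the data, and `trajectories` is assumed to hold one allele-frequency time series per SNP.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def cluster_snps(positions, trajectories, min_cor, window):
    """Link SNPs within `window` bp whose trajectories correlate at >= min_cor;
    return the connected components as sorted lists of SNP indices."""
    parent = list(range(len(positions)))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    order = sorted(range(len(positions)), key=lambda i: positions[i])
    for a_idx, i in enumerate(order):
        for j in order[a_idx + 1:]:
            if positions[j] - positions[i] > window:
                break  # SNPs are sorted by position, so later ones are farther away
            if pearson(trajectories[i], trajectories[j]) >= min_cor:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(positions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The window keeps distant SNPs with coincidentally similar trajectories apart, while single linkage lets a cluster grow along a chromosome through chains of correlated neighbours.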

18.
High-throughput single nucleotide polymorphism (SNP) detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, we first apply four haplotype block identification methods (Confidence Intervals, Four Gamete Test, Solid Spine of LD, and a fused haplotype block method) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify candidate genes. Finally, we annotate gene function using the KEGG, Biocarta and GO databases. We find 159 haplotype blocks on chromosomes 1 to 22 that are most likely related to alcoholism, containing 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability by biological functional annotation. In summary, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can not only handle SNP data easily but also locate disease-related genes precisely. Supported by the National Natural Science Foundation of China (Grant Nos. 30570424, 60601010 and 30600367), the National High-Tech Research and Development Program of China (Grant No. 2007AA02Z329), the Key Science and Technology Program of Heilongjiang Province (Grant No. GB03C602-4), the Natural Science Foundation of Heilongjiang Province (Grant No. F2008-02), the Youth Science Foundation of Harbin Medical University (Grant No. 060045) and the Science Foundation of Heilongjiang Province Education Department (Grant Nos. 11531113 and 1152hq28).
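Both the Confidence Interval and the Solid Spine of LD block definitions are built on the pairwise LD statistic D'. A minimal Python sketch of D' (together with D and r²) for two biallelic SNPs follows. It assumes phased 0/1 haplotype strings; with unphased genotypes, as in this study, the two-locus haplotype frequencies would first have to be estimated.

```python
def ld_stats(haplotypes, i, j):
    """Return (D, D', r^2) for SNPs i and j from phased 0/1 haplotype strings."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n   # allele "1" frequency at SNP i
    p_b = sum(h[j] == "1" for h in haplotypes) / n   # allele "1" frequency at SNP j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

The returned D' is signed; block-finding tools usually work with |D'|. The haplotypes `["00", "00", "01", "11"]` give D' = 1 but r² = 1/3, showing why D'-based block definitions and r²-based tagging can disagree.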

19.
Meiotic recombination is not random in the proximal region of the mouse major histocompatibility complex (MHC); it is clustered at four restricted positions, so-called hotspots. Some MHC haplotypes derived from Asian wild mice enhance recombination at the hotspots in genetic crosses with standard MHC haplotypes of laboratory mouse strains. In particular, the wm7 haplotype derived from a Japanese wild mouse showed a recombination frequency of approximately 2% within a 1.2 kb DNA fragment in the interval between the Pb and Ob genes. Interestingly, this enhancement of recombination was observed only in female meiosis, not in male meiosis. Mating experiments demonstrated that the wm7 haplotype carries a genetic factor in the region proximal to the hotspot that instigates recombination. In addition, the wm7 haplotype has a genetic factor in the region distal to the hotspot that suppresses recombination. Molecular characterization of the two hotspots located in the Eb gene and the Pb-Ob interval showed that they share several molecular elements: the consensus sequence of the middle repetitive MT family, TCTG or CCTG tetramer repeats, and the solitary long terminal repeat (LTR) of a mouse retrovirus.

20.
Patterns of linkage disequilibrium in the MHC region on human chromosome 6p   (Total citations: 5; self-citations: 0; citations by others: 5)
Single nucleotide polymorphisms (SNPs) in the human genome are thought to be organised into blocks of high internal linkage disequilibrium (LD), separated by intermittent recombination hotspots. Since understanding haplotype structure is critical for an accurate assessment of inter-individual genetic differences, we investigated up to 968 SNPs from a 10-Mb region on chromosome 6p21, including the human major histocompatibility complex (MHC), in five different population samples (45–550 individuals). Regions of well-defined block structure were found to coexist alongside large areas lacking any clear structure; occasional long-range LD was observed in all five samples. The four white populations analysed were remarkably similar in the extent and spatial distribution of local LD. In US African Americans, the distribution of LD was similar to that in the white populations, but the observed haplotype diversity was higher. The existence of large regions without any clear block structure makes the systematic and thorough construction of SNP haplotype maps a crucial prerequisite for disease-association studies.
