Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome, with regions of high LD interspersed with regions of low LD. Such LD patterns make it possible to select a set of single nucleotide polymorphisms (SNPs), called tag SNPs, for genome-wide association studies. We have developed a suite of computer programs to analyze the block-like LD patterns and to select the corresponding tag SNPs. Compared to other programs for haplotype block partitioning and tag SNP selection, our program has several notable features. First, the dynamic programming algorithms implemented are guaranteed to find the block partition with the minimum number of tag SNPs for the given criteria of blocks and tag SNPs. Second, both haplotype data and genotype data from unrelated individuals and/or from general pedigrees can be analyzed. Third, several existing measures/criteria for haplotype block partitioning and tag SNP selection have been implemented in the program. Finally, the programs provide flexibility to include specific SNPs (e.g. non-synonymous SNPs) as tag SNPs. AVAILABILITY: The HapBlock program and its supplemental documents can be downloaded from the website http://www.cmb.usc.edu/~msms/HapBlock.
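The dynamic programming idea in this abstract can be illustrated with a small sketch. It is not the HapBlock implementation: the block criterion used here (at most `max_distinct` distinct haplotypes per block), the brute-force tag search, and the names `min_tags` and `partition_min_tags` are all assumptions made for illustration only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def min_tags(haps: Sequence[str], start: int, end: int) -> int:
    """Smallest number of SNP columns in [start, end) that tells apart
    every distinct haplotype seen in that window (brute force)."""
    window = {h[start:end] for h in haps}
    width = end - start
    for k in range(width + 1):
        for cols in combinations(range(width), k):
            patterns = {tuple(w[c] for c in cols) for w in window}
            if len(patterns) == len(window):
                return k
    return width


def partition_min_tags(haps: Sequence[str], max_distinct: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Partition SNPs 0..n-1 into consecutive blocks so that the total
    number of tag SNPs is minimal. A window counts as a block (assumed
    toy criterion) if it shows at most max_distinct distinct haplotypes."""
    n = len(haps[0])
    inf = float("inf")
    best = [0.0] + [inf] * n   # best[j]: minimal tag count for SNPs 0..j-1
    back = [0] * (n + 1)       # back[j]: start of the last block ending at j
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] == inf:
                continue
            if len({h[i:j] for h in haps}) > max_distinct:
                continue       # window [i, j) is not a valid block
            cost = best[i] + min_tags(haps, i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i
    if best[n] == inf:
        raise ValueError("no valid partition under this block criterion")
    blocks = []
    j = n
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    return int(best[n]), blocks[::-1]


if __name__ == "__main__":
    toy = ["0000", "0011", "1100", "1111"]
    print(partition_min_tags(toy, max_distinct=2))
```

On the toy haplotypes the sketch splits the four SNPs into two blocks of two SNPs each, with one tag SNP per block. The brute-force tag search is exponential in block width and only suitable for small examples.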

2.
In this paper, a new efficient algorithm is presented for haplotype block partitioning based on haplotype diversity. The algorithm treats block finding as an optimization problem whose main goal is to find the largest meaningful block that satisfies the diversity condition. The algorithm runs in time polynomial in the number of haplotypes and SNPs. We apply our algorithm to three chromosome 21 data sets from three different HapMap populations; the results show that our algorithm is efficient and performs better than three other well-known methods.

3.

Background

Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component of many analyses, including admixture mapping, identifying regions of identity by descent, and imputation. Haplotype phasing based on sequencing reads has attracted a lot of attention. Diploid haplotype phasing, where the two haplotypes are complementary, has been studied extensively. In this work, we focus on polyploid haplotype phasing, where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated, as the search space becomes much larger and the haplotypes no longer need to be complementary.

Results

We propose two algorithms: (1) Poly-Harsh, a Gibbs sampling based algorithm that alternately samples haplotypes and read assignments to minimize the mismatches between the reads and the phased haplotypes; and (2) an efficient algorithm to concatenate haplotype blocks into contiguous haplotypes.

Conclusions

Our experiments showed that our method improves the quality of the phased haplotypes over state-of-the-art methods. To our knowledge, our algorithm for haplotype block concatenation is the first that leverages information shared across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.

4.
HaploBlockFinder: haplotype block analyses   Total citations: 8 (self-citations: 0, citations by others: 8)
Recent studies have unveiled discrete block-like structures of linkage disequilibrium (LD) in the human genome. We have developed a set of computer programs to analyze the block-like LD structures (haplotype blocks) based on haplotype data. Three definitions of haplotype block are supported: minimal LD range, no historic recombination, and chromosome coverage. Tagged SNPs that uniquely distinguish common haplotypes are identified. A greedy algorithm is used to improve efficiency. Two separate utilities are also provided to assist visual inspection of haplotype block structure and patterns of linkage disequilibrium. AVAILABILITY: A web interface for HaploBlockFinder is available at http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi; the source code is also freely available on the web site.

5.

Background  

Finding the genetic causes of quantitative traits is a complex and difficult task. Classical methods for mapping quantitative trait loci (QTL) in mice use an F2 cross between two strains with substantially different phenotypes and an interval mapping method to compute confidence intervals at each position in the genome. This process requires significant resources for breeding and genotyping, and the data generated are usually applicable to only one phenotype of interest. Recently, we reported the application of a haplotype association mapping method that utilizes dense genotyping data across a diverse panel of inbred mouse strains and a marker association algorithm that is independent of any specific phenotype. As the available genotyping data grow in size and density, analysis of these haplotype association mapping methods should be of increasing value to the statistical genetics community.

6.
J Green  H C Low 《Biometrics》1984,40(2):341-348
Nonrandom inheritance of the two HLA haplotypes of Chromosome 6, available from each parent among siblings affected by certain diseases, has afforded evidence of HLA-linked disease-susceptibility genes. Two algebraically equivalent measures of HLA haplotype concordance (that is, the excessive sharing of certain haplotypes among affected siblings) are used for family studies designed to test whether or not there is significant evidence of the existence of an HLA-linked disease-susceptibility gene or for inferring the mode of inheritance when this is already believed to apply. The distributions of these measures are derived under the null hypothesis of random inheritance of HLA haplotypes, and there is a short discussion of the case in which inheritance of a diseased gene causes a change, from the purely random case, in the distribution of haplotype concordance among affected siblings.
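As a baseline for the null hypothesis mentioned in this abstract, the sketch below enumerates how many parental haplotypes a single pair of siblings shares when inheritance is random. It does not reproduce the paper's two concordance measures, which the abstract does not define; the function name `sharing_distribution` is a hypothetical choice for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sharing_distribution() -> dict:
    """Distribution of the number of parental haplotypes (0, 1 or 2)
    shared by two siblings under random inheritance."""
    # Each child receives one of two paternal and one of two maternal haplotypes.
    child_options = list(product((0, 1), (0, 1)))
    counts = Counter()
    for sib1, sib2 in product(child_options, repeat=2):
        shared = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


if __name__ == "__main__":
    for shared, prob in sharing_distribution().items():
        print(shared, prob)
```

The enumeration gives probabilities 1/4, 1/2 and 1/4 for sharing 0, 1 and 2 haplotypes; an excess of sharing among affected siblings relative to this baseline is what a concordance test looks for.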

7.
The haplotype block structure of SNP variation in human DNA has been demonstrated by several recent studies. The presence of haplotype blocks can be used to dramatically increase the statistical power of genetic mapping. Several criteria have already been proposed for identifying these blocks, all of which require haplotypes as input. We propose a comprehensive statistical model of haplotype block variation and show how the parameters of this model can be learned from haplotypes and/or unphased genotype data. Using real-world SNP data, we demonstrate that our approach can be used to resolve genotypes into their constituent haplotypes with greater accuracy than previously known methods.

8.
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created interest in inferring the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, optimal haplotype block partitioning is formulated as a statistical model selection problem; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference and hypothesis testing can be performed; and prior knowledge, if present, can be incorporated to perform Bayesian inference. The algorithm is linear in the number of loci, rather than NP-hard as is the case for many such algorithms. We illustrate the applications of our method to both simulated and real data sets.

9.
Greenspan G  Geiger D 《Genetics》2006,172(4):2583-2599
Models of background variation in genomic regions form the basis of linkage disequilibrium mapping methods. In this work we analyze a background model that groups SNPs into haplotype blocks and represents the dependencies between blocks by a Markov chain. We develop an error measure to compare the performance of this model against the common model that assumes that blocks are independent. By examining data from the International Haplotype Mapping project, we show how the Markov model over haplotype blocks is most accurate when representing blocks in strong linkage disequilibrium. This contrasts with the independent model, which is rendered less accurate by linkage disequilibrium. We provide a theoretical explanation for this surprising property of the Markov model and relate its behavior to allele diversity.
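The contrast between the two background models can be made concrete with a toy likelihood comparison. This is only a sketch: it uses the plain maximum-likelihood log-likelihood on the training haplotypes, not the error measure developed in the paper, and the function names and block boundaries are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]


def block_alleles(hap: str, bounds: Bounds) -> List[str]:
    """Cut one haplotype string into its per-block alleles."""
    return [hap[a:b] for a, b in bounds]


def loglik_independent(haps: Sequence[str], bounds: Bounds) -> float:
    """Log-likelihood when every block is drawn independently."""
    rows = [block_alleles(h, bounds) for h in haps]
    n = len(rows)
    total = 0.0
    for k in range(len(bounds)):
        counts = Counter(r[k] for r in rows)
        total += sum(c * math.log(c / n) for c in counts.values())
    return total


def loglik_markov(haps: Sequence[str], bounds: Bounds) -> float:
    """Log-likelihood when each block depends on the previous block."""
    rows = [block_alleles(h, bounds) for h in haps]
    n = len(rows)
    first = Counter(r[0] for r in rows)
    total = sum(c * math.log(c / n) for c in first.values())
    for k in range(1, len(bounds)):
        pairs = Counter((r[k - 1], r[k]) for r in rows)
        prev = Counter(r[k - 1] for r in rows)
        total += sum(c * math.log(c / prev[a]) for (a, _), c in pairs.items())
    return total


if __name__ == "__main__":
    haps = ["0011", "0011", "1100", "1100"]   # two blocks in complete LD
    bounds = [(0, 2), (2, 4)]
    print(loglik_independent(haps, bounds), loglik_markov(haps, bounds))
```

In the toy data the second block is fully determined by the first, so the Markov model assigns a higher likelihood than the independent model. Because the independent model is a special case of the Markov model, this ordering always holds on the data used for fitting; a held-out comparison would need smoothing for unseen alleles.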

10.
In this report, we examine the validity of the haplotype block concept by comparing block decompositions derived from public data sets by variants of several leading methods of block detection. We first develop a statistical method for assessing the concordance of two block decompositions. We then assess the robustness of inferred haplotype blocks to the specific detection method chosen, to arbitrary choices made in the block-detection algorithms, and to the sample analyzed. Although the block decompositions show levels of concordance that are very unlikely by chance, the absolute magnitude of the concordance may be low enough to limit the utility of the inference. For purposes of SNP selection, it seems likely that methods that do not arbitrarily impose block boundaries among correlated SNPs might perform better than block-based methods.

11.
12.
Genealogies estimated from haplotypic genetic data play a prominent role in various biological disciplines in general and in phylogenetics, population genetics and phylogeography in particular. Several software packages have specifically been developed for the purpose of reconstructing genealogies from closely related, and hence, highly similar haplotype sequence data. Here, we use simulated data sets to test the performance of traditional phylogenetic algorithms, neighbour-joining, maximum parsimony and maximum likelihood in estimating genealogies from nonrecombining haplotypic genetic data. We demonstrate that these methods are suitable for constructing genealogies from sets of closely related DNA sequences with or without migration. As genealogies based on phylogenetic reconstructions are fully resolved, but not necessarily bifurcating, and without reticulations, these approaches outperform widespread 'network' constructing methods. In our simulations of coalescent scenarios involving panmictic, symmetric and asymmetric migration, we found that phylogenetic reconstruction methods performed well, while the statistical parsimony approach as implemented in TCS performed poorly. Overall, parsimony as implemented in the PHYLIP package performed slightly better than other methods. We further point out that we are not making the case that widespread 'network' constructing methods are bad, but that traditional phylogenetic tree finding methods are applicable to haplotypic data and exhibit reasonable performance with respect to accuracy and robustness. We also discuss some of the problems of converting a tree to a haplotype genealogy, in particular that it is nonunique.

13.
Several recent studies have suggested that linkage disequilibrium (LD) in the human genome has a fundamentally "blocklike" structure. However, thus far there has been little formal assessment of how well the haplotype block model captures the underlying structure of LD. Here we propose quantitative criteria for assessing how blocklike LD is and apply these criteria to both real and simulated data. Analyses of several large data sets indicate that real data show a partial fit to the haplotype block model; some regions conform quite well, whereas others do not. Some improvement could be obtained by genotyping higher marker densities but not by increasing the number of samples. Nonetheless, although the real data are only moderately blocklike, our simulations indicate that, under a model of uniform recombination, the structure of LD would actually fit the block model much less well. Simulations of a model in which much of the recombination occurs in narrow hotspots provide a much better fit to the observed patterns of LD, suggesting that there is extensive fine-scale variation in recombination rates across the human genome.

14.
Linkage disequilibrium decay and haplotype block structure in the pig   Total citations: 3 (self-citations: 0, citations by others: 3)
Linkage disequilibrium (LD) may reveal much about domestication and breed history. An investigation was conducted to analyze the extent of LD, haploblock partitioning, and haplotype diversity within haploblocks across several pig breeds from China and Europe and in European wild boar. In total, 371 single-nucleotide polymorphisms located in three genomic regions were genotyped. The extent of LD differed significantly between European and Chinese breeds, extending up to 2 cM in Europe and up to 0.05 cM in China. In European breeds, LD extended over large haploblocks of up to 400 kb, whereas in Chinese breeds the extent of LD was smaller and generally did not exceed 10 kb. The European wild boar showed an intermediate level of LD between Chinese and European breeds. In Europe, the extent of LD also differed according to genomic region. Chinese breeds showed a higher level of haplotype diversity and shared high levels of frequent haplotypes with Large White, Landrace, and Duroc. The extent of LD differs between the two centers of pig domestication, being higher in Europe. Two hypotheses can explain these findings. First, the European ancestral stock had a higher level of LD. Second, modern breeding programs increased the extent of LD in Europe and caused differences in LD between genomic regions. Large White, Landrace, and Duroc showed evidence of past introgression from Chinese breeds.
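For readers unfamiliar with how the extent of LD between two SNPs is quantified, the following sketch computes the standard pairwise measures D, |D'| and r² from two-locus haplotype counts. The abstract does not say which measure or software the study used, so the function name `ld_measures` and the input layout are assumptions for illustration.

```python
from typing import Tuple


def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at locus 2
    d = p_AB - p_A * p_B             # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_measures(50, 0, 0, 50))    # complete LD
    print(ld_measures(25, 25, 25, 25))  # linkage equilibrium
```

Plotting such a measure against the genetic or physical distance between SNP pairs is one common way to describe how far LD extends in a population.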

15.
16.

Background

New sequencing technologies make it possible to scan very long and dense genetic sequences, yielding datasets of genetic markers that are an order of magnitude larger than previously available. Such genetic sequences are characterized by common alleles interspersed with multiple rarer alleles. This situation has renewed interest in the identification of haplotypes carrying rare risk alleles. However, large-scale explorations of the linkage disequilibrium (LD) pattern to identify haplotype blocks are not easy to perform, because traditional algorithms have at least Θ(n²) time and memory complexity.

Results

We derived three incremental optimizations of the widely used haplotype block recognition algorithm proposed by Gabriel et al. in 2002. Our most efficient solution, called MIG++, has only Θ(n) memory complexity and, on a genome-wide scale, omits >80% of the calculations, which makes it an order of magnitude faster than the original algorithm. Unlike existing software, MIG++ analyzes the LD between SNPs at any distance, avoiding restrictions on the maximal block length. The haplotype block partition of the entire HapMap II CEPH dataset was obtained in 457 hours. By replacing the standard likelihood-based D variance estimator with an approximated estimator, the runtime was further improved. While producing a coarser partition, the approximate method made it possible to obtain the full-genome haplotype block partition of the entire 1000 Genomes Project CEPH dataset in 44 hours, with no restrictions on allele frequency or long-range correlations. These experiments showed that LD-based haplotype blocks can span more than one million base pairs in both the HapMap II and 1000 Genomes datasets. An application to the North American Rheumatoid Arthritis Consortium (NARAC) dataset shows how MIG++ can support genome-wide haplotype association studies.

Conclusions

MIG++ makes it possible to perform LD-based haplotype block recognition on genetic sequences of any length and density. In the new-generation sequencing era, this can help identify haplotypes that carry rare variants of interest. The low computational requirements open the possibility of including the haplotype block structure in genome-wide association scans, downstream analyses, and visual interfaces for online genome browsers.

17.

Background  

There has recently been great interest in haplotype block structure and haplotype tagging SNPs (htSNPs) in the human genome because of their implications for htSNP-based association mapping strategies for complex disease. Different definitions have been used to characterize the haplotype block structure in the human genome, and several different performance criteria and algorithms have been suggested for htSNP selection.

18.
Recently, the first rigorous runtime analyses of ACO algorithms appeared, covering variants of the MAX–MIN ant system and their runtime on pseudo-Boolean functions. Interestingly, a variant called 1-ANT is very sensitive to the evaporation factor while Gutjahr and Sebastiani proved partly opposite results for their variant MMASbs. These algorithms differ in their pheromone update mechanisms and, moreover, 1-ANT accepts equally fit solutions in contrast to MMASbs. By analyzing variants of MMASbs, we prove that the different behavior of 1-ANT and MMASbs results from the different pheromone update mechanisms. Building upon results by Gutjahr and Sebastiani, we extend their analyses of MMASbs to the class of unimodal functions and show improved results for test functions using new and specialized techniques; in particular, we present new lower bounds. Finally, we compare MMASbs with a variant that also accepts equally fit solutions as this enables the exploration of plateaus. For well-known plateau functions we prove that this drastically reduces the optimization time. Our findings are complemented by experiments that support our asymptotic analyses and yield additional insights. A conference version appeared in SLS 2007 (Neumann et al. 2007).

19.
20.