20 similar documents found
1.
van Iersel L, Keijsper J, Kelk S, Stougie L. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008, 5(2): 301-312
The problem Parsimony Haplotyping (PH) asks for the smallest set of haplotypes which can explain a given set of genotypes, and the problem Minimum Perfect Phylogeny Haplotyping (MPPH) asks for the smallest such set which also allows the haplotypes to be embedded in a perfect phylogeny, an evolutionary tree with biologically-motivated restrictions. For PH, we extend recent work by further mapping the interface between 'easy' and 'hard' instances, within the framework of (k,l)-bounded instances where the number of 2's per column and row of the input matrix is restricted. By exploring, in the same way, the tractability frontier of MPPH we provide the first concrete, positive results for this problem. In addition, we construct for both PH and MPPH polynomial time approximation algorithms, based on properties of the columns of the input matrix.
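The PH problem is easiest to see in the usual 0/1/2 genotype encoding used above (0/1 = homozygous site, 2 = heterozygous site). The following is a minimal sketch, not the authors' algorithm: it checks whether a pair of haplotypes explains a genotype and finds a smallest explaining set by brute force; the function names and the tiny instance are illustrative.

```python
from itertools import combinations, product

def explains(g, h1, h2):
    """True if the haplotype pair (h1, h2) explains genotype g:
    at homozygous sites (0/1) both haplotypes equal g, and at
    heterozygous sites (2) the pair carries one 0 and one 1."""
    return all((gi in (0, 1) and a == gi == b) or
               (gi == 2 and {a, b} == {0, 1})
               for gi, a, b in zip(g, h1, h2))

def parsimony_haplotyping(genotypes):
    """Brute-force PH: a smallest haplotype set explaining all
    genotypes. Exponential; only for tiny illustrative instances."""
    m = len(genotypes[0])
    universe = list(product((0, 1), repeat=m))
    for k in range(1, len(universe) + 1):
        for subset in combinations(universe, k):
            if all(any(explains(g, h1, h2)
                       for h1 in subset for h2 in subset)
                   for g in genotypes):
                return subset
    return None

# A smallest explaining set for these three genotypes has three haplotypes.
gs = [(2, 0, 1), (0, 0, 2), (2, 0, 2)]
print(parsimony_haplotyping(gs))
```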
2.
Although the haplotype data can be used to analyze the function of DNA, due to the significant effort required to collect haplotype data, usually genotype data is collected and the population haplotype inference (PHI) problem is then solved to infer haplotype data from genotype data for a population. This paper investigates the PHI problem based on the pure parsimony criterion (HIPP), which seeks the minimum number of distinct haplotypes needed to explain a given set of genotypes. We analyze the mathematical structure and properties of the HIPP problem, propose techniques to reduce the given genotype data into an equivalent instance of much smaller size, and analyze the relations among genotypes using a compatible graph. Based on the mathematical properties of the compatible graph, we propose a maximal clique heuristic to obtain an upper bound, and a new polynomial-sized integer linear programming formulation to obtain a lower bound for the HIPP problem.
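For orientation, the pure-parsimony objective can be written as an integer program. The sketch below is the commonly cited exponential-size formulation (one variable per candidate haplotype and per resolving pair); it is not the polynomial-sized formulation or the clique-based bounds proposed in this paper, and the notation is illustrative.

```latex
% Exponential-size pure-parsimony ILP (illustrative sketch).
% P_i = set of haplotype pairs (h,h') that resolve genotype g_i,
% x_h = 1 iff haplotype h is used, y_{i,(h,h')} = 1 iff g_i uses pair (h,h').
\begin{align*}
\min\ & \sum_{h \in H} x_h \\
\text{s.t.}\ & \sum_{(h,h') \in P_i} y_{i,(h,h')} = 1 && \text{for each genotype } g_i,\\
 & y_{i,(h,h')} \le x_h,\qquad y_{i,(h,h')} \le x_{h'} && \text{for each } i,\ (h,h') \in P_i,\\
 & x_h,\ y_{i,(h,h')} \in \{0,1\}.
\end{align*}
```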
3.
Maximum likelihood haplotyping for general pedigrees
Haplotype data is valuable in mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. The use of Bayesian networks enables efficient maximum likelihood haplotyping for more complex pedigrees than was previously possible. Furthermore, to support efficient haplotyping for larger pedigrees, we have also incorporated a novel algorithm for determining a better elimination order for the variables of the Bayesian network. The presented optimization algorithm also improves likelihood computations. We present experimental results for the new algorithms on a variety of real and semiartificial data sets, and use our software to evaluate MCMC approximations for haplotyping.
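The elimination order strongly affects the cost of exact inference in a Bayesian network. The optimization algorithm used in SUPERLINK is not reproduced here; the sketch below shows a standard greedy min-fill heuristic on an undirected (moralized) graph, with illustrative names, only to convey what "finding a better elimination order" involves.

```python
def min_fill_order(adj):
    """Greedy min-fill elimination order for an undirected graph.
    adj: dict mapping vertex -> set of neighbours.  At each step the
    vertex whose elimination adds the fewest fill edges is removed
    and its remaining neighbours are connected into a clique."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    order = []
    while adj:
        def fill_cost(v):
            nbrs = list(adj[v])
            return sum(1 for i in range(len(nbrs))
                         for j in range(i + 1, len(nbrs))
                         if nbrs[j] not in adj[nbrs[i]])
        v = min(adj, key=fill_cost)
        nbrs = list(adj[v])
        for i in range(len(nbrs)):              # add fill edges
            for j in range(i + 1, len(nbrs)):
                adj[nbrs[i]].add(nbrs[j])
                adj[nbrs[j]].add(nbrs[i])
        for u in nbrs:                          # remove the vertex
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order

# Tiny example: a 4-cycle a-b-c-d-a.
print(min_fill_order({'a': {'b', 'd'}, 'b': {'a', 'c'},
                      'c': {'b', 'd'}, 'd': {'a', 'c'}}))
```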
4.
5.
6.
Many methods exist for genotyping—revealing which alleles an individual carries at different genetic loci. A harder problem is haplotyping—determining which alleles lie on each of the two homologous chromosomes in a diploid individual. Conventional approaches to haplotyping require the use of several generations to reconstruct haplotypes within a pedigree, or use statistical methods to estimate the prevalence of different haplotypes in a population. Several molecular haplotyping methods have been proposed, but have been limited to small numbers of loci, usually over short distances. Here we demonstrate a method which allows rapid molecular haplotyping of many loci over long distances. The method requires no more genotypings than pedigree methods, but requires no family material. It relies on a procedure to identify and genotype single DNA molecules, and reconstruction of long haplotypes by a ‘tiling’ approach. We demonstrate this by resolving haplotypes in two regions of the human genome, harbouring 20 and 105 single-nucleotide polymorphisms, respectively. The method can be extended to reconstruct haplotypes of arbitrary complexity and length, and can make use of a variety of genotyping platforms. We also argue that this method is applicable in situations which are intractable to conventional approaches.
7.
Haplotypes contain genealogical information and play a prominent part in population genetic and evolutionary studies. However, haplotype inference is a complex statistical problem, showing considerable internal algorithm variability and among-algorithm discordance. Thus, haplotypes inferred by statistical algorithms often contain hidden uncertainties, which may complicate and even mislead downstream analysis. A consensus strategy is an effective means of increasing the confidence of inferred haplotypes. Here, we present a consensus tool, the CVhaplot package, to automate consensus techniques for haplotype inference. It generates consensus haplotypes from the inferences of competing algorithms to increase the confidence of haplotype inference results, while improving the performance of individual algorithms by considering their internal variability. It can effectively identify uncertain haplotypes potentially associated with inference errors. In addition, this tool allows file format conversion for several popular algorithms and extends the applicability of some algorithms to complex data containing triallelic polymorphic sites. CVhaplot is written in PERL and freely available at http://www.ioz.ac.cn/department/agripest/group/zhangdx/CVhaplot.htm.
8.
Inferring haplotype data from genotype data is a crucial step in linking SNPs to human diseases. Given n genotypes over m SNP sites, the haplotype inference (HI) problem deals with finding a set of haplotypes so that each given genotype can be formed by combining a pair of haplotypes from the set. The perfect phylogeny haplotyping (PPH) problem is one of the many computational approaches to the HI problem. Though it was conjectured that the complexity of the PPH problem was O(nm), the complexity of all the solutions presented until recently was O(nm^2). In this paper, we make complete use of the column ordering that was presented earlier and show that there must be some interdependencies among the pairwise relationships between SNP sites in order for the given genotypes to allow a perfect phylogeny. Based on these interdependencies, we introduce the FlexTree (flexible tree) data structure that represents all the pairwise relationships in O(m) space. The FlexTree data structure provides a compact representation of all the perfect phylogenies for the given set of genotypes. We also introduce an ordering of the genotypes that allows the genotypes to be added to the FlexTree sequentially. The column ordering, the FlexTree data structure, and the row ordering we introduce make the O(nm) OPPH algorithm possible. We present results on simulated data which demonstrate that the OPPH algorithm performs quite impressively when compared to the previous algorithms. The OPPH algorithm is one of the first O(nm) algorithms presented for the PPH problem.
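For binary haplotype matrices (as opposed to the genotype matrices handled by PPH/OPPH algorithms), the existence of a perfect phylogeny is characterized by the classical four-gamete condition: no pair of columns may contain all four combinations 00, 01, 10, 11. A minimal sketch of that check follows; it is not the OPPH algorithm or the FlexTree structure, only the underlying pairwise condition.

```python
from itertools import combinations

def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test: a binary haplotype matrix admits a perfect
    phylogeny iff no pair of sites exhibits all four gametes
    00, 01, 10 and 11."""
    m = len(haplotypes[0])
    for i, j in combinations(range(m), 2):
        gametes = {(row[i], row[j]) for row in haplotypes}
        if len(gametes) == 4:
            return False
    return True

print(admits_perfect_phylogeny([(0, 0, 1), (0, 1, 1), (1, 0, 0)]))  # True
print(admits_perfect_phylogeny([(0, 0), (0, 1), (1, 0), (1, 1)]))   # False
```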
9.
10.
This article presents a six-rule algorithm for the reconstruction of multiple minimum-recombinant haplotype configurations in pedigrees. The algorithm has three major features: First, it allows exhaustive search of all possible haplotype configurations under the criterion that there are minimum recombinants between markers. Second, its computational requirement is on the order of O(J^2 L^3) in the current implementation, where J is the family size and L is the number of marker loci under analysis. Third, it applies to various pedigree structures, with and without consanguinity relationships, and allows missing alleles to be imputed, during the haplotyping process, from their identical-by-descent copies. Haplotyping examples are provided using both published and simulated data sets.
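A core subroutine in minimum-recombinant haplotyping is counting how many crossovers are needed for a child haplotype to derive from a parent's two haplotypes. The sketch below counts the minimum number of switches of parental origin along the markers; it illustrates the criterion only, not the six-rule algorithm itself, ignores missing data, and returns infinity if a child allele matches neither parental allele.

```python
def min_recombinants(child_hap, parent_hap1, parent_hap2):
    """Minimum number of crossovers needed to obtain child_hap from
    parent_hap1/parent_hap2, i.e. the minimum number of switches of
    parental origin along the markers (DP over two origin states)."""
    INF = float('inf')
    cost = {1: 0, 2: 0}                      # recombinants so far, by origin
    for c, a1, a2 in zip(child_hap, parent_hap1, parent_hap2):
        nxt = {
            1: min(cost[1], cost[2] + 1) if c == a1 else INF,
            2: min(cost[2], cost[1] + 1) if c == a2 else INF,
        }
        cost = nxt
    return min(cost.values())

# Child alleles 1,1,0,0 from parental haplotypes (1,1,1,1) and (0,0,0,0):
# one switch of origin is required.
print(min_recombinants((1, 1, 0, 0), (1, 1, 1, 1), (0, 0, 0, 0)))  # 1
```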
11.
12.
13.
Schulmeister S. Systematic Biology, 2004, 53(4): 521-528
Felsenstein (1978, Syst. Zool. 27:401-410) showed that the method of maximum parsimony can be inconsistent, i.e., lead to an incorrect result with an infinite amount of data. The situation in which this inconsistency occurs is often called the "Felsenstein zone," and the phenomenon is also known as "long-branch attraction." Felsenstein derived a sufficient inconsistency condition from a model for four taxa with only two different parameters for the probability of change on the five branches connecting the four taxa. In the present paper, his approach is used to derive the inconsistency condition of maximum parsimony from the most general model for four taxa, i.e., with five different parameters for the probabilities of change on the five branches and, for the first time, for characters with k states (k = 2, 3, 4, 5, 6, ...). This is used to determine the factors that can cause the inconsistency of maximum parsimony. It is shown that the probability of change on all five branches and the number of character states play a role in causing inconsistency.
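For context, a textbook statement of Felsenstein's original two-state, two-parameter condition is reproduced below (P is the probability of change on the two long branches, Q on the other three); the paper generalizes this setting to five branch parameters and k character states. This statement is given for orientation and is not taken from the paper itself.

```latex
% Felsenstein (1978), binary characters, two-parameter four-taxon model:
% maximum parsimony is statistically inconsistent whenever
P^{2} > Q\,(1 - Q).
```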
14.
Two grand challenges in the postgenomic era are to develop a detailed understanding of heritable variation in the human genome, and to develop robust strategies for identifying the genetic contribution to diseases and drug responses. Haplotypes of single nucleotide polymorphisms (SNPs) have been suggested as an effective representation of human variation, and various haplotype-based association mapping methods for complex traits have been proposed in the literature. However, humans are diploid and, in practice, genotype data instead of haplotype data are collected directly. Therefore, efficient and accurate computational methods for haplotype reconstruction are needed and have recently been investigated intensively, especially for tightly linked markers such as SNPs. This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples.
15.
Olivier Delaneau, Cédric Coulonges, Pierre-Yves Boelle, George Nelson, Jean-Louis Spadoni, Jean-François Zagury. BMC Bioinformatics, 2007, 8(1): 205
Background
We have developed a new haplotyping program based on the combination of an iterative multiallelic EM algorithm (IEM), bootstrap resampling and a pseudo Gibbs sampler. The use of the IEM-bootstrap procedure considerably reduces the space of possible haplotype configurations to be explored, greatly reducing computation time, while the adaptation of the Gibbs sampler with a recombination model on this restricted space maintains high accuracy. On large SNP datasets (>30 SNPs), we used a segmented approach based on a specific partition-ligation strategy. We compared this software, Ishape (Iterative Segmented HAPlotyping by Em), with reference programs such as Phase, Fastphase, and PL-EM. As with Phase, there are two versions of Ishape: Ishape1, which uses a simple coalescence model for the pseudo Gibbs sampler step, and Ishape2, which uses a recombination model instead.
16.
Single-molecule dilution and multiple displacement amplification for molecular haplotyping
Separate haploid analysis is frequently required for heterozygous genotyping to resolve phase ambiguity or confirm allelic sequence. We demonstrate a technique of single-molecule dilution followed by multiple strand displacement amplification to haplotype polymorphic alleles. Dilution of DNA to haploid equivalency, or a single molecule, is a simple method for separating di-allelic DNA. Strand displacement amplification is a robust method for non-specific DNA expansion that employs random hexamers and phage polymerase Phi29 for double-stranded DNA displacement and primer extension, resulting in high processivity and exceptional product length. Single-molecule dilution was followed by strand displacement amplification to expand separated alleles to microgram quantities of DNA for more efficient haplotype analysis of heterozygous genes.
17.
Conventional experimental methods of studying the human genome are limited by the inability to independently study the combination of alleles, or haplotype, on each of the homologous copies of the chromosomes. We developed a microfluidic device capable of separating and amplifying homologous copies of each chromosome from a single human metaphase cell. Single-nucleotide polymorphism (SNP) array analysis of amplified DNA enabled us to achieve completely deterministic, whole-genome, personal haplotypes of four individuals, including a HapMap trio with European ancestry (CEU) and an unrelated European individual. The phases of alleles were determined at ~99.8% accuracy for up to ~96% of all assayed SNPs. We demonstrate several practical applications, including direct observation of recombination events in a family trio, deterministic phasing of deletions in individuals and direct measurement of the human leukocyte antigen haplotypes of an individual. Our approach has potential applications in personal genomics, single-cell genomics and statistical genetics.
18.
Christine Lo, Rui Liu, Jehyuk Lee, Kimberly Robasky, Susan Byrne, Carolina Lucchesi, John Aach, George Church, Vineet Bafna, Kun Zhang. Genome Biology, 2013, 14(9): R100
Background
Haplotypes are important for assessing genealogy and disease susceptibility of individual genomes, but are difficult to obtain with routine sequencing approaches. Experimental haplotype reconstruction based on assembling fragments of individual chromosomes is promising, but with variable yields due to incompletely understood parameter choices.
Results
We parameterize the clone-based haplotyping problem in order to provide theoretical and empirical assessments of the impact of different parameters on haplotype assembly. We confirm the intuition that long clones help link together heterozygous variants and thus improve haplotype length. Furthermore, given the length of the clones, we address how to choose the other parameters, including the number of pools, clone coverage and sequencing coverage, so as to maximize haplotype length. We model the problem theoretically and show empirically the benefits of using larger clones with a moderate number of pools and moderate sequencing coverage. In particular, using 140 kb BAC clones, we construct haplotypes for a personal genome and assemble haplotypes with N50 values greater than 2.6 Mb. These assembled haplotypes are longer and at least as accurate as haplotypes of existing clone-based strategies, whether in vivo or in vitro.
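The N50 used above to summarize haplotype length is the standard assembly metric: the block length at which blocks of that length or longer cover at least half of the total assembled length. A minimal, illustrative sketch of the computation follows.

```python
def n50(block_lengths):
    """N50 of a set of haplotype-block lengths: the largest length L
    such that blocks of length >= L together cover at least half of
    the total assembled length."""
    total = sum(block_lengths)
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered * 2 >= total:
            return length
    return 0

# Blocks of 5, 4, 3, 2 and 1 Mb: total 15 Mb, half is 7.5 Mb,
# which is reached once the 4 Mb block is included, so N50 = 4.
print(n50([5, 4, 3, 2, 1]))  # 4
```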
Conclusions
Our results provide practical guidelines for the development and design of clone-based methods to achieve long-range, high-resolution and accurate haplotypes.
19.
Simmons and Freudenstein have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by the method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. During this discussion we introduce a number of recent developments of gene tree parsimony methods overlooked by Simmons and Freudenstein. Finally, we present a re-analysis of the data that produces a more reasonable phylogeny than that found by Simmons and Freudenstein, suggesting that gene tree parsimony outperforms uninode coding, at least on these data.
20.
Haplotype inference by maximum parsimony
MOTIVATION: Haplotypes have been attracting increasing attention because of their importance in the analysis of many fine-scale molecular-genetics data. Since direct sequencing of haplotypes via experimental methods is both time-consuming and expensive, haplotype inference methods that infer haplotypes from genotype samples are attractive alternatives. RESULTS: (1) We design and implement an algorithm for an important computational model of haplotype inference that has been suggested before in several places. The model finds a minimum-size set of haplotypes that explains the genotype samples. (2) Strong support for this computational model is given based on computational results on both real and simulated data. (3) We also performed a comparative study to show the strengths and weaknesses of this computational model using our program. AVAILABILITY: The software HAPAR is free for non-commercial use. Available upon request (lwang@cs.cityu.edu.hk).
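Under the 0/1/2 encoding, a genotype with k heterozygous sites is compatible with 2^(k-1) distinct unordered haplotype pairs, which is the search space a pure-parsimony method such as HAPAR must navigate. The enumeration sketch below is illustrative only and is not taken from HAPAR.

```python
from itertools import product

def compatible_pairs(genotype):
    """Yield every unordered haplotype pair (h1, h2) that explains the
    genotype under the 0/1/2 encoding (2 = heterozygous site).  A
    genotype with k >= 1 heterozygous sites yields 2**(k-1) pairs."""
    het = [i for i, g in enumerate(genotype) if g == 2]
    seen = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pair = tuple(sorted((tuple(h1), tuple(h2))))
        if pair not in seen:
            seen.add(pair)
            yield pair

# Two heterozygous sites -> 2**(2-1) = 2 unordered pairs.
for pair in compatible_pairs((2, 0, 2)):
    print(pair)
```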