Found 20 similar articles (search time: 9 ms)
1.
Russell Schwartz, Bjarni V Halldórsson, Vineet Bafna, Andrew G Clark, Sorin Istrail 《Journal of computational biology》2003,10(1):13-19
In this report, we examine the validity of the haplotype block concept by comparing block decompositions derived from public data sets by variants of several leading methods of block detection. We first develop a statistical method for assessing the concordance of two block decompositions. We then assess the robustness of inferred haplotype blocks to the specific detection method chosen, to arbitrary choices made in the block-detection algorithms, and to the sample analyzed. Although the block decompositions show levels of concordance that are very unlikely by chance, the absolute magnitude of the concordance may be low enough to limit the utility of the inference. For purposes of SNP selection, it seems likely that methods that do not arbitrarily impose block boundaries among correlated SNPs might perform better than block-based methods.
2.
Yuan A, Chen G, Rotimi C, Bonney GE 《Journal of bioinformatics and computational biology》2005,3(5):1021-1038
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as the problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference/hypothesis testing can be performed; prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm is linear in the number of loci, whereas many such algorithms address problems that are NP-hard. We illustrate the applications of our method to both simulated and real data sets.
3.
Models of background variation in genomic regions form the basis of linkage disequilibrium mapping methods. In this work we analyze a background model that groups SNPs into haplotype blocks and represents the dependencies between blocks by a Markov chain. We develop an error measure to compare the performance of this model against the common model that assumes that blocks are independent. By examining data from the International Haplotype Mapping project, we show how the Markov model over haplotype blocks is most accurate when representing blocks in strong linkage disequilibrium. This contrasts with the independent model, which is rendered less accurate by linkage disequilibrium. We provide a theoretical explanation for this surprising property of the Markov model and relate its behavior to allele diversity.
4.
5.
The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference in our model. The overall result is a flexible Bayesian method, referred to as DP-Haplotyper, that is reminiscent of parsimony methods in its preference for small haplotype pools. We further generalize the model to treat pedigree relationships (e.g., trios) between the population's genotypes. We apply DP-Haplotyper to the analysis of both simulated and real genotype data, and compare to extant methods.
6.
Haplotypes include essential SNP information used for a variety of purposes such as investigating potential links between certain diseases and genetic variations. Given a set of genotypes, the haplotype inference problem based on pure parsimony is the problem of finding a minimum set of haplotypes that explains all the given genotypes. The problem is especially important because, while it is fairly inexpensive to obtain genotypes, other approaches to obtaining haplotypes are significantly more expensive. Two types of methods have been proposed for the problem, namely exact and inexact methods. Existing exact methods guarantee purely parsimonious solutions but have exponential time complexity and are not practical for large numbers or lengths of genotypes. Inexact methods, by contrast, are relatively fast but do not always obtain optimum solutions. In this paper, an improved heuristic is proposed, based on which new inexact and exact methods are provided. Experimental results indicate that the proposed methods replace the state-of-the-art inexact and exact methods for the problem.
7.
Hapi is a new dynamic programming algorithm that ignores uninformative states and state transitions in order to efficiently compute minimum-recombinant and maximum likelihood haplotypes. When applied to a dataset containing 103 families, Hapi performs 3.8 and 320 times faster than state-of-the-art algorithms. Because Hapi infers both minimum-recombinant and maximum likelihood haplotypes and applies to related individuals, the haplotypes it infers are highly accurate over extended genomic distances.
8.
We cloned a gene, kexD, that provides a multidrug-resistant phenotype from multidrug-resistant Klebsiella pneumoniae MGH78578. The deduced amino acid sequence of KexD is similar to that of the inner membrane protein, RND-type multidrug efflux pump. Introduction of the kexD gene into Escherichia coli KAM32 resulted in a MIC that was higher for erythromycin, novobiocin, rhodamine 6G, tetraphenylphosphonium chloride, and ethidium bromide than that of the control. Intracellular ethidium bromide levels in E. coli cells carrying the kexD gene were lower than those in the control cells under energized conditions, suggesting that KexD is a component of an energy-dependent efflux pump. RND-type pumps typically consist of three components: an inner membrane protein, a periplasmic protein, and an outer membrane protein. We discovered that KexD functions with a periplasmic protein, AcrA, from E. coli and K. pneumoniae, but not with the periplasmic proteins KexA and KexG from K. pneumoniae. KexD was able to utilize either TolC of E. coli or KocC of K. pneumoniae as an outer membrane component. kexD mRNA was not detected in K. pneumoniae MGH78578 or ATCC10031. We isolated erythromycin-resistant mutants from K. pneumoniae ATCC10031, and some showed a multidrug-resistant phenotype similar to the drug resistance pattern of KexD. Two strains of multidrug-resistant mutants were investigated for kexD expression; kexD mRNA levels were increased in these strains. We conclude that changing kexD expression can contribute to the occurrence of multidrug-resistant K. pneumoniae.
9.
This paper studies haplotype inference by maximum parsimony using population data. We define the optimal haplotype inference (OHI) problem as follows: given a set of genotypes and a set of related haplotypes, find a minimum subset of haplotypes that can resolve all the genotypes. We prove that OHI is NP-hard and can be formulated as an integer quadratic programming (IQP) problem. To solve the IQP problem, we propose an iterative semidefinite programming-based approximation algorithm called SDPHapInfer. We show that this algorithm finds a solution within a factor of O(log n) of the optimal solution, where n is the number of genotypes. This algorithm has been implemented and tested on a variety of simulated and biological data. We compare it with three other methods: (1) HAPAR, which is based on the branch-and-bound algorithm; (2) HAPLOTYPER, which is based on the expectation-maximization algorithm; and (3) PHASE, which combines the Gibbs sampling algorithm with an approximate coalescent prior. The experimental results indicate that SDPHapInfer and HAPLOTYPER have similar error rates. In addition, the results generated by PHASE have lower error rates on some data but higher error rates on others. The error rates of HAPAR are higher than the others on biological data. In terms of efficiency, SDPHapInfer, HAPLOTYPER, and PHASE output a solution in a stable and consistent way, and they run much faster than HAPAR when the number of genotypes becomes large.
10.
MOTIVATION: Haplotype information has become increasingly important in analyzing fine-scale molecular genetics data, for applications such as disease gene mapping and drug design. Parsimony haplotyping is one of the haplotyping problems that belong to the NP-hard class. RESULTS: In this paper, we aim to develop a novel algorithm for the haplotype inference problem with the parsimony criterion, based on a parsimonious tree-grow method (PTG). PTG is a heuristic algorithm that can find the minimum number of distinct haplotypes based on the criterion of keeping all genotypes resolved during the tree-grow process. In addition, a block-partitioning method is also proposed to improve the computational efficiency. We show that the proposed approach is not only effective with a high accuracy, but also very efficient, with computational complexity on the order of O(m^2 n) time for n single nucleotide polymorphism sites in m individual genotypes. AVAILABILITY: The software is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/ptg/ CONTACT: chen@elec.osaka-sandai.ac.jp SUPPLEMENTARY INFORMATION: Supporting materials are available from http://zhangroup.aporc.org/bioinfo/ptg/bti572supplementary.pdf
11.
HaploBlockFinder: haplotype block analyses
Recent studies have unveiled discrete block-like structures of linkage disequilibrium (LD) in the human genome. We have developed a set of computer programs to analyze the block-like LD structures (haplotype blocks) based on haplotype data. Three definitions of haplotype block are supported, including minimal LD range, no historic recombination, and chromosome coverage. Tagged SNPs that uniquely distinguish common haplotypes are identified. A greedy algorithm was used to improve the efficiency. Two separate utilities were also provided to assist visual inspection of haplotype block structure and pattern of linkage disequilibrium. AVAILABILITY: A web interface for the HaploBlockFinder is available at http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi; the source code is also freely available on the website.
12.
Janina Linnik, Mohammedyaseen Syedbasha, Yvonne Hollenstein, Jörg Halter, Adrian Egli, Jörg Stelling 《PLoS pathogens》2022,18(1)
To assess the response to vaccination, quantity (concentration) and quality (avidity) of neutralizing antibodies are the most important parameters. Specifically, an increase in avidity indicates germinal center formation, which is required for establishing long-term protection. For influenza, the classical hemagglutination inhibition (HI) assay, however, quantifies a combination of both, and to separately determine avidity requires high experimental effort. We developed from first principles a biophysical model of hemagglutination inhibition to infer IgG antibody avidities from measured HI titers and IgG concentrations. The model accurately describes the relationship between neutralizing antibody concentration/avidity and HI titer, and explains quantitative aspects of the HI assay, such as robustness to pipetting errors and detection limit. We applied our model to infer avidities against the pandemic 2009 H1N1 influenza virus in vaccinated patients (n = 45) after hematopoietic stem cell transplantation (HSCT) and validated our results with independent avidity measurements using an enzyme-linked immunosorbent assay with urea elution. Avidities inferred by the model correlated with experimentally determined avidities (ρ = 0.54, 95% CI = [0.31, 0.70], P < 10^-4). The model predicted that increases in IgG concentration mainly contribute to the observed HI titer increases in HSCT patients and that immunosuppressive treatment is associated with lower baseline avidities. Since our approach requires only easy-to-establish measurements as input, we anticipate that it will help to disentangle causes for poor vaccination outcomes also in larger patient populations. This study demonstrates that biophysical modelling can provide quantitative insights into agglutination assays and complement experimental measurements to refine antibody response analyses.
13.
Haplotype inference from phase-ambiguous multilocus genotype data is an important task for both disease-gene mapping and studies of human evolution. We report a novel haplotype-inference method based on a coalescence-guided hierarchical Bayes model. In this model, a hierarchical structure is imposed on the prior haplotype frequency distributions to capture the similarities among modern-day haplotypes attributable to their common ancestry. As a consequence, the model both allows distinct haplotypes to have different a priori probabilities according to the inferred hierarchical ancestral structure and results in a proper joint posterior distribution for all the parameters of interest. A Markov chain-Monte Carlo scheme is designed to draw from this posterior distribution. By using coalescence-based simulation and empirically generated data sets (Whitehead Institute's inflammatory bowel disease data sets and HapMap data sets), we demonstrate the merits of the new method in comparison with HAPLOTYPER and PHASE, with or without the presence of recombination hotspots and missing genotypes.
14.
15.
Detailed analyses of the population-genetic nature of copy number variations (CNVs) and the linkage disequilibrium between CNV and single nucleotide polymorphism (SNP) loci from high-throughput experimental data require a computational tool to accurately infer alleles of CNVs and haplotypes composed of both CNV alleles and SNP alleles. Here we developed a new tool to infer population frequencies of such alleles and haplotypes from observed copy numbers and SNP genotypes, using the expectation-maximization algorithm. This tool can also handle copy numbers ambiguously determined, such as 2 or 3 copies, due to experimental noise. AVAILABILITY: http://emu.src.riken.jp/MOCSphaser/MOCSphaser.zip.
16.
Irurozki E, Calvo B, Lozano JA 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(5):1183-1195
Haplotype data are especially important in the study of complex diseases since they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony approach (HIPP), casts the problem as an optimization problem and as such has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iteratively searches for and deletes haplotypes that are not helpful in finding the optimal solution. This preprocess can be coupled with any of the current solvers for the HIPP that need to preprocess the genotype data. In order to test it, we have used two state-of-the-art solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to the computational time and memory reduction caused by our preprocess, problem instances that were previously unaffordable can now be efficiently solved.
17.
Brown DG, Harrower IM 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(2):141-154
In 2003, Gusfield introduced the haplotype inference by pure parsimony (HIPP) problem and presented an integer program (IP) that quickly solved many simulated instances of the problem. Although it solved well on small instances, Gusfield's IP can be of exponential size in the worst case. Several authors have presented polynomial-sized IPs for the problem. In this paper, we further the work on IP approaches to HIPP. We extend the existing polynomial-sized IPs by introducing several classes of valid cuts for the IP. We also present a new polynomial-sized IP formulation that is a hybrid between two existing IP formulations and inherits many of the strengths of both. Many problems that are too complex for the exponential-sized formulations can still be solved in our new formulation in a reasonable amount of time. We provide a detailed empirical comparison of these IP formulations on both simulated and real genotype sequences. Our formulation can also be extended in a variety of ways to allow errors in the input or model the structure of the population under consideration.
18.
Incorporating genotyping uncertainty in haplotype inference for single-nucleotide polymorphisms
The accuracy of the vast amount of genotypic information generated by high-throughput genotyping technologies is crucial in haplotype analyses and linkage-disequilibrium mapping for complex diseases. To date, most automated programs lack quality measures for the allele calls; therefore, human interventions, which are both labor intensive and error prone, have to be performed. Here, we propose a novel genotype clustering algorithm, GeneScore, based on a bivariate t-mixture model, which assigns a set of probabilities for each data point belonging to the candidate genotype clusters. Furthermore, we describe an expectation-maximization (EM) algorithm for haplotype phasing, GenoSpectrum (GS)-EM, which can use probabilistic multilocus genotype matrices (called "GenoSpectrum") as inputs. Combining these two model-based algorithms, we can perform haplotype inference directly on raw readouts from a genotyping machine, such as the TaqMan assay. By using both simulated and real data sets, we demonstrate the advantages of our probabilistic approach over the current genotype scoring methods, in terms of both the accuracy of haplotype inference and the statistical power of haplotype-based association analyses.
19.
Recent literature has suggested that haplotype inference through close relatives, especially from nuclear families, can be an alternative strategy in determining linkage phase and estimating haplotype frequencies. For the case in which parental genotypes cannot be obtained and only full-sib information is used, a new approach is suggested to infer phase and to reconstruct haplotypes. We present a maximum-likelihood method via an expectation-maximization algorithm, called FSHAP, using only full-sib information when parent information is not available. FSHAP can deal with families with an arbitrary number of children, and missing parents or missing genotypes can be handled as well. In a simulation study we compare FSHAP with another existing expectation-maximization (EM)-based approach (FAMHAP), the conditioning approach implemented in FBAT and GENEHUNTER, which is only pedigree based and assumes linkage equilibrium. In most situations, FSHAP has the smallest discrepancy of haplotype frequency estimation and the lowest error rate in haplotype reconstruction; only in some cases does FAMHAP yield comparable results. GENEHUNTER produces the largest discrepancy, and FBAT produces the highest error rate in offspring in most situations. Among the methods compared, FSHAP has the highest accuracy in reconstructing the diplotypes of the unavailable parents. Potential limitations of the method, e.g., in analyzing very large haplotypes, are indicated and possible solutions are discussed.
20.
R A Kittles, M Perola, L Peltonen, A W Bergen, R A Aragon, M Virkkunen, M Linnoila, D Goldman, J C Long 《American journal of human genetics》1998,62(5):1171-1179
The Finnish population has often been viewed as an isolate founded 2,000 years ago via a route across the Gulf of Finland. The founding event has been characterized as involving a limited number of homogeneous founders, isolation, and subsequent rapid population growth. Despite the purported isolation of the population, levels of gene diversity for the Finns at autosomal and mitochondrial DNA loci are indistinguishable from those of other Europeans. Thus, mixed or dual origins for the Finns have been proposed. Here we present genetic evidence for the dual origins of Finns by evaluating the pattern of Y chromosome variation in 280 unrelated males from nine Finnish provinces. Phylogenetic analysis of 77 haplotype configurations revealed two major star-shaped clusters of Y haplotypes, indicative of a population expansion from two common Y haplotypes. Dramatic and quite significant differences in Y haplotype variation were observed between eastern and western regions of Finland, revealing contributions from different paternal types. The geographic distribution and time of expansion for the two common Y haplotypes correlate well with archeological evidence for two culturally and geographically distinct groups of settlers. Also, a northeastern to southwestern gradient of Y haplotype frequencies provides convincing evidence for recent male migration from rural areas into urban Finland.