20 similar documents found (search time: 15 ms)
1.
Accurate haplotype inference for multiple linked single-nucleotide polymorphisms using sibship data (times cited: 1; self-citations: 0; citations by others: 1)
Sibships are commonly used in the genetic dissection of complex diseases, particularly late-onset diseases. Haplotype-based association studies have been advocated as powerful tools for fine mapping and positional cloning of complex disease genes. Existing methods for haplotype inference using data from relatives were originally developed for pedigree data. In this study, we propose a new statistical method for haplotype inference for multiple tightly linked single-nucleotide polymorphisms (SNPs), tailored to the sibship data that have accumulated extensively. The method is implemented via an expectation-maximization (EM) algorithm without the usual assumption of linkage equilibrium among markers. Our EM algorithm incurs no extra computational burden for haplotype inference using sibship data compared with using unrelated parental data, and its computational efficiency is not affected by increasing sibship size. We examined the robustness and statistical performance of the new method in simulated data created from an empirical haplotype data set of human growth hormone gene 1. The utility of our method is illustrated with an analysis of haplotypes of three candidate genes for osteoporosis.
2.
Haplotype inference for present-absent genotype data using previously identified haplotypes and haplotype patterns (times cited: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Killer immunoglobulin-like receptor (KIR) genes vary considerably in their presence or absence on a specific regional haplotype. Because the presence or absence of these genes is largely detected using locus-specific genotyping technology, the distinction between homozygosity and hemizygosity is often ambiguous. The performance of haplotype inference methods (e.g. PL-EM, PHASE) for KIR genes may therefore be compromised by the large proportion of ambiguous data. At the same time, many haplotypes or partial haplotype patterns have been previously identified and can be incorporated to facilitate haplotype inference for unphased genotype data. To accommodate the increased ambiguity of present-absent genotyping of KIR genes, we developed a hybrid approach that combines a greedy algorithm with the Expectation-Maximization (EM) method for haplotype inference based on previously identified haplotypes and haplotype patterns. RESULTS: We implemented this algorithm in a software package named HAPLO-IHP (Haplotype inference using identified haplotype patterns) and compared its performance with that of HAPLORE and PHASE on simulated KIR genotypes. We compared five measures to evaluate the reliability of haplotype assignments and the accuracy of haplotype frequency estimates. Our method outperformed the two existing techniques on all five measures when either 60% or 25% of previously identified haplotypes were incorporated into the analyses. AVAILABILITY: HAPLO-IHP is available at http://www.soph.uab.edu/Statgenetics/People/KZhang/HAPLO-IHP/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
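The ambiguity of present-absent data can be made concrete. A gene is scored as present when at least one of the two haplotypes carries it, so each present locus admits three ordered allele pairs, and a genotype with k present genes is compatible with (3^k + 1)/2 unordered haplotype pairs. The minimal Python sketch below enumerates those pairs and, optionally, keeps only pairs built from a list of previously identified haplotypes. The function name and the 0/1 encoding are illustrative assumptions; this is not HAPLO-IHP's interface or its greedy/EM search.

```python
from itertools import product


def compatible_pairs(genotype, known=None):
    """Enumerate unordered haplotype pairs consistent with a present/absent genotype.

    genotype: tuple of 0/1 per gene (1 = gene detected, 0 = not detected).
    A gene is detected if at least one haplotype carries it, so every locus
    must satisfy h1[i] | h2[i] == genotype[i].
    known: optional collection of previously identified haplotypes; if given,
    only pairs made entirely of known haplotypes are kept (hypothetical filter).
    """
    n = len(genotype)
    # A candidate haplotype may never carry a gene that was scored absent.
    candidates = [h for h in product((0, 1), repeat=n)
                  if all(hi <= gi for hi, gi in zip(h, genotype))]
    if known is not None:
        known_set = set(map(tuple, known))
        candidates = [h for h in candidates if h in known_set]
    pairs = set()
    for h1 in candidates:
        for h2 in candidates:
            if all((a | b) == g for a, b, g in zip(h1, h2, genotype)):
                pairs.add(tuple(sorted((h1, h2))))  # store each pair once
    return sorted(pairs)
```

For example, the genotype (1, 1, 0) yields five unordered pairs, and restricting them to the known haplotypes (1, 1, 0) and (0, 1, 0) leaves two. The exhaustive enumeration is exponential in the number of present genes and only shows the size of the ambiguity, not how it would be searched efficiently.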
3.
OBJECTIVE: Haplotypes are gaining popularity in studies of human genetics because they contain more information than a single locus does. However, current high-throughput genotyping techniques cannot produce haplotype information. Several statistical methods have recently been proposed to infer haplotypes from unphased genotypes at several loci, and their accuracy, efficiency, and computational time have been under intense scrutiny. In this report, our aim was to evaluate haplotype inference methods for genotype data from unrelated individuals. METHODS: We compared the performance of three haplotype inference methods currently in use (HAPLOTYPER, hap, and PHASE) by applying them to a large data set of unrelated individuals with known haplotypes. We also applied these methods to coalescent-based simulations using both constant-size and exponential-growth models. The performance of these methods, along with that of the expectation-maximization algorithm, was further compared in the context of an association study. RESULTS: While the algorithm implemented in PHASE was the most accurate in both the real and the simulated data comparisons, all four methods produced good results in the association study.
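The expectation-maximization algorithm mentioned above can be sketched for unrelated individuals. The code below is the textbook EM for haplotype frequencies under Hardy-Weinberg equilibrium, not HAPLOTYPER, hap, or PHASE: the E-step splits each genotype across its possible phase resolutions in proportion to the current frequencies, and the M-step re-counts haplotypes. It assumes genotypes are coded as the number of copies of allele 1 at each SNP (0, 1 or 2), and because it enumerates every resolution it is only practical for a handful of SNPs.

```python
from collections import defaultdict
from itertools import product


def resolutions(genotype):
    """All unordered haplotype pairs whose allele counts reproduce the genotype."""
    options = []
    for g in genotype:
        if g == 0:
            options.append([(0, 0)])
        elif g == 2:
            options.append([(1, 1)])
        else:  # heterozygous site: either chromosome may carry allele 1
            options.append([(0, 1), (1, 0)])
    pairs = set()
    for combo in product(*options):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes of unrelated people."""
    res = [resolutions(g) for g in genotypes]
    haps = sorted({h for r in res for pair in r for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for r in res:
            # E-step: HWE probability of each resolution (x2 for distinct haplotypes)
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in r]
            total = sum(weights)
            for (a, b), w in zip(r, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: every individual contributes two haplotype copies
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq
```

With ten 0/0 homozygotes, ten 2/2 homozygotes and ten double heterozygotes, the heterozygotes end up resolved as 00/11, because nothing else in the sample supports the 01 and 10 haplotypes, and the estimated frequencies of 00 and 11 approach 0.5.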
4.
MOTIVATION: Haplotype information has become increasingly important in analyzing fine-scale molecular genetics data, for applications such as disease gene mapping and drug design. Parsimony haplotyping is one of the haplotyping problems known to be NP-hard. RESULTS: In this paper, we develop a novel algorithm for the haplotype inference problem under the parsimony criterion, based on a parsimonious tree-grow method (PTG). PTG is a heuristic algorithm that seeks the minimum number of distinct haplotypes under the criterion of keeping all genotypes resolved during the tree-grow process. In addition, a block-partitioning method is proposed to improve computational efficiency. We show that the proposed approach is not only effective, with high accuracy, but also efficient, with a computational complexity of O(m^2 n) time for n single nucleotide polymorphism sites in m individual genotypes. AVAILABILITY: The software is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/ptg/ CONTACT: chen@elec.osaka-sandai.ac.jp SUPPLEMENTARY INFORMATION: Supporting materials are available from http://zhangroup.aporc.org/bioinfo/ptg/bti572supplementary.pdf
5.
Yuan A, Chen G, Rotimi C, Bonney GE. Journal of Bioinformatics and Computational Biology, 2005, 3(5): 1021-1038
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently, which has created interest in inferring block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to map the genes underlying human diseases quickly. To study haplotype block inference systematically, we propose a statistical framework in which the optimal haplotype block partitioning is formulated as a statistical model selection problem; missing data can be handled in a standard statistical way; population strata can be incorporated; block structure inference and hypothesis testing can be performed; and prior knowledge, if present, can be incorporated to perform Bayesian inference. The algorithm is linear in the number of loci, whereas many comparable algorithms face NP-hard formulations. We illustrate the application of our method to both simulated and real data sets.
6.
Missing data imputation and haplotype phase inference for genome-wide association studies (times cited: 4; self-citations: 2; citations by others: 4)
Browning SR. Human Genetics, 2008, 124(5): 439-450
Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance.
7.
John H Schwacke, Elizabeth G Hill, Edward L Krug, Susana Comte-Walters, Kevin L Schey. BMC Bioinformatics, 2009, 10(1): 342
Background
Isobaric Tags for Relative and Absolute Quantitation (iTRAQ™) [Applied Biosystems] have seen increased application in differential protein expression analysis. To meet the growing need to analyze iTRAQ data, especially in studies involving multiple iTRAQ experiments, we have developed a modeling approach, statistical methods, and tools for estimating relative changes in protein expression under various treatments and experimental conditions.
8.
Alexandros Iliadis, Dimitris Anastassiou, Xiaodong Wang. EURASIP Journal on Bioinformatics and Systems Biology, 2014, 2014(1): 7
Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue to play an important role in identifying the etiology of disease phenotypes. Current high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays produce datasets that contain both integer copy numbers in CNV regions and SNP genotypes. However, haplotypes, which have been shown to offer advantages over genotypes in identifying disease traits, are available for SNP genotypes but largely unavailable for CNV/SNP data because of insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme, ‘Tree-Based Deterministic Sampling CNV’ (TDSCNV). We compare our method with polyHap (v2.0), the only currently available software able to perform inference on CNV/SNP genotypes, on datasets with varying numbers of markers. Both algorithms show similar accuracy, but TDSCNV is an order of magnitude faster and scales linearly with the number of markers and the number of individuals, so it could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package, which is available for download at http://www.ee.columbia.edu/~anastas/tdscnv.
9.
Haplotype inference from phase-ambiguous multilocus genotype data is an important task for both disease-gene mapping and studies of human evolution. We report a novel haplotype-inference method based on a coalescence-guided hierarchical Bayes model. In this model, a hierarchical structure is imposed on the prior haplotype frequency distributions to capture the similarities among modern-day haplotypes attributable to their common ancestry. As a consequence, the model both allows distinct haplotypes to have different a priori probabilities according to the inferred hierarchical ancestral structure and yields a proper joint posterior distribution for all parameters of interest. A Markov chain Monte Carlo scheme is designed to sample from this posterior distribution. Using coalescence-based simulations and empirically generated data sets (Whitehead Institute's inflammatory bowel disease data sets and HapMap data sets), we demonstrate the merits of the new method in comparison with HAPLOTYPER and PHASE, with and without recombination hotspots and missing genotypes.
10.
Detailed analyses of the population-genetic nature of copy number variations (CNVs) and of the linkage disequilibrium between CNV and single nucleotide polymorphism (SNP) loci from high-throughput experimental data require a computational tool that accurately infers CNV alleles and haplotypes composed of both CNV alleles and SNP alleles. Here we developed a new tool that infers the population frequencies of such alleles and haplotypes from observed copy numbers and SNP genotypes using the expectation-maximization algorithm. The tool can also handle copy numbers that are determined ambiguously because of experimental noise, such as 2 or 3 copies. AVAILABILITY: http://emu.src.riken.jp/MOCSphaser/MOCSphaser.zip.
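To illustrate the latent states an EM over CNV alleles has to sum over, the short sketch below lists the unordered pairs of CNV alleles (copies per chromosome) that are consistent with an observed diploid copy number, including an ambiguous call such as "2 or 3 copies". The allele range 0 to 3 and the function name are assumptions for illustration; the sketch ignores SNP alleles and is not MOCSphaser's code.

```python
def cnv_allele_pairs(observed_totals, alleles=(0, 1, 2, 3)):
    """Unordered pairs of CNV alleles whose summed copy number is observable.

    observed_totals: set of diploid copy numbers consistent with the assay,
    e.g. {2} for a clean call or {2, 3} when noise leaves it ambiguous.
    alleles: hypothetical set of per-chromosome copy numbers allowed.
    """
    pairs = []
    for i, a in enumerate(alleles):
        for b in alleles[i:]:  # b >= a, so each unordered pair appears once
            if a + b in observed_totals:
                pairs.append((a, b))
    return pairs
```

A clean call of two copies admits the pairs (0, 2) and (1, 1), while an ambiguous "2 or 3" call adds (0, 3) and (1, 2); an EM would weight these pairs by the current allele frequencies.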
11.
Irurozki E, Calvo B, Lozano JA. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011, 8(5): 1183-1195
Haplotype data are especially important in the study of complex diseases because they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, haplotype inference by pure parsimony (HIPP), casts the problem as an optimization problem, which has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our algorithm works with groups of haplotypes rather than individual haplotypes, iteratively searching for and deleting haplotypes that do not help to find the optimal solution. This preprocessing step can be coupled with any current HIPP solver that requires the genotype data to be preprocessed. To test it, we used two state-of-the-art solvers, RTIP and GAHAP, with simulated and real HapMap data. Because our preprocessing reduces computation time and memory, problem instances that were previously intractable can now be solved efficiently.
12.
Recent literature has suggested that haplotype inference through close relatives, especially nuclear families, can be an alternative strategy for determining linkage phase and estimating haplotype frequencies. For the case in which parental genotypes cannot be obtained and only full-sib information is available, a new approach is suggested to infer phase and reconstruct haplotypes. We present a maximum-likelihood method via an expectation-maximization (EM) algorithm, called FSHAP, that uses only full-sib information when parental information is not available. FSHAP can deal with families with an arbitrary number of children, and it can also handle missing parents or missing genotypes. In a simulation study, we compare FSHAP with another existing EM-based approach (FAMHAP) and with the conditioning approach implemented in FBAT and GENEHUNTER, which is purely pedigree based and assumes linkage equilibrium. In most situations, FSHAP has the smallest discrepancy in haplotype frequency estimation and the lowest error rate in haplotype reconstruction; FAMHAP yields comparable results only in some cases. GENEHUNTER produces the largest discrepancy, and FBAT produces the highest error rate in offspring in most situations. Among the methods compared, FSHAP has the highest accuracy in reconstructing the diplotypes of the unavailable parents. Potential limitations of the method, e.g., in analyzing very large haplotypes, are indicated and possible solutions are discussed.
13.
Haplotypes contain essential SNP information used for a variety of purposes, such as investigating potential links between certain diseases and genetic variations. Given a set of genotypes, the haplotype inference problem based on pure parsimony is the problem of finding a minimum set of haplotypes that explains all the given genotypes. The problem is especially important because, while genotypes are fairly inexpensive to obtain, other approaches to obtaining haplotypes are considerably more expensive. Two types of methods have been proposed for the problem: exact and inexact. Existing exact methods are guaranteed to find purely parsimonious solutions but have exponential time complexity and are impractical for large numbers or lengths of genotypes. Inexact methods, by contrast, are relatively fast but do not always find optimal solutions. In this paper, an improved heuristic is proposed, on which new inexact and exact methods are based. Experimental results indicate that the proposed methods can replace the state-of-the-art inexact and exact methods for the problem.
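The pure-parsimony objective defined above can be stated as a tiny exhaustive search. The sketch below is neither the paper's heuristic nor a published exact method: it tries haplotype sets of increasing size and returns the first one that explains every genotype. It assumes genotypes coded 0 or 1 for homozygous sites and 2 for heterozygous sites, and its exponential running time is exactly why the exact and inexact methods discussed in the abstract are needed.

```python
from itertools import combinations, product


def explains(h1, h2, genotype):
    """True if haplotypes h1 and h2 (0/1 tuples) combine into the genotype.

    Assumed coding: 0/1 = homozygous for that allele, 2 = heterozygous.
    """
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:
                return False
        elif not (a == b == g):
            return False
    return True


def candidate_haplotypes(genotype):
    """Every haplotype that could appear in some resolution of the genotype."""
    options = [(0, 1) if g == 2 else (g,) for g in genotype]
    return set(product(*options))


def pure_parsimony(genotypes):
    """Smallest haplotype set that explains every genotype (brute force)."""
    pool = sorted(set().union(*(candidate_haplotypes(g) for g in genotypes)))
    for k in range(1, len(pool) + 1):
        for subset in combinations(pool, k):
            chosen = set(subset)
            # Each genotype needs some pair (possibly a repeated haplotype) from the set.
            if all(any(explains(h1, h2, g) for h1 in chosen for h2 in chosen)
                   for g in genotypes):
                return chosen
    return set()
```

For the genotypes (2, 2), (2, 2) and (0, 1), the search returns {(0, 1), (1, 0)}: two haplotypes suffice, whereas resolving the double heterozygotes as 00/11 would require a third haplotype for (0, 1).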
14.
Hapi is a new dynamic programming algorithm that ignores uninformative states and state transitions in order to efficiently compute minimum-recombinant and maximum likelihood haplotypes. When applied to a dataset containing 103 families, Hapi runs 3.8 and 320 times faster than state-of-the-art algorithms. Because Hapi infers both minimum-recombinant and maximum likelihood haplotypes and applies to related individuals, the haplotypes it infers are highly accurate over extended genomic distances.
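The idea of a minimum-recombinant haplotype can be illustrated with a two-state dynamic program. In the simplified setting assumed below, the parent's two haplotypes are already phased, and we ask how few crossovers are needed for the haplotype the child inherited to be a mosaic of them; loci where both parental haplotypes carry the same allele are uninformative and constrain neither state. This is a teaching sketch with a hypothetical function name, not Hapi's algorithm, which must also infer phase across whole families.

```python
import math


def min_recombinants(child_hap, parent_haps):
    """Fewest crossovers needed for child_hap to be a mosaic of the two
    phased parental haplotypes; math.inf if an allele matches neither."""
    hap_a, hap_b = parent_haps
    # cost[s]: fewest switches so far, given the current locus is copied
    # from parental haplotype s (0 -> hap_a, 1 -> hap_b)
    cost = [0, 0]
    for allele, a, b in zip(child_hap, hap_a, hap_b):
        match = (allele == a, allele == b)
        new = []
        for s in (0, 1):
            if not match[s]:
                new.append(math.inf)  # state s cannot have produced this allele
            else:
                # stay on the same haplotype for free, or switch at a cost of 1
                new.append(min(cost[s], cost[1 - s] + 1))
        cost = new
    return min(cost)
```

With parental haplotypes 0000 and 1111, the transmitted haplotype 0011 needs one crossover and 0101 needs three.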
15.
We cloned a gene, kexD, that confers a multidrug-resistant phenotype, from multidrug-resistant Klebsiella pneumoniae MGH78578. The deduced amino acid sequence of KexD is similar to that of the inner membrane protein of RND-type multidrug efflux pumps. Introduction of the kexD gene into Escherichia coli KAM32 resulted in higher MICs of erythromycin, novobiocin, rhodamine 6G, tetraphenylphosphonium chloride, and ethidium bromide than in the control. Intracellular ethidium bromide levels in E. coli cells carrying the kexD gene were lower than those in control cells under energized conditions, suggesting that KexD is a component of an energy-dependent efflux pump. RND-type pumps typically consist of three components: an inner membrane protein, a periplasmic protein, and an outer membrane protein. We found that KexD functions with the periplasmic protein AcrA from E. coli and K. pneumoniae, but not with the periplasmic proteins KexA and KexG from K. pneumoniae. KexD was able to use either TolC of E. coli or KocC of K. pneumoniae as its outer membrane component. kexD mRNA was not detected in K. pneumoniae MGH78578 or ATCC10031. We isolated erythromycin-resistant mutants from K. pneumoniae ATCC10031, some of which showed a multidrug-resistant phenotype similar to the drug resistance pattern conferred by KexD. Two multidrug-resistant mutant strains were examined for kexD expression, and kexD mRNA levels were increased in both. We conclude that changes in kexD expression can contribute to the emergence of multidrug-resistant K. pneumoniae.
16.
Background
The three-dimensional shape of grain, measured as grain length, width, and thickness (GL, GW, and GT), is one of the most important components of grain appearance in rice. Determining the genetic basis of variation in grain shape could facilitate efficient improvement of grain appearance. In this study, an F7:8 recombinant inbred line (RIL) population derived from a cross between indica and japonica cultivars (Nanyangzhan and Chuan7) contrasting in grain size was used for quantitative trait locus (QTL) mapping. A genetic linkage map was constructed with 164 simple sequence repeat (SSR) markers. The main aims of this study were to detect QTLs for grain shape and to fine map a minor QTL, qGL7.
Results
Four QTLs for GL were detected on chromosomes 3 and 7, and 10 QTLs for GW and 9 QTLs for GT were identified on chromosomes 2, 3, 5, 7, 9, and 10. A total of 28 QTLs were identified, several of which are reported for the first time; four major QTLs and six minor QTLs for grain shape were detected in both years. The minor QTL qGL7 exhibited pleiotropic effects on GL, GW, GT, 1000-grain weight (TGW), and spikelets per panicle (SPP), and was further validated in a near-isogenic F2 population (NIL-F2). Finally, qGL7 was narrowed down to an interval between InDel marker RID711 and SSR marker RM6389, covering a 258-kb region of the Nipponbare genome, and cosegregated with InDel markers RID710 and RID76.
Conclusion
Because of their complex genetic background, materials with very different phenotypes were used to develop mapping populations for QTL detection. Progeny tests showed that the minor QTL qGL7 could behave as a single Mendelian factor. We therefore suggest that minor QTLs for highly heritable traits could be isolated by a map-based cloning strategy using a large NIL-F2 population. In addition, combinations of different QTLs produced diverse grain shapes, which makes it possible to breed a wider range of rice varieties to satisfy consumer preferences.
17.
Kyle A. Stone, Devarshi Shah, Min Hea Kim, Nathan R. M. Roberts, Q. Peter He, Jin Wang. Biotechnology Progress, 2017, 33(2): 347-354
Because mixed cultures offer many advantages, their application in biotechnology has expanded rapidly in recent years. At the same time, many challenges remain for effective mixed-culture applications. One obstacle is how to monitor the individual cell populations efficiently and accurately. Current approaches to quantifying individual cell mass are suited to off-line, infrequent characterization. In this study, we propose a fast and accurate “soft sensor” approach for estimating individual cell concentrations in mixed cultures. The approach uses the optical density scanning spectrum of a mixed-culture sample, measured by a spectrophotometer over a range of wavelengths. A multivariate linear regression method, partial least squares (PLS), is applied to correlate individual cell concentrations with the spectrum. Three experimental case studies are used to examine the performance of the proposed soft sensor approach. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:347–354, 2017
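The soft-sensor idea can be sketched with a small partial least squares model. The sketch below, assuming only NumPy, fits one single-response PLS model per species with the NIPALS algorithm and applies it to synthetic optical-density spectra built as concentration-weighted sums of two made-up pure-culture spectra. The band shapes, wavelength grid, concentrations and helper names are hypothetical, real spectra would add noise and scattering effects, and this is not the authors' calibration procedure.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # weight vector
        t = Xr @ w                     # sample scores
        tt = t @ t
        p = Xr.T @ t / tt              # spectral loadings
        q = (yr @ t) / tt              # response loading
        Xr = Xr - np.outer(t, p)       # deflate the spectra
        yr = yr - q * t                # deflate the response
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    # Regression coefficients expressed on the centered spectra
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def simulate_spectra(concentrations, wavelengths):
    """Hypothetical OD spectra: each species adds a concentration-scaled band."""
    band_a = np.exp(-((wavelengths - 450.0) / 40.0) ** 2)
    band_b = np.exp(-((wavelengths - 600.0) / 60.0) ** 2) + 0.2
    return concentrations @ np.vstack([band_a, band_b])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wl = np.linspace(400.0, 700.0, 61)
    C_train = rng.uniform(0.1, 1.0, size=(20, 2))   # two species
    X_train = simulate_spectra(C_train, wl)
    models = [pls1_fit(X_train, C_train[:, j], n_components=2) for j in range(2)]
    C_new = np.array([[0.3, 0.7], [0.9, 0.2]])
    estimates = np.column_stack([pls1_predict(m, simulate_spectra(C_new, wl))
                                 for m in models])
    print(np.round(estimates, 3))
```

Because these synthetic spectra are exact linear mixtures of two bands, two components recover the concentrations essentially exactly; with measured spectra the number of components would normally be chosen by cross-validation.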
18.
A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data (times cited: 4; self-citations: 2; citations by others: 4)
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases rapidly with the number of included taxa. For example, for only 14 taxa, there are more than 300 billion possible unrooted phylogenetic trees (and nearly eight trillion rooted ones). For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can greatly reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
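The growth of tree space described in this abstract follows from standard counting formulas: n labeled taxa admit (2n - 5)!! fully resolved unrooted trees and (2n - 3)!! rooted ones, where k!! is the product of every other integer from k down to 1. The short Python check below evaluates both; the function names are illustrative.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ...; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_taxa):
    """Number of fully resolved unrooted trees on n_taxa labeled leaves."""
    return double_factorial(2 * n_taxa - 5) if n_taxa >= 3 else 1


def n_rooted_trees(n_taxa):
    """Number of fully resolved rooted trees on n_taxa labeled leaves."""
    return double_factorial(2 * n_taxa - 3) if n_taxa >= 2 else 1
```

For 14 taxa this gives 316,234,143,225 unrooted and 7,905,853,580,625 rooted trees, and for the 55 taxa of the rbcL example the unrooted count already exceeds 10^80, which is why exhaustive evaluation is out of the question.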
19.
PATRI is a new application for paternity analysis using genetic data that accounts for the sampling fraction of potential fathers.
20.