SUMMARY: We present the software implementation of the tree scanning method to detect associations between genetic haplotypes and quantitative traits, utilizing the evolutionary history of the haplotypes, in samples of unrelated individuals. AVAILABILITY: The program is available free of charge, under the GNU General Public License. A package including C source code, a Makefile, and Windows (DOS) and Macintosh binaries, can be downloaded from http://darwin.uvigo.es

Finding the genes involved in complex disease susceptibility and, among those genes, localizing the variant sites that explain this susceptibility is a major goal of genetic epidemiology. In this context, haplotypic methods that use the joint information on several markers may be of particular interest. When the number of haplotypes is large, a grouping may be required. Phylogenetic trees allow such groupings of haplotypes based on their evolutionary history and may help in the detection and localization of disease susceptibility sites. In this paper, we present new software to perform phylogeny-based association and localization analysis.

A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.
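The multiplicative, dominant, and recessive effect models named in this abstract differ only in how the number of copies of a haplotype feature is turned into a covariate. The following is a minimal sketch under the usual textbook codings; the function names `code_haplotype_effect` and `relative_risk` are hypothetical and are not taken from the cited work.

```python
def code_haplotype_effect(copies, model):
    """Map the number of copies (0, 1 or 2) of a haplotype feature to a covariate.

    The codings below are common textbook conventions, assumed for illustration.
    """
    if copies not in (0, 1, 2):
        raise ValueError("a diploid individual carries 0, 1 or 2 copies")
    if model == "multiplicative":
        return copies              # each copy multiplies risk by the same factor
    if model == "dominant":
        return int(copies >= 1)    # one copy is enough to change risk
    if model == "recessive":
        return int(copies == 2)    # both copies are needed to change risk
    raise ValueError(f"unknown model: {model}")


def relative_risk(copies, model, rr_per_unit):
    """Relative risk versus a non-carrier, given a per-unit relative risk."""
    return rr_per_unit ** code_haplotype_effect(copies, model)
```

For example, `relative_risk(2, "multiplicative", 1.5)` evaluates to 2.25, while the same individual under the dominant coding has a relative risk of 1.5.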

A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotype on disease risk using unphased genotype data with adjustment for environmental covariates. The proposed method was also extended to handle data in which the haplotype and environmental covariates are not independent. Likelihood ratio tests were constructed to test the effects of haplotype and gene-environment interaction. The model parameters, such as haplotype effect size, were estimated using an Expectation Conditional-Maximization (ECM) algorithm developed by Meng and Rubin (1993). Model-based variance estimates were derived using the observed information matrix. Simulation studies were conducted for three different genetic effect models, including dominant effect, recessive effect, and additive effect. The results showed that the proposed method generated unbiased parameter estimates, proper type I error, and true beta coverage probabilities. The model performed well with small or large sample sizes, as well as short or long haplotypes.

SUMMARY: We have created PhenomicDB, a multi-species genotype/phenotype database by merging public genotype/phenotype data from a wide range of model organisms and Homo sapiens. Until now these data were available in distinct organism-specific databases (e.g. WormBase, OMIM, FlyBase and MGI). We compiled this wealth of data into a single integrated resource by coarse-grained semantic mapping of the phenotypic data fields, by including common gene indices (NCBI Gene), and by the use of associated orthology relationships. With its use-case-oriented user interface, PhenomicDB allows scientists to compare and browse known phenotypes for a given gene or a set of genes from different organisms simultaneously. AVAILABILITY: PhenomicDB has been implemented at Schering AG as described below. A PhenomicDB implementation differing in some technical details has been set up for the public at Metalife AG http://www.phenomicDB.de SUPPLEMENTARY INFORMATION: database model, semantic mapping table.

Rare variants affecting phenotype pose a unique challenge for human genetics. Although genome-wide association studies have successfully detected many common causal variants, they are underpowered in identifying disease variants that are too rare or population-specific to be imputed from a general reference panel and thus are poorly represented on commercial SNP arrays. We set out to overcome these challenges and detect association between disease and rare alleles using SNP arrays by relying on long stretches of genomic sharing that are identical by descent (IBD). We have developed an algorithm, DASH, which builds upon pairwise identical-by-descent shared segments to infer clusters of individuals likely to be sharing a single haplotype. DASH constructs a graph with nodes representing individuals and links on the basis of such segments spanning a locus and uses an iterative minimum cut algorithm to identify densely connected components. We have applied DASH to simulated data and diverse GWAS data sets by constructing haplotype clusters and testing them for association. In simulations we show this approach to be significantly more powerful than single-marker testing in an isolated population that is from Kosrae, Federated States of Micronesia and has abundant IBD, and we provide orthogonal information for rare, recent variants in the outbred Wellcome Trust Case-Control Consortium (WTCCC) data. In both cohorts, we identified a number of haplotype associations, five such loci in the WTCCC data and ten in the isolated population, that were conditionally significant beyond any individual nearby markers. We have replicated one of these loci in an independent European cohort and identified putative structural changes in low-pass whole-genome sequence of the cluster carriers.
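The graph construction described in this abstract can be illustrated with a deliberately simplified sketch. It links individuals whose pairwise shared segment spans a locus and then reports plain connected components using union-find. It does not implement the iterative minimum cut refinement that DASH uses to find densely connected components, and the function name `ibd_components` and the `(individual_a, individual_b, start, end)` segment layout are assumptions made for illustration.

```python
def ibd_components(segments, locus):
    """Group individuals connected by shared segments that span `locus`.

    segments: iterable of (individual_a, individual_b, start, end) tuples
              (a hypothetical input layout).
    Returns a sorted list of sorted member lists, one per connected component.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, start, end in segments:
        if start <= locus <= end:          # keep only segments spanning the locus
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b    # merge the two clusters

    groups = {}
    for individual in list(parent):
        groups.setdefault(find(individual), set()).add(individual)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

A connected component is only a loose upper bound on a haplotype cluster: two individuals can be linked through a chain of pairwise segments without sharing one haplotype, which is why a density-based refinement step is needed in practice.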

Recent studies have indicated that linkage disequilibrium (LD) between single nucleotide polymorphism (SNP) markers can be used to derive a reduced set of tagging SNPs (tSNPs) for genetic association studies. Previous strategies for identifying tSNPs have focused on LD measures or haplotype diversity, but the statistical power to detect disease-associated variants using tSNPs in genetic studies has not been fully characterized. We propose a new approach of selecting tSNPs based on determining the set of SNPs with the highest power to detect association. Two-locus genotype frequencies are used in the power calculations. To show utility, we applied this power method to a large number of SNPs that had been genotyped in Caucasian samples. We demonstrate that a significant reduction in genotyping efforts can be achieved although the reduction depends on genotypic relative risk, inheritance mode and the prevalence of disease in the human population. The tSNP sets identified by our method are remarkably robust to changes in the disease model when small relative risk and additive mode of inheritance are employed. We have also evaluated the ability of the method to detect unidentified SNPs. Our findings have important implications in applying tSNPs from different data sources in association studies.



Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding.


To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available.


BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the high complexity of the genome. Comparative analysis mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of the zebrafish genome. BES of common carp are tremendous tools for comparative mapping between the two closely related species, zebrafish and common carp, which should facilitate both structural and functional genome analysis in common carp.
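The sequencing figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative, and the genome size is merely what the rounded "2.5%" figure implies, not a value reported by the authors.

```python
# Figures quoted in the abstract.
bac_clones = 40_224          # BAC clones sequenced on both ends
clean_bes = 65_720           # clean BAC end sequences retained
total_bp = 42_522_168        # total clean sequence in base pairs
genome_fraction = 0.025      # "2.5% of common carp genome" (rounded)

# Derived quantities.
mean_read_length = total_bp / clean_bes           # should be close to 647 bp
implied_genome_size = total_bp / genome_fraction  # rough size implied by the 2.5% figure
retained_fraction = clean_bes / (2 * bac_clones)  # clean reads per attempted end read

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"implied genome size: {implied_genome_size / 1e9:.2f} Gb")
print(f"retained end reads: {retained_fraction:.1%}")
```

The mean read length reproduces the reported 647 bp, and the implied genome size comes out at roughly 1.7 Gb, subject to the rounding of the 2.5% figure.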

A branch and bound algorithm is described for searching rapidly for minimal length trees from biological data. The algorithm adds characters one at a time, rather than adding taxa, as in previous branch and bound methods. The algorithm has been programmed and is available from the authors. A worked example is given with 33 characters and 15 taxa. About 8 × 10^12 binary trees are possible with 15 taxa but the branch and bound program finds the minimal tree in <5 min on an IBM PC. Received on January 15, 1987; accepted on February 23, 1987
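The size of the search space quoted in this abstract can be reproduced from the standard count of unrooted, fully resolved trees on n labelled taxa, (2n - 5)!! = 1 × 3 × 5 × ... × (2n - 5). The sketch below assumes that the abstract's "binary trees" are unrooted, which is the reading that matches its figure of about 8 × 10^12; the function name is illustrative.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        return 1  # with fewer than three taxa there is only one topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


print(count_unrooted_binary_trees(15))
```

For 15 taxa this gives 7,905,853,580,625, which rounds to the 8 × 10^12 stated above and shows why exhaustive enumeration is impractical and a bounding strategy is needed.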

Browning SR. Human Genetics, 2008, 124(5): 439-450
Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance.

Marginal tests based on individual SNPs are routinely used in genetic association studies. Studies have shown that haplotype‐based methods may provide more power in disease mapping than methods based on single markers when, for example, multiple disease‐susceptibility variants occur within the same gene. A limitation of haplotype‐based methods is that the number of parameters increases exponentially with the number of SNPs, inducing a commensurate increase in the degrees of freedom and weakening the power to detect associations. To address this limitation, we introduce a hierarchical linkage disequilibrium model for disease mapping, based on a reparametrization of the multinomial haplotype distribution, where every parameter corresponds to the cumulant of each possible subset of a set of loci. This hierarchy present in the parameters enables us to employ flexible testing strategies over a range of parameter sets: from standard single SNP analyses through the full haplotype distribution tests, reducing degrees of freedom and increasing the power to detect associations. We show via extensive simulations that our approach maintains the type I error at nominal level and has increased power under many realistic scenarios, as compared to single SNP and standard haplotype‐based studies. To evaluate the performance of our proposed methodology in real data, we analyze genome‐wide data from the Wellcome Trust Case‐Control Consortium.

OBJECTIVE: To develop a method to estimate haplotype effects on dichotomous outcomes when phase is unknown, that can also estimate reliable effects of rare haplotypes. METHODS: In short, the method uses a logistic regression approach, with weights attached to all possible haplotype combinations of an individual. An EM-algorithm was used: in the E-step the weights are estimated, and the M-step consists of maximizing the joint log-likelihood. When rare haplotypes were present, a penalty function was introduced. We compared four different penalties. To investigate statistical properties of our method, we performed a simulation study for different scenarios. The evaluation criteria are the mean bias of the parameter estimates, the root of the mean squared error, the coverage probability, power, Type I error rate and the false discovery rate. RESULTS: For the unpenalized approach, mean bias was small, coverage probabilities were approximately 95%, power ranged from 15.2 to 44.7% depending on haplotype frequency, and Type I error rate was around 5%. All penalty functions reduced the standard errors of the rare haplotypes, but introduced bias. This trade-off decreased power. CONCLUSION: The unpenalized weighted log-likelihood approach performs well. A penalty function can help to estimate an effect for rare haplotypes.
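The idea of attaching weights to all haplotype combinations compatible with an individual's unphased genotype can be sketched as follows. This is a simplified E-step that uses only current haplotype frequency estimates under Hardy-Weinberg proportions; it ignores the disease outcome and the logistic regression component, both of which the method in this abstract may also use. The function names and the 0/1/2 minor-allele-count genotype coding are assumptions made for illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of minor-allele counts (0, 1 or 2), one per SNP.
    Haplotypes are tuples of 0/1 alleles.
    """
    options = {0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(1, 1)]}
    pairs = set()
    for combo in product(*(options[g] for g in genotype)):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))  # ignore the order within a pair
    return sorted(pairs)


def e_step_weights(genotype, freqs):
    """Posterior weight of each compatible pair given haplotype frequencies.

    Assumes Hardy-Weinberg proportions: P(h1, h2) = f1 * f2, doubled for
    heterozygous pairs because either parental order yields the same pair.
    """
    weights = {}
    for h1, h2 in compatible_pairs(genotype):
        p = freqs.get(h1, 0.0) * freqs.get(h2, 0.0)
        if h1 != h2:
            p *= 2
        weights[(h1, h2)] = p
    total = sum(weights.values())
    if total == 0:
        raise ValueError("genotype is impossible under the given frequencies")
    return {pair: w / total for pair, w in weights.items()}
```

Only individuals heterozygous at two or more SNPs have more than one compatible pair; for everyone else the single pair receives weight 1, so the E-step matters only for the ambiguous genotypes.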

The study of biological systems commonly depends on inferring the state of a 'hidden' variable, such as an underlying genotype, from that of an 'observed' variable, such as an expressed phenotype. However, this cannot be achieved using traditional quantitative methods when more than one genetic mechanism exists for a single observable phenotype. Using a novel latent class Bayesian model, it is possible to infer the prevalence of different genetic elements in a population given a sample of phenotypes. As an exemplar, data comprising phenotypic resistance to six antimicrobials obtained from passive surveillance of Salmonella Typhimurium DT104 are analysed to infer the prevalence of individual resistance genes, as well as the prevalence of a genomic island known as SGI1 and its variants. Three competing models are fitted to the data and distinguished between using posterior predictive p-values to assess their ability to predict the observed number of unique phenotypes. The results suggest that several SGI1 variants circulate in a few fixed forms through the population from which our data were derived. The methods presented could be applied to other types of phenotypic data, and represent a useful and generic mechanism of inferring the genetic population structure of organisms.

MOTIVATION: Killer immunoglobulin-like receptor (KIR) genes vary considerably in their presence or absence on a specific regional haplotype. Because presence or absence of these genes is largely detected using locus-specific genotyping technology, the distinction between homozygosity and hemizygosity is often ambiguous. The performance of methods for haplotype inference (e.g. PL-EM, PHASE) for KIR genes may be compromised due to the large portion of ambiguous data. At the same time, many haplotypes or partial haplotype patterns have been previously identified and can be incorporated to facilitate haplotype inference for unphased genotype data. To accommodate the increased ambiguity of present-absent genotyping of KIR genes, we developed a hybrid approach combining a greedy algorithm with the Expectation-Maximization (EM) method for haplotype inference based on previously identified haplotypes and haplotype patterns. RESULTS: We implemented this algorithm in a software package named HAPLO-IHP (Haplotype inference using identified haplotype patterns) and compared its performance with that of HAPLORE and PHASE on simulated KIR genotypes. We compared five measures in order to evaluate the reliability of haplotype assignments and the accuracy in estimating haplotype frequency. Our method outperformed the two existing techniques by all five measures when either 60% or 25% of previously identified haplotypes were incorporated into the analyses. AVAILABILITY: The HAPLO-IHP is available at http://www.soph.uab.edu/Statgenetics/People/KZhang/HAPLO-IHP/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

For most common diseases with heritable components, no single single-nucleotide polymorphism (SNP), nor any small set of SNPs, explains most of the variance. Instead, much of the variance may be caused by interactions (epistasis) among multiple SNPs or interactions with environmental conditions. We present a new powerful statistical model for analyzing and interpreting genomic data that influence multifactorial phenotypic traits with a complex and likely polygenic inheritance. The new method is based on Markov chain Monte Carlo (MCMC) and allows for identification of sets of SNPs and environmental factors that when combined increase disease risk or change the distribution of a quantitative trait. Using simulations, we show that the MCMC method can detect disease association when multiple, interacting SNPs are present in the data. When the method is applied to real large-scale data from a Danish population-based cohort, multiple interactions are identified that severely affect serum triglyceride levels in the study individuals. The method is designed for quantitative traits but can also be applied to qualitative traits. It is computationally feasible even for a large number of possible interactions and differs fundamentally from most previous approaches by entertaining nonlinear interactions and by directly addressing the multiple-testing problem.

Chen Z, Ng HK. Human Heredity, 2012, 73(1): 26-34
In genetic association studies, due to the varying underlying genetic models, no single statistical test can be the most powerful test under all situations. Current studies show that if the underlying genetic models are known, trend-based tests, which outperform the classical Pearson χ² test, can be constructed. However, when the underlying genetic models are unknown, the χ² test is usually more robust than trend-based tests. In this paper, we propose a new association test based on a generalized genetic model, namely the generalized order-restricted relative risks model. Through a Monte Carlo simulation study, we show that the proposed association test is generally more powerful than the χ² test, and more robust than those trend-based tests. The proposed methodologies are also illustrated by some real SNP datasets.
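The two families of tests contrasted in this abstract can be made concrete with a small sketch that computes the Pearson chi-square statistic and the Cochran-Armitage trend statistic from case and control genotype counts. It does not implement the order-restricted test proposed in the paper. The function names are illustrative, and the default weights (0, 1, 2) correspond to an assumed additive model.

```python
def pearson_chi2(cases, controls):
    """Pearson chi-square statistic for a 2 x k table of genotype counts.

    Every genotype column must contain at least one individual.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    stat = 0.0
    for r, s in zip(cases, controls):
        n = r + s
        for observed, row_total in ((r, R), (s, S)):
            expected = row_total * n / N
            stat += (observed - expected) ** 2 / expected
    return stat


def trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 degree of freedom).

    weights encode the assumed genetic model; (0, 1, 2) is additive.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))
    numerator = N * (N * sum_wr - R * sum_wn) ** 2
    denominator = R * S * (N * sum_w2n - sum_wn ** 2)
    return numerator / denominator
```

With cases (10, 20, 30) and controls (30, 20, 10), both statistics equal 20, but the trend statistic is compared with a χ² distribution on 1 degree of freedom rather than 2. That difference is the source of its extra power when the assumed weights match the true genetic model, and of its fragility when they do not.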

Apolipoproteins (apo) E and C-I are components of triglyceride (TG)-rich lipoproteins and impact their metabolism. Functional polymorphisms have been established in apoE but not in apoC-I. We studied the relationship between apoE and apoC-I gene polymorphisms and plasma lipoproteins and coronary artery disease (CAD) in 211 African Americans and 306 Caucasians. In African Americans but not in Caucasians, apoC-I H2-carriers had significantly lower total and LDL cholesterol and apoB levels, and higher glucose, insulin, and HOMA-IR levels compared with H1 homozygotes. Differences across CAD phenotypes were seen for the apoC-I polymorphism. African-American H2-carriers without CAD had significantly lower total cholesterol (P < 0.001), LDL cholesterol (P < 0.001), and apoB (P < 0.001) levels compared with H1 homozygotes, whereas no differences were found across apoC-I genotypes for African Americans with CAD. Among African-American apoC-I H1 homozygotes, subjects with CAD had a profile similar to the metabolic syndrome (i.e., higher triglyceride, glucose, and insulin) compared with subjects without CAD. For African-American H2-carriers, subjects with CAD had a pro-atherogenic lipid pattern (i.e., higher LDL cholesterol and apoB levels), compared with subjects without CAD. ApoC-I genotypes showed an ethnically distinct phenotype relationship with regard to CAD and CAD risk factors.

Association-based linkage disequilibrium (LD) mapping is an increasingly important tool for localizing genes that show potential influence on human aging and longevity. As haplotypes contain more LD information than single markers, a haplotype-based LD approach can have increased power in detecting associations as well as increased robustness in statistical testing. In this paper, we develop a new statistical model to estimate haplotype relative risks (HRRs) on human survival using unphased multilocus genotype data from unrelated individuals in cross-sectional studies. Based on the proportional hazard assumption, the model can estimate haplotype risk and frequency parameters, incorporate observed covariates, assess interactions between haplotypes and the covariates, and investigate the modes of gene function. By introducing population survival information available from population statistics, we are able to develop a procedure that carries out the parameter estimation using a nonparametric baseline hazard function and estimates sex-specific HRRs to infer gene-sex interaction. We also evaluate the haplotype effects on human survival while taking into account individual heterogeneity in the unobserved genetic and nongenetic factors or frailty by introducing the gamma-distributed frailty into the survival function. After model validation by computer simulation, we apply our method to an empirical data set to measure haplotype effects on human survival and to estimate haplotype frequencies at birth and over the observed ages. Results from both simulation and model application indicate that our survival analysis model is an efficient method for inferring haplotype effects on human survival in population-based association studies.
