Similar articles
 20 similar articles found (search time: 15 ms)
1.
A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.

2.
OBJECTIVE: To develop a method for estimating haplotype effects on dichotomous outcomes when phase is unknown, one that can also reliably estimate the effects of rare haplotypes. METHODS: In short, the method uses a logistic regression approach, with weights attached to all possible haplotype combinations of an individual. An EM algorithm was used: in the E-step the weights are estimated, and in the M-step the joint log-likelihood is maximized. When rare haplotypes were present, a penalty function was introduced. We compared four different penalties. To investigate statistical properties of our method, we performed a simulation study for different scenarios. The evaluation criteria were the mean bias of the parameter estimates, the root of the mean squared error, the coverage probability, power, Type I error rate and the false discovery rate. RESULTS: For the unpenalized approach, mean bias was small, coverage probabilities were approximately 95%, power ranged from 15.2 to 44.7% depending on haplotype frequency, and Type I error rate was around 5%. All penalty functions reduced the standard errors of the rare haplotypes, but introduced bias. This trade-off decreased power. CONCLUSION: The unpenalized weighted log-likelihood approach performs well. A penalty function can help to estimate an effect for rare haplotypes.
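The E-step/M-step cycle mentioned in item 2 can be sketched in a reduced setting. The example below is only an illustration under stated assumptions, not the authors' method: it estimates haplotype frequencies for two biallelic SNPs from unphased genotypes, assumes Hardy-Weinberg equilibrium, and omits the logistic regression and penalty terms entirely. The function name `em_haplotype_freqs` and the input encoding are hypothetical.

```python
from itertools import product


def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2); each entry is the count (0, 1 or 2)
    of allele '1' at that SNP. Assumes Hardy-Weinberg equilibrium.
    """
    haps = list(product((0, 1), repeat=2))   # (0,0), (0,1), (1,0), (1,1)
    freqs = {h: 0.25 for h in haps}          # uniform starting values

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            # All ordered haplotype pairs compatible with this genotype.
            pairs = [(a, b) for a in haps for b in haps
                     if a[0] + b[0] == g1 and a[1] + b[1] == g2]
            # E-step: weight each pair by its probability under current freqs.
            probs = [freqs[a] * freqs[b] for a, b in pairs]
            total = sum(probs)
            for (a, b), p in zip(pairs, probs):
                counts[a] += p / total
                counts[b] += p / total
        # M-step: expected haplotype counts become the new frequencies.
        n_chrom = 2 * len(genotypes)
        freqs = {h: counts[h] / n_chrom for h in haps}
    return freqs
```

Only the double heterozygote (1, 1) is phase-ambiguous here, so the weights matter for those individuals alone; in the abstract's method the analogous weights enter a weighted logistic regression instead of a simple count.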

3.
Case-control designs are widely used in rare disease studies. In a typical case-control study, data are collected from a sample of all available subjects who have experienced a disease (cases) and a sub-sample of subjects who have not experienced the disease (controls) in a study cohort. Cases are oversampled in case-control studies. Logistic regression is a common tool to estimate the relative risks of the disease with respect to a set of covariates. Very often in such a study, the age at onset of the disease for all cases and the age at survey for controls are known. Standard logistic regression analysis using age as a covariate is based on a dichotomous outcome and does not efficiently use such age-at-onset (time-to-event) information. We propose to analyze age-at-onset data using a modified case-cohort method by treating the control group as an approximation of a subcohort assuming rare events. We investigate the asymptotic bias of this approximation and show that the asymptotic bias of the proposed estimator is small when the disease rate is low. We evaluate the finite sample performance of the proposed method through a simulation study and illustrate the method using a breast cancer case-control data set.

4.
It is widely believed that risks of many complex diseases are determined by genetic susceptibilities, environmental exposures, and their interaction. Chatterjee and Carroll (2005, Biometrika 92, 399-418) developed an efficient retrospective maximum-likelihood method for analysis of case-control studies that exploits an assumption of gene-environment independence and leaves the distribution of the environmental covariates to be completely nonparametric. Spinka, Carroll, and Chatterjee (2005, Genetic Epidemiology 29, 108-127) extended this approach to studies where certain types of genetic information, such as haplotype phases, may be missing on some subjects. We further extend this approach to situations when some of the environmental exposures are measured with error. Using a polychotomous logistic regression model, we allow disease status to have K + 1 levels. We propose use of a pseudolikelihood and a related EM algorithm for parameter estimation. We prove consistency and derive the resulting asymptotic covariance matrix of parameter estimates when the variance of the measurement error is known and when it is estimated using replications. Inferences with measurement error corrections are complicated by the fact that the Wald test often behaves poorly in the presence of large amounts of measurement error. The likelihood-ratio (LR) techniques are known to be a good alternative. However, the LR tests are not technically correct in this setting because the likelihood function is based on an incorrect model, i.e., a prospective model in a retrospective sampling scheme. We corrected standard asymptotic results to account for the fact that the LR test is based on a likelihood-type function. The performance of the proposed method is illustrated using simulation studies emphasizing the case when genetic information is in the form of haplotypes and missing data arises from haplotype-phase ambiguity. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma.

5.
Moskvina V, Schmidt KM. Biometrics 2006, 62(4): 1116-1123
With the availability of fast genotyping methods and genomic databases, the search for statistical association of single nucleotide polymorphisms with a complex trait has become an important methodology in medical genetics. However, even fairly rare errors occurring during the genotyping process can lead to spurious association results and decrease in statistical power. We develop a systematic approach to study how genotyping errors change the genotype distribution in a sample. The general M-marker case is reduced to that of a single-marker locus by recognizing the underlying tensor-product structure of the error matrix. Both method and general conclusions apply to the general error model; we give detailed results for allele-based errors of size depending both on the marker locus and the allele present. Multiple errors are treated in terms of the associated diffusion process on the space of genotype distributions. We find that certain genotype and haplotype distributions remain unchanged under genotyping errors, and that genotyping errors generally render the distribution more similar to the stable one. In case-control association studies, this will lead to loss of statistical power for nondifferential genotyping errors and increase in type I error for differential genotyping errors. Moreover, we show that allele-based genotyping errors do not disturb Hardy-Weinberg equilibrium in the genotype distribution. In this setting we also identify maximally affected distributions. As they correspond to situations with rare alleles and marker loci in high linkage disequilibrium, careful checking for genotyping errors is advisable when significant association based on such alleles/haplotypes is observed in association studies.
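The claim in item 5 that allele-based errors leave Hardy-Weinberg equilibrium intact can be checked numerically in a toy case. The sketch below is an illustrative construction, not the authors' derivation: it assumes a single biallelic locus, ordered genotypes, and an error matrix that acts independently on each allele, so that the genotype-level error matrix is the Kronecker (tensor) product of the allele-level matrix with itself. The matrix `E` and the frequencies `p` are hypothetical values.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def outer(p, q):
    """Flattened outer product: ordered-genotype distribution under HWE."""
    return [x * y for x in p for y in q]


# Hypothetical allele-based error matrix: column j is the distribution of
# the observed allele when the true allele is j (each column sums to 1).
E = [[0.98, 0.05],
     [0.02, 0.95]]

p = [0.7, 0.3]                    # hypothetical true allele frequencies
true_genotypes = outer(p, p)      # HWE over ordered genotypes AA, Aa, aA, aa

observed = matvec(kron(E, E), true_genotypes)   # genotypes after errors
p_obs = matvec(E, p)                            # allele frequencies after errors
hwe_from_p_obs = outer(p_obs, p_obs)            # HWE at the shifted frequencies
```

Here `observed` and `hwe_from_p_obs` agree up to rounding: the allele frequencies shift, but the genotype distribution remains a product of allele distributions.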

6.
Summary: Methods for the statistical analysis of stationary spatial point process data are now well established, methods for nonstationary processes less so. One of many sources of nonstationary point process data is a case–control study in environmental epidemiology. In that context, the data consist of a realization of each of two spatial point processes representing the locations, within a specified geographical region, of individual cases of a disease and of controls drawn at random from the population at risk. In this article, we extend work by Baddeley, Møller, and Waagepetersen (2000, Statistica Neerlandica 54, 329–350) concerning estimation of the second-order properties of a nonstationary spatial point process. First, we show how case–control data can be used to overcome the problems encountered when using the same data to estimate both a spatially varying intensity and second-order properties. Second, we propose a semiparametric method for adjusting the estimate of intensity so as to take account of explanatory variables attached to the cases and controls. Our primary focus is estimation, but we also propose a new test for spatial clustering that we show to be competitive with existing tests. We describe an application to an ecological study in which juvenile and surviving adult trees assume the roles of controls and cases.

7.
Lu X, Zhao W, Huang J, Li H, Yang W, Wang L, Huang W, Chen S, Gu D. Human Genetics 2007, 121(3-4): 327-335
The human plasma kallikrein gene (KLKB1) encodes plasma kallikrein, a serine protease that catalyzes the release of kinins and other vasoactive peptides and may be involved in the pathogenesis of hypertension. In this study, we performed a haplotype-based study to assess the effect of common genetic variation in the KLKB1 gene on the risk of essential hypertension. Eight common single nucleotide polymorphisms (SNPs) were selected from the HapMap database and used to determine the pattern of linkage disequilibrium (LD) and haplotype structure within the KLKB1 gene. Four tag SNPs were then identified with over 85% power to predict both common haplotypes and remaining common SNPs, and genotyped in 1,317 cases with essential hypertension and 1,269 healthy controls. Single SNP analyses indicated that SNPs rs2304595 and rs4253325 were significantly associated with hypertension, adjusted for covariates. Compared with the most common Hap2 CAGC, Hap1 AGAC and Hap3 CGAC, which carry the susceptible rs2304595 G allele and rs4253325 A allele, were found to significantly increase the risk of essential hypertension with adjusted odds ratios equal to 1.37 and 1.17, respectively (P < 0.0001 and 0.028). A strongly significant gene-drinking interaction was also observed. Among drinkers, the adjusted OR for Hap1 relative to Hap2 was increased to 2.50 (95% CI, 2.40 to 2.61; P < 0.0001). This was the first study to perform association analysis of the KLKB1 gene with essential hypertension. Our findings suggested that common genetic variation in the KLKB1 gene might contribute to the risk of hypertension in the northern Han Chinese population. Conflict of interests: None.

8.
Information on the genetic diversity and population structure of cattle breeds is useful when deciding on optimal strategies, for example crossbreeding strategies, to improve phenotypic performance by exploiting heterosis. The present study investigated the genetic diversity and population structure of the most prominent dairy and beef breeds used in Ireland. Illumina high-density genotypes (777 962 single nucleotide polymorphisms; SNPs) were available on 4623 purebred bulls from nine breeds: Angus (n=430), Belgian Blue (n=298), Charolais (n=893), Hereford (n=327), Holstein-Friesian (n=1261), Jersey (n=75), Limousin (n=943), Montbéliarde (n=33) and Simmental (n=363). Principal component analysis revealed that Angus, Hereford, and Jersey formed non-overlapping clusters, representing distinct populations. In contrast, overlapping clusters suggested geographical proximity of origin and genetic similarity between Limousin, Simmental and Montbéliarde and to a lesser extent between Holstein, Friesian and Belgian Blue. The observed SNP heterozygosity averaged across all loci was 0.379. The Belgian Blue had the greatest mean observed heterozygosity (HO=0.389) among individuals within breed while the Holstein-Friesian and Jersey populations had the lowest mean heterozygosity (HO=0.370 and 0.376, respectively). The correlation between the genomic-based and pedigree-based inbreeding coefficients was weak (r=0.171; P<0.001). Mean genomic inbreeding estimates were greatest for Jersey (0.173) and least for Hereford (0.051). The pair-wise breed fixation index (Fst) ranged from 0.049 (Limousin and Charolais) to 0.165 (Hereford and Jersey). In conclusion, substantial genetic variation exists among breeds commercially used in Ireland. Thus custom-mating strategies would be successful in maximising the exploitation of heterosis in crossbreeding strategies.
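Two of the summary statistics named in item 8, observed heterozygosity (HO) and the fixation index (Fst), can be illustrated with a small sketch. The abstract does not say which estimators the study used, so the definitions below are assumptions: HO is taken as the fraction of heterozygous calls, and Fst as the classical (H_T - H_S) / H_T for one biallelic SNP in two equally weighted populations. Both function names and the genotype coding are hypothetical.

```python
def observed_heterozygosity(genotypes):
    """Fraction of heterozygous calls.

    genotypes: allele-count codes 0, 1 or 2 per SNP call; None marks a
    missing call and is excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def fst_biallelic(p1, p2):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP.

    p1, p2: frequency of the same allele in two populations, weighted
    equally. Undefined (division by zero) when the pooled frequency is
    exactly 0 or 1.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # pooled expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population value
    return (h_t - h_s) / h_t
```

Genome-wide figures such as those in the abstract would come from combining many loci, and published studies often use estimators with sample-size corrections, which this sketch leaves out.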

9.
Dominant markers such as amplified fragment length polymorphisms (AFLPs) provide an economical way of surveying variation at many loci. However, the uncertainty about the underlying genotypes presents a problem for statistical analysis. Similarly, the presence of null alleles and the limitations of genotype calling in polyploids mean that many conventional analysis methods are invalid for many organisms. Here we present a simple approach for accounting for genotypic ambiguity in studies of population structure and apply it to AFLP data from whitefish. The approach is implemented in the program structure version 2.2, which is available from http://pritch.bsd.uchicago.edu/structure.html.

10.
As a main method for achieving environmental protection, improvement of environmental efficiency is vitally important for reducing environmental risk and level of ecological scarcity. Thus, quantitative analysis of environmental efficiency is not only an important premise for understanding the situation of regional environmental protection, but also a prerequisite for designing and adjusting relevant policies. In this paper, we treat the undesirable output in an economic activity by a linear monotonic decreasing transformation approach. We employ data envelopment analysis (DEA) technique to evaluate environmental efficiency in 30 provinces of China during the period 2001–2010, and conduct hypothesis tests on these environmental efficiencies. Furthermore, we investigate the horizontal differences of environmental efficiency in six regions of China and the vertical differences among different years. The results show that our empirical tests have a high statistical reliability that provides support for prompting environmental policies in different parts of China.

11.
Falush D, Stephens M, Pritchard JK. Genetics 2003, 164(4): 1567-1587
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

12.
Shih JH, Chatterjee N. Biometrics 2002, 58(3): 502-509
In case-control family studies with a survival endpoint, age of onset of diseases can be used to assess the familial aggregation of the disease and the relationship between the disease and genetic or environmental risk factors. Because of the retrospective nature of the case-control study, methods for analyzing prospectively collected correlated failure time data do not apply directly. In this article, we propose a semiparametric quasi-partial-likelihood approach to simultaneously estimate the effect of covariates on the age of onset and the association of ages of onset among family members that does not require specification of the baseline marginal distribution. We conducted a simulation study to evaluate the performance of the proposed approach and compare it with the existing semiparametric ones. Simulation results demonstrate that the proposed approach has better performance in terms of consistency and efficiency. We illustrate the methodology using a subset of data from the Washington Ashkenazi Study.

13.
This paper considers inference methods for case-control logistic regression in longitudinal setups. The motivation is provided by an analysis of plains bison spatial location as a function of habitat heterogeneity. The sampling is done according to a longitudinal matched case-control design in which, at certain time points, exactly one case, the actual location of an animal, is matched to a number of controls, the alternative locations that could have been reached. We develop inference methods for the conditional logistic regression model in this setup, which can be formulated within a generalized estimating equation (GEE) framework. This permits the use of statistical techniques developed for GEE-based inference, such as robust variance estimators and model selection criteria adapted for non-independent data. The performance of the methods is investigated in a simulation study and illustrated with the bison data analysis.

14.
15.
To elucidate natural hybridization of Korean Phragmites, we collected Phragmites plants from 29 regions in South Korea. Haplotypes of the samples, which were determined using two known chloroplast intergenic sequences in this study, were combined with previously known haplotypes. Phylogenetic analysis identified that 30 Korean Phragmites were grouped with two different haplotypes, 'P' or 'W', respectively, indicating that introduced Phragmites samples from other continents were not present in Korea. The vast majority (26) of the 27 test samples were grouped with the P haplotype, while the E4 sample and the three control Phragmites japonicus samples were grouped with haplotype W. Interestingly, parsimony network analysis revealed that Phragmites australis in Korea might have originated from various regions including Busan (S1), Icheon (M2), and Ansan (W2). Genotype analysis using the PhaHKT1 nuclear gene identified the M3 sample as Phragmites japonicus. For the first time, we found two hybrids (E4 and M3) in the wild by haplotype and genotype analyses, implying that the phenotype of Phragmites australis might be dominant in the hybrids. In summary, we suggest that hybrid speciation might be an important factor in the genetic diversity of Korean Phragmites.

16.
He MA, Zhang X, Wang J, Cheng L, Zhou L, Zeng H, Wang F, Chen Y, Xu Z, Wei Q, Hu FB, Wu T. Cell Stress & Chaperones 2008, 13(2): 231-238
Background: High levels of circulating heat shock protein 60 (Hsp60) and antibody to human Hsp60 have been associated with greater risk of coronary heart disease (CHD) in several studies, but associations between polymorphisms of the hsp60 gene and CHD risk have not been investigated. Methods: By resequencing DNA from 30 unrelated Han Chinese and using HapMap Phase I Chinese data for the hsp60 gene, we selected four tagging single nucleotide polymorphisms (tagSNPs), rs2340690, rs788016, rs2305560, and rs2565163, and determined their frequencies in 1,003 Chinese CHD patients and 1,003 age- and sex-frequency-matched controls. Furthermore, we used PHASE 2.0 software to reconstruct haplotypes and logistic regression to control for potential confounders in multivariate analyses. Results: We found 13 SNPs in the hsp60 gene (including four novel SNPs) in Han Chinese subjects. Our results showed no significant differences in the four selected SNPs between patients with CHD and controls after adjusting for other conventional risk factors and stratifying by age, sex, smoking status, past history of hypertension and DM; however, our results showed that subjects with the GCTC haplotype had about twofold higher risk of CHD than those with the GTTC haplotype (OR = 1.91, 95% CI: 1.26–2.89, P = 0.002). Conclusions: Our results suggest that the GCTC haplotype in the hsp60 gene is significantly associated with higher CHD risk in a Chinese population. The first two authors contributed equally to this paper.
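Odds ratios with 95% confidence intervals, as reported in item 16, are commonly read as follows. The sketch below is a simplification and rests on assumptions: it computes an unadjusted odds ratio and a Wald interval from a 2x2 table, whereas the abstract reports estimates adjusted through logistic regression. The function name and any counts passed to it are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls (all counts must be > 0).
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

The interval is symmetric on the log scale, which is why reported intervals such as 1.26–2.89 around 1.91 look lopsided on the original scale.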

17.
Chen J, Lin D, Hochner H. Biometrics 2012, 68(3): 869-877
Summary: The case-control mother-child pair design offers a unique advantage for dissecting genetic susceptibility of complex traits because it allows the assessment of both maternal and offspring genetic compositions. This design has been widely adopted in studies of obstetric complications and neonatal outcomes. In this work, we developed an efficient statistical method for evaluating joint genetic and environmental effects on a binary phenotype. Using a logistic regression model to describe the relationship between the phenotype and maternal and offspring genetic and environmental risk factors, we developed a semiparametric maximum likelihood method for the estimation of odds ratio association parameters. Our method is novel because it exploits two unique features of the study data for the parameter estimation. First, the correlation between maternal and offspring SNP genotypes can be specified under the assumptions of random mating, Hardy-Weinberg equilibrium, and Mendelian inheritance. Second, environmental exposures are often not affected by offspring genes conditional on maternal genes. Our method yields more efficient estimates compared with the standard prospective method for fitting logistic regression models to case-control data. We demonstrated the performance of our method through extensive simulation studies and the analysis of data from the Jerusalem Perinatal Study.

18.
19.
The degree to which variation in plant community composition (beta-diversity) is predictable from environmental variation, relative to other spatial processes, is of considerable current interest. We addressed this question in Costa Rican rain forest pteridophytes (1,045 plots, 127 species). We also tested the effect of data quality on the results, which has largely been overlooked in earlier studies. To do so, we compared two alternative spatial models [polynomial vs. principal coordinates of neighbour matrices (PCNM)] and ten alternative environmental models (all available environmental variables vs. four subsets, and including their polynomials vs. not). Of the environmental data types, soil chemistry contributed most to explaining pteridophyte community variation, followed in decreasing order of contribution by topography, soil type and forest structure. Environmentally explained variation increased moderately when polynomials of the environmental variables were included. Spatially explained variation increased substantially when the multi-scale PCNM spatial model was used instead of the traditional, broad-scale polynomial spatial model. The best model combination (PCNM spatial model and full environmental model including polynomials) explained 32% of pteridophyte community variation, after correcting for the number of sampling sites and explanatory variables. Overall evidence for environmental control of beta-diversity was strong, and the main floristic gradients detected were correlated with environmental variation at all scales encompassed by the study (c. 100–2,000 m). Depending on model choice, however, total explained variation differed more than fourfold, and the apparent relative importance of space and environment could be reversed. Therefore, we advocate a broader recognition of the impacts that data quality has on analysis results. A general understanding of the relative contributions of spatial and environmental processes to species distributions and beta-diversity requires that methodological artefacts are separated from real ecological differences.

20.
Haplotype-based risk models can lead to powerful methods for detecting the association of a disease with a genomic region of interest. In population-based studies of unrelated individuals, however, the haplotype status of some subjects may not be discernible without ambiguity from available locus-specific genotype data. A score test for detecting haplotype-based association using genotype data has been developed in the context of generalized linear models for analysis of data from cross-sectional and retrospective studies. In this article, we develop a test for association using genotype data from cohort and nested case-control studies where subjects are prospectively followed until disease incidence or censoring (end of follow-up) occurs. Assuming a proportional hazard model for the haplotype effects, we derive an induced hazard function of the disease given the genotype data, and hence propose a test statistic based on the associated partial likelihood. The proposed test procedure can account for differential follow-up of subjects, can adjust for possibly time-dependent environmental co-factors and can make efficient use of valuable age-at-onset information that is available on cases. We provide an algorithm for computing the test statistic using readily available statistical software. Utilizing simulated data in the context of two genomic regions GPX1 and GPX3, we evaluate the validity of the proposed test for small sample sizes and study its power in the presence and absence of missing genotype data.
