Similar articles
 20 similar articles found
1.
We studied several methods for selecting single-nucleotide polymorphisms (SNPs) in a disease association study. The two major categories of analytical strategy are the univariate and the set selection approaches. The univariate approach evaluates each SNP marker one at a time, while the set selection approach tests disease association of a set of SNP markers simultaneously. We examined various test statistics that can be used in testing disease association and also reviewed several multiple testing procedures that can properly control the family-wise error rate when the univariate approach is applied to multiple markers. The set association methods were then briefly reviewed. Finally, we applied these methods to the data from the Collaborative Study on the Genetics of Alcoholism (COGA).
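
The abstract does not say which multiple testing procedures were reviewed. As an illustration only, the following minimal sketch shows two standard procedures that control the family-wise error rate when each marker is tested separately: the Bonferroni correction and Holm's step-down procedure. The function names and the example p-values are hypothetical and are not taken from the paper.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for marker i when p_i <= alpha / m (m = number of tests)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical per-SNP p-values from a univariate scan of four markers.
example_pvals = [0.001, 0.013, 0.02, 0.6]
bonferroni_result = bonferroni(example_pvals)
holm_result = holm(example_pvals)
```

Holm's procedure rejects at least as many hypotheses as Bonferroni at the same level, which is why it is often preferred when the tests are analyzed one marker at a time.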

2.
Expression QTL mapping by integrating genome-wide gene expression and genotype data is a promising approach to identifying functional genetic variation, but is hampered by the large number of multiple comparisons inherent in such studies. A novel approach to addressing multiple testing problems in genome-wide family-based association studies is screening candidate markers using heritability or conditional power. We apply these methods in settings in which microarray gene expression data are used as phenotypes, screening for SNPs near the expressed genes. We perform association analyses for phenotypes using a univariate approach. We also perform simulations on trios with large numbers of causal SNPs to determine the optimal number of markers to use in a screen. We demonstrate that our family-based screening approach performs well in the analysis of integrative genomic datasets and that screening using either heritability or conditional power produces similar, though not identical, results.

3.
We present a novel approach to disease-gene mapping via cladistic analysis of single-nucleotide polymorphism (SNP) haplotypes obtained from large-scale, population-based association studies, applicable to whole-genome screens, candidate-gene studies, or fine-scale mapping. Clades of haplotypes are tested for association with disease, exploiting the expected similarity of chromosomes with recent shared ancestry in the region flanking the disease gene. The method is developed in a logistic-regression framework and can easily incorporate covariates such as environmental risk factors or additional unlinked loci to allow for population structure. To evaluate the power of this approach to detect disease-marker association, we have developed a simulation algorithm to generate high-density SNP data with short-range linkage disequilibrium based on empirical patterns of haplotype diversity. The results of the simulation study highlight substantial gains in power over single-locus tests for a wide range of disease models, despite overcorrection for multiple testing.

4.
Kim S, Zhang K, Sun F. BMC Genetics, 2003, 4(Z1): S9
Complex diseases are generally caused by intricate interactions of multiple genes and environmental factors. Most available linkage and association methods are developed to identify individual susceptibility genes assuming a simple disease model blind to any possible gene-gene and gene-environment interactions. We used a set association method that uses single-nucleotide polymorphism markers to locate genetic variation responsible for complex diseases in which multiple genes are involved. Here we extended the set association method from bi-allelic to multiallelic markers. In addition, we studied the type I error rates and power for both approaches using simulations based on the coalescent process. Both bi-allelic set association (BSA) and multiallelic set association (MSA) tests have the correct type I error rates. In addition, BSA and MSA can have more power than individual marker analysis when multiple genes are involved in a complex disease. We applied the MSA approach to the simulated data sets from Genetic Analysis Workshop 13. High cholesterol level was used as the definitive phenotype for a disease. MSA failed to detect markers in significant linkage disequilibrium with genes responsible for cholesterol level. This is due to the wide spacing between the markers and the lack of association between the marker loci and the simulated phenotype.

5.
Huang S, Wang S, Liu N, Chen L, Oh C, Zhao H. BMC Genetics, 2005, 6(Z1): S51
Recombination during meiosis is one of the most important biological processes, and the recombination rate of a given individual is under genetic control. In this study, we conducted genome-wide association studies to identify chromosomal regions associated with recombination rates. We analyzed genotype data collected on the pedigrees in the Collaborative Study on the Genetics of Alcoholism data provided by Genetic Analysis Workshop 14. A total of 315 microsatellites and 10,081 single-nucleotide polymorphisms from Affymetrix on 22 autosomal chromosomes were used in our association analysis. Genome-wide gender-specific recombination counts for family founders were inferred first, and association analysis was then performed using multiple linear regression. We used the positive false discovery rate (pFDR) to account for multiple comparisons in the two genome-wide scans. Eight regions showed some evidence of association with recombination counts based on the single-nucleotide polymorphism analysis after adjusting for multiple comparisons. However, no region was found to be significant using microsatellites.

6.
Oh C, Wang S, Liu N, Chen L, Zhao H. BMC Genetics, 2005, 6(Z1): S116
Common human disorders, such as alcoholism, may be the result of interactions of many genes as well as environmental risk factors. Therefore, it is important to incorporate gene × gene and gene × environment interactions in complex disease gene mapping. In this study, we applied a robust Bayesian genome screening method that can incorporate interaction effects to map genes underlying alcoholism, using the data of the Collaborative Study on the Genetics of Alcoholism provided by Genetic Analysis Workshop 14. Our Bayesian genome screening method uses regression-based stochastic variable selection, coupled with the new Haseman-Elston method, to identify markers linked to phenotypes of interest. Compared to traditional linkage methods based on single-gene disease models, our method allows for multilocus disease models for simultaneous screening including both main and interaction (epistatic) effects. It is conceptually simple and computationally efficient through the use of a Gibbs sampler. We conducted genome-wide analysis and comparison between scans based on microsatellites and single-nucleotide polymorphisms. A total of 328 microsatellites and 11,560 single-nucleotide polymorphisms (by Affymetrix) on 22 autosomal chromosomes and the sex chromosome were used.

7.
We previously reported a linkage region on chromosome 1p (LOD = 3.41) for genes controlling age at onset (AAO) in Parkinson disease (PD). This region overlaps with the previously reported PARK10 locus. To identify the gene(s) associated with AAO and risk of PD in this region, we first applied a genomic convergence approach that combined gene expression and linkage data. No significant results were found. Second, we performed association mapping across a 19.2-Mb region centered under the AAO linkage peak. Association mapping was carried out iteratively, initially genotyping single-nucleotide polymorphisms at an average spacing of 100 kb and then increasing the density of markers as needed. Using the overall data set of 267 multiplex families, we identified six associated genes in the region, but further screening of a subset of 83 families linked to the chromosome 1 locus identified only two genes significantly associated with AAO in PD: the gamma subunit of the translation initiation factor EIF2B gene (EIF2B3), which was more significant in the linked subset, and the ubiquitin-specific protease 24 gene (USP24). Unexpectedly, the human immunodeficiency virus enhancer-binding protein 3 gene (HIVEP3) was found to be associated with susceptibility to PD. We used several criteria to define significant results in the presence of multiple testing, including criteria derived from a novel cluster approach. The known or putative functions of these genes fit well with the current suspected pathogenic mechanisms of PD and thus show great potential as candidates for the PARK10 locus.

8.
Combining several screening tests: optimality of the risk score   Total citations: 5 (self-citations: 0; citations by others: 5)
McIntosh MW, Pepe MS. Biometrics, 2002, 58(3): 657-664
The development of biomarkers for cancer screening is an active area of research. While several biomarkers exist, none is sufficiently sensitive and specific on its own for population screening. It is likely that successful screening programs will require combinations of multiple markers. We consider how to combine multiple disease markers for optimal performance of a screening program. We show that the risk score, defined as the probability of disease given data on multiple markers, is the optimal function in the sense that the receiver operating characteristic (ROC) curve is maximized at every point. Arguments draw on the Neyman-Pearson lemma. This contrasts with the corresponding optimality result of classic decision theory, which is set in a Bayesian framework and is based on minimizing an expected loss function associated with decision errors. Ours is an optimality result defined from a strictly frequentist point of view and does not rely on the notion of associating costs with misclassifications. The implication for data analysis is that binary regression methods can be used to yield appropriate relative weightings of different biomarkers, at least in large samples. We propose some modifications to standard binary regression methods for application to the disease screening problem. A flexible biologically motivated simulation model for cancer biomarkers is presented and we evaluate our methods by application to it. An application to real data concerning two ovarian cancer biomarkers is also presented. Our results are equally relevant to the more general medical diagnostic testing problem, where results of multiple tests or predictors are combined to yield a composite diagnostic test. Moreover, our methods justify the development of clinical prediction scores based on binary regression.
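
The link between the risk score and the Neyman-Pearson lemma mentioned in this abstract can be sketched in a few lines. The notation here is assumed for illustration and is not taken from the paper: $D \in \{0,1\}$ is disease status, $Y$ the vector of markers, $\pi = P(D=1)$ the prevalence with $0 < \pi < 1$, and $\mathrm{LR}(Y)$ the likelihood ratio.

```latex
\mathrm{LR}(Y) = \frac{P(Y \mid D = 1)}{P(Y \mid D = 0)},
\qquad
p(Y) = P(D = 1 \mid Y)
     = \frac{\pi \, \mathrm{LR}(Y)}{\pi \, \mathrm{LR}(Y) + (1 - \pi)} .
```

By Bayes' theorem the risk score $p(Y)$ is a strictly increasing function of $\mathrm{LR}(Y)$, so a rule of the form $p(Y) > c$ is equivalent to a rule of the form $\mathrm{LR}(Y) > c'$. The Neyman-Pearson lemma states that likelihood-ratio rules achieve the highest true-positive rate at any given false-positive rate, which is the sense in which the ROC curve is maximized at every point.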

9.
We conducted genome-wide linkage scans using both microsatellite and single-nucleotide polymorphism (SNP) markers. Regions showing the strongest evidence of linkage to alcoholism susceptibility genes were identified. Haplotype analyses using a sliding-window approach for SNPs in these regions were performed. In addition, we performed a genome-wide association scan using SNP data. SNPs in these regions with evidence of association (P

10.
For assessment of genetic association between single-nucleotide polymorphisms (SNPs) and disease status, the logistic-regression model or generalized linear model is typically employed. However, testing for deviation from Hardy-Weinberg proportion in a patient group could be another approach for genetic-association studies. The Hardy-Weinberg proportion is one of the most important principles in population genetics. Deviation from Hardy-Weinberg proportion among cases (patients) could provide additional evidence for the association between SNPs and diseases. To develop a more powerful statistical test for genetic-association studies, we combined evidence about deviation from Hardy-Weinberg proportion in case subjects and standard regression approaches that use case and control subjects. In this paper, we propose two approaches for combining such information: the mean-based tail-strength measure and the median-based tail-strength measure. These measures integrate logistic regression and Hardy-Weinberg-proportion tests for the study of the association between a binary disease outcome and an SNP on the basis of case- and control-subject data. For both mean-based and median-based tail-strength measures, we derived exact formulas to compute p values. We also developed an approach for obtaining empirical p values with the use of a resampling procedure. Results from simulation studies and real-disease studies demonstrate that the proposed approach is more powerful than the traditional logistic-regression model. The type I error probabilities of our approach were also well controlled.
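
One of the two ingredients named in this abstract, a test for deviation from Hardy-Weinberg proportion among cases, can be illustrated with the standard one-degree-of-freedom chi-square test. This is a minimal sketch under that assumption; it does not reproduce the authors' tail-strength measures, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg proportion.

    Arguments are the observed counts of the three genotypes AA, AB, BB
    at a biallelic SNP. Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df:
    # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts among case subjects.
chi2_cases, p_cases = hwe_chi_square(40, 20, 40)
```

A small p-value among cases indicates a departure from Hardy-Weinberg proportion, which the abstract treats as additional evidence to be combined with the case-control regression test.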

11.
The 12 genome-wide association studies (GWAS) published to date for late-onset Alzheimer's disease (LOAD) have identified over 40 candidate LOAD risk modifiers, in addition to apolipoprotein (APOE) ε4. A few of these novel LOAD candidate genes, namely BIN1, CLU, CR1, EXOC3L2 and PICALM, have shown consistent replication, and are thus credible LOAD susceptibility genes. To evaluate other promising LOAD candidate genes, we have added data from our large, case-control series (n=5,043) to meta-analyses of all published follow-up case-control association studies for six LOAD candidate genes that have shown significant association across multiple studies (TNK1, GAB2, LOC651924, GWA_14q32.13, PGBD1 and GALP) and for an additional nine previously suggested candidate genes. Meta-analyses remained significant at three loci after addition of our data: GAB2 (OR=0.78, p=0.007), LOC651924 (OR=0.91, p=0.01) and TNK1 (OR=0.92, p=0.02). Breslow-Day tests revealed significant heterogeneity between studies for GAB2 (p<0.0001) and GWA_14q32.13 (p=0.006). We have also provided suggestive evidence that PGBD1 (p=0.04) and EBF3 (p=0.03) are associated with age-at-onset of LOAD. Finally, we tested for interactions between these 15 genes, APOE ε4 and the five novel LOAD genes BIN1, CLU, CR1, EXOC3L2 and PICALM, but none were significant after correction for multiple testing. Overall, this large, independent follow-up study for 15 of the top LOAD candidate genes provides support for GAB2 and LOC651924 (6q24.1) as risk modifiers of LOAD and for novel associations of PGBD1 and EBF3 with age-at-onset.

12.
Copy-number variation (CNV) is a major contributor to human genetic variation. Recently, CNV associations with human disease have been reported. Many genome-wide association (GWA) studies in complex diseases have been performed with sets of biallelic single-nucleotide polymorphisms (SNPs), but the available CNV methods are still limited. We present a new method (TriTyper) that can infer genotypes in case-control data sets for deletion CNVs, or SNPs with an extra, untyped allele, at a high-resolution single-SNP level. By accounting for linkage disequilibrium (LD), as well as intensity data, calling accuracy is improved. Analysis of 3102 unrelated individuals of European descent, genotyped with Illumina Infinium BeadChips, resulted in the identification of 1880 SNPs with a common untyped allele, and these SNPs are in strong LD with neighboring biallelic SNPs. Simulations indicate our method has superior power to detect associations compared to biallelic SNPs that are in LD with these SNPs, yet without increasing type I errors, as shown in a GWA analysis in celiac disease. Genotypes for 1204 triallelic SNPs could be fully imputed, with only biallelic-genotype calls, permitting association analysis of these SNPs in many published data sets. We estimate that 682 of the 1655 unique loci reflect deletions; this is on average 99 deletions per individual, four times greater than those detected by other methods. Whereas the identified loci are strongly enriched for known deletions, 61% have not been reported before. Genes overlapping with these loci more often have paralogs (p = 0.006) and biologically interact with fewer genes than expected (p = 0.004).

13.
Autosomal dominant high myopia, a genetic disorder already mapped to region 18p11.31, is common in Carloforte (Sardinia, Italy), an isolated village of 8,000 inhabitants descending from a founder group of 300 in the early 1700s. Fifteen myopic propositi and 36 normal controls were selected for not having ancestors in common at least up to the grandparental generation, although still descendants of the original founders. All subjects were genotyped for 14 markers located on autosome 18 at a resolution of about 10 cM. Allelic distributions were found to be similar at all tested loci in propositi and controls, except for the candidate marker D18S63 known to segregate in close linkage association with high myopia. In particular, the frequency of allele 85 among the propositi was almost double that of the controls (Fisher's exact test, p = 0.037). The association is more striking when the frequency of the genotype 85/85 in the two groups is compared (Fisher's exact test, p = 0.005). This conclusion was further evaluated through a bootstrap analysis by computing the overall probability of the observed data under the null hypothesis (i.e. no difference between the two groups in frequency distributions for the chromosome 18 markers). Again, marker D18S63 was found to have a sample probability lower than 0.004, which is significant at the 0.05 level after correcting for simultaneous testing of multiple loci. The study demonstrates the efficiency of our novel strategy to detect identity by descent (IBD) in small numbers of patients and controls when they are both part of well-defined Mendelian breeding units (MBUs). The iterative application of our strategy in separate MBUs is expected to become the method of choice to evaluate the ever-growing number of reported associations between candidate genes and multifactorial traits and diseases.

14.
Liu PY, Lu Y, Deng HW. Genetics, 2006, 174(1): 499-509
Sibships are commonly used in genetic dissection of complex diseases, particularly for late-onset diseases. Haplotype-based association studies have been advocated as powerful tools for fine mapping and positional cloning of complex disease genes. Existing methods for haplotype inference using data from relatives were originally developed for pedigree data. In this study, we proposed a new statistical method for haplotype inference for multiple tightly linked single-nucleotide polymorphisms (SNPs), which is tailored for extensively accumulated sibship data. This new method was implemented via an expectation-maximization (EM) algorithm without the usual assumption of linkage equilibrium among markers. Our EM algorithm does not incur extra computational burden for haplotype inference using sibship data when compared with using unrelated parental data. Furthermore, its computational efficiency is not affected by increasing sibship size. We examined the robustness and statistical performance of our new method in simulated data created from an empirical haplotype data set of human growth hormone gene 1. The utility of our method was illustrated with an application to the analyses of haplotypes of three candidate genes for osteoporosis.

15.
Coronary artery disease, heart failure, fatal arrhythmias, stroke, and renal disease are the most common causes of mortality for humans, and essential hypertension remains a major risk factor. Elucidation of susceptibility loci for essential hypertension has been difficult because of its complex, multifactorial nature, which involves genetic, environmental, and sex- and age-dependent factors. We investigated whether the 11p15.5 region, syntenic to a rat chromosome 1 region containing multiple blood pressure quantitative trait loci (QTL) detected in Dahl rat intercrosses, harbors polymorphisms that contribute to susceptibility/resistance to essential hypertension in a Sardinian population. Initial testing performed using microsatellite markers spanning 18 Mb of 11p15.5 detected strong associations with D11S1318 (at 2.1 Mb, P = 0.004) and D11S1346 (at 10.6 Mb, P = 0.00000004), suggesting that loci in close proximity to these markers may contribute to susceptibility in our Sardinian cohort. NLR family, pyrin domain containing 6/angiotensin-vasopressin receptor (NLRP6/AVR), and adrenomedullin (ADM) are in close proximity to D11S1318 and D11S1346, respectively; thus we tested single nucleotide polymorphisms (SNPs) within NLRP6/AVR and ADM for their association with hypertension in our Sardinian cohort. Upon sex stratification, we detected one NLRP6/AVR SNP associated with decreased susceptibility to hypertension in males (rs7948797G, P = 0.029; OR = 0.73 [0.57–0.94]). For ADM, sex-specific analysis showed a significant association of rs4444073C with increased susceptibility to essential hypertension only in the male population (P = 0.006; OR = 1.44 [1.13–1.84]). Our results revealed an association of the NLRP6/AVR and ADM loci with male essential hypertension, suggesting the existence of sex-specific NLRP6/AVR and ADM variants affecting male susceptibility to essential hypertension.

16.
Genomewide association (GWA) studies assay hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously across the entire genome and associate them with diseases or other biological or clinical traits. The association analysis usually tests each SNP as an independent entity and ignores biological information such as linkage disequilibrium. Although the Bonferroni correction and other approaches have been proposed to address the issue of multiple comparisons as a result of testing many SNPs, there is a lack of understanding of the distribution of an association test statistic when an entire genome is considered together. In other words, there are extensive efforts in hypothesis testing, and almost no attempt at estimating the density under the null hypothesis. By estimating the true null distribution, we can apply the result directly to hypothesis testing; better assess the existing approaches of multiple comparisons; and evaluate the impact of linkage disequilibrium on the GWA studies. To this end, we estimate the empirical null distribution of an association test statistic in GWA studies using simulated population data. We further propose a convenient and accurate method based on adaptive spline to estimate the empirical value in GWA studies and validate our findings using a real data set. Our method enables us to fully characterize the null distribution of an association test, which not only can be used to test the null hypothesis of no association but also provides important information about the impact of the density of the genetic markers on the significance of the tests. Our method does not require users to perform computationally intensive permutations, and hence provides a timely solution to an important and difficult problem in GWA studies.

17.
We apply a high-throughput protocol of chip-based mass spectrometry (matrix-assisted laser desorption/ionization time-of-flight; MALDI-TOF) as a method of screening for differences in single-nucleotide polymorphism (SNP) allele frequencies. Using pooled DNA from individuals with asthma, Crohn's disease (CD), schizophrenia, type 1 diabetes (T1D), and controls, we selected 534 SNPs from an initial set of 1435 SNPs spanning a 25-Mb region on chromosome 6p21. The standard deviations of measurements of time of flight at different dots, from different PCRs, and from different pools indicate reliable results at each analysis step. In 90% of the disease-control comparisons we found allelic differences of <10%. In the T1D samples, which served as a positive control, 10 SNPs with significant differences were observed after taking into account multiple testing. Of these 10 SNPs, 5 are located between DQB1 and DRB1, confirming the known association with the DR3 and DR4 haplotypes, whereas two additional SNPs also reproduced known associations of T1D with DOB and LTA. In the CD pool, two previously described associations were also found, with SNPs close to DRB1 and MICA. Additional associations were found in the schizophrenia and asthma pools. They should be confirmed in individual samples or can be used to develop further quality criteria for accepting true differences between pools. The determination of SNP allele frequencies in pooled DNA appears to be of value in assigning further genotyping priorities, even in large linkage regions.

18.
We applied a recently developed multilocus association testing method (localized haplotype clustering) to Wellcome Trust Case Control Consortium data (14,000 cases of seven common diseases and 3,000 shared controls genotyped on the Affymetrix 500 K array). After rigorous data quality filtering, we identified three disease-associated loci with strong statistical support from localized haplotype cluster tests but with only marginal significance in single marker tests. These loci are chromosomes 10p15.1 with type 1 diabetes (p = 5.1 × 10⁻⁹), 12q15 with type 2 diabetes (p = 1.9 × 10⁻⁷) and 15q26.2 with hypertension (p = 2.8 × 10⁻⁸). We also detected the association of chromosome 9p21.3 with type 2 diabetes (p = 2.8 × 10⁻⁸), although this locus did not pass our stringent genotype quality filters. The associations of 10p15.1 with type 1 diabetes and 9p21.3 with type 2 diabetes have both been replicated in other studies using independent data sets. Overall, localized haplotype cluster analysis had better success detecting disease-associated variants than a previous single-marker analysis of imputed HapMap SNPs. We found that stringent application of quality score thresholds to genotype data substantially reduced false-positive results arising from genotype error. In addition, we demonstrate that it is possible to simultaneously phase 16,000 individuals genotyped on genome-wide data (450 K markers) using the Beagle software package.

19.
A combined linkage-physical map of the human genome   Total citations: 18 (self-citations: 0; citations by others: 18)
We have constructed de novo a high-resolution genetic map that includes the largest set, to our knowledge, of polymorphic markers (N=14,759) for which genotype data are publicly available; that combines genotype data from both the Centre d'Etude du Polymorphisme Humain (CEPH) and deCODE pedigrees; that incorporates single-nucleotide polymorphisms; and that also incorporates sequence-based positional information. The position of all markers on our map is corroborated by both genomic sequence and recombination-based data. This specific combination of features maximizes marker inclusion, coverage, and resolution, making this map uniquely suitable as a comprehensive resource for determining genetic map information (order and distances) for any large set of polymorphic markers.

20.
The presence of systemic lupus erythematosus (SLE) susceptibility genes on chromosome 20 is suggested by the observation of genetic linkage in several independent SLE family collections. To further localize the genetic effects, we typed 59 microsatellites in the two best regions, as defined by genome screens. Genotypes were analyzed for statistical linkage and/or association with SLE, by use of a combination of nonparametric linkage methods, family-based tests of association (transmission/disequilibrium and pedigree disequilibrium tests), and haplotype-sharing statistics (haplotype runs test), in a set of 230 SLE pedigrees. Maximal evidence for linkage to SLE was to 20p12 (LOD = 2.84) and 20q13.1 (LOD = 1.64) in the white pedigrees. Subsetting families on the basis of evidence for linkage to 16q12 significantly improved the LOD scores at both chromosome 20 locations (20p12 LOD = 5.06 and 20q13 LOD = 3.65), consistent with epistasis. We then typed 162 single-nucleotide polymorphism markers across a 1.3-Mb candidate region on 20q13.1 and identified several SNPs that demonstrated significant evidence for association. These data provide additional support for linkage and association to 20p12 and 20q13.1 in SLE and further refine the intervals of interest. These data further suggest the possibility of epistatic relationships among loci within the 20q12, 20q13, and 16q12 regions in SLE families.
