1.
Haplotype block structure is conserved across mammals (total citations: 2; self-citations: 0; citations by others: 2)
Genetic variation in genomes is organized in haplotype blocks, and species-specific block structure is defined by differential contribution of population history effects in combination with mutation and recombination events. Haplotype maps characterize the common patterns of linkage disequilibrium in populations and have important applications in the design and interpretation of genetic experiments. Although evolutionary processes are known to drive the selection of individual polymorphisms, their effect on haplotype block structure dynamics has not been shown. Here, we present a high-resolution haplotype map for a 5-megabase genomic region in the rat and compare it with the orthologous human and mouse segments. Although the size and fine structure of haplotype blocks are species dependent, there is a significant interspecies overlap in structure and a tendency for blocks to encompass complete genes. Extending these findings to the complete human genome using haplotype map phase I data reveals that linkage disequilibrium values are significantly higher for equally spaced positions in genic regions, including promoters, as compared to intergenic regions, indicating that a selective mechanism exists to maintain combinations of alleles within potentially interacting coding and regulatory regions. Although this characteristic may complicate the identification of causal polymorphisms underlying phenotypic traits, conservation of haplotype structure may be employed for the identification and characterization of functionally important genomic regions.
2.
OBJECTIVES: Discrete blocks of low haplotype diversity exist within the human genome. The non-redundant subset of 'haplotype tagging' single nucleotide polymorphisms (htSNPs) in such blocks can distinguish a majority of the haplotypes. Several approaches have been proposed to determine htSNPs, ranging from visual inspection to formal analytic procedures. Optimal htSNPs can be estimated using a small subgroup of an association study population that have been genotyped for a dense SNP map, and it is just these htSNPs that are genotyped in the remainder of the samples. We investigated by simulation how the size of the subsample affects the power of association studies, and what type of subjects it should include. METHODS: We used the program tagSNPs [Stram et al., Hum Hered 2003;55:27-36], which selects htSNPs to minimize the uncertainty in predicting common haplotypes for individuals with unphased genotype data. RESULTS: On average, 27% of the SNPs were designated as htSNPs. Genotyping as few as 25 unphased individuals to select the htSNPs did not appear to reduce the power of an association study, as compared with using all SNPs. For the disease models considered, selecting htSNPs based on cases, controls, or a mixture of both gave similar results. CONCLUSIONS: These results suggest that the genotyping effort in an association study can be substantially reduced with little loss of power by identifying htSNPs in a small subsample of individuals.
3.
Haplotype diversity and the block structure of linkage disequilibrium (total citations: 11; self-citations: 0; citations by others: 11)
Stumpf MP. Trends in genetics : TIG, 2002, 18(5):226-228
Several recent studies indicate that patterns of linkage disequilibrium in the human genome cannot be reconciled with a uniform distribution of recombination events, but crossovers appear to be localized in short hot-spots that separate longer stretches of DNA. Markers within these low-recombination blocks show increased levels of linkage disequilibrium and very low haplotype diversity. This could simplify study of the genetic basis of complex diseases if causal variants are common.
4.
Haplotype block partition with limited resources and applications to human chromosome 21 haplotype data (total citations: 13; self-citations: 0; citations by others: 13)
Recent studies have shown that the human genome has a haplotype block structure such that it can be decomposed into large blocks with high linkage disequilibrium (LD) and relatively limited haplotype diversity, separated by short regions of low LD. One of the practical implications of this observation is that only a small fraction of all the single-nucleotide polymorphisms (SNPs) (referred to as "tag SNPs") can be chosen for mapping genes responsible for human complex diseases, which can significantly reduce genotyping effort, without much loss of power. Algorithms have been developed to partition haplotypes into blocks with the minimum number of tag SNPs for an entire chromosome. In practice, investigators may have limited resources, and only a certain number of SNPs can be genotyped. In the present article, we first formulate this problem as finding a block partition with a fixed number of tag SNPs that can cover the maximal percentage of the whole genome, and we then develop two dynamic programming algorithms to solve this problem. The algorithms are sufficiently flexible to permit knowledge of functional polymorphisms to be considered. We apply the algorithms to a data set of SNPs on human chromosome 21, combining the information of coding and noncoding regions. We study the density of SNPs in intergenic regions, introns, and exons, and we find that the SNP density in intergenic regions is similar to that in introns and is higher than that in exons, results that are consistent with previous studies. We also calculate the distribution of block break points in intergenic regions, genes, exons, and coding regions and do not find any significant differences.
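The fixed-budget formulation in this abstract (cover as much of the region as possible with a limited number of tag SNPs) can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not a reproduction of the two algorithms the authors developed: the callables `block_cost` and `block_len`, the function name `max_coverage`, and the toy block table are all hypothetical, and SNPs left outside every chosen block are simply counted as uncovered.

```python
from typing import Callable, Optional


def max_coverage(
    n_snps: int,
    budget: int,
    block_cost: Callable[[int, int], Optional[int]],
    block_len: Callable[[int, int], int],
) -> int:
    """Largest total length coverable by disjoint blocks using at most `budget` tag SNPs.

    block_cost(i, j) returns the number of tag SNPs needed for the block made of
    SNPs i..j-1, or None if those SNPs do not form an acceptable block.
    (Both callables are illustrative assumptions, not part of the cited work.)
    """
    # best[j][k]: best coverage over the first j SNPs with at most k tag SNPs.
    best = [[0] * (budget + 1) for _ in range(n_snps + 1)]
    for j in range(1, n_snps + 1):
        for k in range(budget + 1):
            value = best[j - 1][k]  # option 1: SNP j-1 stays outside any block
            for i in range(j):      # option 2: a block i..j-1 ends at SNP j-1
                cost = block_cost(i, j)
                if cost is not None and cost <= k:
                    value = max(value, best[i][k - cost] + block_len(i, j))
            best[j][k] = value
    return best[n_snps][budget]


# Hypothetical toy input: 8 SNPs and four acceptable blocks with their tag SNP counts.
TOY_BLOCKS = {(0, 3): 1, (3, 5): 2, (0, 5): 3, (5, 8): 1}


def toy_cost(i: int, j: int) -> Optional[int]:
    return TOY_BLOCKS.get((i, j))


def toy_len(i: int, j: int) -> int:
    return j - i  # length measured in SNPs; base pairs would work the same way


print([max_coverage(8, k, toy_cost, toy_len) for k in range(5)])  # [0, 3, 6, 6, 8]
```

In the toy table, a budget of 2 tag SNPs is best spent on the two cheap blocks (0..2 and 5..7), and a third tag SNP buys nothing until a fourth makes the middle block affordable. The table has O(n·K) cells and each cell scans O(n) block starts, so the sketch runs in O(n²·K) time for n SNPs and a budget of K.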
5.
In a two stage genome-wide association study (2S-GWAS), a sample of cases and controls is allocated into two groups, and genetic markers are analyzed sequentially with respect to these groups. For such studies, experimental design considerations have primarily focused on minimizing study cost as a function of the allocation of cases and controls to stages, subject to a constraint on the power to detect an associated marker. However, most treatments of this problem implicitly restrict the set of feasible designs to only those that allocate the same proportions of cases and controls to each stage. In this paper, we demonstrate that removing this restriction can improve the cost advantages demonstrated by previous 2S-GWAS designs by up to 40%. Additionally, we consider designs that maximize study power with respect to a cost constraint, and show that recalculated power maximizing designs can recover a substantial amount of the planned study power that might otherwise be lost if study funding is reduced. We provide open source software for calculating cost minimizing or power maximizing 2S-GWAS designs.
6.
Two-stage designs for gene-disease association studies (total citations: 2; self-citations: 0; citations by others: 2)
The goal of this article is to describe a two-stage design that maximizes the power to detect gene-disease associations when the principal design constraint is the total cost, represented by the total number of gene evaluations rather than the total number of individuals. In the first stage, all genes of interest are evaluated on a subset of individuals. The most promising genes are then evaluated on additional subjects in the second stage. This will eliminate wastage of resources on genes unlikely to be associated with disease based on the results of the first stage. We consider the case where the genes are correlated and the case where the genes are independent. Using simulation results, it is shown that, as a general guideline when the genes are independent or when the correlation is small, utilizing 75% of the resources in stage 1 to screen all the markers and evaluating the most promising 10% of the markers with the remaining resources provides near-optimal power for a broad range of parametric configurations. This translates to screening all the markers on approximately one quarter of the required sample size in stage 1.
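The closing arithmetic of this abstract (75% of the budget in stage 1, top 10% of markers in stage 2, about a quarter of the subjects in stage 1) can be checked with a few lines. This is a sketch under one assumed reading: cost is counted as subject-by-marker evaluations, stage 2 uses additional subjects, and "required sample size" means stage 1 plus stage 2 subjects. The function name and the example numbers are illustrative, not taken from the article.

```python
def two_stage_allocation(total_evaluations: float, n_markers: int,
                         stage1_share: float = 0.75,
                         carry_fraction: float = 0.10):
    """Split a budget of marker evaluations across two stages.

    Returns (stage 1 subjects, stage 2 subjects, stage 1 share of all subjects).
    Assumes every stage 1 subject is typed at all markers and every stage 2
    subject is typed only at the markers carried forward.
    """
    stage1_subjects = stage1_share * total_evaluations / n_markers
    stage2_markers = carry_fraction * n_markers
    stage2_subjects = (1.0 - stage1_share) * total_evaluations / stage2_markers
    share = stage1_subjects / (stage1_subjects + stage2_subjects)
    return stage1_subjects, stage2_subjects, share


# Illustrative numbers only: 1,000 markers and a budget of 1,000,000 evaluations.
n1, n2, share = two_stage_allocation(1_000_000, 1_000)
print(n1, n2, round(share, 3))
```

With these inputs, stage 1 types 750 subjects at all 1,000 markers and stage 2 types 2,500 further subjects at 100 markers, so stage 1 holds 750 of 3,250 subjects, roughly 23%. Under the stated assumptions that matches the "approximately one quarter" in the abstract.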
7.
Association mapping has successfully identified common SNPs associated with many diseases. However, the inability of this class of variation to account for most of the supposed heritability has led to a renewed interest in methods - primarily linkage analysis - to detect rare variants. Family designs allow for control of population stratification, investigations of questions such as parent-of-origin effects and other applications that are imperfectly or not readily addressed in case-control association studies. This article guides readers through the interface between linkage and association analysis, reviews the new methodologies and provides useful guidelines for applications. Just as effective SNP-genotyping tools helped to realize the potential of association studies, next-generation sequencing tools will benefit genetic studies by improving the power of family-based approaches.
8.
We consider the effect of informative missingness on association tests that use parental genotypes as controls and that allow for missing parental data. Parental data can be informatively missing when the probability of a parent being available for study is related to that parent's genotype; when this occurs, the distribution of genotypes among observed parents is not representative of the distribution of genotypes among the missing parents. Many previously proposed procedures that allow for missing parental data assume that these distributions are the same. We propose association tests that behave well when parental data are informatively missing, under the assumption that, for a given trio of paternal, maternal, and affected offspring genotypes, the genotypes of the parents and the sex of the missing parents, but not the genotype of the affected offspring, can affect parental missingness. (This same assumption is required for validity of an analysis that ignores incomplete parent-offspring trios.) We use simulations to compare our approach with previously proposed procedures, and we show that if even small amounts of informative missingness are not taken into account, they can have large, deleterious effects on the performance of tests.
9.
10.
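With advances in sequencing technology and the steady accumulation of whole-genome sequences, genome-wide association studies (GWAS) have achieved fruitful results in research on human complex diseases, identifying tens of thousands of disease risk factors in little more than a decade. GWAS likewise offers a new tool for exploring the genetic mechanisms underlying bacterial phenotypes. Since the first bacterial GWAS (BGWAS) was published in 2013, more than 10 such studies have been reported, revealing the genetic mechanisms of phenotypes such as bacterial host adaptation, drug resistance, and virulence, and greatly deepening our understanding of bacterial genetics, evolution, and transmission. This article summarizes the current methods, applications, and open problems of BGWAS and discusses its prospects, with the aim of providing a reference for BGWAS research in microbiology.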
11.
In family-based data, association information can be partitioned into the between-family information and the within-family information. Based on this observation, Steen et al. (Nature Genetics. 2005, 683-691) proposed an interesting two-stage test for genome-wide association (GWA) studies under family-based designs which performs genomic screening and replication using the same data set. In the first stage, a screening test based on the between-family information is used to select markers. In the second stage, an association test based on the within-family information is used to test association at the selected markers. However, we learn from the results of case-control studies (Skol et al. Nature Genetics. 2006, 209-213) that this two-stage approach may not be optimal. In this article, we propose a novel two-stage joint analysis for GWA studies under family-based designs. For this joint analysis, we first propose a new screening test that is based on the between-family information and is robust to population stratification. This new screening test is used in the first stage to select markers. Then, a joint test that combines the between-family information and within-family information is used in the second stage to test association at the selected markers. By extensive simulation studies, we demonstrate that the joint analysis always results in increased power to detect genetic association and is robust to population stratification.
12.
One way to perform linkage-disequilibrium (LD) mapping of genetic traits is to use single markers. Since dense marker maps, such as single-nucleotide polymorphism and high-resolution microsatellite maps, are available, it is natural and practical to generalize single-marker LD mapping to high-resolution haplotype or multiple-marker LD mapping. This article investigates high-resolution LD-mapping methods, for complex diseases, based on haplotype maps or microsatellite marker maps. The objective is to explore test statistics that combine information from haplotype blocks or multiple markers. Based on two coding methods, genotype coding and haplotype coding, Hotelling's T² statistics T_G and T_H are proposed to test the association between a disease locus and two haplotype blocks or two markers. The validity of the two T² statistics is proved by theoretical calculations. A statistic T_C, an extension of the traditional χ² method of comparing haplotype frequencies, is introduced by simply adding the χ² test statistics of the two haplotype blocks together. The merit of the three methods is explored by calculation and comparison of power and of type I errors. In the presence of LD between the two blocks, the type I error of T_C is higher than that of T_H and T_G, since T_C ignores the correlation between the two blocks. For each of the three statistics, the power of using two haplotype blocks is higher than that of using only one haplotype block. By power comparison, we notice that T_C has higher power than that of T_H, and T_H has higher power than that of T_G. In the absence of LD between the two blocks, the power of T_C is similar to that of T_H and higher than that of T_G. Hence, we advocate use of T_H in the data analysis. In the presence of LD between the two blocks, T_H takes into account the correlation between the two haplotype blocks and has a lower type I error and higher power than T_G. Besides, the feasibility of the methods is shown by sample-size calculation.
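To make the Hotelling's T² idea in this abstract concrete, the sketch below computes the textbook two-sample T² for a pair of markers, comparing mean genotype vectors of cases and controls with a pooled covariance matrix. It is an illustration under assumptions, not the article's exact statistics: each subject is coded as a pair of allele counts (0, 1, or 2) at two markers, the function name and data are hypothetical, and the 2×2 matrix inverse is written out by hand so that no external library is needed.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def hotelling_t2(cases: Sequence[Pair], controls: Sequence[Pair]) -> float:
    """Two-sample Hotelling's T-squared for two-dimensional observations."""

    def mean(rows: Sequence[Pair]) -> List[float]:
        n = len(rows)
        return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

    def scatter(rows: Sequence[Pair], m: List[float]) -> Tuple[float, float, float]:
        # Sums of squares and cross-products around the group mean.
        sxx = sum((r[0] - m[0]) ** 2 for r in rows)
        sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
        syy = sum((r[1] - m[1]) ** 2 for r in rows)
        return sxx, sxy, syy

    n1, n2 = len(cases), len(controls)
    m1, m2 = mean(cases), mean(controls)
    a1, b1, c1 = scatter(cases, m1)
    a2, b2, c2 = scatter(controls, m2)

    # Pooled covariance matrix [[sxx, sxy], [sxy, syy]].
    dof = n1 + n2 - 2
    sxx, sxy, syy = (a1 + a2) / dof, (b1 + b2) / dof, (c1 + c2) / dof
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")

    # Quadratic form d' S^{-1} d using the closed-form 2x2 inverse.
    d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
    quad = (syy * d0 * d0 - 2.0 * sxy * d0 * d1 + sxx * d1 * d1) / det
    return n1 * n2 / (n1 + n2) * quad


# Hypothetical toy genotypes (allele counts at marker 1 and marker 2).
cases = [(0, 0), (2, 2), (0, 2), (2, 0)]
controls = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(hotelling_t2(cases, controls))
```

Because the pooled covariance enters through its inverse, correlation between the two markers is accounted for in the statistic. Summing two separate per-marker statistics would not do this, which is the contrast the abstract draws between T_C and the T² statistics.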
13.
Study cost remains the major limiting factor for genome-wide association studies due to the necessity of genotyping a large number of SNPs for a large number of subjects. Both DNA pooling strategies and two-stage designs have been proposed to reduce genotyping costs. In this study, we propose a cost-effective, two-stage approach with a DNA pooling strategy. During stage I, all markers are evaluated on a subset of individuals using DNA pooling. The most promising set of markers is then evaluated with individual genotyping for all individuals during stage II. The goal is to determine the optimal parameters (pi(p)(sample), the proportion of samples used during stage I with DNA pooling; and pi(p)(marker), the proportion of markers evaluated during stage II with individual genotyping) that minimize the cost of a two-stage DNA pooling design while maintaining a desired overall significance level and achieving a level of power similar to that of a one-stage individual genotyping design. We considered the effects of three factors on optimal two-stage DNA pooling designs. Our results suggest that, under most scenarios considered, the optimal two-stage DNA pooling design may be much more cost-effective than the optimal two-stage individual genotyping design, which uses individual genotyping during both stages.
14.
Longmate JA. American journal of human genetics, 2001, 68(5):1229-1237
A general method is described for estimation of the power and sample size of studies relating a dichotomous phenotype to multiple interacting loci and environmental covariates. Either a simple case-control design or more complex stratified sampling may be used. The method can be used to design individual studies, to evaluate the power of alternative test statistics for complex traits, and to examine general questions of study design through explicit scenarios. The method is used here to study how the power of association tests is affected by problems of allelic heterogeneity and to investigate the potential role for collective testing of sets of related candidate genes in the presence of locus heterogeneity. The results indicate that allele-discovery efforts are crucial and that omnibus tests or collective testing of alleles can be substantially more powerful than separate testing of individual allelic variants. Joint testing of multiple candidate loci can also dramatically improve power, despite model misspecification and inclusion of irrelevant loci, but requires an a priori hypothesis defining the set of loci to investigate.
15.
The ability of genomewide association studies to decipher genetic traits is driven in part by how well the measured single-nucleotide polymorphisms "cover" the unmeasured causal variants. Estimates of coverage based on standard linkage-disequilibrium measures, such as the average maximum squared correlation coefficient (r²), can lead to inaccurate and inflated estimates of the power of genomewide association studies. In contrast, use of the "cumulative r² adjusted power" measure presented here gives more-accurate estimates of power for genomewide association studies.
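For readers unfamiliar with the coverage measure this abstract criticizes, the sketch below computes the squared correlation coefficient between two biallelic sites from phased haplotypes, and the maximum of that value between an unmeasured variant and a set of measured markers. It is a minimal illustration assuming haplotypes coded as 0/1 alleles. The function names and toy data are hypothetical, and the article's "cumulative r² adjusted power" measure is not reproduced here because the abstract does not define it.

```python
from typing import Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared correlation between two biallelic sites over phased haplotypes (0/1 coding)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty allele vectors of equal length")
    n = len(a)
    p_a = sum(a) / n                             # frequency of allele 1 at site A
    p_b = sum(b) / n                             # frequency of allele 1 at site B
    p_ab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                         # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def max_r_squared(causal: Sequence[int], markers: Sequence[Sequence[int]]) -> float:
    """Best single-marker r-squared between an unmeasured variant and the measured markers."""
    return max(r_squared(causal, m) for m in markers)


# Hypothetical toy haplotypes: one unmeasured variant and three genotyped markers.
causal = [0, 0, 1, 1]
markers = [[0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0]]
print([r_squared(causal, m) for m in markers], max_r_squared(causal, markers))
```

Averaging `max_r_squared` over many variants gives the "average maximum r²" summary mentioned in the abstract. That summary compresses a whole distribution of per-variant values into one number, which is one way to see why it can misstate power.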
16.
Cupples LA. Current opinion in lipidology, 2008, 19(2):144-150
PURPOSE OF REVIEW: The past year has seen the publication of many genome-wide association studies, most of which are case-control studies. These publications are at the forefront of current research into the examination of genetic effects for numerous diseases, including diabetes, heart disease and cancer. Over the past 25 years the tour de force of genetics research has been in family studies, using segregation, linkage and association analyses. Are these approaches now passé? Here we discuss the role of family studies in modern genetics research, using results from the Framingham Heart Study as examples. RECENT FINDINGS: Family studies permit both linkage and association analyses. Importantly, family-based association tests that consider transmission of genetic variants within a family provide important information on the genetic etiology of disease traits and avoid the potential of false-positive findings due to population substructure. SUMMARY: Family-based study designs continue to contribute much to the modern era of genome-wide association studies.
17.
Gu CC, Chang YP, Hunt SC, Schwander K, Arnett D, Djousse L, Heiss G, Oberman A, Lalouel JM, Province M, Chakravarti A, Rao DC. Human heredity, 2005, 60(3):164-176
OBJECTIVE: Function of the renin-angiotensin system is important to human hypertension, but its genetic etiology remains elusive. We set out to examine a hypothesis that multiple genetic variants in the system act together in blood pressure regulation, via intermediate phenotypes such as blood pressure reactivity. METHODS: A sample of 531 hypertensive cases and 417 controls was selected from the HyperGEN study. Hypertension-related traits including blood pressure responses to challenges to math test, handgrip and postural change (mathBP, gripBP, and postBP), and body mass index (BMI) were analyzed for association with 10 single nucleotide polymorphisms (SNPs) in the angiotensinogen (AGT) gene. Single-marker and haplotype analyses were performed to examine the effects of both individual and multiple variants. Multiple-trait profiling was used to assess interaction of latent intermediate factors with susceptible haplotypes. RESULTS: In Blacks, two SNPs in exon 5 and 3'UTR showed significant association with gripBP, and two promoter SNPs were strongly associated with postBP. In Whites, only borderline association was found for 2 promoter SNPs with mathBP. Haplotype analyses in Blacks confirmed association with gripBP, and detected significant association of a haplotype to BMI (p=0.029). With the interactions modeled, haplotype associations found in Blacks remain significant, while significant associations to BMI (p=0.009) and gripSBP emerged in Whites. CONCLUSION: Genetic variants in regulatory regions of AGT showed strong association with blood pressure reactivity. Interaction of promoter and genic SNPs in AGT revealed collective action of multiple variants on blood pressure reactivity and BMI both in Blacks and in Whites, possibly following different pathways.
18.
Background
Genomewide association studies have resulted in a great many genomic regions that are likely to harbor disease genes. Thorough interrogation of these specific regions is the logical next step, including regional haplotype studies to identify risk haplotypes upon which the underlying critical variants lie. Pedigrees ascertained for disease can be powerful for genetic analysis due to the cases being enriched for genetic disease. Here we present a Monte Carlo based method to perform haplotype association analysis. Our method, hapMC, allows for the analysis of full-length and sub-haplotypes, including imputation of missing data, in resources of nuclear families, general pedigrees, case-control data or mixtures thereof. Both traditional association statistics and transmission/disequilibrium statistics can be performed. The method includes a phasing algorithm that can be used in large pedigrees and optional use of pseudocontrols.
19.
Jing Li. Journal of computational biology, 2008, 15(3):241-257
Large-scale whole genome association studies are increasingly common, due in large part to recent advances in genotyping technology. With this change in paradigm for genetic studies of complex diseases, it is vital to develop valid, powerful, and efficient statistical tools and approaches to evaluate such data. Despite a dramatic drop in genotyping costs, it is still expensive to genotype thousands of individuals for hundreds of thousands of single nucleotide polymorphisms (SNPs) for large-scale whole genome association studies. A multi-stage (or two-stage) design has been a promising alternative: in the first stage, only a fraction of samples are genotyped and tested using a dense set of SNPs, and only a small subset of markers that show moderate associations with the disease will be genotyped in later stages. Multi-stage designs have also been used in candidate gene association studies, usually in regions that have shown strong signals by linkage studies. To decide which set of SNPs to genotype in the next stage, a common practice is to utilize a simple test (such as a χ² test for case-control data) and a liberal significance level without corrections for multiple testing, to ensure that no true signals will be filtered out. In this paper, I have developed a novel SNP selection procedure within the framework of multi-stage designs. Based on data from stage 1, the method explicitly explores correlations (linkage disequilibrium) among SNPs and their possible interactions in determining the disease phenotype. Compared with a regular multi-stage design, the approach can select a much reduced set of SNPs with high discriminative power for later stages. Therefore, not only does it reduce the genotyping cost in later stages, it also increases the statistical power by reducing the number of tests. Combined analysis is proposed to further improve power, and the theoretical significance level of the combined statistic is derived.
Extensive simulations have been performed, and results have shown that the procedure can reduce the number of SNPs required in later stages, with improved power to detect associations. The procedure has also been applied to a real data set from a genome-wide association study of the sporadic amyotrophic lateral sclerosis (ALS) disease, and an interesting set of candidate SNPs has been identified.
20.
In the rapidly growing field of association mapping in plants, the use of (marker) haplotypes rather than single markers can be an effective way of improving detection power. Here, we highlight the information that can be obtained from deducing the historical relationships between haplotypes. The ordering of haplotype classes according to deduced historical relationships should further enhance association detection power, but can also be used to predict the genotypic and phenotypic values of unobserved germplasm.