Similar articles
1.
OBJECTIVES: Discrete blocks of low haplotype diversity exist within the human genome. The non-redundant subset of 'haplotype tagging' single nucleotide polymorphisms (htSNPs) in such blocks can distinguish a majority of the haplotypes. Several approaches have been proposed to determine htSNPs, ranging from visual inspection to formal analytic procedures. Optimal htSNPs can be estimated using a small subgroup of an association study population that has been genotyped for a dense SNP map; only these htSNPs are then genotyped in the remainder of the samples. We investigated by simulation how the size of the subsample affects the power of association studies, and what type of subjects it should include. METHODS: We used the program tagSNPs [Stram et al., Hum Hered 2003;55:27-36], which selects htSNPs to minimize the uncertainty in predicting common haplotypes for individuals with unphased genotype data. RESULTS: On average, 27% of the SNPs were designated as htSNPs. Genotyping as few as 25 unphased individuals to select the htSNPs did not appear to reduce the power of an association study, as compared with using all SNPs. For the disease models considered, selecting htSNPs based on cases, controls, or a mixture of both gave similar results. CONCLUSIONS: These results suggest that the genotyping effort in an association study can be substantially reduced with little loss of power by identifying htSNPs in a small subsample of individuals.

2.
The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred to as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision.

3.
To optimize the strategies for population-based pharmacogenetic studies, we extensively analyzed single-nucleotide polymorphisms (SNPs) and haplotypes in 199 drug-related genes, through use of 4,190 SNPs in 752 control subjects. Drug-related genes, like other genes, have a haplotype-block structure, and a few haplotype-tagging SNPs (htSNPs) could represent most of the major haplotypes constructed with common SNPs in a block. Because our data included 860 uncommon (frequency <0.1) SNPs with frequencies that were accurately estimated, we analyzed the relationship between haplotypes and uncommon SNPs within the blocks (549 SNPs). We inferred haplotype frequencies through use of the data from all htSNPs and one of the uncommon SNPs within a block and calculated four joint probabilities for the haplotypes. We show that, irrespective of the minor-allele frequency of an uncommon SNP, the majority (mean ± SD frequency 0.943 ± 0.117) of the minor alleles were assigned to a single haplotype tagged by htSNPs if the uncommon SNP was within the block. These results support the hypothesis that recombinations occur only infrequently within blocks. The proportion of a single haplotype tagged by htSNPs to which the minor alleles of an uncommon SNP were assigned was positively correlated with the minor-allele frequency when the frequency was <0.03 (P<.000001; n=233 [Spearman's rank correlation coefficient]). The results of simulation studies suggested that haplotype analysis using htSNPs may be useful in the detection of uncommon SNPs associated with phenotypes if the frequencies of the SNPs are higher in affected than in control populations, the SNPs are within the blocks, and the frequencies of the SNPs are >0.03.

4.

Background  

There has recently been great interest in haplotype block structure and haplotype tagging SNPs (htSNPs) in the human genome because of their implications for htSNP-based association mapping strategies for complex disease. Different definitions have been used to characterize the haplotype block structure in the human genome, and several different performance criteria and algorithms have been suggested for htSNP selection.

5.
6.
Many investigators are now using haplotype-tagging single-nucleotide polymorphisms (htSNPs) as a way of screening regions of the genome for association with disease. A common approach is to genotype htSNPs in a study population and to use this information to draw inferences about each individual's haplotypic makeup, including SNPs that were not directly genotyped. To test the validity of this approach, we simulated the exercise of typing htSNPs in a large sample of individuals and compared the true and inferred haplotypes. The accuracy of haplotype inference varied, depending on the method of selecting htSNPs, the linkage-disequilibrium structure of the region, and the amount of missing data. At the stage of selection of htSNPs, haplotype-block-based methods required a larger number of htSNPs than did unstructured methods but gave lower levels of error in haplotype inference, particularly when there was a significant amount of missing data. We present a Web-based utility that allows investigators to compare the likely error rates of different sets of htSNPs and to arrive at an economical set of htSNPs that provides acceptable levels of accuracy in haplotype inference.

7.

Background

The selection of markers in association studies can be informed through the use of haplotype blocks. Recent reports have determined the genomic architecture of chromosomal segments through different haplotype block definitions based on linkage disequilibrium (LD) measures or haplotype diversity criteria. The relative applicability of distinct block definitions to association studies, however, remains unclear. We compared different block definitions in 6.1 Mb of chromosome 17q in 189 unrelated healthy individuals. Using 137 single nucleotide polymorphisms (SNPs), at a median spacing of 15.5 kb, we constructed haplotype block maps using published methods and additional methods we have developed. Haplotype tagging SNPs (htSNPs) were identified for each map.

Results

Blocks were found to be shorter and coverage of the region limited with methods based on LD measures, compared to the method based on haplotype diversity. Although the distribution of blocks was highly variable, the number of SNPs that needed to be typed in order to capture the maximum number of haplotypes was consistent.

Conclusion

For the marker spacing used in this study, choice of block definition is not important when used as an initial screen of the region to identify htSNPs. However, choice of block definition has consequences for the downstream interpretation of association study results.

8.
Liu W, Zhao W, Chase GA. Human Heredity 2006;61(1):31-44
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes or genotyping errors on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program showed a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, in the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.
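For illustration only, LD-based tag SNP selection of the general kind evaluated in this abstract can be sketched as a greedy cover: repeatedly pick the SNP that captures the most still-untagged SNPs at a chosen r² threshold. This is a minimal sketch and not a reimplementation of ldSelect, htstep, or tagsnp.exe. The function names, the 0/1 allele encoding, the use of phased haplotypes as input, and the 0.8 threshold are all assumptions made for the example.

```python
def r_squared(x, y):
    """r^2 between two biallelic SNPs, each given as a list of 0/1 alleles
    observed on the same set of phased haplotypes (assumed input format)."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy tag selection: each chosen tag covers itself and every SNP
    whose r^2 with it meets the (hypothetical) threshold."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covers = [
        {j for j in range(n_snps)
         if i == j or r_squared(columns[i], columns[j]) >= threshold}
        for i in range(n_snps)
    ]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Ties are broken in favour of the lowest SNP index.
        best = max(sorted(untagged), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data: SNPs 0 and 1 are perfectly correlated, as are SNPs 2 and 3.
toy_haplotypes = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
tags = greedy_tag_snps(toy_haplotypes)
```

On the toy data two tags suffice, one per correlated pair. With real genotype data, a genotyping error at a single SNP can lower its r² with its neighbours below the threshold and so force an extra tag, which is consistent with the increase in tag counts the abstract reports under genotyping error.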

9.
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome with some regions of high LD interspersed with regions of low LD. Such LD patterns make it possible to select a set of single nucleotide polymorphisms (SNPs; tag SNPs) for genome-wide association studies. We have developed a suite of computer programs to analyze the block-like LD patterns and to select the corresponding tag SNPs. Compared to other programs for haplotype block partitioning and tag SNP selection, our program has several notable features. First, the dynamic programming algorithms implemented are guaranteed to find the block partition with the minimum number of tag SNPs for the given criteria of blocks and tag SNPs. Second, both haplotype data and genotype data from unrelated individuals and/or from general pedigrees can be analyzed. Third, several existing measures/criteria for haplotype block partitioning and tag SNP selection have been implemented in the program. Finally, the programs provide flexibility to include specific SNPs (e.g. non-synonymous SNPs) as tag SNPs. AVAILABILITY: The HapBlock program and its supplemental documents can be downloaded from the website http://www.cmb.usc.edu/~msms/HapBlock.

10.
In the 'indirect' method of detecting genetic associations between a trait and a DNA variant, we type several markers in a gene or chromosome region of linkage disequilibrium. If there is association between markers and the trait, we presume the existence of one or more causal polymorphisms in the region. In order to obtain a sufficiently dense set of markers it will almost always be necessary to use single nucleotide polymorphisms (SNPs). Although there is an emerging literature on methods for choosing an optimal set of 'haplotype tag SNPs' (htSNPs) to detect association between a genetic region and a trait, less attention has been given to the problem of how such studies should be analysed when completed, and how the initial data that were used to select the htSNPs should be incorporated into the analysis. This paper discusses this problem for both population- and family-based association studies. The role of the R² measure of association between a causal locus and various methods of scoring of marker haplotypes is highlighted. In most cases, the simplest method of scoring (locus coding), which does not require phase resolution, is shown generally to be more powerful than scoring methods that include haplotype information. A new 'multi-locus TDT' is also proposed.

11.
Evaluating the patterns of linkage disequilibrium (LD) is important for association mapping studies as well as for studying the genomic architecture of the human genome (e.g., haplotype block structures). Commonly used bi-allelic pairwise measures for assessing LD between two loci, such as r² and D′, may not make full and efficient use of modern multilocus data. Though extended to multilocus scenarios, their performance is still questionable. Meanwhile, most existing measures for an entire multilocus region, such as normalized entropy difference, do not consider the existence of LD heterogeneity across the region under investigation. Additionally, these existing multilocus measures cannot handle distant regions where long-range LD patterns may exist. In this study, we proposed a novel multilocus LD measure based on mutual information theory. Our proposed measure describes the LD pattern between two chromosome regions, each of which may consist of multiple loci (including multi-allele loci). As such, the proposed measure can better characterize LD patterns between two arbitrary regions. As potential applications, we developed algorithms based on the proposed measure for partitioning haplotype blocks and for selecting haplotype tagging SNPs (htSNPs), which were helpful for follow-up association tests. The results on both simulated and empirical data showed that our LD measure had distinct advantages over pairwise and other multilocus measures. First, our measure was more robust and could comprehensively capture the LD information between neighboring as well as disjoint regions. Second, haplotype blocks were better described via our proposed measure. Furthermore, association tests with htSNPs from the proposed algorithm had improved power over tests on single markers and on haplotypes.
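As a reminder of what the pairwise measures named in this abstract compute, the following minimal sketch evaluates D, D′ and r² for two biallelic loci from one haplotype frequency and two allele frequencies, using the standard textbook definitions. It does not implement the mutual-information measure the abstract proposes, whose formula is not given here. The function name and argument names are assumptions chosen for the example.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Only three of the four possible haplotypes occur, so D' is 1,
# yet the allele frequencies differ and r^2 stays well below 1.
d, d_prime, r2 = pairwise_ld(p_ab=0.3, p_a=0.3, p_b=0.6)
```

The example shows why the two measures can disagree: D′ reaches 1 whenever one of the four haplotypes is absent, while r² reaches 1 only when the two loci are fully predictive of each other.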

12.

Background

Somatic alterations of cyclin-dependent kinase 2 (CDK2)-cyclin E complex have been shown to contribute to breast cancer (BC) development and progression. This study aimed to explore the effects of single nucleotide polymorphisms (SNPs) in CDK2 and CCNE1 (a gene encoding G1/S specific cyclin E1 protein, formerly called cyclin E) on BC risk, progression and survival in a Chinese Han population.

Methodology/Principal Findings

We genotyped 6 haplotype-tagging SNPs (htSNPs) of CCNE1 and 2 htSNPs of CDK2 in 1207 BC cases and 1207 age-matched controls among Chinese Han women, and then reconstructed haplotype blocks according to our genotyping data and the linkage disequilibrium status of these htSNPs. For CCNE1, the minor allele homozygotes of three htSNPs were associated with BC risk (rs3218035: adjusted odds ratio [aOR] = 3.35, 95% confidence interval [CI] = 1.69–6.67; rs3218038: aOR = 1.81, 95% CI = 1.22–2.70; rs3218042: aOR = 2.64, 95% CI = 1.31–5.34), and these three loci showed a dose-dependent increase in BC risk (P trend = 0.0001). Moreover, the 5-SNP haplotype CCGTC, which carried none of the minor alleles of the 3 at-risk SNPs, was associated with a favorable event-free survival (hazard ratio [HR] = 0.53, 95% CI = 0.32–0.90). Stratified analysis suggested that the minor-allele homozygote carriers of rs3218038 had a worse event-free survival among patients with aggressive tumours (in tumour size >2 cm group: HR = 2.06, 95% CI = 1.06–3.99; in positive lymph node metastasis group: HR = 2.41, 95% CI = 1.15–5.03; in stage II–IV group: HR = 2.03, 95% CI = 1.09–3.79). For CDK2, no significant association was found.

Conclusions/Significance

This study indicates that genetic variants in CCNE1 may contribute to BC risk and survival in the Chinese Han population. They may become molecular markers for individual evaluation of BC susceptibility and prognosis. Nevertheless, further validation studies are needed.

13.
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation amongst species. With genome-wide SNP discovery, many genome-wide association studies are likely to identify multiple genetic variants that are associated with complex diseases. However, genotyping all existing SNPs for a large number of samples is still challenging even though SNP arrays have been developed to facilitate the task. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome (tag SNP selection) for genome-wide association studies. These SNPs are usually chosen from haplotypes and called haplotype tag SNPs (htSNPs). Accordingly, the scale and cost of genotyping are expected to be largely reduced. We introduce binary particle swarm optimization (BPSO) with local search capability to improve the prediction accuracy of STAMPA. The proposed method does not rely on block partitioning of the genomic region, and consistently identified tag SNPs with higher prediction accuracy than either STAMPA or SVM/STSA. We compared the prediction accuracy and time complexity of BPSO to STAMPA and an SVM-based (SVM/STSA) method using publicly available data sets. Compared with STAMPA and SVM/STSA, BPSO effectively improved prediction accuracy for both smaller and larger data sets. These results demonstrate that the BPSO method selects tag SNPs with higher accuracy regardless of the scale of the data set. © 2009 American Institute of Chemical Engineers Biotechnol. Prog., 2010

14.
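β2-microglobulin (β2m), a subunit of MHC class I molecules, plays an important role in the immune system of fish. In this experiment, direct sequencing was used to screen 30 SNPs from the β2m genomic sequence of P0-generation Nile tilapia (Oreochromis niloticus): 1 SNP was located in the 5′UTR, 16 SNPs in exonic regions (15 nonsynonymous and 1 synonymous mutation sites), 9 SNPs in intronic regions, and 4 SNPs in the 3′UTR. The SNaPshot genotyping method was used to genotype 102 susceptible and 102 resistant individuals of the F1 generation, and the genetic parameters He, Ho, Ne and PIC of the SNPs in the Nile tilapia β2m gene sequence were calculated with the Popgen32 and PIC-CALC software, indicating that 7 SNPs in the susceptible group were moderately polymorphic (0.25 […] the genotype and allele frequencies in the two groups of the [F]1 generation, and their association with streptococcal resistance or susceptibility was analyzed. The results showed that the genotype and allele frequencies of 24 SNPs were significantly associated with resistance or susceptibility to Streptococcus agalactiae (P<0.05). Linkage disequilibrium analysis showed that the 30 SNPs formed 4 haplotype blocks and 14 haplotypes. Of these, 4 haplotypes were significantly associated with resistance to S. agalactiae (P<0.05), and 4 haplotypes were significantly associated with susceptibility (P<0.05). Tag SNP analysis found that the 4 SNPs in haplotype block 2 and the 13 SNPs in haplotype block 3 were in strong linkage with one another (r²>0.9), which means that we found 2 htSNPs in the β2m gene. The SNP loci and haplotypes associated with streptococcal resistance or susceptibility identified in this study have the potential to assist selective breeding of Nile tilapia strains resistant to streptococcal disease.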

15.
We have created a program that searches densely genotyped regions for associated non-contiguous haplotypes using a standard family based haplotype association test. This program was designed to expand upon the 'sliding window' methodologies commonly used for haplotype construction by allowing the association of subsets of single nucleotide polymorphisms (SNPs) to drive the construction of the haplotype. This strategy permits HaploBuild to construct more biologically relevant haplotypes that are not constrained by arbitrary length and contiguous orientation. Availability: http://snp.bumc.bu.edu.

16.
Single nucleotide polymorphisms (SNPs) are plentiful in most genomes and amenable to high throughput genotyping, but they are not yet popular for parentage or paternity analysis. The markers are bi-allelic, so individually they contain little information about parentage, and in nonmodel organisms the process of identifying large numbers of unlinked SNPs can be daunting. We explore the possibility of using blocks of between three and 26 linked SNPs as highly polymorphic molecular markers for reconstructing male genotypes in polyandrous organisms with moderate (five offspring) to large (25 offspring) clutches of offspring. Haplotypes are inferred for each block of linked SNPs using the programs Haplore and Phase 2.1. Each multi-SNP haplotype is then treated as a separate allele, producing a highly polymorphic, 'microsatellite-like' marker. A simulation study is performed using haplotype frequencies derived from empirical data sets from Drosophila melanogaster and Mus musculus populations. We find that the markers produced are competitive with microsatellite loci in terms of single parent exclusion probabilities, particularly when using six or more linked SNPs to form a haplotype. These markers contain only modest rates of missing data and genotyping or phasing errors and thus should be seriously considered as molecular markers for parentage analysis, particularly when the study is interested in the functional significance of polymorphisms across the genome.

17.
MOTIVATION: With the recent availability of large-scale data sets profiling single nucleotide polymorphisms (SNPs) and quantitative traits data across different human subpopulations, there has been much attention directed towards discovering patterns of genetic variation and their connection to gene regulation and the onset/progression of disease. While previous work has focused primarily on correlating individual SNP markers with gene expression and disease, it has been suggested that using haplotype blocks instead of individual markers can significantly increase statistical power. RESULTS: We present BlockMapper, a probabilistic generative model for genotype data and quantitative traits data, such as gene expression or phenotype measurements. BlockMapper discovers the block structure of genotype data and associates these inferred blocks to patterns of variation in quantitative traits data, whilst accounting for non-genetic factors. Our model achieves high accuracy for predicting Crohn's disease phenotype in Chromosome 5q31 and reveals novel cis-associations between two haplotype blocks in the ENm006 genomic region and GDI1, a gene implicated in X-linked mental retardation. Our results underscore the importance of accounting for the influence of large sets of SNPs on patterns of regulatory/phenotypic variation and represent a step towards an understanding of human genetic variation.

18.
Single-nucleotide polymorphisms (SNPs) are rapidly replacing microsatellites as the markers of choice for genetic linkage studies and many other studies of human pedigrees. Here, we describe an efficient approach for modeling linkage disequilibrium (LD) between markers during multipoint analysis of human pedigrees. Using a gene-counting algorithm suitable for pedigree data, our approach enables rapid estimation of allele and haplotype frequencies within clusters of tightly linked markers. In addition, with the use of a hidden Markov model, our approach allows for multipoint pedigree analysis with large numbers of SNP markers organized into clusters of markers in LD. Simulation results show that our approach resolves previously described biases in multipoint linkage analysis with SNPs that are in LD. An updated version of the freely available Merlin software package uses the approach described here to perform many common pedigree analyses, including haplotyping and haplotype frequency estimation, parametric and nonparametric multipoint linkage analysis of discrete traits, variance-components and regression-based analysis of quantitative traits, calculation of identity-by-descent or kinship coefficients, and case selection for follow-up association studies. To illustrate the possibilities, we examine a data set that provides evidence of linkage of psoriasis to chromosome 17.

19.
Single nucleotide polymorphisms (SNPs) are widely used when investigators try to map complex disease genes. Although biallelic SNP markers are less informative than microsatellite markers, one can increase their information content by using haplotypes. However, assigning haplotypes (i.e., assigning phase) correctly can be problematic in the presence of SNP heterozygosity. For example, a doubly heterozygous individual, with genotype 12, 12, could have haplotypes 1-1/2-2 or 1-2/2-1 with equal probability; in the absence of additional information, there is no way to determine which haplotype is correct. Thus an algorithm that assigns haplotypes to such an individual will assign the wrong one 50% of the time. We have studied the frequency of haplotype misassignments, i.e., haplotypes that are misassigned solely because of inherent marker ambiguity (not because of errors in genotyping or calculation). We examined both SNPs and microsatellite markers. We used the computer programs GENEHUNTER and SIMWALK to assign the haplotypes. We simulated (a) families with 1-5 children, (b) haplotypes involving different numbers of marker loci (3, 5, 7 and 10 loci, all in linkage equilibrium), and (c) different allele frequencies. Misassignment rates are highest (a) in small families, (b) with many SNP loci, and (c) for loci with the greatest heterozygosity (i.e., where both alleles have frequency 0.5). For example, for triads (i.e., one-child families with both parents genotyped), misassignment rates for SNPs can reach almost 50%. Family sizes of 4-5 children are required in order to ensure a misassignment frequency of ≤5% for ten-SNP haplotypes with allele frequencies of 0.25-0.5. For microsatellites, a family size of at least 2-3 children is necessary to keep haplotyping misassignments ≤5%. Finally, we point out that it is misleading for a computer program to yield haplotype assignments without indicating that they may have been misassigned, and we discuss the implications of these misassignments for association and linkage analysis.
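The phase ambiguity described in this abstract can be made concrete with a short enumeration. The sketch below lists every unordered pair of haplotypes consistent with an unphased genotype. It is an illustrative helper only, unrelated to how GENEHUNTER or SIMWALK work, and the function name and the tuple-per-locus genotype encoding are assumptions.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """genotype: list of (allele, allele) tuples, one per locus, unphased.
    Returns the set of unordered haplotype pairs that could produce it."""
    pairs = set()
    # At each locus, either allele may sit on the first haplotype.
    options = [((a, b), (b, a)) for a, b in genotype]
    for choice in product(*options):
        h1 = tuple(c[0] for c in choice)
        h2 = tuple(c[1] for c in choice)
        # Sorting makes (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


# The doubly heterozygous genotype "12, 12" from the abstract.
double_het = compatible_haplotype_pairs([(1, 2), (1, 2)])
```

For the doubly heterozygous case this yields exactly the two configurations named in the abstract, 1-1/2-2 and 1-2/2-1. In general an individual heterozygous at k loci has 2^(k-1) compatible pairs, which illustrates why ambiguity grows with the number of heterozygous SNP loci in a haplotype.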

20.
Two grand challenges in the postgenomic era are to develop a detailed understanding of heritable variation in the human genome, and to develop robust strategies for identifying the genetic contribution to diseases and drug responses. Haplotypes of single nucleotide polymorphisms (SNPs) have been suggested as an effective representation of human variation, and various haplotype-based association mapping methods for complex traits have been proposed in the literature. However, humans are diploid and, in practice, genotype data instead of haplotype data are collected directly. Therefore, efficient and accurate computational methods for haplotype reconstruction are needed and have recently been investigated intensively, especially for tightly linked markers such as SNPs. This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples.

