Similar articles
 20 similar articles found (search time: 31 ms)
1.
Family-based association tests for genomewide association scans   Cited by: 7 (self-citations: 1; citations by others: 6)
With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans. In addition to observed genotypes, our approach allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited. We estimate missing genotypes probabilistically using the Lander-Green or Elston-Stewart algorithms and combine high-resolution SNP genotypes for a subset of individuals in each pedigree with sparser marker data for the remaining individuals. We show that power is increased whenever phenotype information for ungenotyped individuals is included in analyses and that high-density genotyping of just three carefully selected individuals in a nuclear family can recover >90% of the information available if every individual were genotyped, for a fraction of the cost and experimental effort. To aid in study design, we evaluate the power of strategies that genotype different subsets of individuals in each pedigree and make recommendations about which individuals should be genotyped at a high density. To illustrate our method, we performed genomewide association analysis for 27 gene-expression phenotypes in 3-generation families (Centre d'Etude du Polymorphisme Humain pedigrees), in which genotypes for ~860,000 SNPs in 90 grandparents and parents are complemented by genotypes for ~6,700 SNPs in a total of 168 individuals. 
In addition to increasing the evidence of association at 15 previously identified cis-acting associated alleles, our genotype-inference algorithm allowed us to identify associated alleles at 4 cis-acting loci that were missed when analysis was restricted to individuals with the high-density SNP data. Our genotype-inference algorithm and the proposed association tests are implemented in software that is available for free.

2.
Yang K, Moon JK, Jeong N, Back K, Kim HM, Jeong SC. Genomics 2008, 92(1):52-59
A complete genetic linkage map of the soybean, in which sequence-based (SB) genetic markers are evenly distributed genomewide, was constructed from an F12 population composed of 113 recombinant inbred lines derived from an interspecific cross involving Korean genotypes Hwangkeum and IT182932. Several approaches were employed for the development of 112 novel SB markers targeting both the gaps and the ends of the linkage groups (LGs). The resultant map harbored 20 well-resolved LGs presumed to correspond to the 20 pairs of soybean chromosomes. The map allowed us to identify the important chromosomal structures that were not observed in the integrated genetic maps, to identify the new potentially gene-rich regions, to detect segregation distortion regions within the whole genome, and to extend the ends of the LGs. The results will facilitate the further discovery of agronomically relevant genetic loci in the heretofore neglected chromosomal regions and should also provide some important links between the soybean genetic, physical, and genome sequence maps in the regions.

3.
Because current molecular haplotyping methods are expensive and not amenable to automation, many researchers rely on statistical methods to infer haplotype pairs from multilocus genotypes, and subsequently treat these inferred haplotype pairs as observations. These procedures are prone to haplotype misclassification. We examine the effect of these misclassification errors on the false-positive rate and power for two association tests. These tests include the standard likelihood ratio test (LRTstd) and a likelihood ratio test that employs a double-sampling approach to allow for the misclassification inherent in the haplotype inference procedure (LRTae). We aim to determine the cost-benefit relationship of increasing the proportion of individuals with molecular haplotype measurements in addition to genotypes to raise the power gain of the LRTae over the LRTstd. This analysis should provide a guideline for determining the minimum number of molecular haplotypes required for desired power. Our simulations under the null hypothesis of equal haplotype frequencies in cases and controls indicate that (1) for each statistic, permutation methods maintain the correct type I error; (2) specific multilocus genotypes that are misclassified as the incorrect haplotype pair are consistently misclassified throughout each entire dataset; and (3) our simulations under the alternative hypothesis showed a significant power gain for the LRTae over the LRTstd for a subset of the parameter settings. Permutation methods should be used exclusively to determine significance for each statistic. For fixed cost, the power gain of the LRTae over the LRTstd varied depending on the relative costs of genotyping, molecular haplotyping, and phenotyping. The LRTae showed the greatest benefit over the LRTstd when the cost of phenotyping was very high relative to the cost of genotyping. This situation is likely to occur in a replication study as opposed to a whole-genome association study.

4.
Cui Y, Kang G, Sun K, Qian M, Romero R, Fu W. Genetics 2008, 179(1):637-650
Genes are the functional units in most organisms. Compared to genetic variants located outside genes, genic variants are more likely to affect disease risk. The development of the human HapMap project provides an unprecedented opportunity for genetic association studies at the genomewide level for elucidating disease etiology. Currently, most association studies at the single-nucleotide polymorphism (SNP) or the haplotype level rely on the linkage information between SNP markers and disease variants, with which association findings are difficult to replicate. Moreover, variants in genes might not be sufficiently covered by currently available methods. In this article, we present a gene-centric approach via entropy statistics for a genomewide association study to identify disease genes. The new entropy-based approach considers genic variants within one gene simultaneously and is developed on the basis of a joint genotype distribution among genetic variants for an association test. A grouping algorithm based on a penalized entropy measure is proposed to reduce the dimension of the test statistic. Type I error rates and power of the entropy test are evaluated through extensive simulation studies. The results indicate that the entropy test has stable power under different disease models with a reasonable sample size. Compared to single SNP-based analysis, the gene-centric approach has greater power, especially when there is more than one disease variant in a gene. As the genomewide genic SNPs become available, our entropy-based gene-centric approach would provide a robust and computationally efficient way for gene-based genomewide association study.

5.
Ambitious programs have recently been advocated or launched to create genomewide databases for meta-analysis of association between DNA markers and phenotypes of medical and/or social concern. A necessary but not sufficient condition for success in association mapping is that the data give accurate estimates of both genomic location and its standard error, which are provided for multifactorial phenotypes by composite likelihood. That class includes the Malecot model, which we here apply with an illustrative example. This preliminary analysis leads to five inferences: permutation of cases and controls provides a test of association free of autocorrelation; two hypotheses give similar estimates, but one is consistently more accurate; estimation of the false-discovery rate is extended to causal genes in a small proportion of regions; the minimal data for successful meta-analysis are inferred; and power is robust for all genomic factors except minor-allele frequency. An extension to meta-analysis is proposed. Other approaches to genome scanning and meta-analysis should, if possible, be similarly extended so that their operating characteristics can be compared.

6.
Genomewide association studies have been advocated as a promising alternative to genomewide linkage scans for detection of small-effect genes in complex diseases. Comparisons of power and sample size between the two strategies have shown considerable advantages for the association studies. These comparisons assume that the set of markers includes the exact disease-related polymorphism. A concern, however, is that the power of an association study decreases when this is not the case, because of discrepant allele frequencies and less-than-maximum disequilibrium between the disease-related polymorphism and its nearest marker. Here, we quantify this concern by comparing the sample sizes needed by the two strategies when the markers exclude the disease-related polymorphism. For affected sib pairs and their parents, we found that incomplete disequilibrium and differing allele frequencies can have substantial negative impact on the power of association studies, resulting, in some circumstances, in little gain and even in loss of power, compared with linkage analysis. We provide some guidelines for choosing between strategies, for the detection of genes for complex diseases.

7.
Two-stage designs in case-control association analysis   Cited by: 1 (self-citations: 0; citations by others: 1)
Zuo Y, Zou G, Zhao H. Genetics 2006, 173(3):1747-1760
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on its power. This is in contrast to the one-stage pooling scheme where measurement errors may have large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are not <0.05 for reasonably large sample sizes, even though the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. 
For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.

8.
Misspecified relationships can have serious consequences for linkage studies, resulting in either reduced power or false-positive evidence for linkage. If some individuals in the pedigree are untyped, then Mendelian errors may not be observed. Previous approaches to detection of misspecified relationships by use of genotype data were developed for sib and half-sib pairs. We extend the likelihood calculations of Göring and Ott and Boehnke and Cox to more-general relative pairs, for which identity-by-descent (IBD) status is no longer a Markov chain, and we propose a likelihood-ratio test. We also extend the identity-by-state (IBS)-based test of Ehm and Wagner to nonsib relative pairs. The likelihood-ratio test has high power, but its drawbacks include the need to construct and apply a separate Markov chain for each possible alternative relationship and the need for simulation to assess significance. The IBS-based test is simpler but has lower power. We propose two new test statistics, conditional expected IBD (EIBD) and adjusted IBS (AIBS), designed to retain the simplicity of IBS while increasing power by taking into account chance sharing. In simulations, the power of EIBD is generally close to that of the likelihood-ratio test. The power of AIBS is higher than that of IBS, in all cases considered. We suggest a strategy of initial screening by use of EIBD and AIBS, followed by application of the likelihood-ratio test to only a subset of relative pairs, identified by use of EIBD and AIBS. We apply the methods to a Genetic Analysis Workshop 11 data set from the Collaborative Study on the Genetics of Alcoholism.

9.
Statistical methods to map quantitative trait loci (QTL) in outbred populations are reviewed, extensions and applications to human and plant genetic data are indicated, and areas for further research are identified. Simple and computationally inexpensive methods include (multiple) linear regression of phenotype on marker genotypes and regression of squared phenotypic differences among relative pairs on estimated proportions of identity-by-descent at a locus. These methods are less suited for genetic parameter estimation in outbred populations but allow the determination of test statistic distributions via simulation or data permutation; however, further inferences including confidence intervals of QTL location require the use of Monte Carlo or bootstrap sampling techniques. A method which is intermediate in computational requirements is residual maximum likelihood (REML) with a covariance matrix of random QTL effects conditional on information from multiple linked markers. Testing for the number of QTLs on a chromosome is difficult in a classical framework. The computationally most demanding methods are maximum likelihood and Bayesian analysis, which take account of the distribution of multilocus marker-QTL genotypes on a pedigree and permit investigators to fit different models of variation at the QTL. The Bayesian analysis includes the number of QTLs on a chromosome as an unknown.

10.
Application of association mapping to plant breeding populations has the potential to revolutionize plant genetics. The main objectives of this study were to (i) investigate the extent and genomic distribution of linkage disequilibrium (LD) between pairs of amplified fragment length polymorphism (AFLP) markers, (ii) compare these results with those obtained with simple sequence repeat (SSR) markers, and (iii) compare the usefulness of AFLP and SSR markers for genomewide association mapping in plant breeding populations. We examined LD in a cross-section of 72 European elite inbred lines genotyped with 452 AFLP and 93 SSR markers. LD was significant (p < 0.05) for about 15% of the AFLP marker pairs and for about 49% of the SSR marker pairs in each of the two germplasm groups, flint and dent. In both germplasm groups the ratio of linked to unlinked loci pairs in LD was higher for AFLPs than for SSRs. The observation of LD due to linkage for both marker types suggested that genome-wide association mapping should be possible using either AFLPs or SSRs. The results of our study indicated that SSRs should be favored over AFLPs but the opposite applies to populations with a long history of recombination.

11.
Haplotypes provide a more informative format of polymorphisms for genetic association analysis than do individual single-nucleotide polymorphisms. However, the practical efficacy of haplotype-based association analysis is challenged by a trade-off between the benefits of modeling abundant variation and the cost of the extra degrees of freedom. To reduce the degrees of freedom, several strategies have been considered in the literature. They include (1) clustering evolutionarily close haplotypes, (2) modeling the level of haplotype sharing, and (3) smoothing haplotype effects by introducing a correlation structure for haplotype effects and studying the variance components (VC) for association. Although the first two strategies enjoy a fair extent of power gain, empirical evidence showed that VC methods may exhibit only similar or less power than the standard haplotype regression method, even in cases of many haplotypes. In this study, we report possible reasons that cause the underpowered phenomenon and show how the power of the VC strategy can be improved. We construct a score test based on the restricted maximum likelihood or the marginal likelihood function of the VC and identify its nontypical limiting distribution. Through simulation, we demonstrate the validity of the test and investigate the power performance of the VC approach and that of the standard haplotype regression approach. With suitable choices for the correlation structure, the proposed method can be directly applied to unphased genotypic data. Our method is applicable to a wide-ranging class of models and is computationally efficient and easy to implement. The broad coverage and the fast and easy implementation of this method make the VC strategy an effective tool for haplotype analysis, even in modern genomewide association studies.

12.
RFLP and RAPD markers were evaluated and compared for their ability to determine genetic relationships in a set of three B. napus breeding lines. Using a total of 50 RFLP and 92 RAPD markers, the relatedness between the lines was determined. In total, the RFLP and the RAPD analysis revealed more than 500 and 400 bands, respectively. The relative frequencies of loci with allele differences were estimated from the band data. The RFLP and RAPD marker sets detected very similar relationships among the three lines, consistent with known pedigree data. Bootstrap analyses showed that the use of approximately 30 probes or primers would have been sufficient to achieve these relationships. This indicates that RAPD markers have the same resolving power as RFLP markers when used on exactly the same set of B. napus genotypes. Since RAPD markers are easier and quicker to use, these markers may be preferred in applications where the relationships between closely-related breeding lines are of interest. The use of RAPD markers in fingerprinting applications may, however, not be warranted, and this is discussed in relation to the reliability of RAPD markers.

13.
The heritability of quantitative traits, or the proportion of phenotypic variation due to additive genetic or heritable effects, plays an important role in determining the evolutionary response to natural selection. Most quantitative genetic studies are performed in the laboratory, due to difficulty in obtaining genealogical data in natural populations. Genealogies are known, however, from a unique 20-year study of toque macaques (Macaca sinica) at Polonnaruwa, Sri Lanka. Heritability in this natural population was, therefore, estimated. Twenty-seven body measurements representing the lengths and widths of the head, trunk, extremities, and tail were collected from 270 individuals. The sample included 172 offspring-mother pairs from 39 different matrilineal families. Heritabilities were estimated using traditional mother-offspring regression and maximum likelihood methods which utilize all genealogical relationships in the sample. On the common assumption that environmental (including social) factors affecting morphology were randomly distributed across families, all but two of the traits (25 of 27) were significantly heritable, with an average heritability of 0.51 for the mother-offspring analysis and 0.56 for the maximum likelihood analysis. Heritability estimates obtained from the two analyses were very similar. We conclude that the Polonnaruwa macaques exhibit a comparatively moderate to high level of heritability for body form. © 1992 Wiley-Liss, Inc.
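
The mother-offspring regression named in this abstract can be illustrated with a minimal sketch. It relies on the standard quantitative-genetics result that the slope b of offspring values on a single parent estimates half the narrow-sense heritability, so h^2 = 2b. The function name and the trait values are hypothetical and are not taken from the macaque study; the abstract does not describe the authors' exact procedure.

```python
def single_parent_heritability(parent_values, offspring_values):
    """Estimate narrow-sense heritability as twice the least-squares slope
    of offspring trait values on the trait values of one parent.

    Hypothetical helper for illustration; assumes one offspring per parent
    and no shared-environment effects.
    """
    n = len(parent_values)
    if n != len(offspring_values) or n < 2:
        raise ValueError("need at least two matched parent-offspring pairs")

    mean_parent = sum(parent_values) / n
    mean_offspring = sum(offspring_values) / n

    # Sum of cross-products and sum of squares about the means.
    cross = sum((p - mean_parent) * (o - mean_offspring)
                for p, o in zip(parent_values, offspring_values))
    squares = sum((p - mean_parent) ** 2 for p in parent_values)
    if squares == 0:
        raise ValueError("parent values show no variation")

    slope = cross / squares
    return 2.0 * slope  # a single parent shares half of its alleles


# Made-up measurements, chosen so that the slope is 0.25.
mothers = [10.0, 12.0, 14.0, 16.0]
offspring = [11.0, 11.5, 12.0, 12.5]
h2 = single_parent_heritability(mothers, offspring)
```

The doubling applies only to single-parent regression. With mid-parent values the slope itself would estimate the heritability.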

14.
The angiotensin I-converting enzyme (ACE) gene (17q23) is a candidate gene for essential hypertension and related diseases, but investigation of its role in human pathology is hampered by a lack of identified polymorphisms. Currently, a 287-bp insertion/deletion (I/D) RFLP in intron 16 represents the only one known. Additional polymorphisms for the ACE gene would make most families informative for linkage studies and would allow haplotypes to be assigned in association studies. To increase the information provided by the ACE gene, we used a sensitive screening technique, denaturing gradient gel electrophoresis (DGGE) blots, to identify polymorphisms and combined this with gene counting to identify haplotypes. Five independent polymorphisms, restriction fragment melting polymorphisms (RFMPs), were identified by four probes (encompassing half of the ACE cDNA) in digests produced by three restriction enzymes (DdeI, RsaI, and AluI). One RFMP has three alleles while the others have two alleles. In a sample of 67 unrelated control subjects, minor allele frequencies ranged from 0.12 to 0.49. A significant level of linkage disequilibrium was found for all pairs of markers. The four most informative RFMPs, taken in combination, define 24 potential haplotypes. Based on gene counting, 11 of the 24 are rare or nonexistent in this population, and the estimated heterozygosity of the remaining 13 haplotypes approaches 80%. Under these conditions for the ACE locus, phase-unknown genotypes could be assigned to haplotype pairs in unrelated subjects with reasonable certainty. Thus, using DGGE blot technique for identifying numerous DNA polymorphisms in a candidate locus, in combination with gene counting, one can often identify DNA haplotypes for both related and unrelated study subjects at a candidate locus. These markers in the ACE gene should be useful for clinical and epidemiologic studies of the role of ACE in human disease.
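
The count of 24 potential haplotypes in this abstract is consistent with combining one three-allele marker with three two-allele markers (3 x 2 x 2 x 2 = 24). The sketch below enumerates such combinations and computes an expected heterozygosity as 1 minus the sum of squared haplotype frequencies. It assumes that the three-allele RFMP is among the four most informative markers, which the abstract implies but does not state. The allele labels, function names, and frequencies are hypothetical.

```python
from itertools import product

# Hypothetical allele labels: one three-allele marker, three two-allele markers.
ALLELES_PER_MARKER = [("1", "2", "3"), ("1", "2"), ("1", "2"), ("1", "2")]


def potential_haplotypes(alleles_per_marker):
    """Return every combination of one allele per marker."""
    return list(product(*alleles_per_marker))


def expected_heterozygosity(frequencies):
    """Gene diversity, 1 - sum(p_i^2), without a small-sample correction."""
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return 1.0 - sum(f * f for f in frequencies)


haplotypes = potential_haplotypes(ALLELES_PER_MARKER)
n_potential = len(haplotypes)  # 3 * 2 * 2 * 2 combinations

# Illustrative only: 13 equally frequent haplotypes, not the study's estimates.
uniform_13 = [1.0 / 13] * 13
h_uniform = expected_heterozygosity(uniform_13)
```

Equal frequencies give the upper bound on heterozygosity for a fixed number of haplotypes, so unequal observed frequencies would yield a lower value.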

15.
Association mapping enables the detection of marker-trait associations in unstructured populations by taking advantage of historical linkage disequilibrium (LD) that exists between a marker and the true causative polymorphism of the trait phenotype. Our first objective was to understand the pattern of LD decay in the diploid alfalfa genome. We used 89 highly polymorphic SSR loci in 374 unimproved diploid alfalfa (Medicago sativa L.) genotypes from 120 accessions to infer chromosome-wide patterns of LD. We also sequenced four lignin biosynthesis candidate genes (caffeoyl-CoA 3-O-methyltransferase (CCoAoMT), ferulate-5-hydroxylase (F5H), caffeic acid-O-methyltransferase (COMT), and phenylalanine ammonia-lyase (PAL 1)) to identify single nucleotide polymorphisms (SNPs) and infer within-gene estimates of LD. As the second objective of this study, we conducted association mapping for cell wall components and agronomic traits using the SSR markers and SNPs from the four candidate genes. We found very little LD among SSR markers, implying limited value for genomewide association studies. In contrast, within-gene LD decayed to below an r^2 of 0.2 within 300 bp in three of four candidate genes. We identified one SSR and two highly significant SNPs associated with biomass yield. Based on our results, focusing association mapping on candidate gene sequences will be necessary until a dense set of genome-wide markers is available for alfalfa.

16.
For genomewide association (GWA) studies in family-based designs, we propose a novel two-stage strategy that weighs the association P values with the use of independently estimated weights. The association information contained in the family sample is partitioned into two orthogonal components, namely, the between-family information and the within-family information. The between-family component is used in the first (i.e., screening) stage to obtain a relative ranking of all the markers. The within-family component is used in the second (i.e., testing) stage in the framework of the standard family-based association test, and the resulting P values are weighted using the estimated marker ranking from the screening step. The approach is appealing, in that it ensures that all the markers are tested in the testing step and, at the same time, also uses information from the screening step. Through simulation studies, we show that testing all the markers is more powerful than testing only the most promising ones from the screening step, which was the method suggested by Van Steen et al. A comparison with a population-based approach shows that the approach achieves comparable power. In the presence of a reasonable level of population stratification, our approach is only slightly affected in terms of power and, since it is a family-based method, is completely robust to spurious effects. An application to a 100K scan in the Framingham Heart Study illustrates the practical advantages of our approach. The proposed method is of general applicability; it extends to any setting in which prior, independent ranking of hypotheses is available.

17.
Family-based association methods have been developed primarily for autosomal markers. The X-linked sibling transmission/disequilibrium test (XS-TDT) and the reconstruction-combined TDT for X-chromosome markers (XRC-TDT) are the first association-based methods for testing markers on the X chromosome in family data sets. These are valid tests of association in family triads or discordant sib pairs but are not theoretically valid in multiplex families when linkage is present. Recently, XPDT and XMCPDT, modified versions of the pedigree disequilibrium test (PDT), were proposed. Like the PDT, XPDT compares genotype transmissions from parents to affected offspring or genotypes of discordant siblings; however, the XPDT can have low power if there are many missing parental genotypes. XMCPDT uses a Monte Carlo sampling approach to infer missing parental genotypes on the basis of true or estimated population allele frequencies. Although the XMCPDT was shown to be more powerful than the XPDT, variability in the statistic due to the use of an estimate of allele frequency is not properly accounted for. Here, we present a novel family-based test of association, X-APL, a modification of the test for association in the presence of linkage (APL) test. Like the APL, X-APL can use singleton or multiplex families and properly infers missing parental genotypes in linkage regions by considering identity-by-descent parameters for affected siblings. Sampling variability of parameter estimates is accounted for through a bootstrap procedure. X-APL can test individual marker loci or X-chromosome haplotypes. To allow for different penetrances in males and females, separate sex-specific tests are provided. Using simulated data, we demonstrated validity and showed that the X-APL is more powerful than alternative tests. To show its utility and to discuss interpretation in real-data analysis, we also applied the X-APL to candidate-gene data in a sample of families with Parkinson disease.

18.
In the present study, we tested rice genotypes that included un(der)exploited landraces of Tamil Nadu along with indica and japonica test cultivars to ascertain their genetic diversity structure. Highly polymorphic microsatellite markers were used for generating marker segregation data. A novel measure, allele discrimination index, was used to determine subpopulation differentiation power of each marker. Phenotypic data were collected for yield and component traits. Pattern of molecular differentiation separated indica and japonica genotypes; indica genotypes had two subpopulations within. Landraces were found to have indica genome, but formed a separate subgroup with low linkage disequilibrium. The landraces further separated into distinct group in both hierarchical clustering analysis using neighbour-joining method as well as in the model based population structure analysis. Japonica and the remaining indica cultivars formed two other distinct groups. Linkage disequilibrium observed in the whole population was considerably reduced in subpopulations. Low linkage disequilibrium of landforms suggests their narrow adaptation in local geographical niche. Many population specific alleles could be identified particularly for japonica cultivars and landraces. Association analysis revealed nine marker-trait associations with three agronomic traits, of which 67% were previously reported. Although the testing landraces together with known cultivars had permitted genomewide association mapping, the experiment offers scope to study more landraces collected from the entire geographical region for drawing more reliable information.

19.
Detecting the association between genetic markers and complex diseases can be a critical first step toward identification of the genetic basis of disease. Misleading associations can be avoided by choosing as controls the parents of diseased cases, but the availability of parents often limits this design to early-onset disease. Alternatively, sib controls offer a valid design. A general multivariate score statistic is presented, to detect the association between a multiallelic genetic marker locus and affection status; this general approach is applicable to designs that use parents as controls, sibs as controls, or even unrelated controls whose genotypes do not fit Hardy-Weinberg proportions or that pool any combination of these different designs. The benefit of this multivariate score statistic is that it will tend to be the most powerful method when multiple marker alleles are associated with affection status. To plan these types of studies, we present methods to compute sample size and power, allowing for varying sibship sizes, ascertainment criteria, and genetic models of risk. The results indicate that sib controls have less power than parental controls and that the power of sib controls can be increased by increasing either the number of affected sibs per sibship or the number of unaffected control sibs. The sample-size results indicate that the use of sib controls to test for associations, by use of either a single-marker locus or a genomewide screen, will be feasible for markers that have a dominant effect and for common alleles having a recessive effect. The results presented will be useful for investigators planning studies using sibs as controls.

20.
Association studies are widely seen as the most promising approach for finding polymorphisms that influence genetically complex traits, such as common diseases and responses to their treatment. Considerable interest has therefore recently focused on the development of methods that efficiently screen genomic regions or whole genomes for gene variants associated with complex phenotypes. One key element in this search is the use of linkage disequilibrium to gain maximal information from typing a selected subset of highly informative single-nucleotide polymorphism (SNP) markers, now often called "tagging SNPs" (tSNPs). Probably the most common approach to linkage-disequilibrium gene mapping involves a three-step program: (1) characterization of the haplotype structure in candidate genes or genomic regions of interest, (2) identification of tSNPs sufficient to represent the most common haplotypes, and (3) typing of tSNPs in clinical material. Early definitions of tSNPs focused on the amount of haplotype diversity that they explained. To select tSNPs that would have maximal power in a genetic association study, however, we have developed optimization criteria based on the r^2 measure of association and have compared these with other criteria based on the haplotype diversity. To evaluate the full program and to assess how well the selected tags are likely to perform, we have determined the haplotype structure and have assessed tSNPs in the SCN1A gene, an important candidate gene for sporadic epilepsy. We find that as few as four tSNPs are predicted to maintain a consistently high r^2 value with all other common SNPs in the gene, indicating that the tags could be used in an association study with only a modest reduction in power relative to direct assays of all common SNPs. This implies that very large case-control studies can be screened for variation in hundreds of candidate genes with manageable experimental effort, once tSNPs are identified.
However, our results also show that tSNPs identified in one population may not necessarily perform well in another, indicating that the preliminary study to identify tSNPs and the later case-control study should be performed in the same population. Our results also indicate that tSNPs will not easily identify discrepant SNPs, which lie on importantly discriminating but apparently short genealogical branches. This could significantly complicate tagging approaches for phenotypes influenced by variants that have experienced positive selection.
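
The r^2 measure of association on which these tagging criteria are based can be sketched for a single pair of biallelic SNPs. The function computes the standard pairwise statistic r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB, from phased haplotype counts. It is not the authors' tSNP optimization procedure, and the function name and counts are hypothetical.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise r^2 between two biallelic loci from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at the first locus and allele B
    at the second, and so on. Hypothetical helper for illustration only.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2

    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")

    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    return d * d / denominator


# Made-up counts: a perfectly tagged pair and an independent pair.
r2_perfect = ld_r_squared(50, 0, 0, 50)
r2_independent = ld_r_squared(25, 25, 25, 25)
```

A tag with a high r^2 to an untyped SNP serves as a close proxy for it, which is why the abstract reports only a modest loss of power.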

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号