Similar Articles
A total of 20 similar articles were retrieved.
1.
Stella A  Boettcher PJ 《Genetics》2004,166(1):341-350
Simulation was used to evaluate the performance of different selective genotyping strategies when using linkage disequilibrium across large half-sib families to position a QTL within a previously defined genomic region. Strategies examined included standard selective genotyping and different approaches of discordant and concordant sib selection applied to arbitrary or selected families. Strategies were compared as a function of effect and frequency of QTL alleles, heritability, and phenotypic expression of the trait. Large half-sib families were simulated for 100 generations and 2% of the population was genotyped in the final generation. Simple ANOVA was applied and the marker with the greatest F-value was considered the most likely QTL position. For traits with continuous phenotypes, genotyping the most divergent pairs of half-sibs from all families was the best strategy in general, but standard selective genotyping was somewhat more precise when heritability was low. When the phenotype was distributed in ordered categories, discordant sib selection was the optimal approach for positioning QTL for traits with high heritability and concordant sib selection was the best approach when genetic effects were small. Genotyping of a few selected sibs from many families was generally more efficient than genotyping many individuals from a few highly selected sires.
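As a minimal illustration of the single-marker ANOVA scan described above, the sketch below computes a one-way F statistic at every marker and reports the marker with the largest F as the putative QTL position. The data layout and the simulated example are hypothetical, not the authors' code.

```python
# Minimal sketch of the "largest F-value" marker scan; data layout is hypothetical.
import numpy as np
from scipy.stats import f_oneway

def best_marker_by_anova(genotypes, phenotypes):
    """genotypes: (n_individuals, n_markers) array of genotype codes (e.g. 0/1/2);
    phenotypes: (n_individuals,) array. Returns (best_marker_index, F_values)."""
    n_markers = genotypes.shape[1]
    f_values = np.full(n_markers, np.nan)
    for m in range(n_markers):
        groups = [phenotypes[genotypes[:, m] == g] for g in np.unique(genotypes[:, m])]
        groups = [g for g in groups if len(g) > 1]   # need >1 observation per genotype class
        if len(groups) > 1:
            f_values[m], _ = f_oneway(*groups)       # one-way ANOVA F at this marker
    return int(np.nanargmax(f_values)), f_values

# Purely illustrative example with simulated data:
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(200, 50))            # 200 genotyped sibs, 50 markers
pheno = rng.normal(size=200) + 0.8 * geno[:, 20]     # marker 20 is linked to the QTL
best, F = best_marker_by_anova(geno, pheno)
print(best, F[best])
```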

2.
Marginal tests based on individual SNPs are routinely used in genetic association studies. Studies have shown that haplotype-based methods may provide more power in disease mapping than methods based on single markers when, for example, multiple disease-susceptibility variants occur within the same gene. A limitation of haplotype-based methods is that the number of parameters increases exponentially with the number of SNPs, inducing a commensurate increase in the degrees of freedom and weakening the power to detect associations. To address this limitation, we introduce a hierarchical linkage disequilibrium model for disease mapping, based on a reparametrization of the multinomial haplotype distribution, where every parameter corresponds to the cumulant of each possible subset of a set of loci. This hierarchy in the parameters enables flexible testing strategies over a range of parameter sets, from standard single-SNP analyses to tests of the full haplotype distribution, reducing degrees of freedom and increasing the power to detect associations. We show via extensive simulations that our approach maintains the type I error at the nominal level and has increased power under many realistic scenarios, compared to single-SNP and standard haplotype-based analyses. To evaluate the performance of our proposed methodology in real data, we analyze genome-wide data from the Wellcome Trust Case-Control Consortium.
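To make the degrees-of-freedom trade-off concrete, here is a back-of-envelope parameter count (my illustration, not taken from the paper's derivation):

```latex
% Illustration of the dimensionality argument (my numbers, not the paper's).
% For L biallelic SNPs there are 2^L possible haplotypes, hence
\[
\underbrace{2^{L}-1}_{\text{full haplotype distribution}}
\quad\text{vs.}\quad
\underbrace{L}_{\text{single-SNP tests}}
\quad\text{vs.}\quad
\underbrace{L+\binom{L}{2}}_{\text{e.g. main effects + pairwise terms}}
\]
% free parameters; for L = 10 this is 1023 vs. 10 vs. 55, so intermediate
% parameter sets in the hierarchy trade completeness for fewer degrees of
% freedom and hence more power per test.
```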

3.
The current development of densely spaced collections of single nucleotide polymorphisms (SNPs) will lead to genome-wide association studies for a wide range of diseases in many different populations. Determining the appropriate number of SNPs to genotype involves balancing power and cost. Several variables are important in these determinations. We show that there are different combinations of sample size and marker density that can be expected to achieve the same power. Within certain bounds, investigators can choose between designs with more subjects and fewer markers or those with more markers and fewer subjects. Which designs are more cost-effective depends on the cost of phenotyping versus the cost of genotyping. We show that, under the assumption of a set cost for genotyping, one can calculate a "threshold cost" for phenotyping; when phenotyping costs per subject are less than this threshold, designs with more subjects will be more cost-effective than designs with more markers. This framework for determining a cost-effective study will aid in the planning of studies, especially if there are choices to be made with respect to phenotyping methods or study populations.
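A hedged sketch of how such a threshold arises, assuming a simple linear cost model in which each subject is phenotyped once and genotyped at m markers (the cost model itself is my assumption, not the paper's):

```latex
% Sketch under an assumed cost model C = n (c_p + m c_g): n subjects,
% m markers per subject, c_p = phenotyping cost per subject, c_g = genotyping
% cost per marker per subject. If two designs (n_1, m_1) and (n_2, m_2) with
% n_1 > n_2 and m_1 < m_2 achieve the same power, the subject-rich design 1
% is cheaper exactly when
\[
n_1\,(c_p + m_1 c_g) \;<\; n_2\,(c_p + m_2 c_g)
\quad\Longleftrightarrow\quad
c_p \;<\; c_g\,\frac{n_2 m_2 - n_1 m_1}{n_1 - n_2} \;\equiv\; c_p^{*},
\]
% i.e. there is a threshold phenotyping cost c_p^* below which adding subjects
% beats adding markers, as described in the abstract above.
```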

4.
5.
Several previous studies concluded that linkage disequilibrium (LD) in livestock populations from developed countries originated from the impact of strong selection. Here, we assessed the extent of LD in a cattle population from western Africa that was bred in an extensive farming system. The analyses were performed on 363 individuals in a Bos indicus x Bos taurus population using 42 microsatellite markers on BTA04, BTA07 and BTA13. A high level of expected heterozygosity (0.71), a high mean number of alleles per locus (9.7) and a mild departure from Hardy-Weinberg equilibrium were found. Linkage disequilibrium extended over shorter distances than has been observed in cattle from developed countries. Effective population size was assessed using two methods; both produced large values: 1388 when considering heterozygosity (assuming a mutation rate of 10^-3) and 2344 when considering LD on whole linkage groups (assuming a constant population size over generations). However, analysing the decay of LD as a function of marker spacing indicated a decreasing trend in effective population size over generations. This decrease could be explained by increasing selective pressure and/or by an admixture process. Finally, LD extended over short distances, which suggested that whole-genome scans will require a large number of markers. However, association studies using such populations will be effective.
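As a rough consistency check of the heterozygosity-based figure (my back-of-envelope calculation, assuming mutation-drift equilibrium under a stepwise mutation model for the microsatellites):

```latex
% Back-of-envelope check, assuming mutation-drift equilibrium under a stepwise
% mutation model, for which H = 1 - 1/\sqrt{1 + 8 N_e \mu}:
\[
N_e \;=\; \frac{(1-H)^{-2}-1}{8\mu}
      \;\approx\; \frac{(1-0.71)^{-2}-1}{8\times 10^{-3}}
      \;\approx\; 1.4\times 10^{3},
\]
% which is of the same order as the reported heterozygosity-based estimate of 1388.
```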

6.
Population subdivision due to habitat loss and modification, exploitation of wild populations and altered spatial population dynamics is of increasing concern in nature. Detecting population fragmentation is therefore crucial for conservation management. Using computer simulations, we show that a single-sample estimator of Ne based on linkage disequilibrium is a highly sensitive and promising indicator of recent population fragmentation and bottlenecks, even with some continued gene flow. For example, fragmentation of a panmictic population of Ne = 1,000 into demes of Ne = 100 can be detected with high probability after a single generation when estimates from this method are compared to prefragmentation estimates, given data for ~20 microsatellite loci in samples of 50 individuals. We consider a range of loci (10–40) and individuals (25–100) typical of current studies of natural populations and show that increasing the number of loci gives nearly the same increase in precision as increasing the number of individuals sampled. We also evaluated effects of incomplete fragmentation and found this Ne-reduction signal is still apparent in the presence of considerable migration (m ~ 0.10–0.25). Single-sample genetic estimates of Ne thus show considerable promise for early detection of population fragmentation and decline.
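The single-sample LD method referred to above rests, in its simplest form, on the expectation that for unlinked loci in a closed, randomly mating population E[r^2] is roughly 1/(3Ne) + 1/S, where S is the sample size. The sketch below illustrates that principle only; published estimators (e.g. LDNe/NeEstimator) add bias corrections and data handling that are omitted here.

```python
# Minimal sketch of an LD-based single-sample Ne estimate for unlinked loci,
# using the approximation E[r^2] ~ 1/(3*Ne) + 1/S under random mating.
# Illustrative only: real estimators add bias corrections omitted here.
import numpy as np

def ld_ne_estimate(genotypes):
    """genotypes: (S, L) array of allele-dosage codes (0/1/2) at L unlinked
    biallelic loci for S individuals. Returns a rough Ne estimate."""
    S, L = genotypes.shape
    r2 = []
    for i in range(L):
        for j in range(i + 1, L):
            r = np.corrcoef(genotypes[:, i], genotypes[:, j])[0, 1]
            r2.append(r * r)                 # squared genotypic correlation
    adj = float(np.mean(r2)) - 1.0 / S       # remove the sampling contribution
    return np.inf if adj <= 0 else 1.0 / (3.0 * adj)

# Usage (genotype matrix of shape (S individuals, L unlinked loci), codes 0/1/2):
#   ne_hat = ld_ne_estimate(genotype_matrix)
```

A recently fragmented deme shows inflated mean r^2 relative to 1/S and therefore a small Ne estimate, which is the signal the study exploits.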

7.
8.
Although Phalaenopsis orchids are among the most economically important potted plants, little is known about either the genetic diversity among varieties or the genetic complexity of key ornamental traits. Therefore, we analysed the genetic diversity of a broad collection of Phalaenopsis varieties and selected wild species by means of molecular markers. The marker data were used to obtain genetic distances, estimates of the degree of linkage disequilibrium and population structure for the genotypes under study. With a total of 492 markers, the genotypes clustered according to their horticultural classification (for example, old hybrids vs. more recent hybrids) but not according to their origin, indicating extensive exchange of germplasm among breeders. Linkage disequilibrium was found to decrease relatively slowly, most likely due to the small number of generations that have occurred since the first hybrids were generated. Based on the most likely estimates of the population structure (ranging from 10 to 12 subpopulations), associations between markers and ornamental traits such as flower size, flower colour, flower type, flower texture, stem length and leaf shape were calculated. These results can now serve as starting points for detailed analyses of the genetic architecture of these traits.

9.
Ball RD 《Genetics》2011,189(4):1497-1514
In genome-wide association studies hundreds of thousands of loci are scanned in thousands of cases and controls, with the goal of identifying genomic loci underpinning disease. This is a challenging statistical problem requiring strong evidence. Only a small proportion of the heritability of common diseases has so far been explained. This "dark matter of the genome" is a subject of much discussion. It is critical to have experimental design criteria that ensure that associations between genomic loci and phenotypes are robustly detected. To ensure associations are robustly detected we require good power (e.g., 0.8) and sufficiently strong evidence [i.e., a high Bayes factor (e.g., 10^6, meaning the data are 1 million times more likely if the association is real than if there is no association)] to overcome the low prior odds for any given marker in a genome scan to be associated with a causal locus. Power calculations are given for determining the sample sizes necessary to detect effects with the required power and Bayes factor for biallelic markers in linkage disequilibrium with causal loci in additive, dominant, and recessive genetic models. Significantly stronger evidence and larger sample sizes are required than indicated by traditional hypothesis tests and power calculations. Many reported putative effects are not robustly detected, and many effects, including some large effects of moderately low-frequency variants, may remain undetected. These results may explain the dark matter in the genome. The power calculations have been implemented in R and will be available in the R package ldDesign.
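A hypothetical worked example of the prior-odds argument (the prior odds value is illustrative and not taken from the paper):

```latex
% Hypothetical worked example (the prior odds are illustrative, not from the
% paper). If roughly 10 causal regions are expected among ~10^6 scanned
% markers, the prior odds for any particular marker are about 1 : 10^5, so
\[
\text{posterior odds}
  \;=\; \underbrace{\text{Bayes factor}}_{10^{6}} \times
        \underbrace{\text{prior odds}}_{10^{-5}}
  \;=\; 10 ,
\]
% i.e. a posterior probability of association of about 10/11 \approx 0.91;
% a Bayes factor of only 10^3 would leave posterior odds of just 10^{-2}.
```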

10.
In a simulation study, different designs were compared for efficiency of fine-mapping of QTL. The variance component method for fine-mapping of QTL was used to estimate QTL position and variance components. A design of many small families gave a higher mapping resolution than a design of few large families. However, the difference was small in half-sib designs. The proportion of replicates with the QTL positioned within 3 cM of the true position was 0.71 in the best design and 0.68 in the worst design applied to 128 animals with a phenotypic record and a QTL explaining 25% of the phenotypic variance. The design of two half-sib families, each of size 64, was further investigated for a hypothetical population with effective size 1000 simulated for 6000 generations with a marker density of 0.25 cM and a marker mutation rate of 4 × 10^-4 per generation. In mapping using bi-allelic markers, 42-55% of replicated simulations positioned the QTL within 0.75 cM of the true position, whereas this proportion was higher for multi-allelic markers (48-76%). The accuracy was lowest (48%) when mutation age was 100 generations and increased to 68% and 76% for mutation ages of 200 and 500 generations, respectively, after which it was about 70% for mutation ages of 1000 generations and older. When effective size was linearly decreasing over the last 50 generations, the accuracy was reduced (56-70%). We show that half-sib designs, which have often been used for linkage mapping, can have sufficient information for fine-mapping of QTL. It is suggested that the same design, with the same animals used for linkage mapping, also be used for fine-mapping, so that gene mapping can be cost-effective in livestock populations.

11.

Background  

Single nucleotide polymorphisms (SNPs) may be correlated due to linkage disequilibrium (LD). Association studies look for both direct and indirect associations with disease loci. In a Random Forest (RF) analysis, correlation between a true risk SNP and SNPs in LD may lead to diminished variable importance for the true risk SNP. One approach to address this problem is to select SNPs in linkage equilibrium (LE) for analysis. Here, we explore alternative methods for dealing with SNPs in LD: change the tree-building algorithm by building each tree in an RF only with SNPs in LE, modify the importance measure (IM), and use haplotypes instead of SNPs to build an RF.
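As a sketch of the baseline strategy mentioned above — selecting SNPs in approximate linkage equilibrium before fitting a standard random forest — the example below uses greedy r^2 pruning on simulated data. It illustrates the baseline only; the modified tree-building and importance measures studied in the paper are not implemented here.

```python
# Sketch of the baseline strategy: greedily prune SNPs so the retained set is in
# approximate linkage equilibrium (pairwise r^2 below a threshold), then fit an
# ordinary random forest. Data are simulated; the paper's modified algorithms
# are NOT implemented here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def prune_to_ld_threshold(genotypes, r2_max=0.2):
    """Greedy LD pruning: keep a SNP only if its r^2 with every already-kept
    SNP is below r2_max. genotypes: (n_samples, n_snps) dosage matrix."""
    kept = []
    for j in range(genotypes.shape[1]):
        ok = True
        for k in kept:
            r = np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1]
            if r * r >= r2_max:
                ok = False
                break
        if ok:
            kept.append(j)
    return kept

# Illustrative use on simulated data:
rng = np.random.default_rng(42)
X = rng.binomial(2, 0.3, size=(300, 100)).astype(float)
X[:, 1] = X[:, 0]                                        # SNP 1 in complete LD with SNP 0
y = (X[:, 0] + rng.normal(0, 1, 300) > 1).astype(int)    # SNP 0 is the true risk SNP

kept = prune_to_ld_threshold(X, r2_max=0.2)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X[:, kept], y)
imp = dict(zip(kept, rf.feature_importances_))
print(sorted(imp.items(), key=lambda kv: -kv[1])[:5])    # top-ranked retained SNPs
```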

12.

Background

Proteases of human pathogens are becoming increasingly important drug targets; it is therefore necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases, and some have been applied to extract cleavage rules from data. However, the methods proposed so far for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.

Results

A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, in agreement with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method converges quickly and yields accurate rules, on par with previous results for HIV-1 protease and better than the previous state of the art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than those previously obtained with rule extraction methods.

Conclusion

A rule extraction methodology based on searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data and are also more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

13.
Population-based mapping approaches are attractive for tracing the genetic background of phenotypic traits in wild species, given that it is often difficult to gather the extensive and well-defined pedigrees needed for quantitative trait locus analysis. However, the feasibility of association or hitch-hiking mapping depends on the degree of linkage disequilibrium (LD) in the population, on which there is as yet limited information for wild species. Here we use single nucleotide polymorphism (SNP) markers from 23 genes in a recently established linkage map of the Z chromosome of the collared flycatcher to study the extent of LD in a natural bird population. In most but not all cases we find SNPs within the same intron (less than 500 bp apart) to be in perfect LD. However, LD then decays to background level at a distance of about 1 cM, or 400-500 kb. Although LD seems more extensive than in other species, if the observed pattern is representative of other regions of the genome and turns out to be a general feature of natural bird populations, dense marker maps might be needed for genome scans aimed at identifying associations between marker and trait loci.
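For context, the expected decay of r^2 with recombination fraction c is often summarized by Sved's (1971) drift-recombination approximation; the numbers below are illustrative and are not estimates from this study.

```latex
% Sved's (1971) drift-recombination approximation, shown for context only;
% the numbers are illustrative, not estimates from this study.
\[
\mathrm{E}\!\left[r^{2}\right] \;\approx\; \frac{1}{1 + 4 N_e c},
\]
% e.g. with N_e = 10^3: c = 0.01 (about 1 cM) gives E[r^2] \approx 0.02
% (background), whereas c = 10^{-5} (SNPs a few hundred bp apart) gives
% E[r^2] \approx 0.96, qualitatively matching strong within-intron LD that
% decays to background by roughly 1 cM.
```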

14.
Association mapping has successfully identified common SNPs associated with many diseases. However, the inability of this class of variation to account for most of the supposed heritability has led to a renewed interest in methods - primarily linkage analysis - to detect rare variants. Family designs allow for control of population stratification, investigations of questions such as parent-of-origin effects and other applications that are imperfectly or not readily addressed in case-control association studies. This article guides readers through the interface between linkage and association analysis, reviews the new methodologies and provides useful guidelines for applications. Just as effective SNP-genotyping tools helped to realize the potential of association studies, next-generation sequencing tools will benefit genetic studies by improving the power of family-based approaches.

15.
Two-stage designs for gene-disease association studies
The goal of this article is to describe a two-stage design that maximizes the power to detect gene-disease associations when the principal design constraint is the total cost, represented by the total number of gene evaluations rather than the total number of individuals. In the first stage, all genes of interest are evaluated on a subset of individuals. The most promising genes are then evaluated on additional subjects in the second stage. This will eliminate wastage of resources on genes unlikely to be associated with disease based on the results of the first stage. We consider the case where the genes are correlated and the case where the genes are independent. Using simulation results, it is shown that, as a general guideline when the genes are independent or when the correlation is small, utilizing 75% of the resources in stage 1 to screen all the markers and evaluating the most promising 10% of the markers with the remaining resources provides near-optimal power for a broad range of parametric configurations. This translates to screening all the markers on approximately one quarter of the required sample size in stage 1.
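A quick consistency check of the stated allocation (my arithmetic, not quoted from the article), taking the required sample size as the total number of subjects over both stages:

```latex
% Consistency check (my arithmetic, not quoted from the article).
% Let C = total number of gene evaluations and M = number of markers.
% Stage 1: 75% of C spent typing all M markers  =>  n_1 = 0.75\,C/M subjects.
% Stage 2: 25% of C on the top 10% of markers   =>  n_2 = 0.25\,C/(0.1\,M) = 2.5\,C/M.
\[
\frac{n_1}{n_1+n_2} \;=\; \frac{0.75}{0.75+2.5} \;\approx\; 0.23 ,
\]
% i.e. stage 1 screens all markers on roughly one quarter of the total sample,
% matching the statement above.
```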

16.
It has been demonstrated in the literature that the transmission/disequilibrium test (TDT) has higher power than the affected-sib-pair (ASP) mean test when linkage disequilibrium (LD) is strong but that the mean test has higher power when LD is weak. Thus, for ASP data, it seems clear that the TDT should be used when LD is strong but that the mean test or other linkage tests should be used when LD is weak or absent. However, in practice, it may be difficult to follow such a guideline, because the extent of LD is often unknown. Even with a highly dense genetic-marker map, in which some markers should be located near the disease-predisposing mutation, strong LD is not inevitable. Besides genetic distance, LD is also affected by many factors, such as allelic heterogeneity at the disease locus, the initial LD, the allelic frequencies at both the disease locus and the marker locus, and the age of the mutation. Therefore, it is of interest to develop methods that are adaptive to the extent of LD. In this report, we propose a disequilibrium maximum-binomial-likelihood (DMLB) test that incorporates LD in the maximum-binomial-likelihood (MLB) test. Examination of the corresponding score statistics shows that this method adaptively combines two sources of information: (a) the identity-by-descent (IBD) sharing score, which is informative for linkage regardless of the existence of LD, and (b) the contrast between allele-specific IBD sharing scores, which is informative for linkage only in the presence of LD. For ASP data, the proposed test has higher power than either the TDT or the mean test when the extent of LD ranges from moderate to strong. Only when LD is very weak or absent is the DMLB slightly less powerful than the mean test; in such cases, the TDT has essentially no power to detect linkage. Therefore, the DMLB test is an interesting approach to linkage detection when the extent of LD is unknown.

17.

Background  

Discovering the genetic basis of common genetic diseases in the human genome represents a public health issue. However, the dimensionality of the genetic data (up to 1 million genetic markers) and its complexity make the statistical analysis a challenging task.

18.
曹宗富  马传香  王雷  蔡斌 《遗传》2010,32(9):921-928
In genome-wide association studies of complex diseases, population stratification inflates the false-positive rate, so it is necessary to account for population genetic structure and to control for stratification. How well randomly selected SNPs perform for this purpose, however, still requires further investigation. In this study, Affymetrix SNP 6.0 genotyping data from unrelated individuals in HapMap Phase 2 were used to select different numbers of SNPs randomly and uniformly across the genome, while ancestry informative markers (AIMs) were screened using the f value and Fisher's exact test. Data from unrelated individuals in HapMap Phase 3 were then used to evaluate how well the different SNP sets distinguished the populations, using both F-statistics and STRUCTURE analysis. The results show that SNPs distributed randomly and uniformly across the genome can be used to identify genetic structure within populations. The study further suggests that, in genome-wide association studies, when population-specific AIMs are unavailable, more than 3,000 uniformly distributed SNPs selected at random across the genome can be used to control for population stratification.
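As a minimal sketch of how a SNP panel's ability to separate two populations can be scored with F-statistics (illustrative only; this is not the exact F-statistics/STRUCTURE pipeline used in the study, and the allele frequencies below are simulated):

```python
# Minimal sketch: score how well a SNP panel separates two populations using a
# simple per-SNP F_ST = (H_T - H_S) / H_T averaged over SNPs (not the
# Weir-Cockerham estimator, and not the study's exact pipeline).
import numpy as np

def mean_fst(p1, p2):
    """p1, p2: arrays of reference-allele frequencies for the same SNPs in two
    populations. Returns the mean per-SNP F_ST."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                         # total expected heterozygosity
    h_s = (2.0 * p1 * (1 - p1) + 2.0 * p2 * (1 - p2)) / 2.0   # mean subpopulation heterozygosity
    keep = h_t > 0                                            # drop SNPs fixed in both populations
    return float(np.mean((h_t[keep] - h_s[keep]) / h_t[keep]))

# Illustration with simulated allele frequencies for a 3000-SNP random panel:
rng = np.random.default_rng(0)
p_pop1 = rng.uniform(0.05, 0.95, 3000)
p_pop2 = np.clip(p_pop1 + rng.normal(0, 0.05, 3000), 0.0, 1.0)
print(round(mean_fst(p_pop1, p_pop2), 4))
```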

19.
Linkage analysis is commonly used to find marker-trait associations within the full-sib families of forest trees and other species. Study of marker-trait associations at the population level is termed linkage-disequilibrium (LD) mapping. A female-tester design comprising 200 full-sib families, generated by crossing 40 pollen parents with five female parents, was used to assess the relationship between the marker-allele frequency classes obtained from parental genotypes at SSR marker loci and full-sib family performance (the average predicted breeding value of the two parents) in radiata pine (Pinus radiata D. Don). For alleles (at a marker locus) that showed significant association, the copy number of that allele in the parents was significantly correlated, either positively or negatively, with full-sib family performance for various economic traits. Regression of parental breeding values on parental genotypes at marker loci revealed that most of the markers that showed significant association with full-sib family performance were not significantly associated with the parental breeding values. This suggests that over-representation of the female parents in our sample of 200 full-sib families could have biased the process of detecting marker-trait associations. The evidence for the existence of marker-trait LD in the population studied is rather weak and would require further testing. The exact test for genotypic disequilibrium between pairs of linked or unlinked marker loci revealed non-significant LD. Observed genotypic frequencies at several marker loci were significantly different from the expected Hardy-Weinberg equilibrium frequencies. The possibilities of utilising marker-trait associations for early selection, among-family selection and selection of parents for the next generation of breeding are also discussed.

20.