Similar articles
1.
A pressing problem in studying the evolution of microbial pathogens is to determine the extent to which these genomes recombine. This information is essential for locating pathogenicity loci by using association studies or population genetic approaches. Recombination also complicates the use of phylogenetic approaches to estimate evolutionary parameters such as selection pressures. Reliable methods that detect and estimate the rate of recombination are, therefore, vital. This article reviews the approaches that are available for detecting and estimating recombination in microbial pathogens and how they can be used to understand pathogen evolution and to identify medically relevant loci.

2.
The study of the association of polymorphic genetic markers with common diseases is one of the most powerful tools in modern genetics. Interest in single nucleotide polymorphisms (SNPs) has steadily grown over the last decade. SNPs are currently the most developed markers in the human genome because they have a number of advantages over other marker types. One of the critical problems responsible for 'spurious' association findings in case-control studies is population stratification. Many statistical approaches have been developed for detecting population heterogeneity. However, the power to detect population structure by known methods is highly dependent on the number of loci utilised. We performed an analysis of SNP data available in the public domain from The Single Nucleotide Consortia Ltd. (TSCL). Three populations, Afro-American, Asian and Caucasian, were compared. The minimum number of SNP loci necessary for detection of the population structure was estimated. Two clustering approaches, distance-based and model-based, were compared. The model-based approach was superior to the distance-based method. We found that more than 65 random SNP loci are required for identifying distinct geographically separated populations. Increasing the number of markers to over 100 raises the probability of correct assignment of a particular individual to an origin group to over 90%, even with conventional clustering methods.

3.
Landscape genomics is an emerging research field that aims to identify the environmental factors that shape adaptive genetic variation and the gene variants that drive local adaptation. Its development has been facilitated by next‐generation sequencing, which allows for screening thousands to millions of single nucleotide polymorphisms in many individuals and populations at reasonable costs. In parallel, data sets describing environmental factors have greatly improved and increasingly become publicly accessible. Accordingly, numerous analytical methods for environmental association studies have been developed. Environmental association analysis identifies genetic variants associated with particular environmental factors and has the potential to uncover adaptive patterns that are not discovered by traditional tests for the detection of outlier loci based on population genetic differentiation. We review methods for conducting environmental association analysis including categorical tests, logistic regressions, matrix correlations, general linear models and mixed effects models. We discuss the advantages and disadvantages of different approaches, provide a list of dedicated software packages and their specific properties, and stress the importance of incorporating neutral genetic structure in the analysis. We also touch on additional important aspects such as sampling design, environmental data preparation, pooled and reduced‐representation sequencing, candidate‐gene approaches, linearity of allele–environment associations and the combination of environmental association analyses with traditional outlier detection tests. We conclude by summarizing expected future directions in the field, such as the extension of statistical approaches, environmental association analysis for ecological gene annotation, and the need for replication and post hoc validation studies.

4.
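An effective test for mapping genes underlying complex diseases   Cited by: 1 (self-citations: 0; citations by others: 1)
Linkage disequilibrium (LD) is used to map the genes underlying certain complex diseases, and many LD mapping methods have been developed in recent years. Apart from the TDT, most LD mapping methods must first assume that there is no population admixture; admixture can inflate the type I error rate in disease gene mapping and produce invalid results. The method described here uses LD to test for linkage (in the presence of linkage disequilibrium) and association (in the presence of linkage) between a marker locus and a disease susceptibility locus (DSL). The analysis uses unrelated individuals whose parental genotypes are known and for whom at least one parent is heterozygous. The random sample is classified by genotype, and powerful statistical methods are then applied to the data from the different classes, both separately and jointly. This LD mapping method applies to both affected and unaffected individuals, and it effectively removes the effect of population admixture when mapping with samples classified by parental genotype. Analytical and simulation results also show that the method resolves the problem of population admixture when testing for linkage and association between a marker locus and a DSL. Compared with the TDT, however, when the tested locus is the DSL, the method 丙能 [sic] make effective and full use of the corrected data; when the tested locus is not the DSL, this method and the TDT can complement each other to detect a linked DSL more effectively.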

5.
The detection of adaptive loci in the genome is essential as it gives the possibility of understanding what proportion of a genome or which genes are being shaped by natural selection. Several statistical methods have been developed which make use of molecular data to reveal genomic regions under selection. In this paper, we propose an approach to address this issue from the environmental angle, in order to complement results obtained by population genetics. We introduce a new method to detect signatures of natural selection based on the application of spatial analysis, with the contribution of geographical information systems (GIS), environmental variables and molecular data. Multiple univariate logistic regressions were carried out to test for association between allelic frequencies at marker loci and environmental variables. This spatial analysis method (SAM) is similar to current population genomics approaches since it is designed to scan hundreds of markers to assess a putative association with hundreds of environmental variables. Here, by application to studies of pine weevils and breeds of sheep we demonstrate a strong correspondence between SAM results and those obtained using population genetics approaches. Statistical signals were found that associate loci with environmental parameters, and these loci behave atypically in comparison with the theoretical distribution for neutral loci. The contribution of this new tool is not only to permit the identification of loci under selection but also to establish hypotheses about ecological factors that could exert the selection pressure responsible. In the future, such an approach may accelerate the process of hunting for functional genes at the population level.
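
The univariate logistic regressions mentioned in this abstract can be illustrated with a minimal sketch. It is not the SAM software itself: the function names, the Newton-Raphson fitting routine, the coding of each marker as presence (1) or absence (0) of an allele per sampled individual, and the example data are all assumptions made for illustration. Each marker and environmental variable pair would be fitted separately, and the likelihood-ratio statistic compared with a chi-square distribution with 1 degree of freedom.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1, log_likelihood). Assumes the data are not perfectly
    separable; otherwise the estimates diverge.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system with the explicit inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    loglik = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
    return b0, b1, loglik

def lr_statistic(x, y):
    """Likelihood-ratio statistic against the intercept-only model."""
    _, _, ll_full = fit_logistic(x, y)
    p0 = sum(y) / len(y)  # the null model predicts the overall frequency
    ll_null = sum(yi * math.log(p0) + (1 - yi) * math.log(1.0 - p0) for yi in y)
    return 2.0 * (ll_full - ll_null)

# Hypothetical data: one environmental variable and one marker allele.
env = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
allele_present = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, loglik = fit_logistic(env, allele_present)
stat = lr_statistic(env, allele_present)
```

Because hundreds of markers are scanned against hundreds of variables, the resulting tests would need a multiple-testing correction before any association is interpreted.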

6.
Because of the need for fine mapping of disease loci and the availability of dense single-nucleotide-polymorphism markers, many forms of association tests have been developed. Most of them are applicable only to triads, whereas some are amenable to nuclear families (sibships). Although there are a number of methods that can deal with extended families (e.g., the pedigree disequilibrium test [PDT]), most of them cannot accommodate incomplete data. Furthermore, despite a large body of literature on association mapping, only a very limited number of publications are applicable to X-chromosomal markers. In this report, we first extend the PDT to markers on the X chromosome for testing linkage disequilibrium in the presence of linkage. This method is applicable to any pedigree structure and is termed "X-chromosomal pedigree disequilibrium test" (XPDT). We then further extend the XPDT to accommodate pedigrees with missing genotypes in some of the individuals, especially founders. Monte Carlo (MC) samples of the missing genotypes are generated and used to calculate the XMCPDT (X-chromosomal MC PDT) statistic, which is defined as the conditional expectation of the XPDT statistic given the incomplete (observed) data. This MC version of the XPDT remains a valid test for association under linkage with the assumption that the pedigrees and their associated affection patterns are drawn randomly from a population of pedigrees with at least one affected offspring. This set of methods was compared with existing approaches through simulation, and substantial power gains were observed in all settings considered, with type I error rates closely tracking their nominal values.

7.
Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We use a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model, we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of QST/FST comparisons to test for overdispersion of genetic values among populations. Finally, we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single-locus equivalents because they look for positive covariance between like-effect alleles, and they also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
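
The "simple weighted sums of allele frequencies" in this abstract can be sketched as follows. This is only an illustration: the function name, the input layout and the example numbers are hypothetical, and the factor of 2 (two allele copies per diploid individual) is an assumption, since the abstract does not state the scaling.

```python
def genetic_values(freqs, effects):
    """Estimated mean additive genetic value for each population.

    freqs:   one list per population, giving the frequency of the effect
             allele at each GWAS locus (hypothetical layout).
    effects: additive effect size of the effect allele at each locus.
    The factor 2 assumes diploid individuals.
    """
    return [2.0 * sum(a * p for a, p in zip(effects, pop)) for pop in freqs]

# Hypothetical example: two populations, two loci.
freqs = [[0.1, 0.5],
         [0.3, 0.5]]
effects = [1.0, -0.5]
values = genetic_values(freqs, effects)
```

Under neutrality, differences among such values would be compared against what drift alone predicts given the relatedness of the populations, which is the role of the neutral model described in the abstract.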

8.
Meirmans PG. Molecular Ecology, 2012, 21(12): 2839-2846
The genetic population structure of many species is characterised by a pattern of isolation by distance (IBD): due to limited dispersal, individuals that are geographically close tend to be genetically more similar than individuals that are far apart. Despite the ubiquity of IBD in nature, many commonly used statistical tests are based on a null model that is completely non-spatial, the Island model. Here, I argue that patterns of spatial autocorrelation deriving from IBD present a problem for such tests as they can severely bias their outcome. I use simulated data to illustrate this problem for two widely used types of tests: tests of hierarchical population structure and the detection of loci under selection. My results show that for both types of tests the presence of IBD can indeed lead to a large number of false positives. I therefore argue that all analyses in a study should take the spatial dependence in the data into account, unless it can be shown that there is no spatial autocorrelation in the allele frequency distribution that is under investigation. Thus, it is urgent to develop additional statistical approaches that are based on a spatially explicit null model instead of the non-spatial Island model.

9.
To control for hidden population stratification in genetic-association studies, statistical methods that use marker genotype data to infer population structure have been proposed as a possible alternative to family-based designs. In principle, it is possible to infer population structure from associations between marker loci and from associations of markers with the trait, even when no information about the demographic background of the population is available. In a model in which the total population is formed by admixture between two or more subpopulations, confounding can be estimated and controlled. Current implementations of this approach have limitations, the most serious of which is that they do not allow for uncertainty in estimations of individual admixture proportions or for lack of identifiability of subpopulations in the model. We describe methods that overcome these limitations by a combination of Bayesian and classical approaches, and we demonstrate the methods by using data from three admixed populations--African American, African Caribbean, and Hispanic American--in which there is extreme confounding of trait-genotype associations because the trait under study (skin pigmentation) varies with admixture proportions. In these data sets, as many as one-third of marker loci show crude associations with the trait. Control for confounding by population stratification eliminates these associations, except at loci that are linked to candidate genes for the trait. With only 32 markers informative for ancestry, the efficiency of the analysis is 70%. These methods can deal with both confounding and selection bias in genetic-association studies, making family-based designs unnecessary.

10.
Individual genetic admixture estimates, determined both across the genome and at specific genomic regions, have been proposed for use in identifying specific genomic regions harboring loci influencing phenotypes in regional admixture mapping (RAM). Estimates of individual ancestry can be used in structured association tests (SAT) to reduce confounding induced by various forms of population substructure. Although presented as two distinct approaches, we provide a conceptual framework in which both RAM and SAT are special cases of a more general linear model. We clarify which variables are sufficient to condition upon in order to prevent spurious associations and also provide a simple closed form “semiparametric” method of evaluating the reliability of individual admixture estimates. An estimate of the reliability of individual admixture estimates is required to make an inherent errors-in-variables problem tractable. Casting RAM and SAT methods as a general linear model offers enormous flexibility enabling application to a rich set of phenotypes, populations, covariates, and situations, including interaction terms and multilocus models. This approach should allow far wider use of RAM and SAT, often using standard software, in addressing admixture as either a confounder of association studies or a tool for finding loci influencing complex phenotypes in species as diverse as plants, humans, and nonhuman animals.

11.
Large-scale, multilocus genetic association studies require powerful and appropriate statistical-analysis tools that are designed to relate genotype and haplotype information to phenotypes of interest. Many analysis approaches consider relating allelic, haplotypic, or genotypic information to a trait through use of extensions of traditional analysis techniques, such as contingency-table analysis, regression methods, and analysis-of-variance techniques. In this work, we consider a complementary approach that involves the characterization and measurement of the similarity and dissimilarity of the allelic composition of a set of individuals' diploid genomes at multiple loci in the regions of interest. We describe a regression method that can be used to relate variation in the measure of genomic dissimilarity (or "distance") among a set of individuals to variation in their trait values. Weighting factors associated with functional or evolutionary conservation information of the loci can be used in the assessment of similarity. The proposed method is very flexible and is easily extended to complex multilocus-analysis settings involving covariates. In addition, the proposed method actually encompasses both single-locus and haplotype-phylogeny analysis methods, which are two of the most widely used approaches in genetic association analysis. We showcase the method with data described in the literature. Ultimately, our method is appropriate for high-dimensional genomic data and anticipates an era when cost-effective exhaustive DNA sequence data can be obtained for a large number of individuals, over and above genotype information focused on a few well-chosen loci.

12.
Melon has tremendous fruit diversity, the product of complex interactions of consumer preferences in different countries and a wide range of agro-climatic zones. Understanding footprints of divergence underlying formation of various morphotypes is important for developing sustainable and high-quality melons. Basic understanding of population structure and linkage disequilibrium (LD) is limited in melon and has lagged behind other crops. Characterization of population structure and LD is essential for carrying out association mapping of quantitative trait loci (QTL) underlying various complex traits. Mapped single-locus microsatellite markers are known to be very valuable for resolving population structure, and 268 such markers were used in the current study to resolve the population structure and LD pattern of 87 melon accessions belonging to Eastern European, Euro-North American and Asian types. A mixed linear model was implemented to detect QTL for various fruit traits. QTL were detected at high to moderate stringency for fruit shape, fruit weight, soluble solids, and rind pressure, and the majority of them were found to be in agreement with previously published data, indicating that association mapping can be very useful for melon molecular breeding. Minor discrepancies between association and recombinant mapping approaches in the position, strength and variation explained by the QTL can be bridged if more melon groups and larger sets of accessions are involved in future studies, combined with high-throughput marker panels.

13.
Multilocus coalescent methods for inferring species trees or historical demographic parameters typically require the assumption that gene trees for sampled SNPs or DNA sequence loci are conditionally independent given their species tree. In practice, researchers have used different criteria to delimit “independent loci.” One criterion identifies sampled loci as being independent of each other if they undergo Mendelian independent assortment (IA criterion). O'Neill et al. (2013, Molecular Ecology, 22, 111–129) used this approach in their phylogeographic study of the North American tiger salamander species complex. In two other studies, researchers developed a pair of related methods that employ an independent genealogies criterion (IG criterion), which considers the effects of population‐level recombination on correlations between the gene trees of intrachromosomal loci. Here, I explain these three methods, illustrate their use with example data, and evaluate their efficacies. I show that the IA approach is more conservative, is simpler to use and requires fewer assumptions than the IG approaches. However, IG approaches can identify much larger numbers of independent loci than the IA method, which, in turn, allows researchers to obtain more precise and accurate estimates of species trees and historical demographic parameters. A disadvantage of the IG methods is that they require an estimate of the population recombination rate. Despite their drawbacks, IA and IG approaches provide molecular ecologists with promising a priori methods for selecting SNPs or DNA sequence loci that likely meet the independence assumption in coalescent‐based phylogenomic studies.

14.
The availability of a large number of dense SNPs, high-throughput genotyping and computation methods promotes the application of family-based association tests. While most of the current family-based analyses focus only on individual traits, joint analyses of correlated traits can extract more information and potentially improve the statistical power. However, current TDT-based methods are low-powered. Here, we develop a method for tests of association for bivariate quantitative traits in families. In particular, we correct for population stratification by the use of an integration of principal component analysis and TDT. A score test statistic in the variance-components model is proposed. Extensive simulation studies indicate that the proposed method not only outperforms approaches limited to individual traits when a pleiotropic effect is present, but also surpasses the power of two popular bivariate association tests, FBAT-GEE and FBAT-PC, while correcting for population stratification. When applied to the GAW16 datasets, the proposed method successfully identifies at the genome-wide level the two SNPs that show pleiotropic effects on the HDL and TG traits.

15.
Many popular methods for exploring gene-gene interactions, including the case-only approach, rely on the key assumption that physically distant loci are in linkage equilibrium in the underlying population. These methods utilize the presence of correlation between unlinked loci in a disease-enriched sample as evidence of interactions among the loci in the etiology of the disease. We use data from the CGEMS case-control genome-wide association study of breast cancer to demonstrate empirically that the case-only and related methods have the potential to create large-scale false positives because of the presence of population stratification (PS) that creates long-range linkage disequilibrium in the genome. We show that the bias can be removed by considering parametric and nonparametric methods that assume gene-gene independence between unlinked loci, not in the entire population, but only conditional on population substructure that can be uncovered based on the principal components of a suitably large panel of PS markers. Applications in the CGEMS study as well as simulated data show that the proposed methods are robust to the presence of population stratification and are yet much more powerful, relative to standard logistic regression methods that are also commonly used as robust alternatives to the case-only type methods.

16.
Many methods exist for genotyping—revealing which alleles an individual carries at different genetic loci. A harder problem is haplotyping—determining which alleles lie on each of the two homologous chromosomes in a diploid individual. Conventional approaches to haplotyping require the use of several generations to reconstruct haplotypes within a pedigree, or use statistical methods to estimate the prevalence of different haplotypes in a population. Several molecular haplotyping methods have been proposed, but have been limited to small numbers of loci, usually over short distances. Here we demonstrate a method which allows rapid molecular haplotyping of many loci over long distances. The method requires no more genotypings than pedigree methods, but requires no family material. It relies on a procedure to identify and genotype single DNA molecules, and reconstruction of long haplotypes by a ‘tiling’ approach. We demonstrate this by resolving haplotypes in two regions of the human genome, harbouring 20 and 105 single-nucleotide polymorphisms, respectively. The method can be extended to reconstruct haplotypes of arbitrary complexity and length, and can make use of a variety of genotyping platforms. We also argue that this method is applicable in situations which are intractable to conventional approaches.

17.
An emergent problem in the study of pathogen evolution is our ability to determine the extent to which their rapidly evolving genomes recombine. Such information is essential for locating pathogenicity loci using association studies, and it also directs future screening, therapeutic and vaccination strategies. Recombination also complicates the use of phylogenetic approaches to infer evolutionary parameters including selection pressures. Reliable methods that identify the presence of regions of recombination are therefore vital. We illustrate the use of an integrated model-based approach to inferring recombination structure using all available sequences of the highly variable, transforming Kaposi's sarcoma-associated herpesviral gene, ORF-K1. This technique learns the parameters of a statistical model that takes recombination hotspots, population genetic effects, and variable rates of mutation into account. As there are no known mechanisms to explain the high mutation rate in this DNA viral gene, recombination may account for some of the variability observed. We infer recombination hotspots in conserved sites such as the tyrosine kinase signaling motif, referred to here as recombination drift, as well as in nonconserved sites, a process described as recombination shift. This article contains online supplementary material.

18.
Experimental evolution studies can be used to explore genomic response to artificial and natural selection. In such studies, loci that display larger allele frequency change than expected by genetic drift alone are assumed to be directly or indirectly associated with traits under selection. However, such studies report surprisingly many loci under selection, suggesting that current tests for allele frequency change may be subject to P‐value inflation and hence be anticonservative. One factor known from genomewide association (GWA) studies to cause P‐value inflation is population stratification, such as relatedness among individuals. Here, we suggest that by treating presence of an individual in a population after selection as a binary response variable, existing GWA methods can be used to account for relatedness when estimating allele frequency change. We show that accounting for relatedness like this effectively reduces false‐positives in tests for allele frequency change in simulated data with varying levels of population structure. However, once relatedness has been accounted for, the power to detect causal loci under selection is low. Finally, we demonstrate the presence of P‐value inflation in allele frequency change in empirical data spanning multiple generations from an artificial selection experiment on tarsus length in two free‐living populations of house sparrow and correct for this using genomic control. Our results indicate that since allele frequencies in large parts of the genome may change when selection acts on a heritable trait, such selection is likely to have considerable and immediate consequences for the eco‐evolutionary dynamics of the affected populations.

19.
We examine the issue of population stratification in association-mapping studies. In case-control studies of association, population subdivision or recent admixture of populations can lead to spurious associations between a phenotype and unlinked candidate loci. Using a model of sampling from a structured population, we show that if population stratification exists, it can be detected by use of unlinked marker loci. We show that the case-control-study design, using unrelated control individuals, is a valid approach for association mapping, provided that marker loci unlinked to the candidate locus are included in the study, to test for stratification. We suggest guidelines as to the number of unlinked marker loci to use.

20.
Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genomewide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false‐positive associations. Spurious association may indeed arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent works on PD and EA tests have widely focused on improvements of test corrections for those confounding effects. Despite significant algorithmic improvements, there are still a number of open questions on how to check that false discoveries are under control and implement test corrections, or how to combine statistical tests from multiple genome scan methods. This tutorial study provides a detailed answer to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genomewide association studies, such as inflation factors and linear mixed models, benefit genome scan methods and provide guidelines for good practice while conducting statistical tests in landscape and population genomic applications. Finally, we highlight how the combination of several well‐calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic data sets.
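
The inflation factor mentioned in this abstract can be sketched in a few lines. This is a generic genomic-control calculation, not code from the tutorial: the function names and the example z-scores are hypothetical, and the sketch assumes that each locus contributes one z-score whose square follows a chi-square distribution with 1 degree of freedom under neutrality.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(z_scores):
    """Median of the observed z^2 values divided by the median expected
    under the null. Values well above 1 suggest uncorrected confounding."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN

def recalibrate(z_scores):
    """Divide every z-score by sqrt(lambda) so the rescaled statistics
    have an inflation factor of 1."""
    lam = inflation_factor(z_scores)
    return [z / lam ** 0.5 for z in z_scores]

# Hypothetical z-scores from a genome scan.
z_scores = [0.5, -1.0, 1.5, 2.0, -0.2]
lam = inflation_factor(z_scores)
adjusted = recalibrate(z_scores)
```

Rescaling by a single factor assumes that confounding inflates all test statistics equally, which is one reason the abstract also points to linear mixed models as an alternative correction.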
