1.

Reducing cryptic relatedness in genomic data sets via a central node exclusion algorithm

Pablo A. S. Fonseca Thiago P. Leal Fernanda C. Santos Mateus H. Gouveia Samir Id‐Lahoucine Izinara C. Rosse Ricardo V. Ventura Frank A. T. Bruneli Marco A. Machado Maria Gabriela C. D. Peixoto Eduardo Tarazona‐Santos Maria Raquel S. Carvalho 《Molecular ecology resources》2018,18(3):435-447

Cryptic relatedness is a confounding factor in genetic diversity and genetic association studies. Developing strategies to reduce cryptic relatedness in a sample is a crucial step for downstream genetic analyses. This study uses a node selection algorithm, based on network degree centrality, and evaluates its applicability and its impact on the evaluation of genetic diversity and population stratification. A total of 1,036 Guzerá (Bos indicus) females were genotyped using the Illumina Bovine SNP50 v2 BeadChip. Four strategies were compared. The first and second strategies consist of an iterative exclusion of the most related individuals based on the PLINK kinship coefficient (φij) and VanRaden's φij, respectively. The third and fourth strategies were based on a node selection algorithm. The fourth strategy, Network G matrix, preserved the largest number of individuals, with better diversity and better representation of the initial sample. Determination of the most probable number of populations was directly affected by the kinship metric. Network G matrix was the best strategy for reducing relatedness because it produced a larger sample with more distant individuals, a distribution more similar to that of the full data set in the MDS plots, and a better representation of the population structure. Resampling strategies using VanRaden's φij as the relationship metric were better for inferring the relationships among individuals. Moreover, the resampling strategies directly affect genomic inflation values in genome-wide association studies. The use of the node selection algorithm also results in better selection of the most central individuals for removal, providing a more representative sample.
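The degree-centrality idea in this abstract can be illustrated with a minimal Python sketch. It assumes a relatedness network in which an edge joins two individuals whose pairwise kinship exceeds a chosen threshold, and it repeatedly removes the individual with the most remaining edges until no edge is left. The function name, the threshold, and the tie-breaking rule are hypothetical choices for illustration; this is not the authors' published implementation.

```python
from collections import defaultdict


def central_node_exclusion(kinship, threshold):
    """Greedy pruning of a relatedness network (illustrative sketch only).

    kinship: dict mapping (id_a, id_b) pairs to a pairwise kinship coefficient;
             every individual is assumed to appear in at least one pair.
    threshold: hypothetical cut-off; pairs above it are joined by an edge.
    Returns the set of retained individuals, no two of which exceed the threshold.
    """
    # Build an undirected adjacency list from the pairs above the threshold.
    adjacency = defaultdict(set)
    individuals = set()
    for (a, b), phi in kinship.items():
        individuals.update((a, b))
        if a != b and phi > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    retained = set(individuals)
    while True:
        # Degree centrality: number of close relatives still in the sample.
        degrees = {node: len(adjacency[node] & retained) for node in retained}
        if not degrees or max(degrees.values()) == 0:
            return retained
        # Remove the most central individual; ties go to the smallest ID.
        most_central = max(sorted(degrees), key=lambda node: degrees[node])
        retained.discard(most_central)
```

In a star-shaped family (one parent related to three otherwise unrelated offspring), removing the single central node keeps three individuals, whereas dropping one member of each related pair in turn could retain as few as one.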

2.

Robust genomic control and robust delta centralization tests for case-control association studies

Zang Y Zhang H Yang Y Zheng G 《Human heredity》2007,63(3-4):187-195

The population-based case-control design is a powerful approach for detecting susceptibility markers of a complex disease. However, this approach may lead to spurious association when there is population substructure: population stratification (PS) or cryptic relatedness (CR). Two simple approaches to correct for the population substructure are genomic control (GC) and delta centralization (DC). GC uses the variance inflation factor to correct for the variance distortion of a test statistic, and DC centralizes the non-central chi-square distribution of the test statistic. Both GC and DC have been studied for case-control association studies mainly under a specific genetic model (e.g. recessive, additive or dominant), under which an optimal trend test is available. The genetic model is usually unknown for many complex diseases. In this situation, we study the performance of three robust tests based on the GC and DC corrections in the presence of the population substructure. Our results show that, when the genetic model is unknown, the DC- (or GC-) corrected maximum and Pearson's association tests are robust and have good control of Type I error and high power relative to the optimal trend tests in the presence of PS (or CR).

3.

Estimating kinship in admixed populations

Thornton T Tang H Hoffmann TJ Ochs-Balcom HM Caan BJ Risch N 《American journal of human genetics》2012,91(1):122-138

Genome-wide association studies (GWASs) are commonly used for the mapping of genetic loci that influence complex traits. A problem that is often encountered in both population-based and family-based GWASs is that of identifying cryptic relatedness and population stratification because it is well known that failure to appropriately account for both pedigree and population structure can lead to spurious association. A number of methods have been proposed for identifying relatives in samples from homogeneous populations. A strong assumption of population homogeneity, however, is often untenable, and many GWASs include samples from structured populations. Here, we consider the problem of estimating relatedness in structured populations with admixed ancestry. We propose a method, REAP (relatedness estimation in admixed populations), for robust estimation of identity by descent (IBD)-sharing probabilities and kinship coefficients in admixed populations. REAP appropriately accounts for population structure and ancestry-related assortative mating by using individual-specific allele frequencies at SNPs that are calculated on the basis of ancestry derived from whole-genome analysis. In simulation studies with related individuals and admixture from highly divergent populations, we demonstrate that REAP gives accurate IBD-sharing probabilities and kinship coefficients. We apply REAP to the Mexican Americans in Los Angeles, California (MXL) population sample of release 3 of phase III of the International Haplotype Map Project; in this sample, we identify third- and fourth-degree relatives who have not previously been reported. We also apply REAP to the African American and Hispanic samples from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) study, in which hundreds of pairs of cryptically related individuals have been identified.

4.

Robust genomic control for association studies

Zheng G Freidlin B Gastwirth JL 《American journal of human genetics》2006,78(2):350-356

Population-based case-control studies are a useful method to test for a genetic association between a trait and a marker. However, the analysis of the resulting data can be affected by population stratification or cryptic relatedness, which may inflate the variance of the usual statistics, resulting in a higher-than-nominal rate of false-positive results. One approach to preserving the nominal type I error is to apply genomic control, which adjusts the variance of the Cochran-Armitage trend test by calculating the statistic on data from null loci. This enables one to estimate any additional variance in the null distribution of statistics. When the underlying genetic model (e.g., recessive, additive, or dominant) is known, genomic control can be applied to the corresponding optimal trend tests. In practice, however, the mode of inheritance is unknown. The genotype-based χ² test for a general association between the trait and the marker does not depend on the underlying genetic model. Since this general association test has 2 degrees of freedom (df), the existing formulas for estimating the variance factor by use of genomic control are not directly applicable. By expressing the general association test in terms of two Cochran-Armitage trend tests, one can apply genomic control to each of the two trend tests separately, thereby adjusting the χ² statistic. The properties of this robust genomic control test with 2 df are examined by simulation. This genomic control-adjusted 2-df test has control of type I error and achieves reasonable power, relative to the optimal tests for each model.
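The two ingredients named in this abstract, the Cochran-Armitage trend test and genomic control, can be sketched in Python as follows. The sketch assumes additive scores (0, 1, 2), estimates the variance inflation factor λ as the median of 1-df statistics at null loci divided by the median of a χ² distribution with 1 df (about 0.455), and floors λ at 1. It shows only the 1-df building block; the robust 2-df test described above, which applies genomic control to two trend tests separately, is not reproduced. Function and constant names are illustrative.

```python
from statistics import NormalDist, median

ADDITIVE_SCORES = (0.0, 1.0, 2.0)

# Median of a 1-df chi-square distribution: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def trend_test_chi2(cases, controls, scores=ADDITIVE_SCORES):
    """Cochran-Armitage trend statistic (approximately chi-square, 1 df).

    cases, controls: genotype counts (aa, Aa, AA) in each group.
    """
    r_total, s_total = sum(cases), sum(controls)
    n_total = r_total + s_total
    n = [r + s for r, s in zip(cases, controls)]
    k = len(scores)
    # T = sum_i x_i (S r_i - R s_i)
    t = sum(x * (s_total * r - r_total * s)
            for x, r, s in zip(scores, cases, controls))
    # Var(T) = (R S / N) [sum_i x_i^2 n_i (N - n_i) - 2 sum_{i<j} x_i x_j n_i n_j]
    var = (r_total * s_total / n_total) * (
        sum(scores[i] ** 2 * n[i] * (n_total - n[i]) for i in range(k))
        - 2 * sum(scores[i] * scores[j] * n[i] * n[j]
                  for i in range(k) for j in range(i + 1, k))
    )
    return t * t / var


def genomic_control(null_stats, candidate_stat):
    """Estimate lambda from null-locus statistics and deflate a candidate statistic."""
    lam = max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)  # never inflate when lambda < 1
    return lam, candidate_stat / lam
```

For example, case counts (20, 40, 40) and control counts (40, 40, 20) give a trend statistic of 40/3 (about 13.3); if the null loci implied λ = 2, the corrected value would be about 6.7.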

5.

Alternative parameterizations of relatedness in whole genome association analysis of pre-weaning traits of Nelore-Angus calves

David G. Riley Clare A. Gill Andy D. Herring Penny K. Riggs Jason E. Sawyer James O. Sanders 《Genetics and molecular biology》2014,37(3):518-525

Gestation length, birth weight, and weaning weight of F₂ Nelore-Angus calves (n = 737) with designed extensive full-sibling and half-sibling relatedness were evaluated for association with 34,957 SNP markers. In analyses of birth weight, random relatedness was modeled three ways: 1) none, 2) random animal, pedigree-based relationship matrix, or 3) random animal, genomic relationship matrix. Detected birth weight-SNP associations were 1,200, 735, and 31 for those parameterizations respectively; each additional model refinement removed associations that apparently were a result of the built-in stratification by relatedness. Subsequent analyses of gestation length and weaning weight modeled genomic relatedness; there were 40 and 26 trait-marker associations detected for those traits, respectively. Birth weight associations were on BTA14 except for a single marker on BTA5. Gestation length associations included 37 SNP on BTA21, 2 on BTA27 and one on BTA3. Weaning weight associations were on BTA14 except for a single marker on BTA10. Twenty-one SNP markers on BTA14 were detected in both birth and weaning weight analyses.
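A genomic relationship matrix of the kind modeled here is often built with VanRaden's first method, G = ZZ′ / (2 Σ p_j(1 − p_j)), where Z holds genotypes coded 0/1/2 and centred by 2p_j. The abstract does not state which construction was used, so the pure-Python sketch below is an assumption; it estimates allele frequencies p_j from the sample itself and applies no further adjustment.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix, VanRaden method 1 (illustrative sketch).

    genotypes: list of per-individual lists of allele counts (0, 1, 2) at the
               same SNPs. Allele frequencies are estimated from this sample.
    Raises ZeroDivisionError if every SNP is monomorphic.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    p = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snp)]
    # Centre each genotype by its expectation 2p.
    z = [[ind[j] - 2 * p[j] for j in range(n_snp)] for ind in genotypes]
    # Scaling makes G comparable to a pedigree relationship matrix.
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n_ind)]
            for i in range(n_ind)]
```

Off-diagonal entries of G estimate additive genomic relationships, which are roughly twice the corresponding kinship coefficients; diagonal entries reflect each individual's homozygosity relative to the sample.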

6.

Confounding from cryptic relatedness in case-control association studies

Voight BF Pritchard JK 《PLoS genetics》2005,1(3):e32

Case-control association studies are widely used in the search for genetic variants that contribute to human diseases. It has long been known that such studies may suffer from high rates of false positives if there is unrecognized population structure. It is perhaps less widely appreciated that so-called “cryptic relatedness” (i.e., kinship among the cases or controls that is not known to the investigator) might also potentially inflate the false positive rate. Until now there has been little work to assess how serious this problem is likely to be in practice. In this paper, we develop a formal model of cryptic relatedness, and study its impact on association studies. We provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. Our analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. However, in contrast, studies where there is a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives. Furthermore, cryptic relatedness may be a serious concern in founder populations that have grown rapidly and recently from a small size. As an example, we analyze the impact of excess relatedness among cases for six phenotypes measured in the Hutterite population.

7.

Rapid estimation of genetic relatedness among heterogeneous populations of alfalfa by random amplification of bulked genomic DNA samples

Kangfu Yu K. P. Pauls 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1993,86(6):788-794

A procedure which involves the use of RAPD markers, obtained from bulked genomic DNA samples, to estimate genetic relatedness among heterogeneous populations is demonstrated in this study. Bulked samples of genomic DNA from several alfalfa plants per population were used as templates in polymerase chain reactions with different random primers to produce RAPD patterns. The results show that the RAPD patterns can be used to determine genetic distances among heterogeneous populations and cultivars which correspond to their known relatedness. The results also indicate that, by using ten primers with bulked DNA samples from ten individuals, 18–72 populations or cultivars can be distinguished from each other on the basis of at least one unique RAPD marker. We anticipate that DNA bulking and methods for comparing RAPD patterns will be very useful for identifying cultivars, for studying phylogenetic relationships among heterogeneous populations and for selecting parents to maximize heterosis in crosses.

8.

William David Weber Nicola M. Anthony Simon P. Lailvaux 《Ecology and evolution》2021,11(6):2886

The way that individuals are spatially organized in their environment is a fundamental population characteristic affecting social structure, mating system, and reproductive ecology. However, for many small or cryptic species, the factors driving the spatial distribution of individuals within a population are poorly understood and difficult to quantify. We combined microsatellite data, remote sensing, and mark–recapture techniques to test the relative importance of body size and relatedness in determining the spatial distribution of male Anolis carolinensis individuals within a focal population over a five‐year period. We found that males maintain smaller home ranges than females. We found no relationship between male body size and home range size, nor any substantial impact of relatedness on geographic proximity. Instead, the main driver of male spatial distribution in this population was differences in body size. We also found no evidence for offspring inheritance of their parent's territories. Males were never sampled within their father's territory, providing strong support for male‐biased dispersal. This study introduces a novel approach by combining standard mark release capture data with measures of pairwise relatedness, body size, and GPS locations to better understand the factors that drive the spatial distribution of individuals within a population.

9.

Controlling false discoveries in genome scans for selection

Olivier François Helena Martins Kevin Caye Sean D. Schoville 《Molecular ecology》2016,25(2):454-469

Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genomewide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false‐positive associations. Spurious association may indeed arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent works on PD and EA tests have widely focused on improvements of test corrections for those confounding effects. Despite significant algorithmic improvements, there are still a number of open questions on how to check that false discoveries are under control and implement test corrections, or how to combine statistical tests from multiple genome scan methods. This tutorial study provides a detailed answer to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genomewide association studies, such as inflation factors and linear mixed models, benefit genome scan methods and provide guidelines for good practice while conducting statistical tests in landscape and population genomic applications. Finally, we highlight how the combination of several well‐calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic data sets.

10.

Too much of a good thing? Finding the most informative genetic data set to answer conservation questions

Elspeth A. McLennan Belinda R. Wright Katherine Belov Carolyn J. Hogg Catherine E. Grueber 《Molecular ecology resources》2019,19(3):659-671

Molecular markers are a useful tool allowing conservation and population managers to shed light on genetic processes affecting threatened populations. However, as technological advancements in molecular techniques continue to evolve, conservationists are frequently faced with new genetic markers, each with nuanced variation in their characteristics as well as advantages and disadvantages for informing various questions. We used a well‐studied population of Tasmanian devils (Sarcophilus harrisii) from Maria Island, Tasmania, to illustrate the issues associated with combining multiple genetic data sets and to help answer a question posed by many population managers: which data set will provide the most precise and accurate estimates of the population processes we are trying to measure? We analysed individual heterozygosity (as internal relatedness, IR) of 96 individuals, calculated using four genetic marker types (putatively neutral microsatellites, major histocompatibility complex‐linked microsatellites, reduced representation sequencing, and candidate region resequencing). We found no correlation in IR values across marker types, suggesting that various genetic markers reflect different aspects of genomic diversity. In addition, some marker types were more informative than others for conservation decision‐making. Reduced representation sequencing provided the highest precision (lowest error) for estimating population‐level genetic diversity, and most closely reflected genome‐wide heterozygosity both theoretically and empirically. Within the conservation context, our results highlight important considerations when choosing a molecular technique for wildlife genetics.
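Internal relatedness, the heterozygosity measure used in this study, is commonly computed as IR = (2H − Σf_i) / (2N − Σf_i), where H is the number of homozygous loci, N the number of loci typed, and f_i the population frequency of the i-th allele carried by the individual. The Python sketch below assumes that formula, diploid genotypes stored as allele pairs, and allele frequencies estimated from a reference sample; it is an illustration, not the pipeline used by the authors. Positive IR indicates more homozygosity than expected from allele frequencies, negative IR less.

```python
from collections import Counter


def internal_relatedness(individual, population):
    """Internal relatedness (IR) of one diploid individual (illustrative sketch).

    individual: list of (allele1, allele2) genotypes, one per locus.
    population: list of individuals in the same format, used for allele frequencies.
    If every locus is fixed in the population, the denominator is zero.
    """
    n_loci = len(individual)
    homozygous = sum(1 for a, b in individual if a == b)
    sum_f = 0.0
    for locus, (a, b) in enumerate(individual):
        # Allele counts at this locus across the reference population.
        counts = Counter()
        for other in population:
            counts.update(other[locus])
        total = sum(counts.values())
        # Add the frequency of each of the two alleles this individual carries.
        sum_f += counts[a] / total + counts[b] / total
    return (2 * homozygous - sum_f) / (2 * n_loci - sum_f)
```

Because IR depends on the allele frequencies of whichever marker panel is used, values computed from different marker types need not agree, which is the comparison this study makes.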

11.

Comparison of population-based association study methods correcting for population stratification

Zhang F Wang Y Deng HW 《PloS one》2008,3(10):e3392

Population stratification can cause spurious associations in population-based association studies. Several statistical methods have been proposed to reduce the impact of population stratification on population-based association studies. We simulated a set of stratified populations based on the real haplotype data from the HapMap ENCODE project, and compared the relative power, type I error rates, accuracy and positive prediction value of four prevailing population-based association study methods: traditional case-control tests, structured association (SA), genomic control (GC) and principal components analysis (PCA) under various population stratification levels. Additionally, we evaluated the effects of sample sizes and frequencies of disease-susceptibility alleles on the performance of the four analytical methods in the presence of population stratification. We found that the performance of PCA was very stable under various scenarios. Our comparison results suggest that SA and PCA have comparable performance, if sufficient ancestry-informative markers are used in SA analysis. GC appeared to be strongly conservative in significantly stratified populations. It may be better to apply GC in stratified populations with a low stratification level. Our study intends to provide a practical guideline for researchers to select proper study methods and make appropriate inference of the results in population-based association studies.

12.

Genomic and genealogical investigation of the French Canadian founder population structure

Roy-Gagnon MH Moreau C Bherer C St-Onge P Sinnett D Laprise C Vézina H Labuda D 《Human genetics》2011,129(5):521-531

Characterizing the genetic structure of worldwide populations is important for understanding human history and is essential to the design and analysis of genetic epidemiological studies. In this study, we examined genetic structure and distant relatedness and their effect on the extent of linkage disequilibrium (LD) and homozygosity in the founder population of Quebec (Canada). In the French Canadian founder population, such analysis can be performed using both genomic and genealogical data. We investigated genetic differences, extent of LD, and homozygosity in 140 individuals from seven sub-populations of Quebec characterized by different demographic histories reflecting complex founder events. Genetic findings from genome-wide single nucleotide polymorphism data were correlated with genealogical information on each of these sub-populations. Our genomic data showed significant population structure and relatedness present in the contemporary Quebec population, also reflected in LD and homozygosity levels. Our extended genealogical data corroborated these findings and indicated that this structure is consistent with the settlement patterns involving several founder events. This provides an independent and complementary validation of genomic-based studies of population structure. Combined genomic and genealogical data in the Quebec founder population provide insights into the effects of the interplay of two important sources of bias in genetic epidemiological studies, unrecognized genetic structure and cryptic relatedness.

13.

Confounding from cryptic relatedness in haplotype-based association studies

Feng Zhang Hong-Wen Deng 《Genetica》2010,138(9-10):945-950

Cryptic relatedness has been suggested to be an important source of confounding in population-based association studies (PBAS). The impact of cryptic relatedness on the performance of haplotype phase inference and haplotype-based association tests is not clear. In this study, we used the HapMap genetic data to simulate a set of related samples. We evaluated the accuracy of haplotype phase inferred by PHASE 2.1 and calculated the power, type I error rates, accuracy and positive prediction value (PPV) of haplotype frequency-based association tests (HFAT) and haplotype similarity-based association tests (HSAT) under various scenarios, considering relatedness levels, disease models and sample sizes. Cryptic relatedness appeared to slightly increase the accuracy of haplotype phase inference. We observed a significant negative effect of cryptic relatedness on the performance of HFAT and HSAT. Ignoring cryptic relatedness may increase spurious association results in haplotype-based PBAS.

14.

A semiparametric test to detect associations between quantitative traits and candidate genes in structured populations

Li M Reilly C Hanson T 《Bioinformatics (Oxford, England)》2008,24(20):2356-2362

MOTIVATION: Although population-based association mapping may be subject to the bias caused by population stratification, alternative methods that are robust to population stratification such as family-based linkage analysis have lower mapping resolution. Recently, various statistical methods robust to population stratification were proposed for association studies, using unrelated individuals to identify associations between candidate genes and traits of interest. The association between a candidate gene and a quantitative trait is often evaluated via a regression model with inferred population structure variables as covariates, where the residual distribution is customarily assumed to be from a symmetric and unimodal parametric family, such as a Gaussian, although this may be inappropriate for the analysis of many real-life datasets. RESULTS: In this article, we propose a new structured association (SA) test. Our method corrects for continuous population stratification by first deriving population structure and kinship matrices through a set of random genetic markers and then modeling the relationship between trait values, genotypic scores at a candidate marker and genetic background variables through a semiparametric model, where the error distribution is modeled as a mixture of Polya trees centered around a normal family of distributions. We compared our model to the existing SA tests in terms of model fit, type I error rate, power, precision and accuracy by application to a real dataset as well as simulated datasets.

15.

Ten microsatellite loci from Northern Bobwhite (Colinus virginianus)

Brant C. Faircloth Theron M. Terhune Nancy A. Schable Travis C. Glenn William E. Palmer John P. Carroll 《Conservation Genetics》2009,10(3):535-538

16.

Investigating Population Genetic Structure in a Highly Mobile Marine Organism: The Minke Whale Balaenoptera acutorostrata acutorostrata in the North East Atlantic

María Quintela Hans J. Skaug Nils Øien Tore Haug Bjørghild B. Seliussen Hiroko K. Solvang Christophe Pampoulie Naohisa Kanda Luis A. Pastene Kevin A. Glover 《PloS one》2014,9(9)

Inferring the number of genetically distinct populations and their levels of connectivity is of key importance for the sustainable management and conservation of wildlife. This represents an extra challenge in the marine environment where there are few physical barriers to gene-flow, and populations may overlap in time and space. Several studies have investigated the population genetic structure within the North Atlantic minke whale with contrasting results. In order to address this issue, we analyzed ten microsatellite loci and 331 bp of the mitochondrial D-loop on 2990 whales sampled in the North East Atlantic in the period 2004 and 2007–2011. The primary findings were: (1) No spatial or temporal genetic differentiations were observed for either class of genetic marker. (2) mtDNA identified three distinct mitochondrial lineages without any underlying geographical pattern. (3) Nuclear markers showed evidence of a single panmictic population in the NE Atlantic according to STRUCTURE's highest average likelihood found at K = 1. (4) When K = 2 was accepted, based on Evanno's test, whales were divided into two more or less equally sized groups that showed significant genetic differentiation between them but without any sign of underlying geographic pattern. However, mtDNA for these individuals did not corroborate the differentiation. (5) In order to further evaluate the potential for cryptic structuring, a set of 100 in silico generated panmictic populations was examined using the same procedures as above, showing genetic differentiation between two artificially divided groups, similar to the aforementioned observations. This demonstrates that clustering methods may spuriously reveal cryptic genetic structure. Based upon these data, we find no evidence to support the existence of spatial or cryptic population genetic structure of minke whales within the NE Atlantic. However, in order to conclusively evaluate population structure within this highly mobile species, more markers will be required.

17.

Zhang S Zhao H 《American journal of human genetics》2001,69(3):601-614

Although genetic association studies using unrelated individuals may be subject to bias caused by population stratification, alternative methods that are robust to population stratification, such as family-based association designs, may be less powerful. Furthermore, it is often more feasible and less expensive to collect unrelated individuals. Recently, several statistical methods have been proposed for case-control association tests in a structured population; these methods may be robust to population stratification. In the present study, we propose a quantitative similarity-based association test (QSAT) to identify association between a candidate marker and a quantitative trait of interest, through use of unrelated individuals. For the QSAT, we first determine whether two individuals are from the same subpopulation or from different subpopulations, using genotype data at a set of independent markers. We then perform an association test between the candidate marker and the quantitative trait, through incorporation of such information. Simulation results based on either coalescent models or empirical population genetics data show that the QSAT has a correct type I error rate in the presence of population stratification and that the power of the QSAT is higher than that of family-based association designs.

18.

Use of unlinked genetic markers to detect population stratification in association studies.

J K Pritchard N A Rosenberg 《American journal of human genetics》1999,65(1):220-228

We examine the issue of population stratification in association-mapping studies. In case-control studies of association, population subdivision or recent admixture of populations can lead to spurious associations between a phenotype and unlinked candidate loci. Using a model of sampling from a structured population, we show that if population stratification exists, it can be detected by use of unlinked marker loci. We show that the case-control-study design, using unrelated control individuals, is a valid approach for association mapping, provided that marker loci unlinked to the candidate locus are included in the study, to test for stratification. We suggest guidelines as to the number of unlinked marker loci to use.

19.

Epigenetic variation predicts regional and local intraspecific functional diversity in a perennial herb

Pilar Bazaga 《Molecular ecology》2014,23(20):4926-4938

The ecological significance of epigenetic variation has been generally inferred from studies on model plants under artificial conditions, but the importance of epigenetic differences between individuals as a source of intraspecific diversity in natural plant populations remains essentially unknown. This study investigates the relationship between epigenetic variation and functional plant diversity by conducting epigenetic (methylation‐sensitive amplified fragment length polymorphisms, MSAP) and genetic (amplified fragment length polymorphisms, AFLP) marker–trait association analyses for 20 whole‐plant, leaf and regenerative functional traits in a large sample of wild‐growing plants of the perennial herb Helleborus foetidus from ten sampling sites in south‐eastern Spain. Plants differed widely in functional characteristics, and exhibited greater epigenetic than genetic diversity, as shown by per cent polymorphism of MSAP fragments (92%) or markers (69%) greatly exceeding that for AFLP ones (41%). After controlling for genetic structuring and possible cryptic relatedness, every functional trait considered exhibited a significant association with at least one AFLP or MSAP marker. A total of 27 MSAP (13.0% of total) and 12 AFLP (4.4%) markers were involved in significant associations, which explained on average 8.2% and 8.0% of trait variance, respectively. Individual MSAP markers were more likely to be associated with functional traits than AFLP markers. Between‐site differences in multivariate functional diversity were directly related to variation in multilocus epigenetic diversity after multilocus genetic diversity was statistically accounted for. Results suggest that epigenetic variation can be an important source of intraspecific functional diversity in H. foetidus, possibly endowing this species with the capacity to exploit a broad range of ecological conditions despite its modest genetic diversity.

20.

M Gowda Y Zhao T Würschum C FH Longin T Miedaner E Ebmeyer R Schachschneider E Kazman J Schacht J-P Martinant M F Mette J C Reif 《Heredity》2014,112(5):552-561

The accuracy of genomic selection depends on the relatedness between the members of the set in which marker effects are estimated based on evaluation data and the types for which performance is predicted. Here, we investigate the impact of relatedness on the performance of marker-assisted selection for fungal disease resistance in hybrid wheat. A large and diverse mapping population of 1739 elite European winter wheat inbred lines and hybrids was evaluated for powdery mildew, leaf rust and stripe rust resistance in multi-location field trials and fingerprinted with 9 k and 90 k SNP arrays. Comparison of the accuracies of prediction achieved with data sets from the two marker arrays revealed a crucial role for a sufficiently high marker density in genome-wide association mapping. Cross-validation studies using test sets with varying degrees of relationship to the corresponding estimation sets revealed that close relatedness leads to a substantial increase in the proportion of total genotypic variance explained by the identified QTL and consequently to an overoptimistic judgment of the precision of marker-assisted selection.