Similar literature
 Found 20 similar articles; search time 15 ms
1.
Yuan X  Zhang J  Wang Y 《Biochemical genetics》2011,49(5-6):395-409
Existing simulation methods usually simulate linkage disequilibrium (LD) structures starting with an initial population that is randomly generated according to specified allele frequencies. These randomly initialized methods might be unstable because the LD level of the initial population is generally extremely low. This study presents a new algorithm, SIMLD, to simulate genome populations with real LD structures. SIMLD begins from an initial population with possibly the highest LD level, and then the LD decays to fit the desired level through processes of mating and recombination over generations. SIMLD can produce case-control samples according to various disease models. Using empirical SNP marker information from three populations of HapMap data, we implement the proposed algorithm and demonstrate a set of experimental results.
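The idea of starting from high LD and letting it decay over generations can be illustrated with the textbook expectation for a pair of loci under random mating, D_t = D_0 (1 - c)^t, where c is the recombination fraction. The sketch below is not the SIMLD algorithm itself; the function names `ld_decay` and `generations_to_reach` are hypothetical and only show how many generations a deterministic decay would need to reach a target LD level.

```python
def ld_decay(d0, c, generations):
    """Expected LD coefficient D after each generation of random mating.

    Uses the standard two-locus result D_t = D_0 * (1 - c)**t, where c is
    the recombination fraction between the loci (0 < c <= 0.5).
    This is an illustrative assumption, not the SIMLD procedure.
    """
    if not 0.0 < c <= 0.5:
        raise ValueError("recombination fraction c must be in (0, 0.5]")
    return [d0 * (1.0 - c) ** t for t in range(generations + 1)]


def generations_to_reach(d0, c, target):
    """Smallest number of generations t with D_t <= target (target > 0)."""
    if not 0.0 < c <= 0.5:
        raise ValueError("recombination fraction c must be in (0, 0.5]")
    if target <= 0.0:
        raise ValueError("target must be positive")
    t, d = 0, d0
    while d > target:
        d *= (1.0 - c)  # one generation of recombination
        t += 1
    return t


if __name__ == "__main__":
    # Two equally frequent haplotypes AB and ab give the maximum D = 0.25.
    print(ld_decay(0.25, 0.5, 2))
    print(generations_to_reach(0.25, 0.5, 0.05))
```

A stochastic simulator would add drift and mating on top of this expectation, so a realized population would scatter around the deterministic curve.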

2.
Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogeneous isolate population with emphasis on optimal rare variant inference.

3.
Linkage disequilibrium (LD) is a major concern in many genetic studies because of the markedly increased density of SNP (single nucleotide polymorphism) genotype markers. This dramatic increase in the number of SNPs may cause problems in statistical analyses, such as by introducing multiple comparisons in hypothesis testing and collinearity in logistic regression models, because of the presence of complex LD structures. Inferences must be made about the underlying genetic variation through the LD structure before applying statistical models to the data. Therefore, we introduced the textile plot to provide a visualization of LD to improve the analysis of the genetic variation present in multiple-SNP genotype data. The plot can accentuate LD by displaying specific geometrical shapes, and allows the underlying haplotype structure to be inferred without any haplotype-phasing algorithms. Application of this technique to simulated and real data sets illustrated the potential usefulness of the textile plot as an aid to the interpretation of LD in multiple-SNP genotype data. The initial results of LD mapping and haplotype analyses of disease genes are encouraging, indicating that the textile plot may be useful in disease association studies.

4.
We propose a simple model of evolution at a pair of SNP loci, under mutation, genetic drift and recombination. The developed model allows the evolution of SNPs to be considered under different demographic scenarios. We applied it to SNP data containing polymorphisms spanning 19 gene regions. We initially matched the linkage disequilibrium (LD) data only, and then we reconciled both LD and heterozygosity data. The imbalance between LD and heterozygosity data, observed for some of the analyzed genomic regions, may be a signature of selection acting in these regions. However, assuming neutrality, we obtain estimates of the age of population expansion of modern humans, which are consistent with the consensus estimates. In addition, we are able to estimate the ages of the polymorphisms observed in different genomic regions and we find that they vary widely with respect to their age. Polymorphisms at loci implicated in human disease seem to be younger than average. Our results supplement the conclusions originally obtained by Reich and co-workers for the same set of data.

5.
Linkage disequilibrium (LD) content was calculated for the Genetic Analysis Workshop 14 Affymetrix and Illumina single-nucleotide polymorphism (SNP) genome scans of the Collaborative Study on the Genetics of Alcoholism samples. Pair-wise LD was measured as both D' and r² on 505 pedigree founder individuals. The r² estimates were then used to correct the multipoint identity by descent matrix (MIBD) calculation to account for LD, and LOD scores on chromosomes 3 and 18 were calculated for COGA's ttdt3 electrophysiological trait using those MIBDs. Extensive LD was observed throughout both marker sets, and it was higher in Affymetrix's denser SNP map. However, SNP density did not solely account for Affymetrix's higher LD. MIBD estimation procedures assume linkage equilibrium to construct genotypes of non-genotyped pedigree founder individuals, and dense SNP genotyping maps are likely to contain moderate to high LD between markers. LOD score plots calculated after correction for LD followed the same general pattern as uncorrected ones. Since in our study almost half of the pedigree founders were genotyped, it is possible that LD had a minor impact on the LOD scores. Caution should probably be taken when using high density SNP maps when many non-genotyped founders are present in the study pedigrees.
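For readers unfamiliar with the two pairwise LD measures named above, the following is a minimal sketch of the standard definitions computed from haplotype frequencies. It assumes phased haplotype frequencies are already available (the study itself worked from genotypes of founders, which requires an estimation step not shown here), and the function name `ld_stats` is hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    Both loci must be polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D' scales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between the allelic states at the two loci.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_stats(0.5, 0.5, 0.5))    # complete LD
    print(ld_stats(0.25, 0.5, 0.5))   # linkage equilibrium
```

D' reaches 1 whenever at least one of the four haplotypes is absent, whereas r² reaches 1 only when the two loci carry identical information, which is why the two measures can disagree on the same marker pair.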

6.
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics studies have been proposed to treat naturally-occurring genetic variants as potential perturbants of the gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of the gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of the gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.

7.
Leblois R  Slatkin M 《Molecular ecology》2007,16(11):2237-2245
We consider an isolated population founded by a small number of individuals randomly chosen from a source population of known genetic composition at a known time in the past. We develop a Monte-Carlo maximum-likelihood method for estimating the number of founding individuals from the haplotype frequencies at several SNP (single nucleotide polymorphism) loci in a sample. We assume the isolated population was founded recently enough that mutation can be ignored and that haplotype frequencies in the source population have not changed. We apply the method to simulated data and show that it is unbiased. With a reasonable number of individuals sampled, it is possible to estimate the number of founders within a factor of 2. We show that the performance of the method is not degraded substantially if the frequencies of the rare haplotypes in the source are not known precisely and if there is some recombination. We illustrate the use of our method by applying it to a previously published data set from a recently founded population of wolves (Canis lupus) in Scandinavia.

8.
Slate J  Phua SH 《Molecular ecology》2003,12(3):597-608
Mitochondrial DNA (mtDNA) is a widely employed molecular tool in phylogeography, in the inference of human evolutionary history, in dating the domestication of livestock and in forensic science. In humans and other vertebrates the popularity of mtDNA can be partially attributed to an assumption of strict maternal inheritance, such that there is no recombination between mitochondrial lineages. The recent demonstration that linkage disequilibrium (LD) declines as a function of distance between polymorphic sites in hominid mitochondrial genomes has been interpreted as evidence of recombination between mtDNA haplotypes, and hence nonclonal inheritance. However, critics of mtDNA recombination have suggested that this association is an artefact of an inappropriate measure of LD or of sequencing error, and subsequent studies of other populations have failed to replicate the initial finding. Here we report the analysis of 16 ruminant populations and present evidence that LD significantly declines with distance in five of them. A meta-analysis of the data indicates a nonsignificant trend of LD declining with distance. Most of the earlier criticisms of patterns between LD and distance in hominid mtDNA are not applicable to this data set. Our results suggest that either ruminant mtDNA is not strictly clonal or that compensatory selection has influenced patterns of variation at closely linked sites within the mitochondrial control region. The potential impact of these processes should be considered when using mtDNA as a tool in vertebrate population genetic, phylogenetic and forensic studies.

9.
Jiang R  Marjoram P  Borevitz JO  Tavaré S 《Genetics》2006,173(4):2257-2267
This article is concerned with a statistical modeling procedure to call single-feature polymorphisms from microarray experiments. We use this new type of polymorphism data to estimate the mutation and recombination parameters in a population. The mutation parameter can be estimated via the number of single-feature polymorphisms called in the sample. For the recombination parameter, a two-feature sampling distribution is derived in a way analogous to that for the two-locus sampling distribution with SNP data. The approximate-likelihood approach using the two-feature sampling distribution is examined and found to work well. A coalescent simulation study is used to investigate the accuracy and robustness of our method. Our approach allows the utilization of single-feature polymorphism data for inference in population genetics.

10.
The relationship between linkage disequilibrium (LD) and recombination fraction can be used to infer the pattern of genetic variation and evolutionary process in humans and other systems. We described a computational framework to construct a linkage–LD map from commonly used biallelic, single-nucleotide polymorphism (SNP) markers for outcrossing plants by which the decline of LD is visualized with genetic distance. The framework was derived from an open-pollinated (OP) design composed of plants randomly sampled from a natural population and seeds from each sampled plant, enabling simultaneous estimation of the LD in the natural population and recombination fraction due to allelic co-segregation during meiosis. We modified the framework to infer evolutionary pasts of natural populations using those marker types that are segregating in a dominant manner, given their role in creating and maintaining population genetic diversity. A sophisticated two-level EM algorithm was implemented to estimate and retrieve the missing information of segregation characterized by dominant-segregating markers such as single methylation polymorphisms. The model was applied to study the relationship between linkage and LD for a non-model outcrossing species, the gymnosperm Torreya grandis, naturally distributed in the mountains of southeastern China. The linkage–LD map constructed from various types of molecular markers opens a powerful gateway for studying the history of plant evolution.

11.
A tutorial on statistical methods for population association studies   Total citations: 14, self-citations: 0, citations by others: 14
Although genetic association studies have been with us for many years, even for the simplest analyses there is little consensus on the most appropriate statistical procedures. Here I give an overview of statistical approaches to population association studies, including preliminary analyses (Hardy-Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association. My goal is to outline the key methods with a brief discussion of problems (population structure and multiple testing), avenues for solutions and some ongoing developments.
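As an illustration of the first preliminary analysis listed above, here is a minimal sketch of a Pearson chi-square test for Hardy-Weinberg equilibrium at a single biallelic SNP. The abstract does not say which test the tutorial recommends, so the choice of the chi-square form is an assumption (exact tests are often preferred for small counts), and the function name `hwe_chi_square` is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts at one biallelic SNP.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # genotype counts in exact HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete absence of heterozygotes
```

In an association study this check is typically run on controls before testing, since a strong departure can flag genotyping error or population structure.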

12.
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with , the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.

13.
Genetic variation in the human population may lead to functional variants of genes that contribute to risk for common chronic diseases such as cancer. In an effort to detect such possible predisposing variants, we constructed haplotypes for a candidate gene and tested their efficacy in association studies. We developed haplotypes consisting of 14 biallelic neutral-sequence variants that span 142 kb of the ATM locus. ATM is the gene responsible for the autosomal recessive disease ataxia-telangiectasia (AT). These ATM noncoding single-nucleotide polymorphisms (SNPs) were genotyped in nine CEPH families (89 individuals) and in 260 DNA samples from four different ethnic origins. Analysis of these data with an expectation-maximization algorithm revealed 22 haplotypes at this locus, with three major haplotypes having frequencies ≥ 0.10. Tests for recombination and linkage disequilibrium (LD) show reduced recombination and extensive LD at the ATM locus, in all four ethnic groups studied. The most striking example was found in the study population of European ancestry, in which no evidence for recombination could be discerned. The potential of ATM haplotypes for detection of genetic variants through association studies was tested by analysis of 84 individuals carrying one of three ATM coding SNPs. Each coding SNP was detected by association with an ATM haplotype. We demonstrate that association studies with haplotypes for candidate genes have significant potential for the detection of genetic backgrounds that contribute to disease.

14.
There is considerable interest in identifying and characterizing block-like patterns of linkage disequilibrium (LD; haplotype blocks) in the human genome as these may facilitate the identification of complex disease genes via genome-wide association studies. Although recombination hot-spots have been suggested as the primary mechanism to explain the block-like pattern of LD, other forces, such as genetic drift, may also be important. To this end, we have studied the effect of various recombination models on patterns of LD by using extensive simulations. As expected, haplotype blocks were observed under a model allowing recombination hot-spots. However, we also observed similar block-like patterns in the models where recombination crossovers are randomly and uniformly distributed, and we demonstrate that these blocks are generated by genetic drift. We caution that genetic drift may be an alternative mechanism (in addition to recombination hot-spots) that can lead to block-like patterns of LD. Our findings highlight the necessity of characterizing haplotype blocks in world-wide populations.

15.
Enterocytozoon bieneusi is a widespread parasite with high genetic diversity among hosts. Its natural reservoir remains elusive and data on population structure are available only in isolates from primates. Here we describe a population genetic study of 101 E. bieneusi isolates from pigs using sequence analysis of the ribosomal internal transcribed spacer (ITS) and four mini- and microsatellite markers. The presence of strong linkage disequilibrium (LD) and limited genetic recombination indicated a clonal structure for the population. Bayesian inference of phylogeny, structural analysis, and principal coordinates analysis separated the overall population into three subpopulations (SP3 to SP5) with genetic segregation of the isolates at some geographic level. Comparative analysis showed the differentiation of SP3 to SP5 from the two known E. bieneusi subpopulations (SP1 and SP2) from primates. The placement of a human E. bieneusi isolate in pig subpopulation SP4 supported the zoonotic potential of some E. bieneusi isolates. Network analysis showed directed evolution of SP5 to SP3/SP4 and SP1 to SP2. The high LD and low number of inferred recombination events are consistent with the possibility of host adaptation in SP2, SP3, and SP4. In contrast, the reduced LD and high genetic diversity in SP1 and SP5 might be results of a broad host range and adaptation to new host environments. The data provide evidence of the potential occurrence of host adaptation in some E. bieneusi isolates that belong to the zoonotic ITS Group 1.

16.
The variation of the recombination rate along chromosomal DNA is one of the important determinants of the patterns of linkage disequilibrium. A number of inferential methods have been developed which estimate the recombination rate and its variation from population genetic data. The majority of these methods are based on modelling the genealogical process underlying a sample of DNA sequences and thus explicitly include a model of the demographic process. Here we propose a different inferential procedure based on a previously introduced framework where recombination is modelled as a point process along a DNA sequence. The approach infers regions containing putative hotspots based on the inferred minimum number of recombination events; it thus depends only indirectly on the underlying population demography. A Poisson point process model with local rates is then used to infer patterns of recombination rate estimation in a fully Bayesian framework. We illustrate this new approach by applying it to several population genetic datasets, including a region with an experimentally confirmed recombination hotspot.

17.
snp.plotter is a newly developed R package which produces high-quality plots of results from genetic association studies. The main features of the package include options to display a linkage disequilibrium (LD) plot below the P-value plot using either the r² or D' LD metric, to set the X-axis to equal spacing or to use the physical map of markers, and to specify plot labels, colors, symbols and LD heatmap color scheme. snp.plotter can plot single SNP and/or haplotype data and simultaneously plot multiple sets of results. R is a free software environment for statistical computing and graphics available for most platforms. The proposed package provides a simple way to convey both association and LD information in a single appealing graphic for genetic association studies. AVAILABILITY: Downloadable R package and example datasets are available at http://cbdb.nimh.nih.gov/~kristin/snp.plotter.html and http://www.r-project.org.

18.
Recent advances in sequencing allow population‐genomic data to be generated for virtually any species. However, approaches to analyse such data lag behind the ability to generate it, particularly in nonmodel species. Linkage disequilibrium (LD, the nonrandom association of alleles from different loci) is a highly sensitive indicator of many evolutionary phenomena including chromosomal inversions, local adaptation and geographical structure. Here, we present linkage disequilibrium network analysis (LDna), which accesses information on LD shared between multiple loci genomewide. In LD networks, vertices represent loci, and connections between vertices represent the LD between them. We analysed such networks in two test cases: a new restriction‐site‐associated DNA sequence (RAD‐seq) data set for Anopheles baimaii, a Southeast Asian malaria vector; and a well‐characterized single nucleotide polymorphism (SNP) data set from 21 three‐spined stickleback individuals. In each case, we readily identified five distinct LD network clusters (single‐outlier clusters, SOCs), each comprising many loci connected by high LD. In A. baimaii, further population‐genetic analyses supported the inference that each SOC corresponds to a large inversion, consistent with previous cytological studies. For sticklebacks, we inferred that each SOC was associated with a distinct evolutionary phenomenon: two chromosomal inversions, local adaptation, population‐demographic history and geographic structure. LDna is thus a useful exploratory tool, able to give a global overview of LD associated with diverse evolutionary phenomena and identify loci potentially involved. LDna does not require a linkage map or reference genome, so it is applicable to any population‐genomic data set, making it especially valuable for nonmodel species.

19.
As large-scale sequencing efforts turn from single genome sequencing to polymorphism discovery, single nucleotide polymorphisms (SNPs) are becoming an increasingly important class of population genetic data. But because of the ascertainment biases introduced by many methods of SNP discovery, most SNP data cannot be analyzed using classical population genetic methods. Statistical methods must instead be developed that can explicitly take into account each method of SNP discovery. Here we review some of the current methods for analyzing SNPs and derive sampling distributions for single SNPs and pairs of SNPs for some common SNP discovery schemes. We also show that the ascertainment scheme has a large effect on the estimation of linkage disequilibrium and recombination, and describe some methods of correcting for ascertainment biases when estimating recombination rates from SNP data.

20.
Jiang N  Wang M  Jia T  Wang L  Leach L  Hackett C  Marshall D  Luo Z 《PloS one》2011,6(8):e23192

Background

It is well established that the theoretical kernel of the recently surging genome-wide association study (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either through predicting the structure parameters or correcting inflation in the test statistic due to the stratification, these may not be feasible or may impose further statistical problems in practical implementation.

Methodology

We propose here a novel statistical method to control spurious LD arising from population structure in GWAS by incorporating a control marker into testing for significance of genetic association of a polymorphic marker with phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and accounts properly for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations.

Results/Conclusions

The analyses show that the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure when compared with two other widely used methods in the GWAS literature.
