Similar literature
Found 20 similar articles (search time: 31 ms)
1.
There is considerable ethno-linguistic and genetic variation among human populations in Asia, although tracing the origins of this diversity is complicated by migration events. Thailand is at the center of Mainland Southeast Asia (MSEA), a region within Asia that has not been extensively studied. Genetic substructure may exist in the Thai population, since waves of migration from southern China throughout its recent history may have contributed to substantial gene flow. Autosomal SNP data were collated for 438,503 markers from 992 Thai individuals. Using the available self-reported regional origin, four Thai subpopulations genetically distinct from each other and from other Asian populations were resolved by Neighbor-Joining analysis using a 41,569 marker subset. Using an independent Principal Components-based unsupervised clustering approach, four major MSEA subpopulations were resolved in which regional bias was apparent. A major ancestry component was common to these MSEA subpopulations and distinguishes them from other Asian subpopulations. On the other hand, these MSEA subpopulations were admixed with other ancestries, in particular one shared with Chinese. Subpopulation clustering using only Thai individuals and the complete marker set resolved four subpopulations, which are distributed differently across Thailand. A Sino-Thai subpopulation was concentrated in the Central region of Thailand, although this constituted a minority in an otherwise diverse region. Among the most highly differentiated markers which distinguish the Thai subpopulations, several map to regions known to affect phenotypic traits such as skin pigmentation and susceptibility to common diseases. The subpopulation patterns elucidated have important implications for evolutionary and medical genetics. The subpopulation structure within Thailand may reflect the contributions of different migrants throughout the history of MSEA. 
The information will also be important for genetic association studies to account for population-structure confounding effects.

2.
In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of Identity-By-Descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling IBD segments placed within each ancestral individual. Unlike previous approaches, our method is able to accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to North America from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 with genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family through tracking haplotype transmission over time. Using our algorithm, thread, we are able to reconstruct an average of 224 ancestral individuals per chromosome. For these ancestral individuals, on average we reconstruct 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct; we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. thread was developed for endogamous populations, but can be applied to any extensive pedigree with the recent generations genotyped.
We anticipate that this type of practical ancestral reconstruction will become more common and necessary to understand rare and complex heritable diseases in extended families.

3.
4.
It is an assumption of large, population-based datasets that samples are annotated accurately whether they correspond to known relationships or unrelated individuals. These annotations are key for a broad range of genetics applications. While many methods are available to assess relatedness that involve estimates of identity-by-descent (IBD) and/or identity-by-state (IBS) allele-sharing proportions, we developed a novel approach that estimates IBD0, 1, and 2 based on observed IBS within windows. When combined with genome-wide IBS information, it provides an intuitive and practical graphical approach with the capacity to analyze datasets with thousands of samples without prior information about relatedness between individuals or haplotypes. We applied the method to a commonly used Human Variation Panel consisting of 400 nominally unrelated individuals. Surprisingly, we identified identical, parent-child, and full-sibling relationships and reconstructed pedigrees. In two instances non-sibling pairs of individuals in these pedigrees had unexpected IBD2 levels, as well as multiple regions of homozygosity, implying inbreeding. This combined method allowed us to distinguish related individuals from those having atypical heterozygosity rates and determine which individuals were outliers with respect to their designated population. Additionally, it becomes increasingly difficult to identify distant relatedness using genome-wide IBS methods alone. However, our IBD method further identified distant relatedness between individuals within populations, supported by the presence of megabase-scale regions lacking IBS0 across individual chromosomes. We benchmarked our approach against the hidden Markov model of a leading software package (PLINK), showing improved calling of distantly related individuals, and we validated it using a known pedigree from a clinical study. 
The application of this approach could improve genome-wide association, linkage, heterozygosity, and other population genomics studies that rely on SNP genotype data.
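
Illustrative sketch (not taken from the paper): the abstract above relies on identity-by-state (IBS) counts within windows. The snippet below shows one plausible way to tabulate IBS0, IBS1 and IBS2 proportions per window for a pair of individuals. The genotype coding (0, 1 or 2 copies of the alternate allele), the function names and the non-overlapping window scheme are all assumptions made for illustration; the authors' actual IBD0/1/2 estimator is not reproduced here.

```python
from typing import List, Tuple


def ibs_state(g1: int, g2: int) -> int:
    """Number of alleles shared identical-by-state (0, 1 or 2) at one biallelic SNP.

    Genotypes are assumed to be coded as the count of the alternate allele.
    """
    return 2 - abs(g1 - g2)


def windowed_ibs(a: List[int], b: List[int], window: int) -> List[Tuple[float, ...]]:
    """Proportions of (IBS0, IBS1, IBS2) sites in consecutive non-overlapping windows."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for start in range(0, len(a), window):
        states = [ibs_state(x, y)
                  for x, y in zip(a[start:start + window], b[start:start + window])]
        n = len(states)
        result.append(tuple(states.count(k) / n for k in (0, 1, 2)))
    return result


# Two individuals typed at four SNPs, summarised in windows of two SNPs.
profile = windowed_ibs([0, 1, 2, 2], [2, 1, 2, 1], window=2)
```

A long run of windows whose IBS0 proportion is zero is the kind of signal the abstract describes as a region "lacking IBS0"; turning that into an IBD call would need the paper's own model.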

5.
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally ‘unrelated’ individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when a large amount of IBD sharing is present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors, detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
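
Illustrative sketch (not taken from the paper): the switch error rate mentioned above is commonly defined as the fraction of consecutive heterozygous-site pairs at which the inferred phase flips relative to the truth. The function below follows that common definition; the input layout (one list of 0/1 alleles for the first haplotype, plus 0/1/2 genotypes) and the name `switch_error_rate` are assumptions for illustration and may differ from how SHAPEIT2 evaluations compute it.

```python
from typing import List


def switch_error_rate(true_hap: List[int], inferred_hap: List[int],
                      genotypes: List[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs where the phase switches.

    true_hap / inferred_hap hold the allele (0 or 1) carried by the first
    haplotype at each site; genotypes hold alternate-allele counts (0, 1, 2).
    Only heterozygous sites (genotype 1) carry phase information.
    """
    het_sites = [i for i, g in enumerate(genotypes) if g == 1]
    if len(het_sites) < 2:
        return 0.0  # no adjacent pair exists, so no switch can be observed
    # True where the inferred first haplotype matches the true one.
    agree = [true_hap[i] == inferred_hap[i] for i in het_sites]
    switches = sum(1 for prev, cur in zip(agree, agree[1:]) if prev != cur)
    return switches / (len(het_sites) - 1)


genotypes = [1, 1, 0, 1, 1]
truth = [0, 1, 0, 0, 1]
one_switch = switch_error_rate(truth, [0, 1, 0, 1, 0], genotypes)
```

Note that an inferred haplotype that is the exact complement of the truth at every heterozygous site counts as zero switches, since the labelling of the two haplotypes is arbitrary.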

6.
Browning SR, Thompson EA. Genetics 2012, 190(4): 1521-1531
Identity-by-descent (IBD) mapping tests whether cases share more segments of IBD around a putative causal variant than do controls. These segments of IBD can be accurately detected from genome-wide SNP data. We investigate the power of IBD mapping relative to that of SNP association testing for genome-wide case-control SNP data. Our focus is particularly on rare variants, as these tend to be more recent and hence more likely to have recent shared ancestry. We simulate data from both large and small populations and find that the relative performance of IBD mapping and SNP association testing depends on population demographic history and the strength of selection against causal variants. We also present an IBD mapping analysis of a type 1 diabetes data set. In those data we find that we can detect association only with the HLA region using IBD mapping. Overall, our results suggest that IBD mapping may have higher power than association analysis of SNP data when multiple rare causal variants are clustered within a gene. However, for outbred populations, very large sample sizes may be required for genome-wide significance unless the causal variants have strong effects.

7.
Isolation by distance (IBD) has been a common measure of genetic structure among populations and is based on Euclidean distances among populations. Whereas IBD does not incorporate geographic complexity (e.g. dispersal barriers, corridors) that may better predict genetic structure, a new approach (landscape genetics) joins landscape ecology with population genetics to better model genetic structure. Should IBD be set aside or should it persist as the most simple model in landscape genetics? We evaluated the status of IBD by collecting and analyzing results of 240 IBD data sets among diverse taxa and study systems. IBD typically represented a low proportion of variance in genetic structure (mean r² = 0.22) in part because many studies included relatively few populations (mean = 11). The number of populations studied (N) was asymptotically related to IBD significance; a study with 9 populations has only 50% probability of significance, while one with >23 populations will have 90% probability of significance. Surprisingly, ectothermic animals were significantly (p = 0.0018) more likely to have significant IBD than endotherms, which suggests a metabolic basis underlying gene flow rates. We also observed marginally significant effects on IBD significance for a) taxa in general and b) dispersal modes within actively-dispersing endotherms. Other factors analyzed (genetic markers, genetic distances, habitats, active or passive dispersal, plant growth form) did not significantly affect IBD, likely related to typical N. For multiple reasons we conclude that IBD should continue as the simplest reference standard against which all other, more complex models should be compared in landscape genetics research.
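
Illustrative sketch (not taken from the paper): the r² reported above measures how much of the variance in pairwise genetic distance is explained by pairwise Euclidean geographic distance. The snippet below computes that quantity for a toy data set. The coordinates, the genetic distances and all function names are hypothetical; in practice significance is usually assessed with a permutation procedure such as a Mantel test, because pairwise distances are not independent observations, and that step is not shown.

```python
import math
from typing import List, Sequence, Tuple


def pairwise_euclidean(coords: Sequence[Tuple[float, float]]) -> List[float]:
    """Euclidean distance for every unordered pair (i < j), in row-major order."""
    n = len(coords)
    return [math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            for i in range(n) for j in range(i + 1, n)]


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined when one variable is constant")
    return sxy * sxy / (sxx * syy)


# Three hypothetical populations on a line; genetic distances are listed in
# the same pair order as the geographic ones: (0,1), (0,2), (1,2).
geo = pairwise_euclidean([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
gen = [0.1, 0.2, 0.1]
ibd_r2 = r_squared(geo, gen)
```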

8.
One of the most pressing issues in spatial genetics concerns sampling. Traditionally, substructure and gene flow are estimated for individuals sampled within discrete populations. Because many species may be continuously distributed across a landscape without discrete boundaries, understanding sampling issues becomes paramount. Given large-scale, geographically broad conservation efforts, researchers are looking for guidance as to the trade-offs between sampling more individuals within a population versus few individuals scattered across more populations. Here, we conducted simulations that address these issues. We first established two archetypical patterns of dispersion: (1) individuals within discrete populations, and (2) continuously distributed individuals with limited dispersal. We used genotypes generated from a spatially-explicit, individual-based program and simulated genetic structure in individuals from nine different population sizes across a landscape that either had barriers to movement (defining discrete populations) or isolation-by-distance patterns (defining continuously distributed individuals). Then, given each pattern of dispersion, we allocated samples across four different sampling strategies for each of the nine population sizes in various configurations for sampling more individuals within a population versus fewer individuals scattered across more populations. We assessed the population genetic substructure with both the population-based metric, F_ST, and an individual-based metric, D_PS, regardless of the true pattern of dispersion to allow us to better understand the effect of incorrectly matching the metric and the distribution (e.g., F_ST with continuously distributed individuals, and vice versa). We show that sampling many subpopulations (or sampling areas), thus sampling fewer individuals per subpopulation, overestimates measures of population subdivision with the population-based metric for both patterns of dispersion.
In contrast, using the individual-based metric gives the opposite results: sampling too few subpopulations, and many individuals per subpopulation, produces an underestimate of the strength of isolation-by-distance. By comparing all results, we were able to suggest a strong predictive model of a chosen genetic structure metric for elucidating the sampling design trade-offs given each pattern of dispersion and configuration on the landscape.
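
Illustrative sketch (not taken from the paper): the population-based metric F_ST used above can be written, for a single biallelic locus, as (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The function below implements this textbook form under the assumption of equally sized subpopulations; it is not the estimator used in the study, and the individual-based D_PS metric is not shown.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Wright's F_ST at one biallelic locus from subpopulation allele frequencies.

    Assumes every subpopulation has equal weight (an illustrative simplification).
    """
    if not freqs:
        raise ValueError("at least one subpopulation frequency is required")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    mean_p = sum(freqs) / len(freqs)
    h_total = 2.0 * mean_p * (1.0 - mean_p)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall, so there is no structure to measure
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


moderate_structure = fst_biallelic([0.2, 0.8])
```

With frequencies 0.2 and 0.8, H_S is 0.32 and H_T is 0.5, giving F_ST = 0.36; identical frequencies give 0 and alternately fixed alleles give 1.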

9.
Population geneticists and community ecologists have long recognized the importance of sampling design for uncovering patterns of diversity within and among populations and in communities. Invasion ecologists increasingly have utilized phylogeographical patterns of mitochondrial or chloroplast DNA sequence variation to link introduced populations with putative source populations. However, many studies have ignored lessons from population genetics and community ecology and are vulnerable to sampling errors owing to insufficient field collections. A review of published invasion studies that utilized mitochondrial or chloroplast DNA markers reveals that insufficient sampling could strongly influence results and interpretations. Sixty per cent of studies sampled an average of less than six individuals per source population, vs. only 45% for introduced populations. Typically, far fewer introduced than source populations were surveyed, although they were sampled more intensively. Simulations based on published data forming a comprehensive mtDNA haplotype data set highlight and quantify the impact of the number of individuals surveyed per source population and number of putative source populations surveyed for accurate assignment of introduced individuals. Errors associated with sampling a low number of individuals are most acute when rare source haplotypes are dominant or fixed in the introduced population. Accuracy of assignment of introduced individuals is also directly related to the number of source populations surveyed and to the degree of genetic differentiation among them (F_ST). Incorrect interpretations resulting from sampling errors can be avoided if sampling design is considered before field collections are made.

10.
The human dopaminergic system is a significant focal point of study in neuropsychiatry and pharmacology, and it is also a promising source of nuclear DNA markers for studies of human genome diversity. In this study, we assayed six polymorphic markers in the dopamine D2 receptor gene (DRD2) in 482 unrelated individuals from nine ethnic populations of India. Our results demonstrate that the six markers are highly polymorphic in all populations and the constructed haplotypes show a high level of heterozygosity. Out of the eight possible three-site haplotypes, all populations commonly shared only three haplotypes. The haplotypes exhibited fairly high frequencies across multiple populations; the Kurumba population showed all eight three-site haplotypes. The ancestral haplotype (B2-D2-A1) was observed at high frequency only in the Siddi population. Haplotypes based on all six markers revealed 16 haplotypes, of which only 6 are common, with a frequency greater than 5% in at least one of the nine populations. But only three haplotypes were shared by all nine populations, with the cumulative frequency ranging from 80.8% (Kurumba) to 96.6% (Onge). Great variation in levels of linkage disequilibrium (LD) was detected, ranging from complete LD in the Badaga to virtually no LD in the Siddi. This range of LD likely reflects different population histories, such as African ancestry in the Siddi and recent founding events in the population isolates, Badaga and Kota.
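
Illustrative sketch (not taken from the paper): the range "from complete LD to virtually no LD" above can be made concrete with the standard pairwise measures D and r² computed from two-locus haplotype frequencies. The haplotype labels (`AB`, `Ab`, `aB`, `ab`), the function name and the example frequencies are hypothetical; the abstract does not say which LD statistic the authors used.

```python
from typing import Dict, Tuple


def ld_d_and_r2(hap_freqs: Dict[str, float]) -> Tuple[float, float]:
    """Return (D, r^2) for two biallelic loci from haplotype frequencies.

    hap_freqs maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to frequencies
    that sum to 1. Missing haplotypes are treated as frequency 0.
    """
    p_ab_hap = hap_freqs.get("AB", 0.0)
    p_a = p_ab_hap + hap_freqs.get("Ab", 0.0)   # frequency of allele A at locus 1
    p_b = p_ab_hap + hap_freqs.get("aB", 0.0)   # frequency of allele B at locus 2
    d = p_ab_hap - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d, d * d / denom


complete_ld = ld_d_and_r2({"AB": 0.5, "ab": 0.5})
no_ld = ld_d_and_r2({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25})
```

When only two of the four haplotypes occur, as in the first call, r² reaches 1; when all four occur at the product of their allele frequencies, D and r² are 0.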

11.
Rare variants affecting phenotype pose a unique challenge for human genetics. Although genome-wide association studies have successfully detected many common causal variants, they are underpowered in identifying disease variants that are too rare or population-specific to be imputed from a general reference panel and thus are poorly represented on commercial SNP arrays. We set out to overcome these challenges and detect association between disease and rare alleles using SNP arrays by relying on long stretches of genomic sharing that are identical by descent. We have developed an algorithm, DASH, which builds upon pairwise identical-by-descent shared segments to infer clusters of individuals likely to be sharing a single haplotype. DASH constructs a graph with nodes representing individuals and links on the basis of such segments spanning a locus and uses an iterative minimum cut algorithm to identify densely connected components. We have applied DASH to simulated data and diverse GWAS data sets by constructing haplotype clusters and testing them for association. In simulations we show this approach to be significantly more powerful than single-marker testing in an isolated population from Kosrae, Federated States of Micronesia, that has abundant IBD, and we provide orthogonal information for rare, recent variants in the outbred Wellcome Trust Case-Control Consortium (WTCCC) data. In both cohorts, we identified a number of haplotype associations, five such loci in the WTCCC data and ten in the isolated population, that were conditionally significant beyond any individual nearby markers. We have replicated one of these loci in an independent European cohort and identified putative structural changes in low-pass whole-genome sequence of the cluster carriers.
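
Illustrative sketch (not the DASH implementation): the abstract describes a graph whose nodes are individuals and whose links come from pairwise IBD segments spanning a locus. The snippet below builds such a graph and, as a deliberate simplification, reports plain connected components rather than the iterative minimum-cut refinement that DASH uses to find densely connected components. The segment tuple layout `(individual_a, individual_b, start, end)` and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Segment = Tuple[Hashable, Hashable, int, int]  # (ind_a, ind_b, start, end), hypothetical layout


def clusters_at_locus(segments: Iterable[Segment], locus: int) -> List[Set[Hashable]]:
    """Group individuals linked by IBD segments that span the given position."""
    graph: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for ind_a, ind_b, start, end in segments:
        if start <= locus <= end:          # keep only segments covering the locus
            graph[ind_a].add(ind_b)
            graph[ind_b].add(ind_a)

    seen: Set[Hashable] = set()
    clusters: List[Set[Hashable]] = []
    for node in graph:
        if node in seen:
            continue
        # Depth-first traversal to collect one connected component.
        component: Set[Hashable] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        clusters.append(component)
    return clusters


example_segments = [("a", "b", 0, 10), ("b", "c", 5, 15),
                    ("d", "e", 0, 20), ("a", "d", 30, 40)]
found = clusters_at_locus(example_segments, locus=7)
```

Connected components can merge individuals who are joined only through a chain of weak links, which is presumably why the published method prunes the graph with minimum cuts before testing clusters for association.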

12.
Research over the past 20 years has shown, with the help of molecular markers, that the population genetics and distribution patterns of freshwater invertebrates in North America are often more complex than was previously believed. Here we extend this research to an, as yet, unstudied but widespread and common group, the freshwater bryozoans. Colonies of the bryozoan Cristatella mucedo were collected from a number of lakes across central North America, and were characterized genetically by analysis of microsatellite loci and mitochondrial DNA (mtDNA) cytochrome b sequences. The microsatellites illustrate a pattern of generally diverse and highly differentiated populations that contain little evidence of recent gene flow. The mtDNA sequences yielded highly variable levels of divergence, ranging from 0.0 to 8.8% within populations, and 0.0 to 9.8% among populations. The multiple divergent mtDNA lineages within populations provide evidence for repeated colonization events. The lack of clustering of haplotypes by site suggests that there has been widespread dispersal of multiple genetic lineages since the last ice age. While some of the haplotype lineages may have evolved in disjunct glacial refugia, the maximum levels of divergence predate the time since the last glacial-interglacial cycles. It is likely that multiple factors including vicariance events, patterns of dispersal, localized extinction, and an unusual life history, explain the unique phylogeographic patterns evident today in populations of C. mucedo.

13.
Introductions of biological control agents may cause bottlenecks in population size despite efforts to avoid them. We examined the population genetics of Aphidius ervi (Hymenoptera: Braconidae), a parasitoid that was introduced to North America from Western Europe in 1959 to control pea aphids. To explore the phylogeographical relationships of A. ervi we sequenced 1249 bp of mitochondrial DNA (mtDNA) from 27 individuals from the native range and 51 individuals from the introduced range. Most individuals from Western Europe, the Middle East and North America shared one of two common haplotypes, consistent with the known history of the introduction. However, some A. ervi from the Pacific Northwest have a haplotype that is most similar to haplotypes found in Japan, raising the possibility of a second accidental introduction. To examine population structure and assess whether a bottleneck occurred upon introduction to North America, we assayed variation at 5 microsatellite loci in 62 individuals from 2 native populations and 230 individuals from 6 introduced populations. Introduced samples had fewer rare alleles than native samples (F(1,34) = 13.5, P = 0.0008), but heterozygosity did not differ significantly. These results suggest that a mild bottleneck occurred in spite of the introduction of over 1000 individuals. Using a hierarchical Bayesian approach, the founding population size was estimated to be 245 individuals. AMOVA showed significant genetic differentiation between the European and North American samples, and a Bayesian assignment approach clustered individuals into four groups, with most European individuals in one group and most North American individuals in the other three. These results highlight that genetic changes are associated with founder events in rapidly growing natural populations, even when the founding population size is relatively large.

14.
The majority of complete hydatidiform moles (CHMs) harbor duplicated haploid genomes that originate from sperm. This makes CHMs more advantageous than conventional diploid cells for determining haplotypes of SNPs and copy-number variations (CNVs), because all of the genetic variants in a CHM genome are homozygous. Here we report SNP and CNV haplotype structures determined by analysis of 100 CHMs from Japanese subjects via high-density DNA arrays. The obtained haplotype map should be useful as a reference for the haplotype structure of Asian populations. We resolved common CNV regions (merged CNV segments across the examined samples) into CNV events (clusters of CNV segments) on the basis of mutual overlap and found that the haplotype backgrounds of different CNV events within the same CNV region were predominantly similar, perhaps because of inherent structural instability.

15.
Retracing the trajectories of past genetic events is crucial to understand the structure of the genome, both in individuals and across populations. A haplotype describes a string of polymorphic sites along a DNA segment. Haplotype diversity is due to mutations creating new variants, and to recombinations and gene conversions that mix and redistribute these variants among individual chromosomes in populations. A number of studies have revealed a relatively simple pattern of haplotype diversity in the human genome, dominated by a few common haplotypes representing founder ancestral ones. New haplotypes are usually rare and have a limited geographic distribution. We propose a method to derive a new haplotype from a set of putative ancestral haplotypes, once mutations are in place, through minimal recombination and gene conversion pathways. We describe classes of pathways that represent the whole set of minimal pathways leading to a new haplotype. We show that obtaining this set of pathways can be represented as a problem of finding "secondary structures" of minimum energy. We present a polynomial algorithm solving this folding problem.

16.
Segments of identity-by-descent (IBD) detected from high-density genetic data are useful for many applications, including long-range phase determination, phasing family data, imputation, IBD mapping, and heritability analysis in founder populations. We present Refined IBD, a new method for IBD segment detection. Refined IBD achieves both computational efficiency and highly accurate IBD segment reporting by searching for IBD in two steps. The first step (identification) uses the GERMLINE algorithm to find shared haplotypes exceeding a length threshold. The second step (refinement) evaluates candidate segments with a probabilistic approach to assess the evidence for IBD. Like GERMLINE, Refined IBD allows for IBD reporting on a haplotype level, which facilitates determination of multi-individual IBD and allows for haplotype-based downstream analyses. To investigate the properties of Refined IBD, we simulate SNP data from a model with recent superexponential population growth that is designed to match United Kingdom data. The simulation results show that Refined IBD achieves a better power/accuracy profile than fastIBD or GERMLINE. We find that a single run of Refined IBD achieves greater power than 10 runs of fastIBD. We also apply Refined IBD to SNP data for samples from the United Kingdom and from Northern Finland and describe the IBD sharing in these data sets. Refined IBD is powerful, highly accurate, and easy to use and is implemented in Beagle version 4.

17.
We have analyzed human genetic diversity in 33 Old World populations including 23 populations obtained through Genographic Project studies. A set of 1,536 SNPs in five X chromosome regions were genotyped in 1,288 individuals (mostly males). We use a novel analysis employing subARG network construction with recombining chromosomal segments. Here, a subARG is constructed independently for each of five gene-free regions across the X chromosome, and the results are aggregated across them. For PCA, MDS and ancestry inference with STRUCTURE, the subARG is processed to obtain feature vectors of samples and pairwise distances between samples. The observed population structure, estimated from the five short X chromosomal segments, supports genome-wide frequency-based analyses: African populations show higher genetic diversity, and the general trend of shared variation is seen across the globe from Africa through Middle East, Europe, Central Asia, Southeast Asia, and East Asia in broad patterns. The recombinational analysis was also compared with established methods based on SNPs and haplotypes. For haplotypes, we also employed a fixed-length approach based on information-content optimization. Our recombinational analysis suggested a southern migration route out of Africa, and it also supports a single, rapid human expansion from Africa to East Asia through South Asia.

18.
We carried out a phylogeographic study using mtDNA (COII) for the endemic springtail Desoria klovstadi (formerly Isotoma klovstadi) from northern Victoria Land, Antarctica. Low levels of sequence divergence (≤ 1.6%) across 26 unique haplotypes (from 69 individuals) were distributed according to geographic location. Cape Hallett and Daniell Peninsula contained the highest nucleotide (both > 0.004) and haplotype (both > 0.9) diversity with 10 (of 16) and 8 (of 12) unique haplotypes, respectively. All other populations (Football Saddle, Crater Cirque, Cape Jones) had lower diversity with 2–4 unique haplotypes. Across the 69 individuals from five populations there was only a single haplotype shared between two populations (Daniell Peninsula and Football Saddle). Furthermore, nested clade analyses revealed that some of the Daniell Peninsula haplotypes were more closely related to Football Saddle haplotypes than to any other population. Such discrete haplotype groupings suggest historical (rare) dispersal across the Pleistocene (1.8 mya–11 kya) and Holocene (11 kya–present), coupled with repeated extinction, range contraction and expansion events, and/or incomplete sampling across the species range. The nested clade analyses reveal that a common pattern of climatic and geological history over long-term glacial habitat fragmentation has determined the geographic and haplotype distributions found for D. klovstadi.

19.
20.
1. The genetic variation of the endangered freshwater fish Ladigesocypris ghigii, endemic to the island of Rhodes (Greece), was investigated for nine populations, originating from seven different stream systems and a reservoir, both at the mtDNA and nuclear level, in order to suggest conservation actions. 2. Both restriction fragment length polymorphism analysis of five segments of mitochondrial DNA (ND‐5/6, COI and 12S‐16S rRNA) amplified by polymerase chain reaction, and random amplified polymorphic DNA analysis, revealed extremely low levels of intra‐population polymorphism. It is highly likely that the low intra‐population variability is the result of successive bottleneck events evident in shrinkage and expansion of the populations year after year, which may have led to a complete loss of several genotypes and haplotypes, and an increased degree of inbreeding. 3. Inter‐population genetic structuring was high, with fixation of haplotypes within six of the nine populations and fixation of alleles within populations originating from different waterbodies. It is probable that all haplotypes and/or alleles found were initially represented in all populations. However, because of the long time of isolation coupled with successive bottleneck and subsequent genetic drift, common mtDNA haplotypes and alleles among the populations may have become rare or extinct through stochastic lineage loss. 4. Although nucleotide divergence among haplotypes was very shallow, half of the haplotypes recorded (three of six), resulted from nucleotide changes on the 12S–16S rRNA segments, which are the most conserved part of the mitochondrial genome. This fact may indicate that the observed genetic variation did not necessarily result only from the retention of ancestral polymorphism, but may have arisen through mutation and complete lineage sorting over a relatively small number of generations, once the populations had become isolated from one another. 5. 
Our data suggest that two of the L. ghigii populations may be on independent evolutionary trajectories. Considering that each population appears so far well adapted within each site, all populations should be managed and conserved separately.
