Similar articles
 20 similar articles found.
1.
The distribution pattern of mtDNA haplotypes in distinct populations of the glacial relict crustacean Saduria entomon was examined to assess phylogeographic relationships among them. Populations from the Baltic, the White Sea and the Barents Sea were screened for mtDNA variation using PCR-based RFLP analysis of a 1150 bp fragment containing part of the CO I and CO II genes. Five mtDNA haplotypes were recorded. An analysis of geographical heterogeneity in haplotype frequency distributions revealed significant differences among populations. The isolated populations of S. entomon have diverged since the retreat of the last glaciation. The geographical pattern of variation is most likely the result of stochastic (founder effect, genetic drift) mechanisms and suggests that the haplotype differentiation observed is probably older than the isolation of the Baltic and Arctic seas.

2.
Smelt Osmerus eperlanus has two different life history strategies in the Netherlands. The migrating population inhabits the Wadden Sea and spawns in freshwater areas. After the closure of the Afsluitdijk in 1932, part of the smelt population became landlocked. The freshwater smelt population has been in severe decline since 1990, and this decline has strongly reduced the numbers of piscivorous water birds relying on smelt as their main prey. The lakes that were formed after the dike closure, IJsselmeer and Markermeer, have been designated as Natura 2000 sites, based on their importance for (among others) piscivorous water birds. Because of the declining freshwater smelt population, the question arose whether this population is still supported by the diadromous population. Opportunities for exchange between fresh water and the sea are, however, limited to discharge sluices. The relationship between the diadromous and landlocked smelt populations was analysed by means of otolith microchemistry. Our interpretation of otolith strontium (88Sr) patterns from smelt specimens collected in the freshwater area of Lake IJsselmeer and Markermeer, compared to those collected in the nearby marine environment, is that there is currently no evidence for a substantial contribution from the diadromous population to the spawning stock of the landlocked population.

3.
We propose a novel approximate-likelihood method to fit demographic models to human genomewide single-nucleotide polymorphism (SNP) data. We divide the genome into windows of constant genetic map width and then tabulate the number of distinct haplotypes and the frequency of the most common haplotype for each window. We summarize the data by the genomewide joint distribution of these two statistics, termed the HCN statistic. Coalescent simulations are used to generate the expected HCN statistic for different demographic parameters. The HCN statistic provides additional information for disentangling complex demography beyond statistics based on single-SNP frequencies. Application of our method to simulated data shows it can reliably infer parameters from growth and bottleneck models, even in the presence of recombination hotspots when properly modeled. We also examined how practical problems with genomewide data sets, such as errors in the genetic map, haplotype phase uncertainty, and SNP ascertainment bias, affect our method. Several modifications of our method served to make it robust to these problems. We have applied our method to data collected by Perlegen Sciences and find evidence for a severe population size reduction in northwestern Europe starting 32,500–47,500 years ago.
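
The windowed tabulation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `window_haplotype_stats`, the input layout (phased haplotypes as equal-length strings plus one genetic-map position per SNP), and the toy data are all assumptions made for illustration. It reports the count of the most common haplotype per window; dividing by the number of haplotypes gives its frequency.

```python
from collections import Counter, defaultdict


def window_haplotype_stats(haplotypes, map_positions, window_width):
    """Return one (distinct haplotypes, count of most common haplotype) pair per window.

    haplotypes    -- phased haplotypes as equal-length strings (hypothetical layout)
    map_positions -- genetic-map position of each SNP column, sorted ascending
    window_width  -- window width in the same genetic-map units
    """
    # Assign every SNP column to a window of constant genetic-map width.
    columns_by_window = defaultdict(list)
    origin = map_positions[0]
    for col, pos in enumerate(map_positions):
        columns_by_window[int((pos - origin) // window_width)].append(col)

    stats = []
    for window in sorted(columns_by_window):
        cols = columns_by_window[window]
        # Restrict each haplotype to the window's SNPs and count identical copies.
        counts = Counter(tuple(h[c] for c in cols) for h in haplotypes)
        stats.append((len(counts), max(counts.values())))
    return stats


# Toy data (invented): four haplotypes over four SNPs, two windows of width 0.25.
toy_haplotypes = ["0011", "0010", "0001", "1111"]
toy_positions = [0.0, 0.1, 0.3, 0.4]
toy_stats = window_haplotype_stats(toy_haplotypes, toy_positions, 0.25)

# Genome-wide joint distribution of the two per-window statistics.
joint_distribution = Counter(toy_stats)
```

In this sketch the joint distribution is simply a count of how often each pair of values occurs across windows, which is the kind of summary that could be compared against simulated expectations.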

4.
Inference of haplotypes is important in genetic epidemiology studies. However, all large genotype data sets have errors, owing to the use of inexpensive, fallible genotyping machines and to shortcomings in genotype-scoring software, and these errors can have an enormous impact on haplotype inference. In this article, we propose two novel strategies to reduce the impact of genotyping errors on haplotype inference. The first method makes use of double sampling. For each individual, the “GenoSpectrum”, which consists of all possible genotypes and their corresponding likelihoods, is computed. The second method is a genotype clustering algorithm based on multi-genotyping data, which also assigns a “GenoSpectrum” to each individual. We then describe two hybrid EM algorithms (called DS-EM and MG-EM) that perform haplotype inference based on the “GenoSpectrum” of each individual obtained by double sampling and multi-genotyping data. Both simulated data sets and a quasi real-data set demonstrate that our proposed methods perform well in different situations and outperform the conventional EM algorithm and the HMM algorithm proposed by Sun, Greenwood, and Neal (2007, Genetic Epidemiology 31, 937–948) when the genotype data sets have errors.

5.
Haplotypes have gained increasing attention in the mapping of complex-disease genes, because of the abundance of single-nucleotide polymorphisms (SNPs) and the limited power of conventional single-locus analyses. It has been shown that haplotype-inference methods such as Clark's algorithm, the expectation-maximization algorithm, and a coalescence-based iterative-sampling algorithm are fairly effective and economical alternatives to molecular-haplotyping methods. To contend with some weaknesses of the existing algorithms, we propose a new Monte Carlo approach. In particular, we first partition the whole haplotype into smaller segments. Then, we use the Gibbs sampler both to construct the partial haplotypes of each segment and to assemble all the segments together. Our algorithm can accurately and rapidly infer haplotypes for a large number of linked SNPs. By using a wide variety of real and simulated data sets, we demonstrate the advantages of our Bayesian algorithm, and we show that it is robust to the violation of Hardy-Weinberg equilibrium, to the presence of missing data, and to occurrences of recombination hotspots.

6.
The haplotype map constructed by the HapMap Project is a valuable resource in the genetic studies of disease genes, population structure, and evolution. In the Project, Caucasian and African haplotypes are fairly accurately inferred, based mainly on the rules of Mendelian inheritance using the genotypes of trios. However, the Asian haplotypes are inferred from the genotypes of unrelated individuals based on population genetics, and are less accurate. Thus, the effects of this inaccuracy on downstream analyses need to be assessed. We determined true Japanese haplotypes by genotyping 100 complete hydatidiform moles (CHM), each carrying a genome derived from a single sperm, using Affymetrix 500 K Arrays. We then assessed how inferred haplotypes can differ from true haplotypes, by phasing pseudo-individualized true haplotypes using the programs PHASE, fastPHASE, and Beagle. We found that, at various genomic regions, especially the MHC locus, the expansion of extended haplotype homozygosity (EHH), which is a measure of positive selection, is obscured when inferred Asian haplotype data are used to detect the expansion. We then mapped the genome using a new statistic, XDiHH, which directly detects the difference between the true and inferred haplotypes, in the determination of EHH expansion. We also show that the true haplotype data presented here are useful to assess and improve the accuracy of phasing of Asian genotypes.
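
As background for the EHH measure mentioned in this abstract, the following Python sketch computes EHH in its commonly used form: among chromosomes carrying a chosen core allele, the probability that two chromosomes drawn at random are identical from the core SNP out to a given SNP. The function name `ehh`, the string encoding of haplotypes, and the toy data are assumptions for illustration. The sketch does not implement the XDiHH statistic, whose definition is not given here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity from the core SNP out to end_index.

    haplotypes -- phased haplotypes as equal-length strings (hypothetical layout)
    """
    # Keep only chromosomes that carry the core allele.
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # no pair of chromosomes to compare

    # Group carriers by their sequence over the interval between core and end SNP.
    lo, hi = sorted((core_index, end_index))
    classes = Counter(h[lo:hi + 1] for h in carriers)

    # Fraction of chromosome pairs that are identical over the whole interval.
    return sum(comb(count, 2) for count in classes.values()) / comb(n, 2)


# Toy data (invented): three carriers of allele "1" at SNP 0, one non-carrier.
toy = ["1100", "1101", "1110", "0111"]
ehh_by_distance = [ehh(toy, 0, "1", end) for end in range(4)]
```

EHH starts at 1 at the core SNP and decays with distance as recombination and mutation break up the carriers into distinct haplotype classes; phasing errors add spurious classes, which is one way inferred haplotypes could obscure the signal.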

7.
Copy number variation (CNV) has been reported to be associated with disease and various cancers. Hence, identifying the accurate position and the type of CNV is currently a critical issue. There are many tools for detecting CNV regions, constructing haplotype phases on CNV regions, or estimating the numerical copy numbers. However, none of them can do all three tasks at the same time. This paper presents a method based on a Hidden Markov Model to detect parent-specific copy number change on both chromosomes with signals from SNP arrays. A haplotype tree is constructed with dynamic branch merging to model the transition of the copy number status of the two alleles assessed at each SNP locus. The emission models are constructed for the genotypes formed with the two haplotypes. The proposed method can provide the segmentation points of the CNV regions as well as the haplotype phasing for the allelic status on each chromosome. The estimated copy numbers are provided as fractional numbers, which can accommodate the somatic mutation in cancer specimens that usually consist of heterogeneous cell populations. The algorithm is evaluated on simulated data and the previously published regions of CNV of the 270 HapMap individuals. The results were compared with those of five popular methods: PennCNV, genoCN, COKGEN, QuantiSNP and cnvHap. The application on oral cancer samples demonstrates how the proposed method can facilitate clinical association studies. The proposed algorithm exhibits sensitivity for CNV regions comparable to that of the best algorithm in our genome-wide study and demonstrates the highest detection rate in SNP-dense regions. In addition, we provide better haplotype phasing accuracy than similar approaches. The clinical association carried out with our fractional estimate of copy numbers in the cancer samples provides better detection power than that with integer copy number states.

8.
Gene duplication and gene loss as well as other biological events can result in multiple copies of genes in a given species. Because of these gene duplication and loss dynamics, in addition to variation in sequence evolution and other sources of uncertainty, different gene trees ultimately present different evolutionary histories. All of this together results in gene trees that give different topologies from each other, making consensus species trees ambiguous in places. Other sources of data to generate species trees are also unable to provide completely resolved binary species trees. However, in addition to gene duplication events, speciation events have provided some underlying phylogenetic signal, enabling development of algorithms to characterize these processes. Therefore, a soft parsimony algorithm has been developed that enables the mapping of gene trees onto species trees and modification of uncertain or weakly supported branches based on minimizing the number of gene duplication and loss events implied by the tree. The algorithm also allows for rooting of unrooted trees and for removal of in-paralogues (lineage-specific duplicates and redundant sequences masquerading as such). The algorithm has also been made available for download as a software package, Softparsmap.

9.
A general Bayesian model, Diploffect, is described for estimating the effects of founder haplotypes at quantitative trait loci (QTL) detected in multiparental genetic populations; such populations include the Collaborative Cross (CC), Heterogeneous Stocks (HS), and many others for which local genetic variation is well described by an underlying, usually probabilistically inferred, haplotype mosaic. Our aim is to provide a framework for coherent estimation of haplotype and diplotype (haplotype pair) effects that takes into account the following: uncertainty in haplotype composition for each individual; uncertainty arising from small sample sizes and infrequently observed haplotype combinations; possible effects of dominance (for noninbred subjects); and genetic background; and that provides a means to incorporate data that may be incomplete or have a hierarchical structure. Using the results of a probabilistic haplotype reconstruction as prior information, we obtain posterior distributions at the QTL for both haplotype effects and haplotype composition. Two alternative computational approaches are supplied: a Markov chain Monte Carlo sampler and a procedure based on importance sampling of integrated nested Laplace approximations. Using simulations of QTL in the incipient CC (pre-CC) and Northport HS populations, we compare the accuracy of Diploffect, approximations to it, and more commonly used approaches based on Haley–Knott regression, describing trade-offs between these methods. We also estimate effects for three QTL previously identified in those populations, obtaining posterior intervals that describe how the phenotype might be affected by diplotype substitutions at the modeled locus.

10.
11.
The accurate identification of the route of transmission taken by an infectious agent through a host population is critical to understanding its epidemiology and informing measures for its control. However, reconstruction of transmission routes during an epidemic is often an underdetermined problem: data about the location and timings of infections can be incomplete, inaccurate, and compatible with a large number of different transmission scenarios. For fast-evolving pathogens like RNA viruses, inference can be strengthened by using genetic data, nowadays easily and affordably generated. However, significant statistical challenges remain to be overcome in the full integration of these different data types if transmission trees are to be reliably estimated. We present here a framework leading to a Bayesian inference scheme that combines genetic and epidemiological data and is able to reconstruct the most likely transmission patterns and infection dates. After testing our approach with simulated data, we apply the method to two UK epidemics of Foot-and-Mouth Disease Virus (FMDV): the 2007 outbreak, and a subset of the large 2001 epidemic. In the first case, we are able to confirm the role of a specific premise as the link between the two phases of the epidemic, while transmissions more densely clustered in space and time remain harder to resolve. When we consider data collected from the 2001 epidemic during a time of national emergency, our inference scheme robustly infers transmission chains, and uncovers the presence of undetected premises, thus providing a useful tool for epidemiological studies in real time. The generation of genetic data is becoming routine in epidemiological investigations, but the development of analytical tools maximizing the value of these data remains a priority. Our method, while applied here in the context of FMDV, is general and with slight modification can be used in any situation where both spatiotemporal and genetic data are available.

12.
13.
We analyzed mutations and defined the chromosomal haplotype in 127 patients of Mediterranean descent who were affected by Wilson disease (WD): 39 Sardinians, 49 Italians, 33 Turks, and 6 Albanians. Haplotypes were derived by use of the microsatellite markers D13S301, D13S296, D13S297, and D13S298, which are linked to the WD locus. There were five common haplotypes in Sardinians, three in Italians, and two in Turks, which accounted for 85%, 32%, and 30% of the WD chromosomes, respectively. We identified 16 novel mutations: 8 frameshifts, 7 missense mutations, and 1 splicing defect. In addition, we detected the previously described mutations 2302insC, 3404delC, Arg1320ter, Gly944Ser, and His1070Gln. Of the new mutations detected, two, the 1515insT on haplotype I and 2464delC on haplotype XVI, accounted for 6% and 13%, respectively, of the mutations in WD chromosomes in the Sardinian population. Mutations H1070Q, 2302insC, and 2533delA represented 13%, 8%, and 8%, respectively, of the mutations in WD chromosomes in other Mediterranean populations. The remaining mutations were rare and limited to one or two patients from different populations. Thus, WD results from some frequent mutations and many rare defects.

14.
Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intrahost viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance, thus are not suited for full viral genome analysis. Here, we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated from NGS platforms and provide quality output properly formatted for downstream evolutionary analyses.

15.
A phylogeographic analysis was conducted to assess the geographic structure and genetic variation in populations of the boll weevil Anthonomus grandis, which is the most harmful insect pest of cotton in the Americas. COI and COII mitochondrial gene sequences were analyzed to test a former hypothesis on the origin of the boll weevil in Argentina, Brazil and Paraguay, using samples from Mexico and the USA as putative source populations. The analysis of variability suggests that populations from South American cotton fields and nearby disturbed areas form a phylogroup with a central haplotype, herein called A, which is the most common and widespread in the USA and South America. The population from Texas has the A haplotype as the most frequent and groups with the South American populations associated with cotton. The sample from Tecomán (Mexico) shows high values of within-nucleotide divergence, shares no haplotype with the South American samples, and forms a phylogroup separated by several mutational steps. The sample from Iguazú National Park (Misiones Province, Argentina) has similar characteristics, with highly divergent haplotypes forming a phylogroup closer to the samples from cotton fields than to the Mexican group. We propose that in South America there are two kinds of populations: populations with characteristics of recent invaders, which would be remnants of “bottlenecks” that occurred after single or multiple colonization events, probably from the United States; and ancient populations associated with native forests, partially isolated by events of historical fragmentation.

16.
Phylogenetic analyses that include fossils or molecular sequences sampled through time require models that allow one sample to be a direct ancestor of another sample. As previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after sampling; in particular, we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also show that sampled ancestor birth-death models where every sample comes from a different time point are non-identifiable and thus require one parameter to be known in order to infer other parameters. We apply our phylogenetic inference accounting for sampled ancestors to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/CompEvol/sampled-ancestors).

17.
Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, there does not exist any widely adopted inference method for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modeling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated data sets we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large data sets at the same time. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in 1 h using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.

18.
Variation in the angiotensinogen gene, AGT, has been associated with variation in plasma angiotensinogen levels. In addition, the T235M polymorphism in the AGT product is associated with an increased risk of essential hypertension in multiple populations, making AGT a good example of a quantitative-trait locus underlying susceptibility to a common disease. To better understand genetic variation in AGT, we sequenced a 14.4-kb genomic region spanning the entire AGT and identified 44 single-nucleotide polymorphisms (SNPs). Forty-two SNPs were observed both in 88 white and in 77 Japanese unselected subjects. Six major haplotypes accounted for most of the variation in this region, indicating less allelic complexity than in many other genomic regions. Although the two populations were found to share all of the major AGT haplotypes, there were substantial differences in haplotype frequencies. Pairwise linkage disequilibrium (LD), measured by the D', r², and d² statistics, demonstrated a general pattern of decline with increasing distance, but, as expected in a small genomic region, individual LD values were highly variable. LD between T235M and each of the other 39 SNPs was assessed in order to model the usefulness of LD to detect a disease-associated mutation. Among the Japanese subjects, 13 (33%) of the SNPs had r² values >0.1, whereas this proportion was substantially higher for the white subjects (35/39 [90%]). LD between a hypertension-associated promoter mutation, A-6G, and 39 SNPs was also measured. Similar results were obtained, with 33% of the SNPs showing r² >0.1 in the Japanese subjects and 92% of the SNPs showing r² >0.1 in the white subjects. This difference, which occurs despite an overall similarity in LD patterns in the two populations, reflects a much higher frequency of the M235-associated haplotype in the white sample. These results have important implications for the usefulness of LD approaches in the mapping of genes underlying susceptibility to complex diseases.
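
For readers unfamiliar with the pairwise LD measures named in this abstract, the sketch below computes two of them for a pair of biallelic SNPs using the standard textbook definitions: D = p_AB - p_A p_B, |D'| = |D| / D_max, and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The function name `pairwise_ld`, the 0/1 allele coding, and the toy data are assumptions for illustration; the d² statistic is omitted, and this is not the study's own code.

```python
def pairwise_ld(haplotypes):
    """Return (|D'|, r^2) for two biallelic SNPs.

    haplotypes -- list of (a, b) pairs, one per phased chromosome, with alleles
                  coded 0 or 1 at the first and second SNP (hypothetical coding)
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    variance_product = p_a * (1 - p_a) * p_b * (1 - p_b)
    if variance_product == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    return abs(d) / d_max, d * d / variance_product


# Toy data (invented): complete LD, no LD, and unequal allele frequencies.
complete = pairwise_ld([(1, 1), (1, 1), (0, 0), (0, 0)])
independent = pairwise_ld([(1, 1), (1, 0), (0, 1), (0, 0)])
unequal = pairwise_ld([(1, 1), (1, 1), (1, 0), (0, 0)])
```

The third toy case shows why the two measures can disagree: |D'| equals 1 whenever at most three of the four possible haplotypes are present, while r² also depends on how closely the allele frequencies match. That dependence is in line with the abstract's observation that r² differs between populations with different haplotype frequencies.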

19.
20.
Both present-day and past processes can shape connectivity of populations. Pleistocene vicariant events and dispersal have shaped the present distribution and connectivity patterns of aquatic species in the Indo-Pacific region. In particular, the processes that have shaped distribution of amphidromous goby species still remain unknown. Previous studies show that phylogeographic breaks are observed between populations in the Indian and Pacific Oceans where the shallow Sunda shelf constituted a geographical barrier to dispersal, or that the large spans of open ocean that isolate the Hawaiian or Polynesian Islands are also barriers for amphidromous species even though they have great dispersal capacity. Here we assess past and present genetic structure of populations of two amphidromous fish (gobies of the Sicydiinae) that are widely distributed in the Central West Pacific and which have similar pelagic larval durations. We analysed sections of mitochondrial COI, Cytb and nuclear Rhodopsin genes in individuals sampled from different locations across their entire known range. Similar to other Sicydiinae fish, intraspecific mtDNA genetic diversity was high for both species (haplotype diversity between 0.9 and 0.96). Spatial analyses of genetic variation in Sicyopus zosterophorum demonstrated strong isolation across the Torres Strait, which was a geologically intermittent land barrier linking Australia to Papua New Guinea. There was a clear genetic break between the northwestern and the southwestern clusters in Si. zosterophorum (ΦST = 0.67502 for COI) and coalescent analyses revealed that the two populations split at 306 Kyr BP (95% HPD 79–625 Kyr BP), which is consistent with a Pleistocene separation caused by the Torres Strait barrier. However, this geographical barrier did not seem to affect Sm. fehlmanni. Historical and demographic hypotheses are raised to explain the different patterns of population structure and distribution between these species. Strategies aiming to conserve amphidromous fish should consider the presence of cryptic evolutionary lineages to prevent stock depletion.
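
The haplotype diversity values quoted in this abstract are presumably of the kind given by Nei's estimator, h = n / (n - 1) × (1 - Σ p_i²), where p_i is the sample frequency of haplotype i and n is the number of sequences; that this exact estimator was used is an assumption. A minimal Python sketch with an invented function name and toy labels:

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Probability that two sequences drawn at random from the sample are identical.
    homozygosity = sum((count / n) ** 2 for count in counts.values())
    # The n / (n - 1) factor corrects for sampling without replacement.
    return n / (n - 1) * (1 - homozygosity)


# Toy data (invented): four sequences carrying three distinct haplotypes.
toy_diversity = haplotype_diversity(["A", "A", "B", "C"])
```

Values near 1, like those reported above, mean that almost every pair of sampled sequences carries different haplotypes.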
