Similar articles
1.
Yong Wang, Jody Hey. Genetics, 2010, 184(2): 363-379
Most methods for studying divergence with gene flow rely upon data from many individuals at few loci. Such data can be useful for inferring recent population history but they are unlikely to contain sufficient information about older events. However, the growing availability of genome sequences suggests a different kind of sampling scheme, one that may be more suited to studying relatively ancient divergence. Data sets extracted from whole-genome alignments may represent very few individuals but contain a very large number of loci. To take advantage of such data we developed a new maximum-likelihood method for genomic data under the isolation-with-migration model. Unlike many coalescent-based likelihood methods, our method does not rely on Monte Carlo sampling of genealogies, but rather provides a precise calculation of the likelihood by numerical integration over all genealogies. We demonstrate that the method works well on simulated data sets. We also consider two models for accommodating mutation rate variation among loci and find that the model that treats mutation rates as random variables leads to better estimates. We applied the method to the divergence of Drosophila melanogaster and D. simulans and detected a low, but statistically significant, signal of gene flow from D. simulans to D. melanogaster.

In the study of speciation, researchers often ask to what extent populations have exchanged genes as they diverged and how much time has passed since they began to diverge. Answers to questions about historical divergence and gene flow potentially lie in patterns of genetic variation that are found in present-day populations. To bridge the gap between population history and current genetic data, population geneticists can make use of a gene genealogy, G, a bifurcating tree that represents the history of ancestry of sampled gene copies. The probability of a particular value of G can be calculated for a particular parameter set using coalescent models. Then, given a particular genealogy, genetic variation can be examined using a mutation model that is appropriate for the kind of data being used. Finally, by considering multiple values of G, the connection can be made between the population evolution history and the data. A mathematical representation that treats G as a key interstitial variable was given by Felsenstein (1988) as Equation 1, where X represents the sequence data, G represents the gene genealogy, Ψ represents the set of all possible genealogies, and Θ represents the vector of population parameters included in the model.

Unless sample sizes are very small, Equation 1 cannot be solved analytically, and so considerable effort has gone into finding approximate solutions (Kuhner et al. 1995; Griffiths and Marjoram 1996; Wilson and Balding 1998). One general approach is to sample genealogies using a Markov chain Monte Carlo (MCMC) simulation. This is the approach developed by Kuhner and colleagues (Kuhner et al. 1995) and that has since been extended to models with migration (Beerli and Felsenstein 1999, 2001; Nielsen and Wakeley 2001). A general problem for these methods is that they usually require long running times to generate sufficiently large and independent samples, especially when the MCMC simulation is mixing slowly.

With fast-improving DNA sequencing techniques, more and more genome sequences are becoming available, and alignments of these whole-genome sequences are a very useful source of information for the study of divergence. However, traditional MCMC methods are likely to be slow on genome-scale data because running times are proportional to the number of loci. To overcome this difficulty Yang developed a likelihood method (Yang 2002) for data sets containing one sample from each of the three populations at every locus. This method uses numerical integration to calculate the likelihood function in Equation 1. By using a very large number of loci, the method can make up for using a very small number of individuals (i.e., genomes).

Yang's method is based on a divergence model that assumes no gene flow between separated populations. However, there are many situations where gene flow may have been occurring and where it is preferable to include it within the divergence model. One model widely used in this context is the isolation-with-migration (IM) model, which incorporates both population separation and migration (Nielsen and Wakeley 2001). Under an IM model the genealogies include not only some fixed number of coalescent events and speciation events, but also any possible number of migration events. The potential for very large numbers of migration events complicates the sample space of G and makes the numerical integration seemingly impossible. Innan and Watanabe (2006) circumvent this problem by using a recursion method to estimate the coalescent rates on a series of time points. In their recursion, the accuracy in calculating the coalescent rate at one time point depends on the accuracy of calculations at previous time points, and this may impair the precision of the overall likelihood calculation. Therefore we developed a method that relies on numerical integration to calculate the likelihood under an IM model. We tested the accuracy of this method on simulated data sets of various sample sizes and applied it to a genome alignment of Drosophila melanogaster and D. simulans (with D. yakuba as an outgroup).
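
Equation 1 itself is missing from this listing. Going by the symbol definitions in the text (X, G, Ψ, Θ), it most likely has the standard form attributed to Felsenstein (1988), sketched below. The P(·) notation is an assumption and is not taken from the source.

```latex
\begin{equation}
P(X \mid \Theta) = \int_{G \in \Psi} P(X \mid G)\, P(G \mid \Theta)\, dG
\end{equation}
```

In this reading, P(G | Θ) is the coalescent probability of a genealogy and P(X | G) is the probability of the data under the mutation model. The approaches described above differ mainly in how they evaluate the integral: by Monte Carlo sampling of genealogies or by numerical integration.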

2.
The phylogenetic positions of the families Campynemataceae and Corsiaceae within the order Liliales remain unclear. To date, molecular data from the plastid genome of Corsiaceae have been obtained exclusively from Arachnitis, for which alignment and phylogenetic inference have proved difficult. The extent of gene conservation among mycoheterotrophic species within Corsiaceae remains unknown. To clarify the phylogenetic position of Campynemataceae and Corsiaceae within Liliales, functional plastid-coding genes of species representing both families have been analyzed. Examination of two phylogenetic data sets of plastid genes employing parsimony, maximum-likelihood, and Bayesian inference methods strongly supported both families forming a basal clade to the remaining taxa of Liliales. The first data set consists of five functional plastid-encoded genes (matK, rps7, rps2, rps19, and rpl2) sequenced from Corsia dispar (Corsiaceae). The data set included 31 species representing all families within Liliales, as well as selected orders that are closely related to Liliales (10 outgroup species from Asparagales, Dioscoreales, and Pandanales). The second phylogenetic analysis was based on 75 plastid genes. This data set included 18 species from Liliales, representing major clades within the order, and 10 outgroup species from Asparagales, Dioscoreales, and Pandanales. In this latter data set, Campynemataceae was represented by 60 plastid-encoded genes sequenced from herbarium material of Campynema lineare. A large proportion of the plastid genome of C. dispar was also sequenced and compared to the plastid genomes of photosynthetic plants within Liliales and mycoheterotrophic plants within Asparagales to explore plastid genome reduction. The plastid genome of C. dispar is in the advanced stages of reduction, which signifies its high dependency on mycorrhizal fungi and is suggestive of a loss in photosynthetic ability. Functional plastid genes found in C. dispar may be applicable to other species in Corsiaceae, which will provide a basis for in-depth molecular analyses of interspecies relationships within the family, once molecular data from other members become available.

3.
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

4.
After migrant chromosomes enter a population, they are progressively sliced into smaller pieces by recombination. Therefore, the length distribution of “migrant tracts” (chromosome segments with recent migrant ancestry) contains information about historical patterns of migration. Here we introduce a theoretical framework describing the migrant tract length distribution and propose a likelihood inference method to test demographic hypotheses and estimate parameters related to a historical change in migration rate. Applying this method to data from the hybridizing subspecies Mus musculus domesticus and M. m. musculus, we find evidence for an increase in the rate of hybridization. Our findings could indicate an evolutionary trajectory toward fusion rather than speciation in these taxa.
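
To illustrate why tract lengths carry information about the timing of migration, here is a minimal sketch under a textbook simplification that is not the framework of the abstract. It assumes a single pulse of migration a given number of generations ago, with tract lengths (in Morgans) roughly exponential with rate equal to that number of generations. The function names and the single-pulse model are hypothetical choices for illustration only.

```python
import math
from typing import Sequence


def tract_length_pdf(length_morgans: float, generations: float) -> float:
    """Exponential density for a tract length in Morgans.

    Assumption: one migration pulse `generations` ago, so recombination
    breaks tracts at a rate of about `generations` per Morgan.
    """
    if length_morgans < 0:
        return 0.0
    return generations * math.exp(-generations * length_morgans)


def log_likelihood(tract_lengths: Sequence[float], generations: float) -> float:
    """Sum of log densities over independent tract lengths."""
    return sum(math.log(tract_length_pdf(x, generations)) for x in tract_lengths)


def estimate_generations(tract_lengths: Sequence[float]) -> float:
    """Maximum-likelihood estimate of the exponential rate: 1 / mean length."""
    if not tract_lengths:
        raise ValueError("need at least one tract length")
    mean_length = sum(tract_lengths) / len(tract_lengths)
    return 1.0 / mean_length
```

Under these assumptions, shorter tracts on average imply older migration. A model with a historical change in migration rate, as in the abstract, would replace the single exponential with a mixture over migration times.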

5.
Principal components analysis (PCA) is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's F_ST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.
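
The link between coalescent times and PCA projections can be sketched numerically. The example below assumes the commonly quoted form of this relationship, which is not spelled out in the abstract: the expected covariance between genomes i and j is proportional to t_i + t_j - t - t_ij, where t_ij is the expected coalescence time of the pair, t_i is the mean of t_ij over j, and t is the overall mean. The function names and the power-iteration solver are illustrative choices.

```python
import math


def expected_covariance(coal_times):
    """Double-center a symmetric matrix of pairwise coalescence times.

    Entry (i, j) is row_mean[i] + row_mean[j] - grand_mean - t_ij,
    which is the assumed (up to a constant) expected sample covariance.
    """
    n = len(coal_times)
    row_mean = [sum(row) / n for row in coal_times]
    grand_mean = sum(row_mean) / n
    return [
        [row_mean[i] + row_mean[j] - grand_mean - coal_times[i][j] for j in range(n)]
        for i in range(n)
    ]


def leading_axis(matrix, iterations=500):
    """Unit eigenvector of the largest-magnitude eigenvalue, by power iteration."""
    n = len(matrix)
    vec = [math.sin(i + 1.0) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("matrix maps the start vector to zero")
        vec = [x / norm for x in nxt]
    return vec
```

For example, with two populations of two genomes each, within-population coalescence time 1 and between-population time 3, the leading axis assigns the same sign to genomes from the same population and opposite signs across populations. Deeper divergence between groups increases the separation along that axis.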

6.
Hua Chen, Kun Chen. Genetics, 2013, 194(3): 721-736
The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. Both exact distributions of coalescence times and ancestral lineage numbers are expressed as the sum of alternating series, and the terms in the series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by T_m the mth coalescent time, when m + 1 lineages coalesce into m lineages, and A_n(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, A_n(t), and the coalescence times, T_m, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n − A_n(t) follows a Poisson distribution, and as m → n, n(n − 1)T_m/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference.
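
For orientation, the constant-size case has simple closed-form expectations for the TMRCA and TBL mentioned above. The sketch below covers only the standard constant-size coalescent, not the varying-size asymptotics derived in the article. Time is measured in units of N generations for a constant population size N (an assumed scaling), and the function names are illustrative.

```python
def expected_interval(k):
    """Expected time during which exactly k lineages remain: 1 / C(k, 2)."""
    if k < 2:
        raise ValueError("need at least two lineages")
    return 2.0 / (k * (k - 1))


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n samples."""
    if n < 2:
        raise ValueError("need at least two samples")
    return sum(expected_interval(k) for k in range(2, n + 1))


def expected_total_branch_length(n):
    """Expected total branch length: each interval counts once per lineage."""
    if n < 2:
        raise ValueError("need at least two samples")
    return sum(k * expected_interval(k) for k in range(2, n + 1))
```

The TMRCA sum telescopes to 2(1 - 1/n), so it approaches 2 as n grows, while the total branch length equals twice the harmonic sum 1 + 1/2 + ... + 1/(n - 1) and grows only logarithmically in n.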

7.

Background

Little is known about the phylogeography of norovirus (NoV) in China. A clear understanding of the tree topology, migration patterns and demographic dynamics of circulating NoV is needed to identify prevalence trends, which can help in preparing for epidemics and in developing useful control strategies. The aim of this study was to explore the genetic diversity, temporal distribution, demographic dynamics and migration patterns of NoV that circulated in China.

Results

Our analysis identified two major genogroups, GI and GII, in China; GII.3, GII.4 and GII.17 together accounted for the majority, with a combined proportion of around 70%. Our demographic inference suggested that during the long-term migration process, NoV evolved into multiple lineages and then experienced a selective sweep, which reduced its genetic diversity. The phylogeographic results suggested that NoV may have originated from South China (Hong Kong and Guangdong), followed by outbreaks spreading in multiple directions across the country.

Conclusions

These analyses indicate that domestic poultry trade and frequent contact among people from different regions have both contributed to the spread of NoV in China. Together with recent advances in phylogeographic inference, our research also illustrates how coalescent-based methods can extract adequate information in molecular epidemiology.

8.
An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set X of species from a collection of trees, each having leaf-set some subset of X. In the 1980s, Colonius and Schulze gave certain inference rules for deciding when a collection of 4-leaved trees, one for each 4-element subset of X, can be simultaneously displayed by a single supertree with leaf-set X. Recently, it has become of interest to extend this and related results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has recently been shown that a certain type of phylogenetic network, called an (unrooted) level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here, we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of X, can all be simultaneously displayed by a level-1 network with leaf-set X. The rules are an intriguing mixture of tree inference rules, and an inference rule for building up a cyclic ordering of X from orderings on subsets of X of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.

9.
Ichang papeda (Citrus ichangensis), a wild and endemic perennial plant in Rutaceae, is characterized by the existence of wild and natural populations in southwestern and middle-west China. We analyzed a total of 231 individuals across 16 natural populations using chloroplast SSR markers, nuclear SSR markers, and single-copy nuclear genes. Standard population genetic analyses as well as Bayesian and maximum likelihood models were used to clarify the genetic diversity, population differentiation, barriers to gene flow, bottleneck events, isolation by distance, historical migration, demographic history among populations, and phylogenetic evolution. The chloroplast and nuclear genome analyses revealed a low level of genetic diversity in C. ichangensis. Clear signals of recent bottlenecks and strong patterns of isolation by distance were detected among different subpopulations, indicating a low extent of historical gene flow for this species and that genetic drift would occur after population differentiation. Bayesian clustering analyses revealed a clear pattern of genetic structure, with one cluster spanning the potential refugia in Wuling Mountains and Ta-pa Mountains, and two other clusters covering a more limited distribution range. The demographic history also supported the scenario that two isolated clusters originated in parallel from the genetic diversity center. Taxonomically, Ichang papeda may be a member of subgenus Citrus. Owing to the complicated topography, the mountainous regions and the Yangtze River have provided long-term stable habitats for C. ichangensis and acted as main barriers for its expansion, which might facilitate the process of speciation. Statistical population models and genetic data indicated strong genetic structure in C. ichangensis, which might result from the restricted gene flow, genetic drift, and population bottlenecks.

10.
11.
Genomic resources are a valuable research tool for understanding and forecasting the response of forest trees to global change and for developing science-based management strategies. Yet, many ecologically relevant tree species still lack such resources. The conifer genus Juniperus contains >70 species that are widely distributed through the Northern Hemisphere, including several keystone species that form extensive forests in arid landscapes. To date, single-nucleotide polymorphism (SNP) markers have not been described for this ecologically important tree genus and the few described simple sequence repeat (SSR) markers are insufficient for performing reliable population demographic inference. Here, we report on the successful development of 19 new SSR and 147 SNP markers for Phoenician juniper (Juniperus phoenicea ssp. turbinata), a species widely distributed along the coasts of the Mediterranean Basin. We calculate a series of population genetic diversity estimates for each set of markers independently and for both sets combined. Our comparison shows that the higher per-locus information content of SSRs makes them the marker of choice for parentage and assignment studies, whereas SNPs provide more reliable demographic inferences (Ne and detection of a recent bottleneck). We also test and confirm the transferability of the new set of SNP markers to the closely related tetraploid species J. thurifera. Finally, we perform an orthology analysis with two gymnosperm model species to search for SNPs linked with functional genes.

12.
Viruses utilize a diverse array of mechanisms to deliver their genomes into hosts. While great strides have been made in understanding the genome delivery of eukaryotic and prokaryotic viruses, little is known about archaeal virus genome delivery and the associated particle changes. The Sulfolobus turreted icosahedral virus (STIV) is a double-stranded DNA (dsDNA) archaeal virus that contains a host-derived membrane sandwiched between the genome and the proteinaceous capsid shell. Using cryo-electron microscopy (cryo-EM) and different biochemical treatments, we identified three viral morphologies that may correspond to biochemical disassembly states of STIV. One of these morphologies was subtly different from the previously published 27-Å-resolution electron density that was interpreted with the crystal structure of the major capsid protein (MCP). However, these particles could be analyzed at 12.5-Å resolution by cryo-EM. Comparing these two structures, we identified the location of multiple proteins forming the large turret-like appendages at the icosahedral vertices, observed heterogeneous glycosylation of the capsid shell, and identified mobile MCP C-terminal arms responsible for tethering and releasing the underlying viral membrane to and from the capsid shell. Collectively, our studies allow us to propose a fusogenic mechanism of genome delivery by STIV, in which the dismantled capsid shell allows for the fusion of the viral and host membranes and the internalization of the viral genome.

Viruses are valuable biological tools for manipulating the cellular processes of their hosts, and they can also serve as model systems for describing macromolecular interactions through the analysis of their architecture. The Sulfolobus turreted icosahedral virus (STIV) is an archaeal virus that infects Sulfolobus solfataricus (phylum Crenarchaeota). STIV is a lytic virus that was isolated from an acidic hot spring (>80°C and pH of <3) in Yellowstone National Park (27). Hence, STIV is an important model for studying the biochemical requirements to sustain life in extreme physicochemical conditions and has the potential to become a tool for the biochemical and genetic manipulation of its host, much as bacteriophages lambda, P22, and phi29 have for their respective hosts.

Prior structural studies of STIV using cryo-electron microscopy (cryo-EM), X-ray crystallography, and proteomics have described large pentameric turret-like structures, with petal-like protrusions emanating from their central shafts (27). The T=31d capsid shell is composed of trimeric capsomers exhibiting pseudo-hexagonal symmetry, in which each of the three capsomer subunits donates two viral jelly rolls with its β-sheets normal to the capsid surface (15, 27). Capsomers surrounding the icosahedral 3-fold axes, and their neighboring subunits, make direct contact with the viral membrane via a highly basic C-terminal helix of each subunit (15, 23). Surrounding the base of the turrets are proteins that make contact with the capsid shell and a host-derived viral membrane (15). The viral membrane and the enclosed viral genome are referred to as the lipid core.

The capsid architecture of STIV and the crystal structure of its major capsid protein (MCP) are strikingly similar to those of the bacteriophages PRD1, Bam35, and PM2, the alga virus PBCV-1, and the mammalian adenovirus. This similarity suggests that these viruses share an ancestral virus (2, 4, 7, 15, 25). Given the evolutionary relationship shared between STIV and PRD1, we postulated that the large turret-like vertices of STIV were used to inject the viral genome into the Sulfolobus host, a genome delivery mechanism employed by PRD1 (27).

A recent report by Brumfield et al. (5) describes gross cellular ultrastructural changes induced in the Sulfolobus host during STIV infection and release. The authors identified distinct particles that appear to be assembly intermediates of STIV en route to maturation. From these intermediates the authors proposed a general mechanism of capsid assembly, in which MCP subunits and minor capsid proteins (mCPs) coassemble with the lipid membrane to form a lipid-enclosed protein vesicle. These vesicles are spherical and lack the double-stranded DNA (dsDNA) genome and turret-like appendages at the vertices.

While these studies confirm an empty procapsid intermediate, the corresponding molecular mechanism associated with assembly and disassembly remains to be understood. Moreover, little is known about STIV or other archaeal virus genome delivery into the host. To obtain a better understanding of the molecular mechanism of STIV architecture and its role in genome delivery, we characterized three distinct morphologies of STIV particles using cryo-EM. An image reconstruction of one of these revealed the absence of a number of constituents decorating the STIV capsid. Hence, for simplicity, we refer to the previously reported image reconstruction (27) as “decorated” and the new image reconstruction reported here as “undecorated.” Reference-free two-dimensional (2D) class averages of the second identified morphology reveal a partially decorated STIV lipid core. The third identified morphology corresponds to the isolated STIV lipid core. Taken together, our analyses indicate that these morphologies correspond to different disassembly intermediates of STIV that can be isolated in vitro and help provide a picture of the STIV capsid architecture. Additionally, these morphologies allow us to propose an alternative possible mechanism of genome delivery.

13.
The central questions asked in whole-genome association studies are how to locate associated regions in the genome and how to estimate the significance of these findings. Researchers usually do this by testing each SNP separately for association and then applying a suitable correction for multiple-hypothesis testing. However, SNPs are correlated by the unobserved genealogy of the population, and a more powerful statistical methodology would attempt to take this genealogy into account. Leveraging the genealogy in association studies is challenging, however, because the inference of the genealogy from the genotypes is a computationally intensive task, in particular when recombination is modeled, as in ancestral recombination graphs. Furthermore, if large numbers of genealogies are imputed from the genotypes, the power of the study might decrease if these imputed genealogies create an additional multiple-hypothesis testing burden. Indeed, we show in this paper that several existing methods that aim to address this problem suffer either from low power or from a very high false-positive rate; their performance is generally not better than the standard approach of separate testing of SNPs. We suggest a new genealogy-based approach, CAMP (coalescent-based association mapping), that takes into account the trade-off between the complexity of the genealogy and the power lost due to the additional multiple hypotheses. Our experiments show that CAMP yields a significant increase in power relative to that of previous methods and that it can more accurately locate the associated region.
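
The standard per-SNP approach described above can be sketched as follows. The abstract does not say which correction is used, so the Bonferroni rule here is an assumption chosen for simplicity, and the function name is illustrative. This is not the CAMP method.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at family-wise error rate alpha.

    Bonferroni rule: with m tests, compare every p-value with alpha / m.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]
```

With three SNPs and p-values 0.01, 0.04 and 0.0001, the per-test threshold is 0.05 / 3, so the first and third SNPs are flagged. Every additional hypothesis tightens the threshold, which is the power cost that the abstract says imputed genealogies can incur.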

14.
15.

Background

High-yielding cultivars of rice (Oryza sativa L.) have been developed in Japan from crosses between overseas indica and domestic japonica cultivars. Recently, next-generation sequencing technology and high-throughput genotyping systems have shown many single-nucleotide polymorphisms (SNPs) that are proving useful for detailed analysis of genome composition. These SNPs can be used in genome-wide association studies to detect candidate genome regions associated with economically important traits. In this study, we used a custom SNP set to identify introgressed chromosomal regions in a set of high-yielding Japanese rice cultivars, and we performed an association study to identify genome regions associated with yield.

Results

An informative set of 1152 SNPs was established by screening 14 high-yielding or primary ancestral cultivars for 5760 validated SNPs. Analysis of the population structure of high-yielding cultivars showed three genome types: japonica-type, indica-type and a mixture of the two. SNP allele frequencies showed several regions derived predominantly from one of the two parental genome types. Distinct regions skewed for the presence of parental alleles were observed on chromosomes 1, 2, 7, 8, 11 and 12 (indica) and on chromosomes 1, 2 and 6 (japonica). A possible relationship between these introgressed regions and six yield traits (blast susceptibility, heading date, length of unhusked seeds, number of panicles, surface area of unhusked seeds and 1000-grain weight) was detected in eight genome regions dominated by alleles of one parental origin. Two of these regions were near Ghd7, a heading date locus, and Pi-ta, a blast resistance locus. The allele types (i.e., japonica or indica) of significant SNPs coincided with those previously reported for candidate genes Ghd7 and Pi-ta.

Conclusions

Introgression breeding is an established strategy for the accumulation of QTLs and genes controlling high yield. Our custom SNP set is an effective tool for the identification of introgressed genome regions from a particular genetic background. This study demonstrates that changes in genome structure occurred during artificial selection for high yield, and provides information on several genomic regions associated with yield performance.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-346) contains supplementary material, which is available to authorized users.

16.
The purpose of this work was to evaluate the evolutionary history of Campylobacter coli isolates derived from multiple host sources and to use microarray comparative genomic hybridization to assess whether there are particular genes comprising the dispensable portion of the genome that are more commonly associated with certain host species. Genotyping and ClonalFrame analyses of an expanded 16-gene multilocus sequence typing (MLST) data set involving 85 isolates from 4 different host species tentatively supported the development of C. coli host-preferred groups and suggested that recombination has played various roles in their diversification; however, geography could not be excluded as a contributing factor underlying the history of some of the groups. Population genetic analyses of the C. coli pubMLST database by use of STRUCTURE suggested that isolates from swine form a relatively homogeneous genetic group, that chicken and human isolates show considerable genetic overlap, that isolates from ducks and wild birds have similarity with environmental water samples and that turkey isolates have a connection with human infection similar to that observed for chickens. Analysis of molecular variance (AMOVA) was performed on these same data and suggested that host species was a significant factor in explaining genetic variation and that macrogeography (North America, Europe, and the United Kingdom) was not. The microarray comparative genomic hybridization data suggested that there were combinations of genes more commonly associated with isolates derived from particular hosts and, combined with the results on evolutionary history, suggest that this is due to a combination of common ancestry in some cases and lateral gene transfer in others.

Campylobacter species are a leading bacterial cause of gastroenteritis within the United States and throughout much of the rest of the developed world. According to the CDC, there are an estimated 2 million to 4 million cases of Campylobacter illness each year in the United States (37). Campylobacter jejuni is generally recognized as the predominant cause of campylobacteriosis, responsible for approximately 90% of reported cases, while the majority of the remainder are caused by the closely related sister species Campylobacter coli (27). Not surprisingly, therefore, the majority of research on Campylobacter has centered on C. jejuni, and C. coli is a less studied organism.

A multilocus sequence typing (MLST) scheme of C. jejuni was first developed by Dingle et al. (13) on the basis of the genome sequence of C. jejuni NCTC 11168. There have also been a number of studies using the genome sequence data to develop microarrays for gene presence/absence determination across strains of C. jejuni and to identify the core genome components for the species (6, 15, 32, 33, 42, 43, 53, 57). Although C. coli is responsible for fewer food-borne illnesses than C. jejuni, the impact of C. coli is still substantial, and there is also evidence that C. coli may carry higher levels of resistance to some antibiotics (1). C. coli and C. jejuni also tend to differ in their relative prevalences in animal host species and various environmental sources (4, 48, 58), and there is some evidence that both taxa may include groups of host-specific putative ecotype strains (7, 36, 38, 39, 52, 56). At present, there is only a single draft genome sequence available for C. coli, and there are no microarray comparative genomic hybridization data for C. coli strains. Thus, there is no information on intraspecies variability in gene presence/absence in C. coli and how such variability might correlate with host species.

The purpose of this work was to develop and apply an expanded 16-locus MLST genotyping scheme to evaluate the evolutionary history of Campylobacter coli isolates derived from multiple host sources and to use microarray comparative genomic hybridization to assess whether there are particular genes comprising the dispensable portion of the genome that are more commonly associated with isolates derived from different host species.

17.
Natural hybridization may influence population fitness and responsiveness to natural selection, in particular in oceanic island systems. In previous studies, interspecific hybridization was detected between the Galápagos iguana species Amblyrhynchus cristatus and Conolophus subcristatus. Further, possible hybridization was also suggested to occur between C. subcristatus and C. marthae at Wolf Volcano on Isabela Island. In this work, we investigated the level of hybridization between C. subcristatus and C. marthae using a large set of microsatellite markers. Results indicated strong differentiation between species and, while we cannot rule out hybridization in the past, there is no evidence of ongoing hybridization between C. marthae and C. subcristatus. These findings have great importance for the design of management actions and conservation plans, in particular for the purposes of a head start program. However, because potential for hybridization may change under different environmental and demographic conditions, genetic characterization of newly marked individuals of C. marthae and C. subcristatus in Wolf Volcano should not be interrupted.

18.
The complete mitochondrial genome sequence of the holoparasitic isopod Gyge ovalis (Shiino, 1939) has been determined. The mitogenome is 14,268 bp in length and contains 34 genes: 13 protein-coding genes, two ribosomal RNA, 19 tRNA and a control region. Three tRNA genes (trnE, trnI and trnS1) are missing. Most of the tRNA genes show secondary structures which derive from the usual cloverleaf pattern except for trnC which is characterised by the loss of the DHU-arm. Compared to the isopod ground pattern and Eurydice pulchra Leach, 1815 (suborder Cymothoida Wägele, 1989), the genome of G. ovalis shows few differences, with changes only around the control region. However, the genome of G. ovalis is very different from that of non-cymothoidan isopods and reveals that the gene order evolution in isopods is less conservative compared to other crustaceans. Phylogenetic trees were constructed using maximum likelihood and Bayesian inference analyses based on 13 protein-coding genes. The results do not support the placement of G. ovalis with E. pulchra and Bathynomus sp. in the same suborder; rather, G. ovalis appears to have a closer relationship to Ligia oceanica (Linnaeus, 1767), but this result suggests a need for more data and further analysis. Nevertheless, these results cast doubt that Epicaridea Latreille, 1825 can be placed as an infraorder within the suborder Cymothoida, and Epicaridea appears to also deserve subordinal rank. Further development of robust phylogenetic relationships across Isopoda Latreille, 1817 will require more genetic data from a greater diversity of taxa belonging to all isopod suborders.

19.
A major goal of population genomics is to reconstruct the history of natural populations and to infer the neutral and selective scenarios that can explain the present-day polymorphism patterns. However, the separation between neutral and selective hypotheses has proven hard, mainly because both may predict similar patterns in the genome. This study focuses on the development of methods that can be used to distinguish neutral from selective hypotheses in equilibrium and nonequilibrium populations. These methods utilize a combination of statistics on the basis of the site frequency spectrum (SFS) and linkage disequilibrium (LD). We investigate the patterns of genetic variation along recombining chromosomes using a multitude of comparisons between neutral and selective hypotheses, such as selection or neutrality in equilibrium and nonequilibrium populations and recurrent selection models. We perform hypothesis testing using the classical P-value approach, but we also introduce methods from the machine-learning field. We demonstrate that the combination of SFS- and LD-based statistics increases the power to detect recent positive selection in populations that have experienced past demographic changes.
Genomes contain information related to the history of natural populations. Past neutral and selective processes may have left footprints in the genome. Recent advances in population genetics aim to understand the patterns of genetic diversity and identify events that have led to genetic adaptations. Among them, positive selection has been a focus of many recent studies (Harr et al. 2002; Kim and Stephan 2002; Glinka et al. 2003; Akey et al. 2004; Orengo and Aguadé 2004). Their goal is to (i) provide evidence of positive selection, (ii) estimate the strength and the rate of selection, and (iii) localize the targets of selection. These objectives form the basis of a long-term pursuit, which is the understanding of the molecular basis of adaptation of populations in a changing environment.
Positive selection can cause genetic hitchhiking when a beneficial mutation spreads in the population (Maynard Smith and Haigh 1974). When a strongly beneficial mutation occurs and spreads in a population, linked neutral or slightly deleterious variants hitchhike with it, and their frequency increases. According to Maynard Smith and Haigh's model, three patterns are generated locally around the position of the beneficial mutation. First, the level of variability will be reduced since standing variation of the population that is not linked to the beneficial allele vanishes, and tightly linked polymorphisms may fix (Kaplan et al. 1989; Stephan et al. 1992). Second, the site frequency spectrum (SFS), which describes the frequency of allelic variants, shifts from its neutral expectation toward rare and high-frequency derived variants (Braverman et al. 1995; Fay and Wu 2000). The third signature describes the emergence of specific linkage disequilibrium (LD) patterns around the target of positive selection, such as an elevated level of LD in the early phase of the fixation process of the beneficial mutation and a decay of LD across the selected site at the end of the selective phase (Kim and Nielsen 2004; Stephan et al. 2006).
The availability of genome-wide SNP data has made possible the scanning of genomes and the identification of loci that may have been targets of recent selective events. Several approaches have been developed within the last years that can detect the molecular signatures of positive selection (Kim and Stephan 2002; Jensen et al. 2005; Nielsen et al. 2005). While the methods of Kim and Stephan (2002) and Jensen et al. (2005) are designed to analyze subgenomic SNP data, the approach of Nielsen et al. (2005) can be applied to both subgenomic and whole-genome data (reviewed in Pavlidis et al. 2008). For this reason we concentrate here on the latter procedure. This method, called SweepFinder, calculates the probability P(x) that a polymorphism of multiplicity x is linked to a beneficial mutation using a simple selective model and the SFS prior to the selective event. Then, for each location in the genome it compares a selective with a neutral model assuming independence between the SNPs, therefore calculating the composite likelihood ratio Λ. Thus, it identifies regions where the likelihood of the selective sweep is greater than that of the neutral model using the maximum value Λ_MAX of Λ.
The ω-statistic, developed by Kim and Nielsen (2004), detects specific LD patterns caused by genetic hitchhiking (described above). In the study by Kim and Nielsen (2004) the maximum value of the ω-statistic was used to identify the targets of selective sweeps. Later, Jensen et al. (2007) studied its performance in separating demographic from selective scenarios. An important result by Jensen et al. (2007) is the demonstration that for demographic parameters relevant to nonequilibrium populations (such as the cosmopolitan populations of Drosophila melanogaster) the ω-statistic can distinguish between neutral and selective scenarios. This article further develops SweepFinder and the ω-statistic such that they can eventually be applied to whole-genome SNP data sets that have been collected from nonequilibrium populations. In particular, populations undergoing population-size bottlenecks are of interest as these size changes may confound the patterns of selective sweeps (Barton 1998). For this reason we use the following approach: first, we theoretically analyze the genealogies of bottlenecked populations under neutrality and show to what extent they resemble the genealogies of single hitchhiking (SHH) events. We also point out the importance of high-frequency-derived variants in the identification of selective sweeps. Second, we study the statistical properties of SweepFinder and the ω-statistic separately and in combination. As the main result, we demonstrate that the combination of these two methods (that include both SFS and LD information) increases the power for detecting recent SHH events in nonequilibrium populations, in particular when machine-learning techniques are employed. Third, we analyze the performance of SweepFinder and the ω-statistic in the detection of recurrent hitchhiking (RHH) events.
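The two SFS-based quantities described in this entry, the site frequency spectrum and a composite likelihood ratio that treats SNPs as independent, can be illustrated with a minimal Python sketch. It is not SweepFinder and does not reproduce the authors' method. The function names are hypothetical, the null model assumes the standard neutral expectation (probability of multiplicity x proportional to 1/x) instead of an empirical background SFS, and `p_sweep` is a made-up placeholder distribution skewed toward rare and high-frequency derived variants.

```python
import math

def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: counts[x] is the number of sites at which the
    derived allele (coded 1) appears in x of the n sampled sequences."""
    n = len(haplotypes)
    counts = [0] * (n + 1)
    for site in zip(*haplotypes):  # iterate over alignment columns
        counts[sum(site)] += 1
    return counts

def neutral_sfs_probs(n):
    """Assumed null model: P(x) proportional to 1/x for x = 1..n-1
    (standard neutral expectation, not an empirical background SFS)."""
    weights = {x: 1.0 / x for x in range(1, n)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def composite_log_lr(multiplicities, p_alt, p_null):
    """Composite log-likelihood ratio: SNPs are treated as independent,
    so per-SNP log ratios are simply summed."""
    return sum(math.log(p_alt[x]) - math.log(p_null[x]) for x in multiplicities)

# Toy data: 4 sequences (rows) by 5 sites (columns), 1 = derived allele.
haps = [
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
sfs = site_frequency_spectrum(haps)

# Multiplicities of segregating sites only (0 < x < n).
n = len(haps)
mults = [x for x in range(1, n) for _ in range(sfs[x])]

p_neutral = neutral_sfs_probs(n)
p_sweep = {1: 0.45, 2: 0.10, 3: 0.45}  # hypothetical alternative, for illustration only
clr = composite_log_lr(mults, p_sweep, p_neutral)
```

In a genome scan, a ratio of this kind would be evaluated at many candidate positions and its maximum taken; the sketch evaluates a single window only.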

20.

Background

Turkey is a crossroads of major population movements throughout history and has been a hotspot of cultural interactions. Several studies have investigated the complex population history of Turkey through a limited set of genetic markers. However, to date, there have been no studies to assess the genetic variation at the whole genome level using whole genome sequencing. Here, we present whole genome sequences of 16 Turkish individuals resequenced at high coverage (32×–48×).

Results

We show that the genetic variation of the contemporary Turkish population clusters with South European populations, as expected, but also shows signatures of relatively recent contribution from ancestral East Asian populations. In addition, we document a significant enrichment of non-synonymous private alleles, consistent with recent observations in European populations. A number of variants associated with skin color and total cholesterol levels show frequency differentiation between the Turkish populations and European populations. Furthermore, we analyzed the 17q21.31 inversion polymorphism region (MAPT locus) and found an allele frequency of 31.25% for the H1/H2 inversion polymorphism, higher than the roughly 25% allele frequency seen in European populations.
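As a rough check on the arithmetic behind the reported frequency, the short Python sketch below computes an allele frequency from a copy count. The count of 10 copies is an assumption, not a figure from the abstract: it is inferred only because 16 diploid individuals carry 32 chromosomes at an autosomal locus and 10/32 equals 31.25%. The function name is hypothetical.

```python
from fractions import Fraction

def allele_frequency(allele_copies, n_individuals, ploidy=2):
    """Frequency of an allele among all sampled chromosomes."""
    return Fraction(allele_copies, n_individuals * ploidy)

# Assumed count: 10 copies among 16 diploid individuals (32 chromosomes).
freq = allele_frequency(10, 16)
percent = float(freq) * 100
```

With only 32 chromosomes, each additional copy shifts the estimate by 3.125 percentage points, so the difference from the roughly 25% European value corresponds to about two allele copies in a sample of this size.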

Conclusion

This study provides the first map of common genetic variation from 16 western Asian individuals and thus helps fill an important geographical gap in analyzing natural human variation and human migration. Our data will help develop population-specific experimental designs for studies investigating disease associations and demographic history in Turkey.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-963) contains supplementary material, which is available to authorized users.

