Similar articles
1.
Bertels F  Rainey PB 《PLoS genetics》2011,7(6):e1002132
Repetitive sequences are a conserved feature of many bacterial genomes. Although they were first reported almost thirty years ago and are frequently exploited for genotyping purposes, little is known about their origin, maintenance, or the processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA.

2.
A survey of bacterial insertion sequences using IScan
Bacterial insertion sequences (ISs) are the simplest kinds of bacterial mobile DNA. Evolutionary studies need consistent IS annotation across many different genomes. We have developed an open-source software package, IScan, to identify bacterial ISs and their sequence elements (inverted and target direct repeats) in multiple genomes using multiple flexible search parameters. We applied IScan to 438 completely sequenced bacterial genomes and 20 IS families. The resulting data show that ISs within a genome are extremely similar, with a mean synonymous divergence of Ks = 0.033. Our analysis substantially extends previously available information, and suggests that most ISs have entered bacterial genomes recently. By implication, their population persistence may depend on horizontal transfer. We also used IScan's ability to analyze the statistical significance of sequence similarity among many IS inverted repeats. Although the inverted repeats of insertion sequences are evolutionarily highly flexible parts of ISs, we show that this ability can be used to enrich a dataset for ISs that are likely to be functional. Applied to the thousands of genomes that will soon be available, IScan could be used for many purposes, such as mapping the evolutionary history and horizontal transfer patterns of different ISs.

3.
The bacterial core genome is of intense interest, and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
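
As an illustration of the distance measure mentioned in this abstract, the sketch below computes a pairwise p-distance (the proportion of compared sites at which two aligned sequences differ) and the median over all pairs of alleles of one gene. It is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and skipping gap or ambiguous positions is an assumption made here.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ.

    Assumption: positions where either sequence has a gap or an
    ambiguous base (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(alleles):
    """Median p-distance over all pairs of aligned alleles of one gene."""
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))
```

Calling `median_p_distance` once per core gene and plotting the result against the gene's position would give a per-gene diversity profile of the kind the abstract describes.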

4.
Herpesviridae is a diverse family of large and complex pathogens whose genomes are extremely difficult to sequence. This is particularly true for clinical samples, and if the virus, host, or both genomes are being sequenced for the first time. Although herpesviruses are known to occasionally integrate in host genomes, and can also be inherited in a Mendelian fashion, they are notably absent from the genomic fossil record comprised of endogenous viral elements (EVEs). Here, we combine paleovirological and metagenomic approaches to both explore the constituent viral diversity of mammalian genomes and search for endogenous herpesviruses. We describe the first endogenous herpesvirus from the genome of the Philippine tarsier, belonging to the Roseolovirus genus, and characterize its highly defective genome that is integrated and flanked by unambiguous host DNA. From a draft assembly of the aye-aye genome, we use bioinformatic tools to reveal over 100,000 bp of a novel rhadinovirus that is the first lemur gammaherpesvirus, closely related to Kaposi's sarcoma-associated virus. We also identify 58 genes of Pan paniscus lymphocryptovirus 1, the bonobo equivalent of human Epstein-Barr virus. For each of the viruses, we postulate gene function via comparative analysis to known viral relatives. Most notably, the evidence from gene content and phylogenetics suggests that the aye-aye sequences represent the most basal known rhadinovirus, and indicates that tumorigenic herpesviruses have been infecting primates since their emergence in the late Cretaceous. Overall, these data show that a genomic fossil record of herpesviruses exists despite their extremely large genomes, and expand the known diversity of Herpesviridae, which will aid the characterization of pathogenesis. Our analytical approach illustrates the benefit of intersecting evolutionary approaches with metagenomics, genetics and paleovirology.

5.
Mutation rates vary both within and between bacterial species, and understanding what drives this variation is essential for understanding the evolutionary dynamics of bacterial populations. In this study, we investigate two factors that are predicted to influence the mutation rate: ecology and genome size. We conducted mutation accumulation experiments on eight strains of the emerging zoonotic pathogen Streptococcus suis. Natural variation within this species allows us to compare tonsil carriage and invasive disease isolates, from both more and less pathogenic populations, with a wide range of genome sizes. We find that invasive disease isolates have repeatedly evolved mutation rates that are higher than those of closely related carriage isolates, regardless of variation in genome size. Independent of this variation in overall rate, we also observe a stronger bias towards G/C to A/T mutations in isolates from more pathogenic populations, whose genomes tend to be smaller and more AT-rich. Our results suggest that ecology is a stronger correlate of mutation rate than genome size over these timescales, and that transitions to invasive disease are consistently accompanied by rapid increases in mutation rate. These results shed light on the impact that ecology can have on the adaptive potential of bacterial pathogens.

6.
We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the understanding of the genome basis of Brucella pathogenicity and host specificity.

7.
Bacterial viruses are widespread and abundant across natural and engineered habitats. They influence ecosystem functioning through interactions with their hosts. Laboratory studies of phage–host pairs have advanced our understanding of phenotypic and genetic diversification in bacteria and phages. However, the dynamics of phage–host interactions have seldom been recorded in complex natural environments. We conducted an observational metagenomic study of the dynamics of interaction between Gordonia and their phages using a three-year data series of samples collected from a full-scale wastewater treatment plant. The aim was to obtain a comprehensive picture of the coevolution dynamics in naturally evolving populations at relatively high time resolution. Coevolution was followed by monitoring changes over time in the CRISPR loci of the Gordonia metagenome-assembled genome, and reciprocal changes in the viral genome. Genome-wide analysis indicated low strain variability of Gordonia, and almost clonal conservation of the trailer end of the CRISPR loci. Incorporation of newer spacers gave rise to multiple coexisting bacterial populations. The host population carrying a shorter CRISPR locus that contains only ancestral spacers, and that has not acquired newer spacers against the coexisting phages, accounted for more than half of the total host abundance in the majority of samples. Phage genomes co-evolved by introducing directional changes, with no preference for mutations within the protospacer and PAM regions. Metagenomic reconstruction of time-resolved variants of host and viral genomes revealed how the complexity at the population level has important consequences for bacteria-phage coexistence.
Subject terms: Microbial ecology, Metagenomics, Bacteriophages

8.
How genomic diversity within bacterial populations originates and is maintained in the presence of frequent recombination is a central problem in understanding bacterial evolution. Natural populations of Borrelia burgdorferi, the bacterial agent of Lyme disease, consist of diverse genomic groups co-infecting single individual vertebrate hosts and tick vectors. To understand mechanisms of sympatric genome differentiation in B. burgdorferi, we sequenced and compared 23 genomes representing major genomic groups in North America and Europe. Linkage analysis of >13,500 single-nucleotide polymorphisms revealed pervasive horizontal DNA exchanges. Although three times more frequent than point mutation, recombination is localized and weakly affects genome-wide linkage disequilibrium. We show by computer simulations that, while enhancing population fitness, recombination constrains neutral and adaptive divergence among sympatric genomes through periodic selective sweeps. In contrast, simulations of frequency-dependent selection with recombination produced the observed pattern of a large number of sympatric genomic groups associated with major sequence variations at the selected locus. We conclude that negative frequency-dependent selection targeting a small number of surface-antigen loci (ospC in particular) sufficiently explains the maintenance of sympatric genome diversity in B. burgdorferi without adaptive divergence. We suggest that pervasive recombination makes it less likely for local B. burgdorferi genomic groups to achieve host specialization. B. burgdorferi genomic groups in the northeastern United States are thus best viewed as constituting a single bacterial species, whose generalist nature is a key to its rapid spread and human virulence.

9.
Many genomic sequences have been recently published for bacteria that can replicate only within eukaryotic hosts. Comparisons of genomic features with those of closely related bacteria retaining free-living stages indicate that rapid evolutionary change often occurs immediately after host restriction. Typical changes include a large increase in the frequency of mobile elements in the genome, chromosomal rearrangements mediated by recombination among these elements, pseudogene formation, and deletions of varying size. In anciently host-restricted lineages, the frequency of insertion sequence elements decreases as genomes become extremely small and strictly clonal. These changes represent a general syndrome of genome evolution, which is observed repeatedly in host-restricted lineages from numerous phylogenetic groups. Considerable variation also exists, however, in part reflecting unstudied aspects of the population structure and ecology of host-restricted bacterial lineages.

10.
In this work, we isolated and characterized 14 bacteriophages that infect Rhizobium etli. They were obtained from rhizosphere soil of bean plants from agricultural lands in Mexico using an enrichment method. The host range of these phages was narrow but variable within a collection of 48 R. etli strains. We obtained the complete genome sequence of nine phages. Four phages were resistant to several restriction enzymes and in vivo cloning, probably due to nucleotide modifications. The genome size of the sequenced phages varied from 43 kb to 115 kb, with a median size of ∼45 to 50 kb. A large proportion of open reading frames of these phage genomes (65 to 70%) consisted of hypothetical and orphan genes. The remainder encoded proteins needed for phage morphogenesis and DNA synthesis and processing, among other functions, and a minor percentage represented genes of bacterial origin. We classified these phages into four genomic types on the basis of their genomic similarity, gene content, and host range. Since there are no reports of similar sequences, we propose that these bacteriophages correspond to novel species.

11.
It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. This review addresses a relatively straightforward question: “What have we learned from this vast amount of new genomic data?” Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity, even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism is important for interpretation of the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information.

12.
Pseudomonas syringae is a common foliar bacterium responsible for many important plant diseases. We studied the population structure and dynamics of the core genome of P. syringae via multilocus sequence typing (MLST) of 60 strains, representing 21 pathovars and 2 nonpathogens, isolated from a variety of plant hosts. Seven housekeeping genes, dispersed around the P. syringae genome, were sequenced to obtain 400 to 500 nucleotides per gene. Forty unique sequence types were identified, with most strains falling into one of four major clades. Phylogenetic and maximum-likelihood analyses revealed a remarkable degree of congruence among the seven genes, indicating a common evolutionary history for the seven loci. MLST and population genetic analyses also found a very low level of recombination. Overall, mutation was found to be approximately four times more likely than recombination to change any single nucleotide. A skyline plot was used to study the demographic history of P. syringae. The species was found to have maintained a constant population size over time. Strains were also found to remain genetically homogeneous over many years, and when isolated from sites as widespread as the United States and Japan. An analysis of molecular variance found that host association explains only a small proportion of the total genetic variation in the sample. These analyses reveal that with respect to the core genome, P. syringae is a highly clonal and stable species that is endemic within plant populations, yet the genetic variation seen in these genes only weakly predicts host association.

13.
《Genome biology》2014,15(3):R59

Background

The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

Results

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next-generation sequence data generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.

Conclusions

In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
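
The N50 scaffold size quoted in this abstract is a standard assembly contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of that definition follows; the function name is hypothetical, and the abstract does not say which tool the authors used to compute it.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the
    length of the scaffold at which the running total first reaches
    half of the total assembly size.
    """
    lengths = list(lengths)  # allow any iterable, including generators
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6])` returns 5: the two longest scaffolds (6 + 5 = 11) already cover more than half of the 20 bases.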

14.
Gut microbial diversity is thought to reflect the co-evolution of microbes and their hosts as well as current host-specific attributes such as genetic background and environmental setting. To explore interactions among these parameters, we characterized variation in gut microbiome composition of California voles (Microtus californicus) across a contact zone between two recently diverged lineages of this species. Because this contact zone contains individuals with mismatched mitochondrial-nuclear genomes (cybrids), it provides an important opportunity to explore how different components of the genotype contribute to gut microbial diversity. Analyses of bacterial 16S rRNA sequences and joint species distribution modelling revealed that host genotypes and genetic differentiation among host populations together explained more than 50% of microbial community variation across our sampling transect. Ranked from most to least important, the factors contributing to gut microbial diversity in our study populations were: genome-wide population differentiation, local environmental conditions, and host genotypes. However, differences in microbial communities among vole populations (β-diversity) did not follow patterns of lineage divergence (i.e., phylosymbiosis). Instead, among-population variation was best explained by the spatial distribution of hosts, as expected if the environment is a primary source of gut microbial diversity (i.e., dispersal limitation hypothesis). Across the contact zone, several bacterial taxa differed in relative abundance between the two parental lineages as well as among individuals with mismatched mitochondrial and nuclear genomes. Thus, genetic divergence among host lineages and mitonuclear genomic mismatches may also contribute to microbial diversity by altering interactions between host genomes and gut microbiota (i.e., hologenome speciation hypothesis).

15.
There is no doubt that transposable elements (TEs) have greatly influenced genome evolution. They have, however, evolved in different ways in mammals, plants, and invertebrates. In mammals they have been shown to be widely present but with low transposition activity; in plants they are responsible for large increases in genome size. In Drosophila, despite their low abundance, transposition seems to be higher. How these elements have evolved in different genomes, and how host genomes have found ways around them, are therefore major questions in genome evolution. We analyzed sequences of the retrotransposable element 412 in natural populations of Drosophila simulans and D. melanogaster, two species that differ greatly in their amount of TEs. We identified new subfamilies of this element that were the result of mutation or insertion-deletion processes, but also of interfamily recombination. These new elements were well conserved in the D. simulans natural populations. The new regulatory regions produced by recombination could give rise to new elements able to overcome host control of transposition and, thus, become potential genome invaders.

16.

Background

Pseudomonas aeruginosa is an important opportunistic pathogen responsible for many infections in hospitalized and immunocompromised patients. Previous reports estimated that approximately 10% of its 6.6 Mbp genome varies from strain to strain and is therefore referred to as the “accessory genome”. Elements within the accessory genome of P. aeruginosa have been associated with differences in virulence and antibiotic resistance. As whole genome sequencing of bacterial strains becomes more widespread and cost-effective, methods to quickly and reliably identify accessory genomic elements in newly sequenced P. aeruginosa genomes will be needed.

Results

We developed a bioinformatic method for identifying the accessory genome of P. aeruginosa. First, the core genome was determined based on sequence conserved among the completed genomes of twelve reference strains using Spine, a software program developed for this purpose. The core genome was 5.84 Mbp in size and contained 5,316 coding sequences. We then developed an in silico genome subtraction program named AGEnt to filter out core genomic sequences from P. aeruginosa whole genomes to identify accessory genomic sequences of these reference strains. This analysis determined that the accessory genome of P. aeruginosa ranged from 6.9% to 18.0% of the total genome, was enriched for genes associated with mobile elements, and was comprised of a majority of genes with unknown or unclear function. Using these genomes, we showed that AGEnt performed well compared to other publicly available programs designed to detect accessory genomic elements. We then demonstrated the utility of the AGEnt program by applying it to the draft genomes of two previously unsequenced P. aeruginosa strains, PA99 and PA103.

Conclusions

The P. aeruginosa genome is rich in accessory genetic material. The AGEnt program accurately identified the accessory genomes of newly sequenced P. aeruginosa strains, even when draft genomes were used. As P. aeruginosa genomes become available at an increasingly rapid pace, this program will be useful in cataloging the expanding accessory genome of this bacterium and in discerning correlations between phenotype and accessory genome makeup. The combination of Spine and AGEnt should be useful in defining the accessory genomes of other bacterial species as well.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-737) contains supplementary material, which is available to authorized users.

17.
Wolbachia endosymbionts are widespread in arthropods and are generally considered reproductive parasites, inducing various phenotypes including cytoplasmic incompatibility, parthenogenesis, feminization and male killing, which serve to promote their spread through populations. In contrast, Wolbachia infecting filarial nematodes that cause human diseases, including elephantiasis and river blindness, are obligate mutualists. DNA purification methods for efficient genomic sequencing of these unculturable bacteria have proven difficult using a variety of techniques. To efficiently capture endosymbiont DNA for studies that examine the biology of symbiosis, we devised a parallel strategy to an earlier array-based method by creating a set of SureSelect (Agilent) 120-mer target enrichment RNA oligonucleotides (“baits”) for solution hybrid selection. These were designed from Wolbachia complete and partial genome sequences in GenBank and were tiled across each genomic sequence with 60 bp overlap. Baits were filtered for homology against host genomes containing Wolbachia using BLAT and sequences with significant host homology were removed from the bait pool. Filarial parasite Brugia malayi DNA was used as a test case, as the complete sequence of both Wolbachia and its host are known. DNA eluted from capture was size selected and sequencing samples were prepared using the NEBNext® Sample Preparation Kit. One-third of a 50 nt paired-end sequencing lane on the HiSeq 2000 (Illumina) yielded 53 million reads and the entirety of the Wolbachia genome was captured. We then used the baits to isolate more than 97.1% of the genome of a distantly related Wolbachia strain from the crustacean Armadillidium vulgare, demonstrating that the method can be used to enrich target DNA from unculturable microbes over large evolutionary distances.
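
The tiling scheme described in this abstract (120-mer baits with 60 bp overlap) amounts to sliding a 120 bp window along each genome in 60 bp steps. The sketch below illustrates only that step, under stated assumptions: the function name is hypothetical, a trailing stretch shorter than one bait is simply left uncovered, and the host-homology filtering with BLAT is not reproduced.

```python
def tile_baits(sequence, bait_length=120, overlap=60):
    """Return (start, bait) pairs tiled across a genomic sequence.

    Consecutive baits share `overlap` bases, so the window advances
    by bait_length - overlap positions each time. Assumption: a final
    stretch too short for a full-length bait is not covered.
    """
    step = bait_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than bait_length")
    baits = []
    for start in range(0, len(sequence) - bait_length + 1, step):
        baits.append((start, sequence[start:start + bait_length]))
    return baits
```

With the default parameters, every base away from the sequence ends is covered by two baits, which is the practical effect of a half-length overlap.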

18.
Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.

19.
Inbreeding has long been recognized as a primary cause of fitness reduction in both wild and domesticated populations. Consanguineous matings cause inheritance of haplotypes that are identical by descent (IBD) and result in homozygous stretches along the genome of the offspring. Size and position of regions of homozygosity (ROHs) are expected to correlate with genomic features such as GC content and recombination rate, but also direction of selection. Thus, ROHs should be non-randomly distributed across the genome. Therefore, demographic history may not fully predict the effects of inbreeding. The porcine genome has a relatively heterogeneous distribution of recombination rate, making Sus scrofa an excellent model to study the influence of both recombination landscape and demography on genomic variation. This study utilizes next-generation sequencing data for the analysis of genomic ROH patterns, using a comparative sliding window approach. We present an in-depth study of genomic variation based on three different parameters: nucleotide diversity outside ROHs, the number of ROHs in the genome, and the average ROH size. We identified an abundance of ROHs in all genomes of multiple pigs from commercial breeds and wild populations from Eurasia. Size and number of ROHs are in agreement with known demography of the populations, with population bottlenecks highly increasing ROH occurrence. Nucleotide diversity outside ROHs is high in populations derived from a large ancient population, regardless of current population size. In addition, we show an unequal genomic ROH distribution, with strong correlations of ROH size and abundance with recombination rate and GC content. Global gene content does not correlate with ROH frequency, but some ROH hotspots do contain positively selected genes in commercial lines and wild populations. This study highlights the importance of the influence of demography and recombination on homozygosity in the genome to understand the effects of inbreeding.

20.
How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, previous studies have used models of transposition–selection equilibrium that assume a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence. By conditioning on the age of an individual TE insertion allele (inferred by the number of unique substitutions that have occurred within the particular TE sequence since insertion), we determine the probability distribution of the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this nonequilibrium neutral model, we are able to explain ∼80% of the variance in TE insertion allele frequencies based on age alone. Controlling for both nonequilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications, or other copy number variants.
