Similar literature
20 similar documents found (search time: 15 ms)
1.
Short-read sequencing technologies have in principle made it feasible to draw detailed inferences about the recent history of any organism. In practice, however, this remains challenging due to the difficulty of genome assembly in most organisms and the lack of statistical methods powerful enough to discriminate between recent, nonequilibrium histories. We address both the assembly and inference challenges. We develop a bioinformatic pipeline for generating outgroup-rooted alignments of orthologous sequence blocks from de novo low-coverage short-read data for a small number of genomes, and show how such sequence blocks can be used to fit explicit models of population divergence and admixture in a likelihood framework. To illustrate our approach, we reconstruct the Pleistocene history of an oak-feeding insect (the oak gallwasp Biorhiza pallida), which, in common with many other taxa, was restricted during Pleistocene ice ages to a longitudinal series of southern refugia spanning the Western Palaearctic. Our analysis of sequence blocks sampled from a single genome from each of three major glacial refugia reveals support for an unexpected history dominated by recent admixture. Despite the fact that 80% of the genome is affected by admixture during the last glacial cycle, we are able to infer the deeper divergence history of these populations. These inferences are robust to variation in block length, mutation model and the sampling location of individual genomes within refugia. This combination of de novo assembly and numerical likelihood calculation provides a powerful framework for estimating recent population history that can be applied to any organism without the need for prior genetic resources.

2.
Whelan S, Goldman N. Genetics, 2004, 167(4): 2027-2043
Existing mathematical models of DNA sequence evolution assume that all substitutions derive from point mutations. There is, however, increasing evidence that larger-scale events, involving two or more consecutive sites, may also be important. We describe a model, denoted SDT, that allows for single-nucleotide, doublet, and triplet mutations. Applied to protein-coding DNA, the SDT model allows doublet and triplet mutations to overlap codon boundaries but still permits data to be analyzed using the simplifying assumption of independence of sites. We have implemented the SDT model for maximum-likelihood phylogenetic inference and have applied it to an alignment of mammalian globin sequences and to 258 other protein-coding sequence alignments from the Pandit database. We find the SDT model's inclusion of doublet and triplet mutations to be overwhelmingly successful in giving statistically significant improvements in fit of model to data, indicating that larger-scale mutation events do occur. Distributions of inferred parameter values over all alignments analyzed suggest that these events are far more prevalent than previously thought. Detailed consideration of our results and the absence of any known mechanism causing three adjacent nucleotides to be substituted simultaneously, however, lead us to suggest that the actual evolutionary events occurring may include still-larger-scale events, such as gene conversion, inversion, or recombination, or a series of rapid compensatory changes.

3.
In addition to the nuclear genome, organisms have organelle genomes. Most of the DNA present in eukaryotic organisms is located in the cell nucleus. Chloroplasts have independent genomes which are inherited from the mother. Duplicated genes are common in the genomes of all organisms. It is believed that gene duplication is the most important step for the origin of genetic variation, leading to the creation of new genes and new gene functions. Although extensive gene duplications are rare in the chloroplast genome, gene duplication in the chloroplast genome is an essential source of new genetic functions and a mechanism of neo-evolution. The events of gene transfer between the chloroplast genome and nuclear genome via duplication and subsequent recombination are important processes in evolution. The duplicated gene or genome in the nucleus has been the subject of several recent reviews. In this review, we will briefly summarize gene duplication and evolution in the chloroplast genome. We will also provide an overview of gene transfer events between chloroplast and nuclear genomes.

4.
Inference of bacterial microevolution using multilocus sequence data (total citations: 5; self-citations: 0; citations by others: 5)
Didelot X, Falush D. Genetics, 2007, 175(3): 1251-1266
We describe a model-based method for using multilocus sequence data to infer the clonal relationships of bacteria and the chromosomal position of homologous recombination events that disrupt a clonal pattern of inheritance. The key assumption of our model is that recombination events introduce a constant rate of substitutions to a contiguous region of sequence. The method is applicable both to multilocus sequence typing (MLST) data from a few loci and to alignments of multiple bacterial genomes. It can be used to decide whether a subset of isolates share common ancestry, to estimate the age of the common ancestor, and hence to address a variety of epidemiological and ecological questions that hinge on the pattern of bacterial spread. It should also be useful in associating particular genetic events with the changes in phenotype that they cause. We show that the model outperforms existing methods of subdividing recombinogenic bacteria using MLST data and provide examples from Salmonella and Bacillus. The software used in this article, ClonalFrame, is available from http://bacteria.stats.ox.ac.uk/.

5.
Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model fitting procedure, applied it to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV, and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error. We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.

6.
Our thesis is that the DNA composition and structure of genomes are selected in part by mutation bias (GC pressure) and in part by ecology. To illustrate this point, we compare and contrast the oligonucleotide composition and the mosaic structure in 36 complete genomes and in 27 long genomic sequences from archaea and eubacteria. We report the following findings: (1) High-GC-content genomes show a large underrepresentation of short distances between G(n) and C(n) homopolymers with respect to distances between A(n) and T(n) homopolymers; we discuss selection versus mutation bias hypotheses. (2) The oligonucleotide compositions of the genomes of Neisseria (meningitidis and gonorrhoeae), Helicobacter pylori and Rhodobacter capsulatus are more biased than the other sequenced genomes. (3) The genomes of free-living species or nonchronic pathogens show more mosaic-like structure than genomes of chronic pathogens or intracellular symbionts. (4) Genome mosaicity of intracellular parasites has a maximum corresponding to the average gene length; in the genomes of free-living and nonchronic pathogens the maximum occurs at larger length scales. This suggests that free-living species can incorporate large pieces of DNA from the environment, whereas for intracellular parasites there are recombination events between homologous genes. We discuss the consequences in terms of evolution of genome size. (5) Intracellular symbionts and obligate pathogens show a small, but not zero, amount of chromosome mosaicity, suggesting that recombination events occur in these species.

7.
Evidence is growing that homologous recombination is a powerful source of genetic variability among closely related free-living bacteria. Here we investigate the extent of recombination among housekeeping genes of the endosymbiotic bacteria Wolbachia. Four housekeeping genes, gltA, dnaA, ftsZ, and groEL, were sequenced from a sample of 22 strains belonging to supergroups A and B. Sequence alignments were searched for recombination within and between genes using phylogenetic inference, analysis of genetic variation, and four recombination detection programs (MaxChi, Chimera, RDP, and Geneconv). Independent analyses indicate no or weak intragenic recombination in ftsZ, dnaA, and groEL. Intragenic recombination affects gltA, with clear evidence of horizontal DNA transfers within and between divergent Wolbachia supergroups. Intergenic recombination was detected between all pairs of genes, suggesting either a horizontal exchange of a genome portion encompassing several genes or multiple recombination events involving smaller tracts along the genome. Overall, the observed pattern is compatible with pervasive recombination. Such results, combined with previous evidence of recombination in a surface protein, phage, and IS elements, support an unexpected chimeric origin of Wolbachia strains, with important implications for Wolbachia phylogeny and adaptation of these obligate intracellular bacteria in arthropods.

8.
Large amounts of population-scale genetic variation data are being collected. One potentially important biological problem is to infer the population genealogical history from these genetic variation data. Partly due to recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, genealogy is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees, each for a short region of the genome and possibly with a different topology. Inference of genealogical history for a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve on previous work. We first show that the "tree scan" method can be converted to a probabilistic inference method based on a hidden Markov model. We then focus on developing a novel local tree inference method called RENT that is both accurate and scalable to larger data. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden-Markov-model-based method is comparable with the original method in terms of accuracy. We also show that RENT is competitive with other methods in terms of inference accuracy, that its inference error rate is often lower, and that it can handle large data.

9.
Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction, can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations places the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.

10.
11.
The coalescent with recombination is a fundamental model to describe the genealogical history of DNA sequence samples from recombining organisms. Considering recombination as a process which acts along genomes and which creates sequence segments with shared ancestry, we study the influence of single recombination events upon tree characteristics of the coalescent. We focus on properties such as tree height and tree balance and quantify analytically the changes in these quantities incurred by recombination in terms of probability distributions. We find that changes in tree topology are often relatively mild under conditions of neutral evolution, while changes in tree height are on average quite large. Our results add to a quantitative understanding of the spatial coalescent and provide the neutral reference to which the impact by other evolutionary scenarios, for instance tree distortion by selective sweeps, can be compared.

12.
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.
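The link between nucleotide bias and amino acid composition described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names are hypothetical, the standard genetic code is assumed for every sequence, and the two amino acid groups (G, A, R, P for GC-rich codons; F, Y, M, I, N, K for AT-rich codons) are an assumption chosen for illustration because the abstract does not name the groups it used.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative groupings (an assumption, not taken from the abstract):
GC_RICH_AA = set("GARP")    # encoded mainly by GC-rich codons
AT_RICH_AA = set("FYMINK")  # encoded mainly by AT-rich codons


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def translate(cds):
    """Translate a coding sequence in frame 0, stopping at the first stop codon.

    Codons containing ambiguous bases are skipped.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3])
        if aa is None:
            continue
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def group_fractions(protein):
    """Return (GC-rich-codon fraction, AT-rich-codon fraction) of a protein."""
    if not protein:
        raise ValueError("empty protein")
    gc = sum(aa in GC_RICH_AA for aa in protein) / len(protein)
    at = sum(aa in AT_RICH_AA for aa in protein) / len(protein)
    return gc, at
```

Applied to every gene in a genome, plotting the first value of `group_fractions(translate(cds))` against `gc_content(cds)` would show whether GC-rich DNA tends to encode proteins enriched in the GC-rich-codon amino acids.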

13.
Resolving the structure of the eukaryotic tree of life remains one of the most important and challenging tasks facing biologists. The notion of six eukaryotic 'supergroups' has recently gained some acceptance, and several papers in 2007 suggest that resolution of higher taxonomic levels is possible. However, in organisms that acquired photosynthesis via secondary (i.e. eukaryote-eukaryote) endosymbiosis, the host nuclear genome is a mosaic of genes derived from two (or more) nuclei, a fact that is often overlooked in studies attempting to reconstruct the deep evolutionary history of eukaryotes. Accurate identification of gene transfers and replacements involving eukaryotic donor and recipient genomes represents a potentially formidable challenge for the phylogenomics community as more protist genomes are sequenced and concatenated data sets grow.

14.
DNA repeats are causes and consequences of genome plasticity. Repeats are created by intrachromosomal recombination or horizontal transfer. They are targeted by recombination processes leading to amplifications, deletions and rearrangements of genetic material. The identification and analysis of repeats in nearly 700 genomes of bacteria and archaea is facilitated by the existence of sequence data and adequate bioinformatic tools. These have revealed the immense diversity of repeats in genomes, from those created by selfish elements to the ones used for protection against selfish elements, from those arising from transient gene amplifications to the ones leading to stable duplications. Experimental works have shown that some repeats do not carry any adaptive value, while others allow functional diversification and increased expression. All repeats carry some potential to disorganize and destabilize genomes. Because recombination and selection for repeats vary between genomes, the number and types of repeats are also quite diverse and in line with ecological variables, such as host-dependent associations or population sizes, and with genetic variables, such as the recombination machinery. From an evolutionary point of view, repeats represent both opportunities and problems. We describe how repeats are created and how they can be found in genomes. We then focus on the functional and genomic consequences of repeats that dictate their fate.

15.
We recently described the presence of large chromosomal segments resulting from independent horizontal gene transfer (HGT) events in the genome of Saccharomyces cerevisiae strains, mostly of wine origin. We report here evidence for the amplification of one of these segments, a 17 kb DNA segment from Zygosaccharomyces bailii, in the genome of S. cerevisiae strains. The copy number, organization and location of this region differ considerably between strains, indicating that the insertions are independent and that they are post-HGT events. We identified eight different forms in 28 S. cerevisiae strains, mostly of wine origin, with up to four different copies in a single strain. The organization of these forms and the identification of an autonomously replicating sequence functional in S. cerevisiae, strongly suggest that an extrachromosomal circular DNA (eccDNA) molecule serves as an intermediate in the amplification of the Z. bailii region in yeast genomes. We found little or no sequence similarity at the breakpoint regions, suggesting that the insertions may be mediated by nonhomologous recombination. The diversity between these regions in S. cerevisiae represents roughly one third the divergence among the genomes of wine strains, which confirms the recent origin of this event, posterior to the start of wine strain expansion. This is the first report of a circle-based mechanism for the expansion of a DNA segment, mediated by nonhomologous recombination, in natural yeast populations.

16.
MOTIVATION: Recombination can be a prevailing drive in shaping genome evolution. RAT (Recombination Analysis Tool) is a Java-based tool for investigating recombination events in any number of aligned sequences (protein or DNA) of any length (short viral sequences to full genomes). It is an uncomplicated and intuitive application and allows the user to view only the regions of sequence alignments they are interested in. RESULTS: RAT was applied to viral sequences. Its utility was demonstrated through the detection of a known recombinant of HIV and a detailed analysis of Noroviruses, the most common cause of viral gastroenteritis in humans. AVAILABILITY: RAT, along with a user's guide, is freely available from http://jic-bioinfo.bbsrc.ac.uk/bioinformatics-research/staff/graham_etherington/RAT.htm.

17.
Although still poorly understood, the universal reverse complement symmetry in genomes may contain much information about the genome. In this article, under the hypothesis that recombination rate variations may be related to high-order DNA structure, we studied the association between local recombination rates and local symmetry levels in mouse, rat and human. We found significant negative correlations between recombination rates and reverse complement compositional symmetries in these three organisms. This negative correlation pattern also held at the level of individual chromosomes, when only the data from a single chromosome were analyzed.
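To make the idea of reverse complement compositional symmetry concrete, here is a minimal Python sketch. The abstract does not define the symmetry measure it used, so the index below (one minus the normalized difference between each k-mer count and the count of its reverse complement) and all function names are assumptions chosen for illustration only.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand, skipping ambiguous bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def rc_symmetry(seq, k=2):
    """Hypothetical symmetry index in [0, 1].

    1.0 means every k-mer occurs exactly as often as its reverse
    complement on the same strand; 0.0 means no k-mer's reverse
    complement occurs at all.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence too short or fully ambiguous")
    # Include reverse complements that are absent from the sequence,
    # so that each (k-mer, reverse complement) pair is counted twice.
    keys = set(counts) | {reverse_complement(kmer) for kmer in counts}
    diff = sum(abs(counts[kmer] - counts[reverse_complement(kmer)]) for kmer in keys)
    return 1.0 - diff / (2 * total)
```

Computing `rc_symmetry` in fixed-size windows along a chromosome would give local symmetry levels that could then be correlated with local recombination rate estimates, which is the kind of comparison the abstract describes.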

18.
Multipartite viruses contain more than one distinctive genome component, and they have been suggested to have evolved from a non-segmented wild-type virus. To explore whether recombination also plays a role in the evolution of the genomes of multipartite viruses, we developed a systematic approach that employs motif-finding tools to detect conserved motifs from divergent genomic regions and applies statistical approaches to select high-confidence motifs. The information that this approach provides helps us understand the evolution of viruses. In this study, we compared our motif-based strategy with current alignment-based recombination-detecting methods and applied our methods to the analysis of multipartite single-stranded plant DNA viruses, including bipartite begomoviruses, Banana bunchy top virus (BBTV) (consisting of 6 genome components) and Faba bean necrotic yellows virus (FBNYV) (consisting of 8 genome components). Our analysis revealed that recombination occurred between genome components in some begomoviruses, BBTV and FBNYV. Our data also show that several unusual recombination events have contributed to the evolution of BBTV genome components. We believe that similar approaches can be applied to resolve the evolutionary history of other viruses.

19.
For >20 years, the enigmatic behavior of plant mitochondrial genomes has been well described but not well understood. Chimeric genes appear, and occasionally are differentially replicated or expressed, with significant effects on plant phenotype, most notably on male fertility, yet the mechanisms of DNA replication, chimera formation, and recombination have remained elusive. Using mutations in two important genes of mitochondrial DNA metabolism, we have observed reproducible asymmetric recombination events occurring at specific locations in the mitochondrial genome. Based on these experiments and existing models of double-strand break repair, we propose a model for plant mitochondrial DNA replication, chimeric gene formation, and the illegitimate recombination events that lead to stoichiometric changes. We also address the physiological and developmental effects of aberrant events in mitochondrial genome maintenance, showing that mitochondrial genome rearrangements, when controlled, influence plant reproduction, but when uncontrolled, lead to aberrant growth phenotypes and dramatic reduction of the cell cycle.

20.
Live bacteria and archaea have been isolated from several rock salt deposits of up to hundreds of millions of years of age from all around the world. A key factor affecting their longevity is the ability to keep their genomic DNA intact, for which efficient repair mechanisms are needed. Polyploid microbes are known to have an increased resistance towards mutations and DNA damage, and it has been suggested that microbes from deeply buried rock salt would carry several copies of their genomes. Here, cultivable halophilic microbes were isolated from a surface-sterilized middle-late Eocene (38–41 million years ago) rock salt sample, drilled from a depth of 800 m at Yunying salt mine, China. Eight unique isolates were obtained, which represented two haloarchaeal genera, Halobacterium and Halolamina. We used real-time PCR to show that our isolates are polyploid, with genome copy numbers of 11–14 genomes per cell in exponential growth phase. The ploidy level was slightly downregulated in stationary growth phase, but the cells still had an average genome copy number of 6–8. The polyploidy of halophilic archaea living in ancient rock salt might be a factor explaining how these organisms are able to overcome the challenge of prolonged survival during their entombment.

