Similar literature
 Found 20 similar articles (search time: 37 ms)
1.
Comparative genomic approaches are useful for identifying molecular differences between organisms. Currently available methods fail to identify small changes in genomes, such as the expansion of short repetitive motifs, and to analyse divergent sequences. In this report, we describe an anchor-based whole genome comparison (ABWGC) method. ABWGC is based on random sampling of anchor sequences from one genome, followed by analysis of the sampled regions and their homologous regions in the target genome. The method was applied to compare two strains of Mycobacterium tuberculosis, CDC1551 and H37Rv. ABWGC identified a total of 104 indels, including 20 expansions of short repetitive sequences and five recombination events; 18 of these were previously unidentified genomic differences. ABWGC also identified 188 SNPs, including eight new ones. The method was also used to compare the M. tuberculosis H37Rv and M. avium genomes. Compared with MUMmer, a popular tool for comparative genomics, ABWGC correctly picked 1002 additional indels (size > 100 nt) between the two organisms. ABWGC also correctly identified repeat expansions and indels in a set of simulated sequences. The study also revealed an important role for small repeat expansions in the evolution of M. tuberculosis strains.
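The anchor idea in the abstract above can be illustrated with a minimal sketch. This is not the ABWGC implementation: the function names, the exact-match lookup, and the default anchor length are all assumptions chosen for illustration. It samples random anchors from one genome, keeps those that occur exactly once in the target, and flags intervals where the spacing between neighbouring anchors differs between the two genomes.

```python
import random

def sample_anchors(genome, anchor_len, n_anchors, seed=0):
    """Return sorted (start, anchor) pairs sampled at random from `genome`."""
    max_start = len(genome) - anchor_len
    if max_start < 0:
        return []  # genome is shorter than one anchor
    rng = random.Random(seed)
    k = min(n_anchors, max_start + 1)
    starts = sorted(rng.sample(range(max_start + 1), k))
    return [(s, genome[s:s + anchor_len]) for s in starts]

def locate_unique(anchor, target):
    """Return the position of `anchor` in `target` if it occurs exactly once, else None."""
    first = target.find(anchor)
    if first == -1 or target.find(anchor, first + 1) != -1:
        return None
    return first

def candidate_indels(reference, target, anchor_len=12, n_anchors=50, seed=0):
    """Flag (ref_start, ref_end, size_change) between consecutive mapped anchors.

    size_change > 0 suggests extra sequence in the target (insertion or repeat
    expansion); size_change < 0 suggests a deletion in the target.
    """
    mapped = []
    for ref_pos, anchor in sample_anchors(reference, anchor_len, n_anchors, seed):
        tgt_pos = locate_unique(anchor, target)
        if tgt_pos is not None:
            mapped.append((ref_pos, tgt_pos))
    events = []
    for (r1, t1), (r2, t2) in zip(mapped, mapped[1:]):
        delta = (t2 - t1) - (r2 - r1)
        # Skip anchor pairs whose order is inverted in the target (rearrangements).
        if delta != 0 and t2 > t1:
            events.append((r1, r2, delta))
    return events
```

A real comparison would replace the exact string search with an alignment step so that divergent sequences can still be anchored; the sketch only shows why anchor spacing exposes small expansions that whole-genome aligners may overlook.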

2.

Background

Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and many repeated sequences that frequently undergo intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant worldwide. Sequencing the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes.

Methodology/Principal Findings

We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species with both mt and chloroplast genome sequences available, we checked the locations of cp-like migrated sequences and found several fragments closely linked with mitochondrial genes.

Conclusion

The G. hirsutum mt genome possesses most of the common characteristics of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and of genic content among plant mt genomes, suggests that the evolution of mt genomes is consistent with plant taxonomy but independent among different species.

3.
The availability of nearly complete moso bamboo genome sequences permits the detailed discovery and cross-species comparison of transposable elements (TEs) between Bambusoideae and other Poaceae species at the whole-genome level. Long terminal repeat retroelements (LTR-retroelements) are the single largest component of most plant genomes and can substantially impact the genome in various ways. Through a combination of structure- and homology-based approaches, we initially investigated 982 LTR-retroelement families comprising 2,004,644 LTR-retroelement sequences, which accounted for more than 40% of the moso bamboo genome. Further analysis revealed that the ratio of solo LTRs to intact elements (S/I) in moso bamboo is significantly low (approximately 0.28:1), indicating that bamboo LTR-retroelements might have undergone relatively low frequencies of unequal recombination and illegitimate recombination. Phylogenetic analysis revealed four Ty1-copia and five Ty3-gypsy evolutionary lineages that were present before the divergence of eudicot and monocot species, but the scales and timeframes within which they proliferated varied significantly across families and lineages. Insertion time estimates showed that LTR-retroelements were amplified for approximately 0–3 million years and had longer periods of activity than those of rice and Arabidopsis. These findings suggest that the expansion of LTR-retroelements might be responsible for the large genome size of the host during moso bamboo evolution.

4.
Large indels greatly impact observable phenotypes in different organisms, including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in 1047 Arabidopsis whole-genome sequencing datasets. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous AthCNV dataset (1). We captured nearly twice as many ground truth deletions and on average 27% more ground truth duplications than AthCNV, although our dataset contains fewer large indels than AthCNV. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms. Some of them could affect trait performance. For instance, a deletion-based genome-wide association study (DEL-GWAS) showed that accessions containing a 182-bp deletion in AT1G11520 had delayed flowering time, and all accessions in north Sweden carried this 182-bp deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. By SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time. This example demonstrates that existing large indel datasets miss phenotypic variation and that our large indel dataset fills this gap.
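To make the linkage disequilibrium statement above concrete, the following sketch computes the standard r² statistic between a deletion and nearby SNPs. It is only an illustration under stated assumptions: alleles are coded 0/1 per accession (reasonable for largely homozygous, inbred accessions), the function names are hypothetical, and the abstract does not say which LD threshold or window the authors used.

```python
def r_squared(del_alleles, snp_alleles):
    """LD r^2 between two biallelic loci coded 0/1 across the same samples."""
    if len(del_alleles) != len(snp_alleles) or not del_alleles:
        raise ValueError("allele vectors must be non-empty and of equal length")
    n = len(del_alleles)
    p_a = sum(del_alleles) / n  # frequency of the deletion allele
    p_b = sum(snp_alleles) / n  # frequency of the SNP alternate allele
    p_ab = sum(a * b for a, b in zip(del_alleles, snp_alleles)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # a monomorphic locus has no defined r^2
    d = p_ab - p_a * p_b  # coefficient of disequilibrium D
    return d * d / denom

def max_ld_with_neighbours(del_alleles, snp_matrix):
    """Highest r^2 between one deletion and each SNP in `snp_matrix`."""
    values = [r_squared(del_alleles, snp) for snp in snp_matrix]
    values = [v for v in values if v == v]  # drop NaN entries
    return max(values) if values else float("nan")
```

A deletion whose highest r² with every neighbouring SNP stays small is poorly tagged by those SNPs, which is why a SNP-only GWAS can miss an association that a deletion-based GWAS detects.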

5.

Background

Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp.

Results

We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis of the mitochondrial genomes of pepper and tobacco, both members of the Solanaceae, revealed extensive DNA rearrangements and poor conservation of non-coding DNA. In the comparison between pepper lines, the FS4401 and Jeju mitochondrial DNAs contained the same complement of protein-coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, a large number of small sequence segments that were absent from, or found at different locations in, the Jeju mitochondrial genome were combined together. The incorporation of repeats and the overlap of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in the evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes.

Conclusions

Although a large portion of sequence context was shared by the mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-561) contains supplementary material, which is available to authorized users.

6.
Analysis of the evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. For the species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce the mechanisms of genome rearrangement that formed each of them. We first compared strains N315 and Mu50, one of the most closely related strain pairs, at single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and on type I restriction-modification (RM) genes on the genomic islands. We then reconstructed the rearrangement events responsible for these polymorphisms, in the paralogous genes and elsewhere, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer the sequences for homologous recombination that generated the change in repeat number. These sequences were conserved among the repeated paralogous units, likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed the importance of recombination involving long conserved sequences in the evolution of paralogous genes in the genome.

7.
8.

Background and Aims

It is known that the miniature inverted-repeat transposable element (MITE) preferentially inserts into low-copy-number sequences or genic regions. Characterization of the second largest subunit of low-copy nuclear RNA polymerase II (RPB2) has indicated that MITEs and indels have shaped the homoeologous RPB2 loci in the St and H genomes of Elymus species in Triticeae. The aims of this study were to determine whether there is a MITE in the RPB2 gene in Hordeum genomes, and to compare the gene evolution of RPB2 with that in other diploid Triticeae species. The sequences were used to reconstruct the phylogeny of the genus Hordeum.

Methods

RPB2 regions from all diploid species of Hordeum, one tetraploid species (H. brevisubulatum) and ten accessions of diploid Triticeae species were amplified and sequenced. Parsimony analysis of the DNA dataset was performed in order to reveal the phylogeny of Hordeum species.

Key Results

A MITE was detected in the Xu genome. A 27–36 bp indel sequence was found in the I and Xu genomes, but was deleted in the Xa and some H genome species. Interestingly, the indel length in H genomes corresponds well to their geographical distribution. Phylogenetic analysis of the RPB2 sequences positioned the H and Xa genomes in one monophyletic group. The I and Xu genomes are distinctly separated from the H and Xa ones. The RPB2 data also separated all New World H genome species except H. patagonicum ssp. patagonicum from the Old World H genome species.

Conclusions

MITE and large indels have shaped the RPB2 loci between the Xu and H, I and Xa genomes. The phylogenetic analysis of the RPB2 sequences confirmed the monophyly of Hordeum. The maximum-parsimony analysis demonstrated the four genomes to be subdivided into two groups.

Key words: Molecular evolution, RPB2, Hordeum, transposable element, phylogeny

9.

Background

Francisella tularensis subspecies tularensis and holarctica are pathogenic to humans, whereas the two other subspecies, novicida and mediasiatica, rarely cause disease. To uncover the factors that allow subspecies tularensis and holarctica to be pathogenic to humans, we compared their genome sequences with the genome sequence of Francisella tularensis subspecies novicida U112, which is nonpathogenic to humans.

Results

Comparison of the genomes of human pathogenic Francisella strains with the genome of U112 identifies genes specific to the human pathogenic strains and reveals pseudogenes that previously were unidentified. In addition, this analysis provides a coarse chronology of the evolutionary events that took place during the emergence of the human pathogenic strains. Genomic rearrangements at the level of insertion sequences (IS elements), point mutations, and small indels took place in the human pathogenic strains during and after differentiation from the nonpathogenic strain, resulting in gene inactivation.

Conclusion

The chronology of events suggests a substantial role for genetic drift in the formation of pseudogenes in Francisella genomes. Mutations that occurred early in the evolution, however, might have been fixed in the population either because of evolutionary bottlenecks or because they were pathoadaptive (beneficial in the context of infection). Because the structure of Francisella genomes is similar to that of the genomes of other emerging or highly pathogenic bacteria, this evolutionary scenario may be shared by pathogens from other species.

10.

Background

Streptomyces are widespread bacteria that contribute to the terrestrial carbon cycle and produce the majority of clinically useful antibiotics. While interspecific genomic diversity has been investigated among Streptomyces, information is lacking on intraspecific genomic diversity. Streptomyces pratensis has high rates of homologous recombination but the impact of such gene exchange on genome evolution and the evolution of natural product gene clusters remains uncharacterized.

Results

We report draft genome sequences of four S. pratensis strains and compare them to the complete genome of Streptomyces flavogriseus IAF-45-CD (=ATCC 33331), a strain recently reclassified to S. pratensis. Despite disparate geographic origins, the genomes are highly similar, with 85.9% of genes present in the core genome and conservation of all natural product gene clusters. Natural products include a novel combination of carbapenem and beta-lactamase inhibitor gene clusters. While high intraspecies recombination rates abolish the phylogenetic signal across the genome, intraspecies recombination is suppressed in two genomic regions. The first region is centered on an insertion/deletion polymorphism and the second on a hybrid NRPS-PKS gene. Finally, two gene families accounted for over 25% of the divergent genes in the core genome. The first includes homologs of bldB (required for spore development and antibiotic production) while the second includes homologs of an uncharacterized protein with a helix-turn-helix motif (hpb). Genes from these families co-occur, with fifteen pairs spread across the genome. These genes show evidence of co-evolution of co-localized pairs, supporting previous assertions that these genes may function akin to a toxin-antitoxin system.

Conclusions

S. pratensis genomes are highly similar, with exceptional levels of recombination that erase the phylogenetic signal among strains of the species. This species has a large core genome and variable terminal regions that are smaller than those found in interspecies comparisons. There is no geographic differentiation between these strains, but there is evidence for local linkage disequilibrium affecting two genomic regions. We have also shown further observational evidence that DUF397-HTH (bldB and hpb) is a novel toxin-antitoxin pair.

11.

Key message

This is the first clear evidence of duplication and/or triplication of large chromosomal regions in a genome of a Genistoid legume, the most basal clade of Papilionoid legumes.

Abstract

Lupinus angustifolius L. (narrow-leafed lupin) is the most widely cultivated species of Genistoid legume, grown for its high-protein grain. As a member of this most basal clade of Papilionoid legumes, L. angustifolius serves as a useful model for exploring legume genome evolution. Here, we report an improved reference genetic map of L. angustifolius comprising 1207 loci, including 299 newly developed Diversity Arrays Technology markers and 54 new gene-based PCR markers. A comparison between the L. angustifolius and Medicago truncatula genomes was performed using 394 sequence-tagged site markers acting as bridging points between the two genomes. The improved L. angustifolius genetic map, the updated M. truncatula genome assembly and the increased number of bridging points between the genomes together substantially enhanced the resolution of synteny and chromosomal colinearity between these genomes compared to previous reports. While a high degree of syntenic fragmentation was observed that was consistent with the large evolutionary distance between the L. angustifolius and M. truncatula genomes, there were striking examples of conserved colinearity of loci between these genomes. Compelling evidence was found of large-scale duplication and/or triplication in the L. angustifolius genome, consistent with one or more ancestral polyploidy events.

12.
13.

Background

Trypanosoma cruzi is the causal agent of Chagas disease. Recently, the genomes of representative strains from two major evolutionary lineages were sequenced, allowing the construction of a detailed genetic diversity map for this important parasite. However, this map is focused on coding regions of the genome, leaving a vast space of regulatory regions uncharacterized in terms of their evolutionary conservation and/or divergence.

Methodology

Using data from the hybrid CL Brener and Sylvio X10 genomes (from the TcVI and TcI Discrete Typing Units, respectively), we identified intergenic regions that share a common evolutionary ancestry, and are present in both CL Brener haplotypes (TcII-like and TcIII-like) and in the TcI genome; as well as intergenic regions that were conserved in only two of the three genomes/haplotypes analyzed. The genetic diversity in these regions was characterized in terms of the accumulation of indels and nucleotide changes.

Principal Findings

Based on this analysis we have identified i) a core of highly conserved intergenic regions, which remained essentially unchanged in independently evolving lineages; ii) intergenic regions that show high diversity in spite of still retaining their corresponding upstream and downstream coding sequences; iii) a number of defined sequence motifs that are shared by several unrelated intergenic regions. A fraction of the indels explains the diversification of some intergenic regions by the expansion/contraction of microsatellite-like repeats.

14.
A large number of wheat (Triticum aestivum) and barley (Hordeum vulgare) varieties have evolved in agricultural ecosystems since domestication. Because of the large, repetitive genomes of these Triticeae crops, sequence information is limited and molecular differences between modern varieties are poorly understood. To study intraspecies genomic diversity, we compared large genomic sequences at the Lr34 locus of the wheat varieties Chinese Spring, Renan, and Glenlea, and diploid wheat Aegilops tauschii. Additionally, we compared the barley loci Vrs1 and Rym4 of the varieties Morex, Cebada Capa, and Haruna Nijo. Molecular dating showed that the wheat D genome haplotypes diverged only a few thousand years ago, while some barley and Ae. tauschii haplotypes diverged more than 500,000 years ago. This suggests gene flow from wild barley relatives after domestication, whereas this was rare or absent in the D genome of hexaploid wheat. In some segments, the compared haplotypes were very similar to each other, but for two varieties each at the Rym4 and Lr34 loci, sequence conservation showed a breakpoint that separates a highly conserved from a less conserved segment. We interpret this as recombination breakpoints of two ancient haplotypes, indicating that the Triticeae genomes are a heterogeneous and variable mosaic of haplotype fragments. Analysis of insertions and deletions showed that large events caused by transposable element insertions, illegitimate recombination, or unequal crossing over were relatively rare. Most insertions and deletions were small and caused by template slippage in short homopolymers of only a few base pairs in size. Such frequent polymorphisms could be exploited for future molecular marker development.

15.

Background

Multipartite mitochondrial genomes are very rare in animals but have been found previously in two insect orders with highly rearranged genomes, the Phthiraptera (parasitic lice), and the Psocoptera (booklice/barklice).

Results

We provide the first report of a multipartite mitochondrial genome architecture in a third order with highly rearranged genomes: Thysanoptera (thrips). We sequenced the complete mitochondrial genomes of two divergent members of the Scirtothrips dorsalis cryptic species complex. The East Asia 1 species has the single circular chromosome common to animals while the South Asia 1 species has a genome consisting of two circular chromosomes. The fragmented South Asia 1 genome exhibits extreme chromosome size asymmetry with the majority of genes on the large, 14.28 kb, chromosome and only nad6 and trnC on the 0.92 kb mini-circle chromosome. This genome also features paralogous control regions with high similarity suggesting a very recent origin of the nad6 mini-circle chromosome in the South Asia 1 cryptic species.

Conclusions

Thysanoptera, along with the other minor paraneopteran insect orders, should be considered models for rapid mitochondrial genome evolution, including fragmentation. Continued use of these models will facilitate a greater understanding of recombination and other mitochondrial genome evolutionary processes across eukaryotes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1672-4) contains supplementary material, which is available to authorized users.

16.
17.

Background

Knowledge of the origins, distribution, and inheritance of variation in the malaria parasite (Plasmodium falciparum) genome is crucial for understanding its evolution; however, the 81% (A+T) genome poses challenges to high-throughput sequencing technologies. We explore the viability of the Roche 454 Genome Sequencer FLX (GS FLX) high-throughput sequencing technology for both whole genome sequencing and fine-resolution characterization of genetic exchange in malaria parasites.

Results

We present a scheme to survey recombination in the haploid stage genomes of two sibling parasite clones, using whole genome pyrosequencing that includes a sliding window approach to predict recombination breakpoints. Whole genome shotgun (WGS) sequencing generated approximately 2 million reads, with an average read length of approximately 300 bp. De novo assembly using a combination of WGS and 3 kb paired end libraries resulted in contigs ≤ 34 kb. More than 8,000 of the 24,599 SNP markers identified between parents were genotyped in the progeny, resulting in a marker density of approximately 1 marker/3.3 kb and allowing for the detection of previously unrecognized crossovers (COs) and many non crossover (NCO) gene conversions throughout the genome.

Conclusions

By sequencing the 23 Mb genomes of two haploid progeny clones derived from a genetic cross at more than 30× coverage, we captured high resolution information on COs, NCOs and genetic variation within the progeny genomes. This study is the first to resequence progeny clones to examine fine structure of COs and NCOs in malaria parasites.
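The sliding window approach to recombination breakpoints mentioned in the Results of this abstract can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: each SNP marker in a haploid progeny clone is assumed to be already labelled with its parent of origin ('A' or 'B'), the window size is arbitrary, and the function names are hypothetical.

```python
from collections import Counter

def window_calls(genotypes, window=5):
    """Majority parental origin in each window of consecutive markers.

    Assumes every entry of `genotypes` is 'A' or 'B'; an odd window then
    guarantees an unambiguous majority.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    calls = []
    for i in range(len(genotypes) - window + 1):
        counts = Counter(genotypes[i:i + window])
        calls.append(counts.most_common(1)[0][0])
    return calls

def breakpoints(positions, genotypes, window=5):
    """Return (left_pos, right_pos) intervals where the windowed call switches parent."""
    calls = window_calls(genotypes, window)
    half = window // 2
    found = []
    for i in range(1, len(calls)):
        if calls[i] != calls[i - 1]:
            # Window i-1 is centred on marker i-1+half, window i on marker i+half.
            found.append((positions[i - 1 + half], positions[i + half]))
    return found
```

Majority voting suppresses isolated miscalled markers, but it also hides conversion tracts shorter than about half the window, so short non-crossover events would need a separate, finer-grained pass over the markers.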

18.
19.
To explore the mitochondrial genes of the Cruciferae family, the mitochondrial genome of Raphanus sativus (sat) was sequenced and annotated. The circular mitochondrial genome of sat is 239,723 bp and includes 33 protein-coding genes, three rRNA genes and 17 tRNA genes. The mitochondrial genome also contains a pair of large repeat sequences 5.9 kb in length, which may mediate genome reorganization into two sub-genomic circles, with predicted sizes of 124.8 kb and 115.0 kb, respectively. Furthermore, gene evolution of mitochondrial genomes within the Cruciferae family was analyzed using the sat mitochondrial type (mitotype), together with six other reported mitotypes. The cruciferous mitochondrial genomes have maintained almost the same set of functional genes. Compared with Cycas taitungensis (a representative gymnosperm), the mitochondrial genomes of the Cruciferae have lost nine protein-coding genes and seven mitochondrial-like tRNA genes, but acquired six chloroplast-like tRNAs. Among the Cruciferae, to maintain the same set of genes that are necessary for mitochondrial function, the exons of the genes have changed at the lowest rates, as indicated by the numbers of single nucleotide polymorphisms. The open reading frames (ORFs) of unknown function in the cruciferous genomes are not conserved. Evolutionary events, such as mutations, genome reorganizations and sequence insertions or deletions (indels), have resulted in the non-conserved ORFs in the cruciferous mitochondrial genomes, which are becoming significantly different among mitotypes. This work represents the first phylogenetic explanation of the evolution of genes of known function in the Cruciferae family. It revealed significant variation in ORFs and the causes of such variation.

20.
Bov-A2 is a retroposon that is widely distributed among the genomes of ruminants (e.g., cow, deer, giraffe, pronghorn, musk deer, and chevrotain). This retroposon is composed of two monomers, called Bov-A units, which are joined by a linker sequence. The structure and origin of Bov-A2 have been well characterized, but a genome-level exploration of this retroposon has not been carried out. In this study we performed an extensive search for Bov-A2 using all available genome sequence data on Bos taurus. We found unique Bov-A2-derived sequences that were longer than Bov-A2 due to amplification of three to six Bov-A units arranged in tandem. Detailed analysis of these elongated Bov-A2-derived sequences revealed that they originated through unequal crossing-over of Bov-A2. We found a large number of these elongated Bov-A2-derived sequences in cattle genomes, indicating that unequal crossing-over of Bov-A2 occurred very frequently. We found that this type of elongation is not observed in wild bovines and is therefore specific to the domesticated cattle genome. Furthermore, at specific loci, the number of Bov-A units was also polymorphic between alleles, implying that the elongation of Bov-A units might have occurred very recently. For these reasons, we speculate that genomic instability in bovine genomes can lead to extensive unequal crossing-over of Bov-A2 and that levels of polymorphism might be generated in part by repeated outbreeding.
