
1.

Efficient and accurate whole genome assembly and methylome profiling of E. coli

Jason G Powers Victor J Weigman Jenny Shu John M Pufky Donald Cox Patrick Hurban 《BMC genomics》2013,14(1)

Background

With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies results in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets alone or in combination to determine the best methods for the assembly of bacterial genomes.

Results

Three E. coli strains – BL21(DE3), Bal225, and DH5α – were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome [GenBank:AM946981.2] allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short read only assemblies, while N50 numbers increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies was less than half that seen with short read assemblies (~20 SNPs vs. ~50 SNPs) and indels also saw dramatic reductions (~2 indels >5 bp in PacBio-only assemblies vs. ~12 for short-read only assemblies). Assemblies that used a mixture of PacBio and short read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing.
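For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (hypothetical helper)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the assembly size.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Illustrative lengths only; not data from the study.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))
```

Fewer, longer contigs raise this value, which is why a drop in contig count and a rise in N50 tend to be reported together.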

Conclusion

Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-675) contains supplementary material, which is available to authorized users.

2.

Comparative Chloroplast Genomes of Camellia Species

Jun-Bo Yang Shi-Xiong Yang Hong-Tao Li Jing Yang De-Zhu Li 《PloS one》2013,8(8)

Background

Camellia, comprising more than 200 species, is a valuable economic commodity due to its enormously popular commercial products: tea leaves, flowers, and high-quality edible oils. It is the largest and most important genus in the family Theaceae. However, phylogenetic resolution of the species has proven to be difficult. Consequently, the interspecies relationships of the genus Camellia are still hotly debated. Phylogenomics is an attractive avenue that can be used to reconstruct the tree of life, especially at low taxonomic levels.

Methodology/Principal Findings

Seven complete chloroplast (cp) genomes were sequenced from six species representing different subdivisions of the genus Camellia using Illumina sequencing technology. Four junctions between the single-copy segments and the inverted repeats were confirmed and genome assemblies were validated by PCR-based product sequencing using 123 pairs of primers covering preliminary cp genome assemblies. The Camellia cp genome was found to be about 157 kb long and to contain 123 unique genes, with 23 duplicated in the IR regions. We determined that the complete Camellia cp genome was relatively well conserved, but contained enough genetic differences to provide useful phylogenetic information. Phylogenetic relationships were analyzed using seven complete cp genomes of six Camellia species. We also identified rapidly evolving regions of the cp genome that have the potential to be used for further species identification and phylogenetic resolution.

Conclusions/Significance

In this study, we wanted to determine if analyzing completely sequenced cp genomes could help settle these controversies of interspecies relationships in Camellia. The results demonstrate that cp genome data are beneficial in resolving species definition because they indicate that organelle-based “barcodes” can be established for a species and then used to unmask interspecies phylogenetic relationships. They also reveal that phylogenomics based on cp genomes is an effective approach for achieving phylogenetic resolution between Camellia species.

3.

Prokaryotic photosynthesis and phototrophy illuminated (total citations: 1; self-citations: 0; citations by others: 1)

Bryant DA Frigaard NU 《Trends in microbiology》2006,14(11):488-496

Genome sequencing projects are revealing new information about the distribution and evolution of photosynthesis and phototrophy. Although coverage of the five phyla containing photosynthetic prokaryotes (Chlorobi, Chloroflexi, Cyanobacteria, Proteobacteria and Firmicutes) is limited and uneven, genome sequences are (or soon will be) available for >100 strains from these phyla. Present knowledge of photosynthesis is almost exclusively based on data derived from cultivated species but metagenomic studies can reveal new organisms with novel combinations of photosynthetic and phototrophic components that have not yet been described. Metagenomics has already shown how the relatively simple phototrophy based upon rhodopsins has spread laterally throughout Archaea, Bacteria and eukaryotes. In this review, we present examples that reflect recent advances in phototroph biology as a result of insights from genome and metagenome sequencing.

4.

Choosing a Benchtop Sequencing Machine to Characterise Helicobacter pylori Genomes

Timothy T. Perkins Chin Yen Tay Fanny Thirriot Barry Marshall 《PloS one》2013,8(6)

The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology; however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori.

5.

Ma-LMM01 infecting toxic Microcystis aeruginosa illuminates diverse cyanophage genome strategies (total citations: 1; self-citations: 0; citations by others: 1)

Yoshida T Nagasaki K Takashima Y Shirai Y Tomaru Y Takao Y Sakamoto S Hiroishi S Ogata H 《Journal of bacteriology》2008,190(5):1762-1772

Cyanobacteria and their phages are significant microbial components of the freshwater and marine environments. We identified a lytic phage, Ma-LMM01, infecting Microcystis aeruginosa, a cyanobacterium that forms toxic blooms on the surfaces of freshwater lakes. Here, we describe the first sequenced freshwater cyanomyovirus genome of Ma-LMM01. The linear, circularly permuted, and terminally redundant genome has 162,109 bp and contains 184 predicted protein-coding genes and two tRNA genes. The genome exhibits no colinearity with previously sequenced genomes of cyanomyoviruses or other Myoviridae. The majority of the predicted genes have no detectable homologues in the databases. These findings indicate that Ma-LMM01 is a member of a new lineage of the Myoviridae family. The genome lacks homologues for the photosynthetic genes that are prevalent in marine cyanophages. However, it has a homologue of nblA, which is essential for the degradation of the major cyanobacteria light-harvesting complex, the phycobilisomes. The genome codes for a site-specific recombinase and two prophage antirepressors, suggesting that it has the capacity to integrate into the host genome. Ma-LMM01 possesses six genes, including three coding for transposases, that are highly similar to homologues found in cyanobacteria, suggesting that recent gene transfers have occurred between Ma-LMM01 and its host. We propose that the Ma-LMM01 NblA homologue possibly reduces the absorption of excess light energy and confers benefits to the phage living in surface waters. This phage genome study suggests that light is central in the phage-cyanobacterium relationships where the viruses use diverse genetic strategies to control their host's photosynthesis.

6.

Outer membrane efflux protein (OMEP) is a suitable molecular marker for resolving the phylogeny and taxonomic status of closely related cyanobacteria

Dzhemal Moten Tsvetelina Batsalova Diyana Basheva Rumen Mladenov Balik Dzhambazov Ivanka Teneva 《Phycological Research》2018,66(1):31-36

Taxonomy of Cyanobacteria, the oldest phototrophic prokaryotes, has been problematic for many years due to their simple morphology, high variability and adaptability to diverse ecological niches. After introduction of the polyphasic approach, which is based on the combination of several criteria (molecular sequencing, morphological and ecological), the whole classification system of these organisms is subject to reorganization. The aim of this study was to evaluate whether the outer membrane efflux protein (OMEP) sequences can be used as a molecular marker for resolving the phylogeny and taxonomic status of closely related cyanobacteria. We have performed phylogenetic analyses based on the amino acid sequences of the OMEP and the DNA sequences of the 16S rRNA gene from 86 cyanobacterial species/strains with completely sequenced genomes. Phylogenetic trees based on the OMEP showed that most of the cyanobacterial species/strains belonging to different genera are clustered in separate clades supported by high bootstrap values. Comparing the OMEP trees with the 16S rDNA tree clearly showed that the OMEP is a more suitable marker for resolving phylogenetic relationships within Cyanobacteria at the generic and species level.

7.

En route to a genome-based classification of Archaea and Bacteria?

H.-P. Klenk M. Göker 《Systematic and applied microbiology》2010

Given the considerable promise whole-genome sequencing offers for phylogeny and classification, it is surprising that microbial systematics and genomics have not yet been reconciled. This might be due to the intrinsic difficulties in inferring reasonable phylogenies from genomic sequences, particularly in the light of the significant amount of lateral gene transfer in prokaryotic genomes. However, recent studies indicate that the species tree and the hierarchical classification based on it are still meaningful concepts, and that state-of-the-art phylogenetic inference methods are able to provide reliable estimates of the species tree to the benefit of taxonomy. Conversely, we suspect that the current lack of completely sequenced genomes for many of the major lineages of prokaryotes and for most type strains is a major obstacle in progress towards a genome-based classification of microorganisms. We conclude that phylogeny-driven microbial genome sequencing projects such as the Genomic Encyclopaedia of Archaea and Bacteria (GEBA) project are likely to rectify this situation.

8.

The Salmonella In Silico Typing Resource (SISTR): An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft Salmonella Genome Assemblies

Catherine E. Yoshida Peter Kruczkiewicz Chad R. Laing Erika J. Lingohr Victor P. J. Gannon John H. E. Nash Eduardo N. Taboada 《PloS one》2016,11(1)

For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.

9.

Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies

James F. Denton Jose Lugo-Martinez Abraham E. Tucker Daniel R. Schrider Wesley C. Warren Matthew W. Hahn 《PLoS computational biology》2014,10(12)

Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

10.

Lifestyle Transitions in Fusarioid Fungi are Frequent and Lack Clear Genomic Signatures

Rowena Hill Richard J.A. Buggs Dang Toan Vu Ester Gaya 《Molecular biology and evolution》2022,39(4)

The fungal genus Fusarium (Ascomycota) includes well-known plant pathogens that are implicated in diseases worldwide, and many of which have been genome sequenced. The genus also encompasses other diverse lifestyles, including species found ubiquitously as asymptomatic-plant inhabitants (endophytes). Here, we produced structurally annotated genome assemblies for five endophytic Fusarium strains, including the first whole-genome data for Fusarium chuoi. Phylogenomic reconstruction of Fusarium and closely related genera revealed multiple and frequent lifestyle transitions, the major exception being a monophyletic clade of mutualist insect symbionts. Differential codon usage bias and increased codon optimisation separated Fusarium sensu stricto from allied genera. We performed computational prediction of candidate secreted effector proteins (CSEPs) and carbohydrate-active enzymes (CAZymes), both likely to be involved in the host–fungal interaction, and sought evidence that their frequencies could predict lifestyle. However, phylogenetic distance described gene variance better than lifestyle did. There was no significant difference in CSEP, CAZyme, or gene repertoires between phytopathogenic and endophytic strains, although we did find some evidence that gene copy number variation may be contributing to pathogenicity. Large numbers of accessory CSEPs (i.e., present in more than one taxon but not all) and a comparatively low number of strain-specific CSEPs suggested there is a limited specialisation among plant associated Fusarium species. We also found half of the core genes to be under positive selection and identified specific CSEPs and CAZymes predicted to be positively selected on certain lineages. Our results depict fusarioid fungi as prolific generalists and highlight the difficulty in predicting pathogenic potential in the group.

11.

Molecular signatures for the main phyla of photosynthetic bacteria and their subgroups

Radhey S. Gupta 《Photosynthesis research》2010,104(2-3):357-372

The bacterial groups corresponding to different photosynthetic prokaryotes are presently identified mainly on the basis of their branching in phylogenetic trees. The availability of genome sequences is enabling identification of many molecular signatures that are specific for different groups of photosynthetic bacteria. Our recent work has identified large numbers of signatures consisting of conserved inserts or deletions (indels) in widely distributed proteins, as well as whole proteins that are specific for various sequenced species/strains from Cyanobacteria, Chlorobi, and Proteobacteria phyla. Based upon these signatures, it is now possible to identify/distinguish bacteria from these phyla of photosynthetic bacteria as well as their major subclades in clear molecular terms. The use of these signatures in conjunction with phylogenomic analyses, summarized here, is leading to a holistic picture concerning the branching order and evolutionary relationships among the above groups of photosynthetic bacteria. Although detailed studies in this regard have not yet been carried out on Chloroflexi and Heliobacteriaceae, we have identified some conserved indels that are specific for these groups. Some of the conserved indels for the photosynthetic bacteria are present in photosynthesis-related proteins. These include a 4 aa insert in the pyruvate flavodoxin/ferredoxin oxidoreductase that is specific for the genus Chloroflexus, a 2 aa insert in magnesium chelatase that is uniquely shared by all Cyanobacteria except the deepest branching Clade A (Gloeobacterales), a 6 aa insert in an A-type flavoprotein that is specific for various marine unicellular Cyanobacteria, a 2 aa insert in heme oxygenase that is specific for various Prochlorococcus strains/isolates, and a 1 aa deletion in the protein protochlorophyllide oxidoreductase that is commonly shared by various Prochlorococcus strains except the deepest branching isolates MIT 9303 and MIT 9313. The identified conserved indels (CSIs) are located in surface loops in the structures of these proteins, indicating that they may be important in mediating protein–protein interactions. The cellular functions of these conserved indels and of most of the signature proteins are presently unknown, but they provide valuable means for discovering novel properties that are unique to different groups of photosynthetic bacteria.

12.

A pangenomic study of Bacillus thuringiensis

Fang Y Li Z Liu J Shu C Wang X Zhang X Yu X Zhao D Liu G Hu S Zhang J Al-Mssallem I Yu J 《Journal of Genetics and Genomics》2011,38(12):567-576

Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates in both high coverage and base quality using the next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared to the pangenomes of its closely related species of the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5% and the hotspots for the genome variations were discovered in genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.

13.

SMRT sequencing only de novo assembly of the sugar beet (Beta vulgaris) chloroplast genome

Kai Bernd Stadermann Bernd Weisshaar Daniela Holtgräwe 《BMC bioinformatics》2015,16(1)

Background

Third generation sequencing methods, like SMRT (Single Molecule, Real-Time) sequencing developed by Pacific Biosciences, offer much longer read length in comparison to Next Generation Sequencing (NGS) methods. Hence, they are well suited for de novo- or re-sequencing projects. Sequences generated for these purposes will not only contain reads originating from the nuclear genome, but also a significant amount of reads originating from the organelles of the target organism. These reads are usually discarded but they can also be used for an assembly of organellar replicons. The long read length supports resolution of repetitive regions and repeats within the organellar genome, which might be problematic when just using short read data. Additionally, SMRT sequencing is less influenced by GC rich areas and by long stretches of the same base.

Results

We describe a workflow for a de novo assembly of the sugar beet (Beta vulgaris ssp. vulgaris) chloroplast genome sequence only based on data originating from a SMRT sequencing dataset targeted on its nuclear genome. We show that the data obtained from such an experiment are sufficient to create a high quality assembly with a higher reliability than assemblies derived from e.g. Illumina reads only. The chloroplast genome is especially challenging for de novo assembling as it contains two large inverted repeat (IR) regions. We also describe some limitations that still apply even though long reads are used for the assembly.

Conclusions

SMRT sequencing reads extracted from a dataset created for nuclear genome (re)sequencing can be used to obtain a high quality de novo assembly of the chloroplast of the sequenced organism. Even with a relatively small overall coverage for the nuclear genome it is possible to collect more than enough reads to generate a high quality assembly that outperforms short read based assemblies. However, even with long reads it is not always possible to clarify the order of elements of a chloroplast genome sequence reliably, which we could demonstrate with Fosmid End Sequences (FES) generated with Sanger technology. Nevertheless, this limitation also applies to short read sequencing data but is reached in this case at a much earlier stage during finishing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0726-6) contains supplementary material, which is available to authorized users.

14.

A whole-genome assembly of the domestic cow, Bos taurus (total citations: 4; self-citations: 0; citations by others: 4)

Aleksey V Zimin Arthur L Delcher Liliana Florea David R Kelley Michael C Schatz Daniela Puiu Finnian Hanrahan Geo Pertea Curtis P Van Tassell Tad S Sonstegard Guillaume Marçais Michael Roberts Poorani Subramanian James A Yorke Steven L Salzberg 《Genome biology》2009,10(4):R42-10

Background

The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods.

Results

We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions.

Conclusions

By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.

15.

Minimal absent words in four human genome assemblies

Garcia SP Pinho AJ 《PloS one》2011,6(12):e29344

Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we aim to contribute to the catalogue of human genomic variation by investigating the variation in number and content of minimal absent words within a species, using four human genome assemblies. We compare the reference human genome GRCh37 assembly, the HuRef assembly of the genome of Craig Venter, the NA12878 assembly from cell line GM12878, and the YH assembly of the genome of a Han Chinese individual. We find the variation in number and content of minimal absent words between assemblies more significant for large and very large minimal absent words, where the biases of sequencing and assembly methodologies become more pronounced. Moreover, we find generally greater similarity between the human genome assemblies sequenced with capillary-based technologies (GRCh37 and HuRef) than between the human genome assemblies sequenced with massively parallel technologies (NA12878 and YH). Finally, as expected, we find the overall variation in number and content of minimal absent words within a species to be generally smaller than the variation between species.
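The abstract above does not define its central term, so here is a small sketch under the standard definition: a word is a minimal absent word of a sequence if it does not occur in the sequence while every proper substring of it does. This brute-force version is only an illustration for short strings; it is not the authors' method, and the function name, parameters, and toy inputs are assumptions made for this example.

```python
def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Brute-force minimal absent words of seq up to max_len (illustrative only)."""
    n = len(seq)
    # Collect every substring of seq up to max_len, plus the empty word.
    present = {""}
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            present.add(seq[i:j])
    maws = set()
    for prefix in present:
        if len(prefix) >= max_len:
            continue
        for letter in alphabet:
            word = prefix + letter
            # word is absent, yet its longest proper prefix (prefix) and
            # longest proper suffix (word[1:]) both occur in seq, so every
            # shorter substring of word occurs as well.
            if word not in present and word[1:] in present:
                maws.add(word)
    return maws


# Toy input over a two-letter alphabet; not data from the study.
print(sorted(minimal_absent_words("AAAA", 5, alphabet="AC")))
```

Genome-scale computations rely on suffix-array or suffix-tree based algorithms rather than enumerating substrings as done here.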

16.

Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression

Dan Vanderpool Bui Quang Minh Robert Lanfear Daniel Hughes Shwetha Murali R. Alan Harris Muthuswamy Raveendran Donna M. Muzny Mark S. Hibbins Robert J. Williamson Richard A. Gibbs Kim C. Worley Jeffrey Rogers Matthew W. Hahn 《PLoS biology》2020,18(12)

Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here, we present new reference genome assemblies for 3 Old World monkey (OWM) species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.

Combining three newly sequenced primate genomes with other published genomes, this study adapts a little-known method for detecting ancient introgression to genome-scale data, revealing multiple previously unknown examples of hybridization between primate species.

17.

Genome assembly has a major impact on gene content: a comparison of annotation in two Bos taurus assemblies

Florea L Souvorov A Kalbfleisch TS Salzberg SL 《PloS one》2011,6(6):e21400

Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12–20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6–15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly.

18.

Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

Dessimoz C Zoller S Manousaki T Qiu H Meyer A Kuraku S 《Briefings in bioinformatics》2011,12(5):474-484

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

19.

The Primitive Thylakoid-Less Cyanobacterium Gloeobacter Is a Common Rock-Dwelling Organism

Jan Mareš Pavel Hrouzek Radek Kaňa Stefano Ventura Otakar Strunecky Jiří Komárek 《PloS one》2013,8(6)

Cyanobacteria are an ancient group of photosynthetic prokaryotes, which are significant in biogeochemical cycles. The most primitive among living cyanobacteria, Gloeobacter violaceus, shows a unique ancestral cell organization with a complete absence of inner membranes (thylakoids) and an uncommon structure of the photosynthetic apparatus. Numerous phylogenetic papers proved its basal position among all of the organisms and organelles capable of plant-like photosynthesis (i.e., cyanobacteria, chloroplasts of algae and plants). Hence, G. violaceus has become one of the key species in evolutionary study of photosynthetic life. It also numbers among the most widely used organisms in experimental photosynthesis research. Except for a few related culture isolates, there has been little data on the actual biology of Gloeobacter, being relegated to an “evolutionary curiosity” with an enigmatic identity. Here we show that members of the genus Gloeobacter probably are common rock-dwelling cyanobacteria. On the basis of morphological, ultrastructural, pigment, and phylogenetic comparisons of available Gloeobacter strains, as well as on the basis of three new independent isolates and historical type specimen, we have produced strong evidence as to the close relationship of Gloeobacter to a long known rock-dwelling cyanobacterial morphospecies Aphanothece caldariorum. Our results bring new clues to solving the 40 year old puzzle of the true biological identity of Gloeobacter violaceus, a model organism with a high value in several biological disciplines. A probable broader distribution of Gloeobacter in common wet-rock habitats worldwide is suggested by our data, and its ecological meaning is discussed taking into consideration the background of cyanobacterial evolution. We provide observations of previously unknown genetic variability and phenotypic plasticity, which we expect to be utilized by experimental and evolutionary researchers worldwide.

20.

Comparative genomics of green sulfur bacteria

Colin Davenport David W. Ussery Burkhard Tümmler 《Photosynthesis research》2010,104(2-3):137-152

Eleven completely sequenced Chlorobi genomes were compared in oligonucleotide usage, gene contents, and synteny. The green sulfur bacteria (GSB) are equipped with a core genome that sustains their anoxygenic phototrophic lifestyle by photosynthesis, sulfur oxidation, and CO₂ fixation. Whole-genome gene family and single gene sequence comparisons yielded similar phylogenetic trees of the sequenced chromosomes indicating a concerted vertical evolution of large gene sets. Chromosomal synteny of genes is not preserved in the phylum Chlorobi. The accessory genome is characterized by anomalous oligonucleotide usage and endows the strains with individual features for transport, secretion, cell wall, extracellular constituents, and a few elements of the biosynthetic apparatus. Giant genes are a peculiar feature of the genera Chlorobium and Prosthecochloris. The predicted proteins have a huge molecular weight of 10⁶, and are probably instrumental for the bacteria to generate their own intimate (micro)environment.