
1.

MetaSim: a sequencing simulator for genomics and metagenomics (total citations: 1; self-citations: 0; citations by others: 1)

Richter DC, Ott F, Auch AF, Schmid R, Huson DH. PLoS ONE 2008, 3(10): e3373

Background

The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase of environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets.

Methodology/Principal Findings

To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree.
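To make the idea of "designing a metagenome and collecting reads from it" concrete, the Python sketch below samples fixed-length reads from toy genomes in proportion to user-specified abundances and adds uniform substitution errors. It is not MetaSim's implementation: the function name `simulate_reads`, the uniform error model, and weighting each genome by abundance times genome length are assumptions made for brevity.

```python
import random

def simulate_reads(genomes, abundances, n_reads, read_len, error_rate=0.01, seed=0):
    """Hypothetical read simulator returning (source_name, read) pairs.

    genomes: dict name -> DNA sequence; abundances: dict name -> relative cell abundance.
    Assumption of this sketch: a genome is sampled with probability proportional to
    abundance * genome length, so longer genomes contribute more reads.
    """
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances[n] * len(genomes[n]) for n in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights)[0]
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)
        read = list(seq[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # uniform substitution error: replace with a different base
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((name, "".join(read)))
    return reads

if __name__ == "__main__":
    rng = random.Random(42)
    toy = {n: "".join(rng.choice("ACGT") for _ in range(2000)) for n in ("genome_A", "genome_B")}
    reads = simulate_reads(toy, {"genome_A": 3.0, "genome_B": 1.0}, n_reads=1000, read_len=100)
    # on average about 3/4 of the reads come from genome_A (equal lengths, 3:1 abundance)
    print(sum(1 for name, _ in reads if name == "genome_A"))
```

A real simulator would add platform-specific error profiles, paired reads, and coverage bias; this sketch only shows the composition-driven sampling step.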

Conclusions/Significance

MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.

2.

Metagenomic Sequencing of an In Vitro-Simulated Microbial Community

Jenna L. Morgan, Aaron E. Darling, Jonathan A. Eisen. PLoS ONE 2010, 5(4)

Background

Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that contain microbes, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which each organism's DNA is observed in reads generated via DNA sequencing.

Methodology/Principal Findings

We created mixtures of ten microbial species with known genome sequences. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism were observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol used.
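The comparison between observed reads and known cell numbers can be illustrated with simple arithmetic. The sketch below assumes one genome copy per cell, uses reads per base pair as a proxy for cell count, and reports each organism's observed share divided by the expected equal share 1/N (1.0 means no bias). The function name is hypothetical and the counts in the test are invented for illustration; this is not the study's analysis pipeline.

```python
def abundance_deviation(read_counts, genome_lengths):
    """Return per-organism ratio of observed to expected cell share.

    read_counts: dict organism -> number of mapped reads.
    genome_lengths: dict organism -> genome length in bp.
    Assumes one genome copy per cell, so reads per bp is proportional to cell count.
    With equal cell numbers, each of N organisms is expected to have share 1/N.
    """
    per_bp = {org: read_counts[org] / genome_lengths[org] for org in read_counts}
    total = sum(per_bp.values())
    expected = 1 / len(per_bp)
    return {org: (value / total) / expected for org, value in per_bp.items()}
```

Normalizing by genome length matters because, even with perfectly unbiased extraction, a genome twice as long yields roughly twice as many reads per cell.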

Conclusions/Significance

We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro-simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with different protocols are not suitable for comparative metagenomics.

3.

Metagenomic analysis of the Rhinopithecus bieti fecal microbiome reveals a broad diversity of bacterial and glycoside hydrolase profiles related to lignocellulose degradation

Bo Xu, Weijiang Xu, Junjun Li, Liming Dai, Caiyun Xiong, Xianghua Tang, Yunjuan Yang, Yuelin Mu, Junpei Zhou, Junmei Ding, Qian Wu, Zunxi Huang. BMC Genomics 2015, 16(1)

Background

The animal gastrointestinal tract contains a complex community of microbes, whose composition ultimately reflects the co-evolution of microorganisms with their animal host and the diet adopted by the host. Although the importance of gut microbiota of humans has been well demonstrated, there is a paucity of research regarding non-human primates (NHPs), especially herbivorous NHPs.

Results

In this study, 97,942 pyrosequencing reads generated from Rhinopithecus bieti fecal DNA extracts were analyzed to better understand the microbial diversity and functional capacity of the R. bieti gut microbiome. Taxonomic analysis of the metagenomic reads indicated that R. bieti fecal microbiomes were dominated by the phyla Firmicutes, Bacteroidetes, Proteobacteria and Actinobacteria. Comparative taxonomic analysis revealed that, relative to other animals, the R. bieti metagenome was characterized by an overrepresentation of bacteria from the phyla Fibrobacteres and Spirochaetes. The primary functional categories were associated mainly with protein, carbohydrate, amino acid, and DNA and RNA metabolism, as well as cofactors, cell wall and capsule, and membrane transport. Comparing the glycoside hydrolase profiles of R. bieti with those of other animals revealed that the R. bieti microbiome was most closely related to the cow rumen microbiome.

Conclusions

Metagenomic and functional analyses demonstrated that R. bieti harbors a broad diversity of bacteria and numerous glycoside hydrolases responsible for lignocellulosic biomass degradation, which might reflect adaptations to a diet rich in fibrous matter. These results contribute to the limited body of NHP metagenome studies and provide a unique genetic resource of microbial enzymes that degrade plant cell walls. However, future metagenomic studies of R. bieti are required to assess the effects of age, genetics, diet and environment on the composition and activity of its gut metagenome.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1378-7) contains supplementary material, which is available to authorized users.

4.

Statistical Approach of Functional Profiling for a Microbial Community

Lingling An, Naruekamol Pookhao, Hongmei Jiang, Jiannong Xu. PLoS ONE 2014, 9(9)

Background

Metagenomics is a relatively new but fast-growing field within environmental biology and the medical sciences. It enables researchers to understand the diversity of microbes and their functions, cooperation, and evolution in a particular ecosystem. Traditional methods in genomics and microbiology are not efficient at capturing the structure of the microbial community in an environment. High-throughput next-generation sequencing technologies are now a powerful driver of metagenomic studies. However, there is an urgent need for efficient statistical methods and computational algorithms that can rapidly analyze massive volumes of short-read metagenomic sequencing data and accurately detect the features and functions present in a microbial community. Although metagenome functions at the pathway or subsystem level have been investigated, few studies have focused on functional analysis at the lower levels of a hierarchical functional tree such as the SEED subsystem tree.

Results

A two-step statistical procedure (metaFunction) is proposed to detect all possible functional roles at the low level of the tree in a metagenomic sample or community. In the first step, a statistical mixture model based on gene codons estimates the abundances of the candidate functional roles while accounting for sequencing error. Because a gene can be involved in multiple biological processes, the second step adjusts the functional assignment using an error distribution. The performance of the procedure is evaluated through comprehensive simulation studies. Compared with existing methods for metagenomic functional analysis, the new approach assigns reads to functional roles more accurately, and is consequently also more accurate at more general levels of the hierarchy. The method is also applied to two real data sets.
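The abstract does not spell out the mixture model, but a standard way to estimate component abundances in any mixture is expectation-maximization (EM) over read-to-component likelihoods. The sketch below is a generic EM for mixture weights, not metaFunction's codon-based model; the likelihood matrix is assumed to be precomputed (for example from alignments scored with an error model), and the function name is hypothetical.

```python
def em_abundances(likelihoods, n_iter=500, tol=1e-10):
    """Generic EM estimate of mixture weights pi_k.

    likelihoods[i][k] = P(read i | component k), assumed precomputed.
    Returns a list of weights summing to 1.
    """
    n_comp = len(likelihoods[0])
    pi = [1.0 / n_comp] * n_comp  # start from uniform abundances
    for _ in range(n_iter):
        totals = [0.0] * n_comp
        for row in likelihoods:
            # E-step: posterior responsibility of each component for this read
            weighted = [p * lik for p, lik in zip(pi, row)]
            z = sum(weighted)
            if z == 0:
                continue  # read not explained by any component; ignore it
            for k in range(n_comp):
                totals[k] += weighted[k] / z
        n_used = sum(totals)
        if n_used == 0:
            return pi
        # M-step: new weight = mean responsibility over explained reads
        new_pi = [t / n_used for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi
```

Ambiguous reads are split in proportion to the current abundance estimates, so three unique reads for component 0, one for component 1, and four shared reads converge to weights of 0.75 and 0.25.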

Conclusions

metaFunction is a powerful tool for accurately profiling the functions present in a metagenomic sample.

5.

Next-Generation Phage Display: Integrating and Comparing Available Molecular Tools to Enable Cost-Effective High-Throughput Analysis

Emmanuel Dias-Neto, Diana N. Nunes, Ricardo J. Giordano, Jessica Sun, Gregory H. Botz, Kuan Yang, João C. Setubal, Renata Pasqualini, Wadih Arap. PLoS ONE 2009, 4(12)

Background

Combinatorial phage display has been used over the last 20 years to identify protein ligands and protein-protein interactions, uncovering relevant molecular recognition events. The rate-limiting steps of combinatorial phage display library selection are (i) counting transducing units and (ii) sequencing the encoded displayed ligands. Here, we adapted emerging genomic technologies to minimize these challenges.

Methodology/Principal Findings

We gained efficiency by applying, in tandem, real-time PCR for rapid quantification, which enables bacteria-free phage display library screening, and next-generation sequencing of phage DNA for large-scale ligand analysis, yielding a fully integrated set of high-throughput quantitative and analytical tools. The approach is far less labor-intensive and allows rigorous quantification. For medical applications, including selections in patients, it also advances quantitative distribution analysis and ligand identification of hundreds of thousands of targeted particles from patient-derived biopsy or autopsy samples collected over a longer timeframe after library administration. Additional advantages over current methods include increased sensitivity, lower variability, enhanced linearity, scalability, and accuracy at much lower cost. Sequences obtained by qPhage plus pyrosequencing were similar to a dataset produced from conventional Sanger-sequenced transducing units (TU), with no biases due to GC content, codon usage, or amino acid or peptide frequency. These tools allow phage display selection and ligand analysis at a >1,000-fold faster rate and reduce costs ∼250-fold for generating 10⁶ ligand sequences.
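Absolute quantification by real-time PCR is commonly done with a standard curve: threshold cycle (Ct) values from a dilution series of known copy number are fitted linearly against log10(copies), and unknown samples are read off the fitted line. The sketch below shows that generic calculation only; it is not the authors' qPhage protocol, the function names are hypothetical, and the dilution values in the test are synthetic.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of an unknown."""
    return 10 ** ((ct - intercept) / slope)

def efficiency(slope):
    """Amplification efficiency implied by the slope (1.0 = perfect doubling per cycle)."""
    return 10 ** (-1 / slope) - 1
```

A slope near -3.32 corresponds to roughly 100% efficiency, which is why it is a common sanity check before trusting copy-number estimates.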

Conclusions/Significance

Our analyses demonstrate that, while this approach correlates with traditional colony counting, it also allows much larger sampling, enabling a faster, less expensive, more accurate and more consistent analysis of phage enrichment. Overall, qPhage plus pyrosequencing is superior to TU counting plus Sanger sequencing and is proposed as the method of choice for a broad range of phage display applications in vitro, in cells, and in vivo.

6.

Construction of a dairy microbial genome catalog opens new perspectives for the metagenomic analysis of dairy fermented products

Mathieu Almeida, Agnès Hébert, Anne-Laure Abraham, Simon Rasmussen, Christophe Monnet, Nicolas Pons, Céline Delbès, Valentin Loux, Jean-Michel Batto, Pierre Leonard, Sean Kennedy, Stanislas Dusko Ehrlich, Mihai Pop, Marie-Christine Montel, Françoise Irlinger, Pierre Renault. BMC Genomics 2014, 15(1)

Background

Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety, and functional role of these communities in cheese making are still not well understood. Metagenomic analysis of these communities by high-throughput shotgun sequencing is a promising approach for characterizing their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned.

Results

We built a reference genome catalog suitable for short-read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products, belonging to 137 different species and 67 genera, and succeeded in reconstructing the draft genomes of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, which were still missing from public databases. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study.

Conclusions

Our study provides data that, combined with publicly available genome references, represent the most extensive catalog of cheese-associated bacteria to date. Using this extended dairy catalog, we revealed the presence in traditional cheeses of dominant microorganisms that were not deliberately inoculated, mainly Gram-negative bacteria such as Pseudoalteromonas haloplanktis and Psychrobacter immobilis, which may contribute to the characteristics of cheeses produced by traditional methods.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1101) contains supplementary material, which is available to authorized users.

7.

Accurate genome relative abundance estimation for closely related species in a metagenomic sample

Michael B Sohn, Lingling An, Naruekamol Pookhao, Qike Li. BMC Bioinformatics 2014, 15(1)

Background

Metagenomics has great potential to uncover previously unattainable information about microbial communities. An important prerequisite for such discoveries is accurate estimation of the composition of microbial communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, which limits their estimation accuracy to high ranks of the taxonomy tree.

Results

We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which uses the similarity among genomic sequences in addition to the results of an alignment tool. The method was comprehensively tested on various simulated benchmark datasets with microbial community structures of varying complexity. Compared with other available methods designed to estimate taxonomic composition at a relatively low taxonomic rank, TAEC showed greater accuracy in quantifying the genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets: an oral cavity dataset and a Crohn’s disease dataset. Our results agree with previous findings at higher ranks of the taxonomy tree and also provide accurate estimates of taxonomic composition at the species/strain level, narrowing down which species/strains deserve more attention in studies of the oral cavity and Crohn’s disease.

Conclusions

By taking into account the similarity among genomic sequences, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains are present in a metagenomic sample.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.

8.

Quality control of microbiota metagenomics by k-mer analysis

Florian Plaza Onate, Jean-Michel Batto, Catherine Juste, Jehane Fadlallah, Cyrielle Fougeroux, Doriane Gouas, Nicolas Pons, Sean Kennedy, Florence Levenez, Joel Dore, S Dusko Ehrlich, Guy Gorochov, Martin Larsen. BMC Genomics 2015, 16(1)

Background

The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.

Results

We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined metagenomes ranging from high to low quality. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contamination, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue, not only for the well-defined serial dilution datasets but also for the 52 human gut microbiota-derived metagenomic datasets.
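The abstract does not state which k-mer statistic was used. As a generic illustration, the sketch below counts overlapping k-mers across reads and builds the k-mer frequency spectrum (how many distinct k-mers occur once, twice, and so on), a common starting point for k-mer based quality checks. The function names and the choice of k are assumptions, and reverse complements are not merged.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count all overlapping k-mers across reads (reverse complements not merged)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers observed with that multiplicity."""
    return Counter(counts.values())
```

Many identical, low-complexity reads, such as adapter-derived sequences, would concentrate counts in a few k-mers and shift mass to the high-multiplicity end of the spectrum, which is the kind of shape change a k-mer based check can flag.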

Conclusions

We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.

9.

A high-throughput screening strategy for detecting CRISPR-Cas9 induced mutations using next-generation sequencing

Charles C Bell, Graham W Magor, Kevin R Gillinder, Andrew C Perkins. BMC Genomics 2014, 15(1)

Background

CRISPR-Cas9 is a revolutionary genome editing technique that allows for efficient and directed alterations of the eukaryotic genome. This relatively new technology has already been used in a large number of ‘loss of function’ experiments in cultured cells. Despite its simplicity and efficiency, screening for mutated clones remains time-consuming, laborious and/or expensive.

Results

Here we report a high-throughput screening strategy that uses next-generation sequencing to screen up to 96 clones in parallel. As a proof of principle, we used CRISPR-Cas9 to disrupt the coding sequence of the homeobox gene Evx1 in mouse embryonic stem cells. We screened 67 CRISPR-Cas9-transfected clones simultaneously by next-generation sequencing on the Ion Torrent PGM. We were able to identify homozygous and heterozygous Evx1 mutants, as well as mixed clones, which must be identified to maintain the integrity of subsequent experiments.
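Once reads are demultiplexed per clone, a clone can be classified from the fractions of reads carrying each allele at the target site. The sketch below is hypothetical: the function name, the 10% noise threshold, and the assumption of a diploid locus are illustrative choices, not the criteria used in the study.

```python
def call_clone(allele_counts, wt_allele="WT", min_fraction=0.1):
    """Classify a clone from per-allele read counts at the edited site.

    allele_counts: dict allele -> read count. Alleles below min_fraction of reads
    are treated as sequencing noise (assumed threshold). A diploid clone should
    carry at most two alleles, so three or more suggests a mixed population.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return "undetermined"
    present = {a for a, c in allele_counts.items() if c / total >= min_fraction}
    if not present:
        return "undetermined"
    if len(present) > 2:
        return "mixed"
    if wt_allele in present:
        return "wild type" if len(present) == 1 else "heterozygous"
    return "homozygous mutant" if len(present) == 1 else "biallelic mutant"
```

A stricter version would also require the two alleles of a heterozygous call to be near a 50:50 ratio, since a mixed clone can also show exactly two alleles at skewed fractions.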

Conclusions

Our CRISPR-Cas9 screening strategy could be widely applied to screen for CRISPR-Cas9 mutants in a variety of contexts, including the generation of mutant cell lines for in vitro research, the generation of transgenic organisms, and verifying the outcome of CRISPR-Cas9 homology-directed repair. This technique is cost- and time-effective, provides information on clonal heterogeneity, and is adaptable to various sequencing platforms.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1002) contains supplementary material, which is available to authorized users.

10.

CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Rachid Ounit, Steve Wanamaker, Timothy J Close, Stefano Lonardi. BMC Genomics 2015, 16(1)

11.

Census-based rapid and accurate metagenome taxonomic profiling

Amirhossein Shamsaddini, Yang Pan, W Evan Johnson, Konstantinos Krampis, Mariya Shcheglovitova, Vahan Simonyan, Amy Zanne, Raja Mazumder. BMC Genomics 2014, 15(1)

Background

Understanding the taxonomic composition of a sample, whether from a patient, food or the environment, is important for several types of studies, including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing cost of sequencing, metagenomic data are quickly becoming the preferred type of data for such analyses.

Results

Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) of a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a non-redundant nucleotide database such as the NCBI non-redundant nucleotide database (nt) is computationally intensive. We have developed a robust subsampling-based algorithm, implemented in a tool called CensuScope, that takes a ‘sneak peek’ into the population distribution and estimates taxonomic composition as if a census were taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI’s nt database. This process is repeated multiple times to identify the taxonomic composition found in the majority of iterations, thereby providing a robust estimate of the population and measures of the accuracy of the results.
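The census idea can be sketched independently of the alignment step: repeatedly draw a small random subsample of reads, classify each read, and keep the taxa that appear in a majority of iterations. In the sketch below, `classify` is a hypothetical stand-in for mapping a read against NCBI nt, and `census` is an illustrative name, not CensuScope's API. The defaults of 250 reads and 50 iterations mirror the recommendation quoted in the abstract.

```python
import random
from collections import Counter

def census(reads, classify, sample_size=250, iterations=50, seed=0):
    """Return (taxa seen in a majority of iterations, mean relative frequency per taxon).

    classify: function read -> taxon label (placeholder for a real database search).
    """
    rng = random.Random(seed)
    detected = Counter()   # number of iterations in which each taxon appeared
    freq_sums = Counter()  # summed per-iteration relative frequencies
    for _ in range(iterations):
        sample = rng.sample(reads, min(sample_size, len(reads)))
        labels = Counter(classify(r) for r in sample)
        for taxon, n in labels.items():
            detected[taxon] += 1
            freq_sums[taxon] += n / len(sample)
    majority = {t for t, n in detected.items() if n > iterations / 2}
    mean_freq = {t: freq_sums[t] / iterations for t in majority}
    return majority, mean_freq
```

Only `sample_size * iterations` reads are ever classified, which is where the speed-up over mapping every read comes from; very rare taxa are expected to drop out of the majority set.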

Conclusion

CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis we are able to provide some recommendations in terms of the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope. CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-918) contains supplementary material, which is available to authorized users.

12.

MicroRNA expression profiling of the fifth-instar posterior silk gland of Bombyx mori

Jisheng Li, Yimei Cai, Lupeng Ye, Shaohua Wang, Jiaqian Che, Zhengying You, Jun Yu, Boxiong Zhong. BMC Genomics 2014, 15(1)

Background

The growth and development of the posterior silk gland and the biosynthesis of the silk core protein at the fifth larval instar stage of Bombyx mori are of paramount importance for silk production.

Results

Here, aided by next-generation sequencing and a microarray assay, we profile 1,229 microRNAs (miRNAs), including 728 novel miRNAs and 110 miRNA/miRNA* duplexes, in the posterior silk gland at the fifth larval instar. Target gene prediction yields 14,222 unique target genes from 1,195 miRNAs. Functional categorization classifies the targets into complex pathways that include both cellular and metabolic processes, especially protein synthesis and processing.

Conclusion

The enrichment of target genes in the ribosome-related pathway indicates that miRNAs may directly regulate translation. Our findings pave a way for further functional elucidation of these miRNAs and their targets in silk production.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-410) contains supplementary material, which is available to authorized users.

13.

SASI-Seq: sample assurance Spike-Ins, and highly differentiating 384 barcoding for Illumina sequencing

Michael A Quail, Miriam Smith, David Jackson, Steven Leonard, Thomas Skelly, Harold P Swerdlow, Yong Gu, Peter Ellis. BMC Genomics 2014, 15(1)

Background

A minor but significant fraction of samples subjected to next-generation sequencing are either mixed up or cross-contaminated. These events can lead to false or inconclusive results. We have therefore developed SASI-Seq, a process in which a set of uniquely barcoded DNA fragments is added to samples destined for sequencing. From the final sequencing data, one can verify that all reads derive from the original sample(s) and not from contaminants or other samples.

Results

By adding a mixture of three uniquely barcoded amplicons, of different sizes spanning the range of insert sizes normally used for Illumina sequencing, at a spike-in level of approximately 0.1%, we demonstrate that these fragments remain intimately associated with the sample. They can be detected even after the tightest size selection regimes or exome enrichment, and can report the occurrence of sample mix-ups and cross-contamination. As a consequence of this work, we have designed a set of 384 eleven-base Illumina barcode sequences that are at least 5 changes apart from each other, allowing single-error correction and very low levels of barcode misallocation due to sequencing error.
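The barcode property described above can be checked directly. The sketch below assumes that "changes" means Hamming distance (substitutions only; the abstract does not say whether insertions or deletions are counted), and its function names are hypothetical. With a minimum pairwise distance of 5, an observed barcode within one substitution of a designed barcode is at least 4 substitutions from every other one, so single-error correction is unambiguous; in principle, distance 5 would even allow correcting two substitutions.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two designed barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def assign_barcode(observed, barcodes, max_errors=1):
    """Return the unique designed barcode within max_errors substitutions, else None."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_errors]
    return hits[0] if len(hits) == 1 else None
```

Reads whose barcode is not within the allowed distance of exactly one designed barcode are left unassigned rather than guessed, which keeps misallocation low.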

Conclusion

SASI-Seq is a simple, inexpensive and flexible tool that enables sample assurance, allows deconvolution of sample mix-ups and reports levels of cross-contamination between samples throughout NGS workflows.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-110) contains supplementary material, which is available to authorized users.

14.

NeSSM: A Next-Generation Sequencing Simulator for Metagenomics

Ben Jia, Liming Xuan, Kaiye Cai, Zhiqiang Hu, Liangxiao Ma, Chaochun Wei. PLoS ONE 2013, 8(10)

Background

Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools.

Results

We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). By combining currently available complete genomes, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base, as well as sequencing coverage bias, are incorporated in the simulation. To improve the fidelity of simulation, NeSSM provides tools to estimate the sequencing error models, sequencing coverage bias and community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and paired-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing unit) version of NeSSM has been developed to accelerate the simulation. By comparing simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many respects than popular existing metagenome simulators such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than an order of magnitude faster than MetaSim.
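To make "explicit distribution of errors at each base" concrete, the sketch below estimates a per-position substitution rate from reads paired with their known source sequence (gapless, equal length) and then applies those rates when simulating new reads. It is a simplified, hypothetical model (substitutions only, no indels, quality scores, or coverage bias) and not NeSSM's implementation; the function names are illustrative.

```python
import random

def estimate_position_error_rates(pairs):
    """pairs: list of (read, true_sequence), all the same length.

    Returns the observed mismatch rate at each read position.
    """
    length = len(pairs[0][0])
    mismatches = [0] * length
    for read, truth in pairs:
        for i, (r, t) in enumerate(zip(read, truth)):
            if r != t:
                mismatches[i] += 1
    return [m / len(pairs) for m in mismatches]

def apply_errors(fragment, rates, rng):
    """Substitute the base at position i with probability rates[i].

    fragment and rates are assumed to have the same length.
    """
    out = []
    for base, p in zip(fragment, rates):
        if rng.random() < p:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)
```

Position-specific rates capture the well-known tendency of some platforms to accumulate errors toward the end of a read, which a single uniform error rate cannot reproduce.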

Conclusions

NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can help in developing tools and evaluating strategies for metagenomics analysis, and it is freely available to academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.

15.

Validation of multiple single nucleotide variation calls by additional exome analysis with a semiconductor sequencer to supplement data of whole-genome sequencing of a human population

Ikuko N Motoike, Mitsuyo Matsumoto, Inaho Danjoh, Fumiki Katsuoka, Kaname Kojima, Naoki Nariai, Yukuto Sato, Yumi Yamaguchi-Kabata, Shin Ito, Hisaaki Kudo, Ichiko Nishijima, Satoshi Nishikawa, Xiaoqing Pan, Rumiko Saito, Sakae Saito, Tomo Saito, Matsuyuki Shirota, Kaoru Tsuda, Junji Yokozawa, Kazuhiko Igarashi, Naoko Minegishi, Osamu Tanabe, Nobuo Fuse, Masao Nagasaki, Kengo Kinoshita, Jun Yasuda, Masayuki Yamamoto. BMC Genomics 2014, 15(1)

Background

Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. Combining different types of next-generation sequencers to analyze individual genomes may be an efficient means of validating many single nucleotide variation calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.
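The abstract reports concordance as a percentage without giving its exact definition. One plausible definition, used in the sketch below as an assumption, is the fraction of sites called by both platforms at which the genotypes agree, treating allele order as irrelevant. The site keys, genotype strings, and function name are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Fraction of shared sites with matching genotypes, or None if no sites are shared.

    calls_a, calls_b: dict site -> genotype string such as "A/G".
    Genotypes are compared as unordered allele pairs, so "A/G" equals "G/A".
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None

    def norm(gt):
        return tuple(sorted(gt.split("/")))

    agree = sum(norm(calls_a[s]) == norm(calls_b[s]) for s in shared)
    return agree / len(shared)
```

Restricting the comparison to sites called on both platforms matters here, since whole-exome and whole-genome data cover very different portions of the genome.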

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

16.

Automated classification of tailed bacteriophages according to their neck organization

Anne Lopes, Paulo Tavares, Marie-Agnès Petit, Raphaël Guérois, Sophie Zinn-Justin. BMC Genomics 2014, 15(1)

Background

The genetic diversity observed among bacteriophages remains a major obstacle to identifying homologs and comparing their functional modules. In the structural module, although several classes of homologous proteins contributing to the head and tail structure can be detected, proteins of the head-to-tail connection (or neck) are generally more divergent. Yet molecular analyses of a few tailed phages belonging to different morphological classes suggested that only a limited number of structural solutions are used to produce a functional virion. To test this hypothesis and analyze protein diversity at the virion neck, we developed a specific computational strategy to cope with sequence divergence in phage proteins. We searched for homologs of a set of proteins encoded in the structural module using a phage learning database.

Results

We show that, using a combination of iterative profile-profile comparison and gene context analyses, we can identify a set of head, neck and tail proteins in most tailed bacteriophages in our database. Classification of phages based on neck protein sequences delineates 4 Types corresponding to known morphological subfamilies. Further analysis of the most abundant Type 1 yields 10 Clusters characterized by consistent sets of head, neck and tail proteins. We developed Virfam, a webserver that automatically identifies proteins of the phage head-neck-tail module and assigns phages to the most closely related cluster of phages. This server was tested against 624 new phages from the NCBI database. Of the tailed and unclassified phages, 93% could be assigned to our head-neck-tail based categories, highlighting how representative the identified virion architectures are. Types and Clusters delineate consistent subgroups of Caudovirales, which correlate with several virion properties.

Conclusions

Our method and webserver have the capacity to automatically classify most tailed phages, detect their structural module, assign a function to a set of their head, neck and tail genes, provide their morphologic subtype and localize these phages within a “head-neck-tail” based classification. It should enable analysis of large sets of phage genomes. In particular, it should contribute to the classification of the abundant unknown viruses found on assembled contigs of metagenomic samples.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1027) contains supplementary material, which is available to authorized users.

17.

CLAST: CUDA implemented large-scale alignment search tool

Masahiro Yano Hiroshi Mori Yutaka Akiyama Takuji Yamada Ken Kurokawa 《BMC bioinformatics》2014,15(1)

Background

Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets.

Results

We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads against thousands of reference genome sequences and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST identified sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment by default (local alignment is also an option), enabling it to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, unlike Burrows–Wheeler Transform-based tools, CLAST does not need a preprocessed sequence database, which enables it to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run on a standard desktop computer or server node.
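The global-versus-local distinction mentioned for CLAST can be illustrated with a textbook dynamic-programming scorer. The sketch below is not CLAST's GPU implementation; the function name align_score and the scoring values (match +1, mismatch −1, linear gap −1) are illustrative assumptions. With local=False it computes a Needleman–Wunsch global score, which forces both sequences to align end to end; with local=True it computes a Smith–Waterman local score, which keeps only the best-scoring matching segment.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -1, local: bool = False) -> int:
    """Return the optimal alignment score of a and b (illustrative scoring).

    local=False: Needleman-Wunsch global alignment (end-to-end).
    local=True:  Smith-Waterman local alignment (best-scoring segments).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the best score for aligning a[:i] with b[:j]
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment penalizes leading gaps in either sequence
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            dp[i][j] = score
    return best if local else dp[-1][-1]


# A short read embedded in a longer reference: the global score pays for the
# unaligned flanks, while the local score ignores them.
print(align_score("ACGT", "TTACGTTT"))              # 0
print(align_score("ACGT", "TTACGTTT", local=True))  # 4
```

Global scoring is what lets a whole read be compared against a distant homolog as a unit; local scoring is the BLAST-style choice that rewards the best matching fragment.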

Conclusions

CLAST achieved very high speed (similar to the Burrows–Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0406-y) contains supplementary material, which is available to authorized users.

18.

Analysis of High-Throughput Sequencing and Annotation Strategies for Phage Genomes

Matthew R. Henn Matthew B. Sullivan Nicole Stange-Thomann Marcia S. Osburne Aaron M. Berlin Libusha Kelly Chandri Yandava Chinnappa Kodira Qiandong Zeng Michael Weiand Todd Sparrow Sakina Saif Georgia Giannoukos Sarah K. Young Chad Nusbaum Bruce W. Birren Sallie W. Chisholm 《PloS one》2010,5(2)

Background

Bacterial viruses (phages) play a critical role in shaping microbial populations, as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and on human health. Despite their importance, little is known about the genomic diversity harbored in phages, because efforts to capture complete phage genomes have been hampered by a lack of knowledge about the target genomes and by difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are from marine phages.

Methodology/Principal Findings

To advance the study of phage biology through comparative genomic approaches, we used marine cyanophages as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl-purified phage particles) and sequencing strategies: Sanger sequencing of either a linker amplification shotgun library (LASL) or a whole genome shotgun library (WGSL), and 454 pyrosequencing. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling.
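To make "ORF calling" concrete, the sketch below shows a naive forward-strand open reading frame scanner: in each of the three reading frames it reports stretches that begin at an ATG start codon and end at the first in-frame TAA, TAG or TGA stop codon. This is not the annotation pipeline described in the abstract; the function name find_orfs and the min_codons length cutoff are hypothetical, and practical gene callers also scan the reverse strand and rely on statistical models rather than a fixed length threshold.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs (hypothetical helper).

    An ORF is ATG ... stop within one frame; `end` is exclusive and includes
    the stop codon. Only the first ATG before each stop is used, so nested
    start codons do not produce extra, shorter ORFs.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only (i + 3 <= len(seq))
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next start codon in this frame
    return orfs


print(find_orfs("CCATGTTTTGACC", min_codons=1))  # [(2, 2, 11)]
```

A length cutoff like min_codons is the crudest way to suppress spurious short ORFs, which is exactly where false positives in ORF calling tend to arise.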

Conclusions/Significance

These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

19.

High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping

John E. Collins Neha Wali Ian M. Sealy James A. Morris Richard J. White Steven R. Leonard David K. Jackson Matthew C. Jones Nathalie C. Smerdon Jorge Zamora Christopher M. Dooley Samantha N. Carruthers Jeffrey C. Barrett Derek L. Stemple Elisabeth M. Busch-Nentwich 《BMC genomics》2015,16(1)

20.

Snapshot of the eukaryotic gene expression in muskoxen rumen--a metatranscriptomic approach

Qi M Wang P O'Toole N Barboza PS Ungerfeld E Leigh MB Selinger LB Butler G Tsang A McAllister TA Forster RJ 《PloS one》2011,6(5):e20521