Similar articles
20 similar articles found (search time: 0 ms)
1.

Background  

Phages, viruses that infect prokaryotes, are the most abundant microbes in the world. A major limitation to studying these viruses is the difficulty of cultivating the appropriate prokaryotic hosts. One way around this limitation is to directly clone and sequence shotgun libraries of uncultured viral communities (i.e., metagenomic analyses). PHACCS, Phage Communities from Contig Spectrum, is an online bioinformatic tool to assess the biodiversity of uncultured viral communities. PHACCS uses the contig spectrum from shotgun DNA sequence assemblies to mathematically model the structure of viral communities and make predictions about diversity.

2.
Identification of single nucleotide polymorphisms (SNPs) and mutations is important for the discovery of genetic predisposition to complex diseases. PCR resequencing is the method of choice for de novo SNP discovery. However, manual curation of putative SNPs has been a major bottleneck in the application of this method to high-throughput screening. Therefore it is critical to develop a more sensitive and accurate computational method for automated SNP detection. We developed a software tool, SNPdetector, for automated identification of SNPs and mutations in fluorescence-based resequencing reads. SNPdetector was designed to model the process of human visual inspection and has very low false-positive and false-negative rates. We demonstrate the superior performance of SNPdetector in SNP and mutation analysis by comparing its results with those derived by human inspection, PolyPhred (a popular SNP detection tool), and independent genotype assays in three large-scale investigations. The first study identified and validated inter- and intra-subspecies variations in 4,650 traces of 25 inbred mouse strains that belong to either the Mus musculus species or the M. spretus species. Unexpected heterozygosity in the CAST/Ei strain was observed in two out of 1,167 mouse SNPs. The second study identified 11,241 candidate SNPs in five ENCODE regions of the human genome covering 2.5 Mb of genomic sequence. Approximately 50% of the candidate SNPs were selected for experimental genotyping; the validation rate exceeded 95%. The third study detected ENU-induced mutations (at 0.04% allele frequency) in 64,896 traces of 1,236 zebrafish. Our analysis of three large and diverse test datasets demonstrated that SNPdetector is an effective tool for genome-scale research and for large-sample clinical studies. SNPdetector runs on Unix/Linux platforms and is publicly available (http://lpg.nci.nih.gov).

3.
Viruses infect all forms of life and play critical roles as agents of disease, drivers of biochemical cycles and sources of genetic diversity for their hosts. Our understanding of viral diversity derives primarily from comparisons among host species, precluding insight into how intraspecific variation in host ecology affects viral communities or how predictable viral communities are across populations. Here we test spatial, demographic and environmental hypotheses explaining viral richness and community composition across populations of common vampire bats, which occur in diverse habitats of North, Central and South America. We demonstrate marked variation in viral communities that was not consistently predicted by a null model of declining community similarity with increasing spatial or genetic distances separating populations. We also find no evidence that larger bat colonies host greater viral diversity. Instead, viral diversity follows an elevational gradient, is enriched by juvenile-biased age structure, and declines with local anthropogenic food resources as measured by livestock density. Our results establish the value of linking the modern influx of metagenomic sequence data with comparative ecology, reveal that snapshot views of viral diversity are unlikely to be representative at the species level, and affirm existing ecological theories that link host ecology not only to single pathogen dynamics but also to viral communities.

4.
Genotypic diversity: estimation and prediction in samples   (cited 11 times: 1 self-citation, 10 by others)
Stoddart JA, Taylor JF. Genetics 1988, 118(4): 705-711
We show that a commonly used statistic of genotypic diversity can be used to reflect one form of deviation from panmixia, viz. clonal reproduction, by comparing observed and predicted sample statistics. The characteristics of the statistic, in particular its relationship with population genotypic diversity, are formalised and a method of predicting the genotypic diversity of a sample drawn from a panmictic population using allelic frequencies and sample size is developed. The sensitivity of some possible tests of significance of the deviation from panmictic expectations is examined using computer simulations. Goodness-of-fit tests are robust but produce an unacceptably high level of type II error. With means and variances calculated either from Monte Carlo simulations or from distributional and series approximations, t-tests perform better than goodness-of-fit tests. Under simulation, both forms of t-test exhibit acceptable rates of type I error. Rates of type II error are usually large when allele frequencies are severely skewed, although the latter test performs better under those conditions.
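
The abstract above does not spell out the statistic. A minimal sketch, assuming it is the reciprocal of the summed squared genotype frequencies in the sample (a common form of genotypic diversity), might look like the following. The function name and the example samples are hypothetical and only illustrate how clonality lowers the value.

```python
from collections import Counter


def genotypic_diversity(genotypes):
    """Return 1 / sum(p_i^2), where p_i is the sample frequency of genotype i.

    Assumption: this is the statistic the abstract refers to; the abstract
    itself does not give the formula.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("sample is empty")
    counts = Counter(genotypes)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


# Hypothetical samples of 10 individuals each.
clonal = ["A"] * 9 + ["B"]      # one genotype dominates
distinct = list("ABCDEFGHIJ")   # every individual has its own genotype

print(genotypic_diversity(clonal))
print(genotypic_diversity(distinct))
```

Under this definition the value ranges from 1 (a single clone) up to the sample size (all genotypes distinct), so an observed value well below the panmictic prediction would point towards clonal reproduction.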

5.
Following an evaluation of the various methods available for non-destructive biomass estimation in short rotation forestry, a standardised procedure was defined and incorporated into a computer programme (BioEst). Special efforts were made to ensure that the system can be used by people who are unfamiliar with computers and mathematics. BioEst provides an interface between a calliper and a spreadsheet programme which was written in Microsoft Excel macro language. Therefore, it is simple to modify the programme and create personal protocols. BioEst can be run on a portable PC with Microsoft Excel for Windows. The computer continuously recalculates an estimate of the amount of biomass per hectare, as well as some summary statistics, when fed data on shoot diameter obtained by making row-section-wise measurements with a standard digital calliper. BioEst is available without cost from the author.

6.

Background

A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed.

Results

We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t + log σ)) time and just 2n + o(n) + O(max{ℓσ log n, K log m}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure.

Conclusions

Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art.
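
The union-find structure on the set of reads mentioned in the Results can be sketched as follows. This is a generic textbook version (union by rank with path halving), not the authors' implementation. How pairs of reads are judged to belong together is outside the sketch, so `linked_pairs` is a hypothetical input standing in for whatever the index traversal reports.

```python
class UnionFind:
    """Disjoint-set forest over the integers 0 .. size-1."""

    def __init__(self, size):
        self.parent = list(range(size))
        self.rank = [0] * size

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False  # already in the same cluster
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
        return True


def cluster_reads(num_reads, linked_pairs):
    """Merge reads that are linked and return the resulting clusters."""
    uf = UnionFind(num_reads)
    for a, b in linked_pairs:
        uf.union(a, b)
    clusters = {}
    for read in range(num_reads):
        clusters.setdefault(uf.find(read), []).append(read)
    return sorted(clusters.values())


# Hypothetical example: 5 reads, three reported links.
print(cluster_reads(5, [(0, 1), (3, 4), (1, 2)]))  # [[0, 1, 2], [3, 4]]
```

The structure stores two integers per read, which fits the abstract's emphasis on keeping working space small relative to the sample.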

7.

Background

Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of known organisms. A large portion of sequencing reads may be classified as unknown, which greatly impairs our understanding of the whole sample.

Result

Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs to accelerate classification. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of the MetaSUB Inter-City Challenge provided by the CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.

Conclusion

Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms.

Reviewers

This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.
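
The abstract says only that the method is "based on sequence composition". One common reading of that phrase is a k-mer frequency profile of each read; the sketch below assumes that reading and is not taken from MetaBinG2 itself. The function name, the default k = 3, and the choice to skip windows containing non-ACGT characters are all illustrative assumptions.

```python
from collections import Counter


def kmer_composition(seq, k=3):
    """Return the relative frequency of each k-mer in a DNA sequence.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: count / total for kmer, count in counts.items()}


# Hypothetical read: the windows are ACG, CGA, GAC, ACG.
print(kmer_composition("ACGACG", k=3))
```

A composition-based classifier would compare such a profile against profiles learned from reference genomes, which needs no alignment and is the kind of independent per-read work that parallelises well.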

8.
Summary: DeconMSn accurately determines the monoisotopic mass and charge state of parent ions from high-resolution tandem mass spectrometry data, offering significant improvement for LTQ_FT and LTQ_Orbitrap instruments over the commercially delivered Thermo Fisher Scientific's extract_msn tool. Optimal parent ion mass tolerance values can be determined using accurate mass information, thus improving peptide identifications for high-mass measurement accuracy experiments. For low-resolution data from LCQ and LTQ instruments, DeconMSn incorporates a support-vector-machine-based charge detection algorithm that identifies the most likely charge of a parent species through peak characteristics of its fragmentation pattern. Availability: http://ncrr.pnl.gov/software/ or http://www.proteomicsresource.org/ Contact: rds{at}pnl.gov Supplementary information: PowerPoint presentation/Poster on http://ncrr.pnl.gov/software/. Associate Editor: Alfonso Valencia

9.
10.
Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with the increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present scrapp, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. scrapp employs a novel approach to cluster phylogenetic placements, called placement space clustering, to efficiently perform dimensionality reduction, so as to scale to large data volumes. Furthermore, it uses the phylogeny-aware molecular species delimitation method mPTP to quantify diversity. We evaluated scrapp using both simulated and empirical data sets. We use simulated data to verify our approach. Tests on an empirical data set show that scrapp-derived metrics can classify samples by their diversity-correlated features equally well or better than existing, commonly used approaches. scrapp is available at https://github.com/pbdas/scrapp.

11.
A model for accurate drift estimation in streams   (cited 1 time: 0 self-citations, 1 by others)
1. This paper explores the experimental difficulties involved with the use of drift nets in small streams, and outlines a method whereby the estimation of drift density (number of specimens m−3 of water) can be improved.
2. Changes in the filtering efficiency of the net caused by trapping of organic debris ('clogging') have the effect of reducing net entrance velocities, causing errors in the calculation of sampled water volume, and thus drift density. A model of the reductions in net entrance velocity based on empirical measurements of trapped debris is developed.
3. Cross-sectional velocity calculations suggest that errors can also be introduced into drift density calculations by positioning sampling nets only on the bed. A method to allow for this effect is demonstrated.
4. As adjustments to the calculation of sampled volume are required when sampling in rivers that undergo marked changes in discharge during the sampling period, a method whereby these effects can be accommodated to improve drift density estimations is also outlined.
5. The results of this study imply that theoretical links between flow hydraulics and short-term drift behaviour are poorly understood.
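
The arithmetic behind points 1 and 2 can be illustrated with a short sketch. Drift density is the number of specimens divided by the volume of water filtered, and the filtered volume is net mouth area times entrance velocity times sampling time. The paper's clogging model is not given in the abstract, so the linear decline in entrance velocity used here is purely an assumption, as are the function names and the example figures.

```python
def sampled_volume(area_m2, duration_s, v_start, v_end=None):
    """Volume of water (m^3) filtered by a drift net.

    Assumption for this sketch: entrance velocity declines linearly from
    v_start to v_end (m/s) as the net clogs, so the mean of the two is used.
    With v_end omitted, the velocity is treated as constant.
    """
    if v_end is None:
        v_end = v_start
    mean_velocity = (v_start + v_end) / 2.0
    return area_m2 * mean_velocity * duration_s


def drift_density(specimen_count, volume_m3):
    """Number of specimens per cubic metre of filtered water."""
    if volume_m3 <= 0:
        raise ValueError("sampled volume must be positive")
    return specimen_count / volume_m3


# Hypothetical sample: 0.05 m^2 net mouth, 1 h, 162 specimens caught.
naive = sampled_volume(0.05, 3600, v_start=0.5)                 # about 90 m^3
clogged = sampled_volume(0.05, 3600, v_start=0.5, v_end=0.4)    # about 81 m^3
print(drift_density(162, naive))    # about 1.8 specimens per m^3
print(drift_density(162, clogged))  # about 2.0 specimens per m^3
```

Ignoring the velocity drop overstates the filtered volume and so understates drift density, which is the direction of error described in point 2.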

12.
New applications of DNA and RNA sequencing are expanding the field of biodiversity discovery and ecological monitoring, yet questions remain regarding precision and efficiency. Due to primer bias, the ability of metabarcoding to accurately depict biomass of different taxa from bulk communities remains unclear, while PCR-free whole mitochondrial genome (mitogenome) sequencing may provide a more reliable alternative. Here, we used a set of documented mock communities comprising 13 species of freshwater macroinvertebrates of estimated individual biomass, to compare the detection efficiency of COI metabarcoding (three different amplicons) and shotgun mitogenome sequencing. Additionally, we used individual COI barcoding and de novo mitochondrial genome sequencing, to provide reference sequences for OTU assignment and metagenome mapping (mitogenome skimming), respectively. We found that, even though both methods occasionally failed to recover very low abundance species, metabarcoding was less consistent, by failing to recover some species with higher abundances, probably due to primer bias. Shotgun sequencing results provided highly significant correlations between read number and biomass in all but one species. Conversely, the read–biomass relationships obtained from metabarcoding varied across amplicons. Specifically, we found significant relationships for eight of 13 (amplicons B1FR-450 bp, FF130R-130 bp) or four of 13 (amplicon FFFR, 658 bp) species. Combining the results of all three COI amplicons (multiamplicon approach) improved the read–biomass correlations for some of the species. Overall, mitogenomic sequencing yielded more informative predictions of biomass content from bulk macroinvertebrate communities than metabarcoding. However, for large-scale ecological studies, metabarcoding currently remains the most commonly used approach for diversity assessment.

13.
14.
15.

Background  

Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, a need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects.

16.
The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitations. Common to practically all the methods is the processing of single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each of the samples, and provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes: we demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed "binning") algorithm which operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough.

17.
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.

Highly divergent sites in multiple sequence alignments are thought to negatively impact phylogenetic inference; trimming methods aim to remove these sites, but recent analysis suggests that doing so can worsen inference. This study introduces ClipKIT, a trimming method that instead aims to retain parsimony-informative sites; phylogenetic inference using ClipKIT-trimmed alignments is accurate, robust and time-saving.
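
The idea of retaining parsimony-informative sites can be sketched with the standard definition: a column is parsimony-informative when at least two different character states each occur in at least two sequences. The code below is an illustration of that definition only, not ClipKIT's implementation or interface. The function names and the choice to ignore only `-` and `?` are assumptions of the sketch.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-?"):
    """True if at least two character states each occur at least twice.

    Assumption for this sketch: only gap ('-') and missing ('?')
    characters are excluded from the count.
    """
    counts = Counter(ch for ch in column.upper() if ch not in ignore)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def keep_informative_sites(alignment):
    """Return the alignment restricted to parsimony-informative columns.

    `alignment` is a list of equal-length sequence strings.
    """
    columns = ["".join(col) for col in zip(*alignment)]
    kept = [i for i, col in enumerate(columns) if is_parsimony_informative(col)]
    return ["".join(seq[i] for i in kept) for seq in alignment]


# Hypothetical 4-sequence alignment:
# column 0 is constant, column 3 has a singleton, columns 1 and 2 are kept.
alignment = ["ACGT", "ACGT", "ATCT", "ATCA"]
print(keep_informative_sites(alignment))  # ['CG', 'CG', 'TC', 'TC']
```

Constant columns and columns whose only variation is a single odd sequence are dropped, since neither favours one tree topology over another under parsimony.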

18.
19.
A simple method for accurate estimation of apoptotic cells   (cited 6 times: 0 self-citations, 6 by others)
A simple, sensitive, and reliable "DNA diffusion" assay for the quantification of apoptosis is described. Human lymphocytes and human lymphoblastoid cells, MOLT-4, were exposed to 0, 12.5, 25, 50, or 100 rad of X-rays. After 24 h of incubation, cells were mixed with agarose, microgels were made, and cells were lysed in high salt and detergents. DNA was precipitated in microgels by ethanol. Staining of DNA was done with an intense fluorescent dye, YOYO-1. Apoptotic cells show a halo of granular DNA with a hazy outer boundary. Necrotic cells, resulting from hyperthermia treatment, on the other hand, show an unusually large homogeneous nucleus with a clearly defined boundary. The number of cells with apoptotic and necrotic appearance can be scored and quantified by using a fluorescence microscope. Results were compared with other methods of apoptosis measurement: morphological estimations of apoptosis and DNA ladder pattern formation in regular agarose gel electrophoresis. Validation of the technique was done using some known inducers of apoptosis and necrosis (hyperthermia, hydrogen peroxide, mitoxantrone, novobiocin, and sodium ascorbate).

20.

Background  

Dekapentagonal maps depict the phylogenetic relationships of five genomes in a visually appealing diagram and can be viewed as an alternative to a single evolutionary consensus tree. In particular, the generated maps focus attention on those gene families that significantly deviate from the consensus or plurality phylogeny. PentaPlot is a software tool that computes such dekapentagonal maps given an appropriate probability support matrix.
