Found 20 similar articles; search took 15 ms
1.
Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences that a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and making the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible. 相似文献
2.
Saakshi Jalali Samantha Kohli Chitra Latka Sugandha Bhatia Shamsudheen Karuthedath Vellarikal Sridhar Sivasubbu Vinod Scaria Srinivasan Ramachandran 《PloS one》2015,10(6)
Fomites are a well-known source of microbial infections, and previous studies have provided insights into the sojourning microbiome of fomites from various sources. Paper currency notes are among the most commonly exchanged objects, and their potential to transmit pathogenic organisms has been well recognized. Approaches to identify the microbiome associated with paper currency notes have been largely limited to culture-dependent methods. Subsequent studies used 16S ribosomal RNA-based approaches, which provided insights into the taxonomic distribution of the microbiome. However, recent techniques, including shotgun sequencing, provide resolution at the gene level and enable estimation of gene copy numbers in the metagenome. We investigated the microbiome of Indian paper currency notes using a shotgun metagenome sequencing approach. Metagenomic DNA isolated from samples of frequently circulated denominations of Indian currency notes was sequenced using an Illumina HiSeq sequencer. Analysis of the data revealed the presence of species belonging to both eukaryotic and prokaryotic genera. The taxonomic distribution at kingdom level revealed contigs mapping to eukaryota (70%), bacteria (9%), and viruses and archaea (~1%). We identified 78 pathogens, including Staphylococcus aureus, Corynebacterium glutamicum and Enterococcus faecalis, and 75 cellulose-degrading organisms, including Acidothermus cellulolyticus, Cellulomonas flavigena and Ruminococcus albus. Additionally, 78 antibiotic resistance genes were identified, and 18 of these were found in all the samples. Furthermore, six of the 78 pathogens harbored at least one of the 18 common antibiotic resistance genes. To the best of our knowledge, this is the first report of a shotgun metagenome sequence dataset of paper currency notes, which can be useful for future applications, including bio-surveillance of exchangeable fomites for infectious agents.
3.
Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq) has made it possible to identify the targeting sites of RNA-binding proteins in various cell culture systems and tissue types on a genome-wide scale. Here we present a novel model-based approach (MiClip) to identify high-confidence protein-RNA binding sites from CLIP-seq datasets. This approach assigns a probability score for each potential binding site to help prioritize subsequent validation experiments. The MiClip algorithm has been tested in both HITS-CLIP and PAR-CLIP datasets. In the HITS-CLIP dataset, the signal/noise ratios of miRNA seed motif enrichment produced by the MiClip approach are between 17% and 301% higher than those by the ad hoc method for the top 10 most enriched miRNAs. In the PAR-CLIP dataset, the MiClip approach can identify ∼50% more validated binding targets than the original ad hoc method and two recently published methods. To facilitate the application of the algorithm, we have released an R package, MiClip (http://cran.r-project.org/web/packages/MiClip/index.html), and a public web-based graphical user interface software (http://galaxy.qbrc.org/tool_runner?tool_id=mi_clip) for customized analysis.
4.
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates. 相似文献
5.
John D. O’Brien Xavier Didelot Zamin Iqbal Lucas Amenga-Etego Bartu Ahiska Daniel Falush 《Genetics》2014,197(3):925-937
Metagenomics provides a powerful new tool set for investigating evolutionary interactions with the environment. However, an absence of model-based statistical methods means that researchers are often not able to make full use of this complex information. We present a Bayesian method for inferring the phylogenetic relationship among related organisms found within metagenomic samples. Our approach exploits variation in the frequency of taxa among samples to simultaneously infer each lineage haplotype, the phylogenetic tree connecting them, and their frequency within each sample. Applications of the algorithm to simulated data show that our method can recover a substantial fraction of the phylogenetic structure even in the presence of high rates of migration among sample sites. We provide examples of the method applied to data from green sulfur bacteria recovered from an Antarctic lake, plastids from mixed Plasmodium falciparum infections, and virulent Neisseria meningitidis samples. 相似文献
6.
One of the major challenges to understanding population changes in ecology for assessment purposes is the difficulty in evaluating the suitability of an area for a given species. Here we used a new simple approach able to faithfully predict through time the abundance of two key zooplanktonic species by focusing on the relationship between the species’ environmental preferences and their observed abundances. The approach is applied to the marine copepods Calanus finmarchicus and C. helgolandicus as a case study characterising the multidecadal dynamics of the North Sea ecosystem. We removed all North Sea data from the Continuous Plankton Recorder (CPR) dataset and described for both species a simplified ecological niche using Sea Surface Temperature (SST) and CPR Phytoplankton Colour Index (PCI). We then modelled the dynamics of each species by associating the North Sea’s environmental parameters to the species’ ecological niches, thus creating a method to assess the suitability of this area. By using both C. finmarchicus and C. helgolandicus as indicators, the procedure reproduces the documented switches from cold to warm temperate states observed in the North Sea. 相似文献
7.
Statistics in Biosciences - We study how international flights can facilitate the spread of an epidemic to a worldwide scale. We combine an infrastructure network of flight connections with a...
8.
What is the underlying mechanism behind the fat-tailed statistics observed for species abundance distributions? The two main hypotheses in the field are the adaptive (niche) theories, where species abundance reflects its fitness, and the neutral theory that assumes demographic stochasticity as the main factor determining community structure. Both explanations suggest quite similar species-abundance distributions, but very different histories: niche scenarios assume that a species population in the past was similar to the observed one, while neutral scenarios are characterized by strongly fluctuating populations. Since the genetic variations within a population depend on its abundance in the past, we present here a way to discriminate between the theories using the genetic diversity of noncoding DNA. A statistical test, based on the Fu-Li method, has been developed and enables such a differentiation. We have analyzed the results gathered from individual-based simulation of both types of histories and obtained clear distinction between the Fu-Li statistics of the neutral scenario and that of the niche scenario. Our results suggest that data for 10–50 species, with approximately 30 sequenced individuals for each species, may allow one to distinguish between these two theories. 相似文献
9.
Mosquitoes, among the most common and important vectors, can transmit or acquire many viruses through biting; however, the viral flora of mosquitoes and its impact on mosquito-borne disease transmission have not been well investigated or evaluated. In this study, a metagenomic technique was employed to analyze the abundance and diversity of the viral community in three mosquito samples from Hubei, China. Among 92,304 reads produced in a run on the 454 GS FLX system, 39% showed high similarity to viral sequences belonging to identified bacterial, fungal, animal, plant and insect viruses, and 0.02% were classified as unidentified viral sequences, demonstrating a high abundance and diversity of viruses in mosquitoes. Furthermore, two novel viruses in the subfamily Densovirinae and the family Dicistroviridae were identified, and six torque teno sus virus 1 in the family Anelloviridae, three porcine parvoviruses in the subfamily Parvovirinae and a Culex tritaeniorhynchus rhabdovirus in the family Rhabdoviridae were preliminarily characterized. The viral metagenomic analysis offers a deep insight into the viral population of mosquitoes, which plays an important role in active or passive viral transmission and in viral evolution during that process.
10.
In studies using macroinvertebrates as indicators for monitoring rivers and streams, species level identifications in comparison
with lower resolution identifications can have greater information content and result in more reliable site classifications
and better capacity to discriminate between sites, yet many such programmes identify specimens to the resolution of family
rather than species. This is often because it is cheaper to obtain family level data than species level data. Choice of appropriate
taxonomic resolution is a compromise between the cost of obtaining data at high taxonomic resolutions and the loss of information
at lower resolutions. Optimum taxonomic resolution should be determined by the information required to address programme objectives.
Costs saved in identifying macroinvertebrates to family level may not be justified if family level data cannot give the answers
required and expending the extra cost to obtain species level data may not be warranted if cheaper family level data retains
sufficient information to meet objectives. We investigated the influence of taxonomic resolution and sample quantification
(abundance vs. presence/absence) on the representation of aquatic macroinvertebrate species assemblage patterns and species
richness estimates. The study was conducted in a physically harsh dryland river system (Condamine-Balonne River system, located
in south-western Queensland, Australia), characterised by low macroinvertebrate diversity. Our 29 study sites covered a wide
geographic range and a diversity of lotic conditions and this was reflected by differences between sites in macroinvertebrate
assemblage composition and richness. The usefulness of expending the extra cost necessary to identify macroinvertebrates to
species was quantified via the benefits this higher resolution data offered in its capacity to discriminate between sites
and give accurate estimates of site species richness. We found that very little information (<6%) was lost by identifying
taxa to family (or genus), as opposed to species, and that quantifying the abundance of taxa provided greater resolution for
pattern interpretation than simply noting their presence/absence. Species richness was very well represented by genus, family
and order richness, so that each of these could be used as surrogates of species richness if, for example, surveying to identify
diversity hot-spots. It is suggested that sharing of common ecological responses among species within higher taxonomic units
is the most plausible mechanism for the results. Based on a cost/benefit analysis, family level abundance data is recommended
as the best resolution for resolving patterns in macroinvertebrate assemblages in this system. The relevance of these findings
is discussed in the context of other low diversity, harsh, dryland river systems.
11.
Zhigang Li Margaret R. Karagas Juliette C. Madan Anne G. Hoen A. James O’Malley Hongzhe Li 《Statistics in biosciences》2018,10(3):587-608
The human microbiome plays critical roles in human health and has been linked to many diseases. While advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, it remains challenging to disentangle the complex interplay between human microbiome and disease risk factors due to the complicated nature of microbiome data. Excessive numbers of zero values, high dimensionality, the hierarchical phylogenetic tree and compositional structure are compounded and consequently make existing methods inadequate to appropriately address these issues. We propose a multivariate two-part zero-inflated logistic-normal model to analyze the association of disease risk factors with individual microbial taxa and overall microbial community composition. This approach can naturally handle excessive numbers of zeros and the compositional data structure with the discrete part and the logistic-normal part of the model. For parameter estimation, an estimating equations approach is employed that enables us to address the complex inter-taxa correlation structure induced by the hierarchical phylogenetic tree structure and the compositional data structure. This model is able to incorporate standard regularization approaches to deal with high dimensionality. Simulation shows that our model outperforms existing methods. Our approach is also compared to others using the analysis of real microbiome data. 相似文献
12.
Nino Nikolovski Pavel V. Shliaha Laurent Gatto Paul Dupree Kathryn S. Lilley 《Plant physiology》2014,166(2):1033-1043
The proteomic composition of the Arabidopsis (Arabidopsis thaliana) Golgi apparatus is currently reasonably well documented; however, little is known about the relative abundances between different proteins within this compartment. Accurate quantitative information of Golgi resident proteins is of great importance: it facilitates a better understanding of the biochemical processes that take place within this organelle, especially those of different polysaccharide synthesis pathways. Golgi resident proteins are challenging to quantify because the abundance of this organelle is relatively low within the cell. In this study, an organelle fractionation approach targeting the Golgi apparatus was combined with a label-free quantitative mass spectrometry (data-independent acquisition method using ion mobility separation known as LC-IMS-MSE [or HDMSE]) to simultaneously localize proteins to the Golgi apparatus and assess their relative quantity. In total, 102 Golgi-localized proteins were quantified. These data show that organelle fractionation in conjunction with label-free quantitative mass spectrometry is a powerful and relatively simple tool to access protein organelle localization and their relative abundances. The findings presented open a unique view on the organization of the plant Golgi apparatus, leading toward unique hypotheses centered on the biochemical processes of this organelle. The plant Golgi apparatus plays an important role in protein and lipid glycosylation and sorting as well as biosynthesis of large amounts of extracellular polysaccharides. It contains a large and diverse set of glycosyltransferases and other enzymes that are required for the synthesis and modification of these polysaccharides (Parsons et al., 2012b; Oikawa et al., 2013).
The protein composition of this organelle has been the focus of a number of studies; however, these studies largely report a catalog of Golgi-localized proteins, and to date, there are no comprehensive data on the relative abundance of the different protein constituents of the Golgi apparatus (Dunkley et al., 2004, 2006; Sadowski et al., 2008; Nikolovski et al., 2012; Groen et al., 2014). The quantification of the plant Golgi proteome has been considered challenging, because this organelle is proportionally of low abundance in the cell; therefore, its constituent proteins are rarely identified in conventional proteomics experiments. Investigation of such low-abundance proteins generally requires sample fractionation on the organelle, protein, or peptide level (Stasyk and Huber, 2004; Haynes and Roberts, 2007; Di Palma et al., 2012). Here, an organelle fractionation approach in conjunction with label-free quantitative proteomic analysis was used to assess the localization and relative abundance of proteins within the plant Golgi apparatus. Label-free quantification is an increasingly popular alternative to isotopic tagging quantitative methods; it does not require labeling reagents and can be applied to an unlimited number of samples (Neilson et al., 2011; Evans et al., 2012). This is particularly appealing within plant proteomics, because the most conventional labeling strategy, Stable Isotope Labeling by Amino Acids in Cell Culture, is not easily suited for quantitative plant proteomic studies. The average labeling efficiency achieved using exogenous amino acid supply to Arabidopsis (Arabidopsis thaliana) cell cultures was found to be only 70% to 80% (Gruhler et al., 2005).
Quantitative strategies with 15N metabolic labeling have been described for plant proteome analysis; however, care should be taken to ensure complete 15N incorporation, because even small amounts of 14N in the labeled sample can have significant detrimental effects on the number of peptide identifications (Nelson et al., 2007; Guo and Li, 2011; Arsova et al., 2012). In all label-free methods, samples under comparison are analyzed during separate mass spectrometry (MS) experiments (Neilson et al., 2011). The information from identified peptides is then used for relative and/or absolute quantification. The simplest label-free method involves taking the number of spectra acquired and assigned to peptides from the same protein as a measure of abundance (Ishihama et al., 2005). In an alternative approach, ion current recorded for a peptide ion is used as a measure of its abundance. The assumption is made that ion intensity is proportional to peptide amount in the sample analyzed, which holds true for nanoflow and microflow liquid chromatography (LC) systems (Levin et al., 2011; Christianson et al., 2013). Comparing peptide ion current between samples is, thus, widely used for relative quantification (Silva et al., 2005). To allow such comparison, a peptide must be identified across all samples under investigation, which is often challenging in LC-MS experiments given the highly complex nature of proteomics samples that contain tens of thousands of different peptides (Michalski et al., 2011). Hence, most relative ion intensity-based label-free approaches usually involve a step of identification transfer (Pasa-Tolíc et al., 2004). This involves matching ions from different acquisitions (in one of which, the ion has not been identified and is assigned the sequence from its matching pair in the other acquisition). Additionally, label-free proteomics can be used for absolute quantification (i.e. to estimate abundance of different proteins relative to each other within a given sample).
Several different approaches have been suggested on how to convert peptide intensities to protein amounts (for comparison, see Wilhelm et al., 2014). One of the first such methods was Top-3 described by Silva et al. (2006b), who made a notable and unexpected observation, stating that the average MS signal response for the three most abundant peptides per 1 mol of protein is constant within a coefficient of variation of less than 10% (Silva et al., 2006b). In all these approaches, the peptide ion current is typically computed as the area under the curve of the chromatographic elution profile that is reconstituted from separate MS1 survey scans in which intact precursors are recorded. Determining a chromatographic profile accurately requires that the MS1 scans are performed at optimal frequency (Lange et al., 2008) and for optimal duration to record the MS1 signal at a high signal-to-noise ratio. In typical data-dependent acquisitions, however, the mass spectrometer oscillates between MS1 survey scans recording the mass/charge (m/z) for precursor peptide ions and then, a series of MS2 scans fragmenting one peptide ion precursor at a time, producing fragmentation spectra necessary for identification (Sadygov et al., 2004). As a result, the duration and frequency of MS2 scans determine the identification rate in data-dependent acquisition experiments but compromise time spent in MS1 required for accurate area under the curve quantification. Several groups have suggested data-independent acquisition, in which individual peptide ions are not selected for fragmentation but rather, groups of peptides of similar m/z are fragmented together. The exact number of cofragmented precursors depends on the speed and sensitivity of instrument configuration (for review, see Law and Lim, 2013).
The simplest approach involves alternating between low-energy and high-energy scans of equal duration; low-energy scans record precursor peptide ions, whereas in high-energy scans, all precursors entering the mass spectrometer are cofragmented, and their fragments are recorded simultaneously. The method was called MSE for Waters qTOF Mass Spectrometers (Geromanos et al., 2009) or all-ion fragmentation for Thermo Orbitrap Mass Spectrometers (Geiger et al., 2010). The analysis required downstream of this type of data acquisition is challenging given that the information of fragment origin (i.e. from what precursor peptide ion fragment was generated) is lost completely and that the high number of coeluting peptides is expected to create highly overlapping fragment spectra on fragmentation. To address this problem, Hoaglund-Hyzer and Clemmer (2001) have suggested fractionating peptides by ion mobility separation before fragmentation and MS and assigning fragments to precursors based on similarity of both chromatographic and mobility profiles (Hoaglund-Hyzer and Clemmer, 2001). The method was termed parallel fragmentation, and since that time, it has been commercialized by Waters as IMS-MSE or HDMSE (Shliaha et al., 2013). To date, the application of label-free quantitative proteomics to plant biology has been very limited. Recently, Helm et al. (2014) applied the LC-IMS-MSE with Top-3 quantification to quantify the Arabidopsis chloroplast stroma proteome, allowing quantitative modeling of chloroplast metabolism. Two other works used the LC-MSE method to assess the quantitative changes of cytosolic ribosomal proteins in response to Suc feeding and the extracellular proteome in response to salicylic acid (Cheng et al., 2009; Hummel et al., 2012). A number of proteomics approaches have been described to assess protein localization on a large scale (for review, see Gatto et al., 2010).
Purification approaches attempt to isolate organelles to high levels of purity and subsequently identify and quantify proteins using LC-MS; however, such attempts yield limiting success and high false discovery rates (Andersen et al., 2002; Parsons et al., 2012a). A known limitation of this technique is the inability to completely isolate an organelle of interest, which combined with high proteome dynamic range, can result in some more abundant contaminants being identified and quantified at higher amounts than the target organelle residents. Moreover, even if a target organelle could be isolated to a certain degree of purity, it would still be impossible to deconvolute organelle residents from transient proteins that traffic through the target organelle. This becomes especially challenging for the organelles of the secretory pathway. To address these challenges, several groups applied fractionation of all organelles by gradient centrifugation and subsequent protein quantification by LC-MS. This produces distributions across the gradient for all quantified proteins, which are then used to assign organelle localization based on the specific distributions of organelle marker proteins. This effectively solves the problem of organelle contamination and protein trafficking, because a protein is expected to have a distribution characteristic of its organelle of residence, even if it is identified in all fractions, including those enriched in other organelles. 
Current variations of this method differ mostly by the LC-MS strategy used for quantification; for example, spectral counting was applied for protein-correlating profiles (Andersen et al., 2003), isobaric mass tagging (Nikolovski et al., 2012) and isotope-coded affinity tagging (Dunkley et al., 2004) were applied for localization of organelle proteins by isotope tagging (LOPIT), and Stable Isotope Labeling by Amino Acids in Cell Culture was applied for nucleolus/nucleus/cytosolic fractionation (Boisvert and Lamond, 2010). Here, a label-free LC-IMS-MSE method was used for the analysis of density ultracentrifugation fractions enriched for the Golgi apparatus. First, we use relative label-free quantification involving identification transfer using the previously published synapter algorithm (Bond et al., 2013) to assess distributions of Golgi-localized proteins across the density gradient. These distributions are significantly different from those of residents of other organelles, which results in unambiguous protein assignment to the Golgi apparatus by multivariate data analysis. Second, the Top-3 absolute quantification method as implemented in Protein Lynx Global Server (PLGS) was used to rank order the Golgi-localized proteins by abundance in the fraction most enriched for Golgi apparatus. In conclusion, we present the analysis of protein distribution and abundances of the Golgi apparatus-enriched portion of the ultracentrifugation density gradient, allowing for simultaneous protein quantification and localization and leading to the assessment of relative abundances of 102 Golgi-localized proteins.
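The Top-3 idea mentioned in this entry (averaging the signal of the three most abundant peptides of each protein and ranking proteins by that value) can be illustrated with a minimal Python sketch. This is not the PLGS implementation; the function names, the input layout of (protein, intensity) pairs, and the choice to skip proteins with fewer than three peptides are all assumptions made for illustration.

```python
from collections import defaultdict


def top3_abundance(peptide_intensities):
    """Return a Top-3 style abundance estimate per protein.

    peptide_intensities: iterable of (protein_id, intensity) pairs, one per
    quantified peptide (hypothetical input layout).
    Proteins with fewer than three peptides are skipped in this sketch.
    """
    by_protein = defaultdict(list)
    for protein_id, intensity in peptide_intensities:
        by_protein[protein_id].append(float(intensity))

    estimates = {}
    for protein_id, values in by_protein.items():
        if len(values) < 3:
            continue
        # Average of the three most intense peptides of this protein.
        top3 = sorted(values, reverse=True)[:3]
        estimates[protein_id] = sum(top3) / 3.0
    return estimates


def rank_by_abundance(estimates):
    """Order protein ids from the highest to the lowest estimate."""
    return sorted(estimates, key=estimates.get, reverse=True)


# Made-up intensities, for illustration only.
example = [
    ("protA", 900.0), ("protA", 600.0), ("protA", 300.0), ("protA", 50.0),
    ("protB", 120.0), ("protB", 90.0), ("protB", 60.0),
    ("protC", 1000.0),
]
estimates = top3_abundance(example)
ranking = rank_by_abundance(estimates)
```

In this toy input, protC is dropped because it has a single peptide; a real pipeline would need an explicit policy for such proteins.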
13.
Maria Májeková Taavi Paal Nichola S. Plowman Michala Bryndová Liis Kasari Anna Norberg Matthias Weiss Tom R. Bishop Sarah H. Luke Katerina Sam Yoann Le Bagousse-Pinguet Jan Lepš Lars Götzenberger Francesco de Bello 《PloS one》2016,11(2)
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data. 相似文献
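The comparison of plot rankings between full and reduced datasets described in this abstract could be quantified with a rank correlation. The sketch below uses Spearman's coefficient as one plausible choice; the abstract does not state which statistic the authors used, and the function names and FD values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up FD values for four plots: full data vs. data with traits removed.
fd_full = [0.8, 0.5, 0.3, 0.9]
fd_reduced = [0.7, 0.55, 0.2, 0.6]
rho = spearman(fd_full, fd_reduced)
```

A value of rho near 1 would indicate that the plot ranking survives the removal of trait data; lower values indicate that the index has lost accuracy.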
14.
15.
Sean C. Taylor Thomas Berkelman Geetha Yadav Matt Hammond 《Molecular biotechnology》2013,55(3):217-226
Chemiluminescent western blotting has been in common practice for over three decades, but its use as a quantitative method for measuring the relative expression of the target proteins is still debatable. This is mainly due to the various steps, techniques, reagents, and detection methods that are used to obtain the associated data. In order to have confidence in densitometric data from western blots, researchers should be able to demonstrate statistically significant fold differences in protein expression. This entails a necessary evolution of the procedures, controls, and the analysis methods. We describe a methodology to obtain reliable quantitative data from chemiluminescent western blots using standardization procedures coupled with the updated reagents and detection methods. 相似文献
16.
17.
18.
Next-generation sequencing (NGS) technology has revolutionized and significantly impacted metagenomic research. However, the NGS data usually contains sequencing artifacts such as low-quality reads and contaminating reads, which will significantly compromise downstream analysis. Many quality control (QC) tools have been proposed, however, few of them have been verified to be suitable or efficient for metagenomic data, which are composed of multiple genomes and are more complex than other kinds of NGS data. Here we present a metagenomic data QC method named Meta-QC-Chain. Meta-QC-Chain combines multiple QC functions: technical tests describe input data status and identify potential errors, quality trimming filters poor sequencing-quality bases and reads, and contamination screening identifies higher eukaryotic species, which are considered as contamination for metagenomic data. Most computing processes are optimized based on parallel programming. Testing on an 8-GB real dataset showed that Meta-QC-Chain trimmed low sequencing-quality reads and contaminating reads, and the whole quality control procedure was completed within 20 min. Therefore, Meta-QC-Chain provides a comprehensive, useful and high-performance QC tool for metagenomic data. Meta-QC-Chain is publicly available for free at: http://computationalbioenergy.org/meta-qc-chain.html.
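The quality-trimming step named in this abstract can be pictured with a short Python sketch. It is not Meta-QC-Chain's code: the function names, the Phred+33 quality encoding, and the thresholds (minimum quality 20, minimum length 50) are assumptions chosen for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    The offset of 33 assumes Sanger / Illumina 1.8+ encoding.
    """
    return [ord(char) - offset for char in quality_string]


def trim_low_quality_ends(sequence, quality_string, min_quality=20, offset=33):
    """Remove bases below min_quality from both ends of a read."""
    scores = phred_scores(quality_string, offset)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_string[start:end]


def keep_read(sequence, min_length=50):
    """Decide whether a trimmed read is long enough to keep."""
    return len(sequence) >= min_length


# '!' encodes Phred 0 and 'I' encodes Phred 40 under the +33 offset.
trimmed_seq, trimmed_qual = trim_low_quality_ends("ACGTACGT", "!!IIII!!")
```

Trimming only from the ends leaves low-quality bases in the middle of a read untouched; sliding-window trimming is a common alternative when that matters.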
19.
Russian Journal of Marine Biology - A method for estimating the integral microalgae biomass beneath a unit surface was designed based on a mathematical model for phytoplankton vital function in the...
20.
ClaMS - "Classifier for Metagenomic Sequences" - is a Java application for binning assembled contigs in metagenomes using user-specified training sets and initial parameters. Since ClaMS trains on sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; ClaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. ClaMS is meant to be a desktop application for biologists and can be run on any machine under any Operating System on which the Java Runtime Environment can be installed.
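Composition-based binning of the kind described here can be sketched as follows. The abstract does not specify which genomic signatures or distance measures ClaMS uses, so this Python example assumes normalized k-mer frequencies and Euclidean distance; the function names and sequences are hypothetical, and Python is used only for illustration (ClaMS itself is a Java application).

```python
import math
from collections import Counter
from itertools import product


def kmer_signature(sequence, k=2):
    """Normalized frequencies of all 4**k DNA k-mers in a sequence.

    K-mers containing characters other than A, C, G, T are ignored.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_bin(contig, training_sets, k=2):
    """Return the name of the training set with the closest signature."""
    signature = kmer_signature(contig, k)
    return min(
        training_sets,
        key=lambda name: euclidean(signature, kmer_signature(training_sets[name], k)),
    )


# Toy training sequences, far shorter than real genomes.
training = {
    "AT_rich": "ATATATATATTTAATA",
    "GC_rich": "GCGCGGCCGCGCCGGC",
}
label = assign_bin("ATTATAATAT", training)
```

Because no alignment is performed, the cost per contig grows only with contig length and the number of training sets, which is consistent with the speed advantage the abstract attributes to composition-based signatures.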