Similar Articles
20 similar articles found.
1.
2.

Background

The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.

Results

We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high- to low-quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Notably, k-mer distributions also predicted the frequency of sequences mapping to a reference gene catalogue, not only for the well-defined serial dilution datasets but also for the 52 human gut microbiota-derived metagenomic datasets.

Conclusions

We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.
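A first-pass k-mer screen of the kind proposed above can be prototyped in a few lines. The sketch below is illustrative only, not the authors' pipeline: it counts overlapping k-mers across reads and reports the fraction of k-mer observations coming from repeated k-mers, a signal that can flag low-complexity contaminants such as "empty" ligation products. The function names and the repeat-fraction heuristic are assumptions.

```python
from collections import Counter

def kmer_distribution(reads, k=9):
    """Count all overlapping k-mers across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def repeat_fraction(counts):
    """Share of k-mer observations drawn from k-mers seen more than once.

    A strong excess of highly repeated k-mers can flag artefacts such as
    adapter dimers or reads from "empty" ligation products (an illustrative
    heuristic, not the exact metric used in the paper)."""
    total = sum(counts.values())
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total if total else 0.0

reads = ["ACGTTGCAAGGCT", "TTTTTTTTTTTTT", "ACGTTGCAAGGCA"]
dist = kmer_distribution(reads, k=4)
print(f"distinct k-mers: {len(dist)}, repeat fraction: {repeat_fraction(dist):.2f}")
```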

3.
Epigenome mapping consortia are generating resources of tremendous value for studying epigenetic regulation. To maximize their utility and impact, new tools are needed that facilitate interactive analysis of epigenome datasets. Here we describe EpiExplorer, a web tool for exploring genome and epigenome data on a genomic scale. We demonstrate EpiExplorer's utility by describing a hypothesis-generating analysis of DNA hydroxymethylation in relation to public reference maps of the human epigenome. All EpiExplorer analyses are performed dynamically within seconds, using an efficient and versatile text indexing scheme that we introduce to bioinformatics. EpiExplorer is available at http://epiexplorer.mpi-inf.mpg.de.
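The abstract does not detail the indexing scheme, so the snippet below is only a generic illustration of the kind of fast genomic-region lookup such tools rely on — a sorted-interval search with bisect, not EpiExplorer's actual index.

```python
import bisect

class RegionIndex:
    """Minimal overlap lookup over sorted, non-overlapping intervals of a
    single chromosome (a generic sketch, not EpiExplorer's index)."""

    def __init__(self, intervals):
        self.intervals = sorted(intervals)            # list of (start, end)
        self.starts = [s for s, _ in self.intervals]

    def overlapping(self, start, end):
        # The first candidate is the interval starting at or before `start`.
        lo = max(bisect.bisect_right(self.starts, start) - 1, 0)
        hi = bisect.bisect_right(self.starts, end)
        return [(s, e) for s, e in self.intervals[lo:hi] if e > start and s < end]

idx = RegionIndex([(100, 200), (300, 400), (500, 650)])
print(idx.overlapping(150, 350))   # [(100, 200), (300, 400)]
```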

4.
The rapid developments in computer techniques and the availability of large datasets open new perspectives for vegetation analysis, aiming at a better understanding of the ecology and functioning of ecosystems and their underlying mechanisms. Information systems prove to be helpful tools in this new field. Such information systems may integrate different biological levels, viz. species, community and landscape. They incorporate a GIS platform for the visualization of the various layers of information, enabling the analysis of patterns and processes which relate the individual levels. An example of a newly developed information system is SynBioSys Europe, an initiative of the European Vegetation Survey (EVS). For the individual levels of the system, specific sources are available, notably national and regional Turboveg databases for the community level and data from the recently published European Map of Natural Vegetation for the landscape level. The structure of the system and its underlying databases allow user-defined queries. With regard to its application, such information systems may play a vital role in European nature planning, such as the implementation of the EU program Natura 2000. To illustrate the scope and perspectives of the program, some examples from The Netherlands are presented. They deal with long-term changes in grassland ecosystems, including shifts in distribution, floristic composition and ecological indicator values.

5.
A two-step method for the classification of very large phytosociological data sets is demonstrated. Stratification of the set is suggested either by area, in the case of a large and geographically heterogeneous region, or by vegetation type, in the case of a set covering all the plant communities of an area. First, cluster analysis is performed on each subset. The resulting basic clusters are summarized by calculating a 'synoptic cover-abundance value' for each species in each cluster. All basic clusters are then subjected to the same procedure. Second-order clusters are interpreted as community types. The synoptic value proposed reflects both frequency and average cover-abundance. It is emphasized that a species should have a high frequency to be used as a diagnostic species. The method is demonstrated with a set of 1138 relevés and 250 species of coastal sand dune vegetation in Yucatan, treated with the programs TWINSPAN and TABORD. Some problems and perspectives of the approach are discussed in the light of hierarchy theory and classification theory.
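The abstract says only that the synoptic value reflects both frequency and average cover-abundance; one plausible formulation — an assumption for illustration, not necessarily the paper's exact formula — is their product:

```python
def synoptic_value(cover_scores, n_releves):
    """Relative frequency times mean cover-abundance, for one species in
    one cluster. `cover_scores` holds the species' cover-abundance values
    in the relevés where it occurs; `n_releves` is the cluster size.
    An illustrative formulation only."""
    if not cover_scores:
        return 0.0
    frequency = len(cover_scores) / n_releves
    mean_cover = sum(cover_scores) / len(cover_scores)
    return frequency * mean_cover

# A species present in 8 of 10 relevés with these cover-abundance scores:
print(synoptic_value([3, 2, 4, 3, 3, 2, 4, 3], n_releves=10))  # 2.4
```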

6.

Background

Fuelled by the advent and subsequent development of next-generation sequencing technologies, metagenomics has become a powerful tool for the analysis of microbial communities, both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck.

Results

To overcome this bottleneck, we developed an automated computational workflow called RIEMS – Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS taxonomically assigns every individual read in a dataset by cascading different sequence analyses of decreasing assignment stringency, using various software applications. After completion of the analyses, the results are summarised in a clearly structured, taxonomically organised result protocol. The high accuracy and performance of RIEMS were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets.

Conclusions

RIEMS has the potential to fill the gap that still exists in data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets were demonstrated with an early version of RIEMS in 2011, when it was used to detect the orthobunyavirus sequences that led to the discovery of Schmallenberg virus.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0503-6) contains supplementary material, which is available to authorized users.
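RIEMS itself orchestrates external alignment tools, so the following is only a structural sketch of the cascading idea — each read is handed to increasingly permissive classifiers until one assigns it. The stage functions here are placeholders, not RIEMS components.

```python
def cascade_assign(reads, stages):
    """Assign each read with the first (most stringent) stage that succeeds.

    `stages` is an ordered list of (name, classify) pairs where classify(read)
    returns a taxon string or None. A sketch of the cascade only; RIEMS wraps
    real search tools of decreasing stringency."""
    assignments, unassigned = {}, list(reads)
    for name, classify in stages:
        remaining = []
        for read in unassigned:
            taxon = classify(read)
            if taxon is None:
                remaining.append(read)
            else:
                assignments[read] = (name, taxon)
        unassigned = remaining
    return assignments, unassigned

# Placeholder stages: exact prefix match first, then a permissive fallback.
stages = [
    ("strict", lambda r: "Escherichia coli" if r.startswith("ATGGC") else None),
    ("relaxed", lambda r: "Bacteria (unclassified)" if "ATG" in r else None),
]
print(cascade_assign(["ATGGCTT", "CCATGAA", "GGGGGGG"], stages))
```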

7.
8.
9.
10.
Metagenomics approaches represent an important way to acquire information on the microbial communities present in complex environments like soil. However, to what extent do these approaches provide us with a true picture of soil microbial diversity? Soil is a challenging environment to work with. Its physicochemical properties affect microbial distributions inside the soil matrix, metagenome extraction and its subsequent analyses. To better understand the bias inherent to soil metagenome 'processing', we focus on soil physicochemical properties and their effects on the perceived bacterial distribution. In the light of this information, each step of soil metagenome processing is then discussed, with an emphasis on strategies for optimal soil sampling. Then, the interaction of cells and DNA with the soil matrix and the consequences for microbial DNA extraction are examined. Soil DNA extraction methods are compared and the veracity of the microbial profiles obtained is discussed. Finally, soil metagenomic sequence analysis and exploitation methods are reviewed.

11.
Separation of very large DNA molecules by gel electrophoresis.
Very large DNA molecules were separated by electrophoresis in horizontal slab gels of dilute agarose. Conditions of electrophoresis were developed using intact DNA molecules from the bacterial viruses lambda, T4 and G, whose DNAs have molecular weights (M) of 32 million, 120 million and 500 million, respectively. Several electrophoresis conditions were found that give sufficiently high mobilities, and sufficiently large mobility differences, for these DNAs to be separated in a short time. Electrophoresis in 0.1% agarose at 2.5 V/cm of gel length separates T4 and lambda DNAs by 2.0 cm, and G and T4 DNAs by 1.0 cm, in only 10 hr. Under some conditions DNA mobilities are directly proportional to log M for M values from 10 to 500 million. The procedures described will allow rapid molecular weight determination and separation of very large DNA molecules.
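Where mobility is proportional to log M, two standards suffice to calibrate a gel and read off unknown molecular weights. In the sketch below the migration distances are invented for illustration; only the lambda and T4 molecular weights come from the abstract.

```python
import math

def calibrate(m1, d1, m2, d2):
    """Fit migration distance = a + b * log10(M) from two DNA standards."""
    b = (d2 - d1) / (math.log10(m2) - math.log10(m1))
    return d1 - b * math.log10(m1), b

def estimate_mw(distance, a, b):
    """Invert the calibration to estimate a molecular weight."""
    return 10 ** ((distance - a) / b)

# Hypothetical distances: lambda (M = 32e6) at 6.0 cm, T4 (M = 120e6) at 4.0 cm.
a, b = calibrate(32e6, 6.0, 120e6, 4.0)
print(f"unknown band at 5.0 cm: M ~ {estimate_mw(5.0, a, b) / 1e6:.0f} million")
```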

12.
Heteroduplex analysis (HA) has proven to be a robust tool for mutation detection. HA by capillary array electrophoresis (HA-CAE) was developed to increase throughput and allow the scanning of large multiexon genes in multicapillary DNA sequencers. HA-CAE is a straightforward and high-throughput technique to detect both known and novel DNA variants with a high level of sensitivity and specificity. It consists of only three steps: multiplex PCR using fluorescently labeled primers, heteroduplex formation and electrophoresis in a multicapillary DNA sequencer. It allows, for example, the complete coding and flanking intronic sequences of the BRCA1 and BRCA2 genes from two patients (approximately 25 kb each) to be scanned in a single run of a 16-capillary sequencer, and has enabled us to detect 150 different mutations to date (both single nucleotide substitutions, or SNSs, and small insertions/deletions). Here, we describe the protocol developed in our laboratory to scan the BRCA1, BRCA2, MLH1, MSH2 and MSH6 genes using an ABI3130XL sequencer. This protocol could be adapted to other instruments or to the study of other large multiexon genes, and can be completed in 7–8 h.

13.
The laser-induced pH jump (Gutman, M. and Huppert, D.J. (1979) Biochem. Biophys. Methods 1, 9–19) has a time resolution capable of measuring the diffusion-controlled rate constant of proton binding. In the present study we employed this technique for measuring the kinetics of protonation–deprotonation of surface groups of macromolecules.

The heterogeneous surface of proteins excludes them from serving as a simple model; we therefore used micelles of a neutral detergent (Brij 58) as a high molecular weight structure. The charge was varied by the addition of a low concentration of sodium dodecyl sulfate, and the surface group with which the protons react was an adsorbed pH indicator (bromocresol green or neutral red).

The dissociation of a proton from adsorbed bromocresol green is slower than that from the free indicator. This effect is attributed to the enhanced stabilization of the acid form of the indicator in the palisade region of the micelle. The pK shift of bromocresol green adsorbed on neutral micelles is thus quantitatively accounted for by the decreased rate of proton dissociation. Indicators such as neutral red, which are more lipid-soluble in their alkaline form, exhibit neither such decelerated proton dissociation in the adsorbed state nor a pK shift on adsorption to neutral micelles.

The protonation of an indicator is a diffusion-controlled reaction, whether it is free in solution or adsorbed on micelles. By varying the electric charge of the micelle this rate can be accelerated or decelerated, depending on the total charge of the micelle. The micellar charge calculated by this method was corroborated by other measurements which rely only on equilibrium parameters.

The high time resolution of the pH jump is exemplified by the ability to estimate the diffusion coefficient of protons through the hydrated shell of the micelle.
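For orientation, the diffusion-controlled limit the technique resolves can be estimated with the textbook Smoluchowski expression. This is a generic back-of-the-envelope calculation, not the paper's analysis; the encounter radius and the neutral-reactant assumption are illustrative.

```python
import math

N_A = 6.022e23   # Avogadro's number, mol^-1

def diffusion_limited_k(d_sum_cm2_s, radius_cm, electrostatic_factor=1.0):
    """Smoluchowski diffusion-limited rate constant, in M^-1 s^-1.

    d_sum_cm2_s: summed diffusion coefficients of the two reactants (cm^2/s)
    radius_cm:   encounter radius (cm)
    electrostatic_factor: Debye correction (>1 attractive, <1 repulsive)"""
    k_pair = 4 * math.pi * radius_cm * d_sum_cm2_s      # cm^3/s per pair
    return k_pair * N_A / 1000 * electrostatic_factor   # L mol^-1 s^-1

# D(H+) in water is ~9.3e-5 cm^2/s; assume a ~5 Angstrom encounter radius.
print(f"k_on ~ {diffusion_limited_k(9.3e-5, 5e-8):.1e} M^-1 s^-1")  # ~3.5e10
```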

14.
Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
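The paper's contribution is its own memory-frugal framework; for comparison, the standard algorithm on microarray-scale data looks like the scikit-learn baseline below (synthetic data, illustrative parameters — not the authors' implementation).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))        # 60 samples, 5000 mostly noisy features
y = rng.integers(0, 2, size=60)        # binary class labels

clf = RandomForestClassifier(
    n_estimators=500,      # many trees stabilise the importance estimates
    max_features="sqrt",   # restricted attribute subset at each split
    oob_score=True,        # bootstrap samples yield a free error estimate
    random_state=0,
)
clf.fit(X, y)
print(f"out-of-bag accuracy: {clf.oob_score_:.2f}")
print("top features:", np.argsort(clf.feature_importances_)[-5:])
```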

15.
Dense sets of hundreds of thousands of markers have been developed for genome-wide association studies. These marker sets are also beneficial for linkage analysis of large, deep pedigrees containing distantly related cases. It is impossible to analyse jointly all genotypes in large pedigrees using the Lander–Green algorithm; however, as marker density increases it becomes less crucial to analyse all individuals' genotypes simultaneously. In this report, an approximate multipoint non-parametric technique is described in which large pedigrees are split into many small pedigrees, each containing just two cases. This technique is demonstrated using phased data from the International HapMap Project to simulate sets of 10,000, 50,000 and 250,000 markers, showing that it becomes increasingly accurate as more markers are genotyped. This method allows routine linkage analysis of large families with dense marker sets and represents a more easily applied alternative to Markov chain Monte Carlo methods.
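The splitting step itself is simple enumeration: every pair of affected individuals in a pedigree becomes one two-case sub-pedigree. A minimal sketch of that step follows; the non-parametric linkage statistic computed on each pair is not shown.

```python
from itertools import combinations

def split_into_case_pairs(cases_by_pedigree):
    """Enumerate the two-case sub-pedigrees used by the approximation.

    `cases_by_pedigree` maps a pedigree ID to its list of affected
    individuals; each pair is then analysed as its own small pedigree."""
    return {ped_id: list(combinations(cases, 2))
            for ped_id, cases in cases_by_pedigree.items()}

print(split_into_case_pairs({"FAM1": ["II-3", "III-1", "IV-2"]}))
# {'FAM1': [('II-3', 'III-1'), ('II-3', 'IV-2'), ('III-1', 'IV-2')]}
```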

16.
The increased availability of both open ecological data and software to interact with it allows the fast collection and integration of information at all spatial and taxonomic scales. This offers the opportunity to address macroecological questions in a cost-effective way. In this contribution, we illustrate this approach by forecasting the structure of a stream food web at the global scale. In so doing, we highlight the most salient issues that need to be addressed before this approach can be used with a high degree of confidence.

17.
Molecular weight separation of very large DNA
Gel electrophoresis has many applications in parasitology, especially for the separation of enzymes, immunoglobulins and DNA, but the ability to separate molecules based on size is usually restricted to within the upper and lower ranges of molecular weight. These limitations are particularly evident in macromolecular DNA electrophoresis, although recent innovations in agarose gel electrophoresis have substantially reduced these boundaries and are permitting the separation of very large DNA molecules and intact chromosomes of many organisms. In this article, Hugh Dawkins explains these techniques and their principal variants.

18.
The extractabilities of plasmids of different sizes by the sodium lauryl sulfate (SDS)-alkali procedure were compared using either sodium acetate or potassium acetate buffer as the neutralizing agent. There was a selective loss of large plasmids (above 100 kb) when the potassium salt was used. When N-lauryl sarcosine instead of SDS was used as the detergent, no loss of large plasmids occurred in the presence of potassium salt. A comparison of the kinetics of precipitate formation with sodium acetate and potassium acetate indicated that the rate and the amount of lauryl sulfate precipitated were lower with the sodium salt. It is suggested that faster precipitation of lauryl sulfate with potassium acetate leads to trapping of large denatured plasmids that cannot renature as fast as the small ones.

19.
Geoprocessing of large gridded data according to overlap with irregular landscape features is common to many large-scale ecological analyses. The geoknife R package was created to facilitate reproducible analyses of gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers. Outputs from geoknife include spatial and temporal data subsets, spatially-averaged time series values filtered by user-specified areas of interest, and categorical coverage fractions for various land-use types.

20.
H Tse, AK Tsang, HW Tsoi, AS Leung, CC Ho, SK Lau, PC Woo, KY Yuen. PLoS One 2012, 7(8):e43986
The discovery of novel viruses in animals expands our knowledge of viral diversity and potentially emerging zoonoses. High-throughput sequencing (HTS) technology gives millions or even billions of sequence reads per run, allowing a comprehensive survey of the genetic content within a sample without prior nucleic acid amplification. In this study, we screened 156 rectal swab samples from apparently healthy bats (n = 96), pigs (n = 9), cattle (n = 9), stray dogs (n = 11), stray cats (n = 11) and monkeys (n = 20) using an HTS metagenomics approach. The complete genome of a novel papillomavirus (PV), Miniopterus schreibersii papillomavirus type 1 (MscPV1), with L1 of 60% nucleotide identity to Canine papillomavirus (CPV6), was identified in a specimen from a Common Bent-wing Bat (M. schreibersii). It is about 7.5 kb in length, with a G+C content of 45.8% and a genomic organization similar to that of other PVs. Despite the higher nucleotide identity between the genomes of MscPV1 and CPV6, maximum-likelihood phylogenetic analysis of the L1 gene sequence showed that MscPV1 and Erethizon dorsatum papillomavirus (EdPV1) are most closely related. The estimated divergence time of MscPV1 from the EdPV1/MscPV1 common ancestor was approximately 60.2–91.9 million years ago, inferred under strict clocks using the L1 and E1 genes. These estimates were limited by the lack of reliable calibration points from co-divergence because of possible host shifts. As the nucleotide sequence of this virus shows only limited similarity to that of related animal PVs, the conventional approach of PCR using consensus primers would have been unlikely to detect the novel virus in the sample. Unlike the first bat papillomavirus, RaPV1, MscPV1 was found in an asymptomatic bat with no apparent mucosal or skin lesions, whereas RaPV1 was detected in the basosquamous carcinoma of a fruit bat, Rousettus aegyptiacus. We propose MscPV1 as the first member of the novel Dyolambda-papillomavirus genus.
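The G+C figure quoted for the MscPV1 genome is a summary statistic that is trivial to compute from an assembled sequence; a minimal helper (the example sequence is made up):

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(f"{gc_content('ACGTACGTGCAT') * 100:.1f}%")  # 50.0%
```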
