Found 20 similar documents (search time: 31 ms)
1.
Background
A typical human genome differs from the reference genome at 4–5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, which comprises more than 15,000 whole genomes and more than 126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing workflows are still based on a single human reference genome: genetic variants are typically identified and genotyped from short-read data aligned to that single reference, disregarding the underlying variation.
Results
We propose a new unified framework for variant calling with short-read data that uses a representation of human genetic variation: a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC.
Conclusions
Our experiments show that replacing a standard human reference with a pan-genomic one improves single-nucleotide variant calling accuracy and short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.
2.
Elizabeth A Tindall Desiree C Petersen Stina Nikolaysen Webb Miller Stephan C Schuster Vanessa M Hayes 《BMC research notes》2010,3(1):39
Background
High-throughput custom-designed genotyping arrays are a valuable resource for biologically focused research studies and, increasingly, for validating variation predicted by next-generation sequencing (NGS) technologies. We investigate the Illumina GoldenGate chemistry using custom-designed VeraCode and Sentrix Array Matrix (SAM) assays for each of these applications, respectively. We highlight applications for interpreting Illumina-generated genotype cluster plots to maximise data inclusion and reduce genotyping errors.
Findings
We illustrate the dramatic effect of outliers on genotype calling and data interpretation, and suggest simple means to avoid genotyping errors. Furthermore, we present this platform as a successful method for two-cluster rare or non-autosomal variant calling. The success of high-throughput technologies in accurately calling rare variants will become essential for future association studies. Finally, we highlight an additional advantage of the Illumina GoldenGate chemistry: it generates unusually segregated cluster plots that identify potential NGS sequencing errors resulting from minimal coverage.
Conclusions
We demonstrate the importance of visually inspecting genotype cluster plots generated by the Illumina software and issue warnings regarding commonly accepted quality control parameters. In addition to suggesting applications that minimise data exclusion, we propose that the Illumina cluster plots may help identify potential input sequence errors, which is particularly important for studies validating NGS-generated variation.
3.
4.
D. Jacob C. Deborde M. Lefebvre M. Maucourt A. Moing 《Metabolomics : Official journal of the Metabolomic Society》2017,13(4):36
Introduction
In NMR-based metabolomics, processing 1D spectra often requires an expert eye to disentangle intertwined peaks.
Objectives
The objective of NMRProcFlow is to assist the expert in this task as effectively as possible, without requiring programming skills.
Methods
NMRProcFlow was developed as a graphical, interactive tool for processing 1D NMR (¹H and ¹³C) spectra.
Results
NMRProcFlow (http://nmrprocflow.org), dedicated to metabolic fingerprinting and targeted metabolomics, covers all spectra processing steps, including baseline correction, chemical shift calibration and alignment.
Conclusion
Biologists and NMR spectroscopists can easily interact and develop synergies by visualizing the NMR spectra along with their corresponding experimental-factor levels, thus building a bridge between experimental design and subsequent statistical analyses.
5.
Korey J. Brownstein Mahmoud Gargouri William R. Folk David R. Gang 《Metabolomics : Official journal of the Metabolomic Society》2017,13(11):133
Introduction
Botanicals containing iridoid and phenylethanoid/phenylpropanoid glycosides are used worldwide to treat inflammatory musculoskeletal conditions, such as arthritis and lower back pain, that are primary causes of human years lived with disability.
Objectives
We report the analysis of candidate anti-inflammatory metabolites of several endemic Scrophularia species and Verbascum thapsus used medicinally by peoples of North America.
Methods
Leaves, stems, and roots were analyzed by ultra-performance liquid chromatography-mass spectrometry (UPLC-MS), and partial least squares-discriminant analysis (PLS-DA) was performed in MetaboAnalyst 3.0 after the datasets were processed in Progenesis QI.
Results
Comparison of the datasets revealed significant and differential accumulation of iridoid and phenylethanoid/phenylpropanoid glycosides in the tissues of the endemic Scrophularia species and Verbascum thapsus.
Conclusions
Our investigation identified several species of pharmacological interest as good sources of harpagoside and other important anti-inflammatory metabolites.
6.
Objective
To examine the activities of residual enzymes in dried shiitake mushrooms, a traditional foodstuff in Japanese cuisine, for possible applications in food processing.
Results
Polysaccharide-degrading enzymes remained intact in dried shiitake mushrooms, and the activities of amylase, β-glucosidase and pectinase were high. Potato digestion was tested using dried shiitake powder. The enzymes reacted with potato tuber specimens to solubilize sugars even under heterogeneous solid-state conditions, and their reaction modes differed between 38 and 50 °C.
Conclusion
Dried shiitake mushrooms have potential use in food processing as an enzyme preparation.
7.
N. Cesbron A.-L. Royer Y. Guitton A. Sydor B. Le Bizec G. Dervilly-Pinel 《Metabolomics : Official journal of the Metabolomic Society》2017,13(8):99
Introduction
Collecting feces is easy, and feces offer direct access to endogenous and microbial metabolites.
Objectives
Given the lack of consensus on fecal sample preparation, especially in animal species, we developed a robust protocol for untargeted LC-HRMS fingerprinting.
Methods
The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.
Results
A rapid and simple protocol was developed, involving extraction of feces with methanol (1/3, M/V) followed by centrifugation and a filtration step (10 kDa).
Conclusion
The workflow generated repeatable and informative fingerprints for robust metabolome characterization.
8.
Sonia Liggi Christine Hinz Zoe Hall Maria Laura Santoru Simone Poddighe John Fjeldsted Luigi Atzori Julian L. Griffin 《Metabolomics : Official journal of the Metabolomic Society》2018,14(4):52
Introduction
Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need for multiple software packages for each step of the processing workflow.
Objectives
To merge the steps required for metabolomics data processing into a single platform.
Methods
KniMet is a workflow for the processing of mass spectrometry-metabolomics data based on the KNIME Analytics platform.
Results
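Three of the generic steps KniMet chains together (feature filtering, missing-value imputation and normalization) can be illustrated with a minimal standard-library Python sketch. It is not KniMet's implementation: the table layout, the 50% missingness cutoff, half-minimum imputation and total-sum normalization are assumed, commonly used choices, and batch correction and annotation are omitted.

```python
from typing import Dict, List, Optional

# Hypothetical layout: feature name -> intensity in each sample (None = missing).
Table = Dict[str, List[Optional[float]]]


def filter_features(table: Table, max_missing: float = 0.5) -> Table:
    """Drop features whose fraction of missing values exceeds `max_missing`."""
    return {f: v for f, v in table.items()
            if sum(x is None for x in v) / len(v) <= max_missing}


def impute_half_min(table: Table) -> Table:
    """Replace missing values by half of the feature's smallest observed value."""
    out: Table = {}
    for f, v in table.items():
        observed = [x for x in v if x is not None]
        fill = min(observed) / 2 if observed else 0.0
        out[f] = [fill if x is None else x for x in v]
    return out


def normalize_total(table: Table) -> Table:
    """Scale each sample (column) so that its feature intensities sum to 1."""
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    totals = [sum(v[j] for v in table.values()) for j in range(n_samples)]
    return {f: [v[j] / totals[j] for j in range(n_samples)] for f, v in table.items()}


def process(table: Table, max_missing: float = 0.5) -> Table:
    """Filter, then impute, then normalize (hypothetical step order)."""
    return normalize_total(impute_half_min(filter_features(table, max_missing)))


raw = {
    "feature_1": [100.0, 200.0, None],
    "feature_2": [None, None, 50.0],   # 2 of 3 values missing -> filtered out
    "feature_3": [300.0, 200.0, 150.0],
}
print(process(raw))  # {'feature_1': [0.25, 0.5, 0.25], 'feature_3': [0.75, 0.5, 0.75]}
```

Imputation runs before normalization here so that every sample total is computed over complete columns; other orderings are possible and would change the numbers.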
The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.
Conclusion
KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.
9.
Ying Wang Brian D. Carter Susan M. Gapstur Marjorie L. McCullough Mia M. Gaudet Victoria L. Stevens 《Metabolomics : Official journal of the Metabolomic Society》2018,14(10):129
Introduction
Processing delays after blood collection are a common pre-analytical condition in large epidemiologic studies. Because such delays are a potential source of variation that could attenuate associations between metabolites and disease outcomes, it is critical to evaluate whether blood samples with processing delays are suitable for metabolomics analysis.
Objectives
We aimed to evaluate the reproducibility of metabolites over processing delays of up to 48 h, and to test the reproducibility of the metabolomics platform.
Methods
Blood samples were collected from 18 healthy volunteers. Blood was stored in the refrigerator and processed for plasma at 0, 15, 30, and 48 h after collection. Plasma samples were metabolically profiled using an untargeted ultrahigh-performance liquid chromatography–tandem mass spectrometry (UPLC–MS/MS) platform. The reproducibility of 1012 metabolites over processing delays, and the reproducibility of the platform, were determined by intraclass correlation coefficients (ICCs), with variance components estimated from mixed-effects models.
Results
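The ICCs in this study were estimated from mixed-effects variance components. As a simpler stand-in, assuming a balanced design (every volunteer measured once at each processing time), the same ratio of between-subject variance to total variance can be estimated from one-way ANOVA mean squares. The sketch below is that swapped-in estimator, not the authors' model; the example values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def icc_oneway(data: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC for a balanced design.

    `data` has one row per subject (e.g. volunteer) and one column per repeated
    measurement (e.g. processing delay). ICC = between-subject variance /
    (between-subject variance + within-subject variance).
    """
    n, k = len(data), len(data[0]) if data else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 measurements")
    grand = mean(x for row in data for x in row)
    subject_means = [mean(row) for row in data]
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(data, subject_means)
                    for x in row) / (n * (k - 1))
    # Method-of-moments variance components; a negative between-subject
    # estimate is truncated at zero (a common convention).
    var_between = max((ms_between - ms_within) / k, 0.0)
    total = var_between + ms_within
    if total == 0:
        raise ValueError("ICC undefined: the data contain no variation")
    return var_between / total


# Hypothetical metabolite levels: 3 volunteers x 4 processing delays.
stable = [[10.0, 10.2, 9.8, 10.1], [20.0, 19.9, 20.1, 20.0], [30.0, 30.3, 29.8, 30.0]]
# Hypothetical case where delay-to-delay noise swamps between-volunteer differences.
unstable = [[1.0, 2.0], [2.0, 1.0]]
print(icc_oneway(stable) >= 0.75)  # True
print(icc_oneway(unstable))        # 0.0
```

The 0.75 cutoff in the usage lines mirrors the threshold this study used for "highly reproducible" metabolites.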
The majority of metabolites (approximately 70% of 1012) were highly reproducible (ICCs ≥ 0.75) over 15-, 30- or 48-h processing delays. Nucleotides, energy-related metabolites, peptides, and carbohydrates were most affected by processing delays. The platform was highly reproducible, with a median technical ICC of 0.84 (interquartile range 0.68–0.93).
Conclusion
Most metabolites measured by the UPLC–MS/MS platform show acceptable reproducibility with processing delays of up to 48 h. Metabolites of certain pathways need to be interpreted cautiously in relation to outcomes in epidemiologic studies with prolonged processing delays.
10.
Background
Innumerable opportunities for new genomic research have been stimulated by advances in high-throughput next-generation sequencing (NGS). However, the downside of NGS data abundance is the difficulty of distinguishing true biological variants from sequencing errors during downstream analysis. Many error correction methods have been developed to correct erroneous NGS reads before further analysis, but an independent evaluation of how dataset features such as read length, genome size, and coverage depth affect their performance is lacking. This comparative study aims to investigate the strengths, weaknesses, and limitations of some of the newest k-spectrum-based methods and to provide recommendations to users for selecting methods suited to specific NGS datasets.
Methods
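The methods compared here share the k-spectrum idea: a k-mer seen only rarely across all reads probably contains a sequencing error, whereas frequent ("solid") k-mers are trusted. The toy sketch below shows that idea with a count threshold and a brute-force single-substitution fix. It does not reproduce Reptile, Musket, Bless, Bloocoo, Lighter or Trowel; the function names, the threshold and the data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def kmer_spectrum(reads: Iterable[str], k: int) -> Counter:
    """Count how often every k-mer occurs across all reads (the k-spectrum)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, threshold: int) -> List[int]:
    """Start positions of k-mers in `read` seen fewer than `threshold` times.

    Such "weak" k-mers are assumed to contain a sequencing error; k-mers at or
    above the threshold are treated as "solid" (trusted).
    """
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < threshold]


def correct_one_substitution(read: str, counts: Counter, k: int, threshold: int) -> str:
    """Try every single-base substitution and return the first candidate whose
    k-mers are all solid; return the read unchanged if none is found."""
    if not weak_kmer_starts(read, counts, k, threshold):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if not weak_kmer_starts(candidate, counts, k, threshold):
                return candidate
    return read


# Toy data: five error-free copies of a read plus one copy with an error at position 4.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
spectrum = kmer_spectrum(reads, k=4)
print(weak_kmer_starts("ACGTTCGT", spectrum, k=4, threshold=2))          # [1, 2, 3, 4]
print(correct_one_substitution("ACGTTCGT", spectrum, k=4, threshold=2))  # ACGTACGT
```

The choice of k and of the solidity threshold interacts with coverage depth and genome size, which is one reason these tools can behave differently across datasets.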
Six k-spectrum-based methods, i.e., Reptile, Musket, Bless, Bloocoo, Lighter, and Trowel, were compared using six simulated sets of paired-end Illumina sequencing data. These NGS datasets varied in coverage depth (10× to 120×), read length (36 to 100 bp), and genome size (4.6 to 143 Mb). The Error Correction Evaluation Toolkit (ECET) was used to derive a suite of metrics (true positives, false positives, false negatives, recall, precision, gain, and F-score) for assessing the correction quality of each method.
Results
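The F-scores reported in this section combine precision and recall, and gain measures the net fraction of errors removed. The sketch below uses the definitions commonly applied in error-correction benchmarks; we assume ECET computes them the same way, but its own documentation is authoritative. The counts are invented for illustration and are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionCounts:
    tp: int  # erroneous bases changed to the true base
    fp: int  # bases changed wrongly (new errors introduced or wrong fix)
    fn: int  # erroneous bases left uncorrected

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    def f_score(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0

    def gain(self) -> float:
        # Net fraction of the original errors removed; negative when a tool
        # introduces more errors than it fixes.
        return (self.tp - self.fp) / (self.tp + self.fn) if self.tp + self.fn else 0.0


counts = CorrectionCounts(tp=80, fp=10, fn=20)
print(round(counts.recall(), 3), round(counts.precision(), 3),
      round(counts.f_score(), 3), round(counts.gain(), 3))  # 0.8 0.889 0.842 0.7
```

Unlike the F-score, gain penalizes false positives against the total number of original errors, so a method can have high precision yet low gain if it leaves many errors untouched.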
Results from computational experiments indicate that Musket had the best overall performance across the range of dataset characteristics represented in the six datasets. Musket's lowest accuracy (F-score = 0.81) occurred for a dataset with a medium read length (56 bp), medium coverage (50×), and a small genome (5.4 Mb). The other five methods underperformed (F-score < 0.80) and/or failed to process one or more datasets.
Conclusions
This study demonstrates that factors such as coverage depth, read length, and genome size may influence the performance of individual k-spectrum-based error correction methods. Thus, care must be taken in choosing appropriate methods for error correction of specific NGS datasets. Based on our comparative study, we recommend Musket as the top choice because of its consistently superior performance across all six test datasets. Further extensive studies are warranted to assess these methods using experimental datasets generated by NGS platforms (e.g., 454, SOLiD, and Ion Torrent) under more diverse parameter settings (k-mer values and edit distances), and to compare them against other, non-k-spectrum-based classes of error correction methods.
11.
Identification of pathogens in culture-negative infective endocarditis cases by metagenomic analysis
Jun Cheng Huan Hu Yue Kang Weizhi Chen Wei Fang Kaijuan Wang Qian Zhang Aisi Fu Shuilian Zhou Chen Cheng Qingqing Cao Feiyan Wang Shela Lee Zhou Zhou 《Annals of clinical microbiology and antimicrobials》2018,17(1):43
Background
Pathogen identification is critical for the proper diagnosis and precise treatment of infective endocarditis (IE). Although blood and valve cultures are the gold standard for IE pathogen detection, many cases are culture-negative, especially in patients who have received long-term antibiotic treatment, which makes precise diagnosis a major clinical challenge. Metagenomic sequencing can provide information on both the pathogenic strain and the antibiotic susceptibility profile of patient samples without culturing, offering a powerful method for dealing with culture-negative cases.
Methods
To assess the feasibility of a metagenomic approach for detecting the causative pathogens in resected valves from IE patients, we employed both next-generation sequencing and Oxford Nanopore Technologies MinION nanopore sequencing for pathogen and antimicrobial resistance detection in seven culture-negative IE patients. Using our in-house bioinformatics pipeline, we analyzed the sequencing results generated on both platforms to directly identify pathogens from the resected valves of these seven patients, who had clinically culture-negative IE according to the modified Duke criteria.
Results
Our results showed that both metagenomic methods could detect the causative pathogen in all IE samples. Moreover, we were able to simultaneously characterize the respective antimicrobial resistance features.
Conclusion
Metagenomic methods for IE detection can provide clinicians with valuable information for diagnosing and treating IE patients after valve replacement surgery. However, further effort is needed to optimize protocols for sample processing, sequencing and bioinformatics analysis.
12.
David Sadigursky Lucas Cortizo Garcia Rodrigo Rêgo Martins Gustavo Castro De Queiroz Rogério Jamil Fernandes Carneiro Paulo Oliveira Colavolpe 《Journal of medical case reports》2017,11(1):351
Background
There are several reports on anatomical differences of the meniscus. However, there are only a few reports on abnormalities in both menisci and anatomical differences in anterior cruciate ligament insertions.
Case presentation
This is a case report of a 36-year-old Hispanic man presenting with knee pain, locking, and effusion, and an anatomical abnormality of the menisci: fusion of the posterior horns of the menisci, together with insertion of the posterior meniscal fibers into the anterior cruciate ligament.
Conclusions
This is the first study describing a meniscal anatomical variant with isolated posterior junction of the posterior horns and an anomalous insertion into the anterior cruciate ligament. Recognizing meniscal variants is important, as they can be mistaken for more significant pathology on magnetic resonance images.
13.
Christina Nieuwoudt Samantha J. Jones Angela Brooks-Wilson Jinko Graham 《Source code for biology and medicine》2018,13(1):2
Background
Studies that ascertain families containing multiple relatives affected by disease can be useful for identifying causal rare variants from next-generation sequencing data.
Results
We present the R package SimRVPedigree, which allows researchers to simulate pedigrees ascertained on the basis of multiple affected relatives. By incorporating the ascertainment process into the simulation, SimRVPedigree helps researchers better understand within-family patterns of relationship among affected individuals and their ages of disease onset.
Conclusions
Through simulation, we show that affected members of a family segregating a rare disease variant tend to be more numerous and more closely related to one another than affected members of families with sporadic disease. We also show that the family ascertainment process can lead to apparent anticipation in the age of onset. Finally, we use simulation to gain insight into the limit on the proportion of ascertained families segregating a causal variant. SimRVPedigree should be useful to investigators seeking insight into family-based study designs through simulation.
14.
15.
Background
Existing clustering approaches for microarray data do not adequately differentiate between subsets of co-expressed genes. We devised a novel approach that integrates expression and sequence data to generate functionally coherent and biologically meaningful subclusters of genes. Specifically, the approach clusters co-expressed genes on the basis of the similar content and distribution of predicted, statistically significant sequence motifs in their upstream regions.
Results
We applied our method to several sets of co-expressed genes and were able to define subsets enriched in particular biological processes and specific upstream regulatory motifs.
Conclusions
These results show the potential of our technique for functional prediction and regulatory motif identification from microarray data.
16.
Takeo Moriya Yoshinori Satomi Hiroyuki Kobayashi 《Metabolomics : Official journal of the Metabolomic Society》2016,12(12):179
Introduction
Human plasma metabolomics offers powerful tools for understanding disease mechanisms and identifying clinical biomarkers for diagnosis, efficacy prediction and patient stratification. Although storage conditions can affect the reliability of metabolite data, strict control of these conditions remains challenging, particularly when clinical samples are collected at multiple centers. Therefore, it is necessary to consider the stability profile of each analyte.
Objectives
The purpose of this study was to extract unstable metabolites from vast metabolome data and to identify the factors that cause instability.
Method
Plasma samples were obtained from five healthy volunteers, stored under ten different conditions of time and temperature, and quantified using leading-edge metabolomics. Instability was evaluated by comparing quantitation values under each storage condition with those obtained after storage at −80 °C.
Result
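The core comparison here, each storage condition against the −80 °C reference, can be sketched as a paired per-volunteer log2 ratio. This is only an illustration under stated assumptions: the study flagged "significantly changed" metabolites, presumably with a statistical test across volunteers, whereas this sketch uses a hypothetical 1.5-fold cutoff, and all names and values are made up.

```python
import math
from statistics import mean
from typing import Dict, List

Values = Dict[str, List[float]]  # metabolite -> value per volunteer (same order)


def mean_log2_ratio(condition: List[float], reference: List[float]) -> float:
    """Mean of per-volunteer log2(condition / reference) for paired samples."""
    if len(condition) != len(reference) or not reference:
        raise ValueError("need paired, non-empty measurements")
    return mean(math.log2(c / r) for c, r in zip(condition, reference))


def flag_unstable(condition: Values, reference: Values,
                  fold_cutoff: float = 1.5) -> Dict[str, float]:
    """Return metabolites whose mean change versus the reference storage
    condition is at least `fold_cutoff`-fold in either direction
    (the cutoff is a hypothetical choice, not the study's criterion)."""
    cutoff = math.log2(fold_cutoff)
    flagged = {}
    for name, ref in reference.items():
        change = mean_log2_ratio(condition[name], ref)
        if abs(change) >= cutoff:
            flagged[name] = change
    return flagged


# Hypothetical values for five volunteers.
reference_minus80 = {"metabolite_A": [10.0, 12.0, 8.0, 11.0, 9.0],
                     "metabolite_B": [100.0, 110.0, 90.0, 105.0, 95.0]}
room_temp = {"metabolite_A": [4.0, 5.0, 3.0, 4.0, 4.0],
             "metabolite_B": [101.0, 108.0, 92.0, 104.0, 96.0]}
print(sorted(flag_unstable(room_temp, reference_minus80)))  # ['metabolite_A']
```

Working on the log scale keeps increases and decreases symmetric, so a halving and a doubling count as equally large changes.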
Stability profiling of the 992 metabolites showed time- and temperature-dependent increases in the number of significantly changed metabolites. This large volume of data enabled comparisons of unstable metabolites with their related molecules and allowed identification of causative factors, including compound-specific enzymatic activity in plasma and chemical reactivity. Furthermore, these analyses indicated extreme instability of 1-docosahexaenoylglycerol, 1-arachidonoylglycerophosphate, cystine, cysteine and N6-methyladenosine.
Conclusion
A large volume of data on storage stability was obtained. These data contribute to discovering biomarker candidates without misselection based on unreliable values, and to establishing suitable handling procedures for targeted biomarker quantification.
17.
Edoardo Saccenti Age K. Smilde José Camacho 《Metabolomics : Official journal of the Metabolomic Society》2018,14(6):73
Introduction
Modern omics experiments not only measure many variables but also follow complex experimental designs in which many factors are manipulated at the same time. These data can be conveniently analyzed using multivariate tools such as ANOVA-simultaneous component analysis (ASCA), which allows the variation induced by the different factors to be interpreted in a principal component analysis fashion. However, while in general only a subset of the measured variables may be related to the problem studied, all variables contribute to the final model, which may hamper interpretation.
Objectives
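ASCA first partitions the mean-centred data matrix into one effect matrix per experimental factor (each sample replaced by its factor-level mean minus the grand mean) plus a residual, and then applies PCA to each effect matrix. The standard-library sketch below covers only the partition, assuming a balanced design with main effects only (interactions fall into the residual); the PCA step and the group-wise sparsity that GASCA adds are omitted, and the data are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]  # rows = samples, columns = variables


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def effect_matrix(X: Matrix, levels: Sequence[str]) -> Matrix:
    """Main-effect matrix of one factor: every sample (row) is replaced by the
    mean of all samples at its factor level, minus the grand mean."""
    grand = column_means(X)
    level_means = {
        lev: column_means([row for row, l in zip(X, levels) if l == lev])
        for lev in set(levels)
    }
    return [[m - g for m, g in zip(level_means[l], grand)] for l in levels]


def asca_decompose(X: Matrix, factors: Dict[str, Sequence[str]]):
    """Split mean-centred X into one effect matrix per factor plus a residual.

    Interactions are not modelled separately here, so they end up in the residual.
    """
    grand = column_means(X)
    effects = {name: effect_matrix(X, levels) for name, levels in factors.items()}
    residual = [
        [x - g - sum(effects[name][i][j] for name in effects)
         for j, (x, g) in enumerate(zip(row, grand))]
        for i, row in enumerate(X)
    ]
    return effects, residual


def sum_of_squares(M: Matrix) -> float:
    return sum(v * v for row in M for v in row)


# Hypothetical balanced 2x2 design: 4 samples, 2 variables, two factors.
X = [[1.0, 2.0], [3.0, 2.0], [5.0, 6.0], [7.0, 6.0]]
factors = {"treatment": ["a", "a", "b", "b"], "time": ["t1", "t2", "t1", "t2"]}
effects, residual = asca_decompose(X, factors)
print({name: sum_of_squares(M) for name, M in effects.items()}, sum_of_squares(residual))
# {'treatment': 32.0, 'time': 4.0} 0.0
```

In a balanced design the effect sums of squares add up to that of the centred data (here 32 + 4 + 0 = 36), which is what lets each factor's share of the variation be read off before the per-effect PCA.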
We introduce here a sparse implementation of ASCA, termed group-wise ANOVA-simultaneous component analysis (GASCA), with the aim of obtaining models that are easier to interpret.
Methods
GASCA is based on the concept of group-wise sparsity introduced in group-wise principal component analysis, where the structure used to impose sparsity is defined in terms of groups of correlated variables found in the correlation matrices calculated from the effect matrices.
Results
The GASCA model, containing only selected subsets of the original variables, is easier to interpret and describes relevant biological processes.
Conclusions
GASCA is applicable to any kind of omics data obtained through designed experiments, such as, but not limited to, metabolomic, proteomic and gene expression data.
18.
19.
Background
Metagenomic methods directly sequence and analyze genome information from microbial communities. A single community usually contains hundreds of genomes or more from different microbial species, and the main computational tasks of metagenomic data analysis include taxonomic and functional characterization of all genomes in the community. Metagenomic data analysis is both data- and computation-intensive and requires extensive computational power. Most current metagenomic data analysis software was designed to run on a single computer or a single computer cluster, which cannot keep pace with the computational requirements of the rapidly growing number of large metagenomic projects. Therefore, advanced computational methods and pipelines must be developed to meet this need for efficient analysis.
Result
In this paper, we propose Parallel-META, a GPU- and multi-core-CPU-based open-source pipeline for metagenomic data analysis, which enables efficient, parallel analysis of multiple metagenomic datasets and visualization of the results for multiple samples. In Parallel-META, the similarity-based database search was parallelized using GPU computing and multi-core CPU optimization. Experiments have shown that Parallel-META achieves at least a 15-fold speed-up over traditional metagenomic data analysis methods, with the same accuracy of results (http://www.computationalbioenergy.org/parallel-meta.html).
Conclusion
Parallel processing of current metagenomic data is very promising: with the current speed-up of 15 times and above, binning is no longer a very time-consuming process. Therefore, deeper analyses of metagenomic data, such as the comparison of different samples, become feasible within the pipeline, and some of these functionalities have been included in the Parallel-META pipeline.
20.
Discovery of A-type procyanidin dimers in yellow raspberries by untargeted metabolomics and correlation based data analysis
Total citations: 1 (self-citations: 0; citations by others: 1)
Elisabete Carvalho Pietro Franceschi Antje Feller Lorena Herrera Luisa Palmieri Panagiotis Arapitsas Samantha Riccadonna Stefan Martens 《Metabolomics : Official journal of the Metabolomic Society》2016,12(9):144