Found 20 similar articles (search time: 31 ms)
1.
Background
A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole-genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants is typically carried out on short-read data aligned to a single reference, disregarding the underlying variation.
Results
We propose a new unified framework for variant calling with short-read data that uses a representation of human genetic variation: a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC.
Conclusions
Our experiments show that, by replacing a standard human reference with a pan-genomic one, we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.
2.
Alena Zablotskaya Hilde Van Esch Kevin J. Verstrepen Guy Froyen Joris R. Vermeesch 《BMC medical genomics》2018,11(1):123
Background
The etiology of more than half of all patients with X-linked intellectual disability remains elusive, despite array-based comparative genomic hybridization, whole exome or genome sequencing. Since short-read massively parallel sequencing approaches do not allow the detection of larger tandem repeat expansions, we hypothesized that such expansions could be a hidden cause of X-linked intellectual disability.
Methods
We selectively captured over 1800 tandem repeats on the X chromosome and characterized them by long-read single-molecule sequencing in 3 families with idiopathic X-linked intellectual disability.
Results
In male DNA samples, full tandem repeat length sequences were obtained for 88–93% of the targets and up to 99.6% of the repeats with a moderate guanine-cytosine content. The read length and analysis pipeline allow the detection of tandem repeat expansions of >900 bp. In one family, one repeat expansion co-occurs with down-regulation of the neighboring MIR222 gene. This gene has previously been implicated in intellectual disability and is apparently linked to FMR1 and NEFH overexpression associated with neurological disorders.
Conclusions
This study demonstrates the power of single-molecule sequencing to measure tandem repeat lengths and detect expansions, and suggests that tandem repeat mutations may be a hidden cause of X-linked intellectual disability.
3.
Chao Xie Chin Lui Wesley Goi Daniel H. Huson Peter F. R. Little Rohan B. H. Williams 《BMC bioinformatics》2016,17(19):508
Background
Taxonomic profiling of microbial communities is often performed using small subunit ribosomal RNA (SSU) amplicon sequencing (16S or 18S), while environmental shotgun sequencing is often focused on functional analysis. Large shotgun datasets contain a significant number of SSU sequences and these can be exploited to perform an unbiased SSU-based taxonomic analysis.
Results
Here we present a new program called RiboTagger that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion.
Conclusions
RiboTagger permits fast recovery of SSU-RNA sequences from shotgun nucleic acid surveys of complex microbial communities. The program targets all three domains of life, exhibits high sensitivity and specificity and is substantially faster than comparable programs.
4.
Oliver A. Hampton Adam C. English Mark Wang William J. Salerno Yue Liu Donna M. Muzny Yi Han David A. Wheeler Kim C. Worley James R. Lupski 《BMC genomics》2017,18(6):691
Background
Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short DNA fragment paired end sequencing has yielded a wealth of single nucleotide variants and internal sequencing read insertions-deletions, at the cost of limited SV detection. Multi-kilobase DNA fragment mate pair sequencing has supplemented the void in SV detection, but introduced new analytic challenges requiring SV detection tools specifically designed for mate pair sequencing data. Here, we introduce SVachra (Structural Variation Assessment of CHRomosomal Aberrations), a breakpoint calling program that identifies large insertions-deletions, inversions, inter- and intra-chromosomal translocations utilizing both inward and outward facing read types generated by mate pair sequencing.
Results
We demonstrate SVachra’s utility by executing the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional data set of long-read (Pacific BioSciences RSII) was also generated to validate SV calls from SVachra and other comparison SV calling programs. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges when compared to other SV callers.
Conclusions
SVachra is a highly specific breakpoint calling program that exhibits a more unbiased SV detection methodology than other callers.
5.
Ekaterina A. Zelentsova Lyudmila V. Yanshole Olga A. Snytnikova Vadim V. Yanshole Yuri P. Tsentalovich Renad Z. Sagdeev 《Metabolomics : Official journal of the Metabolomic Society》2016,12(11):172
Introduction
The analysis of post-mortem metabolomic changes in biological fluids opens the way to develop new methods for the estimation of post-mortem interval (PMI). It may also help in the analysis of disease-induced metabolomic changes in human tissues when the postoperational samples are compared to the post-mortem samples from healthy donors.
Objectives
The goals of this study are to observe and classify the post-mortem changes occurring in the rabbit blood, aqueous and vitreous humors (AH and VH), to identify the potential PMI markers among a wide range of metabolites, and also to determine which biological fluid (blood, AH or VH) is more suitable for the PMI estimation.
Methods
The quantitative metabolomic profiling of samples of the rabbit serum, AH and VH taken at different PMIs has been performed with the combined use of high-frequency NMR and high-resolution LC–MS methods.
Results
The quantitative levels of 61 metabolites in the rabbit serum, AH and VH at different PMIs have been measured. It has been found that the post-mortem metabolomic changes in AH and VH proceed more slowly than in blood, and the data scattering is lower. Among the metabolites whose concentrations increase with time, the most significant and linear growth is found for hypoxanthine, choline and glycerol.
Conclusion
The obtained results suggest that the ocular fluids AH and VH may have some advantages over blood serum in the search for potential biochemical markers for PMI estimation. Among the compounds studied in the present work, hypoxanthine, choline and glycerol show the greatest promise as potential PMI biomarkers.
6.
7.
Background
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The rapid growth of next-generation sequencing data has led to a shortage of efficient ultra-large biological sequence alignment approaches that can cope with different sequence types.
Methods
Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient tool, HAlign-II, to address ultra-large multiple biological sequence alignment and phylogenetic tree construction.
Results
Experiments on large-scale DNA and protein data sets (files larger than 1 GB) showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources.
Conclusions
HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
8.
D. Jacob C. Deborde M. Lefebvre M. Maucourt A. Moing 《Metabolomics : Official journal of the Metabolomic Society》2017,13(4):36
Introduction
In NMR-based metabolomics, 1D spectra processing often requires an expert eye for disentangling the intertwined peaks.
Objectives
The objective of NMRProcFlow is to assist the expert in this task in the best way without requiring programming skills.
Methods
NMRProcFlow was developed to be a graphical and interactive 1D NMR (1H & 13C) spectra processing tool.
Results
NMRProcFlow (http://nmrprocflow.org), dedicated to metabolic fingerprinting and targeted metabolomics, covers all spectra processing steps including baseline correction, chemical shift calibration and alignment.
Conclusion
Biologists and NMR spectroscopists can easily interact and develop synergies by visualizing the NMR spectra along with their corresponding experimental-factor levels, thus setting a bridge between experimental design and subsequent statistical analyses.
9.
Zhan Zhou Shanshan Wu Jun Lai Yuan Shi Chixiao Qiu Zhe Chen Yufeng Wang Xun Gu Jie Zhou Shuqing Chen 《BMC medical genomics》2017,10(1):49
Background
Intratumor heterogeneity (ITH) poses an urgent challenge for cancer precision medicine because it can cause drug resistance against cancer target therapy and immunotherapy. The search for trunk mutations that are present in all cancer cells is therefore critical for each patient.
Case presentation
In this study, we aimed to evaluate the efficiency of multiregional sequencing for the identification of trunk mutations present in all regions of a tumor as a case study. We applied multiregional whole-exome sequencing (WES) to investigate the genetic heterogeneity and homogeneity of a case of gastric carcinoma. Approximately 83% of common missense mutations present in two samples and approximately 89% of common missense mutations present in three samples were trunk mutations. Notably, trunk mutations appeared to have higher variant allele frequencies (VAFs) than non-trunk mutations.
Conclusions
Our results indicate that small-scale multiregional sampling and subsequent screening of low VAF somatic mutations might be a cost-effective strategy for identifying the majority of trunk mutations in gastric carcinoma.
10.
Chao Zhao Yanan Chu Yanhong Li Chengfeng Yang Yuqing Chen Xumin Wang Bin Liu 《Biotechnology letters》2017,39(1):123-131
Objectives
To analyze the microbial diversity and gene content of a thermophilic cellulose-degrading consortium from hot springs in Xiamen, China, using 454 pyrosequencing to discover cellulolytic enzyme resources.
Results
A thermophilic cellulose-degrading consortium, XM70, which was isolated from a hot spring, used sugarcane bagasse as its sole carbon and energy source. DNA sequencing of the XM70 sample resulted in 349,978 reads with an average read length of 380 bases, accounting for 133,896,867 bases of sequence information. The characterization of sequencing reads and assembled contigs revealed that most microbes were derived from four phyla: Geobacillus (Firmicutes), Thermus, Bacillus, and Anoxybacillus. Twenty-eight homologous genes belonging to 15 glycoside hydrolase families were detected, including several cellulase genes. A novel hot spring metagenome-derived thermophilic cellulase was expressed and characterized.
Conclusions
The application value of thermostable sugarcane bagasse-degrading enzymes is shown for production of cellulosic biofuel. The practical power of using a short-read-based metagenomic approach for harvesting novel microbial genes is also demonstrated.
11.
Background
Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data.
Results
We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data.
Conclusions
Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as open source (Artistic License 2.0) R/Bioconductor package PureCN (http://bioconductor.org/packages/PureCN/).
12.
Background
Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.
Results
Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels (splids) that define an approximate bipartition of the taxon set. We show both in simulated data and in case studies on real-life data that splids can be efficiently extracted from phylogenomic data sets.
Conclusions
Suitably processed gap patterns extracted from genome-wide alignments provide a surprisingly clear phylogenetic signal and allow the inference of accurate phylogenetic trees.
13.
Olga A. Snytnikova Anastasiya A. Khlichkina Lyudmila V. Yanshole Vadim V. Yanshole Igor A. Iskakov Elena V. Egorova Denis A. Stepakov Vladimir P. Novoselov Yuri P. Tsentalovich 《Metabolomics : Official journal of the Metabolomic Society》2017,13(1):5
Introduction
The optical elements of the eye (cornea, lens, and vitreous humor) are avascular tissues, and their nutrition and waste removal are provided by aqueous humor (AH). The AH production occurs through the active secretion and the passive diffusion/ultrafiltration of blood plasma. The comparison of the metabolomic profiles of AH and plasma is important for understanding the mechanisms of biochemical processes and metabolite transport taking place in vivo in ocular tissues.
Objectives
The work is aimed at the determination of concentrations of a wide range of the most abundant metabolites in the human AH, the comparison of the metabolomic profiles of AH and serum, and the analysis of the post-mortem metabolomic changes in these two biological fluids.
Methods
The quantitative metabolomic profiling was carried out with the use of two independent methods: high-frequency 1H NMR spectroscopy and HPLC with high-resolution ESI-MS detection.
Results
The concentrations of the 71 most abundant metabolites in blood serum and AH from living patients and human cadavers have been measured. It has been found that the level of ascorbate in AH is two orders of magnitude higher than that in serum; the levels of other metabolites are either similar to those in serum, or differ from them by a factor of 2–5. The post-mortem metabolomic composition of both serum and AH undergoes rapid and strong changes.
Conclusion
The differences between the metabolomic profiles of AH and serum for the majority of metabolites can be attributed to the metabolic activity of the ocular tissues leading to the lack or excess of some metabolites, while the high concentration of ascorbate in AH demonstrates the activity of ascorbate-specific pumps at the blood-aqueous border. The post-mortem metabolomic changes are caused by the disruption of the major biochemical cycles and cell lysis. These changes should be taken into account in the analysis of disease-induced changes in post-mortem samples of the ocular tissues.
14.
Lei Shang David P Gardner Weijia Xu Jamie J Cannone Daniel P Miranker Stuart Ozer Robin R Gutell 《BMC systems biology》2013,7(Z4):S13
Background
The analysis of RNA sequences, once a small niche field for a small collection of scientists whose primary emphasis was the structure and function of a few RNA molecules, has grown most significantly with the realizations that 1) RNA is implicated in many more functions within the cell, and 2) the analysis of ribosomal RNA sequences is revealing more about the microbial ecology within all biological and environmental systems. The accurate and rapid alignment of these RNA sequences is essential to decipher the maximum amount of information from this data.
Methods
Two computer systems that utilize the Gutell lab's RNA Comparative Analysis Database (rCAD) were developed to align sequences to an existing template alignment available at the Gutell lab's Comparative RNA Web (CRW) Site. Multiple dimensions of cross-indexed information are contained within the relational database, rCAD, including sequence alignments, the NCBI phylogenetic tree, and comparative secondary structure information for each aligned sequence. The first program, CRWAlign-1, creates a phylogenetic-based sequence profile for each column in the alignment. The second program, CRWAlign-2, creates a profile based on phylogenetic, secondary structure, and sequence information. Both programs utilize their profiles to align new sequences into the template alignment.
Results
The accuracies of the two CRWAlign programs were compared with the best template-based rRNA alignment programs and the best de-novo alignment programs. We have compared our programs with a total of eight alternative alignment methods on different sets of 16S rRNA alignments with sequence percent identities ranging from 50% to 100%. Both CRWAlign programs were superior to these other programs in accuracy and speed.
Conclusions
Both CRWAlign programs can be used to align the very extensive amount of RNA sequencing that is generated due to the rapid next-generation sequencing technology. This latter technology is augmenting the new paradigm that RNA is intimately implicated in a significant number of functions within the cell. In addition, the use of bacterial 16S rRNA sequencing in the identification of the microbiome in many different environmental systems creates a need for rapid and highly accurate alignment of bacterial 16S rRNA sequences.
15.
N. Cesbron A.-L. Royer Y. Guitton A. Sydor B. Le Bizec G. Dervilly-Pinel 《Metabolomics : Official journal of the Metabolomic Society》2017,13(8):99
Introduction
Collecting feces is easy. It offers direct outcome to endogenous and microbial metabolites.
Objectives
Given the lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.
Methods
The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.
Results
A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a filtration step (10 kDa) was developed.
Conclusion
The workflow generated repeatable and informative fingerprints for robust metabolome characterization.
16.
Background
Somatic copy number alterations (SCNAs) can be used to infer tumor subclonal populations in whole-genome sequencing studies, where their read count ratios between paired tumor-normal samples usually serve as the proxy for inference. Existing SCNA-based subclonal population inference tools assume that the GC bias of the tumor and normal samples has the same features and can be fully offset by the read count ratio. However, we found that the read count ratio on SCNA segments presents a log-linear biased pattern, which affects the performance of existing read-count-ratio-based subclonal inference tools. Currently, no correction tools take this read ratio bias into account.
Results
We present Pre-SCNAClonal, a tool that improves tumor subclonal population inference by correcting GC bias at the SCNA level. Pre-SCNAClonal first corrects GC bias using a Markov chain Monte Carlo probability model, then accurately locates baseline DNA segments (not containing any SCNAs) with a hierarchical clustering model. We show Pre-SCNAClonal’s superiority to existing GC-bias correction methods at any level of subclonal population.
Conclusions
Pre-SCNAClonal can be run independently or serve as a pre-processing/GC-correction step in conjunction with existing SCNA-based subclonal inference tools.
17.
18.
Dimitrios J. Floros Paul R. Jensen Pieter C. Dorrestein Nobuhiro Koyama 《Metabolomics : Official journal of the Metabolomic Society》2016,12(9):145
Introduction
Natural products from culture collections have enormous impact in advancing discovery programs for metabolites of biotechnological importance. These discovery efforts rely on the metabolomic characterization of strain collections.
Objective
Many emerging approaches compare metabolomic profiles of such collections, but few enable the analysis and prioritization of thousands of samples from diverse organisms while delivering chemistry-specific readouts.
Method
In this work we utilize untargeted LC–MS/MS based metabolomics together with molecular networking to inventory the chemistries associated with 1000 marine microorganisms.
Result
This approach annotated 76 molecular families (a spectral match rate of 28%), including clinically and biotechnologically important molecules such as valinomycin, actinomycin D, and desferrioxamine E. Targeting a molecular family produced primarily by one microorganism led to the isolation and structure elucidation of two new molecules designated maridric acids A and B.
Conclusion
Molecular networking guided exploration of large culture collections allows for rapid dereplication of known molecules and can highlight producers of unique metabolites. These methods, together with large culture collections and growing databases, allow for data-driven strain prioritization with a focus on novel chemistries.
19.
Qiong Xu Chun-yang Li Yi Wang Hui-ping Li Bing-bing Wu Yong-hui Jiang 《BMC medical genomics》2018,11(1):92
Background
Verheij syndrome is a rare microdeletion syndrome of chromosome 8q24.3 that harbors PUF60, SCRIB, and NRBP2 genes. Subsequently, loss of function mutations in PUF60 have been found in children with clinical features significantly overlapping with Verheij syndrome.
Case presentation
Here we present the first Chinese Han patient with a de novo nonsense variant (c.1357C>T, p.Gln453*) in PUF60, identified by clinical whole exome sequencing. The 5-year-old boy presents with dysmorphic facial features, intellectual disability, and growth retardation but without apparent cardiac, renal, ocular, and spinal anomalies.
Conclusions
Our finding contributes to the understanding of the genotype and phenotype in PUF60-related disorder.
20.
Christina Nieuwoudt Samantha J. Jones Angela Brooks-Wilson Jinko Graham 《Source code for biology and medicine》2018,13(1):2