Found 20 similar articles; search took 15 ms.
1.
It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and GitHub (https://github.com/ChaissonLab/LRA).
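To make the linear-versus-concave distinction in the abstract above concrete, here is a minimal sketch of chaining exact matches under either gap cost. It is not lra's algorithm: it uses a naive quadratic dynamic program rather than sparse dynamic programming, and the names `chain`, `linear_gap`, `concave_gap`, and the cost constants are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Tuple

# An anchor is an exact match: (read_pos, genome_pos, match_length).
Anchor = Tuple[int, int, int]


def linear_gap(gap: int) -> float:
    """Hypothetical linear penalty: cost grows in proportion to gap length."""
    return 0.5 * gap


def concave_gap(gap: int) -> float:
    """Hypothetical concave penalty: long gaps cost little more than medium ones."""
    return 2.0 * math.log2(gap + 1)


def chain(anchors: List[Anchor],
          gap_cost: Callable[[int], float]) -> Tuple[float, List[Anchor]]:
    """Best-scoring colinear chain of anchors (naive O(n^2) DP, not SDP)."""
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0.0, []
    score = [float(a[2]) for a in anchors]  # a chain may start at any anchor
    prev = [-1] * n
    for j in range(n):
        qj, tj, lj = anchors[j]
        for i in range(j):
            qi, ti, li = anchors[i]
            # Anchor i must end before anchor j starts on both sequences.
            if qi + li <= qj and ti + li <= tj:
                # Gap length = difference between the two inter-anchor distances.
                gap = abs((tj - (ti + li)) - (qj - (qi + li)))
                candidate = score[i] + lj - gap_cost(gap)
                if candidate > score[j]:
                    score[j] = candidate
                    prev[j] = i
    best = max(range(n), key=lambda idx: score[idx])
    path = []
    k = best
    while k != -1:
        path.append(anchors[k])
        k = prev[k]
    path.reverse()
    return score[best], path
```

With two 30-bp anchors separated by a 1,000-bp gap on the genome only, for example `[(0, 0, 30), (30, 1030, 30)]`, the linear cost makes joining them unprofitable and the chain keeps a single anchor, whereas the concave cost links both. That is the behaviour a large deletion needs if it is to be spanned by one alignment.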
2.
Todd J Treangen Sergey Koren Daniel D Sommer Bo Liu Irina Astrovskaya Brian Ondov Aaron E Darling Adam M Phillippy Mihai Pop 《Genome biology》2013,14(1):R2
We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS.
3.
4.
When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it are available at https://github.com/sjackman/uniqtag-paper.
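The idea of labelling each gene with a representative k-mer can be sketched as follows. The abstract does not state the selection rule, so this sketch assumes one: pick the k-mer of the gene that occurs in the fewest genes of the set, breaking ties lexicographically. The function names `kmers` and `uniqtags` are hypothetical and this is not the Ruby or R implementation.

```python
from collections import Counter
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def uniqtags(genes: Dict[str, str], k: int) -> Dict[str, str]:
    """Assign each gene a representative k-mer.

    Assumed rule (not taken from the abstract): choose the k-mer found in
    the fewest genes, breaking ties by lexicographic order.
    """
    counts: Counter = Counter()
    for seq in genes.values():
        counts.update(kmers(seq, k))  # each k-mer counted once per gene
    tags: Dict[str, str] = {}
    for name, seq in genes.items():
        candidates = kmers(seq, k)
        if candidates:  # genes shorter than k receive no tag
            tags[name] = min(candidates, key=lambda km: (counts[km], km))
    return tags
```

For two toy genes, `uniqtags({"g1": "ACGTAC", "g2": "ACGTTT"}, 3)` skips the shared k-mers `ACG` and `CGT` and tags the genes `GTA` and `GTT`. Because the tag is derived from the gene's own sequence, it does not depend on the order in which genes are numbered in a given build.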
5.
Vladimir Smirnov 《PLoS computational biology》2021,17(10)
Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy. We recently introduced MAGUS, a new state-of-the-art method for aligning large numbers of sequences. In this paper, we present a comprehensive set of enhancements that allow MAGUS to align vastly larger datasets with greater speed. We compare MAGUS to other leading alignment methods on datasets of up to one million sequences. Our results demonstrate the advantages of MAGUS over other alignment software in both accuracy and speed. MAGUS is freely available in open-source form at https://github.com/vlasmirnov/MAGUS.
6.
Zev N. Kronenberg Edward J. Osborne Kelsey R. Cone Brett J. Kennedy Eric T. Domyan Michael D. Shapiro Nels C. Elde Mark Yandell 《PLoS computational biology》2015,11(12)
Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch) and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.
This is a PLOS Computational Biology software paper.
7.
Ronghai Cheng Ross Ka-Kit Leung Yao Chen Yidan Pan Yin Tong Zhoufang Li Luwen Ning Xuefeng B. Ling Jiankui He 《PloS one》2015,10(10)
We present Virtual Pharmacist, a web-based platform that takes common types of high-throughput data, namely microarray SNP genotyping data, FASTQ and Variant Call Format (VCF) files as inputs, and reports potential drug responses in terms of efficacy, dosage and toxicity at a glance. Batch submission facilitates multivariate analysis or data mining of targeted groups. Individual analysis consists of a report that is readily comprehensible to patients and practitioners who have basic knowledge in pharmacology, a table that summarizes variants and potentially affected drug responses according to the US Food and Drug Administration pharmacogenomic biomarker labeled drug list and PharmGKB, and visualization of a gene-drug-target network. Group analysis provides the distribution of the variants and potentially affected drug responses of a target group, a sample-gene variant count table, and a sample-drug count table. Our analysis of genomes from the 1000 Genomes Project underlines the potentially differential drug responses among different human populations. Even within the same population, the findings from Watson’s genome highlight the importance of personalized medicine. Virtual Pharmacist can be accessed freely at http://www.sustc-genome.org.cn/vp or installed as a local web server. The code and documentation are available at the GitHub repository (https://github.com/VirtualPharmacist/vp). Administrators can download the source code to customize access settings for further development.
8.
《Standards in genomic sciences》2013,8(2):228-238
Turneriella parva Levett et al. 2005 is the only species of the genus Turneriella which was established as a result of the reclassification of Leptospira parva Hovind-Hougen et al. 1982. Together with Leptonema and Leptospira, Turneriella constitutes the family Leptospiraceae, within the order Spirochaetales. Here we describe the features of this free-living aerobic spirochete together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the genus Turneriella and the 13th member of the family Leptospiraceae for which a complete or draft genome sequence is now available. The 4,409,302 bp long genome with its 4,169 protein-coding and 45 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
9.
Mohamed Sassi Catherine Robert Didier Raoult Michel Drancourt 《Standards in genomic sciences》2013,8(2):306-317
Mycobacterium simiae is a non-tuberculosis mycobacterium causing pulmonary infections in both immunocompetent and immunocompromised patients. We announce the draft genome sequence of M. simiae DSM 44165T. The 5,782,968-bp long genome with 65.15% GC content (one chromosome, no plasmid) contains 5,727 open reading frames (33% with unknown function and 11 ORFs longer than 5,000 bp), three rRNA operons, 52 tRNA, one 66-bp tmRNA matching tmRNA tags from Mycobacterium avium, Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium microti, Mycobacterium marinum, and Mycobacterium africanum, and 389 DNA repetitive sequences. In a comparison of ORFs and size distribution between M. simiae and five other Mycobacterium species, M. simiae clustered with M. abscessus and M. smegmatis. A 40-kb prophage was predicted in addition to two prophage-like elements, 7-kb and 18-kb in size, but no mycobacteriophage was seen after the observation of 106 M. simiae cells. Fifteen putative CRISPRs were found. Three genes were predicted to encode resistance to aminoglycosides, beta-lactams and macrolide-lincosamide-streptogramin B. A total of 163 CAZymes were annotated. M. simiae contains ESX-1 to ESX-5 genes encoding a type-VII secretion system. Availability of the genome sequence may help depict the unique properties of this environmental, opportunistic pathogen.
10.
Shenghan Gao Xiaofei Yang Jianyong Sun Xixi Zhao Bo Wang Kai Ye 《Molecular biology and evolution》2022,39(3)
Significant improvements in genome sequencing and assembly technology have led to increasing numbers of high-quality genomes, revealing complex evolutionary scenarios such as multiple whole-genome duplication events, which hinder ancestral genome reconstruction via the currently available computational frameworks. Here, we present the Inferring Ancestor Genome Structure (IAGS) framework, a novel block/endpoint matching optimization strategy with single-cut-or-join distance, to allow ancestral genome reconstruction under both simple (single-copy ancestor) and complex (multicopy ancestor) scenarios. We evaluated IAGS with two simulated data sets and applied it to four different real evolutionary scenarios to demonstrate its performance and general applicability. IAGS is available at https://github.com/xjtu-omics/IAGS.
11.
Elizabeth T. Bartom Masha Kocherginsky Bidur Paudel Aparajitha Vaidyanathan Ashley Haluck-Kangas Monal Patel Kaitlyn L. OShea Andrea E. Murmann Marcus E. Peter 《PLoS computational biology》2022,18(3)
microRNAs (miRNAs) are noncoding short (s)RNAs, 18-22 nt long, that suppress gene expression by targeting the 3’ untranslated region of target mRNAs. This occurs through the seed sequence located in positions 2-7/8 of the miRNA guide strand, once it is loaded into the RNA-induced silencing complex (RISC). G-rich 6mer seed sequences can kill cells by targeting C-rich 6mer seed matches located in genes that are critical for cell survival. This results in induction of Death Induced by Survival gene Elimination (DISE), through a mechanism we have called 6mer seed toxicity. miRNAs are often quantified in cells by aligning the reads from small (sm)RNA sequencing to the genome. However, the analysis of any smRNA Seq data set for predicted 6mer seed toxicity requires an alternative workflow, solely based on the exact positions 2–7 of any short (s)RNA that can enter the RISC. Therefore, we developed SPOROS, a semi-automated pipeline that produces multiple useful outputs to predict and compare 6mer seed toxicity of cellular sRNAs, regardless of their nature, between different samples. We provide two examples to illustrate the capabilities of SPOROS: Example one involves the analysis of RISC-bound sRNAs in a cancer cell line (either wild-type or two mutant lines unable to produce most miRNAs). Example two is based on a publicly available smRNA Seq data set from postmortem brains (either from normal or Alzheimer’s patients). Our methods (found at https://github.com/ebartom/SPOROS and at Code Ocean: https://doi.org/10.24433/CO.1732496.v1) are designed to be used to analyze a variety of smRNA Seq data in various normal and disease settings.
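The seed-centred view described in the abstract above, where only positions 2-7 of each short RNA matter, can be illustrated with a few lines of code. This is a sketch of that single step and not the SPOROS pipeline; the function names `seed_6mer` and `seed_counts`, and the input format of (sequence, read count) pairs, are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def seed_6mer(read: str) -> Optional[str]:
    """Return positions 2-7 (1-based) of a read, or None if it is too short."""
    if len(read) < 7:
        return None
    return read[1:7].upper()  # 0-based slice [1:7] covers 1-based positions 2..7


def seed_counts(reads: Iterable[Tuple[str, int]]) -> Counter:
    """Sum read counts per 6mer seed, ignoring where each read maps."""
    totals: Counter = Counter()
    for sequence, count in reads:
        seed = seed_6mer(sequence)
        if seed is not None:
            totals[seed] += count
    return totals
```

Applied to the let-7a guide strand `UGAGGUAGUAGGUUGUAUAGUU`, `seed_6mer` returns `GAGGUA`. Reads with different full-length sequences but the same seed are pooled by `seed_counts`, which is why no genome alignment is needed for this kind of summary.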
12.
Haibao Tang Xingtan Zhang Chenyong Miao Jisen Zhang Ray Ming James C Schnable Patrick S Schnable Eric Lyons Jianguo Lu 《Genome biology》2015,16(1)
The ordering and orientation of genomic scaffolds to reconstruct chromosomes is an essential step during de novo genome assembly. Because this process utilizes various mapping techniques that each provide an independent line of evidence, a combination of multiple maps can improve the accuracy of the resulting chromosomal assemblies. We present ALLMAPS, a method capable of computing a scaffold ordering that maximizes colinearity across a collection of maps. ALLMAPS is robust against common mapping errors, and generates sequences that are maximally concordant with the input maps. ALLMAPS is a useful tool in building high-quality genome assemblies. ALLMAPS is available at: https://github.com/tanghaibao/jcvi/wiki/ALLMAPS.
13.
Pol Castellano-Escuder Raúl González-Domínguez Francesc Carmona-Pontaque Cristina Andrés-Lacueva Alex Sánchez-Pla 《PLoS computational biology》2021,17(7)
Metabolomics and proteomics, like other omics domains, usually face a data mining challenge in providing an understandable output to advance in biomarker discovery and precision medicine. Often, statistical analysis is one of the most difficult challenges and it is critical in the subsequent biological interpretation of the results. Because of this, combined with the computational programming skills needed for this type of analysis, several bioinformatic tools aimed at simplifying metabolomics and proteomics data analysis have emerged. However, sometimes the analysis is still limited to a few hidebound statistical methods and to data sets with limited flexibility. POMAShiny is a web-based tool that provides a structured, flexible and user-friendly workflow for the visualization, exploration and statistical analysis of metabolomics and proteomics data. This tool integrates several statistical methods, some of them widely used in other types of omics, and it is based on the POMA R/Bioconductor package, which increases the reproducibility and flexibility of analyses outside the web environment. POMAShiny and POMA are both freely available at https://github.com/nutrimetabolomics/POMAShiny and https://github.com/nutrimetabolomics/POMA, respectively.
14.
Michael Visser Sofiya N. Parshina Joana I. Alves Diana Z. Sousa Inês A. C. Pereira Gerard Muyzer Jan Kuever Alexander V. Lebedinsky Jasper J. Koehorst Petra Worm Caroline M. Plugge Peter J. Schaap Lynne A. Goodwin Alla Lapidus Nikos C. Kyrpides Janine C. Detter Tanja Woyke Patrick Chain Karen W. Davenport Stefan Spring Manfred Rohde Hans Peter Klenk Alfons J.M. Stams 《Standards in genomic sciences》2014,9(3):655-675
15.
Thomas Riedel Stefan Spring Anne Fiebig Jörn Petersen Nikos C. Kyrpides Markus Göker Hans-Peter Klenk 《Standards in genomic sciences》2014,9(3):1333-1345
Salipiger mucosus Martínez-Cánovas et al. 2004 is the type species of the genus Salipiger, a moderately halophilic and exopolysaccharide-producing representative of the Roseobacter lineage within the alphaproteobacterial family Rhodobacteraceae. Members of this family were shown to be the most abundant bacteria especially in coastal and polar waters, but were also found in microbial mats and sediments. Here we describe the features of the S. mucosus strain DSM 16094T together with its genome sequence and annotation. The 5,689,389-bp genome sequence consists of one chromosome and several extrachromosomal elements. It contains 5,650 protein-coding genes and 95 RNA genes. The genome of S. mucosus DSM 16094T was sequenced as part of the activities of the Transregional Collaborative Research Center 51 (TRR51) funded by the German Research Foundation (DFG).
16.
17.
18.
Thomas W. Laver Elisa De Franco Matthew B. Johnson Kashyap A. Patel Sian Ellard Michael N. Weedon Sarah E. Flanagan Matthew N. Wakeling 《PLoS computational biology》2022,18(3)
Identifying copy number variants (CNVs) can provide diagnoses to patients and provide important biological insights into human health and disease. Current exome and targeted sequencing approaches cannot detect clinically and biologically-relevant CNVs outside their target area. We present SavvyCNV, a tool which uses off-target read data from exome and targeted sequencing data to call germline CNVs genome-wide. Up to 70% of sequencing reads from exome and targeted sequencing fall outside the targeted regions. We have developed a new tool, SavvyCNV, to exploit this ‘free data’ to call CNVs across the genome. We benchmarked SavvyCNV against five state-of-the-art CNV callers using truth sets generated from genome sequencing data and Multiplex Ligation-dependent Probe Amplification assays. SavvyCNV called CNVs with high precision and recall, outperforming the five other tools at calling CNVs genome-wide, using off-target or on-target reads from targeted panel and exome sequencing. We then applied SavvyCNV to clinical samples sequenced using a targeted panel and were able to call previously undetected clinically-relevant CNVs, highlighting the utility of this tool within the diagnostic setting. SavvyCNV outperforms existing tools for calling CNVs from off-target reads. It can call CNVs genome-wide from targeted panel and exome data, increasing the utility and diagnostic yield of these tests. SavvyCNV is freely available at https://github.com/rdemolgen/SavvySuite.
19.
Spyridon Ntougias Alla Lapidus James Han Konstantinos Mavromatis Amrita Pati Amy Chen Hans-Peter Klenk Tanja Woyke Constantinos Fasseas Nikos C. Kyrpides Georgios I. Zervakis 《Standards in genomic sciences》2014,9(3):783-793
Olivibacter sitiensis Ntougias et al. 2007 is a member of the family Sphingobacteriaceae, phylum Bacteroidetes. Members of the genus Olivibacter are phylogenetically diverse and of significant interest. They occur in diverse habitats, such as rhizosphere and contaminated soils, viscous wastes, composts, biofilter clean-up facilities on contaminated sites and cave environments, and they are involved in the degradation of complex and toxic compounds. Here we describe the features of O. sitiensis AW-6T, together with the permanent-draft genome sequence and annotation. The organism was sequenced under the Genomic Encyclopedia for Bacteria and Archaea (GEBA) project at the DOE Joint Genome Institute and is the first genome sequence of a species within the genus Olivibacter. The genome is 5,053,571 bp long and comprises 110 scaffolds with an average GC content of 44.61%. Of the 4,565 genes predicted, 4,501 were protein-coding genes and 64 were RNA genes. Most protein-coding genes (68.52%) were assigned to a putative function. The identification of 2-keto-4-pentenoate hydratase/2-oxohepta-3-ene-1,7-dioic acid hydratase-coding genes indicates involvement of this organism in the catechol catabolic pathway. In addition, genes encoding β-1,4-xylanases and β-1,4-xylosidases reveal the xylanolytic action of O. sitiensis.
20.
Laura Tatjer Almudena Sacristán-Reviriego Carlos Casado Asier González Boris Rodríguez-Porrata Lorena Palacios David Canadell Albert Serra-Cardona Humberto Martín María Molina Joaquín Ariño 《Genetics》2016,202(1):141-156
The Saccharomyces cerevisiae type 2C protein phosphatase Ptc1 is required for a wide variety of cellular functions, although only a few cellular targets have been identified. A genetic screen in search of mutations in protein kinase–encoding genes able to suppress multiple phenotypic traits caused by the ptc1 deletion yielded a single gene, MKK1, coding for a MAPK kinase (MAPKK) known to activate the cell-wall integrity (CWI) Slt2 MAPK. In contrast, mutation of the MKK1 paralog, MKK2, had a less significant effect. Deletion of MKK1 abolished the increased phosphorylation of Slt2 induced by the absence of Ptc1 both under basal and CWI pathway stimulatory conditions. We demonstrate that Ptc1 acts at the level of the MAPKKs of the CWI pathway, but only the Mkk1 kinase activity is essential for ptc1 mutants to display high Slt2 activation. We also show that Ptc1 is able to dephosphorylate Mkk1 in vitro. Our results reveal the preeminent role of Mkk1 in signaling through the CWI pathway and strongly suggest that hyperactivation of Slt2 caused by upregulation of Mkk1 is at the basis of most of the phenotypic defects associated with lack of Ptc1 function.