期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs

Manuel Tognon Vincenzo Bonnici Erik Garrison Rosalba Giugno Luca Pinello 《PLoS computational biology》2021,17(9)

相似文献

2.

iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species

Pengyu Zhang Hongming Zhang Hao Wu 《Nucleic acids research》2022,50(18):10278

相似文献

3.

A Real-Time All-Atom Structural Search Engine for Proteins

Gabriel Gonzalez Brett Hannigan William F. DeGrado 《PLoS computational biology》2014,10(7)

Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).

This is a PLOS Computational Biology Software Article

相似文献

4.

CHANCE: comprehensive software for quality control and validation of ChIP-seq data

Aaron Diaz Abhinav Nellore Jun S Song 《Genome biology》2012,13(10):R98

ChIP-seq is a powerful method for obtaining genome-wide maps of protein-DNA interactions and epigenetic modifications. CHANCE (CHip-seq ANalytics and Confidence Estimation) is a standalone package for ChIP-seq quality control and protocol optimization. Our user-friendly graphical software quickly estimates the strength and quality of immunoprecipitations, identifies biases, compares the user''s data with ENCODE''s large collection of published datasets, performs multi-sample normalization, checks against quantitative PCR-validated control regions, and produces informative graphical reports. CHANCE is available at https://github.com/songlab/chance. 相似文献

5.

Annotating RNA motifs in sequences and alignments

Paul P. Gardner Hisham Eldai 《Nucleic acids research》2015,43(2):691-698

相似文献

6.

Learning,visualizing and exploring 16S rRNA structure using an attention-based deep neural network

Zhengqiao Zhao Stephen Woloszynek Felix Agbavor Joshua Chang Mell Bahrad A. Sokhansanj Gail L. Rosen 《PLoS computational biology》2021,17(9)

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool). 相似文献

7.

OKseqHMM: a genome-wide replication fork directionality analysis toolkit

Yaqun Liu Xia Wu Yves d&#x;Aubenton-Carafa Claude Thermes Chun-Long Chen 《Nucleic acids research》2023,51(4):e22

相似文献

8.

POMAShiny: A user-friendly web-based workflow for metabolomics and proteomics data analysis

Pol Castellano-Escuder Raúl Gonzlez-Domínguez Francesc Carmona-Pontaque Cristina Andrs-Lacueva Alex Snchez-Pla 《PLoS computational biology》2021,17(7)

Metabolomics and proteomics, like other omics domains, usually face a data mining challenge in providing an understandable output to advance in biomarker discovery and precision medicine. Often, statistical analysis is one of the most difficult challenges and it is critical in the subsequent biological interpretation of the results. Because of this, combined with the computational programming skills needed for this type of analysis, several bioinformatic tools aimed at simplifying metabolomics and proteomics data analysis have emerged. However, sometimes the analysis is still limited to a few hidebound statistical methods and to data sets with limited flexibility. POMAShiny is a web-based tool that provides a structured, flexible and user-friendly workflow for the visualization, exploration and statistical analysis of metabolomics and proteomics data. This tool integrates several statistical methods, some of them widely used in other types of omics, and it is based on the POMA R/Bioconductor package, which increases the reproducibility and flexibility of analyses outside the web environment. POMAShiny and POMA are both freely available at https://github.com/nutrimetabolomics/POMAShiny and https://github.com/nutrimetabolomics/POMA, respectively. 相似文献

9.

Exome-Wide Somatic Microsatellite Variation Is Altered in Cells with DNA Repair Deficiencies

Zalman Vaksman Natalie C. Fonville Hongseok Tae Harold R. Garner 《PloS one》2014,9(11)

Microsatellites (MST), tandem repeats of 1–6 nucleotide motifs, are mutational hot-spots with a bias for insertions and deletions (INDELs) rather than single nucleotide polymorphisms (SNPs). The majority of MST instability studies are limited to a small number of loci, the Bethesda markers, which are only informative for a subset of colorectal cancers. In this paper we evaluate non-haplotype alleles present within next-gen sequencing data to evaluate somatic MST variation (SMV) within DNA repair proficient and DNA repair defective cell lines. We confirm that alleles present within next-gen data that do not contribute to the haplotype can be reliably quantified and utilized to evaluate the SMV without requiring comparisons of matched samples. We observed that SMV patterns found in DNA repair proficient cell lines without DNA repair defects, MCF10A, HEK293 and PD20 RV:D2, had consistent patterns among samples. Further, we were able to confirm that changes in SMV patterns in cell lines lacking functional BRCA2, FANCD2 and mismatch repair were consistent with the different pathways perturbed. Using this new exome sequencing analysis approach we show that DNA instability can be identified in a sample and that patterns of instability vary depending on the impaired DNA repair mechanism, and that genes harboring minor alleles are strongly associated with cancer pathways. The MST Minor Allele Caller used for this study is available at https://github.com/zalmanv/MST_minor_allele_caller. 相似文献

10.

Wham: Identifying Structural Variants of Biological Consequence

Zev N. Kronenberg Edward J. Osborne Kelsey R. Cone Brett J. Kennedy Eric T. Domyan Michael D. Shapiro Nels C. Elde Mark Yandell 《PLoS computational biology》2015,11(12)

Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools–Lumpy, Delly and SoftSearch–and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from github (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.

This is PLOS Computational Biology software paper.

相似文献

11.

HAlign 3: Fast Multiple Alignment of Ultra-Large Numbers of Similar DNA/RNA Sequences

Furong Tang Jiannan Chao Yanming Wei Fenglong Yang Yixiao Zhai Lei Xu Quan Zou 《Molecular biology and evolution》2022,39(8)

HAlign is a cross-platform program that performs multiple sequence alignments based on the center star strategy. Here we present two major updates of HAlign 3, which helped improve the time efficiency and the alignment quality, and made HAlign 3 a specialized program to process ultra-large numbers of similar DNA/RNA sequences, such as closely related viral or prokaryotic genomes. HAlign 3 can be easily installed via the Anaconda and Java release package on macOS, Linux, Windows subsystem for Linux, and Windows systems, and the source code is available on GitHub (https://github.com/malabz/HAlign-3). 相似文献

12.

MixMir: microRNA motif discovery from gene expression data using mixed linear models

Liyang Diao Antoine Marcais Scott Norton Kevin C. Chen 《Nucleic acids research》2014,42(17):e135

相似文献

13.

DNA G-quadruplexes for native mass spectrometry in potassium: a database of validated structures in electrospray-compatible conditions

Anirban Ghosh Eric Largy Valrie Gabelica 《Nucleic acids research》2021,49(4):2333

G-quadruplex DNA structures have become attractive drug targets, and native mass spectrometry can provide detailed characterization of drug binding stoichiometry and affinity, potentially at high throughput. However, the G-quadruplex DNA polymorphism poses problems for interpreting ligand screening assays. In order to establish standardized MS-based screening assays, we studied 28 sequences with documented NMR structures in (usually ∼100 mM) potassium, and report here their circular dichroism (CD), melting temperature (T_m), NMR spectra and electrospray mass spectra in 1 mM KCl/100 mM trimethylammonium acetate. Based on these results, we make a short-list of sequences that adopt the same structure in the MS assay as reported by NMR, and provide recommendations on using them for MS-based assays. We also built an R-based open-source application to build and consult a database, wherein further sequences can be incorporated in the future. The application handles automatically most of the data processing, and allows generating custom figures and reports. The database is included in the g4dbr package (https://github.com/EricLarG4/g4dbr) and can be explored online (https://ericlarg4.github.io/G4_database.html). 相似文献

14.

NGS-QCbox and Raspberry for Parallel,Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data

Mohan A. V. S. K. Katta Aamir W. Khan Dadakhalandar Doddamani Mahendar Thudi Rajeev K. Varshney 《PloS one》2015,10(10)

相似文献

15.

HiCPlotter integrates genomic data with interaction matrices

Kadir Caner Akdemir Lynda Chin 《Genome biology》2015,16(1)

相似文献

16.

tidy tree: A New Layout for Phylogenetic Trees

Simon Penel Damien M de Vienne 《Molecular biology and evolution》2022,39(10)

Many layouts exist for visualizing phylogenetic trees, allowing to display the same information (evolutionary relationships) in different ways. For large phylogenies, the choice of the layout is a key element, because the printable area is limited, and because interactive on-screen visualizers can lead to unreadable phylogenetic relationships at high zoom levels. A visual inspection of available layouts for rooted trees reveals large empty areas that one may want to fill in order to use less drawing space and eventually gain readability. This can be achieved by using the nonlayered tidy tree layout algorithm that was proposed earlier but was never used in a phylogenetic context so far. Here, we present its implementation, and we demonstrate its advantages on simulated and biological data (the measles virus phylogeny). Our results call for the integration of this new layout in phylogenetic software. We implemented the nonlayered tidy tree layout in R language as a stand-alone function (available at https://github.com/damiendevienne/non-layered-tidy-trees), as an option in the tree plotting function of the R package ape, and in the recent tool for visualizing reconciled phylogenetic trees thirdkind (https://github.com/simonpenel/thirdkind/wiki). 相似文献

17.

UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation

Shaun D. Jackman Joerg Bohlmann ?nan? Birol 《PloS one》2015,10(5)

When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag sjackman/uniqtag. The R package is also available from CRAN: install.packages ("uniqtag"). Supplementary material and code to reproduce it is available at https://github.com/sjackman/uniqtag-paper. 相似文献

18.

A multi-objective genetic algorithm to find active modules in multiplex biological networks

Elva María Novoa-del-Toro Efrn Mezura-Montes Matthieu Vignes Morgane Trzol Frdrique Magdinier Laurent Tichit Anaïs Baudot 《PLoS computational biology》2021,17(8)

The identification of subnetworks of interest—or active modules—by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated to its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease.Availability: MOGAMUN is available at https://github.com/elvanov/MOGAMUN and as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/MOGAMUN.html. Contact: rf.uma-vinu@toduab.siana 相似文献

19.

BlastXtract2: Improving early exploration of (meta) genomes

Heleen de Weerd Bernd E van der Veen Marcus J Claesson 《Bioinformation》2015,11(4):173-175

To manage and intelligently mine the avalanche of genomic sequences intuitive and user-friendly graphical interfaces are required. Here we present BlastXtract2 which exclusively facilitates early exploration of un-annotated genomic and metagenomic sequences. Various formats of translated searches, including the commonly used BlastX, of multiple sequences against multiple protein databases can be uploaded to a relational database server, which can be accessed via a locally installed web-server. There, an intuitive GUI allows straightforward data-mining and enables quick detection of potential frameshifts and poorly sequenced or assembled regions, thereby contributing in making BlastXtract2 a unique and valuable tool for early exploration of (meta)genomic sequences.

Availability

Source code, documentation and an online demo version are available at https://github.com/ ClaessonLab/BlastXtract2 相似文献

20.

Detecting DNA Modifications from SMRT Sequencing Data by Modeling Sequence Context Dependence of Polymerase Kinetic

Zhixing Feng Gang Fang Jonas Korlach Tyson Clark Khai Luong Xuegong Zhang Wing Wong Eric Schadt 《PLoS computational biology》2013,9(3)

DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single molecule, real time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during the DNA synthesis reaction, allowing for the possibility of estimating the expected kinetic rate of the enzyme at the incorporation site using kinetic rate information collected from existing SMRT sequencing data (historical data) covering the same local sequence contexts of interest. We develop an Empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model could greatly increase DNA modification detection accuracy, and reduce requirement of control data coverage. For some DNA modifications that have a strong signal, a control sample is not even needed by using historical data as alternative to control. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in a R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch. 相似文献