共查询到20条相似文献,搜索用时 15 毫秒
1.
Background
The fundamental challenge in optimally aligning homologous sequences is to define a scoring scheme that best reflects the underlying biological processes. Maximising the overall number of matches in the alignment does not always reflect the patterns by which nucleotides mutate. Efficiently implemented algorithms that can be parameterised to accommodate more complex non-linear scoring schemes are thus desirable.Results
We present Cola, alignment software that implements different optimal alignment algorithms, also allowing for scoring contiguous matches of nucleotides in a nonlinear manner. The latter places more emphasis on short, highly conserved motifs, and less on the surrounding nucleotides, which can be more diverged. To illustrate the differences, we report results from aligning 14,100 sequences from 3' untranslated regions of human genes to 25 of their mammalian counterparts, where we found that a nonlinear scoring scheme is more consistent than a linear scheme in detecting short, conserved motifs.Conclusions
Cola is freely available under LPGL from https://github.com/nedaz/cola.2.
Background
Aligning protein-protein interaction (PPI) networks is very important to discover the functionally conserved sub-structures between different species. In recent years, the global PPI network alignment problem has been extensively studied aiming at finding the one-to-one alignment with the maximum matching score. However, finding large conserved components remains challenging due to its NP-hardness.Results
We propose a new graph matching method GMAlign for global PPI network alignment. It first selects some pairs of important proteins as seeds, followed by a gradual expansion to obtain an initial matching, and then it refines the current result to obtain an optimal alignment result iteratively based on the vertex cover. We compare GMAlign with the state-of-the-art methods on the PPI network pairs obtained from the largest BioGRID dataset and validate its performance. The results show that our algorithm can produce larger size of alignment, and can find bigger and denser common connected subgraphs as well for the first time. Meanwhile, GMAlign can achieve high quality biological results, as measured by functional consistency and semantic similarity of the Gene Ontology terms. Moreover, we also show that GMAlign can achieve better results which are structurally and biologically meaningful in the detection of large conserved biological pathways between species.Conclusions
GMAlign is a novel global network alignment tool to discover large conserved functional components between PPI networks. It also has many potential biological applications such as conserved pathway and protein complex discovery across species. The GMAlign software and datasets are avaialbile at https://github.com/yzlwhu/GMAlign.3.
Background
Phylogenetic and population genetic studies often deal with multiple sequence alignments that require manipulation or processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years phylogenetic data sets have expanded from single genes to genome wide markers comprising hundreds to thousands of loci. Processing of these large phylogenomic data sets is impracticable without using automated process pipelines. Currently no stand-alone or pipeline compatible program exists that offers a broad range of manipulation and processing steps for multiple sequence alignments in a single process run.Results
Here we present FASconCAT-G, a system independent editor, which offers various processing options for multiple sequence alignments. The software provides a wide range of possibilities to edit and concatenate multiple nucleotide, amino acid, and structure sequence alignment files for phylogenetic and population genetic purposes. The main options include sequence renaming, file format conversion, sequence translation between nucleotide and amino acid states, consensus generation of specific sequence blocks, sequence concatenation, model selection of amino acid replacement with ProtTest, two types of RY coding as well as site exclusions and extraction of parsimony informative sites. Convieniently, most options can be invoked in combination and performed during a single process run. Additionally, FASconCAT-G prints useful information regarding alignment characteristics and editing processes such as base compositions of single in- and outfiles, sequence areas in a concatenated supermatrix, as well as paired stem and loop regions in secondary structure sequence strings.Conclusions
FASconCAT-G is a command-line driven Perl program that delivers computationally fast and user-friendly processing of multiple sequence alignments for phylogenetic and population genetic applications and is well suited for incorporation into analysis pipelines.4.
Michał Aleksander Ciach Anna Muszewska Paweł Górecki 《Algorithms for molecular biology : AMB》2018,13(1):11
Background
Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values.Results
We investigate the locus tree inference problem as a possible alternative that combines the advantages of both approaches. We present several algorithms to solve the problem in the parsimony framework. We introduce a novel tree mapping, which allows us to obtain a heuristic solution to the problems of locus tree inference and duplication classification.Conclusions
Our approach allows for faster comparisons of gene and species trees and improves known algorithms for duplication inference in the presence of polytomies in the species trees. We have implemented our algorithms in a software tool available at https://github.com/mciach/LocusTreeInference.5.
D. Jacob C. Deborde M. Lefebvre M. Maucourt A. Moing 《Metabolomics : Official journal of the Metabolomic Society》2017,13(4):36
Introduction
Concerning NMR-based metabolomics, 1D spectra processing often requires an expert eye for disentangling the intertwined peaks.Objectives
The objective of NMRProcFlow is to assist the expert in this task in the best way without requirement of programming skills.Methods
NMRProcFlow was developed to be a graphical and interactive 1D NMR (1H & 13C) spectra processing tool.Results
NMRProcFlow (http://nmrprocflow.org), dedicated to metabolic fingerprinting and targeted metabolomics, covers all spectra processing steps including baseline correction, chemical shift calibration and alignment.Conclusion
Biologists and NMR spectroscopists can easily interact and develop synergies by visualizing the NMR spectra along with their corresponding experimental-factor levels, thus setting a bridge between experimental design and subsequent statistical analyses.6.
7.
8.
Background
Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.Results
Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels (splids) that define an approximate bipartition of the taxon set. We show both in simulated data and in case studies on real-life data that splids can be efficiently extracted from phylogenomic data sets.Conclusions
Suitably processed gap patterns extracted from genome-wide alignment provide a surprisingly clear phylogenetic signal and an allow the inference of accurate phylogenetic trees.9.
Background
Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder are achieving per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in human genome we observed a big difference in predicted disorder content between confirmed and putative human proteins. We investigated a hypothesis that this discrepancy is not correct, and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which lead to high predicted disorder content.Methods
To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic errors of gene finding algorithms. We developed a procedure to create synthetic peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment.Results
Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins.Conclusions
Our findings provide the first evidence that current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.10.
Background
Miniature inverted-repeat transposable element (MITE) is a type of class II non-autonomous transposable element playing a crucial role in the process of evolution in biology. There is an urgent need to develop bioinformatics tools to effectively identify MITEs on a whole genome-wide scale. However, most of currently existing tools suffer from low ability to deal with large eukaryotic genomes.Methods
In this paper, we proposed a novel tool MiteFinderII, which was adapted from our previous algorithm MiteFinder, to efficiently detect MITEs from genomics sequences. It has six major steps: (1) build K-mer Index and search for inverted repeats; (2) filtration of inverted repeats with low complexity; (3) merger of inverted repeats; (4) filtration of candidates with low score; (5) selection of final MITE sequences; (6) selection of representative sequences.Results
To test the performance, MiteFinderII and three other existing algorithms were applied to identify MITEs on the whole genome of oryza sativa. Results suggest that MiteFinderII outperforms existing popular tools in terms of both specificity and recall. Additionally, it is much faster and more memory-efficient than other tools in the detection.Conclusion
MiteFinderII is an accurate and effective tool to detect MITEs hidden in eukaryotic genomes. The source code is freely accessible at the website: https://github.com/screamer/miteFinder.11.
12.
Background
Today researchers can choose from many bioinformatics protocols for all types of life sciences research, computational environments and coding languages. Although the majority of these are open source, few of them possess all virtues to maximize reuse and promote reproducible science. Wikipedia has proven a great tool to disseminate information and enhance collaboration between users with varying expertise and background to author qualitative content via crowdsourcing. However, it remains an open question whether the wiki paradigm can be applied to bioinformatics protocols.Results
We piloted PyPedia, a wiki where each article is both implementation and documentation of a bioinformatics computational protocol in the python language. Hyperlinks within the wiki can be used to compose complex workflows and induce reuse. A RESTful API enables code execution outside the wiki. Initial content of PyPedia contains articles for population statistics, bioinformatics format conversions and genotype imputation. Use of the easy to learn wiki syntax effectively lowers the barriers to bring expert programmers and less computer savvy researchers on the same page.Conclusions
PyPedia demonstrates how wiki can provide a collaborative development, sharing and even execution environment for biologists and bioinformaticians that complement existing resources, useful for local and multi-center research teams.Availability
PyPedia is available online at: http://www.pypedia.com. The source code and installation instructions are available at: https://github.com/kantale/PyPedia_server. The PyPedia python library is available at: https://github.com/kantale/pypedia. PyPedia is open-source, available under the BSD 2-Clause License.13.
14.
Background
Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics.Results
We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy.Conclusion
HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.15.
Background
Flux balance analysis (FBA) is a widely-used method for analyzing metabolic networks. However, most existing tools that implement FBA require downloading software and writing code. Furthermore, FBA generates predictions for metabolic networks with thousands of components, so meaningful changes in FBA solutions can be difficult to identify. These challenges make it difficult for beginners to learn how FBA works.Results
To meet this need, we present Escher-FBA, a web application for interactive FBA simulations within a pathway visualization. Escher-FBA allows users to set flux bounds, knock out reactions, change objective functions, upload metabolic models, and generate high-quality figures without downloading software or writing code. We provide detailed instructions on how to use Escher-FBA to replicate several FBA simulations that generate real scientific hypotheses.Conclusions
We designed Escher-FBA to be as intuitive as possible so that users can quickly and easily understand the core concepts of FBA. The web application can be accessed at https://sbrg.github.io/escher-fba.16.
András Hartmann Ana Vila-Santa Nicolai Kallscheuer Michael Vogt Alice Julien-Laferrière Marie-France Sagot Jan Marienhagen Susana Vinga 《BMC systems biology》2017,11(1):143
Background
We propose OptPipe - a Pipeline for Optimizing Metabolic Engineering Targets, based on a consensus approach. The method generates consensus hypotheses for metabolic engineering applications by combining several optimization solutions obtained from distinct algorithms. The solutions are ranked according to several objectives, such as biomass and target production, by using the rank product tests corrected for multiple comparisons.Results
OptPipe was applied in a genome-scale model of Corynebacterium glutamicum for maximizing malonyl-CoA, which is a valuable precursor for many phenolic compounds. In vivo experimental validation confirmed increased malonyl-CoA level in case of ΔsdhCAB deletion, as predicted in silico.Conclusions
A method was developed to combine the optimization solutions provided by common knockout prediction procedures and rank the suggested mutants according to the expected growth rate, production and a new adaptability measure. The implementation of the pipeline along with the complete documentation is freely available at https://github.com/AndrasHartmann/OptPipe.17.
Background
Cryo-electron tomography emerges as an important component for structural system biology. It not only allows the structural characterization of macromolecular complexes, but also the detection of their cellular localizations in near living conditions. However, the method is hampered by low resolution, missing data and low signal-to-noise ratio (SNR). To overcome some of these difficulties and enhance the nominal resolution one can align and average a large set of subtomograms. Existing methods for obtaining the optimal alignments are mostly based on an exhaustive scanning of all but discrete relative rigid transformations (i.e. rotations and translations) of one subtomogram with respect to the other.Results
In this paper, we propose gradient-guided alignment methods based on two popular subtomogram similarity measures, a real space as well as a Fourier-space constrained score. We also propose a stochastic parallel refinement method that increases significantly the efficiency for the simultaneous refinement of a set of alignment candidates. We estimate that our stochastic parallel refinement is on average about 20 to 40 fold faster in comparison to the standard independent refinement approach. Results on simulated data of model complexes and experimental structures of protein complexes show that even for highly distorted subtomograms and with only a small number of very sparsely distributed initial alignment seeds, our combined methods can accurately recover true transformations with a substantially higher precision than the scanning based alignment methods.Conclusions
Our methods increase significantly the efficiency and accuracy for subtomogram alignments, which is a key factor for the systematic classification of macromolecular complexes in cryo-electron tomograms of whole cells.18.
Pathomwat Wongrattanakamon Vannajan Sanghiran Lee Piyarat Nimmanpipug Supat Jiranusornkul 《生物学前沿》2016,11(5):391-395
Background
P-glycoprotein (P-gp) is a 170-kDa membrane protein. It provides a barrier function and help to excrete toxins from the body as a transporter. Some bioflavonoids have been shown to block P-gp activity.Objective
To evaluate the important amino acid residues within nucleotide binding domain 1 (NBD1) of P-gp that play a key role in molecular interactions with flavonoids using structure-based pharmacophore model.Methods
In the molecular docking with NBD1 models, a putative binding site of flavonoids was proposed and compared with the site for ATP. The binding modes for ligands were achieved using LigandScout to generate the P-gp–flavonoid pharmacophore models.Results
The binding pocket for flavonoids was investigated and found these inhibitors compete with the ATP for binding site in NBD1 including the NBD1 amino acid residues identified by the in silico techniques to be involved in the hydrogen bonding and van der Waals (hydrophobic) interactions with flavonoids.Conclusion
These flavonoids occupy with the same binding site of ATP in NBD1 proffering that they may act as an ATP competitive inhibitor.19.
Background
Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.Results
For simulations SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.20.
Eriko Tomitsuka Katsura Igai Kiyoshi Tadokoro Ayako Morita Jun Baba Wataru Suda Andrew R. Greenhill Paul F. Horwood Kevin W. Soli Peter M. Siba Shingo Odani Kazumi Natsuhara Hidetoshi Morita Masahiro Umezaki 《Metabolomics : Official journal of the Metabolomic Society》2017,13(9):105