共查询到20条相似文献,搜索用时 15 毫秒
1.
Background
Studies of intrinsically disordered proteins that lack a stable tertiary structure but still have important biological functions critically rely on computational methods that predict this property based on sequence information. Although a number of fairly successful models for prediction of protein disorder have been developed over the last decade, the quality of their predictions is limited by available cases of confirmed disorders.Results
To more reliably estimate protein disorder from protein sequences, an iterative algorithm is proposed that integrates predictions of multiple disorder models without relying on any protein sequences with confirmed disorder annotation. The iterative method alternately provides the maximum a posterior (MAP) estimation of disorder prediction and the maximum-likelihood (ML) estimation of quality of multiple disorder predictors. Experiments on data used at CASP7, CASP8, and CASP9 have shown the effectiveness of the proposed algorithm.Conclusions
The proposed algorithm can potentially be used to predict protein disorder and provide helpful suggestions on choosing suitable disorder predictors for unknown protein sequences.2.
Background
Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.Methods
A multitask learning framework for learning four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in the study. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3).Results
Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of selected features is demonstrated by the consistent performance in two multi-task learning methods.Conclusions
The selected features can be used to build efficient multitask classifiers with good performance, suggesting they are important to protein phosphorylation across 4 kinase families.3.
Background
Proteins play fundamental and crucial roles in nearly all biological processes, such as, enzymatic catalysis, signaling transduction, DNA and RNA synthesis, and embryonic development. It has been a long-standing goal in molecular biology to predict the tertiary structure of a protein from its primary amino acid sequence. From visual comparison, it was found that a 2D triangular lattice model can give a better structure modeling and prediction for proteins with short primary amino acid sequences.Methods
This paper proposes a hybrid of hill-climbing and genetic algorithm (HHGA) based on elite-based reproduction strategy for protein structure prediction on the 2D triangular lattice.Results
The simulation results show that the proposed HHGA can successfully deal with the protein structure prediction problems. Specifically, HHGA significantly outperforms conventional genetic algorithms and is comparable to the state-of-the-art method in terms of free energy.Conclusions
Thanks to the enhancement of local search on the global search, the proposed HHGA achieves promising results on the 2D triangular protein structure prediction problem. The satisfactory simulation results demonstrate the effectiveness of the proposed HHGA and the utility of the 2D triangular lattice model for protein structure prediction.4.
Background
The protein encoded by the gene ybgI was chosen as a target for a structural genomics project emphasizing the relation of protein structure to function.Results
The structure of the ybgI protein is a toroid composed of six polypeptide chains forming a trimer of dimers. Each polypeptide chain binds two metal ions on the inside of the toroid.Conclusion
The toroidal structure is comparable to that of some proteins that are involved in DNA metabolism. The di-nuclear metal site could imply that the specific function of this protein is as a hydrolase-oxidase enzyme.5.
Background
Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of the known organisms. A large portion of sequencing sequences may be classified as unknown, which greatly impairs our understanding of the whole sample.Result
Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition, and uses GPUs to accelerate its speed. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of MetaSUB Inter-City Challenge provided by CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.Conclusion
Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms.Reviewers
This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.6.
Background
Secondary structures form the scaffold of multiple sequence alignment of non-coding RNA (ncRNA) families. An accurate reconstruction of ancestral ncRNAs must use this structural signal. However, the inference of ancestors of a single ncRNA family with a single consensus structure may bias the results towards sequences with high affinity to this structure, which are far from the true ancestors.Methods
In this paper, we introduce achARNement, a maximum parsimony approach that, given two alignments of homologous ncRNA families with consensus secondary structures and a phylogenetic tree, simultaneously calculates ancestral RNA sequences for these two families.Results
We test our methodology on simulated data sets, and show that achARNement outperforms classical maximum parsimony approaches in terms of accuracy, but also reduces by several orders of magnitude the number of candidate sequences. To conclude this study, we apply our algorithms on the Glm clan and the FinP-traJ clan from the Rfam database.Conclusions
Our results show that our methods reconstruct small sets of high-quality candidate ancestors with better agreement to the two target structures than with classical approaches. Our program is freely available at: http://csb.cs.mcgill.ca/acharnement.7.
8.
Background
For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure.Results
We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.Conclusion
RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences.9.
Ronesh Sharma Shiu Kumar Tatsuhiko Tsunoda Ashwini Patil Alok Sharma 《BMC bioinformatics》2016,17(19):504
Background
Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs in IDPs using computational methods is a challenging task.Methods
In this study, we introduce hidden Markov model (HMM) profiles to accurately identify the location of MoRFs in disordered protein sequences. Using windowing technique, HMM profiles are utilised to extract features from protein sequences and support vector machines (SVM) are used to calculate a propensity score for each residue. Two different SVM kernels with high noise tolerance are evaluated with a varying window size and the scores of the SVM models are combined to generate the final propensity score to predict MoRF residues. The SVM models are designed to extract maximal information between MoRF residues, its neighboring regions (Flanks) and the remainder of the sequence (Others).Results
To evaluate the proposed method, its performance was compared to that of other MoRF predictors; MoRFpred and ANCHOR. The results show that the proposed method outperforms these two predictors.Conclusions
Using HMM profile as a source of feature extraction, the proposed method indicates improvement in predicting MoRFs in disordered protein sequences.10.
Introduction
Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.Objectives
Assessment of the quality of raw data processing in untargeted metabolomics.Methods
Five published untargeted metabolomics studies, were reanalyzed.Results
Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.Conclusion
Incomplete raw data processing shows unexplored potential of current and legacy data.11.
Background
Chicken anemia virus (CAV) is the causative agent of chicken infectious anemia. CAV putative intergenotypic recombinants have been reported previously. This fact is based on the previous classification of CAV sequences into three genotypes. However, it is unknown whether intersubtype recombination occurs between the recently reported four CAV genotypes and five subtypes of genome sequences.Results
Phylogenetic analysis, together with a variety of computational recombination detection algorithms, was used to investigate CAV approximately full genomes. Statistically significant evidence of intersubtype recombination was detected in the parent-like and two putative CAV recombinant sequences. This event was shown to occur between CAV subgroup A1 and A2 sequences in the phylogenetic trees.Conclusions
We revealed that intersubtype recombination in CAV genome sequences played a role in generating genetic diversity within the natural population of CAV.12.
Olesya I. Klimchuk Kirill A. Konovalov Vadim V. Perekhvatov Konstantin V. Skulachev Daria V. Dibrova Armen Y. Mulkidjanian 《Biology direct》2017,12(1):26
Background
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families.Aim
We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server.Results
Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002).Reviewers
This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.13.
Background
Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder are achieving per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in human genome we observed a big difference in predicted disorder content between confirmed and putative human proteins. We investigated a hypothesis that this discrepancy is not correct, and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which lead to high predicted disorder content.Methods
To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic errors of gene finding algorithms. We developed a procedure to create synthetic peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment.Results
Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins.Conclusions
Our findings provide the first evidence that current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.14.
Background
Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus shifted from sequences to structures. The area of structure similarity forms a big challenge since even no standard definition of optimal structure similarity exists in the field.Results
We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity in the process of feature extraction. SProt’s features are based on spherical spatial neighborhood of amino acids where similarity can be well-defined. On top of the partial local similarities, global measure assessing similarity to a pair of protein structures is built. Finally, indexing is applied making the search process by an order of magnitude faster.Conclusions
The proposed method outperforms other methods in classification accuracy on SCOP superfamily and fold level, while it is at least comparable to the best existing solutions in terms of precision-recall or quality of alignment.15.
Dimitrios J. Floros Paul R. Jensen Pieter C. Dorrestein Nobuhiro Koyama 《Metabolomics : Official journal of the Metabolomic Society》2016,12(9):145
Introduction
Natural products from culture collections have enormous impact in advancing discovery programs for metabolites of biotechnological importance. These discovery efforts rely on the metabolomic characterization of strain collections.Objective
Many emerging approaches compare metabolomic profiles of such collections, but few enable the analysis and prioritization of thousands of samples from diverse organisms while delivering chemistry specific read outs.Method
In this work we utilize untargeted LC–MS/MS based metabolomics together with molecular networking to inventory the chemistries associated with 1000 marine microorganisms.Result
This approach annotated 76 molecular families (a spectral match rate of 28 %), including clinically and biotechnologically important molecules such as valinomycin, actinomycin D, and desferrioxamine E. Targeting a molecular family produced primarily by one microorganism led to the isolation and structure elucidation of two new molecules designated maridric acids A and B.Conclusion
Molecular networking guided exploration of large culture collections allows for rapid dereplication of know molecules and can highlight producers of uniques metabolites. These methods, together with large culture collections and growing databases, allow for data driven strain prioritization with a focus on novel chemistries.16.
Background
The DNase I hypersensitive sites (DHSs) are associated with the cis-regulatory DNA elements. An efficient method of identifying DHSs can enhance the understanding on the accessibility of chromatin. Despite a multitude of resources available on line including experimental datasets and computational tools, the complex language of DHSs remains incompletely understood.Methods
Here, we address this challenge using an approach based on a state-of-the-art machine learning method. We present a novel convolutional neural network (CNN) which combined Inception like networks with a gating mechanism for the response of multiple patterns and longterm association in DNA sequences to predict multi-scale DHSs in Arabidopsis, rice and Homo sapiens.Results
Our method obtains 0.961 area under curve (AUC) on Arabidopsis, 0.969 AUC on rice and 0.918 AUC on Homo sapiens.Conclusions
Our method provides an efficient and accurate way to identify multi-scale DHSs sequences by deep learning.17.
Anna Skorczyk-Werner Anna Wawrocka Natalia Kochalska Maciej Robert Krawczynski 《Orphanet journal of rare diseases》2018,13(1):221
Background
Choroideremia (CHM) is a rare X-linked recessive retinal dystrophy characterized by progressive chorioretinal degeneration in the males affected. The symptoms include night blindness in childhood, progressive peripheral vision loss and total blindness in the late stages. The disease is caused by mutations in the CHM gene encoding Rab Escort Protein 1 (REP-1). The aim of the study was to identify the molecular basis of choroideremia in five families of Polish origin.Methods
Six male patients from five unrelated families of Polish ethnicity, who were clinically diagnosed with choroideremia, were examined in this study. An ophthalmologic examination performed in all the probands included: best-corrected visual acuity, slit-lamp examination, funduscopy, fluorescein angiography and perimetry. The entire coding region encompassing 15 exons and the flanking intronic sequences of the CHM gene were amplified with PCR and directly sequenced in all the patients.Results
Five variants in the CHM gene were identified in the five families examined. Two of the variants were new: c.1175dupT and c.83C?>?G, while three had been previously reported.Conclusions
This study provides the first molecular genetic characteristics of patients with choroideremia from the previously unexplored Polish population.18.
N. Cesbron A.-L. Royer Y. Guitton A. Sydor B. Le Bizec G. Dervilly-Pinel 《Metabolomics : Official journal of the Metabolomic Society》2017,13(8):99
Introduction
Collecting feces is easy. It offers direct outcome to endogenous and microbial metabolites.Objectives
In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.Methods
The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.Results
A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a step filtration (10 kDa) was developed.Conclusion
The workflow generated repeatable and informative fingerprints for robust metabolome characterization.19.
Rachel A. Spicer Christoph Steinbeck 《Metabolomics : Official journal of the Metabolomic Society》2018,14(1):16
Introduction
Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.Objectives
(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.Methods
A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.Results
Journals that support data sharing are not necessarily those with the most papers associated to open metabolomics data.Conclusion
Further efforts are required to improve data sharing in metabolomics.20.