Found 20 similar articles (search time: 15 ms)
1.
Marcin J. Skwark, Daniele Raimondi, Mirco Michel, Arne Elofsson 《PLoS computational biology》2014,10(11)
Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but follow precise rules governed by the structure of the protein and are thus interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independent of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is superior to state-of-the-art machine learning methods for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
2.
In this paper, a particular type of spiking neural network, a Polychronous Spiking Network, is applied to financial time series prediction. It is argued that the inherent temporal capabilities of this type of network are suited to non-stationary data such as this. The performance of the spiking neural network was benchmarked against three systems: two “traditional” rate-encoded neural networks (a Multi-Layer Perceptron neural network and a Dynamic Ridge Polynomial neural network) and a standard Linear Predictor Coefficients model. For this comparison, three non-stationary and noisy time series were used: IBM stock data, US/Euro exchange rate data, and the price of Brent crude oil. The experiments demonstrated favourable prediction results for the Spiking Neural Network in terms of Annualised Return and prediction error for 5-step-ahead predictions. These results were also supported by other relevant metrics such as Maximum Drawdown and Signal-To-Noise ratio. This work demonstrated the applicability of the Polychronous Spiking Network to financial data forecasting, which in turn indicates the potential of using such networks over traditional systems in difficult-to-manage non-stationary environments.
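Maximum Drawdown, one of the metrics named above, can be illustrated with a short sketch. The abstract does not say which definition the authors used; the version below assumes one common definition (largest peak-to-trough decline of an equity curve, as a fraction of the peak), and the function name and sample values are hypothetical.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak.

    Assumption: `equity` is a non-empty sequence of strictly positive values
    (for example, cumulative portfolio value per time step).
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        if value > peak:
            peak = value          # new running maximum
        drawdown = (peak - value) / peak
        if drawdown > worst:
            worst = drawdown      # deepest decline seen so far
    return worst


if __name__ == "__main__":
    # Hypothetical equity curve, not data from the paper.
    print(max_drawdown([100, 120, 90, 110, 80, 130]))
```

A smaller value indicates a forecasting strategy whose losses from any previous high were shallower.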
3.
Contacts play a fundamental role in the study of protein structure and folding problems. The contact map of a protein can be represented by arranging its amino acids on a horizontal line and drawing an arc between two residues if they form a contact. In this paper, we are mainly concerned with the combinatorial enumeration of the arcs in m-regular linear stacks, an elementary structure of the protein contact map, which was introduced by Chen et al. (J Comput Biol 21(12):915–935, 2014). We modify the generating function for m-regular linear stacks by introducing a new variable y that marks the number of arcs, and obtain an equation satisfied by the generating function for m-regular linear stacks with n vertices and k arcs. Consequently, we also derive an equation satisfied by the generating function of the overall number of arcs in m-regular linear stacks with n vertices.
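The arc-diagram view of a contact map described above can be sketched in code. This is only an illustration of the representation, not the enumeration method of the paper. It assumes one 3D coordinate per residue and a hypothetical 8 Å distance cutoff for calling a contact (neither is stated in the abstract), and it keeps only arcs of length at least m.

```python
from math import dist  # Python 3.8+


def contact_arcs(coords, cutoff=8.0, m=2):
    """Return arcs (i, j), i < j, for residue pairs that are in contact.

    Assumptions (not from the abstract): `coords` holds one (x, y, z) point
    per residue, two residues are "in contact" when their distance is at
    most `cutoff`, and only arcs with length j - i >= m are kept.
    """
    arcs = []
    for i in range(len(coords)):
        for j in range(i + m, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                arcs.append((i, j))
    return arcs


if __name__ == "__main__":
    # Four residues on a straight line, 3.8 units apart (hypothetical data).
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    arcs = contact_arcs(chain)
    print(arcs, "number of arcs:", len(arcs))
```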
4.
A protein fold can be viewed as a self-avoiding walk in a certain lattice model, and its contact map is a graph that represents the patterns of contacts in the fold. Goldman, Istrail, and Papadimitriou showed that a contact map in the 2D square lattice can be decomposed into at most two stacks and one queue. In the terminology of combinatorics, stacks and queues are noncrossing and nonnesting partitions, respectively. In this paper, we are concerned with 2-regular and 3-regular simple queues, for which the degree of each vertex is at most one and the arc lengths are at least 2 and 3, respectively. We show that 2-regular simple queues are in one-to-one correspondence with hill-free Motzkin paths, which have been enumerated by Barcucci, Pergola, Pinzani, and Rinaldi using the Enumerating Combinatorial Objects method. We derive a recurrence relation for the generating function of Motzkin paths with \(k_i\) peaks at level i, which reduces to the generating function for hill-free Motzkin paths. Moreover, we show that 3-regular simple queues are in one-to-one correspondence with Motzkin paths avoiding certain patterns. We then obtain a formula for the generating function of 3-regular simple queues. Asymptotic formulas for 2-regular and 3-regular simple queues are derived from the generating functions.
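The definitions in this abstract (noncrossing stacks, nonnesting queues, vertex degree at most one, minimum arc length m) translate directly into small predicates on a list of arcs. The sketch below is illustrative only: the function names are hypothetical, arcs are assumed to be pairs (i, j) with i < j, and crossing and nesting are tested with strict inequalities.

```python
from itertools import combinations


def is_simple(arcs):
    """True if every vertex is an endpoint of at most one arc."""
    endpoints = [v for arc in arcs for v in arc]
    return len(endpoints) == len(set(endpoints))


def is_m_regular(arcs, m):
    """True if every arc (i, j) has length j - i >= m."""
    return all(j - i >= m for i, j in arcs)


def has_crossing(arcs):
    """True if two arcs (i, j), (k, l) satisfy i < k < j < l."""
    return any(i < k < j < l
               for (i, j), (k, l) in combinations(sorted(arcs), 2))


def has_nesting(arcs):
    """True if two arcs (i, j), (k, l) satisfy i < k < l < j."""
    return any(i < k < l < j
               for (i, j), (k, l) in combinations(sorted(arcs), 2))


def is_m_regular_simple_queue(arcs, m):
    """Nonnesting, degree at most one, arc length at least m."""
    return is_simple(arcs) and is_m_regular(arcs, m) and not has_nesting(arcs)


if __name__ == "__main__":
    print(is_m_regular_simple_queue([(1, 3), (2, 4)], 2))  # crossing arcs are allowed in a queue
    print(is_m_regular_simple_queue([(1, 6), (2, 5)], 2))  # nested arcs are not
```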
5.
Signal peptide identification is of immense importance in drug design. Accurate identification of signal peptides is the first critical step in redirecting targeting proteins so that a designed drug reaches a specific organelle to correct a defect. Because experimental identification, although the most accurate method, is expensive and time-consuming, an efficient and affordable automated system is of great interest. In this article, we propose using an adapted neural network, called a bio-basis function neural network, and decision trees for predicting signal peptides. The bio-basis function neural network model and decision trees achieved 97.16% and 97.63% accuracy, respectively, demonstrating that the methods work well for the prediction of signal peptides. Moreover, decision trees revealed that position P(1'), which is important in forming signal peptides, most commonly comprises either leucine or alanine. This concurs with the (P(3)-P(1)-P(1')) coupling model.
6.
We undertook this project in response to the rapidly increasing number of protein structures with unknown functions in the Protein Data Bank. Here, we combined a genetic algorithm with a support vector machine to predict protein–protein binding sites. In an experiment on a testing dataset, we predicted the binding sites for 66% of our datasets, made up of 50 testing hetero-complexes. This classifier achieved greater sensitivity (60.17%), specificity (58.17%), accuracy (64.08%), and F-measure (54.79%), and a higher correlation coefficient (0.2502) than those of the support vector machine. This result can be used to guide biologists in designing specific experiments for protein analysis.
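The evaluation measures listed above are all functions of a binary confusion matrix. A minimal sketch follows, assuming the standard definitions and taking the "correlation coefficient" to be the Matthews correlation coefficient (the abstract does not say which coefficient was used). The counts in the example are hypothetical, not the paper's data.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator below is non-zero.
    """
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for interface vs. non-interface residues.
    for name, value in binary_metrics(tp=6, fp=3, tn=7, fn=4).items():
        print(f"{name}: {value:.4f}")
```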
7.
The Protein Journal - Three-dimensional protein structure prediction is one of the major challenges in bioinformatics. According to recent research findings, real-valued distance prediction plays a...
8.
9.
A major focus of systems biology is to characterize interactions between cellular components, in order to develop an accurate picture of the intricate networks within biological systems. Over the past decade, protein microarrays have greatly contributed to advances in proteomics and are becoming an important platform for systems biology. Protein microarrays are highly flexible, ranging from large-scale proteome microarrays to smaller customizable microarrays, making the technology amenable to detection of a broad spectrum of biochemical properties of proteins. In this article, we focus on the numerous studies that have utilized protein microarrays to reconstruct biological networks, including protein-DNA interactions, posttranslational protein modifications (PTMs), lectin-glycan recognition, pathogen-host interactions and hierarchical signaling cascades. The diversity in applications allows for integration of interaction data from numerous molecular classes and cellular states, providing insight into the structure of complex biological systems. We also discuss emerging applications and future directions of protein microarray technology in the global frontier.
10.
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information (evolutionary and physicochemical), we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.
11.
MOTIVATION: Starting from linear chains of amino acids, the spontaneous folding of proteins into their elaborate 3D structures is one of the remarkable examples of biological self-organization. We investigated native state structures of 30 single-domain, two-state proteins from a complex-networks perspective, to understand the role of topological parameters in proteins' folding kinetics at two length scales: as 'Protein Contact Networks (PCNs)' and as their corresponding 'Long-range Interaction Networks (LINs)', constructed by ignoring the short-range interactions. RESULTS: Our results show that both PCNs and LINs exhibit the exceptional topological property of 'assortative mixing', which is absent in all other biological and technological networks studied so far. We show that the degree distribution of these contact networks is partly responsible for the observed assortativity. The coefficient of assortativity also shows a positive correlation with the rate of protein folding at both short- and long-contact scales, whereas the clustering coefficients of only the LINs exhibit a negative correlation. The results indicate that the general topological parameters of these naturally evolved protein networks can effectively represent the structural and functional properties required for fast information transfer among the residues, facilitating biochemical/kinetic functions such as allostery, stability and the rate of folding. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
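The 'assortative mixing' discussed above is commonly quantified by the degree assortativity coefficient, that is, the Pearson correlation between the degrees of the two endpoints of each edge. The sketch below assumes that definition, since the abstract does not spell out the formula, and runs on a toy graph rather than a real protein contact network.

```python
def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over all edges.

    Assumptions: `edges` is a non-empty list of undirected edges (u, v)
    without self-loops or duplicates. Each edge is counted in both
    directions so the measure is symmetric.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    mean = sum(xs) / len(xs)  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("all endpoint degrees are equal; coefficient is undefined")
    return cov / var


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]   # hub connected to leaves: disassortative
    path = [(0, 1), (1, 2), (2, 3)]
    print(degree_assortativity(star), degree_assortativity(path))
```

Positive values mean that high-degree residues tend to contact other high-degree residues, and negative values mean the opposite.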
12.
We describe a method based on Rosetta structure refinement for generating high-resolution, all-atom protein models from electron cryomicroscopy density maps. A local measure of the fit of a model to the density is used to directly guide structure refinement and to identify regions incompatible with the density that are then targeted for extensive rebuilding. Over a range of test cases using both simulated and experimentally generated data, the method consistently increases the accuracy of starting models generated either by comparative modeling or by hand-tracing the density. The method can achieve near-atomic resolution starting from density maps at 4-6 Å resolution.
13.
Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many methods for predicting protein complexes from protein-protein interactions have been developed, such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt only with complexes of size more than three because they are often based on some measure of subgraph density. However, heterodimeric protein complexes, which consist of two distinct proteins, make up a large fraction of known complexes according to several comprehensive databases. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, a domain composition kernel, and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. The results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.
14.
Misha B. Ahrens 《Current biology : CB》2019,29(21):R1138-R1140
15.
16.
17.
Mengfei Cao, Hao Zhang, Jisoo Park, Noah M. Daniels, Mark E. Crovella, Lenore J. Cowen, Benjamin Hescott 《PloS one》2013,8(10)
In protein-protein interaction (PPI) networks, functional similarity is often inferred based on the function of directly interacting proteins or, more generally, on some notion of interaction network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as the shortest-path distance in the network, but this has only a limited ability to capture fine-grained neighborhood distinctions, because most proteins are close to each other and there are many ties in proximity. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for transfer of functional annotation in PPI networks. We present a tool that, given a PPI network as input, outputs the DSD distances between every pair of proteins. We show that replacing the shortest-path metric by DSD improves the performance of classical function prediction methods across the board.
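The abstract does not define DSD beyond "a graph diffusion property". The sketch below therefore rests on an assumed definition, taken from a reading of the cited paper and best checked against it: for each node, record the expected number of visits to every node by a k-step random walk starting there, and define the distance between two nodes as the L1 distance between their visit vectors. The function names, the value of k, and the toy graph are all hypothetical. This is not the authors' tool.

```python
def expected_visits(adj, k):
    """For each start node, expected visit counts of a k-step random walk.

    Assumptions: `adj` maps each node to a non-empty list of neighbours
    (undirected, connected graph); step 0 counts as a visit to the start.
    """
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    result = {}
    for start in nodes:
        current = [0.0] * len(nodes)
        current[index[start]] = 1.0
        total = current[:]
        for _ in range(k):
            nxt = [0.0] * len(nodes)
            for u in nodes:
                share = current[index[u]] / len(adj[u])  # uniform step to a neighbour
                for w in adj[u]:
                    nxt[index[w]] += share
            current = nxt
            total = [a + b for a, b in zip(total, current)]
        result[start] = total
    return result


def diffusion_state_distance(adj, k=5):
    """L1 distance between expected-visit vectors for every node pair."""
    visits = expected_visits(adj, k)
    return {
        (u, v): sum(abs(a - b) for a, b in zip(visits[u], visits[v]))
        for u in visits
        for v in visits
    }


if __name__ == "__main__":
    # Toy path graph a - b - c, not a real PPI network.
    toy = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(diffusion_state_distance(toy, k=1))
```

Unlike shortest-path distance, which takes only small integer values and produces many ties, this quantity is real-valued, which is the kind of finer-grained distinction the abstract describes.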
18.
An effective forecasting model for short-term load plays a significant role in promoting the management efficiency of an electric power system. This paper proposes a new forecasting model based on improved neural networks with random weights (INNRW). The key is to introduce a weighting technique to the inputs of the model and use a novel neural network to forecast the daily maximum load. Eight factors are selected as the inputs. A mutual information weighting algorithm is then used to allocate different weights to the inputs. A neural network with random weights and kernels (KNNRW) is applied to approximate the nonlinear function between the selected inputs and the daily maximum load, owing to its fast learning speed and good generalization performance. In an application to daily load data from Dalian, the result of the proposed INNRW is compared with several previously developed forecasting models. The simulation experiment shows that the proposed model performs the best overall in short-term load forecasting.
19.
《Genomics, Proteomics & Bioinformatics》2019,17(6):645-656
Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted by SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRF prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.
20.
《IRBM》2022,43(5):422-433
Background: Electrocardiogram (ECG) is a method of recording the electrical activity of the heart, and it provides a diagnostic means for heart-related diseases. Arrhythmia is any irregularity of the heartbeat that causes an abnormality in the heart rhythm. Early detection of arrhythmia is of great importance for preventing many diseases. Manual analysis of ECG recordings is not practical for quickly identifying arrhythmias that may cause sudden death. Hence, many studies have been presented to develop computer-aided-diagnosis (CAD) systems to automatically identify arrhythmias. Methods: This paper proposes a novel deep learning approach to identify arrhythmias in ECG signals. The proposed approach identifies arrhythmia classes using a Convolutional Neural Network (CNN) trained on two-dimensional (2D) ECG beat images. First, ECG signals covering 5 different arrhythmias are segmented into heartbeats, which are transformed into 2D grayscale images. Afterward, the images are used as input for training a new CNN architecture to classify heartbeats. Results: The experimental results show that the classification performance of the proposed approach reaches an overall accuracy of 99.7%, sensitivity of 99.7%, and specificity of 99.22% in the classification of five different ECG arrhythmias. Further, the proposed CNN architecture is compared to other popular CNN architectures such as LeNet and ResNet-50 to evaluate the performance of the study. Conclusions: Test results demonstrate that the deep network trained on ECG images provides outstanding classification performance for arrhythmic ECG signals and outperforms similar network architectures. Moreover, the proposed method has lower computational costs compared to existing methods and is more suitable for mobile device-based diagnosis systems, as it does not involve any complex preprocessing. Hence, the proposed approach provides a simple and robust automatic cardiac arrhythmia detection scheme for the classification of ECG arrhythmias.
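The preprocessing step described in the Methods (segmenting the signal into heartbeats and turning each beat into a 2D grayscale image) can be sketched as follows. The abstract does not specify how beats are located or how they are rasterized, so everything here is an assumption: R-peak positions are taken as given, a fixed window is cut around each peak, and each sample is drawn as one white pixel on a black canvas. All function names are hypothetical.

```python
def segment_beats(signal, r_peaks, half_window):
    """Cut a fixed window of samples around each (assumed known) R-peak index.

    Peaks too close to either end of the signal are skipped.
    """
    beats = []
    for peak in r_peaks:
        start, stop = peak - half_window, peak + half_window + 1
        if start >= 0 and stop <= len(signal):
            beats.append(signal[start:stop])
    return beats


def beat_to_image(beat, height=32):
    """Rasterize a 1D beat into a height x len(beat) grayscale image.

    Each column holds one white pixel (255) whose row encodes the sample
    amplitude; the largest amplitude maps to the top row.
    """
    low, high = min(beat), max(beat)
    span = (high - low) or 1.0          # avoid division by zero for flat beats
    image = [[0] * len(beat) for _ in range(height)]
    for col, value in enumerate(beat):
        row = round((high - value) / span * (height - 1))
        image[row][col] = 255
    return image


if __name__ == "__main__":
    # Hypothetical signal and peak positions, not real ECG data.
    signal = [0.0, 0.1, 0.9, 0.2, 0.0, 0.1, 1.0, 0.3, 0.0, 0.0]
    for beat in segment_beats(signal, r_peaks=[2, 6], half_window=2):
        for row in beat_to_image(beat, height=4):
            print(row)
        print()
```

Images produced this way could then be stacked into a training set for a CNN classifier; the network itself is outside the scope of this sketch.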