Similar Articles
2.
Cancer diagnosis is a complex process and, sometimes, specific markers can interfere or produce negative results. Thus, new simple and fast theoretical models are required. One option is complex network (graph) theory, which can describe any real system, from small molecules to complex genetic, neural or social networks, by transforming real properties into topological indices. This work converts protein primary structure data into specific Randić star network topological indices using the new Sequence to Star Networks (S2SNet) application. A set of 1054 proteins, related or unrelated to two types of cancer, human breast cancer (HBC) and human colon cancer (HCC), was selected from previous works. The general discriminant analysis method generated an input-coded multi-target classification model with training/prediction set accuracies of 90.0% for the forward stepwise model type. In addition, a protein subset was modified by single amino acid mutations with higher log-odds PAM250 values and tested with the new classification model to determine whether the mutants can be related to HBC or HCC. In conclusion, we have shown that, using simple input data such as the primary protein sequence and the simplest linear analysis, it is possible to obtain accurate classification models that can predict whether a new protein is related to these two types of cancer. These results promote the use of S2SNet in clinical proteomics.
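A minimal Python sketch of the idea behind star-network indices: a sequence is turned into a star graph and summarized by the Randić connectivity index, χ = Σ over edges (d_u·d_v)^(-1/2). The graph construction used here (a central node, one branch per amino-acid type chaining that type's residues in sequence order, optional sequence-order "embedded" edges) is an assumption for illustration only; it is not taken from the S2SNet source, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def star_graph_edges(seq, embedded=False):
    """Build a simplified star graph for a protein sequence (hypothetical construction).

    Node 0 is the centre; residue i of seq becomes node i + 1. Residues of the same
    amino-acid type form one branch, chained in sequence order, whose first residue is
    bonded to the centre. If embedded is True, consecutive residues are also linked.
    """
    edges = set()
    last_in_branch = {}  # amino-acid type -> last node added to its branch
    for i, aa in enumerate(seq.upper()):
        node = i + 1
        prev = last_in_branch.get(aa, 0)  # first occurrence attaches to the centre
        edges.add((prev, node))           # prev < node always holds here
        last_in_branch[aa] = node
    if embedded:
        for node in range(1, len(seq)):
            edges.add((node, node + 1))   # the set drops duplicates of branch edges
    return edges


def randic_index(edges):
    """Randić connectivity index: sum over edges of 1 / sqrt(deg(u) * deg(v))."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)


print(randic_index(star_graph_edges("MKVLAA")))
print(randic_index(star_graph_edges("MKVLAA", embedded=True)))
```

Computing one index per graph variant (plain star, embedded star) gives a small numeric vector per protein that a linear classifier can take as input.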

3.
The introduction of two-dimensional (2D) graphs and their numerical characterization for comparative analyses of DNA/RNA and protein sequences without the need for sequence alignments is an active, albeit recent, research topic in bioinformatics. Here, we used a 2D artificial representation (four-color maps) with a simple numerical characterization through topological indices (TIs) to aid the discovery of remote homologues of adenylation domains (A-domains) from the Nonribosomal Peptide Synthetase (NRPS) class in the proteome of the cyanobacterium Microcystis aeruginosa. Cyanobacteria are a rich source of structurally diverse oligopeptides that are predominantly synthesized by NRPSs. Several A-domains share amino acid identities lower than 20%, making them a possible source of remote homologues. Therefore, A-domains cannot be easily retrieved by BLASTp searches using a single template. To cope with the sequence diversity of the A-domains, we combined homology-search methods with an alignment-free tool that uses protein four-color maps. TI2BioP (Topological Indices to BioPolymers) version 2.0, available at http://ti2biop.sourceforge.net/, allowed the calculation of simple TIs from the protein sequences (four-color maps). These TIs were used as input predictors for the statistical estimations required to build the alignment-free models. We concluded that the use of graphical/numerical approaches in cooperation with other sequence search methods, such as multi-template BLASTp and profile HMMs, can give the most complete exploration of the repertoire of highly diverse protein families.
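The general pipeline (reduce the sequence to a few physicochemical classes, draw a 2D graph, then compute TIs on it) can be sketched in Python as below. The four-class grouping (nonpolar, polar uncharged, acidic, basic) and the unit lattice step assigned to each class are illustrative assumptions, not TI2BioP's actual four-color map definition; the resulting node and edge sets could then be passed to any graph index.

```python
# Hypothetical four-class grouping of the 20 standard amino acids (textbook-style,
# NOT necessarily the grouping used by TI2BioP).
GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}

# Hypothetical unit step on a 2D lattice for each class.
STEP = {"nonpolar": (1, 0), "polar": (0, 1), "acidic": (-1, 0), "basic": (0, -1)}


def cartesian_walk(seq):
    """Walk a 2D lattice, one step per residue; return (visited nodes, undirected edges)."""
    x, y = 0, 0
    nodes = {(x, y)}
    edges = set()
    for aa in seq.upper():
        dx, dy = STEP[CLASS_OF[aa]]  # raises KeyError for non-standard residues such as X
        nxt = (x + dx, y + dy)
        edges.add(frozenset(((x, y), nxt)))  # retracing a step does not duplicate the edge
        nodes.add(nxt)
        x, y = nxt
    return nodes, edges


nodes, edges = cartesian_walk("MKDSAAL")
print(len(nodes), len(edges))
```

Because revisited lattice points and retraced steps collapse into the same node or edge, the graph reflects the order in which residue classes occur, not only their composition.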

4.
Alignment-free classifiers are especially useful in the functional classification of protein classes with variable homology and different domain structures. Thus, the Topological Indices to BioPolymers (TI2BioP) methodology (Agüero-Chapin et al., 2010), inspired by both the TOPS-MODE and the MARCH-INSIDE methodologies, allows the calculation of simple topological indices (TIs) as alignment-free classifiers. These indices were derived from the clustering of the amino acids into four classes of hydrophobicity and polarity, revealing more sequence-order information than the amino acid composition alone. The predictive power of such TIs was evaluated for the first time on the RNase III family, owing to the high diversity of its members (in primary sequence and domain organization). Three non-linear models were developed for RNase III class prediction: a Decision Tree Model (DTM), an Artificial Neural Network (ANN) model and a Hidden Markov Model (HMM). The first two are alignment-free approaches, using TIs as input predictors. Their performance was compared with that of a non-classical HMM, modified according to our amino acid clustering strategy. The alignment-free models showed similar performances on the training and test sets, reaching values above 90% in the overall classification. The non-classical HMM showed the highest classification rates, with values above 95% in training and 100% in test. Despite the higher accuracy of the HMM, the DTM offered simplicity for RNase III classification at low computational cost. This simplicity was evaluated with respect to the HMM and ANN models for the functional annotation of a new bacterial RNase III class member, isolated and annotated by our group.
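To illustrate why a tree model on TIs is computationally cheap, here is a minimal Python sketch of its simplest case, a one-split decision tree (a decision stump) that picks the TI threshold maximizing training accuracy. The TI values and labels are hypothetical, and this is not the DTM trained in the study.

```python
def best_stump(values, labels):
    """Find the single threshold on one TI that best separates two classes.

    values: TI value per sequence; labels: 1 = class member (e.g. RNase III), 0 = other.
    Returns (threshold, direction, accuracy); direction 1 predicts a member when
    value > threshold, direction -1 when value < threshold. Returns (None, None, 0.0)
    if all values are equal, since no split is possible.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = (None, None, 0.0)
    for t in thresholds:
        for direction in (1, -1):
            correct = sum(
                1 for v, y in zip(values, labels) if (direction * (v - t) > 0) == (y == 1)
            )
            accuracy = correct / len(values)
            if accuracy > best[2]:
                best = (t, direction, accuracy)
    return best


# Hypothetical TI values for six sequences (1 = RNase III member, 0 = non-member).
ti_values = [0.82, 1.10, 1.31, 2.94, 3.40, 3.12]
is_member = [0, 0, 0, 1, 1, 1]
print(best_stump(ti_values, is_member))
```

A full decision tree repeats this search recursively on each resulting subset and over several TIs, which keeps training and prediction cost low compared with fitting an HMM.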

5.
The relationship between topological indices and the anti-HIV activity of dihydro(alkylthio)(naphthylmethyl)oxopyrimidines has been investigated. Three topological indices were used in the present investigation: the Wiener index, a distance-based topological index; the molecular connectivity index, an adjacency-based topological index; and the eccentric connectivity index, an adjacency-cum-distance-based topological index. A data set comprising 67 analogues of dihydro(alkylthio)(naphthylmethyl)oxopyrimidine (S-DABO) was selected. The values of the Wiener index, molecular connectivity index and eccentric connectivity index for each of the 67 compounds in the data set were computed using an in-house computer program. The resulting data were subsequently analyzed, and suitable models were developed after identification of active ranges. A biological activity was then assigned to each compound using these models and compared with the reported anti-HIV activity. Models based on these topological indices predicted anti-HIV activity with an accuracy ranging from 86% to 89%.
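The three descriptors named above have standard definitions on the hydrogen-suppressed molecular graph: the Wiener index W is the sum of shortest-path distances over all unordered vertex pairs, the first-order molecular connectivity (Randić) index χ sums (deg u · deg v)^(-1/2) over edges, and the eccentric connectivity index ξ^c sums deg(v) · ecc(v) over vertices. A minimal Python sketch, assuming a connected graph given as an adjacency dictionary (this is not the authors' in-house program):

```python
from collections import deque
from math import sqrt


def bfs_distances(adj, source):
    """Shortest-path (bond-count) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_indices(adj):
    """Return (Wiener, Randić chi, eccentric connectivity) for a connected graph.

    adj maps each vertex (comparable labels, e.g. ints) to the set of its neighbours.
    """
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    # Summing over ordered pairs counts each unordered pair twice.
    wiener = sum(sum(d.values()) for d in all_dist.values()) // 2
    randic = sum(1 / sqrt(len(adj[u]) * len(adj[v])) for u in adj for v in adj[u] if u < v)
    # Eccentricity of u = its largest shortest-path distance to any vertex.
    eccentric = sum(len(adj[u]) * max(all_dist[u].values()) for u in adj)
    return wiener, randic, eccentric


# n-butane carbon skeleton: C0-C1-C2-C3
butane = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(topological_indices(butane))
```

For n-butane this gives W = 10 and ξ^c = 14; for its branched isomer isobutane both drop to 9, showing how the indices respond to branching.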

6.
Rücker's walk count (WC) indices are well-known topological indices (TIs) used in chemoinformatics to quantify the molecular structure of drugs represented by a graph in quantitative structure–activity/property relationship (QSAR/QSPR) studies. In this work, we introduce for the first time the higher-order (kth-order) analogues (WCk) of these indices using Markov chains. In addition, we report new QSPR models for large complex networks of different bio-systems useful in parasitology and neuroinformatics. The new type of QSPR model can be used for model checking, calculating numerical scores S(Lij) for links Lij (checking or re-evaluating network connectivity) in large networks in all these fields. The method may be summarized as follows: (i) first, the WCk(j) values are calculated for all jth nodes in an already-constructed complex network; (ii) a linear discriminant analysis (LDA) is used to seek a linear equation that discriminates experimentally confirmed connected (linked, Lij = 1) pairs of nodes from non-linked ones (Lij = 0); (iii) the new model is validated with external series of pairs of nodes; (iv) the equation obtained is used to re-evaluate the connectivity quality of the network, connecting or disconnecting nodes based on the quality scores calculated with the new connectivity function. The linear QSPR models obtained yielded the following overall test accuracies for the reconstruction of complex networks of different bio-systems: parasite–host networks (93.14%), NW Spain fasciolosis spreading networks (71.42/70.18%) and the CoCoMac brain cortex co-activation network (86.40%). Thus, this work can contribute to the computational re-evaluation or model checking of connectivity (collation) in complex systems in any field of science.
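Rücker's classical atomic walk count awc_k(i), the number of walks of length k that start at vertex i, equals the i-th row sum of A^k for adjacency matrix A, so it can be accumulated by repeated matrix-vector products. A minimal Python sketch of that classical count follows; the Markov-chain WCk analogues introduced in this work are not reproduced here.

```python
def atomic_walk_counts(A, k):
    """Return [awc_k(i)] = row sums of A**k, computed as A·(A·(...·1)) k times."""
    n = len(A)
    w = [1] * n  # walks of length 0: exactly one per vertex
    for _ in range(k):
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return w


def total_walk_count(A, k):
    """Total number of length-k walks in the graph (summed over all start vertices)."""
    return sum(atomic_walk_counts(A, k))


# Path graph 0-1-2 (e.g. the propane carbon skeleton)
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
for k in range(1, 4):
    print(k, atomic_walk_counts(A, k), total_walk_count(A, k))
```

The matrix-vector form avoids building A^k explicitly, which matters for the large networks discussed above.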

7.
Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics and interactions in solution. A necessary first step for such applications is determining the resonance assignment, mapping spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on information regarding connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Significant ambiguity exists in both connectivity and amino acid type information. This paper focuses on the information content available in connectivity alone and develops a novel random-graph theoretic framework and algorithm for connectivity-driven NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy, a key source of connectivity ambiguity. We then give a simple and natural randomized algorithm for finding optimal assignments as sets of connected fragments in NMR graphs. The algorithm naturally and efficiently reuses substrings while exploring connectivity choices; it overcomes local ambiguity by enforcing global consistency of all choices. By analyzing our algorithm under our random graph model, we show that it can provably tolerate relatively large ambiguity while still giving expected optimal performance in polynomial time. We present results from practical applications of the algorithm to experimental datasets from a variety of proteins and experimental set-ups. We demonstrate that our approach is able to overcome significant noise and local ambiguity in identifying significant fragments of sequential assignments.
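To make connectivity ambiguity concrete, the toy Python sketch below enumerates every maximal chain of spin systems through a small, hypothetical graph in which an edge i → j means "spin system j may follow i". This brute-force enumeration grows exponentially with ambiguity; it is not the paper's randomized polynomial-time algorithm, only an illustration of the search space that algorithm has to handle.

```python
def maximal_fragments(succ):
    """Enumerate maximal simple paths (candidate sequential fragments).

    succ maps each spin system to the spin systems that may follow it. Paths start at
    spin systems with no possible predecessor (or at every node if each has one) and
    are extended until no unused successor remains.
    """
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    has_pred = {v for vs in succ.values() for v in vs}
    starts = sorted(nodes - has_pred) or sorted(nodes)
    fragments = []

    def extend(path):
        extended = False
        for nxt in succ.get(path[-1], []):
            if nxt not in path:  # keep paths simple: each spin system used once
                extended = True
                extend(path + [nxt])
        if not extended:
            fragments.append(path)

    for start in starts:
        extend([start])
    return fragments


# Hypothetical ambiguity: spin system "b" may be followed by either "c" or "d".
connectivity = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": []}
for fragment in maximal_fragments(connectivity):
    print(" -> ".join(fragment))
```

Each ambiguous branch point doubles (or worse) the number of candidate fragments, which is why enforcing global consistency across choices, as the paper does, is needed for realistic data.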

8.
In order to increase the efficiency of monitoring and conservation efforts, it is of key importance to develop sound quantitative methods that can indicate which key areas and landscape elements play a prominent and crucial role in the functioning of habitat mosaics. In particular, network models are widely used to evaluate the contribution of landscape elements to upholding connectivity and related ecological fluxes. However, monitoring programs and conservation practitioners are overwhelmed by a myriad of network indices without being fully aware of their differences in characterizing the importance of individual habitat patches in fragmented landscapes. We analysed a set of thirteen commonly used graph indices and the forest habitat network of goshawks living in NE Spain in order to (a) evaluate how the patch rank orders derived from these indices differ from each other and (b) identify which indices tend to quantify the same network characteristics and which are unique in addressing topological characteristics not considered by the rest. We found that most of the variability in patch rankings can be captured by only three network indices. The largest group of redundant indices corresponded to those that measure the amount of flux received by a given habitat patch. The connector fraction of the integral index of connectivity (IIC) and probability of connectivity (PC) indices, and betweenness centrality (BC), stood out as quite unique in focusing on the way habitat patches act as connecting elements between other habitat areas. We discuss which indices can be most beneficial by clearly indicating and differentiating the value of the top patches compared with the others, so that conservation priorities can be established with lower uncertainty.
We believe that our results can provide valuable guidelines by facilitating the selection of a few non-redundant and complementary indicators that quantify the important and distinctive roles of habitat patches in maintaining the connectivity of habitat networks.
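For reference, the integral index of connectivity is commonly defined as IIC = (Σ_i Σ_j a_i a_j / (1 + nl_ij)) / A_L², where a_i are patch areas, nl_ij is the number of links on the shortest path between patches i and j (pairs that are not connected contribute zero) and A_L is the total landscape area; a patch's importance is the percentage drop dIIC_k when it is removed. A minimal Python sketch under those definitions, with a hypothetical three-patch network (the connector-fraction decomposition discussed above is not computed here):

```python
from collections import deque


def link_counts(adj, source):
    """Number of links on the shortest path from source to every reachable patch."""
    nl = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in nl:
                nl[v] = nl[u] + 1
                queue.append(v)
    return nl


def iic(areas, adj, landscape_area):
    """Integral index of connectivity; i == j terms (nl = 0) are included."""
    total = 0.0
    for i in areas:
        for j, nl_ij in link_counts(adj, i).items():
            total += areas[i] * areas[j] / (1 + nl_ij)
    return total / landscape_area ** 2


def d_iic(areas, adj, landscape_area, k):
    """Percentage loss of IIC when patch k is removed (landscape area kept fixed)."""
    base = iic(areas, adj, landscape_area)
    areas_k = {p: a for p, a in areas.items() if p != k}
    adj_k = {p: {q for q in nbrs if q != k} for p, nbrs in adj.items() if p != k}
    return 100.0 * (base - iic(areas_k, adj_k, landscape_area)) / base


# Hypothetical patches (areas in ha) in a chain 1 - 2 - 3, in a 100 ha landscape.
areas = {1: 10.0, 2: 2.0, 3: 8.0}
adj = {1: {2}, 2: {1, 3}, 3: {2}}
for patch in areas:
    print(patch, round(d_iic(areas, adj, 100.0, patch), 2))
```

In this toy chain the small middle patch still loses a noticeable share of IIC when removed, because it is the only stepping stone between the two large patches.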

9.
This paper is concerned with a branch of computational biology related to the prediction and analysis of protein secondary structure. Although traditional methods use simple amino acid composition to predict secondary structure content, hydrophobicity has recently been found to improve the results in this and several related prediction tasks. To this end, we propose two new hydrophobicity-index-based scales that incorporate information about long-range interactions along the protein sequence, analyze their advantages, and contrast them with currently used raw hydrophobicity index values. We also compare three leading hydrophobicity indices, i.e., Eisenberg's, Fauchère-Pliska's and Cid's, using the proposed scales. The analysis is performed using fuzzy cognitive maps that quantify the strength of the relation between the hydrophobicity scales/indices and the protein content values. A set of empirical tests involving the generation of fuzzy cognitive map models for a set of 200 low-homology proteins was performed. The results show that the secondary structure content along the protein sequence has a relation about 2.5 times stronger with the two proposed hydrophobicity scales than with the currently used raw index values. The new scales exhibit a stronger relation irrespective of the hydrophobicity index applied. Analysis of the different scales shows the superiority of Eisenberg's hydrophobicity index when used with the new scales. In contrast, the Fauchère-Pliska index is found to perform better than the two other indices when using raw hydrophobicity index values that disregard long-range interactions.

10.
Suggestions for "safe" residue substitutions in site-directed mutagenesis (cited 25 times: 0 self-citations, 25 by others)
The conserved topological structure observed in various molecular families, such as globins or cytochromes c, allows structural equivalencing of residues in every homologous structure and defines in a coherent way a global alignment in each sequence family. A search was performed in various topological families for equivalent residue pairs that were buried in protein cores or exposed at the protein surface and that had mutated while maintaining similar, unmutated environments. The environment was defined by the amino acid residues with atoms in contact with the mutated residue pairs. Matrices of preferred amino acid exchanges were then constructed, and preferred or avoided amino acid substitutions were deduced. Given the conserved atomic neighborhoods, such natural in vivo substitutions are subject to constraints similar to those on point mutations performed in site-directed mutagenesis experiments. The exchange matrices should provide guidelines for "safe" amino acid substitutions that are least likely to disturb the protein structure, either locally or in its overall folding pathway, and most likely to allow probing of the structural and functional significance of the substituted site.

11.
As an effective modeling, analysis and computational tool, graph theory is widely used in mathematical biology to address various biological problems. In microbiology, a graph can represent a molecular structure: a cell, gene or protein is denoted as a vertex, and a connection between them is regarded as an edge. In this way, biological activity characteristics can be measured by computing topological indices of the corresponding graphs. In this article, we mainly study the features of biological networks in terms of eccentric topological indices. By means of graph structure analysis and distance calculation, exact expressions for several important eccentricity-related indices of the hypertree network and the X-tree are determined. The conclusions obtained in this paper illustrate promising application prospects in bioengineering.

12.
Understanding the processes and landscape features governing connectivity among individuals and populations is fundamental to many ecological, evolutionary and conservation questions. Network analyses based on graph theory are emerging as a prominent approach to quantify patterns of connectivity, with more recent applications in landscape genetics aimed at understanding the influence of landscape features on gene flow. Despite the strong conceptual framework of graph theory, the effect on landscape genetic inferences of incomplete networks, resulting from missing nodes (i.e., populations) and their interactions in the genetic connectivity network, remains unknown. We tested the consequences of violating the assumption of a complete network by subsampling from a known complete network of breeding ponds of the Columbia Spotted Frog (Rana luteiventris) in the Bighorn Crags (Idaho, USA). Variation in the proportion of missing nodes strongly influenced node-level centrality indices, whereas indices describing network-level properties were more robust. Overall, incomplete networks, combined with the type of network algorithm used to link nodes, appear to be critical to the rank-order sensitivity of centrality indices and to the Mantel-based inferences regarding the role of landscape features in gene flow. Our findings stress the importance of sampling effort and topological network structure, as both affect the estimation of genetic connectivity. Given that failing to account for uncertainty in network outcomes can lead to quantitatively different conclusions, we recommend the routine application of sensitivity analyses to network inputs and assumptions.

13.
The ITS2 gene class shows high sequence divergence among its members, which has complicated its annotation and its use for reconstructing phylogenies at higher taxonomic levels (beyond species and genus). Several alignment strategies have been implemented to improve ITS2 annotation quality and its use for phylogenetic inference. Although alignment-based methods have been pushed to the limits of their complexity to tackle both issues, no alignment-free approach has been able to address both successfully. By contrast, simple alignment-free classifiers, such as topological indices (TIs) containing information about the sequence and structure of ITS2, may prove useful for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. Thus, we used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs. One class was derived from ITS2 artificial 2D structures generated from DNA strings and the other from the secondary structure inferred by RNA folding algorithms. Two alignment-free models based on Artificial Neural Networks were developed for ITS2 class prediction using the two classes of TIs referred to above. Both models showed similar performances on the training and test sets, reaching values above 95% in the overall classification. Owing to the importance of the ITS2 region for fungal identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used to compare our models with conventional classification models based on multiple sequence alignments, such as Hidden Markov Model-based approaches, revealing the success of our models in identifying novel ITS2 members. The isolated sequence was assessed using traditional and alignment-free techniques applied to phylogenetic inference to complement the taxonomy of the Petrakia sp. fungal isolate.

14.
L. Pogliani, Amino Acids, 1995, 9(3): 217-228
Summary: The linear combination of connectivity indices (LCCI) method is employed here to model the water solubility and activity of 19 natural amino acids. Starting from the molecular connectivity indices, reciprocal and supra molecular connectivity indices are designed to model the solubility and activity spaces of the natural amino acids. The reciprocal and supra molecular reciprocal connectivity indices were obtained by following the variability of the connectivity indices along the solubility space of the natural amino acids. A linear combination of the reciprocals of the connectivity indices (LCRCI) gave satisfactory modelling of the solubility and activity space, while a model based on the LCRCI together with supra reciprocal molecular connectivity indices for Pro, Ser and Arg achieved optimal modelling of the solubility and activity space of the natural amino acids. Because the properties are a consequence of the structure (Kier and Hall, 1986)

15.
The B-factor from X-ray crystal structures is a good measure of protein structural flexibility, which plays an important role in biological processes such as catalysis, binding and molecular recognition. Understanding the nature of flexibility can aid further study of protein function. In this study, we attempted to correlate the flexibility of a residue with its interactions with other residues by representing the protein structure as a residue contact network. Several well-established network topological parameters were employed to characterize such interactions. A model for predicting the B-factor of a residue was constructed using support vector regression (SVR), with the Pearson correlation coefficient (CC) as the performance measure. CC values were 0.63 and 0.62 for single amino acids and for the whole sequence, respectively. Our results revealed good correlations between B-factors and network topological parameters, suggesting that protein structural flexibility can be well characterized by the inter-residue interactions in a protein.
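A residue contact network of this kind is often built by linking residues whose representative atoms lie within a distance cutoff. The Python sketch below assumes Cα coordinates and an 8 Å cutoff (both assumptions; the study's exact network definition and its SVR model are not reproduced), computes each residue's degree, and evaluates agreement with a Pearson correlation coefficient, the performance measure named above.

```python
from math import dist, sqrt  # math.dist requires Python 3.8+


def contact_degrees(coords, cutoff=8.0):
    """Degree of each residue in a contact network (edge if distance <= cutoff, in Å)."""
    n = len(coords)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                degree[i] += 1
                degree[j] += 1
    return degree


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical C-alpha coordinates and B-factors for five residues.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0), (20.0, 0.0, 0.0)]
b_factors = [18.0, 15.0, 21.0, 19.0, 45.0]
deg = contact_degrees(ca)
print(deg, round(pearson(deg, b_factors), 3))
```

In a real pipeline, degree would be one of several network features fed to the regression model, and the CC would be computed between predicted and observed B-factors.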

16.
Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, its functionality may be limited because its members contain no positively charged amino acids. Here, we constructed three libraries of 120-amino-acid random-sequence proteins using alphabets of 5, 12 and 20 amino acids, with preselection by mRNA display (to eliminate sequences containing stop codons and frameshifts), and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including the five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids.

17.
Using discriminant analysis, three types of protein secondary structure segments (helices, beta-strands and coils) are discriminated by amino acid sequence information alone. A variable in the discriminant analysis is defined by the amino acid index used to represent the sequence data and by the calculation method used to extract a feature from this representation. Thus, the three types of secondary structure segments derived from a set of non-homologous proteins from the Protein Data Bank are analyzed with 888 variables, which correspond to the mean, standard deviation, 3.6-residue periodicity and 2-residue periodicity of the numerical profiles determined from 222 published amino acid indices. These variables are combined to obtain the best discrimination of the three types of segments. When up to three variables were combined, the best discrimination rate was 75%. The selected variables consist of the mean of alpha propensity (or turn propensity), the mean of beta propensity, and the 3.6-residue periodicity of hydrophobicity. This variable selection procedure can also be applied to other types of discrimination problems, once groups of sequence data are properly organized.
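The four kinds of variables described above can be sketched as follows for one numerical profile (one amino acid index value per residue). Here periodicity is measured as the normalized magnitude of the Fourier-type sum |Σ_n h_n e^(2πin/p)| / N at period p, in the spirit of a hydrophobic moment; this formula and the example profile values are assumptions, not necessarily the exact definitions used in the study.

```python
import cmath
import statistics


def periodicity(profile, period):
    """Normalized magnitude of the profile's Fourier-type component at `period` residues."""
    s = sum(h * cmath.exp(2j * cmath.pi * n / period) for n, h in enumerate(profile))
    return abs(s) / len(profile)


def segment_features(profile):
    """Mean, standard deviation, 3.6-residue and 2-residue periodicity of one profile."""
    return {
        "mean": statistics.mean(profile),
        "sd": statistics.pstdev(profile),
        "period_3.6": periodicity(profile, 3.6),  # about one alpha-helix turn
        "period_2": periodicity(profile, 2.0),    # side-to-side alternation in beta-strands
    }


# Hypothetical per-residue hydrophobicity values for a short segment.
segment = [1.1, -0.8, 0.9, -1.2, 1.0, -0.7, 1.3, -0.9]
print(segment_features(segment))
```

Applying these four functions to the profiles from each of the 222 indices yields the 4 × 222 = 888 candidate variables mentioned above.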

19.
Bacteriocins are proteinaceous toxins produced and exported by both gram-negative and gram-positive bacteria as a defense mechanism. The bacteriocin protein family is highly diverse, which complicates the identification of bacteriocin-like sequences using alignment approaches. The use of topological indices (TIs), irrespective of sequence similarity, can be a promising alternative for predicting proteinaceous bacteriocins. Thus, we present Topological Indices to BioPolymers (TI2BioP) as an alignment-free approach inspired by both the Topological Substructural Molecular Design (TOPS-MODE) and Markov Chain Invariants for Network Selection and Design (MARCH-INSIDE) methodologies. TI2BioP allows the calculation of spectral moments as simple TIs to seek quantitative sequence-function relationship (QSFR) models. Since hydrophobicity and basicity are major criteria for the bactericidal activity of bacteriocins, the spectral moments (HPμk) were derived for the first time from protein artificial secondary structures based on amino acid clustering into a Cartesian system of hydrophobicity and polarity. Several orders of HPμk numerically characterized 196 bacteriocin-like sequences and a control group of 200 representative CATH domains. They were subsequently used to develop an alignment-free QSFR model achieving 76.92% discrimination of bacteriocin proteins from other domains, a relevant result considering the high sequence diversity among the members of both groups. The model showed an overall prediction performance of 72.16%, specifically detecting 66.7% of proteinaceous bacteriocins, whereas InterProScan retrieved just 60.2%. As a practical validation, the model also successfully predicted the cryptic bactericidal function of the Cry1Ab C-terminal domain from the Bacillus thuringiensis endotoxin, which had not been detected by classical alignment methods.
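In TOPS-MODE-style descriptors, the spectral moments are μ_k = tr(B^k), where B is the edge adjacency matrix of the graph and its main diagonal may carry edge weights. A minimal Python sketch of that computation follows; the toy graph and the diagonal weights (which could, for example, encode hydrophobicity/polarity) are hypothetical and do not reproduce the exact HPμk definition used by TI2BioP.

```python
def edge_adjacency(edges, weights=None):
    """Edge adjacency matrix: B[a][b] = 1 if edges a and b share a vertex (a != b).

    weights, if given, are placed on the main diagonal (one value per edge).
    """
    m = len(edges)
    B = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            if a != b and set(edges[a]) & set(edges[b]):
                B[a][b] = 1.0
        if weights is not None:
            B[a][a] = float(weights[a])
    return B


def spectral_moments(B, max_order):
    """Return [mu_1, ..., mu_max_order] with mu_k = trace(B**k)."""
    m = len(B)
    power = [row[:] for row in B]  # B**1
    moments = []
    for k in range(1, max_order + 1):
        if k > 1:  # power <- power · B
            power = [[sum(power[i][t] * B[t][j] for t in range(m)) for j in range(m)]
                     for i in range(m)]
        moments.append(sum(power[i][i] for i in range(m)))
    return moments


# Hypothetical 2D graph with four edges forming a path a-b-c-d-e.
path_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(spectral_moments(edge_adjacency(path_edges), 4))
print(spectral_moments(edge_adjacency(path_edges, weights=[0.5, -0.2, 0.8, 0.1]), 4))
```

Without weights the odd moments of a path vanish; placing residue-derived weights on the diagonal is what lets the moments carry physicochemical information into a QSFR model.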

20.
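Theoretical and experimental studies have shown that the native topology of a protein has an important influence on its folding process. Using complex network methods, we analyzed the topological features of native protein structures and explored the intrinsic relationship between protein structural features and folding rates. We constructed protein amino acid networks, hydrophobic networks, hydrophilic networks, hydrophilic-hydrophobic networks and the corresponding long-range networks, and studied the statistical properties of their assortativity coefficients and clustering coefficients. The results show that, except for the hydrophilic-hydrophobic network, the assortativity coefficients of all these networks are positive, and those of the amino acid network and the hydrophobic network show a clear positive linear correlation with the folding rate, revealing that cooperativity in the interactions between hydrophobic residues favors fast protein folding. We also found that the clustering coefficient of the hydrophobic network shows a clear negative linear correlation with the folding rate, indicating that the formation of triangle structures (triangle construction) among hydrophobic residues hinders fast folding. Further analysis of the corresponding long-range networks showed that the formation of contacts between residues far apart in sequence slows down the folding process.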
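Both network statistics used in this study have standard definitions: the (degree) assortativity coefficient is the Pearson correlation between the degrees at the two ends of each edge, and a node's clustering coefficient is the fraction of pairs of its neighbours that are themselves connected (i.e., closed triangles). A minimal Python sketch, assuming the residue network is already built as an adjacency dictionary (the contact criteria for the hydrophobic, hydrophilic and long-range networks are not reproduced):

```python
from math import sqrt


def degree_assortativity(adj):
    """Pearson correlation of degrees at the two ends of every edge (Newman's r).

    Each undirected edge is counted in both directions, which makes the measure
    symmetric. Returns nan when there are no edges or all edge-end degrees are equal.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    if n == 0:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = sorted(nbrs)
        links = sum(1 for i, a in enumerate(nbr_list) for b in nbr_list[i + 1:] if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical residue network: a triangle (1, 2, 3) with a pendant residue 4.
net = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(round(degree_assortativity(net), 3), round(average_clustering(net), 3))
```

A positive r means well-connected residues tend to contact other well-connected residues, which is the property reported above to correlate with faster folding.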

