Cluster analysis identifies amino acid compositional features that indicate Toxoplasma gondii adhesin proteins |
| |
Authors: | Ailan F Arenas, Gladys E Salcedo, Diego M Moncada, Diego A Erazo, Juan F Osorio, Jorge E Gomez-Marin |
| |
Affiliation: | ¹Grupo de Parasitología Molecular (GEPAMOL), Centro de Investigaciones Biomédicas, Universidad del Quindío, Armenia, Colombia; ²Grupo de Investigación y Asesoría en Estadística, Universidad del Quindío, Armenia, Colombia |
| |
Abstract: | Toxoplasma gondii invades host cells through a multi-step process that depends on the regulated secretion of adhesins. To identify key primary sequence features of adhesins in this parasite, we used cluster analysis to examine the relative frequency of individual amino acids, their dipeptide frequencies, and the polarity, polarizability and Van der Waals volume of the individual amino acids. This method identified cysteine as a key amino acid in the Toxoplasma adhesin group. Among the non-concatenated feature vectors, the two best-performing attributes were the single amino acid relative frequency and the dipeptide frequency. Polarity, polarizability and Van der Waals volume were poor classificatory attributes. Single amino acid attributes unambiguously clustered 67 hypothetical apicomplexan adhesins. This algorithm was also useful for clustering hypothetical host receptors targeted by Toxoplasma. All of the clustering results had over 70% sensitivity and 80% specificity. Amino acid compositional data can help improve machine learning-based prediction software when homology and structural data are not sufficient. |
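The two attributes reported as most informative, single amino acid relative frequency and dipeptide frequency, are simple to compute from a protein sequence. The sketch below is a minimal illustration, not the authors' pipeline. It assumes that only the 20 standard amino acids are counted (other symbols are skipped), that amino acid frequencies are normalized by the number of counted residues, and that dipeptide frequencies are normalized by the number of overlapping standard-residue pairs. The helper names (`aa_composition`, `dipeptide_composition`, `sensitivity_specificity`) and the `"adhesin"` label are hypothetical. Sensitivity and specificity use the standard definitions TP/(TP+FN) and TN/(TN+FP), with adhesins as the positive class.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); other symbols are ignored (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Hypothetical helper: relative frequency of each standard amino acid (20-dim vector)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Hypothetical helper: relative frequency of each of the 400 overlapping dipeptides."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs formed by two standard residues (assumption: skip the rest).
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    total = len(pairs)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def sensitivity_specificity(y_true, y_pred, positive="adhesin"):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); NaN if a denominator is 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Cysteine makes up half of this toy sequence.
    print(aa_composition("CCGA")[AMINO_ACIDS.index("C")])  # 0.5
```

The resulting 20- or 400-dimensional vectors can be passed to any clustering routine. The clustering method itself is not specified here, so the sketch stops at feature extraction and evaluation.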
| |
Keywords: | Cluster analysis; adhesin; Toxoplasma |