Abstract: | Neural networks were used to generalize common themes found in transmembrane-spanning protein helices. Databases of various sizes were used, each containing nonoverlapping sequences 25 amino acids long. Training consisted of sorting these sequences into 1 of 2 groups: transmembrane helical peptides or nontransmembrane peptides. Learning was measured using a test set 10% the size of the training set. As training set size increased from 214 to 1,751 sequences, learning increased nonlinearly from 75% to a high of 98%, then declined to a low of 87%. The final training database consisted of roughly equal numbers of transmembrane (928) and nontransmembrane (1,018) sequences. All transmembrane sequences were entered into the database according to their lipid membrane orientation: from inside the membrane to outside. Generalized transmembrane helix and nontransmembrane peptides were constructed from the maximally weighted connection strengths of fully trained networks. Four generalized transmembrane helices were found to contain 9 consensus residues: a K-R-F triplet at the inside lipid interface, 2 isoleucine and 2 other phenylalanine residues in the helical body, and 2 tryptophan residues near the outside lipid interface. As a test of the training method, bacteriorhodopsin was examined to determine the position of its 7 transmembrane helices. |
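The data preparation described in the abstract (nonoverlapping 25-residue sequences presented to a network) can be illustrated with a minimal sketch. The abstract does not state how residues were encoded for the network, so the one-hot encoding below is an assumption, as are the names `split_windows` and `one_hot` and the choice to drop a trailing fragment shorter than 25 residues.

```python
# Hypothetical sketch: the encoding and helper names are assumptions,
# not the authors' published method.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
WINDOW = 25                           # sequence length stated in the abstract


def split_windows(sequence, window=WINDOW):
    """Cut a sequence into nonoverlapping windows, dropping any short tail."""
    last_start = len(sequence) - window
    return [sequence[i:i + window] for i in range(0, last_start + 1, window)]


def one_hot(peptide):
    """Encode a peptide as a flat 0/1 vector with 20 slots per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = [0] * (len(peptide) * len(AMINO_ACIDS))
    for position, residue in enumerate(peptide):
        vector[position * len(AMINO_ACIDS) + index[residue]] = 1
    return vector
```

Under this assumed encoding, each 25-residue peptide becomes a 500-element input vector with exactly one active unit per position. A residue outside the 20 standard codes raises a `KeyError` here; how such residues were handled in the original work is not stated.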