Amino acid composition and hydrophobicity patterns of protein domains correlate with their structures

Authors:	R P Sheridan, J S Dixon, R Venkataraghavan, I D Kuntz, K P Scott

Abstract:	We examine the correlation between the sequence and tertiary structure for 212 domains from globular proteins and polypeptides. The sequence of each domain is described as a set of 25 features: the mole percent of 20 amino acids, the number of residues in the domain, and the abundance of four simple patterns in the hydrophobicity profile of the sequence. Each domain, then, is described as a location in 25-dimensional sequence-feature space. We use pattern-recognition methods to find the two axes through the 25-dimensional sequence-feature space that best discriminate, respectively, predominantly α-helix domains from predominantly β-strand domains (the “secondary structure vector,” SV) and parallel α/β domains from other domains (the “parallel vector,” PV). When we divide the domains into two categories based on whether the cysteine content is above (CYS-RICH) or below (NORMAL) 4.5%, we find that the secondary structure vector for the subset of CYS-RICH domains points in a significantly different direction than the equivalent vector for the NORMAL domains. Thus, CYS-RICH and NORMAL domains are best treated separately. The secondary structure vector and the parallel vector for NORMAL domains describe statistically meaningful information, but the secondary structure vector for CYS-RICH domains may not be as reliable. We show how the secondary structure content of a NORMAL domain can be predicted by projecting the domain in the feature space onto the secondary structure vector. We subdivide the domains into five structural classes based on whether there is a parallel or mixed β-sheet in the domain and whether there are more helix or strand residues: NORMAL ALPHA, NORMAL BETA, NORMAL PARALLEL, CYS-RICH ALPHA, and CYS-RICH BETA.
When we project the NORMAL domains onto the plane containing the origin of the feature space, SV, and PV, we see that ALPHA, BETA, and PARALLEL domains cluster in the plane, with the BETA cluster partially overlapping the PARALLEL cluster. The separations between the clusters are such that, by looking at the location of any given NORMAL domain in the plane, we can correctly predict its structural class with 83% accuracy. CYS-RICH ALPHA and BETA domains cluster when projected onto the CYS-RICH SV vector, and the classes can be predicted with 83% accuracy, but this accuracy for CYS-RICH domains may not be statistically meaningful.
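
The feature description and the projection step can be illustrated with a minimal Python sketch. It is not the authors' implementation. It builds only 21 of the 25 features (the mole percent of the 20 amino acids plus the domain length), because the abstract does not define the four hydrophobicity-profile patterns, and it assumes no feature scaling, which the abstract does not specify. The function names and the axis used in the test are hypothetical; the actual SV and PV would come from the pattern-recognition fit.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return 21 features: mole percent of each amino acid, then the residue count.

    The four hydrophobicity-pattern features are omitted because the
    abstract does not define them.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    mole_percent = [100.0 * counts[aa] / n for aa in AMINO_ACIDS]
    return mole_percent + [float(n)]


def is_cys_rich(sequence, threshold=4.5):
    """True if the cysteine mole percent is above the 4.5% cutoff (CYS-RICH)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * seq.count("C") / len(seq) > threshold


def project(features, axis):
    """Scalar projection of a feature vector onto a discriminating axis."""
    if len(features) != len(axis):
        raise ValueError("features and axis must have the same length")
    norm = sum(a * a for a in axis) ** 0.5
    if norm == 0.0:
        raise ValueError("axis must be a non-zero vector")
    return sum(f * a for f, a in zip(features, axis)) / norm
```

Projecting the same feature vector onto two such axes (one standing in for SV, one for PV) gives the coordinates of a domain in the plane described above.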

