The size distribution of protein families within different types of folds
Authors: Liu Xinsheng, Lv Bo, Guo Wanlin
Affiliation: (a) Institute of Nanoscience, and Key Laboratory for Intelligent Nano Materials and Devices of Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; (b) College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Abstract: Structures are currently available for only a small fraction of known protein sequences, so it is important to infer the key features of known protein sequences from the structures already solved. Here, we report a study of the size distribution of protein families within different types of folds. The fold of a protein is the global arrangement of its main secondary structures, in terms of both their relative orientations and their topological connections, and it specifies certain biochemical and biophysical aspects of the protein. We first search the protein families in the structural database SCOP against the sequence-based database Pfam and obtain a pool of corresponding Pfam families whose structures can be regarded as known. We call this pool of Pfam families the sample space. We then obtain the size distributions of protein families for the sample space, the Pfam database and the SCOP database. The results indicate that the size distributions of protein families under different kinds of folds follow similar power laws. In particular, the largest families are spread evenly across the different kinds of folds. This may help to clarify the relationship between protein sequence, structure and function. We also show that the set of proteins with known structures can be considered a random sample from the whole space of protein sequences, which is an essential but unsettled assumption for related predictions such as estimating the number of protein folds in nature. Finally, using a simple method, we conclude that about 2957 folds are needed to cover all Pfam families.
Keywords: Size distribution; Protein families; Folds; Power-law
This article is indexed in ScienceDirect, PubMed and other databases.
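
The abstract states that family sizes under different kinds of folds follow similar power laws, but it does not say how the exponents were fitted. As an illustration only, the sketch below estimates a power-law exponent with the standard continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(x_i / x_min)). The function names, the choice of estimator, the x_min value and the synthetic sample are all assumptions made for this example; they are not taken from the paper, and no Pfam or SCOP data are used. Real family sizes are integers, so treating them as continuous is a further simplification.

```python
import math
import random


def powerlaw_alpha_mle(sizes, x_min=1.0):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x**(-alpha), x >= x_min.

    Hypothetical helper for illustration; not the fitting method of the paper.
    """
    tail = [s for s in sizes if s >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(s / x_min) for s in tail)
    if log_sum == 0.0:
        raise ValueError("all observations equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic values from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so (1 - u) lies in (0, 1] and the power is finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic stand-in for a list of family sizes (assumed exponent 2.0, not a paper value).
synthetic = sample_powerlaw(20000, alpha=2.0)
estimate = powerlaw_alpha_mle(synthetic)
print(f"estimated alpha: {estimate:.3f}")
```

To compare fold types in the spirit of the abstract, one could apply the same estimator separately to the family sizes of each fold class and check whether the resulting exponents are close. For integer counts with a small x_min, a discrete estimator would be the more careful choice.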