PseqIP: a nonredundant and exhaustive protein sequence data bank generated from 4 major existing collections
Authors: J. M. Claverie, L. Bricault
Institution: Computer Science Unit, Institut Pasteur, Paris, France.
Abstract: Four major protein sequence data collections (NBRF-PIR, PSD-Kyoto, PGtrans, and NEWAT) have been merged into a single nonredundant data bank called PseqIP. The data bank entries were automatically matched by a heuristic computer program relying on the fast computation of the number of tetrapeptides shared by two sequences. PseqIP 1.0 includes 6,068 different protein sequences for a total of 1,357,067 residues, representing most of the available sequence information to date. During the course of this work, we found about 600 occurrences of a protein sequence recorded with a one-amino-acid variation in at least two different data banks. A flat file (ASCII computer-readable format) version of PseqIP 1.0, well-suited for exhaustive homology searches and statistical sequence analysis, is available from our laboratory.
Keywords: computerized data bank; sequence comparison heuristics; databank access; data bank merging
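
The abstract says entries were matched by counting the tetrapeptides shared by two sequences. The sketch below illustrates that idea only; it is not the authors' program. The function names, the multiset counting rule, and the 0.7 threshold are all assumptions made for illustration, since the abstract does not give the actual decision rule.

```python
from collections import Counter

K = 4  # tetrapeptide length, as named in the abstract


def tetrapeptides(seq: str, k: int = K) -> Counter:
    """Count every overlapping k-residue word in a protein sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_tetrapeptides(a: str, b: str, k: int = K) -> int:
    """Number of k-residue words common to both sequences.

    Assumption: repeated words are counted as a multiset intersection
    (the smaller of the two counts for each word).
    """
    common = tetrapeptides(a, k) & tetrapeptides(b, k)
    return sum(common.values())


def likely_same_protein(a: str, b: str, threshold: float = 0.7, k: int = K) -> bool:
    """Hypothetical matching rule: flag a pair when the shared words cover
    at least `threshold` of the words in the shorter sequence."""
    words_in_shorter = min(len(a), len(b)) - k + 1
    if words_in_shorter <= 0:
        # Too short to contain any k-residue word; fall back to equality.
        return a.upper() == b.upper()
    return shared_tetrapeptides(a, b, k) / words_in_shorter >= threshold


if __name__ == "__main__":
    original = "MKTAYIAKQRQISFVKSHFSRQ"
    variant = "MKTAYIAKQRWISFVKSHFSRQ"  # one-residue substitution
    print(shared_tetrapeptides(original, variant))
    print(likely_same_protein(original, variant))
```

A single substitution disturbs at most four overlapping tetrapeptides, so near-duplicate entries such as the one-amino-acid variants mentioned in the abstract still share most of their words, while unrelated sequences share few or none.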