Metric-space indexes as a basis for scalable biological databases |
| |
Authors: | Miranker Daniel P |
| |
Affiliation: | Department of Computer Science, University of Texas, Austin, Texas 78712, USA. miranker@cs.utexas.edu |
| |
Abstract: | Biochemical databases will be best served by new, specialized database management systems whose storage managers are based on metric-space indexing techniques, together with database query languages that embody semantics derived from biochemical models of similarity and evolution. Important biochemical data types cannot be effectively mapped to the low-dimensional coordinate systems on which O(log n) indexing methods rely. An abundance of bioinformatic discoveries makes clear that biochemical data are not random and exhibit interesting clustering structure. Metric-space indexing exploits a data set's intrinsic clustering to speed the execution of similarity queries, even when the data cannot be mapped to a coordinate system. Database management systems that seamlessly integrate semantically rich query languages with a metric-space storage and retrieval mechanism will allow biologists to develop, simply and concisely, informatic studies that have traditionally been large and labor intensive. |
| |
Keywords: | |
This article has been indexed by PubMed and other databases. |
|
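The abstract's central claim, that a metric-space index can speed similarity queries using only a distance function and the data's clustering, can be illustrated with a minimal sketch. This is not the author's system. It assumes a simple pivot-table index, one of several metric-space techniques and chosen here only for brevity, and it uses Levenshtein edit distance as a stand-in for a biochemical similarity model. The names `edit_distance`, `PivotIndex`, and `range_query`, along with the toy sequences, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a true metric on strings (assumed similarity model)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical pivot-table index: stores each item's distance to a few pivots."""

    def __init__(self, items: Sequence[str], pivots: Sequence[str],
                 dist: Callable[[str, str], int]) -> None:
        self.items = list(items)
        self.pivots = list(pivots)
        self.dist = dist
        # Precomputed once at build time: distance from every item to every pivot.
        self.table = [[dist(x, p) for p in self.pivots] for x in self.items]

    def range_query(self, query: str, radius: int) -> Tuple[List[str], int]:
        """Return items within `radius` of `query`, plus how many items
        needed a full distance computation after pruning."""
        q_to_pivots = [self.dist(query, p) for p in self.pivots]
        hits: List[str] = []
        evaluated = 0
        for item, row in zip(self.items, self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x).
            # If the lower bound already exceeds the radius, skip the item.
            if any(abs(qp - xp) > radius for qp, xp in zip(q_to_pivots, row)):
                continue
            evaluated += 1
            if self.dist(query, item) <= radius:
                hits.append(item)
        return hits, evaluated


# Toy data with two obvious clusters (illustrative sequences, not real data).
seqs = ["ACGTACGT", "ACGTACGA", "ACGTTCGT",
        "TTTTGGGGCCCC", "TTTTGGGGCCCA", "TTTAGGGGCCCC"]
index = PivotIndex(seqs, pivots=["ACGTACGT", "TTTTGGGGCCCC"], dist=edit_distance)
hits, evaluated = index.range_query("ACGTACGG", radius=1)
```

In this toy case the query falls in the first cluster, so every sequence in the second cluster is discarded using only the precomputed pivot distances. No coordinates are ever assigned to the sequences. The pruning relies solely on the distance function satisfying the triangle inequality, and it is most effective when the data are clustered.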