Scalable Similarity Search in Metric Spaces
Authors | |
---|---|
Year of publication | 2004 |
Type | Article in Proceedings |
Conference | Pre-proceedings of the Sixth Thematic Workshop of the EU Network of Excellence DELOS |
MU Faculty or unit | |
Citation | |
Field | Computer hardware and software |
Keywords | distributed data; scalable structures; similarity search; metric space |
Description | Similarity search in metric spaces represents an important paradigm for content-based retrieval of many applications. Existing centralized search structures can speed-up retrieval, but they do not scale up to large volume of data because the response time is linearly increasing with the size of the searched file. The proposed GHT* index is a scalable and distributed structure. By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity range queries in data-sets of arbitrary size. The amount of replicated routing information on each server increases logarithmically. At the same time, the potential for interquery parallelism is increasing with the growing data-sets because the relative number of servers utilized by individual queries is decreasing. All these properties are verified by experiments on a prototype system using real-life data-sets. |
Related projects: |