TY - CONF
T1 - TLCSim: A Large-Scale Two-Level Clustering Similarity Search with MapReduce
AU - Phan, Trong Nhan
AU - Jäger, Markus
AU - Nadschläger, Stefan
AU - Pérez, Pablo Gómez
AU - Nguyen, Cong An
AU - Huber, Christian
PY - 2016
Y1 - 2016
N2 - Similarity search has become a principal operation not only in databases but also in diverse application domains. Very large datasets, however, pose a big challenge on its enormous volume-processing capability. In order to deal with the challenge, we propose a two-level clustering approach aiming at supporting fast similarity searches in massive datasets. In addition, we embed some pruning and filtering strategies into our methods so that redundancy-free data, data accuracy, inessential data accesses, unnecessary distance computations, and other following consequences are taken into account. Furthermore, we validate our methods by a series of empirical experiments in real big datasets. The results show that our approach performs better than the two inverted index-based approaches, especially when given big query batches.
M3 - Paper
ER -