Abstract:
Subspace clustering is a challenging task in the field of data mining.
Traditional distance measures fail to differentiate the furthest point from the
nearest point in very high-dimensional data spaces. To tackle this problem, we
design the minimal subspace distance, which measures the similarity between two
points in the subspace where they are nearest to each other. It can discover
subspace clusters implicitly when measuring the similarities between points.
We use the new similarity measure to improve the traditional k-means algorithm
for discovering clusters in subspaces. Clustering first with a low-dimensional
minimal subspace distance detects the clusters in low-dimensional subspaces.
Then, by gradually increasing the dimension of the minimal subspace
distance, the clusters are refined in higher-dimensional subspaces. Our
experiments on both synthetic data and real data show the effectiveness of the
proposed similarity measure and algorithm.
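
To make the idea of a minimal subspace distance concrete, the following is a minimal sketch under one plausible reading of the description above, not the exact definition used in this work. It assumes that the k-dimensional minimal subspace distance between two points is the Euclidean distance restricted to the k axis-parallel dimensions on which the two points differ least. The function name `minimal_subspace_distance` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def minimal_subspace_distance(x, y, k):
    """Illustrative k-dimensional minimal subspace distance (assumed form).

    Assumption: the subspace where two points are "nearest" is spanned by
    the k coordinate axes with the smallest absolute differences.
    """
    diffs = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if not 1 <= k <= diffs.size:
        raise ValueError("k must be between 1 and the number of dimensions")
    # np.partition places the k smallest differences in the first k slots.
    smallest = np.partition(diffs, k - 1)[:k]
    # Euclidean distance measured only in the selected k dimensions.
    return float(np.sqrt(np.sum(smallest ** 2)))


# Two points that agree closely on dimensions 0 and 2 but not on 1 and 3.
a = [0.0, 0.0, 0.0, 0.0]
b = [0.1, 5.0, 0.2, 9.0]

d_low = minimal_subspace_distance(a, b, k=2)   # uses dimensions 0 and 2 only
d_full = minimal_subspace_distance(a, b, k=4)  # ordinary Euclidean distance
```

In this sketch, a small `k` makes the two points look close because only their best-matching dimensions are compared, while `k` equal to the full dimensionality recovers the ordinary Euclidean distance. Increasing `k` step by step mirrors the gradual refinement from low-dimensional to higher-dimensional subspaces described above.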