Abstract:
Outlier detection is an important task in data mining because outliers can be
either useful knowledge or noise. Many statistical methods have been applied
to detect outliers, but they usually assume a given data distribution and
have difficulty handling high-dimensional data. Statistical Learning Theory
(SLT), established by Vapnik et al., provides a new way to overcome these
drawbacks. Building on SLT, Schölkopf et al. proposed the ν-Support Vector
Machine (ν-SVM) and applied it to outlier detection. However, it is still
difficult for data mining users to choose one key parameter of ν-SVM. This
paper proposes a new SVM method for outlier detection, SVM-OD, which avoids
this parameter. We provide a theoretical analysis based on SLT as well as
experiments that verify the effectiveness of our method. Moreover, an
experiment on synthetic data shows that SVM-OD can detect some local outliers
near a cluster with a particular distribution, whereas ν-SVM cannot.