Abstract:
Many large organizations have multiple databases distributed across different branches, and therefore multi-database
mining is an important task in data mining. To reduce the cost of searching the data in all databases, we need to
identify which databases are most likely relevant to a data mining application. This is referred to as database selection.
In real-world applications, database selection has to be carried out multiple times to identify the databases
relevant to each application. In particular, a mining task may not be tied to any specific application. In this
paper, we present an efficient approach for classifying multiple databases based on their similarity to one another.
Our approach is application-independent.