Abstract:
An XML-enabled framework for representing association rules in databases
was first presented in [4]. In Frequent Structure Mining (FSM), one of the
popular approaches is graph matching based on data structures such as the
adjacency matrix [7] or adjacency list [8]. Another approach represents semistructured
tree-like structures using a string representation, which is more space-efficient
and relatively easy to manipulate [10]. However, mining association rules from XML
poses additional challenges because of XML's inherent flexibility in both structure
and semantics, such as 1) more complicated hierarchical data structures; 2) ordered
data context; and 3) much larger data sizes. To tackle these
challenges, we propose X3-Miner, an approach that efficiently extracts patterns
from a large XML data set. It addresses them by (1) exploring a model-validating
approach that reduces the number of candidates generated, taking into account the
semantics embedded in the tree-like structure of an XML database so that only valid
candidates are obtained from it; (2) minimising I/O overhead by intersecting the
XML database with the frequent 1-itemsets, which yields a frequent 1-itemset XML
tree, and progressively trimming candidate k-itemsets that contain infrequent
(k-1)-itemsets; and (3) extending the notion of string representation of a tree structure
proposed in [10] to an xstring, which describes an XML document without loss of
structure or semantics. This extension makes the tree-structured XML data easier to
traverse during our model-validating candidate generation. Our
experiments with both synthetic and real-life data sets demonstrate the
effectiveness of the proposed model-validating approach in mining XML data.
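The string representation of trees mentioned in this abstract can be illustrated with a common encoding of ordered, labelled trees. Node labels are listed in preorder, and a backtrack symbol is emitted each time the traversal returns from a child to its parent. The following is a minimal sketch that applies such an encoding to XML elements using Python's standard `xml.etree.ElementTree`. It assumes that [10] uses an encoding of this kind. The marker `-1` and the functions `to_string` and `from_string` are hypothetical. This is not the paper's xstring, whose exact format the abstract does not give.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker meaning "return to the parent node". XML element names
# cannot start with '-' or a digit, so "-1" never collides with a real tag.
BACKTRACK = "-1"


def to_string(elem):
    """Encode an element tree as preorder tags, with BACKTRACK after each child subtree."""
    tokens = [elem.tag]
    for child in elem:  # children are visited in document order
        tokens.extend(to_string(child))
        tokens.append(BACKTRACK)
    return tokens


def from_string(tokens):
    """Rebuild the ordered element tree from its token list."""
    root = ET.Element(tokens[0])
    stack = [root]  # path from the root to the current node
    for tok in tokens[1:]:
        if tok == BACKTRACK:
            stack.pop()
        else:
            stack.append(ET.SubElement(stack[-1], tok))
    return root


doc = ET.fromstring("<order><item><name/><price/></item><customer/></order>")
encoded = to_string(doc)
print(encoded)
# ['order', 'item', 'name', '-1', 'price', '-1', '-1', 'customer', '-1']
```

The encoding is a flat list, but the backtrack markers make it reversible, so sibling order and nesting are both recoverable. Only element tags are encoded. Attribute values and text content are dropped, and preserving that information alongside the structure is what the xstring extension is described as doing.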
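The second contribution above combines two standard Apriori-style ideas. First, items that are not frequent 1-itemsets are discarded before later passes. Second, any candidate k-itemset that contains an infrequent (k-1)-subset is pruned. The following is a minimal sketch of both ideas on flat transactions, given as Python sets of element names that stand in for XML records. That simplification is an assumption: the sketch models neither X3-Miner's tree structure nor its model-validating candidate generation. The function `apriori`, the parameter `min_support`, and the sample data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support count} for every frequent itemset."""
    # Pass 1: count single items and keep the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    f1 = {i for s in frequent for i in s}
    # Intersect the data with the frequent 1-itemsets so later passes scan less.
    reduced = [t & f1 for t in transactions]

    result = dict(frequent)
    prev = set(frequent)
    k = 2
    while prev:
        # Join: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count the surviving candidates against the reduced data.
        level = {}
        for c in candidates:
            support = sum(1 for t in reduced if c <= t)
            if support >= min_support:
                level[c] = support
        result.update(level)
        prev = set(level)
        k += 1
    return result


transactions = [
    {"book", "author", "title"},
    {"book", "author", "price"},
    {"book", "title", "price"},
    {"book", "author", "title", "price"},
    {"cd", "artist"},
]
freq = apriori(transactions, min_support=3)
print(sorted(sorted(s) for s in freq if len(s) == 2))
# [['author', 'book'], ['book', 'price'], ['book', 'title']]
```

In the sample data, `cd` and `artist` are removed after the first pass. All three candidate 3-itemsets are pruned without being counted, because each one contains an infrequent 2-subset such as {author, title}.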