Abstract:
Tree mining has recently attracted a lot of interest in areas such as
Bioinformatics, XML mining, Web mining, etc. We are mainly concerned with
mining frequent induced and embedded subtrees. While more interesting patterns
can be obtained when mining embedded subtrees, unfortunately mining
such embedding relationships can be very costly. In this paper, we propose an
efficient approach to tackle the complexity of mining embedded subtrees by
utilizing a novel Embedding List representation, Tree Model Guided enumeration,
and introducing the Level of Embedding constraint. Thus, when it is too
costly to mine all frequent embedded subtrees, one can decrease the level of
embedding constraint gradually up to 1, from which all the obtained frequent
subtrees are induced subtrees. Our experiments with both synthetic and real
datasets against two known algorithms for mining induced and embedded subtrees,
FREQT and TreeMiner, demonstrate the effectiveness and the efficiency
of the technique.