| dc.contributor.author | Tan Hark | en_US |
| dc.contributor.author | Dillon Tharam | en_US |
| dc.contributor.author | Feng Ling | en_US |
| dc.contributor.author | Chang Elizabeth | en_US |
| dc.contributor.author | Hadzic Fedja | en_US |
| dc.contributor.editor | Zanasi, A; Brebbia, CA; Ebecken, NFF | en_US |
| dc.date.accessioned | 2009-11-09T02:45:35Z | |
| dc.date.available | 2009-11-09T02:45:35Z | |
| dc.date.issued | 2005 | en_US |
| dc.identifier | 2005000980 | en_US |
| dc.identifier.citation | Tan Hark et al. 2005, 'X3-Miner: mining patterns from an XML database', IEEE, New York, USA, pp. 287-296. | en_US |
| dc.identifier.issn | 1-84564-017-9 | en_US |
| dc.identifier.other | E1 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10453/1869 | |
| dc.description.abstract | An XML-enabled framework for the representation of association rules in databases was first presented in [4]. In Frequent Structure Mining (FSM), one popular approach is graph matching, which uses data structures such as the adjacency matrix [7] or adjacency list [8]. Another approach represents semistructured tree-like structures using a string representation, which is more space efficient and relatively easy to manipulate [10]. However, with XML, mining association rules faces more challenges due to the inherent flexibility in both structure and semantics, such as: 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much bigger data size. To tackle these challenges, we propose an approach, X3-Miner, that efficiently extracts patterns from a large XML data set and overcomes the challenges by: (1) exploring the use of a model-validating approach in deducing the number of candidates generated, taking into account the semantics embedded in the tree-like structure of an XML database and obtaining only valid candidates out of the XML database; (2) minimising I/O overhead by intersecting the XML database with the frequent 1-itemset, which results in a frequent 1-itemset XML tree, and progressively trimming infrequent k-itemsets that contain infrequent (k-1)-itemsets; and (3) extending the notion of string representation of a tree structure proposed in [10] to xstring for describing an XML document without loss of either structure or semantics. Such an extension enables easier traversal of the tree-structured XML data during our model-validating candidate generation. Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data. | en_US |
| dc.publisher | WIT Press | en_US |
| dc.relation.isbasedon | http://library.witpress.com/pages/PaperInfo.asp?PaperID=15013 | en_US |
| dc.title | X3-Miner: mining patterns from an XML database | en_US |
| dc.parent | Data Mining VI - Data Mining, Text Mining and their business applications | en_US |
| dc.journal.volume | | en_US |
| dc.journal.number | | en_US |
| dc.publocation | Southampton, UK | en_US |
| dc.identifier.startpage | 287 | en_US |
| dc.identifier.endpage | 296 | en_US |
| dc.cauo.name | Information Technology | en_US |
| dc.conference | 6th Conference on Data Mining - Text Mining and Their Business Applications | en_US |
| dc.conference.location | Skiathos, Greece | en_US |
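
The abstract mentions progressively trimming k-itemsets that contain an infrequent (k-1)-itemset. The sketch below illustrates that idea with generic Apriori-style pruning over flat itemsets; it is not the X3-Miner algorithm itself, which works on tree-structured XML data. The function name `prune_candidates` and the element names (`author`, `title`, `year`, `price`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def prune_candidates(candidates, frequent_prev):
    """Keep only the k-itemsets whose every (k-1)-subset is frequent.

    candidates    -- iterable of sets, each of size k (hypothetical input)
    frequent_prev -- set of frozensets holding the frequent (k-1)-itemsets
    """
    kept = []
    for cand in candidates:
        k = len(cand)
        # Enumerate every (k-1)-subset of the candidate.
        subsets = (frozenset(s) for s in combinations(sorted(cand), k - 1))
        # A single infrequent subset is enough to discard the candidate.
        if all(s in frequent_prev for s in subsets):
            kept.append(frozenset(cand))
    return kept


# Hypothetical frequent 2-itemsets over XML element names.
frequent_2 = {
    frozenset(pair)
    for pair in [
        ("author", "title"),
        ("author", "year"),
        ("title", "year"),
        ("title", "price"),
    ]
}

# Two candidate 3-itemsets; the second contains {author, price},
# which is not among the frequent 2-itemsets.
candidates_3 = [
    {"author", "title", "year"},
    {"author", "title", "price"},
]

surviving = prune_candidates(candidates_3, frequent_2)
```

Here only the first candidate survives, because the second one contains a 2-itemset that is not frequent. A tree-aware miner would additionally need to check structural validity of each candidate, which this flat sketch does not attempt.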