Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning

UTSePress Research/Manakin Repository

dc.contributor.author Wang Tao en_US
dc.contributor.author Qin Zhenxing en_US
dc.contributor.author Jin Zhi en_US
dc.contributor.author Zhang Shichao en_US
dc.contributor.editor en_US
dc.date.accessioned 2011-02-07T06:22:05Z
dc.date.available 2011-02-07T06:22:05Z
dc.date.issued 2010 en_US
dc.identifier 2009007504 en_US
dc.identifier.citation Wang Tao et al. 2010, 'Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning', Journal of Systems and Software, vol. 83, no. 7, pp. 1137-1147. en_US
dc.identifier.issn 0164-1212 en_US
dc.identifier.other C1 en_US
dc.identifier.uri http://hdl.handle.net/10453/13481
dc.description.abstract Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple costs are taken into account. Like other learning algorithms, they face a significant challenge in practical applications: over-fitting. That is, they can produce good results on training data but generally do not yield an optimal model when applied to unseen data in real-world applications; this is known as data over-fitting. This paper addresses data over-fitting in the TCSDT (test cost-sensitive decision tree) method by designing three simple and efficient strategies: feature selection, smoothing and threshold pruning. Feature selection pre-processes the data set before the TCSDT algorithm is applied. Smoothing and threshold pruning are applied within the TCSDT algorithm before the class probability estimate is calculated for each decision tree leaf. To evaluate these approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing data over-fitting. en_US
dc.language en_US
dc.publisher Elsevier Ltd en_US
dc.relation.isbasedon http://dx.doi.org/10.1016/j.jss.2010.01.002 en_US
dc.title Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning en_US
dc.parent Journal of Systems and Software en_US
dc.journal.volume 83 en_US
dc.journal.number 7 en_US
dc.publocation UK en_US
dc.identifier.startpage 1137 en_US
dc.identifier.endpage 1147 en_US
dc.cauo.name FEIT.School of Systems, Management and Leadership en_US
dc.conference Verified OK en_US
dc.for 080300 en_US
dc.personcode 10503276;999567;100789;020030 en_US
dc.percentage 000050 en_US
dc.classification.name Computer Software en_US
dc.classification.type FOR-08 en_US
dc.edition en_US
dc.custom en_US
dc.date.activity en_US
dc.location.activity en_US
dc.description.keywords Classification; Cost-sensitive learning; Over-fitting en_US
dc.staffid Peking University en_US
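
The abstract above says smoothing is applied before the class probability estimate is computed for each leaf. This record does not state which smoothing formula the authors use, so the sketch below swaps in the common Laplace correction, (n_k + 1) / (N + K), as an assumption, and pairs it with the standard minimum-expected-cost labelling rule for a leaf. The function names and the cost-matrix layout (cost[i][j] = cost of predicting class j when the true class is i) are hypothetical, and test costs (the cost of acquiring attribute values) are not modelled here.

```python
from typing import List, Sequence


def laplace_leaf_probabilities(counts: Sequence[int]) -> List[float]:
    """Laplace-corrected class probabilities for one leaf.

    Assumption: the paper's smoothing is not specified in this record;
    Laplace correction (n_k + 1) / (N + K) is used as a stand-in.
    """
    total = sum(counts)
    k = len(counts)
    return [(n + 1) / (total + k) for n in counts]


def min_expected_cost_label(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Return the class j minimising sum_i P(i) * cost[i][j].

    cost[i][j] is the (hypothetical) misclassification cost of predicting
    class j when the true class is i.
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=lambda j: expected[j])


if __name__ == "__main__":
    # A tiny pure leaf: 3 training examples of class 0, none of class 1.
    leaf_counts = [3, 0]
    # Missing a positive (true 1, predicted 0) costs 10; a false alarm costs 1.
    costs = [[0, 1], [10, 0]]

    raw = [n / sum(leaf_counts) for n in leaf_counts]
    smoothed = laplace_leaf_probabilities(leaf_counts)

    print(smoothed)                                  # [0.8, 0.2]
    print(min_expected_cost_label(raw, costs))       # 0
    print(min_expected_cost_label(smoothed, costs))  # 1
```

In this toy run the unsmoothed leaf claims P(class 1) = 0 from only three examples and therefore predicts class 0. After smoothing, the small non-zero probability combined with the high false-negative cost flips the decision, which illustrates why overconfident estimates from small leaves are a source of over-fitting in cost-sensitive trees.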