Abstract:
Taking full account of the information that may be hidden in missing data,
this paper proposes a new criterion for selecting splitting attributes when
building a decision tree for a given dataset. In our approach, a cost must be paid
to obtain an attribute value, and a further cost is incurred when a prediction is wrong. We use
different scales for the two kinds of cost, rather than the single common scale
used in previous work. We propose a new algorithm that builds a decision tree
with a null branch strategy to minimize the misclassification cost. When the user
provides limited resources, the algorithm makes the best use of those resources
while obtaining the best results the tree can achieve. We also consider discounts
on test costs when groups of attributes are tested together. In addition, we offer
advice on whether increasing the resources is worthwhile. Our results can be
readily applied to real-world diagnosis tasks, such as medical diagnosis, where
doctors must decide which tests to perform for a patient in order to
minimize the misclassification cost within the available resources.
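
As an illustration only, the following Python sketch shows one way a split criterion of this kind could be set up. It is not the algorithm proposed in this paper. The function names (leaf_cost, split_cost, best_attribute), the binary 0/1 labels, the use of None to mark a missing value, and the rule of skipping any attribute whose test cost exceeds the remaining budget are all assumptions made for the example. Test costs and misclassification costs are kept on separate scales, missing values are routed to their own null branch, and group discounts are not modelled.

```python
from collections import Counter, defaultdict


def leaf_cost(labels, fp_cost, fn_cost):
    """Lowest misclassification cost of giving every example one label."""
    counts = Counter(labels)
    pos, neg = counts.get(1, 0), counts.get(0, 0)
    # Predicting 1 turns every negative into a false positive;
    # predicting 0 turns every positive into a false negative.
    return min(neg * fp_cost, pos * fn_cost)


def split_cost(rows, labels, attr, fp_cost, fn_cost):
    """Total misclassification cost after splitting on attr.

    Examples whose value for attr is missing (None) are collected in
    their own branch (the "null branch" in this sketch).
    """
    branches = defaultdict(list)
    for row, label in zip(rows, labels):
        branches[row.get(attr)].append(label)
    return sum(leaf_cost(ys, fp_cost, fn_cost) for ys in branches.values())


def best_attribute(rows, labels, test_costs, budget, fp_cost, fn_cost):
    """Pick the affordable attribute with the largest cost reduction.

    budget and test_costs share one scale (assumed: resource units);
    fp_cost and fn_cost use a separate misclassification scale.
    Returns (attribute, reduction), or (None, 0) if nothing helps.
    """
    base = leaf_cost(labels, fp_cost, fn_cost)
    best, best_reduction = None, 0
    for attr, cost in test_costs.items():
        if cost > budget:
            continue  # the test cannot be paid for with the remaining resources
        reduction = base - split_cost(rows, labels, attr, fp_cost, fn_cost)
        if reduction > best_reduction:
            best, best_reduction = attr, reduction
    return best, best_reduction


# Hypothetical toy data: attribute "a" is missing for the third example.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": None, "b": "p"},
    {"a": "y", "b": "q"},
]
labels = [1, 1, 0, 0]
test_costs = {"a": 30, "b": 10}

choice = best_attribute(rows, labels, test_costs, budget=50, fp_cost=1, fn_cost=5)
```

Calling best_attribute again with a smaller or larger budget and comparing the returned reduction gives a rough sense of whether extra resources would pay off in this toy setting.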