Abstract:
This paper examines automated classification of a metadata attribute describing the 'tone' of a consumer-oriented breast cancer webpage: medical or supportive. We use a semantic space model, hyperspace analog to language (HAL), which is based on word co-occurrence, to provide features for webpage classification. Adaptive k-local hyperplane (AKLH), an extension of k-nearest neighbour, is then applied to the training and testing data. We observe 92% classification accuracy on test cases. This combination of methods appears promising for identifying non-trivial metadata attributes of consumer health webpages, with potential use embedded in a search engine or as a metadata coding support tool.
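
As an illustration of the word co-occurrence idea behind HAL, the following is a minimal sketch and not the implementation used in this work. It assumes the commonly described HAL scheme: a window slides over the token stream, and each word accumulates a weight for every word that precedes it within the window, with closer words weighted more heavily. The function names (`hal_matrix`, `hal_vector`), the window size, and the sample tokens are hypothetical choices made for this example only.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Build a HAL-style weighted co-occurrence table (illustrative sketch).

    cooc[w][c] accumulates a weight each time c appears within `window`
    tokens before w. The weight is window - distance + 1, so adjacent
    words contribute the most.
    """
    cooc = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for dist in range(1, window + 1):
            j = i - dist
            if j < 0:
                break  # ran off the start of the text
            cooc[word][tokens[j]] += window - dist + 1
    return cooc


def hal_vector(cooc, word, vocab):
    """Concatenate the row (preceding words) and column (following words)."""
    row = [cooc.get(word, {}).get(c, 0) for c in vocab]
    col = [cooc.get(c, {}).get(word, 0) for c in vocab]
    return row + col


# Hypothetical toy input; a real system would tokenise whole webpages.
tokens = "the patient received support".split()
vocab = sorted(set(tokens))
cooc = hal_matrix(tokens, window=2)
vector = hal_vector(cooc, "received", vocab)
```

Vectors of this kind, aggregated over the words of a page, could then serve as input features to a nearest-neighbour style classifier; how the features are aggregated and passed to AKLH is not specified here and is left out of the sketch.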