Abstract:
As the use of the web grows globally and exponentially, it becomes increasingly hard for users to find the
information they want. Therefore, there is a need for good information filtering mechanisms. This paper
presents a new, efficient information filtering method using word clusters. Traditional filtering methods
consider only the relevance values of documents. As a result, they neglect the
efficiency of document retrieval, which is also crucial. Our algorithm uses offline computation to
cluster similar documents based on the words they share, so that the efficiency of
information filtering and retrieval can be improved.
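
The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea: group documents offline by the words they share, then answer a query by consulting one cluster instead of scanning every document. The Jaccard overlap measure, the greedy single-pass grouping, the threshold value, and all function names (tokenize, build_clusters, filter_documents) are illustrative assumptions, not the method of this paper.

```python
import re


def tokenize(text):
    """Lowercase a document and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Fraction of words common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_clusters(docs, threshold=0.2):
    """Offline step: greedily group documents whose word sets overlap.

    Each cluster keeps the union of its members' words as a summary.
    The threshold is an assumed, illustrative value.
    """
    clusters = []  # each entry: {"words": set of words, "members": [doc indices]}
    for idx, text in enumerate(docs):
        words = tokenize(text)
        # Find the existing cluster that shares the most vocabulary.
        best = max(clusters, key=lambda c: jaccard(words, c["words"]), default=None)
        if best is not None and jaccard(words, best["words"]) >= threshold:
            best["members"].append(idx)
            best["words"] |= words
        else:
            clusters.append({"words": words, "members": [idx]})
    return clusters


def filter_documents(query, docs, clusters):
    """Online step: pick the best-matching cluster and return its documents."""
    q = tokenize(query)
    best = max(clusters, key=lambda c: len(q & c["words"]), default=None)
    if best is None or not (q & best["words"]):
        return []  # no cluster shares a word with the query
    return [docs[i] for i in best["members"]]


docs = [
    "cats chase mice",
    "mice fear cats",
    "stocks fell sharply today",
    "stocks rose today",
]
clusters = build_clusters(docs)          # done once, offline
print(filter_documents("cats", docs, clusters))
```

In this sketch the cost of a query depends on the number of clusters rather than the number of documents, which is where the efficiency gain from offline clustering would come from.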