Abstract:
The traditional bag-of-words model and the more recent word-sequence
kernel are two well-known techniques in the
field of text categorization. The bag-of-words representation
neglects word order, which can result in lower
classification accuracy for some types of documents.
The word-sequence kernel takes word order into account
but does not capture all of the word-frequency information.
A weighted kernel model that combines these
two models was proposed by the authors [1]. This paper
This paper focuses on optimizing the weighting parameters,
which are functions of word frequency. Experiments
conducted on the Reuters database
show that the new weighted kernel achieves
better classification accuracy.
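The contrast between the two representations can be illustrated with a small Python sketch. This is not the authors' implementation. It compares a bag-of-words kernel (inner product of term-frequency vectors) with a gap-weighted word-sequence kernel, which is evaluated by brute force directly from its definition. Brute-force evaluation is only feasible for short texts; practical implementations use dynamic programming. The function names `bow_kernel`, `seq_kernel` and `weighted_kernel`, the subsequence length `n`, the decay factor `lam`, and the fixed weight `alpha` are all hypothetical. In particular, `weighted_kernel` is a simple fixed convex combination. It does not reproduce the paper's weighting parameters, which are functions of word frequency.

```python
from collections import Counter
from itertools import combinations
import math


def bow_kernel(s, t):
    """Bag-of-words kernel: inner product of term-frequency vectors (word order is ignored)."""
    cs, ct = Counter(s), Counter(t)
    return sum(cs[w] * ct[w] for w in cs)


def seq_features(doc, n, lam):
    """Feature map of a gap-weighted word-sequence kernel, computed by brute force.

    Every length-n word subsequence u gets weight lam ** span, where span counts the
    positions from its first to its last word, so gapped matches are penalised.
    """
    phi = Counter()
    for idx in combinations(range(len(doc)), n):
        u = tuple(doc[i] for i in idx)
        phi[u] += lam ** (idx[-1] - idx[0] + 1)
    return phi


def seq_kernel(s, t, n=2, lam=0.5):
    """Word-sequence kernel: inner product of the two subsequence feature maps."""
    ps, pt = seq_features(s, n, lam), seq_features(t, n, lam)
    return sum(ps[u] * pt[u] for u in ps)  # missing keys in a Counter count as 0


def normalized(k, s, t):
    """Cosine-normalise kernel k so that a document compared with itself scores 1."""
    denom = math.sqrt(k(s, s) * k(t, t))
    return k(s, t) / denom if denom else 0.0


def weighted_kernel(s, t, alpha=0.5):
    """Hypothetical fixed convex combination of the two normalised kernels."""
    return alpha * normalized(bow_kernel, s, t) + (1 - alpha) * normalized(seq_kernel, s, t)


a = "dog bites man".split()
b = "man bites dog".split()
print(normalized(bow_kernel, a, b))  # 1.0: identical word counts
print(normalized(seq_kernel, a, b))  # 0.0: no shared ordered word pairs
print(weighted_kernel(a, b))         # 0.5
```

The two sentences contain the same words, so the bag-of-words kernel treats them as identical. They share no ordered word pair, however, so the word-sequence kernel scores them as unrelated. A combined kernel interpolates between the two views.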