Abstract:
Automatic distinction between posed and spontaneous expressions
is an unsolved problem. Previous studies in the cognitive sciences
indicated that the automatic separation of
posed from spontaneous expressions is possible using the face
modality. However, little is known about the information
from head and shoulder motion. In this work, we propose
to (i) distinguish between posed and spontaneous smiles by
fusing head, face, and shoulder modalities, (ii) investigate
which modalities carry important information and how the
modalities relate to each other, and (iii) determine to what extent the
temporal dynamics of these signals contribute to solving the
problem. A cylindrical head tracker is used to track head
motion, and two particle filtering techniques are used to track facial
and shoulder motion. Classification is performed by kernel
methods combined with ensemble learning techniques. We
investigated two aspects of multimodal fusion: the level of
abstraction (i.e., early, mid-level, and late fusion) and the
fusion rule used (i.e., sum, product, and weight criteria).
Experimental results from 100 videos displaying posed smiles
and 102 videos displaying spontaneous smiles are presented.
The best results were obtained with late fusion of all modalities,
which classified 94.0% of the videos correctly.
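
To make the sum, product, and weight fusion rules concrete, the following is a minimal sketch of late fusion for a single video. It is an illustration under stated assumptions, not the implementation used in this work: it assumes each modality's classifier outputs a score that can be read as the probability of the "spontaneous" class, and the names `fuse_late` and `classify`, the example scores, and the weights are all hypothetical.

```python
from math import prod


def fuse_late(scores, rule="sum", weights=None):
    """Fuse per-modality scores for one video into a single score.

    scores:  dict mapping a modality name to an assumed
             P(spontaneous | modality) in [0, 1].
    rule:    "sum" (mean of scores), "product" (normalised product),
             or "weight" (weighted mean).
    weights: dict of non-negative per-modality weights; "weight" rule only.
    """
    names = sorted(scores)
    if not names:
        raise ValueError("at least one modality score is required")
    p_spont = [scores[name] for name in names]

    if rule == "sum":
        # Sum rule: average the per-modality scores.
        return sum(p_spont) / len(p_spont)

    if rule == "product":
        # Product rule: multiply the scores per class, then renormalise
        # over the two classes (spontaneous vs. posed).
        spont = prod(p_spont)
        posed = prod(1.0 - p for p in p_spont)
        total = spont + posed
        return spont / total if total > 0 else 0.5

    if rule == "weight":
        # Weighted rule: weighted mean of the scores. How the weights
        # are chosen is left open here (an assumption of this sketch).
        if weights is None:
            raise ValueError("the 'weight' rule requires weights")
        w = [weights[name] for name in names]
        if min(w) < 0 or sum(w) <= 0:
            raise ValueError("weights must be non-negative, with a positive sum")
        return sum(wi * pi for wi, pi in zip(w, p_spont)) / sum(w)

    raise ValueError(f"unknown fusion rule: {rule!r}")


def classify(scores, rule="sum", weights=None, threshold=0.5):
    """Label a video 'spontaneous' or 'posed' from its fused score."""
    fused = fuse_late(scores, rule, weights)
    return "spontaneous" if fused >= threshold else "posed"


# Hypothetical per-modality scores for one video.
example = {"head": 0.6, "face": 0.9, "shoulder": 0.4}
example_weights = {"head": 1.0, "face": 2.0, "shoulder": 1.0}

print("sum:    ", fuse_late(example, "sum"))
print("product:", fuse_late(example, "product"))
print("weight: ", fuse_late(example, "weight", example_weights))
```

With these hypothetical scores, the fused values are roughly 0.63 (sum), 0.90 (product), and 0.70 (weight). The product rule is pulled most strongly toward the confident face score, which illustrates why the choice of fusion rule can matter as much as the level of abstraction at which fusion happens.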