TY - GEN
T1 - Placing objects in gesture space
T2 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
AU - Han, Ting
AU - Kennington, Casey
AU - Schlangen, David
N1 - Publisher Copyright: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2018
Y1 - 2018
N2 - When describing routes not in the current environment, a common strategy is to anchor the description in configurations of salient landmarks, complementing the verbal descriptions by “placing” the non-visible landmarks in the gesture space. Understanding such multimodal descriptions and later locating the landmarks in the real world is a challenging task for the hearer, who must interpret speech and gestures in parallel, fuse information from both modalities, build a mental representation of the description, and ground that knowledge in real-world landmarks. In this paper, we model the hearer's task, using a multimodal spatial description corpus that we collected. To reduce the variability of verbal descriptions, we simplified the setup to use simple objects as landmarks. We describe a real-time system to evaluate the separate and joint contributions of the modalities. We show that gestures not only help to improve the overall system performance, even if to a large extent they encode redundant information, but also result in earlier final correct interpretations. Being able to build and apply representations incrementally will be of use in more dialogical settings, we argue, where it can enable immediate clarification in cases of mismatch.
UR - http://www.scopus.com/inward/record.url?scp=85060461097&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85060461097
T3 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
SP - 5157
EP - 5164
BT - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Y2 - 2 February 2018 through 7 February 2018
ER -