A simple generative model of incremental reference resolution for situated dialogue

Casey Kennington, David Schlangen

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Referring to visually perceivable objects is very common in everyday language use. To produce a referring expression, the speaker needs to pick out visual properties of the intended object and choose words that name those properties, so that the expression directs the listener's attention to that object. The speaker can further aid the listener by looking toward the object and by pointing at it. To resolve the reference, the listener faces a difficult task: she must use all of the linguistic and non-linguistic information at once. She must already know the words in the referring expression that denote properties of the object, such as its colour or shape, and she must incorporate the speaker's gaze direction and pointing gesture. Crucially, the listener does not wait until the end of the referring expression before she begins to resolve it; rather, she interprets it as it unfolds. A model that resolves referring expressions as a listener does must be able to do all of these things. In this paper, we present such a generative model of reference resolution. We explain our model and show empirically, through a series of experiments, that it can work incrementally (i.e., word by word) as referring expressions unfold; can incorporate multimodal information such as gaze and pointing gestures in two ways; can learn a grounded meaning of the words in the referring expression; can incorporate contextual (i.e., saliency) information; and is robust to noisy input, such as automatic speech recognition transcriptions, as well as to uncertainty in the representation of the candidate objects.
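The abstract describes the model only in outline. As a rough illustration of what word-by-word generative resolution can look like, the sketch below assumes a simple Bayesian update: each candidate object I starts from a saliency prior P(I), and each incoming word w_t updates the distribution via P(I | w_1..w_t) ∝ P(I | w_1..w_t-1) · Σ_r P(w_t | r) P(r | I), where r ranges over the object's properties and P(r | I) is taken as uniform. This formulation, the toy scene, the word-given-property probabilities, the smoothing constant `eps` and all function names are hypothetical choices for this sketch, not the paper's published model or learned parameters. Gaze and pointing cues are not modelled here; as a further assumption, they could be folded into the prior alongside saliency.

```python
from typing import Dict, List

Dist = Dict[str, float]

# Hypothetical toy scene: each candidate object is described by its properties.
OBJECTS: Dict[str, List[str]] = {
    "obj1": ["red", "cross"],
    "obj2": ["red", "circle"],
    "obj3": ["green", "cross"],
}

# Hypothetical "grounded" word meanings: P(word | property).
# Words missing from a property's table fall back to a small smoothing value.
WORD_GIVEN_PROP: Dict[str, Dict[str, float]] = {
    "red": {"red": 0.9},
    "green": {"green": 0.9},
    "cross": {"cross": 0.8, "plus": 0.1},
    "circle": {"circle": 0.8, "round": 0.1},
}


def normalise(d: Dist) -> Dist:
    """Rescale non-negative scores so they sum to 1."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}


def word_likelihood(word: str, props: List[str],
                    word_given_prop: Dict[str, Dict[str, float]],
                    eps: float = 1e-3) -> float:
    """Sum over properties r of P(word | r) * P(r | I), with P(r | I) uniform."""
    p_r = 1.0 / len(props)
    return sum(word_given_prop.get(r, {}).get(word, eps) * p_r for r in props)


def update(prior: Dist, word: str, objects: Dict[str, List[str]],
           word_given_prop: Dict[str, Dict[str, float]]) -> Dist:
    """One incremental step: the previous posterior acts as the new prior."""
    scores = {obj: prior[obj] * word_likelihood(word, objects[obj], word_given_prop)
              for obj in prior}
    return normalise(scores)


def resolve_incrementally(words: List[str], objects: Dict[str, List[str]],
                          word_given_prop: Dict[str, Dict[str, float]],
                          saliency: Dist) -> List[Dist]:
    """Return the distribution over candidate objects after each word."""
    dist = normalise(dict(saliency))  # saliency serves as the initial prior P(I)
    history: List[Dist] = []
    for w in words:
        dist = update(dist, w, objects, word_given_prop)
        history.append(dist)
    return history


if __name__ == "__main__":
    steps = resolve_incrementally(["the", "red", "cross"], OBJECTS, WORD_GIVEN_PROP,
                                  {"obj1": 1.0, "obj2": 1.0, "obj3": 1.0})
    for word, dist in zip(["the", "red", "cross"], steps):
        print(word, {k: round(v, 3) for k, v in dist.items()})
```

Because each step reuses the previous posterior as its prior, a word such as "the" that is not tied to any property leaves the distribution unchanged, "red" narrows it to the two red objects, and "cross" then singles out obj1, so a resolution is available at every point in the utterance rather than only at its end.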

Original language: English
Pages (from-to): 43-67
Number of pages: 25
Journal: Computer Speech & Language
Volume: 41
DOIs
State: Published - 1 Jan 2017

Keywords

  • Dialogue
  • Incremental
  • Reference resolution
  • Situated
  • Stochastic
