## Ph.D. Thesis: "Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue"

[<img style="float: right" src="files/agus-julia-small.jpg">](files/agus-julia.jpg)

The dissertation was awarded «distinction» by the Department of Computer Science.

**Defense:** Jan 28th, 2009 - 3PM - CS Conference Room.

**Advisor:** Julia Hirschberg.

**Committee Members:** Kathleen McKeown, Rebecca Passonneau, Maxine Eskenazi, Amanda Stent.

[Download PDF document (4.5 MB)](files/gravano_thesis_2009.pdf)

Please check out the [errata](#errata) section below.

### Abstract:

As interactive voice response systems spread at a rapid pace, providing increasingly complex functionality, it is becoming clear that the challenges of such systems are not solely associated with their synthesis and recognition capabilities. Rather, issues such as the coordination of turn exchanges between system and user, or the correct generation and understanding of words that may convey multiple meanings, appear to play an important role in system usability. This thesis explores those two issues in the Columbia Games Corpus, a collection of spontaneous task-oriented dialogues in Standard American English. We provide evidence of the existence of seven turn-yielding cues -- prosodic, acoustic and syntactic events strongly associated with conversational turn endings -- and show that the likelihood of a turn-taking attempt from the interlocutor increases linearly with the number of cues conjointly displayed by the speaker. We present similar results related to six backchannel-inviting cues -- events that invite the interlocutor to produce a short utterance conveying continued attention. Additionally, we describe a series of studies of affirmative cue words -- a family of cue words such as *okay* or *alright* that speakers use frequently in conversation for several purposes: for acknowledging what the interlocutor has said, or for cueing the start of a new topic, among others.
We find differences in the acoustic/prosodic realization of such functions, but observe that contextual information figures prominently in human disambiguation of these words. We also conduct machine learning experiments to explore the automatic classification of affirmative cue words. Finally, we examine a novel measure of speaker entrainment related to the usage of these words, showing its association with task success and dialogue coordination.

### Errata:

* Section 6.1.7, **IPU duration**, first paragraph: &ldquo;The number of words in IPUs preceding smooth switches (S) is significantly <font style="text-decoration:line-through;color:red">smaller</font> <font style="font-weight:bold;color:green">larger</font> than in IPUs preceding holds (H)&rdquo;.
* Table 6.6: Speakers 110 and 113 did not show significant evidence of the Intonation turn-yielding cue.
* Section 6.1.8, **Speaker variation**, first paragraph: &ldquo;<font style="text-decoration:line-through;color:red">Six</font> <font style="font-weight:bold;color:green">Five</font> speakers show evidence of all seven cues, while the remaining <font style="text-decoration:line-through;color:red">seven</font> <font style="font-weight:bold;color:green">eight</font> speakers show <font style="text-decoration:line-through;color:red">at least</font> <font style="font-weight:bold;color:green">either five or</font> six cues&rdquo;.
* Section 11, footnote 1: &ldquo;In the coding scheme presented in Chapter <font style="text-decoration:line-through;color:red">3</font> <font style="font-weight:bold;color:green">12</font>&rdquo;.