Gesture Generation by Imitation
From Human Behavior to Computer Character Animation
In an effort to extend traditional human-computer interfaces, research has introduced embodied agents that use the modalities of everyday human-human communication, such as facial expression, gestures and body postures. However, giving computer agents a human-like body introduces new challenges. Since human users are very sensitive and critical concerning bodily behavior, the agents must act naturally and individually in order to be believable.
This dissertation focuses on conversational gestures. It shows how to generate conversational gestures for an animated embodied agent based on annotated text input. The central idea is to imitate the gestural behavior of a human individual. Using TV show recordings as empirical data, gestural key parameters are extracted for the generation of natural and individual gestures. The gesture generation task is solved in three stages: observation, modeling and generation. For each stage, a software module was developed.
For observation, the video annotation research tool ANVIL was created. It allows the efficient transcription of gesture, speech and other modalities on multiple layers. ANVIL is application-independent because it allows users to define their own annotation schemes; it also provides various import/export facilities and is extensible via its plug-in interface. The tool is therefore suitable for a wide variety of research fields. For this work, selected clips of the TV talk show "Das Literarische Quartett" were transcribed and analyzed, yielding a total of 1,056 gestures.
For the modeling stage, the NOVALIS module was created to compute individual gesture profiles from these transcriptions with statistical methods. A gesture profile models the handedness, timing and function of gestures for a single human individual using estimated conditional probabilities. The profiles are based on a shared lexicon of 68 gestures, assembled from the data.
Finally, for generation, the NOVA generator was devised to create gestures based on gesture profiles in an overgenerate-and-filter approach. Annotated text input is processed in a graph-based representation in multiple stages: semantic data is added, the locations of potential gestures are determined by heuristic rules, and gestures are added and then filtered based on a gesture profile. NOVA outputs a linear, player-independent action script in XML.
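The modeling and generation stages can be illustrated with a minimal Python sketch. It is not the NOVALIS or NOVA implementation: the function names, the gesture and handedness labels, the toy observations and the threshold are all assumptions made up for illustration. It covers only handedness (not timing or function), estimates conditional probabilities by simple relative frequency, and stands in for the filtering step with a plain probability threshold.

```python
from collections import Counter, defaultdict


def estimate_profile(observations):
    """Estimate P(handedness | gesture) by relative frequency.

    observations: iterable of (gesture, handedness) pairs, standing in
    for transcribed gesture annotations (hypothetical format).
    """
    counts = defaultdict(Counter)
    for gesture, handedness in observations:
        counts[gesture][handedness] += 1

    profile = {}
    for gesture, hand_counts in counts.items():
        total = sum(hand_counts.values())
        profile[gesture] = {h: n / total for h, n in hand_counts.items()}
    return profile


def overgenerate_and_filter(slots, profile, threshold):
    """Propose every known (gesture, handedness) pair per slot, then filter.

    slots: list of (position, candidate_gestures) pairs, standing in for
    gesture locations found by heuristic rules (hypothetical format).
    Keeps the most probable proposal per slot if it reaches the threshold.
    """
    script = []
    for position, candidates in slots:
        # Overgenerate: all combinations the profile knows about.
        proposals = [
            (prob, gesture, hand)
            for gesture in candidates
            for hand, prob in profile.get(gesture, {}).items()
        ]
        if not proposals:
            continue
        # Filter: keep only the best proposal, and only if likely enough.
        prob, gesture, hand = max(proposals)
        if prob >= threshold:
            script.append((position, gesture, hand))
    return script


# Toy data; the labels are invented and not taken from the gesture lexicon.
observations = [
    ("gesture_a", "right"), ("gesture_a", "right"), ("gesture_a", "both"),
    ("gesture_b", "both"), ("gesture_b", "both"),
    ("gesture_b", "left"), ("gesture_b", "right"),
]
slots = [(0, ["gesture_a"]), (3, ["gesture_b", "gesture_c"]), (5, ["gesture_c"])]

profile = estimate_profile(observations)
script = overgenerate_and_filter(slots, profile, threshold=0.6)
print(script)
```

With this toy data, only the first slot survives a threshold of 0.6; lowering the threshold lets more of the overgenerated candidates through, which is the trade-off a filtering stage has to manage.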
About the Author
Michael Kipp studied Computer Science, Mathematics and Psychology at Saarland University, Germany, and the University of Edinburgh, UK. From 1997 on, he worked at the German Research Center for Artificial Intelligence (DFKI) in fields as diverse as neural networks, machine translation, embodied agents and virtual theater. After finishing his Doctor of Engineering in 2003, he embarked on a new career as a director's assistant at the National Theater of the Saarland.