Semantic Analysis in a Text-to-Scene Conversion System
rws (research.att.com)
http://www.research.att.com/~rws/wordseye.html
Natural language is an easy and effective medium for describing visual ideas and mental images. Thus, we foresee the emergence of language-based 3D scene generation systems to let ordinary users quickly create 3D scenes without having to learn special software, acquire artistic skills, or even touch a desktop window-oriented interface. WordsEye is such a system for automatically converting text into representative 3D scenes. WordsEye relies on a large database of 3D models and poses to depict entities and actions. Every 3D model can have associated shape displacements, spatial tags, and functional properties to be used in the depiction process. We describe the linguistic analysis and depiction techniques used by WordsEye along with some general strategies by which more abstract concepts are made depictable.
We start with an overview of WordsEye. We then describe the intended use of FrameNet in the WordsEye system, specifically in an analysis-by-generation component currently under development. We also report on some work on developing a set of frames for food preparation.
Finally, we report on a statistical method for inferring depictable properties of the environment. For example, if we wish to depict someone eating dinner, the method should tell us that he or she is probably in a dining room or a restaurant (rather than the bathroom), and that it is probably evening (rather than morning).
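To make the idea of inferring a likely setting concrete, here is a minimal sketch of one simple approach: estimating relative frequencies from co-occurrence counts of actions and locations. It is not the method presented in the talk, whose details are not given here. The observation list, the function names `build_counts` and `rank_values`, and all counts are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Toy (action, location) observations. These are invented for illustration;
# a real system would derive such pairs from a large text corpus.
OBSERVATIONS = [
    ("eat dinner", "dining room"),
    ("eat dinner", "dining room"),
    ("eat dinner", "restaurant"),
    ("eat dinner", "kitchen"),
    ("brush teeth", "bathroom"),
    ("brush teeth", "bathroom"),
]


def build_counts(observations):
    """Count how often each property value co-occurs with each action."""
    counts = defaultdict(Counter)
    for action, value in observations:
        counts[action][value] += 1
    return counts


def rank_values(counts, action):
    """Return (value, relative frequency) pairs for an action, most frequent first."""
    seen = counts.get(action)
    if not seen:
        return []  # no evidence for this action
    total = sum(seen.values())
    return [(value, n / total) for value, n in seen.most_common()]


location_counts = build_counts(OBSERVATIONS)
ranking = rank_values(location_counts, "eat dinner")
```

With these toy counts, `ranking` places "dining room" first for "eat dinner", and "bathroom" does not appear at all. The same counting scheme could be applied to other environmental properties, such as time of day, by swapping in (action, time) pairs.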
(This is joint work with Bob Coyne, Owen Rambow, Srinivas Bangalore, Christopher Johnson, and Tahir Butt.)