PhD Thesis Completion Seminar
Wednesday, 7 March 1-2pm
Seminar Room IR214, Ian Ross Building 31
Speaker: Peter Anderson
Title: Towards Agents that See, Communicate and Act
Abstract: One of the long-term goals of artificial intelligence (AI) research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or application environment. Achieving this tripartite goal (agents that see, communicate and act) is central to the development of household service robots, contextually aware virtual assistants, voice-guided drones and many other applications. In this talk, I will focus initially on tasks that bridge visual and linguistic understanding, such as image captioning and visual question answering (VQA). I will cover some recent advances in automatic image caption evaluation, visual attention modeling and methods for improving generalization to images 'in the wild'. I will then turn to embodied vision-and-language agents that actively interact with their environments. I will introduce my recent work on vision-and-language navigation (VLN), in which we situate agents in an environment constructed from dense RGB-D imagery of 90 large buildings.
Bio: Peter Anderson is a PhD candidate in Computer Science at ANU, supervised by Dr Stephen Gould, and a researcher within the Australian Centre for Robotic Vision. During his PhD he has visited numerous universities and research labs, including Adelaide University, Macquarie University, Queensland University of Technology and Microsoft Research. He has published at CVPR, ECCV, EMNLP and ICRA, and his team recently won first place in the highly competitive 2017 CVPR visual question answering (VQA) challenge. He holds two undergraduate degrees, both with a university medal, and began his career in finance before switching to computer science.