Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions (ICRA-2018)

Jun Hatori, Yuta Kikuchi, Sosuke Kobayashi, Kuniyuki Takahashi, Yuta Tsuboi, Yuya Unno, Wilson Ko, Jethro Tan. In Proceedings of the International Conference on Robotics and Automation (ICRA), 2018. Best Paper on HRI.


Comprehension of spoken natural language is an essential component for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to (1) complex structures, including a wide variety of expressions used in spoken language, and (2) inherent ambiguity in the interpretation of human instructions. In this paper, we propose the first comprehensive system that can handle unconstrained spoken language and effectively resolve ambiguity in spoken instructions. Specifically, we integrate deep-learning-based object detection with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve instruction ambiguity through dialogue. Through experiments in a simulated environment and on a physical industrial robot arm, we demonstrate that our system understands natural instructions from human operators effectively, and that an interactive clarification process yields higher success rates on the object picking task.

Project page:


Incremental Joint Approach to Word Segmentation, POS Tagging and Dependency Parsing in Chinese (ACL-2012)

Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, Jun’ichi Tsujii. Incremental Joint Approach to Chinese Word Segmentation, POS Tagging, and Dependency Parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012). Jeju, Korea. 2012.

We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese. Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models. We also describe our method to align comparable states in the beam, and how we can combine features of different characteristics in our incremental framework. In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing. We also perform comparison experiments with the partially joint models.


Slides: Google Drive

Source code:

See Publications for a more comprehensive list.