Industrial robots have mostly been limited to carefully controlled environments, driven by meticulously hand-coded scripts.[1: B. Siciliano and O. Khatib, eds., Springer Handbook of Robotics, Springer, 2016.] While this approach works for certain high-volume products such as automobiles, most tasks are prohibitively expensive to automate this way. Humans, by comparison, adapt intelligently, master new skills quickly, and move easily between environments. At Vicarious, we are building systems that bring human-like intelligence to the world of robots.
To reach this challenging goal, we structure research at Vicarious differently from mainstream deep learning and computer vision research. Such research has traditionally centered on a set of standard benchmarks, which let researchers steadily improve their techniques and compare results. However, using standard datasets as the primary measure of progress is controversial,[2: A. Yuille, “Computer vision needs a core and foundations,” Image and Vision Computing, 2012.] and many researchers have noted its downsides. The main objection is that the best strategy for beating a benchmark is to take one year's winning model and make incremental modifications on top of it. This creates a significant barrier to entry for novel methods[3: A. Yuille, “Computer vision needs a core and foundations,” Image and Vision Computing, 2012.] and makes it difficult to understand why a particular model performs well. In addition, these large datasets mix together many different error modes, so a fundamental improvement in one aspect of a model may yield only a marginal improvement in the overall performance of the system.[4: D. Hoiem, Y. Chodpathumwan, and Q. Dai, “Diagnosing Error in Object Detectors,” ECCV, 2012.] Finally, benchmark-beating techniques tend to overfit to the idiosyncrasies of the data.[5: A. Torralba and A. A. Efros, “Unbiased Look at Dataset Bias,” CVPR, 2011.][6: J. Ponce, T. L. Berg, M. Everingham, D. A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid, B. C. Russell, A. Torralba, C. K. I. Williams, J. Zhang, and A. Zisserman, “Dataset Issues in Object Recognition,” Toward Category-Level Object Recognition, 2006.]
In contrast, our research is organized around a set of questions and themes, and we use datasets designed specifically to probe those questions. When we test on standard benchmarks, we try to carefully identify and characterize the sources of error[3] rather than trying to beat the current state-of-the-art algorithm. The downside of this approach is that it may not top many standard benchmarks in the short term, but we believe that, executed well, it will lead to a much better understanding and significantly better-performing systems in the long term.
The following themes and constraints are emphasized in the work we do: