Recent advances in deep learning have given rise to highly expressive models that achieve remarkable results on visual perception tasks such as object, action, and scene recognition. However, it is widely accepted that in order to develop truly intelligent systems, we need to bridge the gap between perception and cognition. Highly cognitive tasks such as planning, abstracting, reasoning, and explaining are typically associated with symbolic systems, which do not scale to the complex, high-dimensional visual world. The relatively new field of neuro-symbolic computation proposes to combine the strengths of deep models with symbolic approaches, by using the former to learn disentangled, interpretable, low-dimensional representations which significantly reduce the search space for symbolic approaches such as program synthesis (cf. [5,15]). Another reason to study the interplay between neural and symbolic approaches is related to human, and in particular infant, learning. While far from fully understood, there is an increasing body of evidence that similar mechanisms combining low-level perception with high-level cognition are at play in the human brain [1,12].

This tutorial will bring together researchers from computer vision, graphics, robotics, cognitive science, and developmental psychology to exchange ideas and share recent research results and applications in the emerging field of neuro-symbolic computation, with a focus on computer vision.

Schedule (Pacific Time)

9:00 AM - 9:10 AM Opening Remarks: Jiajun Wu [Youtube]
9:10 AM - 9:40 AM Talk 1: Christopher Manning: More-neural Symbolic Concept Learning [Youtube]
9:40 AM - 10:10 AM Talk 2: Yejin Choi: Neuro-Symbolic Commonsense Intelligence: Closing the Gap between Perception and Cognition [Youtube]
10:10 AM - 10:40 AM Talk 3: Jiayuan Mao: Neuro-Symbolic Visual Concept Learning [Youtube]
10:40 AM - 10:50 AM Break 1
10:50 AM - 11:20 AM Talk 4: Daniel Ritchie: From Neural to Neurosymbolic 3D Modeling [Recording Unavailable *]
11:20 AM - 11:50 AM Talk 5: Kevin Ellis: Learning Languages for Visual Programs [Youtube]
11:50 AM - Noon Break 2
Noon - 12:30 PM Talk 6: Rishabh Singh: Towards Human-like Program Synthesis [Youtube]
12:30 PM - 1:00 PM Talk 7: Xinyun Chen: Neural Program Synthesis for Navigation and Language Understanding [Youtube]

*: Video recording for Prof. Daniel Ritchie is unavailable online as requested by the speaker.


Speakers

Yejin Choi is an associate professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, an adjunct in the Linguistics department, and an affiliate of the Center for Statistics and Social Sciences. She is also a senior research manager at the Allen Institute for Artificial Intelligence. Her primary research interests are in Natural Language Processing, Machine Learning, and Artificial Intelligence, with broader interests in Computer Vision and Digital Humanities.

Christopher Manning is the inaugural Thomas M. Siebel Professor in Machine Learning in the Departments of Linguistics and Computer Science at Stanford University, Director of the Stanford Artificial Intelligence Laboratory (SAIL), and an Associate Director of the Stanford Human-Centered Artificial Intelligence Institute (HAI). His research goal is computers that can intelligently process, understand, and generate human language material. Manning is a leader in applying Deep Learning to Natural Language Processing, with well-known research on Tree Recursive Neural Networks, the GloVe model of word vectors, sentiment analysis, neural network dependency parsing, neural machine translation, question answering, and deep language understanding. He also focuses on computational linguistic approaches to parsing, natural language inference and multilingual language processing, including being a principal developer of Stanford Dependencies and Universal Dependencies.

Daniel Ritchie is an Assistant Professor of Computer Science at Brown University, where he co-leads the Brown Visual Computing group. Ritchie is broadly interested in the intersection of computer graphics with artificial intelligence and machine learning: he builds intelligent machines that understand the visual world and can help people be visually creative.

Rishabh Singh is a research scientist on the Google Brain team, which works on developing new deep learning architectures for learning programs and program analysis. Singh develops new program synthesis techniques for helping end-users, students, and programmers. Apart from research, he enjoys playing bridge.

Xinyun Chen is a Ph.D. candidate at UC Berkeley, working with Prof. Dawn Song. Chen's research lies at the intersection of deep learning, programming languages, and security. Her recent research focuses on neural program synthesis and adversarial machine learning, towards tackling the grand challenges of increasing the accessibility of programming to general users, and enhancing the security and trustworthiness of machine learning models. She received the Facebook Fellowship in 2020.

Kevin Ellis is a Ph.D. student at MIT, advised by Professors Josh Tenenbaum and Armando Solar-Lezama, working in cognitive AI and program synthesis. Ellis develops algorithms for program induction, which means synthesizing programs from data, and applies these algorithms to problems in artificial intelligence. He will be starting as an assistant professor in the computer science department at Cornell in summer 2021.

Jiayuan Mao is a Ph.D. student at MIT, advised by Professors Josh Tenenbaum and Leslie Kaelbling. Mao's research focuses on structured knowledge representations that can be transferred among tasks and on inductive biases that improve learning efficiency and generalization. Representative research topics are concept learning, neuro-symbolic reasoning, scene understanding, and language acquisition.


Organizers

Jiayuan Mao
Kevin Ellis
Chuang Gan (MIT-IBM Watson AI Lab)
Jiajun Wu
Dan Gutfreund (MIT-IBM Watson AI Lab)
Josh Tenenbaum

References

[1] Ben Deen, Hilary Richardson, Daniel D. Dilks, Atsushi Takahashi, Boris Keil, Lawrence L. Wald, Nancy Kanwisher, and Rebecca Saxe. Organization of high-level visual cortex in human infants. Nature Communications, 8(1):1–10, 2017.
[2] Xuguang Duan, Qi Wu, Chuang Gan, Zhang Yiwei, Wenbing Huang, and Wenwu Zhu. Watch, reason and code: Learning to represent videos using program. In ACM Multimedia, 2019.
[3] Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Joshua B. Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a REPL. In NeurIPS, 2019.
[4] Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Joshua B. Tenenbaum. Learning to infer graphics programs from hand-drawn images. In NeurIPS, 2018.
[5] Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, and Boqing Gong. VQS: Linking segmentations to questions and answers for supervised attention in VQA and question-focused semantic segmentation. In ICCV, 2017.
[6] Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, and Jiajun Wu. Visual concept metaconcept learner. In NeurIPS, 2019.
[7] Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, and Antonio Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In ICLR, 2019.
[8] Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, and Russ Tedrake. Propagation networks for model-based control under partial observation. In ICRA, 2019.
[9] Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu. Learning to describe scenes with programs. In ICLR, 2019.
[10] Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. In ICLR, 2019.
[11] Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu. Program-guided image manipulators. In ICCV, 2019.
[12] Elizabeth S. Spelke and Katherine D. Kinzler. Core knowledge. Developmental Science, 10(1):89–96, 2007.
[13] Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu. Learning to infer and execute 3D shape programs. In ICLR, 2019.
[14] Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, and Wei-Ying Ma. Unified visual-semantic embeddings: Bridging vision and language with structured meaning representations. In CVPR, 2019.
[15] Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Joshua B. Tenenbaum. Neural-symbolic VQA: Disentangling reasoning from vision and language understanding. In NeurIPS, 2018.