Kenneth Marino

I am a Research Scientist at DeepMind. Previously, I was a PhD student at Carnegie Mellon University's Machine Learning Department, advised by Abhinav Gupta. During my PhD I was funded by the NDSEG and the NSF GRFP.

I completed my undergraduate education at Georgia Tech with a major in Computer Engineering and a minor in Computer Science.

Email  /  CV  /  Google Scholar  /  GitHub  /  Twitter

My research interests include incorporating structured knowledge into end-to-end learning, CV+NLP, RL and general problems in deep learning.

Distilling Internet-Scale Vision-Language Models into Embodied Agents.
Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, Ishita Dasgupta
ICML 2023

We propose using pretrained VLMs to supervise embodied agents by combining ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior.


Learning to Navigate Wikipedia by Taking Random Walks.
Manzil Zaheer*, Kenneth Marino*, Will Grathwohl*, John Schultz*, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus
NeurIPS 2022

We demonstrate Wikipedia link navigation using behavioral cloning of randomly sampled trajectories, on a graph version of Wikipedia with 38M nodes and 387M edges.


A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge.
Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi
ECCV 2022

A-OKVQA is a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer.


Collaborating with language models for embodied reasoning.
Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino*, Sheila Babayan, Arun Ahuja, Felix Hill, Rob Fergus
LaReL: Language and Reinforcement Learning Workshop, NeurIPS 2022

We investigate how to combine the abilities of LLMs in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command.


KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach
CVPR 2021

KRISP is a state-of-the-art method for knowledge-based VQA on OK-VQA that combines implicit and symbolic knowledge.


Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning.
Valerie Chen, Abhinav Gupta, Kenneth Marino
ICLR 2021

Ask Your Humans is a dataset and experimental paper looking at how we can use human-generated instructions to create interpretable and adaptable RL agents.


Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping.
Adithya Murali, Wei Liu, Kenneth Marino, Sonia Chernova, Abhinav Gupta
CoRL 2020

A new dataset for semantic grasping and a knowledge-graph-based method.


Empirically Verifying Hypotheses Using Reinforcement Learning
Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta
Under Review


OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
CVPR 2019

OK-VQA is a new dataset for visual question answering that requires methods which can draw upon outside knowledge to answer questions.


Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam
ICLR 2019

We introduce a simple, robust approach to hierarchically training an agent on sparse-reward tasks, taking inspiration from ideas of periodicity and proprioception.


The Pose Knows: Video Forecasting by Generating Pose Futures
Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert
ICCV 2017

We introduce a new method for video forecasting that exploits human pose detectors as a free source of supervision to break the forecasting problem into a high-level pose prediction and then a low-level video stream prediction.


The More You Know: Using Knowledge Graphs for Image Classification
Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta
CVPR 2017

We introduce the Graph Search Neural Network as a way of efficiently incorporating large knowledge graphs into an end-to-end vision classification pipeline.


Service & Teaching
MLD Ph.D. Admissions Committee (2019-2020)

MLD M.S. Admissions Committee (2016-2017)

CMU Graduate Student Assembly (2017-Present)

Refereed for CVPR, ICCV, NeurIPS and ECCV

Graduate Student Instructor, 16-824 Visual Learning and Recognition, Spring 2019

Graduate Student Instructor, 10-401 Introduction to Machine Learning, Spring 2018