Kenneth Marino
I am a Research Scientist at DeepMind. Previously, I was a PhD student at Carnegie Mellon University's Machine Learning Department advised by Abhinav Gupta. During my PhD I was funded by the NDSEG and the NSF GRFP.
I completed my undergraduate education at Georgia Tech with a major in Computer Engineering and a minor in Computer Science.
Email / CV / Google Scholar / GitHub / Twitter
Research
My research interests include incorporating structured knowledge into end-to-end learning, CV+NLP, RL and general problems in deep learning.
Distilling Internet-Scale Vision-Language Models into Embodied Agents.
Theodore Sumers,
Kenneth Marino,
Arun Ahuja,
Rob Fergus,
Ishita Dasgupta
ICML 2023
We propose using pretrained VLMs to supervise embodied agents by combining ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior.
[Paper]
Learning to Navigate Wikipedia by Taking Random Walks.
Manzil Zaheer*,
Kenneth Marino*,
Will Grathwohl*,
John Schultz*,
Wendy Shang,
Sheila Babayan,
Arun Ahuja,
Ishita Dasgupta,
Christine Kaeser-Chen,
Rob Fergus
NeurIPS 2022
We learn to navigate Wikipedia links by behavioral cloning of randomly sampled trajectories, and demonstrate the approach on a graph version of Wikipedia with 38M nodes and 387M edges.
[Paper]
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge.
Dustin Schwenk,
Apoorv Khandelwal,
Christopher Clark,
Kenneth Marino,
Roozbeh Mottaghi
ECCV 2022
A-OKVQA is a crowdsourced dataset of about 25K diverse questions that require a broad base of commonsense and world knowledge to answer.
[Paper]
Collaborating with language models for embodied reasoning.
Ishita Dasgupta,
Christine Kaeser-Chen,
Kenneth Marino*,
Sheila Babayan,
Arun Ahuja,
Felix Hill,
Rob Fergus
LaReL: Language and Reinforcement Learning Workshop, NeurIPS 2022
We investigate how to combine the abilities of LLMs in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command.
[Paper]
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino,
Xinlei Chen,
Devi Parikh,
Abhinav Gupta,
Marcus Rohrbach
CVPR 2021
KRISP is a state-of-the-art method for knowledge-based VQA on OK-VQA that integrates implicit and symbolic knowledge.
[Paper]
Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning.
Valerie Chen,
Abhinav Gupta,
Kenneth Marino
ICLR 2021
Ask Your Humans is a dataset and experimental study of how human-generated instructions can be used to create interpretable and adaptable RL agents.
[Paper]
Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping.
Adithya Murali,
Wei Liu,
Kenneth Marino,
Sonia Chernova,
Abhinav Gupta
CoRL 2020
A new dataset for semantic grasping and a knowledge-graph-based method for task-oriented grasping.
[Paper]
Empirically Verifying Hypotheses Using Reinforcement Learning
Kenneth Marino,
Rob Fergus,
Arthur Szlam,
Abhinav Gupta
Under Review
[Paper]
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Kenneth Marino,
Mohammad Rastegari,
Ali Farhadi,
Roozbeh Mottaghi
CVPR 2019
OK-VQA is a new dataset for visual question answering that requires methods that can draw upon outside knowledge to answer questions.
[Paper][Website]
Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
Kenneth Marino,
Abhinav Gupta,
Rob Fergus,
Arthur Szlam
ICLR 2019
We introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks taking inspiration from ideas of periodicity and proprioception.
[Paper][Website][Code]
The Pose Knows: Video Forecasting by Generating Pose Futures
Jacob Walker,
Kenneth Marino,
Abhinav Gupta,
Martial Hebert
ICCV 2017
We introduce a new method for video forecasting that exploits human pose detectors as a free source of supervision, breaking the forecasting problem into high-level pose prediction followed by low-level video stream prediction.
[Paper][Website][Code]
The More You Know: Using Knowledge Graphs for Image Classification
Kenneth Marino,
Ruslan Salakhutdinov,
Abhinav Gupta
CVPR 2017
We introduce the Graph Search Neural Network as a way of efficiently incorporating large knowledge graphs into an end-to-end vision classification pipeline.
[Paper][Code]