COOPERA: Continual Open-Ended Human-Robot Assistance

Preprint


1 University of Oxford
2 FAIR, Meta


TL;DR: We introduce COOPERA, a novel framework for continual, open-ended human-robot assistance. COOPERA includes (a) simulated humans driven by psychological traits and long-term intentions, (b) continuous human feedback, and (c) a benchmark and approach to personalize the robot's collaborative actions.

Abstract

To understand and collaborate with humans, robots must account for individual human traits, habits, and activities over time. However, most robotic assistants lack these abilities, as they primarily focus on predefined tasks in structured environments and lack a human model to learn from. This work introduces COOPERA, a novel framework for COntinual, OPen-Ended human-Robot Assistance, where simulated humans, driven by psychological traits and long-term intentions, interact with robots in complex environments. By integrating continuous human feedback, our framework, for the first time, enables the study of long-term, open-ended human-robot collaboration (HRC) in different collaborative tasks across various time-scales. Within COOPERA, we introduce a benchmark and an approach to personalize the robot's collaborative actions by learning human traits and context-dependent intents. Experiments validate the realism of our simulated humans and demonstrate the value of inferring and personalizing to human intents for open-ended and long-term HRC.

Framework Overview

Within COOPERA, the LLM-powered human proposes whole-day intentions and tasks, which are executed in the environment. As the robot observes the human's actions, it predicts a set of tasks to assist them. After each day, the human provides feedback to the robot, enabling it to improve on subsequent days.
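To make this loop concrete, below is a minimal Python sketch of one simulated day. All class names, methods, and example tasks (SimulatedHuman, AssistiveRobot, "toast bread", etc.) are hypothetical stand-ins for illustration, not the released COOPERA API.

# Minimal sketch of the daily loop described above; all names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class SimulatedHuman:
    profile: dict                                # psychological traits, habits
    history: list = field(default_factory=list)  # past intentions/tasks

    def propose_day(self):
        # The human-LLM proposes whole-day intentions, each decomposed into tasks.
        return [{"intention": "prepare breakfast",
                 "tasks": ["boil water", "toast bread"]}]

    def give_feedback(self, robot_tasks):
        # End-of-day feedback: which of the robot's tasks were actually helpful.
        return {t: (t == "toast bread") for t in robot_tasks}

@dataclass
class AssistiveRobot:
    memory: list = field(default_factory=list)   # collaboration history

    def predict_assist_tasks(self, observed_actions):
        # Predict a set of tasks to assist, given the observed human actions.
        return ["toast bread", "wash dishes"]

    def improve(self, feedback):
        # Use feedback to improve on subsequent days (prompting / retraining).
        self.memory.append(feedback)

def run_day(human, robot):
    plan = human.propose_day()                               # human: intentions + tasks
    observed = [t for step in plan for t in step["tasks"]]   # executed in the environment
    robot_tasks = robot.predict_assist_tasks(observed)       # robot proposes assistance
    feedback = human.give_feedback(robot_tasks)              # end-of-day feedback
    robot.improve(feedback)                                  # robot improves
    return robot_tasks, feedback

if __name__ == "__main__":
    human = SimulatedHuman(profile={"trait": "tidy", "routine": "early riser"})
    robot = AssistiveRobot()
    for day in range(3):
        print(f"day {day}:", run_day(human, robot))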

Human Simulation

The human-LLM is seeded with an extended profile. At each time of day, the human proposes an intention and decomposes it into tasks, consistent with the profile's traits and temporally dependent on the history of past intentions and tasks. LLM inputs are optimized via Memory Retrieval and Search, and robustness is enhanced through two rounds of Reflexion. This pipeline generates continuous, whole-day intentions and tasks that are executed in the environment with expressive whole-body motion.
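A rough sketch of one human-LLM step under this pipeline is given below, assuming a generic call_llm text-generation function. The retrieval heuristic, prompt fields, and the two-round critique loop are illustrative stand-ins, not the exact templates or retrieval mechanism used in COOPERA.

# Hypothetical sketch of one human-LLM step: retrieve relevant history,
# propose an intention and its tasks, then refine with two Reflexion-style
# self-critique rounds. `call_llm` is a stand-in for any text-generation call.
def retrieve_memory(history, time_of_day, k=5):
    # Prefer entries from the same time of day (a stand-in for real retrieval/search).
    scored = sorted(history, key=lambda h: h.get("time") == time_of_day, reverse=True)
    return scored[:k]

def propose_step(call_llm, profile, history, time_of_day, reflexion_rounds=2):
    memory = retrieve_memory(history, time_of_day)
    prompt = (
        f"Profile: {profile}\n"
        f"Relevant history: {memory}\n"
        f"Time of day: {time_of_day}\n"
        "Propose one intention consistent with the profile and history, "
        "then decompose it into concrete tasks."
    )
    plan = call_llm(prompt)
    for _ in range(reflexion_rounds):
        critique = call_llm(
            f"Critique this plan for consistency with the profile and history:\n{plan}"
        )
        plan = call_llm(f"Revise the plan to address the critique:\n{critique}\nPlan:\n{plan}")
    return plan

if __name__ == "__main__":
    # Toy run with an echo "LLM", just to show the data flow.
    echo_llm = lambda prompt: prompt.splitlines()[-1]
    print(propose_step(echo_llm, {"trait": "tidy"}, [], "morning"))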

Building an Assistive Agent

Our approach decouples robot task inference into two stages: it first infers intentions, then identifies specific tasks. By chaining a VLM and a classifier, the robot progressively filters and selects the tasks that best correlate with the human's traits and temporal context. The robot maintains a human profile inferred from the collaboration history. This profile, combined with human feedback, is used to optimize the robot-VLM through prompting and the classifiers through supervised learning.
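The sketch below illustrates this two-stage decoupling with generic stand-ins: any vision-language-model call for intention inference and an ordinary supervised classifier (scikit-learn logistic regression here) for task selection. The feature encoding, profile updates, and toy data are hypothetical and only meant to show the structure, not the paper's implementation.

# Illustrative two-stage assistive agent: a VLM-style call infers the intention,
# then a supervised classifier scores candidate tasks. All names are stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

class AssistiveAgent:
    def __init__(self, vlm, candidate_tasks, feature_fn):
        self.vlm = vlm                          # stage 1: intention inference
        self.task_clf = LogisticRegression()    # stage 2: task selection
        self.candidate_tasks = candidate_tasks
        self.feature_fn = feature_fn            # (intention, task, profile) -> vector
        self.profile = {}                       # inferred human profile

    def infer_tasks(self, observation, top_k=1):
        intention = self.vlm(observation, self.profile)          # stage 1
        feats = np.stack([self.feature_fn(intention, t, self.profile)
                          for t in self.candidate_tasks])
        scores = self.task_clf.predict_proba(feats)[:, 1]        # stage 2
        best = np.argsort(-scores)[:top_k]
        return [self.candidate_tasks[i] for i in best]

    def learn_from_feedback(self, logged_features, helpful_labels, profile_update):
        # Supervised update of the task classifier from daily feedback, plus
        # refinement of the tracked profile (which re-conditions the VLM prompt).
        self.task_clf.fit(logged_features, helpful_labels)
        self.profile.update(profile_update)

if __name__ == "__main__":
    vlm = lambda obs, profile: "prepare breakfast"               # stand-in VLM
    feat = lambda intention, task, profile: np.array(
        [len(intention), len(task), float(task in profile.get("liked", []))])
    agent = AssistiveAgent(vlm, ["toast bread", "water plants"], feat)
    # Bootstrap the classifier with a tiny batch of logged (features, helpful) pairs.
    X = np.array([[17.0, 11.0, 1.0], [17.0, 12.0, 0.0]])
    y = np.array([1, 0])
    agent.learn_from_feedback(X, y, {"liked": ["toast bread"]})
    print(agent.infer_tasks("human is cooking in the kitchen"))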

Qualitative Results

Simulated Humans

Human-Robot Collaboration

BibTeX

@article{ma2025coopera,
  title={COOPERA: Continual Open-Ended Human-Robot Assistance},
  author={Ma, Chenyang and Lu, Kai and Desai, Ruta and Puig, Xavier and Markham, Andrew and Trigoni, Niki},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2025}
}