Overcooked¶
Originally proposed by Carroll et al. (2019), the Overcooked-AI environments have become a standard in cooperative multi-agent reinforcement learning and human-AI interaction.
Variants¶
| Environment ID | Agents | Layout | Orders |
|---|---|---|---|
Overcooked-CrampedRoom-V0 |
2 | Cramped Room | No |
Overcooked-AsymmetricAdvantages-V0 |
2 | Asymmetric Advantages | No |
Overcooked-CoordinationRing-V0 |
2 | Coordination Ring | No |
Overcooked-ForcedCoordination-V0 |
2 | Forced Coordination | No |
Overcooked-CounterCircuit-V0 |
2 | Counter Circuit | No |
Overcooked-CrampedRoom-SingleAgent-V0 |
1 | Cramped Room | No |
Overcooked-CrampedMixedKitchen-V0 |
2 | Cramped Mixed Kitchen | Yes |
Objects¶
| Char | Name | Description |
|---|---|---|
O |
OnionStack | Infinite onion dispenser |
T |
TomatoStack | Infinite tomato dispenser |
o |
Onion | Individual onion (pickupable) |
t |
Tomato | Individual tomato (pickupable) |
U |
Pot | Cooking container (capacity 3, pickup requires plate) |
= |
PlateStack | Infinite plate dispenser |
P |
Plate | Individual plate (pickupable) |
S |
OnionSoup | Cooked onion soup (pickupable, deliverable) |
! |
TomatoSoup | Cooked tomato soup (pickupable, deliverable) |
@ |
DeliveryZone | Delivery target |
C |
Counter | Surface that holds one object |
Recipes¶
Recipes are defined in cogrid.envs.overcooked.recipes and declared on the Pot class:
from cogrid.core.objects.containers import Container
from cogrid.envs.overcooked.recipes import Recipe
@register_object_type("pot", scope="overcooked")
class Pot(GridObj):
container = Container(capacity=3, pickup_requires="plate")
recipes = [
Recipe(["onion", "onion", "onion"], result="onion_soup", cook_time=30, reward=1.0),
Recipe(["tomato", "tomato", "tomato"], result="tomato_soup", cook_time=30, reward=1.0),
]
Container is a core primitive (any object that holds items). Recipe is Overcooked-specific — it declares how ingredients combine into a result after cooking.
| Parameter | Type | Default | Description |
|---|---|---|---|
ingredients |
list[str] |
required | Object IDs of required ingredients (order does not matter). |
result |
str |
required | Object ID produced when cooking finishes. |
cook_time |
int |
0 |
Steps to cook once the container is full. |
reward |
float |
0.0 |
Reward value on delivery of the result. |
Pipeline: Pick up ingredient from stack -> place in pot (x3) -> pot cooks for 30 steps -> pick up plate from plate stack -> pick up soup from pot (requires plate) -> deliver at delivery zone.
The autowire system reads container and recipes and generates all interaction branches, tick handlers, extra state, and static tables.
Actions¶
All Overcooked variants use cardinal_actions:
| Index | Action | Overcooked Context |
|---|---|---|
| 0 | MoveUp | Move one cell up |
| 1 | MoveDown | Move one cell down |
| 2 | MoveLeft | Move one cell left |
| 3 | MoveRight | Move one cell right |
| 4 | PickupDrop | Pick up ingredient/plate/soup, place in pot, place on counter |
| 5 | Toggle | Activates button indicator (V2 Grounded Coordination only) |
| 6 | Noop | Do nothing |
Observations¶
Default feature set (Cramped Room config):
| Name | Per-Agent | Dim | Description |
|---|---|---|---|
agent_dir |
Yes | 4 | One-hot facing direction |
overcooked_inventory |
Yes | 5 | One-hot over 5 pickupable types |
next_to_counter |
Yes | 4 | Cardinal adjacency to counters |
next_to_pot |
Yes | 16 | Pot adjacency with contents/timer encoding |
object_type_masks |
No | 770 | Binary spatial masks for 10 object types |
ordered_pot_features |
Yes | 24 | Per-pot features in grid-scan order |
dist_to_other_players |
Yes | 2 | Delta vector to partner |
agent_position |
Yes | 2 | Grid coordinates |
can_move_direction |
Yes | 4 | Passable cardinal neighbors |
Rewards¶
| Class | Coefficient | Common | Trigger |
|---|---|---|---|
DeliveryReward |
20.0 | Yes | Deliver any recipe output to delivery zone |
OnionInPotReward |
3.0 | Yes | Place an onion into a pot with remaining capacity |
SoupInDishReward |
5.0 | Yes | Pick up finished soup from pot while holding a plate |
Code Example¶
import jax
import cogrid
env = cogrid.make("Overcooked-CrampedRoom-V0", backend="jax")
env.reset(seed=0)
n_agents = len(env.possible_agents)
n_actions = len(env.action_set)
def step_fn(carry, _):
state, key = carry
key, step_key, action_key = jax.random.split(key, 3)
actions = {i: jax.random.randint(jax.random.fold_in(action_key, i), (), 0, n_actions)
for i in range(n_agents)}
obs, state, rewards, terminated, truncated, info = env.jax_step(step_key, state, actions)
return (state, key), rewards
@jax.jit
def rollout(key):
key, reset_key = jax.random.split(key)
obs, state, info = env.jax_reset(reset_key)
(final_state, _), all_rewards = jax.lax.scan(
step_fn, (state, key), None, length=env.max_steps,
)
return all_rewards # {agent_id: (max_steps,)}
rewards = rollout(jax.random.key(0))
Stochastic Orders (Cramped Mixed Kitchen)¶
The Overcooked-CrampedMixedKitchen-V0 layout adds a stochastic order queue to the base Overcooked mechanics. Orders spawn randomly, count down, and expire with a penalty — similar to the Overcooked video game. This introduces a planning and coordination challenge beyond the standard fixed-recipe layouts: agents must identify which recipe is needed, coordinate to fulfill it before the timer runs out, and avoid delivering the wrong soup.
Order Queue¶
Orders are controlled by an order configuration:
order_config = {
"spawn_probs": {"onion_soup": 0.005, "tomato_soup": 0.005},
"max_active": 2,
"time_limit": 300,
}
| Parameter | Type | Description |
|---|---|---|
max_active |
int |
Maximum simultaneous orders. |
spawn_probs |
dict[str, float] |
Per-recipe spawn probability per step into each empty slot. Must sum to ≤ 1.0. |
time_limit |
int |
Steps before an order expires. |
At each step, each empty order slot is independently sampled — if the roll lands on a recipe, that order becomes active with a countdown timer set to time_limit. Active orders tick down by 1 each step and expire (clearing the slot) when the timer reaches zero.
Order Observation¶
The order_observation feature encodes active orders into the observation vector:
- Dimension:
max_active * (n_recipes + 1). - Per order slot: recipe one-hot (
n_recipes) + normalized time remaining (0.0–1.0). - Global: both agents see the same order state.
Inactive slots are all zeros. Add "order_observation" to the features list to enable.
Rewards¶
| Class | Coefficient | Common | Trigger |
|---|---|---|---|
OrderDeliveryReward |
20.0 | Yes | Deliver a soup matching an active order (+20 correct, -20 incorrect) |
OrderGatedIngredientInPotReward |
3.0 | No | Place ingredient in pot, gated on matching active order |
ExpiredOrderPenalty |
-20.0 | Yes | An active order expires |
Custom Order Configuration¶
Enable orders on any layout by adding the order config, tick function, and extra state initializer:
import functools
from cogrid.envs.overcooked.config import build_order_extra_state, build_order_tick
config["orders"] = order_config
config["tick_fn"] = build_order_tick(order_config, recipe_results=["onion_soup", "tomato_soup"])
config["extra_state_init_fn"] = functools.partial(build_order_extra_state, order_config)
Custom Ingredients¶
Register new ingredient + stack pairs at runtime with make_ingredient_and_stack(). Call before environment creation so interaction tables include the new types.
from cogrid.envs.overcooked.overcooked_grid_objects import make_ingredient_and_stack
from cogrid.core import constants
Mushroom, MushroomStack = make_ingredient_and_stack(
ingredient_name="mushroom",
ingredient_char="m",
ingredient_color=constants.Colors.Brown,
stack_name="mushroom_stack",
stack_char="M",
scope="overcooked",
)
The OvercookedInventory feature auto-adjusts -- it discovers all pickupable types from the registry at compose time.
Config Reference¶
Full Cramped Room config:
cramped_room_config = {
"name": "overcooked",
"num_agents": 2,
"action_set": "cardinal_actions",
"features": [
"agent_dir",
"overcooked_inventory",
"next_to_counter",
"next_to_pot",
"object_type_masks",
"ordered_pot_features",
"dist_to_other_players",
"agent_position",
"can_move_direction",
],
"rewards": [
DeliveryReward(coefficient=20.0, common_reward=True),
OnionInPotReward(coefficient=3.0, common_reward=True),
SoupInDishReward(coefficient=5.0, common_reward=True),
],
"grid": {"layout": "overcooked_cramped_room_v0"},
"max_steps": 1000,
"scope": "overcooked",
"pickupable_types": ["onion", "onion_soup", "plate", "tomato", "tomato_soup"],
}
Different layouts swap grid.layout. Different gameplay swaps rewards and features. The single-agent variant sets num_agents: 1.