Rewards

Rewards are modular components that compute per-agent reward values each step. List one or more reward instances in the config; the step pipeline sums their outputs.

Reward Base Class

from cogrid.core.pipeline.rewards import Reward

class MyReward(Reward):
    def compute(self, prev_state, state, actions, reward_config):
        coefficient = self.get_coefficient(state)  # dynamic value; falls back to the init-time coefficient
        # prev_state, state: StateView objects (pre- and post-step)
        # actions: (n_agents,) int32 array
        # reward_config: dict with type_ids, action indices, static_tables, etc.
        rewards = ...
        return rewards  # (n_agents,) float32
Reward

Reward(coefficient: float = 1.0, **kwargs: Any)

Base class for reward functions.

Subclasses define compute() which receives StateView objects and returns (n_agents,) float32 reward arrays. The returned values are the final rewards -- apply any scaling or broadcasting inside compute().
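
As a rough illustration of the scaling and broadcasting that compute() is expected to handle itself, here is a minimal NumPy sketch. The helper name scale_and_broadcast and the reading of broadcasting as "all agents are rewarded if any agent triggers" are assumptions made for this example, not part of the cogrid API.

```python
import numpy as np


def scale_and_broadcast(triggered, coefficient, common_reward=False):
    """Turn a per-agent trigger mask into final (n_agents,) float32 rewards.

    Hypothetical helper for illustration only; not a cogrid function.
    """
    triggered = np.asarray(triggered, dtype=bool)
    if common_reward:
        # Assumed semantics: if any agent triggered, every agent is rewarded.
        triggered = np.full(triggered.shape, triggered.any())
    # Scaling happens here, so the caller receives final reward values.
    return triggered.astype(np.float32) * np.float32(coefficient)


individual = scale_and_broadcast([True, False], coefficient=0.5)
shared = scale_and_broadcast([True, False], coefficient=0.5, common_reward=True)
```

In this sketch, individual rewards only the first agent, while shared gives both agents the same value.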

Every reward has a coefficient that controls its magnitude. At runtime, coefficients are stored as a dynamic array in EnvState.extra_state["reward_coefficients"] (accessible as state.reward_coefficients in compute). This allows coefficients to be updated without triggering re-JIT compilation on the JAX backend.

_reward_index is assigned by build_reward_config() and maps this instance to its position in the coefficients array.

Usage:

class DeliveryReward(Reward):
    def compute(self, prev_state, state, actions, reward_config):
        coefficient = state.reward_coefficients[self._reward_index]
        ...
        return rewards  # (n_agents,) float32


config = {
    "rewards": [DeliveryReward(coefficient=1.0, common_reward=True)],
}

The constructor stores config kwargs for use in compute().

get_coefficient
get_coefficient(state: Any) -> float

Read the dynamic coefficient from state, falling back to self.coefficient.

The coefficients array lives in state.reward_coefficients (populated from EnvState.extra_state). If absent (e.g. in unit tests that build states manually), falls back to the value set at init time.
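
The fallback described above could look roughly like the following. This is a sketch under stated assumptions, not cogrid's implementation: RewardSketch is a hypothetical stand-in class, and using getattr to detect a missing reward_coefficients array is a guess at how the absence check might be done.

```python
from types import SimpleNamespace

import numpy as np


class RewardSketch:
    """Hypothetical stand-in that mirrors the documented fallback behavior."""

    def __init__(self, coefficient=1.0, **kwargs):
        self.coefficient = coefficient
        self.config = dict(kwargs)
        self._reward_index = None  # in cogrid, assigned by build_reward_config()

    def get_coefficient(self, state):
        # Assumption: a state without the array simply lacks the attribute.
        coefficients = getattr(state, "reward_coefficients", None)
        if coefficients is None or self._reward_index is None:
            return self.coefficient  # value set at init time
        return coefficients[self._reward_index]


reward = RewardSketch(coefficient=0.3)
reward._reward_index = 1  # set by hand here, purely for the example

manual_state = SimpleNamespace()  # e.g. a state built by hand in a unit test
runtime_state = SimpleNamespace(
    reward_coefficients=np.array([1.0, 0.05], dtype=np.float32)
)

fallback_value = reward.get_coefficient(manual_state)
dynamic_value = reward.get_coefficient(runtime_state)
```

Here fallback_value is the init-time coefficient, and dynamic_value is read from position _reward_index of the array.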

compute
compute(prev_state: Any, state: Any, actions: ArrayLike, reward_config: dict[str, Any]) -> ArrayLike

Compute and return (n_agents,) float32 reward array.

Subclasses must override.

InteractionReward

For rewards triggered by agent-object interactions, use the declarative InteractionReward base:

from cogrid.core.pipeline.rewards import InteractionReward

class OnionInPotReward(InteractionReward):
    action = "pickup_drop"   # "pickup_drop", "toggle", or None
    holds = "onion"          # agent must hold this type
    faces = "pot"            # forward cell must contain this type

Class attributes (declare trigger conditions):

Attribute   Type         Description
---------   ----------   -----------
action      str | None   Required. "pickup_drop", "toggle", or None (no action filter).
holds       str | None   Object type the agent must hold.
faces       str | None   Object type in the agent's forward cell.
overlaps    str | None   Object type at the agent's current position.
direction   int | None   Direction the agent must face (0=Right, 1=Down, 2=Left, 3=Up).

Instance config (passed via __init__ kwargs):

Parameter       Default   Description
-------------   -------   -----------
coefficient     1.0       Scalar multiplier on the reward.
common_reward   False     If True, all agents receive the reward when any agent triggers it.

The engine checks each condition in sequence, building a boolean mask over agents. Each agent that matches all conditions receives a reward equal to coefficient. When common_reward=True, the reward is broadcast to all agents.
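
The sequential mask building can be pictured with plain NumPy arrays. Everything in this sketch is hypothetical: the integer ids, the array names held_type and forward_type, and the three-agent data are invented for illustration and do not reflect how cogrid stores state.

```python
import numpy as np

# Hypothetical action and object-type ids (not cogrid's real values).
PICKUP_DROP, TOGGLE = 4, 5
EMPTY, ONION, POT = 0, 2, 3

# Made-up per-agent data for three agents.
actions = np.array([PICKUP_DROP, PICKUP_DROP, TOGGLE], dtype=np.int32)
held_type = np.array([ONION, EMPTY, ONION], dtype=np.int32)
forward_type = np.array([POT, POT, POT], dtype=np.int32)

# Start with every agent eligible, then narrow the mask one condition at a time.
mask = np.ones(actions.shape, dtype=bool)
mask &= actions == PICKUP_DROP   # action = "pickup_drop"
mask &= held_type == ONION       # holds = "onion"
mask &= forward_type == POT      # faces = "pot"

# Agents that pass every check receive the coefficient; the rest receive zero.
coefficient = np.float32(0.1)
rewards = np.where(mask, coefficient, np.float32(0.0)).astype(np.float32)
```

Only the first agent satisfies all three conditions, so only its entry in rewards is nonzero.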

extra_condition()

Override extra_condition() for domain-specific checks beyond the declarative attributes:

class OnionInPotReward(InteractionReward):
    action = "pickup_drop"
    holds = "onion"
    faces = "pot"

    def extra_condition(self, mask, prev_state, fwd_r, fwd_c, reward_config):
        # Only reward if the pot has remaining capacity
        pot_contents = prev_state.pot_contents
        ...
        return mask & has_capacity
InteractionReward

InteractionReward(coefficient: float = 1.0, **kwargs: Any)

Declarative base for condition-triggered rewards.

Class attributes (declare what triggers the reward):

    action: "pickup_drop", "toggle", or None (any/no action check). Subclasses MUST set this explicitly -- there is no default.
    holds: str type name agent must hold, or None
    faces: str type name agent must face (forward cell), or None
    overlaps: str type name agent must stand on, or None
    direction: int direction agent must face (0=R,1=D,2=L,3=U), or None

Instance config (runtime tuning via init kwargs):

    coefficient: float scaling factor (default 1.0)
    common_reward: bool broadcast to all agents (default False)

Override extra_condition() for domain-specific checks beyond the standard conditions (pot capacity, timers, etc.).

Examples:

class OnionInPotReward(InteractionReward):
    action = "pickup_drop"
    holds = "onion"
    faces = "pot"


class GoalReward(InteractionReward):
    action = None
    overlaps = "goal"
compute
compute(prev_state: Any, state: Any, actions: ArrayLike, reward_config: dict[str, Any]) -> ArrayLike

Compute (n_agents,) float32 rewards from declarative conditions.

extra_condition
extra_condition(mask: ArrayLike, prev_state: Any, fwd_r: ArrayLike | None, fwd_c: ArrayLike | None, reward_config: dict[str, Any]) -> ArrayLike

Override to add conditions beyond the declarative attributes.

Return narrowed boolean mask.

Composition

Rewards are listed in the config and computed each step:

from cogrid.envs.overcooked.rewards import DeliveryReward, OnionInPotReward, SoupInDishReward

config = {
    "rewards": [
        DeliveryReward(coefficient=1.0, common_reward=True),
        OnionInPotReward(coefficient=0.1, common_reward=False),
        SoupInDishReward(coefficient=0.3, common_reward=False),
    ],
    ...
}

Each step, the engine calls compute() on every reward instance and sums the results into a single (n_agents,) float32 array.
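
Conceptually, the summation amounts to the loop below. This is only a sketch of the idea, not the engine's code: ConstantReward and sum_rewards are hypothetical names, and the real pipeline may accumulate results differently (for example, inside a JIT-compiled function).

```python
import numpy as np


class ConstantReward:
    """Hypothetical stand-in exposing the documented compute() signature."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=np.float32)

    def compute(self, prev_state, state, actions, reward_config):
        return self.values


def sum_rewards(rewards, prev_state, state, actions, reward_config):
    """Call compute() on each reward and add the results element-wise."""
    total = np.zeros(actions.shape[0], dtype=np.float32)
    for reward in rewards:
        total = total + reward.compute(prev_state, state, actions, reward_config)
    return total


actions = np.zeros(2, dtype=np.int32)  # two agents
reward_list = [ConstantReward([1.0, 1.0]), ConstantReward([0.1, 0.0])]
total = sum_rewards(reward_list, None, None, actions, {})
```

Each entry of total is the sum of that agent's values across all reward instances.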

Writing a Custom Reward

from cogrid.core.pipeline.rewards import InteractionReward

class GoalReward(InteractionReward):
    action = None          # no specific action required
    overlaps = "goal"      # agent stands on a goal cell

# Usage in config:
config = {
    "rewards": [GoalReward(coefficient=10.0, common_reward=False)],
    ...
}

For rewards that do not fit the interaction pattern, subclass Reward directly and implement compute().
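
A direct subclass might look like the sketch below, which charges every agent a small penalty each step. StepPenalty is a hypothetical example, and the Reward class defined here is a minimal stand-in so the snippet runs without cogrid installed; in a real project you would import Reward from cogrid.core.pipeline.rewards instead.

```python
import numpy as np


class Reward:
    """Minimal stand-in for cogrid's Reward base class (assumption, for illustration)."""

    def __init__(self, coefficient=1.0, **kwargs):
        self.coefficient = coefficient
        self.config = dict(kwargs)

    def get_coefficient(self, state):
        # The documented method reads state.reward_coefficients first;
        # this stand-in only implements the init-time fallback.
        return self.coefficient


class StepPenalty(Reward):
    """Hypothetical reward: every agent pays `coefficient` on every step."""

    def compute(self, prev_state, state, actions, reward_config):
        n_agents = np.asarray(actions).shape[0]
        penalty = -float(self.get_coefficient(state))
        return np.full(n_agents, penalty, dtype=np.float32)


step_penalty = StepPenalty(coefficient=0.01)
penalties = step_penalty.compute(None, None, np.zeros(2, dtype=np.int32), {})
```

Because this reward ignores interactions entirely, it has no use for the declarative attributes of InteractionReward, which is the kind of case where subclassing Reward directly is simpler.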