Rewards
Rewards are modular components that compute per-agent reward values each step. Multiple reward instances are listed in the config and summed by the step pipeline.
Reward Base Class
from cogrid.core.pipeline.rewards import Reward

class MyReward(Reward):
    def compute(self, prev_state, state, actions, reward_config):
        coefficient = self.config.get("coefficient", 1.0)
        # prev_state, state: StateView objects (pre- and post-step)
        # actions: (n_agents,) int32 array
        # reward_config: dict with type_ids, action indices, static_tables, etc.
        rewards = ...
        return rewards  # (n_agents,) float32
Reward
Base class for reward functions.
Subclasses define compute() which receives StateView objects and returns (n_agents,) float32 reward arrays. The returned values are the final rewards -- apply any scaling or broadcasting inside compute().
Every reward has a coefficient that controls its magnitude.
At runtime, coefficients are stored as a dynamic array in
EnvState.extra_state["reward_coefficients"] (accessible as
state.reward_coefficients in compute). This allows coefficient
updates without re-JIT on the JAX backend.
_reward_index is assigned by build_reward_config() and maps
this instance to its position in the coefficients array.
Usage::

    class DeliveryReward(Reward):
        def compute(self, prev_state, state, actions, reward_config):
            coefficient = state.reward_coefficients[self._reward_index]
            ...
            return rewards  # (n_agents,) float32

    config = {
        "rewards": [DeliveryReward(coefficient=1.0, common_reward=True)],
    }
Store config kwargs for use in compute().
get_coefficient
Read the dynamic coefficient from state, falling back to self.coefficient.
The coefficients array lives in state.reward_coefficients
(populated from EnvState.extra_state). If absent (e.g. in
unit tests that build states manually), falls back to the value
set at init time.
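The fallback described above can be sketched as follows. This is an illustration only, not the library's code: `get_coefficient_sketch` is a hypothetical helper, and it assumes that a hand-built state simply lacks the `reward_coefficients` attribute (or carries `None`).

```python
from types import SimpleNamespace

import numpy as np


def get_coefficient_sketch(state, reward_index, default):
    """Return the dynamic coefficient if the state carries one, else the default."""
    # Assumption: a missing array appears as an absent attribute or as None.
    coefficients = getattr(state, "reward_coefficients", None)
    if coefficients is None or reward_index is None:
        return default
    return coefficients[reward_index]


# A state that carries the dynamic coefficients array.
live_state = SimpleNamespace(
    reward_coefficients=np.array([1.0, 0.1, 0.3], dtype=np.float32)
)
# A hand-built state, e.g. in a unit test, with no coefficients attached.
bare_state = SimpleNamespace()

dynamic = get_coefficient_sketch(live_state, reward_index=1, default=5.0)
fallback = get_coefficient_sketch(bare_state, reward_index=1, default=5.0)
```

Here `dynamic` is read from the array at index 1, while `fallback` is the init-time value because the bare state has no array.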
compute
compute(prev_state: Any, state: Any, actions: ArrayLike, reward_config: dict[str, Any]) -> ArrayLike
Compute and return (n_agents,) float32 reward array.
Subclasses must override.
InteractionReward
For rewards triggered by agent-object interactions, use the declarative InteractionReward base:
from cogrid.core.pipeline.rewards import InteractionReward

class OnionInPotReward(InteractionReward):
    action = "pickup_drop"  # "pickup_drop", "toggle", or None
    holds = "onion"         # agent must hold this type
    faces = "pot"           # forward cell must contain this type
Class attributes (declare trigger conditions):
| Attribute | Type | Description |
|---|---|---|
| action | str \| None | Required. "pickup_drop", "toggle", or None (no action filter). |
| holds | str \| None | Object type the agent must hold. |
| faces | str \| None | Object type in the agent's forward cell. |
| overlaps | str \| None | Object type at the agent's current position. |
| direction | int \| None | Direction the agent must face (0=Right, 1=Down, 2=Left, 3=Up). |
Instance config (passed via __init__ kwargs):
| Parameter | Default | Description |
|---|---|---|
| coefficient | 1.0 | Scalar multiplier on the reward. |
| common_reward | False | If True, all agents receive the reward when any agent triggers it. |
The engine checks each declared condition in sequence, building a boolean mask over agents. Agents that match all conditions receive a reward equal to coefficient. When common_reward=True, the reward is instead broadcast to all agents.
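The mask logic can be illustrated with plain NumPy. This is a rough sketch, not the engine's implementation: the per-agent condition arrays and `rewards_from_mask` are hypothetical. How several simultaneous triggers combine under `common_reward` is not stated above, so the sketch assumes the reward is paid once if any agent triggers; with a single triggering agent, as here, that assumption does not affect the result.

```python
import numpy as np

coefficient = 0.5

# Hypothetical per-agent condition results for one step (3 agents).
did_action = np.array([True, True, False])   # took the required action
holds_match = np.array([True, False, True])  # held the required object type
faces_match = np.array([True, True, True])   # forward cell had the required type

# An agent must satisfy every condition, so the masks are AND-ed together.
mask = did_action & holds_match & faces_match


def rewards_from_mask(mask, coefficient, common_reward):
    """Turn a boolean agent mask into an (n_agents,) float32 reward array."""
    if common_reward:
        # Assumption: everyone gets the coefficient once if any agent triggered.
        value = coefficient if mask.any() else 0.0
        return np.full(mask.shape, value, dtype=np.float32)
    # Individual reward: only the agents in the mask are paid.
    return mask.astype(np.float32) * np.float32(coefficient)


individual = rewards_from_mask(mask, coefficient, common_reward=False)
common = rewards_from_mask(mask, coefficient, common_reward=True)
```

Only agent 0 passes all three checks, so `individual` pays that agent alone, while `common` pays every agent the same amount.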
extra_condition()
Override extra_condition() for domain-specific checks beyond the declarative attributes:
class OnionInPotReward(InteractionReward):
    action = "pickup_drop"
    holds = "onion"
    faces = "pot"

    def extra_condition(self, mask, prev_state, fwd_r, fwd_c, reward_config):
        # Only reward if the pot has remaining capacity
        pot_contents = prev_state.pot_contents
        ...
        return mask & has_capacity
InteractionReward
Declarative base for condition-triggered rewards.
Class attributes (declare what triggers the reward):
    action: "pickup_drop", "toggle", or None (any/no action check). Subclasses MUST set this explicitly -- there is no default.
    holds: str type name agent must hold, or None
    faces: str type name agent must face (forward cell), or None
    overlaps: str type name agent must stand on, or None
    direction: int direction agent must face (0=R,1=D,2=L,3=U), or None
Instance config (runtime tuning via init kwargs):
    coefficient: float scaling factor (default 1.0)
    common_reward: bool broadcast to all agents (default False)
Override extra_condition() for domain-specific checks beyond the standard conditions (pot capacity, timers, etc.).
Examples::

    class OnionInPotReward(InteractionReward):
        action = "pickup_drop"
        holds = "onion"
        faces = "pot"

    class GoalReward(InteractionReward):
        action = None
        overlaps = "goal"
compute
compute(prev_state: Any, state: Any, actions: ArrayLike, reward_config: dict[str, Any]) -> ArrayLike
Compute (n_agents,) float32 rewards from declarative conditions.
extra_condition
extra_condition(mask: ArrayLike, prev_state: Any, fwd_r: ArrayLike | None, fwd_c: ArrayLike | None, reward_config: dict[str, Any]) -> ArrayLike
Override to add conditions beyond the declarative attributes.
Return narrowed boolean mask.
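What "narrowed" means can be shown with a small NumPy sketch. `has_capacity` is a made-up per-agent array here; in a real subclass it would be derived from prev_state (for example from the pot at the agent's forward cell), and `narrow` is a hypothetical stand-in for the body of an override.

```python
import numpy as np


def narrow(mask, has_capacity):
    """AND the extra check into the incoming mask so it can only shrink."""
    return mask & has_capacity


# Incoming mask: agents 0 and 1 passed the declarative checks.
mask = np.array([True, True, False])
# Hypothetical domain check: the pot faced by agent 1 is already full.
has_capacity = np.array([True, False, True])

narrowed = narrow(mask, has_capacity)
```

Returning `has_capacity` on its own would re-enable agent 2, which failed the declarative checks; combining it with the incoming mask keeps that agent excluded.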
Composition
Rewards are listed in the config and computed each step:
from cogrid.envs.overcooked.rewards import DeliveryReward, OnionInPotReward, SoupInDishReward

config = {
    "rewards": [
        DeliveryReward(coefficient=1.0, common_reward=True),
        OnionInPotReward(coefficient=0.1, common_reward=False),
        SoupInDishReward(coefficient=0.3, common_reward=False),
    ],
    ...
}
Each step, the engine calls compute() on every reward instance and sums the results into a single (n_agents,) float32 array.
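The summation step might look roughly like the loop below. It is a sketch under stated assumptions, not the engine's code: `ConstantReward` and `total_rewards` are hypothetical stand-ins, so the block runs without cogrid installed.

```python
import numpy as np


class ConstantReward:
    """Hypothetical stand-in that returns a fixed per-agent reward array."""

    def __init__(self, per_agent):
        self.per_agent = np.asarray(per_agent, dtype=np.float32)

    def compute(self, prev_state, state, actions, reward_config):
        return self.per_agent


def total_rewards(rewards, prev_state, state, actions, reward_config, n_agents):
    """Call compute() on every reward and add the results element-wise."""
    total = np.zeros(n_agents, dtype=np.float32)
    for reward in rewards:
        total = total + reward.compute(prev_state, state, actions, reward_config)
    return total


rewards = [ConstantReward([1.0, 1.0]), ConstantReward([0.5, 0.0])]
actions = np.zeros(2, dtype=np.int32)
summed = total_rewards(rewards, None, None, actions, {}, n_agents=2)
```

Because each `compute()` returns an array of the same (n_agents,) shape, the sum is element-wise and each agent ends up with the total of its own entries.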
Writing a Custom Reward
from cogrid.core.pipeline.rewards import InteractionReward

class GoalReward(InteractionReward):
    action = None       # no specific action required
    overlaps = "goal"   # agent stands on a goal cell

# Usage in config:
config = {
    "rewards": [GoalReward(coefficient=10.0, common_reward=False)],
    ...
}
For rewards that do not fit the interaction pattern, subclass Reward directly and implement compute().
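As a rough illustration of a direct subclass, the sketch below rewards agents whose position changed during the step. Several parts are assumptions made for the example: the `Reward` class defined here is a minimal stand-in so the block runs without cogrid (real code would import it from cogrid.core.pipeline.rewards), `MovedReward` is hypothetical, and `agent_pos` is an assumed state field of shape (n_agents, 2).

```python
from types import SimpleNamespace

import numpy as np


class Reward:
    """Stand-in for the library's Reward base class (assumption, not the real one)."""

    def __init__(self, coefficient=1.0, **kwargs):
        self.coefficient = coefficient
        self.config = {"coefficient": coefficient, **kwargs}


class MovedReward(Reward):
    """Hypothetical reward: pay agents whose position changed this step."""

    def compute(self, prev_state, state, actions, reward_config):
        # agent_pos is an assumed (n_agents, 2) integer array of (row, col).
        moved = np.any(prev_state.agent_pos != state.agent_pos, axis=-1)
        return moved.astype(np.float32) * np.float32(self.coefficient)


prev_state = SimpleNamespace(agent_pos=np.array([[0, 0], [1, 1]]))
state = SimpleNamespace(agent_pos=np.array([[0, 1], [1, 1]]))
actions = np.zeros(2, dtype=np.int32)

step_rewards = MovedReward(coefficient=0.25).compute(prev_state, state, actions, {})
```

Agent 0 moved one column and agent 1 stayed put, so only agent 0 is paid. The comparison uses both prev_state and state, which a purely declarative InteractionReward does not express.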