Reinforcement learning objectives constrain the cognitive map

In this work, we detail a model of the cognitive map predicated on the assumption that spatial representations are optimized for maximizing reward in spatial tasks. We describe how this model gives rise to a number of experimentally observed behavioral and neural phenomena, including the neuronal populations known as place cells and grid cells. Place and grid cells are spatially tuned neurons found in the hippocampus and entorhinal cortex, respectively. Classic place cells have a single firing field tied to a specific location in space. These firing fields are sensitive to behaviorally relevant conditions in the environment: they tend to be skewed along commonly traveled directions, to cluster around rewarded locations, and to be shaped by the geometric structure of the environment. Grid cells exhibit multiple firing fields arranged periodically over space, and vary systematically in their scale, phase, and orientation. We hypothesize that place fields encode not just information about the current location, but also predictions about future locations under the current policy. Under this model, a variety of place field phenomena arise naturally from the arrangement of rewards and barriers and from directional biases reflected in the transition policy. Furthermore, we demonstrate that this representation of space can support efficient reinforcement learning (RL). We also propose that grid cells compute the eigendecomposition of place fields, one result of which is the segmentation of an enclosure along natural boundaries. When applied recursively, this segmentation can be used to discover a hierarchical decomposition of space, allowing grid cells to support the identification of subgoals for hierarchical RL. This suggests a substrate for the long-standing finding that humans tend to divide space hierarchically, resulting in systematic biases in how they judge relations between locations in different regions.
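The two central ideas above, a predictive code for location and its eigendecomposition, can be illustrated with a small numerical sketch. The abstract does not name a specific formalism, so this example assumes one common choice: the successor representation, M = (I - γT)^(-1), where T is the state transition matrix under the current policy and γ is a discount factor. The two-room graph, the random-walk policy, the value γ = 0.9, and all function and variable names are hypothetical and chosen only for illustration; they are not taken from the work itself.

```python
import numpy as np

def two_room_transitions(room_size=4):
    """Random-walk transition matrix for two fully connected 'rooms'
    joined by a single doorway edge (an illustrative toy environment)."""
    n = 2 * room_size
    A = np.zeros((n, n))
    A[:room_size, :room_size] = 1.0          # room A: all states adjacent
    A[room_size:, room_size:] = 1.0          # room B: all states adjacent
    np.fill_diagonal(A, 0.0)                 # no self-transitions
    # Doorway between the last state of room A and the first of room B.
    A[room_size - 1, room_size] = 1.0
    A[room_size, room_size - 1] = 1.0
    return A / A.sum(axis=1, keepdims=True)  # uniform random-walk policy

def successor_representation(T, gamma):
    """Expected discounted future occupancy of every state from every state."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

gamma = 0.9                                  # assumed discount factor
T = two_room_transitions()
M = successor_representation(T, gamma)

# Column j of M plays the role of a predictive "place field" for state j.
# Values under the policy follow from a single matrix-vector product.
reward = np.zeros(T.shape[0])
reward[7] = 1.0                              # reward at one state in room B
V = M @ reward

# Eigendecomposition of M. The eigenvector with the second-largest
# eigenvalue changes sign at the doorway, splitting the space into rooms.
eigvals, eigvecs = np.linalg.eig(M)
order = np.argsort(-eigvals.real)
second = eigvecs[:, order[1]].real
room_labels = second > 0

print("values:", np.round(V, 3))
print("room labels:", room_labels.astype(int))
```

In this toy setting, thresholding the second eigenvector at zero separates the two rooms; which room receives which label depends on the arbitrary sign of the eigenvector. Repeating the same split within each room would be one way to sketch the recursive, hierarchical decomposition described above.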