Online learning to rank for information retrieval (IR) aims to enable search systems to learn directly from interactions with their users. In our recent work, we explore formulations based on reinforc