In recent years, machine learning has seen the development of a series of algorithms for parameter learning that avoid estimating the partition function and instead, rely on accurate approximate MAP i