Obtaining judgments from human raters is a vital part of designing search engine evaluations. Today, there is a discrepancy between judgment acquisition from raters (training phase) and use