Data Intelligence
Fourth of a set of industry talks at ISWC: "Data Intelligence". Today, a lot of research is inhibited because data cannot be made available to the research community (this has been one of my gripes for a while, so it's great to hear it from MS). The big issue, of course, is data privacy. Goldcorp challenge: a gold mining company provided all of its survey data to the public online, let challenge participants register and search for gold, and offered prizes. It was very successful in identifying good mining targets. In our field, innovation is inhibited by the inability to disseminate information due to privacy concerns. Many newsworthy privacy violations (cracking of anonymized search logs, anonymized health records, anonymized video ratings) discourage data release. She proposes a framework for specifying how data can be used, so that scientists can sign licenses governing the data they receive.

I'm still in favor of focusing instead on ways to let users release some subset of information about themselves unconditionally. I think that for most users, deciding which subset is unconditionally safe is a much easier job than deciding the restricted conditions (under arbitrary, unimaginable circumstances) under which all their data is safe.