Predictive Modelling in the Wild: Success Factors in Data Mining Competitions and Real-World Applications

Sep 14, 2009 | 9248 views

In this tutorial, we give our perspective on the keys to success in applying predictive modeling to competitions such as the KDD Cup and to real-life business intelligence projects. We argue that these two modes of applying predictive modeling share many similarities but also differ in some important ways. We discuss the main success factors in predictive modeling: domain understanding, statistical acumen, and appropriate algorithmic approaches. We describe our relevant experiences in the context of three recent predictive modeling competitions in which our team has had success (KDD Cup 2007 and 2008 and INFORMS DM challenge 2008) and two case studies of projects we have led at IBM Research. We also survey some of the recurring challenges and complexities in practical predictive modeling applications. One key issue is information leakage; we discuss its definition, influence, detection, and avoidance. We consider leakage to be the silent killer of many predictive modeling projects: we demonstrate its impact on the competitions and discuss the challenges of addressing it in real-life projects. Other challenges include translating real-life modeling objectives into predictive modeling problems and usefully applying relational learning concepts when modeling "real-life" complex, relational datasets.

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.