Identifying Good Patterns for Relation Extraction

In pattern-based relation extraction, the preferred patterns are those that yield semantically useful relations with high precision and recall. We present a technique, similar to n-gram extraction, that extracts patterns from large text corpora and computes statistics such as frequency, minimal token frequency, and normalized expectation, which help identify preferred patterns. A pattern takes named instances and/or one variable-length gap as arguments. We extracted patterns from a large news corpus and translated them into Cyc relations. We focused on four patterns, which we evaluated by asserting their translated relations into the Cyc knowledge base.
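
As a rough illustration of the three statistics named above, the following Python sketch computes them for fixed-length token n-grams. It is a simplification under stated assumptions, not the method of this paper: the function name `pattern_stats` and the toy sentences are hypothetical, minimal token frequency is assumed to mean the lowest corpus frequency among a pattern's tokens, and normalized expectation is assumed to follow one common n-gram formulation (the n-gram's frequency divided by the average frequency of the same n-gram with one position masked out). Named-instance arguments and variable-length gaps are not modeled.

```python
from collections import Counter


def pattern_stats(sentences, n=3):
    """Compute per-n-gram statistics over tokenized sentences.

    Assumed definitions (not taken from the paper):
      frequency              = number of occurrences of the n-gram
      min_token_frequency    = lowest corpus frequency among its tokens
      normalized_expectation = frequency / mean frequency of the n-gram
                               with one position replaced by a wildcard
    """
    token_freq = Counter(tok for sent in sentences for tok in sent)
    ngram_freq = Counter()
    masked_freq = Counter()  # n-grams with one position set to None

    for sent in sentences:
        for start in range(len(sent) - n + 1):
            window = tuple(sent[start:start + n])
            ngram_freq[window] += 1
            for i in range(n):
                masked_freq[window[:i] + (None,) + window[i + 1:]] += 1

    stats = {}
    for ngram, freq in ngram_freq.items():
        masked = [
            masked_freq[ngram[:i] + (None,) + ngram[i + 1:]]
            for i in range(n)
        ]
        stats[ngram] = {
            "frequency": freq,
            "min_token_frequency": min(token_freq[t] for t in ngram),
            "normalized_expectation": freq / (sum(masked) / n),
        }
    return stats


# Toy corpus: "X" and "Y" stand in for named-instance arguments.
sentences = [
    ["X", "is", "located", "in", "Y"],
    ["X", "is", "based", "in", "Y"],
    ["X", "is", "located", "in", "Y"],
]
stats = pattern_stats(sentences, n=3)
```

Under these assumptions the normalized expectation lies in (0, 1], and a value near 1 suggests that the tokens of a pattern rarely occur in that arrangement without each other.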