Language identification of documents and queries

Oct 8, 2013, 2378 views

Language identification is a relatively simple and well-solved task. In this talk, I will give an overview of the existing standard techniques and discuss their application to two types of text: crawled Web documents and user search queries. Each presents specific challenges. Web documents can be multilingual and vary widely in genre. Queries are usually too short for reliable attribution, so extra data (user context) is needed to resolve potential ambiguity. I will also talk about Yandex's efforts to cope with these problems.
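As an illustration of why short queries need extra context, here is a minimal sketch of one textbook approach: a character trigram naive Bayes classifier whose prior can be replaced by a user-context distribution. It is not the method presented in the talk. The function names, the toy training sentences, and the `prior` values are all assumptions made up for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """samples maps a language code to a list of training strings."""
    return {
        lang: Counter(g for s in texts for g in char_ngrams(s, n))
        for lang, texts in samples.items()
    }


def posteriors(text, models, prior=None, n=3):
    """Naive Bayes posterior over languages with add-one smoothing.

    prior is an optional dict of language probabilities; in this sketch it
    stands in for user context (hypothetical values, e.g. from user settings).
    """
    langs = list(models)
    if prior is None:
        prior = {lang: 1.0 / len(langs) for lang in langs}
    vocab_size = len(set().union(*models.values()))
    scores = {}
    for lang in langs:
        counts = models[lang]
        total = sum(counts.values())
        score = math.log(prior[lang])
        for gram in char_ngrams(text, n):
            score += math.log((counts[gram] + 1) / (total + vocab_size))
        scores[lang] = score
    # Normalise in log space so the posteriors sum to one.
    top = max(scores.values())
    norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {lang: math.exp(s - norm) for lang, s in scores.items()}


# Toy training data, far too small for real use.
MODELS = train({
    "en": ["the quick brown fox jumps over the lazy dog",
           "what is the weather today"],
    "es": ["el rápido zorro marrón salta sobre el perro perezoso",
           "qué tiempo hace hoy"],
})

long_text = posteriors("the weather", MODELS)
short_query = posteriors("no", MODELS)
short_with_context = posteriors("no", MODELS, prior={"en": 0.1, "es": 0.9})
```

With a longer input, the trigram evidence dominates. For a two-letter query there is almost no evidence, so the prior largely decides the result, which is the role the abstract assigns to user context.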

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.