Internals of an Aggregated Web News Feed
We present a pipeline for acquiring a clean, continuous, real-time aggregated stream of publicly available news articles from websites across the world. The articles are stripped of web page chrome (navigation menus, advertisements, and other boilerplate surrounding the article text) and semantically enriched, for example with a list of the entities mentioned in each article. The results are cached and distributed efficiently.
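The abstract does not say how chrome removal, enrichment, or caching are implemented. The following is only a minimal Python sketch of those three stages under loudly stated assumptions. Chrome is stripped with a simple length heuristic that keeps `<p>` blocks of at least `min_words` words. Entities are approximated by runs of capitalized words, a toy stand-in for a real named-entity recognizer. The cache is an in-memory dict keyed by URL. All names (`strip_chrome`, `extract_entities`, `process`, `Article`) are hypothetical and not taken from the described system.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects whitespace-normalized text of <p> elements, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._skip = 0  # nesting depth inside <script>/<style>
        self._buf = []

    def _flush(self):
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            if self._in_p:  # an unclosed <p> is implicitly closed by the next one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        if self._in_p:  # flush a trailing unclosed <p>
            self._flush()


def strip_chrome(html, min_words=8):
    """Assumed heuristic: keep only paragraphs long enough to look like body text."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(p for p in parser.paragraphs if len(p.split()) >= min_words)


# Toy entity recognizer (assumption): runs of capitalized words on one line.
ENTITY_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*")
LEADING_STOPWORDS = {"The", "A", "An", "This", "That"}


def extract_entities(text):
    found = []
    for match in ENTITY_RE.finditer(text):
        words = match.group().split()
        while words and words[0] in LEADING_STOPWORDS:
            words.pop(0)  # drop sentence-initial determiners like "The"
        name = " ".join(words)
        if name and name not in found:
            found.append(name)
    return found


@dataclass
class Article:
    url: str
    text: str
    entities: list = field(default_factory=list)


_cache = {}  # in-memory cache keyed by URL (assumption)


def process(url, html):
    """Clean and enrich a page once; later calls for the same URL hit the cache."""
    if url not in _cache:
        text = strip_chrome(html)
        _cache[url] = Article(url, text, extract_entities(text))
    return _cache[url]


if __name__ == "__main__":
    page = ("<html><body><div class='nav'><p>Home | World | Sports</p></div>"
            "<p>The European Commission announced new rules for online platforms in Brussels.</p>"
            "<script>var x = 1;</script><p>Share this</p></body></html>")
    article = process("http://example.com/a", page)
    print(article.text)      # The European Commission announced new rules for online platforms in Brussels.
    print(article.entities)  # ['European Commission', 'Brussels']
```

A fixed word-count threshold is easy to reason about but crude. It drops short genuine paragraphs and keeps long non-article blocks such as comment threads, so a production pipeline would more likely use a trained boilerplate detector and a proper entity recognizer in place of these stand-ins.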