Ghost
Adaptive web page content identification
Extracting article text from the web with maximum subsequence segmentation
Progress Report 20091002
Content extraction via tag ratios