Any suggestions? Active schema support for structuring web information

Abstract

Backed by major Web players, schema.org is the latest broad initiative for structuring Web information. Unfortunately, a representative analysis of a corpus of 733 million Web documents shows that, a year after its introduction, only 1.56% of documents featured any schema.org annotations. A probable reason is that providing annotations is quite tiresome, which hinders widespread adoption. Even state-of-the-art tools like Google’s Structured Data Markup Helper offer only limited support here. In this paper we propose SASS, a system for automatically finding high-quality schema suggestions for page content, to ease the annotation process. SASS intelligently blends supervised machine learning techniques with simple user feedback. Moreover, additional support features for binding attributes to values reduce the necessary effort even further. We show that SASS is superior to current tools for schema.org annotations.