Poster – Paper 506

WInte.r - A Web Data Integration Framework

Oliver Lehmberg, Christian Bizer and Alexander Brinkmann

Poster

October 23, 2017, Poster and Demo Reception, 18:30-21:20
Festsaal 1

Abstract

The Web provides a plethora of structured data, such as semantic annotations in web pages, data from HTML tables, datasets from open data portals, and linked data from the Linked Open Data Cloud. For many use cases, it is necessary to integrate such web data with existing local datasets. This integration entails schema matching, identity resolution, and data fusion. As an alternative to combining partial or ad hoc solutions, this poster presents the Web Data Integration Framework (WInte.r), which supports end-to-end data integration by providing algorithms and building blocks for data pre-processing, schema matching, identity resolution, and data fusion. While fully usable out of the box, the framework is highly customisable and allows for the composition of sophisticated integration architectures such as T2K Match, which is used to match millions of web tables against DBpedia. A second use case for which WInte.r was employed is stitching, that is, combining web tables from the same web site into larger tables as a pre-processing step before matching. The WInte.r framework is written in Java and is available as open source under the Apache 2.0 license.
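
To make the three integration steps named in the abstract concrete, the following is a minimal, self-contained Java sketch of schema matching, identity resolution, and data fusion on a single pair of records. It does not use the WInte.r API: the class name IntegrationSketch, all method names, and the sample values are hypothetical and chosen only for illustration, and the matching rules (equality of normalised labels and of a normalised key attribute) are deliberately naive stand-ins for the algorithms a real framework would provide.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: NOT the WInte.r API. All names here are hypothetical.
public class IntegrationSketch {

    // Lower-case a string and drop everything except letters and digits.
    static String normalise(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Schema matching (naive): map a source attribute to a target attribute
    // when their normalised labels are equal.
    static Map<String, String> matchSchema(Set<String> source, Set<String> target) {
        Map<String, String> mapping = new HashMap<>();
        for (String s : source) {
            for (String t : target) {
                if (normalise(s).equals(normalise(t))) {
                    mapping.put(s, t);
                }
            }
        }
        return mapping;
    }

    // Rewrite a record into the target schema, dropping unmapped attributes.
    static Map<String, String> translate(Map<String, String> record, Map<String, String> mapping) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            String targetAttribute = mapping.get(e.getKey());
            if (targetAttribute != null) {
                out.put(targetAttribute, e.getValue());
            }
        }
        return out;
    }

    // Identity resolution (naive): two records describe the same entity
    // when their normalised key attribute values are equal.
    static boolean sameEntity(Map<String, String> a, Map<String, String> b, String key) {
        String x = a.get(key);
        String y = b.get(key);
        return x != null && y != null && normalise(x).equals(normalise(y));
    }

    // Data fusion (naive): start from the web record and let every
    // non-empty local value override it, so web data only fills gaps.
    static Map<String, String> fuse(Map<String, String> local, Map<String, String> web) {
        Map<String, String> fused = new TreeMap<>(web);
        for (Map.Entry<String, String> e : local.entrySet()) {
            if (e.getValue() != null && !e.getValue().isEmpty()) {
                fused.put(e.getKey(), e.getValue());
            }
        }
        return fused;
    }

    public static void main(String[] args) {
        // Made-up local record with a missing population value.
        Map<String, String> local = new HashMap<>();
        local.put("name", "Example City");
        local.put("population", "");

        // Made-up web table row using differently spelled attribute labels.
        Map<String, String> webRow = new HashMap<>();
        webRow.put("Name", "example city");
        webRow.put("Population", "12345");

        Set<String> targetSchema = new HashSet<>(Arrays.asList("name", "population"));
        Map<String, String> mapping = matchSchema(webRow.keySet(), targetSchema);
        Map<String, String> translated = translate(webRow, mapping);

        if (sameEntity(local, translated, "name")) {
            System.out.println(fuse(local, translated));
        }
    }
}
```

In this sketch the steps run strictly in sequence, so the output of schema matching feeds identity resolution, which in turn decides whether fusion happens at all. That ordering mirrors the sequence listed in the abstract, but it is an assumption about how the steps might be composed, not a description of how WInte.r wires them together.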