Demo – Paper 605
Data scientists, journalists, analysts and academics struggle to find data sources, prepare analyses, collaborate with peers, present their results in context, and find an audience for their conclusions. The data.world project aims to create a semantics-based publication platform for datasets, scalable to hundreds of thousands of heterogeneous users and millions of distinct datasets. Such broad horizontal scale brings a highly varied population of datasets, from multi-kilobyte Excel spreadsheets to multi-gigabyte RDF graphs, and an equally varied population of users, whose technological sophistication ranges from that of journalists and students to that of highly skilled data scientists. The data.world project leverages semantic web technologies, including CSVW, HDT, and dataset metadata ontologies, to automate the ingestion, discovery, presentation and linkage of both tabular and native graph datasets. Cloud-based technologies elastically scale the back-end graph storage and query fabric to meet availability and reliability targets with acceptable response latency. This abstract gives an overview of a technical demonstration of data.world that explains its functionality and architecture, the challenges of building a system of such broad horizontal scale, and the role that semantic web technologies have played in its development and adoption.
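As a rough illustration of the tabular ingestion described above, the following minimal sketch maps each cell of a CSV file to an RDF-style triple, loosely in the spirit of CSVW's column-to-predicate mapping. It is not data.world's implementation and not a conformant CSVW processor: the function names, the `#row-N` subject scheme, the example.org IRI, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote


def csv_to_triples(csv_text, table_iri):
    """Map each non-empty CSV cell to a (subject, predicate, object) triple.

    Hypothetical scheme: one subject per data row, one predicate per column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"{table_iri}#row-{row_number}"
        for column, value in row.items():
            # Skip surplus cells (column is None) and empty or missing values.
            if column is None or value is None or value == "":
                continue
            predicate = f"{table_iri}#{quote(column)}"
            triples.append((subject, predicate, value))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines with plain string literals."""
    lines = []
    for subject, predicate, obj in triples:
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Illustrative sample data and IRI (assumptions, not real datasets).
SAMPLE = "name,size\nalpha,10\nbeta,\n"
TABLE_IRI = "http://example.org/datasets/sample.csv"

triples = csv_to_triples(SAMPLE, TABLE_IRI)
print(to_ntriples(triples))
```

Every value is emitted as a plain string literal; a fuller pipeline would presumably use CSVW metadata to assign datatypes and to link cells to existing vocabularies.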