Research – Paper 237

Entity Comparison in RDF Graphs

Alina Petrova, Evgeny Sherkhonov, Bernardo Cuenca Grau and Ian Horrocks


October 23, 2017, 11:00.
Lehár 4

Abstract

In many applications, there is an increasing need for new types of RDF data analysis that are not covered by standard reasoning tasks such as SPARQL query answering. One such important analysis task is entity comparison, i.e., determining the similarities and differences between two given entities in an RDF graph. For instance, in an RDF graph about drugs, we may want to compare Metamizole and Ibuprofen and automatically find out that they are similar in that they are both analgesics but, in contrast to Metamizole, Ibuprofen also has a considerable anti-inflammatory effect. Entity comparison is a widely used functionality available in many information systems, such as university or product comparison websites. However, comparison is typically domain-specific and depends on a fixed set of aspects to compare. In this paper, we propose a formal framework for domain-independent entity comparison over RDF graphs. We model similarities and differences between entities as SPARQL queries satisfying certain additional properties, and propose algorithms for computing them.
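To make the idea of modelling similarities and differences as queries concrete, here is a minimal Python sketch of the drug example. It is not the algorithm from the paper: it only compares the direct (predicate, object) pairs of two entities, ignores blank nodes and deeper graph structure, and uses a made-up toy graph. The names `GRAPH`, `outgoing`, `compare`, `to_sparql` and the `ex:` identifiers are assumptions introduced purely for illustration.

```python
# Toy RDF graph as a set of (subject, predicate, object) triples.
# The triples and the "ex:" names are hypothetical illustration data.
GRAPH = {
    ("ex:Metamizole", "rdf:type", "ex:Analgesic"),
    ("ex:Ibuprofen", "rdf:type", "ex:Analgesic"),
    ("ex:Ibuprofen", "ex:hasEffect", "ex:AntiInflammatory"),
}


def outgoing(graph, entity):
    """Return the (predicate, object) pairs of triples whose subject is `entity`."""
    return {(p, o) for s, p, o in graph if s == entity}


def compare(graph, a, b):
    """Depth-1 comparison: pairs shared by both entities and pairs unique to each."""
    pairs_a, pairs_b = outgoing(graph, a), outgoing(graph, b)
    return {
        "similar": pairs_a & pairs_b,
        "only_a": pairs_a - pairs_b,
        "only_b": pairs_b - pairs_a,
    }


def to_sparql(pairs):
    """Render a set of (predicate, object) pairs as a simple SPARQL query string."""
    patterns = " ".join(f"?x {p} {o} ." for p, o in sorted(pairs))
    return f"SELECT ?x WHERE {{ {patterns} }}"


result = compare(GRAPH, "ex:Metamizole", "ex:Ibuprofen")
similarity_query = to_sparql(result["similar"])   # matched by both entities
difference_query = to_sparql(result["only_b"])    # matched by Ibuprofen only
```

In this simplified reading, a similarity is a query that both entities answer, and a difference is a query answered by one entity but not the other. The generated strings omit `PREFIX` declarations, so they would need those added before being sent to a real SPARQL endpoint.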

Héctor

Though I find the approach quite well defined and potentially useful, I worry about its scalability. How well would it work to find interesting commonalities/differences in a pool of millions of entities described using a model containing tens of thousands of properties?

Alina

Hi Héctor, thanks much for the comment! Indeed, we are currently working on scalable algorithms for both (most specific) similarities and (most general) differences. 1) Although the complexity of finding a difference query is quite high, it stems from the presence of blank nodes. In a real-world scenario we would never hit the worst case. 2) In addition, in a reasonable scenario the size/depth of the query is bounded by some small value (for the sake of readability), in which case similarity and difference computation becomes scalable.

Artem Revenko

Very interesting to compare to this approach: https://link.springer.com/chapter/10.1007/978-3-319-60438-1_61

Alina

Thanks much for the reference, Artem!

Ernesto

I see a potential application in (traditional) instance matching, where one of the tasks is to find equivalent entities.

Alina

Hi Ernesto, thank you for the suggestion! Indeed, the framework could be used for equivalent and near-equivalent instance matching and discovery.