Connecting the best of both worlds: ontologies and vocabularies in metaphactory

Semantic Knowledge Modeling

Johannes Trame

The terms "ontology" and "vocabulary" are often used interchangeably. However, more often than not, this leads to confusion among customers who want to semantically model their domain and results in questions about whether there is in fact a distinction between the two and whether both are needed to implement a knowledge graph.

The meta-layers that these terms describe have been captured by different standards (OWL and SKOS respectively) and we at metaphacts believe that there is value in treating both as individual but complementary assets in their own right.

Let's first look at how we define these two concepts:

Ontologies are semantic data models that define the types of entities that exist in a domain and the properties that can be used to describe them. An ontology combines a representation, formal naming, and definition of the elements (such as classes, attributes, and relations) that define the domain of discourse. You may think of it as the logical graph model that defines what types (sets) of entities exist, their shared attributes, and logical relations. For ontologies, metaphactory builds upon OWL and SHACL as open standards. You can learn more about ontology modeling in metaphactory in this blog post »

Vocabularies are controlled term collections organized in concept schemes that support knowledge graph experts, domain experts, and business users in capturing business-relevant terminology, e.g., synonyms, hyponyms, spelling, and language variations. A term can have preferred and alternative labels (synonyms) in multiple languages and can carry natural language definitions. Terms can also be related to each other: a term can be marked as equivalent to, broader or narrower than (hypernyms/hyponyms), or loosely related to another term. The most common examples of different types of vocabularies are thesauri, taxonomies, terminologies, glossaries, classification schemes, and subject headings, which can be managed within metaphactory using SKOS as an open standard. You can learn more about vocabulary management in metaphactory in this blog post »
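
To make the building blocks of a vocabulary more tangible, here is a minimal sketch of a small SKOS concept scheme, written as a SPARQL update. All identifiers in the ex: namespace, as well as the labels and the definition, are made up for illustration and are not taken from any existing vocabulary.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.com/vocab/>

# Hypothetical vocabulary; every ex: identifier is an assumption.
INSERT DATA {
  ex:fields a skos:ConceptScheme ;
    skos:prefLabel "Fields"@en ;
    skos:hasTopConcept ex:naturalSciences .

  # A broader term (hypernym) at the top of the scheme
  ex:naturalSciences a skos:Concept ;
    skos:inScheme ex:fields ;
    skos:topConceptOf ex:fields ;
    skos:prefLabel "Natural sciences"@en .

  # A narrower term with multilingual labels, a synonym, and a definition
  ex:physics a skos:Concept ;
    skos:inScheme ex:fields ;
    skos:broader ex:naturalSciences ;
    skos:prefLabel "Physics"@en , "Physik"@de ;
    skos:altLabel "Physical science"@en ;
    skos:definition "The study of matter, energy, and their interactions."@en ;
    skos:related ex:chemistry .

  ex:chemistry a skos:Concept ;
    skos:inScheme ex:fields ;
    skos:broader ex:naturalSciences ;
    skos:prefLabel "Chemistry"@en .
}
```

The labels, the definition, and the broader/related links all live on the terms themselves, so they can be changed without touching any class or property definition.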

For example, an ontology may define that agents have roles in an organization. The particular agents (e.g., a specific person) or the kinds of roles are considered instance data of the graph, whereas part of this instance data, such as the role kinds (software engineer, software architect, etc.), might be managed as controlled vocabularies. As recently described by Katariina Kari from IKEA, you may look at this as the three layers of the knowledge graph, where the vocabularies are the layer between the ontology and instance data (cf. also the graphic below).
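
The three layers from this example could be sketched as follows. The class and property names (ex:Agent, ex:Role, ex:hasRole, ex:roleKind) and the individuals are hypothetical and only serve to show which statement belongs to which layer.

```sparql
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.com/model/>

# All ex: names are assumptions made for this sketch.
INSERT DATA {
  # Layer 1, ontology: which types and relations exist
  ex:Agent a owl:Class .
  ex:Role  a owl:Class .
  ex:hasRole  a owl:ObjectProperty ; rdfs:domain ex:Agent ; rdfs:range ex:Role .
  ex:roleKind a owl:ObjectProperty ; rdfs:domain ex:Role ; rdfs:range skos:Concept .

  # Layer 2, vocabulary: controlled terms for the kinds of roles
  ex:roleKinds a skos:ConceptScheme .
  ex:softwareEngineer a skos:Concept ;
    skos:inScheme ex:roleKinds ;
    skos:prefLabel "Software engineer"@en .

  # Layer 3, instance data: a specific person holding a specific role
  ex:alice a ex:Agent ;
    ex:hasRole ex:aliceRole .
  ex:aliceRole a ex:Role ;
    ex:roleKind ex:softwareEngineer .
}
```

Adding a new role kind such as "Software architect" would only add statements to the second layer; the ontology in the first layer stays unchanged.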

From these definitions it becomes clear that ontologies and vocabularies serve different purposes: Ontologies help to build, maintain, and query the knowledge graph on a logical level. Vocabularies support organizing, annotating, and querying the knowledge graph on the terminological level without requiring changes to the logical model.

In addition, these artifacts also play an important role in terms of governance and the expertise required. metaphactory's visual ontology interface allows domain experts and business users to contribute to building the logical model, although ontologies may still be owned and maintained by ontology and knowledge graph engineers. Vocabularies, on the other hand, capture business-relevant terms and require the knowledge of users who are experts in their domain and can help create a shared terminology. Once the vocabulary is created and its basic structure and design rationale are defined, it can be owned and maintained by subject matter experts, e.g., in the role of data stewards.

metaphactory knowledge graph approach - Layering of open W3C semantic knowledge graph standards as utilized and applied by metaphactory (based on ideas from the semantic layer cake)

RDF: Encode everything in the RDF 1.1 graph model, stored in any RDF 1.1 & SPARQL 1.1 compliant database for querying. Built upon central Web and XML standards like URI & namespaces, it supports several open & standardized serialization formats for exchange without vendor lock-in.

OWL: Use core elements of OWL to define which classes (in a class hierarchy), attributes & relations exist. Additional layering through external tooling is possible (e.g., Protégé): Open world, Description Logic for consistency checking of complex models or DL inference (e.g., automated classification).

SHACL: Constrain how instances of classes relate to each other and which kinds of attributes are allowed, incl. xsd datatypes & cardinality. metaphactory integrates with databases / SHACL engines to assert data quality & completeness (closed world).
Additionally, metaphactory uses the SHACL interpretation semantics to drive application logic (e.g., generate authoring forms with built-in auto-suggestions, offer UI elements & runtime validation).

SKOS: Define concepts as controlled terms (categorical names, synonyms, identifiers & descriptions in various languages), helping to annotate, structure & query data vertically on the instance level. Terms are instances themselves, i.e., they are instances of skos:Concept. They are not part of the logical data model and as such, for example, do not change the validation logic.
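
As a rough illustration of the SHACL layer described above, the following sketch constrains a hypothetical ex:Agent class with one datatype attribute and one relation. The class, the properties, and the chosen cardinalities are assumptions; only the sh: and xsd: terms come from the respective standards.

```sparql
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.com/model/>

# Hypothetical shape; ex: names and cardinalities are illustrative only.
INSERT DATA {
  ex:AgentShape a sh:NodeShape ;
    sh:targetClass ex:Agent ;
    # Attribute: exactly one name, typed as xsd:string
    sh:property [
      sh:path ex:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1
    ] ;
    # Relation: at least one role, each value an instance of ex:Role
    sh:property [
      sh:path ex:hasRole ;
      sh:class ex:Role ;
      sh:minCount 1
    ] .
}
```

A shape like this carries the same information that a form or a validation engine needs: the datatype and cardinality of each attribute and the expected class of each relation.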

However, these assets do not need to live on their own or be managed in silos. They complement each other and provide additional context for end users.

Already with version 4.4, metaphactory allowed customers to create truly interconnected knowledge graphs by supporting the management of ontologies, vocabularies, and dataset descriptions together with actual instance data. Released in April 2022, metaphactory 4.5 delivered an even tighter integration between ontologies and vocabularies by allowing knowledge graph engineers to link classes in the ontology editor to controlled vocabularies.

In this blog post, we will:

  • Demonstrate how interlinking ontologies and vocabularies works in practice by looking at an example where we connect a class from the Nobel Prize ontology to a controlled vocabulary.
  • Build an end-user search interface using metaphactory’s interactive wizards and demonstrate how the vocabulary is reflected in a hierarchical structure in the search facets for more intuitive search and exploration. The hierarchical facets were released in July 2022 with metaphactory 4.6.

Interlinking ontologies and vocabularies

The Nobel Prize Dataset¹ is a public dataset available as a Semantic Knowledge Graph, i.e., it is published in RDF through a public SPARQL endpoint and described by an OWL ontology (see RDF/XML version). The ontology and dataset include information about all Laureates (Persons, Organizations) who have received a Nobel Prize Laureate Award in a certain Nobel Prize category, or a share thereof, ever since the inception of the Nobel Prize. We augmented the Nobel Prize ontology with SHACL shapes to also model relevant constraints such as cardinalities and range constraints including class and data type assertions, as depicted in the ontology diagram below:

Nobel Prize ontology in metaphactory

When it comes to vocabularies, the situation is a bit more complex as there is no single, standard, or best practice on how ontologies can be connected to vocabularies. Most of the time, we see the so-called semi-formal hybrid modeling pattern.

For example, the human-readable documentation of the Nobel Prize ontology has an explicit section called "Vocabularies". However, in the original OWL version of the Nobel Prize ontology, categories are declared and enumerated within the ontology itself as OWL individuals and they are only implicitly classified as skos:Concept. As a consequence, there is no split between ontology and vocabulary maintenance.

This introduces several limitations:

  • Making changes to the categories (e.g., introducing a new category) requires changes to the OWL ontology, as every category is declared and enumerated individually within the ontology rather than decentralized in a dedicated, linked vocabulary.
  • The OWL restriction cannot be easily used for data validation through standard data quality engines.

To overcome these limitations, metaphactory 4.5 introduced the ability to link the ontology to a vocabulary through a SHACL "restriction on a class" (technically, we apply a restriction to all relations leading to that class) - in this case the Laureate Award class. Users (ontology engineers or domain experts) can apply this restriction using metaphactory's visual ontology editor as demonstrated in the screenshot below:

Applying a vocabulary restriction on a class in metaphactory

A class can be restricted to an entire vocabulary (SKOS concept scheme), a collection (SKOS collection), or a particular sub-tree within a vocabulary.
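
In plain SHACL terms, the three kinds of restriction could look roughly like the sketch below. This is not the shape that metaphactory generates, which may well differ. The property nobel:category, the scheme ex:categories, the concept ex:naturalSciences, and the collection ex:scienceCategories are assumptions chosen for illustration.

```sparql
PREFIX sh:    <http://www.w3.org/ns/shacl#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX nobel: <http://data.nobelprize.org/terms/>
PREFIX ex:    <http://example.com/vocab/>

# Illustrative only: ex: names and the nobel:category path are assumptions.
INSERT DATA {
  # Values of the restricted relation must be SKOS concepts
  # that conform to one of the node shapes defined below.
  ex:LaureateAwardShape a sh:NodeShape ;
    sh:targetClass nobel:LaureateAward ;
    sh:property [
      sh:path nobel:category ;
      sh:class skos:Concept ;
      sh:node ex:InCategoryScheme
    ] .

  # Variant 1: restrict to an entire concept scheme
  ex:InCategoryScheme a sh:NodeShape ;
    sh:property [ sh:path skos:inScheme ; sh:hasValue ex:categories ] .

  # Variant 2: restrict to the sub-tree below (and including) a given concept
  ex:InSubTree a sh:NodeShape ;
    sh:property [
      sh:path [ sh:zeroOrMorePath skos:broader ] ;
      sh:hasValue ex:naturalSciences
    ] .

  # Variant 3: restrict to the members of a SKOS collection
  ex:InCollection a sh:NodeShape ;
    sh:property [
      sh:path [ sh:inversePath skos:member ] ;
      sh:hasValue ex:scienceCategories
    ] .
}
```

Swapping the sh:node reference in the first shape switches between the three variants, while the individual terms stay in the vocabulary and are never enumerated in the shape.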

This delivers several benefits:

  • Stakeholder communication and documentation: Vocabulary restricted classes are visually depicted and easy to identify. As such, it becomes clear from the ontology diagram which classes of the ontology will be populated by controlled terminology and thus maintained by domain experts/data stewards through the vocabulary management functionality.
  • Separation of concerns: The linkage allows users to formally specify that certain vocabularies (or top-level concepts or collections thereof) are expected, while still separating vocabulary management from ontology management, since individual terms are no longer maintained or enumerated as part of the ontology itself.
  • Model-driven application building: The restrictions can be automatically exploited, for example, to instruct auto-suggestions in semantic instance authoring forms, runtime validation of user interactions, or to set up search configurations including hierarchical facets (see following section).
  • Data Quality Assurance: Since the restriction is directly encoded in a SHACL property shape, it can be executed by any SHACL standard compliant data quality engine/database. For example, a data quality engine would immediately detect if terms that are not part of a vocabulary and/or defined as skos:Concept are referenced in the dataset (for example, created by an ETL process).
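
The data quality check from the last point can be approximated in plain SPARQL. A SHACL engine would report such cases in its validation report; the query below is only a hand-written equivalent of the same idea. The property nobel:category and the scheme ex:categories are assumptions.

```sparql
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX nobel: <http://data.nobelprize.org/terms/>
PREFIX ex:    <http://example.com/vocab/>

# Find category values that are not concepts of the expected scheme,
# e.g., IRIs introduced by an ETL process without a matching vocabulary term.
# nobel:category and ex:categories are assumed names.
SELECT ?award ?value WHERE {
  ?award a nobel:LaureateAward ;
         nobel:category ?value .
  FILTER NOT EXISTS {
    ?value a skos:Concept ;
           skos:inScheme ex:categories .
  }
}
```

An empty result means every referenced value is a known term of the vocabulary.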

Building model-driven end-user interfaces

Let's now look at two examples of how the knowledge captured in the model - namely the linkage from the ontology to the vocabulary - drives application logic in metaphactory. We can use metaphactory's intuitive wizard to build a search interface that picks up the vocabulary restriction we defined before. First, we'll select the type of search we want to have, in this case metaphactory's keyword-type search, and we enable faceted filtering:

metaphactory wizard for building a search interface - Search type selection

Then, we'll select the classes from the ontology that we want to include, in this case Nobel Prize and Laureate.

metaphactory wizard for building a search interface - Classes selection

Finally, we'll select the relevant attributes and relations from the ontology - in this case all:

metaphactory wizard for building a search interface - Attributes selection

The wizard will generate the configuration code as well as a live preview. The live preview - see screenshot below - allows us to test the resulting search and refine the configuration by going back in the wizard steps, before finally adding it to the application page.

metaphactory wizard for building a search interface - Result preview

As we can see in the screenshot above, the faceted filter for Category now has a hierarchical structure, as defined in the vocabulary. By exposing this hierarchical vocabulary structure in our search filter, we add a new level of abstraction to the search and allow end users to filter by broader terms (hypernyms), which automatically group the narrower terms together. This helps end users formulate and iteratively refine their information needs. The beauty lies in the fact that the recall and precision of searches can be improved through changes in the vocabularies alone: the labels, synonyms, and relations to other Nobel Prize category terms can be used for indexing and query expansion without touching the actual instance data (e.g., every individual Nobel Prize). As such, information retrieval tasks can greatly benefit from the decentralized management of terminology.
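
The idea of filtering by a broader term and expanding the query over labels and synonyms can be sketched in SPARQL as follows. This is not the query that metaphactory's search component generates. The label "natural sciences" and the property nobel:category are assumptions made for the example.

```sparql
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX nobel: <http://data.nobelprize.org/terms/>

# Query expansion sketch: match a term by preferred or alternative label,
# then include everything annotated with that term or any narrower term.
# The label value and nobel:category are assumed for illustration.
SELECT DISTINCT ?prize ?category WHERE {
  ?selected skos:prefLabel|skos:altLabel ?label .
  FILTER (LCASE(STR(?label)) = "natural sciences")
  ?category skos:broader* ?selected .
  ?prize nobel:category ?category .
}
```

Adding a synonym or a new narrower term to the vocabulary changes what this query returns, while the prize instances themselves remain untouched.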

Similarly, this can benefit annotation use cases in the context of model-driven instance authoring. For example, a semantic-form with reference to the class Laureate Award is configured through this simple HTML5 markup and allows end users to populate the Laureate Award class with instances from the vocabulary only, by presenting the user with a hierarchical tree selector driven by and limited to the structure of the vocabulary:

<semantic-form for-class="http://data.nobelprize.org/terms/LaureateAward" new-subject-template="http://example.com/records/{{UUID}}">
</semantic-form>

Alternatively, we can also use the form wizard to visually select the class from the ontology (i.e., without having to copy or remember the class identifier) and bootstrap the initial template code for incremental configuration and styling:

metaphactory wizard for building a form - Result preview

To learn more about model-driven semantic forms in metaphactory, have a look at this blog post »

Side Note: While the interlinking of ontologies with vocabularies focuses particularly on leveraging hierarchical structures in terminology, it is of course also possible to encode, expose, and exploit hierarchical relationships other than broader/narrower relations. Common use cases are place hierarchies or part-of relations, which are not necessarily well captured through broader/narrower relations in SKOS vocabularies and are not maintained as terminology. This can be done by adjusting the configuration, both in search and in forms:

Exploiting other hierarchical relationships in the user interface
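
In SPARQL terms, using a different hierarchy essentially means following another transitive relation instead of skos:broader. The sketch below uses a hypothetical place hierarchy; ex:partOf, ex:locatedIn, and ex:Europe are invented names, and the actual metaphactory configuration is not shown here.

```sparql
PREFIX ex: <http://example.com/model/>

# Sketch with assumed names: collect all items located in a place
# that is (transitively) part of ex:Europe, including ex:Europe itself.
SELECT DISTINCT ?item ?place WHERE {
  ?place ex:partOf* ex:Europe .
  ?item ex:locatedIn ?place .
}
```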

This sounds cool! How can I try it myself?

To test metaphactory's semantic modeling capabilities, you can get started with metaphactory today using our 14-day free trial. If you'd like to reproduce the examples in this blog post, you can load the Nobel Prize dataset into your metaphactory instance by pasting the code snippet below into the SPARQL tab.

LOAD <https://metaphacts-datasets.s3.amazonaws.com/nobel-prize-bundle.trig.gz>;
LOAD <http://archivo.dbpedia.org/download?o=http%3A//dbpedia.org/ontology/&f=owl> INTO GRAPH <http://archivo.dbpedia.org/download?o=http%3A//dbpedia.org/ontology/&f=owl>;
LOAD <https://xmlns.com/foaf/spec/index.rdf> INTO GRAPH <https://xmlns.com/foaf/spec/index.rdf>;
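
Once the load has finished, a quick sanity check could look like the query below. It assumes that the bundle contains instances typed as nobel:LaureateAward and that your database queries all loaded graphs by default; depending on the store, you may need to wrap the pattern in a GRAPH clause.

```sparql
PREFIX nobel: <http://data.nobelprize.org/terms/>

# Count the Laureate Award instances that are visible to the query.
SELECT (COUNT(DISTINCT ?award) AS ?awards) WHERE {
  ?award a nobel:LaureateAward .
}
```

A count of zero suggests that the data was not loaded or sits in a graph that is not part of the default query scope.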

Footnotes
¹ This public dataset is licensed under Creative Commons Zero (CC0) and free to use as stated by the Terms & Conditions of the Nobel Prize website.

Irina Schmidt

Irina is an international marketing and communications expert with over 10 years of experience in the areas of product marketing, online and digital marketing, public relations and customer success. She loves working at the crossroads where technology and business meet and is passionate about targeted marketing solutions that resonate with customers and solve real-world problems.

Johannes Trame

As Product Manager at metaphacts, Johannes collaborates with customers and the professional services team to identify customer needs, articulates product features that successfully address these needs, and works with the software engineering team to turn the product vision into reality.