Managing Multiple Schemes and Multiple Hierarchies

This page describes how to manage multiple schemes and, possibly, multiple, different hierarchies when different groups/organizations are working on the same SKOS dataset.

A good reading before going ahead is provided by this page on SKOS development.

In large efforts to develop reference datasets for particular domains, different organizations may want to collaborate on common ground. The reason is clear: instead of multiplying efforts, different actors put their strengths together to maximize the result and provide a single reference resource.

There are quite a few cases. Just to mention a few in the domain of agriculture:

The Global Agricultural Concept Space (GACS) was a hub for concepts related to agriculture, in multiple languages, for use in Linked Data, putting together the three most relevant thesauri in the world about this domain: Agrovoc, NALT and CABI. The project is no longer active, but its outcome still provides an alignment backbone for the three datasets.
In Agrovoc, the landvoc scheme is related to land information and is maintained by staff from Land Portal. More recently, other subschemes managed by other groups have been added (ASFA, faolex, IndigenousPeople and ONE CGIAR).

While the first case is one of a kind, started as a coordinated alignment backbone and meant to progress as a single unified resource, in the other cases the approach is identical: delegate the management of parts of a thesaurus to other departments or other organizations.

In this case, a few features from group management come in handy.

Group Management and Multiple Schemes

In order to have "happy flatmates" living together within the same SKOS dataset, it is useful to adopt good, agreed policies on the one side, and to enforce restrictions through the system on the other. Luckily, VocBench helps in such cases.

Simply, it is possible to define "groups" in VocBench and to associate specific behaviors in each project. So, to clarify, groups are persistent in VocBench at the system level (once defined, they do not need to be redefined for each project) but must be set for specific behaviors project by project. The main objectives of this feature are:

define, at the mere level of identity, the membership of users with respect to certain groups of editing
restrict user authorizations on concepts belonging to certain groups (more specifically, belonging to schemes owned by certain groups)

The group management page (and other related pages) provides all the information from the technical perspective.
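The second objective above (restricting authorizations through scheme ownership) can be pictured with a minimal sketch. This is not VocBench code: the group names, user names, dictionaries and the may_edit rule below are all assumptions introduced only for illustration, and the real rules are those documented in the group management page.

```python
# Hypothetical data: which group owns which scheme, which schemes each
# concept belongs to, and which group each user is a member of.
scheme_owner = {":schemeA": "groupA", ":schemeB": "groupB"}
concept_schemes = {
    ":c1": {":schemeA", ":schemeB"},
    ":c2": {":schemeA"},
    ":c3": {":schemeB"},
}
user_group = {"alice": "groupA", "bob": "groupB"}


def may_edit(user, concept):
    """Assumed rule: a user may edit a concept only if the concept belongs
    to at least one scheme owned by the user's group."""
    group = user_group.get(user)
    if group is None:
        # Users without a group get no group-based authorization here.
        return False
    return any(scheme_owner.get(s) == group
               for s in concept_schemes.get(concept, ()))
```

Under these assumptions, alice can edit :c1 and :c2, while bob can edit :c1 and :c3.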

Multiple, Scheme-specific, Hierarchies

One known limitation of SKOS (mainly due to limitations of RDF, before the advent of RDF* at least) is that while certain information can be scoped to schemes (e.g. the membership itself of a concept to a scheme), other information cannot. For instance, relationships such as those based on skos:broader/narrower cannot be scoped to any scheme and can only be asserted in general. The reason is that, as the relation involves two concepts (thus filling a triple with the concepts as subject and object and skos:broader/narrower as predicate), there is no way (unless the triple is reified, or by using other tricks such as micrographs) to say something about it, such as its scoping to a scheme.

How to solve this situation? Well, in standard SKOS (and RDF) this is not possible. However, one solution, supported by VocBench with dedicated features, is to define scheme-specific properties representing the broader/narrower relations.

So, suppose there are two schemes, :schemeA and :schemeB. This solution foresees the creation of two properties, :broaderA and :broaderB. Each of them represents the broader relation in its associated scheme and can be used to bind two concepts into a broader relation only in the context of that particular scheme. Without loss of generality, we consider the case of two schemes (and will refer to two schemes in the following text), even though identical considerations apply to multiple schemes.

Please notice that this workaround is necessary only when both concepts involved in the broader relation belong to both schemes A and B. This is because, if one of the two concepts does not belong to one scheme, it will in any case be filtered out from the hierarchy of that scheme.
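As a minimal sketch of the idea, triples can be modeled as plain Python tuples. The concept names :c1, :c2, :c3, the broader_property mapping and the helper functions are assumptions made for this illustration; :schemeA, :schemeB, :broaderA and :broaderB are the names used in the text above.

```python
# Triples as (subject, predicate, object) tuples; all three concepts
# belong to both schemes, which is the case where the workaround is needed.
triples = {
    (":c1", "skos:inScheme", ":schemeA"),
    (":c1", "skos:inScheme", ":schemeB"),
    (":c2", "skos:inScheme", ":schemeA"),
    (":c2", "skos:inScheme", ":schemeB"),
    (":c3", "skos:inScheme", ":schemeA"),
    (":c3", "skos:inScheme", ":schemeB"),
    (":c1", ":broaderA", ":c2"),  # in scheme A, c1 sits under c2
    (":c1", ":broaderB", ":c3"),  # in scheme B, c1 sits under c3
}

# Hypothetical mapping from each scheme to its scheme-specific property.
broader_property = {":schemeA": ":broaderA", ":schemeB": ":broaderB"}


def members(scheme):
    """Concepts asserted to be in the given scheme."""
    return {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}


def hierarchy(scheme):
    """(child, parent) pairs for one scheme, read from its own property.
    Pairs with a concept outside the scheme are filtered out."""
    prop = broader_property[scheme]
    in_scheme = members(scheme)
    return sorted((s, o) for s, p, o in triples
                  if p == prop and s in in_scheme and o in in_scheme)
```

With this data, hierarchy(":schemeA") contains only the pair (:c1, :c2) and hierarchy(":schemeB") only (:c1, :c3), so the two schemes carry different parents for the same concept without contradicting each other.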

So, what is the support provided by VocBench?

configurable visualization of the concept tree. It is possible to choose that only certain subproperties of skos:broader will drive the painting of the tree. So if, say, :broaderA is chosen for representing the hierarchy, then the user will only see the tree built upon :broaderA and not the hierarchical relations bound through :broaderB.
configurable assertion of the broader relation. It is possible to choose the property that will be proposed by default for asserting broader relations. Notice that this is only a default: the user can pick a different property case by case, without changing the default.

A further setting (Projects-->Project-Groups management in the UI of the administrator) allows administrators and project managers to set default values for the above choices for each user and for groups (and thus for users belonging to these groups).
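The configurable tree visualization can be illustrated with a small sketch. It assumes, as stated later on this page, that the tree is built on the chosen property together with its sub-properties; the sub_property_of table, the sample triples and the function names are hypothetical and do not reproduce VocBench internals.

```python
# Hypothetical property hierarchy: child property -> parent property.
sub_property_of = {":broaderA": "skos:broader", ":broaderB": "skos:broader"}

triples = [
    (":c1", ":broaderA", ":c2"),
    (":c1", ":broaderB", ":c3"),
    (":c4", "skos:broader", ":c2"),
]


def with_subproperties(prop):
    """Return prop plus every property declared, directly or transitively,
    as one of its sub-properties."""
    result = {prop}
    changed = True
    while changed:
        changed = False
        for child, parent in sub_property_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def tree_edges(chosen):
    """(child, parent) pairs shown when `chosen` drives the concept tree."""
    props = with_subproperties(chosen)
    return sorted((s, o) for s, p, o in triples if p in props)
```

Choosing :broaderA shows only the pair (:c1, :c2), while choosing skos:broader shows all three pairs, because both scheme-specific properties are declared here as its sub-properties.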

Is this really necessary?

As we all know from Spidey's uncle (and, more recently, even an auntie), "with great power comes great responsibility". This feature is a sort of hack with respect to the common way of handling SKOS and introduces some complexity (see the section below on how to implement the procedure). Yes, it can be used, but users should be well aware of how and when to use it.

It is a common misconception that wanting multiple, different hierarchical relations is a frequent need. In practice, different hierarchies in different schemes should be required only in a small number of cases, specifically these three:

simple disagreement on whether c2 is broader than c1, with broader intended in a common sense in both schemes
two concepts belonging to two different schemes, where the two schemes have very specific and different semantics for broader/narrower
merely structural organization of the tree

Now, let's examine them case by case:

Case 1: disagreement.

We really encourage resolving disagreements. "Agreeing to disagree" on the same relation between two concepts is, quite possibly, like suggesting that the semantics of the concepts themselves are not clear, which is like saying that a semantic resource is failing at being semantic: not really a good message! This case is usually not a sign of healthy management, and giving too much freedom only results in more mess.

Case 2: different semantics of broader/narrower.

Well, SKOS is a very shallow model, and broader/narrower have been explicitly said to allow for different interpretations in different schemes.
Here is an example showing how different semantics can be attached to the same properties: if you sell cameras, you may decide that a lens is something you want to show under a camera, and use a scheme to represent this recommendation tree. A lens is not narrower than a camera (it is part of it). So is it narrower in the sense of "part-of"? Not even that: narrower here means "if you bought a camera, you might be interested in buying a lens". Now, if the camera and lens concepts were to be shown in another scheme, with broader intended in the more common sense of "is-a", then "lens" would not be narrower than "camera", because lenses are not cameras!
However, in two very general schemes, is there really a different intension of the broader/narrower relation? Usually not.

Case 3: Structural Organization.

Building on what has already been described in case 2, we remark that the hierarchy in SKOS is an explicitly asserted, somewhat artificial structure (differently from, say, OWL, where the objective is not to represent a hierarchy; on the contrary, the hierarchy is only a convenient view for observing the classifications).

So, these cases emerge merely because of the "artificial nature" of the hierarchy in SKOS and the fact that skos:broader/narrower are not transitive. For instance, if you have:

c1 skos:broader c2 .
c2 skos:broader c3 .

in schemeA, with c1, c2, c3 members of A

then you want to have, in schemeB, only:

c1 skos:broader c3 .

because c2 doesn't belong to schemeB (and thus you need to "wire up" c1 and c3)

and here is the limitation: since the skos:broader relation is not scheme-local, you would end up with c1 being both directly under c2 and under c3 in schemeA. That's where a specific broader property, local to a scheme, would be desired if modeling the two schemes together.

So, the structural issue when some concept in the middle is missing from one of the schemes is a real case that can always happen, even though it might be uncommon as well.

Please notice that if you start by using skos:broader in the first place, things such as the unwanted triple "c1 skos:broader c3" in schemeA in the previous example would be detected by the ICV for "redundancy in the hierarchy". So a maintainer of scheme A could easily detect these cases and say: "well, thanks, my scheme-B friends, but we have decided to split that relationship, as c2 is missing from your tree but not from ours, so we do not need the relationship wiring up c1 with c3", then switching c1 skos:broader c3 to c1 :broaderB c3.
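To make the notion of redundancy in the hierarchy concrete, here is a minimal sketch that flags a direct broader link when the same parent is also reachable through a longer path. It is not the implementation of the VocBench ICV; the function name and the representation of links as (child, parent) pairs are assumptions for this example.

```python
def redundant_broader(edges):
    """Return the direct (child, parent) links whose parent is also
    reachable from the child through some other, longer path."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    def reachable(start, target, skip_edge):
        # Depth-first walk up the hierarchy, ignoring the link under test.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for p in parents.get(node, ()):
                if (node, p) == skip_edge:
                    continue
                if p == target:
                    return True
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return False

    return sorted(e for e in set(edges) if reachable(e[0], e[1], e))


# The example above, as seen from scheme A when only skos:broader is used.
scheme_a_edges = [("c1", "c2"), ("c2", "c3"), ("c1", "c3")]
flagged = redundant_broader(scheme_a_edges)
```

Here flagged contains only the pair (c1, c3), which is exactly the link that the maintainer of scheme A would move to :broaderB.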

Choosing and Implementing the Policies

So, given what VB3 provides as support, and these modeling tricks, how should users implement the policies for managing multiple thesauri within the same dataset?

There are multiple possibilities, and all of these need to be supported by policies, while not being bound to further characteristics of VB3.

We can see, basically, two main scenarios:

use an "on our way" independent approach: whenever a concept is added to one or more schemes, it needs to be attached somewhere so, for each scheme, the owner should say if it is ok to attach it to the chosen concept or to choose another concept. The scheme-specific property will be used in any case for each of them
- in this approach, skos:broader is not used and the scheme-specific properties are usually subproperties of skos:broader.
the "happy flatmates" approach: by default, when a concept is attached under another concept, skos:broader is adopted (being thus valid for all schemes). As we know, skos:broader matters only when both concepts are in the same scheme(s), so, for each of the schemes where both concepts are involved, the relation is vetted by the scheme owner and approved. If approved, no communication is necessary. Otherwise, the skos:broader needs to be split into its multiple scheme-specific properties
- in this approach, the scheme-specific properties are usually super properties of skos:broader. This way, users can set their view on their scheme-specific property and then see the tree built on both skos:broader and their scheme-specific property (this is because VocBench always looks at the subproperties of the one chosen for representing the tree)
- please notice that the operation of splitting could be conveniently done with a stored SPARQL update, which would first look at all the schemes a certain concept is in, and then for these schemes use the associated scheme-specific property in place of skos:broader
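The splitting operation mentioned in the last point would normally be a stored SPARQL update. The sketch below only mirrors its logic in plain Python over tuples, as an illustration under stated assumptions: the function name, the broader_property mapping and the choice of splitting over the schemes shared by both concepts are not taken from VocBench, and every involved scheme is assumed to have a scheme-specific property.

```python
SKOS_BROADER = "skos:broader"
IN_SCHEME = "skos:inScheme"


def split_broader(triples, child, parent, broader_property):
    """Replace `child skos:broader parent` with one scheme-specific triple
    for every scheme that contains both concepts."""
    target = (child, SKOS_BROADER, parent)
    if target not in triples:
        return set(triples)  # nothing to split

    def schemes_of(concept):
        return {o for s, p, o in triples if s == concept and p == IN_SCHEME}

    shared = schemes_of(child) & schemes_of(parent)
    result = set(triples) - {target}
    for scheme in shared:
        result.add((child, broader_property[scheme], parent))
    return result


data = {
    (":c1", IN_SCHEME, ":schemeA"), (":c1", IN_SCHEME, ":schemeB"),
    (":c3", IN_SCHEME, ":schemeA"), (":c3", IN_SCHEME, ":schemeB"),
    (":c1", SKOS_BROADER, ":c3"),
}
props = {":schemeA": ":broaderA", ":schemeB": ":broaderB"}
after_split = split_broader(data, ":c1", ":c3", props)
```

After the split, the generic skos:broader link is gone and each scheme owner holds their own triple, which the disagreeing owner can then remove or point to a different parent.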

Publication

Known Limitations Upon Publication

This solution has long been sought by institutions that, while relying on linked data technologies and models for representing their data, have no intention of publishing their resources as Linked Open Datasets. The reason is that, if the same IRI scheme is kept, there will be a single IRI representing the same concept, and if this concept is connected to others through different, contrasting relationships, then there will be only one locus for representing the hierarchical information and a single representation must be chosen. So, approaches in this case include:

a predominant vision: the concept represented at its IRI represents the broader/narrower relationships related to the main scheme. Other schemes (with their own different hierarchy) are published in other ways (e.g. through dataset browsers such as ShowVoc and Skosmos) but their specific hierarchical information cannot be represented on the main scheme
IRI differentiation upon publication: the schemes are actually different datasets being published separately, with different URI spaces for their same concepts. Concepts with the same meaning are actually maintained together, using the same IRI, in order to join efforts towards a common goal but, upon publication, these are multiplied in each specific dataset URI space. E.g. http://this.is.an/example/c_123 is exported as http://the.first.dataset/c_123 and as http://the.second.dataset/c_123. For the purpose of publication as linked open data, since these become different concepts, each with its own IRI, they also have separate LOD descriptors.
keeping scheme-local broader/narrower: this solution loses compliance with the standard... or, better, it depends on the position of the specific broader properties, i.e. if these are rdfs:subPropertyOf skos:broader/narrower, technically each scheme-specific broader/narrower relation will be a child of skos:broader/narrower, thus implying it as well; unfortunately, many SKOS consumer applications won't be smart enough to digest this trivial inference. Nonetheless, it has the advantage of allowing for different hierarchies in the same description.
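The IRI differentiation option listed above amounts to rewriting the shared namespace into a dataset-specific one at export time. A minimal sketch follows, using the example IRIs from the text; the function name and the decision to leave predicates untouched are assumptions for this illustration.

```python
def publish_iris(triples, source_ns, target_ns):
    """Rewrite subjects and objects from the shared editing namespace
    into the namespace of one published dataset."""
    def rewrite(term):
        if term.startswith(source_ns):
            return target_ns + term[len(source_ns):]
        return term  # terms outside the shared namespace are kept as they are

    return [(rewrite(s), p, rewrite(o)) for s, p, o in triples]


shared = [
    ("http://this.is.an/example/c_123", "skos:broader",
     "http://this.is.an/example/c_45"),
]
first = publish_iris(shared, "http://this.is.an/example/",
                     "http://the.first.dataset/")
second = publish_iris(shared, "http://this.is.an/example/",
                      "http://the.second.dataset/")
```

Each published dataset then carries its own copy of the concept under its own IRI, so each copy can expose the hierarchy of its own scheme.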

How to Publish

Here again VocBench helps by supporting the work of the users.

In Global Data Management-->Export Data, there is a dedicated transformer for normalizing some properties into others. This can simply be invoked to transform scheme-specific properties into the standard skos:broader when producing the output datasets.
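The effect of such a normalization on one output dataset can be sketched as follows. This does not reproduce the VocBench transformer or its configuration; the function name, its parameters and the choice to drop the properties of the other schemes are assumptions made only to show the expected result.

```python
def export_for_scheme(triples, scheme_property, other_properties,
                      standard="skos:broader"):
    """Produce the triples for one published scheme: its own broader
    property becomes the standard one, those of other schemes are dropped."""
    exported = []
    for s, p, o in triples:
        if p in other_properties:
            continue  # hierarchy belonging to another scheme
        exported.append((s, standard if p == scheme_property else p, o))
    return exported


working_copy = [
    (":c1", ":broaderA", ":c2"),
    (":c1", ":broaderB", ":c3"),
    (":c1", "skos:prefLabel", "one"),
]
scheme_a_export = export_for_scheme(working_copy, ":broaderA", {":broaderB"})
```

In scheme_a_export, the :broaderA link appears as a plain skos:broader triple, the :broaderB link is absent, and all other triples pass through unchanged.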