VocBench Tips & Tricks
This page reports on solutions for resolving specific needs, going further than the user manual. While the user manual plainly explains the various features of VocBench, here we go bottom-up: starting from a need, we describe how this need can be covered in VocBench through its features (possibly, through a combination of them).
Dataset Maintenance with Externally Provided Information
We regularly add language labels and translate the descriptors of the source vocabulary (available in English only) into other languages. These updates come with new descriptors, plus a few redirections and term rejections.
What strategy should we follow with VocBench? Should we manually enter the term network for the newly added terms, or is there an alternative, automated process for merging newly added terms into VocBench?
I guess the most important questions are:
1. Which sort of policy do you adopt for maintaining your vocabulary? Some questions:
   a. Do you *also* have people maintaining it natively in VB, or do you only gather new addenda from outside?
      i. In the former case, are you using validation?
      ii. In the latter case, do you always get a completely new version of the vocabulary, or only the addenda? From your message it seems you are getting an entirely new version of the vocabulary.
2. How large are the smaller parts of the new data you ingest (e.g., the rejections and redirections)?
3. How are you getting the new data? Is it in the “source” format you described in point 1? Can you easily make a spreadsheet out of it?
Based on the above (and possibly other) questions, a strategy can be laid down. Still, here are some possible answers in advance:
In case of 1.a.i:
- VB supports proposing terms and validating (accepting or rejecting) them. If the rejections are few, you can process them manually: if they concern terms under validation, use the validation interface; if by rejection you instead mean the deletion of older terms, simply delete those terms. In both cases, if the rejection/deletion actions are many, you can use the Web API of VocBench: http://semanticturkey.uniroma2.it/doc/user/first_access.jsf and http://semanticturkey.uniroma2.it/doc/user/web_api.jsf
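When many rejections or deletions have to be applied, the Web API calls can be scripted instead of being performed one by one in the UI. The Python sketch below only prepares the requests from a plain list of concept IRIs; it does not send them. The server URL, the service path ending in `SKOS/deleteConcept`, the `concept` parameter and the `<...>` encoding of IRIs are all assumptions for illustration, so check each of them against the Web API documentation linked above. A real client must also authenticate before calling any service (see the first-access page).

```python
from urllib.parse import urlencode

# ASSUMPTIONS (hypothetical values, not confirmed VocBench/Semantic Turkey API details):
# verify the base URL, service path and parameter name against the Web API docs.
BASE_URL = "http://localhost:1979/semanticturkey"
SERVICE_PATH = "it.uniroma2.art.semanticturkey/st-core-services/SKOS/deleteConcept"
PARAM_NAME = "concept"


def read_iris(lines):
    """Collect IRIs from an exported list, skipping blank lines and '#' comments."""
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def build_delete_requests(iris, project, base_url=BASE_URL,
                          service_path=SERVICE_PATH, param_name=PARAM_NAME):
    """Return one (method, url) pair per IRI; nothing is sent over the network.

    IRIs are wrapped in <...> (an assumed encoding) and URL-encoded with urlencode.
    """
    requests = []
    for iri in iris:
        query = urlencode({"ctx_project": project, param_name: f"<{iri}>"})
        requests.append(("POST", f"{base_url}/{service_path}?{query}"))
    return requests


if __name__ == "__main__":
    rejected = read_iris(["# rejected terms", "http://example.org/c1", "", "http://example.org/c2"])
    for method, url in build_delete_requests(rejected, project="MyThesaurus"):
        print(method, url)
```

Keeping request construction separate from sending makes it easy to review the batch (or dry-run it) before touching the project.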
In case of 1.a.ii:
- If you are not using validation (nor even history), and you get the entire vocabulary each time, then I have a question: does VocBench play any specific role here, beyond its initial use for translating the vocabulary into your languages?
- If its role was only to add the translations at the start, with no maintenance afterwards, why not add the translations you produced (those in Hindi and Bengali) to your source format and then reconvert everything to RDF, instead of working with a delta addendum?
- If VB is only used for browsing/publication, why not consider ShowVoc (http://showvoc.uniroma2.it/), a companion to VB3 focused on managing metadata (thus acting as a metadata registry/dataset portal) and on browsing?
- As for loading the data, although I suppose you have already developed your own conversion system, there are also solutions within VB3 (see below).
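Reconverting everything from the source format can be sketched as follows. The sketch assumes a hypothetical tabular export of the source vocabulary with an `id` column plus one `label_<lang>` column per language (e.g., `label_en`, `label_hi`, `label_bn`); the column layout and the base IRI are assumptions. Each row becomes a `skos:Concept` with language-tagged `skos:prefLabel` values, written as N-Triples using only the Python standard library. An existing conversion system or an RDF library would normally play this role.

```python
import csv
import io

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nt_literal(text, lang):
    """Serialize a language-tagged literal for N-Triples, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def rows_to_ntriples(rows, base_iri):
    """Turn rows with an 'id' column and 'label_<lang>' columns into SKOS N-Triples lines.

    The column layout and base IRI are hypothetical; adapt them to your source format.
    """
    triples = []
    for row in rows:
        concept = f"<{base_iri}{row['id']}>"
        triples.append(f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .")
        for column, value in row.items():
            # Empty cells (no translation yet) produce no label.
            if column and column.startswith("label_") and value:
                lang = column[len("label_"):]
                triples.append(f"{concept} <{SKOS}prefLabel> {nt_literal(value, lang)} .")
    return triples


if __name__ == "__main__":
    source = "id,label_en,label_hi,label_bn\nc1,Water,पानी,জল\nc2,Soil,,\n"
    rows = csv.DictReader(io.StringIO(source))
    print("\n".join(rows_to_ntriples(rows, "http://example.org/vocab/")))
```

Because the whole dataset is regenerated each time, the output simply replaces the previous version on reload, and no delta has to be computed.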
Concerning how to ingest the data: besides the considerations above (in short, if no maintenance is done in VB3, the data come from external sources only and no history is kept in VB, then there is no need to work on the delta; simply reconvert everything from scratch each time), VB3 offers several solutions:
- If the data can be represented as a spreadsheet (or, from the forthcoming version 11.1 of VB3, if it is stored in a DB), you can use Sheet2RDF: it has a UI within VocBench (http://art.uniroma2.it/sheet2rdf/documentation/vb_tool/) and allows for the conversion of any tabular data to RDF. Please note that you can also use it for deletions, as Sheet2RDF supports data deletion too (this obviously applies if you opt for neither a massive reload of the vocabulary from scratch nor manual rejections/deletions).
- VocBench offers several options for data import (http://vocbench.uniroma2.it/doc/user/global_data_management.jsf), which include transformations of RDF data and lifting from other formats (http://semanticturkey.uniroma2.it/doc/sys/rdf_lifter.jsf). You can even develop your own lifter as a plugin for VocBench.
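When the vocabulary is also edited in VocBench, or history must be kept, a reload from scratch is not an option and only the changes should be applied. As a tool-independent illustration (not a VocBench feature), the Python sketch below compares two N-Triples exports, treating each line as one triple, and builds a SPARQL 1.1 Update with `DELETE DATA` for triples that disappeared and `INSERT DATA` for new ones. It assumes both exports are serialized consistently and contain no blank nodes, since blank-node labels are not stable across exports and are not allowed in `DELETE DATA`. Whether and where such an update can be run depends on your setup and permissions.

```python
def ntriples_lines(text):
    """Return the set of statement lines in an N-Triples document.

    Assumes one triple per line, a consistent serialization, and no blank nodes,
    so that identical triples are identical strings.
    """
    return {ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")}


def delta_update(old_nt, new_nt):
    """Build a SPARQL 1.1 Update that turns the old export into the new one."""
    old, new = ntriples_lines(old_nt), ntriples_lines(new_nt)
    removed, added = sorted(old - new), sorted(new - old)
    operations = []
    if removed:
        operations.append("DELETE DATA {\n  " + "\n  ".join(removed) + "\n}")
    if added:
        operations.append("INSERT DATA {\n  " + "\n  ".join(added) + "\n}")
    # Operations in a single SPARQL Update request are separated by ';'.
    return " ;\n".join(operations)


if __name__ == "__main__":
    old = '<http://example.org/c1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Water"@en .\n'
    new = old + '<http://example.org/c1> <http://www.w3.org/2004/02/skos/core#prefLabel> "पानी"@hi .\n'
    print(delta_update(old, new))
```

Plain string comparison is deliberately simple; for real data, canonicalize both exports first (or compare parsed graphs with an RDF library) so that equivalent triples written differently are not reported as changes.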
Story: Multiple scheme and hierarchy management
I would like to manage multiple schemes within the same SKOS dataset. The schemes are managed by different groups/organizations, and I would even like to have a different hierarchy for each scheme.
This is indeed a complex story, for which we have created a dedicated page.