
Global Data Management

The Global Data Management menu provides functionalities for overall management of the data in the project.

export_repository

Load Data

Please note that this functionality is meant for loading the data that has to be maintained within the project (e.g. loading the latest distribution of the Eurovoc dataset in order to edit Eurovoc within VocBench). If the intent is to owl:import an ontology, in order to create a knowledge base based on it, or to create another ontology extending its model, then the Import functionality in the "Metadata Management" section should be used.
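To illustrate the difference, an ontology that merely builds on another model declares the dependency with owl:imports instead of loading the data. A minimal Turtle sketch (the URIs are illustrative):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/my-ontology>
    a owl:Ontology ;
    # the imported vocabulary is resolved through the Import functionality
    # and stays read-only; it is not loaded into the working graph
    owl:imports <http://example.org/imported-vocabulary> .
```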

Load data

VocBench can load data from a variety of sources, such as files, triple stores and custom sources. The desired type of source can be chosen through the combobox labeled "Load from"; depending on this choice, the user then configures the corresponding parameters.

The baseuri field is usually not mandatory, as the baseuri is generally provided by the data being loaded. The value of the baseuri is used only when the loaded content includes local references (e.g. #Person) and no baseuri has been specified. Formats such as N-Triples, which always contain fully specified URIs, never need this optional parameter, and in formats where local references are possible (such as RDF/XML), the baseuri is usually provided inside the file.
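For instance, in RDF/XML a local reference like #Person is resolved against the base URI, which is usually declared inside the file through xml:base; the baseuri field only matters when such a declaration is absent. A minimal sketch (names are illustrative):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="http://example.org/ontology">
  <!-- rdf:about="#Person" resolves to http://example.org/ontology#Person;
       without xml:base, the baseuri supplied in the Load Data form is used -->
  <rdf:Description rdf:about="#Person"/>
</rdf:RDF>
```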

The data that has been loaded and (if necessary) lifted to RDF can be further processed before it is written to the working graph of the current project. The transformation of the data can be controlled by specifying a sequence of RDF Transformers (more details are provided later, in the Export Data section). When transformers are specified, they operate destructively on a copy of the data being loaded, which is first placed in a temporary, in-memory repository. When no transformer is specified, this temporary copy is avoided and the data is fed directly to the project repository.

The "Resolve transitive imports from:" combobox shows the possible values instructing VocBench on how to import vocabularies specified as transitive dependencies, that is, vocabularies that are owl:imported by the loaded data, or by other vocabularies in turn imported by it.

Note that the content will be loaded inside the project's working graph, so it will be possible to modify it. This is the main difference with respect to the import options on the Import Panel, which allow users to import existing ontologies as read-only data. The Load RDF option, conversely, is typically used to reimport data previously backed up from an old repository, or for data exchange among different users.

The flexibility of the "load data" mechanism clearly allows for complex procedures to be defined. To ease the (consistent) reuse of these procedures, VocBench supports saving them (and later loading them) at different scopes (system, project, user, project-user), depending on the desired level of sharing. To that end, it is sufficient to use the "floppy disc" buttons inside the header of the window.

Save overall load configuration

It is worth noticing that the configurations of individual components (i.e. loaders, lifters, RDF transformers) are not included inside the configuration of the overall process; they are only included by reference. In case one of these sub-configurations was not saved independently, a dialog like the one below is shown:

Reference to an unsaved configuration

To save and load the configuration of individual components, it is possible to use analogous save/load buttons associated with each configurable component.

Loading Large Amounts of Data on Projects Requiring Validation

When dealing with big datasets in projects with validation enabled, loading the initial data might be a very slow process. This is because every single triple present in the repository is first copied both into the support repository (in its reified form) and into the validation graph of the core repository, and then needs to be validated, causing another heavy operation before being finalized.

As a solution, authorized users can tick the "Implicitly validate loaded data" option (which appears only in projects requiring validation, and only for authorized users). As the name suggests, this option skips the validation process and copies the data directly into the repository (if history is enabled, a copy will still be made in the history, which takes less time than a copy in the validation graph and does not require validators to later perform the heavy validation operation).

loading_under_validation

Export Data

After selecting the Export Data option, a window like the one in the figure below will be shown. The window is organized into three sections, which correspond to the selection of the graphs to export, an optional pipeline of export transformations, and a deployment specification. Additionally, a checkbox near the "Export" button can be used to include the inferred statements in the export.

Export data

The "Graphs to export" section lists all the named graphs available in the dataset of the project, so that the user can decide whether to export only the working data or other information as well. Note that the provenance information of the graphs is preserved depending on the configured deployment specification: serializing the data in a quad-oriented format (e.g. N-Quads, TriX, etc.) keeps track of the context each statement belongs to, while triple formats (N-Triples, Turtle, etc.) merge all the data into a single triple set.
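The difference can be seen in the serialized output: a quad format records the named graph as a fourth component of each statement, while a triple format drops it (the URIs are illustrative):

```
# N-Quads: the fourth term records the named graph of each statement
<http://example.org/c1> <http://www.w3.org/2004/02/skos/core#prefLabel> "cat"@en <http://example.org/graph/working> .

# N-Triples: the same statement, with the graph information lost
<http://example.org/c1> <http://www.w3.org/2004/02/skos/core#prefLabel> "cat"@en .
```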

The middle part of the export page concerns the possibility to use RDF transformers to alter the content to be exported according to user preferences. Export transformers range from very specific transformations (e.g. transforming all SKOS-XL reified labels into plain SKOS core ones), through user-customizable ones (e.g. DeletePropertyValue, which lets the user specify a property and a value to be removed from all resources in the dataset; usually adopted to strip editorial data that should not appear in the published dataset, but repurposable for any need), to completely specifiable transformers, such as the SPARQL Update Export Transformer, which allows the user to change the content arbitrarily through user-defined SPARQL updates. The warning sign near the SPARQL RDF Transformer in the figure above indicates that the user has not yet provided the required configuration.
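As an example of what a SPARQL Update Export Transformer might do, the following hypothetical update removes an editorial property (here dct:creator) from every resource before publication:

```sparql
PREFIX dct: <http://purl.org/dc/terms/>

# remove editorial provenance not meant for the published dataset
DELETE { ?resource dct:creator ?creator }
WHERE  { ?resource dct:creator ?creator }
```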

Note that when transformers are adopted, all the content to be exported is copied to a temporary in-memory repository, which can thus be altered destructively by the transformers without corrupting the original data. The export process is optimized in case no transformer has been selected: in this case, no temporary repository is generated and the data is directly dumped from the original dataset.

We have provided a dedicated page for describing useful (and thus reusable) configurations of the RDF Transformers.

The last section, "Deployment", allows the user to configure the destination of the exported data. The available options are described below.

The first option, "save to file", serializes the exported data as a sequence of bytes that is returned to the browser, so that it can be downloaded as a file within the filesystem of the user. The serialization is controlled by the selection of a reformatter, which may optionally require a configuration. Moreover, a combobox labeled "Export Format" allows the user to select the specific serialization format among the ones supported by the chosen reformatter. In the figure above, the data is formatted according to the RDF/XML syntax.

The second option, "Deploy to a triple store", uses a deployer supporting destinations that (broadly speaking) are a triple store. The figure below shows the use of the deployer implementing the SPARQL 1.1 Graph Store HTTP Protocol to save the data to a graph of a compliant (remote) triple store.
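Under the SPARQL 1.1 Graph Store HTTP Protocol, such a deployment amounts to an HTTP request against the store's graph store endpoint; a sketch with an illustrative host and graph name:

```http
PUT /rdf-graph-store?graph=http%3A%2F%2Fexample.org%2Fgraph%2Fworking HTTP/1.1
Host: triplestore.example.org
Content-Type: text/turtle

<http://example.org/c1> a <http://www.w3.org/2004/02/skos/core#Concept> .
```

Per the protocol, PUT replaces the content of the target graph with the request body, while POST would merge the payload into the existing graph.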

Deploy data to a triple store

The third option, "Use custom deployer", uses deployers for byte-oriented destinations: consequently, the user shall also select a reformatter, which translates the RDF data into the actual sequence of bytes to be deployed. The figure below shows the data being deployed to an SFTP server after having been converted to the Zthes format.

Deploy data using a custom deployer

The mechanisms described so far allow for the definition of very complex export procedures. The "floppy disc" buttons inside the header of the window allow the user to save and load the configuration of a complete export procedure. Configurations are identified by a name, and they can be saved at different scopes (system-wide, project, user, project-user) to support different sharing options (e.g. a configuration may be made available to every user of a project, or saved privately by a user).

Save the overall export configuration

The saved export configuration does not include the configurations of the components it uses (i.e. RDF transformers, reformatters and deployers); rather, it includes references to these configurations, which shall be saved independently. The figure below shows an error that occurred while saving an export configuration depending on an unsaved configuration of a deployer.

Reference to an unsaved configuration

To save and load the configuration of individual components, it is possible to use analogous save/load buttons associated with each configurable component.

Clear Data

Through this action, the project repository will be completely cleared.

Needless to say, pay close attention when using this action, because it erases all information in the project (we recommend saving the existing data before clearing it).

Versioning

VocBench allows authorized users to create time-stamped data dumps of the dataset, which can later be inspected through the same project. Each versioned dump is stored in a separate repository, which the application keeps read-only.

version dump menu

The Dump menu allows the user to create a new version of the dataset, either by following conventional coordinates for the creation of the repository of the new version, or by specifying them manually. The following figure shows the simple dialog for dumping a new version of the dataset, which requests only an ID (a tag) for the version.

load_repository

The following figure shows instead the repository configuration in case of a dump to a custom location established by the user. The configuration panel is very similar to the one for creating the main dataset repository when initializing the project. Note that all the information related to the custom dump will be retained by the project, so it will always be possible to access this custom location without having to note down its coordinates/configuration.

load_repository

After a new version has been dumped, it will be listed among the available versions. The "Switch to" button located in the top-right corner allows the user to temporarily switch to this version and inspect its content. Everything in VB will then be scoped to this dumped version, except that it will not be possible to write to it. The Delete button allows the authorized user to delete the selected version.

load_repository

Data Refactor

The Data Refactor page allows the administrator or project manager (or an equivalently authorized user) to perform massive refactoring of the loaded data. This refactoring is usually performed at the beginning of the life of a project, typically after some data has been loaded from an external file, when the data does not conform to the specifications of the project (e.g. the dataset contains SKOS core labels while the project is set up to manage SKOS-XL lexicalizations) and needs to be refactored in order to be properly managed under the intended project settings and configuration.

load_repository

Current refactoring options include converting back and forth between SKOS and SKOS-XL and migrating data from the default graph (i.e. the single unnamed graph in the repository) to the working graph (i.e. the graph named after the baseuri of the dataset, which is supposed to hold the working data).
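The SKOS to SKOS-XL direction, for example, reifies each plain label into a skosxl:Label resource. An illustrative SPARQL 1.1 Update sketch of the idea (VocBench's actual refactoring may differ, e.g. in how the label URIs are minted):

```sparql
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>

DELETE { ?concept skos:prefLabel ?lit }
INSERT {
  ?concept skosxl:prefLabel ?xlabel .
  ?xlabel  a skosxl:Label ;
           skosxl:literalForm ?lit .
}
WHERE {
  ?concept skos:prefLabel ?lit .
  # mint a fresh URI for the reified label (illustrative strategy)
  BIND(IRI(CONCAT(STR(?concept), "_xl_", STRUUID())) AS ?xlabel)
}
```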

Metadata Management

The Metadata Management view can be accessed through the top-rightmost menu.

metadata management

The Metadata Management View (see figure below) is divided into two main sections, which can be selected through the buttons at the top of the page:

  1. The namespaces and imports section, allowing users to set prefix-namespace mappings, to owl:import ontology vocabularies, and to edit the ontology mirror, a local mirror of ontologies stored within VB
  2. The metadata vocabularies section, allowing for the specification of metadata according to different existing metadata vocabularies, such as VoID, LIME, DCAT, ADMS, etc.
metadata management