Dataset Catalogs

Dataset catalogs are remotely accessible directories of ontologies, thesauri and, more generally, datasets. In practice, catalogs differ greatly in the types of datasets they manage, the range of metadata they expose and the functionality they offer.

VocBench does not commit to any specific catalog, and relies on the extension point DatasetCatalogConnector to support diverse catalogs. Predefined extensions support the following catalogs:

The figure below depicts the main dialog for interacting with a dataset catalog. Its main function is to help the user search for a dataset matching some criteria. Currently, this functionality is used when "adding an import" or when "preloading a newly created project with some data".

Dataset catalogs: dataset search

The topmost drop-down list, labeled "Catalog", selects the catalog to use. The text field just below, labeled "Search", is used to enter a search string. To start a search, press ENTER while the cursor is in the search text field, or click the nearby button with the magnifier icon.

The search results are listed below the input widgets, sorted by relevance (if returned by the catalog), with the most relevant result at the top. Results are paginated to prevent problems with very large result sets. The dialog nevertheless indicates the total number of results, the total number of pages and which page is currently shown. The user can move to the previous or to the next page (if it exists) by clicking on the two triangles near the page indicator: the left-pointing triangle and the right-pointing one, respectively.
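The pagination indicators described above can be derived from three numbers. The following is a minimal sketch, not VocBench's actual implementation: the class `PageInfo`, its field names and the page size are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PageInfo:
    """Pagination state for one page of search results (hypothetical structure)."""

    total_results: int
    page_size: int
    page: int  # 1-based index of the page currently shown

    @property
    def total_pages(self) -> int:
        # Assumption: an empty result set is still shown as a single (empty) page.
        return max(1, math.ceil(self.total_results / self.page_size))

    @property
    def has_previous(self) -> bool:
        # The left-pointing triangle is only usable after the first page.
        return self.page > 1

    @property
    def has_next(self) -> bool:
        # The right-pointing triangle is only usable before the last page.
        return self.page < self.total_pages


info = PageInfo(total_results=45, page_size=10, page=5)
print(info.total_pages, info.has_previous, info.has_next)
```

With 45 results and an assumed page size of 10, the last page is page 5, so only the move to the previous page is available.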

If returned by the chosen catalog, search facets are shown on the right of the results list as a sequence of boxes. The heading of the box contains the name of the facet (e.g. language), while the contained items indicate different values for the specific facet (e.g. English, French, etc.). The number associated with each item indicates how many search results have that value for the facet. Users can refine their search by selecting one or more facets. To that end, it is sufficient to click on the item of interest. Active facets are rendered with a darker color, and they can be deactivated by clicking on them again.
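The facet behavior described above (counts per value, refinement, and toggling by a second click) can be illustrated with a small sketch. In VocBench the facets and their counts are returned by the catalog itself; the code below computes them locally purely for illustration. The result records, the function names, and the rule that several active facets are combined with a logical AND are all assumptions, not documented behavior.

```python
from collections import Counter

# Hypothetical search results with catalog-style facet values.
results = [
    {"title": "A", "facets": {"language": ["English", "French"]}},
    {"title": "B", "facets": {"language": ["English"]}},
    {"title": "C", "facets": {"language": ["French"], "tag": ["agriculture"]}},
]


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    counts = Counter()
    for result in results:
        # set() ensures a result is counted once per value.
        counts.update(set(result["facets"].get(facet, [])))
    return counts


def toggle(active, facet, value):
    """Clicking an item activates it; clicking it again deactivates it."""
    return active ^ {(facet, value)}


def refine(results, active):
    """Keep the results matching every active (facet, value) pair (assumed AND)."""
    return [
        result
        for result in results
        if all(value in result["facets"].get(facet, []) for facet, value in active)
    ]


active = toggle(set(), "language", "English")
print(facet_counts(results, "language"))
print([r["title"] for r in refine(results, active)])
```

Representing the active facets as a set of pairs makes the toggle a symmetric difference, so activating and deactivating share one code path.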

For each search result, the following information is shown:

If a search result is associated with titles and descriptions in different natural languages, the display language is determined as follows in decreasing order of preference:

When the user clicks on a search result, its detailed description is shown on the right side of the dialog. Currently, the description includes some additional metadata, most importantly the data dump, the SPARQL endpoint and the URI prefix. To the right of the dataset description, a set of boxes shows the catalog-specific facets (e.g. language and tag) used to classify the dataset.

Catalogs may support different access methods (e.g. SPARQL endpoint or data dump), and datasets in the same catalog may support different methods.
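As a rough illustration of how access methods can vary from one dataset to another, the sketch below models a dataset description with optional metadata fields and reports which access methods it offers. The class `DatasetDescription`, its field names and the `example.org` URLs are hypothetical; they do not reflect the actual data model behind DatasetCatalogConnector.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetDescription:
    """Hypothetical subset of the metadata shown for a search result."""

    title: str
    data_dump: Optional[str] = None        # URL of a downloadable dump, if any
    sparql_endpoint: Optional[str] = None  # URL of a SPARQL endpoint, if any
    uri_prefix: Optional[str] = None       # common prefix of the dataset's URIs


def access_methods(dataset: DatasetDescription) -> List[str]:
    """List the access methods for which the dataset provides metadata."""
    methods = []
    if dataset.data_dump:
        methods.append("data dump")
    if dataset.sparql_endpoint:
        methods.append("SPARQL endpoint")
    return methods


# Two datasets from the same (imaginary) catalog with different access methods.
first = DatasetDescription("First", data_dump="http://example.org/first.rdf")
second = DatasetDescription(
    "Second",
    data_dump="http://example.org/second.rdf",
    sparql_endpoint="http://example.org/sparql",
)
print(access_methods(first))
print(access_methods(second))
```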

In general, the presence of specific metadata determines what can be done with a given dataset: