Add a new federator
This walkthrough illustrates the steps required from the implementor of a federation engine in order to create a FederatorTemplate specification.
In KOBE, a federator template is defined using a set of Docker images. Additional parameters include the port and the path that the container will listen for queries. Federator templates are used in the Experiment specifications in order to define the federation engine to be benchmarked.
Prerequisites
In this walkthrough we assume that you already have already prepared a Docker image that provides the SPARQL endpoint of the federation engine (e.g., https://hub.docker.com/r/semagrow/semagrow/). Moreover, you should have a piece of software that automatically constructs the configuration required for your federator to operate (e.g., https://github.com/semagrow/sevod-scraper).
Step 1. Prepare your Docker images
Usually, a federation engine requires some configuration files that depend on the federated endpoints (e.g., the URLs of the federated SPARQL endpoints). Thus, apart from the Docker image with the SPARQL endpoint of the federation engine, you should provide a docker image that constructs any desired configuration for each of the source endpoints, and a docker image that initializes the federator that possibly takes into account the configuration files of the source endpoints. More specifically, prepare the following images:
- A docker image that constructs a configuration file for a source endpoint and
places it in an output directory of your choice. Assume that the source
endpoint and the dataset name are available in the environment variables
$DATASET_NAME
and$DATASET_URL
respectively, and that the dump file of the dataset is present in an input directory of your choice. - A docker image that constructs a configuration file for the federation engine and places it in an output directory of your choice. Assume that all the configuration files produced in the previous step are present in an input directory of your choice.
- A docker image that starts the federation engine and exposes its SPARQL endpoint.
The environment variables are initialized by the KOBE operator according to the specification of the benchmark to be executed.
As an example, we present the images for two federation engines (namely Fedx and Semagrow).
-
Regarding the Semagrow federation engine, we use the images
semagrow/semagrow-init
(source code here),semagrow/semagrow-init-all
, (source code here), andsemagrow/semagrow
(see here). The first image uses the sevod-scraper tool to create a ttl metadata file from the dump file, and the second image concatenates all metadata files of each of the source endpoints into a single metadata file. -
Regarding the fedx federation engine, we use the images
semagrow/fedx-init
(source code here),semagrow/fedx-init-all
, (source code here), andsemagrow/fedx-server
(source code here). Fedx is known for not using any dataset statistics, but it uses only a ttl file that contains only the SPARQL endpoints of the federation. The first image creates a ttl file that defines the SPARQL endpoint of each dataset and the second image concatenates all ttl files of each source endpoints into a single configuration file.
Step 2. Prepare your YAML file
Once you have prepared the docker images, creating the federator template specification for your dataset server is a straightforward task. It should look like this (we use as an example the template for Semagrow):
apiVersion: kobe.semagrow.org/v1alpha1
kind: FederatorTemplate
metadata:
# Each federator template can be uniquely identified by its name.
name: semagrowtemplate
spec:
containers:
# here you put the last image (that is the image for the
# SPARQL endpoint of the federation engine)
- name: maincontainer
image: semagrow/semagrow
ports:
- containerPort: 8080 # port to listen for queries
port: 8080 # port to listen for queries
path: /SemaGrow/sparql # path to listen for queries
fedConfDir: /etc/default/semagrow # where the federator expects to find its configuration
# federator configuration step 1 (for each dataset):
confFromFileImage: semagrow/semagrow-init # first docker image
inputDumpDir: /sevod-scraper/input # where to find the dump file for the dataset
outputDumpDir: /sevod-scraper/output # where to place the configuration for the dataset
# federator configuration step 2 (combination step):
confImage: semagrow/semagrow-init-all # second docker image
inputDir: /kobe/input # where to find all dataset configurations
outputDir: /kobe/output # where to place the final (combined) configuration
The default URL for the SPARQL endpoint for Virtuoso is
http://localhost:8080/SemaGrow/sparql
, hence the port and the path to listen
for queries are 8080
and /SemaGrow/sparql
respectively. The input and output
directories of the images mentioned previously are configured using the
parameters inputDumpDir
,outputDumpDir
,inputDir
,outputDir
.
Examples
We have already prepared several federator template specifications to experiment with:
Note
We plan to define more federator template specifications in the future. We
place all federator template specifications in the
examples/ directory
under a subdirectory with the prefix federator-*
.