Create a new benchmark
This walkthrough describes the steps a benchmark designer follows to create a Benchmark specification.
In KOBE, a benchmark comprises a collection of data sources, the latency of these data sources, and a list of query strings. Benchmarks are defined independently of the federator that is being benchmarked.
In this walkthrough we assume that you have already prepared the following:
- The dump of each RDF dataset of the benchmark.
- A list of query strings of the benchmark.
- A DatasetTemplate for each dataset server you want to use in your benchmark.
Regarding the third prerequisite, we have already prepared several dataset templates to use. If you want to create your own dataset server template, check out this guide.
Step 1. Prepare your dataset dumps
Create a .tar.gz file for each dataset, and upload it to a known location.
Place all files of the dataset into a directory, put this directory into a tar file, and compress it with gzip. Even though most dataset engines support importing several RDF formats (such as RDF/XML, Turtle, etc.), the simplest format is N-Triples. Therefore, we suggest storing your dataset in a single .nt file. If you choose to prepare a dump.nt file, just do the following:
mkdir dataset/
mv dump.nt dataset/
tar czvf dataset.tar.gz dataset/
Finally, upload the .tar.gz file to a known location. As an example, we have uploaded the datasets for the FedBench experiment in the following location.
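If your benchmark has several datasets, the packaging step can be repeated in a loop. The following is a minimal sketch, not part of KOBE itself: the working directory kobe-datasets, the dataset names dataset1 and dataset2, and the placeholder triples are all hypothetical and exist only so that the script runs on its own.

```shell
#!/bin/sh
# Sketch: package one N-Triples dump per dataset into its own .tar.gz archive.
# The directory layout and dataset names are assumptions for illustration.
set -eu

work=kobe-datasets            # hypothetical working directory
mkdir -p "$work/dumps"

# Two tiny placeholder dumps; replace them with your real .nt files.
printf '<http://example.org/s1> <http://example.org/p> "o1" .\n' > "$work/dumps/dataset1.nt"
printf '<http://example.org/s2> <http://example.org/p> "o2" .\n' > "$work/dumps/dataset2.nt"

for dump in "$work"/dumps/*.nt; do
    name=$(basename "$dump" .nt)

    # One directory per dataset, holding a single dump.nt file.
    mkdir -p "$work/$name"
    cp "$dump" "$work/$name/dump.nt"

    # Archive the directory; -C keeps the paths inside the archive relative.
    tar -czf "$work/$name.tar.gz" -C "$work" "$name"

    # List the archive contents to confirm that the dump is inside.
    tar -tzf "$work/$name.tar.gz"
done
```

Listing each archive before uploading is a cheap way to catch an empty or misnamed directory early.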
Step 2. Prepare your YAML file
A benchmark is characterized by its name and is parameterized using a list of datasets and a set of queries. A typical benchmark specification should look like this:
apiVersion: kobe.semagrow.org/v1alpha1
kind: Benchmark
metadata:
  # Each benchmark can be uniquely identified by its name.
  name: mybench
spec:
  # Each benchmark consists of a set of dataset specifications.
  datasets:
    # Each dataset can be uniquely identified by its name,
    # and is defined with
    # A list of URLs that contain the dump of the dataset to download.
    # A specification of the dataset server to use (dataset template).
    - name: dataset1
      files:
        - url: https://path/to/download/the/dataset1.tar.gz
      templateRef: datasettemplate
    # ... add more datasets ...
  # Each benchmark consists of a set of queries.
  queries:
    # Each query can be uniquely identified by its name,
    # and is defined with
    # The language in which the query is written (e.g., SPARQL).
    # The actual query string to be posed to the federator.
    - name: query1
      language: sparql
      queryString: "SELECT * WHERE ... "
    # ... add more queries ...
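When the query strings are kept in separate files, a specification of this shape can be assembled with a short script instead of being pasted by hand. The sketch below is only one possible approach: the kobe-spec directory, the .rq file names, and the two sample queries are hypothetical, and the dataset URL is the placeholder from the specification above. It writes each query as a YAML block scalar (`|`), which avoids quoting problems but leaves a trailing newline in the query string.

```shell
#!/bin/sh
# Sketch: build a Benchmark specification from a directory of query files.
# File names, directory names, and the sample queries are assumptions.
set -eu

work=kobe-spec                # hypothetical working directory
mkdir -p "$work/queries"

# Two placeholder queries; replace them with the queries of your benchmark.
printf 'SELECT * WHERE { ?s ?p ?o } LIMIT 10\n' > "$work/queries/query1.rq"
printf 'SELECT ?s WHERE { ?s a ?type }\n' > "$work/queries/query2.rq"

out="$work/mybench.yaml"

# Fixed part of the specification: metadata and one dataset entry.
cat > "$out" <<'EOF'
apiVersion: kobe.semagrow.org/v1alpha1
kind: Benchmark
metadata:
  name: mybench
spec:
  datasets:
    - name: dataset1
      files:
        - url: https://path/to/download/the/dataset1.tar.gz
      templateRef: datasettemplate
  queries:
EOF

# One query entry per file; the file name (without .rq) becomes the query name.
for q in "$work"/queries/*.rq; do
    name=$(basename "$q" .rq)
    {
        printf '    - name: %s\n' "$name"
        printf '      language: sparql\n'
        printf '      queryString: |\n'
        # Indent the query text so that it sits inside the block scalar.
        sed 's/^/        /' "$q"
    } >> "$out"
done

cat "$out"
```

Deriving the query name from the file name keeps names unique as long as the file names are, which matches the requirement that each query be uniquely identified by its name.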
Check the following link in which we illustrate a simple example of the above specification:
This benchmark contains three SPARQL queries (namely
and two datasets (namely
toy2), both of them served by Virtuoso.
We have already prepared several benchmark specifications to experiment with:
We plan to define more benchmark specifications in the future. We place all
benchmark specifications in the
under a subdirectory with the prefix