A Step-by-Step Guide

Deployment Instructions

This step-by-step guide helps you start using Semagrow.

Table of Contents

Installation
Configuration
Usage

Installation

Debian distribution

Semagrow is available as a .deb distribution. To install it on Debian-based systems, execute the following in a terminal as root:

echo 'deb http://semagrow.semantic-web.at/deb/ lucid free' > /etc/apt/sources.list.d/semagrow.list

wget -q http://semagrow.semantic-web.at/deb/packages.semagrow.key -O- | apt-key add -

Semagrow requires Java 8 or later. To start the Semagrow endpoint, issue:

service semagrow start

The Semagrow endpoint should now be up and running. To verify, check the service status:

service semagrow status

[ ok ] SemaGrow Stack is running with pid **.

Semagrow is a SPARQL endpoint meant to be used by a client application, but a human-usable Web app is also provided for testing and monitoring. The Semagrow Web app can be accessed at http://localhost:8080/SemaGrow

Docker image

Semagrow is also available as a Docker image. To download and run the image, execute the following in a terminal:

docker run semagrow/semagrow:latest

Build from sources

Semagrow is an open-source project developed on GitHub. The source repository can be cloned from https://github.com/semagrow/semagrow, and the most recent stable version is available on the master branch. This is always the version from which the Debian and Docker distributions are produced. The current stable version is 1.4.0.

Maven is required in order to build the sources.
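As a rough sketch, cloning and building could look like the following. The function name build_semagrow and the default target directory are illustrative only, and mvn clean install is the generic Maven build invocation; this guide does not say whether the project needs extra profiles or options, so treat the command as an assumption.

```shell
#!/bin/sh
# build_semagrow [DIR]: clone the Semagrow sources into DIR (default: ./semagrow)
# and build them with Maven. Both the function name and the default directory
# are hypothetical; "mvn clean install" is assumed to be sufficient.
build_semagrow() {
    dir=${1:-semagrow}
    git clone https://github.com/semagrow/semagrow "$dir" || return 1
    # Run Maven in a subshell so the caller's working directory is unchanged.
    ( cd "$dir" && mvn clean install )
}

# Usage (requires git, Maven, and network access):
#   build_semagrow /tmp/semagrow-src
```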

Configuration

As a bare minimum, one must declare the remote endpoints that Semagrow federates. These are specified in RDF using the Turtle format. By default, Semagrow looks at /etc/default/semagrow/metadata.ttl to find information about its federation. A metadata.ttl can be as minimal as:

@prefix void: <http://rdfs.org/ns/void#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

_:DatasetRoot rdf:type void:Dataset .

_:Dataset1 rdf:type void:Dataset ;
     void:subset _:DatasetRoot ;
     void:sparqlEndpoint <http://dbpedia.org/sparql> .

_:Dataset2 rdf:type void:Dataset ;
     void:subset _:DatasetRoot ;
     void:sparqlEndpoint <http://data.nobelprize.org/sparql> .

More complex configuration files provide Semagrow with important metadata and statistics about the contents of the federated endpoints. Please consult the configuration page for details on generating such files.
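Before restarting the service, it can be handy to double-check which endpoints a metadata.ttl actually declares. The helper below is a minimal sketch, not part of Semagrow: list_endpoints is a hypothetical name, and the pattern only catches void:sparqlEndpoint statements written with the void: prefix and with the endpoint IRI on the same line, as in the minimal file above. It is a text filter, not a Turtle parser.

```shell
#!/bin/sh
# list_endpoints FILE: print every IRI that follows void:sparqlEndpoint in FILE,
# one per line. Hypothetical helper; assumes the "void:" prefix and that the
# predicate and the <IRI> sit on the same line.
list_endpoints() {
    sed -n 's/.*void:sparqlEndpoint[[:space:]]*<\([^>]*\)>.*/\1/p' "$1"
}

# Usage:
#   list_endpoints /etc/default/semagrow/metadata.ttl
```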

Each time the metadata.ttl file is modified, the Semagrow service must be restarted in order to read in the modified configuration:

service semagrow restart

Usage

For our simple example, we will use the metadata.ttl provided in the Semagrow repository. This metadata.ttl describes the AGRIS endpoint, which serves an agricultural science bibliography.

To try out the Semagrow stack, submit a SPARQL query such as the following, which retrieves the number of images published between 2006 and 2008:

prefix dct: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select (count(?s) as ?num) {
   ?s dct:issued ?a.
   ?s dct:type "Image".
   filter(xsd:integer(?a) > 2005).
   filter(xsd:integer(?a) < 2009).
}

In the Semagrow Web app, press the Execute button to run the query. The results are presented in JSON format.
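Since Semagrow is a SPARQL endpoint, the same query can in principle be sent from the command line using the standard SPARQL protocol. The sketch below assumes curl is installed; run_query is a hypothetical helper, and the endpoint URL in the usage comment (in particular the /sparql path) is an assumption that should be checked against your installation.

```shell
#!/bin/sh
# run_query ENDPOINT QUERY_FILE: POST the SPARQL query stored in QUERY_FILE to
# ENDPOINT and ask for results in SPARQL JSON format.
# Hypothetical helper; the endpoint URL is supplied by the caller.
run_query() {
    # "query@FILE" makes curl read FILE, URL-encode it, and send it as the
    # "query" form field, as the SPARQL protocol expects for POST requests.
    curl -sS \
         -H 'Accept: application/sparql-results+json' \
         --data-urlencode "query@$2" \
         "$1"
}

# Usage (endpoint path is an assumption, not confirmed by this guide):
#   run_query http://localhost:8080/SemaGrow/sparql images.rq
```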

To demonstrate federated querying, we will now add a second dataset to the federation by replacing metadata.ttl with a new configuration file, also available in the Semagrow repository. This new configuration federates AGRIS with a dataset that annotates AGRIS resources with a “clean publication year” property that disambiguates and normalizes publication years. The new dataset does not add publications, but the publication year is guaranteed to be an integer so that the same query yields more results.
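Swapping the configuration file can be sketched as follows. The helper replace_metadata and the .bak backup name are illustrative choices, not something Semagrow provides; the default target path is the one given in the Configuration section. The service still has to be restarted afterwards for the new file to take effect.

```shell
#!/bin/sh
# replace_metadata NEW_FILE [TARGET]: copy NEW_FILE over TARGET, keeping the
# previous TARGET as TARGET.bak. Hypothetical helper; TARGET defaults to the
# location Semagrow reads by default.
replace_metadata() {
    new=$1
    target=${2:-/etc/default/semagrow/metadata.ttl}
    [ -f "$new" ] || { echo "no such file: $new" >&2; return 1; }
    # Keep a copy of the current configuration so it can be restored later.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$new" "$target"
}

# Usage (as root), followed by the restart described above:
#   replace_metadata ./metadata.ttl && service semagrow restart
```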
