What is Stanbol?
Stanbol is an application that, in simple terms, lets you extract additional information from a document.
For example, if you pass a text about the Roman emperors through Stanbol, it will detect the language of the text (e.g. English) and the entities mentioned in it (Rome, Caesar, Italy, Nero, etc.). By entities we mean things such as places and people.
Setting up Stanbol
There are many ways to start using Stanbol; here are three of them:
GitHub
The most traditional way is to download the source code with Git and then compile it with Maven:
git clone https://github.com/apache/stanbol.git
cd stanbol
mvn install -Dmaven.test.skip=true
The -Dmaven.test.skip=true option skips compiling and running the tests. See the README for the other build options.
After the build, the ./launchers/ folder contains several packages.
One of them is a standalone version of Stanbol, which you can start with:
java -jar ./launchers/full/target/org.apache.stanbol.launchers.full-1.0.0-SNAPSHOT.jar
This starts a Stanbol server on port 8080.
The same folder also contains a WAR version of Stanbol that you can deploy in a Java EE servlet container such as Tomcat.
Precompiled packages
You can also download already compiled packages from the IKS project server (dev.iks-project.eu):
The jar version:
wget http://dev.iks-project.eu/downloads/stanbol-launchers/1.0.0/org.apache.stanbol.launchers.stable-1.0.0-SNAPSHOT.jar
The war version:
wget http://dev.iks-project.eu/downloads/stanbol-launchers/0.12.1/stanbol-0.12.1-SNAPSHOT.war
Docker
The third option is to use Docker to run Stanbol in a container:
sudo docker run -i --rm -p 8080:8080 --name stanbol -t mxr576/stanbol
This command starts a Stanbol server instance on port 8080. Because of the --rm option, the container is removed when you stop it.
This is arguably the easiest and quickest way to start using Stanbol.
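Whichever option you choose, Stanbol may need some time to finish starting up before it answers requests. The following Bash sketch polls the server until it responds. The function name wait_for_stanbol is hypothetical. The script assumes that the /enhancer page on port 8080 returns HTTP 200 once the server is ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll Stanbol until it answers with HTTP 200,
# or give up after a timeout (in seconds).
# Assumption: a GET on the /enhancer page returns 200 once Stanbol is ready.
wait_for_stanbol() {
  local url="${1:-http://localhost:8080/enhancer}"
  local timeout="${2:-300}"
  local waited=0
  local status
  while true; do
    # -s: silent, -L: follow redirects, -o /dev/null: discard the body,
    # -w: print only the status code (curl prints 000 if it cannot connect).
    # "|| true" keeps the loop going when curl itself fails.
    status=$(curl -s -L --max-time 10 -o /dev/null -w '%{http_code}' "$url") || true
    if [ "$status" = "200" ]; then
      echo "Stanbol is up at $url"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "No answer from $url after ${timeout}s (last status: ${status:-none})" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For example, after starting the server you could run wait_for_stanbol http://localhost:8080/enhancer 600 to wait up to ten minutes before sending the first request.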
Using the web interface
If you have a local instance of Stanbol running on your machine, the web interface is a good way to start understanding how Stanbol enhances content and extracts semantic data from your documents. Browse this page:
There you can enter any text and extract the entities related to it.
For example, for the text:
Argentina is a big country in South America
Stanbol detects two entities of type “place”, “Argentina” and “South America”, and identifies the language of the text as English.
You can get the same information from the command line with curl:
curl -X POST -H "Content-type: text/plain" --data "Argentina is a big country in South America." \
http://localhost:8080/enhancer
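The enhancer returns its results as RDF. To choose a particular serialization, you can add an HTTP Accept header to the same request. The Stanbol enhancer documentation shows formats such as text/turtle and application/rdf+xml, but check which formats your version supports. The helper below is a hypothetical Bash wrapper around the curl call above. Its name (enhance_text) and its default format are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send plain text to the Stanbol enhancer and ask for
# the results in a given RDF format through the Accept header.
# Usage: enhance_text TEXT [FORMAT] [URL]
enhance_text() {
  local text="$1"
  local format="${2:-text/turtle}"   # assumed to be supported by your Stanbol version
  local url="${3:-http://localhost:8080/enhancer}"
  # -f: return a non-zero exit code on HTTP errors (4xx/5xx) instead of
  #     printing the error page as if it were a result
  curl -s -f -X POST \
    -H "Content-type: text/plain" \
    -H "Accept: ${format}" \
    --data "$text" \
    "$url"
}
```

For example, enhance_text "Argentina is a big country in South America." text/turtle > argentina.ttl saves the Turtle output to a file.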
Instead of sending the text inline, you can also upload a whole file to Stanbol with curl.
For example, if you have a file “text.txt” in your current folder, this command uploads it and returns the enhancement results:
curl -X POST -H "Content-type: text/plain" -T text.txt \
http://localhost:8080/enhancer
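To enhance many files at once, you can run the same upload in a loop. This Bash sketch is hypothetical: the function name enhance_folder and the .ttl output naming are illustrative. It assumes your documents are plain-text files with a .txt extension, and it asks for Turtle output through the Accept header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload every *.txt file in a folder to the enhancer
# and store each result as <name>.ttl next to the original file.
# Succeeds only if every file was enhanced.
enhance_folder() {
  local dir="${1:-.}"
  local url="${2:-http://localhost:8080/enhancer}"
  local failed=0
  local f
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue   # no .txt files: the glob stays unexpanded
    if curl -s -f -X POST \
        -H "Content-type: text/plain" \
        -H "Accept: text/turtle" \
        -T "$f" \
        "$url" > "${f%.txt}.ttl"; then
      echo "enhanced: $f"
    else
      echo "failed:   $f" >&2
      rm -f "${f%.txt}.ttl"   # do not keep empty or partial output
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Because curl runs with -f, HTTP errors count as failures. The function lists the files that failed and returns a non-zero status if any did, so you can use it in larger scripts.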