Integrating existing linguistic tools into WebLicht

This section is addressed to tool providers who want to integrate their tools into WebLicht. Time and resources permitting, CLARIN-D can provide advice and assistance through the CLARIN-D technical helpdesk and the individual centres. More detailed information about the technical specifications for WebLicht services, as well as tutorials for creating WebLicht tools, is available through the WebLicht website.

[Important] REST-style architecture

Before integration into WebLicht, linguistic tools and resources must have a standard web service interface. Web service standards are defined by the W3C, which considers a web service to be a software system designed to support interoperable machine-to-machine interaction over a network.

Web services can be implemented in several ways, following a number of established standards and best practices. WebLicht uses the REST-style architecture [Fielding 2000], which is well suited to highly scalable systems of independent software components, such as large collections of independently authored linguistic tools. Every tool integrated into WebLicht must be implemented as a REST-style web service.
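As an illustration of what a REST-style interface amounts to in practice, the following minimal sketch exposes a toy tokenizer as a resource that accepts plain text via HTTP POST and returns one token per line, together with a small client that calls it. It uses only the Python standard library for brevity; the resource path `/tokenize`, the plain-text exchange format and the naive whitespace tokenizer are assumptions made for this example. A real WebLicht service would exchange TCF documents rather than plain text.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class TokenizerResource(BaseHTTPRequestHandler):
    """Toy REST-style resource: POST plain text to /tokenize, get one token per line back."""

    def do_POST(self):
        # Read the request body first so the connection is left in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        if self.path != "/tokenize":  # hypothetical resource path
            self.send_error(404, "Unknown resource")
            return
        body = "\n".join(text.split()).encode("utf-8")  # naive whitespace tokenizer
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging


def start_in_background(port=0):
    """Serve on 127.0.0.1 in a daemon thread (port 0 = any free port); return (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), TokenizerResource)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


def post_text(url, text):
    """Client side: send text with HTTP POST and return the response body."""
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    server, base_url = start_in_background()
    print(post_text(base_url + "/tokenize", "Karin fliegt nach New York ."))
    server.shutdown()
```

Because the service keeps no state between requests, several instances can run side by side behind a load balancer, which is one reason the REST style scales well.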

Rewriting linguistic tools as web services may be time-consuming and complicated and, for tools with intellectual property restrictions, impossible. It is therefore common practice to construct a wrapper around an existing tool. A web service wrapper is a program that is implemented as a web service and invokes the existing tool in response to users' input. The wrapper often has to convert user input from the formats provided by the web service system into the formats expected by the tool, and then convert the tool's output into the format expected by the web service or the user (see Figure 8.16, “Web service wrapper”).
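The wrapper pattern can be sketched as follows, assuming an existing command-line tagger that reads one token per line on standard input and writes `token<TAB>tag` lines to standard output. Since no real tool is assumed to be installed, a small inline stand-in program plays the role of the existing tool; the stand-in, its tags `CARD` and `XY`, and the helper `run_tool` are hypothetical and only serve to show the two conversion steps around the unchanged tool invocation.

```python
import subprocess
import sys

# Stand-in for an existing command-line tagger (hypothetical): it reads one token per
# line from stdin and writes "token<TAB>tag" lines to stdout. A real wrapper would run
# the installed tool instead of this inline program.
STAND_IN_TOOL_SOURCE = r"""
import sys
for line in sys.stdin:
    token = line.rstrip("\n")
    print(token + "\t" + ("CARD" if token.isdigit() else "XY"))
"""
STAND_IN_TOOL = [sys.executable, "-c", STAND_IN_TOOL_SOURCE]


def run_tool(tokens, command=STAND_IN_TOOL):
    """Wrap the tool: convert service input, invoke the tool, convert its output back."""
    # 1. Web service format (a list of tokens) -> tool format (one token per line).
    tool_input = "".join(token + "\n" for token in tokens)
    # 2. Invoke the existing tool unchanged; check=True raises on a non-zero exit code.
    result = subprocess.run(
        command, input=tool_input, capture_output=True, text=True, check=True
    )
    # 3. Tool format ("token<TAB>tag" lines) -> web service format (list of pairs).
    return [tuple(line.split("\t")) for line in result.stdout.splitlines()]


if __name__ == "__main__":
    print(run_tool(["Karin", "kauft", "3", "Bananen"]))
```

In a deployed wrapper, `run_tool` would be called from the web service's request handler, and `command` would point at the installed program instead of the stand-in.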

Figure 8.16. Web service wrapper

There are no fixed limitations on the programming languages used for WebLicht tools. The only strict technical requirements for WebLicht integration are

  1. interfaces that follow the requirements for a RESTful architecture, and

  2. metadata descriptions in the CMDI format (see the section called “The Component Metadata Initiative (CMDI)”).
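To give an impression of the metadata requirement, the sketch below assembles the generic CMDI 1.1 envelope (header, resource proxies and a components section) with Python's `xml.etree.ElementTree`. The content of the `Components` element is defined by the CMDI profile chosen for the service; the element names `WebServiceDescription` and `Name`, the creator, the date, the PID `hdl:0000/placeholder`, the profile identifier and the service URL used here are hypothetical placeholders, not the actual WebLicht profile. In practice, an existing CMDI file for a comparable web service is the better starting point.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/"
ET.register_namespace("", CMD_NS)  # serialize CMD elements without a prefix


def q(tag):
    """Qualify a tag name with the CMD namespace, e.g. 'Header' -> '{http://...}Header'."""
    return f"{{{CMD_NS}}}{tag}"


def cmdi_record(creator, date, self_link, profile_id, service_url, component):
    """Build a CMDI 1.1 record; `component` is the root element of the chosen profile."""
    cmd = ET.Element(q("CMD"), {"CMDVersion": "1.1"})

    header = ET.SubElement(cmd, q("Header"))
    for tag, value in [("MdCreator", creator), ("MdCreationDate", date),
                       ("MdSelfLink", self_link), ("MdProfile", profile_id)]:
        ET.SubElement(header, q(tag)).text = value

    resources = ET.SubElement(cmd, q("Resources"))
    proxy_list = ET.SubElement(resources, q("ResourceProxyList"))
    proxy = ET.SubElement(proxy_list, q("ResourceProxy"), {"id": "service"})
    ET.SubElement(proxy, q("ResourceType")).text = "Resource"
    ET.SubElement(proxy, q("ResourceRef")).text = service_url
    ET.SubElement(resources, q("JournalFileProxyList"))  # part of the envelope; may stay empty
    ET.SubElement(resources, q("ResourceRelationList"))  # part of the envelope; may stay empty

    ET.SubElement(cmd, q("Components")).append(component)
    return ET.tostring(cmd, encoding="unicode")


def example_record():
    # Hypothetical profile content; the real WebLicht web service profile differs.
    component = ET.Element(q("WebServiceDescription"))
    ET.SubElement(component, q("Name")).text = "Toy tokenizer"
    return cmdi_record(
        creator="Jane Doe",
        date="2013-01-01",
        self_link="hdl:0000/placeholder",          # placeholder for the record's PID
        profile_id="clarin.eu:cr1:p_placeholder",  # hypothetical profile ID
        service_url="http://example.org/tokenize",  # hypothetical service URL
        component=component,
    )


if __name__ == "__main__":
    print(example_record())
```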

Although this is not strictly required, it is very strongly preferred that all web services available through WebLicht use the TCF format described in the section called “Interoperability and the Text Corpus Format”. Devising a wrapper for a new tool should therefore generally mean devising a robust data converter between TCF and that tool's usual processing format.
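A converter of this kind can be sketched in two functions: one reads the tokens layer of a TCF document and produces the tool's input, the other writes the tool's answers back as a new annotation layer. The sketch assumes the TCF 0.4 layout (root element `D-Spin`, a `TextCorpus` element in the namespace `http://www.dspin.de/data/textcorpus`, and `tokens` and `POStags` layers linked through token IDs); the sample document, the STTS tags and the `tagset` value are illustrative, and element and attribute names should be checked against the TCF specification before use.

```python
import xml.etree.ElementTree as ET

DATA_NS = "http://www.dspin.de/data"
META_NS = "http://www.dspin.de/data/metadata"
TC_NS = "http://www.dspin.de/data/textcorpus"
for prefix, uri in [("", DATA_NS), ("md", META_NS), ("tc", TC_NS)]:
    ET.register_namespace(prefix, uri)  # readable prefixes when re-serializing

SAMPLE_TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <MetaData xmlns="http://www.dspin.de/data/metadata"/>
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
    <text>Karin fliegt.</text>
    <tokens>
      <token ID="t_0">Karin</token>
      <token ID="t_1">fliegt</token>
      <token ID="t_2">.</token>
    </tokens>
  </TextCorpus>
</D-Spin>"""


def tc(tag):
    """Qualify a tag name with the TCF text corpus namespace."""
    return f"{{{TC_NS}}}{tag}"


def tcf_to_tool_input(tcf_xml):
    """TCF -> tool format: return the token IDs and a one-token-per-line string."""
    root = ET.fromstring(tcf_xml)
    tokens = root.find(f"{tc('TextCorpus')}/{tc('tokens')}").findall(tc("token"))
    return [t.get("ID") for t in tokens], "".join(t.text + "\n" for t in tokens)


def add_pos_layer(tcf_xml, token_ids, tool_output, tagset="STTS"):
    """Tool format -> TCF: add a POStags layer, one tag line per token, linked by token ID."""
    tags = tool_output.splitlines()
    if len(tags) != len(token_ids):  # refuse to guess if tool output and tokens diverge
        raise ValueError("tool returned a different number of tags than tokens")
    root = ET.fromstring(tcf_xml)
    layer = ET.SubElement(root.find(tc("TextCorpus")), tc("POStags"), {"tagset": tagset})
    for token_id, tag in zip(token_ids, tags):
        ET.SubElement(layer, tc("tag"), {"tokenIDs": token_id}).text = tag
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    ids, tool_input = tcf_to_tool_input(SAMPLE_TCF)
    tool_output = "NE\nVVFIN\n$.\n"  # what a hypothetical STTS tagger might return
    print(add_pos_layer(SAMPLE_TCF, ids, tool_output))
```

Linking the new tags to token IDs rather than copying the text lets later tools in a processing chain add further layers without touching the existing ones. ElementTree writes the namespaces back with prefixes, which yields equivalent XML.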

Tool integration into WebLicht can be accomplished with any programming language and software development framework. As a matter of good practice, however, the Java EE programming environment (particularly version 6 and above) and the Apache Tomcat web application server are recommended for building web services for WebLicht.

[Important] Checklist for WebLicht integration

The following steps have to be performed in order to successfully integrate a tool or resource into WebLicht:

  • Implement the tool as a REST-style web service, or embed it in a wrapper that acts as a REST-style web service, and make it accessible via standard internet protocols.

  • Tools that process textual information should use the TCF format for input and output whenever possible.

  • Describe the web service in the CMDI metadata format and assign a persistent identifier (PID) to it. Examples of CMDI files can be found in any CLARIN-D repository that hosts web services.

  • The CMDI file has to be stored at one of the CLARIN-D centre repositories.

  • Decide whether you want to host your tool or web service at your site or whether you want to draw on the hosting facilities of CLARIN-D. Creator-hosted resources can still be part of WebLicht, as long as server availability is high and the hosting system is able to handle a sufficiently large number of simultaneous calls.
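Whether a creator-hosted service copes with simultaneous calls can be roughly checked before integration. The following sketch sends a number of POST requests at the same time and reports how many failed and how long the slowest one took. The concurrency level of 20, the timeout of 10 seconds and the local echo service used for the demonstration are arbitrary assumptions; this is a quick sanity check, not a replacement for proper load testing or availability monitoring.

```python
import http.client
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def call_once(url, payload, timeout=10.0):
    """Send one POST request; return (succeeded, seconds elapsed)."""
    start = time.perf_counter()
    try:
        request = urllib.request.Request(url, data=payload, method="POST")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
            ok = response.status == 200
    except (OSError, http.client.HTTPException):  # refused, reset, timeout, HTTP error
        ok = False
    return ok, time.perf_counter() - start


def smoke_test(url, payload, concurrent_calls=20):
    """Fire `concurrent_calls` requests at once; return (number failed, slowest seconds)."""
    with ThreadPoolExecutor(max_workers=concurrent_calls) as pool:
        results = list(pool.map(lambda _: call_once(url, payload), range(concurrent_calls)))
    failed = sum(1 for ok, _ in results if not ok)
    return failed, max(seconds for _, seconds in results)


class EchoService(BaseHTTPRequestHandler):
    """Local stand-in for the service under test; echoes the request body."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


class EchoServer(ThreadingHTTPServer):
    request_queue_size = 64  # listen backlog; the socketserver default of 5 is small


def start_echo_service():
    server = EchoServer(("127.0.0.1", 0), EchoService)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


if __name__ == "__main__":
    server, url = start_echo_service()  # point url at the real service endpoint instead
    failed, slowest = smoke_test(url, b"Karin fliegt nach New York .")
    print(f"{failed} calls failed; slowest call took {slowest:.3f} s")
    server.shutdown()
```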