WebLicht usage scenarios

The WebLicht infrastructure can be used in many different contexts, and the use cases described here are by no means exhaustive. The purpose of WebLicht is to provide easy access to linguistic tools and a relatively simple means for developers to distribute their linguistic tools. In principle, WebLicht can be used in any situation where a ready-made linguistic tool could be deployed.

WebLicht provides a platform for quickly and reliably annotating arbitrary texts using standard tools. This is a boon for classroom use and for any kind of linguistic research, since it makes full annotation available with little or no preprocessing and no tool setup.

The example below shows how to quickly build a lemmatized and PoS-tagged resource from an arbitrary short text. The task requires no software beyond a standard web browser and takes only minutes to demonstrate and perform, although processing time varies from tool to tool.

  1. Find a short text on the Internet.

  2. Open WebLicht in a browser, and select from the menu: File > New > text/plain.

  3. Copy the text and paste it into the text window, select the language of the text and other options as appropriate, then click the Save button.

  4. The next step is to construct a tool chain. Available tools are always listed in the upper window, labeled Next Choices. Tools can be added to a chain by double-clicking them, or by dragging them from the upper window to the lower one (labeled Current Tool Chain).

  5. Most text processing tools require their input in TCF format, so first select the text to TCF converter.

  6. The Next Choices window now contains tools that can be chained after the TCF converter. Select the tools that will need to run to produce the desired annotated resource. For example, select in sequence the SfS Tokenizer/Sentence Splitter, then the IMS TreeTagger. This produces a valid, complete annotation chain.

  7. Click Run Tools. Processing time depends on the underlying tools and the facilities available to run them. WebLicht displays the progress of the text through the processing chain and signals when processing has finished.

  8. WebLicht has facilities for visualizing the output of the processing chain directly. Click the visualization icon on the last element of the chain to inspect the results.

  9. Click the download icon on the last member of the chain after processing is completed to download the TCF file produced at the end of the processing chain. This file is in XML format with distinct and well-documented tags encompassing all the information produced by the tools in the chain. Further processing and analysis can be performed from this file.

  10. (optional) WebLicht also includes parsers and other common linguistic tools, some of which take a much longer time to run. To fully parse the text, all that is necessary is to add a parser tool to the end of the tool chain and wait for it to finish running.
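The chaining rule in steps 4 to 6, where Next Choices only offers tools that can follow the current chain, can be pictured as matching the annotation layers each tool needs against the layers already produced. The sketch below is a minimal model under that assumption, not WebLicht's implementation: the `Tool` record, the layer names and the generic `parser` entry are hypothetical; only the three tool names come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool description: the annotation layers a tool needs and adds."""
    name: str
    requires: frozenset[str]
    produces: frozenset[str]


# Illustrative metadata only (assumed layer names); real WebLicht tool
# descriptions also cover language, format, tagset and more.
TOOLS = [
    Tool("text to TCF converter", frozenset({"plaintext"}), frozenset({"text"})),
    Tool("SfS Tokenizer/Sentence Splitter", frozenset({"text"}),
         frozenset({"tokens", "sentences"})),
    Tool("IMS TreeTagger", frozenset({"tokens", "sentences"}),
         frozenset({"lemmas", "POStags"})),
    # Hypothetical generic parser standing in for step 10.
    Tool("parser", frozenset({"tokens", "sentences", "POStags"}), frozenset({"parsing"})),
]


def next_choices(available: set[str], tools: list[Tool]) -> list[str]:
    """Tools whose inputs are all present and that would add at least one new layer."""
    return [t.name for t in tools
            if t.requires <= available and not t.produces <= available]


def run_chain(chain: list[str], tools: list[Tool], start: set[str]) -> set[str]:
    """Check each step of a chain in order and return the layers present at the end."""
    by_name = {t.name: t for t in tools}
    layers = set(start)
    for name in chain:
        tool = by_name[name]
        missing = tool.requires - layers
        if missing:
            raise ValueError(f"{name} is missing input layers: {sorted(missing)}")
        layers |= tool.produces
    return layers


if __name__ == "__main__":
    print(next_choices({"plaintext"}, TOOLS))
    chain = ["text to TCF converter", "SfS Tokenizer/Sentence Splitter", "IMS TreeTagger"]
    print(sorted(run_chain(chain, TOOLS, {"plaintext"})))
```

Starting from plain text, only the converter is offered; once the tokenizer and tagger have run, the hypothetical parser becomes available, mirroring how step 10 extends an existing chain.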

WebLicht can also include statistical analysis as part of a tool chain. All that is required is to add an analysis tool to the chain. For example, the tool chain from the example above can include a lemma count tool.
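A lemma count of this kind can also be computed offline from the TCF file downloaded in step 9. The Python sketch below assumes a TCF layout in which lemmas appear as `<lemma>` elements inside a `<lemmas>` layer; the sample document is hand-written for illustration, not real WebLicht output, and the namespace URIs should be checked against the TCF documentation. Matching on local tag names keeps the code independent of the exact namespaces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written, TCF-like sample for illustration (not real WebLicht output).
SAMPLE_TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="en">
    <text>Dogs bark. A dog sleeps.</text>
    <tokens>
      <token ID="t_0">Dogs</token>
      <token ID="t_1">bark</token>
      <token ID="t_2">.</token>
      <token ID="t_3">A</token>
      <token ID="t_4">dog</token>
      <token ID="t_5">sleeps</token>
      <token ID="t_6">.</token>
    </tokens>
    <lemmas>
      <lemma tokenIDs="t_0">dog</lemma>
      <lemma tokenIDs="t_1">bark</lemma>
      <lemma tokenIDs="t_2">.</lemma>
      <lemma tokenIDs="t_3">a</lemma>
      <lemma tokenIDs="t_4">dog</lemma>
      <lemma tokenIDs="t_5">sleep</lemma>
      <lemma tokenIDs="t_6">.</lemma>
    </lemmas>
  </TextCorpus>
</D-Spin>"""


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def lemma_counts(tcf_xml: str) -> Counter:
    """Count the lemma strings found in the <lemmas> layer of a TCF document."""
    root = ET.fromstring(tcf_xml)
    counts = Counter()
    for layer in root.iter():
        if local(layer.tag) == "lemmas":
            # Only direct <lemma> children of the layer are counted.
            counts.update(el.text for el in layer if local(el.tag) == "lemma")
    return counts


if __name__ == "__main__":
    for lemma, n in lemma_counts(SAMPLE_TCF).most_common():
        print(f"{lemma}\t{n}")
```

Because "Dogs" and "dog" share the lemma "dog", the count merges them, which is the point of counting lemmas rather than word forms.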

WebLicht also has tools for extracting PoS distributions and displaying histograms and other statistical visualizations, and a variety of further statistical analysis tools are under development.
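As an illustration of what a PoS distribution involves, the sketch below counts tags in a TCF document and renders them as a plain-text histogram. It assumes tags appear as `<tag>` children of a `<POStags>` layer; the sample is hand-written with Penn-style tags, other layers are omitted, and the `text_histogram` helper is hypothetical rather than part of WebLicht.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written, TCF-like sample; only the POStags layer is included.
SAMPLE_TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="en">
    <POStags tagset="penn">
      <tag tokenIDs="t_0">NNS</tag>
      <tag tokenIDs="t_1">VBP</tag>
      <tag tokenIDs="t_2">SENT</tag>
      <tag tokenIDs="t_3">DT</tag>
      <tag tokenIDs="t_4">NN</tag>
      <tag tokenIDs="t_5">VBZ</tag>
      <tag tokenIDs="t_6">SENT</tag>
    </POStags>
  </TextCorpus>
</D-Spin>"""


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def pos_distribution(tcf_xml: str) -> Counter:
    """Count PoS tags, looking only inside <POStags> so other <tag> elements are ignored."""
    root = ET.fromstring(tcf_xml)
    counts = Counter()
    for layer in root.iter():
        if local(layer.tag) == "POStags":
            counts.update(el.text for el in layer if local(el.tag) == "tag")
    return counts


def text_histogram(counts: Counter, width: int = 20) -> str:
    """Draw one bar per tag, scaled so the most frequent tag gets `width` characters."""
    top = max(counts.values())
    lines = []
    for tag, n in counts.most_common():
        bar = "#" * round(width * n / top)
        lines.append(f"{tag:<6} {bar} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_histogram(pos_distribution(SAMPLE_TCF)))
```

Restricting the search to the `<POStags>` layer matters because other TCF layers may reuse generic element names such as `tag`.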

Another application for WebLicht is transforming data for more complex visualizations. Geovisualization extracts place names from annotated texts and displays them on a map. Figure 8.15, “Viewing placenames from a text on a map”, shows a geovisualization of a German newspaper article that WebLicht processed in a few seconds.
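The data transformation step behind such a map can be sketched as: take location entities from the annotated text and turn them into map points. The Python sketch below assumes named entities appear in TCF as `<entity class="LOC" tokenIDs="...">` elements and emits GeoJSON, a common input format for web maps. The sample document, the `GAZETTEER` table with approximate coordinates and both helper functions are hypothetical; WebLicht's own geovisualization may resolve places quite differently.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written, TCF-like sample with an assumed namedEntities layer.
SAMPLE_TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
    <tokens>
      <token ID="t_0">Die</token>
      <token ID="t_1">Kanzlerin</token>
      <token ID="t_2">reiste</token>
      <token ID="t_3">von</token>
      <token ID="t_4">Berlin</token>
      <token ID="t_5">nach</token>
      <token ID="t_6">Tübingen</token>
      <token ID="t_7">.</token>
    </tokens>
    <namedEntities type="CoNLL2002">
      <entity class="LOC" tokenIDs="t_4"/>
      <entity class="LOC" tokenIDs="t_6"/>
    </namedEntities>
  </TextCorpus>
</D-Spin>"""

# Hypothetical mini-gazetteer (approximate latitude, longitude); a real
# geovisualization would query a proper gazetteer instead.
GAZETTEER = {"Berlin": (52.52, 13.405), "Tübingen": (48.52, 9.06)}


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def place_names(tcf_xml: str) -> list[str]:
    """Return the surface text of every LOC entity, in document order."""
    root = ET.fromstring(tcf_xml)
    tokens = {el.get("ID"): el.text for el in root.iter() if local(el.tag) == "token"}
    names = []
    for el in root.iter():
        if local(el.tag) == "entity" and el.get("class") == "LOC":
            # Multi-word names span several token IDs, so join their tokens.
            names.append(" ".join(tokens[i] for i in el.get("tokenIDs").split()))
    return names


def to_geojson(names: list[str], gazetteer: dict) -> dict:
    """Build a GeoJSON FeatureCollection with one Point per known place name."""
    features = []
    for name in names:
        if name not in gazetteer:
            continue  # unknown places are skipped rather than guessed
        lat, lon = gazetteer[name]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON uses lon, lat
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    geo = to_geojson(place_names(SAMPLE_TCF), GAZETTEER)
    print(json.dumps(geo, ensure_ascii=False, indent=2))
```

Keeping extraction and geocoding in separate functions means the same place-name list could feed a different gazetteer or map library without touching the TCF parsing.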