WebLicht usage scenarios

The WebLicht infrastructure can be used in many different contexts, and the use cases described here by no means exhaust the possibilities. WebLicht is intended to give users easy access to linguistic tools and to give developers a relatively simple means of distributing their own tools. In any circumstance where a ready-made linguistic tool could be deployed, WebLicht is, at least in principle, an option.

Quick annotation

WebLicht provides a platform for quickly and reliably annotating arbitrary texts using standard tools. This is a boon for classroom use and for any kind of linguistic research, since it makes full annotation available quickly with little or no preprocessing and no tool preparation.

The example below describes the quick construction of a lemmatized and PoS tagged resource from an arbitrary short text. This task requires no software installation beyond a standard Internet browser, and takes minutes to demonstrate and perform. Tool processing time varies, depending on the tool itself.

  1. Find a short text on the Internet.

    Figure 8.3. A short text from a news website


  2. Open WebLicht in a browser, and select from the menu: File > New > text/plain.

    Figure 8.4. WebLicht start-up


  3. Copy the text from the article and paste it into the window, select the language of the text and other options as appropriate, then click the Save button.

    Figure 8.5. Inputting a short text


  4. The next step is to construct a tool chain. Available tools are always in the upper window, labeled Next Choices. Tools can be added to a chain by double-clicking them, or by dragging them from the upper window to the lower (labeled Current Tool Chain).

    Figure 8.6. Constructing a tool chain


  5. Most text processing tools require their input in TCF, so first select the text-to-TCF converter.

    Figure 8.7. Adding a TCF converter to a tool chain


  6. The Next Choices window now contains the tools that can be chained after the TCF converter. Select the tools needed to produce the desired annotated resource. For example, select the SfS Tokenizer/Sentence Splitter and then the IMS TreeTagger, which performs both lemmatization and PoS tagging. This produces a valid, complete annotation chain.

    Figure 8.8. Adding a tokenizer, sentence splitter, lemmatizer and tagger to a tool chain


  7. Click Run Tools. Processing time depends on the underlying tools and the facilities available to run them. WebLicht displays the current state of the text in the processing chain and signals when the chain has finished running.

    Figure 8.9. Running the tool chain on the text


  8. WebLicht has facilities for visualizing the output of the processing chain directly. Click the visualization icon on the last element of the chain to inspect the results.

    Figure 8.10. Visualization



  9. Click the download icon on the last member of the chain after processing is completed to download the TCF file produced at the end of the processing chain. This file is in XML format with distinct and well-documented tags encompassing all the information produced by the tools in the chain. Further processing and analysis can be performed from this file.

    Figure 8.11. Downloading the results



  10. (optional) WebLicht also includes parsers and other common linguistic tools, some of which take much longer to run. To fully parse the text, add a parser to the end of the tool chain and wait for it to finish running.

    Figure 8.12. Adding a parser to the tool chain
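The TCF file downloaded in step 9 can be processed further with a short script. The following Python sketch shows one way to line up tokens with their lemmas and PoS tags. The sample document is hypothetical and heavily trimmed: the element names (tokens, lemmas, POStags), the ID and tokenIDs attributes, and the namespace URIs are assumptions made for illustration and should be checked against the TCF documentation and an actual download. The helper functions are likewise illustrative and not part of WebLicht.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed TCF-like document. Element names, attribute names and
# namespace URIs are assumptions for illustration, not a real WebLicht download.
SAMPLE_TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
    <tokens>
      <token ID="t1">Die</token>
      <token ID="t2">Kinder</token>
      <token ID="t3">spielen</token>
    </tokens>
    <lemmas>
      <lemma ID="l1" tokenIDs="t1">die</lemma>
      <lemma ID="l2" tokenIDs="t2">Kind</lemma>
      <lemma ID="l3" tokenIDs="t3">spielen</lemma>
    </lemmas>
    <POStags tagset="stts">
      <tag tokenIDs="t1">ART</tag>
      <tag tokenIDs="t2">NN</tag>
      <tag tokenIDs="t3">VVFIN</tag>
    </POStags>
  </TextCorpus>
</D-Spin>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so matching does not depend on the URI."""
    return tag.rsplit("}", 1)[-1]


def find_layer(root, name):
    """Return the first element whose local name equals `name`, or None."""
    for elem in root.iter():
        if local_name(elem.tag) == name:
            return elem
    return None


def layer_by_token(root, layer_name):
    """Map each referenced token ID to the text of the annotation covering it."""
    layer = find_layer(root, layer_name)
    values = {}
    if layer is None:
        return values
    for item in layer:
        for token_id in item.get("tokenIDs", "").split():
            values[token_id] = item.text
    return values


def read_annotations(tcf_xml):
    """Return (token, lemma, tag) triples in document order."""
    root = ET.fromstring(tcf_xml)
    token_layer = find_layer(root, "tokens")
    lemmas = layer_by_token(root, "lemmas")
    tags = layer_by_token(root, "POStags")
    rows = []
    for token in (token_layer if token_layer is not None else []):
        token_id = token.get("ID")
        # A missing lemma or tag shows up as None rather than raising an error.
        rows.append((token.text, lemmas.get(token_id), tags.get(token_id)))
    return rows


rows = read_annotations(SAMPLE_TCF)
for token, lemma, tag in rows:
    print(f"{token}\t{lemma}\t{tag}")
```

Matching on local element names keeps the sketch independent of the exact namespace URIs. For a real file, replace SAMPLE_TCF with the contents of the download.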


Statistics

WebLicht can also incorporate statistical analysis into the tool chain. All that is required is to add an analysis tool to the chain. For example, the tool chain from the previous section can be extended with a lemma count tool.

  1. Add the Lemma Frequency Tool to a processing chain that includes a lemmatizer.

    Figure 8.13. Adding a lemma frequency tool to the chain

  2. Once processing is completed, the ordered list of lemmas and frequencies is available for viewing and downloading.

    Figure 8.14. Viewing word frequency data in a table
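The kind of table produced in step 2 can be illustrated with a few lines of Python. This is a minimal sketch of a lemma count, not the implementation of the Lemma Frequency Tool: the function name, the sample lemma list, and the ordering (descending frequency, ties broken alphabetically) are assumptions made for illustration.

```python
from collections import Counter


def lemma_frequencies(lemmas):
    """Return (lemma, count) pairs, most frequent first.

    Ties are broken by plain string comparison of the lemma; this ordering
    is an assumption, not necessarily what WebLicht does.
    """
    counts = Counter(lemmas)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical lemma sequence, such as a lemmatizer might produce.
lemmas = ["die", "Kind", "spielen", "und", "die", "Hund", "spielen", "die"]

table = lemma_frequencies(lemmas)
for lemma, count in table:
    print(f"{lemma}\t{count}")
```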

WebLicht also has tools for extracting PoS distributions and for displaying histograms and other statistical visualizations. A variety of further statistical analysis tools are in development.

Geovisualization

Another application for WebLicht is data transformation for more complex visualization. Geovisualization extracts placenames from annotated texts and displays them on a map. Figure 8.15, “Viewing placenames from a text on a map”, shows a geovisualization of a German newspaper article, processed in a few seconds using WebLicht.

Figure 8.15. Viewing placenames from a text on a map