Multimedia tools

Multimedia tools cover a wide variety of programs – mostly not specialized for linguistics – used to store, annotate, search and edit video and audio data. We will discuss only a few tools designed for use in linguistics:


ELAN is a tool for the creation of complex annotations in video and audio resources. With ELAN, a user can add an unlimited number of free-form annotations to audio and/or video data. An annotation can be, for instance, a sentence, word or morpheme, or a gloss, a comment, a translation, or a tag or description of any feature observed in the media. Annotations can be created on multiple layers, called tiers, that can be hierarchically interconnected and can correspond to different levels of linguistic analysis. Users can also mark and annotate gestures.

Annotations are always created and stored in files that are separate from the multimedia files. The textual content of annotations is always represented in Unicode, and the annotation files use an ELAN-specific XML format. Annotations can be imported from and exported to a variety of other formats, including Shoebox/Toolbox, CHAT, Transcriber (import only), Praat, and comma- or tab-delimited text files. Export is also possible to interlinear text, HTML, SMIL and subtitle text.
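Because the annotations live in their own XML file, they can be read without touching the media. The following is a minimal sketch using Python's standard library. It assumes the element and attribute names commonly documented for ELAN annotation files (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE); the sample content and the function name read_aligned_annotations are invented for illustration, and annotations on dependent tiers that refer to other annotations rather than to time slots are ignored.

```python
import xml.etree.ElementTree as ET

# Tiny, made-up annotation document; real files carry more header
# and tier-type information than shown here.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello there</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="gesture">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
                            TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>wave</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_aligned_annotations(eaf_text):
    """Return {tier id: [(start_ms, end_ms, text), ...]} for time-aligned annotations."""
    root = ET.fromstring(eaf_text)

    # Map each time slot id to its time value in milliseconds.
    # Slots without a TIME_VALUE (unaligned slots) are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }

    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    for tier_id, rows in read_aligned_annotations(SAMPLE_EAF).items():
        print(tier_id, rows)
```

Each tier is returned separately, which mirrors the way ELAN keeps different levels of analysis on different layers.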


Lexus is a web-based tool for creating and editing multimedia lexical databases. A lexical entry in Lexus can describe different linguistic aspects of a word, such as its part of speech, along with dictionary-style information such as examples, as well as encyclopedic and ethnographic information. Lexical entries can also contain images, sounds and video files, either to illustrate meanings or to give examples of how the word is used.

Lexus-built lexica make it possible to link linguistic and cultural concepts together in a way which conventional electronic language resources cannot easily manage. In addition, Lexus supports structural linguistic dependencies between words at all levels of analysis.

Lexus is of primary interest for language documentation projects, since it makes it possible to create not just a digital dictionary or thesaurus but an entire multimedia encyclopedic lexicon. Its web-based interface also supports collaborative work from different locations.


WaveSurfer is a tool for manually annotating sound files. It provides different visualizations of audio data – waveform or spectrogram display – and can calculate and display pitch contours and formants. It supports different file formats for import and export, including WAVES and TIMIT. It is also possible to specify a tagset for annotation labels. Tagsets are saved in human-readable form and can be modified manually with a simple text editor. Annotation labels can be queried, and the labelled segments can be played back to the user.
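As an illustration of what a label file and a tagset check involve, here is a minimal sketch that reads a TIMIT-style label file, in which each line holds a start sample, an end sample and a label. This is not WaveSurfer's own code: the function names, the sample data, the tagset and the 16 kHz sampling rate are assumptions made for the example.

```python
# Made-up label data in the TIMIT style: start sample, end sample, label.
SAMPLE_LABELS = """0 4000 h#
4000 6000 sh
6000 10000 iy
"""


def parse_timit_labels(text, sample_rate=16000):
    """Return a list of (start_seconds, end_seconds, label) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, label = parts
        segments.append((int(start) / sample_rate, int(end) / sample_rate, label))
    return segments


def find_label(segments, label):
    """Return the (start, end) times of every segment carrying the given label."""
    return [(start, end) for start, end, seg_label in segments if seg_label == label]


def unknown_labels(segments, tagset):
    """Return, in sorted order, the labels that are not part of the tagset."""
    return sorted({label for _, _, label in segments if label not in tagset})


if __name__ == "__main__":
    segments = parse_timit_labels(SAMPLE_LABELS)
    print(find_label(segments, "sh"))
    print(unknown_labels(segments, {"h#", "sh"}))
```

Querying a label returns the time spans that a tool would then play back. The tagset check shows why a plain-text tagset is convenient: it can be loaded into a set and compared against the labels actually used.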

WaveSurfer does not support hierarchical annotations. It does not produce automatic transcriptions either, but the results of automatic transcription processes can be imported in one of the supported formats and then edited in the tool.


EXMARaLDA (Extensible Markup Language for Discourse Annotation) is a system of data formats and tools for the computer-assisted transcription and annotation of spoken language, and for the construction and analysis of spoken language corpora.

The EXMARaLDA Partitur Editor is a tool for inputting, editing and outputting transcriptions in partitur (musical score) notation. The EXMARaLDA Corpus Manager is designed to assemble transcripts created with the Partitur Editor, together with their corresponding recordings, into corpora and to enrich them with metadata. Metadata can describe speakers, communications (settings), recordings and the transcripts themselves. EXAKT (EXMARaLDA Analysis and Concordancing Tool) is the EXMARaLDA query tool for searching for transcribed and annotated phenomena in an EXMARaLDA corpus.
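The idea behind partitur notation is that each speaker gets a line of their own, and material that shares a point on a common timeline is aligned vertically, as in a musical score. The following toy sketch renders such a layout as plain text. It does not use EXMARaLDA's file format or code: the data structure, the function name render_partitur and the simplification that every event covers exactly one timeline interval are assumptions made for illustration.

```python
def render_partitur(timeline, tiers):
    """Render speaker tiers as vertically aligned text lines.

    timeline: ordered list of timeline point ids, e.g. ["T0", "T1", "T2"]
    tiers:    {speaker: {start point id: text}}; each event is assumed to
              last from its start point to the next timeline point.
    """
    starts = timeline[:-1]  # the last point only closes the final interval

    # A column is as wide as the longest event starting at that point.
    widths = [
        max(len(events.get(point, "")) for events in tiers.values())
        for point in starts
    ]
    label_width = max(len(speaker) for speaker in tiers)

    lines = []
    for speaker, events in tiers.items():
        cells = [events.get(point, "").ljust(width)
                 for point, width in zip(starts, widths)]
        lines.append(speaker.ljust(label_width) + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    timeline = ["T0", "T1", "T2"]
    tiers = {
        "A": {"T0": "hello", "T1": "you"},
        "B": {"T1": "hi"},
    }
    print(render_partitur(timeline, tiers))
```

In the example, speaker B's contribution starts at the same timeline point as the second event of speaker A, so both appear in the same column.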

All EXMARaLDA data is stored in Unicode-compliant XML files. The data can be transformed into a number of widely used presentation formats, and the system supports several important transcription systems (HIAT, GAT, CHAT, DIDA).


The CLARIN-D center at the Bavarian Archive for Speech Signals (BAS) has made available several new web services that incorporate the functionality of the MAUS (Munich AUtomatic Segmentation System) segmentation tool. MAUS allows the fully automatic segmentation of speech recordings, given some form of written transcript. In a nutshell, MAUS transforms the written text into a sequence of canonical phonemes, then produces a hypothesis model of possible pronunciation variants based on this canonical form, and finally decodes the speech signal into the most likely variant, together with the optimal segmentation into phonemic and word units.
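The middle step, building pronunciation hypotheses from a canonical form, can be pictured with a toy sketch. It is not MAUS's actual rule set or statistical model: the function name pronunciation_variants, the table of alternatives and the SAMPA-style example are invented for illustration, and the final decoding step against the speech signal is not shown.

```python
from itertools import product


def pronunciation_variants(canonical, alternatives):
    """Enumerate pronunciation variants of a canonical phoneme sequence.

    canonical:    list of phoneme symbols
    alternatives: {phoneme: [other realisations]}; an empty string stands
                  for deletion of the phoneme
    """
    # For each position: the canonical phoneme first, then its alternatives.
    options = [[phoneme] + alternatives.get(phoneme, []) for phoneme in canonical]

    variants = []
    for combination in product(*options):
        variant = tuple(p for p in combination if p)  # drop deleted phonemes
        if variant not in variants:  # keep first occurrence, preserve order
            variants.append(variant)
    return variants


if __name__ == "__main__":
    # Illustrative canonical form (SAMPA-style symbols) with two made-up
    # options: the schwa may be deleted, and the final nasal may surface as "m".
    canonical = ["h", "a:", "b", "@", "n"]
    alternatives = {"@": [""], "n": ["m"]}
    for variant in pronunciation_variants(canonical, alternatives):
        print(" ".join(variant))
```

A segmentation system would then score each hypothesis against the recording and keep the best-matching one, which is the decoding step described above.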

MAUS was developed in the late 1990s and has been maintained by BAS since then. To make it easier to use, several web services are now available that allow researchers to run MAUS over the Internet without having to install it locally.