Types of Resources and Metadata Components

Metadata frameworks need to allow researchers to describe the different resource types that occur in linguistics. These range from raw material in the form of texts, audio and video recordings, brain imaging, eye tracking, etc. to derived data, which can include completely different types such as lexica or results of statistical analyses in table form. All these resource types can be created by researchers from different sub-disciplines of linguistics. Thus the heterogeneity of the resource types and of the researchers' intentions needs to be covered by a metadata framework that allows metadata to be used to address research questions, and not just to discover useful resources by approximate semantics.

In 2000 two groups discovered that the suggestions coming from the library world were not sufficient to meet the researchers’ needs:

  1. In early 2000 the IMDI group (mostly European experts) decided to develop the IMDI metadata set, which is structured, extensible and includes domain semantics to express linguists' requirements.

  2. In late 2000 the OLAC group, with its origins mainly in the US, decided to extend the Dublin Core set with a few linguistically relevant categories in order to meet the most urgent needs while remaining simple.

Other suggestions were made in the linguistic domain, such as ENABLER, but since no tools supported them they were not adopted. Both IMDI and OLAC were used by various linguistic resource centers with different purposes in mind. However, both approaches suffered from some major deficits:

  1. Although extensions were possible, both offered only a limited set of categories, with OLAC considerably more restricted than IMDI.

  2. Although IMDI was more expressive thanks to its structuring options, both IMDI and OLAC had a fixed schema. A creator who knew only four values, for example, still had to cope with all the requested input fields, and when a new sub-discipline (e.g. sign language research) wanted to use IMDI, a new special profile had to be created and integrated.

It was obvious that only a flexible component model could overcome these limitations and give researchers from the various sub-disciplines the possibility to create profiles tailored to their intentions. It is equally clear that syntax does not hamper interpretation as long as the meaning of the categories being used is largely independent of their structural embedding, i.e. the category definitions have to be semantically narrow. We need to define categories such as date of birth, date of creation, date of annotation, etc. instead of semantically broad categories such as date, whose interpretation is determined by its structural embedding. These considerations were the motivation to build CMDI:


Only flexible component models have the expressive power to cover the heterogeneity of a broad field in terms of resource types, variety of sub-disciplines and research intentions.
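
The idea of assembling profiles from reusable components built on semantically narrow categories can be illustrated with a minimal Python sketch. It is not the CMDI schema or any CMDI tool API; all class, component and category names (Category, Component, "Signer", "handedness", etc.) are hypothetical and chosen only for illustration. The sketch shows two sub-disciplines building different profiles from shared components, and that a category such as dateOfBirth keeps its meaning regardless of where it is nested.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Category:
    """A semantically narrow category: its meaning does not depend on nesting."""
    name: str          # e.g. "dateOfBirth" rather than a broad "date"
    definition: str


@dataclass
class Component:
    """A reusable bundle of categories and nested components (hypothetical model)."""
    name: str
    categories: list[Category] = field(default_factory=list)
    children: list[Component] = field(default_factory=list)

    def category_names(self) -> set[str]:
        """Collect all category names, ignoring where they are embedded."""
        names = {c.name for c in self.categories}
        for child in self.children:
            names |= child.category_names()
        return names


# Narrow categories (illustrative definitions, not taken from any registry).
DATE_OF_BIRTH = Category("dateOfBirth", "Date on which a person was born")
DATE_OF_CREATION = Category("dateOfCreation", "Date on which a resource was created")
HANDEDNESS = Category("handedness", "Dominant hand of a signer")

# Reusable components.
actor = Component("Actor", [DATE_OF_BIRTH])
signer = Component("Signer", [DATE_OF_BIRTH, HANDEDNESS])
recording = Component("Recording", [DATE_OF_CREATION])

# Each sub-discipline assembles its own profile; no fixed schema is imposed.
spoken_profile = Component("SpokenCorpusProfile", children=[actor, recording])
sign_profile = Component("SignLanguageProfile", children=[signer, recording])

# Categories shared by both profiles can be compared despite different structure.
shared = spoken_profile.category_names() & sign_profile.category_names()
print(sorted(shared))
```

Because the categories are narrow, the two profiles can be compared on dateOfBirth and dateOfCreation even though the former sits in different components ("Actor" versus "Signer"). With a single broad "date" category this comparison would require knowledge of each profile's structure.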