Chapter 2. Metadata

The research domain is characterized by an enormous increase in the amount of data and in the complexity it covers, i.e. the types of implicit and explicit relations included, the heterogeneity of formats and semantic domains, etc. This is also true in linguistics, where it is not only the sheer volume that creates new challenges for management and access, but also the rapidly growing number of files that researchers create and use. A field researcher documenting an endangered language, for example, easily has about 10,000 files on a notebook computer, all of which need to be managed. These files include raw data (AV recordings, texts, etc.), but also many types of annotations, lexica, sketch grammars, notes about various aspects of the language and the field trip, etc. On top of this there are several versions of each work, often several presentation forms (for photos, for example, JPEG and PNG versions), extractions of fragments into new files, and many other related forms.

It is a common experience that it is almost impossible to manage such a heap of data without a proper organization and naming scheme. Directory systems were used for many years, but they turn out to be no longer adequate: they are not meant for sharing and aggregation, do not capture the many relations between files, do not express contextual knowledge, do not support searches, etc.

Throughout this user guide we follow the definition of a digital object (DO) introduced by [Kahn/Wilensky 2006] when referring to different types of stored data: an abstract notion of a digital work that is instantiated in some representational form and is associated with a persistent identifier and a metadata description. We therefore cannot claim that DOs are necessarily files, since they could also be, for example, constructs in databases.

An abstract definition says that metadata (MD) is data about data. In this document we use the term metadata for a keyword-type description of data objects. This concept of metadata is not at all new: it was introduced in the form of catalogue cards when big libraries were being built. These cards typically combined creation information with location information.
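The two definitions above can be made concrete with a small data structure. The following Python sketch is only an illustration under stated assumptions: the class names, the field names, the metadata keywords and the identifier value are all invented for this example and are not prescribed by [Kahn/Wilensky 2006] or by any particular metadata standard. It shows one work with a single persistent identifier and a single keyword-type description, instantiated in two representational forms.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Representation:
    """One concrete instantiation of a work, e.g. a file or a database record."""
    media_type: str  # e.g. "image/jpeg"
    location: str    # hypothetical locator: a path, a URL or a database key


@dataclass
class DigitalObject:
    """An abstract work with a persistent identifier and a metadata description."""
    pid: str  # persistent identifier (the value used below is a placeholder)
    metadata: dict[str, str] = field(default_factory=dict)  # keyword -> value
    representations: list[Representation] = field(default_factory=list)

    def matches(self, keyword: str, value: str) -> bool:
        """Return True if the metadata description holds this keyword/value pair."""
        return self.metadata.get(keyword) == value


# All values below are made up for illustration only.
photo = DigitalObject(
    pid="hdl:00000/example-0001",  # not a real handle
    metadata={
        "title": "Village market",
        "creator": "A. Researcher",
        "created": "2010-05-17",
    },
    representations=[
        Representation("image/jpeg", "photos/market.jpg"),
        Representation("image/png", "photos/market.png"),
    ],
)

# Two presentation forms share one identifier and one description.
print(photo.pid, len(photo.representations))
print(photo.matches("creator", "A. Researcher"))
```

Keeping the description separate from the representations reflects the point made above that a DO need not be a file: the location of a representation could just as well be a key into a database.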

It is widely agreed, also across disciplines, that associating metadata with every DO is the only way to support management, sharing and access of data in the Internet domain. Currently we see this wide agreement being turned into strong requirements by funding agencies: projects that include the creation of data need to come up with a data management plan that states how the data created will be described, preserved and curated. Metadata in this restricted sense can be defined as structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource [NISO:2004].