Metadata serves a number of important functions, such as:
management of large data sets,
associating access permissions with such data sets,
discovery of digital objects and data sets,
information about how to access data objects and sets,
assisting in re-using data objects by covering context and provenance information, and
finding appropriate tools for a given data object.
The discussion about metadata initiated by librarians focused mainly on the discovery function. However, modern e-Research requires treating the other functions as equally important. New functions such as profile matching will certainly be required as well.
Due to the range of different functions, some experts distinguish different types of metadata descriptions, such as structural MD, administrative MD, guide MD, preservation MD, technical MD, process MD, descriptive MD, etc. These categories are not standardized, depend heavily on the community using them, and are subject to change. In this document we will not use these terms since we did not find them helpful.
We then need to address the question of how to describe the characteristics of data so that the above-mentioned functions can be realized. The typical way is to define a number of meaningful keywords that describe the properties of a digital object, its context and its provenance. A few typical examples of such keywords are:
creator: the name of the person(s) who created the object
country: the country where an object was created
content_language: the language an object is in
date: the date when an object was created, modified, released, etc. This example demonstrates that a single keyword “date” is not sufficient to allow correct interpretation, since it does not say which of these events is meant.
actor: the person(s) actively involved in the data object
genre: the genre the object can be categorized in
source: an indicator of the lifecycle steps that led to this object. This is, for example, very important when processing video streams: one needs to know which codecs have been used, which transformations have been applied, etc. in order to interpret the object correctly.
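The keywords above can be pictured as the fields of a simple record. The following is only a minimal sketch in Python: the class name `MetadataRecord`, the split of “date” into `date_created`, `date_modified` and `date_released`, and all example values are assumptions made for illustration and are not prescribed by this document.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class MetadataRecord:
    """Hypothetical record built from the example keywords."""
    creator: List[str]                  # person(s) who created the object
    country: str                        # country where the object was created
    content_language: str               # language the object is in
    date_created: date                  # "date" split by event to avoid ambiguity
    date_modified: Optional[date] = None
    date_released: Optional[date] = None
    actor: List[str] = field(default_factory=list)   # person(s) involved in the object
    genre: Optional[str] = None         # open-ended category
    source: List[str] = field(default_factory=list)  # lifecycle steps leading to the object


# Illustrative values only; they do not describe a real data object.
record = MetadataRecord(
    creator=["A. Researcher"],
    country="Netherlands",
    content_language="Dutch",
    date_created=date(2010, 5, 17),
    actor=["Speaker 1", "Speaker 2"],
    genre="interview",
    source=["recorded as raw video", "transcoded with a lossy codec"],
)
```

Splitting the date by event is one possible answer to the ambiguity noted for the “date” keyword; a community could equally keep one keyword and add a qualifier to it.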
With each of these keywords some form of vocabulary or constraint can be associated. With the category “country”, for example, one may want to associate the official list of nations as accepted by the UN. Such a list of possible values is called a controlled vocabulary. With “date” one may want to prescribe a certain format for entries, such as the US way of writing dates. Such syntactical limitations on the values are called constraints. However, for many fields such as “genre” there are no widely agreed vocabularies, i.e. one can only indicate a few typical options, and the list of values basically needs to remain open.
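The difference between a controlled vocabulary, a constraint and an open list can be sketched as a small validation routine. This is a hedged illustration only: the function `validate`, the tiny `COUNTRIES` subset (a stand-in for an official list of nations), the `GENRE_SUGGESTIONS` and the choice of MM/DD/YYYY as “the US way of writing dates” are all assumptions, not definitions taken from this document.

```python
import re

# Controlled vocabulary: a closed list of allowed values.
# Tiny illustrative subset, NOT an official list of nations.
COUNTRIES = {"Germany", "Kenya", "Netherlands"}

# Constraint: a syntactic rule on the value, here assumed to be MM/DD/YYYY.
US_DATE = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}")

# Open list: typical options are suggested, but any value is accepted.
GENRE_SUGGESTIONS = ["interview", "lecture", "narrative"]


def validate(record):
    """Return a list of problems found in a keyword/value record (a dict)."""
    problems = []
    if record.get("country") not in COUNTRIES:
        problems.append("country is not in the controlled vocabulary")
    if not US_DATE.fullmatch(record.get("date", "")):
        problems.append("date does not satisfy the MM/DD/YYYY constraint")
    # "genre" is deliberately not checked: its list of values stays open.
    return problems


ok = validate({"country": "Kenya", "date": "03/15/2011", "genre": "field notes"})
bad = validate({"country": "Atlantis", "date": "2011-03-15"})
```

Here `ok` is an empty list, while `bad` contains one problem for the unknown country and one for the date format. Note that the constraint checks only the syntax of the date, not whether the day exists in the given month.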