Objects, Collections, Granularity

Digital Objects (DOs) as introduced by [Kahn/Wilensky 2006] have a number of internal and external properties. Descriptions of these properties are stored separately from the object and describe the DO as a whole, a task for which metadata keywords are well suited. External properties can be described with typical keywords (categories) as mentioned above. The internal properties typically indicate the technical encoding scheme used, the structure of the object, the semantics used or covered, etc. Keyword-type metadata is not meant to contain this information itself; it should instead point to an object that contains it. Note that metadata descriptions are themselves restricted DOs: they need an identity that can be used to refer to them, but they are not in turn described by metadata. Otherwise the result would be an infinitely recursive system.

Some file formats, such as CHAT [MacWhinney 2000] and TEI [TEI P5], suggest including so-called header information in the file itself. The widely agreed convention, however, is that metadata should be kept separate from the object. Separate metadata can be used free of license restrictions and without transferring the large amounts of bytes the object itself may comprise; it can be combined to form all sorts of virtual collections, merged with metadata from various sources, and updated without changing the object. This is very important, since updating the header information in a file means creating a new object, which requires its own identity and thus a new version. Converters can extract the header information from such files to generate the metadata.
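As an illustration of such a converter, the following minimal sketch reads a TEI-style header and produces a separate metadata record that points back to the object. The sample document, the function name `extract_header_metadata`, the record fields, and the identifier `hdl:example/1` are assumptions made for this example only; a real converter would map many more header elements.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; the prefix "tei" is only a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample file with header information embedded in the object.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Interview 12</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Field recording</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Transcript text</p></body></text>
</TEI>"""


def extract_header_metadata(tei_xml: str, object_id: str) -> dict:
    """Build a stand-alone metadata record from the embedded header.

    The object itself is not modified; the record only refers to it
    through object_ref (an assumed identifier scheme).
    """
    root = ET.fromstring(tei_xml)
    base = "tei:teiHeader/tei:fileDesc/"
    title = root.findtext(base + "tei:titleStmt/tei:title",
                          default="", namespaces=TEI_NS)
    source = root.findtext(base + "tei:sourceDesc/tei:p",
                           default="", namespaces=TEI_NS)
    return {
        "object_ref": object_id,
        "title": title.strip(),
        "source": source.strip(),
    }


record = extract_header_metadata(SAMPLE_TEI, "hdl:example/1")
print(record)
```

Because the record lives outside the file, it can be corrected or enriched later without producing a new version of the object.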

DOs are related in many different ways, either as a whole or amongst their fragments. Here we only discuss examples where DOs are related as a whole. Typical examples are: the DO is a new version of an older one, the DO is a different presentation version, the DO is part of a series of DOs that were created at the same time and location, the DOs are about the same content_language, the DOs include the same actors; users may want to express many more kinds of relations. Metadata systems should allow the creator, manager and end-user to form so-called virtual collections, i.e. they should support the user in aggregating the metadata descriptions of DOs, even from various repositories, to build collections fit for any purpose, such as writing a thesis on a collection of objects. Figure 2.1, “Building virtual collections by aggregating metadata descriptions” indicates this process: a user can aggregate metadata descriptions from one or more existing collections into a basket to create a collection. To build a virtual collection the actual DOs are not moved; only the metadata descriptions, which hold pointers to the objects, are collected. Such a virtual collection can itself be described by a metadata description, which contains, next to the typical metadata keywords describing its properties, a long list of references pointing to the metadata descriptions of the objects included.

Figure 2.1. Building virtual collections by aggregating metadata descriptions
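The basket process can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `MetadataDescription` and `VirtualCollection`, the field names, and the identifiers such as `md:a/1` are hypothetical and do not correspond to any particular metadata framework.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataDescription:
    md_id: str        # identity of the description itself
    object_ref: str   # pointer to the DO, which stays in its repository
    keywords: dict = field(default_factory=dict)


@dataclass
class VirtualCollection:
    md_id: str                                        # the collection has its own identity
    keywords: dict = field(default_factory=dict)      # properties of the collection
    member_refs: list = field(default_factory=list)   # references to member descriptions

    def add(self, description: MetadataDescription) -> None:
        # Only the reference to the description is stored, not the object.
        if description.md_id not in self.member_refs:
            self.member_refs.append(description.md_id)


# Two hypothetical repositories offering metadata descriptions.
repo_a = [
    MetadataDescription("md:a/1", "obj:a/1", {"content_language": "deu"}),
    MetadataDescription("md:a/2", "obj:a/2", {"content_language": "nld"}),
]
repo_b = [
    MetadataDescription("md:b/7", "obj:b/7", {"content_language": "deu"}),
]

# The user fills a basket with descriptions from both repositories.
basket = VirtualCollection("md:vc/thesis", {"title": "Thesis material"})
for description in repo_a + repo_b:
    if description.keywords.get("content_language") == "deu":
        basket.add(description)

print(basket.member_refs)
```

Storing references rather than copies keeps the collection small and lets the same object take part in any number of collections.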

There is considerable debate about the granularity at which metadata should be assigned so that it can be used in the ways described above. Many repositories still offer metadata descriptions only for whole collections, without pointing to descriptions of the individual objects. Imagine a corpus of Lower Saxon German that includes variants spoken in the coastal areas of the North Sea and the Baltic Sea and that was recorded over some time. We could give the whole corpus a single metadata description to publish and register it. This would allow users to find the corpus and to work with it as a whole. But let us assume that an analysis addresses the question of whether male and female speakers differ in how they lose the ability to speak these variants. With only a corpus-level description there would be no way to run a simple metadata query that splits the collection into two or more sub-collections. Fine-grained metadata descriptions simplify re-using a collection, in particular in ways that were not foreseen by its creators, and thus support new research questions.
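The difference can be made concrete with a small sketch. It assumes that every recording has its own metadata record carrying a speaker-sex keyword; the keyword names, values and identifiers below are invented for the example.

```python
from collections import defaultdict

# Hypothetical object-level metadata records for three recordings.
records = [
    {"md_id": "md:1", "object_ref": "obj:1",
     "speaker_sex": "female", "region": "North Sea"},
    {"md_id": "md:2", "object_ref": "obj:2",
     "speaker_sex": "male", "region": "Baltic Sea"},
    {"md_id": "md:3", "object_ref": "obj:3",
     "speaker_sex": "female", "region": "Baltic Sea"},
]


def group_by(records: list, keyword: str) -> dict:
    """Split a collection into sub-collections by one metadata keyword."""
    groups = defaultdict(list)
    for record in records:
        # Records lacking the keyword end up in an "unspecified" group.
        groups[record.get(keyword, "unspecified")].append(record["md_id"])
    return dict(groups)


by_sex = group_by(records, "speaker_sex")
print(by_sex)
```

With a single description for the whole corpus there would be only one record and nothing to group; the query becomes possible only because each object carries its own description.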


Thus we can conclude that it makes sense to associate each meaningful digital object with its own metadata description, so that objects can be identified and re-combined to address new research questions.