For successful technical integration of linguistic resources and tools into the CLARIN-D infrastructure, the following requirements must be met:
All data should be stored in one of a limited set of data formats. Throughout this user guide we discuss all data formats available within the CLARIN-D infrastructure. A list of data formats and their status within the European CLARIN project can be found on the CLARIN EU website.
All tools and resources have to be associated with persistent identifiers.
Tools and resources have to be associated with comprehensive metadata. All data categories used in the resource itself or in its metadata description should map to a data category in ISOcat or in another CLARIN-supported data category registry, such as the ISO 3166 registry for country codes [ISO 3166-1:2006], [ISO 3166-2:2007]. In certain cases, entries in RELcat (which is still under development) may also be used. See Chapter 2, Metadata for details.
A formal description of the resource's underlying data model must be provided. It serves as a means of formal documentation of the resource. It will also be used for formal validation if the resource is processed by a CLARIN-D member.
See the section called “Well-formedness and schema compliance” for a detailed account of formal descriptions for validation. In the case of XML-based resources, XSD, RELAX NG schemata, and Schematron rules are preferred over document type definitions (DTDs). Formal descriptions should be documented and should contain links to a data category registry for all datatypes and their values. Closed sets of value data categories must be explicitly enumerated.
Informal documentation of the resource, aimed at the CLARIN-D user community, must be provided. In the simplest case, this might be an existing, freely available electronic article or white paper. The documentation should be provided in English and preferably also in the subject language(s) of the resource. Versions in additional languages are welcome, too.
Providers of resources in formats that are not supported by CLARIN-D are encouraged to contact the CLARIN-D technical help desk for support in transforming their resource into a CLARIN-D compatible format.
Integration of some resources and tools may lead to additional requirements. These are discussed in the section on the respective resource or tool type.
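As an illustration of the requirement that resources be formally checkable and that closed value sets be explicitly enumerated, the following minimal Python sketch performs two simple checks on an XML resource: a well-formedness check and a check of attribute values against an enumerated set. It is not a substitute for validation against an XSD or RELAX NG schema or against Schematron rules, and it is not part of any CLARIN-D tooling. The element name `token`, the attribute `pos`, the values in `ALLOWED_POS`, and the function `check_resource` are all hypothetical and were chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical closed value set for a hypothetical "pos" attribute.
# In a real resource, these values would be enumerated in the schema
# and linked to entries in a data category registry.
ALLOWED_POS = {"NOUN", "VERB", "ADJ"}


def check_resource(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    # Step 1: well-formedness. The parser raises ParseError on malformed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    # Step 2: every "pos" value must come from the enumerated closed set.
    problems = []
    for token in root.iter("token"):
        pos = token.get("pos")
        if pos not in ALLOWED_POS:
            problems.append(
                f"token {token.text!r}: pos={pos!r} is not in the closed value set"
            )
    return problems


# Three small, made-up inputs: valid, wrong value, and malformed markup.
good = '<text><token pos="NOUN">house</token><token pos="VERB">stands</token></text>'
bad_value = '<text><token pos="NUON">house</token></text>'
not_well_formed = '<text><token pos="NOUN">house</text>'

for label, sample in [("good", good), ("bad_value", bad_value),
                      ("not_well_formed", not_well_formed)]:
    print(label, check_resource(sample))
```

A schema language expresses the same constraint declaratively, for example as an enumeration of permitted values, which is why an explicitly enumerated closed set makes automatic validation possible in the first place.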