Metadata Models – International Developments and Implementation



Introduction to Metadata
Definitions of metadata often describe them as "data about other data", sometimes refined to "structured data about data". This definition oversimplifies matters. On the one hand, metadata were in use long before the digital age, be it in library catalogues or on museum inventory cards. On the other hand, the entity they represent need not take the form of bits and bytes. A highly generic definition of metadata could therefore be "(structured) information about (digital) objects" 1. Transferred to the digital world, "structured information" stands for "structured data".
The term metadata does not refer solely to representations of existing parts of reality such as data streams or objects; it can also be used to describe descriptions. Metadata collections such as subject gateways or digital libraries can themselves be described in a metadata set. Using the term metadata unambiguously therefore requires attention to the context, to the levels of complexity to be described and to the recipients of the information supplied through metadata. Metadata usually carry structured information such as "Author", "Title" and "Subject": pieces of information that are semantically interconnected.
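The idea of semantically interconnected pieces of information can be sketched as a record of named elements with (possibly repeated) values. The Python below is a minimal illustration and not a standard API. The class `MetadataRecord`, the identifiers and the sample values are all hypothetical. The same structure describes a single article and a subject gateway, which shows how descriptions can themselves be described.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Structured data about one described entity (hypothetical sketch)."""
    described: str  # identifier of the thing being described
    elements: dict[str, list[str]] = field(default_factory=dict)

    def add(self, element: str, value: str) -> None:
        # Elements such as "Subject" may repeat, so values are kept in lists.
        self.elements.setdefault(element, []).append(value)

    def get(self, element: str) -> list[str]:
        return self.elements.get(element, [])


# A record about a single resource ...
article = MetadataRecord("urn:example:article-42")
article.add("Title", "Metadata Models")
article.add("Author", "A. Example")
article.add("Subject", "Metadata")
article.add("Subject", "Digital libraries")

# ... and a record about a collection: a description of descriptions.
gateway = MetadataRecord("urn:example:subject-gateway")
gateway.add("Title", "Example Subject Gateway")
gateway.add("Subject", "Mathematics")
gateway.add("Description", "Catalogue of selected Internet resources")
```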
Metadata should therefore be understood not as a fixed term or a standardised format but as a form of language. It serves to exchange information about objects such as books or digital resources, and it supports purposes such as archiving. Tom Baker describes Dublin Core 2, one of the most common metadata formats, as a "pidgin for Digital Tourists" 3. Baker stresses that the structure of metadata forms a grammar whose "basic pattern is easily grasped" and which "is well-suited to serve as an auxiliary language for digital libraries" (ibid.). Metadata thus offer an elementary level of understanding between machines, between human beings, or between the two.
1 A more abstract definition could be "structured information on specific and describable sectors of reality". "Specific and describable sectors of reality" could mean "a certain stream of data" as well as "an object", "a document" or "a human being". It should be emphasised that metadata usually draw their meaning from their unambiguous relation to the reality they stand for. If there is no proof that the described entity exists, the description may be meaningless in the field of digital data.
2 http://www.dublincore.org/
3 http://dlib.org/dlib/october00/baker/10baker.html

Types and Function of Metadata
Who benefits from metadata, assuming they have been rigorously applied? Creators of digital objects (resources, documents, collections) include producers and scientists. For them, metadata make it possible to prepare a formalised description of their work and thus to exercise control over it. For cataloguers, metadata are an established tool for describing particular types of resources. They enable implementors to set up reliable schemas. Information providers can share such schemas through crosswalks and harmonisation processes in order to establish core sets, which are discussed in more detail below. End users performing information retrieval gain a major benefit: they can identify, locate and compare relevant resources and recognise their respective functional requirements.
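A crosswalk is essentially a mapping between the elements of two schemas. The sketch below assumes a hypothetical local schema (`heading`, `author`, `keywords`, `abstract`) and maps it onto Dublin Core element names. It reports elements without a counterpart rather than silently dropping them, because those are exactly what a harmonisation process has to resolve.

```python
# Hypothetical crosswalk from a local schema to Dublin Core element names.
LOCAL_TO_DC = {
    "heading": "title",
    "author": "creator",
    "keywords": "subject",
    "abstract": "description",
}


def apply_crosswalk(record: dict[str, list[str]],
                    mapping: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Rename elements via mapping; return the mapped record and unmapped element names."""
    mapped: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for element, values in record.items():
        target = mapping.get(element)
        if target is None:
            unmapped.append(element)  # left for manual harmonisation
            continue
        mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


local = {"heading": ["Metadata Models"], "author": ["A. Example"], "shelfmark": ["X 12"]}
dc, leftover = apply_crosswalk(local, LOCAL_TO_DC)
# dc == {"title": ["Metadata Models"], "creator": ["A. Example"]}, leftover == ["shelfmark"]
```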
Metadata can be divided into the following types, according to what they state about the described digital objects:
• Content metadata (descriptive metadata) describe the content or give further bibliographic information. They can be specified according to document type and/or subject. Technical details of real-life objects such as museum artefacts or books also belong to this type.
• Administrative metadata carry information for the distributed administration and maintenance of archiving systems. Examples are the version of the metadata set, or date stamps and signatures recording metadata modifications.
• Structural metadata allow navigation in archiving systems by offering information on hierarchical levels (journal → article, monograph → chapter, artefact → detail).
• Technical metadata carry information on the digital nature of the described resource/document, such as size, format, resolution or colour.
• Preservation metadata inform about the durable preservation of digital objects. They also cover the storage/presentation format with regard to migration and emulation.
• Terms and conditions metadata disclose copyright, intellectual property and retrieval conditions such as payment or registration.
• Metadata about metadata collections allow users to cross-search distributed archiving systems (e.g. RSLP collection description 4).
• Metadata about the use of metadata include project- or domain-specific application profiles such as Renardus 5, DC Libraries 6, etc. Metadata of this type are usually stored in registries. These registries carry information on the application profiles and element sets of the respective co-operation partners in distributed archiving systems.
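The following sketch illustrates how these types divide up a single description. It assigns the elements of a record about a digitised article to the categories above. The element names and their assignment are hypothetical, and in practice the boundaries between types are not always sharp.

```python
from enum import Enum


class MetadataType(Enum):
    DESCRIPTIVE = "descriptive"
    ADMINISTRATIVE = "administrative"
    STRUCTURAL = "structural"
    TECHNICAL = "technical"
    PRESERVATION = "preservation"
    TERMS_AND_CONDITIONS = "terms and conditions"


# Hypothetical assignment of element names to the types listed above.
ELEMENT_TYPES = {
    "title": MetadataType.DESCRIPTIVE,
    "subject": MetadataType.DESCRIPTIVE,
    "record_version": MetadataType.ADMINISTRATIVE,
    "is_part_of": MetadataType.STRUCTURAL,
    "format": MetadataType.TECHNICAL,
    "file_size": MetadataType.TECHNICAL,
    "migration_history": MetadataType.PRESERVATION,
    "access_conditions": MetadataType.TERMS_AND_CONDITIONS,
}


def group_by_type(record: dict[str, str]) -> dict[MetadataType, dict[str, str]]:
    grouped: dict[MetadataType, dict[str, str]] = {}
    for element, value in record.items():
        kind = ELEMENT_TYPES[element]  # KeyError for elements the mapping does not know
        grouped.setdefault(kind, {})[element] = value
    return grouped


article = {
    "title": "Metadata Models",
    "is_part_of": "urn:example:journal-7",  # structural: article -> journal
    "format": "application/pdf",
    "file_size": "284113",
    "access_conditions": "registration required",
}
groups = group_by_type(article)
```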
Metadata serve several functions. Their service function provides structured access to different resources through features such as cross-searching, cross-browsing, result display, result ranking and result sorting. Their technical and administrative functions support long-term maintenance, metadata exchange and sharing, and reliable archiving.
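Cross-searching with result sorting can be sketched in a few lines, assuming the collections already share element names (for instance after a crosswalk). The collections and records below are invented for illustration. The match is a plain substring test on title and subject. Sorting by date stands in for the more elaborate ranking a real service would use.

```python
# Two hypothetical collections whose records already use shared element names.
COLLECTION_A = [
    {"title": "Introduction to Topology", "subject": "mathematics", "date": "1999"},
    {"title": "Metadata for Libraries", "subject": "library science", "date": "2001"},
]
COLLECTION_B = [
    {"title": "Digital Topology Primer", "subject": "mathematics", "date": "2000"},
]


def cross_search(term: str,
                 collections: dict[str, list[dict[str, str]]]) -> list[tuple[str, dict[str, str]]]:
    """Search title and subject in every collection; return (source, record) hits."""
    term = term.lower()
    hits = []
    for source, records in collections.items():
        for record in records:
            text = f"{record['title']} {record['subject']}".lower()
            if term in text:
                hits.append((source, record))
    # Result sorting: newest first (four-digit year strings sort correctly).
    hits.sort(key=lambda hit: hit[1]["date"], reverse=True)
    return hits


results = cross_search("topology", {"A": COLLECTION_A, "B": COLLECTION_B})
```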

Application Profiles
To realise the full potential of metadata as an elementary language for interoperability, application profiles need to be established. An application profile consists of at least one element set 7, and usually of several. Application profiles can be used "as a way of making sense of differing relationship that implementors and namespace managers have towards metadata schema, and the different ways they use and develop schema." (Heery/Patel 2000). Application profiles describe the elements required by a particular implementation and specify obligations such as "mandatory", "strongly recommended" or "optional". This ensures that certain information such as "Title" and "Creator" will be supported by a specific project or domain-specific implementation. Application profiles disclose which organising body or institution maintains which element set. They also provide guidelines and best practice for each element. Application profiles make it possible to shape domain-specific variations. Some examples:
• DC-Education 8 is based on the Dublin Core element set and includes elements from the IEEE Learning Object Metadata (LOM) element set. This allows, for example, the target groups of educational material to be specified at the metadata level.
• DC-Government 9 includes all fifteen Dublin Core elements, supplemented with domain-specific elements. These refine information on rights (security classifications such as "top secret"), dates (reliable information on publishing dates), subjects and relations.
• DC-Libraries is based on the Dublin Core elements and includes elements of MODS (Metadata Object Description Schema). These allow roles to be expressed for Creator/Contributor and resources to be differentiated by genre. DC-Libraries supports "Library of Congress Subject Headings" as an encoding scheme for Temporal Coverage.
• EULER AP 10 is a mathematics-specific application profile. It extends the DC elements by differentiating the role of the creator according to the conditions of scientific publishing. It also extends the DC source type lists with EULER-specific types.

• Renardus AP 11: Renardus is an interdisciplinary broker service offering sophisticated cross-browsing. It extends the DC elements by supporting several encoding schemes.
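The obligation levels of an application profile lend themselves to automatic checking. The sketch below assumes a hypothetical profile that combines Dublin Core elements (prefix `dc`) with a local element set (prefix `ex`). It reports missing mandatory elements, and elements outside the profile, as errors. It reports missing strongly recommended elements as warnings. This is not the checking procedure of any of the profiles listed above.

```python
from enum import Enum


class Obligation(Enum):
    MANDATORY = "mandatory"
    STRONGLY_RECOMMENDED = "strongly recommended"
    OPTIONAL = "optional"


# Hypothetical profile drawing on two element sets: Dublin Core ("dc") and a local one ("ex").
PROFILE = {
    "dc:title": Obligation.MANDATORY,
    "dc:creator": Obligation.MANDATORY,
    "dc:subject": Obligation.STRONGLY_RECOMMENDED,
    "ex:audience": Obligation.OPTIONAL,
}


def check_record(record: dict[str, str],
                 profile: dict[str, Obligation]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a record checked against an application profile."""
    errors: list[str] = []
    warnings: list[str] = []
    for element, obligation in profile.items():
        if element in record:
            continue
        if obligation is Obligation.MANDATORY:
            errors.append(f"missing mandatory element {element}")
        elif obligation is Obligation.STRONGLY_RECOMMENDED:
            warnings.append(f"missing recommended element {element}")
    # Elements not declared by the profile are not supported by this implementation.
    for element in record:
        if element not in profile:
            errors.append(f"element {element} is not part of the profile")
    return errors, warnings


errors, warnings = check_record({"dc:title": "Metadata Models", "ex:audience": "students"}, PROFILE)
```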

Element Sets (Formerly Called Namespaces)
Element sets describe a well-defined set of metadata elements tailored to domain- or subject-specific requirements and to different implementations. Beyond simply structuring information, element sets define the semantics and syntax of each metadata element. In this way they serve as "a vocabulary that has been formally published, usually on the Web; it describes elements and qualifiers with natural language labels, definitions, and other relevant documentation." (Baker 2000) Metadata generation should follow clear cataloguing rules. Following Baker, metadata element sets can be understood as a vocabulary. It is then the grammar underlying this vocabulary that shapes those cataloguing rules into consistent semantics and syntax. This consistency is reached when each element is treated as "a unique identifier formed by a name (e.g., Title)" (ibid.).
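Dublin Core illustrates how an element can be treated as a unique identifier. Its element set is published under the namespace URI `http://purl.org/dc/elements/1.1/`, and an element's identifier is formed by appending its name to that URI. The classes below are a hypothetical sketch. They include only two of the fifteen elements, with their Dublin Core 1.1 definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str        # e.g. "title"
    label: str       # natural-language label
    definition: str


@dataclass(frozen=True)
class ElementSet:
    namespace: str   # URI under which the vocabulary is published
    elements: tuple[Element, ...]

    def identifier(self, name: str) -> str:
        """Unique identifier of an element: namespace URI followed by the element name."""
        if name not in {e.name for e in self.elements}:
            raise KeyError(f"{name!r} is not defined in {self.namespace}")
        return self.namespace + name


DC = ElementSet(
    namespace="http://purl.org/dc/elements/1.1/",
    elements=(
        Element("title", "Title", "A name given to the resource."),
        Element("creator", "Creator", "An entity primarily responsible for making the resource."),
    ),
)

print(DC.identifier("title"))  # http://purl.org/dc/elements/1.1/title
```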

Metadata Registry
Metadata element sets are mainly used for mixing and matching data. Projects to set up metadata registries have been started to ensure their reliability, interoperability and long-term maintenance. These projects arose from the "recognition of the benefits of shared data dictionaries leading to the specification of a formal registration process in the standard ISO/IEC 11179" 20. Metadata registries serve as reference tools for a wide range of complex data sets. They promote the re-use of already defined elements and disclose the data definitions of element sets used in local or subject-specific implementations, thus ensuring their authoritativeness. Other objectives include registries of controlled vocabularies within particular domains and the development of domain-specific application profiles. SCHEMAS and MetaForm are non-authoritative registries, which are not involved in issuing definitions and standards:
• SCHEMAS 21 is a project funded by the European Commission under the Fifth Framework Programme that "has provided a forum for metadata schema designers involved in projects under the IST Programme and national initiatives in Europe" (SCHEMAS Homepage). The CORES registry 22 will be developed to register application profiles and metadata element sets.
• MetaForm 23 (SUB), part of the German Meta-Lib 24 project, is a database with a special focus on the Dublin Core and its manifestations in various …
Summarising the preceding paragraphs, the concept of application profiles and element sets, as recorded in metadata registries, serves certain objectives. Together, they prevent implementations of metadata schemas based on in-house solutions. Such solutions are time-consuming, duplicate effort and may work only locally. In this way they considerably increase interoperability among different implementors and encourage cooperation between partners and projects.
The workability of metadata schemas is ensured by defining unique identifiers for individual elements, as well as for element sets and controlled vocabulary systems. These concepts also ensure that bodies such as management authorities take responsibility for maintaining metadata elements. Only durable, maintained concepts ensure interoperability and can serve as reliable interchange formats for cross-searching distributed services and heterogeneous environments.
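The re-use that registries promote can be sketched as a lookup keyed by unique element identifiers. Registering an element that already exists is refused, which steers implementors towards the existing definition. The `MetadataRegistry` class is a toy in-memory model and not the interface of SCHEMAS, CORES or MetaForm. The maintainer "Project X" and its element identifier are hypothetical.

```python
from typing import Optional


class MetadataRegistry:
    """Toy in-memory registry keyed by unique element identifiers."""

    def __init__(self) -> None:
        self._entries: dict[str, dict[str, str]] = {}

    def register(self, identifier: str, definition: str, maintainer: str) -> bool:
        """Add an element; return False if the identifier is already registered."""
        if identifier in self._entries:
            # Refuse redefinition: the existing entry should be re-used instead.
            return False
        self._entries[identifier] = {"definition": definition, "maintainer": maintainer}
        return True

    def lookup(self, identifier: str) -> Optional[dict[str, str]]:
        return self._entries.get(identifier)

    def maintained_by(self, maintainer: str) -> list[str]:
        """Identifiers for which a given body is responsible."""
        return sorted(i for i, e in self._entries.items() if e["maintainer"] == maintainer)


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
registry = MetadataRegistry()
registry.register(DC_TITLE, "A name given to the resource.", "DCMI")
registry.register("urn:example:projectx:audience", "Intended audience.", "Project X")
accepted = registry.register(DC_TITLE, "Title of the work", "Project X")  # False: already defined
```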

Core Set of Metadata
The concept of metadata core sets is a prerequisite for advanced services such as cross-searching. Cross-searching implies two things. First, searching over distributed and sometimes heterogeneous metadata collections via a single user interface must be technically supported. Second, search results must bring up comparable resources that meet the user's needs. Meeting this objective requires technical development such as the software architecture. It also requires thorough preparatory work on the participating partners' metadata element sets. This includes the following:
• analysis of the semantics and syntax of each element
• investigation of all qualifiers in use. This refers to refinements as well as to encoding schemes such as country lists, classification systems, keywords, thesauri or controlled vocabularies
• analysis and harmonisation of cataloguing rules. This applies to core elements like "Title" or "Creator" as well as to "Description" and to qualifiers such as keywords
• harmonisation of the rules on the repeatability of each element
• analysis and harmonisation of obligation rules such as "mandatory", "strongly recommended" and "optional"
• analysis of the language qualifiers in use. It is highly recommended to investigate and harmonise language qualifiers for the fields "Title", "Description" and "Subject". This matters especially in multilingual services or international co-operations that allow cross-searching across language boundaries
This preparatory work is usually conducted through detailed questionnaires and subsequent discussions between the partners involved. Ultimately, it should lead to the determination of a core set of metadata. This involves identifying the minimum set of metadata elements needed to run the service reasonably. It also involves identifying the maximum set of elements that each partner is able to support sufficiently. The definition of each element entering the core set is based on a "Format of Entry".
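One possible reading of the minimum/maximum rule is as follows. The core set is the largest set of elements that every partner supports. It is acceptable only if it contains the minimum set the service needs. The sketch below implements that reading with Python sets. The partner names and their supported elements are hypothetical.

```python
def determine_core_set(minimum: set[str], supported: dict[str, set[str]]) -> set[str]:
    """Return the elements all partners support, provided they cover the minimum set.

    Expects at least one partner in `supported`.
    """
    gaps = {partner: minimum - elements
            for partner, elements in supported.items()
            if minimum - elements}
    if gaps:
        raise ValueError(f"minimum set not supported: {gaps}")
    # Largest set of elements that every partner can support.
    return set.intersection(*supported.values())


MINIMUM = {"title", "creator", "identifier"}
PARTNERS = {
    "gateway_a": {"title", "creator", "identifier", "subject", "description", "language"},
    "gateway_b": {"title", "creator", "identifier", "subject", "date"},
    "gateway_c": {"title", "creator", "identifier", "subject", "description"},
}
core = determine_core_set(MINIMUM, PARTNERS)
# core == {"title", "creator", "identifier", "subject"}
```

Elements supported by only some partners, such as "description" here, stay outside the core set but could still be offered as optional extensions.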

Renardus Application Profile
Renardus aims to "provide a trusted source of selected, high quality Internet resources for those teaching, learning and researching in higher education in Europe. Renardus provides integrated search and browse access to records from individual participating subject gateway services across Europe." 27 It serves here as an example to illustrate the concept of application profiles introduced above. The Renardus application profile is based on five element sets, encoded in XML/RDF.
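For illustration, a simple Dublin Core description can be serialised as RDF/XML with Python's standard `xml.etree.ElementTree`. This sketch uses only the Dublin Core element set and a hypothetical resource URI. It does not reproduce the five element sets of the Renardus profile.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Register prefixes so the output uses rdf: and dc: instead of ns0:, ns1:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def to_rdf_xml(about: str, record: dict[str, list[str]]) -> str:
    """Serialise a simple Dublin Core record as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    for name, values in record.items():
        for value in values:  # repeated elements become repeated properties
            ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = to_rdf_xml("http://example.org/resource/1",
                     {"title": ["Metadata Models"], "subject": ["Metadata", "Digital libraries"]})
```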

Preservation Metadata
Preservation metadata inform about the durable preservation of digital objects. They also cover the storage/presentation format with regard to migration and emulation. This includes a description of the complete "life cycle" of a digital object. Such a description matters especially when digital objects are not born digital but result from transformation processes such as digitisation:
• provenance information such as the original format of a book
• date of digitisation
• technical information on the digitisation process
• presentation and storage format (supporting migration and emulation)
• RMS (rights management system) information on terms and conditions of access, copyright and intellectual property
• etc.
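The life-cycle view can be modelled as a record that holds the static preservation information plus a history of events such as digitisation and migration. The classes and sample values below are hypothetical. They are meant only to show how a migration updates the storage format while the earlier state remains documented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PreservationEvent:
    when: date
    action: str  # e.g. "digitisation", "migration"
    detail: str


@dataclass
class PreservationRecord:
    provenance: str           # e.g. the original format of a book
    storage_format: str
    presentation_format: str
    access_terms: str         # information supplied by a rights management system
    history: list[PreservationEvent] = field(default_factory=list)

    def migrate(self, when: date, new_format: str) -> None:
        """Log a format migration in the life cycle, then update the storage format."""
        self.history.append(
            PreservationEvent(when, "migration", f"{self.storage_format} -> {new_format}"))
        self.storage_format = new_format


rec = PreservationRecord(provenance="printed monograph", storage_format="TIFF",
                         presentation_format="PDF", access_terms="public domain")
rec.history.append(PreservationEvent(date(2002, 3, 1), "digitisation", "flatbed scan, 600 dpi"))
rec.migrate(date(2010, 6, 15), "JPEG 2000")
```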
Many projects and initiatives are working on preservation metadata. The following list is not exhaustive but provides an overview:

OAIS (Open Archival Information System Reference Model) 28
OAIS is a conceptual framework for archival systems concerned with the long-term preservation of digital objects, and it stems from the work of the space data community. Since its beginnings in 1997, the framework has gained international recognition. This is due to the joint efforts of RLG 29, OCLC and many members of their respective organisations in shaping the reference model and adapting it for use in libraries, archives and research repositories. Archive designers and maintainers in particular benefit from a reliable framework such as OAIS, which provides common concepts and terminology. OAIS has already been implemented in the NEDLIB project 30, run by the Koninklijke Bibliotheek, Den Haag. The participation of international organisations ensures the maintenance and further development of the framework.

Open Issues
Although OAIS provides a sophisticated framework, several issues still need to be developed. This applies especially to distributed archiving systems such as EMANI, the Electronic Mathematical Archiving Network Initiative 31. Projects like EMANI work on concepts for handling the granularity of digital objects (such as journal/article or monograph/chapter) in archiving systems. Other open questions concern the development of a minimum set of preservation metadata for distributed services. The automatic generation of metadata is a further field that requires work. So are standards, best practices and clear guidelines for preservation metadata.
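Some technical metadata can already be generated automatically from the file itself. The sketch below uses only the Python standard library to derive size, a SHA-256 checksum, a modification timestamp and a format guess. Note that `mimetypes.guess_type` infers the format from the file extension alone, so it is a heuristic rather than a true format identification. The field names are hypothetical.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def technical_metadata(path: str) -> dict[str, str]:
    """Derive basic technical metadata for a file without human input."""
    stat = os.stat(path)
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):  # read in 64 KiB blocks
            sha256.update(chunk)
    mime_type, _ = mimetypes.guess_type(path)  # extension-based guess only
    return {
        "size_bytes": str(stat.st_size),
        "format": mime_type or "application/octet-stream",
        "sha256": sha256.hexdigest(),
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }


# Usage on a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello")
meta = technical_metadata(tmp.name)
os.remove(tmp.name)
```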

Outlook
The questions raised in this article touch on several related topics, and further developments in these areas promise to be fruitful for answering them. Schema issues, for example, will be affected by progress in XML/RDF research. Architecture issues will benefit from progress in decentralised discovery systems such as Z39.50 32, LDAP 33 or OAI 34. Architectural issues will in turn influence at least the administrative metadata already mentioned. Research on formats will concern presentation and storage formats, which are important for the migration and emulation of data. Additional metadata may be needed to cover the aspect of format sufficiently. Business models, which concern metadata on rights, permission of use, payments and registration, are important as well. Trends and developments in this field should be observed carefully, as they might call for modifications of the respective metadata elements.
Metadata are indispensable for efficient searching across multiple collections because they support interoperability and crosswalks. Reliable registries are a prerequisite for such services. Thorough application of metadata supports the documentation and maintenance of interrelationships within repositories. With descriptive and technical metadata, for example, digitisation processes can be traced. For the dissemination of digitised content, metadata are a major tool for resource discovery. They allow the documentation of multiple versions of digital objects, be they updated versions, different formats or translations. Metadata not only describe these versions but also enable connections and links between the respective objects. Rights and reproduction information can be stored safely in metadata. Ambitious applications on the Web would be unthinkable without metadata. Examples include the large-scale preservation of cultural heritage and the development of the Semantic Web.
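Links between versions, formats and translations can be kept as typed relations between record identifiers. The relation names `isVersionOf`/`hasVersion` and `isFormatOf`/`hasFormat` exist as refinements in DCMI Metadata Terms. The pair `isTranslationOf`/`hasTranslation` is a hypothetical addition, since those refinements include no dedicated translation relation. The identifiers are invented for illustration.

```python
# Relation pairs: the first two are DCMI Metadata Terms refinements,
# the translation pair is a hypothetical addition.
INVERSE = {
    "isVersionOf": "hasVersion",
    "isFormatOf": "hasFormat",
    "isTranslationOf": "hasTranslation",
}


class ObjectLinks:
    """Typed links between records describing related digital objects."""

    def __init__(self) -> None:
        self._links: dict[str, set[tuple[str, str]]] = {}

    def relate(self, source: str, relation: str, target: str) -> None:
        # Store the stated relation and its inverse so links can be followed both ways.
        self._links.setdefault(source, set()).add((relation, target))
        self._links.setdefault(target, set()).add((INVERSE[relation], source))

    def related(self, identifier: str, relation: str) -> list[str]:
        return sorted(t for r, t in self._links.get(identifier, set()) if r == relation)


links = ObjectLinks()
links.relate("urn:example:report-v2", "isVersionOf", "urn:example:report")
links.relate("urn:example:report-pdf", "isFormatOf", "urn:example:report")
links.relate("urn:example:report-de", "isTranslationOf", "urn:example:report")
print(links.related("urn:example:report", "hasVersion"))  # ['urn:example:report-v2']
```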