Metadata Schema

LaMBDa's metadata schema is an application profile based on DCAT (Data Catalog Vocabulary) and Disco (DDI-RDF Discovery Vocabulary).

DCAT enables the description of datasets and their data distributions in a data catalog (similar to a library catalog).

Disco provides properties for descriptive metadata relevant for information retrieval in discovery systems.
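The following Python sketch is a rough illustration of how DCAT and Disco terms can be combined in one record. It builds a JSON-LD description of a dataset with several distributions. The property selection and the helper name dataset_record are assumptions for this example and are not the LaMBDa application profile. disco:kindOfData stands in for the Disco descriptive properties, and media types are given as plain strings for brevity.

```python
import json

# Namespace IRIs for the vocabularies named above. The DCAT and DCTERMS IRIs are
# the published ones; the Disco IRI is assumed to be the DDI Alliance namespace.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "disco": "http://rdf-vocabulary.ddialliance.org/discovery#",
}


def dataset_record(identifier, title, media_types, kind_of_data=None):
    """Build a minimal JSON-LD description of one dataset and its distributions.

    Hypothetical helper: the property choice is illustrative, not the LaMBDa profile.
    """
    record = {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # One dcat:Distribution per representation (serialization) of the same data.
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:mediaType": mt} for mt in media_types
        ],
    }
    if kind_of_data is not None:
        # Disco descriptive property (assumed usage for this sketch).
        record["disco:kindOfData"] = kind_of_data
    return record


if __name__ == "__main__":
    rec = dataset_record(
        "https://example.org/dataset/42",
        "Example survey data",
        ["text/csv", "application/json"],
        kind_of_data="Survey data",
    )
    print(json.dumps(rec, indent=2))
```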

A selection of authority files, controlled vocabularies and knowledge organization systems is used in combination with the metadata schema. We provide a Skosmos browser for searching and exploring all controlled vocabularies and knowledge organization systems integrated in LaMBDa.

Authority Files

The GND (Gemeinsame Normdatei) is used to identify persons and organizations as creators or contributors of research data.
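The following minimal sketch shows one way to attach a GND identifier to a creator or contributor. It assumes the d-nb.info/gnd/ URI pattern. The regular expression is only a loose syntactic approximation and does not compute the GND check digit. gnd_uri and creator_entry are hypothetical helpers.

```python
import re

# Loose syntactic check for GND identifiers (assumption: digits ending in a digit
# or X, or digits with a hyphenated check character). It does NOT verify the
# check digit or that a GND record exists.
GND_ID = re.compile(r"^(?:\d{6,10}[\dX]|\d{1,8}-[\dX])$")
GND_BASE = "https://d-nb.info/gnd/"  # assumed https form of the GND URI pattern


def gnd_uri(gnd_id: str) -> str:
    """Turn a GND identifier into a URI, rejecting implausible input."""
    gnd_id = gnd_id.strip()
    if not GND_ID.match(gnd_id):
        raise ValueError(f"not a plausible GND identifier: {gnd_id!r}")
    return GND_BASE + gnd_id


def creator_entry(name: str, gnd_id: str, role: str = "creator") -> dict:
    """Hypothetical record shape for a person or organization identified via GND."""
    return {"name": name, "role": role, "identifier": gnd_uri(gnd_id)}
```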

Controlled Vocabularies

DDI Controlled Vocabularies

The DDI Controlled Vocabularies are used for methodological information. Note that the DDI vocabularies are specific to the social sciences.

DDI CV Analysis Unit
Describes the entity being analyzed in the study or in the variable.
DDI CV Data Source Type
Describes the type of the data source.
DDI CV Mode of Collection
Describes the method used to collect the data.
DDI CV Sampling Procedure
A typology of sampling methods.
DDI CV Time Method
Describes the time dimension of the data collection.
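The five DDI vocabularies correspond to methodological metadata fields. The sketch below checks field values against code lists. The field names are hypothetical. The code lists are small partial excerpts for illustration only, and Data Source Type and Sampling Procedure are left out. A real check should load the published DDI CVs instead.

```python
# Metadata fields backed by DDI Controlled Vocabularies. Field names are
# hypothetical; the code lists are partial excerpts for illustration only and
# must be checked against the published DDI CVs.
DDI_CV_EXCERPTS = {
    "analysis_unit": {"Individual", "Household", "Organization", "Other"},
    "mode_of_collection": {"Interview", "Observation", "Other"},
    "time_method": {"Longitudinal", "CrossSection", "TimeSeries", "Other"},
    # Data Source Type and Sampling Procedure are omitted from this excerpt.
}


def check_methodology(record: dict) -> list[str]:
    """Return a list of problems; an empty list means every checked field passed."""
    problems = []
    for field_name, allowed in DDI_CV_EXCERPTS.items():
        value = record.get(field_name)
        # Missing fields are not reported; only values outside the excerpt are.
        if value is not None and value not in allowed:
            problems.append(f"{field_name}: {value!r} is not in the vocabulary excerpt")
    return problems
```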

Contributor Roles

The following roles were selected from the da|ra Contributor Type vocabulary in order to describe the specific role of contributors (descriptions taken from the da|ra Metadata Schema documentation).

Data Collector
Person/institution responsible for finding, gathering/collecting data under the guidelines of the author(s) or Principal Investigator (PI).
Data Curator
Person tasked with reviewing, enhancing, cleaning, or standardizing metadata and the associated data submitted for storage, use, and maintenance within a data center or repository.
Data Manager
Person (or organization with a staff of data managers, such as a data center) responsible for maintaining the finished resource.
Distributor
Institution tasked with responsibility to generate/disseminate copies of the resource in either electronic or print form.
Editor
A person who oversees the details related to the publication format of the resource.
Researcher
A person involved in analysing data or the results of an experiment or formal study. May indicate an intern or assistant to one of the authors who helped with research but who was not so “key” as to be listed as an author.
Rights Holder
Person or institution owning or managing property rights, including intellectual property rights over the resource.
Supervisor
Designated administrator over one or more groups/teams working to produce a resource or over one or more steps of a development process.
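Contributor roles can be restricted to the selected subset with an enumeration. The member values are the labels listed above. parse_role is a hypothetical helper, and da|ra's own machine-readable codes may be spelled differently.

```python
from enum import Enum


class ContributorRole(Enum):
    """The da|ra contributor types selected for LaMBDa (labels as listed above)."""

    DATA_COLLECTOR = "Data Collector"
    DATA_CURATOR = "Data Curator"
    DATA_MANAGER = "Data Manager"
    DISTRIBUTOR = "Distributor"
    EDITOR = "Editor"
    RESEARCHER = "Researcher"
    RIGHTS_HOLDER = "Rights Holder"
    SUPERVISOR = "Supervisor"


def parse_role(label: str) -> ContributorRole:
    """Map a free-text label to one of the selected roles, ignoring case and spaces."""
    normalized = "".join(label.split()).lower()
    for role in ContributorRole:
        if "".join(role.value.split()).lower() == normalized:
            return role
    raise ValueError(f"role not in the LaMBDa selection: {label!r}")
```

Roles outside the selection, such as other da|ra contributor types, are rejected rather than silently accepted.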

Knowledge Organization Systems

BSB-DDC is used for faceted classification by subject area, place, and time. The main purpose of BSB-DDC is the integration into the osmikon discovery system, which also uses these facets.

The JEL (Journal of Economic Literature) Classification System is used for subject classification in the domain of economics.

The European Thesaurus International Relations and Area Studies is mainly used for subject classification in the domain of political science and history.
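The sketch below models one record's classification to show how the systems divide the work. BSB-DDC supplies the subject, place, and time facets, while JEL and the European Thesaurus add domain-specific subject terms. The class and field names are hypothetical, and DDC notations are treated as opaque strings. The JEL check only approximates the code syntax: a top-level letter followed by up to two digits.

```python
import re
from dataclasses import dataclass, field

# Approximate JEL code syntax: a letter (A-R, Y, Z) plus up to two digits,
# e.g. "F", "F1", "F15". Syntax only; it does not check that a code is defined.
JEL_CODE = re.compile(r"[A-RYZ]\d{0,2}")


@dataclass
class Classification:
    # BSB-DDC facets; notations are kept as opaque strings in this sketch.
    ddc_subject: list[str] = field(default_factory=list)
    ddc_place: list[str] = field(default_factory=list)
    ddc_time: list[str] = field(default_factory=list)
    # Domain-specific schemes named in the text.
    jel: list[str] = field(default_factory=list)
    thesaurus_terms: list[str] = field(default_factory=list)  # European Thesaurus IR & Area Studies

    def __post_init__(self):
        bad = [code for code in self.jel if not JEL_CODE.fullmatch(code)]
        if bad:
            raise ValueError(f"malformed JEL codes: {bad}")

    def facets(self) -> dict[str, list[str]]:
        """Return the BSB-DDC facets, e.g. for indexing in a discovery system."""
        return {"subject": self.ddc_subject, "place": self.ddc_place, "time": self.ddc_time}
```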

Organizing Research Data

LaMBDa is based on the DKAN data platform. Within DKAN, the LaMBDa metadata schema supports three ways of organizing the research data in the repository: Collections, Datasets and Data Distributions.

Collections

Collections can be used as hierarchical organization systems in order to collect datasets according to their common provenance (e.g. output of the same project) or common source (e.g. transcribed from the same source). Such organizing principles may be combined in nested collections.

Note that the LaMBDa repository handles collections as DKAN Datasets of the type Collection. Collections can therefore be published via DOI registration, but they can also serve purely to organize content without being published (i.e. as containers for datasets in the institutional research data repository).
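Nested collections form a tree, so listing all datasets of a collection is a recursive walk. The model below is simplified and hypothetical. In LaMBDa a collection is itself a DKAN Dataset of type Collection, whereas here it is a separate class for clarity.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class Dataset:
    title: str


@dataclass
class Collection:
    """A collection groups datasets and, optionally, nested sub-collections."""

    title: str
    members: list[Union["Collection", Dataset]] = field(default_factory=list)
    published: bool = False  # e.g. registered with a DOI, or just a container


def iter_datasets(collection: Collection) -> Iterator[Dataset]:
    """Yield every dataset in the collection, descending into nested collections."""
    for member in collection.members:
        if isinstance(member, Collection):
            yield from iter_datasets(member)
        else:
            yield member


if __name__ == "__main__":
    # Outer collection by provenance (one project), inner one by common source.
    project = Collection(
        "Project X",
        [Dataset("Survey 2019"), Collection("Transcriptions", [Dataset("Letters"), Dataset("Diaries")])],
    )
    print([d.title for d in iter_datasets(project)])
```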

Datasets

DKAN Datasets are the main content type in LaMBDa. Note that a collection is represented in LaMBDa as a DKAN Dataset (content type) of the type Collection. Typical dataset types are Dataset, Text, Image, Geospatial, or Interactive Resource.

A dataset is classified with a concept from the research data type vocabulary.

The DCAT standard defines dataset as follows:

A collection of data, published or curated by a single agent, and available for access or download in one or more representations.
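The following simplified, hypothetical model represents a DKAN Dataset record typed with one of the research data types named above. Its check encodes one possible reading of the DCAT definition, which is an assumption of this sketch. A dataset other than a collection should be available in at least one representation, while a collection may be a pure container.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResearchDataType(Enum):
    # Types named in the text; the full research data type vocabulary may contain more.
    COLLECTION = "Collection"
    DATASET = "Dataset"
    TEXT = "Text"
    IMAGE = "Image"
    GEOSPATIAL = "Geospatial"
    INTERACTIVE_RESOURCE = "Interactive Resource"


@dataclass
class DkanDataset:
    """Simplified view of a DKAN Dataset node (hypothetical fields)."""

    title: str
    data_type: ResearchDataType
    publisher: str  # DCAT: published or curated by a single agent
    distributions: list[str] = field(default_factory=list)  # e.g. download URLs

    @property
    def is_collection(self) -> bool:
        return self.data_type is ResearchDataType.COLLECTION

    def check(self) -> list[str]:
        """Flag non-collection datasets that offer no representation (assumed rule)."""
        if not self.is_collection and not self.distributions:
            return [f"{self.title!r} has no distribution"]
        return []
```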

Data Distributions

Strictly speaking, a distribution should represent the data described in the metadata of a dataset, which is why it could more explicitly be called a data distribution. The DCAT standard defines distribution as follows:

A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above).

Therefore, distributions should not be used for data documentation or related publications. Related documents should be described as . This includes the classification of the relationship between the dataset and the document according to the da|ra Relation Type vocabulary. The vocabulary contains concepts like Is Supplement To, Is Documented By, Is Referenced By, etc.
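The distinction between distributions and related documents can be sketched as follows. Files that represent the data itself become distributions. Documents about the data are linked with a typed relation instead. The tuple format and split_attachments are hypothetical, and RelationType contains only the three relation types named above, not the full da|ra vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RelationType(Enum):
    # Excerpt of the da|ra Relation Type concepts named in the text.
    IS_SUPPLEMENT_TO = "Is Supplement To"
    IS_DOCUMENTED_BY = "Is Documented By"
    IS_REFERENCED_BY = "Is Referenced By"


@dataclass(frozen=True)
class Distribution:
    """Another representation of the same data, e.g. CSV vs. JSON."""

    url: str
    media_type: str


@dataclass(frozen=True)
class RelatedResource:
    """A document about the data, linked to the dataset by a typed relation."""

    identifier: str  # e.g. a DOI or URL of a codebook or publication
    relation: RelationType


def split_attachments(attachments: list[tuple[str, str, Optional[RelationType]]]):
    """Sort (url, media_type, relation) tuples into distributions and related resources.

    relation is None for files that are representations of the data itself.
    """
    distributions, related = [], []
    for url, media_type, relation in attachments:
        if relation is None:
            distributions.append(Distribution(url, media_type))
        else:
            related.append(RelatedResource(url, relation))
    return distributions, related
```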