First log in to LaMBDa with your user account or register if you do not have an account. You will find the Log in and Register buttons on the right side of the main menu.
Creating a new Dataset
After you have registered a user account and logged in to the system, select Dataset from the Add content submenu of the Content menu in the administration menu bar at the top of the page to create a new dataset.
Describing Research Data
The metadata entry form for a dataset is quite extensive. It is divided into several groups of metadata fields, and the fields are organized in tabs to make metadata management easier. The metadata blocks are explained step by step below.
Note that only a limited number of fields are mandatory. These fields are required to register a DOI for the dataset in the publishing workflow (see section Publishing a Dataset).
Authority files (GND and ORCID), controlled vocabularies, and knowledge organization systems are used to identify creators, contributors, or publishers and to provide subject classification and methodological information. See the documentation page about LaMBDa's Metadata Schema for more details.
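As a small illustration of why authority identifiers are more dependable than free-text names, an ORCID iD carries a check character, so a mistyped identifier can be caught before it is entered for a creator or contributor. The sketch below assumes the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function name is hypothetical and not part of LaMBDa, and the check says nothing about whether the iD is actually registered.

```python
import re

# Four hyphen-separated groups of four characters; only the last may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the check character of an ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    # ISO 7064 MOD 11-2 over the first 15 digits.
    total = 0
    for digit in compact[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return compact[15] == expected


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check character
```

A check like this only catches typing errors; the entry form itself may apply different or additional validation.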
The Create Dataset form begins with a block of basic bibliographic metadata:
- Original title of the dataset (in whatever language it is given).
- Translated Title
- English translation of the original title (if the original title is not given in English).
- Original subtitle of the dataset (in whatever language it is given).
- Translated Subtitle
- English translation of the original subtitle (if the original subtitle is not given in English).
- Short Title
- Short title of the dataset (should be language independent).
- Person or institution who or which created the data.
- Person or institution who or which edited the data.
- Publication Date
- The date of publication.
- Publication Year
- The year of publication.
- The publisher of the dataset.
Please note that there is a distinction between authorship and contributorship: a dataset has one or more creators, and it can have one or more contributors. The specific role of a contributor can be selected from the da|ra Contributor Type Role vocabulary. Typical contributor roles are Data Collector, Data Curator, and Data Manager.
Finally, the block has three metadata fields for information organization related to the LaMBDa research data repository:
- The division or research group where the research data was produced.
- Topics (main research areas) used for the thematic organization of research data.
- Keywords to describe the research data.
- The general type of research data in the dataset.
- The subtype of the dataset (considered as subtype of the general resource type).
- The DOI (Digital Object Identifier) of the dataset (available after publication and DOI registration).
- The collection that the dataset is part of (if applicable).
- Data Standard
- Used to identify a standardized specification the dataset conforms to.
- The language of the textual values of a dataset distribution.
Different classification systems and thesauri can be used to describe the topical coverage of the dataset.
The JEL (Journal of Economic Literature) Classification System is applied for subject classification in the domain of economics.
The European Thesaurus International Relations and Area Studies can be applied for subject classification in the domains of political science, area studies, and history.
GND Subject Headings can be used for general subject classification.
This block contains information about the spatial / geographical coverage of the data:
- Spatial / Geographical Coverage Area
- Spatial coverage of the dataset described by bounding box (or polygon) coordinates.
- Spatial / Geographical Coverage Location
- Spatial coverage of the dataset described by free keywords.
- Spatial / Geographical Coverage Controlled (ISO 3166)
- Spatial coverage of the dataset described by ISO 3166 Country Codes.
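To make the bounding box option more concrete, the following minimal sketch models a box by its four edges and checks that the coordinates are plausible. It assumes WGS 84 decimal degrees, which this guide does not state explicitly; the class name is hypothetical, the coordinate values are rough illustrative numbers, and boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical bounding box in decimal degrees (assumed WGS 84)."""

    west: float
    south: float
    east: float
    north: float

    def __post_init__(self) -> None:
        # Longitudes must be in range and ordered west to east.
        if not (-180.0 <= self.west <= self.east <= 180.0):
            raise ValueError("expected -180 <= west <= east <= 180")
        # Latitudes must be in range and ordered south to north.
        if not (-90.0 <= self.south <= self.north <= 90.0):
            raise ValueError("expected -90 <= south <= north <= 90")

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside or on the edge of the box."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Rough illustrative values, not taken from a real dataset.
box = BoundingBox(west=5.0, south=47.0, east=16.0, north=55.0)
print(box.contains(10.0, 50.0))
```

Free keywords and ISO 3166 country codes complement such a box: the box is precise but opaque to readers, while a keyword or country code is readable but coarse.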
This block contains information about the temporal coverage of the data.
BSB-DDC Faceted Classification
BSB-DDC is used for faceted classification of subject area, geographical and temporal coverage of the dataset:
- Subject Area
- Scientific disciplines as subject areas.
- Taxonomy of regions and countries.
- Time intervals for temporal coverage description.
- The population of a study.
- Analysis Unit
- The analysis unit of a study.
- Sampling Procedure
- The sampling method used to select, for example, the survey respondents who represent the population (sampled universe).
- Textual description of the sampling.
- Data Collection Method
- The method used to collect the data.
- Time Dimension
- Describes the time dimension of the data collection.
- Theoretical Background
- Theoretical background, paradigm, or research program which is the conceptual framework of the data collection.
- Disciplinary Context
- Disciplinary context of the data collection.
Data License Information and Access Rights
- Describes the license of the data.
- Access Rights (Availability)
- Describes the availability of the data with concepts from the COAR Access Rights Vocabulary.
To make the research data reusable, provenance information should be provided. Following the best practice of DCAT application profiles, this consists of a textual description of the provenance of the data (possibly in narrative form) and information about the source(s) used to create the data (a bibliographic or archival description and/or a link to the resource):
- Documentation of data lineage.
- Source(s) used to create the data.
- Data Source Type
- Type of the data source.
Information about the project context of a dataset (as an output of the project) can be provided in the following fields:
- The name of the project.
- Link to a project website or record in a research information system.
Name and email address of the contact person for the dataset.
- DOI Proposal
- Proposed DOI (Digital Object Identifier) for registration via the da|ra registration agency.
- URL for the landing page of the dataset.
Related resources like publications or data documentation are added as qualified relations:
- Relation Type
- The type of the relationship between the dataset and the resource (using a type from the da|ra Relation Type controlled vocabulary).
- Resource Title
- The title of the resource (e.g. the document title).
- Resource Link
- The URL or DOI of the resource.
The three most important relation types to support data citation are Is Cited By, Is Referenced By, and Is Supplement To. See RelationType for Citations and References on the DataCite website for more information.
Note: Versioning is supported by relating two datasets: A dataset is linked to its previous version via the Is Previous Version Of relation.
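The three fields of a qualified relation can be pictured as a small record. The sketch below is only an illustration of that structure: the class and function names are hypothetical, the relation type labels are the ones named in this section (the full da|ra vocabulary is larger), and the titles and links are placeholders.

```python
from dataclasses import dataclass

# The relation types this guide names as most important for data citation.
CITATION_RELATION_TYPES = frozenset(
    {"Is Cited By", "Is Referenced By", "Is Supplement To"}
)


@dataclass(frozen=True)
class RelatedResource:
    """Hypothetical record mirroring the three relation fields of the form."""

    relation_type: str   # Relation Type
    resource_title: str  # Resource Title
    resource_link: str   # Resource Link (URL or DOI)


def citation_relations(relations: list) -> list:
    """Keep only the relations that support data citation."""
    return [r for r in relations if r.relation_type in CITATION_RELATION_TYPES]


# Placeholder entries for illustration only.
relations = [
    RelatedResource("Is Cited By", "Example article", "https://example.org/article"),
    RelatedResource("Is Previous Version Of", "Example dataset", "https://example.org/dataset"),
]
print([r.resource_title for r in citation_relations(relations)])
```

Keeping the relation type separate from the title and link is what makes the relation "qualified": the same link can be interpreted as a citation, a supplement, or a version, depending on the selected type.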
Already existing data distributions (DKAN resources) can be added to the dataset.
Adding a Resource to a Dataset
After entering the metadata for the dataset, the next step is to add a data distribution to the dataset. In order to add a data resource, click the button Next: Add data below the dataset form.
When a resource is created, a URL alias is generated automatically by default. The URL alias pattern tries to include the file type of the uploaded file as the last URL segment. So, if the resource has an Excel file with the file extension .xls, the string xls is added to the URL alias. This results, for example, in the URL alias dataset/ces-2012/resource/xls.
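The alias pattern described above can be sketched as follows. This is not the code the repository runs, only a minimal reconstruction of the described behaviour; the function name, the file name, and the fallback for files without an extension are assumptions.

```python
from pathlib import PurePosixPath


def resource_url_alias(dataset_slug: str, filename: str) -> str:
    """Build an alias of the form dataset/<slug>/resource/<extension>."""
    # Take the last file extension, without the dot, in lower case.
    extension = PurePosixPath(filename).suffix.lstrip(".").lower()
    alias = f"dataset/{dataset_slug}/resource"
    # Assumed fallback: without an extension, stop at the resource segment.
    return f"{alias}/{extension}" if extension else alias


# "results.xls" is a made-up file name; "ces-2012" is the slug from the example.
print(resource_url_alias("ces-2012", "results.xls"))  # dataset/ces-2012/resource/xls
```

Note that two resources of the same file type in one dataset would map to the same string under this simplified sketch; how the system keeps such aliases unique is not covered here.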
Publishing a Dataset
Datasets are published through a research data management workflow. The workflow includes a metadata review process to control metadata quality.
Metadata should be provided by the creators and contributors of a dataset. The provided metadata has to be reviewed by a responsible metadata librarian before the dataset can be published via the da|ra registration agency.
Research Data Publishing
The da|ra DOI registration service requires the following mandatory metadata properties:
- The type of the dataset has to be selected from the selection list Type (in the tab Dataset Information).
- The original title of the dataset in the field Title.
- At least one name entry in the Creator field is required.
- Publication Date
- The date of publication of the dataset in the field Publication Date.
- The availability of the data has to be selected from the selection list Access Rights (Availability) (in the tab Data License Information and Access Rights).
- URL (in the tab Administrative Metadata).
Note that it is recommended to provide a DOI in the field DOI Proposal (in the tab Administrative Metadata); otherwise, the DOI suffix is generated automatically by the da|ra system.
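Before handing a dataset over for review, it can help to check the mandatory properties in one pass. The sketch below assumes the metadata is available as a simple dictionary keyed by the field labels used in this guide; that layout, the function name, and the sample values are assumptions for illustration, not an export format of LaMBDa or da|ra.

```python
# Field labels as used in this guide; the dictionary layout is an assumption.
MANDATORY_FIELDS = (
    "Type",
    "Title",
    "Creator",
    "Publication Date",
    "Access Rights (Availability)",
    "URL",
)


def missing_mandatory_fields(record: dict) -> list:
    """Return the mandatory fields that are absent or empty in the record."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        # Whitespace-only strings count as empty.
        if isinstance(value, str):
            value = value.strip()
        # None, empty strings, and empty lists are all treated as missing.
        if not value:
            missing.append(field)
    return missing


# Made-up draft record: the publication date is empty and the URL is absent.
draft = {
    "Type": "Dataset",
    "Title": "Example Survey 2012",
    "Creator": ["Doe, Jane"],
    "Publication Date": "",
    "Access Rights (Availability)": "Open Access",
}
print(missing_mandatory_fields(draft))  # ['Publication Date', 'URL']
```

DOI Proposal is deliberately left out of the tuple, since it is recommended rather than mandatory.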
The metadata of the dataset is transferred to the da|ra registration agency for DOI registration. The metadata is also indexed in DataCite for information retrieval.