DATA LIFE CYCLE

Data typically have a longer lifespan than the project in which they were collected or created.

The data life cycle illustrates the stages of research data development: the initial conceptualization of the research problem, identification of existing data sources, collection of research data (both primary and secondary), data processing and analysis, publication of research results, and finally the deposit of research data in recommended repositories, where data professionals ensure the long-term accessibility and usability of the data.

The goal is to ensure secure, documented, and transparent handling of data, as well as to enable their reuse. The ultimate outcome of careful management of research data is a data publication accessible in a data center, which can be cited by others as a scientific reference in their articles and other publications.

In the following, we provide a detailed overview of all the phases of the data life cycle and offer recommendations for handling research data with the aim of ensuring quality, long-term stewardship, and usability of the data after the project’s completion.

The CESSDA archives have prepared an online guide, the Data Management Expert Guide (DMEG), which we recommend to researchers and students for preparing to collect research data. The chapters in the guide are helpful for completing the Data Management Plan form, which is tailored for the social sciences and suitable when planning to publish data in one of the social science data archives within the consortium.

The chapters in DMEG follow the data life cycle:

0. PLANNING

Careful planning is essential when designing a research project, and the Data Management Plan (DMP) is a valuable tool in this process. The purpose of the DMP is to encourage researchers to think through how the collected data will be organized and published, both before work with the data begins and during data processing, so that the data remain accessible and usable in the long term for various users, in line with the FAIR principles.

Data management planning involves anticipating the typical steps in the data life cycle with the aim of:

  • Ensuring the security of sensitive or confidential data,
  • Maximizing the potential usability of the data, and
  • Supporting their long-term accessibility.

The ultimate goal of data management planning is the publication of data in a recommended repository.

Keep the following in mind when choosing and maintaining a DMP:

  • A DMP form may be required by a funder or a research organization. Therefore, before deciding which form to use, it is important to check the requirements specified in the research funding agreement and/or the rules of the research organization.
  • The DMP is a “living” document that evolves with the needs of the project and its participants. During the project, the DMP is updated to ensure that it tracks changes over time and reflects the current state of the research project, so multiple versions should be available by the end of the project.

The social science data archives from the CESSDA consortium have prepared the CESSDA Data Management Planning Form, which follows the chapters from the DMEG guidebook.

The English version of the DMP checklist is available in .pdf and .docx formats.

For more information, see the chapter Plan.

I. DISCOVER

When designing a research project, we identify available data sources and related materials, such as questionnaires, syntax files, etc.

The main activities in this phase include:

  • Identifying appropriate data repositories,
  • Searching for relevant data,
  • Evaluating the data,
  • Gaining access to the data,
  • Reviewing the terms of data use.

For more information, see the chapter Discover.

II. ORGANIZE & DOCUMENT

During and after the project, it is essential to ensure the traceability of the creation and version changes of research data, so that research steps are understandable even to those who did not participate in the study. For this purpose, the researcher must appropriately organize and document the steps and processing of materials while working with new or secondary data.

This phase includes:

  • Selection of the appropriate methodology,
  • Sampling or selection of units or research participants,
  • Types and formats of data,
  • Identification of potential ethical and legal constraints,
  • Use of secondary data, if available,
  • Data collection: recording, observation, measurement,
  • Documentation of project and data procedures,
  • Preparation of final versions of materials and metadata.
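
One common way to keep version changes traceable is a consistent file naming convention. The following is a minimal sketch, not a CESSDA requirement: the naming pattern (project, content, date, version) and the helper names `versioned_name` and `next_version` are assumptions chosen for illustration.

```python
import re
from datetime import date


def versioned_name(project: str, content: str, version: int, day: date, ext: str) -> str:
    """Build a file name of the form project_content_YYYY-MM-DD_vNN.ext."""
    def slug(text: str) -> str:
        # Lowercase and replace anything that is not a letter or digit with '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(content)}_{day.isoformat()}_v{version:02d}.{ext.lstrip('.')}"


def next_version(existing: list[str]) -> int:
    """Return the next free version number given existing file names."""
    versions = [
        int(match.group(1))
        for name in existing
        if (match := re.search(r"_v(\d+)\.", name))
    ]
    return max(versions, default=0) + 1


if __name__ == "__main__":
    files = ["youth-survey_raw-data_2024-03-01_v01.csv"]
    print(versioned_name("Youth Survey", "raw data", next_version(files), date(2024, 3, 5), "csv"))
```

Keeping the date and version in the file name means that earlier versions are never overwritten, and the order of changes stays visible to people who did not take part in the study.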

For more information, see the chapter Organize & Document.

III. PROCESS

In this phase, the data must be prepared for analysis and publication. Since the data may be processed and reorganized multiple times, and multiple people may be involved in the processing, it is important to ensure clear instructions and traceability of the work to help maintain credibility.

Typical activities in the data processing and analysis phase include:

  • Data entry, digitization, transcription, translation,
  • Data verification, validation, cleaning, anonymization where necessary,
  • Data derivation,
  • Description and documentation of data,
  • Data analysis,
  • Data interpretation,
  • Preparation of research results.
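
The verification and validation step listed above can be illustrated with a small check for missing and out-of-range values. This is a sketch under stated assumptions: the field names, the permitted ranges, and the function `validate_records` are hypothetical and would need to be replaced by the rules in your own codebook.

```python
def validate_records(records, rules):
    """Return a list of (row_index, field, problem) tuples.

    `rules` maps a field name to its permitted (low, high) range.
    Nothing is changed in the data; problems are only reported, so that
    every later correction can be documented separately.
    """
    problems = []
    for index, row in enumerate(records):
        for field, (low, high) in rules.items():
            value = row.get(field)
            if value is None or value == "":
                problems.append((index, field, "missing"))
            elif not (low <= value <= high):
                problems.append((index, field, f"out of range [{low}, {high}]"))
    return problems


if __name__ == "__main__":
    # Hypothetical survey records and codebook ranges.
    records = [
        {"age": 34, "satisfaction": 4},
        {"age": 212, "satisfaction": None},
    ]
    rules = {"age": (15, 110), "satisfaction": (1, 5)}
    for problem in validate_records(records, rules):
        print(problem)
```

Reporting problems instead of silently fixing them supports the traceability described above, because each correction can then be recorded as a separate, documented step.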

For more information, see the chapter Process.

IV. STORE

In this phase, it is crucial to ensure the proper storage of data, protecting them from loss or unauthorized use. Planning for storage requires special attention when handling sensitive or confidential data.

Typical activities in this phase include:

  • Ensuring an appropriate location for data storage,
  • Maintaining backup copies,
  • Regulating access to the data.
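
Maintaining backup copies can be sketched as copying a file and then confirming, by comparing checksums, that the copy is identical to the original. The example below is a minimal illustration that assumes a single file and a local backup folder; the function names `sha256_of` and `backup_file` are hypothetical, and in practice your institution's storage service may already handle this step.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and verify the copy by checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

A checksum recorded at backup time can also be compared later to detect whether a stored file has been damaged or altered.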

For more information, see the chapter Store.

V. PROTECT

The type of data you handle will dictate the applicable legislation and ethical standards you must adhere to.

The main activities in this phase are:

  • Understanding and adhering to:
    • European data protection regulations (GDPR),
    • National legislation (e.g., ZVOP-1, ZASP, etc.),
    • Ethical committee requirements,
    • Other agreements, such as those made in the project or with research participants.

Activities in this phase, such as anonymization, setting access regimes for different users, and other possible restrictions, are guided by the consents previously obtained for participation in the research.

For more information, see the chapter Protect.
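
As one illustration of a protective measure, direct identifiers can be replaced with pseudonyms before data are shared within a project. The sketch below is an assumption-laden example, not a complete anonymization procedure: the field names, the key handling, and the functions `pseudonymize` and `strip_direct_identifiers` are hypothetical. Note that pseudonymized data are still personal data under the GDPR, because whoever holds the key can re-identify participants.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def strip_direct_identifiers(record: dict, id_field: str, drop_fields: set, secret_key: bytes) -> dict:
    """Return a copy of `record` without direct identifiers, plus a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key != id_field and key not in drop_fields
    }
    cleaned["pseudonym"] = pseudonymize(str(record[id_field]), secret_key)
    return cleaned


if __name__ == "__main__":
    # Hypothetical record; the key must be stored separately from the data.
    key = b"replace-with-a-secret-key-kept-outside-the-dataset"
    record = {"respondent_id": "R-0042", "name": "Jane Doe", "email": "jane@example.org", "age": 34}
    print(strip_direct_identifiers(record, "respondent_id", {"name", "email"}, key))
```

Removing direct identifiers does not by itself rule out re-identification through combinations of other variables, so the result should still be assessed against the consents obtained and the applicable legislation.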

VI. PUBLISH

The decision on where to publish newly generated data should ideally be made during the research planning phase. At the outset, it is important to review the conditions of the chosen repository and align with its acceptance requirements in advance.

To make the best decision, consider the following aspects:

  • Selection of data based on its potential for future use,
  • Advantages of different data service providers (general, institutional, disciplinary data repositories),
  • Additional opportunities for promoting the data publication (data article, blog, presentation).

For more information, see the chapter Archive & Publish.

Source of content and images: CESSDA Training Team (2017-2022). CESSDA Data Management Expert Guide. Bergen, Norway: CESSDA ERIC. Retrieved from https://dmeg.cessda.eu/