What is it? Why is it important?

Raw Data (RD) is considered the original data derived from some primary source (e.g. laboratory reports, participant questionnaires, medical examinations).

RD has not undergone any processing, either manually or through automated processing.

It is critical to clearly define any RD used in a study. Aspects to include are:

  • Description of the type of RD (e.g. weight = continuous data, gender = coded data, questionnaire = score data)
  • The source (Source Data) from where RD was retrieved (e.g. medical records, laboratory reports, paper CRF)


Sometimes RD must be processed to convert it into a format that can be analysed and visualised. This usually entails some type of cleaning (harmonisation) or transformation, and the result is no longer considered RD. For example:


  • The calculation of creatinine clearance requires several RD (e.g. age, weight, serum creatinine) as input to an applicable formula. Consequently, the creatinine clearance value is not considered RD but processed data
  • The calculation of BMI is based on participant height and weight. Consequently, height and weight are the required RD, while the BMI is derived data
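
The two derivations above can be sketched in code. This is a minimal illustration, not a study-specific implementation: the function names are hypothetical, and the Cockcroft-Gault equation is assumed here as one commonly used creatinine clearance formula. A given study may specify a different formula in its metadata.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Derived data: body mass index (kg/m^2) from the RD weight and height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def creatinine_clearance_cg(age_years: float, weight_kg: float,
                            serum_creatinine_mg_dl: float, female: bool) -> float:
    """Processed data: creatinine clearance estimate in mL/min.

    Assumption: Cockcroft-Gault equation, serum creatinine given in mg/dL.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    clearance = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    # Cockcroft-Gault applies a factor of 0.85 for female participants.
    return clearance * 0.85 if female else clearance


# Illustrative values only, not taken from any study.
print(round(bmi(weight_kg=70.0, height_m=1.75), 1))
print(round(creatinine_clearance_cg(60, 72.0, 1.0, female=False), 1))
```

The inputs to both functions are RD and remain unchanged. Only the returned values are derived, which is why the formula and its units belong in the study metadata.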

Any such data processing procedures are described and documented in the metadata of the study.

By comparing entries in the electronic database (eCRF) with the original RD, the monitor can confirm that the data entered in the database are correct.

What do I need to do?

Familiarise yourself with how to specify and document study data:

  • List variables needed for study evaluation
  • Define RD and its source (e.g. participant weight is located on the paper CRF from medical examinations at screening)
  • For derived or processed data, indicate the formula used (e.g. computation of pain score based on participant questionnaire)
  • Define data format and prioritise standardised or international formats
  • Define quality checks needed to locate computation errors
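
One way to capture the points above is a small, machine-readable data dictionary. The sketch below is only an assumption about how such a specification could look: the class `VariableSpec`, its fields, and the example entries are hypothetical and would need to be adapted to the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableSpec:
    name: str
    data_type: str                # e.g. "continuous", "coded", "score"
    unit_or_format: str           # prefer standardised or international formats
    source: str                   # where the data are located
    derived_from: tuple = ()      # names of the RD inputs; empty for RD
    formula: Optional[str] = None

    @property
    def is_raw(self) -> bool:
        return not self.derived_from


# Illustrative entries only.
DATA_DICTIONARY = [
    VariableSpec("weight", "continuous", "kg",
                 "paper CRF, medical examination at screening"),
    VariableSpec("height", "continuous", "m",
                 "paper CRF, medical examination at screening"),
    VariableSpec("bmi", "continuous", "kg/m^2", "computed in study database",
                 derived_from=("weight", "height"),
                 formula="weight / height**2"),
]


def undefined_inputs(dictionary):
    """Return (derived variable, missing input) pairs for inputs never defined."""
    known = {spec.name for spec in dictionary}
    return [(spec.name, dep)
            for spec in dictionary
            for dep in spec.derived_from
            if dep not in known]


print(undefined_inputs(DATA_DICTIONARY))  # an empty list means every input is defined
```

A check like `undefined_inputs` is a simple quality check on the specification itself: every derived variable should trace back to RD that has a documented source.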


Examples of data formatting

  • Date formats:

31st January 1999, 31/01/1999, 31.1.99, 99.01.31, 31011999, or today

  • Computation:
    • metrics: 1,70 m or 170 cm
    • time: 1:15 min or 75 sec
  • Coding:

female=F, male=M or female=1, male=2
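
The formatting examples above can be harmonised along the following lines. This is a hedged sketch: all function names are hypothetical, ISO 8601 (YYYY-MM-DD) is assumed as the target date format, and the code mapping assumes the female=1, male=2 convention shown above. The source format of each date must be known per site, because strings such as 31.1.99 and 99.01.31 cannot be told apart reliably by guessing.

```python
from datetime import datetime


def to_iso_date(text: str, source_format: str) -> str:
    """Convert a date string in a known source format to ISO 8601."""
    return datetime.strptime(text, source_format).date().isoformat()


def parse_decimal(text: str) -> float:
    """Accept both decimal comma ("1,70") and decimal point ("1.70")."""
    return float(text.replace(",", "."))


def height_to_cm(value: float, unit: str) -> float:
    factors = {"m": 100.0, "cm": 1.0}
    if unit not in factors:
        raise ValueError(f"unsupported height unit: {unit!r}")
    return value * factors[unit]


def min_sec_to_seconds(text: str) -> int:
    """Convert "minutes:seconds" (e.g. "1:15") to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Assumed coding convention: female=1, male=2.
SEX_CODES = {"f": "F", "female": "F", "1": "F",
             "m": "M", "male": "M", "2": "M"}


def harmonise_sex(raw) -> str:
    key = str(raw).strip().lower()
    if key not in SEX_CODES:
        raise ValueError(f"unknown sex code: {raw!r}")
    return SEX_CODES[key]


print(to_iso_date("31/01/1999", "%d/%m/%Y"))
print(to_iso_date("99.01.31", "%y.%m.%d"))
print(min_sec_to_seconds("1:15"))
print(harmonise_sex(2))
```

Entries that do not match the declared format, such as "today", raise a `ValueError` instead of being silently converted, so they can be queried at the source.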

Errors to consider when processing RD include:

  • Formatting errors:

Different study sites or departments might report blood MCHC differently (e.g. 32 g/dl, 320 g/l, 4.81 mmol/l or 13%). Depending on how the MCHC is documented in the study database, some conversion might be required

  • Computation errors:

These result from human, machine, or instrument errors. Data should therefore be checked, and unreasonable entries should be flagged as “suspect”. Such entries may require reconfirmation
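
Both error types can be addressed with simple automated checks. The sketch below is illustrative only: the function names are hypothetical, the plausibility limits are placeholder values that a study would need to define itself, and only the g/l to g/dl conversion is implemented. Conversions from mmol/l or % depend on a conversion factor that must be specified in the study metadata, so they are deliberately rejected here.

```python
def mchc_to_g_per_dl(value: float, unit: str) -> float:
    """Harmonise MCHC to g/dl (the target unit is an assumption)."""
    unit = unit.strip().lower()
    if unit == "g/dl":
        return value
    if unit == "g/l":
        return value / 10.0  # 1 dl = 0.1 l
    # Other units need a study-defined conversion factor.
    raise ValueError(f"no conversion defined for unit: {unit!r}")


# Hypothetical plausibility limits in g/dl; not a clinical reference range.
PLAUSIBLE_MCHC_G_DL = (25.0, 40.0)


def is_suspect(value: float, limits=PLAUSIBLE_MCHC_G_DL) -> bool:
    """Flag values outside the plausibility limits for reconfirmation."""
    low, high = limits
    return not (low <= value <= high)


# Illustrative entries: (value, unit as reported by the site).
entries = [(32.0, "g/dl"), (320.0, "g/l"), (320.0, "g/dl")]
for value, unit in entries:
    harmonised = mchc_to_g_per_dl(value, unit)
    status = "suspect" if is_suspect(harmonised) else "ok"
    print(value, unit, "->", harmonised, "g/dl", status)
```

In this sketch the third entry is flagged as suspect: a value of 320 recorded with the unit g/dl is most likely a unit mix-up and would need to be reconfirmed against the source rather than corrected automatically.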

Any data processing procedures are described and documented in the metadata of the study.

Where can I get help?

Your local CTU has experienced staff who can support you with this topic.

  • CRF – Case Report Form
  • CTU – Clinical Trials Unit
  • RD – Raw Data
  • SD – Source Data
