What is it? Why is it important?

Raw Data (RD) is the original data obtained from a primary source (e.g. laboratory reports, participant questionnaires, medical examinations).

RD has not undergone any processing, whether manual or automated.

It is critical to clearly define any RD used in a study. Aspects to include are:

  • Description of the type of RD (e.g. weight = continuous data, gender = coded data, questionnaire = score data)
  • The source (Source Data, SD) from which RD was retrieved (e.g. medical records, laboratory reports, paper CRF)


Sometimes RD must be processed to convert it into a format that can be analysed and visualised. This usually entails some type of cleaning (harmonisation) or transformation. Any such data processing procedures are described and documented in the metadata of the study.




Examples of processed data:

  • The calculation of creatinine clearance requires specific RD (e.g. age, weight, serum creatinine) as input to the respective formula. Consequently, the creatinine clearance value is not considered RD but processed data
  • The calculation of BMI is based on participant height and weight. Consequently, height and weight are the RD needed to calculate BMI, while BMI is the derived data
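
The BMI example above can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the function name `bmi`, the input units (kilograms and metres) and the sample values are assumptions made here.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Derive BMI (kg/m^2) from the raw data weight and height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Raw data as recorded at the medical examination (illustrative values)
raw = {"weight_kg": 70.0, "height_m": 1.70}

# Derived data: computed from RD, so it is not RD itself
derived_bmi = round(bmi(raw["weight_kg"], raw["height_m"]), 1)
print(derived_bmi)
```

Keeping the raw values and the derived value in separate variables mirrors the distinction between RD and processed data.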


By comparing entries in the electronic database (eCRF) with the original RD, the monitor can confirm that the data entered in the database is correct.
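
The comparison of database entries with the original RD can be pictured as a field-by-field check. The sketch below is only an illustration under stated assumptions: the function `find_discrepancies`, the variable names and the values are hypothetical, and real monitoring follows the study's monitoring plan rather than a script like this.

```python
def find_discrepancies(raw_record: dict, ecrf_record: dict) -> dict:
    """Return {variable: (raw value, eCRF value)} for every mismatch."""
    variables = raw_record.keys() | ecrf_record.keys()
    return {
        name: (raw_record.get(name), ecrf_record.get(name))
        for name in sorted(variables)
        if raw_record.get(name) != ecrf_record.get(name)
    }


# Hypothetical raw data and the corresponding eCRF entry
raw_record = {"weight_kg": 70.0, "height_cm": 170, "sex": "F"}
ecrf_record = {"weight_kg": 77.0, "height_cm": 170, "sex": "F"}

discrepancies = find_discrepancies(raw_record, ecrf_record)
print(discrepancies)
```

A variable that is present in only one of the two records is also reported, with `None` standing in for the missing value.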

What do I need to do?

As an SP-INV, familiarise yourself with how to specify and document study data:

  • List variables needed for study evaluation
  • Define RD and its source (e.g. participant weight is located on paper-CRF from medical examinations at screening)
  • In the event of derived or processed data, indicate the formula used (e.g. computation of pain score based on participant questionnaire)
  • Define data format and prioritise standardised or international formats
  • Define quality checks needed to locate computation errors
  • Document implemented processing procedures in the metadata of the study
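
One possible way to record such a specification is a small, structured variable list. The following Python sketch is an assumption about how this could look, not a required format: the class `VariableSpec`, its field names and the example entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableSpec:
    name: str
    data_type: str                  # e.g. "continuous", "coded", "score"
    unit_or_format: str             # agreed unit or format
    source: str                     # where the RD is located, or "derived"
    formula: Optional[str] = None   # filled in for derived data only
    quality_check: Optional[str] = None

    @property
    def is_raw(self) -> bool:
        # A variable without a formula is treated as raw data here
        return self.formula is None


specs = [
    VariableSpec("weight", "continuous", "kg",
                 "paper CRF, medical examination at screening",
                 quality_check="plausible range"),
    VariableSpec("height", "continuous", "cm",
                 "paper CRF, medical examination at screening",
                 quality_check="plausible range"),
    VariableSpec("bmi", "continuous", "kg/m^2", "derived",
                 formula="weight_kg / (height_cm / 100) ** 2"),
]

raw_names = [spec.name for spec in specs if spec.is_raw]
print(raw_names)
```

Listing the formula next to each derived variable keeps the processing step documented alongside the RD it depends on.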


Examples of data formatting


  • Date formats: 31st January 1999, 31/01/1999, 31.1.99, 99.01.31, 31011999, or today
  • Computation:
    • metrics: 1,70 m or 170 cm
    • time: 1:15 min or 75 sec
  • Coding: female=F, male=M, or female=1, male=2
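
Differing formats like those listed above are usually harmonised into one agreed representation. The sketch below shows one way this might be done in Python. It is illustrative only and rests on several assumptions: ISO 8601 dates and centimetres are chosen as targets, each source declares its date format explicitly, and the names `to_iso_date`, `height_to_cm` and `SEX_CODES` are hypothetical.

```python
from datetime import datetime

# Both coding schemes from the list, mapped to a single scheme (assumed target: F/M)
SEX_CODES = {"F": "F", "M": "M", "1": "F", "2": "M"}


def to_iso_date(text: str, declared_format: str) -> str:
    """Parse a date with the format declared for its source; return YYYY-MM-DD."""
    return datetime.strptime(text.strip(), declared_format).date().isoformat()


def height_to_cm(text: str) -> float:
    """Convert entries such as '1,70 m' or '170 cm' to centimetres."""
    value, unit = text.strip().split()
    number = float(value.replace(",", "."))  # accept a decimal comma
    if unit == "m":
        return round(number * 100, 1)
    if unit == "cm":
        return number
    raise ValueError(f"unknown unit: {unit}")


print(to_iso_date("31/01/1999", "%d/%m/%Y"))
print(to_iso_date("31.1.99", "%d.%m.%y"))
print(height_to_cm("1,70 m"))
print(SEX_CODES["1"])
```

Requiring a declared format per source is a deliberate choice: an entry such as 01.02.03 cannot be interpreted reliably by guessing, and free text such as "today" is rejected with a `ValueError` rather than silently converted.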



Errors to consider when processing RD include


Formatting errors:

Different study sites or departments might report blood MCHC differently (e.g. 32 g/dl, 320 g/l, 4.81 mmol/l or 13%). Depending on how the MCHC is documented in the study database, some conversion might be required.
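
As a minimal sketch of such a conversion, the Python snippet below converts MCHC values reported in g/dl to g/l. The choice of g/l as the target unit and the names `TO_G_PER_L` and `mchc_to_g_per_l` are assumptions for illustration. Values in mmol/l or % are deliberately not converted here, because the conversion depends on conventions that the study itself has to define.

```python
# Conversion factors to the assumed target unit g/l
TO_G_PER_L = {"g/l": 1.0, "g/dl": 10.0}


def mchc_to_g_per_l(value: float, unit: str) -> float:
    """Convert an MCHC value to g/l; reject units without an agreed factor."""
    key = unit.strip().lower()
    if key not in TO_G_PER_L:
        raise ValueError(f"no agreed conversion for unit: {unit}")
    return value * TO_G_PER_L[key]


print(mchc_to_g_per_l(32, "g/dl"))
print(mchc_to_g_per_l(320, "g/l"))
```

Rejecting unknown units instead of passing the value through avoids mixing unconverted entries into the study database.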

Computation errors:

These arise from human, machine, or instrument errors. Data should therefore be checked, and unreasonable entries flagged as “suspect”. Such entries may require reconfirmation.
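
A simple range check is one way to flag unreasonable entries. The Python sketch below is illustrative only: the limits in `PLAUSIBLE_RANGES` are hypothetical placeholders, not clinical reference values, and the function name `flag_suspect` is an assumption. Each study would define its own limits.

```python
# Hypothetical plausibility limits; every study must define its own
PLAUSIBLE_RANGES = {
    "weight_kg": (30.0, 250.0),
    "height_cm": (100.0, 230.0),
}


def flag_suspect(record: dict) -> list:
    """Return the variables whose values are missing or outside the limits."""
    suspects = []
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            suspects.append(name)
    return suspects


# 700 kg is most likely a typing error for 70.0 kg
print(flag_suspect({"weight_kg": 700.0, "height_cm": 170.0}))
```

Flagged entries are not corrected automatically; they are only marked so that they can be reconfirmed against the source.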


Where can I get help?

Experienced staff at your local CTU can support you with this topic.

  • CRF – Case Report Form
  • CTU – Clinical Trials Unit
  • RD – Raw Data
  • SD – Source Data
  • SP-INV – Sponsor Investigator

Please note: the Easy-GCS tool is currently under construction.