What is Data?
Data can be defined as “a set of values of qualitative or quantitative variables” (Bernstein, 2009, p. 2). In its raw form, data is unprocessed: it must be interpreted and structured before it becomes meaningful. Data can take numerous forms, including numbers, text, images, audio, and sensor outputs. As Rowley (2007) explains, the meaning of data is context-dependent; without context, data is inert. With appropriate structuring and analysis, data can yield valuable insights that support effective decision-making.
The DIKW Pyramid
The DIKW (Data, Information, Knowledge, Wisdom) hierarchy describes the progression from raw data to actionable wisdom:
- Data: discrete facts or observations without inherent meaning.
- Information: data organised and contextualised to answer basic questions such as who, what, where, and when.
- Knowledge: interpreted information combined with experience or rules.
- Wisdom: the application of knowledge to make judgements and decisions.
Merkus, Helms and Kusters (2019) note that the DIKW model is widely used in knowledge management, although it has been subject to critique. Bratianu and Bejinaru (2023) argue that the model oversimplifies the iterative and dynamic nature of knowledge creation. Despite these criticisms, the pyramid remains a useful starting framework for understanding how analysts add value by transforming data into insight.
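To make the progression concrete, the short Python sketch below walks a handful of temperature readings up the pyramid. The sensor names, the values, and the 70-degree threshold are all hypothetical, and the final "wisdom" step is only a stand-in: in practice that judgement is made by people, which is part of the critique noted above.

```python
from statistics import mean

# Data: raw, uncontextualised readings (hypothetical sensor names and values)
readings = [("sensor_a", 71.2), ("sensor_a", 74.8), ("sensor_b", 65.0), ("sensor_b", 66.4)]

# Information: organise the readings by sensor and summarise each group
def summarise(readings):
    by_sensor = {}
    for sensor, value in readings:
        by_sensor.setdefault(sensor, []).append(value)
    return {sensor: mean(values) for sensor, values in by_sensor.items()}

# Knowledge: apply a rule drawn from experience (the threshold is an assumption)
OVERHEAT_THRESHOLD = 70.0

def flag_overheating(summary, threshold=OVERHEAT_THRESHOLD):
    return {sensor: avg > threshold for sensor, avg in summary.items()}

# Wisdom (stand-in only): turn the flags into a recommended action for a person to review
def recommend(flags):
    return [f"Inspect {sensor}" for sensor, hot in sorted(flags.items()) if hot]

summary = summarise(readings)
flags = flag_overheating(summary)
print(summary)
print(recommend(flags))
```

Each function adds one layer of context to the output of the previous one, which mirrors how the model describes value being added at each level.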
Types of Data
For a data analyst, distinguishing between different types of data is essential:
- Open Data: Freely available for anyone to access, use, and share, usually under an open licence, and often provided by governments (Kitchin, 2014). Example: datasets from the UK Office for National Statistics.
- Public Data: Accessible without restriction, although it may not be openly licensed for reuse.
- Administrative Data: Collected by organisations as part of routine operations, such as health records or tax filings (Schopfel et al., 2020).
- Research Data: Generated during academic or scientific research, often subject to ethical and funding body requirements (Koltay, 2020).
The type of data influences not only how it is analysed but also how it can be used legally and ethically.
Principles of Data
Analytical work must adhere to core data principles to ensure quality, trustworthiness, and compliance:
- Ownership: Identifies who has the legal rights over the data, which affects permissions and intellectual property considerations.
- Accessibility: Data should be available to those who need it, while respecting restrictions (Hartman et al., 2020).
- Security: Protects data from unauthorised access, alteration, or loss. Typical measures include encryption, access controls, and secure storage.
- Usability: Ensures that data is in a format and structure that supports analysis.
- Reusability: Data should be stored and documented to enable future use, aligning with the FAIR principles (Findable, Accessible, Interoperable, Reusable) (Koltay, 2020).
- Data Quality: Accuracy, completeness, consistency, and timeliness are critical dimensions (Jennex, 2017).
- Integrity: Safeguards the accuracy and consistency of data over its lifecycle.
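Several of these quality dimensions can be checked programmatically. The following Python sketch is a minimal illustration on hypothetical records: it measures completeness as the share of non-missing values, uses a plausible age range as a rough proxy for accuracy (true accuracy needs a trusted reference to compare against), treats duplicate IDs as a consistency problem, and flags records not updated within a year as a timeliness concern. The field names, the 0 to 120 age range, and the 365-day window are assumptions chosen for the example.

```python
from datetime import date

# Hypothetical records; None marks a missing value
records = [
    {"id": 1, "postcode": "AB1 2CD", "age": 34, "updated": date(2024, 5, 1)},
    {"id": 2, "postcode": None, "age": 41, "updated": date(2023, 1, 15)},
    {"id": 3, "postcode": "EF3 4GH", "age": -5, "updated": date(2024, 6, 20)},
    {"id": 3, "postcode": "EF3 4GH", "age": 29, "updated": date(2024, 6, 20)},
]

def completeness(records, field):
    """Share of records in which `field` is present (not None)."""
    if not records:
        return 0.0
    return sum(r.get(field) is not None for r in records) / len(records)

def implausible_ages(records, low=0, high=120):
    """IDs whose age lies outside a plausible range: a rough proxy for accuracy."""
    return [r["id"] for r in records
            if r.get("age") is not None and not low <= r["age"] <= high]

def duplicate_ids(records):
    """IDs that appear more than once: a simple consistency check."""
    seen, dupes = set(), set()
    for r in records:
        if r["id"] in seen:
            dupes.add(r["id"])
        seen.add(r["id"])
    return sorted(dupes)

def stale_ids(records, as_of, max_age_days=365):
    """IDs not updated within `max_age_days` of `as_of`: a timeliness check."""
    return [r["id"] for r in records if (as_of - r["updated"]).days > max_age_days]

print(completeness(records, "postcode"))
print(implausible_ages(records))
print(duplicate_ids(records))
print(stale_ids(records, as_of=date(2024, 7, 1)))
```

Most checks return the offending IDs rather than a single score, so problem records can be traced back and corrected rather than just counted.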
Data Classification
Classification is the process of categorising data based on sensitivity, intended audience, and legal or policy requirements. Common classifications include:
- Public: Freely available with no confidentiality concerns.
- Internal: For internal organisational use only.
- Confidential: Limited to authorised individuals; disclosure could cause harm.
- Sensitive: Includes personal data or information that could cause significant harm if disclosed.
- Restricted: Highly sensitive data with strict access controls.
As Khan and Shaheen (2023) emphasise, classification allows appropriate security measures and handling procedures to be matched to each category of data. It also supports compliance with legislative frameworks such as the UK General Data Protection Regulation (UK GDPR) and with organisational policies.
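One way to make a classification scheme enforceable is to encode it. The Python sketch below assumes that the five levels form a strict ordering from public to restricted and that a user may read data at or below their clearance; both are simplifying assumptions, and the entries in the hypothetical `HANDLING` table are placeholders for whatever an organisation's policy actually requires.

```python
from enum import IntEnum

class Classification(IntEnum):
    """The five levels, assumed to be ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SENSITIVE = 3
    RESTRICTED = 4

# Hypothetical handling rules per level; real rules come from organisational policy
HANDLING = {
    Classification.PUBLIC: {"encrypt_at_rest": False, "audience": "anyone"},
    Classification.INTERNAL: {"encrypt_at_rest": False, "audience": "staff"},
    Classification.CONFIDENTIAL: {"encrypt_at_rest": True, "audience": "authorised individuals"},
    Classification.SENSITIVE: {"encrypt_at_rest": True, "audience": "authorised individuals"},
    Classification.RESTRICTED: {"encrypt_at_rest": True, "audience": "named individuals only"},
}

def can_read(clearance: Classification, level: Classification) -> bool:
    """Simple clearance rule: a user may read data at or below their clearance."""
    return clearance >= level

def handling_for(level: Classification) -> dict:
    """Look up the handling rules for a classification level."""
    return HANDLING[level]

print(can_read(Classification.CONFIDENTIAL, Classification.INTERNAL))
print(can_read(Classification.INTERNAL, Classification.RESTRICTED))
print(handling_for(Classification.SENSITIVE))
```

Real schemes usually combine a level check like this with need-to-know rules, so clearance alone would not grant access to every dataset at or below a user's level.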
Applying Principles in Analytical Work
Practical application involves:
- Identifying the data type.
- Confirming ownership and licensing.
- Assigning classification based on organisational policy.
- Implementing necessary controls such as encryption or anonymisation.
- Documenting processes and any exceptions.
For example, an analyst using NHS patient data must classify it as sensitive, secure it via encryption, limit access to authorised users, and comply with legal and policy requirements, including UK GDPR, under which health data is special category personal data.
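Full anonymisation is hard to achieve, so analysts often start with pseudonymisation: replacing direct identifiers with consistent codes. The Python sketch below assumes a keyed hash (HMAC-SHA256 from the standard library) and uses hypothetical field names and made-up values; it is an illustration, not an NHS-approved method. Under UK GDPR, pseudonymised data is still personal data, so the other controls listed above continue to apply.

```python
import hashlib
import hmac
import secrets

# Hypothetical key generated for this example; in practice it would be held in a
# key management system and never stored alongside the data
SECRET_KEY = secrets.token_bytes(32)

def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex-encoded).

    The same identifier and key always give the same pseudonym, so records can
    still be linked across tables.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def prepare_for_analysis(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Keep a pseudonymous reference and the analytic fields; drop name and NHS number."""
    return {
        "patient_ref": pseudonymise(record["nhs_number"], key),
        "age_band": record["age_band"],
        "diagnosis_code": record["diagnosis_code"],
    }

# Hypothetical record with made-up values
raw = {"nhs_number": "0000000000", "name": "Example Patient",
       "age_band": "40-49", "diagnosis_code": "D-001"}
safe = prepare_for_analysis(raw)
print(safe)
```

A keyed hash is used rather than a plain hash because NHS numbers come from a small, structured space: anyone could hash every possible number and match the results, whereas without the key that lookup is not feasible.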
Action Point
Identify a dataset you have worked with recently. Determine its type (open, public, administrative, research) and classify it according to your organisation’s policy (public, internal, confidential, sensitive, restricted). Document the ownership, access controls, and security measures required, noting any legal or ethical considerations that apply.
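The documentation asked for in this action point can be kept as a structured record. The Python sketch below is a hypothetical template: the class and field names are assumptions, the classification levels are assumed to be ordered from least to most sensitive, and the `gaps` checks are illustrative rather than drawn from any particular organisation's policy.

```python
from dataclasses import dataclass, field
from enum import Enum, IntEnum

class DataType(Enum):
    OPEN = "open"
    PUBLIC = "public"
    ADMINISTRATIVE = "administrative"
    RESEARCH = "research"

class Classification(IntEnum):
    """Assumed ordering, least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SENSITIVE = 3
    RESTRICTED = 4

@dataclass
class DatasetRecord:
    """Hypothetical template for documenting a dataset."""
    name: str
    data_type: DataType
    classification: Classification
    owner: str
    access_controls: list[str] = field(default_factory=list)
    security_measures: list[str] = field(default_factory=list)
    legal_ethical_notes: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Flag obviously missing documentation; these checks are illustrative."""
        problems = []
        if not self.owner.strip():
            problems.append("owner not recorded")
        if self.classification > Classification.PUBLIC and not self.access_controls:
            problems.append("no access controls for non-public data")
        if self.classification >= Classification.SENSITIVE and not self.security_measures:
            problems.append("no security measures for sensitive or restricted data")
        if self.classification >= Classification.SENSITIVE and not self.legal_ethical_notes:
            problems.append("no legal or ethical considerations noted")
        return problems

# Example: a confidential dataset with no owner or access controls recorded yet
record = DatasetRecord(
    name="Example dataset",
    data_type=DataType.ADMINISTRATIVE,
    classification=Classification.CONFIDENTIAL,
    owner="",
)
print(record.gaps())
```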