Qualitative and Quantitative Data
All data can be grouped as qualitative or quantitative.
- Qualitative data describes characteristics or qualities rather than numbers. Examples include open-text survey responses, photographs, and interview transcripts. It is analysed through interpretation, often using content or thematic analysis to reveal patterns or attitudes (Busetto et al., 2020).
- Quantitative data consists of measurable, numerical values such as counts, percentages, or averages (Allanson & Notar, 2020). It supports comparison, prediction and statistical analysis, for example calculating average sales per week or the proportion of customers in each age group.
Both forms are essential: qualitative data explains why something happens; quantitative data shows how much or how often.
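The two quantitative examples above (average sales per week and the proportion of customers in each age group) can be sketched in a few lines of Python. This is a minimal illustration only: the figures, age bands and theme labels are invented for the example, and the theme count assumes an analyst has already read the open-text responses and coded them by hand.

```python
from collections import Counter
from statistics import mean

# Hypothetical weekly sales figures (quantitative data).
weekly_sales = [1200, 1350, 1100, 1500]
average_weekly_sales = mean(weekly_sales)  # 1287.5

# Hypothetical age group recorded for each of eight customers.
age_groups = ["18-34", "35-54", "18-34", "55+", "35-54", "18-34", "18-34", "55+"]
group_counts = Counter(age_groups)
proportions = {group: count / len(age_groups) for group, count in group_counts.items()}

# Qualitative data only becomes countable after interpretation.
# These theme labels are assumed to come from manual coding of survey comments.
coded_themes = ["price", "service", "price", "delivery"]
most_common_theme = Counter(coded_themes).most_common(1)[0][0]

print(average_weekly_sales)
print(proportions)
print(most_common_theme)
```

The counts show how often a theme appears; explaining why customers raise it still depends on reading the responses themselves.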
Levels of Measurement
Allanson and Notar (2020) identify four statistical measurement scales that determine how data can be used:
- Nominal: Categories with no order (e.g. product type, region).
- Ordinal: Ordered categories showing rank or preference (e.g. satisfaction ratings).
- Interval: Evenly spaced values with no true zero, such as temperature (°C) or dates.
- Ratio: Evenly spaced values with an absolute zero, such as revenue, time, or distance.
Understanding these measurement levels helps you decide how best to describe and compare data. For example, you can calculate an average when values are evenly spaced (interval or ratio data), and a total is meaningful when the scale has a true zero (ratio data, like sales or distance). When the data is grouped into categories, such as job roles or satisfaction ratings, it is more useful to count how often each category appears.
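One way to picture this is a small helper that picks a summary according to the measurement level. The function name `summarise` and the rules inside it are assumptions made for illustration, not a standard method; in particular, treating the median as a suitable summary for ordinal data is a common convention rather than something stated above, and the ordinal values are assumed to be integer-coded ranks.

```python
from collections import Counter
from statistics import mean, median


def summarise(values, level):
    """Return a summary suited to the measurement level (illustrative rules only)."""
    if level == "nominal":
        # Categories with no order: count how often each appears.
        return {"counts": dict(Counter(values))}
    if level == "ordinal":
        # Ordered categories, assumed coded as ranks (e.g. 1 = low, 5 = high).
        return {"counts": dict(Counter(values)), "median": median(values)}
    if level == "interval":
        # Evenly spaced, no true zero: an average is meaningful, a total is not.
        return {"mean": mean(values)}
    if level == "ratio":
        # Evenly spaced with a true zero: averages and totals both make sense.
        return {"mean": mean(values), "total": sum(values)}
    raise ValueError(f"Unknown measurement level: {level}")


print(summarise(["North", "South", "North"], "nominal"))
print(summarise([4, 5, 3, 4, 2], "ordinal"))
print(summarise([18.0, 21.0, 24.0], "interval"))
print(summarise([100, 200, 300], "ratio"))
```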
Structured, Semi-Structured and Unstructured Data
The way information is organised also defines its type.
- Structured data follows a fixed schema and is stored in rows and columns, as in database tables or spreadsheets (Codd, 1970). Examples include employee numbers, postcodes, or transaction amounts. These formats are easy to validate and analyse with standard tools like Excel.
- Unstructured data has no consistent format. Examples include emails, documents, videos, and social media posts. Hopkins et al. (2022) explain that unstructured information lacks predefined fields, making it more complex to manage but often richer in insight. Analysing it requires specialist methods such as natural language processing or image recognition.
- Semi-structured data sits between the two, containing identifiable tags or metadata but no rigid schema, as seen in JSON, XML, or IoT sensor feeds.
Recognising the level of structure helps determine how data should be stored, accessed and visualised.
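The difference between a fixed schema and tagged but flexible records can be seen by reading a small CSV extract and a small JSON feed with Python's standard library. The column names, sensor names and values below are hypothetical and exist only for this sketch.

```python
import csv
import io
import json

# Structured: every row shares the same columns (hypothetical transactions).
csv_text = "employee_id,postcode,amount\n101,SW1A 1AA,25.50\n102,M1 1AE,40.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: fields are tagged, but records need not share the same keys.
json_text = (
    '[{"sensor": "A1", "temp_c": 21.5},'
    ' {"sensor": "B2", "temp_c": 19.0, "battery": "low"}]'
)
readings = json.loads(json_text)
# .get() supplies a default because "battery" is missing from the first record.
battery_status = [reading.get("battery", "unknown") for reading in readings]

print(total_amount)
print(battery_status)
```

With the CSV, every column can be relied on for every row. With the JSON feed, the code has to allow for fields that appear in some records and not others.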
Compound Data Types
A compound (or composite) data type stores multiple related elements as a single structure. Databases often use compound records, such as an employee record containing name, age and address (Elmasri & Navathe, 2015). Modern tools and programming languages also support compound objects like lists or dictionaries. For example, in Python, a list such as employee = ["John Smith", 29, "London"] holds different data types (text and numbers) together in one unit. These structures allow related information to be stored and analysed as a whole, while keeping the connection between each element.
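The employee example can be written out as a short, runnable sketch that shows the same record as a list and as a dictionary. The dictionary keys (`name`, `age`, `city`) are labels chosen for this illustration.

```python
# A list keeps related values together; each element is reached by position.
employee = ["John Smith", 29, "London"]
name_from_list = employee[0]

# A dictionary stores the same record with a label for each element.
employee_record = {"name": "John Smith", "age": 29, "city": "London"}
age_from_record = employee_record["age"]

# Each element keeps its own data type inside the compound structure.
element_types = [type(value).__name__ for value in employee]

print(name_from_list)
print(age_from_record)
print(element_types)  # ['str', 'int', 'str']
```

A dictionary is often easier to read than a list for records like this, because the label states what each value means instead of relying on its position.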
Why Data Types Matter
Codd (1970) and Date (2004) stress that defining data precisely is vital for integrity and compatibility. When data is misclassified, for example by treating text as numeric, analyses can produce misleading results. Correct classification ensures the right methods, validation checks, and security settings are applied. For instance, analysts working with Office for National Statistics datasets must know whether employment information is categorical (occupation) or numerical (hours worked) to apply suitable calculations. Clear data definitions also support governance and legal compliance, particularly under GDPR, where personal identifiers must be handled differently from aggregated summary data.
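A short sketch shows how misclassification distorts results in both directions. The values are invented: an identifier that is really text loses information when treated as a number, and hours worked that arrive as text sort and compare incorrectly until they are converted.

```python
# Text treated as numeric: a hypothetical employee ID loses its leading zeros.
employee_id_text = "00742"
employee_id_as_number = int(employee_id_text)  # 742, no longer matches "00742"

# Numeric treated as text: hypothetical hours worked, exported as strings.
hours_as_text = ["9", "10", "37"]
text_sorted = sorted(hours_as_text)  # character-by-character order
text_max = max(hours_as_text)        # "9", because "9" > "3" > "1" as characters

# Converting to the correct type restores meaningful comparisons and totals.
hours = [int(value) for value in hours_as_text]
numeric_sorted = sorted(hours)
numeric_max = max(hours)
total_hours = sum(hours)

print(employee_id_as_number)
print(text_sorted, text_max)
print(numeric_sorted, numeric_max, total_hours)
```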
Action Point
Review a dataset used in your role or organisation. Identify each field’s data type (qualitative, quantitative, structured, semi-structured, or unstructured) and note its measurement level. Consider how these differences affect the way you clean, store and interpret the information. Reflect on how misclassifying a field could impact the accuracy or security of your results.