Identifying and sourcing data begins with understanding the problem that needs to be solved. A clear definition of the analytical question determines what data are relevant, how much detail is required, and where those data might be found. In practice, this process starts with translating a business issue into measurable data needs and assessing which information is already available within the organisation (Jarvenpaa, 2024; Krasikov and Legner, 2023). By aligning sourcing decisions with purpose, analysts avoid collecting unnecessary data and focus on finding high-quality, reliable inputs that directly support specific analytical goals.
Once data needs are defined, analysts determine the type of data required. Data may describe characteristics, behaviours, or quantities, and can take qualitative, quantitative, or mixed forms (Midamba et al., 2025). Quantitative data are numeric and suitable for statistical analysis, while qualitative data provide descriptive insights that explain patterns and relationships. In many professional settings, combining both forms produces a balanced view that links evidence to context. Understanding these distinctions allows analysts to select the most suitable collection techniques and analytical tools.
It is important to consider where data originate. Internal sources such as operational systems, customer relationship management tools, and financial databases are often the most relevant and readily available, but they may not provide a complete view. External sources, including open data portals, government publications, and commercial datasets, can fill information gaps or validate internal findings. A well-chosen combination of internal and external data strengthens analysis but requires careful attention to compatibility, structure, and quality. Research on data integration highlights that differences in completeness or sampling can distort analytical results if left unchecked (Baud et al., 2002). Structured frameworks for identifying and preparing open data therefore emphasise the importance of screening, assessing, and validating datasets at several levels, including metadata, schema, and content (Krasikov and Legner, 2023).
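The three levels of screening mentioned above can be illustrated with a minimal Python sketch. It is not the framework described by Krasikov and Legner (2023), only one possible reading of it. The required metadata fields, the expected columns, and the sample rows are all hypothetical and would need to be replaced with whatever an organisation actually expects from an external dataset.

```python
# Hypothetical expectations for an external dataset; every name here is illustrative.
REQUIRED_METADATA = {"publisher", "licence", "last_updated"}
EXPECTED_SCHEMA = {"region": str, "year": int, "population": int}


def screen_metadata(metadata: dict) -> list[str]:
    """Metadata level: report descriptive fields that are missing or empty."""
    return sorted(k for k in REQUIRED_METADATA if not metadata.get(k))


def screen_schema(rows: list[dict]) -> list[str]:
    """Schema level: report expected columns that are absent from any row."""
    missing = set()
    for row in rows:
        missing |= EXPECTED_SCHEMA.keys() - row.keys()
    return sorted(missing)


def screen_content(rows: list[dict]) -> dict[str, float]:
    """Content level: share of rows holding a correctly typed value per column."""
    if not rows:
        return {col: 0.0 for col in EXPECTED_SCHEMA}
    return {
        col: sum(isinstance(row.get(col), typ) for row in rows) / len(rows)
        for col, typ in EXPECTED_SCHEMA.items()
    }


# Made-up sample: the licence is blank and one population value is missing.
metadata = {"publisher": "Example Statistics Office", "licence": "", "last_updated": "2024-01-31"}
rows = [
    {"region": "North", "year": 2023, "population": 1200},
    {"region": "South", "year": 2023, "population": None},
]

metadata_gaps = screen_metadata(metadata)
schema_gaps = screen_schema(rows)
completeness = screen_content(rows)
```

With this sample, the metadata check flags the blank licence, the schema check finds no missing columns, and the content check shows that only half of the population values are usable. A gap in completeness of this kind is exactly the sort of difference that can distort results when internal and external data are combined without review.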
When sourcing data beyond internal systems, it is essential to understand the methods of collection and access. Data can be gathered directly through surveys, interviews, and observations, or indirectly from repositories, APIs, and archived records. Direct collection gives control over accuracy and relevance but can be time-consuming and resource-intensive. Indirect sourcing, such as the reuse of existing data, is more efficient but requires critical evaluation of provenance, methodology, and timeliness. A balanced approach that supplements existing data with targeted new collection often provides the most robust evidence base. Midamba et al. (2025) note that valid and reliable data depend on selecting the right collection approach and applying consistent standards of quality and transparency.
Analysts must also evaluate the legitimacy and context of the data they use. Data always reflect the assumptions, priorities, and social context of their collection. Boyd (2020) explains that questioning who collected the data, why they did so, and what might be excluded from the dataset is central to responsible sourcing. This reflective approach encourages analysts to consider fairness and representation alongside technical validity.
Ethical and legal considerations underpin every decision in data sourcing. Jarvenpaa (2024) stresses that provenance, consent, and lawful access are fundamental to trustworthy and compliant practice. Analysts must ensure that sourced data meet organisational governance policies and adhere to data protection legislation. Documenting data origins, collection methods, and intended uses is essential for accountability and transparency.
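One way to make this documentation routine is to keep a small, structured provenance record alongside each dataset. The Python sketch below is illustrative only: the field names (including `lawful_basis` and `consent_obtained`) and the sample values are assumptions, not a standard or a requirement drawn from the sources cited here, and an organisation's own governance policy should determine what is actually recorded.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of where a dataset came from and how it may be used."""
    dataset_name: str
    origin: str
    collection_method: str
    intended_use: str
    lawful_basis: str
    consent_obtained: bool
    date_acquired: date

    def gaps(self) -> list[str]:
        """Return the names of text fields that were left blank."""
        return [
            name
            for name, value in asdict(self).items()
            if isinstance(value, str) and not value.strip()
        ]


# Made-up example in which the lawful basis has not yet been documented.
record = ProvenanceRecord(
    dataset_name="customer_survey_2024",
    origin="internal CRM export",
    collection_method="online survey",
    intended_use="churn analysis",
    lawful_basis="",
    consent_obtained=True,
    date_acquired=date(2024, 3, 1),
)

missing_fields = record.gaps()

# Serialise for storage next to the dataset; dates are written as ISO strings.
record_json = json.dumps(asdict(record), default=str, indent=2)
```

Here `missing_fields` contains `lawful_basis`, signalling that the record is incomplete and the dataset should not be used until the gap is resolved. Storing the JSON output with the data gives later users a simple audit trail of origin, method, and intended use.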
Ultimately, identifying and sourcing data is not simply a technical task but a process of inquiry, evaluation, and judgement. Analysts who clearly define their objectives, apply sound collection methods, validate data quality, and respect ethical and legal standards create a reliable foundation for meaningful analysis and informed decision-making.
Action Point
Examine how your organisation sources and manages data. Identify where key datasets originate, how they are gathered or acquired, and where they are stored or maintained. Determine whether these sources are internal, external, or integrated from multiple systems. Assess the reliability, consistency, and ethical use of these data, and reflect on how sourcing practices influence the quality, governance, and outcomes of your analytical work.