Data Analysts vs Data Scientists - What's the difference?
Data Science and AI technologies are already having a major impact on business and society, from early and accurate diagnosis of medical problems and more cost-effective operations to more accurate fraud detection and self-driving cars. These are just four examples; new AI-powered applications and Data Science use cases are emerging across industries. Big Data and Business Analytics revenues are forecast to reach $549.73 billion by 2028. The market size has already grown significantly from $206.95 billion in 2020 to $231.43 billion in 2021.
The solution to the growing demand for data science capabilities?
Companies can make use of internal Data Science training schemes and career development programmes (apprenticeships) to both upskill their workforces and develop new talent. The 2021 Global Talent Competitiveness Index notes that "growing talent has traditionally meant education, but its definition should be broadened to include apprenticeships, training, and continuous education as well as experience and access to growth opportunities."
However, if you or your workforce are seeking to upskill in data science and analytics and are new to the field, it can be difficult to know the difference between the two roles. The good news is that there are clear industry standards published by the Institute for Apprenticeships (IfA) outlining the required skills for a Data Analyst and a Data Scientist.
In this article, we draw upon the IfA standards to highlight the difference between the two roles and explain how you can make use of training programmes to address critical skill gaps in your organisation.
What are Data Analysts and Data Scientists?
The Data Analyst
In essence, the primary role of a Data Analyst is to collect, organise and study data to provide business insights. As stated in the IfA Data Analyst Standards, “Data Analysts are typically involved with managing, cleansing, abstracting and aggregating data, and conducting a range of analytical studies on that data.”
Managing, Cleansing, Abstracting and Aggregating Data: A Definition
- Managing: involves planning, executing and maintaining data processes for the secure storage of data and information assets.
- Cleansing: the process of checking data quality and accuracy by recognising and then removing incorrect or biased data from a database.
- Abstracting: the process of removing characteristics from a dataset to reduce it to a set of essential characteristics for more efficient data processing.
- Aggregating: the process of compiling information from multiple data sources to prepare combined datasets for data processing.
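As a rough illustration of three of these steps (cleansing, abstracting and aggregating), here is a minimal Python sketch. The records, field names and validity rules are all made up for the example, and it assumes the data is small enough to hold in memory as plain dictionaries. Managing (secure storage and maintenance) is not covered, and the aggregation step goes one step further than combining sources by also totalling the amounts per region.

```python
from collections import defaultdict

# Hypothetical sales records from two made-up sources (online and in-store).
online_sales = [
    {"order_id": 1, "region": "North", "amount": 120.0, "email": "a@example.com"},
    {"order_id": 2, "region": "South", "amount": -50.0, "email": "b@example.com"},  # invalid: negative amount
    {"order_id": 3, "region": "North", "amount": 80.0, "email": "c@example.com"},
]
store_sales = [
    {"order_id": 4, "region": "South", "amount": 200.0, "email": "d@example.com"},
    {"order_id": 5, "region": None, "amount": 60.0, "email": "e@example.com"},  # invalid: missing region
]


def cleanse(records):
    """Cleansing: drop records with a missing region or a non-positive amount."""
    return [r for r in records if r["region"] is not None and r["amount"] > 0]


def abstract(records, keep=("region", "amount")):
    """Abstracting: keep only the fields the analysis actually needs."""
    return [{field: r[field] for field in keep} for r in records]


def aggregate(*sources):
    """Aggregating: combine several sources and total the amount per region."""
    totals = defaultdict(float)
    for source in sources:
        for r in source:
            totals[r["region"]] += r["amount"]
    return dict(totals)


totals_by_region = aggregate(
    abstract(cleanse(online_sales)),
    abstract(cleanse(store_sales)),
)
print(totals_by_region)
```

In practice an analyst would more likely use a data analysis library for this, but the order of operations (clean first, reduce to the essential fields, then combine) stays the same.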
The Data Scientist
Data Scientists build upon the core competencies of a Data Analyst with additional Machine Learning and Software Engineering skills. The IfA Data Scientist Standards state: “Data Scientists are dynamic and adaptable, addressing varied problems with varied techniques. They actively explore innovative ways to use existing and new statistical, algorithmic, predictive, machine learning and artificial intelligence tools and techniques, to find significant and valuable patterns in data and transform these into information for their organisation.”
What are the typical problems a Data Analyst and a Data Scientist might work on?
Different types of analytics can be categorised into 'The Four Analytic Capabilities', a widely used framework put forward by Gartner Research. These approaches increase in complexity, from Descriptive and Diagnostic (more traditional techniques) to Predictive and Prescriptive (more sophisticated techniques), providing a useful way to demonstrate the progression from Data Analytics to Data Science.
The Four Analytic Capabilities: A Definition
- Descriptive: What happened? Example: What is the turnover this month?
- Diagnostic: Why did it happen? Example: In your monthly report, you can see that last month’s sales performance declined. What caused this?
- Predictive: What will happen? Example: Imagine you are a retailer and you want to maximise product sales while minimising waste. How can you accurately forecast how much stock you need?
- Prescriptive: What should I do? Example: Based on the traffic predictions, what are the best marketing initiatives you can put in place to maximise the prospects-to-lead ratio?
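To make the four questions concrete, the sketch below runs each of them against a tiny, made-up set of monthly sales figures. The numbers, the naive forecasting rule and the 10% stock buffer are all assumptions chosen for illustration; real predictive and prescriptive work would use properly validated models.

```python
# Hypothetical monthly unit sales per region (made-up numbers, oldest first).
sales = {
    "North": [100, 110, 120, 130],
    "South": [90, 95, 100, 60],
}

# Descriptive: what happened? Total sales in the latest month.
latest_total = sum(series[-1] for series in sales.values())

# Diagnostic: why did it happen? Compare the month-on-month change per region
# to see which region is responsible for the decline.
change_by_region = {region: s[-1] - s[-2] for region, s in sales.items()}
biggest_drop = min(change_by_region, key=change_by_region.get)


# Predictive: what will happen? A deliberately naive forecast that extends
# the average month-on-month change seen so far.
def naive_forecast(series):
    steps = [later - earlier for earlier, later in zip(series, series[1:])]
    return series[-1] + sum(steps) / len(steps)


forecast = {region: naive_forecast(s) for region, s in sales.items()}

# Prescriptive: what should I do? A simple illustrative rule: order the
# forecast quantity plus a 10% safety buffer.
stock_order = {region: round(f * 1.1) for region, f in forecast.items()}

print(latest_total, biggest_drop, forecast, stock_order)
```

The first two steps only summarise data that already exists, while the last two depend on a model of the future, which is where the additional Data Science skills described below come in.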
"Our industry is moving forward fast, so we need to stay ahead. I recommend practising with real data, experimenting with different methods and evaluating them as you go. Collaboration and teamwork are key too… just as important as understanding complex problems quickly and working on code."
- Sebastian Kaltwang, Research Scientist, FiveAI
Core Skills - Domain Expertise, Mathematics, and Programming
At the core, Data Analysts and Data Scientists have skills in three broad areas: Domain Expertise, Mathematics and Statistics, and Programming. In this section, we break down these three areas to identify the foundational skills expected of a Data Analyst, then describe the additional expectations for a Data Scientist.
Domain expertise is a fundamental part of Data Analytics and Data Science that can be taught on the job, reinforcing the importance of graduate training schemes that quickly get individuals up to speed. Individuals draw upon domain expertise to:
- Understand and identify business problems that can benefit from Data Science.
- Apply relevant tools and techniques to solve the problem.
- Convert that solution into actionable insights to help the business.
- Communicate the findings in a way that wider business units can understand and act on.
Additional skills for Data Science:
Both roles require the ability to present results to a range of stakeholders and reason about their methods. However, Data Scientists need to have a strong understanding of industry best practices for interpreting complex machine learning models. For example, LIME and SHAP are recognised techniques for model explainability.
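To give a feel for what such explainability techniques produce, the sketch below computes exact Shapley values, the game-theoretic idea that SHAP builds on, for a tiny hand-written model. It does not use the SHAP or LIME libraries. The model, the feature names (`tenure`, `spend`) and the all-zero baseline are hypothetical, and "removing" a feature is simplified to replacing it with its baseline value.

```python
from itertools import permutations


def model(features):
    """Hypothetical scoring model with an interaction between its two inputs."""
    return (
        2.0 * features["tenure"]
        + 3.0 * features["spend"]
        + features["tenure"] * features["spend"]
    )


def shapley_values(predict, instance, baseline):
    """Average each feature's marginal contribution over every feature ordering.

    Exact enumeration is only practical for a handful of features, because the
    number of orderings grows factorially.
    """
    names = list(instance)
    contributions = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start with every feature "switched off"
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature on
            value = predict(current)
            contributions[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in contributions.items()}


instance = {"tenure": 1.0, "spend": 2.0}
baseline = {"tenure": 0.0, "spend": 0.0}
attributions = shapley_values(model, instance, baseline)
print(attributions)
```

The attributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is what makes this kind of breakdown straightforward to present to stakeholders.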
Mathematics and Statistics
Statistical foundations are important to ensure individuals grasp machine learning’s underlying mathematics, and have an understanding of how the model works and when it works well. For example, to train a basic prediction model, an Analyst will need an intuition of linear regression and gradient descent - these methods draw upon an understanding of linear algebra, optimisation and probability.
- Linear Regression is a statistical learning method used to model a linear relationship between a dependent output variable (y) and one or more independent input variables (x), and to use the fitted line to predict future values of the output variable.
- Gradient Descent is a basic optimisation algorithm for finding the minimum of a function by repeatedly taking small steps in the direction in which the function decreases fastest. In machine learning, the function is usually the error in the model’s predictions, and the values being adjusted are the model’s parameters: the quantities (such as the slope and intercept of a line) that are tuned during training so that the model fits the data as accurately as possible.
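The two ideas combine naturally: the sketch below fits a straight line to a few made-up points by running gradient descent on the mean squared error. The data, the learning rate and the number of steps are assumptions chosen so that this toy example converges; they are not general recommendations.

```python
# Made-up training data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0  # the two parameters, starting from zero
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training point with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to each parameter.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step against the gradient, i.e. downhill on the error surface.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict y for an unseen x
print(round(slope, 3), round(intercept, 3), round(prediction, 3))
```

For linear regression there is also a closed-form least-squares solution; gradient descent is used here because the same loop carries over to models where no closed form exists.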
A great reference for further information on mathematical foundations is: Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares by Stephen Boyd and Lieven Vandenberghe, 2018.
Additional skills for Data Science:
A Data Scientist’s mathematics skills will be more advanced than those of a Data Analyst. Machine Learning models become less intuitive as they get more complex in design. In turn, their implementation requires more rigorous mathematical knowledge beyond basic concepts, not only to train the model, but also to decompose what the algorithm has done so that you can explain your decisions to stakeholders.
Programming and Database knowledge is the third core component of Data Analyst and Data Scientist roles. To gather, process and analyse large amounts of data, individuals must have the skills to:
- Bring data together from multiple sources.
- Clean, transform and explore data to deliver practical insights quickly.
- Work in accordance with software development standards, including security, code quality and version control.
As a Data Analyst, this means going beyond Excel to make use of Python (See Data Analysis: Python vs Excel), a widespread programming language with an abundance of ready-made libraries for data analysis, data visualisation and modelling. Data Analysts use their intermediate Python programming skills to apply techniques such as Exploratory Data Analysis, Supervised Learning and associated methodologies to maintain and tune models. In addition, Data Analysts must be comfortable collecting and storing various forms of data (relational, document-oriented and graph) utilising Big Data Systems and technologies such as Spark and Parquet.
Additional skills for Data Science:
A Data Scientist’s programming skills are well beyond those of a Data Analyst. Combining their advanced mathematics and programming skills, Data Scientists create more complex Machine Learning solutions using techniques such as Ensemble Models, Time Series Forecasting, Natural Language Processing, Deep Learning, and Recommender Systems. Data Scientists must bring an engineering mindset to data projects.
"I expect a Data Scientist to have an engineering mindset about functionality that is destined to production - everything needs to be measured and tested. Additionally, Data Scientists need to be able to write their own feature engineering code (either in Python / Scala / Java) with light touch guidance if needed."
- David Illes, Vice President, Morgan Stanley
The need for Data Analysts and Data Scientists
To improve the supply of data talent, leading businesses are investing in continuous learning opportunities, such as graduate training schemes and apprenticeship programmes that attract high-calibre students and skill them up quickly. This is a promising movement to address the UK's long-standing data skills shortage. Numerous government publications have called for action; Nesta’s 2014 report 'Mind the data skills gap', for example, stated, “urgent action is needed to deal with this data skills crunch, and ensure that ‘data talent’ coming out of UK universities is able to transform data into insights in the industry.”
Training schemes can be used to support your talent management and recruiting strategies. There are some real opportunities for companies to leverage their training programmes to provide competitive offers that attract and hire the best candidates.
Cambridge Spark offers an efficient way to find, hire and equip individuals with the right skills for your business. We provide adaptive Data Analysis and Data Science training programmes using blended learning. This approach includes live-coded online lectures and learning activities, in-person diagnosis and group presentation sessions, and practical projects with instant feedback using K.A.T.E.® - our Data Science training and assessment platform.
The K.A.T.E.® platform assists your talent management and learning and development initiatives by developing personalised training programmes based on each individual’s strengths/weaknesses, and learning objectives to get graduates up to speed and ready to deliver value to your organisation.
What problems can be solved with Data Science?
Forecasting demand and consumer buying patterns to increase revenue
There are a variety of Data Science techniques available to help analysts predict customer acquisition, retention and profitability. For example, Cambridge Spark client, EDF Energy, leveraged these techniques within their Sales & Marketing Unit to optimise marketing initiatives and improve business outcomes.
“We have been able to market to our customers and prospects more effectively, using fewer resources to acquire and retain customers,” says Matt Wilson, Senior Manager, EDF Energy. “Cambridge Spark’s Core Data Science training was very useful for us to build capability fast. We used it to quickly deploy a series of classification models for marketing optimisation (predicting campaign outcome).”
Lower costs with increased operational efficiency
“What aspects of a store mean we will achieve the best sales? What is the best store to implement a given project? What variables are important to successfully grow sales? These are some of the problems that are becoming more relevant,” says Richard Pegler, Operational Strategy Manager at Sainsbury’s Argos Ltd.
To address these issues, companies can make use of Data Science techniques, such as demand forecasting and segmentation, to identify the variables that drive the success of their stores. For example, these techniques are used by Cambridge Spark client, Sainsbury’s Argos Ltd, to improve decision making, decrease operational costs and mitigate risks.
Increased revenue by developing new product offerings
New research and technologies are continuously emerging in the field of Data Science, and as a result, there are new opportunities for businesses to seize. With the growing interest in this field, businesses that adapt and integrate cutting edge techniques into their portfolio will be positioned more competitively. For example, Cambridge Spark client, Deloitte, makes use of training programmes to build internal data science expertise they can apply to new projects, to increase their offerings and bring more value to clients.
“I focus on delivering data-driven solutions to clients, most recently for the cognitive and robotics automation projects. Deep Learning and neural networks are such hot topics in this field, so myself and my colleagues need to keep up with these new advanced technologies,” says Fei Liao, Technology Consulting Analyst at Deloitte.
Interested in training for your teams?
Whether you're looking to train 5 people or 100 people, we have a variety of scalable training solutions to help you address a wide spectrum of training needs within the fields of Data Science, Artificial Intelligence, or Software Engineering.
Please complete the form with your details and any known requirements. We'll then get in touch and guide you through every step of the way.
Alternatively, give us a call on +44 (0)7816 419378.