Data Analyst, Data Scientist, and Data Engineer: What’s the Difference?
Many Chief Procurement Officers (CPOs) and Chief Supply Chain Officers (CSCOs) today complain about being overwhelmed by the amount of internal and external data they are being asked to analyze to improve supply chain outcomes. The volume and velocity of data streaming at us is increasing exponentially, with an often-quoted statistic holding that 90% of the data in the world has been generated in the last five years. Data arrives in many different formats, including structured data (conventional letters and numbers in rows and columns) as well as unstructured data (media, text, photos, videos, voice mails, etc.). Executives are trying to make sense of this complex mass of digital output from their businesses, customers, and suppliers, and are struggling to understand how to convert this information into data-driven decision making (DDDM) outcomes within their supply chains.
A select number of executives we’ve engaged with are starting to take steps to do so, and are benefiting from the lessons learned that come with starting up a digital analytics team. These executives recognize that they will need to staff their teams with data professionals to help them not only achieve their digital strategies but also benefit from the data those strategies will provide.
The data professional titles most often heard are data analyst, data scientist, data engineer, and subject matter expert. What do these titles mean, and what skill sets and technical capabilities are associated with each?
Data Analyst
A Data Analyst is the most tactical of the four titles. Data Analysts are usually embedded in a functional team and are responsible for receiving and collecting data from multiple internal and external sources. A Data Analyst usually uses Excel, Access, SQL, or other tools to clean and organize the data. Some may also be proficient with algorithms embedded in specialized packages (such as SCWorx, which is used to standardize healthcare data). They may use statistical packages to perform basic statistical analysis and data visualization tools such as Power BI to present their findings. They usually have a basic knowledge of the business function they support and may make tactical recommendations, typically based on historical data.
A good Data Analyst is critical to creating a foundation for Data-Driven Decision Making within an organization. Most supply chain and procurement functions are still at an early stage of digital integration, which is exactly the stage at which they need a good Data Analyst.
Basic Data Analyst tasks might include:
- Cleaning and organizing raw data
- Verifying the source and relevance of the raw data used
- Utilizing descriptive statistics to provide an interpretation of the data
- Identifying and analyzing any trends
- Creating visual representations of the data to help convey the distribution of the data, as well as preliminary findings
Data Scientist
A Data Scientist typically has a strong analytical and quantitative background. A Data Scientist works with the output produced by a Data Analyst, or with data they have prepared themselves, and applies advanced statistical expertise, predictive modeling, and machine learning to gain further insight into the data. Another way to think about the difference: a Data Scientist focuses on using the data to predict the future, whereas a Data Analyst uses the data to interpret and describe what has happened in the past.
Using data to predict the future is tricky, so a good Data Scientist must have a deep understanding of the overall business processes from which their data is extracted. This may involve discussing the data with Subject Matter Experts (described below) and interviewing managers to fully understand business unit objectives, as well as the interdependencies between business functions and units. Data Scientists are not usually embedded in a procurement or supply chain function; instead, they work in organizations that support the enterprise as a whole and work cross-functionally across different business lines.
Procurement and supply chain functions usually need the support of a Data Scientist while designing their digital environment and while implementing and testing how people will perform business processes in that environment. Data Scientists also help maximize the benefits of supply chain digitization.
Basic Data Scientist tasks could include:
- Creating statistical models and validating their accuracy for predicting future events, using tools and techniques such as Python, R, SQL, and natural language processing (NLP)
- Employing data visualization tools such as Microsoft Power BI, Tableau, Qlik, or others to illustrate relationships among different types of data and their implications for decision-makers
- Applying these models and verified data to train neural networks for various machine learning applications
- Monitoring machine learning models for accuracy and bias, and adjusting them accordingly
Data Engineer
A Data Engineer provides enterprise-level support for analytics. A Data Engineer is responsible for optimizing the availability and accuracy of internal and external data sources so that Data Analysts and Data Scientists can perform their duties in a timely and efficient manner. Data Engineers are responsible for transferring data between sources and for the integrity of the data during such transfers, as well as for storage, security, access control, ease of access, etc. Data Engineers require knowledge of data architectures, data warehouses, data lakes, data networks, etc.
Data Engineers are usually members of the Information Technology (IT) function, and often are involved only as supporting team members for a specific supply chain digitization effort.
Basic Data Engineer tasks could include:
- Creating Application Programming Interfaces (APIs) for retrieving data from both internal and external databases
- Integrating various internal and external databases
- Monitoring and testing all critical networks and their interfaces
- Monitoring network security and overall performance
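Integrating internal and external databases while protecting data integrity can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table names (`purchase_orders`, `supplier_registry`), their columns, and the records are hypothetical; in practice the external table might be loaded through a supplier's API, which this sketch does not attempt.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one 'internal' and one 'external' table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE purchase_orders (po_id INTEGER PRIMARY KEY, supplier_id TEXT, amount REAL);
        CREATE TABLE supplier_registry (supplier_id TEXT PRIMARY KEY, name TEXT, country TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO purchase_orders VALUES (?, ?, ?)",
        [(1, "S1", 500.0), (2, "S2", 750.0), (3, "S9", 120.0)],  # S9 is unknown externally
    )
    conn.executemany(
        "INSERT INTO supplier_registry VALUES (?, ?, ?)",
        [("S1", "Acme Corp", "US"), ("S2", "Globex", "DE")],
    )
    return conn


def integrated_view(conn):
    """Join internal orders to external supplier data; unmatched suppliers come back as None."""
    return conn.execute(
        """
        SELECT po.po_id, po.supplier_id, r.name, r.country, po.amount
        FROM purchase_orders AS po
        LEFT JOIN supplier_registry AS r ON r.supplier_id = po.supplier_id
        ORDER BY po.po_id
        """
    ).fetchall()


def orphan_orders(conn):
    """Integrity check: IDs of orders whose supplier_id has no match in the registry."""
    rows = conn.execute(
        """
        SELECT po.po_id FROM purchase_orders AS po
        WHERE NOT EXISTS (
            SELECT 1 FROM supplier_registry AS r WHERE r.supplier_id = po.supplier_id
        )
        ORDER BY po.po_id
        """
    )
    return [po_id for (po_id,) in rows]


conn = build_demo_db()
for row in integrated_view(conn):
    print(row)
print("Orders with unknown suppliers:", orphan_orders(conn))
```

The LEFT JOIN keeps every internal order even when the external source has no match, and the separate orphan check surfaces those records for follow-up instead of letting them drop out of downstream analysis silently.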
Subject Matter Expert
A Subject Matter Expert (SME) is someone who may not have strong data analytics or statistics skills but has deep experience working within a particular business function. SMEs are the individuals best positioned to make sense of analytics outcomes and to interpret what the results actually mean. They also know where to look for the types of data that will answer a particular research question or, if that data isn’t available, which proxy data or latent indicators can reveal whether the event or issue in question is present. SMEs are typically people who have worked in an area such as procurement, logistics, material handling, quality assurance, or finance for a long time, and who know when a particular result is a “false positive” or just seems too funky to be real. Remember, data analysis is often a process of going down one blind alley after another, and these folks will be key to guiding your team through the maze you will encounter during an analytics project.
Basic Subject Matter Expert tasks could include:
- Reviewing the analytical output of the Data Analyst and Data Scientist, interpreting its business meaning, and identifying where to look next in the data for further clarity or validation of the results.
- Identifying potential business problems that can be scoped into an analytics project for the team, including the boundary conditions, the types of data required, and the key research hypothesis or question the project should address.
- Disseminating the results of an analytics project to the business function and senior executives, helping them understand the implications for the business in non-technical language they will readily understand.
These categorizations and descriptions of data professionals can help you define and recruit people with the right expertise, not only for current supply chain digitization efforts but also down the road as your vision starts to become a reality.
This post was written by my colleague Joseph Yacura, with whom I have been working over the last three years to better understand the digital procurement space. Thanks for the great article, Joe!