Data science, machine learning and artificial intelligence are terms that are often used imprecisely and whose meanings overlap. Many will be familiar with the job advertisement billed as a data scientist role that, in reality, describes a data analyst or an IT specialist. It is therefore worth distinguishing between the concepts, at least to some extent. Doing so can help clarify what kind of career you might want to pursue or what job you might want to look for.
- Data science produces insights
- Machine learning produces predictions
- Artificial intelligence produces actions
- How do we use the three together?
- Key takeaways
AI is perhaps the most liberally used of the terms, not least because it is thought to increase one’s chances of writing a successful grant application or tender for a piece of work. It is safe to say that all of these terms are susceptible to overhype, and the choice of which to use often appears to be a matter of marketing.
On the other hand, there are distinctions between the three, and it is essential to have a good understanding of those distinctions. One of our favourite discussions about how these terms relate to one another comes from David Robinson:
- Data science produces insights.
- Machine learning produces predictions.
- Artificial intelligence produces actions.
Data science produces insights
To elaborate, the objective of data science work is to discover patterns and understand data. Even though the revealed numerical patterns may be obvious and objective, a human must be involved in the interpretation process. Data science differs from machine learning in that modelling isn’t always a part of it. We would also contend that it need not necessarily involve coding or programming. Many data scientists use spreadsheets to combine domain knowledge with statistical inference to gain insightful information.
Common data science terminology
Not all insights generated from data fall under the category of data science; data science is traditionally defined as a combination of statistics, software engineering, and domain knowledge. Still, we can use the idea that data science produces insights to set it apart from ML and AI. The main difference is that a human is always involved in data science: someone is understanding the insight, looking at figures, or taking advantage of the conclusion. It would be absurd to claim that Google Maps uses data science to suggest driving directions or that a chess-playing algorithm uses data science to determine its next move. So, the emphasis in this definition of data science is on the following:
- Domain knowledge
- Data visualization
- Statistical analysis
- Experiment design
Data scientists often use simple tools, creating line graphs and reporting percentages based on SQL queries. They could also employ highly sophisticated techniques, such as working with distributed data stores to examine trillions of records, creating interactive visualizations, and developing cutting-edge statistical methods.
The objective is to understand their data better using whatever method they choose.
Machine learning produces predictions
Thanks to machine learning, the scope of data analytics can now extend into the realm of predictive modelling, and this can be carried out in a highly automated manner. However, in our view no machine learning system should be permitted to have an impact on decision-making unless a human is included in the loop. Machine learning models also vary widely in complexity and interpretability.
For instance, linear regression is at the simpler end of the spectrum; in fact, it is so straightforward that many people do not even consider it to be a form of machine learning. Deep learning models, on the other hand, are at the opposite end of the spectrum. These models have inner workings that are so difficult to understand that it is practically impossible to comprehend how the model generates its predictions. There is no clear demarcation between data science and machine learning; rather, there is a spectrum of differences between these two fields, which often overlap.
xkcd.com Machine Learning
The difference between machine learning and data science
Machine learning and data science have a lot in common. Using logistic regression, for instance, one can both infer relationships (“the further down a document a piece of information is found, the less likely it is to be important”) and make predictions (“this information is 72% relevant to the requested topic, so we should extract it”).
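As a minimal sketch of how one logistic regression can serve both purposes, the example below fits a single-feature model in plain Python. The dataset, the `fit_logistic` helper and the 0-to-1 position feature are illustrative assumptions, not a description of any production system.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: where a passage sits in a document (0 = top, 1 = bottom)
# and whether a reviewer marked it as important (1) or not (0).
positions = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
important = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(positions, important)

# Insight (data science): the sign of the coefficient describes the relationship.
direction = "less" if w < 0 else "more"
print(f"Passages further down are {direction} likely to be important (w = {w:.2f})")

# Prediction (machine learning): estimated relevance of two unseen passages.
p_top = sigmoid(w * 0.1 + b)
p_bottom = sigmoid(w * 0.9 + b)
print(f"Near the top: {p_top:.0%} relevant; near the bottom: {p_bottom:.0%} relevant")
```

A human reading the coefficient is doing data science; a system that uses the predicted probability to decide what to extract is doing machine learning with the very same model.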
Models like random forests are slightly less interpretable than logistic regression and are more likely to fall under the category of “machine learning,” while deep learning techniques are notoriously difficult to explain. If your objective is to gather insights rather than make predictions, this lack of interpretability might get in the way. So, we could think of data science and machine learning as two ends of a “spectrum,” with data science having more interpretable models and machine learning having more “black box” models.
The majority of practitioners can switch between the two tasks with ease. We often combine machine learning and data science. For example, we might create a model using machine learning to match job specs and CV data to identify candidates who are most likely to be a good fit for a given role. Analysing the results is a crucial step in finding model flaws and overcoming algorithmic bias.
These crucial skills are one of the reasons data scientists are frequently in charge of creating a product’s machine learning components.
Artificial intelligence produces actions
The term “artificial intelligence” (AI) may be the one that is used the most frequently, but paradoxically, it may also be the one that is the least understood. Many professionals have staunch opinions regarding its significance and the factors that set it apart from the other two terms. When we think of artificial intelligence, we think of any application that produces a result that is an action. This can include anything from robotics that are utilized in industries to chatbots that provide customer service to game-playing algorithms that make use of reinforcement learning.
Due to researchers, journalists, and startups seeking funding or attention, there is a lot of hype surrounding the term. This has also caused a backlash, which is unfortunate because it prevents some work from being referred to as AI, even though it probably should be. Researchers have even expressed dissatisfaction with the AI effect, saying that “AI is whatever we can’t do yet.”
It doesn’t help that AI is frequently confused with superintelligent AI, a form of AI which outperforms human intelligence, or even general AI, which is capable of performing tasks across many different domains. This creates unjustified expectations for any system dubbed “AI.”
What is Artificial Intelligence
So what activities fall under the umbrella of AI? The fact that an autonomous agent performs or suggests actions is a recurring theme in definitions of “artificial intelligence”. Among the systems that therefore qualify as AI are:
- Automation (choosing an optimised driving route)
- Natural language processing (processing documents or chatbots)
- Game-playing algorithms (chess computer)
- Robotics and control theory (self-driving cars)
- Reinforcement learning (learning from trial and error)
Once more, there is considerable overlap with the other fields. Deep learning is particularly intriguing because it bridges the ML and AI fields. Its typical use case is training on data and producing predictions, yet it has also demonstrated outstanding success in game-playing algorithms like AlphaGo. This contrasts with earlier game-playing systems, such as Deep Blue, which emphasised exploring and optimizing the space of potential solutions by searching the game tree.
For further reading, see our article on self-learning AI.
The difference between artificial intelligence and data science or machine learning
There are distinctions, though. If we examine some sales data and find that customers from specific industries renew their contracts more frequently than others, the result is some numbers and graphs, not a specific action. Executives may then decide to alter their sales strategy based on those findings, but that decision is not autonomous. It would therefore be embarrassing to say that we are “using AI to improve our sales,” so the work is described as data science instead.
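To make the sales example concrete, here is a small sketch of the kind of analysis involved. The contract records and the `renewal_rates` helper are hypothetical and exist only to illustrate that the output is a table of numbers for a human to interpret, not an action.

```python
from collections import defaultdict

# Hypothetical contract records: (industry, whether the contract was renewed).
contracts = [
    ("finance", True), ("finance", True), ("finance", False),
    ("retail", True), ("retail", False), ("retail", False),
    ("healthcare", True), ("healthcare", True), ("healthcare", True),
    ("healthcare", False),
]

def renewal_rates(records):
    """Return {industry: share of contracts that were renewed}."""
    totals = defaultdict(int)
    renewed = defaultdict(int)
    for industry, did_renew in records:
        totals[industry] += 1
        renewed[industry] += int(did_renew)
    return {industry: renewed[industry] / totals[industry] for industry in totals}

rates = renewal_rates(contracts)

# Report the industries from highest to lowest renewal rate.
for industry, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{industry:<12}{rate:.0%}")
```

Nothing in this script changes the sales strategy; it only produces the figures on which people might base that decision.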
The distinction between machine learning and artificial intelligence is somewhat more subtle, and historically, ML has frequently been viewed as a subfield of AI. Computer vision, in particular, was a classic AI problem. However, more and more people believe that the ML field has largely “broken off” from AI. This is in part due to the backlash mentioned above: most people who work on prediction problems dislike identifying themselves as AI researchers. It also helped that many significant ML advances originated in statistics, which has had less of a presence in the rest of the AI field. In practice, this means that if you can define the problem as “predicting Y from X,” we would advise against using the term “AI.”
How do we use the three together?
At Spot Intelligence, we build natural language processing systems. We ingest documents and different forms of text and extract useful information. What is useful is very dependent on the client’s needs and the data provided. However, to complete these tasks, we would need skills drawn from all three of these fields.
We ingest data from a large variety of unstructured sources and construct a dataset by running it through our NLP pipeline. We extract the text, images, locations of the text, headings, font sizes, and font styles to create a rich dataset ready to train an algorithm to predict what information is relevant for the task at hand.
Once our algorithm has extracted useful information from a new document, it needs to decide how to take action. Is the model’s confidence in the extracted data high enough for the use case? For example, financial documents demand more certainty than social data like Facebook or Twitter messages. A misplaced comma can be detrimental in financial statement analysis, while it is negligible in a tweet. Taking the appropriate action with the given information within the system is a use case of AI.
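One simple way to picture this decision step is a confidence threshold that depends on the type of document. The sketch below is only an illustration: the threshold values, the document-type labels and the `choose_action` function are assumptions made for the example.

```python
# Hypothetical minimum confidence per document type; in practice these values
# would be tuned for each client and use case.
THRESHOLDS = {"financial": 0.99, "social": 0.80}
DEFAULT_THRESHOLD = 0.95

def choose_action(doc_type, confidence):
    """Decide what to do with an extracted value given the model's confidence."""
    threshold = THRESHOLDS.get(doc_type, DEFAULT_THRESHOLD)
    if confidence >= threshold:
        return "accept"
    return "send_to_human_review"

# The same confidence score leads to different actions for different documents.
print(choose_action("financial", 0.93))
print(choose_action("social", 0.93))
```

The prediction itself is machine learning; turning it into "accept" or "send to a person" is the part that produces an action.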
We often find that when new types of documents appear, the system’s performance goes down, with some false negatives where data that should be picked up isn’t being found. After analyzing these new documents, we might gain the insight that the rate of false negatives depends on the quality of the scanned document.
We might then realize that most of our training data was limited to computer-generated documents, which are much easier to parse. We would then go back to the preprocessing steps in our pipeline and add more noise to our training dataset, which feeds into the machine learning algorithm, to improve the overall accuracy of the model.
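Adding noise to training data can take many forms. As one hedged illustration, the sketch below corrupts clean text with character confusions of the kind a poor scan might produce. The `OCR_CONFUSIONS` table, the `add_scan_noise` function and the sample strings are hypothetical and are not taken from our pipeline.

```python
import random

# Hypothetical character confusions that OCR might make on a low-quality scan.
OCR_CONFUSIONS = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S", ",": "."}

def add_scan_noise(text, rate=0.1, seed=None):
    """Return a copy of `text` where each confusable character is swapped
    for its look-alike with probability `rate`."""
    rng = random.Random(seed)
    return "".join(
        OCR_CONFUSIONS[ch] if ch in OCR_CONFUSIONS and rng.random() < rate else ch
        for ch in text
    )

# Augment a clean, computer-generated training set with noisy copies.
clean_examples = ["Total: 1,050.00", "Invoice 2015 balance: 10,000"]
noisy_examples = [
    add_scan_noise(text, rate=0.3, seed=i) for i, text in enumerate(clean_examples)
]
training_examples = clean_examples + noisy_examples

for text in training_examples:
    print(text)
```

Keeping the clean originals alongside the noisy copies means the model still sees well-formed text while also learning to cope with scan errors.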
Key takeaways
- The differences between artificial intelligence, machine learning, and data science lead to an interesting discussion. The opinions in this article are our own, and we understand others might use other definitions and have other opinions. Let us know in the comments what your view on the matter is and whether you view yourself as an AI, ML, or data science practitioner.
- Having common definitions in the industry is useful as it allows us to communicate more effectively and write job ads that better represent the work candidates will be doing when applying for roles.
- When seeking companies to collaborate with, it becomes essential to understand how they work in these three distinct areas. Always ask these questions: What insight will you get? What will be predicted? And what actions will or can be taken?