The Importance of Data Processing in Machine Learning

Machine Learning and Artificial Intelligence are often used interchangeably, but they are not the same. So let me begin by explaining what machine learning is and settle that misconception.

Machine Learning vs Artificial Intelligence

Machine Learning takes vast amounts of data (hence Big Data) and learns from the patterns in it. It relies on self-learning algorithms, so that machines can learn by themselves instead of being explicitly programmed for every case.

If you are looking for machine learning examples, you need go only as far as Amazon. The company became a world leader in inventory management and customer satisfaction because it brought Machine Learning into all of its systems. It ships about 1.6 million packages a day, in large part because it has optimized inventory using trained algorithms.

Artificial Intelligence goes a step beyond ML: it also uses Big Data, but it mimics human intelligence, applying the knowledge it has gained to solve complex problems. Machine Learning, on the other hand, is trained for accuracy on a specific task and does not go beyond the problem it is set to work on.

To put the difference into perspective, let's look at how ML and AI would each be applied to a video game. Say your avatar has to find a way out of a maze littered with traps; fall into one and the game is over. Machine Learning would collect data on where the traps are located and then guide you safely to the end. But what happens if the traps' positions change or new elements are introduced? It has to relearn what it must do. That is all well and good, but new patterns require Machine Learning to be trained again.

AI, on the other hand, would look at the new problem differently. Mimicking human intelligence, it would look for signals that indicate a problem, codify its own new rules and find new paths that deliver success.

Data preparation for Machine Learning

Whether it is AI or ML, the foundation is data, and lots of it. Every business has this data in one form or another, but it has to be cleaned and prepared into a state that Machine Learning can learn from.

These are the various stages in data preparation for Machine Learning:

Data Collection: The amount of data needed depends directly on the complexity of the problem to be solved. Data sources for a business can be proprietary as well as open, such as weather data or traffic data. Business-owned data can be numeric (loan amount, customer retention rate), categorical (gender, color, property type), time-stamped (how many products were purchased in a time range) or even free text (think emails, doctor's notes).
Data Transformation: Data needs to be cleaned, and records with missing values have to be dropped or filled in. When data is collected from different sources, the formats differ and have to be standardized before ML can use them. Feature engineering is a big part of this stage and involves creating relationships between different sets of data. For example, sales performance can be split into day, month and year values to surface more relevant patterns.
Data Training: Now the analytics can start. Choosing the right model is important, since different algorithms suit different tasks. It is also important to split the data into training and evaluation sets, commonly following the 80:20 rule: about 80 percent for training and 20 percent for evaluation.
Parameter Tuning: The model is tested against the evaluation set, and its parameters, such as the number of training steps and the learning rate, are fine-tuned.
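
To make the transformation and splitting stages more concrete, here is a minimal sketch in plain Python. The sales records, the field names and the helper functions (clean, add_date_features, split) are all hypothetical, invented only for illustration. The split assumes the common convention of keeping roughly 80 percent of the data for training and 20 percent for evaluation.

```python
import random
from datetime import date

# Hypothetical raw sales records; None marks a missing value.
raw_records = [
    {"sale_date": "2023-01-15", "amount": 120.0},
    {"sale_date": "2023-02-03", "amount": None},
    {"sale_date": "2023-02-20", "amount": 75.5},
    {"sale_date": "2023-03-08", "amount": 210.0},
    {"sale_date": "2023-03-30", "amount": 64.0},
    {"sale_date": "2023-04-11", "amount": 98.25},
]


def clean(records):
    """Drop every record that has a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def add_date_features(record):
    """Feature engineering: split the sale date into day, month and year."""
    d = date.fromisoformat(record["sale_date"])
    return {**record, "day": d.day, "month": d.month, "year": d.year}


def split(records, train_fraction=0.8, seed=0):
    """Shuffle the records, then cut them into training and evaluation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


prepared = [add_date_features(r) for r in clean(raw_records)]
train_set, eval_set = split(prepared)
```

In a real project a data library would usually handle these steps, but the logic stays the same: clean first, derive features next, and only then split, so that both sets go through identical preparation.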

Model evaluation and parameter tuning are of paramount importance: it is these iterative steps that take the machine learning model from “good enough” to “effective”.
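
As a rough illustration of that iterative loop, the sketch below trains a tiny linear model with several combinations of learning rate and number of training steps, then keeps the combination with the lowest error on the evaluation set. The toy data, the candidate values and the function names (fit, mse) are assumptions made for this example and do not describe any particular product or library.

```python
# Toy data that follows y = 2x + 1 (hypothetical, for illustration only).
train = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]
evaluation = [(x, 2 * x + 1) for x in [4.0, 5.0]]


def fit(data, learning_rate, steps):
    """Fit y = w*x + b with plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(params, data):
    """Mean squared error of the fitted line on a data set."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Try each combination and record its error on the evaluation set.
results = []
for lr in (0.001, 0.01, 0.05):
    for steps in (10, 100, 1000):
        params = fit(train, lr, steps)
        results.append((mse(params, evaluation), lr, steps))

best_error, best_lr, best_steps = min(results)
```

The key design choice is that the error is measured on the evaluation set, not on the training data, so the chosen parameters reflect how the model behaves on examples it has not seen.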

Importance of preparing good datasets

Some businesses have so much data that sifting through it and deciding what is relevant to the problem at hand is a humongous task.

Among the many known examples of Machine Learning projects that failed due to bad data is a healthcare project that used Machine Learning to predict which pneumonia patients were at high risk and needed hospital care, and which could continue antibiotic treatment at home.

The project failed because, although it used patient records going back years, the data was missing records of patients who had died because their case was complicated by asthma. Hence, when the machine learning algorithm was put in place, it ignored asthma as a high-risk factor.
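
The effect of such a gap can be shown with a small sketch. The records below are invented toy numbers, not data from the project described above, and death_rate is a hypothetical helper. The only point is that when the records of asthma patients who died are missing, the data makes asthma look harmless.

```python
# Hypothetical toy records, invented purely for illustration.
# Each record is (has_asthma, died).
complete = (
    [(True, True)] * 3 + [(True, False)] * 7
    + [(False, True)] * 1 + [(False, False)] * 9
)

# Simulate the data gap: drop every record of an asthma patient who died.
incomplete = [r for r in complete if r != (True, True)]


def death_rate(records, has_asthma):
    """Share of patients in the given group who died."""
    group = [died for asthma, died in records if asthma == has_asthma]
    return sum(group) / len(group)


for name, records in (("complete", complete), ("incomplete", incomplete)):
    print(name, death_rate(records, True), death_rate(records, False))
```

Any algorithm trained on the incomplete set sees asthma patients as the lower-risk group, which is the opposite of what the full toy data says.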

While not all ML applications are a matter of life and death, this case highlights the importance of good datasets and experienced data scientists.

What big tech companies like Amazon and Facebook are doing with Machine Learning applications is also within reach of other companies.
