Machine Learning 101 in Predictive Maintenance

ML lets the computer deal with the healthy machines, so the analyst can focus on the anomalies.

June 17, 2020

Machine learning is everywhere. In most cases, we don’t even think about how we are interacting with it. When you talk to Siri or browse recommended items on Amazon, you are using a machine-learning-driven product. We rely on it for everyday tasks.

ML is different from traditional programming because no person writes the rules; the system learns them from data. It still needs people, however, to decide which questions to ask. The main benefit is scalability. Imagine if every time you tried to use Siri, you had to wait for a person to respond to you and to the other 41.4 million active Siri users! ML can respond to millions of users and interactions immediately by deriving useful insights from the data it collects.

Just as the consumer world needs ML, the industrial world does too, perhaps even more so. Downtime in an oil refinery costs millions of dollars per day, so the ability to predict and prevent failures has tremendous value. Many companies are talking about the Industrial Internet of Things (IIoT) and the potential of connecting physical assets to the internet. However, more sensors mean more data, and that data is useless unless someone or something interprets it. With real-time or near-real-time data streaming in constantly, it is impractical, often impossible, for a person to make the necessary connections, especially the relationships between multiple sensors. ML helps pinpoint where someone should focus their time. It lets the computer deal with the “healthy machines,” so the analyst can focus on the anomalies.

Benefits of ML

ML can show benefits immediately, whether on assets that already have sensors or on new wireless sensors without any historical data. The system can trigger insights through anomaly detection, and it can classify different types of faults. Failures exhibit different patterns depending on their underlying cause, so ML can link a fault to its cause, which lets the team eliminate the root cause. It can also integrate with a computerized maintenance management system (CMMS) to automatically schedule work orders and check for necessary spares.

In the long term, the ideal system will estimate remaining useful life (RUL), which can be used to optimize planning and scheduling, but there must be enough underlying data to build models that can be relied upon. Another key to a successful ML program is a feedback system that tells the algorithm whether the recommended action was taken. This allows the system to learn and continually improve.
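The feedback loop described above can be sketched as a simple log of recommendations. This is a minimal sketch under stated assumptions, not any particular product's design: the `Recommendation` and `FeedbackLog` classes, their fields, and the `confirmation_rate` metric are hypothetical names chosen for illustration. Each closed-out recommendation records whether the action was taken and what the technician actually found, and confirmed findings become labeled examples for retraining.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical record of one ML recommendation and its outcome.
@dataclass
class Recommendation:
    asset_id: str
    features: List[float]
    predicted_fault: str
    action_taken: Optional[bool] = None    # filled in when the work order closes
    confirmed_fault: Optional[str] = None  # what the technician actually found

# Hypothetical feedback log; not a specific vendor's API.
@dataclass
class FeedbackLog:
    records: List[Recommendation] = field(default_factory=list)

    def add(self, rec: Recommendation) -> int:
        self.records.append(rec)
        return len(self.records) - 1  # index used later to close out the record

    def close_out(self, index: int, action_taken: bool,
                  confirmed_fault: Optional[str] = None) -> None:
        rec = self.records[index]
        rec.action_taken = action_taken
        rec.confirmed_fault = confirmed_fault

    def _confirmed(self) -> List[Recommendation]:
        # Only recommendations that were acted on and have a confirmed finding
        return [r for r in self.records
                if r.action_taken and r.confirmed_fault is not None]

    def training_examples(self) -> List[Tuple[List[float], str]]:
        """Confirmed findings become labeled data for the next model update."""
        return [(r.features, r.confirmed_fault) for r in self._confirmed()]

    def confirmation_rate(self) -> Optional[float]:
        """Share of acted-on recommendations whose predicted fault was confirmed."""
        closed = self._confirmed()
        if not closed:
            return None
        return sum(r.predicted_fault == r.confirmed_fault for r in closed) / len(closed)


log = FeedbackLog()
a = log.add(Recommendation("pump-101", [2.1, 2.4], "misalignment"))
b = log.add(Recommendation("pump-102", [3.9, 0.6], "misalignment"))
log.close_out(a, action_taken=True, confirmed_fault="misalignment")
log.close_out(b, action_taken=True, confirmed_fault="imbalance")
print(log.confirmation_rate())   # 0.5
print(log.training_examples())
```

Tracking the confirmation rate over time is one simple way to check whether the feedback is actually improving the recommendations.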

While most companies see immediate value in RUL, this application of ML will take time to become accurate. Most assets have a mean time between failures (MTBF) spanning multiple years, and the algorithms need a lot of data, including examples of actual failures, to make accurate predictions. By contrast, other predictive maintenance applications of ML, such as anomaly detection and fault classification, are already proven and in use at many companies across industries.
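A back-of-the-envelope calculation shows why RUL models are data-hungry. This sketch assumes a constant failure rate of 1/MTBF per asset, and the fleet size, MTBF, history length, and number of failure modes are hypothetical numbers, not figures from any real plant.

```python
def expected_failures(n_assets, mtbf_years, observation_years):
    """Expected failure count, assuming each asset fails at a constant rate of 1/MTBF."""
    return n_assets * observation_years / mtbf_years


# Hypothetical fleet: 50 pumps, MTBF of 4 years, 2 years of sensor history
total = expected_failures(n_assets=50, mtbf_years=4, observation_years=2)
print(total)        # 25.0
# If those failures are split across 5 distinct failure modes (hypothetical),
# each mode has only about 5 examples to learn from.
print(total / 5)    # 5.0
```

A handful of examples per failure mode is thin ground for a reliable RUL model, whereas anomaly detection can start from healthy data alone.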

Where to Start

There are four fundamentals to consider when implementing ML within industrial applications:

Good, clean raw data
Feature engineering (choosing the most useful parts of the data)
Supervised ML (e.g., classification)
Unsupervised ML (e.g., anomaly detection)

Good Data

The first step is to decide what kind of data to collect. Think about what an analyst would collect for the asset. For example, with a motor-pump drive train, you may want to look at motor current (for electrical faults and speed), vibration, and differential pressure (to evaluate operating conditions). Even more important than the number of inputs is the quality of the sensors and cables. Noisy or bad data produces bad output. On a vibration sensor, for example, every decision about how it is mounted (type of adhesive, magnetic mount) affects the quality of the readings and, ultimately, the effectiveness of your recommendations. Not all data is created equal.
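Some data-quality problems can be caught automatically before the data reaches an ML model. The function below is a minimal sketch, assuming a waveform from a sensor with a known, symmetric full-scale range; the function name, the flag names, and the thresholds (`flat_tol`, `clip_frac_limit`) are hypothetical choices, not vendor recommendations. It flags missing samples, a flatlined signal (for example, from a disconnected cable), and clipping (a saturated sensor that cuts off peaks).

```python
import numpy as np

def check_signal_quality(samples, full_scale, flat_tol=1e-6, clip_frac_limit=0.01):
    """Return a list of data-quality flags for one sensor capture (hypothetical checks).

    full_scale: the sensor's measurement limit, assumed symmetric (+/- full_scale).
    """
    x = np.asarray(samples, dtype=float)
    flags = []
    if np.isnan(x).any():
        flags.append("missing_samples")  # dropped samples or transmission gaps
        x = x[~np.isnan(x)]
    if x.size == 0:
        return flags + ["empty"]
    if np.std(x) < flat_tol:
        flags.append("flatline")  # signal is not changing: dead sensor or loose cable
    if np.mean(np.abs(x) >= full_scale) > clip_frac_limit:
        flags.append("clipping")  # too many samples at the range limit: peaks are lost
    return flags


t = np.linspace(0, 1, 1000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 30 * t)           # well within a +/-1.0 range
saturated = np.clip(3 * clean, -1.0, 1.0)          # same signal, sensor saturated
print(check_signal_quality(clean, full_scale=1.0))       # []
print(check_signal_quality(saturated, full_scale=1.0))   # ['clipping']
```

Checks like these catch only some symptoms of bad data; they do not replace good sensor selection, cabling, and mounting practice.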

Feature Engineering

Once you have good, clean data coming in, you need to determine which features to include in the ML algorithms. Feature engineering is the process of selecting or deriving the features (predictor variables) from a dataset. This is the most important step, because a machine-learning model can only learn from the features it is given. Going back to the pump drive train example, you may want to include overall vibration (RMS), crest factor (the ratio of peak amplitude to RMS, which captures how peaky the vibration is), or frequency data that shows the vibration amplitude at each frequency (which helps determine the cause of the vibration). ML identifies relationships among these features in a way a human could not compute: we can see the relationship within one or two pairs of features, but once many features are considered simultaneously, the problem becomes unmanageable.
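The three pump features mentioned above can be computed from a raw vibration waveform. This is a minimal sketch, assuming a single-axis waveform sampled at a known rate; the function name and the choice to summarize the spectrum by its dominant frequency are illustrative, not a prescribed feature set. Crest factor is computed as peak amplitude divided by RMS.

```python
import numpy as np

def vibration_features(waveform, sample_rate_hz):
    """Compute overall RMS, crest factor, and dominant frequency (illustrative sketch)."""
    x = np.asarray(waveform, dtype=float)
    x = x - x.mean()                              # remove any DC offset
    rms = np.sqrt(np.mean(x ** 2))                # overall vibration level
    crest_factor = np.max(np.abs(x)) / rms        # peak / RMS: higher means more "peaky"
    # Frequency content: magnitude spectrum of the real-valued signal
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate_hz)
    dominant_hz = freqs[1:][np.argmax(spectrum[1:])]  # skip the DC bin
    return {"rms": float(rms), "crest_factor": float(crest_factor),
            "dominant_hz": float(dominant_hz)}


fs = 1000                                         # samples per second (assumed)
t = np.arange(fs) / fs                            # one second of data
pump = np.sin(2 * np.pi * 30 * t)                 # pure 30 Hz tone, amplitude 1
print(vibration_features(pump, fs))
```

For a pure sine wave, RMS is the amplitude divided by √2 and the crest factor is √2 (about 1.41); a spiky signal, such as one from repeated impacts, would show a higher crest factor at the same RMS.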

When companies are just getting started with data science, they often throw all available data into the ML algorithm and see what comes out, but this is not the best approach. Instead, use domain expertise to pick the features that will produce the best model. This is why a general-purpose ML algorithm will have a longer payback period than one built to solve a specific problem.

ML is different from basic alarming, which is common in traditional vibration analysis. Alarms typically look at a single parameter in isolation. When the application runs at variable speeds, or when multiple parameters must be considered together, ML can draw a more complex boundary around “normal” behavior.
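The difference between a single-parameter alarm and a multivariate boundary can be sketched with two correlated parameters. This sketch uses synthetic data and assumes, hypothetically, that vibration on a healthy pump rises with motor current (load); the alarm limits are illustrative. It uses Mahalanobis distance, one common way to draw an elliptical boundary around correlated “normal” data, which is not necessarily the method any particular product uses. The new reading stays under both alarm limits, yet it sits far from the healthy data because that much vibration is normal only at high load.

```python
import numpy as np

# Hypothetical healthy history: vibration rises with motor load (current).
rng = np.random.default_rng(0)
current_a = rng.uniform(20, 40, 500)                    # motor current, A
vib_rms = 0.05 * current_a + rng.normal(0, 0.05, 500)   # vibration RMS, mm/s
healthy = np.column_stack([current_a, vib_rms])

# Single-parameter alarm limits (illustrative values only)
LIMITS = {"current_a": 45.0, "vib_rms": 2.5}

def single_parameter_alarm(current, vib):
    """Traditional alarming: each parameter checked in isolation."""
    return current > LIMITS["current_a"] or vib > LIMITS["vib_rms"]

# Multivariate boundary: Mahalanobis distance from the healthy data
mu = healthy.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(healthy, rowvar=False))

def mahalanobis(point):
    d = np.asarray(point, dtype=float) - mu
    return float(np.sqrt(d @ cov_inv @ d))


new_point = (20.0, 1.9)  # light load, yet vibration typical of high load
print(single_parameter_alarm(*new_point))   # False: each value is within its limit
print(round(mahalanobis(new_point), 1))     # far outside the healthy data
```

Neither single alarm fires, but the distance of the new reading is far larger than any distance seen in the healthy history.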

Within predictive maintenance, there are two basic applications of ML: anomaly detection and classification. Anomaly detection is based on unsupervised machine learning (it does not rely on humans to label the data), while classification uses supervised models, which require some form of human feedback to train the algorithm, for example, connecting a failure pattern with its underlying failure mode. In other words, the algorithm can tell there is a problem, but it does not know the problem is caused by misalignment unless it has been trained by someone who can recognize the pattern.

Anomaly Detection (unsupervised)

Anomaly detection compares new data with healthy data. It draws a boundary around “good” data in a multidimensional feature space. The algorithm evaluates how far new data lies from normal operation and bases the machine health score on that distance. Because it takes multiple parameters into account, it can also pinpoint the features that contribute most to the change in asset health.
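The distance-based health score can be sketched as follows. This is a minimal sketch, not the algorithm of any specific product: it fits a per-feature mean and standard deviation on healthy data only, measures the standardized (z-score) distance of a new reading from that healthy center, and maps distance to a score with the hypothetical formula 100 / (1 + distance). Because this simple distance ignores correlations between features, each feature's squared z-score gives a clean share of the total, which is how the sketch ranks contributing features. The feature names and data are hypothetical.

```python
import numpy as np

FEATURES = ["rms", "crest_factor", "current_a", "diff_pressure"]  # hypothetical

class HealthModel:
    """Learns 'normal' from healthy data only (no labels needed)."""

    def fit(self, healthy):
        healthy = np.asarray(healthy, dtype=float)
        self.mean_ = healthy.mean(axis=0)
        self.std_ = healthy.std(axis=0, ddof=1)
        return self

    def score(self, reading):
        z = (np.asarray(reading, dtype=float) - self.mean_) / self.std_
        sq = z ** 2
        distance = float(np.sqrt(sq.sum()))
        health = 100.0 / (1.0 + distance)     # hypothetical mapping: 100 at the healthy center
        share = sq / max(sq.sum(), 1e-12)     # each feature's share of the squared distance
        ranked = sorted(zip(FEATURES, share.tolist()), key=lambda kv: kv[1], reverse=True)
        return health, distance, ranked


rng = np.random.default_rng(1)
healthy = rng.normal([2.0, 3.0, 30.0, 1.5], [0.1, 0.2, 1.0, 0.05], size=(1000, 4))
model = HealthModel().fit(healthy)

health, distance, ranked = model.score([2.05, 4.5, 30.2, 1.5])  # crest factor jumped
print(round(health, 1), round(distance, 1))
print(ranked[0])   # crest_factor dominates the change
```

A Mahalanobis distance with a full covariance matrix would also capture correlated behavior, at the cost of a less direct per-feature breakdown.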

Classification (supervised)

Classification is useful for identifying and comparing asset health under similar operating conditions, especially for equipment on variable frequency drives (VFDs) and other equipment with varying speeds and loads. As mentioned above, classification needs domain expertise to connect an alert with its underlying cause.
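A nearest-centroid classifier is one of the simplest supervised models, and it is enough to show how analyst labels turn patterns into named faults. In this sketch the training examples, their labels, and the two features (vibration amplitude at 1x and 2x running speed, in mm/s) are hypothetical; the idea that imbalance shows mostly at 1x while misalignment raises 2x is a common rule of thumb used here only for illustration. This is not a recommendation of nearest-centroid over other classifiers.

```python
import numpy as np

# Hypothetical analyst-labeled examples: [1x amplitude, 2x amplitude] in mm/s,
# all captured at similar speed and load.
X_train = np.array([
    [4.0, 0.5], [4.3, 0.6], [3.8, 0.4],   # confirmed imbalance
    [2.0, 2.5], [2.2, 2.8], [1.9, 2.3],   # confirmed misalignment
    [0.8, 0.2], [0.9, 0.3], [0.7, 0.2],   # confirmed healthy
])
y_train = np.array(["imbalance"] * 3 + ["misalignment"] * 3 + ["healthy"] * 3)

def fit_centroids(X, y):
    """One mean feature vector (centroid) per labeled class."""
    return {str(label): X[y == label].mean(axis=0) for label in np.unique(y)}

def classify(x, centroids):
    """Assign the label of the nearest centroid (Euclidean distance)."""
    x = np.asarray(x, dtype=float)
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))


centroids = fit_centroids(X_train, y_train)
print(classify([2.1, 2.4], centroids))   # misalignment
```

In a variable-speed application, one option (an assumption here, not something the article prescribes) is to normalize amplitudes by speed or add speed and load as features, so that examples are compared under similar operating conditions. New analyst-confirmed labels simply become additional rows in `X_train` and `y_train`.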

ML has tremendous potential in industrial applications, especially in asset reliability and optimization. It makes reliability scalable and brings a richer perspective than a human alone can provide, especially when multiple parameters are involved. The best systems combine advanced ML algorithms with asset knowledge. If you are setting up a project that includes ML, think about the application and which datasets would be useful to include, and use the knowledge within your organization to make the algorithms as effective as possible.

Ramakrishna Reddy is a software architect at Petasense.