When manufacturers strategically deploy artificial intelligence (AI) into their operating environments, the results can be game changing. Setting the stage for constant innovation. Avoiding production-line downtime. Anticipating and acting on dramatic market changes before the competition. The reality for most manufacturers? AI deployments can be challenging. Getting it right takes determination and a dedication to optimizing the process.
I recently connected with Landing AI's Quinn Killough to discuss where AI deployments go wrong, and how manufacturers can move past the obstacles.
IW: When it comes to adopting AI, where do most manufacturers run into problems?
Killough: One of the major points of failure in AI projects for manufacturers is the jump from a working model in the lab to a working model in production. Many manufacturers have high hopes for AI improving automation on their plant floors, and once they’ve honed in on a valuable problem or project, they go to work in the lab and start training AI models. After just a few iterations (depending on the complexity of the problem), they may find their model is performing at the desired accuracy level.
They think, “great! AI is magic and it’s going to solve all of our problems,” and decide to take the solution to production so they can start reaping the benefits. The issue that we have seen with countless manufacturers across a number of industries is that the model’s performance in production is worse than it was in the lab and is not yet suitable for true deployment.
They end up needing to go back to the lab and the drawing board to iterate on the model further, and in the meantime they revert to their old way of doing things on the floor. As we see it, this jump from a proof of concept in the lab to a fully functional, value-adding, deployed model is much harder than expected. Not only is this an issue for the project itself (longer development times, more money spent, the opportunity cost of the old system running longer), but it is also an issue of perception for AI: people can lose faith after seeing this process unfold.

IW: Why do manufacturers struggle to get past that point?
Killough: Frequently, the number one contributor to this struggle is data quality. A machine learning model is only as good as the data you put into it. Machine learning engineers spend up to 80% of their time prepping their data, and there is a lot that can go wrong in this phase. Issues with data quality come in many flavors, but the two we see most often are poorly labeled data and bad dataset distribution.
A model will struggle in production if the manufacturer fails to accurately label the data. When you don’t have millions of data points to wash out the impact of labeling mistakes, just one or two improperly labeled items can lead to a model that does not perform well. In the case of visual defect detection in manufacturing, a common contributor to bad labels is inadequate defect definitions. It is not uncommon for two subject matter experts on a plant floor to disagree on whether a defect is a scratch or a crack, which in turn makes it difficult for the people doing the labeling to label images correctly.
When machine learning engineers collect data to train their model, the data often is not diverse enough to represent the variety of edge cases that would actually be seen in the production environment. Because of this, edge cases that occur infrequently may be underrepresented in training and testing, so the model ends up not being great at finding them. Consider a visual inspection project where a manufacturer is looking for several categories of defects on a phone screen: dust, dead pixels and cracks. Dust and dead pixels are very common, so there is plenty of data on those categories. Cracks, on the other hand, are a very infrequent defect, so there is less data and the model is not great at recognizing them. A big issue here is that the infrequent defects are often some of the most critical ones to catch. If this manufacturer were to test on the same distribution they trained on, they would get good results. But when going to production, they would quickly find they can’t perform on their most critical defects and be forced to pull the system.
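To make that failure mode concrete, here is a minimal sketch in Python, with made-up class names and counts purely for illustration, of why per-class metrics matter: overall accuracy on a lab test set dominated by dust and dead pixels looks excellent, while recall on the rare crack class, the one that matters most, is poor.

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Of the parts that truly had each defect, what fraction did the model catch?"""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if pred == truth:
            hits[truth] += 1
    return {label: hits[label] / totals[label] for label in totals}

# Made-up lab test set that mirrors the training distribution:
# lots of dust and dead pixels, only a handful of cracks.
y_true = ["dust"] * 500 + ["dead_pixel"] * 450 + ["crack"] * 5
y_pred = (["dust"] * 495 + ["dead_pixel"] * 5      # dust: mostly right
          + ["dead_pixel"] * 450                   # dead pixels: all right
          + ["crack"] * 2 + ["dust"] * 3)          # cracks: mostly missed

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"overall accuracy: {overall:.2f}")          # ~0.99, looks great in the lab
print(per_class_recall(y_true, y_pred))            # crack recall is only 0.40
```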
IW: What are the keys to getting past the barriers?
Killough: There are a couple of things that can be done to help get past these barriers and achieve a successful deployed model.
First, create an airtight labeling process, and second, move the project out of the lab sooner. The labeling process starts with data collection and making sure your dataset is as representative of the production process as possible. After a well-balanced dataset is established, you need a systematic method for driving agreement on the definitions of the label categories. Consensus tasks have proven to be a great way to achieve this: create a system where multiple experts review the same data and compare labels, then resolve any discrepancies so the label definitions are less ambiguous for labelers. With the clearest instructions possible, labelers have a good chance of labeling the data accurately, but they are still human and will make mistakes, which is why we recommend a tight review process in which every single piece of labeled data is reviewed. It may seem like overkill, but a single mistake can be costly to model performance. Establishing this airtight data preparation process will inherently improve model performance and help project leaders avoid costly pitfalls and mistakes.
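As a rough illustration of a consensus task, the following sketch (with hypothetical reviewer names and labels) flags images where the experts disagreed, so those cases can be resolved and the defect definitions tightened before the data reaches the training set.

```python
from collections import Counter

def find_disagreements(labels_by_image):
    """labels_by_image: {image_id: {reviewer_name: label}}.
    Returns the images where the experts did not all agree, so the team can
    resolve them and sharpen the defect definitions before training."""
    flagged = []
    for image_id, votes in labels_by_image.items():
        counts = Counter(votes.values())
        if len(counts) > 1:                       # more than one distinct label
            flagged.append((image_id, dict(counts)))
    return flagged

# Hypothetical labels from three reviewers on the same two images.
labels = {
    "img_001": {"reviewer_a": "scratch", "reviewer_b": "scratch", "reviewer_c": "scratch"},
    "img_002": {"reviewer_a": "scratch", "reviewer_b": "crack",   "reviewer_c": "scratch"},
}
for image_id, counts in find_disagreements(labels):
    print(image_id, counts)   # img_002 {'scratch': 2, 'crack': 1}
```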
Outside of having an airtight data preparation process, we also recommend approaches that get you from the lab bench to the production floor sooner than intuition suggests. Often teams want to hit their ultimate goal for model performance in the lab, say 95% accuracy, before moving to production, but this can be flawed. There will inevitably be unforeseen changes, as well as the edge cases mentioned earlier, that show up in production and hurt performance. Instead, if we spend less time in the lab and aim for a lower bar of 80%, we can deploy that model in production and start uncovering those production edge cases and issues much sooner. To do this, AI teams can use two approaches: shadow mode or putting a human in the loop. Shadow mode means running a model on actual production data without the model's predictions affecting production, so your existing process is untouched while you iterate on and improve the model.
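A minimal sketch of what shadow mode can look like in code, assuming a hypothetical model.predict() interface and a capture_frame() hook into the existing line; the candidate model's predictions are only logged for later comparison and never drive a decision.

```python
import json
import time

def run_in_shadow(model, capture_frame, log_path="shadow_predictions.jsonl"):
    """Score one production frame with the candidate model and log the result.
    Nothing downstream reads this log, so the existing process is untouched."""
    frame = capture_frame()                       # image from the existing line
    label, confidence = model.predict(frame)      # assumed model interface
    record = {"timestamp": time.time(), "prediction": label, "confidence": confidence}
    with open(log_path, "a") as log_file:
        log_file.write(json.dumps(record) + "\n")
    # The logged predictions are later compared against the existing process
    # (or human inspectors) to measure real production performance.
```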
Putting a human in the loop is another option if you’d still like to benefit from an underperforming model while it is being improved. In this setup the model runs as though it were actually deployed, and a human reviews each low-confidence prediction. A lower-performing model can be deployed earlier, 100% of parts are still inspected, and the model can be iterated on to do more and more of the work until it is fully autonomous.
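One simple way to picture that routing, again assuming a hypothetical model interface and a send_to_operator() handoff; the confidence threshold is an illustrative value, not a recommendation.

```python
CONFIDENCE_THRESHOLD = 0.90   # illustrative value; tune per line and per defect

def inspect_part(part_image, model, send_to_operator):
    """Let the model decide when it is confident; otherwise hand the part to a
    human so 100% of parts are still inspected while the model improves."""
    label, confidence = model.predict(part_image)          # assumed model interface
    if confidence >= CONFIDENCE_THRESHOLD:
        return label                                       # model makes the call
    # Low confidence: the operator decides, and the image becomes a strong
    # candidate to label and fold into the next training round.
    return send_to_operator(part_image, suggested_label=label)
```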
IW: Once manufacturers are able to adopt AI, what steps can they take to optimize its use?
Killough: Continuous learning is the most important aspect of optimizing a deployed solution. It involves a few different elements, including data collection, inference monitoring, and a method for retraining.
Model deployment is just the beginning. It takes a solid system just to maintain a model's current performance, let alone optimize it further. The first step is having a method for continuous data collection. As the model makes predictions, it’s important to be able to collect the data it may have struggled on. With an inference monitoring system where operators can view real-time and past predictions, this process can be fairly simple.
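Here is one way such an inference-monitoring loop might be sketched, using a local SQLite table as a stand-in for whatever store a plant actually uses: every prediction is recorded with its confidence, and the lowest-confidence parts become candidates for labeling and retraining.

```python
import sqlite3
import time

conn = sqlite3.connect("inference_log.db")
conn.execute("""CREATE TABLE IF NOT EXISTS predictions (
                    ts REAL, part_id TEXT, label TEXT, confidence REAL)""")

def record_prediction(part_id, label, confidence):
    """Store every prediction as it happens so operators can review
    live and past results."""
    conn.execute("INSERT INTO predictions VALUES (?, ?, ?, ?)",
                 (time.time(), part_id, label, confidence))
    conn.commit()

def labeling_candidates(limit=100):
    """Lowest-confidence predictions: the parts the model most likely struggled
    on, and the most valuable ones to label for the next retraining run."""
    return conn.execute(
        "SELECT part_id, label, confidence FROM predictions "
        "ORDER BY confidence ASC LIMIT ?", (limit,)).fetchall()
```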
While collecting and labeling data, periodic retraining of the model is also very important. Retraining on new data helps optimize performance by catching and training on those infrequent edge cases. By continuously retraining on the latest state of the production system, you can also account for naturally occurring changes in production. For example, in optical inspection, if a material changes in your production process, it could affect model performance because the product looks different, but not enough to set off any alarms or flag anyone’s attention. Having a quick and easy way to retrain and roll out a new model is important when trying to optimize or maintain an AI system; without the help of automation, this task can be rather time-consuming.
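A rough sketch of how a periodic retraining check could be wired up, with illustrative thresholds and hypothetical train/deploy hooks rather than any particular tooling.

```python
def maybe_retrain(new_labeled_count, monitored_accuracy, train_fn, deploy_fn,
                  min_new_examples=200, accuracy_floor=0.95):
    """Retrain when enough newly labeled data has accumulated, or when monitored
    accuracy drifts below the floor (for example, after a material change makes
    the product look slightly different)."""
    if new_labeled_count >= min_new_examples or monitored_accuracy < accuracy_floor:
        model = train_fn()      # retrain on the full, updated dataset
        deploy_fn(model)        # roll out the new model version
        return True
    return False
```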