Is Messy Data An Inevitability?

Let’s push back a little.

It’s no secret that manufacturing data is “messy”. It’s almost never clean enough to analyze immediately. Jim Miller of Rockwell Automation, a leader in manufacturing analytics, estimates that data scientists spend 60% of their time cleaning data. And I bet you’d agree. You probably see something similar.

What a shame. We’ve got smart people, ready to code up sophisticated analyses, ready to pull insights out of data, and they spend more than half of their time editing, reformatting, fixing typos. Bummer.

Is this a fact of life? Do we just accept it? Maybe. But before conceding, let’s push back a little.

First, let’s define the problem. We can call data messy any time it’s not immediately usable for analysis, any time editing is required. For starters, one big culprit is the same information entered different ways; one technician enters “maintenance”, a second “maintainance”, a third “maintnance”. Another culprit is when machines and sensors generate junk data, meaningless numbers. At other times units are inconsistent; one system records milliamps, another microamps. Then there’s the overly complicated world of time recordings. They accompany nearly every piece of data and they can be terribly frustrating. With the myriad of different formats, the handling of time zones, and unsynchronized clocks, time data is particularly messy.
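Two of these culprits, mixed units and mixed time formats, are mechanical enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the unit table, the function names `to_amps` and `to_utc`, and the choice of amps and UTC as the common targets are all hypothetical, and it assumes timestamps arrive as ISO 8601 strings with an explicit offset.

```python
from datetime import datetime, timezone

# Assumed conversion factors to a common unit (amps); extend as needed.
TO_AMPS = {"A": 1.0, "mA": 1e-3, "uA": 1e-6}


def to_amps(value, unit):
    """Convert a current reading to amps, rejecting unknown units."""
    if unit not in TO_AMPS:
        raise ValueError(f"Unknown unit: {unit!r}")
    return value * TO_AMPS[unit]


def to_utc(timestamp):
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # A timestamp with no offset is ambiguous, so refuse to guess.
        raise ValueError(f"Timestamp has no UTC offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)
```

Refusing unknown units and offset-free timestamps, rather than guessing, keeps the ambiguity visible at the point where someone can still fix it.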

To put a finer point on it, let’s take a page out of Jeff Foxworthy’s book.

… if your top 3 failures are “insufficient epoxy”, “insufficient epoxie”, and “not enough epoxy” then you might have messy data

… if you have a temperature reading of 3.2 x 10^18 °C then you might have messy data

… if your machine stopped running at 3:00PM but your results for 7:00PM still look fine then you might have messy data

You get the idea.

So what can we do about it? Here’s a thought: start with your manufacturing software. Why not push your MES and analytics providers to help with this? Here are three things they can do.

  1. Create systems that disallow bad input

Whenever possible, give users pulldowns rather than asking them to type. Your manufacturing software should be able to set this up. It’s worth a half day of your process engineer’s time to configure this. Your data analysts will thank you. So will your operators.
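On the software side, a pulldown amounts to accepting only values from a fixed list. Here is a minimal sketch of that idea in Python. The `DowntimeReason` choices and the `record_downtime` function are hypothetical, not taken from any particular MES.

```python
from enum import Enum


class DowntimeReason(Enum):
    """Hypothetical fixed list of choices, as a pulldown would offer."""
    MAINTENANCE = "maintenance"
    CHANGEOVER = "changeover"
    MATERIAL_SHORTAGE = "material shortage"


def record_downtime(reason_text):
    """Accept only values from the fixed list; reject free-typed variants."""
    try:
        return DowntimeReason(reason_text)
    except ValueError:
        allowed = ", ".join(reason.value for reason in DowntimeReason)
        raise ValueError(
            f"Unknown reason {reason_text!r}; choose one of: {allowed}"
        ) from None
```

With this in place, "maintenance" is accepted while "maintainance" is rejected at entry time, before it ever reaches an analyst.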

  2. Put a little thought into data feeds

Manufacturing software providers shouldn’t view their job as complete once the data feed is set up. For each process step, each machine, each sensor, they should take a few minutes to understand the data. They should look at the data together with the process owner to make sure it’s being written wisely. If there is more than one machine or station, make sure they’re entering data the same way. If there are conditions or annotations to be written with the data, think through the use case to write those things smartly.

Similarly, someone should be assigned to review all the data feeds for the entire factory. Are parameters named consistently across processes? Is time written the same way across the factory? Is any of the data obvious nonsense? Is the data set up in a way that’s truly useful?
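Part of that factory-wide review can be scripted. As a rough sketch, assuming each feed can be exported as a list of parameter names, the following flags names that differ only in case or separators. The feed names, parameter names, and both functions are made up for illustration.

```python
import re
from collections import defaultdict


def canonical(name):
    """Lowercase and drop separators so 'Bond_Temp' and 'bond temp' compare equal."""
    return re.sub(r"[\s_\-]+", "", name).lower()


def find_inconsistent_names(feeds):
    """Return groups of spellings that appear to name the same parameter."""
    spellings = defaultdict(set)
    for params in feeds.values():
        for name in params:
            spellings[canonical(name)].add(name)
    # Keep only parameters written more than one way across the feeds.
    return {key: sorted(names) for key, names in spellings.items() if len(names) > 1}


feeds = {
    "die_attach": ["Bond_Temp", "Pressure"],
    "wire_bond": ["bond temp", "Pressure"],
}
print(find_inconsistent_names(feeds))
```

A check like this only catches spelling-level differences. Whether two differently named parameters mean the same thing still takes a conversation with the process owner.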

  3. Automate the cleaning

Of course, even after implementing the above, some messiness will still exist. But rather than passing it on, why not do some of the cleaning right on the systems where it’s input? Simple things, such as filtering out impossible values, should be easily done by software. Textual data is trickier, but still not unreasonable. If my email automation site can figure out “that’s not my job” is the same as “please stop sending me emails,” then certainly our manufacturing software solutions can apply similar intelligence to inputs. If you ask your data scientists to summarize their top five cleaning techniques and give those to your manufacturing software provider, they should be able to automate the bulk of the cleaning.
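To make that concrete, here is a small sketch of two such techniques using only the Python standard library: dropping readings outside a plausible range, and snapping misspelled failure labels to a known list with `difflib.get_close_matches`. The failure list, the temperature range, and the 0.8 similarity cutoff are assumptions chosen for illustration, not recommended values.

```python
from difflib import get_close_matches

# Hypothetical list of approved failure labels.
KNOWN_FAILURES = ["insufficient epoxy", "cracked die", "lifted bond"]

# Assumed plausible range for this sensor, in degrees Celsius.
TEMP_RANGE_C = (-50.0, 400.0)


def clean_failure(text, cutoff=0.8):
    """Map a free-typed label to the closest known label, or None if nothing is close."""
    normalized = " ".join(text.lower().split())
    matches = get_close_matches(normalized, KNOWN_FAILURES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def plausible_temps(readings, low=TEMP_RANGE_C[0], high=TEMP_RANGE_C[1]):
    """Keep only readings that fall inside the plausible range."""
    return [reading for reading in readings if low <= reading <= high]


print(clean_failure("insufficient epoxie"))
print(plausible_temps([25.1, 3.2e18, 180.0]))
```

Spelling similarity handles “insufficient epoxie”, but it will not recognize “not enough epoxy” as the same failure; `clean_failure` returns `None` for it. Synonyms like that need an explicit mapping or something smarter, which is the trickier part mentioned above.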

Will we ever get to the point where data scientists spend 0% of their time cleaning data? Probably not. But 60%? That’s a waste. We can do better than that. We’re not that messy.

Carl Ogden is VP of Operations at Intraratio Corporation, heading up business development for that manufacturing software company. He worked in product and test engineering management in the semiconductor industry for three decades with five companies that transitioned from startup to acquisition. He specialized in leading the development of test solutions for both characterization and production.
