By equipping a smart, connected product with more sensors, a company can get a broader view of how a product functions (including real-time insights), open new ways to improve its performance (in real time as well) and more. However, it’s not enough just to add new sensors to the existing ones to improve visibility into the state and performance of these products. As the number of sensors increases, the volume of the data coming from these sensors will grow as well—and such a situation needs a specific approach to establishing all the big data-related processes: data ingesting, storing, processing and analyzing.
Let’s look at the challenges that such high-volume data brings, using a product that needs thousands of sensors and frequent readings as an example: a smart gas turbine producing electric power. Its sensors collect data that includes pressure, temperature and vibrations inside the turbine, as well as the rotation speed of the blades.
Ingesting high-volume sensor data
The sensor data journey begins with ingesting—gathering data from sensors and moving it into the cloud. To make this beginning smooth and effective, it’s important to address possible challenges at the early stages of IoT development.
Detecting connectivity problems. Connectivity and data transmission technologies keep improving, but there is always some probability, however small, that the connection will drop for a while. Whether the outage lasts seconds, minutes or even hours, an IoT system gathering huge data volumes from numerous sensors located in the turbine can lose important data and/or stop functioning.
It’s important for an IoT system to “understand” when connectivity is lost and to send corresponding alerts. For example, when no data comes from sensors for a set period of time (say, 10 seconds), the IoT system should generate an alert.
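A minimal sketch of such a silence check, assuming each sensor has a string ID and that the 10-second limit is configurable. The class and method names are hypothetical and not taken from any particular IoT platform.

```python
import time
from typing import Dict, List, Optional


class ConnectivityWatchdog:
    """Tracks when each sensor last reported and flags the silent ones."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def record(self, sensor_id: str, now: Optional[float] = None) -> None:
        # Call this whenever a reading arrives from a sensor.
        self.last_seen[sensor_id] = time.monotonic() if now is None else now

    def silent_sensors(self, now: Optional[float] = None) -> List[str]:
        # Sensors whose last reading is older than the timeout.
        current = time.monotonic() if now is None else now
        return sorted(
            sensor_id
            for sensor_id, seen in self.last_seen.items()
            if current - seen > self.timeout_s
        )
```

In a real system, a scheduler would call silent_sensors() every second or so and raise an alert for each ID it returns. The optional now argument exists only to make the check easy to test.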
Plan B if connectivity breaks. If such a problem is detected, it’s vital for the system to act as soon as possible to minimize the damage, for example by sending the incoming data to buffer storage, where it is collected and then transmitted once connectivity is restored.
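One possible shape for such a buffer is a store-and-forward queue at the edge. The sketch below assumes a hypothetical send callable that returns True when the cloud accepts a reading, and an arbitrary capacity limit after which the oldest readings are dropped.

```python
from collections import deque
from typing import Callable, Deque, Dict


class StoreAndForwardBuffer:
    """Holds readings while the uplink is down and flushes them in order."""

    def __init__(self, send: Callable[[Dict], bool], max_items: int = 100_000) -> None:
        self.send = send  # returns True when the cloud accepted the reading
        # A bounded deque silently drops the oldest item once it is full.
        self.pending: Deque[Dict] = deque(maxlen=max_items)

    def submit(self, reading: Dict) -> None:
        # Queue first, then try to deliver, so arrival order is preserved.
        self.pending.append(reading)
        self.flush()

    def flush(self) -> int:
        sent = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break  # still offline; keep the reading for later
            self.pending.popleft()
            sent += 1
        return sent
```

Whether to drop the oldest or the newest readings when the buffer fills up is a design choice that depends on which data matters more after an outage.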
Network bandwidth and battling its limitations. Network bandwidth should support versatile data at the speed the data is coming. Experts expect the relatively new 5G technology (promising latency of about 1 millisecond) to transmit data about 10 times faster than the current standard, 4G LTE.
Choosing a suitable data ingesting method. In IoT development, it’s also important to choose the right method of ingesting data: in real time (immediately when the data comes), in batches (periodically), or a combination of the two. This choice depends on your particular case and business needs. Real-time sensor data streams might be useful, for example, in those parts of a turbine that are likely to overheat, so that control apps can send commands to actuators and cool the overheated parts. Gathering data in batches (when it’s first collected at the edge and then sent in packs to the cloud) can be applied when an immediate reaction from control apps is not needed.
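A combined approach can be sketched as a simple router at the edge: urgent readings go out immediately, everything else is grouped into batches. The overheat threshold, the batch size and the reading fields ("kind", "value") are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold and batch size, chosen only for illustration.
OVERHEAT_C = 900.0
BATCH_SIZE = 3


def route(readings: List[Dict]) -> Tuple[List[Dict], List[List[Dict]]]:
    """Split readings into an immediate stream and fixed-size batches."""
    realtime: List[Dict] = []
    batches: List[List[Dict]] = []
    current: List[Dict] = []
    for r in readings:
        if r["kind"] == "temperature" and r["value"] >= OVERHEAT_C:
            realtime.append(r)  # forward at once so control apps can react
            continue
        current.append(r)
        if len(current) == BATCH_SIZE:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # final partial batch
    return realtime, batches
```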
Setting the frequency of data readings. Data should be taken from sensors often enough to detect dangerous situations as soon as possible. With higher frequency, the volume of data coming to the cloud will also increase, and an IoT system should be able to quickly and efficiently process all this data.
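The link between reading frequency and data volume is easy to estimate up front. The figures below (5,000 sensors, 100 readings per second, 16 bytes per reading) are assumed for illustration and do not describe any specific turbine.

```python
def bytes_per_second(sensors: int, readings_per_s: int, bytes_per_reading: int) -> int:
    """Raw payload rate, ignoring protocol overhead and compression."""
    return sensors * readings_per_s * bytes_per_reading


# Assumed figures, for illustration only.
rate = bytes_per_second(sensors=5000, readings_per_s=100, bytes_per_reading=16)
megabits_per_second = rate * 8 / 1_000_000
bytes_per_day = rate * 86_400

print(rate)                 # 8000000 bytes per second
print(megabits_per_second)  # 64.0 Mbit/s of sustained uplink
print(bytes_per_day)        # 691200000000 bytes, about 691 GB per day
```

Doubling the reading frequency doubles all three numbers, which is why the frequency is worth setting per sensor group rather than uniformly.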
Storing high-volume sensor data
Big data storage is now significantly cheaper than it was years ago. But it’s still challenging to find a storage system that uses its space prudently to categorize and store large volumes of data, so that the data is quickly accessible when it’s needed.
Redundant data should be filtered out in the gateways. A large amount of data may be kept in a “data lake” before it’s sent to a big-data warehouse. The data doesn’t need refining to go in a data lake, and the company may throw out the data when it is no longer needed.
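One common way to filter redundant readings at a gateway is a deadband filter, which forwards a reading only when it differs enough from the last value that was forwarded for the same sensor. The sketch below is an illustration under that assumption; the tolerance is hypothetical and would be tuned per sensor type.

```python
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]  # (sensor_id, value)


def deadband_filter(readings: Iterable[Reading], tolerance: float) -> List[Reading]:
    """Keep a reading only if it moved more than `tolerance` since the last kept one."""
    last_kept: Dict[str, float] = {}
    kept: List[Reading] = []
    for sensor_id, value in readings:
        previous = last_kept.get(sensor_id)
        if previous is None or abs(value - previous) > tolerance:
            kept.append((sensor_id, value))
            last_kept[sensor_id] = value
    return kept
```

The comparison is made against the last kept value, not the last seen value, so a slow drift still gets reported once it adds up to more than the tolerance.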
Choosing an optimal data model. A suitable data model is important not only for storing data—it helps an IoT system and data analysts navigate in constantly expanding data volumes and contributes to quick and effective data-driven insights. Databases suited for capturing and storing high-volume sensor data include Cassandra, HBase and similar solutions. Cassandra ensures quick writes (adding new records into the storage), but there might be some issues with strong data consistency. HBase might be slower with data writes, but it does better with consistent reads.
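A frequently used data model for time-series sensor data in wide-column stores groups readings by sensor and time bucket, so that one sensor's readings for one day sit together and can be read in time order. The sketch below imitates that layout with plain Python dictionaries instead of a real database driver. The daily bucket and all names are assumptions, and timestamps are expected to be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

PartitionKey = Tuple[str, str]  # (sensor_id, UTC day such as "2024-05-01")


def partition_key(sensor_id: str, ts: datetime) -> PartitionKey:
    # Bucketing by day keeps any single partition from growing without bound.
    return sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


class TimeSeriesStore:
    """In-memory stand-in for a table partitioned by (sensor, day)."""

    def __init__(self) -> None:
        self.partitions: Dict[PartitionKey, List[Tuple[datetime, float]]] = defaultdict(list)

    def write(self, sensor_id: str, ts: datetime, value: float) -> None:
        # Writes are simple appends to the matching partition.
        self.partitions[partition_key(sensor_id, ts)].append((ts, value))

    def read_day(self, sensor_id: str, day: str) -> List[Tuple[datetime, float]]:
        # Readings come back ordered by timestamp within the partition.
        return sorted(self.partitions.get((sensor_id, day), []))
```

The bucket size is a trade-off: smaller buckets keep partitions compact, while larger ones mean fewer partitions to touch for long time-range queries.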
Processing high-volume data with control apps
Ensuring sufficient speed of data processing. Control apps react immediately to the data coming from sensors. However, as the number of sensors and the data they generate grow, the computational load on control apps increases too. This may slow down an IoT system if it’s not updated to cope with the new load. Delays in data processing may seriously hinder effective gas turbine performance and even lead to damage (for example, when the turbine is not cooled in time).
To address the issue, it’s possible to resort to high-performance stream processing solutions, like Kafka, and reactive architecture. Reactivity opens the opportunity to distribute data loads among several servers and provides easy scaling (involving additional servers when needed).
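Distributing the load among several servers can be sketched as hashing each sensor ID to a worker, which is similar in spirit to how keyed messages are spread over partitions in stream processing platforms. This is a simplified illustration, not the algorithm of any specific product, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def assign_worker(sensor_id: str, worker_count: int) -> int:
    """Map a sensor to a worker with a hash that is stable across runs."""
    digest = hashlib.sha256(sensor_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def distribute(sensor_ids: List[str], worker_count: int) -> Dict[int, List[str]]:
    """Group sensors by the worker that will process their readings."""
    load: Dict[int, List[str]] = {w: [] for w in range(worker_count)}
    for sensor_id in sensor_ids:
        load[assign_worker(sensor_id, worker_count)].append(sensor_id)
    return load
```

Because all readings from one sensor land on the same worker, their order is preserved. The price of the plain modulo is that changing the number of workers reassigns most sensors; consistent hashing is a common way to limit that.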
Analyzing high-volume sensor data
The more data you gather from various endpoints, the more complete your picture is of what’s going on. However, complex sensor data loads can also hinder quick and effective data analytics.
Synchronizing and integrating readings from numerous sensors. In a gas turbine with densely located sensors, it’s possible to spot the areas of low or high pressure and temperature, and the areas where vibrations are too high. In a user app, these parameters can be shown on special “maps” that use color to show how a turbine “behaves.” However, for these maps to be precise and up-to-date, effective image analysis is needed, for example, to recognize the patterns of temperature, pressure and other parameters.
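Before readings from many sensors can be drawn on one map, they have to be brought onto a common time grid, because sensors rarely report at exactly the same instant. A minimal sketch, assuming timestamps in seconds and a fixed window in which each sensor's readings are averaged (names and window size are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[str, float, float]  # (sensor_id, timestamp_s, value)


def align(readings: List[Reading], window_s: float) -> Dict[int, Dict[str, float]]:
    """Average each sensor's readings per time window of `window_s` seconds."""
    # sums[bucket][sensor_id] holds [running total, count].
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for sensor_id, ts, value in readings:
        bucket = int(ts // window_s)
        cell = sums[bucket][sensor_id]
        cell[0] += value
        cell[1] += 1
    return {
        bucket: {sid: total / count for sid, (total, count) in per_sensor.items()}
        for bucket, per_sensor in sorted(sums.items())
    }
```

Each window then yields one value per sensor, which is the kind of snapshot a color map of the turbine needs. Sensors missing from a window are simply absent from it, so gaps stay visible instead of being filled with guesses.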
Distinguishing between random and meaningful correlations. With immense big data loads, the task is not only to analyze the data, but also to do it right and find meaningful correlations. In this case, data analytics tools may not be enough, and data scientists are needed to verify the correlations identified automatically.
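One check a data scientist might run on an automatically found correlation is a permutation test: shuffle one series many times and see how often chance alone produces a correlation at least as strong. The sketch below is an illustration of that idea in plain Python, with an assumed number of permutations and a fixed seed for repeatability. It expects non-constant series.

```python
import math
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x: List[float], y: List[float],
                        permutations: int = 999, seed: int = 0) -> float:
    """Share of shuffles whose |correlation| matches or beats the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point rounding.
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (permutations + 1)
```

A small p-value suggests the correlation is unlikely to be a shuffle-level accident. With thousands of sensors and therefore millions of sensor pairs, some pairs will still pass by chance, so the result supports expert judgment and does not replace it.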
Alex Grizhnevich is a process automation and IoT consultant at ScienceSoft, an IT consulting and software development company headquartered in McKinney, Texas. His 17+ years’ experience in IT and OT includes programming industrial microcontrollers, developing web and desktop applications, databases and document management solutions for oil & gas and logistics. With degrees in automation and management of industrial processes, Alex is now focusing on IoT and machine learning on sensor data.