Industrial big data


Industrial big data refers to a large amount of diversified time series generated at high speed by industrial equipment, known as the Internet of things. The term emerged in 2012 along with the concept of "Industry 4.0", and refers to "big data", a term popular in information technology marketing, in that data created by industrial equipment might hold more potential business value. Industrial big data takes advantage of industrial Internet technology. It uses raw data to support management decision making, so as to reduce maintenance costs and improve customer service. See intelligent maintenance system for further reference.

Definition

Big data refers to data generated in high volume, high variety, and high velocity that require new processing technologies to enable better decision making, knowledge discovery and process optimization. Sometimes, the feature of veracity is also added to emphasize the quality and integrity of the data. However, for industrial big data, there should be two more "V's". One is visibility, which refers to the discovery of unexpected insights into existing assets and/or processes, thereby turning invisible knowledge into visible value. The other "V" is value.
Background: General "Big Data" analytics often focuses on mining relationships and capturing phenomena. "Industrial Big Data" analytics, by contrast, is more interested in finding the physical root cause behind the features extracted from those phenomena. Effective "Industrial Big Data" analytics therefore requires more domain know-how than general "Big Data" analytics.
Broken: Compared to "Big Data" analytics, "Industrial Big Data" analytics favors the "completeness" of data over its "volume". To construct an accurate data-driven analytical system, it is necessary to prepare data from different working conditions. Due to communication issues and multiple sources, data from the system might be discrete and unsynchronized. Pre-processing is therefore an important step before the actual analysis, to make sure that the data are complete, continuous and synchronized.
Bad-Quality: The focus of "Big Data" analytics is mining and discovery, which means that the volume of the data might compensate for its low quality. For "Industrial Big Data", however, variables usually possess clear physical meanings, so data integrity is of vital importance to the development of the analytical system. Low-quality data or incorrect recordings will alter the relationships between variables and can have a catastrophic impact on estimation accuracy.
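The pre-processing step mentioned above (making data continuous and synchronized) can be illustrated with a minimal sketch. It assumes two hypothetical sensor streams with strictly increasing timestamps and uses linear interpolation onto a shared time grid. The function names, stream names and the choice of linear interpolation are assumptions made for illustration, not a method prescribed by the source.

```python
from bisect import bisect_left


def resample_linear(timestamps, values, grid):
    """Linearly interpolate an irregular series onto `grid`.

    Assumes `timestamps` is strictly increasing. Grid points outside the
    observed range yield None, so gaps are reported rather than invented.
    """
    out = []
    for t in grid:
        if t < timestamps[0] or t > timestamps[-1]:
            out.append(None)
            continue
        i = bisect_left(timestamps, t)
        if timestamps[i] == t:
            out.append(values[i])
        else:
            t0, t1 = timestamps[i - 1], timestamps[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def synchronize(streams, grid):
    """Align {name: (timestamps, values)} streams on one grid.

    Only grid points where every stream has a value are kept, so each
    returned row is complete.
    """
    columns = {
        name: resample_linear(ts, vs, grid) for name, (ts, vs) in streams.items()
    }
    rows = []
    for k, t in enumerate(grid):
        row = {name: col[k] for name, col in columns.items()}
        if all(v is not None for v in row.values()):
            rows.append((t, row))
    return rows


# Hypothetical streams sampled at different, offset instants.
streams = {
    "temperature": ([0.0, 2.0, 4.0], [20.0, 22.0, 26.0]),
    "vibration": ([1.0, 3.0, 5.0], [0.1, 0.3, 0.2]),
}
aligned = synchronize(streams, grid=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
for t, row in aligned:
    print(t, row)
```

Grid points at 0.0 and 5.0 are dropped because only one of the two streams covers them. Whether to drop, interpolate or flag such gaps depends on the physical meaning of each variable.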

Technologies

Data acquisition, storage and management

As data from automated industrial equipment are generated at extraordinary speed and volume, the infrastructure for storing and managing these data becomes the first challenge any industry will face. Unlike traditional business intelligence, which mostly focuses on internal structured data and processes that information in regularly occurring cycles, an "Industrial Big Data" analytical system requires near real-time analytics and visualization of the results.
The first step is to collect the right data. Since the automation level of modern equipment keeps rising, data are being generated by an increasing number of sensors. Recognizing which parameters are related to equipment status is important for reducing the amount of data that must be collected and for increasing the efficiency and effectiveness of data analytics.
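One simple way to screen for parameters related to equipment status is to rank sensor channels by their correlation with a health indicator. The sketch below is only an illustration under stated assumptions: the channel names, the health indicator values and the 0.7 threshold are hypothetical. Correlation is one screening technique among many, and in practice it would be combined with domain knowledge.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant series has undefined correlation; treat it as uninformative.
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_channels(channels, health, min_abs_corr=0.7):
    """Return channel names whose |correlation| with `health` meets the threshold,
    strongest first, together with all correlation scores."""
    scores = {name: pearson(vals, health) for name, vals in channels.items()}
    keep = [name for name, r in scores.items() if abs(r) >= min_abs_corr]
    keep.sort(key=lambda name: abs(scores[name]), reverse=True)
    return keep, scores


# Hypothetical readings: health decays from 1.0 (new) towards 0 (failed).
health = [1.0, 0.9, 0.8, 0.6, 0.3]
channels = {
    "bearing_temp": [60, 62, 65, 70, 78],
    "ambient_temp": [21, 19, 22, 20, 21],
    "supply_voltage": [24, 24, 24, 24, 24],
}
selected, scores = select_channels(channels, health)
print(selected)
```

With these made-up values only `bearing_temp` passes the threshold, so the other two channels would be candidates for not being collected at full rate.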
The next step is to build a data management system that can handle large amounts of data and perform analytics in near real-time. To enable rapid decision making, data storage, management and processing need to be more tightly integrated. General Electric has built a prototype data storage infrastructure for a fleet of gas turbines. The system, based on in-memory data grids, was shown to handle challenging high-velocity, high-volume data flows while performing near real-time analytics on the data. Its developers believe that the technology has demonstrated a viable path to realize batch "Industrial Big Data" management infrastructure. As memory becomes cheaper, such systems will become central and fundamental to future industry.
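The idea of keeping recent data in memory and updating analytics as each reading arrives can be sketched on a single machine. The class below is a toy illustration only: its name and methods are hypothetical, it is not the General Electric system, and a real in-memory data grid would partition and replicate data across many nodes.

```python
from collections import defaultdict, deque


class InMemoryWindowStore:
    """Keeps the most recent `window_size` readings per unit in memory and
    maintains a running sum so the rolling mean is available in O(1)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self._windows = defaultdict(lambda: deque(maxlen=window_size))
        self._sums = defaultdict(float)

    def ingest(self, unit_id, value):
        window = self._windows[unit_id]
        if len(window) == window.maxlen:
            # The oldest reading is about to be evicted by append().
            self._sums[unit_id] -= window[0]
        window.append(value)
        self._sums[unit_id] += value

    def rolling_mean(self, unit_id):
        window = self._windows.get(unit_id)
        if not window:
            return None  # no readings seen for this unit yet
        return self._sums[unit_id] / len(window)


# Hypothetical exhaust-temperature readings from two turbines.
store = InMemoryWindowStore(window_size=3)
for reading in [10.0, 20.0, 30.0, 40.0]:
    store.ingest("turbine-1", reading)
store.ingest("turbine-2", 5.0)
print(store.rolling_mean("turbine-1"), store.rolling_mean("turbine-2"))
```

The running sum trades a small risk of floating-point drift over very long streams for constant-time queries. Recomputing the sum from the window periodically is one way to bound that drift.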

Cyber-physical systems

Cyber-physical systems are the core technology of industrial big data. They are systems that require seamless integration between computational models and physical components. Unlike traditional operation technology, "Industrial Big Data" requires that decisions be informed from a much wider scope, a central part of which is equipment status. Improved processes will further increase productivity and reduce costs. This aligns with the mission of "Industrial Big Data", which is to reveal insights from large amounts of raw data and turn that information into value. It combines the power of information technology and operation technology to create an information-transparent environment that supports decisions for users at different levels.

Sample repositories

Every unit in an industrial system generates a vast amount of data at every moment. Billions of data samples are generated per day by every single machine in a manufacturing line. As an example, a Boeing 787 generates over half a terabyte of data per flight. The volume of data generated by a group of units in an industrial system is clearly far beyond the capability of traditional methods, so handling, managing and processing it is a challenge.
Over the last several years, researchers and companies have actively participated in collecting, organizing and analyzing huge industrial data sets. Some of these data sets are currently available to the public for research purposes.
The NASA data repository is one of the best-known data repositories for industrial big data. The data sets it provides may be used for predictive analysis, fault detection, prognostics, etc.
| ID | Repository Name | Description of the Data |
|----|-----------------|-------------------------|
| 1 | Algae Raceway Data Set | 3 small raceways experiment for algae biomass |
| 2 | CFRP Composites Data Set | Run-to-failure experiment on CFRP panels |
| 3 | Milling Data Set | Experiments on a milling machine for different speeds, feeds, and depth of cut. Records the wear of the milling insert, VB. The data set was provided by the BEST lab at UC Berkeley. |
| 4 | Bearing Data Set | Experiments on bearings. The data set was provided by the Center for Intelligent Maintenance Systems, University of Cincinnati. |
| 5 | Battery Data Set | Experiments on Li-Ion batteries. Charging and discharging at different temperatures. Records the impedance as the damage criterion. The data set was provided by the Prognostics CoE at NASA Ames. |
| 6 | Turbofan Engine Degradation Simulation Data Set | Engine degradation simulation was carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and fault modes. Records several sensor channels to characterize fault evolution. The data set was provided by the Prognostics CoE at NASA Ames. |
| 7 | IGBT Accelerated Aging Data Set | Preliminary data from thermal overstress accelerated aging using the aging and characterization system. The data set contains aging data from 6 devices, one device aged with DC gate bias and the rest aged with a squared signal gate bias. Several variables are recorded and, in some cases, high-speed measurements of gate voltage, collector-emitter voltage and collector current are available. The data set was provided by the Prognostics CoE at NASA Ames. |
| 8 | Trebuchet Data Set | Trajectories of different types of balls launched from a trebuchet with varying counterweights. Flights were filmed and extraction routines calculated position data. Both raw video data and extracted trajectories are provided. Geometry and physical properties of the trebuchet are available. |
| 9 | FEMTO Bearing Data Set | Experiments on bearings' accelerated life tests provided by FEMTO-ST Institute, Besançon, France. |
| 10 | Randomized Battery Usage Data Set | Batteries are continuously cycled with randomly generated current profiles. Reference charging and discharging cycles are also performed after a fixed interval of randomized usage in order to provide reference benchmarks for battery state of health. |
| 11 | Capacitor Electrical Stress Data Set | Capacitors were subjected to electrical stress under three voltage levels, i.e. 10 V, 12 V and 14 V. The data set contains EIS data as well as charge/discharge signal data. |
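Run-to-failure data sets such as several of those listed above are often prepared for prognostics by attaching a remaining-useful-life (RUL) label to each record. The sketch below shows that labelling step under explicit assumptions: each record is a hypothetical `(unit_id, cycle)` pair, and the last recorded cycle of a unit is taken to be its failure point. It does not reflect the actual file layout of any repository in the table.

```python
def label_remaining_useful_life(records):
    """Map each (unit_id, cycle) record to its remaining useful life in cycles.

    Assumes every unit was run until failure, so the largest cycle seen
    for a unit marks the end of its life (RUL = 0).
    """
    last_cycle = {}
    for unit_id, cycle in records:
        last_cycle[unit_id] = max(cycle, last_cycle.get(unit_id, cycle))
    return {(unit_id, cycle): last_cycle[unit_id] - cycle for unit_id, cycle in records}


# Hypothetical records: unit 1 fails after 3 cycles, unit 2 after 2 cycles.
records = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
rul = label_remaining_useful_life(records)
for key in records:
    print(key, rul[key])
```

These labels can then serve as regression targets for a model that estimates RUL from the sensor channels recorded at each cycle.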

Sample industrial big data analytics use cases

Leveraging machine learning and predictive analytics algorithms, industrial big data can help create value in various use cases, such as predictive maintenance, early-stage product quality prediction and product quality optimization, prediction and prevention of critical situations in continuous production processes, product lifetime prediction, assembly plan prediction for new 3D product designs, energy demand prediction, demand forecasting and price forecasting.
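As a very small illustration of the predictive maintenance use case, the sketch below raises an alarm when a monitored feature deviates from a healthy baseline for several consecutive samples. The z-score rule, the limits and the vibration values are assumptions chosen for illustration. Production systems typically use richer models than a fixed statistical threshold.

```python
from statistics import mean, stdev


def first_alarm(baseline, stream, z_limit=3.0, consecutive=3):
    """Return the index in `stream` at which the alarm fires, or None.

    The alarm fires once `consecutive` samples in a row lie more than
    `z_limit` standard deviations from the mean of the healthy baseline.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; z-scores are undefined")
    run = 0
    for i, x in enumerate(stream):
        if abs(x - mu) / sigma > z_limit:
            run += 1
        else:
            run = 0  # an in-range sample resets the streak
        if run >= consecutive:
            return i
    return None


# Hypothetical vibration amplitudes: healthy baseline, then a drifting stream.
baseline = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
stream = [1.0, 1.3, 1.05, 1.3, 1.4, 1.5, 1.6]
print(first_alarm(baseline, stream))
```

Requiring several consecutive exceedances makes the rule less sensitive to a single incorrect recording, at the cost of a slightly later alarm.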