Why do I need a PI System if I have a data lake?

Imagine for a moment that you are going to the hospital for surgery. You check in and as they prepare to administer general anesthesia, they tell you that they no longer monitor vital signs during surgery.

The surgeon calmly explains that they received your most recent medical data from your general practitioner. No need to worry - they have all your data on file and will pull it if anything goes wrong.

“How long will that take?” you ask. 

“Oh, just a few seconds, a couple minutes tops,” the surgeon says. 

Now, what does this scenario have to do with the PI System and data lakes? A lot. Both the PI System and data lakes are built for data storage, but they are optimized for different tasks. The PI System structures data for rapid insight and use. It's the EKG machine for operations, while a data lake is more like a set of patient charts, or a medical library. 

Both the PI System and data lakes have their place within the modern industrial enterprise. However, optimizing your operations and improving business performance requires understanding how these different systems evolved and how each of them stores data.

Built for Operations v. Built for Business  

OT and IT may be converging, but they still have different needs. IT serves the business, which often undertakes analysis to make strategic decisions. Operations requires fast, actionable information for real-time decisions. When a multimillion-dollar asset is about to act up, a few seconds can be the difference between a preventative fix and a catastrophic failure.

The PI System is a platform built and developed for operations (OT) that has since added integrations to enterprise systems. It can handle operations (time series) data that arrives at high and varying speeds and in varying formats. Under the hood, the PI System can visualize up to 1 million events per second, with installations handling up to 25 million data streams. As a result, the PI System can deliver critical operations data in real time to minimize downtime, optimize asset performance, and track batch quality.

Data lakes, on the other hand, were developed as enterprise (IT) platforms and have since added capabilities to handle operational data. Data lakes are excellent for analysis across aggregate data sets, but ask them to deliver real-time information about a boiler's performance and their limitations quickly become apparent. They are simply not optimized to give the real-time insights necessary for critical operations.

The Speed of Decisions - Reading v. Writing

The cost of cleaning and preparing operational data can be critical to optimizing investments. Data lakes structure data when it is read, in other words, when it is pulled out of the data lake to be used. The PI System, by contrast, structures data as it is written to the system, so that data is clean and contextualized from the start.
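To make the read-versus-write distinction concrete, here is a minimal Python sketch. It is an illustration only and does not use or represent the PI System or any data lake product: the `Reading`, `SchemaOnReadStore`, and `SchemaOnWriteStore` names, the JSON payload shape, and the `boiler1.temp` tag are all assumptions made up for this example. The first store keeps raw payloads and cleans them on every query; the second cleans each payload once, as it arrives.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Reading:
    """A cleaned, typed sensor reading (hypothetical shape)."""
    tag: str
    timestamp: datetime
    value: float


def parse(raw: str) -> Optional[Reading]:
    """Validate and type one raw JSON payload; return None if it is unusable."""
    try:
        record = json.loads(raw)
        return Reading(
            tag=record["tag"],
            timestamp=datetime.fromisoformat(record["ts"]),
            value=float(record["val"]),
        )
    except (KeyError, TypeError, ValueError):
        return None


class SchemaOnReadStore:
    """Stores payloads untouched; structure is applied when data is read."""

    def __init__(self) -> None:
        self._raw: List[str] = []

    def write(self, raw: str) -> None:
        self._raw.append(raw)  # cheap write, no validation

    def latest(self, tag: str) -> Optional[Reading]:
        # Every query re-parses and re-filters the whole raw history.
        parsed = (parse(raw) for raw in self._raw)
        matches = [r for r in parsed if r is not None and r.tag == tag]
        return max(matches, key=lambda r: r.timestamp, default=None)


class SchemaOnWriteStore:
    """Validates and structures each payload once, as it is written."""

    def __init__(self) -> None:
        self._latest: Dict[str, Reading] = {}

    def write(self, raw: str) -> bool:
        reading = parse(raw)
        if reading is None:
            return False  # bad data is rejected at the door
        current = self._latest.get(reading.tag)
        if current is None or reading.timestamp >= current.timestamp:
            self._latest[reading.tag] = reading
        return True

    def latest(self, tag: str) -> Optional[Reading]:
        return self._latest.get(tag)  # a lookup, with no cleaning left to do


# Hypothetical sensor payloads; the second one carries an unusable value.
RAW_EVENTS = [
    '{"tag": "boiler1.temp", "ts": "2024-01-01T00:00:00+00:00", "val": "181.5"}',
    '{"tag": "boiler1.temp", "ts": "2024-01-01T00:00:01+00:00", "val": "bad"}',
    '{"tag": "boiler1.temp", "ts": "2024-01-01T00:00:02+00:00", "val": "182.1"}',
]

lake, historian = SchemaOnReadStore(), SchemaOnWriteStore()
for raw in RAW_EVENTS:
    lake.write(raw)
    historian.write(raw)
```

Both stores return the same latest reading for `boiler1.temp`, but they pay for it at different times. The read-side store repeats the cleaning work on every query, and that work grows with the stored history, while the write-side store answers with a single lookup.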

For sensor data, structure and context are particularly important. As Stewart Bond, Director of Data Integration and Integrity Software Research at IDC says, "IoT sensor data on its own doesn't have a lot of value; it is dirty, unstructured, and disorganized, and unless the context in which it was captured is known, deriving insights from it is difficult without significant downstream data preparation efforts."

Through its Asset Framework (AF), the PI System also adds context to structured data as it is stored. With AF, companies can create digital models of asset, plant, and enterprise operations that update in real time, highlight events in the data streams, and receive automated notifications of issues. The result is better support for faster, data-driven operational decisions, or what a recent IDC profile on OSIsoft terms “decision-centric computing… an event-driven style of building solutions that puts decision automation at the heart of the solution.”
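The idea of an asset model that gives raw streams context and raises notifications can be sketched generically in Python. This is not the AF API. The `Asset` and `AssetModel` classes, the `B1.TT101.PV` tag, and the limit values are hypothetical, chosen only to show how mapping a tag to an asset attribute turns a bare number into an event someone can act on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Asset:
    """A piece of equipment with per-attribute high limits (hypothetical)."""
    name: str
    site: str
    high_limits: Dict[str, float] = field(default_factory=dict)


class AssetModel:
    """Maps raw stream tags to asset attributes and notifies on limit breaches."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}
        self._streams: Dict[str, Tuple[str, str]] = {}  # tag -> (asset, attribute)
        self._subscribers: List[Callable[[str], None]] = []

    def register(self, asset: Asset, streams: Dict[str, str]) -> None:
        """Add an asset and map each raw tag to one of its attributes."""
        self._assets[asset.name] = asset
        for tag, attribute in streams.items():
            self._streams[tag] = (asset.name, attribute)

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Register a function to call with each notification message."""
        self._subscribers.append(callback)

    def on_value(self, tag: str, value: float) -> Optional[str]:
        """Attach context to an incoming value; notify if it exceeds its limit."""
        asset_name, attribute = self._streams[tag]
        asset = self._assets[asset_name]
        limit = asset.high_limits.get(attribute)
        if limit is None or value <= limit:
            return None
        message = f"{asset.site}/{asset.name}: {attribute} {value} exceeds limit {limit}"
        for notify in self._subscribers:
            notify(message)
        return message


model = AssetModel()
model.register(
    Asset("Boiler-1", "Plant A", {"temperature": 190.0}),
    {"B1.TT101.PV": "temperature"},
)
alerts: List[str] = []
model.subscribe(alerts.append)

model.on_value("B1.TT101.PV", 181.5)  # within limit, no notification
model.on_value("B1.TT101.PV", 195.0)  # above limit, one notification
```

Because the tag is resolved to a site, an asset, and an attribute when the value arrives, the notification already says which boiler at which plant is in trouble. No downstream preparation step is needed to work that out.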

So, what is the cost, in terms of cleaning, structuring, and preparing data for analytics, of taking IIoT data directly to a data lake? Can a data lake supply data at the right time and with the right context to effectively answer the most mission-critical questions? Chances are the PI System, not a data lake, is still the optimal solution for critical operations.