Distributed Systems Covering Edge-to-Cloud (Part 2)

Note: This post is part 2 of a two-part series on edge-to-cloud computing. You can read part 1 here.

Inside the Gateway: Edge-to-Cloud Stage 1

The edge-to-cloud pipeline begins inside the gateway, where two key tasks have to be performed:

  • Data from sensors and devices must be captured in its raw form
  • Sensor and device data must be aggregated and/or canonicalized to reduce its payload size and to make it conform to standardized, semantically meaningful formats

Inside the Gateway: Data Capture

During raw data capture, the first step in the first phase of an edge-to-cloud architecture, the Hazelcast In-Memory Computing Platform already proves its usefulness. First, Hazelcast is small: the entire platform is contained in a single JAR file of about 15MB, which makes it well suited to deployment on edge gateway devices.

Second, Hazelcast IMDG, the in-memory data grid component of the Hazelcast platform, is well suited to capturing the heterogeneous data generated by edge computing devices. Unlike a traditional database, Hazelcast IMDG is schema-free: it operates as a key-value store with advanced indexing and querying capabilities. Because Hazelcast imposes no schema on its data, data captured from devices made by different manufacturers over several decades can be mixed in a single unified store. Data can be keyed on any unique value. Keys can be intrinsic to the data, such as a sample UUID extracted from the payload through text processing, or extrinsic to it, such as a timestamp recording when the data was captured.
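To make the intrinsic-versus-extrinsic keying idea concrete, here is a minimal sketch in plain Java. It assumes a `ConcurrentHashMap` as a local stand-in for the data grid; in Hazelcast itself you would obtain a distributed `IMap` from `HazelcastInstance.getMap(...)`. The device IDs, payload fields, and the `deviceId@epochMillis` key format are hypothetical.

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in for a distributed key-value map (assumption: in Hazelcast this
// would be an IMap). Values are untyped maps, so payloads from different device
// models can share one store without a common schema.
final class RawCapture {
    private final Map<String, Map<String, Object>> raw = new ConcurrentHashMap<>();

    // Intrinsic key: an identifier already present in (or extracted from) the payload.
    String captureWithIntrinsicKey(String sampleUuid, Map<String, Object> payload) {
        raw.put(sampleUuid, payload);
        return sampleUuid;
    }

    // Extrinsic key: assigned by the gateway from device ID and capture time
    // (hypothetical "deviceId@epochMillis" format).
    String captureWithExtrinsicKey(String deviceId, Instant capturedAt, Map<String, Object> payload) {
        String key = deviceId + "@" + capturedAt.toEpochMilli();
        raw.put(key, payload);
        return key;
    }

    Map<String, Object> get(String key) {
        return raw.get(key);
    }

    int size() {
        return raw.size();
    }

    public static void main(String[] args) {
        RawCapture store = new RawCapture();

        // Two hypothetical devices with unrelated payload shapes.
        String legacyKey = store.captureWithIntrinsicKey(
                UUID.randomUUID().toString(),
                Map.of("tempF", 71.6, "model", "LegacyThermo-1998"));
        String modernKey = store.captureWithExtrinsicKey(
                "vib-07",
                Instant.parse("2021-03-01T12:00:00Z"),
                Map.of("accel_mg", 12, "fw", "4.2.1"));

        System.out.println(store.size());                      // 2
        System.out.println(modernKey);                         // vib-07@1614600000000
        System.out.println(store.get(legacyKey).get("model")); // LegacyThermo-1998
    }
}
```

Neither record had to be reshaped to fit the other before it was stored; reconciling their formats is deferred to the canonicalization step.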

What’s more, IMDGs, being pure in-memory solutions, don’t require flash or SSD access as part of their data write operations. This means that even relatively constrained devices like IoT gateways can cope with high data ingest rates with very low latencies. For initial data capture from edge devices, it’s hard to beat an IMDG.

Inside the Gateway: Aggregation and Canonicalization

But data captured in its raw form, as output from sensors and devices, isn’t terribly useful. First, the data is in whatever format the device produced, which is probably not semantically meaningful to you from a business perspective. Second, there tends to be a lot of it: a sensor that samples at 30Hz produces 30 data points every second (108,000 every hour), even if almost all of those samples show very little change over small time windows.

What’s needed are two further processes. First, we need to aggregate the data, such as by transforming a series of fine-grained samples into coarser-grained (and more manageable) averages. Second, we need to convert the data from the format of the raw sensor or device into a format that is meaningful and useful. This second step goes by various names, such as standardization, normalization, or canonicalization, but it’s essentially an ETL process on a continuous stream of data.
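The aggregation step can be illustrated with a tumbling-window average, in which every sample falls into exactly one fixed, non-overlapping time window and each window is reduced to its mean. The sketch below is plain Java rather than Hazelcast Jet code; in a Jet pipeline the same reduction would be expressed as a windowed aggregation stage. The one-second window and the sample values are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Tumbling-window averaging: each sample belongs to exactly one fixed,
// non-overlapping window, and each window is reduced to its mean value.
final class TumblingAverage {
    static final class Sample {
        final long timestampMs;
        final double value;

        Sample(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Returns window start (ms) -> mean value, ordered by window start.
    static Map<Long, Double> average(List<Sample> samples, long windowMs) {
        Map<Long, double[]> sumAndCount = new TreeMap<>();
        for (Sample s : samples) {
            long windowStart = s.timestampMs - Math.floorMod(s.timestampMs, windowMs);
            double[] acc = sumAndCount.computeIfAbsent(windowStart, k -> new double[2]);
            acc[0] += s.value; // running sum
            acc[1] += 1;       // sample count
        }
        Map<Long, Double> means = new TreeMap<>();
        sumAndCount.forEach((start, acc) -> means.put(start, acc[0] / acc[1]));
        return means;
    }

    public static void main(String[] args) {
        // Two seconds of a hypothetical 30Hz sensor: 60 samples in, 2 averages out.
        List<Sample> samples = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            long ts = i * 1000L / 30; // roughly 33 ms apart
            samples.add(new Sample(ts, i < 30 ? 20.0 : 21.0));
        }
        System.out.println(average(samples, 1000)); // {0=20.0, 1000=21.0}
    }
}
```

With one-second windows, a 30Hz sensor's 30 samples per second collapse into a single average, a 30-fold reduction in payload before anything leaves the gateway.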

When we think of ETL tools, we tend to think of big, sophisticated, expensive systems like Informatica PowerCenter that are designed to run in corporate data centers. At the edge, however, you need something API-driven and code-oriented rather than the GUI-focused ETL tools you may be accustomed to. The Hazelcast Platform offers a lightweight alternative: Hazelcast Jet, a third-generation stream processing engine packaged in the same 15MB platform JAR file as Hazelcast IMDG, which is built to run fast even in a resource-constrained environment like an edge gateway.

Hazelcast Jet can continuously transform data stored in the underlying Hazelcast IMDG storage layer from key-value maps of raw device data into key-value maps of meaningful, canonicalized data.
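A canonicalization pass might look like the following sketch. It assumes two hypothetical vendor payload formats, one reporting Fahrenheit under `tempF` and one reporting Celsius under `t_c`, and maps both onto a single hypothetical `Reading` type in Celsius. It processes a snapshot of the raw map in one pass; a Jet job would instead apply the same per-record function continuously as map updates stream in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Canonicalization sketch: vendor-specific raw payloads -> one canonical shape.
// The field names "tempF" and "t_c" and the Reading type are hypothetical.
final class Canonicalizer {
    static final class Reading {
        final String deviceId;
        final double temperatureC;

        Reading(String deviceId, double temperatureC) {
            this.deviceId = deviceId;
            this.temperatureC = temperatureC;
        }
    }

    // Convert one raw payload, whatever its vendor format, to a canonical Reading.
    static Reading canonicalize(String deviceId, Map<String, Object> raw) {
        if (raw.containsKey("tempF")) {
            double f = ((Number) raw.get("tempF")).doubleValue();
            return new Reading(deviceId, (f - 32.0) * 5.0 / 9.0);
        }
        if (raw.containsKey("t_c")) {
            return new Reading(deviceId, ((Number) raw.get("t_c")).doubleValue());
        }
        throw new IllegalArgumentException("Unrecognized payload format from " + deviceId);
    }

    // Transform a map of raw payloads (keyed by device ID) into a canonical map
    // with the same keys, preserving the input's iteration order.
    static Map<String, Reading> transform(Map<String, Map<String, Object>> rawMap) {
        Map<String, Reading> canonical = new LinkedHashMap<>();
        rawMap.forEach((deviceId, payload) -> canonical.put(deviceId, canonicalize(deviceId, payload)));
        return canonical;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> rawMap = new LinkedHashMap<>();
        rawMap.put("thermo-a", Map.of("tempF", 212.0)); // vendor A reports Fahrenheit
        rawMap.put("probe-b", Map.of("t_c", 37.5));     // vendor B reports Celsius

        transform(rawMap).forEach((id, r) ->
                System.out.println(id + " -> " + r.temperatureC + " C"));
        // thermo-a -> 100.0 C
        // probe-b -> 37.5 C
    }
}
```

Failing loudly on an unrecognized format is a deliberate choice in this sketch; a production job might instead route such records to a separate map for inspection.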

From Edge-to-Cloud: Data Transport

Once device data has been canonicalized into a format that will make sense to business decision makers and data scientists, it’s time to move it from the edge into the cloud or data center. Thankfully, with the Hazelcast Platform, this usually means a simple flick of a switch. Every Hazelcast Platform instance comes with the ability to replicate data in an eventually consistent, asynchronous manner to other, geographically distinct Hazelcast Platform instances, even over slow or unreliable WAN links. This feature, called Hazelcast WAN Replication, comes with myriad configuration options, but in many cases, it’s sufficient simply to turn it on and let the default settings do the rest.
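WAN Replication is enabled through Hazelcast configuration rather than hand-written code, but the semantics described above (asynchronous, eventually consistent, tolerant of unreliable links) can be modeled in a few lines. The following toy model is an illustration only and is not how Hazelcast implements the feature: local writes complete immediately and are queued, and a publisher later ships queued updates in batches. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of asynchronous, batched, eventually consistent replication.
// Illustration only; this is not how Hazelcast WAN Replication is implemented.
final class WanReplicationModel {
    final Map<String, String> edge = new HashMap<>();   // local (edge) copy
    final Map<String, String> cloud = new HashMap<>();  // remote (cloud) copy
    private final Deque<Map.Entry<String, String>> outbound = new ArrayDeque<>();
    boolean linkUp = true;                              // simulated WAN link state

    // Local write: completes immediately; the update is only queued for the WAN.
    void put(String key, String value) {
        edge.put(key, value);
        outbound.addLast(Map.entry(key, value));
    }

    // One publisher cycle: ship up to maxBatch queued updates, in order, if the
    // link is up. During an outage nothing is dropped; updates wait for later cycles.
    int publishBatch(int maxBatch) {
        if (!linkUp) {
            return 0;
        }
        int sent = 0;
        while (sent < maxBatch && !outbound.isEmpty()) {
            Map.Entry<String, String> update = outbound.pollFirst();
            cloud.put(update.getKey(), update.getValue());
            sent++;
        }
        return sent;
    }

    int pending() {
        return outbound.size();
    }

    public static void main(String[] args) {
        WanReplicationModel wan = new WanReplicationModel();
        wan.put("vehicle-1", "speed=42");
        wan.put("vehicle-2", "speed=17");
        System.out.println(wan.cloud.containsKey("vehicle-1")); // false: not replicated yet

        wan.linkUp = false;
        System.out.println(wan.publishBatch(10));               // 0: link down, updates stay queued

        wan.linkUp = true;
        System.out.println(wan.publishBatch(10));               // 2
        System.out.println(wan.cloud.equals(wan.edge));         // true: copies have converged
    }
}
```

The key design point is that the edge writer never waits on the WAN link; the cost is that the cloud copy may lag behind the edge until the queue drains.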

Hazelcast WAN Replication seamlessly moves data updates from the edge back to the cloud. But you might be asking yourself: can I really use the same data storage technology on beefy servers in the cloud that I use out on resource-constrained edge devices? The answer is an emphatic yes.

Just ask large Hazelcast customers like JPMorgan Chase and UBS. These institutions run Hazelcast clusters that house terabytes of data. What’s more, the Hazelcast Platform is built for easy deployment on cloud-native platforms like OpenShift and Kubernetes.

In the Cloud (or Data Center): Making Sense of Information

Once aggregated and canonicalized device data has made its way from the edge back to the cloud, the Hazelcast Platform becomes a powerful tool for unlocking the value of that data. Because the edge data arrives in a Hazelcast IMDG instance, you can immediately use the Hazelcast IMDG Distributed Query API to make it available to other enterprise systems in paged result sets. It’s also easy to build graphically rich dashboards that visualize edge data, as seen in the Hazelcast Edge-to-Cloud Connected Vehicles demo.
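Paged results let a consumer pull a large filtered result set one page at a time. Hazelcast provides its own `PagingPredicate` for this; the sketch below only models the idea with JDK streams (filter, sort into a stable order, skip to the requested page, limit to the page size). The vehicle IDs and speeds are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Paged query over a key-value map: filter values, sort by key so that page
// boundaries are stable, then return page `pageIndex` (0-based) of `pageSize`.
final class PagedQuery {
    static <V> List<Map.Entry<String, V>> fetchPage(
            Map<String, V> data, Predicate<V> filter, int pageSize, int pageIndex) {
        return data.entrySet().stream()
                .filter(e -> filter.test(e.getValue()))
                .sorted(Map.Entry.comparingByKey())
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical speeds: vehicle-1 = 15, vehicle-2 = 30, ..., vehicle-7 = 105.
        Map<String, Integer> speeds = new HashMap<>();
        for (int i = 1; i <= 7; i++) {
            speeds.put("vehicle-" + i, i * 15);
        }
        // Four vehicles exceed 50, so a page size of 3 yields two pages.
        System.out.println(fetchPage(speeds, s -> s > 50, 3, 0)); // [vehicle-4=60, vehicle-5=75, vehicle-6=90]
        System.out.println(fetchPage(speeds, s -> s > 50, 3, 1)); // [vehicle-7=105]
    }
}
```

Sorting before paging matters: without a stable order, consecutive pages could overlap or skip entries.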

If you’ve developed ML models to analyze device performance or to predict device failures, manufacturing yield percentages, etc., from your edge data, you can use the Hazelcast Jet Inference Runner to execute those models as part of a Hazelcast Jet pipeline—all within the Hazelcast Platform.
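Running a model inside a pipeline amounts to adding a stage that scores each record as it flows past. The sketch below uses a JDK stream as a stand-in for a Jet pipeline and a hypothetical logistic-regression `score` function with made-up coefficients in place of a trained model; it does not use the Inference Runner API. Readings whose predicted failure probability exceeds a threshold are emitted as alerts.

```java
import java.util.List;
import java.util.stream.Collectors;

// Model-inference stage sketch. A JDK stream stands in for a Jet pipeline, and
// score() is a hypothetical logistic-regression model with made-up coefficients,
// not a trained model and not the Inference Runner API.
final class InferenceStage {
    static final class Reading {
        final String deviceId;
        final double temperatureC;
        final double vibrationMg;

        Reading(String deviceId, double temperatureC, double vibrationMg) {
            this.deviceId = deviceId;
            this.temperatureC = temperatureC;
            this.vibrationMg = vibrationMg;
        }
    }

    // Hypothetical failure model: sigmoid of a weighted sum of features.
    static double score(Reading r) {
        double z = -12.0 + 0.1 * r.temperatureC + 0.05 * r.vibrationMg;
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Pipeline: score each canonical reading, keep those above the threshold,
    // and emit the device IDs as alerts.
    static List<String> alerts(List<Reading> readings, double threshold) {
        return readings.stream()
                .filter(r -> score(r) > threshold)
                .map(r -> r.deviceId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(
                new Reading("pump-a", 40.0, 20.0),   // z = -7, probability near 0.001
                new Reading("pump-b", 90.0, 100.0)); // z = 2, probability near 0.88
        System.out.println(alerts(readings, 0.5));  // [pump-b]
    }
}
```

Because scoring is just another stage, swapping in a real model changes only `score`, not the surrounding pipeline.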

And you needn’t worry about doing something poorly understood that no one else has done before. Using the Hazelcast Platform to unlock device and sensor data that has been transmitted back to the cloud or data center is a well-trodden path. Airbus, for example, uses Hazelcast to enable access to space science research data collected from sensors on sounding rockets.

Better Together: Hazelcast Industry Partnerships

Hazelcast’s evolving partnerships with industry leaders like Intel and IBM provide further peace of mind that edge-to-cloud transformation projects built atop the Hazelcast Platform will succeed. Hazelcast was a featured partner at the launch of IBM Edge Application Manager and forms a key part of the IBM Edge Partner Ecosystem. For the cloud and data center components of edge-to-cloud solutions, Hazelcast’s Project Veyron partnership with Intel aims to enable higher in-memory storage densities and greater compute performance, including for AI/ML workloads, on the latest generation of Intel hardware.