Distributed Systems Covering Edge-to-Cloud (Part 1)

Introduction: A Personal Anecdote

When I was in college, I took an upper-level course in analytical chemistry. Several of the lab modules required my lab partner and me to make a pilgrimage deep into the bowels of the building to use the department’s gas chromatograph. Now, gas chromatography (GC) machines are complex and expensive pieces of equipment that university departments, always short on funds, are loath to replace. This particular example had been acquired in the early 1990s and extensively upgraded about a decade later. Like most sophisticated scientific and industrial equipment, our GC machine was attached to a computer for process control and the display of analyses. What shocked my lab partner and me—we were both computer science majors who assumed that everyone always wanted the latest-and-greatest of everything—was the kind of computer to which the GC machine was connected: an Apple IIe. This was 2006. The last time I’d seen an Apple II in the wild was fifteen years before when I was in primary school.

As strange as the Apple II might have seemed to my lab partner and me, it worked. And that’s the thing about the scientific and industrial equipment in the Internet of Things (IoT): you can’t always count on the latest-and-greatest. Gas chromatographs, aircraft engines, fleet vehicles, and assembly-line robots are purchased when plants are built and budgets are available. They’re used until there is some compelling reason (and an adequate budget) to replace them. They are acquired at different times and made by different manufacturers.

In the IT world, servers in data centers are refreshed every three years, so you can be pretty sure that whatever you’re building will run on a recent version of Linux installed on an x86 server. There is no such comfort in the world of operational technology (OT), the computing endeavor concerned with monitoring and controlling physical equipment and processes: cars, robots, gas chromatographs. In the OT world, you can’t count on standard processors or operating systems or instruction sets or data formats.

The reality is, when you begin to implement edge-to-cloud use cases, you will run into legacy technologies from decades past. As a result, data canonicalization becomes a challenge as you try to translate heterogeneous data into useful information. This is where in-memory streaming solutions such as Hazelcast Jet can dramatically simplify the data transformation process.
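
To make "canonicalization" concrete, here is a minimal sketch in plain Python. It is not Hazelcast Jet code, and everything in it is an assumption made for illustration: the Reading record, the two parser functions, the comma-separated legacy line format, and the JSON field names are all hypothetical. The point is only that two devices reporting the same kind of measurement in different formats, units, and timestamp resolutions end up in one common shape.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical canonical form: Celsius values, timestamps in epoch seconds."""
    device_id: str
    metric: str
    value: float
    unit: str
    timestamp: float


def from_legacy_serial(line: str) -> Reading:
    """Parse an assumed legacy line such as 'GC01,1136073600,TEMP,212.0,F'."""
    device_id, ts, metric, value, unit = line.strip().split(",")
    value = float(value)
    if unit == "F":  # normalize Fahrenheit to Celsius
        value, unit = (value - 32.0) * 5.0 / 9.0, "C"
    return Reading(device_id, metric.lower(), value, unit, float(ts))


def from_json_payload(payload: str) -> Reading:
    """Parse an assumed JSON payload that carries millisecond timestamps."""
    doc = json.loads(payload)
    return Reading(
        device_id=doc["id"],
        metric=doc["metric"].lower(),
        value=float(doc["value"]),
        unit=doc["unit"],
        timestamp=doc["ts_ms"] / 1000.0,  # milliseconds to seconds
    )


legacy = from_legacy_serial("GC01,1136073600,TEMP,212.0,F")
modern = from_json_payload(
    '{"id": "ROBOT7", "metric": "Temp", "value": 41.5, "unit": "C", "ts_ms": 1136073600500}'
)
```

After these calls, legacy and modern have the same fields, units, and time base, so anything downstream can treat them alike. A real deployment would need one such adapter per device family, which is where most of the effort tends to go.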

The Business Value of IoT Data

In spite of its heterogeneity, fine granularity, and relative lack of standard formats and schemas, the business value of data produced by physical devices and sensors is immense. Especially when coupled with the widespread application of machine learning (ML) and artificial intelligence (AI), the information produced by IoT devices can become a data feedstock for predicting failures, optimizing business processes, sensing and remediating product defects on assembly lines, and detecting violations of business rules by people or machines.

To unlock the value of data produced by IoT sensors and devices, three key objectives must be achieved:

  1. Data must be aggregated and canonicalized as soon as possible
  2. Data must be moved from the edge of the business back to a central data center, either a physical, on-premises data center or a virtual data center in a public cloud
  3. Once in the data center, it must be possible to query, enrich, visualize, transform, and otherwise make the data useful to business decision makers and data science and data engineering teams

The Edge-to-Cloud Pipeline

The three steps above correspond roughly to three stages of an architecture called the edge-to-cloud pipeline, where data is acquired from IoT sensors and devices, aggregated and transformed, then moved into the cloud (or to an on-prem data center) to unlock its business value.
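
As a rough illustration of how the three stages fit together, the sketch below chains them as plain Python functions. It assumes one reading of the sentence above (acquire, then aggregate, then move). The function names, the sample tuples, the 60-second window, and the batch size are all hypothetical, and the "upload" is only simulated by grouping records into batches.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple

# A raw sample is (device_id, epoch_seconds, value); the layout is illustrative.
Sample = Tuple[str, float, float]


def acquire(raw: Iterable[Sample]) -> Iterator[Sample]:
    """Stage 1: read samples from sensors (here, an in-memory stand-in)."""
    for sample in raw:
        yield sample


def aggregate(samples: Iterable[Sample], window_s: float = 60.0) -> List[dict]:
    """Stage 2: average each device's values per fixed-size time window."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for device_id, ts, value in samples:
        buckets[(device_id, int(ts // window_s))].append(value)
    return [
        {
            "device_id": dev,
            "window_start": win * window_s,
            "mean": mean(vals),
            "count": len(vals),
        }
        for (dev, win), vals in sorted(buckets.items())
    ]


def forward(records: List[dict], batch_size: int = 2) -> Iterator[List[dict]]:
    """Stage 3: group records into batches, standing in for the upload step."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


raw_samples = [
    ("truck-1", 0.0, 10.0),
    ("truck-1", 30.0, 20.0),
    ("truck-1", 65.0, 30.0),
    ("robot-7", 5.0, 1.0),
    ("robot-7", 10.0, 3.0),
]
summaries = aggregate(acquire(raw_samples))
batches = list(forward(summaries))
```

Five raw samples collapse into three per-window summaries, which are then sent in two batches. Aggregating before moving the data is the design choice worth noticing: it reduces what has to cross the link between the edge and the data center.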

We’ll discuss each of these three stages in turn, but first let’s talk about the where of the edge-to-cloud architecture: the physical locations and devices in which each of the stages runs. Stages 1 and 2 generally run on an IoT gateway device (or, better yet, for redundancy, a cluster of multiple IoT gateway devices). These IoT gateway devices are miniaturized industry-standard x86- or ARM-based computers mounted in a ruggedized enclosure with a wide variety of connectors. Most major enterprise hardware vendors sell such devices. Two well-known examples are the GL20 IoT Gateway from Hewlett Packard Enterprise and the Edge Gateway 3001 for IoT Applications from Dell, Inc.

True to their product names, these gateway devices are deployed at the outer “edge” of a business, far away from any central data center or cloud. They are designed to be clamped onto the rafters over factory floors or bolted onto the chassis of delivery trucks or the fuselages of aircraft. Gateway devices come with a variety of connectors (past, present, and future), even legacy RS232 serial ports that would allow them to connect to an Apple II controlling a gas chromatograph! The idea is to connect the gateways directly to the sensors and equipment they serve.

Next Week

Part 2 of this blog will be posted next week. Check back for more information on the different scenarios for edge-to-cloud.