What is a data grid?

A data grid is a set of computers that directly interact with each other to coordinate the processing of large jobs. The participating computers are typically spread across multiple geographically remote sites. Each site may host one or more of the grid's computers, and each site shares data and resources with the other sites. The main goal of a data grid is to leverage the collective power of all its computers to accomplish a given task, a practice known as grid computing. Software running on every computer in the grid coordinates tasks, and user access to data, across the grid.

How Does a Data Grid Work?

A data grid works by running specialized software on every computer that participates in the grid. The software acts as the manager of the entire system and coordinates tasks across the grid. Specifically, it assigns subtasks to each computer so that the computers can work on their respective subtasks simultaneously. When the subtasks finish, their outputs are gathered and aggregated to complete the larger task. The software also lets the computers communicate with each other over the network so they can share which subtasks each computer is running and how to consolidate and deliver the outputs.
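
The assign, run, gather, and aggregate cycle described above can be sketched in a few lines. This is a minimal single-machine illustration, not how any particular grid product is implemented: threads stand in for the grid's computers, and the node names, the `coordinate` and `run_subtask` functions, and the choice of a partial sum as the subtask are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node names. Threads stand in for the grid's computers here,
# whereas real grid software would dispatch subtasks over the network.
NODES = ["node-a", "node-b", "node-c"]


def run_subtask(node, chunk):
    """Work done by one node on its assigned subtask (here, a partial sum).

    `node` is only a label in this sketch; it records where the work would run.
    """
    return sum(chunk)


def coordinate(data, nodes=NODES, chunk_size=4):
    """Split a job into subtasks, run them in parallel, and aggregate the outputs."""
    # 1. Break the large job into subtasks (fixed-size chunks of the input).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # 2. Assign subtasks to nodes round-robin. This table is the shared record
    #    of which node is running which subtask.
    assignments = {i: nodes[i % len(nodes)] for i in range(len(chunks))}
    # 3. Run all subtasks simultaneously.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(run_subtask, assignments[i], chunk)
                   for i, chunk in enumerate(chunks)]
        partial_results = [f.result() for f in futures]
    # 4. Gather the partial outputs and aggregate them into the final result.
    return sum(partial_results), assignments
```

For example, `coordinate(list(range(1, 11)))` splits the input into three chunks, assigns them to `node-a`, `node-b`, and `node-c`, and returns a total of 55 together with the assignment table `{0: 'node-a', 1: 'node-b', 2: 'node-c'}`.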

Figure: Data grid example diagram

Example Use Cases

One example use case is running a huge number-crunching task that would not run efficiently on an individual computer. Imagine running a set of interdependent calculations in a spreadsheet, but at a much larger scale. Data grids are especially useful for tasks that can be broken into smaller, independent subtasks that can then be run in parallel, enabling an efficient divide-and-conquer approach.
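
As a rough sketch of that divide-and-conquer pattern, the example below counts the primes below a limit by recursively halving the range until each piece is small, running the pieces in parallel, and summing the partial counts. The prime-counting task, the function names, and the `max_span` threshold are illustrative assumptions. Threads are used only to keep the sketch self-contained; CPython threads will not actually speed up CPU-bound code like this, so in a grid each range would instead be sent to a separate computer.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """One independent subtask: count primes in [lo, hi) by trial division."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split(lo, hi, max_span):
    """Divide [lo, hi) in half recursively until no range is wider than max_span."""
    if hi - lo <= max_span:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split(lo, mid, max_span) + split(mid, hi, max_span)


def parallel_count_primes(limit, max_span=1_000, workers=4):
    """Count primes below `limit` by running the subranges in parallel and summing."""
    ranges = split(0, limit, max_span)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The subranges do not depend on each other, so they can run concurrently;
        # the final sum is the "combine" step of divide and conquer.
        return sum(pool.map(lambda r: count_primes(*r), ranges))
```

With these definitions, `parallel_count_primes(10_000)` returns 1229, the number of primes below 10,000, regardless of how the range was split.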

Another example is using a data grid as a large data store. Each site in the grid might store data that it owns, but coordinated sharing of all data with all grid users promotes collaboration and knowledge transfer.
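
One way to picture "each site owns its data, but all users can see it" is a query that fans out to every site and merges the answers. The sketch below is a hypothetical model, not a real grid API: the `Site` and `Grid` classes, the `query_all` method, and the assumption that keys are unique across sites are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One grid site that owns and stores its own records (hypothetical model)."""
    name: str
    records: dict = field(default_factory=dict)

    def put(self, key, value):
        self.records[key] = value

    def query(self, predicate):
        # Each site evaluates the query only against the data it owns.
        return {k: v for k, v in self.records.items() if predicate(v)}


class Grid:
    """Gives every grid user a single view over the data owned by all sites."""

    def __init__(self, sites):
        self.sites = list(sites)

    def query_all(self, predicate):
        # Fan the query out to every site and merge the answers, tagging each
        # record with its owning site. Keys are assumed unique across sites.
        results = {}
        for site in self.sites:
            for key, value in site.query(predicate).items():
                results[key] = (site.name, value)
        return results
```

For example, if site `lab-a` stores `sample-1 = 42` and site `lab-b` stores `sample-2 = 17` and `sample-3 = 99`, then `Grid([lab_a, lab_b]).query_all(lambda v: v > 20)` returns `{'sample-1': ('lab-a', 42), 'sample-3': ('lab-b', 99)}`, while each record physically stays at the site that owns it.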

Data grids can also serve as the basis for private clouds, in which computers are pooled together and a subset of the pool's resources is dedicated to each user via virtual machines. Each virtual machine looks like a real computer but typically uses only a portion of a physical computer's resources. This is similar to a configuration from a public cloud vendor, except that the computers are owned by the organizations participating in the data grid. This configuration is particularly efficient when private cloud users have short-term computing needs, because resources freed at the end of one task can soon be used by another.
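
The pooling and reuse described above can be sketched as a simple allocator. This is a hypothetical model, not a real hypervisor or cloud API: each host contributes CPU cores and memory to a shared pool, a virtual machine is placed on the first host with enough free capacity (a basic first-fit strategy chosen here for simplicity), and releasing a finished VM returns its share to the pool for the next task.

```python
class ResourcePool:
    """Pooled CPU cores and memory contributed by the grid's computers (hypothetical)."""

    def __init__(self, hosts):
        # hosts: mapping of host name -> (cores, memory_gb) that host contributes.
        self.free = {host: [cores, mem] for host, (cores, mem) in hosts.items()}
        self.vms = {}  # vm_id -> (host, cores, memory_gb)

    def allocate(self, vm_id, cores, memory_gb):
        """Place a VM on the first host with enough free capacity (first fit).

        Returns the chosen host, or None if no single host can fit the VM.
        """
        for host, (free_cores, free_mem) in self.free.items():
            if free_cores >= cores and free_mem >= memory_gb:
                self.free[host][0] -= cores
                self.free[host][1] -= memory_gb
                self.vms[vm_id] = (host, cores, memory_gb)
                return host
        return None

    def release(self, vm_id):
        """Return a finished VM's share of its host to the pool for other tasks."""
        host, cores, memory_gb = self.vms.pop(vm_id)
        self.free[host][0] += cores
        self.free[host][1] += memory_gb
```

For example, with hosts `host-1` (8 cores, 32 GB) and `host-2` (4 cores, 16 GB), a 6-core VM lands on `host-1`, a 4-core VM then lands on `host-2`, and a later request that no host can fit returns `None` until a VM is released and its resources become available again.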

A specific type of data grid is an in-memory data grid (IMDG), which, as the name implies, runs processing in the computers’ main memory, i.e., random-access memory (RAM). The advantage is that the data is stored in memory across all the computers in the grid rather than on disk, so data access is extremely fast. Thus, IMDGs are especially useful when grid computing tasks require extremely high throughput and low latency. Hazelcast IMDG is an example of data grid technology that stores and processes data in memory to provide the performance that many business-critical environments require.
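
To make "data stored in memory across all the computers" concrete, the sketch below spreads a key-value map over several nodes: each key is hashed to a partition, each partition belongs to one node, and that node keeps its entries in an ordinary in-memory dictionary. This is an illustrative model only and is not Hazelcast's API; the `PartitionedMap` class, the 16-partition default, the CRC-32 hash, and round-robin partition ownership are all assumptions, and real IMDGs add features omitted here such as backup copies and rebalancing when nodes join or leave.

```python
import zlib


class PartitionedMap:
    """A key-value map whose entries are spread across nodes' memory (hypothetical model)."""

    def __init__(self, nodes, partition_count=16):
        self.partition_count = partition_count
        self.nodes = list(nodes)
        # Each node keeps its share of the data in an ordinary in-memory dict.
        self.storage = {node: {} for node in self.nodes}

    def partition_of(self, key):
        # A stable hash (CRC-32) so every node computes the same partition for a key.
        return zlib.crc32(key.encode("utf-8")) % self.partition_count

    def owner_of(self, key):
        # Partitions are assigned to nodes round-robin.
        return self.nodes[self.partition_of(key) % len(self.nodes)]

    def put(self, key, value):
        self.storage[self.owner_of(key)][key] = value

    def get(self, key):
        # Only the owning node is consulted, and the lookup is a dict read in RAM.
        return self.storage[self.owner_of(key)].get(key)
```

Because every node can compute a key's owner locally from the hash, a lookup goes straight to the one node holding the entry instead of searching the whole grid.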

Related Topics

Grid Computing

Key-Value Store

In-Memory Database

In-Memory Computation

In-Memory Processing

Further Reading

In-Memory Data Grids Popularity Trend Continues Upward

Transformational Drivers and In-Memory Data Grids

Relevant Resources

Introduction to In-Memory Data Grids (Video, 60 minutes)

The speed at which data is growing today (90% of all data was collected in the last 2 years) creates complex business problems that prevailing technology platforms cannot address, preventing enterprises from quickly extracting business value from this data. The challenge grows because the value of data, and of the insights we can get from it, decreases if it takes too long to act. In this talk, we will learn how Hazelcast® addresses these problems and helps enterprises overcome the challenges of extracting business value from massive-scale data. You will be introduced to distributed systems and in-memory computing with Hazelcast. This talk will cover some familiar distributed data structures such as Maps, Lists, and Queues, along with running complex business algorithms in parallel over a Hazelcast cluster by using Distributed Executor Service, EntryProcessors, and In-Memory MapReduce.

In-Memory Database vs In-Memory Data Grid (White Paper)

In-Memory Databases (IMDB) and In-Memory Data Grids (IMDG) are two technologies that address real-time computing and big data needs without having to start over with an entirely new set of IT systems. While they may sound similar, the differences are significant, and understanding your options will help you make better decisions. Your business operates in an environment with constantly evolving technology. For business and IT leaders, it can be overwhelming to keep up with new developments and determine what is critical to implement immediately and what technologies can or should wait. In-memory processing technologies are becoming pervasive and have both direct and indirect effects on how a business operates. Hazelcast has created this white paper to discuss these technologies and their respective implications.

Time to Make the Move to In-Memory Data Grids (Video, 60 minutes)

DRAM is dirt cheap. That’s why in-memory databases, analytics, and data grids are surging in popularity among firms that have an insatiable need for performance and scalability. But databases, analytics platforms, and data grids target very different use cases. In-memory data grids, in particular, are often misunderstood because they support an extensive set of use cases that often overlap with other technologies. Join guest speaker Mike Gualtieri, Principal Analyst at Forrester Research, Greg Luck, CEO of Hazelcast®, and Ken Kolda, Software Architect at Ellie Mae, on this radio-show-style webinar to boost your in-memory IQ.

Easy Scaling with Hazelcast In-Memory Data Grid

In-memory data grids have historically been the exclusive domain of large investment banks and proprietary solutions such as Oracle Coherence, Pivotal GemFire, and Software AG Terracotta. Hazelcast provides an open source solution that is easy to develop with, elastic in scaling, and fault tolerant. The first part of the presentation covers a simple use case, a fictional stock brokerage system, that shows basic distributed structures and their behavior. The second part shows some advanced features of Hazelcast, such as event listeners and data affinity. The presentation ends with a comparison between Hazelcast on one side and Redis and Memcached on the other.