What Is an In-Memory Data Grid?

An in-memory data grid (IMDG) is a set of networked, clustered computers that pool their random access memory (RAM) so that applications running in the cluster can share data with one another. Though IMDGs are sometimes described generically as distributed in-memory data stores, they offer more than storage: they are built for processing data at very high speeds. They are designed for building and running large-scale applications that need more RAM than is typically available in a single server. Their performance comes from combining RAM, which is much faster to access than disk, with the processing power of multiple computers that run tasks in parallel. IMDGs are especially valuable for applications that do extensive parallel processing on large data sets.
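The parallel-processing idea can be illustrated with a minimal sketch. It is not taken from any IMDG product: threads stand in for cluster nodes, and the names `partition`, `local_sum`, and `grid_sum` are hypothetical. Each simulated node computes over the slice of data it holds, and only the small partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, node_count):
    """Split records round-robin into one list per simulated node."""
    return [records[i::node_count] for i in range(node_count)]


def local_sum(partition_records):
    """Work a node would run against the data it holds in its own RAM."""
    return sum(partition_records)


def grid_sum(records, node_count=4):
    """Run local_sum on every partition in parallel, then combine the partials."""
    partitions = partition(records, node_count)
    # Assumption: a thread pool stands in for separate machines in a cluster.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, partitions))
    return sum(partials)
```

In a real grid the partitions would live on different machines, so the benefit comes from moving the computation to the data rather than moving the data to one application.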

 

[Figure: In-Memory Data Grid diagram]

 

An IMDG is an in-memory version of a data grid, with the difference that all nodes of the cluster typically run in the same data center. Keeping the nodes local preserves the high performance expected of in-memory technologies, because coordinating data structures across geographically remote computers can become a bottleneck.

How Does an In-Memory Data Grid Work?

An IMDG works by running specialized software on every computer in a cluster to coordinate access to data for applications. Each computer in the cluster has its own view of the data and data structures in memory, and that view is shared with all the other computers. The software keeps track of the data on each individual node, so that the data can be shared with any other node or any application. This orchestration hides the complexity of retrieving and updating data across the network, which simplifies application development.
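One way such coordination could work is sketched here, under stated assumptions. The text does not say how data is assigned to nodes; this sketch assumes hash-based ownership, and the `Node` and `ToyGrid` classes are hypothetical, with plain dictionaries standing in for each node's memory.

```python
import zlib


class Node:
    """A simulated cluster member holding part of the data in memory."""

    def __init__(self, name):
        self.name = name
        self.local_data = {}


class ToyGrid:
    """Tracks which node owns each key and routes reads and writes to it."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def owner_of(self, key):
        # Assumption: ownership is decided by a stable hash of the key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self.owner_of(key).local_data[key] = value

    def get(self, key):
        # Returns None when no node holds the key.
        return self.owner_of(key).local_data.get(key)
```

The caller only ever sees `put` and `get`; which node actually stores a value stays an internal detail of the grid.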

Data in an IMDG is often stored in the form of objects, such as maps, lists, and queues. Basic data types (“primitives”) like integers and floating-point numbers are included as well. Each of these objects and data types is represented as a variable in an application, and the application logic references these variables as if they resided on the same computer that is running the application. This makes the programming paradigm much simpler than with other in-memory technologies, as the developer does not need to write code to physically retrieve data.
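To make the "variables as if they were local" idea concrete, here is a small sketch of a map that behaves like an ordinary Python dictionary while its entries are spread across several shards. The `GridMap` class is hypothetical and the shards are plain dictionaries in one process; a real grid would place them on different machines.

```python
import zlib
from collections.abc import MutableMapping


class GridMap(MutableMapping):
    """Dict-like facade; entries live in per-node shards (plain dicts here)."""

    def __init__(self, shard_count=3):
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Assumption: a stable hash of the key picks the shard.
        digest = zlib.crc32(repr(key).encode("utf-8"))
        return self._shards[digest % len(self._shards)]

    def __getitem__(self, key):
        return self._shard_for(key)[key]

    def __setitem__(self, key, value):
        self._shard_for(key)[key] = value

    def __delitem__(self, key):
        del self._shard_for(key)[key]

    def __iter__(self):
        for shard in self._shards:
            yield from shard

    def __len__(self):
        return sum(len(shard) for shard in self._shards)


# Application code reads like ordinary local-variable access.
prices = GridMap()
prices["ACME"] = 101.5
prices["ACME"] += 0.5
```

The application never mentions shards or network calls, which is the simplification the paragraph above describes.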

Alternatives to In-Memory Data Grids

Similar technologies like in-memory databases (IMDBs) also enable high-speed applications. The key difference is that IMDGs allow you to build applications that process large amounts of data all at once. IMDGs are ideal when data is continually accessed and modified as part of complex computations (like financial market simulations or risk analysis). IMDBs, on the other hand, are designed for applications that are more storage-focused, where only small subsets of data are retrieved at any given time, or where simple calculations are run across a large subset of data. With an IMDB, data points are generally retrieved when needed, processed, and then written back to the IMDB.

IMDBs have two general use cases, and distinct offerings in the market are designed to address each one. The use cases overlap somewhat, but they differ enough to warrant separate technologies.

The first use case entails SQL-based analytics, where data is stored in the table/row/column format of the relational model. Some IMDBs in the market are optimized for this use case. One can think of these technologies as in-memory versions of relational databases that run faster because the data is held in memory. These IMDBs typically emphasize analytics, but some can handle operational workloads as well.

The other use case is simple data lookup. IMDBs designed for this use case typically do not have SQL as an interface, as the data is typically stored in a non-relational format. The most basic form of this use case is caching, in which the IMDB provides faster reads than disk-based databases. IMDGs are also good for caching, and are used in such environments when caching is only one part of the overall in-memory requirement, such as when fast processing is also a critical capability.
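The caching use case can be sketched as a read-through cache: look in memory first, and fall back to the slower store only on a miss. The `ReadThroughCache` class and its loader callback are hypothetical and stand in for whatever in-memory store and disk-based database an application actually uses.

```python
class ReadThroughCache:
    """Serves reads from memory; loads from the backing store only on a miss."""

    def __init__(self, load_from_backing_store):
        # Assumption: the loader is any callable that fetches a value by key
        # from a slower, disk-based store.
        self._load = load_from_backing_store
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value  # keep it in memory for later reads
        return value
```

This sketch leaves out eviction and invalidation, which any production cache would need to address.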

Example Use Cases

One example use case for IMDGs is payment processing. During a payment transaction, several important calculations need to be made in a minimal time window. Payment authentication and payment network selection are two of the key actions, but fraud detection is arguably the most computationally heavy. Fraud detection requires a complex comparison of the transaction in question against a long history of transactions to measure the probability of fraud.

Reducing fraud is clearly a benefit to the payment processor, but reducing the number of false positives is also an important objective. If the fraud rules are too strict, otherwise valid transactions might be flagged as fraudulent, causing consumers to use another method of payment. The payment processor then loses the transaction fee, which is why better fraud detection is a key factor in gaining a competitive advantage. By running multiple fraud algorithms simultaneously, payment processors can arrive at a more accurate scoring mechanism that reduces both fraud and false positives. The performance advantage of IMDGs lets processors run these multiple algorithms while keeping the millisecond-level responsiveness they need for a positive customer experience.
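Running several fraud algorithms at once and combining their results might look like the following sketch. The two scoring rules, the field names `amount` and `country`, and the simple averaging are all invented for illustration; real fraud models are far more involved, and threads stand in for grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def amount_score(txn, history):
    """Hypothetical rule: score amounts far above the customer's average."""
    if not history:
        return 0.5
    average = sum(h["amount"] for h in history) / len(history)
    if average <= 0:
        return 0.5
    return min(1.0, txn["amount"] / (average * 10))


def country_score(txn, history):
    """Hypothetical rule: score 1.0 for a country never seen in the history."""
    seen = {h["country"] for h in history}
    return 0.0 if txn["country"] in seen else 1.0


def fraud_score(txn, history, scorers=(amount_score, country_score)):
    """Run every scorer concurrently and average the results (0 = safe, 1 = fraud)."""
    with ThreadPoolExecutor(max_workers=len(scorers)) as pool:
        futures = [pool.submit(scorer, txn, history) for scorer in scorers]
        scores = [future.result() for future in futures]
    return sum(scores) / len(scores)
```

Because the scorers run side by side, adding another algorithm need not lengthen the response time by much, provided there is spare capacity to run it.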

Another use case for IMDGs is running large-scale simulations, such as Monte Carlo simulations, that help create a clearer picture of what might happen in the future by considering many different factors. These types of simulations are commonly run in the financial services industry to better understand the risks that firms face. For example, stress testing is about understanding the resilience of a firm and its investments if certain future events were to occur. Stress testing not only helps with planning and asset allocation, but is also required for regulatory compliance, to show auditors that banks have appropriate levels of capital and other assets. Financial services firms might also use IMDGs to calculate counterparty risk, so they can better understand what risk they are taking on as they work with another financial institution. Since simulations often entail many calculations with many variables, IMDGs help run these simulations quickly to get the full picture of potential risk.
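As a rough illustration of why such simulations parallelize well, the sketch below estimates the probability that a one-period return falls below a loss threshold. The normal-return model, the default parameter values, and the function names are assumptions made for this example only; threads stand in for grid nodes, and each worker gets its own seeded random generator so results are repeatable.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(seed, trials, mean_return, volatility, loss_threshold):
    """Count simulated returns that fall below the loss threshold."""
    rng = random.Random(seed)  # independent, reproducible stream per worker
    return sum(
        1 for _ in range(trials)
        if rng.gauss(mean_return, volatility) < loss_threshold
    )


def loss_probability(total_trials=40_000, workers=4, mean_return=0.05,
                     volatility=0.20, loss_threshold=-0.10):
    """Split the trials across workers and combine the breach counts."""
    trials_per_worker = total_trials // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(simulate_chunk, seed, trials_per_worker,
                        mean_return, volatility, loss_threshold)
            for seed in range(workers)
        ]
        breaches = sum(future.result() for future in futures)
    return breaches / (trials_per_worker * workers)
```

Each chunk of trials is independent of the others, so more nodes can take on more chunks, and only the small breach counts need to be brought together at the end.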