IMDG vs. IMDB – A Business-Level Perspective
The technologies most present in our lives are often ones we don’t realize exist, or ones that are poorly understood, even within the IT domain. In this post, I’d like to give a business-level explanation of the difference between an in-memory data grid and an in-memory database, so that a non-technical reader can see why the difference matters.
The context for this is transactional use. If you use a credit card, if you’ve ever purchased from a major e-commerce site, if you play online games, etc., you’re already a user of in-memory technology, and the choice of in-memory technology affects the quality of your experience.
One broadly based but not well-recognized technology initiative is the move to keep data entirely in memory, in order to avoid the latency of traversing a network to reach information stored in a database, especially a disk-based one. A good example is the complexity of placing an online order: the e-commerce site has to sort through configuration options, inventory availability, shipping, and taxes, update your customer profile, and validate your credit card. Multiple complex transactions execute so quickly that you don’t even notice. This is how in-memory technology affects billions of people continuously. The two popular approaches to putting information where it can be accessed almost instantly are in-memory databases and in-memory data grids. There are lots of low-level technical differences that would interest a developer, but surfacing the differences to executives is more critical: they are the ones most closely aligned with how the business makes money, and the choice of technology can have a huge impact on the bottom line.
An In-Memory Data Grid (IMDG) is a technology designed to handle data-intensive applications. Think of an IMDG as a mechanism that seamlessly gives your applications access to the random-access memory (RAM) across multiple computers. It is as if you simply added many more RAM modules to your application’s computer, while also gaining the ability to run multiple applications in parallel on the same data to handle large processing tasks. This makes application development much easier, since you do not have to constantly read and save small chunks of data as you would with a database.
An IMDG can be used in a variety of configurations. As one example, it can create a large pool of data that a complex application, like a banking stress test, can access all at once. As another example, an IMDG can sit between an existing database and the application that needs access to it, such as a customer-facing product catalog on an e-commerce site. The IMDG (like the back-end data it supports) is distributed across multiple nodes and essentially serves as a cache: a copy of data that is accessed repeatedly and therefore needs to be reached quickly. Having this type of data cached in memory makes a huge difference in access time (think microseconds vs. seconds or minutes), especially at scale. Data in the IMDG can be updated as well, and the changes can be propagated automatically to the back-end database so that updates are preserved. IMDGs can also be run in embedded mode, in which the shared memory resides on the same computer as the application; this provides a huge performance advantage, since less data needs to be transmitted over the network. An IMDG also scales well, and most IMDGs (particularly Hazelcast, if your transactions are complex) are designed to do so without compromising on speed.
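For readers who want to see the caching idea in concrete terms, here is a minimal Python sketch of a cache sitting between an application and a slower back-end database. It is an illustration only, not how any particular IMDG product is implemented: the class names `SlowDatabase` and `GridCache`, the product data, and the single-machine dictionary standing in for distributed memory are all assumptions made for this example.

```python
class SlowDatabase:
    """Hypothetical stand-in for a disk-based back-end database."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0   # how many times the database was read
        self.writes = 0  # how many times the database was written

    def load(self, key):
        self.reads += 1
        return self._rows.get(key)

    def store(self, key, value):
        self.writes += 1
        self._rows[key] = value


class GridCache:
    """Toy cache: reads fall through to the database once, writes go to both."""

    def __init__(self, database):
        self._database = database
        self._memory = {}  # assumption: one dict stands in for pooled RAM

    def get(self, key):
        # Only the first request for a key touches the database.
        if key not in self._memory:
            self._memory[key] = self._database.load(key)
        return self._memory[key]

    def put(self, key, value):
        # Update memory and propagate the change to the back end.
        self._memory[key] = value
        self._database.store(key, value)


catalog_db = SlowDatabase({"sku-1": {"name": "kettle", "price": 30}})
catalog = GridCache(catalog_db)

for _ in range(1000):          # a thousand shoppers view the same product
    catalog.get("sku-1")
catalog.put("sku-1", {"name": "kettle", "price": 25})

print(catalog_db.reads, catalog_db.writes)  # prints "1 1"
```

A thousand product views cost the database a single read, and the price change still reaches the database. Real data grids add distribution across nodes, expiry, and failure handling, none of which this sketch attempts.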
An In-Memory Database (IMDB), by comparison, is a system that lets you store and read data entirely in memory. Compared to traditional disk-based databases, IMDBs have the obvious advantage of reading and writing data much faster, since both activities are done in RAM. Compared to IMDGs, IMDB applications generally process smaller blocks of data at a time, since the applications have to read data from the IMDB and then write it back once the processing is done. IMDBs are also typically run separately from the applications, so network communication between the application and the IMDB is usually required. If you choose to implement an IMDB, you are often looking to replace your existing databases, and since they are normally a core part of a legacy system, replacing them is non-trivial. On the upside, only limited changes to the application layer will be required. IMDBs can also be used as caches, like IMDGs, but with the disadvantage of having to transfer data over the network to the application.
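The read, process, write-back pattern described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `RemoteStore` is a hypothetical stand-in for a database running on a separate server, each method call is counted as one network round trip, and the pricing logic is assumed to be too complex to hand to the database as a single query, so the application has to fetch each record itself. The second half shows the same work done on data held in the application’s own memory, as in an embedded grid.

```python
class RemoteStore:
    """Hypothetical in-memory database on another machine; counts round trips."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.round_trips = 0

    def keys(self):
        self.round_trips += 1
        return list(self._rows)

    def read(self, key):
        self.round_trips += 1
        return self._rows[key]

    def write(self, key, value):
        self.round_trips += 1
        self._rows[key] = value


prices = {f"sku-{i}": 100 for i in range(1, 6)}  # five made-up products

# Separate-server style: fetch each price, apply a 10% discount, send it back.
store = RemoteStore(prices)
for key in store.keys():
    store.write(key, store.read(key) * 90 // 100)

# Embedded style: the data already lives in the application's memory.
local = dict(prices)
for key in local:
    local[key] = local[key] * 90 // 100

print(store.round_trips)  # prints 11: one key listing, five reads, five writes
print(local["sku-1"])     # prints 90, with no network traffic at all
```

Both versions end with the same prices. The difference is the eleven trips across the network for just five records, which is the cost that grows as data volumes and transaction complexity increase.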
As mentioned earlier, there are lots of technical differences between the two options. Bottom line? If your applications only need to access small amounts of data at a time, then the programming model of IMDBs might make sense. If you’re driven by complex transactions that require speed and immediate access to large blocks of data, if you have to scale to internet levels, and if you can’t or don’t want to swap out your database (keeping in mind that all legacy systems include a database), then an IMDG is probably the best choice. If you’d like to find out more about IMDGs, start here, or feel free to contact us here.