Apache Cassandra Enhancement
Introduction
Hazelcast makes Cassandra much faster
Apache Cassandra is one of the most popular NoSQL databases today, known especially for high-speed reads and writes. Often used to store huge amounts of tabular, non-relational data, Cassandra is deployed at many organizations for their data storage needs. It features continuous availability, linear scalability, operational simplicity, and easy data distribution across multiple data centers and cloud availability zones.
Hazelcast Platform includes the fastest in-memory data grid, providing fast reads and writes, parallel processing, and high-performance data recovery after single-node or cluster failures. Commonly used for large-volume, distributed, read-heavy in-memory caches that require low, consistent latency, Hazelcast Platform processes and queries data at in-memory speeds. Its built-in high availability, disaster recovery, and security capabilities add reliability and data safety to that performance advantage, which lets it power systems in the most demanding environments at leading companies around the world.
Despite its reputation for speed, Cassandra still faces bottlenecks at high loads due to its disk-based architecture. Read performance is capped by the throughput and latency of disk I/O, which in turn limits application performance. By integrating Hazelcast Platform with Cassandra, you can boost application performance by reducing disk accesses and retrieving data directly from the RAM managed by Hazelcast. This dramatically reduces the latency of read operations and allows greater throughput to accommodate your most intensive workloads. And unlike traditional caching technologies, Hazelcast is architected to run as a 24/7 production system that tolerates node failures, so losing a node does not cause a significant performance hit.
Where does Hazelcast fit in?
Without Hazelcast Platform, Cassandra users who need higher throughput and lower latency have a few suboptimal options:
- Significantly boost your hardware investment. You can upgrade your servers to get faster CPUs, move to faster disk drives or even to SSDs or NVMe drives, add more servers so that each server in the cluster has less work to do, or combine these measures. This ultimately means you will invest a lot more money in hardware than you originally expected, while still not getting optimal performance.
- Redesign your data access patterns. You can spend time analyzing your end-user behavior to see if there are ways to reduce the amount of data stored in Cassandra, or to limit the ways users can query the data. This approach can easily demand more effort than the resulting gain warrants.
- Reset SLAs. You can tell your stakeholders that higher performance simply is not practical, but this ultimately impacts your ability to gain competitive advantage, so your business partners will often not see this as an acceptable option.
With Hazelcast Platform, you set up an intelligent and scalable caching system that speeds up your Cassandra reads and writes. You can achieve new levels of performance without taking on additional complexity and cost.
Solution
Hazelcast Platform is popular because of its ease of use and low operational overhead. These qualities make it relatively easy to run a Hazelcast cluster alongside a Cassandra cluster to gain a significant performance boost.
Integration is done through the MapStore and MapLoader interfaces, both part of the Hazelcast Platform API. Application developers implement these interfaces to map the data in Cassandra to Hazelcast and vice versa. The platform then takes over the task of automatically synchronizing data between the two systems.
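To make the shape of that mapping code concrete, here is a minimal sketch. It does not use the Hazelcast library: `SketchMapLoader` and `SketchMapStore` are hypothetical stand-ins whose method names follow the Hazelcast interfaces as we understand them (check the API reference for the exact signatures), and the `ConcurrentHashMap` inside `ProductMapStore` is only a placeholder for a Cassandra table.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the loader side: reads from the backing database.
interface SketchMapLoader<K, V> {
    V load(K key);                          // single-key read, e.g. on a cache miss
    Map<K, V> loadAll(Collection<K> keys);  // batched read
    Iterable<K> loadAllKeys();              // keys available for pre-loading
}

// Hypothetical stand-in for the store side: adds writes and deletes.
interface SketchMapStore<K, V> extends SketchMapLoader<K, V> {
    void store(K key, V value);
    void storeAll(Map<K, V> entries);
    void delete(K key);
    void deleteAll(Collection<K> keys);
}

// Maps product IDs to product names. The in-memory map is an assumption that
// stands in for a Cassandra table; real code would run CQL through a driver.
final class ProductMapStore implements SketchMapStore<String, String> {
    private final Map<String, String> table = new ConcurrentHashMap<>();

    @Override public String load(String key) {
        return table.get(key);
    }

    @Override public Map<String, String> loadAll(Collection<String> keys) {
        Map<String, String> found = new HashMap<>();
        for (String key : keys) {
            String value = table.get(key);
            if (value != null) {
                found.put(key, value);   // skip keys that have no row
            }
        }
        return found;
    }

    @Override public Iterable<String> loadAllKeys() {
        return new ArrayList<>(table.keySet());
    }

    @Override public void store(String key, String value) {
        table.put(key, value);
    }

    @Override public void storeAll(Map<String, String> entries) {
        table.putAll(entries);
    }

    @Override public void delete(String key) {
        table.remove(key);
    }

    @Override public void deleteAll(Collection<String> keys) {
        for (String key : keys) {
            table.remove(key);
        }
    }
}

final class MapStoreDemo {
    public static void main(String[] args) {
        ProductMapStore store = new ProductMapStore();
        store.store("p1", "keyboard");
        store.storeAll(Map.of("p2", "mouse", "p3", "monitor"));
        store.delete("p3");
        System.out.println(store.load("p1"));
        System.out.println(store.loadAll(List.of("p1", "p2", "p3")));
    }
}
```

In an actual deployment the class would implement the Hazelcast interfaces directly and be registered in the map's configuration, so the platform, not the application, decides when each method is called.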
While in operation, Hazelcast Platform acts like a seamless cache in front of Cassandra, where any application-requested data that resides in Hazelcast/RAM is quickly returned. If the data is not found in RAM, then Hazelcast retrieves the data from Cassandra, stores it in RAM, and also returns it to the application. If the application makes an update to data, Hazelcast Platform automatically performs a write-through operation to update the data in Cassandra. No application code is necessary to synchronize the data between Hazelcast Platform and Cassandra.
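The read and write paths described above can be sketched in a few lines of plain Java. This is an illustration of the control flow only, not Hazelcast code: `BackingStore`, `CountingStore`, and `ThroughCache` are hypothetical names, the in-memory `CountingStore` stands in for Cassandra, and writing to the store before updating RAM is an ordering chosen for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal view of the database behind the cache.
interface BackingStore<K, V> {
    V load(K key);
    void store(K key, V value);
}

// In-memory stand-in for Cassandra that counts how often it is read.
final class CountingStore implements BackingStore<String, String> {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    private int loadCount = 0;

    @Override public String load(String key) {
        loadCount++;
        return rows.get(key);
    }

    @Override public void store(String key, String value) {
        rows.put(key, value);
    }

    int loadCount() {
        return loadCount;
    }
}

// Read-through on get, write-through on put.
final class ThroughCache<K, V> {
    private final Map<K, V> ram = new ConcurrentHashMap<>();
    private final BackingStore<K, V> backing;

    ThroughCache(BackingStore<K, V> backing) {
        this.backing = backing;
    }

    V get(K key) {
        V cached = ram.get(key);
        if (cached != null) {
            return cached;               // hit: served from RAM
        }
        V loaded = backing.load(key);    // miss: fetch from the backing store
        if (loaded != null) {
            ram.put(key, loaded);        // keep it in RAM for later reads
        }
        return loaded;
    }

    void put(K key, V value) {
        backing.store(key, value);       // write-through: persist the update
        ram.put(key, value);             // then refresh the cached copy
    }
}

final class ThroughCacheDemo {
    public static void main(String[] args) {
        CountingStore cassandra = new CountingStore();
        cassandra.store("user:1", "alice");   // row exists only in the store

        ThroughCache<String, String> cache = new ThroughCache<>(cassandra);
        cache.get("user:1");                  // miss, loaded from the store
        cache.get("user:1");                  // hit, no store access
        System.out.println("store reads: " + cassandra.loadCount());

        cache.put("user:2", "bob");           // update reaches the store too
        System.out.println("persisted: " + cassandra.load("user:2"));
    }
}
```

With Hazelcast Platform the application only calls `get` and `put` on a map; the equivalent of `ThroughCache` is provided by the platform. To our understanding, whether writes reach Cassandra synchronously (write-through) or after a delay (write-behind) is controlled by the map store's write-delay setting, so treat the synchronous `put` above as one of the two possible modes.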
Use cases for a joint Hazelcast-Cassandra deployment range from caching of product catalogs for online stores, to low-latency write-through caches for user account information, to fast storage of transactional data records. In general, any data that must be persistent yet quickly accessible is a great fit for a Hazelcast-Cassandra configuration.