Why do companies use Hazelcast for business-critical applications built on ultra-fast in-memory and stream processing technologies?
Here are a few of the reasons.
Hazelcast IMDG APIs are familiar to developers, so you can be productive right out of the gate. If you’re a Java developer, you’ll find our APIs can be thought of as distributed implementations of the Java Collections API. For other languages, you’ll find the data structures you’re used to using (Map or Dictionary, List, Set, Queue, etc.), distributed transparently, with APIs that work the way you expect.
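As a minimal sketch, here is what the familiar Map API looks like with an embedded Hazelcast member. It assumes the Hazelcast jar is on the classpath; the imports shown are for the Hazelcast 4.x/5.x package layout, and the map name is illustrative.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class DistributedMapExample {
    public static void main(String[] args) {
        // Start an embedded member; in production it would discover and join a cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // IMap implements java.util.concurrent.ConcurrentMap, so the calls are familiar.
        IMap<String, Integer> scores = hz.getMap("scores");
        scores.put("alice", 42);
        scores.putIfAbsent("bob", 7);
        Integer v = scores.get("alice"); // the entry may live on any member

        System.out.println(v);
        hz.shutdown();
    }
}
```

The only visible difference from a local `ConcurrentHashMap` is the bootstrap step; the reads and writes are transparently routed to whichever member owns each key.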
When you store data in Hazelcast, you aren’t just storing strings; you keep all the rich complexity of your objects, including references to the other objects they relate to. You can use IMDG’s query API to perform complex queries, so you aren’t limited to simple retrieval by key. The “distributed” part of IMDG means that you can have in-memory collections larger than the memory of a single JVM or server, partitioned across multiple server nodes that can all be accessed in parallel for high throughput, even when many clients are accessing the data concurrently.
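The partition-parallel query idea can be modeled in a single JVM. In this sketch the names (`partitionFor`, `query`) and the partition count are illustrative stand-ins, not IMDG APIs: each partition is a separate map, a query runs against every partition in parallel, and the partial results are merged.

```java
import java.util.*;
import java.util.stream.*;

public class PartitionedQuery {
    static final int PARTITIONS = 4; // illustrative; IMDG's default is much higher

    // Route a key to a partition by hashing, as IMDG does internally.
    static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // A "query" (value > threshold) runs against every partition in parallel,
    // and the partial results are merged -- no single holder scans everything.
    static List<Integer> query(List<Map<String, Integer>> partitions, int threshold) {
        return partitions.parallelStream()
                .flatMap(part -> part.values().stream().filter(v -> v > threshold))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < PARTITIONS; i++) partitions.add(new HashMap<>());
        for (int v = 0; v < 100; v++)
            partitions.get(partitionFor("key-" + v)).put("key-" + v, v);

        System.out.println(query(partitions, 90)); // the values 91 through 99
    }
}
```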
Any caching solution can let you store and retrieve data. But in most cases, to do anything with that data you’ll have to fetch it across the network and process it locally. When caching gigabytes or even terabytes of data, any operation that needs to iterate through any non-trivial fraction of the data can become bottlenecked by the need to move data between the cache and the application using the data. With Hazelcast, you can take the methods that operate on the data and submit them to the cluster by passing a Runnable or Callable to the distributed ExecutorService. (A ScheduledExecutorService is also available for periodically recurring tasks). Once you submit the code (which can run synchronously or asynchronously), IMDG takes care of routing it to the appropriate member nodes and collecting the results, which can then be returned to the caller or stored in the data grid.
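The submit-and-collect pattern looks like this with the JDK’s own ExecutorService. This is a local sketch, not the distributed API itself: Hazelcast’s distributed executor follows the same Callable/Future shape, with the addition that tasks must be serializable so they can be shipped to member nodes.

```java
import java.util.*;
import java.util.concurrent.*;

public class SubmitToCluster {
    // The task is a plain Callable; in the distributed case it would also be
    // Serializable so it can be routed to the member that owns the data.
    static Callable<Integer> wordLength(String word) {
        return word::length;
    }

    static int totalLength(List<String> words) throws Exception {
        // Local stand-in for the distributed executor; the submit/Future
        // pattern is identical, only the execution stays in this JVM.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<>();
        for (String w : words) futures.add(pool.submit(wordLength(w)));

        int total = 0;
        for (Future<Integer> f : futures) total += f.get(); // collect the results
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalLength(Arrays.asList("move", "code", "not", "data")));
    }
}
```

The point of the distributed version is that `wordLength` would execute next to the data it needs, so only the small result crosses the network.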
Hazelcast Entry Processors are another mechanism to execute code in the grid; Entry Processors specifically target individual data entries (either single entries identified by a key, or a set of entries qualified by a Predicate expression). Because the Entry Processor executes directly on the member owning the primary copy of the data, no data transfer is needed.
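The read-modify-write-in-place idea can be modeled in a single JVM with `ConcurrentMap.compute`. This is a hypothetical analogue of what an entry processor saves you, not Hazelcast's EntryProcessor API: the mutation happens where the entry lives, with no separate fetch and write-back.

```java
import java.util.concurrent.*;

public class EntryProcessorSketch {
    static int incrementInPlace(ConcurrentMap<String, Integer> map, String key) {
        // With IMDG, an entry processor runs this logic on the member that owns
        // the key's partition; compute() is the single-JVM analogue of that
        // read-modify-write without a get/put round trip over the network.
        return map.compute(key, (k, v) -> (v == null ? 0 : v) + 1);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();
        incrementInPlace(hits, "page");
        System.out.println(incrementInPlace(hits, "page")); // 2
    }
}
```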
Hazelcast Aggregators provide the ability to perform common aggregation operations across a set of entries identified by a predicate; if these entries are spread across multiple cluster nodes (as is commonly the case), each cluster member will perform the aggregation of the entries it owns, and then a single cluster member will aggregate those aggregations to return a final value to the requesting client.
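A sketch of that two-phase shape, with hypothetical `partialSum` and `combine` helpers standing in for the member-side and final aggregation steps:

```java
import java.util.*;
import java.util.stream.*;

public class TwoPhaseAggregation {
    // Phase 1: each "member" aggregates only the entries it owns.
    static long partialSum(Collection<Integer> owned) {
        return owned.stream().mapToLong(Integer::longValue).sum();
    }

    // Phase 2: a single member combines the partial results into the final value.
    static long combine(List<Long> partials) {
        return partials.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<Collection<Integer>> members = Arrays.asList(
                Arrays.asList(1, 2, 3),    // entries owned by member A
                Arrays.asList(4, 5),       // member B
                Arrays.asList(6, 7, 8, 9)  // member C
        );
        List<Long> partials = members.stream()
                .map(TwoPhaseAggregation::partialSum)
                .collect(Collectors.toList());
        System.out.println(combine(partials)); // 45
    }
}
```

Only the small partial results travel between members, not the entries themselves.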
The computation capabilities of IMDG can be extended even further when IMDG is paired with Hazelcast’s in-memory stream and batch processing engine, Hazelcast Jet.
For the majority of developers who pick up Hazelcast, the power of the platform is that it allows you to build a distributed application that’s as simple to code as a monolithic application – you worry about the business logic, and we’ll worry about keeping everything in sync across the cluster (or even several distributed clusters). But sometimes you need to build something distributed that doesn’t have a simple out-of-the-box solution – and for those cases, we’ve got a set of building blocks that can take a very complex problem and break it down into a more manageable level of complexity.
Hazelcast is focused on high-availability use cases; many of our customers operate in zero-downtime environments where Hazelcast is a key component of how they meet their strict SLAs. So most Hazelcast features, in CAP Theorem terms, operate in an AP manner. But when building distributed systems, there are sometimes critical sections of code where consistency must be guaranteed, even in the event of multiple node failures. For these critical functions, like locks and semaphores, Hazelcast provides a CP subsystem, based on our implementation of the Raft consensus algorithm. Here are some of the APIs included in the CP subsystem that you can use in building distributed applications, confident that the hard part has been taken care of and you can focus on implementing your business logic.
Hazelcast’s partitioned data structures remove the need for most locking. Because only one thread is allowed to read or update data in a particular partition, all access to that data queues up to be serviced by the partition thread. Since multiple threads cannot access the partition concurrently, no locking is required; and the relatively large number of partitions ensures enough concurrency that this queuing should not become a bottleneck. (For larger data stores, increasing the partition count can alleviate the issue should it arise.)
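A single-JVM sketch of the partition-thread model (the partition count and `partitionFor` hashing are illustrative; IMDG's real default is 271 partitions with its own hashing scheme): one single-threaded executor per partition means operations on the same partition run strictly in order, with no locks.

```java
import java.util.concurrent.*;

public class PartitionThreads {
    static final int PARTITIONS = 4; // illustrative; IMDG defaults to 271

    // One single-threaded executor per partition: every operation on a given
    // partition queues up and runs in submission order, so no locking is needed.
    private final ExecutorService[] threads = new ExecutorService[PARTITIONS];

    PartitionThreads() {
        for (int i = 0; i < PARTITIONS; i++)
            threads[i] = Executors.newSingleThreadExecutor();
    }

    static int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Route an operation to the thread that owns the key's partition.
    <T> Future<T> submit(Object key, Callable<T> op) {
        return threads[partitionFor(key)].submit(op);
    }

    void shutdown() {
        for (ExecutorService e : threads) e.shutdown();
    }
}
```

Operations on different partitions still run in parallel, which is why a large partition count keeps throughput high.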
But you may have additional resources to which access needs to be coordinated or an operation that needs to access data in multiple partitions. In these cases, the FencedLock API provides a mechanism for locking in a distributed environment.
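One way to picture why such a lock is "fenced": each acquisition hands out a monotonically increasing token, and a protected resource rejects any token older than one it has already seen, so a delayed or failed former lock holder cannot corrupt state. This is a hypothetical sketch of that idea, not Hazelcast's FencedLock implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FencingTokens {
    private final AtomicLong nextToken = new AtomicLong();
    private long highestSeen = -1;

    // Each acquisition returns a strictly increasing fence token.
    long acquire() {
        return nextToken.incrementAndGet();
    }

    // The protected resource accepts an operation only if its token is newer
    // than any it has already seen; a stale holder of an old token is refused.
    synchronized boolean guardedWrite(long token) {
        if (token <= highestSeen) return false; // stale lock holder
        highestSeen = token;
        return true;
    }
}
```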
A lock allows you to ensure that only a single thread is accessing a resource. Sometimes there may be a resource to which you want to provide limited, but not exclusive, access, for example allowing a certain number of threads to run at once; semaphores provide this capability. A related need arises when a multi-threaded application must wait for several threads to complete their work before some operation proceeds; a CountDownLatch can be used for this. Locks, semaphores, and countdown latches are all provided by Java in the java.util.concurrent package, but those implementations only operate within a single JVM and cannot help you coordinate distributed applications. The Hazelcast Distributed Concurrency APIs provide the semantics you need in a familiar form, and the CP subsystem ensures that they will function reliably in the most challenging circumstances.
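As a single-JVM illustration of those semantics using the JDK classes mentioned above (the worker and permit counts are arbitrary; with Hazelcast, the semaphore and latch would come from the CP subsystem instead):

```java
import java.util.concurrent.*;

public class CoordinationPrimitives {
    static boolean runWorkers() throws InterruptedException {
        // Allow at most 2 of the 5 workers into the guarded section at once.
        Semaphore permits = new Semaphore(2);
        // Let the caller proceed only after all 5 workers have finished.
        CountDownLatch done = new CountDownLatch(5);

        ExecutorService pool = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 5; i++) {
            pool.submit(() -> {
                try {
                    permits.acquire();
                    // ... use the limited resource here ...
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    permits.release();
                    done.countDown();
                }
            });
        }
        boolean finished = done.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers() ? "all workers finished" : "timed out");
    }
}
```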
Hazelcast provides the IAtomicLong and IAtomicReference APIs, distributed implementations of atomic long and reference types.
When an application is spread across multiple nodes, generating unique identifiers can be an issue. An atomic value can be used if the values need to be contiguous, but if many values are being generated in a short period of time, the atomic value can become a bottleneck. If the values don’t need to be contiguous, a Flake ID generator can do the job -- each node will generate values within its own range so that values from different nodes are guaranteed not to collide.
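A sketch of the idea behind flake-style ids (the bit layout is illustrative, and Hazelcast's Flake ID Generator also folds in a timestamp component, which this sketch omits): each node packs its own id into the high bits and a local counter into the low bits, so no two nodes can ever produce the same value and no coordination is needed.

```java
public class FlakeStyleId {
    private final long nodeId;   // unique per cluster member (illustrative width)
    private long sequence;       // per-node counter, never repeats locally

    FlakeStyleId(long nodeId) {
        this.nodeId = nodeId;
    }

    // Node id in the high bits, counter in the low 22 bits: two nodes can
    // never collide, and each node never repeats, so ids are unique
    // cluster-wide without any coordination between members.
    synchronized long nextId() {
        return (nodeId << 22) | (sequence++ & 0x3FFFFF);
    }
}
```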
Just as unique ids can be an issue, so can counters. Here again, atomic values provide a way to implement a distributed counter across a cluster; but the counter can become a point of contention if every cluster member updates it each time the value changes. A less intrusive mechanism is a PN counter: each cluster member updates its own local counter, and the counter values are periodically synced by sharing each member's net change with the other cluster members. This keeps the value seen across the cluster close, but not identical -- so it’s useful for values like a hit counter, where an approximate value is sufficiently accurate.
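A minimal sketch of the replica-merge idea behind a PN counter (class and method names here are hypothetical, not Hazelcast's PNCounter API): each member only ever writes its own slot, and replicas converge by taking the per-slot maximum when they sync.

```java
import java.util.*;

public class PnCounterSketch {
    // Per-node increment and decrement tallies; each member only ever
    // writes its own slot, so members never contend with each other.
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    void increment(String nodeId) { increments.merge(nodeId, 1L, Long::sum); }
    void decrement(String nodeId) { decrements.merge(nodeId, 1L, Long::sum); }

    // Syncing two replicas takes the max of each per-node slot; replicas can
    // merge in any order, any number of times, and still converge.
    void merge(PnCounterSketch other) {
        other.increments.forEach((n, v) -> increments.merge(n, v, Long::max));
        other.decrements.forEach((n, v) -> decrements.merge(n, v, Long::max));
    }

    // The observed value: total increments minus total decrements.
    long value() {
        long inc = increments.values().stream().mapToLong(Long::longValue).sum();
        long dec = decrements.values().stream().mapToLong(Long::longValue).sum();
        return inc - dec;
    }
}
```

Between syncs, two members may observe slightly different values, which is exactly the "close, but not identical" behavior described above.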
Pilots of small planes will tell you that moving from a single-engine to a twin-engine plane doubles the chance that you’ll have an engine failure. Similarly, as you move from simple applications that run on a single system to complex distributed systems, you introduce more points of failure. But a well-architected distributed system takes advantage of the multiple nodes in the system to build in redundancy and become resilient to failure.
With Hazelcast, redundancy and fault tolerance are built in; we take a zero-code approach to high availability. By default, your data is backed up, and you can add a further degree of safety by configuring additional backups, or by replicating the entire cluster, through changes to the configuration files. (These changes can also be made programmatically if you prefer.)
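For example, a declarative configuration fragment raising the backup count for one map, using Hazelcast's `backup-count` element (the map name is illustrative):

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <map name="orders">
    <!-- One synchronous backup is the default; this asks for two. -->
    <backup-count>2</backup-count>
  </map>
</hazelcast>
```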
In an IMDG cluster your data is partitioned (some would say sharded), with partitions distributed among the cluster nodes. Each partition has a primary copy (and the node where that copy is stored is considered the owner of that partition). Each partition also has one or more backup copies, which will not be stored on the same system as the primary copy. (There are configuration parameters you can use to minimize the chance of a failure taking out a primary and backup – saying, for example, that backups should not be stored to a node in the same server rack or availability zone as the primary copy). Hazelcast automatically creates these backups and keeps them in sync. In case of a node failure, the backup copies of partitions that were housed on the node are promoted to primaries, and new backups are created. Partitions will migrate as needed to rebalance the partition load between the nodes that are still operational. When the downed node or its replacement rejoins the cluster, Hazelcast will again rebalance the partitions to efficiently use all of the capacity of the cluster.
For even greater reliability, Hazelcast also supports WAN replication, where a cluster is replicated to a remote site. (Member nodes within a cluster should always be located in the same data center).
Hazelcast IMDG is an open core software product licensed under the Apache 2 license. The home of the open source core is hazelcast.org, and the project sources can be viewed or forked from GitHub.
Hazelcast Enterprise Edition adds features important to enterprise deployments, including improved security, additional fault tolerance, and technologies that enable zero-downtime deployments (such as Rolling Upgrades and Blue-Green Deployments). For more information about Hazelcast Enterprise, see the product page.
Whether you're interested in learning the basics of in-memory systems, or you're looking for advanced, real-world production examples and best practices, we've got you covered.