Blog ›External Hazelcast Queries

By Neil Stevenson

Principal Architect

With over 30 years of industry experience, Neil has designed, developed, debugged, and supported software systems for numerous customers large and small. Initially, a C and assembler programmer, most of the last 20 years have been Java-based, with a focus on distributed systems, data grids, and stream processing. Neil is an occasional committer to the Hazelcast code base, with special interest on GoLang.

View all blogs by the author

Jul 26, 2021

Back to Blog

External Hazelcast Queries

We can send an SQL query to Hazelcast and get results, a great feature.

But did you know the data doesn’t have to be stored in Hazelcast to be queryable via Hazelcast?

It could be somewhere like a Kafka topic or a CSV file, and you can still query it using Hazelcast’s SQL.

This makes Hazelcast a “one-stop-shop” for all your querying needs.

Example

Imagine this query:

SELECT * FROM xyz

Where is the data stored that it is querying? An IMap? Somewhere else?

The answer, if you are a user, is that you don’t care!

All you are interested in really is the results, not where the data is stored.

Hazelcast knows where the data is. The data may be in Hazelcast or elsewhere such as a Kafka topic. What matters is if Hazelcast can locate the data to run the query against it.

Background

While it’s easy for the user to query external things via Hazelcast, there are two caveats to be aware of.

Mappings

Firstly, there needs to be a definition of the external source so that Hazelcast knows how to find it and work with it.

For example:

CREATE EXTERNAL MAPPING IF NOT EXISTS kf_trades EXTERNAL NAME kf_trades
(  id             VARCHAR,
   price          BIGINT,
   quantity       BIGINT,
   symbol         VARCHAR,
   "timestamp"    BIGINT
)
  TYPE Kafka  
OPTIONS
(  'keyFormat' = 'java', 'keyJavaClass' = 'java.lang.String',
   'valueFormat' = 'json',
   'auto.offset.reset' = 'earliest',
   'bootstrap.servers' = 'kafka-broker0:9092,kafka-broker1:9093,kafka-broker2:9094'
)

This defines a Kafka topic called “kf_trades“. It has field elements in a JSON value – id, price, quantity, symbol and timestamp. We also need to confirm the location of the Kafka brokers, kafka-broker0:9092,kafka-broker1:9093,kafka-broker2:9094.

So not exactly difficult, and a one-off task to submit the “CREATE MAPPING” Sql command to the Sql engine when the cluster. starts.

Performance

Secondly, the external source won’t be as fast as Hazelcast. So the end-user may have to wait a bit for their results. Just don’t blame Hazelcast here.

Now This Is An Example

Now let’s stretch the imagination with this query:

SELECT k.id, k.symbol, k."timestamp", s.* FROM kf_trades AS k LEFT JOIN symbols AS s ON k.symbol = s.__key

Now what we’re doing is a join on the Kafka topic defined above and “symbols” which happens to be an IMap.

So we can take a sequence of Stock Market trades on a Kafka topic and enrich them with information about the stock symbol to form our result.

ID                                    ,  SYMBOL  ,  TIMESTAMP      ,  __KEY  ,  FINANCIALSTATUS  ,  MARKETCATEGORY        ,  SECURITYNAME                                                  
4a696e0c-45e3-4c88-8d6f-a5bc2c966966  ,  LULU    ,  1624528869786  ,  LULU   ,  NORMAL           ,  GLOBAL_SELECT_MARKET  ,  lululemon athletica inc. - Common Stock                       
9b25cb15-9b94-4578-815b-a9db035f4da9  ,  BSRR    ,  1624528873366  ,  BSRR   ,  NORMAL           ,  GLOBAL_SELECT_MARKET  ,  Sierra Bancorp - Common Stock                                 
b0c3b163-4122-4e49-81a3-25f32c60eab1  ,  VDSI    ,  1624528883836  ,  VDSI   ,  NORMAL           ,  CAPITAL_MARKET        ,  VASCO Data Security International, Inc. - Common Stock        
0f5426ab-86cf-4e5a-b60b-6369c2470432  ,  AMDA    ,  1624528886519  ,  AMDA   ,  DEFICIENT        ,  CAPITAL_MARKET        ,  Amedica Corporation - Common Stock                            
6f6b2f74-7f48-414c-bc82-5aed441c0c35  ,  ASBB    ,  1624528887603  ,  ASBB   ,  NORMAL           ,  GLOBAL_MARKET         ,  ASB Bancorp, Inc. - Common Stock                              
c6b4d97d-18ec-480b-b1dd-20491713dc82  ,  TANNI   ,  1624528879335  ,  TANNI  ,  NORMAL           ,  GLOBAL_SELECT_MARKET  ,  TravelCenters of America LLC - 8.25% Senior Notes due 2028

So our trades have a stock symbol, and by joining on the map with symbol information, we can find out the status and name of the stock.

Some input is external here (Kafka) and some internal (IMap), but all the end-user need to concern themselves with is the output.

Or CSV

Or how about a CSV file?

It’s common to pass data from system to system and not normally something you can query easily. Now you can. All you do is tell Hazelcast where to find it, somewhere all members of the cluster can reach.

For this CSV file:

English,French,Spanish,Italian
one,un,uno,uno
two,deux,dos,due
three,trois,tres,tre
four,quatre,cuatro,quattro

With this definition:

CREATE MAPPING csv (  English VARCHAR, French VARCHAR, Spanish VARCHAR, Italian VARCHAR )  TYPE File  OPTIONS (  'format' = 'csv', 'path' = '/tmp/blog' )

We can run this query:

SELECT English, French FROM csv

And get the results we expect:

one un
four quatre
two deux
three trois

Of course, we may want to join our CSV file with a Hazelcast map.

Our map has two pairs of Strings, for French and German.

hazelcastInstance.getMap("map").set("deux", "zwei");
hazelcastInstance.getMap("map").set("trois", "drei");

So now our query becomes:

SELECT c.*, m.* FROM csv AS c LEFT JOIN map AS m ON c.French = m.__key

And our result:

one un uno uno null null
four quatre cuatro quattro null null
two deux dos due deux zwei
three trois tres tre trois drei

Two rows have nulls because it’s a LEFT JOIN.

Ordering

Did you spot the order of the query results wasn’t the same as the order of lines in the file?

That’s because Hazelcast is distributed.

It’s easily fixed if you need it, with the ORDER BY clause.

SELECT English, French FROM csv ORDER BY 1

Remember also this is the order of the output. We don’t care about the order of the input.

Summary

Hazelcast allows you to query data in its stores and data elsewhere.

This is a great convenience for the end-user.

We can query different things such as Kafka and Hazelcast from one place.

The end-user doesn’t need to know (and may not care) the source of the data.

After all, it’s the results of the query we want, not the input.

Keep Reading

Blog

Decisions at the Speed of Memory: Hazelcast on IBM® LinuxONE 5

Network hops cost milliseconds; milliseconds cost money. Put data, compute, and AI on one platform, and both bills shrink, whether…

Blog

Understanding the Value of Distributed Compute

Introduction Hazelcast is a powerful platform. It delivers the power of a highly reliable, distributed cache. Equally important is the…

Blog

Resilience That Holds Under Load: Hazelcast Platform 5.7

A major release for institutions where the operational state must remain correct during degradation, not just be restored afterward. The…

Blog

Testing distributed resilient applications powered by Hazelcast

Applications powered by Hazelcast and that use it to drive business logic need tests that go beyond happy-path validation. Serialization,…

Datasheet

/ PDF

/ 2 pages

Resilient, Continuous, Active Data – without Compromise Datasheet

The unified in-memory and stream processing platform for resilient, continuous active data at sub-millisecond speed.

Webinar

/ Video

/ 45 min

Zero Downtime, Real Pain: Schema Evolution in Cached, Live Systems

Zero-downtime upgrades aren’t the hard part—schema evolution is. Learn how mixed service versions interact with shared cached data, why subtle inconsistencies cause failures, and how to design forward-compatible changes using Hazelcast and real Java examples.

Platform

Cloud Deployment Options

Key Solutions

By Industry

By Use Case

By Architecture

A cloud-agnostic architecture for your applications

Resource Center

Content Types

Learn

33% Reduction in Operational Costs

Developers

Community

Learn

Toolbox

A cloud-agnostic architecture for your applications

By Neil Stevenson

External Hazelcast Queries

Example

Background

Mappings

Performance

Now This Is An Example

Or CSV

Ordering

Summary

Keep Reading

Decisions at the Speed of Memory: Hazelcast on IBM® LinuxONE 5

Understanding the Value of Distributed Compute

Resilience That Holds Under Load: Hazelcast Platform 5.7

Testing distributed resilient applications powered by Hazelcast

Resilient, Continuous, Active Data – without Compromise Datasheet

Zero Downtime, Real Pain: Schema Evolution in Cached, Live Systems

Why Hazelcast

About Us

Platform

Solutions

Developers

Learn

Connect

Platform

Cloud Deployment Options

Key Solutions

By Industry

By Use Case

By Architecture

A cloud-agnostic architecture for your applications

Resource Center

Content Types

Learn

33% Reduction in Operational Costs

Developers

Community

Learn

Toolbox

A cloud-agnostic architecture for your applications

By Neil Stevenson

Spread the Word

External Hazelcast Queries

Example

Background

Mappings

Performance

Now This Is An Example

Or CSV

Ordering

Summary

Keep Reading

Decisions at the Speed of Memory: Hazelcast on IBM® LinuxONE 5

Understanding the Value of Distributed Compute

Resilience That Holds Under Load: Hazelcast Platform 5.7

Testing distributed resilient applications powered by Hazelcast

Resilient, Continuous, Active Data – without Compromise Datasheet

Zero Downtime, Real Pain: Schema Evolution in Cached, Live Systems

Why Hazelcast

About Us

Platform

Solutions

Developers

Learn

Connect