Blog ›External Hazelcast Queries

By Neil Stevenson

Principal Architect

With over 30 years of industry experience, Neil has designed, developed, debugged, and supported software systems for numerous customers large and small. Initially, a C and assembler programmer, most of the last 20 years have been Java-based, with a focus on distributed systems, data grids, and stream processing. Neil is an occasional committer to the Hazelcast code base, with special interest on GoLang.

View all blogs by the author

Jul 26, 2021

Back to Blog

External Hazelcast Queries

We can send an SQL query to Hazelcast and get results, a great feature.

But did you know the data doesn’t have to be stored in Hazelcast to be queryable via Hazelcast?

It could be somewhere like a Kafka topic or a CSV file, and you can still query it using Hazelcast’s SQL.

This makes Hazelcast a “one-stop-shop” for all your querying needs.

Example

Imagine this query:

SELECT * FROM xyz

Where is the data stored that it is querying? An IMap? Somewhere else?

The answer, if you are a user, is that you don’t care!

All you are interested in really is the results, not where the data is stored.

Hazelcast knows where the data is. The data may be in Hazelcast or elsewhere such as a Kafka topic. What matters is if Hazelcast can locate the data to run the query against it.

Background

While it’s easy for the user to query external things via Hazelcast, there are two caveats to be aware of.

Mappings

Firstly, there needs to be a definition of the external source so that Hazelcast knows how to find it and work with it.

For example:

CREATE EXTERNAL MAPPING IF NOT EXISTS kf_trades EXTERNAL NAME kf_trades
(  id             VARCHAR,
   price          BIGINT,
   quantity       BIGINT,
   symbol         VARCHAR,
   "timestamp"    BIGINT
)
  TYPE Kafka  
OPTIONS
(  'keyFormat' = 'java', 'keyJavaClass' = 'java.lang.String',
   'valueFormat' = 'json',
   'auto.offset.reset' = 'earliest',
   'bootstrap.servers' = 'kafka-broker0:9092,kafka-broker1:9093,kafka-broker2:9094'
)

This defines a Kafka topic called “kf_trades“. It has field elements in a JSON value – id, price, quantity, symbol and timestamp. We also need to confirm the location of the Kafka brokers, kafka-broker0:9092,kafka-broker1:9093,kafka-broker2:9094.

So not exactly difficult, and a one-off task to submit the “CREATE MAPPING” Sql command to the Sql engine when the cluster. starts.

Performance

Secondly, the external source won’t be as fast as Hazelcast. So the end-user may have to wait a bit for their results. Just don’t blame Hazelcast here.

Now This Is An Example

Now let’s stretch the imagination with this query:

SELECT k.id, k.symbol, k."timestamp", s.* FROM kf_trades AS k LEFT JOIN symbols AS s ON k.symbol = s.__key

Now what we’re doing is a join on the Kafka topic defined above and “symbols” which happens to be an IMap.

So we can take a sequence of Stock Market trades on a Kafka topic and enrich them with information about the stock symbol to form our result.

ID                                    ,  SYMBOL  ,  TIMESTAMP      ,  __KEY  ,  FINANCIALSTATUS  ,  MARKETCATEGORY        ,  SECURITYNAME                                                  
4a696e0c-45e3-4c88-8d6f-a5bc2c966966  ,  LULU    ,  1624528869786  ,  LULU   ,  NORMAL           ,  GLOBAL_SELECT_MARKET  ,  lululemon athletica inc. - Common Stock                       
9b25cb15-9b94-4578-815b-a9db035f4da9  ,  BSRR    ,  1624528873366  ,  BSRR   ,  NORMAL           ,  GLOBAL_SELECT_MARKET  ,  Sierra Bancorp - Common Stock                                 
b0c3b163-4122-4e49-81a3-25f32c60eab1  ,  VDSI    ,  1624528883836  ,  VDSI   ,  NORMAL           ,  CAPITAL_MARKET        ,  VASCO Data Security International, Inc. - Common Stock        
0f5426ab-86cf-4e5a-b60b-6369c2470432  ,  AMDA    ,  1624528886519  ,  AMDA   ,  DEFICIENT        ,  CAPITAL_MARKET        ,  Amedica Corporation - Common Stock                            
6f6b2f74-7f48-414c-bc82-5aed441c0c35  ,  ASBB    ,  1624528887603  ,  ASBB   ,  NORMAL           ,  GLOBAL_MARKET         ,  ASB Bancorp, Inc. - Common Stock                              
c6b4d97d-18ec-480b-b1dd-20491713dc82  ,  TANNI   ,  1624528879335  ,  TANNI  ,  NORMAL           ,  GLOBAL_SELECT_MARKET  ,  TravelCenters of America LLC - 8.25% Senior Notes due 2028

So our trades have a stock symbol, and by joining on the map with symbol information, we can find out the status and name of the stock.

Some input is external here (Kafka) and some internal (IMap), but all the end-user need to concern themselves with is the output.

Or CSV

Or how about a CSV file?

It’s common to pass data from system to system and not normally something you can query easily. Now you can. All you do is tell Hazelcast where to find it, somewhere all members of the cluster can reach.

For this CSV file:

English,French,Spanish,Italian
one,un,uno,uno
two,deux,dos,due
three,trois,tres,tre
four,quatre,cuatro,quattro

With this definition:

CREATE MAPPING csv (  English VARCHAR, French VARCHAR, Spanish VARCHAR, Italian VARCHAR )  TYPE File  OPTIONS (  'format' = 'csv', 'path' = '/tmp/blog' )

We can run this query:

SELECT English, French FROM csv

And get the results we expect:

one un
four quatre
two deux
three trois

Of course, we may want to join our CSV file with a Hazelcast map.

Our map has two pairs of Strings, for French and German.

hazelcastInstance.getMap("map").set("deux", "zwei");
hazelcastInstance.getMap("map").set("trois", "drei");

So now our query becomes:

SELECT c.*, m.* FROM csv AS c LEFT JOIN map AS m ON c.French = m.__key

And our result:

one un uno uno null null
four quatre cuatro quattro null null
two deux dos due deux zwei
three trois tres tre trois drei

Two rows have nulls because it’s a LEFT JOIN.

Ordering

Did you spot the order of the query results wasn’t the same as the order of lines in the file?

That’s because Hazelcast is distributed.

It’s easily fixed if you need it, with the ORDER BY clause.

SELECT English, French FROM csv ORDER BY 1

Remember also this is the order of the output. We don’t care about the order of the input.

Summary

Hazelcast allows you to query data in its stores and data elsewhere.

This is a great convenience for the end-user.

We can query different things such as Kafka and Hazelcast from one place.

The end-user doesn’t need to know (and may not care) the source of the data.

After all, it’s the results of the query we want, not the input.

Keep Reading

Blog

Recap: Establishing a Digital Core for Real-Time Applications

Discover how real-time data and AI drive business agility in our webinar recap. Experts from Discover, JPMorgan Chase, Baxter, and Capri Holdings shared insights on customer-centric ROI, AI strategies, and shedding legacy systems. Learn how businesses unlock efficiencies, scale seamlessly, and stay competitive with real-time data.

Webinar

/ Video

/ 60 min

Establishing a Digital Core for the Real-Time Economy

Blog

Recap: Hazelcast, Santander, and London Java Community Holiday Meetup

Close out the year in style with the London Java Community! Highlights included Santander’s Humphrey Fu sharing how Hazelcast evolved into a core part of their FX platform, and Hazelcast’s Ed Thurman exploring fault-tolerant Java applications. With insights, festive fun, and community connections, this meetup was one to remember.

Case Study

IoT Analytics and Telematics at the Largest Logistics Operation in the World

Video

Beyond Caching with Hazelcast, Santander, and London Java Community

Discover how Santander scaled Hazelcast to power messaging, compute grids, and more, with insights from Engineering Lead Humphrey Fu. Plus, Hazelcast’s Ed Thurman shares practical strategies for resilient distributed systems, covering replication, backups, and failure-resistant Java design. Perfect for architects, engineers, and developers!

Blog

Get Instant Answers with Ask AI on Hazelcast Documentation

Introducing Ask AI on our documentation site! This chatbot uses a retrieval-augmented generation system trained on 5,000+ pages to deliver accurate answers to technical questions, supporting multiple languages and improving the developer experience. User feedback enhances response quality to keep Hazelcast’s documentation reliable.

Why Hazelcast?

Ready to see how Hazelcast can transform your data architecture?

Key Capabilities

Products

Tool Kit

Quick Links

A cloud-agnostic architecture for your applications

Key Solutions

By Industry

By Use Case

By Architecture

A cloud-agnostic architecture for your applications

Resource Center

Content Types

Learn

33% Reduction in Operational Costs

Developers

Community

Learn

Toolbox

A cloud-agnostic architecture for your applications

By Neil Stevenson

Spread the Word

External Hazelcast Queries

Example

Background

Mappings

Performance

Now This Is An Example

Or CSV

Ordering

Summary

Keep Reading

Recap: Establishing a Digital Core for Real-Time Applications

Establishing a Digital Core for the Real-Time Economy

Recap: Hazelcast, Santander, and London Java Community Holiday Meetup

IoT Analytics and Telematics at the Largest Logistics Operation in the World

Beyond Caching with Hazelcast, Santander, and London Java Community

Get Instant Answers with Ask AI on Hazelcast Documentation

Why Hazelcast

About Us

Platform

Solutions

Developers

Learn

Connect