External Hazelcast Queries
We can send an SQL query to Hazelcast and get results, a great feature.
But did you know the data doesn’t have to be stored in Hazelcast to be queryable via Hazelcast?
It could be somewhere like a Kafka topic or a CSV file, and you can still query it using Hazelcast’s SQL.
This makes Hazelcast a “one-stop-shop” for all your querying needs.
Example
Imagine this query:
SELECT * FROM xyz
Where is the data stored that it is querying? An IMap? Somewhere else?
The answer, if you are a user, is that you don’t care!
All you are interested in really is the results, not where the data is stored.
Hazelcast knows where the data is. The data may be in Hazelcast or elsewhere such as a Kafka topic. What matters is if Hazelcast can locate the data to run the query against it.
Background
While it’s easy for the user to query external things via Hazelcast, there are two caveats to be aware of.
Mappings
Firstly, there needs to be a definition of the external source so that Hazelcast knows how to find it and work with it.
For example:
CREATE EXTERNAL MAPPING IF NOT EXISTS kf_trades EXTERNAL NAME kf_trades ( id VARCHAR, price BIGINT, quantity BIGINT, symbol VARCHAR, "timestamp" BIGINT ) TYPE Kafka OPTIONS ( 'keyFormat' = 'java', 'keyJavaClass' = 'java.lang.String', 'valueFormat' = 'json', 'auto.offset.reset' = 'earliest', 'bootstrap.servers' = 'kafka-broker0:9092,kafka-broker1:9093,kafka-broker2:9094' )
This defines a Kafka topic called “kf_trades
“. It has field elements in a JSON value – id
, price
, quantity
, symbol
and timestamp
. We also need to confirm the location of the Kafka brokers, kafka-broker0:9092,kafka-broker1:9093,kafka-broker2:9094
.
So not exactly difficult, and a one-off task to submit the “CREATE MAPPING
” Sql command to the Sql engine when the cluster. starts.
Performance
Secondly, the external source won’t be as fast as Hazelcast. So the end-user may have to wait a bit for their results. Just don’t blame Hazelcast here.
Now This Is An Example
Now let’s stretch the imagination with this query:
SELECT k.id, k.symbol, k."timestamp", s.* FROM kf_trades AS k LEFT JOIN symbols AS s ON k.symbol = s.__key
Now what we’re doing is a join on the Kafka topic defined above and “symbols
” which happens to be an IMap.
So we can take a sequence of Stock Market trades on a Kafka topic and enrich them with information about the stock symbol to form our result.
ID , SYMBOL , TIMESTAMP , __KEY , FINANCIALSTATUS , MARKETCATEGORY , SECURITYNAME 4a696e0c-45e3-4c88-8d6f-a5bc2c966966 , LULU , 1624528869786 , LULU , NORMAL , GLOBAL_SELECT_MARKET , lululemon athletica inc. - Common Stock 9b25cb15-9b94-4578-815b-a9db035f4da9 , BSRR , 1624528873366 , BSRR , NORMAL , GLOBAL_SELECT_MARKET , Sierra Bancorp - Common Stock b0c3b163-4122-4e49-81a3-25f32c60eab1 , VDSI , 1624528883836 , VDSI , NORMAL , CAPITAL_MARKET , VASCO Data Security International, Inc. - Common Stock 0f5426ab-86cf-4e5a-b60b-6369c2470432 , AMDA , 1624528886519 , AMDA , DEFICIENT , CAPITAL_MARKET , Amedica Corporation - Common Stock 6f6b2f74-7f48-414c-bc82-5aed441c0c35 , ASBB , 1624528887603 , ASBB , NORMAL , GLOBAL_MARKET , ASB Bancorp, Inc. - Common Stock c6b4d97d-18ec-480b-b1dd-20491713dc82 , TANNI , 1624528879335 , TANNI , NORMAL , GLOBAL_SELECT_MARKET , TravelCenters of America LLC - 8.25% Senior Notes due 2028
So our trades have a stock symbol, and by joining on the map with symbol information, we can find out the status and name of the stock.
Some input is external here (Kafka) and some internal (IMap), but all the end-user need to concern themselves with is the output.
Or CSV
Or how about a CSV file?
It’s common to pass data from system to system and not normally something you can query easily. Now you can. All you do is tell Hazelcast where to find it, somewhere all members of the cluster can reach.
For this CSV file:
English,French,Spanish,Italian one,un,uno,uno two,deux,dos,due three,trois,tres,tre four,quatre,cuatro,quattro
With this definition:
CREATE MAPPING csv ( English VARCHAR, French VARCHAR, Spanish VARCHAR, Italian VARCHAR ) TYPE File OPTIONS ( 'format' = 'csv', 'path' = '/tmp/blog' )
We can run this query:
SELECT English, French FROM csv
And get the results we expect:
one un four quatre two deux three trois
Of course, we may want to join our CSV file with a Hazelcast map.
Our map has two pairs of Strings, for French and German.
hazelcastInstance.getMap("map").set("deux", "zwei"); hazelcastInstance.getMap("map").set("trois", "drei");
So now our query becomes:
SELECT c.*, m.* FROM csv AS c LEFT JOIN map AS m ON c.French = m.__key
And our result:
one un uno uno null null four quatre cuatro quattro null null two deux dos due deux zwei three trois tres tre trois drei
Two rows have nulls because it’s a LEFT JOIN
.
Ordering
Did you spot the order of the query results wasn’t the same as the order of lines in the file?
That’s because Hazelcast is distributed.
It’s easily fixed if you need it, with the ORDER BY
clause.
SELECT English, French FROM csv ORDER BY 1
Remember also this is the order of the output. We don’t care about the order of the input.
Summary
Hazelcast allows you to query data in its stores and data elsewhere.
This is a great convenience for the end-user.
We can query different things such as Kafka and Hazelcast from one place.
The end-user doesn’t need to know (and may not care) the source of the data.
After all, it’s the results of the query we want, not the input.