Hazelcast Python Client 4.2 is Released!
We have released the Python client 4.2, featuring the brand new SQL support! You can start using it to query your data efficiently with the widely-known standard SQL syntax.
You can install the latest version through PyPI by issuing the following command.
pip install -U hazelcast-python-client
SQL Support
For a long time, Hazelcast IMDG and the Hazelcast Python client have supported querying the data you store in distributed maps using the Predicates with SQL-like syntax. Yet, the existing query engine had its limitations and there were strong voices from the community for additional features and support for standard SQL syntax.
Listening to these, we introduced a preview version in IMDG 4.1 of the brand-new, state-of-the-art SQL engine. Although there is a long way to go, we have improved it a lot in IMDG 4.2 and started working on the client languages to bring that feature to widespread use.
The new SQL engine:
- supports many new expressions.
- supports sorting and limiting.
- is able to query large amounts of data without putting too much strain on the client and server by fetching the results page by page.
- has an option to make use of the new, high-performance, concurrent off-heap B+ tree+
- has more advanced query optimizations.
The 4.2 version of the Python client brings these new functionalities to the Python world!
Note: The SQL feature for the Python client and IMDG is in beta state and we only guarantee compatibility with the same minor versions of the client and server.
Usage
For the rest of the post, we will assume a map called employees is filled with values of the following Portable class.
class Employee(Portable): FACTORY_ID = 1 CLASS_ID = 1 def __init__(self, employee_id=None, first_name=None, last_name=None, age=None, salary=None): self.employee_id = employee_id self.first_name = first_name self.last_name = last_name self.age = age self.salary = salary def write_portable(self, writer): writer.write_int("employee_id", self.employee_id) writer.write_string("first_name", self.first_name) writer.write_string("last_name", self.last_name) writer.write_int("age", self.age) writer.write_int("salary", self.salary) def read_portable(self, reader): self.employee_id = reader.read_int("employee_id") self.first_name = reader.read_string("first_name") self.last_name = reader.read_string("last_name") self.age = reader.read_int("age") self.salary = reader.read_int("salary") def get_factory_id(self): return Employee.FACTORY_ID def get_class_id(self): return Employee.CLASS_ID client = hazelcast.HazelcastClient(portable_factories={ Employee.FACTORY_ID: { Employee.CLASS_ID: Employee, }, }) employees = client.get_map("employees").blocking() # Fill the employees map employees.set(1, Employee(1, "John", "Doe", 42, 60000)) ...
In Python client 4.2, there is a new service called SQL, that is capable of executing queries on distributed maps.
You can simply use the execute() method to run queries, by just passing the query string and parameters, if any. Note that, the parameters are specified with the question marks in query strings.
result = client.sql.execute("SELECT employee_id, first_name FROM employees WHERE age < ?", 50)
Or, you can construct a SqlStatement to have a greater amount of control over the query execution by configuring cursor buffer size (page size), timeout, expected result type, and parameters. A SqlStatement can be passed into the execute_statement() method.
statement = SqlStatement("SELECT * FROM employees") statement.cursor_buffer_size = 200 result = client.sql.execute_statement(statement)
Both of these methods return an SqlResult, which implements the iterator interface for ease of use. You can iterate over the rows of the result and get the column values of rows either by the column name or index.
result = client.sql.execute("SELECT employee_id, first_name FROM employees WHERE age < ?", 50) for row in result: # Get the column value with the column name employee_id = row.get_object("employee_id") # Find the column index and get the column value with that index. # You can also know the column indexes beforehand, as they have the # same order as the order given in the query string. row_metadata = row.metadata first_name_index = row_metadata.find_column("first_name") first_name = row.get_object_with_index(first_name_index) print(employee_id, first_name)
SqlResult also has the following methods:
- .is_row_set() that returns whether or not the result has rows to iterate.
- .update_count() that returns the number of rows affected by the query or -1, if the result is a row-set.
- .get_row_metadata() that returns the metadata about the rows where you can get the column metadata, count, and indexes.
- .close() that closes the query execution and makes sure that the resources associated with the query on the server-side are released.
- iterator() to iterate over the result in a non-blocking manner. Most of the time, the iteration method shown above should be used, instead of this. See this code sample to understand how to use it.
All of these methods return Future objects, and you must call .result() on them to wait for their results or add callbacks that will run when their result is set.
Sorting
You can use the standard SQL clauses ORDER BY, LIMIT, and OFFSET to sort and limit the result set.
Note that, you must have SORTED indexes on the fields that you ORDER BY. You can add indexes dynamically with the client as below, ideally before putting data into the map.
employees.add_index(["salary"], "SORTED")
The following code snippet gets the top five employees with the highest salaries. The code snippet also shows the usage of the with statement (context manager) to automatically call .close() on the result to make sure that the server-side resources are cleaned properly.
with client.sql.execute(""" SELECT employee_id, first_name, last_name, salary FROM employees ORDER BY salary DESC LIMIT 5 """) as result: for row in result: print(row)
Unsupported Features
As of IMDG 4.2, we only support SELECT clauses but plan to add support for the following features in upcoming releases:
- INSERT / UPDATE / DELETE
- JOIN
- GROUP BY / HAVING
- Set operators (UNION, INTERSECT, MINUS)
- Subqueries (SELECT … FROM table WHERE x = (SELECT …))
See the following links for more information about the SQL feature.
- Python client documentation
- Python client SQL API documentation
- Python client code samples
- IMDG Reference Manual SQL section
Other Enhancements and Bug Fixes
As usual, the 4.2 Python client release includes some enhancements and bug fixes, apart from the SQL implementation. Refer to the release notes for more information.
Closing Words
We are very excited about the SQL feature and think that it is a great addition to the capabilities of the IMDG and the Python client. We will continue working hard to improve it. In the meantime, don’t hesitate to try it and share your feedback with us.
We will be waiting for your feedback on the Slack channel, Google Group, or Github repository. If you want to contribute to the Python client, please take a look at our issue list or talk with us through the communication channels listed above.
Stay safe, be well, and see you soon!