Hazelcast Python Client 4.2 is Released!

We have released the Python client 4.2, featuring the brand new SQL support! You can start using it to query your data efficiently with the widely-known standard SQL syntax.

You can install the latest version through PyPI by issuing the following command.

pip install -U hazelcast-python-client

SQL Support

For a long time, Hazelcast IMDG and the Hazelcast Python client have supported querying the data you store in distributed maps using the Predicates with SQL-like syntax. Yet, the existing query engine had its limitations and there were strong voices from the community for additional features and support for standard SQL syntax.

Listening to these, we introduced a preview version in IMDG 4.1 of the brand-new, state-of-the-art SQL engine. Although there is a long way to go, we have improved it a lot in IMDG 4.2 and started working on the client languages to bring that feature to widespread use.

The new SQL engine:

  • supports many new expressions.
  • supports sorting and limiting.
  • is able to query large amounts of data without putting too much strain on the client and server by fetching the results page by page.
  • has an option to make use of the new, high-performance, concurrent off-heap B+ tree+
  • has more advanced query optimizations.

The 4.2 version of the Python client brings these new functionalities to the Python world!

Note: The SQL feature for the Python client and IMDG is in beta state and we only guarantee compatibility with the same minor versions of the client and server.

Usage

For the rest of the post, we will assume a map called employees is filled with values of the following Portable class.

class Employee(Portable):
    FACTORY_ID = 1
    CLASS_ID = 1

    def __init__(self, employee_id=None, first_name=None, last_name=None, age=None, salary=None):
        self.employee_id = employee_id
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.salary = salary

    def write_portable(self, writer):
        writer.write_int("employee_id", self.employee_id)
        writer.write_string("first_name", self.first_name)
        writer.write_string("last_name", self.last_name)
        writer.write_int("age", self.age)
        writer.write_int("salary", self.salary)

    def read_portable(self, reader):
        self.employee_id = reader.read_int("employee_id")
        self.first_name = reader.read_string("first_name")
        self.last_name = reader.read_string("last_name")
        self.age = reader.read_int("age")
        self.salary = reader.read_int("salary")

    def get_factory_id(self):
        return Employee.FACTORY_ID

    def get_class_id(self):
        return Employee.CLASS_ID


client = hazelcast.HazelcastClient(portable_factories={
    Employee.FACTORY_ID: {
        Employee.CLASS_ID: Employee,
    },
})

employees = client.get_map("employees").blocking()

# Fill the employees map

employees.set(1, Employee(1, "John", "Doe", 42, 60000))
...

In Python client 4.2, there is a new service called SQL, that is capable of executing queries on distributed maps.

You can simply use the execute() method to run queries, by just passing the query string and parameters, if any. Note that, the parameters are specified with the question marks in query strings.

result = client.sql.execute("SELECT employee_id, first_name FROM employees WHERE age < ?", 50)

Or, you can construct a SqlStatement to have a greater amount of control over the query execution by configuring cursor buffer size (page size), timeout, expected result type, and parameters. A SqlStatement can be passed into the execute_statement() method.

statement = SqlStatement("SELECT * FROM employees")
statement.cursor_buffer_size = 200

result = client.sql.execute_statement(statement)

Both of these methods return an SqlResult, which implements the iterator interface for ease of use. You can iterate over the rows of the result and get the column values of rows either by the column name or index.

result = client.sql.execute("SELECT employee_id, first_name FROM employees WHERE age < ?", 50)

for row in result:
    # Get the column value with the column name
    employee_id = row.get_object("employee_id")

    # Find the column index and get the column value with that index.
    # You can also know the column indexes beforehand, as they have the
    # same order as the order given in the query string.
    row_metadata = row.metadata
    first_name_index = row_metadata.find_column("first_name")
    first_name = row.get_object_with_index(first_name_index)

    print(employee_id, first_name)

SqlResult also has the following methods:

  • .is_row_set() that returns whether or not the result has rows to iterate.
  • .update_count() that returns the number of rows affected by the query or -1, if the result is a row-set.
  • .get_row_metadata() that returns the metadata about the rows where you can get the column metadata, count, and indexes.
  • .close() that closes the query execution and makes sure that the resources associated with the query on the server-side are released.
  • iterator() to iterate over the result in a non-blocking manner. Most of the time, the iteration method shown above should be used, instead of this. See this code sample to understand how to use it.

All of these methods return Future objects, and you must call .result() on them to wait for their results or add callbacks that will run when their result is set.

Sorting

You can use the standard SQL clauses ORDER BY, LIMIT, and OFFSET to sort and limit the result set.

Note that, you must have SORTED indexes on the fields that you ORDER BY. You can add indexes dynamically with the client as below, ideally before putting data into the map.

employees.add_index(["salary"], "SORTED")

The following code snippet gets the top five employees with the highest salaries. The code snippet also shows the usage of the with statement (context manager) to automatically call .close() on the result to make sure that the server-side resources are cleaned properly.

with client.sql.execute("""
SELECT
    employee_id, first_name, last_name, salary
FROM
    employees
ORDER BY salary DESC
LIMIT 5
""") as result:
    for row in result:
        print(row)

Unsupported Features

As of IMDG 4.2, we only support SELECT clauses but plan to add support for the following features in upcoming releases:

  • INSERT / UPDATE / DELETE
  • JOIN
  • GROUP BY / HAVING
  • Set operators (UNION, INTERSECT, MINUS)
  • Subqueries (SELECT …​ FROM table WHERE x = (SELECT …))

See the following links for more information about the SQL feature.

Other Enhancements and Bug Fixes

As usual, the 4.2 Python client release includes some enhancements and bug fixes, apart from the SQL implementation. Refer to the release notes for more information.

Closing Words

We are very excited about the SQL feature and think that it is a great addition to the capabilities of the IMDG and the Python client. We will continue working hard to improve it. In the meantime, don’t hesitate to try it and share your feedback with us.

We will be waiting for your feedback on the Slack channel, Google Group, or Github repository. If you want to contribute to the Python client, please take a look at our issue list or talk with us through the communication channels listed above.

Stay safe, be well, and see you soon!

Keep Reading