A Look Back: Hazelcast’s Summer of 2022 Intern Program

Sophia Chang was a Hazelcast intern in the summer of 2022 and is a Data Science major at the University of California, Berkeley.

I was in my junior year of college, sending out internship applications to gain work experience and to see what I could do outside of the classroom. I chose Hazelcast because the company works in a field I found unique and gave me room to pursue ideas I had found interesting before my internship.

The project I decided to work on was a movie recommendation system. The user inputs a movie title, and a pipeline runner funnels the title through a Python script and then writes out the results. A pipeline runner can spread the processing of large amounts of data across all the cores of a CPU, which means lower latency and shorter running times. The IMDB dataset I trained the recommender on has a decently large number of films, and the recommender outputs several movies in response to a single liked film. This program has a request-response access pattern with a one-to-many relationship: in response to a request containing the title of one example movie, the recommendation system generates a response of several movies similar to the example.
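As a rough illustration of that one-to-many pattern, here is a minimal Python sketch. It is not the model from the project: the catalog, the genre sets, and the `recommend` function are hypothetical, and Jaccard similarity over genres stands in for whatever the trained recommender actually computes.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy catalog (title -> genres); not the IMDB data used in the project.
CATALOG: Dict[str, FrozenSet[str]] = {
    "Toy Story": frozenset({"animation", "family", "comedy"}),
    "Frozen": frozenset({"animation", "family", "musical"}),
    "The Lion King": frozenset({"animation", "family", "drama"}),
    "The Shining": frozenset({"horror", "drama"}),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of genres the two films have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(title: str, k: int = 2) -> List[str]:
    """One request (a single liked title) yields many results (k similar titles)."""
    liked = CATALOG[title]
    others = [t for t in CATALOG if t != title]
    # Most similar first.
    others.sort(key=lambda t: jaccard(liked, CATALOG[t]), reverse=True)
    return others[:k]


if __name__ == "__main__":
    for suggestion in recommend("Toy Story"):
        print(suggestion)
```

In a pipeline setting, each incoming title would be one request and the returned list would be its response.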

The rationale behind the project was that I wanted to implement a demonstration that was not only related to what I studied in class but also approachable to many people. Let’s face it, people love movies, and film suggestions are easy to grasp. This project also lent itself to visually appealing, interactive experiences where clicking on a single movie poster in a UI could return a webpage full of related movies, each displayed as its original cinematic release poster. I knew this was possible because the IMDB database includes cinematic poster artwork. To attract more people, the demo would need interactivity and a visually interesting interface. Furthermore, movie recommendation algorithms are of interest to entertainment enterprises, so this system can be used to advertise to them.

There was a lot to learn about the Hazelcast Platform in a short amount of time. Luckily, there were video tutorials and “lab exercises” that gave me a grasp of the technology. Most of my work was done with stream processing pipelines rather than with in-memory storage. Even so, Hazelcast’s low-latency, in-memory data store would make an ideal feature store for things like the IMDB data used by my ML model, and it is something I want to work with (and integrate into the project) in the future.

I had a little experience working in a company during high school, but what I did there had more of an IT bent than a computer science or data science bent, so I was still a bit unprepared for how working in the industry differs from working on projects in academia. In general, working on projects in a company (rather than in college) exposed me to how differently things can work out (or, in my case, not work at all) on Unix systems vs. Windows. A VirtualBox virtual machine running Ubuntu was a godsend for running and testing my project during the later stages of work on the pipeline app. This was necessary since running Python apps through the Hazelcast pipeline is, as of this writing, impossible on Windows. Other than that, it was the first time I worked on a single project that used multiple programming languages (Java and Python) and where I had an immense amount of freedom to define its scope and direction.

Tuning the model took a large amount of time, including the time I spent hunting for one pesky error that turned out to be much simpler to solve than expected. Sadly, that left much of my initial plan undone, like creating a visual interface. In its current state the program is mainly a command line interface that writes out a file, though with further work a website would be great. Initially I thought it was a good idea to use IMDB tags, not realizing that since anyone can make and add tags, they would be almost useless for recommendations. I realized this in the initial stages, when entering Toy Story as a liked film recommended The Shining. Even reducing the weight that IMDB tags held in the overall algorithm did nothing, so eventually I took them out completely. Now inputting Toy Story returns a completely age-appropriate list (mostly other Disney/Pixar animated films like Frozen, The Lion King, Monsters, Inc., etc.) with no unexpected picks.
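To make the idea of a feature's "weight" concrete, here is a small hypothetical sketch of how a noisy tag feature can drag an unrelated film up a ranking. The genres, tags, and the linear `score` function are all invented for illustration and are not the project's algorithm; in particular, this toy model reacts to small weight changes far more readily than the real one did.

```python
from typing import Dict, FrozenSet


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical features; the tags mimic noisy, user-contributed labels.
GENRES: Dict[str, FrozenSet[str]] = {
    "Toy Story": frozenset({"animation", "family"}),
    "Frozen": frozenset({"animation", "family"}),
    "The Shining": frozenset({"horror"}),
}
TAGS: Dict[str, FrozenSet[str]] = {
    "Toy Story": frozenset({"toys", "classic", "rivalry"}),
    "Frozen": frozenset({"snow", "sisters"}),
    "The Shining": frozenset({"classic", "rivalry", "hotel"}),
}


def score(liked: str, candidate: str, tag_weight: float) -> float:
    """Blend genre and tag similarity; tag_weight=0.0 drops tags entirely."""
    genre_sim = jaccard(GENRES[liked], GENRES[candidate])
    tag_sim = jaccard(TAGS[liked], TAGS[candidate])
    return (1.0 - tag_weight) * genre_sim + tag_weight * tag_sim


def best_match(liked: str, tag_weight: float) -> str:
    """Return the single highest-scoring candidate for the liked film."""
    candidates = [t for t in GENRES if t != liked]
    return max(candidates, key=lambda t: score(liked, t, tag_weight))


if __name__ == "__main__":
    print(best_match("Toy Story", tag_weight=0.8))  # tags dominate the ranking
    print(best_match("Toy Story", tag_weight=0.0))  # tags removed
```

Setting `tag_weight` to zero is the sketch's equivalent of removing the tag feature altogether.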

My mentor/manager, Lucas Beeler, was extraordinarily supportive in helping me plan the project and get used to the Hazelcast Platform. Daily standups about my progress were common, soon leading to Zoom collaborations and in-person meetings for further work on the project. I also asked in the company Slack server about how to integrate the Python scripts and how to troubleshoot a problem (which turned out to be a simple issue of case sensitivity in the string input), and received many helpful responses from Frantisek, who built the Python runner and was based in the Czech Republic. Coordinating times was a hassle due to the time difference, but it went well and was much appreciated.
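A case-sensitivity problem of this kind is commonly avoided by normalizing both the stored titles and the user's input before comparing them. The sketch below shows one way to do that in Python; the title list and the `normalize` and `find_title` helpers are hypothetical and are not taken from the project's code.

```python
from typing import Dict, Optional

# Hypothetical list of known titles, stored in their canonical spelling.
TITLES = ["Toy Story", "The Shining", "Frozen"]


def normalize(title: str) -> str:
    """Collapse runs of whitespace and fold case so lookups ignore both."""
    return " ".join(title.split()).casefold()


# Map each normalized form back to the canonical title.
INDEX: Dict[str, str] = {normalize(t): t for t in TITLES}


def find_title(user_input: str) -> Optional[str]:
    """Return the canonical title for the input, or None if it is unknown."""
    return INDEX.get(normalize(user_input))


if __name__ == "__main__":
    print(find_title("toy story"))
    print(find_title("  THE   shining "))
    print(find_title("Up"))
```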

My time working at Hazelcast exposed me to a great many new technical skills and to helpful people to work with and ask questions of. As an aspiring data scientist, I also came away with new ideas about workplace problems beyond the ‘things that happen in real life but we also have to structure it for a classroom format’ projects on campus. In the classroom, we try to model problems that can happen in real life, but many of the exercises are contrived. Working in real life is decidedly different. For example, most of my technical classes don’t talk about proper logging and debugging practices. The logging system that you select can be the difference between knowing why your project is failing and being left in the dark, which is part of why the issue with case sensitivity persisted for so long. The movie recommender application was a great way to synthesize my personal interest in data-related topics, what I have learned, and new knowledge from running data through pipelines. I am definitely glad to have worked here and would do so again; I would wholeheartedly recommend interning at Hazelcast to anyone.
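As a small illustration of the logging point, the sketch below uses Python's standard `logging` module to record every lookup and to warn when a title is not found. The logger name, the title set, and the `lookup` function are assumptions made for the example, not the project's actual code.

```python
import logging
from typing import Optional

# Timestamped, leveled output; INFO and above is shown.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("recommender")  # hypothetical logger name

# Hypothetical set of titles the recommender knows about.
KNOWN_TITLES = {"Toy Story", "The Shining"}


def lookup(title: str) -> Optional[str]:
    """Exact-match lookup that logs both the request and any miss."""
    log.info("lookup requested for %r", title)
    if title not in KNOWN_TITLES:
        # %r prints the raw string with quotes, so stray capitals or spaces stand out.
        log.warning("no exact match for %r; known titles: %s",
                    title, sorted(KNOWN_TITLES))
        return None
    return title


if __name__ == "__main__":
    lookup("Toy Story")
    lookup("toy story")  # logs a warning that makes the case mismatch visible
```

With a warning like this in the log, a lowercase input that silently returns nothing is much easier to spot.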