How Real-Time Stream Processing Makes Machine Learning More Meaningful

What do you get when you combine machine learning and real-time stream processing? The secret sauce of real-time machine learning … A machine learning powerhouse … The Holy Grail of machine learning … You get the idea.  

To illustrate the potential of machine learning combined with real-time stream processing, let’s look at fraud detection. This application hits close to home because most companies accept some form of digital payment and because fraud has such far-reaching impact. Whether digital payments are made online, on mobile, or in a physical retail space, the price of slow or inaccurate fraud detection is extremely high. Financial services companies, for example, absorb the penalty of false negatives and false positives through lost fee revenue, compromised customer relationships, and direct fraud losses. Merchants, meanwhile, may lose sales and incur higher processing fees and fraud losses.

Just how long do these institutions have to detect fraud? Milliseconds. That is a small window in which to gather relevant historical and recent patterns of behavior from the limited data in a transaction (a credit card number, an amount, a date) and to map that data to an existing customer and a merchant. With that context, a fraud detection algorithm calculates the probability of fraud, and the card processing provider decides whether that probability is high enough to stop the transaction.
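The decision step can be pictured as a simple score-and-threshold check. The sketch below is purely illustrative: the scoring rules, field names, and cutoff value are invented stand-ins for a trained model, not anything from the Hazelcast platform.

```python
# Hypothetical sketch of the approve/decline decision described above.
# The scoring logic and 0.7 threshold are illustrative stand-ins for a
# real trained model and a provider-chosen risk tolerance.

def fraud_probability(transaction: dict, customer: dict, merchant: dict) -> float:
    """Toy scoring function standing in for a trained ML model."""
    score = 0.0
    if transaction["amount"] > customer["typical_max_amount"]:
        score += 0.5  # unusually large purchase for this cardholder
    if merchant["risk_category"] == "high":
        score += 0.3  # merchant category with elevated fraud rates
    return min(score, 1.0)

DECLINE_THRESHOLD = 0.7  # illustrative cutoff

def decide(transaction: dict, customer: dict, merchant: dict) -> str:
    p = fraud_probability(transaction, customer, merchant)
    return "decline" if p >= DECLINE_THRESHOLD else "approve"
```

In production, the probability would come from a trained model and the threshold would be tuned against the relative cost of false positives versus false negatives.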

But what if the process could be optimized? If the amount of time it takes to gather relevant information and run the fraud detection model could be reduced, additional checks could be applied in the same small window of time available between swiping a card and determining whether a transaction should be approved. 

And this is where Hazelcast enters the equation. With your real-time machine learning models running on the Hazelcast Stream Processing Platform, you can deliver fraud detection at scale thanks to unified data store and stream processing engine capabilities. Compared to building a pipeline piece-by-piece, the unified real-time data platform enables a sophisticated real-time fraud detection data pipeline that meets the most demanding throughput, latency, and performance requirements.

For example, you can store relevant cardholder and merchant data in a highly distributed, low-latency data store, reducing time wasted fetching data from siloed sources, while the stream processing capability establishes the latest card usage patterns. And because this is a single platform rather than multiple systems from multiple vendors, your ML models can run next to the customer and merchant data needed to render a decision on the transaction. Thus, you’ve built a sophisticated, scalable, real-time fraud detection pipeline with all of these capabilities (and more) in a single runtime.
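The enrichment step above amounts to in-memory lookups that never leave the process. In this conceptual sketch, plain Python dicts stand in for the platform's distributed maps; the map names and record fields are illustrative, not Hazelcast APIs.

```python
# Conceptual sketch: enriching a transaction from co-located in-memory
# lookups. Plain dicts stand in for distributed, low-latency maps;
# all names and fields are illustrative.

customers = {"c42": {"home_country": "US", "avg_spend": 120.0}}
merchants = {"m7": {"category": "electronics", "chargeback_rate": 0.02}}

def enrich(txn: dict) -> dict:
    """Attach customer and merchant context without a network hop
    to a separate siloed system."""
    return {
        **txn,
        "customer": customers.get(txn["customer_id"], {}),
        "merchant": merchants.get(txn["merchant_id"], {}),
    }
```

The point of co-location is that each lookup is a local memory access rather than a round trip to another system, which is where the latency savings come from.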

With the additional time gained by optimizing fraud detection execution, the net result is more checks to catch fraud, resulting in better peace of mind for you as well as the merchant and the customer, both of whom are still in the middle of the transaction.

Traditionally, companies architecting state-of-the-art fraud detection pipelines must piece together several technologies. You need a prediction service that accepts fraud prediction requests (compute intensive), retrieves customer and merchant features from a low-latency data store (memory intensive), and calculates recent card use patterns via stream processing (compute intensive), all usually running in separate software and infrastructure stacks. You also need the fraud detection ML model itself, typically built in Python (compute intensive), running on yet another software and infrastructure stack.

Inevitably, this architecture leads to a complex implementation involving technologies and teams from multiple vendors. The solution’s ongoing operation must factor into long-term maintenance plans and costs. Moreover, piecing these technologies together is not trivial, regardless of whether you run the application on-premises or with a cloud provider; there will invariably be gaps and integration issues among the different products. And every extra hop between products and platforms adds latency, the opposite of what you want from real-time fraud detection.

Hazelcast integrates stream processing and a fast data store into a single runtime; not only does this simplify the overall architecture, it lowers TCO by taking advantage of all underlying infrastructure (cores and memory) available to the Hazelcast cluster.

With Hazelcast, a real-time fraud detection pipeline leveraging ML typically consists of 5 stages: 

  1. Ingest – pull in transactions from source
  2. Enrich – contextualize the incoming transaction (for example, with customer, merchant, and recent behavior patterns)
  3. Transform – convert features into model inputs
  4. Predict – run the machine learning model on Hazelcast (Python, ONNX, PyTorch, TensorFlow)
  5. Act – apply additional emerging fraud checks, run model explainability, store predictions (and explanations) for future analysis, check model performance, and alert on model performance degradation to signal model re-training needs to data scientists
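The five stages above can be sketched end to end in plain Python. Everything here is a toy stand-in: the stage functions, the single spend-ratio feature, and the threshold model are invented for illustration and are not the Hazelcast pipeline API.

```python
# Illustrative end-to-end sketch of the five pipeline stages.
# All data, feature engineering, and the "model" are toy stand-ins.

def ingest(source):                       # 1. Ingest: pull transactions from a source
    yield from source

def enrich(txn, customers):               # 2. Enrich: attach customer context
    txn["customer"] = customers.get(txn["customer_id"], {})
    return txn

def transform(txn):                       # 3. Transform: features -> model inputs
    avg = txn["customer"].get("avg_spend", 1.0)
    return [txn["amount"] / avg]          # single feature: spend ratio

def predict(features):                    # 4. Predict: toy threshold "model"
    return 1.0 if features[0] > 10 else 0.1

def act(txn, score, audit_log):           # 5. Act: record prediction, decide
    audit_log.append({"id": txn["id"], "score": score})
    return "decline" if score > 0.5 else "approve"

def run_pipeline(source, customers):
    audit_log, decisions = [], []
    for txn in ingest(source):
        txn = enrich(txn, customers)
        score = predict(transform(txn))
        decisions.append(act(txn, score, audit_log))
    return decisions, audit_log
```

In a real deployment each stage would map onto the platform’s streaming constructs, with the model served via Python, ONNX, PyTorch, or TensorFlow as noted above, and the audit log feeding later explainability and retraining analysis.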

The applications for the kind of real-time machine learning enabled by Hazelcast are almost endless. Think of all the ways in which healthcare, telecommunications, e-commerce, manufacturing, transportation … you name it … could benefit from the seamless integration of your machine learning models with historical and real-time data that Hazelcast powers.   

For a detailed demonstration of Hazelcast, check out our on-demand talk about fraud detection.

You can also join our upcoming July 27 webinar with Finextra on how managing real-time data can combat fraudsters’ sophisticated attack vectors in a time-effective manner. Register today!
