What Is Serialization?

Serialization is the process of converting a data object—a combination of code and data represented within a region of data storage—into a series of bytes that saves the state of the object in an easily transmittable form. In this serialized form, the data can be delivered to another data store (such as an in-memory computing platform), application, or some other destination.

Figure: Serialization diagram. Data serialization is the process of converting an object into a stream of bytes to more easily save or transmit it.

The reverse process—constructing a data structure or object from a series of bytes—is deserialization. The deserialization process recreates the object, thus making the data easier to read and modify as a native structure in a programming language.

Figure: Serialization-deserialization diagram. Serialization and deserialization work together to transform data objects to, and recreate them from, a portable format.
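As a minimal sketch of this round trip, the Python example below serializes an object to bytes and then recreates it. The `Customer` class, its fields, and the choice of JSON as the byte format are assumptions made for illustration only; they do not come from any particular product.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    # Hypothetical example type, used only for illustration.
    customer_id: int
    name: str
    active: bool


def serialize(customer: Customer) -> bytes:
    """Convert the object's state into a portable series of bytes."""
    return json.dumps(asdict(customer)).encode("utf-8")


def deserialize(payload: bytes) -> Customer:
    """Recreate a native object from a series of bytes."""
    return Customer(**json.loads(payload.decode("utf-8")))


original = Customer(customer_id=42, name="Ada", active=True)
payload = serialize(original)    # bytes that can be stored or transmitted
restored = deserialize(payload)  # a new object with the same state
```

Here `restored` is a separate object that compares equal to `original`, which is the sense in which deserialization "recreates" the object in a new location.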

Serialization enables us to save the state of an object and recreate the object in a new location. It covers both storing the object and exchanging its data. Because objects are composed of several components, saving or delivering all the parts by hand typically requires significant coding effort. Serialization provides a standard way to capture the object in a shareable format.

With serialization, we can transfer objects:

  • Over the wire for messaging use cases
  • From application to application via web services such as REST APIs
  • Through firewalls (as JSON or XML strings)
  • Across domains
  • To other data stores

Serialization also helps us identify changes in data over time and honor security and user-specific details across applications.

Why Is Data Serialization Important for Distributed Systems?

In some distributed systems, data and its replicas are stored in different partitions on multiple cluster members. If data is not present on the local member, the system will retrieve that data from another member. This requires serialization for use cases such as:

  • Adding key/value objects to a map
  • Putting items into a queue, set, or list
  • Sending a lambda function to another server
  • Processing an entry within a map
  • Locking an object
  • Sending a message to a topic
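To illustrate the first use case, adding key/value objects to a map, here is a toy sketch of a partitioned map in which every key and value is serialized before it reaches a cluster member. The `Member` and `PartitionedMap` classes, the CRC32-based partitioning, and the JSON byte format are all assumptions for illustration; real distributed systems use their own partitioning schemes and wire formats.

```python
import json
import zlib


class Member:
    """Hypothetical cluster member: it only ever sees serialized bytes."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # maps key bytes -> value bytes


def to_bytes(obj):
    # sort_keys keeps the byte form stable, so equal keys give equal bytes.
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def from_bytes(payload):
    return json.loads(payload.decode("utf-8"))


class PartitionedMap:
    """Toy map whose entries are spread across members by key hash."""

    def __init__(self, members):
        self.members = members

    def _owner(self, key_bytes):
        # CRC32 of the serialized key picks the owning member. This is an
        # assumption, not how any particular product assigns partitions.
        return self.members[zlib.crc32(key_bytes) % len(self.members)]

    def put(self, key, value):
        key_bytes = to_bytes(key)
        self._owner(key_bytes).store[key_bytes] = to_bytes(value)

    def get(self, key):
        key_bytes = to_bytes(key)
        value_bytes = self._owner(key_bytes).store.get(key_bytes)
        return None if value_bytes is None else from_bytes(value_bytes)


cluster = PartitionedMap([Member("member-1"), Member("member-2")])
cluster.put("order:1001", {"item": "book", "quantity": 2})
order = cluster.get("order:1001")
```

In a real cluster the bytes would cross the network between `put` and the owning member; the sketch keeps everything in one process so that the serialization boundary is easy to see.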

What Are Common Languages for Data Serialization?

A number of popular programming languages and platforms either provide native support for serialization or have libraries that add serialization capabilities. Java, .NET, C++, Node.js, Python, and Go, for example, all either have native serialization support or integrate with serializer libraries.

Data formats such as JSON and XML are often used as the format for storing serialized data. Custom binary formats are also used, which tend to be more space-efficient because they carry less markup and tagging in the serialized output.
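The space difference can be seen in a small sketch that serializes the same record once as JSON and once with a fixed binary layout. The `reading` record and the layout (an unsigned 32-bit integer, a 64-bit float, and a boolean, little-endian with no padding) are assumptions chosen for illustration.

```python
import json
import struct

# Hypothetical record, used only for illustration.
reading = {"sensor_id": 7, "temperature": 21.5, "ok": True}

# Text format: field names and punctuation travel with every record.
json_bytes = json.dumps(reading).encode("utf-8")

# Custom binary format: both sides must agree on the layout in advance,
# so no field names are stored. "<Id?" packs into 4 + 8 + 1 = 13 bytes.
LAYOUT = struct.Struct("<Id?")
binary_bytes = LAYOUT.pack(
    reading["sensor_id"], reading["temperature"], reading["ok"]
)

# Deserialization reverses the packing using the same agreed layout.
sensor_id, temperature, ok = LAYOUT.unpack(binary_bytes)

print(len(json_bytes), len(binary_bytes))
```

The trade-off is that the binary form is not self-describing: a reader without the layout cannot interpret the bytes, whereas the JSON form can be inspected directly.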

What Is Data Serialization in Big Data?

Big data systems often include technologies or data that are described as “schemaless.” This means that the data managed in these systems is not structured in a strict format defined by a schema. Serialization provides several benefits in this type of environment:

  • Structure. By applying a schema or other criteria for a data structure through serialization on read, we can avoid reading data that is missing mandatory fields, is incorrectly classified, or fails some other quality control requirement.
  • Portability. Big data comes from a variety of systems and may be written in a variety of languages. Serialization can provide the necessary uniformity to transfer such data to other enterprise systems or applications.
  • Versioning. Big data is constantly changing. Serialization allows us to apply version numbers to objects for lifecycle management.
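The structure and versioning points above can be sketched together. In the example below, each serialized record carries a version number, and the reader upgrades old versions and rejects records that fail a simple schema check. The field names, the version numbers, and the rule that version 1 used `uid` instead of `user_id` are all hypothetical and chosen only to illustrate the idea.

```python
import json

# Hypothetical schema: mandatory fields and their expected types.
REQUIRED_FIELDS = {"user_id": int, "event": str}
CURRENT_VERSION = 2


def serialize_event(record: dict) -> bytes:
    """Wrap the record in an envelope that carries a version number."""
    envelope = {"version": CURRENT_VERSION, "data": record}
    return json.dumps(envelope).encode("utf-8")


def upgrade(version: int, data: dict) -> dict:
    """Bring older records up to the current version."""
    if version == 1:
        # Hypothetical change: version 1 named the field "uid".
        data = dict(data)
        if "uid" in data:
            data["user_id"] = data.pop("uid")
        version = 2
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported version: {version}")
    return data


def deserialize_event(payload: bytes) -> dict:
    """Apply the schema on read and reject records that do not conform."""
    envelope = json.loads(payload.decode("utf-8"))
    data = upgrade(envelope["version"], envelope["data"])
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing mandatory field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data


event = deserialize_event(serialize_event({"user_id": 7, "event": "login"}))
```

A version 1 payload such as `{"version": 1, "data": {"uid": 7, "event": "login"}}` would be upgraded on read, while a record without `event` would raise a `ValueError` instead of being passed downstream.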
