Enum serialization in Scala

Julien Truffaut

14th March 2023

Enumerations are one of the most convenient features of the Scala programming language as they allow precise modeling of business domains. However, we eventually need to serialize our Scala objects, for example, if we want to save them in a database or to send them over the wire. This is when things get complicated, because most standard serialization formats don’t support enumerations very well.

In this article, I will go through various techniques to implement JSON serializers for Scala enumerations. I will use Scala 3 and the circe library (version 0.14.4).

You can find all the code samples in this GitHub repository.

Case 1: Enumeration without data

The simplest scenario is when no branch of the enumeration contains data. Let's take the example of an online subscription with two possible values: Free and Premium.

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

To serialize a Subscription to JSON, we need to define an Encoder and a Decoder from the circe library. The Encoder defines how to transform a Subscription into JSON, while the Decoder performs the reverse transformation: JSON to Subscription.

object Subscription {
  given Encoder[Subscription] = ???
  given Decoder[Subscription] = ???
}

We will encode a Subscription with a JSON String so that the case Free maps to the String "FREE" and the case Premium maps to the String "PREMIUM". To do that, we can use the method Encoder.instance, which creates an Encoder from a function Subscription => Json.

import io.circe.syntax.*

given Encoder[Subscription] =
  Encoder.instance(subscription => subscription.id.asJson)

Note that I used the method asJson from the io.circe.syntax package to transform a Scala String into a JSON String.

Here is a more idiomatic implementation using the method contramap:

given Encoder[Subscription] = Encoder[String].contramap(_.id)

This implementation asks circe to convert the default Encoder[String] into an Encoder[Subscription] using the id field.

Next, let's implement the Decoder. It is a bit more complicated because we need to handle two failure scenarios:

  1. when the JSON element is not a JSON String.
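  2. when the JSON String is not one of the two correct values.

import io.circe.{Decoder, DecodingFailure}
import io.circe.DecodingFailure.Reason.CustomReason

given Decoder[Subscription] = Decoder.instance(cursor =>
  for {
    // Fails if the current JSON element is not a String.
    str <- cursor.as[String]
    // Fails if the String is not a known subscription id.
    subscription <- str match
      case "FREE"    => Right(Free)
      case "PREMIUM" => Right(Premium)
      case other     => Left(DecodingFailure(CustomReason(s"$other is not a valid Subscription"), cursor))
  } yield subscription
)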

Let's break it down:

  1. cursor is a circe object which permits moving inside a JSON object.
  2. cursor.as[String] parses the current JSON element into a String. If the JSON element is not a String, it returns a DecodingFailure wrapped in a Left (the result type is an Either).
  3. Finally, a pattern match checks if the String is either "FREE" or "PREMIUM".

It is a lot of code for a simple Decoder; as you might have guessed, there is a much simpler implementation. First, let’s implement a function to parse a String into a Subscription.

def parseId(id: String): Either[String, Subscription] =
  Subscription
    .values
    .find(_.id == id)
    .toRight(s"$id is not a valid Subscription")

Note the use of Subscription.values, a new feature from Scala 3 that lists all possible branches of the enumeration. If you are using Scala 2, you either need to enumerate all branches manually or use a library such as enumeratum.
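To make the behaviour of values more concrete, here is a minimal sketch that uses only the Scala 3 standard library. The method name listSubscriptions is mine and is not part of the article's repository. It also shows why parseId searches values by id rather than relying on the built-in valueOf, which looks cases up by their Scala name.

```scala
import scala.util.Try

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

@main def listSubscriptions(): Unit = {
  // `values` returns every case, in declaration order.
  println(Subscription.values.toList) // List(Free, Premium)

  // `valueOf` matches the case name ("Free"), not our custom id ("FREE").
  println(Subscription.valueOf("Free")) // Free

  // An unknown name throws, so we wrap the call in Try to observe the failure.
  println(Try(Subscription.valueOf("FREE")).isFailure) // true
}
```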

Then, the Decoder implementation is rather straightforward using the method emap, which stands for "error map" or "map with error":

given Decoder[Subscription] = Decoder[String].emap(parseId)

Let's put everything together:

import io.circe.{Encoder, Decoder}

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  def parseId(id: String): Either[String, Subscription] =
    values
      .find(_.id == id)
      .toRight(s"$id is not a valid Subscription")

  given Encoder[Subscription] = Encoder[String].contramap(_.id)
  given Decoder[Subscription] = Decoder[String].emap(parseId)
}
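A quick way to gain confidence in such a codec is a round-trip check: encoding a value and decoding the result should give back the original value. The sketch below assumes circe-core and circe-parser are on the classpath; the method name roundTripCheck is hypothetical.

```scala
import io.circe.{Decoder, Encoder}
import io.circe.parser.decode
import io.circe.syntax.*

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  def parseId(id: String): Either[String, Subscription] =
    values.find(_.id == id).toRight(s"$id is not a valid Subscription")

  given Encoder[Subscription] = Encoder[String].contramap(_.id)
  given Decoder[Subscription] = Decoder[String].emap(parseId)
}

@main def roundTripCheck(): Unit = {
  // Encoding then decoding should return the original value for every case.
  for (subscription <- Subscription.values) {
    val json = subscription.asJson.noSpaces
    assert(decode[Subscription](json) == Right(subscription))
  }

  // The two failure scenarios: an unknown id, and a JSON element that is not a String.
  assert(decode[Subscription]("\"GOLD\"").isLeft)
  assert(decode[Subscription]("42").isLeft)
}
```

Because the loop iterates over Subscription.values, the check automatically covers any case added to the enumeration later.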

Case 2: Enumeration with data

This is the most complex scenario, where each branch of the enumeration contains different data. For example, let’s imagine we work for an online newspaper with three kinds of users:

  1. The readers: our clients who read the newspaper. Each reader has a subscription which is either free or premium (see case 1).
  2. The editors: our employees who write articles for the newspaper. Each editor has a profile bio and a favorite font.
  3. The administrators: these are power users who can access and modify everything.

We can encode these three different kinds of user using an enumeration called Role:

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

The question is: how can we serialize a Role to JSON? We have a few different options.

Solution 1: Individual codecs

Reader, Editor and Admin are all case classes/objects. So, we could define a codec (encoder + decoder) for each class and then combine them together into a codec for Role.

import io.circe.{Encoder, Json}
import io.circe.syntax.*

given readerEncoder: Encoder[Reader] = Encoder.instance { reader =>
  Json.obj("subscription" -> reader.subscription.asJson)
}

given editorEncoder: Encoder[Editor] = Encoder.instance { editor =>
  Json.obj(
    "profileBio"   -> editor.profileBio.asJson,
    "favoriteFont" -> editor.favoriteFont.asJson,
  )
}

given adminEncoder: Encoder[Admin.type] = Encoder.instance { admin =>
  Json.obj()
}

given Encoder[Role] = Encoder.instance {
  case x: Reader => readerEncoder(x)
  case x: Editor => editorEncoder(x)
  case Admin     => adminEncoder(Admin)
}

Let’s have a look at some serialization examples:

Reader(Premium).asJson.spaces2
// res: String = {
//   "subscription" : "PREMIUM"
// }

Editor("John is the winner of ...", "Comic Sans").asJson.spaces2
// res: String = {
//   "profileBio" : "John is the winner of ...",
//   "favoriteFont" : "Comic Sans"
// }

Admin.asJson.spaces2
// res: String = { }

Note that spaces2 is a JSON formatter from the circe library and that Admin is serialized into an empty object since it doesn’t contain any data.

Let’s repeat the same process for the Decoder:

given readerDecoder: Decoder[Reader] = Decoder.instance(cursor =>
  for {
    subscription <- cursor.downField("subscription").as[Subscription]
  } yield Reader(subscription)
)

given editorDecoder: Decoder[Editor] = Decoder.instance(cursor =>
  for {
    profileBio   <- cursor.downField("profileBio").as[String]
    favoriteFont <- cursor.downField("favoriteFont").as[String]
  } yield Editor(profileBio, favoriteFont)
)

given adminDecoder: Decoder[Admin.type] = Decoder.instance(cursor =>
  for {
    obj <- cursor.as[JsonObject]
  } yield Admin
)

Note that I used the method downField to move the cursor inside the JSON object at a particular key.
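Here is a small sketch of how cursor navigation behaves on its own, assuming circe-core is on the classpath. The value names (present, missing, wrongType) are mine. Moving to a missing key does not throw; the failure only surfaces as a Left once we try to decode.

```scala
import io.circe.Json
import io.circe.syntax.*

val json = Json.obj("subscription" -> "FREE".asJson)
val cursor = json.hcursor

// The key exists and holds a String: decoding succeeds.
val present = cursor.downField("subscription").as[String] // Right("FREE")

// The key does not exist: decoding returns a Left with a DecodingFailure.
val missing = cursor.downField("profileBio").as[String]

// The key exists but holds the wrong type: also a Left.
val wrongType = cursor.downField("subscription").as[Int]
```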

Now that we have a Decoder for each branch of Role, we can combine them using the method or. This method describes a fallback logic: try to decode the user into a Reader, and if it doesn’t work, try to decode it into an Editor. If it still doesn’t work, try to decode it into an Admin.

import cats.implicits.*

given Decoder[Role] =
  readerDecoder.widen[Role]
    .or(editorDecoder.widen[Role])
    .or(adminDecoder.widen[Role])

Unfortunately, I had to use the method widen from cats to transform the Decoder[Reader], Decoder[Editor] and Decoder[Admin.type] into a Decoder[Role]. This wouldn’t be necessary if the trait Decoder were covariant in its type parameter (I am not sure why circe developers used invariant traits).

Let’s test the Decoder:

import io.circe.parser.decode

decode[Role]("""{"subscription":"FREE"}""")
// res: Either[Error, Role] = Right(Reader(Free))

decode[Role]("""{"profileBio":"foo","favoriteFont":"Comic Sans"}""")
// res: Either[Error, Role] = Right(Editor("foo","Comic Sans"))

decode[Role]("""{}""")
// res: Either[Error, Role] = Right(Admin)

It works fine for the happy paths, when the JSON matches one of the three shapes, but it produces surprising results when the JSON contains an invalid value:

decode[Role]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Role] = Right(Admin)

The issue is that the admin Decoder is too permissive. It considers all JSON objects as valid instead of accepting only empty JSON objects. Let’s fix that and retry:

given adminDecoder: Decoder[Admin.type] = Decoder.instance(cursor =>
  for {
    obj <- cursor.as[JsonObject]
    // Only an empty JSON object is a valid Admin.
    _   <- if (obj.isEmpty) Right(Admin)
           else Left(DecodingFailure(CustomReason("JSON is not a valid Admin"), cursor))
  } yield Admin
)

decode[Role]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Role] = Left("DecodingFailure at : JSON is not a valid Admin")

The Decoder correctly identifies that the JSON is invalid, but the error message is confusing. It says that the JSON is not a valid Admin, whereas we expected the error to mention that “GENESIS” is not a valid Subscription. The problem comes from the fallback logic of the Decoder[Role]. When the JSON is invalid, we only get the error message from the last Decoder.
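The loss of the earlier error messages can be illustrated with plain Either values from the standard library. This is only a simplified model of the fallback, not circe's actual implementation of or; the three values stand in for the result of each branch decoder on the invalid JSON, and their names are mine.

```scala
// Stand-ins for what each branch decoder returns on {"subscription":"GENESIS"}.
val readerResult: Either[String, String] = Left("GENESIS is not a valid Subscription")
val editorResult: Either[String, String] = Left("Missing required field")
val adminResult: Either[String, String]  = Left("JSON is not a valid Admin")

// `orElse` evaluates its argument only when the left-hand side failed,
// and it discards that earlier failure in the process.
val combined: Either[String, String] =
  readerResult.orElse(editorResult).orElse(adminResult)

@main def fallbackDemo(): Unit =
  println(combined) // Left(JSON is not a valid Admin)
```

The most informative message, the one from the reader branch, is the first to be thrown away.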

We can see the error message produced by each branch of the enumeration:

decode[Reader]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Reader] = Left("DecodingFailure at .subscription: GENESIS is not a valid Subscription")

decode[Editor]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Editor] = Left("DecodingFailure at .profileBio: Missing required field")

decode[Admin.type]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Admin.type] = Left("DecodingFailure at : JSON is not a valid Admin")

So if we could identify that the JSON is a Reader, we would be able to produce a useful error message. This leads to the second solution.

Solution 2: Individual codecs with discriminator

I will repeat the implementation of Solution 1, but I will add a field to describe the type of Role. This field is called a discriminator.

given readerEncoder: Encoder[Reader] = Encoder.instance { reader =>
  Json.obj(
    "type"         -> "READER".asJson,
    "subscription" -> reader.subscription.asJson
  )
}

given editorEncoder: Encoder[Editor] = Encoder.instance { editor =>
  Json.obj(
    "type"         -> "EDITOR".asJson,
    "profileBio"   -> editor.profileBio.asJson,
    "favoriteFont" -> editor.favoriteFont.asJson,
  )
}

given adminEncoder: Encoder[Admin.type] = Encoder.instance { admin =>
  Json.obj("type" -> "ADMIN".asJson)
}

given Encoder[Role] = Encoder.instance {
  case x: Reader => readerEncoder(x)
  case x: Editor => editorEncoder(x)
  case Admin     => adminEncoder(Admin)
}

Let’s have a look at an example:

Reader(Premium).asJson.spaces2
// res: String = {
//   "type" : "READER",
//   "subscription" : "PREMIUM"
// }

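Then, we can use the discriminator to improve the Decoder. The branch decoders from Solution 1 can be reused, except that the Admin decoder can no longer require an empty object, because every object now contains the type field.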

given Decoder[Role] = Decoder.instance(cursor =>
  for {
    discriminator <- cursor.downField("type").as[String]
    role <- discriminator match
      case "READER" => readerDecoder(cursor)
      case "EDITOR" => editorDecoder(cursor)
      case "ADMIN"  => adminDecoder(cursor)
      case other    => Left(DecodingFailure(CustomReason(s"invalid role $other"), cursor.downField("type")))
  } yield role
)

Let’s check the error message when the JSON is invalid:

decode[Role]("""{"type":"READER","subscription":"GENESIS"}""")
// res: Either[Error, Role] = Left("DecodingFailure at .subscription: GENESIS is not a valid Subscription")

decode[Role]("""{"type":"TESTER","subscription":"GENESIS"}""")
// res: Either[Error, Role] = Left("DecodingFailure at .type: invalid role TESTER")

Great, this is exactly what we wanted! Additionally, the JSON parsing is probably faster with a discriminator, because we don’t need to try all branches of the enumeration when the JSON is invalid (though I haven’t benchmarked the two solutions). The only issue remaining is that it requires lots of code to define a codec for a simple enumeration. Let’s see how we can improve that.

Solution 3: Codec derivation with discriminator

Most serialization libraries support the automatic implementation of codecs for case classes. What’s particularly nice with circe is that it also supports the derivation of codecs for enumerations, including the discriminator pattern! This is what it looks like:

import io.circe.Codec
import io.circe.derivation.Configuration

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

object Role {
  given Configuration = Configuration.default
    .withDiscriminator("type")
    .withTransformConstructorNames(_.toUpperCase)

  given Codec[Role] = Codec.AsObject.derivedConfigured
}

Note that Codec is a convenient trait that extends Encoder and Decoder. So we can either define one Codec or one Encoder and one Decoder.
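As an illustration of that choice, the Subscription codec from case 1 could be written as a single Codec instead of two separate givens. This is a sketch that assumes circe-core and circe-parser on the classpath and relies on Codec.from, which builds a Codec out of an existing Decoder and Encoder.

```scala
import io.circe.{Codec, Decoder, Encoder}
import io.circe.parser.decode
import io.circe.syntax.*

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  def parseId(id: String): Either[String, Subscription] =
    values.find(_.id == id).toRight(s"$id is not a valid Subscription")

  // One given provides both directions: JSON => Subscription and Subscription => JSON.
  given Codec[Subscription] = Codec.from(
    Decoder[String].emap(parseId),
    Encoder[String].contramap(_.id)
  )
}

@main def codecDemo(): Unit = {
  println(Subscription.Premium.asJson.noSpaces) // "PREMIUM" (a JSON String, quotes included)
  println(decode[Subscription]("\"FREE\""))     // Right(Free)
}
```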

The Configuration object is used to tweak the Codec derivation. Here, we specify that we want to use a discriminator field called “type”, and the value of the discriminator should be the name of the constructor (Reader, Editor, Admin) uppercased.
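The following sketch exercises the derived Codec with a round trip and inspects the discriminator. It assumes circe 0.14.4 or later with Scala 3 and circe-core on the classpath; the method name derivedCodecDemo is hypothetical. I deliberately avoid asserting the exact field order of the generated JSON, since I have not verified where the derivation places the discriminator.

```scala
import io.circe.{Codec, Decoder, Encoder}
import io.circe.derivation.Configuration
import io.circe.syntax.*

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  def parseId(id: String): Either[String, Subscription] =
    values.find(_.id == id).toRight(s"$id is not a valid Subscription")

  given Encoder[Subscription] = Encoder[String].contramap(_.id)
  given Decoder[Subscription] = Decoder[String].emap(parseId)
}

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

object Role {
  given Configuration = Configuration.default
    .withDiscriminator("type")
    .withTransformConstructorNames(_.toUpperCase)

  given Codec[Role] = Codec.AsObject.derivedConfigured
}

@main def derivedCodecDemo(): Unit = {
  val roles: List[Role] = List(
    Role.Reader(Subscription.Premium),
    Role.Editor("John is the winner of ...", "Comic Sans"),
    Role.Admin
  )

  // Every role should survive an encode/decode round trip.
  for (role <- roles) {
    assert(role.asJson.as[Role] == Right(role))
  }

  // The discriminator holds the uppercased constructor name.
  assert(Role.Admin.asJson.hcursor.get[String]("type") == Right("ADMIN"))
}
```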

Conclusion

In summary, we saw that we can serialize a Scala enumeration by defining a codec for each branch of the enumeration and then combining these codecs together with:

  1. a fallback logic, or
  2. a discriminator field.

The discriminator is the better approach, as it leads to clearer error messages and probably faster deserialization.

Additionally, this serialization strategy is not specific to JSON. We can use the same pattern when encoding an enumeration to CSV or even when saving it to a database. For example, we could use the following schema to save users to PostgreSQL:

CREATE TABLE users (
  id            TEXT NOT NULL,
  type          TEXT NOT NULL,
  subscription  TEXT,
  profile_bio   TEXT,
  favorite_font TEXT
);

INSERT INTO users (id, type, subscription, profile_bio, favorite_font)
VALUES
  ('0001', 'READER', 'FREE', null, null),
  ('0002', 'EDITOR', null, 'John is the winner of ...', 'Comic Sans'),
  ('0003', 'ADMIN' , null, null, null);
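On the Scala side, the mapping between a Role and one row of such a table could look like the sketch below. It uses only the standard library and leaves out the actual database access. UserRow, toRow and fromRow are hypothetical names of mine, and the nullable columns are modelled as Option values.

```scala
enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

// One row of the users table; `roleType` maps to the `type` column.
final case class UserRow(
  id: String,
  roleType: String,
  subscription: Option[String],
  profileBio: Option[String],
  favoriteFont: Option[String]
)

// Writing: the discriminator column records which branch the row holds.
def toRow(id: String, role: Role): UserRow = role match {
  case Role.Reader(subscription) => UserRow(id, "READER", Some(subscription.id), None, None)
  case Role.Editor(bio, font)    => UserRow(id, "EDITOR", None, Some(bio), Some(font))
  case Role.Admin                => UserRow(id, "ADMIN", None, None, None)
}

// Reading: the discriminator picks the branch, so errors mention the right columns.
def fromRow(row: UserRow): Either[String, Role] = row.roleType match {
  case "READER" =>
    row.subscription
      .toRight("READER row is missing subscription")
      .flatMap(id => Subscription.values.find(_.id == id).toRight(s"$id is not a valid Subscription"))
      .map(Role.Reader(_))
  case "EDITOR" =>
    for {
      bio  <- row.profileBio.toRight("EDITOR row is missing profile_bio")
      font <- row.favoriteFont.toRight("EDITOR row is missing favorite_font")
    } yield Role.Editor(bio, font)
  case "ADMIN" => Right(Role.Admin)
  case other   => Left(s"invalid role $other")
}
```

As with the JSON decoder, an invalid subscription stored in a READER row is reported as an invalid Subscription rather than as a generic failure.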

What do you think of this approach? Do you have a better one to share with us? Please comment on reddit.
