Julien Truffaut
14th March 2023
Enumerations are one of the most convenient features of the Scala programming language as they allow precise modeling of business domains. However, we eventually need to serialize our Scala objects, for example, if we want to save them in a database or to send them over the wire. This is when things get complicated, because most standard serialization formats don’t support enumerations very well.
In this article, I will go through various techniques to implement JSON serializers for Scala enumerations. I will use Scala 3 and the circe library (version 0.14.4).
You can find all the code samples in this GitHub repository.
The simplest scenario is when no branch of the enumeration contains data. Let's take the example of an online subscription with two possible values: Free and Premium.
```scala
enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}
```
To serialize a Subscription to JSON, we need to define an Encoder and a Decoder from the circe library. The Encoder defines how to transform a Subscription into JSON, while the Decoder performs the reverse transformation: JSON to Subscription.
```scala
import io.circe.{Decoder, Encoder}

object Subscription {
  given Encoder[Subscription] = ???
  given Decoder[Subscription] = ???
}
```
We will encode a Subscription with a JSON String so that the case Free maps to the String "FREE" and the case Premium maps to the String "PREMIUM". To do that, we can use the method Encoder.instance, which creates an Encoder from a function Subscription => Json.
```scala
import io.circe.Encoder
import io.circe.syntax.*

given Encoder[Subscription] =
  Encoder.instance(subscription => subscription.id.asJson)
```
Note that I used the method asJson from the io.circe.syntax package to transform a Scala String into a JSON String.
Here is a more idiomatic implementation using the method contramap:
```scala
given Encoder[Subscription] =
  Encoder[String].contramap(_.id)
```
This implementation asks circe to convert the default Encoder[String] into an Encoder[Subscription] by first extracting the id field.
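To check that the two encoders above are interchangeable, here is a minimal sketch assuming circe-core 0.14.4 on the classpath. The names withInstance and withContramap are illustrative only (real code would keep a single given), and the explicit type arguments are there just to keep type inference obvious.

```scala
import io.circe.{Encoder, Json}
import io.circe.syntax.*

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

// The two encoders from the text, named only so they can be compared side by side
val withInstance: Encoder[Subscription] =
  Encoder.instance[Subscription](subscription => subscription.id.asJson)

val withContramap: Encoder[Subscription] =
  Encoder[String].contramap[Subscription](_.id)

@main def compareEncoders(): Unit =
  for subscription <- Subscription.values do
    // every case becomes a plain JSON string holding its id
    assert(withInstance(subscription) == Json.fromString(subscription.id))
    assert(withContramap(subscription) == withInstance(subscription))
  println(withContramap(Subscription.Premium).noSpaces) // prints "PREMIUM"
```

contramap reuses Encoder[String] and only adds the Subscription => String step, which is why it is usually preferred over building the Json by hand.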
Next, let's implement the Decoder. It is a bit more complicated because we need to handle two failure scenarios: the JSON element may not be a String, and the String may not match any Subscription.
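```scala
import io.circe.{Decoder, DecodingFailure}
import io.circe.DecodingFailure.Reason.CustomReason

// defined inside object Subscription, so Free and Premium are in scope
given Decoder[Subscription] =
  Decoder.instance(cursor =>
    for {
      str <- cursor.as[String]
      subscription <- str match
        case "FREE"    => Right(Free)
        case "PREMIUM" => Right(Premium)
        case other     => Left(DecodingFailure(CustomReason(s"$other is not a valid Subscription"), cursor))
    } yield subscription
  )
```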
Let's break it down:
- cursor is a circe object (an HCursor) which lets us move inside a JSON document.
- cursor.as[String] parses the current JSON element into a String. If the JSON element is not a String, it returns a Left containing a DecodingFailure.
- The pattern match converts the String into a Subscription, and returns a DecodingFailure with a custom reason for any value other than "FREE" or "PREMIUM".

It is a lot of code for a simple Decoder; as you might have guessed, there is a much simpler implementation. First, let's implement a function to parse a String into a Subscription.
```scala
def parseId(id: String): Either[String, Subscription] =
  Subscription
    .values
    .find(_.id == id)
    .toRight(s"$id is not a valid Subscription")
```
Note the use of Subscription.values, a new feature from Scala 3 that lists all possible branches of the enumeration. If you are using Scala 2, you either need to enumerate all branches manually or use a library such as enumeratum.
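Here is a small, dependency-free sketch of the helpers Scala 3 generates for this enum, next to the id-based parseId from the text. It also shows why we don't simply use valueOf: it looks a case up by its Scala name (Premium), not by our custom id (PREMIUM), and it throws an IllegalArgumentException for unknown names instead of returning an Either.

```scala
enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

// Same lookup as in the text: search the generated values by the custom id
def parseId(id: String): Either[String, Subscription] =
  Subscription.values.find(_.id == id).toRight(s"$id is not a valid Subscription")

@main def enumHelpers(): Unit =
  println(Subscription.values.toList)   // List(Free, Premium), in declaration order
  println(Subscription.Premium.ordinal) // 1, the zero-based declaration index
  // valueOf matches the Scala case name, not the id
  println(Subscription.valueOf("Premium").id) // PREMIUM
  println(parseId("PREMIUM")) // Right(Premium)
  println(parseId("GOLD"))    // Left(GOLD is not a valid Subscription)
```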
Then, the Decoder implementation is rather straightforward using the method emap, which stands for "error map" or "map with error":
```scala
given Decoder[Subscription] =
  Decoder[String].emap(parseId)
```
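To make the link with the verbose Decoder more concrete, here is a sketch of roughly what emap does, written with Decoder.instance. emapSketch is a hypothetical helper for illustration, not circe's source code, and the example assumes circe-core 0.14.4. It covers the same two failure scenarios as before: Decoder[String] rejects JSON that is not a string, and parseId rejects unknown ids.

```scala
import io.circe.{Decoder, DecodingFailure, Json}

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

def parseId(id: String): Either[String, Subscription] =
  Subscription.values.find(_.id == id).toRight(s"$id is not a valid Subscription")

// Hypothetical helper approximating emap: decode with `decoder`, then apply `f`,
// turning its String error into a DecodingFailure that records the cursor history
def emapSketch[A, B](decoder: Decoder[A])(f: A => Either[String, B]): Decoder[B] =
  Decoder.instance[B] { cursor =>
    decoder(cursor).flatMap(a => f(a).left.map(message => DecodingFailure(message, cursor.history)))
  }

val viaEmap: Decoder[Subscription] = Decoder[String].emap(parseId)
val viaSketch: Decoder[Subscription] = emapSketch(Decoder[String])(parseId)

@main def compareDecoders(): Unit =
  val inputs = List(
    Json.fromString("PREMIUM"), // a valid id
    Json.fromString("GOLD"),    // a String, but not a valid Subscription
    Json.fromInt(42)            // not a String at all
  )
  for json <- inputs do
    // compare the error messages, which is what a caller of the Decoder sees
    val expected = viaEmap.decodeJson(json).left.map(_.message)
    val actual = viaSketch.decodeJson(json).left.map(_.message)
    assert(actual == expected)
    println(actual)
```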
Let's put everything together:
```scala
import io.circe.{Encoder, Decoder}

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  def parseId(id: String): Either[String, Subscription] =
    values
      .find(_.id == id)
      .toRight(s"$id is not a valid Subscription")

  given Encoder[Subscription] =
    Encoder[String].contramap(_.id)

  given Decoder[Subscription] =
    Decoder[String].emap(parseId)
}
```
The most complex scenario is when each branch of the enumeration contains different data. For example, let's imagine we work for an online newspaper with three kinds of users:
- The readers: our clients who read the newspaper. Each reader has a subscription, which is either free or premium (see case 1).
- The editors: our employees who write articles for the newspaper. Each editor has a profile bio and a favorite font.
- The administrators: power users who can access and modify everything.
We can encode these three kinds of users using an enumeration called Role:
```scala
enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}
```
The question is: how can we serialize a Role to JSON? We have a few different options.
Reader, Editor and Admin are all case classes/objects. So, we could define a codec (encoder + decoder) for each of them and then combine these codecs into a codec for Role.
```scala
import io.circe.{Encoder, Json}
import io.circe.syntax.*

given readerEncoder: Encoder[Reader] =
  Encoder.instance { reader =>
    Json.obj("subscription" -> reader.subscription.asJson)
  }

given editorEncoder: Encoder[Editor] =
  Encoder.instance { editor =>
    Json.obj(
      "profileBio" -> editor.profileBio.asJson,
      "favoriteFont" -> editor.favoriteFont.asJson,
    )
  }

given adminEncoder: Encoder[Admin.type] =
  Encoder.instance { admin => Json.obj() }

given Encoder[Role] =
  Encoder.instance {
    case x: Reader => readerEncoder(x)
    case x: Editor => editorEncoder(x)
    case Admin => adminEncoder(Admin)
  }
```
Let’s have a look at some serialization examples:
```scala
Reader(Premium).asJson.spaces2
// res: String = {
//   "subscription" : "PREMIUM"
// }

Editor("John is the winner of ...", "Comic Sans").asJson.spaces2
// res: String = {
//   "profileBio" : "John is the winner of ...",
//   "favoriteFont" : "Comic Sans"
// }

Admin.asJson.spaces2
// res: String = { }
```
Note that spaces2 is a circe method that prints JSON with a two-space indentation, and that Admin is serialized into an empty object since it doesn't contain any data.
Let’s repeat the same process for the Decoder:
```scala
import io.circe.{Decoder, JsonObject}

given readerDecoder: Decoder[Reader] =
  Decoder.instance(cursor =>
    for {
      subscription <- cursor.downField("subscription").as[Subscription]
    } yield Reader(subscription)
  )

given editorDecoder: Decoder[Editor] =
  Decoder.instance(cursor =>
    for {
      profileBio <- cursor.downField("profileBio").as[String]
      favoriteFont <- cursor.downField("favoriteFont").as[String]
    } yield Editor(profileBio, favoriteFont)
  )

given adminDecoder: Decoder[Admin.type] =
  Decoder.instance(cursor =>
    for {
      obj <- cursor.as[JsonObject]
    } yield Admin
  )
```
Note that I used the method downField to move the cursor to a particular key inside the JSON object.
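A short sketch of cursor navigation on a hand-built editor object, assuming circe-core 0.14.4. Besides downField, it uses get[A](key), a circe shortcut for downField(key).as[A]; the object mirrors what editorEncoder produces above.

```scala
import io.circe.Json

// A hand-built editor JSON object, shaped like the output of editorEncoder
val editorJson: Json = Json.obj(
  "profileBio" -> Json.fromString("foo"),
  "favoriteFont" -> Json.fromString("Comic Sans")
)

val cursor = editorJson.hcursor

@main def cursorNavigation(): Unit =
  // downField moves to a key; as[A] then decodes the value found there
  println(cursor.downField("profileBio").as[String]) // Right(foo)
  // get[A](key) does the same two steps in one call
  println(cursor.get[String]("favoriteFont")) // Right(Comic Sans)
  // a missing key does not throw: it yields a Left(DecodingFailure)
  println(cursor.downField("subscription").as[String].isLeft) // true
```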
Now that we have a Decoder for each branch of Role, we can combine them using the method or. This method describes a fallback logic: try to decode the user into a Reader, and if it doesn’t work, try to decode it into an Editor. If it still doesn’t work, try to decode it into an Admin.
```scala
import cats.implicits.*

given Decoder[Role] =
  readerDecoder.widen[Role]
    .or(editorDecoder.widen[Role])
    .or(adminDecoder.widen[Role])
```
Unfortunately, I had to use the method widen from cats to transform the Decoder[Reader], Decoder[Editor] and Decoder[Admin.type] into a Decoder[Role]. This wouldn’t be necessary if the trait Decoder were covariant (I am not sure why the circe developers made it invariant).
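If you would rather not depend on cats just for widen, a Decoder can also be upcast with map. The sketch below is an alternative, not the code from this article: it assumes circe-core and circe-parser 0.14.4 without cats, reassembles the decoders above (using the strict Admin decoder introduced later in this section) and replaces widen with map(r => r). It uses new Reader(...) because, in Scala 3, a plain enum constructor call such as Reader(...) is typed as Role unless a more specific type is expected.

```scala
import io.circe.{Decoder, DecodingFailure, JsonObject}
import io.circe.DecodingFailure.Reason.CustomReason
import io.circe.parser.decode

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  given Decoder[Subscription] =
    Decoder[String].emap(id => values.find(_.id == id).toRight(s"$id is not a valid Subscription"))
}

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

object Role {
  // `new` keeps the precise case type instead of the widened Role
  val readerDecoder: Decoder[Reader] =
    Decoder.instance[Reader] { c =>
      c.downField("subscription").as[Subscription].map(s => new Reader(s))
    }

  val editorDecoder: Decoder[Editor] =
    Decoder.instance[Editor] { c =>
      for {
        bio  <- c.downField("profileBio").as[String]
        font <- c.downField("favoriteFont").as[String]
      } yield new Editor(bio, font)
    }

  // Strict version: only an empty JSON object is accepted as an Admin
  val adminDecoder: Decoder[Admin.type] =
    Decoder.instance[Admin.type] { c =>
      c.as[JsonObject] match {
        case Right(obj) if obj.isEmpty => Right[DecodingFailure, Admin.type](Admin)
        case Right(_) => Left(DecodingFailure(CustomReason("JSON is not a valid Admin"), c))
        case Left(failure) => Left(failure)
      }
    }

  // map(r => r) upcasts each branch decoder, doing the job of cats' widen
  given Decoder[Role] =
    readerDecoder.map[Role](r => r)
      .or(editorDecoder.map[Role](e => e))
      .or(adminDecoder.map[Role](a => a))
}
```

The last assertion in the checks below shows the downside discussed next: with or, only the error of the last decoder survives.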
Let’s test the Decoder:
```scala
import io.circe.parser.decode

decode[Role]("""{"subscription":"FREE"}""")
// res: Either[Error, Role] = Right(Reader(Free))

decode[Role]("""{"profileBio":"foo","favoriteFont":"Comic Sans"}""")
// res: Either[Error, Role] = Right(Editor("foo","Comic Sans"))

decode[Role]("""{}""")
// res: Either[Error, Role] = Right(Admin)
```
It works fine for the happy paths, when the JSON is well formed, but it produces surprising results when the JSON is invalid:
```scala
decode[Role]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Role] = Right(Admin)
```
The issue is that the admin Decoder is too permissive: it considers all JSON objects valid instead of accepting only empty JSON objects. Let’s fix that and retry:
```scala
import io.circe.DecodingFailure
import io.circe.DecodingFailure.Reason.CustomReason

given adminDecoder: Decoder[Admin.type] =
  Decoder.instance(cursor =>
    for {
      obj <- cursor.as[JsonObject]
      _ <- if (obj.isEmpty) Right(Admin)
           else Left(DecodingFailure(CustomReason("JSON is not a valid Admin"), cursor))
    } yield Admin
  )

decode[Role]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Role] = Left("DecodingFailure at : JSON is not a valid Admin")
```
The Decoder correctly identifies that the JSON is invalid, but the error message is confusing. It says that the JSON is not a valid Admin, whereas we expected the error to mention that “GENESIS” is not a valid Subscription. The problem comes from the fallback logic of the Decoder[Role]: when the JSON is invalid, we only get the error message from the last Decoder in the chain.
We can see the error message produced by each branch of the enumeration:
```scala
decode[Reader]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Reader] = Left("DecodingFailure at .subscription: GENESIS is not a valid Subscription")

decode[Editor]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Editor] = Left("DecodingFailure at .profileBio: Missing required field")

decode[Admin.type]("""{"subscription":"GENESIS"}""")
// res: Either[Error, Admin.type] = Left("DecodingFailure at : JSON is not a valid Admin")
```
So if we could identify that the JSON is a Reader, we would be able to produce a useful error message. This leads to the second solution.
I will repeat the implementation of Solution 1, but I will add a field to describe the type of Role. This field is called a discriminator.
```scala
given readerEncoder: Encoder[Reader] =
  Encoder.instance { reader =>
    Json.obj(
      "type" -> "READER".asJson,
      "subscription" -> reader.subscription.asJson
    )
  }

given editorEncoder: Encoder[Editor] =
  Encoder.instance { editor =>
    Json.obj(
      "type" -> "EDITOR".asJson,
      "profileBio" -> editor.profileBio.asJson,
      "favoriteFont" -> editor.favoriteFont.asJson,
    )
  }

given adminEncoder: Encoder[Admin.type] =
  Encoder.instance { admin =>
    Json.obj("type" -> "ADMIN".asJson)
  }

given Encoder[Role] =
  Encoder.instance {
    case x: Reader => readerEncoder(x)
    case x: Editor => editorEncoder(x)
    case Admin => adminEncoder(Admin)
  }
```
Let’s have a look at an example:
```scala
Reader(Premium).asJson.spaces2
// res: String = {
//   "type" : "READER",
//   "subscription" : "PREMIUM"
// }
```
Then, we can use the discriminator to improve the Decoder:
```scala
given Decoder[Role] =
  Decoder.instance(cursor =>
    for {
      discriminator <- cursor.downField("type").as[String]
      role <- discriminator match
        case "READER" => readerDecoder(cursor)
        case "EDITOR" => editorDecoder(cursor)
        case "ADMIN"  => adminDecoder(cursor)
        case other    => Left(DecodingFailure(CustomReason(s"invalid role $other"), cursor.downField("type")))
    } yield role
  )
```
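The same dispatch can also be expressed with Decoder combinators: read the discriminator with prepare, then pick the branch decoder with flatMap. This is an alternative formulation, not the code above; it assumes circe-core and circe-parser 0.14.4, and types the branch decoders as Decoder[Role] directly so no widening is needed.

```scala
import io.circe.Decoder
import io.circe.parser.decode

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  given Decoder[Subscription] =
    Decoder[String].emap(id => values.find(_.id == id).toRight(s"$id is not a valid Subscription"))
}

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

object Role {
  // Branch decoders typed as Decoder[Role] directly (plain vals, not givens)
  val readerDecoder: Decoder[Role] =
    Decoder.instance(c => c.downField("subscription").as[Subscription].map(Reader(_)))

  val editorDecoder: Decoder[Role] =
    Decoder.instance { c =>
      for {
        bio  <- c.downField("profileBio").as[String]
        font <- c.downField("favoriteFont").as[String]
      } yield Editor(bio, font)
    }

  // Decode the "type" field, then run the chosen decoder on the whole object
  given Decoder[Role] =
    Decoder[String].prepare(_.downField("type")).flatMap[Role] {
      case "READER" => readerDecoder
      case "EDITOR" => editorDecoder
      case "ADMIN"  => Decoder.const(Admin)
      case other    => Decoder.failedWithMessage(s"invalid role $other")
    }
}
```

Admin is decoded with Decoder.const because, with a discriminator, its JSON is {"type":"ADMIN"}: an "empty object only" check like the strict Admin decoder of Solution 1 would reject it. One difference from the version above is that Decoder.failedWithMessage reports an unknown role at the root of the object rather than at the type field.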
Let’s check the error message when the JSON is invalid:
```scala
decode[Role]("""{"type":"READER","subscription":"GENESIS"}""")
// res: Either[Error, Role] = Left("DecodingFailure at .subscription: GENESIS is not a valid Subscription")

decode[Role]("""{"type":"TESTER","subscription":"GENESIS"}""")
// res: Either[Error, Role] = Left("DecodingFailure at .type: invalid role TESTER")
```
Great, this is exactly what we wanted! Additionally, the JSON parsing is probably faster with a discriminator, because we decode only the branch named by the discriminator instead of trying the branches one after another (though I haven’t benchmarked the two solutions). The only remaining issue is that it requires a lot of code to define a codec for a simple enumeration. Let’s see how we can improve that.
Most serialization libraries support the automatic derivation of codecs for case classes. What’s particularly nice with circe is that it also supports the derivation of codecs for enumerations, including the discriminator pattern! This is what it looks like:
```scala
import io.circe.Codec
import io.circe.derivation.Configuration

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

object Role {
  given Configuration = Configuration.default
    .withDiscriminator("type")
    .withTransformConstructorNames(_.toUpperCase)

  given Codec[Role] = Codec.AsObject.derivedConfigured
}
```
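A quick round-trip check of the derived codec, as a sketch assuming circe-core and circe-parser 0.14.4 (the configured derivation lives in the io.circe.derivation package). It looks up the type field of each encoded role rather than comparing whole JSON strings, so it does not depend on where circe places the discriminator inside the object.

```scala
import io.circe.{Codec, Decoder, Encoder}
import io.circe.derivation.Configuration
import io.circe.parser.decode
import io.circe.syntax.*

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  given Encoder[Subscription] = Encoder[String].contramap[Subscription](_.id)
  given Decoder[Subscription] =
    Decoder[String].emap(id => values.find(_.id == id).toRight(s"$id is not a valid Subscription"))
}

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

object Role {
  given Configuration = Configuration.default
    .withDiscriminator("type")
    .withTransformConstructorNames(_.toUpperCase)

  given Codec[Role] = Codec.AsObject.derivedConfigured
}

val roles: List[Role] = List(
  Role.Reader(Subscription.Premium),
  Role.Editor("John is the winner of ...", "Comic Sans"),
  Role.Admin
)

@main def derivedRoundTrip(): Unit =
  for role <- roles do
    val json = role.asJson
    // the discriminator holds the upper-cased case name
    println(json.hcursor.get[String]("type"))
    // decoding the printed JSON gives back the original value
    assert(decode[Role](json.noSpaces) == Right(role))
```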
Note that Codec is a convenient trait that extends both Encoder and Decoder, so we can define either one Codec or one Encoder and one Decoder.
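Going the other way, the hand-written Subscription encoder and decoder from case 1 can be merged into a single given with Codec.from. A sketch assuming circe-core 0.14.4:

```scala
import io.circe.{Codec, Decoder, Encoder}
import io.circe.syntax.*

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  def parseId(id: String): Either[String, Subscription] =
    values.find(_.id == id).toRight(s"$id is not a valid Subscription")

  // One given instead of two: Codec.from pairs an existing Decoder with an existing Encoder
  given Codec[Subscription] =
    Codec.from(
      Decoder[String].emap(parseId),
      Encoder[String].contramap[Subscription](_.id)
    )
}

@main def codecRoundTrip(): Unit =
  for subscription <- Subscription.values do
    // asJson and Decoder[Subscription] both resolve to the single Codec given
    val json = subscription.asJson
    assert(Decoder[Subscription].decodeJson(json) == Right(subscription))
  println(Subscription.Free.asJson.noSpaces) // prints "FREE"
```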
The Configuration object is used to tweak the Codec derivation. Here, we specify that we want a discriminator field called “type”, and that the value of the discriminator should be the name of the constructor (Reader, Editor, Admin) uppercased.
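To see what the Configuration changes, here is the same derivation with Configuration.default, that is without a discriminator. As I understand circe's derivation, each branch is then wrapped in a single-key object named after the case, e.g. {"Reader":{"subscription":"PREMIUM"}}. The sketch assumes circe-core 0.14.4 and checks that shape by navigating the cursor rather than by comparing strings.

```scala
import io.circe.{Codec, Decoder, Encoder}
import io.circe.derivation.Configuration
import io.circe.syntax.*

enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

object Subscription {
  given Encoder[Subscription] = Encoder[String].contramap[Subscription](_.id)
  given Decoder[Subscription] =
    Decoder[String].emap(id => values.find(_.id == id).toRight(s"$id is not a valid Subscription"))
}

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

object Role {
  // No discriminator and no renaming: circe's defaults
  given Configuration = Configuration.default
  given Codec[Role] = Codec.AsObject.derivedConfigured
}

val reader: Role = Role.Reader(Subscription.Premium)

@main def wrapperEncoding(): Unit =
  val json = reader.asJson
  println(json.noSpaces)
  // the branch's fields sit under a key named after the case
  println(json.hcursor.downField("Reader").get[String]("subscription"))
```

Compared with this wrapper shape, the discriminator keeps all fields at the top level, which is also what makes the flat table layout below possible.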
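In summary, we saw that we can serialize a Scala enumeration by defining a codec for each branch of the enumeration and then combining these codecs, either with a fallback logic that tries each branch in turn, or with a discriminator field that identifies the branch (a pattern circe can also derive for us). The discriminator is the best approach, as it leads to better error messages and, most likely, better deserialization performance.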
Additionally, this serialization strategy is not specific to JSON. We can use the same pattern when encoding an enumeration to CSV or even when saving it to a database. For example, we could use the following schema to save users to PostgreSQL:
```sql
CREATE TABLE users (
  id TEXT NOT NULL,
  type TEXT NOT NULL,
  subscription TEXT,
  profile_bio TEXT,
  favorite_font TEXT
);

INSERT INTO users (id, type, subscription, profile_bio, favorite_font)
VALUES
  ('0001', 'READER', 'FREE', null, null),
  ('0002', 'EDITOR', null, 'John is the winner of ...', 'Comic Sans'),
  ('0003', 'ADMIN', null, null, null);
```
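The same discriminator logic carries over to this table. Here is a dependency-free sketch that converts between rows and Role. UserRow, toRole and fromRole are hypothetical names, and a real application would read the rows with a database library, which is out of scope here. Nullable columns become Options, and the type column plays the role of the JSON discriminator (the field is called tpe because type is a Scala keyword).

```scala
enum Subscription(val id: String) {
  case Free extends Subscription("FREE")
  case Premium extends Subscription("PREMIUM")
}

enum Role {
  case Reader(subscription: Subscription)
  case Editor(profileBio: String, favoriteFont: String)
  case Admin
}

// Hypothetical row type mirroring the users table; nullable columns become Options
final case class UserRow(
  id: String,
  tpe: String,
  subscription: Option[String],
  profileBio: Option[String],
  favoriteFont: Option[String]
)

def parseSubscription(id: String): Either[String, Subscription] =
  Subscription.values.find(_.id == id).toRight(s"$id is not a valid Subscription")

// Decoding: the type column selects the branch, like the JSON discriminator
def toRole(row: UserRow): Either[String, Role] =
  row.tpe match {
    case "READER" =>
      for {
        raw <- row.subscription.toRight(s"user ${row.id}: missing subscription")
        sub <- parseSubscription(raw)
      } yield Role.Reader(sub)
    case "EDITOR" =>
      for {
        bio  <- row.profileBio.toRight(s"user ${row.id}: missing profile_bio")
        font <- row.favoriteFont.toRight(s"user ${row.id}: missing favorite_font")
      } yield Role.Editor(bio, font)
    case "ADMIN" => Right(Role.Admin)
    case other   => Left(s"user ${row.id}: invalid role $other")
  }

// Encoding: each branch writes its discriminator and fills only its own columns
def fromRole(id: String, role: Role): UserRow =
  role match {
    case Role.Reader(sub)       => UserRow(id, "READER", Some(sub.id), None, None)
    case Role.Editor(bio, font) => UserRow(id, "EDITOR", None, Some(bio), Some(font))
    case Role.Admin             => UserRow(id, "ADMIN", None, None, None)
  }

// The three rows inserted above
val rows: List[UserRow] = List(
  UserRow("0001", "READER", Some("FREE"), None, None),
  UserRow("0002", "EDITOR", None, Some("John is the winner of ..."), Some("Comic Sans")),
  UserRow("0003", "ADMIN", None, None, None)
)

@main def rowRoundTrip(): Unit =
  rows.foreach(row => println(toRole(row)))
```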
What do you think of this approach? Do you have a better one to share with us? Please comment on Reddit.