Typeclasses in Translation

Second-language learners first speak by translating what they want to say from their native tongue to the target language. That process is slow and error-prone. It’s better, when you can, to “think in” the target language; the same is true of trying to understand Haskell via connections to other programming languages.

Programmers usually learn “Haskell as a second language” and try to connect its concepts to things they understand from past experience. Java programmers ask whether Haskell’s typeclasses are like Java’s interfaces. This is okay for initial intuition, but it obscures some important differences. Better, as you become able, to learn to “think in Haskell.”

Most answers to this question assume the reader knows Java but not Haskell. Haskell natives are scarce, but one of the authors here learned Haskell as a first programming language. So this post will assume the opposite, and in doing so we hope to demonstrate that Java is just as difficult to a Haskell native as Haskell is to a Java native.

Haskell

First let’s do a quick Haskell recap to be clear on terminology.

Haskell data structures are algebraic datatypes (specifically, sums and products) defined with the data keyword.1 A typeclass declaration gives type signatures for operations, which are implemented in different ways by typeclass instances that define the implementations for a specific type. We can write functions that operate over any type that has an instance of some typeclass X and use the operations that X defines; when such a function is used with some particular type, the type determines which implementation is used. This is our way to abstract over types.

Defines a data structure Defines an abstraction over types
Algebraic datatype
Typeclass

This is an example of a typeclass:

Class methods are the operations that a typeclass declares. The Eq typeclass has two methods, (==) and (/=).

For operations that are derivable from other methods in the class, you can write a minimal set of implementations and rely on the default methods to fill in the rest of the operations. The example above has one default method, (/=).

The typeclass instance is where the operations described in the typeclass declaration are implemented for a particular type. An instance is a unique relationship between a type and a typeclass.

This is an instance of the Eq typeclass instance for Bool (a type) that defines (==) and takes the default implementation of (/=):

A type consists of the combination of a data structure and its associated typeclass instances.

A subset of Java

We might start explaining interfaces to our Haskell native by highlighting the similarities with a simplified model of Java, omitting some constructs that don’t have close Haskell analogs.

In our simplified Java, a type is either an interface or a class (specifically, a final class, which we’ll discuss later. An interface gives type signatures for operations which are implemented in different ways by each class, and the class defines the implementations for a specific subtype. We can write code that interacts with a value belonging to interface X and uses the operations that X defines; the class to which the value belongs determines which implementation is used. This is our way to abstract over types — like a typeclass!

Defines a data structure Defines an abstraction over types
Class
Interface

But this obscures a number of differences between this and the Haskell system.

A method is sort of like a top-level function definition in Haskell, but it is defined as part of a type, and it has an implicit first argument of that type.

This is an example of an interface:

This Comparator interface has two methods, compare and lessThanOrEqual. The compare method is just a type signature with no implementation. lessThanOrEqual provides an implementation, so it is called a default method, similar to the terminology used for Haskell typeclasses.

This is an example of a class that implements that interface:

A class has fields that define a data structure (specifically, the structure is the product of its fields, and Java has no direct way to encode sum types). The IntComparator class above has one field, reverse. The class also implements the compare method from Comparator.

The relationship between classes and interfaces is subtyping. A class definition can specify that it implements any number of interfaces, and it inherits the interfaces’ methods. (More jargon: We say that the class is a subtype of each interface, and the interfaces are its supertypes.)

Notice that both Haskell and Java have a concept called method which consists of an overloaded name and a type signature. The differences, which we’ll expand on later lie in the discrepancies between the exact definitions of method in the two languages, and the specific ways in which types are related to the abstractions over them.

Java proper

We wouldn’t be giving full justice to the complexity of Java’s type system unless we also brought up the other kinds of classes. Subtyping gets weird when we introduce types that both have fields and permit subtypes (highlighted in orange below). These are both data structures and abstractions over types.

Complete data structure
(is instantiable)
Partial data structure
(can have fields)
Abstraction over types
(can have subtypes)
Final class
Class
Abstract class
Interface

The fields in an abstract class constitute “part of” a data structure, which can be augmented by additional fields in a subtype.

Classes that are final cannot have subtypes. A type that is non-final serves as an abstraction over its subtypes in the same way that an interface does.

Classes that are not abstract are instantiable, meaning that we can call their constructors to create instances of the class.

Types in Java serve as, among other roles, templates for defining other types. A type can be “incomplete” in the sense that it doesn’t define all of the things that you would need in order to actually use it. Such a type only exists in order to build subtypes from it. Its subtypes fill in these blanks, so to speak. If a type has no holes — no pieces of its definition left undefined — only then is the type instantiable.

We didn’t mention these things in our original description of interfaces to the native Haskeller because none of these things have good analogs in Haskell. But they’re important to understanding Java on its own terms.

Are typeclasses like interfaces?

From 30,000 feet up, these two language features have a resemblance: They define abstractions that codify similarities between types, and these abstractions permit generic code that works across multiple types.

Translation between the two languages can work, but not always.

Consider this typeclass Semigroup:

and a function triple that uses a Semigroup constraint:

We can consider two ways to write this code into a Java using interfaces.

The first is a direct translation:

Instances of this Semigroup interface are the semigroups themselves. Note that the relationship between a Semigroup instance and its type parameter A is not unique; there’s nothing stopping us from creating multiple semigroups for any type. When we use this interface, then, we have to explictly pass a Semigroup argument to specify which semigroup we’re using.

If we want to create a unique association between a type and a semigroup, and not have to manually pass the semigroup instance everywhere we use it, we can do a little refactor. Since the x parameter has type A, we can remove it and instead let types with semigroups implement the interface, turning the explicit x parameter into the implicit this parameter that refers to the instance of the class that A that implements Semigroupal<A>.

That is the more natural way a native Java user thinks about abstraction. But the set of Haskell typeclasses that can be converted to Java interfaces in this way is limited. The conversion from Semigroup to Semigroupal in this example relied on append having an argument of type A. What happens when that isn’t the case?

Consider another typeclass, Monoid:

We can write the direct translation of this into Java as before:

But the more idiomatic Java variation simply cannot be written.

This thing that Java cannot do is called return type polymorphism. The implementation of mempty is implied not by the type of an argument to the function, but by the type that we need to get from the function.

Conclusion

We’re not here to criticize Java but to challenge the popular wisdom that Haskell is more difficult to learn. If Haskell is your first language, then Java is hard, Java is alien. Java has a steep learning curve too; many Java-habituated Haskell learners forget that.

Some of the difficulty in crossing between these languages is just the bramble of shared terminology with different meanings: constructor, method, type, concrete type, class, instance, field, polymorphic. But some of it is a deep paradigm conflict: Haskell has full compile-time type resolution, Java has subtyping and dynamic dispatch.

It’s tempting to cling to your understanding of one while you learn the other. But the abstractions that typeclasses and interfaces provide over types are disparate, and it is difficult to reconcile them. You’re better off making a clean break from what you know and starting over with a full embrace of the language you’re learning.


Footnotes

1 Usually it’s data. There are other keywords that can introduce types but their effects on how types are defined and relate to typeclasses are minimal. Some Haskell features, such as existentially-quantified types, are more complex, but these are not core concepts.