bohm-berarducci-encoding
page-name: bohm-berarducci-encoding.md page-title: Böhm-Berarducci encoding
Böhm-Berarducci encoding
The visitor pattern is essentially the same thing as Church encoding (2021) https://www.haskellforall.com/2021/01/the-visitor-pattern-is-essentially-same.html
Alonzo Church discovered that you could model any data structure in the untyped lambda calculus using the technic that is now called Church encoding. Later, Corrado Böhm and Alessandro Berarducci had devised a version of Church encoding that worked to encode data structures in typed lambda calculi, specifically in System F, presented in the paper: Automatic synthesis of typed Λ-programs on term algebras
We can model the following data structures:
Products (records, structs). A product of two types is a type that stores both of them; e.g. a record with two fields, first of type A, second of type B
Coproducts (sum types, tagged unions). A sum of two types A and B is a type that stores either A or B; e.g. a tagged union where the first tag stores a value of type A and the second tag stores a value of type B)
Recursive data structures
The Böhm-Berarducci encoding (BBE) is known as the adjustment of the Church encoding for typed settings. Like other encoding schemes, BBE can encode any datatype using just anonymous function. Scott's encoding is also similar to these two, and it has a clear number of steps for translation of a data type from Haskell to a types λ-calculus.
Example in Haskell
We use Haskell to show how the Shape type is defined using GADTs, as ShapeG
, and how its translattion into Böhm-Berarducci encoding, as ShapeBB
, looks like. The Shape type evidently non-recursive, which simplifies matters.
We can start by defining the Shape type using the regular ADT declaration,
but in this particular example, nothing is gained from using GADT syntax: other than allowing accessor functions (field names) there's no functional differences between this declaration and the one below that uses the GADT syntax. However, the benefit of using the GADT syntax in this case is that a GADT declaration looks incredibly similar to the Böhm-Berarducci encoding (of the data type), as will soon become evident.
The exemplary Shape data type, named ShapeG
in its GADT form, consists of a nullary type ctor and two data ctors, each with a different number of fields. GADT syntax explicitly shows that all data ctors must return an instance of the declared type, but since the type ctor is nullary here, the return type of each data ctor must be exactly the same.
The strategy is to define a lambda function with as many params as there are data ctors in the type. Each param is
BB-encoding a non-recursive data type:
First declare the datatype as a GADT that will be used a "model"
GADT uses the 'data' keyword while BB can use anything:
'type' can be used to declare only a type alias but
'newtype' is better if we're to impl classes for it
'data' is probably the most comfortable since the type
the return type is a forall-type
each "ctor" param plays the role of data ctor
there is an "extra" overall return type that is the same as the return types of each "ctor" param (that expect a funarg)
Last updated
Was this helpful?