Haskell and C++ are very different programming languages. Haskell is purely functional and C++ is imperative. While ‘normal’ C++ is imperative, C++ metaprogramming is purely functional, just like Haskell. This blog shows examples of Haskell code that I’ve translated to C++.
Last week I blogged about Blasien. Blasien is a set of header files for writing literal validated XML in C++. XML can be validated against DTD, XML schema or Relax NG. Relax NG is of interest to me since that schema language is used in the Opendocument Format standard. Relax NG is an elegant schema language, but validating against it is not trivial. The goal with Blasien is to do this validation at compile time.
James Clark has written an elegant algorithm for Relax NG validation. His algorithm is written in a subset of Haskell. In the past, I’ve ported his algoritm to JavaScript for use in WebODF. For use in Blasien, I’ve now ported it to C++ metaprogramming. This took some puzzling, but the result feels very natural once you get used to the syntax.
Haskell is a very clean programming language. It is very different from C++. Haskell is a purely functional programming language and so is C++ metaprogramming. Purely functional means that all structures are immutable and functions have no side-effects. All a function does is to take immutable structures as input and create a new immutable structure as output. The same is true in C++ metaprogramming.
Here is an example of a function that adds two integers.
Haskell |
C++ |
---|---|
|
|
Both of these examples have a main
function which prints
out a value. That part of the examples is not pure: printing out a value
is a side-effect. But the calculation of the value is pure. In C++ the
value of r
will be calculated at compile time. (Probably in
Haskell too.)
Haskell has algebraic data
types. In the code fragment below, NameClass
is a data
type with four variants: AnyName
, QName
,
NsName
and NameClassChoice
. Instances of
AnyName
have no members. Instances of QName
have two members of type Uri
and LocalName
respectively. The members are not named but can be accessed via pattern
matching. This will be explained below.
In C++, the variants of NameClass
are not connected
directly. Each of them is defined as a separate struct. The members of
each data type are given via template parameters. We are using structs
everywhere now: both functions and data types are defined via template
structs.
Haskell |
C++ |
---|---|
|
|
The C++ definition of QName
differs in an important
aspect from the definition of NameClassChoice
.
QName
has static constexpr String
in front of
its members and NameClassChoice
has using
before its members. This is because C++ makes a distinction between type
parameters and value parameters. QName
has two
String
values as parameters and
NameClassChoice
has two NameClass
types as
parameters.
Note that AnyName
does not have a definition. It has no
members and therefor no need for a definition. All it needs is a
declaration. C++ metaprogramming is programming with types. None of
these types will be instantiated, so they do not require a complete
definition.
Haskell does not have function overloading. Instead it has pattern matching (and type classes). Only one function with a particlar name is possible. That function can still handle different types of input if the input parameters are algebraic data types. A different implementation is possible for each variant of an algebraic data type.
This is demonstrated in the function contains
. The first
parameter to this function should be of type NameClass
.
There are four variants of this data type and hence four implementations
are possible. The function contains
returns
true
if the NameClass
instance contains the
given QName
. The members of each variant of
NameClass
are exposed via pattern matching.
_
is a wildcard and is used in the first variant
AnyName
. Since AnyName
matches any
QName
instance, the value of QName
is not
bound to a variable. The implementations of the QName
and
NsName
variants are very straightforward. The
implementation of the NameClassChoice
variant is
recursive.
C++ metaprogramming also uses pattern matching. In C++ this is called template specialization. First a template struct is defined. This may or may not have a definition. Next, a specialization is written for each of the variants of the algebraic data type.
Haskell |
C++ |
---|---|
|
|
Some people prefer type checking. Others choose unit tests. Having
both is best. A neat feature of C++ metaprogramming is that it is
possible to write unit test that run at compile time. Here are some
tests for our new contains
function.
constexpr char xhtmlNS[] = "http://www.w3.org/1999/xhtml";
constexpr char divLocalName[] = "div";
constexpr char pLocalName[] = "p";
void testContainsAnyName() {
using PQName = QName<xhtmlNS, pLocalName>;
static_assert(contains<AnyName,PQName>::value, "AnyName should match PQName.");
}
void testContainsQName() {
using PQName = QName<xhtmlNS, pLocalName>;
using DivQName = QName<xhtmlNS, divLocalName>;
static_assert(!contains<DivQName,PQName>::value, "DivPQName should not match PQName.");
}
void testContainsNsName() {
using PQName = QName<xhtmlNS, pLocalName>;
using HtmlNsName = NsName<xhtmlNS>;
static_assert(contains<HtmlNsName,PQName>::value, "HtmlNsName should match PQName.");
}
void testContainsNameClassChoice() {
using PQName = QName<xhtmlNS, pLocalName>;
using DivQName = QName<xhtmlNS, divLocalName>;
using HtmlNsName = NsName<xhtmlNS>;
using NameChoice = NameClassChoice<DivQName,HtmlNsName>;
static_assert(contains<NameChoice,PQName>::value, "NameChoice should match PQName.");
}
These functions do not even need to be called from anywhere (at least not in GCC, maybe other compilers behave differently) to run these tests. They should, of course, be in a compilation unit.
I have translated (most of) this Relax NG validation algorithm to C++ metaprogramming. You can look at the progress so far.
The point of doing this is to further Blasien. Generating XML from C++ (and most programming languages for that matter) is a source of many invalid documents because doing validation at either runtime or compile time is currently incomplete and cumbersome. With a convenient way to write XML inside of C++ I hope this will change. Doing most of the validation at compile time will catch most of the common errors. Solving nice puzzles and learning more Haskell and C++ while developing Blasien is a nice bonus.
Comments
Post a comment