Creating and processing XML feels awkward in most programming languages. With Blasien, a tiny C++11 header library, XML in C++ feels easy and natural. As a bonus, the XML that is written is mostly validated at compile time.
Here is an example:
| XHTML | C++ with Blasien |
|---|---|
| | |
The same syntax can be used to create a DOM.
Code to create XML is usually a matter of calling functions like startElement, setAttribute, endElement, etc. Such code looks nothing like the desired XML. And there is no static type checking. Here is a typical example:
| XHTML | C++ |
|---|---|
| | |
This code looks unpleasant and it is easy to make errors. The tag names are written as strings: a typo there can go undetected for a long time. Elements are closed with writeEndElement(). Matching up the opening and closing of tags is hard to do visually, and errors there are not caught at compile time.
There are programming languages, like XSLT and XQuery, that work better with XML. Calling code in these languages from C++ is inconvenient and requires the programmer to learn an additional programming language.
A few years ago, I created a way to work with XML from C++. In that approach, wrapper classes were created for each element type from a schema definition. This prevented many possible errors at compile time. But the code still did not look like XML. Blasien has all the same checks but with a nicer syntax:
| C++ with writeodf | C++ with Blasien |
|---|---|
| | |
Blasien is built on a powerful C++ feature: operator overloading. Nearly all operators can be overloaded in C++. For XML, two operators are most distinctive: < and >. These operators usually mean “smaller than” and “larger than” and are used in expressions like if (x > 3) { ... }.
There is no rule that limits the use of < and > to mathematical expressions. As you will see, they can be put to very different use.
The operators < and > are left-associative. The left-most combination of expression, operator, expression is replaced first. The left-most expression is a sink for XML expressions. Each handled expression leads to a new sink with a different state.
sink < html < body < "hello" > body > html;
can be written more explicitly as:
const HtmlTag html;
const BodyTag body;
const HtmlDocSink sink;
const HtmlSink sink2 = sink < html;
const BodySink sink3 = sink2 < body;
const BodySink sink4 = sink3 < "hello";
const HtmlSink sink5 = sink4 > body;
const HtmlDocSink sink6 = sink5 > html;
which works because of these operator overloads:
HtmlSink operator<(const HtmlDocSink& sink, const HtmlTag& tag) {
    sink.startElement(tag);
    return HtmlSink(sink);
}
BodySink operator<(const HtmlSink& sink, const BodyTag& tag) {
    sink.startElement(tag);
    return BodySink(sink);
}
BodySink operator<(const BodySink& sink, const char* text) {
    sink.writeCharacters(text);
    return sink;
}
HtmlSink operator>(const BodySink& sink, const BodyTag& tag) {
    sink.endElement();
    return sink.base;
}
HtmlDocSink operator>(const HtmlSink& sink, const HtmlTag& tag) {
    sink.endElement();
    return sink.base;
}
The operators < and > are only implemented for valid combinations of parent and child nodes. The compiler accepts sink <html <body but refuses sink <body <html because < is overloaded for a left <html/> and a right <body/>, but not for a left <body/> and a right <html/>. The compiler also catches any text nodes that are put in places where they are not allowed.
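To make the idea concrete without Qt or Blasien, here is a minimal, self-contained sketch. It is not Blasien's code: the sink structs, the can_open trait, and render_hello are hypothetical names invented for this illustration, and the sinks simply append to a std::ostringstream. It shows that only the valid parent and child combinations compile, and it checks the invalid ones with static_assert instead of a failed build.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical tag types; Blasien's real tags are XmlTag instantiations.
struct HtmlTag {};
struct BodyTag {};

// Each sink type encodes the current position in the document.
// All sinks share one output buffer through a pointer.
struct HtmlDocSink { std::ostringstream* out; };
struct HtmlSink    { std::ostringstream* out; };
struct BodySink    { std::ostringstream* out; };

// Only valid parent/child combinations get an overload.
inline HtmlSink operator<(const HtmlDocSink& s, const HtmlTag&) {
    *s.out << "<html>";
    return HtmlSink{s.out};
}
inline BodySink operator<(const HtmlSink& s, const BodyTag&) {
    *s.out << "<body>";
    return BodySink{s.out};
}
inline BodySink operator<(const BodySink& s, const char* text) {
    *s.out << text;
    return s;
}
inline HtmlSink operator>(const BodySink& s, const BodyTag&) {
    *s.out << "</body>";
    return HtmlSink{s.out};
}
inline HtmlDocSink operator>(const HtmlSink& s, const HtmlTag&) {
    *s.out << "</html>";
    return HtmlDocSink{s.out};
}

// Detects whether `Sink < Tag` compiles, without causing a hard error.
template <typename Sink, typename Tag, typename = void>
struct can_open : std::false_type {};
template <typename Sink, typename Tag>
struct can_open<Sink, Tag,
    decltype(void(std::declval<const Sink&>() < std::declval<const Tag&>()))>
    : std::true_type {};

static_assert(can_open<HtmlDocSink, HtmlTag>::value, "html may start a document");
static_assert(!can_open<HtmlDocSink, BodyTag>::value, "body may not start a document");
static_assert(!can_open<BodySink, HtmlTag>::value, "html may not appear inside body");

// Builds the example document and returns it as a string.
inline std::string render_hello() {
    std::ostringstream out;
    const HtmlTag html{};
    const BodyTag body{};
    const HtmlDocSink done = HtmlDocSink{&out} < html < body < "hello" > body > html;
    (void)done;  // the final sink is back at document level
    return out.str();
}
```

Calling render_hello() walks through the same five sink states as the explicit version above. Swapping html and body in the expression would fail to compile because no matching operator< exists.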
With Blasien, the compiler also checks the XML attributes. Missing required attributes, duplicate attributes and forbidden elements are all caught at compile time.
Metaprogramming is a fancy word for using templates. Templates in C++ are very powerful and very complex.
The operator overloading from the previous section gets unwieldy quickly for common XML schemas. To avoid writing a lot of code, we translate the XML Schema or Relax NG schema to data structures in C++. The templated code uses this data to generate the required overloaded functions during compilation.
Here are some excerpts from a very simple XHTML schema for use with Qt code (Blasien is not restricted to Qt).
The tags and element types are defined in a dedicated namespace, in this case xhtml:
namespace xhtml {
const QString htmlns = QStringLiteral("http://www.w3.org/1999/xhtml");
const QString htmlTag = QStringLiteral("html");
const QString headTag = QStringLiteral("head");
Tags are derived from a template called XmlTag.
using HtmlTag = XmlTag<QString,&htmlns, &htmlTag>;
using HeadTag = XmlTag<QString,&htmlns, &headTag>;
using BodyTag = XmlTag<QString,&htmlns, &bodyTag>;
Each document type and element type defines what tag is used and which attributes are allowed and which are required:
struct XHtmlDocument {
};
struct HtmlType {
using Tag = HtmlTag;
using allowedAttributes = std::tuple<xhtml11::IdTag,xhtml11::ClassTag>;
};
struct ImgType {
using Tag = ImgTag;
using allowedAttributes = std::tuple<xhtml11::IdTag,xhtml11::ClassTag>;
using requiredAttributes = std::tuple<xhtml11::SrcTag,xhtml11::AltTag>;
};
}
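As a rough illustration of how tuples like allowedAttributes and requiredAttributes can drive compile-time checks, the sketch below validates a list of attribute tags against an element type. It is not taken from Blasien: count_of, all_of, attributes_ok, element_attributes_ok and HrefTag are hypothetical names, and the attribute tags are empty placeholder structs rather than XmlTag instantiations.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Placeholder attribute tags (assumed for this sketch).
struct IdTag {}; struct ClassTag {}; struct SrcTag {}; struct AltTag {};
struct HrefTag {};  // not listed for img, so it should be rejected

struct ImgType {
    using allowedAttributes  = std::tuple<IdTag, ClassTag>;
    using requiredAttributes = std::tuple<SrcTag, AltTag>;
};

// count_of<T, Ts...>: how many times T occurs in the pack Ts.
template <typename T, typename... Ts> struct count_of;
template <typename T>
struct count_of<T> : std::integral_constant<int, 0> {};
template <typename T, typename Head, typename... Tail>
struct count_of<T, Head, Tail...>
    : std::integral_constant<int,
          (std::is_same<T, Head>::value ? 1 : 0) + count_of<T, Tail...>::value> {};

// all_of<Bs...>: logical AND over a pack of booleans.
template <bool... Bs> struct all_of;
template <> struct all_of<> : std::true_type {};
template <bool B, bool... Rest>
struct all_of<B, Rest...>
    : std::integral_constant<bool, B && all_of<Rest...>::value> {};

// True when every given attribute is known and unique,
// and every required attribute is present.
template <typename Allowed, typename Required, typename... Given>
struct attributes_ok;
template <typename... A, typename... R, typename... Given>
struct attributes_ok<std::tuple<A...>, std::tuple<R...>, Given...>
    : std::integral_constant<bool,
          // each given attribute is allowed or required
          all_of<(count_of<Given, A..., R...>::value >= 1)...>::value &&
          // no attribute is given twice
          all_of<(count_of<Given, Given...>::value == 1)...>::value &&
          // each required attribute is given
          all_of<(count_of<R, Given...>::value == 1)...>::value> {};

// Convenience wrapper that reads the tuples from the element type.
template <typename Element, typename... Given>
struct element_attributes_ok
    : attributes_ok<typename Element::allowedAttributes,
                    typename Element::requiredAttributes, Given...> {};

static_assert(element_attributes_ok<ImgType, SrcTag, AltTag>::value, "valid img");
static_assert(element_attributes_ok<ImgType, SrcTag, AltTag, IdTag>::value, "id is optional");
static_assert(!element_attributes_ok<ImgType, SrcTag>::value, "alt is missing");
static_assert(!element_attributes_ok<ImgType, SrcTag, AltTag, AltTag>::value, "alt is duplicated");
static_assert(!element_attributes_ok<ImgType, SrcTag, AltTag, HrefTag>::value, "href is not allowed");
```

A library could evaluate such a trait inside the operator that accepts attributes, so that a missing or duplicated attribute becomes a compile error rather than a runtime surprise.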
Determining which nodes are allowed in which other nodes is done by specializing a struct template called allowed_child_types:
template <>
struct allowed_child_types<xhtml11::HtmlType> {
using types = std::tuple<xhtml11::HeadType, xhtml11::BodyType>;
};
template <>
struct allowed_child_types<xhtml11::ImgType> {
using types = std::tuple<>;
};
Blasien uses these structs to generate the right overloaded operators.
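The following sketch shows one way a single operator template could be derived from allowed_child_types, so that the hand-written overloads from the earlier section are no longer needed. This is an assumption about the general technique (std::enable_if plus a tuple search), not Blasien's actual implementation. The names tuple_contains, is_allowed_child, Sink and open_body_in_html are hypothetical, and the element types carry a name() function only so the sketch can print something.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical element types; each reports its tag name for output.
struct HeadType { static const char* name() { return "head"; } };
struct BodyType { static const char* name() { return "body"; } };
struct HtmlType { static const char* name() { return "html"; } };
struct ImgType  { static const char* name() { return "img"; } };

// Schema data: which element types may appear inside which.
template <typename Parent>
struct allowed_child_types { using types = std::tuple<>; };
template <>
struct allowed_child_types<HtmlType> { using types = std::tuple<HeadType, BodyType>; };
template <>
struct allowed_child_types<BodyType> { using types = std::tuple<ImgType>; };

// tuple_contains<T, Tuple>: true if T is one of the tuple's element types.
template <typename T, typename Tuple> struct tuple_contains;
template <typename T>
struct tuple_contains<T, std::tuple<>> : std::false_type {};
template <typename T, typename Head, typename... Tail>
struct tuple_contains<T, std::tuple<Head, Tail...>>
    : std::conditional<std::is_same<T, Head>::value,
                       std::true_type,
                       tuple_contains<T, std::tuple<Tail...>>>::type {};

template <typename Parent, typename Child>
struct is_allowed_child
    : tuple_contains<Child, typename allowed_child_types<Parent>::types> {};

// A sink positioned inside an element of type Current.
template <typename Current>
struct Sink { std::string* out; };

// One operator template replaces the hand-written overloads. It only takes
// part in overload resolution when the schema allows Child inside Parent.
template <typename Parent, typename Child>
typename std::enable_if<is_allowed_child<Parent, Child>::value, Sink<Child>>::type
operator<(const Sink<Parent>& sink, const Child&) {
    *sink.out += std::string("<") + Child::name() + ">";
    return Sink<Child>{sink.out};
}

static_assert(is_allowed_child<HtmlType, BodyType>::value, "body is allowed in html");
static_assert(!is_allowed_child<BodyType, HtmlType>::value, "html is not allowed in body");
static_assert(!is_allowed_child<ImgType, BodyType>::value, "img has no children");

// Opens a body element inside an html sink and returns what was written.
inline std::string open_body_in_html() {
    std::string out;
    const Sink<HtmlType> html_sink{&out};
    const Sink<BodyType> body_sink = html_sink < BodyType{};
    (void)body_sink;
    return out;
}
```

Adding a new element then only requires a new allowed_child_types specialization. Writing html_sink < HtmlType{} would not compile, because the enable_if condition removes the operator for that pair.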
To use Blasien in your project, you need to use a provided XmlSink or write one yourself. Two sinks are provided: one for writing XML (30 lines of code) and one for creating a DOM tree (40 lines of code). Both use Qt5.
Here is an example that uses XmlBuilder to create a QDomDocument:
#include <XmlBuilder.h>
#include <XHtml11.h>
struct create_paragraphs {
    const QList<QString> texts;
    template <typename Sink>
    Sink operator()(const Sink& sink) {
        for (const QString& t: texts) {
            sink <p<t>p;
        }
        return sink;
    }
};
QDomDocument
createDocument(const QString& docTitle, const QList<QString>& paragraphs) {
    QDomDocument dom("test");
    XmlBuilder<XHtmlDocument>(dom)
    <html
      <head
        <title
          <docTitle
        >title
      >head
      <body
        <create_paragraphs{{paragraphs}}
      >body
    >html;
    return dom;
}
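The create_paragraphs functor above is passed to < like any other node. One plausible way to support that, sketched here with a hypothetical single-state TextSink that appends to a std::string, is an operator< template that accepts any callable mapping a sink to a sink. This is an assumption for illustration only; TextSink, write_paragraphs and render_body are invented names and the mechanism Blasien actually uses may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single-state sink that appends to a string.
struct TextSink { std::string* out; };

inline TextSink operator<(const TextSink& sink, const char* text) {
    *sink.out += text;
    return sink;
}
inline TextSink operator<(const TextSink& sink, const std::string& text) {
    *sink.out += text;
    return sink;
}

// Any callable that maps a sink to a sink can be spliced into the chain.
// The trailing decltype removes this overload for non-callable operands.
template <typename F>
auto operator<(const TextSink& sink, const F& f) -> decltype(f(sink)) {
    return f(sink);
}

// Writes one <p> element per text, then hands the sink back.
struct write_paragraphs {
    std::vector<std::string> texts;
    TextSink operator()(const TextSink& sink) const {
        TextSink s = sink;
        for (const std::string& t : texts) {
            s = s < "<p>" < t < "</p>";
        }
        return s;
    }
};

// Renders a body element that contains the given paragraphs.
inline std::string render_body(const std::vector<std::string>& paragraphs) {
    std::string out;
    const TextSink done =
        TextSink{&out} < "<body>" < write_paragraphs{paragraphs} < "</body>";
    (void)done;
    return out;
}
```

Because the functor returns the sink it was given, the chain continues after it as if the loop had been written inline.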
This code is very new and these instructions are likely to change. But Blasien is usable now and newer releases will simply bring more features.
C++ projects like Calligra, LibreOffice, Inkscape, MuseScore and many more that rely heavily on XML can already benefit from the current version.
Metaprogramming is so powerful that XML generating code can be written that checks against nearly all aspects of a Relax NG schema or XML schema. Future releases will do more and more compile time checking.
An exciting future feature is XPath-like selectors. Code for this is already present in Blasien. It gives a convenient syntax for collecting information from XML documents. Extending this part of Blasien could turn it into a natively compiled alternative to XQuery and XSLT.
Blasien is just a few C++ headers with a reasonable amount of unit tests. I’ve put them on GitHub for now. Feel free to file issues or send pull requests.
The code is currently under LGPL3, but I’m open to additional licenses if a project requires it.