Strigi has reached the point where its configuration files should be more advanced than a text file with one directory per line. Because I have good experience with using XML Schema for mapping from XML to Java and back using JAXB, I’d been looking for a good toolkit that does the same in C++. The requirements for such a tool are:
Simple
Small
Few dependencies
Use XML Schema
A tool I’ve found is called xsd. Behind this trivially simple name is a suite of free software tools that generate different types of C++ code. The latest version of this tool is 2.2.0 and can be gotten here. There’s an extensive manual starting at a simple Hello World application.
To test the feasibility of this code for Strigi I designed a small XML Schema file and wrote a simple program to accompany it. For Strigi, I’d like to have a configuration file that describes what directories and other sources are indexed and how. Below I’ve written out the different files I used for testing xsd. The workflow is pretty easy:
write XML Schema
compile c++ from schema
use data classes in c++ code
If the requirements for the configuration file change, the XML Schema
is changed and the c++ code regenerated. Any changes required in the c++
code that uses the data classes will be caught by the compiler.
Nothing is for free: using xsd introduces dependencies in your code. When using xsd, you will need to link your executable with xerces-c. Since this is a widespread library, this is not a big problem.
An example configuration file might look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<s:daemonConfiguration xmlns:s="http://www.vandenoever.info/strigi">
  <repository repositoryLocation="/home/tux/.strigi/mainindex" repositoryType="CLucene">
    <fileSystemSource baseURI="file:/home/tux"/>
    <httpSource baseURI="http://www.kde.org/"/>
  </repository>
</s:daemonConfiguration>
For convenience, this file is kept simple. What you can see is that Strigi could have configurations for multiple repositories (although this file only shows one). Each repository has one index which contains information extracted from various sources, such as files from the filesystem or web pages. These sources are described by the elements fileSystemSource and httpSource. The types of both elements extend the more general fileSourceType.
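To make the structure described above concrete, here is a minimal sketch of the same data model as plain C++ structs. This is a hand-written approximation for illustration only: the struct names, member names and the exampleConfiguration function are my own assumptions, and the classes that xsd generates look different and do much more (parsing, serialization, required-attribute handling).

```cpp
#include <string>
#include <vector>

// Hypothetical hand-written model, NOT the code generated by xsd.
// A source has a base URI; the two concrete kinds share that base.
struct FileSource {
    std::string baseURI;  // required in the configuration file
};
struct FileSystemSource : FileSource {};
struct HttpSource : FileSource {};

// One repository holds one index fed by any number of sources.
struct Repository {
    std::string location;  // the repositoryLocation attribute
    std::string type;      // the repositoryType attribute, e.g. "CLucene"
    std::vector<FileSystemSource> fileSystemSources;
    std::vector<HttpSource> httpSources;
};

// The top-level configuration can hold several repositories.
struct DaemonConfiguration {
    std::vector<Repository> repositories;
};

// Builds the same content as the example configuration file.
inline DaemonConfiguration exampleConfiguration() {
    Repository repo;
    repo.location = "/home/tux/.strigi/mainindex";
    repo.type = "CLucene";

    FileSystemSource fs;
    fs.baseURI = "file:/home/tux";
    repo.fileSystemSources.push_back(fs);

    HttpSource hs;
    hs.baseURI = "http://www.kde.org/";
    repo.httpSources.push_back(hs);

    DaemonConfiguration config;
    config.repositories.push_back(repo);
    return config;
}
```

Writing and maintaining classes like these by hand, plus the code to read and write them as XML, is exactly the work a schema compiler is meant to take over.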
The vocabulary can be described with an XML Schema:
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.vandenoever.info/strigi"
    xmlns:tns="http://www.vandenoever.info/strigi">
  <element name="daemonConfiguration" type="tns:daemonConfigurationType"/>

  <complexType name="daemonConfigurationType">
    <sequence>
      <!-- a repository contains one index -->
      <element name="repository" type="tns:repositoryType"
          minOccurs="0" maxOccurs="unbounded">
      </element>
    </sequence>
  </complexType>

  <complexType name="repositoryType">
    <sequence>
      <element name="fileSystemSource" type="tns:fileSystemSourceType"
          minOccurs="0" maxOccurs="unbounded"/>
      <element name="httpSource" type="tns:httpSourceType"
          minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <attribute name="repositoryLocation" type="anyURI" use='required'/>
    <attribute name="repositoryType" type="tns:repositoryTypeType"
        use='required'/>
  </complexType>

  <simpleType name="repositoryTypeType">
    <restriction base='string'>
      <enumeration value='CLucene'/>
      <enumeration value='HyperEstraier'/>
      <enumeration value='Xapian'/>
      <enumeration value='Sqlite'/>
    </restriction>
  </simpleType>

  <complexType name="fileSourceType">
    <attribute name="baseURI" type="anyURI" use='required'/>
    <!-- time between updates for this directory, <= 0 means never -->
    <attribute name="autoUpdateFrequence" type="int"/>
  </complexType>

  <complexType name="fileSystemSourceType">
    <complexContent>
      <extension base="tns:fileSourceType">
        <sequence>
          <element name="fileEventListener" minOccurs="0" maxOccurs="1">
            <complexType>
            </complexType>
          </element>
        </sequence>
      </extension>
    </complexContent>
  </complexType>

  <complexType name="httpSourceType">
    <complexContent>
      <extension base="tns:fileSourceType">
      </extension>
    </complexContent>
  </complexType>

</schema>
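Two rules in this schema are worth spelling out in code: the enumeration that restricts repositoryTypeType to four values, and the comment saying that an autoUpdateFrequence of zero or less means "never". The sketch below shows both as small standalone helpers. The function names are hypothetical and this is not what xsd generates; it only illustrates what the schema expresses. The schema comment does not say what an absent autoUpdateFrequence means, so the sketch leaves that case out.

```cpp
#include <string>

// Hypothetical helper: true if 'value' is one of the four values
// listed in the repositoryTypeType enumeration. The comparison is
// case sensitive, as enumeration matching in XML Schema is.
inline bool isKnownRepositoryType(const std::string& value) {
    static const char* const known[] = {
        "CLucene", "HyperEstraier", "Xapian", "Sqlite"
    };
    for (const char* name : known) {
        if (value == name) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: per the schema comment, a value <= 0 means
// the source is never updated automatically.
inline bool updatesAutomatically(int autoUpdateFrequence) {
    return autoUpdateFrequence > 0;
}
```

With these, isKnownRepositoryType("Xapian") holds while isKnownRepositoryType("xapian") does not, and updatesAutomatically(0) is false.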
This XML Schema file can be compiled into source code with the command
    xsd cxx-tree --generate-serialization strigidaemon.xsd
Here is a simple program that uses this code to read and write this type of configuration file:
#include "strigidaemon.hxx"
#include <iostream>
using namespace std;

int
main(int argc, char* argv[]) {
    auto_ptr<strigi::daemonConfigurationType> config;
    if (argc > 1) {
        // load an object
        try {
            config = strigi::daemonConfiguration(argv[1],
                xml_schema::flags::dont_validate);
        } catch (struct xsd::cxx::tree::exception<char>& e) {
            cerr << e << endl;
        }
    }
    if (!config.get()) {
        // create an object
        config = auto_ptr<strigi::daemonConfigurationType>(
            new strigi::daemonConfigurationType());
    }
    // add elements to the object
    string repositoryLocation = "/home/tux/.strigi/mainindex";
    strigi::repositoryTypeType repositoryType("CLucene");
    strigi::repositoryType repo(repositoryLocation, repositoryType);

    strigi::fileSystemSourceType fs("file:/home/tux");
    repo.fileSystemSource().push_back(fs);

    strigi::httpSourceType hs("http://www.kde.org/");
    repo.httpSource().push_back(hs);

    config->repository().push_back(repo);

    // provide a mapping for the namespace we use
    xml_schema::namespace_infomap map;
    map["s"].name = "http://www.vandenoever.info/strigi";

    // output the object to the standard output iostream
    daemonConfiguration(cout, *config, map);
    return 0;
}
To finish off, here’s the Makefile I used to compile this small test program:
LDFLAGS=-lxerces-c
CXXFLAGS=-Wall -O2
main: main.cpp strigidaemon.cxx
strigidaemon.cxx: strigidaemon.xsd
	xsd cxx-tree --generate-serialization strigidaemon.xsd
clean:
	rm strigidaemon.cxx strigidaemon.hxx main