Peter Stuifzand

I released a small C++ utility library

I just released a small C++ utility library. At the moment it contains three modules: a URL class, a config file reader, and a testing framework.

The code can be found at http://code.peterstuifzand.nl/cgi-bin/gitweb.cgi

or use git to clone the project:

git clone http://code.peterstuifzand.nl/git/cpputil.git

The URL class is used in a search engine that I am currently writing. It doesn’t do much more than I need at the moment, and I want to improve it.

The config file reader reads a config file and returns a std::map. A config file could look like:

url.protocol=http

The key and value are separated by an equals sign. You can also include comments by putting a # as the first character of a line.

The testing framework is based on the Test Anything Protocol (TAP), the text-based protocol used by Perl's test tools. The nice thing is that you can use tools that are normally used to test Perl code, for example:

$ prove --exec '' ./test_url

This command will execute the prove command on a test written in C++. At the moment this has the following output:

./test_url....ok
All tests successful.
Files=1, Tests=37,  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)
Result: PASS

As you can see I have written 37 tests for the URL class.

I will try to release a C++ HTML parser in the next few days. This parser is based on the specification of the HTML5 parser. It is not complete (not all states are implemented) and it only works with a string containing the whole HTML document.

The states that are not implemented are mainly related to the parsing of the doctype. Currently the doctype is skipped when found. I have already used the parser to parse 7007 unique HTML files without problems. I haven’t run it on pages from the wider web, and at the moment I’m not planning to.

I have to say that parsing HTML this way is really nice. The code for finding all URLs is only four lines (not counting class and method declarations, curly braces, and index generation):

if (tag.get_name() == "a") {
    for (std::vector<Attr>::const_iterator it = tag.get_attrs().begin(); it != tag.get_attrs().end(); ++it) {
        if (it->get_name() == "href") {
            std::string url = base.relative(URL(it->get_value())).to_string();
            // ...
        }
    }
}

This code uses the URL class that I mentioned before. It could still be a lot smaller if the Tag class were improved.
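One possible improvement, sketched here purely as an illustration, is a lookup helper so that callers don't have to loop over the attributes themselves. The Attr and Tag classes below are stand-ins I made up so the example compiles on its own; only get_name(), get_value() and get_attrs() come from the snippet above, and find_attr is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Stand-in types for illustration; the real Attr and Tag classes may differ.
class Attr {
public:
    Attr(const std::string& name, const std::string& value)
        : name_(name), value_(value) {}
    const std::string& get_name() const { return name_; }
    const std::string& get_value() const { return value_; }
private:
    std::string name_;
    std::string value_;
};

class Tag {
public:
    explicit Tag(const std::string& name) : name_(name) {}
    void add_attr(const Attr& attr) { attrs_.push_back(attr); }
    const std::string& get_name() const { return name_; }
    const std::vector<Attr>& get_attrs() const { return attrs_; }
private:
    std::string name_;
    std::vector<Attr> attrs_;
};

// Hypothetical helper: return the first attribute with the given name,
// or a null pointer if the tag has no such attribute.
const Attr* find_attr(const Tag& tag, const std::string& name) {
    const std::vector<Attr>& attrs = tag.get_attrs();
    for (std::vector<Attr>::const_iterator it = attrs.begin(); it != attrs.end(); ++it) {
        if (it->get_name() == name) return &*it;
    }
    return 0;
}
```

With a helper like this, the link extraction reduces to checking that the tag name is "a", calling find_attr(tag, "href"), and resolving the value against the base URL if the result is not null.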

© 2023 Peter Stuifzand