People write blog posts. And when you have written a lot of blog posts, there comes a time when it becomes necessary to divide the posts into smaller collections of posts. One way to do this is pagination.

Pagination divides a list of items into pages. Each page has a URI, contains a few items and links to other pages in the list.

There are many places where this method is used. Two examples are search result pages and blogs. Search result pages can contain many, many results, sometimes as many as a few million. Showing all the results is a waste of space and bandwidth, as most people won't even look past the first page.

[Image: Google result page next links]

For blogs this is a little bit different. The posts are in a reverse chronological order, thus starting with the latest post. Sometimes the last ten posts are shown on the same page, sometimes only one post.

A big difference between these two examples is that in search engines, the list is ephemeral. This list doesn't need to be the same every time you look at it. Some results move up and some results move down. Search engines shouldn't even index them, as there is no value in those pages for them.

Blogs, on the other hand, have a lot of value in the older posts. These posts are useful for search engines and all have permanent URIs. But that isn't always the way people find them. Sometimes a person finds a post through an archive page that lists it among other items. The problem is that posts move deeper and deeper into later pages, because the blog orders the posts from new to old.

For example, take a blog with ten posts.

| Page 1                        |
| 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 |

When the author now writes a new post, all the posts move one position to the right.

| Page 1                         | Page 2  |
| 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 | 1       |

The first post was on page 1, but now moves to the second page. A search engine or user that thought the post was on page 1 now has to find it again, because the URI of the page that contains it has changed. If the author writes even more posts, these posts move as well.

The historical solution

The solution that I describe above was used for a long time, and probably still is, because (1) it's easy to implement and fits the way the pages are generated and (2) each page, except for the last, contains the maximum number of items for a page.

A program that generates the pages for a blog has a reversed list of all the posts on the blog. It loops through the list from the first to the last, starting a new page whenever it has shown some number of posts, for example ten. The program writes the footer of the last page when there are no more posts left to show.
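As a rough sketch of that loop (the function name and the default of ten posts per page are my own choices for illustration), the posts are numbered 1 to 11 like in the tables above:

```python
def paginate_newest_first(posts, per_page=10):
    """Split posts (given oldest first) into pages, newest post first."""
    newest_first = list(reversed(posts))
    # Start a new page every `per_page` posts.
    return [newest_first[i:i + per_page]
            for i in range(0, len(newest_first), per_page)]


ten = paginate_newest_first(list(range(1, 11)))
eleven = paginate_newest_first(list(range(1, 12)))

print(ten)     # one page that holds all ten posts
print(eleven)  # post 1 has moved to a second page
```

Adding the eleventh post is enough to push post 1 onto a new page, which is exactly the moving post problem.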

Other solutions

A better solution takes the moving post problem into account. To solve the problem, we should find another way to divide the list of posts into different pages.

One way to divide the pages is by grouping the posts by a value that doesn't change, for example the combination of the year and month of the creation date of the post. You could create a page for each month of posts. Whether this works depends on the number of posts written each month, because you don't want more than about ten or twenty posts on one page.
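A minimal sketch of the year-month grouping could look like this. The post structure (a dict with a title and a created date) and the "YYYY-MM" page key are assumptions for the example, not how any particular blog engine stores its posts.

```python
from collections import defaultdict
from datetime import date


def group_by_month(posts):
    """Map a 'YYYY-MM' key to the titles of the posts created in that month."""
    pages = defaultdict(list)
    # Newest first, so every month page is in reverse chronological order.
    for post in sorted(posts, key=lambda p: p["created"], reverse=True):
        pages[post["created"].strftime("%Y-%m")].append(post["title"])
    return dict(pages)


posts = [
    {"title": "First", "created": date(2010, 1, 5)},
    {"title": "Second", "created": date(2010, 1, 20)},
    {"title": "Third", "created": date(2010, 2, 3)},
]
archive = group_by_month(posts)
print(archive)
```

A new post only ever changes the page of the current month, so the pages for older months keep their content and their URI.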

A third solution would be to create the pages from the first post to the last. This way a post always stays on the same page, because its index in the list doesn't change. The problem with this solution is that the homepage contains, nine out of ten times, fewer items than the other pages.
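Sketched in code, with hypothetical function names and ten posts per page as an assumption, the third solution comes down to counting from the oldest post:

```python
def paginate_oldest_first(posts, per_page=10):
    """Page 1 holds the oldest posts, so a post never changes page."""
    return [posts[i:i + per_page] for i in range(0, len(posts), per_page)]


def page_of(post_index, per_page=10):
    """1-based page number for the post at a 0-based index in the list."""
    return post_index // per_page + 1


pages = paginate_oldest_first(list(range(1, 12)))
# The homepage shows the last (newest) page, newest post first.
homepage = list(reversed(pages[-1]))

print(pages)
print(homepage)
```

With eleven posts the homepage shows only post 11, which is the downside mentioned above.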

The first three solutions all work for a statically generated blog. The fourth solution only works on a dynamically generated blog. Twitter uses this fourth solution, which I'll call the More solution.

[Image: Twitter more button]

First we show a list of the first ten items of the blog. At the end of the list we show a link or button with the text More. Clicking this link loads the next ten items from the list of posts. This works because the More button carries the timestamp or id of the last item in the list, and clicking it loads the next ten posts that have an id or timestamp smaller than that of the current last post.
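The lookup behind such a More button could be sketched like this. The function name, the `before_id` parameter and the in-memory list of posts are assumptions for the example; a real blog would run the same comparison as a database query.

```python
def load_more(posts, before_id=None, limit=10):
    """Return the next `limit` posts with an id smaller than `before_id`,
    plus the id to put in the next More link (None when nothing is left)."""
    candidates = sorted(
        (p for p in posts if before_id is None or p["id"] < before_id),
        key=lambda p: p["id"],
        reverse=True,
    )
    page = candidates[:limit]
    has_more = len(candidates) > limit
    next_cursor = page[-1]["id"] if page and has_more else None
    return page, next_cursor


posts = [{"id": n} for n in range(1, 26)]
first, cursor = load_more(posts)
second, cursor2 = load_more(posts, before_id=cursor)
third, cursor3 = load_more(posts, before_id=cursor2)

print([p["id"] for p in first], cursor)
print([p["id"] for p in third], cursor3)
```

Because the link is based on an id and not on a page number, the same request keeps returning the same older posts, no matter how many new posts are added on top.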

When a search engine finds these links, it creates more search results than in a model where each post can only be added to one page. In this model each post can be on as many as ten pages at any given time, depending on how often a search engine (or user) finds a link to such a page.

Conclusion

Each solution works best in a different situation. I prefer blogs that use the year-month approach for splitting up the pages, because the posts are split in a natural way.

In a search engine, however, or on other ephemeral pages, the More approach is better, because most people don't want to go deeper into the results, but if they do, they can use the More button.

It becomes more and more apparent that we need to work on software and hardware that will allow us to run our own versions of the services that we use online. We need a kind of home server computer that acts like a telephone. Every home needs one and (I'm predicting) will have one in ten years.

This presentation by Eben Moglen about Freedom in the Cloud talks about how we, as geeks and software developers, can accomplish this goal a bit faster. It really isn't that hard. We should start building these home server devices, iterate, and search for useful features that help people find what they need and show how such a device can help them in their lives.

Danny O'Brien gave a talk about a similar idea at OpenTech 2008, called Living on the Edge.

Inspired by Kent Beck's phrase "Beneficially Relating Elements", I started out creating a built-in weblog for my webshop platform. The nice thing about the idea is that it helps you find relations between elements that you already have.

I started with the question "What is a weblog?" A weblog is a chronological list of pages. By answering it this way, I can reuse two elements that are already supported in the webshop: collections and pages.

Collections are lists of products and other collections. Nothing more, nothing less. By increasing the scope of collections a little bit, I can also include pages.

A page is a piece of text that can be shown in the webshop. It has a URL. By making a weblog post a page, I can reuse all the infrastructure of pages for weblog posts. This includes creating, editing, saving and showing. The only thing missing from a page is the creation date, which is needed to sort the weblog posts chronologically.

The other two things that simplified the weblog feature are plug-ins and routes. Plug-ins are small packages of code that are loaded at the start of a request and hook into the rendering, loading and saving code.

Routes map URLs to controllers. The latest release made it possible to create routes based on regular expressions, and all URLs are now parsed this way. This allowed me to use the names of the pages in the URLs.

By relating the pieces, I created a new feature that is useful in itself, without having to write a lot of new code. Now when I add 'comments' as a feature to the weblog, I will also automatically add comments as a feature to pages, because they are the same thing.

Two features for the price of one. I like it.

The next feature I will talk about is the given/when construct, which was added in Perl 5.10. It works like switch/case in other programming languages, but is much more powerful. The matching is based on smart matching, which is another feature added in 5.10.

I will start with a simple example to give you an idea of the syntax that is used.

use 5.010;

my $x = <>;
chomp $x;

given ($x) {
    when ([0..99]) {
        say "Looking good";
    }
    when ([100..199]) {
        say "That's a bit much";
    }
    default {
        say "This could be a problem";
    }
}

This code compares the value of $x with the arrays in the when statements. If $x is a whole number between 0 and 99 (inclusive), it will print the text Looking good. If it's between 100 and 199, it will say That's a bit much. The default block will be called when the value isn't matched by any of the when blocks.

Next I will give a more useful example, but not much more.

use 5.010;

my ($x, $y) = (0,0);

LINE: while (<>) {
    my @parts = split /\s+/;

    for (@parts) {
        when (/^x(\d+)/) {
            $x = $1;
        }
        when (/^y(\d+)/) {
            $y = $1;
        }
        when (/^p/) {
            say $x + $y;
        }
        when (/^q/) {
            last LINE;
        }
    }
}

This example reads lines of tokens from STDIN, matches each token and executes code based on it. In effect it's a small programming language. Notice that this code doesn't use the given statement. It's not needed here, because the for already assigns each element of @parts to $_.

It's also possible to use simple expressions like you would use in an if statement. For example:

use 5.010;

my $age = <>;
chomp $age;

given ($age) {
    when (!/^\d+$/) {
        say "Not a number";
    }
    when ($_ > 100) {
        say "That's quite old";
    }
    when (18) {
        say "Now your life begins...";
    }
    when (0) {
        say "Just born, and already using the computer.";
    }
    default {
        say "I have nothing useful to say about '$age'";
    }
}

As you can see, when is quite smart about what to do with different expressions. The first when clause contains a negated regular expression, which is matched as $age !~ m/REGEX/. The second one does what you expect. The 18 and 0 clauses match using $age == 18 and $age == 0. You should watch out when comparing to 0, because a numeric comparison also matches empty strings and strings that don't look like a number. For example, if $age = 'hello', a bare when (0) will match. In this example the first clause has already rejected such input before the when (0) clause is reached.

Smart matching is really powerful. With given and when it's easy to use this power for deciding what to do with the value that you've been given. You should take a look at the manual for more information about the possible smart matches and the things you can do with given and when.

When I'm programming, I sometimes need to use a problem solving pattern that Kent Beck called Parallel. The pattern tells us that when you're making a change, you should also leave the old way of doing things in the program, so you can gradually move to the new solution. This helps in situations where you can't just flip a switch to migrate.

The problem with this, however, is that you do have to migrate all the way to the new solution, because otherwise you end up with twice the code and with data in various states.

When I'm using Parallel, I like to keep an eye on the progress and I don't like to go back to the old way. To help me with this, I created a unit test that monitors the progress I make while changing the program and the data.

It would be nice to have a test module that keeps track of this, but at the moment I only have a small test program. It does three things:

  1. Read the previous count (of old way occurrences).
  2. Check if the current count is smaller than or equal to the previous count.
  3. If true, the test succeeds and writes the current count to a file. If false, the test fails.

This way the test will only succeed if I make improvements or if the code stays the same. The number of old way occurrences can only go down.
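The three steps could be sketched like this. It is not my actual test program: the function name and the file name are made up for the example, and I assume that a first run without a stored count simply passes and records the baseline.

```python
import os
import tempfile


def ratchet(current_count, state_file):
    """Pass only if the count did not grow; store the count on success."""
    previous = None
    if os.path.exists(state_file):
        with open(state_file) as fh:
            previous = int(fh.read().strip())
    # Assumption: without a previous count the first run always passes.
    if previous is not None and current_count > previous:
        return False
    with open(state_file, "w") as fh:
        fh.write(str(current_count))
    return True


state = os.path.join(tempfile.mkdtemp(), "old_way_count.txt")
results = [
    ratchet(12, state),  # first run, records the baseline
    ratchet(9, state),   # fewer old way occurrences
    ratchet(9, state),   # unchanged
    ratchet(10, state),  # the old way crept back in
]
print(results)
```

A failing run does not overwrite the stored count, so the bar stays at the best result so far.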
