The writings of Peter Stuifzand

Weblog: web

The use cases of Camlistore are all very interesting. I'm interested to see how this all pans out.

Camlistore seems like it will be a really cool project when it gets farther along. Or actually it seems really cool already, but it's not usable at the moment. Something to take a look at sometime in the future.

This week I created a way for me to post small messages, as there is not yet a good decentralized way to make this happen. One part of this research is how I can create, host and share RSS feeds without using too much bandwidth and server time.

One way we can do this is by using the features of the HTTP protocol. One of those features is the Last-Modified header. This header allows web servers and user agents to see when a resource was last modified. Together with an If-Modified-Since header, we can let our software check whether it needs to send the whole body or just a simple 304 (Not Modified) response.

However, to make sure this works as advertised, we need a way to simulate this situation. I use lwp-mirror to test this, which is included with the LWP package for Perl. It mirrors a remote resource to a local file.

Let's start the test. First download your resource to a file with the lwp-mirror command.

lwp-mirror <url> <local_file>

Now download the file again and check that the server sends a 304 code this time. Then do something that changes the resource on the server. In my case this is done by adding a new post. Now download the resource again and see if the server sends a new version of the file. Once that is downloaded, download it one more time and check that you get another 304 code.
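The decision the server has to make can be sketched as follows. This is a minimal sketch and not the code behind this weblog: the function name conditional_response and its arguments are made up for the illustration, it only understands the usual HTTP date format, and it uses the Time::Piece module that ships with Perl.

```perl
use strict;
use warnings;
use Time::Piece;

# The preferred HTTP date format, e.g. "Thu, 01 Jan 1970 00:16:40 GMT".
my $HTTP_DATE = '%a, %d %b %Y %H:%M:%S GMT';

# Hypothetical helper: returns (status, headers, body) for a resource that
# was last modified at $last_modified_epoch (seconds since the epoch).
sub conditional_response {
    my ($last_modified_epoch, $if_modified_since, $body) = @_;

    my $headers = {
        'Last-Modified' => gmtime($last_modified_epoch)->strftime($HTTP_DATE),
    };

    if (defined $if_modified_since) {
        # strptime dies on input it cannot parse; treat that as "no header".
        my $since = eval {
            Time::Piece->strptime($if_modified_since, $HTTP_DATE)->epoch;
        };
        if (defined $since && $last_modified_epoch <= $since) {
            return (304, $headers, '');    # the client's copy is still fresh
        }
    }

    return (200, $headers, $body);         # send the whole body
}
```

A client that sends no If-Modified-Since header, or one with an older date than the resource, gets the full body; a client whose date is at least as new as the resource gets the empty 304 response.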

Until today I couldn't use variables in my templates that are pieces of code. I added a few lines that execute a piece of code found in the stash and use its return value. In the template it looks like this.

[% FOR p IN products %]
    <p>[% p.name %]
[% END %]

There are two places in this piece of code that could contain code references. The first is products. This could be implemented as follows.

my $stash = {
    products => sub { my $db=shift; return $db->ProductList(); },
};

Here I show the implementation of the template evaluation code.

sub find_value_in_stash {
    my ($db, $stash, $name) = @_;

    my $it = $stash;

    for my $p (split /\./, $name) {
        $it = $it->{$p};

        if (ref($it) eq 'CODE') {
            $it = $it->($db);
        }
    }
    return $it;
}

This code doesn't contain the error-checking code that's necessary for a production environment. It allows us to add variables to the stash without knowing the value when we add them. The nice thing is that we don't need to execute the potentially expensive code for retrieving all the products from the database unless the template actually uses it.

By adding a small feature like this to the templating system, we can write simpler controller code. The controllers don't need to retrieve all the information from the database if it isn't used. If the variables are used in the template, then the values will automatically be loaded by the templating engine.

The second place in the template where we can use code is in the second line, where we get the name field. This field could be a value in a hash. On the other hand, it could be a method on the object p. By changing one line in the find_value_in_stash function we can use objects as well as simple hash values in templates.

The line that moves to the next value in the stash needs to be changed to the following. The line

        $it = $it->{$p};

becomes

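        # Only blessed objects can be asked for methods; calling can() on a
        # plain hash reference would die. Needs: use Scalar::Util 'blessed';
        if (blessed($it) && (my $meth = $it->can($p))) {
            $it = $it->$meth();
        }
        else {
            $it = $it->{$p};
        }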

This change allows us to use methods on objects. It enables us to write code in classes that is executed when needed, instead of at the moment the controller builds the parameter hash.
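Put together, the lookup with both features can be sketched like this. The Product class and the stash contents are made up for the illustration, and the counter only exists to show that the code reference runs when the template asks for the value and not before.

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

# Hypothetical product class, used only for this illustration.
package Product {
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub name { my $self = shift; return uc $self->{name} }
}

sub find_value_in_stash {
    my ($db, $stash, $name) = @_;

    my $it = $stash;

    for my $p (split /\./, $name) {
        # Prefer a method when the current value is an object that has one,
        # otherwise fall back to a plain hash lookup.
        if (blessed($it) && (my $meth = $it->can($p))) {
            $it = $it->$meth();
        }
        else {
            $it = $it->{$p};
        }

        # A code reference is executed only now, when its value is needed.
        if (ref($it) eq 'CODE') {
            $it = $it->($db);
        }
    }
    return $it;
}

my $calls = 0;
my $stash = {
    title   => 'Shop',
    product => sub { $calls++; return Product->new(name => 'lamp') },
};

print find_value_in_stash(undef, $stash, 'title'), "\n";         # Shop
print find_value_in_stash(undef, $stash, 'product.name'), "\n";  # LAMP
```

Looking up title never touches the product code reference; it runs once, for the product.name lookup.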

To be clear, Template::Toolkit provides both these features. I have written and seen a few web applications, and most of them didn't create or use many objects, because there was a tendency to think of objects as being slow and using a lot of memory. I do think we should watch out for creating many unused objects or loading many rows from a database, because it can slow down your web application a lot. But I consider not using method calls and sub references here a form of premature optimisation.

In the last two essays we established that there is a dispatcher, multiple controllers and multiple actions. The dispatcher creates a controller and calls the action. Why do we split the application into these parts?

First the Dispatcher. The Dispatcher applies rules to a URL and chooses the corresponding Controller and Action. The Controller is a container for actions and applies some default values to the action. The Action contains the code that's necessary to perform some transformation on the application data.

By structuring the design like this, the only task for the Controller is as a container for actions. We group each Action together with semantically similar actions: actions that apply to the same type of information. For example, the Guestbook controller contains two actions, list and add_entry. The first action shows a list of guest book entries, the second action adds a new entry to the list. On the surface it makes sense to group Actions like this in the controller. The actions in the controller perform operations on the same data type, e.g. the list of guest book entries.

If we begin with Actions instead, then we can structure the application in another way. Each type of Action gets its own class, e.g. the action show is performed by the ShowAction class. The ShowAction contains the code for showing one data item. It could be possible to generalize the code for every type in the system. And like the ShowAction we can also create an EditAction and an UpdateAction.

The controller-based structure enables us to spell out every part of the computation, from beginning to end. We're free to do whatever we want. The action-based structure forces us to become more structured programmers. The action can't contain any specific code. We'd have to write Actions for every action in the system.

I think it's better if we specify and generate code for the specific differences and let the general case be handled by the Action class. On the other hand, couldn't we apply these lessons to a Controller-based model?

Yesterday we looked at the structure inside of two simple controller actions. Now let us look at the outside. The structure of the two actions can be shown as a tree.

  • Guestbook
    • list
  • Orders
    • list

I hid the rest of the methods that would normally be in these controllers. In front of these controllers there is another class, called the Dispatcher. This Dispatcher takes a request and calls the appropriate controller and action.

The Dispatcher translates a URL into two pieces of information: the controller and the action. The Dispatcher then translates the controller name into a class name like this.

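# Split the URL into controller, action and an optional numeric id.
my ($controller, $action, $id) = ($url =~ m{^/(\w+)(?:/(\w+)(?:/(\d+))?)?});
# Turn the controller name into a class name and load that class.
my $classname = 'AppName::Controller::' . ucfirst $controller;
eval "require $classname";
# Create the controller object and call the action as a method on it.
my $obj = $classname->BUILD();
$obj->$action();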

This simplified version of the code calls the controller. This contains no error checking, which is really important in a secure web application.
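To give an idea of the kind of checks that are missing, here is a minimal sketch of a dispatcher that refuses everything it doesn't know. The %ALLOWED table, the dispatch function, the default action and the 404 return values are assumptions for this example, and the controller is a stand-in defined in the same file, so no require is needed here.

```perl
use strict;
use warnings;

# Hypothetical controller standing in for a real AppName::Controller::* class.
package AppName::Controller::Guestbook {
    sub BUILD { my $class = shift; return bless {}, $class }
    sub list  { return 'guestbook list' }
}

# Only these controller/action pairs can be reached from a URL.
my %ALLOWED = (
    guestbook => { list => 1 },
);

sub dispatch {
    my ($url) = @_;

    # Anchor the pattern at both ends, so trailing junk is rejected.
    my ($controller, $action, $id)
        = ($url =~ m{^/(\w+)(?:/(\w+)(?:/(\d+))?)?$});
    return (404, 'not found') unless defined $controller;

    $controller = lc $controller;
    $action = 'list' unless defined $action;    # assumed default action

    # Never call a method just because its name appears in the URL.
    return (404, 'not found')
        unless exists $ALLOWED{$controller} && $ALLOWED{$controller}{$action};

    my $classname = 'AppName::Controller::' . ucfirst $controller;
    my $obj = $classname->BUILD();
    return (200, $obj->$action($id));
}

my ($status, $body) = dispatch('/guestbook/list');
print "$status $body\n";    # 200 guestbook list
```

With the table in place, a URL like /guestbook/BUILD no longer calls the constructor as if it were an action, and an unknown controller never reaches the class lookup.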

While I was thinking about writing this article, I started to think about the direction of the calls and which part controls the execution flow. In the example the URL is passed to the Dispatcher, which finds the controller. It looks up the controller class, then looks up the method and calls it. This way the URL and the code are coupled.

The URL determines the class and method that gets called. In this design we can't split up classes, because all controller methods need to be contained in the same class. We can't split or join classes or create smarter software because of this design decision. Furthermore, because the controller is created at the start of a request, we can't use the same controller for different URLs. Each URL needs its own piece of code. I argue that because of these problems we can't even use the techniques we know to improve the design.

We could increase the flexibility of the design by using objects instead of classes. Objects are more flexible: we can replace parts of the system by setting an instance variable to a different object.

To find out how we can redesign the code with this new knowledge, we have to take a look at the structure of more controllers.

I still have a few questions about the structure of the system. Where is the boundary between controllers and actions? Is there a boundary? Do we need controllers at all, or is having actions enough? As always we have to consider more sides of this problem.

Alex Stepanov writes in Notes on Programming:

We often get the idea that a mathematical theory is built in a logical way starting from definitions and axioms. This is not the case. The definitions and axioms appear at the very end of the development of a good theory. It invariably starts with simple facts that later on are generalized into theorems, and only at the very end the formal definitions and axioms are developed.

In an effort to become a better programmer (while at the same time rewriting one of my software programs), we'll look at a piece of web application code that I wrote a few years ago. The language used in these examples is Perl.

The idea is to find a better abstraction, to make it easier to add and change code related to the guestbook (and other controllers), while at the same time finding the underlying abstraction that isn't yet obvious from the code.

The piece of code in question is the list function of the Guestbook controller. It lists some entries, limited by a constant. The function loads the entries from the database using the guestbook_load_entries function.

The webserver and a few other classes parse the URL /guestbook/list and dispatch the request to this function. Inside the function we validate all parameters that are needed.

package WebWinkel::Controller::Guestbook;
use strict;
use base qw/WebWinkel::Controller/;
use WebWinkel::DB::Guestbook 'guestbook_load_entries';

sub list {
    my $self = shift;
    my $page = $self->validate(-as_integer => 'page');
    my $entries = guestbook_load_entries($page, 10);
    return $self->render_template('guestbook/list', {
        page => $page, entries => $entries 
    });
}

Each of the lines in the function performs a small part of the whole action. I don't think it's important to look too much at the syntax. We're interested in the structure of the action.

The first line in the function validates the page parameter. The guestbook_load_entries function retrieves a list of guestbook entries from the database in the second line. In the third line the controller renders a specific template with the parameters we retrieved from the database.

It seems the structure of the process is quite simple.

  1. Validate the query parameters,
  2. Load some database entities using the validated parameters,
  3. Render a template using the database entities.

There are no branches or loops in this piece of code on this level. The template hides the loops we need for rendering a list.

We can simplify the process to a simple graph.

(1) { Validate → Load → Render }

We omit the specifics in this graph, because we're looking for an abstraction. Each node in this graph depends on the values provided by the previous step. The Validate step depends on data outside the controller.

Now it's time to see if the model we just created can be applied to other controllers and their actions. Let's take a look and see if we can find at least one action that satisfies this model.

package WebWinkel::Controller::Orders;
use strict;
use WebWinkel::DB::Orders qw/orders_find_all_open/;
use Date::Simple qw/today date/;

use base 'WebWinkel::Controller';

sub list {
    my ($self) = @_;
    my ($orders, $payed) = orders_find_all_open();
    my $today = today();
    return $self->render_template('orders/list', {
        today        => $today->format("%d-%m-%Y"),
        orders       => $orders,
        payed_orders => $payed,
    });
}

I found this piece of code in the Orders controller. Let's see which steps this function contains.

It starts with the retrieval of the open orders from the database on the first line. The second line gets the current date. The third line renders the template with the data gathered in the previous two lines. The structure of the function looks like this.

{ Load → GatherData → Render }

This shows us that our previous model doesn't completely describe all controllers and actions. In this instance the GatherData step doesn't depend on the Load step. We could switch the two statements and the structure would still be the same. This means we should rewrite this model to include this new piece of information. We'll use the $ to describe a relation between two steps where the first doesn't depend on the second and vice versa. The model now looks like this.

{ { Load $ GatherData } → Render }

The arrow between the first two steps and Render remains, because all data found in those steps is used in that step. This model can also be written as follows

{ { GatherData $ Load } → Render }

because we declare the operator $ as being commutative. We can't yet know if this is an important property of our model, but it describes the examples better. The model is still quite different from the other model.

How could we combine the two models while still being specific about the steps? What do the steps Validate and Load have in common? If we look at the two code examples we see that both steps create information that is used in a later step. A later step doesn't need to be the next step. The step GatherData also satisfies this property.

{ { { Validate → Load } $ GatherData } → Render }

The variables found by Validate are passed into Load, but not specifically into GatherData. How can I say that this model is the same as model (1)?

Let's say step Validate creates zero or more pieces of information. If a step creates zero pieces of information for the next step, then this is the same as not performing the step at all, at least with the current understanding of the model.
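One way to make this model concrete is to write each step as a function that receives everything found so far and returns zero or more new pieces of information. This is only a sketch to play with the model: the run_steps function, the step implementations and the canned data are made up, and the default page in the Load step is an assumption.

```perl
use strict;
use warnings;

# Run the steps in order. Each step receives everything produced so far
# and returns a hash reference with zero or more new pieces of information.
sub run_steps {
    my (@steps) = @_;
    my %data;
    for my $step (@steps) {
        %data = (%data, %{ $step->(\%data) });
    }
    return \%data;
}

# Hypothetical steps that use canned data instead of a database.
my $validate = sub { return { page => 1 } };
my $nothing  = sub { return {} };    # a step that creates zero pieces
my $load     = sub {
    my ($data) = @_;
    my $page = $data->{page} // 1;   # assumed default when nothing was validated
    return { entries => "entries for page $page" };
};
my $gather = sub { return { today => '01-01-2010' } };
my $render = sub {
    my ($data) = @_;
    return { html => "$data->{today}: $data->{entries}" };
};

# { { { Validate -> Load } $ GatherData } -> Render }
my $first  = run_steps($validate, $load, $gather, $render);
# The same model with the two sides of $ swapped.
my $second = run_steps($gather, $validate, $load, $render);
# Validate replaced by a step that creates nothing.
my $third  = run_steps($nothing, $load, $gather, $render);

print "$_->{html}\n" for $first, $second, $third;
```

All three runs render the same text. Swapping the two sides of $ changes nothing, and a Validate step that creates no information behaves as if the step was never there.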

This concludes our first look at the structure of controllers and actions. From experience I know that there are more ways to write a controller. We will leave those for another time, just like the specifics about finding out which functions to call. Maybe we can discover a pattern there as well?

People write blog posts. And when you have written a lot of blog posts, there comes a time when it becomes necessary to divide the posts into smaller collections of posts. One way to do this is pagination.

Pagination divides a list of items into a few pages. Each page has a URI, contains a few items and links to other pages in the list.

There are many places where this method is used. Two examples are search result pages and blogs. Search result pages can contain many, many results, sometimes as many as a few million. Showing all the results is a waste of space and bandwidth, as most people won't even look past the first page.

[Image: Google result page next links]

For blogs this is a little bit different. The posts are in a reverse chronological order, thus starting with the latest post. Sometimes the last ten posts are shown on the same page, sometimes only one post.

A big difference between these two examples is that in search engines, the list is ephemeral. This list doesn't need to be the same every time you look at it. Some results move up and some results move down. Search engines shouldn't even index them, as there is no value in those pages for them.

Blogs on the other hand have a lot of value in the older posts. These posts are useful for search engines and all have permanent URIs. But that isn't always the way people find them. Sometimes a person finds an archive page with more items, that contains the post. The problem with this is that the posts move deeper and deeper into later pages, because the blog orders the posts from new to old.

For example, a blog with ten posts.

| Page 1                        |
| 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 |

When the author now writes a new post, all the posts move one position to the right.

| Page 1                         | Page 2  |
| 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 | 1       |

The first post was on page 1, but now moves to the second page. A search engine or user that thought the post was on page 1, now has to find it again, because the URI has changed. If the author writes even more posts, these posts move as well.

The historical solution

The solution that I describe above was used for a long time, and probably still is, because (1) it's easy to implement and fits the way the pages are generated and (2) each page, except for the last, contains the maximum number of items for a page.

A program that generates the pages for a blog, has a reversed list of all the posts that are on the blog. It loops through the list from the first to the last, starting a new page whenever it has shown some number of posts, for example, ten. The program writes the footer of the last page when there are no more posts left to show.
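The loop described above, and the moving post problem it causes, can be sketched like this. The posts are just numbers, as in the tables earlier, and the function names paginate_newest_first and page_of are made up for the example.

```perl
use strict;
use warnings;

# Split the posts into pages, newest first. @posts is ordered old to new.
sub paginate_newest_first {
    my ($per_page, @posts) = @_;
    my @newest_first = reverse @posts;
    my @pages;
    # splice returns an empty list when there are no posts left.
    while (my @page = splice(@newest_first, 0, $per_page)) {
        push @pages, [@page];
    }
    return @pages;
}

# Returns the number of the page that contains the post, counting from 1.
sub page_of {
    my ($post, @pages) = @_;
    for my $i (0 .. $#pages) {
        return $i + 1 if grep { $_ == $post } @{ $pages[$i] };
    }
    return;
}

my @before = paginate_newest_first(10, 1 .. 10);
my @after  = paginate_newest_first(10, 1 .. 11);

print "post 1 was on page ", page_of(1, @before), "\n";    # page 1
print "post 1 is now on page ", page_of(1, @after), "\n";  # page 2
```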

Other solutions

A better solution takes the moving post problem into account. To solve the problem, we should find another way to divide the list of posts into different pages.

A way to divide the pages is by grouping the posts by a value that doesn't change, for example, the combination of the year and month of the creation date of the post. You could create a page for each month of posts. How well this works depends on the number of posts written, because you don't want more than about ten or twenty posts on one page.
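A minimal sketch of this grouping follows. It assumes every post is a hash with an id and a date in YYYY-MM-DD form; the function name group_by_month and the sample posts are made up.

```perl
use strict;
use warnings;

# Group post ids by the year and month of their creation date.
# Each post is assumed to look like { id => 1, date => '2009-11-28' }.
sub group_by_month {
    my (@posts) = @_;
    my %pages;
    for my $post (@posts) {
        my ($month) = $post->{date} =~ /^(\d{4}-\d{2})/
            or next;    # skip posts without a usable date
        push @{ $pages{$month} }, $post->{id};
    }
    return \%pages;
}

my @posts = (
    { id => 1, date => '2009-11-28' },
    { id => 2, date => '2009-11-30' },
    { id => 3, date => '2009-12-02' },
);

my $pages = group_by_month(@posts);
print "$_: @{ $pages->{$_} }\n" for sort keys %$pages;
# 2009-11: 1 2
# 2009-12: 3
```

A post written in a later month only adds a page for that month; the pages for earlier months keep their contents and their place.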

A third solution would be to create the pages from the first to the last. This way a post always stays on the same page, because its index in the list doesn't change. The problem with this solution is that the homepage contains, nine times out of ten, fewer items than the other pages.
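As a sketch, with numbered posts again and a made-up function name, the page of a post now follows directly from its index in the list:

```perl
use strict;
use warnings;

# Fill the pages from the oldest post to the newest.
# @posts is ordered old to new; the last page is the homepage.
sub paginate_oldest_first {
    my ($per_page, @posts) = @_;
    my @pages;
    for my $i (0 .. $#posts) {
        # The page only depends on the index, which never changes.
        push @{ $pages[ int($i / $per_page) ] }, $posts[$i];
    }
    return @pages;
}

my @pages = paginate_oldest_first(10, 1 .. 11);
print "page 1: @{ $pages[0] }\n";     # page 1: 1 2 3 4 5 6 7 8 9 10
print "homepage: @{ $pages[-1] }\n";  # homepage: 11
```

The example also shows the drawback: with eleven posts the homepage contains only one item.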

The fourth solution only works on a dynamically generated blog. The other three solutions all work for a statically generated blog. Twitter uses this solution, which I'll call the More solution.

[Image: Twitter more button]

First we show a list of the first ten items of the blog. At the end of the list we show a link or button with the text More. Clicking this link loads the next ten items from the list of posts. This works because the More button carries the timestamp or id of the last item in the list, and clicking it loads the next ten posts that have an id or timestamp smaller than that of the current last post.

When a search engine finds these links, it creates more search results than in a model where each post can only be added to one page. In this model each post can be on as many as ten pages at any given time, depending on how often a search engine (or user) finds a link to such a page.
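The lookup behind the More button can be sketched as follows. This is not how Twitter implements it; the function name more, its arguments and the use of plain numeric ids are assumptions for the illustration. A real blog would run the same selection as a database query.

```perl
use strict;
use warnings;

# Returns the next $limit post ids older than $before_id, newest first,
# plus the id to put in the More link (undef when nothing is left).
sub more {
    my ($ids, $before_id, $limit) = @_;

    my @older = sort { $b <=> $a }
                grep { !defined $before_id || $_ < $before_id } @$ids;

    my $has_more = @older > $limit;
    my @page     = $has_more ? @older[0 .. $limit - 1] : @older;
    my $next     = $has_more ? $page[-1] : undef;

    return (\@page, $next);
}

my @ids = (1 .. 25);

my ($page, $next) = more(\@ids, undef, 10);
print "@$page\n";    # 25 24 23 22 21 20 19 18 17 16

($page, $next) = more(\@ids, $next, 10);
print "@$page\n";    # 15 14 13 12 11 10 9 8 7 6
```

Because the link contains the id of the last item shown, a new post at the top of the list doesn't shift what an existing More link returns.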

Conclusion

Each solution works best in a different situation. I prefer blogs that use the year-month approach for splitting up the pages, because the posts are split in a natural way.

In a search engine, however, or on other ephemeral pages, the More approach is better, because most people don't want to go deeper into the results, but if they want to, they can use the More button.

I just released a small program that gets the IP address of your computer. The nice thing is, this service is REST-based.

You can find it at Stuifzand Software Tools.

I just read something that seems really nice. It's even strange that it isn't already used like that. Use an email address-like identifier as pointer to an account. Let me explain.

Email addresses consist of two parts separated by an @-sign. The first part is the username, the second part is the domain name of your email provider. For example, my email address is peter@example.com. The username is peter and the domain name is example.com.

By taking this approach of username@domain we can dream up all kinds of other combinations that could work.

pstuifzand@twitter.com
peter@wijvervelenons.nl
pstuifzand@flickr.com

You see? And these could all point to all kinds of user accounts. This doesn't mean that all these addresses should be email addresses. I also don't say that they shouldn't be. For some things it would make sense, for others maybe it doesn't.

Plack is a Perl toolkit for connecting web applications and web servers:

Plack is the superglue interface between perl web application frameworks and web servers, just like Perl is the duct tape of the internet.

This is interesting and maybe useful in my webshop platform.

Sometimes I have the following ideas about blogging and other writing for the web. The thing is that these ideas will not apply in the same way to other people. I think some of the points can be generalized for other people and applications.

Notes

  • The blog posts on this website are written with Vim and a few scripts through an SSH connection to my server.

  • I would like to use Vim to edit my blog posts.

  • Vim should be able to edit a URL and use GET and PUT to read and write the page.

  • Ideally the GET will only supply the content of the page; the navigation surrounding the content should be added later by the blogging software.

  • Starting a new blog post should be as easy as POSTing to a URL.

  • Just as I use Vim for the text parts of the website, I want to be able to use Gimp to edit pictures.

  • The text of the blog posts can be written in HTML or Markdown (which is what I use at the moment).

  • Navigation should be added later. Content can be edited. The user doesn't have to update headers and footers for each page.

General points

  • All programs should be able to GET, PUT, POST to URLs. DELETE could be implemented in the browser.

  • HTML will stay the main language for publishing content on the internet.

  • The website should add the headers and footers for the user.

Maybe I should link to my Pownce profile.

I just received my Pownce invitation. Now let's see what this is.

At lifehacker I came across this incredible tool for del.icio.us. It is called the del.icio.us Direc.tor. It is a combination of a webapp, the del.icio.us API and Ajax.

This probably is one of the steps we're going to see on the web: web applications that are written on top of web-based APIs. Web-based tools with nice, clean, user-friendly interfaces that do something that's different enough from the actual data provider's website.

Very impressive. We need more of this.
