Web applications and data structures

There is this question that keeps coming back when I see the code of web applications. It keeps coming back, probably because it relates to the basics of our field.

Why do we forget what we learned about data structures the moment we start to write web applications?

The question points at three things. Why do we forget about data structures? Why don't we write better code for web applications? The last point is a passive-aggressive jab at all web developers. And I'm one of them.

This question seems to come up a lot for me, probably because more often than not, I write web applications. Last year I wrote about this problem and I tried to think of a different way to tackle the web application problem. Without much progress, I might add.

Let's take a look at some of the things written in the past about the importance of data structures and algorithms.

Rob Pike wrote in Notes on Programming in C:

Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self evident. Data structures, not algorithms, are central to programming. (See Brooks p. 102.)

And of course Knuth wrote "The Art of Computer Programming", four huge volumes on algorithms and data structures. So they must be important. Even the title of Wirth's book, "Algorithms + Data Structures = Programs", says as much. Without data structures or algorithms we aren't even writing programs.

That's a bit lazy, but I think the most important thing we can learn from this, and we all already know this, is that data structures and algorithms are the basic building blocks of the programs we write.

But if data structures and algorithms are so important, why don't we use more of them in web applications?

There are two reasons for this. First, there is more emphasis on the algorithms, on the how, than on the data structures. And second, many of these algorithms and data structures are implicit.

Let's start with an example of an implicit data structure (in PHP).

    $name    = $_POST['name'];
    $address = $_POST['street'];
    $city    = $_POST['city'];
    db_person_insert($db, $name, $address, $city);

Here the data structure of the person is implicit. There are three variables that are related. Only two things show they're related: the grouping of the variables and the db_person_insert function. There is no mention of something called a person, except in the name of the db_person_insert function.

And now the example of an implicit algorithm (again in PHP).

    $name    = $_POST['name'];
    $address = $_POST['street'];
    $city    = $_POST['city'];
    db_person_insert($db, $name, $address, $city);

You could think I made an error by showing you the same code. I'm sorry to disappoint you. Let me explain. The algorithm contains two steps. (1) Get the person object from the $_POST variable. (2) Insert the person object into the database.

One problem with this code is that the moment you want to change what it means to be a person in your application, you need to make many changes all over. Let's say your form also needs to handle phone numbers. Let's make the change in the code.

    $name    = $_POST['name'];
    $address = $_POST['street'];
    $city    = $_POST['city'];
    $phone   = $_POST['phone'];
    db_person_insert($db, $name, $address, $city, $phone);

I added one line to get the phone number and one parameter to the db_person_insert function, but I also have to check every other place where I call the db_person_insert function. You can make this change for every field that needs to be added, but you need to make a lot of changes in a lot of places. That sucks, but now imagine that the db_person_insert function was actually a little bit of SQL code, sprinkled around your code.

The other problem is that it's not obvious that we are working with people (as in more than one person). This code shows how we do it, but not what we do.
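To make the contrast concrete, here is a minimal sketch of the same two steps with an explicit person data structure. It is written in C++ rather than PHP. The names FormData, Person, person_from_form, PersonTable and person_insert are made up for this illustration, and the in-memory table only stands in for a real database.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for PHP's $_POST: field name -> submitted value.
typedef std::map<std::string, std::string> FormData;

// The explicit data structure: everything that makes up a person lives here.
struct Person {
    std::string name;
    std::string street;
    std::string city;
    std::string phone;  // a new field is added here...
};

// Returns the value of a form field, or an empty string if it is missing.
std::string field(const FormData& form, const std::string& key) {
    FormData::const_iterator it = form.find(key);
    return it == form.end() ? std::string() : it->second;
}

// Step 1: get the person from the form data.
// ...and here, the one place that knows the field names.
Person person_from_form(const FormData& form) {
    Person p;
    p.name   = field(form, "name");
    p.street = field(form, "street");
    p.city   = field(form, "city");
    p.phone  = field(form, "phone");
    return p;
}

// Hypothetical in-memory stand-in for the database table.
struct PersonTable {
    std::vector<Person> rows;
};

// Step 2: insert the person. The signature does not change when
// Person gains a field, so existing callers stay untouched.
void person_insert(PersonTable& db, const Person& p) {
    db.rows.push_back(p);
}
```

With this shape, adding the phone number touches the struct and the function that reads the form. A real insert function would also need the new column, but its callers keep passing a Person.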

Especially in web applications, there seems to be no thought at all going into the structure of the data that's used inside the applications. There are some objects, or maybe there is a database, but these things are not the actual data the program operates on.

Code in web applications feels shoddily written, without a kind of bigger picture. For example if you display a product in the interface, why don't you have a data structure representing a product? Or if you have a form for creating a new product, why don't you have a data structure representing that?

Without data structures you have to write certain code again and again. And while this apparently is not a big enough problem to be solved once and for all, I think some of us know there is something wrong.

Without data structures you can't have a group of operations working on a certain kind of data. You also can't reuse code because the data is just different enough.

Let's look at a few more examples. How many times have you written a login form and authentication? Isn't every login screen the same? The user provides a username and password. That could be a data structure. Which user object is identified by these two values? Could that be a function?
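The login case could be sketched as follows, in C++. The structs Credentials and User and the function find_user are hypothetical names for this illustration. The toy_hash function is only a placeholder so the example runs. A real application would use a proper password hashing function.

```cpp
#include <string>
#include <vector>

// What the user types into the login form.
struct Credentials {
    std::string username;
    std::string password;
};

// A stored user account (hypothetical layout).
struct User {
    int id;
    std::string username;
    std::string password_hash;
};

// Placeholder only: NOT a real hash, just enough to make the sketch run.
std::string toy_hash(const std::string& password) {
    return "hashed:" + password;
}

// Which user is identified by these credentials?
// Returns a pointer to the matching user, or nullptr if there is none.
const User* find_user(const std::vector<User>& users, const Credentials& c) {
    for (const User& u : users) {
        if (u.username == c.username &&
            u.password_hash == toy_hash(c.password)) {
            return &u;
        }
    }
    return nullptr;
}
```

Once the credentials are a data structure, every login screen can call the same function, no matter what the form around it looks like.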

Or an interface to order certain objects. There is a structure to ordering things that transcends the type of objects. For example if I want to order images, or photo albums, or comments, or videos, or products, or menu items. The way to do this is always the same. If the data is in memory we know how to do this.

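    // C++, compile with:
    //    gcc -o test test.cpp -Wall -std=c++0x -lstdc++

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <utility>
    #include <vector>

    using namespace std;

    int main()
    {
        // Four items in their original order.
        vector<int> numbers = { 1, 2, 3, 4 };
        // Reorder by swapping the first two items.
        swap(numbers[0], numbers[1]);
        // Print the items, one per line.
        copy(numbers.begin(), numbers.end(),
                ostream_iterator<int>(cout, "\n"));
        return 0;
    }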

If I want to change the order of items in a vector, I call the swap function. Other languages change the order with different functions.

But somehow when we start to create web applications we need to create a new solution for this. It seems almost as if there is a need for people to be original in every line of code they write. Maybe there is something called False Originality, in the tradition of False Laziness and friends. Or maybe it's "Not Invented Here".

If we don't have a data structure, we can't write a function that modifies that data. So instead of creating more general functions, we write functions that call other functions until we get to the bottom of our software stack, which could be the database or the file system.

Without the interface of your data structure to write a function for, there is no way you can write code without duplication. And without a similar looking interface you can't even see the duplication that's there. Without the data structure we can't say that certain things are similar. We can't describe it. We have no place to write the code.

For example, I want to create an interface that allows the user to reorder photos. The web interface could be really simple for this example. Every item has an id and a priority. Sort the ids according to the priorities.

The web framework calls a controller method. This controller method parses the arguments. We can write a general function that parses the arguments and creates a list of number pairs: [ (id, prio)... ]. I hope this notation makes sense. It describes one pair with two fields, id and prio, contained in a list of these pairs. The parser function returns that list and another function will write this order to a database, one at a time.
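A minimal sketch of such a parser, in C++. It assumes the framework hands the arguments over as pairs of strings (id text, priority text). That input format and the names OrderEntry, RawArgs, parse_order and sorted_ids are assumptions made for this example.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// One (id, prio) pair from the list [ (id, prio)... ].
struct OrderEntry {
    int id;
    int prio;
};

// Assumed input format: each argument is (id text, priority text).
typedef std::vector<std::pair<std::string, std::string> > RawArgs;

// Parses the raw arguments into a list of (id, prio) pairs.
// std::stoi throws std::invalid_argument on non-numeric text.
std::vector<OrderEntry> parse_order(const RawArgs& args) {
    std::vector<OrderEntry> order;
    for (const auto& arg : args) {
        OrderEntry e;
        e.id = std::stoi(arg.first);
        e.prio = std::stoi(arg.second);
        order.push_back(e);
    }
    return order;
}

// Orders entries by ascending priority.
bool by_prio(const OrderEntry& a, const OrderEntry& b) {
    return a.prio < b.prio;
}

// Returns the ids sorted according to their priorities.
// stable_sort keeps the input order for entries with equal priority.
std::vector<int> sorted_ids(std::vector<OrderEntry> order) {
    std::stable_sort(order.begin(), order.end(), by_prio);
    std::vector<int> ids;
    for (const OrderEntry& e : order) {
        ids.push_back(e.id);
    }
    return ids;
}
```

Nothing in this code knows about photos. The same two functions work for albums, comments or menu items.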

The code that reorders items could be used for every table that has a primary key and a priority field. But before that can work, we need ways to describe data structures in the code. If we don't have data structures, we can't write general algorithms to work with them.
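As a sketch of what that general code might look like, the C++ below describes a table by its name, key column and priority column, and produces one UPDATE statement per item. OrderedTable, OrderEntry and order_updates are hypothetical names, and the table and column names are assumed to come from the program itself, never from user input. Real code would more likely use prepared statements than build SQL strings.

```cpp
#include <string>
#include <vector>

// One (id, prio) pair: the new priority for the row with this key.
struct OrderEntry {
    int id;
    int prio;
};

// Describes any table that has a primary key and a priority field.
struct OrderedTable {
    std::string table;
    std::string key_column;
    std::string prio_column;
};

// Builds one UPDATE statement per entry, to be executed one at a time.
// Only integers are interpolated as values; identifiers come from code.
std::vector<std::string> order_updates(const OrderedTable& t,
                                       const std::vector<OrderEntry>& order) {
    std::vector<std::string> statements;
    for (const OrderEntry& e : order) {
        statements.push_back(
            "UPDATE " + t.table +
            " SET " + t.prio_column + " = " + std::to_string(e.prio) +
            " WHERE " + t.key_column + " = " + std::to_string(e.id));
    }
    return statements;
}
```

Reordering photos and reordering menu items then differ only in the OrderedTable value that is passed in.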

Without data structures and general algorithms, we need to create a new solution for this problem every time. Maybe even multiple times in the same codebase.

The reason is that we don't look further than the libraries and frameworks that are handed to us. In the person example above we use the $_POST variable to get the values. This works, and there is no reason to look further: the code is already very simple.

The problem is that if we don't look further, we don't evolve the craft and we have to solve the same problems over and over again. The even bigger problem is that if we don't create data structures to work with, we can't even see the way out of the mess we're in.

Update: Added a new article about the PHP example.


My name is Peter Stuifzand. You're reading my personal website.