
How to get something for nothing

Or at least, a lot for a little

Everyone knows that the object of Perl modules is to make life easier for the programmer: to reduce the amount of code they end up writing. More correctly, you can think of CPAN modules as reducing the amount of auxiliary code in your programs, leaving you free to get on with the specific algorithms you wish to implement.

In this article, I'm going to embrace and extend Mark-Jason Dominus' concept of "structural code". When Mark talks about structural code in his Red Flags tutorial, he means code that doesn't get you any closer to doing what you want to do, but is required to keep the compiler happy or the code looking sane. For instance, in

   sub remove_duplicates {
       my @list = @_;
       my %seen;
       return grep { !$seen{$_}++ } @list;
   }

most of the code is structural. The first three lines of that code do not do anything towards removing duplicates from a list - they fulfil no functional role, merely a structural one. This implementation is better, but still contains a lot of structural code that you can't avoid when you're Perl programming:

   sub remove_duplicates {
       my %seen;
       grep { !$seen{$_}++ } @_;
   }
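
As a quick sanity check, here is that shorter version run against a small list. The sample words are invented for illustration; the thing to notice is that grep keeps the first appearance of each element and preserves the original order.

```perl
use strict;
use warnings;

sub remove_duplicates {
    my %seen;
    # Keep an element only the first time we meet it.
    return grep { !$seen{$_}++ } @_;
}

my @unique = remove_duplicates(qw(perl cpan perl pod cpan));
print "@unique\n";    # prints "perl cpan pod"
```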

As I said, I'm going to extend that concept; in this article, structural code is anything which is generic to programming and is not essential to the specific algorithms and functionality of the program you're writing.

I'd like to introduce four Perl modules - three of mine, and one originally written by Michael Schwern - and show how they can be combined to reduce the amount of structural code in an application to near zero. We'll first take a brief look at the four modules, then we'll show how they worked in a recent application of mine.

Config::Auto

Almost every single application needs to store a user's configuration settings. The end result is that every single application generally includes some code for dealing with whatever configuration format they choose. This is a prime candidate for modularisation, and indeed there are a number of modules which can deal with various formats: XML::Simple for the ever-popular XML, Config::IniFiles for Windows-style INI, and several others. For Unix applications, the standard configuration formats are either a variant of key = value: (from lynx)

       # all cookies.
       accept_all_cookies=off

       # bookmark_file specifies the name and location of the default bookmark
       # file into which the user can paste links for easy access at a later
       # date.
       bookmark_file=lynx_bookmarks.html

Or colon separated: (as in /etc/group and friends)

       nobody:*:-2:
       nogroup:*:-1:
       wheel:*:0:root
       daemon:*:1:root

Or maybe space-separated: (this one from gltron, a rather enjoyable OpenGL light-bikes game.)

       iset show_help 0
       iset show_fps 1
       iset show_wall 1
       iset show_glow 1
       iset show_2d 1

I've had to write code to deal with all of these different formats many, many times over, and finally I gave up: I had what I call a "once and for all moment". I wanted to sit down and crunch out some code which would just handle whatever I threw at it, and know that I would never ever have to tackle this problem again in my Perl programming career. Try it. It's tremendously freeing.

So I wrote Config::Auto, which parses all of the above formats and more besides. The idea is not that it gives the user a complete free-for-all. Ideally, you would specify what format you were prepared to read, what sort of data structure you expected to get at the end of it, and then you would know that the parser would be able to handle it with no additional work needed.

In its most basic use, you would say:

    use Config::Auto;
    my $config = Config::Auto::parse("~/.myapprc", format => "equal");

to parse an equals-separated configuration file such as the .lynxrc above. But if we're trying to avoid extraneous code, why not have the parser work out what sort of configuration file it's been handed?

    use Config::Auto;
    my $config = Config::Auto::parse("~/.myapprc");

And now it'll take a long look at your rc file and determine what format it looks like it's in.
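
To give a feel for what "taking a long look" might involve, here is a minimal sketch of format guessing. It is not Config::Auto's actual heuristic: the function name guess_format, the three format labels and the scoring rule are all assumptions made for this illustration. It simply counts which separator style matches the most non-comment lines.

```perl
use strict;
use warnings;

# Guess which separator style a block of configuration text uses.
# Returns "equal", "colon" or "space" (labels chosen for this sketch),
# or undef if no line matched any style.
sub guess_format {
    my ($text) = @_;
    my %score = ( equal => 0, colon => 0, space => 0 );
    for my $line ( split /\n/, $text ) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        if    ( $line =~ /^\s*[\w.-]+\s*=/ ) { $score{equal}++ }
        elsif ( $line =~ /^[^:\s]+:/ )       { $score{colon}++ }
        elsif ( $line =~ /^\s*\S+\s+\S/ )    { $score{space}++ }
    }
    # The style that matched the most lines wins; ties break alphabetically.
    my ($best) = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    return $score{$best} ? $best : undef;
}

print guess_format("# all cookies.\naccept_all_cookies=off\n"), "\n";    # equal
print guess_format("wheel:*:0:root\ndaemon:*:1:root\n"),        "\n";    # colon
print guess_format("iset show_help 0\niset show_fps 1\n"),      "\n";    # space
```

A real parser has to cope with far messier files than these, which is exactly why it is worth solving once in a module rather than in every application.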

And actually, there's no reason why, assuming standard naming conventions, you should have to tell it where the configuration file is anyway. If your program is called myapp, then it's a reasonable guess that if the user has a ~/.myapprc file, those are the configuration settings. Config::Auto also tries a few other standard locations, to leave us with:

    use Config::Auto;
    my $config = Config::Auto::parse();

Configuration file handled! Structural code: zero. (Well, near zero. Future versions of Config::Auto may well declare and populate a $main::config variable for you on import. But maybe there is such a thing as Too Much Magic.)
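
The "standard locations" idea can be sketched in a few lines. The list of candidate paths and their order below are assumptions for the sake of illustration, not the list Config::Auto really searches, and the function names are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Build a list of conventional rc-file locations for a program name.
# The exact list and its order are assumptions made for this sketch.
sub candidate_config_files {
    my ( $program, $home ) = @_;
    my $name = basename($program);
    $name =~ s/\.pl\z//;    # "myapp.pl" and "myapp" share settings
    return (
        "$home/.${name}rc",
        "$home/.${name}config",
        "/etc/${name}rc",
        "/etc/${name}.conf",
    );
}

# Return the first candidate that exists, or undef if there is none.
sub find_config_file {
    my ( $program, $home ) = @_;
    for my $file ( candidate_config_files( $program, $home ) ) {
        return $file if -e $file;
    }
    return;
}

print "$_\n" for candidate_config_files( "/usr/local/bin/myapp", "/home/user" );
```

In a real program you would pass $0 and $ENV{HOME} rather than the fixed strings used here.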

Attribute::Persistent

The next area which requires too much code is storing persistent data; as I've mentioned in a previous column, even with AnyDBM_File and MLDBM, handling persistent variables is still a pain in the neck. Attribute::Persistent just takes all the pain away, once and for all, with no tie, no structural code at all.

    use Attribute::Persistent;
    my %hash :persistent; # And that's all.

Persistent storage handled! Structural code: zero.
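
For contrast, this is roughly the kind of code you end up writing by hand without it. This is a sketch of the traditional tie approach using the core SDBM_File module, not a description of how Attribute::Persistent works internally; the bump_counter helper and the temporary directory are there only to make the example self-contained.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/counter";      # SDBM adds its own .pag/.dir suffixes

# Open the store, bump a counter, close the store, return the new value.
sub bump_counter {
    my ($path) = @_;
    tie my %hash, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $path: $!";
    my $count = ++$hash{runs};
    untie %hash;
    return $count;
}

# Each call stands in for a separate run of the program.
print "runs = ", bump_counter($file), "\n";    # prints "runs = 1"
print "runs = ", bump_counter($file), "\n";    # prints "runs = 2"
```

Every line of bump_counter apart from the increment is structural, which is the point of hiding it behind an attribute.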

Getopt::Auto

In a recent comp.lang.perl.moderated thread, it was pointed out that there are a number of things which every novice Perl programmer reinvents, despite there being perfectly round wheels out there, and a command line options processing system was one of them. Here I disagree.

There are a number of different styles of command line options: --long, -short, and the CVS-style bare command. In a similar vein to Config::Auto, I wanted a system which handled all of them.

But then I realised there was a more serious problem. If you're implementing something with a similar interface to cvs, a single executable which can perform various commands (although it could be argued that this is not the Unix Way) then you'll end up with a horrific piece of code something like this:

    my $command = shift @ARGV;
    if ($command eq "add") {
       do_add(@ARGV);
    } elsif ($command eq "subtract") {
       do_subtract(@ARGV);
    } ...

The two equally dissatisfying alternatives look slightly better:

    my %commands = ( add => \&do_add, subtract => \&do_subtract, ... );
    my $what = shift;
    if (exists $commands{$what}) { $commands{$what}->(@ARGV) }
    else { do_help() }

Or even

    no strict 'refs';
    my $what = shift;
    &{"do_$what"}(@ARGV);

But they have two problems: first, you still need to handle things like --help and --version separately, and your --help text will generally repeat all the possible arguments over again. Opponents of structural code will know that repetition is to be avoided at all costs: this is a specific case of the Prime Rule of programming and user interface design - "You should never tell the computer anything it already knows or can reasonably be expected to work out." If you're a manager, you might like to contemplate the fact that programmer time is expensive and computer time is cheap. Who should be doing the boring work?

The bigger problem is that all this is structural code once again. Despatching to the appropriate routine is useful, but it's not as useful as actually doing the work of your program. So I had another once-and-for-all moment, and decided that something else should be implementing this structural code. That something else is Getopt::Auto. As you can probably tell, I'm pretty fond of the idea that computers should do things automatically - it is, after all, what they're for.

With Getopt::Auto, you simply declare what commands you're willing to process, maybe give some help text for them, and the module does the rest. For instance:

     use Getopt::Auto (
         [ "--add", "Add two numbers together", \&do_add ],
         [ "--subtract", "Subtract one number from another", \&do_subtract ], 
          ...
     );

With no further code, yourapp --add 3 5 will call do_add(3,5). And, as an added bonus, you get --version and --help free of charge:

    % yourapp --help
    yourapp --help - This text
    yourapp --version - Prints the version number

    yourapp --add - Add two numbers together
    yourapp --subtract - Subtract one number from another

Of course, you may not like GNU-style --long options. Let's try again with CVS-style options and without specifying the subroutines explicitly:

     use Getopt::Auto (
         [ "add", "Add two numbers together" ],
         [ "subtract", "Subtract one number from another" ],
          ...
     );    

This time, yourapp add 3 5 will call add(3,5); help will still work and will now spit out the commands in the new "bare" style. You write the specification, and Getopt::Auto takes care of the rest.
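
If you are wondering how one specification can drive both dispatch and help text, here is a minimal sketch of the idea. It is not Getopt::Auto's implementation; the @spec layout mirrors the three-element form shown earlier, and the dispatch and help_text functions are hypothetical names for this illustration.

```perl
use strict;
use warnings;

# Each entry: [ command name, help text, code reference ].
my @spec = (
    [ "add",      "Add two numbers together",         sub { $_[0] + $_[1] } ],
    [ "subtract", "Subtract one number from another", sub { $_[0] - $_[1] } ],
);

# Help text is generated from the spec, so it cannot drift out of date.
sub help_text {
    my ($program) = @_;
    return join "", map { "$program $_->[0] - $_->[1]\n" } @spec;
}

# Look the command up in the spec and call its handler;
# anything unrecognised falls back to the help text.
sub dispatch {
    my ( $program, $command, @args ) = @_;
    if ( !defined($command) || $command eq "help" ) {
        return help_text($program);
    }
    for my $entry (@spec) {
        return $entry->[2]->(@args) if $entry->[0] eq $command;
    }
    return help_text($program);
}

print dispatch( "yourapp", "add", 3, 5 ), "\n";    # prints 8
print dispatch( "yourapp", "help" );
```

The command names appear exactly once, in @spec, which is the Prime Rule at work.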

Now the more alert of you may well be thinking "but isn't this specification structural code?" Well, yes; I thought of that. What would be really nice is if you could say

      use Getopt::Auto;

and it would just work.

Well, with one proviso, it does.

The proviso is that you must provide POD documentation for each subroutine you want to turn into a command. But of course, all of your subroutines are documented anyway, so that shouldn't be a problem.

So here's our fully-automated calculator example:

    use Getopt::Auto;
    our $VERSION = "1.0";

    =head2 add - Adds two numbers together

       calc add x y

    Adds x and y together and prints the result.

    =cut

    sub add { print $_[0] + $_[1], "\n" }

    =head2 subtract - Subtracts one number from another

       calc subtract x y

    Subtracts y from x.

    =cut

    sub subtract { print $_[0]-$_[1], "\n" }

Now we can say

    % calc --add 3 5
    8

    % calc --help   
    This is calc, version 1.0

    calc --help - This text
    calc --version - Prints the version number

    calc --add - Adds two numbers together[*]
    calc --subtract - Subtracts one number from another[*]

    More help is available on the topics marked with [*]
    Try calc --help --foo

And if we follow its suggestion:

    % calc --help --add
    This is calc, version 1.0

    calc --add - Adds two numbers together

         calc add x y

    Adds x and y together and prints the result.

Options processing and subroutine dispatch handled! Structural code: zero.
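
The trick of turning documentation into a command table can be sketched like this. Again, this is an illustration under my own assumptions rather than Getopt::Auto's real code: commands_from_pod is a hypothetical function, and the POD is held in a string here purely to keep the example self-contained, where a real module would read the program's own source.

```perl
use strict;
use warnings;

# Find "=head2 name - summary" headings and pair each with a sub of that name.
sub commands_from_pod {
    my ( $source, $package ) = @_;
    my @commands;
    while ( $source =~ /^=head2\s+(\w+)\s+-\s+(.+)$/mg ) {
        my ( $name, $summary ) = ( $1, $2 );
        my $code = $package->can($name) or next;    # skip headings with no matching sub
        push @commands, [ $name, $summary, $code ];
    }
    return @commands;
}

sub add      { $_[0] + $_[1] }
sub subtract { $_[0] - $_[1] }

my $source = join "\n",
    "=head2 add - Adds two numbers together",
    "",
    "=head2 subtract - Subtracts one number from another",
    "",
    "=head2 multiply - Documented, but no such sub exists",
    "";

for my $command ( commands_from_pod( $source, "main" ) ) {
    my ( $name, $summary ) = @$command;
    print "calc --$name - $summary\n";
}
```

Only add and subtract are listed, because multiply has a heading but no subroutine behind it.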

Class::DBI

The final module is not one of my own, but it's so efficient at removing structural code in database-backed applications that it absolutely has to be mentioned. Database applications with the DBI are almost breeding grounds for structural code: either you spend a lot of time handling the various select, insert, update and delete calls yourself, or you use some kind of abstraction layer which does some of the work for you.

Class::DBI is like such an abstraction layer, except that in most cases, it does almost all of the work for you. With Class::DBI, you set up one subclass which represents your database:

    package Myapp::DBI;
    use base 'Class::DBI';

and then tell it what your DBI parameters are:

    Myapp::DBI->set_db('Main', 'dbi:mysql:myapp');

No connecting, no disconnecting, no mucking about with handles. But how do you get at the data? Well, what you need is a class for each of the tables you want to play with:

    package Myapp::Person;
    use base 'Myapp::DBI';
    Myapp::Person->table("person");

Next, tell it the columns you're interested in, starting with the primary key:

    Myapp::Person->columns(All => qw( id name department salary ));

and away you go: your class now has create, retrieve and search methods to return Person objects, and you also have accessor methods for each of the columns.

    # 3% raise for all programmers!
    for my $person (Myapp::Person->search({department => "programming"})) {
        $person->salary($person->salary()*1.03);
    }
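
If the accessor methods seem like magic, the general technique is simple enough to sketch without a database. This is not Class::DBI's code: Sketch::Person is a hypothetical class, the data lives in a plain hash, and the columns method below only shows how a list of names can be turned into get/set methods.

```perl
use strict;
use warnings;

package Sketch::Person;    # hypothetical class; nothing here touches a database

# Install one get/set accessor per column name.
sub columns {
    my ( $class, @columns ) = @_;
    for my $column (@columns) {
        no strict 'refs';
        *{"${class}::$column"} = sub {
            my $self = shift;
            $self->{$column} = shift if @_;
            return $self->{$column};
        };
    }
}

sub new { my ( $class, %data ) = @_; return bless {%data}, $class }

__PACKAGE__->columns(qw( id name department salary ));

package main;

my $person = Sketch::Person->new( name => "Alice", salary => 1000 );
$person->salary( $person->salary * 1.03 );    # same call shape as the raise above
print $person->name, " now earns ", $person->salary, "\n";
```

The real module does a great deal more, of course, since its accessors are backed by rows in a table rather than by a hash.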

There are good tricks for handling relationships between tables and between database and non-database objects; I refer you to Tony Bowden's article on Class::DBI for perl.com at http://www.perl.com/pub/a/2002/11/27/classdbi.html.

While this removes most of the rigmarole of handling data in databases, it still violates the Prime Rule: we're having to tell the computer about the columns in our database tables. In the vast majority of cases, the database can tell us what columns it has. Unfortunately, the way it tells us is generally database-specific. So Class::DBI has certain database-specific add-on modules, such as Class::DBI::mysql. (It's only a matter of time before someone combines them all...)

Now we can tell our Myapp::DBI to inherit from this:

    package Myapp::DBI;
    use base 'Class::DBI::mysql';
    ...

and the need to detail the columns goes away.

    package Myapp::Person;
    use base 'Myapp::DBI';
    __PACKAGE__->set_up_table('person');

(Class::DBI folk tend to use __PACKAGE__ instead of repeating the class name; this is slightly related to the Prime Rule. If you ever need to change the class's name, you only want to be changing it in one place.)

But even this is reasonably structural! The computer not only knows what columns it has in its database tables, but it also knows what tables it has. With Class::DBI::Loader, we can get it down to

    use Class::DBI::Loader;
    Class::DBI::Loader->new( dsn => "dbi:mysql:myapp", 
                       namespace => "Myapp");

and now we can use Myapp::Person as before.

So that's database access handled, with very little structural code indeed.

Putting it all together

We've seen four tools which give us a great deal of functionality for very little cost in code. With all of these modules, what we gain in brevity we sacrifice in flexibility; for instance, to make absolutely full use of Class::DBI requires some investment, in terms of tuning access to the columns of each table and declaring the various relationships between columns long-hand.

In the code that I write from day to day, I try to strike a balance; the last thing you really want is classes and variables magicing themselves into existence without your really being aware of them. So, for instance, I don't use Class::DBI::Loader. I prefer to declare each table's class manually.

Well, not exactly manually. That wouldn't be a very good use of my time. Instead, I have a little script which produces an application template: a basis for an application which uses many of the techniques we've seen above. I spend most of my preparation time working out the best database schema, and then I type something like

    appgen PerlBooks

Anyone who bears the scars of the old dBase III+ application generator will recognise the name and the concept; appgen goes away and examines the database and spits out a number of skeleton files which I will turn into my eventual application.

So, first we take the name of the namespace (PerlBooks) and turn it into our database name (perlbooks) and try to use Class::DBI::Loader on that database:

    use Carp;
    use DBI;
    use Class::DBI::Loader;

    my $namespace = shift;
    my $database  = lc $namespace;

    my $loader = Class::DBI::Loader->new(
        dsn       => "dbi:mysql:$database",
        namespace => $namespace,
    );

(The application generator itself doesn't need to be portable to multiple databases - although its output must be! - since, for better or worse, I do all my development on MySQL.)

Now we do a little ugly messing about. First, we want our own copy of the database handle so we can prod the database, and this allows us to ask it for its tables. Instead of repeating the DSN in the DBI connection, we ask $loader what DSN it used:

    my $dbh = DBI->connect( @{ $loader->_datasource } ) or croak($DBI::errstr);
    my %tables = map { $_ => 1 } $dbh->tables;

Now for each table, we want to spit out a module representing that table in the ordinary Class::DBI way:

    foreach my $table (keys %tables) {
        my $class = $loader->_table2class($table);
        my $ref   = $dbh->selectall_arrayref("DESCRIBE $table");

Most of this code is cobbled together from bits of Class::DBI::mysql and Class::DBI::Loader. Here we turn the table name (say, account) into the appropriate class name, PerlBooks::Account using Class::DBI::Loader's built-in method, and then get a description of the database table.
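
The table-to-class mapping is easy to picture with a small stand-in. The rule used here (split on underscores, capitalise each part) is an assumption of this sketch, and table_to_class is a hypothetical function; Class::DBI::Loader's own method may differ in detail.

```perl
use strict;
use warnings;

# Turn a table name into a class name under the given namespace.
# Splitting on underscores and capitalising each part is an assumption
# of this sketch, not a statement about Class::DBI::Loader.
sub table_to_class {
    my ( $namespace, $table ) = @_;
    my $leaf = join "", map { ucfirst( lc $_ ) } split /_+/, $table;
    return "${namespace}::$leaf";
}

print table_to_class( "PerlBooks", "account" ),   "\n";    # PerlBooks::Account
print table_to_class( "PerlBooks", "line_item" ), "\n";    # PerlBooks::LineItem
```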

Now we want to know what the primary key is, so we grep that out of the table's description:

    my ( @cols, $primary );
    foreach my $row (@$ref) {
        my ($col) = $row->[0] =~ /(\w+)/;
        push @cols, $col;
        next unless $row->[3] eq "PRI";
        die "$table has composite primary key" if $primary;
        $primary = $col;
    }
    die "$table has no primary key" unless $primary;

This gives us $primary and a list of columns in @cols.
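
To see the loop at work without a live database, here it is run over some hand-written rows in the shape MySQL's DESCRIBE returns (Field, Type, Null, Key, Default, Extra). The transaction table described here is invented for the example.

```perl
use strict;
use warnings;

# Rows shaped like MySQL's DESCRIBE output:
# [ Field, Type, Null, Key, Default, Extra ]. The table itself is invented.
my $ref = [
    [ "id",      "int(11)",       "NO",  "PRI", undef, "auto_increment" ],
    [ "account", "int(11)",       "YES", "MUL", undef, "" ],
    [ "amount",  "decimal(10,2)", "YES", "",    undef, "" ],
];

my ( @cols, $primary );
foreach my $row (@$ref) {
    my ($col) = $row->[0] =~ /(\w+)/;
    push @cols, $col;
    next unless $row->[3] eq "PRI";    # index 3 is the Key field
    die "table has composite primary key" if $primary;
    $primary = $col;
}
die "table has no primary key" unless $primary;

print "primary: $primary\n";    # prints "primary: id"
print "columns: @cols\n";       # prints "columns: id account amount"
```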

At this point we can write our class:

        my $file = $class; $file =~ s{::}{/}g;
        open OUT, ">$file.pm" or die $!;
        print OUT <<EOF;
    package $class;
    use base '${namespace}::DBI';
    __PACKAGE__->table(q{$table});
    __PACKAGE__->columns( Primary => q{$primary} );
    __PACKAGE__->columns( All     => qw{@cols} );
    EOF

We do something which Class::DBI::Loader doesn't do, which is to guess the has-a relationships in each table. For instance, if we have a column in transaction called account, we guess this is a reference to the primary key in the account table:

        for (@cols) {
            if (exists $tables{$_}) {
                print OUT "__PACKAGE__->has_a($_ => q{".
                    $loader->_table2class($_)."});\n";
            }
        }

This spits out something like

    __PACKAGE__->has_a(account => q{PerlBooks::Account});

Then the account method in our PerlBooks::Transaction will no longer produce a numeric ID but will instead produce a PerlBooks::Account object.
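
Here is the same guess run over sample data, so you can see exactly which columns trigger it. The table and column names are invented, and the class name is built with a plain ucfirst as a stand-in for the loader's own method.

```perl
use strict;
use warnings;

my %tables = map { $_ => 1 } qw( account transaction );
my @cols   = qw( id account amount );    # columns of "transaction"

# A column named after another table is taken to be a foreign key to it.
my @relations;
for my $col (@cols) {
    next unless exists $tables{$col};
    push @relations,
        "__PACKAGE__->has_a($col => q{PerlBooks::" . ucfirst($col) . "});";
}

print "$_\n" for @relations;
# prints "__PACKAGE__->has_a(account => q{PerlBooks::Account});"
```

The guess is only as good as your naming conventions: a column called account_id would not match a table called account.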

Finally our generator finishes off the current class:

        print OUT <<EOF;

    1;
    EOF
        close OUT;
    }

Now we can get onto the main PerlBooks module, which has to load up the others, and any other modules we might want to use:

    open OUT, ">$namespace.pm" or die $!;
    print OUT "package $namespace;\n\n";
    print OUT "use Config::Auto;\n";
    print OUT "use ".$loader->_table2class($_).";\n" for keys %tables;
    print OUT "\n1;\n";
    close OUT;

Our PerlBooks::DBI class is generated next, but this needs to be a little careful. As we've seen, Class::DBI expects the main class which subclasses it to tell it the connection parameters, including the username and password. Typically, though, we don't want to store the username and password in our main program files, so we bring them in from a PerlBooks::Config class:

    open OUT, ">$namespace/DBI.pm" or die $!;
    print OUT <<EOT;
    package ${namespace}::DBI;
    use ${namespace}::Config;
    use base 'Class::DBI';
    __PACKAGE__->set_db('Main','dbi:'.\$${namespace}::Config::dbd.
                        ':'.\$${namespace}::Config::db,
        \$${namespace}::Config::username, \$${namespace}::Config::password);
    __PACKAGE__->autocommit(1);
    1;
    EOT
    close OUT;
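
The backslashes in that here-document are easy to misread, so here is the escaping in isolation. A backslashed dollar sign is written out literally for the generated module to evaluate later, while ${namespace} is interpolated by the generator right away. The $dsn variable in the generated line is just for this demonstration.

```perl
use strict;
use warnings;

my $namespace = "PerlBooks";

# \$ emits a literal dollar sign; ${namespace} is filled in right now.
my $generated = <<EOT;
my \$dsn = 'dbi:' . \$${namespace}::Config::dbd . ':' . \$${namespace}::Config::db;
EOT

print $generated;
# prints: my $dsn = 'dbi:' . $PerlBooks::Config::dbd . ':' . $PerlBooks::Config::db;
```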

Finally, we write out a skeleton version of that PerlBooks::Config class, to be overwritten by the real values of the username and password by our application's installer:

    open OUT, ">$namespace/Config.pm" or die $!;
    print OUT <<EOT;
    package ${namespace}::Config;
    our (\$dbd, \$db, \$username, \$password) =
        ("mysql", "$database", "", "");

    1;
    EOT
    close OUT;

This is as far as I've currently got with the application generator, and already it's saved me a lot of work. But as I look at it now, there's a lot more it ought to do. For instance, it could easily spit out an ExtUtils::MakeMaker-based installation program which prompts for the correct username and password and writes the ::Config module. As Alan Perlis said, "Programs which write programs are the happiest programs of all" - this is a program which writes a program which writes a program!

The other obvious task for my application generator is to spit out the main application file perlbooks, containing at least:

    use PerlBooks;
    use Getopt::Auto;

    ...

But this may be overkill, and I'm currently sufficiently happy with the ability to point my application generator at a database and come out with most of what I need to start writing database-driven application code, itself relatively free from structural code.

In Closing

There are a number of things you could take away from this article. You might want to think that I've created three really interesting modules that you should go and have a look at - but then I know who you'll come to for help with them, so maybe that's not such a good idea.

You might want to take away the Prime Rule - never tell a computer what it already knows or can be reasonably expected to find out for itself. If you do, I promise it'll radically impact the way you think about user interfaces.

You could take away the fact that, with CPAN modules, there may well be More Than One Way To Do It, but there's almost always an easier way.

But what I really want you to take away is that programming really ought to be fun. If you find that your programming is becoming a drudge, see if there isn't a way you can abstract away the drudge, whether there's already a module out there that does it all for you, or whether you should sit down and tackle it in a once-and-for-all moment.

Doing so will free you from banging out code for the sake of code, and allow you to get on with the interesting bit of your job - having ideas, working out the best way to get things done, solving problems - and my fervent hope is that it'll make programming fun for you once again.

This page was last checked for correctness on 2003-02-13. Contact Simon.