UNIVERSAL::require
We begin with the UNIVERSAL::require module. This isn't so much related to extensibility itself, but it will be used as a building block for many of the other techniques we'll look at.
UNIVERSAL::require is a simple module and does a simple job. When you need to load some code at run-time - the essence of pluggable design - you find there are several ways to do it in Perl. You can use do or string eval if you know where the code is coming from, but what if you have a module name instead of a file name? You can't use use, because that takes place at compile-time, and you can't use require $module_name, because require with a variable, rather than a bareword, expects a filename, not a module name.
So, if we're trying to programmatically load an extension module at run-time - again, something we'll be doing a lot when developing pluggable software - we end up writing fudges like
eval "require $module_name";
UNIVERSAL::require exists purely to tidy up this very case. It adds a require method to the UNIVERSAL namespace, meaning that we can call require on any class:
My::Module->require;
This method just does the eval "require My::Module" fudge, with a little better error checking, and so we can now say
use UNIVERSAL::require;
$module_name->require;
This is the first step on the road to building our own extensible applications.
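To make the module-name versus filename distinction concrete, here is a minimal sketch that uses only core Perl. The load_module helper is hypothetical and is not how UNIVERSAL::require is implemented; it simply converts the module name into the relative filename that require accepts, and traps failure with a block eval rather than a string eval.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of UNIVERSAL::require): load a module whose
# name is only known at run-time, without resorting to string eval.
sub load_module {
    my ($module) = @_;

    # Turn "My::Module" into the relative filename "My/Module.pm", which is
    # the form require expects when it is given a string instead of a bareword.
    (my $file = "$module.pm") =~ s{::}{/}g;

    # Block eval traps the exception require throws on failure;
    # the error text is left in $@ for the caller to inspect.
    return eval { require $file; 1 } ? 1 : 0;
}

print "File::Spec loaded\n" if load_module("File::Spec");
print "Could not load: $@" unless load_module("No::Such::Module::Here");
```

Returning a plain true or false value keeps the calling code close to the $module_name->require or next idiom used with UNIVERSAL::require.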
Do-it-yourself Pluggability
The Perl module Mail::Miner analyses a piece of email for various features which it stores in a database table - it does this by calling a set of "recognisers", which are its plug-in modules. Here's how we load up the plug-in modules:
use File::Spec::Functions qw(:DEFAULT splitdir);
# Collect every *.pm file in a Mail/Miner/Recogniser directory under @INC
my @files = grep length, map { glob(catfile($_,"*.pm")) }
            grep { -d $_ }
            map { catdir($_, "Mail", "Miner", "Recogniser") }
            exists $INC{"blib.pm"} ? grep {/blib/} @INC : @INC;
# Keep only the first copy of each recogniser
my %seen;
@files = grep {
    my $key = $_;
    $key =~ s|.*Mail/Miner/Recogniser||;
    !$seen{$key}++
} @files;
require $_ for @files;
This is quite horrible, but it's instructive to look at. We're trying to find all the files called Mail/Miner/Recogniser/something.pm in the include path, @INC, and the first @files = statement does this: it adds Mail/Miner/Recogniser to the end of each include path, and checks to see if that's a directory. If it is, then we look for all the *.pm files in that directory.
The blib.pm bit is to be used for testing new recognisers. If we've said use blib somewhere, then we're in a test suite, and we're only interested in the recogniser modules underneath the blib staging directory. This allows us to ensure that we're loading up the new modules instead of already-installed ones. When we say use blib, or indeed any other module, Perl turns the module name into a short filename (blib.pm, say, or Mail/Miner/Mail.pm) and puts this in the %INC hash, with the value being the full path of the module file. Hence, looking in %INC is a good way of telling whether or not a particular module is loaded.
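As a small illustration of the %INC lookup, the sketch below maps a module name to its %INC key and checks whether it is present. The is_loaded helper is a hypothetical name introduced here for illustration; File::Spec is used only because it is a core module.

```perl
use strict;
use warnings;
use File::Spec;    # a core module; loading it records "File/Spec.pm" in %INC

# Hypothetical helper: has this module already been loaded?
sub is_loaded {
    my ($module) = @_;

    # Same mapping Perl applies to module names: "::" becomes "/", plus ".pm".
    (my $key = "$module.pm") =~ s{::}{/}g;

    return exists $INC{$key} ? 1 : 0;
}

for my $module (qw(File::Spec blib)) {
    printf "%-10s %s\n", $module, is_loaded($module) ? "loaded" : "not loaded";
}
```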
Next, we make sure we only have one copy of a given recogniser; this avoids problems when a module is installed in multiple places. Finally, we have a file name, and so we can pass this to require, and Perl will load the module.
So now we have loaded up all the Mail::Miner::Recogniser::* modules that we can find on the system. That's solved one problem. The second problem is, now we have them, what do we do with them? How do they relate to the rest of the system?
The way Mail::Miner opted to do this was to have each of the plugin modules write into a hash when they loaded, and supply metadata about what they do:
package Mail::Miner::Recogniser::Phone;
$Mail::Miner::recognisers{"".__PACKAGE__} =
    {
        title   => "Phone numbers",
        help    => "Match messages which contain a phone number",
        keyword => "phone"
    };
Now Mail::Miner can look at the packages it has available in %recognisers, and call a particular interface on each one of them:
sub modules { sort keys %recognisers };
for my $module (Mail::Miner->modules()) {
    # ...
    $module->process(%hash);
}
This way Mail::Miner can call out to additional installed modules without the author (me) knowing what plug-ins the user (you) has installed. Anyone can write a Mail::Miner::Recogniser::Meeting module, for instance, to attempt to identify meeting location and times in an email. Once it's installed in a Perl include path, it'll be automatically picked up and its process method will be called to examine an email.
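The whole registration pattern fits in one file if we put the host and a plugin side by side. This is only a sketch: My::Host, run_all and the toy phone-number matcher are hypothetical names for illustration and are not taken from Mail::Miner.

```perl
use strict;
use warnings;

package My::Host;    # hypothetical host application

our %recognisers;    # plugin class name => metadata hashref

sub modules { return sort keys %recognisers }

# Ask every registered plugin to examine the text; returns class => result.
sub run_all {
    my ($class, $text) = @_;
    return map { ($_ => $_->process($text)) } $class->modules;
}

package My::Host::Recogniser::Phone;    # hypothetical plugin

# Register on load, in the style of the Mail::Miner recognisers.
$My::Host::recognisers{ "" . __PACKAGE__ } = {
    title   => "Phone numbers",
    keyword => "phone",
};

# Toy matcher, for illustration only.
sub process {
    my ($class, $text) = @_;
    return $text =~ /\b\d{3}-\d{4}\b/ ? 1 : 0;
}

package main;

my %found = My::Host->run_all("Call me on 555-1234");
print "$_: $found{$_}\n" for sort keys %found;
```

The host never mentions the plugin by name; it only sees whatever has written itself into the hash.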
Module::Pluggable
As I said, that's how I used to do it, until Module::Pluggable appeared. We've seen the two problems involved in developing pluggable applications: first, finding the plugins, and second, working out what to do with them. Module::Pluggable helps with the first. It does away with all the nasty code we saw above; now to find all the recognisers installed in the Mail::Miner::Recogniser namespace, I can rewrite my code as follows:
package Mail::Miner;
use Module::Pluggable search_path => ['Mail::Miner::Recogniser'];
This gives me a plugins method which returns a list of class names, just like the modules method did in our original code. If I wanted to make it completely compatible, I could also change the name of the method to modules with the sub_name configuration parameter:
use Module::Pluggable search_path => ['Mail::Miner::Recogniser'],
                      sub_name    => "modules";
This doesn't actually cause the modules to be loaded, so we could say
$_->require for Mail::Miner->modules;
but we can also have the modules method itself load up the plugins, by passing another configuration parameter:
use Module::Pluggable search_path => ['Mail::Miner::Recogniser'],
                      sub_name    => "modules",
                      require     => 1;
This is a drop-in - and much simpler - replacement for all the messing about with paths and @INC we saw earlier; it even handles the test case when blib is loaded. But I haven't replaced Mail::Miner's plugin system with this, and we'll see why later.
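To see what a class-name-returning finder involves, here is a rough core-only sketch of the idea. It is not Module::Pluggable's implementation: find_plugins is a hypothetical helper, it only looks one directory level deep, and it makes no attempt at the blib handling mentioned above.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir);

# Hypothetical helper: return the class names of every *.pm file found
# directly under $namespace in the given include directories.
sub find_plugins {
    my ($namespace, @inc) = @_;
    my @parts = split /::/, $namespace;
    my %seen;

    for my $dir (@inc) {
        my $plugin_dir = catdir($dir, @parts);
        opendir(my $dh, $plugin_dir) or next;    # not every directory has plugins
        for my $entry (readdir $dh) {
            next unless $entry =~ /^(\w+)\.pm$/;
            $seen{"${namespace}::$1"}++;         # de-duplicate across directories
        }
        closedir $dh;
    }
    return sort keys %seen;
}

# @INC may contain code references, so keep only plain directory names.
print "$_\n" for find_plugins("File::Spec", grep { !ref } @INC);
```

The returned names can then be loaded one at a time, for example with $_->require as shown earlier.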
Making callbacks with Class::Trigger
First, though, another CPAN module which handles the second problem - knowing what to do with your plugins when you have them. In Mail::Miner we called a method which was assumed to be defined in all the plugins - whether they wanted it or not.
Sometimes this is the right way to do it, but often an individual plugin will want more control over what it responds to, especially if you're going to be calling your plugins on several different occasions for different things.
In these cases, you might find the CPAN module Class::Trigger a better fit. Class::Trigger allows you to add "trigger points" to your objects or classes to which third parties can attach code to be called.
For instance, if we have a method which displays some status information about an object, we could declare a trigger before printing the information out:
package MyClass;
use Class::Trigger;    # provides add_trigger and call_trigger
sub display_status {
    my $object = shift;
    my $message = $object->status;
    $object->call_trigger("display_status", \$message);
    print $message;
}
Now individual plugin modules can register with the class subroutine references to be called when the trigger is called. For instance, a module might want to modify the message because it's going to be sent out as HTML:
use HTML::Entities;
MyClass->add_trigger( display_status => sub {
    my ($obj, $message) = @_;
    $$message = encode_entities($$message);
});
Notice how we pass in a reference to the message, so that we can modify the message. Another plugin could provide links for all the URIs it finds in the message:
use URI::Find::Simple qw(change_uris);
MyClass->add_trigger( display_status => sub {
    my ($obj, $message) = @_;
    $$message = change_uris( $$message,
        sub { qq{<a href="$_[0]">$_[0]</a>} } );
});
And now we come to a problem - how do we know that the "mark up URIs" trigger is going to be loaded after the "escape HTML entities" trigger? If we can't guarantee the ordering of the two triggers, we could end up with our link tags denatured by the entity escaping.
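The ordering problem is easy to reproduce with a toy trigger mechanism. The sketch below is not Class::Trigger itself: My::Status, its add_trigger and call_trigger methods, and the crude regex-based escaping and linking are all stand-ins written for illustration, so that the example runs with core Perl only.

```perl
use strict;
use warnings;

package My::Status;    # hypothetical class standing in for MyClass

my %triggers;          # trigger name => list of code references

sub add_trigger {
    my ($class, $name, $code) = @_;
    push @{ $triggers{$name} }, $code;
}

# Run the callbacks in the order they were registered.
sub call_trigger {
    my ($self, $name, @args) = @_;
    $_->($self, @args) for @{ $triggers{$name} || [] };
}

sub render {
    my ($class, $message) = @_;
    $class->call_trigger(display_status => \$message);
    return $message;
}

package main;

# Registered first: crude entity escaping.
My::Status->add_trigger(display_status => sub {
    my ($obj, $message) = @_;
    $$message =~ s/&/&amp;/g;
    $$message =~ s/</&lt;/g;
    $$message =~ s/>/&gt;/g;
});

# Registered second: crude URI linking.
My::Status->add_trigger(display_status => sub {
    my ($obj, $message) = @_;
    $$message =~ s{(https?://\S+)}{<a href="$1">$1</a>}g;
});

my $html = My::Status->render("1 < 2, see http://example.com/");
print "$html\n";
```

With the triggers registered in this order the link survives intact. Swap the two add_trigger calls and the escaping step runs over the freshly generated link tags, which is exactly the failure described above.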
This was a problem that I came at, albeit from a slightly different angle.
Pluggable callbacks in Email::Store
You see, the reason I haven't rewritten Mail::Miner in the new plugin style with Module::Pluggable is that I've been working on a much more extensible and advanced framework for storing and data mining email, which I've called Email::Store. In the way Email::Store works, pretty much everything is a plug-in.
When you store a mail, Email::Store itself loads up the Email::Store::Mail plugin, which sets up a placeholder database row for the mail. Then Email::Store::Mail calls all of the other plugins to examine the mail and file away the things they want to note about it - what mailing lists it came from, what attachments it has, and so on.
However, we also want these plugins to specify some kind of relative order in which they're called. For example, it's more efficient if the attachment handler strips the email of its attachments before other plugins poke around in the email body, since once you've got rid of the attachments, there's less email body to poke around in.
All great ideas have been had before, of course, and this made me think of the Unix System V init process. When a Unix machine starts up, it consults files in an "rc" directory to start up particular services. These files are named in a particular way so that when the initialization process looks at the directory, it sees the services in the order that they should be started up. For instance S10sysklogd means "start the system logger at position 10", and S91apache means "start the Apache web server at position 91"; the logger gets started first, and Apache later. Now this isn't perfect, because there can be several things in position 10, and they get run in alphabetical order; and besides, nobody's policing the numbers anyway. If you think S01foo means "very early" and someone else comes along and installs S00bar, theirs gets run first. But it gives you a rough way of providing an order to the process.
What I wanted to do was give my plugins a similar rough ordering: attachment handling had to happen at position 1; working out the mailing list an email came from was a low priority task, and could happen at position 90 towards the end; everything else can go somewhere in the middle.
I also didn't really like the Class::Trigger approach of specifying a subroutine reference to be called. I prefer just writing methods. So, plugins which want to influence the way an email gets indexed can provide two methods:
package Email::Store::Summary;
sub on_store_order { 80 }
sub on_store {
    my ($self, $mail) = @_;
    # ...
}
on_store_order is the position in which we'll be called by the indexing process; on_store is what we do when we get called. This is implemented in the ::Mail class like so:
use Module::Pluggable::Ordered search_path => ["Email::Store"];
sub store {
    my ($class, $rfc822) = @_;
    my $simple = Email::Simple->new($rfc822);
    my $msgid = $class->fix_msg_id($simple);
    my $self;
    $self = $class->create ({ message_id => $msgid,
                              message    => $rfc822,
                              simple     => $simple });
    $class->call_plugins("on_store", $self);
    $self;
}
Module::Pluggable::Ordered provides the same functionality as Module::Pluggable, but also provides a call_plugins method: you give it a name of a trigger and some parameters, and it looks through your plugins, finds those which provide that method, orders them by their positions, and then calls them. In our normal Email::Store case, that one line would be the equivalent of:
Email::Store::Attachment->on_store($self);
Email::Store::Entity->on_store($self);
Email::Store::Summary->on_store($self);
Email::Store::List->on_store($self);
As new modules are developed and dropped into place, they're ordered by their on_store_order, if they provide an on_store method, and then placed into the list of on_store calls - all without Email::Store::Mail needing to know about them. The single call_plugins line combines both locating plugins and calling triggers to provide a facility for extending the indexing process.
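The select, order and call steps can be sketched in core Perl as follows. This is an illustration of the idea and not the code inside Module::Pluggable::Ordered: the My::Store::* classes and call_plugins_sketch are hypothetical, the plugin list is passed in by hand instead of being discovered, and the default position of 50 for a plugin with no order method is an assumption made for this sketch.

```perl
use strict;
use warnings;

our @calls;    # records the order in which plugins ran

package My::Store::List;
sub on_store_order { 90 }
sub on_store       { push @main::calls, "List" }

package My::Store::Attachment;
sub on_store_order { 1 }
sub on_store       { push @main::calls, "Attachment" }

package My::Store::Summary;
sub on_store_order { 80 }
sub on_store       { push @main::calls, "Summary" }

package My::Store::Quiet;    # no on_store method, so it is never called
sub unrelated { }

package main;

sub call_plugins_sketch {
    my ($plugins, $method, @args) = @_;
    my $order = "${method}_order";

    my @ordered =
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }    # position, then name
        map  { [ ($_->can($order) ? $_->$order() : 50), $_ ] } # 50 is an assumed default
        grep { $_->can($method) } @$plugins;                   # skip plugins without the method

    $_->$method(@args) for @ordered;
    return @ordered;
}

my @plugins = qw(My::Store::List My::Store::Quiet My::Store::Summary My::Store::Attachment);
call_plugins_sketch(\@plugins, "on_store", "the mail object");
print join(" -> ", @calls), "\n";
```

Ties on position fall back to alphabetical order by class name, much like the rc directory convention described earlier.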
Mixing plugins with databases
Let's now go on to write the rest of the Email::Store::Summary class that we looked at earlier. This is going to store summary information about an email so that it can be displayed in a friendly way - we'll store the subject of the mail, and the first line of original content; that is, the first thing we see after removing an attribution and a quote. These will go in the summary database table, so we need to inherit from Email::Store::DBI, the Class::DBI class which knows about the current database, and we need to tell it about the table's columns:
package Email::Store::Summary;
use base 'Email::Store::DBI';
Email::Store::Summary->table("summary");
Email::Store::Summary->columns(All => qw/mail subject original/);
Email::Store::Summary->columns(Primary => qw/mail/);
We'll use Text::Original, a module extracted from the code of the Mariachi mail archiver, which hunts out the first piece of original text in a message body:
use Text::Original qw(first_sentence);
sub on_store_order { 80 }
sub on_store {
    my ($self, $mail) = @_;
    my $simple = $mail->simple;
    Email::Store::Summary->create({
        mail     => $mail->id,
        subject  => scalar($simple->header("Subject")),
        original => first_sentence($simple->body)
    });
}
When a mail is indexed, the on_store callback is called, and it receives a copy of the Email::Store::Mail object that's being indexed. The simple method returns an Email::Simple object, which we use to extract the subject header and the body of the email. Then we create a row in the summary table for this email.
Next, for this to be useful, we need to tell Email::Store::Mail how this summary information relates to a mail:
Email::Store::Summary->has_a(mail => "Email::Store::Mail");
Email::Store::Mail->might_have(
    summary => "Email::Store::Summary" => qw(subject original)
);
Now an Email::Store::Mail object has two new methods - which of course we'll highlight in the documentation for our module - subject will return the first subject header, and original will return the first sentence of original text. We use might_have to consider the summary table an extension of the mail table.
But now comes the clever bit. If this is truly to be a drop-in plugin module, where is the summary table going to come from? It's one thing to be able to add concepts to a database-backed application, but these new concepts have to be supported by tables in the database. For the plugin module to be completely self-contained, it must also contain information about the table's schema. And this is precisely what Email::Store plugins do. In the DATA section of Email::Store::Summary, we'll put:
__DATA__
CREATE TABLE IF NOT EXISTS summary (
    mail varchar(255) NOT NULL PRIMARY KEY,
    subject varchar(255),
    original text
);
There's a mixin module called Class::DBI::DATA::Schema which is used by Email::Store::DBI (and hence anything that inherits from it) which provides the run_data_sql method. As its name implies, this method runs any SQL it finds in the DATA section of a class. So all we need to do is go through all of our plugins and run run_data_sql on them to create their tables:
sub setup {
    for (shift->plugins()) {
        $_->require or next;
        if ($_->can("run_data_sql")) {
            warn "Setting up database in $_\n";
            $_->run_data_sql;
        }
    }
}
With this in place, a plugin module is truly self-contained: it specifies what to do at trigger points like on_store, it specifies the relationships that tie it in to the rest of the Email::Store application, and it specifies how to create the database table that it relates to.
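The general idea of running SQL that ships alongside the code can be sketched without a database. Nothing below reflects how Class::DBI::DATA::Schema is written: run_sql_from is a hypothetical helper, the naive split on semicolons would break on semicolons inside string literals, and an in-memory filehandle stands in for a real DATA section so that the example fits in one file.

```perl
use strict;
use warnings;

# Hypothetical helper: read SQL from a filehandle and hand each statement
# to a callback (in a real application, something like $dbh->do).
sub run_sql_from {
    my ($fh, $execute) = @_;
    my $sql = do { local $/; <$fh> };    # slurp the whole handle
    return 0 unless defined $sql;

    # Naive statement splitting; good enough for simple schemas only.
    my @statements = grep { /\S/ } split /;/, $sql;
    for my $statement (@statements) {
        $statement =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $execute->($statement);
    }
    return scalar @statements;
}

my $schema = <<'SQL';
CREATE TABLE IF NOT EXISTS summary (
    mail varchar(255) NOT NULL PRIMARY KEY,
    subject varchar(255),
    original text
);
SQL

open my $fh, "<", \$schema or die "Cannot open in-memory handle: $!";
my @ran;
my $count = run_sql_from($fh, sub { push @ran, $_[0] });
print "Would run $count statement(s)\n";
```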
There's one more slight niggle - since the end user specifies what SQL database to use, and since not all databases use the same variant of SQL, what if the schema in a DATA section isn't appropriate for what the end user is using? Class::DBI::DATA::Schema handles this too, by using SQL::Translator to automatically translate the schema to a different variant of SQL. We can say
use Class::DBI::DATA::Schema (translate => [ "MySQL" => "SQLite"] );
and write our DATA schemas in MySQL's SQL. Except - we don't know at compile time that the end user is going to choose SQLite for his database; in fact, we don't know until the database is set up. So we end up doing something like this:
package Email::Store::DBI;
use base 'Class::DBI';
require Class::DBI::DATA::Schema;
sub import {
    my ($self, @params) = @_;
    if (@params) {
        $self->set_db(Main => @params);
        Class::DBI::DATA::Schema->import( translate =>
            [ "MySQL" => $self->__driver ]
        );
    }
}
When I say use Email::Store 'dbi:SQLite:mailstore.db', Email::Store::DBI first sets up the database, and then it imports Class::DBI::DATA::Schema, telling it to translate between MySQL and SQLite, the __driver for our database. The reality is slightly more complex than this, since we use DBD::Pg but SQL::Translator expects it to be called not "Pg" but "PostgreSQL", but the basics are there. See the source to Email::Store::DBI for the full story.
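The driver-name mismatch can be handled with a small lookup table. The sketch below is not the code in Email::Store::DBI; translator_name_for and the alias table are hypothetical, and only the Pg entry is taken from the description above.

```perl
use strict;
use warnings;

# Illustrative alias table: DBD driver name => name the translator expects.
# Only the Pg entry comes from the text above; extend it as needed.
my %translator_alias = ( Pg => "PostgreSQL" );

# Hypothetical helper: pick the translator name for a DBI data source string.
sub translator_name_for {
    my ($dsn) = @_;

    # A DBI data source looks like "dbi:Driver:rest"; give up if it doesn't.
    my ($driver) = $dsn =~ /^dbi:(\w+):/i or return;

    return $translator_alias{$driver} || $driver;
}

print "$_ => ", translator_name_for($_), "\n"
    for "dbi:SQLite:mailstore.db", "dbi:Pg:dbname=mailstore";
```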
We've looked at various tools to increase the pluggability of our applications: from merely requiring classes at runtime through to modules to help us find plugins and provide trigger points or callbacks for extensions to influence the behaviour of a process; we put all these together in Module::Pluggable::Ordered, which also allows us to specify a rough ordering for the extension modules, and finally we added the concept of extending a database-based application by using Class::DBI::DATA::Schema to allow us to write fully self-contained database-backed plugins.
Making your applications pluggable is an excellent way of removing complexity from a design - Email::Store::Mail hardly does anything itself, but delegates to plugins for almost all of its functionality. Module::Pluggable::Ordered and the database techniques we've looked at provide a low-effort way of doing that, and of allowing your applications to be stretched and expanded in ways you might not imagine!