Managing a Newsletter with Perl

It ought to have been very simple. I needed to produce a newsletter. The content was going to be created off-line, and then uploaded to a processing script which was going to distribute it electronically. And Perl was going to help me.

It ought to have been very simple, but like so many things, it ended up being a little more complex than that. Complex enough, I hope, that there are one or two things involved in the process of creating the newsletter that we can all learn from.

The first idea I had was to produce a newsletter in HTML, which people could look at on the web, or print off and distribute to less net-aware friends. And because I hate producing HTML by hand, I used the Template Toolkit to template it out. Let's begin by looking at how I did that.

Template Toolkit

Andy Wardley's Template Toolkit is a fantastically useful suite of Perl modules which implement a parser and interpreter for a little templating language. Templating languages are most often used to fill values computed by a program into some text. For instance, we could have a template like this:

    [% today %]

    [% title %] [% forename %] [% surname %]
    [% address %]

    Dear [% title %] [% surname %],
        Thank you for your letter dated [% their_date %]. This is to
    confirm that we have received it and will respond with a more
    detailed response as soon as possible. In the mean time, we
    enclose more details of ...

We tell the Template Toolkit what the various values of today, title and so on ought to be, and it fills out the template.
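Here is a minimal sketch of that filling-in step driven from Perl. The names and dates in `$vars` are invented for illustration, and the template is a cut-down version of the letter above held in a string rather than a file.

```perl
use strict;
use warnings;
use Template;

# A shortened copy of the letter template, kept in a string.
my $letter = <<'END';
[% today %]

Dear [% title %] [% surname %],
Thank you for your letter dated [% their_date %].
END

# Made-up values, purely for illustration.
my $vars = {
    today      => '17 May 2003',
    title      => 'Dr',
    surname    => 'Example',
    their_date => '12 May 2003',
};

my $tt     = Template->new;
my $output = '';

# A scalar reference as the first argument means "the template text is
# in this string"; as the third argument it means "store the result here".
$tt->process(\$letter, $vars, \$output)
    or die $tt->error;

print $output;
```

Passing references to strings keeps the sketch in one file; a real project would more likely keep each template in a file of its own.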

Of course, as we're about to find out with our newsletter project, things which start out nice and simple have a way of getting bigger and more complex. Template Toolkit supports a lot more than just filling scalars into a form: it has support for arrays, hashes and objects, the ability to include templates inside templates, declare macros, run blocks multiple times, filter text through various functions, and much more besides. Thankfully, we're only going to use a small amount of this functionality in the newsletter.

The HTML page we're constructing is slightly tricky. It uses CSS to lay out text in three columns. We'll have a header, a column describing generally what the newsletter is about, a main column of news, and then a further column of other information - how to get in touch with me, and so on. At the bottom we'll put some information about how to make sure people have the latest edition of the newsletter.

The top of the HTML is static, so we pass that out to a separate file. Let's assume we have a file called Head which contains all the HTML header, and the initial <body> tag.

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Simon's Newsletter</title>
    ...
    </head>
    <body>
    <div class="box-wrap">

Similarly, we have a static Foot file:

    </div>
    </body>
    </html>

Now we can forget about most of the mucky business of HTML and concentrate on the content:

    [% INCLUDE Head %]
    <p> My newsletter will appear here. </p>
    [% INCLUDE Foot %]

We can process this with the Template Toolkit using the following bit of Perl:

    use Template;
    my $template = Template->new();
    $template->process("newsletter.thtml");

(I use the extension thtml to remind myself this is templated HTML.)
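One thing the short program leaves out is error handling: `process` returns false if something goes wrong, such as a template or an included file that cannot be found. A hedged sketch of a wrapper follows. The `render` helper and the choice of the current directory for `INCLUDE_PATH` are assumptions made for this example, not part of the newsletter program.

```perl
use strict;
use warnings;
use Template;

# Hypothetical helper: process a template file and return the result
# as a string, dying with Template's own message on failure.
sub render {
    my ($file, $vars) = @_;

    # INCLUDE_PATH says where to look for the template and for any
    # files it INCLUDEs (Head and Foot in our case).
    my $template = Template->new({ INCLUDE_PATH => '.' })
        or die "Can't create Template object: " . Template->error . "\n";

    my $html = '';
    $template->process($file, $vars, \$html)
        or die "Template error: " . $template->error . "\n";
    return $html;
}
```

With this in place, `print render("newsletter.thtml")` behaves like the three-line program, but a missing Head file produces a readable message instead of silently empty output.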

With that little program, the generated HTML page gets spat out onto standard output. Great. In fact, Template Toolkit comes with a handy little utility called tpage which is functionally equivalent to our Perl program above. You can just say tpage newsletter.thtml, and Template Toolkit will process the template in the same way.

So far so good. But of course, we haven't used any template variables yet. Let's add some now, by giving the newsletter an issue number and date:

    [% INCLUDE Head %]
    <div class="box-header">
    <h1 align="center">Simon's Newsletter</h1>
    <p align="right"><I>Issue [% issue %] - [% date %]</I></p>
    </div>
    <p> My newsletter will appear here. </p>
    [% INCLUDE Foot %]

Now we need a way to tell the template what the values of these variables are. We do this by passing in a hash reference as the second parameter of process.

    use Template;
    my $template = Template->new();
    my $vars = {
        issue => 1,
        date => scalar localtime
    };
    $template->process("newsletter.thtml", $vars);

Once again, the newsletter will be produced on standard output. As it happens, we can still use tpage even when we're adding template variables. We can say:

    tpage --define issue=1 --define date="May 17th" newsletter.thtml

tpage is a very handy tool for prototyping your templates in the way we're doing here.

Now we have our header out of the way. Let's move on to our three columns. We'll handle them in order of complexity. The left-hand column is just static text, so we can dispose of that trivially:

    <div class="column-two">
    <div class="column-two-content">
    <h2>WEC Trek to Japan</h2>
    [% INCLUDE trek %]
    </div>
    </div>

The right-hand column will be partially static text, but will also contain an array of brief news items. I'm going to omit all the div and other extraneous tags for the time being so that we can concentrate on the content.

    <h2>In brief</h2>
    <ul>
    [% FOREACH point = brief %]
    <li>[% point %]</li>
    [% END %]
    </ul>

    <h2>Contact Details</h2>
    [% INCLUDE contact %]

brief is going to be an array of pieces of news. Just like in Perl, we use FOREACH to iterate over that array; the template code is equivalent to Perl's

    for my $point (@brief) {

This makes the local variable point contain the text for each news item.

The middle column is very similar, but slightly more complex. We'll have a number of more substantial news items, separated by horizontal lines. These will be passed in as an array of hash references, and we let Template do the work of sorting it all out. Here's what the template looks like for the news column.

    <h2 align="center">News</h2>

    [% FOREACH item = news %]
        <h3>[%item.title%]</h3>

        [% item.content %]
        <P ALIGN="right"><I>- [%item.when%]</I></P>

        <BR>
        <HR WIDTH="80%" ALIGN="center">
        <BR>
    [% END %]

What this says is that it expects an array called news, and will iterate over the array (that's the familiar FOREACH) putting each element in a temporary variable called item. item will itself be a hash reference, and we extract the elements called title, content and when from it.

Template's dot operator is a little like Perl 5's arrow operator (and Perl 6's dot operator) without you having to worry about the brackets: it can be used to retrieve elements from hashes or arrays and also call methods on objects. Template Toolkit knows how to look at item and do the right thing with it - if we put an object inside our news array with when, content and title methods, we'd get the same results.
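To see the dot operator treating hashes and objects alike, here is a small sketch that mixes the two in one news array. The `NewsItem` class and the item text are invented for the example.

```perl
use strict;
use warnings;
use Template;

# A throwaway class, made up for this sketch, offering the same three
# names as methods that the hash offers as keys.
package NewsItem;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub title   { $_[0]{title} }
sub content { $_[0]{content} }
sub when    { $_[0]{when} }

package main;

my $news = [
    # A plain hash reference...
    { title => 'Hash item', content => '<p>From a hash</p>', when => 'May 17' },
    # ...and an object, side by side in the same array.
    NewsItem->new(
        title   => 'Object item',
        content => '<p>From an object</p>',
        when    => 'May 18',
    ),
];

my $tmpl = <<'END';
[% FOREACH item = news %][% item.title %] ([% item.when %])
[% END %]
END

my $out = '';
my $tt  = Template->new;
$tt->process(\$tmpl, { news => $news }, \$out) or die $tt->error;
print $out;
```

The template never needs to know which kind of thing it was handed, which makes it easy to swap simple hashes for richer objects later.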

This works well, but there's a little bit of a bug: we want the items separated by lines, but it looks slightly ugly to have a line right at the end after the last item. So we tell Template to output the HR tag unless we're on the last item:

    [% '<HR WIDTH="80%" ALIGN="center">' UNLESS item == news.last %]

Template Toolkit provides special "virtual methods" on Perl values which allow us to do clever things like this: arrays have methods like first and last which are equivalent to .[0] and .[-1] respectively. There are also methods which allow you to call Perl functions such as split or join on template variables.
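As an aside, inside a FOREACH block Template Toolkit also provides a `loop` variable, and `loop.last` is true on the final pass, which is another way to write the same test. The sketch below shows it alongside the `size` and `join` virtual methods; the news items are made up.

```perl
use strict;
use warnings;
use Template;

my $tmpl = <<'END';
[% FOREACH point = brief %][% point %][% ' | ' UNLESS loop.last %][% END %]
[% brief.size %] items, joined: [% brief.join(', ') %]
END

# Invented sample data.
my $vars = {
    brief => [ 'Visa approved', 'Flights booked', 'Packing started' ],
};

my $out = '';
my $tt  = Template->new;
$tt->process(\$tmpl, $vars, \$out) or die $tt->error;
print $out;
```

Using `loop.last` avoids comparing the items themselves, so it still works if two entries in the array happen to be identical.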

This completes the template part of the newsletter - the complete template is given in listing X. EDITOR: this is the file newsletter.thtml.

Enter Blosxom

Now let's start thinking about how we want to get these values into our template. We will have our "in brief" news points stored in a file, one point per line, to make it very easy to read those into an array:

    open BRIEF, "inbrief" or die $!;
    my @brief = <BRIEF>;
    close BRIEF;

    $vars = { 
        issue => $issue,
        brief => \@brief,
        date => $date
    };

    $template->process("newsletter.thtml", $vars, "newsletter-$issue.html");

This time, we've used a third argument to process, which tells the Template Toolkit not to write to standard output, but to save the output to the named file.
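One detail worth guarding against: lines read this way keep their trailing newlines, and a stray blank line in the file becomes an empty bullet point. A small sketch of a more defensive reader follows; the `read_brief` name is an assumption for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the news points from a file holding one
# point per line, without trailing newlines and without blank lines.
sub read_brief {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return grep { /\S/ } @lines;
}
```

It would be used as `my @brief = read_brief("inbrief");` before building `$vars`.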

What about the main news items? Well, this is where the story starts to get a bit more complicated. I wanted the news articles to also appear on my blog (http://blog.simon-cozens.org/) as I upload them. My blog uses a piece of software called blosxom, written by Rael Dornfest at O'Reilly. I like blosxom because it has the Unix nature - I put my blog items as plain text files in a directory and it sorts them all out. So a blog entry could be a file called 1234.txt containing this:

    Head Goes Here

    <p> Here is the text of today's blog entry </p>

blosxom looks at the first line of the file and uses that as the entry's heading. The rest of the file is HTML text which is added verbatim into the blog page which is being constructed. blosxom also takes a look at the file's timestamp in the filesystem and uses that as the date of the entry. Note that the name of the file (1234.txt) is arbitrary, and isn't used in building up the entry at all.

Now, because I wanted the news articles to appear on my blog, I thought it would be sensible to use blosxom format for the articles. That way, once they've been processed into the newsletter, they can be moved across to the blog data directory and be picked up there too. So let's read in these files the same way as blosxom does:

    use File::stat;
    use File::Copy;

    my @news;
    for my $file (<*txt>) {
        my $item = {};
        $item->{when} = localtime(stat($file)->mtime);
        $item->{when} =~ s/\d+:\d+:\d+ //;

        open IN, $file or die "$file: $!";
        $item->{title} = <IN>;

        local $/;
        $item->{content} = <IN>;
        close IN;
        push @news, $item;
        copy $file, "/opt/blog/$file";
    }

We look for all the txt files in the current directory, and process each one of them. First, we look at the last-modified time of the file and convert that to a string. We remove the time, leaving only the date, and stick that as the when element of our hash. Now we can open up the file, read the first line as the title, and everything else goes into content. Once we've finished reading the file, we stick the item onto the array of news items and copy the entry over to the blog data directory for blosxom to pick it up.
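The reading step can also be pulled out into a function of its own, which makes it easier to test away from the rest of the program. This is a sketch rather than the code the newsletter actually uses: the `read_entry` name is made up, and it formats the date with `strftime` from POSIX instead of trimming the time out of the `localtime` string.

```perl
use strict;
use warnings;
use File::stat;
use POSIX qw(strftime);

# Hypothetical helper: turn one blosxom-style file into the hash that
# the news template expects.
sub read_entry {
    my ($file) = @_;
    open my $in, '<', $file or die "$file: $!";

    # The first line is the heading; drop its newline.
    my $title = <$in>;
    $title = '' unless defined $title;
    chomp $title;

    # With $/ undefined, one read returns the rest of the file.
    my $content = do { local $/; <$in> };
    $content = '' unless defined $content;
    close $in;

    return {
        title   => $title,
        content => $content,
        # For example "Sat May 17 2003", taken from the file's mtime.
        when    => strftime('%a %b %d %Y', localtime(stat($file)->mtime)),
    };
}
```

The loop body then shrinks to `push @news, read_entry($file);` followed by the copy.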

Now we have all the data we need... or most of it at least.

Unpacking Archives

A further wrinkle comes from the fact that I only want to create one file off-line and let my processing program do the right thing with it. This actually works to our advantage, because we can produce a tar file which contains all the data and metadata we need in one directory.

We'll stipulate that the tar file comes in with a known filename and known format: each issue should be contained in a file called issueX.tar.gz and this should contain a directory issueX/. We can now use Archive::Tar to extract the files:

    use Archive::Tar;

    my $filename = shift;
    my $tar = Archive::Tar->new;
    $tar->read($filename, 1);
    $tar->extract;
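If the upload was cut short, we would rather hear about it before anything is extracted. A hedged sketch with error checking added is below; the `read_archive` helper name is an assumption, while `read`, `error`, `list_files` and `extract` are the Archive::Tar methods doing the work.

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper: read an archive, dying with Archive::Tar's own
# error message if the file is missing or damaged.
sub read_archive {
    my ($filename) = @_;
    my $tar = Archive::Tar->new;
    $tar->read($filename)
        or die "Can't read $filename: " . $tar->error . "\n";
    return $tar;
}

# When run with an archive name, show what is inside, then unpack it
# into the current directory.
if (my $filename = shift @ARGV) {
    my $tar = read_archive($filename);
    print "$_\n" for $tar->list_files;
    $tar->extract;
}
```

Listing the members first is cheap and gives a quick sanity check that the archive really does contain an issueX/ directory.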

And we can grab our issue number and the directory where we expect to find our files from the name of the file:

    my $dir = $filename;
    $dir =~ s/\.tar\.gz//;
    $dir =~ /issue(-?\d+)$/ 
        or die "Directory name not in correct format";
    my $issue = $1;
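The same logic reads well as a small function, which also makes the "pre-issue" cases easy to check. The `parse_issue_filename` name is invented for this sketch, and the only change in behaviour is that the suffix is anchored to the end of the name.

```perl
use strict;
use warnings;

# Hypothetical helper: given "issue3.tar.gz", return ("issue3", 3).
sub parse_issue_filename {
    my ($filename) = @_;

    # Copy the name and strip the suffix from the copy.
    (my $dir = $filename) =~ s/\.tar\.gz$//;

    # The optional minus sign allows issue-2, issue-1 and so on.
    $dir =~ /issue(-?\d+)$/
        or die "Directory name not in correct format: $dir\n";
    return ($dir, $1);
}

my ($dir, $issue) = parse_issue_filename('issue-2.tar.gz');
print "Directory $dir, issue $issue\n";
```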

The -? is there because I wanted to produce "pre" issues of the newsletter, whimsically called "Issue -2" and "Issue -1". Because these pre-issues were monthly and the real issues will be weekly, I wanted to specify the date manually: "Issue -2" should have a date of "May", rather than "July 10-17" or whatever. So we read in the date of the newsletter from a file called date in our data directory:

    open DATE, "$dir/date" or die "Can't open date file: $!";
    chomp(my $date = <DATE>);    # drop the trailing newline
    close DATE;

So now we have all we need to produce the HTML version of the newsletter: a way to untar the input, find the issue number, look at the date, read the brief news items, and also read in the Blosxom entries.

Uploading the HTML

We have a HTML file, but it's not much good just sitting on our filesystem. We need to get it out onto the web! My personal web site is currently externally hosted, so I have to use FTP to transfer the files up to the site. No problem - Perl has the Net::FTP module to handle this for me:

    use Net::FTP;
    $ftp = Net::FTP->new("simon-cozens.org");
    $ftp->login("simon",$password) or die $!.$@;

    print "Creating HTML version...\n";
    my $output = "newsletter$issue.html";
    $template->process("newsletter.thtml", $vars, $output);

    print "Uploading $output...\n";
    $ftp->put($output, "public_html/newsletter/$output");

And I also upload it once again calling it latest.html so people can make sure they're reading the most recent version.

    print "Uploading as latest.html...\n";
    $ftp->put($output, "public_html/newsletter/latest.html");
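FTP transfers fail in all sorts of interesting ways, so it can be worth checking every step. Here is a sketch of both uploads with that checking added. The `upload_issue` wrapper and its argument order are assumptions; the host, user name and remote directory are whatever your own site uses.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: upload one generated file under its own name
# and again as latest.html.
sub upload_issue {
    my ($host, $user, $password, $output) = @_;

    # new() returns undef on failure and leaves the reason in $@.
    my $ftp = Net::FTP->new($host)
        or die "Can't connect to $host: $@\n";

    # Once connected, a failed command returns false and the server's
    # reply is available from message().
    $ftp->login($user, $password)
        or die "Login failed: " . $ftp->message;

    for my $remote ($output, 'latest.html') {
        print "Uploading $output as $remote...\n";
        $ftp->put($output, "public_html/newsletter/$remote")
            or die "Upload of $remote failed: " . $ftp->message;
    }

    $ftp->quit;
    return 1;
}
```

A call would look like `upload_issue("simon-cozens.org", "simon", $password, $output);`.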

A Final Flourish

So far we have the newsletter available as a HTML file on the web, and also as entries on my weblog. But both of these are "pull" media - people have to keep checking the site to see if there's something new. Some people expressed a desire to have the news available as "push", where they get informed every time there's an update. The obvious way to do this is by email. (Another way is via RSS, but that raises the bar a little - everyone knows how email works.) And of course, I would rather die than knowingly send HTML email.

Easy enough, I thought - I'll just knock up another template which will generate a plain-text email and send that out to a mailing list which I'd set up. This template was very similar to the HTML one, but obviously, much simpler:

    Issue [% issue %] - [% date -%]
    --------------------------

    News
    ====

    [% FOREACH item = news -%]
    [%item.title%]

    [%- item.content -%]

    - [%item.when%]
    [%- '---' UNLESS item == news.last %]
    [%- END %]

    ...

But when I processed this, I realised a slight problem. All of the news items are designed to be on the web, in Blosxom format - in HTML. I had to de-HTMLify these items before putting them through the processor. The HTML::TreeBuilder and HTML::FormatText modules came to my rescue here:

    use HTML::TreeBuilder;
    use HTML::FormatText;
    my $formatter = HTML::FormatText->new(leftmargin => 1, rightmargin => 75);
    for (@news) {
        my $tree = HTML::TreeBuilder->new;
        $tree->parse($_->{content});
        $tree->eof;       # tell the parser there is no more input
        $_->{content} = $formatter->format($tree);
        $tree->delete;    # free the tree before the next item
    }

This replaces each content with a plain text equivalent, ready to be processed by our email template.

Now it's a very simple matter of using Mail::Mailer to send out the processed email:

    $template->process("email.template", $vars, "email.txt");

    use Mail::Mailer;
    $mailer = new Mail::Mailer 'smtp', Server => "localhost";
    $fh = $mailer->open({Subject => "Newsletter Issue $issue",
                         To      => 'wectrek2003@lists.netthink.co.uk'});
    print "Sending...\n";
    open LET, "email.txt" or die $!;
    print $fh <LET>;
    $fh->close;

And we're done. 76 lines of Perl code and seven modules later, we have a system which allows me to take a file full of news and metadata, say

    % process-newsletter issue3.tar.gz

and magically have a web site and weblog updated and a newsletter sent out via email. The whole process-newsletter program can be found in listing Y. EDITOR: this is the file process.pl.

What It's All About

EDITOR: You may, at your discretion, remove this section, leaving the second paragraph of it (In the process...) as the conclusion of the article.

    Larry is wise, and strong. But remember how his one regret was he
    didn't get to a Christian missionary? Guess what Ruby's creator used
    to be? A missionary in Hiroshima, Larry. In Hiroshima. 
        - Dave Green, NTK
          http://www.ntk.net/index.cgi?b=02001-02-16&l=160#l

So far I've been very coy about what this newsletter is all about. Next month, I'm planning to even Larry's old score - I'll be going out on a short-term mission trip working with churches around the Shiga area of western Japan. In the field, I may not have excellent Internet connectivity, so I wanted something which would allow me to do as much of the work off-line as possible; this is why I wanted only to have to deal with one file and have the processing system do the rest.

In the process of writing the newsletter system, we've seen examples of how to use the Template Toolkit, how to unpack tarballs with Archive::Tar, how to upload files with Net::FTP, how to turn HTML into plain text, and how to send out mail all from Perl. By putting in the time to create this admittedly complex processor, I'll now be able to spend less time creating the newsletter and more time creating news to go in it.

This is, I believe, exactly the kind of laziness Larry had in mind when he created Perl - laziness that requires a reasonable investment of time and effort up-front, but then allows me to keep in touch with those back home, yet still have more time away from the computer, doing good things with good people.

You can keep up to date with my trip at http://simon-cozens.org/mission/latest.html, where you'll see the output of this very system.

This page was last checked for correctness on 2003-05-17. Contact Simon.