Software | Secret Software | Writing
Seeking Images with Perl
As I intimated last time, I'm currently working on a multiuser photo gallery project, Memories. It works very nicely, but I'm always interested in finding ways to make it more fun - particularly if I can think of ways that Flickr hasn't thought of yet!
As I thought about getting people to tag their images,
I wondered if we could suggest tags for an image - by looking at tags that had been applied to similar images in the past.
But what's a "similar image"?
The imgSeek project (http://www.imgseek.net/) has already solved this problem for me - it uses a clever mathematical technique called wavelet decomposition to identify the "shape" of a photo,
and then searches a database for other photos with a similar shape.
So, I stole it.
Wavelet Decomposition
The idea behind imgseek is called "wavelet decomposition",
or,
to be more precise,
"Haar wavelet decomposition".
(See http://en.wikipedia.org/wiki/Haar_wavelet for the technical details.)
Essentially what this does is tells you how each pixel varies with respect to its neighbours; it creates a model of the chromatic contours of the picture. Since this is a mathematic model, it's easy to then compare the contours with those of another picture, and thus determine if they have the same sort of general "shape".
The process begins by reducing each picture down to a 128x128 pixel thumbnail; this cuts down the picture to a manageable size, meaning the algorithm can concentrate on the overall shape of the photo and not get bogged down in 6 million pixels of detail.
Next, we turn the pixel values - which are normally expressed as the amount of red, green and blue - into another measure, called "YIQ". Here "I" stands for "In-phase", "Q" stands for "quadrature", and the "Y" stands for "luminance". (Color spaces are funny like that.) The YIQ color space is better to use for this purpose than red green and blue because of the way it divides up the information. For instance, if we have two images which are identical except that image B is brighter than image A, the difference will only appear in the luminance value, not across all three colour components.
Fortunately, converting a pixel between RGB and YIQ is easy enough, using the following algorithm:
$y = $r*0.299 + $g*0.587 + $b*0.114;
$i = $r*0.596 - $g*0.274 - $b*0.322;
$q = $r*0.212 - $g*0.523 + $b*0.311;
Then we apply the Haar algorithm; we don't need to go into the details of this because we're just going to arrange the data into the form it wants and treat it as a black box, using imgseek's implementation. To do this, we're going to wrap it in XS, the glue language that connects between C and Perl.
Getting started
imgSeek itself comprises of several different components. There's a user interface, which helps you organise and search for similar photos; this is written in Python. Underlying that is the imgSeekLib library, the part of the package that does all the hard work; this is written in a mix of C++ and Python.
This is what we're going to need to wrap, so let's create an XS module. I'm an old-fashioned guy, so I still use h2xs for this:
h2xs -A -n Image::Seek
This creates us a skeleton module, with a file called Seek.xs which will become the wrapper for our library. We only need to use two C++ files from imgSeek, haar.cpp which does the wavelet decomposition calculations, and imgdb.cpp, which handles inserting and searching the image database, and their associated header files. We copy these files from imgSeekLib into our Perl module directory, since we're going to be making some changes to them.
Unfortunately, there isn't a clean way to compile multiple files in an XS module, so we just add to the top of Seek.xs the following two lines:
#include <haar.cpp>
#include <imgdb.cpp>
Now we have to convert these files from being interfaces to Python code to being XS code; at this stage we're not providing a Perl interface to these library functions, but we're tidying them up so that they meet the Perl coding practices. First, we attack the way they allocate memory. For instance, there's a function called absarray which allocates an array of doubles. The C way to do this is:
double * b=(double*) malloc(16384*sizeof(double));
But we don't want people calling malloc directly if we can avoid it, in case Perl's own malloc implementation is being used. So we convert this to the more XS-ish:
double * b; New(200, b, 16384, double);
The "200" is a tag which is reserved for malloc debugging - it's not used currently, but may be used in later Perl releases. The rest of the call says that b should be an array of 16384 doubles. If nothing else, it's a bit easier to understand than the clunky C original. Similarly,
memset(data,0,40*sizeof(int));
becomes
Zero(data, 40, int);
Finally, free turns into Safefree. These are all the changes we need to make to haar.cpp.
imgdb.cpp is another kettle of fish. This contains a lot of code for turning filenames into images, and then into thumbnails, and then extracting the pixels from them. It links against either the Qt or the ImageMagick image library. We don't want this at all, preferring to move the image processing to the Perl level. Instead, we will simply feed in three 128-character strings, representing the red, green and blue components. So this:
int addImage(const long int id, char* filename, char* thname,
int doThumb, int ignDim = 0) {
is going to become this:
int addImage(const long int id, unsigned char* red,
unsigned char* green, unsigned char* blue) {
This means we can delete an awful lot of code, because later on we get:
// A lot of code we don't need, then:
for (int idx = 0;idx<16384;idx++) {
rchan[idx] = pixel_cache->red;
gchan[idx] = pixel_cache->green;
bchan[idx] = pixel_cache->blue;
pixel_cache++;
}
transformChar(rchan,gchan,bchan,cdata1,cdata2,cdata3);
We can simply replace this with:
transformChar(red, green, blue, cdata1,cdata2,cdata3);
Now we can start writing some XS. There are two sections to the Seek.xs file: the "pure C" (or in our case, C++) functions, and the part which defines how to wrap those functions into Perl. So now in the pure C section we have some code, including addImage. To create a Perl function of the same name, we add an XS function signature for addImage into the XS section:
void
addImage(id, red, green, blue)
long id
unsigned char* red
unsigned char* green
unsigned char* blue
Similarly, imgseek has a function queryImgID which finds images similar to a given image by ID. It doesn't return anything, but fills up an internal data structure with results. We don't want to make any changes to this, just to wrap it in XS:
void
queryImgID(id, numres)
long id
int numres
However, we also went to get the results out of that data structure. It's a C++ STL queue, so we just pull results off the top of it until there are none left, and put them in our return array:
void
results()
PPCODE:
while(!pqResults.empty()) {
curResult = pqResults.top();
pqResults.pop();
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(curResult.id)));
PUSHs(sv_2mortal(newSVnv(curResult.score)));
}
This gives us a list of pairs, mapping ID to score. We'll wrap this XS function in a bit of Perl to tidy it up, calling the query function and then arranging the results into a useful list:
sub query_id {
my $id = shift;
my $results = shift || 10;
queryImgID($id, $results);
my @r = results();
my @rv;
push @rv, [shift @r, shift @r] while @r;
@rv;
}
Next, we need to do the same for the image adding function.
Image libraries
When I started coding Image::Seek it was called Imager::WaveletDecomposition; this was a really bad name. First, because it's not entirely obvious what "wavelet decomposition" is and why it would be a useful thing to do with Imager objects. Second, it takes a fairly general technique and makes it specific to one Perl image library, for no good reason.
imgSeek itself can be built with either Qt or ImageMagick, but it only uses these libraries to reduce an image to a 128x128 thumbnail, and then pull out red, green and blue streams from the thumbnail. Since most of what it does is number crunching, we want to abstract out the image code and do that from Perl.
We've rewritten the addImage function on the C++ side so it just gets handed three long strings. Now we need to implement the thumbnailing and stream extraction in Perl. First we use the Imager library, and come up with some code like this:
sub add_image_imager {
my ($img, $id) = @_;
my ($reds, $blues, $greens);
require Imager;
my $thumb = $img->scaleX(pixels => 128)->scaleY(pixels => 128);
for my $y (0..127) {
my @cols = $thumb->getscanline(y => $y);
for (@cols) {
my ($r, $g, $b) = $_->rgba;
$reds .= chr($r); $blues .= chr($b); $greens .= chr($g);
}
}
addImage($id, $reds, $greens, $blues);
}
This takes an Imager object $img, scales it to 128x128, and then uses the getscanline function to pull out a row of pixels. The rgba method gives us red, green, blue and alpha values of a pixel, but we're not really interested in the alpha value. These come out as integers from 0 to 255, so we turn them into ASCII values and build up some strings. These three strings, and the supplied ID of the picture, get passed to the addImage function.
Now there's no reason why we need to be Imager-specific, so let's write another similar function for Image::Imlib2 (which is actually a lot faster):
sub add_image_imlib2 {
my ($img, $id) = @_;
my ($reds, $blues, $greens);
require Image::Imlib2;
my $thumb = $img->create_scaled_image(128,128);
for my $y (0..127) {
for my $x (0..127) {
my ($r, $g, $b,$a) = $thumb->query_pixel($x,$y);
$reds .= chr($r); $blues .= chr($b); $greens .= chr($g);
}
}
addImage($id, $reds, $greens, $blues);
}
Now we can create a generic add_image function which looks at the object passed in, and despatches to the specific function:
sub add_image {
my ($image, $id) = @_;
if (UNIVERSAL::isa($image, "Imager")) { goto &add_image_imager }
if (UNIVERSAL::isa($image, "Image::Imlib2")) { goto &add_image_imlib2 }
croak "Don't know what sort of image $image is";
}
Finally, to load and save our database of models of each photo, we can simply wrap the C++ functions again - this goes into our .xs file:
void
loaddb(filename)
char* filename
void
savedb(filename)
char* filename
And that's it - we can now load a database, index some pictures with it, save it again, and query the database for pictures that look like one we've already indexed.
What have we got?
The original intention was to take a newly-uploaded photo, find other photos which look like it, and return their tags. Let's do that now with the tools we have. First, we build up the database:
use Memories;
for my $self (Memories::Photo->retrieve_all) {
next unless -e (my $file = $self->path("file"));
$img = Image::Imlib2->load($file);
Image::Seek::add_image($img, $self->id);
}
Image::Seek::savedb("/var/lib/memories/imgseek.db");
Now we can take a photo and query the database for things like it:
package Memories::Similar
sub find_similar {
my ($self, $count) = @_;
my @res = map {$_->[0] } Image::Seek::query_id($self->id, $count);
shift @res; # $self
map { $self->retrieve($_) } @res;
}
Here we're not really interested in the scores, so we extract the ID from each of the pairs returned by query_id. As the comment implies, the first ID in the result set will be the photo we're asking about - this makes sense, since that will be a perfect match in terms of similarity to itself! Then we turn each ID into a photo object, which we can use to display similar photos.
To recommend tags, we simply take the most popular of the tags of the most similar photos:
sub recommended_tags {
my $self = shift;
my %tags;
for (map { $_->tags } $self->find_similar(5)) {
$tags{$_->name}++
}
sort { $tags{$b} <=> $tags{$a} } keys %tags;
}
This is a common idiom for ordering the "most popular" items: put them into a hash, using each hash element as a counter, and then sort by that counter.
This is the most basic use of this technique: from there, we can play tricks to refine the process. For instance, we can use the score returned from query_id to create a cut-off value. After all, we've asked for 5 similar photos, but there might not be five similar photos - there may be three photos with quite similar scores, but the next two are significantly lower. We can detect this and cut them off.
Anyway, we now have the tools to find similar photos, and this is what it looks like in practice:
# dave-phd.png
Picking up on the blue and yellow of the robes, our algorithm has identified several similar photos; the first there are tagged with dave-phd, and that's the correct tag for this photo as well.
Your turn, Flickr...