Sapphire - Another Gem of an Idea?

Naming languages after stones is getting a bit old-hat these days. We all know and love Perl; you might have heard of the Ruby language (http://www.ruby-lang.org/), which I'll talk more about another time. There's also Chip Salzenberg's Topaz project, an attempt to rewrite Perl in C++, which ended with the announcement of the Perl 6 effort. And now, there's Sapphire. So what's this all about then?

Sapphire is one of the many projects out there that were started purely and simply to prove a point. In this case, the point was that building a large program from scratch in this day and age is plain crazy. I set out to prove it by showing how rapidly software can be developed on top of already established libraries, and the test case was seeing how quickly I could rewrite Perl 5.

I also, as a subsidiary goal, wanted to show the flexibility of some of my design ideas for Perl 6. It's dangerous when people are sitting around for a long time discussing software design without implementations, without benchmarks and without a single line of code - I much prefer getting up and doing something to talking about it. So I was going to show my ideas in software, not in words.

Design Principles

Here are some of the ideas I was intending to showcase:

Being Good Open Source Citizens

What do I mean by this? Perl 5 is extremely self-sufficient. Once you've got the source kit, it'll compile anywhere, on almost any platform and requiring very few "support" libraries. It'll make do with what you've got. One of the unfortunate side-effects of this is that if Perl wants to do something, it implements it itself. As a result, Perl contains a load of interesting routines, but keeps them to itself. It also doesn't use any of the perfectly fine implementations of such interesting routines which are already out there.

Some people think this is a feature; I think it's a wart. If we can give things back to the Open Source community, and work with them to help improve their tools, then everyone benefits.

Generalising Solutions

One of the great design choices of Perl 5, which appears to have been completely and utterly rejected in the discussions on Perl 6's proposed language, is that we do things in the most general way possible. This is why Perl doesn't need huge numbers of new built-ins - it just needs a way to make user-defined syntax with the same status as built-ins. It doesn't need beautiful new OO programming models - it just needs a way to help people write their own OO models. Sapphire tries to do things in the most general way possible.

Modularity

Perl 5 consists of a number of parts: a stunning regular expression engine, a decent way of dealing with multi-typed variables, and so on. Unfortunately, in the current implementation, these parts are all highly interdependent in twisty ways. Separating them out into separate modules means that you can test them independently, you can distribute them as independent libraries, and you can upgrade them independently.

Seems like a win to me!

So, uh, what is it?

Sapphire, then, is a partial implementation of the Perl 5 API - I wasn't setting out to create a new interpreter, although that would have been necessary for some of the API routines, such as those which execute Perl code. What I wanted was to recreate the programming environment which Perl 5 internally gives you - a sort of "super C", a C customised for creating things like Perl interpreters.

I specifically wasn't trying to do anything new. I like Perl 5. It has a lot going for it. Of course, it could be tidier, since five years of cruft have accumulated all around it now. It could be less quirky, and it could display the design goals I have just mentioned. That's what I wanted to do.

Where did it get?

I gave myself a week. I was going to hack on it for one week, and see where we got to. I didn't spend a lot of time on it, but I still managed to achieve a fair amount: I had scalars, arrays and hashes working, as well as Unicode manipulation to the level of Perl 5 and slightly beyond.

How? Well, I started by looking at the glib library, at http://developer.gnome.org/doc/API/glib/. This provided a fantastic amount of what I needed: the GPtrArray corresponds quite nicely with a Perl AV, and glib also implements hashes, which saved a lot of time - although to have HEs (hash entries) you need to dig a little into the glib source.

All the Unicode support was there - I initially used GNOME's libunicode, but then found that the development version of glib added UTF-8 support and was much easier to deal with. There were a few functions I needed which Perl 5 already had, and I'll be pushing those back to the glib maintainers for potential inclusion.

Perl uses a lot of internal variable types to ensure portability - an I32 is an integer type guaranteed to be 32 bits, no matter where it runs. Unsurprisingly, I didn't have much work to do there, either: glib provides a family of types like gint32 to do the same thing. Differing byte orders are also catered for. The "super C" environment that the Perl 5 API provides is largely out there, in existing code.
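To make the idea concrete, here is a minimal sketch of fixed-width types and byte-order-independent storage. It uses C99's <stdint.h> as a stand-in for Perl's I32 or glib's gint32, and the names sapph_i32, sapph_u32, u32_to_le and u32_from_le are hypothetical, invented for this illustration rather than taken from Sapphire, Perl or glib.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aliases; Perl spells these I32/U32, glib gint32/guint32. */
typedef int32_t  sapph_i32;
typedef uint32_t sapph_u32;

/* Compile-time check: the array size becomes -1 (an error) if the width is wrong. */
typedef char sapph_u32_is_32_bits[(sizeof(sapph_u32) * CHAR_BIT == 32) ? 1 : -1];

/* Write v into out[] in little-endian order, whatever the host byte order. */
static void u32_to_le(sapph_u32 v, unsigned char out[4]) {
    assert(out != NULL);
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Read a little-endian 32-bit value back, again independent of the host. */
static sapph_u32 u32_from_le(const unsigned char in[4]) {
    assert(in != NULL);
    return (sapph_u32)in[0]
         | ((sapph_u32)in[1] << 8)
         | ((sapph_u32)in[2] << 16)
         | ((sapph_u32)in[3] << 24);
}
```

Because the conversion is written with shifts rather than pointer casts, the same code gives the same bytes on big-endian and little-endian machines.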

Oh, and let's be honest - there was one large piece of existing code that it was just too tempting not to use, and that was Perl itself. When you're trying to replicate something and you've got a working version in front of you, it's tricky not to borrow from it; it seems a shame to throw away five years' worth of work without looking for what can be salvaged from it. A lot of the scalar handling code came from Perl 5, although I did rearrange it and make it a lot more sensible and maintainable. I wasn't just interested in developing with external libraries - I also wanted to see if I could correct some other misfeatures of Perl's internals.

The first problem to solve was the insidious use of macros on macros on macros. The way I went about this was by first outlawing lvalue macros. That is, for example,

    SvPVX(sv) = "foo";

had to turn into

    sv_setpv(sv, "foo");

Which is, incidentally, how perlapi says it should be done. Perl 5 often optimises for speed (sometimes overenthusiastically) at the expense of maintainability - Sapphire questions that trade-off, preferring to trust compiler optimisation and Moore's Law.

Next, I wrote a reasonably sophisticated Perl program to convert inline functions into macros. That is, it would take

    #ifdef EXPAND_MACROS
    INLINE void sv_setpv (SV* sv, char * pv) {
       ((XPV*)  SvANY(sv))->xpv_pv = pv;
    }
    #endif

and turn it, automatically, into:

    #ifdef EXPAND_MACROS
    #ifdef EXPAND_HERE
    INLINE void sv_setpv (SV* sv, char * pv) {
       ((XPV*)  SvANY(sv))->xpv_pv = pv;
    }
    #endif
    #else
    #define sv_setpv(sv, pv) (((XPV*) SvANY(sv))->xpv_pv = (pv))
    #endif

Now you can choose whether your macros should be expanded by flipping on -DEXPAND_MACROS and whether they should be inline by playing with -DINLINE. But what's EXPAND_HERE for? Well, the above code snippet would go into an include file, maybe sv.h, and one C file - let's call it sv_inline.c - would contain the following code:

    #include <sapphire.h>
    #define EXPAND_HERE
    #include <sv.h>

Then if EXPAND_MACROS was defined, the function definitions would all be provided in one place; if macros were not expanded, sv_inline.c would define no functions. The function prototypes would be extracted automatically with C::Scan.

With the state of compiler optimisation these days, it's very likely that making everything into macros makes no significant speed difference. In which case, it's best to turn on EXPAND_MACROS to assist with source level debuggers which cannot read macros. However, you can't tell until you benchmark, and the "optional expansion" method gives you a nice easy way to do that.

I also took a swipe at memory allocation; it seems the first job in every large project these days is to write your own memory allocator. I had heard from perl5-porters and other places that the biggest speed overhead in XS routines is SV creation, and so I wrote an allocator which would maintain pools of ready-to-ship variables, refreshing the pools when there was nothing else to do, like a McDonald's burger line.
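A pool of this kind can be sketched as a simple free list. This is an illustration of the general technique under my own assumptions, not the allocator from Sapphire: PoolSV, SVPool, pool_refill, pool_get and pool_put are all hypothetical names, and the "scalar" holds only an integer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical scalar; the free-list link lives inside the struct itself. */
typedef struct PoolSV {
    struct PoolSV *next_free;  /* meaningful only while the SV sits in the pool */
    long iv;
} PoolSV;

typedef struct {
    PoolSV *free_list;   /* ready-to-ship SVs */
    size_t available;    /* how many are waiting */
} SVPool;

/* Top the pool up to `target` entries; meant to be called when idle.
   Returns 0 on success, -1 if malloc fails. */
static int pool_refill(SVPool *pool, size_t target) {
    while (pool->available < target) {
        PoolSV *sv = malloc(sizeof *sv);
        if (sv == NULL) return -1;
        sv->next_free = pool->free_list;
        pool->free_list = sv;
        pool->available++;
    }
    return 0;
}

/* Hand out an SV: pop from the pool, falling back to malloc if it is empty. */
static PoolSV *pool_get(SVPool *pool) {
    PoolSV *sv = pool->free_list;
    if (sv != NULL) {
        pool->free_list = sv->next_free;
        pool->available--;
    } else {
        sv = malloc(sizeof *sv);
        if (sv == NULL) return NULL;
    }
    sv->next_free = NULL;
    sv->iv = 0;
    return sv;
}

/* Return an SV to the pool instead of freeing it. */
static void pool_put(SVPool *pool, PoolSV *sv) {
    assert(sv != NULL);
    sv->next_free = pool->free_list;
    pool->free_list = sv;
    pool->available++;
}
```

The hot path in pool_get is a pointer swap; the cost of malloc is paid in pool_refill, which the caller can schedule for quiet moments.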

What else can be done?

If I'd given myself two weeks, where would we be? Sticking with glib, we could very easily have safe signal handling, portable loadable module support, a main event dispatch loop and a safe threading model. It's all there, ready to go. It's free software, and that's just one library.

To be honest, I wouldn't advocate the use of glib for everything we could do with it. As an example, I replaced Perl's main run loop (Perl_runops_standard in run.c) with a GMainLoop and benchmarked the two: the glib version, although signal safe, was at least five times slower. (However, you may want to contemplate what it means for graphical application programming if you have, effectively, a GNOME event loop right there in your interpreter.)
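For readers who haven't looked inside an interpreter, a conventional run loop has roughly the shape sketched below: each operation does its work and returns the next operation to run. This is a toy modelled loosely on that idea, with hypothetical names (Op, Interp, op_push, op_add, run_ops); it is neither Perl's run.c nor glib code.

```c
#include <assert.h>
#include <stddef.h>

#define INTERP_STACK_MAX 16

typedef struct Op Op;

/* A tiny value stack shared by all ops. */
typedef struct {
    long   stack[INTERP_STACK_MAX];
    size_t sp;
} Interp;

struct Op {
    Op *(*run)(Interp *interp, Op *self);  /* executes, returns the next op */
    Op  *next;
    long arg;
};

/* Push this op's constant onto the stack. */
static Op *op_push(Interp *in, Op *self) {
    assert(in->sp < INTERP_STACK_MAX);
    in->stack[in->sp++] = self->arg;
    return self->next;
}

/* Pop two values and push their sum. */
static Op *op_add(Interp *in, Op *self) {
    long a, b;
    assert(in->sp >= 2);
    b = in->stack[--in->sp];
    a = in->stack[--in->sp];
    in->stack[in->sp++] = a + b;
    return self->next;
}

/* The run loop itself: keep going until an op returns NULL. */
static void run_ops(Interp *in, Op *start) {
    Op *op = start;
    while (op != NULL)
        op = op->run(in, op);
}
```

A loop this tight does almost nothing per operation beyond an indirect call, which gives some feel for why routing each step through a general-purpose event loop costs so much more.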

Heavier Unicode support would probably need libunicode. What about regular expressions? Well, the glib developers are working on a fully Unicode-aware Perl-compatible regular expression library (which, frankly, is more than we've got). If they don't finish that, Philip Hazel's Perl Compatible Regular Expression library (http://foo) does exactly what it says on the tin.

What can't be done?

There are some areas of Perl's programming environment where I'm not aware of a pre-existing solution. For instance, scope.c in the Perl source distribution gives C the concept of "dynamic scope", allowing you to save away variables and restore them at the end of a block, just like the local operator in Perl.
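The save-and-restore idea can be sketched with a small stack of (address, old value) pairs. This is a simplified illustration that handles only int variables; scope_enter, save_int and scope_leave are hypothetical names and are not the interface of Perl's scope.c.

```c
#include <assert.h>
#include <stddef.h>

#define SAVE_STACK_MAX 64

/* One saved (address, old value) pair. Only ints here, for brevity. */
typedef struct {
    int *addr;
    int  old_value;
} SaveEntry;

static SaveEntry save_stack[SAVE_STACK_MAX];
static size_t save_top = 0;

/* Mark the start of a dynamic scope; returns a token for scope_leave(). */
static size_t scope_enter(void) {
    return save_top;
}

/* Remember *addr so it can be restored later, like `local $x` in Perl. */
static void save_int(int *addr) {
    assert(addr != NULL);
    assert(save_top < SAVE_STACK_MAX);
    save_stack[save_top].addr = addr;
    save_stack[save_top].old_value = *addr;
    save_top++;
}

/* Unwind to the mark, restoring saved values in reverse order. */
static void scope_leave(size_t mark) {
    assert(mark <= save_top);
    while (save_top > mark) {
        save_top--;
        *save_stack[save_top].addr = save_stack[save_top].old_value;
    }
}
```

A caller takes a mark with scope_enter(), calls save_int(&x) before changing x, and calls scope_leave(mark) at the end of the block; nested scopes unwind innermost first because the entries are restored in reverse order.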

And some problems just can't be solved in C. There's no good way, for instance, to get a partitioned namespace. I didn't bother trying. Once you've told the developer what the API is, it's their responsibility to ensure they stay out of its way.

On the other hand, C is not meant to be a language which gives you this sort of support. Some would argue that C++ solves these problems, but in my experience, C++ never solves anything.

Structure of a Sapphire

As I've mentioned, I tried to plan the structure of Sapphire along modular lines, so that pieces could be individually tested and upgraded. My proposed structure was a series of libraries, like this:

libsvar

Standing for "Sapphire variables", libsvar contains all the functions for manipulating SVs, AVs and HVs. This is an interesting library in its own right, which can be used for programming outside of the Sapphire environment - having SVs in an ordinary C program without the overhead of a Perl interpreter really expands your programming possibilities and, as far as I'm aware, there isn't a good variable handling library around.

libre

The regular expression engine would be separated into its own library, again so that external applications can use it without an entire Perl interpreter. I didn't implement this myself, leaving it to PCRE or glib to provide this.

libutf8

Again, we can split off the Unicode handling functions into their own library, although this functionality can be implemented by libunicode or glib.

libscope

The present-day scope.c and scope.h solve a problem in C by giving it dynamic scoping; this is something that contributes to the friendliness of the Perl programming environment, and something we can separate out and share.

libpp

Although this wouldn't be useful outside of Sapphire, libpp would contain the "push-pop" code which runs the operations inside of the interpreter.

libutil

libutil would contain everything else which was potentially useful outside of Sapphire - the memory allocation, the stack manipulation and so on.

The future of Sapphire

So, what am I going to do with Sapphire now? To be honest, nothing at all. I hope it's now served its purpose by presenting the argument for reusable code and stable design principles, and so I don't think there's anything else I need to do with it.

I certainly don't, at present, want to be placed in a position where I'm choosing between spending time fiddling with Sapphire and spending time contributing to coding Perl 6. Please understand: Sapphire is emphatically not intended to be a fork of Perl - merely an interesting interlude - and this is shown by the fact that I didn't try to make any exciting changes.

If anyone has some interesting ideas for how to take this ball and run with it, feel free. It's free software, and this is exactly what you should be doing. Contact me if you'd like a copy of the source.

I do have some thoughts on what my next experiment is going to be, however...

Reflections through a Sapphire

What have I learnt from all this? I've learnt a lot about the structure of Perl 5 - I've realised that roughly half of it is support infrastructure for the other half, the business half. Is this good or bad? Well, it certainly means that we're not beholden to anyone else - an external library may suddenly change its implementation, semantics or interface, and Sapphire would have to struggle to catch up. Perhaps it's all about control - by implementing everything ourselves, the porters retain control over Perl.

I've also learnt that Perl 5, internally, has a lot to share, yet, even though we claim to believe in code reuse where the CPAN's concerned, we do very little of it on a lower level, neither really giving nor really taking.

I've learnt that rapid development can come out of a number of things: firstly, having external code already written to do the work for you helps a lot, even though you don't have such control over it.

Having an existing implementation of what you're trying to program also helps, although you have to tread a fine line. Taking Perl 5 code wholesale meant I either had to do a lot of surgery or support things I didn't really want to support, but ignoring the whole of the existing codebase would feel like throwing the baby out with the bathwater. (Hence I would caution the Perl 6 internals people to thresh carefully the Perl 5 code; there is some wheat in that chaff, or else you wouldn't be using it...)

Finally, rapid development can come from having a well-organised and disciplined team: my team swiftly agreed on all matters of design and implementation and got down to coding without interminable and fruitless discussions, taking unanimous decisions on how to get around problems - because I was my team.

Would I say the Sapphire experiment was a success? Well, since it taught me all the above, it certainly can't have been a failure. Did it prove the point that developing with reusable code is worth the sacrifice in terms of control? That remains to be seen...

This page was last checked for correctness on 2000-09-11. Contact Simon.