A Slight Shift in Direction ...
Monday, December 8, 2008 at 01:44PM
Well, it’s been an embarrassingly long time since I last wrote anything here. That is primarily because my focus has taken a sharp shift towards the practical during the last year, which has been taking me up a long and frequently painful technology learning curve … as well as rather rapidly down an interesting ‘unlearning’ curve.
Things really got started when I followed up on the enthusiasm of a couple of friends and began exploring Ruby on Rails as a potential web development environment. Up to that point I had lots of ideas, but only the vaguest notion of how I might ever actually get them developed. Looking at Rails and Ruby, I started to feel that it might be practical for me to do the development myself - despite having spent most of the last two decades primarily as a ‘theoretical technologist’ ;-)
Despite the initial attractiveness of Rails, it didn’t take me long to conclude that it wouldn’t actually suit my needs. This is not because of anything ‘wrong’ with Rails itself, but primarily because it is built around Object-Relational Mapping (ORM) for its persistent data storage model. I had long since concluded that the relational model is really not well suited to the kinds of systems I am interested in developing - and putting an ‘object’ veneer on it doesn’t really help much.
Ruby, however, is a different story. Despite its origins as a ‘scripting language’, it has a great deal of power and elegance. After plenty of reading, I finally felt that I knew enough to try some ‘real coding’ in September. It took a while to get past the initial frustration of having to look things up every time I wanted to try something new, but in the end I felt that my initial impressions were correct, and that this was a language I could work with.
My initial project was to come up with a persistent storage model that would suit my needs. For a variety of reasons this has taken longer than I would have liked - although in retrospect I can’t say that I find this terribly surprising. Apart from dealing with some of Ruby’s nastier quirks - and a woefully inadequate debugging environment - my biggest challenge has actually been unlearning most of what I thought I knew about database structuring techniques!!!
I always regard a serious effort at unlearning as a tremendously good foundation for any attempt at innovation. As conventions and practices become settled, codified, taught, and dominant in the marketplace, they can become extraordinarily difficult to dislodge - especially within the constraints of a commercial enterprise. Working outside the system, as I currently do, feels amazingly and refreshingly unconstrained compared to my corporate days :-)
One of the prime motivators for this rethinking is Ruby’s near-total indifference to size constraints on variables. Even its integers can freely expand from Fixnums (values that fit in a single machine word) to Bignums (arbitrary sequences of words strung together). Strings are equally flexible: there are simply no mechanisms in the language for directly constraining the size of a string - unless you decide to enforce one yourself.
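To make this concrete, here is a small Ruby snippet, assuming a modern Ruby. Since Ruby 2.4, Fixnum and Bignum have been merged into a single Integer class, but the automatic growth described above is unchanged. The `MAX_NAME_LENGTH` constant and `check_length!` helper are hypothetical, included only to show what imposing a limit "yourself" might look like.

```ruby
# Integers grow without any declared size limit.
small = 2**10
big = 2**100

puts small.class # => Integer
puts big.class   # => Integer (Ruby 2.4+; older Rubies reported Bignum)
puts big         # => 1267650600228229401496703205376

# Strings are equally unbounded: appending simply grows the buffer.
text = +"" # unfrozen empty string
10_000.times { text << "variable " }
puts text.bytesize # => 90000

# Any size limit has to be imposed by the application itself.
# MAX_NAME_LENGTH and check_length! are hypothetical, for illustration only.
MAX_NAME_LENGTH = 20

def check_length!(str, max = MAX_NAME_LENGTH)
  raise ArgumentError, "too long (#{str.length} > #{max})" if str.length > max
  str
end

check_length!("Ian")      # passes and returns "Ian"
# check_length!("x" * 21) # would raise ArgumentError
```

Nothing in the language enforces `MAX_NAME_LENGTH`; it only takes effect where the application remembers to call the check.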
This immediately creates an ‘impedance mismatch’ between Ruby and (most) relational database implementations, which tend to require the maximum size of each field (column) in a table to be specified in advance. That requirement in turn stems largely from the convenience and efficiency that fixed ‘record’ sizes allow in data storage and indexing models.
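The mismatch is easy to see at the byte level. The sketch below contrasts a fixed-width field, which must be sized in advance and here silently truncates anything longer, with a length-prefixed field, which simply records how many bytes follow. It is a minimal illustration of the general technique, not how any particular database lays out its records; `FIELD_WIDTH` and the encode/decode helpers are hypothetical names.

```ruby
# FIELD_WIDTH and the encode/decode helpers are hypothetical, for illustration.
FIELD_WIDTH = 10 # declared maximum size, as for a fixed-size column

# Fixed width: pad with NULs to FIELD_WIDTH, silently truncating longer values.
def fixed_width_encode(str)
  [str].pack("a#{FIELD_WIDTH}")
end

def fixed_width_decode(bytes)
  bytes.unpack1("Z#{FIELD_WIDTH}") # stop at the first NUL padding byte
end

# Length-prefixed: a 4-byte big-endian length, then the raw bytes.
def length_prefixed_encode(str)
  bytes = str.b # binary copy of the string's bytes
  [bytes.bytesize].pack("N") + bytes
end

def length_prefixed_decode(data)
  len = data.unpack1("N")
  data.byteslice(4, len).force_encoding(Encoding::UTF_8)
end

name = "Knowledge store containers"

p fixed_width_decode(fixed_width_encode(name))         # => "Knowledge "
p length_prefixed_decode(length_prefixed_encode(name)) # => "Knowledge store containers"
```

The length prefix costs four bytes per field, but it removes the need to choose a maximum size up front, which is a much closer fit to Ruby's own behaviour.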
Anyway, to cut a long story short, I decided to embrace this ‘limitation’ and see what happened if I tried to build a persistent storage model that made a virtue of ‘variable everything’. Although this is still a work in progress, here is a list of some of the core elements so far:
- everything is stored in a single operating-system file.
- containers consist of arbitrary-sized items, accessed via an ‘internal index’ and held in arbitrary-sized blocks.
- there is a master block index (which is itself a container) for managing allocated space (containers and free blocks).
- items are always identified by a system-generated surrogate key rather than an application-defined ‘primary key’.
- indexing containers are used for providing ordered views onto item containers.
- containers currently have a single schema for all their items (much like relational tables) - although that is set to change in the very near future.
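Two of the ideas in this list, system-generated surrogate keys and indexing containers that provide ordered views onto item containers, can be sketched as a toy, in-memory Ruby model. The class and method names are hypothetical, and the single-file, block-based storage and the master block index are not modeled at all; this only illustrates the logical relationships.

```ruby
# Class and method names are hypothetical; nothing here is persisted.
class Container
  attr_reader :name

  def initialize(name)
    @name = name
    @items = {}   # surrogate key => item (a Hash of fields)
    @next_key = 1 # monotonically increasing surrogate key counter
  end

  # Store an item and return the surrogate key the system assigned to it.
  def insert(item)
    key = @next_key
    @next_key += 1
    @items[key] = item
    key
  end

  def fetch(key)
    @items.fetch(key)
  end
end

class IndexContainer
  def initialize(source, field)
    @source = source # the item container being indexed
    @field = field   # the field that defines the ordering
    @entries = []    # [field value, surrogate key] pairs, kept sorted
  end

  # Insert the key at its sorted position; ties are broken by surrogate key.
  def add(key)
    entry = [@source.fetch(key)[@field], key]
    pos = @entries.bsearch_index { |e| (e <=> entry) >= 0 } || @entries.size
    @entries.insert(pos, entry)
  end

  # Items from the source container, in index order.
  def ordered_items
    @entries.map { |(_, key)| @source.fetch(key) }
  end
end

people = Container.new("people")
by_name = IndexContainer.new(people, :name)
%w[Carol Alice Bob].each { |n| by_name.add(people.insert({ name: n })) }

p by_name.ordered_items.map { |item| item[:name] } # => ["Alice", "Bob", "Carol"]
```

Because the index stores surrogate keys rather than copies of the items, any number of ordered views can sit over the same container, and breaking ties on the surrogate key keeps each view's order deterministic.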
Plans for the near future include:
- support for a much more flexible schema model.
- support for transactions - including automatic histories of item versions, along with rollback and rollforward capabilities.
- the ability to support subblocks within containers to provide more stability for the space-management algorithms.
- the ability to federate knowledge stores, which is absolutely necessary for any kind of modularity model for data, and which is notably lacking in (most) existing persistent storage models.
This is still a long way from my goal of supporting very high-level semantic models, but on good days I can see a reasonable chance of getting there!
I’ve found that terminology is remarkably important. Instead of talking about ‘databases’, I talk about ‘knowledge stores’. Instead of talking about ‘tables’, I talk about ‘containers’. Keeping this emphasis on different terminology helps significantly with the unlearning process, and also keeps me from getting sucked back into conventional modes of thinking.
So far, there is nothing terribly radical here from a conceptual perspective, but these are still very early days. Stay tuned for further developments; hopefully I will write more often now that I’ve laid some foundations …
Ian
