Monday
08Dec2008

A Slight Shift in Direction ...

Well, it’s been an embarrassingly long time since I last wrote anything here. That is primarily because my focus has taken a sharp shift towards the practical during the last year, which has been taking me up a long and frequently painful technology learning curve … as well as rather rapidly down an interesting ‘unlearning’ curve.

Things really got started when I followed up on the enthusiasm of a couple of friends and started to explore Ruby on Rails as a potential web development environment. Up to that point I had lots of ideas, but only the vaguest notion as to how I might ever get them actually developed. Looking at Rails and Ruby, I started to feel as though it might actually be practical for me to consider doing the development myself - despite having spent most of the last two decades primarily as a ‘theoretical technologist’ ;-)

Despite the initial attractiveness of Rails, it didn’t take me long to come to the conclusion that it wouldn’t actually suit my needs. This is not because of anything ‘wrong’ with Rails itself, but primarily because it is founded on the idea of Object-Relational Mapping (ORM) for its persistent data storage model. I had long ago come to the conclusion that the relational model is really not well suited to the kinds of systems I am interested in developing - and putting an ‘object’ veneer on it doesn’t really help much.

Ruby, however, is a different story. Despite starting out as a ‘scripting language’, it has a great deal of power and elegance. After doing a great deal of reading, I finally felt that I knew enough to try some ‘real coding’ in September. It took a while to get through the initial frustrations of having to look things up every time I wanted to try something new, but in the end I felt that my initial impressions were correct, and that this was a language I could work with.

My initial project was to come up with a persistent storage model that would suit my needs. For a variety of reasons this has been taking longer than I would have liked - although in retrospect I can’t say that I find this terribly surprising. Apart from dealing with some of Ruby’s nastier quirks - and a woefully inadequate debugging environment - my biggest challenge has actually been unlearning most of the stuff I thought I knew about database structuring techniques !!!

I always regard a serious effort at unlearning as being a tremendously good foundation for any serious attempt at innovation. As conventions and practices get settled in, codified, taught, and become dominant in the marketplace it can become extraordinarily difficult to dislodge them - especially within the constraints of a commercial enterprise. Working outside the system, as I currently do, feels amazingly and refreshingly unconstrained compared to my corporate days :-)

One of the prime motivators here is Ruby’s serious indifference towards size constraints on variables. Even its Integer types can freely expand from Fixnums (single words) to Bignums - arbitrary sequences of words strung together. String types are equally flexible. There are simply no mechanisms in the language for directly constraining the size of string containers - unless you decide to do that yourself.
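A minimal illustration of this behaviour, assuming a reasonably current Ruby interpreter (since Ruby 2.4 the Fixnum and Bignum classes have been merged into a single Integer class, so the class name printed below differs on older versions):

```ruby
# Integers grow past the machine word size without any declaration.
small = 2**10                 # fits comfortably in one machine word
big   = 2**200                # far beyond 64 bits, still an ordinary Integer

# Strings have no declared maximum length either.
text = String.new("abc")      # an explicitly mutable string
text << ("x" * 10_000)        # the same object simply grows

puts small.class              # Integer (on Ruby 2.4 or later)
puts big.bit_length           # 201
puts text.length              # 10003
```

Nothing in either declaration says how large the value may become, which is exactly what a fixed-width column definition would require.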

This immediately creates an ‘impedance mismatch’ between Ruby and (most) relational database implementations, which tend to require specifications of the maximum sizes of fields (columns) in tables. This in turn tends to stem from the convenience and efficiency that fixed ‘record’ sizes allow in data storage and indexing models.

Anyway, to cut a long story short, I decided to make a virtue of this ‘limitation’ and see what happened if I tried to produce a persistent storage model built around ‘variable everything’. Although this is still a work in progress, here is a list of some of the core elements so far:

  • everything is stored in a single operating-system file.
  • containers consist of arbitrary-sized items accessed via an ‘internal index’ in arbitrary-sized blocks.
  • there is a master block index (which is itself a container) for managing allocated space (containers and free blocks).
  • items are always identified by a system-generated surrogate key rather than an application-defined ‘primary key’.
  • indexing containers are used for providing ordered views onto item containers.
  • containers currently have a single schema for all their items (much like relational tables) - although that is set to change in the very near future.
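To make the first few points above concrete, here is a rough sketch of variable-sized items kept in a single file and located through an index keyed by a system-generated surrogate key. It is an illustration under stated assumptions, not the implementation described in this post: the class name `TinyStore` and its methods are hypothetical, the index lives only in memory, and the master block index, free-space management and indexing containers are left out entirely.

```ruby
require "tempfile"

# Hypothetical sketch: variable-sized items in one file, found via an
# in-memory index keyed by a system-generated surrogate key.
class TinyStore
  def initialize(path)
    @file = File.open(path, "a+b")  # one OS file holds everything
    @index = {}                     # surrogate key => [offset, length]
    @next_key = 1
  end

  # Append an item of any size and return its surrogate key.
  def add(bytes)
    @file.seek(0, IO::SEEK_END)
    offset = @file.pos
    @file.write(bytes)
    @file.flush
    key = @next_key
    @next_key += 1
    @index[key] = [offset, bytes.bytesize]
    key
  end

  # Fetch an item by its surrogate key.
  def fetch(key)
    offset, length = @index.fetch(key)
    @file.seek(offset)
    @file.read(length)
  end

  def close
    @file.close
  end
end

Tempfile.create("store") do |tmp|
  store = TinyStore.new(tmp.path)
  short = store.add("hi")
  long  = store.add("x" * 5_000)
  puts store.fetch(short)           # hi
  puts store.fetch(long).bytesize   # 5000
  store.close
end
```

The caller never chooses a key or declares a size; both are decided by the store at the moment an item is added.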

Plans for the near future include:

  • support for a much more flexible schema model.
  • support for transactions - including automatic histories of item versions, along with rollback and rollforward capabilities.
  • the ability to support subblocks within containers to provide more stability for the space-management algorithms.
  • the ability to federate knowledge stores - which is absolutely necessary for any kind of modularity model for data - something that is notably lacking in (most) existing persistent storage models.

This is still a long way from my goal of supporting very high-level semantic models, but on good days I can see a reasonable chance of getting there!

I’ve found that terminology is remarkably important. Instead of talking about ‘databases’ I talk about knowledge stores. Instead of talking about tables, I talk about ‘containers’. I’ve found that keeping this emphasis on different terminology helps significantly with the unlearning process, and also helps to keep me from getting sucked back into conventional modes of thinking.

So far, there is nothing terribly radical here from a conceptual perspective, but we are still in very early days. Stay tuned for further developments; hopefully I will write more frequently now that I’ve laid some foundations …

Ian

Reader Comments (3)

Ian,

I find Surrogate keys to be the most daunting concept to the "Mom's Recipe File" crowd. Databases are not only data but also its interconnections. These interconnections are why the index cards were originally computerized; their existence should be monitored by the computer and the users should not be allowed to deal with them directly. In truth, I find this inability to see past the contents of tables also to be endemic in the application programmers' milieu. Why is data so hard to recognize as things with connections?

My own designs always have a primary key that the user cannot readily understand, i.e., an incremental number; relationships are maintained through reference to these numbers.

Every entity also always has date-time, application, and who stamps (aka When dunnit, How dunnit, and Who dunnit). It is not so retentive as a log file, but it is just enough to provide information for the monitors of a constantly monitored system.

What are the allowed operations on your containers? I assume: Add, Update, Delete, Read/Report/Select. Are there more?

I applaud your efforts at re-envisioning data, its storage, and its manipulations.

Cheers,

Hugh

P.S. Have you considered this in light of petabyte-sized storage media (keystroke streams)? This is only a few hardware leaps away...

December 11, 2008 at 11:19AM | Unregistered Commenter Hugh Conjnor

Hi Hugh. It was a great delight to sign on today and see your thoughtful comment - my first! Thanks !!!

I first came across the idea of surrogate keys in papers by Kent from the mid-70s. I've always liked the idea, because it allows for more easily changing application-level keys without messing up relationships. While changing things like account numbers, product or part ids, and so on happens very infrequently, when it does happen it can be a right pain.
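A small sketch of that point, using plain in-memory hashes rather than any real storage layer (the part numbers, field names and data are made up for illustration):

```ruby
# Hypothetical example: relationships point at surrogate keys, so an
# application-level key can change without touching them.
parts  = { 1 => { part_no: "A-100", name: "Widget" } }  # surrogate key => item
orders = [{ part_key: 1, qty: 3 }]                      # refers to the surrogate key

# Renumbering the part is a single update ...
parts[1][:part_no] = "W-2008"

# ... and the existing order still resolves to the same item.
order = orders.first
puts parts[order[:part_key]][:part_no]   # W-2008
```

Had the order stored the part number itself, every referring record would have needed rewriting as well.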

Philosophically, I tend to view surrogate keys in an application-level programming model as being analogous to object ids. They should be there in the background, but not immediately as part of the application-level data model.

I believe the same principle applies to the other kinds of metadata you mention, although one of the areas I'm still mulling over is exactly how to express relationships between these kinds of plumbing-level metadata and equivalent application-level attributes (should they be relevant).

Although I haven't quite got this far yet, I believe that containers and their items do need more operations than the standard CRUD. In particular, I believe that they need explicit relationship-management operations between items. However, I've been a bit stuck recently on exactly how to take the next step in schema management, which is where I would need to define all this.

Regarding massive data streams ... right now this isn't a primary concern. I believe that the primary difference between 'knowledge stores' and 'databases' is that knowledge stores are considerably more schema-intensive. In addition, they need to display considerably more flexibility in schema management. I'm perfectly happy to delegate massive data storage for items with fixed schemas to conventional databases.

A somewhat related issue is that I want to try making a (mostly!) non-update-in-place model work. This has some challenges - including the possibility of generating massively greater amounts of data than conventional models - but I believe that the benefits in terms of being able to express a rich temporal semantics (a key interest/concern) should be well worth it.
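One way to picture a non-update-in-place model is a container in which every change appends a new version, so earlier states remain readable. The sketch below is only an assumption about how that could look: the `VersionedItem` class is hypothetical, and plain integers stand in for timestamps.

```ruby
# Hypothetical sketch of a non-update-in-place item: every change
# appends a new version, and reads can ask for the state "as of" a time.
class VersionedItem
  Version = Struct.new(:at, :value)

  def initialize
    @versions = []            # oldest first, never modified in place
  end

  # Record a new value at the given logical time.
  def set(value, at)
    @versions << Version.new(at, value)
  end

  # Latest value recorded at or before the given time, or nil.
  def as_of(time)
    version = @versions.select { |v| v.at <= time }.max_by(&:at)
    version && version.value
  end

  def history
    @versions.map(&:value)
  end
end

item = VersionedItem.new
item.set("draft", 1)
item.set("final", 5)

puts item.as_of(3)            # draft
puts item.history.inspect     # ["draft", "final"]
```

The cost is visible even here: two changes leave two stored values, which is the data-growth trade-off mentioned above.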

Anyway, once again thanks !!!

Ian

December 11, 2008 at 04:53PM | Registered Commenter Ian Farbrother

Being one of those two enthusiastic guys you mentioned, it is nice to hear how your journey has gone. I remember our conversation--over Italian food I believe--and you were concerned about the persistent data storage even at that time. I hope you have enjoyed your journey so far, and I am glad that I could be a part of it.

Lionel

December 26, 2008 at 10:14PM | Unregistered Commenter Lionel
