Monday, December 28, 2009

How to create a program

Right, so I'm back at square one.

I have fibonacci working, again, but this time with Self like objects. So how, now, should I create this fibonacci library inside my VM?

First approach is to create a constructor for the library that, when called, will go through and create everything that is needed for that library. Not too bad when considering such a simple example.

A second way would be to have a special creator function that uses Reflection/Mirrors to create all the objects and their methods. This does a better job of separating the construction logic from the actual application.

The final approach is to use serialization. In this form we have one, common, method for serialising an object graph, and its de-serialising sister. The application that we are creating is built within an IDE of some kind, and then serialised for later use.

I've been trying to think through how this might work in a prototype scenario. I quite like splitting things into a few layers;
  • The application prototype. This contains the definition of all the objects in the application, many of them set up in a prototypical fashion so that users of the program can have a look and see how the thing might be set up. This is essentially immutable.
  • The application instance. (Implying more than one instance...) This extends the prototype and actually does stuff. The prototype should never be touched by the instance (it is shared). This is what is saved when a user wants to save something... using the same serialisation/de-serialisation method.
  • The application transaction scratch space :) where in the application instance does some little bits and pieces on the way to the completion of a computation, but is too transient to bother saving in the instance.
All this requires thinking about the usual serialisation stuff; how do you draw the line around what is to be serialised? How do you link to stuff within that boundary? Outside the boundary?

In this model the application prototype is similar to the class hierarchy side of things in Java etc, but with a little more information about how the whole thing is set up - there's a lot less need for Spring like things as most of the construction has already been done.

I also wonder how/if you might apply purely functional trees (as discussed here) to the application prototype to enable multiple version of the same application running at once, or updates being applied atomically to a currently running app.

private vs public

So my objects have two collections (maps to be precise); the private slots and the public interface.

The private slots can be any object. They could therefore be used for private methods, or anonymous functions which are then used within a method (this reduces the overhead of creating anonymous functions everytime we use them), or even constants within the object (eg numbers and strings).

Using constants in this way means we don't need to provide the object with any knowledge of those constant's factories etc except when the object is constructed. This is a benefit to the image/deserialization approach to application development (more about this later).

While I say 'private', 'protected' may be a better word as they are visible to all children of the current object. They are also 'updatable', but only from the current object's point of view, the original value on the original object does not change.

The public interface is made up of only methods - messages that can be sent to the object by any other object that has a reference.

It is obviously imperative that objects have this split from a security point of view, but they also help with data/implementation hiding, a very important part of modular development.

The problem with Javascript (and I think this is what Bracha was getting at) is that everything in the object is public. Now you can use patterns as defined by Crockford, but these are a bit kludgy, a bit nasty in terms of creating a shit load of objects (all those continuations, for each and every object) and hence lead to multiple approaches to object creation.

Friday, December 18, 2009

Selflike

So I'm almost to the point where things work the way I understand Self to work. I have one bug and one conundrum to contemplate.

The bug is that functions may send messages to their context instead of the enclosing object. Basically, if an implicit receiver is used, the correct message will be found but the object that the message should be applied to is ignored, and the current activation will always be used. Not a hard fix, but have to write some tests to get it right.

The conundrum concerns instantiation of an application. There are two options; explicit builder code that is run inside a secure package (read 'object'), or a serialisation approach where objects are specified concretely, and the VM will reconstruct them from the file.

Since it would be possible to implement the second option in the first, I guess that is what I'll go for, but the mood of the VM is more inline with the objects existing in some freeze-dried state rather than executing a series of opcodes to create everything.

Wednesday, December 16, 2009

What's changed

a list of things that I am changing to get this more Self like
  • Implicit receiver is always 'this'.
  • Changed Object to have multiple inheritance
  • Remove 'context' slot from Context (made its way onto Object), turn into named super slot
Big problem at the moment is setters. Works fine for objects, but what about functions and their contexts. A closure needs to be able to update the value of its context, but we don't want just anyone able to do that...

After thinking a little while, it turns out that it is just setters. Getters work fine, and of course messages are all public so it doesn't matter. But what we really want is a way to say 'this thing is settable by the objects I contain' (be they functions or objects). This opens the object up to nefariousness, so let's say that it can only be imbued in things the object created; you can't willy nilly specify that an Object's nester is any other object.

Ahhh. Sux

KISS

I'm sick of wracking my brain over Newspeak's confounding context/inheritance hierarchy*. It's just too nasty to implement, with so many ifs and buts. So fugger it, I'm going back to Self.

Instead of having complex semantics about how to look up receivers, I'm just going to use multiple inheritance. Setters only effect the object that they are executed within ('this') while Getters go up and down the entire hierarchy.

This may sound slow, but once you've compacted the thing with PICs and so on, it really doesn't matter. I guess you could say that about the Newspeak approach as well, but this is a very simple algorithm and so should be quick to implement.

So where does 'nesting' go? Well, it's essentially a 'super' class, isn't it? In fact, apart from the convoluted semantics for receiver look up, that's all context is in Newspeak as well.

What do we lose? Well, it all ends up much less well defined, which is the power and the pain of Self, and maybe there's a security problem, but I believe most of that will be overcome by making sure that you are Dependency Injected. (Self essentially made everything accessible by containing a slot in Object to the package hierarchy)

So now we start thinking solely in instantiated objects... not pretending that some things are classes and some are contexts. It's all just objects.

Newspeak may prove better, but for my proof of concept I need something that I understand, so this will be it (Sorry Gilad that my brain is just too small)

*eg, I can't get a super receiver to then call an overriding method on the original object - but I'm sure it's just me.

Sunday, December 6, 2009

when all you have is semantics, is it a surprise that all your problems are with... semantics?


Trying to get the behaviour of message and slot lookup/access/modification correct.

Starting with a 'simple' example of inheritance, I use the above relationships (note: no nested classes yet)

So, Func0 can get at obj1, func0Slot, func0Message as well as obj1.obj1Message and obj1.obj0Message.

Obj1 constructor can get at func0Slot, func0Message, obj1Slot, obj1Message as well as obj0Message. Obj1 can update the value in func0Slot.

Obj0 can get at obj0Slot and obj0Message. Or can it?

Most of my tests have worked, with the relationship between obj1 and func0 being quiet well understood and working well. The problem comes in when I get to inheritance from obj0.

While I can find the appropriate message, the behaviour for updating the slot is broken; reading the slot should (initially) return the value in the original obj0 object - but there is no code to search for slots up the hierarchy... This could be fixed by having the message execute within the context of the original object BUT if the message updated the slot - then the original object would effected rather than the inheriting object. Boo.

So we end up with two context setting schemes. When looking up the context hierarchy, we update the slot where we find it. When looking up the inheritance hierarchy, we update the slot at the bottom of the tree!

But now the slots in parent objects are available (if you dig around) in child objects. Not ideal. And how do I determine which scheme to use - does the code have to specify?!!

So I'm a bit miffed.

Wednesday, December 2, 2009

...But in the end it helps out

Gilad's post got me thinking about the difference between objects and hashes, and specifically the encapsulation that objects bring to the table.

This then lead me to realise that ultimately, in my VM and presumably Newspeak, the slots that are available in a function call are exactly the same as the slots available in an object. And the only things that have direct access to these slots are the sub-contexts (closures) that exist beneath this level.

And so objects and functions come closer and closer as closures save the day. There are two differences I've found so far, and both are additional object behaviour, and both are because of inheritance.

Firstly Objects have parents - otherwise there would be no inheritance at all!

Secondly, according to the newspeak message lookup scheme, if the receiver hasn't been found within the context, then it should be found up the object hierarchy (though only of the deepest nested object).

Interestingly this fits well with the access rights. If you can only access things within your context, you can't affect parent objects unless they provide a message to you - which is the desired behaviour.

Anyway, I have this almost implemented (surprisingly easy) and will get back to the compiler as soon as I've
  • written test to make sure that objects created in situ will honour their parents (at the moment they won't because they are actually function contexts...)
  • found a way to enforce contextualisation of objects/functions to only the current context... at the moment you could potentially hack the context that an object is defined within!

Monday, November 23, 2009

Bloody Gilad is right again!

So I've been grappling with the problem of security with the objects in DynamOS - how does one ensure that packages can't modify the objects passed into them, with global ramifications? Well the crux of the problem is outlined by Gilad in Objects aren't Hashes.

Now I have been treating objects as hashes, and in Javascript and so on this works fine, but it means that you can do all kinds of nasty hacks right into the most fundamental classes - and these hacks affect the entire VM. Gilad points out that an object really needs to have some kind of encapsulation - and a self reference.

The encapsulation means that there are parts of the object that can't be accessed by the outside world. There are some aspects that can't be manipulated by the outside world unless you have special permission (eg you have a reflection/mirror capability or the object exposes those abilities to you).

DynamOS has Slots, which are the internal attributes of an object, and Functions, which are the interface to the rest of the world. I have been grappling with how to have the functions refer to slots that don't have public accessors (eg you are writing the public accessor...). Because everything is a message send, what message can you send since all messages are in the functions list...

Now Gilad points out why this doesn't work - because there must be a separation between the interface and the implementation/data abstraction. In Newspeak he achieves this by having classes, and within a class you can send a message that refers to the internal slot because that slot is within your context.

So that lead me to an 'aha' moment; maybe I could use the function definition in DynamOS as a kind of object definition, but with a different 'setter' mechanism.

In a function, when you set a value, dynamos will search for the slot by symbol, and when it finds it, it will set it the the specified value. But in an object we want to do any setting on the current object. In this way we have prototypical inheritance.

I hope (and I still have to think a bit harder about this) that by simply changing this mechanism that I can achieve Gilads separation with no change to my opcodes, just a small addition to the VM object (createObjectWith: opcodes andArguments: arguments).

This doesn't stop someone from adding a 'createInstance' method within their object - and hence creating a class. It also doesn't stop someone from exposing the slot CRUD mechanism. But it does mean that objects can't ordinarily have their basic makeup changed by anyone, willy nilly.

Thursday, November 19, 2009

One thing I know about parsers...

is that my many failed attempts at writing them have resulted in the limited knowledge of what's easy, and what's not. Hence I have finished the parser for my DynamOS C, because I specifically made it an easy language to parse.

The compiler should be pretty easy now that I have the actual definition. And thinking about the implementation of this work in DynamOS C itself is raising lots of interesting questions - I wonder if I'll end up with some more opcodes.

Oh, and it adds another step to my plan; a unit testing framework. I couldn't have written the parser without unit tests, and from now on I will want to unit test my work.

UPDATE : actually, I realise what I have forgotten - and neither of these were in the numbers library so I think fair enough; Constants, Closures and Comments. The three Cs. I don't think they'll take much though. Just have to find time to do it (baby just woke up damn it!)

Saturday, November 14, 2009

DynamOS 'C'

My next step is to implement the DynamOS equivalent of 'C'. Sometimes C is considered to be almost assembly language - it's so close to the metal. Well I want a language that is just as close to the Virtual metal :)

I'm thinking something like this

(function numberFactory: vm
(local numberPrototype)
(local factory)

numberPrototype: newObject

(open numberPrototype
(function plus: number
result: (vm add: number to: this)
)

(function minus: number
result: (vm subtract: number from: this)
)

(function isLessThan: number
result: (vm value: this isLessThan: number)
)
)

factory: newObject
(open factory
(function numberFrom: value
value parent: numberPrototype
)
)

result: factory
)

some nice and simple brackets to make all the function calls really obvious. With just a few keywords (such as 'function' and 'open' and 'local')

After that I want to try implementing the compiler in DynamOS - which will require strings!! And that can then lead to a command line. Once we have the command line, well I think that will be enough to get moving with the C version.

Well it makes sense to me

A fully self-contained library that provides the fibonacci function, all in Dynamos Assembly. The only parameters to this library are numberFactory and listFactory.

Debug("creating fibonacci library", numberFactory),

CreateValueObject(interpreter, 1), // create constant for '1'
Push(Symbol.RESULT),
SetObject(numberFactory),
FunctionCall(Symbol.get("numberFrom:")),
Push(Symbol.RESULT),
FunctionCall(Symbol.get("one:")),

CreateValueObject(interpreter, 2), // create constant for '2'
Push(Symbol.RESULT),
SetObject(numberFactory),
FunctionCall(Symbol.get("numberFrom:")),
Push(Symbol.RESULT),
FunctionCall(Symbol.get("two:")),

// create second anonymous function
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into arguments
FunctionCall(argumentList.toSetterSymbol()),

SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into locals
FunctionCall(locals.toSetterSymbol()),
PushSymbol(temp1), // add 'temp1' local
SetObject(argumentList),
FunctionCall(Symbol.get("add:")),

StartOpCodeList(),
Push(one), // result = index - 1
SetObject(index),
FunctionCall(minus$),

Push(Symbol.RESULT), // result = fibonacci( result )
FunctionCall(fibonacci$),

Push(Symbol.RESULT), // temp1 = result
FunctionCall(temp1Setter), // temp1 = result

Push(two), // result = index - 2
SetObject(index),
FunctionCall(minus$),

Push(Symbol.RESULT), // result = fibonacci( result )
FunctionCall(fibonacci$),

Debug("left side", temp1),
Debug("right side", Symbol.RESULT),
Push(Symbol.RESULT), // temp1 = temp1 + result
SetObject(temp1),
FunctionCall(plus$),
EndOpCodeList(),
Debug("got opcodes", Symbol.RESULT),

Push(argumentList), // create anon2 function
Push(locals),
Push(Symbol.RESULT),
FunctionCall(Symbol.CREATE_FUNCTION_WITH_ARGUMENTS_$_LOCALS_$_OPCODES_$),
Debug("created", Symbol.RESULT),

Push(Symbol.RESULT),
FunctionCall(anon2.toSetterSymbol()),

// Create fibonacci function
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into arguments
FunctionCall(argumentList.toSetterSymbol()),
PushSymbol(index), // add 'index' parameter
SetObject(argumentList),
FunctionCall(Symbol.get("add:")),

SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into locals
FunctionCall(locals.toSetterSymbol()),

StartOpCodeList(),
// create first anonymous function
// mainly here to make sure the nesting of op code lists works, otherwise would be in top context
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into arguments
FunctionCall(argumentList.toSetterSymbol()),

SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into locals
FunctionCall(locals.toSetterSymbol()),

StartOpCodeList(),
new OpCode.Push(one),
new OpCode.FunctionCall(Symbol.RESULT_$),
Debug("returning (1) ", Symbol.RESULT),
EndOpCodeList(),
Debug("got opcodes", Symbol.RESULT),

Push(argumentList), // create anon1 function
Push(locals),
Push(Symbol.RESULT),
FunctionCall(Symbol.CREATE_FUNCTION_WITH_ARGUMENTS_$_LOCALS_$_OPCODES_$),
Debug("created", Symbol.RESULT),

Push(Symbol.RESULT),
FunctionCall(anon1.toSetterSymbol()),

// actual fibonacci function code
Debug("in fibonacci with argument", index),
Debug("******************************", numberFactory),
Push(two), // result = index isLessThan: two
SetObject(index),
FunctionCall(isLessThan$),

Push(anon1), // result = result ifTrue: [anon1] ifFalse: [anon2]
Push(anon2),
SetObject(Symbol.RESULT),
FunctionCall(ifTrue$IfFalse$),
Debug("true or false?", Symbol.RESULT),

Push(Symbol.RESULT), // contextualize anon function
Push(Symbol.CURRENT_CONTEXT),
FunctionCall(Symbol.CONTEXTUALIZE_FUNCTION_$_IN_$),
Debug("contextualized", Symbol.RESULT),

SetObject(Symbol.RESULT), // call anon function
FunctionCall(Symbol.EXECUTE),
Debug("executed function", Symbol.RESULT),
EndOpCodeList(),
Debug("got opcodes", Symbol.RESULT),

Push(argumentList), // create fibonacci function
Push(locals),
Push(Symbol.RESULT),
FunctionCall(Symbol.CREATE_FUNCTION_WITH_ARGUMENTS_$_LOCALS_$_OPCODES_$),
Debug("created", Symbol.RESULT),

Push(Symbol.RESULT), // contextualise the newly created function
Push(Symbol.CURRENT_CONTEXT),
FunctionCall(Symbol.CONTEXTUALIZE_FUNCTION_$_IN_$),
Debug("contextualized", Symbol.RESULT),

Push(Symbol.RESULT), // store fibonacci function temp
FunctionCall(temp1.toSetterSymbol()),

PushSymbol(fibonacci$), // save fibonacci to context
Push(temp1),
FunctionCall(Symbol.SET_FUNCTION_$_TO_$),

FunctionCall(Symbol.NEW_OBJECT), // create a new, empy object, move into fibonacciLibrarySlot
Push(Symbol.RESULT),
FunctionCall(fibonacciLibrarySlot.toSetterSymbol()),

PushSymbol(fibonacci$), // and add to the fibonacci library
Push(temp1),
SetObject(fibonacciLibrarySlot),
FunctionCall(Symbol.SET_FUNCTION_$_TO_$),
Debug("created fibonacci library", fibonacciLibrarySlot),

Push(fibonacciLibrarySlot), // return the library
FunctionCall(Symbol.RESULT_$)

This manages to do a lot of what is needed, the hairy stuff left is all about inheritance and especially getters/setters/slots.

The problem with the slots is that, because everything is supposed to be a message send, how do you ever actually update slots? At the moment, when a slot (always in the context so far) is created, the slot gets a default getter and setter. But how does one override the default getter/setter and still actually update the slot?

Also the Context setter is different from the object setter; the context setter updates the contents of the slot where it is defined, where as the object setter updates the contents of the current object.

I guess I'll probably end up with an addSlot: function, with some variants that provide different getters/setters. Then aliasing could be used to save the setter somewhere.

Still thinking.

Sunday, November 8, 2009

Immutable objects

or, more importantly, prototypes.

I'm grappling with the idea of moving objects between contexts. There are security issues (don't want people to change the root prototypes for all contexts within the whole OS - like smalltalk does) and performance issues (if we're using PIC then we want to use the same parents for as much as possible - so the PIC can do quick comparisons to determine which branch to take).

The only thing I've come up with is some sort of immutable object. If the prototype for an object is immutable then the parent id for that object is never going to change, and you can't update the prototype to cause security problems.

This fixes the above two concerns, at the expense of not being able to add your own useful functions to existing objects (like rails does to pretty much everything...). You can manually do this, at some speed expense, by using an intermediate prototype of your own making - all new leaves extend from this, with it's new functions.

This would break all PIC for inter-context communication, but that's your choice.

Friday, November 6, 2009

Actually, number one priority

is getting the environment set up within the VM. getting Object to be the root, and being initialised with it's useful functions (like parent: and addSlot:)

Then creating the Number functionality as a prototype. Because all the native stuff has moved into the VMObject, this can now be solely opcodes!

And all this will quickly require the ability to create functions. But not immediately.

How small is your VM?

We all use intel. Boo. But that's just the way it is. (OK so some use ARM, but I don't think what I'm about to say will break ARM at all)

I'm developing on an ATOM (and about to tear my hair out - netbeans is a write-off, but eclipse is ok) and if you look at the tech specs for the ATOM you'll see that it has a 24k level one instruction cache. The ATOM's big brothers have 32k - so not that much more.

If one could keep the VM down to less than this, it would be hella fast. If one could keep the VM and important native calls to less than this, it would be hella hella fast.

And another advantage would be the relative size of the code to debug etc would be small as well.

This throws up an interesting question though - how would JITing hurt this? Whenever a JITted call was made, the level one cache would be destroyed, so going to and returning from the JITted code would be costly; heuristics would be needed to see just how costly.

So, is it possible to write the VM in 24k?

I simply don't know, not having written intel/c code in a long time. I feel happy that it is. Keeping it simple by making sure the number of different opcodes is kept to a minimum (see the last post) may result in longer programmes, but those programs will hopefully execute faster.

Once the kernel is written, if there is any room left we can add extra opcodes to make things faster - but only after profiling of course!

The final point to make is that we now do everything in the data cache (level one is still only 24/32k) and our JIT code essentially becomes a software replacement for the Intel branch predictors.

Yes - 4 (or 5) opcodes can implement Fibonacci

I've managed to implement fibonacci in my own dynamic VM, using 5 opcodes (could do it in 4). It was a real discipline to have to keep those codes down, and I tried very hard to only do enough for the fibonacci function itself.

I am quite happy with the results. I've learnt quite a bit about what is needed, and what isn't. I've tried to make sure that things stay relatively neat, and what that costs at even this early stage.

It relies heavily upon closures to get stuff done, since there is no such thing as 'if'. At this stage everything that is needed is prebuilt - even the object representing the value 1.

So here is the code, as such. Firstly, a 'smalltalk' like description of how it works

function fibonacci: index
(index isLessThan: 2)
ifTrue:
[ ^ index ]
ifFalse:
[ ^ (fibonacci: (index minus: 1)) plus: (fibonacci: (index minus: 2)) ]

Next, the opcode translations. The actual fibonacci function first

Push(two), // result = index isLessThan: two
SetObject(index),
MethodCall(isLessThan$),

Push(anon1), // result = result ifTrue: anon1 ifFalse: anon2
Push(anon2),
SetObject(Symbol.RESULT),
MethodCall(ifTrue$IfFalse$),

Push(Symbol.RESULT), // contextualize anon function
Push(Symbol.CURRENT_CONTEXT),
ContextCall(Symbol.CONTEXTUALIZE_FUNCTION),

CallFunctionInSlot(Symbol.RESULT) // call anon function

The two anonymous functions are implemented as follows. The first simply returns 1

Push(one),
ContextCall(Symbol.SET_RESULT)

While the second does the more complex recursive calling of fibonacci to calculate the new number

Push(one), // result = index - 1
SetObject(index),
MethodCall(minus$),

Push(Symbol.RESULT), // result = fibonacci( result )
ContextCall(fibonacci$),

Push(Symbol.RESULT), // temp1 = result
ContextCall(temp1Setter),

Push(two), // result = index - 2
SetObject(index),
MethodCall(minus$),

Push(Symbol.RESULT), // result = fibonacci( result )
ContextCall(fibonacci$),

Push(Symbol.RESULT), // result = temp1 + result
SetObject(temp1),
MethodCall(plus$)

Note the extensive use of result. There are lots of globals in this example (one, two, fibonacci$) and lots of symbols. This approach sure burns through the symbols.

Everything is a function call - you can only use a slot by calling it's accessor. So when slots are added to an object they usually have accessors created for them.

The native functions at the moment are; plus:, minus:, isLessThan: (and behind the scenes; getters and setters)

Also note CallFunctionInSlot, which is nasty. This is because I need to invoke a function that is stored in a slot (doesn't have to be a slot...), but by calling the slot accessor the function is returned, not executed. I believe there is no way around this.

From this point I have identified a few things to do next. I really need to neaten things up; move all the 'native' functions into the VM object, implement Number properly as a prototype and factory. But more fundamentally I need to
  • introduce a PushValue opcode so I can create numbers (so the values of one and two above can be created within the fibonacci program)
  • find some way to define a function within an existing function (so the above anon functions can be created within the definition of the fibonacci program, rather than externally to it)
  • think about exceptions
Patently there's a lot more than just that, but it's a good start

Update: I've just eliminated one of those opcodes (the one I didn't like called 'CallFunctionInSlot') with the oh so obvious function 'execute' on Function. So now you can execute a function that is sitting in a slot.

Down to 3(or 4) opcodes!

Thursday, September 24, 2009

How small is your instruction set?

[these posts are quite short because I typically have a little baby crying for my attention, see my 'Big T, little t' blog]

Let's make some big assumptions and work from there. While these assumptions are big, they give me a starting point to do some small, incremental work. Which is much easier than big upfront stuff - I am an Agilist if you weren't aware.

Imagine that your opcodes are compiled for you. Imagine that your opcodes are executing inside an image that already contains all the objects you want, for example any constants. Imagine that your dependency injection/capability interface provides you with a 'VM' object that can do things like addition and so on with a simple method invocation.

Given those assumptions, how many different opcodes do you need? Not many!

In fact, all you _need_ to do is be able to send a message. Because your message may have variable numbers of parameters, you may need another opcode to build up those parameters. So that's two. I'll add a third later, but for now that is my minimal VM implementation.

It's kinda like RISC vs CISC. I have 2 instructions, and rely on the JITer to make everything work hella fast. You (squirrelfish for example) have a bunch of instructions so you can interpret your opcodes fast. I'm pretty sure I'll end up down the squirrelfish route, but for now I'm more interested in implementing this minimal instruction set.

Here's what 'add' looks like on a number

function add: to
push this
push to
call vm, add_number, null

where 'null' is the target of the result

which may be used as follows

a.result = (1 + 2) / 3 // I quite like this example...

push 2
call 1, add, temp_1
push 3
call temp_1, divide, temp_2
push temp_2
call a, result, null

boom, as they say, shanka. Now you might wonder where all those variables came from. Well they are local variables/constants within the already compiled function - I did mention that everything was compiled and all the objects we need exist...

This makes for a nice, tiny VM that I can write very quickly and get going very quickly. I love small goals and lots of rewards!

The only real complication on top of the two opcodes are the actual data structures involved.

I'll need a general object, and then function objects and number objects (to start with). I'll need to create, in the 'image' (which will be constructed in the test code) the constants 1,2,3 and their parent, the object a and the function that is being executed. Quite a bit of work, but not as much as if I created an exhaustive instruction set.


Above I mentioned another instruction. Because of the Newspeak/NewtonScript inspiration for this VM, I want to be able to send messages to an unknown recipient. To be perfectly honest, this is almost happening when accessing constants and temps above, but things would get very cyclical if I couldn't do that.

So my new instruction is

find_and_call message_name, target

which will first look inside all closures/activation objects (ie the context where the instruction was defined) and then look at the current object's parents.

Going slowly, what a surprise

Spent all last Friday trying to get Bochs going on my Mac (running leopard) but with no luck. One of those annoying situations where I knew just enough to keep trying stuff, but not enough to actually get the damn thing working.

I'm starting to get seriously annoyed with a lack of workstation. My mac is 4 years old (Mac Mini, PPC 1.25 GHz, 1MB) and not really up to scratch.

My ubuntu laptop is a netbook and not really ideal for development either - but mainly because the monitor from my Mac Mini goes spectacularly over-saturated when I plug it in.

The work I'm doing isn't that taxing on the CPU - it's not big, it's not processor intensive, in fact it's supposed to be able to work fast in resource constrained circumstances (what?! A VM working fast in resource constrained circumstances, are you mad? Maybe ;-) Consequently, it's quite frustrating to not be able to do any work!

(Maybe a more accurate assessment would be 'it's quite frustrating to be given easy excuses to not be able to do any work' :( )

Tuesday, September 15, 2009

Apple outs GCD

Ars has a writeup about the recent open sourcing of GCD, Apple's new concurrency/multi-threading framework. My naive understanding of it is that it basically makes using a bunch of queues very simple, so reduces threading to a series of consumer/producer problems.

If you think about it, erlang is kind of like that, with each node being a queue with behaviour.

Anyhoo, the design of GCD is close to how I imagine the threading in DynamOs working.




Friday, September 11, 2009

Aliens

So, we have these bytecodes, and we want to write a Device Driver (OMG!), how can we do that if we can't address memory, let alone specify bit level structures?

By using aliens a la Smalltalk/Newspeak. The basic idea is that the VM provides an object that has functions that can do this kind of stuff. Now, because we are using a capability based security system we simply don't provide this object to most software in the system, and any we do will have to be approved by the user (with appropriate 'please, for the love of god, don't click the OK button').

This means that only, for example, drivers can access the alien, so only drivers can access memory. This is a lot less code that has to be trusted to do this stuff, a lot less code to be reviewed, and a lot less code to have bugs in it!

It does mean that drivers will have some serious time critical problems; if bytecodes are being interpreted, and memory is being accessed directly _via_ another object, then we have some serious performance issues.

That is one of the big challenges of this system, and I have some ideas about tackling them, but we'll get to that :)

Dynamic

Well, this one is easy :)

Just check out all the stuff about dynamic vs static around the web. To summarize my feeling, have a read of steve yegge's articles on Javascript/dynamic languages.

The point is, you can have very rapid development, fewer API worries (they can evolve more easily than static, 'binary compatible', interfaces) and these days can go mighty fast (just look at the JS interpreters these days).

And because you are using 'bytecodes' (should use a different term, but people will understand that one) that simply DON'T ALLOW you to address memory directly then you can do a whole range of nasty/buggy stuff.

At this stage I don't know what language will be the default (self vs javascript...) but I do know that I want to use prototypical inheritance - you can simulate all other esoteric inheritance types I know of within prototypical inheritance. I'm not sure if I want to have multiple traits, like in Self, but it would be nice.

Capability based security

I'm not a security man at all, but the simplicity of Capability based security is very appealing to me. There have been various influences upon me in this area, but the basic ones are
  • the E language - a JVM language created to experiment with capabilities
  • Newspeak - a new smalltalk/self language with all dependencies having to passed in
  • Singularity - MS's experimental OS written in C#
Of course, while searching for this stuff I've just come across Capros, which seems to do similar things to what I envision (and is of course written by people who know what they're doing). It doesn't, however, use a dynamic, OO language, is written in C, and hence is open to people doing nasty pointer stuff.

The basic concept is that you're program is given capabilities that it can use, and if it doesn't have a particular capability (such as writing a file) then it can't do that thing. The OS developers don't have to worry about checking if a function call is allowed - it simply can't be made. This should simplify development of secure software.

It also makes testing (because the whole OS is dependency injected) easier, and provides default sandboxing.

Welcome to DynamOS

Well, not quite yet.

I am taking 6 months off work to look after my son (see Big T little t) and have arranged with my wife that I have Fridays to devote to all things nerdy. This is to keep my mind going for when I go back to work, but also to give me an excuse to work on a project I've been thinking about for a while.

That project is DynamOS. As the name hopefully suggests it is a DYNAMic Operating System.

The elevator pitch is as follows;

Use a dynamic language virtual machine as the kernel of an operating system to provide capability based security and an efficient micro-kernel.


Which dynamic language? I don't know yet, but SquirrelFish is attractive. Only the bytecode of the VM will be fundamental; whatever language is compiled into that bytecode doesn't really worry me.

And where do I start writing an OS? Well there's a tutorial on GeekOS, that I hope is great, that I will be following.

What I hope to do with that tutorial is write unit tests for it... Using BOCHS as a machine to run it in and, hopefully, test assertions against. (probably use rspec to run the actual tests)

Then I plan to implement the minimum VM to get those tests working using a bytecode program instead of C.

At that point I should be able to type stuff on the keyboard and have it appear on the screen! amazing hey?