Thursday, September 24, 2009

How small is your instruction set?

[these posts are quite short because I typically have a little baby crying for my attention, see my 'Big T, little t' blog]

Let's make some big assumptions and work from there. While these assumptions are big, they give me a starting point to do some small, incremental work. Which is much easier than big upfront stuff - I am an Agilist if you weren't aware.

Imagine that your opcodes are compiled for you. Imagine that your opcodes are executing inside an image that already contains all the objects you want, for example any constants. Imagine that your dependency injection/capability interface provides you with a 'VM' object that can do things like addition and so on with a simple method invocation.

Given those assumptions, how many different opcodes do you need? Not many!

In fact, all you _need_ to do is be able to send a message. Because your message may have variable numbers of parameters, you may need another opcode to build up those parameters. So that's two. I'll add a third later, but for now that is my minimal VM implementation.

It's kinda like RISC vs CISC. I have 2 instructions, and rely on the JITer to make everything work hella fast. You (squirrelfish for example) have a bunch of instructions so you can interpret your opcodes fast. I'm pretty sure I'll end up down the squirrelfish route, but for now I'm more interested in implementing this minimal instruction set.

Here's what 'add' looks like on a number

function add: to
push this
push to
call vm, add_number, null

where 'null' is the target of the result

which may be used as follows

a.result = (1 + 2) / 3 // I quite like this example...

push 2
call 1, add, temp_1
push 3
call temp_1, divide, temp_2
push temp_2
call a, result, null

boom, as they say, shanka. Now you might wonder where all those variables came from. Well they are local variables/constants within the already compiled function - I did mention that everything was compiled and all the objects we need exist...

This makes for a nice, tiny VM that I can write very quickly and get going very quickly. I love small goals and lots of rewards!

The only real complication on top of the two opcodes are the actual data structures involved.

I'll need a general object, and then function objects and number objects (to start with). I'll need to create, in the 'image' (which will be constructed in the test code) the constants 1,2,3 and their parent, the object a and the function that is being executed. Quite a bit of work, but not as much as if I created an exhaustive instruction set.


Above I mentioned another instruction. Because of the Newspeak/NewtonScript inspiration for this VM, I want to be able to send messages to an unknown recipient. To be perfectly honest, this is almost happening when accessing constants and temps above, but things would get very cyclical if I couldn't do that.

So my new instruction is

find_and_call message_name, target

which will first look inside all closures/activation objects (ie the context where the instruction was defined) and then look at the current object's parents.

Going slowly, what a surprise

Spent all last Friday trying to get Bochs going on my Mac (running leopard) but with no luck. One of those annoying situations where I knew just enough to keep trying stuff, but not enough to actually get the damn thing working.

I'm starting to get seriously annoyed with a lack of workstation. My mac is 4 years old (Mac Mini, PPC 1.25 GHz, 1MB) and not really up to scratch.

My ubuntu laptop is a netbook and not really ideal for development either - but mainly because the monitor from my Mac Mini goes spectacularly over-saturated when I plug it in.

The work I'm doing isn't that taxing on the CPU - it's not big, it's not processor intensive, in fact it's supposed to be able to work fast in resource constrained circumstances (what?! A VM working fast in resource constrained circumstances, are you mad? Maybe ;-) Consequently, it's quite frustrating to not be able to do any work!

(Maybe a more accurate assessment would be 'it's quite frustrating to be given easy excuses to not be able to do any work' :( )

Tuesday, September 15, 2009

Apple outs GCD

Ars has a writeup about the recent open sourcing of GCD, Apple's new concurrency/multi-threading framework. My naive understanding of it is that it basically makes using a bunch of queues very simple, so reduces threading to a series of consumer/producer problems.

If you think about it, erlang is kind of like that, with each node being a queue with behaviour.

Anyhoo, the design of GCD is close to how I imagine the threading in DynamOs working.




Friday, September 11, 2009

Aliens

So, we have these bytecodes, and we want to write a Device Driver (OMG!), how can we do that if we can't address memory, let alone specify bit level structures?

By using aliens a la Smalltalk/Newspeak. The basic idea is that the VM provides an object that has functions that can do this kind of stuff. Now, because we are using a capability based security system we simply don't provide this object to most software in the system, and any we do will have to be approved by the user (with appropriate 'please, for the love of god, don't click the OK button').

This means that only, for example, drivers can access the alien, so only drivers can access memory. This is a lot less code that has to be trusted to do this stuff, a lot less code to be reviewed, and a lot less code to have bugs in it!

It does mean that drivers will have some serious time critical problems; if bytecodes are being interpreted, and memory is being accessed directly _via_ another object, then we have some serious performance issues.

That is one of the big challenges of this system, and I have some ideas about tackling them, but we'll get to that :)

Dynamic

Well, this one is easy :)

Just check out all the stuff about dynamic vs static around the web. To summarize my feeling, have a read of steve yegge's articles on Javascript/dynamic languages.

The point is, you can have very rapid development, fewer API worries (they can evolve more easily than static, 'binary compatible', interfaces) and these days can go mighty fast (just look at the JS interpreters these days).

And because you are using 'bytecodes' (should use a different term, but people will understand that one) that simply DON'T ALLOW you to address memory directly then you can do a whole range of nasty/buggy stuff.

At this stage I don't know what language will be the default (self vs javascript...) but I do know that I want to use prototypical inheritance - you can simulate all other esoteric inheritance types I know of within prototypical inheritance. I'm not sure if I want to have multiple traits, like in Self, but it would be nice.

Capability based security

I'm not a security man at all, but the simplicity of Capability based security is very appealing to me. There have been various influences upon me in this area, but the basic ones are
  • the E language - a JVM language created to experiment with capabilities
  • Newspeak - a new smalltalk/self language with all dependencies having to passed in
  • Singularity - MS's experimental OS written in C#
Of course, while searching for this stuff I've just come across Capros, which seems to do similar things to what I envision (and is of course written by people who know what they're doing). It doesn't, however, use a dynamic, OO language, is written in C, and hence is open to people doing nasty pointer stuff.

The basic concept is that you're program is given capabilities that it can use, and if it doesn't have a particular capability (such as writing a file) then it can't do that thing. The OS developers don't have to worry about checking if a function call is allowed - it simply can't be made. This should simplify development of secure software.

It also makes testing (because the whole OS is dependency injected) easier, and provides default sandboxing.

Welcome to DynamOS

Well, not quite yet.

I am taking 6 months off work to look after my son (see Big T little t) and have arranged with my wife that I have Fridays to devote to all things nerdy. This is to keep my mind going for when I go back to work, but also to give me an excuse to work on a project I've been thinking about for a while.

That project is DynamOS. As the name hopefully suggests it is a DYNAMic Operating System.

The elevator pitch is as follows;

Use a dynamic language virtual machine as the kernel of an operating system to provide capability based security and an efficient micro-kernel.


Which dynamic language? I don't know yet, but SquirrelFish is attractive. Only the bytecode of the VM will be fundamental; whatever language is compiled into that bytecode doesn't really worry me.

And where do I start writing an OS? Well there's a tutorial on GeekOS, that I hope is great, that I will be following.

What I hope to do with that tutorial is write unit tests for it... Using BOCHS as a machine to run it in and, hopefully, test assertions against. (probably use rspec to run the actual tests)

Then I plan to implement the minimum VM to get those tests working using a bytecode program instead of C.

At that point I should be able to type stuff on the keyboard and have it appear on the screen! amazing hey?