So I've been grappling with the problem of security with the objects in DynamOS - how does one ensure that packages can't modify the objects passed into them, with global ramifications? Well the crux of the problem is outlined by Gilad in Objects aren't Hashes.
Now I have been treating objects as hashes, and in Javascript and so on this works fine, but it means that you can do all kinds of nasty hacks right into the most fundamental classes - and these hacks affect the entire VM. Gilad points out that an object really needs to have some kind of encapsulation - and a self reference.
The encapsulation means that there are parts of the object that can't be accessed by the outside world. There are some aspects that can't be manipulated by the outside world unless you have special permission (eg you have a reflection/mirror capability or the object exposes those abilities to you).
DynamOS has Slots, which are the internal attributes of an object, and Functions, which are the interface to the rest of the world. I have been grappling with how to have the functions refer to slots that don't have public accessors (eg you are writing the public accessor...). Because everything is a message send, what message can you send since all messages are in the functions list...
Now Gilad points out why this doesn't work - because there must be a separation between the interface and the implementation/data abstraction. In Newspeak he achieves this by having classes, and within a class you can send a message that refers to the internal slot because that slot is within your context.
So that lead me to an 'aha' moment; maybe I could use the function definition in DynamOS as a kind of object definition, but with a different 'setter' mechanism.
In a function, when you set a value, dynamos will search for the slot by symbol, and when it finds it, it will set it the the specified value. But in an object we want to do any setting on the current object. In this way we have prototypical inheritance.
I hope (and I still have to think a bit harder about this) that by simply changing this mechanism that I can achieve Gilads separation with no change to my opcodes, just a small addition to the VM object (createObjectWith: opcodes andArguments: arguments).
This doesn't stop someone from adding a 'createInstance' method within their object - and hence creating a class. It also doesn't stop someone from exposing the slot CRUD mechanism. But it does mean that objects can't ordinarily have their basic makeup changed by anyone, willy nilly.
Monday, November 23, 2009
Thursday, November 19, 2009
One thing I know about parsers...
is that my many failed attempts at writing them have resulted in the limited knowledge of what's easy, and what's not. Hence I have finished the parser for my DynamOS C, because I specifically made it an easy language to parse.
The compiler should be pretty easy now that I have the actual definition. And thinking about the implementation of this work in DynamOS C itself is raising lots of interesting questions - I wonder if I'll end up with some more opcodes.
Oh, and it adds another step to my plan; a unit testing framework. I couldn't have written the parser without unit tests, and from now on I will want to unit test my work.
UPDATE : actually, I realise what I have forgotten - and neither of these were in the numbers library so I think fair enough; Constants, Closures and Comments. The three Cs. I don't think they'll take much though. Just have to find time to do it (baby just woke up damn it!)
The compiler should be pretty easy now that I have the actual definition. And thinking about the implementation of this work in DynamOS C itself is raising lots of interesting questions - I wonder if I'll end up with some more opcodes.
Oh, and it adds another step to my plan; a unit testing framework. I couldn't have written the parser without unit tests, and from now on I will want to unit test my work.
UPDATE : actually, I realise what I have forgotten - and neither of these were in the numbers library so I think fair enough; Constants, Closures and Comments. The three Cs. I don't think they'll take much though. Just have to find time to do it (baby just woke up damn it!)
Saturday, November 14, 2009
DynamOS 'C'
My next step is to implement the DynamOS equivalent of 'C'. Sometimes C is considered to be almost assembly language - it's so close to the metal. Well I want a language that is just as close to the Virtual metal :)
I'm thinking something like this
some nice and simple brackets to make all the function calls really obvious. With just a few keywords (such as 'function' and 'open' and 'local')
After that I want to try implementing the compiler in DynamOS - which will require strings!! And that can then lead to a command line. Once we have the command line, well I think that will be enough to get moving with the C version.
I'm thinking something like this
(function numberFactory: vm
(local numberPrototype)
(local factory)
numberPrototype: newObject
(open numberPrototype
(function plus: number
result: (vm add: number to: this)
)
(function minus: number
result: (vm subtract: number from: this)
)
(function isLessThan: number
result: (vm value: this isLessThan: number)
)
)
factory: newObject
(open factory
(function numberFrom: value
value parent: numberPrototype
)
)
result: factory
)
some nice and simple brackets to make all the function calls really obvious. With just a few keywords (such as 'function' and 'open' and 'local')
After that I want to try implementing the compiler in DynamOS - which will require strings!! And that can then lead to a command line. Once we have the command line, well I think that will be enough to get moving with the C version.
Well it makes sense to me
A fully self-contained library that provides the fibonacci function, all in Dynamos Assembly. The only parameters to this library are numberFactory and listFactory.
This manages to do a lot of what is needed, the hairy stuff left is all about inheritance and especially getters/setters/slots.
The problem with the slots is that, because everything is supposed to be a message send, how do you ever actually update slots? At the moment, when a slot (always in the context so far) is created, the slot gets a default getter and setter. But how does one override the default getter/setter and still actually update the slot?
Also the Context setter is different from the object setter; the context setter updates the contents of the slot where it is defined, where as the object setter updates the contents of the current object.
I guess I'll probably end up with an addSlot: function, with some variants that provide different getters/setters. Then aliasing could be used to save the setter somewhere.
Still thinking.
Debug("creating fibonacci library", numberFactory),
CreateValueObject(interpreter, 1), // create constant for '1'
Push(Symbol.RESULT),
SetObject(numberFactory),
FunctionCall(Symbol.get("numberFrom:")),
Push(Symbol.RESULT),
FunctionCall(Symbol.get("one:")),
CreateValueObject(interpreter, 2), // create constant for '2'
Push(Symbol.RESULT),
SetObject(numberFactory),
FunctionCall(Symbol.get("numberFrom:")),
Push(Symbol.RESULT),
FunctionCall(Symbol.get("two:")),
// create second anonymous function
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into arguments
FunctionCall(argumentList.toSetterSymbol()),
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into locals
FunctionCall(locals.toSetterSymbol()),
PushSymbol(temp1), // add 'temp1' local
SetObject(argumentList),
FunctionCall(Symbol.get("add:")),
StartOpCodeList(),
Push(one), // result = index - 1
SetObject(index),
FunctionCall(minus$),
Push(Symbol.RESULT), // result = fibonacci( result )
FunctionCall(fibonacci$),
Push(Symbol.RESULT), // temp1 = result
FunctionCall(temp1Setter), // temp1 = result
Push(two), // result = index - 2
SetObject(index),
FunctionCall(minus$),
Push(Symbol.RESULT), // result = fibonacci( result )
FunctionCall(fibonacci$),
Debug("left side", temp1),
Debug("right side", Symbol.RESULT),
Push(Symbol.RESULT), // temp1 = temp1 + result
SetObject(temp1),
FunctionCall(plus$),
EndOpCodeList(),
Debug("got opcodes", Symbol.RESULT),
Push(argumentList), // create anon2 function
Push(locals),
Push(Symbol.RESULT),
FunctionCall(Symbol.CREATE_FUNCTION_WITH_ARGUMENTS_$_LOCALS_$_OPCODES_$),
Debug("created", Symbol.RESULT),
Push(Symbol.RESULT),
FunctionCall(anon2.toSetterSymbol()),
// Create fibonacci function
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into arguments
FunctionCall(argumentList.toSetterSymbol()),
PushSymbol(index), // add 'index' parameter
SetObject(argumentList),
FunctionCall(Symbol.get("add:")),
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into locals
FunctionCall(locals.toSetterSymbol()),
StartOpCodeList(),
// create first anonymous function
// mainly here to make sure the nesting of op code lists works, otherwise would be in top context
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into arguments
FunctionCall(argumentList.toSetterSymbol()),
SetObject(listFactory), // empty symbol list
FunctionCall(Symbol.get("newList")),
Push(Symbol.RESULT), // copy into locals
FunctionCall(locals.toSetterSymbol()),
StartOpCodeList(),
new OpCode.Push(one),
new OpCode.FunctionCall(Symbol.RESULT_$),
Debug("returning (1) ", Symbol.RESULT),
EndOpCodeList(),
Debug("got opcodes", Symbol.RESULT),
Push(argumentList), // create anon1 function
Push(locals),
Push(Symbol.RESULT),
FunctionCall(Symbol.CREATE_FUNCTION_WITH_ARGUMENTS_$_LOCALS_$_OPCODES_$),
Debug("created", Symbol.RESULT),
Push(Symbol.RESULT),
FunctionCall(anon1.toSetterSymbol()),
// actual fibonacci function code
Debug("in fibonacci with argument", index),
Debug("******************************", numberFactory),
Push(two), // result = index isLessThan: two
SetObject(index),
FunctionCall(isLessThan$),
Push(anon1), // result = result ifTrue: [anon1] ifFalse: [anon2]
Push(anon2),
SetObject(Symbol.RESULT),
FunctionCall(ifTrue$IfFalse$),
Debug("true or false?", Symbol.RESULT),
Push(Symbol.RESULT), // contextualize anon function
Push(Symbol.CURRENT_CONTEXT),
FunctionCall(Symbol.CONTEXTUALIZE_FUNCTION_$_IN_$),
Debug("contextualized", Symbol.RESULT),
SetObject(Symbol.RESULT), // call anon function
FunctionCall(Symbol.EXECUTE),
Debug("executed function", Symbol.RESULT),
EndOpCodeList(),
Debug("got opcodes", Symbol.RESULT),
Push(argumentList), // create fibonacci function
Push(locals),
Push(Symbol.RESULT),
FunctionCall(Symbol.CREATE_FUNCTION_WITH_ARGUMENTS_$_LOCALS_$_OPCODES_$),
Debug("created", Symbol.RESULT),
Push(Symbol.RESULT), // contextualise the newly created function
Push(Symbol.CURRENT_CONTEXT),
FunctionCall(Symbol.CONTEXTUALIZE_FUNCTION_$_IN_$),
Debug("contextualized", Symbol.RESULT),
Push(Symbol.RESULT), // store fibonacci function temp
FunctionCall(temp1.toSetterSymbol()),
PushSymbol(fibonacci$), // save fibonacci to context
Push(temp1),
FunctionCall(Symbol.SET_FUNCTION_$_TO_$),
FunctionCall(Symbol.NEW_OBJECT), // create a new, empy object, move into fibonacciLibrarySlot
Push(Symbol.RESULT),
FunctionCall(fibonacciLibrarySlot.toSetterSymbol()),
PushSymbol(fibonacci$), // and add to the fibonacci library
Push(temp1),
SetObject(fibonacciLibrarySlot),
FunctionCall(Symbol.SET_FUNCTION_$_TO_$),
Debug("created fibonacci library", fibonacciLibrarySlot),
Push(fibonacciLibrarySlot), // return the library
FunctionCall(Symbol.RESULT_$)
This manages to do a lot of what is needed, the hairy stuff left is all about inheritance and especially getters/setters/slots.
The problem with the slots is that, because everything is supposed to be a message send, how do you ever actually update slots? At the moment, when a slot (always in the context so far) is created, the slot gets a default getter and setter. But how does one override the default getter/setter and still actually update the slot?
Also the Context setter is different from the object setter; the context setter updates the contents of the slot where it is defined, where as the object setter updates the contents of the current object.
I guess I'll probably end up with an addSlot: function, with some variants that provide different getters/setters. Then aliasing could be used to save the setter somewhere.
Still thinking.
Sunday, November 8, 2009
Immutable objects
or, more importantly, prototypes.
I'm grappling with the idea of moving objects between contexts. There are security issues (don't want people to change the root prototypes for all contexts within the whole OS - like smalltalk does) and performance issues (if we're using PIC then we want to use the same parents for as much as possible - so the PIC can do quick comparisons to determine which branch to take).
The only thing I've come up with is some sort of immutable object. If the prototype for an object is immutable then the parent id for that object is never going to change, and you can't update the prototype to cause security problems.
This fixes the above two concerns, at the expense of not being able to add your own useful functions to existing objects (like rails does to pretty much everything...). You can manually do this, at some speed expense, by using an intermediate prototype of your own making - all new leaves extend from this, with it's new functions.
This would break all PIC for inter-context communication, but that's your choice.
I'm grappling with the idea of moving objects between contexts. There are security issues (don't want people to change the root prototypes for all contexts within the whole OS - like smalltalk does) and performance issues (if we're using PIC then we want to use the same parents for as much as possible - so the PIC can do quick comparisons to determine which branch to take).
The only thing I've come up with is some sort of immutable object. If the prototype for an object is immutable then the parent id for that object is never going to change, and you can't update the prototype to cause security problems.
This fixes the above two concerns, at the expense of not being able to add your own useful functions to existing objects (like rails does to pretty much everything...). You can manually do this, at some speed expense, by using an intermediate prototype of your own making - all new leaves extend from this, with it's new functions.
This would break all PIC for inter-context communication, but that's your choice.
Friday, November 6, 2009
Actually, number one priority
is getting the environment set up within the VM. getting Object to be the root, and being initialised with it's useful functions (like parent: and addSlot:)
Then creating the Number functionality as a prototype. Because all the native stuff has moved into the VMObject, this can now be solely opcodes!
And all this will quickly require the ability to create functions. But not immediately.
Then creating the Number functionality as a prototype. Because all the native stuff has moved into the VMObject, this can now be solely opcodes!
And all this will quickly require the ability to create functions. But not immediately.
How small is your VM?
We all use intel. Boo. But that's just the way it is. (OK so some use ARM, but I don't think what I'm about to say will break ARM at all)
I'm developing on an ATOM (and about to tear my hair out - netbeans is a write-off, but eclipse is ok) and if you look at the tech specs for the ATOM you'll see that it has a 24k level one instruction cache. The ATOM's big brothers have 32k - so not that much more.
If one could keep the VM down to less than this, it would be hella fast. If one could keep the VM and important native calls to less than this, it would be hella hella fast.
And another advantage would be the relative size of the code to debug etc would be small as well.
This throws up an interesting question though - how would JITing hurt this? Whenever a JITted call was made, the level one cache would be destroyed, so going to and returning from the JITted code would be costly; heuristics would be needed to see just how costly.
So, is it possible to write the VM in 24k?
I simply don't know, not having written intel/c code in a long time. I feel happy that it is. Keeping it simple by making sure the number of different opcodes is kept to a minimum (see the last post) may result in longer programmes, but those programs will hopefully execute faster.
Once the kernel is written, if there is any room left we can add extra opcodes to make things faster - but only after profiling of course!
The final point to make is that we now do everything in the data cache (level one is still only 24/32k) and our JIT code essentially becomes a software replacement for the Intel branch predictors.
I'm developing on an ATOM (and about to tear my hair out - netbeans is a write-off, but eclipse is ok) and if you look at the tech specs for the ATOM you'll see that it has a 24k level one instruction cache. The ATOM's big brothers have 32k - so not that much more.
If one could keep the VM down to less than this, it would be hella fast. If one could keep the VM and important native calls to less than this, it would be hella hella fast.
And another advantage would be the relative size of the code to debug etc would be small as well.
This throws up an interesting question though - how would JITing hurt this? Whenever a JITted call was made, the level one cache would be destroyed, so going to and returning from the JITted code would be costly; heuristics would be needed to see just how costly.
So, is it possible to write the VM in 24k?
I simply don't know, not having written intel/c code in a long time. I feel happy that it is. Keeping it simple by making sure the number of different opcodes is kept to a minimum (see the last post) may result in longer programmes, but those programs will hopefully execute faster.
Once the kernel is written, if there is any room left we can add extra opcodes to make things faster - but only after profiling of course!
The final point to make is that we now do everything in the data cache (level one is still only 24/32k) and our JIT code essentially becomes a software replacement for the Intel branch predictors.
Yes - 4 (or 5) opcodes can implement Fibonacci
I've managed to implement fibonacci in my own dynamic VM, using 5 opcodes (could do it in 4). It was a real discipline to have to keep those codes down, and I tried very hard to only do enough for the fibonacci function itself.
I am quite happy with the results. I've learnt quite a bit about what is needed, and what isn't. I've tried to make sure that things stay relatively neat, and what that costs at even this early stage.
It relies heavily upon closures to get stuff done, since there is no such thing as 'if'. At this stage everything that is needed is prebuilt - even the object representing the value 1.
So here is the code, as such. Firstly, a 'smalltalk' like description of how it works
Next, the opcode translations. The actual fibonacci function first
The two anonymous functions are implemented as follows. The first simply returns 1
While the second does the more complex recursive calling of fibonacci to calculate the new number
Note the extensive use of result. There are lots of globals in this example (one, two, fibonacci$) and lots of symbols. This approach sure burns through the symbols.
Everything is a function call - you can only use a slot by calling it's accessor. So when slots are added to an object they usually have accessors created for them.
The native functions at the moment are; plus:, minus:, isLessThan: (and behind the scenes; getters and setters)
Also note CallFunctionInSlot, which is nasty. This is because I need to invoke a function that is stored in a slot (doesn't have to be a slot...), but by calling the slot accessor the function is returned, not executed. I believe there is no way around this.
From this point I have identified a few things to do next. I really need to neaten things up; move all the 'native' functions into the VM object, implement Number properly as a prototype and factory. But more fundamentally I need to
I am quite happy with the results. I've learnt quite a bit about what is needed, and what isn't. I've tried to make sure that things stay relatively neat, and what that costs at even this early stage.
It relies heavily upon closures to get stuff done, since there is no such thing as 'if'. At this stage everything that is needed is prebuilt - even the object representing the value 1.
So here is the code, as such. Firstly, a 'smalltalk' like description of how it works
function fibonacci: index
(index isLessThan: 2)
ifTrue:
[ ^ index ]
ifFalse:
[ ^ (fibonacci: (index minus: 1)) plus: (fibonacci: (index minus: 2)) ]
Next, the opcode translations. The actual fibonacci function first
Push(two), // result = index isLessThan: two
SetObject(index),
MethodCall(isLessThan$),
Push(anon1), // result = result ifTrue: anon1 ifFalse: anon2
Push(anon2),
SetObject(Symbol.RESULT),
MethodCall(ifTrue$IfFalse$),
Push(Symbol.RESULT), // contextualize anon function
Push(Symbol.CURRENT_CONTEXT),
ContextCall(Symbol.CONTEXTUALIZE_FUNCTION),
CallFunctionInSlot(Symbol.RESULT) // call anon function
The two anonymous functions are implemented as follows. The first simply returns 1
Push(one),
ContextCall(Symbol.SET_RESULT)
While the second does the more complex recursive calling of fibonacci to calculate the new number
Push(one), // result = index - 1
SetObject(index),
MethodCall(minus$),
Push(Symbol.RESULT), // result = fibonacci( result )
ContextCall(fibonacci$),
Push(Symbol.RESULT), // temp1 = result
ContextCall(temp1Setter),
Push(two), // result = index - 2
SetObject(index),
MethodCall(minus$),
Push(Symbol.RESULT), // result = fibonacci( result )
ContextCall(fibonacci$),
Push(Symbol.RESULT), // result = temp1 + result
SetObject(temp1),
MethodCall(plus$)
Note the extensive use of result. There are lots of globals in this example (one, two, fibonacci$) and lots of symbols. This approach sure burns through the symbols.
Everything is a function call - you can only use a slot by calling it's accessor. So when slots are added to an object they usually have accessors created for them.
The native functions at the moment are; plus:, minus:, isLessThan: (and behind the scenes; getters and setters)
Also note CallFunctionInSlot, which is nasty. This is because I need to invoke a function that is stored in a slot (doesn't have to be a slot...), but by calling the slot accessor the function is returned, not executed. I believe there is no way around this.
From this point I have identified a few things to do next. I really need to neaten things up; move all the 'native' functions into the VM object, implement Number properly as a prototype and factory. But more fundamentally I need to
- introduce a PushValue opcode so I can create numbers (so the values of one and two above can be created within the fibonacci program)
- find some way to define a function within an existing function (so the above anon functions can be created within the definition of the fibonacci program, rather than externally to it)
- think about exceptions
Update: I've just eliminated one of those opcodes (the one I didn't like called 'CallFunctionInSlot') with the oh so obvious function 'execute' on Function. So now you can execute a function that is sitting in a slot.
Down to 3(or 4) opcodes!
Subscribe to:
Posts (Atom)