Thursday, July 16, 2009

A New Dynamic Dispatch Algorithm?

I'm playing around a little with dynamic dispatch algorithms in a dynamically typed language.

An idea I had was for each function to create a small dispatch table, where each function slot also was a tree node.

typedef struct Slot
{
SEL selector;
IMP function;
struct Slot* left;
struct Slot* right;
} Slot;


With the class looking like this:

typedef struct Class
{
// ... Other stuff
Slot slot[FUNCTION_SLOTS];
// ... More stuff
} Class;


For dispatch we can now do:

Slot* slotPtr = object->class->slot + selector & SEL_HASH;
do
{
if (slotPtr->selector == selector) return slotPtr->function(object, selector);
slotPtr = (slotPtr->selector > selector) ? slotPtr->left : slotPtr->right;
} while (slotPtr);
// ... execute MethodNotFound:


This means a worst case O(log(N)) lookup where N is the maximum number of methods hashed to the same slot.

My tests seem to indicate that it is surprisingly fast, at least when inlining the dispatch method:

Best case: 207 million sends/s.
Worst case 6<N<11: 138 million sends/s
Direct dispatch table call: 234 million sends/s.

I'm not sure how reliable these benchmarks are, but it would be interesting to compare these results to the algorithm in the Objective-C runtime.

The downside of course is that all classes need to keep all methods they respond to in the dispatch table, as opposed to only the methods they override.

[Edit:]
Here are results of microbenchmarks on the Obj-C runtime (smaller is better):

Virtual call 1.00
Best case inlined call 1.13
Best case non-inlined 1.77
Best case lookup 1.82
Worst case inlined call 1.70
Worst case non-inlined 2.30
Worst case lookup 2.44
Objective-C OS X 10.5 1.77

Note that the Objective-C result gives the best case for the runtime.

Friday, May 15, 2009

How should the syntax look in a c-like language with closures?

I'm trying to see if I can create a nice syntax for a language that looks as close to C/C++/Java as possible while allowing syntax-like constructs to be defined by the programmer.

It turns out it's not as easy as it sounds.

The problem lies in defining multi-part function. We start out easy: how do you define the following code as functions?

if (b)
{
doWork();
}
else
{
doSomeOtherWork();
}

A stab at this particular syntax-turned-function might look like this:

void if_new(boolean b, &statement) else(&elsestatement)
{
if (b)
{
statement();
return;
}
elsestatement();
}

That looks mighty fine (let's assume the compiler knows that our syntax with the "&statement" means that it should take a block and turn it into a function), but what about the compiler?


if_new(b) { doSomething(); }
else { doSomethingElse(); }

fooFun() { foo(); }

barFun() { bar(); }

In this case, how does the compiler know that the first line is a multi-part function and the second is not? I.e. how does it know that I want the function if_new(b)else(), but not fooFun()barFun()?

One solution would be to impose a terminating ";" tax on every {}, like this:

if_new(b) { doSomething(); }
else { doSomethingElse(); };

fooFun() { foo(); };

barFun() { bar(); };

Another solution would be to make a much looser compilation the first time, and then actually treat these functions as formal syntax extensions, so that when the compiler sees "if_new" it would recognize that an "else" that follows it should be part of the function signature.

This sounds vaguely ok until you realize that you probably would like to be able to have multi-part object methods as well.

An alternative, though syntax-altering, would be to explicitly add a linking character, something like this:

if_new(b)
{
doSomething();
}
:else
{
doSomethingElse();
}

Unfortunately this means that all other multi-part syntax would have to follow the same rules, e.g.:

if (x)
{
...
}
:else
{
...
}

do
{
...
}
:while(x);

try
{
...
}
:catch (Exception e)
{
...
}
:finally
{
...
}

":" was the least intrusive character I could find, but if anyone else has a better suggestion I'd be happy to hear it.

What got me thinking if this could be done was a blog entry by Neal Gafter on multi-part method names in Java, and a follow-up by Stefan Schulz.

Thursday, February 19, 2009

(Ruby) Closures - jack of all trades, master of none.

Ruby is based around closures. Closures are both used to represent blocks of code, as well as anonymous functions:

As an anonymous function:

[1, 2, 3, 4].select { |i| i % 2 == 0 }  
# => returns [2, 4]

As a block of code:

10.times do
  puts "Hello World"
end

The problem is that they are not quite the same, and don't solve quite the same problem. In fact, I would go so far as to say that closures solve both problems... poorly.

As an anonymous function, closures grab a whole lot more context than you usually would like, and especially in ruby where you each closure by necessity always retain the complete context.

Its double nature as block and anonymous function means that exiting a closure occasionally works a bit different that expected, "return" alternately exits the current method or just returns the closure with that value. In some cases one can use "break" with an argument (a bit depending on what version of ruby you're using):

def test1
  [1, 2].collect(&Proc.new{ |i| return i })
end
p test1 # => 1

def test2
  [1, 2].collect(&lambda{ |i| return i })
end
p test2 # => 1 ([1, 2] in 1.9)

def test3
  [1, 2].collect { |i| return i } 
end
p test3 # => 1

def test4
  [1, 2].collect(&Proc.new{ |i| break i })
end
p test4 # Illegal

def test5
  [1, 2].collect(&lambda{ |i| break i })
end
p test5 # [1,2] (Illegal in 1.8)

def test6
  [1, 2].collect { |i| break i } 
end
p test6 # => 1

I am tempted to suggest that instead of lumping these two different usages into a single concept (the closure), we would be better served by separating these into anonymous functions and blocks.

A block would inherit context but an anonymous function would not (it would instead resolve variable values during creation).

Of course, the following could would be slightly different from the closure version:

a = 1
f = function() { return a; }
a = 2
p f.call() # => 1 (2 if this would have been a closure)

For our blocks this would allow us to unroll method calls, and in general optimize methods taking blocks as arguments by treating them as if the methods were macros.

Blocks and anonymous functions might not cover everything you can do with a closure, but how often do you really want a function that retains the whole context it was created in? 

Saturday, December 27, 2008

New version of xmlwise out now.

My tiny java library for looking through xml has been updated. The only new addition is a small addition to allow simple reading of plist files.  

Sunday, November 16, 2008

Read The Code

I've been thinking a bit about code reading lately. When code reading usually comes up in a discussion, it is usually about how to document the code and how much comments to add.

But I'm thinking about the skill of pure code reading - the ability to read raw source code and figure out how it works. 

Up until recently, I wasn't really aware of its importance. To be more exact, I assumed everyone's code reading skill more or less followed the overall amount of programming experience. 

It turns out it's not quite like that. Instead I've come to understand that code reading is a skill quite separate from the "general programming" skill.

I talked to one colleague about reading code, and he told me his way through complex code was not to rush headlong into the code and read it until he understood it (my preferred method), but instead he would look to divide and refactor the code into separate pieces with clear interfaces until each piece could be understood on its own.

While I think that this is a good approach to clear up complex code, it will still be hard to do the best possible refactorings unless you really understand the inner workings of the code you trying to clean.

This brings to mind a story that was related to me by another of my colleagues:

This colleague had been hired as a consultant to finish some software project. Working his way through the code he was struck by how strange the programmer who originally worked on the project had designed the program.

Believing the previous programmer had simply made some design mistakes, he went ahead and threw away the weird parts of the code and started to rewrite them. 

His newly written code started out pretty clean, but as he began work on the details he noticed the need for amendments to his original design.

As he implemented more of the details and corrected his design to accomodate them, he began to realize that his design grew more and more to resemble the original "strange" design he had thrown out.


I think this illustrates how hard it is to know what parts of the code are needlessly complex and can be safely thrown away, and what parts are complex because the code is handling some very complex details.

I suspect that good code reading skills always will be an edge, and in some cases even an essential skill.


So what can we do to get better?

There are a few ways I know of that will teach code reading: 
  • When reading a programming example, always try to run the whole programming code in your head to figure out what it does - never use code you don't understand.
  • Fix bugs in other people's code.
  • Translate code from one language into another.
I haven't read it myself, but I suspect Diomidis Spinellis's Code Reading also has some good ideas, and Joel Spolsky has an interesting blog entry that suggested a form "pair-reading" to get through truly complex code.

But most important is, as always, the desire to improve.

Friday, November 14, 2008

Good docs

Good source code documentation is valuable because trying to document your code well tells you how (re)usable your code is: 

- If it's hard to explain (i.e. hard to write clear documentation) then it is hard to understand. 
- If it is hard to understand it is hard to use (correctly). 
- If it is hard to use, then others (and probably you yourself as well) are most likely to write something new next time rather than using or bug-fixing your code.

Sunday, October 26, 2008

Busy busy

Not much written in a while, but that is mostly because I've been busy with different projects. I added a list to this blog with the tiny open source projects I'm working/have been working on.

Naga is my wrapper library around Java NIO that implements the surprisingly complex amount of boiler-plate code you usually need to use NIO for socket client-server communication.

Xmlwise is even more modest. Basically it is a few Java classes that extract data from an already parsed XML document, making the data easy to access by moving it into standard structures (HashMaps and LinkedLists). These classes have made it a delight for me to work with XML documents, and hopefully they'll be of use to others as well.

There are some other libs in progress as well.