Sunday, September 21, 2008

Heard about Newspeak

Newspeak hit the blogosphere recently, popping up on Reddit.

I obviously had to have a look, and it could be briefly described as an OO language with a (currently) smalltalk-ish syntax. 

This is not Smalltalk however, despite borrowed syntax and Smalltalk-like dynamic typing. Instead it tries to be a new language that incorporates many of the newer ideas in language design.

One of the people working on Newspeak is Gilad Bracha. Aside from working on Strongtalk (a fast Smalltalk implementation with optional typing) he's known for writing a lot of papers on various aspects on OO as well as having had a hand in how generics was added to Java. 

Looking through the papers is like looking through the feature list of Newspeak: pluggable types, the concept of mirrors for reflection, mixins for Strongtalk... they're all there.

I could write something about the language in general, but it's easier just to direct you to the website for the overview.

Anyway, I was reading through the spec today and found a bunch of small but interesting things about the language. For instance, Newspeak will be implementing fixed decimal representations, which means that doing something like 1.0000000000001 - 1.0 will actually return 0.0000000000001 instead of some unexpected number due to an imprecise floating point representation. It will use Actor model concurrency (very nice), it's going to have true immutable objects and of course the mirrors and pluggable types are there too.

Everything boiling down to a (potentially) very interesting language.


/C  

Monday, September 8, 2008

More multiple dispatch

I've been looking a bit more at multiple dispatch (multimethods). 

The neat thing is that you can view multiple dispatch as sort of a superset of dynamic typing.

For example, if I define (I'm going to use Java-like syntax here) the free function:

void open(Door this, Key key)
{
   ...
}

then this is equivalent to creating the method

class Door
{
   void open(Key key)
   {
      ....
   }
}

So something like door.open(myKey) could be considered simply syntactic sugar for open(door, myKey).

Unlike in Java, this would compile:

Door door = new Door();
Object hiddenDoor = door;
hiddenDoor.open(myKey);

...since the actual check is always done during runtime.

If we try to pass off something as a door, this will fail during runtime as well:

Dog dog = new Dog();
dog.open(myKey); // => Runtime Error!

Which is essentially dynamic typing.

What is so damn cool about multiple dispatch, is that this far from all you can do with it.

In an dynamically typed language like Ruby, your method would look like this:

class Door
   def open(key)
      ...
   end
end

But what happens if we do something like door.open(tomato)? Well there are two ways out: fail-fast or fail-at-some-other-time. Fail-fast would mean some check on the incoming object to make sure that it has the required key-like properties, fail-later would be to let the "key" fail when - at some later time - it is used under the assumption that it is a key.

With typed multiple dispatch we can make sure that it is a key (like in a statically typed language), while retaining a lot of flexibility.

However, there is one issue. What if Key is a class, and you want to pass in a LockPick that doesn't inherit from Key?

Basically what you want to do is:

LockPick pick = new LockPick(key);
door.open(pick);

The standard way to solve this in normal Java would be to replace the Key with an interface and let both LockPick and Key implement it.

So we replace the function with:

void open(Door this, KeyInterface keyInterface)
{
   ...
}

Now as long as Key and LockPick both implement KeyInterface we're ok.

What happens in Java is that KeyInterface actually allows us to have something akin to multiple inheritance while retaining most of the simplicity of single inheritance.

With single inheritance like Java, we cannot create

class TrollWarrior extends Warrior, Troll
{
   ...
}

Instead we have to select one to be an interface, like this:

class TrollWarrior extends Warrior implements Troll
{
   ...
}

Something funny happens if we have multiple-dispatch though:

If we can define:

void exposeToSunlight(Object self) {}
void exposeToSunlight(Troll self)
{
   self.turnToStone();
}

Then obviously humanWarrior.exposeToSunlight()is going to be different than trollWarrior.exposeToSunlight()! So what has happened? Well, it turns out that once we add multiple dispatch, interfaces basically turn into classes.

The downside is that multiple inheritance can be a bit tricky, especially when functions overlap. 

Say both classes A and B implement their special version of toString(), what version should the class C, inheriting from A and B use?

It turns out that creating a precedence rule isn't trivial. Fortunately, the problem has pretty much been solved already, and the answer is called C3 linearization.

Basically this is an algorithm that takes an inheritance tree and flattens it in a way that ensures consistency in three ways (hence the 3).

Here is a pseudo Java example:

class A
{
public String hello() { return "A"; }
}

class B
{
public String hello() { return "B"; }
}

class C extends B
{
public String hello() { return "C"; }
}

class D extends A, B
{}

class E extends A, C
{}

If we suppose that new D().hello() returns "A", what does new E().hello() return?

The surprising result is that many implementations of multiple dispatch in different languages would return "C"! The C3 algorithm however, does the right thing and gives us the "A" we would expect.

The C3 linearization algorithm was first created for Dylan, but has eventually worked itself into other languages as default.

I couldn't find any implementation of the algorithm in other languages than Dylan and Scheme, but the page on C3 for Python 2.3 had a very clear description of the algorithm.

Here's my Ruby version of it (the Java version was about 5 times as long):

def c3(own_class, list_of_linearizations)
  linearization = [own_class]
  while !list_of_linearizations.empty? do
    next_value = list_of_linearizations.find do |c| 
      list_of_linearizations.all? do |d| 
        !d[1..-1].member?(c[0])
      end
    end
    raise "Illegal inheritance"  unless next_value
    head = next_value[0]
    linearization <<>
    list_of_linearizations.each { |c| c.delete(head) }
    list_of_linearizations.reject! { |c| c.empty? }
  end
  linearization
end

It might look a bit tricky, but it's not so bad once you wrap your head around it.

Multiple dispatch keeps looking better every time I look at it, which makes me all the more embarrassed that I never looked at CLOS (CLOS bases its OO system on multiple dispatch). 

I was really surprised and impressed by the fact that support for dynamic typing and multiple inheritance arose spontaneously from just following consequences of basic multiple dispatch. 

Wednesday, September 3, 2008

Java multi threaded blocking IO wins over NIO?!

I just read through a blog post titled "Writing Java Multithreaded Servers - What's old is new". Basically they did some serious benchmarking on java with NIO vs threads with blocking IO and found that the latter was faster and scaled well too.

Before you run into the streets shouting "Lies! Bloody lies all of it!", I think I should mention that this was on the 2.6 Linux kernel. If you have played around with threads on earlier Linux kernels, you know that threads on those systems ate resources like crazy. Apparently the new threading implementation changed all that. Interestingly, Windows XP also show significantly better performance with multiple threads rather than NIO.

Talk about the world turning upside down.

That said, I still am not 100% convinced that the last word has been said. Most of the examples were of IO where data flows in one direction at a time. This is where multi-threaded blocking IO rules. But what if you have something like say an online poker server, where you have continuous input and output? Suddenly one thread isn't good enough, you need at least one for input and one for output. In addition, any timeouts require yet another thread, since it has to be unaffected by the blocking IO.

So say you have your java process with 15k connections. That's a whopping 30k threads just to handle the IO. That is not counting the threads for the poker tables or db access.

What's more, both input and output threads become little more than worker threads to receive and send data, since the actual gameplay will necessarily run on another thread. This means that all the connection input thread will do is really to read input and dispatch it to some poker table thread. A message being another word for "event".

Which means we're back to event handling just like in the NIO case, and we just gained 30k threads mostly doing nothing.

What I'm trying to say is that I don't think the results are as conclusive as they may seem. Yes, there are many applications where threads with blocking IO makes things incredibly easy to code, and that's where the benchmarks are applicable, but then there are other situations where they quite simply doesn't make sense, like in the poker server example, where each connection would require more threads and those threads in turn would need to communicate with other threads a lot more.