Pages

Monday, April 2, 2012

Poker at work


This post in not mainly about poker.

Recently at work, we've been using a humongous third-party open source library. Our quad-core, 64 bit processors + 16 GB RAM computers take about 20 minutes to compile the library from scratch.

We wanted to speed up compile time, so we've been considering distcc, which I believe Google has been using for a while. The concept is pretty nice: just farm out a bunch of compilers.

The thing is we only have 4 computers we can use as build machines. Luckily we only have 4 devs, but we expect to grow in the future, so I was curious to know when is a good time to get more build machines.

Fortunately, I've played poker before, so I'm kind of familiar with some basic probability. With my limited knowledge, here's what I've got.

I'll just make up some numbers.

Suppose
  • there are 4 devs including me
  • we have 3 build machines
  • we all spend 1/5 of our time to compile code
  • 2 build machines mean 2 times faster the compilation time (and 3 machines mean 3X, you get it)
  • 2 devs compiling at the same time means twice the time it takes to compile (and 3 devs mean 3X, you get it)
Problem: If I wanna compile some code, what's the expected compile time?

We just need to find the expected value.

Let's enumerate the cases:
  • I'm the only person compiling.
  • 1 more dev is compiling at the same time.
  • 2 more devs are compiling at the same time.
  • 3 more devs are compiling at the same time.
The "value" in this case is compilation speed, it will take
  • 1/3 times normal compilation time, if I'm the only person compiling.
  • 2/3 times normal compilation time, if 1 more dev is compiling at the same time.
  • 3/3 times normal compilation time, if 2 more devs are compiling at the same time. (No speed up.)
  • 4/3 times normal compilation time, if 3 more devs are compiling at the same time. (Even slower.)
We got all the cases and values laid out, now the probability of each case.

If I'm the only person compiling


The probability that the 3 other devs aren't compiling is: (4/5)(4/5)(4/5).

The probability of this case happening is (4/5)(4/5)(4/5) = 0.512

If 1 more dev is compiling at the same time

The probability that another 1 dev is compiling and the other 2 aren't is: (1/5)(4/5)(4/5), but remember that there are several ways this can happen.

Dev #2 might be the person compiling while the other aren't, or dev #3 could be the person compiling as well.

There are choose(3,1) ways to choose who's compiling and who's not.

So the The probability that another 1 dev is compiling and the other 2 aren't is: (1/5)(4/5)(4/5) choose(3,1).

The probability of this case happening is (1/5)(4/5)(4/5) choose(3,1) = 0.384.

If 2 more devs are compiling at the same time

The probability that another 1 dev is compiling and the other 2 aren't is: (1/5)(1/5)(4/5) choose(3,2).

The probability of this case happening is (1/5)(1/5)(4/5) choose(3,2) = 0.096

If 3 more devs are compiling at the same time

The probability that another 1 dev is compiling and the other 2 aren't is: (1/5)(1/5)(1/5).

The probability of this case happening is (1/5)(1/5)(1/5) = 0.008.

For each case we multiply its probability with its value.

probabilitycompile speed
0.5121/30.171
0.3842/30.256
0.0963/30.096
0.0084/30.011

Expected compile speed is the sum of the last column, which is 0.533

So it seems with 4 devs and 3 machines, compiling 1/5 of the time, we still gain faster compile speed compared to our normal compile time (0.533 of original compile time).

Problem: How many devs can we afford with 3 build machines?

I can setup an equation and solve it, but it will end up with something like this.
Where n is the number of devs we need to find. Apparently, I don't know how to solve that equation, and I'm not even sure I got the equation right or not.

Plugging in some number sounded easier.

I kept increasing the number of devs until the expected compile speed became more than 1. Looks like we get about equal performance as soon as we have 11 devs including myself.

From that point, every 5th additional dev requires another build machine, which makes sense since the compile load is 0.2.

So this is how poker applies to work.

No comments:

Post a Comment