Sunday, September 7, 2008

Why 9 > 10


So I've been resistant to upgrading to Microsoft's Windows Vista for a number of reasons, but probably the biggest is that I consistently see Windows XP yielding higher frames per second (FPS) in many DirectX titles. Windows Vista introduced DirectX 10 to the world, while Windows XP users have had to be content with the older DirectX 9. I had read that Microsoft changed the video driver model for Vista to mitigate the infamous Blue Screen Of Death (BSOD) that has been the butt of jokes for many years, but I did not investigate further since my plan is to stay on Windows XP for quite some time.

Here is a shining example of Microsoft being cannon fodder on account of BSOD issues: a very old Sun Microsystems commercial from yesteryear (before Windows 2000) that pays homage to the late Jacques-Yves Cousteau (the narration). It's quite good!

http://www.youtube.com/watch?v=eNqPTOb31S8

It's also testimony to a long-standing problem with Microsoft's operating systems based on the Windows NT kernel introduced in the early '90s. In Microsoft's defense, it can't control how video drivers are written by third parties, but nevertheless, less-than-perfect video drivers have been one of the leading causes of the infamous BSOD.

So curiosity finally got the best of me as I was talking about DirectX performance with a friend, and I decided to investigate why Vista seemingly always loses to XP in the sundry benchmarks I've seen in online articles and in print. I started with the following Wikipedia article about the Windows Display Driver Model introduced in Windows Vista:

http://en.wikipedia.org/wiki/Windows_Display_Driver_Model

When the Wikipedia article mentioned that part of a Vista display driver lives in userland, the light bulb went on.

User space (aka userland) is where applications live, e.g., your web browser, your email client, your IM client, etc., etc. Kernel space is where the code that makes up the OS lives. Both of these terms are commonly used in systems programming.
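To make this concrete, here is a tiny C sketch (assuming a POSIX-ish system; on Windows the equivalent request would go through something like WriteFile) of what "asking the OS to do something" looks like from userland:

    #include <unistd.h>

    int main(void)
    {
        /* This program runs entirely in user space. The write() call
         * below asks the kernel to do the I/O on our behalf: execution
         * crosses from user space into kernel space, the kernel checks
         * our arguments, does the work, and hands control back. */
        const char msg[] = "hello from userland\n";
        write(STDOUT_FILENO, msg, sizeof msg - 1);
        return 0;
    }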

So the question arises: when an application asks the OS (operating system) to do something on its behalf and execution transitions from your application into the OS, how does the operating system protect the stability of the system? After all, you as an application programmer might have just passed some errant arguments that could potentially bring down the system… or, on the more sinister side, malicious code could have slipped in through your browser and is making system calls under the hood.

The answer is… with hardware!

The Intel architecture has a notion of “rings” when it comes to executing code (other processor architectures, I'm sure, have similar semantics; the nomenclature may vary). More on this in a second.

It turns out that when executing code, more bits are at play than the usual 32 bits you sometimes hear about, i.e., "I'm running 32-bit code" versus "I'm running 64-bit code." On the x86 architecture, memory access is implicitly qualified with additional registers. These registers are called selectors, and they happen to be 16 bits wide. The first 13 bits of a selector are an index into what's called a Descriptor Table. 2 to the 32nd power is roughly 4 billion, so it would seem a 32-bit Intel processor is incapable of seeing more. That isn't an accurate picture of what is really going on. You see, 32 bits refers to the context of a single application, the limits of its addressability, not those of the processor.

This means that by varying the first 13 bits in an x86 selector register you jump from only having 4 billion memory addresses to:

2^13 (8192) * 2^32 (4 billion)

Which calculates to:

35,184,372,088,832 (roughly 35 trillion)

But wait, there's more. The next bit (just one) says whether to go to the Local Descriptor Table or the Global Descriptor Table, so you can double this figure to roughly 70 trillion bytes (64 terabytes). So you see, the “32 bits” people speak of does not really reflect the amount of memory x86 processors are capable of managing.
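If you want to check the arithmetic yourself, here is a quick C sketch (nothing x86-specific about it, just 64-bit math):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* 13 index bits select one of 8192 descriptors, and each
         * segment can span a full 32-bit (4 GB) address range. */
        uint64_t one_table   = (1ULL << 13) * (1ULL << 32);  /* 2^45 */
        /* One more bit chooses the Local vs. Global Descriptor Table,
         * doubling the reachable range. */
        uint64_t both_tables = one_table * 2;                 /* 2^46 */

        printf("one table:   %llu bytes\n", (unsigned long long)one_table);
        printf("both tables: %llu bytes (64 terabytes)\n",
               (unsigned long long)both_tables);
        return 0;
    }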

Back to those rings. The last 2 bits in a selector register have to do with rings.

Applications (your browser) run in ring 3, a.k.a. user space; the kernel runs in ring 0. When an application asks the OS to do something, execution transitions from the lower privileged ring to the higher privileged one, i.e., from ring 3 (user space) to ring 0 (kernel space), and only through entry points the OS controls; when the OS is finished, it drops back down to ring 3. Code running in a lower privileged ring cannot simply flip the last two bits of a selector register and promote itself to a higher privileged ring. That's by design, since the OS cannot trust the arguments passed to it. From a security standpoint this is sound. You maintain the integrity of the system, continuing to provide basic services to other applications, and you make malice hard at this particular layer. And there are so many layers… this is why "computer security" is such an overloaded term: it has become hackneyed and means different things to different people.
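You can actually peek at all of this from an ordinary program. The sketch below assumes you have GCC or Clang on an x86 or x86-64 machine (it is nothing Vista-specific); it reads the code segment selector and picks apart the descriptor index, the table bit, and the ring bits. Run it as a normal application and the ring comes out as 3:

    #include <stdio.h>

    int main(void)
    {
        unsigned int cs = 0;

        /* Read the code segment selector the CPU is using right now
         * (GCC/Clang inline assembly, x86 or x86-64 only). */
        __asm__("mov %%cs, %0" : "=r"(cs));

        unsigned int index = (cs >> 3) & 0x1fff;  /* 13-bit descriptor index */
        unsigned int ti    = (cs >> 2) & 1;       /* 0 = GDT, 1 = LDT */
        unsigned int ring  = cs & 3;              /* privilege level */

        printf("CS selector      = 0x%04x\n", cs & 0xffff);
        printf("descriptor index = %u\n", index);
        printf("table            = %s\n", ti ? "LDT" : "GDT");
        printf("ring             = %u\n", ring);  /* 3 for a normal app */
        return 0;
    }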

But it turns out these ring transitions have an impact on performance! Any time you have boundary checks, whether metaphorical or literal (customs at the US border), things are slower than if you just let everything through.
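You can get a feel for that cost with a crude timing test. This is only a sketch, and a Linux one at that (getpid happens to be a cheap system call to hammer on), but the same lesson applies to any OS: the loop that stays in user space finishes much faster than the one that crosses into the kernel a million times.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #define ITERATIONS 1000000

    /* A do-nothing function that never leaves user space. */
    static long stay_in_userland(void) { return 42; }

    static double elapsed_ms(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    }

    int main(void)
    {
        struct timespec t0, t1, t2;
        volatile long sink = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERATIONS; i++)
            sink += stay_in_userland();      /* no ring transition */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        for (int i = 0; i < ITERATIONS; i++)
            sink += syscall(SYS_getpid);     /* user space -> kernel -> back */
        clock_gettime(CLOCK_MONOTONIC, &t2);

        printf("plain calls:  %.2f ms\n", elapsed_ms(t0, t1));
        printf("system calls: %.2f ms\n", elapsed_ms(t1, t2));
        return 0;
    }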

And with Vista's display drivers partially living in ring 3 (user space), there are likely more ring transitions than under the driver model found in Windows XP and its predecessors, hence the 10% to 15% drops (or worse) in performance under Vista.

So while Windows Vista is more stable and mitigates the blue screen of death problems of its predecessors, you get less bang for your buck as far as that screaming new video card goes (at least versus the guy who has Windows XP).
