Quote of the day

Somewhat amusing quote from gamedeff.com:

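Дешевая популярность в тяжелые времена не мешает, поэтому в блог срать надо почаще (всем, кстати, рекомендую).

Roughly: "Cheap popularity doesn't hurt in hard times, so you should crap into your blog more often (I recommend this to everyone, by the way)."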

Preemptive note: Google Translate does not quite cope with it.

ARB_draw_buffers


No, I don’t have any particular point to make. But I did not even get the t-shirt…

Achievement of the week: MakeVistaDWMHappyDance

This was the function that I added:

void GUIView::MakeVistaDWMHappyDance()
{
    // Looks like Vista has some bug in DWM. Whenever we maximize or dock
    // a view, we must do something magic, otherwise
    // white stuff appears in place of the view.
    // See http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=4208117&SiteID=1

    bool earlierThanVista = systeminfo::GetOperatingSystemNumeric() < 600;
    if( earlierThanVista )
        return;

    // What seems to work is drawing one pixel via GDI.
    // We draw it at (1,1) with usual background color.
    int grayColor = (int)(0.61f * 255.0f); // explicit float-to-int truncation (155)
    PAINTSTRUCT ps;
    BeginPaint(m_View, &ps);
    SetPixel(ps.hdc, 1, 1, RGB(grayColor,grayColor,grayColor));
    EndPaint(m_View, &ps);
}

I know. Reading from the screen when Aero is on is slow, bad and wrong. But then, what do you do? It’s better than users staring at an all-white window just because Vista decided to draw it white, no matter what you think you’re drawing into it.

…still, MakeVistaDWMHappyDance is not nearly as cool as

internal interface ICanHazCustomMenu { … }

that Nicholas added a while ago.

Don’t try to outsmart the compiler

The other day at work there was a need to flip an image vertically without pulling in large portions of other image-handling code. Flipping vertically is easy:

for( int y = 0; y < height/2; ++y ) {
    memswap( img+y*width, img+(height-y-1)*width, width*sizeof(img[0]) );
}

The memswap function was written this way:

// why isnt this in the std lib?
// using XOR to avoid tmp var
void memswap( void* m1, void* m2, size_t n )
{
    char *p = (char*)m1; char *q = (char*)m2;
    while ( n-- ) {
        *p ^= *q; *q ^= *p; *p ^= *q;
        p++; q++;
    }
}

The comment above the function was what triggered my interest. I just added:

// because it can be slower (local variable is likely in register;
// whereas using XOR involves reads/writes to memory)

But then I got curious and just had to check what actually happens in each case.

Using Apple’s gcc 4.0.1 on Core 2 Duo, the above memory swapping code takes about 12.5 clock cycles per swapped image pixel (pixel = 4 bytes). The inner loop is this:

movzx  eax,BYTE PTR [edx-0x1]
xor    al,BYTE PTR [ecx-0x1]
mov    BYTE PTR [edx-0x1],al
xor    al,BYTE PTR [ecx-0x1]
mov    BYTE PTR [ecx-0x1],al
xor    BYTE PTR [edx-0x1],al
dec    ebx
inc    edx
inc    ecx
cmp    ebx,0xffffffff
jne    loopstart

So the loop is three memory reads, three writes and some increments of the pointers / loop counter. Visual C++ 2008 compiles it very similarly, except that it uses a more complex addressing mode ([ecx+eax]) to save one pointer increment:

movzx       edx,byte ptr [ecx+eax]
xor         byte ptr [eax],dl
mov         dl,byte ptr [eax]
xor         byte ptr [ecx+eax],dl
mov         dl,byte ptr [ecx+eax]
xor         byte ptr [eax],dl
dec         esi
inc         eax
test        esi,esi
jne         loopstart

What if we don’t do this “XOR trick”, and just swap the contents using a temporary variable?

// ...
char t = *p; *p = *q; *q = t;
// ...

Lo and behold, now it runs at 7 cycles / pixel (almost twice as fast), and the inner loop is two memory reads and two writes:

movzx  edx,BYTE PTR [ebx-0x1]
movzx  eax,BYTE PTR [ecx-0x1]
mov    BYTE PTR [ebx-0x1],al
mov    BYTE PTR [ecx-0x1],dl
// ... incrementing pointers / counter here, like in previous case
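For reference, here is the whole temporary-variable version as a standalone function. This is a sketch: the name memswapTemp is hypothetical, since only the changed inner line is shown above, and otherwise it behaves like memswap. A side benefit is that, unlike the XOR version, it stays correct when both pointers are the same.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical name: byte-wise swap through a temporary instead of XOR.
// Safe even when m1 == m2 (the XOR version would zero the bytes).
void memswapTemp( void* m1, void* m2, std::size_t n )
{
    assert( n == 0 || (m1 != nullptr && m2 != nullptr) );
    char *p = (char*)m1; char *q = (char*)m2;
    while ( n-- ) {
        char t = *p; *p = *q; *q = t;   // two reads, two writes per byte
        p++; q++;
    }
}
```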

So yeah. The XOR trick is pretty much useless here: it’s about twice as slow. It gets even worse as images get larger: on a 2048×2048 image, the regular swap still takes 7 cycles/pixel, but the XOR trick takes 55 cycles/pixel!

I guess the XOR trick is useful only in quite rare situations, for example when you’re inside some inner loop and want to swap register values without spilling them to memory or using an additional register. Heh, Wikipedia covers this too, so I’m not saying anything new :)
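The register case looks roughly like this when written in C++. This is a sketch with a hypothetical xorSwap helper, and whether the values really stay in registers is entirely up to the compiler. The address check matters: if both references name the same object, the first XOR zeroes it and the value is lost.

```cpp
#include <cassert>

// Hypothetical helper: swaps two values without a temporary.
void xorSwap( unsigned& a, unsigned& b )
{
    if( &a == &b )   // aliasing guard: a ^= a would wipe the value
        return;
    a ^= b;          // a = a0 ^ b0
    b ^= a;          // b = b0 ^ (a0 ^ b0) = a0
    a ^= b;          // a = (a0 ^ b0) ^ a0 = b0
}
```

For everyday code, std::swap remains the better default, as the measurements above suggest.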

Now of course, if we happen to know that our pixels are 32 bits in size, there’s no good reason to keep the loop in bytes. We can operate on integers instead:

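// assumes n is a multiple of sizeof(int) and both pointers are suitably aligned;
// trailing bytes (n % sizeof(int)) are left unswapped
void memswapI( void* m1, void* m2, size_t n )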
{
    size_t nn = n/sizeof(int);
    int *p = (int*)m1; int *q = (int*)m2;
    while ( nn-- ) {
        int t = *p; *p = *q; *q = t;
        p++; q++;
    }
}

This runs at 1.5 cycles/pixel (the XOR variant runs at 2.5 cycles/pixel). The assembly is pretty much the same, just with 32-bit registers.
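memswapI silently skips trailing bytes when n is not a multiple of sizeof(int), and casting void* to int* assumes suitable alignment. Below is a sketch that handles both, under its own assumptions. The name memswapWords is hypothetical. It moves fixed 32-bit chunks through std::memcpy, which optimizing compilers typically lower to plain 32-bit moves, and then finishes the last 0 to 3 bytes one at a time. It has not been benchmarked here, so the cycle counts above refer to memswapI, not to this variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical name: swaps n bytes in 32-bit chunks, then the tail bytes.
// memcpy avoids alignment and aliasing assumptions about m1/m2.
void memswapWords( void* m1, void* m2, std::size_t n )
{
    assert( n == 0 || (m1 != nullptr && m2 != nullptr) );
    unsigned char *p = (unsigned char*)m1;
    unsigned char *q = (unsigned char*)m2;
    std::size_t words = n / sizeof(std::uint32_t);
    while ( words-- ) {
        std::uint32_t a, b;
        std::memcpy( &a, p, sizeof a );
        std::memcpy( &b, q, sizeof b );
        std::memcpy( p, &b, sizeof b );
        std::memcpy( q, &a, sizeof a );
        p += sizeof(std::uint32_t); q += sizeof(std::uint32_t);
    }
    // Swap the remaining 0..3 bytes one at a time.
    for ( std::size_t i = 0; i < n % sizeof(std::uint32_t); ++i ) {
        unsigned char t = p[i]; p[i] = q[i]; q[i] = t;
    }
}
```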

Another option? If you use STL, just use:

std::swap_ranges(p, p+n, q);

with p and q pointing to the pixel data type (so n is a count of pixels, not bytes). With 32-bit pixels, this also runs at 1.5 cycles/pixel.
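Applied to the original problem, a vertical flip built on std::swap_ranges could look like the following. It is a sketch that assumes a tightly packed, row-major image of 32-bit pixels with no row padding. flipVerticallySTL is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: flips a packed row-major image of 32-bit pixels in place.
void flipVerticallySTL( std::uint32_t* img, int width, int height )
{
    assert( width >= 0 && height >= 0 );
    // Rows y and height-y-1 never coincide for y < height/2,
    // so the middle row of an odd-height image is left alone.
    for ( int y = 0; y < height/2; ++y ) {
        std::uint32_t* top    = img + y*width;
        std::uint32_t* bottom = img + (height-y-1)*width;
        std::swap_ranges( top, top + width, bottom );
    }
}
```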

So yeah. Don’t try to outsmart the compiler without measuring it.

Cool tech vs. boring details

Some of the stuff I worked on last week:

  • Fixed import progress bar for movies with no audio
  • Fixed first context menu click not working on Windows
  • Eye dropper backend on Windows
  • Export Package actually works on Windows
  • Compare Binary works on Windows
  • Add checkbox to project wizard to always open it on startup
  • F1 in bundled text editor goes to scripting docs for current word
  • Fixed q/w/e/r keys in password fields and text areas toggling active Tool on Windows
  • Fixed panes not repainting on Windows after some change is done via context menu on them
  • …and so on.

Boring tiny little details.

This probably best summarizes where the lion’s share of time goes when developing anything. I’m not working on some cool spherical harmonics lightmap compression. Or on cunning ways to encode shadow map information for better filtering. Or on using CUDA to compute something interesting.

In other words, I’m not working on cool technology. Instead I’m adding missing menu items. Fixing obscure corner cases. Fighting inconsistencies in operating system APIs. Spotting misplaced pixels. Adding missing keyboard shortcuts.

Nothing interesting to blog about!

But still, methinks the difference between software that is merely “good” and software that is “great” is in the details. And only in the details.

I’ll just take care of tons more details. Maybe it will result in something good.

Crunchtime!

A few weeks ago it was all calm in the source control. Now it’s crunchtime!

I’m the master of svn deception. I do tons of useless commits just so that the stats look good. Yeah!

…ok, back to work.

Windows 7

After the steaming pile of poo that is Windows Vista, it looks like Windows 7 will be done right.

Ok, to be fair, Vista has lots of new features and improvements under the hood. I haven’t used them myself, but a transactional file system, low-level APIs exposing detailed memory/IO stats, etc. etc. sound like cool & useful stuff. The problem with Vista is that all those core improvements are outweighed by an inconsistent & slow UI and some stupid blunders.

Now, Windows 7 seems to be taking on two things: 1) performance and 2) consistency. It builds on all the low-level improvements done in Vista and aims to get the part that is visible to the user right. Yay if Microsoft can pull this off. We’ll see.

The awesome support we do

Yesterday’s experience catching up with Unity forums, as I remember it:

Take a quick look at zillions of new posts.

Answer about five questions with “what’s the value of your camera’s near plane?”.

There should be some way to automate all of this. For every 20th question, reply with “increase your near plane!”, or something.

Unite 2008

Spent last week at our conference, Unite 2008. Lots of people, lots of stuff and goodness, tired as hell, but almost recovered already.

We showed a glimpse of the Unity editor for Windows at the keynote, so it is public now: yes, we are working on a Windows toolchain. About time! This is the major area I’m spending time on these days: Windows, Windows, Windows. Learning WinAPI as I cruise along :) Before Unity 2.1 I spent months fixing tons of small issues; now I’m spending months doing tons of small Windows-related things. Someday I’ll get back to doing tons of small things on the rendering side.

Here are a couple of random photos that I stole… er, borrowed from Mantas:


Keynote in front of a Sentinel from The Matrix.


Presenters talking.


People listening!


I don’t know that guy in the center. Probably some stupid outsider. Really!

Implicit to-pointer operators must die!

For the sake of the nation,
this operator must die!

Seriously. Suppose there is some class, let’s say ColorRGBAf, that has four floats inside. Now, someone at some point decided to add these operators to it:

operator float* () { /**/ }
operator const float* () const { /**/ }

Probably because it’s easier to pass color to OpenGL this way, or something like that.

This is evil. Like, really evil. Especially if that class did not have comparison operators defined, and some totally unrelated code four years later does:

if (color != oldColor) { /* … */ }

Ouch! Sounds like someone will spend four hours debugging something that looks like an event routing issue that only happens on Windows and only with optimizations on (yes, I just did that…).

What happens here? Since the class has no operator!=, the compiler converts both colors to float* through the implicit operator and compares the pointers, not the colors. If for some reason both colors are temporary objects, it can even happen that both get folded into the same variable/register/whatnot. The pointers are the same. Ouch!
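Here is a minimal reproduction of the trap. The real operator bodies are elided in the post, so returning &r is an assumption, as are the member names. Two colors with identical contents still compare as "different", because != ends up comparing two addresses. The helper takes const references so that only the const conversion applies.

```cpp
#include <cassert>

// Stand-in for the class described above (member names assumed):
// implicit to-pointer operators, no comparison operators.
struct ColorRGBAf
{
    float r, g, b, a;
    operator float* () { return &r; }
    operator const float* () const { return &r; }
};

// Compiles without any operator!=: both sides become const float*,
// so this compares object addresses rather than color values.
bool colorsDiffer( const ColorRGBAf& x, const ColorRGBAf& y )
{
    return x != y;
}
```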

Implicit “nice” operators are just disguised evil. Remove that operator, add something like GetPointer() to the class if someone really wants that, and, even better, make the comparison operators private and leave them unimplemented. Yes. Much better.
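A sketch of that fix, again with assumed member names. It has no implicit conversions and provides an explicit GetPointer() pair. Its comparison operators are private and declared but never defined, so an accidental c1 != c2 is rejected by the compiler, or by the linker if it happens inside the class itself.

```cpp
#include <cassert>

class ColorRGBAf
{
public:
    float r, g, b, a;

    ColorRGBAf( float r_, float g_, float b_, float a_ ) : r(r_), g(g_), b(b_), a(a_) {}

    // Explicit access for code that needs raw floats (e.g. passing to OpenGL).
    float* GetPointer() { return &r; }
    const float* GetPointer() const { return &r; }

private:
    // Declared, never defined: comparisons are not available by accident.
    bool operator==( const ColorRGBAf& ) const;
    bool operator!=( const ColorRGBAf& ) const;
};
```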