Code question about Screen Melt and Row-Major Rendering


Posted (edited)

DOOM applies a column-major transform when performing the screen melt, which according to the DOOM Black Book "is done so the read operations on vertical strips play nicely with the 486 cachelines". Is this sort of micro-optimization still a thing on modern computers? If so, why does DOOM not use this trick throughout its rendering process?
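To illustrate the cache-line argument, here's a minimal sketch (made-up buffer names, not the engine's actual code): reading one vertical strip out of a row-major buffer touches a new cache line on nearly every pixel, while in a column-major buffer the same strip is contiguous.

#define W 320
#define H 200

static unsigned char row_major[H * W];   /* pixel (x, y) lives at row_major[y * W + x] */
static unsigned char col_major[W * H];   /* pixel (x, y) lives at col_major[x * H + y] */

/* Read one vertical strip of the screen, as the melt does. */
void read_strip_row_major(int x, unsigned char *out)
{
    for (int y = 0; y < H; y++)
        out[y] = row_major[y * W + x];   /* stride of 320 bytes: a different
                                            cache line on nearly every read */
}

void read_strip_col_major(int x, unsigned char *out)
{
    for (int y = 0; y < H; y++)
        out[y] = col_major[x * H + y];   /* contiguous 200 bytes: only a
                                            handful of cache-line fills */
}

On a 486 with its 16-byte cache lines, the column-major read touches roughly 13 cache lines instead of about 200 for the row-major one.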

Edited by Sonim


Removing it from Chocolate Doom as a mainline modification of the source code would defeat its mission statement of accurately replicating vanilla gameplay.

But other than that, Doom source ports in general are like emulators with different hardware goals, specs, and capabilities. Doom2.exe and Doom.exe pretty much have to be run in DOSBox, the newly released open-source DOS version, or some other DOS emulation software, and Windows Doom basically has to use compatibility modes.

 

My first taste of Doom 2 was in the early 2000s on Windows XP. Shareware Doom was DOS-oriented from what I remember.

 

Basically, you aren't really gaining or losing anything by removing the screen melt effect. Its original implementation just happened to be very friendly to the 486 PCs that were common at the time.


I guess it would improve performance somewhat, considering we would no longer need to convert from row-major to column-major.

9 minutes ago, Sonim said:

I guess it would improve performance somewhat, considering we would no longer need to convert from row-major to column-major.

What would be the point? Remember that this effect is only active during screen transitions, and only overlaps with gameplay during the first fraction of a second when entering a level.

 

And you can already get tens of thousands of FPS on the vanilla levels anyway.


Modern CPUs still rely heavily on cache lines, which favor a column-major transform like this when drawing vertical columns. At 320x200 on a modern CPU it doesn't matter much, but Gooberman, in an effort to make the Doom renderer run much faster at high resolutions, found that making the framebuffer column-major always gave a significant speed increase, since walls and things are rendered vertically (floors/ceilings cause a bit of trouble there, though).
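For reference, the inner loop of the vanilla R_DrawColumn is roughly shaped like this (paraphrased from memory, not verbatim source); the dest += SCREENWIDTH step is exactly the stride that a column-major framebuffer would turn into a plain dest++:

/* Rough paraphrase of the vanilla column drawer inner loop (not verbatim):
   one vertical strip of a wall or sprite is drawn top to bottom. */
void draw_column(unsigned char *dest, const unsigned char *source,
                 const unsigned char *colormap,
                 int count, unsigned int frac, unsigned int fracstep)
{
    while (count-- > 0)
    {
        /* 128-texel-tall texture column, 16.16 fixed-point stepping. */
        *dest = colormap[source[(frac >> 16) & 127]];

        /* Row-major framebuffer: jump a whole screen row per pixel drawn.
           A column-major framebuffer makes this a unit stride (dest++). */
        dest += 320;   /* SCREENWIDTH */
        frac += fracstep;
    }
}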

46 minutes ago, Gez said:

What would be the point? Remember that this effect is only active during screen transitions, and only overlaps with gameplay during the first fraction of a second when entering a level.

 

And you can already get tens of thousands of FPS on the vanilla levels anyway.

Well, if row-major arrays are indeed faster, then this is probably useful info for source ports like FastDoom, which are focused on squeezing out every bit of performance. Besides, DOOM's original code is now over 30 years old, and I think it is healthy to reevaluate functions that no longer make sense in modern times.

1 minute ago, Sonim said:

Well, if row-major arrays are indeed faster, then this is probably useful info for source ports like FastDoom, which are focused on squeezing out every bit of performance. Besides, DOOM's original code is now over 30 years old, and I think it is healthy to reevaluate functions that no longer make sense in modern times.

Wrong example; FastDoom is meant to be run on old hardware, so something that was done to "play nicely with 486 cachelines" is still perfectly relevant to it.

 

Again, the screen melt function is not relevant to the gameplay loop, since it's only used for screen transitions. That means that if you yank it out, you get a bonus of exactly +0 FPS during gameplay. If you want irrelevant micro-optimizations, then you can also remove the function that displays the TITLEPIC lump. That way you don't have to load it and cache it; you can just display a black screen instead. This will save you one hundredth of a microsecond while launching the game, so it's a healthy reevaluation of a function that no longer makes sense in modern times.


I'm basically repeating what SaladBadger said, but the question in the OP is backwards. Rather than "can this be ripped out?", let's ask "are there other places where this trick could be used that actually make a big difference?", and the answer to that is "hell yes". Rum & Raisin is the proof. :P

Posted (edited)
5 hours ago, Gez said:

Wrong example; FastDoom is meant to be run on old hardware, so something that was done to "play nicely with 486 cachelines" is still perfectly relevant to it.

 

Again, the screen melt function is not relevant to the gameplay loop, since it's only used for screen transitions. That means that if you yank it out, you get a bonus of exactly +0 FPS during gameplay. If you want irrelevant micro-optimizations, then you can also remove the function that displays the TITLEPIC lump. That way you don't have to load it and cache it; you can just display a black screen instead. This will save you one hundredth of a microsecond while launching the game, so it's a healthy reevaluation of a function that no longer makes sense in modern times.

Maybe I didn't express myself well, or maybe you're just having a bad day; either way, I'll just ignore any response with an ironic tone. I asked the question out of curiosity about parts of the original code that haven't aged so well, not with the intention of making "irrelevant micro-optimizations" that will save me "one hundredth of a microsecond".

5 hours ago, Xaser said:

I'm basically repeating what SaladBadger said, but the question in the OP is backwards. Rather than "can this be ripped out?", let's ask "are there other places where this trick could be used that actually make a big difference?", and the answer to that is "hell yes". Rum & Raisin is the proof. :P

Yeah, maybe I should have asked "why does DOOM not use column-major by default to improve cache use?". I'll check out how rendering is done in the Rum & Raisin code, thanks for the useful link.

2 hours ago, Sonim said:

Yeah, maybe I should have asked "why does DOOM not use column-major by default to improve cache use?". I'll check out how rendering is done in the Rum & Raisin code, thanks for the useful link.

I believe Doom normally renders directly to VRAM, where the memory layout is defined by the graphics hardware. The frames to transition between need a saved off-screen buffer anyway, so there's more freedom there. These days the framebuffer is rendered to a texture which your 3D accelerator can transform, so the graphics hardware will accept whatever layout you choose.

 

Doom also uses both row-major and column-major drawing, since floor/ceiling drawing is done row-major (some of the calculations are constant per row, unlike walls, where they are constant per column). The fact that it did both is why, until Rum & Raisin's experiments showed otherwise, everyone just assumed that a column-major framebuffer would be a wash.
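To illustrate the "constant per row" part, a flat span drawer looks roughly like this (paraphrased, not the exact engine code): the distance, and therefore the texture steps, are computed once per row, so the per-pixel work is just adds and a table lookup.

/* Sketch of drawing one floor/ceiling span on screen row y, columns x1..x2.
   The distance to the plane is constant across the row, so xstep/ystep are
   computed once per span; per pixel it's two adds and a texture fetch. */
void draw_flat_span(unsigned char *dest, const unsigned char *flat,
                    int x1, int x2,
                    unsigned int xfrac, unsigned int yfrac,   /* 16.16 fixed point */
                    unsigned int xstep, unsigned int ystep)   /* constant per row  */
{
    for (int x = x1; x <= x2; x++)
    {
        /* 64x64 flat: fold the fixed-point coordinates into a texel index. */
        int spot = ((yfrac >> 10) & (63 * 64)) + ((xfrac >> 16) & 63);
        dest[x] = flat[spot];
        xfrac += xstep;
        yfrac += ystep;
    }
}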

 

Rum & Raisin also changes the floor/ceiling drawing to be done in columns. This requires more math, but on modern machines at least it seems faster to just go ahead and do that math (although technically part of the speed also comes from lower image quality, even if it's mostly imperceptible). I would imagine the 386/486 wouldn't see benefits from this, but I'm not aware of anyone validating that.

 

2 hours ago, Sonim said:

Maybe I didn't express myself well

Given what Gez said, I think he (and Kalensar) thought you were suggesting the whole screen melt be removed from Chocolate Doom. I'll be honest, I thought that at first glance too and was similarly confused as to what you were trying to accomplish by asking.

Posted (edited)
6 hours ago, Sonim said:

Maybe I didn't express myself well, or maybe you're just having a bad day; either way, I'll just ignore any response with an ironic tone. I asked the question out of curiosity about parts of the original code that haven't aged so well, not with the intention of making "irrelevant micro-optimizations" that will save me "one hundredth of a microsecond".

You've probably heard the adage, "if it ain't broke, don't fix it"?

 

Optimizing the screen transition code makes sense iff there is a problem with screen transitions. Has any profiling analysis been done to show that there's an unacceptable performance loss during the screen melt? What is the desired end result of this optimization? What's the cost/benefit analysis?

 

If you were talking about optimizing line-of-sight computation, for example, then yes, there would be an actual benefit, as that is typically a very expensive operation, to the point that the id guys brought in a "cheat" in the form of the REJECT lump, which exists just to allow skipping some of those checks.
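The check itself is tiny; roughly something like this (paraphrased sketch, taking the two sector indices as inputs rather than the actual map objects):

/* Sketch of the REJECT early-out used by the sight check (paraphrased, not
   verbatim engine code). The lump is a bit matrix over sector pairs; a set
   bit means the nodebuilder guarantees the two sectors can never see each
   other, so the expensive BSP line-of-sight traversal is skipped entirely. */
static int reject_says_blocked(const unsigned char *rejectmatrix,
                               int numsectors, int s1, int s2)
{
    int pnum = s1 * numsectors + s2;                      /* index of the pair */
    return (rejectmatrix[pnum >> 3] >> (pnum & 7)) & 1;   /* test that bit     */
}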

 

But again, screen transitions only happen when entering or exiting a level, so they don't overlap with actual gameplay. (Oh, and also for the finale, where it's even less relevant.) Here's the D_RunFrame function from Chocolate Doom:

//
//  D_RunFrame
//
void D_RunFrame()
{
    int nowtime;
    int tics;
    static int wipestart;
    static boolean wipe;

    if (wipe)
    {
        do
        {
            nowtime = I_GetTime ();
            tics = nowtime - wipestart;
            I_Sleep(1);
        } while (tics <= 0);

        wipestart = nowtime;
        wipe = !wipe_ScreenWipe(wipe_Melt
                               , 0, 0, SCREENWIDTH, SCREENHEIGHT, tics);
        I_UpdateNoBlit ();
        M_Drawer ();                            // menu is drawn even on top of wipes
        I_FinishUpdate ();                      // page flip or blit buffer
        return;
    }

    // frame syncronous IO operations
    I_StartFrame ();

    TryRunTics (); // will run at least one tic

    S_UpdateSounds (players[consoleplayer].mo);// move positional sounds

    // Update display, next frame, with current state if no profiling is on
    if (screenvisible && !nodrawers)
    {
        if ((wipe = D_Display ()))
        {
            // start wipe on this frame
            wipe_EndScreen(0, 0, SCREENWIDTH, SCREENHEIGHT);

            wipestart = I_GetTime () - 1;
        } else {
            // normal update
            I_FinishUpdate ();              // page flip or blit buffer
        }
    }
}

So you can see that if there's a wipe going on, the function hits a return, so nothing outside the "if (wipe)" block runs; in particular it never reaches TryRunTics(), so there is zero gameplay going on. No thinkers are processed. Game time is fully paused for as long as the melt takes.

 

When I liken removing or improving the melt code to an irrelevant micro-optimization, that's because, in my opinion, it is. Even if you could find a way to make all the screen melt computation happen instantly, this would have no noticeable impact, especially on a modern machine. Look again at the function above: I'm pretty sure the large majority of the time is spent waiting on the I_Sleep() call, not on the wipe_ScreenWipe() one...

 

Now if you want to do things like running Doom on a 286 or even an 8088, why not; then perhaps you will need to find ways to optimize this effect, or remove it altogether, but that's going to be a drop in the bucket compared to the work needed everywhere else to achieve that goal.

 

But if the desired end result is "faster performance on a computer from the 2020s", then even if you succeed, it's not going to have a noticeable effect.


@Gez I believe you're still missing the actual question the OP asked. The question was whether a column-major framebuffer should be used elsewhere (i.e., for rendering); SaladBadger and Xaser answered it correctly. The last sentence, which you seem to be focused on, was actually asking whether the already-present micro-optimization should be removed.


I think it is worth pointing out that rendering everything as spans is only an optimization on modern computers. The original engine did not do this because it violates the principle of constant-Z: doing the perspective division only once per wall column. The reason it is faster on newer computers is simply that the CPU is now so much faster at multiplies and divides that the cost of the cache misses outweighs the extra math per pixel.
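To make that concrete, here's a rough sketch (illustrative names only, not taken from any real renderer). Down a wall column the depth is constant, so one divide serves the whole column; draw the same wall as a horizontal span and the depth changes every pixel, so a perspective-correct span has to interpolate 1/z and u/z and pay a division per pixel:

/* Constant-Z: along one vertical wall column, depth z never changes, so the
   texture step per screen pixel is computed with one divide for the column. */
void wall_column(unsigned char *dest, const unsigned char *texcol,
                 int count, double z, double focal_length)
{
    double step = z / focal_length;   /* texels per screen pixel; the only divide */
    double v = 0.0;

    while (count-- > 0)
    {
        *dest = texcol[(int)v & 127];
        dest += 320;                  /* row-major framebuffer stride */
        v += step;
    }
}

/* The same wall drawn as a horizontal span: 1/z and u/z vary linearly across
   the row, but recovering u needs a divide (or reciprocal) on every pixel. */
void wall_span(unsigned char *dest, const unsigned char *tex,
               int x1, int x2,
               double inv_z, double inv_z_step,
               double u_over_z, double u_over_z_step)
{
    for (int x = x1; x <= x2; x++)
    {
        double u = u_over_z / inv_z;  /* per-pixel perspective division */
        dest[x] = tex[(int)u & 127];
        inv_z    += inv_z_step;
        u_over_z += u_over_z_step;
    }
}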

 

If you simply rotate the framebuffer 90 degrees you don't actually solve anything, because while your walls are now perfectly aligned with the cache lines, the flats no longer are. If you look closely at the Raisin performance data, you'll see a basic transform makes scenes dominated by wall drawing faster while being slower when the scene is dominated by flats. To truly gain the speed improvement consistently on a modern CPU, you have to get rid of all the column drawers and convert everything to use spans.


Although it has not been experimentally confirmed, I think simply rotating the framebuffer by 90 degrees would still have an effect. In the worst case, flats only cover the entire screen once, whereas everything else, such as walls, two-sided midtextures, and sprites, can be overdrawn multiple times on top of an already-filled screen. Therefore, even a simple 90-degree rotation is likely to be effective, especially in modern WADs that use many two-sided midtextures and sprites.

5 hours ago, Blzut3 said:

Given what Gez said, I think he (and Kalensar) thought you were suggesting the whole screen melt be removed from Chocolate Doom. I'll be honest, I thought that at first glance too and was similarly confused as to what you were trying to accomplish by asking.

I edited the last part of my question, as it was doing more harm than good.

Posted (edited)
1 hour ago, Blzut3 said:

The last sentence, which you seem to be focused on, was actually asking whether the already-present micro-optimization should be removed.

Exactly. Given that this optimization was done with 486s in mind, I was wondering whether this row-major-to-column-major transform trick still makes sense on a modern computer. The way I phrased it initially made it look like I was suggesting removing the screen melt altogether, but I was only suggesting removing the `wipe_shittyColMajorXform` function specifically.
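For anyone who hasn't looked at f_wipe.c: from memory, the function is roughly the following (paraphrased, with plain malloc/free standing in for the zone allocator). It just transposes the saved screens, treated as 16-bit words, so that each melt column becomes contiguous before the melt loop reads it.

#include <stdlib.h>
#include <string.h>

/* Rough paraphrase of wipe_shittyColMajorXform (not verbatim engine code):
   the 320x200 byte screen is treated as a width x height array of 16-bit
   words (width = 160), so two screen pixels move together. */
void wipe_shittyColMajorXform(short *array, int width, int height)
{
    short *dest = (short *) malloc(width * height * sizeof(short));

    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dest[x * height + y] = array[y * width + x];   /* row-major -> column-major */

    memcpy(array, dest, width * height * sizeof(short));
    free(dest);
}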

Edited by Sonim

5 hours ago, Sinshu said:

Although it has not been experimentally confirmed, I think simply rotating the framebuffer by 90 degrees would still have an effect. In the worst case, flats only cover the entire screen once, whereas everything else, such as walls, two-sided midtextures, and sprites, can be overdrawn multiple times on top of an already-filled screen. Therefore, even a simple 90-degree rotation is likely to be effective, especially in modern WADs that use many two-sided midtextures and sprites.

That's why I said it depends on the scene. If what you see is primarily spans (flats), then the rotation will be slower, but if it is mostly columns (walls, sprites), then it will be faster. The biggest issue with the rotation solution is actually that you really want to avoid having some frames be expensive while others are really cheap. That's why Raisin eventually switched over to rendering everything horizontally.

 

Raisin kept transforming everything by 90 degrees, but that's actually more because the entire engine "thinks" in columns. The clipper clips in columns and even flats are initially columns in the engine. It isn't really for performance concerns that you want to do this. It is just much easier if you can get the final transform for free via the GPU.

Share this post


Link to post

While it's definitely true that it depends on the scene (hence why I originally assumed it would be a wash), I would argue that Rum & Raisin's data suggests the column-major framebuffer wins more than it loses, even with the flats still being rendered row-major. Of course, that data was for one demo, so to really draw a conclusion it would probably make sense to run a large number of demos. I believe I heard a couple of ports did switch to a column-major buffer after that, so I'm not sure if anyone has collected more data.

 

Either way, though, for FastDoom or similar DOS ports it's still irrelevant, since, as I said, those are limited by the layout of the hardware framebuffer (at least when talking about standard 320x200@8bpp modes).

Posted (edited)
On 6/9/2024 at 10:34 AM, Gez said:

Now if you want to do things like running Doom on a 286 or even an 8088, why not; then perhaps you will need to find ways to optimize this effect, or remove it altogether, but that's going to be a drop in the bucket compared to the work needed everywhere else to achieve that goal.

 

In Doom8088 the screen melt is done the same way as in nRF52840 Doom and GBA Doom.

 



1. Render the first frame (the one we are melting to) into the back buffer. (Basically render as normal.)

2. The front buffer contains the last frame as usual (menu, intermission, whatever).

3. Then do the melt by copying columns down in-place in the front buffer and filling in from the back buffer in a loop. We don't do a page flip while melting.

 

This needs no extra buffers.

 

quote source

 

wipe_shittyColMajorXform() isn't needed.
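A rough sketch of one step of that melt (illustrative code, not the actual Doom8088/GBA implementation; it assumes a row-major 320x200 front buffer that is currently being displayed and a back buffer holding the frame we're melting to):

#define W 320
#define H 200

/* y_off[x] is how far column x has already fallen; speed[x] is how many
   pixels it falls this step. Call repeatedly until every y_off[x] == H. */
void melt_step(unsigned char *front, const unsigned char *back,
               int y_off[W], const int speed[W])
{
    for (int x = 0; x < W; x++)
    {
        int dy = speed[x];
        if (y_off[x] + dy > H)
            dy = H - y_off[x];
        if (dy <= 0)
            continue;

        /* Slide the old image in this column down by dy pixels, in place,
           working bottom-up so no source pixel is overwritten before use. */
        for (int y = H - 1; y >= y_off[x] + dy; y--)
            front[y * W + x] = front[(y - dy) * W + x];

        /* Fill the gap that opened at the top of the column from the new
           frame waiting in the back buffer. */
        for (int y = y_off[x]; y < y_off[x] + dy; y++)
            front[y * W + x] = back[y * W + x];

        y_off[x] += dy;
    }
}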

