Rendering Performance Guidelines

Overview

This article provides performance guidelines for materials, shaders, and rendering in general.

Measuring Performance

A common measure of performance in computer games is Frames Per Second (FPS). Although it gives a good overview of overall performance, it is not suitable for fine-grained performance analysis or for expressing performance differences. The reason is that FPS is defined as 1/frame time and is therefore a non-linear measure. For example, an increase of 2 FPS when the game is running at 20 FPS corresponds to a gain of about 4.5 ms per frame (50 ms down to roughly 45.5 ms), while the same 2 FPS improvement on a game running at 60 FPS yields a gain of only about 0.5 ms.

A more useful measure is the frame time. It refers to the time that each frame takes from the beginning to the end and is usually expressed in milliseconds (ms). By setting r_DisplayInfo to 2 instead of 1, it is possible to see the frame time instead of the FPS.

Useful numbers (approximate):

  • 16 ms = 60 FPS
  • 33 ms = 30 FPS (target for CryENGINE)
  • 40 ms = 25 FPS
  • 50 ms = 20 FPS
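The relationship between FPS and frame time described above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names are hypothetical and are not part of the engine.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate (frames per second) to a frame time in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frame_time_gain_ms(fps_before: float, fps_after: float) -> float:
    """Milliseconds saved per frame when the frame rate rises from fps_before to fps_after."""
    return frame_time_ms(fps_before) - frame_time_ms(fps_after)


if __name__ == "__main__":
    # Reproduce the reference table (values are printed rounded to one decimal).
    for fps in (60, 30, 25, 20):
        print(f"{fps} FPS = {frame_time_ms(fps):.1f} ms")

    # The same +2 FPS is worth very different amounts of frame time.
    print(f"20 -> 22 FPS saves {frame_time_gain_ms(20, 22):.2f} ms")
    print(f"60 -> 62 FPS saves {frame_time_gain_ms(60, 62):.2f} ms")
```

Because the saving shrinks as the base frame rate rises, comparing optimizations in milliseconds is more meaningful than comparing them in FPS.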

Performance can depend heavily on the execution environment, so it is important to use similar conditions when comparing performance numbers. For example, performance on systems with different hardware is likely to vary a lot. GPU time also depends strongly on the screen resolution (a higher resolution usually results in slower performance) and on whether anti-aliasing is used.

If your team is targeting a game running at 30 FPS, each frame must not take more than 33 milliseconds to execute. Each processing step of the engine adds to the frame time. In practical terms, if you spend 5 ms on the Zpass, 6 ms on the general rendering pass, and 10 ms on shadows (taking into account just these features), it adds up to about 21 ms. This means your maximum framerate on the GPU side would theoretically be ~47 FPS and could never go beyond that.
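As a rough illustration of this budget arithmetic, the sketch below sums the per-pass timings from the example and derives the theoretical maximum framerate and the remaining budget. The pass names and the helper functions are hypothetical; in practice the timings would come from profiling.

```python
def total_frame_time_ms(pass_times_ms: dict) -> float:
    """Sum the cost of all rendering passes, in milliseconds."""
    return sum(pass_times_ms.values())


def max_fps(total_ms: float) -> float:
    """Theoretical framerate ceiling for a given total frame time."""
    return 1000.0 / total_ms


def remaining_budget_ms(total_ms: float, target_fps: float = 30.0) -> float:
    """Frame time still available before the target framerate is missed."""
    return 1000.0 / target_fps - total_ms


if __name__ == "__main__":
    # Timings taken from the example in the text (GPU side only).
    passes = {"zpass": 5.0, "general": 6.0, "shadows": 10.0}
    total = total_frame_time_ms(passes)
    print(f"total: {total:.1f} ms")
    print(f"max framerate: {max_fps(total):.1f} FPS")
    print(f"left at 30 FPS: {remaining_budget_ms(total):.1f} ms")
```

Everything else in the frame (post effects, particles, and so on) has to fit into whatever budget remains.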

In the end, every nanosecond matters for performance, especially on consoles. Everyone, including artists and designers, needs to strive to save as much processing time as possible when creating and placing assets.

Drawcalls

General Drawcalls

  • Every object with a different material has a separate drawcall. On the programming side it is possible to merge cases where a material is shared, but always keep this in mind when creating art assets.
  • This also means that every sub-material is a separate drawcall.
  • Each drawcall means setting material data and some other extra work on the CPU side, plus fillrate costs on the GPU side (which vary depending on the screen area occupied by the drawcall).
  • The drawcall count depends on a number of conditions:
    • By default, opaque geometry has at least 2 drawcalls (1 for Zpass, 1 for General rendering).
    • Shadows are extra drawcalls, as are detail passes and some material layers as well.
  • General guidelines:
    • On the design and art side, strive to keep the drawcall count at an acceptable level at all times: around 2000 drawcalls per frame.
    • Try packing multiple object textures into a single texture and material whenever it makes sense. For example, different "publicity" placards could use the exact same material and texture (with different UV mapping).
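To illustrate the texture-packing guideline, the sketch below remaps an object's original UV coordinates into one cell of a shared texture. It assumes a uniform grid layout, which is an assumption for illustration only; real atlases are usually laid out by the art tools, and the function name is hypothetical.

```python
def remap_uv_to_atlas(u: float, v: float, col: int, row: int,
                      cols: int, rows: int) -> tuple:
    """Map a UV in [0, 1] x [0, 1] into cell (col, row) of a cols x rows atlas.

    Assumes every cell has the same size and that cell (0, 0) starts at
    UV (0, 0). Both are simplifications for this sketch.
    """
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError("cell index outside the atlas grid")
    return ((col + u) / cols, (row + v) / rows)


if __name__ == "__main__":
    # Four placards sharing one 2 x 2 atlas: each keeps its own artwork
    # but all of them can use the same material and texture.
    for col, row in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print((col, row), remap_uv_to_atlas(0.5, 0.5, col, row, 2, 2))
```

With this kind of layout, the placards differ only in their UV mapping, so they no longer need one material each.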

With the recent addition of deferred rendering, you should strive to do more work in a deferred fashion. For example, caustics rendering used to require a separate drawcall. With deferred rendering, caustics are now done in post-processing in a single "full-screen" drawcall (with proper culling). A "deferred drawcall" is much cheaper on the CPU side than a regular forward drawcall, as you don't have to set material parameters. The main cost falls on the GPU only and varies depending on the screen area used.

Shadow Drawcalls

CRYENGINE has the ability to automatically optimize shadow drawcalls through the r_MergeShadowDrawcalls CVar (enabled by default). This function checks the asset for similarities in materials and merges as much as possible into a single drawcall. This means it can actually sometimes be cheaper to leave shadow casting enabled in the sub-material than it is to disable it, because disabling it might cause extra passes to be required. The cheapest possible way to render shadows is to create a dedicated "shadow proxy" mesh which mimics the shape of the overall rendermesh and is the only object in the asset which casts shadows, costing a single drawcall.

Drawcall Costs on Consoles

  • Designers and artists must keep the drawcall count to a maximum of 2000 and also make aggressive use of LODs.
  • As a rough cost estimation "tool" to help designers and artists estimate performance on the PS3:
    • Assume a target of at most 33 ms per frame (30 FPS). A drawcall that is vertex shader bound (Zpass, shadows, part of the general pass) can cost about 10 microseconds on the PS3 GPU, more or less, depending on the number of vertices to process. You can therefore estimate approximately 2k * 10 us = ~20 ms.
    • So for 2k drawcalls you can expect a minimum cost of around 20 ms, which means more than half of the frame budget is spent processing vertices. Shading, shadow masks, and much more still need to be processed. It all adds up very quickly.

As an example, if a scene uses 4k drawcalls in some areas, you can already estimate 4k * 10 us = ~40 ms, which is beyond budget, and this is just for vertex processing.
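The estimate above is simple enough to express as a small calculator. This is only a sketch of the rule of thumb from the text: the 10 microsecond figure is the rough PS3 value quoted above, and the function names are hypothetical.

```python
# Rough per-drawcall GPU cost for vertex-bound work on the PS3, as quoted
# in the text. Real costs vary with the number of vertices processed.
VERTEX_BOUND_DRAWCALL_US = 10.0


def estimate_vertex_cost_ms(drawcalls: int,
                            cost_per_drawcall_us: float = VERTEX_BOUND_DRAWCALL_US) -> float:
    """Estimate the vertex processing time in milliseconds for a drawcall count."""
    return drawcalls * cost_per_drawcall_us / 1000.0


def fits_frame_budget(drawcalls: int, target_fps: float = 30.0) -> bool:
    """True if vertex processing alone still fits inside the frame budget."""
    return estimate_vertex_cost_ms(drawcalls) <= 1000.0 / target_fps


if __name__ == "__main__":
    for count in (2000, 3000, 4000):
        cost = estimate_vertex_cost_ms(count)
        print(f"{count} drawcalls: ~{cost:.0f} ms, fits 30 FPS budget: {fits_frame_budget(count)}")
```

Note that fitting the budget here only means vertex processing has not yet consumed the whole frame; shading, shadow masks, and other work still need their share.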

So the bottom line is:

  • There are still optimizations that can be done to help with vertex processing, but you cannot bend hardware limits.
  • 2000 drawcalls is the limit. It is not a rigid number and can be exceeded slightly in cases where it really matters and makes a visual difference, but 3000, 4000, and above are not acceptable numbers.
  • Make aggressive use of LODs. The fewer vertices the PS3 has to process the better, as this makes drawcalls cheaper.
  • Remove as many things that can't be seen as possible (terrain below geometry, overdraw, unnecessary vertices, etc).
  • Use r_measureoverdraw 3 to display where the most expensive vertex shaders are being used. This helps to track down cases where expensive features were enabled by mistake, such as bending on rocks.

General Guidelines

Using the Right Shader Option

Using the right shader option for your material type (e.g. the Vegetation shader for vegetation) ensures the best possible performance.

Here are some specific cases where this affects performance:

  • Use the Grass shader generation option when your material is meant to be used with grass rendering. The grass option is a very cheap rendering approach, with all shading done per-vertex and with the fewest texture reads possible. It is very important to use this correctly, particularly on consoles.
  • Use the Decal shader generation option when your material is used as a decal. This is very important to ensure proper rendering; for example, it avoids z-fighting issues on some hardware and under some rendering conditions. Also, deferred rendering only renders normals for opaque geometry, terrain layers, and decals. If the option is not enabled, the normal map has to be read again during rendering and fog has to be computed.

Avoiding the Alpha Testing and Alpha Blending Options

Alpha testing is quite expensive on older PC hardware and on consoles. Alpha testing requires reading the diffuse texture to get the alpha channel value (and possibly reject that sample), which prevents certain optimizations that apply only to fully opaque rendering.

This is especially problematic for shadow depth map rendering and the Zpass: instead of doing fully opaque rendering with no texture reads, a texture read is required whenever alpha test is enabled.

Alpha blending, on the other hand, also forces you to skip specific opaque rendering optimizations, and in addition the current results have to be blended with the framebuffer, which is somewhat expensive on some hardware (PS3). Some rendering techniques will also not work perfectly on your asset (e.g. fog has to be computed per-vertex instead of per-pixel).

In general, try to avoid such cases whenever possible on the design and art side.

Avoiding Multiple Simultaneous Post Effects

Each post effect adds a relatively large rendering cost. Try to minimize this by not enabling too many at the same time, by staggering the timing of the different post effects, and by reducing the amount of time each post effect is visible on screen.

If you really need many of them enabled at the same time, try to merge them into a single post process.

Misc Specific Guidelines

Water/Rivers Volumes

  • When using water volumes where the underwater fog is barely visible (for example in shallow water), set the fog density to 0. This skips fog processing (helps a bit on consoles).
  • Place water volumes only in areas that are visible. Instead of one huge water volume covering a big area, use several smaller ones covering just the required/visible area.