122 lines
4.5 KiB
ReStructuredText
122 lines
4.5 KiB
ReStructuredText
Low Resolution Z Buffer
|
|
=======================
|
|
|
|
This doc is based on a6xx HW reverse engineering, a5xx should be similar to
|
|
a6xx before gen3.
|
|
|
|
Low Resolution Z buffer is very similar to a depth prepass that helps
|
|
the HW to avoid executing the fragment shader on those fragments that will
|
|
be subsequently discarded by the depth test afterwards.
|
|
|
|
The interesting part of this feature is that it allows applications
|
|
to submit the vertices in any order.
|
|
|
|
Citing official Adreno documentation:
|
|
|
|
::
|
|
|
|
[A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
|
|
depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
|
|
and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
|
|
is then used during the rendering pass to reject pixels efficiently before testing
|
|
against the full resolution Z-buffer.
|
|
|
|
TODO: a7xx
|
|
|
|
Limitations
|
|
-----------
|
|
|
|
There are two main limitations of LRZ:
|
|
|
|
- Since LRZ is an early depth test, such test cannot be used when late-z is required;
|
|
- LRZ buffer could be formed only in one direction, changing depth comparison directions
|
|
without disabling LRZ would lead to a malformed LRZ buffer.
|
|
|
|
Pre-a650 (before gen3)
|
|
----------------------
|
|
|
|
The direction is fully tracked on CPU. In render pass LRZ starts with
|
|
unknown direction, the direction is set first time when depth write occurs
|
|
and if it does change afterwards then the direction becomes invalid and LRZ is
|
|
disabled for the rest of the render pass.
|
|
|
|
Since the direction is not tracked by the GPU, it's impossible to know whether
|
|
LRZ is enabled during construction of secondary command buffers.
|
|
|
|
For the same reason, it's impossible to reuse LRZ between render passes.
|
|
|
|
A650+ (gen3+)
|
|
-------------
|
|
|
|
Now LRZ direction can be tracked on GPU. There are two parts:
|
|
|
|
- Direction byte which stores current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
|
|
- Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.
|
|
|
|
The idea is the same as when LRZ tracked on CPU: when ``GRAS_LRZ_CNTL``
|
|
is used, its direction is compared to the previously known direction
|
|
and direction byte is set to disabled when directions are incompatible.
|
|
|
|
Additionally, to reuse LRZ between render passes, ``GRAS_LRZ_CNTL`` checks
|
|
if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
|
|
stored in the buffer. If not, LRZ is disabled. This is necessary
|
|
because depth buffer may have several layers and mip levels, while the
|
|
LRZ buffer represents only a single layer + mip level.
|
|
|
|
LRZ Fast-Clear
|
|
--------------
|
|
|
|
The LRZ fast-clear buffer is initialized to zeroes and read/written
|
|
when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1b/block.
|
|
``0`` means block has original depth clear value, and ``1`` means that the
|
|
corresponding block in LRZ has been modified.
|
|
|
|
LRZ fast-clear conservatively clears LRZ buffer. At the point where LRZ is
|
|
written the LRZ block which corresponds to a single fast-clear bit is cleared:
|
|
|
|
- To ``0.0`` if depth comparison is ``GREATER``
|
|
- To ``1.0`` if depth comparison is ``LESS``
|
|
|
|
This way it's always valid to fast-clear.
|
|
|
|
LRZ Precision
|
|
-------------
|
|
|
|
LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)`` which is
|
|
not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
|
|
This especially raises questions in context of fast-clear, if fast-clear
|
|
uses a value which cannot be precisely represented by LRZ - we wouldn't
|
|
be able to round it in the correct direction since direction is tracked
|
|
on GPU.
|
|
|
|
However, it seems that depth comparisons with LRZ values have some "slack"
|
|
and nothing special should be done for such depth clear values.
|
|
|
|
How it was tested:
|
|
|
|
- Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``
|
|
|
|
- LRZ buffer contains all zeroes.
|
|
|
|
- Do draws and check whether all samples are passing:
|
|
|
|
- ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
|
|
- ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
|
|
- ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - samples;
|
|
- ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)``- not passing;
|
|
- ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.
|
|
|
|
In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.
|
|
|
|
LRZ Caches
|
|
----------
|
|
|
|
``LRZ_FLUSH`` flushes and invalidates LRZ caches, there are two caches:
|
|
|
|
- Cache for fast-clear buffer;
|
|
- Cache for direction byte + depth view params.
|
|
|
|
They could be cleared by ``LRZ_CLEAR``. To become visible in GPU memory
|
|
the caches should be flushed with ``LRZ_FLUSH`` afterwards.
|
|
|
|
``GRAS_LRZ_CNTL`` reads from these caches.
|