sync with OpenBSD -current

This commit is contained in:
purplerain 2024-04-29 00:35:41 +00:00
parent 5d45cd7ee8
commit 155eb8555e
Signed by: purplerain
GPG key ID: F42C07F07E2E35B7
5506 changed files with 1786257 additions and 1416034 deletions

View file

@ -27,7 +27,8 @@ Mali G57 Valhall (v9) 3.1 3.1
Other Midgard and Bifrost chips (T604, G71) are not yet supported.
Older Mali chips based on the Utgard architecture (Mali 400, Mali 450) are
supported in the Lima driver, not Panfrost. Lima is also available in Mesa.
supported in the :doc:`Lima <lima>` driver, not Panfrost. Lima is also
available in Mesa.
Other graphics APIs (Vulkan, OpenCL) are not supported at this time.
@ -115,13 +116,25 @@ Additional GPU IDs are enumerated in the ``panfrost_model_list`` list in
``src/panfrost/lib/pan_props.c``.
As an example: assuming Mesa is installed to a local path ``~/lib`` and Mesa's
build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as::
build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as:
~/shader-db$ BIFROST_MESA_DEBUG=shaders LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=7212 ./run shaders/glmark/1-1.shader_test
.. code-block:: console
The same shader can be compiled for Mali-T720 as::
~/shader-db$ BIFROST_MESA_DEBUG=shaders \
LIBGL_DRIVERS_PATH=~/lib/dri/ \
LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
PAN_GPU_ID=7212 \
./run shaders/glmark/1-1.shader_test
~/shader-db$ MIDGARD_MESA_DEBUG=shaders LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=720 ./run shaders/glmark/1-1.shader_test
The same shader can be compiled for Mali-T720 as:
.. code-block:: console
~/shader-db$ MIDGARD_MESA_DEBUG=shaders \
LIBGL_DRIVERS_PATH=~/lib/dri/ \
LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
PAN_GPU_ID=720 \
./run shaders/glmark/1-1.shader_test
These examples set the compilers' ``shaders`` debug flags to dump the optimized
NIR, backend IR after instruction selection, backend IR after register
@ -133,9 +146,18 @@ pretty-printing GPU data structures and disassembling all shaders
(``PAN_MESA_DEBUG=dump``). The ``EGL_PLATFORM=surfaceless`` environment variable
and various flags to dEQP mimic the surfaceless environment that our
continuous integration (CI) uses. This eliminates window system dependencies,
although it requires a specially built CTS::
although it requires a specially built CTS:
~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless ./glcts --deqp-surface-type=pbuffer --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 --deqp-surface-height=256 -n dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute
.. code-block:: console
~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump \
LIBGL_DRIVERS_PATH=~/lib/dri/ \
LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless \
./glcts --deqp-surface-type=pbuffer \
--deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 \
--deqp-surface-height=256 -n \
dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute
U-interleaved tiling
---------------------
@ -257,29 +279,30 @@ following the exact same algorithm that the hardware uses, then multiply it
by the GL-level divisor to get the hardware-level divisor. This case is
further divided into two more cases. If the hardware-level divisor is a
power of two, then we just need to shift. The shift amount is specified by
the shift field, so that the hardware-level divisor is just 2^shift.
the shift field, so that the hardware-level divisor is just
:math:`2^\text{shift}`.
If it isn't a power of two, then we have to divide by an arbitrary integer.
For that, we use the well-known technique of multiplying by an approximation
of the inverse. The driver must compute the magic multiplier and shift
amount, and then the hardware does the multiplication and shift. The
hardware and driver also use the "round-down" optimization as described in
http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
The hardware further assumes the multiplier is between 2^31 and 2^32, so the
high bit is implicitly set to 1 even though it is set to 0 by the driver --
presumably this simplifies the hardware multiplier a little. The hardware
first multiplies linear_id by the multiplier and takes the high 32 bits,
then applies the round-down correction if extra_flags = 1, then finally
shifts right by the shift field.
https://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
The hardware further assumes the multiplier is between :math:`2^{31}` and
:math:`2^{32}`, so the high bit is implicitly set to 1 even though it is set
to 0 by the driver -- presumably this simplifies the hardware multiplier a
little. The hardware first multiplies linear_id by the multiplier and
takes the high 32 bits, then applies the round-down correction if
extra_flags = 1, then finally shifts right by the shift field.
There are some differences between ridiculousfish's algorithm and the Mali
hardware algorithm, which means that the reference code from ridiculousfish
doesn't always produce the right constants. Mali does not use the pre-shift
optimization, since that would make a hardware implementation slower (it
would have to always do the pre-shift, multiply, and post-shift operations).
It also forces the multiplier to be at least 2^31, which means that the
exponent is entirely fixed, so there is no trial-and-error. Altogether,
given the divisor d, the algorithm the driver must follow is:
It also forces the multiplier to be at least :math:`2^{31}`, which means
that the exponent is entirely fixed, so there is no trial-and-error.
Altogether, given the divisor d, the algorithm the driver must follow is:
1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
2. Compute :math:`m = \lceil 2^{shift + 32} / d \rceil` and :math:`e = 2^{shift + 32} % d`.