sync with OpenBSD -current

2024-04-29 00:35:41 +00:00 · 2024-04-29 00:35:41 +00:00 · 155eb8555e
commit 155eb8555e
parent 5d45cd7ee8
5506 changed files with 1786257 additions and 1416034 deletions
--- a/lib/mesa/docs/drivers/panfrost.rst
+++ b/lib/mesa/docs/drivers/panfrost.rst
@@ -27,7 +27,8 @@ Mali G57   Valhall (v9) 3.1          3.1
 Other Midgard and Bifrost chips (T604, G71) are not yet supported.

 Older Mali chips based on the Utgard architecture (Mali 400, Mali 450) are
-supported in the Lima driver, not Panfrost. Lima is also available in Mesa.
+supported in the :doc:`Lima <lima>` driver, not Panfrost. Lima is also
+available in Mesa.

 Other graphics APIs (Vulkan, OpenCL) are not supported at this time.

@@ -115,13 +116,25 @@ Additional GPU IDs are enumerated in the ``panfrost_model_list`` list in
 ``src/panfrost/lib/pan_props.c``.

 As an example: assuming Mesa is installed to a local path ``~/lib`` and Mesa's
-build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as::
+build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as:

-   ~/shader-db$ BIFROST_MESA_DEBUG=shaders LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=7212 ./run shaders/glmark/1-1.shader_test
+.. code-block:: console

-The same shader can be compiled for Mali-T720 as::
+   ~/shader-db$ BIFROST_MESA_DEBUG=shaders \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=7212 \
+   ./run shaders/glmark/1-1.shader_test

-   ~/shader-db$ MIDGARD_MESA_DEBUG=shaders LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=720 ./run shaders/glmark/1-1.shader_test
+The same shader can be compiled for Mali-T720 as:
+
+.. code-block:: console
+
+   ~/shader-db$ MIDGARD_MESA_DEBUG=shaders \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=720 \
+   ./run shaders/glmark/1-1.shader_test

 These examples set the compilers' ``shaders`` debug flags to dump the optimized
 NIR, backend IR after instruction selection, backend IR after register
@@ -133,9 +146,18 @@ pretty-printing GPU data structures and disassembling all shaders
 (``PAN_MESA_DEBUG=dump``). The ``EGL_PLATFORM=surfaceless`` environment variable
 and various flags to dEQP mimic the surfaceless environment that our
 continuous integration (CI) uses. This eliminates window system dependencies,
-although it requires a specially built CTS::
+although it requires a specially built CTS:

-   ~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless ./glcts --deqp-surface-type=pbuffer --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 --deqp-surface-height=256 -n dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute
+.. code-block:: console
+
+   ~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless \
+   ./glcts --deqp-surface-type=pbuffer \
+   --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 \
+   --deqp-surface-height=256 -n \
+   dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute

 U-interleaved tiling
 ---------------------
@@ -257,29 +279,30 @@ following the exact same algorithm that the hardware uses, then multiply it
 by the GL-level divisor to get the hardware-level divisor. This case is
 further divided into two more cases. If the hardware-level divisor is a
 power of two, then we just need to shift. The shift amount is specified by
-the shift field, so that the hardware-level divisor is just 2^shift.
+the shift field, so that the hardware-level divisor is just
+:math:`2^\text{shift}`.

 If it isn't a power of two, then we have to divide by an arbitrary integer.
 For that, we use the well-known technique of multiplying by an approximation
 of the inverse. The driver must compute the magic multiplier and shift
 amount, and then the hardware does the multiplication and shift. The
 hardware and driver also use the "round-down" optimization as described in
-http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
-The hardware further assumes the multiplier is between 2^31 and 2^32, so the
-high bit is implicitly set to 1 even though it is set to 0 by the driver --
-presumably this simplifies the hardware multiplier a little. The hardware
-first multiplies linear_id by the multiplier and takes the high 32 bits,
-then applies the round-down correction if extra_flags = 1, then finally
-shifts right by the shift field.
+https://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
+The hardware further assumes the multiplier is between :math:`2^{31}` and
+:math:`2^{32}`, so the high bit is implicitly set to 1 even though it is set
+to 0 by the driver -- presumably this simplifies the hardware multiplier a
+little. The hardware first multiplies linear_id by the multiplier and
+takes the high 32 bits, then applies the round-down correction if
+extra_flags = 1, then finally shifts right by the shift field.

 There are some differences between ridiculousfish's algorithm and the Mali
 hardware algorithm, which means that the reference code from ridiculousfish
 doesn't always produce the right constants. Mali does not use the pre-shift
 optimization, since that would make a hardware implementation slower (it
 would have to always do the pre-shift, multiply, and post-shift operations).
-It also forces the multiplier to be at least 2^31, which means that the
-exponent is entirely fixed, so there is no trial-and-error. Altogether,
-given the divisor d, the algorithm the driver must follow is:
+It also forces the multiplier to be at least :math:`2^{31}`, which means
+that the exponent is entirely fixed, so there is no trial-and-error.
+Altogether, given the divisor d, the algorithm the driver must follow is:

 1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
 2. Compute :math:`m = \lceil 2^{shift + 32} / d \rceil` and :math:`e = 2^{shift + 32} \bmod d`.