sync with OpenBSD -current
parent 5d45cd7ee8
commit 155eb8555e
5506 changed files with 1786257 additions and 1416034 deletions
@@ -27,7 +27,8 @@ Mali G57 Valhall (v9) 3.1 3.1
 Other Midgard and Bifrost chips (T604, G71) are not yet supported.
 
 Older Mali chips based on the Utgard architecture (Mali 400, Mali 450) are
-supported in the Lima driver, not Panfrost. Lima is also available in Mesa.
+supported in the :doc:`Lima <lima>` driver, not Panfrost. Lima is also
+available in Mesa.
 
 Other graphics APIs (Vulkan, OpenCL) are not supported at this time.
 
@@ -115,13 +116,25 @@ Additional GPU IDs are enumerated in the ``panfrost_model_list`` list in
 ``src/panfrost/lib/pan_props.c``.
 
 As an example: assuming Mesa is installed to a local path ``~/lib`` and Mesa's
-build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as::
+build directory is ``~/mesa/build``, a shader can be compiled for Mali-G52 as:
 
-   ~/shader-db$ BIFROST_MESA_DEBUG=shaders LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=7212 ./run shaders/glmark/1-1.shader_test
+.. code-block:: console
 
-The same shader can be compiled for Mali-T720 as::
+   ~/shader-db$ BIFROST_MESA_DEBUG=shaders \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=7212 \
+   ./run shaders/glmark/1-1.shader_test
 
-   ~/shader-db$ MIDGARD_MESA_DEBUG=shaders LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=720 ./run shaders/glmark/1-1.shader_test
+The same shader can be compiled for Mali-T720 as:
+
+.. code-block:: console
+
+   ~/shader-db$ MIDGARD_MESA_DEBUG=shaders \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=720 \
+   ./run shaders/glmark/1-1.shader_test
 
 These examples set the compilers' ``shaders`` debug flags to dump the optimized
 NIR, backend IR after instruction selection, backend IR after register
@@ -133,9 +146,18 @@ pretty-printing GPU data structures and disassembling all shaders
 (``PAN_MESA_DEBUG=dump``). The ``EGL_PLATFORM=surfaceless`` environment variable
 and various flags to dEQP mimic the surfaceless environment that our
 continuous integration (CI) uses. This eliminates window system dependencies,
-although it requires a specially built CTS::
+although it requires a specially built CTS:
 
-   ~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump LIBGL_DRIVERS_PATH=~/lib/dri/ LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless ./glcts --deqp-surface-type=pbuffer --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 --deqp-surface-height=256 -n dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute
+.. code-block:: console
+
+   ~/VK-GL-CTS/build/external/openglcts/modules$ PAN_MESA_DEBUG=trace,dump \
+   LIBGL_DRIVERS_PATH=~/lib/dri/ \
+   LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
+   PAN_GPU_ID=7212 EGL_PLATFORM=surfaceless \
+   ./glcts --deqp-surface-type=pbuffer \
+   --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=256 \
+   --deqp-surface-height=256 -n \
+   dEQP-GLES31.functional.shaders.builtin_functions.common.abs.float_highp_compute
 
 U-interleaved tiling
 ---------------------
@@ -257,29 +279,30 @@ following the exact same algorithm that the hardware uses, then multiply it
 by the GL-level divisor to get the hardware-level divisor. This case is
 further divided into two more cases. If the hardware-level divisor is a
 power of two, then we just need to shift. The shift amount is specified by
-the shift field, so that the hardware-level divisor is just 2^shift.
+the shift field, so that the hardware-level divisor is just
+:math:`2^\text{shift}`.
 
 If it isn't a power of two, then we have to divide by an arbitrary integer.
 For that, we use the well-known technique of multiplying by an approximation
 of the inverse. The driver must compute the magic multiplier and shift
 amount, and then the hardware does the multiplication and shift. The
 hardware and driver also use the "round-down" optimization as described in
-http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
-The hardware further assumes the multiplier is between 2^31 and 2^32, so the
-high bit is implicitly set to 1 even though it is set to 0 by the driver --
-presumably this simplifies the hardware multiplier a little. The hardware
-first multiplies linear_id by the multiplier and takes the high 32 bits,
-then applies the round-down correction if extra_flags = 1, then finally
-shifts right by the shift field.
+https://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
+The hardware further assumes the multiplier is between :math:`2^{31}` and
+:math:`2^{32}`, so the high bit is implicitly set to 1 even though it is set
+to 0 by the driver -- presumably this simplifies the hardware multiplier a
+little. The hardware first multiplies linear_id by the multiplier and
+takes the high 32 bits, then applies the round-down correction if
+extra_flags = 1, then finally shifts right by the shift field.
 
 There are some differences between ridiculousfish's algorithm and the Mali
 hardware algorithm, which means that the reference code from ridiculousfish
 doesn't always produce the right constants. Mali does not use the pre-shift
 optimization, since that would make a hardware implementation slower (it
 would have to always do the pre-shift, multiply, and post-shift operations).
-It also forces the multiplier to be at least 2^31, which means that the
-exponent is entirely fixed, so there is no trial-and-error. Altogether,
-given the divisor d, the algorithm the driver must follow is:
+It also forces the multiplier to be at least :math:`2^{31}`, which means
+that the exponent is entirely fixed, so there is no trial-and-error.
+Altogether, given the divisor d, the algorithm the driver must follow is:
 
 1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
 2. Compute :math:`m = \lceil 2^{shift + 32} / d \rceil` and :math:`e = 2^{shift + 32} % d`.
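
As a rough illustration of steps 1 and 2 above, the two quantities can be
computed with plain integer arithmetic. This is a minimal sketch, not the
driver's code: the function name ``magic_divisor_constants`` is hypothetical,
``%`` is read as the integer remainder, and only the two steps visible in this
hunk are covered (how ``m`` and ``e`` are turned into the final register
values is not shown here).

.. code-block:: python

   def magic_divisor_constants(d):
       """Return (shift, m, e) for a divisor d that is not a power of two.

       Hypothetical helper covering only steps 1 and 2 of the list above.
       """
       if d <= 0 or d & (d - 1) == 0:
           raise ValueError("d must be positive and not a power of two")

       # Step 1: shift = floor(log2(d)), i.e. the index of the top set bit.
       shift = d.bit_length() - 1

       # Step 2: m = ceil(2^(shift + 32) / d) and e = 2^(shift + 32) mod d.
       n = 1 << (shift + 32)
       m = -(-n // d)  # ceiling division using integers only
       e = n % d
       return shift, m, e

For ``d = 3`` this gives ``shift = 1``, ``m = 2863311531`` and ``e = 2``.
Because ``d`` lies strictly between :math:`2^\text{shift}` and
:math:`2^{\text{shift} + 1}`, ``m`` always lands between :math:`2^{31}` and
:math:`2^{32}`, which matches the range the hardware is said to assume.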