2023-08-28 05:57:34 +00:00
|
|
|
|
ANV
|
|
|
|
|
===
|
|
|
|
|
|
|
|
|
|
Debugging
|
|
|
|
|
---------
|
|
|
|
|
|
|
|
|
|
Here are a few environment variable debug environment variables
|
|
|
|
|
specific to ANV:
|
|
|
|
|
|
2023-11-11 18:37:37 +00:00
|
|
|
|
:envvar:`ANV_ENABLE_GENERATED_INDIRECT_DRAWS`
|
|
|
|
|
If defined to ``0`` or ``false``, this will disable the generated
|
|
|
|
|
indirect draw optimization in Anv. This will only affect Gfx11+.
|
2023-08-28 05:57:34 +00:00
|
|
|
|
:envvar:`ANV_ENABLE_PIPELINE_CACHE`
|
|
|
|
|
If defined to ``0`` or ``false``, this will disable pipeline
|
|
|
|
|
caching, forcing ANV to reparse and recompile any VkShaderModule
|
|
|
|
|
(SPIRV) it is given.
|
|
|
|
|
:envvar:`ANV_DISABLE_SECONDARY_CMD_BUFFER_CALLS`
|
|
|
|
|
If defined to ``1`` or ``true``, this will prevent usage of self
|
|
|
|
|
modifying command buffers to implement ``vkCmdExecuteCommands``. As
|
2023-11-11 18:37:37 +00:00
|
|
|
|
a result of this, it will also disable :ext:`VK_KHR_performance_query`.
|
2023-08-28 05:57:34 +00:00
|
|
|
|
:envvar:`ANV_ALWAYS_BINDLESS`
|
|
|
|
|
If defined to ``1`` or ``true``, this forces all descriptor sets to
|
|
|
|
|
use the internal `Bindless model`_.
|
|
|
|
|
:envvar:`ANV_QUEUE_THREAD_DISABLE`
|
|
|
|
|
If defined to ``1`` or ``true``, this disables support for timeline
|
|
|
|
|
semaphores.
|
|
|
|
|
:envvar:`ANV_USERSPACE_RELOCS`
|
|
|
|
|
If defined to ``1`` or ``true``, this forces ANV to always do
|
|
|
|
|
kernel relocations in command buffers. This should only have an
|
|
|
|
|
effect on hardware that doesn't support soft-pinning (Ivybridge,
|
|
|
|
|
Haswell, Cherryview).
|
|
|
|
|
:envvar:`ANV_PRIMITIVE_REPLICATION_MAX_VIEWS`
|
|
|
|
|
Specifies up to how many view shaders can be lowered to handle
|
2023-11-11 18:37:37 +00:00
|
|
|
|
:ext:`VK_KHR_multiview`. Beyond this number, multiview is implemented
|
2023-08-28 05:57:34 +00:00
|
|
|
|
using instanced rendering. If unspecified, the value default to
|
|
|
|
|
``2``.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Experimental features
|
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
|
|
:envvar:`ANV_EXPERIMENTAL_NV_MESH_SHADER`
|
|
|
|
|
If defined to ``1`` or ``true``, this advertise support for
|
2023-11-11 18:37:37 +00:00
|
|
|
|
:ext:`VK_NV_mesh_shader` extension for platforms that have hardware
|
2023-08-28 05:57:34 +00:00
|
|
|
|
support for it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _`Bindless model`:
|
|
|
|
|
|
|
|
|
|
Binding Model
|
|
|
|
|
-------------
|
|
|
|
|
|
|
|
|
|
Here is the ANV bindless binding model that was implemented for the
|
|
|
|
|
descriptor indexing feature of Vulkan 1.2 :
|
|
|
|
|
|
|
|
|
|
.. graphviz::
|
|
|
|
|
|
|
|
|
|
digraph G {
|
|
|
|
|
fontcolor="black";
|
|
|
|
|
compound=true;
|
|
|
|
|
|
|
|
|
|
subgraph cluster_1 {
|
|
|
|
|
label = "Binding Table (HW)";
|
|
|
|
|
|
|
|
|
|
bgcolor="cornflowerblue";
|
|
|
|
|
|
|
|
|
|
node [ style=filled,shape="record",fillcolor="white",
|
|
|
|
|
label="RT0" ] n0;
|
|
|
|
|
node [ label="RT1" ] n1;
|
|
|
|
|
node [ label="dynbuf0"] n2;
|
|
|
|
|
node [ label="set0" ] n3;
|
|
|
|
|
node [ label="set1" ] n4;
|
|
|
|
|
node [ label="set2" ] n5;
|
|
|
|
|
|
|
|
|
|
n0 -> n1 -> n2 -> n3 -> n4 -> n5 [style=invis];
|
|
|
|
|
}
|
|
|
|
|
subgraph cluster_2 {
|
|
|
|
|
label = "Descriptor Set 0";
|
|
|
|
|
|
|
|
|
|
bgcolor="burlywood3";
|
|
|
|
|
fixedsize = true;
|
|
|
|
|
|
|
|
|
|
node [ style=filled,shape="record",fillcolor="white", fixedsize = true, width=4,
|
|
|
|
|
label="binding 0 - STORAGE_IMAGE\n anv_storage_image_descriptor" ] n8;
|
|
|
|
|
node [ label="binding 1 - COMBINED_IMAGE_SAMPLER\n anv_sampled_image_descriptor" ] n9;
|
|
|
|
|
node [ label="binding 2 - UNIFORM_BUFFER\n anv_address_range_descriptor" ] n10;
|
|
|
|
|
node [ label="binding 3 - UNIFORM_TEXEL_BUFFER\n anv_storage_image_descriptor" ] n11;
|
|
|
|
|
|
|
|
|
|
n8 -> n9 -> n10 -> n11 [style=invis];
|
|
|
|
|
}
|
|
|
|
|
subgraph cluster_5 {
|
|
|
|
|
label = "Vulkan Objects"
|
|
|
|
|
|
|
|
|
|
fontcolor="black";
|
|
|
|
|
bgcolor="darkolivegreen4";
|
|
|
|
|
|
|
|
|
|
subgraph cluster_6 {
|
|
|
|
|
label = "VkImageView";
|
|
|
|
|
|
|
|
|
|
bgcolor=darkolivegreen3;
|
|
|
|
|
node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
|
|
|
|
|
label="surface_state" ] n12;
|
|
|
|
|
}
|
|
|
|
|
subgraph cluster_7 {
|
|
|
|
|
label = "VkSampler";
|
|
|
|
|
|
|
|
|
|
bgcolor=darkolivegreen3;
|
|
|
|
|
node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
|
|
|
|
|
label="sample_state" ] n13;
|
|
|
|
|
}
|
|
|
|
|
subgraph cluster_8 {
|
|
|
|
|
label = "VkImageView";
|
|
|
|
|
bgcolor="darkolivegreen3";
|
|
|
|
|
|
|
|
|
|
node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
|
|
|
|
|
label="surface_state" ] n14;
|
|
|
|
|
}
|
|
|
|
|
subgraph cluster_9 {
|
|
|
|
|
label = "VkBuffer";
|
|
|
|
|
bgcolor=darkolivegreen3;
|
|
|
|
|
|
|
|
|
|
node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
|
|
|
|
|
label="address" ] n15;
|
|
|
|
|
}
|
|
|
|
|
subgraph cluster_10 {
|
|
|
|
|
label = "VkBufferView";
|
|
|
|
|
|
|
|
|
|
bgcolor=darkolivegreen3;
|
|
|
|
|
node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
|
|
|
|
|
label="surface_state" ] n16;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
n12 -> n13 -> n14 -> n15 -> n16 [style=invis];
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
subgraph cluster_11 {
|
|
|
|
|
subgraph cluster_12 {
|
|
|
|
|
label = "CommandBuffer state stream";
|
|
|
|
|
|
|
|
|
|
bgcolor="gold3";
|
|
|
|
|
node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
|
|
|
|
|
label="surface_state" ] n17;
|
|
|
|
|
node [ label="surface_state" ] n18;
|
|
|
|
|
node [ label="surface_state" ] n19;
|
|
|
|
|
|
|
|
|
|
n17 -> n18 -> n19 [style=invis];
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
n3 -> n8 [lhead=cluster_2];
|
|
|
|
|
|
|
|
|
|
n8 -> n12;
|
|
|
|
|
n9 -> n13;
|
|
|
|
|
n9 -> n14;
|
|
|
|
|
n10 -> n15;
|
|
|
|
|
n11 -> n16;
|
|
|
|
|
|
|
|
|
|
n0 -> n17;
|
|
|
|
|
n1 -> n18;
|
|
|
|
|
n2 -> n19;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The HW binding table is generated when the draw or dispatch commands
|
|
|
|
|
are emitted. Here are the types of entries one can find in the binding
|
|
|
|
|
table :
|
|
|
|
|
|
|
|
|
|
- The currently bound descriptor sets, one entry per descriptor set
|
|
|
|
|
(our limit is 8).
|
|
|
|
|
|
|
|
|
|
- For dynamic buffers, one entry per dynamic buffer.
|
|
|
|
|
|
|
|
|
|
- For draw commands, render target entries if needed.
|
|
|
|
|
|
|
|
|
|
The entries of the HW binding table for descriptor sets are
|
|
|
|
|
RENDER_SURFACE_STATE similar to what you would have for a normal
|
|
|
|
|
uniform buffer. The shader will emit reads this buffer first to get
|
|
|
|
|
the information it needs to access a surface/sampler/etc... and then
|
|
|
|
|
emits the appropriate message using the information gathered from the
|
|
|
|
|
descriptor set buffer.
|
|
|
|
|
|
|
|
|
|
Each binding type entry gets an associated structure in memory
|
|
|
|
|
(``anv_storage_image_descriptor``, ``anv_sampled_image_descriptor``,
|
|
|
|
|
``anv_address_range_descriptor``, ``anv_storage_image_descriptor``).
|
|
|
|
|
This is the information read by the shader.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _`Descriptor Set Memory Layout`:
|
|
|
|
|
|
|
|
|
|
Descriptor Set Memory Layout
|
|
|
|
|
----------------------------
|
|
|
|
|
|
|
|
|
|
Here is a representation of how the descriptor set bindings, with each
|
|
|
|
|
elements in each binding is mapped to a the descriptor set memory :
|
|
|
|
|
|
|
|
|
|
.. graphviz::
|
|
|
|
|
|
|
|
|
|
digraph structs {
|
|
|
|
|
node [shape=record];
|
|
|
|
|
rankdir=LR;
|
|
|
|
|
|
|
|
|
|
struct1 [label="Descriptor Set | \
|
|
|
|
|
<b0> binding 0\n STORAGE_IMAGE \n (array_length=3) | \
|
|
|
|
|
<b1> binding 1\n COMBINED_IMAGE_SAMPLER \n (array_length=2) | \
|
|
|
|
|
<b2> binding 2\n UNIFORM_BUFFER \n (array_length=1) | \
|
|
|
|
|
<b3> binding 3\n UNIFORM_TEXEL_BUFFER \n (array_length=1)"];
|
|
|
|
|
struct2 [label="Descriptor Set Memory | \
|
|
|
|
|
<b0e0> anv_storage_image_descriptor|\
|
|
|
|
|
<b0e1> anv_storage_image_descriptor|\
|
|
|
|
|
<b0e2> anv_storage_image_descriptor|\
|
|
|
|
|
<b1e0> anv_sampled_image_descriptor|\
|
|
|
|
|
<b1e1> anv_sampled_image_descriptor|\
|
|
|
|
|
<b2e0> anv_address_range_descriptor|\
|
|
|
|
|
<b3e0> anv_storage_image_descriptor"];
|
|
|
|
|
|
|
|
|
|
struct1:b0 -> struct2:b0e0;
|
|
|
|
|
struct1:b0 -> struct2:b0e1;
|
|
|
|
|
struct1:b0 -> struct2:b0e2;
|
|
|
|
|
struct1:b1 -> struct2:b1e0;
|
|
|
|
|
struct1:b1 -> struct2:b1e1;
|
|
|
|
|
struct1:b2 -> struct2:b2e0;
|
|
|
|
|
struct1:b3 -> struct2:b3e0;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
Each Binding in the descriptor set is allocated an array of
|
|
|
|
|
``anv_*_descriptor`` data structure. The type of ``anv_*_descriptor``
|
|
|
|
|
used for a binding is selected based on the ``VkDescriptorType`` of
|
|
|
|
|
the bindings.
|
|
|
|
|
|
|
|
|
|
The value of ``anv_descriptor_set_binding_layout::descriptor_offset``
|
|
|
|
|
is a byte offset from the descriptor set memory to the associated
|
|
|
|
|
binding. ``anv_descriptor_set_binding_layout::array_size`` is the
|
|
|
|
|
number of ``anv_*_descriptor`` elements in the descriptor set memory
|
|
|
|
|
from that offset for the binding.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Pipeline state emission
|
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
|
|
Vulkan initially started by baking as much state as possible in
|
|
|
|
|
pipelines. But extension after extension, more and more state has
|
|
|
|
|
become potentially dynamic.
|
|
|
|
|
|
|
|
|
|
ANV tries to limit the amount of time an instruction has to be packed
|
|
|
|
|
to reprogram part of the 3D pipeline state. The packing is happening
|
|
|
|
|
in 2 places :
|
|
|
|
|
|
|
|
|
|
- ``genX_pipeline.c`` where the non dynamic state is emitted in the
|
|
|
|
|
pipeline batch. This batch is copied into the command buffer batch
|
|
|
|
|
when calling ``vkCmdBindPipeline()``
|
|
|
|
|
|
|
|
|
|
- ``genX_cmd_buffer.c`` in the ``cmd_buffer_flush_state`` function
|
|
|
|
|
which ends up calling into ``gfx8_cmd_buffer.c`` &
|
|
|
|
|
``gfx7_cmd_buffer.c``
|
|
|
|
|
|
|
|
|
|
The rule to know where to emit an instruction programming the 3D
|
|
|
|
|
pipeline is as follow :
|
|
|
|
|
|
|
|
|
|
- If any field of the instruction can be made dynamic, it should be
|
|
|
|
|
emitted in ``genX_cmd_buffer.c``, ``gfx8_cmd_buffer.c`` or
|
|
|
|
|
``gfx7_cmd_buffer.c``
|
|
|
|
|
|
|
|
|
|
- Otherwise, the instruction can be emitted in ``genX_pipeline.c``
|
|
|
|
|
|
|
|
|
|
When a piece of state programming is dynamic, it should have a
|
|
|
|
|
corresponding field in ``anv_dynamic_state`` and the
|
|
|
|
|
``anv_dynamic_state_copy()`` function should be updated to ensure we
|
|
|
|
|
minimize the amount of time an instruction should be emitted. Each
|
|
|
|
|
instruction should have a associated ``ANV_CMD_DIRTY_*`` mask so that
|
|
|
|
|
the dynamic emission code can tell when to re-emit an instruction.
|
|
|
|
|
|
|
|
|
|
An instruction can also be re-emitted when a pipeline changes by
|
|
|
|
|
checking for ``ANV_CMD_DIRTY_PIPELINE``. It should only do so if it
|
|
|
|
|
requires to know some value that is coming from the
|
|
|
|
|
``anv_graphics_pipeline`` object that is not available from
|
|
|
|
|
``anv_dynamic_state``.
|
2023-11-11 18:37:37 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Generated indirect draws optimization
|
|
|
|
|
-------------------------------------
|
|
|
|
|
|
|
|
|
|
Indirect draws have traditionally been implemented on Intel HW by
|
|
|
|
|
loading the indirect parameters from memory into HW registers using
|
|
|
|
|
the command streamer's ``MI_LOAD_REGISTER_MEM`` instruction before
|
|
|
|
|
dispatching a draw call to the 3D pipeline.
|
|
|
|
|
|
|
|
|
|
On recent products, it was found that the command streamer is showing
|
|
|
|
|
as performance bottleneck, because it cannot dispatch draw calls fast
|
|
|
|
|
enough to keep the 3D pipeline busy.
|
|
|
|
|
|
|
|
|
|
The solution to this problem is to change the way we deal with
|
|
|
|
|
indirect draws. Instead of loading HW registers with values using the
|
|
|
|
|
command streamer, we generate entire set of ``3DPRIMITIVE``
|
|
|
|
|
instructions using a shader. The generated instructions contain the
|
|
|
|
|
entire draw call parameters. This way the command streamer executes
|
|
|
|
|
only ``3DPRIMITIVE`` instructions and doesn´t do any data loading from
|
|
|
|
|
memory or touch HW registers, feeding the 3D pipeline as fast as it
|
|
|
|
|
can.
|
|
|
|
|
|
|
|
|
|
In Anv this implemented by using a side batch buffer. When Anv
|
|
|
|
|
encounters the first indirect draws, it generates a jump into the side
|
|
|
|
|
batch, the side batch contains a draw call using a generation shader
|
|
|
|
|
for each indirect draw. We keep adding on more generation draws into
|
|
|
|
|
the batch until we have to stop due to command buffer end, secondary
|
|
|
|
|
command buffer calls or a barrier containing the access flag
|
|
|
|
|
``VK_ACCESS_INDIRECT_COMMAND_READ_BIT``. The side batch buffer jump
|
|
|
|
|
back right after the instruction where it was called.
|