sync code with last improvements from OpenBSD

purplerain 2023-08-28 05:57:34 +00:00
commit 88965415ff
Signed by: purplerain
GPG key ID: F42C07F07E2E35B7
26235 changed files with 29195616 additions and 0 deletions

lib/mesa/docs/ci/LAVA.rst
@@ -0,0 +1,82 @@
LAVA CI
=======
`LAVA <https://lavasoftware.org/>`_ is a system for functional testing
of boards including deploying custom bootloaders and kernels. This is
particularly relevant to testing Mesa because we often need to change
kernels for UAPI changes (and this lets us do full testing of a new
kernel during development), and our workloads can easily take down
boards when mistakes are made (kernel oopses, OOMs that take out
critical system services).
Mesa-LAVA software architecture
-------------------------------
The gitlab-runner will run on some host that has access to the LAVA
lab, with tags like "mesa-ci-x86-64-lava-$DEVICE_TYPE" so that it only
takes jobs for the hardware that the LAVA lab contains. The
gitlab-runner spawns a Docker container with lavacli in it, and
connects to the LAVA lab using a predefined token to submit jobs under
a specific device type.
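For reference, the lavacli side of that interaction boils down to something
like the following sketch (the job definition filename is illustrative, and
the identity comes from the ``lavacli.yaml`` described below):

.. code-block:: console

   # submit a generated job definition and follow its output
   lavacli jobs submit lava-job-definition.yaml
   lavacli jobs logs <job id>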
The LAVA instance manages scheduling those jobs to the boards present.
For a job, it will deploy the kernel, device tree, and the ramdisk
containing the CTS.
Deploying a new Mesa-LAVA lab
-----------------------------
You'll want to start with setting up your LAVA instance and getting
some boards booting using test jobs. Start with the stock QEMU
examples to make sure your instance works at all. Then, you'll need
to define your actual boards.
The device type in lava-gitlab-ci.yml is the device type you create in
your LAVA instance, which doesn't have to match the board's name in
``/etc/lava-dispatcher/device-types``. You create your boards under
that device type and the Mesa jobs will be scheduled to any of them.
Instantiate your boards by creating them in the UI or at the command
line attached to that device type, then populate their dictionary
(using an "extends" line probably referencing the board's template in
``/etc/lava-dispatcher/device-types``). Now, go find a relevant
health check job for your board as a test job definition, or cobble
something together from a board that boots using the same boot_method
and some public images, and figure out how to get your boards booting.
Once you can boot your board using a custom job definition, it's time
to connect Mesa CI to it. Install gitlab-runner and register as a
shared runner (you'll need a GitLab admin for help with this). The
runner *must* have a tag (like "mesa-ci-x86-64-lava-rk3399-gru-kevin")
to restrict the jobs it takes, or it will grab random jobs from projects
all across ``gitlab.freedesktop.org``, and your runner isn't ready for
that.
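As a sketch, the registration might look like this (the flags mirror the
fastboot example in the bare-metal documentation; the token comes from a
GitLab admin):

.. code-block:: console

   sudo gitlab-runner register \
        --url https://gitlab.freedesktop.org \
        --registration-token <token from the admin> \
        --name mesa-ci-lava-rk3399-gru-kevin \
        --tag-list mesa-ci-x86-64-lava-rk3399-gru-kevin \
        --executor docker \
        --docker-image "alpine:latest" \
        --non-interactive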
The Docker image will need access to the LAVA instance. If it's on a
public network it should be fine. If you're running the LAVA instance
on localhost, you'll need to set ``network_mode="host"`` in
``/etc/gitlab-runner/config.toml`` so it can access localhost. Create a
gitlab-runner user in your LAVA instance, log in under that user on
the web interface, and create an API token. Copy that into a
``lavacli.yaml``:
.. code-block:: yaml

   default:
      token: <token contents>
      uri: <URL to the instance>
      username: gitlab-runner
Add a volume mount of that ``lavacli.yaml`` to
``/etc/gitlab-runner/config.toml`` so that the Docker container can
access it. You probably have a ``volumes = ["/cache"]`` already, so now it would be::
volumes = ["/home/anholt/lava-config/lavacli.yaml:/root/.config/lavacli.yaml", "/cache"]
Note that this token is visible to anybody that can submit MRs to
Mesa! It is not an actual secret. We could just bake it into the
GitLab CI YAML, but this way the current method of connecting to the
LAVA instance is separated from the Mesa branches (particularly
relevant as we have many stable branches all using CI).
Now it's time to define your test jobs in the driver-specific
gitlab-ci.yml file, using the device-specific tags.

lib/mesa/docs/ci/bare-metal.rst
@@ -0,0 +1,231 @@
Bare-metal CI
=============
The bare-metal scripts run on a system with gitlab-runner and Docker,
connected to potentially multiple bare-metal boards that run tests of
Mesa. Currently "fastboot", "ChromeOS Servo", and POE-powered devices are
supported.
In comparison with LAVA, this doesn't involve maintaining a separate
web service with its own job scheduler and replicating jobs between the
two. It also places more of the board support in Git, instead of
web service configuration. On the other hand, the serial interactions
and bootloader support are more primitive.
Requirements (fastboot)
-----------------------
This testing requires power control of the DUTs by the gitlab-runner
machine, since this is what we use to reset the system and get back to
a pristine state at the start of testing.
We require access to the console output from the gitlab-runner system,
since that is how we get the final results back from the tests. You
should probably have the console on a serial connection, so that you
can see bootloader progress.
The boards need to be able to have a kernel/initramfs supplied by the
gitlab-runner system, since Mesa often needs to update the kernel either for new
DRM functionality, or to fix kernel bugs.
The boards must have networking, so that we can extract the dEQP XML results to
artifacts on GitLab, and so that we can download traces (too large for an
initramfs) for trace replay testing. Given that we need networking already, and
our dEQP/Piglit/etc. payload is large, we use NFS from the x86 runner system
rather than initramfs.
See `src/freedreno/ci/gitlab-ci.yml` for an example of fastboot on DB410c and
DB820c (freedreno-a306 and freedreno-a530).
Requirements (Servo)
--------------------
For Servo-connected boards, we can use the EC connection for power
control to reboot the board. However, loading a kernel is not as easy
as fastboot, so we assume your bootloader can do TFTP, and that your
gitlab-runner mounts the runner's tftp directory specific to the board
at /tftp in the container.
Since we're going the TFTP route, we also use NFS root. This avoids
packing the rootfs and sending it to the board as a ramdisk, which
means we can support larger rootfses (for Piglit testing), at the cost
of needing more storage on the runner.
Telling the board where its TFTP and NFS should come from is
done using dnsmasq on the runner host. For example, here is a snippet from
the dnsmasq.conf.d in the Google farm, for the gitlab-runner host we
call "servo"::
dhcp-host=1c:69:7a:0d:a3:d3,10.42.0.10,set:servo
# Fixed dhcp addresses for my sanity, and setting a tag for
# specializing other DHCP options
dhcp-host=a0:ce:c8:c8:d9:5d,10.42.0.11,set:cheza1
dhcp-host=a0:ce:c8:c8:d8:81,10.42.0.12,set:cheza2
# Specify the next server, watch out for the double ',,'. The
# filename didn't seem to get picked up by the bootloader, so we use
# tftp-unique-root and mount directories like
# /srv/tftp/10.42.0.11/jwerner/cheza as /tftp in the job containers.
tftp-unique-root
dhcp-boot=tag:cheza1,cheza1/vmlinuz,,10.42.0.10
dhcp-boot=tag:cheza2,cheza2/vmlinuz,,10.42.0.10
dhcp-option=tag:cheza1,option:root-path,/srv/nfs/cheza1
dhcp-option=tag:cheza2,option:root-path,/srv/nfs/cheza2
See `src/freedreno/ci/gitlab-ci.yml` for an example of Servo on cheza. Note
that other Servo boards in CI are managed using LAVA.
Requirements (POE)
------------------
For boards with 30W or less power consumption, POE can be used for the power
control. The parts list ends up looking something like this:
- x86-64 gitlab-runner machine with a mid-range CPU, and 3+ GB of SSD storage
per board. This can host at least 15 boards in our experience.
- Cisco 2960S gigabit ethernet switch with POE. (Cisco 3750G, 3560G, or 2960G
were also recommended as reasonably priced HW, but make sure the name ends in
G, X, or S)
- POE splitters to power the boards (you can find ones that go to micro USB,
USBC, and 5V barrel jacks at least)
- USB serial cables (Adafruit sells pretty reliable ones)
- A large powered USB hub for all the serial cables
- A pile of ethernet cables
You'll talk to the Cisco for configuration using its USB port, which provides a
serial terminal at 9600 baud. You need to enable SNMP control, with a "mesaci"
community name that the gitlab-runner host will use (with no password) as its
authentication to configure port power. To talk SNMP to the switch, you need to
put an IP address on the default VLAN (VLAN 1).
Setting that up looks something like:

.. code-block:: console

   Switch>
   Password:
   Switch#configure terminal
   Switch(config)#interface Vlan 1
   Switch(config-if)#ip address 10.42.0.2 255.255.0.0
   Switch(config-if)#exit
   Switch(config)#snmp-server community mesaci RW
   Switch(config)#end
   Switch#copy running-config startup-config
With that set up, you should be able to power on/off a port with something like:
.. code-block:: console

   % snmpset -v2c -r 3 -t 30 -cmesaci 10.42.0.2 1.3.6.1.4.1.9.9.402.1.2.1.1.1.1 i 1
   % snmpset -v2c -r 3 -t 30 -cmesaci 10.42.0.2 1.3.6.1.4.1.9.9.402.1.2.1.1.1.1 i 4
Note that the "1.3.6..." SNMP OID changes between switches. The last digit
above is the interface id (port number). You can probably find the right OID by
google, that was easier than figuring it out from finding the switch's MIB
database. You can query the POE status from the switch serial using the `show
power inline` command.
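If you don't want to dig through vendor documentation, one hedged approach is
to walk the POE MIB subtree used above and look for your ports:

.. code-block:: console

   % snmpwalk -v2c -r 3 -t 30 -cmesaci 10.42.0.2 1.3.6.1.4.1.9.9.402.1.2.1.1.1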
Other than that, the dnsmasq/TFTP/NFS setup for your boards is the same as in
the "servo" example above.
See `src/broadcom/ci/gitlab-ci.yml` and `src/nouveau/ci/gitlab-ci.yml` for
examples of POE for Raspberry Pi 3/4 and Jetson Nano.
Setup
-----
Each board will be registered in freedesktop.org GitLab. You'll want
something like this to register a fastboot board:
.. code-block:: console
sudo gitlab-runner register \
--url https://gitlab.freedesktop.org \
--registration-token $1 \
--name MY_BOARD_NAME \
--tag-list MY_BOARD_TAG \
--executor docker \
--docker-image "alpine:latest" \
--docker-volumes "/dev:/dev" \
--docker-network-mode "host" \
--docker-privileged \
--non-interactive
For a Servo board, you'll need to also volume mount the board's NFS
root dir at /nfs and TFTP kernel directory at /tftp.
The registration token has to come from a freedesktop.org GitLab admin
going to https://gitlab.freedesktop.org/admin/runners
The name scheme for Google's lab is google-freedreno-boardname-n, and
our tag is something like google-freedreno-db410c. The tag is what
identifies a board type so that board-specific jobs can be dispatched
into that pool.
We need privileged mode and the /dev bind mount in order to get at the
serial console and fastboot USB devices (--device arguments don't
apply to devices that show up after container start, which is the case
with fastboot, and the Servo serial devices are actually links to
/dev/pts). We use host network mode so that we can spin up an nginx
server to collect XML results for fastboot.
Once you've added your boards, you're going to need to add a little
more customization in ``/etc/gitlab-runner/config.toml``. First, add
``concurrent = <number of boards>`` at the top ("we should have up to
this many jobs running managed by this gitlab-runner"). Then for each
board's runner, set ``limit = 1`` ("only 1 job served by this board at a
time"). Finally, add the board-specific environment variables
required by your bare-metal script, something like::
[[runners]]
name = "google-freedreno-db410c-1"
environment = ["BM_SERIAL=/dev/ttyDB410c8", "BM_POWERUP=google-power-up.sh 8", "BM_FASTBOOT_SERIAL=15e9e390", "FDO_CI_CONCURRENT=4"]
The ``FDO_CI_CONCURRENT`` variable should be set to the number of CPU threads on
the board, which is used for auto-tuning of job parallelism.
Once you've updated your runners' configs, restart gitlab-runner with ``sudo
service gitlab-runner restart``.
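For example:

.. code-block:: console

   sudo service gitlab-runner restart
   # sanity-check that the registered runners are still present
   sudo gitlab-runner list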
Caching downloads
-----------------
To improve the runtime for downloading traces during traces job runs, you will
want a pass-through HTTP cache. On your runner box, install nginx:
.. code-block:: console
sudo apt install nginx libnginx-mod-http-lua
Add the server setup files:
.. literalinclude:: fdo-cache
:name: /etc/nginx/sites-available/fdo-cache
:caption: /etc/nginx/sites-available/fdo-cache
.. literalinclude:: uri-caching.conf
:name: /etc/nginx/snippets/uri-caching.conf
:caption: /etc/nginx/snippets/uri-caching.conf
Edit the listener addresses in fdo-cache to suit the ethernet interface that
your devices are on.
Enable the site and restart nginx:
.. code-block:: console
sudo ln -s /etc/nginx/sites-available/fdo-cache /etc/nginx/sites-enabled/fdo-cache
sudo service nginx restart
# First download will hit the internet
wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo.trace
# Second download should be cached.
wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo.trace
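To confirm that the second request really came from the cache, you can check the
``X-GG-Cache-Status`` header that the fdo-cache configuration adds (a quick
sketch; a ``HIT`` or ``REVALIDATED`` status means the cache served it):

.. code-block:: console

   wget -S 'http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo.trace' 2>&1 | grep -i x-gg-cache-status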
Now, set ``download-url`` in your ``traces-*.yml`` entry to something like
``http://10.42.0.1:8888/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public``
and you should have cached downloads for traces. Add it to
``FDO_HTTP_CACHE_URI=`` in your ``config.toml`` runner environment lines and you
can use it for cached artifact downloads instead of going all the way to
freedesktop.org on each job.

lib/mesa/docs/ci/docker.rst
@@ -0,0 +1,74 @@
Docker CI
=========
For LLVMpipe and Softpipe CI, we run tests in a container containing
VK-GL-CTS, on the shared GitLab runners provided by `freedesktop
<http://freedesktop.org>`_.
Software architecture
---------------------
The Docker containers are rebuilt using the shell scripts under
.gitlab-ci/container/ when the FDO\_DISTRIBUTION\_TAG changes in
.gitlab-ci.yml. The resulting images are around 1 GB, and are
expected to change approximately weekly (though an individual
developer working on them may produce many more images while trying to
come up with a working MR!).
gitlab-runner is a client that polls gitlab.freedesktop.org for
available jobs, with no inbound networking requirements. Jobs can
have tags, so we can have DUT-specific jobs that only run on runners
with that tag marked in the GitLab UI.
Since dEQP takes a long time to run, we mark the job as "parallel" at
some level, which spawns multiple jobs from one definition, and then
deqp-runner.sh takes the corresponding fraction of the test list for
that job.
To reduce dEQP runtime (or avoid tests with unreliable results), a
deqp-runner.sh invocation can provide a list of tests to skip. If
your driver is not yet conformant, you can pass a list of expected
failures, and the job will only fail on tests that aren't listed (look
at the job's log for which specific tests failed).
DUT requirements
----------------
In addition to the general :ref:`CI-farm-expectations`, using
Docker requires:
* DUTs must have a stable kernel and GPU reset (if applicable).
If the system goes down during a test run, that job will eventually
time out and fail (default 1 hour). However, if the kernel can't
reliably reset the GPU on failure, bugs in one MR may leak into
spurious failures in another MR. This would be an unacceptable impact
on Mesa developers working on other drivers.
* DUTs must be able to run Docker
The Mesa gitlab-runner based test architecture is built around Docker,
so that we can cache the Debian package installation and CTS build
step across multiple test runs. Since the images are large and change
approximately weekly, the DUTs also need to be running some script to
prune stale Docker images periodically in order to not run out of disk
space as we rev those containers (perhaps `this script
<https://gitlab.com/gitlab-org/gitlab-runner/issues/2980#note_169233611>`_).
Note that Docker doesn't allow containers to be stored on NFS, and
doesn't allow multiple Docker daemons to interact with the same
network block device, so you will probably need some sort of physical
storage on your DUTs.
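A minimal periodic prune, assuming a roughly weekly image cadence and run from
cron or a systemd timer, could be as simple as:

.. code-block:: console

   # remove unreferenced images created more than two weeks ago (adjust to taste)
   docker image prune --all --force --filter "until=336h"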
* DUTs must be public
By including your device in .gitlab-ci.yml, you're effectively letting
anyone on the internet run code on your device. Docker containers may
provide some limited protection, but how much you trust that and what
you do to mitigate hostile access is up to you.
* DUTs must expose the DRI device nodes to the containers.
Obviously, to get access to the HW, we need to pass the render node
through. This is done by adding ``devices = ["/dev/dri"]`` to the
``runners.docker`` section of /etc/gitlab-runner/config.toml.

lib/mesa/docs/ci/fdo-cache
@@ -0,0 +1,44 @@
proxy_cache_path /var/cache/nginx/ levels=1:2 keys_zone=my_cache:10m max_size=24g inactive=48h use_temp_path=off;
server {
listen 10.42.0.1:8888 default_server;
listen 127.0.0.1:8888 default_server;
listen [::]:8888 default_server;
resolver 8.8.8.8;
root /var/www/html;
# Add index.php to the list if you are using PHP
index index.html index.htm index.nginx-debian.html;
server_name _;
add_header X-GG-Cache-Status $upstream_cache_status;
proxy_cache my_cache;
location /cache_gitlab_artifacts {
internal;
# GitLab's HTTP server marks everything as no-cache even though
# the artifact URLs don't change. So enforce a long validity
# time and ignore the headers that defeat caching
proxy_cache_valid 200 48h;
proxy_ignore_headers Cache-Control Set-Cookie;
include snippets/uri-caching.conf;
}
location /cache {
# special case gitlab artifacts
if ($arg_uri ~* /.*gitlab.*artifacts(\/|%2F)raw/ ) {
rewrite ^ /cache_gitlab_artifacts;
}
# Set a really low validity together with cache revalidation; our goal
# for caching isn't to lower the number of HTTP requests but to
# lower the amount of data transfer. Also, for some test
# scenarios (typically manual tests) the file at a given URL
# might get modified, so avoid confusion by ensuring
# revalidation happens often.
proxy_cache_valid 200 10s;
proxy_cache_revalidate on;
include snippets/uri-caching.conf;
}
}

lib/mesa/docs/ci/index.rst
@@ -0,0 +1,273 @@
Continuous Integration
======================
GitLab CI
---------
GitLab provides a convenient framework for running commands in response to Git pushes.
We use it to test merge requests (MRs) before merging them (pre-merge testing),
as well as post-merge testing, for everything that hits ``main``
(this is necessary because we still allow commits to be pushed outside of MRs,
and even then the MR CI runs in the forked repository, which might have been
modified and thus is unreliable).
The CI runs a number of tests, from trivial build-testing to complex GPU rendering:
- Build testing for a number of build systems, configurations and platforms
- Sanity checks (``meson test``)
- Some drivers (Softpipe, LLVMpipe, Freedreno and Panfrost) are also tested
using `VK-GL-CTS <https://github.com/KhronosGroup/VK-GL-CTS>`__
- Replay of application traces
A typical run takes between 20 and 30 minutes, although it can go up very quickly
if the GitLab runners are overwhelmed, which happens sometimes. When it does happen,
not much can be done besides waiting it out, or cancelling it.
Due to limited resources, we currently do not run the CI automatically
on every push; instead, we only run it automatically once the MR has
been assigned to ``Marge``, our merge bot.
If you're interested in the details, the main configuration file is ``.gitlab-ci.yml``,
and it references a number of other files in ``.gitlab-ci/``.
If the GitLab CI doesn't seem to be running on your fork (or MRs, as they run
in the context of your fork), you should check the "Settings" of your fork.
Under "CI / CD" → "General pipelines", make sure "Custom CI config path" is
empty (or set to the default ``.gitlab-ci.yml``), and that the
"Public pipelines" box is checked.
If you're having issues with the GitLab CI, your best bet is to ask
about it on ``#freedesktop`` on OFTC and tag `Daniel Stone
<https://gitlab.freedesktop.org/daniels>`__ (``daniels`` on IRC) or
`Emma Anholt <https://gitlab.freedesktop.org/anholt>`__ (``anholt`` on
IRC).
The three GitLab CI systems currently integrated are:
.. toctree::
:maxdepth: 1
bare-metal
LAVA
docker
Application traces replay
-------------------------
The CI replays application traces with various drivers in two different jobs. The first
job replays traces listed in ``src/<driver>/ci/traces-<driver>.yml`` files and if any
of those traces fail the pipeline fails as well. The second job replays traces listed in
``src/<driver>/ci/restricted-traces-<driver>.yml`` and it is allowed to fail. This second
job is only created when the pipeline is triggered by `marge-bot` or any other user that
has been granted access to these traces.
A traces YAML file also includes a ``download-url`` pointing to a MinIO
instance from which to download the traces. While the first job should always work with
publicly accessible traces, the second job could point to a URL with restricted access.
Restricted traces are those that have been made available to Mesa developers without a
license to redistribute at will, and thus should not be exposed to the public. Failing to
access that URL would not prevent the pipeline from passing; therefore, forks made by
contributors without permission to download non-redistributable traces can be merged
without friction.
As an aside, only maintainers of such non-redistributable traces are responsible for
ensuring that replays are successful, since other contributors would not be able to
download and test them by themselves.
Those Mesa contributors that believe they could have permission to access such
non-redistributable traces can request permission from Daniel Stone <daniels@collabora.com>.
gitlab.freedesktop.org accounts that are to be granted access to these traces will be
added to the OPA policy for the MinIO repository as per
https://gitlab.freedesktop.org/freedesktop/helm-gitlab-config/-/commit/a3cd632743019f68ac8a829267deb262d9670958 .
So that the jobs are created in personal repositories, the name of the user's account needs
to be added to the ``rules`` attribute of the GitLab CI job that accesses the restricted
traces.
.. toctree::
:maxdepth: 1
local-traces
Intel CI
--------
The Intel CI is not yet integrated into the GitLab CI.
For now, special access must be manually given (file an issue in
`the Intel CI configuration repo <https://gitlab.freedesktop.org/Mesa_CI/mesa_jenkins>`__
if you think you or Mesa would benefit from you having access to the Intel CI).
Results can be seen on `mesa-ci.01.org <https://mesa-ci.01.org>`__
if you are *not* an Intel employee, but if you are you
can access a better interface on
`mesa-ci-results.jf.intel.com <http://mesa-ci-results.jf.intel.com>`__.
The Intel CI runs a much larger array of tests, on a number of generations
of Intel hardware and on multiple platforms (X11, Wayland, DRM & Android),
with the purpose of detecting regressions.
Tests include
`Crucible <https://gitlab.freedesktop.org/mesa/crucible>`__,
`VK-GL-CTS <https://github.com/KhronosGroup/VK-GL-CTS>`__,
`dEQP <https://android.googlesource.com/platform/external/deqp>`__,
`Piglit <https://gitlab.freedesktop.org/mesa/piglit>`__,
`Skia <https://skia.googlesource.com/skia>`__,
`VkRunner <https://github.com/Igalia/vkrunner>`__,
`WebGL <https://github.com/KhronosGroup/WebGL>`__,
and a few other tools.
A typical run takes between 30 minutes and an hour.
If you're having issues with the Intel CI, your best bet is to ask about
it on ``#dri-devel`` on OFTC and tag `Nico Cortes
<https://gitlab.freedesktop.org/ngcortes>`__ (``ngcortes`` on IRC).
.. _CI-farm-expectations:
CI farm expectations
--------------------
To make sure that testing of one vendor's drivers doesn't block
unrelated work by other vendors, we require that a given driver's test
farm produces a spurious failure no more than once a week. If every
driver had CI and failed once a week, we would be seeing someone's
code getting blocked on a spurious failure daily, which is an
unacceptable cost to the project.
Additionally, the test farm needs to be able to provide a short enough
turnaround time that we can get our MRs through marge-bot without the
pipeline backing up. As a result, we require that the test farm be
able to handle a whole pipeline's worth of jobs in less than 15 minutes
(to compare, the build stage is about 10 minutes).
If a test farm doesn't have enough HW to provide these guarantees, consider dropping
tests to reduce runtime. dEQP job logs print the slowest tests at the end of
the run, and Piglit logs the runtime of tests in the results.json.bz2 in the
artifacts. Or, you can add the following to your job to run only some fraction
(in this case, 1/10th) of the dEQP test list:

.. code-block:: yaml

   variables:
      DEQP_FRACTION: 10
If a HW CI farm goes offline (network dies and all CI pipelines end up
stalled) or its runners are consistently spuriously failing (disk
full?), and the maintainer is not immediately available to fix the
issue, please push through an MR disabling that farm's jobs by adding
'.' to the front of the job names until the maintainer can bring
things back up. If this happens, the farm maintainer should provide a
report to mesa-dev@lists.freedesktop.org after the fact explaining
what happened and what the mitigation plan is for that failure next
time.
Personal runners
----------------
Mesa's CI is currently run primarily on packet.net's m1xlarge nodes
(2.2 GHz Sandy Bridge), with each job getting 8 cores allocated. You
can speed up your personal CI builds (and marge-bot merges) by using a
faster personal machine as a runner. You can find the gitlab-runner
package in Debian, or use GitLab's own builds.
To do so, follow `GitLab's instructions
<https://docs.gitlab.com/ce/ci/runners/#create-a-specific-runner>`__ to
register your personal GitLab runner in your Mesa fork. Then, tell
Mesa how many jobs it should serve (``concurrent=``) and how many
cores those jobs should use (``FDO_CI_CONCURRENT=``) by editing these
lines in ``/etc/gitlab-runner/config.toml``, for example::
concurrent = 2
[[runners]]
environment = ["FDO_CI_CONCURRENT=16"]
Docker caching
--------------
The CI system uses Docker images extensively to cache
infrequently-updated build content like the CTS. The `freedesktop.org
CI templates
<https://gitlab.freedesktop.org/freedesktop/ci-templates/>`_ help us
manage the building of the images to reduce how frequently rebuilds
happen, and trim down the images (stripping out manpages, cleaning the
apt cache, and other such common pitfalls of building Docker images).
When running a container job, the templates will look for an existing
build of that image in the container registry under
``MESA_IMAGE_TAG``. If it's found it will be reused, and if
not, the associated ``.gitlab-ci/container/<jobname>.sh`` will be run
to build it. So, when developing any change to container build
scripts, you need to update the associated ``MESA_IMAGE_TAG`` to
a new unique string. We recommend using the current date plus some
string related to your branch (so that if you rebase on someone else's
container update from the same day, you will get a Git conflict
instead of silently reusing their container).
When developing a given change to your Docker image, you would have to
bump the tag on each ``git commit --amend`` to your development
branch, which can get tedious. Instead, you can navigate to the
`container registry
<https://gitlab.freedesktop.org/mesa/mesa/container_registry>`_ for
your repository and delete the tag to force a rebuild. When your code
is eventually merged to main, a full image rebuild will occur again
(forks inherit images from the main repo, but MRs don't propagate
images from the fork into the main repo's registry).
Building locally using CI docker images
---------------------------------------
It can be frustrating to debug build failures on an environment you
don't personally have. If you're experiencing this with the CI
builds, you can use Docker to use their build environment locally. Go
to your job log, and at the top you'll see a line like::
Pulling docker image registry.freedesktop.org/anholt/mesa/debian/android_build:2020-09-11
We'll use a volume mount to make our current Mesa tree be what the
Docker container uses, so they'll share everything (their build will
go in _build, according to ``meson-build.sh``). We're going to be
using the image non-interactively so we use ``run --rm $IMAGE
command`` instead of ``run -it $IMAGE bash`` (which you may also find
useful for debug). Extract your build setup variables from
.gitlab-ci.yml and run the CI meson build script:
.. code-block:: console
IMAGE=registry.freedesktop.org/anholt/mesa/debian/android_build:2020-09-11
sudo docker pull $IMAGE
sudo docker run --rm -v `pwd`:/mesa -w /mesa $IMAGE env PKG_CONFIG_PATH=/usr/local/lib/aarch64-linux-android/pkgconfig/:/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android/pkgconfig/ GALLIUM_DRIVERS=freedreno UNWIND=disabled EXTRA_OPTION="-D android-stub=true -D llvm=disabled" DRI_LOADERS="-D glx=disabled -D gbm=disabled -D egl=enabled -D platforms=android" CROSS=aarch64-linux-android ./.gitlab-ci/meson-build.sh
All you have left over from the build is its output, and a _build
directory. You can hack on mesa and iterate testing the build with:
.. code-block:: console
sudo docker run --rm -v `pwd`:/mesa $IMAGE ninja -C /mesa/_build
Conformance Tests
-----------------
Some conformance tests require special treatment to be maintained on GitLab CI.
This section lists their documentation pages.
.. toctree::
:maxdepth: 1
skqp
Updating GitLab CI Linux Kernel
-------------------------------
GitLab CI usually runs a bleeding-edge kernel. The following documentation has
instructions on how to uprev the Linux kernel in the GitLab CI ecosystem.
.. toctree::
:maxdepth: 1
kernel

lib/mesa/docs/ci/kernel.rst
@@ -0,0 +1,121 @@
Upreving Linux Kernel
=====================
Occasionally, the GitLab CI needs a Linux kernel update to enable new kernel
features, device drivers, and bug fixes in CI jobs.
Kernel uprevs in GitLab CI are relatively simple, but prone to lots of
side effects, since many devices from different platforms are involved in the
pipeline.
Kernel repository
-----------------
The Linux Kernel used in the GitLab CI is stored at the following repository:
https://gitlab.freedesktop.org/gfx-ci/linux
It is common for the Mesa kernel to carry some patches that were not merged into
Linux mainline; that is why Mesa has its own kernel version, which should be used
as the base for newer kernels.
So, one should base the kernel uprev on the last tag used in the Mesa CI;
please refer to the `KERNEL_URL` variable in `.gitlab-ci/container/gitlab-ci.yml`.
Every tag has a standard naming: `vX.YZ-for-mesa-ci-<commit_short_SHA>`, which
can be created via the command:
:code:`git tag vX.YZ-for-mesa-ci-$(git rev-parse --short HEAD)`
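and then pushed (assuming your remote for the CI kernel repository is called
``origin``):

:code:`git push origin vX.YZ-for-mesa-ci-$(git rev-parse --short HEAD)`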
Building Kernel
---------------
When Mesa CI generates a new rootfs image, the Linux Kernel is built based on
the script located at `.gitlab-ci/container/build-kernel.sh`.
Updating Kconfigs
^^^^^^^^^^^^^^^^^
When a kernel uprev happens, it is worth compiling and cross-compiling the
kernel locally, in order to update the Kconfigs accordingly. Remember that the
resulting Kconfig is a merge between the *Mesa CI Kconfig* and the *Linux tree
defconfig*, made via the `merge_config.sh` script located in the Linux kernel tree.
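For instance, a local merge for arm64 might look like the following sketch (run
from the Linux kernel tree; the relative path to the Mesa checkout is an
assumption):

.. code-block:: console

   ./scripts/kconfig/merge_config.sh -m arch/arm64/configs/defconfig \
        ../mesa/.gitlab-ci/container/arm64.config
   make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig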
Kconfigs location
"""""""""""""""""
+------------+--------------------------------------------+-------------------------------------+
| Platform | Mesa CI Kconfig location | Linux tree defconfig |
+============+============================================+=====================================+
| arm | .gitlab-ci/container/arm.config | arch/arm/configs/multi_v7_defconfig |
+------------+--------------------------------------------+-------------------------------------+
| arm64 | .gitlab-ci/container/arm64.config | arch/arm64/configs/defconfig |
+------------+--------------------------------------------+-------------------------------------+
| x86-64 | .gitlab-ci/container/x86_64.config | arch/x86/configs/x86_64_defconfig |
+------------+--------------------------------------------+-------------------------------------+
Updating image tags
-------------------
Every kernel uprev should update 3 image tags, located in two files.
:code:`.gitlab-ci/container/gitlab-ci.yml` tag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- **KERNEL_URL** for the location of the new kernel
:code:`.gitlab-ci/image-tags.yml` tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- **KERNEL_ROOTFS_TAG** to rebuild rootfs with the new kernel
- **DEBIAN_X86_TEST_GL_TAG** to ensure that the new rootfs is being used by the GitLab x86 jobs
Development routine
-------------------
1. Compile the newer kernel locally for each platform.
2. Compile device trees for ARM platforms
3. Update Kconfigs. Are new Kconfigs necessary? Is CONFIG_XYZ_BLA deprecated? Does the `merge_config.sh` override an important config?
4. Push a new development branch to `Kernel repository`_ based on the latest kernel tag used in GitLab CI
5. Hack `build-kernel.sh` script to clone kernel from your development branch
6. Update image tags. See `Updating image tags`_
7. Run the entire CI pipeline, all the automatic jobs should be green. If some job is red or taking too long, you will need to investigate it and probably ask for help.
When the Kernel uprev is stable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. Push a new tag to the Mesa CI `Kernel repository`_
2. Update KERNEL_URL in the `debian/x86_test-gl` job definition
3. Open a merge request, if one is not open yet
Tips and Tricks
---------------
Compare pipelines
^^^^^^^^^^^^^^^^^
To have the most confidence that a kernel uprev does not break anything in Mesa,
it is suggested that one runs the entire CI pipeline to check if the update affected the manual CI jobs.
Step-by-step
""""""""""""
1. Create a local branch at the same Git ref (it should be the main branch) that the kernel uprev branch is based on.
2. Push this test branch
3. Run the entire pipeline against the test branch, even the manual jobs
4. Now do the same for the kernel uprev branch
5. Compare the job results. If a CI job turned red on your uprev branch, it means that the kernel update broke the test. Otherwise, it should be fine.
Bare-metal custom kernels
^^^^^^^^^^^^^^^^^^^^^^^^^
Some CI jobs support plugging in a custom kernel by simply changing a variable.
This is great, since rebuilding the kernel and rootfs may take dozens of minutes.
For example, the Freedreno jobs' `gitlab-ci.yml` manifest supports a variable named
`BM_KERNEL`. If one puts a gz-compressed kernel URL there, the job will use that
kernel to boot the Freedreno bare-metal devices. The same works for `BM_DTB` in
the case of device tree binaries.
Careful reading of the job logs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sometimes a job may turn red for reasons unrelated to the kernel update, e.g.
a LAVA `tftp` timeout, problems with the freedesktop.org servers, etc.
So it is important to check why the job turned red, and retry it if an
infrastructure error has happened.

lib/mesa/docs/ci/local-traces.rst
@@ -0,0 +1,45 @@
Running traces on a local machine
=================================
Prerequisites
-------------
- Install `Apitrace <https://apitrace.github.io/>`_
- Install `Renderdoc <https://renderdoc.org/>`_ (only needed for some traces)
- Download and compile `Piglit <https://gitlab.freedesktop.org/mesa/piglit>`_ and install its `dependencies <https://gitlab.freedesktop.org/mesa/piglit#2-setup>`_
- Download traces you want to replay from `traces-db <https://gitlab.freedesktop.org/gfx-ci/tracie/traces-db/>`_
Running a single trace
----------------------
A simple run to see the output of the trace can be done with:
.. code-block:: console
apitrace replay -w name_of_trace.trace
For more information, look into the `Apitrace documentation <https://github.com/apitrace/apitrace/blob/master/docs/USAGE.markdown>`_.
For comparing checksums use:
.. code-block:: console
cd piglit/replayer
export PIGLIT_SOURCE_DIR="../"
./replayer.py compare trace -d test path/name_of_trace.trace 0 # replace with expected checksum
Simulating CI trace job
-----------------------
Sometimes it's useful to be able to test traces on your local machine instead of the Mesa CI runner, simulating the CI environment as closely as possible.
Download the YAML file from your driver's `ci/` directory, and then change the path in the YAML file from the local proxy or MinIO to the local directory (in URL-like format, ``file://``).
.. code-block:: console
# The PIGLIT_REPLAY_DEVICE_NAME has to match name in the YAML file.
export PIGLIT_REPLAY_DEVICE_NAME='your_device_name'
export PIGLIT_REPLAY_DESCRIPTION_FILE='path_to_mesa_traces_file.yml'
./piglit run -l verbose --timeout 300 -j10 replay ~/results/
Note: For replaying traces, you may need to allow higher GL and GLSL versions. You can achieve that by setting ``MESA_GLSL_VERSION_OVERRIDE`` and ``MESA_GL_VERSION_OVERRIDE``.
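For example (the exact values needed depend on the trace):

.. code-block:: console

   export MESA_GL_VERSION_OVERRIDE=4.6
   export MESA_GLSL_VERSION_OVERRIDE=460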

lib/mesa/docs/ci/skqp.rst
@@ -0,0 +1,101 @@
SkQP
====
`SkQP <https://skia.org/docs/dev/testing/skqp/>`_ stands for SKIA Quality
Program conformance tests. Basically, it has sets of rendering tests and unit
tests to ensure that `SKIA <https://skia.org/>`_ is meeting its design specifications on a specific
device.
The rendering tests support the GL, GLES and Vulkan backends and cover some
rendering scenarios, while the unit tests check the GPU behavior without
rendering images.
Tests
-----
Render tests design
^^^^^^^^^^^^^^^^^^^
It is worth noting that `rendertests.txt` can carry some detail about each test's
expectations; each test can have a max pixel error count, telling SkQP that it
is OK for that test to have at most that number of errors. See also:
https://github.com/google/skia/blob/c29454d1c9ebed4758a54a69798869fa2e7a36e0/tools/skqp/README_ALGORITHM.md
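As a purely illustrative sketch (the test names and thresholds here are
hypothetical; the exact format is described in the README linked above), the
entries pair a test name with its allowed error count:

.. code-block:: console

   # <render test name>,<max pixel error count>
   aarectmodes,0
   blurroundrect,4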
.. _test-location:
Location
^^^^^^^^
Each `rendertests.txt` and `unittests.txt` file must be located inside a specific
subdirectory of the SkQP assets directory.
+--------------+--------------------------------------------+
| Test type | Location |
+==============+============================================+
| Render tests | `${SKQP_ASSETS_DIR}/skqp/rendertests.txt` |
+--------------+--------------------------------------------+
| Unit tests | `${SKQP_ASSETS_DIR}/skqp/unittests.txt` |
+--------------+--------------------------------------------+
The `skqp-runner.sh` script will make the necessary modifications to separate
`rendertests.txt` for each backend-driver combination, as long as the test files are located in the expected places:
+--------------+----------------------------------------------------------------------------------------------+
| Test type | Location |
+==============+==============================================================================================+
| Render tests | `${MESA_REPOSITORY_DIR}/src/${GPU_DRIVER}/ci/${GPU_VERSION}-${SKQP_BACKEND}_rendertests.txt` |
+--------------+----------------------------------------------------------------------------------------------+
| Unit tests | `${MESA_REPOSITORY_DIR}/src/${GPU_DRIVER}/ci/${GPU_VERSION}_unittests.txt` |
+--------------+----------------------------------------------------------------------------------------------+
Where `SKQP_BACKEND` can be:
- gl: for GL backend
- gles: for GLES backend
- vk: for Vulkan backend
Example file
""""""""""""
.. code-block:: console
src/freedreno/ci/freedreno-a630-skqp-gl_rendertests.txt
- GPU_DRIVER: `freedreno`
- GPU_VERSION: `freedreno-a630`
- SKQP_BACKEND: `gl`
.. _rendertests-design:
SkQP reports
------------
SkQP generates reports after finishing its execution; they are located in the job
artifacts' results directory and are divided into subdirectories by rendering test
backend and unit tests. The job log has links to every generated report in order
to facilitate SkQP debugging.
Maintaining SkQP on Mesa CI
---------------------------
SkQP is built alongside another binary, namely `list_gpu_unit_tests`, which is
located in the same folder as the `skqp` binary.
This binary will generate the expected `unittests.txt` for the target GPU, so
ideally it should be executed on every SkQP update and when a new device
receives SkQP CI jobs.
1. Generate target unit tests for the current GPU with :code:`./list_gpu_unit_tests > unittests.txt`
2. Run SkQP job
3. If there is a failing or crashing unit test, remove it from the corresponding `unittests.txt`
4. If there is a crashing render test, remove it from the corresponding `rendertests.txt`
5. If there is a failing render test, visually inspect the result from the HTML report

   - If the render result is OK, update the max error count for that test
   - Otherwise, put `-1` in the same threshold, as seen in :ref:`rendertests-design`

6. Remember to put the new test files in the locations cited in :ref:`test-location`

lib/mesa/docs/ci/uri-caching.conf
@@ -0,0 +1,37 @@
set $proxy_authorization '';
set_by_lua $proxyuri '
unescaped = ngx.unescape_uri(ngx.var.arg_uri);
it, err = ngx.re.match(unescaped, "(https?://)(.*@)?([^/]*)(/.*)?");
if not it then
-- Hack to cause nginx to return 404
return "http://localhost/404"
end
scheme = it[1];
authstring = it[2];
host = it[3];
query = it[4];
if ngx.var.http_authorization and ngx.var.http_authorization ~= "" then
ngx.var.proxy_authorization = ngx.var.http_authorization;
elseif authstring then
auth = string.sub(authstring, 0, -2);
auth64 = ngx.encode_base64(auth);
ngx.var.proxy_authorization = "Basic " .. auth64;
end
-- Default to / if none is set to avoid using the request_uri query
if not query then
query = "/";
end
return scheme .. host .. query;
';
add_header X-GG-Cache-Status $upstream_cache_status;
proxy_set_header Authorization $proxy_authorization;
proxy_pass $proxyuri;
# Redirect back to ourselves on 301 replies
proxy_redirect ~^(.*)$ /cache/?uri=$1;