Bare-metal CI
=============

The bare-metal scripts run on a system with gitlab-runner and Docker,
connected to one or more bare-metal boards that run Mesa tests.
Currently, "fastboot", "ChromeOS Servo", and POE-powered devices are
supported.

Compared with LAVA, this approach doesn't involve maintaining a separate
web service with its own job scheduler and replicating jobs between the
two. It also places more of the board support in Git, instead of in
web service configuration. On the other hand, the serial interactions
and bootloader support are more primitive.

Requirements (fastboot)
-----------------------

This testing requires power control of the DUTs (devices under test) by
the gitlab-runner machine, since that is how we reset the system and get
back to a pristine state at the start of testing.

We require access to the console output from the gitlab-runner system,
since that is how we get the final results back from the tests. You
should probably have the console on a serial connection, so that you
can see bootloader progress.

The boards need to be able to boot a kernel/initramfs supplied by the
gitlab-runner system, since Mesa often needs to update the kernel, either for
new DRM functionality or to fix kernel bugs.

The boards must have networking, so that we can extract the dEQP XML results to
artifacts on GitLab, and so that we can download traces (too large for an
initramfs) for trace replay testing. Given that we need networking already, and
our dEQP/Piglit/etc. payload is large, we use NFS from the x86 runner system
rather than initramfs.

See ``src/freedreno/ci/gitlab-ci.yml`` for an example of fastboot on DB410c and
DB820c (freedreno-a306 and freedreno-a530).

Requirements (Servo)
--------------------

For Servo-connected boards, we can use the EC connection for power
control to reboot the board. However, loading a kernel is not as easy
as with fastboot, so we assume your bootloader can do TFTP, and that your
gitlab-runner mounts the runner's TFTP directory specific to the board
at ``/tftp`` in the container.

Since we're going the TFTP route, we also use NFS root. This avoids
packing the rootfs and sending it to the board as a ramdisk, which
means we can support larger rootfses (for Piglit testing), at the cost
of needing more storage on the runner.

Telling the board where its TFTP and NFS should come from is
done using dnsmasq on the runner host. For example, here is a snippet from
the dnsmasq.conf.d in the Google farm, where the gitlab-runner host is
called "servo"::

  dhcp-host=1c:69:7a:0d:a3:d3,10.42.0.10,set:servo

  # Fixed dhcp addresses for my sanity, and setting a tag for
  # specializing other DHCP options
  dhcp-host=a0:ce:c8:c8:d9:5d,10.42.0.11,set:cheza1
  dhcp-host=a0:ce:c8:c8:d8:81,10.42.0.12,set:cheza2

  # Specify the next server, watch out for the double ',,'. The
  # filename didn't seem to get picked up by the bootloader, so we use
  # tftp-unique-root and mount directories like
  # /srv/tftp/10.42.0.11/jwerner/cheza as /tftp in the job containers.
  tftp-unique-root
  dhcp-boot=tag:cheza1,cheza1/vmlinuz,,10.42.0.10
  dhcp-boot=tag:cheza2,cheza2/vmlinuz,,10.42.0.10

  dhcp-option=tag:cheza1,option:root-path,/srv/nfs/cheza1
  dhcp-option=tag:cheza2,option:root-path,/srv/nfs/cheza2

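With more than a couple of boards, the per-board dnsmasq lines get repetitive.
The following is a minimal Python sketch, not part of Mesa's scripts, that
generates entries of the same shape as the snippet above. The ``Board`` class
and the ``dnsmasq_lines`` function are hypothetical names, and the
``/srv/nfs`` base path is only assumed because the example above uses it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    """One DUT as seen by dnsmasq (hypothetical helper type)."""

    tag: str  # dnsmasq tag, also used as the per-board directory name
    mac: str  # MAC address of the board's ethernet interface
    ip: str   # fixed address handed out over DHCP


def dnsmasq_lines(board: Board, tftp_server: str,
                  nfs_base: str = "/srv/nfs") -> list[str]:
    """Return the dhcp-host, dhcp-boot and root-path lines for one board."""
    return [
        f"dhcp-host={board.mac},{board.ip},set:{board.tag}",
        # Note the double ',,' before the next-server address.
        f"dhcp-boot=tag:{board.tag},{board.tag}/vmlinuz,,{tftp_server}",
        f"dhcp-option=tag:{board.tag},option:root-path,{nfs_base}/{board.tag}",
    ]


if __name__ == "__main__":
    cheza1 = Board(tag="cheza1", mac="a0:ce:c8:c8:d9:5d", ip="10.42.0.11")
    print("\n".join(dnsmasq_lines(cheza1, tftp_server="10.42.0.10")))
```

The global ``tftp-unique-root`` line and the ``dhcp-host`` entry for the runner
itself are not generated here, since they appear only once per farm.
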
See ``src/freedreno/ci/gitlab-ci.yml`` for an example of Servo on cheza. Note
that other Servo boards in CI are managed using LAVA.

Requirements (POE)
------------------

For boards with 30W or less power consumption, POE can be used for power
control. The parts list ends up looking something like this (for example):

- x86-64 gitlab-runner machine with a mid-range CPU, and 3+ GB of SSD storage
  per board. This can host at least 15 boards in our experience.
- Cisco 2960S gigabit ethernet switch with POE. (Cisco 3750G, 3560G, or 2960G
  were also recommended as reasonably priced HW, but make sure the name ends in
  G, X, or S.)
- POE splitters to power the boards (you can find ones that go to micro USB,
  USB-C, and 5V barrel jacks at least)
- USB serial cables (Adafruit sells pretty reliable ones)
- A large powered USB hub for all the serial cables
- A pile of ethernet cables

You'll talk to the Cisco for configuration using its USB port, which provides a
serial terminal at 9600 baud. You need to enable SNMP control, which we'll do
using a "mesaci" community name that the gitlab-runner can use as its
authentication (no password) to configure the switch. To talk to SNMP on the
switch, you need to put an IP address on the default VLAN (VLAN 1).

Setting that up looks something like:

.. code-block:: console

   Switch>
   Password:
   Switch#configure terminal
   Switch(config)#interface Vlan 1
   Switch(config-if)#ip address 10.42.0.2 255.255.0.0
   Switch(config-if)#end
   Switch(config)#snmp-server community mesaci RW
   Switch(config)#end
   Switch#copy running-config startup-config

With that set up, you should be able to power on/off a port with something like:

.. code-block:: console

   % snmpset -v2c -r 3 -t 30 -cmesaci 10.42.0.2 1.3.6.1.4.1.9.9.402.1.2.1.1.1.1 i 1
   % snmpset -v2c -r 3 -t 30 -cmesaci 10.42.0.2 1.3.6.1.4.1.9.9.402.1.2.1.1.1.1 i 4

Note that the "1.3.6..." SNMP OID changes between switches. The last component
of the OID above is the interface id (port number). You can probably find the
right OID with a web search, which was easier than working it out from the
switch's MIB database. You can query the POE status from the switch serial
console using the ``show power inline`` command.

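As an illustration of how a power-control script might wrap these commands,
here is a small Python sketch. It assumes the OID prefix and the "mesaci"
community from the example above, and it assumes that the value 1 turns the
port on and 4 turns it off, in the same order as the two commands shown. The
function names and the delay are hypothetical; adjust the OID for your switch.

```python
import subprocess
import time

# OID prefix taken from the example above; it differs between switch models.
POE_OID_PREFIX = "1.3.6.1.4.1.9.9.402.1.2.1.1.1"
# Assumption: 1 powers the port on and 4 powers it off on this switch.
POE_ON, POE_OFF = 1, 4


def snmpset_argv(switch_ip, port, value, community="mesaci"):
    """Build the snmpset command line that sets the POE state of one port."""
    return [
        "snmpset", "-v2c", "-r", "3", "-t", "30", f"-c{community}",
        switch_ip, f"{POE_OID_PREFIX}.{port}", "i", str(value),
    ]


def power_cycle(switch_ip, port, off_seconds=3.0):
    """Turn a port off, wait, then turn it back on (needs snmpset installed)."""
    subprocess.run(snmpset_argv(switch_ip, port, POE_OFF), check=True)
    time.sleep(off_seconds)  # hypothetical delay; tune for your boards
    subprocess.run(snmpset_argv(switch_ip, port, POE_ON), check=True)


if __name__ == "__main__":
    # Print the command for port 1 instead of running it.
    print(" ".join(snmpset_argv("10.42.0.2", 1, POE_ON)))
```

Building the argument list separately from running it makes it easy to log or
dry-run the exact command before pointing it at a real switch.
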
Other than that, the dnsmasq/TFTP/NFS setup for your boards is the same as in
the Servo section above.

See ``src/broadcom/ci/gitlab-ci.yml`` and ``src/nouveau/ci/gitlab-ci.yml`` for
examples of POE for Raspberry Pi 3/4 and Jetson Nano.

Setup
-----

Each board will be registered in freedesktop.org GitLab. You'll want
something like this to register a fastboot board:

.. code-block:: console

   sudo gitlab-runner register \
        --url https://gitlab.freedesktop.org \
        --registration-token $1 \
        --name MY_BOARD_NAME \
        --tag-list MY_BOARD_TAG \
        --executor docker \
        --docker-image "alpine:latest" \
        --docker-volumes "/dev:/dev" \
        --docker-network-mode "host" \
        --docker-privileged \
        --non-interactive

For a Servo board, you'll also need to volume mount the board's NFS
root dir at ``/nfs`` and TFTP kernel directory at ``/tftp``.

The registration token has to come from a freedesktop.org GitLab admin
going to https://gitlab.freedesktop.org/admin/runners.

The naming scheme for Google's lab is google-freedreno-boardname-n, and
our tag is something like google-freedreno-db410c. The tag is what
identifies a board type so that board-specific jobs can be dispatched
into that pool.

We need privileged mode and the ``/dev`` bind mount in order to get at the
serial console and fastboot USB devices (``--device`` arguments don't
apply to devices that show up after container start, which is the case
with fastboot, and the Servo serial devices are actually links to
``/dev/pts``). We use host network mode so that we can spin up an nginx
server to collect XML results for fastboot.

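If you register several boards, it can help to build the registration command
programmatically. The sketch below is only an illustration: it reuses the
flags from the command above, and assumes that ``--docker-volumes`` may be
given more than once to add the Servo ``/nfs`` and ``/tftp`` mounts. The
function name, the board name and tag, and the host paths are hypothetical.

```python
import shlex


def register_argv(token, name, tag, extra_volumes=()):
    """Build a gitlab-runner register command line for one board."""
    argv = [
        "sudo", "gitlab-runner", "register",
        "--url", "https://gitlab.freedesktop.org",
        "--registration-token", token,
        "--name", name,
        "--tag-list", tag,
        "--executor", "docker",
        "--docker-image", "alpine:latest",
        "--docker-volumes", "/dev:/dev",
    ]
    # Assumption: repeating --docker-volumes adds further bind mounts.
    for volume in extra_volumes:
        argv += ["--docker-volumes", volume]
    argv += ["--docker-network-mode", "host",
             "--docker-privileged", "--non-interactive"]
    return argv


if __name__ == "__main__":
    # Hypothetical Servo board: mount its NFS root and TFTP directory.
    servo = register_argv(
        "TOKEN", "google-freedreno-cheza-1", "google-freedreno-cheza",
        extra_volumes=["/srv/nfs/cheza1:/nfs", "/srv/tftp/cheza1:/tftp"],
    )
    print(shlex.join(servo))
```

Printing the command rather than executing it leaves the actual registration
as a manual, reviewable step.
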
Once you've added your boards, you're going to need to add a little
more customization in ``/etc/gitlab-runner/config.toml``. First, add
``concurrent = <number of boards>`` at the top ("we should have up to
this many jobs running managed by this gitlab-runner"). Then for each
board's runner, set ``limit = 1`` ("only 1 job served by this board at a
time"). Finally, add the board-specific environment variables
required by your bare-metal script, something like::

  [[runners]]
    name = "google-freedreno-db410c-1"
    environment = ["BM_SERIAL=/dev/ttyDB410c8", "BM_POWERUP=google-power-up.sh 8", "BM_FASTBOOT_SERIAL=15e9e390", "FDO_CI_CONCURRENT=4"]

The ``FDO_CI_CONCURRENT`` variable should be set to the number of CPU threads on
the board, which is used for auto-tuning of job parallelism.

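The ``environment`` line is easy to get wrong by hand once each board has
several variables. As a rough sketch, assuming you keep the per-board values in
a Python dictionary, the hypothetical ``runner_environment`` helper below
renders that line. It relies on JSON string syntax being accepted as TOML
basic strings, which holds for plain values like these.

```python
import json


def runner_environment(variables):
    """Render a config.toml 'environment' line from a dict of variables."""
    entries = [f"{key}={value}" for key, value in variables.items()]
    # json.dumps produces a double-quoted, comma-separated array.
    return "environment = " + json.dumps(entries)


if __name__ == "__main__":
    db410c = {
        "BM_SERIAL": "/dev/ttyDB410c8",
        "BM_POWERUP": "google-power-up.sh 8",
        "BM_FASTBOOT_SERIAL": "15e9e390",
        "FDO_CI_CONCURRENT": 4,  # number of CPU threads on the board
    }
    print(runner_environment(db410c))
```

Run on the values from the example above, this reproduces the same
``environment = [...]`` line, ready to paste under the matching
``[[runners]]`` entry.
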
Once you've updated your runners' configs, restart with ``sudo service
gitlab-runner restart``.

Caching downloads
-----------------

To reduce the time spent downloading traces during trace job runs, you will
want a pass-through HTTP cache. On your runner box, install nginx:

.. code-block:: console

   sudo apt install nginx libnginx-mod-http-lua

Add the server setup files:

.. literalinclude:: fdo-cache
   :name: /etc/nginx/sites-available/fdo-cache
   :caption: /etc/nginx/sites-available/fdo-cache

.. literalinclude:: uri-caching.conf
   :name: /etc/nginx/snippets/uri-caching.conf
   :caption: /etc/nginx/snippets/uri-caching.conf

Edit the listener addresses in fdo-cache to suit the ethernet interface that
your devices are on.

Enable the site and restart nginx:

.. code-block:: console

   sudo rm /etc/nginx/sites-enabled/default
   sudo ln -s /etc/nginx/sites-available/fdo-cache /etc/nginx/sites-enabled/fdo-cache
   sudo systemctl restart nginx

   # First download will hit the internet
   wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo-v2.trace
   # Second download should be cached.
   wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo-v2.trace

Now, set ``download-url`` in your ``traces-*.yml`` entry to something like
``http://caching-proxy/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public``
and you should have cached downloads for traces. Add the cache URL to
``FDO_HTTP_CACHE_URI=`` in your ``config.toml`` runner environment lines as
well, and you can use it for cached artifact downloads instead of going all the
way to freedesktop.org on each job.

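To check that the cache is actually being hit, you can time two consecutive
downloads of the same file. The sketch below is an illustration only: it
assumes the ``/cache/?uri=`` URL layout used in the ``wget`` commands above,
and the ``cached_url`` and ``time_download`` helper names are hypothetical.

```python
import time
import urllib.request


def cached_url(cache_base, upstream_url):
    """Prefix an upstream URL with the pass-through cache endpoint."""
    # The upstream URL is appended as-is, matching the wget examples.
    return f"{cache_base.rstrip('/')}/cache/?uri={upstream_url}"


def time_download(url):
    """Download a URL, discard the data, and return the elapsed seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        while response.read(1 << 20):  # read in 1 MiB chunks
            pass
    return time.monotonic() - start


if __name__ == "__main__":
    trace = ("https://s3.freedesktop.org/mesa-tracie-public/"
             "itoral-gl-terrain-demo/demo-v2.trace")
    url = cached_url("http://localhost", trace)
    print("first: ", time_download(url))   # expected to hit the internet
    print("second:", time_download(url))   # expected to be served from cache
```

If the second download is not noticeably faster than the first, the listener
addresses in fdo-cache are a reasonable first thing to re-check.
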