<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<chapter id="TechnicalBackground">
  <title>Technical background</title>

  <para>The contents of this chapter are not required to use VirtualBox
  successfully. The following is provided as additional information for
  readers who are more familiar with computer architecture and technology and
  wish to find out more about how VirtualBox works "under the hood".</para>

  <sect1>
    <title>VirtualBox executables and components</title>

    <para>VirtualBox was designed to be modular and flexible. When the
    VirtualBox graphical user interface (GUI) is opened and a VM is started,
    at least three processes are running:<orderedlist>
        <listitem>
          <para><computeroutput>VBoxSVC</computeroutput>, the VirtualBox
          service process which always runs in the background. This process is
          started automatically by the first VirtualBox client process (the
          GUI, <computeroutput>VBoxManage</computeroutput>,
          <computeroutput>VBoxHeadless</computeroutput>, the web service or
          others) and exits a short time after the last client exits. The
          service is responsible for bookkeeping, maintaining the state of all
          VMs, and for providing communication between VirtualBox components.
          This communication is implemented via COM/XPCOM.<note>
              <para>When we refer to "clients" here, we mean the local clients
              of a particular <computeroutput>VBoxSVC</computeroutput> server
              process, not clients in a network. VirtualBox employs its own
              client/server design to allow its processes to cooperate, but
              all these processes run under the same user account on the host
              operating system, and this is totally transparent to the
              user.</para>
            </note></para>
        </listitem>

        <listitem>
          <para>The GUI process, <computeroutput>VirtualBox</computeroutput>,
          a client application based on the cross-platform Qt library. When
          started without the <computeroutput>--startvm</computeroutput>
          option, this application acts as the VirtualBox manager, displaying
          the VMs and their settings. It then communicates settings and state
          changes to <computeroutput>VBoxSVC</computeroutput> and also
          reflects changes effected through other means, e.g.,
          <computeroutput>VBoxManage</computeroutput>.</para>
        </listitem>

        <listitem>
          <para>If the <computeroutput>VirtualBox</computeroutput> client
          application is started with the
          <computeroutput>--startvm</computeroutput> argument, it loads the
          VMM library which includes the actual hypervisor and then runs a
          virtual machine and provides the input and output for the
          guest.</para>
        </listitem>
      </orderedlist></para>

    <para>Any VirtualBox front-end (client) will communicate with the service
    process and can both control and reflect the current state. For example,
    either the VM selector or the VM window or VBoxManage can be used to pause
    the running VM, and other components will always reflect the changed
    state.</para>
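    <para>As a rough illustration of driving the same VM state from a
    different front end, the sketch below builds command lines for
    <computeroutput>VBoxManage controlvm</computeroutput> and
    <computeroutput>VBoxManage showvminfo --machinereadable</computeroutput>.
    The helper names and the VM name "Demo VM" are assumptions made for this
    example only; the script only prints the commands and does not run
    them.</para>

    <programlisting><![CDATA[
```python
import shlex


def controlvm_command(vm_name, action):
    """Build the argument list for `VBoxManage controlvm <vm> <action>`.

    Only the two actions relevant to this example are accepted; this
    restriction is a choice of the sketch, not of VBoxManage.
    """
    if action not in {"pause", "resume"}:
        raise ValueError(f"unsupported action: {action!r}")
    return ["VBoxManage", "controlvm", vm_name, action]


def showvminfo_command(vm_name):
    """Build the argument list that queries a VM's state in parseable form."""
    return ["VBoxManage", "showvminfo", vm_name, "--machinereadable"]


if __name__ == "__main__":
    # "Demo VM" is a hypothetical VM name used only for illustration.
    for argv in (controlvm_command("Demo VM", "pause"),
                 showvminfo_command("Demo VM")):
        print(shlex.join(argv))
```
]]></programlisting>

    <para>Passing one of the returned lists to
    <computeroutput>subprocess.run</computeroutput> on a host with VirtualBox
    installed would pause the VM; a running GUI should then show the paused
    state, because every front end talks to the same
    <computeroutput>VBoxSVC</computeroutput> process.</para>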

    <para>The VirtualBox GUI application is only one of several available
    front ends (clients). The complete list shipped with VirtualBox
    is:<orderedlist>
        <listitem>
          <para><computeroutput>VirtualBox</computeroutput>, the Qt front end
          implementing the manager and running VMs;</para>
        </listitem>

        <listitem>
          <para><computeroutput>VBoxManage</computeroutput>, a less
          user-friendly but more powerful alternative, described in <xref
          linkend="vboxmanage" />.</para>
        </listitem>

        <listitem>
          <para><computeroutput>VBoxSDL</computeroutput>, a simple graphical
          front end based on the SDL library; see <xref
          linkend="vboxsdl" />.</para>
        </listitem>

        <listitem>
          <para><computeroutput>VBoxHeadless</computeroutput>, a VM front end
          which does not directly provide any video output and keyboard/mouse
          input, but allows redirection via VRDP; see <xref
          linkend="vboxheadless" />.</para>
        </listitem>

        <listitem>
          <para><computeroutput>vboxwebsrv</computeroutput>, the VirtualBox
          web service process which allows for controlling a VirtualBox host
          remotely. This is described in detail in the VirtualBox Software
          Development Kit (SDK) reference; please see <xref
          linkend="VirtualBoxAPI" /> for details.</para>
        </listitem>

        <listitem>
          <para>The VirtualBox Python shell, a Python alternative to
          VBoxManage. This is also described in the SDK reference.</para>
        </listitem>
      </orderedlist></para>

    <para>Internally, VirtualBox consists of many more or less separate
    components. You may encounter these when analyzing VirtualBox internal
    error messages or log files. These include:</para>

    <itemizedlist>
      <listitem>
        <para>IPRT, a portable runtime library which abstracts file access,
        threading, string manipulation, etc. Whenever VirtualBox accesses host
        operating system features, it does so through this library for
        cross-platform portability.</para>
      </listitem>

      <listitem>
        <para>VMM (Virtual Machine Monitor), the heart of the
        hypervisor.</para>
      </listitem>

      <listitem>
        <para>EM (Execution Manager), controls execution of guest code.</para>
      </listitem>

      <listitem>
        <para>REM (Recompiled Execution Monitor), provides software emulation
        of CPU instructions.</para>
      </listitem>

      <listitem>
        <para>TRPM (Trap Manager), intercepts and processes guest traps and
        exceptions.</para>
      </listitem>

      <listitem>
        <para>HWACCM (Hardware Acceleration Manager), provides support for
        VT-x and AMD-V.</para>
      </listitem>

      <listitem>
        <para>PDM (Pluggable Device Manager), an abstract interface between
        the VMM and emulated devices which separates device implementations
        from VMM internals and makes it easy to add new emulated devices.
        Through PDM, third-party developers can add new virtual devices to
        VirtualBox without having to change VirtualBox itself.</para>
      </listitem>

      <listitem>
        <para>PGM (Page Manager), a component controlling guest paging.</para>
      </listitem>

      <listitem>
        <para>PATM (Patch Manager), patches guest code to improve and speed up
        software virtualization.</para>
      </listitem>

      <listitem>
        <para>TM (Time Manager), handles timers and all aspects of time inside
        guests.</para>
      </listitem>

      <listitem>
        <para>CFGM (Configuration Manager), provides a tree structure which
        holds configuration settings for the VM and all emulated
        devices.</para>
      </listitem>

      <listitem>
        <para>SSM (Saved State Manager), saves and loads VM state.</para>
      </listitem>

      <listitem>
        <para>VUSB (Virtual USB), a USB layer which separates emulated USB
        controllers from the controllers on the host and from USB devices;
        this also enables remote USB.</para>
      </listitem>

      <listitem>
        <para>DBGF (Debug Facility), a built-in VM debugger.</para>
      </listitem>

      <listitem>
        <para>VirtualBox emulates a number of devices to provide the hardware
        environment that various guests need. Most of these are standard
        devices found in many PC compatible machines and widely supported by
        guest operating systems. For network and storage devices in
        particular, there are several options for the emulated devices to
        access the underlying hardware. These devices are managed by
        PDM.</para>
      </listitem>

      <listitem>
        <para>Guest Additions for various guest operating systems. This is
        code that is installed from within a virtual machine; see <xref
        linkend="guestadditions" />.</para>
      </listitem>

      <listitem>
        <para>The "Main" component is special: it ties all the above bits
        together and is the only public API that VirtualBox provides. All the
        client processes listed above use only this API and never access the
        hypervisor components directly. As a result, third-party applications
        that use the VirtualBox Main API can rely on the fact that it is
        always well-tested and that all capabilities of VirtualBox are fully
        exposed. It is this API that is described in the VirtualBox SDK
        mentioned above (again, see <xref linkend="VirtualBoxAPI" />).</para>
      </listitem>
    </itemizedlist>
  </sect1>

  <sect1 id="hwvirt">
    <title>Hardware vs. software virtualization</title>

    <para>VirtualBox allows software in the virtual machine to run directly on
    the processor of the host, but an array of complex techniques is employed
    to intercept operations that would interfere with your host. Whenever the
    guest attempts to do something that could be harmful to your computer and
    its data, VirtualBox steps in and takes action. In particular, for lots of
    hardware that the guest believes to be accessing, VirtualBox simulates a
    certain "virtual" environment according to how you have configured a
    virtual machine. For example, when the guest attempts to access a hard
    disk, VirtualBox redirects these requests to whatever you have configured
    to be the virtual machine's virtual hard disk -- normally, an image file
    on your host.</para>

    <para>Unfortunately, the x86 platform was never designed to be
    virtualized. Detecting situations in which VirtualBox needs to take
    control over the guest code that is executing, as described above, is
    difficult. There are two ways in which to achieve this:<itemizedlist>
        <listitem>
          <para>Since 2006, Intel and AMD processors have had support for
          so-called <emphasis role="bold">"hardware
          virtualization"</emphasis>. This means that these processors can
          help VirtualBox to intercept potentially dangerous operations that a
          guest operating system may be attempting and also makes it easier to
          present virtual hardware to a virtual machine.</para>

          <para>These hardware features differ between Intel and AMD
          processors. Intel named its technology <emphasis
          role="bold">VT-x</emphasis>; AMD calls theirs <emphasis
          role="bold">AMD-V</emphasis>. The Intel and AMD support for
          virtualization is very different in detail, but not very different
          in principle.<note>
              <para>On many systems, the hardware virtualization features
              first need to be enabled in the BIOS before VirtualBox can use
              them.</para>
            </note></para>
        </listitem>

        <listitem>
          <para>As opposed to other virtualization software, for many usage
          scenarios, VirtualBox does not <emphasis>require</emphasis> hardware
          virtualization features to be present. Through sophisticated
          techniques, VirtualBox virtualizes many guest operating systems
          entirely in <emphasis role="bold">software</emphasis>. This means
          that you can run virtual machines even on older processors which do
          not support hardware virtualization.</para>
        </listitem>
      </itemizedlist></para>

    <para>Even though VirtualBox does not always require hardware
    virtualization, enabling it is <emphasis>required</emphasis> in the
    following scenarios:<itemizedlist>
        <listitem>
          <para>Certain rare guest operating systems like OS/2 make use of
          very esoteric processor instructions that are not supported with our
          software virtualization. For virtual machines that are configured to
          contain such an operating system, hardware virtualization is enabled
          automatically.</para>
        </listitem>

        <listitem>
          <para>VirtualBox's 64-bit guest support (added with version 2.0) and
          multiprocessing (SMP, added with version 3.0) both require hardware
          virtualization to be enabled. (This is not much of a limitation
          since the vast majority of today's 64-bit and multicore CPUs ship
          with hardware virtualization anyway; the exceptions to this rule are
          e.g. older Intel Celeron and AMD Opteron CPUs.)</para>
        </listitem>
      </itemizedlist></para>

    <warning>
      <para>Do not run other hypervisors (open-source or commercial
      virtualization products) together with VirtualBox! While several
      hypervisors can normally be <emphasis>installed</emphasis> in parallel,
      do not attempt to <emphasis>run</emphasis> several virtual machines from
      competing hypervisors at the same time. VirtualBox cannot track what
      another hypervisor is currently attempting to do on the same host, and
      especially if several products attempt to use hardware virtualization
      features such as VT-x, this can crash the entire host. Also, within
      VirtualBox, you can mix software and hardware virtualization when
      running multiple VMs. In certain cases a small performance penalty will
      be unavoidable when mixing VT-x and software virtualization VMs. We
      recommend not mixing virtualization modes if maximum performance and low
      overhead are essential. This does <emphasis>not</emphasis> apply to
      AMD-V.</para>
    </warning>
  </sect1>

  <sect1>
    <title>Details about software virtualization</title>

    <para>Implementing virtualization on x86 CPUs with no hardware
    virtualization support is an extraordinarily complex task because the CPU
    architecture was not designed to be virtualized. The problems can usually
    be solved, but at the cost of reduced performance. Thus, there is a
    constant clash between virtualization performance and accuracy.</para>

    <para>The x86 instruction set was originally designed in the 1970s and
    underwent significant changes with the addition of protected mode in the
    1980s with the 286 CPU architecture and then again with the Intel 386 and
    its 32-bit architecture. Whereas the 386 did have limited virtualization
    support for real mode operation (V86 mode, as used by the "DOS Box" of
    Windows 3.x and OS/2 2.x), no support was provided for virtualizing the
    entire architecture.</para>

    <para>In theory, software virtualization is not overly complex. In
    addition to the four privilege levels ("rings") provided by the hardware
    (of which typically only two are used: ring 0 for kernel mode and ring 3
    for user mode), one needs to differentiate between "host context" and
    "guest context".</para>

    <para>In "host context", everything is as if no hypervisor was active.
    This might be the active mode if another application on your host has been
    scheduled CPU time; in that case, there is a host ring 3 mode and a host
    ring 0 mode. The hypervisor is not involved.</para>

    <para>In "guest context", however, a virtual machine is active. So long as
    the guest code is running in ring 3, this is not much of a problem since a
    hypervisor can set up the page tables properly and run that code natively
    on the processor. The problems mostly lie in how to intercept what the
    guest's kernel does.</para>

    <para>There are several possible solutions to these problems. One approach
    is full software emulation, usually involving recompilation. That is, all
    code to be run by the guest is analyzed, transformed into a form which
    will not allow the guest to either modify or see the true state of the
    CPU, and only then executed. This process is obviously highly complex and
    costly in terms of performance. (VirtualBox contains a recompiler based on
    QEMU which can be used for pure software emulation, but the recompiler is
    only activated in special situations, described below.)</para>

    <para>Another possible solution is paravirtualization, in which only
    specially modified guest OSes are allowed to run. This way, most of the
    hardware access is abstracted and any functions which would normally
    access the hardware or privileged CPU state are passed on to the
    hypervisor instead. Paravirtualization can achieve good functionality and
    performance on standard x86 CPUs, but it can only work if the guest OS can
    actually be modified, which is obviously not always the case.</para>

    <para>VirtualBox chooses a different approach. When starting a virtual
    machine, through its ring-0 support kernel driver, VirtualBox has set up
    the host system so that it can run most of the guest code natively, but it
    has inserted itself at the "bottom" of the picture. It can then assume
    control when needed -- if a privileged instruction is executed, the guest
    traps (in particular because an I/O register was accessed and a device
    needs to be virtualized) or external interrupts occur. VirtualBox may then
    handle this and either route a request to a virtual device or possibly
    delegate handling such things to the guest or host OS. In guest context,
    VirtualBox can therefore be in one of three states:</para>

    <para><itemizedlist>
        <listitem>
          <para>Guest ring 3 code is run unmodified, at full speed, as much as
          possible. The number of faults will generally be low (unless the
          guest allows port I/O from ring 3, something we cannot do as we
          don't want the guest to be able to access real ports). This is also
          referred to as "raw mode", as the guest ring-3 code runs
          unmodified.</para>
        </listitem>

        <listitem>
          <para>For guest code in ring 0, VirtualBox employs a nasty trick: it
          actually reconfigures the guest so that its ring-0 code is run in
          ring 1 instead (which is normally not used in x86 operating
          systems). As a result, when guest ring-0 code (actually running in
          ring 1) such as a guest device driver attempts to write to an I/O
          register or execute a privileged instruction, the VirtualBox
          hypervisor in "real" ring 0 can take over.</para>
        </listitem>

        <listitem>
          <para>The hypervisor (VMM) can be active. Every time a fault occurs,
          VirtualBox looks at the offending instruction and can delegate it to
          a virtual device, the host OS or the guest OS, or run it in the
          recompiler.</para>

          <para>In particular, the recompiler is used when guest code disables
          interrupts and VirtualBox cannot figure out when they will be
          switched back on (in these situations, VirtualBox actually analyzes
          the guest code using its own disassembler). Also, certain privileged
          instructions such as LIDT need to be handled specially. Finally, any
          real-mode or protected-mode code (e.g. BIOS code, a DOS guest, or
          any operating system startup) is run in the recompiler
          entirely.</para>
        </listitem>
      </itemizedlist></para>
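    <para>The interplay between the second and third states can be pictured
    with a small toy model. The sketch below is not VirtualBox code: the
    instruction tuples, the <computeroutput>VirtualCpu</computeroutput> class
    and all function names are invented for illustration, and the set of
    faulting instructions is deliberately simplified (on real hardware,
    whether port I/O faults also depends on IOPL and the I/O permission
    bitmap).</para>

    <programlisting><![CDATA[
```python
from dataclasses import dataclass, field

# Instructions that this toy model treats as privileged (an assumption).
PRIVILEGED = {"cli", "sti", "lidt", "out"}


class GeneralProtectionFault(Exception):
    """Raised when deprivileged code executes a privileged instruction."""


@dataclass
class VirtualCpu:
    """Virtual CPU state that the guest is allowed to change."""
    interrupts_enabled: bool = True
    idt_base: int = 0
    port_writes: list = field(default_factory=list)


def run_native(instr, ring):
    """Model the real CPU: privileged instructions fault outside ring 0."""
    op = instr[0]
    if op in PRIVILEGED and ring != 0:
        raise GeneralProtectionFault(op)


def emulate(vcpu, instr):
    """Model the hypervisor: apply the instruction to virtual state only."""
    op, *args = instr
    if op == "cli":
        vcpu.interrupts_enabled = False
    elif op == "sti":
        vcpu.interrupts_enabled = True
    elif op == "lidt":
        vcpu.idt_base = args[0]
    elif op == "out":
        port, value = args
        vcpu.port_writes.append((port, value))  # routed to a virtual device
    else:
        raise NotImplementedError(op)


def run_guest_kernel(vcpu, program):
    """Run guest ring-0 code in ring 1 and emulate whatever faults.

    Returns the number of faults the hypervisor had to handle.
    """
    faults = 0
    for instr in program:
        try:
            run_native(instr, ring=1)
        except GeneralProtectionFault:
            faults += 1
            emulate(vcpu, instr)
    return faults
```
]]></programlisting>

    <para>Running a five-instruction program such as
    <computeroutput>[("mov",), ("cli",), ("out", 0x3F8, 65), ("add",),
    ("sti",)]</computeroutput> through
    <computeroutput>run_guest_kernel</computeroutput> makes three of the five
    instructions fault in this model, which hints at why a fault-driven
    approach alone becomes expensive for kernel code.</para>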

    <para>Unfortunately this only works to a degree. Among others, the
    following situations require special handling:</para>

    <para><orderedlist>
        <listitem>
          <para>Running ring 0 code in ring 1 causes a lot of additional
          instruction faults, as ring 1 is not allowed to execute any
          privileged instructions (of which the guest's ring-0 code contains
          plenty). With each of these faults, the VMM must step in and emulate
          the code to achieve the desired behavior. While this works,
          emulating thousands of these faults is very expensive and severely
          hurts the performance of the virtualized guest.</para>
        </listitem>

        <listitem>
          <para>There are certain flaws in the implementation of ring 1 in the
          x86 architecture that were never fixed. Certain instructions that
          <emphasis>should</emphasis> trap in ring 1 don't. This affects, for
          example, the LGDT/SGDT, LIDT/SIDT, or POPF/PUSHF instruction pairs.
          Whereas the "load" operation is privileged and can therefore be
          trapped, the "store" instruction always succeeds. If the guest is
          allowed to execute these, it will see the true state of the CPU, not
          the virtualized state. The CPUID instruction also has the same
          problem.</para>
        </listitem>

        <listitem>
          <para>A hypervisor typically needs to reserve some portion of the
          guest's address space (both linear address space and selectors) for
          its own use. This is not entirely transparent to the guest OS and
          may cause clashes.</para>
        </listitem>

        <listitem>
          <para>The SYSENTER instruction (used for system calls) executed by
          an application running in a guest OS always transitions to ring 0.
          But that is where the hypervisor runs, not the guest OS. In this
          case, the hypervisor must trap and emulate the instruction even when
          it is not desirable.</para>
        </listitem>

        <listitem>
          <para>The CPU segment registers contain a "hidden" descriptor cache
          which is not software-accessible. The hypervisor cannot read, save,
          or restore this state, but the guest OS may use it.</para>
        </listitem>

        <listitem>
          <para>Some resources must (and can) be trapped by the hypervisor,
          but the access is so frequent that this creates a significant
          performance overhead. An example is the TPR (Task Priority) register
          in 32-bit mode. Accesses to this register must be trapped by the
          hypervisor, but certain guest operating systems (notably Windows and
          Solaris) write this register very often, which adversely affects
          virtualization performance.</para>
        </listitem>
      </orderedlist></para>

    <para>To fix these performance and security issues, VirtualBox contains a
    Code Scanning and Analysis Manager (CSAM), which disassembles guest code,
    and the Patch Manager (PATM), which can replace it at runtime.</para>

    <para>Before executing ring 0 code, CSAM scans it recursively to discover
    problematic instructions. PATM then performs
    <emphasis>in-situ</emphasis> patching, i.e. it replaces the instruction
    with a jump to hypervisor memory where an integrated code generator has
    placed a more suitable implementation. In reality, this is a very complex
    task as there are lots of odd situations to be discovered and handled
    correctly. So, with its current complexity, one could argue that PATM is
    an advanced <emphasis>in-situ</emphasis> recompiler.</para>

    <para>In addition, every time a fault occurs, VirtualBox analyzes the
    offending code to determine if it is possible to patch it in order to
    prevent it from causing more faults in the future. This approach works
    well in practice and dramatically improves software virtualization
    performance.</para>
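    <para>The division of labour between scanning and patching can be
    sketched in a few lines. This is a heavily simplified toy, not the actual
    CSAM or PATM logic: it scans a flat list of instruction tuples linearly
    rather than following control flow recursively, and the names
    <computeroutput>scan</computeroutput>,
    <computeroutput>patch</computeroutput> and
    <computeroutput>jmp_patch</computeroutput> are made up for this
    example.</para>

    <programlisting><![CDATA[
```python
# Instructions that would expose or alter real CPU state if run natively
# in ring 1 (taken from the examples in the text above).
PROBLEMATIC = {"sgdt", "sidt", "pushf", "popf", "cpuid"}


def scan(code):
    """Return the indexes of instructions that must not run natively."""
    return [i for i, instr in enumerate(code) if instr[0] in PROBLEMATIC]


def patch(code):
    """Replace each flagged instruction with a jump into a patch table.

    Returns the patched copy of the code and the patch table. Each table
    entry holds the original instruction, standing in for the replacement
    code a real code generator would place in hypervisor memory.
    """
    patched = list(code)
    patch_table = []
    for i in scan(code):
        patch_table.append(code[i])
        patched[i] = ("jmp_patch", len(patch_table) - 1)
    return patched, patch_table
```
]]></programlisting>

    <para>Because the flagged instructions are rewritten once, later
    executions of the same code path reach the replacement directly instead
    of faulting each time, which is the performance benefit the text
    describes.</para>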
  </sect1>

  <sect1>
    <title>Details about hardware virtualization</title>

    <para>With Intel VT-x, there are two distinct modes of CPU operation: VMX
    root mode and non-root mode.<itemizedlist>
        <listitem>
          <para>In root mode, the CPU operates much like older generations of
          processors without VT-x support. There are four privilege levels
          ("rings"), and the same instruction set is supported, with the
          addition of several virtualization-specific instructions. Root mode
          is what a host operating system without virtualization uses, and it
          is also used by a hypervisor when virtualization is active.</para>
        </listitem>

        <listitem>
          <para>In non-root mode, CPU operation is significantly different.
          There are still four privilege rings and the same instruction set,
          but a new structure called VMCS (Virtual Machine Control Structure)
          now controls the CPU operation and determines how certain
          instructions behave. Non-root mode is where guest systems
          run.</para>
        </listitem>
      </itemizedlist></para>

    <para>Switching from root mode to non-root mode is called "VM entry"; the
    switch back is "VM exit". The VMCS includes a guest and host state area
    which is saved/restored at VM entry and exit. Most importantly, the VMCS
    controls which guest operations will cause VM exits.</para>

    <para>The VMCS provides fairly fine-grained control over what the guests
    can and can't do. For example, a hypervisor can allow a guest to write
    certain bits in shadowed control registers, but not others. This enables
    efficient virtualization in cases where guests can be allowed to write
    control bits without disrupting the hypervisor, while preventing them from
    altering control bits over which the hypervisor needs to retain full
    control. The VMCS also provides control over interrupt delivery and
    exceptions.</para>
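    <para>The per-bit control over a shadowed control register can be modelled
    with two bit masks. The sketch below follows our reading of how VT-x
    handles CR0 (a guest/host mask that marks hypervisor-owned bits, plus a
    read shadow that supplies the values the guest sees for those bits); the
    function names are invented, and the model ignores everything else that
    happens on a real VM exit.</para>

    <programlisting><![CDATA[
```python
CR0_PE = 1 << 0   # protection enable
CR0_TS = 1 << 3   # task switched
CR0_PG = 1 << 31  # paging


def guest_read_cr0(real_cr0, mask, read_shadow):
    """Value the guest sees when it reads CR0.

    Bits set in `mask` are owned by the hypervisor and come from the read
    shadow; all other bits come from the real register.
    """
    return (read_shadow & mask) | (real_cr0 & ~mask)


def write_causes_exit(new_value, mask, read_shadow):
    """True if a guest write to CR0 would cause a VM exit.

    The write exits when it tries to give any hypervisor-owned bit a value
    that differs from the read shadow; writes that touch only guest-owned
    bits proceed without an exit.
    """
    return ((new_value ^ read_shadow) & mask) != 0
```
]]></programlisting>

    <para>With <computeroutput>mask = CR0_PE | CR0_PG</computeroutput> and a
    read shadow that has both bits set, a guest write that only toggles
    <computeroutput>CR0_TS</computeroutput> does not exit, whereas a write
    that clears <computeroutput>CR0_PG</computeroutput> does.</para>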

    <para>Whenever an instruction or event causes a VM exit, the VMCS contains
    information about the exit reason, often with accompanying detail. For
    example, if a write to the CR0 register causes an exit, the offending
    instruction is recorded, along with the fact that a write access to a
    control register caused the exit, and information about source and
    destination register. Thus the hypervisor can efficiently handle the
    condition without needing advanced techniques such as CSAM and PATM
    described above.</para>

    <para>VT-x inherently avoids several of the problems which software
    virtualization faces. The guest has its own completely separate address
    space not shared with the hypervisor, which eliminates potential clashes.
    Additionally, guest OS kernel code runs at privilege ring 0 in VMX
    non-root mode, obviating the problems caused by running ring 0 code at
    less privileged levels. For example, the SYSENTER instruction can
    transition to ring 0 without causing problems. Naturally, even at ring 0
    in VMX non-root mode, any I/O access by guest code still causes a VM exit,
    allowing for device emulation.</para>

    <para>The biggest difference between VT-x and AMD-V is that AMD-V provides
    a more complete virtualization environment. VT-x requires the VMX non-root
    code to run with paging enabled, which precludes hardware virtualization
    of real-mode code and non-paged protected-mode software. This typically
    only includes firmware and OS loaders, but nevertheless complicates VT-x
    hypervisor implementation. AMD-V does not have this restriction.</para>

    <para>Of course hardware virtualization is not perfect. Compared to
    software virtualization, the overhead of VM exits is relatively high. This
    causes problems for devices whose emulation requires a high number of
    traps. One example is the VGA device in 16-color modes, where not only
    every I/O port access but also every access to the framebuffer memory must
    be trapped.</para>
  </sect1>

  <sect1 id="nestedpaging">
    <title>Nested paging and VPIDs</title>

    <para>In addition to "plain" hardware virtualization, your processor may
    also support additional sophisticated techniques:<footnote>
        <para>VirtualBox 2.0 added support for AMD's nested paging; support
        for Intel's EPT and VPIDs was added with version 2.1.</para>
      </footnote><itemizedlist>
        <listitem>
          <para>A newer feature called <emphasis role="bold">"nested
          paging"</emphasis> implements some memory management in hardware,
          which can greatly accelerate hardware virtualization since these
          tasks no longer need to be performed by the virtualization
          software.</para>

          <para>With nested paging, the hardware provides another level of
          indirection when translating linear to physical addresses. Page
          tables function as before, but linear addresses are now translated
          to "guest physical" addresses first and not physical addresses
          directly. A new set of paging registers now exists under the
          traditional paging mechanism and translates from guest physical
          addresses to host physical addresses, which are used to access
          memory.</para>

          <para>Nested paging eliminates the overhead caused by VM exits and
          page table accesses. In essence, with nested page tables the guest
          can handle paging without intervention from the hypervisor. Nested
          paging thus significantly improves virtualization
          performance.</para>

          <para>On AMD processors, nested paging has been available starting
          with the Barcelona (K10) architecture; Intel added support for
          nested paging, which they call "extended page tables" (EPT), with
          their Core i7 (Nehalem) processors.</para>

          <para>If nested paging is enabled, the VirtualBox hypervisor can
          also use <emphasis role="bold">large pages</emphasis> to reduce TLB
          usage and overhead. This can yield a performance improvement of up
          to 5%. To enable this feature for a VM, you need to use the
          <computeroutput>VBoxManage modifyvm --largepages</computeroutput>
          command; see <xref linkend="vboxmanage-modifyvm" />.</para>
        </listitem>

        <listitem>
          <para>On Intel CPUs, another hardware feature called <emphasis
          role="bold">"Virtual Processor Identifiers" (VPIDs)</emphasis> can
          greatly accelerate context switching by reducing the need for
          expensive flushing of the processor's Translation Lookaside Buffers
          (TLBs).</para>

          <para>To enable these features for a VM, you need to use the
          <computeroutput>VBoxManage modifyvm --vtxvpid</computeroutput> and
          <computeroutput>--largepages</computeroutput> commands; see <xref
          linkend="vboxmanage-modifyvm" />.</para>
        </listitem>
      </itemizedlist></para>
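    <para>The extra level of indirection that nested paging introduces can be
    shown with a deliberately flat model. In the sketch below each "page
    table" is just a dictionary from page number to page number, which is an
    assumption made for brevity: real page tables are multi-level structures,
    and with nested paging the guest's own page table accesses are themselves
    translated. All names are illustrative.</para>

    <programlisting><![CDATA[
```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when an address has no mapping in the table being walked."""


def translate(table, address, what):
    """Translate one address through a flat page map (page -> page)."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise PageFault(f"{what} page {page:#x} not mapped")
    return table[page] * PAGE_SIZE + offset


def guest_linear_to_host_physical(guest_tables, nested_tables, linear):
    """Apply both translation stages in the order described in the text."""
    # Stage 1: the guest's own page tables, maintained by the guest OS.
    guest_physical = translate(guest_tables, linear, "guest linear")
    # Stage 2: the nested tables, maintained by the hypervisor.
    return translate(nested_tables, guest_physical, "guest physical")
```
]]></programlisting>

    <para>For instance, with guest tables mapping page 0x10 to page 0x2 and
    nested tables mapping page 0x2 to page 0x80, the guest linear address
    0x10123 resolves to guest physical 0x2123 and then to host physical
    0x80123. The guest can change its own mapping without the hypervisor
    having to touch the second stage.</para>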
  </sect1>
</chapter>