VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR3/NEMR3.cpp@ 71275

Last change on this file since 71275 was 71275, checked in by vboxsync, 7 years ago

NEM: working on the @page docs for windows. bugref:9044

  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
File size: 18.9 KB
/* $Id: NEMR3.cpp 71275 2018-03-08 14:31:07Z vboxsync $ */
/** @file
 * NEM - Native execution manager.
 */

/*
 * Copyright (C) 2018 Oracle Corporation
 *
 * This file is part of VirtualBox Open Source Edition (OSE), as
 * available from http://www.virtualbox.org. This file is free software;
 * you can redistribute it and/or modify it under the terms of the GNU
 * General Public License (GPL) as published by the Free Software
 * Foundation, in version 2 as it comes in the "COPYING" file of the
 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
 */
/** @page pg_nem NEM - Native Execution Manager.
 *
 * Later.
 *
 *
 * @section sec_nem_win Windows
 *
 * On Windows the Hyper-V root partition (dom0 in Xen terminology) does not have
 * nested VT-x or AMD-V capabilities.  For a while raw-mode worked in it,
 * however now we \#GP when modifying CR4.  So, when Hyper-V is active on
 * Windows we have little choice but to use Hyper-V to run our VMs.
 *
 * @subsection subsec_nem_win_whv The WinHvPlatform API
 *
 * Since Windows 10 build 17083 there is a documented API for managing Hyper-V
 * VMs: the header file WinHvPlatform.h and the implementation in
 * WinHvPlatform.dll.  This interface is a wrapper around the undocumented
 * Virtualization Infrastructure Driver (VID) API - VID.DLL and VID.SYS.  The
 * wrapper is written in C++, is namespaced, and early versions (at least) were
 * using standard container templates in several places.
 *
 * When creating a VM using WHvCreatePartition, it will only create the
 * WinHvPlatform structures for it, to which you get an abstract pointer.  The
 * VID API that actually creates the partition is first engaged when you call
 * WHvSetupPartition after first setting a lot of properties using
 * WHvSetPartitionProperty.  Since the VID API is just a very thin wrapper
 * around CreateFile and NtDeviceIoControl, it returns an actual HANDLE for the
 * partition to WinHvPlatform.  We fish this HANDLE out of the WinHvPlatform
 * partition structures because we need to talk directly to VID for reasons
 * we'll get to in a bit.  (Btw. we could also intercept the CreateFileW or
 * NtDeviceIoControl calls from VID.DLL to get the HANDLE should fishing in the
 * partition structures become difficult.)
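 *
 * As a rough sketch of the creation sequence just described (the property
 * value and the error handling here are illustrative assumptions, not what we
 * actually do):
 * @code
 *      WHV_PARTITION_HANDLE hPartition;
 *      HRESULT hrc = WHvCreatePartition(&hPartition);   // Only WinHvPlatform structures so far.
 *      if (SUCCEEDED(hrc))
 *      {
 *          WHV_PARTITION_PROPERTY Property;
 *          RT_ZERO(Property);
 *          Property.ProcessorCount = 1;                 // Must be set before setup.
 *          hrc = WHvSetPartitionProperty(hPartition, WHvPartitionPropertyCodeProcessorCount,
 *                                        &Property, sizeof(Property));
 *          if (SUCCEEDED(hrc))
 *              hrc = WHvSetupPartition(hPartition);     // First point where VID is engaged.
 *      }
 * @endcode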
 *
 * The WinHvPlatform API requires us to both set the number of guest CPUs before
 * setting up the partition and to call WHvCreateVirtualProcessor for each of
 * them.  The CPU creation function boils down to a VidMessageSlotMap call that
 * sets up and maps a message buffer into ring-3 for async communication with
 * hyper-V and/or the VID.SYS thread actually running the CPU.  When, for
 * instance, a VMEXIT is encountered, hyper-V sends a message that the
 * WHvRunVirtualProcessor API retrieves (and later acknowledges) via
 * VidMessageSlotHandleAndGetNext.  It should be noted that
 * WHvDeleteVirtualProcessor doesn't do much, as there seems to be no partner
 * function to VidMessageSlotMap that reverses what it did.
 *
 * Memory is managed thru calls to WHvMapGpaRange and WHvUnmapGpaRange (GPA does
 * not mean grade point average here, but rather guest physical address space),
 * which correspond to VidCreateVaGpaRangeSpecifyUserVa and VidDestroyGpaRange
 * respectively.  As 'UserVa' indicates, the functions work on user process
 * memory.  The mappings are also subject to quota restrictions, so the number
 * of ranges is limited, and probably their total size as well.  Obviously
 * VID.SYS keeps track of the ranges, but so does WinHvPlatform, which means
 * there is a bit of overhead involved and the quota restrictions make sense.
 * For some reason though, regions are lazily mapped on VMEXIT (memory access)
 * by WHvRunVirtualProcessor.
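 *
 * A minimal mapping sketch (variable names here are made up for illustration):
 * @code
 *      // Map 'cb' bytes of our user memory at 'pvR3' into the guest at 'GCPhys':
 *      hrc = WHvMapGpaRange(hPartition, pvR3, GCPhys, cb,
 *                             WHvMapGpaRangeFlagRead
 *                           | WHvMapGpaRangeFlagWrite
 *                           | WHvMapGpaRangeFlagExecute);
 *      // ... and the corresponding unmap:
 *      hrc = WHvUnmapGpaRange(hPartition, GCPhys, cb);
 * @endcode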
 *
 * Running guest code is done thru the WHvRunVirtualProcessor function.  It
 * asynchronously starts or resumes hyper-V CPU execution and then waits for a
 * VMEXIT message.  Other threads can interrupt the execution by using
 * WHvCancelVirtualProcessor, in which case the thread in
 * WHvRunVirtualProcessor is woken up via a dummy QueueUserAPC and will call
 * VidStopVirtualProcessor to asynchronously end execution.  The stop CPU call
 * may not immediately succeed if the CPU encountered a VMEXIT before the stop
 * was processed, in which case the VMEXIT needs to be processed first, and the
 * pending stop will be processed in a subsequent call to
 * WHvRunVirtualProcessor.
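 *
 * A simplified run loop sketch (exit reason handling elided, local names
 * illustrative):
 * @code
 *      WHV_RUN_VP_EXIT_CONTEXT ExitCtx;
 *      for (;;)
 *      {
 *          HRESULT hrc = WHvRunVirtualProcessor(hPartition, iCpu, &ExitCtx, sizeof(ExitCtx));
 *          if (FAILED(hrc))
 *              break;
 *          // ... dispatch on ExitCtx.ExitReason, e.g. WHvRunVpExitReasonX64Halt ...
 *      }
 *
 *      // Another thread interrupts the above like this:
 *      WHvCancelVirtualProcessor(hPartition, iCpu, 0);
 * @endcode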
 *
 * {something about registers}
 *
 * @subsubsection subsubsec_nem_win_whv_cons Issues / Disadvantages
 *
 * Here are some observations:
 *
 * - The WHvCancelVirtualProcessor API schedules a dummy usermode APC callback
 *   in order to cancel any current or future alertable wait in VID.SYS during
 *   the VidMessageSlotHandleAndGetNext call.
 *
 *   IIRC this will make the kernel schedule the callback thru
 *   NTDLL!KiUserApcDispatcher by modifying the thread context and quite
 *   possibly the userland thread stack.  When the APC callback returns to
 *   KiUserApcDispatcher, it will call NtContinue to restore the old thread
 *   context and resume execution from there.  The upshot is that this is a bit
 *   expensive.
 *
 *   Using an NtAlertThread call could do the same without the thread context
 *   modifications and the extra kernel call.
 *
 *
 * - Not sure if this is a thing, but WHvCancelVirtualProcessor seems to cause
 *   a lot more spurious WHvRunVirtualProcessor returns than what we get with
 *   the replacement code.  By spurious returns we mean that the subsequent
 *   call to WHvRunVirtualProcessor would return immediately.
 *
 *
 * - When WHvRunVirtualProcessor returns without a message, or on a terse
 *   VID message like HLT, it will make a kernel call to get some registers.
 *   This is potentially inefficient if the caller decides he needs more
 *   register state.
 *
 *   It would be better to just return what's available and let the caller
 *   fetch what is missing from his point of view in a single kernel call.
 *
 *
 * - The WHvRunVirtualProcessor implementation does lazy GPA range mappings
 *   when an unmapped GPA message is received from hyper-V.
 *
 *   Since MMIO is currently realized as unmapped GPA, this will slow down all
 *   MMIO accesses a tiny little bit, as WHvRunVirtualProcessor looks up the
 *   guest physical address and checks if it's a pending lazy mapping.
 *
 *
 * - There is no API for modifying the protection of a page within a GPA range.
 *
 *   We're left with having to unmap the range and then remap it with the new
 *   protection.  For instance, we're actively using this to track dirty VRAM
 *   pages, which means there are occasional readonly->writable transitions at
 *   run time followed by a bulk reversal to readonly when the display is
 *   refreshed.
 *
 *   To work around the issue, we use page sized GPA ranges.  In addition to
 *   adding a lot of tracking overhead to WinHvPlatform and VID.SYS, this also
 *   causes us to exceed our quota before we've even mapped a default sized
 *   VRAM page-by-page.  So, to work around this quota issue we have to lazily
 *   map pages and actively restrict the number of mappings.
 *
 *   Our best workaround thus far is bypassing WinHvPlatform and VID when it
 *   comes to memory and instead using the hypercalls (HvCallMapGpaPages,
 *   HvCallUnmapGpaPages) to do it.  (This also maps a whole lot better onto
 *   our own guest page management infrastructure.)
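 *
 *   The readonly -> writable upgrade for a single tracked page described
 *   above thus becomes an unmap + remap pair (sketch; the page variables are
 *   illustrative):
 *   @code
 *      hrc = WHvUnmapGpaRange(hPartition, GCPhysPage, X86_PAGE_SIZE);
 *      if (SUCCEEDED(hrc))
 *          hrc = WHvMapGpaRange(hPartition, pvPage, GCPhysPage, X86_PAGE_SIZE,
 *                               WHvMapGpaRangeFlagRead | WHvMapGpaRangeFlagWrite);
 *   @endcode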
 *
 *
 * - Observed problems doing WHvUnmapGpaRange followed by WHvMapGpaRange.
 *
 *   As mentioned above, we've been forced to use this sequence when modifying
 *   page protection.  However, when upgrading from readonly to writable, we've
 *   ended up looping forever with the same write-to-readonly-memory exit.
 *
 *   Workaround: Insert a WHvRunVirtualProcessor call and make sure to get a
 *   GPA unmapped exit between the two calls.  Terrible for performance and
 *   code sanity.
 *
 *
 * - WHvRunVirtualProcessor wastes time converting VID/Hyper-V messages to its
 *   own defined format.
 *
 *   We understand this might be because Microsoft wishes to remain free to
 *   modify the VID/Hyper-V messages, but it's still rather silly and does
 *   slow things down.
 *
 *
 * - WHvRunVirtualProcessor would've benefited from using a callback interface:
 *   - The potential size changes of the exit context structure wouldn't be
 *     an issue, since the function could manage that itself.
 *   - State handling could be optimized and simplified (esp. cancellation).
 *
 *
 * - WHvGetVirtualProcessorRegisters and WHvSetVirtualProcessorRegisters
 *   internally convert register names, probably using temporary heap buffers.
 *
 *   From the looks of things, it's converting from WHV_REGISTER_NAME to the
 *   HV_REGISTER_NAME documented in the "Virtual Processor Register Names"
 *   section of the "Hypervisor Top-Level Functional Specification".  This
 *   feels like an awful waste of time.  We simply cannot understand why it
 *   wouldn't have sufficed to use HV_REGISTER_NAME here and simply check the
 *   input values if restrictions were desired.
 *
 *   To avoid the heap + conversion overhead, we're currently using the
 *   HvCallGetVpRegisters and HvCallSetVpRegisters calls directly.
 *
 *
 * - Why does WINHVR.SYS (or VID.SYS) only query/set 32 registers at a time
 *   thru the HvCallGetVpRegisters and HvCallSetVpRegisters hypercalls?
 *
 *   We've had no trouble getting/setting all the registers defined by
 *   WHV_REGISTER_NAME in one hypercall...
 *
 *
 * - .
 *
 */


/*********************************************************************************************************************************
*   Header Files                                                                                                                 *
*********************************************************************************************************************************/
#define LOG_GROUP LOG_GROUP_NEM
#include <VBox/vmm/nem.h>
#include "NEMInternal.h"
#include <VBox/vmm/vm.h>

#include <iprt/asm.h>


/**
 * Basic init and configuration reading.
 *
 * Always call NEMR3Term after calling this.
 *
 * @returns VBox status code.
 * @param   pVM     The cross context VM structure.
 */
VMMR3_INT_DECL(int) NEMR3InitConfig(PVM pVM)
{
    LogFlow(("NEMR3Init\n"));

    /*
     * Assert alignment and sizes.
     */
    AssertCompileMemberAlignment(VM, nem.s, 64);
    AssertCompile(sizeof(pVM->nem.s) <= sizeof(pVM->nem.padding));

    /*
     * Initialize state info so NEMR3Term will always be happy.
     * No returning prior to setting magics!
     */
    pVM->nem.s.u32Magic = NEM_MAGIC;
    for (VMCPUID iCpu = 0; iCpu < pVM->cCpus; iCpu++)
        pVM->aCpus[iCpu].nem.s.u32Magic = NEMCPU_MAGIC;

    /*
     * Read configuration.
     */
    PCFGMNODE pCfgNem = CFGMR3GetChild(CFGMR3GetRoot(pVM), "NEM/");

    /*
     * Validate the NEM settings.
     */
    int rc = CFGMR3ValidateConfig(pCfgNem,
                                  "/NEM/",
                                  "Enabled",
                                  "" /* pszValidNodes */, "NEM" /* pszWho */, 0 /* uInstance */);
    if (RT_FAILURE(rc))
        return rc;

    /** @cfgm{/NEM/Enabled, bool, true}
     * Whether NEM is enabled. */
    rc = CFGMR3QueryBoolDef(pCfgNem, "Enabled", &pVM->nem.s.fEnabled, true);
    AssertLogRelRCReturn(rc, rc);

    return VINF_SUCCESS;
}


/**
 * This is called by HMR3Init() when HM cannot be used.
 *
 * Sets VM::bMainExecutionEngine to VM_EXEC_ENGINE_NATIVE_API if we can use a
 * native hypervisor API to execute the VM.
 *
 * @returns VBox status code.
 * @param   pVM         The cross context VM structure.
 * @param   fFallback   Whether this is a fallback call.  Cleared if the VM is
 *                      configured to use NEM instead of HM.
 * @param   fForced     Whether /HM/HMForced was set.  If set and we fail to
 *                      enable NEM, we'll return a failure status code.
 *                      Otherwise we'll assume HMR3Init falls back on raw-mode.
 */
VMMR3_INT_DECL(int) NEMR3Init(PVM pVM, bool fFallback, bool fForced)
{
    Assert(pVM->bMainExecutionEngine != VM_EXEC_ENGINE_NATIVE_API);
    int rc;
    if (pVM->nem.s.fEnabled)
    {
#ifdef VBOX_WITH_NATIVE_NEM
        rc = nemR3NativeInit(pVM, fFallback, fForced);
        ASMCompilerBarrier(); /* May have changed bMainExecutionEngine. */
#else
        RT_NOREF(fFallback);
        rc = VINF_SUCCESS;
#endif
        if (RT_SUCCESS(rc))
        {
            if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
                LogRel(("NEM: NEMR3Init: Active.\n"));
            else
            {
                LogRel(("NEM: NEMR3Init: Not available.\n"));
                if (fForced)
                    rc = VERR_NEM_NOT_AVAILABLE;
            }
        }
        else
            LogRel(("NEM: NEMR3Init: Native init failed: %Rrc.\n", rc));
    }
    else
    {
        LogRel(("NEM: NEMR3Init: Disabled.\n"));
        rc = fForced ? VERR_NEM_NOT_ENABLED : VINF_SUCCESS;
    }
    return rc;
}


/**
 * Perform initialization that depends on CPUM working.
 *
 * This is a noop if NEM wasn't activated by a previous NEMR3Init() call.
 *
 * @returns VBox status code.
 * @param   pVM     The cross context VM structure.
 */
VMMR3_INT_DECL(int) NEMR3InitAfterCPUM(PVM pVM)
{
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeInitAfterCPUM(pVM);
#else
    RT_NOREF(pVM);
#endif
    return rc;
}


/**
 * Called when an init phase has completed.
 *
 * @returns VBox status code.
 * @param   pVM         The cross context VM structure.
 * @param   enmWhat     The phase that completed.
 */
VMMR3_INT_DECL(int) NEMR3InitCompleted(PVM pVM, VMINITCOMPLETED enmWhat)
{
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeInitCompleted(pVM, enmWhat);
#else
    RT_NOREF(pVM, enmWhat);
#endif
    return rc;
}


/**
 * Terminates the NEM.
 *
 * @returns VBox status code.
 * @param   pVM     The cross context VM structure.
 */
VMMR3_INT_DECL(int) NEMR3Term(PVM pVM)
{
    AssertReturn(pVM->nem.s.u32Magic == NEM_MAGIC, VERR_WRONG_ORDER);
    for (VMCPUID iCpu = 0; iCpu < pVM->cCpus; iCpu++)
        AssertReturn(pVM->aCpus[iCpu].nem.s.u32Magic == NEMCPU_MAGIC, VERR_WRONG_ORDER);

    /* Do native termination. */
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeTerm(pVM);
#endif

    /* Mark it as terminated. */
    for (VMCPUID iCpu = 0; iCpu < pVM->cCpus; iCpu++)
        pVM->aCpus[iCpu].nem.s.u32Magic = NEMCPU_MAGIC_DEAD;
    pVM->nem.s.u32Magic = NEM_MAGIC_DEAD;
    return rc;
}


/**
 * The VM is being reset.
 *
 * @param   pVM     The cross context VM structure.
 */
VMMR3_INT_DECL(void) NEMR3Reset(PVM pVM)
{
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        nemR3NativeReset(pVM);
#else
    RT_NOREF(pVM);
#endif
}


/**
 * Resets a virtual CPU.
 *
 * Used to bring up secondary CPUs on SMP as well as CPU hot plugging.
 *
 * @param   pVCpu       The cross context virtual CPU structure to reset.
 * @param   fInitIpi    Set if being reset due to INIT IPI.
 */
VMMR3_INT_DECL(void) NEMR3ResetCpu(PVMCPU pVCpu, bool fInitIpi)
{
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVCpu->pVMR3->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        nemR3NativeResetCpu(pVCpu, fInitIpi);
#else
    RT_NOREF(pVCpu, fInitIpi);
#endif
}


/**
 * Runs guest code via the native hypervisor API.
 *
 * @returns Strict VBox status code.
 * @param   pVM     The cross context VM structure.
 * @param   pVCpu   The cross context virtual CPU structure.
 */
VMMR3_INT_DECL(VBOXSTRICTRC) NEMR3RunGC(PVM pVM, PVMCPU pVCpu)
{
    Assert(VM_IS_NEM_ENABLED(pVM));
#ifdef VBOX_WITH_NATIVE_NEM
    return nemR3NativeRunGC(pVM, pVCpu);
#else
    NOREF(pVM); NOREF(pVCpu);
    return VERR_INTERNAL_ERROR_3;
#endif
}


/**
 * Checks whether NEM can execute the guest in its current state.
 *
 * @returns true if NEM can execute the guest, false if not.
 * @param   pVM     The cross context VM structure.
 * @param   pVCpu   The cross context virtual CPU structure.
 * @param   pCtx    The guest CPU context.
 */
VMMR3_INT_DECL(bool) NEMR3CanExecuteGuest(PVM pVM, PVMCPU pVCpu, PCPUMCTX pCtx)
{
    Assert(VM_IS_NEM_ENABLED(pVM));
#ifdef VBOX_WITH_NATIVE_NEM
    return nemR3NativeCanExecuteGuest(pVM, pVCpu, pCtx);
#else
    NOREF(pVM); NOREF(pVCpu); NOREF(pCtx);
    return false;
#endif
}


/**
 * Toggles single instruction execution, forwarding the request to the native
 * backend.
 *
 * @returns See nemR3NativeSetSingleInstruction.
 * @param   pVM     The cross context VM structure.
 * @param   pVCpu   The cross context virtual CPU structure.
 * @param   fEnable Whether to enable or disable it.
 */
VMMR3_INT_DECL(bool) NEMR3SetSingleInstruction(PVM pVM, PVMCPU pVCpu, bool fEnable)
{
    Assert(VM_IS_NEM_ENABLED(pVM));
#ifdef VBOX_WITH_NATIVE_NEM
    return nemR3NativeSetSingleInstruction(pVM, pVCpu, fEnable);
#else
    NOREF(pVM); NOREF(pVCpu); NOREF(fEnable);
    return false;
#endif
}


/**
 * Forwards a force-flag notification to the native backend.
 *
 * @param   pVM     The cross context VM structure.
 * @param   pVCpu   The cross context virtual CPU structure.
 * @param   fFlags  Notification flags.
 */
VMMR3_INT_DECL(void) NEMR3NotifyFF(PVM pVM, PVMCPU pVCpu, uint32_t fFlags)
{
    AssertLogRelReturnVoid(VM_IS_NEM_ENABLED(pVM));
#ifdef VBOX_WITH_NATIVE_NEM
    nemR3NativeNotifyFF(pVM, pVCpu, fFlags);
#else
    RT_NOREF(pVM, pVCpu, fFlags);
#endif
}


/**
 * Notifies NEM that a physical RAM range has been registered.
 */
VMMR3_INT_DECL(int) NEMR3NotifyPhysRamRegister(PVM pVM, RTGCPHYS GCPhys, RTGCPHYS cb)
{
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeNotifyPhysRamRegister(pVM, GCPhys, cb);
#else
    NOREF(pVM); NOREF(GCPhys); NOREF(cb);
#endif
    return rc;
}


/**
 * Notifies NEM that an MMIO or MMIO2 region is being mapped.
 */
VMMR3_INT_DECL(int) NEMR3NotifyPhysMmioExMap(PVM pVM, RTGCPHYS GCPhys, RTGCPHYS cb, uint32_t fFlags, void *pvMmio2)
{
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeNotifyPhysMmioExMap(pVM, GCPhys, cb, fFlags, pvMmio2);
#else
    NOREF(pVM); NOREF(GCPhys); NOREF(cb); NOREF(fFlags); NOREF(pvMmio2);
#endif
    return rc;
}


/**
 * Notifies NEM that an MMIO or MMIO2 region is being unmapped.
 */
VMMR3_INT_DECL(int) NEMR3NotifyPhysMmioExUnmap(PVM pVM, RTGCPHYS GCPhys, RTGCPHYS cb, uint32_t fFlags)
{
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeNotifyPhysMmioExUnmap(pVM, GCPhys, cb, fFlags);
#else
    NOREF(pVM); NOREF(GCPhys); NOREF(cb); NOREF(fFlags);
#endif
    return rc;
}


/**
 * Early notification that a ROM range is being registered.
 */
VMMR3_INT_DECL(int) NEMR3NotifyPhysRomRegisterEarly(PVM pVM, RTGCPHYS GCPhys, RTGCPHYS cb, uint32_t fFlags)
{
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeNotifyPhysRomRegisterEarly(pVM, GCPhys, cb, fFlags);
#else
    NOREF(pVM); NOREF(GCPhys); NOREF(cb); NOREF(fFlags);
#endif
    return rc;
}


/**
 * Called after the ROM range registration has been fully completed.
 *
 * This will be preceded by a NEMR3NotifyPhysRomRegisterEarly() call as well as
 * a number of NEMHCNotifyPhysPageProtChanged calls.
 *
 * @returns VBox status code
 * @param   pVM     The cross context VM structure.
 * @param   GCPhys  The ROM address (page aligned).
 * @param   cb      The size (page aligned).
 * @param   fFlags  NEM_NOTIFY_PHYS_ROM_F_XXX.
 */
VMMR3_INT_DECL(int) NEMR3NotifyPhysRomRegisterLate(PVM pVM, RTGCPHYS GCPhys, RTGCPHYS cb, uint32_t fFlags)
{
    int rc = VINF_SUCCESS;
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVM->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        rc = nemR3NativeNotifyPhysRomRegisterLate(pVM, GCPhys, cb, fFlags);
#else
    NOREF(pVM); NOREF(GCPhys); NOREF(cb); NOREF(fFlags);
#endif
    return rc;
}


/**
 * Notifies NEM of a change in the state of the A20 gate.
 */
VMMR3_INT_DECL(void) NEMR3NotifySetA20(PVMCPU pVCpu, bool fEnabled)
{
#ifdef VBOX_WITH_NATIVE_NEM
    if (pVCpu->pVMR3->bMainExecutionEngine == VM_EXEC_ENGINE_NATIVE_API)
        nemR3NativeNotifySetA20(pVCpu, fEnabled);
#else
    NOREF(pVCpu); NOREF(fEnabled);
#endif
}