VirtualBox

Changeset 71279 in vbox for trunk/src/VBox/VMM/VMMR3


Timestamp: Mar 8, 2018 7:56:47 PM
Author: vboxsync
svn:sync-xref-src-repo-rev: 121208
Message: NEM: working on the @page docs for windows. bugref:9044

File: 1 edited

  • trunk/src/VBox/VMM/VMMR3/NEMR3.cpp

--- trunk/src/VBox/VMM/VMMR3/NEMR3.cpp (r71275)
+++ trunk/src/VBox/VMM/VMMR3/NEMR3.cpp (r71279)
@@ -24,7 +24,9 @@
  *
  * On Windows the Hyper-V root partition (dom0 in Xen terminology) does not have
- * nested VT-x or AMD-V capabilities.  For a while raw-mode worked in it,
- * however now we \#GP when modifying CR4.  So, when Hyper-V is active on
- * Windows we have little choice but to use Hyper-V to run our VMs.
+ * nested VT-x or AMD-V capabilities.  For a while raw-mode worked inside it,
+ * but for some time now we've been getting \#GP when trying to modify CR4 in the
+ * world switcher.  So, when Hyper-V is active on Windows we have little choice
+ * but to use Hyper-V to run our VMs.
+ *
  *
  * @subsection subsec_nem_win_whv   The WinHvPlatform API
     
@@ -34,5 +36,5 @@
  * This interface is a wrapper around the undocumented Virtualization
  * Infrastructure Driver (VID) API - VID.DLL and VID.SYS.  The wrapper is
- * written in C++, namespaced and early version (at least) was using standard
+ * written in C++, namespaced, and early versions (at least) were using standard C++
  * container templates in several places.
  *
     
@@ -73,14 +75,22 @@
  * Running guest code is done thru the WHvRunVirtualProcessor function.  It
  * asynchronously starts or resumes Hyper-V CPU execution and then waits for a
- * VMEXIT message.   Other threads can interrupt the execution by using
- * WHvCancelVirtualProcessor, which which case the thread in
- * WHvRunVirtualProcessor is woken up via a dummy QueueUserAPC and will call
- * VidStopVirtualProcessor to asynchronously end execution.  The stop CPU call
- * not immediately succeed if the CPU encountered a VMEXIT before the stop was
- * processed, in which case the VMEXIT needs to be processed first, and the
- * pending stop will be processed in a subsequent call to
- * WHvRunVirtualProcessor.
- *
- * {something about registers}
+ * VMEXIT message.  Hyper-V / VID.SYS will return information about the message
+ * in the message buffer mapping, and WHvRunVirtualProcessor will convert that
+ * into its own WHV_RUN_VP_EXIT_CONTEXT format.
+ *
+ * Other threads can interrupt the execution by using WHvCancelVirtualProcessor,
+ * in which case the thread in WHvRunVirtualProcessor is woken up via a dummy
+ * QueueUserAPC and will call VidStopVirtualProcessor to asynchronously end
+ * execution.  The stop CPU call may not immediately succeed if the CPU
+ * encountered a VMEXIT before the stop was processed, in which case the VMEXIT
+ * needs to be processed first, and the pending stop will be processed in a
+ * subsequent call to WHvRunVirtualProcessor.
+ *
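The cancellation rule above (a stop request that races with a VMEXIT only takes effect after that exit has been handled, on a later WHvRunVirtualProcessor call) can be modelled as a small single-threaded state machine. This is a hypothetical illustration: `VcpuModel`, `ExitReason` and their methods are invented for the sketch and do not describe WinHvPlatform or VID.SYS internals.

```cpp
#include <cassert>
#include <deque>

// Hypothetical exit reasons for the model; not the WHV_RUN_VP_EXIT_REASON values.
enum class ExitReason { None, IoPort, MemoryAccess, Canceled };

// Single-threaded model of one virtual CPU as seen by its run loop.
class VcpuModel {
public:
    // An exit the "hardware" has already produced before any stop is processed.
    void raiseExit(ExitReason reason) { pendingExits_.push_back(reason); }

    // Another thread asks the vCPU to stop (cf. WHvCancelVirtualProcessor).
    void requestStop() { stopRequested_ = true; }

    // One run call (cf. WHvRunVirtualProcessor).  An exit that is already pending
    // is delivered first; the stop request is only honoured on a later call.
    ExitReason run() {
        if (!pendingExits_.empty()) {
            ExitReason reason = pendingExits_.front();
            pendingExits_.pop_front();
            return reason;                 // caller must handle this exit first
        }
        if (stopRequested_) {
            stopRequested_ = false;
            return ExitReason::Canceled;   // the deferred stop takes effect now
        }
        return ExitReason::None;           // the model has nothing more to report
    }

private:
    std::deque<ExitReason> pendingExits_;
    bool stopRequested_ = false;
};
```

For instance, raising an I/O exit and then requesting a stop before the next run makes the first `run()` return `ExitReason::IoPort` and only the second return `ExitReason::Canceled`.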
+ * Registers are retrieved and set via WHvGetVirtualProcessorRegisters and
+ * WHvSetVirtualProcessorRegisters.  In addition, several VMEXITs include
+ * essential register state in the exit context information, potentially making
+ * it possible to emulate the instruction causing the exit without involving
+ * WHvGetVirtualProcessorRegisters.
+ *
  *
  * @subsubsection subsubsec_nem_win_whv_cons    Issues / Disadvantages
     
@@ -92,9 +102,11 @@
  *   the VidMessageSlotHandleAndGetNext call.
  *
- *   IIRC this will make the kernel schedule the callback thru
+ *   IIRC this will make the kernel schedule the specified callback thru
  *   NTDLL!KiUserApcDispatcher by modifying the thread context and quite
  *   possibly the userland thread stack.  When the APC callback returns to
  *   KiUserApcDispatcher, it will call NtContinue to restore the old thread
- *   context and resume execution from there.  Upshot this is a bit expensive.
+ *   context and resume execution from there.  This naturally adds up to some
+ *   CPU cycles; ring transitions aren't free, especially after the Spectre and
+ *   Meltdown mitigations.
  *
  *   Using an NtAlertThread call could do the same without the thread context
     
@@ -122,50 +134,64 @@
  *   Since MMIO is currently realized as unmapped GPA, this will slow down all
  *   MMIO accesses a tiny little bit as WHvRunVirtualProcessor looks up the
- *   guest physical address the checks if it's a pending lazy mapping.
+ *   guest physical address to check if it is a pending lazy mapping.
+ *
+ *   The lazy mapping feature makes no sense to us.  As API users, we have all
+ *   the information and can do lazy mapping ourselves if we want/have to (see
+ *   next point).
  *
  *
  * - There is no API for modifying protection of a page within a GPA range.
  *
- *   We're left with having to unmap the range and then remap it with the new
- *   protection.  For instance we're actively using this to track dirty VRAM
- *   pages, which means there are occational readonly->writable transitions at
- *   run time followed by bulk reversal to readonly when the display is
- *   refreshed.
- *
- *   Now to work around the issue, we do page sized GPA ranges.  In addition to
- *   add a lot of tracking overhead to WinHvPlatform and VID.SYS, it also causes
- *   us to exceed our quota before we've even mapped a default sized VRAM
- *   page-by-page.  So, to work around this quota issue we have to lazily map
- *   pages and actively restrict the number of mappings.
- *
- *   Out best workaround thus far is bypassing WinHvPlatform and VID when in
- *   comes to memory and instead us the hypercalls to do it (HvCallMapGpaPages,
- *   HvCallUnmapGpaPages).  (This also maps a whole lot better into our own
- *   guest page management infrastructure.)
- *
- *
- * - Observed problems doing WHvUnmapGpaRange followed by WHvMapGpaRange.
+ *   From what we can tell, the only way to modify the protection (like readonly
+ *   -> writable, or vice versa) is to first unmap the range and then remap it
+ *   with the new protection.
+ *
+ *   We are, for instance, doing this quite a bit in order to track dirty VRAM
+ *   pages.  VRAM pages start out as readonly; when the guest writes to a page
+ *   we take an exit, note down which page it is, make it writable and restart
+ *   the instruction.  After refreshing the display, we reset all the writable
+ *   pages to readonly again, in bulk fashion.
+ *
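The dirty-VRAM scheme described above (pages start read-only, a page is upgraded to writable on its first write exit, and everything is downgraded in bulk after a display refresh), combined with the unmap-then-remap way of changing protection, can be sketched as follows. `DirtyVramTracker` and its `map`/`unmap` callbacks are hypothetical stand-ins, not VirtualBox or WinHvPlatform code; a real implementation would issue WHvUnmapGpaRange/WHvMapGpaRange (or the underlying hypercalls) where the callbacks are invoked. A 4KiB page size is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical page-granular dirty tracker for a VRAM region.
class DirtyVramTracker {
public:
    using MapFn   = std::function<void(uint64_t gpa, bool writable)>;
    using UnmapFn = std::function<void(uint64_t gpa)>;

    static constexpr uint64_t kPageSize = 4096;   // assumed guest page size

    DirtyVramTracker(uint64_t baseGpa, uint64_t cbVram, MapFn map, UnmapFn unmap)
        : base_(baseGpa), dirty_(cbVram / kPageSize, false),
          map_(std::move(map)), unmap_(std::move(unmap)) {
        for (size_t i = 0; i < dirty_.size(); ++i)
            map_(pageGpa(i), false);              // every page starts out read-only
    }

    // Called on a write-to-read-only exit.  Returns false if the GPA isn't VRAM.
    bool onWriteFault(uint64_t gpa) {
        if (gpa < base_ || gpa >= base_ + dirty_.size() * kPageSize)
            return false;
        const size_t i = (gpa - base_) / kPageSize;
        if (!dirty_[i]) {
            dirty_[i] = true;
            setProtection(i, true);               // no in-place change: unmap + remap
        }
        return true;                              // caller restarts the instruction
    }

    // Called after a display refresh: report dirty pages and re-protect them.
    std::vector<size_t> collectAndReset() {
        std::vector<size_t> dirtyPages;
        for (size_t i = 0; i < dirty_.size(); ++i) {
            if (dirty_[i]) {
                dirtyPages.push_back(i);
                dirty_[i] = false;
                setProtection(i, false);
            }
        }
        return dirtyPages;
    }

private:
    uint64_t pageGpa(size_t i) const { return base_ + i * kPageSize; }

    void setProtection(size_t i, bool writable) {
        unmap_(pageGpa(i));
        map_(pageGpa(i), writable);
    }

    uint64_t base_;
    std::vector<bool> dirty_;
    MapFn map_;
    UnmapFn unmap_;
};
```

With 4KiB pages, a 128MB VRAM amounts to 32768 page-sized ranges, which gives a feel for why per-page mappings run into the quota problem.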
+ *   Now to work around this issue, we do page sized GPA ranges.  In addition to
+ *   adding a lot of tracking overhead to WinHvPlatform and VID.SYS, this also
+ *   causes us to exceed our quota before we've even mapped a default sized
+ *   (128MB) VRAM page-by-page.  So, to work around this quota issue we have to
+ *   lazily map pages and actively restrict the number of mappings.
+ *
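A minimal sketch of lazily mapping pages while capping the number of live mappings, assuming a first-in, first-out eviction policy. FIFO is used here because a page that is already mapped no longer causes exits, so the exit path has no recency information to drive an LRU policy; the policy VirtualBox actually uses is not stated in this text. `LazyGpaMapper` and its callbacks are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_set>
#include <utility>

// Hypothetical lazy mapper that never keeps more than `maxMappings` pages mapped.
class LazyGpaMapper {
public:
    using PageFn = std::function<void(uint64_t gpaPage)>;

    LazyGpaMapper(size_t maxMappings, PageFn map, PageFn unmap)
        : max_(maxMappings), map_(std::move(map)), unmap_(std::move(unmap)) {
        assert(max_ > 0);
    }

    // Called when the guest touches a page we have deliberately left unmapped.
    void onUnmappedAccess(uint64_t gpaPage) {
        if (mapped_.count(gpaPage) != 0)
            return;                          // already mapped (e.g. a stale exit)
        if (order_.size() == max_) {
            // Mapping quota reached: evict the oldest mapping first (FIFO).
            const uint64_t victim = order_.front();
            order_.pop_front();
            mapped_.erase(victim);
            unmap_(victim);
        }
        map_(gpaPage);
        order_.push_back(gpaPage);
        mapped_.insert(gpaPage);
    }

    size_t mappedCount() const { return order_.size(); }
    bool isMapped(uint64_t gpaPage) const { return mapped_.count(gpaPage) != 0; }

private:
    size_t max_;
    PageFn map_;
    PageFn unmap_;
    std::deque<uint64_t> order_;             // mapping order, oldest at the front
    std::unordered_set<uint64_t> mapped_;    // fast membership test
};
```

An evicted page simply becomes unmapped again, so the next guest access to it takes another exit and remaps it, at the cost of evicting some other page.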
+ *   Our best workaround thus far is bypassing WinHvPlatform and VID entirely
+ *   when it comes to guest memory management and instead using the underlying
+ *   hypercalls (HvCallMapGpaPages, HvCallUnmapGpaPages) to do it ourselves.
+ *   (This also maps a whole lot better into our own guest page management
+ *   infrastructure.)
+ *
+ *
+ * - Observed problems doing WHvUnmapGpaRange immediately followed by
+ *   WHvMapGpaRange.
  *
  *   As mentioned above, we've been forced to use this sequence when modifying
- *   page protection.   However, when upgrading from readonly to writable, we've
- *   ended up looping forever with the same write to readonly memory exit.
+ *   page protection.   However, when transitioning from readonly to writable,
+ *   we've ended up looping forever with the same write to readonly memory
+ *   VMEXIT.  We're wondering if this issue might be related to the lazy mapping
+ *   logic in WinHvPlatform.
  *
  *   Workaround: Insert a WHvRunVirtualProcessor call and make sure to get a GPA
- *   unmapped exit between the two calls.  Terrible for performance and code
- *   sanity.
- *
- *
- * - WHVRunVirtualProcessor wastes time converting VID/Hyper-V messages to it's
- *   own defined format.
+ *   unmapped exit between the two calls.  Not entirely great performance-wise
+ *   (or for the sanity of our code).
+ *
+ *
+ * - WHvRunVirtualProcessor wastes time converting VID/Hyper-V messages to its
+ *   own format (WHV_RUN_VP_EXIT_CONTEXT).
  *
  *   We understand this might be because Microsoft wishes to remain free to
  *   modify the VID/Hyper-V messages, but it's still rather silly and does slow
- *   things down.
+ *   things down a little.  We'd much rather just process the messages directly.
  *
  *
  * - WHvRunVirtualProcessor would've benefited from using a callback interface:
+ *
  *      - The potential size changes of the exit context structure wouldn't be
  *        an issue, since the function could manage that itself.
- *      - State handling could be optimized simplified (esp. cancellation).
+ *
+ *      - State handling could probably be simplified (like cancellation).
  *
  *
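To illustrate the callback idea, here is a hypothetical run interface in which the run function owns the exit-information structure and only lends the caller a reference to it (so its size can change without affecting callers), and in which stopping is just a return value from the callback. None of these names exist in WinHvPlatform: `runWithCallback`, `ExitInfo`, `ExitAction` and the `nextExit` message source are invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical exit information; the run function owns its layout and size,
// so callers never allocate or size it themselves.
struct ExitInfo {
    uint32_t reason;   // invented codes, not WHV_RUN_VP_EXIT_REASON values
    uint64_t rip;
};

enum class ExitAction { Resume, Stop };

// Hypothetical callback-driven run loop.  `nextExit` stands in for waiting on the
// hypervisor's message slot and returns false once there is nothing more to run.
// Returns the number of exits handed to `onExit`.
inline unsigned runWithCallback(const std::function<bool(ExitInfo&)>& nextExit,
                                const std::function<ExitAction(const ExitInfo&)>& onExit) {
    unsigned handled = 0;
    ExitInfo info{};
    while (nextExit(info)) {
        ++handled;
        if (onExit(info) == ExitAction::Stop)
            break;     // stopping is just another return value, no separate state
    }
    return handled;
}
```

In such a design, a cross-thread cancel could plausibly be folded into the same decision point (for example by checking an atomic flag inside `onExit`); that is an assumption about how such an API might look, not a description of an existing one.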
     
@@ -173,10 +199,13 @@
  *   internally converts register names, probably using temporary heap buffers.
  *
- *   From the looks of things, it's converting from WHV_REGISTER_NAME to
- *   HV_REGISTER_NAME that's documented in the "Virtual Processor Register
- *   Names" section of "Hypervisor Top-Level Functional Specification".  This
- *   feels like an awful waste of time.  We simply cannot understand why it
- *   wouldn't have sufficed to use HV_REGISTER_NAME here and simply checked the
- *   input values if restrictions were desired.
+ *   From the looks of things, they are converting from WHV_REGISTER_NAME to
+ *   the HV_REGISTER_NAME values listed in the "Virtual Processor Register Names"
+ *   section of the "Hypervisor Top-Level Functional Specification" document.
+ *   This feels like an awful waste of time.
+ *
+ *   We simply cannot understand why HV_REGISTER_NAME isn't used directly here,
+ *   or at least the same values, making any conversion redundant.  Restricting
+ *   access to certain registers could easily be implemented by scanning the
+ *   inputs.
  *
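The suggestion that restrictions "could easily be implemented by scanning the inputs" amounts to validating the caller's register-name array against an allow-list and then passing the names through unchanged. A minimal sketch, using invented register codes (the real HV_REGISTER_NAME values are defined in the Hypervisor Top-Level Functional Specification and are not reproduced here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register codes standing in for HV_REGISTER_NAME values.
enum : uint32_t {
    kRegRax    = 0x100,
    kRegRip    = 0x101,
    kRegRflags = 0x102,
    kRegCr0    = 0x200,
    kRegSecret = 0x900,   // something the API would want to keep off-limits
};

// Returns true if `name` may be accessed through the public API.
constexpr bool isRegisterAllowed(uint32_t name) {
    switch (name) {
        case kRegRax: case kRegRip: case kRegRflags: case kRegCr0:
            return true;
        default:
            return false;
    }
}

// Scans the caller's input once; if everything is allowed, the same array could be
// handed to the hypercall as-is, with no conversion and no temporary heap buffer.
// Returns the index of the first rejected name, or `count` if all are allowed.
inline size_t findFirstDisallowed(const uint32_t* names, size_t count) {
    for (size_t i = 0; i < count; ++i)
        if (!isRegisterAllowed(names[i]))
            return i;
    return count;
}
```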
  *   To avoid the heap + conversion overhead, we're currently using the
     
@@ -184,12 +213,66 @@
  *
  *
+ * - The YMM and XCR0 registers are not yet named (17083).  This probably
+ *   wouldn't be a problem if HV_REGISTER_NAME were used, see previous point.
+ *
+ *
  * - Why does WINHVR.SYS (or VID.SYS) only query/set 32 registers at a time
  *   thru the HvCallGetVpRegisters and HvCallSetVpRegisters hypercalls?
  *
  *   We've had no trouble getting/setting all the registers defined by
- *   WHV_REGISTER_NAME in one hypercall...
- *
- *
- * - .
+ *   WHV_REGISTER_NAME in one hypercall (around 80)...
+ *
+ *
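The cost of the 32-register limit is easy to quantify: a request for N registers is split into ceil(N / 32) hypercalls. The sketch below shows that batching arithmetic with a hypothetical `issueBatch` callback standing in for one HvCallGetVpRegisters or HvCallSetVpRegisters call; the figure of roughly 80 registers comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Splits a request for `count` registers into batches of at most `maxPerCall`
// and invokes `issueBatch(first, n)` once per batch (one hypercall each).
// Returns the number of calls made.
inline size_t forEachRegisterBatch(size_t count, size_t maxPerCall,
                                   const std::function<void(size_t, size_t)>& issueBatch) {
    assert(maxPerCall > 0);
    size_t calls = 0;
    for (size_t first = 0; first < count; first += maxPerCall) {
        const size_t n = count - first < maxPerCall ? count - first : maxPerCall;
        issueBatch(first, n);
        ++calls;
    }
    return calls;
}
```

With about 80 registers and a 32-register limit this yields three calls (32 + 32 + 16) where a single one would do.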
+ * - The I/O port exit context information seems to be missing the address size
+ *   information needed for correct string I/O emulation.
+ *
+ *   VT-x provides this information in bits 7:9 of the instruction information
+ *   field on newer CPUs.  AMD-V has it in bits 7:9 of the VMCB EXITINFO1 field.
+ *
+ *   We can probably work around this by scanning the instruction bytes for
+ *   address size prefixes.  Haven't investigated it any further yet.
+ *
+ *
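Scanning the instruction bytes for the address-size override prefix (0x67) could look like the sketch below. It assumes the caller supplies the instruction bytes and the default address size of the current CPU mode (16, 32 or 64 bits). It only walks legacy prefixes and an optional REX prefix before checking for an INS/OUTS opcode; it is not a full x86 decoder, and `stringIoAddressSize` is a name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Legacy prefixes other than the address-size override (0x67).
constexpr bool isOtherLegacyPrefix(uint8_t b) {
    return b == 0xF0 || b == 0xF2 || b == 0xF3 ||               // LOCK, REPNE, REP
           b == 0x26 || b == 0x2E || b == 0x36 || b == 0x3E ||  // ES, CS, SS, DS
           b == 0x64 || b == 0x65 ||                            // FS, GS
           b == 0x66;                                           // operand size
}

// Returns the effective address size in bits (16, 32 or 64) of a string I/O
// instruction (INS/OUTS, opcodes 0x6C-0x6F), or 0 if the bytes do not decode as one.
// `defaultBits` is the default address size of the current CPU mode.
inline unsigned stringIoAddressSize(const uint8_t* bytes, size_t len, unsigned defaultBits) {
    assert(defaultBits == 16 || defaultBits == 32 || defaultBits == 64);
    const size_t maxLen = len < 15 ? len : 15;     // x86 instructions are at most 15 bytes
    bool addrSizeOverride = false;
    size_t i = 0;
    for (; i < maxLen; ++i) {
        if (bytes[i] == 0x67)
            addrSizeOverride = true;
        else if (!isOtherLegacyPrefix(bytes[i]))
            break;
    }
    if (defaultBits == 64 && i < maxLen && (bytes[i] & 0xF0) == 0x40)
        ++i;                                       // REX prefix, 64-bit mode only
    if (i >= maxLen || bytes[i] < 0x6C || bytes[i] > 0x6F)
        return 0;
    if (!addrSizeOverride)
        return defaultBits;
    return defaultBits == 32 ? 16 : 32;            // 0x67: 16->32, 32->16, 64->32
}
```

The VT-x and AMD-V exit fields mentioned above report the same 16/32/64-bit answer directly, so this decoding would be unnecessary if the exit context passed them through.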
+ * - The WHvGetCapability function has a weird design:
+ *      - The CapabilityCode parameter is pointlessly duplicated in the output
+ *        structure (WHV_CAPABILITY).
+ *
+ *      - The API takes a void pointer, but everyone will probably be using
+ *        WHV_CAPABILITY due to WHV_CAPABILITY::CapabilityCode making it
+ *        impractical to use anything else.
+ *
+ *      - No output size.
+ *
+ *      - See GetFileAttributesEx, GetFileInformationByHandleEx,
+ *        FindFirstFileEx, and others for the typical pattern for generic
+ *        information getters.
+ *
+ *
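For comparison, a getter following the "information class" style referred to above might look like this: the code is passed as its own parameter, the buffer holds only the payload, the caller states the buffer size, and (addressing the "no output size" point) the payload size is reported back. `CapabilityCode`, the result structures, `Status` and `getCapabilityInfo` are all hypothetical and only illustrate the shape of such an interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical capability codes and per-code result structures.
enum class CapabilityCode : uint32_t { HypervisorPresent = 0, ProcessorVendor = 1 };

struct HypervisorPresentInfo { uint8_t present; };
struct ProcessorVendorInfo   { uint32_t vendor; };

enum class Status { Ok, InvalidCode, BufferTooSmall };

// Generic getter: the code is a separate parameter, the buffer holds only the
// payload (no duplicated code field), and sizes are explicit in both directions.
inline Status getCapabilityInfo(CapabilityCode code, void* buffer, size_t cbBuffer,
                                size_t* pcbActual) {
    // Made-up values standing in for whatever the hypervisor would report.
    const HypervisorPresentInfo present{1};
    const ProcessorVendorInfo vendor{2};

    const void* src = nullptr;
    size_t cb = 0;
    switch (code) {
        case CapabilityCode::HypervisorPresent: src = &present; cb = sizeof(present); break;
        case CapabilityCode::ProcessorVendor:   src = &vendor;  cb = sizeof(vendor);  break;
        default:                                return Status::InvalidCode;
    }
    if (pcbActual)
        *pcbActual = cb;        // payload size, set even on failure to help size the buffer
    if (cbBuffer < cb)
        return Status::BufferTooSmall;
    assert(buffer != nullptr);
    std::memcpy(buffer, src, cb);
    return Status::Ok;
}
```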
+ * - The WHvGetPartitionProperty function uses the same weird design as
+ *   WHvGetCapability, see above.
+ *
+ *
+ * - The WHvSetPartitionProperty function has a totally weird design too:
+ *      - In contrast to its partner WHvGetPartitionProperty, the property code
+ *        is not a separate input parameter here but part of the input
+ *        structure.
+ *
+ *      - The input structure is a void pointer rather than a pointer to
+ *        WHV_PARTITION_PROPERTY which everyone probably will be using because
+ *        of the WHV_PARTITION_PROPERTY::PropertyCode field.
+ *
+ *      - Really, why use PVOID for the input when the function isn't accepting
+ *        minimal sizes?  E.g. WHvPartitionPropertyCodeProcessorClFlushSize only
+ *        requires a 9 byte input, but the function insists on 16 bytes (17083).
+ *
+ *      - See GetFileAttributesEx, SetFileInformationByHandle, FindFirstFileEx,
+ *        and others for the typical pattern for generic information setters and
+ *        getters.
+ *
+ *
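The minimal-size complaint suggests a setter that takes the property code as its own parameter and accepts any input buffer at least as large as that property's payload. With the code passed separately, a one-byte value such as a cache-line flush size would need only a one-byte input. The codes, payload sizes, `PartitionState` and `setPartitionProperty` below are hypothetical and do not reflect the WHvSetPartitionProperty ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical property codes and partition state.
enum class PropertyCode : uint32_t { ProcessorCount = 0, ClFlushSize = 1 };

struct PartitionState {
    uint32_t processorCount = 1;
    uint8_t  clFlushSize    = 8;
};

enum class Status { Ok, InvalidCode, BufferTooSmall };

// Minimum payload size per property; the code itself is not part of the payload.
inline size_t minPropertySize(PropertyCode code) {
    switch (code) {
        case PropertyCode::ProcessorCount: return sizeof(uint32_t);
        case PropertyCode::ClFlushSize:    return sizeof(uint8_t);
    }
    return 0;   // unknown code
}

// Generic setter: accepts any buffer at least as large as the property's payload.
inline Status setPartitionProperty(PartitionState& st, PropertyCode code,
                                   const void* input, size_t cbInput) {
    const size_t cbNeeded = minPropertySize(code);
    if (cbNeeded == 0)
        return Status::InvalidCode;
    if (cbInput < cbNeeded)
        return Status::BufferTooSmall;
    assert(input != nullptr);
    switch (code) {
        case PropertyCode::ProcessorCount:
            std::memcpy(&st.processorCount, input, sizeof(st.processorCount));
            break;
        case PropertyCode::ClFlushSize:
            std::memcpy(&st.clFlushSize, input, sizeof(st.clFlushSize));
            break;
    }
    return Status::Ok;
}
```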
+ * @subsection subsec_nem_win_impl   Our implementation.
+ *
+ * Tomorrow...
+ *
  *
  */
