
Changeset 74588 in vbox for trunk/src

Oct 2, 2018 11:44:30 PM (6 years ago)

NEM/win: Doc updating for build 17757, 17763. bugref:9044

2 edited


  • trunk/src/VBox/VMM/VMMR3/NEMR3Native-win.cpp

    r74517 r74588  
    20702070 *   packed.
    20712071 *
     2072 *   Update: Somewhere along the security fixes or/and microcode updates this
     2073 *   summer (2018), performance dropped even more.
     2074 *
     2075 *   Update [build 17757]: Some performance improvements here, but they don't
     2076 *   yet make up for what was lost this summer.
     2077 *
    20722078 *
    20732079 * - We need a way to directly modify the TSC offset (or bias if you like).
    21002106 *        there is no way to support X2APIC.
    21012107 *
     2108 *
     2109 * - Not sure if this is a thing, but WHvCancelVirtualProcessor seems to
     2110 *   cause a lot more spurious WHvRunVirtualProcessor returns than what we get
     2111 *   with the replacement code.  By spurious returns we mean that the
     2112 *   subsequent call to WHvRunVirtualProcessor would return immediately.
     2113 *
     2114 *
     2115 * - There is no API for modifying protection of a page within a GPA range.
     2116 *
     2117 *   From what we can tell, the only way to modify the protection (like readonly
     2118 *   -> writable, or vice versa) is to first unmap the range and then remap it
     2119 *   with the new protection.
     2120 *
     2121 *   We are for instance doing this quite a bit in order to track dirty VRAM
     2122 *   pages.  VRAM pages start out as readonly; when the guest writes to a page
     2123 *   we take an exit, note down which page it is, make it writable and restart
     2124 *   the instruction.  After refreshing the display, we reset all the writable
     2125 *   pages to readonly again, bulk fashion.
     2126 *
     2127 *   Now to work around this issue, we do page sized GPA ranges.  In addition to
     2128 *   adding a lot of tracking overhead to WinHvPlatform and VID.SYS, this also
     2129 *   causes us to exceed our quota before we've even mapped a default sized
     2130 *   (128MB) VRAM page-by-page.  So, to work around this quota issue we have to
     2131 *   lazily map pages and actively restrict the number of mappings.
     2132 *
     2133 *   Our best workaround thus far is bypassing WinHvPlatform and VID entirely
     2134 *   when it comes to guest memory management and instead use the underlying
     2135 *   hypercalls (HvCallMapGpaPages, HvCallUnmapGpaPages) to do it ourselves.
     2136 *   (This also maps a whole lot better into our own guest page management
     2137 *   infrastructure.)
     2138 *
     2139 *   Update [build 17757]: Introduces a KVM-like dirty logging API which could
     2140 *   help track dirty VGA pages, while being useless for shadow ROM and
     2141 *   devices trying to catch the guest updating descriptors and such.
     2142 *
     2143 *
     2144 * - Observed problems doing WHvUnmapGpaRange immediately followed by
     2145 *   WHvMapGpaRange.
     2146 *
     2147 *   As mentioned above, we've been forced to use this sequence when modifying
     2148 *   page protection.   However, when transitioning from readonly to writable,
     2149 *   we've ended up looping forever with the same write to readonly memory
     2150 *   VMEXIT.  We're wondering if this issue might be related to the lazy mapping
     2151 *   logic in WinHvPlatform.
     2152 *
     2153 *   Workaround: Insert a WHvRunVirtualProcessor call and make sure to get a GPA
     2154 *   unmapped exit between the two calls.  Not entirely great performance wise
     2155 *   (or for the sanity of our code).
     2156 *
     2157 *
     2158 * - Implementing A20 gate behavior is tedious, whereas correctly emulating the
     2159 *   A20M# pin (present on 486 and later) is near impossible for SMP setups
     2160 *   (e.g. possibility of two CPUs with different A20 status).
     2161 *
     2162 *   Workaround: Only do A20 on CPU 0, restricting the emulation to HMA.  We
     2163 *   unmap all pages related to HMA (0x100000..0x10ffff) when the A20 state
     2164 *   changes, lazily syncing the right pages back when accessed.
     2165 *
     2166 *
     2167 * - WHvRunVirtualProcessor wastes time converting VID/Hyper-V messages to its
     2168 *   own format (WHV_RUN_VP_EXIT_CONTEXT).
     2169 *
     2170 *   We understand this might be because Microsoft wishes to remain free to
     2171 *   modify the VID/Hyper-V messages, but it's still rather silly and does slow
     2172 *   things down a little.  We'd much rather just process the messages directly.
     2173 *
     2174 *
     2175 * - WHvRunVirtualProcessor would've benefited from using a callback interface:
     2176 *
     2177 *      - The potential size changes of the exit context structure wouldn't be
     2178 *        an issue, since the function could manage that itself.
     2179 *
     2180 *      - State handling could probably be simplified (like cancelation).
     2181 *
     2182 *
     2183 * - WHvGetVirtualProcessorRegisters and WHvSetVirtualProcessorRegisters
     2184 *   internally convert register names, probably using temporary heap buffers.
     2185 *
     2186 *   From the looks of things, they are converting from WHV_REGISTER_NAME to
     2187 *   HV_REGISTER_NAME found in the "Virtual Processor Register Names" section in
     2188 *   the "Hypervisor Top-Level Functional Specification" document.  This feels
     2189 *   like an awful waste of time.
     2190 *
     2191 *   We simply cannot understand why HV_REGISTER_NAME isn't used directly here,
     2192 *   or at least the same values, making any conversion redundant.  Restricting
     2193 *   access to certain registers could easily be implemented by scanning the
     2194 *   inputs.
     2195 *
     2196 *   To avoid the heap + conversion overhead, we're currently using the
     2197 *   HvCallGetVpRegisters and HvCallSetVpRegisters calls directly, at least for
     2198 *   the ring-0 code.
     2199 *
     2200 *   Update [build 17757]: Register translation has been very cleverly
     2201 *   optimized and made table driven (2 top level tables, 4 + 1 leaf tables).
     2202 *   Register information consists of the 32-bit HV register name, register page
     2203 *   offset, and flags (giving valid offset, size and more).  Register
     2204 *   getting/setting seems to be done by hoping that the register page provides
     2205 *   it all, and falling back on the VidSetVirtualProcessorState if one or more
     2206 *   registers are not available there.
     2207 *
     2208 *   Note! We have currently not updated our ring-0 code to take the register
     2209 *   page into account, so it's suffering a little compared to the ring-3 code
     2210 *   that now uses the official APIs for registers.
     2211 *
     2212 *
     2213 * - The YMM and XCR0 registers are not yet named (17083).  This probably
     2214 *   wouldn't be a problem if HV_REGISTER_NAME was used, see previous point.
     2215 *
     2216 *   Update [build 17757]: XCR0 is added. YMM register values seem to be put
     2217 *   into a yet undocumented XsaveState interface.  Approach is a little bulky,
     2218 *   but saves a number of enums and dispenses with register translation.  Also,
     2219 *   the underlying Vid setter API duplicates the input buffer on the heap,
     2220 *   adding a 16 byte header.
     2221 *
     2222 *
     2223 * - Why does VID.SYS only query/set 32 registers at a time through the
     2224 *   HvCallGetVpRegisters and HvCallSetVpRegisters hypercalls?
     2225 *
     2226 *   We've no trouble getting/setting all the registers defined by
     2227 *   WHV_REGISTER_NAME in one hypercall (around 80).  Some kind of stack
     2228 *   buffering or similar?
     2229 *
     2230 *
     2231 * - To handle the VMMCALL / VMCALL instructions, it seems we need to intercept
     2232 *   \#UD exceptions and inspect the opcodes.  A dedicated exit for hypercalls
     2233 *   would be more efficient, esp. for guests using \#UD for other purposes.
     2234 *
     2235 *
     2236 * - Wrong instruction length in the VpContext with unmapped GPA memory exit
     2237 *   contexts on 17115/AMD.
     2238 *
     2239 *   One byte "PUSH CS" was reported as 2 bytes, while a two byte
     2240 *   "MOV [EBX],EAX" was reported with a 1 byte instruction length.  Problem
     2242 *   naturally present in untranslated Hyper-V messages.
     2242 *
     2243 *
     2244 * - The I/O port exit context information seems to be missing the address size
     2245 *   information needed for correct string I/O emulation.
     2246 *
     2247 *   VT-x provides this information in bits 7:9 in the instruction information
     2248 *   field on newer CPUs.  AMD-V in bits 7:9 in the EXITINFO1 field in the VMCB.
     2249 *
     2250 *   We can probably work around this by scanning the instruction bytes for
     2251 *   address size prefixes.  Haven't investigated it any further yet.
     2252 *
     2253 *
     2254 * - Querying WHvCapabilityCodeExceptionExitBitmap returns zero even when
     2255 *   intercepts demonstrably work (17134).
     2256 *
     2257 *
     2258 * - Querying HvPartitionPropertyDebugChannelId via HvCallGetPartitionProperty
     2259 *   (hypercall) hangs the host (17134).
     2260 *
     2261 *
     2262 *
     2263 * Old concerns that have been addressed:
    21022264 *
    21032265 * - The WHvCancelVirtualProcessor API schedules a dummy usermode APC callback
    21162278 *   modifications and the extra kernel call.
    21172279 *
    2118  *
    2119  * - Not sure if this is a thing, but WHvCancelVirtualProcessor seems to cause
    2120  *   cause a lot more spurious WHvRunVirtualProcessor returns that what we get
    2121  *   with the replacement code.  By spurious returns we mean that the
    2122  *   subsequent call to WHvRunVirtualProcessor would return immediately.
     2280 *   Update: All concerns have been addressed in or about build 17757.
     2281 *
     2282 *   The WHvCancelVirtualProcessor API is now implemented using a new
     2283 *   VidMessageSlotHandleAndGetNext() flag (4).  Codepath is slightly longer
     2284 *   than NtAlertThread, but has the added benefit that spurious wakeups can be
     2285 *   more easily reduced.
    21232286 *
    21242287 *
    21312294 *   what is missing from his point of view in a single kernel call.
    21322295 *
     2296 *   Update: All concerns have been addressed in or about build 17757.  Selected
     2297 *   registers are now available via shared memory and thus HLT should (not
     2298 *   verified) no longer require a system call to compose the exit context data.
     2299 *
    21332300 *
    21342301 * - The WHvRunVirtualProcessor implementation does lazy GPA range mappings when
    21432310 *   point).
    21442311 *
    2145  *
    2146  * - There is no API for modifying protection of a page within a GPA range.
    2147  *
    2148  *   From what we can tell, the only way to modify the protection (like readonly
    2149  *   -> writable, or vice versa) is to first unmap the range and then remap it
    2150  *   with the new protection.
    2151  *
    2152  *   We are for instance doing this quite a bit in order to track dirty VRAM
    2153  *   pages.  VRAM pages starts out as readonly, when the guest writes to a page
    2154  *   we take an exit, notes down which page it is, makes it writable and restart
    2155  *   the instruction.  After refreshing the display, we reset all the writable
    2156  *   pages to readonly again, bulk fashion.
    2157  *
    2158  *   Now to work around this issue, we do page sized GPA ranges.  In addition to
    2159  *   add a lot of tracking overhead to WinHvPlatform and VID.SYS, this also
    2160  *   causes us to exceed our quota before we've even mapped a default sized
    2161  *   (128MB) VRAM page-by-page.  So, to work around this quota issue we have to
    2162  *   lazily map pages and actively restrict the number of mappings.
    2163  *
    2164  *   Our best workaround thus far is bypassing WinHvPlatform and VID entirely
    2165  *   when in comes to guest memory management and instead use the underlying
    2166  *   hypercalls (HvCallMapGpaPages, HvCallUnmapGpaPages) to do it ourselves.
    2167  *   (This also maps a whole lot better into our own guest page management
    2168  *   infrastructure.)
    2169  *
    2170  *
    2171  * - Observed problems doing WHvUnmapGpaRange immediately followed by
    2172  *   WHvMapGpaRange.
    2173  *
    2174  *   As mentioned above, we've been forced to use this sequence when modifying
    2175  *   page protection.   However, when transitioning from readonly to writable,
    2176  *   we've ended up looping forever with the same write to readonly memory
    2177  *   VMEXIT.  We're wondering if this issue might be related to the lazy mapping
    2178  *   logic in WinHvPlatform.
    2179  *
    2180  *   Workaround: Insert a WHvRunVirtualProcessor call and make sure to get a GPA
    2181  *   unmapped exit between the two calls.  Not entirely great performance wise
    2182  *   (or the santity of our code).
    2183  *
    2184  *
    2185  * - Implementing A20 gate behavior is tedious, where as correctly emulating the
    2186  *   A20M# pin (present on 486 and later) is near impossible for SMP setups
    2187  *   (e.g. possiblity of two CPUs with different A20 status).
    2188  *
    2189  *   Workaround: Only do A20 on CPU 0, restricting the emulation to HMA.  We
    2190  *   unmap all pages related to HMA (0x100000..0x10ffff) when the A20 state
    2191  *   changes, lazily syncing the right pages back when accessed.
    2192  *
    2193  *
    2194  * - WHVRunVirtualProcessor wastes time converting VID/Hyper-V messages to its
    2195  *   own format (WHV_RUN_VP_EXIT_CONTEXT).
    2196  *
    2197  *   We understand this might be because Microsoft wishes to remain free to
    2198  *   modify the VID/Hyper-V messages, but it's still rather silly and does slow
    2199  *   things down a little.  We'd much rather just process the messages directly.
    2200  *
    2201  *
    2202  * - WHVRunVirtualProcessor would've benefited from using a callback interface:
    2203  *
    2204  *      - The potential size changes of the exit context structure wouldn't be
    2205  *        an issue, since the function could manage that itself.
    2206  *
    2207  *      - State handling could probably be simplified (like cancelation).
    2208  *
    2209  *
    2210  * - WHvGetVirtualProcessorRegisters and WHvSetVirtualProcessorRegisters
    2211  *   internally converts register names, probably using temporary heap buffers.
    2212  *
    2213  *   From the looks of things, they are converting from WHV_REGISTER_NAME to
    2214  *   HV_REGISTER_NAME from in the "Virtual Processor Register Names" section in
    2215  *   the "Hypervisor Top-Level Functional Specification" document.  This feels
    2216  *   like an awful waste of time.
    2217  *
    2218  *   We simply cannot understand why HV_REGISTER_NAME isn't used directly here,
    2219  *   or at least the same values, making any conversion reduntant.  Restricting
    2220  *   access to certain registers could easily be implement by scanning the
    2221  *   inputs.
    2222  *
    2223  *   To avoid the heap + conversion overhead, we're currently using the
    2224  *   HvCallGetVpRegisters and HvCallSetVpRegisters calls directly.
    2225  *
    2226  *
    2227  * - The YMM and XCR0 registers are not yet named (17083).  This probably
    2228  *   wouldn't be a problem if HV_REGISTER_NAME was used, see previous point.
    2229  *
    2230  *
    2231  * - Why does VID.SYS only query/set 32 registers at the time thru the
    2232  *   HvCallGetVpRegisters and HvCallSetVpRegisters hypercalls?
    2233  *
    2234  *   We've not trouble getting/setting all the registers defined by
    2235  *   WHV_REGISTER_NAME in one hypercall (around 80).  Some kind of stack
    2236  *   buffering or similar?
    2237  *
    2238  *
    2239  * - To handle the VMMCALL / VMCALL instructions, it seems we need to intercept
    2240  *   \#UD exceptions and inspect the opcodes.  A dedicated exit for hypercalls
    2241  *   would be more efficient, esp. for guests using \#UD for other purposes..
    2242  *
    2243  *
    2244  * - Wrong instruction length in the VpContext with unmapped GPA memory exit
    2245  *   contexts on 17115/AMD.
    2246  *
    2247  *   One byte "PUSH CS" was reported as 2 bytes, while a two byte
    2248  *   "MOV [EBX],EAX" was reported with a 1 byte instruction length.  Problem
    2249  *   naturally present in untranslated hyper-v messages.
    2250  *
    2251  *
    2252  * - The I/O port exit context information seems to be missing the address size
    2253  *   information needed for correct string I/O emulation.
    2254  *
    2255  *   VT-x provides this information in bits 7:9 in the instruction information
    2256  *   field on newer CPUs.  AMD-V in bits 7:9 in the EXITINFO1 field in the VMCB.
    2257  *
    2258  *   We can probably work around this by scanning the instruction bytes for
    2259  *   address size prefixes.  Haven't investigated it any further yet.
    2260  *
    2261  *
    2262  * - Querying WHvCapabilityCodeExceptionExitBitmap returns zero even when
    2263  *   intercepts demonstrably works (17134).
    2264  *
    2265  *
    2266  * - Querying HvPartitionPropertyDebugChannelId via HvCallGetPartitionProperty
    2267  *   (hypercall) hangs the host (17134).
     2312 *   Update: All concerns have been addressed in or about build 17757.
    22682313 *
    22692314 *
    23582403 * @subsection sec_nem_win_benchmarks           Benchmarks.
    23592404 *
    2360  * @subsubsection subsect_nem_win_benchmarks_bs2t1 Bootsector2-test1
     2405 * @subsubsection subsect_nem_win_benchmarks_bs2t1      17134/2018-06-22: Bootsector2-test1
    23612406 *
    23622407 * This is ValidationKit/bootsectors/bootsector2-test1.asm as of 2018-06-22
    24672512 *
    24682513 *
    2469  * @subsubsection subsect_nem_win_benchmarks_w2k    Windows 2000 Boot & Shutdown
     2514 * @subsubsection subsect_nem_win_benchmarks_bs2t1u1   17134/2018-10-02: Bootsector2-test1
     2515 *
     2516 * Update on 17134.  While expectantly testing a couple of newer builds (17758,
     2517 * 17763) hoping for some increases in performance, the numbers turn out
     2518 * to be generally worse than the initial June test run.  So, I went back
     2519 * to the 1803 (17134) installation and re-tested, finding that the numbers had
     2520 * somehow turned worse over the last 3-4 months.
     2521 *
     2522 *
     2523 *
     2524 * Suspects are security updates and/or microcode updates installed since then,
     2525 * either hitting thread switching and/or Hyper-V badly.  I'm a bit puzzled why
     2526 * AMD is affected this badly too.
     2527 *
     2528 *
     2529 * @subsubsection subsect_nem_win_benchmarks_bs2t1u2   17763: Bootsector2-test1
     2530 *
     2531 * Some preliminary numbers for build 17763 on the 3.4 GHz AMD 1950X, the second
     2532 * column will improve once we get time to have a look at the register page.
     2533 *
     2534 * There is a  50%  performance loss here compared to the June numbers with
     2535 * build 17134.  The RDTSC numbers hint that it isn't in the Hyper-V core
     2536 * (hvax64.exe), but something on the NT side.
     2537 *
     2538 * @verbatim
     2539TESTING...                                                           WinHv API           Hypercalls + VID    VirtualBox AMD-V
     2540  32-bit paged protected mode, CPUID                        :           54 145 ins/sec        51 436
     2541  real mode, CPUID                                          :           54 178 ins/sec        51 713
     2542  [snip]
     2543  32-bit paged protected mode, RDTSC                        :       98 927 639 ins/sec   100 254 552
     2544  real mode, RDTSC                                          :       99 601 206 ins/sec   100 886 699
     2545  [snip]
     2546  32-bit paged protected mode, 32-bit IN                    :           54 621 ins/sec        51 524
     2547  32-bit paged protected mode, 32-bit OUT                   :           54 870 ins/sec        51 671
     2548  32-bit paged protected mode, 32-bit IN-to-ring-3          :           54 624 ins/sec        43 964
     2549  32-bit paged protected mode, 32-bit OUT-to-ring-3         :           54 803 ins/sec        44 087
     2550  [snip]
     2551  32-bit paged protected mode, 32-bit read                  :           28 230 ins/sec        34 042
     2552  32-bit paged protected mode, 32-bit write                 :           27 962 ins/sec        34 050
     2553  32-bit paged protected mode, 32-bit read-to-ring-3        :           27 841 ins/sec        28 397
     2554  32-bit paged protected mode, 32-bit write-to-ring-3       :           27 896 ins/sec        29 455
     2555 * @endverbatim
     2556 *
     2557 *
     2558 * @subsubsection subsect_nem_win_benchmarks_w2k    17134/2018-06-22: Windows 2000 Boot & Shutdown
    24702559 *
    24712560 * Timing the startup and automatic shutdown of a Windows 2000 SP4 guest serves
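
Illustration (not part of the changeset): the dirty VRAM tracking cycle described in the comment above (pages start out readonly, a guest write causes an exit, the page is noted and made writable, and after a display refresh all writable pages are reset in bulk) can be sketched as a small in-memory model. The class and method names below are hypothetical and no WinHvPlatform, VID or hypercall interface is touched; in the real code each protection flip would presumably correspond to an unmap/remap or a HvCallMapGpaPages call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the tracking scheme; names are not from VirtualBox.
class DirtyVramTracker
{
public:
    explicit DirtyVramTracker(std::size_t cPages) : m_afWritable(cPages, false) {}

    // Models the write-to-readonly-memory exit: note the page and make it
    // writable.  Returns true if the protection actually changed.
    bool onWriteFault(std::size_t iPage)
    {
        if (iPage >= m_afWritable.size() || m_afWritable[iPage])
            return false;
        m_afWritable[iPage] = true;
        m_aDirty.push_back(iPage);
        return true;
    }

    // Models the step after a display refresh: hand back the dirty pages and
    // flip all of them to readonly again, bulk fashion.
    std::vector<std::size_t> refreshAndReset()
    {
        std::vector<std::size_t> aDirty;
        aDirty.swap(m_aDirty);
        for (std::size_t iPage : aDirty)
        {
            assert(iPage < m_afWritable.size()); // only valid pages are ever recorded
            m_afWritable[iPage] = false;
        }
        return aDirty;
    }

    bool isWritable(std::size_t iPage) const
    {
        return iPage < m_afWritable.size() && m_afWritable[iPage];
    }

private:
    std::vector<bool>        m_afWritable; // current protection per VRAM page
    std::vector<std::size_t> m_aDirty;     // pages written since the last refresh
};
```

Keeping a separate list of dirty pages means the bulk reset only touches pages that were actually written, which is the point of the scheme when most of VRAM is idle between refreshes.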
  • trunk/src/VBox/VMM/testcase/NemRawBench-1.cpp

    r74582 r74588  
    12581258    if (rcExit == 0)
    12591259    {
    1260         printf("tstNemMini-1: Successfully created test VM...\n");
     1260        printf("tstNemBench-1: Successfully created test VM...\n");
    12621262        /*
    12671267        mmioTest(cFactor);
    1269         printf("tstNemMini-1: done\n");
     1269        printf("tstNemBench-1: done\n");
    12701270    }
    12711271    return rcExit;
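
Illustration (not part of the changeset): the comment in NEMR3Native-win.cpp suggests working around the missing address size information for string I/O by scanning the instruction bytes for address size prefixes, and notes this was not investigated further. A minimal sketch of such a scan follows, assuming the caller knows the default address size of the current code segment; the function name, the CpuMode enum and the return convention are hypothetical. It only skips the x86 legacy prefixes and looks for 0x67; it does not decode the instruction itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: default address size implied by the current code segment.
enum class CpuMode { Bits16, Bits32, Bits64 };

// Returns the effective address size in bits (16, 32 or 64) for the
// instruction at pbInstr, or 0 if only prefix bytes were found.
unsigned effectiveAddressSize(const std::uint8_t *pbInstr, std::size_t cbInstr, CpuMode enmMode)
{
    assert(pbInstr != nullptr || cbInstr == 0);
    bool        fAddrPrefix = false;
    std::size_t off         = 0;
    for (; off < cbInstr; off++)
    {
        switch (pbInstr[off])
        {
            case 0x67:                                      // address size override
                fAddrPrefix = true;
                continue;
            case 0x66:                                      // operand size override
            case 0xF0: case 0xF2: case 0xF3:                // LOCK, REPNE, REP
            case 0x26: case 0x2E: case 0x36: case 0x3E:     // ES, CS, SS, DS overrides
            case 0x64: case 0x65:                           // FS, GS overrides
                continue;
            default:
                break;                                      // first non-prefix byte
        }
        break;
    }
    if (off >= cbInstr)
        return 0;                                           // ran out of bytes
    switch (enmMode)
    {
        case CpuMode::Bits16: return fAddrPrefix ? 32 : 16;
        case CpuMode::Bits32: return fAddrPrefix ? 16 : 32;
        case CpuMode::Bits64: return fAddrPrefix ? 32 : 64;
    }
    return 0;
}
```

For example, the bytes F3 67 6F (REP OUTSD with an address size override) in a 32-bit code segment yield 16, meaning SI and CX rather than ESI and ECX would be used.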