Opened 8 years ago
Closed 8 years ago
#16643 closed defect (fixed)
rdtsc is not reset on CPU reset => Fixed in SVN
Reported by: | Axel Dörfler | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 5.1.18 |
Keywords: | cpu rdtsc reset | Cc: | |
Guest type: | other | Host type: | Windows |
Description
According to the official Intel documentation, this counter should be reset when the CPU is reset (chapter 17.15 in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 from September 2016):
"The time-stamp counter (as implemented in the P6 family, Pentium, Pentium M, Pentium 4, Intel Xeon, Intel Core Solo and Intel Core Duo processors and later processors) is a 64-bit counter that is set to 0 following a RESET of the processor."
You can easily reproduce this using Haiku, which uses rdtsc to compute the uptime of the system.
Attachments (4)
Change History (22)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
Haiku tries to reboot via ACPI first, and if that fails, uses the keyboard controller to reset the machine. If that fails, too, it overwrites the local descriptor table with null: http://code.metager.de/source/xref/haiku/src/system/kernel/arch/x86/arch_cpu.cpp#1212
I'm not sure which method is actually used here, I'd guess it's ACPI. In any case, the machine reboots just fine otherwise :-)
comment:3 by , 8 years ago
Yes, the TSC is set to zero on CPU reset, but an OS does not take control at CPU reset. The system could be sitting at some boot menu for an hour or a day or a month. In addition, the firmware / boot loader is free to set the TSC to any specific or random value before the OS boots.
The upshot is that an OS cannot extrapolate anything from the value the TSC has when the OS boots. An OS can only compare the current TSC value with the value the TSC had when the OS first read it.
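The approach described above, measuring uptime only relative to the first TSC value the OS reads, can be sketched as follows. This is a minimal illustration, not code from Haiku or VirtualBox: `UptimeClock`, `read_counter`, and `ticks_per_second` are hypothetical names, and the counter is passed in as a plain callable so the sketch does not depend on any particular way of reading the TSC.

```python
class UptimeClock:
    """Uptime measured relative to the first counter value seen (hypothetical sketch)."""

    def __init__(self, read_counter, ticks_per_second):
        # read_counter: callable returning the current counter value (stand-in for rdtsc)
        # ticks_per_second: assumed constant counter frequency
        self._read = read_counter
        self._hz = ticks_per_second
        # Remember the value at first read; the absolute value itself carries no meaning.
        self._base = read_counter()

    def uptime_seconds(self):
        # Only the difference from the baseline is used, so a nonzero
        # counter value at OS boot does not inflate the reported uptime.
        return (self._read() - self._base) / self._hz
```

With this structure, a counter that already holds a large value when the OS starts still yields an uptime of zero at boot.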
comment:4 by , 8 years ago
One could argue that the time spent in the boot loader counts toward the system uptime as well, but this ticket should not be about whether Haiku could compute the system uptime differently (you're free to report a bug at the project's bug tracker, though :-)).
It just reports a bug in VirtualBox, and Haiku merely offers a way to reproduce it conveniently.
comment:5 by , 8 years ago
If you could find some official document stating that the firmware is not allowed to manipulate the TSC and that every reboot must result in a hard reset of the CPU, that would help. Otherwise it just looks like Haiku making invalid assumptions.
comment:6 by , 8 years ago
I'm afraid there is no such document. However, there is the Intel specification, which VirtualBox clearly violates no matter what kind of assumptions Haiku makes. Again, Haiku is just a convenient test case.
If you want to ignore this, fine, I just reported it to improve the software. But please don't come up with excuses why VirtualBox behaves within the spec here. It does not.
comment:7 by , 8 years ago
I may have found a bug caused by this. On multi-CPU VBox VMs that have been running for quite some time (sometimes I do not need to reboot for a month or two) and where the guest OS is Windows (2012R2 in my case, but I think this applies to any Windows version), Windows will not load past the splash screen (spinning dots on W2K12R2) after a reboot. I have to issue a power off / power on to solve the problem.
I have found a VMware bug (https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2092807&src=vmw_so_vex_mbrad_895) that has been corrected in VMware. It says that the TSC (Time Stamp Counter) is incremented as CPU cycles elapse. But cycles are not always equally distributed across all CPUs, and when you ask Windows to reboot, it does a "soft reset", not a complete CPU reset, which does not clear the TSC (at least in VMware). When the TSC is not nearly equal on all CPUs, Windows refuses to boot (though I do not see why Windows should be concerned by this at boot time...).
Maybe a patch should be issued in VBox to reset the TSC, or at least set it equal on all CPUs, when an ACPI soft reset is requested by the OS.
comment:8 by , 8 years ago
jeffcourteau, did you really experience such a problem with VirtualBox? If so, can you provide a VBox.log file of such a VM session?
by , 8 years ago
Attachment: | VBox.log.1 added |
---|
comment:9 by , 8 years ago
VBox.log.1 is the log file from when I rebooted the VM and it hung at the Windows splash screen. Even a reset would not do the trick; I had to power off / power on.
VBox.log.2 is the log file after I powered off / powered on.
comment:10 by , 8 years ago
Thanks for the log files, but you probably mixed something up. VBox.log.2 shows RESETTING after 12:58, and the following events hint that the guest was not stuck: the guest driver was loaded and there are several screen resize events up to 1600x900. So it is unlikely that this guest hung at the splash screen. VBox.log.1 has a long uptime of 327:42h but does not show a single RESETTING line.
Did you attach the wrong files?
by , 8 years ago
Attachment: | VBox.log.3 added |
---|
comment:11 by , 8 years ago
Here is the oldest log file, VBox.log.3. I know I encountered the situation in this one. You can see the first reset I issued at the 1279:29:59.186625 timestamp, after 24 minutes stuck on the splash screen...
by , 8 years ago
Attachment: | SRVWEB02.VBox.log.2 added |
---|
comment:12 by , 8 years ago
Yes, that log is much more interesting. The VM reset happened after about 53.3 days of uptime. The calculations in the VMware KB article indicate that on a 4 GHz system the Windows bug will be triggered after about 52 days (slower TSC means longer uptime before the bug triggers).
There is good evidence that you really did hit this Windows bug. In the VBox.log.3 file, you can see at the end that /TM/TSC/offCPU1 is wildly different from /TM/TSC/offCPU0. That's because Windows changed the TSC on CPU0 but left it alone on CPU1. In VBox.log.1 the two are identical after almost two weeks of uptime.
Your summary of the problem in comment 7 is not at all what VMware describes. The problem isn't that the TSC is out of sync, the problem is that Windows makes the TSC out of sync if the TSC value at boot time is relatively high (greater than 0x40000000000000).
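The "about 52 days at 4 GHz" figure can be checked against the threshold quoted above. The following is a small arithmetic sketch, assuming the TSC starts at zero and ticks at a constant rate; `days_until_threshold` and `tsc_hz` are illustrative names, not anything from VirtualBox, VMware, or Windows.

```python
# Threshold quoted in this comment; it equals 2**54.
TSC_THRESHOLD = 0x40000000000000

SECONDS_PER_DAY = 86_400


def days_until_threshold(tsc_hz, threshold=TSC_THRESHOLD):
    """Days of uptime before a TSC starting at 0 exceeds the threshold.

    Assumes a constant TSC frequency of tsc_hz ticks per second.
    """
    return threshold / tsc_hz / SECONDS_PER_DAY


if __name__ == "__main__":
    for ghz in (2, 3, 4):
        print(f"{ghz} GHz: {days_until_threshold(ghz * 1e9):.1f} days")
```

Under these assumptions a 4 GHz TSC crosses the threshold after roughly 52 days, and a slower TSC takes proportionally longer, which is consistent with the uptime seen in VBox.log.3.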
comment:13 by , 8 years ago
OK, so basically it would be a Windows bug when it runs inside a VM, but not on bare metal? If the bug does not occur on bare metal, isn't it a VBox / VMware bug, since they should mimic bare metal behavior?
follow-up: 16 comment:14 by , 8 years ago
We will probably change VBox to behave like most bare metal (i.e. reset the TSC on reset).
But so far there is no evidence for any VBox bug -- only for a Windows bug! The different behavior is not necessarily evidence of a VBox bug. If bare metal resets the TSC just out of paranoia (without any clearly documented need), then users of Windows are just lucky that the Windows bug does not hit them.
comment:15 by , 8 years ago
It's not virtual vs. physical system, it's more a question of platform/firmware behavior.
I have to assume that it's trivial (if tedious) to reproduce the same behavior on a physical system. Power on the system, let it sit at some boot prompt for 2-4 months, then continue booting. That should let the TSC advance enough that Windows will get confused.
comment:16 by , 8 years ago
Replying to frank:
But so far there is no evidence for any VBox bug -- only for a Windows bug!
How do you interpret the Intel CPU spec then? Do you generally not care about it that much, i.e. is it not a reliable source of information?
comment:17 by , 8 years ago
Does the Intel SDM say somewhere that firmware must perform a hard reset of the CPU on system reboot? If so, I'd like to know where exactly.
And no, in general the SDM is not an entirely reliable source of information. There are quite a few errors and omissions. Is it a good source of information? Absolutely. Is it 100% reliable? Absolutely not.
comment:18 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Summary: | rdtsc is not reset on CPU reset → rdtsc is not reset on CPU reset => Fixed in SVN |
We confirmed that Windows 10 is buggy (as VMware found) and will therefore reset the TSC on VM reset in the next VirtualBox maintenance update.
It's clear that the TSC is set to 0 following a CPU reset - but what situations trigger a CPU reset? Does a warm/cold reboot by jumping to the BIOS reset the CPU (I wouldn't expect it to)? Which way(s) is Haiku using to reboot the system?