Opened 10 years ago
Closed 9 years ago
#13961 closed defect (fixed)
Unable to handle kernel paging request (SMAP with 4.3.26)
Reported by: | Christian Hesse | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 4.3.26 |
Keywords: | Cc: | ||
Guest type: | other | Host type: | Linux |
Description
My device is a Lenovo Thinkpad X250 (Broadwell CPU) running Linux 4.0rc4. System crashes as soon as virtualbox guest is started. This happens with 4.3.24 and 4.3.26.
I am pretty sure this is not related to the CR4 changes. I compiled Linux 4.0rc4 with CR4 changes reverted and this still happens.
Kernel logs are attached.
Attachments (31)
Change History (79)
by , 10 years ago
comment:1 by , 10 years ago
I added 'nosmap' and 'nosmep' to host boot parameters. Virtualbox guest now starts without issues.
comment:2 by , 10 years ago
Could you attach the compiled vboxdrv.ko module from VirtualBox 4.3.26? Also, could you check adding 'nosmep' and 'nosmap' exclusively? Thanks!
comment:3 by , 10 years ago
You should increase the upload limit... My vboxdrv.ko exceeds the limit by about 20kB.
I uploaded it to my webserver for now: http://www.eworm.de/tmp/vboxdrv.ko
This is virtualbox 4.3.26 and linux 4.0rc4.r0.g06e5801.
comment:4 by , 10 years ago
Downloaded, thanks. If you compress this binary it will not exceed the size limit.
comment:5 by , 10 years ago
Having SMEP (supervisor mode execution prevention) enabled is just fine. It's sufficient to have 'nosmap' (to disable supervisor mode access prevention) in boot parameters.
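For anyone repeating this test, a minimal Python sketch of how one could confirm which of the two parameters a host was actually booted with. The helper names (`boot_params`, `smap_disabled`, `smep_disabled`) are made up for illustration; the only assumption about the system is that the kernel command line is available as a string, as Linux exposes it in /proc/cmdline.

```python
def boot_params(cmdline: str) -> set:
    """Return the bare parameter names found on a kernel command line."""
    # "root=/dev/sda1" becomes "root"; flag-style parameters stay unchanged.
    return {token.split("=", 1)[0] for token in cmdline.split()}


def smap_disabled(cmdline: str) -> bool:
    """True if the host was booted with 'nosmap'."""
    return "nosmap" in boot_params(cmdline)


def smep_disabled(cmdline: str) -> bool:
    """True if the host was booted with 'nosmep'."""
    return "nosmep" in boot_params(cmdline)
```

On a live host the string would come from reading /proc/cmdline; for the setup described in this comment, `smap_disabled` should report True and `smep_disabled` False.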
comment:6 by , 10 years ago
You are right... I expected it to be compressed, but looks like dkms does not compress.
Uploading compressed vboxdrv.ko for reference.
by , 10 years ago
Attachment: | vboxdrv.ko.gz added |
---|
vboxdrv.ko.gz (virtualbox 4.3.26, linux 4.0rc4.r0.g06e5801)
comment:7 by , 10 years ago
Thanks. Actually I think I know where the problem is and I might have a patch available during the next few hours.
comment:8 by , 10 years ago
Summary: | unable to handle kernel paging request → Unable to handle kernel paging request (SMAP with 4.3.26) |
---|
by , 10 years ago
Attachment: | diff_smap_2 added |
---|
comment:9 by , 10 years ago
Attached a diff for the kernel driver which should fix the problem. After you have applied the diff to the VirtualBox kernel driver sources (which are located at /usr/src/vboxhost-4.3.26), please recompile the host kernel modules by running
/etc/init.d/vboxdrv setup
and start your VM. Please make sure to run this on a Linux kernel with 'nosmap' and 'nosmep' removed from the boot parameters.
by , 10 years ago
Attachment: | vboxdrv.ko.2.gz added |
---|
vboxdrv.ko.gz (virtualbox 4.3.26 + diff_smap_2, linux 4.0rc4.r0.g06e5801)
comment:12 by , 10 years ago
Thanks. Unfortunately I don't understand why EFLAGS.AC is still not set. Could you repeat the experiment and attach all corresponding items from the same VM session:
- The VBox.log file
- The Linux kernel log
- The vboxdrv.ko file if different than vboxdrv.ko.2.gz
This will help me to debug the problem because the VBox.log file contains the load addresses of the VMM modules. Unfortunately we cannot reproduce the problem as we still don't have Broadwell hardware.
comment:13 by , 10 years ago
Ok, here we go...
It's not easy to capture VBox.log from a dying machine, but inotify, tail and ssh did the trick. ;)
vboxdrv.ko is unchanged, logs will follow.
by , 10 years ago
Attachment: | kernel-20150317-211216.log added |
---|
kernel log virtualbox 4.3.26 + diff_smap_2
by , 10 years ago
Attachment: | VBox-20150317-211216.log added |
---|
VBox.log virtualbox 4.3.26 + diff_smap_2
comment:14 by , 10 years ago
Thanks again. As you used a non-official package, could you also provide the VMMR0.r0 module?
comment:15 by , 10 years ago
That is what's found at /usr/lib/virtualbox/VMMR0.r0?
I will attach it in a few seconds. Though I am not sure if this is identical to what I used... The logs were made with a package I compiled myself, I now have installed my distribution's packages. Both were built in a clean chroot.
Let me know if I should repeat my tests.
by , 10 years ago
Attachment: | diff_smap_3 added |
---|
comment:16 by , 10 years ago
Ok. Could you try diff_smap_3 instead of diff_smap_2 and see if you would now be able to start VMs with SMAP enabled? Thanks!
comment:17 by , 10 years ago
Still crashes. This was with linux 4.0rc4.r199.gb314aca, virtualbox 4.3.26 + diff_smap_3.
by , 10 years ago
Attachment: | leda-kernel.log added |
---|
by , 10 years ago
Attachment: | leda-vbox.log added |
---|
by , 10 years ago
Attachment: | vboxdrv.ko.3.gz added |
---|
by , 10 years ago
Attachment: | diff_smap_4 added |
---|
comment:18 by , 10 years ago
Next try. We just saw this changeset which would explain why the other patches did not work. Could you try again? Thank you!
comment:19 by , 10 years ago
Looks like that did the trick! Guest is up and running, host is still alive. ;)
Thanks a lot!
comment:21 by , 10 years ago
It's very rare, but it still happens from time to time... Looks like we have a corner case that still crashes the machine. Any ideas? Sadly I cannot reproduce it on demand; it happens about once a week for me.
comment:22 by , 10 years ago
eworm, that's important. I'm running VBox on a Linux 4.0.0 host and have not seen such problems for many weeks now. It would be nice if you could provide at least a VBox.log file together with the output of 'dmesg' and the corresponding vboxdrv.ko as you provided before.
comment:23 by , 10 years ago
I think I did about a hundred reboot cycles... Finally it crashed. :D Have fun!
comment:24 by , 10 years ago
Thanks eworm. One more request: Could you also attach the VMMR0.r0 file from your installation? You are using a distribution-specific package, so I don't have a reference. Thanks!
comment:25 by , 10 years ago
comment:26 by , 10 years ago
I've applied the latest patch in this thread and it has resolved the issue I was encountering with Virtualbox freezing the host. I have yet to encounter any further trouble but I will keep an eye out for the issue eworm mentions and will follow-up if I encounter it. Thanks!
by , 10 years ago
Attachment: | journal.log added |
---|
by , 10 years ago
Attachment: | vboxdrv.ko.5.gz added |
---|
comment:27 by , 10 years ago
Just had another crash... Uploaded the logs and kernel module.
Any news on this? This is really annoying. Would be great to have a stable workstation any time soon.
comment:28 by , 10 years ago
Thanks for the new dump. Looks like the fault was triggered at the exact same place as before. We still don't know how this can happen and are trying to reproduce the problem.
comment:29 by , 10 years ago
Let me know if I can help in one way or another.
Does it help to upload more logs if a crash occurs?
comment:30 by , 10 years ago
VBox 4.3.28 contains the latest code including diff_smap_4. I guess this will still not fix eworm's problems but I would like to know if other users have any SMAP problems with VBox 4.3.28.
comment:31 by , 10 years ago
Hi frank; having the same issue as eworm on my Thinkpad T450s, using the latest virtualbox 4.3.28, so can confirm the issue isn't fixed. I've uploaded my system.log, and am trying to find the other information such as my virtualbox log, will upload it as I find it.
by , 10 years ago
Attachment: | vboxdrv.ko.6.gz added |
---|
comment:32 by , 10 years ago
I think that gives you everything you need, but let me know if there's anything else that'll help. Thanks!
comment:33 by , 10 years ago
fardog and eworm, your log files indicated that your VM processes still crash at the very same position. At the moment we cannot explain this. It's also interesting that only ArchLinux users seem to be affected, at least I'm not aware of users having 4.3.28 installed and having problems with SMAP. One developer installed ArchLinux on a Broadwell laptop and still was not able to reproduce the problem.
Could you attach the Linux kernel configuration?
(removed the last paragraph. I will prepare another test build soon)
comment:34 by , 10 years ago
Could you install this 4.3 test build and try to reproduce the crash? In that case, please attach the VBox.log file, the output of 'dmesg' and the vboxdrv.ko module as you already did before. Thank you!
comment:35 by , 10 years ago
Hrmpf. Sorry, that test build might fail to compile the kernel modules. Please use this test build instead. Not my day :-/
comment:36 by , 10 years ago
Attached the Linux kernel configuration. It's the default from Arch Linux linux package version 4.0.4-1.
Last time my system crashed with linux 4.0.2-1 and Virtualbox 4.3.26 (+ patches). Given that it happens really seldom, I cannot tell whether the latest versions are still affected. The configuration has not changed since then, though.
I am not sure how to reliably test this... Even rebooting the guest twenty times and more in a row without issues does not indicate it is fixed. I will think about it...
Wondering what influence the guest setup has... Does it matter? I took a look at the last crash logs available and saw that the BUG follows:
kernel: device bridge entered promiscuous mode
Where bridge is a bridge interface with static IP and dhcp daemon. Anything else that could have an effect?
by , 10 years ago
Attachment: | config.2.gz added |
---|
follow-up: 39 comment:37 by , 10 years ago
Over and over again Google brings me to an old ticket about a similar issue:
BUG: unable to handle kernel paging request
Is this related? Possibly we have to disable automatic NUMA page balancing by setting pTask->mm->numa_next_scan (src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c, line 1551) for every CPU?
comment:38 by , 10 years ago
Hi frank; I won't be able to test that build until later tonight or tomorrow, but will give it a go. For the time being, I've uploaded my kernel config (for version linux 4.0.2-1, I haven't upgraded to the latest 4.0.3-1 yet, although it looks like 4.0.4-1 is imminent in Arch's repos). This is the version that the crash logs above were from.
Please note: this config.gz was from the running system, which has nosmap set as a boot parameter since that's how I can get virtualbox to run (I depend on it heavily for work); I'm not sure if that shows up in the config file, but I didn't want it to confuse you. The crash logs above are from a different boot, when I was NOT running the nosmap flag.
Thanks!
comment:39 by , 10 years ago
Replying to eworm:
Over and over again Google brings me to an old ticket about a similar issue:
BUG: unable to handle kernel paging request
Is this related? Possibly we have to disable automatic NUMA page balancing by setting pTask->mm->numa_next_scan (src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c, line 1551) for every CPU?
No, completely unrelated. Look at your kernel crash dump:
- CR4: 00000000003427e0, so bits 20 (SMEP) and 21 (SMAP) are set. That means that SMAP is activated.
- BUG: unable to handle kernel paging request at 00007f8460fcd000. That means that the kernel is accessing memory which is mapped into userland. This is considered hacky, but for historical reasons VirtualBox still works this way. For example, on 32-bit hosts it would not be possible to map the complete guest address space into the 1G kernel address space.
- EFLAGS: 00010202. That means that bit 18 of EFlags (AC) is clear. But with VBox 4.3.28 this bit is supposed to be set on SMAP-enabled hosts.
That means that the AC flag is somewhere cleared in the kernel code and currently we don't know where. We even installed ArchLinux on a SMAP-enabled laptop, unfortunately no success...
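The bit arithmetic in the list above can be reproduced with a short Python sketch. The bit positions (CR4 bit 20 = SMEP, bit 21 = SMAP, EFLAGS bit 18 = AC) are the ones named in this comment; the function name `decode_oops` and the user-space boundary of 2^47 (x86_64 with 4-level paging) are assumptions made for the illustration.

```python
CR4_SMEP = 1 << 20        # Supervisor Mode Execution Prevention
CR4_SMAP = 1 << 21        # Supervisor Mode Access Prevention
EFLAGS_AC = 1 << 18       # AC flag; when set in ring 0 it overrides SMAP
USER_SPACE_END = 1 << 47  # assumption: x86_64 host with 4-level paging


def decode_oops(cr4_hex: str, eflags_hex: str, fault_addr_hex: str) -> dict:
    """Decode the register values printed in a kernel oops."""
    cr4 = int(cr4_hex, 16)
    eflags = int(eflags_hex, 16)
    addr = int(fault_addr_hex, 16)
    return {
        "smep": bool(cr4 & CR4_SMEP),
        "smap": bool(cr4 & CR4_SMAP),
        "ac": bool(eflags & EFLAGS_AC),
        "user_address": addr < USER_SPACE_END,
    }


# Values taken from the crash dump discussed above.
report = decode_oops("00000000003427e0", "00010202", "00007f8460fcd000")

# SMAP on, faulting address in userland, AC clear: consistent with a SMAP fault.
smap_violation = report["smap"] and report["user_address"] and not report["ac"]
print(report, smap_violation)
```

With the values from this dump, SMEP and SMAP both decode as set, AC as clear, and the faulting address falls in the user half of the address space.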
follow-up: 41 comment:40 by , 10 years ago
Digging through kernel code I found a place where clac() is called, but there is no stac() before. Possibly that is the place where things go wrong?
by , 10 years ago
Attachment: | 0001-x86_64-smap-call-stac-before-touching-user-memory.patch added |
---|
x86_64, smap: call stac() before touching user memory
comment:41 by , 10 years ago
Replying to eworm:
Digging through kernel code I found a place where clac() is called, but there is no stac() before. Possibly that is the place where things go wrong?
No :-)
It works like this: stac() sets the AC flag. If the AC flag is set in R0, then the SMAP check (whether R0=kernel is allowed to access R3=userland memory) is disabled. clac() clears the AC flag and therefore enables the SMAP check. The latter is the default in recent Linux on Broadwell CPUs.
The place you found is just the last part of an error handler. The code for copying data from user to kernel obviously needs to have the AC flag set to temporarily disable the SMAP check. That's done for instance in copy_user_generic_string (see copy_user_64.S). The copy_user_handle_tail() function is called if there was a normal page fault while accessing the provided user data from the kernel.
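To make the stac()/clac() pairing concrete, here is a toy model in Python. It is not kernel code and not how VirtualBox or Linux implement anything; `ToyCpu`, `SmapFault` and this `copy_from_user` are hypothetical names that only mirror the rule described above: a kernel access to user memory faults when SMAP is enabled and AC is clear.

```python
class SmapFault(Exception):
    """Stands in for the 'unable to handle kernel paging request' oops."""


class ToyCpu:
    def __init__(self, smap_enabled: bool = True):
        self.smap_enabled = smap_enabled  # models CR4.SMAP
        self.ac = False                   # models EFLAGS.AC

    def stac(self) -> None:
        self.ac = True                    # temporarily disable the SMAP check

    def clac(self) -> None:
        self.ac = False                   # re-enable the SMAP check

    def kernel_load_from_user(self, user_memory: bytes, index: int) -> int:
        # The SMAP rule: ring-0 access to a user page faults unless AC is set.
        if self.smap_enabled and not self.ac:
            raise SmapFault("kernel touched user memory with AC clear")
        return user_memory[index]


def copy_from_user(cpu: ToyCpu, user_memory: bytes) -> bytes:
    """Bracket the user access with stac()/clac(), as the copy routines do."""
    cpu.stac()
    try:
        return bytes(cpu.kernel_load_from_user(user_memory, i)
                     for i in range(len(user_memory)))
    finally:
        cpu.clac()  # always restore the protected state, even on error
```

In this model the crash in this ticket corresponds to calling `kernel_load_from_user` directly while AC is clear; a trailing clac() with no earlier stac() is harmless, because it only re-enables the check.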
comment:42 by , 10 years ago
I encounter the same issue on my Lenovo L450 on Arch linux:
- Linux thinkpad 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC 2015 x86_64 GNU/Linux
- Virtualbox 4.3.28
- Windows 8.1 x64 guest
It happens on about 20% of starts with smap active very early in the boot process (Windows Logo showing with spinner).
Any hints on how to debug this are much appreciated.
by , 10 years ago
Attachment: | virtualbox-crash-on-startup added |
---|
comment:43 by , 10 years ago
Lenovo X1 Carbon 2015 model here (20BS). Arch Linux, VB 4.3.28. Win8.1 guest OS.
I've hit this bug regularly (1/4 virtual machine boots avg) since this report was filed. Also followed duplicate/similar reports regarding Broadwell, but this report seems to have the most relevant info.
I just crashed 3/3 times, and each requires a hard power-off of the host. This is a data-loss-potential bug. I'm surprised to see it unresolved.
Crash info attached.
by , 10 years ago
Attachment: | virtualbox-crash-on-startup-2015-06-24 added |
---|
Latest crash log info added
comment:44 by , 10 years ago
I confirm I have this issue as well on my Haswell Lenovo T540p. Kernel 4.1.1-r1, VB 4.3.28.
by , 10 years ago
adding nosmap to kernel parameters didn't help me. I still got panic tonight
comment:45 by , 10 years ago
Running Virtualbox 5.0.0 with KVM virtualization and SMAP enabled now. Looks like that does not suffer from the issue. I will give it some more testing.
Virtualization "Default" is KVM as well, no?
comment:46 by , 9 years ago
VBox 5.0.2 contains more fixes and hopefully fixes all remaining problems with Linux and SMAP.
comment:47 by , 9 years ago
Running VBox 5.0.0 / 5.0.2 for about four weeks now. No remaining issues with SMAP enabled and KVM virtualization in action.
kernel log virtualbox 4.3.24