Opened 5 years ago
Last modified 4 years ago
#19133 assigned defect
VM lockup when guest opens a named pipe in a shared folder
Reported by: | lesha | Owned by: | |
---|---|---|---|
Component: | shared folders | Version: | VirtualBox 6.0.14 |
Keywords: | named pipe, fifo, lockup | Cc: | |
Guest type: | all | Host type: | all |
Description (last modified by )
I discovered this bug because I was rsyncing a shared folder containing a named pipe, and this hung my machine.
Repro steps:
- Set up a Linux guest on a Mac host. I have Ubuntu 18.04 running on VBox 6.0.14 (latest stable as of now).
- Set up a shared folder
- On the host, run mkfifo TEST_PIPE in the folder
- On the guest, run cat TEST_PIPE
At this point, the guest is (partially) locked up. Specifically:
- Neither Ctrl-C nor SIGKILL will work for cat; it is hung in D state, unkillable
- Any process touching the shared folder hangs likewise
- dmesg will show the following stack for the hung cat: https://pastebin.com/WjKQaZys [1]
- The guest cannot be cleanly rebooted because shutdown requires tearing down the guest additions module, which is blocked on talking to the VM.
- Powering off the VM (not from the guest, but from the UI!) will hang.
So this is not a hang in guest additions, but in the VM code itself.
My guess is that the thread handling shared folders is locked up trying to read from the named pipe.
What confirms this is that the hang is resolved the moment I run echo > TEST_PIPE on the host.
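The blocking behavior suspected here is standard POSIX FIFO semantics and can be reproduced on any ordinary file system, without VirtualBox. The following is a minimal sketch, not VirtualBox code: the function name and the temporary path are made up for illustration. It shows that a read-only open() of a FIFO waits until a writer opens the other end, which is what echo > TEST_PIPE does on the host.

```python
import os
import tempfile
import threading
import time


def demo_blocking_fifo_open():
    """Show that a read-only open() of a FIFO blocks until a writer opens it.

    Returns (blocked_before_writer, released_after_writer).
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "TEST_PIPE")
        os.mkfifo(fifo)

        opened = threading.Event()

        def reader():
            # Blocks inside open() until something opens the FIFO for writing.
            fd = os.open(fifo, os.O_RDONLY)
            opened.set()
            os.close(fd)

        thread = threading.Thread(target=reader, daemon=True)
        thread.start()

        # Give the reader time to reach open(); it must still be waiting.
        time.sleep(0.3)
        blocked_before_writer = not opened.is_set()

        # Counterpart of `echo > TEST_PIPE`: opening the write end
        # releases the blocked reader.
        write_fd = os.open(fifo, os.O_WRONLY)
        released_after_writer = opened.wait(timeout=5)
        os.close(write_fd)
        thread.join(timeout=5)
        return blocked_before_writer, released_after_writer
```

If the host-side shared folder thread performs such a blocking open on behalf of the guest, every later shared folder request would queue up behind it, which would match the symptoms listed above.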
The reason this is not expected behavior is that the guest sees the named pipe as a regular file:
$ ls -l TEST_PIPE
-rwxrwx--- 1 root vboxsf 0 Dec 6 16:21 TEST_PIPE
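Because the guest reports the pipe as a regular file, a check from inside the guest cannot detect it. As a possible workaround, the shared directory can be scanned on the host before it is used. The sketch below is only an illustration (find_fifos is a hypothetical helper, not part of VirtualBox); it relies on lstat(), which reads metadata only and therefore cannot block on a FIFO.

```python
import os
import stat


def find_fifos(root):
    """Return the paths under root that are named pipes, without opening them."""
    fifos = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk() lists every non-directory entry, FIFOs included, in filenames.
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat() only reads metadata, so it does not wait for a pipe peer.
            if stat.S_ISFIFO(os.lstat(path).st_mode):
                fifos.append(path)
    return sorted(fifos)
```

Run on the host against the shared folder's directory, this would list TEST_PIPE from the repro steps so it can be moved or removed before the guest touches it.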
Two fixes seem possible:
- Make named pipes act as true named pipes to the host. If you google "vboxsf named pipe", this is actually a feature that was previously requested.
- Make the open syscall fail in this context. This is worse than working pipes, but better than a lockup. At the moment, data does not actually travel down the named pipe: if the host does echo foo > TEST_PIPE and the guest does cat TEST_PIPE, the guest simply says cat: TEST_PIPE: Protocol error.
$ sudo cat /proc/7376/stack
[<0>] rtR0SemEventMultiLnxWait.isra.2+0x33d/0x370 [vboxguest]
[<0>] VBoxGuest_RTSemEventMultiWaitEx+0xe/0x10 [vboxguest]
[<0>] VBoxGuest_RTSemEventMultiWait+0x28/0x30 [vboxguest]
[<0>] vgdrvHgcmAsyncWaitCallbackWorker+0x1c3/0x210 [vboxguest]
[<0>] VGDrvCommonIoCtl+0x489/0x18e0 [vboxguest]
[<0>] VBoxGuestIDC+0x149/0x160 [vboxguest]
[<0>] VbglR0IdcCallRaw+0x13/0x20 [vboxsf]
[<0>] VbglR0HGCMFastCall+0x1c/0x20 [vboxsf]
[<0>] vbsf_reg_open+0x291/0x4f0 [vboxsf]
[<0>] do_dentry_open+0x1c2/0x310
[<0>] vfs_open+0x4f/0x80
[<0>] path_openat+0x6bf/0x1900
[<0>] do_filp_open+0x9b/0x110
[<0>] do_sys_open+0x1bb/0x2c0
[<0>] SyS_openat+0x14/0x20
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff
Change History (10)
comment:1 by , 5 years ago
Owner: | set to |
---|---|
Status: | new → accepted |
comment:3 by , 4 years ago
Description: | modified (diff) |
---|
comment:4 by , 4 years ago
Guest type: | Linux → all |
---|---|
Host type: | Mac OS X → all |
comment:5 by , 4 years ago
I could reproduce this on Solaris and Linux guests and on Mac OS X and Linux hosts so far; most likely all platforms are affected.
comment:6 by , 4 years ago
Summary: | Mac-hosted VM lockup when guest opens a named pipe in a shared folder → VM lockup when guest opens a named pipe in a shared folder |
---|
comment:7 by , 4 years ago
To illustrate the description in a bit more detail, this is what happens when you attempt to use a named pipe/fifo from the host inside the guest:
[fbatschu@localhost sf_Music]$ ls -la /media/sf_Music/TEST_PIPE
-rwxrwx---. 1 root vboxsf 0 Aug 10 14:53 /media/sf_Music/TEST_PIPE
[fbatschu@localhost sf_Music]$ file TEST_PIPE
TEST_PIPE: empty
[fbatschu@localhost sf_Music]$ stat TEST_PIPE
  File: TEST_PIPE
  Size: 0 Blocks: 0 IO Block: 1048576 regular empty file
Device: 27h/39d Inode: 128 Links: 1
Access: (0770/-rwxrwx---) Uid: ( 0/ root) Gid: ( 975/ vboxsf)
Context: system_u:object_r:vmblock_t:s0
Access: 2020-08-10 14:53:46.069672000 +0200
Modify: 2020-08-10 14:53:46.069672000 +0200
Change: 2020-08-10 14:53:46.069672000 +0200
 Birth: 2020-08-10 14:53:46.069672000 +0200
### now open the named pipe/fifo file:
[fbatschu@localhost sf_Music]$ cat TEST_PIPE
### now process is stuck in the guest, cannot CTRL+C the process.
[fbatschu@localhost ~]$ sudo cat /proc/3274/stack
[<0>] rtR0SemEventMultiLnxWait.isra.0+0x2e2/0x3b0 [vboxguest]
[<0>] vgdrvHgcmAsyncWaitCallbackWorker+0xcf/0x220 [vboxguest]
[<0>] VGDrvCommonIoCtl+0x47f/0x1900 [vboxguest]
[<0>] VBoxGuestIDC+0x113/0x130 [vboxguest]
[<0>] vbsf_reg_open+0x23a/0x4b0 [vboxsf]
[<0>] do_dentry_open+0x13a/0x380
[<0>] path_openat+0x998/0xfb0
[<0>] do_filp_open+0x7e/0xd0
[<0>] do_sys_openat2+0x1f1/0x2a0
[<0>] do_sys_open+0x34/0x60
[<0>] do_syscall_64+0x5b/0x1c0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
### write something on the host into the named pipe/fifo:
fbatschu@lserver:~/Music$ echo bla > TEST_PIPE
### and the cat in the guest which has the named pipe/fifo file
### open fails with an error
[fbatschu@localhost sf_Music]$ cat TEST_PIPE
cat: TEST_PIPE: Protocol error
### from inside the guest you cannot create a named pipe/fifo
### file in the shared folder file system:
[fbatschu@localhost sf_Music]$ mkfifo TEST_PIPE_2
mkfifo: cannot create fifo 'TEST_PIPE_2': Operation not permitted
### guest writes data to named pipe/fifo file:
[fbatschu@localhost sf_Music]$ echo "guest" > TEST_PIPE
-bash: TEST_PIPE: Invalid argument
[fbatschu@localhost sf_Music]$ echo $?
1
### host process reading from named pipe/fifo unblocks
### but receives no data:
fbatschu@lserver:~/Music$ cat TEST_PIPE
fbatschu@lserver:~/Music$ echo $?
0
comment:8 by , 4 years ago
So we have the following situation:
1) The named pipe/fifo special file on the host is presented as a regular file in the guest's shared folder file system.
2) You cannot create a named pipe/fifo special file in the shared folder file system from the guest.
3) In the guest, the shared folder file system behaves as if the file were a named pipe/fifo even though it appears not to be one: a read blocks until the other end of the named pipe/fifo in the host file system actually writes data to it, which unblocks the guest's access to that file again, i.e. the behavior of an ordinary named pipe/fifo.
4) Although 3) looks like named pipe behavior, it is not really. Once the host's sending end has written something into the named pipe/fifo special file, the guest unblocks but fails with "Protocol error", and no data travels from the host's sending end to the guest's receiving end.
5) Writing data in the guest's shared folder file system to this "regular"/named pipe file blocks until there is a reader on the other end of the named pipe on the host side. Once there is a reader at the host end of the named pipe/fifo, the guest's writing process unblocks, i.e. as in 3) we sort of have the behavior of an ordinary named pipe/fifo.
6) Although 5) looks like named pipe/fifo behavior, it is not really. Once the receiving end of the named pipe on the host is connected, the guest process's write to the named pipe/fifo special file fails with EINVAL, and no data travels to the host process reading from the named pipe/fifo.
7) When a process in the guest reads from or writes to that file in the shared folder file system that is supposed to be a named pipe/fifo special file (from the host's perspective at least), you can no longer interrupt this process with CTRL+C or a signal, e.g. kill -15, while there is no process on the other (host) end of the named pipe/fifo. A process doing the same on a regular file system can be interrupted.
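For comparison with the last point, the expected behavior on a regular file system can be checked with a small sketch like the one below. It is an illustration only; the helper name and the 0.5 second delay are arbitrary choices. A child process blocks in open() on a FIFO with no writer, and the parent then sends the equivalent of kill -15.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time


def blocked_fifo_open_is_killable():
    """Block a child in open() on a FIFO, then terminate it with SIGTERM.

    Returns (still_blocked_before_signal, child_return_code).
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "TEST_PIPE")
        os.mkfifo(fifo)
        # The child does what `cat TEST_PIPE` starts with: a read-only open().
        child = subprocess.Popen(
            [sys.executable, "-c",
             "import os, sys; os.open(sys.argv[1], os.O_RDONLY)", fifo]
        )
        time.sleep(0.5)
        # No writer exists, so the child cannot have finished on its own.
        still_blocked = child.poll() is None
        child.send_signal(signal.SIGTERM)  # same signal as `kill -15`
        return_code = child.wait(timeout=10)
        return still_blocked, return_code
```

On a local file system the child exits with a negative return code equal to the signal number, meaning the signal ended the wait. On vboxsf, as described in 7), the corresponding process stays stuck instead.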
comment:9 by , 4 years ago
So whether or not the current implementation is a non-working attempt to also deal with named pipes between the host and the guest or was never intended for such host/guest IPC at all, we need to fix it somehow.
The current fix approach will be:
1) host: deny any open() or mmap() of a host-side file of type pipe and return ENOTSUP to the guest's attempt
2) host: return file attributes to the guest for such a file that reflect the type pipe
3) guest: show the file correctly as being a pipe, as can be seen in any other Linux file system, i.e.:
$ ls -lisa fifo
14287767 0 prw-r--r-- 1 fbatschu staff 0 Sep 14 12:32 fifo
$ stat fifo
  File: fifo
  Size: 0 Blocks: 0 IO Block: 4096 fifo
Device: 801h/2049d Inode: 14287767 Links: 1
Access: (0644/prw-r--r--) Uid: ( 1000/fbatschu) Gid: ( 50/ staff)
Access: 2020-09-14 12:32:35.993088487 +0200
Modify: 2020-09-14 12:32:35.993088487 +0200
Change: 2020-09-14 12:32:35.993088487 +0200
 Birth: -
4) guest: fail an attempt to open() or mmap() such a file with ENOTSUP; we need to figure out whether we channel the host error through to the guest or add a shortcut directly in the guest's vboxsf
5) guest: change the error return value when the guest attempts to create a pipe from EPERM (Operation not permitted) to ENOTSUP (Operation not supported). After all, this is not a permission failure or a question of missing privileges; rather, we do not support such an operation on vboxsf at all.
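The idea behind step 1) can be sketched outside of VirtualBox as follows. This is not the actual shared folder service code: open_for_guest is a hypothetical helper, and the sketch only shows the principle of checking the file type through metadata before calling open(), so that the caller receives ENOTSUP instead of blocking.

```python
import errno
import os
import stat


def open_for_guest(path, flags=os.O_RDONLY):
    """Hypothetical host-side open that refuses named pipes with ENOTSUP."""
    try:
        # stat() reads metadata only, so it cannot block on a FIFO.
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Let open() decide (for example when flags contain O_CREAT).
        mode = None
    if mode is not None and stat.S_ISFIFO(mode):
        raise OSError(errno.ENOTSUP, os.strerror(errno.ENOTSUP), path)
    # Note: a file could be swapped between stat() and open(); a real
    # implementation would need to close that window.
    return os.open(path, flags)
```

A caller would see an OSError with errno set to ENOTSUP for TEST_PIPE and a normal file descriptor for regular files.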
comment:10 by , 4 years ago
Owner: | removed |
---|---|
Status: | accepted → assigned |
comment:11 by , 4 years ago
Just adding that I also hit this one, Oracle Linux 7U9 host running VirtualBox 6.1.22_144080, Windows 10 guest, VM hang upon performing shared folder operations with a named pipe in the root of the shared folder.
When the Windows VM hung, only option was to kill -9 the VirtualBox process at host level.
pstack of VirtualBox from host showed shared folder thread stuck in open64(), some juggling and strace'ing showed the named pipe being the argument of open64().
Removing the named pipe made all hangs disappear at once.
The part I can't quite explain is what changed to cause this lockup behavior, since the named pipe file was dated 2012 and I carry over the filesystem from laptop to laptop, using my Windows VM at least a couple times a week; lockups started somewhere between end of May and beginning of June 2021.
Thanks for this excellent bug report!
The fix strategy is likely going to be to show the host's file as a pipe (currently it is shown as a regular file), but not to support any activity on such a pipe file from the guest side, and to fail such attempts in the guest with ENOTSUP.