Opened 10 years ago
Last modified 10 years ago
#14213 new defect
NAT networking stops responding and blocks all I/O on that interface for Ubuntu/Debian x64 guests
Reported by: | Coffee_fan | Owned by: | |
---|---|---|---|
Component: | network/NAT | Version: | VirtualBox 4.3.28 |
Keywords: | NAT | Cc: | pierrj@… |
Guest type: | Linux | Host type: | Windows |
Description
Summary
I can reproduce this fault almost 100% of the time. It seems to affect VirtualBox 4.3.x on both Windows 8.1 and Windows 10 hosts, using Ubuntu 14.04.2 or Debian Jessie guests.
I attached a script that pings a well-known address once a second; you can run it to show the precise moment at which networking stops responding. You may use this script or any tool you like for this. For the purposes of the bug description I will assume you are using the embedded script.
Repro steps to detect fault
- Make sure you have a standard single-interface, NAT-based Ubuntu guest running.
- Log in to the guest.
- Install either Google Chrome (triggers the fault immediately) or chromium-browser (Ubuntu) or chromium (Debian Jessie), which trigger it very often but not 100% of the time. Install google-chrome using the following snippet:
  wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
  echo "deb http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
  sudo apt-get -yqq update
  sudo apt-get install -yqq google-chrome-stable
- Start the embedded script to monitor network access.
Expected result: It should be possible to ping a well-known address. The output should look something like this:
...
1 packets transmitted, 1 received, 0% packet loss, time 0ms
...
- Start the Chromium browser from the UI and try to access google.com or any other address.
Result: The script starts showing that network access is broken. After a short delay during which everything is frozen, lines like the following appear in the script output:
...
1 packets transmitted, 0 received, 100% packet loss, time 0ms
...
Result 2: The VirtualBox process in Windows becomes unresponsive and unstable and cannot easily be stopped; its windows may turn blank, as hung processes do.
Expected result: Network access should work normally, no disruptions.
Script to trigger fault
For repro steps I used:
./net_tester.sh -x
$ cat net_tester.sh
#!/bin/bash

function usage()
{
cat <<eof
Usage: $(basename $0) [-x] [-h] [output log file]

The purpose of this script is to check network reliability of Virtualbox

Options:
  -x, --external  Uses the google DNS to ping instead of NAT gateway address.
  -h, --help      This message.
eof
exit 1
}

#
# Change the following line to the address of the NAT gateway.
#
ping_address=10.0.0.2
out_file=out.log

while [ "$1" != "" ]
do
    case "$1" in
        -h|--help)
            usage
            ;;
        -x|--external)
            ping_address=8.8.8.8
            ;;
        *)
            out_file=$1
            ;;
    esac
    shift
done

while :
do
    date
    ping -c 1 ${ping_address}
    sleep 1
done | tee ${out_file}
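Once the script has produced a log, the moments at which connectivity drops and recovers can be extracted mechanically instead of by scrolling. The following is a minimal sketch and not part of the attached script: `summarize_log` is a hypothetical helper, and it assumes the log layout that net_tester.sh produces with a Linux ping (a `date` line, then a `PING ...` line, then an `N packets transmitted, M received` summary line).

```bash
#!/bin/bash
# Hypothetical helper, not part of net_tester.sh.
# Prints one line per change between reachable (UP) and unreachable (DOWN),
# prefixed with the timestamp that net_tester.sh logged for that probe.
summarize_log() {
    awk '
        # The line right before "PING ..." is assumed to be the output of date.
        /^PING /              { stamp = last }
        /packets transmitted/ {
            state = ($0 ~ /, 0 received/) ? "DOWN" : "UP"
            if (state != prev) print stamp ": " state
            prev = state
        }
        { last = $0 }
    ' "$1"
}

# Summarize the given log (default: out.log) if it is readable.
log=${1:-out.log}
if [ -r "$log" ]; then
    summarize_log "$log"
fi
```

Saved under a name of your choice and run with the log file as its only argument, it prints only the transitions, which makes it easier to line up an outage with the start of the browser.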
Attachments (4)
Change History (20)
comment:2 by , 10 years ago
As far as I remember, in my case I went as far back as 4.3.8 and reproduced similar behavior in each build. I was hoping it was a regression, but it does not seem to be. I will try the test build, and I will try again with 4.3.8 or 4.3.12, which seemed to be the most stable. I will also include a table with my findings.
comment:3 by , 10 years ago
Actually, since the naming is unfortunately confusing, do you use "NAT" or "NAT Network"?
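One way to answer this from the command line is to ask VirtualBox which attachment type the first adapter uses. This is a sketch under assumptions: a bash-compatible shell on the host (for example Git Bash or Cygwin on Windows) with VBoxManage on the PATH, and `nic_mode` is a hypothetical helper name. As far as I know, the machine-readable output reports `nat` for plain NAT and `natnetwork` for a NAT Network, but please verify that against your VirtualBox version.

```bash
#!/bin/bash
# Hypothetical helper: print the attachment type of adapter N (default 1).
# $1 = VM name or UUID, $2 = adapter number.
nic_mode() {
    VBoxManage showvminfo "$1" --machinereadable \
        | tr -d '\r' \
        | sed -n "s/^nic${2:-1}=\"\(.*\)\"\$/\1/p"
}
```

For example, `nic_mode "my-ubuntu-vm"` (the VM name is a placeholder) should print a single word describing how adapter 1 is attached.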
comment:5 by , 10 years ago
This is a table I assembled about a month ago. It lists my observations from running not Google Chrome but a build process that takes some 12 minutes when there are no network interruptions. I apologize for the reference to VMware, but oddly enough, I am able to run Ubuntu in VirtualBox 4.3.28 nested inside an Ubuntu VMware guest with no network problems. The build in that case takes 12 minutes, which is similar to what native VirtualBox takes when networking is stable.
Version | Summary | Comments |
---|---|---|
4.3.8 | Good for W81 | Network interruptions every two minutes, but they do not seem as disruptive as in other builds. |
4.3.12 | Barely OK for W81 | Build takes 17m22s with network interruptions of 10secs every minute or so. Recovery takes longer. |
4.3.18 | Good for W81 | Build takes 16 min with network interruptions of 10 s every 2 minutes, whereas the same build takes 11m50s in VirtualBox 4.3.28 inside a VMware VM and 16m in VMware. |
4.3.28 | Too fragile | Constant network interruptions |
comment:6 by , 10 years ago
Sorry for the naming confusion; I was trying to find the best way to describe it. :-)
follow-up: 10 comment:7 by , 10 years ago
When network interruption happens, is it just packet loss or do you lose physical (well, virtual :) link with DHCP renewal afterwards?
comment:8 by , 10 years ago
I tried 4.3.29.101039 on Windows 10 10130 and the behavior reproduces immediately. As soon as I have time, I will try the test build with Windows 8.1.
comment:9 by , 10 years ago
Can you please attach the VBox.log that corresponds to the run that experienced problems? If you can also provide both host-side and guest-side packet captures, that would also be useful. TIA.
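For anyone collecting the requested captures, the commands below are one possible way to do it and should be read as a sketch. The function names, the interface name `eth0`, and the file names are assumptions. The host-side part uses VirtualBox's built-in network tracing options for the first adapter and needs the VM to be powered off while the setting is changed; the guest-side part assumes tcpdump is installed in the guest.

```bash
#!/bin/bash
# Host side: have VirtualBox write adapter 1 traffic to a pcap file.
# $1 = VM name or UUID, $2 = path of the pcap file to create.
enable_host_trace() {
    VBoxManage modifyvm "$1" --nictrace1 on --nictracefile1 "$2"
}

# Host side: turn tracing off again once the capture is done.
disable_host_trace() {
    VBoxManage modifyvm "$1" --nictrace1 off
}

# Guest side: capture on the guest interface until interrupted with Ctrl-C.
# $1 = interface (default eth0), $2 = output file (default guest.pcap).
capture_guest() {
    sudo tcpdump -i "${1:-eth0}" -w "${2:-guest.pcap}"
}
```

A typical sequence would be to call `enable_host_trace` with the VM stopped, start the VM, run `capture_guest` inside it, reproduce the fault, and then collect both pcap files together with VBox.log, which is normally found in the Logs subfolder of the VM's directory.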
comment:10 by , 10 years ago
Replying to vushakov:
When network interruption happens, is it just packet loss or do you lose physical (well, virtual :) link with DHCP renewal afterwards?
I lose all connectivity. I have not sniffed the network to see whether there is DHCP renewal or not. I have wireshark and will check that.
follow-up: 12 comment:11 by , 10 years ago
Do you see e1000: eth0 NIC Link is Down in dmesg or in /var/log/syslog (and corresponding network manager messages)?
comment:12 by , 10 years ago
Replying to vushakov:
Do you see e1000: eth0 NIC Link is Down in dmesg or in /var/log/syslog (and corresponding network manager messages)?
I did not see NIC Link is down in dmesg. I only saw this:
[ 2.665021] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
As for /var/log/syslog, I am not sure what to look for. I attached it and also the host side.
I can create a fresh VM on Windows 8.1 and we start from there if you want, as I am on Windows 10 in this machine, which adds more variables. Your choice.
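As a rough guide to what to look for in the guest logs, the sketch below searches for link state messages. `find_link_events` is a hypothetical helper, and the patterns are assumptions based on the messages quoted in this ticket plus the usual kernel "link becomes ready" line; other drivers may word these differently.

```bash
#!/bin/bash
# Hypothetical helper: list link up/down events with line numbers.
# Reads the given files, or standard input when no file is given.
find_link_events() {
    grep -nE 'NIC Link is (Up|Down)|link is not ready|link becomes ready' "$@"
}
```

Possible usage in the guest: `dmesg | find_link_events` for the kernel ring buffer, or `find_link_events /var/log/syslog` for the persistent log. No output means none of these patterns occurred, which would be consistent with packet loss without a link drop.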
comment:13 by , 10 years ago
Can you please also attach the .vbox file of the VM? Your VBox.log has
NAT: Host Resolver conflicts with DNS proxy, the last one was forcely ignored
so you probably have them both turned on, and that's probably not intended. You don't want the host resolver unless you have a very special setup, which you most likely don't.
by , 10 years ago
Attachment: | vbox4.3.28_windows.log.7z added |
---|
Added file with log of connectivity which shows how network connectivity appears and disappears.
comment:14 by , 10 years ago
I think I use the host resolver because otherwise the intranet DNS, which comes from Windows Active Directory, is not properly propagated to the VMs. That means I can resolve public addresses, but NOT intranet addresses. When I set --natdnshostresolver1, internal name resolution works.
Could the conflict you mention be because yesterday, while trying to find a way to make things work, I enabled --natdnsproxy1 in addition to --natdnshostresolver1?
If that is the case, the net result is that it did not work. Currently both settings are commented out in the Vagrantfile, and the issue is still happening.
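To rule the DNS settings in or out, it can help to check the log for the conflict message and to set the two options explicitly instead of relying on whatever the Vagrantfile last applied. The following is only a sketch: both function names are hypothetical, it assumes a bash-compatible shell with VBoxManage on the PATH, and the VM has to be powered off when `modifyvm` runs.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the given VBox.log contains the conflict
# message quoted in this ticket.
has_resolver_conflict() {
    grep -q 'Host Resolver conflicts with DNS proxy' "$1"
}

# Hypothetical helper: enable at most one of the two NAT DNS options.
# $1 = VM name or UUID, $2 = hostresolver | proxy | none
set_nat_dns_mode() {
    local vm=$1 mode=$2
    case "$mode" in
        hostresolver) VBoxManage modifyvm "$vm" --natdnshostresolver1 on --natdnsproxy1 off ;;
        proxy)        VBoxManage modifyvm "$vm" --natdnshostresolver1 off --natdnsproxy1 on ;;
        none)         VBoxManage modifyvm "$vm" --natdnshostresolver1 off --natdnsproxy1 off ;;
        *)            echo "usage: set_nat_dns_mode VM hostresolver|proxy|none" >&2
                      return 2 ;;
    esac
}
```

Running the test once with `none` and once with `hostresolver`, and checking `has_resolver_conflict` on the resulting VBox.log each time, would show whether the network stalls depend on the DNS mode at all.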
comment:15 by , 10 years ago
The file size limit impedes uploading the captures. Do you have an email address I can send this to?
comment:16 by , 10 years ago
Packet captures compress very well. If they are not significantly larger than the limit, split(1) them. If they are significantly larger, I'm afraid they will be above the limits that the mail server accepts anyway.
Dropbox or some other cloud storage?
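The split(1) suggestion could look like the sketch below. The helper names and the default piece size of 1 MiB are assumptions, since the actual upload limit is not stated here; pass a different size in bytes as needed.

```bash
#!/bin/bash
# Hypothetical helper: cut a (compressed) capture into fixed-size pieces
# named <file>.part_aa, <file>.part_ab, ...
# $1 = file to split, $2 = piece size in bytes (default 1 MiB).
split_capture() {
    split -b "${2:-1048576}" "$1" "$1.part_"
}

# Hypothetical helper: reassemble the pieces on the receiving side.
# $1 = original file name used when splitting, $2 = output file.
join_capture() {
    cat "$1".part_* > "$2"
}
```

For example, after `gzip capture.pcap`, calling `split_capture capture.pcap.gz` produces the pieces to attach, and `join_capture capture.pcap.gz restored.pcap.gz` puts them back together, because the shell expands the piece names in sorted order.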
Probably a duplicate of #13987. UPD: misinterpreted "NAT networking".