Discussion:
IPv6 -> IPv4 fallback broken in serf, kernel bug?
Don Lewis
2016-07-26 15:59:15 UTC
Serf has some code to fall back from IPv6 to IPv4 and, more generally, to
try different addresses on multi-homed servers if connection attempts
fail, but it does not work properly on recent versions of FreeBSD. I've
tested both recent FreeBSD 10.3-STABLE and HEAD.

The way that it is supposed to work is that serf creates a socket, sets
it non-blocking, calls connect(), and then passes the fd to poll(). When
the connection attempt fails, it expects to see a POLLERR event. The
POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
a couple of other errors, then serf will move on to the next address.
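
The error-handling step described above can be sketched in C roughly as
follows. This is not serf's actual code: the helper names
pending_socket_error() and should_try_next_address() are hypothetical, and
the set of errors that triggers the fallback is an assumption, since only
ECONNREFUSED is named here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: fetch (and clear) the pending error on a socket.
 * Returns 0 if no error is pending, otherwise an errno value. */
static int pending_socket_error(int fd)
{
    int error = 0;
    socklen_t len = sizeof(error);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) == -1)
        return errno;               /* getsockopt() itself failed */
    assert(len == sizeof(error));
    return error;
}

/* Hypothetical policy: which connect failures justify moving on to the
 * next address.  Only ECONNREFUSED comes from the text; the other
 * values are assumptions for illustration. */
static bool should_try_next_address(int error)
{
    switch (error) {
    case ECONNREFUSED:
    case ENETUNREACH:
    case EHOSTUNREACH:
    case ETIMEDOUT:
        return true;
    default:
        return false;
    }
}
```

A POLLERR handler would call pending_socket_error(fd) and, if
should_try_next_address() accepts the result, close the socket and start a
new connect() on the next address.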

Instead what happens is that serf also(?) sees POLLIN set, which it
processes first by calling read(), which returns an ECONNREFUSED error.
That is not a documented error return from read().

An easy way to test this is to truss svn and attempt to do an http
checkout from a host that has both IPv6 and IPv4 addresses, but is not
listening on port 80. The only connection attempt will be to the IPv6
address.

socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80549c064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...


It looks like it should be possible to patch serf to handle this, but:
* Should POLLIN be set for this event?

* What errno value should read() return in this case? If it is
ECONNREFUSED, then that should be documented.
Karl Denninger
2016-07-26 16:06:23 UTC
Post by Don Lewis
Serf has some code to fall back from IPv6 to IPv4 and, more generally, to
try different addresses on multi-homed servers if connection attempts
fail, but it does not work properly on recent versions of FreeBSD. I've
tested both recent FreeBSD 10.3-STABLE and HEAD.
The way that it is supposed to work is that serf creates a socket, sets
it non-blocking, calls connect(), and then passes the fd to poll(). When
the connection attempt fails, it expects to see a POLLERR event. The
POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
a couple of other errors, then serf will move on to the next address.
Instead what happens is that serf also(?) sees POLLIN set, which it
processes first by calling read(), which returns an ECONNREFUSED error.
That is not a documented error return from read().
An easy way to test this is to truss svn and attempt to do an http
checkout from a host that has both IPv6 and IPv4 addresses, but is not
listening on port 80. The only connection attempt will be to the IPv6
address.
socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80549c064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...
* Should POLLIN be set for this event?
* What errno value should read() return in this case? If it is
ECONNREFUSED, then that should be documented.
This is kinda serious in that the above manifestation in svn effectively
disables it for those of us that are on IPv4 connections and have no
provider capability for IPv6 at the present time. When I was running
10.2 this was not a problem but as soon as I rolled forward to 11.x it
showed up.

Fortunately svnlite does work, but if this same breakage manages to
migrate there as well.......
--
Karl Denninger
***@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
Don Lewis
2016-07-26 18:19:43 UTC
Post by Don Lewis
* Should POLLIN be set for this event?
I don't think it should, but the standard doesn't cover this case. On a
successful non-blocking connect(), our man page says that select(2) will
indicate that the fd is writeable. The Open Group Base Specifications
Issue 7 says that pselect(), select(), and poll() shall indicate that
the socket is ready for writing. I haven't seen anything that says what
should be done if the connect fails.
Post by Don Lewis
* What errno value should read() return in this case? If it is
ECONNREFUSED, then that should be documented.
Our read(2) man page does not document that ENOTCONN can be returned,
though we explicitly return it and it is listed as valid by The Open
Group Base Specifications. It does not list the connect failure errno
values other than ETIMEDOUT as valid for read(). Though read() should
not be called before the connection is up, if it is I *think* these
errors should be mapped to ENOTCONN, but handling ETIMEDOUT is trickier.
If that error came from the connection attempt, then we would want to
return ENOTCONN, but if the connection came up and was later dropped due
to a timeout, then ETIMEDOUT should be returned.
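
The mapping suggested above can be written down as a small pure function.
This only illustrates the proposal and is not existing kernel behavior; the
function name and the was_connected flag (recording whether the socket ever
reached the connected state) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Sketch of the errno that read() would report for a pending socket
 * error under the proposal.  `was_connected` is a hypothetical flag
 * saying whether the connection was ever established. */
static int proposed_read_errno(int so_error, bool was_connected)
{
    assert(so_error >= 0);
    if (so_error == 0)
        return 0;                   /* nothing pending */
    if (was_connected)
        return so_error;            /* e.g. ETIMEDOUT on a dropped connection */
    /* The connection never came up, so every connect failure,
     * including ETIMEDOUT, is reported as ENOTCONN. */
    return ENOTCONN;
}
```

The flag is what resolves the ETIMEDOUT ambiguity: the same pending error
yields ENOTCONN before the connection is established and ETIMEDOUT after.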
Don Lewis
2016-07-26 22:49:01 UTC
Post by Karl Denninger
Post by Don Lewis
Serf has some code to fall back from IPv6 to IPv4 and, more generally, to
try different addresses on multi-homed servers if connection attempts
fail, but it does not work properly on recent versions of FreeBSD. I've
tested both recent FreeBSD 10.3-STABLE and HEAD.
The way that it is supposed to work is that serf creates a socket, sets
it non-blocking, calls connect(), and then passes the fd to poll(). When
the connection attempt fails, it expects to see a POLLERR event. The
POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
a couple of other errors, then serf will move on to the next address.
Instead what happens is that serf also(?) sees POLLIN set, which it
processes first by calling read(), which returns an ECONNREFUSED error.
That is not a documented error return from read().
An easy way to test this is to truss svn and attempt to do an http
checkout from a host that has both IPv6 and IPv4 addresses, but is not
listening on port 80. The only connection attempt will be to the IPv6
address.
socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80549c064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...
* Should POLLIN be set for this event?
* What errno value should read() return in this case? If it is
ECONNREFUSED, then that should be documented.
This is kinda serious in that the above manifestation in svn effectively
disables it for those of us that are on IPv4 connections and have no
provider capability for IPv6 at the present time. When I was running
10.2 this was not a problem but as soon as I rolled forward to 11.x it
showed up.
I saw it on 10.3-STABLE, but I don't see any changes in the kernel
source between the stable/10 branch point and the tip of that branch
that look suspicious. I'll try to find some time to write a simple test
case and run it on some older releases as well as on Linux.
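
A test case along those lines might look like the sketch below. It is an
assumption-laden illustration, not the test that was actually run: it uses
the IPv4 loopback address instead of a remote IPv6 host, it provokes the
failure by connecting to a port that is bound but not listening, and the
function name refused_connect_probe() is made up.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Attempt a non-blocking connect to a loopback port that is bound but not
 * listening.  Stores the poll() revents seen for the attempt in *revents
 * (0 if connect() failed immediately) and returns the resulting errno. */
static int refused_connect_probe(short *revents)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    struct pollfd pfd;
    int placeholder, fd, error = 0;

    assert(revents != NULL);
    *revents = 0;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                   /* let the kernel pick a port */

    /* Reserve a port without ever calling listen() on it. */
    placeholder = socket(AF_INET, SOCK_STREAM, 0);
    if (placeholder == -1)
        return errno;
    if (bind(placeholder, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        getsockname(placeholder, (struct sockaddr *)&sin, &len) == -1) {
        error = errno;
        close(placeholder);
        return error;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        error = errno;
        close(placeholder);
        return error;
    }
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

    if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
        if (errno == EINPROGRESS) {
            pfd.fd = fd;
            pfd.events = POLLIN | POLLOUT;
            pfd.revents = 0;
            if (poll(&pfd, 1, 5000) == 1)
                *revents = pfd.revents;
            len = sizeof(error);
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len);
        } else {
            error = errno;              /* failed before poll() was needed */
        }
    }
    close(fd);
    close(placeholder);
    return error;
}
```

Calling this from a small main() and printing which of POLLIN, POLLOUT,
POLLERR and POLLHUP are set in revents would allow a direct comparison
between FreeBSD releases and Linux.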

It looks to me like soisdisconnected() should not do a read wakeup if
the socket was never in a connected state. I think it should also set a
new flag to indicate whether or not the socket was previously connected
so that read() and write() can return the proper errno value if the
socket was never connected.
Post by Karl Denninger
Fortunately svnlite does work, but if this same breakage manages to
migrate there as well.......
I'm surprised that svnlite is working for you. The truss output looks
the same to me as svn and the serf fallback code is the same.

socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffddb4,0x4) = 0 (0x0)
gettimeofday({ 1469572654.492874 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469572654.493011 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x802898300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x802898300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x802898300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x802898300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80289d064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...

The host I pointed svnlite at has both IPv4 and IPv6 addresses in DNS,
but it is only listening to IPv4 on port 80.


A lack of connectivity that results in the IPv6 connection requests
getting dropped into a black hole might behave differently. I'm not
sure that serf/apr wait for the ETIMEDOUT error to occur and may bail
out early. In that case they won't see the POLLIN event and won't take
the wrong code path that bypasses the fallback.
Bruce Evans
2016-07-26 22:57:00 UTC
Post by Don Lewis
Serf has some code to fall back from IPv6 to IPv4 and, more generally, to
try different addresses on multi-homed servers if connection attempts
fail, but it does not work properly on recent versions of FreeBSD. I've
tested both recent FreeBSD 10.3-STABLE and HEAD.
The way that it is supposed to work is that serf creates a socket, sets
it non-blocking, calls connect(), and then passes the fd to poll(). When
the connection attempt fails, it expects to see a POLLERR event. The
POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
a couple of other errors, then serf will move on to the next address.
Instead what happens is that serf also(?) sees POLLIN set, which it
processes first by calling read(), which returns an ECONNREFUSED error.
That is not a documented error return from read().
FreeBSD still bogusly returns POLLIN (and POLLRDNORM) together with
POLLHUP at EOF when there is no data (both set should mean both), and
still has the bogus POLLINIGNEOF, but it almost never returns POLLERR.
My regression tests in tools/regression/poll check for not having this
bug.

The only setting of POLLERR in kern is in kqueue_poll() for errors in
initialization, and this doesn't set the other flags.

The only uses of POLLERR in kern are:
- in select(), to turn POLLERR into "set" for any backend that sets it
(and there seems to be only 1 backend that sets it)
- in vop_stdpoll() and poll_no_poll(), there is inconsistent bogus masking
using POLLSTANDARD to obfuscate that standard flags which must be
ignored are _not_ masked.

So I don't see how you can get POLLIN with POLLERR.
Post by Don Lewis
An easy way to test this is to truss svn and attempt to do an http
checkout from a host that has both IPv6 and IPv4 addresses, but is not
listening on port 80. The only connection attempt will be to the IPv6
address.
socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
I don't see any POLL* there or completely understand the notation or kqueue,
but this looks like the poll() bug with POLLIN together with POLLHUP, not
POLLIN together with POLLERR.

Everything here seems to be correct. Not very good, but good enough here.

EV_EOF is set by filt_soread() when SBS_CANTRECVMORE is set.
SBS_CANTRECVMORE means hangup, not EOF, and I think there can be
readable data from a socket in general but not after a connection
error. So this translation is incorrect in general but correct after
a connection error. kqueue just can't represent hangup and conflates
it with EOF.

When filt_soread() sets EV_EOF, it doesn't clear other flags, so
NOTE_LOWAT remains set. This happens to be correct. But since NOTE_LOWAT
really means low water, you can't use it to determine if (non-null) data
can be read. (POSIX is unclear about whether the "data" for select() and
poll() is actual data or just EOF.)

poll() has almost the opposite problems. It can represent hangup but
can't represent EOF. It can represent no data, but this doesn't mean
EOF when the file is open. It can't represent low-water.
so_poll_generic() starts carefully by setting POLLIN iff soreadable().
soreadable() is true above the watermark. So POLLIN for a socket
normally means that (non-null) data above the watermark can be read
(without blocking because it is above the watermark). This is correct
semantics. But then so_poll_generic() sets POLLIN if it sets POLLHUP.
This makes POLLIN worse than useless. A naive reader won't look at
POLLHUP, but will trust POLLIN and spin reading at EOF. A non-naive
reader will see POLLHUP but can't trust POLLIN then. It must spin
reading until read returns EOF, and poll() is useless for avoiding
this busy-waiting. Turning off O_NONBLOCK to avoid spinning is unsafe
if the EOF is not sticky.
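
The "non-naive reader" described above could be sketched in C as follows.
The enum and the function name classify_revents() are hypothetical, and the
ordering of the checks is one possible policy, not something any library
prescribes.

```c
#include <assert.h>
#include <poll.h>

enum read_action { WAIT_MORE, READ_DATA, READ_UNTIL_EOF, CHECK_SO_ERROR };

/* Hypothetical decision helper for a poll() caller that does not trust
 * POLLIN when it arrives together with POLLHUP or POLLERR. */
static enum read_action classify_revents(short revents)
{
    assert((revents & POLLNVAL) == 0);  /* caller must pass an open fd */
    if (revents & POLLERR)
        return CHECK_SO_ERROR;          /* fetch the error via SO_ERROR */
    if (revents & POLLHUP)
        return READ_UNTIL_EOF;          /* POLLIN says nothing reliable here */
    if (revents & POLLIN)
        return READ_DATA;               /* data above the watermark */
    return WAIT_MORE;
}
```

Checking POLLERR and POLLHUP before POLLIN is the point of the sketch: a
reader that tests POLLIN first goes straight to read(), which is the path
serf took in the trace.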

Just having watermarks further complicates the idea of what "data" is.
Null data is a special case of data that is too small to be worth
reading. It corresponds to a low watermark of 0 or 1. With watermarks,
non-null data below low water should be considered as not being there
for the purposes of select() and poll(), but there if you try to read
it. POSIX is unclear about this too. kqueue has the opposite problem.
It handles watermarks directly, but seems to be missing support for
transient EOF.

This causes problems for tty devices too. In Net/2, select() basically
uses a hard-coded watermark of 1, and this doesn't even work to give
tinygrams because read() blocks after select() returns "set" for certain
MIN/TIME combinations where the watermark should be MIN. This was fixed
in FreeBSD-1, basically by copying the socket code. This was broken in
4.4BSD. This was broken in FreeBSD-2.early by copying 4.4BSD. This was
fixed in FreeBSD-2 by restoring fixes. The fixes were refined in
FreeBSD-[2-7]. All of the fixes were lost in FreeBSD-8. Most of the
fixes are restored in my version.
Post by Don Lewis
read(4,0x80549c064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...
* Should POLLIN be set for this event?
I think there never was any data, so no for poll(). kqueue just cannot
represent the no-data condition.
Post by Don Lewis
* What errno value should read() return in this case? If it is
ECONNREFUSED, then that should be documented.
Don't know.

Bruce
Don Lewis
2016-07-26 23:40:54 UTC
Post by Bruce Evans
Post by Don Lewis
Serf has some code to fall back from IPv6 to IPv4 and, more generally, to
try different addresses on multi-homed servers if connection attempts
fail, but it does not work properly on recent versions of FreeBSD. I've
tested both recent FreeBSD 10.3-STABLE and HEAD.
The way that it is supposed to work is that serf creates a socket, sets
it non-blocking, calls connect(), and then passes the fd to poll(). When
the connection attempt fails, it expects to see a POLLERR event. The
POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
a couple of other errors, then serf will move on to the next address.
Instead what happens is that serf also(?) sees POLLIN set, which it
processes first by calling read(), which returns an ECONNREFUSED error.
That is not a documented error return from read().
FreeBSD still bogusly returns POLLIN (and POLLRDNORM) together with
POLLHUP at EOF when there is no data (both set should mean both), and
still has the bogus POLLINIGNEOF, but it almost never returns POLLERR.
My regression tests in tools/regression/poll check for not having this
bug.
The only setting of POLLERR in kern is in kqueue_poll() for errors in
initialization, and this doesn't set the other flags.
- in select(), to turn POLLERR into "set" for any backend that sets it
(and there seems to be only 1 backend that sets it)
- in vop_stdpoll() and poll_no_poll(), there is inconsistent bogus masking
using POLLSTANDARD to obfuscate that standard flags which must be
ignored are _not_ masked.
So I don't see how you can get POLLIN with POLLERR.
Post by Don Lewis
An easy way to test this is to truss svn and attempt to do an http
checkout from a host that has both IPv6 and IPv4 addresses, but is not
listening on port 80. The only connection attempt will be to the IPv6
address.
socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
I don't see any POLL* there or completely understand the notation or kqueue,
but this looks like the poll() bug with POLLIN together with POLLHUP, not
POLLIN together with POLLERR.
I didn't try to decipher the kqueue stuff. I was thinking that our
poll() was using kqueue under the hood, but it turns out that the poll
emulation is actually being done by apr. Sigh ...

A comment in the emulation code says:

/* APR_POLLPRI, APR_POLLERR, and APR_POLLNVAL are not handled by this
* implementation.

... double sigh.
Post by Bruce Evans
Everything here seems to be correct. Not very good, but good enough here.
EV_EOF is set by filt_soread() when SBS_CANTRECVMORE is set.
SBS_CANTRECVMORE means hangup, not EOF, and I think there can be
readable data from a socket in general but not after a connection
error. So this translation is incorrect in general but correct after
a connection error. kqueue just can't represent hangup and conflates
it with EOF.
But should there be a hangup or EOF if we never got connected in the
first place?
Post by Bruce Evans
When filt_soread() sets EV_EOF, it doesn't clear other flags, so
NOTE_LOWAT remains set. This happens to be correct. But since NOTE_LOWAT
really means low water, you can't use it to determine if (non-null) data
can be read. (POSIX is unclear about whether the "data" for select() and
poll() is actual data or just EOF.)
poll() has almost the opposite problems. It can represent hangup but
can't represent EOF. It can represent no data, but this doesn't mean
EOF when the file is open. It can't represent low-water.
so_poll_generic() starts carefully by setting POLLIN iff soreadable().
soreadable() is true above the watermark. So POLLIN for a socket
normally means that (non-null) data above the watermark can be read
(without blocking because it is above the watermark). This is correct
semantics. But then so_poll_generic() sets POLLIN if it sets POLLHUP.
This makes POLLIN worse than useless. A naive reader won't look at
POLLHUP, but will trust POLLIN and spin reading at EOF. A non-naive
reader will see POLLHUP but can't trust POLLIN then. It must spin
reading until read returns EOF, and poll() is useless for avoiding
this busy-waiting. Turning off O_NONBLOCK to avoid spinning is unsafe
if the EOF is not sticky.
Just having watermarks further complicates the idea of what "data" is.
Null data is a special case of data that is too small to be worth
reading. It corresponds to a low watermark of 0 or 1. With watermarks,
non-null data below low water should be considered as not being there
for the purposes of select() and poll(), but there if you try to read
it. POSIX is unclear about this too. kqueue has the opposite problem.
It handles watermarks directly, but seems to be missing support for
transient EOF.
This causes problems for tty devices too. In Net/2, select() basically
uses a hard-coded watermark of 1, and this doesn't even work to give
tinygrams because read() blocks after select() returns "set" for certain
MIN/TIME combinations where the watermark should be MIN. This was fixed
in FreeBSD-1, basically by copying the socket code. This was broken in
4.4BSD. This was broken in FreeBSD-2.early by copying 4.4BSD. This was
fixed in FreeBSD-2 by restoring fixes. The fixes were refined in
FreeBSD-[2-7]. All of the fixes were lost in FreeBSD-8. Most of the
fixes are restored in my version.
Post by Don Lewis
read(4,0x80549c064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...
* Should POLLIN be set for this event?
I think there never was any data, so no for poll(). kqueue just cannot
represent the no-data condition.
Post by Don Lewis
* What errno value should read() return in this case? If it is
ECONNREFUSED, then that should be documented.
Don't know.
Bruce
Don Lewis
2016-07-27 01:24:38 UTC
After giving this some more thought, I believe that the read and write
wakeups are correct when the connection attempt fails. I also think
that read() should return ENOTCONN if the socket never got to the
connected state.

I'm not sure how write() should behave. The Open Group Base
Specifications Issue 7 says:

[ECONNRESET]
A write was attempted on a socket that is not connected.

[EPIPE]
A write was attempted on a socket that is shut down for writing, or
is no longer connected. In the latter case, if the socket is of type
SOCK_STREAM, a SIGPIPE signal shall also be sent to the thread.

whereas our man page only mentions EPIPE.

I think poll() should set POLLERR and not POLLIN or POLLOUT if the
connection attempt fails.

I think kqueue is fine, but the poll() emulation in apr should map the
connection failure into POLLERR.
Bruce Evans
2016-07-27 06:16:43 UTC
Post by Don Lewis
Post by Bruce Evans
Post by Don Lewis
...
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
I don't see any POLL* there or completely understand the notation or kqueue,
but this looks like the poll() bug with POLLIN together with POLLHUP, not
POLLIN together with POLLERR.
I didn't try to decipher the kqueue stuff. I was thinking that our
poll() was using kqueue under the hood, but it turns out that the poll
emulation is actually being done by apr. Sigh ...
/* APR_POLLPRI, APR_POLLERR, and APR_POLLNVAL are not handled by this
* implementation.
... double sigh.
Post by Bruce Evans
Everything here seems to be correct. Not very good, but good enough here.
EV_EOF is set by filt_soread() when SBS_CANTRECVMORE is set.
SBS_CANTRECVMORE means hangup, not EOF, and I think there can be
readable data from a socket in general but not after a connection
error. So this translation is incorrect in general but correct after
a connection error. kqueue just can't represent hangup and conflates
it with EOF.
But should there be a hangup or EOF if we never got connected in the
first place?
I think hangup is correct. Named pipes have this problem and more. The
connection may be re-opened, so hangup should not be sticky. Except,
for some uses it should be sticky. The initial state when there is
no writer and no data is like a non-sticky hangup, and I think POLLHUP
should be returned for both. I think this is what the old fifofs
implementation did (it set SBS_CANT* initially and sopoll() should turn
this into POLLHUP). However, this is not quite right since it leaves
no good way to wait for a writer. select() and poll() are useless since
they are specified to return immediately in the hangup state. There is
no way to get back to a blocking open() with an open fd. You have to
use a new blocking open(). But a new open might have side effects, and
it would often have to be in a separate thread, and with threads you could
do almost everything using blocking threads to do the i/o and waiting
in these threads instead of select() or poll().
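
The immediate return in the hangup state can be observed with a short C
sketch. As a simplification it uses an anonymous pipe whose write end has
been closed instead of a named pipe that never had a writer, and a finite
timeout as a safety net; the function name is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Create a pipe, close its write end, and poll the read end.
 * Returns the revents reported, or -1 on failure or timeout. */
static int poll_pipe_without_writer(void)
{
    int fds[2];
    struct pollfd pfd;
    int n;

    if (pipe(fds) == -1)
        return -1;
    close(fds[1]);                  /* no writer remains, no data queued */

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, 1000);        /* should not need the full second */
    assert(n <= 1);                 /* only one descriptor was passed */
    close(fds[0]);
    return n == 1 ? pfd.revents : -1;
}
```

Whether POLLIN is reported alongside POLLHUP here varies between systems.
Either way, poll() cannot be used to wait for a new writer once the hangup
state is reported.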

Emulation gives another problem. It was difficult to emulate named
pipes on top of sockets in old fifofs even with full access to kernel
state and kernel events. The socket layer might be missing some state,
or events for that state changing. It was missing reporting of POLLHUP as late
as FreeBSD-4. This is difficult to fix in an emulator, and fifofs in
FreeBSD-4 didn't try. POLLHUP was just unsupported for most file types
in FreeBSD-4.

Bruce
Don Lewis
2016-07-27 07:42:46 UTC
Post by Karl Denninger
Post by Don Lewis
Serf has some code to fall back from IPv6 to IPv4 and, more generally, to
try different addresses on multi-homed servers if connection attempts
fail, but it does not work properly on recent versions of FreeBSD. I've
tested both recent FreeBSD 10.3-STABLE and HEAD.
The way that it is supposed to work is that serf creates a socket, sets
it non-blocking, calls connect(), and then passes the fd to poll(). When
the connection attempt fails, it expects to see a POLLERR event. The
POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
a couple of other errors, then serf will move on to the next address.
Instead what happens is that serf also(?) sees POLLIN set, which it
processes first by calling read(), which returns an ECONNREFUSED error.
That is not a documented error return from read().
An easy way to test this is to truss svn and attempt to do an http
checkout from a host that has both IPv6 and IPv4 addresses, but is not
listening on port 80. The only connection attempt will be to the IPv6
address.
socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80549c064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...
* Should POLLIN be set for this event?
* What errno value should read() return in this case? If it is
ECONNREFUSED, then that should be documented.
This is kinda serious in that the above manifestation in svn effectively
disables it for those of us that are on IPv4 connections and have no
provider capability for IPv6 at the present time. When I was running
10.2 this was not a problem but as soon as I rolled forward to 11.x it
showed up.
Try the following apr patch. It works for me with svn, but I'm getting
a crash in another application that uses apr.

--- apr-1.5.2/poll/unix/kqueue.c.orig 2015-03-20 01:34:07 UTC
+++ apr-1.5.2/poll/unix/kqueue.c
@@ -25,21 +25,40 @@

 #ifdef HAVE_KQUEUE

-static apr_int16_t get_kqueue_revent(apr_int16_t event, apr_int16_t flags)
+static apr_int16_t get_kqueue_revent(apr_int16_t event, apr_int16_t flags,
+                                     int fflags, intptr_t data)
 {
     apr_int16_t rv = 0;

-    if (event == EVFILT_READ)
-        rv |= APR_POLLIN;
-    else if (event == EVFILT_WRITE)
-        rv |= APR_POLLOUT;
-    if (flags & EV_EOF)
-        rv |= APR_POLLHUP;
-    /* APR_POLLPRI, APR_POLLERR, and APR_POLLNVAL are not handled by this
-     * implementation.
+    /* APR_POLLPRI and APR_POLLNVAL are not handled by this implementation.
      * TODO: See if EV_ERROR + certain system errors in the returned data field
      * should map to APR_POLLNVAL.
      */
+    if (event == EVFILT_READ) {
+        if (data > 0 || fflags == 0)
+            rv |= APR_POLLIN;
+        else
+            rv |= APR_POLLERR;
+        /*
+         * Don't return POLLHUP if connect fails. Apparently Linux
+         * does not, and this is expected by serf in order for IPv6 to
+         * IPv4 or multihomed host fallback to work.
+         *
+         * ETIMEDOUT is ambiguous here since we don't know if a
+         * connection was established. We don't want to return
+         * POLLHUP here if the connection attempt timed out, but
+         * we do if the connection was successful but later dropped.
+         * For now, favor the latter.
+         */
+        if ((flags & EV_EOF) != 0 && fflags != ECONNREFUSED &&
+            fflags != ENETUNREACH && fflags != EHOSTUNREACH)
+            rv |= APR_POLLHUP;
+    } else if (event == EVFILT_WRITE) {
+        if (data > 0 || fflags == 0)
+            rv |= APR_POLLOUT;
+        else
+            rv |= APR_POLLERR;
+    }
     return rv;
 }

@@ -290,7 +309,9 @@ static apr_status_t impl_pollset_poll(ap
                 pollset->p->result_set[j] = fd;
                 pollset->p->result_set[j].rtnevents =
                         get_kqueue_revent(pollset->p->ke_set[i].filter,
-                                          pollset->p->ke_set[i].flags);
+                                          pollset->p->ke_set[i].flags,
+                                          pollset->p->ke_set[i].fflags,
+                                          pollset->p->ke_set[i].data);
                 j++;
             }
         }
@@ -471,7 +492,9 @@ static apr_status_t impl_pollcb_poll(apr
             apr_pollfd_t *pollfd = (apr_pollfd_t *)(pollcb->pollset.ke[i].udata);

             pollfd->rtnevents = get_kqueue_revent(pollcb->pollset.ke[i].filter,
-                                                  pollcb->pollset.ke[i].flags);
+                                                  pollcb->pollset.ke[i].flags,
+                                                  pollcb->pollset.ke[i].fflags,
+                                                  pollcb->pollset.ke[i].data);

             rv = func(baton, pollfd);
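
To see what the patched mapping does with the two events in the truss output, the decision logic can be pulled out into a standalone function and fed the traced values. This is only a sketch under stated assumptions: map_kevent() and the K_* names are made up for illustration, the K_* values are hard-coded copies of the FreeBSD constants so that it builds without <sys/event.h>, and poll(2) flags stand in for the APR_POLL* constants. Note that truss printed fflags as NOTE_LOWAT|0x3c, which is numerically 0x1|0x3c = 61, the same ERR#61 (ECONNREFUSED) that read() returned. That is consistent with kqueue(2), which documents that the socket error is returned in fflags along with EV_EOF.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>

/* Stand-ins for FreeBSD's <sys/event.h> and <errno.h> values (assumed here,
 * hard-coded so the sketch does not depend on the host's own numbering). */
enum {
    K_EVFILT_READ  = -1,
    K_EVFILT_WRITE = -2,
    K_EV_EOF       = 0x8000,
    K_ENETUNREACH  = 51,
    K_ECONNREFUSED = 61,
    K_EHOSTUNREACH = 65
};

/* Same decision logic as the patched get_kqueue_revent(), with poll(2)
 * flags standing in for APR_POLLIN, APR_POLLOUT, APR_POLLERR, APR_POLLHUP. */
static short map_kevent(int filter, int flags, int fflags, intptr_t data)
{
    short rv = 0;

    if (filter == K_EVFILT_READ) {
        if (data > 0 || fflags == 0)
            rv |= POLLIN;
        else
            rv |= POLLERR;
        /* No POLLHUP when the connection attempt itself failed. */
        if ((flags & K_EV_EOF) != 0 && fflags != K_ECONNREFUSED &&
            fflags != K_ENETUNREACH && fflags != K_EHOSTUNREACH)
            rv |= POLLHUP;
    } else if (filter == K_EVFILT_WRITE) {
        if (data > 0 || fflags == 0)
            rv |= POLLOUT;
        else
            rv |= POLLERR;
    }
    return rv;
}

/* Replays the two events from the truss output through the mapping. */
static void replay_trace(void)
{
    /* EVFILT_READ, EV_EOF, fflags 61, data 0x0: error only, no POLLIN/POLLHUP */
    assert(map_kevent(K_EVFILT_READ, K_EV_EOF, 0x1 | 0x3c, 0x0) == POLLERR);
    /* EVFILT_WRITE, EV_EOF, fflags 61, data 0x8000: data > 0 wins, so POLLOUT */
    assert(map_kevent(K_EVFILT_WRITE, K_EV_EOF, 0x1 | 0x3c, 0x8000) == POLLOUT);
}
```

With the traced values, the read event maps to the error flag alone, which is what serf's fallback is said to need. The write event still maps to POLLOUT, because its data field (0x8000) is greater than zero, so under this logic the error surfaces only through the read filter.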
Don Lewis
2016-07-28 22:15:45 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211430
