[Zope] Re: Running more than one instance on windows often block each other

Tim Peters tim.peters at gmail.com
Thu Jul 28 16:43:11 EDT 2005


[Sune B. Woeller]
> ...
> This is what I'm experiencing as well.
> I can narrow it down a bit: I *always* experience one out of two
> erroneous behaviours, as described below.

I see only one of the behaviors below (the second -- no problems), and
don't agree it's in error.

> I tried to make an even simpler test situation, without binding
> sockets 'r' and 'w' to each other in the same process. I try to
> reproduce the problem in a 'standard' socket use case, where a client
> in one process binds to a server in another process.
> 
> The following two scripts act as a server and a client.
> 
> #***********************
> # sock_server_reader.py
> #***********************
> import socket
>
> a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

Note that

    a = socket.socket()

is an easier way to spell the same thing; the Medusa code is ancient.

> a.bind(("127.0.0.1", 19999))
> print a.getsockname()  # assigned (host, port) pair
>
> a.listen(1)
>
> print "a accepting:"
> r, addr = a.accept()  # r becomes asyncore's (self.)socket
> print "a accepted: "
> print ' ' + str(r.getsockname()) + ', peer=' + str(r.getpeername())
> 
> a.close()

Key point:  no socket is _listening_ on address ("127.0.0.1", 19999)
after this close().  From what comes later, I guess you believe that
no socket should be allowed to listen on that address again until all
connections made with that `a` also close, but I don't think you'll
find anything in socket documentation to support that belief.  In the
world of socket connections, what needs to be unique is _the
connection_, and that's a 4-tuple:

    (side 1 host, side 1 port, side 2 host, side 2 port)

There's no prohibition against seeing either side's address in any
number of connections simultaneously, you just can't have two
connections simultaneously that match in all 4 positions.  It so
happens that Windows is happy to allow another socket to bind to a
port the instant after a socket that had been listening on it closes
(and regardless of whether connections made via the latter are still
open), but I don't believe that's a bug.
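To make the 4-tuple point concrete, here is a minimal sketch in Python 3 (the scripts quoted in this thread are Python 2).  It assumes the loopback interface and lets the OS pick a port instead of using 19999; the function name is invented for illustration.  It shows only that an established connection keeps working after its listening socket closes.  Whether a new socket may then bind the same port is the platform-dependent part.

```python
import socket

def connection_survives_listener_close():
    # Listening socket; port 0 asks the OS for any free port.
    a = socket.socket()
    a.bind(("127.0.0.1", 0))
    a.listen(1)
    addr = a.getsockname()

    w = socket.socket()
    w.connect(addr)
    r, _peer = a.accept()

    # After this close, nothing is listening on addr any more ...
    a.close()

    # ... but the already-established connection is unaffected.
    w.sendall(b"still here")
    msg = r.recv(100)

    # The connection is identified by all four values together:
    # (side 1 host, side 1 port, side 2 host, side 2 port)
    four_tuple = r.getsockname() + r.getpeername()
    r.close()
    w.close()
    return msg, four_tuple, addr

if __name__ == "__main__":
    msg, four_tuple, addr = connection_survives_listener_close()
    print("received:", msg)
    print("connection 4-tuple:", four_tuple)
```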

What I appear to be seeing is that sometimes -- rarely -- Windows allows
binding to a port by two sockets simultaneously, not serially as
you're showing here.  Simultaneous binding (in the absence of
SO_REUSEADDR on Windows) is a bug.
  
> msg = r.recv(100)
> print 'msg recieved:', msg
>
>
> #***********************
> # sock_client_writer.py
> #***********************
> import socket, random
> 
> w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
> w.setsockopt(socket.IPPROTO_TCP, 1, 1)

> print 'w connecting:'
> w.connect(('127.0.0.1', 19999))
> print 'w connected:'
> print w.getsockname()
> print ' ' + str(w.getsockname()) + ', peer=' + str(w.getpeername())
> msg = str(random.randrange(1000000))
> print 'sending msg: ', msg
> w.send(msg)
>
> There are two possible outcomes [a) and b)] of running two instances
> of this client/server pair (that is, 4 processes in total like the
> following).
> (Numbers 1 to 4 are steps executed in chronological order.)
>
> 1) python -i sock_server_reader.py

So -i keeps the connection open -- these programs never "finish".

> The server prints:
>     ('127.0.0.1', 19999)
>     a accepting:
> and waits for a connection
>
> 2) python -i sock_client_writer.py
> The client prints:
>     w connecting:
>     w connected:
>     ('127.0.0.1', 3774)
>      ('127.0.0.1', 3774), peer=('127.0.0.1', 19999)
>     sending msg:  903848
>     >>>
>
> and the server now accepts the connection and prints:
>     a accepted:
>      ('127.0.0.1', 19999), peer=('127.0.0.1', 3774)
>     msg recieved: 903848
>     >>>
>
> This is like it should be.

Agreed so far <wink>.

> Then let's try to set up a second
> client/server pair, on the same port (19999). The expected outcome of
> this is that the bind() call in sock_server_reader.py should fail with
> socket.error: (10048, 'Address already in use').

Sorry, I don't expect that.  sock_server_reader is no longer listening
on port 19999, so there's no reason some other socket can't start
listening on it.

> 3) python -i sock_server_reader.py
> The server prints:
>     ('127.0.0.1', 19999)
>     a accepting:
> 
> Already here the problem occurs, bind() is allowed to bind to a port
> that is in use, in this case by the client socket 'r'.
> [also on other windows ? Mikkel: yes. Diku:???]

I showed an example before of how you can get any number (well, up to
64K) of sockets simultaneously alive saying they're bound to the same
address, on Windows or Linux.  The socket returned by a.accept()
always duplicates a's (hostname, port) address.  That's so that if the
peer asks for its peer, it gets back the address it originally
connected to.  It may be confusing, but that's how it works.
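A short Python 3 sketch of that behavior, assuming the loopback interface and an OS-chosen port (the function name is made up for this example): every socket returned by accept() reports the listener's own address, while the full connection 4-tuples stay distinct because each client gets its own port.

```python
import socket

def accepted_addresses(n=3):
    a = socket.socket()
    a.bind(("127.0.0.1", 0))   # port 0: let the OS choose
    a.listen(n)
    addr = a.getsockname()

    clients, accepted = [], []
    for _ in range(n):
        w = socket.socket()
        w.connect(addr)
        r, _peer = a.accept()
        clients.append(w)
        accepted.append(r)

    # Local address of each accepted socket: same as the listener's.
    local = [r.getsockname() for r in accepted]
    # Full 4-tuple of each connection: these all differ.
    tuples = [r.getsockname() + r.getpeername() for r in accepted]

    for s in clients + accepted + [a]:
        s.close()
    return addr, local, tuples

if __name__ == "__main__":
    addr, local, tuples = accepted_addresses()
    print("listener:", addr)
    for t in tuples:
        print("connection:", t)
```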

Windows and Linux seem to differ in how willing they are to reuse a
port after a listening socket is closed, but dollars to doughnuts says
Microsoft wouldn't accept a claim that their behavior is "a bug".

> 4) python -i sock_client_writer.py
> Now one out of two things happen:
> 
> a) The client prints:
>     w connecting:
>     Traceback (most recent call last):
>       File "c:\pyscripts\sock_client_writer.py", line 7, in ?
>         w.connect(('127.0.0.1', 19999))
>       File "<string>", line 1, in connect
>     socket.error: (10061, 'Connection refused')
>     >>>

How often do you see this?  I haven't seen it yet, but I can't make
hours today to do this by hand.

>    The server waits on the call to accept(), still waiting for a
> connection. (This is the blocking behaviour I reported in my first
> mail, experienced when running two zope instances. The socket error
> was swallowed by the unconditional except clause).

The real reason (and well-hidden it is) the Medusa code puts its
connect() call in try/except is because the Medusa code (but not your
code here) set w to non-blocking mode before the connect, and
w.connect() on a non-blocking socket is always exceptional ("in
progress" on Linux, "would block" on Windows).  I have no idea why the
Medusa code set w to be non-blocking to begin with, and although I
haven't mentioned it here before, I saw all the same symptoms when I
removed the non-blocking convolutions.
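For reference, a hedged Python 3 sketch of a non-blocking connect that catches only the expected "not finished yet" exception rather than everything.  This is not the Medusa code; the function name is invented, and it assumes a loopback listener on an OS-chosen port.  Python 3 reports the pending state as BlockingIOError.  The sketch also tolerates the connect completing immediately, since that is not ruled out on every platform.

```python
import select
import socket

def nonblocking_connect():
    a = socket.socket()
    a.bind(("127.0.0.1", 0))
    a.listen(1)

    w = socket.socket()
    w.setblocking(False)
    try:
        w.connect(a.getsockname())
        outcome = "completed immediately"
    except BlockingIOError as exc:
        # "in progress" on Linux, "would block" on Windows
        outcome = "pending (errno %s)" % exc.errno

    # Wait until the socket is writable, then look for a deferred error.
    _, writable, failed = select.select([], [w], [w], 5.0)
    err = w.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    connected = bool(writable) and not failed and err == 0

    if connected:
        r, _peer = a.accept()
        r.close()
    w.close()
    a.close()
    return outcome, connected

if __name__ == "__main__":
    print(nonblocking_connect())
```

Catching only BlockingIOError means a genuine failure such as 'Connection refused' still propagates instead of being swallowed.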

> b) The client connects to the server:
>     w connecting:
>     w connected:
>     ('127.0.0.1', 3865)
>      ('127.0.0.1', 3865), peer=('127.0.0.1', 19999)
>     sending msg:  119105
>     >>>
>
> and the server now accepts the connection and prints:
>     a accepted:
>     ('127.0.0.1', 19999), peer=('127.0.0.1', 3865)
>     msg recieved: 119105
>     >>>

This is the outcome I've seen every time I've tried it by hand -- no problems.

> The second set of client/server processes are now connected on the
> same port as the first set of client/server processes.

You can get to a similar end more easily by having a server socket
accept more than one connection -- all accept()'ed connections have
the same socket address.  The connection 4-tuples all differ, though,
and that's what matters.

> In a port scanner the port now belongs to the second server process [3)].
>
>
> I always get one out of these two possibilities (a and b), I never
> see bind() raising socket.error: (10048, 'Address already in use').

If you can type very, very quickly <wink>, you should see that on
Windows if you manage to try binding to 19999 before the original
a.close() manages to complete.

> It is important to realize that both these outcomes are an error.

If it were true that outcome #b were in error, Windows would have a
trivially easy-to-reproduce gross bug here, of many years' standing.
Life's rarely that simple, alas.

> I tried the same process as above on a linux system, and 3) always
> raises (10048, 'Address already in use').

Same here, but I've found nothing in socket docs requiring this
behavior, and, indeed, there doesn't appear to be a _logical_
necessity for it.  It's in fact somewhat of a pain on Linux, because I
continue to get 'Address already in use' even after I close both ends
of the socket connection too, presumably waiting for the 4-minute
TIME_WAIT shutdown dance to end.
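The usual way around that wait on Linux is to set SO_REUSEADDR on the listening socket before bind().  A minimal Python 3 sketch, assuming loopback and an OS-chosen port, with invented function names; on Windows the same option has different, looser semantics, so this is aimed at the Linux case only.

```python
import socket

def make_listener(port):
    s = socket.socket()
    # Must be set before bind() to have any effect on it.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s

def rebind_with_reuseaddr():
    a = make_listener(0)          # port 0: let the OS choose
    addr = a.getsockname()

    w = socket.socket()
    w.connect(addr)
    r, _peer = a.accept()

    # Close the server side first so the lingering state is on its port.
    r.close()
    w.close()
    a.close()

    # Rebind the same port right away, without waiting for TIME_WAIT.
    b = make_listener(addr[1])
    rebound = b.getsockname()
    b.close()
    return addr, rebound

if __name__ == "__main__":
    print(rebind_with_reuseaddr())
```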

...

> In my case bind() always raises (10048, 'Address already in use') when
> there is an open server socket like 'a' bound to the same port.

That's as it should be.  Alas, what I believe the program I sent last
night shows is that Windows doesn't _always_ raise 'Address already in
use' when two server sockets are binding to the same port
simultaneously.  Instead it sometimes says "OK, you got it" to _both_
of them.  This is seriously difficult for me to provoke, BTW:  10-60
minutes per failure, on a 3.4 GHz hyperthreaded box.
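The expected behavior, for comparison, in a minimal Python 3 sketch (loopback, OS-chosen port, invented function name, no SO_REUSEADDR): while one socket is still listening on a port, a second bind() to that port should be refused.  This does not attempt to reproduce the rare simultaneous-bind failure described above; it only shows what a correct outcome looks like.

```python
import socket

def second_bind_is_refused():
    a = socket.socket()
    a.bind(("127.0.0.1", 0))
    a.listen(1)
    port = a.getsockname()[1]

    b = socket.socket()
    try:
        # a is still open and listening, so this should fail.
        b.bind(("127.0.0.1", port))
        b.listen(1)
    except OSError as exc:
        refused, detail = True, exc.errno   # 'Address already in use'
    else:
        refused, detail = False, None
    finally:
        b.close()
        a.close()
    return refused, detail

if __name__ == "__main__":
    print(second_bind_is_refused())
```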

> To summarize:
> Closing a server socket bound to a given port, allows another server
> socket to bind to the same port, even when there are open client
> sockets bound to the port.

And in Windows, I believe that's by design.  Indeed, I expect the
Medusa Windows code tries such a small number of ports (no more than
about 50) precisely because Windows has always allowed reusing a
listening port so quickly.  Otherwise the code would need to try at
least as many ports as "the maximum" number of triggers that could
possibly be open simultaneously -- but there's no way to know what the
maximum is, and it's expensive to try binding to a large number of
ports.  It would have to impose an artificial limit.
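A rough Python 3 sketch of that kind of bounded port search.  This is an illustration of the idea, not the actual Medusa code: the function name, the default of 50 attempts, and the loopback host are all assumptions made for the example.

```python
import socket

def bind_first_free(base_port, attempts=50, host="127.0.0.1"):
    """Try base_port, base_port + 1, ... and return the first that binds."""
    last_port = min(base_port + attempts, 65536)
    for port in range(base_port, last_port):
        s = socket.socket()
        try:
            s.bind((host, port))
        except OSError:
            # Port is taken (or otherwise unusable); try the next one.
            s.close()
            continue
        s.listen(1)
        return s, port
    raise OSError("no free port found in %d attempts" % attempts)

if __name__ == "__main__":
    sock, port = bind_first_free(19999)
    print("listening on", port)
    sock.close()
```

The artificial limit is the `attempts` argument: once that many ports in a row are busy, the search gives up.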

The Medusa Linux code avoids all of this by creating pipes instead;
alas, Windows asyncore.py can't work with Windows pipes.
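The pipe approach, sketched in Python 3 for POSIX systems only (not the actual Medusa code; the function name is invented): writing a byte to one end of a pipe makes the other end readable, which wakes a select() loop without using any port at all.  On Windows, select() accepts only sockets, which is why this doesn't carry over.

```python
import os
import select

def pipe_trigger_demo():
    r_fd, w_fd = os.pipe()
    try:
        # Nothing written yet: a zero-timeout poll finds nothing readable.
        idle, _, _ = select.select([r_fd], [], [], 0)

        os.write(w_fd, b"x")      # pull the trigger

        # Now the read end is reported as readable.
        ready, _, _ = select.select([r_fd], [], [], 1.0)
        woke = r_fd in ready
        data = os.read(r_fd, 1)   # drain the byte
    finally:
        os.close(r_fd)
        os.close(w_fd)
    return idle, woke, data

if __name__ == "__main__":
    print(pipe_trigger_demo())
```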

