More signal 11 restarts....


Daniel Duclos

26 Nov 2001

5:47 p.m.

I have a Zope that is dumping signal 11 every 40 minutes or so. I have tried recompiling Python 2.1.1 with-threads and without-pymalloc, recompiling Zope with it, recompiling ZPatterns, recompiling and installing MySQL for Python 0.9.1, and upgrading to Zope 2.4.3, all this on a Debian Linux box. Nothing changed... still restarting... Does anybody have any idea on this matter? Please let me know if there's any relevant info that I forgot to mention about my case! Thanks in advance!!

-- daniel lobato duclos -- daniduc@hiper.com.br -- http://www.hiperlogica.com.br


Matthew T. Kromer

27 Nov

3:30 p.m.

New subject: [Zope-dev] More signal 11 restarts....

The only real suggestion I have is to attach the debugger to a running thread and hope it hits the fault while the debugger is attached. Linux core files are difficult to impossible to debug when threading is active. The 2.4 kernels may have addressed this somewhat, but I don't know that gdb has caught up.


Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )

Dieter Maurer

8:04 p.m.

New subject: [Zope-dev] More signal 11 restarts....


I would enable "core" file writing (if it is not already enabled); this is done with the "ulimit" (sh/bash) or "limit" (csh) shell command.

Then, the SIG 11 will create "core" files that can be analysed with a debugger. This may help localize the problem...

Dieter
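A minimal sketch of the same idea done from inside the process rather than from the shell, assuming a Unix system where Python's standard `resource` module is available. The function name `enable_core_dumps` is made up for illustration. It can only raise the soft limit as far as the hard limit, so if the hard limit is 0 it will not help.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit: soft=%r hard=%r" % (soft, hard))
```

The limit is inherited by child processes, so it has to be set in the server process itself or in whatever process starts it.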

Harald Koschinski

5 Dec

4:32 p.m.

New subject: [Zope-dev] More signal 11 restarts....


We have had the same problem since we went live :-(((

I am running the same versions as you and I tried the same fixes, but nothing changed.

What is the state of this problem on your side? Fixed? How?

regards

Harald

Chris McDonough

6:36 p.m.

New subject: [Zope-dev] More signal 11 restarts....

You folks should turn on "big M" logging (via -M) and see if you see a pattern to when the system dumps core. You can use the utilities/requestprofiler script to analyze the big M log.


Leonardo Rochael Almeida

7:26 p.m.

New subject: [Zope-dev] More signal 11 restarts....

Hi Harald,


It's not fixed, but we managed to make it bearable so as not to lose the client.

By replacing LoginManager with exUserFolder we managed to bring down the Zope restart time from 5 min (we have a HUGE Data.fs) to 20 secs, and by installing ZEO we brought down the restart time to between 2 and 8 secs. By increasing the caching of requests we managed to increase the time between restarts from 15 min to 2 hours. That and a nicely formatted Apache error page for Proxy Errors, for the lucky bastards who happen to hit the server in exactly the 2 seconds of restart, managed to calm down the client enough for us to breathe.

I'll try the requestprofiler tip Chris gave, but I don't have much hope, since we use MySQL for authentication and the site is authenticated almost everywhere.

We are seriously considering dropping MySQL for PostgreSQL.

Cheers, Leo

-- Ideas don't stay in some minds very long because they don't like solitary confinement.

Leonardo Rochael Almeida

7:38 p.m.

New subject: [Zope-dev] More signal 11 restarts....

Sorry, I made an incorrect statement. See below.

On Wed, 2001-12-05 at 17:26, Leonardo Rochael Almeida wrote:


> by replacing LoginManager with exUserFolder we managed to bring down the zope restart time from 5 min (we have a HUGE Data.fs) to 20 secs, and by installing ZEO we brought down the restart time to between 2 and 8 secs. By increasing the caching of requests we managed to increase the time between restart from 15 min to 2 hours.

That is incorrect. By recompiling Python without pymalloc we increased the time between restarts from 15 min to 40 min, and by caching SQL requests we increased it further, to 2 h between restarts.


-- Ideas don't stay in some minds very long because they don't like solitary confinement.

Matthew T. Kromer

8:10 p.m.

New subject: [Zope-dev] More signal 11 restarts....


Leo,

Are you comfortable with hooking up gdb to Zope to try to catch this? I suspect, but do not know, that the MySQL python adapter is probably not doing something right w.r.t. memory management. Unfortunately, it is probably also the case that the problem only occurs with high-volume traffic -- particularly if it is a timing related bug.

We have not been able to reproduce this problem in any deterministic way -- and the only people who seem to have it are those who are heavy MySQL users; it makes me think there is something in the adapter which is not behaving the same way under Python 2.1 as it did under Python 1.5.2. I have not looked at the adapter, so I'm making a few guesses as to what is going wrong.

Leonardo Rochael Almeida

8:37 p.m.

New subject: hooking up gdb (was Re: [Zope-dev] More signal 11 restarts....)

Well, one of the things I was going to ask next was for some help doing postmortem.

We aren't getting any core files, even after setting ulimit correctly (although we could be setting it incorrectly; I'll look into that further). Anyway, someone else on this list said that core dumps for threaded apps on Linux were mostly useless, so we aren't investing much energy in it.

With the short restart times we have, I'd prefer a solution that didn't involve keeping a dead site dead for too long (as in, debugging with gdb). We are working on a ZEO scheme that would switch over the accelerator to proxy another ZEO client, but we are not there yet.

It would be ideal if we could instruct Python to grab the SIG11, invoke gdb, get a C stack trace for all threads and let Zope die in peace. If it all happened in a few seconds, we would still keep the client happy.

So, to answer your question, yes, I am comfortable hooking up gdb. I'd just prefer it to be done in as little time as possible.

Cheers, Leo
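For comparison only: present-day Python 3 (3.3 and later) ships a `faulthandler` module that does part of what is wished for here, which was not available to Python 2.1. On a fatal signal such as SIGSEGV it writes the Python-level tracebacks of all threads, not the C stack traces gdb would give. A minimal sketch, where the function name `enable_crash_tracebacks` and the log path are assumptions:

```python
import faulthandler


def enable_crash_tracebacks(path):
    """Ask the interpreter to write Python tracebacks to `path` on a fatal signal."""
    # The file must stay open for the life of the process: faulthandler
    # keeps only its file descriptor.
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

The returned file object should be kept referenced by the caller so that it is not closed while the handler is installed.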


-- Ideas don't stay in some minds very long because they don't like solitary confinement.

Matthew T. Kromer

8:41 p.m.

New subject: hooking up gdb (was Re: [Zope-dev] More signal 11 restarts....)


Well largely, ALL I want is the backtrace -- and I'm wondering if I could cobble something together that could get it. The problem is it needs to look at the symbol table, and I don't know how to get at that via C -- i.e., gdb doesn't have an interface that I know of that you can link in to grab a stack trace and exit.

It's been a while since I prowled the gdb source. I may not be able to do anything automatic like you want -- but I sure wish that tool was available!

Leonardo Rochael Almeida

9 p.m.

New subject: [Zope-dev] maybe we could script it (Re: hooking up gdb)


Maybe we could script it. If we can convince Python, in the event of a SIG11, to freeze every other thread and run some Python function, or even an external script, we could make this script invoke gdb with another script (a gdb script) that would attach to the Zope Python process and get a stack trace of every thread.

No need to get fancy with C.
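The "attach and dump every thread" half of this idea can be sketched as below, assuming a present-day gdb that accepts `-p`, `-batch` and `-ex` (whether the 2001 gdb did is not something this thread establishes). The function names are hypothetical, and the sketch does not solve the harder part, which is triggering it at the moment the signal arrives.

```python
import subprocess


def gdb_backtrace_argv(pid):
    """Build a gdb command line that attaches to `pid`, dumps every thread's stack, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_all_stacks(pid):
    """Run gdb and return its text output (needs gdb installed and permission to attach)."""
    result = subprocess.run(gdb_backtrace_argv(pid), capture_output=True, text=True)
    return result.stdout
```

Symbol names in the output are only useful if the interpreter and the extension modules were built with debugging information (such as `-g`).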

Now the nice thing would be if I could generate a Zope package just like the binary Zopes at zope.org, but compiled with my options (such as -g).

Speaking of options, is linux-binary-Zope-2.4.3-python still being compiled with --pymalloc? As I mentioned before, our segfaults reduced drastically WITHOUT it.

Cheers, Leo

PS: if anyone is wondering why I keep changing the subject in this thread, check out this URL: http://udell.roninhouse.com/GroupwareReport.html

-- Ideas don't stay in some minds very long because they don't like solitary confinement.

Matthew T. Kromer

6 Dec

2:11 p.m.

New subject: [Zope-dev] maybe we could script it (Re: hooking up gdb)

----- Original Message ----- From: "Leonardo Rochael Almeida" leo@hiper.com.br

> Speaking of options, is linux-binary-Zope-2.4.3-python still being compiled with --pymalloc? As I mentioned before, our segfaults reduced drastically WITHOUT it.

The answer is "Yes, I think" :)

I built the linux binaries for the Zope distribution of Python 2.1 some time ago and I was already aware of problems with pymalloc, so I certainly didn't enable it; and I think 2.1 did not enable it by default.

Dieter Maurer

5 Dec

10:41 p.m.

New subject: hooking up gdb (was Re: [Zope-dev] More signal 11 restarts....)


I seem to remember that one can define gdb commands that are executed whenever the program stops.

If this were true, we could activate such a definition in a ".gdbinit" file and then run the program. When Python stops due to a signal, the command would be activated. It would make a backtrace followed by a quit.

Not sure it will work, though...

Dieter

Leonardo Rochael Almeida

6 Dec

5:05 p.m.

New subject: [Zope-dev] hooking up gdb is the problem, not scripting it (Re: hooking up gdb)

I don't think telling gdb what to do would be a problem. The worst that could happen is that I'd stuff my commands down gdb's stdin in a script.

The tricky part is convincing gdb to hook up to python when the sig 11 hits. Isn't there a trap instruction somewhere where I could tell python to run an external script when hit by a sig11?

Or perhaps I should be running Zope from gdb? Wouldn't that be prohibitively slow? We are talking about a production machine here, with a really huge Data.fs (approx. 1 GB) and lots of data in MySQL.

Yesterday I put some zLOG.LOG calls immediately before and after db.query() in ZMySQLDA/db.py. If I see calls that don't finish before a sig 11, then I'll be sure the fault is in _mysql.so.
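The bracketing technique described here can be sketched with the standard `logging` module standing in for zLOG (a deliberate substitution; the logger name and `traced_call` are hypothetical). A "begin" line with no matching "done" line before a crash points at the call that never returned.

```python
import logging

log = logging.getLogger("query-trace")  # hypothetical logger name


def traced_call(label, func, *args, **kwargs):
    """Log just before and just after calling func.

    A missing 'done' line marks a call that never returned.
    """
    log.info("begin %s", label)
    result = func(*args, **kwargs)
    log.info("done %s", label)
    return result
```

For this to be trustworthy across a hard crash, the log handler has to write each record out immediately rather than buffer it.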


-- Ideas don't stay in some minds very long because they don't like solitary confinement.

Dieter Maurer

6:45 p.m.

New subject: [Zope-dev] Re: hooking up gdb is the problem, not scripting it (Re: hooking up gdb)

Leonardo Rochael Almeida writes:

> The tricky part is convincing gdb to hook up to python when the sig 11 hits. Isn't there a trap instruction somewhere where I could tell python to run an external script when hit by a sig11?

That's what I meant....

From "gdb" online help:

    (gdb) help stop
    There is no `stop' command, but you can set a hook on `stop'.
    This allows you to set a list of commands to be run each time execution of the program stops.

Dieter
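A small sketch of generating such a hook definition, for example to write into a ".gdbinit" file. The function name `hook_stop_script` is hypothetical, and the default command list is an assumption; the hook runs on every stop, not only on a SIGSEGV, and whether appending `quit` behaves well inside the hook is untested here.

```python
def hook_stop_script(commands=("thread apply all backtrace",)):
    """Return gdb script text defining a hook that runs `commands` whenever the program stops."""
    lines = ["define hook-stop"]
    lines.extend(commands)
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(hook_stop_script(), end="")
```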

Andy Dustman

2:43 p.m.

New subject: hooking up gdb (was Re: [Zope-dev] More signal 11 restarts....)


If you don't think a core dump is going to be useful, gdb isn't going to be either.

I know I have gotten Zope to dump core before, and I think I did this with -Z '', i.e. don't start a management process. Then you need some other way to start Zope when it dies.

As for ZMySQLDA/MySQLdb, I do know that the MySQL client libraries will crash if you try to use the same connection more than once simultaneously in two different threads. I have never quite been sure whether or not there is some kind of locking in Zope to prevent threads from simultaneously using two database connections, since I expect this would cause problems on virtually all implementations.
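To illustrate the kind of locking in question (this is not a description of what Zope or ZMySQLDA actually does), here is a minimal sketch that serializes access to one shared connection. `LockedConnection` is a hypothetical wrapper, and the wrapped object is only assumed to have a `query()` method.

```python
import threading


class LockedConnection:
    """Serialize access to one database connection shared between threads."""

    def __init__(self, connection):
        self._connection = connection  # any object with a query() method (assumed)
        self._lock = threading.Lock()

    def query(self, sql):
        # Only one thread at a time may be inside the client library
        # for this connection.
        with self._lock:
            return self._connection.query(sql)
```

The alternative design is one connection per thread, which avoids the lock at the cost of more open connections.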

-- Andy Dustman PGP: 0x930B8AB6 @ .net http://dustman.net/andy You can have my keys when you pry them from my dead, cold neurons.

Matthew T. Kromer

3:22 p.m.

New subject: hooking up gdb (was Re: [Zope-dev] More signal 11 restarts....)

Andy Dustman wrote:

> If you don't think a core dump is going to be useful, gdb isn't going to be either.

Well, the problem is on Linux, the core file is from the process that received the SIG11, not the one that caused it, in most cases (due to the way Linux implements threads). To the best of my knowledge, the core does NOT contain the necessary registers of the remaining threads; thus gdb can't show you which thread dumped core.

However, attaching gdb to the running zope usually does work, since gdb can inspect the processes when they are running to get the thread information.

Dario Lopez-Kästen

7 Dec

9:13 a.m.

New subject: With Oracle as well Re: [Zope-dev] More signal 11 restarts....

Matt Kromer wrote:

> We have not been able to reproduce this problem in any deterministic way -- and the only people who seem to have it are those who are heavy MySQL users; it makes me think there is something in the adapter which is not behaving the same way under Python 2.1 as it did under Python 1.5.2. I have not looked at the adapter, so I'm making a few guesses as to what is going wrong.

Well, sorry to disappoint everybody, but we have the same signal 11 restarts here.

We are using the latest DCO2 from CVS and have _very_ high Oracle database usage.

Yesterday we changed from our Solaris box to a Linux box and performance has increased dramatically (the Linux box is a 1.8 GHz P3 :).

Also, the threading problems we previously had seem to have disappeared.

Our current setup:

  Red Hat 7.2
  Oracle client 8.1.7
  Python 2.1, compiled from source --without-pymalloc
  Zope 2.4.3 with transparent folders, formulator, replace support, localfs
  latest DCO2

Now, what can we do to pin down the problem? Is there anyone else who is a heavy database user in similar circumstances and can share information?

I am starting to suspect that there is some kind of DA problem here...

Also, for the record we usually get a bunch of these quite often:

2001-11-04T09:04:33 ERROR(200) ZServer uncaptured python exception, closing channel <zhttp_channel connected XXX.XXX.XXX.XXX:2181 at fb4edc channel#: 2286 requests:4> (socket.error:(32, 'Broken pipe')

We were seeing the same error (asyncore.py|send|330, etc) on solaris.

Any thoughts?

/dario

Matthew T. Kromer

1:19 p.m.

New subject: With Oracle as well Re: [Zope-dev] More signal 11 restarts....

Dario Lopez-Kästen wrote:

> Well, sorry to disappoint everybody, but we have the same signal 11 restarts here.

Oh sure, go spoil my "blame it on the other guy" theory.

> We are using the latest DCO2 from CVS and have _very_ high Oracle database usage.
>
> Yesterday we changed from our Solaris box to a Linux box and performance has increased dramatically (the Linux box is a 1.8 GHz P3 :).

That's to be expected with the clock speed differences. Unless you use Sun's CC, you get fairly poor SPARC code out of gcc, IMHO.

> Also, the threading problems we previously had seem to have disappeared.

Yah! I think that had to do with the rather *stupid* act of forgetfulness on my part to re-enable Python threading around execute().

> Now, what can we do to pin down the problem? Is there anyone else who is a heavy database user in similar circumstances and can share information?

I may see what I can do to try to write a script to be able to invoke gdb in the event of a crash. Stay tuned.

> I am starting to suspect that there is some kind of DA problem here...

Actually, since it's a mysterious sig 11, it's a C module someplace... there is probably ONE module which is referring to an object after it has been deallocated.

> Also, for the record we usually get a bunch of these quite often:
>
> 2001-11-04T09:04:33 ERROR(200) ZServer uncaptured python exception, closing channel <zhttp_channel connected XXX.XXX.XXX.XXX:2181 at fb4edc channel#: 2286 requests:4> (socket.error:(32, 'Broken pipe')
> [/usr/local/zope/dist/Zope-2.4.1/ZServer/medusa/asynchat.py|initiate_send|214] [/usr/local/zope/dist/Zope-2.4.1/ZServer/medusa/http_server.py|send|414] [/usr/local/zope/sw/Python2.1.1/lib/python2.1/asyncore.py|send|330])
>
> We were seeing the same error (asyncore.py|send|330, etc) on solaris.
>
> Any thoughts

Well, that means "the browser user clicked 'stop'" -- Medusa is just telling you the channel went away on it. That's normal when the browser chops the TCP connection.

Dario Lopez-Kästen

1:49 p.m.

New subject: With Oracle as well Re: [Zope-dev] More signal 11 restarts....

> Well, that means "the browser user clicked 'stop'" -- Medusa is just telling you the channel went away on it. That's normal when the browser chops the TCP connection.

*looong sigh of relief*

/dario

Leonardo Rochael Almeida

3:47 p.m.

New subject: [Zope-dev] browser closing connection

Hi, all


As Dario said, it's a relief to know that. But if it's an expected scenario, shouldn't Zope put a more graceful message in the log?

Also, I write several long-running process pages, and I make use of RESPONSE.write() for that so as not to upset the browser or the user (or Zope's memory consumption, when I have a lot of data to show during that time). And I'd like a way to know when the user closes the channel, so that I can stop the processing.

Case in point, I work with Lalo on the ZUnit framework (it's a little dormant now (actually you could say it's comatose), but I intend to reawaken it shortly). And I'd like to stop the processing of test cases when the user presses the stop button in his browser.

In other cases (like when I'm manually recataloging a huge bunch of objects for which using the 'ZCatalog Find' would just time out) I just want to ignore the fact that the user closed his browser after making the request.

Speaking of streaming pages, I'd really like it if I didn't need to use an external method just to print a damn traceback. That facility, if used inside the standard_error_page, would obviate the need to always print a traceback in the error page, but I guess this has already been discussed before...

Cheers, Leo

-- Ideas don't stay in some minds very long because they don't like solitary confinement.

Chris McDonough

4:12 p.m.

New subject: [Zope-dev] browser closing connection

The broken pipe error should be caught. Patches accepted, if you've got the time.

Thanks,

- C
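What "catching" the broken pipe could look like, sketched against a plain socket in present-day Python 3 rather than against ZServer or Medusa, whose internals are not shown in this thread. The helper name `send_or_detect_close` is hypothetical. Returning False gives the caller a way to notice that the client has gone and stop a long-running job.

```python
import socket


def send_or_detect_close(sock, data):
    """Send data; return False instead of raising if the peer has gone away."""
    try:
        sock.sendall(data)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # errno 32 (EPIPE) or a reset: the client closed the connection.
        return False
```

A caller that does not care whether the client is still there can simply ignore the return value and carry on.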


Chris Withers

4:11 p.m.

New subject: [Zope-dev] browser closing connection

Chris McDonough wrote:

> The broken pipe error should be caught. Patches accepted, if you've got the time.

biiig +1 on this from me, especially if you can put in a custom handler of your own for this case...

cheers,

V.I. ;-)

Matthew T. Kromer

10 Dec

11:11 p.m.

New subject: EINTR ... was Re: [Zope-dev] browser closing connection

Also, for the record we usually get a bunch of these quite often:

2001-11-04T09:04:33 ERROR(200) ZServer uncaptured python exception, closing channel <zhttp_channel connected XXX.XXX.XXX.XXX:2181 at fb4edc channel#: 2286 requests:4> (socket.error:(32, 'Broken pipe')
[/usr/local/zope/dist/Zope-2.4.1/ZServer/medusa/asynchat.py|initiate_send|214]
[/usr/local/zope/dist/Zope-2.4.1/ZServer/medusa/http_server.py|send|414]
[/usr/local/zope/sw/Python2.1.1/lib/python2.1/asyncore.py|send|330])

We were seeing the same error (asyncore.py|send|330, etc) on solaris.

For what it's worth, I tracked this down in the sources and confirmed that in Zope 2.3 we shipped a modified asyncore.py with Medusa that handled EINTR, but in Zope 2.4 we used stock Python's asyncore, which does NOT handle EINTR being returned from select(). IMHO, the distributed Python 2.1 asyncore behavior is incorrect.

I've attached a diff of a portion of the differences (manually edited to take out other patches).

I suspect this patch never got integrated due to the ugliness of "while 1".

Also, the "what should this be" comment relates to NT's error numbers. Visual C++ has an errno.h that lists EINTR as 4, and winsock.h defines WSAEINTR as 10004 (i.e., add 10,000 to the errno). So that number should be 10004, not 0, for correctness on Windows.
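To make the Windows numbering concrete, here is a minimal sketch in present-day Python 3 (not the Python 2.1 discussed in this thread). The helper name `eintr_code` and the two constants are hypothetical; the values come only from the message above (EINTR listed as 4 in Visual C++'s errno.h, Winsock adding 10,000).

```python
import errno
import os

WINSOCK_OFFSET = 10000  # offset applied by winsock.h, per the message above
VC_EINTR = 4            # EINTR as listed in Visual C++'s errno.h, per the message above


def eintr_code(os_name=os.name):
    """Return the value a select() error should be compared against.

    os_name is a parameter only so that both branches can be exercised
    on any platform; by default it is the running platform's os.name.
    """
    if os_name == 'nt':
        return WINSOCK_OFFSET + VC_EINTR  # i.e. WSAEINTR
    return errno.EINTR


if __name__ == "__main__":
    print("nt:", eintr_code('nt'), "here:", eintr_code())
```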

--- /usr/local/python-2.1.1/lib/python2.1/asyncore.py Fri Nov 9 16:28:15 2001
+++ asyncore.py Sun Oct 1 11:58:56 2000
@@ -59,8 +39,10 @@
     ECONNRESET = 10054
     ENOTCONN = 10057
     ESHUTDOWN = 10058
+    EINTR = 0 # what should this be?
 else:
-    from errno import EALREADY, EINPROGRESS, EWOULDBLOCK, ECONNRESET, ENOTCONN, ESHUTDOWN
+    from errno import EALREADY, EINPROGRESS, EWOULDBLOCK, ECONNRESET
+    from errno import ENOTCONN, ESHUTDOWN, EINTR

 try:
     socket_map
@@ -83,7 +65,13 @@
                 r.append (fd)
             if obj.writable():
                 w.append (fd)
-        r,w,e = select.select (r,w,e, timeout)
+
+        while 1:
+            try: r,w,e = select.select (r,w,e, timeout)
+            except select.error, v:
+                if v[0] != EINTR: raise
+            else: break
+

         if DEBUG:
             print r,w,e
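For readers on a current interpreter, the retry idea in the patch can be sketched in Python 3 as below. This is an illustration, not the Zope or Medusa code: `select_retry` and its `select_fn` parameter are hypothetical names, the latter added only so the behaviour can be exercised without real signals. Since Python 3.5 (PEP 475) `select.select()` already restarts itself after a signal unless the handler raises, so an explicit loop like this is mostly of historical interest.

```python
import errno
import select


def select_retry(rlist, wlist, xlist, timeout, select_fn=select.select):
    """Call select() and restart it whenever it is interrupted (EINTR).

    Any other OSError is re-raised unchanged. In Python 3, select.error
    is an alias of OSError, and the error number is in err.errno.
    """
    while True:
        try:
            return select_fn(rlist, wlist, xlist, timeout)
        except OSError as err:
            if err.errno != errno.EINTR:
                raise
            # Interrupted by a signal: go around and call select() again.
```

The trade-off is that a signal can no longer make the caller fall out of the wait early: each restart waits for the full timeout again.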

jeremy＠zope.com

11 Dec

2:52 p.m.

New subject: EINTR ... was Re: [Zope-dev] browser closing connection

"MTK" == Matthew T Kromer matt@zope.com writes:

MTK> For what its worth, I tracked this down in the sources and
MTK> confirmed that in Zope 2.3, we shipped a modified asyncore.py
MTK> with Medusa that handled EINTR, but in Zope 2.4 we used stock
MTK> Python's asyncore which does NOT handle EINTR being returned
MTK> from select(). IMHO, the distributed Python 2.1 asyncore
MTK> behavior is incorrect.

This is fixed in Python 2.2.

A brief excerpt demonstrates the approach:

def poll (timeout=0.0, map=None):
    if map is None:
        map = socket_map
    if map:
        r = []; w = []; e = []
        for fd, obj in map.items():
            if obj.readable():
                r.append (fd)
            if obj.writable():
                w.append (fd)
        try:
            r,w,e = select.select (r,w,e, timeout)
        except select.error, err:
            if err[0] != EINTR:
                raise

In particular, I didn't use a "while 1:". I believe an operator could send a signal to a process using asyncore and expect it to cause the app to fall out of a poll() call immediately, instead of waiting for the timeout to occur. (It might never occur.)

I expect that the interrupted system call will be fairly uncommon, so it shouldn't matter that poll() is returning without doing any work. In most cases, it will be called from loop(), which already has a while loop.

A similar fix was made in poll3(), which uses the select module's poll(2) interface.
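The design choice described here (let poll() return on EINTR and rely on the caller's loop) can be sketched in present-day Python 3 as follows. It is not the Python 2.2 asyncore source: `poll_once`, `run_loop`, `should_stop` and `select_fn` are hypothetical names, and `select_fn` exists only so the behaviour can be exercised without real signals.

```python
import errno
import select


def poll_once(timeout=0.0, readers=(), writers=(), select_fn=select.select):
    """One select() pass; an interrupted call reports 'nothing ready'."""
    try:
        return select_fn(list(readers), list(writers), [], timeout)
    except OSError as err:
        if err.errno != errno.EINTR:
            raise
        # Interrupted by a signal: return at once instead of retrying,
        # so the caller regains control before the timeout expires.
        return [], [], []


def run_loop(should_stop, timeout=30.0, **poll_args):
    """Outer loop: it, not poll_once(), decides whether to keep waiting."""
    passes = 0
    while not should_stop():
        poll_once(timeout, **poll_args)
        passes += 1
    return passes
```

Because `should_stop` is checked after every pass, a signal handler that sets a flag takes effect as soon as the interrupted call returns rather than after the full timeout.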

Jeremy

Leonardo Rochael Almeida

3:08 p.m.

New subject: EINTR ... was Re: [Zope-dev] browser closing connection

So, which is the "official" way of fixing Zope 2.4.3? Wait for a hotfix? Apply Matthew's patch? Steal asyncore from Python 2.2?


-- Ideas don't stay in some minds very long because they don't like solitary confinement.

John Ziniti

3:40 p.m.

New subject: EINTR ... was Re: [Zope-dev] browser closing connection

Replace your Python 2.1.1 asyncore.py with the one that is attached. I've been using it for months now with no problems. Note, however, that it doesn't work on WinNT, because the author didn't know what EINTR looked like on NT.


# -*- Mode: Python; tab-width: 4 -*-
# $Id: asyncore.py,v 1.1.1.3 2001/02/08 13:08:34 tdickenson Exp $
# Author: Sam Rushing rushing@nightmare.com

# ======================================================================
# Copyright 1996 by Sam Rushing
#
# All Rights Reserved
#
# Permission to use, copy, modify, and distribute this software and
# its documentation for any purpose and without fee is hereby
# granted, provided that the above copyright notice appear in all
# copies and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of Sam
# Rushing not be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior
# permission.
#
# SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
# INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
# NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
# OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
# NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
# CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
# ======================================================================

import exceptions
import select
import socket
import string
import sys

import os
if os.name == 'nt':
    EWOULDBLOCK = 10035
    EINPROGRESS = 10036
    EALREADY = 10037
    ECONNRESET = 10054
    ENOTCONN = 10057
    ESHUTDOWN = 10058
    EINTR = 0 # what should this be?
else:
    from errno import EALREADY, EINPROGRESS, EWOULDBLOCK, ECONNRESET
    from errno import ENOTCONN, ESHUTDOWN, EINTR

try:
    socket_map
except NameError:
    socket_map = {}

class ExitNow (exceptions.Exception):
    pass

DEBUG = 0

def poll (timeout=0.0, map=None):
    global DEBUG
    if map is None:
        map = socket_map
    if map:
        r = []; w = []; e = []
        for fd, obj in map.items():
            if obj.readable():
                r.append (fd)
            if obj.writable():
                w.append (fd)

        while 1:
            try: r,w,e = select.select (r,w,e, timeout)
            except select.error, v:
                if v[0] != EINTR: raise
            else: break

        if DEBUG:
            print r,w,e

        for fd in r:
            try:
                obj = map[fd]
                try:
                    obj.handle_read_event()
                except ExitNow:
                    raise ExitNow
                except:
                    obj.handle_error()
            except KeyError:
                pass

        for fd in w:
            try:
                obj = map[fd]
                try:
                    obj.handle_write_event()
                except ExitNow:
                    raise ExitNow
                except:
                    obj.handle_error()
            except KeyError:
                pass

def poll2 (timeout=0.0, map=None):
    import poll
    if map is None:
        map=socket_map
    # timeout is in milliseconds
    timeout = int(timeout*1000)
    if map:
        l = []
        for fd, obj in map.items():
            flags = 0
            if obj.readable():
                flags = poll.POLLIN
            if obj.writable():
                flags = flags | poll.POLLOUT
            if flags:
                l.append ((fd, flags))
        r = poll.poll (l, timeout)
        for fd, flags in r:
            try:
                obj = map[fd]
                try:
                    if (flags & poll.POLLIN):
                        obj.handle_read_event()
                    if (flags & poll.POLLOUT):
                        obj.handle_write_event()
                except ExitNow:
                    raise ExitNow
                except:
                    obj.handle_error()
            except KeyError:
                pass

def loop (timeout=30.0, use_poll=0, map=None):

    if use_poll:
        poll_fun = poll2
    else:
        poll_fun = poll

    if map is None:
        map=socket_map

    while map:
        poll_fun (timeout, map)

class dispatcher:
    debug = 0
    connected = 0
    accepting = 0
    closing = 0
    addr = None

    def __init__ (self, sock=None, map=None):
        if sock:
            self.set_socket (sock, map)
            # I think it should inherit this anyway
            self.socket.setblocking (0)
            self.connected = 1

    def __repr__ (self):
        try:
            status = []
            if self.accepting and self.addr:
                status.append ('listening')
            elif self.connected:
                status.append ('connected')
            if self.addr:
                status.append ('%s:%d' % self.addr)
            return '<%s %s at %x>' % (
                self.__class__.__name__,
                string.join (status, ' '),
                id(self)
                )
        except:
            try:
                ar = repr(self.addr)
            except:
                ar = 'no self.addr!'

            return '<__repr__ (self) failed for object at %x (addr=%s)>' % (id(self),ar)

    def add_channel (self, map=None):
        #self.log_info ('adding channel %s' % self)
        if map is None:
            map=socket_map
        map [self._fileno] = self

    def del_channel (self, map=None):
        fd = self._fileno
        if map is None:
            map=socket_map
        if map.has_key (fd):
            #self.log_info ('closing channel %d:%s' % (fd, self))
            del map [fd]

    def create_socket (self, family, type):
        self.family_and_type = family, type
        self.socket = socket.socket (family, type)
        self.socket.setblocking(0)
        self._fileno = self.socket.fileno()
        self.add_channel()

    def set_socket (self, sock, map=None):
        self.__dict__['socket'] = sock
        self._fileno = sock.fileno()
        self.add_channel (map)

    def set_reuse_addr (self):
        # try to re-use a server port if possible
        try:
            self.socket.setsockopt (
                socket.SOL_SOCKET, socket.SO_REUSEADDR,
                self.socket.getsockopt (socket.SOL_SOCKET, socket.SO_REUSEADDR) | 1
                )
        except:
            pass

    # ==================================================
    # predicates for select()
    # these are used as filters for the lists of sockets
    # to pass to select().
    # ==================================================

    def readable (self):
        return 1

    if os.name == 'mac':
        # The macintosh will select a listening socket for
        # write if you let it. What might this mean?
        def writable (self):
            return not self.accepting
    else:
        def writable (self):
            return 1

    # ==================================================
    # socket object methods.
    # ==================================================

    def listen (self, num):
        self.accepting = 1
        if os.name == 'nt' and num > 5:
            num = 1
        return self.socket.listen (num)

    def bind (self, addr):
        self.addr = addr
        return self.socket.bind (addr)

    def connect (self, address):
        self.connected = 0
        try:
            self.socket.connect (address)
        except socket.error, why:
            if why[0] in (EINPROGRESS, EALREADY, EWOULDBLOCK):
                return
            else:
                raise socket.error, why
        self.connected = 1
        self.handle_connect()

    def accept (self):
        try:
            conn, addr = self.socket.accept()
            return conn, addr
        except socket.error, why:
            if why[0] == EWOULDBLOCK:
                pass
            else:
                raise socket.error, why

    def send (self, data):
        try:
            result = self.socket.send (data)
            return result
        except socket.error, why:
            if why[0] == EWOULDBLOCK:
                return 0
            else:
                raise socket.error, why
            return 0

    def recv (self, buffer_size):
        try:
            data = self.socket.recv (buffer_size)
            if not data:
                # a closed connection is indicated by signaling
                # a read condition, and having recv() return 0.
                self.handle_close()
                return ''
            else:
                return data
        except socket.error, why:
            # winsock sometimes throws ENOTCONN
            if why[0] in [ECONNRESET, ENOTCONN, ESHUTDOWN]:
                self.handle_close()
                return ''
            else:
                raise socket.error, why

    def close (self):
        self.del_channel()
        self.socket.close()

    # cheap inheritance, used to pass all other attribute
    # references to the underlying socket object.
    def __getattr__ (self, attr):
        return getattr (self.socket, attr)

    # log and log_info may be overridden to provide more sophisticated
    # logging and warning methods. In general, log is for 'hit' logging
    # and 'log_info' is for informational, warning and error logging.

    def log (self, message):
        sys.stderr.write ('log: %s\n' % str(message))

    def log_info (self, message, type='info'):
        if __debug__ or type != 'info':
            print '%s: %s' % (type, message)

    def handle_read_event (self):
        if self.accepting:
            # for an accepting socket, getting a read implies
            # that we are connected
            if not self.connected:
                self.connected = 1
            self.handle_accept()
        elif not self.connected:
            self.handle_connect()
            self.connected = 1
            self.handle_read()
        else:
            self.handle_read()

    def handle_write_event (self):
        # getting a write implies that we are connected
        if not self.connected:
            self.handle_connect()
            self.connected = 1
        self.handle_write()

    def handle_expt_event (self):
        self.handle_expt()

    def handle_error (self):
        (file,fun,line), t, v, tbinfo = compact_traceback()

        # sometimes a user repr method will crash.
        try:
            self_repr = repr (self)
        except:
            self_repr = '<__repr__ (self) failed for object at %0x>' % id(self)

        self.log_info (
            'uncaptured python exception, closing channel %s (%s:%s %s)' % (
                self_repr,
                t,
                v,
                tbinfo
                ),
            'error'
            )
        self.close()

    def handle_expt (self):
        self.log_info ('unhandled exception', 'warning')

    def handle_read (self):
        self.log_info ('unhandled read event', 'warning')

    def handle_write (self):
        self.log_info ('unhandled write event', 'warning')

    def handle_connect (self):
        self.log_info ('unhandled connect event', 'warning')

    def handle_accept (self):
        self.log_info ('unhandled accept event', 'warning')

    def handle_close (self):
        self.log_info ('unhandled close event', 'warning')
        self.close()

# ---------------------------------------------------------------------------
# adds simple buffered output capability, useful for simple clients.
# [for more sophisticated usage use asynchat.async_chat]
# ---------------------------------------------------------------------------

class dispatcher_with_send (dispatcher):
    def __init__ (self, sock=None):
        dispatcher.__init__ (self, sock)
        self.out_buffer = ''

    def initiate_send (self):
        num_sent = 0
        num_sent = dispatcher.send (self, self.out_buffer[:512])
        self.out_buffer = self.out_buffer[num_sent:]

    def handle_write (self):
        self.initiate_send()

    def writable (self):
        return (not self.connected) or len(self.out_buffer)

    def send (self, data):
        if self.debug:
            self.log_info ('sending %s' % repr(data))
        self.out_buffer = self.out_buffer + data
        self.initiate_send()

# ---------------------------------------------------------------------------
# used for debugging.
# ---------------------------------------------------------------------------

def compact_traceback ():
    t,v,tb = sys.exc_info()
    tbinfo = []
    while 1:
        tbinfo.append ((
            tb.tb_frame.f_code.co_filename,
            tb.tb_frame.f_code.co_name,
            str(tb.tb_lineno)
            ))
        tb = tb.tb_next
        if not tb:
            break

    # just to be safe
    del tb

    file, function, line = tbinfo[-1]
    info = '[' + string.join (
        map (
            lambda x: string.join (x, '|'),
            tbinfo
            ),
        '] ['
        ) + ']'
    return (file, function, line), t, v, info

def close_all (map=None):
    if map is None:
        map=socket_map
    for x in map.values():
        x.socket.close()
    map.clear()

# Asynchronous File I/O:
#
# After a little research (reading man pages on various unixen, and
# digging through the linux kernel), I've determined that select()
# isn't meant for doing asynchronous file i/o.
# Heartening, though - reading linux/mm/filemap.c shows that linux
# supports asynchronous read-ahead. So _MOST_ of the time, the data
# will be sitting in memory for us already when we go to read it.
#
# What other OS's (besides NT) support async file i/o? [VMS?]
#
# Regardless, this is useful for pipes, and stdin/stdout...

import os
if os.name == 'posix':
    import fcntl
    import FCNTL

    class file_wrapper:
        # here we override just enough to make a file
        # look like a socket for the purposes of asyncore.
        def __init__ (self, fd):
            self.fd = fd

        def recv (self, *args):
            return apply (os.read, (self.fd,)+args)

        def write (self, *args):
            return apply (os.write, (self.fd,)+args)

        def close (self):
            return os.close (self.fd)

        def fileno (self):
            return self.fd

    class file_dispatcher (dispatcher):
        def __init__ (self, fd):
            dispatcher.__init__ (self)
            self.connected = 1
            # set it to non-blocking mode
            flags = fcntl.fcntl (fd, FCNTL.F_GETFL, 0)
            flags = flags | FCNTL.O_NONBLOCK
            fcntl.fcntl (fd, FCNTL.F_SETFL, flags)
            self.set_file (fd)

        def set_file (self, fd):
            self._fileno = fd
            self.socket = file_wrapper (fd)
            self.add_channel()

jeremy＠zope.com

5:11 p.m.

New subject: EINTR ... was Re: [Zope-dev] browser closing connection

"JZ" == John Ziniti jziniti@speakeasy.org writes:

JZ> Replace your Python 2.1.1 asyncore.py with the one that is
JZ> attached. I've been using it for months now with no problems.
JZ> Notice, however, that it doesn't work on WinNT, b/c the author
JZ> didn't know what EINTR looked like on NT

The asyncore.py in Python 2.2 has a number of bug fixes and improvements over 2.1.1 and over the patched version you attached. It does, for example, work correctly on win32.

I'd recommend grabbing a copy of asyncore.py from a Python 2.2 beta and using it.

Jeremy

Dieter Maurer

7 Dec

9:20 p.m.

New subject: With Oracle as well Re: [Zope-dev] More signal 11 restarts....

Dario Lopez-Kästen writes:

Well, sorry to disappoint everybody, but we have the same signal 11 restarts here. ... Also, for the record we usually get a bunch of these quite often:

2001-11-04T09:04:33 ERROR(200) ZServer uncaptured python exception, closing channel <zhttp_channel connected XXX.XXX.XXX.XXX:2181 at fb4edc channel#: 2286 requests:4> (socket.error:(32, 'Broken pipe')
[/usr/local/zope/dist/Zope-2.4.1/ZServer/medusa/asynchat.py|initiate_send|214]
[/usr/local/zope/dist/Zope-2.4.1/ZServer/medusa/http_server.py|send|414]
[/usr/local/zope/sw/Python2.1.1/lib/python2.1/asyncore.py|send|330])

We were seeing the same error (asyncore.py|send|330, etc) on solaris.

They should be harmless:

They happen when the client closes the connection before ZServer delivered the full response.
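The situation described here is easy to reproduce outside Zope. The following is a minimal sketch in present-day Python 3, using a local socket pair in place of ZServer and a browser; `send_after_peer_close` is a hypothetical helper name. On Linux the first send typically fails with errno 32 (EPIPE, "Broken pipe"), the same number as in the log above; other platforms may report a connection reset or abort instead, which is why the broader ConnectionError is caught.

```python
import errno
import socket


def send_after_peer_close(payload=b"x" * 1024, attempts=16):
    """Send on a socket whose peer has already gone away.

    Returns the errno of the first connection error seen, or None if
    every send succeeded (some platforms buffer a few sends first).
    """
    server, client = socket.socketpair()
    try:
        client.close()  # the "browser" chops the connection early
        for _ in range(attempts):
            try:
                server.send(payload)
            except ConnectionError as err:
                # BrokenPipeError and ConnectionResetError are subclasses.
                return err.errno
        return None
    finally:
        server.close()


if __name__ == "__main__":
    code = send_after_peer_close()
    print(code, errno.errorcode.get(code))
```

A server that expects this can catch the error around its send call and close the channel quietly rather than logging it as an uncaptured exception.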

However, many people reporting crashes also reported these error messages. Maybe the Python socket module lacks a "Py_INCREF" for this kind of error?

Dieter

Leonardo Rochael Almeida

9 Dec

4:02 a.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

For the record, I also get Broken pipes on the segfaulting zope, but then again, I get broken pipes in ALL my Zopes (and we got a bunch of those here at Hiperlógica). So I don't believe they correlate.


Dirk Datzert

9:03 a.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

Hello,

I can report the same failure:

First there is an OSerror signal 11, and from that point on there will be OSerror errno 32 Broken pipe until Zope is restarted.

Our system is:

Zope 2.4.3 (from source 2.4.1 with update 2.4.x_to_2.4.3), Python 2.1.1, Apache 1.3.12

Is there a solution out there?

Regards, Dirk


Chris McDonough

8:38 p.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

These failure reports are alarming, but I haven't seen anything like them, and of course we can't fix what we can't find. If anybody can make the problem recur repeatably, we can almost certainly fix it.

Sorry,

- C


Jens Quade

11:28 p.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

"Chris McDonough" chrism@zope.com writes:

...

These failure reports are alarming, but I haven't seen anything like them, and of course we can't fix what we can't find. If anybody can make the problem recur repeatably, we can almost certainly fix it.

Sorry,

C

On Sun, 09 Dec 2001 10:03:36 +0100 Dirk Datzert Dirk.Datzert@rasselstein-hoesch.de wrote:

...
Hello,

I can report the same failure:

First there is an OSerror signal 11, and from that point on there will be OSerror errno 32 Broken pipe until Zope is restarted.

Our system is:

Zope 2.4.3 (from source 2.4.1 with update 2.4.x_to_2.4.3), Python 2.1.1, Apache 1.3.12

Is there a solution out there?

Regards, Dirk

Do you use Linux Kernel 2.2.17 (IIRC)? Updating or downgrading to another version should fix the problem then.

regards, jens

Dirk Datzert

10 Dec

7:01 a.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

It's Linux 2.2.19. What does IIRC mean?

Dirk

Dario Lopez-Kästen

7:16 a.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

From: "Dirk Datzert" Dirk.Datzert@rasselstein-hoesch.de

...

It's Linux 2.2.19. What does IIRC mean?

IIRC means "If I Recall Correctly", IIRC :-)

And these problems occur also on solaris, so there's nothing linux centric about them.

/dario

- -------------------------------------------------------------------- Dario Lopez-Kästen Systems Developer Chalmers Univ. of Technology dario@ita.chalmers.se ICQ will yield no hits IT Systems & Services

Jens Quade

11:37 a.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

"Dirk Datzert" Dirk.Datzert@rasselstein-hoesch.de writes:

...

It's Linux 2.2.19. What does IIRC mean?

If I remember correctly. I was able to fix the problem (or a similar one) last summer by changing the Linux kernel.

http://mailman.beehive.de/pipermail/zope/2001-June/000590.html http://mailman.beehive.de/pipermail/zope/2001-November/000923.html

John Ziniti

2:58 p.m.

New subject: [Zope-dev] yes, segv11 and Broken pipes

Wasn't this the problem where asyncore.py is not catching the operating system's EINTR? I used to get these all the time, and was able to stop it using a modified asyncore which loops on an OS select() call, restarting the call if it catches EINTR from the OS. I can send a modified asyncore.py to anyone who wants to give it a try.

Ziniti


Joseph Wayne Norton

8 Dec

2:09 p.m.

New subject: [Zope-dev] More signal 11 restarts....

Daniel -

I have **not** tried this myself yet, but I plan to check this week. Please take a look at the following URL:

http://www.humanfactor.com/cgi-bin/cgi-delegate/apache-ML/nh/1998/Oct/0130.h...

This mail is related to Apache, but the same analysis might apply to Zope. I believe the truss command on the Solaris platform is similar to the strace command on the Linux platform.

I'm facing similar problems on the Solaris platform, but the restart is occurring maybe 2-3 times per day.

regards,

- j

