[Zope] POSKeyError II: Dead By Dawn

Paul Winkler pw_lists@slinkp.com
Mon, 20 Jan 2003 23:58:12 -0800


They're baaaaaaack....

After a recent thread on Zope "Goblins" I decided
to try the fstest.py script and see what it had to
tell me.  Sadly, it found errors in the Data.fs on
our development server. Lots of work has
been done on that server recently; worried
about losing data, I copied the Data.fs (plain
old unix cp command) and ran fsrecover.py on the copy
to see what it would find.
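
For anyone wanting to do the same "work on a copy" step with a bit more
paranoia, here is a minimal sketch that copies the file and checks that
the copy is byte-identical before any recovery tool touches it. It is
not what I actually ran (that was plain cp), it is written for a current
Python rather than 2.1, and the helper names sha256_of and
copy_for_recovery are made up for illustration. It also assumes nothing
is writing to the Data.fs while the copy is made; otherwise the
checksums can legitimately differ.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-GB Data.fs never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_for_recovery(src, dst):
    """Copy src to dst and raise if the copy is not byte-identical.

    Hypothetical helper: assumes src is not being written to during the copy.
    """
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError("copy of %s does not match the original" % src)
    return dst
```

The recovery script would then be pointed at the path returned by
copy_for_recovery, never at the original.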

I tee'd the output of fsrecover.py (stdout & stderr) to a file
and I'm glad I did, because I've never seen anything
this bad.

Grepping through the fsrecover output reveals:

12527 total occurrences of "Error"
11397 "bad transaction length" errors
719   "invalid transaction length" errors
193   "POSKeyError"
75    *unique* POSKeyErrors (filtered through uniq) 
       (so it's not just one bad oid)

Ow. Ow. Ow.
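
A tally like the one above can be produced with something along these
lines. This is only a sketch: it counts lines containing each substring
(roughly what grep -c does) and assumes nothing about the fsrecover
output format beyond those substrings appearing in it. The function name
tally_errors and the log file name in the comment are hypothetical.

```python
def tally_errors(lines):
    """Count lines containing each error substring, plus unique POSKeyError lines."""
    counts = {
        "Error": 0,
        "bad transaction length": 0,
        "invalid transaction length": 0,
        "POSKeyError": 0,
    }
    unique_poskey_lines = set()
    for line in lines:
        for needle in counts:
            if needle in line:
                counts[needle] += 1
        if "POSKeyError" in line:
            # Same job as piping the matching lines through sort | uniq.
            unique_poskey_lines.add(line.strip())
    counts["unique POSKeyError lines"] = len(unique_poskey_lines)
    return counts


# Usage, assuming the tee'd output was saved as fsrecover.log (hypothetical name):
#
#     with open("fsrecover.log") as f:
#         print(tally_errors(f))
```

Note that "Error" also matches every POSKeyError line, so the first
count is a grand total rather than a separate category.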

This is very odd considering that
we have not noticed any problems at all from this
zope installation!

Oh yeah - specs:
Zope 2.5.1
Python 2.1.3 built from source w/ largefile support
ZEO
RedHat Linux, kernel 2.4.7-10 SMP

The ZODB is over 2 GB, but the large size is not the
problem - it was up to 2.5 GB at one point, and
we haven't changed Python or the kernel since then,
so it should be OK.
(double-checking ... yes, test_largefile.py runs fine.)

The really fun thing about POSKeyErrors in particular is that
I know of no way to find out which Zope id a broken oid once
belonged to. Is there any way?

I tried starting Zope with the recovered Data.fs and discovered
that it seems "fine", except that most of our recent work is gone. :(

I tried mounting a copy of the backup with another Zope instance,
exporting the recent stuff, and importing it into the
Zope running the "repaired" Data.fs, but every time I tried
to do that I got POSKeyErrors. (!)
The repaired Data.fs passes fstest.py with no errors, so I
think somehow the exports from the backup bring the
POSKeyErrors with them. It's like a virus!!!

(Well, not really. The import just fails... the repaired
Data.fs still passes fstest.py.)

At this point I think I'm going to go back to the backup, errors
and all... it may have tons of errors in it, but it has all of
our work and seems to behave OK. :-\


Hmmmmm, I notice DirectoryStorage is now at 1.0.0. Hmmmmmmmmmmmmm.

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's FILTHY  MIME!
(courtesy of isometric.spaceninja.com)