[ZODB-Dev] Recovering Corrupt Database

Cody Smith smithcc@uclink.berkeley.edu
Wed, 30 Oct 2002 02:53:32 -0800


We're running Zope 2.3.2 and haven't really had any problems with the back
end (which is my business, not the marketing team's), so I haven't had to
fret over it lately.  Sadly, our Zope server recently suffered some disk
problems, and we lost a chunk of data in the _middle_ of our Data.fs; now
Zope refuses to start.  We get this error message:

zope# ./start
2002-10-30T10:37:40 PANIC(300) z2 Startup exception
Traceback (innermost last):
  File /opt1/Zope-2.3.2-src/z2.py, line 566, in ?
  File <string>, line 1, in ?
  File /opt/Zope-2.3.2-src/lib/python/Zope/__init__.py, line 110, in ?
  File ../lib/python/ZODB/FileStorage.py, line 308, in __init__
    (Object: /opt1/Zope-2.3.2-src/var/Data.fs)
  File ../lib/python/ZODB/FileStorage.py, line 1674, in read_index
  File ../lib/python/ZODB/FileStorage.py, line 219, in panic
CorruptedTransactionError: /opt1/Zope-2.3.2-src/var/Data.fs data record
exceeds transaction record at 300605488L
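
The error above is raised from read_index while FileStorage walks the
transaction records in Data.fs.  Before editing any record by hand, a
read-only scan can show where the transaction framing first stops adding
up.  The sketch below is modern Python 3, not code for the Zope 2.3.2
tree.  The layout it assumes is taken from FileStorage's general design
and has not been verified against 2.3.2: a 4-byte magic header, then
transactions made of a 23-byte header (8-byte tid, 8-byte big-endian
length, 1-byte status, three 2-byte lengths), a body, and an 8-byte
redundant copy of the length.  The name first_bad_transaction is
hypothetical.  It checks only the transaction framing, not the data
records inside each transaction, so run it on a copy of Data.fs.

```python
import struct

# ASSUMED FileStorage transaction header layout (not verified for 2.3.2):
# tid(8s), length(Q, big-endian, measured from the start of the header),
# status(c), user/description/extension lengths (H each) -> 23 bytes.
TRANS_HDR = ">8sQcHHH"
TRANS_HDR_LEN = struct.calcsize(TRANS_HDR)


def first_bad_transaction(path, magic_len=4):
    """Return the offset of the first transaction whose framing looks
    inconsistent, or None if every transaction checks out."""
    with open(path, "rb") as f:  # read-only: never modifies the file
        f.seek(0, 2)
        size = f.tell()
        pos = magic_len  # skip the (assumed) 4-byte magic header
        while pos < size:
            f.seek(pos)
            header = f.read(TRANS_HDR_LEN)
            if len(header) < TRANS_HDR_LEN:
                return pos  # header cut off by end of file
            _tid, tlen, _status, ulen, dlen, elen = struct.unpack(TRANS_HDR, header)
            tend = pos + tlen
            # Length too small for its own header, or pointing past EOF.
            if tlen < TRANS_HDR_LEN + ulen + dlen + elen or tend + 8 > size:
                return pos
            f.seek(tend)
            (redundant,) = struct.unpack(">Q", f.read(8))
            if redundant != tlen:
                return pos  # redundant length copy disagrees with header
            pos = tend + 8  # next transaction starts after the copy
    return None
```

If the scan reports an offset near 300605488, the damage reaches the
transaction framing; if it returns None, the problem is more likely
confined to a data record inside an otherwise well-framed transaction.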

At this point, I find the fsrecover.py script and run it.  Unfortunately,
like the start script, it fails:

zope# python lib/python/ZODB/fsrecover.py var/Data.fs
Traceback (innermost last):
  File "lib/python/ZODB/fsrecover.py", line 94, in ?
    FileStorage.recover(sys.argv[1])
  File "lib/python/ZODB/FileStorage.py", line 1542, in recover
    file.truncate(npos)
NameError: npos

Now I'm forced to look at the code myself to see why this is happening,
and I notice that the last few lines of FileStorage.recover() before the
error look like this:

    if pos < sz:
        npos = shift_transactions_forward(
            index, vindex, tindex, file, pos, opos,
            )

    file.truncate(npos)

Now it makes sense: if pos >= sz, npos is never assigned.  So I add a
debugging line and find that pos = 406821544 and sz = 406821544.  I then
add an else clause that sets npos=opos (or was it pos?  I don't exactly
remember).  That takes 100MB off the size of Data.fs and rolls my database
back to summer 2001, angering the marketing team, who insist that my job
isn't done.  Now I'm stuck, but based on the first error message, it seems
to me that I could probably fix the problem by correcting the broken
transaction record, the broken data record, or both (I'm not sure which
one is actually broken).  Would this help?  Is there anything I can do to
recover the database?
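
The else clause described above amounts to binding npos on both paths
before truncating.  A minimal Python 3 sketch follows.  truncation_point
is a hypothetical name, shift stands in for shift_transactions_forward
(whose real signature takes more arguments), and the comment on the else
branch assumes opos marks the end of the last intact data, which has not
been confirmed against the 2.3.2 source.

```python
def truncation_point(pos, sz, opos, shift):
    """Return the offset to pass to file.truncate().

    Hypothetical stand-in for the tail of FileStorage.recover();
    `shift` plays the role of shift_transactions_forward.
    """
    if pos < sz:
        # Intact transactions follow the damage: move them down to opos
        # and truncate after the last one moved.
        npos = shift(pos, opos)
    else:
        # pos == sz: nothing left to shift, so (assuming opos marks the
        # end of the last intact data) truncate at opos.
        npos = opos
    return npos
```

With pos == sz == 406821544, the else branch returns opos.  If opos was
the offset from the first error, 300605488 (an assumption), truncating
there discards 406821544 - 300605488 = 106,216,056 bytes, about 101 MiB,
which is in line with the roughly 100MB that disappeared.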

Thank you _so_ much.  If I don't fix this, the marketing team will
literally have lost a year's worth of work, and no one will be happy.

Cody