[Zodb-checkins] SVN: ZODB/trunk/ Merge rev 27169 from 3.3 branch.

Tim Peters tim.one at comcast.net
Tue Aug 17 19:28:24 EDT 2004

Log message for revision 27170:
  Merge rev 27169 from 3.3 branch.
  Forward port from Zope 2.7 branch.
  Many changes to fsrefs.py, based on problems I hit using it in real life.

  U   ZODB/trunk/NEWS.txt
  U   ZODB/trunk/src/ZODB/FileStorage/fsdump.py
  U   ZODB/trunk/src/scripts/fsrefs.py

Modified: ZODB/trunk/NEWS.txt
--- ZODB/trunk/NEWS.txt	2004-08-17 23:23:16 UTC (rev 27169)
+++ ZODB/trunk/NEWS.txt	2004-08-17 23:28:23 UTC (rev 27170)
@@ -12,7 +12,21 @@
 octal 9.  Or was it meant to be decimal 11?  Or was it meant to be hex?
 Now it displays as 0x11.
+    When run with -v, produced tracebacks for objects whose creation was
+    merely undone.  This was confusing.  Tracebacks are now produced only
+    if there's "a real" problem loading an oid.
+    If the current revision of object O refers to an object P whose
+    creation has been undone, this is now identified as a distinct case.
+    Captured and ignored most attempts to stop it via Ctrl+C.  Repaired.
+    Now makes two passes, so that an accurate report can be given of all
+    invalid references.
 What's new in ZODB3 3.3 beta 2
 Release date: 13-Aug-2004
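
Editorial note: the two-pass behavior described in the NEWS entry above can be sketched as follows. This is a minimal illustration in modern Python, not the fsrefs.py code. The in-memory `db` dict, the `UNDONE` and `BROKEN` sentinels, `UndoneError` (a stand-in for POSKeyError), and the `load` and `check_refs` helpers are all hypothetical names invented for the example.

```python
# Hypothetical in-memory stand-in for a storage: oid -> list of referenced
# oids, or one of two sentinels marking a record that cannot be loaded.
UNDONE = object()   # creation of the object was undone
BROKEN = object()   # record exists but fails to load for another reason


class UndoneError(KeyError):
    """Stand-in for POSKeyError in this sketch."""


def load(db, oid):
    rec = db[oid]
    if rec is UNDONE:
        raise UndoneError(oid)
    if rec is BROKEN:
        raise ValueError("cannot load %r" % (oid,))
    return rec


def check_refs(db):
    # Pass 1: classify every oid in the index before looking at references.
    undone, noload = set(), set()
    for oid in db:
        try:
            load(db, oid)
        except UndoneError:
            undone.add(oid)
        except Exception:
            # KeyboardInterrupt and SystemExit do not derive from Exception
            # in Python 3, so Ctrl+C still propagates here.
            noload.add(oid)

    # Pass 2: every loadable object is checked against the complete sets,
    # so the order in which oids are visited no longer matters.
    problems = {}
    for oid in db:
        if oid in undone or oid in noload:
            continue
        missing = []
        for ref in load(db, oid):
            if ref not in db:
                missing.append((ref, "missing"))
            if ref in noload:
                missing.append((ref, "failed to load"))
            if ref in undone:
                missing.append((ref, "object creation was undone"))
        if missing:
            problems[oid] = missing
    return problems
```

In this sketch an object that is visited before the unloadable objects it refers to still gets a complete report, which is the situation a single pass could not handle.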

Modified: ZODB/trunk/src/ZODB/FileStorage/fsdump.py
--- ZODB/trunk/src/ZODB/FileStorage/fsdump.py	2004-08-17 23:23:16 UTC (rev 27169)
+++ ZODB/trunk/src/ZODB/FileStorage/fsdump.py	2004-08-17 23:28:23 UTC (rev 27170)
@@ -13,13 +13,14 @@
 def get_pickle_metadata(data):
     # ZODB's data records contain two pickles.  The first is the class
-    # of the object, the second is the object.
-    if data.startswith('(c'):
+    # of the object, the second is the object.  We're only trying to
+    # pick apart the first here, to extract the module and class names.
+    if data.startswith('(c'):   # pickle MARK GLOBAL sequence
         # Don't actually unpickle a class, because it will attempt to
         # load the class.  Just break open the pickle and get the
         # module and class from it.
         modname, classname, rest = data.split('\n', 2)
-        modname = modname[2:]
+        modname = modname[2:]   # strip leading '(c'
         return modname, classname
     f = StringIO(data)
     u = Unpickler(f)
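
Editorial note: the '(c' shortcut commented in the hunk above can be illustrated in isolation. The sketch below is written for Python 3 bytes rather than the Python 2 str used in fsdump.py. The function name `split_global_header` and the sample record are hypothetical, and the sketch covers only the fast path, not the Unpickler fallback that follows it.

```python
def split_global_header(data):
    """Return (module, class) if data starts with MARK GLOBAL, else None.

    In pickle protocol 0, '(' is the MARK opcode and 'c' is GLOBAL, which
    is followed by the module name and the class name, each terminated by
    a newline.  Nothing is unpickled, so no module is imported.
    """
    if not data.startswith(b'(c'):
        return None
    modname, classname, _rest = data.split(b'\n', 2)
    modname = modname[2:]   # strip leading '(c'
    return modname.decode('ascii'), classname.decode('ascii')


# Hypothetical record: class pickle header followed by arbitrary bytes.
sample = b'(cmypkg.mymod\nMyClass\nq\x01Ntq\x02.'
names = split_global_header(sample)
```

Here `names` is the pair of strings taken from the header, and any record that does not begin with '(c' yields None so the caller can fall back to a slower path.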

Modified: ZODB/trunk/src/scripts/fsrefs.py
--- ZODB/trunk/src/scripts/fsrefs.py	2004-08-17 23:23:16 UTC (rev 27169)
+++ ZODB/trunk/src/scripts/fsrefs.py	2004-08-17 23:28:23 UTC (rev 27170)
@@ -39,22 +39,20 @@
 then the traceback corresponding to the load failure is also displayed
 (this is the only effect of the -v flag).
-Two other kinds of errors are also detected, one strongly related to
-"failed to load", when an object O loads OK, and directly refers to a
-persistent object P but there's a problem with P:
+Three other kinds of errors are also detected, when an object O loads OK,
+and directly refers to a persistent object P but there's a problem with P:
  - If P doesn't exist in the database, a message saying so is displayed.
    The unsatisfiable reference to P is often called a "dangling
    reference"; P is called "missing" in the error output.
- - If it was earlier determined that P could not be loaded (but does exist
-   in the database), a message saying that O refers to an object that can't
-   be loaded is displayed.  Note that fsrefs only makes one pass over the
-   database, so if an object O refers to an unloadable object P, and O is
-   seen by fsrefs before P, an "O refers to the unloadable P" message will
-   not be produced; a message saying that P can't be loaded will be
-   produced when fsrefs later tries to load P, though.
+ - If the current state of the database is such that P's creation has
+   been undone, then P can't be loaded either.  This is also a kind of
+   dangling reference, but is identified as "object creation was undone".
+ - If P can't be loaded (but does exist in the database), a message saying
+   that O refers to an object that can't be loaded is displayed.
 fsrefs also (indirectly) checks that the .index file is sane, because
 fsrefs uses the index to get its idea of what constitutes "all the objects
 in the database".
@@ -65,28 +63,52 @@
 in non-current revisions.
-from ZODB.FileStorage import FileStorage
-from ZODB.TimeStamp import TimeStamp
-from ZODB.utils import u64
-from ZODB.FileStorage.fsdump import get_pickle_metadata
 import cPickle
 import cStringIO
 import traceback
 import types
+from ZODB.FileStorage import FileStorage
+from ZODB.TimeStamp import TimeStamp
+from ZODB.utils import u64, oid_repr
+from ZODB.FileStorage.fsdump import get_pickle_metadata
+from ZODB.POSException import POSKeyError
+# So full of undocumented magic it's hard to fathom.
+# The existence of cPickle.noload() isn't documented, and what it
+# does isn't documented either.  In general it unpickles, but doesn't
+# actually build any objects of user-defined classes.  Despite that
+# persistent_load is documented to be a callable, there's an
+# undocumented gimmick where if it's actually a list, for a PERSID or
+# BINPERSID opcode cPickle just appends "the persistent id" to that list.
+# Also despite that "a persistent id" is documented to be a string,
+# ZODB persistent ids are actually (often? always?) tuples, most often
+# of the form
+#     (oid, (module_name, class_name))
+# So the effect of the following is to dig into the object pickle, and
+# return a list of the persistent ids found (which are usually nested
+# tuples), without actually loading any modules or classes.
+# Note that pickle.py doesn't support any of this, it's undocumented code
+# only in cPickle.c.
 def get_refs(pickle):
-    refs = []
+    # The pickle is in two parts.  First there's the class of the object,
+    # needed to build a ghost.  See get_pickle_metadata for how complicated
+    # this can get.  The second part is the state of the object.  We want
+    # to find all the persistent references within both parts (although I
+    # expect they can only appear in the second part).
     f = cStringIO.StringIO(pickle)
     u = cPickle.Unpickler(f)
-    u.persistent_load = refs
-    u.noload()
-    u.noload()
+    u.persistent_load = refs = []
+    u.noload() # class info
+    u.noload() # instance state info
     return refs
-def report(oid, data, serial, fs, missing):
+# There's a problem with oid.  'data' is its pickle, and 'serial' its
+# serial number.  'missing' is a list of (oid, class, reason) triples,
+# explaining what the problem(s) is(are).
+def report(oid, data, serial, missing):
     from_mod, from_class = get_pickle_metadata(data)
     if len(missing) > 1:
         plural = "s"
@@ -101,28 +123,41 @@
             description = "%s.%s" % info
             description = str(info)
-        print "\toid %s %s: %r" % (hex(u64(oid)), reason, description)
+        print "\toid %s %s: %r" % (oid_repr(oid), reason, description)
 def main(path):
     fs = FileStorage(path, read_only=1)
+    # Set of oids in the index that failed to load due to POSKeyError.
+    # This is what happens if undo is applied to the transaction creating
+    # the object (the oid is still in the index, but its current data
+    # record has a backpointer of 0, and POSKeyError is raised then
+    # because of that backpointer).
+    undone = {}
+    # Set of oids that were present in the index but failed to load.
+    # This does not include oids in undone.
     noload = {}
     for oid in fs._index.keys():
             data, serial = fs.load(oid, "")
+        except (KeyboardInterrupt, SystemExit):
+            raise
+        except POSKeyError:
+            undone[oid] = 1
-            print "oid %s failed to load" % hex(u64(oid))
             if VERBOSE:
             noload[oid] = 1
-            # If we get here after we've already loaded objects
-            # that refer to this one, we will not have gotten error reports
-            # from the latter about the current object being unloadable.
-            # We could fix this by making two passes over the storage, but
-            # that seems like overkill.
+    inactive = noload.copy()
+    inactive.update(undone)
+    for oid in fs._index.keys():
+        if oid in inactive:
+        data, serial = fs.load(oid, "")
         refs = get_refs(data)
         missing = [] # contains 3-tuples of oid, klass-metadata, reason
         for info in refs:
@@ -132,12 +167,14 @@
                 # failed to unpack
                 ref = info
                 klass = '<unknown>'
-            if not fs._index.has_key(ref):
+            if ref not in fs._index:
                 missing.append((ref, klass, "missing"))
-            if noload.has_key(ref):
+            if ref in noload:
                 missing.append((ref, klass, "failed to load"))
+            if ref in undone:
+                missing.append((ref, klass, "object creation was undone"))
         if missing:
-            report(oid, data, serial, fs, missing)
+            report(oid, data, serial, missing)
 if __name__ == "__main__":
     import sys
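
Editorial note: get_refs() above relies on cPickle.noload() and on assigning a list to persistent_load, neither of which exists in Python 3's standard pickle module. As a rough analogue only, persistent ids can be collected with the documented persistent_id / persistent_load hooks. Unlike noload(), this sketch really unpickles the state, and it handles a single pickle rather than the two-part ZODB record. The `Ref`, `RefPickler`, `RefCollector`, `dump_state`, and `collect_refs` names are hypothetical.

```python
import io
import pickle


class Ref:
    """Hypothetical stand-in for a persistent object known by its oid."""

    def __init__(self, oid, module_name, class_name):
        self.oid = oid
        self.module_name = module_name
        self.class_name = class_name


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent id shaped like (oid, (module_name, class_name))
        # instead of pickling the referenced object itself.
        if isinstance(obj, Ref):
            return (obj.oid, (obj.module_name, obj.class_name))
        return None


class RefCollector(pickle.Unpickler):
    def __init__(self, file):
        super().__init__(file)
        self.refs = []

    def persistent_load(self, pid):
        # Record the id; the referenced object is never loaded.
        self.refs.append(pid)
        return None


def dump_state(state):
    f = io.BytesIO()
    RefPickler(f).dump(state)
    return f.getvalue()


def collect_refs(data):
    u = RefCollector(io.BytesIO(data))
    u.load()
    return u.refs
```

For example, `collect_refs(dump_state({"child": Ref(b"\x00" * 7 + b"\x01", "mypkg", "Child")}))` gives a one-element list holding the nested tuple for that reference.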
