[Zodb-checkins] CVS: ZODB3/Doc/guide - modules.tex:1.6

Tim Peters tim.one at comcast.net
Thu May 1 16:51:43 EDT 2003


Update of /cvs-repository/ZODB3/Doc/guide
In directory cvs.zope.org:/tmp/cvs-serv2352/Doc/guide

Modified Files:
	modules.tex 
Log Message:
Added a new subsection on total-ordering requirements.


=== ZODB3/Doc/guide/modules.tex 1.5 => 1.6 ===
--- ZODB3/Doc/guide/modules.tex:1.5	Thu May  1 12:24:06 2003
+++ ZODB3/Doc/guide/modules.tex	Thu May  1 15:51:43 2003
@@ -1,8 +1,8 @@
-
 % Related Modules
 %    PersistentMapping
 %    PersistentList
 %    BTrees
+%       Total Ordering and Persistence
 
 \section{Related Modules}
 
@@ -149,3 +149,168 @@
 also defines \function{weightedIntersection()} and
 \function{weighterUnion()}.  The function doc strings describe each
 function briefly.
+
+\subsubsection{Total Ordering and Persistence}
+
+The BTree-based data structures differ from Python dicts in several
+fundamental ways.  One of the most important is that while dicts
+require that keys support hash codes and equality comparison,
+the btree-based structures don't use hash codes and require a total
+ordering on keys.
+
+Total ordering means three things:
+
+\begin{enumerate}
+\item  Reflexive.  For each \var{x}, \code{\var{x} == \var{x}} is true.
+
+\item  Trichotomy.  For each \var{x} and \var{y}, exactly one of
+       \code{\var{x} < \var{y}}, \code{\var{x} == \var{y}}, and
+       \code{\var{x} > \var{y}} is true.
+
+\item  Transitivity.  Whenever \code{\var{x} <= \var{y}} and
+       \code{\var{y} <= \var{z}}, it's also true that
+       \code{\var{x} <= \var{z}}.
+end{enumerate}
+
+The default comparison functions for most objects that come with Python
+satisfy these rules, with some crucial cautions explained later.  Complex
+numbers are an example of an object whose default comparison function
+does not satisfy these rules:  complex numbers only support \code{==}
+and \code{!=} comparisons, and raise an exception if you try to compare
+them in any other way.  They don't satisfy the trichotomy rule, and must
+not be used as keys in btree-based data structures (although note that
+complex numbers can be used as keys in Python dicts, which do not require
+a total ordering).
+
+Examples of objects that are wholly safe to use as keys in btree-based
+structures include ints, longs, floats, 8-bit strings, Unicode strings,
+and tuples composed (possibly recursively) of objects of wholly safe
+types.
+
+It's important to realize that even if two types satisfy the
+rules on their own, mixing objects of those types may not.  For example,
+8-bit strings and Unicode strings both supply total orderings, but mixing
+the two loses trichotomy; e.g., \code{'x' < chr(255)} and
+\code{u'x' == 'x'}, but trying to compare \code{chr(255)} to
+\code{u'x'} raises an exception.  Partly for this reason (another is
+given later), it can be dangerous to use keys with multiple types in
+a single btree-based structure.  Don't try to do that, and you don't
+have to worry about it.
+
+Another potential problem is mutability:  when a key is inserted in a
+btree-based structure, it must retain the same order relative to the
+other keys over time.  This is easy to run afoul of if you use mutable
+objects as keys.  For example, lists supply a total ordering, and then
+
+\begin{verbatim}
+>>> L1, L2, L3 = [1], [2], [3]
+>>> from BTrees.OOBTree import OOSet
+>>> s = OOSet((L2, L3, L1))  # this is fine, so far
+>>> list(s.keys())           # note that the lists are in sorted order
+[[1], [2], [3]]
+>>> s.has_key([3])           # and [3] is in the set
+1
+>>> L2[0] = 5                # horrible -- the set is insane now
+>>> s.has_key([3])           # for example, it's insane this way
+0
+>>> s
+OOSet([[1], [5], [3]])
+>>>
+\end{verbatim}
+
+Key lookup relies on that the keys remain in sorted order (an efficient
+form of binary search is used).  By mutating key L2 after inserting it,
+we destroyed the invariant that the OOSet is sorted.  As a result, all
+future operations on this set are unpredictable.
+
+A subtler variant of this problem arises due to persistence:  by default,
+Python does several kinds of comparison by comparing the memory
+addresses of two objects.  Because Python never moves an object in memory,
+this does supply a usable (albeit arbitrary) total ordering across the
+life of a program run (an object's memory address doesn't change).  But
+if objects compared in this way are used as keys of a btree-based
+structure that's stored in a database, when the objects are loaded from
+the database again they will almost certainly wind up at different
+memory addresses.  There's no guarantee then that if key K1 had a memory
+address smaller than the memory address of key K2 at the time K1 and
+K2 were inserted in a BTree, K1's address will also be smaller than
+K2's when that BTree is loaded from a database later.  The result will
+be an insane BTree, where various operations do and don't work as
+expected, seemingly at random.
+
+Now each of the types identified above as "wholly safe to use" never
+compares two instances of that type by memory address, so there's
+nothing to worry about here if you use keys of those types.  The most
+common mistake is to use keys that are instances of a user-defined class
+that doesn't supply its own \method{__cmp__()} method.  Python compares
+such instances by memory address.  This is fine if such instances are
+used as keys in temporary btree-based structures used only in a single
+program run.  It can be disastrous if that btree-based structure is
+stored to a database, though.
+
+\begin{verbatim}
+>>> class C:
+...     pass
+...
+>>> a, b = C(), C()
+>>> print a < b   # this may print 0 if you try it
+1
+>>> del a, b
+>>> a, b = C(), C()
+>>> print a < b   # and this may print 0 or 1
+0
+>>>
+\end{verbatim}
+
+That example illustrates that comparison of instances of classes that
+don't define \method{__cmp__()} yields arbitrary results (but consistent
+results within a single program run).
+
+Another problem occurs with instances of classes that do define
+\method{__cmp__()}, but define it incorrectly.  It's possible but
+rare for a custom \method\{__cmp__()} implementation to violate one
+of the three required formal properties directly.  It's more common for
+it to fall back" to address-based comparison by mistake.
+For example,
+
+\begin{verbatim}
+class Mine:
+    def __cmp__(self, other):
+        if other.__class__ is Mine:
+            return cmp(self.data, other.data)
+        else:
+            return cmp(self.data, other)
+\end{verbatim}
+
+It's quite possible there that the \keyword{else} clause allows
+a result to be computed based on memory address.  The bug won't show
+up until a btree-based structure uses objects of class \class{Mine} as
+keys, and also objects of other types as keys, and the structure is
+loaded from a database, and a sequence of comparisons happens to execute
+the \keyword{else} clause in a case where the relative order of object
+memory addresses happened to change.
+
+This is as difficult to track down as it sounds, so best to stay far
+away from the possibility.
+
+You'll stay out of trouble by follwing these rules, violating them
+only with great care:
+
+\begin{enumerate}
+\item  Use objects of simple immutable types as keys in
+       btree-based data structures.
+
+\item  Within a single btree-based data structure, use objects of
+       a single type as keys.  Don't use multiple key types in a
+       single structure.
+
+\item  If you want to use class instances as keys, and there's
+       any possibility that the structure may be stored in a
+       database, it's crucial that the class define a
+       \method{__cmp__()} method, and that the method is
+       carefully implemented.
+
+       Any part of a comparison implementation that relies (explicitly
+       or implicitly) on an address-based comparison result will
+       eventually cause serious failure.
+end{enumerate}




More information about the Zodb-checkins mailing list