[Zope-Coders] Analysis: BTrees and Unicode and Python
Andreas Jung <andreas@zope.com>
Fri, 19 Oct 2001 12:08:26 -0400
----- Original Message -----
From: "Guido van Rossum" <guido@python.org>
To: "Andreas Jung" <andreas@zope.com>
Cc: "Jim Fulton" <Jim@zope.com>; <zope-coders@zope.org>
Sent: Friday, October 19, 2001 11:52
Subject: Re: [Zope-Coders] Analysis: BTrees and Unicode and Python
> > - one of these earlier comparisons checks a Python string (containing
> > an accented character) against a unicode string and raises a
> > unicode exception (ASCII decoding error: ordinal not in range(128)).
> > I assume because the default encoding is ascii.
>
> Note that this was a conscious design decision. Not all the world
> uses Latin-1, and many real-world programs and data use different
> interpretations of 8-bit characters with the high bit set. Assuming
> Latin-1 when comparing to Unicode would be wrong.
I assume the exception is raised before calling the PyUnicode_Compare
function. Silently ignoring this error condition would not be a solution
either, so I agree that Python's behaviour is reasonable :)
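
To illustrate the point about 8-bit characters, here is a minimal sketch in
present-day Python 3 syntax (not the 2.1.1 under discussion). The explicit
decode() call stands in for the implicit ASCII coercion that happens when an
8-bit string is compared with a unicode string; the helper try_decode and the
sample bytes are made up for this example.

```python
# The byte 0xE9 has no meaning in ASCII; what it means depends on the encoding.
raw = b"caf\xe9"  # "cafe" with an accented e, if read as Latin-1


def try_decode(data, encoding):
    """Hypothetical helper: decode the bytes or report why decoding failed."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


ascii_result = try_decode(raw, "ascii")     # fails: ordinal not in range(128)
latin1_result = try_decode(raw, "latin-1")  # accented e
koi8_result = try_decode(raw, "koi8_r")     # a Cyrillic letter instead

print(ascii_result)
print(latin1_result == koi8_result)
```

The same byte gives two different characters under two common 8-bit
encodings, which is why guessing Latin-1 would be wrong for part of the data.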
>
> > - the BTree code does not check for an exception after
> > PyObject_Compare(), so this error never got cleared
>
> This should be fixed before proceeding.
Yup!
>
> > - when trying to compare two identical unicode strings, Python
> > calls default_3way_compare() and runs into the following code:
> >
> >
> > static int
> > default_3way_compare(PyObject *v, PyObject *w)
> > {
> >     int c;
> >     char *vname, *wname;
> >
> >     if (v->ob_type == w->ob_type) {
> >         /* When comparing these pointers, they must be cast to
> >          * integer types (i.e. Py_uintptr_t, our spelling of C9X's
> >          * uintptr_t). ANSI specifies that pointer compares other
> >          * than == and != to non-related structures are undefined.
> >          */
> >         Py_uintptr_t vv = (Py_uintptr_t)v;
> >         Py_uintptr_t ww = (Py_uintptr_t)w;
> >         puts("\t\t\tdefcmp 1");
> >         return (vv < ww) ? -1 : (vv > ww) ? 1 : 0;
> >     }
> >
> > This code returns -1 for the two identical unicode strings.
> >
> > I am not sure whether this code is able to compare two unicode strings.
> > On the other hand, it is still strange that the unittest works when the
> > same unicode string in the unittest's list of test data is replaced
> > with self.s, as described earlier.
> >
> > Any ideas about that?
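
As a rough model of the same-type branch quoted above, the following Python 3
sketch orders two objects by address only. It assumes CPython, where id()
returns the object's address (an implementation detail); address_compare is a
hypothetical stand-in and not the interpreter's own code.

```python
def address_compare(v, w):
    """Hypothetical stand-in: order two objects by address, ignoring value."""
    vv, ww = id(v), id(w)  # assumption: CPython, where id() is the address
    return -1 if vv < ww else (1 if vv > ww else 0)


# Two strings with the same value, built at run time as separate objects.
a = "".join(["caf", "\u00e9"])
b = "".join(["caf", "\u00e9"])

same_value = (a == b)                 # equal by value
by_address = address_compare(a, b)    # nonzero: two different objects
self_compare = address_compare(a, a)  # 0: the very same object

print(same_value, by_address, self_compare)
```

A comparison of this kind can only return 0 when both arguments are the very
same object. That would be consistent with the test passing when self.s is
reused, though this is a guess and not something verified here.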
>
> It is definitely a bug if comparison of two unicode strings ends up
> calling default_3way_compare()!
>
> This normally doesn't happen though -- the Unicode object's comparison
> code is generally called.
>
> I'd like to see what's on the stack when default_3way_compare is
> called with two Unicode objects.
How can I determine that?
> Which Python version is this? 2.1 or 2.1.1?
2.1.1
Andreas