[Zope-CMF] Re: cmfuid

Sun Nov 21 01:05:20 EST 2004

On Sat, 2004-11-20 at 23:41 -0500, Tres Seaver wrote:
> > I could use the object's physical path for this purpose or some other
> > attribute with business meaning, but for performance reasons, it's very
> > undesirable to need to "fix up" the disk files when these (invariably)
> > change.
> 
> You don't need to, if you store the hash key used to locate the file on 
> the object itself.

Yup, that's what I'm doing.
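Here is a minimal sketch of that arrangement. It assumes a random key minted once at creation time and stored on the object, plus a hashed two-level directory fan-out on disk. The names `make_file_key`, `path_for_key` and `Content` are hypothetical, not CMF or Zope APIs.

```python
import hashlib
import os
import uuid


def make_file_key():
    # Hypothetical: a random key assigned once, when the object is created.
    # It carries no business meaning, so renames and moves never change it.
    return uuid.uuid4().hex


def path_for_key(root, key, depth=2, width=2):
    # Spread files over nested directories derived from a hash of the key,
    # so no single directory accumulates too many entries.
    digest = hashlib.sha1(key.encode("ascii")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *parts, key)


class Content(object):
    # Hypothetical stand-in for a ZODB content object.
    def __init__(self, title):
        self.title = title
        self.file_key = make_file_key()  # stored on the object itself
```

Because the path is derived only from `file_key`, changing the object's title or physical path leaves the disk file where it is.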

>   You should then plan to index them, in order to 
> traverse back from the file to the ZODB object. ;)

I hope I never have to do that.

> BTW, the performance hit of moving an inode from one directory to 
> another within the same filesystem is inconsequential when compared to 
> the other costs you are going to incur when you move placeful content.

You're right, sorry; it's not a performance thing.  It's an "I just don't
want to write that code" thing.  I'm trying to tie the writing of files
to transactions, and that seems hard enough as it is without needing to
pseudo-transactionally move stuff too.  Although maybe that'd be simpler
in the end; I don't know.
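One common way to tie a file write to a transaction is to stage the bytes in a temp file in the target directory, then rename it into place on commit or delete it on abort. Since the temp file sits on the same filesystem, the rename is the cheap inode move Tres mentions. The sketch below assumes that approach. `PendingFileWrite` is hypothetical and is not wired into ZODB's transaction machinery; a real version would be registered as a data manager and have `commit()`/`abort()` driven by the two-phase-commit hooks.

```python
import os
import tempfile


class PendingFileWrite(object):
    """Hypothetical: stage a file write, then commit or abort it."""

    def __init__(self, final_path, data):
        self.final_path = final_path
        directory = os.path.dirname(final_path)
        os.makedirs(directory, exist_ok=True)
        # Temp file in the same directory => same filesystem => cheap rename.
        fd, self.temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the staged bytes durable first

    def commit(self):
        # Atomically replaces any existing file (POSIX, same filesystem).
        os.replace(self.temp_path, self.final_path)

    def abort(self):
        # Discard the staged bytes; the final path is never touched.
        if os.path.exists(self.temp_path):
            os.remove(self.temp_path)
```

A "move" of an existing file could be staged the same way, with `os.replace` from the old path done in `commit()`.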

> > I'm not sure there's much of a difference in cases like mine.  UUIDs
> > happen to claim to be "globally unique" but I really don't care whether
> > they are or not for my particular use; I just want them to be unique to
> > this Zope instance.  Any strategy that gets rid of the database write is
> > cool with me.
> 
> You are worried about adding a single entry in a btree-bucket when 
> creating all the other crap associated with a new content object in the 
> ZODB?  Really?

No, I'm worried about generating actual unique ids without read
conflicts causing the request to be retried.  (FWIW, there are no BTrees
around in CMFUid, just a Length object.)
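For comparison, here are two ways to get ids without any ZODB write. Both are illustrative assumptions, not CMFUid code. `random_uid` is purely probabilistic. `InstanceUidGenerator` is a hypothetical per-process scheme: a prefix that identifies the process incarnation plus a lock-guarded counter. Under ZEO, several client processes share one database, which is why the per-process prefix matters.

```python
import itertools
import os
import threading
import time
import uuid


def random_uid():
    # Probabilistic: 122 random bits, no coordination and no database write.
    return uuid.uuid4().hex


class InstanceUidGenerator(object):
    """Hypothetical: unique per process without touching the ZODB.

    Uniqueness across restarts assumes the clock never goes backwards and
    a pid is not reused within the same second.
    """

    def __init__(self):
        # Prefix distinguishes process incarnations (pid + start time).
        self._prefix = "%x-%x" % (os.getpid(), int(time.time()))
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_uid(self):
        # The lock makes the counter step explicit across request threads.
        with self._lock:
            n = next(self._counter)
        return "%s-%x" % (self._prefix, n)
```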

> > FWIW, the current implementation of CMFUid depends entirely on conflicts
> > to assure that no two threads get the same uid at the same instant. 
> > This is broken, at least under 2.7 (no MVCC) inasmuch as it depends on
> > BTrees.Length, which ignores read conflicts by design.  The current
> > implementation *will* dole out the same UID to two simultaneous
> callers.  It's possible to subclass Length to make it respect read
> conflicts and thus generate unique ids, but this will be a
> > performance-eating hotspot under 2.7 and perhaps under 2.8 depending on
> > how efficient MVCC is.
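To make the duplicate-uid failure concrete, here is a plain-Python simulation with no ZODB involved. It assumes the merge rule used by `BTrees.Length._p_resolveConflict`, which returns `committed + mine - old` so that both increments survive. The function names are hypothetical. The counter ends up correct, but both callers were already handed the same uid.

```python
def resolve_length_conflict(old, committed, mine):
    # Merge rule of BTrees.Length: apply both transactions' deltas.
    return committed + mine - old


def simulate_two_simultaneous_callers(start):
    # Both transactions read the same committed counter value...
    seen_by_a = start
    seen_by_b = start
    # ...and each hands out "value + 1" as its new uid.
    uid_a = seen_by_a + 1
    uid_b = seen_by_b + 1
    # A commits first; B's write conflicts and is resolved, not retried.
    committed = uid_a
    final = resolve_length_conflict(start, committed, uid_b)
    return uid_a, uid_b, final


uid_a, uid_b, final = simulate_two_simultaneous_callers(5)
# uid_a == uid_b == 6, while the stored counter ends at 7
```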
> 
> That is a different problem.  The approach should probably do something 
> like the RID-generation stuff in the catalog, which doesn't rely on 
> write conflicts to avoid collisions.

Yup it is a different problem.
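Here is a rough sketch in the spirit of the catalog's rid generation, as I understand it. It is not the actual ZCatalog code. Each thread keeps its own cursor, starts at a random point in a large integer range, counts upward, and jumps to a new random point when the id is already taken. `RandomCursorIdAllocator` is hypothetical. A dict stands in for a shared persistent mapping such as an IOBTree, and `_v_next` stands in for a per-thread volatile attribute.

```python
import random


class RandomCursorIdAllocator(object):
    """Hypothetical sketch; `registry` stands in for a shared IOBTree."""

    LOW, HIGH = -2000000000, 2000000000

    def __init__(self, registry, rng=None):
        self.registry = registry
        self.rng = rng or random.Random()
        self._v_next = None  # per-thread cursor, never persisted

    def _jump(self):
        return self.rng.randint(self.LOW, self.HIGH)

    def allocate(self, value):
        uid = self._v_next if self._v_next is not None else self._jump()
        # A collision is not an error; it just means "jump elsewhere".
        while uid in self.registry:
            uid = self._jump()
        self.registry[uid] = value
        # Sequential ids after a jump keep BTree buckets dense.
        self._v_next = uid + 1 if uid < self.HIGH else None
        return uid
```

Two threads can still land on the same key, but the random starting points make that unlikely rather than routine. Presumably the rare case then surfaces as an ordinary write conflict on the shared mapping, not on every allocation.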

> I don't think you want a purely probabilistic / heuristic solution, but 
> I could be wrong (feel free to ding me for a beer, if so).

No, you're right.  I don't care one way or the other, but assuming I do
want unique ids, I'd like to get them without putting up with retried
requests.

I took a look at the rid generator and it scares me that I understand
it.  Out of curiosity, what is the main argument against probabilistic
uid generation?  It seems so much simpler.  Is that one of those
water-in-the-desert kind of mirages?
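The usual yardstick for probabilistic ids is the birthday bound. With n ids drawn uniformly from N = 2**bits possible values, P(collision) is approximately 1 - exp(-n(n-1)/(2N)). The helper below just evaluates that approximation; its name is hypothetical. It shows why short ids (say 32 bits) collide quickly while 122 random bits (a version 4 UUID) make collisions negligible. The remaining risk then shifts to the quality of the random source.

```python
import math


def collision_probability(n, bits):
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    space = 2.0 ** bits
    return -math.expm1(-n * (n - 1) / (2.0 * space))


# About 77,000 random 32-bit ids already give roughly even odds of a clash,
# while a million 122-bit ids stay around 1e-25.
p32 = collision_probability(77163, 32)
p122 = collision_probability(10 ** 6, 122)
```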

Thanks!

- C