[Zope-dev] RE: [ZC] 365/ 2 Reject "metadata in Catalog is space inefficient"

Jay, Dylan djay@avaya.com
Mon, 29 Apr 2002 09:41:01 +1000


> -----Original Message-----
> From: Chris Withers [mailto:chrisw@nipltd.com]
> Sent: Friday, 26 April 2002 11:18 PM
> To: tdickenson@geminidataloggers.com
> Cc: Jay, Dylan; zope-dev@zope.org
> Subject: Re: [Zope-dev] RE: [ZC] 365/ 2 Reject "metadata in Catalog is
> space inefficient"
> 
> 
> Toby Dickenson wrote:
> > 
> > Note that this scheme may not necessarily give runtime performance
> > benefits. Loading the reverse index data may not be any faster than
> > loading metadata.
> 
> I'm betting in a lot of cases it'll be a damn site slower.
> 
> MetaData is specifically designed to be real quick to load. 
> For the small extra space
> usage (how much _does_ disk space, or RAM for that matter, 
> cost nowadays?! ;-), I'm more
> than happy to take the speed win...

Fair enough, I hadn't considered the time cost of loading the reverse index.
However, let me put my suggestion a different way. We've identified that the
data shown in search results can live in up to three different places: the
object itself, the index data, or the catalog metadata. Each has a different
speed/space trade-off. Each also has a different API, e.g. (from memory, so
won't be entirely correct):

 getObjectForRID[object_id_].field

 getIndexDataForRID[object_id_].field

 field

Any of these might display the correct data for a given search, each with a
different trade-off. Perhaps there should be a way of making this trade-off
transparent, so that the report designer need not know about any
optimizations, e.g.

 <dtml-in Catalog>
  <dtml-var title>
 </dtml-in>

would work whether or not the title was set up as a metadata field. If it
wasn't, the catalog could load the object and look the value up there (with
the resulting time penalty). Then, just as in the RDBMS world, the indexes
(the Catalog) could be adjusted to make this operation more efficient without
changing any code (just add the metadata to the Catalog). The same applies in
the other direction: if an admin decided that the object was too big to load
and the overhead of repeating the field in metadata was too high, they could
choose to obtain the data from the FieldIndex data instead.
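
To make the idea concrete, here is a rough sketch in plain Python of what such
a transparent lookup might look like. It is not the real ZCatalog API: the
names ToyCatalog, LazyRecord, lookup and Doc are all made up for illustration,
and the three dictionaries merely stand in for the object store, the index
data and the metadata. The only point is the fallback order: cheapest source
first, the full object last.

```python
class Doc:
    """Hypothetical content object; stands in for anything being cataloged."""

    def __init__(self, title, author, body):
        self.title = title
        self.author = author
        self.body = body


class ToyCatalog:
    """Toy stand-in for a catalog. Not the real ZCatalog API."""

    def __init__(self, metadata_fields=(), indexed_fields=()):
        self.metadata_fields = set(metadata_fields)
        self.indexed_fields = set(indexed_fields)
        self.objects = {}     # rid -> full object (expensive to load in real life)
        self.metadata = {}    # rid -> {field: value}, copied at catalog time
        self.index_data = {}  # rid -> {field: value}, what the indexes stored

    def catalog_object(self, rid, obj):
        self.objects[rid] = obj
        self.metadata[rid] = {f: getattr(obj, f) for f in self.metadata_fields}
        self.index_data[rid] = {f: getattr(obj, f) for f in self.indexed_fields}


def lookup(catalog, rid, name):
    """Return (value, source), trying the cheapest source first."""
    meta = catalog.metadata.get(rid, {})
    if name in meta:
        return meta[name], "metadata"
    indexed = catalog.index_data.get(rid, {})
    if name in indexed:
        return indexed[name], "index"
    # Last resort: load the object itself and pay the time penalty.
    return getattr(catalog.objects[rid], name), "object"


class LazyRecord:
    """Search result whose attributes resolve through lookup()."""

    def __init__(self, catalog, rid):
        self._catalog = catalog
        self._rid = rid

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        value, _source = lookup(self._catalog, self._rid, name)
        return value


catalog = ToyCatalog(metadata_fields=["title"], indexed_fields=["author"])
catalog.catalog_object(1, Doc("Hello", "dylan", "body text"))
record = LazyRecord(catalog, 1)

# The report code is the same whichever place the value comes from.
for field in ("title", "author", "body"):
    print(field, getattr(record, field), lookup(catalog, 1, field)[1])
```

In this sketch, moving a field between the three places only changes the
arguments passed to ToyCatalog; the code that reads record.title stays the
same, which is the property the DTML example above is after.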