[Zope-dev] Multiformatted Interface (was Re:

Rik Hoekstra hoekstra@fsw.leidenuniv.nl
Wed, 13 Oct 1999 21:56:49 +0100


> (I moved this thread to zope-dev)

Whoops. I accidentally sent my reply to the Zope list. Here it is
again.

> 
> Toby Dickenson wrote:
> > 
> > On Mon, 11 Oct 1999 15:06:50 -0700, you wrote:
> > 
> > >I'm creating a brand new ZClass (called "PDFClass"). It's my first ZClass.
> > >I've just added the Common Instance Property Sheet (called "PDFProperties").
> > >Now I'm trying to add properties to that property sheet. I added one
> > >"string" property OK, but now when I add the "date" property "pub_date", I
> > >get this error:
> > >
> > >          Invalid Date-Time String
> > 
> > Are you planning to extract properties from PDF files? That's a task
> > on my to-do list too.
>  
> Let's start a discussion on this before any code gets written; I've been
> thinking a lot about document formats lately.
> 
> I'm working on a model and elaboration of what I call MFI, Multi-Format
> Interface, which will be a component of the Portal Toolkit and possibly
> a future core feature of Zope. Basically, this consolidates all of the
> document types, DTML, XML, PDF, what-have-you, into a subclassable
> interface that allows you to define pluggable format types.

That is a very good idea. Should it be able to guess what the 
document format is or will you have to indicate that by hand?
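
One way the "guess or indicate by hand" question could play out is sketched below: look at the first bytes of the document, and let a format given by hand always win. The function name `guess_format`, the format names and the handful of signatures checked are assumptions made up for this illustration; nothing here is part of MFI or Zope.

```python
def guess_format(data, declared=None):
    """Guess a format name for `data` (bytes).

    Hypothetical helper: a format indicated by hand (`declared`)
    always takes precedence over guessing.
    """
    if declared is not None:
        return declared

    # Only inspect the start of the document, ignoring leading whitespace.
    head = data[:1024].lstrip()

    # PDF files begin with a "%PDF-" header.
    if head.startswith(b"%PDF-"):
        return "pdf"

    # An XML declaration is a strong hint for XML.
    if head.startswith(b"<?xml"):
        return "xml"

    # A crude check for HTML: a doctype or an <html> tag near the top.
    lowered = head.lower()
    if lowered.startswith(b"<!doctype html") or b"<html" in lowered:
        return "html"

    # Anything else is treated as plain text in this sketch.
    return "text"
```

Guessing like this only works for formats with a recognizable signature; structured text, for example, would most likely still have to be indicated by hand.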

> This way,
> instead of making a whole new type of object (PDFDocument, whatever)
> you make a new plug-in format for MFI that all Documents can then select
> as their format type.  This is much more flexible, extendable, and
> 'philosophically' correct than the current method.  As an example, an
> HTML formatter could be made (that extracts meta-information from a
> document and perhaps builds a DOM tree if it's parsable well enough), a
> PDF formatter (can DOM be put on PDF?) 

I doubt it. Structured documents and page-oriented markup seem to
be rather different philosophies of representing a text document.
Turning PDF documents into HTML is no easy business either. But
then, some sectioning of even a PDF document (and even of Word
documents - sometimes :-) _is_ possible. Calling that a DOM is
stretching the DOM concept a bit too much, I think.


> a structured text (stx)
> formatter (we are elaborating a DOM interface for Stx).  The
> possibilities are endless, and it means that all of these document types
> in the add list can be reduced to one selection.
> 
> This is a pretty light description, but there are many other benefits
> I'm formalizing into a document right now.  What are your thoughts?
> 
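
To make the plug-in idea above a bit more concrete, here is a minimal sketch of a single Document class that selects among registered formats. Every name in it (`Format`, `register_format`, `HTMLFormat`, `PlainFormat`, `Document`) is hypothetical and chosen only for illustration; this is not the MFI design, which has not been published yet, and the title extraction is deliberately crude.

```python
import re


class Format:
    """Base class for a pluggable format (hypothetical interface)."""
    name = None

    def metadata(self, source):
        """Return a dict of meta-information extracted from `source`."""
        return {}


# Registry of available formats, keyed by format name.
_formats = {}


def register_format(format_class):
    """Class decorator: make one instance of the format selectable."""
    _formats[format_class.name] = format_class()
    return format_class


@register_format
class HTMLFormat(Format):
    name = "html"

    def metadata(self, source):
        # Crude title extraction; a real formatter would parse the document.
        match = re.search(r"<title>(.*?)</title>", source, re.I | re.S)
        return {"title": match.group(1).strip()} if match else {}


@register_format
class PlainFormat(Format):
    name = "text"


class Document:
    """One document type; its behaviour depends on the selected format."""

    def __init__(self, source, format_name="text"):
        self.source = source
        self.set_format(format_name)

    def set_format(self, format_name):
        if format_name not in _formats:
            raise KeyError("unknown format: %r" % format_name)
        self.format_name = format_name

    def metadata(self):
        # Delegate to whichever format plug-in is currently selected.
        return _formats[self.format_name].metadata(self.source)
```

Supporting a new format then means registering one more `Format` subclass, rather than adding another document type to the add list.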

What all documents do/could/should have (natively or added) are 
document properties (preferably conforming to the Dublin Core, for 
standardization). These should be extracted using DOM, or COM (for
Word documents), or a PDF parser for PDF documents, or whatever,
and added to the property sheet. And/or property sheets could be
filled in by hand through the Zope management interface.
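
As a sketch of how extracted properties and hand-entered ones could end up in one Dublin Core property set: the element names below are the fifteen Dublin Core elements, while the helper `build_properties` and the PDF key mapping `PDF_KEY_MAP` are assumptions invented for this example and are not an existing Zope or MFI API.

```python
# The fifteen Dublin Core element names.
DUBLIN_CORE = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Illustrative mapping from PDF document-info keys to Dublin Core
# names; which keys map where is an assumption of this sketch.
PDF_KEY_MAP = {
    "Title": "title",
    "Author": "creator",
    "CreationDate": "date",
}


def build_properties(extracted, key_map, manual=None):
    """Merge extracted metadata with values entered by hand.

    `extracted` uses the format's native keys; `manual` uses Dublin
    Core names and overrides anything that was extracted.
    """
    props = {}
    for native_key, value in extracted.items():
        dc_name = key_map.get(native_key)
        # Native keys without a Dublin Core equivalent are dropped.
        if dc_name in DUBLIN_CORE:
            props[dc_name] = value

    for name, value in (manual or {}).items():
        if name not in DUBLIN_CORE:
            raise ValueError("not a Dublin Core element: %r" % name)
        props[name] = value
    return props
```

For example, `build_properties({"Title": "Report", "Producer": "x"}, PDF_KEY_MAP, manual={"creator": "Rik"})` keeps the extracted title, drops the unmapped `Producer` key and adds the creator entered by hand.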

I take it that your proposal also leads to inclusion of documents in 
catalogs for fielded and fulltext searches? Yes, please? 

However, won't this be conceptually difficult beyond full-text
searches? The levels of access to the documents are so diverse.
Compare the structured DOM access to XML documents with the
(basically) mere word-level access to PDF (and even HTML) documents.

Oh well, just my own preoccupations, I guess. A very good idea,
Michel.

Rik
