[Zope-CMF] Dublin Core

Tres Seaver tseaver@palladion.com
Sun, 10 Jun 2001 12:00:51 -0400


Tim Lynch wrote:

> As someone who has been involved in designing systems that 
> incorporate support for Dublin Core for several years, and who
> works in a research library, I'd like to add a few comments
> about the use of DC.
> 
> First, DC grew out of the perceived need for a common subset
> of metadata elements drawn from _existing_ metadata schemas.
> For example, most academic libraries use an incredibly
> comprehensive and complex schema called MARC (http://www.loc.gov/marc/)
> to describe their holdings.  Folks who deal with geo-referenced
> datasets typically use the Federal Geographic Data Committee,
> FGDC, schema (http://www.fgdc.gov/). Many other schemas are
> in common use including the Global Information Locator Service
> (http://www.gils.net) and the newest kid on the block,
> the Open Archives Initiative (http://www.openarchives.org/).
> 
> If you look at any schema, however, you typically see a dozen
> or so elements that are common or "close enough" to be considered
> common for everyday use.  These dozen or so elements are what
> make up the Dublin Core.  The reason the DC elements appear to
> be "hideously underdescribed" is because corresponding elements
> are already elaborately described in other schemas.  The DC
> was never intended to serve as _the_ schema for anything. It's
> best interpreted as a core set of elements that you can always
> fall back to.  In this light, the required/optional status
> makes more sense: most DC elements are optional exactly because
> it is presumed your data is described by a metadata schema
> more appropriate than DC, that DC is simply the lcd and thus,
> given the nature of your data (and closely linked metadata
> schema) certain DC elements may or may not be present.
> 
> I, as a data provider, would never consider building a metadata
> schema around the Dublin Core. Rather, I'd build a schema that 
> works for my data and pay particular attention to making it 
> congruent with DC.


The CMF explicitly makes this distinction already:  the DublinCore
interface is a *query interface*, which must be supported by all
content;  the actual storage mechanisms are entirely optional.
The default content objects in the CMF happen to use the Dublin
Core attributes as their standard metadata;  other content classes
are free to implement the interface in their own fashion.
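
A minimal sketch of that separation, in Python.  The class names, the
two accessor methods, and the MARC-like field keys are illustrative
assumptions, not copied from the CMF source:

```python
class DefaultContent:
    """Stores Dublin Core values directly as attributes."""

    def __init__(self, title, creator):
        self.title = title
        self.creator = creator

    def Title(self):
        return self.title

    def Creator(self):
        return self.creator


class MarcBackedContent:
    """Keeps a simplified, hypothetical MARC-like record and answers
    the same Dublin Core queries by mapping onto it."""

    def __init__(self, record):
        # Example shape: {"245a": <title>, "100a": <main author>}
        self._record = record

    def Title(self):
        return self._record.get("245a", "")

    def Creator(self):
        return self._record.get("100a", "")


def summarize(content):
    # Callers depend only on the query methods, never on the storage.
    return f"{content.Title()} / {content.Creator()}"
```

Both classes can be passed to summarize() unchanged, which is the
point:  the query side is fixed, the storage side is not.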

>  Thus, my system could potentially negotiate
> an exchange along the lines of:
> 
> client:  Hello -- do you speak MARC?
> server:  No, but I speak FGDC.
> client:  No, that won't do.  How about DC?
> server:  Sure, let's talk DC.


We don't have *any* requirement for doing this kind of negotiation,
nor do the specs I know about even deal with the query mechanisms
for exchanging metadata (DC or otherwise) across heterogeneous systems.

We selected the DublinCore as the preferred metadata schema for the
CMF because (except for the Date field, anyway) it seemed an adequate
standard for general-purpose content sites.  Discoverability *within*
a given site was the driving goal;  syndication / interchange with
other sites was only a secondary or tertiary benefit of supporting
a common standard.  We were specifically starting from a point in which
there were *no* other reasonable standards available.

> So, DC is our lowest common denominator.  In the real world,
> a more extensive schema may well serve as a better lcd.


We ditched using 'Date' as a dependable field in the CMF, primarily
because its semantics were too weak/fuzzy to support the kinds of
queries a CMF site must perform.  We use instead four distinct "date"
fields, each with a "catalogable" version and a "human-readable"
version.  (Note that the proposed qualifier set for Date was hardly
a better match for our needs.)
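
As a rough illustration of the "two versions per field" idea, here is
a Python sketch.  The four field names, the accessor names, and the
display format are assumptions made for the example;  this message
does not spell them out:

```python
from datetime import datetime

# Hypothetical names for the four date fields.
DATE_FIELDS = ("created", "modified", "effective", "expires")


class DatedContent:
    def __init__(self, **dates):
        unknown = set(dates) - set(DATE_FIELDS)
        if unknown:
            raise ValueError(f"unknown date fields: {sorted(unknown)}")
        # Fields that are not supplied are stored as None ("not set").
        self._dates = {name: dates.get(name) for name in DATE_FIELDS}

    def catalogable(self, name):
        """Return the real datetime (or None), suitable for indexing."""
        return self._dates[name]

    def readable(self, name):
        """Return a display string for the same field."""
        value = self._dates[name]
        if value is None:
            return "None"
        return value.strftime("%Y-%m-%d %H:%M:%S")
```

The catalog would index the values from catalogable(), while page
templates would show the strings from readable().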


> In any case, to build a system with DC hardwired in is a
> mistake. Particularly a subset of DC.  DC is itself a subset of
> all other proven useful metadata schemas.  Everyone is going
> to want, no make that _need_, to extend the schema.


Again, we are doing *framework* here, specifically providing a
mechanism (interfaces) which allow for extensibility.  The *real*
dependency in the CMF on DC is on the "SearchableDublinCore"
interface, which requires using "real" date values for the (already
extended) date fields, in order to facilitate catalog searches.
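
To show why "real" date values matter for searching, here is a small
Python sketch of a date-range filter.  The in_range() helper and the
item layout are hypothetical stand-ins, not the catalog's actual
query API:

```python
from datetime import datetime


def in_range(items, field, start, end):
    """Return ids of items whose date field lies within [start, end]."""
    return [item["id"] for item in items if start <= item[field] <= end]


items = [
    {"id": "a", "effective": datetime(2001, 1, 2)},
    {"id": "b", "effective": datetime(2001, 6, 10)},
]

# Real date values compare chronologically, so range queries work.
first_quarter = in_range(
    items, "effective", datetime(2001, 1, 1), datetime(2001, 3, 31)
)

# Free-form date strings compare character by character instead:
# "10 Jun 2001" sorts before "2 Jan 2001" even though it is later.
strings_misorder = "10 Jun 2001" < "2 Jan 2001"
```

With datetime values only item "a" falls in the first quarter;  the
string comparison, by contrast, puts June ahead of January.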


> For another perspective on the utility of DC, take a look at
> Carl Lagoze's paper:
> 
>   http://www.dlib.org/dlib/january01/lagoze/01lagoze.html
> 
> I think Carl does an excellent job of laying out the pros and cons
> of DC and of the attempts to work with it as _the_ schema of record.
> 
> Bottom line: If you are going to implement DC, you should at least
> implement the full set. Furthermore, to make DC work as intended within 
> CMF, you need to incorporate the ability to extend the metadata element
> set beyond DC.


Since all elements are optional, leaving one out because it didn't
seem applicable to the kind of sites we were building (Coverage, in this
case) seems defensible to me.  Ricardo has laid out a use case which
requires Coverage, and which specifies the semantics enough to allow us
to choose a representation.  I have a CVS branch now which includes
'Coverage' support, as a result.

We left out the other missing elements (Relation and the
highly redundant Source) not because we thought they didn't apply, but
because we couldn't afford the development effort to do them "correctly";
we have a proposal to support them, and are actively considering the
mechanisms we will use to capture them.

Thanks for your constructive feedback!

Tres.
--
===============================================================
Tres Seaver                                tseaver@digicool.com
Digital Creations     "Zope Dealers"       http://www.zope.org