[Zope] RFC: Flaws in Structured text

Alexander Staubo alex@mop.no
Mon, 13 Mar 2000 06:13:39 +0100


I would like to outline a few problems, based on
observations of real-world usage, with Structured text.
Comments are appreciated.

As much as I like the format, the current syntax is either
too loose, or the transformation logic too dumb, resulting
in text that does not come out quite as expected. There are
workarounds for all such problems, but authors and editors
aren't aware of them, and keep making the same mistakes.

Here are a few gotchas:

- Lines starting with the text number-dot (e.g., "1.") are
considered to be bullets.

  Problem: In other languages, and sometimes in English,
  this is a valid non-bullet introduction. In Scandinavian
  languages (Norwegian is the language of my country), for
  instance, "1." is an oft-used way of spelling "1st" or
  "first" ("2." for "2nd" or "second", and so forth). My
  experience is that this occurs frequently.

  Solution: Avoid transforming such paragraphs into bullets
  when they occur adjacent to valid non-bullet sentences,
  and only apply the transformation when the number is
  higher than that of the preceding paragraph. Validate the
  number itself -- today you can write
  
    43423. Something
    
    24324. Something else
    
  and it's still transformed into 
  
    <ol><li><p>Something</p><li><p>Something else</p></ol>

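  The validation proposed above could be sketched roughly as
  follows, in present-day Python. This is only an illustration, not
  the StructuredText code: find_numbered_items is a hypothetical
  name, and the rule it applies (a run of adjacent numbered
  paragraphs must start at 1, rise by one, and have at least two
  members) is a stricter assumption than the "higher than the
  preceding paragraph" test.

```python
import re

# A paragraph that opens with digits, a dot, and whitespace, e.g. "1. Text".
NUMBERED = re.compile(r"^\s*(\d+)\.\s+(.*)", re.DOTALL)


def find_numbered_items(paragraphs):
    """Return the indexes of paragraphs that should become <ol> items.

    Hypothetical helper: a run of adjacent numbered paragraphs only
    qualifies if it starts at 1, rises by one, and has two or more members.
    """
    items = set()
    run = []  # (index, number) pairs of the current candidate run

    def flush():
        numbers = [number for _, number in run]
        if len(run) >= 2 and numbers == list(range(1, len(run) + 1)):
            items.update(index for index, _ in run)
        del run[:]

    for index, text in enumerate(paragraphs):
        match = NUMBERED.match(text)
        if match is None:
            flush()  # ordinary prose ends any run
            continue
        number = int(match.group(1))
        if run and number != run[-1][1] + 1:
            flush()  # the sequence is broken, so start a new run
        run.append((index, number))
    flush()
    return sorted(items)
```

  With this rule a lone paragraph such as "1. mai er en fridag."
  stays ordinary text, and so does the 43423/24324 pair shown
  above, while "1. First" followed by "2. Second" still becomes a
  list.
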
- Lines starting with dash-space (e.g., "- The") are also
interpreted as bullets.

  Problem: As with number bullets, this clashes with
  non-English conventions. At least in Scandinavian languages,
  quoted speech is often introduced with a leading dash. Here's
  an actual example, translated to English for clarity:

    - Pooh, said Rabbit kindly, you haven't any brain. - I know,
    said Pooh humbly.

  Solution: Make this feature optional.
  
- Em dash sequence ("--") usage clashes with definition
lists.

  Problem: People, including me, frequently use an ASCII
  form ("--") of the em dash ("-"), in the absence of this
  character in the 7-bit ASCII character set. (Actually,
  what *I* want is an en dash, which ISO provides.) This
  transforms into an HTML <dl></dl> list, even when the dash
  occurs late in the paragraph.
  
  Solution: Only transform paragraph to definition list if
  the dash sequence follows the first, dot-terminated
  sentence, like so:
  
    This is a definition term. -- and this is the definition
    itself.
  
  Not sure whether this is ideal. It would likely break old
  stx documents. Perhaps a "force literal" control character
  can be introduced, like the "\" token used in Python, Perl,
  C, etc.
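
  As a rough sketch of the rule proposed above, again in present-day
  Python: split_definition is a hypothetical helper, not part of
  StructuredText, and it assumes the term is everything up to the
  first period in the paragraph.

```python
import re

# The first sentence (text up to the first period), then " -- ", then the rest.
DEFINITION = re.compile(r"^\s*([^.]*\.)\s+--\s+(.*)$", re.DOTALL)


def split_definition(paragraph):
    """Return (term, definition), or None if this is an ordinary paragraph."""
    match = DEFINITION.match(paragraph)
    if match is None:
        return None
    term, definition = match.groups()
    # Collapse the line wrapping inside the definition text.
    return term.strip(), " ".join(definition.split())
```

  A "--" that appears anywhere else in a paragraph is left alone.
  The obvious weakness is a term that itself contains a period (an
  abbreviation, say), which this sketch would not recognise.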
  
  Note to self: There ought to be a way to transform such
  poor-man's dashes into en/em dashes, and quotation marks
  to "smart" quotation marks.
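
  Such a transformation might look something like the sketch below.
  The name smarten and the quote heuristic are assumptions made for
  illustration; nothing like this exists in StructuredText today.

```python
import re

EN_DASH = "\u2013"
LEFT_QUOTE, RIGHT_QUOTE = "\u201c", "\u201d"


def smarten(text):
    """Replace poor-man's dashes and straight double quotes (hypothetical)."""
    # A double hyphen with whitespace on both sides becomes an en dash.
    text = re.sub(r"(?<=\s)--(?=\s)", EN_DASH, text)
    # A straight quote at the start of the text, or right after whitespace
    # or an opening bracket, opens a quotation.
    text = re.sub(r'(?:^|(?<=[\s(\[]))"', LEFT_QUOTE, text)
    # Every straight quote that is left closes one.
    return text.replace('"', RIGHT_QUOTE)
```

  It would have to run after link and definition-list processing,
  since both of those depend on the plain ASCII characters.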

- Link transformation into HTML anchor tags is poor, or at
least too rigidly parsed.

  Problem: Some examples of links that do not work:
  
    The document (found "here":http://www.zope.org).
    
    <dtml-var "'This is a \x022link\x022:http://www.zope.org/'" 
      fmt="structured-text">, 
   
  I'd love to cite other annoying cases I've come
  across, but I don't remember them. ;-)
   
  Solution: Tolerate parentheses, and accept URLs that are
  terminated by end-of-line. Provide better syntax for
  specifying URLs, such as this:
  
    A product called Zope (http://www.zope.org) can be found
    on the _Zope site_ (http://www.zope.org).
    
  which would be transformed into:
  
    A product called <a href="http://www.zope.org">Zope</a>
    can be found on the <a href="http://www.zope.org">Zope
    site</a>.
  
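  The syntax proposed above could be handled by something along
  these lines. This is a sketch in present-day Python, not existing
  StructuredText behaviour: link_text is a hypothetical name, the
  link text is assumed to be either a single word or a phrase
  between underscores, and only http/https URLs are recognised. No
  HTML escaping is attempted.

```python
import re

# Link text is either _several words_ or a single word, followed by
# whitespace and a parenthesised http(s) URL.
LINK = re.compile(r"(?:_([^_\n]+)_|(\w+))\s+\((https?://[^\s()]+)\)")


def link_text(text):
    """Turn 'word (URL)' and '_some words_ (URL)' into HTML anchors."""

    def anchor(match):
        label = match.group(1) or match.group(2)
        return '<a href="%s">%s</a>' % (match.group(3), label)

    return LINK.sub(anchor, text)
```

  Because the URL is delimited by the parentheses themselves,
  punctuation that follows the closing parenthesis stays outside
  the anchor.
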
- Structured text code should not wrap transformed text in
paragraph tags (<p></p>), or should at least make this
wrapping optional.

  Problem: I often display stx in places where a new
  paragraph creates unnecessary vertical padding that
  violates page design -- for this purpose I have been
  forced to write a simple External Method that removes the
  offending paragraph tags. IMHO, this ought to be
  unnecessary.

  Solution: Provide necessary option.
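
  For reference, a workaround of this kind might look roughly like
  the following (present-day Python). The name strip_paragraph is
  hypothetical, and the sketch assumes the wrapper should only be
  removed when the output is exactly one paragraph.

```python
import re

# Markup that consists of a single <p>...</p> element and nothing else.
WRAPPER = re.compile(r"^\s*<p>(.*?)</p>\s*$", re.DOTALL | re.IGNORECASE)


def strip_paragraph(html):
    """Remove the outer <p> wrapper; leave multi-paragraph markup untouched."""
    match = WRAPPER.match(html)
    if match and "<p>" not in match.group(1).lower():
        return match.group(1).strip()
    return html
```

  Output with more than one paragraph is returned unchanged, since
  dropping the tags there would merge the paragraphs.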
  
- Structured text code is not available to DTML.

  Problem: The code is only available to External Methods
  and products. DTML can only get at stx through dtml-var's
  fmt attribute.
  
  Solution: It would be swell to have an _.stx() or
  StructuredText() construct.

-- 
Alexander Staubo         http://alex.mop.no/
"`This must be Thursday,' said Arthur to himself, sinking low over
his beer, `I never could get the hang of Thursdays.'"
--Douglas Adams, _The Hitchhiker's Guide to the Galaxy_