The Mercury Project
Re: [mercury-users] XML / DOM

Subject: Re: [mercury-users] XML / DOM
From: Richard A. O'Keefe (ok@atlas.otago.ac.nz)
Date: Tue Dec 19 2000 - 12:15:13 EST


Michael Day asked:
> Has anyone tried a DOM implementation in Mercury? Presumably it would be
> easier to implement one in a Mercury style on top of the existing XML
> parser than wrap a C/C++ DOM library, given the style of the interface?
        
Thomas Conway replied:
    I haven't, but it would be fairly simple to implement one on top of
    the XML document representation used in my XML parser.
        
I must challenge this. Early this year I set out to implement the DOM
in Smalltalk. Smalltalk was a real joy to use; it takes the "Oh NO"
out of "O-O". But the DOM was a major pain. I started to write a
detailed critique of what is wrong with the DOM, but couldn't think of
anyone who'd publish it.

Basically:
    The Level 1 DOM is woolly. It's entirely typical of W3C recommendations
    in general: lots of (somewhat inexpert) attention to interfaces (the
    DOM is desperately in need of refactoring) and no real thought given to
    semantics. Here are some typical examples.

        1. Any time you ask the DOM for a string, it is allowed to say
        "sorry, that's too big for me to give you". But there is no way
        to ask "how big a string *can* I have, then?" and no guaranteed
        minimum.

        2. If you create a comment node whose content includes "--" as
        a substring, or a processing instruction node whose content
        includes "?>" as a substring, the interface is NOT ALLOWED TO
        COMPLAIN, even though the result can never be legal XML.
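
Point 2 can be checked against any DOM implementation. The sketch below uses Python's xml.dom.minidom purely because it is convenient; that choice, the helper names make_document and reparses, and the sample strings are assumptions of this example, not part of the original discussion. It creates the two problem nodes, then serializes a document holding the processing instruction and tries to parse the result back.

```python
from xml.dom.minidom import getDOMImplementation, parseString
from xml.parsers.expat import ExpatError


def make_document():
    """Build an empty <p/> document to hang test nodes on (hypothetical helper)."""
    return getDOMImplementation().createDocument(None, "p", None)


def reparses(doc):
    """Serialize the document and report whether the output is well-formed XML."""
    try:
        parseString(doc.toxml())
        return True
    except ExpatError:
        return False


doc = make_document()

# Creating a comment whose content includes "--" raises no error.
# It is left unattached so that only the processing instruction is tested below.
comment = doc.createComment("two -- hyphens")

# Creating a processing instruction whose data includes "?>" raises no error either.
pi = doc.createProcessingInstruction("example", "early ?><unclosed>")
doc.documentElement.appendChild(pi)

print(reparses(make_document()))  # the empty document round-trips
print(reparses(doc))              # the document holding the bad PI does not
```

The failure only shows up when the serialized text is read back, which is well after the point where the interface could have refused the node.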

    The Level 2 DOM adds a huge amount of complexity to the Level 1 DOM,
    without fixing any of the basic problems. Here are three problems in
    the Level 2 DOM:
        
        3. Having comment nodes, and splitting CDATA section nodes out
        from other character data, is great if you are writing an XML
        editor, but lousy if you are doing almost anything else with XML.
        That is, from an SGML point of view,
            <p>Hello, world!</p>
        and
            <p>Hello<![CDATA[, ]]>world<![CDATA[!]]></p>
        are the *same* grove, with the <p> element having a *single*
        child. But in the DOM, the first one has one child, and the
        second one has four. You may never intend this to happen, but
        the possibility is enough to complicate your application code
        like you wouldn't believe (that, or simply have it wrong...).
        The Level 2 DOM provides a way to filter various things out
        in a traversal, but no way of never building them in the first
        place.

        4. The specification of traversal (iterators and such) is made
        enormously complicated by the question "What happens to an iterator
        if the position it refers to disappears?" The answer is complex,
        to me counter-intuitive, and difficult to implement correctly.
        I stopped at this point, because I didn't see the point of working
        hard to implement something I couldn't imagine any sane programmer
        wanting.

        5. The central design aspect of the DOM, in both Level 1 and Level 2,
        is "Thou shalt not share structure". This not only makes editing
        (the apparent primary purpose of the DOM) rather more expensive
        than it should have been, it makes even single-level UNDO hard to
        provide, let alone multi-level UNDO.
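
To make point 3 concrete, here is a small sketch that parses the two <p> examples and counts the children of each. It again assumes Python's xml.dom.minidom as the DOM implementation; child_count and character_data are hypothetical helper names. The second helper shows the kind of flattening code an application ends up carrying just to see both documents as the same text.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

PLAIN = "<p>Hello, world!</p>"
SPLIT = "<p>Hello<![CDATA[, ]]>world<![CDATA[!]]></p>"


def child_count(xml_text):
    """Number of child nodes the DOM builds under the root element."""
    return len(parseString(xml_text).documentElement.childNodes)


def character_data(xml_text):
    """Concatenate text and CDATA children of the root, skipping anything else."""
    kinds = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    root = parseString(xml_text).documentElement
    return "".join(n.data for n in root.childNodes if n.nodeType in kinds)


print(child_count(PLAIN), child_count(SPLIT))
print(character_data(PLAIN) == character_data(SPLIT))
```

Any code that looks at a single child and expects all of the text has to be rewritten along the lines of character_data, even if CDATA sections never occur in the documents it was written for.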

I note that since the Level 2 DOM is 469 pages, there is no way it
could possibly be "fairly simple" to implement it, even if the design
were lucid perfection.
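
For contrast with point 5, the following is a minimal sketch of what sharing structure buys. It is not a DOM and not anything proposed in this thread; the Element type and the replace_child helper are invented for illustration. Nodes are immutable and carry no parent pointer, so an edit builds a new root that reuses every untouched subtree, and undo amounts to keeping the old root around.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Element:
    """An immutable element: a name plus a tuple of child elements or strings."""
    name: str
    children: Tuple[Union["Element", str], ...] = ()


def replace_child(parent, index, new_child):
    """Return a copy of parent with one child swapped; the other children are shared."""
    kids = parent.children
    return Element(parent.name, kids[:index] + (new_child,) + kids[index + 1:])


intro = Element("p", ("Hello, world!",))
body = Element("p", ("Original text",))
v0 = Element("doc", (intro, body))

# Each edit pushes a new root; older roots remain valid and unchanged.
history = [v0]
v1 = replace_child(v0, 1, Element("p", ("Edited text",)))
history.append(v1)

# Undo is just dropping the newest root; repeating the pop gives multi-level undo.
history.pop()
current = history[-1]

print(v1.children[0] is v0.children[0])  # the untouched <p> is shared, not copied
print(current is v0)
```

In the DOM each node has a single parentNode, so a subtree cannot appear in two versions of a document at once, and an undo facility has to copy subtrees or record inverse operations instead.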


