The Mercury Project
Re: [mercury-users] Sluggish XML parsing

Subject: Re: [mercury-users] Sluggish XML parsing
From: Thomas Conway (conway@cs.mu.OZ.AU)
Date: Tue Dec 26 2000 - 11:05:24 EST


On Sun, Dec 24, 2000 at 05:33:38PM EST, Michael Day wrote:
> Nonetheless, it's nice that compiling it with -O6 takes minutes rather
> than hours - is this particular sluggishness due to all the introduced
> preds involved with higher order code? Compiler dudes?

The reason compiling is so slow is that the production for `letter'
works on Unicode code points rather than just chars, so the production
looks something like:

letter -->
    baseChar or ideographic.

baseChar -->
    (0x0041-0x005A) or (0x0061-0x007A) or (0x00C0-0x00D6)
    or (0x00D8-0x00F6) or (0x00F8-0x00FF) or (0x0100-0x0131)
    or (0x0134-0x013E) or (0x0141-0x0148) or (0x014A-0x017E)
    or (0x0180-0x01C3) or (0x01CD-0x01F0) or (0x01F4-0x01F5)
    or (0x01FA-0x0217) or (0x0250-0x02A8) or (0x02BB-0x02C1) or lit1(0x0386)
    or (0x0388-0x038A) or lit1(0x038C) or (0x038E-0x03A1) or (0x03A3-0x03CE)
    ....

(where -/4 is a parser combinator that accepts a character from a range
in the obvious kind of way, and or/4 is a parser combinator that accepts
the alternation of two parsers, also in the obvious kind of way)
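
To make the two combinators concrete, here is a minimal sketch of the same
idea in Python rather than Mercury. It is not the actual parser code: the
names char_range, lit1 and or_ and the list-of-code-points input are
assumptions made for illustration, and only three of the alternatives from
baseChar are included.

```python
from typing import Callable, List, Optional

# A parser takes the remaining input (a list of code points) and returns
# the rest of the input on success, or None on failure.
Parser = Callable[[List[int]], Optional[List[int]]]


def char_range(lo: int, hi: int) -> Parser:
    """Hypothetical stand-in for -/4: accept one code point in lo..hi."""
    def parse(inp: List[int]) -> Optional[List[int]]:
        if inp and lo <= inp[0] <= hi:
            return inp[1:]
        return None
    return parse


def lit1(c: int) -> Parser:
    """Accept exactly the code point c."""
    return char_range(c, c)


def or_(p: Parser, q: Parser) -> Parser:
    """Hypothetical stand-in for or/4: try p, and fall back to q if p fails."""
    def parse(inp: List[int]) -> Optional[List[int]]:
        rest = p(inp)
        return rest if rest is not None else q(inp)
    return parse


# A three-alternative fragment of baseChar. Each range and each `or_`
# builds its own closure, which is the kind of construction the full
# production repeats for every alternative.
base_char = or_(char_range(0x0041, 0x005A),
                or_(char_range(0x0061, 0x007A),
                    lit1(0x0386)))
```

With this sketch, base_char([0x41, 0x42]) consumes the first code point and
returns the rest, while base_char([0x30]) fails and returns None.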

A production like baseChar is slow to compile for two reasons: a vast
number of unifications gets introduced to construct all the pred
expressions, and gobs and gobs of introduced predicates get generated.

This is why I split baseChar and a couple of other productions into a
separate submodule, to prevent them from being recompiled all the time.
If you think the compile time is a performance bug, then send in a bug
report, and maybe someone (not me!) will do something about it. ;-)

-- 
 Thomas Conway              Mercurian )O+  
 <conway@cs.mu.oz.au>       Every sword has two edges.

This mail archive was generated by hypermail 2b25 on Sun Dec 31 2000 - 00:40:05 EST.