HTML docs for browser writers (not users)??????

  • Marc Rochkind

    HTML docs for browser writers (not users)??????

    Nearly everything written about HTML falls into one of two categories:

    1. Material written for HTML authors, or
    2. Material written for user-agent implementors about standard HTML

    However, programmers writing a browser need to know about invalid,
    obsolete, and non-standard HTML also, because that's what shows up on the
    web. For example, on the Yahoo site alone I see:

    -- use of hex numbers for color attributes without the #
    -- use of the <spacer> tag
    -- use of the <image> tag (in addition to <img>, of course)

    and more.
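
    A minimal sketch of how a lenient parser might absorb the quirks listed
    above. The names normalize_tag, normalize_color and TAG_ALIASES are
    hypothetical, not taken from any browser, and the rule "a bare 3- or
    6-digit hex string is a color" is a simplifying assumption; real browsers
    apply a more elaborate legacy color algorithm.

```python
import re

# Hypothetical alias table: non-standard tag names mapped to the
# standard element a lenient parser might treat them as.
TAG_ALIASES = {"image": "img"}

# Assumption: only bare 3- or 6-digit hex strings are repaired here.
BARE_HEX = re.compile(r"[0-9a-fA-F]{6}|[0-9a-fA-F]{3}")


def normalize_tag(name):
    """Lower-case a tag name and map known aliases to the standard name."""
    name = name.lower()
    return TAG_ALIASES.get(name, name)


def normalize_color(value):
    """Add the missing '#' to a bare hex color; leave anything else alone."""
    value = value.strip()
    if BARE_HEX.fullmatch(value):
        return "#" + value.lower()
    return value
```

    <spacer> has no standard equivalent to map to, so this sketch passes it
    through unchanged and leaves the decision to whatever does the layout.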

    Does anyone know of documentation about what one might call "reality HTML,"
    versus "standard HTML" or "recommended HTML?" I don't mean documentation of
    deprecated tags... any tag that was ever standardized is more than
    adequately documented. What I'm looking for is documentation on tags or
    attributes that are in widespread use, but were never standardized.

    Does anyone know of any surveys of actual HTML that's in use, maybe in the
    form of a report of frequency of occurrence of tags, whether standardized
    or not? (If not, I'm thinking of conducting such a survey using a custom-
    designed web crawler.)
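
    The counting half of such a survey could be sketched as follows, using
    the html.parser module from Python's standard library. TagCounter and
    count_tags are hypothetical names, and fetching the pages is left out
    entirely; this only tallies start tags in markup it is handed.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags by name, whether standardized or not."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names before calling this method.
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of start tags seen."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

    Adding the Counter from each page into a running total would give the
    frequency report; unknown tags such as spacer show up like any other.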

    --Marc
  • Jim Dabell

    #2
    Re: HTML docs for browser writers (not users)??????

    Marc Rochkind wrote:

    [snip]
    > Does anyone know of documentation about what one might call "reality
    > HTML," versus "standard HTML" or "recommended HTML?"
    [snip]

    I'm afraid not. That's part of the problem with tag soup, of course: UA
    authors need to reverse engineer popular browsers rather than just code to
    a specification. There have been a few illuminating posts by Hixie & Hyatt
    on their blogs, in particular:

    <URL:http://ln.hixie.ch/?start=1037910467&count=1>
    <URL:http://weblogs.mozillazine.org/hyatt/archives/2003_03.html#002904>

    You'll want to have a skim of their sites for more stuff like this,
    especially now that Hixie has started at Opera (Hyatt works on Safari, and
    worked on Mozilla before that). The bug databases of common browsers might
    be helpful:

    <URL:http://bugzilla.mozilla.org/> (Mozilla derivatives)
    <URL:http://bugs.kde.org/> (Konqueror's bugs are in the main KDE bug
    tracker, but fairly easy to pull out with a query)

    You may want to join the relevant W3C mailing lists as well; I'm sure you
    could get a few questions answered there:

    <URL:http://lists.w3.org/>

    html-tidy, www-validator, www-validator-css, www-html and www-style are
    probably the ones I would monitor in your position. Reviewing the source
    for tidy wouldn't be a bad idea either.

    It might be worth lurking here and next door in ciwas as well to get a
    handle on where the most common problems are.

    > Does anyone know of any surveys of actual HTML that's in use, maybe in the
    > form of a report of frequency of occurrence of tags, whether standardized
    > or not? (If not, I'm thinking of conducting such a survey using a custom-
    > designed web crawler.)

    I haven't heard of anything like that. You'll want to make the distinction
    between tags and elements if you are writing a UA though.
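
    To illustrate the distinction: in HTML 4 some end tags (such as </li>)
    are optional, so a document can contain elements that have no end tag in
    the source at all. A small sketch using Python's html.parser, with TagLog
    as a hypothetical name, records the tags exactly as they occur:

```python
from html.parser import HTMLParser


class TagLog(HTMLParser):
    """Record start tags and end tags separately, exactly as they occur."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


log = TagLog()
log.feed("<ul><li>one<li>two</ul>")
log.close()
# The markup holds three start tags and a single end tag, yet the
# document it describes has one UL element and two LI elements.
```

    A survey that counts tags and one that counts elements would therefore
    report different numbers for the same page.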

    --
    Jim Dabell


    • Alan J. Flavell

      #3
      Re: HTML docs for browser writers (not users)??????

      On Wed, Aug 13, Marc Rochkind inscribed on the eternal scroll:
      > Does anyone know of documentation about what one might call "reality HTML,"
      > versus "standard HTML" or "recommended HTML?"

      I think you'd find http://www.blooberry.com/indexdot/index.html
      to be a useful resource.


      • Marc Rochkind

        #4
        Re: HTML docs for browser writers (not users)??????

        On 14 Aug 2003 09:09:23 -0700, Brian Wilson <usenet@blooberry.com> wrote:

        [snip]
        >
        > Actually, I do mention it under the page for the IMG element. 8-}
        >

        [snip]

        Indeed, I see it now....

        Brian, yours is a sensational site! Last night I found myself just reading
        pages at random, fascinated by all the detail. (And, if anyone ever tries
        to tell me that the best sites are the best looking sites, I can use yours
        to prove them wrong. ;-) )

        I wonder if HTML wins some sort of award for the most precisely specified
        yet most sloppily practiced standard? (In other areas, such as hardware
        standards, OS standards, computer language standards, and the like, a non-
        standard usage simply won't work. HTML is unusual in that the software --
        mostly browsers -- is so flexible. Can you think of any other similar
        situations?)

        Naturally, the conspiracy-theorists among us can point out that it's in
        Microsoft's interests to have web pages as difficult to process as
        possible, so as to raise the cost of developing a browser. I learned from
        my days working for a VC that the higher the development costs, the more
        effective the barrier to entry.

        What I don't understand is why some top sites, such as Yahoo, Google News,
        CNN, etc., are so poorly coded. Yahoo is such a mess it's laughable. Where
        does their HTML come from? It's obviously coded by hand, and by a weak hand
        at that.

        --Marc


        • Marc Rochkind

          #5
          Re: HTML docs for browser writers (not users)??????

          On Thu, 14 Aug 2003 23:35:26 +0100, Nick Kew <nick@fenris.webthing.com>
          wrote:

          [snip]
          >> What I don't understand is why some top sites, such as Yahoo, Google
          >> News, CNN, etc., are so poorly coded. Yahoo is such a mess it's
          >> laughable. Where
          >
          > Yahoo made its name early - before HTML standardisation - and got
          > themselves the strongest name amongst journos who had never actually
          > used the web - and hence the general public in the mid-90s. Now they
          > live on their name. [snip]

          But my point was that, assuming Yahoo wants the widest possible readership,
          why wouldn't they code their HTML in the most conforming possible way,
          instead of using non-standard and invalid constructs?

          It's a mystery to me... I find it hard to believe that with all their
          resources they don't know better.

          --Marc


          • Chris Hoess

            #6
            Re: HTML docs for browser writers (not users)??????

            In article <oprtuzcffhojfyi9@den.news.speakeasy.net>, Marc Rochkind wrote:

            > Does anyone know of any surveys of actual HTML that's in use, maybe in the
            > form of a report of frequency of occurrence of tags, whether standardized
            > or not? (If not, I'm thinking of conducting such a survey using a custom-
            > designed web crawler.)

            <URL:http://www.ub.uib.no/elpub/2001/h/413001/>

            --
            Chris Hoess


            • Marc Rochkind

              #7
              Re: HTML docs for browser writers (not users)??????

              On Fri, 15 Aug 2003 23:07:30 +0000 (UTC), Chris Hoess
              <choess@stwing.upenn.edu> wrote:

              > In article <oprtuzcffhojfyi9@den.news.speakeasy.net>, Marc Rochkind
              > wrote:
              >
              >> Does anyone know of any surveys of actual HTML that's in use, maybe in
              >> the form of a report of frequency of occurrence of tags, whether
              >> standardized or not? (If not, I'm thinking of conducting such a survey
              >> using a custom-designed web crawler.)
              >
              > <URL:http://www.ub.uib.no/elpub/2001/h/413001/>
              >


              Thanks! Exactly what I was looking for.

              --Marc


              • Henri Sivonen

                #8
                Re: HTML docs for browser writers (not users)??????

                In article <VmKdnTdvs8ocBKeiXTWQlg@giganews.com>,
                Jim Dabell <jim-usenet@jimdabell.com> wrote:

                > Reviewing the source for tidy wouldn't be a bad idea either.

                Also, reading the source of Mozilla's HTML parser might help, although
                the code is neither pretty nor easy to follow. (Safari's HTML parser
                doesn't handle as much quirkiness as Mozilla's but might be easier to
                read.)

                Then there's Tag Soup, a tag soup parser written in Java which from the
                application point of view appears to be a SAX parser that is parsing
                XHTML. http://mercury.ccil.org/~cowan/XML/tagsoup/

                If I were to write a non-browser program that had to deal with
                real-world HTML, I'd probably use Tag Soup.

                --
                Henri Sivonen
                hsivonen@iki.fi

                Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html


                • Henri Sivonen

                  #9
                  Re: HTML docs for browser writers (not users)??????

                  In article <dqesjv00ak5vub3lq4krvv263uk6e5vu2i@4ax.com>,
                  Tim <admin@sheerhell.lan> wrote:

                  > On Sat, 16 Aug 2003 12:38:36 +0300,
                  > Henri Sivonen <hsivonen@iki.fi> wrote:
                  >
                  > > If I were to write a non-browser program that had to deal with
                  > > real-world HTML, I'd probably use Tag Soup.
                  >
                  > It looks like quite a few do that.
                  >
                  > e.g. Rather than properly parse a list in a document, they indent text
                  > after a UL or OL tag, they bullet text after a LI tag, together
                  > producing the common indented bulleted list, but separately still doing
                  > something.

                  I meant I'd use the parser called "Tag Soup" which would allow me to
                  write my app code as if I was dealing with a SAX parser parsing XHTML.
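
                  The general idea of presenting tag soup to the application as a
                  well-nested event stream can be sketched in Python. This is not
                  the algorithm the Tag Soup parser uses and not its API; the
                  names BalancedEvents, balanced_events and VOID are hypothetical,
                  and the repair rules are deliberately crude (for instance, a
                  second <p> does not implicitly close the first).

```python
from html.parser import HTMLParser

# Assumption: a small subset of HTML's void elements, enough for a demo.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class BalancedEvents(HTMLParser):
    """Turn tag soup into a well-nested stream of (kind, value) events."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open elements
        self.events = []  # ("start" | "end" | "text", value) tuples

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        if tag in VOID:
            # Void elements never have content; close them immediately.
            self.events.append(("end", tag))
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching open element: drop it
        # Close everything opened since the matching start tag.
        while True:
            name = self.stack.pop()
            self.events.append(("end", name))
            if name == tag:
                break

    def handle_data(self, data):
        self.events.append(("text", data))

    def close(self):
        super().close()
        # Close whatever the document left open.
        while self.stack:
            self.events.append(("end", self.stack.pop()))


def balanced_events(html):
    """Parse markup and return the repaired list of events."""
    parser = BalancedEvents()
    parser.feed(html)
    parser.close()
    return parser.events
```

                  Fed "<b><i>x</b>y", the sketch closes the i element before the
                  b element, so code consuming the events never sees overlapping
                  or unclosed elements.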

                  --
                  Henri Sivonen
                  hsivonen@iki.fi

                  Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
