Character set

  • Hans Mabelis

    Character set

    I'm new here; got here because suddenly the question came up: is html a
    7-bit or an 8-bit language? Officially, I mean.
    I seem to consistently suffer from character set issues. Of course, I can
    specify a specific character set - but that doesn't guarantee the receiving
    computer will have that set on board.
    Can anyone tell me more? Where to find guidelines, and real-world info?

    Hans


  • Jukka K. Korpela

    #2
    Re: Character set

    "Hans Mabelis" <hans@mabelis.n l> wrote:
    > I'm new here;

    Checking the FAQ is advisable then. It's a bit dusty, but checking it
    is better than starting from scratch in every thread. You might start
    from http://www.htmlhelp.com/faq/html/bas...l#special-char
    > is html a 7-bit or an 8-bit language?

    Yes. And no. You can use a 7-bit encoding, or an 8-bit encoding, or any
    other encoding for an HTML document.
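
To make the point above concrete, here is a minimal sketch in Python (not part of the original post; the sample word is an arbitrary choice). It serializes the same characters under an 8-bit encoding, a variable-width encoding, a 16-bit encoding, and finally a 7-bit encoding in which the non-ASCII character is written as a numeric character reference.

```python
# The same text serialized under several encodings.
text = "café"

latin1 = text.encode("iso-8859-1")   # 8-bit: one byte per character
utf8 = text.encode("utf-8")          # variable width: é takes two bytes
utf16 = text.encode("utf-16-be")     # 16-bit code units, big-endian

print(latin1)  # b'caf\xe9'
print(utf8)    # b'caf\xc3\xa9'
print(utf16)   # b'\x00c\x00a\x00f\x00\xe9'

# A 7-bit encoding cannot carry é as a raw byte...
can_encode_directly = True
try:
    text.encode("ascii")
except UnicodeEncodeError:
    can_encode_directly = False

# ...but in HTML the author can write it as a numeric character reference,
# which keeps every byte of the document below 128.
ascii_html = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_html)  # b'caf&#233;'
```

All four byte sequences describe the same four characters; only the serialization differs.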
    > I seem to consistently suffer from character set issues.

    Then please specify them, with URLs, after checking the basic
    resources.
    > Of course, I can specify a specific character set

    I'm afraid that could mean rather different things,
    >- but that doesn't guarantee
    > the receiving computer will have that set on board.

    Indeed. The safest bet in practice is ASCII. The second-safest in
    theory (and pretty much in practice too, in worldwide considerations)
    is UTF-8, if you know how to produce and announce it. But I'm not sure
    whether you mean character encoding, character repertoire, or font.
    Three different beasts.
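
As an illustration of "produce and announce" (again not part of the original post), the sketch below encodes a small page as UTF-8 and states that encoding both in the HTTP `Content-Type` value and in a `meta` element. The function name `build_page` and the sample name are hypothetical; how the header is actually sent depends on the server, which is not shown here.

```python
import html


def build_page(body_text: str, charset: str = "utf-8") -> tuple[str, bytes]:
    """Return (Content-Type header value, document bytes) for a tiny page.

    Hypothetical helper for illustration: the bytes are produced with the
    same encoding that the header and the meta element announce.
    """
    content_type = f"text/html; charset={charset}"
    document = (
        "<html><head>\n"
        f'<meta http-equiv="Content-Type" content="{content_type}">\n'
        "<title>Demo</title>\n"
        "</head><body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body></html>\n"
    )
    return content_type, document.encode(charset)


header, data = build_page("Zoë")

# A receiver that honours the announcement recovers the text intact.
announced_ok = data.decode("utf-8")

# A receiver that guesses a different encoding sees garbage instead.
wrong_guess = "Zoë".encode("utf-8").decode("iso-8859-1")
print(wrong_guess)  # ZoÃ«
```

The last two lines show the usual symptom of a missing or wrong announcement: the bytes are fine, but they are interpreted under the wrong encoding.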

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html


    • Alan J. Flavell

      #3
      Re: Character set

      On Sun, 7 Mar 2004, Hans Mabelis wrote:
      > I'm new here; got here because suddenly the question came up: is html a
      > 7-bit or an 8-bit language?

      No. Not since RFC2070 and HTML4.*
      > I seem to consistently suffer from character set issues.

      That's a bit vague. Do you want to understand the underlying
      principles (which is what I would recommend) or are you experiencing
      specific problems (in which case you'd need to say a bit more about
      what they are, and preferably put some of the problematic materials
      online so that people can see for themselves what's going on).
      > Of course, I can specify a specific character set

      Actually no. The Document Character Set is always ISO 10646/Unicode.
      What you _can_ specify is the character encoding, which in MIME
      terminology is confusingly called "charset". Until you understand the
      difference, none of this stuff is likely to make much sense, I'm
      afraid.
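
The distinction can be sketched in Python (an illustration added here, not part of the original post). A numeric character reference such as `&#233;` names a position in the document character set, so it resolves to the same character whichever "charset" (encoding) carried the bytes. The list of encodings is an arbitrary sample.

```python
import html

source = "caf&#233;"  # markup as the author typed it: ASCII characters only

resolved = {}
for charset in ("ascii", "iso-8859-1", "utf-8", "utf-16"):
    wire_bytes = source.encode(charset)   # the encoding used in transit
    markup = wire_bytes.decode(charset)   # the receiver undoes that encoding
    resolved[charset] = html.unescape(markup)  # then resolves the reference

for charset, text in resolved.items():
    # The final character is U+00E9 in every case.
    print(charset, text, hex(ord(text[-1])))

# By contrast, the raw bytes for the character itself depend on the encoding.
raw_latin1 = "é".encode("iso-8859-1")  # b'\xe9'
raw_utf8 = "é".encode("utf-8")         # b'\xc3\xa9'
```

The reference number 233 is the same in every row, while the raw byte forms at the end differ: the first is a property of the document character set, the second of the encoding.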

      Some people have found the materials in my area
      http://ppewww.ph.gla.ac.uk/~flavell/charset/ to be of use.

      But RFC2070 itself isn't bad, even if it's somewhat dated. The
      description of the character representation model in HTML/4.01 is also
      reasonably clear. The hardest part is often un-learning things that
      the student is convinced that they already understand.

      good luck
