Char encoding and subsets

  • Ian Rastall

    Char encoding and subsets

    This is my understanding so far, and please correct any errors:

    1. US-ASCII is a subset of ISO-8859-1
    2. US-ASCII is a subset of UTF-8
    3. ISO-8859-1 is not a subset of UTF-8

    But ... are the numeric entities (in hex or decimal) for ISO-8859-1
    the same in UTF-8?

    Can an HTML document that uses only Latin-1 numeric entities have
    its content-type changed to UTF-8 and still be valid?

    Do Latin-1 numeric entities have to be written either as x## or ###,
    or can they have leading zeros, like x00## or 0###, which is what
    you would have with UTF-8?

    TIA

    Ian
    --



  • Andreas Prilop

    #2
    Re: Char encoding and subsets

    Ian Rastall <idrastall@sbcglobal.net> wrote:

    > 3. ISO-8859-1 is not a subset of UTF-8

    Since "subset" can only refer to character _sets_, not to encodings,
    your statement is meaningless.
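
To make the distinction between character sets and encodings concrete, here is a minimal Python sketch (the variable names are illustrative only). It assumes nothing beyond the standard codecs: every ISO-8859-1 character exists in Unicode, but the bytes that ISO-8859-1 and UTF-8 use for it agree only in the US-ASCII range.

```python
# Encode each of the 256 Latin-1 characters as UTF-8.
latin1_as_utf8 = [chr(n).encode("utf-8") for n in range(256)]

# Positions where the single Latin-1 byte is also the UTF-8 byte sequence.
same_bytes = [n for n in range(256) if latin1_as_utf8[n] == bytes([n])]
print(same_bytes == list(range(128)))  # True: only US-ASCII matches

# The same character, two different byte sequences.
print("ö".encode("latin-1"))  # b'\xf6'
print("ö".encode("utf-8"))    # b'\xc3\xb6'
```

So the character repertoire of ISO-8859-1 is contained in Unicode, while the byte-level encodings only coincide for US-ASCII.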

    > But ... are the numeric entities (in hex or decimal) for ISO-8859-1
    > the same in UTF-8?

    There are entities like &ouml; and numeric character references
    like &#246;. The number (246) refers to Unicode, and it happens to be
    the same code position in ISO-8859-1 whenever the number is < 256. (That
    includes the range 128...159, which contains no graphic characters.)
    <http://www.w3.org/TR/html4/charset.html#h-5.3.1>
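
As a rough check of the point above, the following Python sketch uses the standard library's `html.unescape` to resolve references. The variable names are illustrative. Note that `html.unescape` follows HTML5 rules, which remap references in the 128...159 range, so the sketch deliberately avoids that range when resolving references.

```python
import html

# A numeric character reference names a Unicode code point.
print(html.unescape("&#246;") == chr(246))  # True
# The named entity resolves to the same character.
print(html.unescape("&ouml;") == chr(246))  # True

# Below 256, the ISO-8859-1 code position equals the Unicode code point.
latin1_matches_unicode = all(
    ord(bytes([n]).decode("latin-1")) == n for n in range(256)
)
print(latin1_matches_unicode)  # True

# Above 255 the number is still a Unicode code point, whatever the charset.
euro = html.unescape("&#8364;")
print(euro == "\u20ac")  # True
```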

    > Can an HTML document that uses only Latin-1 numeric entities have
    > its content-type changed to UTF-8 and still be valid?

    Yes - if you mean "numeric character references" and Content-type
    "text/html;charset=UTF-8".

    > Do Latin-1 numeric entities have to be written either as x## or ###,
    > or can they have leading zeros, like x00## or 0###,

    What's the point in writing &#0246; ?

    > which is what you would have with UTF-8?

    No, we wouldn't.
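
The last two answers can be sketched in Python, assuming a hypothetical document fragment (`source`) that contains only ASCII bytes and writes every non-ASCII character as a numeric character reference. Such a file decodes to the same text under either charset label, and leading zeros do not change which character a reference names.

```python
import html

# Hypothetical document fragment: ASCII bytes only.
source = "<p>K&#246;ln, caf&#xE9;</p>"
raw = source.encode("ascii")

# The same bytes decode identically as ISO-8859-1 and as UTF-8.
same_text = raw.decode("iso-8859-1") == raw.decode("utf-8")
print(same_text)  # True
print(html.unescape(raw.decode("utf-8")))  # <p>Köln, café</p>

# Leading zeros are permitted but pointless: all forms name U+00F6.
forms = ["&#246;", "&#0246;", "&#xF6;", "&#x00F6;"]
decoded = {html.unescape(form) for form in forms}
print(decoded)  # {'ö'}
```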


    • Ian Rastall

      #3
      Re: Char encoding and subsets

      Thanks, man. I knew what to expect, so I got a smile out of it. But
      that link will come in very handy, and I did get my questions
      answered.

      Ian
      --


