@@ -45,6 +45,8 @@ support subclasses.
 It raises `TypeError` on an unsupported type. This exception message
 describes the invalid object.

+It raises `TypeError` on a `str` that cannot be encoded as UTF-8, e.g. one containing a surrogate such as `'\ud800'`.
+
 It raises `TypeError` on an integer that exceeds 64 bits. This is the same
 as the standard library's `json` module.

@@ -100,6 +102,36 @@ b'{"bool":true,"\xf0\x9f\x90\x88":"\xe5\x93\x88\xe5\x93\x88","int":9223372036854
 '{"bool": true, "\\ud83d\\udc08": "\\u54c8\\u54c8", "int": 9223372036854775807, "float": 1.337e+40}'
 ```

+### UTF-8
+
+orjson raises an exception on invalid UTF-8. This check is necessary because a
+Python 3 `str` may contain UTF-16 surrogates, which have no UTF-8 encoding. The
+standard library's `json` module accepts such strings when serializing and deserializing.
+
+```python
+>>> import orjson, ujson, rapidjson, json
+>>> orjson.dumps('\ud800')
+TypeError: str is not valid UTF-8: surrogates not allowed
+>>> ujson.dumps('\ud800')
+UnicodeEncodeError: 'utf-8' codec ...
+>>> rapidjson.dumps('\ud800')
+UnicodeEncodeError: 'utf-8' codec ...
+>>> json.dumps('\ud800')
+'"\\ud800"'
+```
+
+```python
+>>> import orjson, ujson, rapidjson, json
+>>> orjson.loads('"\\ud800"')
+JSONDecodeError: unexpected end of hex escape at line 1 column 8: line 1 column 1 (char 0)
+>>> ujson.loads('"\\ud800"')
+''
+>>> rapidjson.loads('"\\ud800"')
+ValueError: Parse error at offset 1: The surrogate pair in string is invalid.
+>>> json.loads('"\\ud800"')
+'\ud800'
+```
+
 ## Testing

 The library has comprehensive tests. There are unit tests against the
@@ -108,7 +140,8 @@ roundtrip, jsonchecker, and fixtures files of the
 repository. It is tested to not crash against the
 [Big List of Naughty Strings](https://github.com/minimaxir/big-list-of-naughty-strings).
 It is tested to not leak memory. It is tested to be correct against
-input from the PyJFuzz JSON fuzzer. There are integration tests
+input from the PyJFuzz JSON fuzzer. It is tested to not crash on, and to
+reject, invalid UTF-8. There are integration tests
 exercising the library's use in web servers (uwsgi and gunicorn,
 using multiprocess/forked workers) and when
 multithreaded. It also uses some tests from the ultrajson library.
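The surrogate behaviour described in this commit can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not orjson's implementation: `ensure_valid_utf8` and `contains_surrogate` are hypothetical helper names, and the `TypeError` message only imitates the one in the README example. It relies on Python's strict UTF-8 codec, which refuses any code point in U+D800..U+DFFF.

```python
import json


def ensure_valid_utf8(s: str) -> bytes:
    """Hypothetical helper, not part of orjson's API.

    Encodes s with Python's strict UTF-8 codec and re-raises the failure
    as TypeError to mimic the behaviour described in the README.
    """
    try:
        return s.encode("utf-8")  # "strict" is the default error handler
    except UnicodeEncodeError as exc:
        # For surrogate code points, exc.reason is "surrogates not allowed".
        raise TypeError(f"str is not valid UTF-8: {exc.reason}") from None


def contains_surrogate(s: str) -> bool:
    """Hypothetical helper: True if s holds any code point in U+D800..U+DFFF.

    Such a str cannot be encoded as UTF-8. json.loads() combines a valid
    escaped pair like "\\ud83d\\udc08" into one code point, so after decoding
    only unpaired surrogates remain.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


# The standard library accepts the lone surrogate in both directions.
encoded = json.dumps("\ud800")     # escaped because ensure_ascii=True by default
decoded = json.loads('"\\ud800"')  # a str containing the lone surrogate U+D800

print(encoded)                                             # prints "\ud800"
print(contains_surrogate(decoded))                         # prints True
print(contains_surrogate(json.loads('"\\ud83d\\udc08"')))  # prints False

try:
    ensure_valid_utf8(decoded)
except TypeError as exc:
    print(exc)  # prints: str is not valid UTF-8: surrogates not allowed
```

Checking with the strict codec keeps the sketch short; a real serializer would more likely validate while it writes its output buffer instead of encoding the string a second time.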