|
2 | 2 |
|
3 | 3 | orjson is a fast, correct JSON library for Python. It |
4 | 4 | [benchmarks](https://github.com/ijl/orjson#performance) as the fastest Python |
5 | | -library for JSON and is more correct than the standard json library or |
| 5 | +library for JSON and is more correct than the standard json library or other |
6 | 6 | third-party libraries. It serializes |
7 | | -[dataclass](https://github.com/ijl/orjson#dataclass) and |
8 | | -[datetime](https://github.com/ijl/orjson#datetime) instances. |
| 7 | +[dataclass](https://github.com/ijl/orjson#dataclass), |
| 8 | +[datetime](https://github.com/ijl/orjson#datetime), |
| 9 | +[numpy](https://github.com/ijl/orjson#numpy), and |
| 10 | +[UUID](https://github.com/ijl/orjson#UUID) instances natively. |
9 | 11 |
|
10 | 12 | Its features and drawbacks compared to other Python JSON libraries: |
11 | 13 |
|
12 | 14 | * serializes `dataclass` instances 40-50x as fast as other libraries |
13 | 15 | * serializes `datetime`, `date`, and `time` instances to RFC 3339 format, |
14 | 16 | e.g., "1970-01-01T00:00:00+00:00" |
| 17 | +* serializes `numpy.ndarray` instances 3-10x faster than other libraries |
15 | 18 | * serializes to `bytes` rather than `str`, i.e., is not a drop-in replacement |
16 | 19 | * serializes `str` without escaping unicode to ASCII, e.g., "好" rather than |
17 | 20 | "\\\u597d" |
@@ -49,8 +52,9 @@ available in the repository. |
49 | 52 | 2. [datetime](https://github.com/ijl/orjson#datetime) |
50 | 53 | 3. [float](https://github.com/ijl/orjson#float) |
51 | 54 | 4. [int](https://github.com/ijl/orjson#int) |
52 | | - 5. [str](https://github.com/ijl/orjson#str) |
53 | | - 6. [UUID](https://github.com/ijl/orjson#UUID) |
| 55 | + 5. [numpy](https://github.com/ijl/orjson#numpy) |
| 56 | + 6. [str](https://github.com/ijl/orjson#str) |
| 57 | + 7. [UUID](https://github.com/ijl/orjson#UUID) |
54 | 58 | 3. [Testing](https://github.com/ijl/orjson#testing) |
55 | 59 | 4. [Performance](https://github.com/ijl/orjson#performance) |
56 | 60 | 1. [Latency](https://github.com/ijl/orjson#latency) |
@@ -213,6 +217,11 @@ b'"1970-01-01T00:00:00"' |
213 | 217 | Serialize `dataclasses.dataclass` instances. For more, see |
214 | 218 | [dataclass](https://github.com/ijl/orjson#dataclass). |
215 | 219 |
|
| 220 | +##### OPT_SERIALIZE_NUMPY |
| 221 | + |
| 222 | +Serialize `numpy.ndarray` instances. For more, see |
| 223 | +[numpy](https://github.com/ijl/orjson#numpy). |
| 224 | + |
216 | 225 | ##### OPT_SERIALIZE_UUID |
217 | 226 |
|
218 | 227 | Serialize `uuid.UUID` instances. For more, see |
@@ -415,10 +424,10 @@ before calling `dumps()`. If using an unsupported type such as |
415 | 424 |
|
416 | 425 | ### float |
417 | 426 |
|
418 | | -orjson serializes and deserializes floats with no loss of precision and |
419 | | -consistent rounding. The same behavior is observed in rapidjson, simplejson, |
420 | | -and json. ujson is inaccurate in both serialization and deserialization, |
421 | | -i.e., it modifies the data. |
| 427 | +orjson serializes and deserializes double precision floats with no loss of |
| 428 | +precision and consistent rounding. The same behavior is observed in rapidjson, |
| 429 | +simplejson, and json. ujson is inaccurate in both serialization and |
| 430 | +deserialization, i.e., it modifies the data. |
422 | 431 |
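As a minimal sketch of what "no loss of precision" means here, the round trip below uses the standard `json` module, which the paragraph above lists as sharing this behavior. It does not call orjson itself, and the variable names are illustrative only.

```python
import json

# 0.1 + 0.2 is not exactly representable in binary floating point; its
# shortest round-trippable repr is 0.30000000000000004.
value = 0.1 + 0.2

# Serialize, then deserialize, the double-precision value.
encoded = json.dumps(value)
decoded = json.loads(encoded)

# A lossless round trip returns a float equal to the input.
print(encoded)
print(decoded == value)
```

A library that modifies the data on either side of this round trip would fail the final equality check.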
|
423 | 432 | `orjson.dumps()` serializes NaN, Infinity, and -Infinity, which are not |
424 | 433 | compliant JSON, as `null`: |
@@ -454,6 +463,70 @@ JSONEncodeError: Integer exceeds 53-bit range |
454 | 463 | JSONEncodeError: Integer exceeds 53-bit range |
455 | 464 | ``` |
456 | 465 |
|
| 466 | +### numpy |
| 467 | + |
| 468 | +orjson natively serializes `numpy.ndarray` instances. Arrays may have a |
| 469 | +`dtype` of `numpy.int32`, `numpy.int64`, `numpy.float32`, `numpy.float64`, |
| 470 | +or `numpy.bool`. orjson is faster than all compared libraries at serializing |
| 471 | +numpy instances. |
| 472 | + |
| 473 | +Serializing numpy data requires specifying |
| 474 | +`option=orjson.OPT_SERIALIZE_NUMPY`. |
| 475 | + |
| 476 | +```python |
| 477 | +>>> import orjson, numpy |
| 478 | +>>> orjson.dumps( |
| 479 | + numpy.array([[1, 2, 3], [4, 5, 6]]), |
| 480 | + option=orjson.OPT_SERIALIZE_NUMPY, |
| 481 | +) |
| 482 | +b'[[1,2,3],[4,5,6]]' |
| 483 | +``` |
| 484 | + |
| 485 | +The array must be a contiguous C array (`C_CONTIGUOUS`). |
| 486 | + |
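The contiguity requirement can be checked before serializing. The following is a sketch using only numpy, with illustrative variable names; it shows one way to detect a non-contiguous array and obtain a C-contiguous copy.

```python
import numpy

# Transposing returns a view whose memory layout is column-major,
# so the view is not C-contiguous even though its base array is.
base = numpy.array([[1, 2, 3], [4, 5, 6]])
view = base.T
print(view.flags["C_CONTIGUOUS"])

# numpy.ascontiguousarray returns a C-contiguous array, copying the
# data only when the input is not already C-contiguous.
fixed = numpy.ascontiguousarray(view)
print(fixed.flags["C_CONTIGUOUS"])
print(fixed.tolist())
```

An array prepared this way could then be passed to `orjson.dumps()` with `option=orjson.OPT_SERIALIZE_NUMPY`.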
| 487 | +This measures serializing 92MiB of JSON from a `numpy.ndarray` with |
| 488 | +dimensions of `(50000, 100)` and `numpy.float64` values: |
| 489 | + |
| 490 | +| Library | Latency (ms) | RSS diff (MiB) | vs. orjson | |
| 491 | +|------------|----------------|------------------|--------------| |
| 492 | +| orjson | 286 | 182 | 1 | |
| 493 | +| nujson | | | | |
| 494 | +| rapidjson | 3,582 | 270 | 12 | |
| 495 | +| simplejson | 3,494 | 259 | 12 | |
| 496 | +| json | 3,476 | 260 | 12 | |
| 497 | + |
| 498 | +This measures serializing 100MiB of JSON from a `numpy.ndarray` with |
| 499 | +dimensions of `(100000, 100)` and `numpy.int32` values: |
| 500 | + |
| 501 | +| Library | Latency (ms) | RSS diff (MiB) | vs. orjson | |
| 502 | +|------------|----------------|------------------|--------------| |
| 503 | +| orjson | 225 | 198 | 1 | |
| 504 | +| nujson | 2,240 | 246 | 9 | |
| 505 | +| rapidjson | 2,235 | 462 | 9 | |
| 506 | +| simplejson | 1,686 | 430 | 7 | |
| 507 | +| json | 1,626 | 430 | 7 | |
| 508 | + |
| 509 | +This measures serializing 53MiB of JSON from a `numpy.ndarray` with |
| 510 | +dimensions of `(100000, 100)` and `numpy.bool` values: |
| 511 | + |
| 512 | +| Library | Latency (ms) | RSS diff (MiB) | vs. orjson | |
| 513 | +|------------|----------------|------------------|--------------| |
| 514 | +| orjson | 121 | 53 | 1 | |
| 515 | +| nujson | 5,958 | 43 | 49 | |
| 516 | +| rapidjson | 482 | 101 | 3 | |
| 517 | +| simplejson | 671 | 126 | 5 | |
| 518 | +| json | 609 | 127 | 5 | |
| 519 | + |
| 520 | +In these benchmarks, nujson is used instead of ujson. orjson and nujson |
| 521 | +serialize natively, and the other libraries use `ndarray.tolist()`. The nujson |
| 522 | +row is blank where it did not round-trip the data accurately. The RSS diff |
| 523 | +column measures peak memory usage during serialization. The anomalous |
| 524 | +bool result for nujson is consistent across runs. |
| 525 | + |
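For reference, the `ndarray.tolist()` path used by the non-native libraries might look like the sketch below. It uses the standard `json` module; the compact `separators` setting is an assumption, chosen here so the output matches orjson's whitespace-free format, and is not necessarily what the benchmark scripts use.

```python
import json

import numpy

array = numpy.array([[1, 2, 3], [4, 5, 6]])

# tolist() converts the whole array into nested Python lists of Python
# ints before encoding, which is the extra work native serialization avoids.
as_lists = array.tolist()

# Compact separators drop the default spaces after "," and ":".
encoded = json.dumps(as_lists, separators=(",", ":")).encode("utf-8")
print(encoded)
```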
| 526 | +orjson does not have an installation or compilation dependency on numpy. The |
| 527 | +implementation is independent, reading `numpy.ndarray` using |
| 528 | +`PyArrayInterface`. |
| 529 | + |
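`PyArrayInterface` is the C structure of numpy's array interface protocol. The sketch below only inspects the Python-visible side of that protocol to show what information it carries; it assumes nothing about how orjson reads the structure internally.

```python
import numpy

array = numpy.array([[1, 2, 3], [4, 5, 6]])

# __array_struct__ is a capsule wrapping a pointer to the C-level
# PyArrayInterface structure.
capsule = array.__array_struct__
print(type(capsule).__name__)

# __array_interface__ exposes the same description as a Python dict:
# shape, type string, data pointer, strides, and protocol version.
interface = array.__array_interface__
print(interface["shape"])
print(interface["version"])
```

Because a consumer only needs this structure, it can read array data without importing or linking against numpy.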
457 | 530 | ### str |
458 | 531 |
|
459 | 532 | orjson is strict about UTF-8 conformance. This is stricter than the standard |
|