Commit 1e599b3

numpy
1 parent 7df3a5a commit 1e599b3

17 files changed

Lines changed: 779 additions & 19 deletions

Cargo.lock

Lines changed: 2 additions & 2 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 name = "orjson"
 version = "2.3.0"
 authors = ["ijl <ijl@mailbox.org>"]
-description = "Fast, correct Python JSON library supporting dataclasses and datetimes"
+description = "Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy"
 edition = "2018"
 license = "Apache-2.0 OR MIT"
 repository = "https://github.com/ijl/orjson"

README.md

Lines changed: 82 additions & 9 deletions
@@ -2,16 +2,19 @@

 orjson is a fast, correct JSON library for Python. It
 [benchmarks](https://github.com/ijl/orjson#performance) as the fastest Python
-library for JSON and is more correct than the standard json library or
+library for JSON and is more correct than the standard json library or other
 third-party libraries. It serializes
-[dataclass](https://github.com/ijl/orjson#dataclass) and
-[datetime](https://github.com/ijl/orjson#datetime) instances.
+[dataclass](https://github.com/ijl/orjson#dataclass),
+[datetime](https://github.com/ijl/orjson#datetime),
+[numpy](https://github.com/ijl/orjson#numpy), and
+[UUID](https://github.com/ijl/orjson#UUID) instances natively.

 Its features and drawbacks compared to other Python JSON libraries:

 * serializes `dataclass` instances 40-50x as fast as other libraries
 * serializes `datetime`, `date`, and `time` instances to RFC 3339 format,
 e.g., "1970-01-01T00:00:00+00:00"
+* serializes `numpy.ndarray` instances 3-10x faster than other libraries
 * serializes to `bytes` rather than `str`, i.e., is not a drop-in replacement
 * serializes `str` without escaping unicode to ASCII, e.g., "好" rather than
 "\\\u597d"
@@ -49,8 +52,9 @@ available in the repository.
     2. [datetime](https://github.com/ijl/orjson#datetime)
     3. [float](https://github.com/ijl/orjson#float)
     4. [int](https://github.com/ijl/orjson#int)
-    5. [str](https://github.com/ijl/orjson#str)
-    6. [UUID](https://github.com/ijl/orjson#UUID)
+    5. [numpy](https://github.com/ijl/orjson#numpy)
+    6. [str](https://github.com/ijl/orjson#str)
+    7. [UUID](https://github.com/ijl/orjson#UUID)
 3. [Testing](https://github.com/ijl/orjson#testing)
 4. [Performance](https://github.com/ijl/orjson#performance)
     1. [Latency](https://github.com/ijl/orjson#latency)
@@ -213,6 +217,11 @@ b'"1970-01-01T00:00:00"'
 Serialize `dataclasses.dataclass` instances. For more, see
 [dataclass](https://github.com/ijl/orjson#dataclass).

+##### OPT_SERIALIZE_NUMPY
+
+Serialize `numpy.ndarray` instances. For more, see
+[numpy](https://github.com/ijl/orjson#numpy).
+
 ##### OPT_SERIALIZE_UUID

 Serialize `uuid.UUID` instances. For more, see
@@ -415,10 +424,10 @@ before calling `dumps()`. If using an unsupported type such as

 ### float

-orjson serializes and deserializes floats with no loss of precision and
-consistent rounding. The same behavior is observed in rapidjson, simplejson,
-and json. ujson is inaccurate in both serialization and deserialization,
-i.e., it modifies the data.
+orjson serializes and deserializes double precision floats with no loss of
+precision and consistent rounding. The same behavior is observed in rapidjson,
+simplejson, and json. ujson is inaccurate in both serialization and
+deserialization, i.e., it modifies the data.

 `orjson.dumps()` serializes Nan, Infinity, and -Infinity, which are not
 compliant JSON, as `null`:
@@ -454,6 +463,70 @@ JSONEncodeError: Integer exceeds 53-bit range
 JSONEncodeError: Integer exceeds 53-bit range
 ```

+### numpy
+
+orjson natively serializes `numpy.ndarray` instances. Arrays may have a
+`dtype` of `numpy.int32`, `numpy.int64`, `numpy.float32`, `numpy.float64`,
+or `numpy.bool`. orjson is faster than all compared libraries at serializing
+numpy instances.
+
+Serializing numpy data requires specifying
+`option=orjson.OPT_SERIALIZE_NUMPY`.
+
+```python
+>>> import orjson, numpy
+>>> orjson.dumps(
+        numpy.array([[1, 2, 3], [4, 5, 6]]),
+        option=orjson.OPT_SERIALIZE_NUMPY,
+)
+b'[[1,2,3],[4,5,6]]'
+```
+
+The array must be a contiguous C array (`C_CONTIGUOUS`).
+
+This measures serializing 92MiB of JSON from an `numpy.ndarray` with
+dimensions of `(50000, 100)` and `numpy.float64` values:
+
+| Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
+|------------|----------------|------------------|--------------|
+| orjson | 286 | 182 | 1 |
+| nujson | | | |
+| rapidjson | 3,582 | 270 | 12 |
+| simplejson | 3,494 | 259 | 12 |
+| json | 3,476 | 260 | 12 |
+
+This measures serializing 100MiB of JSON from an `numpy.ndarray` with
+dimensions of `(100000, 100)` and `numpy.int32` values:
+
+| Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
+|------------|----------------|------------------|--------------|
+| orjson | 225 | 198 | 1 |
+| nujson | 2,240 | 246 | 9 |
+| rapidjson | 2,235 | 462 | 9 |
+| simplejson | 1,686 | 430 | 7 |
+| json | 1,626 | 430 | 7 |
+
+This measures serializing 53MiB of JSON from an `numpy.ndarray` with
+dimensions of `(100000, 100)` and `numpy.bool` values:
+
+| Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
+|------------|----------------|------------------|--------------|
+| orjson | 121 | 53 | 1 |
+| nujson | 5,958 | 43 | 49 |
+| rapidjson | 482 | 101 | 3 |
+| simplejson | 671 | 126 | 5 |
+| json | 609 | 127 | 5 |
+
+In these benchmarks, nujson is used instead of ujson, orjson and nujson
+serialize natively, and the other libraries use `ndarray.tolist()`. `nujson`
+is blank when it did not roundtrip the data accurately. The RSS
+column measures peak memory usage during serialization. The odd
+bool result for nujson is consistent.
+
+orjson does not have an installation or compilation dependency on numpy. The
+implementation is independent, reading `numpy.ndarray` using
+`PyArrayInterface`.
+
 ### str

 orjson is strict about UTF-8 conformance. This is stricter than the standard
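
The README text above notes that libraries without native support fall back to `ndarray.tolist()`. A minimal sketch of that fallback path, using only the standard `json` module: `FakeArray` is a hypothetical stand-in for `numpy.ndarray` so the example runs without numpy installed, and the `default` hook here raises `TypeError` for unknown types, which is an assumption rather than something the commit does.

```python
import json


class FakeArray:
    """Hypothetical stand-in for numpy.ndarray (not part of the commit)."""

    def __init__(self, rows):
        self._rows = rows

    def tolist(self):
        # Mirrors ndarray.tolist(): return plain nested Python lists.
        return [list(row) for row in self._rows]


def default(obj):
    # json.dumps() calls this only for objects it cannot serialize itself.
    if isinstance(obj, FakeArray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


array = FakeArray([(1, 2, 3), (4, 5, 6)])
# Compact separators and UTF-8 encoding give bytes comparable to orjson output.
encoded = json.dumps(array, default=default, separators=(",", ":")).encode("utf-8")
print(encoded)
```

The intermediate list built by `tolist()` is the extra work that native serialization avoids, which is presumably one reason for the latency and RSS gaps in the tables.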

bench/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
 matplotlib
+memory-profiler
+nujson
 pytest-benchmark
 python-rapidjson
 simplejson

ci/azure-linux-container.yml

Lines changed: 4 additions & 0 deletions
@@ -19,6 +19,10 @@ steps:
   displayName: install
 - bash: PATH=$(path) pytest -s -rxX -v test
   displayName: pytest
+- bash: pip uninstall -y numpy
+  displayName: remove optional packages
+- bash: pytest -s -rxX -v test
+  displayName: pytest without optional packages
 - bash: PATH=$(path) ./integration/run thread
   displayName: thread
 - bash: PATH=$(path) ./integration/run http

ci/azure-posix.yml

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,10 @@ steps:
   displayName: install
 - bash: pytest -s -rxX -v test
   displayName: pytest
+- bash: pip uninstall -y numpy
+  displayName: remove optional packages
+- bash: pytest -s -rxX -v test
+  displayName: pytest without optional packages
 - bash: ./integration/run thread
   displayName: thread
 - bash: ./integration/run http

ci/azure-win.yml

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ steps:
   displayName: install
 - script: python.exe -m pytest -s -rxX -v test
   displayName: pytest
+- script: python.exe -m pip uninstall -y numpy
+  displayName: remove optional packages
+- script: python.exe -m pytest -s -rxX -v test
+  displayName: pytest without optional packages
 - script: python.exe integration\thread
   displayName: thread
 - bash: ./deploy /d/a/1/s/target/wheels/*.whl
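
The three CI files above run the test suite a second time after uninstalling numpy, so the tests must still pass when the optional package is absent. The commit's test files are not shown here, so the following is only a sketch of one way to guard such tests. It uses the standard `unittest` module rather than pytest to stay self-contained, and the names `HAVE_NUMPY`, `NumpyTests`, and `CoreTests` are hypothetical.

```python
import importlib.util
import unittest

# True only when the optional package is importable in this environment.
HAVE_NUMPY = importlib.util.find_spec("numpy") is not None


@unittest.skipUnless(HAVE_NUMPY, "numpy is not installed")
class NumpyTests(unittest.TestCase):
    def test_tolist(self):
        import numpy  # imported lazily so the module loads without numpy

        array = numpy.array([[1, 2, 3], [4, 5, 6]])
        self.assertEqual(array.tolist(), [[1, 2, 3], [4, 5, 6]])


class CoreTests(unittest.TestCase):
    def test_runs_everywhere(self):
        # Stands in for tests that have no optional dependency.
        self.assertEqual(1 + 1, 2)


def run_suite():
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        [
            loader.loadTestsFromTestCase(NumpyTests),
            loader.loadTestsFromTestCase(CoreTests),
        ]
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print("skipped:", len(result.skipped))
```

With numpy installed both classes run. After `pip uninstall -y numpy` the numpy class is reported as skipped rather than failing, which is the behavior the second pytest step appears intended to verify.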

lint

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 #!/usr/bin/bash -e

 autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports .
-isort ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort
-black ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort
+isort ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort pynumpy
+black ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort pynumpy
 mypy --ignore-missing-imports ./bench/*.py ./orjson.pyi ./test/*.py

pynumpy

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: (Apache-2.0 OR MIT)
+
+import gc
+import io
+import json
+import os
+import sys
+import time
+from timeit import timeit
+
+import nujson
+import numpy
+import orjson
+import psutil
+import rapidjson
+import simplejson
+from memory_profiler import memory_usage
+from tabulate import tabulate
+
+os.sched_setaffinity(os.getpid(), {0, 1})
+
+
+kind = sys.argv[1] if len(sys.argv) > 1 else ""
+
+if kind == "int32":
+    array = numpy.random.randint(((2 ** 31) - 1), size=(100000, 100), dtype=numpy.int32)
+elif kind == "float64":
+    array = numpy.random.random(size=(50000, 100))
+    assert array.dtype == numpy.float64
+elif kind == "bool":
+    array = numpy.random.choice((True, False), size=(100000, 100))
+else:
+    print("usage: pynumpy (bool|int32|float64)")
+    sys.exit(1)
+
+output_in_mib = len(orjson.dumps(array.tolist())) / 1024 / 1024
+
+print(f"{output_in_mib:,.1f}MiB {kind} output (orjson)")
+
+proc = psutil.Process()
+
+
+def default(__obj):
+    if isinstance(__obj, numpy.ndarray):
+        return __obj.tolist()
+
+
+headers = ("Library", "Latency (ms)", "RSS diff (MiB)", "vs. orjson")
+
+LIBRARIES = ("orjson", "nujson", "rapidjson", "simplejson", "json")
+
+ITERATIONS = 10
+
+orjson_dumps = lambda: orjson.dumps(array, option=orjson.OPT_SERIALIZE_NUMPY)
+nujson_dumps = lambda: nujson.dumps(array).encode("utf-8")
+rapidjson_dumps = lambda: rapidjson.dumps(array, default=default).encode("utf-8")
+simplejson_dumps = lambda: simplejson.dumps(array, default=default).encode("utf-8")
+json_dumps = lambda: json.dumps(array, default=default).encode("utf-8")
+
+gc.collect()
+mem_before = proc.memory_full_info().rss / 1024 / 1024
+
+
+def per_iter_latency(val):
+    if val is None:
+        return None
+    return (val * 1000) / ITERATIONS
+
+
+def test_correctness(func):
+    return orjson.loads(func()) == array.tolist()
+
+
+table = []
+for lib_name in LIBRARIES:
+    gc.collect()
+
+    print(f"{lib_name}...")
+    func = locals()[f"{lib_name}_dumps"]
+    total_latency = timeit(func, number=ITERATIONS,)
+    latency = per_iter_latency(total_latency)
+    time.sleep(1)
+    mem = max(memory_usage((func,), interval=0.001, timeout=latency * 2))
+    correct = test_correctness(func)
+
+    if lib_name == "orjson":
+        compared_to_orjson = 1
+        orjson_latency = latency
+    elif latency:
+        compared_to_orjson = int(latency / orjson_latency)
+    else:
+        compared_to_orjson = None
+
+    if not correct:
+        latency = None
+        mem = 0
+
+    mem_diff = mem - mem_before
+
+    table.append(
+        (
+            lib_name,
+            f"{latency:,.0f}" if latency else "",
+            f"{mem_diff:,.0f}" if mem else "",
+            f"{compared_to_orjson}" if (latency and compared_to_orjson) else "",
+        )
+    )
+
+buf = io.StringIO()
+buf.write(tabulate(table, headers, tablefmt="grid") + "\n")
+
+print(
+    buf.getvalue()
+    .replace("-", "")
+    .replace("*", "-")
+    .replace("=", "-")
+    .replace("+", "|")
+    .replace("|||||", "")
+    .replace("\n\n", "\n")
+)
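
The latency and "vs. orjson" columns in the README tables come from the arithmetic in `pynumpy`: total `timeit` seconds converted to milliseconds per call, then an integer ratio against the orjson figure. A stdlib-only sketch of that arithmetic follows. The helper names `per_iter_latency_ms` and `relative_to_baseline` and the small `payload` are illustrative assumptions, and `json` stands in for the benchmarked libraries.

```python
import json
from timeit import timeit

ITERATIONS = 10

# Small illustrative payload; the real script uses large numpy arrays.
payload = [[i * j for j in range(100)] for i in range(100)]


def per_iter_latency_ms(total_seconds, iterations=ITERATIONS):
    # timeit returns total seconds for all iterations; convert to ms per call.
    return (total_seconds * 1000) / iterations


def relative_to_baseline(latency_ms, baseline_ms):
    # int() truncates toward zero, matching int(latency / orjson_latency).
    return int(latency_ms / baseline_ms)


total = timeit(lambda: json.dumps(payload), number=ITERATIONS)
latency = per_iter_latency_ms(total)
print(f"json: {latency:,.3f} ms per call")

# Using the int32 table's figures: 2,240 ms vs. 225 ms is about 9.96x,
# which truncation reports as 9.
print(relative_to_baseline(2240, 225))
```

Because the ratio is truncated rather than rounded, the "vs. orjson" column slightly understates the gap whenever the true ratio sits just below a whole number.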
