ITCSsDeveloper
diff --git a/Cargo.lock b/Cargo.lock
Lines changed: 2 additions & 2 deletions
diff --git a/Cargo.toml b/Cargo.toml
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 82 additions & 9 deletions
diff --git a/bench/requirements.txt b/bench/requirements.txt
Lines changed: 2 additions & 0 deletions
diff --git a/ci/azure-linux-container.yml b/ci/azure-linux-container.yml
Lines changed: 4 additions & 0 deletions
diff --git a/ci/azure-posix.yml b/ci/azure-posix.yml
Lines changed: 4 additions & 0 deletions
diff --git a/ci/azure-win.yml b/ci/azure-win.yml
Lines changed: 4 additions & 0 deletions
diff --git a/lint b/lint
Lines changed: 2 additions & 2 deletions
diff --git a/pynumpy b/pynumpy
Lines changed: 121 additions & 0 deletions
@@ -2,7 +2,7 @@
 name = "orjson"
 version = "2.3.0"
 authors = ["ijl <ijl@mailbox.org>"]
-description = "Fast, correct Python JSON library supporting dataclasses and datetimes"
+description = "Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy"
 edition = "2018"
 license = "Apache-2.0 OR MIT"
 repository = "https://github.com/ijl/orjson"
 
@@ -2,16 +2,19 @@
 
 orjson is a fast, correct JSON library for Python. It
 [benchmarks](https://github.com/ijl/orjson#performance) as the fastest Python
-library for JSON and is more correct than the standard json library or
+library for JSON and is more correct than the standard json library or other
 third-party libraries. It serializes
-[dataclass](https://github.com/ijl/orjson#dataclass) and
-[datetime](https://github.com/ijl/orjson#datetime) instances.
+[dataclass](https://github.com/ijl/orjson#dataclass),
+[datetime](https://github.com/ijl/orjson#datetime),
+[numpy](https://github.com/ijl/orjson#numpy), and
+[UUID](https://github.com/ijl/orjson#UUID) instances natively.
 
 Its features and drawbacks compared to other Python JSON libraries:
 
 * serializes `dataclass` instances 40-50x as fast as other libraries
 * serializes `datetime`, `date`, and `time` instances to RFC 3339 format,
 e.g., "1970-01-01T00:00:00+00:00"
+* serializes `numpy.ndarray` instances 3-10x faster than other libraries
 * serializes to `bytes` rather than `str`, i.e., is not a drop-in replacement
 * serializes `str` without escaping unicode to ASCII, e.g., "好" rather than
 "\\\u597d"
@@ -49,8 +52,9 @@ available in the repository.
     2. [datetime](https://github.com/ijl/orjson#datetime)
     3. [float](https://github.com/ijl/orjson#float)
     4. [int](https://github.com/ijl/orjson#int)
-    5. [str](https://github.com/ijl/orjson#str)
-    6. [UUID](https://github.com/ijl/orjson#UUID)
+    5. [numpy](https://github.com/ijl/orjson#numpy)
+    6. [str](https://github.com/ijl/orjson#str)
+    7. [UUID](https://github.com/ijl/orjson#UUID)
 3. [Testing](https://github.com/ijl/orjson#testing)
 4. [Performance](https://github.com/ijl/orjson#performance)
     1. [Latency](https://github.com/ijl/orjson#latency)
@@ -213,6 +217,11 @@ b'"1970-01-01T00:00:00"'
 Serialize `dataclasses.dataclass` instances. For more, see
 [dataclass](https://github.com/ijl/orjson#dataclass).
 
+##### OPT_SERIALIZE_NUMPY
+
+Serialize `numpy.ndarray` instances. For more, see
+[numpy](https://github.com/ijl/orjson#numpy).
+
 ##### OPT_SERIALIZE_UUID
 
 Serialize `uuid.UUID` instances. For more, see
@@ -415,10 +424,10 @@ before calling `dumps()`. If using an unsupported type such as
 
 ### float
 
-orjson serializes and deserializes floats with no loss of precision and
-consistent rounding. The same behavior is observed in rapidjson, simplejson,
-and json. ujson is inaccurate in both serialization and deserialization,
-i.e., it modifies the data.
+orjson serializes and deserializes double precision floats with no loss of
+precision and consistent rounding. The same behavior is observed in rapidjson,
+simplejson, and json. ujson is inaccurate in both serialization and
+deserialization, i.e., it modifies the data.
 
 `orjson.dumps()` serializes NaN, Infinity, and -Infinity, which are not
 compliant JSON, as `null`:
@@ -454,6 +463,70 @@ JSONEncodeError: Integer exceeds 53-bit range
 JSONEncodeError: Integer exceeds 53-bit range
 ```
 
+### numpy
+
+orjson natively serializes `numpy.ndarray` instances. Arrays may have a
+`dtype` of `numpy.int32`, `numpy.int64`, `numpy.float32`, `numpy.float64`,
+or `numpy.bool`. orjson is faster than all compared libraries at serializing
+numpy instances.
+
+Serializing numpy data requires specifying
+`option=orjson.OPT_SERIALIZE_NUMPY`.
+
+```python
+>>> import orjson, numpy
+>>> orjson.dumps(
+        numpy.array([[1, 2, 3], [4, 5, 6]]),
+        option=orjson.OPT_SERIALIZE_NUMPY,
+)
+b'[[1,2,3],[4,5,6]]'
+```
+
+The array must be a contiguous C array (`C_CONTIGUOUS`).
+
+This measures serializing 92MiB of JSON from a `numpy.ndarray` with
+dimensions of `(50000, 100)` and `numpy.float64` values:
+
+| Library    | Latency (ms)   | RSS diff (MiB)   | vs. orjson   |
+|------------|----------------|------------------|--------------|
+| orjson     | 286            | 182              | 1            |
+| nujson     |                |                  |              |
+| rapidjson  | 3,582          | 270              | 12           |
+| simplejson | 3,494          | 259              | 12           |
+| json       | 3,476          | 260              | 12           |
+
+This measures serializing 100MiB of JSON from a `numpy.ndarray` with
+dimensions of `(100000, 100)` and `numpy.int32` values:
+
+| Library    | Latency (ms)   |   RSS diff (MiB) |   vs. orjson |
+|------------|----------------|------------------|--------------|
+| orjson     | 225            |              198 |            1 |
+| nujson     | 2,240          |              246 |            9 |
+| rapidjson  | 2,235          |              462 |            9 |
+| simplejson | 1,686          |              430 |            7 |
+| json       | 1,626          |              430 |            7 |
+
+This measures serializing 53MiB of JSON from a `numpy.ndarray` with
+dimensions of `(100000, 100)` and `numpy.bool` values:
+
+| Library    | Latency (ms)   |   RSS diff (MiB) |   vs. orjson |
+|------------|----------------|------------------|--------------|
+| orjson     | 121            |               53 |            1 |
+| nujson     | 5,958          |               43 |           49 |
+| rapidjson  | 482            |              101 |            3 |
+| simplejson | 671            |              126 |            5 |
+| json       | 609            |              127 |            5 |
+
+In these benchmarks, nujson is used instead of ujson. orjson and nujson
+serialize natively; the other libraries use `ndarray.tolist()`. The `nujson`
+row is blank when it did not roundtrip the data accurately. The RSS diff
+column measures peak memory use during serialization, relative to a baseline
+taken before the runs. The odd bool result for nujson is consistent.
+
+orjson does not have an installation or compilation dependency on numpy. The
+implementation is independent, reading `numpy.ndarray` using
+`PyArrayInterface`.
+
 ### str
 
 orjson is strict about UTF-8 conformance. This is stricter than the standard
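
The numpy section of the README hunk above notes that the non-native libraries are benchmarked through `ndarray.tolist()`. A minimal sketch of that fallback pattern, using only the standard library: `array.array` stands in for `numpy.ndarray` here purely because it also exposes `tolist()`, and `dumps_via_tolist` is a hypothetical helper name, not part of orjson or the benchmark.

```python
import array
import json


def default(obj):
    """Fallback for json.dumps: convert array-like objects to plain lists."""
    # Assumption: anything exposing tolist() (numpy.ndarray, array.array)
    # converts to a (possibly nested) list of JSON-compatible values.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dumps_via_tolist(obj) -> bytes:
    """Serialize to compact UTF-8 bytes, encoding the str output as the benchmark does."""
    return json.dumps(obj, default=default, separators=(",", ":")).encode("utf-8")


values = array.array("i", [1, 2, 3])  # stand-in for a numpy.ndarray
encoded = dumps_via_tolist({"row": values})
print(encoded)  # b'{"row":[1,2,3]}'
```

Unlike the `default` in the `pynumpy` script, which implicitly returns `None` for unrecognized types, this sketch raises `TypeError` so that unsupported objects are not silently written as `null`.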
 
@@ -1,4 +1,6 @@
 matplotlib
+memory-profiler
+nujson
 pytest-benchmark
 python-rapidjson
 simplejson
 
@@ -19,6 +19,10 @@ steps:
   displayName: install
 - bash: PATH=$(path) pytest -s -rxX -v test
   displayName: pytest
+- bash: pip uninstall -y numpy
+  displayName: remove optional packages
+- bash: pytest -s -rxX -v test
+  displayName: pytest without optional packages
 - bash: PATH=$(path) ./integration/run thread
   displayName: thread
 - bash: PATH=$(path) ./integration/run http
 
@@ -18,6 +18,10 @@ steps:
   displayName: install
 - bash: pytest -s -rxX -v test
   displayName: pytest
+- bash: pip uninstall -y numpy
+  displayName: remove optional packages
+- bash: pytest -s -rxX -v test
+  displayName: pytest without optional packages
 - bash: ./integration/run thread
   displayName: thread
 - bash: ./integration/run http
 
@@ -22,6 +22,10 @@ steps:
   displayName: install
 - script: python.exe -m pytest -s -rxX -v test
   displayName: pytest
+- script: python.exe -m pip uninstall -y numpy
+  displayName: remove optional packages
+- script: python.exe -m pytest -s -rxX -v test
+  displayName: pytest without optional packages
 - script: python.exe integration\thread
   displayName: thread
 - bash: ./deploy /d/a/1/s/target/wheels/*.whl
 
@@ -1,6 +1,6 @@
 #!/usr/bin/bash -e
 
 autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports .
-isort ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort
-black ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort
+isort ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort pynumpy
+black ./bench/*.py ./orjson.pyi ./test/*.py pydataclass pymem pysort pynumpy
 mypy --ignore-missing-imports ./bench/*.py ./orjson.pyi ./test/*.py
@@ -0,0 +1,121 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: (Apache-2.0 OR MIT)
+
+import gc
+import io
+import json
+import os
+import sys
+import time
+from timeit import timeit
+
+import nujson
+import numpy
+import orjson
+import psutil
+import rapidjson
+import simplejson
+from memory_profiler import memory_usage
+from tabulate import tabulate
+
+os.sched_setaffinity(os.getpid(), {0, 1})
+
+
+kind = sys.argv[1] if len(sys.argv) > 1 else ""
+
+if kind == "int32":
+    array = numpy.random.randint(((2 ** 31) - 1), size=(100000, 100), dtype=numpy.int32)
+elif kind == "float64":
+    array = numpy.random.random(size=(50000, 100))
+    assert array.dtype == numpy.float64
+elif kind == "bool":
+    array = numpy.random.choice((True, False), size=(100000, 100))
+else:
+    print("usage: pynumpy (bool|int32|float64)")
+    sys.exit(1)
+
+output_in_mib = len(orjson.dumps(array.tolist())) / 1024 / 1024
+
+print(f"{output_in_mib:,.1f}MiB {kind} output (orjson)")
+
+proc = psutil.Process()
+
+
+def default(__obj):
+    if isinstance(__obj, numpy.ndarray):
+        return __obj.tolist()
+
+
+headers = ("Library", "Latency (ms)", "RSS diff (MiB)", "vs. orjson")
+
+LIBRARIES = ("orjson", "nujson", "rapidjson", "simplejson", "json")
+
+ITERATIONS = 10
+
+orjson_dumps = lambda: orjson.dumps(array, option=orjson.OPT_SERIALIZE_NUMPY)
+nujson_dumps = lambda: nujson.dumps(array).encode("utf-8")
+rapidjson_dumps = lambda: rapidjson.dumps(array, default=default).encode("utf-8")
+simplejson_dumps = lambda: simplejson.dumps(array, default=default).encode("utf-8")
+json_dumps = lambda: json.dumps(array, default=default).encode("utf-8")
+
+gc.collect()
+mem_before = proc.memory_full_info().rss / 1024 / 1024
+
+
+def per_iter_latency(val):
+    if val is None:
+        return None
+    return (val * 1000) / ITERATIONS
+
+
+def test_correctness(func):
+    return orjson.loads(func()) == array.tolist()
+
+
+table = []
+for lib_name in LIBRARIES:
+    gc.collect()
+
+    print(f"{lib_name}...")
+    func = locals()[f"{lib_name}_dumps"]
+    total_latency = timeit(func, number=ITERATIONS,)
+    latency = per_iter_latency(total_latency)
+    time.sleep(1)
+    mem = max(memory_usage((func,), interval=0.001, timeout=latency * 2))
+    correct = test_correctness(func)
+
+    if lib_name == "orjson":
+        compared_to_orjson = 1
+        orjson_latency = latency
+    elif latency:
+        compared_to_orjson = int(latency / orjson_latency)
+    else:
+        compared_to_orjson = None
+
+    if not correct:
+        latency = None
+        mem = 0
+
+    mem_diff = mem - mem_before
+
+    table.append(
+        (
+            lib_name,
+            f"{latency:,.0f}" if latency else "",
+            f"{mem_diff:,.0f}" if mem else "",
+            f"{compared_to_orjson}" if (latency and compared_to_orjson) else "",
+        )
+    )
+
+buf = io.StringIO()
+buf.write(tabulate(table, headers, tablefmt="grid") + "\n")
+
+print(
+    buf.getvalue()
+    .replace("-", "")
+    .replace("*", "-")
+    .replace("=", "-")
+    .replace("+", "|")
+    .replace("|||||", "")
+    .replace("\n\n", "\n")
+)
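
The script above derives its "Latency (ms)" and "vs. orjson" columns from a `timeit` total over `ITERATIONS` calls. The following standard-library sketch isolates that arithmetic; `per_iter_latency_ms` and `relative_to_baseline` are illustrative names rather than functions from the script, and `json.dumps` on a small nested list is only a placeholder workload.

```python
import json
from timeit import timeit

ITERATIONS = 10  # same repetition count as the benchmark script


def per_iter_latency_ms(total_seconds: float, iterations: int) -> float:
    """Convert a timeit total (seconds) into milliseconds per call."""
    return (total_seconds * 1000) / iterations


def relative_to_baseline(latency_ms: float, baseline_ms: float) -> int:
    """Truncate the slowdown factor to an integer, as int() does in the script."""
    return int(latency_ms / baseline_ms)


# Placeholder workload: a 100x100 nested list of ints.
payload = [[i * j for j in range(100)] for i in range(100)]
total = timeit(lambda: json.dumps(payload), number=ITERATIONS)
latency = per_iter_latency_ms(total, ITERATIONS)
print(f"json: {latency:,.3f} ms per call")

# Rounded float64 latencies from the README table: rapidjson vs. orjson.
print(relative_to_baseline(3582, 286))  # 12
```

Because the ratio is truncated rather than rounded, a library measured at 2,240 ms against a 225 ms baseline is reported as 9, not 10, which is consistent with the int32 table in the README hunk.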