From f07b032c0a023936f3a0cbb451e13456e859ae92 Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Tue, 10 Jun 2025 15:03:54 -0700
Subject: [PATCH 01/20] Documenting plan for unified storage stack

---
 docs/future-tasks/unified-storage-stack.md | 107 +++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 docs/future-tasks/unified-storage-stack.md

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
new file mode 100644
index 000000000..d7e3be8e3
--- /dev/null
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -0,0 +1,107 @@
+# Unifying the storage stack
+
+## Current state
+
+Right now we have a few overlapping and uncoordinated layers of storage-related
+functionality that need reconciling. Most importantly, we see data loss since
+the transaction boundaries don't line up, but it's also a lot of code that can
+be simplified away.
+
+Specifically we have:
+
+- I/O over iframe boundaries, typically with the iframes running React, which in
+  turn assumes synchronous state. So data can roundtrip through iframe/React and
+  overwrite newer data that came in in the meantime.
+- Scheduler executing event handlers and reactive functions, which would form a
+  natural transaction boundary -- especially for event handlers to re-run on
+  newer data to rebase changes -- but those boundaries don't mean anything to
+  the rest of the stack. The only thing is that we make sure data isn't changed
+  while a handler/lifted function is running (await idle in storage.ts).
+- DocImpl instances, which represent the data exposed to user code, typically
+  via Cell. Changes to them are directly applied to the data, and listeners are
+  notified. The only two listeners are the scheduler, which uses this to mark
+  data dirty and schedule the respective reactive functions, and storage.ts,
+  which adds those documents to the next batch to be processed.
+- storage.ts which connects the lower storage layer with DocImpl. It wants to
+  make sure that the upper layers see a consistent view, so when a new document
+  contains a link to another document, it'll fetch that before updating the doc,
+  recursively. This also means that changes from the upper layer can accumulate,
+  and then altogether become one transaction. If there is one conflict anywhere,
+  the entire transaction is rejected. And while the actual conflict source gets
+  eventually updated (since the server will send these, and document that is
+  read is also being subscribed to) the other documents that were locally
+  changed are not reverted. The clients get out of sync.
+  - We now also have schema queries, which will immediately fetch all documents
+    that are needed per a given schema, and will keep those up to date, even if
+    links change. That could already replace a lot of the logic above, but we
+    haven't turned that off. It also currently doesn't use the cache.
+- storage/cache.ts, the memory layer, which operates at the unit of documents,
+  supports CAS semantics for transactions. It's used by storage.ts only and
+  while it has stronger guarantees, those either don't apply or sometimes
+  backfire because they are not aligned with the top. It has a cache but we
+  underuse it.
+
+## Desired state
+
+- Iframe transport layer sends incrementing versions and ignores changes that
+  are based on too old changes. It's then up to the inside of the iframe to use
+  that correctly. In fact `useDoc()` where `setX()` takes a callback instead of
+  just the new value would already work much better. Probably sufficiently well
+  for most cases. But we could go even further (maybe some popular game toolkits
+  are worth investigating here at some point in the future, since that's a good
+  use case for iframes).
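
A minimal sketch of what such a version guard could look like on the container
side -- all names here are assumptions for illustration, not the existing
transport API:

```typescript
// Hypothetical version-guarded transport between container and iframe.
// Every key carries a monotonically increasing version; updates that were
// computed against an older version are dropped and the iframe is asked to
// rebase instead of silently overwriting newer data.
type VersionedUpdate = {
  key: string;
  value: unknown;
  basedOn: number; // version of the state the iframe derived this update from
};

const versions = new Map<string, number>();

function onIframeUpdate(
  update: VersionedUpdate,
  apply: (key: string, value: unknown) => void,
  requestRebase: (key: string, version: number) => void,
): void {
  const current = versions.get(update.key) ?? 0;
  if (update.basedOn < current) {
    // Stale: newer data was sent to the iframe after this update was computed.
    requestRebase(update.key, current);
    return;
  }
  versions.set(update.key, current + 1);
  apply(update.key, update.value);
}
```

Inside the iframe, the callback form of `setX` pairs naturally with this: on a
rebase request the updater function can simply be re-run against the newest
value.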
+- Cells -- constructed typically by the runner when bringing up a recipe, within
+  handlers or reactive functions and in some cases by parts of the shell to
+  bootstrap things -- directly read from memory via schema query. They
+  accumulate writes and wait for the scheduler to close out a transaction.
+  Interim reads see the new version.
+- Scheduler runs handlers and reactive functions and then issues a transaction
+  with pending writes directly to the underlying memory layer (we already log
+  reads and writes, so this can be an extension of that). It registers with the
+  underlying memory layer for changes on individual documents, marking the
+  corresponding reactive functions as needing to run (semantically we want to
+  subscribe to the corresponding schema queries, but at least with the current
+  queries, listening to the actually read docs is the same). For events it will
+  keep track of the transaction and if it fails, and after we're sure to have
+  caught up enough with the server to reflect the new current state, retry up to
+  N times.
+- Memory -- more or less like now, except that its lower level API is directly
+  exposed to cells, including `the` and the document ids as DIDs (so the Cell
+  will have to translate the ids and prepend `of:`)
+
+## Steps to get there
+
+- Ephemeral storage provider + Get rid of `VolatileStorageProvider` CT-420
+- Schema queries for everything + Source support CT-174 CT-428
+- Turn off "crawler" more in storage.ts
+- Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use inside
+  `Cell` should remain)
+  - Includes changing all places that expect `{ cell: DocImpl, … }` to just use
+    the JSON representation. At the same time, let's support the new syntax for
+    links (@irakli has these in a RFC, should be extracted)
+- Capture all cell writes in a pending list (this is also needed for a future
+  recipe refactoring)
+  - For reads after writes do return pending writes, so maybe we just apply
+    writes anyway on a copy. Then after flushing the pending writes (i.e. they
+    get written to the nursery), we reset that until the next get. Make sure
+    this still works with `QueryResultProxy` (might have to retarget to changed
+    objects). TBD: when to make copies, and can we work directly on the copy in
+    the nursery?
+  - Note that pending writes might contain `Cell` objects. Those would be
+    converted to links in JSON
+- Directly read & write to memory layer
+  - Expose the API below current StorageProvider to `Cell`. That includes `Cell`
+    setting `the` to `application/json`, etc.
+  - Read: `Cell` bypasses DocImpl and just reads from memory
+  - Scheduler: when listening to changes on entities, directly talk to memory
+  - Writes: Commit writes after each handler or lift is run as transaction
+- Remove `storage.ts` and `DocImpl`, they are now skipped
+- For events, remember event and corresponding write transaction. Clear on
+  success and retry N times on conflict (a sketch of this loop follows below).
+  Retry means running the event handler again on the newest state (for lifted
+  functions this happens automatically as they get marked dirty)
+  - For change sets that only write (e.g. only push or set), we could just
+    reapply it without re-running. But that's a future optimization.
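
A rough sketch of the retry loop described in the item above; `runHandler`,
`commit`, and `awaitCatchUp` are hypothetical stand-ins, not existing
functions:

```typescript
// Hypothetical retry of an event handler after a conflicted transaction.
type Tx = { reads: string[]; writes: Map<string, unknown> };

async function dispatchEvent(
  event: unknown,
  runHandler: (event: unknown) => Tx, // re-reads the newest state each time
  commit: (tx: Tx) => Promise<"ok" | "conflict">,
  awaitCatchUp: () => Promise<void>, // wait until server state is reflected
  maxRetries = 3,
): Promise<boolean> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const tx = runHandler(event);
    if ((await commit(tx)) === "ok") return true;
    await awaitCatchUp(); // rebase on the new current state before retrying
  }
  return false; // out of retries; surface the failure
}
```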
+ - Memory layer with pending changes after a conflicted write: rollback to heap + and notify that as changes where it changed things +- Sanitize React at least a bit by implement CT-320 From e4df494a23148e8d429fe39aca7c808a8865580e Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Tue, 10 Jun 2025 16:56:04 -0700 Subject: [PATCH 02/20] updated tasks, added design notes --- docs/future-tasks/unified-storage-stack.md | 215 +++++++++++++++++---- 1 file changed, 175 insertions(+), 40 deletions(-) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index d7e3be8e3..358cd3e7e 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -25,12 +25,13 @@ Specifically we have: - storage.ts which connects the lower storage layer with DocImpl. It wants to make sure that the upper layers see a consistent view, so when a new document contains a link to another document, it'll fetch that before updating the doc, - recursively. This also means that changes from the upper layer can accumulate, - and then altogether become one transaction. If there is one conflict anywhere, - the entire transaction is rejected. And while the actual conflict source gets - eventually updated (since the server will send these, and document that is - read is also being subscribed to) the other documents that were locally - changed are not reverted. The clients get out of sync. + recursively. It also fetches all source docs. This also means that changes + from the upper layer can accumulate, and then altogether become one + transaction. If there is one conflict anywhere, the entire transaction is + rejected. And while the actual conflict source gets eventually updated (since + the server will send these, and document that is read is also being subscribed + to) the other documents that were locally changed are not reverted. The + clients get out of sync. - We now also have schema queries, which will immediately fetch all documents that are needed per a given schema, and will keep those up to date, even if links change. That could already replace a lot of the logic above, but we @@ -71,37 +72,171 @@ Specifically we have: ## Steps to get there -- Ephemeral storage provider + Get rid of `VolatileStorageProvider` CT-420 -- Schema queries for everything + Source support CT-174 CT-428 -- Turn off "crawler" more in storage.ts -- Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use inside - `Cell` should remain) - - Includes changing all places that expect `{ cell: DocImpl, … }` to just use - the JSON representation. At the same time, let's support the new syntax for - links (@irakli has these in a RFC, should be extracted) -- Capture all cell writes in a pending list (this is also needed for a future - recipe refactoring) - - For reads after writes do return pending writes, so maybe we just apply - writes anyway on a copy. then after flushing the pending writes (i.e. they - get written to the nursery), we reset that until the next get. make sure - this still work with `QueryResultProxy` (might have to retarget to changed - objects). TBD: when to make copies, and can we work directly on the copy in - the nursery? - - Note that pending writes might contain `Cell` objects. Those would be - converted to links in JSON -- Directly read & write to memory layer - - Expose the API below current StorageProvider to `Cell`. That includes `Cell` - setting the to application/json, etc. 
- - Read: `Cell` bypasses DocImpl and just reads from memory - - Scheduler: when listening to changes on entities, directly talk to memory - - Writes: Commit writes after each handler or lift is run as transaction -- Remove `storage.ts` and `DocImpl`, they are now skipped -- For events, remember event and corresponding write transaction. Clear on - success and retry N times on conflict. Retry means running the event handler - again on the newest state (for lifted functions this happens automatically as - they get marked dirty) - - For change sets that only write (e.g. only push or set), we could just - reapply it without re-running. But that's a future optimization. - - Memory layer with pending changes after a conflicted write: rollback to heap - and notify that as changes where it changed things -- Sanitize React at least a bit by implement CT-320 +- [ ] Ephemeral storage provider + Get rid of `VolatileStorageProvider` CT-420 +- [ ] Schema queries for everything + Source support CT-174 CT-428 +- [ ] Turn off "crawler" more in storage.ts, make sure things still work +- [ ] Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use inside + `Cell` should remain) + - [ ] Includes changing all places that expect `{ cell: DocImpl, … }` to just + use the JSON representation. At the same time, let's support the new + syntax for links (@irakli has these in a RFC, should be extracted) +- [ ] Capture all cell writes in a pending list (this is also needed for a + future recipe refactoring) + - [ ] For reads after writes do return pending writes, so maybe we just apply + writes anyway on a copy. then after flushing the pending writes (i.e. + they get written to the nursery), we reset that until the next get. make + sure this still work with `QueryResultProxy` (might have to retarget to + changed objects). TBD: when to make copies, and can we work directly on + the copy in the nursery? + - [ ] Note that pending writes might contain `Cell` objects. Those would be + converted to links in JSON +- [ ] Directly read & write to memory layer + - [ ] Expose the API below current StorageProvider to `Cell`. That includes + `Cell` setting the to application/json, etc., probably a subset of + `Replica`. + - [ ] Add an `await runtime.idle()` equivalent before processing data from web + socket (see design note below) + - [ ] Read: `Cell` bypasses DocImpl and just reads from memory + - [ ] Scheduler: when listening to changes on entities, directly talk to + memory + - [ ] Writes: Commit writes after each handler or lift is run as transaction +- [ ] Remove `storage.ts` and `DocImpl`, they are now skipped +- [ ] For events, remember event and corresponding write transaction. Clear on + success and retry N times on conflict. Retry means running the event + handler again on the newest state (for lifted functions this happens + automatically as they get marked dirty) + - [ ] For change sets that only write (e.g. only push or set), we could just + reapply it without re-running. But that's a future optimization. + - [ ] Memory layer with pending changes after a conflicted write: rollback to + heap and notify that as changes where it changed things +- [ ] Sanitize React at least a bit by implement CT-320 + +## Design notes + +### Functions must see a consistent state + +We need to lock versions while executing a handler/reactive function? I.e. if an +update comes from the server after the function started, and `.get()` is called, +we need to return the state from the point when it was called? 
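
One way to picture the question, as a hypothetical copy-on-first-update
snapshot (nothing like this exists yet; it is only here to make the trade-off
concrete before weighing the considerations below):

```typescript
// Hypothetical snapshot so a running handler keeps seeing the state from the
// moment it started, even if server updates arrive mid-run.
class RunSnapshot {
  private preserved = new Map<string, unknown>();

  // Called when a server update would overwrite a doc during the run:
  // keep the old value around instead of blocking the update.
  preserve(docId: string, valueBeforeUpdate: unknown): void {
    if (!this.preserved.has(docId)) {
      this.preserved.set(docId, valueBeforeUpdate);
    }
  }

  // `.get()` during the run prefers the preserved value.
  read(docId: string, readLive: () => unknown): unknown {
    return this.preserved.has(docId) ? this.preserved.get(docId) : readLive();
  }
}
```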
+Considerations:
+
+- It's almost certainly going to cause issues if the function sees data from
+  different points of time, even if they are internally self-consistent.
+- We don't know which cells, especially which cells linked from cells, the
+  function will read, so making a copy of all of those is overkill.
+- That said, the functions are meant to be synchronous without any awaits in
+  them. We have exceptions (the importers) right now, but it's ok to behave as
+  if they were (i.e. stop everything else). These might become async from the
+  outside later, e.g. we can pause execution in a wasm sandbox to fetch more
+  data across a Worker boundary or so.
+- Hence, for now we can do the equivalent of `await runtime.idle()` before even
+  processing data coming from the websocket, and thus circumvent this question.
+  It really just shifts processing of incoming data to after the current run,
+  and that's anyway the intended effect. In fact we should even apply the
+  writes before processing server-side data, then everything will be based on
+  the correct cause.
+
+### Schema queries
+
+Schema queries are how we maintain the invariant that a `.get()` on a cell
+should return a consistent view _across_ several documents. This replaces the
+current "crawler" mode in storage.ts, which is what most of the batch logic
+actually does.
+
+Specifically we rely on the server observing a change in any of the documents
+that were returned last time, rerunning the query, and sending updates to the
+client about all documents that are now returned by the query.
+
+#### Schema queries & cache
+
+We have to store queries in the cache as well, noting for which `since` we're
+sure it is up to date. In fact we want to point to a session id from each
+query, and the session id notes the last `since`. That's because once a
+subscription is up, all we need are new versions of documents, we don't need
+the association of which query they belonged to. And so all currently active
+queries are always current to the last update.
+
+So when a new query is issued, we
+
+- issue the query to the server with a `since` from the cache or `-1` (to be
+  confirmed) indicating that it never ran.
+- if it is in the cache, run the query against the cache, and see whether any
+  documents are newer than the `since` for the query. If not, we can serve the
+  current cached version immediately. If yes, the state might be inconsistent
+  and we have to wait for the server (in the future we might want to keep older
+  versions for this reason)
+
+The server builds state of what documents the client already has at what version
+by running the queries locally and assuming that the client already has all the
+documents for the sent `since`. It is hence advantageous to send queries that
+are in the cache before any non-cached queries, to the degree that is in our
+control. Maybe batch them for a microtick?
+
+#### What happens if new data is needed
+
+A handler or reactive function might change data in such a way that a
+subsequent reactive function has a query as input that is now incomplete. We
+need to define what should happen in this case.
+
+An example is changing the currently selected item based on a list that has
+just the names of the items and a detail view reacting to it that shows all
+details. It might have to fetch the details now.
+
+The two options are:
+
+- Return `undefined` at the next highest point in the structure that is
+  optional. Eventually the function will be called again with the full data.
+  This seems reasonable, except if an event will be based on this incomplete
+  data and thus write incomplete data back to the store -- CAS will catch most
+  of these cases, but I can imagine UI roundtrips, especially if the user is
+  acting on stale mental state and races the UI with a click, where this
+  breaks.
+- Skip execution entirely, effectively returning `` and letting higher
+  layers deal with it. (A pattern we might want to adopt for `llm` and other
+  calls as well, see CT-394)
+
+Current implementation just returns `undefined` where data is missing and might
+cause errors. We shouldn't block building the above on resolving this though.
+
+## Future work that is related
+
+### Changing recipe creation to just use `Cell` and get rid of `OpaqueRef`
+
+The accumulation of writes when running a reactive function / handler allows us
+to create a graph of pending cells in them and treat those as a recipe. With the
+addition of marking cells as opaque we then have all the functionality of
+`OpaqueRef` and can replace all of that code.
+
+### Single event sourcing model + server-side query prediction
+
+Instead of sending transactions to the server we can send the original events
+instead and run the event handler server-side on what is guaranteed to be the
+canonical state. We thus get a simpler event-sourcing system.
+
+The code above becomes speculative execution that reapplies pending events on
+newer data until it is fully in sync again.
+
+Note that events can still conflict (unless they are inherently conflict-free
+of course). The client-server API then changes to both sending events and
+getting confirmations/rejections back. The client could reissue a new event in
+some rarer cases, but sending the same event again won't be needed, as it was
+already run on the current state. (That said, we can still support "sending an
+event" that is just a basic patch and then re-issue a new patch based on that.
+That's really only useful for when we can't run the handler on the server,
+e.g. in some iframe use-cases)
+
+Note that since all reactive functions can run on the server as well, all that
+work is latency reduction on the client and doesn't need to be synced. The
+optimization is the other way around: If we know that a client will run these
+reactive functions we don't need to send that state from the server (in fact
+often we don't need to save it at all, especially if recomputation is cheap,
+but that's yet another future optimization). This falls under a larger
+opportunity to look at a computation graph and decide which parts the client
+should even run (e.g. prune parts that only feed into something that must run
+on the server, e.g. any external fetches or API calls)
+
+#### Possible bonus: Server could predict most queries
+
+At this point, knowing just a few things about the client, e.g. what UI is
+shown, we can reconstruct enough of the remaining client state server-side to
+predict what schema queries the client would send and just proactively run
+those and sync the documents. The main problem here is that we don't know the
+state of the cache. Maybe there is an analogous roll-up for cache state (which
+as noted above is really a map from queries to `since`), e.g. just remembering
+the `since` for a given place in the UI and the rest follows from there?
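
To make that "map from queries to `since`" concrete, a minimal sketch -- the
key derivation is an assumption; anything stable over entity id plus schema
would do:

```typescript
// Hypothetical roll-up of client cache state: which query is known to be
// up to date as of which `since` (a monotonic, space-level timestamp).
type QueryKey = string; // e.g. derived from entity id + schema

class QueryCacheState {
  private since = new Map<QueryKey, number>();

  // Record the `since` confirmed by the server for a query.
  record(key: QueryKey, confirmed: number): void {
    const prev = this.since.get(key) ?? -1;
    if (confirmed > prev) this.since.set(key, confirmed);
  }

  // What the client sends when (re-)issuing the query: the last `since` it
  // is sure about, or -1 if the query never ran.
  sinceFor(key: QueryKey): number {
    return this.since.get(key) ?? -1;
  }
}
```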
From 96220a43b12273c63004a2101f59841eef11fc0e Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Tue, 10 Jun 2025 17:20:31 -0700 Subject: [PATCH 03/20] added more detail --- docs/future-tasks/unified-storage-stack.md | 48 +++++++++++++++++++--- 1 file changed, 43 insertions(+), 5 deletions(-) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index 358cd3e7e..858d6fa5b 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -40,7 +40,10 @@ Specifically we have: supports CAS semantics for transactions. It's used by storage.ts only and while it has stronger guarantees, those either don't apply or sometimes backfire because they are not aligned with the top. It has a cache but we - underuse it. + underuse it. Key implementation details: + - Heap (cache) and nursery (pending changes) separation + - WebSocket sync with `merge()` for server updates + - Schema query support exists but incomplete (see pull() at line 1082) ## Desired state @@ -72,16 +75,30 @@ Specifically we have: ## Steps to get there +This plan should be entirely incremental and can be rolled out step by step. + - [ ] Ephemeral storage provider + Get rid of `VolatileStorageProvider` CT-420 - [ ] Schema queries for everything + Source support CT-174 CT-428 -- [ ] Turn off "crawler" more in storage.ts, make sure things still work + - Currently in `storage/cache.ts:pull()` schema queries are partially + implemented but don't use the cache effectively. + - Need to implement cache storage for queries with `since` tracking (see + cache.ts:1108 and see design note below) +- [ ] Turn off "crawler" mode in storage.ts, make sure things still work + - The crawler is in `storage.ts:_processCurrentBatch()` (lines 478-577) which + recursively loads dependencies + - Key areas: loading promises map (line 84), dependency tracking, and batch + processing + - Watch for the FIXME at line 84 about keying by doc+schema combination - [ ] Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use inside - `Cell` should remain) + `Cell`, scheduler (just `.updates()`) and storage.ts should remain for + now) - [ ] Includes changing all places that expect `{ cell: DocImpl, … }` to just use the JSON representation. At the same time, let's support the new syntax for links (@irakli has these in a RFC, should be extracted) - [ ] Capture all cell writes in a pending list (this is also needed for a future recipe refactoring) + - [ ] Currently Cell.push() has retry logic (cell.ts:366-381) that needs to + move to transaction level - [ ] For reads after writes do return pending writes, so maybe we just apply writes anyway on a copy. then after flushing the pending writes (i.e. they get written to the nursery), we reset that until the next get. make @@ -93,14 +110,22 @@ Specifically we have: - [ ] Directly read & write to memory layer - [ ] Expose the API below current StorageProvider to `Cell`. That includes `Cell` setting the to application/json, etc., probably a subset of - `Replica`. + `Replica`. 
Key APIs in cache.ts: `push()` (line 878), `pull()` (line + 1082), `merge()` (line 1287) - [ ] Add an `await runtime.idle()` equivalent before processing data from web socket (see design note below) - [ ] Read: `Cell` bypasses DocImpl and just reads from memory - [ ] Scheduler: when listening to changes on entities, directly talk to - memory + memory (currently goes through storage.ts listeners) - [ ] Writes: Commit writes after each handler or lift is run as transaction + (and so updates nursery) + - Scheduler already tracks action completion (scheduler.ts:554-598) + - Need to hook transaction commit after `runAction()` completes - [ ] Remove `storage.ts` and `DocImpl`, they are now skipped + - storage.ts has 1000+ lines of complex batching logic to remove + - DocImpl in doc.ts is ~300 lines + - Also removed Cell.push conflict logic (line 922) since the corresponding + parts are also being removed. - [ ] For events, remember event and corresponding write transaction. Clear on success and retry N times on conflict. Retry means running the event handler again on the newest state (for lifted functions this happens @@ -110,6 +135,19 @@ Specifically we have: - [ ] Memory layer with pending changes after a conflicted write: rollback to heap and notify that as changes where it changed things - [ ] Sanitize React at least a bit by implement CT-320 + - Current iframe transport has TODO at + iframe-sandbox/common-iframe-sandbox.ts:212 + - No version tracking causes overwrites with stale React state + +## Open questions + +- [ ] Debug tools we should build to support this future state +- [ ] Behavior for clients that are offline for a while and then come back + online while there were changes. By default we'd just drop all of those, + but we would notice that explicitly. Unlike rejections that happen quickly + and users can react to in real-time, this might need something more + sophisticated. +- [ ] Recovery flows for e.g. corrupted ## Design notes From 653a1d3f4367fcf10a845cc460e5705d5ec36d64 Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Tue, 10 Jun 2025 18:39:15 -0700 Subject: [PATCH 04/20] Marked cache support as not blocking & added details for DocImpl replacement --- docs/future-tasks/unified-storage-stack.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index 858d6fa5b..ead2fad3b 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -79,10 +79,7 @@ This plan should be entirely incremental and can be rolled out step by step. - [ ] Ephemeral storage provider + Get rid of `VolatileStorageProvider` CT-420 - [ ] Schema queries for everything + Source support CT-174 CT-428 - - Currently in `storage/cache.ts:pull()` schema queries are partially - implemented but don't use the cache effectively. - - Need to implement cache storage for queries with `since` tracking (see - cache.ts:1108 and see design note below) + - See design note on cache, but that's not blocking progress on the rest - [ ] Turn off "crawler" mode in storage.ts, make sure things still work - The crawler is in `storage.ts:_processCurrentBatch()` (lines 478-577) which recursively loads dependencies @@ -92,6 +89,7 @@ This plan should be entirely incremental and can be rolled out step by step. 
- [ ] Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use inside
  `Cell`, scheduler (just `.updates()`) and storage.ts should remain for now)
+ - [ ] Add `.setRaw` and `.getRaw` to the internal `Cell` interface and use the
+       cell creation methods on `Runtime`; then almost all uses of `DocImpl`
+       can be replaced by cells, using `.[set|get]Raw` instead of
+       `.[set|send|get]`
- [ ] Includes changing all places that expect `{ cell: DocImpl, … }` to just
  use the JSON representation. At the same time, let's support the new syntax
  for links (@irakli has these in a RFC, should be extracted)

From b4325d53d84453b194ab1b7e24473bc2d3a2a883 Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Wed, 11 Jun 2025 18:41:13 -0700
Subject: [PATCH 05/20] Clarified that server-side we need to also run the
 query at a historic time first to compute client state.

---
 docs/future-tasks/unified-storage-stack.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index ead2fad3b..88a2cb4fa 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -200,8 +200,8 @@ So when a new query is issued, we
versions for this reason)

The server builds state of what documents the client already has at what version
-by running the queries locally and assuming that the client already has all the
-documents for the sent `since`. It is hence advantageous to send queries that
+by running the queries server side _at that sent `since`_ and assuming that the client already has all the
+documents for that `since`. It is hence advantageous to send queries that
are in the cache before any non-cached queries, to the degree that is in our
control. Maybe batch them for a microtick?

From 3cf8720ded6ad210efa59319767387ef2b05f420 Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Wed, 11 Jun 2025 09:18:29 -0700
Subject: [PATCH 06/20] addressed @ellyxir's feedback in the PR

---
 docs/future-tasks/unified-storage-stack.md | 137 ++++++++++++++-------
 1 file changed, 94 insertions(+), 43 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 88a2cb4fa..04ed9dec6 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -11,7 +11,15 @@ Specifically we have:
- I/O over iframe boundaries, typically with the iframes running React, which in
  turn assumes synchronous state. So data can roundtrip through iframe/React and
-  overwrite newer data that came in in the meantime.
+  overwrite newer data that came in in the meantime. E.g. a user event happens,
+  state X is updated in React, while new data is waiting in the iframe's message
+  queue: Now an update based on older data is sent to the container, but since
+  there is no versioning at this layer, it is treated as updating the current
+  data. Meanwhile the iframe processes the queued-up update, and is now out of
+  sync with the container. Note that this is a pretty tight race condition: Some
+  event processing coincides exactly with receiving data updates. It's rare, but
+  we've seen this happen when tabs get woken up again and a lot of pent-up work
+  happens all at once.
- Scheduler executing event handlers and reactive functions, which would form a
  natural transaction boundary -- especially for event handlers to re-run on
  newer data to rebase changes -- but those boundaries don't mean anything to
  the rest of the stack. The only thing is that we make sure data isn't changed
  while a handler/lifted function is running (await idle in storage.ts).
- DocImpl instances, which represent the data exposed to user code, typically
  via Cell. Changes to them are directly applied to the data, and listeners are
  notified. The only two listeners are the scheduler, which uses this to mark
  data dirty and schedule the respective reactive functions, and storage.ts,
  which adds those documents to the next batch to be processed.
- storage.ts which connects the lower storage layer with DocImpl. It wants to
  make sure that the upper layers see a consistent view, so when a new document
  contains a link to another document, it'll fetch that before updating the doc,
-  recursively. It also fetches all source docs. This also means that changes
-  from the upper layer can accumulate, and then altogether become one
-  transaction. If there is one conflict anywhere, the entire transaction is
-  rejected. And while the actual conflict source gets eventually updated (since
-  the server will send these, and document that is read is also being subscribed
-  to) the other documents that were locally changed are not reverted. The
-  clients get out of sync.
+  recursively. It also fetches all source docs (i.e. `doc.sourceCell`), also
+  recursively.
+  - This also means that changes from the upper layer can accumulate while all
+    this loading happens, and then altogether become one transaction: And if
+    there is one conflict anywhere, the entire transaction is rejected. And
+    while the actual conflict source gets eventually updated (since the server
+    will send these, and document that is read is also being subscribed to) the
+    other documents that were locally changed are not reverted. The clients get
+    out of sync.
+  - Also, if new data arrives from the server that overwrites local data that
+    was just changed, that is effectively a silently handled conflict, with the
+    same issues as above!
+  - Progress: We now also have schema queries, which will immediately fetch all
+    documents that are needed per a given schema, and will keep those up to
+    date, even if links change (meaning the subscription adds newly needed
+    documents and no longer subscribes to no longer used documents). That could
+    already replace a lot of the logic above, but we haven't turned that off. It
+    also currently doesn't use the cache.
- storage/cache.ts, the memory layer, which operates at the unit of documents,
  supports CAS semantics for transactions. It's used by storage.ts only and
  while it has stronger guarantees, those either don't apply or sometimes
+  backfire because they are not aligned with the top: The upper layers don't
+  have a concept of "cause" and depending on the order of operations we
+  currently issue updates with the latest cause, but actually based on older
+  data. It has a cache but we underuse it. Key implementation details:
  - Heap (source of truth) and nursery (pending changes) separation
  - WebSocket sync with `merge()` for server updates
  - Schema query support exists but incomplete (see pull() at line 1082)

## Desired state

- Iframe transport layer sends incrementing versions and ignores changes that
  are based on too old changes.
It's then up to the inside of the iframe to use
  that correctly. In fact `useDoc()` where `setX()` takes a callback (e.g.
  `setX(x => x + 1)`) instead of just the new value would already work much
  better, since we can rerun it on the newest state if new state arrives that
  invalidates previously sent updates. Probably sufficiently well for most cases
  (the remaining problem: if other changes based on `X` changing aren't purely
  reactive, i.e. only based on the last state, those are not being undone by
  rerunning the setter. This is rare, even in React). But we could go even
  further (maybe some popular game toolkits are worth investigating here at some
  point in the future, since that's a good use case for iframes)
- Cells -- constructed typically by the runner when bringing up a recipe, within
  handlers or reactive functions and in some cases by parts of the shell to
  bootstrap things -- directly read from memory via schema query. They
  accumulate writes and wait for the scheduler to close out a transaction.
  Interim reads see the new version.
- Scheduler runs handlers and reactive functions and then issues a transaction
  with pending writes directly to the underlying memory layer (we already log
  reads and writes via `ReactivityLog`, so we can extend that to log the exact
  writes, not just which documents were affected). It registers with the
  underlying memory layer (instead of with `DocImpl` as before) for changes on
  individual documents, marking -- as is already the case -- the corresponding
  reactive functions as needing to run (semantically we want to subscribe to the
  corresponding schema queries, but at least with the current queries, listening
  to the actually read docs is the same). For events it will keep track of the
  transaction and if it fails, and after we're sure to have caught up enough
  with the server to reflect the new current state, retry up to N times.
- Memory -- more or less like now, except that its lower level API is directly
  exposed to cells, including `the` and the document ids as DIDs (so the Cell
  will have to translate the ids and prepend `of:`)

## Steps to get there

This plan should be entirely incremental and can be rolled out step by step.
- [ ] Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use inside `Cell`, scheduler (just `.updates()`) and storage.ts should remain for now) - - [ ] Add .setRaw and .getRaw to internal `Cell` interface and use the cell creation methods on `Runtime` and then almost all used of `DocImpl` can be replaced by cells and using `.[set|get]Raw` instead of `.[set|send|get]` + - [ ] Add .setRaw and .getRaw to internal `Cell` interface and use the cell + creation methods on `Runtime` and then almost all used of `DocImpl` can + be replaced by cells and using `.[set|get]Raw` instead of + `.[set|send|get]` - [ ] Includes changing all places that expect `{ cell: DocImpl, … }` to just use the JSON representation. At the same time, let's support the new syntax for links (@irakli has these in a RFC, should be extracted) @@ -100,7 +126,7 @@ This plan should be entirely incremental and can be rolled out step by step. - [ ] For reads after writes do return pending writes, so maybe we just apply writes anyway on a copy. then after flushing the pending writes (i.e. they get written to the nursery), we reset that until the next get. make - sure this still work with `QueryResultProxy` (might have to retarget to + sure this still works with `QueryResultProxy` (might have to retarget to changed objects). TBD: when to make copies, and can we work directly on the copy in the nursery? - [ ] Note that pending writes might contain `Cell` objects. Those would be @@ -145,7 +171,24 @@ This plan should be entirely incremental and can be rolled out step by step. but we would notice that explicitly. Unlike rejections that happen quickly and users can react to in real-time, this might need something more sophisticated. -- [ ] Recovery flows for e.g. corrupted +- [ ] Recovery flows for e.g. corrupted caches (interrupted in the middle of an + update) +- [ ] Extending transaction boundaries beyond single event handlers: As + described above, each handler's execution and retry is independent of each + other and it's possible that one of them is rejected while others pass, + even for the same event. We could change this to broaden the transaction: + - A fairly simple change would be to treat all handlers of the same event as + one transaction. Currently scheduler executes them one-by-one and settles + the state (i.e. run all the reactive functions) in between, and it wouldn't + be difficult to change that to running all handlers for one event, then + settle the state. That way, at least the event is either accepted or + rejected as a whole. That said, I don't think we have any examples yet of + running multiple handlers in parallel. + - The more complex case would be a cascade of events, i.e. event handlers that + issue more events, and then accepting/rejecting the entire cascade. That's + significantly more complicated, and even more so if we allow async steps + inbetween (like a fetch). We haven't seen concrete examples of this yet, and + we should generally avoid this pattern in favor of reactive functions. ## Design notes @@ -173,8 +216,10 @@ we need to return the state from the point when it was called? Considerations: ### Schema queries Schema queries is how we maintain the invariant that a `.get()` on a cell should -return a consistent view _across_ several documents. This replaces the current -"crawler" mode in storage.ts, which what most of the batch logic actually does. +return a consistent view _across_ several documents by fetching and updating +documents atomically. 
The schema lets us
understand how to group these documents. This replaces the current "crawler"
mode in storage.ts, which is what most of the batch logic actually does.

Specifically we rely on the server observing a change in any of the documents
that were returned last time, rerunning the query, and sending updates to the
client about all documents that are now returned by the query.

#### Schema queries & cache

We have to store queries in the cache as well, noting for which `since` we're
sure it is up to date (`since` is monotonically increasing at the space level,
so representing time: A document has a value _since_ that time). In fact we
want to point from each query to a session id, and the session id notes the
last `since`. (The session id represents the current socket connection, since
that represents the time during which client and server share state. IOW: For
each new connection the active queries have to be re-established, which then
establishes the next set of shared state. It is just a local concept, so any
random or even monotonically increasing number will do.) That's because once a
subscription is up, all we need are new versions of documents, we don't need
the association of which query they belonged to. And so all currently active
queries are always current to the last update. (Once we also unsubscribe from
queries this gets a little more complex)

So when a new query is issued, we

- issue the query to the server with a `since` from the cache or `-1` (to be
  confirmed) indicating that it never ran.
- if it is in the cache, run the query against the cache, and see whether any
  documents are newer than the `since` for the query. If not, we can serve the
  current cached version immediately. If yes, the state might be inconsistent
  and we have to wait for the server (in the future we might want to keep older
  versions for this reason)

The server builds state of what documents the client already has at what
version by running the queries server side _at that sent `since`_ and assuming
that the client already has all the documents for that `since`. It is hence
advantageous to send queries that are in the cache before any non-cached
queries, to the degree that is in our control. Maybe batch them for a
microtick?
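
A sketch of the "batch them for a microtick" idea, ordering cached queries
(known `since`) ahead of cold ones; `sendBatch` and the `PendingQuery` shape
are assumptions:

```typescript
// Hypothetical microtask batching of outgoing schema queries.
type PendingQuery = { key: string; since: number }; // since = -1 if never ran

const pending: PendingQuery[] = [];
let flushScheduled = false;

function issueQuery(
  query: PendingQuery,
  sendBatch: (queries: PendingQuery[]) => void,
): void {
  pending.push(query);
  if (!flushScheduled) {
    flushScheduled = true;
    queueMicrotask(() => {
      flushScheduled = false;
      // Cached queries first, so the server can build up its picture of what
      // the client already has before it handles the cold queries.
      const batch = pending.splice(0).sort((a, b) => b.since - a.since);
      sendBatch(batch);
    });
  }
}
```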
#### What happens if new data is needed From af5d9fc9f209338121556d9d4e6803967672d589 Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Wed, 11 Jun 2025 10:54:23 -0700 Subject: [PATCH 07/20] added open question on incrementally loading data --- docs/future-tasks/unified-storage-stack.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index 04ed9dec6..1b6b985e0 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -189,6 +189,13 @@ This plan should be entirely incremental and can be rolled out step by step. significantly more complicated, and even more so if we allow async steps inbetween (like a fetch). We haven't seen concrete examples of this yet, and we should generally avoid this pattern in favor of reactive functions. +- [ ] Incremental loading: As currently stated all pending schema queries are + expected to be resolved together. At least that is the easiest to model if + the goal is to represent consistent state. But it also means that the + initial load can take longer than needed, because it needs to load all the + data. Clever ordering of queries, treating some as pending, etc. could + improve this a lot, but is non-trivial. Fine for now, but something to + observe. ## Design notes From d7b41f715195944c4dfa64dfc74307d6bc675244 Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Wed, 11 Jun 2025 13:30:27 -0700 Subject: [PATCH 08/20] added more notes on future work --- docs/future-tasks/unified-storage-stack.md | 52 ++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index 1b6b985e0..4698cebaf 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -290,6 +290,58 @@ cause errors. We shouldn't block building the above on resolving this though. ## Future work that is related +### Changes to schema + +- flip default interpretation of no `additionalProperties` to be `false` instead + of `true` (which is what "ignore additional properties" means in a query + context vs validation context) +- change `{}` schema to mean `false` (matching nothing) instead of `true` (any). +- use schema in cell links when resolving queries, at first only if schema is + `true` otherwise. +- add `allOf` support and use it for schemas in cell links, so that it's now the + intersection of schemas. + +### Transition all links/etc to @-notation + +- Blobs: See upcoming doc, noting here as it's building on moving to the + `{ "@": { : { ... }}}` notation of links/etc +- Charms: Add `{ "@": { process: { ... }}}` for charms, and make result just + data within that (as it's always a few static aliases anyway). System code + that deals with charms should probably directly operate on cells of processes + like this. And code where charms are just pointers to results (most current + code and all userland code) inherit the behavior from the now clear + containment. +- Streams: Currently `{ $stream: true }` should also transition. I don't think + any extra data is needed, though we might want to discuss moving the schema of + the events into this vs storing it in the usual meta data for schemas. + +### Save scheduling information in storage as meta data, remove extra `value` + +Currently data in storage is actually +`{ value?: , source?: }`. It +should just be the value. 
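
Roughly, the shapes before and after (the field types are elided above; this
is an illustration, not the actual wire format):

```typescript
// Today: the stored JSON wraps the value together with bookkeeping.
type StoredDocToday = {
  value?: unknown; // the actual data
  source?: unknown; // link to the source doc
};

// Desired: the document is just the value. Scheduling metadata would live in
// a sibling document with a different `the` but the same `of`, of the kind
// described next.
type StoredDocDesired = unknown;
type SchedulingMeta = {
  reads: { doc: string; since: number }[]; // inputs used to compute the value
  process?: unknown; // link to the process cell that can recompute it
};
```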
+ +We also don't store any information that the scheduler generates about +dependencies, and so when loading a charm we have to recompute all reactive +functions just to regenerate that. + +Instead we could save scheduling information as meta data, i.e. as a separate +document with a different `the`, but the same `of`. + +For reactive functions we should store + +- What data was read to compute this output and at what `since` +- Link to process cell that explains how to recompute this + +For data last manipulated by an event we might just - if the event actually did +read the prior state - write itself as dependent data. This makes it so that the +rule to allow overwriting is "can't use any sources that are older than the +sources used to compute this value". The underlying CAS based on cause is then +just there to make sure this reasoning is based on the current last state. + +For streams we want to write out all the processing cells that define handlers +for this stream, so that they can be reloaded. + ### Changing recipe creation to just use `Cell` and get rid of `OpaqueRef` The accumulation of writes when running a reactive function / handler allows us From b193bc804a47a91fab0dc9d021a07b2aed6c3676 Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Wed, 11 Jun 2025 14:07:06 -0700 Subject: [PATCH 09/20] Update docs/future-tasks/unified-storage-stack.md Co-authored-by: Irakli Gozalishvili <21236+Gozala@users.noreply.github.com> --- docs/future-tasks/unified-storage-stack.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index 4698cebaf..0f7844bad 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -58,7 +58,7 @@ Specifically we have: have a concept of "cause" and depending on the order of operations we currently issue updated with the latest cause, but actually based on older data. It has a cache but we underuse it. Key implementation details: - - Heap (source of truth) and nursery (pending changes) separation + - Heap (partial replica of the remote state) and nursery (pending changes) separation - WebSocket sync with `merge()` for server updates - Schema query support exists but incomplete (see pull() at line 1082) From 421ef26832e3f756e9a8a5b700c0d764cf9d2b49 Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Wed, 11 Jun 2025 15:34:26 -0700 Subject: [PATCH 10/20] updated with conclusions from meeting (+ more TODO to resolve) --- docs/future-tasks/unified-storage-stack.md | 94 +++++++++++++++------- 1 file changed, 63 insertions(+), 31 deletions(-) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index 0f7844bad..9460a95ab 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -58,7 +58,8 @@ Specifically we have: have a concept of "cause" and depending on the order of operations we currently issue updated with the latest cause, but actually based on older data. It has a cache but we underuse it. Key implementation details: - - Heap (partial replica of the remote state) and nursery (pending changes) separation + - Heap (partial replica of the remote state) and nursery (pending changes) + separation - WebSocket sync with `merge()` for server updates - Schema query support exists but incomplete (see pull() at line 1082) @@ -109,6 +110,8 @@ This plan should be entirely incremental and can be rolled out step by step. 
- Key areas: loading promises map (line 84), dependency tracking, and batch
  processing
- Watch for the FIXME at line 84 about keying by doc+schema combination
+- [ ] When connection is dropped, re-establish all schema queries again
+  - [ ] Show UI if reading / re-establishing takes surprisingly long
- [ ] Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use inside
  `Cell`, scheduler (just `.updates()`) and storage.ts should remain for now)
  - [ ] Add .setRaw and .getRaw to internal `Cell` interface and use the cell
    creation methods on `Runtime` and then almost all uses of `DocImpl` can be
    replaced by cells and using `.[set|get]Raw` instead of `.[set|send|get]`
-- [ ] Includes changing all places that expect `{ cell: DocImpl, … }` to just
-      use the JSON representation. At the same time, let's support the new
-      syntax for links (@irakli has these in a RFC, should be extracted)
+- [ ] Change all places that expect `{ cell: DocImpl, … }` to just use the JSON
+      representation. At the same time, let's support the new syntax for links
+      (@irakli has these in a RFC, should be extracted, effectively
+      `{ "@": { "link": { ... }}}`). This is because today `storage.ts`
+      translates any `{ "/": string }` it finds to `{ "/": DocImpl }`, but we
+      don't want to carry this logic over to this new state. See `isCellLink`,
+      which might not be universally used, but should be. Maybe add a
+      `readCellLink` function to parse these.
+  - [ ] Also change schema queries on the server side
+  - [ ] Remove that translation in storage.ts and make sure everything still
+        works.
+- [ ] Directly read & write to memory layer & wrap action calls in a transaction
+  - [ ] Expose the API below current StorageProvider to `Cell` and `Scheduler`.
+        That includes `Cell` setting `the` to `application/json`, computing the
+        right DID-based ids, etc. -- all stuff that currently happens in
+        `StorageProvider` instances.
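
A small sketch of that id and media-type translation; `toMemoryAddress` is a
made-up helper, and only the `of:` prefix and the `the = application/json`
convention come from this doc:

```typescript
// Hypothetical mapping from a Cell's entity id to a memory-layer address.
type MemoryAddress = {
  of: string; // DID-shaped document id, prefixed with "of:"
  the: string; // media type the value is stored under
};

function toMemoryAddress(entityId: string): MemoryAddress {
  return {
    of: entityId.startsWith("of:") ? entityId : `of:${entityId}`,
    the: "application/json",
  };
}
```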
- [ ] Expose a "run this function that returns a transaction" from memory,
  which scheduler calls with something that wraps the action and returns a
  transaction. Be sure to handle errors correctly (maybe the function can
  return that it errored, which aborts the transaction). This function can be
  async, so you have to await it.
  - [ ] Part of the transaction is what is being read. Likely, we just want to
    change `ReactivityLog` to be a new transaction object that is being passed
    through.
  - [ ] Takes #retries and callback-on-out-of-retries parameters and retries
    that many times after conflicts with updated data. Scheduler uses those for
    events (including a hook for a user-visible callback for user-created
    events). Not needed for reactive functions; those will be scheduled again
    anyway based on data changes.
  - [ ] The heap shall not change while this function is running, i.e. the
    function can read from memory and gets the state as of the beginning of the
    function call and what it updated itself.
  - [ ] Currently Cell.push() has retry logic (cell.ts:366-381) that can be
    removed. It's subsumed by the previous point.
  - [ ] Read: `Cell` bypasses DocImpl and just reads from memory.
    `QueryResultProxy` will also have to keep re-reading from memory, so it
    gets the latest state from the nursery.
  - [ ] Writes: `Cell` and `QueryResultProxy` directly write to memory,
    building up the current transaction (probably via what replaces
    `log: ReactivityLog`). This mostly boils down to changing `applyChangeSet`.
  - [ ] Scheduler: When listening to changes on entities, directly talk to
    memory (currently goes through storage.ts listeners). This happens just
    before returning the transaction, so maybe we instead make this listening
    part of the transaction API?
  - [ ] TODO: Figure this out: Pending writes might contain `Cell` objects.
    Those would be converted to links in JSON, but the ids for those might not
    be known when writing (this has to do with the upcoming recipe
    refactoring). Probably this will have to look like another queued-up set of
    writes.
+- [ ] More selectively purge the nursery on conflicts by observing conflicted
+      reads.
+- [ ] On conflicts add data that changed unless it was already sent to the
+      client by a query.
- [ ] Remove `storage.ts` and `DocImpl`, they are now skipped
  - storage.ts has 1000+ lines of complex batching logic to remove
  - DocImpl in doc.ts is ~300 lines
  - Also remove Cell.push conflict logic (line 922) since the corresponding
    parts are also being removed.
-- [ ] For events, remember event and corresponding write transaction. Clear on
  success and retry N times on conflict. Retry means running the event handler
  again on the newest state (for lifted functions this happens automatically as
  they get marked dirty)
+
+ TODO: This should be done by that transact function above. Pass it #retries and maybe a callback on last failure. Pass 0 and none for reactive functions. Use callback to notify user if applicable.
  - [ ] For change sets that only write (e.g. only push or set), we could just
    reapply it without re-running. But that's a future optimization.
- [x] Memory layer with pending changes after a conflicted write: rollback to
  heap and notify that as changes where it changed things
- [ ] Sanitize React at least a bit by implementing CT-320
  - Current iframe transport has TODO at
    iframe-sandbox/common-iframe-sandbox.ts:212
  - No version tracking causes overwrites with stale React state

## Open questions

- [ ] Debug tools we should build to support this future state
- [ ] Behavior for clients that are offline for a while and then come back
  online while there were changes. By default we'd just drop all of those, but
  we would notice that explicitly. Unlike rejections that happen quickly and
  users can react to in real-time, this might need something more
  sophisticated.
- [ ] Recovery flows for e.g. corrupted caches (interrupted in the middle of an
  update)
- [ ] Extending transaction boundaries beyond single event handlers: As
  described above, each handler's execution and retry is independent of the
  others, and it's possible that one of them is rejected while others pass,
  even for the same event. We could change this to broaden the transaction:
  - A fairly simple change would be to treat all handlers of the same event as
    one transaction. Currently scheduler executes them one-by-one and settles
    the state (i.e. runs all the reactive functions) in between, and it
    wouldn't be difficult to change that to running all handlers for one event,
    then settle the state. That way, at least the event is either accepted or
    rejected as a whole. That said, I don't think we have any examples yet of
    running multiple handlers in parallel.
  - The more complex case would be a cascade of events, i.e. event handlers
    that issue more events, and then accepting/rejecting the entire cascade.
    That's significantly more complicated, and even more so if we allow async
    steps in between (like a fetch). We haven't seen concrete examples of this
    yet, and we should generally avoid this pattern in favor of reactive
    functions.
- [ ] Incremental loading: As currently stated, all pending schema queries are
  expected to be resolved together. At least that is the easiest to model if
  the goal is to represent consistent state. But it also means that the initial
  load can take longer than needed, because it needs to load all the data.
Clever ordering of queries, treating some as pending, etc. could
  improve this a lot, but is non-trivial. Fine for now, but something to
  observe.
+- [ ] Anything we can do to make it easier to run handlers or functions in
+  parallel if they have no shared dependencies?

 ## Design notes

From 0b70d81e265b07076d109545179e15616dbba7cb Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Wed, 11 Jun 2025 16:05:47 -0700
Subject: [PATCH 11/20] remove TODO i left in

---
 docs/future-tasks/unified-storage-stack.md | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 9460a95ab..f67241c44 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -151,6 +151,9 @@ This plan should be entirely incremental and can be rolled out step by step.
   - [ ] The heap shall not change while this function is running, i.e. the
     function can read from memory and gets the state as of the beginning
     of the function call and what it updated itself.
+  - [ ] For change sets that only write (e.g. only push or set), we could just
+    reapply it without re-running. But this could also be a future
+    optimization.
   - [ ] Currently Cell.push() has retry logic (cell.ts:366-381) that can be
     removed. It's subsumed by the previous point.
@@ -178,16 +181,8 @@ This plan should be entirely incremental and can be rolled out step by step.
   - DocImpl in doc.ts is ~300 lines
   - Also removed Cell.push conflict logic (line 922) since the corresponding
     parts are also being removed.
-- [ ] For events, remember event and corresponding write transaction. Clear on
-  success and retry N times on conflict. Retry means running the event
-  handler again on the newest state (for lifted functions this happens
-  automatically as they get marked dirty)
-
-  TODO: This should be done by that transact function above. Pass it #retries and maybe a callback on last failure. Pass 0 and none for reactive functions. Use callback to notify user if applicable.
-  - [ ] For change sets that only write (e.g. only push or set), we could just
-    reapply it without re-running. But that's a future optimization.
-  - [x] Memory layer with pending changes after a conflicted write: rollback to
-    heap and notify that as changes where it changed things
+- [x] Memory layer with pending changes after a conflicted write: rollback to
+  heap and notify that as changes where it changed things
 - [ ] Sanitize React at least a bit by implementing CT-320
   - Current iframe transport has TODO at
     iframe-sandbox/common-iframe-sandbox.ts:212

From 8228e137355f3a7a366fcda15f3f35d9c79c08da Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Wed, 11 Jun 2025 16:19:47 -0700
Subject: [PATCH 12/20] resolved my TODO for the upcoming recipe refactor

---
 docs/future-tasks/unified-storage-stack.md | 35 ++++++++++++++++------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index f67241c44..917707ee6 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -167,11 +167,6 @@ This plan should be entirely incremental and can be rolled out step by step.
     memory (currently goes through storage.ts listeners). This happens just
     before returning the transaction, so maybe we instead make this
     listening part of the transaction API?
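
     One hedged sketch of what "listening as part of the transaction API"
     could look like (`Tx`, `updates`, and `markDirty` are illustrative
     names, not an existing interface):

     ```typescript
     interface Tx {
       // Registers a listener for writes to any path this transaction read;
       // returns a cancel function.
       updates(callback: () => void): () => void;
     }

     function watchReads(tx: Tx, markDirty: () => void): () => void {
       // The scheduler would cancel this before re-running the action, so an
       // action is never double-scheduled.
       return tx.updates(() => markDirty());
     }
     ```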
-  - [ ] TODO: Figure this out: Pending writes might contain `Cell` objects.
-    Those would be converted to links in JSON, but the ids for those might
-    not be known when writing (this has to do with the upcoming recipe
-    refactoring). Probably this will have to look like another queued up set
-    of writes.
 - [ ] More selectively purge the nursery on conflicts by observing conflicted
   reads.
 - [ ] On conflicts add data that changed unless it was already sent to the
@@ -371,10 +366,32 @@ for this stream, so that they can be reloaded.

 ### Changing recipe creation to just use `Cell` and get rid of `OpaqueRef`

-The accumulation of writes when running a reactive function / handler allows us
-to create a graph of pending cells in them and treat those as a recipe. With the
-addition of marking cells as opaque we then have all the functionality of
-`OpaqueRef` and can replace all of that code.
+`OpaqueRef`s are just opaque `Cell`s. So we can vastly simplify recipe
+generation by combining those.
+
+Essentially `lift` & co return cells instead. And the opaque cell passed in to
+the recipe function is already bound to the actual inputs (without revealing
+that to the recipe function).
+
+The tricky bit is generating good causal ids for these cells, which isn't good
+right now either.
+
+This could work like this:
+
+- Build up a graph of cells, just like opaque refs now, via builder functions.
+  Don't assign ids yet, so id-less cells is a new thing.
+- Eventually they are either assigned to a pre-existing cell or they are
+  returned (which is assigned to the result cell). Use this to derive ids that
+  are causal to invocation and where they write to (entity id + path).
+- Do that recursively as ids are being set.
+- For all remaining cells in the graph, i.e. those that are never read, we can
+  assign them ids causal to the invocation plus a sequence number or something
+  else. As nothing can read them it matters less. FWIW, the only use-case so
+  far is a `lift` that calls console.log, so strictly for debugging.
+
+Note that with the change above of writing transactions this also implies
+delayed writes. This is then also how the transaction object gets known to the
+cell: When it connects to the rest of the graph and gets its id.

 ### Single event sourcing model + server-side query prediction

From f8c99018a751af81b81bfbef00ea22b9fb9f060c Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Wed, 11 Jun 2025 17:22:18 -0700
Subject: [PATCH 13/20] files tickets and added numbers here

---
 docs/future-tasks/unified-storage-stack.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 917707ee6..2f8eb7c9c 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -111,26 +111,26 @@ This plan should be entirely incremental and can be rolled out step by step.
 processing
 - Watch for the FIXME at line 84 about keying by doc+schema combination
 - [ ] When connection is dropped, re-establish all schema queries again
-  - [ ] Show UI if reading / re-establishing takes surprisingly long
 - [ ] Replace all direct use of `DocImpl` with `Cell` (only `DocImpl` use
   inside `Cell`, scheduler (just `.updates()`) and storage.ts should remain for
-  now)
+  now) CT-446
 - [ ] Add .setRaw and .getRaw to internal `Cell` interface and use the cell
   creation methods on `Runtime` and then almost all uses of `DocImpl` can be
   replaced by cells and using `.[set|get]Raw` instead of `.[set|send|get]`
 - [ ] Change all places that expect `{ cell: DocImpl, … }` to just use the JSON
   representation. At the same time, let's support the new syntax for links
-  (@irakli has these in a RFC, should be extracted, effectively
+  (@irakli has these in a RFC (CT-448), should be extracted, effectively
   `{ "@": { "link": { ... }}}`). This is because today `storage.ts`
   translates any `{ "/": string }` it finds to `{ "/": DocImpl }`, but we
   don't want to carry this logic over to this new state. See `isCellLink`,
   which might not be universally used, but should be. Maybe add a
-  `readCellLink` function to parse these.
+  `readCellLink` function to parse these. CT-447
   - [ ] Also change schema queries on the serverside
   - [ ] Remove that translation in storage.ts and make sure everything still
     works.
 - [ ] Directly read & write to memory layer & wrap action calls in a transaction
+  CT-450
   - [ ] Expose the API below current StorageProvider to `Cell` and `Scheduler`.
     That includes `Cell` setting the `the` to application/json, computing the
     right DID-based ids, etc. -- all stuff that currently happens in
     `StorageProvider` instances.
@@ -139,7 +139,7 @@ This plan should be entirely incremental and can be rolled out step by step.
     which scheduler calls with something that wraps the action and returns a
     transaction. Be sure to handle errors correctly (maybe the function can
     return that it errored and that aborts the transaction). This function
-    can be async, so you have to await it.
+    can be async, so you have to await it. CT-449
   - [ ] Part of the transaction is what is being read. Likely, we just want to
     change `ReactivityLog` to be a new transaction object that is being
     passed through.
@@ -168,10 +168,10 @@ This plan should be entirely incremental and can be rolled out step by step.
     before returning the transaction, so maybe we instead make this
     listening part of the transaction API?
 - [ ] More selectively purge the nursery on conflicts by observing conflicted
-  reads.
+  reads. CT-451
 - [ ] On conflicts add data that changed unless it was already sent to the
-  client by a query.
+  client by a query. CT-452
-- [ ] Remove `storage.ts` and `DocImpl`, they are now skipped
+- [ ] Remove `storage.ts` and `DocImpl`, they are now skipped CT-453
   - storage.ts has 1000+ lines of complex batching logic to remove
   - DocImpl in doc.ts is ~300 lines
   - Also removed Cell.push conflict logic (line 922) since the corresponding
     parts are also being removed.
@@ -191,6 +191,7 @@ This plan should be entirely incremental and can be rolled out step by step.
   but we would notice that explicitly. Unlike rejections that happen quickly
   and users can react to in real-time, this might need something more
   sophisticated.
+  - [ ] At the very least show a UI that we're offline. CT-445 tracks that.
 - [ ] Recovery flows for e.g.
corrupted caches (interrupted in the middle of an update)
 - [ ] Extending transaction boundaries beyond single event handlers: As

From 9e07d000d43fd8a9d9c587fe465dbc7e315b05af Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Wed, 11 Jun 2025 18:12:48 -0700
Subject: [PATCH 14/20] clarification based on @gozala's comment

---
 docs/future-tasks/unified-storage-stack.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 2f8eb7c9c..761949d12 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -11,15 +11,15 @@ Specifically we have:

 - I/O over iframe boundaries, typically with the iframes running React, which in
   turn assumes synchronous state. So data can roundtrip through iframe/React and
-  overwrite newer data that came in in the meantime. E.g. a user event happens,
-  state X is updated in React, while new data is waiting in the iframe's message
-  queue: Now an update based on older data is sent to the container, but since
-  there is no versioning at this layer, it is treated as updating the current
-  data. Meanwhile the iframe processes the queued up update, and now is out of
-  sync with the container. Note that this is a pretty tight race condition: Some
-  event processing coincides exactly with receiving data updates. It's rare, but
-  we've seen this happen when tabs get woken up again and a lot of pent up work
-  happens all at once.
+  overwrite newer data that came in in the meantime based on outdated
+  assumptions. E.g. a user event happens, state X is updated in React, while new
+  data is waiting in the iframe's message queue: Now an update based on older
+  data is sent to the container, but since there is no versioning at this layer,
+  it is treated as updating the current data. Meanwhile the iframe processes the
+  queued up update, and now is out of sync with the container. Note that this is
+  a pretty tight race condition: Some event processing coincides exactly with
+  receiving data updates. It's rare, but we've seen this happen when tabs get
+  woken up again and a lot of pent up work happens all at once.
 - Scheduler executing event handlers and reactive functions, which would form a
   natural transaction boundary -- especially for event handlers to re-run on
   newer data to rebase changes -- but those boundaries don't mean anything to

From 9bdb6c799d21c9c7b339770b9899654ca22a76ba Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Thu, 12 Jun 2025 09:56:11 -0700
Subject: [PATCH 15/20] added many more details after reflecting on a few challenges

---
 docs/future-tasks/unified-storage-stack.md | 107 +++++++++++++++------
 1 file changed, 76 insertions(+), 31 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 761949d12..689cf2d3a 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -129,44 +129,89 @@ This plan should be entirely incremental and can be rolled out step by step.
   - [ ] Also change schema queries on the serverside
   - [ ] Remove that translation in storage.ts and make sure everything still
     works.
+- [ ] Expose a "run this function with a transaction object to be populated"
+  from memory, which scheduler calls with something that wraps the action
+  and returns a transaction, and which all other reading and writing contexts
+  will use as well.
CT-449
+  - [ ] The callback function can be async, so you have to await it. But if the
+    function isn't async, the entire transaction building flow should be
+    synchronous (i.e. not yield to the event loop), just return a Promise
+    for when the transaction settles.
+  - [ ] Be sure to handle errors correctly, i.e. abort the transaction and
+    re-throw the error (or return an aborted transaction, including the caught
+    error? or reject promise with aborted transaction that includes the caught
+    exception?).
+  - [ ] The heap shall not change while this function is running, i.e. the
+    function can read from memory and gets the state as of the beginning of
+    the function call and what it updated itself.
+  - [ ] If the transaction is aborted, the nursery is reverted to what it was
+    before the transaction (that is likely we'll create another layer on top
+    of the nursery?).
+  - [ ] Part of the transaction is what is being read. Likely, we just want to
+    change `ReactivityLog` to be a new transaction object that is being
+    passed through. In fact all reads and writes could now go through such
+    a `tx` object. We log not just the entity read or written, but also the
+    path (see next item)
+  - [ ] Add path-dependent listeners to memory: During a transaction reads are
+    logged with path. And there is a `tx.updates(callback)` or so function
+    that registers a listener to any writes to any of the read paths (and
+    only changes to these paths). Make it cancellable (the scheduler will
+    e.g. cancel this before executing the action again). Current need would
+    be satisfied if this is called once on the first change and then not
+    again. If reads and writes are overlapping, i.e. the function wrote to
+    what it reads, then notifications start after applying those writes.
+  - [ ] The current reads and writes can also be read out, which scheduler
+    will use to update the dependency graph. In fact scheduler will inside
+    the callback do both this and adding the callback just before
+    returning. It does so to not miss any updates.
+  - [ ] The callback can `tx.abort(reason extends Error)` a transaction.
+    Listeners per above should still be installed, even if the transaction
+    is aborted. Scheduler will use this if the action (which is called from
+    the callback) throws, so it can still update scheduling information.
+  - [ ] Support nested read transactions. Write isn't needed. Nesting just means
+    that the read logs are partitioned, i.e. reads are only added to the
+    innermost transaction and thus the listeners from the previous point
+    only apply to those. This is concretely used in the renderer, which
+    creates nested `Cell.sink` calls that follow this logic. Throw when
+    nesting with writes or with async callbacks (the latter makes sure we
+    can't interleave transactions, just strictly nest them).
+- [ ] Memory schedules a task with the scheduler to update heap. Scheduler will
+  run this as soon as possible, but outside running network of reactive
+  functions. (TBD: There are alternative approaches. Either way, per above,
+  memory needs the ability to queue up changes, so this seems like the
+  simplest way)
 - [ ] Directly read & write to memory layer & wrap action calls in a transaction
   CT-450
-  - [ ] Expose the API below current StorageProvider to `Cell` and `Scheduler`.
-    That includes `Cell` setting the `the` to application/json, computing the
-    right DID-based ids, etc. -- all stuff that currently happens in
-    `StorageProvider` instances.
-  - [ ] Expose a "run this function that returns a transaction" from memory,
-    which scheduler calls with something that wraps the action and returns a
-    transaction. Be sure to handle errors correctly (maybe the function can
-    return that it errored and that aborts the transaction). This function
-    can be async, so you have to await it. CT-449
-  - [ ] Part of the transaction is what is being read. Likely, we just want to
-    change `ReactivityLog` to be a new transaction object that is being
-    passed through.
-  - [ ] Takes a #retry and callback-on-out-of-retries parameters and retries
-    that many times after conflicts with updated data. Scheduler uses
-    those for events (including a hook for a user-visible callback for
-    user created events). Not needed for reactive functions, those will be
-    scheduled again anyway based on data changes.
-  - [ ] The heap shall not change while this function is running, i.e. the
-    function can read from memory and gets the state as of the beginning
-    of the function call and what it updated itself.
-  - [ ] For change sets that only write (e.g. only push or set), we could just
-    reapply it without re-running. But this could also be a future
-    optimization.
-  - [ ] Currently Cell.push() has retry logic (cell.ts:366-381) that can be
-    removed. It's subsumed by the previous point.
-  - [ ] Read: `Cell` bypasses DocImpl and just reads from memory.
-    `QueryResultProxy` will also have to keep re-reading from memory, so it
-    gets the latest state from the nursery.
+  - [ ] Use the new transaction API above to do all reads and writes.
+  - [ ] Currently Cell.push() has retry logic (cell.ts:366-381) that can be
+    removed. It's subsumed by the next point.
+  - [ ] Scheduler creates a transaction that wraps calling an action. See above.
+  - [ ] `Cell.sink()` creates a (read-only) transaction, which might be nested
+    (see above).
+  - [ ] When cells are being created, active transactions are being passed in,
+    just like `log` now. Change `withLog()` to `withTX()` or so. When there
+    is no TX associated on a read, create one. Don't allow writes without a
+    TX.
+  - [ ] In Runner, use TX associated with passed in result cell to create
+    process cell if needed.
+  - [ ] `QueryResultProxy` will have to keep re-reading from memory, so it gets
+    the latest state from the nursery.
   - [ ] Writes: `Cell` and `QueryResultProxy` directly write to memory, building
-    up the current transaction (probably via what replaces
-    `log: ReactivityLog`). This mostly boils down to changing
+    up the current transaction. This mostly boils down to changing
     `applyChangeSet`.
   - [ ] Scheduler: When listening to changes on entities, directly talk to
     memory (currently goes through storage.ts listeners). This happens just
-    before returning the transaction, so maybe we instead make this
-    listening part of the transaction API?
+    before returning the transaction. See above.
+- [ ] Scheduler retries events whose transaction failed. It does so up to N
+  times and calls a callback after the last retry (both configurable via
+  `Runtime` constructor). Events are retried after all reactive functions
+  that are queued up are settled, so it's guaranteed to be a stable state
+  (Future optimization: Dynamically insert into the queue after any reactive
+  function that might update the handler's inputs, but before any that read
+  its outputs)
+  - [ ] For change sets that only write (e.g. only push or set), we could just
+    reapply those without re-running the handler. But this could also be a
+    future optimization.
 - [ ] More selectively purge the nursery on conflicts by observing conflicted
   reads.
CT-451
 - [ ] On conflicts add data that changed unless it was already sent to the
   client by a query. CT-452

From 35abacdd3bca8227d741f87df0577d0c383bbe36 Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Thu, 12 Jun 2025 10:03:30 -0700
Subject: [PATCH 16/20] clarified path dependence

---
 docs/future-tasks/unified-storage-stack.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 689cf2d3a..23a43fe03 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -160,6 +160,9 @@ This plan should be entirely incremental and can be rolled out step by step.
     be satisfied if this is called once on the first change and then not
     again. If reads and writes are overlapping, i.e. the function wrote to
     what it reads, then notifications start after applying those writes.
+  - [ ] Path-dependent means that we diff updates and compute what paths have
+    changed. Callback gets called if any paths overlap, i.e. one is a
+    subset of the other.

From 6da56717ee7bc8d69fc87eba317e144e1df5a51e Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Thu, 12 Jun 2025 10:05:24 -0700
Subject: [PATCH 17/20] added pointers to current implementation

---
 docs/future-tasks/unified-storage-stack.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 23a43fe03..74ec6bc81 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -160,9 +160,10 @@ This plan should be entirely incremental and can be rolled out step by step.
     be satisfied if this is called once on the first change and then not
     again. If reads and writes are overlapping, i.e. the function wrote to
     what it reads, then notifications start after applying those writes.
-  - [ ] Path-dependent means that we diff updates and compute what paths have
+  - [ ] Path-dependent means that we diff updates and compute what paths have
     changed. Callback gets called if any paths overlap, i.e. one is a
-    subset of the other.
+    subset of the other. See `compactifyPath` and `pathAffected` for
+    current implementation.

From b5bf7e32faf0d54f7975db0395513f8955a144f1 Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Thu, 12 Jun 2025 10:24:45 -0700
Subject: [PATCH 18/20] clarified retry

---
 docs/future-tasks/unified-storage-stack.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 74ec6bc81..05796c53b 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -208,11 +208,11 @@ This plan should be entirely incremental and can be rolled out step by step.
     before returning the transaction. See above.
 - [ ] Scheduler retries events whose transaction failed. It does so up to N
   times and calls a callback after the last retry (both configurable via
-  `Runtime` constructor).
Events are retried after all reactive functions - that are queued up are settled, so it's guaranteed to be a stable state - (Future optimization: Dynamically insert into the queue after any reactive - function that might update the handlers inputs, but before any that read - its outputs) + `Runtime` constructor). Events are retried after all read cells are fully + synced and reactive functions that are queued up are settled, so it's + guaranteed to be a stable state (Future optimization: Dynamically insert + into the queue after any reactive function that might update the handlers + inputs, but before any that read its outputs) - [ ] For change sets that only write (e.g. only push or set), we could just reapply those without re-runnin the handler. But this could also be a future optimization. From 4c5e771ec6fc9e64f89fa84a7f516167bf05f2bd Mon Sep 17 00:00:00 2001 From: Bernhard Seefeld Date: Thu, 12 Jun 2025 11:24:31 -0700 Subject: [PATCH 19/20] simplified things again! --- docs/future-tasks/unified-storage-stack.md | 61 +++++++++++++--------- 1 file changed, 35 insertions(+), 26 deletions(-) diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md index 05796c53b..1598253f8 100644 --- a/docs/future-tasks/unified-storage-stack.md +++ b/docs/future-tasks/unified-storage-stack.md @@ -129,10 +129,10 @@ This plan should be entirely incremental and can be rolled out step by step. - [ ] Also change schema queries on the serverside - [ ] Remove that translation in storage.ts and make sure everything still works. -- [ ] Expose a "run this function with a transaction object to be populated" - from memory, which scheduler calls with something that wraps the action - and returns a transaction, and which all other reading and writes contexts - will use as well. CT-449 +- [ ] Create a new transaction API: Get a `tx` object from memory, which exposes + `tx.read(entity, path)`, `tx.write(entity, path, value)`, etc., and + `tx.commit()`, `tx.abort(reason?: Error)` and a few more (see below). + CT-449 - [ ] The callback function can be async, so you have to await it. But if the function isn't async, the entire transaction building flow should be synchronous (i.e. not yield to the event loop), just return a Promise @@ -147,11 +147,6 @@ This plan should be entirely incremental and can be rolled out step by step. - [ ] If the transaction is aborted, the nursery is reverted to what it was before the transaction (that is likely we'll create another layer on top of the nursery?). - - [ ] Part of the transaction is what is being read. Likely, we just want to - change `ReactivityLog` to be a new transaction object that is being - passed through. In fact all reads and writes could now go through such - an `tx` object. We log not just the entity read or written, but also the - path (see next item) - [ ] Add path-dependent listeners to memory: During a transaction reads are logged with path. And there is a `tx.updates(callback)` or so function that registers a listener to any writes to any of the read paths (and @@ -168,30 +163,44 @@ This plan should be entirely incremental and can be rolled out step by step. will use to update the dependency graph. In fact scheduler will inside the callback do both this and adding the callback just before returning. It does so to not miss any updates. - - [ ] The callback can `tx.abort(reason extends Error)` a transaction. - Listeners per above should still be installed, even if the transaction - is aborted. 
Scheduler will use this if the action throws, so it can still update
+    scheduling information.
-  - [ ] Support nested read transactions. Write isn't needed. Nesting just means
-    that the read logs are partitioned, i.e. reads are only added to the
-    innermost transaction and thus the listeners from the previous point
-    only apply to those. This is concretely used in the renderer, which
-    creates nested `Cell.sink` calls that follow this logic. Throw when
-    nesting with writes or with async callbacks (the latter makes sure we
-    can't interleave transactions, just strictly nest them).
-- [ ] Memory schedules a task with the scheduler to update heap. Scheduler will
-  run this as soon as possible, but outside running network of reactive
-  functions. (TBD: There are alternative approaches. Either way, per above,
-  memory needs the ability to queue up changes, so this seems like the
-  simplest way)
+- [ ] Memory schedules a task with the scheduler to update heap. Scheduler will
+  run this as soon as possible, but outside the running network of reactive
+  functions. For example memory could get a `nextWriteWindow` callback on
+  construction that creates a promise that is resolved when it is safe to
+  write. Then we can queue up messages from the socket and process them all
+  at once. E.g.
+
+  ```typescript
+  // Sketch: `nextWriteWindow` is the callback described above; `applyToHeap`
+  // is an illustrative placeholder for however the heap actually gets updated.
+  const eventQueue: unknown[] = [];
+
+  function queueEvent(event: unknown) {
+    // Only the first event in an empty queue schedules a flush.
+    if (eventQueue.length === 0) {
+      nextWriteWindow().then(() => processEvents());
+    }
+    eventQueue.push(event);
+  }
+
+  function processEvents() {
+    // Synchronously apply all queued events to the heap, then reset the queue.
+    for (const event of eventQueue) applyToHeap(event);
+    eventQueue.length = 0;
+  }
+  ```
+
+  This is I think functionally equivalent to processing events right away
+  and building up a changelist.
 - [ ] Directly read & write to memory layer & wrap action calls in a transaction
   CT-450
   - [ ] Use the new transaction API above to do all reads and writes.
   - [ ] Currently Cell.push() has retry logic (cell.ts:366-381) that can be
     removed. It's subsumed by the next point.
-  - [ ] Scheduler creates a transaction that wraps calling an action. See above.
+  - [ ] Scheduler creates a transaction that wraps calling an action. Before
+    calling `.commit()` or `.abort()` it'll subscribe to all changes to read
+    values (see above) and it'll update the dependencies based on the reads
+    and writes retrieved from the `tx` object. This replaces `ReactivityLog`
+    and `.updates()` calls on `DocImpl`.
-  - [ ] `Cell.sink()` creates a (read-only) transaction, which might be nested
-    (see above).
+  - [ ] `Cell.sink()` creates a (read-only) transaction, which might be nested.
+    Very similar to the current `ReactivityLog`.

From 8a0e85367285f517baaf4238c8f46148cbd11ada Mon Sep 17 00:00:00 2001
From: Bernhard Seefeld
Date: Thu, 12 Jun 2025 11:30:07 -0700
Subject: [PATCH 20/20] further simplified

---
 docs/future-tasks/unified-storage-stack.md | 27 ++++++++++------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/docs/future-tasks/unified-storage-stack.md b/docs/future-tasks/unified-storage-stack.md
index 1598253f8..a47a1afd1 100644
--- a/docs/future-tasks/unified-storage-stack.md
+++ b/docs/future-tasks/unified-storage-stack.md
@@ -130,23 +130,13 @@ This plan should be entirely incremental and can be rolled out step by step.
   - [ ] Also change schema queries on the serverside
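
    For the `readCellLink` parser proposed earlier in this plan, a possible
    shape (hedged sketch: the exact link schema still needs to be extracted
    from the RFC (CT-448), so the `id`/`path` fields here are placeholders):

    ```typescript
    type CellLink = { id: string; path?: (string | number)[] };

    function readCellLink(value: unknown): CellLink | undefined {
      if (typeof value !== "object" || value === null) return undefined;
      const v = value as Record<string, any>;
      // Legacy form that storage.ts rewrites today: { "/": "<id>" }
      if (typeof v["/"] === "string") return { id: v["/"] };
      // Proposed new form: { "@": { "link": { ... } } } -- inner fields assumed
      const link = v["@"]?.link;
      if (link && typeof link.id === "string") {
        return { id: link.id, path: link.path };
      }
      return undefined;
    }
    ```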
  - [ ] Remove that translation in storage.ts and make sure everything still
    works.
 - [ ] Create a new transaction API: Get a `tx` object from memory, which exposes
+  `tx.read(entity, path)`, `tx.write(entity, path, value)` (and/or other
+  mutation functions), etc., and `tx.commit()`, `tx.abort(reason?: Error)`
+  and a few more (see below). CT-449
-  - [ ] The callback function can be async, so you have to await it. But if the
-    function isn't async, the entire transaction building flow should be
-    synchronous (i.e. not yield to the event loop), just return a Promise
-    for when the transaction settles.
-  - [ ] Be sure to handle errors correctly, i.e. abort the transaction and
-    re-throw the error (or return an aborted transaction, including the caught
-    error? or reject promise with aborted transaction that includes the caught
-    exception?).
-  - [ ] The heap shall not change while this function is running, i.e. the
-    function can read from memory and gets the state as of the beginning of
-    the function call and what it updated itself.
+  - [ ] The heap and nursery shall not change while this function is running,
+    i.e. the function can read from memory and gets the state as of the
+    beginning of the function call and what it updated itself. Only once the
+    transaction is committed do we update the nursery.
-  - [ ] If the transaction is aborted, the nursery is reverted to what it was
-    before the transaction (that is likely we'll create another layer on top
-    of the nursery?).
   - [ ] Add path-dependent listeners to memory: During a transaction reads are