@Gozala
Contributor

@Gozala Gozala commented Jan 18, 2025

@anotherjesse

@Gozala can one watch *

@Gozala
Contributor Author

Gozala commented Jan 21, 2025

@Gozala can one watch *

We could, although it does seem like a footgun, as the service would have to send the complete document every time anything in the document changes. I would personally recommend going with explicit paths first and expanding if found necessary, but I'm happy to go the other way round if desired.
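
To make the trade-off concrete, here is a minimal sketch, assuming a watcher is identified by a path into a JSON document and that an empty path stands in for `*`. The names `Json`, `Path`, `getAt`, and `notify` are hypothetical and not part of the proposal; the comparison via `JSON.stringify` is a shortcut that is sensitive to key order.

```typescript
// Hypothetical types for illustration; not the proposal's actual API.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Path = string[]; // e.g. ["body", "text"]; [] stands in for `*`

// Read the value at `path`, or undefined if the path does not exist.
// Only object keys are followed; array indexing is left out of this sketch.
function getAt(doc: Json, path: Path): Json | undefined {
  let current: Json | undefined = doc;
  for (const key of path) {
    if (current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Decide what a watcher of `path` would be sent after a change.
// Returns undefined when the watched value did not change.
function notify(before: Json, after: Json, path: Path): Json | undefined {
  const prev = getAt(before, path);
  const next = getAt(after, path);
  return JSON.stringify(prev) === JSON.stringify(next) ? undefined : next;
}
```

With an explicit path such as `["title"]`, a change elsewhere in the document produces no message; with `[]` every change produces the whole document as the payload.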

@Gozala
Contributor Author

Gozala commented Jan 21, 2025

Updated the document to be explicit about the question @anotherjesse asked. I proposed the more conservative approach, but that is not meant to imply a strong preference; happy to switch to the opposite if so desired.

@seefeldb

High-level, the most direct way to hook this up would be to implement https://github.com/commontoolsinc/labs/blob/main/typescript/packages/lookslike-high-level/src/storage-providers.ts, which matches this proposal, except that:

  • StorageValue contains some metadata, so it pushes the actual value into value. I don't think this matters, but maybe we want to instead start treating metadata explicitly (and allow for more than source).
  • That interface doesn't do per-path access. The underlying cells do though, so I propose we hook it up via this interface and just do whole documents, then refactor to take advantage of this.

Other question: We previously talked about partitioning along collections/databases, and in estuary we'd at least want per-user partitions. Can we introduce that? Just a database id in the API I assume (which eventually can be a public key or so, but to the API it's opaque)?

I think then we have technically enough to get into estuary. Let's get this up, and then discuss stronger guarantees after that.

@jsantell jsantell left a comment

Thank you for the write-up and for itemizing the contracts and edge cases.

@Gozala
Contributor Author

Gozala commented Jan 21, 2025

Had a call with @seefeldb where we decided:

  1. To drop support for paths and just do it all at document level granularity.
  2. Introduce a db component in front of document so we could have multiple DBs that correspond to collections.
  3. We decided we aren't going to add anything regarding value metadata, as documents themselves can be modeled as a tuple of content and metadata roughly corresponding to HTTP head, body semantics.

I'll update this document to reflect these decisions.
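
The three decisions above can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the actual storage interface: `DocumentValue`, `MemoryStore`, and the `head`/`body` field names are hypothetical, chosen to mirror the HTTP head/body analogy, and the db id is treated as an opaque string.

```typescript
// Hypothetical shapes for illustration only.
type DocumentValue = {
  head: Record<string, string>; // metadata, loosely like HTTP headers
  body: unknown; // content
};

class MemoryStore {
  // db id -> (document id -> document); the db id is opaque to the store.
  private dbs = new Map<string, Map<string, DocumentValue>>();

  // Replace the whole document; there is no per-path update.
  put(db: string, id: string, doc: DocumentValue): void {
    let documents = this.dbs.get(db);
    if (!documents) {
      documents = new Map<string, DocumentValue>();
      this.dbs.set(db, documents);
    }
    documents.set(id, doc);
  }

  // Read the whole document, or undefined if the db or document is missing.
  get(db: string, id: string): DocumentValue | undefined {
    const documents = this.dbs.get(db);
    return documents ? documents.get(id) : undefined;
  }
}
```

Because the db id is part of every call, the same document id can exist independently in two partitions, for example one per user.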

@Gozala Gozala requested a review from seefeldb January 21, 2025 21:02
@Gozala Gozala requested a review from bfollington January 22, 2025 00:43
@Gozala
Contributor Author

Gozala commented Jan 22, 2025

@bfollington would love to learn how this will need to evolve to support spellcaster requirements.

@bfollington

@bfollington would love to learn how this will need to evolve to support spellcaster requirements.

One thing I expect we will care about soon is recently modified documents, perhaps we only need a most-recently-touched buffer for early prototyping but ultimately time series retrieval will be useful.
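
A most-recently-touched buffer of the kind mentioned here could look roughly like the sketch below. It is only an illustration under assumptions: the class name `RecentlyTouched`, its methods, and the fixed capacity are hypothetical, and timestamps are assumed to be plain numbers supplied by the caller.

```typescript
// Bounded buffer of document ids ordered by last touch (hypothetical helper).
class RecentlyTouched {
  // Map preserves insertion order, so the last key is the most recent.
  private entries = new Map<string, number>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Record that `id` was modified at `timestamp`.
  touch(id: string, timestamp: number): void {
    this.entries.delete(id); // re-insert so the id moves to the end
    this.entries.set(id, timestamp);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  // Document ids, most recently touched first.
  list(): string[] {
    return Array.from(this.entries.keys()).reverse();
  }

  // Ids touched at or after `timestamp`, most recent first.
  since(timestamp: number): string[] {
    return this.list().filter((id) => (this.entries.get(id) ?? 0) >= timestamp);
  }
}
```

A buffer like this only answers "what changed recently" up to its capacity; time-series retrieval over the full history would need an index or log instead.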

@Gozala
Contributor Author

Gozala commented Jan 22, 2025

One thing I expect we will care about soon is recently modified documents, perhaps we only need a most-recently-touched buffer for early prototyping but ultimately time series retrieval will be useful.

You mean listing documents that have been modified since some timestamp, or just having a timestamp for each document?

I’m inclined to say just put the metadata needed on the document itself. It will inevitably evolve over time, but we can denote some fields that we want indexed (although again I suspect we will end up realizing that we wanted this indexed and that indexed as time goes on 😖).

@Gozala
Contributor Author

Gozala commented Jan 22, 2025

I’m inclined to say just put the metadata needed on the document itself. It will inevitably evolve over time, but we can denote some fields that we want indexed (although again I suspect we will end up realizing that we wanted this indexed and that indexed as time goes on 😖).

Never mind, let's just make timestamps special; we can handle other things on a case-by-case basis.

@seefeldb

One thing I expect we will care about soon is recently modified documents, perhaps we only need a most-recently-touched buffer for early prototyping but ultimately time series retrieval will be useful.

This would be after this first take, but one of the ways we can move to more robust consistency is via something event-sourcing based, including events that reflect what users did as closely as possible. That log could/should be linked with the metadata on the actual documents. Let's pick this up when you get to this need?
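
For reference, the event-sourcing idea can be sketched as an append-only log that is replayed into document state, with each document's metadata pointing back at the log entry that last changed it. Everything here is a hypothetical illustration: the `DocEvent` shape, the `replay` function, and the `lastEvent`/`modifiedAt` fields are assumptions, not something agreed in this thread.

```typescript
// Hypothetical event shapes; a real log would record user intent in more detail.
type DocEvent =
  | { type: "put"; id: string; body: unknown; at: number }
  | { type: "delete"; id: string; at: number };

// Projected document state, with metadata linking back to the log.
type Projected = {
  body: unknown;
  lastEvent: number; // index of the log entry that produced this state
  modifiedAt: number; // timestamp carried by that entry
};

// Rebuild current document state by replaying the log from the start.
function replay(log: DocEvent[]): Map<string, Projected> {
  const state = new Map<string, Projected>();
  log.forEach((event, index) => {
    if (event.type === "put") {
      state.set(event.id, { body: event.body, lastEvent: index, modifiedAt: event.at });
    } else {
      state.delete(event.id);
    }
  });
  return state;
}
```

Since state is derived entirely from the log, a "recently modified" query becomes a scan of the log tail rather than a separate index.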


6 participants