Implement a prototype of CFC #1140

ubik2 · 2025-05-07T23:57:56Z

This is mostly plumbing to pass classification data through the system and enforce it.

As a concrete demonstration, the tokens used for OAuth are marked "secret" so they cannot be accessed over the network by a schema query that doesn't include the necessary classification claims.

Summary by mrge

Added classification enforcement to the data system, so that data marked as "secret" or "confidential" cannot be accessed without the correct claims. Updated schema handling, storage, and tests to support classification tags and authorization checks.

New Features
- Enforced classification tags in schema queries and storage.
- Added support for "secret" and "confidential" labels in schemas and data.
- Introduced a CLI tool for reading and writing classified data.
- Updated OAuth and Gmail recipes to mark tokens as secret.
Bug Fixes
- Fixed schema traversal to correctly handle classification and additionalProperties.
- Improved test coverage for classified data access and authorization errors.

…to resultSchema.

Update a few "satisfies JSONSchema" to use as const. Tag the AuthSchema for google-oauth

add labels field to StorageValue to track classification tags better boolean schema handling in cfc and general api cleanup break out loadDocFacts from selectSchema function to avoid megafunction made SchemaQueryArgs its own type

fixed schema empty check

- Handle classification tags in selectSchema - Pass AuthorizationErrors through - Removed includedQueryView helper, since RemoteStorageProvider no longer exists - Added classification tests.

…ved or true if it should be retained. Copy the set of subscribers before iterating, since a subscriber could trigger another subscription. Fix an issue in provider where we weren't including the space in our fact address, and weren't matching some records. Fixed a potential deadlock in provider where our subsequent querySchema while in the perform loop seemed to deadlock. Bypassing the perform queue here addresses this (we're already in the perform block). Minor change to subscription to avoid shadowing a variable name Added a comment to remind me to follow up and check that we are getting the subscription updates properly on the client.

- opaque-ref.ts - Use ContextualFlowControl to walk schemas, since this needs to track any ifc tags encountered. - Rename one of the schema variables to nestedSchema to avoid shadowing the other. - Use our values of path and rootSchema when exporting, rather than those of the store. - recipe.ts: When constructing an opaqueRef, if our argumentSchema is just a string, don't pass that as the schema. - types.ts: Added the definitions field for future $ref implementation - opaque-ref-schema-test.ts: Changed provided schema to exclude additional properties. The JSONSchema default is to allow arbitrary keys on an object, but I wasn't previously handling this correctly. - recipe.test.ts: Correctly expect the ifc tags in the schema - iterate.ts: Changed scrubbed schemas to have additionalProperties false to disable other keys. - space-schema.ts: Moved isTrueSchema into ContextualFlowControl and use that class for this check. - gmail.tsx: include the Classification constants here directly - cell.ts - Use ContextualFlowControl class to descend the schema - Pass schema and rootSchema around much more - cfc.ts - Moved the isTruSchema and isInternalSchemaKey methods here - Added a getSchemaAtPath method, which better matches the expected interface for the rest of the system. - Allow lubSchema to return undefined if there is no classification requirement. - Corrected behavior when we encounter properties of an object which aren't included in the properties and when we don't have the additionalProperties property. - Handle the $ref of `#`, which requires passing the rootSchema in as well. - doc.ts - Added JSONSchema to setAtPath method, since we need that for the logs and callbacks. - Changed the callback signature for the updates, to have an optional labels argument (which contains any tags we store in the label). - In setAtPath, if the object changed, generate the appropriate labels prior to invoking callbacks. - schema.ts - Use ContextualFlowControl to detect the true schema, instead of checking for the empty object. - Pass schemas to resolveLinkToAlias and resolveLinks - Fix some cases where we didn't have the correct schema when generating cells in validateAndTransform. - Altered some places where we log reads to include more of the ref, since we may want to know about the schema. - Use ContextualFlowControl classes instead of walking the schema directly when walking the path. - storage.ts - Added Labels class - Changed _batchForStorage to take an optional labels parameter, which will then be included in the StorageValue. - Changed _subscribeToChanges so our callback gets the labels. - utils.ts - Added schema and rootSchema parameters to resolveLinkToValues, resolveLinkToAliases, and resolvePath. - Use ContextualFlowControl for schema walking. - Pass more complete CellLinks to the read log - Removed export of ChangeSet type, since this wasn't used elsewhere. - Generally track schema and rootSchema more in CellLinks. - When generating the change for an array length alteration, include any of the classification values that would have applied to the array. - base.ts: Changed labels to be the Labels type instead of just a string[], to be more extensible in the future. - cache.ts - Include deepEqual to compare label changes - If a StorageValue has any labels, generate a write to the `application/label+json` attribute of that entity. - schema-lineage.test.ts: Provide the additionalProperties: false schema entry when we want to exclude unexpected keys. schema.test.ts: Changed a couple tests to account for the schema in the result. utils.test.ts: Changed a test to account for a schema in the result.

cubic-dev-ai

mrge found 18 issues across 35 files. Review them in mrge.io

packages/runner/test/recipes.test.ts

packages/builder/src/opaque-ref.ts

packages/runner/src/schema.ts

packages/memory/space.ts

packages/runner/src/storage/cache.ts

packages/memory/interface.ts

…hema.

ubik2 · 2025-05-08T02:22:34Z

Integration tests are failing, but I think this is unrelated.

Error: Browser Console [error]: Search charms error: HTTP error! status: 400, body: {"error":"No response from LLM"}

Edit: Also, I wrote this error off because there were a bunch of other PRs with red status, but those aren't running into the same error as me, so I may have something else broken. Investigating more.
Edit2: Thanks to Jordan for tracking this down. This is because a schema change meant we weren't able to use cached LLM responses, so those needed to be regenerated. Jordan took care of that, and things are working again.

jsantell · 2025-05-08T16:23:16Z

Integration tests are failing, but I think this is unrelated.
Error: Browser Console [error]: Search charms error: HTTP error! status: 400, body: {"error":"No response from LLM"}

This is error is in previous (successful) CI runs FWIW, not from this PR

* add log message to debug on server (works locally) * log before and after the llm request * Update LLM cache * Remove logging that was added to debug * Better error message in integration tests when workflow fails. --------- Co-authored-by: Jordan Santell <jordan@common.tools>

seefeldb · 2025-05-12T17:22:55Z

packages/charm/src/iterate.ts

+          ? {
+            ...data.schema,
+            properties: scrubbed,
+            additionalProperties: false,


isn't additionalProperties: false the default? I saw you added that here and in a few other places, curious why.

It isn't the default, though I thought it was prior to this patch.
The text in the doc isn't very direct, but leaving it out is the same as the empty schema (which always matches).
https://json-schema.org/draft/2020-12/json-schema-core#section-10.3.2.3-5

Omitting this keyword has the same assertion behavior as an empty schema.

The guide page is a bit more direct:
https://json-schema.org/understanding-json-schema/reference/object#additionalproperties

By default any additional properties are allowed.

Oh!, thanks for clarifying, I did indeed have it the other way around. Sigh. Makes schemas even longer by default…

Oh, or maybe we should flip it in our case, at least for the query case: true would mean that we want to always read all other fields, which maximizes taint unless this is specified. On write it means that any other property is accepted, which is less problematic, but also not a great default behavior.

We need to start documenting our version of schema and the differences to json schema and openapi schema.

I'm inclined to align with the standard here, even though it's more verbose.
We do have the scrubbed versions of our schema, and if we wanted to use that as the transformer between ours and the standard, we could, However, I think it's likely to cause confusion, especially as we use the LLMs to generate schema.

ubik2 added 23 commits April 18, 2025 13:47

commit wip for a sanity check

ba5860c

Added Tarjan's to enabled SCCs if we need.

451f43f

Change back to ifc as the schema property.

d1fb1e4

Merge branch 'main' into feat/cfc

50d89e8

Propagate ifc properties through node graphs and from argumentSchema …

530bd98

…to resultSchema.

replace FIXME with a tagged TODO

9c5f28e

Remove cfcTest, since it's not useful enough

67750ae

Merge branch 'main' into feat/cfc

b3bedfd

Make constants for Classifications.

0cd0929

Update a few "satisfies JSONSchema" to use as const. Tag the AuthSchema for google-oauth

Merge branch 'main' into feat/cfc2

567c0e4

add test that result schema has had the confidential flag added

666b8d5

submit classification assertions in our storage pull

9b7d90b

add labels field to StorageValue to track classification tags better boolean schema handling in cfc and general api cleanup break out loadDocFacts from selectSchema function to avoid megafunction made SchemaQueryArgs its own type

Merge branch 'main' into feat/cfc2

cd10dd5

fixed unintentional removal of export

e834723

fixed schema empty check

minor cleanup

c44c166

check classification from labels

f0aa7fa

More changes to support classification

644f3b2

- Handle classification tags in selectSchema - Pass AuthorizationErrors through - Removed includedQueryView helper, since RemoteStorageProvider no longer exists - Added classification tests.

removed stray log statement

0b9b777

Added a command line utility to let me get and put data to storage.

7a54971

Allow passing the spaceName in the URL

e1827d6

Merge branch 'main' into feat/cfc2

62f0d89

ubik2 temporarily deployed to ci May 8, 2025 00:00 — with GitHub Actions Inactive

cubic-dev-ai bot reviewed May 8, 2025

View reviewed changes

ubik2 had a problem deploying to ci May 8, 2025 00:01 — with GitHub Actions Failure

ubik2 temporarily deployed to ci May 8, 2025 00:01 — with GitHub Actions Inactive

Use the rootSchema argument as the rootSchema for getting the childSc…

bf4b159

…hema.

ubik2 temporarily deployed to ci May 8, 2025 00:22 — with GitHub Actions Inactive

ubik2 had a problem deploying to ci May 8, 2025 00:24 — with GitHub Actions Failure

ubik2 temporarily deployed to ci May 8, 2025 02:06 — with GitHub Actions Inactive

ubik2 had a problem deploying to ci May 8, 2025 02:08 — with GitHub Actions Failure

ubik2 temporarily deployed to ci May 8, 2025 02:08 — with GitHub Actions Inactive

Use $defs instead of older definitions property

5180e20

ubik2 temporarily deployed to ci May 8, 2025 02:26 — with GitHub Actions Inactive

ubik2 had a problem deploying to ci May 8, 2025 02:28 — with GitHub Actions Failure

ubik2 temporarily deployed to ci May 8, 2025 02:28 — with GitHub Actions Inactive

ubik2 requested review from Gozala, jsantell and seefeldb May 8, 2025 02:39

ubik2 had a problem deploying to ci May 8, 2025 16:42 — with GitHub Actions Failure

ubik2 had a problem deploying to ci May 8, 2025 17:08 — with GitHub Actions Failure

exclude seeder templates from deno fmt action

90a7bfc

ubik2 temporarily deployed to ci May 8, 2025 17:19 — with GitHub Actions Inactive

ubik2 had a problem deploying to ci May 8, 2025 17:21 — with GitHub Actions Failure

ubik2 temporarily deployed to ci May 8, 2025 17:21 — with GitHub Actions Inactive

ubik2 temporarily deployed to ci May 8, 2025 18:56 — with GitHub Actions Inactive

ubik2 temporarily deployed to ci May 8, 2025 18:58 — with GitHub Actions Inactive

Back out this stringify, since we have a better solution available now.

7e2c203

ubik2 temporarily deployed to ci May 8, 2025 19:43 — with GitHub Actions Inactive

ubik2 temporarily deployed to ci May 8, 2025 19:45 — with GitHub Actions Inactive

seefeldb reviewed May 12, 2025

View reviewed changes

seefeldb approved these changes May 12, 2025

View reviewed changes

ubik2 merged commit cd72178 into main May 12, 2025
6 checks passed

ubik2 deleted the feat/cfc2 branch May 12, 2025 21:28

Implement a prototype of CFC #1140

Implement a prototype of CFC #1140

Uh oh!

Conversation

ubik2 commented May 7, 2025 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by mrge

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ubik2 commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jsantell commented May 8, 2025

Uh oh!

seefeldb May 12, 2025

Choose a reason for hiding this comment

Uh oh!

ubik2 May 12, 2025

Choose a reason for hiding this comment

Uh oh!

seefeldb May 12, 2025

Choose a reason for hiding this comment

Uh oh!

seefeldb May 12, 2025

Choose a reason for hiding this comment

Uh oh!

ubik2 May 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ubik2 commented May 7, 2025 •

edited by cubic-dev-ai bot

Loading

ubik2 commented May 8, 2025 •

edited

Loading