
Conversation

@anotherjesse
Contributor

@anotherjesse anotherjesse commented Sep 5, 2024

cache to json on disk instead of in-memory

  • cache lives between restarts
  • facilitate debugging by viewing the json files
  • purge request(s) from cache by deleting file(s)
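A minimal sketch of what the bullets above describe: a disk-backed cache where each request maps to one JSON file, so entries survive restarts, can be inspected by opening the file, and can be purged by deleting it. All names here (`CACHE_DIR`, `cache_key`, `cache_get`, `cache_set`) are hypothetical, not the PR's actual identifiers.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("cache")  # hypothetical location for the JSON files


def cache_key(request: dict) -> str:
    """Derive a stable filename from the request payload."""
    raw = json.dumps(request, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def cache_get(request: dict):
    """Return the cached response, or None on a miss."""
    path = CACHE_DIR / f"{cache_key(request)}.json"
    if path.exists():
        return json.loads(path.read_text())
    return None


def cache_set(request: dict, response: dict) -> None:
    """Write the response as a pretty-printed JSON file.

    The cache lives between restarts, the files are human-readable
    for debugging, and deleting a file purges that entry.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{cache_key(request)}.json"
    path.write_text(json.dumps(response, indent=2))
```

Serializing with `sort_keys=True` before hashing keeps the key stable regardless of dict insertion order.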

Contributor

@bfollington bfollington left a comment


Nice addition, I haven't had time to revisit this repo but lots of the model around Threads is probably unnecessary bloat from my current POV. Feel free to rip it up as needed.

Also, we might also benefit from looking at Anthropic's prompt caching https://www.anthropic.com/news/prompt-caching

@anotherjesse
Contributor Author

@bfollington good callout on the prompt caching - although I think it might require us to be semi-stable / additive in prompt generation (meaning: as we know more, can we just append the new content, versus inserting content earlier in the prompt, which would break the prompt cache?)

I'm guessing we will want to have hints from the caller .. ("don't worry about caching this" vs. "this content will be used a bunch in the next 5 minutes" ...)

Hmm.. given that prompt caching has a limited lifetime, we could do something like: the second time we see a request with the same first 2k tokens of context within 5 minutes, we switch to cache mode..

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#cache-limitations
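That heuristic could be sketched roughly like this: hash the first 2k tokens of the prompt and flip on caching the second time the same prefix shows up inside the cache lifetime window. `WINDOW_SECONDS`, `PREFIX_TOKENS`, and `should_enable_prompt_cache` are made-up names for illustration, and the 5-minute window mirrors the cache lifetime mentioned above.

```python
import hashlib
import time

WINDOW_SECONDS = 300   # ~5 minute prompt-cache lifetime
PREFIX_TOKENS = 2000   # the "first 2k tokens" from the heuristic

_last_seen: dict = {}  # prefix hash -> timestamp of last sighting


def should_enable_prompt_cache(prompt_tokens, now=None):
    """Return True the second time the same prefix appears within the window."""
    now = time.time() if now is None else now
    prefix = " ".join(prompt_tokens[:PREFIX_TOKENS])
    key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    last = _last_seen.get(key)
    _last_seen[key] = now
    # A sighting outside the window doesn't count: the cache entry
    # would already have expired, so treat it as a fresh first sighting.
    return last is not None and (now - last) <= WINDOW_SECONDS
```

A repeat after the window lapses is treated as a first sighting again, since the cached prefix would have expired anyway.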

or ...

Prompt Caching introduces a new pricing structure where cache writes cost 25% more than base input tokens, while cache hits cost only 10% of the base input token price.

perhaps we just default to it, and monitor the cache hit rate ... emitting warnings when hit rate is low
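One way to sketch that default-on approach, under a simplifying assumption that every miss writes the full prompt (1.25x base price) and every hit reads it (0.10x): caching then breaks even at a hit rate of 0.25 / 1.15, roughly 22%, which makes a natural warning threshold. The `CacheStats` class and thresholds here are hypothetical, not from the PR.

```python
import logging

logger = logging.getLogger("prompt_cache")

# Pricing multipliers relative to base input tokens, per the quote above.
WRITE_MULT = 1.25
HIT_MULT = 0.10

# Break-even hit rate h, assuming every miss is a cache write:
# (1 - h) * 1.25 + h * 0.10 <= 1.0  =>  h >= 0.25 / 1.15 (about 22%)
BREAK_EVEN_HIT_RATE = (WRITE_MULT - 1.0) / (WRITE_MULT - HIT_MULT)


class CacheStats:
    """Track the prompt-cache hit rate and warn when it falls below break-even."""

    def __init__(self, warn_threshold: float = BREAK_EVEN_HIT_RATE):
        self.hits = 0
        self.misses = 0
        self.warn_threshold = warn_threshold

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def check(self, min_samples: int = 50) -> None:
        """Emit a warning once enough samples exist and the rate is too low."""
        if self.hits + self.misses >= min_samples and self.hit_rate < self.warn_threshold:
            logger.warning(
                "prompt cache hit rate %.1f%% is below break-even %.1f%%",
                100 * self.hit_rate, 100 * self.warn_threshold,
            )
```

The break-even figure is only a first-order estimate: real prompts mix cached and uncached tokens, so production numbers would shift the threshold.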

@anotherjesse anotherjesse merged commit e19e0c8 into main Sep 5, 2024
