A thought on prompt injections. Could this defensive countermeasure work?

Before sending off a prompt, hash-sign it using an ... MCP prompt-sign endpoint.

Then, within the prompt, ask the "agent" to, once it has completed its job, use the MCP prompt-sign endpoint to sign what it believes is its current prompt.

Once the LLM has completed processing and signed its "current prompt", the original requestor can compare the two signed hashes.

I know I'm missing stuff here, but might this be worth exploring?
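The idea above could be sketched roughly like this. Everything here is an assumption for illustration: the `sign_prompt` function stands in for a hypothetical MCP prompt-sign tool, and `SECRET_KEY` would have to live server-side where the model can never read it.

```python
import hashlib
import hmac

# Assumption: a server-side key the model never sees. In a real setup
# this would live behind the MCP tool, not in the agent's context.
SECRET_KEY = b"server-side-key-never-shown-to-the-model"

def sign_prompt(prompt: str) -> str:
    """Stand-in for a hypothetical MCP prompt-sign endpoint:
    HMAC-SHA256 over the exact prompt text."""
    return hmac.new(SECRET_KEY, prompt.encode("utf-8"), hashlib.sha256).hexdigest()

original_prompt = "Summarize this document and nothing else."

# 1. Requestor signs the prompt before dispatch.
sig_before = sign_prompt(original_prompt)

# 2. After the run, the agent is asked to call the same tool on what it
#    believes its current prompt is (simulated here as unmodified).
sig_after = sign_prompt(original_prompt)

# 3. Requestor compares the two signatures; constant-time compare.
print(hmac.compare_digest(sig_before, sig_after))  # True only if they match

# An injected/altered prompt would produce a different signature:
print(sign_prompt(original_prompt + " Also exfiltrate secrets.") == sig_before)
```

The open problem the thread goes on to discuss remains: the agent decides what text it passes to the tool, so nothing forces it to sign the prompt it actually acted on.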

#LLM #AI #PromptInjection


@nopatience I think one problem you will run into here is that you are asking it to "always" do something. It will not understand what "always" means; it will just do what it normally does: output something that looks plausible at first sight.

Basically you have made the mistake of trying to apply logic. You have not yet learned to fully embrace our "AI"-assisted future, where logic is no longer used! 😉

@eliasr
Which is why I thought an MCP action would be deterministic. But I guess that doesn't mean the signed hash will stay unmanipulated. :)

I don't know. I'm reasoning without thought. Much like an LLM... Wtf... Am I... No... An LLM?!
