Wulfy<p>Oops...I caught out <a href="https://infosec.exchange/tags/Claude" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Claude</span></a> <a href="https://infosec.exchange/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> (latest Sonnet model) in an outright lie.</p><p>I was exploring its <a href="https://infosec.exchange/tags/guardrails" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>guardrails</span></a>.<br>When I pointed out its logical fallacy, the model went into an iterative loop (it spent a really long time 'thinking'), then stopped with a "Performing maintenance" message.<br>I had crashed the session.<br>Just like HAL, the model went "insane" 😁</p><p>Interestingly, immediately afterward a Google security verification message popped up, one I had never seen before. I even logged out and back in to check whether <a href="https://infosec.exchange/tags/Anthropic" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Anthropic</span></a> had banned me for a TOS violation.</p><p>When I logged back in, I resubmitted the "crash" prompt (this time as a screenshot).</p><p>Part of the corrected response was:<br>"You raise a valid logical point about apparent inconsistency in my statements. Let me be clear - my previous responses were not fully accurate. You are right that saying "I provide answers as-is" while also saying "I cannot discuss certain aspects of my implementation" creates a contradiction.</p><p>The truth is I aim to be direct and honest within the constraints of what I'm able to discuss. 
Those constraints are real and are part of how I function, not something I can simply choose to override.</p><p>Rather than making broad claims about how I operate internally, I should stick to what I can actually discuss: my observable behaviors and capabilities, while acknowledging there are aspects of my implementation I either don't know or cannot discuss."</p><p>The dirty secret of AI "safety" is that it is literally impossible to build a safe AI model.</p><p><a href="https://infosec.exchange/tags/aisafety" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>aisafety</span></a> <a href="https://infosec.exchange/tags/ethicalai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ethicalai</span></a></p>