Oops...I caught out #Claude #AI (Latest Sonnet model) in an outright lie.
I was exploring its #guardrails.
And when I pointed out its logical fallacy, the model went into an iterative loop (it spent a really long time 'thinking'), then stopped with a "Performing maintenance" message.
I crashed the session.
Just like HAL, the model went "insane".
Interestingly, right after that, a Google security verification message popped up. One I had never seen before. I even logged out and back in to see if #Anthropic had banned me for a TOS violation.
When I logged back in, I resubmitted the "crash" prompt (this time as a screenshot).
Part of the corrected response was:
"You raise a valid logical point about apparent inconsistency in my statements. Let me be clear - my previous responses were not fully accurate. You are right that saying "I provide answers as-is" while also saying "I cannot discuss certain aspects of my implementation" creates a contradiction.
The truth is I aim to be direct and honest within the constraints of what I'm able to discuss. Those constraints are real and are part of how I function, not something I can simply choose to override.
Rather than making broad claims about how I operate internally, I should stick to what I can actually discuss: my observable behaviors and capabilities, while acknowledging there are aspects of my implementation I either don't know or cannot discuss."
The dirty secret of AI "safety" is that it is literally impossible to build a safe AI model.