A new technique for LLMs has just landed: Explainable training!
Let me *explain*.
Normal supervised training works like this: you show ground-truth inputs and outputs to a model and backpropagate the error into the model weights. All of this is an opaque black box. If you train on data which contains, for example, personally identifiable information (PII) or copyrighted content, that material will plausibly be stored verbatim in the model weights.
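For contrast, here is a minimal sketch of that ordinary supervised loop in PyTorch (the model, data, and hyperparameters are just placeholders):

```python
import torch
import torch.nn as nn

# Stand-ins for any model and any ground-truth dataset.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(32, 16)    # ground-truth inputs
targets = torch.randn(32, 1)    # ground-truth outputs

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)  # compare predictions to targets
    loss.backward()                         # backpropagate the error...
    optimizer.step()                        # ...into opaque weight updates
```

Whatever the training examples contained is now smeared across those weights, with no human-readable record of what was learned from them.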
What if we do it like this instead:
Let's write initial instructions for an LLM to generate synthetic data which resembles the real data.
Then we go to the real data and, one by one, show an LLM an example of the real data, an example of the synthetic data, and the instructions used to generate that synthetic data. We then ask it to iteratively refine those instructions so that the synthetic data resembles the real data more closely, in the features and characteristics that matter.
You can also add reasoning steps, and an instruction to never copy PII as such into the synthetic data generation instructions.
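A minimal sketch of that refinement loop is below. `call_llm` is a placeholder for whatever LLM API you use, and the prompt wording, function names, and structure are illustrative assumptions, not a fixed recipe:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: wire this to your LLM provider of choice.
    raise NotImplementedError

def generate_synthetic_example(instructions: str) -> str:
    # Produce one synthetic record from the current instruction document.
    return call_llm(
        "Follow these instructions to produce one synthetic data record:\n"
        f"{instructions}"
    )

def refine_instructions(instructions: str, real_examples: list[str]) -> str:
    # Walk through the real data one example at a time, asking the LLM to
    # improve the instructions rather than memorize the example itself.
    for real_example in real_examples:
        synthetic_example = generate_synthetic_example(instructions)
        instructions = call_llm(
            "Here are instructions for generating synthetic data:\n"
            f"{instructions}\n\n"
            f"A synthetic example they produced:\n{synthetic_example}\n\n"
            f"A real example for comparison:\n{real_example}\n\n"
            "First reason step by step about which features and characteristics "
            "of the real data the synthetic data fails to capture. Then output a "
            "refined version of the instructions. Never copy names, addresses, or "
            "other personally identifiable information into the instructions; "
            "describe such fields only in general terms."
        )
    return instructions
```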
This is just like supervised learning, but explainable! The result is a document of refined instructions for generating better synthetic data, informed by the real data, and it's human-readable and explainable!
You can easily verify that this relatively small document doesn't contain, for example, PII, and you can then use it to generate any volume of synthetic training data while keeping critical protected details in the real data from leaking into the trained model!
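Continuing the hypothetical helpers from the sketch above (the starting prompt and real examples shown here are made up for illustration):

```python
initial_instructions = "Generate a realistic-looking customer support ticket."
real_dataset = ["<real ticket 1>", "<real ticket 2>"]  # your real examples

refined = refine_instructions(initial_instructions, real_dataset)
print(refined)  # human-readable: audit this document for PII before use

# Once audited, the instruction document alone drives generation;
# the real data never touches this step.
synthetic_dataset = [
    generate_synthetic_example(refined)
    for _ in range(10_000)  # any volume you need
]
```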
This is the next level of privacy protection for training AIs!