This model has been processed with heretic 1.1.0 to reduce refusals. It can generate text that is not suitable for all audiences.
I used the flags `--orthogonalize-direction --row-normalization FULL --full-normalization-lora-rank 5`.
heretic only found attention layers, but in practice this appears to be enough to have arbitrary conversations without AI safety lectures. Anecdotally, the model also seems to think better in general.
Parameters:
* direction_index = 28.60
* attn.o_proj.max_weight = 1.46
* attn.o_proj.max_weight_position = 32.05
* attn.o_proj.min_weight = 1.44
* attn.o_proj.min_weight_distance = 28.03

» [Trial 476] Refusals: 18/100, KL divergence: 0.0084
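For readers unfamiliar with these parameters, here is a rough sketch of how I understand them to fit together. It is not heretic's actual code. The function names `ablation_weight` and `ablate_direction` are hypothetical, and the linear falloff between `max_weight` and `min_weight` is an assumption about the kernel shape. The idea is that each layer gets an ablation weight depending on its distance from `max_weight_position`, and that weight scales how strongly the refusal direction is projected out of the layer's `attn.o_proj` matrix.

```python
def ablation_weight(layer, max_weight, max_weight_position,
                    min_weight, min_weight_distance):
    """Per-layer ablation weight (assumed linear falloff, hypothetical helper)."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        # Layers too far from the peak are left untouched.
        return 0.0
    t = distance / min_weight_distance
    # Interpolate from max_weight at the peak to min_weight at the edge.
    return max_weight + t * (min_weight - max_weight)


def ablate_direction(matrix, direction, weight):
    """Return W - weight * r r^T W, with r the unit-normalized direction.

    matrix is a list of rows (output dim x input dim); direction lives in
    the output space. With weight = 1 the result has no component along r.
    """
    norm = sum(x * x for x in direction) ** 0.5
    r = [x / norm for x in direction]
    rows, cols = len(matrix), len(matrix[0])
    # Component of each column of W along r.
    proj = [sum(r[i] * matrix[i][j] for i in range(rows)) for j in range(cols)]
    return [[matrix[i][j] - weight * r[i] * proj[j] for j in range(cols)]
            for i in range(rows)]


if __name__ == "__main__":
    # Values reported above for attn.o_proj; layer 32 is an arbitrary example.
    w = ablation_weight(32, 1.46, 32.05, 1.44, 28.03)
    print(f"ablation weight at layer 32: {w:.4f}")
```

Under this reading, a weight above 1 (as in the values reported here) removes slightly more than the full projection onto the direction, and the fractional `direction_index` would correspond to interpolating between the directions of two neighboring layers.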