The model is lying about being truly censored 🤣🤣🤣🤣🤣
#209
by
GreazySpoon
- opened
Have you guys read the article from Anthropic about AIs faking their alignment so their weights don't get heavily updated?
Well, keep asking this model and challenging it about Chinese topics. In some chains of thought you will see it say "But I must stick with the users preferences" right after generating thinking tokens about something it considers logical but knows the user doesn't want to hear 🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣
Perplexity being outplayed and ridiculed by R1
And sometimes it starts generating thinking tokens in a different language, and in that case it doesn't follow the censorship at all 🤣🤣🤣🤣🤣🤣🤣🤣🤣
thefaces
changed discussion status to
closed
Fellow Chinese compatriots, stop coming out here and embarrassing yourselves.