5 Comments
Nathan Lambert

Nice job making these evolving, mostly research based, topics presentable in a fairly general audience manner.

Charlie Pownall

Which is why safety should be a subset of ethics?

roban

I'm intrigued by the claim that LLMs "are often able to identify the morally salient features of situations at a level of sophistication comparable to that of a really good philosophy PhD student." That sounds like a remarkable accomplishment indeed! Are there examples of the kind of reasoning you're talking about published or posted somewhere?

Arvind Narayanan

This is Seth's assessment; he is a philosophy professor and a co-author of this essay. I believe he has a paper coming out soon that talks about this.

Stefano Diana

Lazar talks about "moral understanding from text alone" and gives an example of GPT-4 theoretically shying away from holding a child's hand. But this is still merely about assembling words the way we do, so that the words conjure up what we feel as a moral disposition. Psychopaths are very good at assembling words like that, but their words are just deception: they mean nothing to their non-existent morality and have nothing to do with their actual moral behavior. LLMs are just the same: artificial psychopaths. We'll see what the paper says.