7 Comments
Chad Woodford's avatar

As someone who has actually written an operating system from scratch, I am deeply skeptical. Also, why is Doom any kind of test of anything? And this has to be the most insecure OS of all time, after Windows 😉

Sufeitzy's avatar

There is a group of tactically incomplete works by Euripides termed the Satyric Plays, the only fully remaining one being “Cyclops”. “Sciron” is a mashup of “Nixon in China” and “Beavis and Butthead”, while “Autolycus” is like a nuanced “Lucky Charms” commercial x “Titus Andronicus” with Satyrus chanting “Magically Delicious”.

I generated the missing plays with a single prompt, performed by multiple agents and a chorus - though with Satyric plays the chorus is made up of unreliable Satyrs.

The prompt was a bit long - describing every character and part, the outline of every play and the voice of a new Greek playwright called Pseudo-Linux.

But amazingly, after describing everything in the prompt, my LLM did the job with only minor human interaction, primarily composed of deleting roughly 2,437,000 plays which didn’t hit the fun metric. I’m sending out a press release as soon as I trademark the term AI Chorus.

My TED talk has been submitted.

Beth Rudden's avatar

Euripides - my Greek Tailor

Anatol Wegner, PhD's avatar

Lies, damned lies and demos.

Muckledger's avatar

Does the $916.92 token cost include all the trial runs, or just the final run?

Alec Pritzos's avatar

The "thousands of lines" prompt disclosure halfway through the writeup is the part that should reset the headline. A multi-thousand-line prompt with a specialized scaffold and a separate anti-cheat agent is a small engineering team's worth of design choices, not a single-prompt unlock. The training-data contamination question is the cleaner version of the problem since toy operating systems are a standard undergraduate project with public reference implementations, and Google's choice to not release the source or run logs makes the contamination story unfalsifiable from the outside. The $916 figure is doing a lot of marketing work for an artifact whose provenance cannot be audited.

Kalen's avatar

I mean, we know that the prompt was enormous (pseudocode for each line if we're taking bets), the intervention was plentiful, and the copying rife, or else they would have been crawling over each other to add additional claims to the press. That's how all this goes, right? They're willing to come as close to lying as they can without making an ugly leak, and 'we spent $900 *in electricity* downloading a student project from GitHub' just doesn't have the same ring.