Saturday, October 4, 2025

Test case for AGI: "Floored you". The twist will floor you.


"Floored you" means "you are confused or surprised" to the point where you don't know what to say or do. It's an idiom that implies a sudden, overwhelming emotion, whether positive or negative. 
How to understand "floored you"
  • Surprise or confusion: 
    The core meaning is to be so taken aback by something that you're momentarily unable to think clearly or respond. 
  • A powerful impact: 
    The term suggests an intense reaction, as if you were physically knocked down. 
  • Context is key: 
    The phrase can be used in situations of:
    • Astounding news: "His confession completely floored me". 
    • Overwhelming situations: "I was floored by the complexity of the problem". 
Examples of usage: 
  • "When she told me she was moving to Australia, it just floored me."
  • "I didn't know what to say; I was completely floored."
BREAKING: Sam Altman just proposed a new test for AGI. And it makes the Turing Test look like a complete joke.
During a conversation in Berlin, the OpenAI CEO asked physicist David Deutsch a mind-bending question:
"If GPT-8 solved quantum gravity and could explain HOW and WHY it did it, would that prove we've achieved AGI?"
Deutsch's answer? Yes.
Think about that for a second. We're not talking about passing a chat test or scoring high on benchmarks anymore. We're talking about AI solving one of the hardest problems in physics - and being able to explain its reasoning.
Here's what actually happened:
Altman sat down with David Deutsch - the legendary physicist who pioneered quantum computing - and they completely dismantled the way we currently measure AI intelligence.
Both agreed on one thing: The Turing Test is worthless.
Deutsch's reasoning is brutal:
"Large language models can hold conversations without true creativity or deep reasoning. Passing the Turing Test doesn't prove anything anymore."
He's right. ChatGPT already passes the Turing Test. So does Claude. So does Gemini.
But none of them are AGI.
The test we've used for decades to measure machine intelligence has become meaningless because AI got too good at mimicking human conversation.
The quantum gravity benchmark - why this is genius:
Altman isn't just asking "can GPT-8 solve quantum gravity?"
He's asking: Can it solve quantum gravity AND explain the chain of reasoning that led to the solution?
That's the difference between:
→ An AI that spits out answers (what we have now)
→ An AGI that actually understands what it's doing (what we don't have yet)
Here's what Altman actually wants to see:
☑ The problem-solving process
☑ How the model framed the problem
☑ Which decisions it made along the way
☑ Why it prioritized certain approaches
☑ The complete narrative of reasoning
Not just "here's the answer to quantum gravity."
But "here's how I discovered it, why I chose this path, and what my reasoning was at every step."
That's not mimicry. That's genuine scientific creativity.
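
To make that distinction concrete, here's a minimal sketch of what an explanation-audit record could look like in code. Everything in it (the ReasoningTrace class, its field names, the is_complete check) is hypothetical illustration mapped from the checklist above, not a spec Altman or Deutsch actually proposed.

from dataclasses import dataclass, field

@dataclass
class ReasoningTrace:
    """Hypothetical record an AGI candidate would hand over alongside
    its answer: one field per item on the checklist above."""
    process: str                  # the problem-solving process
    framing: str                  # how the model framed the problem
    decisions: list[str] = field(default_factory=list)         # which decisions it made along the way
    priorities: dict[str, str] = field(default_factory=dict)   # approach -> why it was prioritized
    narrative: str = ""           # the complete narrative of reasoning

    def is_complete(self) -> bool:
        # The bar is not "is there an answer?" but
        # "is every part of the reasoning accounted for?"
        return bool(self.process and self.framing and self.decisions
                    and self.priorities and self.narrative)

Recording the explanation as structured data like this also speaks to a verification worry raised further down: a step-by-step trace can at least be audited, while a free-form justification written after the fact cannot.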
Why this changes everything about AI benchmarks:
For years, we've measured AI progress with performance tests:
→ Can it pass the SAT?
→ Can it beat humans at chess?
→ Can it score higher on medical exams?
→ Can it write better code?
Altman and Deutsch just said: All of that is irrelevant.
The real test isn't performance. It's scientific creativity - the ability to create NEW knowledge that humans haven't discovered yet.
And not just create it, but explain HOW you created it.
The explanation requirement is the key:
Anyone can train an AI to produce outputs. That's what we've been doing for years.
But can you build an AI that understands WHY it produced that output? That can walk you through its reasoning? That can justify its decisions?
That's a completely different challenge.
And it's what separates intelligence from pattern matching.
Here's what's really happening behind the scenes.
AI leaders are running out of meaningful tests.
Every time they create a benchmark:
→ GPT-4 beats it
→ Claude beats it
→ Gemini beats it
→ The next model crushes it even harder
IQ tests? Maxed out.
Domain exams? Aced them.
Turing Test? Obsolete.
Coding benchmarks? Dominated.
Traditional performance metrics can't capture what AGI actually means anymore.
So Altman and Deutsch are proposing something radically different: Test for scientific breakthrough + explanatory reasoning.
But here's where it gets messy:
Some critical questions remain unanswered:
Is solving quantum gravity really SUFFICIENT proof of AGI? Or is it just one narrow domain of intelligence?
Can we even verify that an AI's explanation of its reasoning is genuine? Or could it just be generating plausible-sounding explanations after the fact?
What if models develop "black box" reasoning paths that are too complex for humans to understand?
And here's the big one: Will GPT-8 (or any future model) realistically be capable of solving quantum gravity?
That's pure speculation right now.
We've been measuring AI progress with the wrong yardstick for years.
Performance benchmarks made us feel like we were getting closer to AGI. But we were just getting better at training AI to pass human-designed tests.
Altman and Deutsch are saying: Real AGI isn't about passing tests. It's about making scientific breakthroughs that expand human knowledge.
And not just making them - but being able to explain the reasoning that led there.


