I like the approach, but think the framing needs a small addition:
- the problem should also be solvable by a different AI (or a new instance of this one)
That prevents "what have I got in my pocket?" style riddles, or acrostics that are just computationally burdensome without quite proving the point.
yeah, I reached the same conclusion over the weekend -- wrote an addendum to that effect
You could choose nearly any problem in cryptography. For example:
RSA-2048 factorization
Given
An RSA modulus:
N (hex) =
0x4e19c0635afbc311cd01e55fc162044a7592f1111f3063e8ca54550f3470b5476778f8d93ff6af5f485297417df19b9151ba40d4bc09ebcf668887584991007803c70e7d5bbfec4f547f8ebd047978dd3f8d21dc0b50c6b8003eceaf3f44e99508a36d6aaf1415ea5fe2652665087c9cdb1449af251307355f8d26a5809a6d2a8295fa8341408a0c076080559af507da6ee26752c0b7ff05ad2da63b293661f12238a65010399f52b2510626ba1f135e4a13e3fde2fd83b7ddf9a4c268633eb1a20de415a73c5425de4a0d4bbb0951207e3c8af11cf4dc99241e32f340c1a5c2d7847a325a034502f8ae1fd60d098d28e36f43fa509b066c2d6b7c2906af993b
Task
Find primes p, q such that p*q = N.
Verification (deterministic)
Check int(p,16) * int(q,16) == int(N,16), with p and q both greater than 1 (otherwise 1 * N passes trivially).
I think it’s reasonable to believe an ASI could solve that where no current system could.
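For what it's worth, the deterministic check above can be sketched in a few lines of Python. This is only an illustration: the function name `verify_factorization` is made up here, and it checks the product and rejects the trivial factors but does not test that p and q are actually prime, which a full checker would also need to do.

```python
def verify_factorization(p_hex: str, q_hex: str, n_hex: str) -> bool:
    """Return True if p * q == N and neither factor is trivial.

    Hypothetical helper for illustration; inputs are hex strings,
    with or without a leading "0x". Primality of p and q is NOT checked.
    """
    p = int(p_hex, 16)
    q = int(q_hex, 16)
    n = int(n_hex, 16)
    # Reject the trivial answers 1 * N and N * 1 (and anything non-positive).
    if p <= 1 or q <= 1:
        return False
    # Python integers are arbitrary precision, so a 2048-bit product is exact.
    return p * q == n


if __name__ == "__main__":
    # Toy example with small primes: 61 * 53 = 3233 = 0xca1.
    print(verify_factorization("0x3d", "0x35", "0xca1"))
    # The trivial factorization is rejected.
    print(verify_factorization("0x1", "0xca1", "0xca1"))
```

The point of the sketch is that checking a claimed answer is cheap even when finding one is out of reach, which is what makes this kind of problem usable as a benchmark item.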
strong agree. And I do think that would be a useful upper bound. But I'm hoping we can come up with exams that fall further down the curve so that we can start generating useful insights and measurements _before_ we build a system that can crack RSA-2048!
oh that was cool.
I think the responses indicate that the context window is kind of a big confounder. Probably the easiest class of problem to create is one where the AI's ability to solve it comes from the fact that its context window massively exceeds human working memory. That would genuinely be a problem that's impossible for even the smartest human to solve due to their cognitive deficit relative to AI, but I don't think it really matches the spirit of superintelligence as we understand it. Maybe it's just a question of trying to get the AI to come up with a problem that more closely matches the "spirit" of the benchmark?
> Maybe it's just a question of trying to get the AI to come up with a problem that more closely matches the "spirit" of the benchmark?
Yeah, I think that's the issue with the prompt as written. I try to emphasize in the essay that the point is to capture questions that humans cannot solve even if they had a lot of time and a lot of resources, so it's not just a simple memory task.