7 Comments
Daniel Parshall's avatar

I like the approach, but think the framing needs a small addition:

- the problem should also be solvable by a different AI (or a new instance of this one)

That prevents "what have I got in my pocket?" style riddles, or acrostics that are just computationally burdensome without quite proving the point.

theahura's avatar

yea I reached the same conclusion over the weekend -- added an addendum to that effect

Grant Jenks's avatar

You could choose nearly any problem in cryptography. For example:

RSA-2048 factorization

Given

An RSA modulus:

N (hex) =

0x4e19c0635afbc311cd01e55fc162044a7592f1111f3063e8ca54550f3470b5476778f8d93ff6af5f485297417df19b9151ba40d4bc09ebcf668887584991007803c70e7d5bbfec4f547f8ebd047978dd3f8d21dc0b50c6b8003eceaf3f44e99508a36d6aaf1415ea5fe2652665087c9cdb1449af251307355f8d26a5809a6d2a8295fa8341408a0c076080559af507da6ee26752c0b7ff05ad2da63b293661f12238a65010399f52b2510626ba1f135e4a13e3fde2fd83b7ddf9a4c268633eb1a20de415a73c5425de4a0d4bbb0951207e3c8af11cf4dc99241e32f340c1a5c2d7847a325a034502f8ae1fd60d098d28e36f43fa509b066c2d6b7c2906af993b

Task

Find primes p, q such that p*q = N.

Verification (deterministic)

Check int(p,16) * int(q,16) == int(N,16).
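
A minimal sketch of that check in Python, assuming p, q, and N are all supplied as hex strings. The function name `verify_factorization` and the toy modulus in the usage lines are illustrative, not from the original comment. One addition beyond the check as stated: the sketch rejects the trivial answer p = 1, q = N, which would otherwise pass. If N really is a product of two primes, any factorization with both factors greater than 1 must be those primes, so no separate primality test is included here.

```python
def verify_factorization(p_hex: str, q_hex: str, n_hex: str) -> bool:
    """Return True if p and q are nontrivial factors whose product is N."""
    # int(..., 16) accepts hex strings with or without a leading "0x".
    p = int(p_hex, 16)
    q = int(q_hex, 16)
    n = int(n_hex, 16)

    # Reject trivial answers such as p = 1, q = N.
    if p <= 1 or q <= 1:
        return False

    # Python integers are arbitrary precision, so this is exact at 2048 bits.
    return p * q == n


# Toy modulus for illustration only: 0x23 = 35 = 5 * 7.
print(verify_factorization("5", "7", "0x23"))   # True
print(verify_factorization("1", "23", "0x23"))  # False
```

The verification itself is cheap, a single big-integer multiplication, which is what makes this kind of problem easy to grade even though it is hard to solve.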

I think it’s reasonable to believe an ASI could solve that where no current system could.

theahura's avatar

strong agree. And I do think that would be a useful upper bound. But I'm hoping we can come up with exams that fall further down the curve so that we can start generating useful insights and measurements _before_ we build a system that can crack RSA-2048!

Polici's avatar

oh that was cool.

larkejbglerhkbglearh's avatar

i think that the responses indicate that the context window is kind of a big confounder. probably the easiest class of problem to create is one where the AI's ability to solve it comes from the fact that its context window massively out-sizes human working memory. that would genuinely be a problem that's impossible for even the smartest human to solve due to their cognitive deficit relative to AI, but i don't think it really matches the spirit of superintelligence as we understand it. maybe it's just a question of trying to get the AI to come up with a problem that more closely matches the "spirit" of the benchmark?

theahura's avatar

> maybe it's just a question of trying to get the AI to come up with a problem that more closely matches the "spirit" of the benchmark?

Yea I think that's the issue with the prompt as written. I try to emphasize in the essay that the point is to capture questions that humans cannot solve even if they had a lot of time and a lot of resources, so it's not just a simple memory task