Fredgolm's Blog


Verification limits

Recently there were a few good articles on the Verifier Rule, which point out that some tasks are easier to verify than to solve, while for others it is the opposite. So AI, through RL post-training, is likely to solve most of the easily verifiable tasks soon, leaving the tasks where the asymmetry works against the verifier to be automated as a long-tail, long-term research effort.

RL here is just a method of getting the verification signal back into the model weights. We ask the model to try solving a task many times, and when the verifier is happy we update the weights according to the reward signal.
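To make that loop concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the "model" is just ten logits over the digits 0–9, the task is a made-up modular equation, and the update is a plain REINFORCE-style step. Real post-training works on far larger models with more elaborate algorithms, but the shape is the same: try, verify, update.

```python
import math
import random

CANDIDATES = list(range(10))  # the toy "model" answers with a single digit


def verifier(x):
    """Cheap check (hypothetical task): does x satisfy (3x + 4) mod 10 == 5?"""
    return 1.0 if (3 * x + 4) % 10 == 5 else 0.0


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)  # the "weights" of the toy model
    for _ in range(steps):
        probs = softmax(logits)
        # The model tries the task by sampling an answer from its current policy.
        attempt = rng.choices(CANDIDATES, weights=probs)[0]
        # The verifier scores the attempt.
        reward = verifier(attempt)
        # REINFORCE-style update: gradient of log p(attempt) for each logit,
        # scaled by the reward. A zero reward leaves the weights unchanged.
        for i in range(len(logits)):
            grad = (1.0 if i == attempt else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


final_probs = train()
best = max(CANDIDATES, key=lambda x: final_probs[x])
print(best, round(final_probs[best], 3))
```

The model never sees how to solve the equation. It only ever learns from the verifier saying yes or no, which is why a cheap and reliable verifier matters so much.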

Tasks that are easier to verify than solve

Sudoku and Logic Puzzles: As mentioned in the article, solving a Sudoku puzzle requires navigating a large tree of possibilities and constraints. However, once the grid is filled, verifying the solution takes mere seconds: you simply check that every row, column, and box contains each digit 1–9 exactly once.
Software Engineering: Writing the code to build a complex platform (like Instagram) takes teams of engineers years of development. In contrast, a first-pass check that the "solution" works can be done by a layperson in seconds, simply by opening the app and seeing if the feed loads.
Cryptographic Hashing (Password Cracking): In computer security, finding a password that matches a specific "hash" is computationally expensive (often infeasible, since brute force is the only general approach). However, if someone provides a candidate password, the system can verify it almost instantly by running the hash function once.
Math Competition Problems: Solving a complex geometry or algebra problem might take hours of creative thinking and derivation. However, if you are given a proposed final answer (and potentially the steps), plugging the numbers back into the original equation to see if they hold true is often much faster.
Lock Picking vs. Key Usage: Physically, "solving" a lock without a key is a difficult skill requiring time and manipulation. "Verifying" the solution (using the correct key) is instantaneous—the lock either turns or it doesn't.
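Two of the examples above, Sudoku and password hashing, have verifiers short enough to write out. This is a minimal sketch; the function names are made up for illustration, and the password check uses a bare SHA-256 only to keep the example small (real systems use salted, deliberately slow password hashes).

```python
import hashlib
import hmac

DIGITS = set(range(1, 10))


def is_valid_sudoku_solution(grid):
    """Return True if a filled 9x9 grid has 1-9 exactly once per row, column, and box."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == DIGITS for group in rows + cols + boxes)


def verify_password(candidate, stored_hash_hex):
    """Hash the candidate once and compare it with the stored digest."""
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_hash_hex)


# A valid filled grid built from a shifted-row pattern, plus a broken copy.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]  # row stays valid, two columns do not

stored = hashlib.sha256("correct horse".encode("utf-8")).hexdigest()

print(is_valid_sudoku_solution(solved), is_valid_sudoku_solution(broken))  # True False
print(verify_password("correct horse", stored), verify_password("guess", stored))  # True False
```

Both checks are a handful of lines and run in well under a millisecond, while the matching solvers (a Sudoku search, a brute-force password cracker) are much more expensive. A complete Sudoku verifier would also confirm that the filled grid agrees with the puzzle's given clues.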

Tasks that are easier to solve than verify

Generative Text / Essays (Brandolini’s Law): It is very fast to write a convincing-sounding essay or blog post filled with statistics. It takes an order of magnitude more time for a human to verify that every fact, citation, and figure in that essay is actually correct.
Scientific Hypotheses: It is incredibly easy to propose a new hypothesis (e.g., "Eating only blueberries improves memory"). It takes years of rigorous clinical trials, control groups, and data analysis to verify whether that hypothesis is scientifically true.
Code Security: A developer can write a "solution" to a coding problem in minutes that compiles and runs. However, verifying that the code is secure and free of vulnerabilities (such as memory-safety bugs or unhandled edge cases) is much harder, and proving it completely is often impractical.
Legal Accusations: In a courtroom setting, it is often easier to invent a narrative or "theory of the crime" (the solution) than it is to verify it through the collection of forensic evidence, witness testimony, and cross-examination.
Predictions: It is easy to generate a prediction (e.g., "This stock will double in value by next year"). Verifying this solution is impossible in the present; it requires waiting for the passage of time to see if the prediction materializes.

This has an interesting implication for which industries are going to be automated by AI. If an industry's product verification cycle is long and requires manual labour, then that industry will be automated only once its verification asymmetry is either reduced or simulated well.

Industries to automate much later

Pharmaceuticals (Drug Discovery): A chemist can design a new molecular compound (the solution) in a day. However, verifying that the drug is safe and effective for humans requires a decade of clinical trials costing billions of dollars. That said, there are labs like Isomorphic Labs that aim to speed up this process.
Civil Engineering (Infrastructure): Pouring a concrete bridge deck is a straightforward solution. Verifying the structural integrity and long-term fatigue resistance of that bridge over a 50-year lifespan is a massive undertaking involving sensors and periodic inspections.
Aerospace (Component Manufacturing): Manufacturing a single turbine blade for a jet engine is automated and fast. Verifying that the blade has zero microscopic fissures—which could cause a catastrophic failure mid-flight—requires expensive X-ray and ultrasonic testing.
Environmental Policy (Carbon Offsets): A company can easily claim to be "carbon neutral" by purchasing offsets. Verifying that those trees were actually planted, are still alive, and wouldn't have been planted anyway (additionality) is an ongoing global monitoring challenge.
Deep-Sea and Space Exploration: Proposing a mission or a landing site is a matter of calculation. Verifying the actual conditions of that environment (e.g., checking for life under the ice of Europa) is far more difficult than drawing up the theoretical plan.
Food Safety (Supply Chain): A factory can produce thousands of jars of peanut butter daily. Verifying that every single jar is free of Salmonella or heavy metals requires complex sampling and lab work that lags far behind production speed.
Academic Peer Review: A researcher can write a paper in a few months. Verifying that the data isn't fraudulent and that the experiments are replicable often takes the scientific community years of follow-up study, especially in modern physics.

So it seems like we need some foundational work in these industries to enable faster progress there.

How can we enable easier/quicker verification for these industries?

December 2025


AI x Robotics in Austin, TX

According to Statista, the current robotics market is $50B and is expected to grow to $220B by 2030 (source). That is more than 4x growth in 5 years, not too bad. To check the reality, I visited the AI x Robotics Symposium in Austin, Texas, which was kindly organized by UT Austin and Austin Robotics. A few highlights from the symposium:

The Austin Robotics lab is fascinating, with 20+ different robots that they work with spread across a huge area. The workshop is very impressive too, with modern FDM and resin 3D printers and high-power laser cutters.
Academia is quite pessimistic about the robotics market over the next 3-5 years. They say there are a lot of missing bits and pieces in robotics, and they do not expect quick adoption because of these significant gaps in the technology. Keynote speakers insisted on slow uptake in the robotics market despite their deep involvement with robotics startups.
There were quite a few entrepreneurs at the symposium, including those who have been building humanoid robots. Some demonstrated their humanoids on stage; the robots still have quite limited capabilities.
The humanoid form factor is quite questionable. On the one hand, everything in our households is built for human hands and human height; on the other, hands are quite hard to program and control.

Roboticists are consistent in their opinion on slow adoption. Here are the key gaps that prevent quick robot adoption:

Lack of robust hand-inspired manipulators. We will know this is solved when we see a juggling robot that uses human-inspired fingers to throw and catch items.
Slow deployment rate for any robotic application. We will know this is solved when we see robots freely used among humans.
Robotic motors are still quite expensive, even though some robotics startups claim costs have come down in recent years.
Lack of a coherent safety framework for humanoid and domestic robots.
Lack of spatial awareness and spatial reasoning in modern LLMs, which we assume will be used as robot brains.

So despite the hype around robots, it seems like we are going to see slow adoption over the upcoming 5-10 years, especially in our households.

March 2025
