In this lecture, Dr. Legg sketches out how a general AI could be built. It starts with Bayesian statistics: construct a "prior", a set of hypotheses about how the world might work, and assign each a probability. When new evidence comes in, these probabilities are updated, and you get a revised model of how the world works. The mathematics is a simple division problem if you're talking about a world with a finite number of possibilities, but alas the real world is continuous, so the formula fills up with integrals instead. Still, computers can generally handle it.
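The "simple division problem" in the finite case is just Bayes' rule. Here's a minimal sketch; the coin hypotheses, prior, and likelihoods are invented for illustration, not from the lecture:

```python
# Discrete Bayesian update: posterior P(h | evidence) over a finite set
# of hypotheses. The hypotheses and numbers below are made up.

def bayes_update(prior, likelihood):
    """prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(evidence | h)
    Returns the posterior dict P(h | evidence)."""
    # Numerator of Bayes' rule for each hypothesis
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # The "simple division": divide by the total probability of the evidence
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

prior = {"fair": 0.5, "biased": 0.5}
likelihood_heads = {"fair": 0.5, "biased": 0.9}  # P(heads | h)
posterior = bayes_update(prior, likelihood_heads)
# After seeing heads, the biased-coin hypothesis gains probability:
# posterior["biased"] = 0.45 / 0.70 ≈ 0.643
```

In the continuous case, the sum in the denominator becomes an integral over all hypotheses, which is where the computation gets harder.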
The catch here is finding the prior probability distribution. You need to start out with an "open mind", because there's always the possibility that none of your initial hypotheses will fit the data. An intelligent machine can't keep track of the infinitely many hypotheses available, but it can keep track of the most likely ones while leaving open the possibility that none of them work, in which case it would have to generate new ones.
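One way to picture this bookkeeping: keep only the top few hypotheses and fold everything else, plus a fixed reserve for "something I haven't thought of", into a catch-all bucket. This is my own hedged sketch, not a mechanism described in the lecture; the cutoff and reserve values are arbitrary:

```python
# Keep the most probable hypotheses; merge the tail plus a fixed reserve
# into an "unknown" catch-all, then renormalize. Purely illustrative.

def prune(beliefs, keep=3, reserve=0.05):
    ranked = sorted(beliefs.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:keep])
    # Mass for discarded hypotheses plus the standing "open mind" reserve
    unknown = sum(p for _, p in ranked[keep:]) + reserve
    total = sum(kept.values()) + unknown
    result = {h: p / total for h, p in kept.items()}
    result["unknown"] = unknown / total
    return result

beliefs = {"h1": 0.5, "h2": 0.3, "h3": 0.15, "h4": 0.04, "h5": 0.01}
pruned = prune(beliefs)  # h1, h2, h3 survive; h4, h5 merge into "unknown"
```

If the "unknown" bucket's share ever grows large relative to the kept hypotheses, that's the signal to go generate new ones.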
But the machine still needs to calculate prior probabilities, which seems absurd. How can you make any assumptions BEFORE you see any data? The answer given in the lecture uses a principle called "Occam's Razor": the simpler an explanation is, the more likely it is to be correct. That is why, when we see the sequence 1, 2, 3, 4, 5..., we usually assume the next number will be 6, even though that's not always going to be the case. Mathematically, this notion of simplicity is captured by "Kolmogorov complexity": the length of the shortest program that produces the data. It's been proven impossible to write a program that calculates this, but in my opinion that is not going to stop this approach from eventually working.
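Although Kolmogorov complexity is uncomputable, a common computable stand-in is compressed length: regular, "simple" data compresses well, random data doesn't. This is just an illustrative proxy of my own choosing, not the measure used in the lecture:

```python
# Compressed length as a rough, computable proxy for Kolmogorov complexity.
# A patterned string has a short description, so it compresses far better
# than a random string of the same length and alphabet.
import random
import zlib

def compressed_len(s: str) -> int:
    return len(zlib.compress(s.encode()))

regular = "123456789" * 10  # 90 characters with an obvious pattern
random.seed(0)
irregular = "".join(random.choice("123456789") for _ in range(90))

# Occam's razor, operationalized: the patterned sequence is "simpler"
assert compressed_len(regular) < compressed_len(irregular)
```

The gap between the two compressed sizes is exactly the kind of simplicity difference an Occam-style prior would reward.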
In my next post, I'm going to explain why I think this is the case.