About Me
I'm a first-year MSc student at Mila and the University of Montreal, advised by Professor Aaron Courville.
My life goal is to understand mind/consciousness/intelligence and to build the so-called AGI. I believe the important thing is to keep thinking, learning, and exploring, so I won't limit my research to any particular topic until I find an approach that I truly believe in.
Currently, I'm focused on reinforcement learning in general; I don't have a very specific interest yet.
Previously, I worked on learning object-centric representations using structured generative models with Professor Sungjin Ahn. I also have experience in computer vision from when I started my research career in Professor Xiaowei Zhou's group.
About Mind/Intelligence
What I believe:
The agent has to interact with a world, either simulated or real. So you cannot produce AGI from any static dataset.
I think a good choice is to have the agent interact with the internet; then we don't need to worry about boring things like controlling a robot arm.
The formulation should be world-agnostic.
The agent should have a memory, though it can be in any form.
The agent should be self-improving, meaning that the mechanism used to update the agent is part of the agent (a toy sketch follows this list).
In terms of implementation, it has to be scalable in some way.
Consciousness is crucial. It may be the only thing that matters. I don't know.
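To make the self-improvement point a bit more concrete, here is a toy sketch (the names and the Hebbian-style update are purely my own illustration, not a proposal): the parameters that define the agent's behaviour also define how the agent rewrites part of itself, so the update mechanism lives inside the agent rather than in an external optimizer.

```python
import numpy as np

def self_improving_step(slow_params, fast_weights, obs):
    """Toy illustration: the rule that updates the agent (a Hebbian-style
    update of fast_weights) is itself parameterized by the agent's own
    slow_params, so the learning mechanism is part of the agent."""
    W, learn_rate_logit = slow_params
    hidden = np.tanh(W @ obs + fast_weights @ obs)              # behaviour uses both sets of weights
    lr = 1.0 / (1.0 + np.exp(-learn_rate_logit))                # agent-owned learning rate
    fast_weights = fast_weights + lr * np.outer(hidden, obs)    # the agent updates itself
    return hidden, fast_weights

obs_dim, hidden_dim = 8, 16
slow = (0.1 * np.random.randn(hidden_dim, obs_dim), 0.0)
fast = np.zeros((hidden_dim, obs_dim))
for _ in range(10):
    _, fast = self_improving_step(slow, fast, np.random.randn(obs_dim))
```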
Thoughts:
Towards a non-reactive, thinking agent:
Its performance should scale with the time and computing resources available.
The way it scales must not be hard-coded, like SGD or some predefined search routine.
Consider a Go agent. Suppose we parametrize the policy network with an RNN. The RNN accepts the current board configuration and is allowed to perform any number of steps of computation. Is there any way to train it such that, at test time, the performance of the agent scales with the number of steps allowed?
We know that there is a way to scale: use MCTS. This is a highly systematic procedure, but unfortunately it is hard-coded. Can the RNN learn this kind of systematic scaling?
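A minimal sketch of the setup I have in mind, assuming a PyTorch-style recurrent cell (the module and argument names here are made up for illustration): the policy is unrolled for an arbitrary number of internal steps before emitting a move distribution, and the open question is what training signal would make more steps reliably mean better moves.

```python
import torch
import torch.nn as nn

class PonderingGoPolicy(nn.Module):
    """An RNN policy that 'thinks' for a variable number of internal steps
    before outputting a move distribution (hypothetical sketch)."""

    def __init__(self, board_dim=19 * 19, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Linear(board_dim, hidden_dim)   # board -> initial features
        self.cell = nn.GRUCell(hidden_dim, hidden_dim)    # one step of "thinking"
        self.head = nn.Linear(hidden_dim, board_dim + 1)  # move logits (+ pass)

    def forward(self, board, num_ponder_steps):
        # board: (batch, board_dim) flattened board configuration
        x = torch.relu(self.encoder(board))
        h = torch.zeros_like(x)
        for _ in range(num_ponder_steps):                 # test-time compute budget
            h = self.cell(x, h)
        return self.head(h)

policy = PonderingGoPolicy()
board = torch.zeros(1, 19 * 19)
fast_logits = policy(board, num_ponder_steps=2)    # small budget
slow_logits = policy(board, num_ponder_steps=32)   # should this be better? how do we train for it?
```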
About RL:
The "reward is part of the observation" assumption is weird. We don't receive rewards. Our observations are just $o_t$ . There is no $r_t$. Someone might say that reward should be internal. But, the point is, we don't need rewards to learn.
We can learn just by reading text and watching videos. Think about which part of an RL agent implements this kind of learning: it is the state update function $s_{t+1} = f(o_t, s_t)$, which is totally non-essential in the current RL framework, is often implemented with some kind of RNN, and exists basically just to handle partial observability. Yet learning by reading and watching is probably the most important form of learning for us.
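A minimal sketch of this point, assuming a PyTorch GRU cell as the state update (the names and the next-observation prediction objective are just one possible instantiation): all the learning here is driven by the observation stream alone, with no reward anywhere.

```python
import torch
import torch.nn as nn

class ObservationLearner(nn.Module):
    """All 'learning by reading/watching' lives in the state update
    s_{t+1} = f(o_t, s_t); no reward appears anywhere (hypothetical sketch)."""

    def __init__(self, obs_dim=128, state_dim=256):
        super().__init__()
        self.f = nn.GRUCell(obs_dim, state_dim)               # the state update function
        self.predict_next_obs = nn.Linear(state_dim, obs_dim)

    def forward(self, observations):
        # observations: (T, batch, obs_dim), e.g. a stream of text/video features
        s = torch.zeros(observations.size(1), self.f.hidden_size)
        loss = 0.0
        for t in range(observations.size(0) - 1):
            s = self.f(observations[t], s)                     # s_{t+1} = f(o_t, s_t)
            pred = self.predict_next_obs(s)                    # learn by predicting what comes next
            loss = loss + ((pred - observations[t + 1]) ** 2).mean()
        return loss

learner = ObservationLearner()
stream = torch.randn(10, 4, 128)    # ten steps of observations for a batch of four
learner(stream).backward()          # gradients come from observations alone, no reward
```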
The online, continual learning aspects of RL rely on the fact that agents can receive rewards online.
To me, the meta-RL setting (without rewards as input) makes more sense. There could be an evaluation function that quantifies the performance of the agent and is used to update the agent in the outer loop, but the agent should not be aware of it. Note that in this case RL algorithms can still be applied in the outer loop, but they should not be considered part of the agent's learning process. Only what happens in the agent's state update function should count as "learning".
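Here is a toy, self-contained sketch of this separation (everything here, ToyEnv, the linear agent, the random-search outer loop, is made up purely for illustration): the inner loop only pushes observations through the agent's state update, while the evaluation function is used exclusively in the outer loop and is invisible to the agent.

```python
import numpy as np

OBS_DIM, STATE_DIM, ACT_DIM, EPISODE_LEN = 8, 16, 2, 50

class ToyEnv:
    """A stand-in, reward-free environment: the agent only ever gets observations."""
    def reset(self):
        self.x = np.zeros(OBS_DIM)
        return self.x
    def step(self, action):
        self.x = np.tanh(self.x + 0.1 * np.random.randn(OBS_DIM))  # toy dynamics (action ignored)
        return self.x                                               # note: no reward in the interface

def agent_step(params, obs, state):
    """Everything the agent 'learns' happens inside s_{t+1} = f(o_t, s_t)."""
    W_f, W_a = params
    new_state = np.tanh(W_f @ np.concatenate([obs, state]))
    return new_state, W_a @ new_state                               # next state and an action

def rollout(params, env):
    state, obs = np.zeros(STATE_DIM), env.reset()
    for _ in range(EPISODE_LEN):
        state, action = agent_step(params, obs, state)
        obs = env.step(action)
    return state

def evaluate(params, env):
    """Outer-loop score the agent is never aware of (placeholder objective)."""
    return -np.linalg.norm(rollout(params, env))

def outer_loop(generations=20):
    params = [0.1 * np.random.randn(STATE_DIM, OBS_DIM + STATE_DIM),
              0.1 * np.random.randn(ACT_DIM, STATE_DIM)]
    env = ToyEnv()
    for _ in range(generations):
        candidate = [p + 0.01 * np.random.randn(*p.shape) for p in params]
        if evaluate(candidate, env) >= evaluate(params, env):       # hidden from the agent
            params = candidate
    return params

params = outer_loop()
```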
Where do the reward signals come from, after all?
However, I'm not against the "reward is enough" argument. Actually, I pretty much agree with that.
As Rich Sutton points out, mathematics is not world knowledge and thus not a kind of predictive knowledge, so I assume RL cannot deal with it. But the ability to understand math is essential.
Can you imagine an RL agent that, just by interacting with the world (assuming the right reward is somehow available) and without access to all the math textbooks we have, develops the same level of understanding of modern mathematics as we did? It could probably learn Go that way, but not math.
This is more of a personal thing: I cannot imagine what kind of role consciousness would play in an RL agent.
Languages:
Yann LeCun says language is just an inefficient way to express your thoughts, but language may not be such a trivial thing. It can be used to represent anything. It is interesting that something so primitive can be so useful. Most thought processes are basically like speaking to yourself.
We think using language (and images) instead of some hidden, internal representation, or at least that's what we perceive. This is interesting because it is very inefficient; there must be a reason for it.
How are you supposed to convey the amount of information in an academic paper to an agent without language?
Internal/subjective experience:
If you consider the part of our brain that implements consciousness as the agent, then you will see that it is not interacting with the world directly. It is interacting with the rest of our brain:
Observations: raw observations are processed, parsed, and integrated before consciousness gets access to them. Observations also include past memories, chemicals released by the brain that induce emotions, and so on.
Actions: we can move our body, but we also get to query our memory and write to our memory. In that sense, the memory in our brain is external.
Adversarial learning might be important. Becoming better than yourself always seems to be something learnable.
Think about it: if AlphaZero were trained against a very strong human player, it would not improve. This is kind of similar to the exploration problem in RL. The balance between the discriminator and the generator in GANs is also similar. This is also how evolution works: all creatures started in a simple form and jointly evolved into today's complex forms (co-evolution). This might hint at a solution to the exploration problem in RL.
About consciousness:
First, we need to admit that the problem is not well formulated. But to be clear, we also need to understand that the problem is not to answer "for what problem is consciousness the optimal solution" (though that is a useful question to consider). The problem I'm really interested in is "what consciousness is", and then "how to build it". This problem is interesting in itself. I don't actually care whether consciousness is optimal in any sense, although I do believe it will be the optimal solution to some problems.
Currently, even though we don't know what objective we are trying to optimize, we have some clues about what the solution will look like: it is multi-step and sequential, with memory, with attention, and so on. More ideas and experiments are needed.
If you think about it, the agent is not part of the environment. For example, in robotics, everything about the robot (its position, etc.) is considered part of the environment/external world. This is very unnatural. I'm not saying that position should not be considered external world state; I'm just saying that maybe we should reconsider this.
Publications
Improving Generative Imagination in Object-Centric World Models
Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Bofeng Fu, Jindong Jiang, Sungjin Ahn
ICML 2020