About Me
I'm a first-year Ph.D. student at Mila and the University of Montreal, advised by Professor Aaron Courville.
My life goal is to understand mind, consciousness, and intelligence, and to build so-called AGI. I believe the critical thing is to keep thinking, learning, and exploring, so I won't limit my research to any particular topic until I find an approach that I truly believe in.
Currently, I'm focused on reinforcement learning in general.
Previously, I worked on learning object-centric representations using structured generative models with Professor Sungjin Ahn at Rutgers University. I also gained experience with computer vision when I started my research career in Professor Xiaowei Zhou's group at Zhejiang University.
About Mind/Intelligence
What I believe:
The agent has to interact with the world, whether simulated or real. Any static dataset has limitations.
The agent should have memory, though it can take any form.
In terms of implementation, it has to be scalable in some way.
Thoughts:
Towards a non-reactive, thinking agent:
Its test-time performance should scale with the time and compute available.
Consider a Go agent. Suppose we parametrize the policy network with an RNN. The RNN accepts the current board configuration and is allowed to perform any number of steps of computation. Is there any way to train it such that, at test time, the agent's performance scales with the number of steps allowed?
We know one way to scale: use MCTS. But that is a highly systematic yet hard-coded procedure. Can the RNN learn this kind of systematic scaling on its own? (A rough sketch of such a recurrent policy is below.)
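To make the setup concrete, here is a minimal sketch of the kind of recurrent policy described above, written in PyTorch. The board is encoded once, the hidden state is updated for a variable number of "thinking" steps, and a move distribution is read out at the end. All names here (PonderPolicy, board_dim, etc.) are hypothetical; this only shows the architecture, not a training method that actually makes performance improve with more steps, which is the open question.

```python
import torch
import torch.nn as nn

class PonderPolicy(nn.Module):
    def __init__(self, board_dim: int, hidden_dim: int, num_moves: int):
        super().__init__()
        self.encoder = nn.Linear(board_dim, hidden_dim)  # encode the board once
        self.cell = nn.GRUCell(hidden_dim, hidden_dim)   # one step of "thinking"
        self.head = nn.Linear(hidden_dim, num_moves)     # read out a move distribution

    def forward(self, board: torch.Tensor, num_steps: int) -> torch.Tensor:
        x = torch.relu(self.encoder(board))
        h = torch.zeros(board.size(0), self.head.in_features, device=board.device)
        for _ in range(num_steps):  # more steps = more test-time compute
            h = self.cell(x, h)
        return torch.log_softmax(self.head(h), dim=-1)

# Usage: the same network can be queried with different compute budgets.
policy = PonderPolicy(board_dim=19 * 19, hidden_dim=256, num_moves=19 * 19 + 1)
board = torch.zeros(1, 19 * 19)
fast = policy(board, num_steps=1)
slow = policy(board, num_steps=32)  # can training make this reliably better than `fast`?
```

The point of the sketch is that nothing in the architecture prevents scaling with compute; what is missing is a training signal that rewards the extra steps, the way MCTS benefits from extra simulations by construction.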
About RL: I believe the true value of RL is that (1) it provides the right objective to optimize and (2) it provides a way to optimize that objective. Just like SGD, RL itself is not intelligence. Intelligence is the part that RL discovers through training, which is just a recurrent agent with some form of memory. At some point, the agent itself will become good enough to optimize its own objective, and then the "outer loop", the hard-coded RL algorithm, may no longer be needed.
Language: we think with language (and images) rather than with some hidden, internal representation, or at least that is how we perceive it. This is interesting because it seems very inefficient, so there must be a reason for it.
Adversarial learning might be important. Becoming better than yourself always seems to be something learnable.
If AlphaZero were trained against a very strong human player from the start, it would not improve. This is also how evolution works: all creatures start in simple forms and jointly evolve into today's complex ones. This might hint at a solution to the exploration problem in RL. (A minimal self-play sketch is below.)
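The structural point can be shown in a few lines: in self-play, the opponent is always a frozen copy of the current agent, so the opposition improves exactly as fast as the learner rather than being a fixed strong player. Everything below (the Agent class, the Elo-style toy game, the update rule) is a hypothetical stand-in, not AlphaZero's actual training procedure.

```python
import copy
import random

class Agent:
    def __init__(self):
        self.skill = 0.0  # stand-in for the agent's parameters

    def update(self, won: bool):
        self.skill += 0.1 if won else 0.0  # stand-in for an RL update on the game outcome

def play_game(a: Agent, b: Agent) -> bool:
    # Toy zero-sum game: the stronger agent wins more often (Elo-style probability).
    return random.random() < 1.0 / (1.0 + 10 ** (b.skill - a.skill))

agent = Agent()
for _ in range(100):
    opponent = copy.deepcopy(agent)  # freeze the current self as the opponent
    for _ in range(10):
        agent.update(won=play_game(agent, opponent))
```

Against a frozen copy of itself the agent wins about half the time and keeps receiving learning signal; against a vastly stronger fixed opponent it would almost never win and the signal would vanish, which is the point of the AlphaZero remark above.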
About consciousness:
The question is not "for what problem is consciousness the optimal solution" (though that is a useful question to consider). The questions I'm really interested in are "what is consciousness" and, after that, "how do we build it". These questions are interesting in themselves. I don't actually care whether consciousness is optimal in any sense, although I do believe it will turn out to be the optimal solution to some problems.
Publications
Improving Generative Imagination in Object-Centric World Models
Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Bofeng Fu, Jindong Jiang, Sungjin Ahn
ICML 2020