About Me
I'm a third-year Ph.D. student at Mila and the University of Montreal, advised by Professor Aaron Courville.
My life goal is to understand and build general intelligence, and my research interests reflect my overall beliefs about intelligence. Currently, I'm interested in long-context sequence models (particularly linear complexity models) and their potential applications in RL.
Previously, I worked on learning object-centric representations using structured generative models with Professor Sungjin Ahn at Rutgers University. I also worked on computer vision when I started my research career in Professor Xiaowei Zhou's group at Zhejiang University.
About Intelligence
What I believe:
Eventually, an agent needs online data; in other words, it needs to generate data for itself. This could mean interacting with the real world or running an internal thinking process. Static datasets are very useful, but they have limitations.
I think of an agent as a sequence model. I believe the most important aspect of intelligence relates to the temporal, sequential nature of the world (memory, the stream of experience, credit assignment, ...). In terms of agent architecture, the most essential component is the mechanism that handles temporal dependencies.
Thoughts:
About RL: The real value of RL might be that it can specify the right objective for an agent to optimize. How that objective is optimized is less important (e.g., via hand-designed RL algorithms or purely implicit in-context learning).
About language: A lot of the time, we think and reason in natural language rather than in some hidden, internal representation. At least, that is how we perceive it. This is interesting because it seems very inefficient.
Human brains are limited in many ways. For example, most (if not all) people have geometric intuition only for 3D space. Can neural networks develop geometric intuition for high-dimensional vector spaces (or even more abstract spaces)? More generally, what fundamental abilities (the most interesting being mathematical intuition) can a neural network develop that humans cannot?