Zhixuan Lin

Github CV Email

zxlin.cs [AT] gmail.com

About Me

I'm a final-year undergraduate student at the College of Computer Science, Zhejiang University in Hangzhou, China. I'm currently a full-time research intern at the Rutgers Machine Learning Group, working with Professor Sungjin Ahn. Previously I was in the 3D Vision Group led by Professor Xiaowei Zhou at Zhejiang University.

My life goal is to understand mind/consciousness and intelligence/AGI, and to build them. I believe the important thing is to keep learning and exploring, so I won't limit my research to any particular topic until I have found an approach that I truly believe in (I believe it will also be extremely interesting and worth devoting my life to). Nevertheless, I have recently found the following topics interesting, so these are my current research interests:

  • Reinforcement learning
  • (Temporal) representation learning
  • (Temporal) generative models

My recent projects focus on (unsupervised) (object-centric) (scene|video) representation and generation with structured generative models.


Important Note: my ICLR paper "Spatially Parallel Attention and Component Extraction for Scene Decomposition" has been renamed to "SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition".

SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

ICLR 2020

[Project] [Paper]

GIFT: Learning Transformation Invariant Dense Visual Descriptors via Group CNNs

Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou

NeurIPS 2019

[Project] [Code] [Paper]


Some of my research and course projects/code.


A bunch of structured generative models for scene understanding, reimplemented while working on the SPACE project.

Code: AIR, SQAIR, IODINE. GENESIS coming soon


A mini database management system in C++.

Code: MiniSQL

C- Compiler

A compiler for C-, a subset of the C language

Code: C-minus Compiler

About Mind/Intelligence

What I am 100% sure of:

  • It has to be sequential (stateful, has a memory, etc.)
  • It is computational.
  • It can evolve without external inputs.
    • Human thinking process doesn't rely on external inputs too much. External inputs are useful only in terms of updating beliefs (current world state, world models), in a very indirect way. That is, the updating process is not implemented as a feed forward network. Memory is involved in conditioning this. How external inputs affect our beliefs is highly conditioned on the current conscious state, and may be strongly affected by only a small number of bits in memory.
  • Consciousness is crucial.
    • I believe the answers to two questions are the answer to intelligence: 1) why do we have this subjective experience (what is the advantage of being conscious?), and 2) how is this subjective experience produced from purely physical processes?
    • I believe psychologists have been looking for these advantages of consciousness for a long time.
    • Interestingly, "consciousness" seems to be only a small part of our mind. Think how limited your short-term memory is, compared to the huge number of neurons in your brain. It is extremely low-capacity, but it is essential.
    • Some (weak) justifications:
      • We are weak in terms of computational ability. But we are self-conscious. There is no reason why a higher life form will not be self-conscious.
      • Cleverer animals tend to exhibit more consciousness.
      • An agent can be stupid while still being an AGI. Consider a human baby.
      • Evolution brought us here, so consciousness must be useful. And over the years we have been getting more and more conscious. If it were unnecessary, evolution would have taken it away.

What might be true:

  • Temporal state abstraction is crucial. In many cases, reaching a subgoal means reaching a "state" that has a high value (e.g., doing well in an exam, making money). However, these states are not physical world states, but rather abstract states. This is so common that I believe it is very important.
  • Maintaining a world model might be crucial. However, learning and state update of the world model must be strongly decoupled from the raw sensory inputs. We cannot build world models in pixel space. We turn our heads all the time. What we perceive at the moment only makes a small update to the world model that we maintain, and in a very complicated way.
  • Might be defined in a simple mathematical sense, like a Turing machine. Actually, as hinted by Yoshua Bengio, attention plays an important part in consciousness.
  • In terms of implementation, it has to be scalable in some way (modular, or built with simple units, like neurons)
  • The definition involves "goals". This is the perspective taken by Rich Sutton.
  • The definition involves "meta".
  • The definition will depend on a definition of an external world, like in reinforcement learning problems.
  • Adversarial learning might be important (some kind of evolution)

What is not necessarily true (for breaking the norms):

  • RL is the way (it has nothing to do with "thinking". We might need some new paradigm)
  • Causal learning is important (instead, it should be something that naturally emerges if we have chosen the right paradigm)

What is definitely not true:

  • A single feedforward network is sufficient (even if we have some super optimization algorithm)
  • It can be learned from purely i.i.d. samples.


  • We think using languages and images instead of some hidden, internal representations. At least that's what we perceive. There must be some reasons.

Other interesting questions:

  • Insects don't have a mind. We do. Some animals do. What's the fundamental criterion?


  • We think in such an abstract and flexible way that it is impossible that the process of thinking is hard-coded in the structure of our brain (or in the weights of neural networks). By flexible I mean that even if the knowledge, or the content of our (short-term) memory, changes a little bit, what we will be thinking will be completely different. And our short-term memory doesn't just change a little bit, it changes rapidly and drastically. This means that the thinking process cannot be implemented as a simple feed-forward network. The behavior must be conditioned, in a very abstract way, on the contents/knowledge of the rapidly changing memory (rather than on weights, which are hard and slow to change and are more like "function mappings" or "connections" than knowledge or states). However, the mechanism by which the thinking process is conditioned on memory must also be simple and scalable, because it is a basic building block of the thinking process. Combined with the facts that 1) mind is sequential, 2) memory is huge (so attention is required for useful read and write operations), and 3) the conditioning mechanism must be flexible and simple at the same time, there are reasons to believe the thinking process acts like a Turing machine, or a modern computer CPU, where a controller acts on an external memory using attention.
    • A feed-forward network won't work. What about RNNs? Yes, their internal states can change rapidly and drastically. But there is one problem: there is no attention, and memory can be huge. An LSTM with (some kind of) attention and read-write operations could be a prototype of a mind. But of course, there are missing things.
    • Learning where to write, where to read, and what to write (the transitions) is no different from how we interact with the world. It has to be learned. So RL may help here.
    • Let's think about this. If RL is the way that animals develop their reflexive behavior, then we can imagine how mind was developed by evolution:
      • Single cell animals
      • Animals that learn reflexive behaviors by RL
      • Animals that have a simple, small memory
      • This small memory gets larger, and attention appears in aiding effective reading and writing
      • Animals learn advanced ways to read and write memory by RL
      • Then consciousness appears?
  • Having a template of world models to facilitate knowledge transfer. When we are in a new scene, we build new models, quickly. Rules (e.g., state transitions) can be very different in different scenes. However, the world model might share similar structures. So if we learn this underlying structure, we can do fast adaptation. A crazy idea is trying to model the distribution of all world models, and trying to find the underlying variation factors.
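
The "controller acts on an external memory using attention" picture from the notes above can be made a little more concrete. What follows is only a minimal sketch in plain Python, assuming content-based (dot-product) addressing and a soft blend for writing; the function names `attend`, `read`, and `write` and the `sharpness` parameter are hypothetical choices made for illustration, and nothing is learned here. In a real system the key and the written value would be produced by a learned controller.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(memory, key, sharpness=1.0):
    # Content-based addressing (an assumption of this sketch):
    # slots whose content matches the key get more weight.
    scores = [sharpness * sum(m * k for m, k in zip(slot, key)) for slot in memory]
    return softmax(scores)

def read(memory, weights):
    # Soft read: attention-weighted average of all memory slots.
    width = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(width)]

def write(memory, weights, value):
    # Soft write: blend `value` into each slot in proportion to its weight.
    # Returns a new memory; the input memory is left unchanged.
    return [[(1.0 - w) * s + w * v for s, v in zip(slot, value)]
            for w, slot in zip(weights, memory)]

# A tiny memory with three slots of width two.
memory = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

# The "controller" here is just a fixed key; a learned one would emit it.
weights = attend(memory, key=[1.0, 0.0], sharpness=10.0)
recalled = read(memory, weights)
updated = write(memory, weights, value=[0.5, 0.5])
```

With a sharp key, almost all of the attention lands on the first slot, so the read returns roughly that slot's content and the write changes that slot while leaving the others nearly untouched. Behavior is therefore conditioned on a few memory entries rather than on slowly changing weights, which is the property the notes argue for.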