I'm a fourth-year undergraduate student at the College of Computer Science, Zhejiang University in Hangzhou, China. I'm current a full-time research intern of the Rutgers Machine Learning Group, working with Professor Sungjin Ahn. Previously I was in the 3D Vision Group led by Professor Xiaowei Zhou at Zhejiang University. I am broadly interested in artificial intelligence, in particular Probabilistic Generative Models (VAEs), Reinforcement Learning and Computer Vision. My recent projects focus on (unsupervised) (object-centric) (scene|video) representation and generation with structured generative models. My ultimate goal is to build a general intelligence system that is truly capable of understanding the world, thinking, and acting. I’m applying for PhD/research MS in 2020.
Important Note: my ICLR paper "Spatially Parallel Attention and Component Extraction for Scene Decomposition" has been renamed to "SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition".
Some of research and course projects/code.
What I am 100% sure (well, I won't list it here if I'm not 100% sure):
- Must have some kind of "awareness" or "consciousness". This is my fundamental belief of a "mind". Unfortunately, I don't know what should be the definition of "awareness" and "consciousness".
- Interestingly, the "consciousness" seems only to be a small part of our mind. Think how limited is your short term memory, compared to the huge number of neurons in your brain. It is extremely low-capacity, but it is essential.
- It has to be sequential (stateful, has a memory, etc.)
What might be true:
- In terms of implementation, it has to be scalable in some way (modular, or built with simple units, like neurons)
- It is computational
- The definition involves "goals"
- The definition involves "meta"
- The definition must involve multiple agents
- The definition will depend on the environment and tasks
What is not necessarily true (for breaking the norms):
- Explicitly studying causal inference is important
- The definition must involve interaction with the environment (we can think when we are lying on the bed)
What is definitely not true:
- A single feedforward network is sufficient (even if we have some super optimization algorithm)
- It can be learned with purely i.i.d. samples
- We think using languages and images instead of some hidden, internal representations. At least that's what we perceive.