I'm a final-year undergraduate student at the College of Computer Science, Zhejiang University in Hangzhou, China.
My life goal is to understand the mind/consciousness and intelligence/AGI, and to build them. I believe the most important thing is to keep learning and exploring, so I won't limit my research to particular topics until I have found an approach that I truly believe in. Nevertheless, I have recently found the following topics interesting, so these are my current research interests:
(Temporal) representation learning
(Temporal) generative models
My recent interest (well, not really a research interest) is to figure out why Professor Rich Sutton believes that the reinforcement learning problem is exactly the problem of AI. This seems very unnatural to me, yet many of the talks and ideas he has shared make a lot of sense to me. So I guess there is something I am not seeing. Let me try to figure it out.
One thing he has said (for example, this) is that if we can learn a model of the world and plan with it, then we will be close to AGI. I really don't understand why... I can see why AlphaZero is successful: planning through MCTS drastically increases the amount of policy improvement, and the improved policy and value can in turn accelerate MCTS planning. But that relies on several things:
Full observability and known dynamics. I'm not saying that the board is a good representation, but that doesn't matter; what matters is observability.
The states are discrete, and each state has only finitely many successor states, which makes MCTS feasible.
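To make the two conditions above concrete, here is a minimal sketch of AlphaZero-style MCTS with the PUCT selection rule. It is a toy under loud assumptions: the game (a counter that moves by 1 or 2 and wins by landing exactly on 5) and every name in it (`Node`, `search`, `step`, etc.) are hypothetical, and a uniform prior plus a leaf value of 0 stand in for AlphaZero's policy and value networks. Note that `step` is an exact, known model of the dynamics, and `select_action` takes a `max` over a finite set of children; both are the assumptions listed above.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy game: a counter below 5; each move adds 1 or 2.
# Landing exactly on 5 wins (reward 1); overshooting to 6 loses (reward 0).
ACTIONS = (1, 2)
TARGET = 5


def step(state: int, action: int) -> int:
    # Known, exact dynamics: the planner can simulate any move perfectly.
    return state + action


def is_terminal(state: int) -> bool:
    return state >= TARGET


def reward(state: int) -> float:
    return 1.0 if state == TARGET else 0.0


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self) -> float:
        # Mean value of the simulations that passed through this node.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Q + U: U is large for actions with a high prior and few visits.
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + u


def select_action(node: Node) -> int:
    # Only possible because the set of children is finite and enumerable.
    return max(node.children, key=lambda a: puct_score(node, node.children[a]))


def simulate(node: Node, state: int) -> float:
    if is_terminal(state):
        value = reward(state)
    elif not node.children:
        # Expand with uniform priors; a value of 0 stands in for a value network.
        for a in ACTIONS:
            node.children[a] = Node(prior=1.0 / len(ACTIONS))
        value = 0.0
    else:
        a = select_action(node)
        value = simulate(node.children[a], step(state, a))
    # Back up the value along the path (single-player, so no sign flipping).
    node.visits += 1
    node.value_sum += value
    return value


def search(root_state: int, n_simulations: int = 50) -> dict:
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        simulate(root, root_state)
    # Normalised visit counts act as the improved policy target in AlphaZero.
    return {a: child.visits for a, child in root.children.items()}
```

For example, `search(4)` should put most of its visits on action 1 (landing on 5) rather than action 2 (overshooting to 6). Neither piece works without an exact simulator and a finite action set, which is the worry about transferring this recipe to a partially observed world.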
In general, I'm particularly worried about two things:
Why learn a model of the world? Despite the successes of model-based RL (MBRL), I don't think this is the right thing to do. The world is too complex to learn (and partial observability is the norm), which Rich also agrees with. So I don't understand this. The more central question to me is: how do we construct the agent state? The agent state doesn't need to be a simulation of the world state. I guess this is where memory, consciousness and so on come in.
The old question: where do rewards come from? Though there have been numerous discussions on this, I haven't seen a satisfying answer.
What I am 100% sure of:
It has to be sequential (stateful, with a memory, etc.). This is not trivial. If you say this is because our world is stateful, then the question is: could there be a "static" world? If you say a static world should not be called a world, then you are saying that being "stateful" is something special. Then why?
I have some very, very rough clues now. It is about computational irreducibility.
It is world-agnostic (an idea from Rich Sutton). Interestingly, consciousness or self-awareness is also world-agnostic.
Consciousness is crucial. Actually, I am not 100% sure about this; it is more of a belief or intuition.
The two most central questions are: 1) why do we have this subjective experience at all (what is the advantage of being conscious?), and 2) how is this subjective experience produced by purely physical processes?
Some (weak) justifications:
We are weak in terms of computational ability, yet we are self-conscious. There is no reason why a higher life form would not be self-conscious.
Cleverer animals tend to exhibit more consciousness.
An agent can be stupid while still being an AGI. Consider a human baby.
Evolution brought us here, so consciousness must be useful. And over the course of evolution we have become more and more self-conscious.
What might be true:
In terms of implementation, it has to be scalable in some way (modular, or built from simple units like neurons).
The definition involves "goals". This is the perspective taken by Rich Sutton, but I still cannot understand it.
The definition involves "meta". This is about being self-aware.
Adversarial learning might be important. Becoming better than yourself always seems to be something learnable.
What I don't believe:
RL is the final answer. It is a good step though.
Causal learning is important. Rather, I think it should emerge naturally if we choose the right paradigm. Actually, I seriously doubt that we should study it explicitly.
We think in language and images instead of some hidden, internal representation. At least, that is what we perceive. This is interesting because it is very inefficient; there must be a reason.
Other interesting questions:
Should the evolution from a single cell to today's humans be considered part of the "general principle of intelligence", or should we only consider how humans developed consciousness? In other words, where should we start?
Does it make sense to say whether a grid-world agent has intelligence or not?
Insects don't have a mind. We do. Some animals do. What's the fundamental criterion?
If you think about it, the agent is not part of the environment at all: in robotics, for example, everything about the robot itself (its position, etc.) is considered part of the environment/external world. This is very unnatural. I'm not saying that positions should not be considered external world state; I'm just saying that maybe we should reconsider this problem.
State abstraction may be important. In many cases, reaching a subgoal means reaching a "state" that has a high value (e.g., doing well in an exam, making money). However, these are not physical world states but rather abstract states.
We think in such an abstract and flexible way that the thinking process cannot possibly be hard-coded in the structure of our brain (or in the weights of neural networks). By flexible, I mean that even if our knowledge, or the content of our (short-term) memory, changes only a little, what we think can be completely different. And our short-term memory doesn't just change a little; it changes rapidly and drastically. Behavior must therefore be conditioned, in a very abstract way, on the contents of this rapidly changing memory rather than on weights (weights are hard and slow to change, and are more like "function mappings" or "connections" than knowledge or states). However, the mechanism by which thinking is conditioned on memory must also be simple and scalable, because it is a basic building block of the thinking process. Combined with the facts that 1) the mind is sequential, 2) memory is huge (so attention is required for useful read and write operations), and 3) the conditioning mechanism must be flexible and simple at the same time, there is reason to believe that the thinking process works like a Turing machine, or a modern CPU, where a controller acts on an external memory using attention.
Learning where to read, where to write, and what to write (the transitions) is no different from how we interact with the world: it has to be learned. So RL may help here.
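The idea of a controller acting on an external memory through attention can be sketched with content-based addressing, loosely following Neural Turing Machines: a key is compared with every memory slot by cosine similarity, a softmax sharpened by a strength `beta` turns the scores into attention weights, and reads and writes are weighted by them. This is a minimal pure-Python sketch; the function names are hypothetical, and the controller that would emit `key`, `beta`, `erase` and `add` at each step (and how it would learn to do so, by gradients or by RL) is left out.

```python
import math


def cosine(u, v, eps=1e-8):
    # Cosine similarity; eps avoids division by zero for all-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def address(memory, key, beta):
    # Content-based attention: similarity to the key, sharpened by beta, then softmax.
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, weights):
    # Read vector = attention-weighted sum of memory rows.
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


def write(memory, weights, erase, add):
    # NTM-style update: each slot is partially erased, then added to,
    # in proportion to its attention weight. Returns a new memory.
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]


# Demo: a 3-slot memory; the key matches slot 1, so attention focuses there.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = address(memory, key=[0.0, 1.0, 0.0], beta=10.0)
r = read(memory, w)  # approximately [0, 1, 0]
memory = write(memory, w, erase=[1.0] * 3, add=[5.0] * 3)  # slot 1 becomes ~[5, 5, 5]
```

The point of the sketch is that behavior depends on what is currently stored in `memory`, which can change completely in one write, while the addressing rule itself stays simple and fixed, matching the "flexible and simple at the same time" requirement.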
Let's think about this. If RL is how animals develop their reflexive behaviors, then we can imagine how the mind developed through evolution:
Single cell animals
Animals that learn reflexive behaviors by RL
Animals that have a simple, small memory
This small memory grows larger, and attention appears to aid effective reading and writing
Animals learn advanced ways to read and write memory by RL
Then consciousness appears?
Maybe the result of exploring a specific world is a generative model that encodes the most essential rules of that world. This generative model enables efficient learning of any task.
The exploration-exploitation trade-off in this case: at the start, we have little knowledge about the world and rely more on external inputs; later, we rely more on this generative model.
SGD can learn generative models in some cases, but it is just too naive.
Any knowledge is stored as interaction-observation functions. Both can be abstract.
Improving Generative Imagination in Object-Centric World Models
Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Bofeng Fu, Jindong Jiang, Sungjin Ahn