Humans are our best example of the ability to learn the structure of the world through observation of environmental regularities. Specifically, humans can learn about different objects, different classes of objects, and different class-specific behaviors. Fundamental to these human abilities are evolved sensory hardware and automatic pattern-recognition systems, thought to operate in part according to the leading neuroscience theory of predictive coding. Artificial intelligence research is often inspired by neuroscience, and algorithms that implement predictive coding already exist. In this paper, we evaluate a leading predictive-coding video-prediction algorithm, PredNet, for its ability to perform these kinds of human-like learning. By successfully training PredNet on a custom Simple Shape Motion (SSM) video dataset, which explicitly requires structure learning in order to accurately predict the next frame, we establish that PredNet is capable of rudimentary structure learning. We investigate PredNet's filters and feature maps but find scant evidence of truly symbolic knowledge, and propose instead that PredNet performs semi-symbolic learning. We perform ablation studies that reveal which aspects of PredNet critically contribute to its structure learning ability. Finally, we detail a set of modifications to PredNet that enable object-centric processing, as a promising step change towards human-like structure learning. Evaluation results and investigations are provided; performance was slightly worse than Baseline, likely due to a noted implementation flaw. Code and instructions to reproduce dataset creation and model training and evaluation are available at https://github.com/ofSingularMind/parallel_prednet.