On the decomposition of visual sets using Transformers