Declarative Image Generation

Abstract

Generating synthetic images has wide applications in several fields, such as creating datasets for machine learning or using the images to investigate the behaviour of machine learning models. An essential requirement when generating images is control over aspects such as the entities or objects in the image. Such control makes it possible to create custom datasets tailored to these applications and provides a platform for diverse experiments with the generated images, enabling research in the application's field. Existing methods individually enable controllability over various elements of the image, such as selecting the objects, their properties, their colour, or the relations between objects, but we identify a research gap: no single method allows the user to control all these aspects of the image. A further gap is that existing methods cannot generate images based on a query with a logic-based combination of entities.

In this thesis, we aim to fill this research gap by developing SceneUI, a system that allows the user to specify and control aspects of the scene through a user interface, including the objects, object properties, spatial relations between objects, object colour, the extent of contextual objects, and the background of the image. Additionally, we include a component that generates the scene from an OR query specifying the objects as predicates, which serves as a foundation for generating images based on entity combinations. Owing to the limited range of object attributes and the scarcity of attributed objects in the dataset, we augment the dataset by expanding the attributes of objects in its scene graphs and introducing additional objects that carry attributes. We did this by identifying recurring objects in the dataset that could be expanded and manually annotating the changes. This increases the level of controllability and offers a wider range of object properties to choose from.
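To make the query component concrete, the following is a minimal sketch of how an OR query over object predicates might be represented and checked against a generated scene. The Predicate structure, the satisfies helper, and the object fields (name, colour) are illustrative assumptions, not SceneUI's actual interface.

    from dataclasses import dataclass, field

    @dataclass
    class Predicate:
        # One disjunct of the OR query: an object type plus optional attributes.
        obj: str
        attributes: dict = field(default_factory=dict)

    # "a red car OR a bus": the scene must contain at least one object
    # matching at least one predicate.
    query = [
        Predicate(obj="car", attributes={"colour": "red"}),
        Predicate(obj="bus"),
    ]

    def satisfies(scene_objects, query):
        # scene_objects is a list of dicts, e.g. {"name": "car", "colour": "red"}.
        return any(
            obj["name"] == pred.obj
            and all(obj.get(k) == v for k, v in pred.attributes.items())
            for pred in query
            for obj in scene_objects
        )

Representing each disjunct as an object type plus optional attributes keeps such a query extensible to attributed objects, like those added through the dataset augmentation described above.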

The goal of the thesis is to design and develop a method that allows the user to declare, specify, and manipulate elements of the image, and then to use the generated images in two use cases: generating images for interpretable machine learning, and generating images from queries as ground truth. We evaluate SceneUI's usability and effectiveness for the two use cases through two experiments. In the first experiment, we use SceneUI to create biased datasets, where each bias is based on an object's colour or object type, and train a deep learning model that is expected to learn the biases. The results show that the model learns the biases well; SceneUI can therefore be used to construct controlled datasets for benchmarking machine learning explainability methods. The second experiment generates a dataset of images from OR queries and trains machine learning models on it to verify that the images are suitable for machine learning tasks. Since the generated images serve as ground truth for a given query, a specialised machine learning model is expected to identify the predicate objects in each image. The results show that SceneUI can generate images based on the objects in a query, including objects with specified properties. The models identify the important features in the SceneUI-generated images, which are thus suitable for use as ground truth. We also discuss the limitations and tradeoffs of SceneUI and present potential directions for future research to improve its scalability.
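As an illustration of the first experiment's setup, the sketch below shows one way a colour-based bias could be injected when labelling a dataset of generated scenes. The make_biased_split function, the bias_strength parameter, and the scene dictionary layout are hypothetical, chosen only to show the idea of a label that correlates strongly, but not perfectly, with an object property.

    import random

    def make_biased_split(scenes, bias_colour="red", bias_strength=0.95, seed=0):
        # Label 1 when the scene contains an object of the bias colour,
        # label 0 otherwise; flip a small fraction of labels so the
        # correlation is strong but not perfect.
        rng = random.Random(seed)
        dataset = []
        for scene in scenes:
            has_cue = any(o.get("colour") == bias_colour for o in scene["objects"])
            label = int(has_cue) if rng.random() < bias_strength else int(not has_cue)
            dataset.append((scene, label))
        return dataset

A model trained on such a split can exploit the colour cue, which is exactly the kind of learned bias an explainability method should expose.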