Aerial cinematography has seen increased use of Unmanned Aerial Vehicles (UAVs) in recent years due to technological advancements and commercialisation. Operating such a robot can be complex and requires a dedicated operator. Automating the cinematography allows multiple robots to be used, which further increases the complexity of the task. High-level command interpretation is therefore required to provide an intuitive interface through which an inexperienced user can control such a system.
Natural Language (NL) is an intuitive interface method that allows a user to specify an extensive range of commands. A Cinematographic Description Clause (CDC) is defined to extract information from a processed NL command. A minimum-input approach is taken such that a user need only specify the number of robots and the people to record; specifying a behaviour is optional. An environment is considered in which up to three robots have to frame two people. Taking into account the people's orientations, their relative global locations and the user command, a set of behaviours is determined based on cinematographic practices. Camera views and image parameters are determined through behaviour-specific non-linear optimisations and assigned to the robots using a Linear-Bottleneck Algorithm (LBA). A collision-free global path is computed for each robot with an A* search algorithm. Finally, a Model Predictive Control (MPC) scheme computes the low-level inputs required to achieve the user command.
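To make the pipeline concrete, the sketch below illustrates two of the steps described above: a possible data structure for the CDC and the bottleneck assignment of optimised camera views to robots. This is a minimal illustration under stated assumptions, not the system's implementation; the CDC fields, the Euclidean travel cost and all coordinates are invented for the example.

```python
from dataclasses import dataclass
from itertools import permutations
from math import dist
from typing import List, Optional

@dataclass
class CDC:
    """Hypothetical Cinematographic Description Clause parsed from an NL command."""
    num_robots: int                   # required: number of UAVs to use
    subjects: List[str]               # required: the people to record
    behaviour: Optional[str] = None   # optional: desired cinematographic behaviour

def bottleneck_assignment(robot_pos, view_pos):
    """Assign one candidate camera view to each robot, minimising the worst
    single robot-to-view distance (the bottleneck criterion). With at most
    three robots, exhaustive search over permutations is sufficient."""
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(view_pos)), len(robot_pos)):
        cost = max(dist(robot_pos[i], view_pos[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

# Example: a parsed command and three candidate views produced by the
# behaviour-specific optimisation (all coordinates are made up).
cmd = CDC(num_robots=3, subjects=["person A", "person B"])
robots = [(0.0, 0.0), (5.0, 1.0), (2.0, 4.0)]   # current UAV positions
views = [(1.0, 1.0), (4.0, 0.0), (3.0, 5.0)]    # optimised camera views
assert len(robots) == cmd.num_robots
assignment, worst = bottleneck_assignment(robots, views)
print(f"view index per robot: {assignment}, bottleneck cost: {worst:.2f}")
```

The bottleneck criterion differs from the more common sum-of-costs assignment in that it balances the load: no single robot is sent on a disproportionately long repositioning manoeuvre, which matters when all views must be reached in roughly the same time.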
Three situations are considered to validate the performance of the system given the minimal user input. First, tracking of the people's dynamic orientations is evaluated for up to three robots, with camera positions determined autonomously. Next, dynamic motions of the two people through an environment highlight the limitations imposed on the system by collision mitigation, mutual visibility and robot dynamics. Finally, an extension to multiple simultaneous commands increases the number of robots and people that can be tracked, allowing an assessment of the flexibility and scalability of the proposed high-level command interpretation methodology.