Comparison of Optimal Control Techniques for Learning-based RRT

Abstract

Kinodynamic motion planning for a robot involves generating a trajectory from a given initial state to a goal state while satisfying kinematic and dynamic constraints. Rapidly-exploring Random Trees (RRT) is a sampling-based algorithm that has been widely adopted for this task. However, RRT is not yet fast enough for use in industrial applications. Recently, supervised learning has been used to pre-learn time-consuming steps of RRT, resulting in shorter planning times. These supervised learning models require the cost and control inputs of the system as training data, which are generated using optimal control.
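
A minimal sketch of where a learned model can plug into RRT, assuming the pre-learned step is the cost-to-go used to pick the nearest tree node. The names `predict_cost`, `steer`, `sample_state`, `sq_dist_cost` and `straight_steer` are hypothetical placeholders. In the usage part, a squared Euclidean distance stands in for a trained KNN or MLP cost model and a straight-line step stands in for a steering controller, so this is a geometric toy rather than a kinodynamic planner.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[float, ...]
CostModel = Callable[[State, State], float]  # hypothetical learned cost-to-go predictor


def nearest_by_learned_cost(nodes: Sequence[State], target: State, predict_cost: CostModel) -> int:
    """Index of the tree node with the lowest *predicted* cost to reach target."""
    return min(range(len(nodes)), key=lambda i: predict_cost(nodes[i], target))


def plan(start: State, goal: State, sample_state: Callable[[random.Random], State],
         predict_cost: CostModel, steer: Callable[[State, State], State],
         goal_tol: float, max_iters: int = 1000, goal_bias: float = 0.1,
         seed: int = 0) -> Optional[List[State]]:
    """Basic RRT loop; nearest-node selection uses the learned cost instead of a distance."""
    rng = random.Random(seed)
    nodes: List[State] = [start]
    parents: List[int] = [-1]
    for _ in range(max_iters):
        # Occasionally sample the goal itself to bias tree growth towards it.
        sample = goal if rng.random() < goal_bias else sample_state(rng)
        i = nearest_by_learned_cost(nodes, sample, predict_cost)
        nodes.append(steer(nodes[i], sample))
        parents.append(i)
        if predict_cost(nodes[-1], goal) <= goal_tol:
            # Walk parent pointers back to the root to recover the path.
            path, j = [], len(nodes) - 1
            while j != -1:
                path.append(nodes[j])
                j = parents[j]
            return path[::-1]
    return None


# --- Hypothetical stand-ins for illustration only (not trained models) ---
def sq_dist_cost(a: State, b: State) -> float:
    return math.dist(a, b) ** 2


def straight_steer(a: State, b: State, step: float = 0.2) -> State:
    d = math.dist(a, b)
    if d <= step:
        return tuple(b)
    return tuple(x + step * (y - x) / d for x, y in zip(a, b))


path = plan(start=(0.0, 0.0), goal=(1.0, 1.0),
            sample_state=lambda rng: (rng.uniform(-0.5, 1.5), rng.uniform(-0.5, 1.5)),
            predict_cost=sq_dist_cost, steer=straight_steer, goal_tol=1e-9)
```

Swapping `sq_dist_cost` for a regressor trained on optimal-control costs changes only which node is extended; the tree bookkeeping stays the same.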

The training data can be obtained using either indirect or direct optimal control techniques. In this thesis, each technique is used to generate costs and control inputs for a two-link manipulator from random pairs of initial and final states. Each dataset is then used to train a model, and the datasets are compared on a set of training metrics. The supervised learning models used are k-nearest neighbours (KNN) regression and a multi-layer perceptron (MLP) neural network. Both datasets lead to similar model convergence, but the indirect optimal control approach allows up to 24-fold faster data generation and up to a 3-fold reduction in the dimensionality of the training data compared with the direct approach.
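
To make the difference between the two techniques concrete, the sketch below solves a minimum-energy transfer for a double integrator (p'' = u, cost (1/2)∫ u² dt over a fixed time T), used here as a hypothetical one-joint stand-in for the two-link manipulator. The indirect route applies Pontryagin's minimum principle: the costate equations force u(t) = a + b·t, so only the two constants a and b are solved from the boundary conditions. The direct route discretises u into N piecewise-constant values and optimises them directly. Because this toy is linear-quadratic, the direct problem reduces to a minimum-norm least-squares solve; a general system, or added torque bounds, would need an NLP or QP solver instead. The function names are illustrative and not taken from the thesis.

```python
import numpy as np


def indirect_solution(x0, xf, T):
    """Indirect (PMP) solution for p'' = u with cost 0.5 * integral of u^2.

    H = u^2/2 + l1*v + l2*u gives u = -l2, l1 constant and l2 linear in t,
    so u(t) = a + b*t; a and b follow from the boundary conditions.
    """
    p0, v0 = x0
    pf, vf = xf
    # v(T) - v0 = a*T + b*T^2/2 ;  p(T) - p0 - v0*T = a*T^2/2 + b*T^3/6
    M = np.array([[T, T**2 / 2], [T**2 / 2, T**3 / 6]])
    rhs = np.array([vf - v0, pf - p0 - v0 * T])
    a, b = np.linalg.solve(M, rhs)
    cost = 0.5 * (a**2 * T + a * b * T**2 + b**2 * T**3 / 3)
    return (a, b), cost


def direct_solution(x0, xf, T, N):
    """Direct solution: N piecewise-constant controls; the terminal state is linear in u."""
    p0, v0 = x0
    pf, vf = xf
    dt = T / N
    mid = (np.arange(N) + 0.5) * dt  # interval midpoints
    # Effect of u_k on p(T) is dt*(T - mid_k); its effect on v(T) is dt.
    A = np.vstack([dt * (T - mid), np.full(N, dt)])
    rhs = np.array([pf - p0 - v0 * T, vf - v0])
    # Minimising sum(u_k^2) subject to A u = rhs is a minimum-norm problem,
    # which lstsq solves for this underdetermined system.
    u, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cost = 0.5 * dt * np.sum(u**2)
    return u, cost


(a, b), J_ind = indirect_solution((0.0, 0.0), (1.0, 0.0), T=1.0)
u, J_dir = direct_solution((0.0, 0.0), (1.0, 0.0), T=1.0, N=200)
print(f"indirect: a={a:.3f}, b={b:.3f}, cost={J_ind:.5f}")
print(f"direct:   {u.size} controls, cost={J_dir:.5f}")
```

Both costs come out close to 6 for this boundary pair. In the toy, the indirect solution is described by two numbers while the direct one is a vector of N control samples, which hints at why indirect data can be more compact; it does not reproduce the figures reported above.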

Real-world robots have torque limits determined by their actuator configuration. These torque limits are modelled as control constraints in both optimal control techniques, and this thesis studies the effect of this restriction on data generation and supervised learning. In this case, direct optimal control is found to be better suited for data generation, because control bounds are easily applied as inequality constraints on the function approximations. Indirect optimal control is very tedious, since the active constraints must be known a priori to determine the switching points. An alternative method is therefore explored, in which samples are generated as in the unconstrained case and any samples violating the constraints are removed. Poor control input learning is observed in both approaches, and the models struggle to extrapolate. It is hypothesised that this is because the constrained data cannot fully capture the system dynamics. However, good cost prediction is achieved using neural networks.