# Sequence optimization using reinforcement learning in RobotStudio

This article describes how a robot in a collaborative application can learn automatically through reinforcement learning (RL).

Learning through practice how to optimally assemble a product is expensive for an operator. It is time consuming and often requires extensive experience to find a strategy that suits the operator while also yielding a high-quality product and a time-efficient process. However, one can imagine an automated method in which an algorithm finds an optimal strategy for a specific operator and product by testing different actions in a simulated environment. Instead of an operator having to adapt to a robot, or an engineer programming the optimal assembly process, the task is moved to a computer or a robot and automated. This can be accomplished by means of reinforcement learning (RL): a family of algorithms that learn the optimal behaviour in each situation by maximizing a numerical reward signal. Reward signals in production applications may be lead time, which should be minimized, or a quality measure to be maximized, by choosing different actions. In an assembly application, the available actions may represent possible mounting operations.
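As a concrete illustration, a reward signal that combines the two objectives above (lead time to minimize, quality to maximize) can be framed as a single quantity to maximize. The function below and its weighting are illustrative assumptions, not taken from the thesis:

```python
# Hypothetical reward shaping: a lead-time cost (to minimize) is folded into
# a reward (to maximize) by negating it and adding a weighted quality term.
def reward(operation_time_s: float, quality_score: float,
           quality_weight: float = 10.0) -> float:
    """Return a scalar reward: shorter operations and higher quality score better."""
    return -operation_time_s + quality_weight * quality_score
```

With this framing, a standard reward-maximizing RL algorithm simultaneously pushes lead time down and quality up; the weight controls the trade-off between the two.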

Today's collaborative robots differ from traditional industrial robots in that they can work safely together with humans. They can slow down when people get too close, stop when a person blocks them, collide without causing damage, and be hand-guided by pushing the robot arm in different directions. To strengthen this collaboration, the thesis shows that RL can be used to make a robot observe patterns in a human operator's simulated behaviour and learn to adapt its own movements to optimize the assembly process. This can be combined with learning different optimized mounting sequences depending on the operator's preferences. Since it is difficult to define a mathematical model that accurately represents human behaviour, RL methods have an advantage over traditional optimization techniques, which require a known model of the environment.

Three main methods have been evaluated: tabular Q-learning, Q-learning with linear function approximation, and Q-learning with nonlinear function approximation using neural networks. Furthermore, the challenges and opportunities of increasing the learning speed through parallel learning have been investigated, with several parallelization strategies implemented and compared. The results indicate that tabular Q-learning finds the optimal solution faster than both function approximation methods. However, Q-learning with nonlinear function approximation can generalize to an unlimited number of human behaviour profiles, which is virtually impossible with both linear function approximation and tabular Q-learning. Although it is difficult to draw general conclusions from some of the results, it is clear that all parallelization strategies can accelerate learning and reduce misconceptions about the environment, while they rank differently depending on problem complexity and the number of parallel training instances.
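To make the tabular method concrete, the sketch below runs standard tabular Q-learning on a toy mounting-sequence problem. The environment (three parts that must be mounted in a fixed order to earn a reward) and all hyperparameter values are illustrative assumptions, not the thesis's actual setup:

```python
import random

# Toy assembly task (an assumption for illustration): parts 0, 1, 2 must be
# mounted in exactly this order; only the complete correct sequence is rewarded.
GOAL = (0, 1, 2)
PARTS = [0, 1, 2]

def step(state, action):
    """Mount `action` next; return (next_state, reward, done)."""
    next_state = state + (action,)
    done = len(next_state) == len(GOAL)
    reward = 1.0 if done and next_state == GOAL else 0.0
    return next_state, reward, done

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}  # maps (state, action) -> action-value estimate
    for _ in range(episodes):
        state, done = (), False
        while not done:
            actions = [a for a in PARTS if a not in state]
            if rng.random() < epsilon:      # explore: random action
                action = rng.choice(actions)
            else:                           # exploit: greedy action
                action = max(actions, key=lambda a: Q.get((state, a), 0.0))
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(
                Q.get((next_state, a), 0.0)
                for a in PARTS if a not in next_state)
            # Standard Q-learning update rule
            q = Q.get((state, action), 0.0)
            Q[(state, action)] = q + alpha * (reward + gamma * future - q)
            state = next_state
    return Q

Q = train()
# After training, the greedy policy from the empty state picks part 0 first.
best_first = max(PARTS, key=lambda a: Q.get(((), a), 0.0))
```

The dictionary `Q` is the "table" that gives the method its name; it grows with the number of distinct states, which is why tabular Q-learning cannot generalize across many human behaviour profiles, whereas a neural-network approximator can share what it learns between similar states.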

The results presented in this project are promising for future research. They show that it would be possible to learn both a mounting sequence of a complex product and how to adapt the process to a complex representation of a human operator. Instead of a collaborative robot being controlled by humans, intelligent robots will begin to guide and understand their employees, which will open great opportunities for the industry in the future.

Below is a video showing the most important methods and strategies in this thesis, along with explanations and examples.

Follow the link to get to Oliver and Arsam's thesis.
