The non-deterministic nature of human behavior makes anomaly detection in Human-Robot Interaction (HRI) scenarios a particularly challenging problem. Anomalous events, such as unexpected user interactions or unforeseen environmental changes, are unknown before they occur. Moreover, the work process or user intentions may evolve over time. To address this issue, we present a modular deep learning approach that learns normal behavior patterns in an unsupervised manner. We combine the unsupervised feature-extraction capability of an autoencoder with a sequence-modeling neural network. Both models were first evaluated on benchmark video datasets, achieving performance comparable to state-of-the-art methods. For the HRI application, we developed a continuous-training approach for real-time anomaly detection and evaluated it in an experiment with a collaborative robot, a time-of-flight (ToF) camera, and proximity sensors. In a user study with 10 subjects, irregular interactions and misplaced objects were the most common anomalies, which the system detected reliably.
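The core principle behind the autoencoder component, detecting anomalies by reconstruction error on data learned from normal samples only, can be illustrated with a minimal toy sketch. This is not the paper's model: the deep autoencoder and sequence network are replaced here by a closed-form linear "autoencoder" (a PCA projection), the sensor data by synthetic 3-D points near a 1-D subspace, and the threshold is a simple margin over the maximum training error; all names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" frames: 3-D points near a 1-D subspace, plus noise
t = rng.uniform(-1, 1, size=(200, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

# Closed-form linear "autoencoder": project onto the top principal
# component (stand-in for the learned encoder/decoder of a deep AE)
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:1]                       # encoder (3 -> 1); P.T is the decoder

def score(x):
    """Reconstruction error of one frame: encode, decode, compare."""
    z = (x - mu) @ P.T           # encode to latent space
    xh = z @ P + mu              # decode back to input space
    return float(np.sum((xh - x) ** 2))

# Threshold estimated from normal training data only (illustrative margin)
thr = 1.5 * max(score(x) for x in X)

print(score(np.array([0.5, 1.0, -0.5])) < thr)   # on-manifold: normal
print(score(np.array([1.0, -2.0, 3.0])) > thr)   # off-manifold: anomaly
```

Both prints yield `True`: the point on the learned subspace reconstructs almost perfectly, while the off-manifold point incurs a large reconstruction error and is flagged. The paper's pipeline applies the same idea with a deep autoencoder over camera and proximity-sensor frames, adding a sequence model to capture temporal patterns.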