The application of deep learning to Non-Intrusive Load Monitoring (NILM) has given rise to a new family of Neural NILM approaches that increasingly outperform traditional NILM approaches. In this extended abstract, which describes our ongoing research, we analyze recent Neural NILM approaches; our findings suggest that they have difficulty generating valid, reasonably shaped appliance load profiles. To mitigate this problem, we propose enhancing Neural NILM approaches with appliance load sequence generators trained with a Generative Adversarial Network (GAN). Preliminary results of our experiments with GANs show the potential of the approach, although there is not yet strong evidence that it outperforms the examined end-to-end-trained Neural NILM approaches. In the course of our investigations, we generalize energy-based NILM performance metrics and establish the complete classification confusion matrix based on the estimated energy in appliance load profiles. This enables the adaptation of all known classification scores to their energy-based counterparts.
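
To illustrate the idea of an energy-based confusion matrix, the following is a minimal sketch of one plausible formulation, not necessarily the definition used in this work. It assumes uniformly sampled power readings, treats the overlap between true and estimated power as true-positive energy, and measures true-negative energy against a hypothetical per-appliance ceiling `max_power`. The function names, the `max_power` and `dt_hours` parameters, and the example values are all assumptions made for illustration.

```python
import numpy as np

def energy_confusion(true_power, est_power, max_power, dt_hours=1.0):
    """Energy-based confusion matrix (illustrative assumption, see lead-in).

    true_power, est_power: appliance power per time step (same length, watts).
    max_power: assumed upper bound on appliance power, used only for TN.
    dt_hours: assumed uniform sampling interval, converts power to energy.
    """
    y = np.asarray(true_power, dtype=float)
    y_hat = np.asarray(est_power, dtype=float)

    tp = np.minimum(y, y_hat).sum() * dt_hours              # energy correctly assigned
    fp = np.clip(y_hat - y, 0.0, None).sum() * dt_hours     # energy over-estimated
    fn = np.clip(y - y_hat, 0.0, None).sum() * dt_hours     # energy missed
    tn = (max_power - np.maximum(y, y_hat)).sum() * dt_hours  # energy correctly left unassigned
    return float(tp), float(fp), float(fn), float(tn)

def energy_scores(tp, fp, fn):
    """Precision, recall, and F1 computed from energy instead of counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return precision, recall, f1

if __name__ == "__main__":
    # Made-up four-sample load profile, purely for demonstration.
    tp, fp, fn, tn = energy_confusion([0, 100, 200, 0], [0, 150, 100, 50], max_power=200)
    print(tp, fp, fn, tn)
    print(energy_scores(tp, fp, fn))
```

With the four quantities in hand, any count-based classification score can be evaluated on energy by substituting them for the usual counts; under this sketch they sum to `max_power` times the number of samples times `dt_hours`.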