Parallel algorithms are an established way of increasing performance in systems that are subject to real-time constraints. Sensor-based sorting is a machine vision application in which firm real-time requirements must be met in order to reliably remove potentially harmful entities from a material feed. Recently, a predictive approach based on multitarget tracking has been proposed to decrease the error in the physical separation in optical sorting. For implementations that use hard associations between measurements and tracks, a linear assignment problem has to be solved for each frame recorded by a camera. The auction algorithm can be used for this purpose and has the additional advantage of being well suited to parallel architectures. In this paper, an improved implementation of this algorithm for a graphics processing unit (GPU) is presented. The resulting algorithm is implemented in both an OpenCL-based and a CUDA-based environment. By using an optimized data structure, the presented algorithm outperforms recently proposed implementations in terms of speed while retaining the output quality of the algorithm. Furthermore, memory requirements are significantly decreased, which is important for embedded systems. Experimental results are provided for two different GPUs and six datasets. It is shown that the proposed approach is of particular interest for applications dealing with comparatively large problem sizes.
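
For readers unfamiliar with the auction algorithm, the following is a minimal sequential sketch of the textbook forward auction for a square linear assignment problem. It is not the GPU implementation presented in the paper and does not use its optimized data structure. The function name `auction_assignment`, the parameter `eps`, and the example cost matrix are illustrative assumptions. Costs are negated so that minimizing the association cost becomes maximizing a benefit.

```python
from typing import List


def auction_assignment(benefit: List[List[float]], eps: float) -> List[int]:
    """Sequential forward auction for a square benefit matrix (illustrative sketch).

    Returns a list `assigned` where assigned[i] is the column given to row i.
    For integer benefits, eps < 1/n yields an optimal assignment.
    """
    n = len(benefit)
    prices = [0.0] * n            # current price of each column
    owner = [-1] * n              # owner[j]: row currently holding column j
    assigned = [-1] * n           # assigned[i]: column currently held by row i
    unassigned = list(range(n))   # rows that still need to bid

    while unassigned:
        i = unassigned.pop()

        # Bidding phase: find the best and second-best net value for row i.
        best_j = -1
        best_v = float("-inf")
        second_v = float("-inf")
        for j in range(n):
            v = benefit[i][j] - prices[j]
            if v > best_v:
                second_v = best_v
                best_j, best_v = j, v
            elif v > second_v:
                second_v = v

        # Raise the price by the margin over the second-best option plus eps.
        # With a single column there is no second-best value, so bid eps only.
        if second_v == float("-inf"):
            increment = eps
        else:
            increment = best_v - second_v + eps
        prices[best_j] += increment

        # Assignment phase: row i takes the column; the previous owner, if any,
        # becomes unassigned and will bid again.
        previous = owner[best_j]
        if previous != -1:
            assigned[previous] = -1
            unassigned.append(previous)
        owner[best_j] = i
        assigned[i] = best_j

    return assigned


# Hypothetical 3x3 cost matrix (rows: measurements, columns: tracks).
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
benefit = [[-c for c in row] for row in cost]
assignment = auction_assignment(benefit, eps=0.25)
total_cost = sum(cost[i][j] for i, j in enumerate(assignment))
```

In this sketch each unassigned row bids in turn. In a parallel variant, the bids of all unassigned rows can be computed concurrently, since each row's bid depends only on its own benefits and the current prices; this property is what makes the algorithm attractive for GPUs.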