A muon-track reconstruction exploiting stochastic losses for large-scale Cherenkov detectors

IceCube is a cubic-kilometer Cherenkov telescope operating at the South Pole. The main goal of IceCube is the detection of astrophysical neutrinos and the identification of their sources. High-energy muon neutrinos are observed via the secondary muons produced in charged-current interactions with nuclei in the ice. Currently, the best performing muon track directional reconstruction is based on a maximum likelihood method using the arrival time distribution of Cherenkov photons registered by the experiment's photomultipliers. A known systematic shortcoming of the prevailing method is the assumption of a continuous energy loss along the muon track. However, at energies $>1$ TeV the light yield from muons is dominated by stochastic showers. This paper discusses a generalized ansatz where the expected arrival time distribution is parametrized by a stochastic muon energy loss pattern. This more realistic parametrization of the loss profile leads to an improvement of the muon angular resolution of up to $20\%$ for through-going tracks and up to a factor of 2 for starting tracks over existing algorithms. Additionally, the procedure to estimate the directional reconstruction uncertainty has been improved to be more robust against numerical errors.


Keywords: Neutrino detectors, Cherenkov detectors, Data analysis

Introduction
The IceCube Neutrino Observatory is a cubic-kilometer neutrino telescope located at the geographic South Pole [1,2]. It consists of 5160 digital optical modules (DOMs), each containing a 10-inch photomultiplier tube (PMT). The PMTs detect Cherenkov photons emitted from charged secondary particles created in neutrino interactions in the surrounding ice. The ice in which IceCube is deployed is of glacial origin and exceptionally pure. However, it contains impurities such as dust and volcanic ash, most prominently in a layer between ∼ 2000 m and ∼ 2100 m depth [3][4][5]. Further irregularities in the ice include bubble columns in the refrozen ice around the strings of DOMs, a tilt of the ice sheet, and an anisotropic attenuation aligned with the local flow of the ice [6,7]. The impact of these medium properties is partially mitigated by IceCube's calibration system: a series of light-emitting diodes (LEDs) is used to illuminate the PMTs and parametrize the ice properties [5].
In 2013, the IceCube collaboration detected the first astrophysical neutrinos in the TeV-PeV range [8]. Since then, further studies have been initiated to understand the origin of these neutrinos, e.g. [9][10][11]. Many of these are point-source studies searching for correlations between an excess in neutrino events and known astrophysical source locations [12]. The coincidence in 2017 of a high-energy neutrino event and the flaring blazar TXS 0506+056 [13] reinforced the idea that a fraction of blazars are sources of high-energy neutrinos. In point-source analyses, a precise reconstruction of the neutrino direction is a central aspect that contributes most to the detection sensitivity. Apart from time-integrated searches that collect large-statistics samples, single real-time neutrino alerts that are sent out to the astronomical community also require a precise directional reconstruction [14].
An important detection channel for point-source identification are muons originating from charged-current (CC) interactions, which appear as track signatures in the IceCube detector. Above a few TeV, these muons are nearly co-aligned with the direction of the parent neutrino due to relativistic kinematics (see Section 2 for a more quantitative discussion). At these energies, a confidence interval on the muon arrival direction approximately translates into a confidence interval on the parent neutrino direction. This paper describes a new likelihood reconstruction that incorporates the stochastic energy losses of muons along their tracks into the likelihood hypothesis and thereby improves the accuracy and precision of the reconstructed arrival direction. Section 2 summarizes the previous likelihood approaches and lists their limitations. Section 3 introduces the new likelihood approach and discusses how some shortcomings of the previous algorithms are solved. The discussion concentrates on both the angular reconstruction and the uncertainty estimation. Section 4 shows comparisons of the various methods, and Section 5 concludes this manuscript with a final discussion.

Previous algorithms
This section illustrates a typical reconstruction process and in this context describes various existing reconstructions that are useful to understand the benefits of the new approach. Some of the details in this section have already been partially covered in Refs. [15] and [16].

Angular reconstruction
IceCube collects Cherenkov photons from charged particles with a few nanosecond time resolution at the various DOM locations [2]. This time resolution is required because a Cherenkov light front from a muon passing by a DOM yields a photon arrival time probability density function (PDF) that can have temporal structures of this magnitude [17]. This time resolution can best be exploited in an unbinned likelihood approach. The prevailing angular reconstructions typically assume an infinite muon track length and neglect stochastic losses. Such reconstructions employ two broad classes of unbinned likelihood approaches to model the arrival time of photons:

$$\mathcal{L}_\mathrm{std} = \prod_{i=1}^{N_\mathrm{DOM}} \prod_{j=1}^{N_\mathrm{hit}^{(i)}} p(t_{ij})^{q_{ij}} \quad (2.1)$$

$$\mathcal{L}_\mathrm{1st} = \prod_{i=1}^{N_\mathrm{DOM}} p_\mathrm{1st}(t_{i,1})^{Q_i} \quad (2.2)$$

where $N_\mathrm{DOM}$ and $N_\mathrm{hit}^{(i)}$ are the total number of DOMs and the number of hits in DOM $i$, respectively. The first likelihood (Equation 2.1) is the standard unbinned likelihood and uses the photon arrival PDF $p(t)$ for each observed hit. Since multiple photons can arrive at the DOM simultaneously, the correct application weights each hit by the observed charge $q_{ij}$. In practice, this weighting is often neglected, since its effect is subdominant compared to the unphysical assumption of the minimally ionizing track hypothesis. The second likelihood (Equation 2.2) uses the photon arrival PDF of only the first photon, $p_\mathrm{1st}(t_{i,1})$, which technically corresponds to the first-order-statistic PDF. This PDF can be calculated exactly given the PDF $p(t)$ and the cumulative distribution function (CDF) $P(t)$ [15]. The likelihood uses data that is weighted with the observed charge per hit $q_{ij}$ [2], and the total observed charge per DOM, $Q_i = \sum_j q_{ij}$. The motivation for the first-order-statistic PDF is twofold. While the standard PDF (Equation 2.1) is able to model the time distribution of a minimally ionizing muon in homogeneous ice, muons beyond TeV energies undergo stochastic energy losses. Additionally, the real ice properties are not homogeneous.
These known unphysical assumptions are partly mitigated by using $\mathcal{L}_\mathrm{1st}$, because it evaluates only the first observed photon in each DOM, which is less likely to have undergone significant scattering and hence is less affected by these uncertainties. The optimization is a 6-dimensional problem over the track position parameters $x, y, z, t$ and the orientation via two angles, e.g. zenith $\theta$ and azimuth $\phi$. In practice, the angles are sometimes re-parametrized as two parameters lying in the plane tangent to the sphere and perpendicular to the seed direction, in order to avoid numerical issues near the poles (see Appendix A).
In a previous publication [15] the likelihood $\mathcal{L}_\mathrm{1st}$ is called the "MPE" likelihood.
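The first-order-statistic PDF mentioned above has a simple closed form: for $N$ independent photon arrival times drawn from a PDF $p(t)$ with CDF $P(t)$, the earliest arrival is distributed as $N\,p(t)\,(1-P(t))^{N-1}$. A minimal Python sketch, using an exponential arrival model purely as an illustrative stand-in for the actual spline or gamma PDFs:

```python
import numpy as np

def first_order_statistic_pdf(t, pdf, cdf, n_photons):
    """Density of the earliest of n_photons i.i.d. arrival times:
    p_1st(t) = N * p(t) * (1 - P(t))**(N - 1)."""
    return n_photons * pdf(t) * (1.0 - cdf(t)) ** (n_photons - 1)

# Illustrative exponential arrival-time model (hypothetical rate):
lam = 0.5  # 1/ns
pdf = lambda t: lam * np.exp(-lam * t)
cdf = lambda t: 1.0 - np.exp(-lam * t)

t = np.linspace(0.0, 20.0, 2001)
p1 = first_order_statistic_pdf(t, pdf, cdf, n_photons=5)
# The earliest of 5 exponential arrivals is again exponential (rate 5*lam),
# so p1 is normalized and peaks at t = 0.
```

The same construction applies to any $(p, P)$ pair, which is why the first-order statistic can be evaluated exactly once the PDF and CDF are available.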
Figure 1: An example track reconstruction seeding chain depicting the underlying assumed PDF in the respective likelihood reconstruction. The PDF approximation quality and CPU time requirements broadly increase with more sophisticated reconstructions later in the chain.
The following sections describe a processing pipeline in which several reconstructions with increasing complexity are applied to the same event (see Figure 1). Each reconstruction outcome is used as a seeding strategy to the next one, starting with the fastest and simplest reconstruction and becoming gradually more time consuming and more precise. Note that in practice different processing pipelines are used based on the needs of a given event selection. However, the pipeline illustrated in Figure 1 is representative in that it shares the property of starting simple and fast and becoming gradually more time consuming and precise. Furthermore, it illustrates the thought process that went into the development of the new reconstruction.
Analytic Gaussian PDF (Least-Square) with plane-wave assumption The "first-guess" fit is typically a least-square fit which does not assume a minimally ionizing muon, but a plane wave moving through the detector with a constant velocity. (There is an exception to the increasing time consumption along the chain in $\mathcal{L}_\mathrm{1st}$, which only requires the evaluation of the first photon and is therefore actually faster, while also giving a more precise result.) Assuming a standard normal arrival time PDF of this plane wave, one can analytically calculate the mean position and velocity vector of the least-square problem [18]. To partially mitigate this rather simplified approximation, a regression that is more robust with respect to outliers is typically used, for example through the inclusion of a Huber loss term [19]. The robust algorithm is shown in Figure 2.
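This first guess has a closed-form solution: minimizing $\sum_i |\vec{r}_i - (\bar{\vec{r}} + \vec{v}\,(t_i - \bar{t}))|^2$ over position and velocity gives $\vec{v} = \mathrm{Cov}(t, \vec{r})/\mathrm{Var}(t)$. An illustrative numpy sketch of this fit, without the robust Huber extension and with hypothetical hit data:

```python
import numpy as np

def line_fit(times, positions):
    """Closed-form least-square 'plane wave' track fit.

    Returns the mean position r0 (at the mean time) and velocity v minimizing
    sum_i | positions_i - (r0 + v * (times_i - t_mean)) |^2 .
    """
    t = np.asarray(times, dtype=float)
    r = np.asarray(positions, dtype=float)
    t_mean = t.mean()
    r_mean = r.mean(axis=0)
    dt = t - t_mean
    v = (dt[:, None] * (r - r_mean)).sum(axis=0) / (dt ** 2).sum()
    return r_mean, v

# Hypothetical hits: a track with speed ~0.3 m/ns plus 1 m position noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 50)                    # ns
direction = np.array([1.0, 0.0, 0.5])
direction /= np.linalg.norm(direction)
r = 0.3 * t[:, None] * direction + rng.normal(0.0, 1.0, (50, 3))  # m
r0, v = line_fit(t, r)
```

A robust variant would iteratively down-weight outlier hits, e.g. with a Huber loss, before re-solving the same closed-form problem.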
Analytic Gamma PDF A more physically motivated likelihood description of the photon arrival PDF comes from assuming a minimally ionizing track as discussed above. This can be analytically described with a gamma distribution [15], which implicitly contains the Cherenkov emission geometry and the scattering properties of homogeneous ice. This parametrization is more accurate not only because of its more realistic assumptions, but also because it allows one to calculate the first-order-statistic PDF, and thereby $\mathcal{L}_\mathrm{1st}$, analytically. Empirically, the gamma PDF with likelihood $\mathcal{L}_\mathrm{1st}$ gives better results than the Gaussian first guess (least-square fit) or the standard likelihood $\mathcal{L}_\mathrm{std}$, as shown in Figure 2. As described earlier, this can be understood as partly mitigating the inaccurate modeling assumption of an infinite muon by only looking at the first photon. Also shown is the mean angle between the muon and parent neutrino directions. The uncertainty from the displayed muon reconstructions is well above this intrinsic uncertainty, especially above 10-100 TeV.
B-Spline PDF modelling Further improvement can be made by incorporating a more realistic ice model, which involves two steps. First, the ice's optical properties are inferred by fitting an ice model to LED flasher data [5]. Second, infinite minimally ionizing muon tracks are simulated in many different positions and orientations, and the resulting photons are recorded in high-dimensional histograms [20]. These histograms are then fitted with multi-dimensional interpolating B-splines [21] to represent the photon arrival PDF as a function of the track position and orientation. (In previous publications, the least-square algorithm above has been referred to as "line-fit" and the gamma approximation of the PDF as the "Pandel" approximation [15].) Since the relevant ice properties now match reality better, the dominant remaining inaccuracy comes from the fact that the stochastic losses of the muon are neglected, as the track hypothesis still assumes a minimally ionizing particle. Several methods have been implemented to mitigate the effects of this unphysical assumption and improve the reconstruction, besides the use of the first-order-statistic PDF (usage of $\mathcal{L}_\mathrm{1st}$ instead of $\mathcal{L}_\mathrm{std}$). These include using effective photon arrival PDFs from averaged stochastic tracks instead of minimally ionizing tracks, including non-uniform photomultiplier noise modeling, removing photons that might arise from large stochastic losses, and convolving the first-order-statistic PDF with an energy-dependent Gaussian kernel [22]. We call this ensemble of modifications "max settings" in the following. The general reconstruction scheme with B-splines as described above is referred to as SplineReco. When no additional information is given, SplineReco uses likelihood $\mathcal{L}_\mathrm{1st}$ and the previously described max settings.
Figure 3: The SplineReco reconstruction is shown using the standard likelihood $\mathcal{L}_\mathrm{std}$, $\mathcal{L}_\mathrm{1st}$ with default settings, and $\mathcal{L}_\mathrm{1st}$ with max settings. Each reconstruction is seeded with the previous best performing algorithm, following the chain of Figure 1. The statistical error on the median is calculated using bootstrapping. The MC muon energy is calculated at the neutrino interaction point.
As can be seen in Figure 3, usage of likelihood $\mathcal{L}_\mathrm{1st}$ is better than the standard likelihood $\mathcal{L}_\mathrm{std}$ and the gamma distribution-based approach (Figure 2). The use of the modifications we refer to as max settings gives some additional improvement. Compared to the analytic gamma approximation, the PDFs based on B-splines are significantly more precise and still reasonably fast. A typical application of the reconstruction on a 10 TeV muon takes less than a hundredth of a second. A more thorough comparison of running times is given in Section 4.3.
While the modifications have been empirically shown to somewhat circumvent the underlying unphysical assumption of an infinite muon track with a smooth energy loss profile, it is preferable to model a more correct hypothesis. The new algorithm that models stochastic losses directly is described in Section 3.

Uncertainty estimation
The previously discussed likelihood optimizations typically run over 6 parameters: $x, y, z, t, \theta$, and $\phi$. The first four define a point lying on the track ($x, y, z, t$) and the last two specify the direction using zenith and azimuth ($\theta, \phi$). The angle parameters are often re-parametrized for technical reasons (see Appendix A for details). In order to obtain an uncertainty estimate for the two angles, the typical practice is to fit a 2-D paraboloid to the profile likelihood at the minimum [16], where the $x$, $y$, $z$, and $t$ parameters are profiled, i.e. optimized for each value of the two angles. This uncertainty estimation is referred to as the traditional method in the following. In practice, a grid is constructed in a rotated coordinate system with angle coordinates $\alpha_1, \alpha_2$, which is localized at the equator such that the two new angle coordinates are comparable [16] (see Appendix A.2 for more information). A problem with this construction is that the fixed grid used for the evaluation of the likelihood space is optimized for a typical uncertainty of about a degree. If the actual uncertainty is much smaller or larger than this, the paraboloid fit can fail. Additionally, even if a fit converges, it is not clear how well it describes the shape of the log-likelihood maximum; especially at lower energies that shape is typically not parabolic. In Section 3 an updated strategy is described that avoids both the fixed-grid problem and the lack of a fit quality check.
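The traditional procedure can be illustrated on a toy negative log-likelihood: evaluate a fixed grid in the two angle coordinates, profile (re-minimize) the remaining parameters at each grid point, and fit a 2-D paraboloid to the profiled values. The sketch below uses a hypothetical quadratic likelihood with a single nuisance parameter, not the IceCube one:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglike(a1, a2, nu):
    # Toy quadratic NLL with one correlated nuisance parameter nu.
    return 0.5 * ((a1 - 0.1) ** 2 / 0.04 + (a2 + 0.05) ** 2 / 0.09
                  + (nu - 0.5 * a1) ** 2)

# Fixed grid around the seed (the weakness noted above: its scale is fixed).
grid = np.linspace(-0.5, 0.5, 11)
prof = np.empty((grid.size, grid.size))
for i, a1 in enumerate(grid):
    for j, a2 in enumerate(grid):
        # Profile the nuisance parameter at each grid point.
        prof[i, j] = minimize_scalar(lambda nu: neg_loglike(a1, a2, nu)).fun

# Fit a 2-D paraboloid via linear least squares in the monomial basis.
A1, A2 = np.meshgrid(grid, grid, indexing="ij")
X = np.column_stack([np.ones(A1.size), A1.ravel(), A2.ravel(),
                     A1.ravel() ** 2, A1.ravel() * A2.ravel(), A2.ravel() ** 2])
coef, *_ = np.linalg.lstsq(X, prof.ravel(), rcond=None)
# For NLL = (a - m)^2 / (2 sigma^2), the quadratic coefficient is 1/(2 sigma^2):
sigma_a1 = 1.0 / np.sqrt(2.0 * coef[3])
sigma_a2 = 1.0 / np.sqrt(2.0 * coef[5])
```

Because the grid scale is fixed here (±0.5), the same weakness applies as in the real method: contours much smaller or larger than the grid would make the paraboloid fit unreliable.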

New algorithm: SegmentedSplineReco
At energies above ∼ 1 TeV, muons predominantly lose energy stochastically via bremsstrahlung, pair production, and nuclear interactions. Therefore, the assumption of a continuous energy loss pattern used in the reconstructions presented up to now is no longer valid. The result of these stochastic energy losses is the production of clustered light depositions on top of the track signature. Light created in such stochastic losses has a different emission spectrum, and it influences the photon arrival time distribution. Therefore, these stochastic energy losses should ideally be included directly in the track parametrization. A new reconstruction implementing this idea is described in this section. This reconstruction is referred to as SegmentedSplineReco in the following.

Angular reconstruction
SegmentedSplineReco is a maximum likelihood reconstruction that uses a segmented muon hypothesis (Figure 4). Each segment effectively models electromagnetic and hadronic stochastic losses ("cascades") and contributes to the PDF of the photon arrival times, together with a constant DOM-dependent noise term and an optional infinite minimally ionizing muon track hypothesis. The number of photons and their arrival time distributions are obtained from high-dimensional splines fitted to Monte Carlo simulations of photons propagating in ice. Similar splines for the infinite muon hypothesis are used in the SplineReco reconstruction. The main difference is the additional B-splines for the stochastic losses in SegmentedSplineReco.
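The bookkeeping for such a segmented hypothesis amounts to placing one effective cascade at the center of each segment along the seed track. A minimal sketch (the 50 m segment length is a hypothetical choice for illustration, not necessarily the value used by SegmentedSplineReco):

```python
import numpy as np

def segment_centers(vertex, direction, track_length, segment_length):
    """Cascade positions at the centers of equidistant track segments."""
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    n_seg = int(np.ceil(track_length / segment_length))
    # Centers sit at (k + 0.5) * segment_length along the track, k = 0..n_seg-1:
    s = (np.arange(n_seg) + 0.5) * segment_length
    return np.asarray(vertex, dtype=float) + s[:, None] * d

# Hypothetical through-going seed: a 700 m track split into 50 m segments.
centers = segment_centers(vertex=[0.0, 0.0, 0.0],
                          direction=[0.0, 0.0, 1.0],
                          track_length=700.0,
                          segment_length=50.0)
# 14 cascades at 25 m, 75 m, ..., 675 m along the track direction.
```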
The reconstruction performs several steps which are described below: 1. The initial hypothesis is twofold: (1) a track direction and (2) an energy loss pattern parametrized by electromagnetic cascades placed at the center of each segment and located along the initial track hypothesis (see Figure 4). These first guesses are given by previous reconstructions. Alternatively the energy loss pattern can also be determined directly by SegmentedSplineReco, in which case only an initial guess of the track direction is required.
2. The total PDF of the photon arrival time at a DOM position in the detector is given by the weighted sum of PDFs using each cascade segment as a source of photon emission:

$$p_\mathrm{tot}(t) = \frac{1}{\Lambda} \sum_{k=1}^{N} \lambda_k \, p_k(t), \qquad \Lambda = \sum_{k=1}^{N} \lambda_k,$$

where the index $k$ runs over the segments. The parameter $\lambda_k$ denotes the expected number of photons of the source in segment $k$ in the given DOM, where the different sources are the electromagnetic cascades, the constant noise contribution, and the contribution of a minimum ionizing muon if requested. The total numbers of photons from the cascades and the muon are again obtained from high-dimensional spline distributions fitted to simulations. Figure 5 shows the PDF for all cascades produced along a muon track as a function of hit time, their weighted sum, and the infinite muon and noise PDFs.
3. The PDF is then used to define a likelihood function, which is maximized by varying the track parameters ($x, y, z, t, \theta, \phi$). Here $\theta$ and $\phi$ are the usual angle parameters in spherical coordinates; in a more practical optimization these variables are re-parametrized in one of two different ways (see Appendix A). Three likelihood functions have been implemented: (a) the standard unbinned likelihood $\mathcal{L}_\mathrm{std}$, which evaluates $p_\mathrm{tot}$ for every hit weighted by its charge; (b) an extended likelihood $\mathcal{L}_\mathrm{ext}$, which additionally includes the probability to observe the recorded charge given the expected number of photons; and (c) the first-order-statistic likelihood $\mathcal{L}_\mathrm{1st}$. The index $i$ runs over all DOMs, while the index $j$ runs over all hits for a given DOM $i$; the total charge produced by a hit is denoted by $q_{ij}$. The PDF $p_\mathrm{tot,1st}$ in likelihood (c) is the first-order-statistic PDF, this time derived from the total PDF of all source contributions. The derivation is mathematically similar to the one for a single minimally ionizing muon, $\mathcal{L}_\mathrm{1st}$ in Equation 2.2.
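The weighted-sum structure of the total arrival-time PDF from step 2 can be written down generically. In the sketch below, Gaussian stand-ins and a flat noise term replace the spline-tabulated per-source PDFs, and all weights are hypothetical:

```python
import numpy as np

def total_pdf(t, source_pdfs, expected_photons):
    """Mixture PDF: p_tot(t) = sum_k lambda_k p_k(t) / sum_k lambda_k."""
    lam = np.asarray(expected_photons, dtype=float)
    comps = np.array([p(t) for p in source_pdfs])
    return lam @ comps / lam.sum()

def gauss(mu, sigma):
    return lambda t: (np.exp(-0.5 * ((t - mu) / sigma) ** 2)
                      / (sigma * np.sqrt(2.0 * np.pi)))

# Hypothetical example: three cascade segments plus a flat noise term.
sources = [gauss(100.0, 20.0), gauss(180.0, 25.0), gauss(260.0, 30.0),
           lambda t: np.full_like(t, 1.0 / 1000.0)]  # flat noise on [0, 1000) ns
lam = [40.0, 10.0, 5.0, 0.5]                         # expected photons per source

t = np.linspace(0.0, 1000.0, 10001)
p = total_pdf(t, sources, lam)
# The mixture stays normalized because each component is normalized.
```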
SegmentedSplineReco has been implemented in C++ and Python within the IceCube software framework. It includes several improvements with respect to the previous algorithms: support for exact gradients and for the calculation of the matrix of second-order partial derivatives, the Hessian matrix, from the underlying high-dimensional B-splines, and the possibility to fit the energies jointly with the track parameters. The latter option is only feasible with gradient information available, due to the rather high-dimensional (> 100-D) problem. Supplying the optimization algorithm with an exact gradient leads to substantially improved convergence speed. For the often-used algorithm MIGRAD, contained in the high-energy physics package MINUIT [23], the resulting increase in speed is a factor of two compared to not using a gradient.
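The benefit of an exact gradient can be reproduced with any quasi-Newton optimizer: without one, each gradient is built from finite differences, costing one extra objective call per dimension. A toy sketch with scipy, where the quadratic objective is a stand-in for the real likelihood:

```python
import numpy as np
from scipy.optimize import minimize

calls = {"f": 0}

def nll(x):
    calls["f"] += 1
    return 0.5 * np.sum((x - np.arange(x.size)) ** 2)

def grad(x):
    return x - np.arange(x.size)

x0 = np.zeros(120)  # high-dimensional, like a joint track + energies fit

res_fd = minimize(nll, x0, method="BFGS")               # finite differences
n_calls_fd = calls["f"]

calls["f"] = 0
res_exact = minimize(nll, x0, jac=grad, method="BFGS")  # exact gradient
n_calls_exact = calls["f"]
# n_calls_exact is far smaller than n_calls_fd for this 120-D problem.
```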
A new coordinate system has also been implemented, equivalent to the one described in Section 2.2: the seed track is rotated to the equator of the coordinate system, defining a new origin. The coordinates near the equator are quasi-Euclidean for small values and remain interpretable as angles for larger values. More details can be found in Appendix A.
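The rotation itself can be sketched directly: compose an azimuth rotation with a zenith rotation so that the seed direction lands at a reference point on the equator, where the angle coordinates are well behaved. The conventions below are illustrative; the actual transformation is defined in Appendix A:

```python
import numpy as np

def to_unit(zenith, azimuth):
    """Unit vector from zenith and azimuth angles."""
    return np.array([np.sin(zenith) * np.cos(azimuth),
                     np.sin(zenith) * np.sin(azimuth),
                     np.cos(zenith)])

def equator_rotation(seed_zenith, seed_azimuth):
    """Rotation matrix sending the seed direction to (1, 0, 0) on the equator."""
    ca, sa = np.cos(seed_azimuth), np.sin(seed_azimuth)
    Rz = np.array([[ca, sa, 0.0],      # rotate by -azimuth about z
                   [-sa, ca, 0.0],
                   [0.0, 0.0, 1.0]])
    b = np.pi / 2.0 - seed_zenith      # rotate the zenith down to 90 degrees
    cb, sb = np.cos(b), np.sin(b)
    Ry = np.array([[cb, 0.0, sb],
                   [0.0, 1.0, 0.0],
                   [-sb, 0.0, cb]])
    return Ry @ Rz

# A seed near the pole (zenith 5 degrees), where raw azimuth is ill-behaved:
zen, azi = np.radians(5.0), np.radians(120.0)
R = equator_rotation(zen, azi)
rotated = R @ to_unit(zen, azi)  # ends up on the equator, at (1, 0, 0)
```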
Finally, an energy-dependent convolution of the first-order-statistic PDF has been implemented, similar to the implementation in SplineReco. The SegmentedSplineReco likelihood is convolved with a Gaussian distribution using a fast recursive approximation algorithm as implemented in [24]. Such a convolution implicitly models timing inaccuracies between modules and also mitigates remaining model mis-specification. The likelihood is calculated at several sample points around the requested time. From these sample points, the convolved PDF is calculated with adjustable accuracy depending on sample point density and recursion step count. The convolution is sometimes called post-jitter below.
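The effect of such a smearing can be illustrated with a direct discrete convolution on a sampled PDF; the recursive approximation of [24] is not reproduced here, and the box-shaped PDF is purely illustrative (the 4.5 ns width echoes the fixed-jitter value quoted later):

```python
import numpy as np

def gauss_smear(pdf_vals, dt, sigma):
    """Convolve tabulated PDF values (grid spacing dt) with a Gaussian of width sigma."""
    half = int(np.ceil(4.0 * sigma / dt))      # truncate the kernel at 4 sigma
    x = np.arange(-half, half + 1) * dt
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()                     # discrete normalization
    return np.convolve(pdf_vals, kernel, mode="same")

# Hypothetical sharp arrival-time PDF smeared with a 4.5 ns jitter:
t = np.arange(0.0, 400.0, 0.5)                             # ns grid
sharp = np.where((t > 100.0) & (t < 110.0), 0.1, 0.0)      # box PDF
smeared = gauss_smear(sharp, dt=0.5, sigma=4.5)
# The smeared PDF is broader and lower, with its integral preserved.
```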

Uncertainty estimation
Two new uncertainty estimation methods have been implemented for SegmentedSplineReco. Both of them estimate the Hessian matrix at the log-likelihood optimum. The first approach (Method 1) calculates it analytically, using the ability of the B-spline PDFs to yield exact higher-order derivatives. The second approach (Method 2) samples the 6-D minimum of the negative log-likelihood function in the track parameters ($x, y, z, t, \alpha_1, \alpha_2$) with a Markov Chain Monte Carlo (MCMC) sampler. The result is used to fit a 6-D elliptic paraboloid to the log-likelihood landscape, which again yields the $6 \times 6$ Hessian matrix. Details are given in Appendix B. While this approach is more time consuming, it can be somewhat more robust against non-Gaussianities close to the optimum if a modified $\chi^2$ loss function is used (see Appendix B), and it often leads to slightly wider contours, as shown in Section 4.
The inverse of the obtained Hessian yields the covariance matrix containing the parameter correlations. A reduction to the 2 × 2 submatrix of the angle parameters marginalizes the other parameters and results in a 2-D uncertainty ellipse for the direction. In comparison, the traditional method described in Section 2.2 performs a paraboloid fit using profile likelihood evaluations in the two angular dimensions. Performing these profile likelihood evaluations is time consuming and can lead to unstable results, while the fixed grid size makes the method unable to adapt to different angular uncertainty scales. Both of these problems are solved with either of the new approaches. In particular, the new methods can detect when the outcome of the uncertainty estimation is unreliable, as explained in Appendix B.
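Turning the Hessian into a directional uncertainty ellipse is then a matter of inverting and slicing: the 2 × 2 angular block of the covariance is the marginalized angular covariance, and its eigendecomposition gives the ellipse axes. A minimal sketch with an illustrative, positive-definite Hessian (all values hypothetical):

```python
import numpy as np

def angular_ellipse(hessian, angle_idx=(4, 5)):
    """1-sigma directional ellipse from the NLL Hessian at the optimum.

    Returns (sigma_major, sigma_minor, rotation) of the marginalized
    2-D angular uncertainty ellipse.
    """
    cov = np.linalg.inv(hessian)             # full 6x6 covariance
    sub = cov[np.ix_(angle_idx, angle_idx)]  # marginalized 2x2 angular block
    evals, evecs = np.linalg.eigh(sub)       # ascending eigenvalues
    sigma_minor, sigma_major = np.sqrt(evals)
    rotation = np.arctan2(evecs[1, 1], evecs[0, 1])  # major-axis orientation
    return sigma_major, sigma_minor, rotation

# Illustrative Hessian: diagonal vertex block plus correlated angle block.
H = np.diag([1.0, 1.0, 1.0, 4.0, 400.0, 100.0])
H[4, 5] = H[5, 4] = 50.0
major, minor, rot = angular_ellipse(H)
```

Because the other four parameters are inverted along with the angles, their correlations automatically widen the angular block, which is exactly the marginalization described above.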
Another possibility that comes with the analytic Hessian (Method 1) is to jointly calculate the Hessian with respect to the six track parameters and, additionally, the energy parameters of all individual energy losses. Computationally, this full calculation has nearly no overhead, but it broadens the final uncertainty contours over the two angular dimensions due to the extra marginalization over the energy dimensions. (The angles here are defined in the rotated parametrization; see Appendix A.) This can be desirable, since the uncertainty is by construction too small if the energies of the stochastic losses are fixed (see Section 4.2). However, because the energies are fixed in the previous optimization procedure, and also due to the high dimensionality of the problem, the enlarged Hessian is usually not calculated at a local minimum and is often not positive definite, so we do not use this procedure in practice.

Performance comparisons
The new reconstruction has been applied to three different selections of simulated muon track events. The first dataset contains muon tracks that pass quality cuts (so-called NDir/LDir cuts on the number of hit DOMs and track length, respectively; see [12] for more information) based on SplineReco, which to some extent mimics events that are usually found in the final analysis selections used in IceCube. In the following they are referred to as SplineReco-optimized. Events with large stochastic losses typically obtain low NDir/LDir values with SplineReco; these events subsequently do not pass the cuts and should be mostly absent in this selection. The other two datasets are based on a geometrical Monte Carlo selection: one contains muon tracks starting outside the detector volume with a minimal track length of 700 m (Through-going events), the other muon tracks that start inside the detector volume and have a minimal track length of 400 m (Starting events).

Angular resolution
To illustrate the performance of the new reconstruction, Figure 6 shows the median angular difference between true muon direction and reconstructed muon direction for the three classes of track-like events and the three likelihood formulations. These are compared to SplineReco with default settings, i.e. without any of the modifications (see Section 2 for a description of these modifications). It can be seen that SegmentedSplineReco yields up to 20% better angular resolution at high energies for through-going tracks, and up to a factor of 2 better resolution for starting tracks. The likelihood that looks only at the first hit ($\mathcal{L}_\mathrm{1st}$) generally performs as well as or better than the other two, which is the known behavior observed for the prevailing reconstructions (Section 2) and understood as mitigation of the unphysical hypothesis and the uncertainties of the ice model. The similarity of all likelihoods for through-going tracks indicates that the stochastic modelling in SegmentedSplineReco improves the overall data description and narrows the advantage of $\mathcal{L}_\mathrm{1st}$, even though the slight difference in outcomes shows that the modelling is not perfect. This difference is to be compared with the respective difference for SplineReco (compare SplineReco + $\mathcal{L}_\mathrm{std}$ and SplineReco + $\mathcal{L}_\mathrm{1st}$ in Figure 3), which is much larger. For starting tracks, a clear advantage for $\mathcal{L}_\mathrm{1st}$ remains. A potential explanation for this behavior is that the seed reconstruction used for these tracks is often skewed, and the pure energy fit to determine a fixed energy loss profile subsequently gives a result that is rather far from the true energy loss profile. In these cases, $\mathcal{L}_\mathrm{1st}$ best manages the resulting inaccuracies of the model. In Figure 6, the initial energy loss pattern is obtained by performing a pure energy fit with SegmentedSplineReco. This energy profile is then used as seed for the track reconstruction.
SegmentedSplineReco can also perform a simultaneous fit of track parameters and energy cascades. However, this second option gives worse angular resolution, as shown for $\mathcal{L}_\mathrm{ext}$ with joint vertex and energy optimization in Figure 7. The worse performance is probably due to numerical instability issues of the high-dimensional problem. For this reason, by default the energies are always determined independently before the vertex parameters are optimized. Figure 7 also shows how SegmentedSplineReco with $\mathcal{L}_\mathrm{1st}$ compares to SplineReco with ("max") and without modifications ("default"). One of those modifications is an energy-dependent convolution of the time PDF of the first hit (post-jitter), which models the absolute time detection uncertainty between DOMs. Since SegmentedSplineReco improves the angular resolution only at the highest energies when compared to SplineReco with max settings, this convolution has also been applied to the new reconstruction. As shown in the figure, besides an energy-dependent convolution, a fixed-time-resolution convolution has also been implemented. However, the energy-dependent post-jitter convolution improves the resolution at all energies. Therefore, the energy-dependent convolution is used as the default setting in SegmentedSplineReco. If no energy estimator is available, a 4.5 ns convolution gives almost comparable results. Other modifications, like the effective stochastic loss profile, are not applicable to SegmentedSplineReco, since they are already naturally captured in the explicit stochastic modelling within SegmentedSplineReco. It can be seen in Figure 8 that standard SegmentedSplineReco is on par with or slightly better than SplineReco with these settings. The extra energy-dependent time PDF convolution further improves the resolution by a few percent in all datasets.

Angular error estimation
As discussed in Section 3.2, two methods for the calculation of uncertainty contours are implemented in the new reconstruction. Figure 9 shows the uncertainty contours for two example events. Additionally, the marginalized PDF from the MCMC samples is indicated, assuming a flat prior. In general, the uncertainty contour from the analytic Hessian (Method 1) is smaller than the one from the paraboloid fit (Method 2) for all events that show some non-Gaussian behavior. The quality of the uncertainty estimation can be judged by its coverage.
For simplicity, we average the major and minor axes of the uncertainty contour to create an averaged angular error estimate $\sigma_\Psi$ for $\Delta\Psi$, the angular difference between the reconstructed direction and the true direction. This averaged uncertainty is calculated as

$$\sigma_\Psi = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}.$$

The quantity $\Delta\Psi/\sigma_\Psi$ is called the pull. Ideally, it is distributed like a Rayleigh distribution with $\sigma = 1$, which has a median of 1.17. Figure 10 shows the median pull for the three types of events. It compares both uncertainty calculations, and also includes the old 2-D paraboloid fit [16] for the SplineReco-optimized event class. It can be seen that the pull is generally flatter with the new uncertainty estimation, which shows that the shortcomings of the fixed grid size and the instability from profiling in the old uncertainty estimation are gone. However, a slight energy dependence remains, which seems to increase at the highest energies. This is reflected in the overall offset between the two curves in the median pull. Overall, the median is larger than 1.17 in every approach, which shows that the likelihood contours underestimate the true uncertainty. The underestimation partly stems from the fact that the energies of the stochastic losses are fixed. Additional contributions likely come from remaining model mis-specifications, such as the assumption that all energy losses are point-like electromagnetic cascades at fixed distances along the track, and from non-Gaussian behavior around the local optima. Table 1 shows a comparison of the running time for SplineReco and SegmentedSplineReco with the corresponding uncertainty estimations for different energy ranges. The reconstruction times are the average CPU time for 100 standard likelihood evaluations, expressed in minutes. Each reconstruction has been performed on the SplineReco-quality dataset. The new reconstruction is significantly more time consuming.
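The pull construction is easy to verify numerically: for a symmetric 2-D Gaussian error with equal per-axis widths, the pull $\Delta\Psi/\sigma_\Psi$ follows a Rayleigh distribution with $\sigma = 1$, whose median is $\sqrt{2\ln 2} \approx 1.17$. A sketch with hypothetical contour widths:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma1 = sigma2 = 0.3  # degrees, hypothetical symmetric contour axes
sigma_psi = np.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2.0)  # averaged uncertainty

# Simulate angular misreconstructions with independent Gaussian components:
d1 = rng.normal(0.0, sigma1, 100_000)
d2 = rng.normal(0.0, sigma2, 100_000)
delta_psi = np.hypot(d1, d2)   # angular distance to the true direction

pull = delta_psi / sigma_psi
# The median pull approaches sqrt(2 ln 2) ~ 1.17 for well-calibrated errors;
# medians above this value indicate underestimated contours.
```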
On average, the running time per event for SegmentedSplineReco, including a standard vertex fit and the error estimation with Method 1, is ∼ 1.25 minutes, about 6 times larger than the running time required to perform a SplineReco vertex fit and uncertainty estimation with the traditional method. This run time ratio doubles when the energy fit is also performed for SegmentedSplineReco, and can grow up to a factor of 100 when performing the energy-dependent post-jitter convolution and using Method 2 for the uncertainty estimation.

Discussion and outlook
We have introduced a new directional reconstruction for muons that explicitly models the stochastic losses in the likelihood by equidistant electromagnetic showers along the track. For all track topologies and energies, the new reconstruction shows a better muon angular resolution. The improvement increases with energy as the stochastic modelling of the muon becomes more important. For through-going tracks the improvement is up to 10-20% at PeV energies. For starting tracks, it is much more pronounced and larger than a factor of 2 above 100 TeV.

In addition to the reconstruction, the uncertainty estimation has been improved by a more numerically stable determination of the Hessian at the optimum. The resulting pull distribution is now nearly flat with energy when using a likelihood that only looks at the first hit, which is a common procedure to mitigate systematics. While the new reconstruction is more precise, it is also more time consuming. Depending on the settings, the reconstruction including uncertainty estimation takes about 6 times (using no energy-dependent convolution and Hessian Method 1) to over 100 times (using the energy-dependent convolution and Hessian Method 2) more processing time per event than the previously state-of-the-art muon reconstruction SplineReco. At later stages of typical event selection chains these processing times can still be feasible.

A few limiting factors remain. Currently, the stochastic losses are modeled by point-like electromagnetic showers at fixed distances from each other. In reality, these losses have longitudinal emission profiles and are not equidistant. In particular, if the track passes close to a DOM, such an unphysical hypothesis can bias the reconstruction. However, it is hard to see how this assumption could be made more realistic within the parametric likelihood approach. A remaining drawback that can potentially be fixed is the currently neglected correlation between the true muon energy and the energy losses. Fitting simultaneously for the muon energy could stabilize and improve the energy loss solution. We leave that for future work.

A Parametrizations of track orientations
Instead of the zenith angle $\theta$ and azimuth angle $\phi$, it is more stable to parametrize the direction by auxiliary coordinates that are independent of the position on the sphere. This holds for both the optimization and the uncertainty estimation procedures. Two common re-parametrizations are described in the following.

A.1 Tangent plane parametrization
At the current seed direction, the tangent plane defines a new coordinate system which replaces $\theta$ and $\phi$ (see Figure 11). The coordinate axes are uniquely defined by a choice of orthogonal axes with respect to the seed track $\vec{e} = (e_x, e_y, e_z)$. An example choice is $\vec{e}_{\Phi_1} = (0, e_z, -e_y)$ and $\vec{e}_{\Phi_2} = \vec{e} \times \vec{e}_{\Phi_1}$. The current track hypothesis is located at $(\Phi_1, \Phi_2)$, where $\Phi_1$ and $\Phi_2$ are the new optimization parameters. A drawback of this scheme is that for large deviations from the seed the parameters lose their meaning as angles, which can be undesired behavior if uncertainty estimation is performed in these coordinates. The tangent plane parametrization is often used by the prevailing reconstructions.

Figure 11: The auxiliary angular coordinates behave as Euclidean coordinates $\Phi_1$ and $\Phi_2$ in a unique tangential plane defined by the seed track.
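A minimal sketch of this construction (illustrative function names, not from the IceCube code base; the fallback for a seed along the $x$-axis, where $(0, e_z, -e_y)$ degenerates, is an added safeguard):

```python
import numpy as np

def tangent_plane_basis(e):
    """Orthonormal basis (e_phi1, e_phi2) spanning the plane orthogonal to e."""
    e = np.asarray(e, dtype=float)
    e = e / np.linalg.norm(e)
    e_phi1 = np.array([0.0, e[2], -e[1]])   # orthogonal to e by construction
    if np.linalg.norm(e_phi1) < 1e-9:       # seed along the x-axis: degenerate
        e_phi1 = np.array([0.0, 1.0, 0.0])
    e_phi1 = e_phi1 / np.linalg.norm(e_phi1)
    e_phi2 = np.cross(e, e_phi1)            # completes the orthonormal triad
    return e_phi1, e_phi2

def offset_direction(e, phi1, phi2):
    """Track direction at tangent-plane coordinates (phi1, phi2), renormalized."""
    e = np.asarray(e, dtype=float)
    e = e / np.linalg.norm(e)
    e1, e2 = tangent_plane_basis(e)
    d = e + phi1 * e1 + phi2 * e2
    return d / np.linalg.norm(d)
```

The renormalization in `offset_direction` is where the coordinates lose their interpretation as angles for large $(\Phi_1, \Phi_2)$, as noted above.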

A.2 Rotation to (1,0,0)
This parametrization is defined by the unique rotation $R$ of the current seed direction onto the $x$-axis (see Figure 12). The auxiliary angles $\theta_1$ and $\theta_2$ measure the rotated track coordinates relative to the rotated seed, which aligns with the $x$-axis. The inverse rotation $R^{-1}$ applied to the current rotated track yields back the track orientation in the original coordinate system. An advantage of this parametrization is that $\theta_1$ and $\theta_2$ are always defined as angles; it is therefore suited for uncertainty contours that can measure tens of degrees in diameter. This parametrization is used by the prevailing uncertainty estimation procedure [16]. It is also used by the new reconstruction SegmentedSplineReco and the subsequent uncertainty estimation.
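The rotation itself can be sketched with Rodrigues' formula (an illustrative implementation, assuming the minimal rotation in the plane spanned by the seed and the $x$-axis; since $R$ is orthogonal, the inverse rotation is simply $R^T$):

```python
import numpy as np

def rotation_to_x(e):
    """Rotation matrix R with R @ e_hat = (1, 0, 0), via Rodrigues' formula."""
    e = np.asarray(e, dtype=float)
    e = e / np.linalg.norm(e)
    x = np.array([1.0, 0.0, 0.0])
    v = np.cross(e, x)                 # rotation axis (unnormalized)
    c = e @ x                          # cos of the rotation angle
    s = np.linalg.norm(v)              # sin of the rotation angle
    if s < 1e-12:                      # seed already (anti-)parallel to x
        return np.eye(3) if c > 0 else np.diag([-1.0, -1.0, 1.0])
    k = v / s
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of the axis
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)
```

Applying `rotation_to_x(seed).T` to a rotated track vector returns it to the original coordinate system, as described above for $R^{-1}$.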

B Technical details of the uncertainty estimation
This section describes the technical details of the uncertainty estimations introduced in Section 3.2. From Wilks' theorem [25] it follows that the quantity $\Lambda = 2 \cdot (\log\mathcal{L}(\Theta_0) - \log\mathcal{L}(\Theta))$, where $\Theta_0$ denotes the best-fit point, is approximately $\chi^2_n$-distributed with $n$ degrees of freedom if $n$ is the number of nested parameters. At the same time, the $\chi^2_n$-distribution is equivalent to the distribution of twice the logarithmic difference between the maximum of an $n$-dimensional multivariate Gaussian and any other point drawn from that Gaussian [26]. Therefore, $-\log\mathcal{L}$ can be approximated around $\Theta_0$ with

$-\log\mathcal{L}(\Theta) \approx \frac{1}{2} (\Theta - A)^T B (\Theta - A) + C,$

which is the formula for a general elliptic paraboloid in $n$ dimensions with position $A$, parameter matrix $B/2$ and offset $C$. With these relations it is clear that the determination of the Hessian $H_0 = B$ at $\Theta_0$ allows one to determine the covariance matrix, and thereby Gaussian contours, after matrix inversion. As discussed in Section 3.2, two methodologies to estimate $H_0$ have been developed and are described in the following.
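The Wilks relation can be illustrated numerically: for an exactly Gaussian likelihood, $\Lambda$ evaluated at points drawn from the Gaussian is exactly $\chi^2_n$-distributed (a toy sketch in two dimensions, not reconstruction code; all names are illustrative):

```python
import numpy as np

# Toy 2-D Gaussian log-likelihood with known covariance.
cov_true = np.array([[2.0, 0.6], [0.6, 1.0]])
B = np.linalg.inv(cov_true)        # paraboloid parameter matrix = Hessian
theta0 = np.array([0.5, -1.0])     # optimum

# Lambda = 2*(logL(theta0) - logL(theta)) = d^T B d for d = theta - theta0.
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(theta0, cov_true, size=200_000)
d = samples - theta0
lam = np.einsum('ij,jk,ik->i', d, B, d)

# For chi^2 with 2 dof the CDF is 1 - exp(-x/2); check the 90% quantile.
q90_theory = -2.0 * np.log(0.10)   # = 2 ln 10 ~ 4.605
q90_mc = np.quantile(lam, 0.90)
print(q90_theory, q90_mc)
```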

Method 1
The first method calculates the Hessian matrix at the optimum, $H_0$, analytically. Advances in automatic differentiation have only recently made this feasible; in particular, we used autograd to cross-check the implementation.
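As a toy cross-check of this idea (the actual implementation computes $H_0$ via automatic differentiation; here a simple central finite-difference Hessian stands in, applied to a quadratic negative log-likelihood whose Hessian is known exactly):

```python
import numpy as np

def numeric_hessian(f, x, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                y = x.copy()
                y[i] += si * eps
                y[j] += sj * eps
                return f(y)
            H[i, j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps**2)
    return H

# Toy quadratic negative log-likelihood: -logL = 0.5 theta^T B theta,
# whose Hessian is exactly B, so the covariance matrix is B^{-1}.
B = np.array([[3.0, 0.5], [0.5, 2.0]])
neg_log_l = lambda theta: 0.5 * theta @ B @ theta

H0 = numeric_hessian(neg_log_l, np.zeros(2))
cov = np.linalg.inv(H0)
```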

Method 2
The second method fits a paraboloid to samples $-\log\mathcal{L}(\hat{\Theta}_i)$ of the negative log-likelihood function near the optimum. The samples are obtained with an affine-invariant particle-based Markov chain sampler, emcee [27]. The particle positions are initialized as samples drawn from a Gaussian distribution with covariance $H_0^{-1}$, where $H_0$ is the analytically calculated Hessian at the optimum (Method 1). This initialization skips the burn-in phase completely if the log-likelihood optimum is nearly Gaussian, and otherwise drastically speeds up convergence. We then minimize

$\sum_i \rho\Big( \big[ \tfrac{1}{2} (\hat{\Theta}_i - A)^T B (\hat{\Theta}_i - A) + C + \log\mathcal{L}(\hat{\Theta}_i) \big]^2 \Big)$

and thereby fit a paraboloid shape to the samples. The choice $\rho(z) = z$ yields the standard least-squares loss. Empirically, an often more robust fit is achieved with $\rho(z) = 2 \cdot (\sqrt{1 + z} - 1)$, which represents a "soft-l1" distance that better handles non-Gaussianities or irregularities in the likelihood samples. The parameter $A$ represents an $n$-dimensional mean and $C$ a one-dimensional offset. The $n \times n$ parameter matrix $B$ has to be positive definite. This is ensured by a parametrization in terms of its lower-triangular Cholesky decomposition, which involves $n$ strictly positive parameters on the diagonal and $(n^2 - n)/2$ parameters for the lower-triangular off-diagonal elements. The total number of parameters of the least-squares fit is then $(n^2 + 3n)/2 + 1$. If the resulting mean $A$ is sufficiently different from $\Theta_0$, or if the offset $C$ is sufficiently different from $-\log\mathcal{L}(\Theta_0)$, this can indicate strong non-Gaussian behavior that even a modified least-squares fit cannot handle. In general, this method yields slightly wider and more conservative contours than the analytic calculation (see also Figure 9), because non-Gaussianities in combination with the soft least-squares fit widen the tails of the paraboloid solution.
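A minimal sketch of such a robust paraboloid fit, assuming SciPy's `least_squares` (whose `loss='soft_l1'` option implements exactly $\rho(z) = 2(\sqrt{1+z}-1)$); the function names are illustrative, and the toy data are exact paraboloid samples rather than real MCMC output:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_paraboloid(thetas, neg_log_l_vals, n):
    """Fit f(t) = 0.5 (t - A)^T B (t - A) + C to negative log-likelihood
    samples, with B = L L^T parametrized by its Cholesky factor."""
    tril = np.tril_indices(n, k=-1)

    def unpack(p):
        A = p[:n]
        L = np.zeros((n, n))
        L[np.diag_indices(n)] = np.exp(p[n:2 * n])  # strictly positive diagonal
        L[tril] = p[2 * n:-1]                       # (n^2 - n)/2 off-diagonals
        return A, L @ L.T, p[-1]

    def residuals(p):
        A, Bmat, C = unpack(p)
        d = thetas - A
        return 0.5 * np.einsum('ij,jk,ik->i', d, Bmat, d) + C - neg_log_l_vals

    # (n^2 + 3n)/2 + 1 parameters in total, as counted in the text.
    p0 = np.zeros(2 * n + n * (n - 1) // 2 + 1)
    res = least_squares(residuals, p0, loss='soft_l1')
    return unpack(res.x)

# Toy data: exact samples of a quadratic negative log-likelihood.
rng = np.random.default_rng(2)
B_true = np.array([[2.0, 0.4], [0.4, 1.0]])
A_true = np.array([0.3, -0.2])
thetas = rng.normal(A_true, 1.0, size=(400, 2))
diff = thetas - A_true
vals = 0.5 * np.einsum('ij,jk,ik->i', diff, B_true, diff) + 5.0

A_fit, B_fit, C_fit = fit_paraboloid(thetas, vals, n=2)
```

Inverting the fitted `B_fit` would then give the covariance matrix used for the Gaussian contours, analogously to Method 1.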