The Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), on board the European ENVIronmental SATellite (ENVISAT) launched on 1 March 2002, is a mid-infrared Fourier transform spectrometer that measures the atmospheric emission spectrum in limb-sounding geometry. Its measurements allow the retrieval of the vertical distribution of temperature and trace gases, with the aim of studying climate and atmospheric chemistry and dynamics and of supporting applications in data assimilation and weather forecasting. MIPAS operated in its standard observation mode for approximately two years, from July 2002 to March 2004, with scans performed at a nominal spectral resolution of 0.025 cm−1 and covering the altitude range from the mesosphere to the upper troposphere with relatively high vertical resolution (about 3 km in the stratosphere). Only reduced spectral resolution measurements have been performed subsequently. MIPAS data were re-processed by ESA using updated versions of the Instrument Processing Facility (IPF v4.61 and v4.62), providing a complete set of level-2 operational products (geolocated vertical profiles of temperature and volume mixing ratio of H2O, O3, HNO3, CH4, N2O and NO2) with quasi-continuous and global coverage over the period of the MIPAS full spectral resolution mission. In this paper, we report a detailed description of the validation of MIPAS-ENVISAT operational ozone data, which was based on the comparison between MIPAS v4.61 (and, to a lesser extent, v4.62) O3 VMR profiles and a comprehensive set of correlative data, including observations from ozone sondes, ground-based lidar, FTIR and microwave radiometers, remote-sensing and in situ instruments on board stratospheric aircraft and balloons, concurrent satellite sensors and ozone fields assimilated by the European Centre for Medium-Range Weather Forecasts (ECMWF).
A coordinated effort was carried out, using common criteria for the selection of individual validation data sets and similar methods for the comparisons. This made it possible to merge the individual results from a variety of independent reference measurements of proven quality (i.e. with a well-characterized error budget) into an overall evaluation of MIPAS O3 data quality, with both statistical strength and the widest possible spatial and temporal coverage. Collocated measurements from ozone sondes and from ground-based lidar and microwave radiometers of the Network for the Detection of Atmospheric Composition Change (NDACC) were selected to carry out comparisons with time series of MIPAS O3 partial columns and to identify groups of stations and time periods with a uniform pattern of ozone differences, which were subsequently used for a vertically resolved statistical analysis. The results of the comparison are classified according to synoptic and regional systems and to altitude intervals, and show generally good agreement within the comparison error bars in the upper and middle stratosphere. Significant differences emerge in the lower stratosphere and are only partly explained by the larger contributions of horizontal and vertical smoothing differences and of collocation errors to the total uncertainty. Further results obtained from a purely statistical analysis of the same data set from NDACC ground-based lidar stations, as well as from additional ozone soundings at middle latitudes and from NDACC ground-based FTIR measurements, confirm the validity of MIPAS O3 profiles down to the lower stratosphere, with evidence of larger discrepancies at the lowest altitudes. The validation against O3 VMR profiles from collocated observations by other satellite sensors (SAGE II, POAM III, ODIN-SMR, ACE-FTS, HALOE, GOME) and against ECMWF assimilated ozone fields leads to consistent results, which are to a great extent compatible with those obtained from the comparison with ground-based measurements.
Excellent agreement over the full vertical range of the comparison is found with collocated ozone data from stratospheric aircraft and balloon instruments, which were mostly obtained in very good spatial and temporal coincidence with MIPAS scans. This might suggest that the larger differences observed in the upper troposphere and lowermost stratosphere with respect to collocated ground-based and satellite O3 data are only partly due to a degradation of MIPAS data quality. Rather, they should largely be ascribed to the natural variability of these altitude regions and to other components of the comparison errors. By combining the results of this large number of validation data sets, we derived a general assessment of MIPAS v4.61 and v4.62 ozone data quality. A clear indication of the validity of MIPAS O3 vertical profiles is obtained for most of the stratosphere, where the mean relative difference with respect to the individual correlative data sets is always within ±10%. Furthermore, these differences always fall within the combined systematic error (from 1 hPa to 50 hPa), and the standard deviation is fully consistent with the random error of the comparison (from 1 hPa to ~30–40 hPa). A degradation in the quality of the agreement is generally observed in the lower stratosphere and upper troposphere, with biases of up to 25% at 100 hPa and a standard deviation of the global mean differences up to three times larger than the combined random error in the range 50–100 hPa. The larger differences observed at the bottom end of MIPAS retrieved profiles can be associated, as already noted, with the effects of stronger atmospheric gradients in the UTLS, which are perceived differently by the various measurement techniques.
However, further possible factors that may degrade the results of the comparison at lower altitudes include cloud contamination, which is likely not to have been fully filtered out with the current settings of the MIPAS cloud detection algorithm, and the linear approximation of the forward model that was used for the a priori estimate of systematic error components. The latter, when affecting systematic contributions with a random variability over the spatial and temporal scales of global averages, might result in an underestimation of the random error of the comparison and add to other error sources, such as the possible underestimation of the p and T error propagation, which is based on assumed uncertainties of 1 K and 2%, respectively, on MIPAS temperature and pressure retrievals. At pressures lower than 1 hPa, only a small fraction of the selected validation data set provides correlative ozone data of adequate quality, and it is difficult to derive quantitative conclusions about the performance of the MIPAS O3 retrieval for the topmost layers.
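The two consistency checks described above (mean relative difference against the combined systematic error, and standard deviation of the differences against the combined random error) can be illustrated with a minimal sketch for a single pressure level. The function name `compare_level`, its arguments, and the quadrature combination of the MIPAS and reference errors are assumptions made for this illustration and are not taken from the validation procedure itself; all errors are taken as relative values in percent.

```python
from math import sqrt
from statistics import mean, stdev


def compare_level(mipas, reference, sys_mipas, sys_ref, rand_mipas, rand_ref):
    """Compare collocated O3 VMR values at one pressure level.

    mipas, reference: sequences of collocated values in the same units.
    sys_*, rand_*: relative systematic and random errors in percent
    (hypothetical inputs for this sketch).
    """
    # Relative difference of each collocated pair, in percent of the reference.
    rel_diff = [100.0 * (m - r) / r for m, r in zip(mipas, reference)]

    bias = mean(rel_diff)     # mean relative difference
    spread = stdev(rel_diff)  # sample standard deviation of the differences

    # Assumption: independent errors, combined in quadrature.
    combined_sys = sqrt(sys_mipas**2 + sys_ref**2)
    combined_rand = sqrt(rand_mipas**2 + rand_ref**2)

    return {
        "bias": bias,
        "spread": spread,
        "bias_within_systematic": abs(bias) <= combined_sys,
        "spread_over_random": spread / combined_rand,
    }


# Made-up numbers, for illustration only (not measured data).
result = compare_level(
    mipas=[5.2, 4.9, 5.1, 5.0],
    reference=[5.0, 5.0, 5.0, 5.0],
    sys_mipas=3.0, sys_ref=4.0,
    rand_mipas=3.0, rand_ref=4.0,
)
```

In this sketch, a value of `spread_over_random` well above 1 would correspond to the situation reported for the 50–100 hPa range, where the scatter of the differences exceeds the combined random error.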