Developing and Evaluating a Five Minute Phishing Awareness Video


Introduction
More than twenty years after its emergence, phishing still succeeds [1,37,43]. Phishing attacks are increasingly sophisticated. It used to be easy to spot phishing messages due to poor language use and incorrect spelling; nowadays phishers are far smarter, sending plausible-looking messages calculated to deceive. They have also moved beyond email and now ply their trade on a range of platforms, including social media and messaging apps. A very popular trick is to entice the target to follow a link that will either install malware or lead to a doppelgänger website. The latter will persuade victims to divulge sensitive information, such as their access credentials. Automated detection is a powerful defence, but far from 100% effective [3,15]. To narrow the gap left by technical measures, we need to make recipients of online messages aware of how to detect phishing attempts.
Our research group has a long history of developing phishing awareness programmes (including apps, flyers, reading material and presentations for seminars) and has carried out several user studies verifying their effectiveness [5-7,26,28,33,39-41]. Our initial programmes required learners to spend between 20 and 45 minutes completing them. Evaluations showed that all programmes significantly increased phishing detection rates. However, companies are concerned about the amount of time employees have to commit to these programmes. In response, we developed a video, which allowed us to shorten the time people needed to commit because we could benefit from the visualisation functionality videos offer. The video now lasts only 5 minutes. The video was developed iteratively, incorporating feedback from people with various backgrounds (such as lay users, video producers, psychologists and security experts). The final video was evaluated by 89 participants, who detected phishing messages significantly more often after watching the video. Many were able to demonstrate a retained ability to do this eight weeks later. The video was improved even further based on the feedback provided during the evaluation and on the results of the evaluation, i.e. the explanations in the video were improved for those attack types on which participants performed worst. Thus, our contribution is twofold: (1) the developed video, based on previous research on phishing awareness programmes, and (2) its evaluation both straight after watching the video and during a retention study eight weeks after watching the video. We published the video under the Creative Commons licence CC BY-SA 4.0 to remove all barriers to its use.

Identification of the Relevant Content
The content to be covered is the following. The video should make the watcher aware of commonly used phisher strategies, for example trustworthy-looking messages with familiar design and language that employ psychological tricks to entice victims to click on an embedded link. Watchers should also be aware of the possible consequences of clicking on a link. For example, malware could be downloaded onto their device. The web page they visit could look authentic, but actually be owned by a phisher. If credentials are divulged on this faux site, they could be used to facilitate identity theft. The video also deliberately addresses common phish-related misconceptions identified in the literature [11,12,17,18,20]. These include the following: (1) Phishers only send emails. In fact, they also use other media such as short text and social networking messages; (2) Phishing always harvests online banking credentials. In fact, phishers can fake any arbitrary web site: credentials are what they want; (3) The displayed name of the sender can be trusted to reflect the actual sender. In fact, details are faked very easily. The displayed sender name cannot be relied upon to signal authenticity; (4) Only wealthy people are targeted by phishers. On the contrary, anyone can be targeted, independently of how well known or wealthy they are, or of their status in an organisation; (5) Technical security mechanisms are able to catch and block all phish messages. Actually, sophisticated phishers design their messages in such a way that the technical measures do not catch them; (6) The 'S' in HTTPS is an infallible signal of integrity. In reality, many phishers use website certificates to allay fears; and (7) Trustworthy phrases in a website URL are a signal of trustworthiness. In fact, these are merely tricks used by the unscrupulous to trap the unwary. Thus, the video should help people to distinguish between phishing and genuine messages.
Similar to our more time-consuming awareness programmes, we focus the learner's attention on the difference between the URL's actual destination and the destination it seems to have. Only by examining the link can people reliably distinguish between phish and genuine messages. The following instructions, explanations, and hints were included in the video:
Instruction-1: Locate the Actual Destination of a Displayed Link: The first step in phish detection is to know how to identify the actual destination the link will send people to. It might be in a tooltip, a status bar or a special dialogue. People also need to be aware of the nuances behind links. Sometimes the actual destination is concealed behind a button, an image or text like 'click here'. The actual destination is often hidden unless the person knows how to look for it. In rare cases the actual URL is displayed in the clear. The displayed tooltip might be faked too, in order to lull people into a false sense of security.
Instruction-2: Identify the So-Called Who-Area of the URL: After people have identified the actual destination URL, they should know how to identify the domain, which we refer to as the who-area. In the video, we told people that this consists of the last two terms that are separated by a dot before the first stand-alone "/" of a URL.
We also tell them that phishers deceive people by embedding the genuine company name somewhere in the URL other than the who-area. Phishers could place it either before or after the who-area. Viewers are also told not to rely on the signal conveyed by the use of HTTPS. Examples of phishing URLs are provided:
https://www.gmail.com.mail-nows.com/login
https://mail-nows.com/https://www.gmail.com/login
Instruction-3: Check Authenticity of the Who-Area: Having located the who-area, the final step is to verify its authenticity, basically by checking it character by character. Viewers are made aware of the fact that phishers often (1) use trustworthy terms (e.g. 'secure-shop.com') in the who-area, or (2) stealthily replace characters. For example, they might replace a 'd' with 'cI' or introduce typos such as 'mircosoft'.
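The rule of thumb behind Instruction-2 and Instruction-3 can be illustrated with a minimal Python sketch. It is not part of the video or the study: the function names `who_area` and `check_link`, the set of known genuine who-areas and the similarity threshold of 0.8 are assumptions chosen purely for illustration. The sketch also inherits the simplification of the two-term rule, which does not hold for domains such as those ending in 'co.uk'.

```python
from difflib import SequenceMatcher


def who_area(url: str) -> str:
    """Return the last two dot-separated terms before the first stand-alone '/'.

    Simplified rule of thumb; it is wrong for suffixes such as 'co.uk'.
    """
    # Skip the scheme so that the slashes in "https://" are not counted.
    rest = url.split("://", 1)[-1]
    # Everything up to the first stand-alone "/" is the host part.
    host = rest.split("/", 1)[0]
    # Drop optional "user@" and ":port" pieces, which are not part of the domain.
    host = host.rsplit("@", 1)[-1].split(":", 1)[0]
    return ".".join(host.lower().split(".")[-2:])


def check_link(url: str, genuine: set[str], threshold: float = 0.8) -> str:
    """Classify a link by its who-area (hypothetical helper, threshold is an assumption)."""
    area = who_area(url)
    if area in genuine:
        return "genuine who-area"
    # A who-area that is nearly identical to a genuine one is treated as a look-alike.
    for known in genuine:
        if SequenceMatcher(None, area, known).ratio() >= threshold:
            return f"look-alike of {known}"
    return "unknown who-area"


if __name__ == "__main__":
    for url in (
        "https://www.gmail.com.mail-nows.com/login",
        "https://mail-nows.com/https://www.gmail.com/login",
    ):
        print(url, "->", who_area(url))
```

For both example URLs from the video the sketch yields the who-area mail-nows.com, regardless of whether the genuine name appears before the who-area or in the path. A similarity score is only a rough stand-in for the character-by-character check the video teaches.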

Video Development
Together with a professional producer of awareness videos, we developed a story and a voice-over text based on the content to be communicated. We used simple language and non-technical terms (e.g. the phrase "who-area" for domain). We labelled screenshots to direct viewers' attention to the location of important information (e.g. the status bar). The professional video producer then developed the video using our text and underlying story. We asked people of different ages, with varying backgrounds and levels of expertise in IT and security, to provide feedback to help us improve and refine the video. These people were representative of the anticipated participants, and the video was improved based on their feedback.

Evaluation - Methodology
The evaluation focused on whether the video leads to significant improvements in phish detection ability. The following hypotheses were formulated:
H1: Participants, having watched the video, correctly judge the legitimacy of messages more often i.e. identify more phishing messages, and identify legitimate messages more reliably.
H2: Eight weeks after watching the video, participants correctly judge the legitimacy of messages more often i.e. identify more phishing messages, and identify legitimate messages more reliably.

Study Design
We conducted an online between-subjects study in two sessions. Hypothesis 1 is evaluated with the data from the first session and hypothesis 2 is evaluated with the data from both sessions. The tasks in the first evaluation session were:
1. Judge screenshots of messages. Decide whether each is a phish or legitimate.
2. Watch the video.
3. Judge screenshots of messages.
4. Provide video feedback (free text answers).
5. Provide demographic information.
6. Grant permission for us to contact them to engage with a retention study.
When participants were asked to judge messages (steps 1 and 3), the question was: 'Is this a fraudulent message?'. Possible answers were: 'Yes, it is a fraudulent message' and 'No, it is not a fraudulent message'.
If they consented, we requested their email address and provided them with a random code to ensure an anonymous link between sessions.
During the second session (approximately eight weeks later), consenting participants were invited to participate in the retention study, which required them again to judge screenshots of phish and legitimate messages (i.e. purely step 3 from the first session). We used the SoSci Survey online platform. The study was pre-tested and refined based on the feedback from the pre-test. The changes were mainly related to the content of the messages participants were asked to judge. We decided to go for a quiz-like evaluation, with security being the participant's primary task. The alternative would have been a study design in which the participant's primary task is related to a cover story. This would theoretically not prime them to expect and detect phish messages.
We had a number of reasons for choosing the former design. While one could argue that the second option would have more external validity, it would have been hard to maintain the deception in a lab study. As soon as participants watched the video they would have known what the study was about. We could have attempted a field study, i.e. getting people to watch the video, then sending phishing-like messages at some unpredictable time in the future and measuring how many click on links or open an attachment. However, it is challenging to measure the participant's ability to identify phish at a distance. If they do not click, the message might not have been delivered, or they might not click because they do not have an account with the "source". If they click, it might be because they know of the study setting and want to know what happens if they click. We would not be able to determine whether they actually inspected the URL or not, which is what the video trains them to do correctly. In addition, in a field study we cannot control whether some participants receive the email on a smartphone, which is a more challenging setting than a laptop. Thus, there are many uncontrollable factors that could confound the findings. Therefore, we decided to go for a study design in which security is participants' primary task. Note that improved awareness is only the first step towards taking action. In other words, if users are not able to detect a phish when the primary task is security, it is unlikely that they will detect the phish when security is a secondary task. Thus, it is worth using a study design in which security is participants' primary task. In essence, this gives us an upper bound for video effectiveness: the best we can hope to achieve.

Material
Messages were designed in such a way that a judgement could only be made based on the actual URL. We had to acknowledge that participants could consider a message as phish because they did not know the sender or did not have an account with the web service provider. Therefore, we asked participants to imagine the following scenario. They were Max Müller, who has an account with all web services used in the study, and who has a colleague named Jonas Schmidt. Furthermore, they were told that it was important for them to decide whether a message was legitimate or not because the fraudulent messages would harm them and ignored genuine messages could lead to negative consequences (we wanted to avoid their simply classifying all messages as phishes, just to be safe). This scenario was displayed when screenshots of messages had to be judged.
We used 16 messages in each task (pre, post, retention). All messages contained plausible content. They were displayed during the evaluation in a randomized order. Some more information about the messages:
- One half contained suspicious links, the other half legitimate links.
- We derived messages from messages received from web services and private contacts. Messages from web service providers were in the original design with original text (only the URL was replaced for the "phishing" messages).
- For all screenshots, the mouse was positioned so that the actual URL was displayed, depending on the software in place either in the tooltip (with Outlook) or in the status bar (with Thunderbird or a web browser). The usage of both types was equally distributed both for phishing messages and for legitimate messages. On SoSci Survey, it was technically not possible to show the URL only when participants actually positioned the mouse on the link.
- We designed phishing messages where instruction-1 was enough to judge as well as those where instruction-2 or instruction-3 was needed (see Table 1).
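The balancing described above (half phishing, half legitimate, with tooltip and status bar displays equally distributed within each half, shown in random order) can be sketched as follows. This is only an illustration and not the tooling used in the study: the function name `build_stimuli`, the placeholder identifiers and the field names are assumptions.

```python
import random
from collections import Counter


def build_stimuli(n_messages: int = 16, seed=None) -> list[dict]:
    """Build a balanced, shuffled list of placeholder message descriptors."""
    kinds = ["phishing", "legitimate"]
    displays = ["tooltip", "status bar"]
    # 16 messages over 2 x 2 cells gives 4 messages per cell.
    per_cell = n_messages // (len(kinds) * len(displays))
    stimuli = [
        {"id": f"{kind}-{display}-{i}", "kind": kind, "url_display": display}
        for kind in kinds
        for display in displays
        for i in range(per_cell)
    ]
    # Randomise the presentation order; a seed makes the order reproducible.
    random.Random(seed).shuffle(stimuli)
    return stimuli


if __name__ == "__main__":
    stimuli = build_stimuli(seed=42)
    # Summarise how many messages fall into each kind/display cell.
    print(Counter((s["kind"], s["url_display"]) for s in stimuli))
```

In practice each participant would probably receive a freshly shuffled order, so the seed would be omitted or varied per participant.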

Recruiting and Ethics
An attempt was made to recruit participants from a wide range of ages. Recruitment took place via online platforms, social networks, flyers and personal invitations. Participants were not compensated for participating, but we encouraged participation by telling them that they would learn how to avoid falling victim to online fraud. The requirements for research involving human participants, defined by the ethics committee of our university, were satisfied. This includes the fact that all data was collected independently of the identity of the participants. The email addresses they provided to permit us to contact them for the retention study were stored in a different database and in a different order (as compared to their answers in the two sessions). The entries from the first session were linked to those from the retention session by asking participants to provide a random-looking but well-defined code; it was well defined because they were told how to generate it based on names and birthdays of particular relatives. Furthermore, no third party (besides SoSci Survey) has a copy of the data and no third party was involved in the evaluation of the data.
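To illustrate how such a self-generated code can link two sessions without storing any identity, consider the following sketch. The concrete rule used in the study is not given here, so the rule below (first two letters of two relatives' first names plus a two-digit birthday) and the names `linking_code` and `link_sessions` are purely hypothetical.

```python
def linking_code(mother_first_name: str, father_first_name: str, mother_birth_day: int) -> str:
    """Derive a code the participant can regenerate later (hypothetical rule)."""
    letters = mother_first_name.strip()[:2] + father_first_name.strip()[:2]
    return f"{letters.upper()}{mother_birth_day:02d}"


def link_sessions(first: dict, retention: dict) -> dict:
    """Pair the answers of both sessions for every code present in both."""
    return {code: (first[code], retention[code]) for code in first.keys() & retention.keys()}


if __name__ == "__main__":
    code = linking_code("Anna", "Peter", 7)
    first_session = {code: {"post_correct": 14}, "XYZW01": {"post_correct": 11}}
    retention_session = {code: {"retention_correct": 13}}
    print(link_sessions(first_session, retention_session))
```

Because the code is derived from facts only the participant knows, the answer records never need to be stored next to the email addresses.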

Evaluation - Results
There are two groups of people in our sample: those participating only in the first session (89 participants: 39 female, 50 male; mean age 36.1 years) and those who participated in both sessions (22 participants: 12 female, 10 male; mean age 38.09 years). There were no statistically significant differences between the groups, neither for age nor for gender. The distribution of the level of education is as follows: of the 89 participants in the first session, 50 have a degree from a university or a university of applied sciences and 21 have an A-level qualification. The corresponding numbers for the 22 who participated in both sessions are ten and five respectively. For the descriptive statistics, see Table 2. Performance in detecting phishing and legitimate messages was measured as the number of correctly judged phish and legitimate messages. The difference in performance before and after watching the video (H1) was analysed using a Repeated Measures ANOVA for both groups separately: (1) those participating only in session one and (2) those participating in both sessions. Furthermore, we analysed the retention performance changes (H2) using a Repeated Measures ANOVA considering only the answers from those participating in both sessions. The Mauchly test indicates a violation of sphericity, and therefore a Greenhouse-Geisser correction was needed for the comparison of pre- and post-performance. There was no violation of sphericity for the comparison of pre- and retention-performance.
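For readers unfamiliar with the analysis, the following sketch computes the F statistic and the effect size of a one-way Repeated Measures ANOVA in plain Python. It is a worked illustration under several assumptions: the scores are invented and are not the study data, the reported η² is assumed to be the partial eta squared (the text does not say which variant was used), and the sphericity test and the Greenhouse-Geisser correction are omitted.

```python
def rm_anova(scores: list[list[float]]) -> tuple[float, int, int, float]:
    """One-way repeated measures ANOVA.

    scores holds one row per participant and one column per time point
    (e.g. pre, post). Returns (F, df_effect, df_error, partial eta squared).
    """
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of time points
    grand = sum(map(sum, scores)) / (n * k)
    time_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    # Partition the total variation into time effect, subjects and error.
    ss_effect = n * sum((m - grand) ** 2 for m in time_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_effect - ss_subjects

    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_effect / df_effect) / (ss_error / df_error)
    eta_p2 = ss_effect / (ss_effect + ss_error)
    return f_value, df_effect, df_error, eta_p2


if __name__ == "__main__":
    # Invented example: correctly detected phishing messages (out of 8), pre and post.
    example = [[3, 6], [4, 7], [2, 6], [5, 7], [4, 8]]
    f_value, df1, df2, eta_p2 = rm_anova(example)
    print(f"F({df1}, {df2}) = {f_value:.2f}, partial eta squared = {eta_p2:.3f}")
```

With only two time points, the F value equals the square of the paired t statistic, which offers a simple way to cross-check the computation.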

Phishing Detection
Pre-Post for All Participants: We first report the Repeated Measures ANOVA with Greenhouse-Geisser correction for violated sphericity for the detection of phishing messages by all participants. The within-subjects factor time (pre- and post-performance) is significant with p < .001 and η² = .526, i.e. the performance in detecting phishing messages changes significantly. In combination with the descriptive data (see Table 2), detection of phishing messages increases significantly after watching the video. Thus, H1 can be accepted.
Participants during Retention: A Repeated Measures ANOVA for the detection of fraudulent messages reveals a significant effect of time (pre-, post- and retention-performance) with p < .001 and η² = .636. A post-hoc test with Bonferroni correction shows that there is a significant difference between pre- and post-performance with p < .001 and a significant difference between pre- and retention-performance with p < .001. Thus, H1 and H2 can be accepted for the group of participants taking part in both sessions.

Identifying Legitimate Messages
Pre-Post for All Participants: We first report the Repeated Measures ANOVA for identification rates. The within-subjects factor time (pre- and post-performance) is significant with p < .001 and η² = .219, i.e. the performance in detecting legitimate messages changes significantly. In combination with the descriptive data (see Table 2), the identification of legitimate messages improves significantly after watching the video. Thus, H1 can be accepted.
Participants during Retention: A Repeated Measures ANOVA reveals a significant effect of time (pre-, post- and retention-performance) with p = .019 and η² = .173. A post-hoc test with Bonferroni correction shows that there is a significant difference between pre- and post-performance with p < .001. Thus, H1 can be accepted.

Individual Messages
We also looked at the performance on individual messages in order to improve the video. The corresponding mean values are shown in Tables 3 and 4 respectively (note that the numbers for pre and post are for all 89 participants).

Open Feedback
Positive comments mentioned the simplicity of the video, the clarity of the content and the general comprehensibility. In particular, they liked the fact that the video was not overloaded with information. Regarding the overall design, participants liked the idea of using this type of animated video for general knowledge transfer. Feedback for improving the video was: 'More examples of the different phishing tricks' and 'Summary at the end of the video'.

Discussion
The five-minute video significantly improved phish and legitimate message detection. In other words, after watching the video, participants were able to detect phishing URLs without becoming overly cautious.
The retention part of our study is of special interest since in real life people do not receive phishing messages on a daily basis due to improvements in technical measures that filter out these messages. It is unlikely that people will use their newly-acquired knowledge very often, so they are likely to forget instructions and hints from the video. Our participants improved significantly in terms of detecting phishing messages, whereas detection rates for legitimate messages stabilised. We suggest possible explanations for this observation, based on two legitimate messages:
- "www.vodafone.de/(...)": The mean detection rate after watching the video was 71.9%, with 59.1% at retention. The message contained a telephone number and, instead of starting with 'Dear Martin ...', it started with 'Dear +1 121 34329'. A paragraph in the email stated that Vodafone would always address their customers by their name. The issue, in this case, is that Vodafone does address emails to the phone number if the customer has not provided their name. We acknowledge that we did not spot this problem ourselves during the video refinement.
- "https://buchung.lufthansa.com/s(...)/cc?soDBYCT(...)": The mean detection rate after watching the video was 86.5%, with 77.3% at retention. The problem here might be the length of the URL. The path contains HTTP twice, includes a number of dots and the term 'redirect'. This probably elicited suspicion. Again, the email was not altered from the original sent out by the company, besides changing the name of the customer.
The two phishing messages that evaded detection to the greatest extent were related to hints provided in instruction-2. In the first case, participants did not detect the mismatch between the HTML-text-based URL displayed in the message and the actual destination URL displayed in the status bar. In the second case, participants are likely to have considered the path to be relevant in making their judgement. An improved video will have to explain these cases more clearly. The video did a great job with respect to instruction-3. While these cases always performed worst in previous evaluations, they performed better after the video.
Most of the feedback regarding improving the video was related to extending it. This is interesting because we tried to keep it as short as possible while retaining efficacy. Two aspects might be worth considering in terms of improving the video: (1) making the fact that only the URL matters even more salient, and (2) providing a summary at the end of the video to consolidate and reinforce concepts.
Note that, unlike studies reported by [29], we did not observe any age differences. This might be because security was their primary task. However, if the study had been carried out in the wild, our findings might well have coincided with those reported by [29].
Finally, it was interesting to observe that those participants who had many issues with detecting phishing messages in the pre-quiz were most likely to participate in the retention study. One possible explanation is that they really enjoyed the video and were thankful for their improved awareness. One additional interpretation is that the video, in particular, addressed those who had very little pre-existing awareness of phishing.

Limitations
Almost half of the original 89 participants gave us their email addresses to contact them for the retention study. Half of these responded to our retention study request. We ended up with a sample of only 22 participants to participate in the retention session. This means that we cannot realistically generalise the results to the whole population. The participant sample, as a whole, is not representative, as most of our participants had an A-level certificate or university degree. Furthermore, due to the fact that we told participants, during recruiting, that they would learn how to protect themselves against online fraud if they took part, we probably attracted participants who were already interested in this topic. Thus, as future work, we should run the study with a different demographic.
Furthermore, participant performance should definitely be considered a "best-case" scenario, because security was their primary task. Their actual detection rates are likely to be poorer in the real world. However, an increased awareness of phishing detection is a necessary first step to resilience. Before watching the video, people were not able to detect phishes despite it being their primary task. This is why awareness programmes are important.
We used the same messages in all sessions. It may be argued that one explanation for post-video improvement was that they already knew what to look for. This might be a valid observation, but the chances seem small because legitimate message detection rates actually decreased. Furthermore, it is unlikely that after eight weeks they would still remember all the messages they had seen before, including the correct judgement. It is also worth mentioning that participants were not given feedback about which messages they had judged correctly, and which not.
Due to technical limitations of SoSci Survey, participants did not need to hover over the link deliberately in order to display the actual destination URL. It was automatically provided on the screenshot. It could be argued that we do not know whether people would hover over links, because our study did not require this essential first step. On the other hand, being aware of the need to hover must be helpful.

Video Improvements
Based on the results, we identified a number of issues with the video, which we addressed in order to maximize performance. The new video lasts only 5:09 minutes and spends more time on examples. Previously, the who-areas were highlighted in green and only sub-domains highlighted in red. Now this highlighting is extended to the path. Moreover, the video now explicitly tells people when a URL is a phishing URL. We now conclude the video with a summary of the lessons learned, including tips and hints.

Related Work
A number of user studies were conducted to gain insights into the mental models of message recipients, or to evaluate the effectiveness of anti-phishing measures. For example, game-based anti-phishing educational approaches were used in [2,4,13,14,16,30,32,34-36,42,44]. The effectiveness of some of these games has been evaluated in user studies [4,14,16,30,32,34,35,44], with [16] and [32] comparing the effectiveness of a game-based approach with text-based awareness.
Of these, only the proposal in [32] is further evaluated in a retention study a week later [25]. Usually the participants in user studies are adults, but some researchers have started studying phishing education for children [27].
Another approach to anti-phishing education utilises so-called teachable moments: participants are sent a simulated phishing email with a suspicious-looking link. If they click on it, they are directed to a landing page containing phishing-related information. Such an approach has been used, in particular, by Caputo et al. [8]. The authors also conducted a retention study of anti-phishing training in a corporate setting after a period of 90 days. The results of the study, however, did not indicate any significant improvement. A similar approach has been used in further research [19,23,24], where retention studies conducted after one week did show significant improvements in terms of reducing participants' susceptibility to phishing emails. A further study [22] built upon the evaluation in [23,24] and tested the participants' retention via multiple simulated phishing emails sent over the course of 28 days, with the results showing no significant loss of retention by the end of the study. A similar approach but with spear phishing messages was studied in [9,38]. The study in [10] further evaluated the effectiveness of anti-phishing training based on three simulated phishing trials over a two-month period, showing significant improvements even after the end of the study.
Other anti-phishing educational approaches include training materials, educational videos and e-learning modules. As such, a study in [45] evaluated an anti-phishing training coupled with motivational videos. Participants in the control group watched cooking videos instead. The study found that the training increased participants' ability to detect phishing emails, but also increased the rate of false positives. The authors did not test the retention effect of the training. A study reported by [31] compared the performance of several anti-phishing educational approaches, including the game-based approach from [32], training materials from [24] and popular anti-phishing materials found on the web. All of these approaches significantly improved participant ability to detect phishing links. Retention was not tested. An anti-phishing e-learning module was developed by [21] but was only evaluated with a small sample of participants, and retention was also not tested.

Conclusion
Modern technology allows confidence tricksters to target a large number of people using phishing messages, at minimal cost. It is still desirable for people to know how to detect these messages, as technology is far from detecting 100% of them. In this paper we report on the development of a video that raises phishing awareness without making people overly suspicious of genuine messages. We used our knowledge and experience from past research to develop a short yet effective phish-awareness video. Our main aim was to cover the most relevant content in a short video, in order to address the needs of companies that want to raise awareness but do not have the luxury of spending between 20 and 45 minutes on doing so. The video was evaluated by 89 participants. Furthermore, a retention study, in which 22 of these 89 participants took part, was conducted after eight weeks. The results of the study show that the ability of the participants to distinguish between phishing and legitimate links increased significantly directly after watching the video and that, even after eight weeks, the participants were significantly better at detecting phishing links than before watching the video.