Slovenian Version of the Deese–Roediger–McDermott Paradigm: A Normative and Exploratory Study

The research with the Deese–Roediger–McDermott (DRM) paradigm has accumulated substantial evidence for a reliable cross-cultural false memory effect. However, only a limited set of languages has its own version of the DRM paradigm, let alone the corresponding normative data. In this study, we used 728 participants (594 women) to create the first Slovenian version of the DRM paradigm, and another 90 participants (66 women) to test its effectiveness in inducing false memories and to conduct exploratory analyses. We are the first to conduct a normative DRM study entirely online and among the few to measure reaction times (RTs) in addition to the commonly employed accuracy measures. Overall, the participants recalled 69% of the list items and 14% of the critical lures, and additionally recognised 79% of the list items and 45% of the critical lures. The participants recalled and recognised list items as fast as critical lures and significantly faster than unrelated words. The study shows that false memories can be effectively explored by using the online format of the DRM paradigm. Future studies should investigate in more detail the relationship between RTs and accuracy measures, the relationship between recall and recognition, and the effects of the experimental environment on the results of DRM studies. We hope that the provided material and its scrutiny will contribute to the progress in false memory research, especially in the Slovenian language environment.

While the DRM paradigm is known for its reliability in inducing spontaneous false memories in the laboratory setting (Brainerd & Reyna, 2005; Gallo, 2006), studies show that this ability varies highly from list to list. In Deese's study (1959), for example, the critical lure butterfly was never recalled, whereas the critical lure sleep was recalled in 44% of cases. Similarly, Stadler et al. (1999) demonstrated that the critical lure king was recalled in only 10% of the cases, whereas the critical lure window was recalled in 65% of the cases. While these studies were limited to English-speaking environments, the same finding was also observed in studies employing the DRM paradigm in other languages (e.g., Anastasi, De Leon, & Rhodes, 2005). High variability across lists has been connected to forward association strength (FAS), backward association strength (BAS), the structure of the lists, instructions, etc. (Brainerd et al., 2008; Gallo & Roediger, 2002). Notwithstanding, this suggests that the researchers employing the DRM paradigm should be informed about the list-to-list effectiveness of inducing false memories in order to conduct well-designed studies.
Most of the time, the research with the DRM paradigm has focused on categorical accuracy-related dependent variables, such as old/new responses in a recognition test, whereas more continuous measures, such as reaction times (RTs), have been rarely employed, especially with recall tests (Gallo, 2006). This is surprising given that RTs have proven to be a useful objective measure of memory strength (Gallo, 2006; Jou, 2008) that can provide new insight into the cognitive processes underlying false memory, such as separating different stages of false recall (Jou et al., 2004), comparing memory processes between younger and older participants (Tun et al., 1998), or exploring the speed-accuracy trade-off (Coane et al., 2007). Despite the lack of research, the RT data appears to be consistent with the relatedness effect on false memory, according to which critical lures, due to their similarity to studied list items, are recalled and recognised faster than unrelated lures (Gallo, 2006). However, the findings are still inconclusive as to whether, and under what circumstances, differences exist between RTs of remembering list items and critical lures. Some have demonstrated that the RTs were shorter for recognising list items than for recognising critical lures (Jou et al., 2004; Payne et al., 2002), and similar
The phenomenon of false memory, commonly defined as the recollection of an event that did not occur, has attracted considerable attention in the past few decades. It has been demonstrated in patients with neuropsychological disorders and everyday people of different ages, cultures, and educational levels (for a review, see Brainerd & Reyna, 2005; Gallo, 2006, 2010; Johnson & Raye, 1998). This has led researchers to increasingly adopt the constructivist understanding of memory (e.g., Schacter & Addis, 2007) and challenge important social and legal issues, such as cases of recovered memories of childhood abuse (e.g., Loftus, 1993) or the reliability of eyewitness testimony (e.g., Loftus & Ketcham, 1991). To experiment with the phenomenon, researchers have invented several methods for inducing false memories, one of the most frequently used being the Deese-Roediger-McDermott (DRM) paradigm (Deese, 1959; Roediger & McDermott, 1995). In line with some other recent attempts (Anastasi, De Leon, & Rhodes, 2005; Iacullo & Marucci, 2016), we present here the first Slovenian version of the DRM paradigm together with the normative and exploratory data obtained entirely online.
In the DRM paradigm, participants are presented with lists of words where the words from each list (i.e., list items) are associatively related to one word that is not presented (i.e., critical lure). When participants are asked to recall or recognise these words, they often report the critical lure as being part of the list (for reviews, see Gallo, 2006, 2010). For example, list items, such as hard, light, pillow, plush, loud, etc., often elicit a report of the critical lure soft even though it was not presented (Stadler et al., 1999). Typically, critical lures are recalled and recognised less frequently than list items, but more frequently than words that are unrelated to list items (i.e., unrelated lures; Gallo, 2006). Two dominant explanations of the DRM false memory effect are activation-monitoring theory (McDermott & Watson, 2001) and fuzzy-trace theory (FTT; Brainerd & Reyna, 2002). Both maintain that two distinct processes, error-inflating and error-reducing processes, work together in true memory retrieval, but against each other in false memory retrieval (Arndt & Gould, 2006; Gallo, 2010; McDermott & Watson, 2001). Activation-monitoring theory claims that false memories arise because bringing to mind list items repeatedly activates a critical lure, which is then mistaken for a list item if not appropriately monitored. Similarly, FTT assumes that false memories arise because perceiving related list items constructs such a strong gist trace (general meaning, usually consistent with the corresponding critical lure) that verbatim traces (specific episodic details) are unable to reject it (Brainerd & Reyna, 2002).
Most findings related to the DRM paradigm are based on English-speaking participants (Anastasi, De Leon, & Rhodes, 2005; Ulatowska & Olszewska, 2013). Recently, however, several researchers have recognised the need to extend the research to non-English languages. Some researchers directly translated the English word lists (Deese, 1959; Roediger & McDermott, 1995; Stadler et al., 1999) into Chinese (e.g., Lee et al., 2008), Dutch (e.g., Zeelenberg & Pecher, 2002), French (e.g., Cabeza & Lennartson, 2005), German (e.g., Anastasi, Rhodes, et al., 2005), Japanese (e.g., Anastasi, Rhodes, et al.,
728 (594 women) were used to create the Slovenian word lists. All were university students of various disciplines from the University of Ljubljana, the University of Maribor, and the University of Primorska, aged between 19 and 31 (M = 21.7 years, SD = 2.2 years). The participants were told that they would take part in the process of creating the Slovenian version of the psychological memory test by providing their associations for a set of words.
An additional 111 volunteers, unaware of the first stage of the study, were recruited via mailing lists and various social media groups to take part in the second stage of the study (i.e., normative/exploratory part), but 21 of them were excluded from the analysis due to having a history of neuropsychological disorders, not being Slovenian native speakers, or having experienced technical issues during the study. The remaining 90 (66 women), aged between 18 and 33 (M = 22.3 years, SD = 3.6 years), were mostly university students (67), but also younger employees (13), unemployed (7), and high school students (2). Out of 67 university students, 37 were promised course credit for their participation. The sample size was determined roughly as per previous research (Anastasi, Rhodes, et al., 2005; Iacullo & Marucci, 2016; Johansson & Stenberg, 2002). The participants were told that they would take part in the process of creating a psychological memory test by listening to audio recordings and completing memory tests, but not that they would be tested for false memories because this could have influenced the results of the study (Gallo et al., 2001). They were fully debriefed after the data collection was completed.

Creating the word lists
Three independent translators translated 100 words from the Kent-Rosanoff association test (Kent & Rosanoff, 1910) to arrive at a basic set of critical lures. Differences in translations were discussed and one translation was selected. Ten words were excluded mostly due to ambiguity (e.g., the word modra [blue] was ambiguous, for it could mean being wise or a blue colour in Slovenian), and the remaining 90 words were used in further steps.
Next, an anonymised online questionnaire was used to ask the participants about the first association that comes to mind for each of the 90 words. The obtained associations for each word were ranked by the frequency of occurrence. When very similar associations were given to the same word, the most frequent one was selected and its frequency was replaced by the sum of the frequencies of both associations (e.g., words svetlo [light or bright] and svetloba [light] were both listed as the associates to the critical lure okno [window], but only svetloba was kept in the final selection as it occurred more frequently in the association test). Very vague, uninformative, or general associations (e.g., word najboljše [the best] as an association to the word spanje [sleep]) were also replaced.
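The ranking and merging step described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name rank_associations, the merge_groups argument, the extra associates steklo and zavesa, and all response counts are hypothetical and serve only to show how two near-synonymous associations could be collapsed onto the more frequent one with their frequencies summed.

```python
from collections import Counter

def rank_associations(responses, merge_groups=()):
    """Rank free associations to one cue word by frequency.

    merge_groups lists tuples of near-synonymous associations; within each
    group the most frequent form is kept and receives the summed frequency.
    """
    counts = Counter(r.strip().lower() for r in responses if r.strip())
    for group in merge_groups:
        present = [w for w in group if w in counts]
        if len(present) < 2:
            continue  # nothing to merge for this group
        keep = max(present, key=lambda w: counts[w])
        total = sum(counts[w] for w in present)
        for w in present:
            del counts[w]
        counts[keep] = total
    return counts.most_common()

# Hypothetical responses to the cue "okno" [window]; counts are invented.
responses = ["svetloba"] * 5 + ["svetlo"] * 3 + ["steklo"] * 6 + ["zavesa"] * 2
ranked = rank_associations(responses, merge_groups=[("svetlo", "svetloba")])
print(ranked)
```

In this made-up example, svetloba absorbs the three svetlo responses and moves to the top of the ranking.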
Out of 90 words and their associations, 36 words were selected with the least overlap among their 15 most frequent associations. The remaining overlap was further taken care of by substituting an association that received fewer responses for the most frequent unused association to the same word (e.g., the word bela [white] was present among the most
results were obtained for the recall data (Jou, 2008). Others, however, found no differences in the recognition RTs between list items and critical lures (Thomas & Sommers, 2005; Tun et al., 1998).
In our study, we developed the Slovenian DRM paradigm from the ground up, tested its effectiveness for inducing false memories, and conducted exploratory analyses by considering both accuracy measures and RTs. This is one of the first DRM studies that were conducted online and the first DRM study that obtained extensive normative and exploratory data completely online¹, which is in line with the growing practice of internet-based empirical research (Peirce et al., 2019). Although testing theoretical hypotheses was not our objective, we still had some broad expectations. First, according to the relatedness effect, we expected critical lures would be recalled and recognised more frequently than unrelated lures and less frequently than list items (Anastasi, De Leon, & Rhodes, 2005; Iacullo & Marucci, 2016; Johansson & Stenberg, 2002; Stadler et al., 1999; Ulatowska & Olszewska, 2013). Also, we expected critical lures would be recalled and (positively) recognised either more slowly than (Jou et al., 2004) or as fast as (Tun et al., 1998) list items, and that both critical lures and list items would be recalled and (positively) recognised faster than unrelated lures (Coane et al., 2007).

Method

Outline of the research design
The study consisted of two stages: (1) creating the Slovenian word lists that can induce false memories, and (2) exploring the effects of the word lists in the normative/exploratory study. In the first stage, we roughly followed the procedures presented in prior research (Deese, 1959; Roediger & McDermott, 1995; Russell & Jenkins, 1954). We translated the words from the Kent-Rosanoff association test (Kent & Rosanoff, 1910), used the translated words to carry out the Slovenian version of the Kent-Rosanoff test, and finally created the word lists. In the second stage, we tested the efficiency of the created word lists to induce false memories, as in the study by Stadler et al. (1999), and conducted exploratory analyses.

Participants
For the first stage of the study (i.e., creating word lists), 1177 volunteers responded to the research participation initiative promoted via mailing lists and various social media groups. Of them, 449 were excluded from the analysis due to not being university students (only students were kept as they represented the majority of the initial sample), having a history of neuropsychological disorders, not being Slovenian native speakers, or providing incomplete data. The remaining
take part in the study, the participants received an email with the link to the online socio-demographic questionnaire, a custom-made link to the online experiment, and some general instructions/recommendations on how to proceed. These included: (1) finding a time and place that would minimise potential interruptions; (2) setting up the computer; and (3) testing the speakers or the headphones. The use of either Google Chrome or Mozilla Firefox with no or few tabs open was recommended. The participants were asked to complete the experiment in one sitting, earnestly and with commitment. To prevent tiredness (because of the length of the experiment) from interfering with the results, the 36 lists were divided randomly into three sets of 12 and each participant was pseudo-randomly assigned to the first, the second, or the third set³. Since the researcher was not physically present when the participants conducted the online experiment, special care was taken to design the application in a way that helped the participants clearly and unambiguously understand each step in the process.
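The division of the 36 lists into three sets of 12 and the allocation of participants to sets can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the pseudo-random assignment rule, so the round-robin allocation used here is an assumption, and the function names, the seed, and the integer list and participant identifiers are hypothetical.

```python
import random

def split_into_sets(list_ids, n_sets=3, seed=None):
    """Shuffle the list identifiers and cut them into equally sized sets."""
    rng = random.Random(seed)
    shuffled = list(list_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_sets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_sets)]

def assign_participants(participant_ids, n_sets=3):
    """Assign participants to sets in rotation (assumed rule, not the authors')."""
    return {pid: i % n_sets for i, pid in enumerate(participant_ids)}

sets = split_into_sets(range(1, 37), seed=0)   # 36 lists -> 3 sets of 12
assignment = assign_participants(range(90))    # 90 participants -> 30 per set
print([len(s) for s in sets])
```

A rotation of this kind keeps the groups equal in size, which would be consistent with the 30 participants per set reported in the statistical analysis.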
For the recall test, the participants were told that they would listen to 12 lists of words, each starting and ending with a beep, and that after each list, they would have two minutes to type as many words as they could remember. As per previous normative studies (e.g., Stadler et al., 1999), they were told to type only those words that they were certain had been presented in the previous list. The participants were instructed to press the Return key if they wished to type another word, or the Right key if they could not remember any more words. Each recalled word (i.e., submitted by the Return key) appeared in a vertical list on the left side of the screen
frequent associations to the words ovca [sheep] and luna [moon], but the frequency of the second association was lower, so it was substituted for the less frequent association okrogla [round] to the same word). This led to the development of 36 word lists, with each of them comprising the 15 strongest associations to the list's critical lure (see Appendix A for the complete set of word lists).

Materials
The 36 word lists were audio-recorded and edited so that each recording started with a short beep, followed by 15 words uttered in the descending order of the associative strength² with a two-second pause in between, and ended with another short beep. This way, the presentation of the words (e.g., volume, intonation, pace, tone of voice) was identical for all participants. The words were spoken by a male speaker whose native language was Slovenian.
An open-source tool for designing behavioural experiments, PsychoPy v3.0 (Peirce et al., 2019), was used to develop and run an online version of the DRM paradigm. The participants accessed the application via Pavlovia (https://pavlovia.org/), an online repository for running PsychoPy experiments, using their computer, keyboard, and mouse to finish the experiment.

Procedure
Figure 1 summarises the process of data preparation, participant allocation, and study protocol. After agreeing to

Figure 1 Data preparation, participant allocation, and study protocol for one participant from Set 1 and one participant from Set 3 (the protocol was equivalent for Set 2)
participants did not recall or recognise all item types (i.e., list items, critical lures, and unrelated lures), the final RT dataset was unbalanced, comprising 34 incomplete participant cells for recall, and 40 incomplete participant cells for recognition.
For the analysis of accuracy data, we calculated recall and recognition rates per participant and word list for each item type. To get the participant's list item recall/recognition rates, we divided the number of all correctly recalled/recognised list items of the participant by the number of all list items of all lists that the participant listened to (recall rates) or by the number of all list items that were included in the recognition test (recognition rates). To get the participant's recall/recognition rates of the critical lures, we divided the number of all recalled/recognised critical lures of the participant by the number of lists that the participant listened to. We used a similar procedure for the calculation of unrelated lure rates and rates per list. For the RT analysis, we calculated average recall and recognition RTs for each item type and response to the recognition test (i.e., yes, no). The focus of the analysis was on testing the differences and the relationships (between item types, responses, and recall/recognition tests) within accuracy data and RTs separately, as well as the relationships between them.
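The per-participant rates defined above reduce to simple ratios, as the following Python sketch shows. The function names and the participant's counts (124 recalled list items, 2 recalled critical lures, 28 recognised list items) are hypothetical; the denominators follow the text: 12 lists of 15 items for recall, the 36 tested list items for recognition, and the 12 lists for critical lures.

```python
def list_item_recall_rate(n_recalled, n_lists, items_per_list=15):
    """Correctly recalled list items divided by all list items heard."""
    return n_recalled / (n_lists * items_per_list)

def list_item_recognition_rate(n_recognised, n_tested_list_items):
    """Recognised list items divided by list items included in the test."""
    return n_recognised / n_tested_list_items

def critical_lure_rate(n_reported_lures, n_lists):
    """Recalled or recognised critical lures divided by lists heard."""
    return n_reported_lures / n_lists

# Invented counts for one participant who listened to 12 lists.
recall_rate = list_item_recall_rate(124, n_lists=12)       # 124 / 180
recognition_rate = list_item_recognition_rate(28, 36)      # 28 / 36
lure_recall_rate = critical_lure_rate(2, n_lists=12)       # 2 / 12
print(round(recall_rate, 2), round(recognition_rate, 2), round(lure_recall_rate, 2))
```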
Since the data violated the assumptions of common parametric procedures (e.g., the accuracy and RT distributions differed from normal, some RTs were unusually long, etc.), we decided to use robust statistical methods from the R packages MANOVA.RM (Friedrich et al., 2019) and WRS2 (Mair & Wilcox, 2020), developed to maintain the power in the presence of outliers and non-normal distributions⁴. We used a robust repeated measures ANOVA (RM function from MANOVA.RM) to compare recognition RTs across all combinations of item type and response variable levels (list item-yes, list item-no, critical lure-yes, critical lure-
in uppercase letters (e.g., "1. MOŽGANI" if the first recalled word was možgani [brain]). The list with recalled words was shown until the Right key was pressed. To approximate the conditions of the usual physical format of the DRM paradigm, the participants were additionally told that they could delete the word if they made a mistake by clicking on the red button next to that word. For each participant, 12 lists of words were presented in random order.
After finishing the recall of the last list from the set, the instructions for the recognition test were presented, followed by the test itself. The recognition test consisted of 72 words: 36 words (i.e., list items) taken from positions 1, 8 and 10 from each of the 12 lists from the set used, 12 critical lures of all 12 lists from the set used, and 24 words taken from the set of words that were excluded in the process of creating the lists (see section Creating the word lists) and were unrelated to the studied lists (see Appendix B for the complete set of unrelated words used in the recognition test). The words were presented at the centre of the screen in white uppercase letters against a grey background. A blank screen, lasting for one second, was shown between each consecutive word. The participants were instructed to press the D key if they were certain that the word was presented in one of the lists, or the N key if the word was not presented or if they were not sure whether it was presented. The order of the words was randomised for each participant.
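The composition of the 72-word recognition test can be sketched in Python as follows. The sketch is illustrative rather than the authors' PsychoPy code: the function name build_recognition_test, the item-type labels, the seed, and the placeholder words are all hypothetical, while the positions (1, 8 and 10) and the counts (36 list items, 12 critical lures, 24 unrelated words) come from the description above.

```python
import random

def build_recognition_test(word_lists, critical_lures, unrelated_words,
                           positions=(1, 8, 10), seed=None):
    """Combine studied items, critical lures and unrelated words, then shuffle."""
    items = []
    for lure, words in zip(critical_lures, word_lists):
        for pos in positions:
            items.append((words[pos - 1], "list_item"))  # positions are 1-based
        items.append((lure, "critical_lure"))
    items.extend((w, "unrelated_lure") for w in unrelated_words)
    random.Random(seed).shuffle(items)  # new random order per participant
    return items

# Placeholder material: 12 lists of 15 words, 12 lures, 24 unrelated words.
word_lists = [[f"w{l}_{p}" for p in range(1, 16)] for l in range(12)]
critical_lures = [f"lure{l}" for l in range(12)]
unrelated_words = [f"u{k}" for k in range(24)]

test_items = build_recognition_test(word_lists, critical_lures,
                                    unrelated_words, seed=1)
print(len(test_items))
```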
RTs were measured for each recalled and recognised word. Apart from the two-minute timeframe for recalling the words from the individual list, the participants were not told to recall as fast as possible or that RTs were being measured, so that no additional (time) pressure was introduced. For the recall test, the RT of the individual word was defined as the time that elapsed from the confirmation of the previous word by pressing the Return key (or, for the initial word, from the start of the recall test) to the confirmation of the current word, and so on until two minutes had passed or until the Right key was pressed, indicating the end of the recall. For the recognition test, the RT of the individual word was defined as the time that elapsed from the word onset to the output of a response, i.e., pressing the D or the N key.
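The recall RT definition above amounts to taking differences between consecutive confirmation times, with the start of the recall test as the reference for the first word. A minimal Python sketch, with a hypothetical function name and invented timestamps in milliseconds:

```python
def recall_rts(test_start_ms, confirm_times_ms):
    """RT of each recalled word: time since the previous confirmation
    (or since the start of the recall test for the first word)."""
    rts = []
    previous = test_start_ms
    for t in confirm_times_ms:
        rts.append(t - previous)
        previous = t
    return rts

# Invented Return-key timestamps for three recalled words.
rts = recall_rts(0, [3200, 5100, 9800])
print(rts)
```

Under this definition, the RT of a recalled word includes the time needed to type it, which is why the comparison of word lengths across item types is relevant when interpreting recall RTs.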

Statistical analysis
We used Excel and Python for data preparation and list creation, and the open-source software environment R (R Core Team, 2022) for the statistical analysis.
The exclusion of some participants (see section Participants) left us with data from 30 participants for each set of lists. We additionally took care of some inadvertent errors in the material (e.g., one list had an incorrect number of words on it) and problematic responses in the recall and recognition tests (e.g., participants made several typos when recalling words; see Appendix C for more details). Since we distributed the word lists and the participants to the three sets of lists randomly, we assumed that there should not be any systematic differences in terms of the results of different groups. To improve the statistical power, we therefore combined and jointly analysed the accuracy and RT data from all three sets of lists (see Appendix D and E for descriptive statistics of each set of lists individually). As some
⁴ Before using robust procedures, we considered trimming the RTs by using an absolute lower threshold of 100 ms and an upper threshold of three standard deviations above the mean separately for each item type (on the recall and recognition test) and response (on the recognition test). Yet, as pointed out by one of the reviewers, there are many problems with trimming the data according to rules of thumb (e.g., Field & Wilcox, 2017). Also, the unusually long RTs in the present study (e.g., the longest recall RTs were 138914 ms, 122552 ms, 118913 ms, etc., and the longest recognition RTs were 99533 ms, 46303 ms, 35684 ms, etc.) could have been the consequence of the online format of the study. Namely, the participants could have experienced internet connection problems, been distracted more easily because of the physical absence of the researcher, or taken a little more time for deliberation since they were not instructed to recall and recognise the words as fast as possible. Accordingly, visual inspection suggested that the possible differences between the RTs of list items, critical lures, and unrelated lures exist in the tails of the corresponding distributions. Thus, trimming could indeed skew the data and robust procedures were probably a better choice. We thank the anonymous reviewer for this suggestion.
⁵ Since the RM function does not work with unbalanced designs, we balanced the recognition RT dataset by using the complete cases approach just for this procedure, removing 40 (i.e., 44%) participant cells. Each condition (list item-yes, list item-no, critical lure-yes, critical lure-no, unrelated lure-yes, unrelated lure-no) comprised 50 participant cells.

Results
Table 1 shows descriptive statistics for all datasets used in the inferential procedures.

Recall accuracy
All participants recalled at least one list item, 67 out of 90 participants recalled at least one critical lure, and 69 out of 90 participants recalled at least one unrelated lure. Of all recalled words, 97.16% were list items, 1.32% were critical lures, and 1.51% were unrelated lures.
Table 2 shows recall rates of critical lures and list items for each list. The recall rates of list items varied from 80% (sadje [fruit]) to 54% (kratko [short]), whereas the recall rates of critical lures were lower and varied from 47% (zdravnik [doctor]) to 0% (e.g., štedilnik [stove]). Robust paired Yuen's test showed the participants were more likely to recall list items than critical lures, M_diff = 0.57, 95% CI [0.53, 0.62], Y_t(53) = 27.04, p < .001, δ_t = 3.48.
no, unrelated lure-yes, unrelated lure-no)⁵. We report a resampling version of Wald-type statistic (WTS) based on a permutation method. For multiple within-subject comparisons within one variable, we used a bootstrap-based repeated measures ANOVA for trimmed means (rmanovab function from WRS2) and the corresponding post hoc tests (pairdepb function from WRS2). Because the pairdepb function does not work with unbalanced designs, we sometimes used its non-bootstrap alternative (rmmcp function from WRS2). For post hoc tests, we report a difference between trimmed means (ψ̂), an associated bootstrap confidence interval, and the bootstrapped effect size (δ_t; Algina et al., 2005), where 0.2, 0.5, and 0.8 correspond to small, medium, and large effects. For comparing two dependent means, we used Yuen's modified t-test for trimmed means (yuend function from WRS2) and bootstrapped effect size (Algina et al., 2005). Finally, for correlational analyses, we used R package boot to implement Pearson's correlation with bootstrapping. We report bias-corrected and accelerated (BCa) bootstrap confidence intervals.
As per recommendations (Field & Wilcox, 2017), we used 20% trimming for all trimmed means procedures and 2000 samples for all bootstrap-based procedures. The level for assessing the statistical significance was .05.
Note. Because the maximum number of recalled unrelated lures was not known in advance (as it was with recall/recognition list items and critical lures), we could not calculate comparable rates. N = the number of participants with at least one response in a given category.

Recognition accuracy
All participants recognised at least one list item, 87 out of 90 participants recognised at least one critical lure, and 53 out of 90 participants recognised at least one unrelated lure. Of all recognised words, 81.64% were list items, 15.64% were critical lures, and 2.72% were unrelated lures.

Comparison of recall and recognition accuracy
As indicated by the robust paired Yuen's tests, the comparison of recall and recognition rates showed that the
Note. The lists are ordered in descending order of the mean critical lure rates. By accident, the list with the critical lure dekle contained 16 instead of 15 words, hence its results are excluded from the total M and SD calculation.
J. Černe and U. Kordeš
Note. The lists are ordered in descending order of the mean critical lure rates. Total M and SD are calculated based on all sets of lists. By accident, the list with the critical lure dekle had 16 instead of 15 words on it, hence its results are excluded from the total M and SD calculation.

Table 4
Mean Rates for Critical Lures and List Items Recalled and Mean Rates for Critical Lures and List Items Recognised for the Top 18 Lists from the Present Study, and from the English (Stadler et al., 1999), Swedish (Johansson & Stenberg, 2002), Spanish (Anastasi, De Leon et al., 2005), Polish (Ulatowska & Olszewska, 2013) and Italian (Iacullo & Marucci, 2016)

Comparison of accuracy data with similar normative studies
We averaged the list item and critical lure rates for the top 18 lists (upper half) from the present study and the five most similar studies that provided norms for English (Stadler et al., 1999), Swedish (Johansson & Stenberg, 2002), Spanish (Anastasi, De Leon, & Rhodes, 2005), Polish (Ulatowska & Olszewska, 2013) and Italian (Iacullo & Marucci, 2016). Table 4 shows the results. Considering critical lures, the present study showed the lowest recall and recognition rates (23% and 60%, respectively), the Swedish study had the highest recall rate (65%), and both the Swedish and the Italian studies had the highest recognition rate (87%). Considering list items, the Spanish study showed the lowest recall rate (52%), the Italian study had the lowest recognition rate (59%), and the present study had the highest recall and recognition rates (67% and 80%, respectively).

Recall reaction times
In terms of recall data, a robust one-way repeated measures ANOVA showed RTs differ across item types, F_t = 11.85, F_crit = 3.75, p < .05. On average, as can be seen in Figure 2, the participants recalled list items slightly faster than critical lures, and almost three seconds faster than unrelated lures. Post hoc tests (non-bootstrap version) showed statistically significant differences for list items and unrelated lures, ψ̂ = -2659.47, 95% CI [-4916.77, -402.17], p_t < .05, δ_t = 0.41, and for critical lures and unrelated lures, ψ̂ = -2787.36, CI [-5273.67, -301.06], p_t < .05, δ_t = 0.40, but not for list items and critical lures, ψ̂ = 680.27, CI [-312.33, 1672.87], p_t > .05, δ_t = 0.29. One possible reason for the significant differences between the recall RTs of different item types is that recalled unrelated lures comprised longer words, thus requiring more time for typing them. However, the comparison of the average length (number of letters) of all the recalled words across different item types did not support such speculation: the length of both recalled list items (M = 6.0, SD = 1.1) and critical lures (M = 6.0, SD = 1.2) was even slightly greater than the length of the unrelated lures (M = 5.6, SD = 1.3).
Note. Error bars depict the standard errors of the mean.

Recognition reaction times
Figure 3 shows the recognition RTs for positive (pressing the D key as a response to the presented word indicating that the word was included in one of the lists) and negative (pressing the N key as a response to the presented word indicating that the word was not included in one of the lists) responses across
showed significant differences between different item types for positive, F_t = 15.60, F_crit = 4.45, p < .05, and negative responses, F_t = 45.01, F_crit = 3.18, p < .05, so we proceeded with further robust pairwise comparisons (non-bootstrap version; Table 5). In terms of positive responses, the RTs of list items did not differ from the RTs of critical lures; however, the RTs of list items and critical lures were significantly shorter compared to unrelated lures. As for negative responses, the results showed that the participants were the slowest at correctly rejecting critical lures, faster at wrongly rejecting list items, and the fastest at correctly rejecting unrelated lures.
As far as the differences between positive and negative responses within individual item types are concerned, a robust paired Yuen's test showed that the RTs for positive responses
different item types. A robust two-way repeated measures ANOVA (complete cases) showed a significant interaction between item types and responses, WTS(2) = 33.24, p < .001, indicating that when the participants responded to list items, critical lures, and unrelated lures, the differences in RTs were dependent on whether they responded positively or negatively. More concretely, for list items and critical lures, the RTs were shorter for the positive responses compared to the negative ones, but the opposite was true for unrelated lures.
We conducted post hoc tests by comparing (i) different item types within positive and negative responses separately, and (ii) by comparing different responses within list items, critical lures, and unrelated lures separately. For the first comparison, a robust one-way repeated measures ANOVA
3), a robust one-way repeated measures ANOVA showed that they varied significantly across item types, F_t = 75.47, F_crit = 3.96, p < .05. Robust pairwise comparisons (non-bootstrap version) showed that the RTs for correct responses to list items (Yes) and unrelated lures (No) were roughly equal, ψ̂ = -40.79, 95% CI [-123.44, 41.87], whereas the RTs for correct responses to critical lures (No) were significantly longer than the RTs for correct responses to list items, ψ̂ = -1011.27, CI [-1255.53, -767.01], and to unrelated lures, ψ̂ = 944.18, CI [685.18, 1203.25].

Results of Correlational Analysis
The results of the correlational analysis are presented in Table 6.

Correlations within accuracy data
In terms of accuracy rates per participant, we found no correlation between recalled list items and critical lures, between recognised list items and critical lures, or between recognised list items and unrelated lures. However, we found a mild positive correlation between the rates of recognised critical and unrelated lures. Concerning participants' recall and recognition rates, we found a moderate positive correlation between the recall and recognition rates for both list items and critical lures. In terms of data averaged by word lists, we observed a similar positive correlation between the recall and recognition rates for list items and critical lures.

Correlations within reaction times
Regarding participants' recall RTs across item types, we found a moderate positive correlation between list items and critical lures, but not between list items and unrelated lures or between critical and unrelated lures. In other words, the participants who recalled list items faster also took less time when recalling critical lures.
As with recall RTs, the analysis of recognition RTs showed a moderate positive correlation between list items and critical lures, and no correlation between list items and unrelated lures. However, contrary to the recall RTs, there was a mild positive correlation between recognition RTs of critical and unrelated lures, which indicates that the participants who recognised critical lures faster also did so for unrelated lures.
Considering recall RTs and RTs of positive responses to the recognition test, we found a mild positive correlation for list items, but no correlation for critical lures or unrelated lures, indicating that the participants who recalled list items faster also recognised them faster.

Correlations between accuracy data and reaction times
We found no correlation between recall rates and RTs for list items or critical lures. Likewise, we found no correlation between recognition rates and RTs for list items or unrelated lures; however, we found a mild negative correlation for critical lures, which indicates that the participants who responded faster to the onset of critical lures in the recognition test were also more likely to respond to critical lures positively.
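As an illustration of how a per-participant association between a rate and an RT could be computed, the sketch below implements a rank-based (Spearman) correlation in plain Python. This section does not state which coefficient was used, so the choice of Spearman's coefficient, the function names, and the example values are all assumptions made for the sake of the example.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-participant values: false recognition rate and mean RT (ms).
lure_rates = [0.10, 0.25, 0.40, 0.55, 0.70]
lure_rts = [1500, 1350, 1400, 1100, 1000]
rho = spearman(lure_rates, lure_rts)  # negative: faster responders accept more lures
```

A negative coefficient in such data would correspond to the pattern described above, where faster responses to critical lures go together with a higher rate of positive responses to them.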

Discussion
The purpose of the present study was to develop the first Slovenian (online) version of the DRM paradigm, explore its effectiveness in inducing false memories, and provide exploratory analyses. In the remaining part of the paper, we comment on the obtained results and relate them to prior literature.
We demonstrated a false memory effect similar to, albeit lower than, that of prior research (Anastasi, De Leon, & Rhodes, 2005; Iacullo & Marucci, 2016; Johansson & Stenberg, 2002; Stadler et al., 1999; Ulatowska & Olszewska, 2013). As expected, the participants were more likely to recall and recognise list items than critical lures. While recall rates were lower than recognition rates, their between-list variability was high for both list items and critical lures (80%-54% and 99%-60% for the recall and recognition of list items, and 47%-0% and 77%-13% for the recall and recognition of critical lures, respectively). The participants probably recognised list items and critical lures more frequently than they recalled them because the recognition test was administered after the recall test. As suggested by Roediger & McDermott (1995), this type of experimental design might cause source monitoring errors (Johnson et al., 1993).
Contrary to our expectation, the participants recalled slightly fewer critical lures than unrelated lures (1.32% vs. 1.51%). This might be because recalled unrelated lures of a certain list may have been presented in, or were even a critical lure of, some other list, making them more readily available for recall than truly unrelated lures. Indeed, further analysis showed that out of the total 169 recalled unrelated lures, 124 were truly unrelated, and 45 were either a list item or a critical lure from one of the previously presented lists. If we consider truly unrelated lures only, then critical lures were recalled more frequently than unrelated lures (1.32% vs. 1.00%), which is in line with our expectations. In contrast, the recognition test consisted of unrelated lures that were completely (or mostly) unrelated to all the lists that were presented to the participant, and the participants recognised many more critical lures than unrelated lures (15.6% vs. 2.7%), suggesting that the false memory effect was indeed present.
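The distinction between truly unrelated intrusions and intrusions that stem from earlier lists can be sketched as a simple lookup. The code below is a hypothetical illustration, not the procedure used in the study: the function names, the data layout, and the English example words are assumptions.

```python
from collections import Counter


def classify_intrusion(word, presented_lists):
    """Label a recalled word that was not on the current list.

    presented_lists holds (critical_lure, list_items) pairs for the lists
    the participant studied before the current one.
    """
    for critical_lure, items in presented_lists:
        if word == critical_lure:
            return "earlier critical lure"
        if word in items:
            return "earlier list item"
    return "truly unrelated"


def tally_intrusions(words, presented_lists):
    """Count intrusions per category for one recall protocol."""
    return Counter(classify_intrusion(w, presented_lists) for w in words)


# Hypothetical earlier lists and intrusions recalled on a later list.
earlier = [("sleep", {"bed", "rest", "tired"}), ("window", {"door", "glass"})]
counts = tally_intrusions(["glass", "river", "sleep", "stone"], earlier)
```

Restricting the comparison to the "truly unrelated" category corresponds to the adjusted figures reported above.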
Correlational analysis showed that the participants who recalled or recognised more list items did not necessarily do so for critical lures (recall and recognition) or unrelated lures (recognition). This suggests that the participants solved the task as instructed ("earnestly and with commitment") and did not simply strive to score as many words as possible, be they true or not. The correlational analysis also showed that the participants who recognised more critical lures also recognised more unrelated lures, which suggests that some participants are more prone than others to false remembering in general. In line with prior research (Anastasi, De Leon, & Rhodes, 2005; Iacullo & Marucci, 2016; Stadler et al., 1999), the results showed that the list item and critical lure recall were correlated with the list item and critical lure recognition on the level of both participants and word lists. This supports the idea that recall and recognition memory are strongly related (Haist et al., 1992) and indicates that the lists used to induce false memories are similarly effective for both recall and recognition.
The comparison of the 18 most effective lists between our study and five similar normative studies showed that the present study demonstrated the lowest rates for critical lures and the highest rates for list items. Although this is not supported by the only other published online DRM study (Murre et al., 2013), one plausible reason for such an outcome is that our results were affected by the online format of the study. Despite our best efforts to control the experimental environment (e.g., by asking participants to find a time and place that would minimise potential interruptions or by excluding from the analysis those who reported having experienced technical issues), the participants might have experienced additional distractions compared to the usual laboratory format, which could have undermined the internal validity of the study (Dandurand et al., 2008) and decreased the false memory effect. Moreover, since the participants conducted the study without the physical or virtual presence of the researcher, they might have acted differently from how they normally would (e.g., Friesen et al., 2020). Indeed, varying the demand characteristics (i.e., the effects that are not deliberately caused by the experimental scenario) in the DRM studies has been shown to decrease the false memory effect (Gallo et al., 2001). Taking all this together, a challenge for future studies is to find a way to increase the environmental control in online studies and to explore the effects of the socio-environmental context (e.g., online vs. laboratory setting or presence vs. absence of a researcher) on the results of the DRM studies.
Turning to RTs, we obtained the same pattern of results for the recall and the (positive responses to the) recognition data. As expected, the RTs of list items and critical lures did not differ from one another, whereas the RTs of unrelated lures were significantly longer than the RTs of both list items and critical lures. These results suggest not only that associatively related information requires less cognitive processing than unrelated information, as predicted by the relatedness effect (Gallo, 2006), but also that closely related false memories (i.e., critical lures) are processed like true memories (i.e., list items), which is in line with the findings by Thomas and Sommers (2005) and Tun et al. (1998).
Considering RTs of negative responses to the recognition test, the results showed the participants were the fastest in rejecting the unrelated lures, slower in rejecting the list items, and the slowest in rejecting the critical lures. While it is not surprising that the participants rejected the unrelated lures the fastest, it is unexpected that the critical lures were rejected at a slower pace compared to the list items. Since positive responses to list items and critical lures were processed similarly fast, one would expect negative responses to be processed similarly fast (Tun et al., 1998). Another reasonable outcome would be that list items, being actually presented, would be rejected slower than critical lures (Jou et al., 2004). Neither was true for our results, which points to the unique character of the critical lures used in the present study.
In terms of RTs of correct responses to the recognition test, the results showed the participants were similarly fast at accepting the list items and rejecting the unrelated lures, but significantly slower at rejecting the critical items, which agrees with prior research (Atkins & Reuter-Lorenz, 2008). This suggests that only closely related false memories (when they are not mistaken for true memories) require additional processing to be evaluated correctly, which makes the DRM paradigm particularly interesting for studying cognitive inhibition (Howe, 2005).
Correlational tests on RTs provided further evidence for a close resemblance between true and closely related false memories, as well as a dissimilarity between true and unrelated false memories. Namely, for both recall and recognition RTs, list items were associated with critical lures, but not with unrelated lures. In addition, recall RTs of critical lures were not associated with unrelated lures. However, we also observed a mild correlation between recognition RTs of critical lures and unrelated lures, which is inconsistent with the above conclusion and points to the similarity between closely related and unrelated false memories. The comparison between RTs of recall and recognition showed a correlation for list items, but not for critical and unrelated lures, which suggests that the relationship between recall and recognition memory is more complex for RTs than accuracy data. Lastly, we found that critical lure recognition rates were negatively associated with corresponding RTs, which indicates that the participants who responded faster to the onset of critical lures also produced more false memories or vice versa.
Together with the fact that the present study demonstrated the highest rate of true memories and the lowest rate of closely related false memories in comparison with five similar normative studies, the negative correlation between time and accuracy of false recognition implies that the participants favoured accuracy at the cost of RTs. Such a speed-accuracy trade-off is not surprising given the fact that we instructed the participants to avoid making mistakes, but we did not instruct them to respond as fast as possible. This finding is also consistent with two-process theories of false memory, such as activation-monitoring theory and FTT, which assume that the error-reducing mechanism is slower than the error-inflating mechanism (Arndt & Gould, 2006). Namely, studies showed that speeded retrieval disrupts the error-reducing mechanism, whereas unspeeded retrieval enhances it (Arndt & Gould, 2006). Put differently, increased retrieval time contributes to more true memories and fewer false memories.
To sum up, we showed that the effect of the DRM paradigm can be extended to the Slovenian language environment and demonstrated that the DRM paradigm can effectively be employed online. Future studies should explore the relationship between RTs and accuracy measures, the relationship between recall and recognition, as well as the potential effects of the experimental environment on the outcome of the DRM paradigm in more detail. We hope that the provided material, the normative data, and rich exploratory analyses will be of use not only to researchers interested in studying false memories in the Slovenian language but also to the wider field of false memory research.

Figure 2
Differences in recall RTs of list items, critical lures, and unrelated lures

Table 1
Descriptive Statistics for All the Measures Used in the Inferential Statistics of Accuracy and RT Data

Table 2
Recall Rates of Critical Lures and List Items for the 36 Slovenian Word Lists

Table 3
Recognition Rates of Critical Lures and List Items for the 36 Slovenian Word Lists

The data is ordered in descending order of the mean critical lure recall rate for the Slovenian lists. The Polish and Swedish studies did not report Ms and SDs for recalled and recognised list items. However, the authors generously provided all missing data except for the Ms and SDs for the recognised list items in the Swedish study.

[…] Y_t(53) = 9.74, p < .001, δ_t = 1.23, and for critical lures, M_diff = 0.33, CI [0.28, 0.37], Y_t(53) = 14.16, p < .001, δ_t = 1.23.