Psychometric properties of the Slovenian adaptation of the Revised Generic Occupational Stress Index Questionnaire (RG-OSI)

The Revised Generic Occupational Stress Index questionnaire (RG-OSI) employs a cognitive ergonomics approach that quantifies the burden of stressors on the cognitive resources of the employee. The model is structured as a 2-dimensional matrix in which each element is scored from 0 to 2 (sometimes with intermediate values of 0.5, 1.5 or 1.75) as a combination of several items according to multiple criteria. Because of the questionnaire's uncommon scoring system, our study aimed to explore the appropriateness of the existing scoring system and to obtain some information on the validity of the scale in a Slovenian sample. The questionnaire was administered to 349 Slovenian employees from different occupational groups, and the data were analysed by means of correspondence analysis, classical reliability and item analysis, and item response theory analysis. The results of the correspondence analysis demonstrate that the response categories of individual variables are not always ordered. Furthermore, we conducted a reliability analysis of the scales, developed short versions of the scales, and obtained some preliminary information on their validity. The current study provides evidence that the original scoring system, as described, may not be appropriate for a psychological measure from a psychometric viewpoint.

Changes in the content and organisation of work in recent decades have resulted in increasing exposure to psychosocial risks in the workplace, which can considerably impair workers' and organisational health. Several studies (EU-OSHA, 2010; Parent-Thirion, Fernández Macías, Hurley, & Vermeylen, 2005; Parent-Thirion et al., 2010) showed that exposure to psychosocial risk factors at work may result in a state of work-related stress that has a negative impact on employees, organizations and national economies. Longitudinal studies and systematic reviews (Niedhammer, Tek, Starke, & Siegrist, 2004; Salavecz et al., 2010; van Stolk, Staetsky, Hassan, & Kim, 2012) have indicated its association with heart disease, anxiety, depression, and musculoskeletal disorders. Moreover, there is strong and consistent evidence that high job demands, low control, and effort-reward imbalance are risk factors for mental and physical health problems (Halbesleben & Buckley, 2004; Siegrist, 2002).
Because of its negative impact on employees and work organisations, work-related stress has been widely studied. The most commonly used models for studying the effects of psychosocial work factors on workers' health are the demand-control model (JDC or job strain; Johnson & Hall, 1988; Karasek, Baker, Marxer, Ahlbom, & Theorell, 1981) and the Effort-Reward Imbalance model (ERI; Siegrist, 2002). The JDC model emphasizes the extrinsic and situational components of work (job demands, control/decision latitude and support at work), while the ERI model incorporates intrinsic characteristics of the individual (motivation, commitment to work). According to the JDC model, exposure to high psychological demands combined with low social support and low job control is associated with more negative health outcomes. The ERI model, on the other hand, proposes that high effort by employees (arising from external demands or internal motivation), if not matched by high rewards (economic rewards, recognition, promotion prospects, job security, etc.), may also lead to strain and negative health outcomes. Owing to the popularity of these models, occupational stress is mostly assessed with their corresponding instruments (e.g., the Job Content Questionnaire and the Effort-Reward Imbalance questionnaire).
Complementary to the sociological models of work-related stress, but less frequently used, are models arising from the cognitive ergonomics approach. Cognitive ergonomics is most commonly defined as a discipline that focuses on mental processes such as perception, memory, information processing, reasoning, and motor response as they affect interactions among humans and other elements of a system (Hollnagel, 2003; Vicente, 1999). One of the current models developed from a cognitive ergonomics perspective is the Occupational Stress Index (OSI) model (Belkić, 2003). It is based on Welford's information processing model (Welford, 1968) and incorporates risk factors for work-related stress at different levels of information transmission: sensory input, decision process and action. Accordingly, factors such as the nature and temporal density of incoming information, as well as the complexity, completeness and coherence of the processed information, are included to assess the burden that work processes place on the central nervous system of the employee. The model includes risk factors arising from different levels of the work environment, from the task level, work schedule, and physical and chemical factors to broader organizational factors. Taking this theoretical approach, Belkić and Savić (2008; Belkić, 2003) developed the Revised Generic OSI questionnaire (RG-OSI).

The Revised Generic OSI questionnaire
The generic form of the questionnaire is applicable to workers of any occupational profile and allows between-group analyses. It can also be taken as a starting point for the development of occupation-specific OSI questionnaires (e.g., Belkić, Emdad, & Theorell, 1998; Belkić & Nedić, 2007). Following a cognitive ergonomics perspective, work-related stress is evaluated in terms of the demands that work places on the mental resources of the employee.
The OSI model is structured as a 2-dimensional matrix. Seven stress aspects (underload, high demand, strictness, extrinsic time pressure, aversive/noxious exposures, avoidance/symbolic aversiveness, conflict/uncertainty) are combined with 4 levels of information transmission (input, central decision making, task performance, and general level; Figure 1). The first three levels of information transmission represent the basic cognitive ergonomic processes described by Welford (1968), whereas the 'general' level is added to include elements related to the broader work context.
Each stress aspect therefore includes risk factors from all three levels of information transmission as well as some general risk factors, as presented in Table 1. For example, the underload stress aspect includes 'low frequency of incoming signals' and 'no communication needed' at the input level, 'automatic decisions' at the central decision-making level, 'homogenous tasks' and 'simple tasks' at the task performance level, and 'inadequate pay' and 'no chances for upgrade' at the general level. Scores can be summed by level of information transmission and by stress aspect. The sum of all scores represents an attempt to quantify the overall burden of working conditions on an employee.
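The summations described above amount to row totals (stress aspects), column totals (levels of information transmission) and a grand total of the aspect-by-level matrix. The following is a minimal sketch, assuming the cell totals have already been derived from the scoring sheet; the names `ASPECTS`, `LEVELS` and `summarise_osi` are hypothetical and not part of the RG-OSI materials, and the example scores are invented.

```python
import numpy as np

# Hypothetical labels for the 7 x 4 OSI matrix described in the text.
ASPECTS = [
    "Underload", "High Demand", "Strictness", "Extrinsic Time Pressure",
    "Aversive/Noxious Exposures", "Avoidance/Symbolic Aversiveness",
    "Conflict/Uncertainty",
]
LEVELS = ["Input", "Central decision making", "Task performance", "General"]


def summarise_osi(cells: np.ndarray) -> dict:
    """Row (aspect), column (level) and grand totals of an OSI matrix.

    ASSUMPTION: cells[i, j] already holds the summed risk-factor scores
    for stress aspect i at information-transmission level j.
    """
    cells = np.asarray(cells, dtype=float)
    if cells.shape != (len(ASPECTS), len(LEVELS)):
        raise ValueError("expected a 7 x 4 matrix of cell totals")
    return {
        "by_aspect": dict(zip(ASPECTS, cells.sum(axis=1).tolist())),
        "by_level": dict(zip(LEVELS, cells.sum(axis=0).tolist())),
        "total": float(cells.sum()),
    }


# Invented respondent: only Underload and Conflict/Uncertainty are non-zero.
example = np.zeros((7, 4))
example[0] = [1.0, 2.0, 0.5, 3.0]
example[6, 1] = 2.0
summary = summarise_osi(example)
print(summary["total"], summary["by_aspect"]["Underload"])  # 8.5 6.5
```

Because both marginal breakdowns partition the same cells, the aspect totals and the level totals always add up to the same overall score.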
Sedlar, G. Sočan and L. Šprah Psychometric properties of the RG-OSI

Technically, each risk factor is defined by a pair of coordinates: the type of stress and the level of information transmission. It is scored according to an intricate scoring system, from 0 (absent) to 2 (strongly present), sometimes with intermediate values of 0.5, 1.5 or 1.75, where the scores are related to (typically non-linear) combinations of multiple criteria. The number of scoring categories for each risk factor and the corresponding scoring rules were based on the subjective judgment of the authors. Detailed instructions about which items need to be taken into account to obtain the score for each element of the matrix are provided in the scoring sheet. For example, the risk factor 'automatic decision making' (placed at the central decision-making level of the Underload stress aspect) can be scored with 0, 1 or 2 points. Zero points indicate the absence of the risk factor and are assigned when the respondent's answers show that he or she is not making automatic decisions: Decisions not automatic (I2 = c, d or e) OR Some supervisory work (I1 = b, c or d). Conversely, 2 points are assigned when the content of the included items reflects automatic decision making by the respondent: Fully automatic decisions (I2 = a) AND No supervisory work (I1 = a); see Table 2. Although the RG-OSI is used in organizational research (e.g., Soori, Rahimi, & Mohseni, 2008), there is no available evidence concerning the validity of the scoring rules or the psychometric characteristics of the instrument in general.
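The 'automatic decision making' rule quoted above can be written as a small function, which makes its logical (AND/OR) structure explicit. This is a sketch under stated assumptions: the function name is hypothetical, the response options are taken to be a-d for I1 and a-e for I2, and the 1-point case is not stated in the text, so it is inferred here by elimination as the only remaining combination (I2 = b, I1 = a).

```python
VALID_I1 = {"a", "b", "c", "d"}       # I1: supervisory work (assumed options)
VALID_I2 = {"a", "b", "c", "d", "e"}  # I2: automaticity of decisions (assumed options)


def score_automatic_decisions(i1: str, i2: str) -> int:
    """Hypothetical scorer for the 'automatic decision making' risk factor."""
    if i1 not in VALID_I1 or i2 not in VALID_I2:
        raise ValueError(f"unexpected responses: I1={i1!r}, I2={i2!r}")
    # 0 points: decisions not automatic OR some supervisory work
    if i2 in {"c", "d", "e"} or i1 in {"b", "c", "d"}:
        return 0
    # 2 points: fully automatic decisions AND no supervisory work
    if i2 == "a" and i1 == "a":
        return 2
    # ASSUMPTION: the only remaining combination (I2 = b, I1 = a) gets 1 point.
    return 1


for i2 in "abc":
    print(i2, [score_automatic_decisions(i1, i2) for i1 in "ab"])
# a [2, 0]
# b [1, 0]
# c [0, 0]
```

The printed grid shows why such scores are not linear composites: moving I1 from a to b costs 2 points when I2 = a but only 1 point when I2 = b, so no sum of separate item contributions reproduces the rule.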

Aim of the study
Both the classical test theory approach and the congeneric (factor-analytical) approach to the analysis of the internal structure of a test imply a (possibly weighted) sum score. The unique scoring system of the RG-OSI questionnaire is therefore not suitable for analysis with psychometric tools based on the linear model, because the total test scores are not linear composites of the ratings at the most elementary level. Item response theory (IRT) is not optimal either, since IRT, like the previously mentioned approaches, assumes that the test components can be freely combined, that is, that a low value on one component can be offset by a high value on another. Nevertheless, because we could find no viable non-compensatory model that could be fitted to our data, we used the classical and IRT approaches to explore the appropriateness of the existing scoring system and to estimate the reliability of the scale in a Slovenian sample. To obtain an initial assessment of the validity of the RG-OSI, we explored its correlations with some negative stress outcomes (intent to turnover, burnout, and work-family conflict).

(Belkić and Nedić, 2007, p. 63)

Method
Participants
Data were collected in different occupations from the public and private sectors; several employers were invited to participate in the study. Approval from the local psychological ethics committee was obtained prior to the study. After giving informed consent, employees were asked to complete the questionnaires anonymously according to the instructions and to return them in a sealed envelope.
The first group of participants (N = 141) were employees of four work organizations from different sectors (health, construction, industrial work) that took part in the project »The Support Programme for Employers and Employees for Reducing Work-related Stress and Its Adverse Effects«. The rest of the data were collected from employees of the Slovenian Association of Free Trade Unions (N = 33), employees of various police directorates across Slovenia (N = 83), and an opportunity sample of Slovenian employees in different occupations (N = 92). The total sample (N = 349) predominantly consisted of employees from government, public administration and defence (34.6%), health care and social work (20.8%), industry or manufacturing (18.3%), construction (8.7%), and education (5.1%). According to the Things-Data-People taxonomy (Fine & Cronshaw, 1999), the largest group of participants worked primarily with people (39.7%), 36.6% worked primarily with things and 20.8% worked primarily with information. In total, 170 participants (48.7%) were male. More than a third of the sample was 31 to 40 years old (35.1%), 26.4% of the participants were 41 to 50 years old, 22.6% were 20 to 30 years old, 15.9% were more than 50 years old, and 8.1% were less than 20 years old. Regarding education, most of the participants had completed high school (35.5%), university (28.5%), vocational school (7.6%) or higher vocational school (11.8%). The mean working experience was 14.7 years (SD = 12.2), the mean organizational tenure was 9.6 years (SD = 8.9), and most of the participants (90.2%) worked under a long-term contract.

Instruments
Workplace stress was assessed with the Revised Generic OSI questionnaire (RG-OSI; Belkić & Savić, 2008), which consists of 65 items. It measures risk factors at different levels of information transmission (input, central decision making, task performance and general level) that cover 7 stress aspects: Underload, High Demand, Strictness, Extrinsic Time Pressure, Aversive/Noxious Exposures, Avoidance/Symbolic Aversiveness, and Conflict/Uncertainty. A higher score on a particular stress aspect and a higher total OSI score indicate a higher level of work-related demands on the mental resources of the employee. Detailed information about the instrument is provided in the introductory section of the article.
Intent-to-turnover was measured with two items, adapted from existing questionnaires measuring an employee's desire to remain with the current employer (Konovsky & Cropanzano, 1991;Scott, Bishop, & Chen, 2003): 'I often think of quitting my current job', 'It is very possible for me to leave for another company next year'.
Burnout was measured with the Oldenburg Burnout Inventory (OLBI; Demerouti, Bakker, Nachreiner, & Schaufeli, 2001). Items measuring two dimensions of burnout, exhaustion and disengagement, are scored on a four-point scale from strongly agree (1) to strongly disagree (4). The Exhaustion subscale refers to general feelings of emptiness, overtaxing from work, a strong need for rest, and a state of physical exhaustion. The Disengagement subscale refers to distancing oneself from the object and the content of one's work and to negative, cynical attitudes and behaviours toward one's work in general. Each subscale includes four positively framed and four negatively framed items. The reliability of the scales in the Slovenian validation study of the questionnaire (Sedlar, Šprah, Tement, & Sočan, submitted) was moderate (α = .73 for the Exhaustion scale and α = .71 for the Disengagement scale). Higher reliability (α = .88) was obtained for scales consisting only of negatively framed items. The »negative items only« model also fitted the data better than the other models tested by means of confirmatory factor analysis. Therefore, only this part of the original questionnaire was used in our study.
Work-family conflict was measured with the Slovenian version of the Work-Family Conflict scales (Carlson, Kacmar, & Williams, 2000; Tement, Korunka, & Pfifer, 2010). For the purpose of this study, we used only the three subscales measuring the extent to which work responsibilities make participation in the family domain more difficult (through lack of time, strain or incompatible behaviour). The Time-Based and Strain-Based Work-Family Conflict subscales refer to work and family competing for a person's time or energy, respectively, while the Behaviour-Based Work-Family Conflict subscale refers to behaviours that are incompatible between these two domains. Each dimension consists of three items with a five-point response scale (1 = strongly disagree; 5 = strongly agree).

Procedure
The process of translation and adaptation of the RG-OSI instrument

Final version and back-translation
The final version of the instrument in Slovenian was created after all the stages described above and was back-translated into English by an independent translator whose mother tongue was English and who had no previous knowledge of the questionnaire. The English back-translation of the Slovenian version of the RG-OSI was presented to the author of the original RG-OSI questionnaire, who approved the modifications and its use in Slovenia. The final version of the translated questionnaire closely resembled the English version.

Analyses
Inspection of the data revealed that every variable had at least one missing value; the maximum number of missing values on a single variable was 19. On average, 1.9% of the values per variable were missing. To avoid losing these data, the missing values were estimated. Although Little's MCAR test (Little & Rubin, 1989) was significant (p = .01), we saw no substantive reasons to expect a Missing Not At Random (MNAR) mechanism. Missing data were imputed using the Expectation-Maximization (EM) algorithm, which estimates missing values in an iterative process and has been demonstrated to be an effective method of dealing with missing data (Graham, 2009). All analyses were conducted on the full sample of 349 participants.
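To make the EM step concrete, here is a minimal textbook sketch of EM under a multivariate normal model: the E-step replaces each missing entry by its conditional mean given the observed entries in that row (plus a covariance correction), and the M-step re-estimates the mean vector and covariance matrix. This is not SPSS's Missing Value Analysis implementation, whose options and stopping rules may differ; the function names `_conditional_fill` and `em_impute` are hypothetical.

```python
import numpy as np


def _conditional_fill(row, missing, mu, sigma):
    """E-step for one case: conditional mean of the missing entries given
    the observed ones, plus the conditional covariance correction."""
    filled = row.copy()
    correction = np.zeros_like(sigma)
    if not missing.any():
        return filled, correction
    observed = ~missing
    if not observed.any():                      # nothing observed in this row
        return mu.copy(), sigma.copy()
    s_oo = sigma[np.ix_(observed, observed)]
    s_mo = sigma[np.ix_(missing, observed)]
    beta = np.linalg.solve(s_oo, s_mo.T).T      # regression of missing on observed
    filled[missing] = mu[missing] + beta @ (row[observed] - mu[observed])
    correction[np.ix_(missing, missing)] = sigma[np.ix_(missing, missing)] - beta @ s_mo.T
    return filled, correction


def em_impute(x, max_iter=200, tol=1e-8):
    """EM estimates of the mean vector and covariance matrix, followed by
    imputation of each NaN with its conditional expectation."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    miss = np.isnan(x)
    mu = np.nanmean(x, axis=0)
    sigma = np.cov(np.where(miss, mu, x), rowvar=False, bias=True)
    for _ in range(max_iter):
        sum_x, sum_xx = np.zeros(p), np.zeros((p, p))
        for i in range(n):
            xi, ci = _conditional_fill(x[i], miss[i], mu, sigma)
            sum_x += xi
            sum_xx += np.outer(xi, xi) + ci
        new_mu = sum_x / n                                   # M-step (ML estimates)
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        shift = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if shift < tol:
            break
    imputed = np.vstack([_conditional_fill(x[i], miss[i], mu, sigma)[0] for i in range(n)])
    return imputed, mu, sigma
```

Note that the imputed values are conditional means without added residual noise, so analyses on the completed data set can slightly understate variability.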
SPSS Statistics 21.0 was used for all statistical analyses except the item response analysis, which was done in IRTPRO (Cai, Thissen, & du Toit, 2011). A correspondence analysis was conducted to check whether the multivariate categorical responses are ordered (i.e., scaled on at least an ordinal scale). To evaluate the ability of each item to discriminate between levels of the different stress aspects, classical item analysis and item response analysis were conducted. The reliability of the scales was estimated with internal consistency coefficients (α). Items with lower discriminating power were eliminated, and the correlations of both the resulting shorter scales and the original scales with selected variables were compared.

Results
This section consists of three parts. We begin by presenting the scoring system for one of the scales, followed by an evaluation of the scoring system by means of correspondence analysis and IRT analysis. Finally, we present the results of the reliability analysis for the original and shortened scales and their correlations with selected variables.

Evaluation of the scoring system (subscale Underload)
For the purpose of this article, we illustrate the evaluation of the scoring system with the first dimension, Underload. Table 2 presents the structure of the dimension according to cognitive ergonomic theory. The total score is obtained as the sum of the risk factors at three levels of information transmission (input, central decision making, task performance) and the general risk factors. Information about the items that need to be taken into account to obtain the score for each risk factor, together with the corresponding scoring rules, is presented in the scoring sheet (see Table 3 for a demonstration). Some of the risk factors are scored with 0 or 2 (IU2, IU3, OU2, GU1), some with 0, 1 or 2 (IU1, CUT, GU3), while others (OU1, OU3, GU2, GU4) also include values of 0.5 and 1.5. The scoring instructions in the right column show that some scores are obtained from responses to a single item (OU1, OU3, GU1-GU4) and some are related to (non-linear) combinations of multiple criteria (IU1-IU3, CUT, OU1, OU2), as exemplified in the introduction.

Table 2. Structure of the Underload dimension on different levels of information transmission
In the first step, we checked whether the possible item score values are empirically ordered (for instance, whether a score of 0.5 statistically implies a lower level of the measured trait than a score of 1). We used multiple correspondence analysis for this purpose. Correspondence analysis is primarily a (multivariate) descriptive data-analytic technique (in which case it is analogous to principal component analysis), but it can also be used as a scaling method (Greenacre, 2007; McDonald, 1999).
The first dimension captures the largest amount of variability in the data; in our example, it explained 20% of the inertia and had a moderate reliability estimate (α = 0.61). To illustrate the relative ranking and ordering of the centroid coordinates of the response categories, a graphical presentation was made (Figure 1). If the response categories accurately measured the burden of a given risk factor (i.e., a higher number of points on a variable meant a higher burden than a lower number of points), the centroid coordinates of the response categories of that variable should lie on an ordinal scale. The colours in each row should therefore be arranged from bright to dark or vice versa. As we can see, this was not the case for variables IU1, OU3 and GU4, where the intermediate values were not ordered as expected. We can also note large differences in the spread of the scaled positions of the score values. Items with a smaller spread are more likely to have low discrimination indices than items with a larger spread, as will be demonstrated below.
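The ordering check can be sketched as a multiple correspondence analysis of the indicator (one-hot) matrix: each observed score value becomes a category column, the SVD of the standardised residuals gives category coordinates, and a variable passes if its coordinates on the first dimension are monotone in the score value. This is a minimal numpy sketch, not the SPSS procedure used in the study; the function names are hypothetical, and `mca_dimension_alpha` uses the formula J/(J-1) * (1 - 1/(J * lambda)) commonly used for the reliability of a homogeneity-analysis/MCA dimension, where lambda is the principal inertia of the indicator-matrix analysis.

```python
import numpy as np


def mca(data: np.ndarray, n_dims: int = 2):
    """MCA via SVD of the indicator matrix (no missing values allowed).

    Returns, per variable, a dict mapping each observed score value to its
    standard coordinates on the first n_dims dimensions, plus the principal
    inertias.
    """
    data = np.asarray(data)
    levels = [np.unique(data[:, j]) for j in range(data.shape[1])]
    # Indicator (one-hot) matrix Z: one column per observed score value.
    Z = np.hstack([(data[:, [j]] == lv[None, :]).astype(float)
                   for j, lv in enumerate(levels)])
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sv, Vt = np.linalg.svd(S, full_matrices=False)
    col_std = Vt.T / np.sqrt(c)[:, None]         # category standard coordinates
    coords, start = [], 0
    for lv in levels:
        block = col_std[start:start + len(lv), :n_dims]
        coords.append({float(v): block[k] for k, v in enumerate(lv)})
        start += len(lv)
    return coords, sv ** 2


def categories_ordered(coord_map: dict, dim: int = 0) -> bool:
    """Are category coordinates monotone in the score value on `dim`?

    Either direction counts, because the sign of an MCA axis is arbitrary.
    """
    vals = [coord_map[score][dim] for score in sorted(coord_map)]
    steps = np.diff(vals)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))


def mca_dimension_alpha(inertia: float, n_vars: int) -> float:
    """Alpha of an MCA dimension: J/(J-1) * (1 - 1/(J * lambda))."""
    return n_vars / (n_vars - 1) * (1.0 - 1.0 / (n_vars * inertia))
```

Calling `categories_ordered(coords[j])` for each variable j reproduces, in a yes/no form, the visual check of whether the colours in a row run from bright to dark; the spread of the values in `coords[j]` corresponds to the spread of scaled positions discussed above.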
In the next step, an item response analysis was performed in IRTPRO (Cai et al., 2011) to evaluate the Underload scale variables. The analysis was conducted using all 11 variables from all four levels of information transmission, because the number of variables at the separate levels was too small.
Item response theory (IRT) relates characteristics of items and characteristics of individuals to the probability of choosing each of the response categories. This probabilistic relationship is mathematically defined as a nonlinear regression of the probability of choosing an item response category on a latent trait (the item response function). The two-parameter logistic (2PL) model was used for dichotomously scored variables. For variables with more than two response options, the graded response model, a generalization of the 2PL model to polytomous items, was used (for more information on these models and on item response theory in general, see, for instance, Embretson & Reise, 2000).
Table 4 presents the statistics for each item: the item discrimination indices, the item thresholds, and the test of model fit. The discrimination index should always be positive, and values above 2 generally indicate very good discrimination. The threshold (difficulty) index of category j is the value of the measured trait (in our case, Underload) at which the probability of responding with category j or higher is 50%. For dichotomous items, this simplifies to the trait value at which the positive and the negative response are equally likely.
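The graded response model can be sketched in a few lines: each threshold b_k defines a cumulative curve P(X ≥ k) = logistic(a(θ − b_k)), and category probabilities are differences of adjacent cumulative curves, so at θ = b_k the probability of responding in category k or higher is exactly 0.5, as stated above. This minimal sketch is written in the logistic metric without the 1.7 scaling constant; the function names and parameter values are hypothetical and are not the study's IRTPRO estimates.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def grm_category_probs(theta, a, thresholds):
    """Graded response model category probabilities P(X = k | theta).

    a: discrimination; thresholds: increasing b_1 < ... < b_m, so the item
    has m + 1 ordered categories. With one threshold this is the 2PL model.
    Returns an array of shape (len(theta), m + 1).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(thresholds, dtype=float)
    # Cumulative curves P(X >= k) = logistic(a * (theta - b_k)), k = 1..m
    star = logistic(a * (theta[:, None] - b[None, :]))
    n = theta.size
    # Pad with P(X >= 0) = 1 and P(X >= m + 1) = 0, then take differences
    cum = np.hstack([np.ones((n, 1)), star, np.zeros((n, 1))])
    return cum[:, :-1] - cum[:, 1:]


# Low discrimination plus close thresholds: the middle category stays rare
grid = np.linspace(-3, 3, 61)
middle = grm_category_probs(grid, a=0.4, thresholds=[-0.2, 0.2])[:, 1]
print(middle.max())  # stays below 0.05 across the whole trait range
```

The last lines illustrate the pattern reported for GU3: with a low slope and thresholds close together, the intermediate category has a nearly flat, low response curve.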
Variables IU1 and OU1 had very high discrimination, while the other variables discriminated between respondents moderately or inadequately. The chi-square goodness-of-fit test indicated a lack of fit for domains IU1, OU1, GU2 and GU3; for the remaining items, the fit seemed to be satisfactory. Since the domains are classified into levels of transmission, local dependence of items within a level could be expected; however, the analysis of residuals (not presented here) did not reveal any problems in this respect. The highest absolute residual value was 2.9, well below the value of 10 that Cai et al. (2011) consider to be high.
Item information curves showed that items IU1 and OU1 provided almost all of the information (see Figure 2), while the other items had little information value (see Figure 3 for a sample graph). The results of the item response analysis also showed that the intermediate response categories were problematic from the discrimination point of view (see Figure 4). Figure 4 shows that for domain GU3, the intermediate item score category 1 was practically useless: its category response function was almost flat, which means that this category played no role in discriminating between persons. In general, such flat category response curves typically emerge when low discrimination is combined with relatively small distances between category thresholds (see Table 4).
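Why a few items can supply almost all of the information follows from the item information function, which scales with the square of the discrimination. The minimal sketch below computes graded-response item information as the sum over categories of P_k'(θ)² / P_k(θ); for a single threshold it reduces to the 2PL information a²P(1 − P). The function name and all parameter values are hypothetical illustrations, not estimates from this study.

```python
import numpy as np


def grm_item_information(theta, a, thresholds):
    """Fisher information of a graded-response item at each theta.

    I(theta) = sum_k P_k'(theta)^2 / P_k(theta), where P_k is the
    probability of category k. For one threshold this reduces to the 2PL
    information a^2 * P * (1 - P).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(thresholds, dtype=float)
    n = theta.size
    star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    cum = np.hstack([np.ones((n, 1)), star, np.zeros((n, 1))])
    d_cum = a * cum * (1.0 - cum)          # derivative of each cumulative curve
    p = np.clip(cum[:, :-1] - cum[:, 1:], 1e-300, None)
    dp = d_cum[:, :-1] - d_cum[:, 1:]
    return np.sum(dp ** 2 / p, axis=1)


# Same thresholds, different slopes: information at theta = 0
low = grm_item_information(0.0, a=0.5, thresholds=[-1.0, 1.0])[0]
high = grm_item_information(0.0, a=2.5, thresholds=[-1.0, 1.0])[0]
print(round(low, 2), round(high, 2))  # the steep item is roughly ten times as informative
```

Summing such curves over items gives the test information, so a scale in which most slopes are low depends heavily on its one or two steep items.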
We might also note that the extreme difficulty estimates of some dichotomous items (in this case, IU3 and GU1) are related to their low discrimination: when the slope of the item characteristic curve is low, a small variation in the slope results in a large difference in the threshold value, which is also reflected in large standard errors for such parameters (standard errors are not shown in Table 4; for instance, the standard error of the difficulty parameter of domain IU3 was about 1066). Such values should therefore be considered with great reservation.
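The instability of thresholds under low slopes can be made explicit. Many IRT programs estimate an item in slope-intercept form, logit = aθ + c, and derive the threshold as b = −c/a; we assume that form here. Because ∂b/∂c = −1/a and ∂b/∂a = −b/a, the delta-method variance is (Var(c) + b²Var(a) + 2b·Cov(a, c)) / a², which explodes as a approaches zero. The function names and the variances below are hypothetical; the study's standard errors came from IRTPRO.

```python
import math


def difficulty(a: float, c: float) -> float:
    """Threshold b = -c / a for an item with logit a * theta + c."""
    return -c / a


def difficulty_se(a, c, var_a, var_c, cov_ac=0.0):
    """Delta-method standard error of b = -c / a.

    Var(b) ~ (Var(c) + b^2 Var(a) + 2 b Cov(a, c)) / a^2
    """
    b = -c / a
    return math.sqrt((var_c + b * b * var_a + 2.0 * b * cov_ac) / (a * a))


# Hypothetical intercept c = -2 with identical sampling variances for a and c
for a in (1.0, 0.10, 0.05):
    print(a, round(difficulty(a, -2.0), 1), round(difficulty_se(a, -2.0, 0.01, 0.04), 1))
```

With the same intercept and the same sampling variances, shrinking the slope from 1.0 to 0.05 moves b from 2 to 40 and inflates its standard error from about 0.3 to about 80.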
We do not discuss the results for the remaining dimensions in detail; we only note that the situation was quite similar. There were many items with either low or implausibly high estimates of the discrimination parameter (the latter often in combination with implausibly high threshold values). Furthermore, the analysis of residuals revealed slight local dependence problems (i.e., one or two local dependence (LD) values larger than 10) in two scales and severe local dependence problems in three scales. For more detailed information, see the Appendix.

Reliability analysis
A reliability analysis of all dimensions was conducted, first at the level of subscales (i.e., the levels of information transmission within each stress aspect). Only the results for the Underload dimension are presented to illustrate the procedure and computations. The reliability coefficients, corrected item-total correlations and reliability coefficients if item deleted are presented in Table 5. The reliability coefficients were very low (α = 0.20-0.27), indicating limited psychometric usefulness of the scale, which could be due to poor interrelatedness between items or overly heterogeneous constructs. Therefore, the variables with the lowest item-total correlations (0.00-0.09) were deleted, and the analysis was repeated on the remaining seven variables.
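The three statistics reported in Table 5 can be computed directly from a persons-by-items score matrix. The sketch below shows coefficient alpha, the corrected item-total correlation r_jt (each item correlated with the total of the other items) and alpha if item deleted; the function names are hypothetical, and SPSS's RELIABILITY procedure is what the study actually used.

```python
import numpy as np


def cronbach_alpha(x: np.ndarray) -> float:
    """Coefficient alpha for an n_persons x k_items score matrix (k >= 2)."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def item_analysis(x: np.ndarray):
    """Corrected item-total correlations (r_jt) and alpha if item deleted.

    Requires at least three items so that alpha is defined after deletion.
    """
    x = np.asarray(x, dtype=float)
    total = x.sum(axis=1)
    r_jt, alpha_if_deleted = [], []
    for j in range(x.shape[1]):
        rest = total - x[:, j]   # total score without item j ("corrected")
        r_jt.append(np.corrcoef(x[:, j], rest)[0, 1])
        alpha_if_deleted.append(cronbach_alpha(np.delete(x, j, axis=1)))
    return np.array(r_jt), np.array(alpha_if_deleted)
```

An item with r_jt near zero, like those removed in the first round, typically shows an alpha-if-deleted value above the current alpha, which is the signal used for elimination.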
In the second round of item elimination (Table 6), items IU2 and IU3 were deleted. The short Underload scale therefore consisted of 5 variables but still had relatively low reliability (α = .40).
As a result of the reliability analysis, all the IU items were eliminated; hence, the Underload dimension of the short RG-OSI scale had no input items. Items were removed sequentially from the other dimensions in a similar manner to maximise reliability. Table 7 shows that only a few items had to be eliminated from the dimensions Extrinsic Time Pressure (one item), Aversive/Noxious Exposures (two items), Avoidance/Symbolic Aversiveness and Conflict/Uncertainty (three items). More items had to be removed from the dimensions Strictness (five), Underload (seven) and High Demand (nine). After the elimination of redundant items, the Extrinsic Time Pressure dimension included the lowest number of items (five), whereas the Conflict/Uncertainty dimension included the highest number (twelve). The reliabilities of the scales after two rounds of item elimination ranged from relatively low (Extrinsic Time Pressure α = .50; Strictness α = .66) to moderate (Aversive/Noxious Exposures α = .70; High Demands α = .71; Avoidance/Symbolic Aversiveness α = .80; Conflict/Uncertainty α = .83). As with the Underload dimension, the Extrinsic Time Pressure and Strictness dimensions of the short RG-OSI scale had no input items, and the latter (Strictness) also lacked output items.
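Sequential removal "to maximise reliability" can be sketched as greedy backward elimination. Note that the study removed several low-r_jt items per round, whereas this sketch uses a different, one-at-a-time rule: at each step it drops the item whose removal raises alpha the most, and it stops when no removal raises alpha or a minimum length is reached. The function `backward_eliminate` and its `min_items` parameter are hypothetical.

```python
import numpy as np


def cronbach_alpha(x):
    """Coefficient alpha for an n_persons x k_items matrix (k >= 2)."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    return k / (k - 1) * (1.0 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))


def backward_eliminate(x, names, min_items=3):
    """Greedy backward elimination that maximises coefficient alpha.

    At each step, drop the item whose removal increases alpha the most;
    stop when no removal increases alpha or only `min_items` remain.
    """
    x = np.asarray(x, dtype=float)
    keep = list(range(x.shape[1]))
    current = cronbach_alpha(x[:, keep])
    while len(keep) > max(min_items, 2):
        trials = [
            (cronbach_alpha(x[:, [k for k in keep if k != j]]), j) for j in keep
        ]
        best_alpha, worst_item = max(trials)
        if best_alpha <= current:
            break
        keep.remove(worst_item)
        current = best_alpha
    return [names[j] for j in keep], current
```

Because each step is chosen to increase alpha on the same sample, the final alpha is optimistically biased, which is one reason reliability estimates of shortened scales should be re-checked on new data.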

Table fragment (Underload item analysis): General Underload Total (GUT) = GU1 + GU2 + GU3 + GU4. Note. r_jt = item discrimination (corrected item-total correlation); α(i) = coefficient alpha if item deleted; * = eliminated items.

Evaluation of the shorter version
The aim of construct validation is to embed a purported measure of a construct in a nomological network, that is, to establish its relations to other variables (Cronbach & Meehl, 1955). Although the construct validation of the RG-OSI was not the principal aim of this study, we tried to gather some preliminary information on the relations between the RG-OSI scales and some related variables. Specifically, we tested whether the participants implicitly differentiated between the stress aspects measured by the RG-OSI and other constructs (intent to turnover, burnout and work-family conflict). The original and short RG-OSI scales were correlated with the respective measures. The results are presented in Table 8. The low correlations between the OSI scales and the correlates imply that the participants differentiated between the constructs, albeit within a limited range. Moreover, the newly constructed shorter scales had almost the same correlations with these measures as the original scales.

Discussion
We evaluated the basic structural psychometric characteristics of the Slovenian translation of the RG-OSI. Because it takes the less commonly used cognitive-ergonomics approach, which tries to quantify the burden of work-related stress on the mental resources of the employee, the questionnaire seemed to be a useful alternative to existing questionnaires measuring work-related stress.
However, one of its major limitations seemed to be the intricate scoring system, in which each variable score is obtained as a combination of multiple criteria. Both correspondence analysis and item response analysis indicated that this approach is of limited psychometric value. The centroid coordinates indicated that the response categories are not always ordered, which calls into question the predetermined scoring system established by the original authors (example in Table 3). Intermediate values seem to be especially problematic, also from the discrimination point of view. The examination of item information curves indicated that most of the information in the examined dimension is provided by 2 items (we remind the reader that the maximum information of an item is related to its discrimination), while the other items had considerably less information value and hence less discriminating power. The major reason for this most probably resides in the construct's definition, which is too heterogeneous to be conceptualized as a unidimensional latent trait. The fact that only two of the seven subscales appeared to be free of local dependence indicates a possible need for a multidimensional representation. Unfortunately, the instrument in its present form does not include enough items to enable such modeling.
Based on the results of the reliability analysis (Table 5), items with low discriminating power were eliminated. After two rounds of elimination, the final Underload scale consisted of five instead of eleven variables. A similar procedure was conducted for all subscales. The correlations between the shortened scales and related constructs (intent to turnover, burnout and work-family conflict) were compared with the correlations between the original RG-OSI scales and the same external correlates, showing no major differences (Table 8). We would expect the better internal structure of the short RG-OSI (according to the item response and reliability analyses) to result in higher correlations with other variables. This may not have been the case because some important content was eliminated from the questionnaire (e.g., the Underload dimension has no input underload items), which seems problematic from a cognitive-ergonomics perspective. To assess the overall burden of work circumstances on the employee, the cognitive-ergonomics approach presumes that demands on the mental resources of the employee span different levels of information transmission: sensory input, decision process and action. The absence of any of these aspects could therefore severely impair the validity of the questionnaire. On the other hand, these results may be an instance of a well-known phenomenon: increasing internal consistency may decrease correlations with complex criteria (McDonald, 1999). Still, we believe that finding scales with a sound internal structure should always be the first step in test construction; in a second step, such scales may be combined to achieve higher predictive power.
In general, all internal structure analyses showed a weak internal structure, reflected in the fact that many items had a low spread of scaled values in the multiple correspondence analysis and low discrimination indices (both classical and IRT ones). Since the items in the RG-OSI are presented relatively clearly, the answer should probably not be sought in unclear wording or similar item deficiencies. It seems that the major reason for the weak structure resides in the constructs' definitions. In other words, the measured constructs may be too heterogeneous to be conceptualized as latent traits.
Of course, this makes the interpretation of the proposed dimension scores unclear. It is a matter for further research to determine whether a more suitable psychometric model can be found for the RG-OSI or whether it should be abandoned altogether.

Limitations of the study
The first potential drawback concerns the rather specific sample, which was not randomly selected from the full range of possible occupations. Our sample was predominantly restricted to employees in government, public administration and defence, industry or manufacturing, construction and education. Moreover, employees aged 31 to 50 and employees who had completed either high school or university were overrepresented. However, establishing norms was not the aim of this study. Although the structure of the sample did not perfectly reflect the structure of the Slovene population of employees, we are convinced that our sample was sufficiently heterogeneous to rule out interpreting the low reliability as a consequence of a highly homogeneous sample. Another potential drawback is the reliance on self-report and on very long questionnaires containing many items (e.g., the RG-OSI consists of 65 items), which could affect participants' concentration and motivation.

Future research and implications for practice
To our knowledge, this is one of the first attempts to assess the quality of the OSI questionnaire in its general form, mainly because the occupation-specific OSI questionnaires that can be developed from the general model are used more frequently. Our results show that the questionnaire, in both its original and its short form, has some severe psychometric limitations, which should not be overlooked when interpreting results. The short RG-OSI scales showed better internal characteristics and seemed to behave similarly to the original ones with regard to their relations with some other variables, but they lacked some important aspects of work-related demands. Moreover, the short RG-OSI would considerably reduce the time needed to complete the questionnaire, but additional work would be needed to replace the eliminated aspects and to improve validity and criterion correlations. We should also note that the reliability estimates of the short scales should be interpreted with caution: they were not determined on a new sample and are therefore overestimated, at least to some extent.
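Why selecting items and estimating reliability on the same sample inflates the estimate can be illustrated with a split-sample sketch. This is a hypothetical illustration, not an analysis reported in the paper: items are selected on a random derivation half (here, simply the `n_keep` highest corrected item-total correlations, an assumed rule), and alpha of the resulting short form is then computed on both the derivation half and the untouched holdout half.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_vars.sum() / total_var))

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the other items."""
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

def split_sample_alpha(items: np.ndarray, n_keep: int, seed: int = 0):
    """Select a short form on one random half of the respondents, then report
    (kept item indices, alpha on derivation half, alpha on holdout half)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(items.shape[0])
    half = items.shape[0] // 2
    derive, holdout = items[order[:half]], items[order[half:]]
    r = corrected_item_total(derive)
    keep = np.sort(np.argsort(r)[::-1][:n_keep])  # best n_keep items, in order
    return keep, cronbach_alpha(derive[:, keep]), cronbach_alpha(holdout[:, keep])
```

With weakly related items, the derivation-half alpha will typically exceed the holdout alpha, because the selection step capitalizes on chance correlations that do not replicate in new respondents.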
The hierarchical structure of the OSI model, with components at the highest level and more specific attributes nested within components, could be better tested using a non-compensatory model with multiple latent dimensions. So far, multidimensional IRT models have mostly been used in the field of cognitive diagnosis, where relatively long tests with heterogeneous items that vary in the number and types of cognitive operations or skills are normally used (see Embretson & Yang, 2013). Apart from this, there is a lack of non-additive measurement models that would be appropriate for psychological tests consisting of multiple dimensions. Multidimensional models such as those suggested by Embretson and Yang (2013) could not be applied to the RG-OSI questionnaire because it lacks a theoretically and empirically plausible account of the underlying components and attributes. Our use of mainstream analyses may therefore have done the instrument an injustice. However, until more suitable models emerge, such results may be viewed as the best available approximation to optimal evidence about the psychometric quality of the test. Therefore, we cannot recommend routine use of the RG-OSI in applied psychology at this stage.
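The difference between compensatory and non-compensatory models can be made concrete with two standard textbook forms, shown here for dichotomous items as an illustration only (the RG-OSI elements are polytomous, and no such model was fitted in this study). In the compensatory multidimensional 2PL model, a high level on one dimension can offset a low level on another because the dimensions enter a single weighted sum. In a conjunctive, non-compensatory model of the multicomponent latent trait type, the response probability is a product over components, so a low level on any one component keeps the probability low. Here \(\theta_{jk}\) is the level of person \(j\) on dimension \(k\), \(a_{ik}\) and \(d_i\) are the discrimination and intercept parameters of item \(i\), and \(b_{ik}\) is the difficulty of component \(k\) of item \(i\).

```latex
% Compensatory multidimensional 2PL: dimensions combine additively
P(X_{ij}=1 \mid \boldsymbol{\theta}_j)
  = \frac{\exp\left(\sum_{k=1}^{K} a_{ik}\theta_{jk} + d_i\right)}
         {1 + \exp\left(\sum_{k=1}^{K} a_{ik}\theta_{jk} + d_i\right)}

% Non-compensatory (conjunctive) model: product over components
P(X_{ij}=1 \mid \boldsymbol{\theta}_j)
  = \prod_{k=1}^{K} \frac{\exp(\theta_{jk} - b_{ik})}{1 + \exp(\theta_{jk} - b_{ik})}
```

For example, if two components have probabilities of .90 and .20 of being "passed", the conjunctive model gives .90 × .20 = .18, however high the first component is.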

Conclusions
Evaluation of the psychometric adequacy of the Slovenian adaptation of the RG-OSI questionnaire showed that the questionnaire has some severe psychometric limitations regarding its scoring system and the conceptualization of the measured constructs. Shortened scales have been proposed; they would considerably shorten the time needed to complete the questionnaire, but would require further development to cover all the theoretical aspects of the cognitive-ergonomics perspective.
Our study also suggests that non-additive measurement models appropriate for psychological tests consisting of multiple dimensions should be given further attention.

Response patterns of seven items (IAVOI1, IAVOI3, OAVOIT1, GAVOI2, GAVOI3, GAVOI4, GAVOI5) had a significant (p < .05) lack of fit to their respective model.

Appendix
Lack of promotion prospects = GU3
Lack of recognition of good work = GU4
General Underload Total (GUT) = GU1 + GU2 + GU3 + GU4
Total Underload Score = IUT + CUT + OUT + GUT

The number of variables differs across the levels of information transmission: the Input and Output levels each consist of three risk factors, Central decision making of one risk factor, and the General level of four risk factors. The Total Underload score is calculated as the sum of all subscales.
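The aggregation step of the scoring can be sketched as follows, assuming the element scores (0 to 2, sometimes with intermediate values of 0.5, 1.5 or 1.75) have already been derived from the questionnaire answers according to the scoring sheet, which is not reproduced here. Only IU1, IU2, OU1, OU2 and GU1 to GU4 appear in the text; the codes IU3, OU3 and CU1 are hypothetical names chosen to follow the same pattern.

```python
# Hypothetical item codes: IU3, OU3 and CU1 are assumed names, following the
# pattern of the codes that do appear in the paper (IU1, IU2, OU1, OU2, GU1-GU4).
UNDERLOAD_SUBSCALES = {
    "IUT": ["IU1", "IU2", "IU3"],          # input level: three risk factors
    "CUT": ["CU1"],                        # central decision making: one
    "OUT": ["OU1", "OU2", "OU3"],          # output level: three
    "GUT": ["GU1", "GU2", "GU3", "GU4"],   # general level: four
}
# Element scores run from 0 to 2, with occasional intermediate values.
ALLOWED_SCORES = {0.0, 0.5, 1.0, 1.5, 1.75, 2.0}

def underload_scores(item_scores: dict[str, float]) -> dict[str, float]:
    """Sum element scores into subscale totals and the Total Underload Score."""
    totals = {}
    for subscale, codes in UNDERLOAD_SUBSCALES.items():
        for code in codes:
            if item_scores[code] not in ALLOWED_SCORES:
                raise ValueError(f"{code}: unexpected score {item_scores[code]}")
        totals[subscale] = sum(item_scores[code] for code in codes)
    # Total Underload Score = IUT + CUT + OUT + GUT
    totals["Total"] = sum(totals[s] for s in UNDERLOAD_SUBSCALES)
    return totals
```

Because every total is a plain unweighted sum, an element scored 2 contributes the same amount regardless of the level of information transmission it belongs to; this additivity is precisely the assumption questioned in the Discussion.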
… b (no monotonous tasks)) OR (H1=several tasks) OR (J2-9=interacts with persons or several machines)
1 Moderately homogeneous information: (H3=c or d) AND [(H1=a few tasks) OR (J2-9=limited interactions with persons / few machines)]
2 Maximally homogeneous information: (H3=d (monotonous tasks)) AND (H1=very few, simple tasks) AND (J2-9=minimal interactions with persons / few machines)
IU2
0 Controls speed him- or herself (F3=a or b AND, if J3=yes, controls speed of devices) OR >1 new signal per minute (overall assessment, see especially J3-9)
2 Doesn't control speed (F3=c or d, OR J3=yes and doesn't control speed of devices) AND <1 new signal per minute (overall assessment, see especially J1- …
… (I2=c, d or e) OR some supervisory work (I1=b, c or d)
1 Fairly automatic decisions but judgment required (I2=b) AND no supervisory work ( …
… b, or c (There are possibilities to upgrade job title or advance one's career, and no active opposition, to some extent)
1.5 C5=c (Not very much)
2 C5=d (Not at all)

N. Sedlar, G. Sočan and L. Šprah

… the variables (not shown in the paper) showed a skewed distribution of categorical answers for the first few variables included in the Underload scale (IU1-OU2), because the majority of the participants answered with 0.

Figure 1. Graphical presentation of response scale values.

Figure 2. Item information curves for the domains IU1 (left) and OU1 (right).

Figure 3. Item characteristic curves for the domain GU1. The graph shows the probability of choosing each item response category as a function of the latent trait θ.

Figure 4. Item characteristic curves for the domain GU3. The graph shows the probability of choosing each item response category as a function of the latent trait θ.

Table 1.
The Occupational Stress Index (OSI), Version 2003
2. Expert panel on the suitability of the EN-SLO translation
Four experts (two general psychologists, one psychologist specialized in health and one psychologist specialized in occupational psychology, all of them experienced in instrument development and translation) reviewed discrepancies between the English version of the RG-OSI and the Slovenian translations and suggested alternatives. They ensured that the proposed alternatives were conceptually and culturally equivalent to the original.
3. Pre-testing of the RG-OSI instrument and cognitive interviewing
Pre-testing of the instrument was carried out in a group of 11 respondents. They represented males and females from all age groups (18 years of age and older), with different socioeconomic backgrounds and different occupations. The interviews were conducted by experienced interviewers. Each respondent individually filled out the RG-OSI questionnaire and was afterwards systematically debriefed. During the interview, they were asked the following set of questions: 1) Do you find the question understandable? 2) Could you repeat it in your own words? 3) Did you have any problems with answering it? 4) Were there any words you did not understand or any word or expression that you find unacceptable or offensive?
4. …

Table 3. Scoring sheet for the Underload dimension

Table 4. Item analysis statistics for the Underload dimension
Note. * IRTPRO could not perform the test of fit.

Table 5. Reliability estimates for the subscales of the Underload dimension

Table 6. Classical item statistics for the Underload dimension after the first round of item elimination

Table 7. Number of items for the original and short RG-OSI scales and coefficients α for the short RG-OSI scales

Table 8. Kendall's tau correlations of the original and short scales with related constructs
Notes. ITQ think = thinking of quitting the job; ITQ next year = very likely to change employer next year; OLBI neg = negative items for measuring burnout; WFC time = time-based work-family conflict; WFC strain = strain-based work-family conflict; WFC behaviour = behaviour-based work-family conflict.

Table 9. Item analysis statistics for the remaining dimensions
Notes. … LD values indicating local dependency. Response patterns of six items (CH1, CH2, CH3, OH2, OH3, GH2) had a significant (p < .05) lack of fit to their respective model.
… LD values indicating local dependency. Response patterns of two items (GS1, GS3) had a significant (p < .05) lack of fit to their respective model.
There were 7 high (>10) LD values indicating local dependency. Response patterns of four items (INOX1, INOX2, GNOX1, GNOX2) had a significant (p < .05) lack of fit to their respective model.