Researchers often use more than one mode of data collection (i.e., mixed-mode surveys) in survey projects in order to capitalize on (1) potentially greater response rates and/or (2) improved coverage of the population of interest.  The first advantage, greater response rates, may reduce survey “non-response error.”[1] By offering more than one mode of data collection, researchers attempt to make the survey more attractive to the respondent.  Research has shown that the public has varied preferences among modes of survey data collection.  For example, in a 2006 MRA telephone survey,[2] adult Americans indicated which mode of data collection they would select first if given the choice.  The results, displayed in the table below, illustrate the variety in respondent preferences for survey modes.

Table: 2006 Research Profession Image Study Results (Published 2007)

Survey Mode of Data Collection    First Choice
Mail Survey                       30%
Telephone Survey                  27%
Internet Survey                   18%
In-person Survey                  9%
Other/No Answer                   16%
Sample Size                       841

By approaching respondents with more than one mode of data collection, the researcher attempts to raise the appeal of participation.  However, the success of this technique in boosting the survey response rate appears to depend on the manner in which each mode is offered.  Presenting survey modes in sequence has been demonstrated to improve response rates; by contrast, research has generally not shown that offering the respondent a choice of modes up-front boosts cooperation. (de Leeuw, 2005; Dillman, 2009)

The second major advantage offered by mixed-mode survey designs is an increase in the coverage of the population of interest.  As technologies and modes of communication have changed over time, the options for contacting individuals and organizations have multiplied.  Mixed-mode designs may offer a more inclusive frame with respect to the target population.  For example, surveying individuals by both landline telephone and postal mail through address-based sampling (ABS) offers greater coverage of the residential population of the United States than using landline telephone alone. (Shuttles, 2008)  Likewise, sample lists of all types may have incomplete records for individuals: emails, postal addresses, telephone or fax numbers, etc.  Utilizing multiple modes of data collection may therefore allow the researcher to include more cases in his/her sample frame.

Other advantages mixed-mode designs may offer include reduced cost/increased efficiency, the establishment of credibility and trust with the respondent, and an improvement in the degree of privacy offered to the respondent. (de Leeuw, 2005)  However, because each mode of data collection is unique in terms of the transmission of information and the environment of the interview, survey questions are received and processed by respondents in different ways.  In turn, the same respondent may provide a different response to a survey question simply based on the method used to deliver that question.  “Mode effects” are an important downside to the advantages offered by a mixed-mode design.

Mode Effects

“Mode effects” refer to the difference(s) in the way that a respondent may answer questions through one mode of survey data collection (e.g., telephone survey) compared with another (e.g., Web survey).  Clarissa David, in Polling America: An Encyclopedia of Public Opinion, defines the scope of mode effects: “The term ‘mode effects’ does not include all effects or errors caused by mode, but rather it refers more specifically to certain types of systematic errors in survey response...examples of mode-related response errors are survey satisficing, social desirability, concurrent validity, recency and primacy…it should be noted that other survey errors that can be influenced by mode, such as nonresponse and sampling errors, are excluded from the usual coverage of mode effects.”[3]

In defining mode effects in this manner, David highlights the nature of the problem: the linkage between the mode of data collection and the respondent’s processing of, and/or responses to, survey questions.  Differences of this nature are primarily thought to arise from the presence or absence of an interviewer and the way survey questions are communicated or transmitted to respondents.  Numerous theories have been developed as to why respondent answers may differ from one mode to the next.[4]  Further, evidence suggests that mode effects are a legitimate threat to the data quality of surveys.  As such, several prudent steps may be advisable for researchers considering the use of mixed-mode designs:

1. Consider mode effects in the design phase of the research.

The researcher should weigh the advantages and disadvantages (including data quality and financial considerations) of a mixed-mode design before electing to utilize multiple modes.

2. If a mixed-mode design is selected, determine a questionnaire development strategy that minimizes the likelihood of mode effects.

Traditional questionnaire design involves crafting questions to fit the mode of data collection being used. However, in mixed-mode designs, optimizing questions for each mode may itself produce mode effects.  Consider the alternate strategies presented in Table 2 of the appendix to maximize data quality.[5]

3. If a mixed-mode design is implemented, examine the data for modal differences.

If differences have occurred between modes, consider multiple explanations. Differences within the dataset between modes may not be attributable to mode effects.  Other sources of error are also possible (e.g., non-response error, early versus late responders, non-random assignment to modes/sample source, etc.).  It may not be possible to determine the source of error, but the researcher should investigate the problem by considering all possible causes (a minimal example of such a check appears after this list).

4. If mode effects are the likely cause of error, attempt to discern probable causes and factors.

Consider the explanations presented in the appendix of this document (Tables 1 and 3 and the annotated bibliography) as well as other sources to determine whether there are plausible explanations for the error.

5. Include an explanation of possible causes of error in the survey report's technical appendix.

A description of all sources of error will help the end-user of the research to evaluate the results.  Further, researchers should justify their decision of whether or not to combine the data from separate modes for analysis.
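To illustrate step 3, the sketch below compares the distribution of answers to a single categorical question across two modes using a chi-square test of independence. It is a minimal, hypothetical example: the data, column names, and choice of test are assumptions made for illustration, and a statistically significant difference would still need to be weighed against the alternative explanations listed above (non-response, sample source, early versus late responders) before being attributed to a mode effect.

```python
# Minimal sketch: does the answer distribution for one question differ by mode?
# The data and column names below are hypothetical, for illustration only.
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical respondent-level data: one row per respondent, recording the
# mode of completion and the answer to a single categorical item.
df = pd.DataFrame({
    "mode":   ["web", "web", "phone", "phone", "web", "phone", "web", "phone"],
    "answer": ["agree", "neutral", "agree", "agree",
               "disagree", "agree", "neutral", "agree"],
})

# Cross-tabulate the answer distribution by mode.
table = pd.crosstab(df["mode"], df["answer"])

# Chi-square test of independence: a small p-value indicates that answers
# differ by mode, but it does not by itself identify the cause of the gap.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

In practice, a comparison of this kind is typically repeated item by item (with far larger samples than the toy data above), and any flagged items are then examined against the other possible sources of error before a mode effect is claimed.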

Table 1: Theories of Mode Effects

Type: Question Delivery
Theory: Questions tend to be asked in different ways for different modes. (Dillman & Christian, 2003)
Explanation: Providing a different measurement may result in a different response. See Table 3 for a summary of modal tendencies in question formatting and wording.

Type: Question Delivery
Theory: Different modes provide different context. (Dillman & Christian, 2003)
Explanation: Respondents interpret question items based upon the presentation of those questions (either aurally or visually) and of the response options relative to one another. This places different demands on the respondent’s memory and visualization. Conceptual and subjective questions (e.g., satisfaction) may be more subject to contextual effects than objective, straightforward questions (e.g., year of birth).

Type: Question Delivery
Theory: Context is presented through visual and auditory cues. (Dillman & Christian, 2003; de Leeuw, 2005)
Explanation: Numbers, symbols, auditory cues, body language, formatting (font, spacing, shapes, brightness), color, etc. all provide context and may affect the outcome of a respondent’s processing of a question. (Dillman & Christian, 2003) These different qualities are sometimes referred to as “paralanguage.”

Type: Question Delivery
Theory: Aural modes may encourage respondents to select the last offered response option. (Dillman & Christian, 2003; Bowling, 2005; de Leeuw, 2005)
Explanation: Since aural modes do not present the respondent with a visualization of the question and response options, the respondent is forced to rely on short-term, working memory. As such, the last option may be the one that the respondent remembers most clearly and may be selected more often. This may also happen because the respondent first considers the most recently heard response option (the last option) and selects it without fully considering the earlier options. This problem is referred to as a “recency effect.”

Type: Question Delivery
Theory: Visual modes may encourage respondents to select the first offered response option. (Dillman & Christian, 2003; Bowling, 2005)
Explanation: Respondents may select the first option more frequently (a “primacy” effect), since they may be more likely to use it as a base of comparison for all other options, or they may simply select the first option if it appears appropriate and move on without considering the other options.

Type: Interviewer Effect
Theory: Presence of an interviewer may affect the respondent’s candor and/or honesty. (Dillman & Christian, 2003; Bowling, 2005; de Leeuw, 2005)
Explanation: Interviewer-administered modes may result in “social desirability bias,” where the respondent feels the need to respond in a socially acceptable manner to avoid judgment. Additionally, respondents may feel a lack of privacy when answering sensitive questions in the presence of an interviewer.

Type: Interviewer Effect
Theory: Presence of an interviewer may result in more respondent agreement. (Dillman & Christian, 2003; Bowling, 2005)
Explanation: The theory of “response acquiescence” posits that people tend to find agreeing with one another easier than disagreeing. Because of this, respondents may be more likely to select agreement-oriented response options (e.g., the “agree” option on an agree/disagree scale) in interviewer-administered surveys, since the presence of an interviewer mirrors typical everyday social interaction.

Type: Interviewer Effect
Theory: Interviewer-administered modes may be subject to more cognitive processing, more thoughtfulness, and less satisficing than self-administered modes. (de Leeuw, 2005)
Explanation: In interviewer-administered modes, the interviewer controls the pace of the interview. In self-administered modes, the respondent controls the pace and may skip around or advance through the interview with less engagement and thoughtfulness. Notably, a counter-argument is that self-administered modes may encourage more cognitive engagement, since the respondent can answer questions at their own pace and does not feel pressured to avoid the awkward silences that pausing on the telephone or in person to fully weigh the question would create.

Type: Interviewer Effect
Theory: Interviewer-administered modes may result in less item non-response and responses with greater detail. (de Leeuw, 2005; Bowling, 2005)
Explanation: The interviewer has a greater ability to motivate the respondent to complete each question. A counter-argument to this theory is that self-administered modes may offer the respondent more privacy to answer sensitive questions.

Type: Interviewer Effect
Theory: Interviewer-administered and self-administered modes may differ in how they encourage the respondent’s access to memory. (Bowling, 2005)
Explanation: In interviewer-administered modes, interviewers may probe, clarify and instruct respondents. In self-administered modes, the respondent may consult records and may feel free to spend more time on a given question absent the presence of an interviewer.

Type: Interviewer Effect
Theory: Respondents to telephone surveys may feel pressure to provide shorter responses to open-ended questions than in face-to-face (and perhaps other) surveys. (Bowling, 2005)
Explanation: The telephone interview is thought to create pressure to maintain a smooth transition from one question to the next to avoid awkwardness. This may encourage telephone respondents to give comparatively shorter answers to open-ended questions.

Table 2: Questionnaire Development Strategies

Name: Traditional Design
Description: Questions are crafted to optimally fit the mode of data collection in use.
Proposed Usage: Single-mode designs

Name: Unimode Design
Description: Questions are crafted to deliver an equivalent question stimulus across modes, but may not be optimal for one or both modes. (See Chapter 6 of Mail and Internet Surveys: The Tailored Design Method by Don A. Dillman.)
Proposed Usage: Mixed-mode designs

Name: Generalized Design
Description: Questions are crafted to deliver an equivalent question stimulus across modes, but may be crafted uniquely for each mode. (de Leeuw, 2005)
Proposed Usage: Mixed-mode designs

 

Table 3: Question Formatting Tendencies by Mode*

*Summarized from “Survey Mode as a Source of Instability in Responses across Surveys” by Don A. Dillman and Leah Melani Christian

Mode: Face-to-face
Delivery: Visual/Aural
Interviewer: Yes
Question & Questionnaire Tendencies: May use visual aids in the survey. May accommodate longer surveys; longer, more complex scales; and fully labeled scales. May more easily facilitate open-ended questions with assistance from interviewer probing. Ability to hide “don’t know” and “refusal” options unless requested.

Mode: Telephone
Delivery: Aural
Interviewer: Yes
Question & Questionnaire Tendencies: Shorter scales with polar end-points (reading all response labels in scales becomes tedious, so telephone questionnaires often present end-point labels and indicate that any option in between is acceptable); scales/questions broken into steps (e.g., first asking “satisfied/dissatisfied,” then asking “would that be very/somewhat”; dividing an income question into a broad category followed by narrower categories, etc.); more screening/branching as a result of simplified questions; shorter surveys than face-to-face (easier to break off the interview than face-to-face); interviewer ability to probe; ability to hide “don’t know” and “refusal” options unless requested.

Mode: Interactive Voice Response
Delivery: Aural/Visual
Interviewer: No
Question & Questionnaire Tendencies: Very short/brief surveys (very easy to break off the interview); brief wording and shorter questions.

Mode: Mail
Delivery: Visual
Interviewer: No
Question & Questionnaire Tendencies: Avoidance of branching questions and a tendency toward relatively more complex questions that incorporate the several dimensions of the simpler questions; fully labeled scale options; tendency to use “check-all-that-apply” format over yes/no format; open-ended questions less attractive since no interviewer is present to probe, and often broken into multiple parts; clear choice (or lack thereof) offered for “don’t know” or “refusal” options; visual context present; ability for the respondent to see previous and later questions; potentially more complex questions due to the ability to complete at one’s own pace, see other questions, consult records, etc.

Mode: Internet
Delivery: Visual/Aural
Interviewer: No
Question & Questionnaire Tendencies: Dynamic features possible (sound, video, color, unique formats); choice of whether previous and later questions are visible (as divided by screen) and the context (or lack thereof) this creates; choice of whether to allow the respondent to revisit previous questions; tendency to use check-all-that-apply formats over yes/no; radio buttons and check boxes as standard options; longer, fully labeled scales; branching facilitated by programming; pressure for the survey to be short; choice to force a question to be answered or to allow the respondent to skip ahead; choice to overtly present “don’t know” or “refusal” options.

Table 4: Evidence of Common Mode Effects Reported in Survey Literature

Effect: Social Desirability Bias
Literature: Social desirability bias has been found in interviewer-administered modes of data collection compared to self-administered modes, where interviewer-administered modes tend to elicit responses in the direction of socially desirable answers. (Dennis, 2007; de Leeuw, 1992)

Greater reporting on sensitive questions has been found for self-administered interviews than interviewer-administered interviews. Mail surveys have been found to produce less acquiescence, greater reporting on sensitive questions and less social desirability than telephone and face-to-face surveys (de Leeuw, 1992). In another comparison, Web respondents showed no differences compared to telephone respondents in terms of acquiescence or attitudes toward and support for scientific research (Fricker, 2005). Web and IVR seemingly performed somewhat better against social desirability than telephone (CATI) in a study of University of Maryland alumni (Kreuter, 2008). Further, Web respondents similarly reported more attitudes that would be expected to be considered socially unacceptable on questions covering a variety of subjects from the General Social Survey (GSS). (Dennis, 2007)

Telephone respondents have been shown to report more extreme positive answers to questions than mail (Dillman, 2009), IVR (Dillman, 2009) or Web respondents (Dillman, 2009; Christian, 2008).

Effect: Primacy & Recency
Literature: Primacy and recency effects have shown mixed (inconclusive) results in the survey literature (Dillman, 2009).

Schuman and Presser conducted a variety of experiments to test response-order effects in telephone research (Schuman, 1996). Results were mixed, and the authors had difficulty predicting response-order effects in advance of the experiments.

Effect: Formatting
Literature: In one experiment, researchers found that scale formats and color contrast can affect survey responses. Specifically, more contrast in colors, positive-and-negative scales (e.g., -3 to +3 as opposed to 1 to 7) and fully labeled scales produced more positive responses. (Tourangeau, 2007) This research demonstrates the power of the context that formatting can create for survey responses, particularly in cases where the respondent is not given overt and clear meaning from the question and response options alone (and may fall back on other cues).

In another project, more directly testing modal differences and formatting, Web respondents demonstrated a greater degree of knowledge of scientific facts than telephone respondents (Fricker, 2005).

Effect: Item Non-response
Literature: Self-administered modes of data collection may tend to result in a higher incidence of item non-response, but the results appear to be somewhat mixed. Moreover, item non-response appears to depend on the formatting of the question and, in particular, the manner in which a “no response” option is presented to the respondent.

Web has been shown to produce higher item non-response than telephone (Brogger, 2002; Heerwegh, 2008) and face-to-face (Dennis, 2007). However, these effects were mitigated in Web interviews versus face-to-face when no “no opinion” option was offered directly in the body of each question and the respondent was instead instructed at the beginning of the survey that he/she could skip questions (Dennis, 2007). Moreover, Smyth et al. (2008) found no significant differences in item non-response between Web and telephone.

Further, Web was shown to have less item non-response than CATI or IVR for socially sensitive questions in a University of Maryland study. (Kreuter, 2008) In another study, a telephone interviewer-administered mode produced more item non-response than telephone self-administered, except for a race-related question, in which case the opposite was true (Harmon, 2009).

Mail has been shown to produce higher item non-response than telephone. (de Leeuw, 1992)

Effect: Satisficing
Literature: Web respondents have been shown to differentiate in their responses to survey questions less than respondents in face-to-face modes (Heerwegh, 2008) and telephone modes (Fricker, 2005). Moreover, Web respondents have been shown to select slightly fewer affirmative responses when presented with a check-box list (i.e., “please select all that apply” format) as opposed to forced-choice “yes” or “no” items. (Smyth, 2008)

 

Annotated Bibliography

Brogger, Jan, Bakke, Per, Eide, Geri E., and Gulsvik, Amund. 2002. “Comparisons of Telephone and Postal Survey Modes on Respiratory Symptoms and Risk Factors” Practice of Epidemiology 155: 572-576

Examined telephone versus mail respondents by first interviewing a large group (15-70 years in age) in areas in Norway by mail, then later following up with a re-interview of a small subsample of respondents.  The question items focused on “respiratory symptoms, cardio-respiratory diagnosis, smoking habits, and risk factors.” Findings include:

1) Item non-response between modes: The postal survey had an average of 2.8 incomplete answers per survey. The telephone survey had an average of 0.6 incomplete answers per survey.

2) Differences between modes: The postal survey resulted in a significantly larger percentage of respondents with “morning cough” than the telephone survey; there were significant differences between the modes in dyspnea (without a clear pattern); there was significantly less passive smoking in the home and at work reported by postal survey takers; and there were small differences for smoking, education, and episodes of phlegm and cough.

3) Similarities between modes: The authors note that there were no differences for pack-years, number of siblings and number of siblings with asthma. There were “small changes of little significance” for height and weight.

The authors report that there was no “global test of tendency to report more symptoms or diagnosis with one survey mode.” 

Christian, Leah Melani, Dillman, Don A., and Smyth, Jolene D. "The Effects of Mode and Format on Answers to Scalar Questions in Telephone and Web Surveys." Advances in Telephone Survey Methodology.  Ed. Lepkowski, James M., Tucker, Clyde, Brick, Michael J., de Leeuw, Edith D., Japec, Lilli, Lavrakas, Paul J., Link, Michael W., Sangster, Roberta L. Hoboken: John Wiley and Sons, Inc., 2008: 250-275. Print.

Examined mode effects by administering 6 survey versions (3 telephone, 3 Web) to Washington State University (Pullman campus) undergraduate respondents. The authors administered 70 experiments with 13 scale questions, utilizing different scales (5-point fully labeled, 5-point polar labeled, 11-point polar labeled) and different wording and formatting. Findings include:

1) Respondents provided “higher ratings when surveyed by telephone than by web” with respect to fully labeled 5-point scales. This includes mean/average scores as well as greater selection of extreme positive response options in 8 out of 9 cases. Question types include satisfied/dissatisfied, agree/disagree, and construct-specific labels.

2) Similarly, telephone respondents generally provided more positive answers (mean and extreme positive selection) than Web respondents for 5-point polar-labeled scales. Question types include agree/disagree, extremely likely/not at all likely, satisfied/dissatisfied, and best possible/worst possible.

3) Telephone respondents tended to give more positive answers (mean) overall to the 11-point scales (worst possible/best possible), but did not demonstrate a statistically significantly higher percentage of extreme positive category selections.

The authors note that telephone research has been demonstrated to generate more positive responses regardless of whether it is compared to Web, mail or face-to-face. Further, social desirability cannot account for all of the increased positive ratings, since face-to-face has been shown to produce less positive ratings than telephone. The authors reason that the pace of the interview as well as the lack of visual cues (the historical face-to-face comparison included visual cards) may be the cause of the differences.

de Leeuw, E.D. (1992). Data Quality in Mail, Telephone, and Face-to-face Surveys. Amsterdam: TT-Publicaties. 

In an experiment comparing mail, telephone and face-to-face surveys, mail surveys resulted in higher unit non-response (cooperation with the survey as a whole) and item non-response (cooperation with individual survey questions) overall than telephone and face-to-face, but had greater data quality for sensitive questions: less social desirability, more reporting of sensitive behaviors, and less item non-response for income questions. The mail survey resulted in “more reliable and consistent responses and less acquiescence than in interviews.” (de Leeuw, 2005)

Dillman, Don A.,Phelps, Glen, Tortora, Robert, Swift, Karen, Kohrell, Julie, Berck, Jodi, Messer, Benjamin L. 2009. "Response rate and measurement differences in mixed-mode surveys using mail, telephone, interactive voice response (IVR) and the Internet." Social Science Research 38: 1-18

Examined telephone, mail, Web, and IVR surveys for mode effects utilizing a “quasi-general population” sample and satisfaction questions about long-distance telephone service. Questions were formatted similarly between modes, and non-response was analyzed by comparing the demographics of non-respondents to respondents. The authors theorized that the variable of interest, satisfaction with long-distance telephone service, was not likely to produce social desirability bias in the presence of an interviewer. Further, the authors used follow-up with mixed modes to examine possible differences between early and late responders. Findings include:

1) Visual modes of data collection (Web and mail) produced similar results from respondents.

2) Telephone respondents were more likely to select extreme positive response options on scales than mail respondents. This was tested with a 5-point scale (ends labeled only), a 5-point scale (all points labeled) and a 10-point scale (ends labeled only). Differences were most pronounced with the 5-point, ends-labeled scale.

3) IVR respondents were more likely to select extreme positive response options on scales than mail respondents. This was tested with the same three scales. Differences were statistically significant between IVR and mail across all scales; however, the 10-point scale differences were not large.

4) Telephone and IVR respondents were generally more likely than Web respondents to select extreme positive answers. This was tested with the same three scales. Differences were significant between telephone/IVR and Web, except that no statistically significant differences were found on the fully labeled scale for telephone versus Web.

5) Telephone respondents were more likely than IVR respondents to choose a higher positive response on the 5-point, ends-labeled scale. On the 10-point scale and the fully labeled scale, the opposite was true. Dillman et al. point out that this is consistent with Tourangeau et al.'s (2002) study and could suggest that telephone respondents are more likely than IVR respondents to choose middle or lower categories when scales have more than 5 categories and when the scale is fully labeled.

6) Use of a second, sequential follow-up mode to improve response rates was successful. However, based on demographic item non-response, the higher response rate does not appear to have reduced non-response error for demographic variables. Also, “recency” effects were not found in the data. The authors included an experiment offering 2 scales with opposing response-option orientations (i.e., negative-to-positive versus positive-to-negative) for comparison; no significant differences were found between groups.

The authors note (in the discussion section) that the differences in findings “cannot be accounted for by ‘recency hypothesis’…we believe that a more plausible hypothesis is that the mail and web versions have all response choices visually presented to the respondents.”

Dennis, Michael J and Li, Rick. 2007. “More Honest Answers to Web Surveys? A Study of Data Collection Mode Effects” Journal of Online Research found online at www.ijor.org 

Examined Web and telephone data collection against in-person results using the “national priority” items from the General Social Survey (GSS), fielded concurrently. Knowledge Networks conducted the Web and telephone surveys and compared the results against the in-person GSS to examine substantive differences. Findings include:

1) Mode effects and “No Opinion/Don’t Know”: Based on earlier research aimed at delivering an equivalent stimulus, Web respondents were not given a direct “no opinion” choice in the survey question. Instead, they were told at the beginning of the interview that they could skip questions they did not want to answer. Doing this produced average levels of “don’t know” responses in the telephone and Web versions similar to the GSS in-person version, though there were several items where the GSS had substantially larger proportions of “don’t know” responses.

2) Differences between modes: The three modes differed across the 15 question items, with the greatest differences occurring between Web and in-person (the authors report an average absolute difference of between 4 and 6 percentage points) versus telephone and in-person (the authors report an absolute difference of 2 percentage points).

3) In-person versus Web: In-person respondents tended to provide responses that were more in line with social desirability than Web respondents. For example, a greater proportion of in-person respondents than Web respondents felt the USA was spending too little to ‘improve the condition of blacks’ and ‘deal with drug addiction.’

Fricker, Scott, Galesic, Mirta, Tourangeau, Roger, Yan, Ting. 2005. "An Experimental Comparison of Web and Telephone Surveys." Public Opinion Quarterly 69: 370-392

Examined telephone and Web surveys for mode effects utilizing a sample of respondents with Internet access drawn from a general population RDD design. The survey topic was "knowledge of science and scientific facts." The study is somewhat unique in that it includes variables of knowledge as opposed to opinion.  Findings include:

1) Attitudes toward and support for science and scientific research were gauged using 5-point agree/disagree scales. No statistically significant differences were found between the sample of Internet users who completed via the Web and those who completed via telephone.

2) Knowledge scores (i.e., correct answers to factual science questions) were higher for Web respondents than telephone respondents. The question formats included “multiple choice, true/false, and open-ended format.” The authors attribute this to Web respondents having more time to consider the questions and having a lower burden placed on working memory. This is consistent with the pattern of differences found across the questions, where the most demanding question format (open-ended) showed a higher degree of variation compared to the least demanding (true/false).

3) Web respondents took longer to complete the survey than did telephone respondents; this was particularly true of older respondents. The authors hypothesize that this is due to Web respondents having the discretion to proceed through the survey at their own pace.

4) Web respondents gave fewer “don’t know” responses than did telephone respondents. The Web survey did not display a “don’t know” option; respondents were given a notice if they attempted to move to the next questionnaire item without answering. Telephone respondents were not informed of a “don’t know” option but instead needed to volunteer it.

5) There were no differences between the two modes in the proportion of respondents displaying acquiescing behavior (measured in this case as the proportion of “agree” or “yes” answers to questions).

6) Web respondents were more likely to non-differentiate and straight-line through grid questions than were telephone respondents.

Harmon, Thomas, Turner, Charles F., Rogers, Susan M., Eggleston, Elizabeth, Roman, Anthony M., Villarroel, Maria A., Chromy, James R., Laxminarayana Ganapathi, and Li, Sheping. 2009. “Impact of T-ACASI on Survey Measurements of Subjective Phenomena” Public Opinion Quarterly 73: 255-280

Examined traditional interviewer-administered telephone interviewing (T-IAQ) compared with T-ACASI (telephone-audio computer assisted self interviewing) on a “wide range of social and personal issues including the acceptability of corporal punishment, traditional gender roles in families, marijuana use, same-gender sex, residential segregation, and respondents’ evaluation of their own attractiveness.”  Sample included national adult 18-45 y/o sample & Baltimore area strata.  Respondents were randomly routed to one of the two modes of data collection.  Findings include:

1) Social desirability between modes: A pattern emerged in which respondents in the T-ACASI mode reported more responses than T-IAQ respondents in the direction that would be expected given social desirability.

2) Respondent demographics across modes: The use of different modes produced no statistically significant differences in the composition of demographic subgroups in the samples (gender, age, marital status, education, race/ethnicity, region, urbanization and sample strata [Baltimore and national]).

3) Item non-response across modes: Generally, T-ACASI produced greater levels (1-3%) of item non-response than T-IAQ (“and most were statistically reliable”). However, the opposite was true when respondents were asked whether they would prefer to live in a neighborhood reflecting only their own race. (“The mode effect did not vary for black or white respondents.”)

4) Mode effects and subgroups: The greatest mode effects were found among less educated and younger respondents, and among respondents from the Baltimore stratum as compared to the national sample. Mode effects were also observed for questions regarding corporal punishment and marijuana use among black respondents. Notably, black respondents were more likely to be assigned to a white interviewer than white respondents were to a black interviewer.

Heerwegh, Dirk and Loosveldt, Gert. 2008. "Face-to-Face Versus Web Surveying in a High-Internet-Coverage Population: Differences in a Response Quality." Public Opinion Quarterly 72: 836-846

Examined differences between Web and face-to-face respondents drawn from a high-Internet-coverage population (Katholieke Universiteit Leuven freshmen, Belgium). The authors theorize that Web respondents may be more likely to “multi-task” and add that they lack the interaction with an in-person interviewer that (due to facial cues) might require them to stay fully cognitively engaged (borrowed from Holbrook et al. [2003]). Finally, the authors argue that a higher burden is placed on Web respondents, who are required to read the content on the computer screen and to be computer literate to a certain degree. As such, the authors hypothesize that Web respondents will engage in higher levels of satisficing. The authors examine the data for “don’t know” responses, item non-response and non-differentiation. Findings include:

1) Face-to-face questions were presented visually on cards to aid the comparison to the Web mode. The Web survey explicitly listed “don’t know” as the last option on 12 questions, and the interviewers conducting the face-to-face survey noted the option on those questions as well. Web respondents were more likely to give “don’t know” answers to questions, though the authors acknowledge that this may be due to the interviewers’ ability to probe/clarify. Moreover, question-format differences may have contributed to these differences.

2) Web respondents appeared to differentiate in their responses slightly less than face-to-face respondents.

3) Web respondents had a greater incidence of item non-response, though both Web and face-to-face item non-response were low. Notably, this difference could be due to the manner in which face-to-face respondents proceeded through the survey and their perceived ability to refuse a question (versus indicate “don’t know”). Moreover, Web respondents did not receive a reminder if they left a question blank.

Kreuter, Frauke, Presser, Stanley and Tourangeau, Roger. 2008. “Social Desirability Bias in CATI, IVR, and Web Surveys: The Effects of Mode and Question Sensitivity” Public Opinion Quarterly 72: 847-865

Examined IVR, CATI, and Web surveys in a comparison of social desirability bias. The experiment involved recent University of Maryland college graduates assigned randomly to 1 of the 3 modes of data collection. Variables of interest included sensitive questions relating to university experience (e.g., GPA), community involvement (e.g., whether an alumni association member), relationship to the university, and respondent ratings of the sensitivity of the questions. The authors used university records for validation as well as to examine non-response bias. The authors note that the traditional “lack of validation data forces investigators to make two assumptions in determining which mode leads to ‘better’ results. The first assumption is that social desirability concerns lead respondents to underreport socially undesirable behaviors so that the data collection mode that yields higher levels of reporting is the more accurate one. The second assumption is that lower reports of socially desirable behaviors reflect more accurate answers. The extent to which these assumptions are correct cannot be determined without validation data.” Findings include:

1) Unit non-response: The authors examined the respondents to each mode against validation records. There were no statistically significant differences between the modes in terms of validation records, meaning that statistically significant differences in non-response bias could not be found between IVR, CATI and Web even though response rates differed. Notably, there were some general non-response biases, with those with poor academic records and those with poor relationships to the university being less likely to respond. Again, these biases did not differ significantly between modes.

2) Social desirability between modes: Generally speaking, socially undesirable attributes were more likely to be reported in Web than CATI, with both low GPA and whether the respondent ever received a “D” or an “F” being statistically significant (to varying degrees). IVR also had greater reporting of socially undesirable attributes than CATI. Most differences between modes were not statistically significant, but mode sample sizes were somewhat limited (IVR [410], CATI [368], Web [329]).

3) Social desirability between modes: No differences or patterns were apparent among socially desirable attributes (i.e., GPA, whether the respondent received honors, whether the respondent donated to the university, whether the respondent was a member of the alumni association).

4) Item non-response and social desirability between modes: The authors also examined item non-response for each of the questions, noting that “another way respondents can avoid making embarrassing admissions about themselves is to skip the question.” In general, respondents to the Web version of the survey were more likely (statistically significantly) to answer the socially desirable/undesirable questions than were CATI and IVR respondents.

5) Social desirability compared to validation records: The authors compared respondents in each mode against the actual records available for those respondents. Social desirability (over-reporting of positive attributes and under-reporting of negative attributes) was substantial across all modes. However, under-reporting of negative attributes was less pronounced among Web respondents than among CATI and IVR respondents, and, similarly, under-reporting driven by social desirability was less pronounced for IVR than for CATI. (Items were statistically significant compared to CATI and IVR for GPA < 2.5 and whether the respondent ever received a “D” or “F.”) Over-reporting was a problem for all modes, though its prevalence was roughly similar across modes.

6) Social desirability compared to validation records: The authors combined items in a regression analysis. Web was less likely than CATI to result in misreporting in the direction of social desirability bias. IVR was marginally less likely than CATI to result in misreporting. Web and IVR did not differ significantly in the likelihood of misreporting.

7) Respondent ratings of the sensitivity of questions: The authors demonstrate that more respondents completing the CATI survey felt the social desirability questions (those asking about negative attributes) were sensitive than respondents completing the IVR and Web surveys. Additionally, the difference was generally (3 out of 4) statistically significant when comparing those respondents who had the socially desirable attributes versus those who did not (i.e., those who did felt the question to be more sensitive than those who did not).

Schuman, Howard and Presser, Stanley. Questions & Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. San Diego, California: Sage Publications, 1996.

The authors first replicated 2 experiments from a 1950s study on the subject of oil. Two questions were presented to groups of telephone respondents, who received the 2 response options (for each question) in either the first or the second position. In total, significant effects were evident in 3 out of 4 experiments; in particular, one of the differences amounted to 14 percentage points. The questions presented rather long dichotomous response options.

The authors then conducted another 3 experiments. First, they alternated the order (in 5 ways) of the 5 response options of 2 particular questions and presented them to respondents. One question related to work values, the other to the biggest issue facing the country. Notably, not every possible permutation was attempted for either question. No trends or substantial differences were observed, leading the authors to conclude that response-option order did not affect the 2 questions. Interestingly, these questions, with a greater number of response options, were not subject to the effects present in the initial questions with dichotomous response options.

The authors then attempted a third group of experiments, in which they gave groups of respondents two questions, each with 3 response options (along a continuum) that included a “middle option.” In half the cases, the middle option was listed second; in the other half, it was listed last. The questions related to USA support for South Vietnam in the Vietnam War and to the ease of obtaining a divorce in the USA. Respondents in all cases were more likely to select the “middle option” when it was listed last (a “recency” effect).

The authors then conducted an experiment with another dichotomous variable (relating to union membership) in which an argument and counter-argument were presented in varying orders. No response-order effects were evident.

Finally, the authors tested 3 more questions for response-order effects, again varying the presentation order of response options for each group: first, a question asking whether individuals or societal conditions are responsible for crime (no effect observed); second, a question asking whether the federal government or individuals are responsible for adequate housing provisions (primacy effect observed); and third, a question on homeowners’ ability to discriminate based on race when selling their home (no effect observed).

Smyth, Jolene D., Christian, Leah Melani, Dillman, Don A. 2008. "Does "Yes or No" on the Telephone Mean the Same as "Check-All-That-Apply" on the Web?" Public Opinion Quarterly 72: 103-113

Examined forced-choice "yes" or "no" question formats compared to "check-all-that-apply" question formats in a mixed-mode telephone and Web survey delivered to Washington State University undergraduates. The authors draw on theory positing that the "check-all" format allows respondents to adopt satisficing behavior more easily than "forced-choice" question formats, which "require deeper processing." The authors also reference literature (Smyth et al. 2006) illustrating that respondents spend less time answering "check-all" questions, and that those who spend less than the average time answering "check-all" questions are more likely to select the first response options listed. Further, respondents who spend more than the average time answering "check-all" questions select as many (or more) items as respondents answering "forced-choice" questions. Conversely, respondents who spent more than the average amount of time answering "forced-choice" questions selected equally as many options as those who did not, "suggesting that all respondents more deeply process the response options in this question format." Findings include:

1) Web respondents completing "forced-choice" questions selected "yes" slightly more often than Web respondents completing "check-all" questions selected options (42.3% versus 38.3%, across 46 options).

2) Telephone respondents completing "forced-choice" questions selected "yes" slightly more often than Web respondents completing "check-all" questions selected options (41.3% versus 37.2%, across 54 options).

3) Web respondents completing "forced-choice" questions selected "yes" about as often as telephone respondents selected "yes" on "forced-choice" questions (50.8% versus 51.2%, across 101 options).

4) Item non-response was generally the same between the telephone and Web versions.

Stringfellow, Vickie L., Fowler, Floyd J., Jr., Clarridge, Brian R.  2001. “Evaluating Mode Effects on a Survey of Behavioral Health Care Users” Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001.

Examined telephone and mail data collection for social desirability mode effects across both general and behavioral health care. The authors used samples of health care plan members for 2 separate surveys. A behavioral health care study (ECHO) was conducted with members of 2 health care plans in Massachusetts, and a general health care study (CAHPS) was conducted with Washington State employees belonging to a health care plan. Respondents to both studies were assigned to either telephone or mail data collection, and all respondents were 18 years of age or older. The analysis is limited to 26 items common to both surveys (ECHO and CAHPS). Findings include:

1) Potential mode effects in ECHO and CAHPS: Prior to controlling for non-response, statistically significant differences between mail and telephone data were found for 4 items in the CAHPS survey and 6 items in the ECHO survey. Only 1 item showed statistically significant differences between mail and telephone in both surveys. These analyses considered 26 items in total.

2) Direction of possible mode effects in ECHO and CAHPS: The authors transformed the items showing statistically significant differences by mode into dichotomous variables and tested them once again for differences, this time to illustrate the direction of bias. Statistically significant differences continued to be present for a majority of items (3 out of 4 ECHO items, 5 out of 6 CAHPS items). In all cases, respondents to the telephone mode offered responses that were more socially desirable.

3) Non-response as an explanation of differences: The authors theorized that the significant differences may have been caused by the types of respondents who participated in each mode, rather than by social desirability or another type of mode effect. Specifically, they were concerned that less healthy respondents (and heavier service users) might be more likely to participate in one mode versus another, thereby biasing results. As such, they included self-rated health and reported service use as controls in a regression model in which the items discussed above were the dependent variables (a minimal sketch of this type of analysis appears after this entry). After controlling for health and service use, only 3 variables demonstrated what the authors believe is a mode effect.

4) Summary conclusion, mode effects not great: After controlling for health status, the authors found very little evidence of mode effects. They mainly attribute the differences to non-response.

Mode effects were not great in this particular study.  Interestingly, this project takes the somewhat rare approach of controlling for non-response by evaluating a theoretical reason for non-response bias not based on simple demographic subgroups.  Doing so had an important impact on the results of the analysis, and it raises the question of whether other projects would find similar controls to be as powerful. 

Finally, after controlling for non-response, the Authors do find what they believe are mode effects for 3 items.  They attribute this to social desirability, but do not address other causes of mode effects (i.e., formatting, primacy, recency and question delivery).  It is possible that these mode effects are attributable to such factors.
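The Stringfellow et al. strategy of re-testing an apparent mode difference after adding controls can be sketched in code. The example below is a hypothetical illustration only, not the authors' actual analysis: the variable names, data, and model specification are assumptions, with a dichotomized survey item regressed on a mode indicator plus a self-rated health control.

```python
# Minimal sketch (hypothetical data, not the authors' analysis): does a mode
# difference in a dichotomized item persist once respondent health is controlled?
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical respondent-level data: item = 1 if the more socially desirable
# answer was given; mode_phone = 1 for telephone, 0 for mail; self_health is
# a 1 (poor) to 5 (excellent) self-rating used as a non-response-related control.
df = pd.DataFrame({
    "item":        [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "mode_phone":  [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "self_health": [4, 2, 5, 3, 1, 4, 5, 2, 3, 4, 1, 5],
})

# Logistic regression of the item on mode plus the health control.  If the
# mode_phone coefficient shrinks toward zero once self_health is included,
# the raw mode difference may reflect who responded in each mode (non-response)
# rather than a genuine mode effect.
model = smf.logit("item ~ mode_phone + self_health", data=df).fit(disp=0)
print(model.summary())
```

With real data, the same logic extends to additional controls (e.g., reported service use) and is repeated for each item flagged in the unadjusted comparison.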

Tourangeau, Roger, Couper, Mick P. and Conrad, Frederick. 2007. “Color, Labels, and Interpretive Heuristics for Response Scales” Public Opinion Quarterly  Vol. 71: 91-112

Used two different Web survey experiments with online panelists and a river sample, in which respondents were exposed to a variety of questions with varying scale characteristics. The experiments were arranged to test the effect of spacing, color, and scale labels on responses.

Two color formats were tested: one in which colors moved from a dark blue to a light blue hue across the scale, and one in which colors moved from one color (red) to a completely different color (blue). At the same time, the researchers tested different scales, where respondents received either a scale ranging from -3 to +3 (with verbal end labels), a scale from 1 to 7 (with verbal end labels), verbal end-point labels with no numbers, or verbal labels for all points with no numbers. A second experiment was run with different questions but the same formatting conditions, with the exception of coloring: in the second experiment, respondents received a scale ranging from yellow to blue rather than red to blue. The authors made this change to introduce greater contrast.

Respondents received these conditions randomly and were asked a variety of questions regarding health, diet and other personal qualities. Formats ranged from favor/oppose to frequency (none of the time, rarely, etc.) and agreement with statements (not at all, mostly not, etc.). Findings include:

1) Effect of different scale formats and colors on results: In the first experiment, the researchers found that for favor/oppose items (e.g., “a healthy diet”), fully labeled scales tended to result in the highest scores, followed by the -3 to +3 scale, followed by the 1 to 7 and no-number (end-labeled) scales. Moreover, with the exception of the fully labeled scale, scales shaded within the same color hue tended to produce less positive results than the scale using different colors (more contrast); the fully labeled scale demonstrated the opposite trend. For frequency items (e.g., “how often have you been very nervous”), the -3 to +3 scale had the highest ratings (i.e., greatest frequency), followed by the no-number, end-labeled scale and the 1 to 7 scale, and lastly by the fully labeled scale. Again, the different-color, greater-contrast scale tended to result in more positive (i.e., greater frequency) responses than the same-color (shaded) scale, with the fully labeled scale version once again not following the same pattern.

2) Effect of different scale formats and colors on results: In the second experiment, the color scheme appeared to have less of an effect on the results. However, moving from the same color (blue scale) to different colors (blue-yellow) did increase the number of positive responses for the no-number, end-labeled scale. Still, the greatest effects were seen in the scales themselves: the -3 to +3 and fully labeled scales produced the greatest number of positive responses.

3) Respondent time-to-completion: Respondents tended to take the most time when completing fully labeled scale items. The authors theorize that this is a result of the added time required to read the response options for these items.

The authors reason that respondents tend to take meaning from all of the different visual aspects of a question, but that they look to certain cues before others. The experiment shows that the clearer a question item is in conveying meaning (e.g., fully labeled scale points), the more the respondent may focus on those aspects as opposed to secondary features (e.g., color scheme).

Works Cited

Bowling, Ann. 2005. “Mode of Questionnaire Administration Can Have Serious Effects on Data Quality” Journal of Public Health 27:281-291

Brogger, Jan, Bakke, Per, Eide, Geri E., and Gulsvik, Amund. 2002. “Comparisons of Telephone and Postal Survey Modes on Respiratory Symptoms and Risk Factors” Practice of Epidemiology 155: 572-576

Christian, Leah Melani, Dillman, Don A., and Smyth, Jolene D. "The Effects of Mode and Format on Answers to Scalar Questions in Telephone and Web Surveys." Advances in Telephone Survey Methodology.  Ed. Lepkowski, James M., Tucker, Clyde, Brick, Michael J., de Leeuw, Edith D., Japec, Lilli, Lavrakas, Paul J., Link, Michael W., Sangster, Roberta L. Hoboken: John Wiley and Sons, Inc., 2008: 250-275. Print.

David, Clarissa. “Mode Effects.” Polling America: An Encyclopedia of Public Opinion. 1st ed. Vol. 1. Westport, Connecticut: Greenwood Press, 2005. 453-457.

de Leeuw, E.D. 1992. Data Quality in Mail, Telephone, and Face-to-face Surveys. Amsterdam: TT-Publicaties.

De Leeuw, Edith D. 2005. “To Mix or Not to Mix Data Collection Modes in Surveys” Journal of Official Statistics 21:233-255

Dillman, Don A. and Christian, Leah Melani. 2003. “Survey Mode as a Source of Instability in Responses across Surveys.” Paper presented at the Workshop on Stability of Methods for Collecting, Analyzing and Managing Panel Data, American Academy of Arts and Sciences, Cambridge, MA, March 26-28, 2003.  Paper accessed online at http://survey.sesrc.wsu.edu/dillman/papers/Mixed%20Mode%20Submission%20t...

Dillman, Don A., Phelps, Glen, Tortora, Robert, Swift, Karen, Kohrell, Julie, Berck, Jodi, Messer, Benjamin L. 2009. "Response rate and measurement differences in mixed-mode surveys using mail, telephone, interactive voice response (IVR) and the Internet." Social Science Research 38: 1-18

Dennis, Michael J and Li, Rick. 2007. “More Honest Answers to Web Surveys? A Study of Data Collection Mode Effects” Journal of Online Research found online at www.ijor.org

Fricker, Scott, Galesic, Mirta, Tourangeau, Roger, Yan, Ting. 2005. "An Experimental Comparison of Web and Telephone Surveys." Public Opinion Quarterly 69: 370-392

Harmon, Thomas, Turner, Charles F., Rogers, Susan M., Eggleston, Elizabeth, Roman, Anthony M., Villarroel, Maria A., Chromy, James R., Laxminarayana Ganapathi, and Li, Sheping. 2009. “Impact of T-ACASI on Survey Measurements of Subjective Phenomena” Public Opinion Quarterly 73: 255-280

Heerwegh, Dirk and Loosveldt, Gert. 2008. "Face-to-Face Versus Web Surveying in a High-Internet-Coverage Population: Differences in a Response Quality." Public Opinion Quarterly 72: 836-846

Kreuter, Frauke, Presser, Stanley and Tourangeau, Roger. 2008. “Social Desirability Bias in CATI, IVR, and Web Surveys: The Effects of Mode and Question Sensitivity” Public Opinion Quarterly 72: 847-865

Schuman, Howard and Presser, Stanley. Questions & Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. San Diego, California: Sage Publications, 1996.

Smyth, Jolene D., Christian, Leah Melani, Dillman, Don A. 2008. "Does "Yes or No" on the Telephone Mean the Same as "Check-All-That-Apply" on the Web?" Public Opinion Quarterly 72: 103-113

Stringfellow, Vickie L., Fowler, Floyd J., Jr., Clarridge, Brian R.  2001. “Evaluating Mode Effects on a Survey of Behavioral Health Care Users” Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001.

Tourangeau, Roger, Couper, Mick P. and Conrad, Frederick. 2007. “Color, Labels, and Interpretive Heuristics for Response Scales” Public Opinion Quarterly  Vol. 71: 91-112

 


[1] See “Survey Nonresponse.”

[2] 2006 Research Profession Image Study, Marketing Research Association

[3] David, Clarissa. “Mode Effects.” Polling America: An Encyclopedia of Public Opinion. 1st ed. Vol. 1. Westport, Connecticut: Greenwood Press, 2005. 453-457.

[4] See appendix, Table 1, for a summary of relevant theories as to why mode effects occur.