By the Numbers: Protecting online survey data integrity

As published in Quirk’s Marketing Research Review, July 2008

The increased popularity of online research can be mostly attributed to two words: faster and cheaper. In fact, these qualities are so powerful that online has now surpassed telephone as the leading research methodology. However, as with any product or service, faster and cheaper can sometimes translate to lower quality and decreased reliability. To protect your research from this threat, data quality must be a top priority in all online endeavors.

While measurement error can occur at any stage of the research process, the first opportunity a researcher has to protect data integrity occurs before the first completed survey is ever collected. During the design phase of an online survey, researchers are using many techniques to identify and eliminate invalid response data. This type of response can typically be classified in one of three ways:

1. Inattentive – a respondent does not fully read or understand the instructions or question being asked.

2. Fraudulent – a respondent intentionally provides false data. This type of response most commonly occurs when a survey incentive is offered and the respondent is not qualified to participate.

3. Speeding – a respondent completes a study in an unreasonably short period of time.

Fortunately, each of these types of respondents can be identified via traps within your online survey. While these methods may not apply in every case, it is a best practice to include some combination of the following controls.

Question-based verification

Verification ratings are becoming more popular in online surveys and can be very effective in certain formats. This type of verification method is most effective in catching inattentive and speeding survey respondents who might otherwise straightline through your survey questions. The sample questionnaire shows an example of this preventative measure.

By including a checkpoint in the middle of a table-format question, researchers are able to verify respondents’ attention and remove straightliners when necessary. A straightliner is defined as a respondent who selects the same answer choice (possibly with an auto-fill feature) throughout the survey, forming a straight line down the tables. In this example, if the respondent does not choose the “disagree” answer choice, they would be considered an invalid respondent.
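As a rough illustration of how these two checks might be applied once responses are collected, the Python sketch below flags a respondent who gives the same answer to every item in a table-format question or who misses the checkpoint row. The function names, the answer labels and the sample respondent are assumptions made for this example; they are not part of any particular survey platform.

```python
from typing import Sequence


def is_straightliner(grid_answers: Sequence[str]) -> bool:
    """Return True when every item in a rating grid received the same answer."""
    return len(grid_answers) > 1 and len(set(grid_answers)) == 1


def passes_checkpoint(answer: str, expected: str = "disagree") -> bool:
    """Return True when the respondent chose the answer the checkpoint row asked for."""
    return answer.strip().lower() == expected


# Hypothetical respondent: five grid items plus one checkpoint row
# that instructed the respondent to select "disagree".
grid = ["agree", "agree", "agree", "agree", "agree"]
checkpoint_answer = "agree"

# The respondent is treated as invalid if either check fails.
invalid = is_straightliner(grid) or not passes_checkpoint(checkpoint_answer)
print(invalid)
```

In practice a researcher may prefer to treat the straightlining flag as a prompt for review rather than an automatic removal, since a respondent can legitimately hold the same opinion on a short grid.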

Of course, there are other question-based verification methods that you can use in place of, or in combination with, the example above:

Inconsistency. Simply ask for the respondent’s zip code or city of residence at both the beginning and the end of the survey. If the respondent’s answers do not match, they would be classified as a fraudulent respondent.

Red herring. Within a list of possible answer choices, a nonexistent choice may be included. For example, if a respondent claims to have eaten at a restaurant that does not exist, they can be flagged as a fraudulent respondent.

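Opposite wording. Respondents are asked a pair of similar but oppositely worded questions at different points in the survey. For example, “I always use the Internet for driving directions” and “I never use the Internet for directions.” A respondent who agrees with both statements is unlikely to be reading the questions carefully.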
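The three checks above could be combined into a single screening step along the following lines. This is only a sketch: the field names, the restaurant list and the rule that agreeing with both opposite statements earns a flag are assumptions made for illustration.

```python
def consistency_flags(response: dict, real_restaurants: set) -> list:
    """Return the names of the question-based checks this response fails."""
    flags = []

    # Inconsistency: the zip code was asked at the start and again at the end.
    if response["zip_start"].strip() != response["zip_end"].strip():
        flags.append("inconsistent_zip")

    # Red herring: the respondent selected a choice that does not exist.
    if set(response["restaurants_visited"]) - real_restaurants:
        flags.append("red_herring")

    # Opposite wording: the respondent agreed with both contradictory statements.
    if response["always_uses_internet"] and response["never_uses_internet"]:
        flags.append("opposite_wording")

    return flags


# Hypothetical data: "Blue Zephyr Grill" is the nonexistent answer choice.
real_restaurants = {"Cafe Real", "Corner Diner"}
response = {
    "zip_start": "60601",
    "zip_end": "60601",
    "restaurants_visited": ["Cafe Real", "Blue Zephyr Grill"],
    "always_uses_internet": True,
    "never_uses_internet": True,
}

print(consistency_flags(response, real_restaurants))
```

Returning a list of flags rather than a single pass/fail value lets the researcher decide how many failed checks justify removing a respondent.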

Personal access codes

Another form of protection is the use of unique IDs for each survey respondent. Each respondent’s unique ID or personal access code (PAC) can be embedded into their survey URL and used to determine a) if the respondent is authorized to participate in the survey, and b) whether the respondent has previously completed the questionnaire.
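One way the two checks might look in practice is sketched below in Python. The URL, the query parameter name `pac` and the sample codes are hypothetical; a real survey platform would supply its own link format and storage for issued and completed codes.

```python
from urllib.parse import urlparse, parse_qs


def check_access(url: str, authorized: set, completed: set) -> str:
    """Classify a survey link as 'ok', 'unauthorized' or 'duplicate'."""
    # Pull the access code out of the query string (assumed parameter: "pac").
    codes = parse_qs(urlparse(url).query).get("pac", [])

    # a) Is the respondent authorized to participate?
    if len(codes) != 1 or codes[0] not in authorized:
        return "unauthorized"

    # b) Has the respondent already completed the questionnaire?
    if codes[0] in completed:
        return "duplicate"

    return "ok"


# Hypothetical codes issued to the sample and codes already used.
authorized = {"A7K2Q9", "M3X8T1", "Z5P4L6"}
completed = {"M3X8T1"}

print(check_access("https://survey.example.com/s?pac=A7K2Q9", authorized, completed))
print(check_access("https://survey.example.com/s?pac=M3X8T1", authorized, completed))
print(check_access("https://survey.example.com/s?pac=BOGUS1", authorized, completed))
```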

Personal access codes can be hidden or visible to the survey respondent.

Hidden. Used when respondents do not need to know that a unique identifier is being employed. This eliminates the possibility of survey respondents tampering with their code and allows demographic information to be embedded with confidence.

Visible. Used when you want the respondent to know they are using a code to access the survey. This gives survey respondents a sense of security and still allows demographic information to be embedded. If a survey respondent changes his/her PAC, they can be screened out before entering the survey or cleansed from the data set after fielding is complete.

The use of PACs helps guard against ballot stuffers and duplicate or unauthorized responses. As an added benefit, these codes can also be used to create demographic slices of the results, identify randomly-selected prize winners, pass back information to external sampling partners and trace respondents in appropriate situations after fielding is complete.

It must be noted, however, that personal access codes should be used with caution. In every case, the respondent’s anonymity should be protected and controls must be in place to avoid linkage between the respondent’s response data and any personally identifying information available to the researcher.

Time-elapsed verification

A third form of respondent verification is time-elapsed verification, which is used to catch speeders in market research surveys. These respondents are identified as not taking enough time to provide meaningful responses to your questionnaire. The basic concept behind this methodology is to verify a valid survey response by examining the elapsed time each survey respondent spends completing the whole survey or a section of the survey. An acceptable range for the survey completion time can be calculated in two ways:

An educated estimate based on the number of items in the survey. For example, three closed-ended questions per minute, one open-ended question per minute and six repeated-rating table items per minute.

Evaluating the average completion time of the sample collected.

Once an acceptable range has been defined, any outliers would be removed and considered invalid respondents. For example, if the acceptable range is determined to be 10 to 20 minutes and a respondent completes the survey in two minutes, it is more than likely that he/she provided invalid data.
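Both approaches can be sketched in a few lines of Python. The per-minute rates and the 10-to-20-minute range come from the examples above; the survey composition, the sample of completion times and the choice of half the average as a lower cutoff are assumptions made purely for illustration.

```python
from statistics import mean


def estimated_minutes(closed: int, open_ended: int, grid_items: int) -> float:
    """Estimate completion time: 3 closed-ended, 1 open-ended, 6 grid items per minute."""
    return closed / 3 + open_ended / 1 + grid_items / 6


def outside_range(elapsed: float, low: float, high: float) -> bool:
    """Return True when a completion time falls outside the acceptable range."""
    return elapsed < low or elapsed > high


def sample_based_floor(times: list, fraction: float = 0.5) -> float:
    """Lower cutoff set at an assumed fraction of the sample's average time."""
    return fraction * mean(times)


# Approach 1: hypothetical survey with 30 closed-ended questions,
# 2 open-ended questions and 18 grid items.
print(estimated_minutes(30, 2, 18))

# The two-minute respondent measured against a 10-to-20-minute range.
print(outside_range(2, low=10, high=20))

# Approach 2: hypothetical completion times (in minutes) from the sample.
times = [2, 12, 14, 15, 16, 18, 19]
floor = sample_based_floor(times)
speeders = [t for t in times if t < floor]
print(speeders)
```

Because a handful of very fast or very slow respondents can pull the average, some researchers may prefer to base the cutoff on the median instead; that is a design choice rather than something prescribed here.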

Other opportunities

While most quality controls can be built into the survey instrument and data analysis processes, there are several other opportunities available to researchers which help ensure quality data:

• Survey sampling. Most panel providers perform numerous quality checks during the enrollment phase, and continually during panel engagement and cleansing. As a researcher, you should be aware of the processes utilized by each supplier and mandate that sufficient quality control procedures are in place and utilized.

• Survey invitations, introductions and titles. When recruiting or introducing panelists to your survey, be sure to avoid language which may tip off the respondent to the qualifications you are seeking. For example, if your survey invitation indicates that you “are looking for individuals who have purchased a soft drink in the past three months,” the respondents who haven’t done so may indicate that they meet these criteria in order to participate.

• Survey length and composition. Research has shown that the longer the survey, the greater the risk for respondent inattentiveness and abandonment. It is the researcher’s responsibility to limit their questionnaire to an appropriate length and avoid composition flaws (e.g., long response grids or confusing language) which lead to respondent fatigue.

Quality data

These methods represent a few industry best practices, but by no means represent an exhaustive list. Steps must be taken during all phases of a research initiative to ensure that the resulting data is reliable, valid and actionable. It is the responsibility of online researchers to ensure that their clients are receiving quality data as critical business decisions are often based on these results. Continually taking these steps to protect data quality will increase confidence in online survey results across the industry and pave the way for more extensive online research in the future.

Posted on July 28th, 2008 under In the Press