“Remember outsourcing? Sending jobs to India and China is so 2003,” Wired editor Jeff Howe said in 2006. “Hobbyists, part-timers, and dabblers suddenly have a market for their efforts... The labor isn't always free, but it costs a lot less than paying traditional employees. It's not outsourcing; it's crowdsourcing.” Since then, online crowdsourcing has advanced survey data collection so significantly that experts estimate nearly half of all cognitive research articles will make use of data from platforms such as Amazon Mechanical Turk in the coming years. Ready access to a wide range of diverse participants greatly reduces the time and cost of collecting primary data. However, many researchers have scrutinized the quality of data obtained from online crowdsourcing platforms, questioning the validity of the information these participants provide.
In their new study, “Drivers of Data Quality in Advertising Research: Differences across MTurk and Professional Panel Samples,” UA distinguished professor Scot Burton and researchers Christopher Berry and Jeremy Kees examined how underlying mechanisms in the crowdsourcing process affect data quality, specifically in business and social science research. Online crowdsourcing has proven to be an important innovation in data collection, but business practitioners, academic researchers, and data analytics applications all require accurate information. The study’s findings offer effective safeguards that researchers and strategic planners can use to help ensure higher quality data.
When Crowdsourcing Goes Digital
People have been using crowdsourcing for hundreds of years, and modern iterations such as crowdfunded financial campaigns continue to prove the method’s utility. Traditional face-to-face, telephone, and mail surveys are tried-and-true methods of data collection that have been used for decades, but they have notable limitations: they can be slow and expensive, and it can be difficult to recruit a sample of adequate size without using the internet. So that’s what people started to do: use the internet.
Everybody now seems to be crowdsourcing online. Even the European Union supports crowdsourced climate action surveys and post-pandemic response projects tied to United Nations goals through the Crowd4SDG program. And why wouldn’t they? Online crowdsourcing offers numerous benefits, including efficiency, cost-effectiveness, speed, relative anonymity, and the diversity of participants. Online crowdsourcing platforms, the most popular being Amazon MTurk, have streamlined the process, allowing more people to incorporate crowdsourced information into their data collection.
Many fields have witnessed drastic increases in the use of MTurk data, causing some experts to worry about the quality of data from online sources. Researchers often downplay these concerns because of the source's ease, speed, and affordability. Previous research has already identified factors that can produce shoddy data and, in turn, questionable conclusions. Problems such as participants speeding through surveys, inattention, and dishonesty threaten the validity of data from MTurk and similar platforms. Some recent studies have found an increase in random errors and spurious correlations in projects involving MTurk data, and such unreliable data can adversely affect the conclusions drawn by business managers and academic researchers alike.
Burton, Berry, and Kees did not examine data quality issues stemming solely from online sources, though. In fact, studies have shown significant data quality issues in both traditional panels and data sourced from online platforms, and large-scale, survey-based studies have at times found crowdsourced respondents to be very similar to participants recruited through other survey approaches. Through their research, Burton and his colleagues aim to extend our knowledge of what exactly affects data quality across both professionally managed panels and popular online crowdsourcing platforms.
Staying Focused
Several underlying mechanisms can affect data quality across the entire spectrum of crowdsourced responses: survey response satisficing, multitasking, and respondent effort. These factors affected data from professionally managed panels and online platforms alike, influencing data quality both directly and indirectly across all of the online data collection sources examined.
The effort a worker puts into any task affects the quality of their product. All types of consumers may engage in a behavior called satisficing – a process in which people expend only the effort necessary to make a satisfactory, but less than optimal, decision. In fact, satisficing is one of the foundations of productive behavior: to manage stress and complete tasks on time, people must find an equilibrium between effort and benefit.
Survey response satisficing begins to degrade data quality when participants pursue unrelated goals and tasks while paying little attention to survey questions and response alternatives. By diverting the bulk of their attention away from the survey at hand, participants make more mistakes and reduce the quality of their responses. Survey response satisficing, multitasking, and lack of respondent effort are intrinsically connected, and all of them diminish the usefulness of the survey data.
Satisficing and multitasking, in particular, are cause for concern when employing any online platform. Researchers relinquish control in computer-mediated surveys, and participants can freely engage in unrelated tasks, increasing response satisficing and multitasking and decreasing the effort they devote to the survey.
Burton, Berry, and Kees expected all three factors to negatively affect data quality, but they didn’t expect response satisficing to play such a large role. Effort was positively related to data quality, high levels of multitasking hindered it, and response satisficing had the strongest effect in explaining the quality of the data. However, the effects of these three drivers varied across the different crowdsourced and professionally managed sample sources examined.
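As a purely hypothetical illustration (invented data and variable names, not the authors’ model or results), the sketch below shows how a researcher might relate respondent-level satisficing, multitasking, and effort scores to a data-quality index using a simple linear regression.

```python
# Hypothetical illustration only: simulated data, not the study's model or results.
# It relates three invented driver scores to an invented data-quality index.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
satisficing = rng.normal(size=n)   # higher = more satisficing
multitasking = rng.normal(size=n)  # higher = more multitasking
effort = rng.normal(size=n)        # higher = more effort

# Simulate a quality index in which satisficing hurts most and effort helps.
quality = (-0.6 * satisficing - 0.3 * multitasking + 0.4 * effort
           + rng.normal(scale=0.5, size=n))

# Ordinary least squares: quality regressed on the three drivers.
X = sm.add_constant(np.column_stack([satisficing, multitasking, effort]))
fit = sm.OLS(quality, X).fit()
print(fit.params)  # estimated coefficients recover the assumed signs and sizes
```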
Vetted MTurk participants performed the best and were considerably less expensive than the professionally managed panel participants. With this new information, researchers can better understand the drivers of data quality and implement safeguards to help ensure online data quality remains high. Burton, Berry, and Kees emphasize how sharply the quality of MTurk participants’ data falls when researchers are not proactive in using effective screening measures. They suggest future studies vet survey participants with procedures such as quality assurance measures, attention checks, and speeding traps.
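To make those screening procedures concrete, here is a minimal sketch of how a researcher might apply an attention check and a speeding trap after data collection. The column names (attention_check, completion_seconds) and the thresholds are illustrative assumptions, not details from the study.

```python
# Minimal sketch of post-survey screening. Column names and thresholds are
# hypothetical; real surveys would tailor both to their own design.
import pandas as pd

def screen_respondents(df: pd.DataFrame,
                       expected_answer: str = "agree",
                       min_seconds: float = 120.0) -> pd.DataFrame:
    """Keep only rows that pass a simple attention check and a speeding trap."""
    passed_attention = df["attention_check"].str.lower() == expected_answer
    not_speeding = df["completion_seconds"] >= min_seconds
    return df[passed_attention & not_speeding]

# Example usage with made-up responses:
responses = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "attention_check": ["agree", "disagree", "Agree"],
    "completion_seconds": [300, 95, 410],
})
clean = screen_respondents(responses)  # keeps respondents 1 and 3
```

In practice, the expected answer and the minimum completion time would be set for the specific survey, and additional checks (such as duplicate or straight-line response detection) could be layered on in the same way.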
The Path to High Quality Data
Thoughtful responses require time and effort. Future studies examining online crowdsourcing platforms should explore further strategies to motivate participants to produce higher quality data. Researchers could, for example, examine whether “thank you” messages that reinforce gratitude throughout surveys improve responses, or test the effect of offering increased compensation to participants who read all questions carefully and provide quality answers.
This study provides an invaluable roadmap for discussions about how we collect data online. By applying a few data quality procedures and checks, future researchers can use comparatively inexpensive and convenient online data collection sources to their full potential.