With Big Data Comes Big Responsibility
October 23, 2020 | By Michael Adkison
Researcher: Varun Grover
In July, the Pew Research Center published an analysis of U.S. Congressional lawmakers’ social media habits from the last five years.
The writers detail their research process: “Pew Research Center collected every Facebook post and tweet created by every official and unofficial account maintained by every voting member of the U.S. Senate and House of Representatives between Jan. 1, 2015, and May 31, 2020,” a dataset that included nearly 1.5 million Facebook posts and more than 3.3 million tweets. Before the Internet, no single writer could have sifted quickly and easily through a dataset of that size; now, big data allows us to know that the average Congress member tweets 81% more and posts on Facebook 48% more today than in 2016.
That is the power of big data research (BDR), an emerging form of scholarly writing, which emphasizes massive amounts of data and analytics to reveal trends and patterns in users’ behaviors.
People are drawn to empirical statistics in their news: numbers are easy to point to and cite in future writing and everyday discourse. Even peer-reviewed scholarly journals have incorporated BDR into their articles. But, to paraphrase Ben Parker, Spider-Man’s philosophical uncle, with big data comes big responsibility.
In their recent article in the Journal of the Association for Information Systems, researchers Varun Grover, Aron Lindberg, Izak Benbasat, and Kalle Lyytinen examine big data research and ask whether so much BDR is beneficial. In “The Perils and Promises of Big Data Research in Information Systems,” the researchers weigh the benefits and drawbacks of BDR and return to a single notion: information systems (IS) studies are having a sort of identity crisis, as they are infiltrated by BDR studies that dazzle with their sheer size and analytical wizardry but are light on constructing generalized theory. They warn: “Three interrelated consequences that may take hold within IS research and spawn a downward spiral are the dilution of IS field’s identity, greater fragmentation of the field, and greater corporate governance of research output.”
The Buzz on Big Data
The researchers define big data research as “research that involves large and often heterogeneous datasets represented in multiple formats (qualitative, quantitative, video, image, audio, etc.),” and, as they note, IS and other fields are turning toward big data. IS research analyzes the ways and tools we use to collect, store, and distribute information. For example, the Pew Research Center study on lawmakers’ social media activity tallied each politician’s posts across different platforms, using big data (and analytics) in the process.
But, as any scholar can tell you, a peer-reviewed article is more than just some numbers; it draws upon prior theories and original research, and it stands as an independent body of work. As the researchers note:
“As similar fruits of BDR gain more prominence in our journals and body of knowledge, future generations of IS scholars may unintentionally inherit a brave new world of research where big data, computationally intensive analysis techniques, and evidence triangulation will reign over theory, disciplinary relevance, and the importance of having a cumulative tradition.”
Yet BDR is taking over the IS field. The researchers found that big data has received more and more coverage over time. Ten years ago, BDR articles were rare, but today as many as 10% of major journal articles involve big data research.
Why has BDR taken over IS studies? The researchers list six reasons why BDR is so attractive:
- Impressive datasets: since the advent of the Internet, and more specifically social media, accessing data from millions of users has never been so easy.
- Increased ease of demonstrating statistical significance: “Using massive datasets, statistical significance can almost always be demonstrated, even if the effect sizes are negligible.”
- The lure of objective data: big data comes as close to “objective data” as possible; in other words, the data was not inherently tied to an experiment, and comes from “the wild.”
- The availability of powerful tools for analyzing large datasets: forty years ago, you would have been hard-pressed to find a personal computer; today, we hold them in our pockets. It has never been easier to analyze massive datasets, especially with cloud computing platforms like Amazon Web Services or Google Cloud.
- Increased synergy between teaching, research, and consulting: sure, peer review has its benefits, but those theories and literature reviews can be dense. With BDR, the simple numbers “can more readily cross over to teaching and help researchers develop consulting assignments.”
- IRB approval issues: in a study involving human participants, an institutional review board (IRB) evaluates the ethics of the research. Since BDR focuses on the numbers rather than the people, it tends to run into fewer ethical hurdles than studies that do.
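The second reason on that list, that massive datasets make statistical significance almost automatic, can be illustrated with a little arithmetic. The sketch below is not taken from the researchers' article; it assumes a simple two-sample z-test with unit variance, and the function name `two_sided_p_value` and the effect size of 0.01 are hypothetical choices made purely for illustration. It holds a negligible effect fixed and only increases the number of observations.

```python
import math


def two_sided_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test.

    Assumes (for illustration only) two equally sized groups with unit
    variance whose observed standardized mean difference is `effect_size`.
    """
    # Test statistic: the standardized difference scaled by sample size.
    z = effect_size * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical, negligible effect: one hundredth of a standard deviation.
EFFECT_SIZE = 0.01

for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(EFFECT_SIZE, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>9,}: p = {p:.3g} ({verdict} at the 0.05 level)")
```

Under these assumptions the effect itself never changes; only the sample grows, and the p-value drops from far above the conventional 0.05 threshold to far below it. That is why a significant result in a big data study says little by itself about whether the effect matters.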
Predicting the Future of BDR
On the surface, BDR seems like a great asset for IS studies and papers, and in some ways it can be. Why send out a survey to hundreds of respondents with promises of compensation when you can simply turn to BDR and uncover an interesting pattern in a fraction of the time? Yet the gray areas of BDR implementation expose obvious problems with the practice. To evaluate the state of the IS field, the researchers propose five conjectures regarding how BDR will affect IS studies. “We use the term ‘conjectures’ to denote theory-free suppositions formed on the basis of the currently incomplete information that we currently hold about BDR and its potential impact.”
The first conjecture posits that “BDR will exhibit a tendency to address tactical problems.” The massive datasets in BDR inherently deal with behaviors, “for example, ... traces of posting, payments, bidding, social connections, viewing, editing, downloading, or linking to various types of user-generated content.” As a result, BDR tends to focus on solving a firm’s tactical, rather than strategic, problems. “Tactical means that the research problems being studied are confined to largely local issues (i.e., issues that are narrow and contextually specific) and are mostly concerned with the immediate, empirical connections between the variables included in the dataset.” BDR can therefore be so limited in scope that true theory, a staple of peer-reviewed articles, cannot be formed.
The researchers also suggest that “BDR will result in widespread local diversity in research to the detriment of a cumulative tradition.” “While BDR often starts with discovering empirical irregularities in preexisting datasets, it generally does not present itself as inductive, exploratory, theory-building research.” Rather, BDR is focused more on analyzing patterns in order to “fine tune a particular platform or application” than on engaging in a conversation on a specific subject, developing theory on the subject, and contributing to the subject’s literature, all of which are hallmarks of peer-reviewed journals.
The researchers also examine how BDR treats the IT artifact, which IGI Global defines as “bundles of hardware infrastructure, software applications, informational content, and supporting resources that serve specific goals and needs in personal or organizational contexts.” Widespread use of BDR, the researchers argue, would mean that “BDR will exhibit a bias toward a nominal treatment of the IT artifact.” In other words, “such research will focus on the raw action of individuals using technologies… rather than identifying why they act as they do…” And in abandoning traditional IS research practices, the researchers predict that “BDR will exhibit a bias toward cursory treatment of theory.”
Summarizing their predictions, the researchers write that “many BDR papers seek to make strong claims with regard to their contribution based on the novelty of data… or analytic technique used,” with their final conjecture that “BDR will have a tendency to focus on data and methods, as opposed to theoretical knowledge of the IT artifact associated with non-BDR papers.”
The State of the IS Field
“While the conjectures we put forth… may appear controversial, we envision them as bold suppositions, i.e., both thought-provoking and salient with regard to our modes of knowledge production,” the researchers say. Yet, controversial or not, many of these conjectures are already being borne out. The researchers analyzed and coded nearly 400 papers for whether they use BDR and how they implement it. “Conjecture #1 is already happening: Tactical research is prevalent in BDR studies.” Conjectures #2, #4, and #5 also hold today, as “less theory and more data feature prominently in BDR studies.” The only conjecture not quite as evident today is #3, as “the treatment of IT artifact shows a less pronounced pattern.”
The researchers call for using BDR with more discretion by carefully considering the existing knowledge and the knowledge created, its generalizability and its impact beyond the BDR context. “While BDR may carefully examine available data and practical questions, we would argue that thought should be given to the corresponding questions, and some investment made in iterating between such questions and the variables available in the dataset to foster greater knowledge impact.”
Big data is undeniably prevalent today, both in general writing and in academic literature. As a practice, BDR itself is not inherently an issue; on the contrary, it can be a tremendous asset. But, as the researchers emphasize, there’s a time and a place for it, and scholars should not lean on BDR as the exclusive form of evidence. “As BDR gains a stronger foothold in our outlets and research community, new and critical issues to be debated are emerging… If our field fails to continue to engage in powerful abstractions, our ability to think as a community about the rich phenomena surrounding IS and the multifarious relationships between the social and the technical will grow increasingly circumscribed.” With big data comes big responsibility, and today’s scholars must recognize that responsibility to preserve the future of IS studies.