As recent global events and overall trends have demonstrated, big data is here to stay. The very idea of “big” is expanding faster than China’s urbanization. We now talk of petabytes (1,000 terabytes) and exabytes (1,000 petabytes) of information. Technically speaking, data quantity and storage are clearly not an issue. It is the data quality and the sources of input that should be the focus for researchers and practitioners who use such information to advance their causes – whether that is equity or some other type of politics.

The notion of data quality has been a theme in the equity seminar meetings throughout this term. In a small group discussion of Coburn’s article on CBPR in our final session last Friday, the group (labeled “Technical expertise in writing, interpreting, and communication”) discussed the idea that people are collecting data, but this may not be the data they want to collect. In the larger group, practitioner Beth Kaye noted how the best available data might not be the data you want or need. Program “data plans” must address what type of data is going to be collected to administer, measure and evaluate specific programs. This very much depends on the program (or research) team, funding, and overall program scope. Beth noted how equity issues are deeply rooted and embedded, which makes extraction quite difficult. There are many ways to tell an equity story.

Sam’s post on negative encounters by participants in housing programs highlights the importance of grounded data, and also speaks to the conversation from our final session. Sy stated, “Data is politics, like everything else.” Beth’s take on data collection constrained by scarcity was to ask, “What gets measured? Units don’t have a race or ethnicity.” Collecting data in such a manner is political; statistics basically means numbers of the state. In order to represent politics beyond the state, data should come from a plurality of sources in both quantitative and qualitative form. Tools like the CLF’s Equity Atlas can be very useful in articulating barriers to opportunities. Such tools, however, act mainly as collectors and interpreters of various data streams, which means they are limited to the available data and to its particular format (quantitative and spatial).

Okay, so what additional sources of data might we use? I read Sam’s post as highlighting the value of qualitative data. Quant data (e.g., Census, ACS) tell particular kinds of stories and can be easily generalized. Qualitative data, however, offer an important perspective. One example is the 2001 Sisters of the Road “Voices of Homelessness” qualitative database of approximately 500 transcribed interviews with people experiencing homelessness. This database is coded and searchable for various ends, and registration is free (http://sistersoftheroad.org/voices/). The life stories of the individuals interviewed are now in the public domain and, though perhaps small, their politics have at least a degree of representation. This type of data can help ground data collection and analysis (overwhelmingly quant) in the lived experiences of those for whom planners plan and to whom practitioners deliver services. It can also help articulate future research directions (a sort of detached CBPR?).

I find this approach (dare I say crowd-sourcing?) similar to Sy’s suggestion that “street scientists” or organic intellectuals should engage in data gathering, analysis and dissemination in a manner that is openly political. Many questions remain with real power implications. Does this mean creating new streams of data? Who owns and uses these new repositories and who gets (to) input? How often would a database like SOR’s Voices be updated (the data is 12 years old now) and where might funding come from? Free grad student labor and nonprofit funding can only go so far in what is an important yet woefully underdeveloped resource for identifying the groups and individuals who have disproportionately less access to resources. Let’s add life to big data and embrace the potential for pluralism in political representation beyond the notion of classical statistics.