500 Words, Day 22

Big data is the gamification of 2013. Companies have to have some big data strategy. As I was writing this, Gartner released a report that 64% of organizations are funding big data projects this year. And it's true that important insights can be derived from large data sets – after all, that's what science does – but the data trend today seems to forget that having data does not mean you have answers.

Jaron Lanier, who should know from big data, writes in his recent book, "Who Owns the Future?":

We have become used to treating big business data as legitimate, even though it might really only seem so because of its special position in a network.

He gives as an example the matching algorithms on dating sites. There's no scientific validity to their numbers, yet they are treated as valid due to social engineering: people expect them to be "real". The same goes for recommendation systems for books and for restaurants (and note the recent sting of review mills in Brooklyn). They are given the veneer of truth because they are numbers and the controlling interests (the site, the company) present them as privileged. But this is problematic.

The data may be polluted. The data may be incomplete. The site may only present top listings, reinforcing options that may have been promoted by poor initial data.

What's even more problematic is that none of this matters to many applications of big data as long as something happens. Do you sell more widgets overall? Do you stickify eyeballs to the marketing message?

This creates a false positive of "it works" from an engineering and marketing standpoint. This is engineering- and marketing-driven design: features those roles prioritize because they are implementable and quantitatively testable. Yet they may not serve the user. And they might not answer any real questions.

It doesn’t matter if the science is right so long as customers will pay for it, and they do.

Lanier points out that none of the protective processes that science relies upon – peer review, standards, double-blind tests – are in place in the world of big data. Nor does he see any hint of a drive for these because (see above) people are making big bucks the way things are.

There might be a third way, adopted from the young field of data-driven journalism, though this way may be too uncomfortable for businesses to adopt. DDJ practitioners stress that data != information, let alone answers. Just as you would never go into a human interview without researching your questions, you have to "interview the data", including making an effort to understand any problems users might have had in creating the data, or what their unexpressed needs and drives were. Understanding these can save you from overlooking holes in the data, holes that lead to bad data and no answers.

If some of this sounds familiar, it's because you've read some UX person harping on user research. Maybe big data could learn something.

And that's 500 words.