What I Did Last Weekend (WSJ Data Transparency Edition)

I was lucky enough to attend the Wall Street Journal Data Transparency Weekend hosted at NYU and work with a fantastic team led by Prof. Ed Felten of Princeton (and the FTC). The project was very data-driven, so the UX work came at the very beginning and the very end: if our concern was surfacing privacy and surveillance issues to users, how could we build the needed database and then present the relevant information? Could we assign a letter grade to sites based on our chosen criteria: third-party cookie use, adherence to Do Not Track requests, and allowing users to opt out?

Looking at the user needs, we didn't want to provide a site or app that users had to visit separately, where they would load a URL, see the results, and then decide whether or not to continue their everyday browsing and interactions. We realized we could build this as a browser extension: it would be unobtrusive but persistent, and could be hidden or exposed (we later added automated presentation of the site's "grade" in the extension icon, so users could immediately see the site's letter grade). This, we hypothesized, would more powerfully link the experience of visiting a site with knowledge of the site's privacy attitude. Our hope was that this would be more likely to spur informed user action, making our extension an effective sousveillance tool.

We crawled the top 500 Alexa sites on 4/14/12 and we logged all cookie downloads that resulted from those crawls. We performed three different crawls:

* first, with a clean-slate browser without any opt-out cookies or Do Not Track requests
* second, with the BeefTaco extension active (which downloads most opt-out cookies)
* third, with the "Do Not Track" request option selected in the browser

We performed these different crawls to determine whether the sites honored opt-out cookies and/or “Do Not Track” requests sent in the request headers. Based on these crawls, we graded the top 500 Alexa sites and relevant third-party networks. Raw data from the crawls will be located at trackingcookie.info in the future for reference.
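To make the comparison concrete, here is a minimal sketch of how the three cookie logs could be compared per third-party domain. It is not the code we ran that weekend: the log shape (a mapping from third-party domain to the cookie names it set), the function names, and the example domains and cookie names are all assumptions for illustration. It also assumes that the opt-out cookies themselves are known by name so they can be ignored when checking the second crawl.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical log shape: third-party domain -> names of cookies it set.
CookieLog = Dict[str, Set[str]]


def domains_still_setting_cookies(
    baseline: CookieLog,
    treated: CookieLog,
    ignored_names: FrozenSet[str] = frozenset(),
) -> Set[str]:
    """Domains that set cookies in the baseline crawl and still set
    cookies (other than the ignored names) in the treated crawl."""
    offenders = set()
    for domain, baseline_cookies in baseline.items():
        remaining = treated.get(domain, set()) - ignored_names
        if baseline_cookies and remaining:
            offenders.add(domain)
    return offenders


def summarize(
    clean: CookieLog,
    with_opt_out: CookieLog,
    with_dnt: CookieLog,
    opt_out_names: FrozenSet[str] = frozenset(),
) -> Dict[str, Dict[str, bool]]:
    """Per-domain verdict: did the third party appear to honor each signal?"""
    ignores_opt_out = domains_still_setting_cookies(clean, with_opt_out, opt_out_names)
    ignores_dnt = domains_still_setting_cookies(clean, with_dnt)
    return {
        domain: {
            "honors_opt_out": domain not in ignores_opt_out,
            "honors_dnt": domain not in ignores_dnt,
        }
        for domain in clean
    }


# Made-up example data, one dict per crawl.
clean_crawl = {"ads.example": {"uid"}, "stats.example": {"sid"}}
opt_out_crawl = {"ads.example": {"optout"}, "stats.example": {"sid"}}
dnt_crawl = {"ads.example": {"uid"}, "stats.example": set()}

report = summarize(clean_crawl, opt_out_crawl, dnt_crawl, frozenset({"optout"}))
```

In this made-up data, `ads.example` appears to honor the opt-out cookie but not Do Not Track, and `stats.example` does the reverse. A real comparison would also need to allow for cookies that are set for reasons other than tracking.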

The resulting privacy grade (from A to F) for each site is based on what it does with its users' data: the grades reflect how well or how poorly sites handle that data.

We give stellar grades to first-party sites that:
• do not allow a large number of third-party networks to be called on their site (and do not let many third-party networks download tracking cookies to the visitor's browser)
• honor both “opt-out” cookies and “Do Not Track” requests

We give poor grades to first-party sites that:
• call a lot of third-party networks, which then download multiple tracking cookies to the user's browser
• call third-party networks that have poor quality scores themselves (because the third parties do not allow cookie opt-out or do not honor "Do Not Track" requests)
• continue to track users' online behavior even after the users opt out of online tracking through the use of “opt-out” cookies
• continue to track users' online behavior if the user turns on the “Do Not Track” option in their browser
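As a rough illustration of how criteria like these could be folded into a single letter, here is a sketch of a scoring function. The weights, caps, cutoffs, and the A/B/C/D/F scale are all assumptions invented for this example; they are not the formula the team actually used, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def privacy_score(
    third_party_count: int,
    tracking_cookie_count: int,
    third_party_scores: Sequence[float],
    honors_opt_out: bool,
    honors_dnt: bool,
) -> int:
    """Start from 100 and subtract a penalty for each criterion.
    All weights and caps here are illustrative assumptions."""
    score = 100
    # More third-party networks called: up to 30 points off.
    score -= min(third_party_count * 3, 30)
    # More tracking cookies downloaded: up to 20 points off.
    score -= min(tracking_cookie_count * 2, 20)
    # Poorly scored third parties (0-100 scale) drag the site down: up to 20 off.
    if third_party_scores:
        average = sum(third_party_scores) / len(third_party_scores)
        score -= round((100 - average) / 5)
    # Ignoring either user signal costs a flat 15 points.
    if not honors_opt_out:
        score -= 15
    if not honors_dnt:
        score -= 15
    return max(score, 0)


def letter_grade(score: int) -> str:
    """Map a 0-100 score to a letter using assumed cutoffs."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


# A site with no third parties that honors both signals.
clean_site = letter_grade(privacy_score(0, 0, [], True, True))
# A site calling many poorly scored networks and ignoring both signals.
heavy_site = letter_grade(privacy_score(12, 15, [40, 50], False, False))
```

With these assumed weights, `clean_site` comes out as an A and `heavy_site` as an F. The point is only that each bullet above can become a separate, inspectable penalty, which also makes it easy to show users why a site earned its grade.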

The current iteration of the extension presents this data in a three-pane column view. The left column shows the first-party site name, favicon, Yes/No for the presence of third-party cookies, and a graphic summing up the grade for the first-party site. The center column lists the names and companies of the third parties (if any): even if users aren't interested in seeing details, a quick glance gives a visual indication of whether there are any, a few, or many. More advanced users can click on any name listed in the center column to progressively reveal more data about each third party, including details of how it scored on our grading criteria. And, as mentioned above, non-technical users can still see the letter grade, even with the extension hidden, highlighted in the extension icon in the browser's status bar.

TRC_current.png


More information and the download link for the current extension are available here and here.

We also sketched out future directions. We'd like to incorporate a subset of the Mozilla Collusion plug-in to replace the center column with a graphical representation of the discovered third parties that shows their scope and relationships. Users would still be able to progressively disclose or ignore details in the third column.

extension_in_use.png