What Your “Queue” is Saying About You

Matthew Robinson-Loffler, Government Law Review member

      Our secrets, great or small, can now without our knowledge hurtle around the globe at the speed of light, preserved indefinitely for future recall in the electronic limbo of computer memories.  These technological and economic changes in turn have made legal barriers more essential to the preservation of our privacy.[1]


          Did Netflix “out” an Ohio mother?[2]  According to a recent complaint filed in federal court for the Northern District of California, the rental company disclosed data revealing its subscribers’ viewing habits and preferences without sufficiently anonymizing the information.  While many people expect to take abuse from friends and family regarding their taste in entertainment—“You’ve watched On Deadly Ground how many times?”—very few expect that information to be shared with the community at large.

          Doe v. Netflix alleges that the DVD rental company, regulated as a “video tape service provider” under the Video Privacy Protection Act (VPPA),[3] disclosed the private information of its subscribers in violation of the VPPA, allowing contestants, and anyone else who may have had access to the data, to de-anonymize the information and identify specific members.  Doe, an Ohio resident, is suing individually alongside three other plaintiffs; together they seek to represent a U.S. Resident class; a U.S. Resident and California Resident class; and a U.S. Injunctive class.[4]

I. The Law

          In 1988, Congress passed the VPPA in response to the leaking of Supreme Court Nominee Robert Bork’s video rental history to a Washington newspaper.[5]  The VPPA prohibits “video tape service provider[s]” from “disclos[ing] to any person, personally identifiable information concerning any consumer . . . .”[6]  Personally identifiable information “includes information which identifies a person as having requested or obtained specific video materials or services from a video tape service provider.”[7]  Any individual whose personally identifiable information has been disclosed in a way not permitted by the Act may recover “actual damages but not less than liquidated damages in an amount of $2,500; punitive damages; reasonable attorney’s fees and other litigation costs reasonably incurred; and such other preliminary and equitable relief as the court determines to be appropriate.”[8]

 II. How Netflix Works

            For a range of fees, Netflix subscribers may request physical DVDs or, if their plan permits, stream content at home on their computers.[9]  Subscribers can search titles “by genre, new releases, top 100 or critics’ picks.”[10]  Alongside traditional film genres such as “Action & Adventure,” Netflix also offers a “Gay & Lesbian” genre containing various gay and lesbian subcategories.[11]  As subscribers select and view titles, a record is made of each selection, much like the rental records retained by a traditional brick-and-mortar video store.

          Additionally, subscribers are encouraged to rate each selection from one to five stars, according to how many stars they feel the title deserves.[12]  All of this information is gathered to provide the subscriber with future recommendations, recommendations Netflix hopes its subscribers will enjoy as much as the movies they have previously selected.[13]  Netflix calls this recommendation software Cinematch.[14]  Unlike simple or “traditional” record retention, however, attaching values indicating how much a particular viewer enjoyed or disliked a film adds a second layer to the record and, according to the complaint, increases the likelihood that an individual can be identified.[15]
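The complaint’s point about ratings adding a distinguishing “second layer” can be illustrated with a small sketch (all names and data invented for illustration, not drawn from the actual data set): two subscribers who rented the same two titles are indistinguishable by titles alone, but once star ratings are attached, each record becomes unique.

```python
# Invented records: each subscriber ID maps to {title: stars}.
subscribers = {
    101: {"Goldfinger": 5, "Casablanca": 3},
    102: {"Goldfinger": 2, "Casablanca": 4},
}

# Project each record down to titles only, then to (title, stars) pairs.
titles_only = {uid: frozenset(r) for uid, r in subscribers.items()}
with_ratings = {uid: frozenset(r.items()) for uid, r in subscribers.items()}

# By titles alone the two records collide; with ratings each is unique.
print(len(set(titles_only.values())))   # 1 distinct record — indistinguishable
print(len(set(with_ratings.values())))  # 2 distinct records — each identifiable
```

The same effect, scaled to thousands of titles and five possible stars per title, is what makes a rating history a near-unique fingerprint.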

III. The Contest

            In October 2006, Netflix launched an open contest to improve its recommendation system, stating, “[Cinematch’s] job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies.  We use those predictions to make personal movie recommendations based on each customer’s unique tastes.”[16]  The company offered a grand prize of one million dollars to the individual or team that could improve the accuracy of the Cinematch system by 10%, provided that the team, among other things, “share [the] method with (and non-exclusively license it to) Netflix.”[17]  According to the complaint, contestants were given two separate data sets: a training set and a smaller qualifying set.[18]  The training set included:

100 million subscriber movie ratings.  The ratings had been submitted by approximately 480,000 subscribers between October 1998 and December 2005 for approximately 18,000 movies.  Each of the 100 million ratings included a numeric identifier unique to the subscriber, movie title, movie year of release, date of subscriber rating, and the rating of one to five stars assigned by the subscriber.[19]

           The qualifying set used an identical method of assigning the unique number but included only 2.8 million ratings.[20]  It is alleged that a contestant could use this information in conjunction with other internet databases, such as the Internet Movie Database (IMDB), to de-anonymize the Netflix data sets and identify specific members.[21]
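The linkage attack the complaint describes can be sketched in a few lines (all names, titles, and dates here are invented for illustration; the actual Narayanan–Shmatikov algorithm is considerably more robust to noise): match each anonymized (ID, title, date, stars) record against publicly posted ratings, treating same title, same star rating, and nearby dates as evidence that the two records belong to the same person.

```python
from datetime import date

# Toy "anonymized" release: numeric IDs only, as in the Netflix Prize data.
netflix = [
    (1337, "Brokeback Mountain", date(2005, 11, 3), 5),
    (1337, "Bent", date(2005, 11, 10), 4),
    (2048, "On Deadly Ground", date(2005, 6, 1), 5),
]

# Toy public ratings posted under a real name on a site like IMDB.
imdb = [
    ("jane_doe", "Brokeback Mountain", date(2005, 11, 4), 5),
    ("jane_doe", "Bent", date(2005, 11, 11), 4),
]

def link(anon, public, window_days=14):
    """Count the ratings each (anonymous ID, public name) pair shares:
    same title, same stars, and dates within `window_days` of each other."""
    scores = {}
    for uid, title, d1, stars in anon:
        for name, t2, d2, s2 in public:
            if title == t2 and stars == s2 and abs((d1 - d2).days) <= window_days:
                scores[(uid, name)] = scores.get((uid, name), 0) + 1
    return scores

print(link(netflix, imdb))  # {(1337, 'jane_doe'): 2} — two matches tie ID 1337 to jane_doe
```

A pair linked by two or three distinctive ratings is already strong evidence, which is why the complaint stresses that a handful of publicly reviewed titles can suffice to unmask an entire private viewing history.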

IV. The Complaint

           Arguing that Netflix should have been aware of the risk, the complaint alleges that “data mining of multiple Internet databases has been publicly discussed for nearly a decade.”[22]  Furthermore, Netflix should have been aware of the possibility of its data sets becoming de-anonymized because “two months before . . . AOL’s research department released a compressed text file containing twenty million search keywords for over 650,000 users over a three month period” from which reporters, not experts, were able to “identify individuals represented in the AOL database.”[23]

          In addition to the warning signs present in the business community, Doe contends Netflix should have been aware that two researchers, graduate student Arvind Narayanan and Professor Vitaly Shmatikov of the University of Texas at Austin, had in fact reversed the anonymization process, as reported in “numerous press outlets.”[24]

          One such report highlighted the researchers’ ability to identify “one of the people” who “had strong – ostensibly private – opinions about some liberal and gay-themed films . . . .”[25] (emphasis added).  The complaint names this “The ‘Brokeback Mountain’ Factor.”[26]  It alleges that the data released may reveal a subscriber’s “private information such as sexuality, religious beliefs, or political affiliations” as well as “personal struggles with issues such as domestic violence, adultery, alcoholism, or substance abuse.”[27]  Quoting the researcher who succeeded in de-anonymizing the data set, the complaint acknowledges that inferences drawn from one’s viewing history may not be conclusive of one’s proclivities, but notes that “in many workplaces and social settings opinions about movies with predominantly gay themes such as ‘Bent’ and ‘Queer as folk’ (both present and rated in this person’s Netflix record) would be considered sensitive.”[28]


            While it is far too early to speculate on the impact Doe v. Netflix may have, it does raise a number of interesting questions.  First, if identification of Netflix subscribers required the use of secondary information already existing on the internet, information likely created by the user with little or no expectation of privacy—movie reviews and ratings on sites like IMDB—to what extent can a company like Netflix be held liable for how third parties use that data in conjunction with the anonymized data set provided by Netflix?  Would this require companies to scour the internet in an attempt to find potential corresponding data sets? 

          At the least, it is likely to make companies think twice about these types of contests, which, while they benefit the company, also benefit the consumer by creating new and useful advances in service.  To what degree will innovation be curtailed in order to protect individuals’ private information?  While this complaint raises a number of questions, one thing seems certain: as an increasing amount of personal data becomes available, things like your Netflix queue might be telling people more than you want them to know.

[1] Complaint ¶ 65, Doe v. Netflix, No. C09 05903 (N.D. Cal. filed Dec. 12, 2009) (quoting Shulman v. Group W, 18 Cal. 4th 200, 243–44 (1998) (Kennard, J., concurring)).

[2] Id. ¶ 18.

[3] Id. ¶ 87; 18 U.S.C. § 2710(a)(4) (2006).

[4] Complaint, supra note 1, ¶¶ 15–18.

[5] Sarah Ludington, Reining in the Data Traders: A Tort for the Misuse of Personal Information, 66 Md. L. Rev. 140, 152 (2006).

[6] § 2710(b)(1).

[7] § 2710(a)(3).

[8] § 2710(c)(1)–(2).

[9] Netflix, About Netflix, http://www.netflix.com/MediaCenter?id=5379#about (last visited Mar. 25, 2010).

[10] Netflix, Netflix Features, http://www.netflix.com/MediaCenter?id=5379#features (last visited Mar. 25, 2010).

[11] Netflix, http://www.netflix.com/BrowseSelection (last visited Mar. 25, 2010).

[12] Netflix, Netflix Facts, http://www.netflix.com/MediaCenter?id=5379#facts (last visited Mar. 25, 2010).

[13] Netflix, Netflix Prize: Review Rules, http://www.netflixprize.com//rules (last visited Mar. 25, 2010) (“Netflix is all about connecting people to the movies they love. To help customers find those movies, we’ve developed our world-class movie recommendation system: CinematchSM.”).

[14] Id.

[15] Complaint, supra note 1, ¶ 65 (“Plaintiffs’ and class members’ movie and rating data contain information of a more highly personal and sensitive nature. A Netflix member’s movie data may reveal that member’s private information such as sexuality, religious beliefs, or political affiliations. Such data may also reveal a member’s personal struggles with issues such as domestic violence, adultery, alcoholism, or substance abuse.”) (emphasis added).

[16] Netflix, Netflix Prize: Review Rules, http://www.netflixprize.com//rules (last visited Mar. 25, 2010).

[17] Id.

[18] Complaint, supra note 1, ¶¶ 32a–b. 

[19] Id. ¶ 32a.

[20] Id. ¶ 32b.

[21] Id. ¶ 35.

[22] Id.

[23] Id. ¶ 36.

[24] Id. ¶ 37.

[25] Id.; see Robert Lemos, Researchers Reverse Netflix Anonymization, Security Focus, Dec. 4, 2007, http://www.securityfocus.com/news/11497.

[26] Complaint, supra note 1, ¶¶ 64–72.

[27] Id. ¶ 65.

[28] Id. ¶ 68.
