Final Project: Experiments in Silicone Obfuscation

[1] Obfuscation is the deliberate addition of ambiguous, confusing, or misleading information to interfere with surveillance and data collection (p.1)

[2] Obfuscation, at its most abstract, is the production of noise modeled on an existing signal in order to make a collection of data more ambiguous, confusing, harder to exploit, more difficult to act on, and therefore less valuable (p.46)

About

I began this project with my classmates Julia Rich and Idit Barak, inspired by the obfuscation and related work we talked about in class (see the definitions above): CV Dazzle, Zach Blas, Ani Liu, NIR LED glasses, Kathleen McDermott.

We asked ourselves:

  1. Can we build on this work in the form of a personal kit?

  2. Can we use the computer against itself (in form and function)?

Our goal was to create a kit of silicone face adhesives that would make the user anonymous to facial recognition software and that could be carried around in a small purse or bag. The design was important to us and we wanted the form to look angular, similar to the patterns of triangles that facial recognition software uses to break down faces into sections (using the computer against itself).

First, we researched different facial recognition systems to better understand how they work. You can read my summaries of these systems here. Most now use machine learning, which makes “fooling” them more complicated. We considered using machine learning to figure out which colors, patterns, and shapes would be most effective, but decided to start with an analogue approach.

interface.png

Based on an example our professor had shown us in class, I created a simple web interface using Amazon Rekognition to test our different methods (a rough sketch of the Rekognition call follows the list). We tried the following:

  1. Making angular paper shapes to change the “shape” of our faces, starting with plain white paper.

  2. Coloring the paper to be close to our skin tone, to see if the computer would be more likely to read it as part of our face rather than something in front of it.

  3. Applying silicone in the same color as our skin tone to try to change our face structure more realistically.

  4. Trying images and patterns from published research that uses machine learning to fool object recognition systems.
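
This is a rough sketch (not our exact interface) of the Rekognition call behind the test page: the browser sends a webcam snapshot as a JPEG and the server asks Rekognition whether it can find a face. The route name and payload format here are illustrative, and AWS credentials are assumed to be configured in the environment.

```javascript
const express = require('express');
const bodyParser = require('body-parser');
const AWS = require('aws-sdk');

const app = express();
const rekognition = new AWS.Rekognition({ region: 'us-east-1' });

app.post('/detect', bodyParser.raw({ type: 'image/jpeg', limit: '5mb' }), async (req, res) => {
  try {
    const result = await rekognition
      .detectFaces({ Image: { Bytes: req.body }, Attributes: ['DEFAULT'] })
      .promise();
    // FaceDetails is an empty array when Rekognition finds no face,
    // which is the outcome we were hoping our adhesives would cause.
    res.json({
      facesFound: result.FaceDetails.length,
      confidences: result.FaceDetails.map((f) => f.Confidence),
    });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000, () => console.log('Rekognition test server on port 3000'));
```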

None of this worked very well. We had a few successes, but on the whole we decided we needed a different approach. Based on this initial research, our next step will be to try machine learning, similar to the approach taken by the recently published research cited above. Another route could be to crowdsource ideas by making the webpage public and letting people try different tactics to fool it. We are planning to continue the project using either or both of these methods - hopefully we’ll have an update soon!

More

If you have an NYU email you can see the final presentation here.

(redacted) code on GitHub here.







Week 7 Reading: ISPs, Regulation and Innovation

For this week we read a few articles (linked at the end of this post) on the recent changes in FCC regulations for ISPs. In response, we were asked to answer the question: should ISPs be more regulated, or would more regulation stifle innovation? This is my response:

This question is one of the biggest challenges around technology for policymakers - striking a balance between enough regulation to provide protections and so much regulation that it stifles innovation. The internet is still in its infancy, but I believe that recently we have seen enough evidence to know that we need better protections for citizens who use the internet.

The tension outlined in Federal Communications Commission (FCC) Chairman Ajit Pai’s op-ed is an interesting one that I hadn’t fully understood in this debate before reading it: he was against the 2015 move to regulate the internet “like a public utility,” which shifted ISPs out from under the Federal Trade Commission’s (FTC) jurisdiction. Internet companies, however, remained under the FTC’s purview when this happened. Pai calls this unfair treatment, and I agree, but not for the same free-market capitalist reasons that motivate him.

I believe that the internet is and should be treated like a public utility and that both the ISPs and “edge” businesses should be regulated (aka “treated fairly”) in how they use data. But in thinking about other utility companies, I couldn’t think of another that has so many businesses built on top of it. This is new territory and it’s going to take a while to figure out how to regulate this new and changing landscape.

There were a few other arguments in the Forbes article in support of Pai’s op-ed that I didn’t agree with. The argument that regulating ISPs would have barely benefited consumers while passing on extra costs was troubling to me. Perhaps, like congestion pricing, this is an externality we can’t see right now and need to pay for in order to ensure privacy and security. Another way of thinking about it, though, is that if there were more competition in the ISP space, costs would actually go down for the consumer - the government is the only actor that can make this happen and needs to be more proactive. In a capitalist society, I’m not optimistic that much progress will be made in this direction, since internet companies - although not ISPs, according to Pai - profit off of people’s data by serving them ads to buy things. Just because ISPs don’t technically use consumer data in this way doesn’t mean that we don’t need protections around this issue - on the flip side, this is exactly why regulations are crucial. Innovation should not come at the cost of human well-being.

What does the new ISP data-sharing rollback actually change?

What ISPs Can See

The Nullification Of FCC’s Broadband Privacy Rules: What It Really Means For Consumers

No, Republicans didn’t just strip away your Internet privacy rights

Week 6: Midterm

User-Generated Computer Vision Dataset

For this midterm, I was able to accomplish part 1 of my project proposal for the user-generated computer vision dataset: making an app where people can draw themselves for the computer and label themselves.

When the server is running, you can see the webpage here: http://68.183.140.103/ and be the watcher here: http://68.183.140.103/watcher.html

Design

To the user, the interaction isn’t much different from last week’s version because I was focused on getting the backend in place. When the user loads the page, they are prompted to trace their face and then choose an adjective to describe themselves:

Screenshot 2019-03-13 02.23.18.png
Screenshot 2019-03-13 02.25.05.png

Technical

The code can be found on GitHub here.

I am using Node.js with Express, WebSockets to communicate between the server and the pages, and NeDB to save the adjective and mouse movements in the database (a simplified sketch of these pieces is below).
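
Here is a simplified sketch of those pieces (not the exact project code): Express serves the pages, a WebSocket connection receives each finished drawing, and NeDB persists the label and the mouse path. The message format ({ label, path }) is illustrative.

```javascript
const express = require('express');
const http = require('http');
const WebSocket = require('ws');
const Datastore = require('nedb');

const db = new Datastore({ filename: 'drawings.db', autoload: true });

const app = express();
app.use(express.static('public')); // index.html, watcher.html, client sketches

const server = http.createServer(app);
const wss = new WebSocket.Server({ server });

wss.on('connection', (socket) => {
  socket.on('message', (message) => {
    // Expect a JSON payload like { label: "curious", path: [[x, y], ...] }
    const drawing = JSON.parse(message.toString());
    db.insert({ label: drawing.label, path: drawing.path, time: Date.now() });

    // Forward the drawing to any connected watcher pages
    wss.clients.forEach((client) => {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(message.toString());
      }
    });
  });
});

server.listen(80, () => console.log('listening on port 80'));
```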

Getting this functioning, really understanding every step of it, and then being able to control it the way I wanted was a big challenge for me since this is my first time doing this. I combined examples from Shawn, Dan Shiffman, and the internet, and got some very helpful guidance from Rushali and Shawn - thank you!

Next

Now that I have the data, my next step is to complete part 2: create an output website or app based on SketchRNN to make predictions based on these drawings. I also still need to consider some of the questions I posted in my previous midterm proposal (conceptual, technical, and practical).

Week 5: Midterm Proposal

For my midterm, I am planning on building on my project from last week to make it into a user-generated and user-defined computer vision dataset. It will be made of two parts:

Part 1 - Create a website to allow people to draw how they want the computer to see them and label themselves

Part 2 - Create an output website that would allow people to put in a word and have it draw the corresponding person, OR draw a person themselves and have it predict what kind of person they are drawing

Motivating Questions

  • Can we force technology to see us how we see ourselves? How we want to be seen?

  • Can this be used to train technology to see us differently?

  • Can we create new infrastructure for big data?

  • Can we create new modes of production for big data?

  • What type of augmentation does this give to our technology systems?

  • Can we make new biometric diagrams?

  • What would digital self portraits look like?

  • In what ways is obfuscation part of or not part of this process?

So far, I have used the references below in thinking about this work.

Through this project, I hope I’ll learn:

  • Server management →

    • How to create and store a database on a server

    • How to pull information from a server

  • Use machine learning tools such as ML5

    • ML5 example: train SketchRNN on drawings? (a rough sketch of generating with a pretrained SketchRNN model is below this list)

    • Code for Quick, Draw!, which is used in the ML5 example
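
Here is a rough p5.js sketch of the “word in, drawing out” direction, assuming ml5’s pretrained SketchRNN models (the model name “face” and the whole setup are placeholders until I can train on my own dataset). The model streams pen strokes that get drawn to the canvas one by one.

```javascript
let model;
let x, y;
let currentStroke = null;
let pen = 'down';

function setup() {
  createCanvas(400, 400);
  x = width / 2;
  y = height / 2;
  // Load a pretrained model, then ask it for the first stroke
  model = ml5.sketchRNN('face', () => model.generate(gotStroke));
}

function gotStroke(err, stroke) {
  currentStroke = stroke; // {dx, dy, pen: 'down' | 'up' | 'end'}
}

function draw() {
  if (!currentStroke) return;
  if (pen === 'down') {
    line(x, y, x + currentStroke.dx, y + currentStroke.dy);
  }
  x += currentStroke.dx;
  y += currentStroke.dy;
  pen = currentStroke.pen;
  currentStroke = null;
  if (pen !== 'end') model.generate(gotStroke); // keep asking for strokes
}
```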

Other considerations for this first version:

  1. Should there be a time constraint?

  2. Total length of pixel constraint?

  3. Constrain labels to only one or two?

  4. Which order makes most sense/works best? Draw and then label or label and then draw?

These are questions that our professor Shawn asked me during office hours and I’d like to think more about!

Week 4 Reading: Adapting to Big Brother?

This week in class we read an article titled “If you can’t hide from Big Brother, Adapt” by David Brin.

The article outlines four lessons for an era of ubiquitous surveillance or at least ubiquitous surveillance capabilities by government agencies (I’d add companies in there too). These start with recommendations to the surveillers - “limit the number of your henchmen” - and end with a rallying cry for the surveilled - “you can either fight this new era or embrace it”.

I agree that we need to have systems in place to hold governments and companies accountable, but I think in this conversation it would be useful to take a step back and think about what is ethical, moral, and necessary. When he talks about citizens being the watchdogs, my next question is: okay, but where do civilians draw the line? Who defines that, and how? To me, these are all ethical and moral questions that we need to reckon with as a society, which in a more practical/policy form would look like agreed-upon universal digital human rights. These would be rights that citizens hold, not rights they have to defend for themselves. His final utilitarian conclusion is to “find ways to maximize the good and minimize the bad.” The danger with this perspective - and why outlining rights is necessary - is that when he talks about civilians as a generic group, he overlooks the fact that there are tiers of social class, and therefore power, which means that there will still be those who don’t have the power, time, or money to “adapt with resilience” as he suggests.

I do think that he rightly points out that security and freedom are not a zero-sum game. In other words, you can have both at the same time. The question is whether surveillance is necessary, and how much, in order to have these two things. Ubiquitous surveillance seems to be the current conceptual model among governments and the one that Brin has also accepted or resigned himself to, but I’m not sure it’s right - or at least I’m not willing to accept it so quickly.



Week 3: Server Tracking Followup

I spoke to my professor Shawn about the strange activity log on my server and he suggested gathering more information. He didn’t think that I set it up wrong, but instead that the logs were from bots.

He suggested adding a line of code to log the details of the HTTP user-agent accessing my site (a sketch of what this could look like is below the quote). MDN Web Docs describes the user-agent as:

“The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.”
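
Here is a sketch of the kind of logging I added (the exact line from class may have looked different): inside the Express route handler, print the user-agent header of whatever is requesting the page.

```javascript
app.get('/', (req, res) => {
  console.log(req.headers['user-agent']); // e.g. a browser string, or a bot like Googlebot
  res.send('hello');
});
```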

I tested the server to make sure it was working and then let it run. Looking at the server log, I saw what I expected: visits from bots crawling the web.

Screenshot 2019-02-19 23.26.57.png
Screenshot 2019-02-19 23.27.18.png

Week 3 Activity: Google Analytics and Facebook Pixel

This week our assignment was to install Google Analytics and/or Facebook Pixel and take a look at the data they collect. I installed both on my personal website.

I was expecting not to see anything interesting, but I found out that my site had a lot of traffic on Sunday! Google Analytics said that I had 20 users from France! I have no idea why this would be. I am applying to internships, but I didn’t apply to any there. It also looks like most people accessed my page directly, so this is a mystery.

googlemap.png

But Facebook Pixel said that these views came from New York. Looking at my browser history, I confirmed that this was probably mostly me. I wonder why Google got this wrong?

Screenshot 2019-02-19 22.50.08.png

Facebook also suggested who I should advertise to (if I’m understanding the Pixel site correctly):

facebookad1.png
facebookad2.png
facebookad3.png

One interesting thing about having the Pixel Chrome extension installed was that I could see which other sites were using Pixel!

Yes, I was checking when March Madness starts…


Week 3 Reading: Obfuscation

This week we read “Surveillance Countermeasures: Expressive Privacy via Obfuscation” by Daniel C. Howe.

I loved this article and all of these ideas.

The sentence, “technology is described as a form of political action, building on work by Langdon Winner and Bruno Latour, who have argued that technical devices and systems may embody political and moral qualities,” reminded me of the first article we read, which at one point argued that nothing is purely technical - it is always “sociotechnical.” I agree completely with this statement and don’t think the two can be separated. I think that once technologists understand this, we will have better technologies that don’t just work as tools, but enable our best humanity.

It is powerful that Howe uses the term “datafication” to talk about what is happening to us online - it sounds like commodification and is much more evocative of exploitation and extraction (in the terms described in Why Nations Fail) than the “digital breadcrumbs” or “using your data” language companies usually use when describing their data collection and tracking activities.

Screenshot 2019-02-19 22.06.10.png
Screenshot 2019-02-19 22.06.02.png

I also found it beautiful the way the author talked about these acts as poetic renderings of new alternative social spaces on the internet. This reminds me of a book I’m reading right now called @Heaven that is about the WELL, an early internet “message board” (it technically still exists today). I love imagining this time when these spaces did exist and when it seemed possible for the internet to be infinitely customizable, instead of prescribed and conformist - or the “inculcation of obedience” - as Howe describes at the beginning of the article. These projects give me hope that we can get back there.

Screenshot 2019-02-19 21.48.49.png

I tried installing TrackMeNot and right away it started searching terms. I looked into the settings and realized that the default is 10 searches per minute! I like that you can see a little bit of what it is searching displayed over the extension icon. Looking at the search data, I had a couple of reactions that surprised me and ran counter to what I thought I felt about search data being collected and analyzed for targeted advertising. I wondered, will this start to mess with my Google search results? In this moment I realized that I like that they are so useful and curated to me. Also, I wondered, will this change the ads I see? I hate seeing ads, but I guess ads that are relevant to me are nicer to see than ads for things I don’t like.

But...I also like the feeling of doing this small protest.

Screenshot 2019-02-19 21.56.40.png

Week 2 Activity: 3rd Party Cookies

For this week, our assignment was to try tracking users through 3rd-party cookies. Instead of only tracking users who came directly to our server’s site, we could place an image on another web page that would connect back to our server, so if anyone went to that webpage, we would know. We learned this week that this is actually what Facebook and other sites do when they place a “like” button on another website.

Thinking about what to do for this assignment, I liked the idea of creating a “beacon” for people to find, like an internet scavenger hunt or underground network. Could I find people who visited the same places? Could I figure out what connects them? Why am I trying to find them?

Thinking about where I could put beacons, I was limited to websites I can embed source code in and that I have access to. I thought about asking friends to embed something in their websites, but felt uncomfortable with this. I am very interested in the phenomenon of lurking on the internet - for example, I have noticed that a lot of people watch my Instagram story but never like my photos and don’t produce much “content” themselves. What if I captured this lurking, or lurked back? I have one friend who has over 20k Instagram followers, so I thought about asking her to post a link on her story (only people with more than 10k followers can do this) and then decided against it for the same reason as before - I felt uncomfortable asking someone else to help me with the tracking.

How it works

First, using Fetch, I logged onto the server I made and uploaded the “more user tracking” folder from GitHub, with the updated server.js script and the image to be used for tracking through 3rd-party cookies. When the image is placed somewhere else on the web, the server will know each time it is loaded (a simplified sketch of how such a route could work is below).
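
Here is a simplified sketch of how the tracking route could work (the class server.js is more complete): serve a tiny image and use a cookie to recognize returning visitors, no matter which page embeds the image. The file and cookie names are illustrative.

```javascript
const express = require('express');
const cookieParser = require('cookie-parser');

const app = express();
app.use(cookieParser());

let nextId = 0;
const visits = {};

app.get('/tracker.png', (req, res) => {
  let id = req.cookies.visitorId;
  if (id === undefined) {
    id = String(nextId++);
    // The cookie is set by our server even though the image is embedded on
    // someone else's page - that is what makes it a third-party cookie.
    res.cookie('visitorId', id, { maxAge: 1000 * 60 * 60 * 24 * 365 });
  }
  visits[id] = (visits[id] || 0) + 1;
  console.log(`visitor ${id} has loaded the image ${visits[id]} time(s)`);
  res.sendFile(__dirname + '/tracker.png');
});

app.listen(80);
```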

To start, I added the default image to different pages on my blog to check that the tracking was working. When I visited the different pages, I saw the tracking numbers go up in the terminal output:

first test.png


Something I noticed was that when I looked at my website with my phone, the server didn’t register that I was returning. However, once I went to the server main page (IP address) and returned to my website with my phone, it remembered my phone. I’m not sure why this is and will have to look into it.

I also tried shutting down terminal and thus shutting down the server and then starting it up again and I found that it remembered me and counted my visits correctly.

I found one of my favorite emojis (and one that seemed to fit the spirit of this assignment), replaced the default image with it in the file package, and then put it at the bottom of my personal website using the source code.

trackernotice.png
forever.png

I then used the “forever” command in the terminal so that the server would keep running even if I closed the terminal or shut my computer down.



serverlog.png

My Test

I kept the server running for a day to see if anyone visited my website. I’m particularly interested in this right now as I’m applying to internships for the summer and was wondering if anyone is actually looking at my portfolio. I decided I wouldn’t visit my blog at all that day (February 12), because that’s normally what skews the analytics on Squarespace.

When I looked at the server log at the end of the day, I had visits from 28 different users and one user who visited twice. I was surprised by this because normally my personal website doesn’t get very much traffic. When I checked this against Squarespace, its analytics told me I only had one visitor. I am wondering if I did something wrong setting up my server. I let it run using the forever command, and when I tested it by visiting my website on my laptop and phone it seemed to work - I’m not sure why I’m getting these weird results! Anyone have any thoughts?






Week 2 Reading: Response to Zuboff and Weinberg Articles

This week we read “What is the revenue generation model for DuckDuckGo” and “The Secrets of Surveillance Capitalism.”

Reading these articles back to back, I was struck by the contrast in how the authors (Shoshana Zuboff, professor at HBS and Gabriel Weinberg, DuckDuckGo CEO) view the same problem. Zuboff describes in detail the ways in which data collection on a massive scale (aka big data) and the computing power to store and analyze these data have completely revolutionized the way our economy works – from industrial and manufacturing goods to the mining and cultivation of people’s behavior (what she calls “behavioral surplus”) – in other words, these capabilities have disconnected supply and demand from the needs of people who make up the economy and our society. She calls this “dispossession by surveillance” and makes a very compelling case for why and how this has happened, why it’s bad for citizens and dangerous for our democracy and economy (“harvesting people from the virtual and real world” is the sentence that had the highest creep factor for me…or maybe it was “privately administered compliance regime of rewards and punishments that is free from detection or sanction”). But then Weinberg makes this whole dilemma (but this is how digital companies make money so if they aren’t allowed to collect data in this way how will we have a 21st century economy???) fall apart when he states that all of this data tracking simply isn’t necessary. Wait what?

He describes the DuckDuckGo business model in such a simple way, it almost sounds like we could walk away from all of this, if we wanted to. However, I think that Weinberg’s article actually then supports Zuboff’s case for regulation – most companies will not make the choice to not track people on their own. And I think that Zuboff’s steps for regulating this new “means of production” are the right ones for policymakers to think about.

I am almost embarrassed to admit that I was most struck by when Weinberg says, “using the internet doesn’t have to feel like you’re being watched, listened to, or monitored.” I didn’t realize it until he said it, but that IS what the internet feels like and it wasn’t always that way. Right after reading these articles, I heard a journalist interviewed on Planet Money who gave up the big 5 tech companies. In her own words, her life, “was hell.” I haven’t read it yet, but from her interview, it sounded like because she was no longer being tracked, the internet essentially no longer worked for her.

Finally and importantly, I think Zuboff correctly points to why companies have been able to get away with this kind of tracking and why everyone is surprised that DuckDuckGo has a business model: language. Both articles point out that these companies say that tracking is necessary for their business model and for their users to have a good experience. Zuboff takes this a step further by pointing out that when talking about the data they collect, they use dismissive terms such as “digital exhaust” and “digital breadcrumbs.” I think this is important to recognize because this is something we can change. What if we started using words like “digital personal property” or “digitally embodied self”?

Week 1 Activity: Simple Server

This week our assignment was to set up a simple server.

First, I made an account on Digital Ocean and created a droplet, which gave me an IP address for my server, and I set a password.

Screenshot 2019-02-17 09.50.23.png

Then I downloaded a file from GitHub that our instructor had made, with a basic server framework and functionality to track users who visit the site (a guess at the core idea is sketched below).
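
This is not the instructor’s actual server.js, just my guess at its core idea: a small Express server that serves a page and logs every request it sees.

```javascript
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  console.log(`visit from ${req.ip} at ${new Date().toISOString()}`);
  res.send('Hello from my Digital Ocean droplet!');
});

app.listen(80, () => console.log('server is running'));
```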

I uploaded this to my server using Fetch.

Screenshot 2019-02-10 21.20.50.png

Then, in the terminal, I logged in to my server, installed those files, and ran the server.

Screenshot 2019-02-04 12.06.20.png

Once I got the message that the server was running I opened it in my browser:

Screenshot 2019-02-04 12.06.15.png


Now onto tracking!






Week 1 Reading: Surveillance and Capture: Two Models of Privacy

This week we read a paper titled Surveillance and Capture: Two Models of Privacy by Philip Agre.

When I first started reading this article, I didn’t look at its publication date, but very soon started wondering. The article discussed two frameworks – the surveillance and capture models – and at first I had trouble understanding capture the way he was describing it. When I saw the date, I realized this is because he was trying to describe something that, although ubiquitous today, didn’t fully exist then, and there wasn’t a standard language to describe it. There were many interesting and thought-provoking (although extremely abstruse and dense) parts of this article, but I’ll share my reactions to a couple in particular:

“But these systems all participate in a trade-off that goes to the core of computing: a computer – at least as computers are currently understood – can compute only with what it captures; so the less a system captures, the less functionality it can provide to its users.” (p.113)

I used to do public policy research, which involved designing and running large-scale studies to evaluate programs and policies related to education and youth violence prevention. We relied on large administrative datasets from the police department and the public school system. My colleagues and I used to talk about how our jobs didn’t exist when we were in college and they were only made possible by the rise of “big data” and isn’t that amazing?! But we also talked about challenges in measuring life outcomes in this way because there is so much that the data don’t include about a place or a person. Sometimes I wondered if we were measuring reality or just the reality that someone else had made available to us. Agre acknowledges this by pointing out that the capture system is a “correspondence” between reality and how we choose to represent it, something I feel isn’t talked about in the data science community enough.

“The result is a generalized acceleration of economic activity whose social benefits in terms of productive efficiency are clear enough, but whose social costs ought to be a matter of concern.” (p.122)

I was disappointed that he ended by talking about political economy implications and didn’t go into the deeper social concerns he alludes to here (although such a shift in capital generation was probably exciting and groundbreaking at the time!). It is frustrating to read this now and see that leading experts in this field understood the dangerous social implications of data systems at such an early stage. Earlier in the article, he describes electronic capture actions taking on a “performative quality” when there is an audience, which sounded like a very boring and long-winded way of describing how people use Instagram today. As I read this, I started to worry that we are close to the dystopia that he describes, but with a slight twist because in this current version we need to be surveilled because we are surveilling and capturing ourselves.