Final Project: Experiments in Silicone Obfuscation

[1] Obfuscation is the deliberate addition of ambiguous, confusing, or misleading information to interfere with surveillance and data collection (p.1)

[2] Obfuscation, at its most abstract, is the production of noise modeled on an existing signal in order to make a collection of data more ambiguous, confusing, harder to exploit, more difficult to act on, and therefore less valuable (p.46)

About

I began this project with my classmates Julia Rich and Idit Barak, inspired by the obfuscation and related work we talked about in class (see the definitions above): CV Dazzle, Zach Blas, Ani Liu, NIR LED glasses, Kathleen McDermott.

We asked ourselves:

  1. Can we build on this work in the form of a personal kit?

  2. Can we use the computer against itself (in form and function)?

Our goal was to create a kit of silicone face adhesives that would make the user anonymous to facial recognition software and that could be carried around in a small purse or bag. The design was important to us and we wanted the form to look angular, similar to the patterns of triangles that facial recognition software uses to break down faces into sections (using the computer against itself).

First, we researched different facial recognition systems to better understand how they work. You can read my summaries of these systems here. Most now use machine learning, which makes “fooling” them more complicated. We considered using machine learning to figure out which colors, patterns, and shapes would be most effective, but decided to start with an analogue approach.

interface.png

Based on an example our professor had shown us in class, I created a simple web interface using Amazon Rekognition to test our different methods (a rough sketch of the Rekognition call follows the list). We tried the following:

  1. Making angular paper shapes to change the “shape” of our faces, starting with plain white paper.

  2. Coloring the paper to be close to our skin tone, to see if the computer would be more likely to read it as part of our face rather than something in front of it.

  3. Applying silicone in the same color as our skin tone to try to change our face structure more realistically.

  4. Trying images and patterns from published research that uses machine learning to fool object recognition systems.
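
This is a rough sketch (not our exact interface) of the Rekognition call behind the test page: the browser sends a webcam snapshot as a JPEG and the server asks Rekognition whether it can find a face. The route name and payload format here are illustrative, and AWS credentials are assumed to be configured in the environment.

```javascript
const express = require('express');
const bodyParser = require('body-parser');
const AWS = require('aws-sdk');

const app = express();
const rekognition = new AWS.Rekognition({ region: 'us-east-1' });

app.post('/detect', bodyParser.raw({ type: 'image/jpeg', limit: '5mb' }), async (req, res) => {
  try {
    const result = await rekognition
      .detectFaces({ Image: { Bytes: req.body }, Attributes: ['DEFAULT'] })
      .promise();
    // FaceDetails is an empty array when Rekognition finds no face,
    // which is the outcome we were hoping our adhesives would cause.
    res.json({
      facesFound: result.FaceDetails.length,
      confidences: result.FaceDetails.map((f) => f.Confidence),
    });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000, () => console.log('Rekognition test server on port 3000'));
```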

None of this worked very well. We had a few successes, but on the whole we decided we needed a different approach. Based on this initial research, our next step will be to try machine learning, similar to the approach taken by the recently published research cited above. Another route could be to crowdsource ideas by making the webpage public and letting people try different tactics to fool it. We are planning to continue the project using either or both of these methods - hopefully we’ll have an update soon!

More

If you have an NYU email you can see the final presentation here.

(redacted) code on GitHub here.







Week 7 Reading: ISPs, Regulation and Innovation

For this week we read a few articles (linked at the end of this post) on the recent changes in FCC regulations for ISPs. In response, we were asked to answer the question: should ISPs be more regulated, or would more regulation stifle innovation? This is my response:

This question is one of the biggest challenges around technology for policymakers - striking a balance between enough regulation to provide protections and so much regulation that it stifles innovation. The internet is still in its infancy, but I believe that recently we have seen enough evidence to know that we need better protections for citizens who use the internet.

The tension outlined in Federal Communications Commission (FCC) Chairman Ajit Pai’s op-ed is an interesting one that I hadn’t fully understood in this debate before reading it: he was against the 2015 move to regulate the internet “like a public utility,” which shifted ISPs out from under the Federal Trade Commission’s (FTC) jurisdiction. Internet companies, however, remained under the FTC’s purview when this happened. Pai calls this unfair treatment, and I agree, but not for the same free-market capitalist reasons that motivate him.

I believe that the internet is and should be treated like a public utility and that both the ISPs and “edge” businesses should be regulated (aka “treated fairly”) in how they use data. But in thinking about other utility companies, I couldn’t think of another that has so many businesses built on top of it. This is new territory and it’s going to take a while to figure out how to regulate this new and changing landscape.

There were a few other arguments in the Forbes article in support of Pai’s op-ed that I didn’t agree with. The argument that regulating ISPs would have barely benefited consumers while passing on extra costs was troubling to me. Perhaps, like congestion pricing, this is an externality we can’t see right now and need to pay for in order to ensure privacy and security. Another way of thinking about it, though, is that if there were more competition in the ISP space, costs would actually go down for the consumer - the government is the only actor that can make this happen and needs to be more proactive. In a capitalist society, I’m not optimistic that much progress will be made in this direction, since internet companies - although not ISPs, according to Pai - profit off of people’s data by serving them ads to buy things. Just because ISPs don’t technically use consumer data in this way doesn’t mean that we don’t need protections around this issue - on the flip side, this is exactly why regulations are crucial. Innovation should not come at the cost of human well-being.

What does the new ISP data-sharing rollback actually change?

What ISPs Can See

The Nullification Of FCC’s Broadband Privacy Rules: What It Really Means For Consumers

No, Republicans didn’t just strip away your Internet privacy rights

Week 6: Midterm

User-Generated Computer Vision Dataset

For this midterm, I was able to accomplish part 1 of my project proposal for the user-generated computer vision dataset: making an app where people can draw themselves for the computer and label themselves.

When the server is running, you can see the webpage here: http://68.183.140.103/ and be the watcher here: http://68.183.140.103/watcher.html

Design

To the user, the interaction isn’t much different from last week’s version because I was focused on getting the backend in place. When the user loads the page, they are prompted to trace their face and then choose an adjective to describe themselves:

Screenshot 2019-03-13 02.23.18.png
Screenshot 2019-03-13 02.25.05.png

Technical

The code can be found on GitHub here.

I am using Node.js with Express, WebSockets to communicate between the server and the pages, and NeDB to save the adjective and mouse movements in the database (a simplified sketch of these pieces is below).
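
Here is a simplified sketch of those pieces (not the exact project code): Express serves the pages, a WebSocket connection receives each finished drawing, and NeDB persists the label and the mouse path. The message format ({ label, path }) is illustrative.

```javascript
const express = require('express');
const http = require('http');
const WebSocket = require('ws');
const Datastore = require('nedb');

const db = new Datastore({ filename: 'drawings.db', autoload: true });

const app = express();
app.use(express.static('public')); // index.html, watcher.html, client sketches

const server = http.createServer(app);
const wss = new WebSocket.Server({ server });

wss.on('connection', (socket) => {
  socket.on('message', (message) => {
    // Expect a JSON payload like { label: "curious", path: [[x, y], ...] }
    const drawing = JSON.parse(message.toString());
    db.insert({ label: drawing.label, path: drawing.path, time: Date.now() });

    // Forward the drawing to any connected watcher pages
    wss.clients.forEach((client) => {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(message.toString());
      }
    });
  });
});

server.listen(80, () => console.log('listening on port 80'));
```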

Getting this functioning, really understanding every step of it, and then being able to control it the way I wanted was a big challenge for me since this is my first time doing this. I combined examples from Shawn, Dan Shiffman, and the internet, and got some very helpful guidance from Rushali and Shawn - thank you!

Next

Now that I have the data, my next step is to complete part 2: create an output website or app based on SketchRNN to make predictions based on these drawings. I also still need to consider some of the questions I posted in my previous midterm proposal (conceptual, technical, and practical).

Week 5: Midterm Proposal

For my midterm, I am planning on building on my project from last week to make it into a user-generated and user-defined computer vision dataset. It will be made of two parts:

Part 1 - Create a website to allow people to draw how they want the computer to see them and label themselves

Part 2 - Create an output website that would allow people to put in a word and have it draw the corresponding person, OR draw a person themselves and have it predict what kind of person they are drawing

Motivating Questions

  • Can we force technology to see us how we see ourselves? How we want to be seen?

  • Can this be used to train technology to see us differently?

  • Can we create new infrastructure for big data?

  • Can we create new modes of production for big data?

  • What type of augmentation does this give to our technology systems?

  • Can we make new biometric diagrams?

  • What would digital self portraits look like?

  • In what ways is obfuscation part of or not part of this process?

So far, I have used the references below in thinking about this work.

Through this project, I hope I’ll learn:

  • Server management →

    • How to create and store a database on a server

    • How to pull information from a server

  • Use machine learning tools such as ML5

    • ML5 example: train SketchRNN on drawings? (a rough sketch of generating with a pretrained SketchRNN model is below this list)

    • Code for Quick, Draw!, which is used in the ML5 example
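
Here is a rough p5.js sketch of the “word in, drawing out” direction, assuming ml5’s pretrained SketchRNN models (the model name “face” and the whole setup are placeholders until I can train on my own dataset). The model streams pen strokes that get drawn to the canvas one by one.

```javascript
let model;
let x, y;
let currentStroke = null;
let pen = 'down';

function setup() {
  createCanvas(400, 400);
  x = width / 2;
  y = height / 2;
  // Load a pretrained model, then ask it for the first stroke
  model = ml5.sketchRNN('face', () => model.generate(gotStroke));
}

function gotStroke(err, stroke) {
  currentStroke = stroke; // {dx, dy, pen: 'down' | 'up' | 'end'}
}

function draw() {
  if (!currentStroke) return;
  if (pen === 'down') {
    line(x, y, x + currentStroke.dx, y + currentStroke.dy);
  }
  x += currentStroke.dx;
  y += currentStroke.dy;
  pen = currentStroke.pen;
  currentStroke = null;
  if (pen !== 'end') model.generate(gotStroke); // keep asking for strokes
}
```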

Other considerations for this first version:

  1. Should there be a time constraint?

  2. Total length of pixel constraint?

  3. Constrain labels to only one or two?

  4. Which order makes most sense/works best? Draw and then label or label and then draw?

These are questions that our professor Shawn asked me during office hours and I’d like to think more about!

Week 4 Reading: Adapting to Big Brother?

This week in class we read an article titled “If you can’t hide from Big Brother, Adapt” by David Brin.

The article outlines four lessons for an era of ubiquitous surveillance or at least ubiquitous surveillance capabilities by government agencies (I’d add companies in there too). These start with recommendations to the surveillers - “limit the number of your henchmen” - and end with a rallying cry for the surveilled - “you can either fight this new era or embrace it”.

I agree that we need to have systems in place to hold governments and companies accountable, but I think in this conversation it would be useful to take a step back and think about what is ethical, moral, and necessary. When he talks about citizens being the watchdogs, my next question is: okay, but where do civilians draw the line? Who defines that, and how? To me, these are all ethical and moral questions that we need to reckon with as a society, which in a more practical/policy form would look like agreed-upon universal digital human rights. These would be rights that citizens hold, not rights they have to defend for themselves. His final utilitarian conclusion is to “find ways to maximize the good and minimize the bad.” The danger with this perspective - and why outlining rights is necessary - is that when he talks about civilians as a generic group, he overlooks the fact that there are tiers of social class, and therefore power, which means that there will still be those who don’t have the power, time, or money to “adapt with resilience” as he suggests.

I do think that he rightly points out that security and freedom are not a zero-sum game. In other words, you can have both at the same time. The question is whether surveillance is necessary, and how much, in order to have these two things. Ubiquitous surveillance seems to be the current conceptual model among governments and the one that Brin has also accepted or resigned himself to, but I’m not sure it’s right - or at least I’m not willing to accept it so quickly.



Week 3: Server Tracking Followup

I spoke to my professor Shawn about the strange activity log on my server and he suggested gathering more information. He didn’t think that I set it up wrong, but instead that the logs were from bots.

He suggested adding a line of code to log the details of the HTTP user-agent accessing my site (a sketch of what this could look like is below the quote). MDN Web Docs describes the user-agent as:

“The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.”
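
Here is a sketch of the kind of logging I added (the exact line from class may have looked different): inside the Express route handler, print the user-agent header of whatever is requesting the page.

```javascript
app.get('/', (req, res) => {
  console.log(req.headers['user-agent']); // e.g. a browser string, or a bot like Googlebot
  res.send('hello');
});
```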

I tested the server to make sure it was working and then let it run. Looking at the server log, I saw what I expected: visits from bots crawling the web.

Screenshot 2019-02-19 23.26.57.png
Screenshot 2019-02-19 23.27.18.png

Week 3 Activity: Google Analytics and Facebook Pixel

This week our assignment was to install Google Analytics and/or Facebook Pixel and take a look at the data they collect. I installed both on my personal website.

I was expecting not to see anything interesting, but I found out that my site had a lot of traffic on Sunday! Google Analytics said that I had 20 users from France! I have no idea why this would be. I am applying to internships, but I didn’t apply to any there. It also looks like most people accessed my page directly, so this is a mystery.

googlemap.png

But Facebook Pixel said that these views came from New York. Looking at my browser history, I confirmed that this was probably mostly me. I wonder why Google got this wrong?

Screenshot 2019-02-19 22.50.08.png

Facebook also suggested who I should advertise to (if I’m understanding the Pixel site correctly):

facebookad1.png
facebookad2.png
facebookad3.png

One interesting thing about having the Pixel Chrome extension installed was that I could see which other sites were using Pixel!

Yes, I was checking when March Madness starts…


Week 3 Reading: Obfuscation

This week we read “Surveillance Countermeasures: Expressive Privacy via Obfuscation” by Daniel C. Howe.

I loved this article and all of these ideas.

The sentence, “technology is described as a form of political action, building on work by Langdon Winner and Bruno Latour, who have argued that technical devices and systems may embody political and moral qualities,” reminded me of the first article we read, which at one point argued that nothing is purely technical - it is always “sociotechnical.” I agree completely with this statement and don’t think the two can be separated. I think that once technologists understand this, we will have better technologies that don’t just work as tools, but enable our best humanity.

It is powerful that Howe uses the term “datafication” to talk about what is happening to us online - it sounds like commodification and is much more evocative of exploitation and extraction (in the terms described in Why Nations Fail) than the “digital breadcrumbs” or “using your data” language companies usually use when describing their data collection and tracking activities.

Screenshot 2019-02-19 22.06.10.png
Screenshot 2019-02-19 22.06.02.png

I also found it beautiful the way the author talked about these acts as poetic renderings of new alternative social spaces on the internet. This reminds me of a book I’m reading right now called @Heaven that is about the WELL, an early internet “message board” (it technically still exists today). I love imagining this time when these spaces did exist and when it seemed possible for the internet to be infinitely customizable, instead of prescribed and conformist - or the “inculcation of obedience” - as Howe describes at the beginning of the article. These projects give me hope that we can get back there.

Screenshot 2019-02-19 21.48.49.png

I tried installing TrackMeNot and right away it started searching terms. I looked into the settings and realized that the default is 10 searches per minute! I like that you can see a little bit of what it is searching displayed over the extension icon. Looking at the search data, I had a couple of reactions that surprised me and ran counter to what I thought I felt about search data being collected and analyzed for targeted advertising. I wondered, will this start to mess with my Google search results? In this moment I realized that I like that they are so useful and curated to me. Also, I wondered, will this change the ads I see? I hate seeing ads, but I guess ads that are relevant to me are nicer to see than ads for things I don’t like.

But...I also like the feeling of doing this small protest.

Screenshot 2019-02-19 21.56.40.png

Week 2 Activity: 3rd Party Cookies

For this week, our assignment was to try tracking users through 3rd-party cookies. Instead of only tracking users who came directly to our server’s site, we could place an image on another web page that would connect back to our server, so if anyone went to that webpage, we would know. We learned this week that this is actually what Facebook and other sites do when they place a “like” button on another website.

Thinking about what to do for this assignment, I liked the idea of creating a “beacon” for people to find, like an internet scavenger hunt or underground network. Could I find people who visited the same places? Could I figure out what connects them? Why am I trying to find them?

Thinking about where I could put beacons, I was limited to websites I can embed source code in and that I have access to. I thought about asking friends to embed something in their websites, but felt uncomfortable with this. I am very interested in the phenomenon of lurking on the internet - for example, I have noticed that a lot of people watch my Instagram story but never like my photos and don’t produce much “content” themselves. What if I captured this lurking, or lurked back? I have one friend who has over 20k Instagram followers, so I thought about asking her to post a link on her story (only people with more than 10k followers can do this) and then decided against it for the same reason as before - I felt uncomfortable asking someone else to help me with the tracking.

How it works

First, using Fetch, I logged onto the server I made and uploaded the “more user tracking” folder from GitHub, with the updated server.js script and the image to be used for tracking through 3rd-party cookies. When the image is placed somewhere else on the web, the server will know each time it is loaded (a simplified sketch of how such a route could work is below).
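
Here is a simplified sketch of how the tracking route could work (the class server.js is more complete): serve a tiny image and use a cookie to recognize returning visitors, no matter which page embeds the image. The file and cookie names are illustrative.

```javascript
const express = require('express');
const cookieParser = require('cookie-parser');

const app = express();
app.use(cookieParser());

let nextId = 0;
const visits = {};

app.get('/tracker.png', (req, res) => {
  let id = req.cookies.visitorId;
  if (id === undefined) {
    id = String(nextId++);
    // The cookie is set by our server even though the image is embedded on
    // someone else's page - that is what makes it a third-party cookie.
    res.cookie('visitorId', id, { maxAge: 1000 * 60 * 60 * 24 * 365 });
  }
  visits[id] = (visits[id] || 0) + 1;
  console.log(`visitor ${id} has loaded the image ${visits[id]} time(s)`);
  res.sendFile(__dirname + '/tracker.png');
});

app.listen(80);
```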

To start, I added the default image to different pages on my blog to check that the tracking was working. When I visited the different pages, I saw the tracking numbers go up in the terminal output:

first test.png


Something I noticed was that when I looked at my website with my phone, the server didn’t register that I was returning. However, once I went to the server main page (IP address) and returned to my website with my phone, it remembered my phone. I’m not sure why this is and will have to look into it.

I also tried shutting down terminal and thus shutting down the server and then starting it up again and I found that it remembered me and counted my visits correctly.

I found one of my favorite emojis (and one that seemed to fit the spirit of this assignment), replaced the default image with it in the file package, and then put it at the bottom of my personal website using the source code.

trackernotice.png
forever.png

I then used the “forever” command in the terminal so that the server would keep running even if I closed the terminal or shut my computer down.



serverlog.png

My Test

I kept the server running for a day to see if anyone visited my website. I’m particularly interested in this right now as I’m applying to internships for the summer and was wondering if anyone is actually looking at my portfolio. I decided I wouldn’t visit my blog at all that day (February 12), because that’s normally what skews the analytics on Squarespace.

When I looked at the server log at the end of the day, I had visits from 28 different users and one user who visited twice. I was surprised by this because normally my personal website doesn’t get very much traffic. When I checked this against Squarespace, its analytics told me I only had one visitor. I am wondering if I did something wrong setting up my server. I let it run using the forever command, and when I tested it by visiting my website on my laptop and phone it seemed to work - I’m not sure why I’m getting these weird results! Anyone have any thoughts?






Week 2 Reading: Response to Zuboff and Weinberg Articles

This week we read “What is the revenue generation model for DuckDuckGo” and “The Secrets of Surveillance Capitalism.”

Reading these articles back to back, I was struck by the contrast in how the authors (Shoshana Zuboff, professor at HBS and Gabriel Weinberg, DuckDuckGo CEO) view the same problem. Zuboff describes in detail the ways in which data collection on a massive scale (aka big data) and the computing power to store and analyze these data have completely revolutionized the way our economy works – from industrial and manufacturing goods to the mining and cultivation of people’s behavior (what she calls “behavioral surplus”) – in other words, these capabilities have disconnected supply and demand from the needs of people who make up the economy and our society. She calls this “dispossession by surveillance” and makes a very compelling case for why and how this has happened, why it’s bad for citizens and dangerous for our democracy and economy (“harvesting people from the virtual and real world” is the sentence that had the highest creep factor for me…or maybe it was “privately administered compliance regime of rewards and punishments that is free from detection or sanction”). But then Weinberg makes this whole dilemma (but this is how digital companies make money so if they aren’t allowed to collect data in this way how will we have a 21st century economy???) fall apart when he states that all of this data tracking simply isn’t necessary. Wait what?

He describes the DuckDuckGo business model in such a simple way, it almost sounds like we could walk away from all of this, if we wanted to. However, I think that Weinberg’s article actually then supports Zuboff’s case for regulation – most companies will not make the choice to not track people on their own. And I think that Zuboff’s steps for regulating this new “means of production” are the right ones for policymakers to think about.

I am almost embarrassed to admit that I was most struck by when Weinberg says, “using the internet doesn’t have to feel like you’re being watched, listened to, or monitored.” I didn’t realize it until he said it, but that IS what the internet feels like and it wasn’t always that way. Right after reading these articles, I heard a journalist interviewed on Planet Money who gave up the big 5 tech companies. In her own words, her life, “was hell.” I haven’t read it yet, but from her interview, it sounded like because she was no longer being tracked, the internet essentially no longer worked for her.

Finally and importantly, I think Zuboff correctly points to why companies have been able to get away with this kind of tracking and why everyone is surprised that DuckDuckGo has a business model: language. Both articles point out that these companies say that tracking is necessary for their business model and for their users to have a good experience. Zuboff takes this a step further by pointing out that when talking about the data they collect, they use dismissive terms such as “digital exhaust” and “digital breadcrumbs.” I think this is important to recognize because this is something we can change. What if we started using words like “digital personal property” or “digitally embodied self”?

Week 1 Activity: Simple Server

This week our assignment was to set up a simple server.

First, I made an account on Digital Ocean and created a droplet, which gave me an IP address for my server, and I set a password.

Screenshot 2019-02-17 09.50.23.png

Then I downloaded a file from GitHub that our instructor had made, with a basic server framework and functionality to track users who visit the site (a guess at the core idea is sketched below).
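
This is not the instructor’s actual server.js, just my guess at its core idea: a small Express server that serves a page and logs every request it sees.

```javascript
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  console.log(`visit from ${req.ip} at ${new Date().toISOString()}`);
  res.send('Hello from my Digital Ocean droplet!');
});

app.listen(80, () => console.log('server is running'));
```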

I uploaded this to my server using Fetch.

Screenshot 2019-02-10 21.20.50.png

Then, in the terminal, I logged in to my server, installed those files, and ran the server.

Screenshot 2019-02-04 12.06.20.png

Once I got the message that the server was running I opened it in my browser:

Screenshot 2019-02-04 12.06.15.png


Now onto tracking!






Week 1 Reading: Surveillance and Capture: Two Models of Privacy

This week we read a paper titled Surveillance and Capture: Two Models of Privacy by Philip Agre.

When I first started reading this article, I didn’t look at its publication date, but very soon started wondering. The article discussed two frameworks – the surveillance and capture models – and at first I had trouble understanding capture the way he was describing it. When I saw the date, I realized this is because he was trying to describe something that, although ubiquitous today, didn’t fully exist then, and there wasn’t a standard language to describe it. There were many interesting and thought-provoking (although extremely abstruse and dense) parts of this article, but I’ll share my reactions to a couple in particular:

“But these systems all participate in a trade-off that goes to the core of computing: a computer – at least as computers are currently understood – can compute only with what it captures; so the less a system captures, the less functionality it can provide to its users.” (p.113)

I used to do public policy research, which involved designing and running large-scale studies to evaluate programs and policies related to education and youth violence prevention. We relied on large administrative datasets from the police department and the public school system. My colleagues and I used to talk about how our jobs didn’t exist when we were in college and they were only made possible by the rise of “big data” and isn’t that amazing?! But we also talked about challenges in measuring life outcomes in this way because there is so much that the data don’t include about a place or a person. Sometimes I wondered if we were measuring reality or just the reality that someone else had made available to us. Agre acknowledges this by pointing out that the capture system is a “correspondence” between reality and how we choose to represent it, something I feel isn’t talked about in the data science community enough.

“The result is a generalized acceleration of economic activity whose social benefits in terms of productive efficiency are clear enough, but whose social costs ought to be a matter of concern.” (p.122)

I was disappointed that he ended by talking about political economy implications and didn’t go into the deeper social concerns he alludes to here (although such a shift in capital generation was probably exciting and groundbreaking at the time!). It is frustrating to read this now and see that leading experts in this field understood the dangerous social implications of data systems at such an early stage. Earlier in the article, he describes electronic capture actions taking on a “performative quality” when there is an audience, which sounded like a very boring and long-winded way of describing how people use Instagram today. As I read this, I started to worry that we are close to the dystopia that he describes, but with a slight twist because in this current version we need to be surveilled because we are surveilling and capturing ourselves.