Author Topic: Breaking Prediction Methods (Read 3924 times)

nslay · « **on:** June 15, 2011, 01:08:32 am »

Came across this:
http://www.guardian.co.uk/technology/2011/jun/14/google-instant-pages-web-search

Nifty idea, but I'm increasingly concerned about the prediction power of advertisement firms, search engines, social media, online stores, etc...

Based on trivial actions such as clicks or search queries, they can build very accurate profiles about you and even fill in the blanks very accurately. What do they do with this information? Beats me ... they don't tell anyone. Which information do they collect? They don't tell you that either ... it's all disguised as seemingly harmless and nifty services for a mostly gullible and unsuspecting population.

Let's take Facebook "Like" for example ... it's brilliant! People want their friends to know what they like. Why is this a feature in Facebook? Duh, it tells Facebook what individuals like and what the population likes on average. Nobody thinks of that, and it's so harmless that nobody realizes it's just a wee bit of privacy's blood (until it bleeds to death).

The ability of these firms to collect and use information as they see fit, and build very accurate public and personal profiles without your consent, is very troubling. These prediction tools, with a wealth of training data, are so potentially powerful that these companies probably know more about you than you do yourself, or than anyone you know does.

So, how can we make it harder for them? Well, aside from knowing the details of their methods ... one could probably apply some comical methods to everyday life:

Deliberately do something you wouldn't normally do. For example, search for something you've never searched for. Click on something you wouldn't normally click on.
Query nonsense. One can introduce incompatibilities into search queries:
Quote
- Assemble magic 8 balls using only rooster sounds
- Using gmail to grill giraffe meat
- Plant cacti in edible jello
- Ancient medicine cows and their effect on a future octopus race
- The smell of the color blue
- Spicy sauce for triangles
Ambiguous search queries. One can phrase search queries with no clear interpretation.
Quote
- Attack bears with lasers
Are these attack bears equipped with lasers or are we wanting to attack bears with lasers?
"Like" (or the equivalent on other sites) something you don't really like, i.e. something you find mediocre. This introduces bias into predictions about what you and the population like.
Append unrelated keywords into emails and messages
Quote
P.S. I talked to Cisco about U2 and Nike Shoes. I like KFC with Intel processors.

Not sure how much you would have to deviate from your normal behavior for it to be effective ... but if everyone did that, it'd probably be a lot harder to make predictions.

I think nonsense can be filtered out, if NLP can even solve comprehension problems yet ...
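A toy sketch of the noise idea above, assuming (purely for illustration) that a profile is nothing more than each topic's share of your observed activity. The topic names, the `ratio` parameter, and both helper functions are made up for this example; real profiling systems are not public and are surely more elaborate.

```python
import random
from collections import Counter

def topic_profile(events):
    """Return each topic's share of the observed events (a toy 'profile')."""
    counts = Counter(events)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

def add_decoys(events, decoy_topics, ratio, rng):
    """Mix in `ratio` decoy events per real event, drawn uniformly at random."""
    n_decoys = int(len(events) * ratio)
    decoys = [rng.choice(decoy_topics) for _ in range(n_decoys)]
    return events + decoys

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical browsing history: 100 events over three real interests.
real = ["bsd"] * 60 + ["math"] * 30 + ["cooking"] * 10
decoy_topics = ["giraffes", "jello", "rooster sounds", "triangles"]

clean = topic_profile(real)
noisy = topic_profile(add_decoys(real, decoy_topics, ratio=1.0, rng=rng))

# One decoy per real event halves every real share but keeps their order.
print(clean["bsd"], noisy["bsd"])
```

Even with as many decoys as real events, the real interests keep their relative order in this toy model; uniform noise only dilutes them. That is at least consistent with the worry that you would have to deviate a lot for it to matter.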

deadly7 · « **Reply #1 on:** June 15, 2011, 01:49:47 am »

The number of people who would care to do this is such a statistically insignificant portion of Facebook users that it wouldn't even register as anomalous to companies that record this data. I have Facebook friends who will, in a given day, like 10-15 things, and are now at five to six thousand Liked pages. I don't have the time, and more importantly don't care enough, to make those kinds of numbers.

Also, if you build up enough of a profile clicking weird, obscure shit, that is as personally identifying as anything else. I think I read a math or CS paper describing that the number of people to "Like" consecutive groups decreased by a factor of n or n^2, so after X number of likes there was a 95+% probability you could be uniquely identified. I may be wrong though.
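A back-of-the-envelope sketch of that shrinking-crowd argument. This is not the paper mentioned above; it is a crude model that assumes every Like is independent and is shared by the same fraction of users. The population size and the shares are hypothetical numbers picked only to show the shape of the curve.

```python
import math

def expected_matches(population, share, k):
    """Expected number of users who share all k likes, assuming independence."""
    return population * share ** k

def likes_to_single_out(population, share):
    """Smallest k for which the expected number of matching users is at most 1."""
    return math.ceil(math.log(population) / -math.log(share))

# Hypothetical population of 500 million users.
POPULATION = 500_000_000

for share in (0.1, 0.01):
    k = likes_to_single_out(POPULATION, share)
    print(f"share={share}: about {k} likes, "
          f"{expected_matches(POPULATION, share, k):.2f} expected matches")
```

Under these assumptions the crowd shrinks geometrically, so obscure Likes (small `share`) single you out after only a handful. Real Likes are correlated, so the true numbers would differ.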

The only option is to take your ball and go home. Or do what I do and do almost nothing on Facebook. I have a "status update" once every 2 weeks, tops, and maybe 3-4 posts on a Wall a week.

Newby · « **Reply #2 on:** June 15, 2011, 02:49:20 am »

Quote from: nslay on June 15, 2011, 01:08:32 am

Not sure how much you would have to deviate from your normal behavior for it to be effective ... but if everyone did that, it'd probably be a lot harder to make predictions.

So what you're saying is if you got everybody to introduce noise into the system consistently and effectively (e.g. not just one random word at the end), it probably would make it a lot harder to make predictions. This is assuming, of course, that they don't modify their algorithm to filter out said noise. How would they do this? Probably the same way they set up their current system: hire some brilliant engineers and throw a ton of money at them until they produce something.

So basically you've got to convince hundreds of thousands (if not millions) of stupid ignorant internet users that their privacy is worth something. Then you have to teach them how to mask their surfing habits. And then you have to pray that Google and the like don't adjust to the new method of surfing.

Considering I have problems with paranoia, I have no problem with Google hosting: my school e-mail, my personal e-mail, my contacts list, my surfing habits, my location (when I use GPS on my phone)... the list probably goes on, but those are major ones.

nslay · « **Reply #3 on:** June 15, 2011, 09:41:41 am »

Quote from: Newby on June 15, 2011, 02:49:20 am

Quote from: nslay on June 15, 2011, 01:08:32 am
Not sure how much you would have to deviate from your normal behavior for it to be effective ... but if everyone did that, it'd probably be a lot harder to make predictions.

So what you're saying is if you got everybody to introduce noise into the system consistently and effectively (e.g. not just one random word at the end), it probably would make it a lot harder to make predictions. This is assuming, of course, that they don't modify their algorithm to filter out said noise. How would they do this? Probably the same way they set up their current system: hire some brilliant engineers and throw a ton of money at them until they produce something.

So basically you've got to convince hundreds of thousands (if not millions) of stupid ignorant internet users that their privacy is worth something. Then you have to teach them how to mask their surfing habits. And then you have to pray that Google and the like don't adjust to the new method of surfing.

Considering I have problems with paranoia, I have no problem with Google hosting: my school e-mail, my personal e-mail, my contacts list, my surfing habits, my location (when I use GPS on my phone)... the list probably goes on, but those are major ones.

Exactly. We've been slowly conditioned by these companies to be perfectly alright with increasingly less privacy. I can imagine that twenty or thirty years ago everything you mentioned would result in a massive public outcry. I think people in the past probably valued privacy more than we do now. Considering that, I would say we're both stupid ignorant Internet users who fell for cool toys.

Sidoh · « **Reply #4 on:** June 15, 2011, 10:55:04 am »

Because people from twenty or thirty years ago were always right.

nslay · « **Reply #5 on:** June 15, 2011, 11:02:05 am »

Quote from: Sidoh on June 15, 2011, 10:55:04 am

Because people from twenty or thirty years ago were always right.

Privacy was partly a subject of famous books such as We and 1984, though perhaps with government in mind. Still, people gave serious consideration to the issue of privacy in the near to distant past. Nowadays, the issue is almost completely ignored or discredited because such personal disclosure is the norm and deemed "normal" or "trivial".

Rule · « **Reply #6 on:** June 16, 2011, 03:26:39 pm »

I agree that we are losing privacy, and I am strongly against things like the Chromebook. It's a concern to me too.

However, nowhere near a majority of people are going to go out of their way to introduce a lot of noise into the system. And doing it yourself is probably a waste of time; it's more trouble than it's worth. I do not consciously regulate my internet use, and I doubt that Facebook has much useful information about me. Gmail might, but they certainly don't know more about me than I do myself -- judging from the advertisements they post. I think you somewhat overestimate the power of our present machine learning algorithms.

nslay · « **Reply #7 on:** June 17, 2011, 03:21:01 am »

Quote from: Rule on June 16, 2011, 03:26:39 pm

I agree that we are losing privacy, and I am strongly against things like the Chromebook. It's a concern to me too.

However, nowhere near a majority of people are going to go out of their way to introduce a lot of noise into the system. And doing it yourself is probably a waste of time; it's more trouble than it's worth. I do not consciously regulate my internet use, and I doubt that Facebook has much useful information about me. Gmail might, but they certainly don't know more about me than I do myself -- judging from the advertisements they post. I think you somewhat overestimate the power of our present machine learning algorithms.

I think you underestimate or are unaware of the information you provide.

For example, I remember an ICML talk dealing with making predictions about users based merely on clicks ... granted, it's not your life story.

It's apparently possible to accurately predict your location merely from how you write (Twitter).

Every time you visit a site with Google Ads, you're unknowingly telling Google which sites you visit and in what order (if any) ... so most sites.

Your e-mail content tells a great deal about you. You can see Google Ads adjust when you log in to Gmail.

Renting movies tells something about yourself ... I hear Netflix can very accurately predict which movies you'll like (like 90% accuracy?).

Your search queries tell a great deal about yourself ...

What other mindless trivial actions are there that could be used as features?

Individually, these bits of information are not very useful ... but taken as a whole, I think they paint a picture of the user with increasing detail and resolution over time.
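A minimal sketch of how weak signals could add up, assuming a naive-Bayes-style model in which each signal is independent and contributes its own likelihood ratio. The prior of 5% and the ratio of 1.5 per signal are invented for illustration; nothing here reflects what any of these companies actually computes.

```python
import math

def posterior_probability(prior, likelihood_ratios):
    """Combine independent signals naive-Bayes style by adding log-odds."""
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))

PRIOR = 0.05        # hypothetical base rate of some trait in the population
WEAK_SIGNAL = 1.5   # hypothetical: each click/query is only mildly informative

for n in (0, 1, 5, 10):
    p = posterior_probability(PRIOR, [WEAK_SIGNAL] * n)
    print(f"{n:2d} weak signals -> probability {p:.3f}")
```

In this toy model one signal barely moves the estimate, while ten of them push it from 5% to roughly three in four, which is the "increasing resolution over time" effect. Real signals are correlated, so the gain would be smaller than this independence assumption suggests.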

Sidoh · « **Reply #8 on:** June 17, 2011, 08:37:07 pm »

Almost all of the machine learning papers using social media data I've read are guilty of really awful experimental procedures: experiments designed to show results, terrible generalizations made, overfitting, etc.

I don't think I'd argue that Google doesn't know a lot about me. They almost certainly do. I just don't think I care very much. Privacy isn't even close to an ultimate end in my mind.

Clan x86
