
Tony: I'll just get started. I'm Tony Jebara. I'm a professor at Columbia University. Three years ago, I started a company with Greg Skibiski and Alex Pentland. We are basically looking at massive amounts of communication data. We think that's a very rich way of describing what people are doing, what they're interested in. We think that is going to be the next net; it's going to be the next source of important data.
Sense Networks started three years ago. The basic idea was that we want to use location information, which we're all collecting right now; everyone in this room, through their cell phones, is providing information about where they are. Obviously, you're all here at eComm 2009, so that reveals something very important about you.
But before I start talking about that, let's just go back about 15 years, to the World Wide Web and Facebook. Fifteen years ago, there was this new paradigm where we switched from thinking about the data online as a collection of documents, and began thinking of it as a network. More important than just the actual data at each node on the web, each website, are the connections between the web pages; that is what we really care about looking at.
Companies like Google exploited this network connectivity and said this is a network of online places, to help us shuttle around and find interesting things, find what I am looking for in my search engine, but also to provide some interesting websites that might have advertising dollars and advertising revenue for Google.
This idea of moving people around an online network was a very interesting new paradigm, and also this other network, not of places but of people, also started to become very important. We're not just looking at people in terms of their attributes of where they work and what they do, but also the network of friends and people who are like them. You can exploit this other network, as well, for other lucrative opportunities.
For example, you could say, "If I bought something, maybe my friends would also be interested in buying the same product, so, advertise the same product to people who are in my friends' network". That's the Amazon.com model, for example: if I buy these five books, and someone like me bought those five plus an extra book, Amazon.com will suggest that extra book to me.
These types of networks are very useful, but they are about our online personas and these are online places. They're not really tied into the real world. We asked ourselves what is next and how do we move beyond just a network of online places, to a network of real places, instead of a network of our online personas, to a network of our real personas? It's very easy to get online data. We're always online generating that kind of data, but how do we get data about what we're doing in the real world?
It turns out, through the large availability of GPS, that we can now start getting a lot of information about what we're doing in the real world. Whenever you use your phone, or any one of these navigation devices or any smart mobile device, or even if you take a taxi, you generate mobile location data.
Sense Networks is in the business of collecting a lot of this data from various partnerships and also through our own direct applications, and then mining this data to do a lot of the things we are doing in the online world, but with offline data. We are doing collaborative filtering, marketing, smart advertising, smart search; all the things we're used to in the online world can now be done with the offline data.
What Sense Networks does is it tries to understand places and people from location data, to enable all these different services. Here is an example of what our data looks like. This is 4 million users, which is roughly the dataset size we're working with now, and their locations over the past three years. What we are trying to do with this data is understand what is going on, what these people are about; if they're consumers, segmenting the consumers for marketing and advertising and promotion purposes and also understanding what places are all about.
Here is an example of a few hundred users in Manhattan, running around commuting. What are these people doing? How can I better understand one of these consumers if they're one of my consumers? How do I know what to sell them or how to advertise to them?
From this massive amount of data, there is actually a lot of very rich information. That's what we're hopefully going to be able to show you, today. For example, we can just look at the information in aggregate and begin correlating the activity of people from their movements with the stock market. It turns out, people start coming into work very early in the financial district, when the Dow Jones starts to drop, which makes sense. When we see a big drop in the Dow Jones, everybody starts coming in early and stops slacking off on their commutes.
Another interesting thing was people going out late at night when the market was doing well, but now if you look at the nightlife of San Francisco, it's actually gotten quieter, just because the market has been slow. These are things in aggregate. We even see some interesting things like, right before bonus time, everybody goes into work extra early. There is a lot of rich information in the aggregate.
We also provide an application that lets you see the density of people, in real time, on a street map. This is San Francisco. You can download this to your iPhone or BlackBerry. It shows you how many people are at every street corner, more or less, in real time. If I go out at night, and I want to find a good restaurant or bar that has a big crowd, I can look on the heat map and do a search for where is the crowd right now, in real time. That tells you where everyone is.
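As a rough illustration of the heat map idea, here is a minimal Python sketch that counts location fixes per grid cell. The function name `density_grid`, the cell size, and the sample coordinates are all assumptions made for illustration; the talk does not describe how the actual application bins or renders its data.

```python
import math
from collections import Counter

def density_grid(points, cell_deg=0.001):
    """Count how many (lat, lon) fixes fall into each grid cell.

    cell_deg is a hypothetical cell size in degrees (roughly a city block).
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

# Hypothetical fixes: two near one corner, one a few blocks away.
sample_points = [
    (37.77495, -122.41945),
    (37.77496, -122.41946),
    (37.78015, -122.41055),
]
grid = density_grid(sample_points)
busiest_cell, busiest_count = grid.most_common(1)[0]
```

Searching for "where is the crowd right now" then amounts to sorting cells by count, which is what `most_common` does here.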
The next version of it is not just where is everybody, but where is everyone like me. That's the first thing people ask us. I don't want to just see a heat map of the activity in the city, where everybody is. I want to see where my crowd is.
In the next version, you will begin seeing a colored version. Each colored dot is a different user who falls into a different social category. There are about a dozen different colors. If you realize you like to hang out with the blue users, and they typically line up with the places you go to, they may be the young and edgy users. They might be the business travelers. You could say, "Where are the business travelers out tonight," in real time, versus the young and edgy students.
To really understand who's in what category or what tribe, we actually build a network of people. This network of people is a little like the Facebook network. It's just that instead of declaring that you're friends with someone, we figure out that you're friends with them or you are like them because you coincide in the same types of places.
If I'm always hanging out with my friends, I don't need to tell Facebook this because our phones are always in close proximity to each other. We actually do this a little more intelligently than just proximity. We can't just look at proximity, alone, to say you are like someone.
You might be like someone because you coincide, not just because you are in the same physical space, but you're also coinciding in the same semantic space. If I go to a coffee shop at Starbucks at 4:00 pm on a Thursday, and you go to a different Starbucks at 4:00 pm on a Thursday, we're kind of coinciding in the same place. It's not the exact same place, but in terms of semantics, it is.
What we need to do is to figure out if user A and user B in our network are actually similar or potential matches as friends on Facebook. It's very hard to compare the raw spaghetti that is GPS data. Here is a user that lives in San Francisco and works in Palo Alto; commutes back and forth. User B also lives in San Francisco, but works in San Francisco. It turns out they both work at tech startups.
How can we tell these are really similar people? We have to figure out if this place is semantically the same as the other place, not just physically the same. We have a network of places, which basically tells us if place A is like place B. On the World Wide Web there is a network of websites, which connects websites that are similar; they might be half-way across the country. There may be the IBM website and the Sony website. They may be physically located very far apart, but if there are a lot of links between them, then they should be similar.
We're going to learn the links between places by looking at how people move around cities, how they flow around, and also looking at the commercial activity at each place and the demographic activity at each place, to build a network of places.
Here is an example. This is how we figure out if two places are the same. We looked at the way people move around San Francisco and started finding the top two hundred nightlife spots. Those are the dots. We asked, "Do the people in the dot on the left hand side come from similar places, and leave that place to go to similar places, as the people in some other dot on the right hand side?"
If I look at the people coming into a place, place A, and then the people leaving it, and they come from similar types of locations and leave to go to the same types of places as the people at another place, place B, then those two places are the same. There may be a bar people go to after work in the financial district, and then they go to the wealthy neighborhoods near home, afterwards. If I see that the inflow and outflow are the same for two places, they're kind of the same type of place. That's how we figure out if places are similar. We color code the places in the city and figure out what are similar types of places.
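The inflow and outflow comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trips are given as hypothetical `(origin_type, place, destination_type)` tuples, and cosine similarity is swapped in as the comparison measure, since the talk does not say which measure is actually used.

```python
import math
from collections import Counter

def flow_profile(trips, place):
    """Tally where visitors to `place` came from and where they went next."""
    inflow, outflow = Counter(), Counter()
    for origin, visited, destination in trips:
        if visited == place:
            inflow[origin] += 1
            outflow[destination] += 1
    return inflow, outflow

def cosine(a, b):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def place_similarity(trips, place_a, place_b):
    """Average of inflow similarity and outflow similarity (an assumed choice)."""
    in_a, out_a = flow_profile(trips, place_a)
    in_b, out_b = flow_profile(trips, place_b)
    return 0.5 * (cosine(in_a, in_b) + cosine(out_a, out_b))

# Hypothetical trips: two after-work bars and one campus bar.
sample_trips = [
    ("financial_district", "bar_a", "residential"),
    ("financial_district", "bar_b", "residential"),
    ("campus", "bar_c", "dorms"),
]
```

With this sample data, `bar_a` and `bar_b` share the same inflow and outflow and score as the same type of place, while `bar_c` shares none and scores zero.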
These are two hundred top nightspots. Some things we've actually verified. People are now using this to do bar advertising, as one of our advertiser collaborations. You can learn what places are similar to each other, just by watching how people flow in and out of them. You can also just ask, with the census data that we have access to, what is available in terms of commercial activity at this place.
For every street corner, I can tell you if there is a restaurant there, if it's an Asian restaurant, if it's an Italian restaurant, etc. That's another way of describing places throughout the country. We also have the demographic information of all the places in the country.
This really lets us characterize places. If I hang out in the same type of place as you, not the same physical place but the same type of place with the same demographics, the same flow of crowds, and the same commercial activity, we're actually hanging out in very similar places.
That's what we do; we model every one of our users, by seeing what category of place they hang out in, for every hour of their week. If you are a user, one out of four million, we track your location every twenty minutes or so, for some of the users. We can see, at every hour of the week, what area were you hanging out in, what commercial activity were you exposed to, and what demographic were you exposed to.
We build this profile model of each user. For every hour of the week, what are they doing at that hour? There is a 30% chance, on a Thursday, at 8:00 pm, that they're eating at an Italian restaurant. On a Friday, at 4:00 pm, they're having coffee at Starbucks, with a 20% chance. That's what we basically do; we describe each one of our users by their exposure in different activities, in different types of places. Here is an example of nine users and each one of these is the model of the user. It's basically, for every week hour, what am I most likely doing at that week hour.
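A per-user, per-week-hour profile of this kind could be sketched as follows. This is a minimal illustration, assuming visits arrive as hypothetical `(timestamp, place_category)` pairs and that probabilities are plain relative frequencies; the real model is not described in that much detail in the talk.

```python
from collections import Counter, defaultdict
from datetime import datetime

def week_hour(ts):
    """Map a timestamp to one of 168 week hours (Monday 00:00 is hour 0)."""
    return ts.weekday() * 24 + ts.hour

def build_profile(visits):
    """Return {week_hour: {place_category: probability}} from observed visits."""
    counts = defaultdict(Counter)
    for ts, category in visits:
        counts[week_hour(ts)][category] += 1
    profile = {}
    for hour, tally in counts.items():
        total = sum(tally.values())
        profile[hour] = {cat: n / total for cat, n in tally.items()}
    return profile

# Hypothetical visits on four consecutive Thursdays around 8:00 pm.
sample_visits = [
    (datetime(2009, 3, 5, 20, 15), "italian_restaurant"),
    (datetime(2009, 3, 12, 20, 40), "home"),
    (datetime(2009, 3, 19, 20, 5), "italian_restaurant"),
    (datetime(2009, 3, 26, 20, 30), "home"),
]
sample_profile = build_profile(sample_visits)
```

Only the aggregated probabilities are kept in `sample_profile`; the individual timestamps are not part of the result.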
Given this type of data, we can quickly tell you if two people are the same or not, or similar or not, just by looking at how often they coincide, not physically, but semantically. Did I coincide at a coffee shop with you, did I coincide at a restaurant with you, etc. By doing this we can now build a network of users.
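One way to picture "coinciding semantically" is to compare two users' week-hour profiles directly. The sketch below assumes each profile is a mapping from week hour to place-category probabilities, and it scores a pair by the chance that both are in the same category of place at the same week hour. The function name and the scoring rule are illustrative assumptions, not the method used in production.

```python
def coincidence_score(profile_a, profile_b):
    """Average probability, over shared week hours, that two users are in
    the same category of place at the same hour of the week."""
    shared_hours = set(profile_a) & set(profile_b)
    if not shared_hours:
        return 0.0
    total = 0.0
    for hour in shared_hours:
        probs_a, probs_b = profile_a[hour], profile_b[hour]
        total += sum(p * probs_b.get(cat, 0.0) for cat, p in probs_a.items())
    return total / len(shared_hours)

# Hypothetical profiles keyed by week hour (88 = Thursday 4:00 pm).
user_a = {88: {"coffee_shop": 1.0}}
user_b = {88: {"coffee_shop": 0.5, "home": 0.5}}
user_c = {88: {"gym": 1.0}}
```

Here `user_a` and `user_b` coincide half the time even if they never visit the same physical coffee shop, while `user_a` and `user_c` never coincide.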
We have four million users. This is a subset of a couple of thousand. This is a network of users where you don't say this person is my friend; you get linked if you coincide in the same types of activities. We can analyze large amounts of users this way, for any one of our companies and clients that want to partner with us. Then we can start clustering these users and saying, "It turns out that some of these users are the young and edgy users; some of them are the mature homebody users; somebody is a weekend mall; there is the business traveler user". We've clustered the users into these different categories.
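To make the "link users, then cluster" step concrete, here is a small Python sketch. The talk does not say which clustering algorithm is used, so this swaps in the simplest possible stand-in: link any pair of users whose similarity score passes a threshold, then take connected components with union-find. The names, scores, and threshold are hypothetical.

```python
def cluster_users(pair_scores, threshold=0.5):
    """Group users into connected components of the similarity graph.

    pair_scores maps (user_u, user_v) tuples to a similarity score;
    pairs scoring below `threshold` are not linked.
    """
    parent = {}

    def find(user):
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]  # path halving
            user = parent[user]
        return user

    for (u, v), score in pair_scores.items():
        root_u, root_v = find(u), find(v)
        if score >= threshold and root_u != root_v:
            parent[root_u] = root_v

    clusters = {}
    for user in list(parent):
        clusters.setdefault(find(user), set()).add(user)
    return sorted(clusters.values(), key=sorted)

# Hypothetical pairwise scores between five users.
sample_scores = {
    ("ann", "bob"): 0.9,
    ("bob", "cat"): 0.8,
    ("cat", "dan"): 0.1,
    ("dan", "eve"): 0.7,
}
sample_clusters = cluster_users(sample_scores)
```

Labels such as "young and edgy" or "business traveler" would still have to be assigned to each resulting group by inspecting what its members have in common.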
Now that we have this clustering, we have this user network. It's very easy now to start doing things like recommendation engines and marketing and advertising. For example, we can cluster people and identify what category they live in, and identify if they're interested in buying cars because they've actually gone to car dealerships very often, in the past few weeks. We can begin automatically sending promotions because they fit into a certain category and they look like they're interested in cars, just by their physical movement.
Does this work? It turns out, for one of our clients, we've modeled somebody's response to advertising on the phone. You can model someone's response to advertising based on their click history, or you can also combine that with their location history, as well. We've seen a 2.5x improvement, if you combine my location with my click behavior, when you are sending the ad. Don't just send me ads based on what I click on and how I use my phone and my profile on my phone, but also look at how I move around the city and what I'm interested in. That helps you target better ads. We've seen a lift in better targeting, this way.
We call this the next net. There was a recent article in Business Week; if you want more details, please visit the website. The basic idea is that if you have location information about users, it's a very rich way of describing what someone is interested in and what category they fit in. Don't just imagine their online data, but their offline data too. Imagine offline cookies that combine with their online data to give us a better user model and a better marketing search engine, collaborative filtering tool. Thank you.
Audience 1: What kind of privacy issues do you get into? This is kind of scary.
Tony: The privacy issues are that we analyze this data for clients who already have the data about their own users. This might be a phone company or might be a device manufacturer where you've already opted in. You've opted in, saying that you permit this application to get your information. We just analyze the data and give it back. We don't do anything aggressive with the data. We're an analytics company, at the end.
Also, a lot of this data isn't stored in terms of where you were, as a trajectory. We actually just store a model of someone. For all of our users, we don't know where somebody was in November 2008; we just know that they like to go to Italian restaurants, with a 28% probability, on Thursdays. At the end, we just store models of the users; we don't store their exact data.
We are very careful about the privacy issues. Everything is secure. We do some [0:15:19.8 unclear] anonymization methods to protect the data so you can't recover somebody exactly. Again, we actually think a lot of this is a lot less invasive than when you compare it to the stuff that's already being collected by Facebook, MySpace, and all this other online data, which is a lot more invasive, I think.
Chair: We live in a civilized society with legal microphones. [laughs]
Audience 2: The fact that someone else is being bad is an answer to whether this is okay?
Tony: It's only bad if you don't give a service in exchange for the information and if you do bad things with the data. We are actually providing the data as a service to people that can give them social recommendations and give them a map of where the users are.
Audience 2: You're going to give me car coupons, but you know what this would be really good for? A police state.
Tony: That's true, but we're not working with a police state to do this. This is an important new source of data and I think companies already collect it. The key is do you want to use this to give better socially relevant applications, and also smarter consumer modeling, or do you want to just throw it away.
Audience 2: Do you see that this could be applied to political campaigning and electioneering?
Tony: Sure, we are not working on those types of projects. It's not necessarily our key target. You could imagine using this information in all sorts of ways, but the point is, the phone companies already have this data. What they want to do with it is to get better understanding of their users. They have your call logs as well, and that's used as well to do churn modeling and to do all sorts of other promotional things. Here, we are giving the location aspect, as well. We're doing the modeling and the analysis. We are not necessarily doing the applications, right now.
Audience 3: The providers that you are sourcing the location knowledge from, are they also open to aligning that for enterprise-specific applications? We can see location being important data related to presence data information, on more of a real-time basis, particularly in an enterprise application, where I have deployed workforce using mobiles, and turning that into mapping applications and things like that. How open are they to aligning that with the devices that are enterprise?
Tony: I think, at the end, we have a number of projects. I would love to discuss more of them in detail, offline with you. Many of the companies are interested in providing a social navigation device, something that's not personally identifiable, but others are interested in personally identifiable information. It depends on the partnerships.
Lee: Any other questions? It's kind of significant what's been put across here. It actually lies at the very heart, in my opinion, of the future of telecoms. It's not a utopia we're going into.
Audience 4: I actually think it's an interesting and exciting application. The question does come back to one of privacy. I'm actually interested in the iPhone app that you have. Are all the points in that database opted in, or are they generically represented from a carrier? Where does that information come from, what we can see live, now from your system?
Tony: For our iPhone app, every time you use the app, it gets your location data. That's how we show you the heat map of all the other users. You provide data, but in exchange, you see where everybody else is. It's anonymized. You can't find any one person with those heat maps, but it tells you where everybody is. Is everybody out tonight? Where should I go? Should I search for restaurants that are crowded with people or that no one ever goes to? In the end, that's the exchange; we don't reveal the information in any way that de-anonymizes the individual. You provide your data in exchange for looking at everybody else's information.
Audience 4: The answer is to opt-in, right now.
Tony: It's completely opt-in for our users. The phone companies, when you start using their devices, they already have your data without specifically saying opt-in to your location.
Audience 4: They have it for their own internal use though, and I think that's an important distinction. I've already agreed; I'm sure Brad has disagreements with this, that I've already clicked the check box of this giant list of things that says, "I agree that you can use my contact information, as long as you don't sell it". That's where the border resides now; whether that is where it resides in the future or not, whether these companies are going to start selling that in aggregate, is really what worries me.
Tony: The border isn't quite selling the data to somebody else, right now. I think it's whether they should use this information to better market their products or to run ad networks and things like this. They're not quite going to go and sell it.
Audience 4: An ad network is selling it.
Tony: An ad network that they're working on is not necessarily selling it.
Lee: We have time for one more question because otherwise, we eat into the social networking lunchtime too much. Gentleman, Alan, yeah it's the gentleman. I called you a gentleman; you're lucky. It's because we're out in public, here.
Alan: Then, Mr. Lee, I'll call you Mr. Chairman or we'll find some other fancy name. One question that one other colleague had; how does this compare to Google Latitude?
Tony: Another interesting competitor is Google Latitude because they're showing you the location of your friends, but also, Google has all this data. They're hoping to combine their online advertising with offline mobile advertising information, as well. Again, we were thinking about this for a while before Google started Google Latitude.
We have a buddy finding application as well, which shows you where your friends are, but the key thing is that we're not revealing any information about individuals. We are storing this data and doing clustering, telling you what tribe you're in. At the end, what we're trying to do is to summarize the user by saying is this person a business traveler; is this person a student. It's really trying to figure it out at that level, to put you in one of a couple of dozen categories, for advertising purposes, etc. We're not trying to collect personally identifiable information about somebody in particular.