March 2009 Archives

Mark: Thank you. Asterisk and Skype is probably not a pairing that a lot of you guys might have thought about a year ago. You would have thought about, "Wow, it would be nice," but didn't seem much like a reality. It is exciting that we have it now getting closer to being a reality.
First of all, what is Skype for Asterisk? You're used to SIP and SIP lets you talk to SIP devices, and Skype lets you talk to Skype devices, which pretty much is, of course, just Skype. It is a generic channel driver for Asterisk, so it supports most of the things that Asterisk channel drivers already support. There is no special programming or anything; you just install it and you can use it.
It supports, of course, the usernames, encryption, end points, and it supports both talking to regular Skype names, any arbitrary Skype name, as well as talking to the SkypeIn, SkypeOut services.
It's really the first practical Skype gateway from a PBX platform. It allows you to connect this really broad user base of people that are already using Skype, with Asterisk. If you think about Asterisk as a very pragmatic and practical platform for telephony, for business phone systems, Skype has been incredibly successful in the Voice over IP space because it's been a very pragmatic solution for customers to be able to use.
I've often said that the lesson of Skype is really that when you look at open source, open standards, and all those things, their value is really only as strong as their ability to deliver on ease of use, performance, and low cost to the consumer. Even though Skype wasn't open source, wasn't open standards, or any of that stuff, because they were able to deliver value to the customer they were able to get such a tremendous amount of following and a number of users who are obviously accessible to you, through Asterisk.
There have been some other very ugly hacks that involve virtual machines and emulating sound hardware and stuff like that, but the Skype for Asterisk is literally just native Skype code running within the Asterisk environment so it removes this huge, ugly hack factor. It's actually scalable and it integrates nicely in Asterisk with behaviors you would expect.
There are several use cases to go over. The business call centers, of course, is one. Now, anybody who has Skype can contact you and you could register a Skype name for your business and have people call directly into there. It integrates very nicely in and gives you a low-cost way for your people to call you without having to run up 800 number minutes. A lot of customers already have Skype and you could have click-to-dial, and all that kind of stuff.
The other thing, of course, is business PBX. If you like to use Skype for your business communications, which a lot of customers do, even though the IT people typically don't, you can use SkypeIn and SkypeOut minutes to associate with your PBX and you can also have your Skype username that both comes to your Skype client natively, and rings over to the Asterisk PBX. For example, you could have everything unified on the same voice mailbox. When someone calls you, it can ring both your hard phone and your Skype soft phone.
Of course for end users, any application that you have that you would want to expose to a wide variety of users, you could do so via Skype. If you wanted to make a call for the weather, or whatever it is, any kind of IVR, you could connect all that in and allow for calling people back via Skype, as well.
The status of the Beta - because of the complexity of the integration, we decided to start with a closed Beta. We were only going to open it up to a certain number of people, initially, so that we could have engineers working directly with the customers. That started in January of this year. We have over 100 people, but less than 1,000, so far, in the Beta. We will be going to a public Beta, hopefully, very soon. So far, it's been working reasonably well.
There are some big caveats that are very important. First of all, Skype is also in the process of releasing something called the Business Control Panel. Although it's not implemented in the current Beta, Skype is requiring that the usernames you use to register your device with Skype, in other words, the ones you use with the Skype for Asterisk, will all have to be Business Control Panel accounts, which I believe means you are not going to be able to use existing accounts unless you are somehow able to make them part of the Business Control Panel.
This is something that Skype has demanded, so feel free to go tackle him, over there, if you have any concerns about that. There is not much we can do on it, right now. However, there are areas that we are interested in hearing more about, like chat and video. We would definitely like to get your input on priorities about how those features would want to be integrated in, in later revisions of the product.
You dial it kind of as you would expect. It's just another channel driver and you do Skype slash the username. Presence is supported. We do have some AMI events that are generated, to give you a little bit more ability to kind of hook in for some of the Skype-specific stuff that wouldn't be present in other telephony interfaces.
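As a rough illustration of the "Skype slash the username" convention, here is a minimal Python sketch that builds such a channel string. The helper name `skype_channel` and its validation rules are assumptions made for this example; they are not part of the Skype for Asterisk product.

```python
def skype_channel(username: str) -> str:
    """Return a channel string in the 'Skype/<username>' form described in the talk."""
    name = username.strip()
    # Assumption for this sketch: reject empty names and names containing
    # a slash or whitespace, since they would make the string ambiguous.
    if not name or "/" in name or any(ch.isspace() for ch in name):
        raise ValueError(f"not a usable Skype name: {username!r}")
    return f"Skype/{name}"
```

A dialplan or a call-origination script would then pass the resulting string wherever a channel name is expected.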
You actually have access, as it turns out, to a lot of the variables in the Skype call. For example, you could use the language that is provided by the user in their configuration to give them IVR in their native language of choice. Some of the others, I guess you could wish them a happy birthday if they called on their birthday. I don't know exactly what you would do with that. You can see there is a lot of demographic information that may be able to be helpful, if for nothing else, from a logging and statistic gathering point of view.
There is a shortened URL, if you want to sign up for the Beta for when it does go to the public Beta. I will save the rest of the time for questions. Your questions can be about Asterisk generally, or specific to the Skype for Asterisk.
Audience 1: Is this going to use the Asterisk jitter buffer, or Skype's jittering, fancy de-jittering technology?
Mark: It's going to use the Skype de-jittering, but it will use the Asterisk native codecs. Essentially, all the existing codec work you do with Asterisk, already, will be used. That is all native. It doesn't have to get retranslated.
Audience 2: What caller ID will it actually pass?
Mark: The caller ID will be based on the Skype username that you are using to place the call. You can register multiple Skype usernames with it and then say, "I want this call to come from this particular Skype username". On inbound, it will obviously be the phone number associated with the caller. Typically, it would just be their Skype name.
Audience 3: So if you put it under the corporate name...
Mark: Right, it would go out, for example if it was Digium, the word Digium would be the caller ID and then the name would be whatever name you had associated with it.
Audience 4: You can use a mobile number as your Skype caller ID, instead of your regular Skype ID. Will the...
Mark: I'm not really sure if we've set it up for that. I hadn't thought about that, in terms of allowing your mobile number, whatever you have listed in Skype as your actual phone number...
Audience 4: It becomes handy for getting into things like CauliFlower, and so on.
Mark: Okay, I will have to get back to you on that one. You stumped me on that one.
Audience 5: Mark, if you're phoning someone like 1-800-go-fedex through Asterisk to Skype, does early media work, and can we get the DTMF into that?
Mark: The early media is only unidirectional, right now, in Skype. I don't believe you can transmit media before the call is through. FedEx, as a specific example, probably wouldn't work, although now that you mention it, I don't know how you do that through SkypeOut anyway. That's a good follow up question.
Audience 6: There was a question about the jitter buffer. Can you elaborate a little bit about the media exchange, going from a SIP endpoint to Skype? It's starting out on RTP, and then are you guys doing a virtual driver, or are you writing a wav file to a file and then playing it out?
Mark: The audio comes in over RTP, into Asterisk, and Asterisk routes it out the Skype interface.
Audience 7: I have two questions. One has to do with security. At what point does the Skype call cease to be encrypted? The other question is will you be allowing Skype-to-Skype re-routing, as opposed to a PSTN to Skype? What can I do, through this application, transfer a call to another Skype user?
Mark: The first question is when does it get unencrypted. It gets unencrypted when it hits Asterisk; obviously it has to be, to be able to convert to other media. I think the second question was can you have a Skype call in and then send a Skype call back out. The answer is yes, you can. I don't know if it's going to be out in the first release, but there is a way to transfer that call off. Otherwise, you would be in the middle of that call and you would essentially be getting the media, decrypting it and re-encrypting it on the other leg of the call.
Audience 8: Mark, is there a charge for the [0:11:56.7 unclear] Skype? Has that been determined, yet?
Mark: The current plan is that they would be sold, more or less, like the G.729 licenses, but there hasn't been any kind of formal price announcement or anything like that. Hopefully, as we get closer to the final Beta, we'll be able to confirm what the terms of that will be.
Audience 9: I'm just curious; there is something I heard about, called SkyHost. Is that part of this solution?
Mark: That's basically a name for the API that's being used, yes.
Thank you very much.
Chair: Great job, thanks Mark.

Tony: I'll just get started. I'm Tony Jebara. I'm a professor at Columbia University. Three years ago, I started a company with Greg Skibiski and Alex Pentland. We are basically looking at massive amounts of communication data. We think that's a very rich way of describing what people are doing, what they're interested in. We think that is going to be the next net; it's going to be the next source of important data.
Sense Networks started three years ago. The basic idea was that we want to use location information, which we're all collecting right now; everyone in this room, through their cell phones, is providing information about where they are. Obviously, you're all here at eComm 2009, so that reveals something very important about you.
But before I start talking about that, let's just go back about 15 years ago, to the World Wide Web and Facebook. Fifteen years ago, there was this new paradigm where we switched from thinking about the data online as a collection of documents, and began thinking of it as a network. What's more important than just the actual data at each node on the web, each website, we really care about looking at the connections between the web pages.
Companies like Google exploited this network connectivity and said this is a network of online places, to help us shuttle around and find interesting things, find what I am looking for in my search engine, but also to provide some interesting websites that might have advertising dollars and advertising revenue for Google.
This idea of moving people around an online network was a very interesting new paradigm, and also this other network, not of places but of people, also started to become very important. We're not just looking at people in terms of their attributes of where they work and what they do, but also the network of friends and people who are like them. You can exploit this other network, as well, for other lucrative opportunities.
For example, you could say, "If I bought something, maybe my friends would also be interested in buying the same product, so, advertise the same product to people who are in my friends' network". That Amazon.com model, for example, if I buy these five books, where someone like me bought an extra book, Amazon.com will suggest that extra book to me.
These types of networks are very useful, but they are about our online personas and these are online places. They're not really tied into the real world. We asked ourselves what is next and how do we move beyond just a network of online places, to a network of real places, instead of a network of our online personas, to a network of our real personas? It's very easy to get online data. We're always online generating that kind of data, but how do we get data about what we're doing in the real world?
It turns out, through the large availability of GPS that we can now start getting a lot of information about what we're doing in the real world. Whenever you use your phone, or any one of these navigation devices or any smart mobile device, or even if you take a taxi, you generate mobile location data.
Sense Networks is in the business of collecting a lot of this data from various partnerships and also through our own direct applications, and then mining this data to do a lot of the things we are doing in the online world, but with offline data. We are doing collaborative filtering, marketing, smart advertising, smart search; all the things we're used to in the online world can now be done with the offline data.
What Sense Networks does is it tries to understand places and people from location data, to enable all these different services. Here is an example of what our data looks like. This is 4 million users, which is roughly the dataset size we're working with now, and their locations up to the past three years. What we are trying to do with this data is understand what is going on, what these people are about; if they're consumers, segmenting the consumers for marketing and advertising and promotion purposes and also understanding what places are all about.
Here is an example of a few hundred users in Manhattan, running around commuting. What are these people doing? How can I better understand one of these consumers if they're one of my consumers? How do I know what to sell them or how to advertise to them?
From this massive amount of data, there is actually a lot of very rich information. That's what we're hopefully going to be able to show you, today. For example, we can just look at the information in aggregate and begin correlating the activity of people from their movements with the stock market. It turns out, people start coming into work very early in the financial district, when the Dow Jones starts to drop, which makes sense. When we see a big drop in the Dow Jones, everybody starts coming in early and stops slacking off on their commutes.
Another interesting thing was people going out late at night when the market was doing well, but now if you look at the nightlife of San Francisco, it's actually gotten quieter, just because the market has been slow. These are things in aggregate. We even see some interesting things like, right before bonus time, everybody goes into work extra early. There is a lot of rich information in the aggregate.
We also provide an application that lets you see the density of people, in real time, on a street map. This is San Francisco. You can download this to your iPhone or BlackBerry. It shows you how many people are at every street corner, more or less, in real time. If I go out at night, and I want to find a good restaurant or bar that has a big crowd, I can look on the heat map and do a search for where is the crowd right now, in real time. That tells you where everyone is.
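The heat map idea can be sketched as counting location fixes per grid cell. The following Python sketch is only an illustration: the function name `heat_map`, the cell size, and the `min_count` threshold (which hides sparsely populated cells so that no single person stands out) are assumptions, not a description of the actual application.

```python
from collections import Counter
from math import floor

def heat_map(points, cell_deg=0.001, min_count=1):
    """Count (lat, lon) fixes per grid cell of cell_deg degrees.

    Cells with fewer than min_count fixes are dropped. That threshold is
    an assumption of this sketch, one simple way to keep individuals
    from being singled out.
    """
    counts = Counter(
        (floor(lat / cell_deg), floor(lon / cell_deg)) for lat, lon in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Two fixes on the same corner, one a few blocks away.
points = [
    (37.77491, -122.41941),
    (37.77495, -122.41945),
    (37.78050, -122.41050),
]
busy_cells = heat_map(points, min_count=2)
```

Searching for "where is the crowd right now" then amounts to picking the cells with the highest counts.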
The next version of it is not just where is everybody, but where is everyone like me. That's the first thing people ask us. I don't want to just see a heat map of the activity in the city, where everybody is. I want to see where my crowd is.
In the next version, you will begin seeing a colored version. Each colored dot is a different user who falls into a different social category. There are about a dozen different colors. If you realize you like to hang out with the blue users, and they typically line up with the places you go to, they may be the young and edgy users. They might be the business travelers. You could say, "Where are the business travelers out tonight," in real time, versus the young and edgy students.
To really understand who's in what category or what tribe, we actually build a network of people. This network of people is a little like the Facebook network. It's just that instead of declaring that you're friends with someone, we figure out that you're friends with them or you are like them because you coincide in the same types of places.
If I'm always hanging out with my friends, I don't need to tell Facebook this because our phones are always in close proximity to each other. We actually do this a little more intelligently than just proximity. We can't just look at proximity, alone, to say you are like someone.
You might be like someone because you coincide, not just because you are in the same physical space, but you're also coinciding in the same semantic space. If I go to a coffee shop at Starbucks at 4:00 pm on a Thursday, and you go to a different Starbucks at 4:00 pm on a Thursday, we're kind of coinciding in the same place. It's not the exact same place, but in terms of semantics, it is.
What we need to do is to figure out if user A and user B in our network are actually similar or potential matches as friends on Facebook. It's very hard to compare the raw spaghetti that is GPS data. Here is a user that lives in San Francisco and works in Palo Alto; commutes back and forth. User B also lives in San Francisco, but works in San Francisco. It turns out they both work at Tech startups.
How can we tell these are really similar people? We have to figure out if this place is semantically the same as the other place, not just physically the same. We have a network of places, which basically tells us if place A is like place B. There is a network of World Wide Web websites, which connects websites that are similar; they might be half-way across the country. There may be the IBM website and the Sony website. They may be physically located very far apart, but if there are a lot of links between them, then they should be similar.
We're going to learn the links between places by looking at how people move around cities, how they flow around, and also looking at the commercial activity at each place and the demographic activity at each place, to build a network of places.
Here is an example. This is how we figure out if two places are the same. We looked at the way people move around San Francisco and started finding the top two hundred nightlife spots. Those are the dots. We asked, "Do these people in the dot on the left hand side come from similar places and leave that place to go to similar places," as some other dot on the right hand side.
If I look at the people coming into a place, and then the people leaving it, they come from similar types of locations, place A, and they leave to go to the same types of places, as another place, place B, then those two places are the same. There may be a bar people go to after work in the financial district, and then they go to the wealthy neighborhoods near home, afterwards. If I see that the inflow and outflow are the same for two places, they're kind of the same type of place. That's how we figure out if places are similar. We color code the places in the city and figure out what are similar types of places.
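The inflow and outflow comparison can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: trips are given as hypothetical `(origin_type, place, destination_type)` tuples, and cosine similarity is used as the comparison measure, which is a choice made for this example rather than the method described in the talk.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors keyed by place type."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def place_similarity(place_a, place_b, trips):
    """Average of inflow similarity and outflow similarity for two places.

    trips: iterable of (origin_type, place, destination_type) tuples.
    """
    trips = list(trips)

    def flows(place):
        inflow = Counter(o for o, p, _ in trips if p == place)
        outflow = Counter(d for _, p, d in trips if p == place)
        return inflow, outflow

    in_a, out_a = flows(place_a)
    in_b, out_b = flows(place_b)
    return 0.5 * (cosine(in_a, in_b) + cosine(out_a, out_b))

trips = [
    ("financial_district", "bar_a", "wealthy_residential"),
    ("financial_district", "bar_a", "wealthy_residential"),
    ("financial_district", "bar_b", "wealthy_residential"),
    ("campus", "bar_c", "student_housing"),
]
```

Here `bar_a` and `bar_b` draw from and feed into the same kinds of places, so they score as the same type of place, while `bar_c` does not.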
These are two hundred top nightspots. Some things we've actually verified. People are now using this to do bar advertising, as one of our advertiser collaborations. You can learn what places are similar to each other, just by watching how people flow in and out of them. You can also just ask, with the census data that we have access to, what is available in terms of commercial activity at this place.
For every street corner, I can tell you if there is a restaurant there, if it's an Asian restaurant, if it's an Italian restaurant, etc. That's another way of describing places throughout the country. We also have the demographic information of all the places in the country.
This really lets us characterize places. If I hang out in the same type of place as you, not the same physical place but the same type of place with the same demographics, the same flow of crowds, and the same commercial activity; we're actually hanging out in very similar places.
That's what we do; we model every one of our users, by seeing what category of place they hang out in, for every hour of their week. If you are a user, one out of four million, we track your location every twenty minutes or so, on some of the users. We can see, at every hour of the week, what area were you hanging out in, what commercial activity were you exposed to, and what demographic were you exposed to.
We build this profile model of each user. For every hour of the week, what are they doing at that hour. There is a 30% chance, on a Thursday, at 8:00 pm, that they're eating at an Italian restaurant. On a Friday, at 4:00 pm, they're having coffee at Starbucks, with 20% chance. That's what we basically do; we describe each one of our users by their exposure in different activities, in different types of places. Here is an example of nine users and each one of these is the model of the user. It's basically, for every week hour, what am I most likely doing at that week hour.
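A per-user profile of this kind can be sketched as a table from week hour (0 to 167) to a probability distribution over place categories. The Python below is a minimal sketch; the function names, the category labels, and the choice of Monday 00:00 as hour 0 are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

def week_hour(ts: datetime) -> int:
    """Map a timestamp to an hour of the week, 0..167 (Monday 00:00 is 0)."""
    return ts.weekday() * 24 + ts.hour

def build_profile(observations):
    """Turn (timestamp, place_category) pairs into per-week-hour probabilities."""
    counts = defaultdict(Counter)
    for ts, category in observations:
        counts[week_hour(ts)][category] += 1
    profile = {}
    for hour, ctr in counts.items():
        total = sum(ctr.values())
        profile[hour] = {cat: n / total for cat, n in ctr.items()}
    return profile

# Four consecutive Thursdays at 8:00 pm (March 2009).
observations = [
    (datetime(2009, 3, 5, 20), "italian_restaurant"),
    (datetime(2009, 3, 12, 20), "home"),
    (datetime(2009, 3, 19, 20), "home"),
    (datetime(2009, 3, 26, 20), "italian_restaurant"),
]
profile = build_profile(observations)
```

Only the resulting distribution is kept per week hour; the individual timestamps and coordinates are not needed once the counts are folded in.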
Given this type of data, we can quickly tell you if two people are the same or not, or similar or not, just by looking at how often they coincide, not physically, but semantically. Did I coincide at a coffee shop with you, did I coincide at a restaurant with you, etc. By doing this we can now build a network of users.
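One way to sketch "coinciding semantically" is to ask, for each hour of the week, how likely two users are to be in the same category of place. The Python below assumes each user is represented as a dict from week hour to a dict of category probabilities; the scoring rule (product of probabilities, averaged over shared hours) is an assumption for this example, not the measure used by Sense Networks.

```python
def semantic_coincidence(profile_a, profile_b):
    """Average chance that two users are in the same place category.

    Each profile maps week_hour -> {category: probability}. The two
    users are treated as independent, and only hours present in both
    profiles are compared; both are simplifying assumptions.
    """
    shared_hours = profile_a.keys() & profile_b.keys()
    if not shared_hours:
        return 0.0
    total = 0.0
    for hour in shared_hours:
        pa, pb = profile_a[hour], profile_b[hour]
        total += sum(pa[c] * pb[c] for c in pa.keys() & pb.keys())
    return total / len(shared_hours)

# Hour 88 is Thursday 4:00 pm when Monday 00:00 is hour 0.
ann = {88: {"coffee_shop": 1.0}}
bob = {88: {"coffee_shop": 0.5, "home": 0.5}}
cat = {88: {"gym": 1.0}}
```

Ann and Bob never need to visit the same physical coffee shop to score above zero, which is the point of comparing categories rather than coordinates.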
We have four million users. This is a subset of a couple of thousand. This is a network of users where you don't say this person is my friend; you get linked if you coincide in the same types of activities. We can analyze large amounts of users this way, for any one of our companies and clients that want to partner with us. Then we can start clustering these users and saying, "It turns out that some of these users are the young and edgy users; some of them are the mature homebody users; somebody is a weekend mall; there is the business traveler user". We've clustered the users into these different categories.
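The talk does not say which clustering method is used. As a stand-in, the sketch below links any two users whose similarity clears a threshold and then takes connected components with a small union-find; the function name, the threshold, and the sample scores are all assumptions for illustration.

```python
def cluster_users(users, similarity, threshold=0.5):
    """Group users into connected components of the 'similar enough' graph.

    similarity(a, b) returns a score; pairs scoring >= threshold are linked.
    This is a stand-in technique, not the clustering used in the talk.
    """
    parent = {u: u for u in users}

    def find(u):
        # Follow parent links to the root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for i, a in enumerate(users):
        for b in users[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for u in users:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())

# Hypothetical pairwise scores for four users.
scores = {frozenset(("ann", "bob")): 0.9, frozenset(("cat", "dan")): 0.8}
users = ["ann", "bob", "cat", "dan"]
tribes = cluster_users(users, lambda a, b: scores.get(frozenset((a, b)), 0.0))
```

The all-pairs loop is quadratic, so a dataset of millions of users would need something more scalable; the sketch is only meant to show how links turn into groups.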
Now that we have this clustering, we have this user network. It's very easy now to start doing things like recommendation engines and marketing and advertising. For example, we can cluster people and identify what category they live in, and identify if they're interested in buying cars because they've actually gone to car dealerships very often, in the past few weeks. We can begin automatically sending promotions because they fit into a certain category and they look like they're interested in cars, just by their physical movement.
Does this work? It turns out, for one of our clients, we've modeled somebody's response to advertising on the phone. You can model someone's response to advertising based on their click history, or you can also combine that with their location history, as well. We've seen a 2.5x improvement, if you combine my location with my click behavior, when you are sending the ad. Don't just send me ads based on what I click on and how I use my phone and my profile on my phone, but also look at how I move around the city and what I'm interested in. That helps you target better ads. We've seen a lift in better targeting, this way.
We call this the next net. There was a recent article in Business Week, if you want more details, please visit the website. The basic idea is that if you have location information about users, it's a very rich way of describing what someone is interested in, what category they fit in, and don't just imagine their online data but their offline data. Imagine offline cookies that combine with their online data to give us a better user model and a better marketing search engine, collaborative filtering tool. Thank you.
Audience 1: What kind of privacy issues do you get into? This is kind of scary.
Tony: The privacy issues are that we analyze this data for clients who already have the data about their own users. This might be a phone company or might be a device manufacturer where you've already opted in. You've opted in, saying that you permit this application to get your information. We just analyze the data and give it back. We don't do anything aggressive with the data. We're an analytics company, at the end.
Also, a lot of this data isn't stored in terms of where you were, as a trajectory. We actually just store a model of someone. For all of our users, we don't know where somebody was November 2008; we just know that they like to go to Italian restaurants, with a 28% probability, on Thursdays. At the end, we just store models of the users; we don't store their exact data.
We are very careful about the privacy issues. Everything is secure. We do some [0:15:19.8 unclear] anonymization methods to protect the data so you can't recover somebody exactly. Again, we actually think a lot of this is a lot less invasive than when you compare it to the stuff that's already being collected by Facebook, MySpace, and all this other online data, which is a lot more invasive, I think.
Chair: We live in a civilized society with legal microphones. [laughs]
Audience 2: The fact that someone else is being bad is an answer to whether this is okay?
Tony: It's only bad if you don't give a service in exchange for the information and if you do bad things with the data. We are actually providing the data as a service to people that can give them social recommendations and give them a map of where the users are.
Audience 2: You're going to give me car coupons, but you know what this would be really good for, a police state.
Tony: That's true, but we're not working with a police state to do this. This is an important new source of data and I think companies already collect it. The key is do you want to use this to give better socially relevant applications, and also smarter consumer modeling, or do you want to just throw it away.
Audience 2: Do you see that this could be applied to political campaigning and electioneering?
Tony: Sure, we are not working on those types of projects. It's not necessarily our key target. You could imagine using this information in all sorts of ways, but the point is; the phone companies already have this data. What they want to do with it is to get better understanding of their users. They have your call logs as well, and that's used as well to do churn modeling and to do all sorts of other promotional things. Here, we are giving the location aspect, as well. We're doing the modeling and the analysis. We are not necessarily doing the applications, right now.
Audience 3: The providers that you are sourcing the location knowledge from, are they also open to aligning that for enterprise-specific applications? We can see location being important data related to presence data information, on more of a real time basis, particularly in an enterprise application, where I have deployed workforce using mobiles, and turning that into mapping applications and things like that. How open are they to aligning that with the devices that are enterprise?
Tony: I think, at the end, we have a number of projects. I would love to discuss more of them in detail, offline with you. Many of the companies are interested in providing a social navigation device, something that's not personally identifiable, but others are interested in personally identifiable information. It depends on the partnerships.
Lee: Any other questions? It's kind of significant what's been put across here. It actually lies at the very heart, in my opinion, of the future of telecoms. It's not a utopia we're going into.
Audience 4: I actually think it's an interesting and exciting application. The question does come back to one of privacy. I'm actually interested in the iPhone app that you have. Are all the points in that database opted in, or are they generically represented from a carrier? Where does that information come from, what we can see live, now from your system?
Tony: For our iPhone app, every time you use the app, it gets your location data. That's how we show you the heat map of all the other users. You provide data, but in exchange, you see where everybody else is. It's anonymized. You can't find any one person with those heat maps, but it tells you where everybody is. Is everybody out tonight? Where should I go? Should I search for restaurants that are crowded with people or that no one ever goes to? In the end, that's the exchange; we don't reveal the information in any way that de-anonymizes the individual. You provide your data in exchange for looking at everybody else's information.
Audience 4: The answer is to opt-in, right now.
Tony: It's completely opt-in for our users. The phone companies, when you start using their devices, they already have your data without specifically saying opt-in to your location.
Audience 4: They have it for their own internal use though, and I think that's an important distinction. I've already agreed; I'm sure Brad has disagreements with this, that I've already clicked the check box of this giant list of things that says, "I agree that you can use my contact information, as long as you don't sell it". That's where the border resides now, whether that is where it resides in the future or not, whether these companies are going to start selling that in aggregate is really, what worries me.
Tony: The border isn't quite selling the data to somebody else, right now, I think it's should they use this information to better market their products or to run ad networks and things like this. They're not quite going to go and sell it.
Audience 4: An ad network is selling it.
Tony: An ad network that they're working on is not necessarily selling it.
Lee: We have time for one more question because otherwise, we eat into the social networking lunchtime too much. Gentleman, Alan, yeah it's the gentleman. I called you a gentleman; you're lucky. It's because we're out in public, here.
Alan: Then, Mr. Lee, I'll call you Mr. Chairman or we'll find some other fancy name. One question that one other colleague had; how does this compare to Google Latitude?
Tony: Another interesting competitor is Google Latitude because they're showing you the location of your friends, but also, Google has all this data. They're hoping to combine their online advertising with offline mobile advertising information, as well. Again, we were thinking about this for a while before Google started Google Latitude.
We have a buddy finding application as well, which shows you where your friends are, but the key thing is that we're not revealing any information about individuals. We are storing this data and doing clustering, telling you what tribe you're in. At the end, what we're trying to do is to summarize the user by saying is this person a business traveler; is this person a student. It's really trying to figure it out at that level, to put you in one of a couple of dozen categories, for advertising purposes, etc. We're not trying to collect personally identifiable information about somebody in particular.

Lee: We are starting with a keynote from Skype's General Manager, of Audio and Video. Again, thank you to Skype for the support. It has really helped this community. Skype is going to make an important announcement during the next half hour. I think the announcement that is being made is significant. Please welcome Skype's General Manager of Audio and Video, Jonathan Christensen.
Jonathan: Good afternoon. It's great to be back at eComm. I'm going to talk about something that is near and dear to me. I'm going to talk about audio codecs. It may seem a little low level or obscure, but it's something that touches us all in our daily lives.
Human speech is kind of made up of two sounds. There is the voiced speech and the unvoiced speech. Voiced speech is these vowel sounds, "ah," and "oo" and they come from our larynx, our voice apparatus.
There is the unvoiced speech, "f-f" and "k-k," these consonant sounds. When you put those sounds together, you have a frequency range of about 80 to 14,000 Hz. This is the wonderful, full fidelity voice that you are hearing from me, right now, in this room.
By some strange coincidence of nature, the human hearing apparatus also can hear about 80 to 14,000 Hz. Until very recently, the speaker and the listener had to be in physical proximity to be able to hear that full range, to hear anything, in fact, in terms of human evolution, anyway.
More recently, with the advent of the telephone, we got telecommunications and speech from afar, speech from distance. In order to make this thing cost effective, and to reach the mass market with this device, the electronics in the device needed to be limited in a way that they could only allow us to hear between 400 and 3,400 Hz, effective bandwidth of 3 KHz. This is less than ideal, less than a third of all that audio you could hear when in the room together, as we are now, without a speech codec involved.
These devices were analog in the beginning and the electronics were end-to-end. This fundamental limitation wasn't a limitation of the wire or the network, because they were connected end-to-end over an analog electronic signal. It was something that had to do with the electronics and the cost control in the actual device.
Theoretically, this could have been upgraded over time to a wideband experience, to a richer, fuller experience. The old switching gear looked something like this; it was manual. You connected a physical circuit between the endpoints, using a mechanical switch, where the circuits preserved that end-to-end electrical current. This didn't scale very well.
In the 1960's and 1970's, there was a very big upgrade, an upgrade to a more scalable and automated system. This is really the introduction of the digital era in voice. With it, they needed a way to efficiently transport the voice and the core of the network on the digital side.
The analog handsets were still out there in the industry, but the core of the network became digital. PCM and G.711 and the packet-based world were invented. This is a very simple scheme, a scheme that uses 8 KHz sampling; every 125 microseconds there is a sample taken. The sample is 13 or 14 bits. Some super simple math is used to encode that to 8 bits. You have 64 Kb per second on the wire.
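The arithmetic behind that scheme can be sketched in a few lines. This is only an illustration: it uses the continuous mu-law companding formula with mu = 255 as a stand-in for the "super simple math", not the bit-exact segment tables of G.711, and all function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000   # from the talk: 8 kHz sampling
BITS_PER_SAMPLE = 8     # from the talk: each sample is encoded to 8 bits
MU = 255                # assumption: the mu-law constant; A-law uses a different curve


def sample_interval_us(rate_hz: int) -> float:
    """Time between two samples, in microseconds."""
    return 1_000_000 / rate_hz


def bitrate_bps(rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second on the wire for uncompressed fixed-width samples."""
    return rate_hz * bits_per_sample


def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_8bit(x: float) -> int:
    """Compand a linear sample, then quantize it to an 8-bit code (0..255)."""
    return int(round((mu_law_compress(x) + 1) / 2 * 255))


def decode_8bit(code: int) -> float:
    """Turn an 8-bit code back into an approximate linear sample."""
    return mu_law_expand(code / 255 * 2 - 1)


if __name__ == "__main__":
    print(sample_interval_us(SAMPLE_RATE_HZ), "microseconds between samples")
    print(bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE), "bits per second")
    quiet = 0.01
    print(quiet, "->", encode_8bit(quiet), "->", decode_8bit(encode_8bit(quiet)))
```

The logarithmic curve spends more of the 256 codes on quiet samples than on loud ones, which is why 8 bits can stand in for a 13 or 14 bit linear sample in speech.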
As an interesting aside, do you ever wonder why the theoretical limitation over the phone system was 56K for data while you had the 64Kb audio channel and you were shoving as much data over it as possible? That audio channel was end-to-end; it was universal within the PSTN, as a part of that upgrade to those switches. In this world, you have a theoretical limitation of 3.4 KHz of bandwidth.
This is a picture of an audio signal, a speech signal, and this is my magical PowerPoint interpretation of what happens to that when you encode it with PCM. We cut off the top frequencies, we cut off the bottom frequencies, and then we cherry pick data from the middle of that signal. What you end up with is something that is much thinner than the previous picture.
By deploying this global architecture, this huge infrastructure, this massive PSTN investment of CapEx, we effectively locked ourselves into 3.4 KHz for everything that touches the PSTN. The PSTN infrastructure and this basic limitation is never going to change; it's never going to be upgraded.
In about 1998, when we got started with Voice over IP, we had a new opportunity. Because we had an open transport layer, we saw the introduction of the first wideband audio, typically 16 KHz sample rate, roughly 8 KHz of audio bandwidth. If you take AMR wideband as an example, you have something like 50 to 7,000 Hz of frequency range.
If you think about the human ear, we are roughly halfway there. This was great, especially on PC applications. By 2003, you had Skype taking wideband to the mainstream. This is because the PC platform was programmable, it was open, and it had the sound interfaces to be able to do this. You see, for the first time, mainstream use of wideband audio.
In Skype 4.0, we introduced something new - super wideband audio, more samples still, more fidelity, and what we think is the next building block for Voice 2.0, which needs attribution to our friend, Alec Saunders, here.
It has rich voice, multi-modal communications, communications that can be supported on a rich and programmable endpoint. Effectively, we went from this over time to this, and now we're going back to this, or in the case of SILK, 50 to 12,000 Hz, the useful frequency range for speech.
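To put rough numbers on "less than a third", "roughly halfway there" and the SILK range, here is a small sketch that compares each band quoted in the talk with the 80 to 14,000 Hz speech range. It measures coverage linearly in Hz, which is a simplifying assumption (hearing is closer to logarithmic), and the names are hypothetical.

```python
# Frequency ranges quoted in the talk, in Hz.
SPEECH_RANGE_HZ = (80, 14_000)

BANDS_HZ = {
    "PSTN narrowband": (400, 3_400),
    "AMR wideband": (50, 7_000),
    "SILK super wideband": (50, 12_000),
}


def overlap_hz(band, reference=SPEECH_RANGE_HZ):
    """Width of the part of `band` that lies inside `reference`."""
    low = max(band[0], reference[0])
    high = min(band[1], reference[1])
    return max(0, high - low)


def coverage(band, reference=SPEECH_RANGE_HZ):
    """Fraction of the reference range covered by `band` (linear in Hz)."""
    return overlap_hz(band, reference) / (reference[1] - reference[0])


def min_sample_rate_hz(band):
    """Nyquist: the sample rate must be at least twice the top frequency."""
    return 2 * band[1]


if __name__ == "__main__":
    for name, band in BANDS_HZ.items():
        print(f"{name}: {coverage(band):.0%} of the speech range, "
              f"needs at least {min_sample_rate_hz(band)} samples per second")
```

On this linear measure the PSTN band covers a little over a fifth of the speech range, traditional wideband about half, and the SILK range well over four fifths.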
I want to do a little demonstration of this sound system in here; you can really appreciate some of this. Let me set it up a little bit. We are going to hear a female voice and a male voice. We are going to hear each one of them encoded back to back, three times, with three different codecs, representing PSTN quality, and traditional wideband quality - what we started with at Skype, and the super wideband - what we're shipping in 4.0. Can you play the audio file?
[audio sample playing]
That was pretty cool, wasn't it? Even in this room, with speakers, it was much ... go ahead; I love it too. [clapping] Let's listen to it one more time. [Laughter]
[audio sample playing]
It's cause for celebration, for sure. Now, we can recognize who is talking in a conference call. We can decipher the difference between those difficult sounds that the PSTN doesn't allow us to decipher, the /f/ sounds and the /s/ sounds. Are we failing or are we sailing this week? It's warmer, richer, and much more like being there. Again, we're making progress.
Especially on the PC side of things, where you have this open platform. You have this programmable platform where you can introduce these things. In the super wideband case, there are some challenges; a lot of sound cards don't support super wideband on the input and the output. I will talk a little bit about how we're addressing that.
But in general, it's a much more accessible platform than some others are. There are still some problems. What if you want to have this experience on an embedded device, on a cordless phone, or on your mobile device? Which codec do you choose? Which transport do you choose?
Even if you can solve all of the transport, open, programmable platform issues, just the basic question about which codec do you use is a daunting one. The space is extremely complicated. I have some friends at Cisco in the room. I think that they coined this idea that "standards - we love them because there are so many to choose from." In this case, it's a pretty complicated mess. There are a lot of tradeoffs between license restrictions, royalties, and quality in this world.
There are some more obvious choices, but no really good ones. It's still really a pain in the neck. You have AMR wideband, which has been standardized in 3GPP. You have Speex and some others. As I say, there is a tradeoff between quality, cost, and complexity of licensing that makes this a very complicated space for developers. We're hoping that with what we've announced today, we can help, that we can bring the ecosystem to a new level of voice quality. So what is the announcement?
We're going to offer this codec that we demo'd to you, a couple of seconds ago, SILK, royalty free, to any third party, for any device or any software application they choose, free. They can use it with or without the Skype network, absolutely free. We think this is state-of-the-art technology and it should be extremely appealing. SILK is now the new default codec for all Skype to Skype calls. Since 4.0 launched, there have been millions of new users using the SILK codec, and making billions of minutes of calls. It's battle tested, for sure.
I want to share a few more things about SILK. It was designed and developed by the Skype audio team. Some of the primary people in that project are here in the room, today. It's scalable, it's a variable bitrate codec, between 6 and 46 Kb. It was designed to be used in embedded applications so it's highly portable, written in ANSI C and it's very lightweight in terms of low bitrate, low delay, low CPU consumption, memory, and total footprint of the codec, as well. It was designed with the Internet in mind so it's resilient to packet loss and jitter.
We have two modes shipping, today, the 16 KHz traditional wideband mode and the super wideband mode. We're also adding a narrow band mode to the codec, which we'll be promoting to gateway vendors, but also for use in those devices that are ultra-low complexity, the devices that have CPU and battery constraints, where you need something very minimal but that still works.
We've done some testing with a third party, called Dynastat. The best way to explain this graph: a mean opinion score is where you get a lot of listeners to listen to the codec as compared to others. In this case, the others are AMR wideband and Speex.
You can see the source at the top, which was the un-encoded file the listeners were listening to. We have it encoded at various bitrates and you can see the red line, SILK, on the top, beating the other codecs at every bitrate.
In this plot, you can see when we add packet loss to the situation, what happens, even further divergence from the other codecs. At 2%, at 5%, and 10% packet loss, SILK is very robust in those situations and preserves voice quality.
One thing I want to mention about SILK is that versus the previous Skype wideband codec, we've achieved a bitrate that is about 50% of that codec's bitrate at its normal wideband setting; it's more efficient on the wire as well. It's quite a technical achievement.
We have many partners we've been working with on this initiative, from the hardware and chipset space; we have people like Digium, the leading provider of open source PBX infrastructure. Tomorrow, Plantronics will be announcing the first Skype-certified super wideband USB headset. It's a headset that has its own soundcard on it. You bypass the issues of a poorly performing soundcard in your laptop or PC, and you get the full super wideband experience. There are a number of others, all leaders in their segments.
There is a little bit from the bloggers in our community. PC Magazine Editor's Choice called SILK audio "class leading." Our friend, Alec Saunders said it was the "biggest improvement in 4.0." I don't know about that, but thank you Alec for the kind words. We'll take it.
With that, I wanted to say thank you. It's a very exciting time for us, at Skype. It's part of a new stance that Skype is taking towards openness and community. We talk about platform at Skype. We believe this is a way the whole industry can come up to a new standard for voice quality, without all of the hassles that have been there before. I guess I can take some questions. It was relatively quick, so I think we have some time.
Audience 1: It's great that it's free. Will Skype indemnify people who use it against the patents that it infringes?
Jonathan: Do I want to take this question, or does somebody from my team want to take it? Does VoiceAge indemnify you when you license AMR wideband? Do they? I'm not so sure about that. I didn't want to throw the first dart, but who indemnifies you when you put Speex? This issue comes up. It's often the first question. Actually, the first question we get from the partners we've approached is "What's the catch?" That's the second, or the first for others. I think there are some misconceptions about indemnity. It's free. If you want to pay us a lot of money to buy you insurance, we'll indemnify you.
Audience 2: What's the chip support story like?
Jonathan: The question was about chipset support? Our approach is that we support x86 on multiple operating system platforms, Linux, Mac, Windows, and we have devices and our software running on ARM and MIPS and all sorts of other platforms.
We'll do the porting and optimization for a certain number of platforms and we'll release those optimized binaries for people. We also are working with the community of experts, some of those people that were on the partner slide, and enabling them to do that porting and optimization, as well. That's part of our plan, as well. We want this to be as broadly adopted as possible, and we want to have as few barriers to that adoption as possible. Today, it's x86, Windows, Mac, and Linux, or very shortly and soon it will be many, many more.
Audience 3: What is in it for Skype? Is it that more people will use Skype as an endpoint?
Jonathan: As we looked at this space and we want to connect to a mobile phone or cordless handset or an enterprise environment where you have a broad deployment of IP endpoints, soft clients, and hard endpoints, we found we couldn't come to agreement on what technology we would use to do that. Really, it was a mess. In some instances, technology has been standardized, which is fifteen or twenty years old, and doesn't do the right thing, technically. In other instances, it was too complex, from a licensing perspective, to get it done. We wanted to tear down that barrier so in the eventuality that we're talking to an enterprise endpoint, cordless handset, or mobile phone, that we're doing it with a common and easy to adopt technology.
Audience 4: If you're enabling interoperability at the media level, does this imply that we can expect to see interoperability at the signaling level, as well, with things like SIP endpoints?
Jonathan: A couple of months ago, we announced, all along the lines of being more open and moving more and more in this direction, we announced a partnership with Digium, where they're using tools that we've given them to create connectivity between their multi protocol environment and our environment. What I would say is a little more than "stay tuned"; this is the direction we're moving in, in general. There will be more of these kinds of deals and access and programs.
Going back to my last statement, what's in it for us; if we're not going to talk to anybody else's network, or anybody else's endpoints, why would we seed the market with this technology? That's a chicken and egg thing that we're working on.
Audience 5: You probably know the question I'm going to ask you. You say Mac, Linux soon; can you expand on that? When will SILK be in the Mac?
Jonathan: That's an easy question. I thought you were going to bring out the baseball bat or something. [Laughs] Mac 2.8 Beta is live, today, and I think that we're on track for an April release of 2.8, where SILK will be in them. My internal builds that I'm running on my desktop already have SILK in them. There isn't a porting or optimization problem there. The same thing, I think, is true of internal builds of Linux.
Lee: Such an announcement. Usually people love to jump in there and give Jonathan a hassle, or at least they did last year. Well, there are two people here so we'll [0:21:35.0 over speak] the mic first.
Audience 6: Could you just update us on the Asterisk related stuff, because there was all that announcement last September, and I'm not sure there has been much follow through at this point, that there is stuff out there.
Jonathan: Sure, we're actively involved in the production of the tools and the environment. We're signing up Beta customers, actively. This is definitely going to mature into something that is an offering. We're on track. It's taken a while, and maybe we got eager with the announcement.
Audience 7: Jonathan, I have one question. Transcoding costs are very often the big issue in selecting codecs as well. Perhaps coding from PCM to any sort of other codec. Can you make a statement on that with SILK, or for SILK?
Jonathan: I'm trying to interpret the question. The question is about transcoding.
Audience 7: It was the costs and how expensive it is to transcode from SILK to PCM or vice versa. This is very often an issue.
Jonathan: The expense is usually related to the complexity of the codec. This is a very low-complexity codec. Where there has to be transcoding, this will have an advantage for people who are trying to keep down costs. The primary goal of releasing this is to avoid transcoding altogether. When we get to the PSTN, that's another issue. As I said, it's never getting upgraded.
Audience 8: Will you be publishing the description of the codec as well as the implementation, so we can rewrite it in some other random language?
Jonathan: We'll be working with partners to make sure they have the tools to get it on as many chipsets and environments, as we want, need, and can do. I want to be more direct than that. I think that we'll be working, selectively, with the partners who are experts in this area. If somebody comes to us with a business case for why the codec needs to be ported or rewritten for a particular environment, then we'll be all ears.
Audience 8: Just for clarity, is this going to follow the open source model, i.e. is it redistributable, or will you retain copyright control and people will need your permission for any changes?
Jonathan: For the broad spectrum of partners, we think they are just going to take a binary and use that binary. They can redistribute the binary. They can put it on whatever device or application they want. It's very open in that sense. For those partners that want to do things that are lower level and more detailed, we'll engage with them, as well, on a one-by-one basis.
Audience 9: I have two things. First, to answer Brough's question, Digium actually did put out a blog post on an update on where they're at with Asterisk, about ten days ago. That gave me some material for a blog post, as well, on "Voice on the Web."
The SILK codec addresses a major endpoint issue, but there are other issues that I know you've been working on, in terms of making it easier for the user experience at the endpoint, and addressing issues such as soundcards, headphones, and so on. What are you generally doing in that area, just to make it dead simple for people to use Skype, but to ensure they aren't inhibited by some of the hardware or configuration issues?
Jonathan: With respect to super wideband, it means moving the ecosystem to the next level, unfortunately. There are some basic limitations with hardware that is inexpensive, poorly implemented, or whatever. We're not going to be able to do much there. We do everything we can, in terms of pre-processing, noise reduction, echo cancellation, and all that kind of stuff, on all those platforms. In terms of super wideband, this is a little bit of a change function.
More generally speaking, we have a big team that is working around the clock, very actively, on automatic device configuration, selection, new UI for example, in 4.0, that minimizes the randomness of device selection and misinterpretation by the user, better "under the covers" technology, algorithms for doing the noise reduction and all of the preprocessing stuff.
On the network side, a lot of work in our team goes into packet loss and jitter concealment, and with 4.0, we released a new bandwidth management algorithm, which more accurately detects when the bandwidth comes and goes, and adjusts the bitrate of the codec accordingly.
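The general idea of adjusting a variable bitrate codec to the available bandwidth can be sketched as below. This is not Skype's algorithm, which the talk does not describe; it is a minimal illustration that assumes a bandwidth estimate is already available, uses the 6 to 46 Kb range quoted earlier for SILK, and invents the `headroom` parameter and function name for the example.

```python
# Bitrate range quoted in the talk for SILK, in bits per second.
MIN_BITRATE_BPS = 6_000
MAX_BITRATE_BPS = 46_000


def target_bitrate_bps(estimated_bandwidth_bps: float, headroom: float = 0.8) -> int:
    """Pick a codec bitrate for the current bandwidth estimate.

    `headroom` (hypothetical) is the share of the estimate given to audio,
    leaving the rest for packet overhead and estimation error. The result
    is clamped to the range the codec supports.
    """
    budget = estimated_bandwidth_bps * headroom
    return int(max(MIN_BITRATE_BPS, min(MAX_BITRATE_BPS, budget)))


if __name__ == "__main__":
    # Hypothetical bandwidth estimates as a connection degrades and recovers.
    for estimate in (100_000, 40_000, 10_000, 1_000, 60_000):
        print(estimate, "->", target_bitrate_bps(estimate))
```

A real implementation would also smooth the estimate over time so the bitrate does not swing on every measurement.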
Audience 10: You talked about 2.8 for Mac and Linux. What about the better video codecs from 3.6 and 4.0, the 640x480 codec and so on, going into those environments?
Jonathan: Those have actually been in relative lock step with Windows, except that 2.8, the new Beta, has a newer video engine than 4.0. One of the big pieces of evidence of that is that Mac is the first platform with that video engine that supports screen sharing. At this point, they sort of leapfrog each other, opportunistically, based on when the release is going out the door and what the audio/video team has done.
Audience 11: [distant]
Jonathan: The environments that we developed for Mac, Linux, and Windows are very different. The Mac environment is native and the look and feel of that client is very different and much more native to the Mac user experience. That has been our strategy until now. That strategy may shift, I guess, but that's how things have gone, so far. In accordance with that, the version numbers have maintained their own roadmap.
Lee: I just wondered why Skype is doing it? I'm amazed that nobody has asked what the ulterior motive is. Am I the only one with a dark side?
Jonathan: There was one question, what's in it for us, I guess. It really is just about this idea that at some point, we want to be able to talk to all of those endpoints. We want to be able to do that in a way that's neither cumbersome for us, from a technology or licensing perspective, nor for our partners. We do get a lot of this "What's the catch? Why are you doing this? Free?" That is especially true since this represents a very significant chunk of R&D.
We are all under an information, content and communications overload. So I realise that any aid which saves people time processing information, that is, extracting what is of interest, is beneficial.
With that in mind @eCommConf will be used to Tweet key statements from the wealth of material gathered during last week's Emerging Communications conference (along with a link to the full source).
It means you get the information filtered, i.e. the main elements Tweeted. You can then use this to decide whether to look at the full source.
So I'd recommend following @eCommConf here on Twitter:

The 2nd Emerging Communications Conference starts tomorrow. Registration is still open.
The programme guide has been produced and can be downloaded in PDF format here.
Below is the welcome message:-
Welcome to eComm 2009.
Our world finds itself at a critical juncture. Both trillions of dollars and the future of human communications including fundamental access to it are at stake.
For telecom operators and media outlets there is not a migratory way from where we are to the future. There is a clear consumer shift underway that runs in the opposite direction to that of telecom and media incumbents; emergent social practice is increasingly clashing with the very structure and desires of incumbent players.
A battle is unfolding which is taking place across three related planes; between industries that were previously clearly demarcated (telecom, cellular, Internet and media); between distributed, peer-to-peer ecosystems enabled by the Internet versus centralized, command-and-control ways of organizing to deliver services and content; and between opportunistic infrastructure versus tolled infrastructure.
Complicating the drastic change is the fact that the most popular means of communication, telephony, is increasingly broken. The experience and underlying paradigms are now at odds with the time and attention of the people who are talking through it. It's approached the point of being unacceptable and bad for the economy. Yet it's the source of nearly 80% of the multi-trillion dollar per year telecom industry. Worse still for carriers, telephony and SMS revenues will peak in most advanced economies within the next five years.
Yet as the telecom kingdom fragments it's leading to more flexible, finer and more dynamic means of assembly that furthers innovation opportunities.
The transformations emerging in global telecommunications and media require big thoughts and big bets. We hope that you find eComm the venue for those thoughts to be shared and heard.
We'd like to think that what happens this week will have reverberations globally.
Glad you've joined us.
eComm takes place March 3-5, 2009 at the San Francisco Airport Marriott; a free shuttle ride from SFO airport. You can register here.