
Lee: We are starting with a keynote from Skype's General Manager of Audio and Video. Again, thank you to Skype for the support. It has really helped this community. Skype is going to make an important announcement during the next half hour. I think the announcement that is being made is significant. Please welcome Skype's General Manager of Audio and Video, Jonathan Christensen.
Jonathan: Good afternoon. It's great to be back at eComm. I'm going to talk about something that is near and dear to me. I'm going to talk about audio codecs. It may seem a little low level or obscure, but it's something that touches us all in our daily lives.
Human speech is kind of made up of two sounds. There is the voiced speech and the unvoiced speech. Voiced speech is these vowel sounds, "ah," and "oo" and they come from our larynx, our voice apparatus.
There is the unvoiced speech, "f-f" and "k-k," these consonant sounds. When you put those sounds together, you have a frequency range of about 80 to 14,000 Hz. This is the wonderful, full fidelity voice that you are hearing from me, right now, in this room.
By some strange coincidence of nature, the human hearing apparatus also can hear about 80 to 14,000 Hz. Until very recently, the speaker and the listener had to be in physical proximity to be able to hear that full range, to hear anything, in fact, in terms of human evolution, anyway.
More recently, with the advent of the telephone, we got telecommunications and speech from afar, speech from a distance. In order to make this thing cost effective, and to reach the mass market with this device, the electronics in the device needed to be limited in such a way that they only allowed us to hear between 400 and 3,400 Hz, an effective bandwidth of 3 KHz. This is less than ideal, less than a third of all that audio you could hear when in the room together, as we are now, without a speech codec involved.
These devices were analog in the beginning and the electronics were end-to-end. This fundamental limitation wasn't a limitation of the wire or the network, because they were connected end-to-end over an analog electronic signal. It was something that had to do with the electronics and the cost control in the actual device.
Theoretically, this could have been upgraded over time to a wideband experience, to a richer, fuller experience. The old switching gear looked something like this; it was manual. You connected a physical circuit between the endpoints, using a mechanical switch, where the circuits preserved that end-to-end electrical current. This didn't scale very well.
In the 1960's and 1970's, there was a very big upgrade, an upgrade to a more scalable and automated system. This is really the introduction of the digital era in voice. With it, they needed a way to efficiently transport the voice and the core of the network on the digital side.
The analog handsets were still out there in the industry, but the core of the network became digital. PCM and G.711 and the packet-based world were invented. This is a very simple scheme, a scheme that samples at 8 KHz; every 125 microseconds there is a sample taken. The sample is 13 or 14 bits. Some super simple math is used to encode that to 8 bits. You have 64 Kb on the wire.
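[Editor's sketch] The arithmetic behind those figures, and the idea of squeezing a wide linear sample into 8 bits, can be illustrated roughly as follows. This is a minimal sketch rather than a bit-exact G.711 encoder: it uses the continuous mu-law companding curve with mu = 255 (the talk does not say whether mu-law or A-law is meant), and all function names are made up for illustration.

```python
import math

SAMPLE_RATE_HZ = 8000   # PCM telephony sampling rate from the talk
BITS_PER_SAMPLE = 8     # bits per sample after companding
MU = 255                # assumption: mu-law variant, mu = 255


def sample_interval_us(rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time between samples, in microseconds."""
    return 1_000_000 / rate_hz


def bitrate_bps(rate_hz: int = SAMPLE_RATE_HZ, bits: int = BITS_PER_SAMPLE) -> int:
    """Bits per second on the wire for one voice channel."""
    return rate_hz * bits


def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Quantize the companded value to one of 256 codes (0..255)."""
    return round((mu_law_compress(x) + 1.0) / 2.0 * 255)


def decode_8bit(code: int) -> float:
    """Recover an approximate linear sample from an 8-bit code."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)


if __name__ == "__main__":
    print(sample_interval_us())  # 125.0 microseconds between samples
    print(bitrate_bps())         # 64000 bits per second
    quiet = 0.01                 # a quiet sample, 1% of full scale
    print(quiet, decode_8bit(encode_8bit(quiet)))
```

The logarithmic curve spends more of the 256 codes on quiet samples than a uniform 8-bit quantizer would, which is why a 13- or 14-bit sample survives the trip through 8 bits reasonably well.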
As an interesting aside, do you ever wonder why the theoretical limitation over the phone system was 56K for data while you had the 64Kb audio channel and you were shoving as much data over it as possible? That audio channel was end-to-end; it was universal within the PSTN, as a part of that upgrade to those switches. In this world, you have a theoretical limitation of 3.4 KHz of bandwidth.
This is a picture of an audio signal, a speech signal, and this is my magical PowerPoint interpretation of what happens to that when you encode it with PCM. We cut off the top frequencies, we cut off the bottom frequencies, and then we cherry pick data from the middle of that signal. What you end up with is something that is much thinner than the previous picture.
By deploying this global architecture, this huge infrastructure, this massive PSTN investment of CapEx, we effectively locked ourselves into 3.4 KHz for everything that touches the PSTN. The PSTN infrastructure and this basic limitation is never going to change; it's never going to be upgraded.
In about 1998, when we got started with Voice over IP, we had a new opportunity. Because we had an open transport layer, we saw the introduction of the first wideband audio, typically 16 KHz sample rate, roughly 8 KHz of audio bandwidth. If you take AMR wideband as an example, you have something like 50 to 7,000 Hz of frequency range.
If you think about the human ear, we are roughly halfway there. This was great, especially on PC applications. By 2003, you had Skype taking wideband to the mainstream. This is because the PC platform was programmable, it was open, and it had the sound interfaces to be able to do this. You see, for the first time, mainstream use of wideband audio.
In Skype 4.0, we introduced something new - super wideband audio, more samples still, more fidelity, and what we think is the next building block for Voice 2.0, which needs attribution to our friend, Alec Saunders, here.
It has rich voice, multi-modal communications, communications that can be supported on a rich and programmable endpoint. Effectively, we went from this over time to this, and now we're going back to this, or in the case of SILK, 50 to 12,000 Hz, the useful frequency range for speech.
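[Editor's sketch] The frequency ranges quoted so far can be compared with a few lines of arithmetic. The sketch below assumes the 80 to 14,000 Hz speech range from earlier in the talk as the reference, and measures coverage on a plain linear Hz scale, which is only a rough proxy since hearing is not linear in frequency. The names are illustrative, not from any Skype tool.

```python
SPEECH_RANGE_HZ = (80, 14_000)  # speech/hearing range quoted in the talk

# Frequency ranges quoted in the talk for each generation of voice.
BANDS_HZ = {
    "PSTN narrowband": (400, 3_400),
    "AMR wideband": (50, 7_000),
    "SILK super wideband": (50, 12_000),
}


def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def coverage(band, reference=SPEECH_RANGE_HZ) -> float:
    """Fraction of the reference range (linear Hz) that a band overlaps."""
    low = max(band[0], reference[0])
    high = min(band[1], reference[1])
    return max(0, high - low) / (reference[1] - reference[0])


if __name__ == "__main__":
    print(nyquist_bandwidth_hz(16_000))  # 8000.0 Hz for a 16 kHz sample rate
    for name, band in BANDS_HZ.items():
        print(f"{name}: {coverage(band):.1%} of the speech range")
```

On this crude measure the PSTN band covers a bit over a fifth of the range, AMR wideband about half, and the SILK range most of it, which lines up with "less than a third" and "roughly halfway there" in the talk.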
I want to do a little demonstration of this sound system in here; you can really appreciate some of this. Let me set it up a little bit. We are going to hear a female voice and a male voice. We are going to hear each one of them encoded back to back, three times, with three different codecs, representing PSTN quality, and traditional wideband quality - what we started with at Skype, and the super wideband - what we're shipping in 4.0. Can you play the audio file?
[audio sample playing]
That was pretty cool, wasn't it? Even in this room, with speakers, it was much ... go ahead; I love it too. [clapping] Let's listen to it one more time. [Laughter]
[audio sample playing]
It's cause for celebration, for sure. Now, we can recognize who is talking in a conference call. We can decipher the difference between those difficult sounds that the PSTN doesn't allow us to decipher, the /f/ sounds and the /s/ sounds. Are we failing or are we sailing this week? It's warmer, richer, and much more like being there. Again, we're making progress.
Especially on the PC side of things, where you have this open platform. You have this programmable platform where you can introduce these things. In the super wideband case, there are some challenges; a lot of sound cards don't support super wideband on the input and the output. I will talk a little bit about how we're addressing that.
But in general, it's a much more accessible platform than some others are. There are still some problems. What if you want to have this experience on an embedded device, on a cordless phone, or on your mobile device? Which codec do you choose? Which transport do you choose?
Even if you can solve all of the transport, open, programmable platform issues, just the basic question about which codec do you use is a daunting one. The space is extremely complicated. I have some friends at Cisco in the room. I think that they coined this idea that "standards - we love them because there are so many to choose from." In this case, it's a pretty complicated mess. There are a lot of tradeoffs between license restrictions, royalties, and quality in this world.
There are some more obvious choices, but no really good ones. It's still really a pain in the neck. You have AMR wideband, which has been standardized in 3GPP. You have Speex and some others. As I say, a tradeoff between quality, cost, and complexity of licensing that makes this a very complicated space for developers. We're hoping that with what we've announced today, that we can help, that we can bring the ecosystem to a new level, with voice quality. So what is the announcement?
We're going to offer this codec that we demo'd to you, a couple of seconds ago, SILK, royalty free, to any third party, for any device or any software application they choose, free. They can use it with or without the Skype network, absolutely free. We think this is state-of-the-art technology and it should be extremely appealing. SILK is now the new default codec for all Skype to Skype calls. Since 4.0 launched, there have been millions of new users using the SILK codec, and making billions of minutes of calls. It's battle tested, for sure.
I want to share a few more things about SILK. It was designed and developed by the Skype audio team. Some of the primary people in that project are here in the room, today. It's scalable, it's a variable bitrate codec, between 6 and 46 Kb. It was designed to be used in embedded applications so it's highly portable, written in ANSI C and it's very lightweight in terms of low bitrate, low delay, low CPU consumption, memory, and total footprint of the codec, as well. It was designed with the Internet in mind so it's resilient to packet loss and jitter.
We have two modes shipping, today, the 16 KHz traditional wideband mode and the super wideband mode. We're also adding a narrowband mode to the codec, which we'll be promoting to gateway vendors, but also for use in those devices that are ultra-low complexity, the devices that have CPU and battery constraints, where you need something very minimal but that still works.
We've done some testing with a third party, called Dynastat. The best way to explain this graph: a mean opinion score is where you get a lot of listeners to listen to the codec and rate it as compared to others. In this case, the others are AMR wideband and Speex.
You can see the source at the top, which was the un-encoded file the listeners were listening to. We have it encoded at various bitrates and you can see the red line, SILK, on the top, beating the other codecs at every bitrate.
In this plot, you can see when we add packet loss to the situation, what happens, even further divergence from the other codecs. At 2%, at 5%, and 10% packet loss, SILK is very robust in those situations and preserves voice quality.
One thing I want to mention about SILK is that, versus the previous Skype wideband codec, we've achieved a bitrate that is about 50% of the normal wideband setting of the previous codec; it's more efficient on the wire as well. It's quite a technical achievement.
We have many partners we've been working with on this initiative, from the hardware and chipset space; we have people like Digium, the leading provider of open source PBX infrastructure. Tomorrow, Plantronics will be announcing the first Skype-certified super wideband USB headset. It's a headset that has its own soundcard on it. You bypass the issues of a poorly performing soundcard in your laptop or PC, and you get the full super wideband experience. There are a number of others, all leaders in their segments.
There is a little bit from the bloggers in our community. PC Magazine Editor's Choice called SILK audio "class leading." Our friend, Alec Saunders said it was the "biggest improvement in 4.0." I don't know about that, but thank you Alec for the kind words. We'll take it.
With that, I wanted to say thank you. It's a very exciting time for us, at Skype. It's part of a new stance that Skype is taking towards openness and community. We talk about platform at Skype. We believe this is a way the whole industry can come up to a new standard for voice quality, without all of the hassles that have been there before. I guess I can take some questions. It was relatively quick, so I think we have some time.
Audience 1: It's great that it's free. Will Skype indemnify people who use it against the patents that it infringes?
Jonathan: Do I want to take this question, or does somebody from my team want to take it? Does VoiceAge indemnify you when you license AMR wideband? Do they? I'm not so sure about that. I didn't want to throw the first dart, but who indemnifies you when you put Speex? This issue comes up. It's often the first question. Actually, the first question we get from the partners we've approached is "What's the catch?" That's the second, or the first for others. I think there are some misconceptions about indemnity. It's free. If you want to pay us a lot of money to buy you insurance, we'll indemnify you.
Audience 2: What's the chip support story like?
Jonathan: The question was about chipset support? Our approach is that we support x86 on multiple operating system platforms, Linux, Mac, Windows, and we have devices and our software running on ARM and MIPS and all sorts of other platforms.
We'll do the porting and optimization for a certain number of platforms and we'll release those optimized binaries for people. We also are working with the community of experts, some of those people that were on the partner slide, and enabling them to do that porting and optimization, as well. That's part of our plan. We want this to be as broadly adopted as possible, and we want to have as few barriers to that adoption as possible. Today, it's x86 on Windows, Mac, and Linux; very shortly, it will be many, many more.
Audience 3: What is in it for Skype? Is it that more people will use Skype as an endpoint?
Jonathan: As we looked at this space and we want to connect to a mobile phone or cordless handset or an enterprise environment where you have a broad deployment of IP endpoints, soft clients, and hard endpoints, we found we couldn't come to agreement on what technology we would use to do that. Really, it was a mess. In some instances, technology has been standardized, which is fifteen or twenty years old, and doesn't do the right thing, technically. In other instances, it was too complex, from a licensing perspective, to get it done. We wanted to tear down that barrier so in the eventuality that we're talking to an enterprise endpoint, cordless handset, or mobile phone, that we're doing it with a common and easy to adopt technology.
Audience 4: If you're enabling interoperability at the media level, does this imply that we can expect to see interoperability at the signaling level, as well, with things like SIP endpoints?
Jonathan: A couple of months ago, along the lines of being more open and moving more and more in this direction, we announced a partnership with Digium, where they're using tools that we've given them to create connectivity between their multi-protocol environment and our environment. What I would say is a little more than "stay tuned"; this is the direction we're moving in, in general. There will be more of these kinds of deals and access and programs.
Going back to my last statement, what's in it for us; if we're not going to talk to anybody else's network, or anybody else's endpoints, why would we seed the market with this technology? That's a chicken and egg thing that we're working on.
Audience 5: You probably know the question I'm going to ask you. You say Mac, Linux soon; can you expand on that? When will SILK be in the Mac?
Jonathan: That's an easy question. I thought you were going to bring out the baseball bat or something. [Laughs] Mac 2.8 Beta is live, today, and I think that we're on track for an April release of 2.8, where SILK will be in them. My internal builds that I'm running on my desktop already have SILK in them. There isn't a porting or optimization problem there. The same thing, I think, is true of internal builds of Linux.
Lee: Such an announcement. Usually people love to jump in there and give Jonathan a hassle, or at least they did last year. Well, there are two people here so we'll [crosstalk] the mic first.
Audience 6: Could you just update us on the Asterisk related stuff, because there was all that announcement last September, and I'm not sure there has been much follow through at this point, that there is stuff out there.
Jonathan: Sure, we're actively involved in the production of the tools and the environment. We're signing up Beta customers, actively. This is definitely going to mature into something that is an offering. We're on track. It's taken a while, and maybe we got eager with the announcement.
Audience 7: Jonathan, I have one question. Transcoding costs are very often the big issue in selecting codecs as well. Perhaps coding from PCM to any sort of other codec. Can you make a statement on that with SILK, or for SILK?
Jonathan: I'm trying to interpret the question. The question is about transcoding.
Audience 7: It was the costs and how expensive it is to transcode from SILK to PCM or vice versa. This is very often an issue.
Jonathan: The expense is usually related to the complexity of the codec. This is a very low-complexity codec. Where there has to be transcoding, this will have an advantage for people who are trying to keep down costs. The primary goal of releasing this is to avoid transcoding altogether. When we get to the PSTN, that's another issue. As I said, it's never getting upgraded.
Audience 8: Will you be publishing the description of the codec as well as the implementation, so we can rewrite it in some other random language?
Jonathan: We'll be working with partners to make sure they have the tools to get it on as many chipsets and environments, as we want, need, and can do. I want to be more direct than that. I think that we'll be working, selectively, with the partners who are experts in this area. If somebody comes to us with a business case for why the codec needs to be ported or rewritten for a particular environment, then we'll be all ears.
Audience 8: Just for clarity, is this going to follow the open source model, i.e. is it redistributable, or will you retain copyright control and people will need your permission for any changes?
Jonathan: For the broad spectrum of partners, we think they are just going to take a binary and use that binary. They can redistribute the binary. They can put it on whatever device or application they want. It's very open in that sense. For those partners that want to do things that are lower level and more detailed, we'll engage with them, as well, on a one-by-one basis.
Audience 9: I have two things. First, to answer Brough's question, Digium actually did put out a blog post on an update on where they're at with Asterisk, about ten days ago. That gave me some material for a blog post, as well, on "Voice on the Web."
The SILK codec addresses a major endpoint issue, but there are other issues that I know you've been working on, in terms of making it easier for the user experience at the endpoint, and addressing issues such as soundcards, headphones, and so on. What are you generally doing in that area, just to make it dead simple for people to use Skype, but to ensure they aren't inhibited by some of the hardware or configuration issues?
Jonathan: With respect to super wideband, it means moving the ecosystem to the next level, unfortunately. There are some basic limitations with hardware that is inexpensive, poorly implemented, or whatever. We're not going to be able to do much there. We do everything we can, in terms of pre-processing, noise reduction, echo cancellation, and all that kind of stuff, on all those platforms. In terms of super wideband, this is a little bit of a change function.
More generally speaking, we have a big team that is working around the clock, very actively, on automatic device configuration, selection, new UI for example, in 4.0, that minimizes the randomness of device selection and misinterpretation by the user, better "under the covers" technology, algorithms for doing the noise reduction and all of the preprocessing stuff.
On the network side, a lot of work in our team goes into packet loss and jitter concealment, and with 4.0, we released a new bandwidth management algorithm, which more accurately detects when the bandwidth comes and goes, and adjusts the bitrate of the codec accordingly.
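[Editor's sketch] The talk does not describe how Skype's bandwidth management actually works, so the following is only a toy illustration of the general idea of steering a variable-bitrate codec from a bandwidth estimate. The smoothing factor, the headroom factor, and every function name are assumptions; the 6 to 46 Kb limits are the range quoted earlier in the talk.

```python
MIN_KBPS = 6.0   # lower end of the bitrate range quoted in the talk
MAX_KBPS = 46.0  # upper end of the bitrate range quoted in the talk


def smooth_estimate(previous_kbps: float, measured_kbps: float,
                    alpha: float = 0.25) -> float:
    """Exponentially smooth the bandwidth estimate (alpha is an assumption)."""
    return (1 - alpha) * previous_kbps + alpha * measured_kbps


def choose_bitrate_kbps(estimated_bandwidth_kbps: float,
                        headroom: float = 0.8) -> float:
    """Pick a codec bitrate below the estimate, clamped to the codec's range.

    Hypothetical policy: use a fixed fraction of the estimated bandwidth so
    that packet headers and other traffic still fit.
    """
    target = estimated_bandwidth_kbps * headroom
    return max(MIN_KBPS, min(MAX_KBPS, target))


if __name__ == "__main__":
    estimate = 40.0
    # Made-up measurements of a link whose bandwidth drops and recovers.
    for measured in (40.0, 20.0, 10.0, 10.0, 60.0, 60.0):
        estimate = smooth_estimate(estimate, measured)
        print(f"measured {measured:5.1f}  estimate {estimate:5.1f}  "
              f"codec {choose_bitrate_kbps(estimate):5.1f} kbps")
```

A real controller would also react to packet loss and jitter rather than a single bandwidth number; the point here is only the clamp-and-track shape of the problem.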
Audience 10: You talked about 2.8 for Mac and Linux. What about the better video codecs from 3.6 and 4.0, the 640x480 codec and so on, going into those environments?
Jonathan: Those have actually been in relative lock step with Windows, except that 2.8, the new Beta, has a newer video engine than 4.0. One of the big pieces of evidence of that is that Mac is the first platform with that video engine that supports screen sharing. At this point, they sort of leapfrog each other, opportunistically, based on when the release is going out the door and what the audio/video team has done.
Audience 11: [distant]
Jonathan: The environments that we developed for Mac, Linux, and Windows are very different. The Mac environment is native and the look and feel of that client is very different and much more native to the Mac user experience. That has been our strategy until now. That strategy may shift, I guess, but that's how things have gone, so far. In accordance with that, the version numbers have maintained their own roadmap.
Lee: I just wondered why Skype is doing it? I'm amazed that nobody has asked what the ulterior motive is. Am I the only one with a dark side?
Jonathan: There was one question, what's in it for us, I guess. It really is just about this idea that at some point, we want to be able to talk to all of those endpoints. We want to be able to do that in a way that's neither cumbersome for us, from a technology or licensing perspective, nor for our partners. We do get a lot of this "What's the catch? Why are you doing this? Free?" That is especially true since this represents a very significant chunk of R&D.



