Hypervoice Interview: The "Google Moment" For Voice


As I start out once again on the Emerging Communications (eComm) path, this time looking towards building a 2013 event, I'm very pleased to start out with a superb interview about a new term and technology known as Hypervoice.

During this interview you will learn what Hypervoice is; the emergence of a Hypervoice consortium; a potentially vast field of opportunity if the consortium is a success; and our shared realization that Hypervoice is another "Google moment", but this time, for voice.

Most notably during the interview, Martin stated:

The highest aspiration any pre-existing voice service I've ever seen had was to be as good as being there in person and at no cost. What I think Hypervoice does is it takes it beyond that. It can actually be better than being there in person.

and:

So all the things that Hypertext has done for us in the last 20 years, that's what we'll experience with voice in the next 20. It's going to transform how we work.

This interview came about because in May 2011, I had the distinct pleasure of being introduced to Kelly Fitzsimmons.

She had ambled on somewhat casually about her company's forthcoming product, Symposia.

After some time questioning her about the technology and ideas behind the product, it struck me like a thunderbolt that the concept should be taken globally and that it was on a par with hypertext; it was in many ways the voice equivalent of hypertext.

I was very excited, so I asked Kelly to provide one of the keynotes at Emerging Communications (eComm) 2011 (video). Her husband also launched the company's Symposia product at the event (video).

However, for whatever reason (I did not ask), Kelly did not cover exactly what had excited me, as I'd anticipated. I then introduced Kelly to my good friend Martin Geddes, as he was about to run a Future of Voice workshop at the time.

Then earlier this year Martin told me that Kelly had asked him to write a whitepaper on the topic and that he'd coined the term Hypervoice to describe it.

A couple of weeks ago Martin asked if I could help to get people on a free virtual Hypervoice event due to take place on 12th Dec 2012 (registration). I agreed to help and suggested I interview both Kelly and Martin. This post is a result of that request.

Below is the interview transcript. I added a range of headings as well as bold/italic markup afterwards to make consumption easier because the interview extends to nearly 10,000 words.

You can download the audio of the interview (21meg MP3, run time is 01:06:36).

Greetings & Introductions

Lee: Hello Martin and Kelly.

Kelly: Hello Lee.

Lee: Martin, how did you know to let Kelly go first, or is that just natural to let ladies go first?

Martin: It's latency from this side of the world.

Lee: I thought it was modern-day chivalry. I hope the pair of you are both doing well. On today's interview we're going to be talking about Hypervoice. Let me jump in straight away dynamically here and ask where the term "Hypervoice" came from, Kelly?

'Hypervoice' Term Origin

Kelly: I have to give credit right back to Martin. Martin was the genius behind the concept and it's a word I'd been looking for at that time, for probably 14 years. When he had his eureka moment, thankfully I was nearby, and we could celebrate together with some chocolates. I'll let Martin talk about where it came from because it's really based on his career, his insights, and I'm just the active evangelist. Martin, why don't you take it?

Martin: It would be lovely to believe that these little insights come smoothly and cleanly and innocently, but they don't. So HarQen had hired me to write a whitepaper, helping to explain the concepts behind the wonderful product that was Symposia (video of Symposia being launched at Emerging Communications 2011). One was always trying to explain it in terms of something else, because we always have to use metaphors: it's like a conference-calling system, but it's not; it's like webinars, but it's not.

I started writing this whitepaper, and I meandered all the way around the territory to find the core issue. I put together a bunch of thoughts that were interesting and relevant but hadn't really got to the point.

In the meantime I'd been reading some very interesting books. One which I think is utterly brilliant, but very hard to make sense of, is by Venkatesh Rao. It's all about tempo; it's all about how we're blind to the temporal as humans. We're really good at the spatial but utterly hopeless at reasoning about time. He produced a book which brings together all these thoughts about time and timing and how they affect decision making, etc.

The one takeaway was that by putting that lens in front of yourself, you see the temporal rather than the spatial. You see the world in a very interesting, special way. I got really used to the idea of trying to flip spatial/temporal metaphors and seeing these patterns in all kinds of things in life.

An example might be a factory: in the spatial world, seeing lots of inventory piled up in a factory meant men were working hard. That's great. Whereas in a lean manufacturing plant, seeing inventory piled up is terrible because it isn't moving. It's not achieving the timing outcomes required.

So I was in a meeting with Kelly and Ann, we were talking about the product and whitepaper and communications challenges they were having. Someone said the magic phrase, "Link what you say to what you do." It wasn't me. And then suddenly all the bits hit me, "Ah! Links, links, hypermedia, links, arghh, it's Hypervoice." At that moment it all gelled. Yeah it's Hypervoice, switch temporal for spatial links and you're there. Right, done.

Lee: Going back, I remember in May last year, May 2011, we were talking, Kelly. I said, "Hey Kelly, this is the voice equivalent of Hypertext," and you agreed. The key was it was linkable. We didn't mention any of that temporal stuff back then. Do you remember that, Kelly?

Kelly: I do. We were still trying to get our arms around it and I think Martin really put it well. One of the things is that when you bring a product to market, you're trying to figure out what the use case is, why this is valuable to anybody. It took us a long time to figure out the temporal component of it.

Lee: I think we're jumping ahead. What I wanted to establish is so we spoke in May last year and then you presented a product at Emerging Communications 2011, but we were still having a lot of difficulty communicating about the product. Then in order to help with this, it's my understanding you've engaged Martin to do a whitepaper, which anybody can download, is that correct?

Kelly: Yeah.

Lee: I know Martin came up with this nice term, Hypervoice, based on that engagement. So this is now the term we're going to use to sort of light the fire and lead the way. So that's really great. This whitepaper, why was it commissioned?

Kelly: The real key reason was we couldn't figure out how to talk about ourselves. The early piece of struggle that you heard back in May last year is we spent a lot of time trying to explain what it is that we do. There was a wonderful movie years ago called The Hudsucker Proxy. I don't know if you've ever seen it. But in it, the protagonist is trying to explain to the world that he's invented the hula hoop. He keeps drawing a circle on a piece of paper and pointing to it and saying, "It's for kids."

And people look at the paper and look at him and think he's nuts. And I feel like for the longest time, trying to explain what HarQen did, we were drawing a circle on a piece of paper and pointing to it, and people are looking at us oddly. This was a very - it started out as a very important initiative from just a sales and marketing standpoint. What ended up evolving out of it was much bigger.

Future of Telephony

Because once we got our head around it, it transcended HarQen very quickly. It was kind of like what we were doing was the Pong version of this. But this is really the future of telephony. As we started looking at the implications of what does it mean for voice to be a native web object, what does it mean for voice to be within your activity stream for social media, a linkable, shareable object, searchable, findable?

First Mention Of Hypervoice Consortium

All of a sudden lights started going off, and Martin and I got this giddy feeling. We've got a ton of emails that went back and forth over a course of several months in which it kept getting bigger and bigger and bigger. We ultimately came to the conclusion that this is a conversation that needs to be had with anyone who is involved in telecommunications and/or voice on the Internet. And it couldn't just be HarQen and friends. That was what led to the founding of the consortium.

Martin: There is an extra insight as well, actually. I think that not only is this thing Hypervoice, but I'm jumping up and down now saying this is the origin of the universe here. This is where you need to stand at this point to make sense of everything else around you. It's not merely an interesting marketing term; this is part of a longer narrative.

Another aha for me was that Kelly in the meeting said, "Oh, it's like Web 3.0," and that made me stop and think about what really was Web 2.0? My thought in that revelatory moment was recognizing that Web 2.0 was really hyper-messaging: unlike email, faxes, telexes, SMS, in the Web 2.0 revolution we gave messages URLs. The moment you give something a URL, it can be fed into our great idea amplification machine of the web, and has that transformative power on our intellectual assets. It's not necessarily always going to be called Hypervoice, but Kelly's right; it is actually Web 3.0. It's an equivalent transformation to what Web 2.0 was. Social media on the web has been a big deal to society and the economy. This is as big a deal.

Lee: I hear you and it's important you say that. I hadn't quite intended to get there yet. What I was really trying to do was get the chronology in place, in order to understand where we came from and where we're at now, because for me I was a little confused. In May last year, if you remember Kelly, you and I got really excited because I said this could become a standard. We spoke about having different parsers for voice. We spoke about it not just being a little product, but being a web standard.

What Happened Since Emerging Communications (eComm) Conference 2011?

You and I were extremely excited, and this was May last year. Then this is rolling into December 2012. So what's happened since May last year? You did a presentation at eComm the following month. Now you've just - it seems quite a bit of a time lag there. I'm not going to deny that. It's just really important for me to have the chronology there. Am I misunderstanding this?

Kelly: Yeah, you're not missing anything at all. I think part of it is the nature of a startup and leading a startup. When I was brainstorming with you about it, I absolutely agree this is potentially a standard. But at the same time, we had to get a product out to market and had investors on us.

I had to double down and really focus on getting Symposia to be real, as opposed to really good demoware. In the process of doing so, you put your blinders on. You just get down and you start focusing on let's get this product to market, let's figure out how to make it work, let's test it. We're probably towards the overkill version of testing to make sure the application would work because we've got very large clients. When you say enterprise software, it means something to us, specifically.

That's what I did. I had to get the day job done. It was really still focusing on the problem that our partners were not getting it. We'd go into our partners and do that circle-on-a-white-sheet-of-paper routine, and that was when I realized we were missing all the language.

What we were able to do back when we had our conversation was we were able to make some leaps that I couldn't make with the general public. You intuitively got it very quickly, Lee, and the rest of the world required a lot more steps between it. You saw a hula hoop where the rest of the world saw a piece of paper with a circle on it.

Lee: I was super excited in May of last year. We need to keep the answers shorter by the way, because we're not going to cover what we need to cover. I know where I want to go with this. You obviously had to spend time on a product. That delayed things. Now in terms of messaging, and getting something out there, this whitepaper has been put together. That's as much as I understand now, as to how things have evolved since last year.

What I'd like to do now I've almost got where we're at, is find out exactly where we are at this point. I need shorter answers on this. There's been a whitepaper and now you mentioned the word consortium. Could somebody please tell me about this consortium? We're going to cover these topics about consortium, etc. and standards, because I assume a standard is involved.

Then we'll look at what Hypervoice is. Let's get a snapshot of exactly now what is this consortium.

Consortium Charter & Structure

Martin: How to describe its charter. There's a huge amount of excitement around the potential of this technology, as well as a huge amount of uncertainty. There's a field of opportunity that could grow very rapidly, very quickly. Therefore, for that to happen, the ideas need to become legitimized and understood.

The core issue is helping to raise awareness of the term, the technology, and the concepts so that it can spread fast.

Lee: What is the structure of the consortium?

Kelly: Right now it's a membership organization made up of a series of founding companies. HarQen stepped forward. So did Telefonica, Voxeo, and Joyent so far, to help put together the initial group. The first event that we'll be having of this broader group will be a virtual conference we're hosting on December 12th, at 12 p.m. PST. The goal is to start introducing the concept of Hypervoice to a wider group.

We had an initial workshop, a very small group of just 25 thought leaders we brought together out in San Francisco in October, and were able to do some demonstrations of Hypervoice. Andy Kershaw from Oracle Social Network did a demonstration of a Hypervoice-enabled application within OSN or Oracle Social Network, which was really impressive.

That I think gave people a vision of wow, this is real, this is something that can be done today. It's not airy-fairy, sort of future-oriented. It's got some legs to it. That's where the consortium came out of: an idea of let's get together and start talking about how do we define applications, capabilities, future standards; what do we mean by Hypervoice. We don't want it to become an airy-fairy term like so many other things: cloud, or mobility. These are very generic terms that don't mean anything. I think Hypervoice is in a unique position to have some real meat to it. That's what the consortium's goal is.

Lee: Is there a link to the consortium yet? Is there a web address for it?

Kelly: We're hypervoice.org.

Lee: I think I know where we're at except just before we had begun the interview, earlier on you had mentioned, Kelly, becoming an evangelist. This evangelism, was this for the consortium, or how was it connected to the consortium? At the moment you're CEO of HarQen, so you mentioned making a change.

Kelly In Evangelist Role

Kelly: I'm not joking. I think I caught a lot of people by surprise on this one. At the end of this year, I'll be transitioning out of my role as CEO of HarQen and into the role of co-founder with Martin of the Hypervoice consortium. I see Martin as the key spokesperson for it and myself as sort of the active evangelist. It plays to one of the things that I think is really important, which is in the early days of something like this it's really a movement. You need to get people excited about the possibilities.

I was worried about trying to do this with the CEO hat. First off, I don't have time to run a startup and to do this. The second piece is we need to work with competitors. It's got to be a true consortium. It can't just be like I said earlier "HarQen and friends". That's not the intention at all here. We might have an early application of it, but it's not going to be the last application of it, or else Hypervoice is not very interesting. This gives me a chance to help get this movement under way, and lend my skills as a serial tech entrepreneur to something that I care very deeply about, which is the future of voice.

Lee: Thank you. I feel very much up to date as to what's happened between May last year and today. I really appreciate that catch up and I think it's beneficial to a wider audience to know that.

Technical Introduction

Now if we can actually begin talking about the technology. We covered how the term came to be, etc. Now let's look at technology for a while. Let's get technical here. I've just opened a PDF of the whitepaper you did, Martin, and I see "Hypervoice, where voice on the web is as native and natural as Hypertext." So the first question there is: are you looking to create the voice equivalent of Hypertext? [laughter]

Martin: I think it's a bit varied in that we're using a metaphor here which is hypermedia. But it expresses itself in different ways for different kinds of media. They all have very unique properties. It strikes me that Hypervoice is a way of understanding and reasoning about a particular domain, but we must be wary of this concept.

Is it like Hypertext? In one way, yes. You have an anchor piece of media, which in hypertext-based documents manages relative space, and in Hypervoice there is a piece of audio media that manages relative time. From that you have other things that are tied to it; there are links. But it's also different from Hypertext in that the links for spatial Hypertext all follow a document-to-document linking metaphor.

Whereas a Hypervoice conversation tries to link what people say to what they do. They are joined by happening at the same time, or in close proximity in time. There are also other structures that are different from hypertext.

Linking What You Do To What You Say

Lee: You touched upon two things which would definitely need expanding. What do you mean by linking what they say to what they do? Imagine you're not speaking to me, but to a lay person.

Martin: During any typical conference call you're talking to the person, you're typing some notes, you're opening web pages, you're showing a presentation deck, there are all kinds of digital interactions you're having at the same time. At the moment, those are all done in our little silos and they aren't related to each other. We lose the correlation between those things.

Lee: Correlation between what?

Martin: Between what I'm saying and what I'm doing. We don't record the audio, and if we did record the audio, it's completely disjoined from the actions I'm taking, like showing presentation slides or typing some notes.

Lee: All you're doing is saying you're embedding voice.

Martin: Right, we're enabling ourselves to record voice and relate it to what you actually did in the moment. The reason for that is that it's a bit like Google search: the things you do give a particular meaning to those moments. On a web page, if I have the word "Volkswagen" and I put a link around it that points to another page, I'm trying to say that other page is probably something about Volkswagens. If I'm having a conversation now with you, and I type "difficult question" in my notes, that's probably trying to say something about this moment in the conversation, a difficult question. Or it could be about Volkswagens. The notes I'm taking are really the things that give significance in the record to what I'm saying.

Lee: Okay, I must admit for me it sounds a little too much on the abstract side. I must admit maybe I'm dumb today but I haven't got this linking to what you're doing and saying.

Kelly: Maybe I can try a use case.

Lee: Please.

Sales Call Example

Kelly: Let's try a sales call. Here's a great example. Today what we do is log into the CRM, look at the notes, grab the contact information, and then what do we do? We pick up the phone and we make the phone call. There are starting to be dialers within Salesforce and different things, but the vast majority of sales people, what do they do? They start in the system of record and then they go off the system of record to make this phone call. Nothing they do, other than maybe keeping some spotty notes, is retained. They keep the notes and throw away the conversation.

What Hypervoice enables you to do is stay in the system of record. You stay in the CRM. You click to call that person. The system is smart. The system knows you just called that person. It logs the call for you so that you don't have to type anything. As most people know with CRM systems, it's garbage in/garbage out. If the salesperson forgets to log the call, it's simply unknown.

This is a case where now the system knows you made the call because you clicked to call within the application. The next piece is you're having the conversation. You say, "Wow, this is really important stuff. I need to get a couple other people into this conversation."

Today we'd schedule another call with those important people. With Hypervoice, what it enables you to do is say, "Hold on a second. Would it be okay if I recorded this part of the conversation, so that I can share it with some key stakeholders at home? It will allow me to get you a proposal faster because I won't have to have a second meeting. Would you be opposed to me capturing this part of the conversation so that we can get to your proposal faster?"

Lee: I understand now. Just so you know, again this helps - the interview's useful for getting this stuff out to the layperson. I think that just complicated things before. All you're saying is voice becomes a native digital object that you can fragment. It's a native digital object in a database of sorts.

Kelly: It's important that you don't think that it's voice-to-text. A lot of folks go immediately to that. They're like, "Oh, it's just voice-to-text," and it's not. Searchable voice is very different from voice-to-text. What Hypervoice allows you to do is do it in all languages.

Your Gestures Index The Voice

Martin: It's that all the things you're doing are actually building the index to the recorded voice. The customer records you touch, the trouble tickets you open, the notes you take are the index to that recorded voice.
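To make Martin's point a little more concrete, here is a rough sketch of the idea (my own illustration, not HarQen's implementation; the class names, URL and fields are invented), showing how timestamped gestures, rather than the audio itself, become the searchable index:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Gesture:
    """Something you did while talking: a note typed, a CRM record touched, a slide advanced."""
    offset_seconds: float  # how far into the recording the gesture happened
    kind: str              # e.g. "note", "crm_record", "slide"
    detail: str            # the note text, record ID, slide number, etc.

@dataclass
class HypervoiceRecording:
    audio_url: str  # the voice itself, addressable like any other web object
    gestures: List[Gesture] = field(default_factory=list)

    def annotate(self, offset_seconds: float, kind: str, detail: str) -> None:
        """Record a gesture against the recording's clock."""
        self.gestures.append(Gesture(offset_seconds, kind, detail))

    def find(self, text: str) -> List[float]:
        """Search the gestures, not the audio: return the moments whose annotations match."""
        return [g.offset_seconds for g in self.gestures
                if text.lower() in g.detail.lower()]

# A sales call: the notes taken and records touched become the index to the voice.
call = HypervoiceRecording(audio_url="https://example.com/calls/1234.mp3")  # hypothetical URL
call.annotate(95.0, "note", "difficult question about pricing")
call.annotate(310.5, "crm_record", "opportunity #872: configuration agreed")

print(call.find("pricing"))  # [95.0] - jump straight to that moment in the audio
```

Because the search runs over the annotations rather than over the audio, it works the same way in any language, which is the distinction Kelly draws above between searchable voice and voice-to-text.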

Lee: That's an extremely important point, Martin. I appreciate that and hopefully a lot of people will understand what you said, so I'm not going to expand on that. But that is really pertinent to get out there.

I'm going to jump on a little bit. Again looking at this PDF, it says, "We have skewed our patterns of communication to favor media that can be easily structured and processed by machines" as opposed to free voice, like this. The paper says, "A machine processes symbols, not spoken syllables." So are you working on something that just allows machines to process syllables? Do you want to expand upon that?

Kelly: We're not, at HarQen. There are lots of folks who have been working on this problem. There's some very sophisticated technology that has come out over the years, in terms of phoneme search and things like that. But that is part of the larger sort of ecosystem that Hypervoice can leverage.

There have been so many applications developed over the years in the context of call centers that do voice search and voice spotting and things like that. The key is that it's still siloed data. This is data that's not within a larger system of record, and isn't dynamic in any way. It's just a huge database that's static in the grand scheme of things.

The key from our standpoint is to try and figure out how all this fits together. This is why I keep saying this is a much larger thing than simply one product or one company. It's an ecosystem. We're trying to define how all of these different players who've been trying to solve related problems over the last 20-40 years can come together to solve some very important problems around global communications today, that are not being addressed. We do think that Hypervoice is the first step in that direction.

The Google Moment

Martin: To go back to the evolution of Hypertext, we had search engines like AltaVista, and they were trying to extract the meaning of a webpage by looking inside the webpage. All existing voice tools have tended to look inside the voice, as a stand-alone object. What I think is significant when I saw Symposia is that it's that Google moment. The Symposia product does for voice what Google did for text, or Hypertext, which was to realize that the meaning wasn't necessarily in the words themselves, but in all the stuff that people were doing around it. Just as the word "Volkswagen" in another webpage that points to my webpage tells me that my webpage is about Volkswagens, it's the other stuff that's around voice that's important.

Lee: Martin, I hear you and very much appreciate the way you put it there. I think that again was just very eloquent in terms of getting to understand it. Actually I had the same opinion. Now we're really promoting Kelly here, because I had the same opinion that it was the equivalent of the Google moment. That's what got me so damned excited.

Excitement

What I couldn't get was why Kelly wasn't excited. I was pushing back on Kelly last year, and Kelly would say, "No, I am excited!" And the two of us were totally excited. I'm like "Hang on, we're having that Google moment of links are a democratic way of voting as to what's important with the Google aha. This is the voice equivalent." We just got totally ecstatic.

Then I got totally ecstatic that year as well, Martin, because of your Predictable Network Solutions stuff. It was a really good year. I really appreciate the way you put it. I really appreciate hearing that you've also considered it a Google aha moment. I just couldn't understand why Kelly wasn't blown away. It was like "No, Kelly, this is huge!"

What was amazing, Martin, if you don't mind me asking you there, is that it kind of seems obvious now. I know we're jumping ahead a little bit, and we should be expanding on the technology, but now I see you have the same - you admire it as much as I do. Don't you think to yourself, why didn't we see this before, Martin?

Martin: I had a meeting earlier this week and someone gave me a quote from Vannevar Bush, from 1945. It's a seminal article in The Atlantic about the coming digital era and all the ideas of the memex and Hypertext and the like. And it's a short paragraph; I've got it in front of me here, only a few sentences long.

Why didn't you see it coming? Well, somebody saw it coming in 1945. It just took a few of us a while to realize. It says, "One can now picture a future investigator in his laboratory. His hands are free, and he's not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may both be in miniature, so that he projects them for examination."

He saw it coming; it just took us 60 years to figure out how to articulate it in modern technology.

Lee: I think that's a fantastic answer. I love that.

Kelly: One of the things you have to know about my background is that the conversation regarding HarQen started 14 years ago. By the time we all coalesced and met, I'd been out there in the wilds trying to explain this idea and nobody was getting it. It drags on you after a while. You get to this place of exhaustion, I think in part, where you're like "Why doesn't anybody see this?" When you guys lit up, I did light up like a Christmas tree. I truly did, because it was like coming in from the cold is how I would best describe it; to have somebody light up and say, "Yes! Yes! This changes everything!" And for both of you, who are incredible experts in the larger field and have such incredible credentials, to get it meant the world to me.

I think you might have misread me a bit in terms of my reaction. It was shell shock. You're so used to the reaction of not getting it that when people started getting it, it took me a while to go, "Is this real? Did you say you got it?"

Lee: I thought you were a bit crazy for not being blown away, but then as we went on in May last year, I think the excitement came in. But I was still blown away you weren't even more excited because it got me excited. It meant a lot to me back then.

So thanks for the answer, Martin. We need to cover some more questions about the technology itself. Jumping on with the questions, you want to essentially - at the moment voice is free form. We either record all of it or none of it. The only way to retrieve it is to sort of serially play through it, most of the time. What you're achieving with Hypervoice is to provide a structure to voice. Is that correct?

Value Creation

Martin: Yes, to provide the structure that lets us get value out of the few little parts of that voice that are really important. Take Kelly's example of a sales call: she might be a salesperson selling some relatively complex product, working at a biotech company or a health-care company, calling up people. You get to close the deal: yes, I want to buy something. You then spend five minutes configuring the service or product that's going to be delivered.

Somebody else in your organization is going to have to pick that up, enter it into the data-entry system and manage it. Rather than the salesperson having to take a bunch of notes and then send those off to the customer as "Is this what you understood as being correct?", you can take that voice exactly as it is, that five minutes, together with the notes people have taken, and pass just that part on to the fulfillment team, who can then hear the original of what the customer actually said. They don't have to hear the whole of the sales process to get there. They know exactly what part to go to.

Lee: Yeah, so what's quite funny here, Martin, is that even though you describe something valuable when you give that example, this is a bit like theorizing before the worldwide web what you could do with Hypertext, what applications you could have, why it would be useful. So it's funny for us having to say why this would be - we're trying to sell something and give examples. It's like we're trying to make a sales case nowadays for Hypertext. It's almost laughable, even though they're good examples. [laughter]

Asking For A Business Model for Hypervoice Is Akin To Asking For A Business Model For Hypertext

Martin: [laughter] The question I dread is: "What is the business model for Hypervoice?" At that point I just want to slam my head into the wall. It's like there isn't a business model for Hypervoice.

Lee: It's like world transformational in terms of technology.

Kelly: Right, what's communication? What is the business model for communications?

Perfect Timing

Martin: What's the business model? Well yeah I don't think you've quite got this, have you? [laughter] It's foundational. It's the thing on which you build everything else. And it's at the most apposite moment in time, just as the hegemony of telephony, the reigning nobility, the king of voice is coming to an end, the emperor of voice. And WebRTC and other things are growing up. There's talk of cloud services, and the social media web revolution, the Web 2.0 is reaching its peak. We couldn't be here at a better moment saying, "And there's a new Promised Land to step into." That's my way of seeing it; Kelly was looking back down the hill. Look at all the stuff we're bringing with us in this HarQen Symposia thing. "Kelly, look over the ridge! Look over the ridge! That's our valley in front of us. Stop looking behind. Look this way." That was an exciting moment for me. [laughter]

Kelly: [laughter] It was. It was this incredible moment where truly - you get lost in the weeds when you do this day in and day out. You're working on bringing a real product to market that needs to have real customers. So where poor Martin wants to slam his head into the wall about the business model for Hypervoice, I was living that because my investors wanted to know. We had to come up with viable use cases that people would pay us real dollars for, to show how Hypervoice works. We were able to successfully do that, but boy, was it hard. So when he said look over the ridge, it took a lot of energy for me literally to look up, and a lot of letting go, which - [00:37:33.7].

Martin: If we take a fairly familiar analogy, it's a bit like the Lewis and Clark expedition across this new territory. The person that's a real heroine in that was Sacajawea, who had been living amongst the native people, but she didn't understand the significance of the journey, as it were. She'd been plowing the terrain for years and years, and then other people came along next to her or beside her who could say, "Look, we can populate this area." They obviously had in that case a definite impact on the people that were already there. But that's the kind of analogy I see.

Value Beyond Minutes

Lee: It reminds me again: in May last year, I sent someone at Skype a message, and it was to say that for Skype to have a long-term future beyond its current business model of people paying per minute for regular telephony (ironically), they should look at this company seriously. This is where the future is to be made, beyond minutes. Would you concur with that opinion, Martin?

Higher Than Highest Aspiration Of Any Pre-Existing Voice Service

Martin: Absolutely. The highest aspiration any pre-existing voice service I've ever seen had was to be as good as being there in person and at no cost. What I think Hypervoice does is it takes it beyond that. It can actually be better than being there in person. By recording our conversations, rather than them just being a meeting in an ordinary meeting room, and annotating them, and typing notes, and making them amenable to this incredible idea amplifier of the web, it is actually better than being there. There's actually a reason to have a conference call rather than go to the office. The results are more efficient and effective than ordinary human conversations.

Lee: Martin, I have to remember that people who hear this will maybe not get it instantly. What you just said was - I even feel embarrassed to call it significant [laughter]. What do you do when it's more than significant? I think a lot of people will think it's just a fancy recorded call with some metadata on it. Do you want to spend a moment to say why it would be better than actually being there?

Voice Amplification

Martin: Steve Jobs understood the significance of the computer. What he said was "What a computer is to me is it's the most remarkable tool we've ever come up with, and it's the equivalent of a bicycle for our minds." That bicycle has been amplifying our assets for all kinds of text and image-based media for some time now.

Next 20 Years

But our voices have not really been amenable to processing and amplifying spoken idea pieces. We amplify our voice at a distance, but the ideas that are embedded within the voice have been lost. So the significance of this is that, just like on that sales call, those few moments, the five minutes where I say this is how I want to configure the product, don't have to be repeated in text, which is then an inaccurate proxy of what was actually said, and in addition introduces a whole bunch of business processes and failures in sharing it. So all the things that Hypertext has done for us in the last 20 years, that's what we'll experience with voice in the next 20. It's going to transform how we work.

Lee: I can only agree there. So you're talking about instant value creation, and then the fact that you can keep adding value to value, just like Hypertext has done. So this whole business model question really is dumb, because straight from the get-go it's value and exponential value creation.

Martin: Yes indeed. If you think of how Twitter works and you retweet something, little fragments of voice can also be retweeted and so something that's very significant can be heard by thousands and millions of people. That is how these machines amplify our ideas. At the moment all our voice calls tend to get lost. That will change.

I need to go; I've reached check-out time at my hotel.

Lee: I'll stay on this interview with Kelly for a bit longer. I'll say adios to you, Martin, and thank you very much for your time.

Martin: Thank you.

Advantage of Voice

Lee: We're going to proceed here, Kelly. A lot of people think text is efficient. It can be read by machines, etc. Do you want to say a bit about why you think voice as a media is important in comparison to text?

Kelly: Yeah, there's nothing wrong with text. In text there's a tremendous amount of value and purpose. It makes things skimmable and you can accelerate through things. It also introduces a lot of errors, a lot of times, in terms of conversation, because it's a low-resolution version of the human voice. It's stripped of tone and nuance. When you're having a highly charged, emotional conversation, or one where inflection and pausing is important, which is most conversations, particularly if you are dealing with any sort of global relationships, you really want to understand what people are stressing. What are they saying and how are they saying it?

You need to use the original. You need to use voice. But the problem is that up until now, voice itself is just a large flat wave file that nobody can really consume in an intelligent manner. What we really want to do is to be able to skim voice. We want to be able to jump to the moments that matter. That's where the correlation and similarity to Hypertext really comes in. You can jump to that point in the conversation where it was the two seconds that you just want to rehear and reframe and go, "Okay yeah, that's right. I got it now."

That's why I think there's a huge value about capturing and organizing voice, because it doesn't strip out sometimes the most important content, which is the meta content, the tonal qualities of the conversation, which text does.

Lee: Voice has extra information, apart from just the words themselves.

Kelly: Absolutely, and it's valuable information. So just think out a bit. Now imagine, if you will, your CRM: up until now we've used it as a repository for notes, for the most part about client accounts and relationships. But what happens if it actually had snippets and segments of really important customer conversations, the real ones, the actual ones, the original ones, the ones in voice? Voice is a very rich medium. It's got a lot of information in there.

Big-Voice Data

If you are able to mine it from an analytic standpoint, it could be incredibly valuable. This is looking out much further than where we are today. This is part of the vision, as Martin puts it, of looking over the hill. If you start taking some of the predictive analytics software we have and aiming it at a rich media set like voice, you can start to get some really interesting insights. This is another aspect of big data, but because voice isn't collected today it's not really part of the dataset we think of. What we're doing right now with Hypervoice is creating structured, big-voice data that's mineable.

Lee: I think that's powerfully put: structured, big-voice data. I like that phrase you just put together. You were talking about mining and consumption. I think again we need to think for people trying to quickly understand it, because I think it might take a lot of people a bit of time to get it; can you give another example of how you put something around voice in order to index it? Some kind of example we've not had on this call?

Another Example

Kelly: Sure, for instance I'll give you a very common, very simple example, but one that I think most people can relate to. You get an email from a colleague that was at a conference, and they said, "This speaker was amazing! I've attached a copy of their slide deck." And you open up the slide deck and it's just that, it's just slides. There is no audio track to it. There is nothing other than reading the slides, and you're missing so much of the content.

With Hypervoice, if that had been a Hypervoice-enabled presentation, tags would have been put in passively every time they advanced the slides, so that the voice would be synced up with the visuals. If they had sent you a link to the presentation, you could step to slide 4 and hear exactly what was being said on slide 4.

This is the key of linking what we say to what we do; the presenter didn't have to "do" anything. For Hypervoice to really work, there can be some active components of it, but a lot of it has to be passive. It has to be taking these normal, natural behaviors, things that we're already doing, and using that as a social gesture to inform the system what's important, and where the voice needs to be indexed so that it can be findable later. Does that help?
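To illustrate the slide example (again my own sketch, with invented numbers, rather than how Symposia actually does it), all that is needed is a list of passively captured slide-advance tags against the recording's clock:

```python
# Hypothetical tags captured passively each time the presenter advanced a slide:
# (slide number, seconds into the recording when that slide came up)
slide_tags = [(1, 0.0), (2, 42.5), (3, 120.0), (4, 305.0), (5, 470.2)]

def audio_offset_for_slide(slide_number, tags):
    """Return where in the recording playback should start for a given slide."""
    for slide, offset in tags:
        if slide == slide_number:
            return offset
    raise ValueError(f"no tag recorded for slide {slide_number}")

# Stepping to slide 4 in the shared deck would seek the audio to this point:
print(audio_offset_for_slide(4, slide_tags))  # 305.0
```

The presenter never has to "do" anything extra; the tags simply fall out of advancing the slides.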

Lee: It certainly does help. To me, it seems so obvious, and it seems ludicrous that we even need to come up with these examples. It's a bit like now saying user-generated content, like Flickr; why was tagging so wonderful on Flickr? [laughter]

Voice Is Dead? It Hasn't Even Started Yet

Kelly: [laughter] Right, we've had this conversation before, like 12 years ago, but we did it for text. We did it for images, or we did it for video. Somehow voice is this missing rich media type that gets no respect online. Hypervoice is all about saying, "Wait a second, you think voice is dead? It hasn't even started yet. It's not even really on the internet in a meaningful way." Right now it's still about transport, not about content.

Jump From Transport to Content

Once it makes the leap from transport to content, now it's monetizable. Now it is part of a larger ecosystem, and like you all were saying very early on; it's like Hypertext. Its value becomes so obvious so quickly because people put it into use. The uses that will come up for it now are going to sound really lame in about 10 or 20 years, just like Bill Gates talking about software and thinking maybe my mother could use it to organize her recipes for the home PC. Even he couldn't think of a really good example.

Natural Gestures May Point To Value

Lee: Exactly, so people can hopefully, through a lot of natural gestures like tagging or whatever, point to wherever the value is in the voice stream, collectively: user-generated content, collective intelligence, blah, blah. I'm wondering, and I don't know if you're able to say, but I'll ask anyway: how does it jump from the link, the tag, or whatever, into the right part of that PCM, or whatever it is, voice stream? How do you get there, or is that your secret sauce?

Secret Sauce At Odds With Consortium?

Kelly: That's what HarQen has been concentrating on. That is the secret sauce of our application of Symposia. But it's not as if it's only for us. This is part of the opportunity to create the ecosystem. There's lots of ways of actually solving this problem. We've come up with one of them, and we did build a rich patent portfolio around it, but again, it's one way of solving this much larger problem. There's going to be other ways, and in the future probably more elegant ways, as this is the beginning of the conversation. I feel like we're sort of the Pong version.

Pong was really important. People loved Pong. Today we look back at it and think why did anyone ever play that game. I'm of that generation so I remember it. I think that what we're doing is really important, but the future generations of how they do the technology and what will define how Hypervoice is created and experienced is going to mutate and evolve incredibly rapidly. The pace of change today is so exceptional. Five years out, what we're talking about now is all going to sound silly.

Lee: Obviously I'm thinking okay we have a secret sauce here and we have talk of a consortium. Obviously it's a bit of a dilemma going on there in my head between a consortium and a secret sauce. Can you help me out there?

Kelly: Sure, like I said, we're doing it one way. I shouldn't say we; HarQen is doing it one way. My intention with the consortium is to do it lots of other ways. This does not have to be solved in one particular way. The key thing, and this is how I think about Hypervoice, which is a little nuanced from how I think about Hypertext, is that the goal is to link up voice and make it findable and shareable in all languages. The key thing is you can jump to a point in the conversation. You can search conversations. How we go about that today versus tomorrow is going to be dramatically different because technology is changing so fast.

It's not to say this is the only technology. These are the set of outputs or outcomes that we're looking for. This is the kind of usefulness we want. Define it there, and allow the technology to have the space to evolve. Because it's going to change.

Lee: Maybe I'm wrong and I'm misunderstanding, but at this point in the call I must admit I'm thinking, "No, you really actually need to give something away in order to provide stimulus." It sounds like quite an overhead coming up with that technology, to even begin to play.

New Ecosystem Opportunity

Kelly: We are. There's a lot of stuff that we're doing, and I wouldn't put it in terms of giving it away. It's not as if we're there yet. One of the things that HarQen had to do is monetize it. But I think there's an opportunity for something similar to Red Hat and Linux. There's an ecosystem that needs to be developed and made that today doesn't exist. In many ways, we're a point solution, or HarQen is a point solution. But it needs to exist within the larger framework, and that's what we're trying to do.

Lee: I understand. Maybe what I'm detecting, tell me if I'm wrong, is it's not fully thought out yet what we're "giving away."

Kelly: Exactly, we don't know. That's part of the reason for getting all these smart people together; this is very early.

Lee: Otherwise it's only a promotional tool for HarQen, if others cannot get started. Obviously there are a lot of decisions needing to be made.

Kelly: Exactly, and they need to come from a group of people who include the voice of direct competitors. That's a conversation worth having, and that's a conversation I found big enough to step out of my role as CEO of HarQen, and say, "No wait, this is a movement. This is going to require us to take down any sort of false barriers we might want to put up in terms of competitiveness, and have some real conversations about how do we make this work." How do we collaborate as opposed to how do we compete?

Role of HarQen

Lee: I'd be wrong if I didn't ask, even though it's quite direct: I'm wondering what the benefit is to you, because it appears you're shooting yourself in the foot if you're starting a consortium to encourage competitors. What is your personal gain there? I don't understand, and I think other people will want to know that answer.

Kelly: It's two-fold. There's a benefit to HarQen in the grand scheme of things of being one of the first applications of Hypervoice. That's meaningful. HarQen has a hard job of keeping the lead and keeping people's attention, or it risks becoming Atari.

That is one part of it. The other piece is that if you look at my history, every six or seven years I find myself in a transition where I want to go out and start something new. HarQen is a real business today and it's got a really nice growth curve. I'm what I would call a starter CEO. I love creating something from nothing. I see that opportunity with Hypervoice. That's the kind of stuff I can't resist. That's the stuff that gets me out of bed in the morning, and quite frankly, I don't even know what it fully means yet.

I don't say that to sound as if I haven't thought about this a lot. I have. Obviously when you make major changes like this you think about it a lot. What I mean by that is that this is such new territory that I'm completely excited by it. And I'm only one voice in a much larger conversation that needs to take place. Rubbing ideas against Martin and the folks from Voxeo and Telefonica has been extremely enlightening and helpful. And as we broaden that conversation, it's going to get even better. So that's what gets me up in the morning.

Lee: I'm known to be hard on people and for me that was a solid answer. Well done there.

Kelly: It's the benefit of being true.

Lee: I'll agree with you. I want to stress that there's still a lot of excitement around Voice over IP and now WebRTC. But that's at the transport layer. Actually, as we've been trying to argue for so long, the transport layer isn't that exciting. So the excitement is at the content level. Would you agree?

Transport vs Content

Kelly: I think they're solving two different problems. Transport is solving a very old problem, and it's still not solved completely. It's a hard problem: the space problem of you and I connecting across space right now. We desperately need transport to work well for us to connect and have a good conversation. As we all know, or have experienced, telephony today isn't what it used to be, meaning that there are a lot of times when we have quality and latency issues and all sorts of things that disrupt our ability to communicate across space. But it's a problem that we've been working on since the days of Alexander Graham Bell.

The content problem is very different. The problem it solves is the time problem: we're a global society today. Thanks to the internet and everything, we have the ability to talk to people in multiple time zones. The problem is we still have to sleep, outside of you of course. We have the ability to connect in the right way; now we need to solve the time problem in content. And what I mean is that recorded chunks of searchable voice content allow us to meet asynchronously.

We can start a conversation and have somebody pick it up eight hours later and keep going with it, while we're asleep. So we can do these business cycles, and have rich, voice, layered communications and yet solve the problem of we have too many time zones between us. Does that make sense?

Lee: It does make sense. What I'd like to pick up on and reiterate is the present situation today is we're throwing away - I don't want to say we're throwing away voice; we're throwing away incredible value. You gave a tiny example of a speaker in a presentation. Martin gave an example of the sales process. For example, you could index what successful sales closes are, if you're a sales company.

Kelly: Exactly, or objections. What were the top objections we heard this week? The sales manager is able to go back and actually listen to 30 seconds here and there of these objections that are coming in near real-time from customers, so they can get a real sense not of what the salesperson thinks and put into the CRM, but of what the customer actually said. You can time-share the expert ear.

When we start getting into asynchronous content, that's the huge unlocked value. I don't have 15 minutes to listen in on a sales call. But I do have 30 seconds to listen to a couple of objections and then put my expertise to how we overcome those. That's value-added communications, and that's where I think we're going.
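To sketch what "time-sharing the expert ear" might look like (again my own illustration; the tag name, URLs and thirty-second window are assumptions, not HarQen's design), the same kind of timestamped annotations let a manager pull out just the objection moments from a week of calls:

```python
# Hypothetical week of calls, each with timestamped tags gathered during the conversations.
# Each entry: (call audio URL, [(offset_seconds, tag, note), ...])
week_of_calls = [
    ("https://example.com/calls/2001.mp3", [(610.0, "objection", "price too high"),
                                            (900.0, "note", "send proposal")]),
    ("https://example.com/calls/2002.mp3", [(125.5, "objection", "locked into a competitor")]),
]

CLIP_SECONDS = 30  # the short window a sales manager actually has time to hear

def objection_clips(calls, clip_seconds=CLIP_SECONDS):
    """Yield (audio URL, start second, end second, note) for every moment tagged as an objection."""
    for url, tags in calls:
        for offset, tag, note in tags:
            if tag == "objection":
                yield url, offset, offset + clip_seconds, note

for url, start, end, note in objection_clips(week_of_calls):
    print(f"{url} {start:.0f}-{end:.0f}s: {note}")
```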

Lee: And it's all part of this global brain that is building, that started the worldwide web. It's just you bring people together. It's ridiculous that we're not doing it. It's frustrating that we're not doing it today. It's even more frustrating that I didn't even see it until May last year when I first spoke with you.

Kelly: Yeah. I loved that. I have to tell you how important that was for me. It was wandering in the wilderness, being out there wondering why nobody gets it. Once you know it, it is obvious. It's wonderful. And the future starts to come into clarity and you're like, "I want this now." I now have 18 months' worth of captured conversations, all organized, that I can go back to from the last year and a half of testing Symposia.

I know what it feels like to have perfect memory of all my meetings, which is an incredibly liberating feeling. It's the kind of stuff you get used to, and you can't imagine why in the world would I attend something that I'm saying is important enough for an hour of my time, and not capture it in a way that I have perfect recall. It's extremely frustrating that people don't think that way but we've gotten into a behavior system of rework and re-conversation and re-going over things again and again because we don't think those systems are possible. People haven't opened their eyes to the possibilities yet, because Hypervoice to them is not a reality. But it's very much a core part of my reality today.

Origins Of Such Voice Indexing

Lee: Thank you, Kelly. I do want to ask; how did it come about that you came up with this idea of being able to index voice in such a way? Where did that all come from?

Kelly: I would love to take credit for this one, but I can't. I have to credit my husband Jeff. He came up with it. The big question that started it, back in 1998, was: why was it we could share images online, we could share video online, we could share text online, but we couldn't share our voice online? It was this huge, missing object.

That was the start of it; he actually approached it through comedy. He said, "What's a type of viral voice conversation that people would get?" He started with joke telling, where people could capture and share jokes online. They went to launch it in 2000 just as the web bubble broke and the dot-com crash happened, and we ended up putting that project on the shelf and taking it out from time to time to look at it. That was really the predecessor technology behind HarQen. When I say it's a 14-year conversation, I mean it. We've been wandering around in the desert for a long time.

Lee: That sounds like accidental steps to begin with, until you bumped into it, right?

Kelly: Yeah, that's how I think a lot of innovation happens. We try to tidy it up in posts and talk about how we had this grand eureka moment where we're running through the streets of Athens, but the fact of the matter is I think a lot of times innovators back into it. It becomes obvious in post.

For me, I think the one piece of why this is a personal passion, and you know this Lee, is that I'm dyslexic. I am an auditory learner by nature. Visual text does not work well for me. The internet is really broken in a lot of fundamental ways for people who have learning disorders, much less people who are blind. You should talk to the ABA (sic), the American Blind Council, in terms of what they do with passwords. That's an incredible bugaboo.

So looking at it through different people's lenses, and particularly my own off lens, I don't see the world the way I think everybody else does. I could see what they were missing in how hard the web is to navigate for somebody who primarily doesn't learn through text.

Lee: I appreciate that. Kelly, you've inspired me. I didn't have any knowledge of how you came up with it. I've been absolutely amazed. I feel for the purposes of this interview, I feel we've covered what we should cover. I'd like to thank you very much for your time.

Kelly: Thank you. I enjoyed it.

Lee: I am sure I shall interview you very soon to find out what happens, and hopefully it's not another year and a half, Kelly.

Kelly: Exactly, and hopefully you get a chance to spread the word at my little event on the 12th. The more the merrier. We want to get the word out.

Lee: How do people find out about the event? Where can they go to? hypervoice.org?

Kelly: Yeah, hypervoice.org and it's under "Events" and it's on the 12th of December at 12 p.m. PST. There's a link in there to find it and we'd love to have you.

Lee: I'm assuming you're going to be using your own product to capture that virtual event?

Kelly: Yeah, we'll be using HarQen Symposia for the event, which is nice. People can actually experience what Hypervoice feels like. Again, I put it out there; it's not designed to be promotional for HarQen. It's really designed to give the idea of Hypervoice, which I think it illuminates, and like I said, it's the Pong version of it. I'm looking forward to seeing what's next.

Lee: Okay Kelly, much appreciate it and I hope as many people as possible join your event on the 12th.

Kelly: Wonderful. It's limited to 75, so hopefully people will be able to get in before we close it.
