SMW 17: Microsoft's Andy Beach Talks Machine Learning and Media
Tim Siglin: Welcome back to Streaming Media West 2017. I'm here with my good friend, Andy Beach. Tell me, what are you up to these days?
Andy Beach: I'm still working at Microsoft and one of the areas that I've been really interested in is how we're exploring machine learning and media, so that's what I wanted to give a talk about this year at Streaming Media, to talk through some of the different options that are out there.
Tim Siglin: As a matter of fact, I had Scott Grizzle from IBM Cloud Video, and, of course, they're doing a lot with Watson trying to do machine learning, deep learning, that kind of thing. Is the focus primarily to do this through cloud and big data combinations, or how does that work?
Andy Beach: The approach that we've taken is to make machine learning available and accessible to anybody who needs it. If you're a data scientist and you know R, there are ways you can just come in and train your own model, but there are also ways that, if you're just a developer and you want to include machine learning, we have specific APIs that perform a function from some sort of trained model, and you can implement that API to get the thing you want, whether it's facial recognition or closed caption transcription, or something like that.
Tim Siglin: So in those cases, you're doing speech-to-text and computer vision as part of the machine learning?
Andy Beach: Exactly, and then, if you're not a developer but you still want access to that type of information, we even productize it, as Azure Media Services: the ability to take those elements, upload your content, and get back an interactive player that has the facial recognition widgets in it, or a full transcript of all of your audio that flows next to the player as it's playing, and allows you to translate it into other languages on the fly.
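[To make the developer path Beach describes concrete, here is a minimal sketch of calling a hosted, pre-trained speech-to-text service over REST. The endpoint URL, API key, and response field below are placeholders standing in for whichever cognitive service is used, not the exact Microsoft contract.]

```python
import requests

# Hypothetical pre-trained speech-to-text endpoint; the URL and response
# shape are placeholders, not the real Cognitive Services API surface.
ENDPOINT = "https://example-region.api.example.com/speech/recognize"
API_KEY = "your-subscription-key"

def transcribe(audio_path: str) -> str:
    """Send a short audio clip to a hosted, pre-trained model and return text."""
    with open(audio_path, "rb") as f:
        resp = requests.post(
            ENDPOINT,
            headers={
                "Ocp-Apim-Subscription-Key": API_KEY,  # Azure-style subscription header
                "Content-Type": "audio/wav",
            },
            data=f.read(),
            timeout=30,
        )
    resp.raise_for_status()
    # Assume the service returns JSON with a single best transcription.
    return resp.json().get("DisplayText", "")

if __name__ == "__main__":
    print(transcribe("clip.wav"))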
Tim Siglin: What's fascinating to me is, having done work with what we used to call index, search, and retrieval way back with some of the companies that did that stuff on stand-alone devices, essentially now what you're doing is using the power of the cloud, and also the distributed big data tables that you get from doing a lot of the analytics. Do people have a way to correct what the audio transcript shows, because we all know they're not perfect?
Andy Beach: Everything that comes out has some sort of confidence score with it, and there are ways to tune that over time and confirm how correct it is, or you can go in and edit things within your content that need correcting, and it adapts and learns from those corrections.
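[A short sketch of how a downstream workflow might use those per-segment confidence scores to route low-confidence text to a human editor. The segment structure and field names here are invented for illustration, not the schema of any particular service.]

```python
# Illustrative transcript segments; field names are made up for this sketch.
segments = [
    {"start": 0.0, "end": 2.4, "text": "welcome back to streaming media west", "confidence": 0.94},
    {"start": 2.4, "end": 4.1, "text": "i'm here with andy beach",             "confidence": 0.88},
    {"start": 4.1, "end": 5.0, "text": "talk machine earning in media",        "confidence": 0.41},
]

REVIEW_THRESHOLD = 0.60  # tune per project; anything below goes to a human editor

needs_review = [s for s in segments if s["confidence"] < REVIEW_THRESHOLD]
auto_accept = [s for s in segments if s["confidence"] >= REVIEW_THRESHOLD]

# Print the segments flagged for correction, with timestamps and scores.
for s in needs_review:
    print(f"{s['start']:>5.1f}s  ({s['confidence']:.2f})  {s['text']}")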
Tim Siglin: Interesting. Are you aiming at a particular market vertical? When I worked in Europe on a Framework Programme Six project, it included a bunch of guys from Lernout & Hauspie, who did NaturallySpeaking. They could do really well with legal and medical because the terminology was very distinct, but generic or general conversation was much more difficult; so how are you guys approaching that?
Andy Beach: There's a sort of baseline API when you're talking about the cognitive services piece of it, where it's just trying to contextually make sense of the words that it sees based on the words around them. So we're trying to understand what something is in relationship to the paragraph or something else that's there, and that frankly helps a lot with the accuracy, because it's gonna understand the difference between certain terminology that might get used, because it's putting it into a context.
Tim Siglin: Are there specific libraries or market verticals that you're going after? Like legal, like medical?
Andy Beach: It's pretty wide open, you know. I think there are both enterprise applications and surveillance and educational tracks that are using it. But we have entertainment partners who are also using the same services to create functionality today.
Tim Siglin: Okay, nice. So, what other things are you doing? Obviously machine learning's not the only thing you're doing.
Andy Beach: I finally got to do some big video projects in the last couple of months, which were the first transcoding projects I've worked on in years, and it was like working on old muscle memory, pulling back terminology and things. So it was kind of exciting to do. But in relation to that, another one of the big areas that I've discovered has become important, from an infrastructure perspective on video, is that I'm doing a lot more around high-scale data: taking all those data points that we pull out through machine learning or through video player interactions and that kind of thing, putting them somewhere, and then very quickly slicing and dicing them to expose certain trends that you see. I've had to learn a lot more around how containers fit into this and how you create large-scale databases. These are things that I never imagined I was going to be working with--I was a video jockey at the end of the day. But now I'm learning all these new elements and it's kind of exciting.
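[A minimal sketch of the kind of "slice and dice" aggregation Beach is describing, applied to player-interaction events. The event records and column names are hypothetical examples, not an actual Microsoft telemetry schema.]

```python
import pandas as pd

# Hypothetical player-interaction events; columns are invented for illustration.
events = pd.DataFrame([
    {"video_id": "a1", "event": "play",  "device": "mobile",  "ts": "2017-11-01 10:00"},
    {"video_id": "a1", "event": "pause", "device": "mobile",  "ts": "2017-11-01 10:03"},
    {"video_id": "b2", "event": "play",  "device": "desktop", "ts": "2017-11-01 10:05"},
    {"video_id": "a1", "event": "play",  "device": "desktop", "ts": "2017-11-01 11:00"},
])
events["ts"] = pd.to_datetime(events["ts"])

# Slice and dice: plays per video per device, and plays per hour across the catalog.
plays = events[events["event"] == "play"]
by_video_device = plays.groupby(["video_id", "device"]).size().unstack(fill_value=0)
by_hour = plays.set_index("ts").resample("1H").size()

print(by_video_device)
print(by_hour)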
Tim Siglin: Well, as we both know, the metadata is ... you're talking about the context of words in a paragraph; the metadata itself around a container and a format can inherently help you constrain down to particular decision points. If it's an MPEG-2 transport stream, more than likely it's only gonna have one or two codecs in there; versus if it's something that's WebM, it's probably not gonna have AVC as part of the format. As always, thanks for stopping by, and have a great show.
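[Siglin's point about container metadata narrowing the codec decision space can be sketched as a simple lookup. The mapping below is a rough illustration of typical pairings, not an exhaustive or authoritative list; a real pipeline would still probe the stream itself.]

```python
# Rough, illustrative mapping from container format to the codecs typically
# expected inside it; real workflows should still inspect the actual streams.
LIKELY_CODECS = {
    "mpeg2-ts": {"video": ["mpeg2video", "h264"], "audio": ["aac", "mp2", "ac3"]},
    "webm":     {"video": ["vp8", "vp9"],         "audio": ["vorbis", "opus"]},
    "mp4":      {"video": ["h264", "hevc"],       "audio": ["aac"]},
}

def candidate_decoders(container: str) -> dict:
    """Narrow the decoder search space from container metadata alone."""
    return LIKELY_CODECS.get(container.lower(), {"video": [], "audio": []})

print(candidate_decoders("webm"))      # AVC is not expected here
print(candidate_decoders("mpeg2-ts"))  # MPEG-2 or H.264 video is likely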