Maana’s Chief Scientist Steven Gustafson, Ph.D.
Executive Education – Knowledge Graph Conference
Columbia University – School of Professional Studies
May 7-8, 2019
That was great to hear, that talk about the analytic tapestry. I’ve always been very impressed with the Lego graph. And it was a great transition for me, because what I’m going to talk about today is a use case, but I’m really going to try to make the point about the importance of considering the analytics, and more specifically the decision, involved in some of these applications. So, I’m the chief scientist at a startup called Maana. Maana is about six years old. They don’t like to think of themselves as a startup anymore. But it actually started from the point of view of solving search, or enterprise search, for large industrials. One of the founders came from Microsoft, where he worked on the Bing team developing their knowledge graph technology, called Satori. And through that he learned that, you know, search didn’t work very well for companies.
And so they created this company to go out and try to solve this problem, and over several initial use cases and pilot projects, what they learned was that nobody really wants to pay for search. They were developing these proofs of concept and showing people how they could bring their data together, how it could be connected, and then going back to the customer and saying, do you want to license this now? They got a lot of, ah, this is interesting, it’s great, but what I really wanted to do was to inform myself about this specific decision. And so what they learned was that to make an impact in these companies and to build a product that people wanted, they had to move away from the problem of search and go towards the problem of answering the specific questions that an expert uses to make a decision, preferably a decision that would help that company make money or save money.
And so that’s when they started to transition away from enterprise search towards a knowledge graph, and eventually, by focusing on the decision part of the problem, towards something that we call the computational knowledge graph. Now, I’ve been at this company for about two and a half years. I actually came from General Electric, which was an initial investor in Maana, and I was at General Electric’s research center for over a decade. I came there after doing my Ph.D. and postdoc in machine learning, focused on solving industrial or applied problems. And over the 10 years, essentially, back then it was not called data scientist, but I was basically an applied machine learning researcher trying to solve specific, valuable use cases for the GE businesses, from capital to aviation engines to gas turbines, wind, healthcare, a hugely diverse set of problems. And every single one of those was like doing a mini proposal, like a mini grant proposal.
You had to justify to a business leader why they should give you a couple of hundred or a couple of million to build a team, do a technology build or select vendors, work with them and solve a problem. Nobody really cared about the technology. What was important was that something gets deployed, there’s an outcome, and you can measure the value of that. And so after about a year or two, I became obsessed with trying to figure out how to get to an outcome, because if I don’t get to an outcome, nobody really cares what I’m doing, but if I can deliver outcomes, then I can have more control and more flexibility and pursue the research and the science that I was really interested in. And so that experience, doing that kind of work, brought me to the realization that it was very difficult to learn about the use cases and learn about these problems across General Electric.
You know, if it was healthcare, clearly we heard a lot in the previous talk about some of the domain knowledge needed to be successful there, or if it’s in the aviation space, all the mechanical engineering needed to make an engine. And the other aspect was, just like a lot of the talks made the point yesterday, the data is messy. It’s crazy. There’s a huge amount of schema. The data are labeled things like NGX1, NGX2, and it just makes no sense. And so yes, a lot of time was spent as a data scientist cleaning and prepping data, but really most of the time was spent learning the actual domain, learning what these different data things represented, and making some really stupid mistakes. Like going back and showing your machine learning model and having the person say, oh, you’re using a variable that I created that just codes for the y that you’re predicting.
And I was like, okay, well, that explains why I was so awesome. So, you know, those sorts of mistakes taught me that if I wanted to be better, and if I wanted my team at that time to be more efficient and to get to outcomes quicker, we needed some kind of semantics in the data. And that’s actually how I came to knowledge graph technology and learned about semantics and ontologies and things: really from the point of view of, I want to be a better machine learning researcher and I want to deliver more impact. And so after a couple of years of doing research and building knowledge graph technology, I realized there was something missing. There was a missing representation for the analytics. So we had developed really cool knowledge graph technology, and you can certainly go and see some of that.
There’s an open source library that my former lab created, called SemTK, out on GitHub. I don’t believe anybody’s using it, but it’s really cool and I think it’s really awesome, just a little hidden gem sitting there. But we were missing a representation of the analytic, of how people made decisions. And at that time I reached out to the founders of Maana and I said, hey, I’ve been at GE for 10 years. I’m really passionate about trying to figure out how to build a knowledge graph with analytics in it. I just saw your guys’ recent presentation. I hadn’t talked to them for a couple of years. And, you know, can we do something? And that’s how I ended up there.
So, this use case is about shipping oil products around the world. It’s a very classic kind of use case that I worked on at General Electric, and it’s very typical of the customers that we work with at Maana. We focus on the decision, and so you can see right away that we know what the mission of this company is for this use case, and we know what resources they have to get to this point. We did a workshop, two workshops, to understand what the key business decision is that they’re trying to make and that they know will have impact. And so we worked this use case over a period of two or three months using our platform to get this outcome: a pretty good cost reduction and also increased utilization of the resources. So, from an outcomes point of view, this is great.
I’m not going to hyper-focus on these numbers, because they differ by customer. The important part is that we used our technology and we got to a successful outcome, but more importantly, we also achieved two additional outcomes. One is that we helped them create new knowledge, so having better insight, real tangible knowledge captured in the platform around the performance of their fleet. The other is having more visibility into their operations, allowing them to make new decisions. And so the transformation that we tried to achieve here was largely successful. You know, I think it’s a small increment, but helping them be more metrics-based is sort of like getting the base level for them to build upon. And notice that while I talk about knowledge, it isn’t the graph, or the number of triples, or the number of things that are in the graph, that was the success. The success is being able to talk about their business and understand the business and how to make better decisions. And this is what they experience from the knowledge graph technology.
So, there’s no graph here, there are no edges or nodes. This is an application that was also part of the use case solution. On the left-hand side are the vessels, and on the right-hand side is time, and it’s interactive, and the customer can go in and move things around. They see the recommendations and they focus on improving their decisions given the outcomes of the graph or the computation. It looks great here. Do you guys all want to come over?
It’s pretty. In fact, I was a little bit nervous about putting up a screenshot, so I actually blurred it out. So let me come back to this informative slide. Coming back to the point about decisions: to get to the point where you can deliver a use case, you focus on the application. You don’t lean into the knowledge graph and the kinds of data that you’ve transformed in the ETL to give them this valuable resource that they can go and use, and say, hey, you have knowledge management, you can build all kinds of applications. We had to focus on, here is an application that will get you value, because we’re a startup, and if they don’t buy the product, we’re kind of in trouble a little bit. And our investors get very anxious. And so we focus a lot on the decision.
And this is about the time that I came in. One of the first things I was tasked to do was to help define the artificial intelligence strategy, in particular: how do people think about decisions, and how are we going to provide technology that helps them make better decisions? And so we identified these four things in green: observe, reason, decide and learn. These are the kinds of tasks that experts do when they make decisions, which I saw over many, many use cases at General Electric. You look at the environment, you ingest data, you think about how you’re going to solve this problem. Out of all those recommendations or all those opportunities in your head, you make a decision. Based on the outcome of that decision, you learn. Now, computers can provide assistance here, because there are a lot of things that we do in our heads that don’t get captured anywhere.
Like making predictions, using your judgment, learning from your experience and from the feedback, knowing which feedback you should be collecting and putting into your mental model. And that is what forms the experience. And so when people talk about retirement or people leaving the job, and they’re worried about knowledge leaving the enterprise, it’s the things on the right that don’t get captured that they’re very worried about. And this is where analytics can help a lot. But like I said, in a lot of these solutions, analytics aren’t formally captured anywhere. So to help, well, that turned out well, actually it did. It’s not a great slide.
So to help people, and to help our customer solutions teams work with customers and define a use case in a good way, one that will work with our platform but will also follow those processes of how people make decisions and where computers and technology, machine learning, and analytics can help them, I developed this thing called a decision canvas. It’s kind of like the business model canvas, but it basically lays out these boxes with little questions in them. You can’t read them here, but you can sit and work through this with the team and identify what the main components of a solution are. And then how do you break that down into a prediction problem, the data, the data science aspect of it? And how do you bring the concepts and the analytics from the prediction into one flow? That essentially defines your knowledge graph and your analytic or function graph. The bottom box here I took from the Prediction Machines book, and if you haven’t read that, it’s a great book that came out last year. The top portion is things that I had defined before I found that book. So, I kind of merged them together.
I can give you this; it’s released under Creative Commons. So, let’s come back to the use case. Like typical use cases, and like a lot of what’s been motivating the talks here and the different work that you are doing, there’s a diverse set of data in these companies. There are all kinds of data systems, coming from oil trading platforms, different ports, information about the ships, the charters, spot and time charters, that you have to bring together. A lot of times these are provided by third parties. So, they faced similar challenges, and that’s what motivated the knowledge graph. Okay. And that data forms the key observations that you need to take into consideration when making a decision here about what the optimal schedule is. But look at all these other things that are important.
When we went and interviewed the experts, these were the things. They weren’t the data systems. These were the aspects of what was important to them. And in almost all of these, there is some kind of analytic. What are the requirements at a dry dock given a specific ship, and how would you calculate that? So it’s kind of like a rule or constraint. What are the different certifications required? What’s the compatibility? Sometimes it isn’t documented anywhere whether a ship will be able to fit into a berth, so you have to calculate that. And so there are a lot of analytics and different calculations that have to be done. So making impact on the decision requires that you encode this. And if you solve for this and you create new fields or concepts in your graph, but that solving rests in Python code or some other analytic, it’s sort of lost. It’s not represented in the solution.
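As a minimal sketch of the kind of rule-style analytic described here, the berth check could be written as an explicit, named function rather than left buried in ad hoc code. The class names, fields, and the one-meter under-keel clearance default are illustrative assumptions, not details of the customer solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vessel:
    # Hypothetical vessel dimensions, in meters.
    name: str
    length_m: float
    beam_m: float
    draft_m: float


@dataclass(frozen=True)
class Berth:
    # Hypothetical berth limits, in meters.
    name: str
    max_length_m: float
    max_beam_m: float
    depth_m: float


def fits_berth(vessel: Vessel, berth: Berth, under_keel_clearance_m: float = 1.0) -> bool:
    """Rule-style constraint: the vessel fits only if every dimension is within limits."""
    return (
        vessel.length_m <= berth.max_length_m
        and vessel.beam_m <= berth.max_beam_m
        and vessel.draft_m + under_keel_clearance_m <= berth.depth_m
    )


tanker = Vessel(name="example-tanker", length_m=180.0, beam_m=32.0, draft_m=10.0)
deep_berth = Berth(name="berth-1", max_length_m=200.0, max_beam_m=35.0, depth_m=12.0)
shallow_berth = Berth(name="berth-2", max_length_m=200.0, max_beam_m=35.0, depth_m=10.5)

deep_ok = fits_berth(tanker, deep_berth)
shallow_ok = fits_berth(tanker, shallow_berth)
```

Written this way, the compatibility rule has a name, typed inputs, and a single output, which is what would let it be represented alongside the concepts it relates instead of being lost.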
And so, this is a view of the solution’s knowledge graph. It’s a little bit earlier version. It’s not the one that the customer is using, but it’s representative. There are about 33 classes or things in this graph, and they’re connected by different kinds of relations. And you can see I blew up one here on the far right, for the berth. This is a shipping, or sort of a shipping-domain, graph, and it has different properties and then relations to other things in there. It’s fairly typical. It looks kind of messy, but it’s actually really, really valuable, because you can see in these objects that things are named with common-sense terms. They represent real things in the world. So this graph presents immediate value. And it’s probably something that everybody here, when you work with different knowledge graph technologies, also sees as being valuable. You can query it and find data, and it’s great for data scientists.
You can find stuff. But what we actually do, and where we spend most of the time, is in building these sorts of things. So this is a function graph representing the computation of a vessel travel time. And there are a couple of different functions in here, like the first, top-left one, which is called get the current vessel location. So, where is a vessel? It seems like it might be an easy thing, but it’s not. They’re out at sea somewhere. So what are the coordinates of the vessel, and what’s the port location? And these things flow from input to output, from the left to the right. And in fact, over the course of five or six years, Maana has developed several platforms that started to attack this problem more and more: how do you create a knowledge representation for functions? That went from using different kinds of graph databases, to proprietary key-value stores, to graph databases that just store the metadata and ontology about the domain, to where we are now. And this will be, I think, possibly surprising to some folks here: our product and these solutions are actually all GraphQL microservices.
And so the reason is, and here are two more examples. The reason is that as we did these use cases, as we did more and more solutions, trying to bring better decisions to customers, we realized that we needed to focus on the representation of the functions and the analytics, for a couple of reasons. One of them is that if you start a solution and there’s a complex analytic, and you have to wait for data science to finish, and you can’t do anything until that data scientist is finished, the whole solution kind of goes off the rails a bit. The alternative is that you can provide simpler analytics. So focusing on a distributed microservice architecture allows you to sort of mock up the parts of the solution where there are analytics with simple things, and bring those microservices together. And what’s cool about GraphQL is that it’s a schema of hierarchical concepts, and our platform essentially stitches those things together and allows you to flow the output of one analytic into the input of another, very much like we heard in the previous talk about the analytic tapestry.
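To make the idea of flowing one function’s output into the next, and of mocking a complex analytic with a simple one, a bit more concrete, here is a minimal sketch in plain Python. It is not the Maana platform or its GraphQL services: the lookup tables, function names, the fixed 12-knot speed, and the use of great-circle distance are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical lookup tables standing in for knowledge graph queries.
VESSEL_POSITIONS: Dict[str, Coordinates] = {"vessel-1": (0.0, 0.0)}
PORT_LOCATIONS: Dict[str, Coordinates] = {"port-a": (0.0, 1.0)}

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def current_vessel_location(vessel_id: str) -> Coordinates:
    return VESSEL_POSITIONS[vessel_id]


def port_location(port_id: str) -> Coordinates:
    return PORT_LOCATIONS[port_id]


def great_circle_nm(a: Coordinates, b: Coordinates) -> float:
    """Haversine distance between two points, in nautical miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def mock_speed_knots(vessel_id: str) -> float:
    """Placeholder analytic: a fixed speed until a real model is available."""
    return 12.0


def vessel_travel_time_hours(
    vessel_id: str,
    port_id: str,
    speed_fn: Callable[[str], float] = mock_speed_knots,
) -> float:
    # Outputs of the two location functions flow into the distance function,
    # whose output flows into the final travel-time calculation.
    distance = great_circle_nm(current_vessel_location(vessel_id), port_location(port_id))
    return distance / speed_fn(vessel_id)


hours_with_mock = vessel_travel_time_hours("vessel-1", "port-a")
hours_with_faster_model = vessel_travel_time_hours("vessel-1", "port-a", speed_fn=lambda _: 24.0)
```

Because the speed estimate is passed in as a function, the rest of the flow can be built and tested against the simple placeholder, and a better analytic can be swapped in later without touching the other steps.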
These are important concepts, but here is the most exciting thing to me from a machine learning point of view and from a knowledge representation point of view, because I’ve come to really love knowledge representation and to be passionate about making sure people think about it. When I look at this function about determining vessel-port compatibility, somebody can go to the solution, and this of course is the platform part of our solution that you’re seeing, where people construct solutions. You can go there and you can understand how it’s computed. You can understand the analytics. Now, if it’s something that’s really, really big and complex, then, hopefully, as a solution engineer, you provide functions at a higher level. You don’t have a function with a thousand concepts in here flowing through at the most granular level. So, there are some choices that have to be made, just like with knowledge graphs, about how you represent your computation in your functions.
But I can go and see what these things look like. And then if I join the company that’s running this, and they say, hey, you know, senior applied, hopefully I’m a principal applied data scientist, maybe. If I go there and they say, hey, go and fix this solution, it’s gone kind of a little bit crazy and it’s not getting as good performance, I can go up here and open this up and see which functions I might want to start to investigate, and then start swapping things in and out. And that’s very exciting to me as a scientist, or a subject matter expert, or a data scientist working on one of these solutions. And so by trying to help companies do digital transformation by focusing on the knowledge graphs and function graphs, we give them not just better decisions but actually a layer, a computational knowledge layer, where they can go maintain, improve, and develop more knowledge for themselves, and help share that around the organization and reuse that over time.