A powerful way to link and query relationships and data - [FULL INTERVIEW]

In this interview, Jim Webber, Chief Scientist at Neo4j, discusses the growing importance of graph databases in the world of data analytics and artificial intelligence (AI).

He highlights how graph databases are becoming a critical tool for organizations to uncover insights from complex relationships within their data. He emphasizes that as data continues to grow in size and complexity, graph databases offer a unique approach to modeling and querying interconnected data, leading to more accurate predictions and insights.

Find out more about Neo4j -> Here.

Key Points

Graph databases are gaining momentum as organizations seek to extract valuable insights from complex and interconnected data.
Jim Webber highlights the significance of graph databases in various domains, from space exploration at NASA to biopharmaceutical research and supply chain optimization.
The ISO GQL graph query language is expected to become a standard, providing a consistent way to query graph data across different platforms.
Graph databases offer a high-fidelity data model that allows for more nuanced analysis of relationships, making them ideal for AI and machine learning applications.
The next five years are expected to witness an explosion in the use of graph databases in analytics, contributing to a quiet revolution in the AI and ML space.

Key Statistics

The ISO GQL graph query language is set to be released at the end of the year, potentially revolutionizing the graph database ecosystem.
Graph-based route planning at one shipping company is said to save 60 megatonnes of CO2 emissions annually.

Key Takeaways

Graph databases are becoming essential tools for extracting insights from interconnected data, offering a high-fidelity data model.
The standardization of graph query languages is expected to provide greater flexibility and interoperability for users.
The application of graph databases spans various industries, from space exploration to healthcare and logistics.
As data continues to grow, the use of graph databases in AI and analytics is poised for significant growth in the coming years.
Organizations should consider integrating graph databases into their data analytics and AI strategies to harness the power of relationships within their data.

Interview Transcript

0:02
Hi, everyone. I’m here with Jim Webber today. He’s the Chief Scientist at Neo4j, and Neo4j are in the graph database space. So Jim, thanks very much for agreeing to join me today and chat a bit about graph databases.

Absolutely, Chris, thank you for taking the time to host me. I’m looking forward to the conversation.

Wonderful. I first came across graph databases almost as a tangential thing to some of the work I used to do in the fraud world, when we were looking at network fraud: the connections of people between different accounts, and how these fraud networks build up. We were doing it using, I think, spreadsheets at the time, and this was many years ago. It seems like it’s evolved a lot since then, with graph databases and the whole of graph theory around them. Do you want to explain a little bit about where it fits into that? Because that’s definitely one of the use cases that I’ve heard of.

For sure. Fraud, or rather anti-fraud, hopefully, has been a long-standing use case for us. I’m guessing the fraudsters could use it to try and map out their world too, but I don’t want to give any bad guys any ideas. But yeah, absolutely. Coincidentally, I was talking to one of the large banks in the US just yesterday, just before this conversation (which has been in our calendar for weeks), and they’re looking at a similar kind of thing. They’ve got a world where people are making peer-to-peer payments. Sometimes both the recipient and the sender happen to have accounts with the bank, so they’ve got some idea of who’s sending and who’s receiving. Sometimes it’s outside the bank, and sometimes there are intermediaries. Plotting that universe with an Excel spreadsheet, as you did, is a challenge.

1:38
Don’t get me wrong, Excel is amazing. The planet basically runs on Excel, which is frightening and amazing. But you’ve got two dimensions to play with, and you have to get really creative when you start to think about n dimensions, or about multiple paths through the spreadsheet. It’s really hard to do, and it doesn’t scale well.

1:56
Graph theory, and the technology that underpins it in modern times, graph databases, is built on a much simpler idea. The way I explain it to people who haven’t seen it, or to non-technical folks, is that it’s like the London Tube map. You have stations, and those stations are connected together by lines. We all know the rules: if two stations are connected by a line, that means there’s a train that runs between them. If you put them all together, you can plot a path around the network and get from A to B. Even folks who’ve never ridden the Tube before can look at the map and make sense of where they want to get to. They may take a slightly longer path than the people who are really adept at using the Tube, but this idea of dots connected by lines is really simple. So when we bring it into the fraud world, we’ve got people, and we’ve got transactions that flow between them. It’s like a social network, right, with some antisocial elements being the fraudsters. So the first step is that we build a map of payments. The payments graph, as I would think of it, is of people that are paying each other. If I’ve paid you some money, there’s a link between us representing that transaction, or if I’ve paid you several times, maybe there are several links between us, one for each transaction. Over time, that builds out to the financial equivalent of a Twitter social graph or a Facebook social graph. And that’s fine, right? In the good case, all I’ve got here is a map of a bunch of people that pay each other: maybe I pay you, then you pay Alice, then Alice pays Bob, and you can see the way the money goes. But let’s imagine someone in our network turns out to be nefarious, and perhaps we’re able to identify them.
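
The payments map described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names, the amounts, and the helper functions `build_graph` and `money_path` are invented for the example, and a graph database would store and traverse this structure natively rather than through hand-written code.

```python
from collections import defaultdict, deque

# Each payment is a directed link: (payer, payee, amount).
# Names and amounts are invented for illustration.
payments = [
    ("Jim", "Chris", 20),
    ("Jim", "Chris", 35),   # a second payment adds a second link
    ("Chris", "Alice", 50),
    ("Alice", "Bob", 45),
]

def build_graph(payments):
    """Map each payer to the list of (payee, amount) links leaving them."""
    graph = defaultdict(list)
    for payer, payee, amount in payments:
        graph[payer].append((payee, amount))
    return graph

def money_path(graph, source, target):
    """Breadth-first search for the shortest chain of payments from source to target."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for payee, _amount in graph.get(path[-1], []):
            if payee not in seen:
                seen.add(payee)
                queue.append(path + [payee])
    return None  # no chain of payments connects the two people

graph = build_graph(payments)
print(money_path(graph, "Jim", "Bob"))
```

Keeping one link per payment, rather than one per pair of people, mirrors the "several links between us" idea in the interview.
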
So the first benefit we have is being reactive, quickly reactive. Let’s say that Alice is rogue, and ultimately the payments that go to Alice are fraudulent: she’s acting in bad faith, or indeed Alice is acting as an intermediary for other people who are working in bad faith. The moment we’re able to spot that, either through traditional means, your spreadsheet analysis, or through graph analytics, we can react quickly. We can figure out that Alice is the problematic actor here, and then we can take action on everyone connected to Alice. Even if the connections between us and Alice are indirect, we can find those out very quickly. With a modern graph database, when I run Neo4j on my laptop, and it’s a laptop, right, not a fancy piece of hardware, I can traverse about 10 million of those links per second per core. So I can do a lot of computation very quickly. If I find a bad actor, I can quarantine them very quickly, and I can understand the spread of the pathogen. But actually, now that’s trivial. The more interesting thing is that now I’ve got a social graph, I’ve got 300 years’ worth of graph theory that I can bring to bear in terms of graph analytics: to identify clusters of bad actors, to look for nodes in the graph that have high centrality, popular paths and so on, and to clamp down on fraudsters where a lot of transactions are flowing through.

And as far as I understand it, you’ve got nodes, you’ve got edges, and then you’ve got properties, haven’t you? And you’re supposed to do queries against those properties

5:00
to use. So you say, well, this person here, or, let’s say it’s the Central line, let’s look at everything that’s on the Central line, and you can then use that query. For me, it sounds almost like turning relational databases inside out: instead of looking at points, we’re looking at relationships and how things relate to each other.

Yeah, I have a lot of sympathy with that. I think actually, in a way, graph and relational are kind of kin, right? Especially compared to the NoSQL databases. And by the way, I’m jealous that relational got the name, because the relation in relational databases means row, right? That’s a way of saying row. And graph databases are much more about relationships. But they are kin in a way, because when you build a relational model, you build this high-fidelity model using tables and all that kind of stuff that we know, normalising and then denormalising. You’re building a high-fidelity model, then you ask questions of it, you query it, and you get an answer. That’s very different from the NoSQL databases, where you do store and retrieve, and the level of insight you can get from the queries is pretty primitive. The expectation is that you’ll pull your documents or your columns back from those stores and process them using code to gain insight. It’s a very different model, a kind of symmetric store-and-retrieve pattern. Graphs are much more similar to relational, right? You build a high-fidelity model, then you ask it questions, and it gives you answers. The major difference is that the data model is different.
So instead of having rows or relations, and eventually, of course, joins, because most sophisticated relational systems have joins, we have very simple idioms. We have nodes, which we think of as little circles, and they’re records that tend to represent entities.

6:46
A node is the moral equivalent of a row in a table, really. Those nodes can have labels, zero or more labels, and the labels tell you the role of that node. So for example, if I had loads of nodes labelled Person, that’s the moral equivalent of the person table, right? Then we connect these nodes together with relationships. Relationships are like arrows: they have a start and an end, and they have a type. So we could create a simple graph that has a Person node representing me and a Person node representing you, and I could say that I follow you on Twitter or something. So there’s an arrow drawn between us that says “follows”. And as you mentioned, properties: I can put property data on any of those things. On my node, I might have my name and my email address. On your node, you might have your name, your date of birth and your postcode. And on the relationship between us, the Jim-follows-Chris relationship, I might even put “since 2018”. That gives us a really rich information model. And that’s it, that’s the complete model. You repeat that to build out large networks. Then you query it using a language called Cypher, which is currently being standardised as ISO GQL. The idea there is that you draw ASCII-art pictures of the patterns that you’re looking for, and then the database goes off, finds matches for those patterns, and returns you that insight.

Which is very much the stuff we were doing in Excel, but we were doing heavy number crunching to do it. So it sounds like you can do it with a relational database, but it’s just not as quick. Is it when you have massive numbers of relationships, or you’re traversing really large data sets, that the advantage comes in?

It can be, right. We all know relational, and I have nothing against relational at all. In fact, I used to teach relational when I was a postgrad, right? And it’s what we know.
And I think relational has been a fine idiom for us for many decades. People know it, and they use it. Actually, some people are really creative at using relational; they can do things that astonish me. But one of the things I think is causing people to reflect on data models is modern data. Relational grew up in an era that I imagine, as a kid at the time, as a very grey era. The thing is, back then everyone was similar: the data that we had was pretty homogeneous. If you and I worked for the same company, we’d have the same tabular structure: employee number, last name, first name, pay grade, blah, blah, blah, and everything would be the same. That data fitted really well in Excel spreadsheets, and also in the relational tables that we’re used to. People have various theories about why relational took off; I think the data model suited the data that we had. In my career, I used relational databases extensively before I came to Neo4j. I was building business information systems, and it was always “the database”, and it was obvious that the database meant either Oracle or SQL Server or DB2, obviously.
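
The property graph model described earlier in this segment (nodes with zero or more labels, directed relationships with a type, and properties on both) can be sketched in Python as follows. This is a toy model for illustration, not how Neo4j stores data; the `Node` and `Relationship` classes and the `match` helper are hypothetical names introduced here. In Cypher, the one-hop pattern that `match` imitates would be written in the ASCII-art style as `(a:Person)-[:FOLLOWS]->(b:Person)`.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                   # zero or more labels, e.g. {"Person"}
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node                                   # relationships are directed: start -> end
    end: Node
    type: str                                     # each relationship has one type
    properties: dict = field(default_factory=dict)

# The example from the interview: Jim follows Chris, since 2018.
jim = Node({"Person"}, {"name": "Jim"})
chris = Node({"Person"}, {"name": "Chris"})
follows = Relationship(jim, chris, "FOLLOWS", {"since": 2018})

def match(relationships, start_label, rel_type, end_label):
    """Return (start, relationship, end) triples that fit a one-hop pattern."""
    return [
        (r.start, r, r.end)
        for r in relationships
        if r.type == rel_type
        and start_label in r.start.labels
        and end_label in r.end.labels
    ]

for a, r, b in match([follows], "Person", "FOLLOWS", "Person"):
    print(a.properties["name"], r.type, b.properties["name"], r.properties)
```

A real graph database adds indexing, transactions and a query planner on top, but the data model itself is no larger than this.
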

9:40
But over time, as data got more interesting, I noticed that the schemas we had started to be only part of the picture of the data. So we’d diligently have some smart people, some really skilled DBAs, who would design schemas, normalise them and then denormalise them. That’s all good stuff.

10:00
But then we’d find that we’d also have to do some parts of the processing in client code, like writing C# or JavaScript or something, particularly looking for things like nulls. Some of the tables we were building tended to be very sparse. Some of the joins we were doing tended to be very extensive. And these are weird things for relational, right? We don’t like them in relational, because relational doesn’t like nulls. So meanwhile, out in my C# code, I’ve got this weird set of workarounds. We’re going to write nested ifs to the nth degree, checking for null here: if it’s not there, then do this and don’t do that; if it’s not there, do this and don’t do that. It got complicated, right? I think it’s weird, because the relational database engines were getting better and better, but the nature of the data started to become much more heterogeneous.

Yeah. Do you think we’ve been trained to think in terms of relational databases, so that it’s almost in our DNA, even in the way we think about ourselves? It’s quite easy for us to think, I’m a person and I have individual properties associated with me. But actually, if you look at it, we’re built off relationships to a certain extent, which is almost an added layer of complexity. So is there a new way of thinking that we’re evolving towards? Is the relational way we’ve been trained to think about things maybe not the way the world actually works?

We are, right, because the universities are brilliant at teaching us both the use and the theory of relational databases. So we come out of university and we’re quite good at it. And the relational database theory is still remarkably good.
In fact, even in building a graph database, we are influenced by the research work that’s gone on in that community. There are bits of the database that I work on where I can point and say, look, this is definitely from some relational database papers, from relational database researchers, and some of that machinery works. But on the data model, I think you’re right: it’s time that we take a look at other data models. We live in a world of relationships; it’s very natural for us. And in the world we live in, the relationships are heterogeneous. It’s very easy for me to tell you, Chris, I own a car, and I own a laptop, and I own a dog. That’s very normal. In a graph world, that would be a node representing me, an arrow saying “owns” to a car, an arrow saying “owns” to a laptop, and an arrow saying “owns” to a dog. That’s very simple in graph. But it’s not simple in relational, right? Because then I’ve got to do this weird thing where I can’t say “owns” three times, so I’m going to say owns-dog, owns-laptop, owns-car. So there’s a little bit of implementation complexity that starts to creep in. Graph doesn’t have that.

But to your point, if I own a dog and a car, but you don’t own a dog, then they become properties of me, don’t they? So I have a Corolla, a cat, whatever it is. And that kind of relationship changes, which makes it more complex.

I completely agree with that. I think that data is driving this. If we were just trying to do a technically driven thing, hey, we woke up one morning and thought, wouldn’t it be nice, we’d get nowhere. The richness, the interconnectivity and the complexity of modern data is driving us towards trying other data models. I think you’re absolutely right there.
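
The contrast between one column per kind of ownership and a single "owns" relationship can be made concrete with a small Python sketch. Everything here is invented for illustration (the column names, the owned items, and the `owned_by` helper), and a real relational design could also use a separate ownership table; the point is only to show where the empty cells come from in the wide-table approach.

```python
# Relational-style wide table: one column per kind of thing a person might own.
# None marks "does not apply", so the table gets sparser as columns are added.
columns = ["name", "owns_car", "owns_laptop", "owns_dog"]
rows = [
    ("Jim", "hatchback", "laptop-1", "terrier"),
    ("Chris", "saloon", None, None),
]

null_count = sum(value is None for row in rows for value in row)

# Graph-style: a single OWNS relationship type, stored only where it holds.
owns = [
    ("Jim", "OWNS", "hatchback"),
    ("Jim", "OWNS", "laptop-1"),
    ("Jim", "OWNS", "terrier"),
    ("Chris", "OWNS", "saloon"),
]

def owned_by(person):
    """Everything a person owns, whatever kind of thing it is."""
    return [thing for owner, rel, thing in owns if owner == person and rel == "OWNS"]

print(null_count)
print(owned_by("Jim"))
print(owned_by("Chris"))
```

Adding a new kind of possession means a new column (and more empty cells) in the first layout, but only new entries in the second.
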
Imagine if we were to design a relational table together, and the first requirement we had is, okay, you have to be able to say whether a person owns a house. Cool, new column, all set. But actually, the next person we put in is a renter, so they don’t own a house. So we’ve already got our first null. That’s

13:32
okay. Now we do owns-pet: only half of people own pets, so the other half is going to be null. And this gets very messy very quickly.

I suppose I want to ask the question: I spent a lot of time looking at credit risk, and at the different modelling techniques that come out of it. Can you use graph databases to do some of that kind of modelling, particularly predictive modelling? Is that theory out there, and how does it manifest itself? Are you seeing that used, particularly in financial services or insurance, which are more the areas where I work?

Yeah. You could imagine the financial services companies are always looking for competitive advantage, so they were relatively quick off the blocks when it comes to adopting this technology. In fact, they were the second quickest off the blocks. The first quickest off the blocks were the telcos, right, the networking companies. Neo4j’s first blue-chip customer was actually Cisco. And I have to say, I think it’s because networks make a bunch of sense to those folks: a graph is just a synonym for network. I think that’s why. But then the financial services companies are always looking to get that kind of leg up, so absolutely you can. I think graph theory in and of itself has a wonderful bunch of predictive capabilities. Graphs involving humans in particular have been studied really well, and they have known properties. For example, one of the classic properties in human graphs is that if you and I are friends, and you’re friends with another person that I don’t yet know, I’m probably going to become friends with that person, because you’ve vetted them for me: like, I

15:00
like Jim, I like Alice, so Alice and Jim are probably compatible. There’s a fancy name for that: they call it triadic closure. It really just means make a triangle in the graph. Okay, close the triangle. But it turns out that in human graphs, those kinds of things are very common. So you can use a bunch of simple rules from graph theory, independent of your domain. You may be talking about risk, I may be talking about social networking, or

15:22
logistics and shipping or whatever, but the same kind of rules apply. There are a bunch of rules you can use to start to evolve your graph and see how it might mutate. In Neo4j, we have this notion of graph-native learning, and we can actually ask the graph to predict missing relationships, to predict missing labels, and even to predict missing properties on nodes. So you can start to think about how the graph might evolve.

And I suppose I’m used to seeing modelling built off more traditional data tables. Can you convert between the two, from relational to graph, transport the attributes between them, and use those to make more predictive models? Is that currently in place, for that competitive edge?

It is, right, but I wouldn’t advocate for converting; there are moral equivalents. For example, if you’re used to building ML models by harvesting features from a relational database, and it’s commonplace now, right, everyone’s doing that, you might take age, postcode, bank balance, blah, blah, blah, as your features for this risk model. You’ll build and test the model and ultimately deploy it, all good. But what data scientists tell us is that, within reason, more features is better than fewer features. If you’re in the relational world, you can harvest features from your columns, but what else can you do? Certainly, things like name are a terrible feature, right? We’d much prefer numerical features. But in the graph we have more features. If you just looked at the graph, you’d say, look, Jim, I can see I’ve got people, and they often have names and account numbers, they have associated accounts, the accounts have balances, and they might be jointly held or something. But where do I get more from? Actually, the topology is the “more”: you’ve got richer structure in the graph.
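
Triadic closure, mentioned above as a simple rule for predicting missing relationships, can be illustrated with a common-neighbours heuristic. This is a minimal sketch and not the method Neo4j uses for graph-native learning; the friendship data and the `predict_links` helper are assumptions made for the example.

```python
from itertools import combinations

# Undirected friendships; the names are invented for illustration.
friends = {
    "Jim": {"Chris"},
    "Chris": {"Jim", "Alice"},
    "Alice": {"Chris"},
}

def predict_links(friends):
    """Score each unconnected pair by the number of friends they share."""
    scores = {}
    for a, b in combinations(sorted(friends), 2):
        if b in friends[a]:
            continue  # already connected, nothing to predict
        shared = friends[a] & friends[b]
        if shared:
            scores[(a, b)] = len(shared)  # each shared friend is an open triangle
    return scores

print(predict_links(friends))
```

Here Alice and Jim both know Chris, so the open triangle between them is the one candidate relationship the heuristic proposes.
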
So now I can run community detection algorithms or centrality algorithms. I can run PageRank over this and look for the popular people in the graph. If you’re looking for risk, let’s look at where the money flows. And then, look, there’s that popular institution or whatever, let’s look at what it’s backed by. So now I’ve got centrality, I’ve got labels, I’ve got communities, and all of these can be taken as features. What’s more, I can even take the topology itself, encode it numerically using simple node embeddings, and use that as a feature. Now my machine learning model is rich. Our experience is that by harvesting what we call graph features to accompany your normal business features, you can make predictive classifiers that are substantially better.

Yeah. So you use those other features to make models even more powerful, as a result of looking at the topology of the network.

Yeah, and it’s a freebie, right? Because if you’ve modelled the graph, you’ve got the topology. So let’s put it to use.

That’s quite interesting. If you look at networks, I think it was in operational theory that you can use metadata analysis to look at how networks interact. It’s not what was actually said; it’s the nodes and how they interact that is quite indicative of what’s actually happening. That layer of information tells you as much as what’s actually being said.

Right, yeah, you’re absolutely right. A few years ago, I had the privilege of meeting a chap called James Fowler, who’s a public health physician. He was a co-author of a really lovely book called Connected: really wonderfully written, and really wonderfully sciency.
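
The idea of harvesting graph features alongside ordinary business features can be sketched as follows. This is a hedged illustration, not Neo4j's implementation: `pagerank` is a plain power-iteration version of the algorithm with an assumed damping factor of 0.85, `graph_features` is a hypothetical helper, and the payment edges are invented.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank evenly over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

def graph_features(edges):
    """Per-node numeric features derived purely from topology."""
    ranks = pagerank(edges)
    in_degree = {n: 0 for n in ranks}
    for _src, dst in edges:
        in_degree[dst] += 1
    return {n: {"in_degree": in_degree[n], "pagerank": ranks[n]} for n in ranks}

# Invented payment edges: three people pay Alice, and Alice pays Jim.
edges = [("Jim", "Alice"), ("Chris", "Alice"), ("Bob", "Alice"), ("Alice", "Jim")]
features = graph_features(edges)
for name, row in sorted(features.items()):
    print(name, row["in_degree"], round(row["pagerank"], 3))
```

Each row could then be joined to the usual columns (age, postcode, balance) before training a classifier, which is the sense in which the topology comes as a freebie once the graph is modelled.
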
He studies pathogens in communities, so things like smoking: how would smoking spread through a community? He gave this talk and he said, look, given a choice, if I could know in detail about the record that represents you, your name, your age, your postcode,

18:50
I would rather know more about your connections than about you, because your connections tell me more. And obviously, the graph person in me wants to believe this, but it’s hard to swallow. And I thought, I have a solution for you.

19:02
Because modelling this in a relational database wasn’t going to help him there. But moreover, the thing that really astonished me, and it shouldn’t astonish me, because I’m a graph advocate, right, I love this stuff, was that he said, actually, what we found in our research is that these pathogens spread to depth two. So if I smoke, and my kids don’t smoke, but their friends do? And I’m like, what? I talked to him afterwards and said, this doesn’t feel right. And he said, “Jim, your feelings are important, but not as important as my data.” So, only in that sense. But it was nice, because he showed these counterintuitive things. Using graph theory, you can actually demonstrate, as you say, the impact of the topology on the way that people behave, in this case, or pathogens spread. And it was amazing to me that that richness just came out of the way the graph was structured. It didn’t matter about the individuals, their gender or their age or their zip code or whatever; it was all about structure,

20:00
and being able to influence people through structure blew me away.

Yeah, so this stuff is quite hard to think about, for the layman at least. I find it quite difficult. You’re obviously an advocate of it, but when you go into businesses, how much of the difficulty is around the understanding? Because it seems like a different way of looking at the world, and a different way of thinking about how you analyse data. How much of a challenge is that, and what can we do to change the way we think about relationships, to change our mindset to a certain extent?

Yeah, look, on the fly, we could make up an analogy, so let’s see if this goes terribly wrong. Do you ride a bike?

I do ride a bike.

Do you find riding a bike simple?

I do now, yes.

20:39
Software as you get older, but can you? It’s a long time ago for both of us, I imagine, but learning to ride a bike was not simple.

No, it wasn’t, right.

20:50
Yeah. How does that even work, right? I mean, it’s like magic. But I feel the same about graphs. Once you’re over the initial learning curve, it’s really wonderful. It’s easy, it’s natural, because you’ve got simple idioms: you’ve got entities and associations between them, and you can build rich models. The hardest thing about going to graph, particularly for folks like us and many other information systems professionals, is that we know relational really well, and relational puts us on rails, right? Building the schema: first, second, third normal form, Boyce-Codd normal form, denormalise. We’ve got the muscle memory for it. You come to graph and you’re like, okay, I’ve got a whiteboard in front of me. Now, I know I’m not going to draw a table, so where do I start? Okay, I’ll draw a person, and I’ll connect that person to another person, and I’ll connect that person to a bank account and this person to a credit card. And then you’re like, is this correct? Is there some kind of normal form? And it’s worrying. I remember very clearly the first time I picked up Neo4j. I was working at a telecoms company, and I had a problem with product recommendations that I was going to solve. The first time I picked up Neo4j, I built the model, pulled in some data from the product catalogue, and got it to build what I thought was a graph. It gave me great answers. It said, if you already own these products, I’ll recommend you these ones, because they’re compatible upsells. And instead of being happy that I was able to do that in one very long working day, I was terrified. I was like, is this right? Because in relational I would know, right? First, second, third, Boyce-Codd normal form: correct, no duplication, my model is good, whatever. In graph, it was really hard. I got a fraught night’s sleep, came back to work the next day and showed some other people, and they were like, we don’t know, we’ve never seen this stuff before.
This was 2008; this didn’t exist. And then over time, I realised that you can sanity-check what you draw on the whiteboard by reading it in English. Jim owns, Jim bought internet. Okay. Internet is a service; digital phone service depends on internet. Okay, that makes sense. Mostly, what you draw on the whiteboard is what you store in the database. And the biggest challenge that information systems professionals will have is letting go: unlearning the reflexes, habits and muscle memory that we have from years of good-quality relational work, and saying, okay, this is a different idiom. I’ve got to drop my biases and preconceptions from the relational world and embrace this idiom. It’s simpler. There are different techniques. But it affords me so many more degrees of freedom. So for certain projects, this is going to be a very worthwhile investment.

And I suppose we’ve talked a bit about network analysis, with the telecommunications example there, and a bit about fraud and financial services, and recommendations. But you also mentioned things like medicine as an example. So where are the areas where using graph theory, and these kinds of tools and techniques, is expanding, and where you think there’s really a big opportunity?

That’s a really awful question. And I’m not maligning you at all; it’s just awful because you’ve given me the tyranny of choice. It would have been much better if we’d had this chat 10 years ago, when I could have given you five very crisp use cases. We would have done network, we would have done social, we would have done fraud detection, and blah, blah, blah, right? But it turns out that in the intervening time, graph has been picked up as a very general-purpose database. In fact, I feel that graph will be the general-purpose database for the coming decades, in much the same way that relational has been for the previous decades.
But that means I’ve got a bewildering array of use cases now. For example, one of the ones that tickles me is NASA. They’re trying to get to Mars. NASA has a problem, as do many enterprises: organisational dementia. They don’t know what they know. So if someone over here has done research on particular rocket boosters or something, that useful information just floats around over there, never propagates, and gets forgotten about. So for the current missions to Mars, NASA actually constructed a knowledge graph to try to capture what they know, initially just by looking at who collaborated with whom and the papers they’ve written, and using

25:00
that as the skeleton to hang the other stuff off. And from that they actually found a solution, from the Apollo missions, to a problem they were having, which has now shortened their Mars mission plans by two years, which is incredible. It’s a lovely thing. You’re also seeing, for example, the DZD, the German diabetes research centre, which we work closely with, using the graph in a kind of biopharma way to try to figure out cures for diabetes. They’ve got a very sophisticated, domain-specific knowledge graph that allows them to explore, in a digitised form, the kinds of interactions between medicines and so on, to capture their knowledge and their results, and to use the graph to steer them more quickly towards productive research, in the hope of coping with or, in the long term, curing diabetes. Flip to something completely different.

25:52
There’s a big shipping company called OrbitMI. They have these big cargo ships that go around the planet. What the heck do cargo ships want to do with graphs? The truth is, the original graph theory problem was a route planning problem for the emperor of Prussia. He wanted to walk around Königsberg, modern-day Kaliningrad, which has the seven bridges, and the question was posed: can the Emperor walk around them and cross each bridge once and only once? That was the cue for a mathematician, Leonhard Euler, to either do a lot of walking, which mathematicians don’t like, right, or to invent graph theory and then prove that it was impossible to cross each bridge once and only once if you wanted to visit the whole of Königsberg.

Fast forward to today, and the same kinds of techniques are being used. But you’ve got these massive cargo ships going around the world. Today, as we speak, the Suez Canal is low because there’s been low rainfall, so it’s actually throttled. So now I have to make routing decisions about what I am going to do with these ships that are currently floating around heading towards the Suez Canal, to reroute them. Actually, for the sake of argument, Southampton is chock-a-block today, you’re going to be queued if you want to get in there, but Portsmouth is pretty clear, and it’s deep enough for your ship. In split seconds, they’re making thousands of decisions about this huge fleet. The ultimate goal of their customers is to get those containers to where they need to be, so being able to reroute them, because there is a typhoon in the Philippines or the Suez Canal is blocked or low or whatever, is saving them, not to put too fine a point on it, millions in terms of effective use of port provision. And what’s more, for us, even if we’re not interested in supply chain, it also saves 60 megatonnes of CO2 per year from going into the atmosphere.
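As a small illustration of the Königsberg story above, here is a minimal Python sketch of Euler’s degree-counting argument. It assumes the classic layout of seven bridges joining four land masses; the labels A to D and the function names are made up for this example. The fact it relies on is that a connected graph has a walk using every edge exactly once only if zero or two of its nodes have an odd number of edges.

```python
from collections import Counter

# The seven bridges as an edge list. The land-mass labels are illustrative:
# A = central island, B = north bank, C = south bank, D = eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges, sorted by name."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """True if a walk crossing every edge exactly once can exist.

    Assumes the graph is connected (true for the bridges); this function
    only checks the odd-degree condition, not connectivity.
    """
    return len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(BRIDGES))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))    # False
```

All four land masses have an odd number of bridges, so no such walk exists, which is the impossibility result described in the interview.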
So you’re now starting to see this kind of graph tech being used for good business purposes that have these beneficial side effects for the rest of us. And it goes on and on. It’s too radical a question to ask.

Well, we’ve almost got to think of it in terms of: anywhere where there are relationships, can I better analyse those relationships to really come out with a better understanding of outcomes, if I use financial services kind of language. The knowledge piece is interesting in terms of just thinking about each of the pieces of information we have and how they relate to each other. I think that’s very interesting, particularly as there’s just been such an explosion of data. And everything does relate to each other, which is a difficult piece to glean in credit risk, right? Things relate over time, things relate in different kinds of ways, they have different attributes. And it becomes very complex when you start looking at it. And it’s interesting to think about relationships in a different way than we would traditionally think about them, for sure.

So for example, in credit risk, and I’m no expert here, so forgive my amateurish attempt, there is complexity in the domain, right? The kinds of analysis you have to do are complex. But what graph gives you there is that at least the data model doesn’t get in your way. If you want to relate something to a previous version of itself, and that to a previous version of itself, and all of those have different associativity with other instruments and people and money and all that stuff, then you can, and at least the data model doesn’t trip you up. And then when you come to query it, you just need to think about querying it as an associative data structure, as a network or a graph, rather than taking the mindset of: I create the universe of possible answers, and then I filter, which is what we do when we do it relationally.

Yeah.
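To make the contrast between “create the universe of possible answers, then filter” and querying an associative structure a little more concrete, here is a rough Python sketch. The data, the relationship names (OWNS, SECURED_BY) and both function names are hypothetical, invented for this illustration. The first function is a deliberate caricature of the mindset rather than of how a real relational engine executes a join, since real engines use indexes and query optimisers.

```python
from itertools import product

# Hypothetical credit-risk data as (source, relationship, target) triples.
EDGES = [
    ("Jim", "OWNS", "Loan-1"),
    ("Loan-1", "SECURED_BY", "House-7"),
    ("Ann", "OWNS", "Loan-2"),
    ("Loan-2", "SECURED_BY", "House-9"),
]


def collateral_by_filtering(owner):
    """Pair every row with every row, then keep only the pairs that line up."""
    return [
        second[2]
        for first, second in product(EDGES, EDGES)
        if first[0] == owner
        and first[1] == "OWNS"
        and second[0] == first[2]
        and second[1] == "SECURED_BY"
    ]


# Index once so that each node knows its own outgoing relationships.
ADJACENCY = {}
for src, rel, dst in EDGES:
    ADJACENCY.setdefault(src, []).append((rel, dst))


def collateral_by_traversal(owner):
    """Start at one node and follow only the relationships attached to it."""
    assets = []
    for rel, loan in ADJACENCY.get(owner, []):
        if rel != "OWNS":
            continue
        for next_rel, asset in ADJACENCY.get(loan, []):
            if next_rel == "SECURED_BY":
                assets.append(asset)
    return assets


print(collateral_by_filtering("Jim"))  # ['House-7']
print(collateral_by_traversal("Jim"))  # ['House-7']
```

Both functions give the same answer. The difference is that the filtering version looks at every pair of rows in the data set, while the traversal version only touches what is connected to the starting node.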
So, since we last spoke, I came across Stephen Wolfram and some of his hypotheses. I was going to chat a little about that; I’m going to go off the wall a little bit now. But he was talking about using graph theory to model the universe. I’m not sure if you’re familiar with that, or whether they could even be using your technology to do it, which I thought was interesting: even the whole of the universe is almost like relationships between nodes. It’s graph theory almost at a macro level. I thought that was interesting. After we chatted, it made me think of it.

I suspect they’re using their own stuff, because I don’t know that they’re using my stuff. But I also suspect that in their world the graphs are a mathematical thing, because Wolfram is a phenomenal mathematician. I want to believe it. These are challenging, complex problems. It’s going to be hard to compute no matter how you represent it.

30:00
But I do think one step down from where geniuses like Wolfram are, where just semi-geniuses like the likes of Alphabet and so on are, we see a big move to graphs as the underlay and computational model for AI, right? So people who are talking about general-purpose AI, the leading lights in that world, are gravitating that way. So it does feel like the luminaries out there, that the rest of us look up to and try to emulate, are thinking about this, right? They’re thinking about bringing the mathematical elements of graph theory, from Wolfram’s point of view, or the utility of the data model, from the AI and ML point of view, to the fore for the kind of ambitions they have for the near future.

Now, this was also fundamentally thinking about relationships and how everything interrelates, which I think is fascinating, at least from a physics point of view. And there’s going to be a bit of a challenge there, at least.

Anyway, yeah, I don’t think it’s possible today to model the universe in Neo4j. It’s one of those things where you need a database the size of the universe to model the universe. But I’m sure that you could do things like subgraph isomorphism, effectively pretending that a big chunk of the universe is the same as a single node. And these techniques apply even to credit risk systems or social networks as well. So you can start to use those techniques to reduce the search space or computational space of a problem so that it’s manageable in the graph.

But it does feel like a lot of these techniques are layered. When you think about computer programming, you started off back in the day with assembly code, remember that, and machine code, and then you get the different levels of computer programming languages that build up, and each one sits on top of the other, right?
And so, to a certain extent, do graph databases sit on top of relational databases, as much as they complement them? It’s almost like we’re just building up technology on technology that allows us to simplify things to a certain extent.

31:50
And the answer to that today is no. Neo4j, for example, the database I work on: it’s not like it’s a graph layer on top of SQL Server or Oracle. But, and not too many people know this, where Neo4j came from was exactly that. So my boss, Emil Eifrem, the CEO of Neo4j, in a previous company was working on a content management problem, enterprise content management, which is a very tree-and-graph kind of problem. And they had a good relational database. He’d worked on previous systems where the relational database accelerated the development of systems. And he was frustrated as to why, in this particular system, the relational database wasn’t accelerating their progress, and indeed why it was considered to be contentious. People would say they were fighting with the database, which is a weird thing: back in 2000, relational databases are amazing, everyone loves them, they accelerate your system development.

And it turns out they were trying to do graph operations. And two things were difficult for them. One was SQL: it’s really hard to write SQL to do path operations. You get into that kind of recursive join thing, and the SQL blows up and becomes complicated. So they solved that. They said, what we’re going to do is write a graph API on top of the database, so that we can say A connects to B connects to C, or find me a path between Alice and Charlie, or find me the shortest path between Alice and Charlie, that kind of thing. And they did that, and it solved one problem. No longer did they have big, cumbersome SQL programmes. They had nice, readable, graph-oriented programmes that were small, easy to reason about, easy to debug. That created a second-order problem. People really liked the graph API, and they started using it heavier and heavier.
And you and I, and I think many of the people that will be listening to this, know what’s going to happen, right? Under the covers, it starts doing lots of recursive joins. That ground the database into the dust. And then, quite ridiculously, some bright spark on that team said, hey, what we should do is replace the engine with an engine that’s native for graphs, designed to do it. Imagine doing that in the year 2000, right? It’s ridiculous. But they did. And that’s actually where Neo4j came from: a gradual replacement of SQL with something that’s more graph-oriented, and a replacement of the underlying storage engine with something that’s more graph-oriented, which led to the Neo4j that we have today. So yeah, as I think I said earlier, graph databases and relational databases are both high-fidelity models. They’re both typically ACID, transactional, kind of solid: you can put data in them and trust them, you can cluster them and scale them and all that stuff. But fundamentally, I think that the data model that can be processed by graph databases is much more interesting. And it tends to be much more performant; it tends to be very lightweight on the use of the underlying compute resources.

So if you look out into the next five years, where do you think we go from here with it, then?

Yeah, that’s another too radical a question. Thank you very much. Remind me to interview with you again one of these days.

You can ask the other questions.
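The “shortest path between Alice and Charlie” request from the origin story above can be sketched both ways in Python. This is an illustration under stated assumptions, not Neo4j’s implementation or the original team’s code: the edge data, the table name `edge` and both function names are invented here. The first function answers the question with a recursive join in SQL (SQLite’s `WITH RECURSIVE`, via the standard `sqlite3` module); the second walks an in-memory adjacency structure with a breadth-first search.

```python
import sqlite3
from collections import deque

# Hypothetical directed connections, in the spirit of "A connects to B connects to C".
EDGES = [
    ("Alice", "Bob"), ("Bob", "Charlie"),
    ("Alice", "Dave"), ("Dave", "Erin"), ("Erin", "Charlie"),
]


def shortest_hops_sql(edges, start, goal):
    """Relational style: repeatedly join the edge table to itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    row = conn.execute(
        """
        WITH RECURSIVE reach(node, hops) AS (
            SELECT ?, 0
            UNION
            SELECT edge.dst, reach.hops + 1
            FROM reach JOIN edge ON edge.src = reach.node
            WHERE reach.hops < ?   -- bound the recursion so cycles terminate
        )
        SELECT MIN(hops) FROM reach WHERE node = ?
        """,
        (start, len(edges), goal),
    ).fetchone()
    conn.close()
    return row[0]  # None if the goal is unreachable


def shortest_path_graph(edges, start, goal):
    """Graph style: breadth-first search from the start node."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(shortest_hops_sql(EDGES, "Alice", "Charlie"))    # 2
print(shortest_path_graph(EDGES, "Alice", "Charlie"))  # ['Alice', 'Bob', 'Charlie']
```

The SQL version expands every reachable node level by level and only then picks out the goal, which hints at why heavy use of a graph API layered on recursive joins could grind a relational engine down. The traversal version stops as soon as it reaches Charlie.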

34:55
It’s interesting. I think graph databases themselves, as a technology category, continue to march from strength to strength.

35:00
The analysts, the industry analysts, are bullish on the category. The web hyperscalers now have graph databases. It’s a thing, and it’s going to extend. The ISO GQL language is due at the end of this year. I think that’s an important inflection point. So it’s the same query languages committee that designed SQL originally that has looked at this and said, actually, graphs are different and interesting enough that they deserve a standard query language, to allow the vendors to build an ecosystem the same way SQL did. So they’ve been busily working on that for a few years, and the ISO GQL graph query language comes out this year. And I think, if I were a CIO looking at that, I’d breathe a sigh of relief now, because I’ve got a standard language. It feels like a shot in the arm for the category. But also, I’m not locked into any vendor: in the same way that SQL gave me degrees of freedom, GQL gives me degrees of freedom. So I think on that side graph databases are going to go up. We’re going to see an expansion of use cases; they’re going to be commonplace.

But I also think another driver is AI. I think knowledge graphs as an underlay for enterprise AI and beyond are quite sensible. If I might be slightly immodest, my colleague Jesús Barrasa and I just finished a book on this. It’s available at neo4j.com. Feel free: O’Reilly book, it’s nice, and so on. But I think that is also going to be something that drives the growth of graphs, both in the datasets but also in graph compute. So being able to run those graph algorithms that we learned at university with funny Dutch names, right? So all of that stuff, I think, is going to drive forward a kind of interesting, quiet revolution in analytics.
And that, in turn, is going to drive forward what people can do in the AI and ML space. I think over the next five years there’s going to be an explosion. It’s going to be fun.

You talked very eloquently about it being used within the data science space and those kinds of things, but it feels like, sitting slightly outside of that, there’s this whole mindset change around thinking about relationships and being able to query relationships. It feels like that piece could be changing, and this will unlock a bit more of the power around that on a much broader scale. With just the expanse of data we’ve got, and the expansion of data we’ve got, it feels like that could be a tool that’s sitting in the wings, frustratingly.

I couldn’t agree more, Chris. I think you’ve said that yourself very eloquently. So I will happily sit here smiling and say that.

You had me sold, as well, on the problems of having nulls.

37:24
So I know that can be a real problem, for sure: the complexities of describing systems where some of these things just don’t exist, right, for some people, but they do exist for other people.

Right. So it’s amazing, the relief you get when you realise: I don’t have to deal with nulls. In a graph, things exist or they don’t. And that’s a perfectly normal state of things.

That was good, at least. Anyway, Jim, thanks very much for explaining everything. I really appreciate it. I’ve learned a huge amount from this. And I think it’s fascinating, just in terms of how it’s evolving, but also how we think about relationships as well. I think we’re just in this new relationship kind of world. And for me, there’s just that mindset change in terms of thinking about it going forward. I think it could be really powerful.

I think so. Good. I hope so as well.

Thank you for taking the time out today. I really appreciate it. Thanks very much, Jim.

#Neo4J
