Graphs come from mathematics, and are nothing more than a way to formally represent a network, which is a collection of objects that are all interconnected (this is all stuff you should already know if you have been religiously listening to this podcast, which you should be). Now we're going from theory to practice and talking about how to represent graphs.
This episode of the Basecs podcast is based on Vaidehi's blog post, From Theory To Practice: Representing Graphs from her basecs blog series.
[00:00:02] SY: (Music) Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.
[00:00:08] VJ: And I’m Vaidehi Joshi, Author and Developer.
[00:00:11] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today we’re talking about…
[00:00:15] VJ: Representing graphs.
[00:00:18] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website. And with their In-App Payments SDK, you can add it to your mobile app too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.
[00:00:54] SY: All right. Let’s do a quick recap of a graph. What is the definition of a graph?
[00:00:59] VJ: So a graph is a data structure that we’re pretty familiar with. And really if you’re defining it, you really just need to think about two parts. It has two distinct things within it. First, you have a set of vertices, which sometimes we often refer to as nodes and it’s a finite distinct set of vertices or nodes and then there’s also a set of edges, which are the links, the references and the pointers between the vertices and they connect the vertices into the graph structure. And that’s really all a graph really is, just these two distinct parts that are combined together to create a large or small structure.
[00:01:40] SY: And there’s different ways that we can talk about graphs, right? Like we can talk about their density.
[00:01:45] VJ: Yeah. And a lot of the ways that we talk about graphs is by describing what they look like and how they’re structured and density, as you mentioned, is one of those ways. When we’re talking about density in a graph, what we’re talking about is basically how dense it is, in other words, how many edges it has compared to nodes. So when you have a very dense graph, you have many more edges that are connecting the nodes. On the other hand, if you have a graph that has a smaller edge to node ratio, that’s a sparse graph. And so for example, if you’re representing a graph or if you’re talking about a graph to someone and you’re like, “Oh, it has all these edges, lots and lots and lots of edges,” somebody could be like, “Oh, sounds like a dense graph.” But if you’re like, “Oh, it has a few nodes, but then like just some edges,” not a whole ton of them, just a few, that’s a sparse graph.
[00:02:36] SY: And then we can also talk about direction when it comes to graphs.
[00:02:40] VJ: Exactly. Yeah. And direction is basically like the flow of how you can go from one node to another and what it’s really describing is the edge that connects two nodes. So if you can go from one node to another in both directions, that’s an undirected edge. And if you can go in only one direction, it has a directional flow, that’s a directed edge. So if a graph has only directed edges, only those that have some directional flow to them, it’s a directed graph, and on the other hand, if you can go either direction through all the edges and traverse through the graph kind of any which way you want, that’s an undirected graph.
[00:03:17] SY: And the other thing that we talked about previously is how to represent graphs and there’s a mathematical way to do it. What was it again?
[00:03:24] VJ: So the mathematical way of representing a graph is G, where G is the graph, equals a set of V and E. That’s just like a mathematical representation of the thing we just explained with a bunch of words, which is that a graph is it’s a combined set of vertices and a set of edges. So basically if you have a graph, you will expect that there has to be a set of vertices and a set of edges. And depending on if the graph is directed or undirected, those edges are going to be represented as ordered or unordered pairs. We’ve already talked about that in previous episodes, but that’s just like a reminder that you will represent a directed graph and an undirected graph differently because the edges are going to look a little different.
[00:04:12] SY: Okay. So that’s the mathematical representation of a graph, but how do you actually code a graph?
[00:04:19] VJ: So the way that you are going to code a graph is really tied to the representation of the graph because to be honest like you could represent a graph in different ways because all you’re really doing is representing a set of edges and a set of vertices, right? So there are different ways that you can represent a graph, how you could theoretically represent it in code, and specifically how you would hold the memory of that graph. But I think the quickest way is by turning these groups of ordered pairs as it were into a list or an array because that’s like an easy tool to reach for. Most of us know some kind of way of representing an array or a list, so we’re like, “Okay.”
[00:05:03] SY: Yeah, we’re comfortable.
[00:05:03] VJ: Yeah. That’s something that we can do. That’s a really quick and easy way to represent this group of ordered pairs, this group of edges, group of vertices and how they connect to each other because that’s really what we’re trying to do. There’s actually a term for this simple version. The list or the array representation of a graph is called an edge list and really what it is, is a representation of all of the edges in the graph and sometimes the edges is referred to as like… you’ll see it sometimes written as just like capital E with like the absolute value signs around it. So this edge list just represents which edges exist in the graph and it’s a really simple representation. You can represent it as an array. You can also just kind of represent it as like a list table type of thing. If you just imagine like an Excel spreadsheet and you’re like, “I’m just going to count up all the edges.”
[00:05:58] SY: Yeah.
[00:05:58] VJ: You could kind of think of it like that or you could represent it as an array or object. All you’re basically doing is you’re representing each edge in the graph by including the starting point of the edge and the ending point of the edge and that’s like how you represent it in this list or array format.
[00:06:16] SY: So it’s interesting because when we’re talking about the mathematical representation of a graph, there’s the V and the E, right? There’s a thing in there for the vertices and a thing in there for the edges, but when it comes to an edge list, we’re not really talking about the vertices, right?
[00:06:32] VJ: Correct. You’re not explicitly coming up with a list of vertices, but you’re basically like implicitly figuring out what those vertices are because if you have an edge list, you know all the edges in the graph, and if you have all the edges, then you basically know like, “Oh, this node connects to this other node because it’s represented as an edge.” So if you have a list of all the edges, you also inherently do know which nodes are in the graph because they have to be connected to at least one other node, right?
[00:07:04] SY: Okay, cool. So can we walk through an example of what an edge list might look like?
[00:07:08] VJ: Sure. So let’s do a simple example, maybe just like three nodes, I’m not feeling particularly creative today. So let’s just say we have three nodes; 1, 2, and 3. And it’s just a triangle and they’re all three connected to each other. So one is connected to 2 and 3, and 2 is connected to 3 and 1, and 3 is connected to 2 and 1. And that’s the whole story. It’s just a trial.
[00:07:31] SY: The end. Yeah.
[00:07:32] VJ: So if we’re going to represent this graph in the form of either a list or an array, let’s start with an array, we know that we need to represent the edges and just thinking about this graph which just has three nodes that are all connected to each other, how many edges do we even have?
[00:07:49] SY: We have three.
[00:07:50] VJ: Exactly. We have three edges. Thank God we’ve kept it simple. And all we’re going to do is basically in our array, we’re going to basically represent each edge. So first, we have an edge that connects Node 1 to Node 2. So we could, in our array, put like a mini array, an element within it that is just 1, 2 and that represents that edge.
[00:08:12] SY: Okay.
[00:08:12] VJ: And then we kind of do the same thing. So we have an edge connecting 2 and 3. So then we have another element inside of our array, our edge list that is 2, 3 and then we have three connected to 1, and so we have to represent that third edge, 3, 1. We insert that into our array and now we have an array with three edges and each of those edges, because it’s an array, has an index. So in order to represent a graph with these three nodes, we can use this array or we could also transform it into a list. And in the list situation, we basically will have an index for each edge, again, sort of like a little Excel mini spreadsheet table where you have one column is the index, so 0, 1, 2 because we’re starting with 0 index lists, and each one of the nodes that connects each edge is also in its own column. So it’s the same idea, 1 and 2 is one edge, 2 and 3 is another edge, 3 and 1 is a third edge.
[00:09:11] SY: Okay. So how do we actually use this edge list? What can we do with it?
[00:09:17] VJ: One thing to note is that you can find whether two nodes are connected to each other by looking at the array or the list and seeing if there’s a connection if there’s an edge between those two. So if I wanted to see if the nodes 3 and 1 are connected, I would look at my array and I’d basically have to like Iterate through it and see if there is an edge that has 3, 1 or 1, 3, and there’s something important here to note which is that I just put all those edges in the array and in the list in any order, right? There was no rhyme or reason to it.
[00:09:55] SY: Oh, right. That’s what I thought. Yeah.
[00:10:00] VJ: And that’s important because the edges of that graph and the edges of any graph that you represent in an edge list don’t have to be in any particular order. So that means I’m trying to check if an edge exists between two nodes. Potentially, that edge could be at the very end of the list or it could be at the beginning. That would be nice. For example, if 3, 1 is the last element in the array, it’s the last item. I have to look through my whole list of edges to check whether it exists and like that’s not great, it’s not super efficient. It’s basically like it takes linear time to find an edge potentially. That’s like the worst-case scenario if we’re talking about Big O notation. Part of the reason for that is there’s no order, there’s no guaranteed order, and the only way to find something in this array is to iterate through it. So potentially, you could have like O of E time where E is the number of edges and actually that’s the same situation with how much space it takes too, right?
[00:10:58] SY: Right. Yeah. Yeah.
[00:10:59] VJ: We started with like a really simple graph, three nodes, but imagine if we were like, “You know what? I’m feeling really gutsy. Let’s do three hundred nodes.” Then we would have like an array with who knows how many edges because some nodes could be connected to many different other nodes.
[00:11:14] SY: Right.
[00:11:15] VJ: Yeah. It takes a lot of space and it could potentially take a decent amount of time.
[00:11:19] SY: Okay. So it sounds like we can use an array, we can do this edge list situation, but it doesn’t sound like it’s the best tool that we can use. Is there a different way we can represent graphs?
[00:11:30] VJ: Yes, there is. There’s another representation for a graph called an “adjacency matrix”. And an adjacency matrix is basically a matrix representation of exactly which nodes in the graph contain edges between them. So for what it’s worth, we did the same thing with a list, but an adjacency matrix is basically taking the same information and representing it in a different way, in a matrix format. So you can still sort of derive the same conclusions, but there’s some positives and I guess also some negatives to it, too.
[00:12:11] SY: Okay. Let’s go through what that actually looks like. So if we use our graph, our 1, 2, 3 graph, the one that looks like a triangle, everything is connected to everything else, what would that look like as a matrix?
[00:12:22] VJ: Adjacency matrices are always going to have a top row and a side row. It’s a little bit hard to describe, but I’ll try my best. In a matrix, you sort of like have two elements and then when you look at the intersection of them, that’s how you get some information out of them, at least in an adjacency matrix. So we have like a top line and a line on the left and each of these have a number associated with them and the number is the node.
[00:12:48] SY: Yes. So we have our like row header and column.
[00:12:50] VJ: Yes.
[00:12:51] SY: Sort of, right?
[00:12:52] VJ: You have the perfect terms. Yeah. I don’t know why I was struggling. I was like, “You have a line and this line is next to it.” Oh my goodness.
[00:13:00] SY: I’m here to help.
[00:13:02] VJ: Yeah. So you have this header row and then like this column next to it. And for each node, we have to represent it in both the header row and in the side column next to it. So you have 1, 2, and 3 on the header row because we have nodes 1, 2, and 3. And then we have 1, 2, and 3 going down the column on the left. And basically the way that you represent this matrix is when you find the intersection of each of the nodes you put a value for whether or not there is an edge between them.
[00:13:38] SY: Okay.
[00:13:38] VJ: So what that means is if I want to see if there is an edge between 3 and 2, I would sort of like take my finger and like draw a line from 2 on one column and 3 in one row and sort of like make them intersect.
[00:13:53] SY: Yeah.
[00:13:54] VJ: Does that make sense?
[00:13:54] SY: Yes. It’s basically like a graph, right? We have our X-axis and our Y-axis which is kind of going to the right and then going up and then we find our values. In this case, we’re going to the right and then going down to find our value.
[00:14:04] VJ: Exactly. Yeah. And another interesting thing to note about adjacency matrices is that we also have the ability to represent whether there is an edge between a node and itself because we have 1, 2, 3 on the top and 1, 2, 3 in the side. So theoretically, I could look for one on the top and one on the side and see where they intersect, which is where you could have an edge that references itself. However, most simple graphs don’t have self-referential edges.
[00:14:34] SY: I was going to say, I don’t think we’ve really talked about that before. That’s kind of weird to have a reference to yourself.
[00:14:39] VJ: Wouldn’t it be great if I just drop that bomb on you right now and I’m just like, “Spoiler!”?
[00:14:43] SY: This whole time. Yeah.
[00:14:46] VJ: I mean kind of a spoiler, a node can have a self-referential edge, but we’re not going to talk about that. Just pretend like we don’t know about it. It’s a secret.
[00:14:55] SY: No. No. Okay. So that makes total sense that we would kind of go across the row and then go down and find our corresponding node and then see what value is there. But how do we know if there is an edge? What’s the value that we’re looking for?
[00:15:09] VJ: So this is kind of cool. Because all we’re trying to do is represent is there an edge or is there not an edge because now we know what the two connecting nodes are, right? All we just have to do is figure out yes or no, edge or not. Because this is a binary situation, we can use a 0 if there is no edge and we can use 1 to represent the existence of an edge. So basically if I looked at the top row, the header, and then looked at the column on the side and I was like, “Oh, let’s see. Do 3 and 1 have an edge?” I would find 3 in the header, 1 in the side column, find where they intersect and if I see a 1 there, that means yes, there is an edge. But if I see a 0, that means sorry, no existence of an edge. These two nodes are not connected. The interesting thing here is I talked about self-referential edges and how we’re not going to have them, the side effect of that is that there is always going to be this main diagonal line that goes through an adjacency matrix for simple graphs where the diagonal going down the matrix is all zeros because if you imagine, if you find 2 and 2 on the header and the side column and you find where they intersect, if there’s no edge that 2 has that refers back to itself, it’s always going to be 0. So if there’s this diagonal line going down you’re like, “Okay, now I know there’s no self-referential edges. This is a simple graph.”
[00:16:35] SY: Okay. Cool.
[00:16:36] VJ: Another interesting thing about this is that if you know that the values on both sides of this main diagonal are the same, you know that the matrix is symmetric, which means that there’s basically an edge between 2 and 1 in the same way that there is an edge between 1 and 2 and like they’re basically equivalent and you know that there is an edge between both of those nodes. So it’s basically an undirected graph.
[00:17:03] SY: Right. Because if it was a directed graph, then I might have an edge between 2 and 1, but I would not have an edge going from 1 to 2.
[00:17:10] VJ: Exactly.
[00:17:11] SY: And in a matrix, those are two individual intersecting points so I can look at the value for each one.
[00:17:16] VJ: And so it’s very easy to see if there are any undirected edges just by looking at this because you can pretty easily see like there’s a 1 here, but there’s a 0 at its equal corresponding thing, this must be only one directional flow. It’s obviously a directed edge.
[00:17:32] SY: So when we’re talking about edge list, we talked about how it’s not very efficient. It takes up a lot of space. So now that we’re doing this matrix, is it better in terms of efficiency?
[00:17:41] VJ: It is because most of the operations that you can perform on an adjacency matrix can be done in constant time because basically all you’re doing is you’re looking up a node and its corresponding node and that’s pretty easy to do because you don’t need to search through a list that’s ordered here and then you’re just seeing if there’s a value or not. If you suddenly want to create an edge, you just basically say, “Oh, I’m going to turn the 0 into a 1,” and now there’s an edge and now you’ve represented it and you didn’t have to go searching through this big, long array and insert and delete stuff. This is way faster. It’s constant time.
[00:18:15] SY: Nice. Okay. What about space efficiency? Because that’s one of the things that you said would take up a lot of space for the edge list. How well does the matrix perform?
[00:18:25] VJ: Okay. So this is where things get a little sad because the trouble with adjacency matrices is that they are always going to require V-squared amount of space or O of V-square rather, if we’re talking about Big O notation, because remember this is like both positive and a negative. The positive is that that’s pretty easy to see, like, “Oh, is there an edge going from 2 to 1 and 1 to 2?” However, the negative is now you’re representing potentially the same exact edge twice, especially if you’re like, “Oh, this is an undirected edge. I know that I can go either direction. Why are we representing it in two different places in this matrix?
[00:19:04] SY: Yeah.
[00:19:04] VJ: Depending on what your graph can look like, it can be like a lot of wasted space and it’s the same thing if you have like a pretty sparse graph, you maybe are going to represent a whole bunch of nodes in your matrix, but they’re going to be a bunch of zeros because if it’s a sparse graph, you don’t have too many edges connecting the nodes. So it’s like, “Did I need to represent it like this? Maybe it’s not right.” However, if it’s a dense graph, maybe it’s actually useful. You probably have a bunch of ones in your adjacency matrix and now you can leverage the benefit of it. But basically there are problems with adjacency matrices and of course with edge list. So like neither one of them was super great, but they both do have benefits to them depending on what your graph looks like.
[00:19:50] SY: Okay. So is there a third option? Is there something else we can do?
[00:19:53] VJ: There is something else we can do and it’s kind of like this hybrid version of both of the things we’ve talked about today. It’s called an “adjacency list”.
[00:20:02] SY: Okay.
[00:20:03] VJ: And we will talk about it next episode because I love leaving you on a cliffhanger.
[00:20:06] SY: Ah, cliffhanger.
[00:20:08] VJ: I haven’t done that in a while. I don’t even think I’ve done it this season.
[00:20:09] SY: It’s been a while. We’ve been pretty good. We’ve been very gentle, but now it’s time to hold on and listen to the next episode and find out all about this new hybrid solution.
[00:20:18] VJ: Yeah.
[00:20:19] SY: And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. Link is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy whether you’re building a mobile app in iOS, Android, Flutter, or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.
[00:20:55] VJ: Bye everyone.
[00:20:56] SY: Thanks for listening. See you next week.
Thank you for supporting the show!