Description

In this episode, we continue our discussion of representing graphs with adjacency lists -- a hybrid between an edge list and an adjacency matrix, which we learned about last episode! They are also the most popular and commonly-used representation of a graph.

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, From Theory To Practice: Representing Graphs from her basecs blog series.

Transcript

[00:00:02] SY: (Music) Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:08] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:11] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today we’re talking about…

[00:00:15] VJ: Representing graphs, again.

[00:00:17] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website. And with their In-App Payments SDK, you can add it to your mobile app too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.

[00:00:54] SY: Okay. So we talked about graphs last time. We talked about two ways to represent them. We talked about edge lists and adjacency matrices. Can we do a quick recap of what each one is? Let’s start with edge list.

[00:01:07] VJ: So an edge list is basically like the simplest way that you can represent a graph, specifically the edges that are in it, and using the edges, which are basically just like a pair of the starting node and the ending node, you can figure out what nodes also exist in the graph. That’s basically like just an array where you have a list of edges as the name would suggest, edge list. On the other hand, an adjacency matrix is slightly more complicated but not too much. We basically have a matrix where all of the potential nodes, all of the nodes in the graph are listed on the top and on the side. And at the intersection of each node, we represent whether or not an edge exists and the way that we do that is with a 0 or a 1 where 0 means no edge and 1 means yes edge. And based on those values, you could pretty much tell whether an edge exists between a node and also what that whole graph looks like, whether there are a lot of edges, whether it’s pretty dense, or whether it’s a sparse graph because there’s not very many edges and therefore there’s probably a lot of zeros in your adjacency matrix.

[00:02:17] SY: And we talked about how there are pros and cons to each one of those to the edge list and the adjacency matrix and we talked about how there is a hybrid of both called “The Adjacency List”, which is what we’re going to get into today.

[00:02:28] VJ: Yeah. Adjacency lists are very much a hybrid which is why it’s so important to talk about those first two representations before we get into this because we’ll see elements of both in an adjacency list.

[00:02:39] SY: Absolutely. Okay. So what is the definition of an adjacency list?

[00:02:43] VJ: So an adjacency list is basically an array of linked lists. And a side note, you can actually use an array of arrays. For the most part, I’m going to say an array of linked lists because that’s how I’ve seen it before, but it’s an array of linked lists that serves as a representation of a graph, but it also makes it really easy to see which vertices are adjacent to other vertices. I think the way that we can kind of think about this is that we have this list, this array basically, which I sometimes call list, but what I’m talking about is an array, and that basically is where we put all of our vertices, all of our nodes in a graph. And at each of those nodes, there is a pointer to a linked list and that linked list is basically a list of all of the neighboring vertices to the vertex that we’re looking at. So basically every single element in the array is a node and each node has a reference to a linguist that tells it, “Here are all your neighbors. Here are all the other nodes that connect to you.” With that information, you can figure out what edges do or don’t exist.

[00:03:54] SY: Okay. So if we do a really simple example, let’s start with our three nodes: 1, 2, and 3, and they make a triangle which means they’re all connected to each other. How would that be represented in this new linked list system?

[00:04:09] VJ: So in an adjacency list, first, we’ll start with the list part, the list that represents the nodes, the vertices, and then we can worry about looking at who is connected to them. So we’ll have our list of our three nodes, in this case, as you mentioned 1, 2, and 3, and each of them are basically elements in this array and so each of them have an index. So we’ll have 1 comes first at index 0, 2 comes next, and 3 comes after that at index 2. And so at each element, we’re representing a certain vertex and that vertex has a pointer or a reference to a linked list. So then inside of that linked list, we’re going to basically insert all the nodes that are neighbors to it. Another term for this that we’ve talked about a couple times in the series is the word “degree”. So a node’s degree.

[00:05:01] SY: Yeah.

[00:05:02] VJ: Yeah. It’s like a throwback to who knows how many episodes ago, but we talked about back when we were learning about graphs how the degree of a node is basically how many other nodes are adjacent to it, how many of them are neighbors, how many of them are connected. So when you have this vertex with a pointer to a linked list, inside of that linked list, we’re basically going to have the other neighboring and connected adjacent nodes that are going to live inside that linked list. So basically, we’ll have, for example, 1 is going to come first and adjacent to 1 is the Node 2 because 1 is connected to 2. So at 1, we’ll have a pointer to our linked list and our linked list will have an element, 2.

[00:05:46] SY: Yes.

[00:05:47] VJ: Or an element rather with the data 2 so that we know, “Oh, this is a reference to the node called 2,” whoever that is. And then when we look at who else is the neighbor of 1, we’ll also see that 1 is connected to the Node 3. So inside of our linked list, we’ll add another element, which will contain a reference to 3. So basically if you go to index 0, you’ll find Node 1 and then you’ll travel to the linked list that it is referencing, and inside of there, you’ll find two elements, 2 and 3. And the reason is 1, the vertex, has a degree of 2, it has two neighbors and those two neighbors are 2 and 3.

[00:06:28] SY: So what does this look like in this new linked list system?

[00:06:32] VJ: In adjacency list, we start first with our array of vertices. So we’ll just populate that first and we’ll just say, “What are our nodes in this graph?” In this case, thankfully, it’s a simple graph. So we only have three nodes: 1, 2, and 3. That’s the first half of it. Now we need to start considering the degree of each node. And if you are thinking, “Oh, I know what the word degree is. I feel like I heard that before, “well, you’re not wrong. We’ve talked about it way back when we’re talking about graphs originally and the degree of a node is basically how many other nodes are its neighbor. In other words, which other nodes are adjacent to it, which ones can you get to because they’re connected through an edge. So at each index of this list, for the elements 1, 2, and 3, because we’re just dealing with an array, we want to have a reference to somewhere else which is where we’re going to put our data for what other nodes are connected to it. So each element in this list has a pointer to a linked list and this is where the linked list part of the adjacency list comes in. So at the Node 1, we’ll have a pointer to the linked list, and remember that a linked list is basically one element that points to another. They’re pretty simple, so you can’t hold that much data. So we’ll just say, “Okay, what are the nodes that are connected to the Node 1?” Well, we know that the Node 2 is connected to the Node 1. So we’ll have the beginning of our linked list will be one element with the number 2, and it will have a pointer to the rest of the list. So we’re not completely done yet. We have one neighbor of the Node 1, but there’s also another neighbor, the Node 3. So we kind of need to go into our linked list again and try to insert the value for 3 because we need to keep track of who all of 1’s neighbors are. However, we already have an element there. We have a reference to the linked list, which has one element, 2. So basically, we can say to 2, “Oh, okay, let’s add another element. We’re going to have you point to another item called 3.” And that’s the next element in our linked list. If you remember back when we were learning about linked list, like a chain of references to other things. So we have Node 1, which has a pointer to a linked list. The first element in the linked list is 2, which points to the next element which is 3 and actually there are no other neighbors of 1, so 3 doesn’t point anywhere. It just points to a null-terminating reference which is how we know, “Oh, we’re at the end of the linked list. We’ve gone through all of the 1’s neighbors. Have a nice day. Goodbye.”

[00:09:08] SY: And you’re done. Okay. So if we do the same thing for our next node, if we do it for Node Number 2, then Number 2 would point to a linked list. Let’s start with Number 3, with the Value Number 3, and then the pointer of 3 would then point to another element which has a value of 1 because 2 is connected to both 3 and 1 and then 1 would have a pointer that doesn’t point anywhere so that’s just null.

[00:09:34] VJ: Exactly. Yeah. When we get to the end of the linked list, that’s how we know, “Oh, I found all of 2’s neighbors. And also, we have an interesting piece of information, which is that the degree of 2 is 2 because there are two elements in that linked list that 2 references. Hopefully, it’s not too confusing because 2 has a degree of 2 and there are two elements, but the reason is 2 has two other neighbors, 1 and 3, and it’s not a magic number. It’s just like how many nodes are adjacent to the Node 2.

[00:10:02] SY: Okay. Cool. So let’s talk about efficiency because efficiency is one of those things we talked about last episode and why the edge list and the adjacency matrix weren’t always good choices. So how does this one measure up? What is the efficiency look like for finding the number of edges?

[00:10:21] VJ: So there are a couple things that we can sort of measure here. Your question is a really good one, but we can kind of break it down into different pieces. So the first thing I think we should think about is the structure of the adjacency list because remember we have this list of vertices and then each of them has a reference to their own linked list. So because of the structure of the adjacency list, it’s pretty easy to determine who all of the neighbors are of any particular node. And in fact, retrieving one node’s neighbor takes a constant amount of time because if we think about it, if we want to find a specific edge, for example, let’s say we are looking at Node X and we want to find if it has a connection, if it has an edge Node Y and we were looking at a graph with 1, 2, 3 but imagine there’s X and Y in this graph. So what we would need to do is we’d have to find Vertex X in our adjacency list and then we’d have to like measure that and because this is a list which could also just be an array, we can just look at the index. We can just use that to figure out where is X. “Oh, I know it’s the second element.” So it’s at Index 1 and that takes constant time to look it up because we’re using indices. So that’s pretty efficient just to find the vertex. And then the next step we’d have to do is check whether Y is in the linked list that X reference it. In other words, we need to check, “Oh, does Y exist in this adjacency list for this node?” That’s the only way I know of its neighbor of this node or not. That could be pretty quick to do if Y is first in the linked list. Well, hooray, we’re done. That was pretty easy. If it’s the only item in the list, it’s also pretty quick to do. The worst-case scenario is that Y is at the end of the list, which basically means you have to iterate through the whole linked list for Node X. And so potentially it could take us O of D time where D is the degree of the vertex. So I guess really depends on what your graph looks like and specifically if you’re looking at a node that has like a very high degree, now potentially it could take you more time to find an edge. But if your graph is not too complicated, if the degrees are pretty limited, especially if it’s a sparse graph, which means it doesn’t have very many edges and now you’re like, “Okay, worst-case scenario, maybe I have only two or three edges. That’s not that bad. The degree is not that high.” Kind of comparing it to what we talked about last episode, we’re still better off even if you were looking at a large number of edges. This is much better than looking at all of the edges. And if you remember when we are learning about the edge list, we had to iterate through all the edges and sometimes you’re looking at like two nodes and you don’t even care about. You’re like, “Why do I care about if 5 and 6 have an edge? But I have to iterate through it because, as you know, edge list.”

[00:13:24] SY: Edge list, causing so much trouble. Okay. So that is the situation for speed. What about space efficiency? How much space does all this take up?

[00:13:35] VJ: So this is a great question. The number of vertices plays into it. That really actually determines everything. There’s another element too, which I’ll talk about in a second. But first off, in order to even populate your list, you need to look at how many vertices you have. So the adjacency list itself, the original array with indices for each node, that’s going to depend on how many vertices you have. So if you have five nodes or if you have three or you have five hundred, like that is going to be an important factor. The amount of space you’re going to use is going to be different, it’s going to fluctuate based on that. So the adjacency list itself requires the amount of space to represent the list where V is the number of vertices because each vertex needs its own index, its own spot in the list.

[00:14:22] SY: Okay.

[00:14:23] VJ: That’s one thing. Now let’s talk about the edges because they’re little tricky, tricky little fellows. So the number of elements for every edge that’s represented in an adjacency list completely depends on whether the edge is directed or undirected.

[00:14:41] SY: Okay.

[00:14:41] VJ: If you think about it, maybe you can actually figure it out, figure out why even if I don’t tell you.

[00:14:46] SY: Okay. So if we’re talking about undirected graphs, well, we have a lot fewer edges, right? Because with the directed graph, we have to represent an edge going in both directions. So 2 and 1 are connected and they’re connected with no direction that I have to connect 2 and 3 and then I have to connect 3 and 2. But if it’s a directed graph and I just know it’s going only from 2 to 3, well, that’s the only edge I have to keep track of. So I assume that undirected graphs take up more space.

[00:15:15] VJ: You’re totally right. They take up actually twice the amount of space because for an undirected graph, remember that each node has a reference to its neighbor and its linked list. So that means even though it’s only one edge between Node, let’s say, X and Y, X needs to know about that. And so in its linked list, it has to have Y somewhere and Y, in its linked list, has to have some sort of reference to X. So it’s really only one edge, but you are taking twice the amount of space to represent that one edge just because of the way that the adjacency list works. If an undirected edge exists between two nodes, now each of those nodes has to represent that in some way and that does take some space.

[00:16:03] SY: Okay. So now that we’ve talked about the efficiency both for time and space with the adjacency list, I want to kind of go back and see how it compares with the adjacency matrix and the edge list. So can we do an example where we kind of just go through all three representations and see what the differences are?

[00:16:20] VJ: Yeah. Let’s do it.

[00:16:21] SY: Okay. So we’ve been doing 1, 2, 3. Let’s switch it up just a little bit. Let’s do 3, 4, 5. So same triangle, all the things are still connected to each other. But if we’re going to do 3, 4, 5, what does that look like for an edge list?

[00:16:33] VJ: I’m glad you asked me this one because this one's a little bit easy. It’s a good way to ease into it. Remember, it’s just a list. So all it’s going to contain is the three edges in this graph of three nodes. So we will have three elements in it. It’s like for example an array with three elements and the three elements are going to be the pairs of the three edges. So in our case, it would be a list with 3, 4 as the first element, maybe 3, 5 as the next element and then 4, 5 as the third and final element because all we’re doing is we’re creating a list of all of our edges.

[00:17:10] SY: Okay. So now for the adjacency matrix, what does this one look like?

[00:17:14] VJ: So if we start from the top, we’ll work on the top axis where we have 3 and then to its right 4 and then to its right 5. So if we read down the column of 3, we’ll have zero first because that is the intersection of 3 and 3 and there are no self-referential edges, so no edge. We have a zero. Then let’s go down that column. The intersection of 3 and 4, there is an edge. So we have a 1, and at the intersection of 3 and 5, one more cell down, we have a 1 because 3 and 5 do have an edge connecting them. And if we move over to the next column, at the next column, we’re looking at 4 and all the edges that connect to it. So we’ll have 101 because there is an edge between 4 and 3. There is not an edge between 4 and 4 itself and then there is an edge between 4 and 5. And finally, we have just one more column to look at which is the column that represents 5. So we’ll look at 5, how it intersects with 3, there is an edge so there’s a 1, then 5 intersecting with 4, there is an edge so another one, 5 intersecting with itself 5 is a 0. So if you read this adjacency matrix from the top left corner going from left to right, down the rows, we have 011, 101, 110. We’ll see that is a diagonal line of zeros. And so when we see this diagonal line of zeros and we see ones on either side reflected exactly the same way we see, “Oh, this matrix is symmetric. Oh, and it’s an undirected graph.” This lines up. Undirected graphs are represented with the symmetry inside of an adjacency matrix and we can pretty easily see where the edges are, where the edges are not, and what kind of edges we have in this graph.

[00:19:10] SY: And for this representation, we require nine places, right? I guess the three by three matrix, there was like nine values there.

[00:19:19] VJ: Exactly.

[00:19:19] SY: Whereas in our edge list, we only had three. So just something to keep in mind.

[00:19:22] VJ: Yeah, it’s three squared actually, right? V-squared.

[00:19:25] SY: Yeah. Yeah. You’re right. Yeah.

[00:19:28] VJ: It’s okay because we only have three, but thank goodness we’re only doing three and not thirty.

[00:19:32] SY: Okay. So now let’s do the adjacency list. Can I give this one a try?

[00:19:34] VJ: Yes, please do.

[00:19:36] SY: So for this one, we have our Node 3 and we know that 3 is connected to 4. So it’s going to be connected to a linked list with the value 4 and that linked list 4 has a reference to another element which is named 5 because 3 is connected to both 4 and 5. So we have Number 3 with a linked list that has two elements, one named 4 and one named 5.

[00:20:04] VJ: Exactly. Yeah. And this 3, I just want to call it out explicitly, this 3, 4, and 5 that you were talking about, originally those are the three vertices and they live inside of a list itself. So remember that they’re in an array that is indexed. So when you said, “Oh, I’m going to go to Node 3 and let’s look at its elements and its linked list,” you just found its index and you just moseyed on over. It’s very easy for you.

[00:20:27] SY: Moseyed on over. Okay. Cool. So the next one, we’ll go down 1, we’ll iterate and we’ll go to 4, our Node 4. And then from there, we see that it has a linked list. The first one is a value of 3 because it’s connected to 3 and then that element has a reference to another element which has a value of 5 because our Node 4 has an edge connecting it to 3 and 5.

[00:20:52] VJ: Totally. Perfect. One more to go.

[00:20:55] SY: Okay. Last but not least, we iterate again and now we’re at Node Number 5. And this one, same deal, we’re going to have a linked list with two elements. We’re going to have the first element which is going to be value 3 and then that one is going to be linked to an element with value 4. So we have 5 and then a linked list with two elements, number 3 and number 4.

[00:21:17] VJ: Totally. And now that we’ve kind of built this adjacency list, we’ll notice that each edge appears twice, right? There was an item in 3’s linked list that contained the data 4, and there was an item in 4’s linked list that contained the data 3. So we’re representing each edge twice and both of the nodes have those references in either one of their linked list. So basically, we have 2 times 3 elements that we’re representing in all of these linked list which is 6. But I think a cool thing I haven’t mentioned yet about adjacency list that I do want to call out, like it’s not as bad as the adjacency matrix in terms of how much space it takes and there are so many benefits to it compared to an edge list, which only has to represent three items but like it’s an array so it’s kind of a pain to deal with it sometimes. The really cool thing about adjacency list I think that I haven’t even said yet is that you can use an adjacency list to represent a graph. And just from that structure, from that representation, you could actually avoid building out the whole graph because you basically see what those nodes are pretty easily, they’re in your list, and then from that list and its references and its linked list, you could pretty easily look up, “Oh, does 3 have a path? Does it have an edge to 5? Oh, yes, it does. So I’m going to go to Node 5. And does 5 have an edge to 4? Oh, yes, it does. I’ll go to 4.” And what is really nice about this is instead of having to build out the whole graph structure, you have sort of like this cheat sheet version, which is your adjacency list and it’s really nice when it comes to things like traversal problems or like graph coloring problems, which we haven’t really talked about yet, but we will get into more and more graphs very soon as we’ve done a lot of trees, graphs are coming up next, but understanding graph traversal problems and graph coloring problem really, really depends on knowing what an adjacency list is even if you don’t end up coding it like knowing how they work and how you would have to use it is helpful because they’re way easier than trying to build a whole graph structure. As we’ve seen, pretty simple to use, just an array, linked list structures we’ve talked about, and just from looking at it at a glance, you can easily see what your whole graph looks like, which is so cool, at least I think.

[00:23:49] SY: Absolutely. And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. Link to that is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy whether you’re building a mobile app in iOS, Android, Flutter, or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.

[00:24:28] VJ: Bye everyone.

[00:24:29] SY: Thanks for listening. See you next week.

Thank you for supporting the show!

Thank you for supporting the show!