CodeNewbie

Description

Last episode, we talked about traversing through a graph with the depth-first search (DFS) algorithm, which helps us determine one (of sometimes many) paths between two nodes in the graph by traversing down one single path until we can't go any further, checking one child node at a time. Now we talk about how you code BFS and what tools might you use.

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Deep Dive Through A Graph: DFS Traversal from her basecs blog series.

Transcript

[00:00:02] SY: Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:09] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:12] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today, we are continuing our conversation on…

[00:00:16] VJ: Depth-first search in graphs.

[00:00:21] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website and with their In-App Payments SDK, you can add it to your mobile app, too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.

[00:00:57] SY: All right, let’s do a quick recap of depth-first search, also known as DFS. How would we sum it up?

[00:01:05] VJ: So probably the easiest way of summing it up is thinking about walking or traversing through a graph but only doing it sort of one path at a time and basically what that means is starting at one node and picking a path and going deep into the data structure until we can’t traverse it any further, basically until we hit a dead end, and we’re going to just keep checking one child node at a time, the child of a child of a child until we just can’t anymore.

[00:01:34] SY: Yeah, just like a maze.

[00:01:36] VJ: Yes, like a maze.

[00:01:36] SY: Like to follow our path all the way or we’re trying to escape a maze because we’re not really trying to escape a graph, but, you know, following, following a path till we get to a dead end then working our way back up and then following a different path till we get to another dead end. So mazes are like graphs or like mazes.

[00:01:53] VJ: It’s very autumnal. It’s an autumnal thing.

[00:01:55] SY: Yes. Okay. So we’re working with an example, we’re working with our graph and let’s see if we can describe it so you all can visualize it. So we start with our, I guess, there’s no root node, right, technically in a graph or we’re going to start with the node that we’re going to start with, which we’re going to call…

[00:02:15] VJ: Arbitrarily.

[00:02:16] SY: Arbitrarily, starting with A. We’re going to start anyway, but we’re going to start with A. And then A has pointers to another node, B, and also another node, C. So we’ve got A and then its child B and its child C.

[00:02:30] VJ: Exactly.

[00:02:31] SY: Now B and C both point to a child that I guess they share. They’re like, “Oh my God! They’re kind of like parents!” They birth a node.

[00:02:40] VJ: Yeah, that’s weird though because they’re siblings so we’ll just not think about that.

[00:02:43] SY: Okay. All right. All right. All right. I got really excited. Oh my God, that’s kind of gross. Okay. So they’re both siblings and parents to Node D and then D has its own child, Node E, but strangely enough, D and B are both attached to E. So E is the child of both B and D.

[00:03:04] VJ: A weird lopsided kite type structure.

[00:03:07] SY: Yeah. So it’s kind of like a kite but then we have like an extra E hanging out. So we are going to talk about how you actually implement this. If we’re going to code this, what might that look like? What tools do we get to use?

[00:03:20] VJ: That is a great question and it’s going to bring us back to something we talked about last season. It was Season 7, amazingly, which is last season. We learned about this way of implementing graphs called adjacency lists. And if we wanted to implement this grass in an adjacency list, we could do that. Then the question of course arises, “How do you run depth-first search when you’re dealing with an adjacency list?” And we’ll get to that in a minute, but I think it would help probably to recap what an adjacency list is because Season 7 was a long time ago. So an adjacency list is a way of implementing a graph and the way to think about it is that you have two sort of pieces to it. You have all the vertices and each of them live in an array and arrays aren’t too scary, hopefully, anymore. No, it’s just a list of things. Everything in the collection has an index. So that’s what we do with all of our nodes, our vertices. Now we also need a way of storing the neighbors to each vertex and the way to do that is at each element in the array, at each node, it has a reference to a linked list and that linked list is where we store references to all the neighbors. So basically, you can kind of imagine, you go to any index in the array and you’re like, “Okay, who are you?” And the node is like, “I’m a node, this node, whatever.” You can look at the node and look at its reference to its linked list and pretty easily, asterisk on the easily, tell who its neighbors are, which is the nice thing is you get a quick snapshot view of like this node is connected to these other nodes.

[00:05:13] SY: Okay. So for example, we have our first node that we’re dealing with, Node A. So in my array, I have Index 0 because we always start with Index 0 and we have the value of that as the vertex, the Node A. And then that is attached to a linked list that basically says, “Hey, I have two kids. The first kid is Node B,” which we can find at Index 1, and then we have, “Here is my other kid, Node C,” who you can find at Index 2. So that means if I want to find out what the deal is with Node B, I can go to its reference, which is Index 1, and I can go in that same array, Index 1, and that’s where I will find Node B.

[00:05:58] VJ: Yes. And then conveniently, if you go to Node B, you can also ask Node B who its neighbors are.

[00:06:05] SY: Right.

[00:06:05] VJ: You can repeat the same thing. You can just look at its linked list, which is not in any specific order. It’s important to note because this is not a tree. It’s a graph. So also the fact that we made A, Index 0 is just like for convenience sake because it’s an adjacency list. There’s no reason it needs to be ordered anyway. But the point being, you can hop to index, figure out the value of that node, and then look at its linked list and figure out who its neighbors are and because the linked list itself has a reference to its neighbor, you can figure out, “Oh, its neighbor lives at Index 2. I don’t know who is at Index 2. Let me go check,” and then you’ll get your answer. Index 2 is Node C.

[00:06:49] SY: Okay. So it’s kind of like if we’re hanging out and you’re like, “Hey, who are your friends?” And I’m like, “I’m not going to tell you my friends, but I’m going to tell you where they live.” And then you can like go find out on your own. And so I have like my little friend address list with me and I’m like, “Here, if you want to go get to know them, then you have to follow, like go to the addresses.” So it’s kind of like my indices, right? And then you can go to 503 Broadway. I made that address up. I don’t know who lives there. And then you can knock on the door and you can be like, “Oh, Billy, what’s up? I heard your friends with Saron.” And then he might have a list of his own, right, of like friend that he’s not… again, he’s not going to tell you the names, but he’ll tell you the addresses. So basically, he’s putting you to work. He’s making you kind of jump around and find out what this network is all about.

[00:07:41] VJ: Yeah. And I kind of feel like it’s a little bit of a privacy violation, giving out addresses. But remember, these are just references and it’s all ephemeral.

[00:07:51] SY: This is not GDPR compliant at all. But that’s basically how adjacency lists work.

[00:08:00] VJ: Exactly.

[00:08:00] SY: Okay. Cool. So it’s all linked lists and addresses and we can hop from one index to the next and we can kind of connect the dots on our own. Okay. I can work with that. Okay. So when we run depth-first search on the graph that we described, we have like a path, right? Because we go from A to B then to D then to E and then we like go back up and then go down and then go back up until we're done with all of the nodes, right, all of our dead ends?

[00:08:30] VJ: Yeah. We basically recursively just keep running it and I think last episode we picked one certain path, but we could have picked anything, but the important bit is that we just went deep down the path until we couldn’t go any further. In this case, in this example, once you hit E, you are at the dead end.

[00:08:46] SY: Right.

[00:08:46] VJ: So you got a backtrack and check if there are any more children.

[00:08:49] SY: Okay. So if we were to run DFS looking at our adjacency list, where is that path? How does that work?

[00:08:58] VJ: We basically do the same thing we did when we talked through depth-first search, but we’re limited to using our adjacency list, which means if we want to go look at a node, we want to go visit it, first of all, we need a way of keeping track of whether we visited it or not, but we’ll handle that in a second. But what we need to do is we need to find the node that we’re looking for. We’ll pick something arbitrarily then we need to look at its children, but only one child at a time, right? Because we’re going deep down one path. So what we’ll do is once we visit our first node, we’ll look at the first item and its linked list and then we’ll go visit that one and then we’ll look at the first item in its linked list and then we’ll go visit that one and that’s like if you sort of imagine it as like we’re stacking one node on top of the other as we visit the first one initially, so like we visit the initial parent node sort of a fake parent node, like Node A, and then we go visit one of its children and then we’ll remember from last episode when we can’t visit it anymore, we sort of just pop off the things we’re visiting until we’re back at a place where there’s a new path we can take.

[00:10:13] SY: Yes.

[00:10:13] VJ: So we’re going to sort of do the same thing, but instead of adding nodes, pushing them onto the stack or popping them off of the stack, what we’ll do is we’ll hop around from one index in the array and looking at its linked list to another index. We’re just sort of going to bounce around and the important thing is we’ll remember we want to be efficient. So we need a way of keeping track whether we visited a node or not. I think an easy way of doing that is like if we just add one more piece of data at each node where we’re like, “Did I visit you or not?” And it can just be like true or false.

[00:10:47] SY: Cool.

[00:10:47] VJ: So LIKE we have five nodes, A, B, C, D, E. For each node, we’ll just have this little piece of data that’s like, “Have you been visited?” True or false. And before we start this algorithm, they’ll all be false. And as we go through it, we’ll start marking them as true.

[00:11:04] SY: All right. I’m excited. Let’s do this.

[00:11:05] VJ: Should we start with A? Yes. So take a deep breath.

[00:11:11] SY: It’s going to be great, Vaidehi.

[00:11:13] VJ: I feel like it’s going to fly by in an instant. You know, that’s what happens when you go depth-first.

[00:11:21] SY: When you DFS. Time just flies by. Okay. So starting with Node A at Index 0. Okay. Now what do we do?

[00:11:30] VJ: So the first thing we’re going to do is we’re going to say, “Oh, we’re starting with Node A.” We’re visiting it. So we need to make note of that and we’re going to say, “Oh, we’re visiting Node A right now.” We’re going to mark that visited property from false to true. We’re going to say, “Oh, I’m visiting it. Cool.” Now we need to look at its linked list and I think for convenience sake, we can just look at the first item in the linked list. That’s kind of how linked lists work. Unfortunately, you have to traverse through all of them, all the items in the linked list. So we’ll just say, “Okay, Node A, you’re at Index 0. We’ve just visited you. Who are your neighbors? Give me the first neighbor you have.” And Node A will say, “Oh, look at my linked list.” And in its linked list, there’s a reference to another node that lives at Index 1.

[00:12:18] SY: Okay. That’s the address. Yup.

[00:12:20] VJ: So we’re going to peek at Index 1 and say, “Did we visit this one? Did we visit this node?” We have not visited it. Let’s hop on over to Index 1.

[00:12:29] SY: Okay.

[00:12:30] VJ: Do you want to try this one?

[00:12:31] SY: Okay. Sure. So now we are at Index 1 and there we will find Node B. That’s where Node B lives. Okay. So now that we are at Node B, we are visiting it so it is visited so we can change that value from false to true. Now that I am at Node B, I want to find out if it has any friends to check out, to hang out with. So we’re going to look at its linked list and check out the first item on its linked list and there we will see a reference to Index Number 3. So now we need to figure out who’s living, who’s hanging out at Index Number 3.

[00:13:13] VJ: Yeah. And if we go peek at index 3, we’ll see, “Oh, we have not actually been over there. We haven’t been to this address yet.” So we’ll hop on over to that index in our array and we’ll check it out and, “Oh, Index 3 has no D.” We’ve come to a new node. We haven’t visited it. Let’s mark it as visited. That’s the first thing we’ll do and we’ll say, “Oh, this is a new node. Cool. Do you have any neighbors?” And we’ll head over to its linked list, and this node has an item and its linked list. It references Index 4. Conveniently, Index 4 points to another node that we also haven’t visited because if we look at its property, its visited property is false, so we’ll hop on over there.

[00:13:59] SY: Okay.

[00:14:00] VJ: And at Index 4 is Node E. Yup. And so we’ll mark it as visited. But what happens now? Because E is kind of a special one.

[00:14:09] SY: Okay. So we want to see what its linked list looks like, but it actually doesn’t have one because it’s at the end of our graph. It doesn’t have any kids or neighbors or anything like that. So it’s the dead end.

[00:14:24] VJ: It’s the dead.

[00:14:24] SY: Yeah.

[00:14:27] VJ: A tricky E.

[00:14:28] SY: Yeah. What do we do once we get to the dead end?

[00:14:31] VJ: So last episode, we talked about how it’s important to like leave breadcrumbs.

[00:14:35] SY: Right.

[00:14:36] VJ: And we haven’t really talked about that with the adjacency list bit, but like if you were writing the algorithm, we mentioned this last time. So we’ll just quick recap. You will have some way of keeping track of where you came from with parent pointers. So because we ended up at this dead end, Node E, we have parent pointers to figure out where we came from. So we’ll conclude this whole path with Node D because we hit a dead end and we’ll say, “Okay, we’ll go back up to Node E’s parent where we came from.” And the place we came from was Node D.

[00:15:09] SY: Okay. So now we’re at Node D. So I guess we want to check like, are there any because we’ve only been looking at the first item of the linked list, right? We don’t know if there are other items in our linked list. So I guess we want to check first like, “Hey, D, do you have any other friends? Anyone else we should know about?” And if we check that, we’ll see that there are no other items. So D is, at this point, our second dead end.

[00:15:35] VJ: Yeah, exactly. And what you’re describing is like the recursive nature of depth-first search but also there’s like an interesting thing we’re doing where, as you mentioned, we’re only checking the first item in the linked list and we don’t even look at the next item if there’s somewhere to go.

[00:15:53] SY: Yup. Yup.

[00:15:53] VJ: If there’s somewhere to go, we’re like, “I’ll deal with your other neighbor later. I’m just focused on this neighbor.”

[00:15:57] SY: One friend at a time.

[00:15:57] VJ: Yes, exactly. And so once we go deep enough and we backtrack, we can’t just backtrack all the way back up. We have to backtrack and check, is there another path to take from here? So in the case of Node E, we jump back up to where we came from, its parent, so to speak, in the path, Node D. But as you mentioned, we can’t just like go all the way back up to the main parent. Once we get to know D, we have to see, “Can I go any deeper?” Sure. I know I can’t go any deeper down the path of E, but does Node D have any other children? Because if it does, then that means we can keep going deeper into the graph and in this case, Node D doesn’t have any other children. It just has E in its linked list. So we can sort of backtrack again. But it’s important to point that out because if Node D had more children, we would have to go check those items in the linked list too.

[00:16:54] SY: And go deeper into the graph.

[00:16:56] VJ: Exactly. And in this case, we’ll go from Node D back up to its parent in the path, which is Node B, right? That’s where we came from.

[00:17:03] SY: Yeah, Node B. So here’s where I think things get a little interesting because when we go back to Node B and we can ask Node B like, “Hey, you know, we already visited Node D. Thanks for that introduction. But do you have any other friends, any other linked list items?” And Node B does have another linked list item. It is to be found at Index Number 4. So we have to go back to Index Number 4 to see what that node is and we’ll find that that is Node E, which we’ve already visited.

[00:17:39] VJ: And actually, we almost… like we don’t actually have to go back to 4, we can sort of look at where 4 is, like what it’s referencing and just check, is it visited or not?

[00:17:50] SY: Before we even know what node it is.

[00:17:50] VJ: We don’t need to go visit it.

[00:17:52] SY: Yeah. Okay. Okay.

[00:17:53] VJ: Because we’re being efficient about it and we have this property, right, that we’re keeping track of for each node. We’re saying, “Are you visited or not?” So because we have that property, we just sort of look at what’s at 4 and say, “Is 4 visited? I don’t even care what’s at 4. Just tell me, is it visited?” And it is, so we don’t need to do anything.

[00:18:12] SY: We don’t even care. Okay. So once we realize that Index Number 4 has been visited, we go back to where we came from, which is Node B, and then we asked Node B, “Hey, do you have any other friends? Is there a third linked list item?” There is not. We have been introduced to all of its friends. So now we ask Node B what its parent pointer is and we go back to where we came from, which was Node A.

[00:18:41] VJ: Now we’re back at Node A and we started this whole deep adventure by looking at the first item in Node A’s linked list, which was pointing us to Index 1, which is Node B. We already know that whole story. Nothing to do there. But we didn’t check. Is there anything else in the linked list? We looked at the first item. So now is our opportunity to see if there’s anything else in that linked list, and there actually is one more item. And the second item in the linked list takes us to Index 2.

[00:19:13] SY: Yes.

[00:19:14] VJ: And if we look at Index 2 and check, has it been visited? It has not been visited. It’s the only thing we haven’t visited actually. And this is Node C.

[00:19:25] SY: Yeah.

[00:19:26] VJ: The last little dude to visit.

[00:19:28] SY: Last one.

[00:19:28] VJ: So all we have to do at this point basically, we’re kind of getting to the end, we go to Node C, we mark it as visited. We need to do our due diligence. We need to check, does Node C, which is at Index 2, does it have any? Does it have a linked list with anything in it? If it does, we have to pay attention to that.

[00:19:47] SY: I got to talk to that person. Yeah.

[00:19:49] VJ: Exactly. And Node C has a linked list with just one item that points to Index 3.

[00:19:55] SY: Okay. So now we’re going to go to Index 3. And before we even look at what node it is, we’re just going to ask, “Hey, have you been visited?” And we will see that we have actually been to Index 3 already. So we don’t even care who’s living at Index 3. We’ve already visited it. We’ve already like done our job. So then we can go back up where we came from, which is Node C. And then at this point, we can ask, “Hey, do you have any other linked list items?” And then we see that we do not. So we’re done with C and now we go back to its parent, which brings us all the way back up to A.

[00:20:31] VJ: And there’s nothing else in A’s linked list.

[00:20:34] SY: Right. Yeah.

[00:20:34] VJ: We only had two items. We’ve gone down both of those paths as far as we could go without repeating ourselves and doing extra work. And so we’ve checked all of A’s neighbors. And actually in the process, we checked all the nodes in this graph.

[00:20:48] SY: Yeah.

[00:20:49] VJ: We visited all of them and we ran the full algorithm.

[00:20:51] SY: We hung out with so many people. That’s right. Okay. Okay. So let’s talk about performance. How does this perform?

[00:21:00] VJ: So we sort of talked about this at the end of last week’s episode but only partially. I sort of left it on a cliffhanger and I think it’s good for us to talk about it now because we just ran depth-first search with this adjacency list and we saw two things happening. First, we saw how we checked every single vertex in the graph and so the process of visiting that vertex because we are keeping track of whether it’s visited or not, that’s not really too difficult, that takes constant time. So we could say it’s like O of N where N is the number of nodes. So that’s not too expensive. The expensive part is the time that it takes to look at the items in the linked list and actually visiting the nodes at each of those indexes, that can be more time consuming.

[00:21:49] SY: Yeah.

[00:21:50] VJ: So in our case, Node A and B had two items in their linked list. Node C and D had only one item and Node E had nothing. So this is like a simpler example. It wasn’t too difficult. But if you had a lot more edges, then it would totally depend on how many edges we’re connecting a node because you’d have to go deep. You have to say, “Okay, let me look at the first item in the linked list.” Go all the way deep down that path. “Let me look at the second item in the linked list.” Go all the way deep. So it really just depends on how long these linked list start to get and the linked lists are just basically telling you, “Oh, this node has however many number of neighbors.” So the way that we sort of talked about that is when it comes to running depth-first search on a directed graph, like the example we just did. So when we’re talking about the runtime of depth-first search, what we really care about is the amount of time it takes to check each of the vertices, which is constant, but then also the amount of time it takes to look at each of the edges in the linked list. And so one thing that I want to say before we sort of wrap this up is that when you only have directed edges in your graph, you only need to show that edge once in an adjacency list.

[00:23:10] SY: Right. So it’s not going in one way?

[00:23:12] VJ: Exactly. Yeah. And our example like Node D connected to E, there was a reference to E in D’s linked list, but not vice versa.

[00:23:23] SY: Right. There is no D in E’s linked list.

[00:23:26] VJ: Exactly.

[00:23:27] SY: Right.

[00:23:28] VJ: But if this was an undirected graph, then there would be, which is now…

[00:23:32] SY: They’d show up with each other’s list.

[00:23:33] VJ: Yeah. Now you have two edges like really that you need to represent. You need to show E is connected to D and D is connected to E and now you really have double the edges in your adjacency list. Your linked list now doubled in size.

[00:23:47] SY: Okay. Because basically D wants to be friends with E and he’s like, “No, I don’t need your address, whatever.” But if it went both ways, if they were mutually friends and they would keep each other’s addresses.

[00:24:00] VJ: And then you’d have to go look at them.

[00:24:05] SY: And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog posts. Link to that is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy, whether you’re building a mobile app in iOS, Android, Flutter or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.

[00:24:42] VJ: Bye everyone.

[00:24:43] SY: Thanks for listening. See you next week.

Season 8 Episode 2 October 23, 2019

Jump around the indexes with DFS!

Description

Show Notes

Transcript