We ended last season by starting our discussion of searching, or traversing, through a graph with breadth-first search (BFS). The breadth-first search algorithm traverses broadly into a structure, by visiting neighboring sibling nodes before visiting children nodes. Now we begin our new season with depth-first search (DFS), which also helps us determine one (of sometimes many) paths between two nodes in the graph, but this time by traversing down one single path in a graph, until we can't go any further, checking one child node at a time.
[00:00:02] SY: Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.
[00:00:09] VJ: And I’m Vaidehi Joshi, Author and Developer.
[00:00:11] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today, we’re talking about.
[00:00:16] VJ: Traversing through a graph.
[00:00:18] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website. And with their In-App Payments SDK, you can add it to your mobile app, too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.
[00:00:55] So last season, we ended our season by talking about breadth-first search, which is one way that we can traverse a graph. And today, we’re talking about a different way called depth-first search.
[00:01:05] VJ: You got it. We’re just searching all over.
[00:01:08] SY: Search it is, just trying to find the meaning of life. Okay. So BFS, Breadth-First Search, if we can summarize that, maybe in one sentence, two sentences, how would you summarize it if we do like a little recap?
[00:01:25] VJ: The BFS algorithm basically just traverses through a graph one level at a time and what that means is for a single node, it visits all of its children. And basically, it looks at the neighboring nodes that are equidistant from how far away one node is from the parent in the graph. And when I say parent, I just mean the starting node because remember there are no parents in graphs. It’s not a tree.
[00:01:51] SY: Oh, right because they’re not trees, yeah.
[00:01:53] VJ: Yeah. But the important thing like if you want to remember in just a few words, BFS means traversing the graph one level at a time, like one layer by layer.
[00:02:02] SY: It’s still across.
[00:02:03] VJ: Yes, exactly.
[00:02:04] SY: Like an onion or a cake. Now let’s talk about DFS, Depth-First Search. How does that relate or how does that measure up to BFS?
[00:02:13] VJ: So DFS doesn’t go layer by layer. Instead, it traverses down one single path in the graph and it basically keeps traversing, traversing, traversing until it can’t go any further like when it hits like a dead-end. And because it’s traversing down one path, what that means is it’s checking one child node at a time. So it’s basically going down one child node looking at its child rather than looking at siblings. So that’s how DFS, Depth-First Search, is fundamentally different from BFS, Breadth-First Search.
[00:02:50] SY: I like this example of thinking about DFS as a maze, like if you think about literally what was your experience like walking through either maze in real life, which I’ve never done actually, walking through a maze in real life actually sounds really terrifying.
[00:03:04] VJ: I’ve done it before. I did a corn maze.
[00:03:07] SY: How was that?
[00:03:08] VJ: It was really big and then they were like these bridges that were elevated that were supposed to help you feel like you could see where you’re going, but I was just like I feel confused.
[00:03:16] SY: Yeah.
[00:03:17] VJ: But I was with other people who were better at it than me.
[00:03:18] SY: I don’t think I would do well. Okay. So when we think about what these two algorithms are designed for, what is breadth-first search good at? What is it optimizing for?
[00:03:30] VJ: Breadth-first search basically helps us find A shortest path. It helps us find one of sometimes many short paths between two nodes. So if you have Node A and B, when you run breadth-first search, you’re basically figuring out, “Okay, there’s a path to this. There’s a path to this other node. Here’s the shortest one.” I can figure it out pretty quickly. But depth-first search is sort of optimized to do a whole different thing, which is telling us whether or not a path even exists. That’s really what it’s good at.
[00:04:04] SY: At all?
[00:04:05] VJ: Yeah, exactly. It’s like, “Can I even get from Node A to Node B?” Not what’s the shortest path or what are the three shortest paths, just can I get there?
[00:04:13] SY: So when I think about knowing the shortest path, that to me feels very applicable to real life because if I want to use Google Maps and I want to get to a place, I want to know obviously the shortest path to get there so I can see how it comes back, how it relates back to real life. But for DFS and knowing if a path is there, when would I want to know that? Where does that show up in real life?
[00:04:37] VJ: My favorite way of thinking about graphs is like sort of as a network. So like if you think about social networks, you have a bunch of people who are all connected to each other in some way, shape, or form. The people are the nodes. Their connections are the edges. And so if you just want to be able to tell like for example on Twitter, is this person following me? You just want to know can you find a path from my node to their node. That’s all they’re good for, not the shortest one, and this is actually kind of a good example because I don’t know what like… I guess the shortest path would be like how many connections you have, like if you have something like Facebook, like what are your mutual friends or something like that.
[00:05:14] SY: Or even like LinkedIn, right? When it says like, “First Connection, Second Connection.”
[00:05:16] VJ: Yeah. That’s actually even better.
[00:05:18] SY: Yeah.
[00:05:19] VJ: Yeah, that’s even better than my example. Forget my example. Yours is better.
[00:05:23] SY: But in general what matters more than the first connection and the second connection is purely like, “Are we connected at all?” Like, “Is there even a relationship there?”
[00:05:30] VJ: Yeah. Is this a total rando?
[00:05:32] SY: Yeah. Is this a rando? Question solved by DFS. Okay. So if we were to actually do this traversal, let’s walk through an example. So let’s say we have a graph and let’s say we’re going to start with A. I know there’s no like root because it’s not a tree and only trees have roots, but it’s a graph. We’re going to start at A just for visualizing this graph.
[00:05:55] VJ: Yeah.
[00:05:56] SY: And it has two…
[00:05:56] VJ: Yeah, but we could start anywhere theoretically.
[00:06:00] SY: Yes. Yes. Yes. And it has two children, B and C. And its child B has its own two children, D and E, and its other child, C, also has a connection to D and D has a connection to E.
[00:06:16] VJ: I think it sort of makes sense. So basically like if you’re thinking about it like A is the starting point, two children, B and C, B and C sort of connect to the same node, D.
[00:06:26] SY: Yes.
[00:06:27] VJ: And B and D connect to one node, E.
[00:06:31] SY: Okay. So if we were to initiate a traversal, what is the first thing that we would do?
[00:06:37] VJ: Well, we’ve actually sort of done the first thing, which is to choose an arbitrary node to start the traversal with and because you already said we’re going to start with A, that problem is solved.
[00:06:47] SY: Okay.
[00:06:47] VJ: But the first thing you’ll really do with any graph is choose your starting node because, again, there’s no concept of a root node the way there is in a tree structure. So you just have to first pick an arbitrary node to start with.
[00:07:02] SY: So we’re going to pick A because I like A. You know, it makes you feel like it’s a root even though it’s not a root, so just feels comfortable. So we’re going to start there.
[00:07:08] VJ: I like that. Comfort is good.
[00:07:10] SY: Right? Comfort is good, especially when you’re doing something new. So we start with A. Once we’ve decided that A is going to be our first node, what do we do with that information?
[00:07:21] VJ: So because we’re traversing through this graph, something we have to keep in mind is we don’t want to spend a bunch of effort unnecessarily so we don’t want to repeat ourselves and where that comes into play is that we don’t want to repeat the act of visiting nodes we’ve already looked at. Like if we look at C, we don’t want to come back and look at C later. We already did it.
[00:07:40] SY: Right. That’s a waste. That’s so silly.
[00:07:41] VJ: We already did it. Yeah. We’re trying to be efficient here. So because we want to ensure that we don’t repeat any of the nodes as we look at them, like we don’t want to come back to them again, once we visit a node, once we go to it and visit it, we need a way to figure out later that, “Hey, we already visited. Don’t need to visit again.” So what we can do is we can mark every node that we visit as, like we can flag it as like visited and that’s how we can ensure that we don’t repeat nodes in our traversals. So it’s sort of like a little checklist of like, “Oh, here are all the nodes,” and, “Oh, I visited this one. I’m going to mark it as visited. Now I don’t need to go back over there and look at it again.”
[00:08:24] SY: So does that mean that each node has an attribute called visited and then we’re marking like true or false? Like is that kind of what that looks like?
[00:08:36] VJ: You could have it as an attribute, but usually because this is an algorithm like I don’t actually know if I was implementing this. I don’t know that the node cares about being visited or not, really it’s the algorithm that cares, but you could just like a local variable or something like that.
[00:08:50] SY: Right. Right. Right.
[00:08:51] VJ: A node has no idea what that means and It’ll probably change as you do different traversals on it. So it’s more just a little like a reference to something in our algorithm itself is how I would think about them.
[00:09:04] SY: So in terms of actually employing this, how do we keep track of where we are?
[00:09:08] VJ: This is a great question. So when you think about depth-first search, you sort of imagine, like we’ve talked about depth-first search on trees before and I think when we talked about it, I don’t know how many seasons ago, but we talked about going deep into the ocean and like as you go deep into the structure, you’re going further and further into it layer by layer by layer. But because you’re sort of going deeper and deeper and deeper, you can sort of think of it as like you’re building on top of each like step of the algorithm. What I mean by that is like it’s sort of like a stack of nodes that you’re visiting as you go deeper and deeper and deeper and that is actually the answer to how do you keep track of where we are. You can use a stack data structure and basically what we’ll do is the moment that we check a node, so for example if we start with Node A and when we check Node A, we just say, “Okay, I am about to look at this node. I’m about to visit it. I’m going to push it onto the stack.” And as we continue to look at more nodes, we just keep adding them to the top of the stack and the stack is going to grow as we keep going deeper into the graph.
[00:10:20] SY: It’s interesting because when we are dealing with these nodes, there’s this idea of going to the node. Then once we’re there, we haven’t actually checked it yet, right? Like we’re just like hanging out. But then we’re going to actually check it and say like, “Hey, what’s going on with you? What’s your deal?” And then we are like adding it to a stack. I just think it’s so interesting that there are always things that we can do with the node, even though we’re already there at that step. Does that make sense?
[00:10:45] VJ: Yeah. Yeah, and actually like when you talk about being in a node, you’re there but then you can do a few different things and all of those things sort of help you keep track of where you are in the algorithm and also they help you later on too because as we go deep into the graph, eventually we probably want to work our way back up to where we started, right? Because with depth-first search, you’re going to look at all the children down one path, but eventually you want to look at other children too. So you need a way of sort of backtracking and the backtracking is pretty important as well and that’s another thing you have to do with every single node you check. You need to figure out a way to backtrack from it.
[00:11:29] SY: Yeah. Absolutely. Okay. So going back to our example, we start with Node A, our not quite root node, and then we say, “Okay, I’m going to check A.”
[00:11:41] VJ: Yeah.
[00:11:42] SY: By checking A, do I automatically add A to my stack?
[00:11:45] VJ: Yeah. So when you look at A and you’re like, “Okay, I’m going to say that I’m visiting it now.” So before you go and visit it, you basically say, “Okay, I’m looking at A, I’m going to add it to the stack.”
[00:11:55] SY: Okay.
[00:11:56] VJ: Because that’s the first thing I’m looking at.
[00:11:57] SY: Right. Right.
[00:11:57] VJ: Now I’m going to say, “Oh, I marked it as visited.” Great. Check that off. The other thing you need to do, we haven’t gotten to that yet, but I should mention it because this is one of those, “I’m at a node. What do I do steps?”
[00:12:10] SY: Yeah.
[00:12:11] VJ: What you need to do is you need to set its parent pointer. So I was just talking about how you need to backtrack.
[00:12:16] SY: Yeah. Yeah.
[00:12:17] VJ: You need a way of figuring out who the nodes parent is. So this is sort of just like a little piece of information, like if you had all the nodes lined up and you’re like, “Oh, I’m going to go one by one,” you sort of put a sticky note on the node’s face.” You’re like, “Hey, your parent is this node.”
[00:12:33] SY: I know who your mama is. Yeah.
[00:12:35] VJ: But in this case, basically what you’re doing is you’re setting a parent pointer. So in the case of A, A is the first node we’ve looked at, so it doesn’t have a parent because we’re starting with it. So with A, we’ll just say, “Oh, you don’t have a parent, poor thing, your parent pointer is null.”
[00:12:53] SY: Sorry A.
[00:12:55] VJ: Don’t worry. It has a lot of friends and other nodes.
[00:12:57] SY: It’s got a lot of children.
[00:12:58] VJ: Yeah. It’s got a big family.
[00:12:59] SY: It’s a big family. Yeah. Yeah. Okay. Okay. So with A, we go to A, we check A, and checking it, we’re adding it to our stack and then we give it a parent pointer. Is that what you called it?
[00:13:12] VJ: Yes. Yeah. When we mark it as visited, we also give it a parent pointer.
[00:13:15] SY: Parent pointer. And our parent pointer is null because we’re starting with A, it doesn’t have a parent. Okay. Cool. So are we done with A at this point? Is there anything else we need to do with it?
[00:13:25] VJ: Well, we need to check and see if it has any children because this is depth-first search, we want to go deep down a path and we are like, “Okay. Well, we need a path to go down. Do you have any children for us to like traverse down into the structure?” And if it does have children, we should also check that like before we decide to go visit any of its children, we should check, “Hey, have you been visited already?” Because if you haven’t, we’ll visit you. But if you have and we don’t actually care about you.
[00:13:55] SY: No extra visitors. Yeah, no bonuses. Yeah. Okay. So I go to A, I do my thing with A and then I’m like, “All right, A, do you have any kids?” And A is like, “Yes, I do. I got a B and I got a C.” Do we just like pick one or how do we decide between B and C? How do we know where to go?
[00:14:15] VJ: Just like how we chose A was arbitrary. We can arbitrarily choose any of these children because they’re both not visited. Now notably if one was visited and the other one wasn’t, the only thing we’d choose is the one that we haven’t visited. But in this case, A has two children, B and C, and we haven’t visited either. So we can just pick whichever one we’re feeling. And go and visit it.
[00:14:37] SY: Okay. So I’m feeling B, I’m in the mood for some B right now. So if I decide that I’m going to be, “Do I need to like record C for later use or anything or do I just go straight to B?”
[00:14:50] VJ: You go straight to B because we’re just looking at one child and we’re going to go with this child and down its children down into a path. So you don’t need to worry about its sibling. Just pick one child basically and just start going that way and you’re kind of like, “You know what? I’m just going to hang out and spend the day with this kid and I’m going to worry about this other kid later.” Like we’re doing like individual…
[00:15:12] SY: One-on-one time. No group sessions.
[00:15:14] VJ: One-on-one time. There we go.
[00:15:15] SY: Yeah. Okay. Okay. Cool. So I decided to go with B. So now that I’m at B, I want to check it, which means I need to put it on my stack and then I’m going to actually check. Do I check its value at this point?
[00:15:28] VJ: Well, I think because we’re doing a depth-first search, what we care more about like in this case is just working our way through the nodes.
[00:15:36] SY: Okay.
[00:15:37] VJ: If you were doing a depth-first search and you are like, “Actually, also please print out the nodes,” you could read its value.
[00:15:43] SY: We don’t really care about that. Yeah. Yeah.
[00:15:43] VJ: But for this example, we don’t really care what they are.
[00:15:46] SY: Okay.
[00:15:47] VJ: I mean its value was B, but I don’t really care about its value.
[00:15:49] SY: Yeah. Yeah.
[00:15:50] VJ: I can keep it to itself.
[00:15:51] SY: Okay. So we check B. It’s on our stack. Now we want to give it a parent pointer.
[00:15:58] VJ: Yes. We want to mark it as visited.
[00:16:00] SY: Mark it as visited.
[00:16:00] VJ: We want to say, “Hey, I visit you. I visited you.” And we want to say, “Okay, who’s your parent?”
[00:16:05] SY: Okay. And its parent is A. So we make a node of that and then we’re done with B, right? Do we then see if it has any kids?
[00:16:14] VJ: Yes. Yeah. Actually, once we’re done with those few steps we need to do with B, we do the same thing of check if B has any kids and arbitrarily choose one provided it hasn’t been visited.
[00:16:25] SY: Okay.
[00:16:25] VJ: And we’re just going to keep doing this until we hit a dead end.
[00:16:29] SY: Okay. So let’s do one more. So B has two kids, D and E, and so I’m just going to picky E and so we go to E. We check it. We push it onto our stack. We mark it as visited. We give it a parent pointer of B and then we check to see if it has any kids. It does not have any kids. We’re at a dead end at this point. So…
[00:16:54] VJ: Ding, ding, ding.
[00:16:55] SY: Yeah. We did it. We did like the whole path and we got to the end. So now what do we do?
[00:17:02] VJ: So when you get to the end of the path, you have one problem, which is now we can’t do the same repeated step, right? We can’t just be like, “Okay, go to E’s children because there are none.” There’s nowhere to go from Node E. So basically what you have to do now is when you hit a dead end, just like if you were in an actual maze, you start backtracking and the way to do this is to just backtrack one step at a time.
[00:17:28] SY: Okay.
[00:17:28] VJ: And what that means is one node, one vertex at a time. So what we’re basically going to do is when we’re at Node E, we’re going to take a step back, back to where we came from. And from there, we’ll see, “Hey, is there another path I can go down?” So in the case of E, when we get to the end of that road, we’ll say, “Okay, there’s nowhere to go.” We can pop E off of the stack and now we know we’re at Node B because that’s where we came from. That was the next thing on the stack, right? Because on our stack, we had A at the bottom then B and then E. So once you pop E off, now you have B. So you basically backtrack up to B and then we can see, “Oh, does B have any other children to go down? Are there any other paths to take from here?”
[00:18:16] SY: Yeah.
[00:18:16] VJ: And if it does, we can continue the process.
[00:18:20] SY: Okay.
[00:18:20] VJ: If it doesn’t, you can just backtrack again.
[00:18:22] SY: Okay. So we do. We have a D that we didn’t get to the first time.
[00:18:25] VJ: Yeah. B has one more child, D.
[00:18:27] SY: B has one more child, D. Yeah. Yeah. Okay. So now we can go down again, go down to D, check it, push it to our stack, if we’ve shown our stack, then we mark it as visited, we say, “Hey, do you have any kids?” And D is also connected to E so D actually does have a kid. It has E. But we already been to E. We’ve already like done our thing there.
[00:18:49] VJ: Yup. And thankfully, we marked it as visited.
[00:18:52] SY: Yes.
[00:18:53] VJ: And so we know that E has been visited and so we’re like, “Okay, well, I guess we don’t need to do anything with E.”
[00:19:00] SY: I guess we’re done with E.
[00:19:02] VJ: Because we were super-efficient.
[00:19:03] SY: Yeah. Yeah. Okay. Great. So E is done. So we go back to D because of the parent point.
[00:19:10] VJ: Well, we don’t even go to E is the important thing. We don’t even go to it because we don’t need to push it back on the stack. Basically we’re standing at D and we’re like, “Hey, D, do you have kids?” D is like, “Uh-hmm. I got E. Have you met him?” And we’re like, “We did. We did.” So we’re not going to go down there.
[00:19:28] SY: Yes. Okay.
[00:19:30] VJ: So we don’t actually push E on top of D. We basically are at D and we’re like, “Well, I guess D is effectively kind of like a dead end for us. There’s nowhere to go,” which means we can pop D off.
[00:19:41] SY: Okay. So now our stack is just A and B at this point.
[00:19:45] VJ: Yup.
[00:19:46] SY: All right. All right. Okay. So we popped D off, so now we’re going to use D’s parent pointer to go back to B and then B, we’re like, “Hey, B, do you have any other kids?” B does not have any other kids. So B is another dead end for us at this point, so we can pop D off our stack and using its parent pointer, we’re back at A.
[00:20:06] VJ: We are.
[00:20:07] SY: And then now that we’re back at A, then we can say, “Okay, do you have any other kids?” And A does have another kid. It has a C. I remember at the very, very beginning when we’re like, “Where should we go? B or C?” We have the C. Now we go to C and then we want to check it. We add it to our stack. We push it on top of our stack. We say, “Okay, C, you were visited.” And then we say, “C, do you have any other kids?” And C is connected. Remember how B and C both connect back to D. So C does have another kid, but it’s D and we’ve already been to D. So we don’t even like care about D.
[00:20:40] VJ: Yeah. We don’t even need to go look over there.
[00:20:41] SY: We don’t need to go there at all.
[00:20:42] VJ: And also like D points to E but because we’ve already visited D, you don’t even need to look at D.
[00:20:47] SY: Yup. We’ve already been to E.
[00:20:48] VJ: Like by proxy.
[00:20:48] SY: Yeah. All right.
[00:20:50] VJ: Yeah. By proxy, we know we went down that path. We don’t need to repeat ourselves. We know there’s just some corn down there, the maze.
[00:20:56] SY: We don’t need it. I like corn.
[00:20:57] VJ: It’s not what we’re looking for.
[00:20:58] SY: Yeah. Okay. That’s why I love corn. Corn is great.
[00:21:01] VJ: Yeah, exactly.
[00:21:03] SY: But okay. So C is basically another dead end for us.
[00:21:05] VJ: Yes.
[00:21:06] SY: There’s nothing else to go.
[00:21:08] VJ: I mean, it’s like a pseudo dead end because the real dead end is E because there really aren’t any children, but the point being like there’s nothing to do at C. There’s no other option for us to take. So we can pop C off and now we’re back at A and now we’re like, “Okay, A, do you have any more kids that we haven’t visited?” And we’re like, “No, you have B and C.” We visited both. Okay. There’s nothing left to do with A and we just pop A off and we’re done with our depth-first search.
[00:21:36] SY: That was awesome. Okay.
[00:21:40] VJ: Yeah, that’s fun.
[00:21:41] SY: Yeah, that seems like pretty straightforward because we’re just going to this loop of like, “Do you have kids?” Adding to the stack, marking was visited then like moving on. You know, like it feels very repetitive.
[00:21:53] VJ: We’re just asking each vertex like a series of personal questions and then like doing that for each of its children and it’s interesting that you say that it’s repetitive because that’s the recursive aspect of this algorithm. Depth-first search is often implemented recursively because you’re doing those same steps again and again and you can kind of think of like those steps could be encapsulated into a function and you could just have like a function call that does the work of visiting the node and then, you know, marking it as visited, pushing it onto the stack, checking if it has children, like you’re just doing the same thing again and again and again. So you could just encapsulate it and call it internally, recursively for each node that you’re checking and that’s the recursion in action.
[00:22:43] SY: Yeah. Okay. Very cool. Okay. So when we think about its runtime, what is a runtime for DFS?
[00:22:52] VJ: So there’s sort of two parts to it and I’m going to leave you on a cliffhanger just because I like it. So the first part is not a cliffhanger. I think the first part might even be intuitive, which is that the easier part of this algorithm in terms of runtime like the less expensive part is checking each node in the graph because remember we’re marking each note as visited so because we’re doing that, we’re only ever checking each node once in the whole algorithm. So the process of visiting each vertex in the graph takes constant amount of time so we could basically say it’s like O of N where N is the number of nodes. Pretty straightforward. You have a big graph, take more time. Small graph, it takes less time. It’s completely dependent on how many nodes you have. Now the complicated part is the amount of time it takes to check each of like the outgoing edges from each node because we are always going back to a node when we backtrack or when we’re going down deep into the algorithm into the structure. We’re checking for each node, like, “Oh, do you have children we haven’t visited?” And basically when you’re asking that question, you’re saying like, “How many edges do you have that we can visit?” And that part, it’s a little bit like expensive and time-consuming because some nodes could have more neighbors or in other words more children to check than others. And depending on what your graph looks like, you could have a node that has only one edge, but you could also have a node that has like 25 edges and now you’re like, “Oh, potentially I could be going down 25 different paths.” So it completely depends on that and the interesting thing about that is it ties in the things we’ve learned about last season with like adjacency lists and things like that, but we will get into that the next episode.
[00:24:48] SY: Okay. That sounds good. And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. Link to that is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy whether you’re building a mobile app in iOS, Android, Flutter, or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.
[00:25:28] VJ: Bye everyone.
[00:25:29] SY: Thanks for listening. See you next week.
Thank you for supporting the show!