CodeNewbie

Description

Let's break down how breadth-first search (BFS) actually works! We'll walk through a real example, explain the Big O notation of this algorithm, and explore how you might decide whether to use breadth-first search or depth-first search.

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Breaking Down Breadth-First Search from her basecs blog series.

Transcript

[00:00:00] SY: (Music) Welcome to the Basecs podcast, where we explore the basics of computer science concepts. I'm your host, Saron, founder of CodeNewbie.

[00:00:09] VJ: And I'm Vaidehi Joshi, author and developer.

[00:00:12] SY: And she is the brilliant mind behind the basecs blog series. Today we're continuing out conversation on...

[00:00:18] VJ: Breadth-first search, or BFS.

[00:00:21] SY: This season of the Basecs podcast is brought to you by our wonderful friends at Twilio. If you haven't checked out their API yet, you totally should. With Twilio, your app can send text messages and make phone calls with just five lines of code. And for a lot of developers, Twilio is the first API they've ever used. And that first time you type a few lines of code, run your program and see your phone light up with the text message? It's kind of magical. And as a thank you for being a podcast listener, you get a promo code for twenty dollars in free Twilio credit. Just text your name to 480-485-4321. You'll also get a link to a quick start guide for your favorite programming language. So text 480-485-4321. Link to that is in your show notes. Ok. Let's get started.

[00:01:11] SY: B-F-S. I like BFS 'cause it sounds like BFF. You know, like best friends forever. Best friend search. Maybe? No?

[00:01:19] VJ: Oh.

[00:01:20] SY: That's fine.

[00:01:20] VJ: That's good.

[00:01:21] SY: Just kidding.

[00:01:22] VJ: Your BFS is your BFF.

[00:01:23] SY: There you go. There we go. Possible episode title. Ok, so let's do a quick recap of what breadth-first search is. What is BFS?

[00:01:33] VJ: So BFS is one of the strategies that you can use to traverse through a data structure. And usually you're going to use it on a tree data structure or graph data structure. I mention graphs. We haven't really talked about BFS in the context of graphs, but I do wanna note that it is a thing you can do, and we will do it in future seasons. But for the context of today's episode and last week's episode, BFS is a breadth-first search of a tree data structure, which means that rather than searching down a branch of a tree or down all the way to a leaf node of a tree, you traverse through the nodes one level at a time, which is also called in-level order.

[00:02:18] SY: And we also compared it to depth-first search by talking about the data structure that it uses to help it do its searching. And we talked about how breadth-first search uses queues.

[00:02:31] VJ: Yep. You're right. Breadth-first search actually is like doubly cool because not only do you get an algorithm, which is the searching algorithm, but you also get a familiar old friend: the queue data structure. And as we'll remember from previous, many previous episodes at this point, the queue data structure follows the first-in-first-out principle, which is also called FIFO, which basically means that you enqueue items and you dequeue them but you always enqueue an item. And that is the first thing that's gonna go in. And that means it'll be the first thing to come out. It's not kind of haphazard. It follows this rule of the FIFO principle.

[00:03:12] SY: And on a high level, if we walk through the steps of the algorithm, we're basically falling into a rhythm of reading data then adding a child, adding a child, reading a data, adding the child, adding child, so we have a read-add-add process.

[00:03:29] VJ: Exactly. Yes. We are going to read the data of a node, and then we will add a child. And when we add a child node—the left node if it exists and the right node if it exists—what we're really doing is we are enqueuing those nodes into our queue. And that's how we, that's how we know what order we wanna visit the various nodes in a tree. And that's how we decide, "oh, this is the next node to read. This is the next node...

[00:03:54] SY: Yeah.

[00:03:54] VJ: ...to read because you've got a little queue, and they're all lined up for you ready to go.

[00:03:59] SY: Ok, so it, it kinda sounds like there are two different, I don't know, like types of nodes that we're dealing with. We're dealing with the, the main node, the one that we're actually reading. And then we're adding the, the kids or the addresses for the kids later so we can visit them later on. Is there a name for, for that distinction?

[00:04:20] VJ: There is. Yes. I'm so glad you asked because I, I kind of, you know, skirted around this terminology in last week's episode, but it's important for us to talk about it now because I think it helps kind of distill how this algorithm works because when you're looking at a node, for example, the root node, you can kind of imagine you're like holding that node in your hands, and that's the node you're like going to read, but then you're also kind of making note that other nodes exist. You're not looking at...

[00:04:48] SY: Yeah.

[00:04:48] VJ: ...them. You're not reading them. You're just like kind of stumbling upon them, and you're like, "oh yes. I am noting that you are there. I'll come back to you later, you little child node." (Laughing) And, and the term that we use for the nodes that we kind of encounter as we read a node is called a discovered node. And a discovered node is going to be the children nodes that we add to our queue and whose location we know, but we haven't actually visited them. We haven't actually read their values yet. We haven't, you know, taken them off the queue. We're just—we know they exist. We know where they are. We're gonna add 'em to our queue. And those are our discovered nodes. They're the nodes we know are gonna come up soon. A good thing to note is that the very first discovered node is the root node because...

[00:05:40] SY: Oh, yeah.

[00:05:40] VJ: ...that's the first one that we're like, "oh, this is the first—the address for." And then we're like "all right. What's the node in our queue?" And at—in the beginning, the only thing in the queue is the root node. And that is the only discovered node. And so that's when you kind of pick up the root node, and you're like "all right. We've enqueued at least one node. We're gonna pick it up. We're gonna look at its data." And then we're gonna start, you know, looking at who its children are, enqueuing them, discovering the children and then come back to them later. And basically, you run the BFS algorithm until you have discovered all the nodes there are to be discovered and you've gone through all the discovered nodes in the queue.

[00:06:19] SY: Ok, so once I have the address—basically, once I know that the node exists and I know where to find it, then it is a discovered node.

[00:06:29] VJ: Exactly. Yeah. You know it exists. You know how to go find it...

[00:06:32] SY: Nice.

[00:06:32] VJ: ...later. You've added it to the queue.

[00:06:33] SY: Ok.

[00:06:34] VJ: You haven't read its data. You don't know what that node contains. You just know that it's a thing.

[00:06:40] SY: Ok. Cool. Ok so let's walk through an actual example. We're going to set up a tree that we actually have used already. We used it to walk through our depth-first search example. It starts all the way at the top with our root node being the number six. Then on the left child, we have the number four. On the right child, we have our number eight. Then for number four, we have a left child of two, a right child of five. And then going back up to our number eight, we have a left child of seven and a right child of nine. And then if we go all the way back to the left, we have where we had our, our last child, two, that has a child, one, on the left and then on the right, a number three. So it's ok if you don't remember all of those. We're gonna start at the top. We're gonna work our way—you know, I was gonna say we're gonna work our way down, but we're actually working our way kind of down, mostly across because we're...

[00:07:41] VJ: Exactly.

[00:07:41] SY: ...doing breadth. (Laughing) Neat...Ok...

[00:07:44] VJ: We're gonna, we're gonna do a little level-order trave—tra... traversal. Say that three times fast. (Laughing)

[00:07:52] SY: Ok. Cool. So we start with six. And because our first step is reading, we're gonna read our one and only node, also our one and only discovered node so far.

[00:08:05] VJ: And because we've discovered it, that means we've added it to our queue. So...

[00:08:08] SY: Yes.

[00:08:09] VJ: ...when you say we start with, first thing we'll do is we'll say, "oh, there's a node." We're gonna enqueue it. So right now in our queue we have just one node, which is our root node.

[00:08:17] SY: Ok. So we've enqueued it. Now we have read it, and we see that it is a six. And because we are doing a search, we've decided we are not looking for a six. That is not the, the thing we're trying to find. So we need to keep on searchin'. So at this point, I'm going to look to the left and say, "hey, six, do you have a left child?" It does. And now I'm going to ask it where does that left child live? And I'm gonna add that reference, reference that node to my queue.

[00:08:54] VJ: And you don't know what's in that node.

[00:08:56] SY: I don't know what's in that node.

[00:08:56] VJ: You just know it's there.

[00:08:57] SY: Yes, yes.

[00:08:57] VJ: In your queue.

[00:08:59] SY: So in my queue I have six, which I just read, and I have a reference to my left child, which is four, but I don't know that it's four yet.

[00:09:10] VJ: Right.

[00:09:11] SY: Ok.

[00:09:12] VJ: Oh, well left. (Laughing)

[00:09:13] SY: What did I say?

[00:09:13] VJ: That's correct. That's correct. I, I said right, and I meant you are right, not right node. (Laughing) This is confusing.

[00:09:20] SY: Oh, words. Ok.

[00:09:22] VJ: You are correct. (Laughing) You have your left child of the root node is in that queue. It's ready to go. You don't know what it is, but it's there.

[00:09:31] SY: But it's there. Ok. So now I'm going to see if I have a right child. Oh, looks like I do. And I'm not gonna read it, but I'm going to add it and add a reference to it in my queue. So now my queue has the number six, and then it has left child, which is four, and right child, which is eight. Cool. So now that I'm done reading all the children for my, my root node, my six, I can dequeue it. Is that right?

[00:10:04] VJ: Yeah. Just pop it on off. You're done with that.

[00:10:06] SY: Pop it on off. Done. Nailed it.

[00:10:08] VJ: But we're not done with our algorithm because...

[00:10:09] SY: No.

[00:10:09] VJ: ...we're gonna keep going 'till we've got nothing left. And right now we have two—we still have two nodes left in our queue.

[00:10:17] SY: Ok. So now that six is gone, I'm going to look at the next thing in line, which is my left child node. And now, I can actually read it. And I'm gonna say, "ok, what are you?" It's gonna say, "hello, I am four." (Laughing) And before I get rid of it 'cause I'm like I'm not looking for four—ok, I'm looking for something else—I want to add its children to my queue so that if the next thing in the queue isn't the thing I'm looking for, I can still go ahead and, and have access to—this sounds so bad—I was gonna say have access to its children, but that's not, you know, I can still give the kids presents. There we go. (Laughing) 'Cause I'm Santa. Santa Saron. There you go. Ok, so I'm gonna say...

[00:11:06] VJ: Your BFF.

[00:11:07] SY: Your B...

[00:11:09] VJ: Who runs BFS. (Laughing) See? I'm, I'm...

[00:11:13] SY: You're just so good at this.

[00:11:14] VJ: It's catching on now. It's good.

[00:11:15] SY: You're so good at this. We should have a podcast. Ok. So my four—I'm gonna say ok, what are the addresses to—well first of all, do you have children, right? 'Cause if they don't have any kids then it doesn't really matter. I'm gonna say ok, left child node. Oh, yes. There it is. And I'm going to add the reference to that to my queue. And then I'm gonna say, what about the right child? Oh. Yep. Have that as well. Ok, now I'm gonna add a reference to that to my queue. So at this point if we look at my queue, I still have the four 'cause I haven't, I haven't kicked it off yet. I have my reference to the node, which is going to be eight. And then now I have two more references. I have the left child of four, which is two, and then I have the right child of four, which is five. So I have four things in my queue.

[00:12:09] VJ: And you've got one level kind of. You've got the second level, right? Where you had...

[00:12:14] SY: Yeah.

[00:12:15] VJ: ...four and eight, which are, you know, the direct descendants of the root node. Right now they're both in the queue. And you've got the beginning of the next generation, the next level. You've got two and five also in the queue. And they are enqueued in the correct order right now. You've got four and eight and then two and five. And if you were kind of visually reading across the tree, you'd be kind of scanning across one level.

[00:12:37] SY: Yeah.

[00:12:37] VJ: And you've got one level and you've got the beginning of the next level. So you might even be able to guess the way that this algorithm's working who's gonna get enqueued next.

[00:12:46] SY: Yeah, that's true 'cause I started with six. Then I went down, and then I went from left to right four, eight. And then now I'm going down. And then from left to right it's two, five. And what's next is probably gonna be seven and nine.

[00:12:57] VJ: I hope so.

[00:12:58] SY: I hope so. (Laughing) Otherwise we did this really wrong. And then if we go down another level, we have basically our last level, which is just one and three.

[00:13:09] VJ: Yep.

[00:13:09] SY: Ok, so in our queue we have four. We have the reference to eight. We have the reference to two and the reference to five. So now that we are done adding all the references to four's children, we can dequeue four. So four is gone. Done. Don't need you anymore. So we then look at the next thing in our queue, which is the reference to eight. So now I can say, "ok, eight, what are you? I'm gonna read you." Eight says, "hello, I am eight." (Laughing) And before we get rid of eight, we need to add the references to its children. So we're gonna go to the left and say, "do you have a left child?" Yep. It's the reference to the number seven. And we're gonna add that to the queue. And the now we're gonna go to the right and say, "do have a right child?" "Yes, I do." It's a reference to the number nine. And so now we add that to the queue. So at this point, we still have our number eight, which we just read, and we have not gotten rid of. And then next, we have reference to two, then a reference to five, then a reference to seven, then a reference to nine, which is basically our entire third level of our tree.

[00:14:21] VJ: Yeah. It's one whole generation of the tree, and you've got them in level order. And when you go to read them—like if we were to continue this algorithm and you went to read them, you would be then looking at their children also in order and enqueuing them. And the nice thing is once you dequeue a node, you don't need to worry about it because the only reason you kind of enqueued it is so that you could check its children. And once...

[00:14:44] SY: Right.

[00:14:44] VJ: ...you've got children enqueued into the queue, you don't even need the parent node anymore. And that, that solves the problem a little bit of what we were talking about last episode where there are no reverse links. And so you're like, "ok, how do I go back to the parent?" And here, you're kind of flipping it on its head a little bit where you're saying ok, I'm gonna hold onto the pair until I have the information I need from it. And I can, you know, move on and forget about the parent and move on to the next generation.

[00:15:10] SY: Yeah. And you basically keep going down this, this road, this process until you've found the node that you're looking for?

[00:15:21] VJ: Yeah. Or if—you know, you could also just, you could also just run BFS if you just wanted to print out the tree in level order.

[00:15:28] SY: Yeah. Ok, so when we were talking about depth-first search, we analyzed the algorithm, and we said, "ok, what does this mean in terms of time space complexities?" Now we're getting into Big O notation stuff. When we look at breadth-first search, what does it look like in terms of time? How do we measure that?

[00:15:52] VJ: I guess the first thing we wanna think about is how much time does it take to actually visit a node? Because that is the one thing that we are gonna do for all the nodes, right? 'Cause we're searching through everything so like we need to know what does it mean to visit a node and how much time does it take? So whether you are looking at the root node or we're looking at the, you know, some child node—maybe the furthest all the way down like the lowest level mode—the thing that you're actually doing is you're reading the node's value, and then you're enqueuing its children. And you're always going to read its value and check whether it has a left and right and enqueue it. So the actual work of reading and checking a node when you, you know, kind of pop it off the, the queue and look at what it is and who its children are, that doesn't change as you go through the tree. It's the same amount of time no matter what node you're looking at. And so we could say that reading a node at a time is actually constant. It takes constant time or O(1) time. But we're not just reading one node. We are reading the value of all the nodes 'cause we're gonna run BFS through this whole tree. So even though reading a single node takes constant time, we need to think about it in the context of what does it look like to read every single node 'cause that's eventually what we will do.

[00:17:16] SY: Right.

[00:17:17] VJ: And so the time complexity of running BFS, breadth-first search, really depends on how many nodes you're reading because it's—the, the thing that's gonna take time isn't the work of reading the node, it's how many times you're gonna have to read different nodes.

[00:17:32] SY: Yeah.

[00:17:33] VJ: Which means that the time complexity of BFS is linear. And you can imagine if I was, if we ran this algorithm and we were like ok, let's look at a tree with three nodes versus let's look at a tree with 30 nodes or 300 nodes. It would take longer—even though the work of reading isn't that hard—it would take longer for more nodes. So that's why we say that where "n" is the number of nodes in the tree, the Big O notation is O(n), where it's linear proportional to how many nodes we'll have in the tree.

[00:18:06] SY: Ok. Makes sense. Yeah, and I remember—I think it was... was it last episode where we were talking about the idea of Big O notation is about figuring out what is the worst case scenario? So if I'm looking, if I'm searching for a particular node—let's just say I'm looking for the value four—that actually happens pretty fast. That's in the, the second thing I'd check will be the number four. So that's pretty quick. But worst-case scenario, I need to look at every single node. So if I have to look at every single node, what—how long would that take? And that's how we get to "n."

[00:18:40] VJ: Exactly. And, and you always wanna be prepared for the worst case, I guess. It's like...

[00:18:45] SY: Yeah.

[00:18:45] VJ: ...you're, you, you wanna, you wanna know like do I have enough life jackets on this boat?

[00:18:49] SY: Yeah. Yeah. Ok, so what about space? How do we think about space?

[00:18:56] VJ: So the amount of space that you really have to think about with BFS completely goes back to that queue 'cause you'll remember we're leaning on this queue data structure for us to like keep reference to nodes, to know which order we're gonna check them, to know how many are left to check. And so if you're imagining a tree with three nodes, then worst-case scenario, you're gonna have three nodes enqueued at a time because you'll look at the root node, and you'll be like, "do you have a left? Do you have a right? Enqueue them." And now you've got every single node in the tree is enqueued. But you could also have a situation where you have a large tree and you're looking at the number of nodes that are enqueued at any given time. And depending on how wide the tree gets and how many, how many nodes you have at a given level, you'll remember like when we kind of talking through I was like, "oh look we have a little bit of one generation and then the whole next generations enqueued."

[00:19:56] SY: Yeah.

[00:19:57] VJ: So depending on how wide your tree is, the amount of space could be—it will be proportional to how wide your tree gets. Basically...

[00:20:06] SY: Oh.

[00:20:06] VJ: ...the biggest or the widest generation. So if you've got a tree with, you know, a level that is 10 or 20 nodes wide, then you could have 10 to 20 nodes enqueued at a time. And for every one node that you're reading, you're still adding two more. You're adding its left and right child. So you're gonna be popping things off, but you're probably going to be adding more things before you can pop them off.

[00:20:30] SY: Right.

[00:20:31] VJ: So queues can grow in size and they can, you know, get wider and skinnier. But the worst-case scenario for BFS is that you'll have one generation, one level that is very, very big—probably the last one—and you will have the space complexity be completely proportional to the width of that tree. So that's how much extra space you need to keep track of those references to the nodes and the number of nodes you have enqueued.

[00:20:59] SY: Ok. That makes a lot of sense. And when we talk about width, we're talking about like levels, right? So we talked about like one whole level would be added when a whole generation would be added to a queue. Like that's what we're talking about.

[00:21:11] VJ: Exactly. Yeah. Yeah.

[00:21:12] SY: Right? Ok.

[00:21:13] VJ: And so if you had like five nodes in a generation that were all, you know, descendants at the same level, you could have a worst-case situation where you've got all five of them enqueued at once. And sure you'll dequeue them, but that'll take time, and you're still adding two new things every single time as you're going through this tree until you get to the last level. Finally, when you get to the last level, that means you've hit all the leaf nodes and there are...

[00:21:37] SY: Yeah.

[00:21:37] VJ: ...no children to enqueue. And finally, you'll just be popping things off.

[00:21:41] SY: Poppin' things off.

[00:21:42] VJ: So you'll get a little bit... (Laughing) Exactly. You'll get a little relief at that point, but you know, this, this queue will kinda start to swell as you add more and more children nodes depending on how wide the tree is. And if you...

[00:21:56] SY: Yeah.

[00:21:56] VJ: ...think about it, it, it's a really good way of thinking about DFS versus BFS because...

[00:22:02] SY: Right.

[00:22:02] VJ: ...you'll remember that with depth-first search, it all had to do with how deep the tree was, the height of the tree. And that...

[00:22:10] SY: Yeah.

[00:22:10] VJ: ...was your worst-case scenario.

[00:22:12] SY: Yeah.

[00:22:12] VJ: And so in the same way with BFS, breadth-first search, your worst-case scenario has to do with how wide the tree is. And it, it makes sense when you start to think about your two different strategies and how much space you could, you know, really have to account for depending on how big your tree and or how wide it is.

[00:22:31] SY: Ok, so if I have a tree, how would I pick between DFS and BFS?

[00:22:41] VJ: You would probably have to think a lot about what your, what your tree looks like and what it is that you're trying to do. And to be honest, I think it probably has a lot to do with how much knowledge you have about the tree and what you're looking for. And we kind of, you know, used this metaphor early on in the season when we were talking about diving deep versus kind of like snorkeling. (Laughing) So like...

[00:23:08] SY: Yeah.

[00:23:08] VJ: ...if you, if you know exactly which section of the tree you're looking for something, you might wanna just immediately go for depth-first search. But if you're like, "I'm not really sure. I maybe need to kind of look around a little bit," you might wanna run a breadth-first search. And then you'll realize, "oh I don't even need to look at this left section of the tree. I only need to look over here." And so honestly I think it depends a lot on the problem you're trying to solve, how much information you have about the tree and what the tree looks like. And think a lot of the times you probably even wanna do a combination of breath-first and depth-first. You might wanna combine strategies and do a little bit of breath-first and then figure out, "oh, this is where I wanna go deep."

[00:23:47] SY: Oh man. I didn't even think you could do.

[00:23:48] VJ: Yeah. I, I mean there's, I don't think there's an actual algorithm where it's both, but most algorithms will, you know, implement a little bit of—or I guess most people will use a little bit of one, get some more information and maybe use a little bit of the other. And sometimes you know exactly what you're trying to do, and you might just run a BFS automatically 'cause you don't have enough information. And sometimes your approach is very targeted, and so you might know, "oh, I definitely wanna just run a depth-first search." But as long as you know that the worst-case scenario for space and time are related to the, you know, the strategy you use and what the tree looks like, at least you'll be more well-equipped to make a more informed decision about (Music) whichever of these two strategies you decide to use.

[00:24:33] SY: And that is the end of today's show. So if you liked what you heard, please leave us a review. And make sure to check out Vaidehi's blog post. Link to that is in your show notes. Also wanna give a huge shout out to Twilio for sponsoring the season. Make sure to check out their awesome API. To get your twenty dollars in free Twilio credit, text your name to 480-485-4321. Phone number and link are in your show notes. Vaidehi, you wanna say good-bye?

[00:25:00] VJ: Bye, everyone.

[00:25:00] SY: Thanks for listening. See you next season.

[00:25:06] SY: Ok, what about...

[00:25:07] VJ: I have ten p—

[00:25:07] SY: ...space?

[00:25:08] VJ: ...sorry, I was gonna keep going. (Laughing) We can stop talking about lifeboats. (Laughing) Um, anyways. Keep going.