In this episode, we start our discussion of searching, or traversing, through a graph with breadth-first search (BFS). The breadth-first search algorithm traverses broadly into a structure, by visiting neighboring sibling nodes before visiting children nodes. The power of using breadth-first search to traverse through a graph is that it can easily tell us the shortest way to get from one node to another, which you'll experience first hand by brining muffins to your neighbors!
This episode of the Basecs podcast is based on Vaidehi's blog post, Going Broad In A Graph: BFS Traversal from her basecs blog series.
[00:00:02] SY: (Music) Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.
[00:00:08] VJ: And I’m Vaidehi Joshi, Author and Developer.
[00:00:11] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today we’re talking about…
[00:00:15] VJ: Graph traversal.
[00:00:17] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website. And with their In-App Payments SDK, you can add it to your mobile app too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.
[00:00:54] SY: So we’ve talked about traversing things in the past. When we talk about traversing a graph, what exactly are we talking about?
[00:01:02] VJ: So when we talk about traversing through a graph, we’re really just talking about searching through the data structure. And as we know, a graph is a data structure similar to a tree but a little bit different and we’ve talked about traversing trees, so now we’re just going to focus on traversing and searching through a graph structure. And it’s actually fairly simple because really when you’re talking about searching, you’re really just talking about visiting every single vertex, every single node in the graph, and by proxy, when you visit every single vertex, you also visit every single edge in the graph. So that’s really what searching through a graph data structure is. It’s a similar thing to like searching through a tree, but this is a different structure. So it’s a slightly different traversal algorithm.
[00:01:48] SY: And so we’re going to talk about two ways to traverse a graph breadth-first search and depth-first search. These are not new things for us, right? We’ve talked about them before.
[00:01:56] VJ: Yes, we have. We talked about them quite a lot in depth. And just as a quick refresher, when we’re talking about either of these two types of traversals, these two ways of searching, the real difference between them is the order in which the vertices of this graph are visited. So when we were talking about it in the context of trees, we are talking about, “Oh, how do you go about checking all the nodes in the tree?” So similarly, when we’re going to talk about BFS, breadth-first search, versus DFS, depth-first search, we’re going to talk about it the same way. We’re going to talk about how we’re visiting the different nodes in the graph and there are two big differences between depth-first search and breadth-first search and we’ll see those differences still hold true when it comes to a graph data structure.
[00:02:44] SY: And what are those differences?
[00:02:45] VJ: So with depth-first search, we’re traversing deep into the structure by visiting the children of a node before you visit its siblings and a depth-first search algorithm also uses a stack data structure to sort of help it keep track of which nodes it has visited as it works its way through the tree and searches it or in our case through the graph. But a breadth-first search algorithm traverses broadly. It visits the siblings of a node of a vertex before visiting its children. Unlike the depth-first search algorithm, it uses a queue data structure to help keep track of which nodes it has looked at and which ones still need to be visited in the searching process.
[00:03:35] SY: And for this episode, we’re going to focus on breadth-first search. So what is the first step to BFS? Where do we begin?
[00:03:42] VJ: Oh, that’s a great question because it’s the first difference between a tree searching algorithm and a graph searching algorithm. So when we had a tree, it was pretty easy to know where to start because you start at the root and then you work your way either down or out depending on whether it’s depth first or breadth first. But with a graph, the first step is basically like figuring out where to start because there literally is no root node. A graph data structure is very different than a tree data structure. It’s not as obvious. So the first step is picking a vertex to start with. And because there’s no root, there’s no hierarchy, graph traversal can begin with pretty much any vertex in the graph. So you can just kind of choose one arbitrarily and that’s sort of your starting point for your search.
[00:04:28] SY: Okay, cool. So I have my graph. I have my, not root node, my first node that I’m starting with and then what happens?
[00:04:37] VJ: There are basically five steps to BFS in a graph and I’ll kind of talk through all of them and it might seem like a lot but just remember that once we know these steps you’re going to do the same thing to every single node. So once you really understand the order, just know that you’re just going to repeat this again and again and again. So it’s actually simpler than it might sound.
[00:04:58] SY: Okay.
[00:04:58] VJ: So the five steps of breadth-first search are, first, you have to add a node from the graph into a queue of nodes that are the ones we want to visit. So I mentioned earlier you need a queue data structure to help you with BFS. This is that queue. So that’s the first step is add a node into the queue because that’s basically your list of things to visit. The next step is to visit the first node in the queue. Pretty easy, right? Put something in the list, now we’re going to look at first thing in the list. And when you visit the first node in the queue, we want to mark it as visited because we’re in the process of visiting it. And the process of basically visiting a node is basically noting, “Hey, here’s a node that exists.” And also who are the its neighbors because those are the two real important piece of information that we need when it comes to figuring out how to search through this structure. So we visited the first node in the queue. We’ve marked it as such. The next thing we want to do is look at whether the node we just visited has any neighbors. And if it does have neighbors, we want to look and see if any of those neighboring nodes, if any of those adjacent nodes have been visited or not because we have this queue, right? We have a queue where we’re keeping track of things we have to visit. So you can kind of probably guess what we’re going to do next.
[00:06:24] SY: So next up, you would check to see if there are any neighbors and then you just add those to the queue.
[00:06:29] VJ: Exactly. But we don’t just add them, we want to also check if they’re visited or not because remember that queue, that list of our nodes is any node that still needs to be visited. So in a situation, maybe it’s not like obvious yet why this would happen. So in a situation where we’re looking at a node and we check to see if it has any neighbors and its neighbors have already been visited, we don’t want to add those to the queue because remember our queue is just the list of things to be visited.
[00:07:00] SY: That we want to visit. Yeah.
[00:07:01] VJ: So if we’ve already visited it, it doesn’t need to go into the list.
[00:07:03] SY: Don’t add it. Okay. Cool.
[00:07:05] VJ: Also, if the node that we’re checking has a neighbor who’s already in the queue, we also don’t want to add it because we somehow have already noted to ourselves, “Hey, visit this later. Come back to it.” So if we run into it again, we also don’t want to add it to the to-be-visited queue.
[00:07:23] SY: Okay. So if it’s already in the queue, don’t add it, and if it’s already visited, don’t add it.
[00:07:27] VJ: Correct.
[00:07:29] SY: Cool.
[00:07:29] VJ: The last step, we’ve looked at the node, we marked it as visited, we’ve looked at its neighbors and potentially added them to the queue, we have nothing left to do at this node. So we just remove it from our queue once we finished visiting it. It seems easy, right?
[00:07:44] SY: That seems pretty straightforward. It’s like go next door, see who the neighbors are. If they are there, if they haven’t been added to the queue, add them to your list of neighbors to visit and then go and meet some more neighbors. Once you’re done with the neighbor that you visited, you just cross them off your list.
[00:07:59] VJ: Yeah. It’s like when new people moved to the neighborhood, you bring them a basket of muffins, it’s like if you’ve already brought them a basket of muffins, you don’t need to keep bringing them.
[00:08:06] SY: You can’t bring two muffins.
[00:08:07] VJ: Everybody gets one, one basket.
[00:08:08] SY: That’s just too thirsty. Yeah.
[00:08:11] VJ: You don’t have infinite time to bake muffins here.
[00:08:14] SY: That’s true. Muffins take a while.
[00:08:17] VJ: But yes, that’s a good way of thinking about it. Basically for every single node in this little neighborhood, we only want to visit it once and check its neighbors once and make notes to visit those neighbors only ones.
[00:08:29] SY: And once we’re done, we cross them off.
[00:08:31] VJ: Yeah. Yeah, exactly. Now we’re like, “Oh, we’re friends. I’ll go borrow sugar from them one day or whatever.”
[00:08:37] SY: Yes. I can finally get my sugar. Okay. Cool. All right. Okay. So those steps make sense to me just on a really high level, but I want to see an example. Can we do one with a real graph?
[00:08:48] VJ: Yeah.
[00:08:49] SY: Okay. So we’ll start with Node E. That node will have a neighbor A which will have a neighbor B which will have a neighbor F. Now neighbor F will have two nodes: C and G. One more time. So we have E attached to A attached to B attached to F. F has two nodes attached to it: C and G.
[00:09:14] VJ: So it’s almost like a line of nodes, right? And then this little fork.
[00:09:17] SY: Yeah.
[00:09:18] VJ: That’s not too complicated.
[00:09:19] SY: The first thing that I think we have to do is pick a starting point up.
[00:09:22] VJ: Yup. You got it. Who should we start with, because we could really just have a free-for-all here.
[00:09:28] SY: We can go nuts. I think we should start with B.
[00:09:32] VJ: Sounds good. I like B.
[00:09:34] SY: So that’s the first step. Okay. So now we need to add B to our queue of nodes that we need to check.
[00:09:43] VJ: Exactly.
[00:09:44] SY: Cool. So basically, it’s me saying, “I have my neighbor B. I know that neighbor B exists. Now I need to add B to my queue of neighbors to give muffins to.”
[00:09:52] VJ: Exactly.
[00:09:53] SY: Cool.
[00:09:53] VJ: Unfortunately, you only know about B though so you basically have to go over B’s house because that’s all you know about.
[00:09:59] SY: Okay. So we’re on a hill and a cliff.
[00:10:03] VJ: I trust you, we’re going to go somewhere with this, but I did not expect to be on a hill.
[00:10:09] SY: We are on a hill and we can only see the house B. We can only see neighbor B.
[00:10:15] VJ: I see.
[00:10:15] SY: You see what I’m doing here?
[00:10:18] VJ: This is your queue. Your queue is on a mountain now.
[00:10:21] SY: Great topography. Yes. So we’re going to get to B and then it’s only when we’re actually visiting B that we can see B’s neighbors.
[00:10:31] VJ: Exactly.
[00:10:32] SY: Which are F and A.
[00:10:33] VJ: Yeah. And when you get to B, you basically can say, “Oh, I have B, I am going to mark it as visited. I’m officially at B’s neighbors. I’m on their front porch. Have the muffins…”
[00:10:43] SY: Give them the muffins.
[00:10:44] VJ: Yeah, exactly. I officially visited it. And also, now I can check its neighbors. And based on that, I’ll know, “Do I need to go visit anybody else or not?”
[00:10:54] SY: So now I’m at B. I’m at a different part of the mountain. So now I can see the neighbors A and F. So now I’m going to add A and F to my queue, to my list of neighbors to give muffins to.
[00:11:08] VJ: Yeah. And the reason you can add A and F is because they don’t already exist in the queue and you know you haven’t visited them yet because so far you’ve only visited B.
[00:11:17] SY: So now that I have done all those things, I’m basically done with Node B, right? I’m done with my neighbor B. So I will cross them off my list or I will dequeue them and now I’m ready to move on to the next item in my queue.
[00:11:34] VJ: Exactly. So now the next person that we need to visit is basically whichever neighbor you marked on your list first. In other words, whichever node is first in the queue and I think we got to be and then we are like, “Oh, look, you have neighbors F and A.” So that means you put F in the queue first and then A. So once you dequeue B, the next thing at the front of the queue to look at is Node F. So that’s who you’ll go over muffins to next.
[00:12:01] SY: So we are on our way to F, we land at F. Now that we are at F’s house, we are at a different part of our mountain, we can see its neighbors, and remember that F actually has its own two neighbors. It has a C and a G. So now that we’re there, we’re going to add C and G to our queue, to our list of neighbors to give muffins too.
[00:12:24] VJ: Yup.
[00:12:25] SY: And now that we’ve done that, we can mark F as visited we cross them off the list and we dequeue them.
[00:12:34] VJ: Oh, I have one more thing to say, which is that technically F has one more neighbor, B.
[00:12:41] SY: Oh!
[00:12:43] VJ: I almost forgot about it too, but because this is a graph not a tree, so technically F is connected to C and G and B.
[00:12:52] SY: Yeah. So even though we came from B, that doesn’t really matter in terms of us looking at the neighbor. We still have to consider B.
[00:13:00] VJ: Yeah, because B is still adjacent to F and because this isn’t a tree structure, B isn’t a root node. It’s just this random node we started with.
[00:13:09] SY: Random neighbor.
[00:13:10] VJ: So we still have to like look from F’s porch out at the vast wonder of the neighborhood and you’re like, “Oh, wait, B is actually its neighbor too.” But the reason I said it was okay is that if you look at B and say we consider all of F’s neighbors and we’re like, “Oh, it has C, it has G, and it has B,but B has already been marked as visited. So we don’t have to do anything with it. We can just ignore it.
[00:13:36] SY: We don’t need to worry about it.
[00:13:37] VJ: We’re not going to add it to the queue, but this is a great example of why we want to check whether a node exists in the queue already or whether it’s been visited because otherwise you’ll just be perpetually baking muffins and going to everyone’s house. We don’t want to go back to B. We already gave them muffins.
[00:13:51] SY: Nobody got time. Okay. So F has three neighbors: C, G, and B, we can ignore B since we’ve already been to B’s house. So we’re just looking at C and G. Now that we’ve acknowledged C and G and added them to our queue, added them to our list of neighbors to give muffins to, we are done with F so we can cross it off your list or dequeue it.
[00:14:14] VJ: Yup.
[00:14:15] SY: Okay. So now we look down at our list of people to visit and people to give muffins to and we have A.
[00:14:21] VJ: Yes, A is now at the front.
[00:14:23] SY: Now it’s at the front and then we have the two nodes we just added, C and G. So next, we’re going to go over to neighbor A and we’re basically going to keep doing the same steps. We’re going to look and see what neighbors A has. And now that we’re at A’s house, we can see that A actually has a neighbor, E, and because we came from B initially, A also has the neighbor B because remember everything started from B. And so A technically has two adjacent nodes, E and B, but we can ignore B since we’ve already been to B’s house.
[00:15:00] VJ: Exactly. So we really just have to deal with E and we’re like, “Okay, well, E is someone we have to visit.” Add it to the end of the queue and we marked A as visited, pop it off the queue, and now we just have C and G and E.
[00:15:19] SY: Our newly added E. Okay.
[00:15:21] VJ: I guess technically we dequeue it, popping is usually with the stack.
[00:15:24] SY: Yeah. I like popping though, it’s so much fun. So one thing that we didn’t clarify at the beginning of describing this graph is that C and G are also connected. They’re actually neighbors. So we have our Node F, F has two nodes, two adjacent nodes, C and G, but then C and G are also connected. There’s a little triangle going on.
[00:15:43] VJ: It’s a little love triangle in the neighborhood.
[00:15:45] SY: A little muffin triangle, there we go. And so when we are at C’s place, we can see our neighbor G and we can also see our neighbor F because remember C has two adjacent nodes, G and F. Now this is kind of where it gets a little interesting because we want to add G and F to the queue, we know we can’t add F to the queue because F has already been visited. Can we add G to the queue?
[00:16:16] VJ: Well, if we look at our queue currently, what does it look like? It has C, G, and E. We haven’t dequeued C yet. We have C, G, and E in that order. We’re visiting C, but G is already next in line. So it hasn’t been visited, but it has been added to the queue. And those are those are times where we don’t want to re-add something. So in this case, there’s nothing to do with F, it’s already been visited, and there’s actually nothing to do with G because it’s already been queued. It’s where we’re going to go next. There’s really nothing to add when it comes to C’s neighbors. We kind of covered all our bases. We already went to one of its neighbors and we’re going to go to its next neighbor in the next iteration of this queue.
[00:16:55] SY: Okay. So all we can do is just dequeue C.
[00:16:58] VJ: Exactly.
[00:16:59] SY: Ha! Dequeue C.
[00:17:00] VJ: I know. I saw. I noticed that too. I was like, “I love it.”
[00:17:02] SY: Yeah. That was fun.
[00:17:04] VJ: The grammar nerd or the word nerd in me was just euphoric.
[00:17:09] SY: Okay. So C is gone, so left in our queue are G and E. So now we go over to G and G has two neighbors. It has C, which is where we just came from, and then it has F because remember F had the C and G as its neighbors, that little triangle we talked about. But in both these situations, F has already been visited and C has already been visited. So we have nothing to do with G.
[00:17:36] VJ: Yup. Bye-bye G.
[00:17:37] SY: Bye-bye G. So we can dequeue it, cross it off our list of neighbors to give muffins to and now we look at our queue and we only have one thing left. We have our neighbor, E. So now we go all the way over to E, and now that we’re on this side of the mountain, we can look out and see what neighbors E has and E only has one neighbor. It has Node A as its neighbor and Node A is a node that we’ve already visited, already gave muffins to. So there’s nothing else to do for E.
[00:18:09] VJ: Yeah. And we are basically done with our traversal because now we’ve gone through every single node in our graph. We’ve made sure that it was added to our queue. We made sure that we visited it, looked at its neighbors, and then we removed it. And now that we’re at the queue and has nothing in it, we know that we’re at the end of the search because, well, we haven’t added anything else. There’s nothing else to do. We’re done.
[00:18:32] SY: Yeah. So ones we dequeue E, our queue is empty and we’re done.
[00:18:36] VJ: Exactly. And what’s kind of cool about this graph traversal, I mean actually there’s a lot of things that are cool about it, but one of the things that I like about it is that even though we picked an arbitrary place to start, we sort of did create this like leveling effect where we started with level 0, the first node in the graph was B, and then we sort of added its neighbors, F and A, and we added those next and that’s sort of like level 1, and then we added their neighbors, so that would have been E, C, and G, which is level 2. And as we visited the neighbors of each node, we sort of got further and further away from our starting point, which was B. And if we were kind of to reorganize this in the form of a tree, B could have been a root node, A and F could have been its children and then E, C, and G could have been B’s grandchildren. It’s almost like you are sort of doing the BFS of a tree in a way and that you are going like level by level. We didn’t go down one long path. We went level by level. We started at a node and then we sort of took one step further and further and further away from it. And that’s the breadth part of BFS, of breadth-first search.
[00:19:54] SY: So what does this accomplish? What does this help us do?
[00:19:58] VJ: Well, I think the interesting part about this is that if we were to take note of one additional thing in this search, which we didn’t really do in our example, but I think it would be easy enough to go through this example and do one additional step, which is that every time we visited a node, if we took note of who its parent was we could basically create a path between any node and the node we started from. So in our case, it was B. But as you work through this graph and you do BFS through the structure, you could basically create these little parent pointers. So like little references that lead back to the parent of each node. And if you trace the parent pointers, you can work your way back to the starting node and use the parent pointers to sort of restructure the graph into a tree, which is what I was sort of getting at. Imagine if we went to deliver muffins to all of our neighbors and we were like, “Oh my God, this mountain is so big, I want to leave some bread crumbs,” or muffin crumbs as it were.
[00:21:07] SY: Muffin crumbs.
[00:21:08] VJ: So I can sort of trace my path and I’ll know, “Oh, I came from this place.” So when I want to go back, back to the starting point, I have to follow these crumbs.
[00:21:14] SY: Just all the muffin.
[00:21:15] VJ: And so that’s sort of what those parent pointers are. And if you have parent pointers, you can actually determine the shortest path between any node in the graph and the so-called parent node or the starting node, which was B in our case. And the interesting thing is sometimes you’ll find many shortest paths. Sometimes you’ll find many paths and like one of them might be shorter or two of them might be shorter than the rest, but it will give you at least one of the shortest paths between a node and the starting node.
[00:21:47] SY: So now that we know that, I’m kind of wondering how to actually Implement BFS and how well it performs. But unfortunately, we are out of time for this season, but next season we’re going to dig into the implementation of BFS and its performance.
[00:22:03] VJ: Yeah. And next season we’ll get into some more graph traversal stuff like depth-first search too and those adjacency lists will be back. So stay tuned. Good stuff coming out.
[00:22:13] SY: Stay tuned. And that’s the end of the season. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. Link to that is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy whether you’re building a mobile app in iOS, Android, Flutter, or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.
[00:22:51] VJ: Bye everyone.
[00:22:52] SY: Thanks for listening. See you next season.
Thank you for supporting the show!