In this episode, we are looking at a different type of self-balancing tree: red-black trees. By following four very important rules while we paint our tree red and black, we can make it not only self-balancing, but also make it run super efficiently in logarithmic time.
[00:00:02] SY: Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.
[00:00:09] VJ: And I’m Vaidehi Joshi, Author and Developer.
[00:00:11] SY: And she is the brilliant mind behind the Base.cs Podcast series. Today, we’re talking about…
[00:00:16] VJ: AVL Red-Black Trees.
[00:00:18] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website and with their In-App Payments SDK, you can add it to your mobile app, too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie. Okay, we’ve talked about tries before. I don’t think we’ve talked about the color of tries. Is that right?
[00:00:58] VJ: That’s true. We haven’t really talked about what they are at all.
[00:01:00] SY: I didn’t even know there could be colors. So that’s kind of interesting. Okay. So what is a red and black tree?
[00:01:08] VJ: So a red-black tree is a variation of a type of data structure that we’ve actually seen a lot. We’ve talked about it in our most recent episodes. We’ve been discussing something called “Self-Balancing Trees”. So a red-black tree falls into that category. It’s a type of self-balancing binary search tree and it’s very similar to what we talked about in our previous episode when we talked about AVL trees, which were also self-balancing trees. But a red-black tree is a little different because it has to adhere to this very strict set of rules and those rules are how the tree stays balanced. It’s how it maintains its self-balancing characteristic.
[00:01:51] SY: So just to be clear, there are many ways in which a tree can be self-balancing. This is just a set of rules that talks about how the red-black trees themselves work.
[00:02:00] VJ: Exactly.
[00:02:01] SY: Is that fair?
[00:02:02] VJ: Yeah.
[00:02:02] SY: Okay cool. Cool. Okay. So let’s jump right in. What are the four rules?
[00:02:06] VJ: The four rules, they get significantly more complicated as you go along. They may not seem complicated when you just look at them in isolation. But then when you think about trying to follow all four of them at once, you’re like, “Oh, wow, there’s a lot to think about.” So we’ll just work through easiest to hardest.
[00:02:23] SY: Okay.
[00:02:23] VJ: First rule, we’re talking about a red-black tree. So rule number one is that every single node in the tree must either be red or black. That’s it. That’s rule number one.
[00:02:32] SY: What does that actually mean? There’s a data structure, like we can’t like see the data because it almost sounds like there’s a coloring book involved or something and there isn’t because it’s like data. So tell me more about that.
[00:02:51] VJ: You’re right that it is just data and it’s a data structure. But because it’s a data structure and we can arbitrarily store whatever information we want in it, right? Because we’ve had other tree data structures where we have a node that has a value and then it has a left child and a right child, right? It’s just like information we’re storing. We’re just storing pointers, references to something else, somewhere else in our memory. So in the case of a red-black tree, you have an additional piece of information per node. So it’s not just what its value is and what its left child and what its right child happened to be, you also have one additional piece of information which is, is the node red or is it black. So it almost doesn’t even matter whether you have the color red or black. It’s more the fact that you’re storing one of these two values. An interesting thing, this is like a really fun nerdy tidbit. Way back in the beginning of this podcast, like Season 1, Episode 1, we talked about binary and bits and bytes in order to store whether a node is red or black. You just need one bit because a bit can contain either zero or one so it contains two possibilities. So all we’re doing is we’re basically adding one extra bit to each node in the red-black tree and we’re just saying, “We’re going to use this bit to decide, is it red or is it black?” It could be like, is these apples or is it bananas? Is this like blue or green?” IT doesn’t matter. It just happens that this is called a red-black tree.
[00:04:23] SY: We’ve decided.
[00:04:24] VJ: Yeah, but it’s more the fact that we have this flag, this binary of like it has to be X or has to be Y. It has to be red or it has to be black.
[00:04:33] SY: Okay.
[00:04:33] VJ: That’s how we delineate what the color of the node is.
[00:04:36] SY: Do most trees have colors? Is this like a normal thing we just haven’t come across yet?
[00:04:42] VJ: That’s a good question. I think most of them that I’ve come across don’t. I don’t know. It seems like most of the time people want like a very basic tree structure.
[00:04:51] SY: Okay.
[00:04:52] VJ: This is a more complicated structure because it’s trying to solve a specific problem.
[00:04:57] SY: Okay.
[00:04:58] VJ: And the red-black tree in particular is kind of stringent with its rules because as we’ll see later on, the reason that these four rules have to be followed is it guarantees logarithmic runtime. And so it’s like if you want to have guaranteed logarithmic runtime, you got to do these four things. In other trees, if you don’t really care about the guaranteed logarithmic runtime, who cares if you follow those rules? But this one’s just like really particular.
[00:05:26] SY: And then why red and black? Why not yellow and greens or some other two colors?
[00:05:31] VJ: When I was researching this and learning about it, the story behind this is that when this structure was created, I think it was created at the Xerox PARC Lab by a couple of computer scientists. In the lab, they had a laser color printer that could print out things in color and they decided that out of all the colors the one that looks the best with black was red. So they’ve picked the color red.
[00:05:54] SY: Okay. It’s pretty easy, pretty straightforward. Okay. Deal. Deal. Cool. Cool.
[00:05:59] VJ: I think it’s funny that like a printer from 1970 is now dictating all of this conversation, but sure.
[00:06:05] SY: Okay. So rule number one that the nodes in the tree have to be either red or black. That’s the first rule. What’s the second one?
[00:06:13] VJ: So the second one is that the root node of the tree must always be black.
[00:06:19] SY: So far so good. What is the third rule?
[00:06:22] VJ: So the third rule is a notch above, like it’s a little bit more complicated, but I think we can do it. The third rule is that in a red-black tree two red nodes can never appear one after the other. They can’t appear consecutively. So what that means is that a red node must always be preceded by a black node, which means it has to have a black parent node. And if you have a red node, you have to have a black child node. And this is specific to the red nodes in the red-black tree. So you just have to remember if you have a red node, you have to look at who its two neighbors are, maybe neighbor isn’t the right word, who its ancestor is…
[00:07:06] SY: Attached to.
[00:07:07] VJ: Yeah. You have to look at the familial line, right? So you need to look up and down. You need to know is the parent black and is the child black because that’s the way you have to follow rule number three when it comes to red nodes in the red-black tree.
[00:07:19] SY: Okay. So we’re getting a little more complicated but still seems doable. Okay. Number four, what’s the fourth rule?
[00:07:26] VJ: So the fourth rule is the one that I always find the hardest because it involves looking at every branch in the tree, every branch path. And the rule is that every branch path, which means the path from the root node to any empty null leaf node, all of those paths have to pass through the exact same number of black nodes. Think about this, you need to look at the red-black tree and look at where all the null leaf nodes are and you need to make sure that in order to get to that ending part of the path, that null node, you’re passing through the same number of black nodes. It’s not enough to just be like, “Oh, I passed through one black node here and I passed through two black notes here.” Black node is great. No, they need to be the same number.
[00:08:15] SY: That sounds kind of annoying.
[00:08:17] VJ: Yeah. I think that one is the tricky rule because it’s kind of easy to, like we’re smart people. Let’s just assume we’re not going to use anything that’s not red or black. So that’s easy to solve rule one, right?
[00:08:29] SY: Okay. That’s easy.
[00:08:30] VJ: And then, look, we’re so smart, we can always start our root node with the color black, also easy.
[00:08:36] SY: Nailed it.
[00:08:37] VJ: Rule three is also not that hard. You’ll just be like, “Whenever I see red, I got to check who its parent and child is.”
[00:08:42] SY: Add a black one. Yeah.
[00:08:44] VJ: Rule four is like, “Oh, it’s so easy to get through all the first three rules and be like, ‘Yeah, I’m doing it,’ and then you get to rule four and you’re like, ‘Oh, no. Now this branch path is one shorter or one longer than all the other branch paths.’”
[00:08:56] SY: Yeah. That sounds really hard to do, but I’m confident that we can do it. I mean, if it sounds doable, it might be hard to do, but I think we can do it. Why are these rules so important? Why does it matter?
[00:09:08] VJ: The idea here is that if you follow these four rules, then the time that it takes to insert a node, to delete a node, or to search for a node in a red-black tree is guaranteed to be logarithmic. So that’s O(log n).
[00:09:25] SY: Okay.
[00:09:26] VJ: And the number of nodes you need is just the same as a binary search tree. So it’s just like O of n, that’s the amount of space you need, right? But the time complexity is sometimes hard with trees. We’ve talked about this in the past where if your tree isn’t balanced, like it could take you a really long time to find a note if you are heavily weighing your tree on the left or the right. But if you follow these rules, then to find a node or insert or delete it should always be consistent and it should always be logarithmic. So the payoff is great, but the challenge is the rules.
[00:10:00] SY: Not following it is tough.
[00:10:01] VJ: Yes.
[00:10:02] SY: Yeah. Okay. Let’s walk through an example. So let’s say I start with node two, my root node, and as we know to follow the rules, that node has to be black. I add a left child, let’s say I added, and it’s number one. What color should I make that one? Does it matter?
[00:10:20] VJ: Well, we could make it red or we could make it black.
[00:10:26] SY: Okay, either one I’m fine.
[00:10:27] VJ: Yes, because importantly if you add a red node, you just have to make sure that its parent and child are black. But if you add a black node, that rule doesn’t apply, right? So for now we could just make the left child one just be black.
[00:10:41] SY: Okay. Let’s make it black. Cool. So now we have another node we want to add. Let’s say a number three and we want to make it the right child of my root node. How do I decide what color to make that?
[00:10:55] VJ: So we know because of rule number four that the branch paths matter, right?
[00:11:00] SY: Yes.
[00:11:01] VJ: So we have to think about what it means to get to the leaf nodes of this tree. So let’s say right now we have our root node two and we have a left child one, that left child has two empty pointers right now because it doesn’t have any children.
[00:11:17] SY: Yes.
[00:11:18] VJ: So left child is null, its right child is null. So both of them are pointing it to null and they are both so basically empty leaves.
[00:11:26] SY: Yes.
[00:11:26] VJ: The problem here is that if we put a red child here, now the path to the left is going to have one more black node than the path on the right. So we have to make the number three, this third node, the right child. It has to be black here.
[00:11:43] SY: Okay.
[00:11:44] VJ: So we are kind of constricted here. We have to make this node three the right child of two. We have to make it black. And the reason is that the left and right children, one and three right now, they both have their own pointers. They’re not pointing to anything. They don’t have their own children, but they have pointers. They’re just pointing to null. So the left child one has two empty pointers, which are just null, and the right child, three, both have empty pointers that are null. And when it comes to red-black trees, the null empty leaf nodes are considered to be black. So that means we already have some black nodes that we maybe aren’t considering here. So we had the root node as two, the left child is one, and it has null empty leaf nodes that are black.
[00:12:38] SY: That are considered black. Okay.
[00:12:39] VJ: Yes. So if we add the right child node three, it also has to be black because if we add it as red, the first and second and third rules aren’t violated, but the fourth one is. You remember I told you the fourth one is like the one that I hate because it’s really annoying to think about.
[00:12:58] SY: The branch path one, right?
[00:13:00] VJ: Yes, because now you’re going to break the rule of every single branch path to empty leaf nodes has to pass through the same number of black nodes because the left child is black so you’re passing through two black nodes, the root and the left child. But if the right child is red now, you’re just passing through one black node, which is the root. Now if the right child is red, you’re not passing through a black node. And so now the branch paths are uneven.
[00:13:26] SY: And that’s not okay.
[00:13:27] VJ: Yes. That’s correct. But if you want to make the three red, we can color the one to be red and now that solves that problem.
[00:13:38] SY: Another balance. The paths are now the same.
[00:13:41] VJ: Exactly. And it fulfills the four rules. So basically, we could do to the root node being black and make one and three black or make two black, which is a root node, because remember the second rule is that it has to always be black, but we could make the left and right child, one and three, red and now the branch paths are even.
[00:14:02] SY: Okay. So if we’d already decided that my number one node was going to be black and now we’re like, “Oh, wait a minute. We want our three to be red.” Can we go back and recolor the one and make it into a red or is it kind of too late?
[00:14:15] VJ: Oh, you can recolor it. Yeah, totally. You just have to be careful with recoloring. Just remember you have to look at the whole tree and be like, “What unintentional consequences will this have?” Because imagine if one had its own child and maybe that was red and then you colored one red, now you’re violating the third rule where two red nodes cannot be sitting next to each other. You actually can recolor and that’s actually a great point because you were asking me earlier like, “Oh, this data structure seems like finicky,” and like, “Oh, wow, we had follow these rules and why?” Now we know why and now I think what we’re doing is we’re answering the question of how and one of the strategies is recoloring. You can recolor nodes and often there’s a strategy of like just every node you insert just make it red and then recolor it black accordingly. The interesting thing with that is that if you insert a node and immediately color it red, it’s easier to identify where you’re violating any of the rules and then you couldn’t kind of fix any of the violations you have. That tends to be a thing that you see in red-black trees where you see the strategy is that the node that’s inserted is red immediately, inserted color red, and then maybe you move things around, you restructure the tree or you can recolor some other node that needs to be black.
[00:15:37] SY: Okay. This might be a silly question, but why not just make all your nodes black? Why even bother? Because it sounds like it’s when the reds come in that things get all messed up, but if everything is black always, then doesn’t that kind of solve all our problems?
[00:15:54] VJ: I mean I guess then you don’t need any color. But then I bet if you took away the color and that aspect, would the tree be self-balancing and also have a time complexity of logarithmic time? I’m going to guess no for reasons that are math and Big O notation that I can’t think about, like my mind came back around it. That’s a good question. Like why is the color like make it work exactly?
[00:16:18] SY: And also wondering for rule number four that says, “Every branch path from the root node in the tree to a null pointer passes through the exact same number of black nodes,” I guess the red kind of helps break that up a bit, right? Because otherwise you’d have to have the same… like I think it’d be almost impossible to have an entirely black tree and have that same number of black nodes be true. You kind of need the red to kind of help you start over again, right?
[00:16:41] VJ: Yeah. What you just said made me think about something which is that you would always have to have the same level of nodes if it were all black. You couldn’t ever have an extra level because now you’re like, “Well, I don’t have a red to add to this without counting it.” Oh, we just answered our questions.
[00:17:00] SY: We did. We’re so good. We’re so good at this. Okay. So you saw that we can use recoloring as one way to make sure that we’re following our four rules. Is there anything else that we can use as a tool, as a way of making sure we’re doing a good job of making a red-black tree?
[00:17:16] VJ: Yes. So I said that we can move things around and restructure the tree. I didn’t use the special word that we learned recently. We learned it with AVL trees, the concept of rotations. So we can basically handle most scenarios, especially when it comes to inserting and deleting things in a red-black tree. We can handle it by recoloring and potentially rotating notes.
[00:17:40] SY: Okay. Let’s do an example. Let’s start with a node number 21 and we know it has to be black because that’s one of the rules. So next number we want to add to our 21 is the number 8 and we can decide if it’s going to be red or black. Actually, we talked about the rule of just make it red and then fix it later. So we’re going to make it red. So now we have 21 and then it has a left child of 8 which is red.
[00:18:05] VJ: Great. Yes.
[00:18:06] SY: So far so good.
[00:18:07] VJ: As a reminder, the reason that it’s a left child is because it’s binary search tree, so 8 is less 21.
[00:18:12] SY: Yes. We always start at the left. Yup. Yup. Okay, cool. So now we want to add a new node, let’s add node number 3, and what I kind of want to do is I kind of want to just stick it at the bottom, like right under 8 so that it’s 8’s left child. But that probably violates one of our rules, doesn’t it?
[00:18:30] VJ: It does it. It violates rule number three, which is that two red nodes can never appear consecutively in a row or rather like a red node can’t have a red parent or a child. So in this case, we have that problem where because it’s a binary search tree, we want to insert 3 to the left of 8 because 3 is less than 8 and 8 is less than 21. However, if we insert 3 as the left child of 8 and we insert it as being red, now we have two consecutive red nodes and we’re shit out of luck.
[00:19:04] SY: Okay. So we can’t do that. So what do we do?
[00:19:09] VJ: So we can actually rotate. We can rotate these nodes around and an interesting thing we’ve stumbled upon here is that you can’t have a chain of three nodes in a red-black tree. That’s just generally like a thing that you’ll see. It’s a characteristic where when you have a chain of three nodes, that usually means you’re violating some rule and the reason is that we aren’t thinking about this but 21 has a right null leaf. That’s the reason that this chain of 3 is going to violate a rule because remember we care about how many black nodes it takes to get to a leaf and now we have two null leaves, actually we have like one, two, three null leaves that we’re not even thinking about. When we go to look at rule four, we are going to realize, “Oh, these are all off, the branch paths are not the same.” So when you have a chain of three, just generally you should get ready to do some work because it’s going to violate a rule.
[00:20:05] SY: Okay.
[00:20:06] VJ: So we can rotate and this will solve two problems. First of all, it’ll solve the problem of breaking rule number four and we will no longer have a chain of three. We’ll have a nice little triangle.
[00:20:17] SY: Okay. So remind me again, how do we rotate?
[00:20:20] VJ: So you can have left rotations or you can have right rotations. In this case, what we want to do is I think we want to do a right rotation so that 21 becomes the right child of 8 and 3 stays the left child of 8 the way that it is now. So we’re going to kind of scoot 21 down and move 8 up. So 21 goes down and 8 it goes up.
[00:20:44] SY: I love our sound effects. Okay. So we have 8 at the top and then left child 3 and then right child 21, but we have a problem because our 8 and our 3 are both red nodes and our 21 is a black node. And the way that we’ve rotated things are 8 is our new root and the root cannot be red.
[00:21:05] VJ: Correct. We’re violating two rules actually. We have two consecutive red nodes and our root note is not the right color, but we can solve this pretty fast.
[00:21:14] SY: Okay. We can just recolor it.
[00:21:15] VJ: Yeah. It’s my favorite strategy. It’s just like change the color.
[00:21:20] SY: Yeah, just change it. Get your erasers out, just change the color. Okay. So now that we have our 8, our root note as now black, is that it? Are we done? Do we have any more rules that we need to worry about?
[00:21:32] VJ: We have one more thing to think about which is again that nasty little rule four where you have the left child as a different color than the right child and specifically now the branch path is different when you’re looking at the null nodes of the left child versus the null nodes of the right child. So the null nodes of 3 are passing through a different number of black nodes than the ones of 21.
[00:22:00] SY: Right. Right. Okay. Okay. That’s why we couldn’t have the two children before being red and black. They either both have to be black or both have to be red.
[00:22:08] VJ: Exactly. We ran into this exact problem and now we see it in action.
[00:22:11] SY: Okay. So now we can go to our 21 which was black and we can just recolor that and make that red.
[00:22:17] VJ: Exactly. And now we have all the four rules being followed. We have only red and black nodes. We have a black root node. We have two red nodes and that’s fine, but they’re not being followed consecutively. They’re both just children of this root node and the branch paths to the two children’s null pointers are completely the same.
[00:22:40] SY: Nice. All right, we did it.
[00:22:42] VJ: We did do it. And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog posts. Link to that is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy, whether you’re building a mobile app in iOS, Android, Flutter or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.
[00:23:21] VJ: Bye everyone.
[00:23:22] SY: Thanks for listening. See you next week.
Thank you for supporting the show!