Description

We continue our discussion of tree data structures with 2-3 trees, where the nodes of every tree contain data in the form of keys, as well as potential child nodes. Not only that, but it can contain MORE THAN ONE KEY. They are also the -key- to what we'll be talking about next episode, B-trees, and you won't tree-lieve how cool those are.

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Busying Oneself With B-Trees from her basecs blog series.

Transcript

[00:00:02] SY: (Music) Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:09] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:11] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today we’re talking about.

[00:00:15] VJ: 2–3 trees.

[00:00:17] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website. And with their In-App Payments SDK, you can add it to your mobile app too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.

[00:00:54] SY: 2–3 trees, what are those and why are they called 2–3?

[00:00:58] VJ: Well, those are a lot of questions. Conveniently, I guess we can answer all of them.

[00:01:04] SY: You have answers. Okay. Great. Great.

[00:01:06] VJ: Well, a 2–3 tree, as the name would suggest, is a type of tree data structure. But keeping with the theme of like tree data structures that all have weird rules that we have to follow and all behave in their own little idiosyncratic ways, a 2–3 tree is kind of quirky. So we’re familiar with tree data structures, but a 2–3 tree data structure is one where the nodes of the tree contain data in the form of keys. So we have nodes that contain keys as their data and they also have child nodes.

[00:01:43] SY: Okay. There’s a lot going on.

[00:01:45] VJ: Yes. However, we haven’t talked about the 2–3 parts.

[00:01:47] SY: Okay.

[00:01:48] VJ: The 2–3 bit of a 2–3 tree has to do with how many keys that tree can have and how many children it can have, specifically how many keys one node can have and how many children that node can have.

[00:02:01] SY: Okay. So is the idea that at a node I would have multiple keys?

[00:02:06] VJ: Yeah, exactly.

[00:02:07] SY: Okay.

[00:02:07] VJ: So in a 2–3 tree you have keys that are the data. So just like when we learned about binary search trees we learned that all nodes, you know, they can contain some kind of data. They hold some information. So in a 2–3 tree, you have some data but that data is also called a key. It’s just like a way of storing data, but the way that we store it is a key.

[00:02:30] SY: So usually when we talk about keys, there’s a key and there’s a value and there’s like a key value pair. Is there any key value pairing in this situation or we just talk about keys?

[00:02:40] VJ: In this situation, it’s really just one piece of data per key.

[00:02:43] SY: Okay.

[00:02:44] VJ: So the key is really just a piece of data.

[00:02:46] SY: That’s the whole thing.

[00:02:47] VJ: Yeah. That’s the whole thing. So we can have a node that has one key that just means it has one piece of data. And if a node has two keys, that means it’s holding two pieces of data in it. And that’s all that keys are. They’re pretty simple.

[00:03:00] SY: All right. So we have our nodes. At the nodes, we can have some keys. Okay, let’s keep going.

[00:03:05] VJ: Now we have keys. We have some kind of data, but I also said that the nodes can have children that they point to, right? This is like a pretty…

[00:03:15] SY: Normal tree stuff.

[00:03:16] VJ: Yeah. Just trees being trees doing their normal tree thing. A node having references to children is pretty normal. But when we’ve talked about binary search trees in the past, one of the rules that we know about a binary search tree is that you can only have two children, right? You have the left child and the right child. I remember back when we were talking about it we were like, “Oh, very strict,” but there’s a reason for this. The left child is less than the root nodes value and the right child is greater than. However, a 2–3 tree is a little bit different. There’s a little bit more flexibility. So unlike a binary search tree where you are limited to only having two children, in a 2–3 tree you can have two child nodes or you can have three child nodes.

[00:04:04] SY: I can make some more babies.

[00:04:06] VJ: Yes, but only two or three.

[00:04:08] SY: Only two or three. Okay. I won’t get carried away.

[00:04:08] VJ: So you’re still limited, but like with a more flexibility.

[00:04:11] SY: Still limited, but I got a little more freedom.

[00:04:13] VJ: Yes.

[00:04:13] SY: Yeah, a little more freedom. All right. So is that where the 2–3 comes from? Like I can have two to three kids?

[00:04:19] VJ: Yes. Yes. And what’s kind of cool about the 2–3 ratio, because that’s really what it is, this 2–3 ratio and that’s where the 2–3 tree gets its name is from this ratio, the number of child nodes that a node can have, a number of child references it has, corresponds to the pieces of data that it holds. In other words, the keys that the node contains are tied to the number of children that it has.

[00:04:47] SY: Okay.

[00:04:47] VJ: And this is true for every single internal node in the tree. So that basically means every parent node, every node that is a parent in the tree has to follow this ratio, but I guess I should tell you what the ratio is, right?

[00:05:01] SY: Yeah. Yeah. So what’s that ratio?

[00:05:04] VJ: The ratio is that a node can have one key and two children or it can have two keys and three children.

[00:05:14] SY: Okay.

[00:05:15] VJ: And that’s where the 2–3 comes in, but that’s what the ratio is. It can basically be one piece of data, two references to your two children or two pieces of data and then you have to have three references to three children.

[00:05:27] SY: Interesting. Okay. So does that mean that in one node I can only have at most two keys?

[00:05:34] VJ: Yes. Yeah, that is true. At least for a 2–3 tree, which is what we’re talking about today.

[00:05:39] SY: For 2–3 tree. Right.

[00:05:40] VJ: So you could have a great one piece of data that you’re holding and that’s still valid. That’s still fine. And you can have up to two pieces of data and that’s all you’re pretty much allowed to do.

[00:05:49] SY: Okay. So that’s how 2–3 trees work. Are there other rules that we should know about?

[00:05:55] VJ: So there are a couple rules that we’ve already sort of talked about. There’s a couple other patterns that I think are worth noting because if we talk about them you’ll start to see them when we talk about 2–3 trees in more depth. The first rule of course is that the parent nodes have to contain data and references to children based on this ratio. That’s true for every single parent node. Another fact about 2–3 trees, another rule is that the leaf nodes, which are nodes that have no children, they can contain one or two keys because the references to children doesn’t really apply because they are leaves, they terminate, they don’t have children. So those leaf nodes can have up to two keys or one key and that works. They’re kind of a little bit more flexible. You don’t have to worry about children because as we’ve noted they don’t have children. But those are like the basic rules, but I think the more interesting thing to look at and to talk about is the patterns that emerge when you start to implement these rules on a 2–3 tree data structure. So there are two things. I want to talk about in particular. The first one is that even though a 2–3 tree is different than a binary search tree, it’s different because you can have up to three children, right? There’s this interesting pattern you’ll see when you start to look at a 2–3 tree, which is that all of the children nodes in the left subtree of a node are still going to be smaller than whatever key references them.

[00:07:28] SY: Interesting.

[00:07:28] VJ: So what that means is you have a node that has two pieces of data, right?

[00:07:33] SY: Okay.

[00:07:34] VJ: And those pieces of data will have references to children nodes. And for example, if you have a piece of data that has reference to a left child, the keys in that left child are still going to be smaller than the key of the parent that reference them. So it is kind of like in the same vein as a binary search tree, right?

[00:07:55] SY: Interesting.

[00:07:55] VJ: In a binary search tree, we always know that the left child is going to be smaller in value than the root. Similarly, in a 2–3 tree, we know that the left subtree of a node is going to be smaller in its value, in its key, then whatever reference did, whatever its parent was. So that’s the first interesting pattern you start to see emerge.

[00:08:19] SY: Yeah. Okay, I think that makes sense, but I think I need an example.

[00:08:22] VJ: Sure.

[00:08:23] SY: So let’s say that we start with a root node of 37.

[00:08:29] VJ: Sure.

[00:08:29] SY: Okay, just one key, one root node, nothing too fancy, and then we give it a left child. And at the left child, we have two keys. We’re going to store a 15 and a 29. So two keys at the left child node.

[00:08:45] VJ: Yes.

[00:08:45] SY: So far we’re good, right?

[00:08:47] VJ: Yes. And just remember that we have two pieces of data. So we have to follow the ratio.

[00:08:51] SY: Yes. Okay. So now that we have two pieces of data at that node, that node needs to have three children. So it has its own left child then it has, what do you call it, like a middle child?

[00:09:03] VJ: Yeah, let’s call it middle child. Yeah.

[00:09:05] SY: Middle child, okay, cool, cool. And then we have a right child. What you’re saying is for the left child I cannot have any keys that are greater than my parent node, which is made up of two keys. So my parent node is made of 15 and 29. So my left child cannot be bigger than 15 or 29. Is that what you’re saying?

[00:09:28] VJ: Yes. So if you have it specifically the fact that it has to be less than the key that references it, so you basically need to pick a key that’s going to reference the left child. So in this case, you have two keys, right? You decided, “Oh, this node is going to have two pieces of data, 15 and 29.” So the 15 is the first key and the 29 is the second key, the 15 is what references the left child node.

[00:09:56] SY: Okay.

[00:09:57] VJ: The 29 is what references the right child and interesting.

[00:10:01] SY: Who gets the middle?

[00:10:02] VJ: The middle child has to be in between those two values.

[00:10:05] SY: Oh, okay. So initially I was thinking that the child belongs to the node, what you’re saying is the child belongs to the key.

[00:10:15] VJ: I guess we can think about it in two parts. If you have a node and you divide it into two halves, each half is what references a child if you are dividing the node into half and you’re like, “Okay, each half is going to have a value.” One half of that node needs to reference its own child. The other half needs to reference its own child. And then there’s this weird middle child that is just referenced by the node itself.

[00:10:38] SY: Interesting.

[00:10:38] VJ: I didn’t really ever think about how you’re going to actually represent this in memory or code because maybe it actually is just that, you know, you have a reference from the node and you just determine that the value has to be less than one half and the other half, but I guess it kind of depends on how you implement it, but the idea is like if you kind of imagine this tree, you have these three children, you need to like follow this rule of the left child needs to be less than whatever its parent was and its parent just happens to be one half of the node, in this case the 15.

[00:11:14] SY: Right. Right.

[00:11:14] VJ: Does that make sense?

[00:11:15] SY: Yeah, that makes sense. Okay. So if our node is split into two halves like you said, left node is 15, right node is 29, then our left child of that node must be smaller than 15.

[00:11:27] VJ: Yes, specifically whatever keys it has, has to be smaller than 15.

[00:11:31] SY: Sorry. Sorry. Sorry. Okay. Yes. Yes. So the keys of that left child has to be less than our number 15.

[00:11:38] VJ: Exactly.

[00:11:38] SY: And then for our middle child, the key there has to be somewhere between 15 and 29. So we can have like 22, for example, and that would work fine.

[00:11:47] VJ: Perfect. Yeah.

[00:11:48] SY: And then finally our third child would have to have keys that are bigger than the right half of our parent. So basically it has to be bigger than 29.

[00:11:58] VJ: Yes. However, one thing to note, we’re talking about this whole left subtree and what I said earlier was that all the children nodes in the left subtree have to be smaller than the key that references them, but the key that references them is a left child of the root node, right? And we started with the root node 37. So you can give the left child its own right child. So now we’re talking about the grandchild of the root, right? However, whatever keys you give that grandchild cannot exceed the value of the root because now you are breaking the rule at a similar to the binary search tree, right?

[00:12:35] SY: Okay. Yeah.

[00:12:36] VJ: So it’s kind of weird because the 2–3 tree has all these little flexibilities and like, sure, you can have three children and you can have two keys, but you probably want to think about what the root node is even though you have this flexibility. You can’t just go and add the number 1,000 because 1,000 is bigger than 37 and it cannot be in the left subtree. So it’s like a variation of a binary search tree. You know what it really is, it’s just like a binary search tree that is harder to deal with and you think about it more, and my God, it’s just so difficult to think about.

[00:13:12] SY: Well, if it’s so annoying, then why do we have it? What do we use this for?

[00:13:16] VJ: Well, the thing is 2–3 trees are kind of like an underlying data structure for another data structure we’re going to talk about, which is all over the place. We’ll talk about it next episode. It’s called a B tree and B trees are super useful because they let you handle and deal with data and represent data that can’t fit into main memory. It allows you to like represent large, large amounts of data and B trees are based on 2–3 trees. So we need to know how 2–3 trees work. They’re actually, honestly, I take it back, they’re not that hard, they just require you to think a little bit more than usual.

[00:13:54] SY: Yeah. Yeah.

[00:13:56] VJ: They’re not bad. I’m not trying to be like an anti-2–3 tree here.

[00:13:59] SY: Anti-2–3 tree. We’re very pro-2–3 tree. Okay. So we’ll get to really see the benefits of it in next episode, but for now they’re important for B trees, and that is good enough for me. So how do these 2–3 trees perform? Because always we’re talking about trees, we’re talking about searching, inserting, deleting, doing all these things to the tree, so how does this tree perform?

[00:14:25] VJ: Well, because it’s sort of like a loose binary search tree, the performance of it, the searching, deleting, inserting nodes for a 2–3 tree end up taking logarithmic time or O(log n), which we’ve seen before when we were dealing with binary search trees. There’s one other thing I want to talk about which is kind of cool because we’re going to talk about ratios more and I mentioned that 2–3 trees are just really this balancing ratio thing, but there’s this interesting pattern we didn’t get to talk about which is honestly my favorite, which is that in a 2–3 tree, you see this little relationship with the number of keys in a node and the number of children it has. So this might be obvious, but I like to abstract things out. If there are N keys in a node, then that node is always going to have N plus 1 children.

[00:15:15] SY: Yup.

[00:15:15] VJ: You just believed me. Okay. I felt like you were going to ask me to prove it, but I like that you were just like yes.

[00:15:20] SY: No. Well, that’s kind of what we described earlier, right? If we have one node, then they could only have two kids, and then if we have a node with two keys and that has three kids. Isn’t that what we’re talking about?

[00:15:30] VJ: Yes. Yeah. Exactly. Exactly. The reason I want to like say it concretely is because when you get to more complicated versions of 2–3 trees, which spoiler alert that’s what a B tree is, you’re going to see this mathematical formula reapply it again.

[00:15:47] SY: Okay.

[00:15:47] VJ: So I just want to note that there is a relationship. And when I say ratio, the ratio I’m talking about is like N keys, means N plus 1 children.

[00:15:57] SY: Okay, good to know. So with this new tree structure that has some keys and multiple children, I’m wondering how do we actually move around? How do we traverse a 2–3 tree?

[00:16:10] VJ: You can Traverse through the tree anyway you want because you can walk through the nodes. Like if you’re searching for a node in the tree, you’d probably do something similar to binary search except that you need to like look at two pieces of data instead of just one, right? You’d have to look at a left child and be like, “Okay, if my options are 15 and 29, now I need to compare the thing I’m looking for and look at do I need to go to the left half of this node or the right half.” But I think there’s something interesting about traversal in general, which is that if you look at how larger 2–3 trees are structured, you’ll start to notice that the way that the tree is sorted by its keys is as an in order traversal. So basically you start from the left child, you go up to the root and then you go to the middle child and then you go back up to the root and then you go to the right child. So in the case of like that node you created earlier where we have 15 and 29 and it has a left child, a middle child, and a right child, the smallest numbers will be to the left of 15, the child of 15, then you want to go back up to 15, then we had 22, which is the middle child. Then you go back up to 29 and then you go read whatever numbers are to the right of 29.

[00:17:27] SY: Right.

[00:17:28] VJ: There’s like this in order traversal if you want to be able to find the keys in sorted order in a 2–3 tree.

[00:17:36] SY: That’s kind of convenient.

[00:17:37] VJ: Yeah, it’s kind of like a wavy thing where it’s like I’m at the child, I go to the parent. I go back to the parent.

[00:17:43] SY: Then go to the other child.

[00:17:44] VJ: Yeah.

[00:17:47] SY: Okay. Cool. It’s funny because when we talk about the rules, it does sound a little complex. But when you talk about the traversal of it, all of a sudden it kind of all makes sense. It feels like it comes together.

[00:17:57] VJ: Yeah. It’s kind of like, you know, I feel like we’ve graduated from like first grade to second grade of trees where we just were like, “Oh, first we just have like binary trees and here are the rules.” And now we’re like, “Hmm, but what if we have fractions and you divide the nodes?”

[00:18:13] SY: Yeah, “What if we had a third child?” Yeah.

[00:18:16] VJ: But like we’re pretty familiar with some of these tree structures. We’ve been doing more complicated things this season. So I feel like we’re ready for it and so I’m excited that we’re getting into these advanced data structures, which are basically the same fundamental concepts we’ve seen but with little and tweaks. When you have self-balancing trees or like coloring trees, we’re always like, “Wait, how do you give it a color or how do you do give it two keys?” And we’re like, “Oh, actually it’s simple.”

[00:18:44] SY: Yeah. Yeah. It always ends up being more straightforward than it looks at first.

[00:18:50] VJ: I hope that continues the rest of…

[00:18:52] SY: Yeah.

[00:18:53] VJ: The rest of our exploration into trees.

[00:18:54] SY: We’ll see how long that was. Yeah. And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. The link to that is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy whether you’re building a mobile app in iOS, Android, Flutter, or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.

[00:19:33] VJ: Bye everyone.

[00:19:34] SY: Thanks for listening. See you next week.

Thank you for supporting the show!

Thank you for supporting the show!