CodeNewbie

Description

In this episode we go through some trie-als and tribulations to retrieve and build words using tries!

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Trying to Understand Tries from her basecs blog series.

Transcript

[00:00:02] SY: Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:09] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:11] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today, we’re talking about

[00:00:16] VJ: Tries.

[00:00:18] SY: This season of the Base.cs Podcast is brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website and with their In-App Payments SDK, you can add it to your mobile app, too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie. Today, we are going to try to learn tries.

[00:00:59] VJ: We’re going to be trying very hard.

[00:01:01] SY: Yes.

[00:01:01] VJ: I just try to piggyback off your joke, but it’s just…

[00:01:04] SY: You did try and you did a great job trying. This is going to be a fun episode. Okay. What is a trie? Wait, how do you spell trie, first of all? Because it’s not actually T-R-Y.

[00:01:15] VJ: Exactly, yeah. That was the first thing I was going to say. A trie is actually spelled T-R-I-E, and I like to think of the fact that it comes from the word “retrieved” like T-R-I-E. I don’t actually know if it does. Does it come from the word retrieve? I should read my own blog post.

[00:01:35] SY: I think it does. I like the story you just said. Let’s go with that.

[00:01:39] VJ: Well, I mean, even if it didn’t come from that word, it’s what you use a trie for, you’re retrieving things from it, but a trie is different than a tree. I’ve noticed that there’s like a little bit of a best practice of differentiating between the two data structures. That’s why it’s pronounced “try” rather than tree, even though maybe tree makes more sense.

[00:02:01] SY: Because that’s not what I would have guessed. I would not have guessed that that’s pronounced “try”. I would have thought it was pronounced like “try-ee” or tree-A.

[00:02:09] VJ: I like that.

[00:02:10] SY: You know? Like, “How are your tree-As doing today?” “Oh, my tree-As are great.”

[00:02:13] VJ: You know, it sounds like a tree-A is what a try-ee is like when it goes to Europe. It’s like, you know, it’s got a little beret and basically, it’s French.

[00:02:22] SY: It’s holding a croissant, yes.

[00:02:25] VJ: Exactly. It’s a fancy tree-A.

[00:02:28] SY: It’s a fancy tree-A. Okay. So it’s a data structure, right? Tries are data structures. What does this data structure look like? What is this one all about?

[00:02:42] VJ: So confusingly, a trie…

[00:02:47] SY: I love how we’re starting. This is great.

[00:02:49] VJ: A trie is a tree-like data structure. So when you think about what it looks like, it looks like a tree but it’s not a tree and it’s not spelled like a tree. Anyways, it’s tree-like and it’s a tree-like data structure where the nodes of the tree-like structure store things that map to an alphabet because, basically, we use tries in order to represent strings and words, which is why you have the concept of letters and using nodes to represent letters of the alphabet. And you use a trie structure to retrieve values by traversing down the trie. Almost a tree, it is a tree but I’m trying to use the word trie and trying to catch myself.

[00:03:36] SY: Yes. You’re doing a great job. Okay, so this is interesting because, usually, when we talk about our data structures, we are storing numbers like we’re dealing with integers most of the time, but this one’s kind of different. This one, we’re talking about some letters.

[00:03:49] VJ: Yeah.

[00:03:49] SY: That’s exciting.

[00:03:50] VJ: Yeah, and it’s interesting because it’s not just like any old letters. It’s letters of an alphabet so that you can represent lots of different words for that alphabet. Something to note here is that you’ll see a trie that represents the English alphabet, but you can represent any alphabet so you could have an alphabet with 72 letters or you could have one with 26. And you use the trie data structure to represent the letters in the alphabet in the form of nodes because we’re dealing with a tree-like data structure.

[00:04:21] SY: So let’s dig into that structure a little bit. When we talk about trees, usually, we start with the root node. Do we have a root node in a trie?

[00:04:28] VJ: We do. Every trie is going to have a root node, but it’s going to be empty.

[00:04:34] SY: Oh, okay. Like on purpose?

[00:04:35] VJ: Yes.

[00:04:36] SY: Like we’re going to leave it that way?

[00:04:37] VJ: That’s funny. You’re kind of like, oh, anticlimactic. You were kind of let down by that. I could tell in your voice. You’re like, “Mmmm.”

[00:04:44] SY: Just a little bit. So it’s empty. We want it to be empty?

[00:04:48] VJ: Yeah.

[00:04:48] SY: Very mysterious.

[00:04:49] VJ: The interesting thing here is because we’re dealing with letters and strings, we are eventually going to add letters and it ends up being useful to us because if you add enough letters together, you have a word and you can use tries to represent words, but if you have no letters, you basically have an empty string. So often, you’ll see that the value of the root node is an empty string.

[00:05:13] SY: Where do we go from there? It sounds like we’re not really starting off on the right foot, but I trust you. I trust it’s all going to come together somehow so if we’re starting off with no information, this empty string, where do we go?

[00:05:26] VJ: Well, we start off with this root node, with this empty string, but we also start off with something else, too, which I haven’t really talked about, but I will finally reveal, which is that every single node inside of a trie actually has some interesting things going on inside of it. Every node has potentially a value and that value could be null. It could be empty, but also, whenever you create a new node, it also is created with an array of references. So basically, when you have a node in a trie, it comes built in with this array of references to its children. In the past, we’ve talked about like, a tree data structure has two children, it’s a binary tree, but because we’re dealing with a trie, it’s not just two children actually. It could potentially have many children. If you’re dealing with an English alphabet trie, then it could have 26 children. If we continue down that logical path, then the node of a trie, the root node, will have 26 possible letters that it can represent.

So it gets initialized with an array with 26 slots. That would be like indices 0 through 25. And the reason for 26 is because we’re dealing with strings. We want to represent letters. How many letters could we represent? Well, 26. That’s why we have an array that gets initialized with our root note even though its value is empty, it potentially could store references to 26 children. Right now we’ll just say that it’s empty. Those little slots in the array are empty. They don’t point anywhere, but you can fill them and that’s where we’ll put our children nodes.

[00:07:09] SY: We talked about how we can use tries to spell a word and I think I kind of get the whole nodes and the arrays and all that, but I think I need an example to really see how it all works together, especially how we’re going to use it to spell an actual word. So can we start with a simple word and see how that works?

[00:07:28] VJ: Yeah. Let’s do it.

[00:07:28] SY: I’m going to pick the word pie.

[00:07:31] VJ: Nice.

[00:07:32] SY: Because pies are delicious. Wait, what’s your favorite pie?

[00:07:35] VJ: Pumpkin.

[00:07:35] SY: Let’s see if we can still be friends. Okay, okay, that’s fine. I respect that one, a solid answer.

[00:07:39] VJ: Pumpkin’s a solid good one.

[00:07:41] SY: That’s a solid one. A lot of people are not really into pumpkin and it makes me very upset. Mine is pecan pie. I love pecan pie.

[00:07:48] VJ: That’s good. With chocolate?

[00:07:49] SY: I’ve never had it with chocolate.

[00:07:50] VJ: Oh, I only had it with chocolate. Half the reason I have it is because I want the chocolate.

[00:07:55] SY: Okay. Now I know for Thanksgiving, I’m going to request my pecan pie with chocolate, done. Okay, so we’re going to spell pie with trie. Oh, my God, they rhyme. Trie and pie. I did not mean for that to happen. That’s amazing. Okay, so we start with our root node, which is empty. It has nothing. Nothing’s there, but it does have an array and then it’s the indices of the array that really matter. Those are kind of the stars of the show.

[00:08:21] VJ: Yes.

[00:08:22] SY: Okay. So if I want to insert my first letter P, where do I put it?

[00:08:28] VJ: First things first, you have to find the right place to put it and the only place you can really start from the root node, but you know that a root node has that array of indices. You need to find the right place for wherever the letter P would go.

[00:08:41] SY: Okay.

[00:08:42] VJ: So what would the letter P… I guess what number of the alphabet is the letter P? A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P.

[00:08:54] SY: So it’s number 16, but since it’s zero, index zero, it’s number 15.

[00:08:58] VJ: Exactly. So that basically means from our root node, we want to insert a reference to where P would live. So index P. We go to Index P and we say you have a reference but there are no P nodes in this empty trie. So we’re going to insert a node at index 15 which is going to be basically our node P.

[00:09:19] SY: And it is the insertion of that node that represents that letter P.

[00:09:24] VJ: Exactly, the fact that that reference at index 15 points somewhere and not to null, not to nothing tells you that there is somewhere to go. Like, if you were looking for the letter P, it would exist. When I was learning about tries, I was like, “Oh, this kind of feels like hash tables,” because in hash tables, it’s a similar thing where if something exists, like if you go to that spot and there’s some where to go from there, there’s a value there, there’s something that exists, that means there’s something to pull from there. Basically, in this situation, we have a trie. It’s all empty except for one reference at index 15, so we know that there is something that exists at 15 because there’s something you can retrieve from there.

[00:10:06] SY: So we have this node at index 15 and where does it go? Where is it taking us?

[00:10:14] VJ: When we inserted the node at index 15, the node that represents P, what we actually did was we inserted the node, but that node itself also has its own array with 26 values and references.

[00:10:28] SY: Oh, boy. Wow! This is quickly expanding.

[00:10:33] VJ: Yes, but I would bet that you could insert the next letter of the word pie by doing the exact same thing we just did.

[00:10:42] SY: We are going to find the index that represents the letter I so we’re going to count again. Okay, so we have A, B, C, D, E, F, G, H, I. Okay. It’s like the ninth letter. So index zero, it’s going to be at index eight.

[00:10:55] VJ: Yes. So if I is the ninth letter, therefore going to live at the 8th index, index eight, we can do the exact same thing we did, which you just described, insert a node right into index eight. Again, that node, when we insert it, gets initialized with its own set of references. Right now they’re empty because we just have P and I, but you can see where this is possibly going. I also want to mention that when we inserted the node I, it doesn’t really have a value right now. It just exists and when we inserted the node P, we also didn’t have a value, but it is possible for nodes to have values and maybe when we get to the end of our word, we can add one.

[00:11:41] SY: Just to retrace our steps, we started at the root node, which had an array of 26 indices. And then we said, “Okay. Our first letter is P so we found the index that mapped to P. At that index, we inserted a reference to a new node and that new node has its own array.” And then we said, “Okay. We need to insert our next letter, which is I,” so in that new node, in that new node’s array, we found the index that mapped to our letter I and then there, we inserted a new pointer to a new node where I’m assuming we’re going to insert our final letter E?

[00:12:24] VJ: Exactly. Do you know what letter is in the alphabet? Man, this is the hardest part of tries, is counting letters.

[00:12:33] SY: Counting letters, okay, A, B, C, D, E. Okay. E is the fifth letter which means it’s going to go at index four.

[00:12:42] VJ: Same thing again. At are node I, which is the node we’re looking at basically, the one we just inserted, we’ll find the fourth index. We’ll insert a node there. And again, that node is initialized with its own array. Actually, I want to point out something here, which is usually when you’re dealing with tries, I haven’t mentioned this yet, but you’re trying to retrieve some sort of value and usually, what you’re doing is you have a key and you’re getting a value from it. The key happens to be a word or some sort of string. And so if you search in the trie for the string and it exists, and it has a value, you can basically say, “Oh, this key has a value.” Since we have the word “pie”, I think it would be cool if we use that as a key and added some value to it. So what’s your favorite number or how many pies do you like to eat?

[00:13:35] SY: Oh, those are tricky questions. Well, I feel like I shouldn’t eat more than one pie, but I like two types of pies.

[00:13:45] VJ: We can do two.

[00:13:46] SY: I will say two, two pies.

[00:13:47] VJ: So let’s say that we want the word, the string pie to be a key. If you use this key, you can retrieve a value and the value is two. At our last letter P-I-E, E is our last letter of pie, at that node, we can insert a value and you’ll remember all the other nodes so far, P and I, haven’t had any value.

[00:14:11] SY: That’s true.

[00:14:12] VJ: But we want P-I-E. We want the E, the last letter, to basically complete the word pie and we want that to be a key so we can insert a value of 2 at the node E. So, it’s kind of cool.

[00:14:26] SY: Okay.

[00:14:26] VJ: Right? Because we also have another word in this which we may or may not have intended to do. We also have PI, pi like the mathematical thing.

[00:14:34] SY: That’s true.

[00:14:36] VJ: Yeah, that math thing.

[00:14:38] SY: That old math thing.

[00:14:40] VJ: So that word exists in our trie, but it doesn’t have a value so it’s not a key.

[00:14:44] SY: Right, because we didn’t put a value at the I index, at the I node. So what happens if we accidentally or maybe not accidentally spell another word that is not the word we intended to spell?

[00:15:01] VJ: So like pi for example?

[00:15:03] SY: Yeah.

[00:15:04] VJ: The math pi.

[00:15:05] SY: PI, yeah, the math pi is in the word pie, but we didn’t mean to do that. That’s kind of like a little accident. What happens? Does that mean something?

[00:15:15] VJ: Well, it’s interesting because the trie structure is basically about retrieving values for keys. In our case, P-I-E, pie is a key. So if we search through our trie and we say, “Oh I have this word, this string P-I-E, is there any value for it?” We can traverse down the structure. When we get to E, we can say, “Oh, this is the end of the word. Is there a value here? Does a value live here?” and if it does live there, well, then we basically have succeeded in our search. We found a value and that’s what’s called a search hit, which is what we just experienced. When you have a search hit, that basically means you actually successfully had a real key because there was a value to retrieve, but the question you asked is what if you have a word that doesn’t have a value at the end of it, so P-I.

We didn’t have any value at the letter I. We only inserted a value at E. So if we go through trie and we say, “Okay, I’m going to go from P to I, well that’s the end of the word. Is there a value here?” Well, no, there’s not. That tells you two things. That tells you this is a search miss, not a search hit. It also tells you that the word PI, there is no value for it so it’s not a key. You could always insert something and make it a key, but there’s just no value that exists there.

[00:16:45] SY: Pis are just so valueless.

[00:16:47] VJ: We are all about eating, not math.

[00:16:51] SY: You know, I think it’s actually a very fair assessment of this entire show. It’s like we are totally down for food. Math?

[00:16:56] VJ: I like math. It’s just, you know.

[00:17:00] SY: It’s just not better than pie.

[00:17:02] VJ: Yeah, that’s true.

[00:17:03] SY: It’s just not better than pie.

[00:17:03] VJ: Math and pie, that’s an easy equation.

[00:17:05] SY: Yeah. And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog posts. Link to that is in your show notes. Also want to give a huge shout-out to Square. Square has APIs and SDKs to make taking payments easy, whether you’re building a mobile app in iOS, Android, Flutter or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/codenewbie. This episode was edited and mixed by Levi Sharpe.

[00:17:46] VJ: Bye everyone.

[00:17:46] SY: Thanks for listening. See you next week.

Season 6 Episode 5 May 29, 2019

Tries: the golden retriever of data structures

Description

Show Notes

Transcript