Description

This episode we're diving into radix sort! The word has no relation to Raid, so it is definitely non-toxic and you don't have to bug out. It IS, however, a great integer sorting algorithm, and the first one at that!

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Getting To The Root Of Sorting With Radix Sort from her basecs blog series.

Transcript

[00:00:00] SY: If you haven’t yet gotten your tickets for Codeland, you totally should. It’s our annual conference about all the wonderful things you can do with code. And besides great food, great talks and great people, this year we’re offering complimentary on-site childcare. So bring your babies with you and see you there. For tickets go to codelandconf.com

[00:00:23] (Music) Welcome to the Base.cs podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:30] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:32] SY: And she is the brilliant mind behind Base.cs blog series. Today we’re talking about…

[00:00:37] VJ: Radix Sort.

[00:00:39] SY: This season of the Base.cs Podcast is brought to you by Linode. If you’ve got a personal project, a small business or a big business with lots of data, Linode offers you a secure hosting for all your infrastructure needs. They are a Linux Cloud Hosting provider where you can get a new server up and running in under a minute. Plans start at one gigabytes of RAM for just five bucks a month. And with the promo code CodeNewbie2019, you can get a $20 credit. So go to linode.com/codenewbie and give it a try. Also, they’re hiring. Check out their jobs at linode.com/careers. Link is in your show notes.

[00:01:15] It’s also brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website. And with their In-App Payments SDK, you can add it to your mobile app too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.

[00:01:48] SY: So we’ve got a new word, a fun word, a fun word to say “Radix”. What is a radix?

[00:01:54] VJ: So a radix is actually something we have talked about way, way, way, way back in the beginning of this whole podcast. We learned about binary and we learned about hexes, hexadecimals and radix is actually just another mathematical term for the base of a number. And we’ve learned about a couple different types of number systems with different bases. So like for example binary, it is the base 2 number system. So we could say that it has a radix of 2.

[00:02:27] SY: Okay.

[00:02:27] VJ: Whereas hexadecimal is the base 16 number system and it has a radix of 16.

[00:02:34] SY: That’s not hard.

[00:02:35] VJ: Yeah. It’s actually pretty simple. It’s just like you know this thing all along and this whole thing is like something you’ve been using. Here’s just another term for it.

[00:02:44] SY: You know why I thought it was going to be scary?

[00:02:46] VJ: Why?

[00:02:46] SY: Because it sounds like to me, when I saw a radix, I heard like raid, like the thing you use to get rid of like bugs. And so I was thinking like, “Oh my God!”

[00:02:58] VJ: Aggressive.

[00:02:58] SY: Yeah, like aggressive, it’s going to be aggressive, it would be toxic, poisonous, might kill me. You know, things like that. So I’m glad that’s not what that is.

[00:03:07] VJ: Did you bring a mask? Because today’s sorting algorithm is a bit toxic. It could be harmful to your health. Please do not allow children to swallow it. And if you get it in your eye, contact an emergency official for help because we can’t help you.

[00:03:20] SY: Immediately. Yes. Okay. We’re talking about radix sorting, radix sorting algorithm. So the first word is definitely not as scary as I thought it was going to be and we’re talking about sorting algorithms. So we’re talking about how to sort some stuff. So yeah, what do these bases have to do with how we’re sorting? How are those two connected?

[00:03:43] VJ: The Radix sorting algorithm is interesting because it’s going to sort using the digits of a number and the digits of the number are how we’re going to go about sorting the whole collection so we’re kind of going to break it down and basically look at the integers within the items that we’re sorting. So we could say that the radix sorting algorithm is an integer sorting algorithm and what it does is it groups numbers by their individual digits and that’s where the radix part comes from because every digit is tied directly to the base of the number basically. So just as a refresher, when you count the digits in a number, you start with the units, place the tens, place the hundreds, place the thousands, place and so on and so on, and there’s an interesting relationship there with sorting based on the radix or the digit of every number.

[00:04:39] SY: Okay. So can we only do this with numbers then?

[00:04:43] VJ: You usually do it with integers, but there is an interesting scenario where you can sort fixed length strings because you can basically map fixed length strings on to integer numbers basically and then you can make that work. But usually it’s for integers and the reason for that is, I didn’t mention this earlier, a lot of radix sort will use counting sort internally, and we talked about that last episode.

[00:05:10] SY: Right. That’s integer based.

[00:05:12] VJ: Yes. And one of the rules is that you have to use an integer and usually you want it to be like… you’re going to use a range of numbers and you should know what the biggest and the smallest number in that range is going to be and you also don’t want that range to be too big. Those things are all addressed by radix sorting because if you’re going to use counting sort you need to make sure that you can address all those rules and not violate any of them.

[00:05:34] SY: And you mentioned that we can use strings as long as they’re fixed length strings. What does that mean?

[00:05:40] VJ: Basically what that means is if we’re going to try to apply radix sort, which is an integer sorting algorithm on to strings, we need to be able to assign an integer value to every single letter in the string. For example, if you had a letter that was three words long, you would basically map every single letter in that word to an integer. So like you could say the first letter is going to be zero and the second letter is going to be one and the next letter is going to be two and you need to be able to depend on the size of that.

[00:06:13] SY: That one string.

[00:06:14] VJ: Yeah. You need to be able to know that you can map onto that basically.

[00:06:17] SY: So basically I can’t start sorting and then like add another letter.

[00:06:21] VJ: Exactly.

[00:06:22] SY: That’s fair. That seems like a reasonable requirement. Okay. So how does this radix sorting algorithm work? What do we do?

[00:06:30] VJ: It’s interesting because I think the best way to kind of introduce how radix sort works is actually not even going through the steps of it, but rather just go into an example.

[00:06:43] SY: Always down for that.

[00:06:44] VJ: Yeah. And I know we kind of talked about how we can run radix sort on strings potentially and I actually want to use an example using strings to show you how radix sort works because it’s actually intuitive. But for some reason, I think with words and strings, it’s easier because we’re humans and words are sometimes easier to reason about than numbers. So what if we try to just sort a bunch of words and try to sort them alphabetically?

[00:07:10] SY: What words should we sort?

[00:07:12] VJ: I think we should use houseplants.

[00:07:14] SY: Okay.

[00:07:15] VJ: Let’s say that we have just a group of unsorted plant names.

[00:07:20] SY: Okay.

[00:07:20] VJ: So let’s say that we have a fern and a ficus and a palm and agave and aloe and fig.

[00:07:29] SY: It sounds delicious.

[00:07:31] VJ: So if I was just like, “Oh, okay, here’s a list of words. In this case, it’s like a list of houseplants. Can you please sort them for me in some sort of logical manner?” What would your first instinct be?

[00:07:41] SY: I would probably group them by their first letter, right? Because I know that when we’re sorting alphabetically, well, we need to sort like by the alphabet. So I’d probably start by saying, “Okay, here all the As, here all the Fs, and here all the Ps.”

[00:07:58] VJ: Exactly. As you mentioned, when you’re sorting alphabetically, you kind of know the order of how things are going to go. Yeah, you know that the As have to come first and the Ps are probably going to come at the end because conveniently we only have three real letters here to deal with, As and Fs and Ps. Basically what you’re doing is you’re grouping, right? And you’re sorting by the first letter of every single one of these words. And when you’re grouping, you basically came up with three groups. You came up with the words that are under the letter A, the words that are under the letter F, and words that are under the letter P.

[00:08:30] SY: Yes.

[00:08:31] VJ: However, you’re also not completely sorted yet. So probably the next thing you would do is you would look at one of the buckets, like one of these sets and you would look at them and say, “Okay, now I need to sort by the next letter.”

[00:08:43] SY: Yes.

[00:08:44] VJ: So if we have aloe and agave, like are they in alphabetical order, aloe and agave? Okay. They both have As to start, but then what’s the next letter in each of the words? Because you need to sort alphabetically, it’s not just enough to put them together.

[00:08:58] SY: So I’d have to compare my L and my G.

[00:09:00] VJ: Exactly. The next letter in each of these words and sort it by the first letter. Now you’re sorting by the second letter. Interestingly, we’re just going to do the same thing for each of these other buckets too. So if you have the bucket with the Fs, in this case, you have fig, ficus, and fern, what would you do next?

[00:09:20] SY: A, B, C, D, E. Okay. So we look at the second letter for each one. So for fig, I’m looking at I, for fern I’m looking at E, and then for ficus, I’m looking at I again. And I know that my E comes before my I because I just did the alphabet just now. And so I would switch them. So I basically switched like if the original order was fig, fern, ficus, then I would take my fern and put it at the beginning. So then I would read: fern, fig, ficus.

[00:09:53] VJ: Yes.

[00:09:54] SY: Okay.

[00:09:55] VJ: However, we have the same problem again, right? Because fern is sorted, but we’re not sure that fig and ficus are because they both have the same first digit, they have the same second digit. Oops, I said digit. I meant letter. But it’s really just foreshadowing. It’s really just a digit.

[00:10:15] SY: That wasn’t really a mistake.

[00:10:17] VJ: So we’ve sorted by the first letter F. We sorted by the second letter I. Now we just need to do the same thing again and notice we’re repeating the same process again. We’re faced with the same problem in the same solution works, which is just sort by the next letter.

[00:10:31] SY: So then I’m comparing my G for fig and my C in ficus and C comes before G. So my new order is now Fern, Ficus, Fig.

[00:10:44] VJ: Exactly. So now the A bucket is sorted, the F bucket is sorted, and finally we have one more bucket, which is the P and there’s only one word here, one element we could say, palm, and there’s nothing to compare it to. There’s no other Ps so we know by virtue of all the sorting algorithms that we’ve done, one single element is considered sorted. So palm is sorted and Ps are taken care of.

[00:11:09] SY: That felt very natural.

[00:11:11] VJ: Yeah. If you were just alphabetizing anything, you’d probably do something like that, right?

[00:11:17] SY: Yeah.

[00:11:17] VJ: And the last step really is, okay, we have three buckets, the last step that I could almost skip over, but I won’t because I want to be thorough. We’re just going to reassemble our list together and we’ll just take the first element from the first bucket and grab all the elements and then pick the next bucket and grab the first element from that. So in our A bucket, we have agave, aloe, so we’d put agave first in our list then aloe. Then in our next bucket, we have the Fs: fern, ficus, fig. We would just stick those right in to our list and then at the end would be palm. So our sorted list is Agave, Aloe, Fern, Ficus, Fig, Palm and we’re done. We’ve now alphabetized the whole thing.

[00:11:57] SY: Okay.

[00:11:57] VJ: I said alphabetized.

[00:12:00] SY: Okay. That was not hard at all actually. That felt very, very straightforward. But we talked about how radix sorting is usually with numbers and even when we have letters, we have strings, we need to translate them into integers. So how does it actually work if we’re talking about integers?

[00:12:20] VJ: So I slipped up earlier and I was like, "Oh, we have to just sort the next digit and I meant letter. It’s the same concept. We’re just going to reapply it to integers.

[00:12:28] SY: Okay.

[00:12:29] VJ: So if we’re sorting integers, there are two ways we could really address this and I really like the houseplants example to start because that is one of the ways that we can sort. We can sort one digit at a time and we can basically sort our whole data set using one digit as like the focus point and then keep sorting to the next digit if we need to, which is what we did with like the fern, ficus, fig thing. But with palm, there was nothing else so we didn’t need to do that. With the radix sort, there are two different ways that this really works on integers. We can use our most significant digit or we can use our least significant digit to start.

[00:13:09] SY: Okay. Okay.

[00:13:09] VJ: So with our houseplants example, that comes closest to the most significant digit version, which I’m just going to call it MSD, and that’s basically when you process each integer by its greatest digit and move forward towards the least significant digit as you sort. So the most significant digit is basically the leftmost, the largest.

[00:13:31] SY: Yeah.

[00:13:32] VJ: And the reason that I said that it maps to houseplants is because we started with the first letter, right?

[00:13:37] SY: Right, right, right.

[00:13:38] VJ: So you can probably guess how least significant digit works, LSD.

[00:13:42] SY: Backwards. LSD. Don’t do drugs, kids. So we’re starting backwards then. We’re starting from the rightmost. Yeah.

[00:13:53] VJ: We start with the least significant number which is basically the rightmost digit and we do the same thing. We basically work our way towards the left. We move towards the most significant digit as we sort.

[00:14:04] SY: Yeah.

[00:14:04] VJ: Here’s the interesting thing is that both of them can use counting sort internally which is what I mentioned earlier that we need integers because we’re going to use counting sort to actually do the sorting and we won’t even get into the details of that because we did that last episode. But if you are going to use counting sort, then you need to use integers which is the story there. So least significant digit radix sort also does that, but you have to kind of work iteratively as you do LSD. There’s a sentence I never thought I was going to say. However, with our most significant digit version, which is the houseplants example, we just did that with strings instead of numbers, we were doing the same thing sort of again and again, right? For each step we were kind of repeating the same steps, which is a hint that we can do it recursively.

[00:14:55] SY: So we’ve basically covered most significant digit and how that works with the houseplants example, but how does LSD work?

[00:15:07] VJ: Oh, that’s a different podcast.

[00:15:11] SY: Can we walk through an example with that one?

[00:15:13] VJ: Yeah. I think let’s move away from strings and use an example that actually has integers so that we can see this like concrete version of what it means to use radix sort on a set of integers and then it will hopefully make more sense like how counting sort might actually work even though we won’t go into the details of that.

[00:15:31] SY: Okay.

[00:15:31] VJ: Let’s say that we have a list of integers. Let’s say we have 10, 52, 5, 209, 19, and 44.

[00:15:40] SY: Okay.

[00:15:40] VJ: We want to sort these using radix sort and we want to sort using the LSD approach, which basically means we’re going to start with the least significant digit, which is the first base digit, the rightmost. I keep using different names for it. First base digit, least significant digit, rightmost, also known as the units.

[00:16:04] SY: The unit’s place. Yeah.

[00:16:05] VJ: Yeah. It’s the smallest number basically and we’re going to start by considering the unit’s place for every single one of these numbers as we sort.

[00:16:14] SY: Okay.

[00:16:15] VJ: So. Just for the sake of complexity. All right, not complexity.

[00:16:19] SY: Simplicity.

[00:16:19] VJ: Yes. For the sake of simplicity, let’s just take note of what these integers are in terms of their size.

[00:16:28] SY: Okay.

[00:16:28] VJ: So the largest integer we have is 209 and the smallest we have is 5.

[00:16:35] SY: Yes.

[00:16:35] VJ: And I think because we’re going to be considering digits here, what I would do is make all of them the same size.

[00:16:42] SY: Meaning, give everyone a ten’s place and a hundred’s place?

[00:16:46] VJ: Exactly. And so for like the number 5, you could just do 005. It doesn’t change the value of it, but at least we can look at every number and immediately know what’s your unit’s place, what’s your ten’s place, what’s your hundred’s place.

[00:16:57] SY: Okay. Who are you?

[00:16:59] VJ: Yeah. Exactly. So if we rename these numbers, we now have 010, 10, 052, 52, 044, which is 44 and so on.

[00:17:10] SY: Okay. So if we do that whole giving everyone a value in the ten’s, the hundred’s place, we end up with our array looking like 010 which is 10, 052 which is 52, 005 which is just 5. 209, we don’t have to add anything for that one because it already has the hundred’s place. 019 which is 19, and 044 which is 44.

[00:17:35] VJ: And so now everybody has a hundred’s place, everybody has three digits, and we can start the work of sorting. Since we’re using counting sort, I’m not actually going to go through the sorting steps. But we kind of have to set it up for counting sort to run. So what we’re going to do is we’re going to just look at the unit’s place of every single one of these numbers. So the first number is 010, 10, right? And the unit’s place is 0.

[00:18:00] SY: Okay.

[00:18:01] VJ: The next number is 052, unit’s place is 2.

[00:18:05] SY: Okay.

[00:18:06] VJ: The next number is 005, unit’s place is 5. The next number is 209, the unit’s place is 9. And the next number is 019, the unit’s place is 9. And the last number is 044, the unit’s place is 4.

[00:18:20] SY: Okay.

[00:18:22] VJ: So what we’ll do is basically for each of the unit’s place that I just described, we’re going to use counting sort and just pretend that those are the numbers we care about.

[00:18:33] SY: Okay.

[00:18:33] VJ: And we’re going to run counting sort with only the unit’s place.

[00:18:37] SY: Okay.

[00:18:37] VJ: And whatever counting sort tells us is the new order will become the new order of all the numbers.

[00:18:43] SY: So if we sort it just on the unit’s place, then we would end up with 010, 052, 044, so that’s a new place, 005.

[00:19:00] VJ: That also moved.

[00:19:01] SY: Yeah. You’re right, the 005 moved because it’s a little bit coming on in a little bit later, 209 and 019. So they’re not in order, but they’re on their way to being ordered. They’re in order according to their unit’s place.

[00:19:16] VJ: Exactly. And because counting sort does a really good job of sorting a bunch of integers in a certain range, in this situation we can basically say, “Oh, our range is 0 through 9. Here are six numbers. Sort them according to these numbers and counting sort will tell us, ‘Okay, new order.’” We’ll rearrange the numbers and we know that they’re ordered according to the first digit. So can you guess what we do next?

[00:19:43] SY: So we did the unit’s place, now we’re going to move over to the ten’s place. If we look at the first item, we have 010. So the number we’re actually paying attention to is the 1. Then for the next number, we have 052, we’re paying attention to the 5.

[00:20:03] VJ: Yup.

[00:20:03] SY: Then we have 044, we’re paying attention to the first 4. 005, we’re paying attention to the 0. 209, we’re paying attention to the 0. And then 019, we’re paying attention to the 1. So if we look at these values at the ten’s place and we organize them based on that, we order them based on that, we’ll end up with 005, 209, 010, 019, 044, and 052.

[00:20:35] VJ: Exactly. And you know what’s kind of cool is now we see the 10 and the 19 kind of closer together.

[00:20:40] SY: That’s true.

[00:20:41] VJ: We see the 44 and the 52, they’re ordered correctly, like 44 is coming before 52.

[00:20:46] SY: Because they kind of started out pretty far apart, right?

[00:20:49] VJ: Yeah, almost at opposite ends.

[00:20:51] SY: Yeah. Yeah, almost opposite, now they’re getting a little closer, but that 209 is completely out of place. That 209 is like it’s the second item in the array. It’s clearly not supposed to be there.

[00:21:02] VJ: Yeah, it’s interesting. That’s the only thing that’s in the wrong place, right?

[00:21:05] SY: You’re right. Oh my goodness, you’re so right. Yes. Everything else is in order except for that last one. Okay. Okay. I see what you did there.

[00:21:15] VJ: Well, here’s the interesting bit here is that everything else in the array had only two digits to start off with.

[00:21:22] SY: Yes.

[00:21:23] VJ: Two or less actually because 5 has only one digit, but everything else was like a two-digit number. 209 was the only one that had three digits. It’s in the hundreds.

[00:21:31] SY: Yes.

[00:21:32] VJ: So we’re going to do the same thing again basically through this, again, semi-sorted array because you sorted it again with focus on the tens. Now we’re going to move on and focus on the hundreds. And in this situation, almost everything is a zero.

[00:21:45] SY: Yeah. Yes.

[00:21:46] VJ: 005, 010, 044. Like those are all just like two digit numbers except for 209. And the reason we have those zeros was so that we could easily see what the hundreds value of each of these numbers is. But interestingly, the only number here that has a value in the hundred’s place really is 209.

[00:22:07] SY: Yeah.

[00:22:07] VJ: So if we do the same step again and we focus on the hundreds and we run counting sort on our numbers, we’re going to end up with 005, 010, 019, 044, 052, and 209.

[00:22:23] SY: Wow.

[00:22:23] VJ: Which basically is the same as. 5, 10, 19, 44, 52, 209.

[00:22:30] SY: I cannot believe that worked. I really can’t.

[00:22:33] VJ: 209 in its right place finally.

[00:22:35] SY: Yeah, because when you’re talking about alphabetizing the houseplants, that felt very natural like, “Yeah, we start at the beginning, you move way down, but like starting backwards, starting from the least one and then working your way up just didn’t feel very intuitive to me and I really cannot believe this worked. That’s so freaking cool.

[00:22:52] VJ: Well, the way that it works is pretty cool because it’s just like, “Oh, I’m just incrementing the place value that I’m looking at, the digit. If I just sort by that, every single time I pass through the array, I’m getting a little bit closer to being sorted.” And here’s a really cool thing. We basically just did three passes, right? We did the first pass on the units, the second pass on the tens, the third pass on the hundreds.

[00:23:14] SY: Yes.

[00:23:15] VJ: And the largest number was three digits and the number of passes was three passes. So basically we can deduce from this that we’re going to have to pass through our data set the number of times that maps to the number of digits in the largest number. So instead of 209, we had 5,642, well, that’s four digits so we’d have to do four passes.

[00:23:40] SY: So alternatively, if we did not have the 209, we could have stopped after the second pass.

[00:23:46] VJ: Yeah, instead of 209 it was 29.

[00:23:48] SY: Right.

[00:23:48] VJ: By the time we sorted the tens, we would have been done.

[00:23:51] SY: Okay. So what does that mean in terms of how efficient this algorithm is? Because three passes sound like a pretty good deal, right, considering that we had, what is it, six elements, six elements, three passes. That sounds pretty reasonable. How does this actually perform?

[00:24:06] VJ: Basically that number of digits in the largest number, when we’re talking about it in the context of big O notation, we can kind of abstract that out into like a variable. So a lot of the times you’ll hear it being like the variable K for example and like K can represent the number of digits in our largest number.

[00:24:25] SY: Okay.

[00:24:26] VJ: So the big O notation in terms of how many passes we’ll have to make through this, the time complexity is basically the number of elements times that constant K.

[00:24:38] SY: Okay.

[00:24:38] VJ: So if K is a small number and we’re just running through the number of elements in the array that we’re trying to sort, then it’s big O of N times K. And if K is like very small, then it’s almost linear because K is pretty miniscule in terms of like what you’re actually doing. So for example, when we ran counting sort last week on a set of numbers and we ran it once, I mentioned it was the first time we had a linear runtime. So if K is 1, then really it’s just O of N times 1, which is 11. If K is 2, then it’s slightly larger. If K is extremely large, well, now…

[00:25:19] SY: It’s messing everything up.

[00:25:20] VJ: Now you’re not really being efficient about it and maybe radix sort is potentially not good.

[00:25:27] SY: Okay. So this is a pretty good algorithm then.

[00:25:30] VJ: Yeah. I think it’s good if you have to do something like what we did, like you have a small set of numbers, the range is not too big. If I wanted to throw a wrench in this, I could give you a set of numbers and then throw one in there that had like seven digits and that would make it not as good. But for certain scenarios, I think radix sort works really well and I think it’s pretty beautiful when you start thinking about the fact that you’re just repeating either the same steps over and over again or you’re just iterating through a collection whether you’re doing by most significant digit or least significant digit, you’re kind of following the same pattern and you’re just using counting sort to do the work for you and just sorting kind of like one step at a time, which is kind of nice.

[00:26:14] SY: Yeah, very cool. And that’s the end of today’s show. If you like what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. Link to that is in your show notes. Also want to give a huge shout out to Linode for sponsoring the show. Linode is giving you a chance to try their powerful servers built for all your infrastructure needs. They’ve got nine data centers worldwide with new data centers opening this year in India and Canada. Getting started with a shiny new server takes less than one minute so you can get up and running in no time and you can get a $20 credit by using the promo code CodeNewbie2019. Just go to linode.com/codenewbie for more info. And want to say thank you to Square. Square has APIs and SDKs to make taking payments easy whether you’re building a mobile app in iOS, Android, Flutter, or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/CodeNewbie. This episode was edited and mixed by Levi Sharpe.

[00:27:23] VJ: Bye everyone.

[00:27:24] SY: Thanks for listening. See you next week.

Thank you for supporting the show!

Thank you for supporting the show!