Description

We've gotten acquainted with heaps as arrays, now we're diving into heap sort with some help from a few condiments!

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Heapify All The Things With Heap Sort from her basecs blog series.

Transcript

[00:00:00] SY: If you haven’t yet gotten your tickets for Codeland, you totally should. It’s our annual conference about all the wonderful things you can do with code. And besides great food, great talks, and great people, this year we’re offering complimentary on-site childcare. So bring your babies with you and see you there. For tickets, go to codelandconf.com.

[00:00:23] (Music) Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:30] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:33] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today we’re talking about…

[00:00:37] VJ: Heapsort.

[00:00:39] SY: This season of the Base.cs Podcast is brought to you by Linode. If you’ve got a personal project, a small business or a big business with lots of data, Linode offers you a secure hosting for all your infrastructure needs. They are a Linux Cloud Hosting provider where you can get a new server up and running in under a minute. Plans start at one gigabytes of RAM for just five bucks a month. And with the promo code CodeNewbie2019, you can get a $20 credit. So go to linode.com/codenewbie and give it a try. Also, they’re hiring. Check out their jobs at linode.com/careers. Link is in your show notes.

[00:01:14] It’s also brought to you by Square. If you’re designing a website or building a mobile app, you’re probably going to want to take payments at some point. Square APIs allow you to easily implement their payment form on your website. And with their In-App Payments SDK, you can add it to your mobile app too. You don’t have to worry about dealing with PCI compliance because they’ll take care of that for you. Don’t let anything come between you and your money. Start building with Square over at squareup.com/go/codenewbie.

[00:01:48] SY: So I assume heapsort has to do with heaps.

[00:01:51] VJ: It does. It does. It does.

[00:01:52] SY: That fair? Okay. So before we get into heapsort, can we do a quick heap recap?

[00:01:57] VJ: the heaps that we’ve been talking about, they are binary heaps specifically, and the reason that we call them binary heaps is because they look very similar to binary trees. So they look like tree data structures, but they have to follow two specific rules, which is what makes them heaps. We’ve talked about this quite a bit. So I won’t go too much into detail, but the two rules are that they have to follow a shape property where they need to be complete trees or complete heaps rather and they also have to follow a Heap Order Property where all of the nodes in the tree have to follow either a max-heap order or a min-heap order and that’s the way that the nodes in the heap have to appear in a certain order.

[00:02:39] SY: So that’s what heaps are. What is the heapsort algorithm? We talked about many sorting algorithms. This is I guess another one. What is this one all about?

[00:02:51] VJ: The heapsort algorithm is a way of, as the name would suggest, sorting the elements in a heap, but it’s also very closely tied to using an array structure to help us do that. Basically, we sort through the nodes in a binary heap structure by finding the largest element and then basically pulling that out and sorting it and repeating those steps in order to get the next largest and the next largest until we’ve sorted the whole heap structure and ended up with a sorted array.

[00:03:23] SY: So when we do the heapsort algorithm, when we talk about sorting algorithms in the past, it’s been, “Here’s a list of numbers. Now we need to sort.” When we’re talking about the heap sort algorithm, what are we starting with?

[00:03:37] VJ: This kind of ties into how heaps are pretty powerful when you pair them with other data structures specifically with an array and last episode we talked about how a heap has a unique relationship with an array and how it can be used in a priority queue and that relationship with an array is the same thing that’s going to come up when we start with heapsort. So the first thing that we’re going to do usually is you’re usually not going to start with a heap. You’re usually going to have an unsorted collection. Most of the time no one’s going to like be like, “Hey, here’s an unsorted heap for you.”

[00:04:11] SY: It’s sort of what I was thinking. I was like, “Ah!”

[00:04:14] VJ: And I guess also depending on it, if you had an unsorted heap, you could very well be breaking some of the properties of the heap, right?

[00:04:21] SY: That’s true. Yeah.

[00:04:22] VJ: Because you have a certain order.

[00:04:24] SY: Heaps are very particular. Yeah.

[00:04:25] VJ: Yeah. So I don’t even know if you can have a completely randomized unsorted heap because then you could be breaking that order rule.

[00:04:32] SY: It’s not a heap. Yeah.

[00:04:33] VJ: Yeah. So really when we’re talking about heapsort we’re actually talking about using heapsort to help us sort an unsorted collection. And for the purposes of this, we’ll think about being handed an unsorted list and we are like, “Oh, this is basically like an unsorted array. We have unsorted collection of items and we can use heapsort to help us sort through that.” So it’s kind of like heapsort is going to be the tool in our tool belt that we reach for and we can use that with its relationship with arrays to help us sort through anything really.

[00:05:05] SY: So basically we’re starting with this unsorted collection of items. Can we just say it’s an array of items? Is that okay?

[00:05:12] VJ: Yeah. Yeah. Let’s call it an unsorted array.

[00:05:14] SY: So we are starting with an unsorted array and then somehow the heap comes in. How do we get from array to heaps?

[00:05:25] VJ: So the first thing that we have to do is we need to take our unsorted array and we need to turn it into a heap and what we’ll probably want to do is we want to turn it into a max-heap when you’re using heapsort. I think you could probably do heapsort with a… what’s the opposite of max-heap? Min-heap.

[00:05:43] SY: Min-heap.

[00:05:45] VJ: I think you could do it the same way. The logic still applies. It’s just like whether you’re sorting with the largest or the smallest first. But let's just say for the purposes of this, we want to turn an unsorted array into a heap, specifically a max-heap, and you’ll remember from a previous episode, our last episode, we talked about how you can use the index of an element to figure out where its children would live in a heap. So if you kind of had like a mirror image, like if you imagine an array looks into a mirror and a heap looks into that same mirror, they can kind of see themselves reflected within each other. I like the idea of like an array in a mirror.

[00:06:23] SY: That’s kind of beautiful.

[00:06:24] VJ: Yeah. They’re just like looking into the same mirror and they’re like, “Am I beautiful?” And then the other one is like, “You’re beautiful! You’re just like me, but opposite.” I don’t know where I’m going with this.

[00:06:36] SY: That was amazing. That was great. I really appreciated that. Yeah.

[00:06:40] VJ: Basically we can say that, we know that an array can be transposed as a heap, right? You can represent them similarly. By using the index of an element in an array, you can figure out, “Oh, where its two children? What are the indexes that it lives at, the indices?” And if I give you an index, you could find the index of an element’s parent because we have those handy algorithms to figure out. They’re like, “2 I plus 1, 2 I plus 2.” That’s what we were talking about previously in our last episode and this is like where that connection comes into play. So we’ll basically use those algorithms to help us build a heap out of our unsorted array.

[00:07:21] SY: So we start with our unsorted array. We use that understanding between how an array maps to a heap to take that array as it is and make it into a heap and then from there, what do we do?

[00:07:33] VJ: I should add one thing. When we return the array into a heap, like immediately the first thing that we’re going to end up with is not actually a heap, it’ll just be a binary tree, right? Even if we end up with a complete structure, we need to make sure that it follows our Heap Order Property. So we’ll probably need to have a function, some sort of algorithm that knows how to turn a tree into a heap by making sure that it follows the property. So we’ll turn our heap-like structure into an actual heap that follows our max-heap property.

[00:08:05] SY: Got it. So we are doing like a heap attempt. And in that attempt, it’s in a tree, it’s a binary tree, but it may not follow all the heap rules but then we are going to, I assume, swap things to get it to be a max-heap.

[00:08:22] VJ: Imagine that we had like a handy function to do this, it would probably be like a function that just knows how to build a heap and it knows like how to look at a parent and its children and figure out where something should go and it’ll heapify something down or up accordingly.

[00:08:37] SY: In terms of the details, it sounds like we don’t really have to worry about how it’s doing that. We just know that it is doing it, but the important step is that we’re going from that unsorted array to a max-heap.

[00:08:47] VJ: Yeah. I’m not even going to think about figuring out how to build the max-heap right now because we could do the whole episode on that. But imagine that we have an array that’s been turned into a max-heap because this is where the cool part happens and this is where like the sorting happens and that’s the fun part, I think.

[00:09:04] SY: Okay.

[00:09:04] VJ: So say we’re super smart, we’ve figured out how to build a max-heap. Now we have one. We’ll recall that the max-heap property means that every single parent node is larger than or equal to its children, right? We’ve talked about this before and this also means that the root node is going to be the largest element in the heap by default. If you know that the root node is going to be the largest element and you’re trying to sort through a collection, you kind of have your foot in the door already because you have this piece of information that’s super helpful to you, which is that the first element in your list is actually the one that is at the root because it’s like, “Oh, it’s going to be the largest, it’s going to have to go at one end of the array.” And let’s say we’re doing like largest to smallest. Well, this is kind of perfect because now we have the largest number that needs to be at the front of our sorted array.

[00:09:58] SY: Yes.

[00:09:59] VJ: However, we also know from previous episodes we can’t just pull off the element that’s the root node because if you pull off the top node…

[00:10:06] SY: It’ll be mad at you.

[00:10:08] VJ: Yeah, and you will have broken the whole family.

[00:10:10] SY: Right.

[00:10:11] VJ: It’ll be very sad.

[00:10:11] SY: And there will be no heap at all.

[00:10:13] VJ: Yeah.

[00:10:14] SY: The heap needs its root. Yeah.

[00:10:15] VJ: Yeah. It’ll ruin the whole structure of it and we can’t just blindly do that. Do you remember how to delete a node from a heap?

[00:10:24] SY: Okay. I remember there’s only one safe place to do it, right? There’s only one safe place to either add or remove from a heap. Okay. It has to be the last row in the leftmost child.

[00:10:39] VJ: Yeah. We could basically say it’s like the last node in the last level.

[00:10:45] SY: Yes. The last node in the last level. Yes. There you go.

[00:10:47] VJ: Yeah. So that’s the safest place. So we kind of talked about this in previous episodes and we talked about how do you insert and delete things from a heap and the interesting thing with inserting, you can just add it there and figure out where the element needs to live. But with deleting, if you want to delete the root node, you can’t just pull it off, but you know that it’s safe to delete the last… we’re going to just call it the last node in the tree, sorry, the last note in the heap, which is our leftmost last level node. So we know that the last node in the heap is safe to delete. We know that we can’t delete the root node, but we can swap them.

[00:11:23] SY: Yes.

[00:11:23] VJ: So we know that by swapping the root node and the last node, now the root node is safe to delete.

[00:11:29] SY: Yes.

[00:11:30] VJ: We also know that we can just heapify things up and down so we can deal with solving the order property later. We’ll just move the new root node to the right place.

[00:11:38] SY: Yeah. We’ll just swap them up or down until they’re in their appropriate places.

[00:11:43] VJ: Exactly. Now that we have this max-heap and we know that we want to pull the root node off basically and say, "Hey, this is the first thing we’re sorting. Well, we can use what we already know about deleting nodes from a heap and we can swap the root node with the last node in the heap."

[00:12:01] SY: Yes.

[00:12:02] VJ: So now once you’ve swapped the root node and the last node, the last node, we don’t even care what it is at the moment. It’s become the new root node and the root node, which is our largest element, is the one that is at the last node spot. So we can just remove it from our heap.

[00:12:18] SY: Yeah, do the same process over again?

[00:12:20] VJ: Yeah. And the interesting thing about this is we’re actually doing some sorting here, right? Because we’ve pulled the largest element from the heap because we know that heaps follow this rule so we know that we are sure that the largest element is going to be the largest of the whole data set. So the root node is going to be the largest in the whole data set. So now we know, "Oh, I know it’s safe to sort this first and I can pull it off the heap."

[00:12:47] SY: Okay. So now that we can pop off that last node, the only one that’s safe to pop off from, what do we do with that?

[00:12:55] VJ: Well, basically once you remove that node and you know that it’s like the largest one, that node and the element that it maps to in the array, because remember we started with this unsorted array, we can consider that element to be sorted because we know that it’s the largest. We know that it was the correct item to remove from the heap based on the Heap Order Property and because we’ve removed it we know that it’s been sorted and everything else that’s remaining in the heap still has to be dealt with.

[00:13:27] SY: And everything else in the heap is smaller than that number.

[00:13:30] VJ: Correct. Yes.

[00:13:32] SY: Okay. So I think I’m getting the idea and I think I’m getting the rhythm for what we’re doing, but I think an example would help me. Can we walk through an example?

[00:13:39] VJ: Yeah.

[00:13:40] SY: Okay. So we have an unsorted array. Let’s say it reads 3, 19, 1, 14. So that’s 3, 19, 1, 14. The first thing we’re going to do is we’re going to turn it into a max-heap. Okay. So if we do that, we know that the root is going to be the biggest number. So for 3, 19, 1, 14, the biggest number is 19. So we have that as our root node and then we’re going to take our second biggest number and put it as the left child of 19. So again 3, 19, 1, 14, that number is 14. And then for our right child, we’re going to take the next biggest number which is 3 and then we only have one number left which is going to be the left child of my number 14. So from the very top, it’s 19 then from left to right 14, 3, and then the next level down just 1.

[00:14:40] VJ: Exactly. And this currently constitutes as a max-heap, because first of all, we filled up all the nodes from left to right level by level and every single parent node, in this case, there are only two parent nodes. There’s the root node, which is 19 and then 14, which is the left child of 19. Both of them are greater than or equal to the value of their children because 19 is greater than or equal to 14 and 3 and 14 is greater than the number 1.

[00:15:06] SY: Okay. That was the first step. So we’re well on our way. Now the second step is where things get really fun. So that’s where we want to take that largest value and put it into the right place in the array. So we know that the largest number is my root number so it’s that 19. But as we talked about earlier, we cannot just remove the root node because if we do then we no longer have a heap, but we can swap things.

[00:15:34] VJ: Exactly.

[00:15:35] SY: So we have the only safe place to add and remove from this heap is where the one node is. So we’re going to swap 19 and 1. And if we do that, my heap now looks like number 1 at the root then 14 and 3, then on the lowest level, 19.

[00:15:55] VJ: Exactly. And we now know that 19 is the last node in our heap and one isn’t in the correct spot, but we know…

[00:16:04] SY: The 1 is all messed up.

[00:16:05] VJ: Yeah, 1 is messed up.

[00:16:07] SY: We’re going to talk to 1 later.

[00:16:08] VJ: We know that we can deal with that later. So it’s okay.

[00:16:10] SY: Yes. As a temporary, it’s okay. It’s an intermediate stuff. We’re going to be alright. We’re going to fix it. Okay. So now that 19 is in that last node on the bottom, that place where it is safe, we can now pop it off and stick it at the end of our new array.

[00:16:25] VJ: So this is sort of… I understand why that’s the way you’re kind of imagining it, but here’s a cool thing about heapsort. I’ve been saying we’re popping it off of the heap, but remember that the heap is just a mirror of this array, right, that it maps to.

[00:16:40] SY: Right.

[00:16:41] VJ: And the cool thing about heapsort is it doesn’t actually need all this external memory. We don’t need to create a copy of this array because we have indices and we know that the indexes of the items in the array have this property where you can figure out who’s the parent and who’s the child. So really all we’re doing is when we move the 19 around in the heap and we find its new correct space, as we move things around, all we’re doing is moving around the indexes of the elements of the array. We don’t have to copy a new array and stick it at the end. So what we’ve done is by putting 19 at the end, we actually just change what its index is.

[00:17:20] SY: That sounds very efficient. If that’s the way I would have guessed to do it, like I would have all kinds of new arrays, but it sounds like a very efficient way of using space.

[00:17:29] VJ: Yes. And what’s kind of cool is when we started off the heapsort algorithm, we had our array originally was 3, 19, 1, 14, right? Now our array reads 1, 14, 3, 19, and we need to do something about the fact that 1 is the root node. But I mean, I guess spoiler alert, we’re going to put 1 in the right spot. It’s not a very big heap. So it’s okay.

[00:17:55] SY: I can’t believe you ruined the end of that story. Gosh!

[00:17:57] VJ: Well, we’ll remember from previous episodes that we will swap the root node and we’ll percolate it down or like the term is heapify down, right? So we’ll heapify 1 to its correct spot. And in this case, we’ll look at its two children, 14 and 3, and say, "Oh, 14 is bigger. Swap those two." So now at the end of this actual process of sorting the number 19, our new array reads 14, 1, 3, 19.

[00:18:27] SY: Yes.

[00:18:27] VJ: And 19 is considered sorted. We know that it has been sorted. It’s in the right place. We know that 14, 1, and 3 need to do some like shuffling around. There’s some musical chairs that needs to happen.

[00:18:37] SY: There’s some work there.

[00:18:38] VJ: Yeah. But one of the elements, one of the four elements isn’t its correct space. So that’s good.

[00:18:42] SY: So actually if we keep going there, we have our 14 and then 1 and 3. And if we follow the same steps, so we take that root node, which we know is the biggest number which is 14 and we want to put it in its safe space, would we swap it with 3?

[00:19:00] VJ: Yup. That is the safe. That’s the last node now.

[00:19:02] SY: That’s the new safe space. Okay. So then our heap, that’s technically not a heap right now, would read 3, 1, 14. So 3 at the top and then 1 and then 14. And now 14, we can pop off and… I want to put it in a new array, but that’s not really what we’re doing. We are changing the index of 14 so that it comes right before 19.

[00:19:27] VJ: Yes. Exactly. We’re just saying, "Okay, 14, you’re taken care of. We’re considering you sorted." And when we remove something from a heap basically, when we are going through the heapsort process, all we’re basically saying is this element is now in the correct spot in its array.

[00:19:41] SY: Yes.

[00:19:42] VJ: That’s really the big momentous…

[00:19:46] SY: Okay. I like my momentous. I’m just saying. It sounds really monumental.

[00:19:49] VJ: It’s bigger than a moment. That’s what it means.

[00:19:52] SY: Yes. Okay. So at this point, we have half of our array sorted. We have our 14 and 19 sorted. We just have two numbers left. We have our 3 at the top and then our 1, which is the child of 3. So at this point, we know that the root node, that number 3 is the biggest one, but we still, even though they’re just two of them, we still cannot just remove it. We have to bring it to its safe space and there’s only one number to switch it with. So we’re going to swap 1 and 3. And so 1 is going to be my new temporary root note and then we have our 3. Now our 3, we’re going to kind of pop off and put it in its right spot which just means we’re going to give it the right index and now we have sorted the 3, the 14, and the 19.

[00:20:43] VJ: Exactly. And our array now reads 1, 3, 14, 19. And hey…

[00:20:49] SY: We’re done.

[00:20:49] VJ: It’s actually sorted. So here’s the cool thing is when our algorithm…

[00:20:53] SY: Oh, that’s so cool.

[00:20:54] VJ: I know, right? And when our algorithm gets to the point where there’s just one node left, it doesn’t even have to sort anything because it knows, "Hey!"

[00:21:01] SY: Okay. I was wondering about that.

[00:21:02] VJ: Now we know if this is a max-heap, which is what we’re dealing with. In this situation, we know, "Oh, I’ve gotten to the element that’s the smallest. Therefore, it must be the first in the array." And if it was a min-heap, then it would be the element that’s the largest and I guess it would be at the end of the array. I haven’t actually ever tried to implement a min-heap array. I’ve always seen them as a max-heap. So anyways, that’s the cool thing is you get to the end and you don’t even have to do anything that doesn’t really pop off. You just know automatically, "Oh, one element left. It’s in its right place. Nothing to swap it with. No indexes that need to rearrange."

[00:21:35] SY: That was a lot of fun. I think that was my most fun sorting algorithm that we’ve done so far. It just seemed like just so much fun. Maybe it’s because there are shapes involved, you know, because we made a tree and then we put it back in the array even though technically the same array. There’s just a lot going on. It was fun.

[00:21:52] VJ: I’m really glad. And you know what’s really funny is that you thought that this was really fun, but. I actually think that it’s very similar to another algorithm that I don’t think you thought was fun, which is why it’s funny to me. It’s like I presented the same algorithm. I was like, "Do you like it with this ketchup on top?" And you’re like, "Oh, I like it with ketchup." And then I was like, "Ha-Ha-Ha! It’s the same thing." Okay, I should tell you which one. So I think heapsort, it’s almost like a super improved version of selection sort, which is the one…

[00:22:26] SY: I do not like that one.

[00:22:28] VJ: Yeah. I know. See, there’s the ketchup.

[00:22:31] SY: I need my ketchup.

[00:22:34] VJ: It’s basically the algorithm where we were finding the largest element in an unsorted collection and then we were like plucking it and ordering it at the back of the list one at a time, except heapsort is way better, I think, because it uses heaps to do this.

[00:22:50] SY: The ketchup.

[00:22:51] VJ: Yes. It uses heaps, the ketchup of the computer science condiment world.

[00:22:58] SY: I’m okay with that. Yeah.

[00:23:00] VJ: Since it uses heaps, it actually ends up being… this runs in like linearithmic time. So it’s way better than selection sort. I think it’s really fun because heaps are pretty cool and I like that we can just sort of swap things, move them around, and be like, "Oh, sorted, sorted, sorted, sorted. Oh, I’m done."

[00:23:16] SY: Yeah. I like it. And that’s the end of today’s show. If you like what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. Link is in your show notes. Also want to give a huge shout out to Linode for sponsoring the show. Linode is giving you a chance to try their powerful servers built for all your infrastructure needs. They’ve got nine data centers worldwide with new data centers opening this year in India and Canada. Getting started with a shiny new server takes less than one minute so you can get up and running in no time and you can get a $20 credit by using the promo code CodeNewbie2019. Just go to linode.com/codenewbie for more info. And want to say thank you to Square. Square has APIs and SDKs to make taking payments easy whether you’re building a mobile app in iOS, Android, Flutter, or React Native. If you want to embed a checkout experience into your website, it’s super easy. Square has processed billions of transactions so you know it’s tried and true. Start building with Square over at squareup.com/go/CodeNewbie. This episode was edited and mixed by Levi Sharpe. Vaidehi, you want to say goodbye?

[00:24:27] VJ: Bye everyone.

[00:24:28] SY: Thanks for listening. See you next week.

Thank you for supporting the show!

Thank you for supporting the show!