CodeNewbie

Description

You probably sort things all the time -- files, clothes, dishes. But have you thought about how to categorize your sorting? How do your sorting algorithms hold up in terms of, say, time complexity? We give you an introduction to sorting algorithms, what they are and what they're used for, and dig into the six ways we can classify them.

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Sorting Out The Basics Behind Sorting Algorithms from her basecs blog series.

Transcript

[00:00:00] SY: Okay. I’ve got a very exciting announcement. We finally have a date for Codeland, our annual conference. It’s happening July 22nd in New York City. Early bird tickets are now on sale. For just 99 bucks, you get breakfast, lunch, dinner, a coding workshop, technical talks and after-party, and of course an amazing community. Sale ends March 10th. Go to codelandconf.com for more info. Link is in your show notes.

[00:00:29] (Music) Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:37] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:39] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today we’re talking about...

[00:00:44] VJ: Sorting Algorithms.

[00:00:46] SY: This season of the Base.cs Podcast is brought to you by our wonderful friends at DigitalOcean. DigitalOcean provides the easiest cloud platform to deploy, manage, and scale applications of any size. They remove infrastructure friction and provide predictability so you can spend more time building what you love. Try DigitalOcean for free by going to DO.co/codenewbie and get $100 in infrastructure credit. Link is in your show notes.

[00:01:15] This season is also brought to you by the fine folks at Linode. If you’ve got a personal project, a small business or a big business with lots of data, Linode offers you secure hosting for all your infrastructure needs. They are a Linux Cloud Hosting provider where you can get a new server up and running in under a minute. Plans start at one gigabytes of RAM for just five bucks a month. And with the promo code CodeNewbie2019, you can get a $20 credit. So go to linode.com/codenewbie and give it a try. Also, they’re hiring. Check out their jobs at linode.com/careers. Link is in your show notes.

[00:01:53] SY: (Music) Okay, let’s get started. Okay, so sorting algorithms. So I think I’m going to like this one because there are two words and I feel like I know a little bit about both of them. So I’m feeling pretty good.

[00:02:10] VJ: You’re also like organized, so you also probably are a big fan of sorting things in general.

[00:02:16] SY: I do like, I do like a good sort. I really do. I do like a good sort. I said sort, not snort. Okay. So sorting is, you know, putting things in their proper place and grouping things sort of kind of, right? And algorithm we’ve talked about before. Remember when I said don’t tell me that algorithm is like a brownie and then you kind of basically did that and then I was like, “Vaidehi, I just said don’t do that,” and then you had like a whole thing about it.

[00:02:44] VJ: Oh, yes.

[00:02:48] SY: So we’ve talked about algorithms before, but I don’t think we’ve talked about these two things together. So what is a sorting algorithm?

[00:02:54] VJ: So we all know what sorting is, but for the context of a sorting algorithm, I just want us to all be on the same page about it. When we’re talking about sorting and the idea of sorting, what we’re really talking about is organizing items and those items can be anything, in some… I should say similar items. It’s not just organizing items, but organizing items that are similar in some way by some sort of property in like increasing or decreasing order. So you need to know what the items are in order to sort them and you need to know what the property is that you’re going to sort by and then you need to know are you sorting by increasing or decreasing order. So those are the important bits.

[00:03:33] SY: That’s actually a lot of information.

[00:03:35] VJ: Yeah.

[00:03:35] SY: Now that we’ve listed them out, that’s like you need a couple ingredients for this recipe.

[00:03:40] VJ: Sorting is not a game. It’s serious business.

[00:03:45] SY: This is hardcore. It’s tough stuff. Okay. Cool. Okay. So we got our three ingredients. Tell us more.

[00:03:52] VJ: Another interesting thing about sorting and sorting algorithms is a sorting algorithm usually results in some kind of sorted lists, some sort of output, right? And it’s after the algorithm has done its job of sorting it by whatever property and whatever order, it gives us this output, it gives us a sorted list. Another important thing to note is that a sorted list is really just a permutation of the original unsorted list. So what I mean by that is it’s a reordering of our OG data.

[00:04:27] SY: Oh, interesting. I never thought of permutations in that way of being like it’s another version, you know, and there’s many different versions we can have depending on how we sort them and in what order, in what property. Interesting. Okay. So when I… I’m trying to think like what are things that I sort in real life, laundry. Laundry, that’s an easy one, right?

[00:04:51] VJ: Yeah. Yeah.

[00:04:52] SY: So when I sort… I’m pretending like I sort my laundry. (Laughter)

[00:04:57] VJ: I: love it because you were the one who suggested it so enthusiastically and so I was like, “Wow, I wonder how she sorts her laundry. Is it by color? Is it by size? Is it by garments?”

[00:05:06] SY: All my clothes are messed up. I was actually at the Levi’s this weekend and I said to the lady, I said, “Hey, how do you wash jeans?” And she’s like, “You don’t.” And I said, “Oh, okay, okay, that’s cool.” But yeah. She informed me that you’re not supposed to and if you do, you should do it inside out and you should hang dry them. And I said, “Well, I have never done that right.” So that’s why all my jeans are faded and just too stretched out. So there we go. Okay. So yes. So sorting laundry though, you know, if you do, if you’re that type of person, it makes sense because you want to have all the whites together and then you can add bleach and it’s fine, you’re not going to ruin your colors and the whites will be brighter and then the color will, you know, whatever. So that makes sense, right? There’s like clear advantages to doing that. Why do I want to sort things in the computer science context? What do I get?

[00:06:10] VJ: Oh, this is a great question. I think a good example of this is remember our binary search tree and the fact that we could run the binary search algorithm on it or I think we had even another episode where you talked about another thing that we can do with a binary search tree. I’ve even forgotten what it was, but there are lots of things you can do with a binary search tree because it is sorted and the reason why sorting lends itself really well and why computers leverage sorting is because they make it very easy to read data and then search and retrieve it because it’s organized. And when you know that something is going to be organized, well, now you can be smart about how you’re going to add something to it or remove something or find something.

[00:06:56] SY: Okay. So it’s like a pre-step, like before you do the operation, if you sort it first then the operation would ideally be faster, easier, take less processing power, that sort of thing.

[00:07:11] VJ: Yeah. Yeah. And like you can write clever algorithms or you can use data structures that will make it easy for you. For example, like an Array, an Array has to have data that’s sorted in some way because it’s an ordered index thing.

[00:07:27] SY: That’s true. Yeah. Yeah. Yeah.

[00:07:29] VJ: If you have data that you can sort, then now you can do a whole lot of other things like use indices or find something quickly or add or remove something and put it in the right place quickly because of the fact that it’s sorted.

[00:07:42] SY: Okay. That makes complete sense because I’m thinking, you know, if I am in an app or something and I have a list of stuff and I want to quickly find something, usually the first thing I’ll do is I’ll sort it in alphabetical order, you know, if there’s a sort option in how I’m viewing my data or my files, right? That’s just in Finder on my computer. The first thing I’ll do is, you know, list it, sort it in alphabetical order by filename.

[00:08:06] VJ: Oh, that’s interesting.

[00:08:08] SY: Right? Like that makes it so much faster for me to then find, right, if find is my operation, the thing I’m looking for or other times if I know, “Okay, this was the last recording that I did,” then I’ll sort by date created.

[00:08:22] VJ: Yeah. Yeah. I was going to say I was surprised that you sort by alphabet because I cannot keep my files under control so I just sort it by last added. So I basically like if you visualize my life it’s like files on files and I don’t go looking for the last files down there like once in a year. All I’m doing is just grabbing a file from the top. I’m just like, “I can’t even go digging through this.”

[00:08:44] SY: This is so good. This is no bueno. Okay. We need to work on this. This is not sustainable.

[00:08:49] VJ: No, it’s not. This is why throw back to another episode. My email inbox is atrocious. This is my problem.

[00:08:56] SY: But not as bad as mine. Okay. So I totally get how, you know, if I put on my computer hat, that means that I’m a computer and I think about my own data, which is usually like files since that was usually what I’m dealing with and I’m always sorting and I’m sorting by different properties and you know in descending and ascending and all that and that makes it easier for me to find stuff. So it makes total sense that you’d want to do that with your data. But my next question is when it comes to programs, what kinds of things are machines sorting? So I’m sorting laundry, I’m sorting through file names, we’re talking about sorting algorithms, what kinds of stuff are we talking about?

[00:09:43] VJ: You can really sort anything that has a property that it can be sorted by I think is the answer to that because if you have a string you could sort it by… like you could alphabetize it, if you have documents you can sort by size, if you have numbers you can sort by numerical order, if you have time stamps or dates, you can sort by most recent or the oldest. You know, you can basically name any sort property as it were and if it’s a valid property, then you can sort by it. The important thing is you need to be sure that your whole dataset has that property on it. So if you have a bunch of timestamps and then you have an image of a cat, well, now that’s not helpful because unless the image has like a time stamp on it, like I don’t know how to sort by that if I’m a computer. So there’s an important rule here, which is that your dataset that you’re sorting has to have items that are homogeneous. They have to be of the same type. So you can’t sort a dataset that’s like numbers and you know like, letters because there’s no shared property to sort by. You’d have to give them some sort of shared property or you’d have to change your dataset or divide it up and sort the datasets independently.

[00:10:57] SY: Interesting. Okay. That makes total sense. I like that everything so far is logical and you know same with… oh, actually this drives me nuts when I’m sorting by file size. Sometimes my folders, in Finder I mean, sometimes the folders, yeah, I haven’t calculated the size of the folder yet, I haven’t pressed the button or it hasn’t finished calculating. And so it’ll just like be at the bottom. And I’m like, “Well, that’s not helpful. I thought, you know, I can’t sort by because it’s nothing, there’s no value there.” You know, so it just all kind of huddles at the bottom. I think it’s just hiding from me with, you know, because it’s full of shame and how it’s not being very helpful it.

[00:11:33] VJ: Wait, is it an empty folder?

[00:11:34] SY: No, it’s not an empty folder. It’s just that, you know, when you because all the files themselves have, you know, their size already there, but for the folders because it’s, you know, it holds a bunch of other stuff, usually it takes a while to actually calculate it or you have to actually go and say like, “Calculate, you know, the file size for this thing.” And so if that isn’t done, then it’ll just be nothing. It won’t have a value there and then it just goes to the bottom being useless.

[00:11:59] VJ: Well, that’s a lie. Those folders aren’t empty.

[00:12:03] SY: That’s right.

[00:12:04] VJ: It’s bad design.

[00:12:05] SY: Exactly. Yeah, you got it. You got to have those properties all the time. Yeah. Okay. So we talked about all the different things that we’re sorting. Now where does the algorithm part come in? Because so far we’ve really just been focusing on the sorting of sorting algorithm, but where, but what about the algorithm part of sorting algorithm?

[00:12:27] VJ: Well, the algorithms are really just different strategies for how to sort and you’ll remember from back when we did like DFS and BFS we talked about the fact that you can have different variations on a search algorithm and there are different strategies. So it’s the same thing with sorting algorithms. You might want to have a different strategy based on the type of data or the size of the data. Those strategies, those are really how we kind of classify sorting algorithms. So a good way of thinking about it is that you could have a sorting algorithm and you might have a sorting algorithm that leans heavily on preserving one strategy and maybe doesn’t care about another strategy and those are the ways that we classify sorting algorithms are by these. There’s actually six different types of sorting algorithms that are commonly used.

[00:13:18] SY: That’s way more than I thought. I thought there’d be like you can pick from two, but we actually have six of them.

[00:13:23] VJ: Yeah. Yeah.

[00:13:24] SY: Wow!

[00:13:25] VJ: And actually some of them, some of the six are actually twofold. There are more ways of describing the algorithm. So like you could have actually, you know, why don’t I just tell you what you can…?

[00:13:35] SY: Okay. Let’s do it.

[00:13:36] VJ: Let’s get right into it. I just keep dancing around, but I think we’re ready and let’s dive in. So the six different ways of classifying a sorting algorithm, they are: You can look at time complexity.

[00:13:49] SY: Oh, we’ve talked about that before.

[00:13:50] VJ: We’ve talked about that a lot.

[00:13:51] SY: Right. That’s like the Big O Notation stuff. Cool, cool. Familiar.

[00:13:55] VJ: That’s our familiar friend. We can also talk about space complexity or the memory usage of a sorting algorithm. That’s another way of classifying.

[00:14:04] SY: We’ve also talked about that before as well.

[00:14:06] VJ: We’re going to get into some stuff we haven’t talked about now. You can also classify by a sorting algorithm stability and whether or not it’s stable.

[00:14:13] SY: Oh!

[00:14:14] VJ: You can also classify a sorting algorithm based on whether it is internal or external, which has to do with memory a little bit and you can classify sorting algorithm. This is number five, by the way, in case you’ve lost count because we just keep listing them out. Number five is that you can classify it by whether it is recursive or non-recursive.

[00:14:34] SY: Okay. And recursive we’ve talked about before. That’s a familiar idea.

[00:14:38] VJ: And finally we can classify an algorithm based on whether it is comparison or a non-comparison sort.

[00:14:47] SY: Okay. So for things like stability, internal versus external and comparison versus non-comparison, are we talking about the algorithm itself or the output of the algorithm? What is the stability in the internal, external? What is it referring to on a high level?

[00:15:12] VJ: Well, in some cases it is talking about the output. For example, when it comes to stable sorting algorithms, you can basically classify if it’s stable or not looking at the output and whether it preserves the order of elements in a certain way, but the reason that it is stable is because of the algorithm itself, right?

[00:15:32] SY: Right. Right. So it’s kind of both.

[00:15:34] VJ: Yeah. Yeah. In some situations like a comparison sort, you don’t actually look at the output so much, you’re looking at how the algorithm compares between two items. So you’re not really looking at, “Oh, does it output it in this way?” You’re looking at how the algorithm works internally, like how it functions, if it’s doing one thing versus another.

[00:15:57] SY: Interesting. Okay. Let’s dig into… we’ll dig into the first one, time complexity. What does that look like?

[00:16:04] VJ: Well, I think the easiest way to classify an algorithm honestly is by time complexity because one, we’ve talked about it a lot and so we can kind of imagine what it means to have an algorithm that is fast based on a input size or slow based on another input size and it’s something we’re familiar with. So time complexity is basically the Big O Notation of how much time a sorting algorithm takes with respect to its input size. So basically, if I give this algorithm 5 items versus 50 million, how is it going to behave? And the way that we talk about time complexity is -- and actually something I want to say is that we’re going to talk about all six of these things, but we’re going to talk about it in relation to different sorting algorithms. So there’s no time complexity algorithm, but we’ll say like for example if we’re looking at an algorithm like merge sort, we might say the time complexity of merge sort is whatever.

[00:17:04] SY: Ah, okay.

[00:17:04] VJ: So it kind of goes the same way. Like if we’re talking about stability or comparison, we’ll say, “This algorithm, Algorithm X is a stable algorithm versus Algorithm Y, which is not stable or Algorithm X is a comparison algorithm but Algorithm Y is a non-comparison algorithm. So we’re kind of using these as classifiers. We’re saying, “You know, these are the ways that we can identify the way that this algorithm behaves, what we can expect from it, what it might output to us, or how it might be doing the work of sorting.” And depending on certain situations like your dataset or what you want to do, you might choose to use one algorithm or another.

[00:17:41] SY: Okay. I’m okay with that and I like that because, well, frankly we’ve been kind of doing that the whole time, right? Like we’ve been talking about different algorithms and data structures and we’ve been… I don’t think we’ve talked about it in terms of classifying them, but we’ve described them and talked about if they’re good or not so great or when we would use them and what are the pros, what are the cons. So we have been, you know, for, I want to say since maybe the first season even, gone through this process of reflecting on the tools that we have and talking about how to think about them and how to evaluate them. So this feels like a nice, maybe a more formal way of doing it, like we have six labels and six groups in which we can talk about different algorithms. So that’s pretty cool.

[00:18:26] VJ: Yeah. Yeah. It’s a really nice way of being able to compare them because as we go on through probably this season and the next season and who knows how many seasons it’ll take us to get through all the algorithms, it will start to become evident that these labels, these classifications are really going to help us when you want to compare them. And also if you want to kind of keep your head straight and be like, “Which was that really bad sorting algorithm again? Which one was the one that just, you know, gets worse and worse with each thing that you add to it? Let’s not use that.”

[00:18:57] SY: Yeah.

[00:18:58] VJ: That’ll be helpful to us down the road.

[00:19:00] SY: Yeah. So time complexity, I think we were good on that one. Space complexity, what’s that about?

[00:19:07] VJ: Space complexity, we also are familiar with, it is the Big O Notation when it comes to how much space an algorithm requires to do the work of sorting. So one of the things to note is that depending on different algorithms, you might need extra space or extra memory, I guess, to actually do the work of sorting the input so you could have a sorting algorithm that does one thing, which is it copies over all of the inputted data that you give it and then any sorting that it does, any changes that it does it does on that copied version. So it kind of creates a whole other set of data. It uses a whole bunch of extra memory and then does the sorting on that and that’s something that we call an out-of-place algorithm.

[00:19:58] SY: Yeah.

[00:19:59] VJ: So you might be able to imagine then what an in-place algorithm does. An in-place algorithm operates directly on the input dataset and changes it. It’s not going to copy it over. It’s going to work on the thing you gave it just now. So it’ll modify and sort the data in place basically almost like in line. And the thing about that is there’s no original input left if you give a sorting algorithm that’s in-place data because you know it’s going to sort it right there.

[00:20:29] SY: Well, that sounds dangerous.

[00:20:30] VJ: Yeah, maybe. It depends.

[00:20:33] SY: Maybe you want to get rid of that evidence, I don’t know.

[00:20:37] VJ: The other thing is I do want to note one thing which is that even in-place algorithms they sometimes will do something that will require a little bit of extra space. So I don’t want to completely let them get away with everything because sometimes in-place algorithms do things like create extra variables and they create temporary variables or they need to allot a little bit of…

[00:20:56] SY: To help them out.

[00:20:57] VJ: Yeah. They need to allot a little bit of space because remember they can’t copy everything over so they might need to like all the reference to one thing as they sort and then put that thing back in another place, but that small amount of space is pretty minimal. In fact, I think we could usually think about it as like a constant amount of extra space because even as the sorting, as the data grows, it’s still going to be pretty much O of one amount of space. So it is still less space, but it’s not like it’s, you know, completely free of not needing to allocate a little bit space.

[00:21:32] SY: Okay. So with this category, it sounds like in-place is always awesome, right? Because it uses the least amount of space possible. Is there a situation where we would want to do an out-of-place space complexity thingy?

[00:21:49] VJ: Well, it can be safer to do sometimes, right? Because you’re not working on the dataset itself, but there is a drawback which is that the memory usage for an out-of-place algorithm is going to grow the more space you give it because it’s got to copy it all over. On the other hand, an in-place algorithm is -- it doesn’t need to do that, but there’s a danger because the data gets completely transformed in the process of sorting it and the original data is now gone and that can be helpful if you don’t have enough memory to spare. It could be bad if you need that original data for some reason.

[00:22:23] SY: Question. What if you do an out-of-place situation and then once you know that you’ve completed it and everything’s all amazing and fantastic, you go back and you delete the original data? Does that still count as an out of place or is that now a in-place or is it you know something I just created magical…?

[00:22:46] VJ: Well, if your algorithm outputs a new dataset that it copied over and it didn’t operate on the original dataset, then at some point it did have to allocate the memory so it’s still out of place.

[00:22:56] SY: Okay. So it’s still out of place. Man, I was hoping I invented something just now.

[00:23:00] VJ: Really, I mean, I think what you’re saying is pretty smart, which is that if you’re using an out-of-place algorithm, you should probably be do the logistics of making sure you either clean up after yourself or maybe if you’re working some higher-level language, you know, you have the garbage collector that can do that for you and actually usually a lot of times that’s what’s going to happen. But if you’re thinking about implementing the org... I was going to say the organism.

[00:23:24] SY: I went organisms all the time.

[00:23:25] VJ: You’re now a science podcast. Scales, feathers, fur. If you’re thinking about implementing the algorithm itself, then you’re not going to have anybody to like lean on for garbage collection. You’ll clean up after yourself, but your algorithm still had to allocate all the memory so it makes it out of place.

[00:23:47] SY: So we just went through the first two out of the six classifications for algorithms and we’ll have to finish the rest next time. And that’s the end of today’s show. If you liked what you heard, leave us a review and make sure to check out Vaidehi’s blog post. The link to that is in your show notes. Also want to give a huge thank you to Linode for sponsoring the show. Try their powerful servers built for all your infrastructure needs. Getting started with a shiny new server takes less than one minute so you can get up and running in no time. Get a $20 credit by using the promo code CodeNewbie2019. Just go to linode.com/codenewbie for more info. Also want to give a huge shout-out to DigitalOcean. They are the easiest way to deploy, manage, and scale your application. Everything about it was built with simplicity at the forefront: setting, deploying, even billing. Try DigitalOcean for free by going to DO.co/codenewbie and get $100 in infrastructure credit. Link is in your show notes. Vaidehi, you want to say goodbye?

[00:24:48] VJ: Bye everyone.

[00:24:50] SY: Thanks for listening. See you next week.

Season 4 Episode 6 December 05, 2018

It's sorting time!

Description

Show Notes

Transcript