CodeNewbie

Description

In this episode, we get into parse trees, an illustrated, pictorial version of the grammatical structure of a sentence, which is important to understanding how computers understand coding syntax.

Show Notes

This episode of the Basecs podcast is based on Vaidehi's blog post, Grammatically Rooting Oneself With Parse Trees from her basecs blog series.

Transcript

[00:00:02] SY: Welcome to the Base.cs Podcast where we explore the basics of computer science concepts. I’m your host Saron, Founder of CodeNewbie.

[00:00:09] VJ: And I’m Vaidehi Joshi, Author and Developer.

[00:00:11] SY: And she is the brilliant mind behind the Base.cs Blog Series. Today, we’re talking about…

[00:00:16] VJ: Parse Trees.

[00:00:17] SY: This season of the Base.cs Podcast is brought to you by DigitalOcean and Heroku. DigitalOcean’s cloud infrastructure is optimized to make it super intuitive and easy to build faster and scale easier. The pricing is predictable. They have flexible configurations and excellent customer support. Get started on DigitalOcean for free with the free $100 credit at DO.co/basecs. That’s DO.co/basecs.

[00:00:45] Did you know you can build, run, and operate applications entirely from the cloud? With Heroku’s intuitive platform, you can streamline development using the most popular open source languages, so you don’t have to worry about infrastructure logistics. Also, you’re not walk-in to the service. So why not start building your apps today with Heroku?

[00:01:10] SY: So I feel like we’ve talked a lot about trees on this podcast, but I don’t think we’ve come across this particular type of tree. What is a parse tree?

[00:01:23] VJ: So a parse tree is sort of specific to a subset of computer science that we haven’t even gotten into yet, which is why we really haven’t talked about them. And they have to do with parsers and compilers, which is like a whole new CS world for us.

[00:01:42] SY: Yeah.

[00:01:43] VJ: But it’s important because we sort of built this knowledge base of like different types of trees throughout this whole series and all these episodes of this podcast. And now we’re sort of getting to a new phase in that where we are going to start talking about how our computer actually makes sense of the code that we write.

[00:02:02] SY: Okay.

[00:02:03] VJ: So to that end, a parse tree, at its core, it’s an illustrated pictorial version of the grammatical structure of a sentence and that can just be like a sentence in English, but also it can be a sentence in code too.

[00:02:21] SY: An illustrated pictorial version. Interesting. Okay. So we’re basically like drawing something.

[00:02:28] VJ: Yeah. I guess another way you could think about it is like we’re diagramming something.

[00:02:32] SY: Okay. There we go. I like that one better. Okay. We’re diagramming something, so we’re diagramming the grammatical structure of a sentence, and the sentence could be like, “I drove to school.” I don’t drive to school, but I drove to school.

[00:02:46] VJ: You could drive to school.

[00:02:48] SY: I still drive to school and that would be a sentence so I guess like it could be like a line of code, right? Like if I define a variable like var equals school, then that could be structured into a diagram?

[00:03:00] VJ: Yeah. And it could be like an expression in your code too. So like it could be a line of code that’s like some logic and with like a return statement or it could just be an expression that you’re evaluating.

[00:03:14] SY: Okay. So I can have a loop and that would count as a sentence.

[00:03:19] VJ: Sort of. Yeah. It’s like when you write that loop out, it’s like if you have a keyword like “for” or “while”, like those are the parts of the sentence. And so it’s basically like everything you write in code could be diagrammed, and actually it is diagrammed into a sentence. And that’s basically the parse tree. That sentence gets structured into a tree structure called a parse tree.

[00:03:44] SY: Okay. Cool. So let’s do a fun one as an example. It’s going to be all about you.

[00:03:49] VJ: Okay. I love that.

[00:03:50] SY: Let’s do, “Vaidehi ate the pie.” Vaidehi, what pie did you eat, by the way? Do you know what pie you ate? Well, hopefully, you know what pie you ate because you ate it.

[00:04:00] VJ: I mean, you could just eat it and not even realize. What if you’re so excited and you don’t even look at the pie, you just ate it?

[00:04:06] SY: Okay. Okay. True story. I had a pie. And it was a pecan pumpkin pie, which by the way, it sounds good, but it’s basically a pumpkin pie with pecan crust. You know what I mean? Like it wasn’t like the gooey. Anyway, so it wasn’t very good.

[00:04:23] VJ: No. It wasn’t like pecans on the top. It was just wedged up pecans.

[00:04:26] SY: Right. Yeah. Yeah. Exactly.

[00:04:28] VJ: Is it pecan? I always said pecan.

[00:04:30] SY: That’s absolutely incorrect. It’s definitely pecan.

[00:04:32] VJ: No, I’m going to say pecan though. Pecan sounds funny.

[00:04:37] SY: You know what? I respect you for that. I respect you for that. But anyway, like in my mind, I was thinking, “I’m going to eat a pecan,” like I forgot about the pumpkin. You know what I mean? Because I saw pecan and that’s what attracted me. So I thought I was going to bite into like a gooey, ooey, overly sweet, and then I like ate it, and I guess I wasn’t looking because they obviously look very different. Anyways, I ate it and I got like the texture of pumpkin and I was like, “What did I just eat?” It really threw me out. Anyways, all that was to illustrate that you can eat a pie and not know what you’re eating. So anyways, Vaidehi, what kind of pie did you eat?

[00:05:11] VJ: I had a pie. That’s where we’re going. Let’s say that I ate, ooh, let’s say that I ate a salted honey pie.

[00:05:24] SY: I have no idea what that is. Is that like a filling of honey with salt?

[00:05:27] VJ: Have you ever been to like milk bar? They have something called like the crack pie.

[00:05:36] SY: Crack pie is so good.

[00:05:36] VJ: It’s basically like that, but it’s like the honey. It’s like more stronger flavors of honey in the actual pie and then saltiness and it’s delicious.

[00:05:46] SY: That sounds amazing.

[00:05:46] VJ: I don’t actually know what else is in the bake, like there has to be something else. It’s not just straight honey.

[00:05:50] SY: It can’t be straight honey.

[00:05:51] VJ: No. There’s other things.

[00:05:52] SY: That sounds great.

[00:05:52] VJ: I mean I ate the pie. I did not deconstruct all the ingredients for the pie.

[00:05:57] SY: You’re too busy eating it. Okay. And all of that is to explain a parse tree. So “Vaidehi ate the pie” is our sentence and we’re going to create a diagram out of this sentence. So what do we do? What’s the first thing that we do?

[00:06:12] VJ: Well, we need to break it up into smaller pieces because when you diagram a sentence, I don’t know if you remember this from like, I don’t know, when was the last time I did this, like elementary school?

[00:06:20] SY: Definitely not. Yeah.

[00:06:22] VJ: But in elementary school, like one of the things I learned was that when you diagram a sentence, you break it up into its individual pieces. And so in this case, we have “Vaidehi” and then “ate the pie”.

[00:06:35] SY: Okay.

[00:06:35] VJ: So those are two distinct pieces to start with. So basically what we’re going to do is we’re going to say “Vaidehi” is the noun and then “ate the pie”. Before we break it up further, we’ll just pause and say that “ate the pie” is actually its own expression. It’s something called a verb phrase, which we probably know instinctively even if we don’t know the term for it, if you’re familiar with English. And if you have a noun and a verb phrase, well, you can’t break down a noun any further, but you can break down a verb phrase further. And so we can break down “ate the pie” into a verb and then a noun phrase, which is “the pie,” and then you can break the noun phrase “the pie” into a noun pie and also a determiner, which is “the.” So yeah, we can break up “Vaidehi ate the pie” into all of those little pieces. And as we’re breaking it down, you can sort of imagine that we start with a node at the top, which is the sentence. And as we diagram it, we break down the sentence into other nodes. So basically like children, and if you visually think about it, you can create a tree out of it.

[00:07:41] SY: Right. So I have a sentence as my root node and then I have one child called “noun” and then I have another child called “verb phrase” because that’s the first way we break it down. And the noun has nowhere to go. It’s just Vaidehi. And then my verb phrase, however, can be broken down. So I have another child called “verb” and a second child called “noun phrase”, because my child’s name is verb.

[00:08:02] VJ: Yeah. That’s why it made me laugh.

[00:08:04] SY: Okay.

[00:08:04] VJ: I got a node, name, verb.

[00:08:07] SY: I got a node, name, verb.

[00:08:06] VJ: That’s normal.

[00:08:08] SY: That sounds like something like sitcom, node, name, verb. Okay. So now I have my noun phrase, and then my noun phrase itself has two children, one called determiner, one called noun, and they respectively point to “the” and “pie”.

[00:08:22] VJ: Yeah, you got it.

[00:08:23] SY: I got myself a little tree.

[00:08:24] VJ: Yeah. And if you think about it, it’s interesting because when we had that sentence and I was like, “Okay, let’s break it down,” and you instinctively knew like, “Well, ate and the pie,” like, “Let’s break it down into its pieces,” how did you know how to do that?

[00:08:40] SY: Well, I’m a genius, so it was pretty easy for me.

[00:08:43] VJ: Well, if you weren’t spectacular and smart, how would you do that?

[00:08:48] SY: Well, I guess I just know how the English language works, right? I learned these things in school and whatnot.

[00:08:55] VJ: Yeah. And you use your knowledge of English grammar and like the syntax of how like a sentence is structured, like you knew what the nouns were and you knew like how the nouns and the articles and the verbs sort of work together to construct a sentence, and basically the grammar and the syntax of a language, those are the rules that define that language. And so when we’re working with the English language, knowing the grammar and the syntax of English, that basically is like your guidebook of understanding how that language works. And it’s the same thing when you replace the English language with any kind of programming language. You need to know the grammar and the syntax in order to make sense of any sentence or expression or line of code, right?

[00:09:46] SY: Okay. So we need some type of grammar. So we need like code grammar.

[00:09:50] VJ: Exactly. You need like grammar in the sense of like rules in your code of what is legitimate, what’s legal and what’s not and what’s correct and what’s not. And actually we started this whole conversation with parse trees, and this is sort of where parse trees come into play. When you break down the expressions in code, and when you diagram it out, you create a parse tree and parsing is the way that our computers, our machines basically check if a sentence is grammatically valid in code.

[00:10:27] SY: So we talked about this applying to the English language. We talked about applying to coding, although we haven’t really worked through a coding one yet. Are there other things it can apply to or is that mostly it?

[00:10:38] VJ: Oh, no. So you don’t have to apply it just to the English language. You could do it for anything that would be a valid expression in code. So for example, you could do it for like a mathematical expression, like one plus one or two times three. All of those are just simple expressions, but you have to make sense of them. You have to know how to parse them and read them to know what the line of code is saying. In the case of a mathematical expression, you have to know what operation you’re trying to do and what you’re trying to operate on, right?

[00:11:11] SY: Yeah, absolutely. So can we walk through an example of an expression and figure out what it would look like as a tree?

[00:11:17] VJ: Yeah, let’s do that. Let’s do a simple expression maybe two times eight.

[00:11:23] SY: So we have two times eight. That’s our expression. That’s our root node. And I want to break that down, well, I feel like there’s only three parts, right? Like it’s either a two, a times, or an eight. So do I just have like three child nodes that just say two and then times and then eight?

[00:11:42] VJ: Sort of, yeah. Well, there’s one little caveat here, which is that you’re right that you can’t really break down times any further because it’s like an operation, like there’s nothing else to go with that. But when you have two times eight, what you want to do is break it down into its smallest possible expression. And so two and eight are actually expressions in and of themselves. And this is important because we’re working with a simple example here, but if we had a much more complicated example, it’s worth mentioning that a single expression can contain multiple expressions within it and it can also contain things called terms and factors and internal expressions. So like in our example, two times eight is the entire expression. And the number two is a factor. It’s like an expression in and of itself, but it actually can’t be broken down further, but it is technically still an expression.

[00:12:38] SY: And then my eight I’m assuming is the same thing?

[00:12:40] VJ: Exactly. Yeah. That’s just like a little thing to know. It’s not super important for the context of this, but it’s worth noting that if we had something bigger than two times eight, you could have an expression that you need to break down even further into either like an operation or another internal expression.

[00:12:57] SY: Okay. So in terms of my parse tree, I have my expression, my two times eight expression as my root node and then I’m going to pull out another expression, which just points to the number two, and then I have my second child, which is just the times sign, which is its own thing, and then I have my second expression, which points to the number eight.

[00:13:20] VJ: Exactly. Yeah.

[00:13:22] SY: So now that I have this parse tree, what do I do with it?

[00:13:27] VJ: Well, we sort of need to interpret it, right? Because we started out with this expression. We’re trying to understand what it means. We broke it down into its individual smallest elements, and that’s what we built this parse tree out of. But that parse tree doesn’t mean anything to us if we don’t traverse through it and figure out like, “Okay, here are the things we’re trying to do. This is what this expression actually means and what we need to do with it.” So we need to basically interpret the larger expression, make sense of it, and we already know about trees. We know there’s three different ways that we can traverse through a tree. Do you remember what they are?

[00:14:04] SY: Oh, boy. Okay, so tree traversal. Okay. I’m going to see if I can remember one of them. Well, I remember one of them is just going like in the correct order I guess like going from left to right, which I think is in order.

[00:14:18] VJ: In order is looking at the left child, then the root and then the right child.

[00:14:22] SY: Yes. Okay. I think that’s the only one I remember. What are the other two?

[00:14:26] VJ: It’s okay that you don’t actually remember the other two. Spoiler alert. They are preorder and postorder. In preorder, you read the root node first and then the left and the right child. So you would read the root times and then two, which is the left child, and then eight which are the right child, which is a little nonsensical because what does that mean? And then postorder is you read the left child and the right child and then the root last. So that would be two, eight times, which also is nonsensical. So the thing you remembered, the one way of traversing the tree that you remembered is actually the really the only important thing to remember here, which is in order, which gives us the left child, the root and the right, which would be two times eight.

[00:15:10] SY: So basically I was efficient in my memorization.

[00:15:15] VJ: And you figured out like the one way that we need to interpret this parse tree so that it makes some sense. But you stumble upon something interesting here, which is that you remembered like the one way to do it, but notably there are two other ways which basically tells us that you could sort of interpret this parse tree in different ways, which just like put a pin in that, that complicates things slightly because it’s not a straightforward answer. But in this exercise, like we kind of get a sense of how parse trees will show us what our expressions look like. And another term for parse trees is Concrete Syntax Tree or CST and the reason that they have this name is because they show us what our trees look like. They show us the concrete syntax of our expressions. So all that’s telling you is like, “Okay, here are the parts of your expression.” It doesn’t really tell you what is the right way of expressing that expression, which means that you could have a single parse tree that allows you to come up with different ways to evaluate the expression, right? So we could have done two eight times or times two eight or two times eight. They’re all using all the elements of the expression that we started off with. They’re showing us like, “Concretely, here are the pieces of that expression.” But the problem is a single parse tree can express that expression in different ways.

[00:16:47] SY: So we took our expression, our sentence, or our line of code. We diagrammed it. We made it into this beautiful parse tree and now we’re going to use, well, there’s different ways that we can read it, but the one we hear about is in order, which means we’re going to go from left to right. What are we going through all that for? Because it feels like we took a thing, broke it down just to read it together again. You know?

[00:17:12] VJ: Yeah. Well, the thing is we broke it down so that we could make sense of it and like that might’ve seemed like extra work for us because we knew what two times eight meant to begin with. But if you think about your computer specifically, the part of your program that’s really dealing with this is something called a parser, which is part of the compiler that handles the parsing, and it’s the thing that builds the parse tree. So your parser, whether you give it some simple expression or a complex expression, it doesn’t know what it means without going through and evaluating the expression and breaking it down into its separate parts. It’s easy for us to look at that expression and be like, “Oh, I know exactly what that means,” or, “Oh, yeah, Vaidehi ate the pie, I know exactly like what’s going on there.” But our parser doesn’t know and our parser doesn’t know how to evaluate a line of code without knowing what the different parts of that with one single line are and what needs to happen and also what order it needs to happen in. So something that seems very simple to us is actually not a trivial task for our parser. And actually this is a great segue, the fact that there are three different ways of evaluating and reading that expression in our parse tree, that’s something called “Ambiguous Grammar”. And if you imagine that I just gave you like a bunch of symbols and instead of two times eight and I gave you something like, I don’t know, three weird symbols or three words in a language you’ve never heard before or read before. And I was just like, “Saron, tell me what this means.” You probably wouldn’t know off the top of your head, just looking at it, what it meant. And then if you made this parse tree out of it, you might not know what’s the right way of order in those things.

[00:18:59] SY: Putting it together?

[00:19:00] VJ: Yeah.

[00:19:00] SY: Yeah.

[00:19:01] VJ: And so ambiguous grammar or multiple parse trees for a single expression is really bad because a parser would get pretty confused because remember like it can’t do what we do intuitively, it’s just following good grammar and syntax of a language. So if you don’t know that grammar and syntax or if you know the grammar and syntax, but now there’s some ambiguity, what do you do?

[00:19:25] SY: Okay. So connect this back to being a coder for me. So I’m a coder, I’m doing my lines of code. Would I ever be interacting with a parser or am I making parse trees or where’s the connection?

[00:19:40] VJ: Well, if you are working in a programming language like actually on the language, then you may very well be dealing with the parse tree, but also if you are dealing with any kind of structure that gets broken down into a parse tree, which probably most programmers don’t do on a daily basis, unless they’re doing pretty low level things, then you might need to deal with it. But the important thing to know is that like whenever you’re dealing with the syntax or the grammar of a language, whether you’re adding a new symbol to it or trying to create some new operator, you need to tell the language what that means because if you give it something completely new and it doesn’t know like, “Oh, it’s a method,” or it doesn’t know, “Oh, this is an operation,” it will not know how to handle it.

[00:20:30] SY: Right. And usually it’ll tell you that. It’ll say error. This is nonsensical. What are you doing?

[00:20:36] VJ: Yeah. And what’s cool about that is that’s like a syntax error, right? And where do you think that comes from? That comes from deep within the parse tree when the parser reads your code and it’s like, “Oh, I don’t know what that is,” it basically has built a parse tree and it has traversed it in some order, probably in order, and when it finds a syntax error or if something breaks, the parser has to figure out where in the parse tree that issue is coming from, and then it needs to surface that to you. So it’s like building out these structures under the hood with every line that you write, even though you don’t really know about it or maybe you’re not thinking about it, except now you do know about it because you listen to this episode.

[00:21:19] SY: That’s right.

[00:21:20] VJ: But that’s basically all it’s doing and it’s up to us as programmers, if we’re working with programming languages, to make sure that we have grammar that either enforce some kind of rules, like we need to enforce the way the grammar is interpreted, different languages will do this differently or we need to like create some kind of precedents and say, “This symbol or this operator means more,” like has a higher weight. So if you see this times or if you see this open parentheses, I want you to think of it like this and then the parser will follow that rule. So it’s up to us to remember those things. And like next time we come into contact with like any kind of parsing or syntax error, just remember that parse tree and the little parser who’s chugging away.

[00:22:01] SY: Oh, I love that, little parser chugging away. And that’s the end of today’s show. If you liked what you heard, please leave us a review and make sure to check out Vaidehi’s blog post. Link to that is in your show notes. I also want to give a huge shout-out to DigitalOcean and Heroku. DigitalOcean is a simple developer-friendly cloud platform, which makes managing and scaling apps easy with an intuitive API, multiple storage options, integrated firewalls, load balancers, and more. There’s also a robust community that provides over 2,000 tutorials to help you stay up to date with the latest open source software, languages, and frameworks. Get started on DigitalOcean for free with a free $100 credit at DO.co/basecs. That’s DO.co/basecs. There’s a reason over nine million apps have been created and ran on Heroku’s cloud service. They not only manage over two million data stores, but they make over 175 add-on services available to you. So whether you’re creating a personal app, a free app, or an enterprise app, start building today with Heroku. This episode was edited and mixed by Levi Sharpe.

[00:23:12] VJ: Bye everyone.

[00:23:13] SY: Thanks for listening. See you next week.

Season 9 Episode 3 February 19, 2020

Parsing out parse trees

Description

Show Notes

Transcript