CodeNewbie

Transcript

[00:00:05] SY: Welcome to the CodeNewbie Podcast where we talk to people on their coding journey in hopes of helping you on yours. I’m your host, Saron. And today, we’re talking about database architectures and some of their use cases with Kyle Bernhardy, CTO of HarperDB.

[00:00:20] KB: And so it’s upfront trying to determine, not just I need to put some data in and get some data out, but like how and why and who are your users going to be.

[00:00:31] SY: In this episode, Kyle talks about what a database is, different types of databases, and when you might want to use one type of database over another after this.

[MUSIC BREAK]

[00:00:51] SY: Thanks so much for being here.

[00:00:53] KB: Thank you for having me.

[00:00:55] SY: So Kyle, you are now the CTO of a pretty well-known data management company, but tell us where your coding journey began.

[00:01:02] KB: Oh, geez. So way, way back, I had a Commodore 64 in the ’80s. And so I did some basic programming on that. But like in college, I did not study computer science. I started out in architectural engineering, didn’t like that, switched over to exercise and sports science.

[00:01:21] SY: Oh, wow! That’s a big switch.

[00:01:22] KB: Yeah, a huge switch. My parents were completely unhappy about that.

[00:01:27] SY: Were you planning on being like a gym teacher or what was the plan of that one?

[00:01:32] KB: Yeah. That’s a good question. I was more thinking about getting into like PT, maybe doing like sports medicine. But after college, I realized not what I wanted to do. And thankfully, it was the tech boom of the late ’90s at that time. And so even though I didn’t have like a background in computer science or anything like that, there is like a real lack of talent in the tech industry. So they were really just kind of hiring anyone that was willing to learn. And so I started out in tech support at a software company in Boston, and then just took some classes to educate myself. At BU, I took like a Java certificate program, but primarily self-taught. And the company I worked for really took care of their employees. And so I did a lot of movements in different departments inside the company. So I started out in tech support. I was there for two years and then moved over to the development group. I was there for about two years. And then when I moved out to Colorado, I moved over into their professional services group.

[00:02:38] SY: So you went from architectural engineering to fitness. Tell me about architectural engineering. What exactly does that mean? What did you learn? What was that like?

[00:02:47] KB: So architectural engineering, so a lot of engineering programs, there’s always that strong basis in math, physics, those like core science, logic-based classes. But then on top of that, architectural engineering is really geared around like the systems that are in place in a building. So HVAC, electrical, there’s also an aspect of construction management. And just as I started getting into the core classes and I started thinking about what my professional track may look like, I was like, “I don’t want to do any of this. I don’t want to be like designing HVAC systems or electrical systems,” like how they lay out in the building, and I really don’t want to be a foreman on a construction site. It just wasn’t what I wanted for myself. This is college. I didn’t know what I really wanted. That’s also part of the problem, but I just knew it wasn’t that. But the real benefit that I got out of that was like I said, all that math, all that science, and all that logic-based foundation has helped me as I ultimately got into development and engineering on the programming side.

[00:04:01] SY: Tell me a little bit about your journey. Once you got that tech support role, how much of that was self-taught? How much of that was company taught? Walk me through that.

[00:04:10] KB: Primarily self-taught and obviously work experience. I took some classes, but that track was really not linear and I never set out to be a CTO. It wasn’t like what I wanted to do. I always just wanted to make things and like really scratch that creative itch for myself. But the path was, I started out in tech support and it was supporting the reporting tools that the application uses. And so it was Crystal Reports, SQR. It was a query-based reporting engine. And so I was really fortunate to get this job because they were hiring for a bunch of different support positions and very fortunate for me that I landed in this group because of all the support positions, it was the most programming adjacent because it was more reports. And so there was a lot of interactions with databases, a lot of interacting with SQL because these were SQL reporting engines, and just giving me the opportunity to learn about how to interact with the database where I had no experience with that at all and getting that investment from the company of just teaching me on the job. And then moving over to the development team, they had an opening for developing the core reports that came packaged with the application. And so I moved over. And so it was this real natural progression over into this group, from my support position to this development position, now I was like building reports, not just supporting them, but I had this deep background in that. And then getting that cross-pollinating with the other engineers that were developing the product and I had to integrate with the product and talking to them about the architecture of the system and learning more about like HTTP protocols and what is even that and how do these JavaBeans interact with the database and how does that tie into the JSP pages. This was the early 2000s, and I was beginning to learn that. And then when I moved over into that professional services team, I was now building extensions on the application.
And so now I’m actually building new JavaBeans and building new front-end pages. And to be honest, a little bit over my head and knowing that when I type this, this thing happens, but I kind of don’t know why. And so I was at a really dangerous part of my career path because you just don’t know the why, like necessarily I was learning it, but really sort of in the deep end. And then from there, I had a friend that had his own company and started out. He was just doing like web pages for really small companies’ websites and doing some like bespoke shopping carts for whatever it was they were selling. And so it was on the side, helping him with that, and then he just kept giving me so much work. And I just said, “You know, you either need to hire me full time or I just really can’t do this because I’m just getting too crunched.” So I made the switch over to working for him. I took like a 50% pay cut. I had no benefits. I was married at the time and my wife at the time gave me the support to be able to do that. So I learned a lot because he was doing all the front-end work. I was responsible for all the back-end work. He would just give me the requirements and be like, “Okay, we need a shopping cart that does this and this where we can enter discount codes.” I would get stuck on something. I’d ask him for help. And I’d be like, “How do we do this?” He’s like, “I don’t know. That’s why you’re here.” And so I just had to figure it out.

[00:08:12] SY: So what’s a database? Hopefully beginner friendly. Let’s talk about just what a database even is.

[00:08:19] KB: To me, a real basic definition of a database is, let’s just call it an application that can store data and get you your data back. So store the data in a structured manner and in a reliable manner and then allow you to get that data out with some type of language behind it to then retrieve your data, whether it’s just like a simple GET or maybe do some aggregates, maybe. But to put your data in, get your data back out. This is a super basic way of thinking about a database to me.

[00:08:54] SY: And how does it fit into the whole development ecosystem? When I’m building my app, where does the database come in?

[00:09:00] KB: So the database typically lives in what you would call the back end of your application. So you would have a front end, which is your user interface. And then the back end is where your logic is occurring. And typically, what you would use a database for is something that needs to persist. And so you could certainly hold information in memory, but if your application stops and starts back up, anything that you held in, say, variables will get purged. And then when it comes back up, you don’t know. Your application’s essentially starting from zero. And so what the database allows you to do is to keep things persistent. And so your application can stop. You can start back up. And it can go to the back end and say, “I need to know about John,” and it’ll go and grab John’s record. Great! John ordered a pair of shoes. And then say like, “Okay, Jane is coming in, we need to store some information on Jane,” and then we put Jane into the database and maybe we spin up another instance of our application or a new application comes up, it needs to know about Jane. And they can go to the central data store and ask the same question about this person. So it’s persistence of information that applications can request things from.
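
To make the persistence point concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The file name `shop.db`, the `orders` table, and the helper names `save_order` and `load_orders` are assumptions made up for illustration; they are not anything from the episode. The "restart" is only simulated by clearing a variable, and each helper opens its own connection, which loosely stands in for a second application instance asking the same data store.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

SCHEMA = "CREATE TABLE IF NOT EXISTS orders (customer TEXT, item TEXT)"


def save_order(db_path, customer, item):
    """Persist one order to a SQLite file so it survives a restart."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            conn.execute(
                "INSERT INTO orders (customer, item) VALUES (?, ?)",
                (customer, item),
            )


def load_orders(db_path, customer):
    """Read a customer's orders back using a fresh connection."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(SCHEMA)
        rows = conn.execute(
            "SELECT item FROM orders WHERE customer = ? ORDER BY rowid",
            (customer,),
        ).fetchall()
    return [row[0] for row in rows]


# Hypothetical location for the database file.
db_path = os.path.join(tempfile.mkdtemp(), "shop.db")

# State held only in a variable disappears when the process stops.
in_memory_orders = {"John": ["shoes"]}
save_order(db_path, "John", "shoes")

# Simulate the application stopping and starting back up.
in_memory_orders = {}
restored = load_orders(db_path, "John")
print(restored)
```

After the simulated restart the in-memory dictionary is empty, while the order written to the database file can still be read back.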

[00:10:32] SY: So you mentioned back-end development a couple of times. Generally when you think about an app, there’s kind of the back end, there’s a front end. A lot of gray areas in between, but generally those two categories still hold up. Tell us about the difference between the back end and the front end.

[00:10:46] KB: So in my mind, I will say there’s always like in the different domains, like front-end developers and back-end developers, they both think they have the hardest job. I think they’re very hard, and it could be in different ways. And to be honest, I always feel like a front-end developer has the hardest job.

[00:11:04] SY: Tell me why.

[00:11:05] KB: Because you’re dealing with the users really, and you’re creating say a form with some buttons and some validation. So let’s talk about like you go to Amazon and that screen that you’re seeing, that’s the front end right there, like I can search for some soda that I want to order and I can click on buttons and I can add things to my cart. But there’s all these buttons and ways of interacting with things, and users will do things that you don’t expect. And that’s why I find it so much more challenging is there’s so many more variables to the situation. And then also with web development, less so now because there’s more robust frameworks, but different browsers can sometimes interpret the UI in different ways. And sometimes let’s say you’re in Chrome, the button is off barely on screen or you’re on mobile and this button doesn’t work. So there’s so many more variables that can be hard to anticipate what’s going to happen. That’s also why you have QA.

[00:12:16] SY: That’s true.

[00:12:16] KB: That’s why they also test.

[00:12:17] SY: Yeah. Yeah.

[00:12:18] KB: Your front end will have like user interface validation, some light logic, and that’s where what you’re saying is there’s that gray area because where does the business logic start and end. And I think sometimes it’s a bleed of both. But the back end is going to be more, like we talked about the direct integration to the database, it could be like your API code itself or calling out to other systems. So it could be making a call to Stripe to validate a credit card and then sending a response back to the front end to say, “Credit card was accepted. Allow them to move forward and complete the purchase.” But again, to what we’re talking about with databases is implementing things like ORMs, which are Object-Relational Mappers to the database. So interpreting the information from the front end, being able to consume it and transfer it to other places.

[00:13:24] SY: Right. Right.

[00:13:24] KB: But those back-end systems end up being more structured and more knowable. I think there’s complex problems, but typically the interfaces aren’t as chaotic as the front end.

[00:13:37] SY: Absolutely.

[MUSIC BREAK]

[00:13:56] SY: So let’s talk about picking a database. There’s generally a couple options, different things that we could choose. What are some of the considerations we might want to first consider when we think about what database we want to use?

[00:14:09] KB: So the first thing you should always think about is, “What do I currently know?” Because sometimes if you’re trying to get a project off the ground, start with what you know and you can go from there. If you’re just trying to prototype something, you can start with what you know, but then also starting with what you know, ask yourself, “What do I need to do with this data? So how do I need to store it and how do I need to get it back?” And so when you ask yourself, “How do I need to store it? Do I need to store it in a very structured manner?” Meaning that the data types always have to be the same. So my date of birth column, is that going to be a date? Can it be a string because I’m going to allow different formats? Will my age column just be a number? Or again, could that just be a string? Because sometimes it’ll be 22 or it’ll be 22 years. So you need to ask yourself those questions, “Is my data structure known already?” If you’re like, “Well, I want something that’s going to be more flexible,” I would say go towards something like a NoSQL database or a document store, because those will allow you to typically store your different attributes where there won’t be as many constraints on the data. Meaning if you pass in that string value to this attribute when the attribute is set up to be a number, it won’t stop you from doing that. But if you say, “I always need these columns to always be in a certain format,” you should choose a SQL database because that will define those constraints. And so it’s also where do you want to put those constraints. Do you want the database to do that for you? Or do you also feel comfortable saying, “I’ll do that validation in my code and I will trust that my code will do this and we’ll just pass it down to the database and the database will just give me double thumbs up and say thank you, cool”? Or if it is something like a relational database, then it’ll say, “No, thank you.
Please try again.” So it’s planning the data in, getting the data out. So how do you want to interact with your data? So for NoSQL databases, super flexible, but the ways that you can query the data are also typically pretty basic. So you can definitely do get by your primary key value and just get the row back. But do you need to do joins to another collection or table? So you’ve got your person collection and you’ve also got the orders collection. And do you need to tie those together? And some NoSQL databases will allow you to nest that and do that.

[00:16:59] SY: And that would be our join, right? Bringing these two collections together?

[00:17:01] KB: Yeah. If you’re doing a join or doing aggregates. And so if you’re like, “I need to do more complex queries, we do a SQL database for that because I need to do aggregates and I need to do joins and I need to do grouping of data.” It’s like, “Well, maybe I need to start thinking about a SQL database.” And so it’s upfront trying to determine, not just I need to put some data in and get some data out, but how and why and who are your users going to be and what might they need as well. And so starting to think about this, less about just like, “I need to make this thing happen,” but thinking about some of the broader implications around your application.
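
The two ideas in this stretch of the conversation, letting the database enforce a column's type and then joining and aggregating across tables, can be sketched with Python's built-in `sqlite3` module. The `people` and `orders` tables and the helper names are hypothetical. One assumption worth flagging: SQLite is loosely typed by default, so the sketch adds an explicit `CHECK` constraint to get the "No, thank you. Please try again" behavior Kyle attributes to relational databases; other relational databases typically enforce declared column types on their own.

```python
import sqlite3


def build_shop():
    """Create an in-memory database with a typed age column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            age  INTEGER CHECK (typeof(age) = 'integer')
        );
        CREATE TABLE orders (
            id        INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL REFERENCES people(id),
            total     REAL NOT NULL
        );
        """
    )
    return conn


def add_person(conn, name, age):
    """Return True if the database accepted the row, False if it refused."""
    try:
        conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False


def spend_per_person(conn):
    """Join people to their orders and aggregate per person."""
    return conn.execute(
        """
        SELECT p.name, COUNT(o.id), SUM(o.total)
        FROM people AS p
        JOIN orders AS o ON o.person_id = p.id
        GROUP BY p.name
        ORDER BY p.name
        """
    ).fetchall()


conn = build_shop()
accepted = add_person(conn, "John", 22)          # a number is fine
rejected = add_person(conn, "Jane", "22 years")  # a free-form string is refused
conn.execute("INSERT INTO orders (person_id, total) VALUES (1, 10.0)")
conn.execute("INSERT INTO orders (person_id, total) VALUES (1, 5.5)")
report = spend_per_person(conn)
```

A document store would typically accept both rows as given, which is the trade-off being described: the validation then has to live in application code.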

[00:17:41] SY: There’s also the question of, “Where is your data?” Right? I mean, there’s generally three types, right? There’s doing it on-prem or on-premises would be the full way of saying that. There’s the cloud hosted and then there’s the database as a service. Those are kind of the three categories I’m thinking about. Can you maybe walk us through each one and tell us what they are?

[00:18:01] KB: Yeah. So for on-prem, that would be I’m going to manage my own servers. I’m going to manage whether those servers are you’ve got a rack in your office somewhere or, anymore, it’s really that you’re using some form of data center somewhere, but you are going to dedicate some resources to keeping your database running, managing whether it’s containers or virtual machines or you just installed straight to the raw server itself, but you’re going to maintain this yourself. You’re going to control it. So the benefit of that is you’ve got more control around costs. You’ve got more control around the specific hardware it’s going to run. You just have more control, but you also are responsible for keeping it running. And anymore, businesses are 24/7, so you’ve also got to have employees or contractors or someone that’s on call and available to make sure things are running and optimized and then upgrading the database, just doing all of that management. Those are some cons. The pros though are like you have complete control over your data. And so for some highly regulated industries, that’s really important and you completely own everything and you control everything. But it does come at a literal cost of maintaining this and having staff to keep it going. So you can then move to cloud. And so with cloud, you’re still managing a database, but the servers are now, say, on something like Azure. And so with a few clicks of a button, you can spin up a SQL Server install. And so Microsoft who runs Azure will make sure that the infrastructure, the servers and the networking, that all is up and running. And so you have some space between yourself and that maintenance of that infrastructure, but you’re now just paying more for the costs around that, and it’s not as directly managed. And since they’re maintaining multiple servers inside their infrastructure, you get that deferred cost of what that all means for you. But you still have specific instances that you are using.
So if you need to resize say the server, you can click some more buttons and resize. And so you typically have less control around like, “Where is my data exactly?” But you get the benefits of managed infrastructure and nice user interfaces and creating these ways of spinning up databases and things like that. But ultimately, who has your data? It is your data. You’ve signed an agreement, but it’s sitting in someone else’s infrastructure. So there also comes with a level of trust, but that’s the world we live in is we trust these providers to take care of our infrastructure, and they’re also managing the security around that. Security is a huge deal and becoming a bigger deal every day. So when we move to as a service, now it’s a lot more abstracted. So something like Firebase is you just are signing up for a service. You are putting some tables, but you’re not maintaining anything at this point. And the really nice thing with these services is they create really robust what’s called auto scaling. So maybe when you’re initially building out your application, you’re not doing a lot of data inserts or updates or a lot of data reads because you’re creating something new and you’re just testing it out. But then as your application, your web app, your API, whatever this is, it goes viral, it gets big, these services, database as a service, they allow you to scale without having to do anything and the costs behind it are per use. And a lot of times now they also handle real spiky traffic. So let’s say you’re doing something around gaming, and typically evening, nighttime, that’s when you’re seeing a lot of traffic. And so you see these big spikes at night, you don’t have to do anything. The service itself will auto scale up to handle that traffic. Your costs will go up, but then once things subside, your traffic will go down.
And so really creating this really nice way of like, “I’m just relying on them to manage the infrastructure, manage the scaling, manage everything.” I’m essentially just accessing an API to send in my data, to get my data out. But then it gets more abstracted, like now I don’t really know where my data is, who has my data. I think then that’s the trust.

[00:23:03] SY: You get further away from it.

[00:23:04] KB: Yeah. Because as a service gets more abstracted, everything else is getting more abstracted too. And I’m not trying to make it sound nefarious by any means. I don’t believe it to be because in order for these companies to make money, they need to maintain trust. And so by no means am I trying to make it sound like sinister.

[00:23:25] SY: Another concept or term that people may come across in their research through the world of data and databases is key value stores. What is that? What’s that all about?

[00:23:37] KB: A key value store, in a lot of ways, that’s like one of the most basic databases you can have. And so key value store is if you think about say like a two-column Excel table, it’s a good way to think about it. And so the first column, that would be the unique identifier for your entry. And then the second column would be your value. And that could be anything. It could be a string. It could be a number. It could be an object. It could be an array, but that’s it. It’s just you’re putting in a unique key and then a value. And typically, with key value stores, there’s no nice joins that can happen. There’s no nice aggregates that you get. So basically to write data, you have a PUT, you would have an update, and you would have a remove. That’s how you would interact with your data. And then to get your data out, you would have a GET. And then that’s really just getting that entry, just that key and value by the unique identifier. Maybe there’s some range searching. So you say, “I’m going to start,” and this is always based on the key, “I’m going to start at five and traverse forward.” Maybe traverse backwards. Not all of them allow you to do that, but that’s typically all the key value stores allow you to do. Then you could think, “Well, that sounds not really robust.” But to talk about HarperDB and our data storage, our underlying data storage is a key value store. We just use this very simple storage mechanism in very powerful ways because we can leverage this range searching and we can add multiple key value indices and then how do we overlay logic over top of this really basic storage mechanism. These key value stores, the really nice thing about them is they’re highly optimized for modern storage like NVMe or like SSDs, which is not the spinning disks that we used to have back in the 2000s or whatever, but these solid state drives that some key value stores are optimized for.
And so being able to write really, really fast, like millions of entries per second, being able to read exceptionally fast so you can get your data in and out really fast because these are really basic ways of interacting with the disk and also interacting with the data and then creating some really big APIs on top of this to allow engineers like our company to interact with this simple key value store in really creative ways that really create a lot of power.
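
The operations Kyle lists (put, get, remove, and a range search that starts at a key and traverses forward) can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions: the class name `TinyKeyValueStore` and its method names are invented for illustration, and it says nothing about how HarperDB or any real key value engine is implemented.

```python
import bisect


class TinyKeyValueStore:
    """Toy key value store: a dict for values plus a sorted list of keys."""

    def __init__(self):
        self._data = {}
        self._keys = []  # kept sorted so range scans are cheap

    def put(self, key, value):
        """Insert a new entry or overwrite (update) an existing one."""
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key, default=None):
        """Fetch a single value by its unique key."""
        return self._data.get(key, default)

    def remove(self, key):
        """Delete an entry if it exists."""
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan_from(self, start_key, limit=None):
        """Range search: start at start_key and traverse forward."""
        start = bisect.bisect_left(self._keys, start_key)
        end = None if limit is None else start + limit
        return [(k, self._data[k]) for k in self._keys[start:end]]


store = TinyKeyValueStore()
for key in (7, 3, 5, 9):
    store.put(key, f"value-{key}")

from_five = store.scan_from(5)  # entries with keys 5, 7, 9 in order
```

Note what is missing: there is no way to ask for entries by value, join two stores, or aggregate, which is the limitation described above. Secondary indexes are usually built as additional key value structures layered on top.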

[00:26:28] SY: So we’ve got key value stores. We also have document stores. Tell us about that.

[00:26:33] KB: Oh yeah. So document stores, that’s one level up. And so key value is really just like simple data types. Like I said, so maybe an integer with like a string. So documents are going to be more complex. And so you can think about documents as something like a JSON object or XML or some other format, but typically JSON. And so a document store allows you to do much more complex data ingest and more complex data reads. So when you’re storing this data, you’re storing not just the individual key and an individual value. You’re typically still using a key to create a unique identifier for this document. But then when you’re storing that document, you’re storing say that whole JSON object and then also saying, “I want to index the age column because I’m going to need to search on that.” And so this document store then starts allowing more discrete and intelligent searching on specified columns that you say, “Well, age is something that we need to search on. Name is also important.” Maybe you want to also really quickly be able to search on order IDs, things like that. And so being able to create with JSON, you have a really flexible data object, but then being able to create more robust searching on top of those things. One other thing too is something like Mongo, they then allow you to do like nested collections, which in the relational world you could think of as like a join table, but you can have these nested collections that are then also indexed as well. So it starts out simple, but can get really powerful and complex as well.
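
As a rough illustration of "store the whole JSON object, but index the age column," here is a small in-memory Python sketch. `TinyDocumentStore` and its methods are hypothetical names, documents are plain dictionaries standing in for JSON, and indexed values are assumed to be hashable. It is not how MongoDB or any other product works internally.

```python
from collections import defaultdict


class TinyDocumentStore:
    """Toy document store: whole documents by id, plus per-field indexes."""

    def __init__(self, indexed_fields):
        self._docs = {}
        # One index per chosen field: field value -> set of document ids.
        self._indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, doc_id, document):
        """Store or replace a document and refresh its index entries."""
        self.remove(doc_id)
        self._docs[doc_id] = document
        for field, index in self._indexes.items():
            if field in document:
                index[document[field]].add(doc_id)

    def remove(self, doc_id):
        """Delete a document and its index entries, if it exists."""
        old = self._docs.pop(doc_id, None)
        if old is None:
            return
        for field, index in self._indexes.items():
            if field in old:
                index[old[field]].discard(doc_id)

    def find(self, field, value):
        """Look up documents through an index instead of scanning them all."""
        ids = self._indexes[field].get(value, set())
        return [self._docs[doc_id] for doc_id in sorted(ids)]


people = TinyDocumentStore(indexed_fields=["age"])
people.put("p1", {"name": "John", "age": 22, "orders": [{"id": 1, "item": "shoes"}]})
people.put("p2", {"name": "Jane", "age": "22 years"})  # flexible: no type enforced
people.put("p3", {"name": "Ann", "age": 22})

aged_22 = [doc["name"] for doc in people.find("age", 22)]
```

The second document shows the flexibility discussed earlier in the episode: nothing stops `age` from being a string, so a search for the number 22 simply does not match it.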

[00:28:22] SY: So you mentioned relational databases. What are graph databases?

[00:28:26] KB: So those are typically used for like social, like relationships between things. And so you would have nodes. And so those nodes could be like names and then the relationship between them. So the nodes could be people and then who is interconnected between these people. And typically graph databases are used really heavily by social media providers so that they can really quickly create these networks of like, “These are my friends and then who are their friends of friends and how are we related between each other?” And so creating this almost like a neural network of data between people. And so it’s just like creating these more complex connections of data.
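
The "friends of friends" question is a short traversal once people are nodes and friendships are edges. The sketch below is plain Python with an invented `FriendGraph` class and made-up names; a real graph database would persist and index this structure and offer a query language on top of it.

```python
from collections import defaultdict


class FriendGraph:
    """Toy undirected graph: people are nodes, friendships are edges."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_friendship(self, a, b):
        """Record that a and b are friends (in both directions)."""
        self._edges[a].add(b)
        self._edges[b].add(a)

    def friends(self, person):
        """Direct connections of one person."""
        return set(self._edges.get(person, set()))

    def friends_of_friends(self, person):
        """People two hops away who are not already direct friends."""
        direct = self.friends(person)
        two_hops = set()
        for friend in direct:
            two_hops |= self.friends(friend)
        return two_hops - direct - {person}


graph = FriendGraph()
graph.add_friendship("Ana", "Ben")
graph.add_friendship("Ben", "Cho")
graph.add_friendship("Ben", "Dev")
graph.add_friendship("Ana", "Dev")

suggestions = graph.friends_of_friends("Ana")  # Cho is reachable only through Ben
```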

[00:29:14] SY: Another type of database that frankly, I don’t know that much about them. I’ve heard very little about this one, time series databases. What can you tell us about that?

[00:29:22] KB: So these are used really heavily in like internet of things. So you think about things like you have a Nest thermostat or something like that, but time series databases, what they do is typically the key on the time series database is going to be timestamps. And so it’s always ordering the data or events by time and so that you can traverse your data by time. And they start out very, very granular. And so every second your thermostat’s saying, “Okay, it’s 70. Now it’s 71. Now we turned on the thermostat to cool it down back down to 70.” So it’s like these recordings and events, but then time series databases will then also start compacting that data because when you’re collecting this data, like high frequency, which is really what they’re very, very good at is like high-frequency data collection for events. I was talking about like temperature and things like that, but then you want to be able to say, “Well, at a certain point, I don’t care about the granularity. I care about the aggregate.” And so over time, they will then start compacting the data and creating aggregates around that granular data. And so they will do those as like jobs and so that over time you can start seeing like trend analysis over time around like what has happened. Another good use case for time series databases is like event logging. So in application development, you want to log errors or warnings or things that are happening in your application and you maybe have like an integration to a service like Datadog. Datadog is really good at storing log data so that you can then view these things that have happened. Most likely, I don’t know for certain, but most likely Datadog’s using a time series database because all of those things like that are timestamped. So being able to see things on a time-based fashion, but then also being able to do like aggregates.
And so they’re really good for collecting data very rapidly and then seeing like aggregates over time, in a very performant way.
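
The compaction step, where per-second readings are rolled up into coarser aggregates, can be sketched as a simple bucketing function. The function name `downsample`, the 60-second bucket size, and the choice of min, max, and mean are assumptions for illustration; real time series databases choose their own rollup policies and run them as background jobs.

```python
from collections import defaultdict


def downsample(readings, bucket_seconds=60):
    """Roll (timestamp, value) readings up into per-bucket aggregates.

    Returns a list of (bucket_start, minimum, maximum, mean) tuples,
    ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [
        (start, min(values), max(values), sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical thermostat readings: (seconds since start, temperature).
thermostat = [(0, 70), (1, 71), (59, 72), (60, 70), (61, 70)]
per_minute = downsample(thermostat)
```

Five granular readings become two per-minute rows, which is the trade being described: less detail in exchange for less storage and faster trend queries.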

[00:31:40] SY: So I want to talk about search engines. That’s something that a lot of people probably need to incorporate into their app, especially if they’re dealing with data like we’ve been talking about. And that’s something that, to me, feels very intimidating. I don’t really know how they work. They sound really hard. They make me think of Google, which is this huge company. So tell us about search engines. Where does that fit into the whole data, database conversation? What do those look like?

[00:32:06] KB: Yeah. So search engines, they get thrown into the database quadrant, but they’re typically not databases like per se. You could use them as a database, but a lot of times things like Elastic, they are doing things like full text analysis so that you can do auto-complete on a form. Say you’re on Amazon and you want to search for Nike shoes. And so you start typing N-I, and then all of a sudden, it autocompletes with K-E, and then maybe starts giving you some other top things like Nike shirts, Nike shoes. And so they’re really good at doing like this full text analysis to do like anticipatory autocomplete, things like that, but also doing like full text searching. And so maybe you have a collection of unstructured data, like Word documents and PDFs and things like that. And so if you pump those things into like a search engine and you allow it to do indexing on that data, you can then start doing like, “I want to find something that contains events,” and then you get like your list of documents that have those words in it, the word events, and you’ll be like, “Okay, Comic Con,” all these other things. And so typically, they’re very good at being caches. So a database is like a finite resource. So typically the databases end up being the bottleneck in an application because you can scale your logic out on multiple servers, but they’re all going to the central place to put their data in and to get their data out. And so you want to really try to conserve that resource as much as you can. And that could be just because of cost. So let’s say you’re using DynamoDB and you’re able to insert your data, and your data ingest, it’s not super high, but your data reads are very, very high. And so let’s say you’re going to a database as a service to do those reads, for every one of those requests, that provider is going to charge you for it. And it’s typically more expensive than using something like a search engine.
And so in a lot of architectures, people use these external caches to check the cache first and be like, “Do I have this entry over in this cache?” And also that cache, like a search cache can be a lot faster because it’s in memory. So rather than something like a DynamoDB, where it’s ultimately, you’re making a request and it’s going to disk and has to get it from disk and then return it back to you. Search engines typically run in memory. That’s why they’re so fast and they do this indexing ahead of time. So you can get that back much faster for a cheaper cost. But the thing to consider with that too is you’re now adding multiple data layers. How do you keep them in sync is one good question. And then also like sometimes like, “Is the cost worth it?” But it’s about performance and there’s multiple considerations to start doing things like that as well.
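
The "check the cache first" flow is often called the cache-aside pattern. Here is a minimal Python sketch of it, with loud assumptions: `SlowDatabase` is a stand-in that only counts reads (it is not a DynamoDB client), a plain dictionary plays the role of the in-memory cache or search engine, and invalidating on write is just one of several possible answers to the "how do you keep them in sync" question.

```python
class SlowDatabase:
    """Stand-in for a pay-per-read database; it only counts reads."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._rows.get(key)

    def write(self, key, value):
        self._rows[key] = value


class CachedReader:
    """Cache-aside: look in memory first, fall back to the database."""

    def __init__(self, database):
        self._db = database
        self._cache = {}

    def get(self, key):
        if key in self._cache:      # cache hit: no database request
            return self._cache[key]
        value = self._db.read(key)  # cache miss: go to the database
        if value is not None:
            self._cache[key] = value
        return value

    def update(self, key, value):
        """Write to the database and drop the stale cached copy."""
        self._db.write(key, value)
        self._cache.pop(key, None)


db = SlowDatabase({"john": {"last_order": "shoes"}})
reader = CachedReader(db)

first = reader.get("john")   # miss, reads the database
second = reader.get("john")  # hit, served from memory
reads_after_two_gets = db.reads
```

Two lookups cost one database read. The `update` method shows the extra bookkeeping the second data layer introduces: forget to invalidate, and the cache keeps serving the old order.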

[00:35:32] SY: Coming up next, Kyle talks about why knowing the differences between types of databases matters and what the repercussions can be if you end up choosing the wrong database for your use case after this.

[MUSIC BREAK]

[00:35:54] SY: Why does knowing about these differences between architectures and considerations matter? What is it about that that we should be using to kind of guide? Because those were a lot of different ideas, a lot of different words, a lot of different jargon and terminology. How do we use all of that to make a decision? Or do we kind of let someone or something, maybe a framework or something, do we kind of let that make the decision for us? I can just see analysis paralysis happening here with all the different options and choices that people have. How do we maybe simplify that a little bit, especially for a first time app developer, someone who’s just getting their mind wrapped around databases and different architectures?

[00:36:42] KB: You know, what I would say, just to zoom it out really far, is if you’re a first-time app developer, I would say don’t use a basic key value store because it’s probably going to be way too simplistic and it will not give you what you need. You’re going to want something that has more robust APIs to interface with. I would stay away from a key value store for the most part. But then I would look to see, do I want something that’s flexible and easy? I would say use a document store like Mongo or HarperDB, just to do a plug. The really nice thing with document stores is they will allow you to be adaptable to your changing needs, especially as you’re creating a new application. The interfaces are typically easier to work with. And like I said, with having a flexible data model, I would say if that’s what your goal is, just something flexible and easy to work with, aim towards a NoSQL database. If you know that you need something more structured, like consistent data types, you want to have strong indexing, strong analytics, I would then say look at a relational database. And this gets to be less of an issue between relational databases and NoSQL databases. But if you know that you will need to scale your data, meaning spin up more instances of a database, that’s what NoSQL databases were built for: being able to scale out your data footprint easier and faster. Typically, in the past, relational databases were not built to scale across multiple servers. That’s getting to be less and less true. But if you want something that you know will scale up, look towards a NoSQL database. And really honestly, if you’re like, “I am anticipating high-scale traffic and I just do not want to maintain it,” look towards a NoSQL database as a service. And there’s a ton out there, but I would say if it’s just something easy you want to work with, go NoSQL. If it’s structured, go relational.

[00:39:13] SY: What happens if we pick the wrong thing? If we pick the wrong architecture or the wrong type of database, what are the consequences? I’m thinking worst-case scenario, how bad could it be versus likely unhappiness? What are some things that are more likely to happen, but might cause us some concern? What are the repercussions of that?

[00:39:35] KB: Well, you just need to go to that big switch on the wall in your office and just turn it all off and just power it down, “Sorry, the development team let you down.” Typically, it’s not so dire, but you will feel some pain. So let’s say you said, “Okay, we need a flexible data model. We are going to start with a NoSQL database and it helps us build this app really fast. The performance is great.” So you build your app, you’ve got a good user base, and then the business says, “Hey, we need to start doing financial reporting on what people are purchasing.” And you look at your NoSQL database model and you’re like, “We’re not indexed for that. It’s going to take us a week to get you that data because we have to crawl the entire database to get all of these values out and then to do aggregates.” So that’s a downstream impact, right? So maybe it doesn’t directly impact your users, but it impacts your business. And now your CFO has to wait a week in order to be able to do his financials and that’s problematic. Or let’s say you’ve got a NoSQL database, and in the current state, everything’s indexed perfectly. Everything’s set up properly. But the product team says, “Hey, we want to add a new feature, but we need to be able to search on certain types of data. How do we do this?” And your database administrator says, “Well, we need to re-index the entire database and we’re going to have to bring the database down in order to do that.” So then you have to figure out, “How do we handle an outage during this time?” It’s certainly achievable to do that, but these are unintended consequences, and in part, you can only anticipate what you know at the moment. The nice thing now is there’s lots of people that have gone through the same pain. You go on Stack Overflow to figure out how you overcome these things. It’s never quite so dire, but it definitely creates some temporary pain points.
I would say though, there’s always a point in time in a product’s lifecycle where there is a determination to say, “We need to change database platforms.” Say you’re on Oracle and you need to move to Bigtable on Google. And so there’s just a big development investment. Depending on how your application is architected, typically you want to try and decouple the database logic from the business logic, so that there’s a little bit of an abstraction for how you’re sending data in or asking for data. And so your business logic never gets impacted. It’s just that layer in between your business logic and your database that would have to change. And that’s where ORMs are really nice because that does allow people to more easily switch from one database to another. It’s never easy, but from a programmatic standpoint, those pieces do help. But I would say it’s never the end of the world, but it definitely depends on the scale of data. It could be a couple of months’ worth of work to change databases to maybe a year, depending on how complex your logic is and what the migration path is from one to another. But there are also other ways of thinking about these things. In my first example, where the CFO needs reporting, you could say, “Well, our NoSQL database, that’s our operational database.” And so that means it is handling our day-to-day, meeting our customers’ needs, and we need another database that is for internal use. So maybe you just create a job every night that synchronizes data from your operational NoSQL database to this relational database. You don’t always have to think about it as all or nothing. It can be a blend of things.
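
A rough sketch of the two ideas in this answer, in Python: a thin layer between business logic and the database, and a nightly job that copies operational data into a relational database for reporting. All names here (`OrderStore`, `DocumentOrderStore`, `SqlOrderStore`, `total_spent`, `nightly_sync`) are hypothetical, a list of dictionaries stands in for a NoSQL document store, and an in-memory SQLite database stands in for the reporting database. It is an illustration under those assumptions, not a description of how any specific product or ORM works.

```python
import sqlite3
from typing import Dict, List, Protocol


class OrderStore(Protocol):
    """The abstraction layer: business logic only ever sees these two methods."""

    def add_order(self, customer: str, amount: float) -> None: ...
    def orders_for(self, customer: str) -> List[float]: ...


class DocumentOrderStore:
    """Stand-in for an operational NoSQL store: one flexible document per order."""

    def __init__(self) -> None:
        self.documents: List[Dict] = []

    def add_order(self, customer: str, amount: float) -> None:
        self.documents.append({"customer": customer, "amount": amount})

    def orders_for(self, customer: str) -> List[float]:
        # No index here: every read scans all documents.
        return [d["amount"] for d in self.documents if d["customer"] == customer]


class SqlOrderStore:
    """Stand-in for a relational reporting database, backed by SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")

    def add_order(self, customer: str, amount: float) -> None:
        self.conn.execute("INSERT INTO orders VALUES (?, ?)", (customer, amount))

    def orders_for(self, customer: str) -> List[float]:
        rows = self.conn.execute(
            "SELECT amount FROM orders WHERE customer = ? ORDER BY rowid", (customer,)
        )
        return [row[0] for row in rows]


def total_spent(store: OrderStore, customer: str) -> float:
    """Business logic: unchanged no matter which store sits behind it."""
    return sum(store.orders_for(customer))


def nightly_sync(source: DocumentOrderStore, target: SqlOrderStore) -> None:
    """Naive full refresh: replace the reporting copy with the operational data."""
    target.conn.execute("DELETE FROM orders")
    for doc in source.documents:
        target.add_order(doc["customer"], doc["amount"])
    target.conn.commit()


operational = DocumentOrderStore()
operational.add_order("ana", 10.0)
operational.add_order("ana", 20.0)
operational.add_order("bo", 5.0)

reporting = SqlOrderStore(sqlite3.connect(":memory:"))
nightly_sync(operational, reporting)

# The kind of aggregate the CFO wants, run against the relational copy.
report = reporting.conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

Because `total_spent` depends only on the `OrderStore` interface, swapping the backing database means writing a new store class rather than touching the business logic. A full refresh every night is the simplest possible sync; a real job would more likely copy only what changed.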

[00:44:02] SY: So for folks who are really interested in digging more into the nitty-gritty, maybe getting to be the CTO of a database management company like you are, where can they start? How do you kind of dig into the super technical back-end stuff that you’re working on?

[00:44:18] KB: So part of it is, if you’re starting as a new developer, take a Udemy course on database basics. I would start with just how do I interact with the database, like how do I put data in, how do I get data out. Start with your database of choice. Just start there, like how do I interact with databases?

[00:44:47] SY: Now at the end of every episode, we ask our guests to fill in the blanks of some very important questions. Kyle, are you ready to fill in the blanks?

[00:44:54] KB: I'm ready.

[00:44:54] SY: Number one, worst advice I’ve ever received is?

[00:44:58] KB: The worst advice I ever got, and this was more messaging through high school and college, was like, “Have a track. Know what you want to do for the rest of your life and just get on it and ride that train.” That created a lot of conflict for me, especially in college, as we talked about early on, where I started out with an engineering track and moved over to exercise and sports science. I had a lot of internal anguish because I always felt like I should already know what I’m doing and I felt lost. As I’ve learned over many years that life is not linear, I’ve learned to unwind that advice and that story. But that’s probably the worst thing I’ve ever attached myself to very early on in my life.

[00:45:45] SY: Number two, best advice I’ve ever received is?

[00:45:49] KB: To take a step back. If you’re trying to solve a problem you can’t crack or you’re in a challenging situation, whatever it is in your life, take a moment, take a step back and ask yourself, “Is there a different way that I can approach this?” It’s a super simple question, but it, one, gives you a moment to take a break, breaks you out of it, and it just opens up like that question of like, “Is there something else I could be doing here? Could I ask for help? Could I think of a different way of approaching this? Could I ask my friend a different question or connect with them in a different way?” It can open up all sorts of doors, not just technically, but just in life. That was the best thing I’ve ever heard. It was from a therapist that gave me that phrase.

[00:46:33] SY: Number three, my first coding project was about?

[00:46:36] KB: In tech support. We were using Siebel, which was like a ticketing system, and I somehow got access to the database, got the credentials to access the database. It was more out of curiosity, but I built custom reports to help our team better understand the work we were doing and to get better information around the rates at which we were answering customers’ questions and where we were dropping the ball, so we could be more effective at our jobs. So I built this custom report portal very early on and I just muddled my way through it.

[00:47:16] SY: Number four, one thing I wish I knew when I first started to code is?

[00:47:20] KB: To ask why more than what. So why does the system work the way it does? Why, when I write this piece of code that writes data, does it create a ticket in this other table? Why does it work this way? I was so focused on what and getting things done. And I think if I had this perspective of more curiosity, I would have been better at my job and I would’ve learned more early on.

[00:47:51] SY: I like that. Well, thank you again so much for joining us, Kyle.

[00:47:54] KB: Yeah. Thank you. This was great.

[00:48:02] SY: This show is produced and mixed by Levi Sharpe. You can reach out to us on Twitter at CodeNewbies or send me an email, hello@codenewbie.org. Join us for our weekly Twitter chats. We’ve got our Wednesday chats at 9 P.M. Eastern Time and our weekly coding check-in every Sunday at 2 P.M. Eastern Time. For more info on the podcast, check out www.codenewbie.org/podcast. Thanks for listening. See you next week.

Season 17 EP 9 September 26, 2021

What are some database architectures and their use cases Kyle Bernhardy
