[00:00:00] (Music) SY: Welcome to the CodeNewbie podcast, where we talk to people on their coding journey in hopes of helping you on yours. I'm your host Saron and today we're talking about using data to build apps. (Music) Jeff Nelson is an executive at two companies. He's the CTO of Blavity.

[00:00:26] JN: So Blavity is a tech media company. We wanted to use technology to bring black millennials together in a space that was cultivated specifically for us.

[00:00:37] SY: And he's the CEO of Cinchapi.

[00:00:39] JN: Cinchapi is a company that I founded to really bring innovations in the data analytics space. And in particular, what we wanna do is we wanna use natural language and machine learning to give companies real-time insights and the power to act on those insights when it matters.

[00:00:55] SY: And even though these companies are very different, data is at the core of both. Jeff describes why data is so important; how to use data to make your app better and smarter; and shares his own journey of building his own database.

[00:01:12] Flatiron School teaches you how to code from anywhere. They've got an awesome community of career-changers and a number of different options for you to pick from to become a software engineer. They've got full-time in-person courses, self-directed introductory courses and a remote online web developer program. They even have a free 75-hour online prep course where you can learn JavaScript, Ruby and do some interview prep. Go to flatironschool.com/podcast to learn more. That's flatironschool.com/podcast. Link is in your show notes. 

[00:01:46] One of the best parts of being a coder is finally being able to bring your passions to life. You have the skills to design, to code, to create the thing you're excited about and share that passion with the world. And Hover can help you with the first step of sharing your passion with the world: getting your domain name. They've got a really beautiful and easy-to-use interface where you can find and register your new domain name in just a few steps. And to give you full control, they separate your domain name from your hosting so you're never stuck with one service. They keep your domain name safe while giving you the flexibility to use whatever hosting service is best for you. They also give you free WHOIS privacy, so your personal information is safe, too. To get started, go over to hover.com/newbie to save 10% off your first purchase. That's hover.com/newbie. Link is in the show notes. 

[00:02:33] DigitalOcean provides the easiest cloud platform to deploy, manage and scale applications of any size. They remove infrastructure friction and provide predictability so you can spend more time building what you love. Try DigitalOcean for free by going to do.co/codenewbie and get $100 of infrastructure (Music) credit. Link is in your show notes. 

[00:02:58] SY: Ok. So when I think about Cinchapi and the description you just gave, that sounds like a, a pretty technical product. Like tech is at its core. But when I think about Blavity and the idea of being a media company, a community that feels less technical in terms of it being the core value proposition, what are the technical aspects of something like Blavity?

[00:03:23] JN: Yeah, so Blavity is certainly a, you know, we intentionally call it a tech media company. We don't necessarily build tech products yet, but we are certainly tech enabled. How Blavity has really been able to gain a competitive advantage amongst other digital media players in the space are for a couple of reasons. One, being millennials ourselves, we really understand, you know, in some ways we, many of us were, were sort of born with technology. We don't know a world that exists without social media and digital media. And we're able to capitalize on that instinctive nature of it and build platforms. Another aspect of tech that helps Blavity is the fact that we use tech on the backend to give us some strategic advantages. So we use tech in the content management system that we built that allows us to do some cool things in terms of syndicating content across our brands very easily so we can very easily capture content and disseminate that content to our audience. But we can also get very deep metrics and understandings about our audience based on not only what they're doing on a particular site, but what they're doing across sites in our ecosystem.

[00:04:35] SY: Very cool. So it's interesting because even though Blavity and Cinchapi are—sounds like fundamentally very different ideas, different customers, different value propositions—data and this idea of being able to analyze and act on data feel fundamental to both.

[00:04:52] JN: They are, and it's very fundamental to me as a person. I started my career working in data. I started working at a company called Palantir Technologies. So Palantir is where I got my start in—when I first started to code, I didn't really get it. Software development sort of came easily to me, but it was kind of boring. (Laughing)

[00:05:14] SY: You are not allowed to say that on this show, ok?

[00:05:16] JN: I know, right? It, it was—it seemed boring. And, and I actually remember that when I was in college—because I studied computer science in school. I actually started in high school—there were three things that I said I'd never do. I said I'd never become a lawyer. I'd never become a doctor. Both of those because I wouldn't be qualified. And three, I'd never major in computer science.

[00:05:35] SY: Wow.

[00:05:35] JN: You know I intended to be a math major. And part of the math curriculum was taking an intro computer science course. And I had a great professor who really gave me a different perspective on coding and computer science. And that professor showed me that coding is a way to solve problems. And that really appealed to someone like me who my nature is that of a problem solver. And so I saw coding as being more than just syntax and methods...

[00:06:03] SY: Yes.

[00:06:03] JN: ...and functions and objects. And it was really a way to engineer solutions to some real challenging problems, and I got opportunity to look at that first hand at Palantir. And we were working on very complicated data problems from two perspectives: one perspective from building the systems to store and process that data but then the other perspective, I also got the opportunity to work directly with customers who are trying to analyze and use that data. And what became apparent to me was, was that data is essential to solving any problem. You know, knowledge is power and information is knowledge. And data is information, right? And so that, that sort of logical chain there, data is at the heart of everything we do. So yeah, it's absolutely core to what makes Blavity a competitor and have an advantage over others in the space. And it's also at the heart of what Cinchapi is trying to give to other companies in the enterprise space.

[00:06:59] SY: You know, the thing about data is I've always been very interested in it conceptually, but it, it sounds and looks very hard and complex. And I have really no idea where to even begin. So for you—especially I'm thinking about the, the job that you had at Palantir—where was the starting point for you? Where did the beginning of your data journey—what did it look like? 

[00:07:25] JN: Yeah, so when people think of data, their understanding of what that means is different, right? Some people think of data as just, you know, sort of a spreadsheet with minimal information. Some people think it's very intense data science-driven predictive analytics.

[00:07:42] SY: Yeah.

[00:07:43] JN: For me, data's always been something that, that's been important to me because I'm a very analytical person. When working at Palantir, you know, I was on the software engineering team and also did a rotation on the business development team. And so I would say my professional data journey began from the standpoint of actually wanting to build software and understanding that software itself—any software that's interesting—is data driven. Right? It's dynamic. If you think of something as simple as a web page that displays today's date, right? That relies on some data, some dynamic data that has to come from somewhere.

[00:08:21] SY: Yeah, that's true.

[00:08:22] JN: You've gotta integrate with that data, right? And so any idea that you can think of that's interesting, it's gonna rely on being able to one, get information from, from users to be able to store that information and then retrieve it later so that users can leverage it again to do additional stuff on the product that you're building. And so my data journey began from that standpoint. I was working on a lot of side projects and building data-driven applications. And what frustrated me—and this is... at least what I'm gonna assume you were alluding to when you, when you said data can be kind of scary at first—is that for me as a developer at that point in time, working with databases was such a headache. I actually found that I spent more time thinking about how to integrate with the database and the data source and to deal with the data than I did thinking about the application that I wanted to build. And so that's where, where my data journey began, and that's where I, where I got fascinated with this idea of simplifying data and interacting with data and processing data and managing data. Simplifying it from the perspective of people that use data to build products and build cool things with it. 

[00:09:32] SY: So you mentioned that you were doing some side projects where you were trying to use all this data, and that's where you felt that pain point. What were some of those projects? What kinds of things were you trying to do?

[00:09:42] JN: Oh, yeah. So this is, this is taking me way back. I might, I might be embarrassed by some of these. (Laughing) 

[00:09:47] SY: No judgment.

[00:09:48] JN: Yeah. (Laughing) Wait, we've all had bad ideas, right? And it won't be my last one, I'm sure, but, but one of the things that we were building then was it was a location-based social network that was really trying to solve the problem of getting people to determine what events they wanted to go to based on who else was at the event in real time. And so this was...

[00:10:11] SY: Oh, ok.

[00:10:12] JN: ...when we were—yeah, this was when we were in college. And we were really trying to solve the problem of "hey, what, what am I gonna do tonight?" Right? But it's one thing to pull out your phone and say, "ok, ten people I know are at this place that are at this party that I didn't even know about. I'm gonna go there." Right? And, and we had all kinds of ideas about privacy and, and all that stuff.

[00:10:31] SY: Yeah.

[00:10:31] JN: And it was, it was really cool.

[00:10:33] SY: That is such a college app, by the way. (Laughing)

[00:10:35] JN: It, it really is, right?

[00:10:36] SY: It's such a college student app. 

[00:10:37] JN: If someone else is building that, please let me know, and I'll happy—happily support you in whatever way I can. So, so that, that was one of the ideas. And there are others. A lot of them were social networking in nature. What we were really trying to do—and, and you can maybe kinda see the origins of the Blavity thesis—what we were trying to do was connect people and use technology to connect people along common interest or along location or in any other way. As you can imagine data is extremely important to that, and so that's where those beginnings came from. 

[00:11:07] SY: So in an example like that—let's just go with it. Let's go with the college party Friday night app idea. What is a way to leverage data in that? Like how do we make that, how do we enhance that experience with data? Tell me what that would look like. 

[00:11:27] JN: Yeah, so you'll often hear this concept of MVP, which is minimum viable product, and I sometimes catch flak for saying this—I'm not a huge MVP person.

[00:11:35] SY: Really? 

[00:11:36] JN: Really.

[00:11:37] SY: What?

[00:11:37] JN: And I'll tell you why. I don't, I don't disagree with MVP.

[00:11:39] SY: Ok. 

[00:11:40] JN: I think MVP is implicit in what I'm gonna tell you.

[00:11:43] SY: Ok.

[00:11:43] JN: I actually focus on the MRW, which is the minimum required workflow.

[00:11:48] SY: Interesting.

[00:11:49] JN: And what MRW forces you to do, it forces you to think about your user and their journey. And so I map out the journey of the user step by step by step, and I figure out what I can eliminate, what I can consolidate, what I can optimize and what I can minimize. Now implicit in that I do think you end up getting MVP because when you're focused on the user journey you force yourself to focus on what really matters to the user, and you do come up with—hopefully if you're doing it correctly—the minimum viable product. But I find when people just think about MVP, it gets misconstrued... 

[00:12:21] SY: Yeah.

[00:12:22] JN: ...as being "how can I hack together something that...

[00:12:24] SY: Yeah.

[00:12:25] JN: ...sucks, but, but I can say, but I can say that I've solved the problem, right?"

[00:12:28] SY: Yes.

[00:12:29] JN: For the use case that we were envisioning, the MRW, that minimum required workflow, is that I'm able to open an app and it tells me what I should do tonight. It's almost sort of like an intelligent assistant. We know this is what you're interested in. We know this is happening around you. We know these are the people you are interested in and that you wanna hang out with. And we know where—not only where they are now, but where they're planning to go for the rest of their evening. And you can imagine that, at scale, has a lot of potential. And so we were trying to build something that had that sort of workflow, right? Data is a very important part of that because you've gotta be able to, to track what users are interested in. You've gotta be able to track where they are, where they're going and build this sort of network graph of, you know, different layers of interest, right? Not only what I'm interested in, but what are my friends interested in? What are my, you know, tertiary level of friends interested in? So that's something that you've gotta do and that's really important. And that's an example of how data was important in some of the stuff we were trying to build. 
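
The "layers of interest" idea can be sketched as a breadth-first walk over a friendship graph, grouping interests by how many hops away their owners are. This is only an illustration: the names, the `FRIENDS` and `INTERESTS` data, and the `interests_by_degree` function are all hypothetical and are not from the app Jeff describes.

```python
from collections import deque

# Hypothetical sample data: who is friends with whom, and what each person likes.
FRIENDS = {
    "sam": ["ana", "ben"],
    "ana": ["sam", "cam"],
    "ben": ["sam"],
    "cam": ["ana", "dee"],
    "dee": ["cam"],
}
INTERESTS = {
    "sam": {"data"},
    "ana": {"jazz"},
    "ben": {"hoops"},
    "cam": {"jazz", "film"},
    "dee": {"poetry"},
}


def interests_by_degree(user, friends, interests, max_degree=2):
    """Group interests by how many friendship hops away their owners are."""
    layers = {}
    seen = {user}
    queue = deque([(user, 0)])
    while queue:
        person, degree = queue.popleft()
        if degree > 0:
            # Degree 1 is direct friends, degree 2 is friends of friends, and so on.
            layers.setdefault(degree, set()).update(interests.get(person, set()))
        if degree == max_degree:
            continue  # do not walk past the outermost layer we care about
        for friend in friends.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, degree + 1))
    return layers
```

With this sample data, `interests_by_degree("sam", FRIENDS, INTERESTS)` puts jazz and hoops in layer 1 and jazz and film in layer 2, while dee's interests (three hops away) are left out.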

[00:13:28] SY: And so I, I get that conceptually, this idea that I wanna see what parties you did go to in the past and see if I can potentially, you know, give better recommendations with that. And like you said, the parties where your friends are and, you know, maybe your best friends versus your, you know, the people you just kinda know, you just kinda hang out with. And so I, I get that you want to track that data and you wanna integrate it. I think the part where it feels tricky and confusing and kind of like a black hole is the how. 

[00:13:57] JN: Yeah, and that's where the machine learning comes into play. If you think about just humans in general, we're very complex. And so machine learning comes into play, and machine learning is such a data-driven process because you've gotta be able to collect so many different data points and so many different dimensions and attributes, some of which that you may not even know are important... 

[00:14:19] SY: Right.

[00:14:19] JN: ...because you've gotta be able to scale out the processing to see ok, what factors actually influence decision making? And this is where some of the predictive analytics comes into play. When you're building something like a recommendation engine, it isn't as simple as well you listened to this, this song, you know, five times so we think you like this artist, right? They, they try to introduce you to new artists or new TV shows, right? And it's based on not only what you've watched but also what other people like you have watched. And so that's where you can kinda see the richness of data coming into play because if you've got information not only about you as a user but you've got information about other users and you can then begin to cluster them into cohorts of user types. And then you can be able to analyze what those user types do, when they do it, what factors are present when they're doing that. And then you can begin to sort of, you know, when a new user comes onboard, you can put them into one of those buckets and say, "ok, well you're like all these other users, so we think you're gonna like this information. And then we're gonna learn from you and make our algorithms even better."
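
The "people like you also watched" idea can be sketched in a few lines. The example below is a deliberately tiny, assumed version: it measures similarity with Jaccard overlap on viewing history and recommends whatever the most similar users watched that you have not. The `HISTORY` data and the `jaccard` and `recommend` functions are hypothetical, and production recommendation engines use far richer signals and models than this.

```python
from collections import Counter

# Hypothetical viewing history: user -> set of shows they have watched.
HISTORY = {
    "ana": {"show_a", "show_b", "show_c"},
    "ben": {"show_a", "show_b", "show_d"},
    "cam": {"show_e", "show_f"},
    "dee": {"show_a", "show_c", "show_d"},
}


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(user, history, k=2):
    """Suggest items watched by the k most similar users but not by `user`."""
    seen = history[user]
    # Rank every other user by how much their history overlaps with this user's.
    neighbors = sorted(
        (other for other in history if other != user),
        key=lambda other: jaccard(seen, history[other]),
        reverse=True,
    )[:k]
    # Count how often each unseen item shows up among those neighbors.
    votes = Counter(
        item for other in neighbors for item in history[other] - seen
    )
    return [item for item, _ in votes.most_common()]
```

Here `recommend("ana", HISTORY)` returns `["show_d"]`, because ana's two closest neighbors (ben and dee) both watched it and ana has not.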

[00:15:23] SY: Ok, so when I hear clusters and, you know, the data points and the factors and the variables, I'm thinking statistics. Is this basically statistics?

[00:15:34] JN: So statistics are certainly a component of the machine learning data science process. And machine learning and data science—those are two terms that are really buzz-wordy takes on things that have been around for a long time. (Laughing) So data science in many ways is statistics, right? It's, it's advanced statistics. And now people don't say that they're statisticians, they say they're data scientists. And sort of machine learning is a branch of arti, artificial intelligence, but machine learning is sort of a revamped play on some of the deep learning techniques that were happening in the 90s that people were using to build recommendation engines and, and really build an understanding of humans and behavior based on past behavior, right? Building, building intelligent games and, and things like that. And so what makes it different from just being pure statistics is that you are building systems that need to continuously learn and adapt.

[00:16:29] SY: Right.

[00:16:29] JN: So statistics are really good when you have sort of a static model, but in the software systems, the more mature software systems that we're building, the model itself needs to adapt and update based on users entering the system or leaving the system or, or what have you. And so that's what makes it a little more interesting than plain old statistics, not that that's boring, (Laughing) but I think that's what the distinction is.

[00:16:54] SY: So when you are trying to build such a system, how much of it is using algorithms and libraries and tools that have kind of figured a lot of that stuff out versus you having to start from scratch and say, "ok, I need to understand every single part of how to make those cohorts and how to cluster data," you know, because it feels like we can go real, real deep, right? Like we can go all the way to the bottom and, and have to understand every single piece of math and equation that goes into it. Is that generally how it is? Do you have to understand every single layer? Or how much of it has been figured out already? 

[00:17:33] JN: In machine learning especially, the techniques or algorithms, if you will, those aren't the secret sauce. So Google, for example, open sourced their machine learning processing framework TensorFlow. And people were like, "why is Google doing this?" (Laughing) And...

[00:17:50] SY: Sounds crazy.

[00:17:50] JN: ...with machine learning. It isn't so much the algorithms. While there are some proprietary approaches and techniques and algorithms out there, the real advantage when it comes to machine learning and building intelligent systems is the data, right? And so a lot of the approaches when people are doing stuff with machine learning, the question that's always gonna be asked is "where are you getting your data from? Is it proprietary? You know, can anyone just go and get this data?" And two, "how are you going to continue to update that data and refine it?" That's the major question.

[00:18:21] SY: Ok. That makes sense. So it's, it's basically like, you know, I don't, I don't need to build a web framework. I can just use Rails. Like that part is figured out. It's open source. It's available. Anybody...

[00:18:34] JN: Yeah.

[00:18:34] SY: ...can look at it. What makes my app, my website, my, my thing special is the fact that it's a really great blog publishing platform or, you know...

[00:18:44] JN: Ex—yeah.

[00:18:45] SY: ...the application of it. So it sounds like it's the same idea with data science and natural language processing...

[00:18:49] JN: Exactly.

[00:18:49] SY: ...and that whole thing. 

[00:18:50] JN: Exactly. What's interesting are not, is not the wheel of the car, but it's actually the car and the interior and the—how the engine and, and all the—I'm, I'm gonna expose the limits of my knowledge about cars (Laughing) because I think I've, I think I've named all the parts... 

[00:19:02] SY: You did better than me.

[00:19:03] JN: ...that I actually know. (Laughing) But yeah. So, so, so you've got it spot on. That's what makes it interesting.

[00:19:07] SY: Ok. Well that's kind of a relief, I have to say. Because when I read about this stuff, I'm like oh man. (Laughing) That sounds like that's a whole other degree.

[00:19:15] JN: Yep. Yeah.

[00:19:15] SY: I gotta go back to school to learn this. So how much should we know, right? If TensorFlow already exists, if a lot of the stuff is already open and available, how much should we familiarize ourselves with, you know, should we have some type of comfort, some level of comfort in this world before we dig in? Or can we just start building?

[00:19:38] JN: Well I on Twitter—and I don't know who tweeted it—but I saw, I saw a tweet. Someone said, you know, if you're a developer, you should aim to understand one part of the stack lower than what you work on.

[00:19:51] SY: Oh. I like that.

[00:19:53] JN: Yeah. So the example is that if you, you know, if you build websites or if you're building, building a web framework, then you should have an understanding of the HTTP protocol, the networking even though that stuff is there for you and you're not gonna implement it directly, you should understand it. I think that's a good rule of thumb. 

[00:20:10] SY: Yeah. 

[00:20:10] JN: It certainly, certainly doesn't hurt to go deeper in the stack. You know, I'm gonna go back to my days as a math major. And part of the reason I switched to computer science was not only because I found it to be more interesting than I thought, but I also realized I'm not as good at math (Laughing) as I thought I was. But one of the things that I do remember from math is when you're learning derivatives. 

[00:20:33] SY: Yeah. 

[00:20:34] JN: They teach you the approach of taking the limit and doing the calculation manually, and then you try to solve one problem and you've written on the front and back (Laughing) of a piece of paper trying to get it right, eraser marks all over the place. And then the, the next class you go in and they're like, "ok, here are the rules, right? If it's an exponent, then you just break it down." And it's like, "woah, wait, wait. Why didn't you just tell... 

[00:20:56] SY: Why did you make me... 

[00:20:56] JN: ...me this right out of the box?" Right. It's like why do we waste time on that other thing. Right?

[00:21:01] SY: Yeah. 

[00:21:01] JN: But, but it give, it gives you an appreciation for the shortcuts, if you will, or the, the frameworks that have been provided to you if you at least have that understanding of the inner workings of what's going on.
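
To make the contrast concrete, here is the long way next to the shortcut for one function. The choice of f(x) = x^2 is an assumption for illustration; the episode does not name a specific problem.

```latex
\begin{align*}
f'(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
       = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\
      &= \lim_{h \to 0} \frac{2xh + h^2}{h}
       = \lim_{h \to 0} (2x + h) = 2x \\[4pt]
\frac{d}{dx}\, x^n &= n x^{n-1}
  \quad\Longrightarrow\quad \frac{d}{dx}\, x^2 = 2x
\end{align*}
```

The first two lines are the limit definition worked by hand; the last line is the power rule, which gives the same answer in one step.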

[00:21:13] SY: Yeah, and I assume also that it gives you the ability to tweak the system a little bit, right? Like if, if those rules don't work exactly for you, you have enough knowledge to know what you need to do or how you can use it differently or maybe you need to build your own thing. You know, you get to really understand that system enough to know how to change it if you need to.

[00:21:34] JN: Yeah, and that's, that's how I got started with Cinchapi. Cinchapi started with an open source database called Concourse. And, you know, when I was building these side projects and being frustrated by how you interact with databases and data sources and, and, and data layers, I was frustrated with it from a developer standpoint. And, you know, certainly I could've used any number of frameworks, but I actually said, "you know what? I'm gonna go a level deeper in the stack, and I'm gonna figure out how these databases work. And you know what? I'm gonna build my own. I'm gonna build the database that as a developer I want." So, you know, I ended up building Concourse, and I built an array of features, you know, automatic indexing for data so that if you want to do complex queries or ad hoc analytics, you didn't have to think about that upfront. You know, we built in version control so that you can do queries across time so that if you wanted to know what data looked like in the past, you know, you could just do that out of the box, and you don't have to manually set up revisioning or tracking for data changes or anything like that. But that came from being frustrated by the framework and then being able to go (Music) a level deeper in the stack and figure out what was going on and not only improving it, but building something that was a, was a better fit for what we wanted to do.
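
The two features Jeff mentions, automatic indexing and queries across time, can be illustrated with a toy in-memory store. This sketch is not Concourse's API or implementation; the `VersionedStore` class, its method names, and the use of a simple counter as the version are all assumptions made for the example.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy record store that indexes every write and keeps per-field history."""

    def __init__(self):
        self._clock = 0  # logical version counter (an assumption; not wall-clock time)
        # (record, field) -> parallel lists of version numbers and values
        self._versions = defaultdict(list)
        self._values = defaultdict(list)
        # (field, value) -> records that currently hold that value
        self._index = defaultdict(set)

    def set(self, record, field, value):
        """Write a value and return the version number assigned to the write."""
        self._clock += 1
        key = (record, field)
        if self._values[key]:
            # Keep the index current: the old value no longer applies.
            old = self._values[key][-1]
            self._index[(field, old)].discard(record)
        self._versions[key].append(self._clock)
        self._values[key].append(value)
        self._index[(field, value)].add(record)
        return self._clock

    def get(self, record, field, at=None):
        """Read the current value, or the value as of version `at`."""
        key = (record, field)
        versions = self._versions.get(key, [])
        if not versions:
            return None
        if at is None:
            return self._values[key][-1]
        # Find the last write whose version is <= `at`.
        pos = bisect.bisect_right(versions, at)
        return self._values[key][pos - 1] if pos else None

    def find(self, field, value):
        """Return records whose current value for `field` equals `value`."""
        return set(self._index.get((field, value), set()))
```

A caller never declares an index or sets up history tracking: `store.set("user:1", "city", "Atlanta")` followed by a later `set` to `"Oakland"` leaves `find("city", "Oakland")` working immediately, and `get("user:1", "city", at=v1)` still returns `"Atlanta"` when `v1` is the version returned by the first write.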

[00:22:56] SY: Coming up next, Jeff builds his own database. But to do that, first he needs to understand what a database actually is. He takes us on that adventure and shows how you can get started on your own data adventure. After this. 

[00:23:14] When I learned to code, I was so excited to finally bring my passions to life. I could build things that I really cared about and share them with the world. And the first step in sharing is getting a great domain name. That's where Hover comes in. They've got a really slick easy-to-use interface. They've got awesome domain names to pick from and they separate your domain from your hosting so you have full control and flexibility over your online identity. So go to hover.com/newbie to save 10% off your first purchase. That's hover.com/newbie. Link is in the show notes. 

[00:23:46] You want to get serious about learning to code, but where do you start? Flatiron School's got the perfect thing. They're offering their free 75-hour online prep course, where you dig into JavaScript, Ruby and more. If you're not sure where to start, start there. And when you're done, you can keep learning with their self-directed introductory courses, remote online web developer program or full-time in-person courses. Whatever your schedule, they've got options to help you reach your coding goals. To learn more, go to flatironschool.com/podcast. That's flatironschool.com/podcast. Link is in your show notes.

[00:24:24] DigitalOcean is the easiest way to deploy, manage and scale your application. Everything about it was built with simplicity at the forefront. Setting, deploying, even billing. Their support is amazing. They've got hundreds of detailed documentation and tutorials, so if it's your first time deploying an app, they've got great tools and community to make it nice and easy. Try DigitalOcean for free by going to do.co/codenewbie and get $100 of infrastructure (Music) credit. Link is in your show notes. 

[00:24:55] SY: So I know you mentioned that you had some experience and kind of started your data journey at Palantir, but when you were doing this, when you were doing Cinchapi and building your own database, were you already knowledgeable on how to do that? Or did you have to start from scratch?

[00:25:12] JN: No. I didn't learn databases in college. I guess my first—I remember I was building the website for our student government. That's the first time I, I actually worked with a database 'cause it was a, it was—we, we sort of had a, a CMS that we built internally, and it was backed by MySQL. And so I had to learn SQL and, and how to query data, how to store data. But beyond that, I didn't know anything about the internals of databases. All the stuff that I do, that I know now. I learned that stuff on the fly while trying to build a database.

[00:25:46] SY: Wow.

[00:25:46] JN: And I learned everything from, you know, what, what is an ACID transaction in a database, right? Which is the—ACID stands for atomicity, consistency, isolation, durability. Those are properties in a database that make databases safe, which essentially says if you store your data in a database, it won't get lost, right? And it's a very fundamental and technical property of databases. And I not only had to learn what those mean, but I had to learn how to guarantee those properties... 

[00:26:13] SY: Yeah.

[00:26:14] JN: ...in code that I wrote, right? I had to learn all that on the fly, and it was really fun. And now, now I'm like an expert in databases because of it.
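
Atomicity, the "A" in ACID, is the easiest of the four properties to see in action. The sketch below uses Python's built-in `sqlite3` module rather than a hand-built engine, so it shows what the guarantee looks like from the outside, not how Jeff implemented it. The `accounts` table, the names, and the `transfer` function are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts atomically: both updates apply or neither."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes BOTH updates above.
            raise ValueError("insufficient funds")


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical setup: an in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

A successful `transfer(conn, "alice", "bob", 30)` changes both rows. A failing `transfer(conn, "alice", "bob", 500)` raises `ValueError` and leaves both balances exactly as they were, with no half-finished state visible afterward.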

[00:26:21] SY: I'm trying to imagine if, if I said, "I wanna build a database today." (Laughing) Where I would start—I think I would start by googling "how do you build a database?" Like, like what, like what's...

[00:26:34] JN: Yeah.

[00:26:34] SY: ...the first step in that? 

[00:26:36] JN: Well that, that was, that was one of my questions. I was like, "how, how do I even..." 

[00:26:39] SY: Right. 

[00:26:39] JN: What's the first method that I write for this? 

[00:26:41] SY: Right.

[00:26:41] JN: Right? (Laughing) And it really was a great exercise because it reinforced just everything that I learned in school and along the way about building software and modular pieces. You know, building software to be testable and building software to be abstract so that you can reuse components. 

[00:27:04] SY: Yeah. 

[00:27:04] JN: And so yeah, if you wanted to build a database from scratch, the first thing you'd start with is, is your storage engine, the very basic sense of "ok, what is data? How does your database define data? How are you gonna actually store the bytes on disk?" And so I really had to do a deep dive... 

[00:27:20] SY: Wow. That is deep.

[00:27:22] JN: ...on—yeah. On, on binary formats, and I had to make sure that I was storing enough information about the data so that I could reconstruct the data in memory when I needed to, but... 

[00:27:31] SY: Wow. 

[00:27:31] JN: ...also not storing so much of it that I was being inefficient in how I was using disk space and therefore consuming too many resources. So yeah... 

[00:27:41] SY: Oh my goodness.

[00:27:41] JN: It was, it was a lot of fun.
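
The trade-off Jeff describes, storing enough metadata to rebuild a value without wasting disk space, shows up even in a very small binary format. The layout below (a 1-byte type tag, a 4-byte length, then the payload) is an assumed example and not Concourse's actual on-disk format; `encode`, `decode_all`, and the type tags are hypothetical names.

```python
import struct

# Assumed record layout: 1-byte type tag, 4-byte big-endian payload length, payload.
HEADER = struct.Struct(">BI")
TYPE_INT, TYPE_STR = 1, 2


def encode(value):
    """Serialize an int or str into header + payload bytes."""
    if isinstance(value, int):
        tag, payload = TYPE_INT, struct.pack(">q", value)  # 8-byte signed integer
    elif isinstance(value, str):
        tag, payload = TYPE_STR, value.encode("utf-8")
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return HEADER.pack(tag, len(payload)) + payload


def decode_all(buffer):
    """Rebuild every value from a buffer of back-to-back records."""
    values = []
    offset = 0
    while offset < len(buffer):
        # The header tells us how to interpret the payload and where it ends.
        tag, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        payload = buffer[offset:offset + length]
        offset += length
        if tag == TYPE_INT:
            values.append(struct.unpack(">q", payload)[0])
        elif tag == TYPE_STR:
            values.append(payload.decode("utf-8"))
        else:
            raise ValueError(f"unknown type tag: {tag}")
    return values
```

Each record costs 5 bytes of overhead. Dropping the tag or the length would save space but make the bytes impossible to interpret on their own, which is the balance being described.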

[00:27:43] SY: Yeah. So for the database that you made, what language did you write it in?

[00:27:49] JN: So I wrote it in Java.

[00:27:51] SY: Ok. 

[00:27:51] JN: And the story here is that Concourse actually started as a sort of wrapper around MySQL written in PHP because I was doing a lot of projects in PHP. And it ended up being the fact that this wrapper, sort of this framework around MySQL, was technically using MySQL, but it was really just abusing how MySQL was intended to be done, intended to be used. And so I had this, this PHP project, and I thought well maybe I should write a database in PHP. And then I, I looked into that and, and quickly learned why that's a horrible, terrible idea. (Laughing) And so I, I chose Java. And the...

[00:28:30] SY: Wait, wait, why was it, why was it a terrible, horrible idea?

[00:28:33] JN: Well in—so PHP in general is a language where—it isn't built for long-running processes. You know, like if you've got a, got a WordPress instance, right? Whenever someone comes to a WordPress site, all the context of what's happening in the PHP world is loaded every time you do something on the site, right? So it isn't like there's this big long-running process that's standing in the back and, and doing stuff, right? And databases really need that. And PHP isn't really good at that. So that was not a good idea. 

[00:29:07] SY: Ok.

[00:29:07] JN: And so I decided to write it in Java. And I wrote it in Java—I wrote Concourse in Java when I decided to actually write the database for a couple of reasons. One, Java was the first language I ever learned. So I had a, you know, a partial affinity for it. Two, Palantir used Java. Three, when you're thinking about building a low-level system, there are a couple of things you wanna balance. And here, here's where the optimization comes in. One, you certainly want a balanced performance, but two, when you're a developer that is one entering a problem space that you don't know a lot about, you certainly want to have a language that abstracts away the things that could trip you up. Right? So if I'm writing a database and I have to manage memory on my own, then if I have bugs I don't know if it's because the database itself is written poorly or...

[00:30:03] SY: Right. 

[00:30:03] JN: ...if it's because it's a memory management issue or it's a low-level bug. And so Java being a JVM-based language abstracted a lot of that stuff out of the way for me. And so I was able to focus again on the part of building the thing that I wanted to build that was really interesting. And then lastly, Java has a huge open source community. It has a...

[00:30:24] SY: Yeah.

[00:30:24] JN: ...huge open source infrastructure. Everything from the Apache Software Foundation to Google doing a lot of open source Java work. And so I sort of looked at all those factors—Java's a low, low-enough level language and has good performance that it's suitable for a database, but it's also a friendly developer language. I also did like the fact that it is a fairly verbose language because I could go back to my code five years from now and know exactly what, what it's supposed to do and why it does what it does. So those are some of the reasons I chose Java.

[00:30:56] SY: So whenever I think about picking a, a language or a framework for a new idea that I have, one thing that I'm trying to balance is do I pick a tool that I'm really familiar with and I, you know, I understand it. I know its limitations. I know what it can do. I know what it can't do, and I'm just really comfortable. Or do I pick something that is perfectly suited or at least very well suited for the application, for the end goal. And it sounds like in your situation there's a little bit of both. You know, it ended up being...

[00:31:27] JN: Yep.

[00:31:27] SY: ...a really good tool and it was something you were exposed to. But in general, how do you, how do you think about that?

[00:31:32] JN: Yep. So you definitely wanna look at your primary goals, right? So if your goals are strictly to learn, perhaps learn a, a new language or, or a new technology, then you've got a lot of flexibility there. And, and certainly if you wanna learn a language, you should, you should use that language. But if you're, if you're looking to learn—for instance, if you're, if you're saying well I really wanna learn about blockchain, right? Well you should pick something that you already know so that what you're learning is blockchain instead of learning...

[00:32:02] SY: Right.

[00:32:02] JN: ...another language in addition to blockchain 'cause then you can't really separate the two.

[00:32:06] SY: Right. 

[00:32:06] JN: I think it always depends on your goals. If you're building a product that's gonna be, you know, used either commercially or you're hoping to turn into a business or you, you wanna get a team around, then I think another consideration is how easy or how hard is it going to be for me to find other developers to join me on this project. 

[00:32:25] SY: Right.

[00:32:26] JN: Right.

[00:32:26] SY: Yeah.

[00:32:26] JN: And so yeah, if you're using, if you're using what I consider to be a hipster language, (Laughing) like, like Haskell or something, right? You know.

[00:32:34] SY: I love that.

[00:32:34] JN: Cool. Right.

[00:32:34] SY: I love that category of languages. These are the hipster languages. 

[00:32:38] JN: These are the hipster languages, right? If, if you're using one of those, then yeah, it's probably cool. It's probably very specialized, and, and it does something, but is there a knowledge base out there or a big enough community of people to help you if that's what you're looking for. But whatever language, you know, you have to decide if even though there is a wide community of knowledge out there, is that language trending towards the direction that technology is heading? Right? And so tons of factors go into that, but I think it always depends on what your goals for the project are.

[00:33:10] SY: Absolutely. So with something like "I wanna build a database," the first step sounds like it's figuring out well what is a database even? What is the core of that enough to know, "oh, I need to learn about bits and how much space it's gonna take and that, that sort of thing"? So it's almost like a learning what you need to learn step.

[00:33:32] JN: Yeah.

[00:33:32] SY: How did you figure that part out? 

[00:33:35] JN: Where I always start from is when you're thinking about problems, right? You think about what the problem or what that—or what the solution to the problem depends on. It's like ok, I wanna build a database. And I know that databases store things on disk. So how do I store the stuff on disk? Do I just like get a Microsoft Word file and, and write the stuff in there? Or is there something else that I need to learn. Ok, now I have to go through a process called serialization. Ok. What does serialization mean? Ok. Serialization is the process of taking objects that are stored in memory and representing them in a way where they can be stored on disk. And then the de-serialization, the opposite of that, is reading that stuff back from disk into the same object in memory. Ok. How do I do that? What, what are the rules for that? Well, there are no rules, but there are some frameworks. And so there was Apache Thrift, which is what we used in Concourse. And Apache Thrift was a serialization framework. And so you look at that, and you read the documentation there, and it has some terms that you've never heard of before. And you look those up. And so it's sort of this...

[00:34:37] SY: Yeah.

[00:34:38] JN: ...depth-first search way...

[00:34:39] SY: Yeah. (Laughing)

[00:34:39] JN: ...of understanding dependencies. And then you come back up, and you're like "ok, I know—I now know everything about how to store data on disk in the most efficient way possible."

[00:34:49] SY: Yeah.

[00:34:49] JN: Now what do I need to learn next? And then you go down the stack. And that's how it worked for me.
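Serialization and de-serialization, as Jeff defines them above, can be sketched in a few lines of Java. This is a minimal sketch using only the standard `java.io` streams, not Apache Thrift and not Concourse's actual storage format. The `Entry` class and its fields are hypothetical, and a byte array stands in for the file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SerializationSketch {

    // Hypothetical in-memory object that we want to store.
    static class Entry {
        final String key;
        final long value;

        Entry(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Serialization: turn the in-memory object into bytes.
    static byte[] serialize(Entry entry) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(entry.key);    // length-prefixed string
            out.writeLong(entry.value); // fixed 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // De-serialization: read the bytes back, in the same order they were
    // written, into an equivalent object in memory.
    static Entry deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String key = in.readUTF();
            long value = in.readLong();
            return new Entry(key, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = serialize(new Entry("age", 30L));
        Entry restored = deserialize(stored);
        System.out.println(restored.key + " = " + restored.value);
    }
}
```

A framework like Thrift takes over the part done by hand here: deciding the byte layout and keeping the writer and the reader in agreement about it.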

[00:34:53] SY: Interesting. So it's almost like embracing the rabbit hole.

[00:34:56] JN: Oh yeah. Yeah.

[00:34:57] SY: How long did it end up taking you to finish the database?

[00:35:01] JN: Well, you know, it's a, it's a continual effort, right? So it's, it's still under active development, but I do remember I at least wrote a working version of it, and then I rewrote it two times maybe. (Laughing) And, and part of that was because... 

[00:35:13] SY: That sounds right.

[00:35:14] JN: Yeah, the, the approach that I was using to store data on disk, I found out it was really good for reading data but it sucked for writing data. And it was not efficient, right? And so I had to go back and rethink it. And those are the kinds of challenges I love because I love optimization problems and building Concourse, you know, I really had to think about what am I trying to optimize? And how am I gonna sort of thread a needle through some really tight spaces to come out with a system that accomplishes a set of goals that I have for it? 
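Jeff does not say which storage layout he started with, so the following is not his design. It is a generic Java sketch, with hypothetical class names, of how a layout can favor reads over writes or the reverse: a sorted list makes lookups fast but forces each insert to shift existing elements, while an append-only log makes inserts cheap but forces each lookup to scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadWriteTradeoffSketch {

    // Read-optimized: values stay sorted, so lookups can binary search
    // (O(log n)), but each write may shift many elements (O(n)).
    static class SortedStore {
        private final List<Integer> values = new ArrayList<>();

        void write(int value) {
            int index = Collections.binarySearch(values, value);
            if (index < 0) {
                index = -index - 1; // convert "not found" into the insertion point
            }
            values.add(index, value);
        }

        boolean contains(int value) {
            return Collections.binarySearch(values, value) >= 0;
        }

        List<Integer> snapshot() {
            return new ArrayList<>(values);
        }
    }

    // Write-optimized: appending is cheap (amortized O(1)), but each
    // lookup has to scan the whole log (O(n)).
    static class AppendLog {
        private final List<Integer> values = new ArrayList<>();

        void write(int value) {
            values.add(value);
        }

        boolean contains(int value) {
            return values.contains(value);
        }

        List<Integer> snapshot() {
            return new ArrayList<>(values);
        }
    }

    public static void main(String[] args) {
        SortedStore sorted = new SortedStore();
        AppendLog log = new AppendLog();
        for (int value : new int[] {5, 1, 3}) {
            sorted.write(value);
            log.write(value);
        }
        System.out.println("sorted store: " + sorted.snapshot());
        System.out.println("append log:   " + log.snapshot());
    }
}
```

Deciding which of the two costs to pay is one form of the optimization question Jeff describes.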

[00:35:50] SY: So how long did the first version, the first... 

[00:35:53] JN: So the first version...

[00:35:54] SY: Yeah.

[00:35:54] JN: Yeah, so I think—I got that, you know, I started working on it in January. And by March... 

[00:35:59] SY: Nice. Ok. 

[00:35:59] JN: What, when—well this is one of the ones I had to rewrite, so maybe I should have taken, taken a bit more time with it, right? So I spent about three months to get a, to get a version—you call it a prototype—that I didn't really like. I didn't really love (Laughing) right? After that, I spent maybe another five or six months working on the second version, and that formed the basis. I didn't do a complete rewrite at that point. That sort of formed the basis of the approach that I was going to take. And I've been iterating on it ever since. 

[00:36:29] SY: Yeah.

[00:36:29] JN: And so—but yeah it's fun. I mean, it's five years now, and, and I still...

[00:36:33] SY: Still doin' it.

[00:36:34] JN: ...write code for it every day.

[00:36:35] SY: Yeah.

[00:36:35] JN: Yeah.

[00:36:36] SY: Very cool. So when you were creating I guess the first year development, was it something you were doing full time? Was it on the side? Were you, you weren't a student at this point. You had already graduated, right?

[00:36:46] JN: Yeah, so this was—so this is 2013, right? 

[00:36:49] SY: Ok.

[00:36:50] JN: So this is—I had been at Palantir for a while, and I'm doing this, you know, on the side, right? So I'm doing it nights and weekends. That—it's a common question I get, which is how do you balance those two? I always start by saying "well there are 168 hours in a week." And everybody gets that, right? And so it's about how you prioritize things in that 168. And of course there are sacrifices. 

[00:37:15] SY: Yeah.

[00:37:15] JN: I'm a father. I have two young children. I have a ten-year-old boy and a three-year-old girl, right? And so being, you know, involved with two companies is kind of like having my two kids. Blavity is like my ten-year-old son. (Laughing) Blavity is, is much older. What Blavity requires from me—similar to what my son requires from me—is to give him some more independence. In the case of Blavity it's to be—for the team to provide high-level thinking and mentorship and strategy whereas with Cinchapi—like my daughter—I'm a lot more hands on. Right? I'm, I'm still...

[00:37:47] SY: Yeah.

[00:37:47] JN: ...I'm still tucking her in at night, right? (Laughing) At Cinchapi, I'm still writing code.

[00:37:51] SY: Yeah.

[00:37:51] JN: And, and so that's where—how it's possible to balance that because they're in two different stages, and they require two different sets of skills from me. But that work ethic for me comes from this idea that for me, I believe in work-life flow, right? And...

[00:38:07] SY: Amen.

[00:38:07] JN: ...there's a continual—yeah, there is a continual sense in my life that I'm always thinking about a ton of different things that I wanna solve, whether they're personal things or professional things. And those things are constantly on my mind. That's where that work ethic comes from, and that's how I'm able to do it. And I love it.

[00:38:25] SY: Yeah.

[00:38:25] JN: I really do.

[00:38:26] SY: Your life sounds very full and exciting and full of adventure. (Laughing) So there you go.

[00:38:31] JN: Indeed. Indeed.

[00:38:32] SY: So I'm hoping that this interview has inspired people to maybe get into some data stuff and get into machine learning and hopefully find out that it doesn't have to be scary. We don't all have to build our own databases, right, to be part of your world. (Laughing)

[00:38:46] JN: Right.

[00:38:47] SY: Data-driven apps. What advice do you have for people who might be inspired to try this out and learn more?

[00:38:53] JN: So just dig in. The barriers to entry for tech—I don't want to say they're nonexistent, but they're minimal, right? If you've got an internet connection and a device, you can build stuff. You can code. You can learn. You can explore. And so my advice is one, just dive in. Create something. Build something even if you don't think it's gonna be something that, you know, you can launch into a big company and, and make, you know, millions of dollars off of. Build. And as a corollary to that, to that, build in the open, right? Build that portfolio of how you're learning because you never know who you're gonna inspire to themselves want to code or to learn from you, but also maybe they're gonna join your vision. And maybe you're gonna be able to create something that you didn't think would be possible and certainly probably wouldn't be possible if you were just doing it yourself. Learn as much as you can. That's my very practical advice to developers of, you know, any experience level. 

[00:39:51] SY: Yeah. Oh love that. That's great. So next, we're gonna do some fill-in-the-blanks. Are you ready?

[00:39:56] JN: I am.

[00:39:57] SY: Number one, worst advice I've ever received is...

[00:40:00] JN: So the worst advice I've ever received—and in general it's not terrible advice I would say 'cause the person that gave this advice to me they're, they're a great person. And, and they're one of my favorite people that I've interacted with professionally. But for me, it was bad advice because I remember this person telling me when early in my career when I was trying to chart out my growth plan. And it was like well do I wanna go down a technical path of, of sort of being a technical, a senior technical person on the team?

[00:40:29] SY: Right.

[00:40:29] JN: Or do I wanna go down the path of being a manager and managing people? And this person told me, "well, you've gotta choose. You can't do both." And I say that was bad advice for me—and I think bad advice for technical people in general—because oftentimes the world tries to put boxes around people and to, to fit you into boxes. But yeah that, that was bad advice for me because that person wanted me to put a box on myself. And I just—implicitly, I rejected that because when I wake up in the morning, I don't see limitations, right? I see certainly strengths and areas where I'm drawn and things that I think I'm naturally inclined to and, and things that I'm good at, but I don't wanna place a box on myself and say that I can't do all those things. It's just—it takes dedication, and it takes sacrifice.

[00:41:18] SY: Number two, my first coding project was about... 

[00:41:21] JN: So my first coding project was about—and this was in high school. I was in the International Baccalaureate program.

[00:41:28] SY: Nice.

[00:41:29] JN: So yeah, we took two years of computer science. And the first coding project I did was building an inventory management system. And it was for my IB project.

[00:41:39] SY: Wow. 

[00:41:39] JN: And it was—yeah, it was a system to allow people to check in for a store essentially to check an inventory. And then it has, it had a component for a cash register component where when people would buy merchandise, then the inventory would automatically update. And if inventory on the floor was low, it would automatically alert an associate that they needed to go to the back and restock.
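The system Jeff describes (check in stock, update it on each sale, alert an associate when the floor runs low) can be sketched in Java. This is not his IB project, which is not shown in the interview. The class name, method names, threshold, and alert wording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InventorySketch {
    private final Map<String, Integer> floorStock = new HashMap<>();
    private final List<String> alerts = new ArrayList<>();
    private final int lowStockThreshold; // assumed: alert when stock drops below this

    InventorySketch(int lowStockThreshold) {
        this.lowStockThreshold = lowStockThreshold;
    }

    // Check new merchandise in to the floor.
    void checkIn(String item, int quantity) {
        floorStock.merge(item, quantity, Integer::sum);
    }

    // Cash register side: a sale updates the inventory automatically and
    // raises an alert if the floor stock is now low.
    void sell(String item, int quantity) {
        int current = floorStock.getOrDefault(item, 0);
        if (quantity > current) {
            throw new IllegalArgumentException("Not enough stock for " + item);
        }
        int remaining = current - quantity;
        floorStock.put(item, remaining);
        if (remaining < lowStockThreshold) {
            alerts.add("Restock " + item + ": " + remaining + " left on the floor");
        }
    }

    int stockOf(String item) {
        return floorStock.getOrDefault(item, 0);
    }

    List<String> getAlerts() {
        return new ArrayList<>(alerts);
    }

    public static void main(String[] args) {
        InventorySketch store = new InventorySketch(3);
        store.checkIn("shirt", 5);
        store.sell("shirt", 1); // 4 left, no alert
        store.sell("shirt", 2); // 2 left, below the threshold
        System.out.println(store.stockOf("shirt"));
        System.out.println(store.getAlerts());
    }
}
```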

[00:42:01] SY: Wow. That's a real project.

[00:42:05] JN: That, that was a real project. Now the, now whether the execution of the project matches the, the pitch that I just gave you, you know...

[00:42:12] SY: Yeah.

[00:42:12] JN: That, that's another story.

[00:42:13] SY: That was the idea.

[00:42:14] JN: But that, that—yes, that was the intention. Yes. (Laughing)

[00:42:17] SY: How long did you work on that?

[00:42:19] JN: I'd say probably about a year and a half because we spent the first year of the, the CS course learning computer science, learning the, the fundamentals and the basics.

[00:42:29] SY: Yeah, yeah.

[00:42:29] JN: And then we really started to work on it in year two.

[00:42:31] SY: Very nice. Number three, one thing I wish I knew when I first started to code is...

[00:42:37] JN: So one thing that I wish I knew was that coding is, is subjective. Coding is not just about translating thoughts into computer syntax, but coding is about making decisions. It's subjective in the sense because you can solve a problem in many different ways.

[00:42:54] SY: Yes.

[00:42:55] JN: And you have to decide what is important and what do you need to optimize for. And if I had known that about coding earlier, then I probably would've never said I didn't want to go into computer science. Fortunately, I was able to see the error of my ways and I ended up in the field, but I just wish I had known that earlier.

[00:43:14] SY: Yeah, absolutely. That's one thing that I, I love about coding is realizing like "wow, I get to make a lot of decisions."

[00:43:21] JN: Yeah.

[00:43:21] SY: And as someone who likes to be in control of everything, (Laughing) I find that really...

[00:43:24] JN: Yeah.

[00:43:24] SY: ...appealing.

[00:43:26] JN: It is. It is. Absolutely. 

[00:43:27] SY: Yeah. Very cool. Well thank you so much, Jeff, for sharing all that amazing knowledge about data with us. You wanna say goodbye? 

[00:43:34] JN: Yeah, it was my pleasure. Thank you so much for having me on, and I thank the listeners for tuning in. And thank you so much.

[00:43:40] SY: And that's the end of the episode. Let me know what you think. Tweet me @CodeNewbies or send me an email at hello@codenewbie.org. Make sure to check out our local CodeNewbie meetup groups. We've got community coding sessions and awesome events each month. So if you're looking for real-life human coding interaction, look us up on meetup.com. For more info on the podcast, check out www.codenewbie.org/podcast. And join us for our weekly Twitter chats—we've got our Wednesday chats at 9 PM EST and our weekly coding check-in every Sunday at 2 PM EST. Thanks for listening. See you next week.

Copyright © Dev Community Inc.