CodeNewbie

Transcript

[00:00:00] SY: Hey, CodeNewbies! Before we start the show, I want to share a little teaser with you of another show I work on called DevNews.

[AD]

[00:00:12] SY: Hi there. I’m Saron Yitbarek, founder of CodeNewbie, and I’m here with my two cohosts, Senior Engineers at Dev, Josh Puetz.

[00:00:19] JP: Hello.

[00:00:20] SY: And Vaidehi Joshi.

[00:00:21] VJ: Hi everyone.

[00:00:21] SY: We’re bringing you DevNews. The news show for developers by developers.

[00:00:26] JP: Each season, we’ll cover the latest in the world of tech and speak with diverse guests from a variety of backgrounds to dig deeper into meaty topics, like security.

[00:00:33] WOMAN: Actually, no. I don’t want Google to have this information. Why should they have information on me or my friends or family members, right? That information could be confidential.

[00:00:42] VJ: Or the pros and cons of outsourcing your site’s authentication.

[00:00:45] BH: Really, we need to offer a lot of solutions that users expect while hopefully simplifying the mental models.

[00:00:53] SY: Or the latest bugs and hacks.

[00:00:55] VJ: So if listening to us nerd out about the tech news that’s blowing up our Slack channels sounds up your alley, check us out.

[00:01:01] JP: Find us wherever you get your podcasts.

[00:01:03] SY: Please rate and subscribe. Hope you enjoy the show.

[AD END]

[00:01:18] SY: Welcome to the CodeNewbie Podcast where we talk to people on their coding journey in hopes of helping you on yours. I’m your host, Saron, and today, we’re talking about how to learn data science and machine learning with Jay Feng, Cofounder and Head of Data Science at Interview Query.

[00:01:33] JF: Models themselves are just lines of code at its very core, but there are ways to basically encapsulate these algorithms into specific ideas.

[00:01:42] SY: Jay talks about how data science got him back into development after some bad coding experiences, how the different tools for machine learning and data science work together and whether or not machine learning is really as difficult as it sounds after this.

[AD]

[00:02:03] SY: TwilioQuest is a desktop roleplaying game for Mac, Windows, and Linux to teach you real world developer skills. Take up the tools of software development, become an operator, save the cloud. Download and play TwilioQuest for free at twilio.com/quest.

[00:02:20] Heroku is a platform that enables developers to build, run, and operate applications entirely in the cloud. It streamlines development, allowing you to focus on your code, not your infrastructure. Also, you’re not locked into the service. So why not start building your apps today with Heroku?

[00:02:37] Cloudinary is an end-to-end image and video management solution with a powerful API that lets developers upload, store, create, optimize, and deliver your media with ease. They have a generous free plan as well as advanced plans with enterprise configurations. So check out Cloudinary today at cloudinary.com.

[00:02:58] MongoDB is an intuitive, flexible document database that lets you get to building and MongoDB Atlas is the best way to use MongoDB. It’s a cloud global database service that gives you all of the developer productivity of MongoDB, plus the added simplicity of a fully managed database service. You can get started free with MongoDB Atlas at mongodb.com/atlas.

[AD END]

[00:03:28] SY: Thank you so much for being here.

[00:03:29] JF: Thanks for having me.

[00:03:31] SY: So Jay, you’ve had a variety of technical roles under your belt, but from my understanding your relationship to coding hasn’t always been an easy one. Can you tell me a bit about your coding journey?

[00:03:42] JF: Yeah, definitely. I started out coding back in college, took like a couple intro to computer science classes, got pretty bad grades in them, actually.

[00:03:53] SY: Oh, no!

[00:03:53] JF: Yeah, that was probably the most disheartening experience, I think, for anyone starting their journey and getting that. I think academic learning of coding is a lot tougher than you’d expect given how they kind of grade you against the bell curve and everything at school. But I really got interested in it again, actually, my senior year when I was introduced to data science in class and I restarted just my own like kind of coding journey on the side while I was majoring in electrical engineering. Effectively, that kind of major allowed me to then pivot into more of a data science path outside of class because I got to take a lot of really interesting classes that are related to coding, and eventually just started doing coding projects on the side and implementing data science classifiers, building models, and just writing a lot of blog posts about it. And that’s kind of how I got interested and started with coding. And after that, it just kind of took off.

[00:04:51] SY: So when you hit some of those road bumps, what pushed you? What kept you going?

[00:04:54] JF: Initially, it was a general interest in this concept of data science. I would say data science back then in like 2013, 2014 was this idea of analyzing data and somehow providing like these grand insights that no one else had access to or would even know about. And at that time, I think when it was kind of blowing up and companies like Kaggle were coming out with these competitions, I think it got really interesting in terms of being able to showcase your analysis and combine this technical aspect with writing, which is something that I really enjoyed. So I think being able to analyze a graph, analyze some data, produce these insights and write about it I think was probably like initially something that I found internally interesting, but also I know externally it was a factor of just trying to find a job like my senior year out of college and kind of worrying about like, “I needed to do something extra for my portfolio that would really kick it up into gear.”

[00:05:57] SY: Tell me a little bit about your career trajectory after school. What was that like?

[00:06:02] JF: Right after college, I actually joined this startup in Silicon Valley and it was called Inflection and basically it was like a people searching service. It’s very interesting. It was like what you imagine as white pages in that you would go to this website and you search someone’s name and basically a list of matching names would come up. It is kind of like those old ads you see for like people in the ’70s like, “Find your high school graduate class of 1975 here,” and that was basically what this company produced.

[00:06:34] SY: Cool.

[00:06:35] JF: Yeah. It was super interesting. It was kind of like this classic Silicon Valley startup experience where we had like a group of 20 new grads and we all joined and the founder was like some very kind of hippy guy, wearing these like toe-shoed sandals and would kind of preach about like the ethics of like the Valley and tech culture and stuff and we’d have like free lunch and massages. It was a very interesting experience because I think after the first like two months, everything kind of came crashing down and I exited the startup after like four months because I hadn’t done much.

[00:07:10] SY: Wow!

[00:07:11] JF: Yeah. It was a crazy experience.

[00:07:13] SY: When you say crashing down, what does that mean?

[00:07:16] JF: I think it exploded in a way that was very typical now of many Silicon Valley startups. So I think the first premiere note is that Facebook was obviously the biggest company at that time and still kind of is. We’re building a people search service that is kind of outdated for what Facebook is actually doing now. Right? You just type in someone’s name and you find them pretty easy. But I think ultimately, it was more about kind of like over-hiring, trying to launch new products that didn’t really work using our existing data and ultimately I think a lack of motivation on many sides. There’s this kind of like classic Silicon Valley trope of everything that is offered, all these free benefits, perks, and everything are a lot of times shielded from the fact that a lot of the stuff that you’re doing is making money in a very straightforward, albeit profitable way, but definitely not as grandiose of a vision as many have come to kind of expect. So I think this was something that was pretty similar to a lot of people’s experiences. After four months, I left to actually join Jobr where I met my current co-founder of Interview Query, Shane, and a lot of other people that I’ve come to highly respect and work with for the next few years at that time.

[00:08:36] SY: And what’s Jobr?

[00:08:37] JF: So Jobr was like a Tinder for jobs, so to speak. So you would swipe right to apply to a job and then swipe left to like pass on one. And it was definitely like a really interesting concept because what we saw was that for the most part, like mobile was taking over at the time, it was 2015, and there were a lot of big players in this space with Indeed, ZipRecruiter, Monster, et cetera. And effectively, I think, the whole goal of Jobr was to basically allow millennials to just apply to jobs really, really easily. We were in fact like the cell phone generation. So everyone was on their phones. And so the idea that you could apply to a job on your phone was essentially like a pretty straightforward concept. And what I was kind of hired in to do was to actually help build like the matching algorithm between users and the jobs that they would see. And this was pretty much the core concept of the app because you would essentially sign up, upload your resume, and then suddenly you would get a bunch of jobs. And if you got unrelated jobs, say you were like a student or like a computer science student and you got like nursing and trucker jobs, you would definitely delete the app immediately.

[00:09:51] SY: Right.

[00:09:51] JF: Yeah. But if you actually got relevant jobs, software engineering jobs, then you would actually swipe right and apply. We effectively made money off of each apply that we sent to like our job partners.

[00:10:04] SY: Nice.

[00:10:05] JF: Yeah, it was a pretty straightforward concept, like a great business with good unit economics where we were potentially like pretty close to being profitable, making money off of each user, and I think that was kind of like a really good introduction to startups and just how tech worked in like the industry where startups were kind of like booming in that day and age in terms of everyone wanted to be at one.

[00:10:28] SY: And that company was sold to Monster, which is a pretty big deal.

[00:10:32] JF: Yeah, definitely. It was like a great experience I think for me too, just because I had joined and then shortly after like six months or so we got sold to Monster. So I thought like, “Wow! This is great. I guess every startup is like this.” Right? You just instantly join and then you get acquired.

[00:10:49] SY: So how long had Jobr been around before it got acquired?

[00:10:52] JF: Yeah. So I think Jobr was actually about two years old before they got acquired, which is a pretty great turnaround rate, I think.

[00:11:00] SY: So the time you were at Jobr, tell me about the role you played, because when I think about machine learning, I think of something that’s really complex. It takes a lot of time, takes a lot of resources. And I’m wondering in that time, what were you able to accomplish, to contribute?

[00:11:15] JF: During that time, it was definitely you hit the floor running almost, and I would say that on my second day, I worked with a mentor in terms of shipping something into production. And I thought that was, one, like a pretty amazing experience just personally, because at my last job I had like basically walked the hallways for like six weeks asking for work and nothing had happened. And on this job, I basically got to push something that actually I could see on the app within the second day. So I think you just start off like running and I think having like a mentor is super helpful in terms of being able to pair program with them and like they show you the code base and exactly what you need to do. A lot of the initial work was just taking out and like fixing a lot of these bugs or working on projects that quite honestly, I think no one else wanted to work on because I was a junior programmer. And as a junior programmer, you’re tasked with learning as much as you can just intrinsically and then also contributing as much as you can as well without the greater knowledge that everyone else has. And so a lot of that was just kind of them pointing at stuff and telling me to work on it and then I would try to seek guidance from my mentors or other engineers and then try to basically work on that and ship it out with as few bugs as possible. And I think that is generally how most junior programmers should kind of like start out. I think a lot of them do end up like at bigger companies, writing tests and kind of doing some of the grunt work. I think being at a startup, the nice part is that you don’t have to write tests, but at the same time you still have to work on some of the stuff that is the next in line in terms of undesirable kind of work.
And so for me, that was a lot of data engineering, a lot of the machine learning aspects that were kind of more complex to implement such as like implementing our search functionality or our matching algorithm from users to jobs and doing so in a way that was well executed and well tested so that we could actually measure like our gains from implementing new algorithms and such.

[00:13:27] SY: So hiring a junior at a startup is interesting because on the one hand hiring a junior developer is very cost-effective is the way that I’ll put it. But at the same time, when you are a startup and you have to hit the ground running and things are always changing and shifting, it can be tough for junior developers to catch up and to kind of stay at that pace because you’re still learning, right? You’re a little bit slower. You need a little bit more time, a little bit more help. Why do you think they took a chance on you as a junior developer?

[00:13:55] JF: Yeah, that’s a great question and probably not something I’ve figured out, even though we still remain pretty close friends. I think the interview process was really interesting and it was almost like it was a test for them because they were using this new functionality on Jobr at the time and I had downloaded the app looking for data science jobs. And I swiped right on a job that was like data scientist at Jobr on the app that I was using.

[00:14:20] SY: Nice.

[00:14:21] JF: Yeah. And I was like, “Oh, okay. They’re dogfooding their product. That’s cool.” But the CEO actually reached out, called and asked me to come in the next day, for what was supposedly to check out the office, and I arrived into this room that was basically the size of like a studio with like two by four, a desk in the middle. The engineers were thinking I was like the FedEx delivery guy till the CEO kind of came in and was like, “Oh no, no, yeah, yeah. I remember this. We have an interview scheduled.” And it’s pretty funny. Yeah. I thought I was doing like a walkthrough of the office and instead I just sat down for like a four-hour interview.

[00:14:56] SY: Wow!

[00:14:57] JF: Yeah.

[00:14:57] SY: That’s intense.

[00:14:58] JF: Yeah, definitely. It was not what I expected and I think that’s kind of the continuing mindset and like theme that I’ve gotten from most startups since then and that that is kind of more of the norm. Things are hectic. Things aren’t very like well scheduled, but at the same time, everyone’s laser focused. And I think that experience really kind of hooked me in. I think what they were looking for was someone who was generally kind of within that culture, like willing to learn, willing to basically work on anything that they had in the never ending backlog, but to be able to do so and kind of learn it and adapt. And so I think the two things that are really necessary all the time are just being able to actively take initiative towards working on projects and also being able to learn from your mistakes at a quick enough pace so that you can continue to improve. And I think their whole goal is like finding a needle in a haystack. It’s like engineers are very much like undervalued by like these bigger tech companies and I think that’s effectively what most of startup recruiting still is, is just finding these really eager, really hungry, like junior developers that are really interested in working on something and can also prove that they can improve themselves over time after getting feedback.

[MUSIC BREAK]

[AD]

[00:16:38] SY: Explore the Mysteries of the Pythonic Temple, the OSS ElePHPant, and The Flame of Open Source all while learning the tools of software development with TwilioQuest. Become an operator, save the cloud. Download and play TwilioQuest for free at twilio.com/quest.

[00:16:56] No one wants to manage databases if they can avoid it. That’s why MongoDB made MongoDB Atlas, a global cloud database service that runs on AWS, GCP, and Azure. You can deploy a fully managed MongoDB database in minutes with just a few clicks or API calls. MongoDB Atlas automates deployment, updates, scaling, and more so that you can focus on your application instead of taking care of your database. You can get started free at mongodb.com/atlas. If you’re already managing a MongoDB deployment, Atlas has a live migration service, so you can migrate it easily and with minimal downtime then get back to what matters. Stop managing your database and start using MongoDB Atlas.

[AD END]

[00:17:47] SY: So after you left Jobr, what happened?

[00:17:50] JF: I ended up leaving Jobr/Monster after two and a half years and joined another bigger startup called Nextdoor. Nextdoor is kind of like a neighborhood social app in which you have a private community of your neighbors and you can basically kind of talk to them like Facebook or you can post things, talk about community events, share free food, free items in your neighborhood, and essentially just create like a community online and that was also like a great experience. I think it was kind of joining, as a data scientist, more focused on the analytic side of things instead of machine learning, which is more of the programming side, I think. Yeah, I kind of worked there for about a year on like the local division, focusing on real estate, local services, and then eventually the community health part, which is more about kind of content moderation and civility. And that was a great kind of experience in terms of learning more about what it’s like to apply data science at a bigger organization, especially one that was kind of focused in a growth mode as we had gotten a new CEO, Sarah Friar, from Square and kind of built out like an executive team during the time that I was there.

[00:19:06] SY: So at a company like Nextdoor where it’s essentially a social network for neighbors, right? Neighborhoods and neighbors.

[00:19:14] JF: Yeah.

[00:19:14] SY: Where does data science fit in? Because it doesn’t intuitively feel like a data science centric type of organization. So where did that fit in?

[00:19:23] JF: At Nextdoor, we split data science into two parts specifically, which were revenue based parts of the organization, such as ads, real estate, which I was on, and then like local offers, just basically monetization products versus actual engagement and data science kind of plays into product really easily. Initially, I think whenever you launch a product, especially at a small startup, you don’t run a lot of analytics because you are focused on engineering and you have general intuition. But once we get a little bit bigger, I think, as a company, there’s a focus more on creating measured gains for everything. And that means being able to measure the potential impact that you made on the business. And so for like an example of a project, I would say we would be wanting to launch like a new feature such as a messaging agent between users and real estate agents. And I think after analyzing some of the data myself, I saw that a lot of the messages that were going to real estate agents, which we always thought were leads or people being like, “I need a house, sell me one right now,” instead actually turned out to be like the neighbors selling to the agents by posting as like home staging services or like plumbing services or essentially trying to basically sell parts of their business to the real estate agent. And so we definitely saw that as being kind of not an actual good usage of the product. So what I was then tasked to do was to build an actual like leads classifier or like a spam classifier in some respects in which if a message came in and it was an actual good lead, we would like send a text message or something that would show that it was actually something that they should respond to quickly.
This is coming from the fact that a lot of analytics was run on this before in the sense that if we thought about it, a lot of the times like real estate agents wouldn’t respond fast enough to like these leads and we’d see like 30% of these leads were actually not responded to because the agents weren’t active on the app. And so a lot of this stuff is kind of like insights that you would never really think about. You have all these like prior assumptions about how your users, your customers all use the product, but then what happens is that once you dig into the data, you find out that a lot of this stuff was actually used in different kinds of ways by your customers. And I think being able to then differentiate and create new products from these insights is probably one of the key areas where businesses kind of like need data scientists to thrive as they adapt and grow their business, but also being able to build out new features that are actually important for the end customer without having to run like entire site-wide surveys and basically do something that’s not scalable.

[00:22:19] SY: So you mentioned the term spam classifier. What is a classifier?

[00:22:24] JF: So a classifier essentially looks at data and then essentially puts it into a specific bucket. So for the leads one, basically we took in texts from customers that would be sending messages to agents and we would either classify them as being leads and marked as important, or as spam and marked as not important. And I think this was effectively just a way to make sure that before an agent actually saw anything, we could have a model that actually would try to route like the correct messages to the agent or put it into their like other inbox folder. And at its core, it’s just a way for a machine to make decisions for a human.
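As a rough illustration of "putting data into a bucket," here is a minimal Python sketch. The keyword list, the function names `classify_message` and `route_messages`, and the sample messages are all made up for illustration; the classifier Jay describes was a trained model, not a hand-written rule like this one.

```python
from collections import defaultdict

# Hypothetical hints that a sender is selling a service to the agent.
SPAM_HINTS = {"staging", "plumbing", "discount", "our services"}


def classify_message(text: str) -> str:
    """Return the bucket for one message: 'lead' or 'spam'."""
    lowered = text.lower()
    if any(hint in lowered for hint in SPAM_HINTS):
        return "spam"
    return "lead"


def route_messages(messages: list[str]) -> dict[str, list[str]]:
    """Group messages into inbox folders based on their bucket."""
    folders: dict[str, list[str]] = defaultdict(list)
    for text in messages:
        folder = "important" if classify_message(text) == "lead" else "other"
        folders[folder].append(text)
    return dict(folders)


if __name__ == "__main__":
    inbox = route_messages([
        "Hi, we are looking to buy a house near the park.",
        "We offer home staging services for your listings!",
    ])
    print(inbox)
```

The point of the sketch is only the shape of the problem: one message goes in, one bucket label comes out, and the label decides which folder the agent sees it in.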

[00:23:13] SY: So I want to dig into that word model a little bit more because I feel like every time I hear about machine learning I hear two things. I hear, “I made an algorithm and I have a model,” and I don’t really understand what model means in this context. Can you maybe dive a little bit deeper and explore that?

[00:23:29] JF: Yeah. I would say that an algorithm is just a way that you use to effectively like rank or match things together and it’s some sort of like higher level concept that’s generally within something that I can’t even understand at this point, but modeling is I think a little bit easier because it surfaces these algorithms. It’s basically kind of like a placeholder for algorithms in a sense. So if you build a model, the model then consists of a ton of different algorithms that effectively do all this stuff behind the scenes that eventually take your input, which is something like a text message and then output it into one of three different classes and separates it. And I think that models themselves are just lines of code at its very core, but there are ways to basically encapsulate these algorithms into specific ideas in which one algorithm can connect one way of doing it. And then another algorithm can have another way of doing it and models are generally ways to just name like these algorithms into like higher level concept ideas.
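One way to picture a model as a named bundle of algorithms is the small Python sketch below. It is an illustration under assumptions, not how any particular library structures a model: the `Model` class, the three step functions, the spam vocabulary, and the 0.2 threshold are all invented for the example.

```python
import re
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Step 1: split a message into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tokens(tokens: list[str]) -> float:
    """Step 2: fraction of tokens found in a made-up spam vocabulary."""
    spam_words = {"services", "staging", "plumbing", "offer"}
    if not tokens:
        return 0.0
    return sum(token in spam_words for token in tokens) / len(tokens)


def decide(score: float) -> str:
    """Step 3: turn a score into a class label with a fixed threshold."""
    return "spam" if score >= 0.2 else "lead"


class Model:
    """A named bundle of steps applied in order to one input."""

    def __init__(self, steps: list[Callable]):
        self.steps = steps

    def predict(self, text: str) -> str:
        value = text
        for step in self.steps:
            value = step(value)
        return value


spam_model = Model([tokenize, score_tokens, decide])

if __name__ == "__main__":
    print(spam_model.predict("We offer plumbing services"))
    print(spam_model.predict("Looking to buy a three bedroom house"))
```

Swapping any one step for a different algorithm changes how the model behaves without changing how it is called, which is roughly the encapsulation being described.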

[00:24:42] SY: So as the engineer, are you the one writing the model? Is that the right way to think about it?

[00:24:49] JF: Yeah. I would say that a lot of the times the way that most data science orgs are structured is that the data scientists will build a model and then the engineers will effectively implement it.

[00:25:02] SY: Ah, okay.

[00:25:02] JF: And so nowadays, it’s become a lot easier to do this at a higher level, so that both the data scientists and engineers can work pretty closely together without knowing exactly how either way functions. But I think the modeling part, all the tuning, the aspects of where you’re basically trying to make the accuracy of the model better so that you’re predicting more correct like values instead of incorrect values is all focused on the data scientists and then the engineer doesn’t have to worry about that. All they have to worry about is making sure that the model works in production in the web app, so that when a user does send like a text message, it gets outputted to the real estate agent correctly.

[00:25:46] SY: So after Nextdoor, where did you go? What happened?

[00:25:49] JF: Yeah. So after Nextdoor, I actually decided to start my own company called Interview Query, which you mentioned in the very beginning, it is effectively like a data science interviewing platform for people that are interested in becoming data scientists and data analysts and machine learning engineers. And it’s definitely been like an interesting time in the life of the past seven or eight months or so in terms of just working on the solo project with another cofounder, Shane. What we’ve done is we’ve tried to bootstrap it from scratch and not take any kind of venture funding and just try to build like a small business on its own that can stand against all the other startups and kind of carve a niche for us within helping data scientists basically get jobs.

[00:26:39] SY: So what is your role at Interview Query?

[00:26:42] JF: I am the CEO at Interview Query and I work on effectively the product and marketing and then a little bit on the engineering side. Shane, my cofounder, works more on the technical side, on building the platform itself. My job is to essentially curate and create the best content out there for data scientists and data analysts that are looking to either get into the field without any experience or to just prepare for their next interview, if they’re already an experienced data scientist. And the way that we do that is just by surfacing practice problems and interview questions previously asked at different tech companies, just so that they practice and learn, as well as provide courses that help them understand the material and kind of learn data science through practice problems. Yeah, for us, I think we find it to be a really interesting pursuit because at our core mission, like we want to become a data science company that uses data science for our courseware. And so for us, our goals are to basically build engineering and build data science into the product to eventually become much more adaptive in terms of learning and can kind of tailor basically your studying practice based on your prior needs. Kind of like how we did so at Jobr where you’re basically taking a user and you’re just finding out the most optimal ways to tailor practice for them just like you’re trying to find the most optimal jobs for someone who has a resume.

[00:28:14] SY: So I want to switch topics to talk about learning about machine learning. So I think a lot of people, including myself, are very intimidated by machine learning because it’s such a powerful tool. It feels so complex. It feels like something that takes years and degrees to learn. Is it as hard as people think it is?

[00:28:33] JF: I definitely do not think so. Starting out, I think machine learning has become so democratized in the past few years to the point where you can find millions of tutorials online, just teaching you how to initially just create a classifier. Kaggle has done a great job of broadening the scope of machine learning to the overarching field. And so I think machine learning, while it seems intimidating because of its association with artificial intelligence and like Google artistic images that got created from like AI bots of like a zebra like in impressionistic form, I think that is like a level of degree that like kind of like the one percent are kind of doing. So it’s kind of like the intimidation when you go on Instagram or something or you go on YouTube and you see all these like pro skateboarders and snowboarders that are doing like massive tricks. What you do in machine learning can be very, very easy and can be applied to many different things, like now with just like high level packages and libraries that basically can do a lot of the work for you. But a lot of the intimidating parts, I think, of machine learning are just that kind of research based stuff that is very advanced, but at the same time, like you or me could write like five lines of code right now, given like a dataset and make a classifier, and it would take you about like five minutes.

[00:30:02] SY: So what are some of the tools and technologies I might want to be familiar with? What did you use at Jobr, Nextdoor, Interview Query?

[00:30:10] JF: I would say for getting started in data science, I think Python, R and SQL are probably like the best bets you need to just understand. I think three languages is a lot though, but I think starting out with Python is super useful. Each tool is really useful for different things. I think SQL is super useful once you get on the job. In terms of just learning machine learning, learning data science, I think Python is probably one of the best tools out there because it’s fully integrated with, I think, all aspects of data science now. Before it used to be just R was used for analyzing data and used within the academic circles and then Python was used for like machine learning and implementation and then Spark was used for like deploying big data on the cloud. But nowadays I think Python has kind of figured out how to basically get all of those things. So you can analyze your data, you can store it in a database, you can create machine learning algorithms all within Python and it’s become this fully kind of like really well built out tool by like the community that has really kind of embraced data science with Python nowadays.

[00:31:24] SY: So of those tools and technologies that you mentioned, I’d love to kind of walk through an example of how they actually work together. So if we go back to maybe that spam classifier we talked about might be a good example. If we were to go through that and figure out where Python and SQL, where all those things fit in? Can you kind of walk us through that process?

[00:31:46] JF: Starting out, all your data exists in a SQL database. And so all the messages from the users, all the user attributes, such as like a user’s name, how long they’ve been on Nextdoor, and then all the real estate agents, everything exists in SQL. So generally how it starts out is that you write a pretty complicated SQL query to get all your data out. You try to get your data out so that every single row is effectively like a data point. And so for the text classifier, that means that every single row is essentially then a message from the user to the real estate agent. So once you get all your data out, the most common analysis is then to take that data and put it into Python to analyze it. And I think this is where you do more of an “Exploratory Data Analysis”, otherwise known as like EDA in which basically you’re just looking at your data and trying to understand what it looks like. And this is really crucial for the pre model building process because you need to understand exactly how your data looks in terms of understanding more of like the general scope. So for example, if I looked at the data and I saw that every single message sent from the user to the real estate agent was completely spam, then that would basically give me the insight that we needed to fix this product. We don’t even need to build a classifier. We just need to somehow stop users from spamming real estate agents. Another example is like if I looked at this and I saw that the times in between 12:00 AM and 7:00 AM were the times where, for most of the real estate agents, like 99% of the messages were all spam, then that would give me insight that the time that the message was sent is like an interesting feature that we should pull out and create as its own column for more analysis for the model. So after you do all this analysis, generally you clean the data up a bit as well.
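The first two steps described here, pulling one row per message out of SQL and then exploring it in Python, might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the `users` and `messages` tables, their columns (including the `is_spam` label), and the four toy rows. An in-memory SQLite database stands in for the production store.

```python
import sqlite3

# In-memory database standing in for the production SQL store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, days_on_app INTEGER);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, sender_id INTEGER, body TEXT,
        sent_at TEXT, is_spam INTEGER
    );
    INSERT INTO users VALUES (1, 'Ana', 400), (2, 'Raj', 3);
    INSERT INTO messages VALUES
        (1, 1, 'Looking to buy a house', '2019-05-01 14:10:00', 0),
        (2, 2, 'We offer staging services', '2019-05-02 03:30:00', 1),
        (3, 2, 'Cheap plumbing for agents', '2019-05-02 04:05:00', 1),
        (4, 1, 'Can we tour on Sunday?', '2019-05-03 10:45:00', 0);
""")

# One row per message, joined with sender attributes, with the hour the
# message was sent pulled out as its own column.
QUERY = """
    SELECT m.body, u.days_on_app,
           CAST(strftime('%H', m.sent_at) AS INTEGER) AS hour_sent,
           m.is_spam
    FROM messages AS m
    JOIN users AS u ON u.id = m.sender_id
    ORDER BY m.id
"""
rows = conn.execute(QUERY).fetchall()

# Exploratory checks: overall spam share, and spam share overnight.
spam_share = sum(row[3] for row in rows) / len(rows)
night_rows = [row for row in rows if row[2] < 7]
night_spam_share = sum(row[3] for row in night_rows) / len(night_rows)

if __name__ == "__main__":
    print(f"rows: {len(rows)}, spam share: {spam_share:.0%}")
    print(f"spam share between midnight and 7 AM: {night_spam_share:.0%}")
```

If the overnight spam share comes out far higher than the overall share, as it does in this toy data, that is the kind of signal that would justify keeping `hour_sent` as a feature column.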
Basically you take out punctuation, you take out outliers, and then now you have a clean data set with all the features that you want as columns, which are essentially the attributes that go into the model, and then you do the model building aspect, which is pretty much now calling like two lines on like Scikit-Learn to basically build and train your model, which is I think more of the traditional data science step of taking your data set and creating a model out of it. Once you have your model, you can analyze it, check it for metrics, check the accuracy rate. If half of the messages are spam and your model classifies only half of them correctly, then you’re not doing a great job there because then your model is basically like someone randomly flipping a coin. So a lot of it is about checking if a model is doing better than what random chance or just a general baseline is. After you’ve finalized that and created a good enough model, generally, then you either hand the model off to an engineer or, I guess in Nextdoor’s case, the data scientists actually had to do the part of the implementation themselves, which is basically taking this file, which is the model, and then creating like a serving application for it in which you have like a machine learning kind of instance or box in the cloud that has its own API. You basically hook it up to this API so that it takes in like inputs, which would be the example features that you initially did from a user such as like the time the message was sent and like the text message. And then essentially once you feed that into what you call the API, the API should output something that is basically like a one or a zero for spam and then that information then goes back into the actual web application so that whenever now a new user messages a real estate agent, they’ll get back like a one or zero.
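
[Editor’s sketch] The early steps Jay walks through (one SQL query that returns one row per message, a quick look at spam rates by time of day, punctuation cleanup, and a baseline check) could look roughly like the Python below. This is a minimal sketch, not Nextdoor’s actual pipeline: the table names, columns, sample rows, and the "sent before 7 AM" rule are all hypothetical, and that rule only stands in for the predictions a trained Scikit-Learn model would produce.

```python
import sqlite3
import string
from collections import Counter, defaultdict

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, days_on_site INTEGER);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT,
        sent_at TEXT, is_spam INTEGER
    );
    INSERT INTO users VALUES (1, 400), (2, 3), (3, 1);
    INSERT INTO messages VALUES
        (1, 1, 'Hi, is the house on Elm St. still available?', '2020-09-01 14:05:00', 0),
        (2, 2, 'WIN a FREE cruise!!! Click now!!!', '2020-09-01 03:12:00', 1),
        (3, 3, 'Cheap loans, click here!!!', '2020-09-02 02:47:00', 1),
        (4, 1, 'Could we tour the place on Friday?', '2020-09-02 10:30:00', 0),
        (5, 2, 'FREE gift card, click now', '2020-09-03 04:01:00', 1),
        (6, 3, 'Thanks, see you at the open house.', '2020-09-03 16:45:00', 0);
""")

# Step 1: a single query that yields one row per message (one data point each).
rows = conn.execute("""
    SELECT m.body,
           CAST(strftime('%H', m.sent_at) AS INTEGER) AS hour_sent,
           u.days_on_site,
           m.is_spam
    FROM messages AS m
    JOIN users AS u ON u.id = m.user_id
    ORDER BY m.id
""").fetchall()


# Step 2: exploratory check. Does the hour a message was sent relate to spam?
def spam_rate_by_period(rows):
    counts = defaultdict(lambda: [0, 0])  # period -> [spam count, total count]
    for _body, hour, _days, is_spam in rows:
        period = "night" if hour < 7 else "day"
        counts[period][0] += is_spam
        counts[period][1] += 1
    return {period: spam / total for period, (spam, total) in counts.items()}


# Step 3: light cleanup of the message text (strip punctuation, lowercase).
def clean(text):
    return text.translate(str.maketrans("", "", string.punctuation)).lower()


# Step 4: the baseline any model has to beat (always guess the majority class).
def majority_baseline_accuracy(labels):
    return Counter(labels).most_common(1)[0][1] / len(labels)


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


labels = [is_spam for _body, _hour, _days, is_spam in rows]
cleaned = [clean(body) for body, _hour, _days, _is_spam in rows]

# Placeholder rule standing in for a trained model's predictions.
# A real evaluation would also score the model on held-out data.
predictions = [1 if hour < 7 else 0 for _body, hour, _days, _is_spam in rows]

print(spam_rate_by_period(rows))
print(cleaned[1])
print(majority_baseline_accuracy(labels), accuracy(predictions, labels))
```

In this toy data half the messages are spam, so the majority-class baseline matches the coin-flip accuracy Jay mentions, and that is the number a real model would need to beat before it is worth serving behind an API.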

[00:35:39] SY: Wow! That was very thorough. Thank you very much. That was great. Exactly what I wanted. That was wonderful.

[00:35:45] JF: That’s great.

[00:35:56] SY: Coming up next, Jay talks about some of the biggest mistakes that people make while trying to learn machine learning after this.

[MUSIC BREAK]

[AD]

[00:36:14] SY: Images and videos are the heaviest resources users have to download when they use your site or app. Achieving great experience with media was a complex and tedious task. But with Cloudinary, there’s a powerful API that employs sophisticated algorithms, machine learning, and automation to deliver great media experience on any device for any framework in web or a mobile app. At Dev.to, we’ve been impressed with Cloudinary’s serverless API platform and recommend it to developers as the one and only solution they can use to solve all their media problems.

[00:36:49] Over nine million apps have been created and run on Heroku’s cloud service. It scales and grows with you from free apps to enterprise apps, supporting things at enterprise scale. It also manages over two million data stores and makes over 175 add-on services available. Not only that, it allows you to use the most popular open source languages to build web apps. And while you’re checking out their services, make sure to check out their podcast, Code[ish], that explores code, technology, tools, tips, and the life of the developer. Find it at heroku.com/podcast.

[AD END]

[00:37:29] SY: In terms of learning these tools and technologies, where should I go? What should I do?

[00:37:33] JF: I would say that I think there’s a lot of really good courses on Coursera and then also Kaggle has like a really good kind of beginner introduction to machine learning as well. I would say that in general, I think finding like specific blog posts through Google is probably like the best bet that I’ve found, in that the way that I’ve always found learning data science the easiest is just having a problem set or like a problem that I need to face, such as doing some sort of like prediction based on a data set that I have or that I scraped off the internet, and then being able to then just kind of Google how to solve the problems that I need once I actually face them when I’m actually analyzing the data. And that has kind of helped me speed up my own learning process the fastest because at that point, you’re not really pigeonholed into like this huge scope of a problem of like how do you learn machine learning. It becomes a lot easier once you break it down, once you’re facing like a general problem that is a little bit more specific towards like something that you actually have right in front of you.

[00:38:42] SY: What are some of the biggest mistakes people make when they’re trying to learn machine learning?

[00:38:47] JF: I would say the biggest mistakes that people make would be that they do like the course or they find a tutorial on Kaggle and then they do one prediction algorithm and then they think they’re data scientists for life or just qualified to be data scientists. And I think the issue is that a lot of the coding parts can be fun and generally pretty ubiquitous, but once it gets to the actual practical application in terms of generating value for a business or doing something that actually creates new knowledge in terms of analyzing public or scraped data, I would say that that’s where it becomes a little bit more difficult because the actual tying of those two things together takes time and experience. And I think in terms of learning data science, I think it’s really good to like have those fundamentals down after you do the course or you do like that first dataset, but just understand that there’s so much more to learn and there’s so much more in terms of practical experience that you can get. And so it’s all about continuing to push yourself in terms of learning more things in that respect.

[00:40:03] SY: And what are your favorite tools and resources to become a data scientist?

[00:40:08] JF: Well, I guess Interview Query is a great resource from my perspective, but I would also say that I’ve seen a lot of good stuff on specifically DataQuest or DataCamp because they allow you to actually go into the code editor and work on specifically datasets and data challenges. So I think those have been very useful. Additionally, I think YouTube is like a very underutilized resource because there are so many different kinds of full stack data science projects on YouTube and I think being able to look through all of those is super helpful in terms of understanding the content and understanding what people are doing and understanding like the thoroughness of how they go through these projects.

[00:40:58] SY: Now at the end of every episode, we ask our guests to fill in the blanks of some very important questions. Jay, are you ready to fill in the blanks?

[00:41:05] JF: Yes, definitely.

[00:41:07] SY: Number one, worst advice I’ve ever received is?

[00:41:10] JF: To buy shares at my first company.

[00:41:13] SY: Ooh! Tell me about that. Interesting.

[00:41:16] JF: Yeah. I actually didn’t end up going through with it, but I think generally some people were bullish and I think that was quickly proven wrong after like a year or so.

[00:41:28] SY: Number two, best advice I’ve ever received is?

[00:41:31] JF: To focus on key points and keep it simple when communicating insights or any kind of work. I’m pretty bad at this, I can ramble on forever, but I think a lot of people also do that when they start out, and being able to keep it simple helps, because at the end of the day, there’s only three things in a TLDR that people need to know. I think that has helped me for like my entire career ever since I got out of college.

[00:41:59] SY: Number three, my first coding project was about?

[00:42:02] JF: Seattle’s rental analysis and this was analyzing the data on Craigslist on Seattle rents and I think that was kind of a blog post that blew up and really helped me get my first job because it got media attention and just having that kind of blog post analysis out there was definitely kind of just proving that data science is interesting, that I could do it, and it was helpful for just every kind of career building.

[00:42:35] SY: Number four, one thing I wish I knew when I first started to code is?

[00:42:39] JF: Definitely how much funner it is than it is in class, I think. Yeah. And I think this is a given for everyone that meets it in academia versus tries it on their own, but I think being able to try coding projects and data science projects outside of class or just a formalized kind of learning environment is super important just in terms of your own growth because it’s intrinsic motivation, which is something that no one can take away from you versus external motivation from the classroom.

[00:43:14] SY: Well, thanks again for joining us, Jay.

[00:43:16] JF: Thanks for having me.

[00:43:24] SY: This show is produced and mixed by Levi Sharpe. You can reach out to us on Twitter at CodeNewbies or send me an email, hello@codenewbie.org. Join us for our weekly Twitter chats. We’ve got our Wednesday chats at 9 P.M. Eastern Time and our weekly coding check-in every Sunday at 2 P.M. Eastern Time. For more info on the podcast, check out www.codenewbie.org/podcast. Thanks for listening. See you next week.

Season 13 EP 8 September 21, 2020

How to get into data science and machine learning Jay Feng
