Jason Scott

Co-Founder Archive Team

Jason Scott is the co-founder of Archive Team who speaks passionately on the never-ending and critical saving of online history. He has been a video game art director, unix administrator, documentary filmmaker and public raconteur.

Nikolas Guggenberger

Executive Director Yale Information Society Project

Nikolas Guggenberger is the executive director of the Yale Information Society Project and a Lecturer in Law at Yale Law School. His research focuses on the intersection of law and technology, specifically platform regulation, antitrust, and privacy.

Josh Puetz

Principal Engineer Forem

Josh Puetz is Principal Software Engineer at Forem.

Description

In this episode of DevNews, we cover companies rescinding job offers after they have been accepted. Then we speak with Nikolas Guggenberger, executive director of the Yale Information Society Project, about Justice Clarence Thomas arguing for categorizing some digital platforms as utilities and why this is a huge deal for the tech world. Finally, we chat with Jason Scott, co-founder of Archive Team, about their efforts to archive Yahoo Answers which is shutting down after 16 years.

Transcript

Hey CodeNewbies, since we’re in between seasons, we wanted to share with you the first episode of this season of one of the other shows I host, DevNews. In this episode, we talk about companies rescinding job offers after they have been accepted, Justice Clarence Thomas arguing for categorizing some digital platforms as utilities and why this is a huge deal for the tech world, and a team archiving Yahoo Answers, which is shutting down after 16 years. I hope you enjoy!

[00:00:10] SY: Welcome to DevNews, the news show for developers by developers, where we cover the latest in the world of tech. I’m Saron Yitbarek, Founder of Disco.

[00:00:19] JP: And I’m Josh Puetz, Principal Engineer at Forem.

[00:00:22] SY: This week, we’re talking about companies rescinding job offers after they’ve been accepted.

[00:00:27] JP: Then we speak with Nikolas Guggenberger, Executive Director of the Yale Information Society Project, about Justice Clarence Thomas arguing for categorization of some digital platforms as utilities and why this is a huge deal for the tech world.

[00:00:41] NG: I think it could create a completely new equilibrium in terms of what kind of functions these platforms take on.

[00:00:51] SY: Then we chat with Jason Scott, Co-Founder of Archive Team, about their efforts to archive Yahoo Answers, which is shutting down after 16 years.

[00:01:01] JS: If you look at the full story of Yahoo, if you spoil it, it’s basically two guys, create a directory of the internet, gather as much data as possible, and then delete it.

[00:01:13] SY: So during our little time off in between seasons, there were a couple of troubling tweets I saw from pretty prominent developers talking about tech companies who rescinded some candidates' job offers. So first, there was a tweet by Chloe Condon, Senior Cloud Advocate at Microsoft, who responded to a tweet thread by CEO of Fast, Domm Holland, which started off by saying, “We grew @Fast from 2 to 120 plus people in less than 18 months with an exceptionally high talent bar, primarily taking people from the top 1% of major tech companies or companies that have built incredible products at scale. Some things that worked,” down-pointing backhand index, to which Condon replied with several tweets saying, “Fast recently pulled an offer from my friend. Personally, I’m not a fan. Like, offer signed, 2 weeks’ notice at the previous job given. I’d love an explanation because I’m pissed. This would have been their first job in tech from a non-tech background. Imagine getting offered a life-changing amount of money in an offer, giving your 2 weeks on a Friday and waking up to an email with a revoked offer on Monday. This ain’t it, @Fast. Do better.” We reached out to Fast for comment and they replied, “This appears like it’s coming up because of a personal connection one of the hosts has with the mentor of this individual. There’s a conflict of interest there, and it seems unethical to pursue a story given that connection.” So clearly, we don’t agree with that statement, which is why we are talking about it, but that is their official statement and it’s only fair that we share it with all of you. So now let’s move on to a tweet by Laurie Barth, Senior Software Engineer at Netflix, who wrote, “Stop rescinding offers. It’s one thing if a candidate is hostile during negotiation, or really trying to drag things out, but so many of these stories are first-time developers trying to negotiate like we tell them to. What a horrible way to treat people.” So this is in response to companies pulling out of an offer after the candidate negotiated higher than they wanted, instead of just staying firm with the original offer. So I’d always heard of this as a fear that could happen with job offers, but I didn’t think it actually did happen. So I looked at the website Blind, which has company reviews, and there are apparently tons of posts about different tech companies rescinding their offers to candidates, in a lot of similar ways to the two that we just talked about. Now these are anonymous reviews. So we have to take them with a grain of salt and there’s a little bit of trust in there, but it does seem to shine a light on something that’s a pretty sucky thing to do. So is this new to you too? Or is this like a thing I just was never aware of?

[00:03:44] JP: I don’t know, maybe I’ll chalk it up to like urban legends. I think everybody has a friend, right? Everybody has a friend of a friend who had an offer rescinded. But to hear about this stuff happening, like much more frequently than I thought it would have occurred on Twitter recently, like in the last couple of weeks is really surprising to me.

[00:04:04] SY: I think I’m also surprised because we keep hearing about how developers are in demand and everyone needs a software engineer and there’s not enough developers. It feels like we’re the ones in power. You know what I mean? In that sense. And so the idea that you would go through, and let’s be real, like job interviews are intense, right? Especially tech job interviews. There’s a code assignment, there’s sometimes pair programming. Sometimes there’s a whiteboard. There’s like a six-hour interview with like 10 team members. It’s intense. There’s a lot that goes into it. And so I think what’s really surprising to me is it’s bad enough that the candidate has to go through that, but it’s also a heavy lift for the company as well. They’ve invested a lot of time into that process. So I guess what’s confusing to me is you invest all that time into the process on both sides. You get to the very, very end. You’ve picked what I think is probably your favorite person. Right? And then you’re like, “Okay, I’m ready. We’re done with this months-long process and we found the right person.” And I would feel like you would do, not anything, to keep it, but that it would take a lot to break it. You know what I mean? Just given the investment and given how hard it probably was to find that person. So the idea that all it takes to kind of break that agreement is simply one person asking for more money. I just don’t get that. I just don’t understand the logic behind that. It’s incredible.

[00:05:24] JP: Yeah. So some of the job interview experts and HR experts that I’ve read postings from on this topic, they said a lot of times offers will be rescinded for one of two reasons. And that could be that the company’s financial situation has changed, like, “We were going to hire 10 developers and now, oops, we have no money. So we’re not going to hire any developers or we just were acquired.” This has actually happened to me in the past where I got a job offer and then the company announced it was being acquired and everything was put on hold. So something could have happened behind…

[00:05:58] SY: Hiring freeze. Right?

[00:05:59] JP: Yeah. Yeah. Something could have happened behind the scenes with the company. The other reason that a lot of these job hiring experts pointed out was candidate behavior post offers. So that could be like you talked about the offer publicly. You did something that got you in trouble publicly that could possibly reflect bad. Maybe you’re Twitter’s bad character of the week. I mean, I don’t know. Something like that, but negotiating the offer? Oh, actually one thing that a lot of these experts did point out was coming back to the company after you had negotiated and then wanting to renegotiate, once you had an accepted offer, wanting to renegotiate. That was also kind of like…

[00:06:45] SY: Yeah. Yeah. Yeah. I get that. It’s a little grimy.

[00:06:47] JP: Yeah, a little grimy, but none of them ever pointed out like just trying to negotiate would be grounds for having an offer rescinded. It really breaks a lot of the advice that job seekers have had over the years to say, “Advocate for yourself. Stand up. Negotiate. It’s okay.” The worst that can happen is that a company will say no and that you’ll have to accept or evaluate the initial offer they gave you. If the worst that could happen is the offer gets taken away, that’s really bad.

[00:07:18] SY: Yeah. Actually I’m thinking that there is a third example that I’ve seen at a tech company. I can’t remember which one it was, where they made an offer. The candidate said, “I’d like to negotiate and also I would like a little bit of time to consider the offer.”

[00:07:33] JP: Right.

[00:07:34] SY: And they rescinded. And the reason they rescinded is they said, “Well, if we’re not your number one pick, then it’s probably not a good fit.” And I was like, “That is ridiculous.”

[00:07:43] JP: Yeah. And so many situations that I’ve heard about, I definitely feel for the candidate. I would feel terrible if this happened to me. I mean, that’s a huge fear I have, negotiating any job position, is that they’re like, “Oh, God! They’re going to take it away.” They’re just going to be like, “Oh, you’re terrible. We’re taking it away.” On the other hand, you got to think like, “Did you dodge a bad situation? What else is happening at this company that they have an attitude like we need to be your top pick, you need to take what we give you or the deal’s off?” That’s not a great attitude to start…

[00:08:16] SY: That’s not a great sign.

[00:08:17] JP: Yeah.

[00:08:17] SY: That’s not a great sign. Yeah. And as an employer, I’ve experienced kind of this feeling of the other person over negotiating and kind of negotiating so far outside the scope of what we think the job is worth. And I was just kind of like, “Maybe you’re not the right person for this job. Maybe you need to apply for a job that you think is a better fit because what you’re thinking is not what we’re thinking.” You know what I mean? There was such a huge gap between the offer and the counter that I was genuinely thinking, “It feels like you’re applying for a different job than what you were actually applying for. And maybe there’s a better fit for you.” So that’s kind of how I felt about it, but my response wasn’t, “Therefore, no, thank you.” My response was to clarify that. So we ended up getting to an agreement and it was fine, but I just can’t imagine just coming back and being so offended, so upset that I would just stop altogether because there’s a cost to stopping. Right? Because now you have to restart. It just seems really expensive.

[00:09:18] JP: Yeah, I actually do see this kind of behavior, not so much in the job market, but in buying a house. You a lot of times will encounter situations where the seller is offended because it was such a low offer or taking a lot of things emotionally. Same thing with the buyers, they'll be offended.

[00:09:38] SY: It’s very emotional. Negotiation is very emotional. Yeah.

[00:09:41] JP: Negotiation is incredibly emotional. I think, in our industry especially, we’ve been conditioned to think about these things dispassionately, like it’s just business, it’s not emotional, it’s not personal.

[00:09:52] SY: Biggest lie.

[00:09:54] JP: Yeah. As a founder, you can attest. It’s very emotional. It’s very personal. So I think a lot of times emotion can be playing into this from either side. I think the last thing I would just tell people is that we should clarify, this is a very US centric discussion.

[00:10:08] SY: That’s true. That’s very true. Yeah.

[00:10:10] JP: It’s legal. That’s like the scariest thing. It’s very, very legal. Employment is at will in most US states. So even if you have a signed offer, either party can walk away for whatever reason. I’m not saying it’s good form. And if you are accepting job offers and walking away from them regularly, or as a company if you are giving job offers and rescinding them, that will get around in the tech community. But unfortunately, it’s legal.

[00:10:40] SY: Yeah, absolutely. I still think people should negotiate. I still stand by that.

[00:10:45] JP: Absolutely.

[00:10:46] SY: But it is just good to know that it looks like there’s some risk and there’s apparently a little more risk than I thought that there was. So good to keep that in mind.

[00:10:56] JP: And in tech regulation news, Supreme Court Justice Clarence Thomas has argued that digital platforms such as Twitter, Facebook, and Google should be categorized and regulated as utilities, like water and electricity. This comes after the huge decision by social media platforms like Twitter and Facebook banning former President Trump after the January 6th attacks on the Capitol Building, which led to the deaths of five people. In his recent Supreme Court concurrence, Justice Thomas writes, “As Twitter made clear the right to cut off speech lies most powerfully in the hands of private digital platforms. The extent to which that power matters for the purpose of the First Amendment and the extent to which that power could be lawfully modified raises interesting and important questions.” We’ll put Justice Thomas’ full statement in our show notes. To help us shed some light on what the impact of this position by Justice Thomas is, we are joined by Nikolas Guggenberger, Executive Director of the Yale Information Society Project, an intellectual center at Yale Law School after this.

[MUSIC BREAK]

[AD]

[00:12:11] JP: RudderStack is the Smart Customer Data Pipeline. It makes it easy to build event streaming, ETL, and reverse ETL pipeline. It’s warehouse first. RudderStack doesn’t persist any of your data. It builds your customer data lake and your identity graph in the data warehouse and it’s open source. Sign up for free at rudderstack.com and give them a star in GitHub.

[00:12:30] Scout APM is the leading edge application performance monitoring designed to help developers quickly find and fix performance issues before the customer ever sees them. See why developers call Scout their best friend and sign up for your 14-day free trial today at scoutapm.com/devnews.

[AD END]

[00:12:48] SY: Here with us is Nikolas Guggenberger, Executive Director of the Yale Information Society Project. Thank you so much for joining us.

[00:12:55] NG: Thank you so much for having me.

[00:12:57] SY: So tell us a bit about your legal background and expertise.

[00:13:00] NG: So I’m a lawyer by training. I’m mainly working on issues relating to platform regulation and I look at platform regulation from an antitrust angle, from a competition policy angle, and also from a private law angle mainly, but I am deeply invested and interested in the legal frame of the digital public sphere, the part of the law that shapes the incentives and the structures of the digital public sphere that creates and maintains online platforms and really has led to the situation in which we are right now with a rather concentrated market of digital platforms.

[00:13:39] JP: Tell us a little bit about the Yale Information Society Project.

[00:13:42] NG: The Yale Information Society Project or ISP is a center at Yale Law School that was founded in 1997 by Jack Balkin. And it’s a center that’s dedicated to free speech, specifically free speech online, and everything that relates to law and technology. Our fellows cover a broad array of topics ranging from algorithmic discrimination to autonomous weapon systems, from content moderation to questions that relate to copyright online.

[00:14:17] SY: So can you help us sum up some of this argument that Justice Thomas put out in regard to regulating some digital platforms as utilities? Help us understand that a bit.

[00:14:27] NG: So this was really something that brought Twitter, at least the interested circles on Twitter, to a boil. So what Justice Thomas did here is he issued a concurring opinion in the case that was originally the dispute between the Knight First Amendment Institute and then-President Donald Trump. The question was whether or not President Donald Trump was allowed to moderate his Twitter feed, meaning define whether or not people could comment on the tweets that he sends, whether or not people can actually see these tweets. And the then ruling was that actually what President Trump had created was a so-called public forum, a place for public debates that is structured and maintained by the government. And that therefore he’s not allowed to moderate that at will. To the contrary, he has to tolerate that people subscribe to his feed, that people comment on his tweets and so on. And what then happened is the election happened and a new administration came in and that’s why the case became moot. It’s no longer relevant. Trump was no longer in office. And the court recognized that mootness of the case, but Justice Thomas issued an opinion that went way beyond that issue of the mootness. What he did is he broadly declared that he thinks certain digital platforms should be treated as infrastructure and that we should think about giving people access to these digital platforms in a similar way as we would give people access to infrastructure. And he saw a contradiction between declaring that President Trump was prevented from moderating, even though he has only minor control over his own feed, and Twitter on the other hand is actually allowed to moderate despite the fact that Twitter is able to define that communicative space online. That’s what his concurring opinion was about.

[00:16:52] JP: So what would it mean for these digital platforms hypothetically if they were to become public utilities? And what would that mean in terms of regulation?

[00:17:02] NG: So I think we need to take one step back and just clarify that under the current understanding of the First Amendment, but also just the way those platforms are structured. It is the case that these platforms may moderate any sort of content on their platforms by and large at will. So they can take down content. They can ban users. They’re not limited in doing so by the First Amendment. And that is because they are private actors. They’re not state actors. So the guarantees of the Constitution, the First Amendment, so the logic goes, does not apply to them. Now what does it mean if we were to change that calculus? It’s a very, very hard question because I think it could create a completely new equilibrium in terms of what kind of functions these platforms take on. And when you look at the opinion issued by Justice Thomas, then I see at least three ways to interpret that opinion. I think one is just a lack of understanding of how these platforms work. That is sort of the easiest interpretation of what’s happening, that justice here just doesn’t realize that this type of moderation is necessary for the functioning of online discourse. That’s kind of a mean way of characterizing the contribution by Justice Thomas. The next way of interpreting things that I see is like a cynical interpretation. It would be to say, “Well, what Justice Thomas is doing is pretty much the same as what some politicians, specifically on the political right have done, namely to implicitly claim that there’s some sort of bias in the way these platforms moderate content that there’s some sort of anti-conservative bias.” Now that bias doesn’t exist, but still it is leveraged by some to support an argument that calls for less moderation and more of an infrastructure like treatment of these platforms. In that sense, one can understand an opinion like that as sort of political retribution or retaliation. 
It’s like because Facebook is acting or because Twitter is acting in a certain way, government should pursue certain types of regulation. That’d be Interpretation 2. Interpretation 3, I think is the most interesting one. To understand this third interpretation, let’s assume the best. Let’s try to interpret it the best way possible and the best way possible to interpret that is to say, “If we turned platforms into common carriers, treated them as infrastructure, that doesn’t necessarily mean that there is no moderation. It just means that it’s not the platforms moderating.” And that’s an important distinction to make. So the most beneficial interpretation of this take would be to say, “Well, Justice Thomas just thinks that we should disaggregate things, implicitly thinks that we should disaggregate the functions of the platforms.” And one way to disaggregate the functions of the platforms would be to say, “It shouldn’t be the platform that decides over the banning or the takedown of content. It should be a different entity.” And I’m more than happy to dive deeper into that sort of third, what I think is the most interesting interpretation.

[00:21:00] SY: Absolutely. Yeah. Do you want to go ahead and dive in?

[00:21:02] NG: So what that could mean is it could mean that we have a platform that functions more like a dumb pipe, like something that just transmits information, like an internet service provider operating according to principles of network neutrality, maybe operating like a telephone company. And then you have different entities that apply filters to that content based on the preferences of the users. And this actually very closely resembles some of the recent suggestions for reform of the digital platforms. So there’s, for example, a paper from a group of researchers from Stanford and Duke that suggests something that they term middleware. That’s an idea where the platform, meaning Facebook or Twitter, is not necessarily the only provider of content moderation and filters, but it is a platform for other companies to offer filtering mechanisms and content preferences. What that would mean is that you could subscribe to a certain filter, I don’t know, the children’s safety filter, or you could subscribe to a filter provided by a media company, The New York Times filter, or you could subscribe to whatever other filter is offered to you. And that filter would take on the content moderation function. And at the end of the day, these suggestions are very, very similar to, I guess, the most beneficial interpretation of what Justice Thomas could theoretically implicitly have meant.

[00:22:59] SY: So tell me a little bit about the wider impact that you think this will have on the tech world as a whole. You mentioned that Justice Thomas was referring to just a few tech companies, Twitter, of course being one of them, but how do you see this resonating or this impacting the tech industry on a bigger scale?

[00:23:21] NG: That very much depends on the extent to which you would want to apply these common carriage obligations that Thomas mentioned here. And the reason why that matters most is because the question that’s important to Twitter and to Facebook is not really whether my or your tweet or share or like is displayed on the platform. What they care about is their core business model and the core business model rests on advertising revenues and to some extent the monetization of data, but to the largest extent, the advertisement revenues. And so it very much depends on whether you apply this type of thinking only to content or certain aspects of content or whether you apply it also to the money-making machine behind these platforms. The latter is what would really matter and would really change the calculus for these platforms. If it only, and I put “only” in invisible air quotes here, if it only concerns the content, then it might change the political calculus for the platforms as they would no longer be responsible for what’s happening on the platform, but it wouldn’t necessarily as much impact the economic calculus with one asterisk, and that is that the functioning of the platform would still be guaranteed. Why I mentioned that is because one consequence of just applying common carriage rules without creating sort of a market for filters or what others have called middleware would just not work because Facebook and Twitter would be flooded with content that nobody would want to see. And therefore, that type of social media just became unusable. And as a consequence, of course, Facebook or Twitter, couldn’t monetize the communication platform any longer either. So as long as whatever regulation or approach follows from that ensures that there are ways to filter and to moderate content online. And as long as it doesn’t touch the advertisement side of things, then the economic consequences are, I think, limited. 
If it were to extend to the advertisement side of things or if it were to kill the functionality of the platform, then regulation like that could be a real threat to platforms.

[00:26:13] JP: So obviously this is a statement of Justice Thomas’s personal opinion and it’s not an opinion in a Supreme Court case, but what do you think the likelihood is that someone will try to pass legislation that would turn this opinion into law?

[00:26:29] NG: Answering your question in a very narrow sense, there will certainly be people who will try to pass that. Answering your question in the broader sense is whether or not such an approach could find the majority in the House and/or the Senate and could actually become law at a national level. That I very much doubt, at least at the moment. What we might be seeing and what might be more realistic in political terms is if we focus not on communication platforms but on other types of online platforms, that’s where I see more potential. So let’s say Amazon and to question whether or not merchants should have access rights to sell things online, or when it comes to app stores, which I think take a position that’s somewhere between an e-commerce platform like Amazon and a speech platform like Facebook. And the other area where we might see something actually happening, that could be the state level, and then of course it would make its way through the courts.

[00:27:37] JP: To your knowledge, has a Supreme Court justice’s personal opinion or statements ever led to a law change like this in the past? I understand you’re not like a Supreme Court historian, but I guess I’m just curious if there’s any kind of precedent for Supreme Court justices calling for legislation and then it actually being enacted.

[00:27:59] NG: I think that’s an important point here to distinguish. Whatever Justice Thomas’s personal opinion is on whether that regulation should be passed is sort of irrelevant. His take is more interesting in terms of what he thinks the Constitution could tolerate if Congress were to take action. So the legal change here would not stem from at least the way Thomas sketches it out would not stem from the Supreme Court. The Supreme Court, at least in his opinion, would then just potentially tolerate the new legislation passed by Congress. But just a turn in the jurisprudence of the court wouldn’t have an immediate consequence. It would take Congress to pass common carriage like regulation or it would take an agency to create a framework like that or it would take a state to follow that route.

[00:29:05] JP: Right. I guess my question then is why do you think he’s making these statements? Obviously the role of the Supreme Court is to give us rulings about what is constitutional and what is not and him stating an opinion isn’t really meant to encourage the legislature to make such a law, what do you think his goal is in making statements like this?

[00:29:29] NG: Well, obviously it’s very hard to look into a justice’s head from the outside.

[00:29:34] JP: I’ve asked you to be a historian and now a psychic. I’m sorry.

[00:29:37] NG: Joking aside. So what he could be doing is it could be meant as a signal to the players that he thinks are most likely to pursue an agenda like that and to create legal arguments for them to make the case that whatever they’re doing is not unconstitutional. Imagine you’re a state legislator and want to pursue an agenda like that. I mean, you’re definitely going to take that opinion and attach it to your bill and say, “Look, what I’m trying to do does not run counter to the First Amendment at all. It’s actually encouraged by the Supreme Court or by one of the justices.” So that can be a very powerful argument in political debate and Justice Thomas is obviously aware of that. Another potential reason could also simply be that he’s stating his opinion and that he thinks that this is the way to go and he knows that by putting something in an opinion, he has a huge platform. It becomes an official document. And so if that is your opinion, then his most impactful way of articulating that personal opinion is to put it into a court opinion.

[00:30:59] SY: Yeah, that makes sense. Well, thank you so much for being here.

[00:31:03] NG: Thank you so much for having me.

[00:31:09] SY: Coming up next, we’re joined by Jason Scott, Co-Founder of Archive Team about their efforts to archive Yahoo Answers, which is shutting down after 16 years after this.

[MUSIC BREAK]

[AD]

[00:31:34] JP: Scout APM pinpoints and resolves performance abnormalities, like N+1 queries, memory bloat, and more. So you can spend less time debugging and more time building a great product. With developer centric UI and tracing logic that ties bottlenecks to source code, get the insights you need in less than four minutes without dealing with the overhead of enterprise-platform feature bloat. You can rest easy knowing Scout’s on watch to help you resolve performance issues with Scout’s real-time alerting and weekly digest emails. As an added bonus for DevNews listeners, Scout APM will donate $5 to the open source project of your choice when you deploy. Visit scoutapm.com/devnews for more information.

[00:32:10] RudderStack is the Smart Customer Data Pipeline. It gives you the flexibility to choose the tools you want to use without worrying how to connect your customer data. Instrument once with RudderStack to capture event data, then send it to your entire customer data stack. It integrates with over a hundred cloud tools with new integrations releasing all the time. Start building a smarter customer data pipeline today. Sign up for free at rudderstack.com.

[AD ENDS]

[00:32:36] SY: Here with us is Jason Scott, Co-Founder of Archive Team. Thank you so much for joining us.

[00:32:41] JS: Happy to.

[00:32:42] SY: So let’s get into a bit of your developer background. Where did it all begin for you?

[00:32:47] JS: I was at a game company where I got in as a temp back in 1995. I had done computers as a kid in the ’80s, MS-DOS and Amiga and some Macintosh, but I was really kind of thinking I was going to be some sort of filmmaker or game maker. So I got a temp job at a game company. I was there for about a year and it didn’t really pan out like a lot of them don’t pan out. So at the time, I went to Usenet newsgroups, found the jobs board, looked for UNIX and dollar sign and that’s how I got my job at a company that I was at for about 14 years.

[00:33:28] SY: Wow! That’s a lot.

[00:33:29] JS: Yeah, the name changed. The owners changed, but I had the same cubicles and I took care of customers, worked in mostly the UNIX system and doing a little bit of coding here and there, but that’s kind of where I was. And we were an internet connected company, which is what put me on a really nice pipe, being able to see things happen and also therefore love the early internet.

[00:33:55] JP: Awesome. Well, tell us about Archive Team and what it is and why you helped to create it.

[00:34:03] JS: So there were all of these beautiful sites in the 1990s with this credo of if you can get online, we have a home for you. And there was also this rather strange idea that to get people to adopt this internet, everything had to be free. And the enormous amount of costs that that was, was all hidden from people. So there were these servers running hundreds and sometimes thousands of homepages where people could make their own way on the internet and there were no rules and people were trying all sorts of things. We went like that for a good number of years. Then over time, they figured out, “Hey, we could actually charge money for this.” So a lot of that early internet was kind of left adrift in favor of what we now think of as social media networks. One of the side effects of being left adrift is that eventually somebody comes in and turns off the lights. And after watching this happen a few times where 10 and 15-year-old communities were being given 30 days to get your stuff off the server and we’re going to shut it all off, I started to say, “Hey, there’s a trend here. And I don’t like it. And we should have somebody who at least makes one more copy of it before it’s gone. This is all historical. There should be an A-Team that swings in and takes everything and gets it out of there before destruction.” And that’s how Archive Team was born in about 2009. And so I’ve always said that one of the greatest virtues of an archivist activist is kleptomania, the ability to go in and grab as much as possible without really giving much thought to whether or not it’s “useful”, which is always something people say, right? They try to devalue what was there before. “The old stuff was terrible. The old approach was worthless. Nobody knew anything.” The new is better. It’s cleaner, it’s nicer. And what we find later is wow, people really did things in a different but interesting way back then. It would be nice to reference it. 
As others who are smarter than me have said, “If you get rid of the past, you live in an eternal present, which allows anybody to manipulate what came before to indicate what’s now is the way it’s always been and always will be.” And you don’t want to do that. You want proof that people came up with an idea 20 years ago or that people thought this was an issue or how did they address this at the time when something now we’re trying to say.

[00:36:45] SY: So we wanted to have you on the show specifically to do a little moratorium for Yahoo Answers because your team is currently trying to archive its 16 years of data. So first off, for all the youngins in the audience, what is Yahoo Answers?

[00:37:00] JS: Sure. Ever since the beginning of what became Yahoo, there have been message boards and specifically message boards where people could ask questions and other people could answer. Now we know this is a terrible idea. Crowdsourced answers are an invitation for all sorts of yarns and making things up and dictating that a fact is a fact, even though it’s not true, but something about Yahoo Answers was very effective in terms of when your question got answered, when that miracle happened. It was great. It was free. And when it didn’t get answered, it was spectacularly bad. The fact that there are so many memes and humor references to how bad a Yahoo question or answer could be, to me, tells me that it was widely adopted, people who barely were getting by with their computer skills who still needed to know something were able to use it. And so Yahoo, in its infancy, was basically a directory of other websites. It was just meant to be you’re looking up agriculture, here’s every farm on the internet. And that obviously got wiped away by the idea of a Google or a Bing and even Yahoo had its own search engine for a while. But in the beginning, we just thought that the internet was going to be one big phone book and you just looked up, “I want to know about buying cows.” And it would go, “Oh, look here under the cows section and you’ll be able to find all the cows.” And one of them was maybe we should have a place where people can play games, a place where people could ask each other questions, and that’s where Yahoo Answers comes from. And as Yahoo goes through its very different phases, if you look at the full story of Yahoo, if you spoil it, it’s basically two guys create a directory of the internet, gather as much data as possible, and then delete it. That’s basically the whole story of Yahoo.

[00:39:12] SY: That is so accurate. Oh my goodness!

[00:39:13] JP: Yeah. Wow!

[00:39:15] JS: A lot of places sold out to Yahoo saying what a great company. And then as their fortunes went down and down, they would shut services off with no warning. So for me, they’re the nemesis of Archive Team because they make us work too hard. Our first major job was rescuing GeoCities. So we’ve tangled with them before. This is not a new situation of, “Oh, Yahoo is shutting down a multi-decade endeavor with 30 days’ notice.” The only part of this that’s really new for me is that they were still accepting answers.

[00:39:53] SY: Oh!

[00:39:54] JS: Like you can still ask questions and answers until I think it’s like the 23rd.

[00:39:59] JP: I mean, you got to get those answers.

[00:40:01] SY: Got to get them in.

[00:40:02] JS: Right, before they shut everything down on May 5th or whatever it is. It’s like saying, “We’re going to burn down this warehouse, but it’s a hot sale until tomorrow morning when we put the matches.”

[00:40:13] SY: Yeah.

[00:40:15] JP: So. Can you walk us through like some of the technical aspects of how you even go about archiving 16 years of data?

[00:40:24] JS: So like I said, Archive Team was founded in about 2009 with some things we were complaining about in 2008. And so we’ve been at this now for 11 years and I used to say, “I can’t wait till we’re no longer needed,” and it just keeps getting worse. And so we are still around and like EMTs, we really have a process now. When we first did it, it was literally like a bucket brigade. It was just people being assigned random swaths of GeoCities and making big zip files and handing them to each other. It wasn’t perfect. That’s all you’re going to get, that’s all you’re going to get, but we can do much better now. So when we save things, we save them into a format that’s compatible with the Internet Archive’s Wayback Machine. It’s called “Web ARChive” [WARC]. All of the code is public. So there’s nothing proprietary here, but Web ARChive format is very good for saving HTML and images in one big block. That’s the only thing to say. So when I have a user account, I will create anywhere from 1 to 20-megabyte work and it will be handed off. And that’s important to think about that these are being kind of put into a memento kind of situation, a yearbook or a compilation. So that’s the first thing. What we ran into a long time ago was that when your numbers start getting really huge, you can have a few people who have a lot of servers, but it’s also very helpful to have others who contribute bandwidth and time and machine processing. So if people are familiar with the concept of SETI@home, which is where you turn on your machine and it randomly tries to find aliens because it’s doing something for a central group, there’s also Folding@home, it’s the same idea. Years and years ago, I said that we should have something called “Archive at Home”. So there’s something called the Archive Team Warrior. Download it. Put it on your box. 
And you can watch as whatever we’re working on gets grabbed, packaged, and sent to our central processing where it then goes into the Internet Archive Wayback Machine. At this point, the Archive Team has an almost military precision. So first thing they do is they go, “Okay, what is that site? What does it consist of? Does it have user accounts? Do those user accounts have numbers or some sort of identifier that we can scroll through to find them all?” Sometimes we have to use links from each account to find other accounts and kind of hope we find all the unique accounts because there’s no central directory. And then we write customized scrapers and the customized scrapers go through and grab a pretty good copy of the website. Some of them have JavaScript and other tricks or quirks. They’ll all use a centralized image repository or they will require some weird click-through that we have to emulate. And once they write that core and they test it on a few hundred collections, they then start running it and then they open it to everybody. Because once it goes, the numbers are off the charts. The last time I was checking, we had grabbed something like six or seven million questions and we had 30 million to go. I haven’t checked. One pause while I check the leaderboard.
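As an illustration of the enumeration step Jason describes (scrolling through numeric identifiers and handing out chunks of work to volunteers), here is a minimal sketch. It is not Archive Team's actual code: the function names, the batch size, and the `example.invalid` URL template are all hypothetical, and a real project would also need fetching, retries, and WARC output.

```python
from typing import Iterator, List, Tuple

WorkItem = Tuple[int, int]  # inclusive (first_id, last_id) range


def make_work_items(first_id: int, last_id: int, batch_size: int) -> Iterator[WorkItem]:
    """Split a numeric ID space into inclusive ranges a volunteer can claim."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield (start, end)
        start = end + 1


def urls_for_item(item: WorkItem, url_template: str) -> List[str]:
    """Expand one work item into the page URLs a scraper would request."""
    start, end = item
    return [url_template.format(id=i) for i in range(start, end + 1)]


if __name__ == "__main__":
    # Hypothetical site layout: one page per numeric question ID.
    template = "https://example.invalid/question/{id}"
    for item in make_work_items(1, 10, batch_size=4):
        print(item, urls_for_item(item, template))
```

Splitting the ID space up front is what lets many machines work in parallel without coordinating with each other beyond claiming a range.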

[00:44:02] SY: Sure.

[00:44:03] JS: The leaderboard by the way is completely public. It’s tracker.archiveteam.org. And then you’ll see our various projects and you’ll be able to see that Yahoo Answers Part II is up there because we grabbed a copy of Yahoo Answers back in 2017 because we don’t trust these people, but there’s been a lot of questions since then apparently. So I’m seeing that we have 14.73 million questions grabbed, 5.28 million in the hopper being worked with and an estimated 31 million questions.
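For a sense of scale, the figures Jason reads off the tracker can be turned into a rough progress estimate. This is only a back-of-the-envelope sketch: the function and field names are made up, and the 31 million total is itself an estimate.

```python
def progress(done_m: float, in_flight_m: float, total_m: float) -> dict:
    """Summarize progress from counts given in millions of questions."""
    remaining = total_m - done_m - in_flight_m
    return {
        "done_pct": round(100 * done_m / total_m, 1),
        "claimed_pct": round(100 * (done_m + in_flight_m) / total_m, 1),
        "unclaimed_millions": round(remaining, 2),
    }


if __name__ == "__main__":
    # Figures quoted in the episode: 14.73M grabbed, 5.28M in progress, ~31M total.
    print(progress(14.73, 5.28, 31.0))
```

On those numbers, a little under half of the estimated questions had been grabbed at the time of recording.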

[00:44:37] JP: Wow!

[00:44:38] JS: And do you see that right now it’s compressed to a mere 1.42 terabytes, which is good. That’s because it’s mostly HTML and text. It’s not images. Sometimes we’ll do images and we will move up into literally half a petabyte, 500 terabytes or something. It’ll be really large, but this is such an old school webpage that it’s going to be fun.

[00:45:05] SY: So I imagine that just handling so much data, it’s probably a big challenge, but what are some of the other big challenges when it comes to digitally archiving anything and in particular Yahoo Answers?

[00:45:17] JS: So in this case, Yahoo keeps blocking us. And part of it is automated.

[00:45:21] SY: Really?

[00:45:22] JS: Part of it is people. Oh yeah. We’ve had this go around with them before. They block swaths of IPs. They try to prevent us from taking a copy. Sometimes because we’re going too fast. Sometimes because they just don’t want this to be copied. And so that doesn’t work. We are what we call a distributed preservation of service attack. So we go through a whole bunch of global IPs. But yeah, I guess that’s a surprise to some people that sometimes these sites not just shut down, but like lock people out. Long time ago, we had one group that shut down early once they found out we were doing this.

[00:46:06] JP: That’s just mean.

[00:46:08] JS: I know.

[00:46:08] SY: That is mean. Yeah.

[00:46:10] JS: Like most annoying activists, I’ve been paying so much attention to the problem and realizing how much of it is societal or economic and everything else that I have these very long, very elaborate views on it. Right? And that just comes from having to watch the heartbeat and the waves of the industry and how they do it and how they treat people. Like for instance, Archive Team knows that four times a year is some of the most dangerous time for websites because it’s the month before financial quarters, when somebody who works at the company wants to show they’ve made movement so they shut down a service that’s older to say, “Look, I’m going to save us $1.3 million a quarter by turning off this hosting.” So we’re like, “That’s dangerous.” It’s not based on like the merits or anything. It’s just literally, “This used to make a million dollars when we were making $4 million, but now we make $300 million. So who cares about the million dollars?” And maybe now it’s down to a whole $800,000 and we’ve been starving it of resources for 15 years. Of course, we’re geniuses. We are unmitigated geniuses. And so I have to watch that and I try to be, as you can hear, the voice of, if not reason, at least alternate reason, primarily at Archive Team, my job is to make noise these days.

[00:47:44] SY: So as we build products today, is there anything that we can do as developers to make future archiving easier and more successful? What can we do to help make your job easier for the apps that we bring up, that we have content for, and then we shut down?

[00:48:02] JS: So cynically, I’ll say you have nothing you can do because it will be pressed upon you by your owners to do the most economical institutional cheapness that you can and export and retention and all of that process doesn’t translate to money. So that’s a bummer, but it is possible for you to leave hooks, to leave a few minor hooks that you can sneak in while you’re doing doublethink. So making it so that there are unique identifiers that are in any way you want to be in an order that a person can decode later, that helps us. So even if you quietly put in a hexadecimal code in the URL and it increments in some basic way that somebody else can interpret from the outside, that’s helpful, even though nobody told you to do that, that’s just you saying, “Hey, why not?” There’s a lot of other things that I would love people to do, but frankly, we just need smarter customers, right? We just unfortunately need people to go, “Hey, this is great. Where’s the export function?” And I’ve done this, right? I strangled the social media in the crib. It was called Ello. They started up.
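The "hook" Jason mentions, a hexadecimal code in the URL that increments in a predictable way, might look something like the sketch below. This is one possible reading of his suggestion, not a scheme he specifies: the helper names and the fixed eight-character width are assumptions made for illustration.

```python
def id_to_slug(n: int, width: int = 8) -> str:
    """Encode a sequential record number as zero-padded lowercase hex."""
    if n < 0:
        raise ValueError("record numbers must be non-negative")
    return format(n, f"0{width}x")


def slug_to_id(slug: str) -> int:
    """Decode a hex slug back into the record number it came from."""
    return int(slug, 16)


def next_slugs(slug: str, count: int) -> list:
    """What an outside archivist can do: walk forward from one known slug."""
    start = slug_to_id(slug)
    width = len(slug)
    return [id_to_slug(start + i, width) for i in range(1, count + 1)]


if __name__ == "__main__":
    print(id_to_slug(255))            # slug for record 255
    print(next_slugs("000000fe", 3))  # the three slugs that follow 000000fe
```

Because the mapping is simple and reversible, someone with a single valid URL can work out the neighbouring ones without a central directory, which is the property he says helps archivists.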

[00:49:29] JP: Oh, yeah. I know that.

[00:49:31] JS: Yeah.

[00:49:31] SY: That sounds familiar. What was that?

[00:49:33] JS: It was a little tiny, wishes it was a Google circle, wishes it was a Facebook site that came up in Colorado and it was called Ello and it was the same old, same old. You could import your information into it and it would make you have a little circle and then you could link to other circles and it was great.

[00:49:53] SY: Yeah.

[00:49:55] JS: So I showed up and I got big into their DMs. And I remember this exchange where I’m just like, “Where’s your export function? Why would anyone even use you? How can we take our data out? How are you going to protect this?” And the person who was kind of in charge was like, “Man, we’re just starting out. You can’t expect us to have this. Why are you being this way?” And I’m like, “Because your neck is going to get so thick I can’t fit my arms around it.” He didn’t like that, but it’s a fact, like Facebook doesn’t listen to people like me. They made a temporary export function that people sometimes cite as, “Look, Facebook listened to the government.” It was like, “No, the Zuckman was going off to Congress and he wanted to show that they weren’t a monopoly.” And one way to show that is that your data is transferable. So they made a truly horrible export function because it could only export what you created. Therefore, it couldn’t export any conversations with the other party.

[00:50:59] JP: Oh!

[00:51:01] JS: So great. Thanks. I’ve got my pictures and I’ve got my statements on my pictures, but I have none of the comments, none of the reactions, none of the links. And yeah, no, it was purely a…

[00:51:16] SY: Theatrics?

[00:51:17] JS: Yeah. So mostly I say, “We should have a law,” which is not a popular stance in the tech industry. Boy, the tech industry is just so much like previous industries to go, “Oh, please, don’t burden us with laws.” And I’m like, “Well, you’re not burdened with societal responsibility. So apparently, it’s going to be a law. It should be like a tenant law.” It’s a major pain in the ass to kick a person out of a building, mostly. I mean, people get around it and there’s things they do, but it’s pretty hard. A person doesn’t just go home after work with a sign saying, “Sorry about your house. Thank you for the incredible journey.”

[00:51:58] JP: So we were going to ask you what your favorite Yahoo Answer was. It seems like such a frivolous question quite honestly, but do you have a favorite one?

[00:52:06] JS: Of course the classic is, “How is Babby formed?”

[00:52:09] JP: Of course.

[00:52:10] JS: Where the person shows how something like 15 different people ask pregnancy questions.

[00:52:17] SY: Yes!

[00:52:18] JS: The guy is reading it with the authority of a radio announcer.

[00:52:21] MAN: To become pregnant. Does anyone know how many teens get bregant a year? Are these systoms of being pregarnt? Girlfriend ain’t had period since she got pregnant. Is it possible having sex to an eight months pregnant? If a woman has starch marks on her, wait, if a woman has starch masks on her body, does that mean she has been pregnant before period?

[00:52:53] SY: Is there anything else you wanted to talk about that we haven’t covered already?

[00:52:56] JS: Yeah. I’ll give a shout out to the CEO who is the best CEO I ever dealt with. His name was Ted Rheingold and he created Catster and Dogster, which were the social media…

[00:53:08] JP: I remember Dogster.

[00:53:09] JS: Yeah, social media for pets.

[00:53:11] JP: Yes!

[00:53:12] JS: And he made the site and worked on it, loved it, photos of him with his animals, did an amazing amount of work, and he sold it and it lived under its new corporate masters for a while. Ted retired from it. And then they announced they were shutting it down. So we’re over here trying to figure out how to pull things out of Dogster and Catster. And Ted shows up and says, “Let me write some stuff for you. I know how to pull everything out of that site. I still have some back channel access and I’ll write something for you.” So he took one of my machines and basically wrote a Dogster and Catster extractor on his own because he was really concerned about all of these things and he worked really hard on it. And I did not know he was dying and he died sometime after that. So one of his last acts was ensuring that his thing he believed in, because he really did like pets, he really did like pet owners, he really thought that it was important, got saved. So you only get like one in a million, but there are a few nice ones, and Ted Rheingold was definitely one of them.

[00:54:28] SY: Wonderful. Well, thank you so much for joining us.

[00:54:30] JS: Thank you.

[00:54:41] SY: Thank you for listening to DevNews. This show is produced and mixed by Levi Sharpe. Editorial oversight is provided by Peter Frank, Ben Halpern, and Jess Lee. Our theme music is by Dan Powell. If you have any questions or comments, dial into our Google Voice at +1 (929) 500-1513 or email us at pod@dev.to. Please rate and subscribe to this show on Apple Podcasts.


Thank you to these sponsors for supporting the show!
