In June of 2009, Apple announced Snow Leopard, the new version of Mac OS X. Instead of adding big, impressive new features, Snow Leopard — the successor to Leopard — had a singular focus: refinement. Better performance, more polish, fewer bugs. Snow Leopard’s biggest feature was that it had no new features.
2023 will be Nebula’s Snow Leopard.
For the last four years, we’ve raced from milestone to milestone, adding new features, supporting new content types, and putting apps on new platforms. We’re proud of the work we’ve done, but when you’re constantly rushing to get to the next thing, you accumulate debt. Design debt, technical debt, expectational debt. We have tons of it. For Nebula to be what we dream it to be, we need to take the time to pay that off.
We recently posted a Reddit thread for subscribers to share their thoughts on the parts of the experience they felt could use improvement. When the dust settled, I was grateful for two things. One, every single post on that thread was thoughtful, kind, respectful, and supportive. The feedback was genuinely very useful. But two, there were absolutely no surprises for us. Every complaint we saw, every suggestion, every bug was something we’ve encountered or documented or captured from support emails.
Using Nebula should be a premium experience. When you sit down to watch a Nebula video, you should get lost in what you’re watching. When you’re looking for something to watch, it should be easy to find a great video or a great new creator. When you finish watching episode two of Jet Lag, Nebula should make it easy to get to episode three. Nebula should reward curiosity and exploration. This service costs you money, and our job is to create a low-friction system for you to enjoy the work you want to see from the creators you want to support.
“No features” doesn’t actually mean no features in the strictest sense. Recommendations, playlists, better player controls, and lots of other improvements are on the immediate roadmap. We consider these improvements to be “quality of life” features. Things that enhance the experience and remove friction. There are hundreds of little details, little pieces of the overall experience, that aren’t currently as refined as they could be. But they will be.
As we’ve discussed this plan with creators, staff, and audience, it has been truly galvanizing. I don’t think we’ve been as unified as a team on all fronts since the inception of Nebula itself. Sharing the plan publicly is partially about telegraphing our plan to the world, but also largely about keeping ourselves accountable. This is the mission: to polish the rough edges and make Nebula the best place to enjoy the videos and podcasts our creators produce. Every step we take this year will be in service of that mission. If you’d like to join us, keep reporting bugs and making suggestions. We’re listening.
Not only was this our biggest day ever for signups, it was also our biggest day ever for traffic. In this post I’d like to give you a peek behind the curtain, to see how Jet Lag has impacted Nebula’s backend services, and to show you one of the tools we have at our disposal when things become too much to handle.
What we’re dealing with
Here’s 30 days of combined traffic to all of our backend services, between October 25th and November 25th:
We see a clearly defined daily cycle, and then every Wednesday we have these large spikes. These are caused by Jet Lag episodes being released.
“How do you know they’re Jet Lag?”
The release of videos is something we keep a close eye on. If we zoom in on one of these spikes and turn on our push notification annotations, this is what we see:
It looks quite dramatic, doesn’t it? What we see here is a ~3x increase in traffic to our backend services over the course of about 40 minutes. This is substantial, but not horrifying. I’d like to focus on two things we have in place for handling these spikes: autoscaling and rate limiting.
The backend services that make up Nebula run on Kubernetes in AWS. Kubernetes is a container orchestration platform; you tell it what you want to run and it figures out how to do it. Explaining Kubernetes in detail is outside the scope of this post, but if you’re interested, the official website has a great overview: https://kubernetes.io/docs/concepts/overview/.
We make use of two different types of autoscaling: pod autoscaling and node autoscaling. You can think of pods as independent units of work. Each pod can have multiple containers working together inside of it, and those containers have to run somewhere. This is where nodes come in. Nodes are computers that can run pods; in our case, EC2 instances. Pods additionally specify how much CPU, RAM, and disk they need, so you can also think of them as having different sizes depending on how much of each resource they need.
We have multiple types of pods for each of our services. The most important for this post is what we call “web” pods. These run the containers that handle web requests to our APIs, and each service has at least 3 of them. They contain application code to serve video data, handle signups of new users, create and cancel subscriptions, and so on. It’s these pods that have to absorb the 3x increase of traffic we see when a new Jet Lag episode drops.
Pod autoscaling kicks in when a pod hits a threshold of resource utilization. Our web pod groups are configured to add in a new pod if the average CPU utilization of all pods is above 60%. We run a minimum of 3 web pods per service, for redundancy, and then we also configure a maximum in order to prevent excessive traffic from filling up our nodes and rendering other services unable to scale up.
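To make the threshold idea concrete, here is a minimal sketch of how a target-utilization autoscaler can pick a pod count. It follows the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric), with a default 10% tolerance). The 60% target and the minimum of 3 come from the text above; the maximum of 20 and the function name are made up for illustration, and this is not our actual configuration.

```python
import math


def desired_replicas(current_replicas, avg_cpu_percent,
                     target_cpu_percent=60.0,  # threshold mentioned in the post
                     min_replicas=3,           # minimum mentioned in the post
                     max_replicas=20,          # hypothetical maximum
                     tolerance=0.1):           # Kubernetes' documented default
    """Return the pod count a target-utilization autoscaler would aim for."""
    ratio = avg_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the pod count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp so one busy service cannot crowd out the others.
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    for pods, cpu in [(3, 90), (3, 30), (10, 180), (3, 62)]:
        print(pods, "pods at", cpu, "% CPU ->", desired_replicas(pods, cpu))
```

The clamp at the end is the part that matters most for the point above: the minimum gives redundancy, and the maximum stops a single service from filling up the nodes.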
Given that pods have a size, and a node can only run a finite number of pods, we also have…
Just like pods, nodes have a size. They have a finite amount of CPU, RAM, and disk space, and when they get full they won’t be able to run any more pods. For this reason, our Kubernetes cluster is also configured to add more nodes when it hits a threshold of utilization.
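As a rough illustration of why full nodes force new ones, here is a toy first-fit scheduler. It is a simplified model, not how Kubernetes or our cluster actually places pods: it only tracks CPU and RAM, and the class name, function name, and capacities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cpu: float            # total CPU cores on the node
    ram: float            # total RAM in GiB on the node
    used_cpu: float = 0.0
    used_ram: float = 0.0

    def fits(self, pod_cpu, pod_ram):
        return (self.used_cpu + pod_cpu <= self.cpu
                and self.used_ram + pod_ram <= self.ram)

    def place(self, pod_cpu, pod_ram):
        self.used_cpu += pod_cpu
        self.used_ram += pod_ram


def schedule(pods, node_cpu, node_ram):
    """Place (cpu, ram) pods on nodes first-fit, adding a node when none has room."""
    nodes = []
    for pod_cpu, pod_ram in pods:
        if pod_cpu > node_cpu or pod_ram > node_ram:
            raise ValueError("pod is larger than an empty node")
        target = next((n for n in nodes if n.fits(pod_cpu, pod_ram)), None)
        if target is None:
            # Every existing node is full: this is the "add a node" moment.
            target = Node(node_cpu, node_ram)
            nodes.append(target)
        target.place(pod_cpu, pod_ram)
    return nodes


if __name__ == "__main__":
    web_pods = [(1.0, 2.0)] * 6  # six hypothetical web pods: 1 core, 2 GiB each
    print(len(schedule(web_pods, node_cpu=4.0, node_ram=16.0)), "nodes needed")
```

Doubling the number of pods in this model roughly doubles the number of nodes, which is the same shape as the node counts we see between quiet and busy periods.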
That was a lot of words. Let’s see how it works in practice.
This is the same 30-day period we used earlier, only this time showing the number of nodes and pods present across all of our backend services. We can see it follows a similar daily cycle, with spikes in the same places we see spikes in traffic. At peak, we can see we’re using up to 37 nodes. This is double what we need at our quietest times. Autoscaling allows us to save money, as well as respond automatically to bursts of traffic.
Zoomed in on one of the spikes, you can more clearly see autoscaling reacting to a Jet Lag episode being released. Because this happens automatically, we rarely have to respond manually; we only need to tweak the maximums periodically to make sure we have plenty of headroom as we grow organically. It’s a solid foundation upon which we can grow without being too wasteful.
Sometimes autoscaling isn’t enough. We rely on relational databases, and each service has its own database. Unfortunately, one of the places where relational databases struggle is write scaling, and Nebula is a write-heavy workload.
“Wait, write-heavy? How is that the case?”
It’s a little surprising, isn’t it? Isn’t Nebula mostly about serving content to people? It is, but an important aspect of that is remembering where you got up to. All of our apps are periodically reporting your progress through our videos and podcasts, so that if you suddenly lose connection on one device you can seamlessly resume where you left off on another device. This traffic, at peak, makes up around 70% of all Nebula traffic.
One of the good things about this traffic is that, when push comes to shove, we don’t need to serve it. All of our apps are written such that progress is saved locally, and if requests to save that local progress don’t succeed, they are retried later. It’s not ideal to delay this progress syncing, but it’s a small price to pay to maintain stability of the rest of Nebula. This is a valuable pressure valve for us, and we have had to use it recently.
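The "save locally, retry later" behaviour can be sketched in a few lines. This is not the code in our apps; the class name, the `send` callback, and the video IDs are hypothetical. The idea is that only the latest position per video matters, and a rejected request simply leaves that position in the local queue for the next attempt.

```python
from typing import Callable, Dict


class ProgressSync:
    """Keeps the latest playback position per video until the server accepts it."""

    def __init__(self, send: Callable[[str, float], bool]):
        # `send` returns True if the server accepted the update and
        # False if it was rejected (for example, rate limited).
        self._send = send
        self.pending: Dict[str, float] = {}

    def record(self, video_id: str, position_seconds: float) -> None:
        # Overwrite older positions: only the newest one is worth syncing.
        self.pending[video_id] = position_seconds

    def flush(self) -> int:
        """Try to send everything pending; return how many updates succeeded."""
        sent = 0
        for video_id, position in list(self.pending.items()):
            if self._send(video_id, position):
                del self.pending[video_id]
                sent += 1
            # On failure the entry stays in `pending` and is retried later.
        return sent


if __name__ == "__main__":
    server = {"limited": True}
    sync = ProgressSync(lambda video_id, position: not server["limited"])
    sync.record("jet-lag-ep5", 10.0)
    sync.record("jet-lag-ep5", 25.0)
    print("sent while limited:", sync.flush(), "pending:", sync.pending)
    server["limited"] = False
    print("sent after limit lifted:", sync.flush(), "pending:", sync.pending)
```

Because the queue collapses to one entry per video, a long period of rate limiting does not turn into a flood of stale requests once the limit is lifted.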
This graph shows the percentage of traffic we are rate limiting over the last 30 days. Our logs show that all of our rate limiting is done against progress reporting requests, and we have a fairly consistent background rate of 5-7% of requests being rate limited. We’re strict with the rate limit on progress reporting, because we know the apps handle being limited well, so we try to skirt close to the expected rate of requests at all times.
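For readers who haven’t met rate limiting before, here is a minimal token bucket, one common way to implement it. We are not claiming this is the algorithm our services use; the class name and the numbers are assumptions chosen for the example. The bucket refills at a fixed rate, each request spends one token, and requests that find the bucket empty are rejected.

```python
import time


class TokenBucket:
    """Allows up to `rate_per_second` requests on average, with bursts up to `capacity`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second  # can be changed at runtime to tighten the limit
        self.capacity = capacity
        self._clock = clock          # injectable so the example is deterministic
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # the caller would answer with "try again later"


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = TokenBucket(rate_per_second=2, capacity=2, clock=lambda: fake_now[0])
    print([bucket.allow() for _ in range(3)])  # burst of three requests
    fake_now[0] = 0.5                          # half a second later
    print(bucket.allow())
```

Setting the rate close to the expected request rate is what produces a steady background level of rejected requests like the one in the graph.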
You can see a big section in the middle of this graph, though, where we’re rate limiting significantly more requests than normal. This was in response to Jet Lag episode 5. Here’s a close-up.
(Trivia: this was my birthday!) And to show why we responded the way we did, here’s a graph of our database CPU utilization and CPU credit balance over the same time period.
The credit balance falling triggered an automated alert, and our response was to make use of our ability to rate limit specific endpoints on-the-fly to start limiting video progress reporting. This relieves pressure on the database, and allows our credit balance to start refilling. CPU credits are a mechanism AWS use to allow you to have bursty CPU usage. Every second you’re above 50% CPU utilization, your credit balance goes down. Every second you’re below 50%, your credit balance goes up. When you hit 0, AWS either throttles you (really bad for us), or charges you more money (bad, but not as bad as throttling).
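The credit mechanism is easier to see with numbers. Below is a simplified simulation, assuming the 50% baseline described above and AWS’s documented definition of one CPU credit as one vCPU running at 100% for one minute. The vCPU count, starting balance, and maximum balance are hypothetical and are not the values for our database instance.

```python
def simulate_credit_balance(utilization_by_minute, start_balance,
                            baseline=0.5,       # baseline from the post
                            vcpus=2,            # hypothetical instance size
                            max_balance=500.0): # hypothetical credit cap
    """Return the credit balance after each minute of the given CPU utilization.

    Utilization values are fractions between 0.0 and 1.0. Each minute the
    balance changes by (baseline - utilization) * vcpus and is kept within
    the range 0..max_balance.
    """
    balance = start_balance
    history = []
    for utilization in utilization_by_minute:
        balance += (baseline - utilization) * vcpus
        balance = max(0.0, min(balance, max_balance))
        history.append(balance)
    return history


if __name__ == "__main__":
    # Twenty minutes at 90% CPU drains a small balance; quieter minutes refill it.
    busy = simulate_credit_balance([0.9] * 20, start_balance=10.0)
    exhausted_at = next(i + 1 for i, b in enumerate(busy) if b == 0.0)
    print("credits run out during minute", exhausted_at)
    quiet = simulate_credit_balance([0.2] * 10, start_balance=0.0)
    print("balance after ten quiet minutes:", round(quiet[-1], 2))
```

In this model, shedding progress-reporting writes is simply a way of pushing utilization back under the baseline so the balance can climb again.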
Since this happened, we’ve upgraded the size of our database instance for this service, and haven’t had to use this pressure valve since. The most recent episode of Jet Lag, episode 7, released and triggered no alerts, requiring no intervention from the backend team. Success!
It’s at this point you may be thinking: “why write this directly to the database at all? Why not keep this in Redis or some other in-memory data store?”
This has crossed our minds. It’s likely that the long-term future of progress reporting on Nebula is to move to using something where writes are cheaper and easier to scale, but for now we get a lot of benefit from only using relational databases. Introducing something new is a serious decision, one that would have us considering monitoring, alerting, disaster recovery, the effect it will have on onboarding new team members, the added complexity of another moving part. For now, paying a bit more for a beefier database instance made the most sense.
Today Nebula crossed 600,000 paying users. That’s users who are, right now, paying to use Nebula. In an amazing coincidence, it was exactly four years ago today — November 27, 2018 — that I sent an email to our creators titled “Standard Streaming”, pitching them the idea for building our own streaming service.
For context, we had been contacted by Vimeo about setting up an over-the-top streaming service for one of our creators. We thought this sounded a little absurd, especially given that creator’s frequency, but I wondered if maybe a single service with all of Standard’s creators might make sense.
[Extra context just in case: Nebula was built by a community of creators called “Standard” — now usually referred to as “Nebula Talent”.]
I thought it might be fun to share some of that first pitch. The only parts I’ve cut out are things that would reveal behind-the-scenes context that is either irrelevant or could give away important internal information. Also one line about an embarrassing subscription box thing we tried and would prefer to forget about.
Vimeo has approached us with a very interesting idea: they’d like to help us launch and run a Standard streaming video service.
The platform would be entirely ours. Our branding, our customer relationships, our money. We could operate the platform however we like, and charge whatever we want per month. They would get $1 per subscriber per month. Functionally it would be like Netflix — a website and app that loads videos from Standard creators, designed and curated however we choose.
I don’t know if this is a good idea. Maybe it isn’t. But I want to walk through it and see if there’s something in there.
How would this work?
Standard creators post ad-and-sponsor-free versions of their videos, and occasionally the service premieres exclusive (at least for some time window) content. Let’s say we charge users $5 per month for this service.
Haven’t other services tried this and failed?
It’s been attempted a few times, and a few creators have made announcement videos that haven’t aged well. I think it would be hard to convince anyone — let alone everyone — to make a big deal about their involvement. If it fails, everyone looks bad.
So, rule number one: nobody is required to promote this.
Normally a service like this would live or die by subscriber numbers. Because our primary revenue comes from sponsor commissions, and because our operational costs are fully covered already, we don’t need to worry at all about the service being profitable. It only needs to not lose money. Standard will not be financially impacted if this streaming service isn’t wildly popular.
We just happen to be in a uniquely good position to pull this off. We have the content, we have a platform partner willing to take on the operational hassle, and we have an organization already capable of managing the process.
How is this better than the NBC/SyFy thing, or YouTube Originals, or…
We get approached all the time about streaming services. I’ve collected pitches in the past for YouTube and NBC/Comcast, but everything we’ve sent over has been met with a dial tone. I don’t read any malice from this. These are very big organizations whose needs and desires don’t always align with independent content creators. And because they’re so big they don’t have to explain themselves to us.
This, in my mind, is why the only way this could ever really be successful is if the platform is owned by creators. We’re a smaller community and we answer to each other. If an idea is turned down or delayed, we can have a direct conversation about it. If an exclusive content idea is accepted, we can work more closely to make it all work without creators having to guess our goals or strategy.
What’s in it for us?
Funding for experimental original content, for one. I’ve been sending YouTube Premium content proposals, but so far YouTube hasn’t replied. Many of these ideas are really great, and wouldn’t be difficult to produce.
Sometimes it’s hard to justify getting experimental on a popular channel because the algorithm may punish you for it. With our service, we control the algorithm and there’s no ad revenue. Experiment away.
Standard thrives on data. This service comes with a fully-baked analytics API; no matter what happens we’ll get first-hand visibility into how audiences perform between shows and individual videos. The more users we attract, the more data we collect. I can’t even guess yet what we’ll learn from that.
I also really, really love anything that diversifies creator income (and Standard’s income) away from Google. And away from advertisements. More than anything, this would be a low-risk experiment in diversification.
What are the downsides?
My primary concern with this idea is opportunity cost. Is this where we should be putting our energy?
But I don’t think it would take much.
I imagine much of the effort from staff will be around making things low-effort for creators. For regular posts, we would just need ad-free cuts of the videos. Anything higher-effort than that would be paid for as exclusive content.
And, importantly, creators would always have the option to post this exclusive content to YouTube after three months (or whatever we come up with) with a sponsor attached. Since we’d be booking the sponsor in that case, we also win when the exclusivity window ends.
I have more calls with Vimeo over the next week or so to talk through pricing and business model details. This is very, very early. It’s entirely possible that my excitement for the potential of the idea is preventing me from seeing a major flaw. It’s definitely worth playing with concepts, and I think that — in this case, now that equity is about to happen — it’s important that I bring everyone into the spitballing process early.
This could be a terrible idea. Or it could be key for our future plans. There’s a lot to think about. I’d love to hear your thoughts.
What came next was a ton of enthusiastic discussion. More than I had expected. Six months later, we were live. (Without Vimeo, by the way. They wanted to own the credit card relationships with our customers, and that was a non-starter.)
It’s amazing to me how well this email has aged. The spirit and philosophy haven’t changed, and a surprising number of the details have survived the years. The part that has aged the worst is the assumption that it might not make any money. Oops.
I’d love to take credit — this is my email, after all — but I think it’s more a testament to how we work as a community. We come up with ideas and we discuss them together. Despite my title, I don’t get to do anything without the creators being in on it. Every project, every decision, we hash things out in groups before taking action. By the time this email was drafted I’d already been through enough of these conversations to have a pretty solid idea of what the concerns and benefits would look like.
I don’t think you could make Nebula without an incredibly thoughtful and empathetic group of creators working together. I’m proud and grateful to be a part of it.
Over the last three and a half years, Nebula’s weakest link has been the streaming video pipeline itself. Nebula is a bootstrapped service that has grown so much faster than any of us had anticipated, so a lot of the early technical decisions were made in the interests of limited time and even more limited resources. The net effect for users has been occasionally (but not predictably) spotty video as we’ve relied on third-party services.
Last year we began working on our own solution, and we’re happy to report that it’s starting to roll out now. Introducing Starlight, our custom in-house transcoding and distribution pipeline.
Starting now, all new 1080p uploads will be handled by Starlight. 4K videos are still going through the old system for now while we work to improve transcoding times. Our goal is to have Starlight handling 4K content by the end of the year. Once that’s up and running, we’ll begin transcoding catalog videos to bring everything in-house. No ETA on that yet. The priority is very high to get this done, but even higher to get it right.
One of the more common support issues we get is complaints that videos get stuck buffering and never play. The reason for this, almost always, is that our current provider uses H.264 for 4K video. Starlight uses H.265 and VP9, resulting in smaller file sizes that won’t cause browsers to choke and die. Once we flip the switch on 4K later this year, that problem should be solved. In the short term, Starlight 1080p means higher-quality video, especially on mobile, with better insight into and control over the system when problems arise.
Another common complaint is Cast support. Right now, videos cast in 480p. On a screen larger than your phone, this looks awful. (Honestly it doesn’t even look that great on your phone.) Starlight fixes this, allowing better video quality when casting to your TV. It also adds support for subtitles. An important win for accessibility.
There are other issues we don’t have clear answers for yet because our current provider offers no transparency into their process. (We’re not even allowed to know who the CDN provider is.) Even here, we’re very optimistic that running our own service will allow our team to better track down root causes and solve them for good. No more reaching out to third parties to help diagnose user problems.
For everyone who has run into problems, we appreciate your patience. The team has been hard at work behind the scenes and we’re excited to finally get to share progress.
So much of Nebula’s mission is about allowing the creators — the primary owners of the platform — better opportunity to control their own destiny. With Starlight, we capture a little more of that control on the technical and business sides as well.
Starting in a couple of weeks, we’re bringing Classes back into a single-tier version of Nebula, and keeping the price at $5 per month. This sounds simple, but there are a couple of reasons why it isn’t, and why it’ll take us a few weeks to get it moved over.
First, not everyone is on the same plan. Some folks come in direct with no creator code. Some come in via a specific creator to get the discount that creator offers. Some have upgraded from the Curiosity Stream bundle. Others are on legacy pricing. Still others are on various flavors of discount for Classes-tier pricing. Second, all of these users are scattered across web, iOS, Android, and bundle payment systems. Re-consolidating in a fair way requires some code changes and some artful application of policy.
What does this mean for you?
If you’re on an existing monthly or annual base plan, you’ll get Nebula Classes included.
If you’re paying more than the new pricing for monthly or annual plans, we’ll automatically move you to a lower plan.
If you’re a Curiosity Stream bundle user, you’ll be given an opportunity to upgrade to Nebula with Classes for $1 per month or $10 per year.
If you’re on an annual Classes plan over $100, you’ll be given a lifetime subscription to Nebula. You’ll never have to pay us again.
The product team has a little QA testing work to do before this is ready to roll out, but we’re aiming for the end of November. Current pricing levels will exist until then out of necessity.
As for Classes themselves, they are definitely not going away. We love Classes, the creators love making them, and we can see that the audience loves watching them. Videos, podcasts, Classes, newsletters. Originals, Plus, Nebula First. Classes are a key part of our exclusive offering, and we wouldn’t trade them for the world. We just want to make it easier for everyone to watch them, and easier for our audience to understand what they’re paying for.