April 30, 2021

Exciting Broadband News: Building a Software Stack from Scratch

The following transcript has been edited for length and readability.

Craig Corbin:

Welcome to the Broadband Bunch. A podcast about broadband and how it impacts all of us. The Broadband Bunch, as always, sponsored by ETI Software.

Craig Corbin:

Cloud-native architecture: the focus is on how to optimize system architectures by utilizing the unique capabilities of the cloud. Our guests today are at the heart of pioneering efforts to allow the construction of networks with cloud-native agility, at carrier scale and at much lower cost. RtBrick has pioneered carrier routing software capable of running on off-the-shelf hardware, applying hyperscale design principles to provide unprecedented scalability.

Craig Corbin:

Joining the Broadband Bunch, with more than two decades of engineering expertise at Alcatel (now Nokia Networks) and Juniper Networks, is the founder and CTO of RtBrick, Hannes Gredler. Alongside Hannes, bringing specialization in the intersection between technology and communications, is the vice president of marketing at RtBrick, Richard Brandon. Hannes, Richard, welcome to the Broadband Bunch.

Craig Corbin:

It’s exciting to talk about what’s going on and all the advancements that are being made today, seemingly at light speed. Before we get started, for those who might not be familiar with how RtBrick came about, guys, give us an overview.

Disruptive Change in Broadband Space

Hannes Gredler:

Well, where should I start? Richard, myself, and my co-founder Pravin Bhandarkar had all been working at a large traditional vendor in a previous life, and we started to see buying patterns change a little bit back in 2013, 2014, particularly with the hyperscalers. A whole lot of the large hyperscalers started to procure their own equipment from Taiwan and augment it with their own homegrown networking stacks. The goal was to get a competitive edge on cost and also to fix some of the time-to-market issues that they had with their own products. Pravin, who was my product manager, used to ask, “Hey Hannes, when do you think that wave is going to spill over to traditional telecommunication operators?”

Craig Corbin:

When you talk about disruptive change, that’s the concept that brings about tremendous opportunity in any industry, and certainly in the telco space. As I understand it, the approach was to rebuild a software stack from scratch, and that concept to me is mind-boggling. Talk about how you approached it.

Cutting-Edge Broadband Solution: Software Stack from Scratch

Hannes Gredler:

Once you get to maintain, I would say, a multiple-decades-old networking stack, you start to see the cracks a little bit. Software in general doesn’t really age very well. Usually with 7, 15, even 30-year-old software stacks, you have to put in an almost exponential amount of money and time to constantly keep them on life support. Part of that problem is rooted in the fact that these traditional software stacks are heavily optimized for certain hardware, which is of course very troublesome for porting them to new hardware, and for portability in general. We said, look, whenever we have the opportunity to build something from scratch, let’s really do it with portability in mind. It shouldn’t matter where we want to run it, whether on a server, on an x86 CPU, on a bare metal switch, or even on a SmartNIC. Your design has to have portability and modularity as a first principle.

Craig Corbin:

Right. And Richard, from your perspective, watching this development, what’s been your impression?

Richard Brandon:

As Hannes said, that divorce of hardware and software, which used to be brought together in one monolithic system, is pretty revolutionary. Telcos started building out internet networks maybe about 25 years ago, and really nothing has changed since. The chassis have got faster and the line cards have got faster, but if you look at an early monolithic routing system from 25 years ago, it looks pretty much like the ones you can get today. This is really the first huge shift we’ve seen in technology in that timeframe.

Craig Corbin:

You look at what has traditionally been a very long release cycle, on average 18 months in this industry. I’m curious, Hannes, about how that design process started, in order to get yourself into a position to roll out features in a matter of days instead of months. Talk about that if you would.

Next-Gen Broadband Rollout Times Reduced to Days with Software Stacks

Hannes Gredler:

If you go back and take a look at some of those traditional codebases, one thing really becomes obvious: there is a lot of copy-and-paste code. All the state maintenance code, routing protocols, drivers, chip programming, it’s usually a transactional model. There are private databases, private data structures that get manipulated, and then the flow of events starts. Now, if you look at those private databases, they’re all encoded using, I would say, handcrafted C code and common library functions. There was none of the abstraction that is normal to web app developers, who have a clear separation between front end and back end. Their back ends are properly designed databases with well-known schemas. Based on those schemas, you can derive almost all of the access code. You can auto-generate documentation. We were really looking for that schema-driven, database-driven paradigm first. That was the basis of how we started.
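
As a rough illustration of the schema-driven idea Hannes describes, a minimal Python sketch might look like the following. Everything in it is hypothetical and is not taken from RtBrick's software: the names `Attribute`, `TableSchema`, `Table`, `generate_docs`, and the `ipv4_route` table are assumptions made for the example. The point is only that one schema declaration can drive validation, access code, change notification, and documentation, instead of each being handcrafted per data structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Attribute:
    """One column of a table (hypothetical schema element)."""
    name: str
    py_type: type
    doc: str


@dataclass(frozen=True)
class TableSchema:
    """Declarative description of a table: its name, key, and attributes."""
    name: str
    key: str
    attributes: List[Attribute]


class Table:
    """Generic table whose validation and access logic is derived from a schema."""

    def __init__(self, schema: TableSchema) -> None:
        self.schema = schema
        self._rows: Dict[Any, Dict[str, Any]] = {}
        self._subscribers: List[Callable[[str, Dict[str, Any]], None]] = []

    def subscribe(self, callback: Callable[[str, Dict[str, Any]], None]) -> None:
        # Subscribers are notified on every change, so the "flow of events"
        # starts from the table rather than from hand-written glue code.
        self._subscribers.append(callback)

    def upsert(self, **values: Any) -> None:
        expected = {a.name: a.py_type for a in self.schema.attributes}
        if set(values) != set(expected):
            raise KeyError(f"expected attributes {sorted(expected)}")
        for name, value in values.items():
            if not isinstance(value, expected[name]):
                raise TypeError(f"{name} must be {expected[name].__name__}")
        self._rows[values[self.schema.key]] = dict(values)
        for callback in self._subscribers:
            callback(self.schema.name, dict(values))

    def get(self, key: Any) -> Dict[str, Any]:
        return self._rows[key]


def generate_docs(schema: TableSchema) -> str:
    """Derive human-readable documentation directly from the schema."""
    lines = [f"Table {schema.name} (key: {schema.key})"]
    for attr in schema.attributes:
        lines.append(f"  {attr.name} ({attr.py_type.__name__}): {attr.doc}")
    return "\n".join(lines)


# Hypothetical example schema for an IPv4 route table.
ROUTE_SCHEMA = TableSchema(
    name="ipv4_route",
    key="prefix",
    attributes=[
        Attribute("prefix", str, "Destination prefix in CIDR notation"),
        Attribute("next_hop", str, "Address of the next-hop router"),
        Attribute("metric", int, "Route preference, lower wins"),
    ],
)
```

Adding a new table in this style means writing a new schema rather than new handcrafted storage code, which is one plausible reading of why a schema-first design shortens release cycles.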

Craig Corbin:

I know that with any advance like this, there can sometimes be a struggle of sorts between the guys on the networking side of things, looking to protect the status quo, and the IT guys, looking to get things into production more quickly. In this case, Hannes, who’s going to win?

Hannes Gredler:

Well, we placed our bets early. We quite literally almost bet the farm on the IT guys, because what they had to offer in terms of automation and IT workflow was clearly superior and was also taking the cloud world by storm. But still, when we shipped the first flavor of our product, we had this nice back-end store, with everything accessible using APIs, and we still had to pass through the qualification stage of the networking guys. Then you have this last guy, and he says, “Hey yeah, this is all looking great, but where’s my CLI?” We had to go back and say, “Okay, we really have to make this a bridging technology.” That means bridging to tomorrow, where everything is fully API-fied, while not losing the traditional operators, the traditional router jockeys who are just used to their standard CLI, their standard YANG models, and all of that.

Craig Corbin:

We’re visiting with Hannes Gredler, the founder and CTO of RtBrick, and alongside Hannes, the vice president of marketing, Richard Brandon. Richard, Hannes mentioned a moment ago when things were initially rolled out. I’m curious, what was the initial reception?

Broadband Game Changer Embraced by Tier 1 Telco

Richard Brandon:

One thing that’s perhaps a bit surprising when we’re dealing with large tier-one carriers and telcos is that there is a fantastic appetite for this stuff. We’re not going into a space where people are skeptical. They do see that doing things the way the hyperscalers did them is their end game. So yes, it’s new for them, and of course there are challenges with that, but there’s a massive amount of interest and a sustained appetite to adopt it. To the point that, quite surprisingly for me, a bunch of carriers have recently put together a joint paper, which is really unusual in this industry. That’s Deutsche Telekom, BT in the UK, Vodafone, and Telefonica: huge international carriers, all combining to put together a joint paper, essentially saying this is the way they want to run their broadband networks. It’s calling for what they term an open BNG, which is essentially software running on their own choice of hardware. As Hannes said earlier, that’s something they can source from a Taiwanese vendor, high-performance, low-cost hardware, or possibly even run on x86 in smaller instances.

Craig Corbin:

How intriguing, Hannes, the paper that Richard just mentioned. It’s a shame that you couldn’t find any bigger players in the industry to come together on a joint effort for this paper. That’s amazing. From your perspective, talk about having players of that scale looking for exactly what you’re doing.

Hannes Gredler:

Let me perhaps go back a little bit to how the open BNG paper was really conceived. And let me admit, there has been a fair deal of trial and error. Starting in 2016, Deutsche Telekom originally started out with the CORD initiative, which is Central Office Re-architected as a Datacenter. As a prototype that was really great, except that it only had limited scale. This is where RtBrick really entered the stage at DT. They asked us, look, how can we scale this kind of technology to, let’s say, one of our largest central offices, where we have potentially a hundred thousand subscribers being terminated today on a BNG? And we said, “Okay, then you probably need to deviate a little bit from that central idea of CORD, that you are micromanaging your forwarding plane.”

Unlinking Broadband Software and Hardware

Hannes Gredler:

We probably have to do a bit more divide and conquer, where certain parts of the forwarding state are managed by the underlying traditional routing protocols, and a certain amount of information, particularly service-related information, really comes from your controller. That blending of the two layers is what has really done the trick for us. It started out with a big SDN micromanagement confusion and ultimately evolved into this: we have to blend the SDN layer with the traditional routing layer.
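
One way to picture that division of labor is the small Python sketch below. It is an assumption-laden illustration, not a description of RtBrick's or Deutsche Telekom's design: the `ServiceEntry` and `ForwardingEntry` records, the `resolve_next_hop` and `blend` functions, and the idea that the controller names an upstream address per subscriber are all made up for the example. Reachability comes from a route table that routing protocols would populate, per-subscriber service data comes from a controller, and the forwarding entry is composed from both.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ServiceEntry:
    """Per-subscriber service data, as a controller might push it (hypothetical)."""
    subscriber_id: str
    upstream: str          # address the subscriber's traffic should be sent toward
    rate_limit_mbps: int


@dataclass(frozen=True)
class ForwardingEntry:
    """Blended result: service policy plus a next hop learned via routing."""
    subscriber_id: str
    rate_limit_mbps: int
    next_hop: str


def resolve_next_hop(routes: Dict[str, str], address: str) -> Optional[str]:
    """Longest-prefix match over routes that routing protocols would maintain."""
    target = ipaddress.ip_address(address)
    best_len = -1
    best_hop: Optional[str] = None
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if target in network and network.prefixlen > best_len:
            best_len, best_hop = network.prefixlen, next_hop
    return best_hop


def blend(routes: Dict[str, str], services: List[ServiceEntry]) -> List[ForwardingEntry]:
    """Combine controller-supplied service state with routing-derived reachability."""
    entries: List[ForwardingEntry] = []
    for service in services:
        next_hop = resolve_next_hop(routes, service.upstream)
        if next_hop is None:
            # No route yet: skip rather than let the controller dictate a path.
            continue
        entries.append(
            ForwardingEntry(service.subscriber_id, service.rate_limit_mbps, next_hop)
        )
    return entries
```

In this sketch the controller never specifies a next hop. It only supplies service-level data, which is the opposite of micromanaging the forwarding plane.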

Craig Corbin:

As I understand it, Hannes, one of the fundamental differences is that providers are now no longer buying software from the hardware and networking vendor. You’ve sort of unlinked those two worlds. Talk about that.

Hannes Gredler:

That is how our central business model really operates. We say, “Look, please, dear network operator, you procure your hardware from one of half a dozen Taiwan-based OEMs, where there is already healthy competition, and in the future there will be even healthier competition because you have donated the open BNG specifications to the Open Compute Project.” What we are now doing in addition is standardizing certain software models, to make sure that in the future you’re also going to have some variety and some choice in the operating stack, the software that drives all of that. That was, I would say, the technological marketplace problem.

Hannes Gredler:

What we found a bit interesting is that when we were actually sealing that intent into a contract, the contractual framework of the telcos was not really there yet. It was basically, I would say, a collection of the past 20 years of RFQ experience, where you sourced everything from a single neck to grab. Transitioning that over to disaggregation, where you source hardware from vendor A and software from vendor B, required a really complete overhaul of those contractual frameworks.

One Gig Broadband Service

Craig Corbin:

Now, there was a lot of excitement earlier this year, guys. Richard, I think it was in late January that the press release came out saying Deutsche Telekom had connected its first live customer to its disaggregated broadband network using routing software from RtBrick. That had to be a big day for the organization.

Richard Brandon:

It was obviously a big day for us, but even beyond Deutsche Telekom, I think a lot of people have been waiting for this to happen at that sort of scale. Deutsche Telekom, I think, has about 20 million broadband subscribers. I think the first customer went live just before Christmas. So, to see a real household connected to, I think, a one-gig broadband service, where that first routing layer is running on bare metal switches with independent software, is pretty exciting. Think that they’re sitting there watching their Christmas movies over what’s going to be a radically new architecture. We’ve obviously seen a lot of interest in that project from other carriers as well as Deutsche Telekom.

Networking and Routing Stack for Broadband

Craig Corbin:

When we talk about connecting live customers, that’s the success, the culmination of efforts, and to a certain degree it didn’t happen overnight. Hannes, I’m curious about A, what the initial thought process was for how long it would take, and B, how long it did take.

Hannes Gredler:

Oh crikey, you’re touching on a very painful topic now. Originally I told my early seed investors, hey, look, we know how to build this: two years and we’re done. Well, it turns out the two years have actually turned into four. Doing the stack from the ground up is certainly one thing, but there is also ensuring that its telemetry layer integrates nicely with whatever the customer has deployed, and making sure that the command-line interface displays all the information that the command-line interfaces of other vendors show. All that tiny little tinkering, well, that took a lot of our time.

Hannes Gredler:

Let me also say here, I was, and we still are, tremendously grateful to Deutsche Telekom, who really went all in. They gave us access to their central network qualification labs and to their best engineers, who ran all the cruel tests to penetrate the software and really make sure that it operates well even under boundary conditions. That technology partnership has really paid off. I have to say, as a startup, if you don’t have large lighthouse customers like this, it’s probably going to take you much longer until you have your networking and routing stack in decent form and shape.

Craig Corbin:

No question about it. As I understand it, there was a unique way of approaching this, utilizing the concept of reverse engineering, an approach taken many times in the automotive industry, backing into the final product. Talk a little bit about that if you would.

Hannes Gredler:

What was actually very different this time on the A4 project team was that it was not, I would say, a pure engineering team making the decisions. It was a multidisciplinary team, with people coming from finance, cost engineers, and of course engineers. One of the things the cost engineers did is something that is normal in the automotive industry for really assessing the value of certain parts.

Networking Switches For Broadband

Hannes Gredler:

They completely knock it down, disassemble it, and look at what kind of materials and what kind of technologies have been put in. Then they independently try to come up with an estimate of what such a unit costs on the world market. They applied the very same methodology here to networking switches. They said, okay, here is this little Intel CPU in there, DRAM from vendor X, a PCB board design, two power supplies, six fans, a certain Broadcom chip on there. The cost of goods, the bill of materials, is roughly $2,000 for that kind of box. Now really, tell me, why do you want to charge me $70,000? They started the whole cost engineering and purchasing process with the true cost of the hardware as their central linchpin.

Richard Brandon:

If I can just pick up on that as well, I think that’s one of those things that only comes out of this disaggregation of hardware and software. Traditionally, when you buy these things linked together, the value of the hardware, the cost of that hardware, is not transparent to you as a buyer, because the software you need comes included. Being able to disaggregate the two gives you a much greater opportunity to see your cost base transparently. As Hannes says, you can optimize a particular hardware platform for your network: how much memory it might need, what kind of buffers it might need, and so on. And then the software is an independent conversation from there.

Craig Corbin:

As with any major change, especially when seemingly a quantum leap is being taken, there are those who will more readily accept the change. Others are a bit more skeptical, and it takes a little encouragement to move them out of the comfort zone. Hannes, from your perspective, what’s the biggest challenge there?

Hannes Gredler:

I would say the biggest challenge from the customer side is that it’s not just a drop-in replacement, where out goes your chassis-based switch and in comes a stack of bare metal hardware. Usually, what also changes is how you operationalize and operate that network. Your networking engineers cannot just have the traditional router jockey type of qualification. They need to know how to operate a virtual environment. What is in the container? What are some of those toolchains, Elasticsearch for evaluating logs, and things like that?

Hannes Gredler:

So they really need to have a bit of an affinity for the IT side of things. With a pure networking skillset, that challenge cannot really be mastered. That’s also what I tell them. By the way, our own development engineers also do not just focus on being good at mastering a certain protocol, BGP or IS-IS. We always look at what is going on on the IT side of things: automation using Ansible, test frameworks like Robot. You have to be good at both. Please add both sides to your talent stack.

Craig Corbin:

As we begin to wind down, Richard, first to you: what is the biggest impact for providers of the work that’s been done there at RtBrick?

Richard Brandon:

The obvious first thing to latch onto would be cost, because suddenly you are getting access to this hardware base, which is very competitive, and you listen to Dora, you know straight away when you turn around, you say this is going to cost you less than 50% of the old way of doing stuff. I think the bigger thing for them at the end of the day is the flexibility of not being constrained by the services that come from their hardware vendor. Up until now, once you made those huge commitments to putting in these big chassis-based systems, you were going to have to keep running them for five, seven, even 10 years, and that meant you could only offer the services that came from that same hardware vendor. I think divorcing those two things is massive for them, because it means they can take control of their own destiny in the service portfolio they offer their subscribers.

Craig Corbin:

Hannes, the last question is to you. As always, in looking back, hindsight is 20/20. I’ll ask you the back-to-the-future question. If you could take yourself back to when the process initially began and could whisper in your own ear some little tidbit of knowledge and wisdom, what would that have been? Would there have been a direction you could have taken perhaps more quickly?

Hannes Gredler:

Basically, there is one RFC that was originally intended as an April 1st RFC, RFC 1925, The Twelve Networking Truths. We liked that one so much that we actually made a nicely designed poster out of it and put it on our website for download. It contains 12 eternal wisdoms. And despite having it in front of our faces, we ran into some of the very mistakes that the RFC makes fun of. One thing that I’ve learned in particular is rule number 10: one size doesn’t fit all. Whenever we tried to address multiple problems with just one software component, we were almost setting ourselves up for failure. I rather learned that small is beautiful: have different small, self-contained modules and be really good at composing them together, rather than trying to reuse at all costs. Modularity has a price as well. It in itself is not really an end goal.

Craig Corbin:

Hannes, Richard, it’s so exciting to see what has been done and to look at the impact on the industry. I can’t wait to circle back and visit with both of you again down the line. I know that there is much excitement about what’s going on with Deutsche Telekom and other providers around the world, and we absolutely wish you all the best as the efforts continue to evolve.

