By Jennifer Riggins, December 29, 2021, The New Stack
The next year, 2022, will be a turning point for the Earth. Either we proactively, dramatically cut our carbon emissions or extreme world events as a result of global heating will continue to cause devastation.
The tech industry has an important role to play.
Computing and Information and Communication Technology (ICT) emissions are worse than previously thought. The carbon cost of tech falls into two buckets — the hardware lifecycle and the electricity used in data centers. Since we’re focused on the modern technology stack, this piece will focus on the latter.
Over the last year or so, the individual tech worker has become more aware of the climate crisis, and, to a lesser extent, their organization’s carbon footprint and their ability to influence it. Today we reflect on the current state of green architectural practices and look ahead to the next logical steps including sustainable architecture, carbon-aware tooling and the massive stake cloud providers have in making or breaking this last-ditch effort to literally save the world.
Going Green Starts with the Big Three Being Risk-Averse
Cloud providers going green isn’t about being altruistic. Carbon-neutral cloud computing is all about business sustainability and protection from regulation — by being greener, you are inherently being more risk-averse.
Amazon Web Services, Microsoft Azure and the Google Cloud Platform account for more than half of the international data center market. This makes them the most important ingredients to any sustainable architecture strategy. And it makes AWS, the market leader and the cloud provider that was lagging behind on the carbon front until this year, the biggest lever.
Container Solutions’ Anne Currie said AWS is late to the game but is spending a lot of money to catch up. Earlier this year, AWS rose to become the biggest corporate consumer of renewable energy in the world, taking the crown away from the historically greener Google Cloud. AWS has also committed to 100% renewable energy in the next three years.
Still, it was a pleasant surprise when AWS announced the new sustainability pillar of the AWS Well-Architected Framework at re:Invent earlier this month. Interim CTO and former AWS employee Paul Johnston told The New Stack this was “possibly the most important project AWS has delivered since AWS Lambda in 2014 — and maybe even more important than that.”
This piece highlights the recommendations from both the AWS sustainability team and the ethics white paper The State of Green Software Practices in 2021*, of which Currie and Johnston are lead author and contributor, respectively.
The Rise of the Green Architect
It’s not just the cloud providers who need to do the work. If you want to go green, there’s lots of work for teams, too. As AWS defines this shared responsibility, it’ll take care of the sustainability of the cloud but AWS customers are responsible for sustainability in the cloud.
“When it comes to sustainability, where AWS is responsible for the sustainability of the clouds, that means that we do good water stewardship. We do lots of innovation in energy management. We’re building our own silicon since we can really drive that efficiency of cost. You, however, as our customers are responsible for sustainability in the cloud, to make sure you pick those technologies that have the most impact for you,” Amazon CTO Dr. Werner Vogels told the re:Invent audience.
According to AWS, good design for sustainability includes:
Understanding (and measuring) your impact.
Establishing sustainability goals.
Maximizing utilization through right-sizing.
Anticipating and adopting new, more efficient hardware and software offerings.
Using managed services.
Reducing the downstream impact of your cloud workloads.
The tech industry needs to embrace architectural considerations that don’t run carbon all the time. All architects must become green architects because “you’ll find within 18 months your hosting bill will go bananas,” Currie predicts, when wind and solar energy become much cheaper and carbon taxation becomes more prevalent.
“The architect will just have to look at the ways to use cloud services better,” she told The New Stack over Zoom.
Architects are in a unique position to make decisions with a more lasting impact. As Currie put it: architecture isn’t just for Christmas.
Techniques to Architect for Sustainability
The first thing that’s clear is that teams can’t use on-demand instances and dedicated services anymore. Managing on the public cloud is always going to be more efficient and cost-effective than on customer clouds or data centers — unless you run on 100% renewables. Both Google, in a white paper on carbon-aware computing for data centers from June, and more recently Amazon, have said that no one — companies or the Earth — can afford to monopolize whole servers anymore.
And everything always-on has to end. The green architect must design systems so a smaller part of the application has to run continuously while the larger part is just on as-needed.
Several suggestions from the state of green software white paper are around operational tools and techniques, including:
Use Spot Instances on AWS or Azure, Preemptible Instances on GCP. These pay-per-use practices are not only up to 90% cheaper than on-demand, they give orchestrators discretion over when jobs are run. The next step will be applying carbon-aware algorithms to increase data-center operations efficiency.
End Over-Provisioning. The practice of putting everything on costly 2N full redundancy has many organizations over-provisioning cloud storage by double. Instead, use AWS Cost Explorer or Azure’s cost analysis to identify zombie workloads to shut off.
Auto-scale for CPU or network traffic. Scale up or down automatically or even leverage predictive autoscaling to again only use what you need without risking uptime.
Let your cloud provider do more. Completely on-demand instances can never become carbon aware. By allowing the cloud provider to manage more, you increase machine utilization, which in turn cuts carbon emissions and cost. AWS T3 Instances, Azure B-series and Google shared-core machine types are all burstable: each runs at a modest CPU baseline and bursts above it only when the workload needs it, rather than reserving a full core around the clock.
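The autoscaling suggestion above can be made concrete with a target-tracking rule. What follows is a minimal sketch, not any cloud provider's implementation: it uses the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler, and the function name, replica bounds and sample numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas, observed_cpu, target_cpu,
                     min_replicas=1, max_replicas=20):
    """Proportional scaling: desired = ceil(current * observed / target).

    observed_cpu and target_cpu are average CPU utilization percentages.
    The min/max bounds are illustrative defaults, not recommendations.
    """
    if current_replicas <= 0 or target_cpu <= 0:
        raise ValueError("current_replicas and target_cpu must be positive")
    raw = math.ceil(current_replicas * observed_cpu / target_cpu)
    # Clamp so a metric spike or a quiet night cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 60))  # busy period: scales up to 6
    print(desired_replicas(4, 15, 60))  # quiet period: scales down to 1
```

Real autoscalers add tolerances and cool-down periods on top of this rule so that replica counts do not flap.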
As the paper states: “It is worth noting that architectures which recognize low priority or otherwise delay-able tasks are easier to operate at high machine utilization. In the future, the same architectures will be better at carbon awareness. These include serverless, microservice and other asynchronous (event-driven) architectures.”
Currie has written in the past about Google’s willingness to delay less urgent tasks to use their hardware resources more efficiently. Gmail makes sure you can access your emails quickly, but YouTube can take minutes or hours to transcode a video — the CPU-intensive process of converting from one file format to another. Like all things event-driven architecture, architectural decisions that consider user experience lead to other benefits like carbon reduction.
Currie also wrote that Google is aiming to actively reduce utilization when there’s not much carbon-zero electricity available. Now, they are more able to know the when and where of renewable-backed data centers, which is something essential for cloud providers to release in the future.
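The idea of delaying less urgent work until cleaner electricity is available can be sketched in a few lines. This is an illustrative assumption rather than Google's scheduler: the forecast is a made-up list of hourly grid carbon intensities (gCO2 per kWh), and `pick_greenest_window` is a hypothetical helper.

```python
def pick_greenest_window(forecast, duration_hours, deadline_hour):
    """Return the start hour of the lowest-carbon contiguous window.

    forecast       -- hourly carbon intensities, index 0 is the current hour
    duration_hours -- how long the deferrable job runs
    deadline_hour  -- the job must finish by this hour index (exclusive)
    """
    if duration_hours <= 0:
        raise ValueError("duration_hours must be positive")
    last_start = min(deadline_hour, len(forecast)) - duration_hours
    if last_start < 0:
        raise ValueError("job cannot finish before the deadline")
    # Compare the total intensity of every window that meets the deadline.
    return min(range(last_start + 1),
               key=lambda start: sum(forecast[start:start + duration_hours]))


if __name__ == "__main__":
    # Placeholder values only; a real forecast would come from the provider or grid operator.
    hourly_intensity = [450, 420, 300, 180, 160, 210, 390, 480]
    print(pick_greenest_window(hourly_intensity, duration_hours=2, deadline_hour=8))
```

A latency-sensitive request would bypass this logic entirely; only work that can tolerate a delay, such as transcoding, is a candidate.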
The AWS sustainability team also suggests:
Optimizing geographic placement of workloads for user locations.
Optimizing impact on customer devices and equipment.
Minimizing data movement across networks.
Implementing a data classification policy.
Of course, there is a plethora of strategies you can employ, but start by looking for what can make the greatest impact on your carbon footprint while maintaining or even improving customer experience. For instance, improving anything with heavy processing will lead to faster response times and less carbon impact. Any work done to reduce CPU or GPU usage is usually a good place to start.
Can’t Improve What You Can’t Measure
The strongest principle of the science side of tech is that you can’t improve what you can’t measure. Measuring a line of code’s carbon impact is hard enough, but measuring for an entire organization is near impossible.
“Interestingly there’s quite a battle going on about carbon footprint reporting,” Currie said. “Google and Azure both underlined their carbon footprint reporting tools in the past three months, while AWS announced at re:Invent that it would have a carbon reporting tool in early 2022.” Except these are still mostly objectives — noteworthy first steps, but they all still lack clear plans of how and when.
In the past, FinOps, the still-nascent practice of cloud financial management, was a reasonable proxy for carbon reporting. Yes, cutting your cloud costs is the shortest path to cutting your environmental ones, but with temperatures still dangerously rising, it’s not enough. Plus, we know that finance departments trend toward slow-moving anyway, so it’s tough for them to keep up with a continuous delivery cycle. Now that companies are committing to decreasing carbon impact, financial and cloud management are decoupling again.
Part of the challenge is that it is all very hosting region-dependent. There are regions where the carbon produced is far lower than the cost would suggest. “Say I’m running an application in a region where electricity is renewably generated, AWS new Scandinavian [91-megawatt wind energy] or in a French region running off nuclear, then my costs are the same but my footprint is less,” Currie explained. In some regions of the U.S., the cloud is powered by solar and wind, but then when it gets dark it’s gas-powered. Certainly, Data Center Alley in Virginia is considered one of the dirtiest regions.
Reflecting on how little we still know about the three main cloud providers’ plans to count carbon, Currie continued, “Hopefully these carbon footprint generators will take into account the electricity mix on the local grid because if they don’t there’s really no point to them.” But for now, only the cloud provider knows what the mix on the local grid is at the time it is drawing power from it.
“Only they know that so you are relying on them to tell you.” Currie is quite doubtful, saying “I suspect that level of complexity is not in their first version but in the long run it will be if we keep asking for it.”
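Why the local grid mix matters can be shown with simple arithmetic: the same energy draw yields very different footprints depending on the grid's carbon intensity. The sketch below is purely illustrative. The region labels and intensity figures are invented placeholders, not measurements of any real cloud region.

```python
def footprint_kg(energy_kwh, grid_intensity_g_per_kwh):
    """Estimated emissions in kg CO2e = energy used (kWh) x grid intensity (g/kWh) / 1000."""
    return energy_kwh * grid_intensity_g_per_kwh / 1000.0


if __name__ == "__main__":
    monthly_energy_kwh = 1000  # hypothetical workload, identical in both regions
    # Hypothetical intensities in gCO2e per kWh; real values vary by grid and by hour.
    regions = {"mostly-renewable-region": 50, "fossil-heavy-region": 400}
    for name, intensity in regions.items():
        print(name, footprint_kg(monthly_energy_kwh, intensity), "kg CO2e")
```

A spend-based estimate would report the same number for both regions, which is the gap Currie hopes the providers' reporting tools will close.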
The AWS sustainability white paper outlines specific targets for organizations to measure and improve upon:
Proxy metrics. Compute, storage, and network help evaluate what resources are provisioned.
Business metrics. Any quantifiable value provided by a workload, like the number of active users or transactions made.
KPIs. Proxy metric divided by business metric.
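The KPI definition above is a single division, sketched here under stated assumptions: vCPU-hours stand in for the proxy metric, completed transactions for the business metric, and all figures are made up for illustration.

```python
def sustainability_kpi(proxy_metric, business_metric):
    """KPI = proxy metric / business metric (resources provisioned per unit of value)."""
    if business_metric <= 0:
        raise ValueError("business_metric must be positive")
    return proxy_metric / business_metric


if __name__ == "__main__":
    # Hypothetical months: resource use grew, but value delivered grew faster.
    january = sustainability_kpi(proxy_metric=12_000, business_metric=3_000_000)
    february = sustainability_kpi(proxy_metric=13_000, business_metric=4_000_000)
    print(f"vCPU-hours per transaction: {january:.5f} -> {february:.5f}")
    print("improved" if february < january else "regressed")
```

Tracking the ratio rather than the raw proxy metric keeps a growing business from looking like a sustainability regression.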
With any targeted improvement, it’s necessary to evaluate potential, cost and risk, because there will always be trade-offs, like quality of result, response time and availability. Whatever you decide, it’s recommended that all organizations set sustainability as a non-functional business requirement. As the paper reads: “Focusing on getting more value from the resources you use and using fewer of them directly translates to cost savings on AWS as you pay only for what you use.” This tracks true with any cloud service.
How Starbucks Architected for Sustainability
The Starbucks app is powered by APIs around event-driven microservices deployed via Kubernetes pods, and the famously green-focused company is a good case study in intentionally green architectural design. Drew Engelson, in his role as director of engineering, shared with the re:Invent audience his journey to figure out the carbon footprint of the app. He soon learned that the Starbucks sustainability team used “a very coarse, spend-based model for estimating annual carbon emissions.” While 20% of that footprint was attributed to dairy, less than 1% was attributed to technology.
“While that 1% seemed low, we are a very large business so 1% of a lot is still quite a big number,” Engelson said. And they wanted to see if they could apply technology to reduce the footprint outside of technology too.
“I also learned that annual spend model isn’t granular enough or timely enough for me to make architectural decisions to be able to make a change, test, and see how it impacted our carbon footprint.” So his team dedicated a whole week to creating a Greener Cloud dashboard that measured carbon footprint and how it could potentially relate to customer experience.
They began with their AWS usage data, using normalized cost and CPU hours. Since these were proxy metrics, they weren’t necessarily looking for precision but rather for whether they were headed in the right direction month to month, and whether they could spot new opportunities to be more efficient.
Then they reached out to the AWS sustainability team to get harder numbers on their hosting. They got data from an early version of AWS’s forthcoming carbon footprint tool, and they uncovered that they had actually reduced their carbon footprint by about 32% between 2019 and 2020, which was the same year they scaled their order functionality from 50 to over 15,000 stores.
The Starbucks engineering team realized there’s an opportunity to divorce business growth from carbon emissions. At least in this circumstance, optimizing for compute led to a growth in business and a drastic decrease in carbon footprint. They started focusing on the following patterns moving forward:
Observability and service level objectives (SLOs).
Efficient by design: cloud native is cleaner, Kubernetes allowed them to densely pack services onto the infrastructure beneath it, they use the gRPC binary communication protocol, and many of their services are written in Scala.
Right-sizing – optimizing for cost, shutting down under-utilized resources and matching EC2 instances per workloads, and autoscaling tuning.
Engelson also highlighted what he called some less obvious learnings:
They switched some of their instance types to ARM, which had the same cost and performance, but was more energy efficient.
They used AWS spot instances to optimize for costs, but also he said it allows cloud providers “to get much higher utilization of their infrastructure.”
They randomized the timing of their backups, so they no longer were among the majority who run backups at the top of every hour, which in turn drives spikes on AWS.
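The backup-timing change in the last item can be illustrated with a small jitter helper. This is a sketch of the general technique, not Starbucks' implementation: the job names and both helper functions are hypothetical, and the output is a standard five-field cron expression.

```python
import random


def jittered_minute(job_id, window_minutes=60):
    """Pick a stable pseudo-random minute for a job so backups avoid the top of the hour.

    Seeding with the job id keeps the offset the same on every run,
    while different jobs land on different minutes.
    """
    rng = random.Random(job_id)
    return rng.randrange(window_minutes)


def hourly_cron(job_id):
    """Build an hourly cron schedule that fires at the job's jittered minute."""
    return f"{jittered_minute(job_id)} * * * *"


if __name__ == "__main__":
    for job in ("orders-db-backup", "loyalty-db-backup", "menu-db-backup"):
        print(job, "->", hourly_cron(job))
```

Spreading start times this way flattens the demand spike that the provider would otherwise have to keep spare capacity for.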
He also said that Starbucks wants to intentionally access lower-carbon regions, but that of course is still pending.
Engelson went on to say that while they focused much of their sustainability audit on the cloud, his engineering team is a part of a much larger ecosystem, which also includes data centers, data transfers, vendors, and end-user devices.
“Each one of these areas, no matter what part of the business you’re in, if you’re making decisions about how to configure a data center, how are you going to build an application, how are you going to write a contract with a vendor, there is something you can do to help improve the sustainability of your workloads,” Engelson said.
With that in mind, he highlighted some low-hanging fruit that helps reduce carbon footprint:
Move to the cloud, which AWS touts as nearly 80% more efficient.
Enable in-app dark mode.
Don’t let bots pull from your site all the time.
Reduce CPU.
Reduce page weight.
Stop video auto-play.
He suggested using a Website Carbon Calculator to spot other easy wins and reading Tom Greenwood’s Sustainable Web Design.
Finally, Engelson argued that instead of focusing on cost optimization which happens to help the environment, we should be working towards carbon optimization which happens to align very closely with cost. In the meantime, start having conversations across your company.
Next Steps in Sustainable Architecture
Besides giving more attention to the lack of diversity and inclusion in open source, another glaring hole in the KubeCon-CloudNativeCon schedule was the environment — a topic all tech events in 2022 should include in some way. Kubernetes, in particular, is enabled by the extra capacity just waiting around for it, making it inherently wasteful.
Because Kubernetes orchestration is so complex, you’ll most likely need a service mesh, which adds a proxy sidecar container to every pod and is very resource-intensive. Service meshes like Istio and Linkerd are inherently always-on, on-demand and very energy inefficient. Istio can consume around 1GB of memory per proxy.
“Even in a very small environment with, say, 20 services, each running five pods, spread across three nodes, you’ve got 100 proxy containers. However small and efficient the proxy implementation is, the sheer repetition is going to cost resources,” wrote Liz Rice, the chief open source officer at Isovalent and CNCF chair of the technical oversight committee, in a blog post for The New Stack. The memory used by each proxy then increases in relation to the number of services it needs to communicate with.
Rice presents the alternative in eBPF, which she described as “a kernel technology that allows custom programs to run in the kernel. Those programs run in response to events, and there are thousands of possible events to which eBPF programs can be attached.” eBPF features one kernel per node, so all pods running on the same node share the same kernel.
There’s even eBPF-based networking, observability and security by way of CNCF incubating project Cilium, which allows for a service mesh without sidecars. In the example above, that can reduce the number of proxy instances from around 100 to three, one per node.
In addition, Cilium’s eBPF-enabled networking has been benchmarked as enabling significant performance improvements in Kubernetes networking, by allowing packets to take shortcuts that bypass parts of the kernel’s network stack.
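The proxy counts in Rice's example follow from simple multiplication, sketched below. The function names are hypothetical, and the per-node figure assumes one shared proxy per node, as the sidecar-free model in the text implies.

```python
def sidecar_proxy_count(services, pods_per_service):
    """Sidecar model: every pod carries its own proxy container."""
    return services * pods_per_service


def per_node_proxy_count(nodes):
    """Sidecar-free model: at most one shared proxy per node (an assumption of this sketch)."""
    return nodes


if __name__ == "__main__":
    sidecars = sidecar_proxy_count(services=20, pods_per_service=5)
    shared = per_node_proxy_count(nodes=3)
    print(f"sidecar model: {sidecars} proxies, per-node model: {shared} proxies")
```

The gap widens as pod density increases, because the sidecar count scales with pods while the per-node count scales only with nodes.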
eBPF is a promising step in the right direction. Of course, service mesh efficiency is another area that carbon-aware predictive analytics could dramatically help, alongside, again, allowing the cloud provider to self-manage.
Another thing to look out for in 2022 is hosting providers increasing their transparency into which regions are more sustainable. As an industry we must continue to push for this to come faster — if we only choose to host on sustainable servers, they will eventually make all servers sustainable. The green software practices white paper recommends asking your cloud provider which are their preferred regions for sustainable expansion. Asking questions is the first step toward big organizations providing answers.
Finally, a reminder: the next 12 months will be all about architects getting on board. “Work out what it means for the services you’re orchestrating. If you don’t have a plan for what you’re going to do, you’re going to be in trouble. Regulations will come but may be in the form of carbon pricing for data centers, on-prem or in the cloud,” Currie said. So you won’t want to wait.
*Note this is the fourth year of The State of Green Software Practices in 2021, which again is in a GoogleDoc, a much greener way to publish a paper — and one that welcomes interactive comments and continuous feedback.
The author of this piece also writes for Container Solutions, mentioned in this article.