28-01-2021 | By Sam Brown
Cloud computing services such as Google Cloud and AWS provide developers with expandable resources that can grow with their needs. However, unless those services are deployed correctly, you may face a rather large bill, as one unfortunate start-up discovered.
When the term “cloud” was introduced a number of years ago, some (myself included) were left confused as to what it really meant. Did it mean a new type of server, a service, or a new way to connect stuff together? No, the term “cloud” is simply another name for the internet.
If you want to put this statement to the test, replace the word “cloud” with “internet” and it very quickly makes sense. Internet services, internet computing, and internet backup make sense, but the word cloud helps with marketing and sales. However, as many are used to the term cloud, it will be used henceforth in this article.
Cloud services are internet-accessible services that run on some remote server unbeknown to the user. Once that service has completed its task, it will generally send a response to inform the user that the task is complete.
An example of a cloud service would be Google Docs; it provides users with a word processor in the browser (which runs client-side), and then allows saving to Google Drive, which is itself the cloud service. Another example of a cloud service would be just about any website in existence; the user connects to the service, the service responds with the content, and the user can view the content.
It should be stated that the vast majority of cloud services are event-driven (i.e. they react to things that happen), and are rarely an ongoing process (like an infinite while(1) loop). Developers write scripts that describe what a cloud service should do when a message is received, and provide APIs that allow users to interact with the services.
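The event-driven pattern described above can be sketched in a few lines of Python. This is only an illustration: the `handlers` registry, the `on` decorator, the `dispatch` function, and the `save_document` event are all hypothetical names, not part of any real cloud platform's API.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an event name to the function that handles it.
handlers: Dict[str, Callable[[dict], dict]] = {}


def on(event_name: str):
    """Register a function as the handler for one kind of event."""
    def register(func):
        handlers[event_name] = func
        return func
    return register


@on("save_document")
def save_document(message: dict) -> dict:
    # Do the work for this one message, then report back to the caller.
    return {"status": "saved", "title": message["title"]}


def dispatch(event_name: str, message: dict) -> dict:
    """Stand-in for the platform: runs a handler only when a message arrives."""
    handler = handlers.get(event_name)
    if handler is None:
        return {"status": "error", "reason": "unknown event: " + event_name}
    return handler(message)


response = dispatch("save_document", {"title": "Budget notes"})
print(response)
```

Note that there is no `while` loop anywhere: code runs only when `dispatch` is called with a message, which is why billing on such platforms tends to follow the number of events rather than wall-clock time.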
Anyone can create their own cloud service at home with relative ease; a few Raspberry Pis can be configured as Node.js servers, another Raspberry Pi can be configured as a load balancer, and an internet connection provides an access point. Such a setup is great for a few hundred IoT devices that access the server once every so often, but what about hosting a website?
As the demands on a system increase, more servers are needed, faster local networks are needed, and a better internet connection is required. For most users, this is where they hit a bottleneck: they lack the funding to purchase many hundreds of Raspberry Pis, lack the space to house such a setup, and are limited by their ISP if they use too much data.
To help developers with such issues, expandable cloud services such as Amazon Web Services and Microsoft Azure popped up. These systems allow developers to deploy websites and services whose resources expand automatically as demand requires, and the user only pays for the number of processing hours used.
Such services now drive many thousands of popular online services, and the companies running them are seeing excellent growth. However, users should be extremely cautious when using such services, as things can quickly get out of hand.
Recently, Announce, a start-up company, was at the final stages of developing a web service that shows users announcements near their location on Google Maps. Preliminary testing looked good, so it was time to deploy the service, and Google Cloud was chosen to run it.
An account was made, the company credit card was used across all the various Google services, and the free plan was selected. Of course, the designers understood that their service might need to grow during testing, so they set a $7 cloud budget on the service. The designers also used other free services, and assumed that in the worst-case scenario those free services would simply be suspended once they reached their daily limits.
The cloud service was deployed, and the developers sat back and decided to rest up and see how things turned out over the next two days. Just two hours into the test, the developers noticed a warning from Google saying that they had used up their free Firebase allowance, but that it was OK: Google had automatically upgraded their Firebase account, because cloud services are designed to scale. This was the first red flag.
Almost immediately after that email was received, another one came through stating that they had gone over their $7 budget. But that was OK because the developers had set a $7 budget limit, or so they thought. As it turns out, the budget is only a warning event that informs developers when they have exceeded it; it does not cap spending.
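The difference between a budget that only warns and a budget that actually stops spending can be sketched as follows. This is a simplified model, not Google Cloud's billing API: the function names, the `BudgetExceeded` exception, and the $2.50 cost per batch are assumptions made for illustration; only the $7 budget comes from the story.

```python
class BudgetExceeded(Exception):
    """Raised by the hard cap when a charge would pass the limit."""


def charge_with_alert(spent: float, cost: float, budget: float, alerts: list) -> float:
    # Alert-style budget: record a warning, but still accept the charge.
    spent += cost
    if spent > budget:
        alerts.append(f"warning: spent ${spent:.2f} of ${budget:.2f} budget")
    return spent


def charge_with_cap(spent: float, cost: float, budget: float) -> float:
    # Cap-style budget: refuse any charge that would exceed the limit.
    if spent + cost > budget:
        raise BudgetExceeded(f"charge of ${cost:.2f} refused at ${spent:.2f}")
    return spent + cost


BUDGET = 7.00          # the $7 figure from the story
COST_PER_BATCH = 2.50  # hypothetical cost of one batch of work

alerts: list = []
alert_total = 0.0
for _ in range(10):
    alert_total = charge_with_alert(alert_total, COST_PER_BATCH, BUDGET, alerts)

cap_total = 0.0
refused = 0
for _ in range(10):
    try:
        cap_total = charge_with_cap(cap_total, COST_PER_BATCH, BUDGET)
    except BudgetExceeded:
        refused += 1

print(f"alert only: spent ${alert_total:.2f}, {len(alerts)} warnings sent")
print(f"hard cap:   spent ${cap_total:.2f}, {refused} charges refused")
```

With the alert-only budget, every batch is still billed and the warnings merely pile up; with the cap, spending stops below the limit and the remaining work is refused.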
The third email stated that all their cloud services had been suspended because the company's credit card had been declined. Why would a credit card be declined on $7? The developers opened the Cloud Billing app to find that they had a total bill of not $7, but $5,000.
The team panicked trying to figure out why the bill was so large, but as they started to debug the situation, the bill updated to $15,000. As the day progressed, the bill reached a final value of $72,000; all for two hours of cloud computing time.
Without going into technical details, the cause of the large bill was essentially poorly written code. As stated previously, cloud computing is an event-based resource that reacts to messages, performs its task, and then sends the result.
Unbeknown to the developers, they had created a recursive function that would perform many requests before coming to a solution. As soon as the function was called, the cloud platform had to expand its capacity very quickly, and the deployed service used over 16,000 hours of CPU time and 116,222,164,695 read operations from Firebase.
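The article does not reproduce the start-up's code, so the following is only a hypothetical illustration of why a recursive function can generate so many reads. It assumes a function that reads one record and then calls itself once for each of `fan_out` related records, down to a given `depth`; both parameters are invented for the example.

```python
def count_reads(depth: int, fan_out: int) -> int:
    """Count the database reads made by a function that reads one record
    and then calls itself once for each of `fan_out` related records."""
    reads = 1  # the read performed by this call
    if depth == 0:
        return reads
    for _ in range(fan_out):
        reads += count_reads(depth - 1, fan_out)
    return reads


for depth in range(1, 6):
    print(f"depth {depth}: {count_reads(depth, fan_out=10)} reads")
```

Each extra level of recursion multiplies the number of reads by roughly the fan-out, so a single request that looks harmless in testing can turn into an enormous number of billed operations once the data grows.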
Had the bill been enforced, which Google would have been within its rights to do, the start-up would have gone bankrupt. However, after contacting Google about the situation, they were able to have the bill waived.
While Google is a mammoth of a tech company, it has no interest in crushing users who may provide long-term business, especially those that could someday become gigantic. It is also likely that Google reviewed its system and recognised a flaw in the budgeting system: it offered no way to prevent services from consuming large amounts of processor time.
This does not mean that Google, or other companies, will forgive every runaway recursive function, but let this be an example of how cloud services should be used. Developers on such platforms should always take care when writing code, and ensure that all functions run quickly, do not call themselves, and do not repeat large operations that could instead be done once, with the result shared among all users who need the same data.
While the idea may seem odd, creating a personal cloud platform using low-cost machines and load balancers is not a bad one. In fact, such a setup can be ideal during development, as it allows developers to see how their system reacts when constrained to a small amount of hardware.
If recursive functions crop up, the service will grind to a halt, but the bank balance will remain untouched. The development of such platforms also allows developers to spend time appreciating how such hardware works, and how to make the most of it.