{"id":21766,"date":"2017-03-10T18:28:50","date_gmt":"2017-03-10T15:28:50","guid":{"rendered":"https:\/\/www.altoros.com\/blog\/?p=21766"},"modified":"2021-03-12T04:51:59","modified_gmt":"2021-03-12T01:51:59","slug":"aws-s3-level-outage-lessons-learned-for-cloud-foundry-users","status":"publish","type":"post","link":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/","title":{"rendered":"AWS S3 Outage: Lessons Learned for Cloud Foundry Users"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79_2 counter-hierarchy ez-toc-counter ez-toc-transparent ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 
ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#Not_your_typical_day_of_troubleshooting\" >Not your typical day of troubleshooting<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#What_caused_the_outage\" >What caused the outage?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#What_services_got_disrupted\" >What services got disrupted?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#Learning_through_forced_downtime\" >Learning through forced downtime<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#Reducing_the_risk_of_disruption_in_Cloud_Foundry\" >Reducing the risk of disruption in Cloud Foundry<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#Related_reading\" >Related reading<\/a><\/li><\/ul><\/nav><\/div>\n<h3><span class=\"ez-toc-section\" id=\"Not_your_typical_day_of_troubleshooting\"><\/span>Not your typical day of troubleshooting<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Imagine an average programmer, let\u2019s call him John, walking into his office in San Francisco at 9 a.m. on February 28, 2017. 
He starts his daily routine: reviewing tasks in Trello, checking out code from GitHub, and launching his favorite code editor.<\/p>\n<p><strong>9&#58;58 a.m.<\/strong> Having implemented a few small features, John checks Trello again. Trello is annoyingly slow, so John tries to reach a colleague on Slack. Slack shows a loading spinner for much longer than usual.<\/p>\n<p><strong>10&#58;05 a.m.<\/strong> Running out of patience, John leaves his desk to see his product manager, Bill. He finds Bill staring at his monitor, taken aback by an avalanche of e-mails reporting that their service is down. The bill the company will have to pay to compensate customers for this outage is going to be a disaster!<\/p>\n<p><strong>10&#58;08 a.m.<\/strong> Searching for the root cause, Bill checks the AWS status page, but it shows a green status, indicating that everything is fine.<\/p>\n<p><strong>10&#58;30 a.m.<\/strong> Bill checks the AWS status page again and sees a red banner announcing that Amazon has identified an issue with S3 in the US-EAST-1 region and is actively working on it. 
Ironically, the status indicators are still green, because the status page itself depends on S3.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Network-Status.jpg\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Network-Status.jpg\" alt=\"Amazon AWS Cloud Foundry Outage Network Status\" width=\"640\" class=\"aligncenter size-full wp-image-21770\" \/><\/a><small>Image credit: <a href=\"https:\/\/phd.aws.amazon.com\/phd\/status\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon<\/a><\/small><\/center><\/p>\n<p><strong>10&#58;42 a.m.<\/strong> Bill comes up with the idea of redeploying their services to other regions and urges John to spin up new instances of their service in other AWS regions as soon as possible.<\/p>\n<p><strong>10&#58;45 a.m.<\/strong> John realizes that, since the company runs a private Docker registry backed by the S3 storage driver, he cannot pull the pre-built Docker images from this registry to start containers.<\/p>\n<p><strong>11&#58;02 a.m.<\/strong> The only workaround John can think of is to rebuild the images from the Dockerfiles in their projects, but the GitHub repositories the company uses are unavailable.<\/p>\n<p><strong>12&#58;10 p.m.<\/strong> Somehow, John manages to collect all the necessary Dockerfiles and starts building the images, but he soon realizes that the packages these images require download so slowly that building all of them will take at least 12 hours.<\/p>\n<p><strong>2&#58;10 p.m.<\/strong> John and Bill have nothing to do but wait for S3 to recover. All development is blocked, and all services are down. 
Bill files a complaint with AWS Support, and John posts jokes on Twitter, which happens to be working.<\/p>\n<p>Sadly, John\u2019s company was far from the only one knocked off its feet by the Amazon S3 service disruption.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"What_caused_the_outage\"><\/span>What caused the outage?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The <a href=\"https:\/\/venturebeat.com\/2017\/02\/28\/aws-is-investigating-s3-issues-affecting-quora-slack-trello\/\" target=\"_blank\" rel=\"noopener noreferrer\">incident<\/a> is a reminder of how fragile Internet services can be and how much a company can lose when a single service it relies on is disrupted. We\u2019ve seen this before with the <a href=\"https:\/\/themerkle.com\/mirai-botnet-is-the-culprit-of-yesterdays-dyndns-outage\/\" target=\"_blank\" rel=\"noopener noreferrer\">DynDNS attack in 2016<\/a>, but this outage had a different origin. According to Amazon\u2019s <a href=\"https:\/\/aws.amazon.com\/message\/41926\/\" target=\"_blank\" rel=\"noopener noreferrer\">post-mortem<\/a>, it was caused by a chain of events triggered by human error.<\/p>\n<p>Amazon explained that an engineer mistyped one of the inputs to a command intended to remove a small number of servers from an S3 subsystem. As a result, a much larger set of servers was removed, including servers that supported two other critical S3 subsystems: index and placement. As soon as Amazon engineers detected the problem, they began restarting these subsystems. 
The restart took much longer than expected, since Amazon had not completely restarted these subsystems in its larger regions for many years.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.jpg\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.jpg\" alt=\"Amazon AWS Cloud Foundry Outage Web Service Down v2\" width=\"640\" class=\"aligncenter size-full wp-image-21784\" \/><\/a><small>Image credit: <a href=\"https:\/\/www.datacenterdynamics.com\/content-tracks\/colo-cloud\/amazon-web-services-us-east-1-goes-down\/97892.fullarticle\/\" target=\"_blank\" rel=\"noopener noreferrer\">Data Center Dynamics<\/a><\/small><\/center><\/p>\n<p>It turned out that other AWS services in the US-EAST-1 region that rely on S3 for storage, including the S3 console, new instance launches in Amazon Elastic Compute Cloud (EC2), Amazon Elastic Block Store (EBS) volumes (when data was needed from an S3 snapshot), and AWS Lambda, were also impacted while the S3 APIs were unavailable.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"What_services_got_disrupted\"><\/span>What services got disrupted?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Over 70 services were affected by the disruption, including: Adobe\u2019s services, Airbnb, Amazon\u2019s Twitch, Atlassian\u2019s Bitbucket and HipChat, Autodesk Live and Cloud Rendering, Business Insider, Chef, Citrix, Codecademy, Coindesk, Coursera, Docker, Down Detector, Elastic, Expedia, Giphy, GitHub, GitLab, Google-owned Fabric, Greenhouse, Heroku, Imgur, Ionic, isitdownrightnow.com, JSTOR, Kickstarter, Lonely Planet, Mailchimp, Medium, Microsoft\u2019s HockeyApp, the MIT Technology Review, New Relic, News Corp, Pinterest, Quora, SendGrid, Signal, Slack, Snapchat\u2019s Bitmoji, Travis CI, Time Inc., Trello, Twilio, the U.S. 
Securities and Exchange Commission (SEC), The Verge, Vermont Public Radio, VSCO, Wix, and Zendesk.<\/p>\n<p>This list is by no means exhaustive.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Quora-v3.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Quora-v3.png\" alt=\"Amazon AWS Cloud Foundry Outage Quora v3\" width=\"640\" class=\"aligncenter size-full wp-image-21783\" \/><\/a><small>Image credit: <a href=\"https:\/\/www.bleepingcomputer.com\/news\/hardware\/aws-goes-down-and-so-do-millions-of-websites-apps-and-other-services\/\" target=\"_blank\" rel=\"noopener noreferrer\">BleepingComputer<\/a><\/small><\/center><\/p>\n<p>Notably, a number of the affected services were built <b>by developers for developers<\/b>. Their failure kept operators from reacting to warnings and taking action to recover.<\/p>\n<p>Similarly, the AWS status page kept showing green indicators for roughly the first two hours of the outage, misleadingly suggesting that everything was fine.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Learning_through_forced_downtime\"><\/span>Learning through forced downtime<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Netflix is one of the best-known services that relied on AWS S3 yet survived the blackout, and that was no accident. For years, Netflix has built its software on the assumption that no part of the infrastructure is 100% reliable. The company rigorously tests its services by simulating failures. The idea is to use a tool that systematically wreaks havoc on the underlying infrastructure by killing VMs, emulating network partitions, and so on. 
The best-known such tool is <a href=\"https:\/\/github.com\/Netflix\/SimianArmy\/wiki\/Chaos-Monkey\" target=\"_blank\" rel=\"noopener noreferrer\">Chaos Monkey<\/a>.<\/p>\n<p>Netflix is a good example of how <a href=\"https:\/\/arstechnica.com\/information-technology\/2012\/07\/netflix-attacks-own-network-with-chaos-monkey-and-now-you-can-too\/\" target=\"_blank\" rel=\"noopener noreferrer\">crashing your systems on purpose<\/a> can prevent them from crashing your business during an unexpected disruption.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Netflix-Chaos-Monkey.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Netflix-Chaos-Monkey.png\" alt=\"Amazon AWS Cloud Foundry Outage Netflix Chaos Monkey\" width=\"528\" height=\"407\" class=\"aligncenter size-full wp-image-21775\" \/><\/a><small>Image credit: <a href=\"https:\/\/netflixtechblog.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Netflix<\/a><\/small><\/center><\/p>\n<p>Netflix\u2019s experience shows that adopting a resilience testing tool or technique is not enough on its own. To survive an outage as large as the one S3 has just had, you need to build your infrastructure with a well-defined, systematic approach that ties development and deployment together. You need reliable monitoring and alerting in place, regular load and resilience testing, and a set of independent tools that enable repeatable, stable deployments.<\/p>\n<p>To anticipate potential losses, you also need a clear understanding of the dependencies behind your services. 
Predicting the behaviour of a complex, highly distributed system built on several independent layers is extremely hard.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Reducing_the_risk_of_disruption_in_Cloud_Foundry\"><\/span>Reducing the risk of disruption in Cloud Foundry<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>If you work with Cloud Foundry, the community offers several tools for running resilience tests. Two of the most powerful are <a href=\"https:\/\/github.com\/cppforlife\/turbulence-release\" target=\"_blank\" rel=\"noopener noreferrer\">Turbulence<\/a> and <a href=\"https:\/\/github.com\/Altoros\/cf-chaos-loris-broker\" target=\"_blank\" rel=\"noopener noreferrer\">Chaos Loris<\/a>.<\/p>\n<p><strong>Turbulence<\/strong> is a BOSH-deployed service that uses BOSH itself to simulate various types of failures, from terminating VMs to more complex scenarios involving network partitions, packet loss and delay, or CPU and memory load on VMs. Combined with <a href=\"https:\/\/www.altoros.com\/blog\/8-pro-tips-for-using-concourse-ci-with-cloud-foundry\/\">Concourse CI<\/a>, Turbulence becomes a powerful way to detect problems at early stages.<\/p>\n<p><strong>Chaos Loris<\/strong> works in a similar way, but targets only applications deployed to Cloud Foundry. You can bind your app to the Chaos Loris service with a single command and then observe how the application behaves in extreme situations.<\/p>\n<p>The whole story shows that it is better not to rely solely on a provider&#8217;s SLA, but to develop a set of measures that keep your service up and running when a disaster strikes or parts of the infrastructure fail. 
Combined with Turbulence and Chaos Loris, BOSH, which supports a variety of clouds and eases migration between them, enables reliable, repeatable deployments across multiple clouds without vendor lock-in.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Related_reading\"><\/span>Related reading<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul>\n<li><a href=\"https:\/\/www.altoros.com\/blog\/2016-in-review-a-varietal-cloud-and-the-paas-revival\/\">2016\u20132017 Trends: A \u201cVarietal\u201d Cloud and the PaaS Revival<\/a><\/li>\n<li><a href=\"https:\/\/www.altoros.com\/blog\/why-weve-entered-a-new-era-of-it-zettastructure-and-other-concepts\/\">Why We\u2019ve Entered a New Era of IT: ZettaStructure and Other Concepts<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Not your typical day of troubleshooting<\/p>\n<p>Imagine an average programmer, let\u2019s call him John, entering his office in San Francisco at 9 a.m. in the morning on February the 28th. He starts his daily routine reviewing tasks in Trello, checking out code from GitHub, and running his favorite code editor.<\/p>\n<p>9&#58;58 a.m. 
[&#8230;]<\/p>\n","protected":false},"author":35,"featured_media":21785,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":"","_links_to":"","_links_to_target":""},"categories":[7],"tags":[873,206],"class_list":["post-21766","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-opinion","tag-cloud-native","tag-oss-cloud-foundry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>AWS S3 Outage: Lessons Learned for Cloud Foundry Users | Altoros<\/title>\n<meta name=\"description\" content=\"On February 28, AWS\u2014one of the largest cloud providers\u2014suffered an outage that disrupted over 70 services. Is Cloud Foundry vulnerable to the same error and how can it be prevented?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AWS S3 Outage: Lessons Learned for Cloud Foundry Users | Altoros\" \/>\n<meta property=\"og:description\" content=\"Not your typical day of troubleshooting Imagine an average programmer, let\u2019s call him John, entering his office in San Francisco at 9 a.m. in the morning on February the 28th. He starts his daily routine reviewing tasks in Trello, checking out code from GitHub, and running his favorite code editor. 9&#058;58 a.m. 
[...]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/\" \/>\n<meta property=\"og:site_name\" content=\"Altoros\" \/>\n<meta property=\"article:published_time\" content=\"2017-03-10T15:28:50+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-03-12T01:51:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif\" \/>\n\t<meta property=\"og:image:width\" content=\"640\" \/>\n\t<meta property=\"og:image:height\" content=\"360\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/gif\" \/>\n<meta name=\"author\" content=\"Alexander Lomov\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Alexander Lomov\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/\",\"url\":\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/\",\"name\":\"AWS S3 Outage: Lessons Learned for Cloud Foundry Users | 
Altoros\",\"isPartOf\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif\",\"datePublished\":\"2017-03-10T15:28:50+00:00\",\"dateModified\":\"2021-03-12T01:51:59+00:00\",\"author\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/32bad30d8a66fbba345c09f5ce9503a6\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#primaryimage\",\"url\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif\",\"contentUrl\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif\",\"width\":640,\"height\":360},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.altoros.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AWS S3 Outage: Lessons Learned for Cloud Foundry 
Users\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.altoros.com\/blog\/#website\",\"url\":\"https:\/\/www.altoros.com\/blog\/\",\"name\":\"Altoros\",\"description\":\"Insight\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.altoros.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/32bad30d8a66fbba345c09f5ce9503a6\",\"name\":\"Alexander Lomov\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2016\/12\/Alexander-Lomov-150x150.jpg\",\"contentUrl\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2016\/12\/Alexander-Lomov-150x150.jpg\",\"caption\":\"Alexander Lomov\"},\"description\":\"Alexander Lomov is a Cloud Foundry Engineer at Altoros. With extensive experience in Ruby, Go, and Python, he was involved in development of BOSH CPIs and other Cloud Foundry-related projects for Canonical, IBM, and other companies. Alexander is a frequent speaker at various events\/meetups, mostly sharing his experience with Cloud Foundry. You may also know him as the author of several blog posts about Cloud Foundry internals.\",\"sameAs\":[\"http:\/\/altoros.com\"],\"url\":\"https:\/\/www.altoros.com\/blog\/author\/alexander-lomov\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"AWS S3 Outage: Lessons Learned for Cloud Foundry Users | Altoros","description":"On February 28, AWS\u2014one of the largest cloud providers\u2014suffered an outage that disrupted over 70 services. 
Is Cloud Foundry vulnerable to the same error and how can it be prevented?","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/","og_locale":"en_US","og_type":"article","og_title":"AWS S3 Outage: Lessons Learned for Cloud Foundry Users | Altoros","og_description":"Not your typical day of troubleshooting Imagine an average programmer, let\u2019s call him John, entering his office in San Francisco at 9 a.m. in the morning on February the 28th. He starts his daily routine reviewing tasks in Trello, checking out code from GitHub, and running his favorite code editor. 9&#58;58 a.m. [...]","og_url":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/","og_site_name":"Altoros","article_published_time":"2017-03-10T15:28:50+00:00","article_modified_time":"2021-03-12T01:51:59+00:00","og_image":[{"width":640,"height":360,"url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif","type":"image\/gif"}],"author":"Alexander Lomov","twitter_misc":{"Written by":"Alexander Lomov","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/","url":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/","name":"AWS S3 Outage: Lessons Learned for Cloud Foundry Users | Altoros","isPartOf":{"@id":"https:\/\/www.altoros.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#primaryimage"},"image":{"@id":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#primaryimage"},"thumbnailUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif","datePublished":"2017-03-10T15:28:50+00:00","dateModified":"2021-03-12T01:51:59+00:00","author":{"@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/32bad30d8a66fbba345c09f5ce9503a6"},"breadcrumb":{"@id":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#primaryimage","url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif","contentUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/03\/Amazon-AWS-Cloud-Foundry-Outage-Web-Service-Down-v2.gif","width":640,"height":360},{"@type":"BreadcrumbList","@id":"https:\/\/www.altoros.com\/blog\/aws-s3-level-outage-lessons-learned-for-cloud-foundry-users\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":
"https:\/\/www.altoros.com\/blog\/"},{"@type":"ListItem","position":2,"name":"AWS S3 Outage: Lessons Learned for Cloud Foundry Users"}]},{"@type":"WebSite","@id":"https:\/\/www.altoros.com\/blog\/#website","url":"https:\/\/www.altoros.com\/blog\/","name":"Altoros","description":"Insight","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.altoros.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/32bad30d8a66fbba345c09f5ce9503a6","name":"Alexander Lomov","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2016\/12\/Alexander-Lomov-150x150.jpg","contentUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2016\/12\/Alexander-Lomov-150x150.jpg","caption":"Alexander Lomov"},"description":"Alexander Lomov is a Cloud Foundry Engineer at Altoros. With extensive experience in Ruby, Go, and Python, he was involved in development of BOSH CPIs and other Cloud Foundry-related projects for Canonical, IBM, and other companies. Alexander is a frequent speaker at various events\/meetups, mostly sharing his experience with Cloud Foundry. 
You may also know him as the author of several blog posts about Cloud Foundry internals.","sameAs":["http:\/\/altoros.com"],"url":"https:\/\/www.altoros.com\/blog\/author\/alexander-lomov\/"}]}},"_links":{"self":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/21766","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/comments?post=21766"}],"version-history":[{"count":14,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/21766\/revisions"}],"predecessor-version":[{"id":60567,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/21766\/revisions\/60567"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/media\/21785"}],"wp:attachment":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/media?parent=21766"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/categories?post=21766"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/tags?post=21766"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}