Incidents and the Google Cloud Service Health Dashboard  |  Support Documentation (2024)

The Google Cloud Service Health (CSH) Dashboard provides status information for Google Cloud products, organized by region and global locale.

Major incident

Google Cloud defines an incident as a major incident if it meets all of the following conditions:

  • High scope - The incident has global impact or affects a significant percentage of customer projects across one or more regions.
  • High severity - One or more products are unavailable or severely degraded.
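Because both conditions must hold, the classification is a logical AND of a scope test and a severity test. The sketch below encodes that rule. The `IncidentAssessment` fields and the `significant_fraction` threshold are hypothetical illustrations; the documentation does not publish a numeric value for "significant percentage".

```python
from dataclasses import dataclass


@dataclass
class IncidentAssessment:
    """Hypothetical inputs for classifying an incident (illustrative only)."""
    is_global: bool                    # the incident has global impact
    affected_project_fraction: float   # share of customer projects affected (0.0-1.0)
    product_unavailable: bool          # at least one product is unavailable
    severely_degraded: bool            # at least one product is severely degraded


def is_major_incident(a: IncidentAssessment, significant_fraction: float) -> bool:
    """Return True only when BOTH the high-scope and high-severity conditions hold.

    `significant_fraction` is a hypothetical threshold supplied by the caller.
    """
    high_scope = a.is_global or a.affected_project_fraction >= significant_fraction
    high_severity = a.product_unavailable or a.severely_degraded
    return high_scope and high_severity
```

A global incident that only causes minor degradation fails the severity test, so it is not a major incident under this rule.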

In the rare instance a major incident occurs, we act with urgency to resolve any issues.

During a major incident, the status of the issue is communicated through the Google Cloud Service Health Dashboard. A major incident is marked as Service outage on the status dashboards. After the issue is resolved, we publish a public incident report that includes details of the factors that contributed to the incident and the steps we plan to take to prevent such incidents from recurring.

In the case of smaller-scoped incidents, a nonpublic report might be made available to customers.

Lifecycle of an incident

When a product degradation is detected, the Google Cloud Support team and product engineering team work together to resolve the incident and provide you with updates.

The following diagram shows the responsibilities of the product engineering and support teams:

[Diagram: responsibilities of the product engineering and support teams during an incident]

You can read more about each of these responsibilities in the following sections.

Detection

Google Cloud uses internal and black-box monitoring to detect incidents. For more information, see Chapter 6 of the Site Reliability Engineering book.

If you have Premium, Enhanced, or Standard Support, you can report an incident by creating a support case in the Google Cloud console. Otherwise, you can use this form.

Initial response

When an incident is detected, the Google Cloud Customer Care team manages customer communications. Initial notifications are often sparse, frequently mentioning only the product in question, because we prioritize fast notification over detail. More detail can be provided in subsequent updates.

To provide you with as much information as possible without overwhelming you with issues that do not affect you, different communication channels are used depending on the scope and severity of an issue:

[Diagram: communication channels used for an incident, by scope and severity]

We recommend using Personalized Service Health as the first stop when facing a service disruption for specific products. Through Personalized Service Health, you can view disruptions relevant to your projects. Read more about Personalized Service Health and how to integrate it into your incident management workflow.

The Google Cloud Service Health Dashboard displays major incidents and is designed to be available in the rare event that Personalized Service Health itself is unavailable or affected by a disruption.

If you have not enabled Personalized Service Health for your project, or if the product is not yet supported by Personalized Service Health, we recommend checking for active disruptions in the following places:

  • Google Cloud Service Health Dashboard
  • Google Cloud console Support page

The known issues displayed in the Google Cloud console Support page also include minor and limited-scope incidents.

Support cases are appropriate for issues that don't qualify as incidents or where a one-to-one human touch is needed. The known issues page lets you create a case from a posted incident so that you get regular updates and can talk to support staff.
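The recommended order of channels can be summarized as a small decision function. This is an illustrative sketch of the guidance above, not a Google Cloud API; the function name and its boolean parameters are hypothetical.

```python
def where_to_check(psh_enabled: bool, product_supported_by_psh: bool,
                   psh_available: bool = True,
                   needs_one_to_one: bool = False) -> list[str]:
    """Suggest where to look for an active disruption (hypothetical helper)."""
    if psh_enabled and product_supported_by_psh and psh_available:
        # Personalized Service Health is the recommended first stop.
        channels = ["Personalized Service Health"]
    else:
        # Fallbacks: the public dashboard (major incidents) and the console
        # Support page, which also lists minor and limited-scope incidents.
        channels = ["Google Cloud Service Health Dashboard",
                    "Google Cloud console Support page"]
    if needs_one_to_one:
        # A support case gives regular updates and direct contact with staff.
        channels.append("Support case")
    return channels
```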

Investigate

Product engineering teams are responsible for investigating the root cause of incidents. Incident management is often done by Site Reliability Engineers but might be done by software engineers or others, depending on the situation and product. For more information, see Chapter 12 of the Site Reliability Engineering book.

Mitigation/Fix

An issue is considered fixed only when changes have been made that Google is confident will end the impact indefinitely. For example, the fix could be rolling back a change that triggered an incident.

While an incident is in progress, Customer Care and the product team attempt to mitigate the issue. Mitigation reduces the impact or scope of an issue, for example, by temporarily providing additional resources to a product suffering from overload.

If no mitigation has been found, the Customer Care team finds and communicates workarounds when possible. Workarounds are steps that you can take to meet the underlying need despite the incident. For example, a workaround might be to use different settings for an API call to avoid a problematic code path.

Follow up

While an incident is ongoing, the Customer Care team provides regular updates. Updates typically provide:

  • More information about the incident, such as error messages, zones or regions affected, which features are affected, or percentages of impact.

  • Progress towards mitigation, including any workarounds.

  • Timelines for communication, tailored to the incident.

  • Changes in status, such as when an incident is fixed.
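If you consume incident updates programmatically, one way to surface status changes is to walk the updates in time order and record each point where the status differs from the previous one. This sketch assumes each update is a dict with `when` and `status` keys holding ISO 8601 strings in one consistent format; those field names and the status values in the usage are assumptions, not confirmed schema details.

```python
def status_transitions(updates: list[dict]) -> list[tuple[str, str]]:
    """Return (timestamp, new_status) pairs each time the status changes."""
    transitions: list[tuple[str, str]] = []
    last_status = None
    # Assumes "when" strings share one ISO 8601 format and UTC offset,
    # so lexicographic order matches chronological order.
    for update in sorted(updates, key=lambda u: u["when"]):
        if update["status"] != last_status:
            transitions.append((update["when"], update["status"]))
            last_status = update["status"]
    return transitions
```

Repeated updates with an unchanged status (for example, progress notes during an outage) are collapsed, so only the "Changes in status" remain.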

Postmortem

Every incident has an internal postmortem to fully understand the incident and identify reliability improvements that Google can make. These improvements are then tracked and implemented. For more information on postmortems at Google, see Chapter 15 of the Site Reliability Engineering book.

Incident report

When incidents have very wide and serious impact, Google provides incident reports that outline the symptoms, impact, root cause, remediation, and future prevention of incidents. As with postmortems, we pay particular attention to the steps that we take to learn from the issue and improve reliability. Google's goal in writing and releasing postmortems is to be transparent and demonstrate our commitment to building stable products for our customers.

Incident data model

An incident impacts one or more products in one or more locations. Incidents have a start time, an end time, and an overall severity. An incident has updates that describe how it changes over time, including its status and the locations impacted at that time. The incident information is made available through a JSON schema.

The JSON schema has fields marked Stable and Unstable. In general, ID fields are considered Stable, whereas fields such as display names are considered Unstable and may be changed without warning. Use only Stable fields when integrating with an external system or building automation. See the FAQ entry "Can I build integrations to consume the data displayed on the Google Cloud Service Health Dashboard programmatically?"
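The data model above can be sketched as Python dataclasses that keep the Stable ID fields and treat display names as secondary. `end`, `affected_products`, and `updates[].affected_locations` are field names used in this document; `id`, `begin`, `severity`, `title`, `when`, and `status` are assumptions about the schema and should be checked against the downloadable JSON schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Location:
    id: str     # Stable identifier, for example a region ID
    title: str  # display name, treated as Unstable


@dataclass
class Update:
    when: str                          # update timestamp (field name assumed)
    status: str                        # status at that time (field name assumed)
    affected_locations: list[Location]


@dataclass
class Incident:
    id: str
    begin: str                         # start time (field name assumed)
    end: Optional[str]                 # None while the incident is ongoing
    severity: str                      # overall severity (field name assumed)
    product_ids: list[str]             # Stable product IDs only
    updates: list[Update] = field(default_factory=list)


def _locations(raw: list[dict[str, Any]]) -> list[Location]:
    return [Location(id=loc["id"], title=loc.get("title", "")) for loc in raw]


def parse_incident(raw: dict[str, Any]) -> Incident:
    """Convert one incident record into the dataclasses above."""
    return Incident(
        id=raw["id"],
        begin=raw["begin"],
        end=raw.get("end") or None,    # an empty end means "not resolved yet"
        severity=raw.get("severity", ""),
        product_ids=[p["id"] for p in raw.get("affected_products", [])],
        updates=[
            Update(when=u.get("when", ""), status=u.get("status", ""),
                   affected_locations=_locations(u.get("affected_locations", [])))
            for u in raw.get("updates", [])
        ],
    )
```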

FAQ

What type of status information can I find on the Google CSH Dashboard?

The Google CSH Dashboard provides status information on products that are part of Google Cloud. Status can include product disruptions, outages, or informational messages about a temporary issue.

When does an incident get posted to the Google CSH Dashboard?

Incidents that meet any of the following criteria appear in the CSH dashboard:

  • Major incidents
  • Personalized Service Health dashboard is unavailable
  • Google Cloud products not yet available on Personalized Service Health
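Unlike the major-incident definition, where every condition must hold, these posting criteria are alternatives: any one is enough. A minimal sketch with hypothetical boolean inputs:

```python
def appears_on_csh(is_major: bool, psh_unavailable: bool,
                   product_on_psh: bool) -> bool:
    """Any single listed criterion is sufficient (logical OR).

    All three parameters are hypothetical inputs for illustration.
    """
    return is_major or psh_unavailable or not product_on_psh
```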

Where can I find information about past product disruptions and outages?

The Google CSH Dashboard keeps a record of disruptions and outages for Google Cloud products for up to five years. The Overview tab of the dashboard shows the current status of the products by locale. To view information about product disruptions and outages in the last year, click View history on the dashboard. To view a product's outage history for the last five years, click See more for that product.
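If you work from the JSON History file rather than the View history page, a comparable "last year" view can be sketched by filtering on each incident's start time. This assumes the start time is stored in a `begin` field as a timezone-aware ISO 8601 string; both the field name and the format are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    # Assumes timezone-aware ISO 8601 strings; a trailing "Z" is normalized
    # because datetime.fromisoformat only accepts it from Python 3.11 onward.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def incidents_since(incidents: list[dict], days: int,
                    now: Optional[datetime] = None) -> list[dict]:
    """Return incidents whose start time falls within `days` days of `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [inc for inc in incidents if parse_timestamp(inc["begin"]) >= cutoff]
```

Passing `now` explicitly keeps the filter deterministic in tests; `days=365` approximates the one-year history view.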

How can I view regionalized status information for Google Cloud products?

The Google CSH Dashboard displays the status of all Google Cloud products, organized by region and global locale. To view the status for a multi-region, click the region-specific tab.

Can I build integrations to consume the data displayed on the Google Cloud Service Health Dashboard programmatically?

Yes, you can consume the data displayed on the Google CSH Dashboard in the following ways:

  • Through an RSS feed
  • Through a JSON History file

    You can download the schema for the JSON file here.

The RSS feed and JSON History file provide incident status information that can be consumed through integrations.
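A minimal sketch of loading the JSON History file with only the Python standard library follows. The URL is an assumption (this page does not give it), so confirm it against the link on the dashboard; the sketch also assumes the file's top level is a JSON array of incident objects.

```python
import json
import urllib.request

# Assumed location of the JSON History file; verify before relying on it.
INCIDENTS_URL = "https://status.cloud.google.com/incidents.json"


def parse_incidents(text: str) -> list[dict]:
    """Parse the JSON History file, assumed to be a top-level array of incidents."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of incidents")
    return data


def fetch_incidents(url: str = INCIDENTS_URL, timeout: float = 10.0) -> list[dict]:
    """Download and parse the JSON History file."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_incidents(resp.read().decode("utf-8"))
```

Keeping parsing separate from fetching lets you test the integration against a saved copy of the file without network access.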

Use the fields marked Stable in the JSON History file instead of the fields marked Unstable. For example, if you're trying to programmatically identify incidents impacting a particular set of products, use the product IDs (affected_products>id), not their display names.
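Filtering by product therefore means matching on `affected_products` entries' `id` values. In this sketch, the `title` key used for display names in the usage data is an assumption, and the product IDs are hypothetical placeholders.

```python
from typing import Iterable


def incidents_for_products(incidents: Iterable[dict],
                           product_ids: set[str]) -> list[dict]:
    """Keep incidents whose affected_products include any of the given IDs.

    Matches on the Stable `id` field only, never on the display name.
    """
    return [
        inc for inc in incidents
        if any(p.get("id") in product_ids for p in inc.get("affected_products", []))
    ]
```

Because the match uses IDs, a product rename (a change to its display name) does not silently drop incidents from the result.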

Product IDs versus product names

Historically, the Google Cloud Service Health Dashboard didn't provide a mechanism for locating the ID for a given product. Since early 2023, the Google Cloud Service Health Dashboard has made available a product catalog that provides this mapping for all products. A product ID provides a stable field to key off of while allowing the display name of a product to change. Prefer referencing the product ID when programmatically identifying incidents impacting a set of products.
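One way to apply this advice is to resolve human-readable names to IDs once, for example when configuring an integration, and then store and filter on the IDs. The sketch assumes each catalog entry is a dict shaped like `{"title": ..., "id": ...}`; the actual catalog format may differ.

```python
def build_name_to_id(catalog: list[dict]) -> dict[str, str]:
    """Map current display names to product IDs (assumed entry shape)."""
    return {entry["title"]: entry["id"] for entry in catalog}


def resolve_product_ids(names: list[str], catalog: list[dict]) -> set[str]:
    """Resolve names to IDs once; persist the IDs, not the names.

    Raises KeyError for an unknown name so typos surface at setup time.
    """
    mapping = build_name_to_id(catalog)
    return {mapping[name] for name in names}
```

Once stored, the IDs keep matching even if a product's display name later changes.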

What if I have pre-built integrations based on the Google Cloud Status Dashboard prior to the introduction of regionalized status reporting and name change to Google Cloud Service Health Dashboard?

In both the RSS feed and the JSON file, the regional status information is additive to the information that was already being published before the introduction of regionalized status reporting and the renaming of the Google Cloud Status Dashboard. Therefore, we expect your existing integrations to continue working. However, if you want to consume the regional status information through your integrations, you need to modify them.

Here's a detailed description of how regional information is presented in both the RSS feed and the JSON file:

  • RSS feed

    The regional status information is a new addition to the feed information that was provided before the introduction of regionalized status. Any locations that are reported as affected are appended to the RSS message.

  • JSON file

    Prior to the regional status update, Google Cloud published a stream of incidents where each incident contained a list of affected products and a list of status updates for each, if any. These status updates contained an unstructured string field that might or might not contain location information.

    Now, Google Cloud publishes a stream of incidents just as it did before. However, every incident now includes the following new fields:

    • updates.affected_locations: contains a structured list of affected locations at the time the update was posted. Every update record and the most_recent_update record contain this field.
    • currently_affected_locations: contains the most recent information on the locations that are actively impacted by the incident. Unlike updates.affected_locations, this list becomes empty after the incident is resolved (that is, when end is set to a non-empty value).
    • previously_affected_locations: contains a list of locations that were previously impacted during an incident but aren't currently. As the incident progresses, the outage might be resolved in some locations. These locations still appear in the previously_affected_locations field. Once the incident is resolved (that is, when end is set to a non-empty value), this field contains a list of all locations that were impacted during the incident.
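The three regional fields can be read together to summarize where an incident stands. This sketch assumes each location entry carries an `id` key. Using `.get()` with empty defaults keeps it working on records that lack the new fields, which matches the additive, backward-compatible design described above.

```python
from typing import Any


def _ids(locations: list[dict[str, Any]]) -> set[str]:
    # Assumes each location entry has a Stable "id" key.
    return {loc["id"] for loc in locations}


def is_resolved(incident: dict[str, Any]) -> bool:
    # Resolved once `end` is set to a non-empty value.
    return bool(incident.get("end"))


def regional_status(incident: dict[str, Any]) -> dict[str, set[str]]:
    """Summarize the regional fields of one incident record."""
    current = _ids(incident.get("currently_affected_locations", []))
    previous = _ids(incident.get("previously_affected_locations", []))
    latest = _ids(incident.get("most_recent_update", {}).get("affected_locations", []))
    return {
        # Defensive: this list should already be empty once the incident is resolved.
        "currently_affected": set() if is_resolved(incident) else current,
        # After resolution, this holds every location impacted during the incident.
        "previously_affected": previous,
        "latest_update": latest,
    }
```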

What if I am experiencing an issue, but it is not listed on the dashboard?

The Google Cloud Service Health dashboard provides current and historical status information for any major incident that affects Google Cloud products and services. If you are experiencing an issue that is not listed on the dashboard, the issue may be isolated to your projects or instances, or it may be impacting a limited number of customers. Smaller-scoped incidents may be listed on the Customer Care Portal. You can contact Customer Care about any issues you are experiencing that are not listed on the dashboard.

If you are already using the Personalized Service Health dashboard, check if the issue is listed there to determine if your project or instance is affected.

If you are using the Google Cloud console, you can click the Send feedback tool in the upper-right corner to report problems.

Who updates the dashboard?

The global Customer Care team monitors the status of products using many different types of signals and updates the dashboard in the event of a widespread issue. If needed, the team posts a detailed incident analysis report after an incident has been resolved.
