Service disruption on social channels & SMS
Incident Report for iAdvize (HA)
Postmortem

On October 1st, 2018, we experienced service disruptions on our social platform between 00:00 and 11:30 (UTC+2).

During this incident, we were unable to deliver conversations on social channels (Facebook, Twitter, WhatsApp and text messages).

Cause of the incident:
The message broker was unreachable due to a network incident at one of our providers. This incident started on Monday, October 1st, at 00:00 (UTC+2) and lasted about 30 minutes.
During these 30 minutes, some of our social services that depend on this message broker became unhealthy and tried to restart automatically.
As they were unable to become healthy again, they stayed in this state, even after the message broker was reachable again.
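
The failure mode described above (a service that gives up reconnecting while its broker is temporarily unreachable) is commonly avoided by retrying with exponential backoff. The following is only a minimal sketch of that general technique, not the code running in our services: `connect_with_backoff`, its parameters, and the `connect` callable are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, waiting longer after each failure.

    `connect` is a hypothetical callable that returns a broker connection
    or raises ConnectionError. With max_attempts=None the function keeps
    retrying, so it can outlast a temporary broker outage.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Exponential backoff capped at max_delay. The exponent is
            # also capped so the power of two never grows unbounded.
            delay = min(max_delay, base_delay * 2 ** min(attempt - 1, 30))
            # Full jitter: sleep a random time in [0, delay] so that many
            # services do not all retry at the same instant.
            sleep(random.uniform(0, delay))
```

With the default `max_attempts=None`, a service using this pattern would keep trying throughout an outage and reconnect on its own once the broker is back, instead of waiting for a manual restart.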

After identifying the problem, we performed the following corrective action:

  • We restarted the unhealthy social backend services so they could re-establish a proper connection to the message broker, which was reachable again by then.

Everything went back to normal at 11:30 (UTC+2).

A further action has been identified to prevent this issue from happening in the future:

  • We will enhance our alerting system so that these errors trigger an immediate notification and corrective action can be taken quickly.
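
To illustrate the kind of alerting rule this refers to, here is a minimal sketch that flags any service that has been continuously unhealthy for longer than a threshold. It is an assumption-laden example rather than our actual alerting setup: the `HealthTracker` class, the 300-second threshold, and the service names in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthTracker:
    """Tracks how long each service has been continuously unhealthy."""

    threshold_seconds: float = 300.0  # assumed value, not from the report

    def __post_init__(self) -> None:
        # Maps a service name to the time of its first failed check.
        self._unhealthy_since: Dict[str, float] = {}

    def record(self, service: str, healthy: bool, now: float) -> None:
        """Store the result of one health check taken at time `now`."""
        if healthy:
            self._unhealthy_since.pop(service, None)
        else:
            # Keep the first failure time so the outage duration grows.
            self._unhealthy_since.setdefault(service, now)

    def services_to_alert(self, now: float) -> List[str]:
        """Return services unhealthy for at least threshold_seconds."""
        return sorted(
            name
            for name, since in self._unhealthy_since.items()
            if now - since >= self.threshold_seconds
        )
```

For example, after `tracker.record("sms-gateway", False, now=0)`, a call to `tracker.services_to_alert(now=300)` would include `"sms-gateway"`, which a notifier could then forward to the on-call team.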
Posted Oct 01, 2018 - 14:40 CEST

Resolved
The situation is now back to normal. Messages that were not received will gradually appear in the conversation panel.
Posted Oct 01, 2018 - 11:44 CEST
Investigating
There is currently an incident impacting social channels (Messenger, Facebook, Twitter, WhatsApp) & SMS. Incoming messages are no longer being received in the conversation panel.
We are working to restore the service and will update you as we learn more.
Posted Oct 01, 2018 - 11:30 CEST