Postmortem: Incident Summary
Beginning approximately June 5, 2026, and escalating on June 11, 2026, Jobvite experienced a degradation in outbound email and notification delivery. The incident caused intermittent delays (from 30 minutes to several hours) and, in some cases, total delivery failure for a range of system-generated communications.
Affected services included:
- Interview Scheduling: Google Calendar invites and candidate communications.
- Hiring Workflows: Evaluation forms, requisition approvals, and workflow notifications.
- Offers: Offer letters to candidates and CC'd offer notifications for recruiters.
Detection
The issue was identified through a sharp increase in customer support reports. The first isolated report was noted on June 5, and widespread escalation was confirmed on June 11 at 3:52 PM ET. The internal investigation found that no automated alerts had fired because the system was failing silently, without raising errors that existing monitoring would detect.
Root Cause
The issue was caused by a temporary shortage of available server capacity in the external cloud market. This triggered an error in our automated scaling logic, which paused healthy servers while attempting to acquire replacement capacity. As a result, our email delivery system was left running on a single server, which could not keep up with the message volume. Outgoing messages accumulated in a backlog instead of being delivered to their intended recipients.
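The link between lost servers and a growing backlog comes down to simple queue arithmetic. Each minute, the queue grows by the number of incoming messages and shrinks by whatever the active servers can send. The sketch below is a simplified model. The function name and all rates are hypothetical illustration values, not measurements from this incident or a description of the actual delivery system.

```python
def simulate_backlog(arrivals_per_minute: int, per_server_rate: int,
                     servers: int, minutes: int) -> list[int]:
    """Return the queue depth at the end of each simulated minute.

    Hypothetical model: all rates are illustrative inputs, not
    measurements from the incident.
    """
    capacity = per_server_rate * servers  # messages the fleet can send per minute
    backlog = 0
    history = []
    for _ in range(minutes):
        # New messages join the queue, then the fleet sends what it can.
        backlog = max(0, backlog + arrivals_per_minute - capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical load: 500 messages/minute; each server sends 200/minute.
    healthy = simulate_backlog(500, 200, servers=4, minutes=60)
    degraded = simulate_backlog(500, 200, servers=1, minutes=60)
    print(healthy[-1])   # 0: four servers (800/min) keep up with 500/min
    print(degraded[-1])  # 18000: one server falls 300 messages behind per minute
```

Under these assumed numbers, the backlog stays at zero while the fleet's capacity exceeds the incoming rate. Once capacity drops below that rate, the backlog grows linearly for as long as the shortfall lasts.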
Resolution
Technical teams manually intervened on June 11 to restore service:
- Capacity Restoration: Engineers manually unpaused the affected servers and stabilized the server fleet to ensure adequate processing power.
- Queue Reprocessing: At 5:03 PM ET, teams began reprocessing the 60,000 backlogged messages.
- Recovery: By 2:30 PM ET the following day, all mail queues were drained and normal delivery speeds were restored.
- Note: Not all affected emails were ultimately delivered. Some messages were lost before they reached the mail queues and therefore could not be reprocessed. For those messages, the corresponding alerts were still recorded in the in-app notifications (bell icon). If a customer reports that an expected email never arrived, advise them to check the in-app notification bell for the corresponding alert.
Preventative Measures
To prevent a recurrence of this issue, the following actions are being implemented:
- Enhanced Monitoring: New proactive alerts are being added to notify engineering teams immediately if the server count drops below required levels or if mail queues begin to back up.
- Infrastructure Stability: We are adjusting our server configuration to guarantee a minimum number of "always-on" servers that cannot be paused by automated scaling logic, ensuring baseline capacity is always available.
- Automated Recovery: We are developing automated tools to quickly reprocess backlogged messages without requiring manual intervention.
- Process Improvement: A formal technical runbook has been created to help on-call engineers diagnose and resolve similar cloud-market availability issues more rapidly.
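The first two safeguards above can be illustrated with a small sketch. The first piece is a floor that prevents scaling logic from pausing servers below a guaranteed "always-on" minimum. The second is a check that raises alerts when the server count or mail queue depth crosses a threshold. Every name and threshold here (`FleetState`, `MIN_ALWAYS_ON`, `MIN_REQUIRED_SERVERS`, `MAX_QUEUE_DEPTH`) is hypothetical and chosen for illustration. This is not the configuration or tooling actually being deployed.

```python
from dataclasses import dataclass


@dataclass
class FleetState:
    active_servers: int
    queue_depth: int


# Hypothetical thresholds; real values would come from capacity planning.
MIN_ALWAYS_ON = 3          # servers the scaling logic may never pause
MIN_REQUIRED_SERVERS = 4   # alert if the active count falls below this
MAX_QUEUE_DEPTH = 5_000    # alert if more messages than this are waiting


def servers_allowed_to_pause(state: FleetState, requested: int) -> int:
    """Clamp a pause request so the fleet never drops below the always-on floor."""
    headroom = max(0, state.active_servers - MIN_ALWAYS_ON)
    return min(requested, headroom)


def check_alerts(state: FleetState) -> list[str]:
    """Return an alert message for each threshold the fleet currently violates."""
    alerts = []
    if state.active_servers < MIN_REQUIRED_SERVERS:
        alerts.append(
            f"server count {state.active_servers} below required {MIN_REQUIRED_SERVERS}"
        )
    if state.queue_depth > MAX_QUEUE_DEPTH:
        alerts.append(f"mail queue depth {state.queue_depth} above {MAX_QUEUE_DEPTH}")
    return alerts


if __name__ == "__main__":
    # Scenario resembling this incident: scaling logic asks to pause healthy servers.
    state = FleetState(active_servers=4, queue_depth=12_000)
    print(servers_allowed_to_pause(state, requested=3))  # 1: only one server above the floor
    print(check_alerts(state))
```

In this sketch, the floor is enforced at the point where a pause is requested rather than repaired afterward. A scaling error like the one in this incident would therefore be capped at the headroom above the minimum. The queue-depth alert would fire even while the remaining servers appear healthy, which is the kind of silent failure described under Detection.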