It seems to me that counting throttled API calls against our API call limit is punitive and is the root cause of these snowballing outages. This is compounded by the fact that we rely on Zapier for these API calls and can't finely control the retry logic.
We can't recover after hitting the rate limit because every throttled call gets retried, and each retry counts as yet another call against the limit. Since no call gets through, all our systems grind to a halt and keep retrying.
If throttled calls didn't count, the limit would reset every minute and let at least some calls through, until the spike in call volume drained and we were consistently back under our limit.
Your system would still be protected because I assume throttled API calls don't use any resources on your end.
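To illustrate what I mean, here is a rough simulation of the two policies. Everything in it is an assumption on my part: the sliding 60-second window is my guess at how your limiter works, the numbers (120 calls per window, 1 call per second of normal traffic, a 5-second spike of 50 extra calls per second) are made up, and "retry every throttled call one second later" is only a stand-in for whatever Zapier actually does.

```python
from collections import deque


def simulate(count_throttled, ticks=600, window=60, limit=120,
             base=1, spike=50, spike_ticks=range(10, 15)):
    """Simulate a sliding-window rate limiter, one tick per second.

    count_throttled=True  -> rejected calls also count against the limit
    count_throttled=False -> only calls that were let through count
    All parameters are hypothetical values chosen for illustration.
    Returns (calls served, calls still waiting to be retried).
    """
    counted = deque()   # (tick, calls counted against the limit in that tick)
    in_window = 0       # calls currently counted inside the window
    backlog = 0         # throttled calls waiting for their next retry
    served = 0

    for t in range(ticks):
        # Forget ticks that have aged out of the sliding window.
        while counted and counted[0][0] <= t - window:
            in_window -= counted.popleft()[1]

        # New traffic plus every call that was throttled on the previous tick.
        attempts = backlog + base + (spike if t in spike_ticks else 0)

        allowed = min(attempts, max(0, limit - in_window))
        throttled = attempts - allowed

        # The only difference between the two policies is what gets counted.
        counted_now = attempts if count_throttled else allowed
        counted.append((t, counted_now))
        in_window += counted_now

        served += allowed
        backlog = throttled  # assumed retry logic: try again next tick

    return served, backlog


for policy in (True, False):
    served, backlog = simulate(count_throttled=policy)
    print(f"count_throttled={policy}: served={served}, backlog={backlog}")
```

In this toy model, once the spike pushes us over the limit under the first policy, the retries alone keep the window full, so nothing is served again and the backlog only grows. Under the second policy the window empties as old calls age out, the backlog drains over a few minutes, and traffic returns to normal.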