We don’t have a good way of knowing if slackbridge or ircbot (“create”) is down; usually someone only finds out when they notice something isn’t working. We should write a program that connects to both IRC and Slack, sends timestamped messages on each side (so we can measure latency), and checks that the other side receives them. It should also check that create responds to the ping command. We could then have it export Prometheus metrics to alert on. This project would run on Kubernetes.
Interested in this project! How can I help? BTW, since I do not have much experience with either monitoring or Kubernetes, I might need some guidance.
@adityasri also expressed interest in this, but it’s fine for multiple people to work on it, since the work parallelizes reasonably well. Specifically, checking that ircbot is working can be treated as a project completely separate from slackbridge monitoring.
It’s not necessary to worry about Kubernetes at this stage; for now we just want a Python program that performs these checks and prints the results to the console. Once that’s done, we can talk about integrating it with the Prometheus monitoring system and running it in Kubernetes.
I originally conceived this as a program that would send messages at some interval on Slack and check that they get received on IRC, and vice versa. However, I now realize this is problematic, because we would run into the Slack message limit: sending one message per minute adds up to ~43k messages per month (60 × 24 × 30 = 43,200), while all of our real traffic is only about 30k messages per month.
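For reference, the budget arithmetic above as a tiny Python helper. It assumes a 30-day month and counts only the probe messages themselves, not any mirrored copies; the function name is just illustrative.

```python
# Assumption: a 30-day month; only the probe messages themselves are counted.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def probe_messages_per_month(interval_seconds: float) -> int:
    """Number of messages a probe posting once every interval_seconds adds per month."""
    return int(SECONDS_PER_MONTH // interval_seconds)


# One message per minute already exceeds the ~30k real messages we send per month.
print(probe_messages_per_month(60))    # 43200
print(probe_messages_per_month(3600))  # 720
```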
Instead, the best approach is probably to check that real conversation messages are mirrored, rather than using a test channel with test messages. We don’t have to check message content; we only have to check that messages from slackbridge (whose IRC nicks end with -slack) correspond to messages sent from Slack, and vice versa. A good start would be a proof-of-concept program that performs this check by connecting to both the IRC server and the Slack API.
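A minimal sketch of the matching logic, assuming the IRC and Slack clients have already collected the messages of a single mirrored channel as (sender, timestamp) pairs, timestamped with the monitor’s own clock when each message was seen. The posted_by_bridge flag on Slack messages is hypothetical: how to recognize slackbridge’s own posts on the Slack side (for example, by its bot user) still needs to be worked out. Since content isn’t compared, each message only needs some mirrored message on the other side that arrives within max_delay seconds.

```python
from typing import Dict, List, Tuple

BRIDGE_SUFFIX = "-slack"  # slackbridge's IRC nicks end with this


def unmatched(sent: List[float], mirrored: List[float], max_delay: float) -> List[float]:
    """Return the timestamps in sent that have no mirrored message within max_delay seconds.

    Greedy matching in time order: each mirrored message can account for at most
    one sent message, and it must arrive no earlier than the message it mirrors.
    """
    mirrored = sorted(mirrored)
    missing: List[float] = []
    j = 0
    for t in sorted(sent):
        while j < len(mirrored) and mirrored[j] < t:
            j += 1  # too early to be a copy of this (or any later) message
        if j < len(mirrored) and mirrored[j] - t <= max_delay:
            j += 1  # consume the mirrored copy
        else:
            missing.append(t)
    return missing


def check_bridge(irc: List[Tuple[str, float]], slack: List[Tuple[bool, float]],
                 max_delay: float = 30.0) -> Dict[str, List[float]]:
    """irc: (nick, ts) pairs; slack: (posted_by_bridge, ts) pairs (hypothetical flag)."""
    irc_relayed = [ts for nick, ts in irc if nick.endswith(BRIDGE_SUFFIX)]
    irc_native = [ts for nick, ts in irc if not nick.endswith(BRIDGE_SUFFIX)]
    slack_relayed = [ts for by_bridge, ts in slack if by_bridge]
    slack_native = [ts for by_bridge, ts in slack if not by_bridge]
    return {
        # Slack messages that never showed up on IRC under a -slack nick
        "slack_to_irc_missing": unmatched(slack_native, irc_relayed, max_delay),
        # IRC messages that slackbridge never posted to Slack
        "irc_to_slack_missing": unmatched(irc_native, slack_relayed, max_delay),
    }


# Example: alice speaks on IRC, bob on Slack; both get mirrored.
print(check_bridge([("alice", 0.0), ("bob-slack", 2.0)], [(True, 1.0), (False, 1.5)]))
# {'slack_to_irc_missing': [], 'irc_to_slack_missing': []}
```

Messages seen less than max_delay seconds before a check runs should be left out of it, since their mirrored copies may simply not have arrived yet. Greedy matching in time order is enough here because every message gets the same window length.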
For testing ircbot, we can send as many test messages as we want, as long as the testing channel isn’t mirrored to Slack. This could actually be a Kubernetes readiness probe that connects to IRC, joins the test channel, sends a test command (create: ping), and checks that create responds. For now, we don’t need to worry about the Kubernetes part; we should just write a script that performs this check.
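A minimal sketch of the ping check, using only Python’s standard library and raw IRC protocol lines. The host, port, nick, and channel are placeholders, and since we don’t pin down exactly what create replies to a ping, the sketch treats any message from create in the test channel after the ping as success. It joins the channel after the server’s welcome (numeric 001) and sends the ping once the join completes (numeric 366, end of NAMES).

```python
import socket
import ssl
import time
from typing import List, Optional, Tuple


def parse_irc_line(line: str) -> Tuple[Optional[str], str, List[str]]:
    """Split a raw IRC line into (source nick or None, command, params)."""
    nick = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
        nick = prefix.split("!", 1)[0]
    if " :" in line:
        head, _, trailing = line.partition(" :")
        params = head.split() + [trailing]
    else:
        params = line.split()
    command = params.pop(0) if params else ""
    return nick, command, params


def check_create_ping(host: str, port: int, nick: str, channel: str,
                      bot_nick: str = "create", timeout: float = 30.0,
                      use_tls: bool = True) -> bool:
    """Return True if bot_nick says anything in channel after "<bot_nick>: ping"."""
    deadline = time.monotonic() + timeout
    sock = None

    def send(line: str) -> None:
        sock.sendall((line + "\r\n").encode("utf-8"))

    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        if use_tls:
            sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
        send(f"NICK {nick}")
        send(f"USER {nick} 0 * :{nick}")
        buf, pinged = b"", False
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            data = sock.recv(4096)
            if not data:
                return False  # server closed the connection
            buf += data
            *lines, buf = buf.split(b"\n")  # keep any partial line for the next recv
            for raw in lines:
                src, cmd, params = parse_irc_line(raw.rstrip(b"\r").decode("utf-8", "replace"))
                if cmd == "PING":  # keepalive: must answer or the server drops us
                    send("PONG :" + (params[0] if params else ""))
                elif cmd == "001":  # RPL_WELCOME: registration done, safe to join
                    send(f"JOIN {channel}")
                elif cmd == "366" and not pinged:  # RPL_ENDOFNAMES: join finished
                    send(f"PRIVMSG {channel} :{bot_nick}: ping")
                    pinged = True
                elif (cmd == "PRIVMSG" and pinged and src
                      and src.lower() == bot_nick.lower()
                      and params and params[0].lower() == channel.lower()):
                    return True
        return False  # timed out without a reply
    except OSError:  # connection, TLS, and socket timeout errors
        return False
    finally:
        if sock is not None:
            try:
                send("QUIT :probe finished")
            except OSError:
                pass
            sock.close()
```

For example, check_create_ping("irc.example.org", 6697, "create-probe", "#test") returns True or False; those arguments are hypothetical stand-ins for the real server and test channel. Returning a plain boolean should make it easy to wrap later as a readiness probe or a Prometheus gauge.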