"""
Abstract: "#monitoringsucks."; I can appreciate the sentiment. I used to use Nagios too! However, I can't agree that monitoring sucks. Monitoring is awesome! We observe our systems to understand their behaviour. We do this in various ways like reading logs or taking measurements and, more recently, storing them in a timeseries database such as collectd or graphite. However, the standard practice for alerting is still to check the measurement at the time that it is taken and it is this "check script" model of monitoring that is long due for an overhaul. In this talk, I'll start over from first principles: what do we want monitoring to do for us? I'll deconstruct the "check script" and rebase its essentials on the humble timeseries. I'll demonstrate simple aggregation and apply some maths and stats to show how monitoring can scale to cluster size without increasing maintenance costs. With worked examples based on real-world situations, you'll learn techniques that you can use to make your monitoring systems more useful.
"""
What he says at 10:13 (https://youtu.be/eq4CnIzw-pE?t=613) about filesystem alerts is particularly interesting.
Without going as far as doing maths/stats the way the speaker does with R, computing derivatives, you can get there with monitoring tools like Riemann: http://riemann.io/api/riemann.streams.html
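
To make the derivative idea concrete, here is a minimal sketch in plain Python (not R, and not Riemann streams). It assumes the point is to alert on how fast a filesystem fills up rather than on a fixed threshold; that reading is mine and is not taken from the talk. The function names, the `(timestamp, used_bytes)` sample format and the alert horizon are all made up for the illustration.

```python
def time_to_full(samples, capacity):
    """Estimate the seconds left before usage reaches capacity.

    samples: list of (timestamp_seconds, used_bytes), oldest first
             (hypothetical format, chosen for this sketch).
    Returns None when usage is flat or shrinking.
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    # Finite-difference approximation of the derivative, in bytes per second.
    rate = (u1 - u0) / (t1 - t0)
    if rate <= 0:
        return None
    return (capacity - u1) / rate


def should_alert(samples, capacity, horizon_seconds):
    """Alert only if the disk is projected to fill within the horizon."""
    eta = time_to_full(samples, capacity)
    return eta is not None and eta <= horizon_seconds
```

With two samples one hour apart, `should_alert([(0, 50), (3600, 60)], 100, 5 * 3600)` is true because the projected fill time is about four hours, while a disk sitting at a constant 95% never alerts. A real setup would probably fit the slope over more than two points to smooth out noise.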