Service Level Agreements (SLAs) are important for service providers of all kinds, including data protection services. Whether the SLA is part of a legal contract between companies, an internal agreement between departments, or part of an overall service management model such as DPSM, it gives both the service provider and the consumer a clear agreement on the expected level of service.
The process starts with negotiation of the SLA, followed by monitoring and enforcement.
When negotiating the desired level of service, make sure it reflects the way backups are actually performed. For example, a typical backup strategy combines incremental and full backups on different but related schedules, so the SLA should distinguish between the two when specifying backup frequency.
Similarly, since running backups during business hours can interfere with normal operations, you may want to agree on specific times during which backups are run. For example, incremental backups might run on weekdays outside business hours and full backups on weekends. The SLA should capture that detail so that if backups are run at the wrong time, it’s clear that the agreement has not been satisfied.
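To make schedule terms like these objectively checkable, the agreed windows can be expressed as a small rule. The following is a minimal Python sketch assuming a hypothetical policy: business hours of 08:00 to 18:00, Monday to Friday, incremental backups on weekdays outside those hours, and full backups only on Saturday or Sunday. The function name `in_agreed_window` and the backup type strings are illustrative, not part of any product.

```python
from datetime import datetime, time

# Hypothetical policy values: business hours are 08:00-18:00, Monday to Friday.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def in_agreed_window(backup_type: str, started_at: datetime) -> bool:
    """Return True if a backup of this type started inside its agreed window."""
    is_weekend = started_at.weekday() >= 5  # Saturday=5, Sunday=6
    if backup_type == "full":
        # Full backups are only allowed on weekends.
        return is_weekend
    if backup_type == "incremental":
        # Incremental backups run on weekdays, outside business hours.
        in_business_hours = BUSINESS_START <= started_at.time() < BUSINESS_END
        return not is_weekend and not in_business_hours
    raise ValueError(f"unknown backup type: {backup_type!r}")
```

This checks only the start time; whether a job must also finish inside the window is another detail worth settling during negotiation.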
If a backup fails, can you still pass the SLA by running another backup job within the backup window? Or conversely, if a backup succeeds and then another one is run during the same window and fails, is the earlier backup sufficient or has the SLA now failed? Clarifying these details in advance can avoid disputes later.
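These questions amount to choosing a rule for judging a backup window that contains more than one attempt. As a minimal sketch, assuming two hypothetical rules named `ANY_SUCCESS` and `LAST_ATTEMPT`, the following shows how the same sequence of attempts can pass under one rule and fail under the other:

```python
from enum import Enum


class WindowRule(Enum):
    # Hypothetical rule names; the real choice is whatever the parties negotiate.
    ANY_SUCCESS = "any_success"    # one successful attempt in the window is enough
    LAST_ATTEMPT = "last_attempt"  # only the most recent attempt in the window counts


def window_satisfied(attempts: list[bool], rule: WindowRule) -> bool:
    """attempts: outcomes (True = success) of backup jobs in one window, in run order."""
    if not attempts:
        return False  # nothing ran in the window
    if rule is WindowRule.ANY_SUCCESS:
        return any(attempts)
    if rule is WindowRule.LAST_ATTEMPT:
        return attempts[-1]
    raise ValueError(f"unsupported rule: {rule}")
```

For example, a success followed by a failure, `[True, False]`, satisfies `ANY_SUCCESS` but not `LAST_ATTEMPT`, which is exactly the kind of ambiguity worth resolving in the agreement.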
Once the SLA has been agreed, it needs to be monitored. While it’s useful to know when an SLA has failed, proactive monitoring can identify SLAs that are in jeopardy of not being satisfied before it’s too late. Both the service provider and the consumer should be able to see SLA status. You’ll obviously want to monitor the percentage of satisfied SLAs, but the unsatisfied SLAs actually fall into two categories: those that have failed and those that have not yet been satisfied. That might seem like mincing words, but it is an important distinction.
When a backup is run and fails, the SLA for that backup has clearly failed (although it might still be possible to pass by re-running the backup job). But what if a backup hasn’t run yet? The SLA hasn’t actually failed until the end of the requirement period. Still, the farther along you are in that period (e.g., the closer to the end of the month), the more likely it becomes that the backup will not run and the SLA will fail. Monitoring the SLAs that have not yet been attempted therefore helps you anticipate and avoid SLA failures.
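The distinction between failed and not-yet-satisfied SLAs can be made explicit in monitoring. This minimal sketch assumes one required backup per requirement period and a hypothetical rule that flags an unattempted SLA as at risk once 75% of the period has elapsed; the status names and the `risk_fraction` parameter are illustrative assumptions.

```python
from datetime import datetime
from typing import Optional


def sla_status(
    last_result: Optional[bool],  # None = not yet attempted, True = success, False = failure
    period_start: datetime,
    period_end: datetime,
    now: datetime,
    risk_fraction: float = 0.75,  # hypothetical at-risk threshold
) -> str:
    """Classify one SLA as 'satisfied', 'failed', 'pending', or 'at_risk'."""
    if last_result is True:
        return "satisfied"
    if last_result is False:
        # The attempt failed; a successful re-run would change last_result.
        return "failed"
    # Not yet attempted: this only becomes a failure once the period is over.
    if now >= period_end:
        return "failed"
    # Fraction of the requirement period already elapsed (timedelta / timedelta -> float).
    elapsed = (now - period_start) / (period_end - period_start)
    return "at_risk" if elapsed >= risk_fraction else "pending"
```

Reporting `pending` and `at_risk` separately from `failed` lets both parties see which SLAs still have time to recover.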
Finally, SLAs are enforced by imposing a consequence of some kind when the SLA is not satisfied. Consequences can range from financial penalties to service termination if the SLA is consistently missed. A best practice is to define the SLA so that it can be measured objectively and to set specific thresholds for each consequence. But remember that the goal of an SLA is not to impose penalties, but to make sure that both parties understand the expected quality of the service provided.
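When the SLA is measured objectively, applying the agreed thresholds becomes mechanical. As a minimal sketch, the following computes a compliance percentage for a period and maps it to a consequence tier; the thresholds and consequences in `TIERS` are hypothetical placeholders, not recommended values.

```python
# Hypothetical tiers, checked from the highest threshold down:
# (minimum compliance percentage, consequence)
TIERS = [
    (99.0, "none"),
    (95.0, "service credit: 5% of monthly fee"),
    (90.0, "service credit: 10% of monthly fee"),
    (0.0, "escalation: review for termination"),
]


def compliance_pct(satisfied: int, total: int) -> float:
    """Percentage of SLAs satisfied in the measurement period."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * satisfied / total


def consequence(pct: float) -> str:
    """Return the consequence for the first tier whose threshold pct meets."""
    for threshold, outcome in TIERS:
        if pct >= threshold:
            return outcome
    raise ValueError(f"compliance out of range: {pct}")
```

Because the tiers are checked from the highest threshold down, each compliance figure maps to exactly one consequence. A clause such as termination only after several consecutive missed periods would need the history of past results, not just the current percentage.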
by Bruce Leban, VP/Engineering, Bocada