Understanding the value of Uptime


Companies creating web applications spend most of their time and energy on building the product: identifying the needs, building storyboards, designing, developing, testing, marketing and pushing to production. Then the Operations team becomes responsible for the work the whole company puts together.

That’s a critical responsibility. Operations teams have to juggle stability, making sure that the product is always available, and agility, supporting product evolution (a balance described in Gartner’s concept of Bimodal IT). Yet no schools are training students for these responsibilities, and it is very challenging for the industry to find skilled professionals.

That’s why Holberton School has a system administration and DevOps track, training students to deploy, monitor, scale and be responsible for the uptime of the product that they built. In that setting, incident management becomes a mandatory tool. And that’s why we decided to partner with PagerDuty: to give students access to the best tool to support their uptime.

Students who use PagerDuty get fast incident escalation, which helps them resolve incidents faster. For every minute that their website is down, they lose points, which translates well to the real world, where the points are dollars. Holberton School is a full-stack school, where students work on all aspects of a web product: designing, coding, testing, shipping, and maintaining. Whether they end up working as full-stack software engineers or in a more specialized role, they will understand the implications of their work at every level of the stack and will make sure that:

  • It fits well with the other parts
  • It will be easy for co-workers to handle

A great incident management tool allows engineers to engage not only the Operations team but also other teams in maintaining product uptime and performance, by allowing flexible schedules and escalation policies. On-call developers should have better insight into application-related issues and act faster, which reduces the time to resolution and helps build a culture of trust and transparency. It also allows junior staff to gain knowledge quickly by exposing them to every level of the escalation path, at first by shadowing more experienced staff and then by putting them on the first line, where they can quickly escalate if they are uncomfortable with the issue.
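
To make the idea of an escalation path concrete, here is a minimal sketch in Python. It is not PagerDuty's API or configuration format; the `Level` class, the `responder_at` function, the responder names and the timeouts are all hypothetical, chosen only to illustrate how an unacknowledged incident could move from a junior first-line responder up to more experienced staff.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One step of a hypothetical escalation path."""
    responder: str
    timeout_minutes: int  # how long to wait for an acknowledgement before escalating


def responder_at(policy, minutes_unacknowledged):
    """Return who is being paged after an incident has gone
    unacknowledged for the given number of minutes."""
    elapsed = 0
    for level in policy:
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return level.responder
    # Past the final timeout, the last level stays paged.
    return policy[-1].responder


# Illustrative values only: a junior developer on the first line,
# backed by more experienced staff.
policy = [
    Level("junior on-call developer", 5),
    Level("experienced on-call engineer", 10),
    Level("operations lead", 15),
]

if __name__ == "__main__":
    for minutes in (0, 5, 20):
        print(minutes, "->", responder_at(policy, minutes))
```

In this sketch, the junior developer is paged first and the incident moves up automatically if it is not acknowledged in time, which is one way a tool could let junior staff take the first line while keeping experienced staff as a safety net.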

“Uptime is the number one goal of any SRE/DevOps/System administrator team,” said Casey Brown, manager, Site Reliability Engineering at LinkedIn. “Nowadays, well established companies like LinkedIn, Facebook and Google are also expecting developers to be fully responsible for their code in production. Having production in mind and being ready for it is something that every good developer must have, yet no school prepares students to that.”

By following the sysadmin/DevOps track at Holberton School and using PagerDuty, students will be able to deliver on agility, performance, and uptime during their time at the school and throughout their careers.