edX is a massive open online course (MOOC) provider based out of Boston, MA, and it was where I interned this summer as I started my career as a software engineer. I opted to join the Site Reliability Engineering team and, suffice it to say, I learned A LOT during my three months with them.

The interest in DevOps and SRE

Back when I was still at Flatiron School, I had a chat with one of my coaches about career aspirations. I mused that my time in the game industry and at the NYC DOE had helped me appreciate QA and efficient organizational structure. If an organization can work out the kinks in its products or processes proactively, quickly, and efficiently, that ultimately leads to better user experiences and a better product overall. I noted that if I could somehow combine my coding skills with this interest in structure, that’d be something to look forward to. So my coach planted the seed in my mind to explore DevOps sometime in the future.

Fortunately, a few months later, the opportunity to do just that presented itself with an internship at edX. I had a choice between being a front-end developer or a site reliability engineer (SRE), so I figured, why not try SRE?

The Google Cloud blog has a nice post explaining the relationship between SRE and DevOps. As the post notes, “SRE prescribes how to succeed in various DevOps areas.”

My assignment for the summer

My task for the summer was the instrumentation of GoCD, a continuous integration/deployment (CI/CD) service. That meant building out monitors of specific metrics from the GoCD server and alerting on them whenever certain thresholds were crossed.
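To make "alerting whenever certain thresholds were crossed" a bit more concrete, here is a minimal sketch of the idea in Python. It is not the code I wrote at edX; the function name, the sample queue depths, and the threshold are all made up for illustration. The idea is that an alarm fires only when the last few readings of a metric all sit above the threshold, which is roughly how evaluation periods work in alerting tools.

```python
def should_alarm(datapoints, threshold, evaluation_periods):
    """Return True when the most recent `evaluation_periods` datapoints
    are all strictly above `threshold`.

    `datapoints` is a list of readings ordered oldest to newest.
    All names here are hypothetical, chosen for illustration.
    """
    if evaluation_periods <= 0 or len(datapoints) < evaluation_periods:
        # Not enough data to make a decision, so stay quiet.
        return False
    recent = datapoints[-evaluation_periods:]
    return all(value > threshold for value in recent)


# Hypothetical readings: number of jobs waiting in the queue, one per minute.
queue_depth = [2, 12, 15, 11]
print(should_alarm(queue_depth, threshold=10, evaluation_periods=3))
```

Requiring several consecutive breaches, rather than a single one, keeps a brief spike from paging anyone.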

Some of the major technologies I used over the summer included Python, Kubernetes, CloudWatch, Terraform, and New Relic.

Most of the work involved writing Python scripts that ran as Kubernetes CronJobs and pushed the collected metrics to CloudWatch, where alarms were set up through Terraform. You can see some of the work below:

Agent Status metrics
Alarming on job queues through Terraform and CloudWatch
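For a rough idea of what one of these scripts does, here is a minimal sketch in Python, written for this post rather than copied from the edX code. It assumes the GoCD agents API returns its agents under `_embedded.agents`, each with an `agent_state` field. The namespace, metric name, and dimension name are hypothetical. The CloudWatch client is passed in (in practice a boto3 CloudWatch client), so the sketch itself does not need AWS credentials.

```python
from collections import Counter

# Hypothetical CloudWatch namespace, chosen for illustration.
NAMESPACE = "GoCD/Agents"


def count_agent_states(agents_response):
    """Count agents per state from a parsed GoCD agents API response.

    Assumes agents live under `_embedded.agents` and carry `agent_state`.
    """
    agents = agents_response.get("_embedded", {}).get("agents", [])
    return Counter(agent.get("agent_state", "Unknown") for agent in agents)


def build_metric_data(state_counts):
    """Turn the per-state counts into CloudWatch MetricData entries."""
    return [
        {
            "MetricName": "AgentCount",  # hypothetical metric name
            "Dimensions": [{"Name": "State", "Value": state}],
            "Value": float(count),
            "Unit": "Count",
        }
        for state, count in sorted(state_counts.items())
    ]


def publish(cloudwatch_client, metric_data):
    """Push the datapoints using the client's put_metric_data call."""
    cloudwatch_client.put_metric_data(Namespace=NAMESPACE, MetricData=metric_data)
```

In a CronJob, the script would fetch the agents JSON from the GoCD server, then call something like `publish(boto3.client("cloudwatch"), build_metric_data(count_agent_states(response)))` on every run. Passing the client in also makes the script easy to test with a stand-in object.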

In addition to the work with these scripts, I also set up New Relic’s Containerized Private Minion to monitor server uptime.

Ping monitor through New Relic’s CPM
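The monitor itself is configured in New Relic rather than written by hand, but the check behind a ping-style monitor is simple enough to sketch in Python. This is only an illustration of the concept, not how the CPM is implemented. The function name, the default timeout, and the injectable `opener` parameter are my own assumptions.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` gets a 2xx or 3xx response.

    `opener` defaults to urllib's urlopen and can be swapped out in tests.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections,
        # and timeouts: anything that means the server did not answer well.
        return False
```

A scheduler would call `is_up` against the server's URL every minute or so and raise an alert after a few consecutive failures.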

Takeaways

2020 has been a weird year, to say the least. I graduated from boot camp, still managed to find some work, learned how to work remotely, and found success in the work that I was doing. I didn’t know what to expect with working from home being the norm, but it wasn’t as bad as I thought it would be. It just took some adjusting.

I’m really glad I got the exposure to some of the work that’s done on the operations side of things and hope to continue to build on that so that I can ultimately be a more well-rounded developer.

Published by Nicholas Moy

Software developer residing in New York.