Consul 1.0, CockroachDB 1.1, the KRACK attack and a good practice during incidents are this episode’s highlights.
Since this is the first episode we are going to start small and cover just a little bit of news and an almost technical topic. The big news of the week is the KRACK attack on WPA2. You can find plenty of information about it on krackattacks.com. It’s a pretty scary one, so keep an eye on the updates for your OSes and get your devices updated as soon as you can. The scariest part is that the issue comes from the standard itself, so most (if not all) OSes are impacted!
Read more: KRACK Attacks: Breaking WPA2
In better news, HashiCorp just released Consul 1.0! Consul is a great piece of software if you ask me, and it is not as well known or as recognised as it should be. As usual the changelog is fairly big and includes a couple of breaking changes, so have a look.
Consul aims to be a key-value store and a service registry with both a DNS interface and an HTTP API. For those who don’t know it, it is also resilient to the failure of one or more servers (depending on your setup) and supports multiple datacenters. While I do think it’s great, I also think it does a bit too many things: I was trained by old-timers of the Unix world, so “one program to do one thing” is a bit of a motto for me. That being said, I have used Consul in some setups as the source of truth for configuration data and as a service registry, and it works great as a “kind of central” exchange for such crucial data.
If you haven’t worked with HashiCorp products yet you should have a look at Consul, Vault, Packer and Terraform: they all have open source versions available and the documentation is pretty neat. The release of 1.0 for both Consul and Packer is great news indeed: it means breaking changes in the API and commands should not happen as often as they have in the past.
Read more: HashiCorp Consul 1.0
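To give an idea of what using Consul as a key-value store and service registry looks like over its HTTP API, here is a minimal sketch in Python. It only builds the requests and decodes a KV response, so it can be read without a cluster at hand. The agent address, the key `config/app/db_host`, the value and the service name `web` are all assumptions made up for the illustration.

```python
import base64
import json
import urllib.request

# Assumption: a Consul agent listening locally on the default HTTP port.
CONSUL_ADDR = "http://127.0.0.1:8500"


def kv_put_request(key, value, addr=CONSUL_ADDR):
    """Build the PUT request that stores `value` under `key` in the KV store."""
    return urllib.request.Request(
        f"{addr}/v1/kv/{key}", data=value.encode("utf-8"), method="PUT"
    )


def register_service_request(name, port, addr=CONSUL_ADDR):
    """Build the PUT request that registers a service with the local agent."""
    payload = json.dumps({"Name": name, "Port": port}).encode("utf-8")
    return urllib.request.Request(
        f"{addr}/v1/agent/service/register",
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def decode_kv_response(body):
    """Decode the JSON body of GET /v1/kv/<key>; values are base64-encoded."""
    entries = json.loads(body)
    return {
        entry["Key"]: base64.b64decode(entry["Value"] or "").decode("utf-8")
        for entry in entries
    }


# To actually send a request (requires a reachable agent):
#   urllib.request.urlopen(kv_put_request("config/app/db_host", "db1.internal"))
```

Once a service is registered, it should also be resolvable through the DNS interface under a name of the form `web.service.consul`, which is what makes Consul convenient as a shared registry.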
Another piece of software has reached a big milestone: CockroachDB 1.1. That database has been showing up on my radar a lot lately and I am now eager to test it properly with some of our projects. It sounds great on paper, and the release post cites a couple of big use cases that support the company’s claims about its product. CockroachDB can be used directly with PostgreSQL client libraries, so it should be quite easy to test with existing applications without changing much in your setup.
Read more: Cockroach Labs
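Since CockroachDB can be used with PostgreSQL client libraries, trying it from an existing application mostly comes down to pointing the driver at a different address. The sketch below assumes a local, insecure test node on port 26257, a hypothetical database named `bank`, and the third-party `psycopg2` driver; none of these come from the release post.

```python
from urllib.parse import quote, urlencode


def cockroach_dsn(user, host="localhost", port=26257, database="bank", sslmode="disable"):
    """Build a PostgreSQL-style connection URL pointing at a CockroachDB node.

    Assumptions: `bank` is a made-up database name, and sslmode=disable only
    makes sense for a local node started in insecure mode.
    """
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{quote(user)}@{host}:{port}/{quote(database)}?{query}"


def fetch_version(dsn):
    """Connect with a regular PostgreSQL driver and ask the server for its version."""
    import psycopg2  # third-party driver, imported lazily so the helper above works without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()


# Requires a running node and psycopg2 installed:
#   print(fetch_version(cockroach_dsn("root")))
```

The point of the design is that only the connection URL changes; the rest is the same driver code you would write against PostgreSQL, although compatibility of specific SQL features would still need checking per application.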
Also in the news: a pair of posts on DNSimple’s blog. One is about the introduction of the new CAA records and the other is about how they used Hanami to power their new API within a Rails app. As usual the writing from Anthony Eden and Luca Guidi is of great quality, so go check both posts out (and the archives too). Speaking of DNSimple: if you haven’t already, go check out How DNS works (https://howdns.works). It’s a good introduction to how DNS works and will certainly clarify many things about the topic.
- Introducing CAA records - DNSimple Blog
- Why we ended up not using Rails for our new JSON API - DNSimple Blog
Let’s now talk about a good practice: communication during production incidents.
At Imfiny we help our customers get a better infrastructure to host their (usually Ruby) web applications and API backends in AWS. Yet we don’t stop at “racking up” in the cloud: we also help improve their technical culture by showing them ways to communicate better, tweaks to their tooling, and so on. This week I was reminded of a common issue during incidents. Here “incident” means “the app is down” or “the customer experience is seriously degraded”. I am not going to cover every aspect of incident response here, just a key part of the communication side.
When something goes wrong your engineering team needs to be aware of it. Whatever medium makes them aware of the issue is fine at first (ideally they already know through their different layers of monitoring). What I want to cover here is what happens next within the team and the company.
It’s really important that the team responsible for solving the issue gathers in one place (a Slack or IRC channel, for example) and keeps its communication there. Anything discussed in other channels is going to be lost, so if that happens be sure to feed the conclusion or the minutes (a short summary of what was discussed) of that side chat back into the main channel.
The main point is that concentrating the communication of the people solving the issue in one place, and organising efforts explicitly, avoids problems such as:
- doing the same thing twice
- doing things that conflict with each other
- etc.
Failing to discuss what is being done and seen in that channel, with clear links to charts, log excerpts and so on, can also be quite dangerous: anyone called in later to help will lack context.
The Google SRE book covers some examples of incident response quite well, along with how a team can be organised to handle them. Have a look at it.
Finishing up with a bit of What The Frack …
Scientists have found that, in ways and for reasons that are still unknown, large parts of DNA are shared between some quite distantly related species. The main example in the article is cows and reptiles, but it also applies to other species. The article cites a couple of possible explanations given by the scientists and the current state of science on the topic, but nothing is really certain at this point.
That’s it for this time. You will find the links in the episode description so you can dig into those stories. Have a good read and have fun.