
Commit 41a5ca5

Merge pull request creativecommons#220 from creativecommons/nvmee-on-debian-on-aws
new blog post: NVMEe on Debian on AWS blog
2 parents 7e82ae0 + 3dad026 commit 41a5ca5

1 file changed

+120
-0
lines changed
  • content/blog/entries/2020-04-03-nvmee-on-debian-on-aws

@@ -0,0 +1,120 @@
1+
title: NVMe on Debian on AWS
2+
---
3+
categories:
4+
open-source
5+
SaltStack
6+
---
7+
author: TimidRobot
8+
---
9+
pub_date: 2020-04-03
10+
---
11+
body:
12+
13+
14+
## Problem
15+
16+
The current Creative Commons infrastructure buildouts use Debian GNU/Linux AWS
17+
EC2 instances with EBS volumes. Depending on chance (or race conditions), the
18+
mapping of block devices can be different from one host to another or between
19+
reboots.
20+
21+
> *Occasionally, devices can respond to discovery in a different order in
22+
> subsequent instance starts, which causes the device name to change.*
23+
> ([Amazon EBS and NVMe on Linux Instances - Amazon Elastic Compute
24+
Cloud][nvme-ebs])
25+
26+
27+
## Our Solution
28+
29+
Modern Amazon Linux AMIs resolve this by providing a `udev` rule, but Debian
GNU/Linux does not yet do this. To ensure our systems are configured correctly,
at Creative Commons we use the device specified during provisioning (ex.
`/dev/xvdf`) to identify the correct NVMe device. We then format it with a
label that can be used for mounting during subsequent reboots.
34+
35+
Thankfully, AWS documents where to find the device specified during provisioning (ex. `/dev/xvdf`):
36+
> *For Nitro-based instances, the block device mappings that are specified in
37+
> the Amazon EC2 console when you are attaching an EBS volume or during
38+
> AttachVolume or RunInstances API calls are captured in the vendor-specific
39+
> data field of the NVMe controller identification.*
40+
([Amazon EBS and NVMe on Linux Instances - Amazon Elastic Compute
41+
Cloud][nvme-ebs])
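
As a minimal sketch of reading that field directly: in the NVMe Identify
Controller data, the vendor-specific region begins at byte 3072, and the
requested device name is assumed here to occupy its first 32 bytes, padded with
spaces. The function name `ebs_device_name` is hypothetical and is not part of
`sre-salt-prime`; it relies on the `--raw-binary` option of `nvme id-ctrl` and
on `head -c` from GNU coreutils.

```shell
# ebs_device_name NVME_DEVICE
# Hypothetical helper: print the block device name that was requested at
# provisioning time, as stored in the NVMe vendor-specific data.
# Assumptions: the vendor-specific region starts at byte 3072 of the raw
# Identify Controller data, and the name fills (at most) its first 32 bytes,
# padded with spaces.
# Pipeline: dump the raw data, skip the first 3072 bytes, keep the next 32,
# then strip padding spaces and NUL bytes.
ebs_device_name() {
    nvme id-ctrl --raw-binary "$1" |
        tail -c +3073 |
        head -c 32 |
        tr -d ' \000'
}
```

For example, `ebs_device_name /dev/nvme1n1` might print `xvdf` or `/dev/xvdf`,
depending on how the device was specified during provisioning.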
42+
43+
We use SaltStack ([`creativecommons/sre-salt-prime`][saltprime]) to:
44+
1. Install the `nvme-cli` package
2. Use the `nvme` command to detect which `/dev/nvme?n?` contains *spec* (ex.
   `xvdf`) in the NVMe vendor-specific data
3. Create a symlink (ex. `/dev/xvdf -> /dev/nvme1n1`) so that SaltStack can use
   `/dev/xvdf` for the initial setup
4. Perform the initial setup
5. Delete the symlink since:
    1. The initial setup formatted the volume with a label that is used to mount
       the filesystem
    2. There is no guarantee the symlink will be accurate on subsequent reboots
       and it might cause confusion
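
Steps 3 through 5 above can be sketched in plain shell. This is only an
approximation of what the Salt states do: the `setup_volume` and `run` helpers,
the ext4 filesystem, and the example label and mount point are assumptions for
illustration, not taken from `sre-salt-prime`. Setting `DRY_RUN=1` prints each
command instead of running it.

```shell
# run COMMAND [ARGS...]
# Hypothetical helper: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# setup_volume SPEC NVME_DEVICE LABEL MOUNTPOINT
# Hypothetical helper covering steps 3 through 5. It assumes the matching
# NVMe device (step 2) has already been identified.
setup_volume() {
    spec=$1
    nvme_dev=$2
    label=$3
    mountpoint=$4

    # Step 3: temporary symlink (ex. /dev/xvdf -> /dev/nvme1n1)
    run ln -s "$nvme_dev" "$spec"
    # Step 4: initial setup; ext4 is an assumption, the label is what matters
    run mkfs.ext4 -L "$label" "$spec"
    run mkdir -p "$mountpoint"
    # Mount by label so the NVMe device name no longer matters
    run mount "LABEL=$label" "$mountpoint"
    # Step 5: remove the symlink, which may be stale after a reboot
    run rm "$spec"
}
```

A dry run such as `DRY_RUN=1; setup_volume /dev/xvdf /dev/nvme1n1 data /srv`
lists the five commands without touching any device. Mounting by label is the
design choice that makes the symlink disposable.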
55+
56+
The [`states/mount/init.sls`][mountstate] state includes a complex shell
57+
command (with Jinja2 variables) that loops through the NVMe devices and finds
58+
the correct one:
59+
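```shell
# Loop over every NVMe namespace device (ex. /dev/nvme1n1)
for n in /dev/nvme?n?
do
    # The vendor-specific data (-v) holds the device name given at provisioning
    if nvme id-ctrl -v "${n}" | grep -q '^0000:.*{{ spec_short }}'
    then
        # Temporary symlink (ex. /dev/xvdf -> /dev/nvme1n1) for the initial setup
        ln -s "${n}" {{ spec_long }}
    fi
done
```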
68+
Example variable values:
69+
70+
| Jinja2 Variable     | Example Value |
| ------------------- | ------------- |
| `{{ spec_short }}`  | `xvdf`        |
| `{{ spec_long }}`   | `/dev/xvdf`   |
74+
75+
[nvme-ebs]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html "Amazon EBS and NVMe on Linux Instances - Amazon Elastic Compute Cloud"
76+
[saltprime]: https://github.com/creativecommons/sre-salt-prime "creativecommons/sre-salt-prime: Site Reliability Engineering / DevOps SaltStack configuration files"
77+
[mountstate]: https://github.com/creativecommons/sre-salt-prime/blob/master/states/mount/init.sls
78+
79+
80+
### Related Links
81+
82+
- [Cloud/AmazonEC2Image/Buster - Debian Wiki][busterec2]
- [`nvme-cli` package details in Debian buster][nvme-cli]
- Debian buster — Debian Manpages
  - [nvme(1) — nvme-cli][man-nvme]
  - [nvme-id-ctrl(1) — nvme-cli][man-nvme-cli]
87+
88+
[busterec2]: https://wiki.debian.org/Cloud/AmazonEC2Image/Buster "Cloud/AmazonEC2Image/Buster - Debian Wiki"
89+
[nvme-cli]: https://packages.debian.org/buster/nvme-cli "Debian -- Details of package nvme-cli in buster"
90+
[man-nvme]: https://manpages.debian.org/buster/nvme-cli/nvme.1.en.html
91+
[man-nvme-cli]: https://manpages.debian.org/buster/nvme-cli/nvme-id-ctrl.1.en.html
92+
93+
94+
## Other Solutions
95+
96+
While researching this blog post, I found other solutions to the same problem.
They're all good, but I appreciate the simplicity of a temporary symlink for
setup versus maintaining custom udev rules (maybe I can help contribute a
udev-based solution to Debian or Debian's EC2 image). I can also easily imagine
a more complex solution being a better fit if/when our infrastructure
provisioning becomes more complex.
102+
103+
- [oogali/ebs-automatic-nvme-mapping][ebs-automatic]: Automatic mapping of EBS volumes via NVMe block devices to standard block device paths
  - udev rule that invokes a Bash script to create symlinks
- CoreOS
  - udev rule that invokes a Bash script to create symlinks
  - [`udev/rules.d/90-cloud-storage.rules`][coreos-udev]
  - [`udev/bin/cloud_aws_ebs_nvme_id`][coreos-bin]
- [AWS EBS NVMe udev rules][awsudevcopy]
  - udev rule that invokes a Python script to create symlinks
  - this is a copy, as Amazon only provides access to the source of Amazon Linux
    from within an Amazon Linux AMI: *The yumdownloader --source command line
    tool provided in the Amazon Linux AMI enables viewing of source code inside
    of an Amazon EC2.* ([Amazon Linux AMI FAQs][awslinuxfaq])
115+
116+
[ebs-automatic]: https://github.com/oogali/ebs-automatic-nvme-mapping "oogali/ebs-automatic-nvme-mapping: Automatic mapping of EBS volumes via NVMe block devices to standard block device paths"
117+
[coreos-udev]: https://github.com/coreos/init/blob/master/udev/rules.d/90-cloud-storage.rules
118+
[coreos-bin]: https://github.com/coreos/init/blob/master/udev/bin/cloud_aws_ebs_nvme_id
119+
[awsudevcopy]: https://gist.github.com/jalaziz/c22c8464cb602bc2b8d0a339b013a9c4
120+
[awslinuxfaq]: https://aws.amazon.com/amazon-linux-ami/faqs/ "Amazon Linux AMI FAQs"
