Infrastructure as code might be literally impossible part 2
infrastructure as code might be literally impossible
part 2
joe damato packagecloud.io
hi, i’m joe
i like computers
i once had a blog called timetobleed.com
@joedamato
packagecloud.io
@packagecloudio
follow along
blog.packagecloud.io
hi
disclaimer
infrastructure as code might be impossible because nothing works.
cognitive load
too much stuff
coping strategies
coping w cognitive load
copy & paste configs
stackoverflow
BTW
This is actually part of another talk I’m working on called
Programmers should get paid more & work less
anw
the problem is so pronounced, that in some cases it’s impossible to do seemingly simple tasks
some examples then some thoughts.
Today’s cool stories
1. SSL
2. APT
3. Linux Networking
4. Linux Threading (maybe)
5. Python packaging (maybe)
SSL
SSL is important
agreed?
Ubuntu & Debian
don’t agree
SSL doesn't work on Debian
/ Ubuntu
anw
LOL gnutls, who cares?
apt-get
git
curl
ngIRCd
…
well, actually you should use
OpenSSL
I like rabbits.
* 3. All advertising materials mentioning features or use of this
*    software must display the following…
* 6. Redistributions of any form whatsoever must retain the following…
OpenSSL says…
GPL says
6. ….You may not impose any further restrictions on the recipients' exercise of the rights granted herein.
These two licenses are not compatible.
in other words
software licenses force you to use a particular SSL library with a very painful bug.
greetings
(not sayin that OpenSSL is
bug free)
(but, am sayin NSS and gnutls have less mindshare)
btw
(hi)
OK but I don’t care about SSL,
I use GPG.
NO.
plz stop.
anw
APT
file compression is important
agreed?
Ubuntu & Debian
don’t agree
(more about hash sum mismatch
later)
in other words
APT bug when decompressing XZ files makes it impossible to install software reliably
this is unfortunate due to the slow release cycle of Debian/Ubuntu updates
“SO easy, that type of work can be done over the weekend”
-o Acquire::CompressionTypes::Order::=gz
… OK … hopefully that repo has gzip’d metadata or it’s gonna be a real short
trip
anw
hash sum mismatch
have you seen it?
do you know what it
means?
do you know why it
happens?
what it means
happens all the time…
how could that happen?
one of (at least) 3 ways
1. stale cache between client/server
2. XZ decompression bug
3. apt race condition
how to avoid each
1. better HTTP headers… or use SSL…. but like gnutls ?? lol
2. don’t generate XZ archives
3. ?????? race condition ??????
APT race
how it happens
1. Download + cache Release file
2. repo owner updates repo
3. Download Packages files
4. Compare checksums from the (stale) Release file against Packages file
5. hash sum mismatch
this means…
it is impossible to
1. update your repository without breaking clients
2. generate consistent mirrors of other repositories
!!!!!!this is bad!
!!!!!
but i’ve done all of these before and never had a
problem?
congrats you got lucky!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
so, wait, joe, are you saying that APT metadata is inherently racy?
yes!
and ubuntu agrees
OK so APT repos and the tools you use to generate them are fundamentally racy
so now what?
Acquire-by-hash
• Mechanism for downloading metadata by its hash sum
• Server should keep “a few” older copies of metadata around
• Prevents the race condition from happening
Acquire-by-hash
• Added in APT 1.2.0
• Ubuntu Xenial and newer
• Debian Stretch and newer
• not supported by reprepro
• not supported by aptly
only one way to get working, consistent, not
racy APT metadata
use packagecloud.io
Linux Networking
Full networking writeup
literally 90 pages
literally everything about linux networking
literally available here: http://bit.ly/linux-networking
summary
[random os] has a better/faster/leaner/whatever networking stack
than linux
lots and lots and lots of copy
paste coping
question
an answer
an other answer
yet an other answer
and on and on and on and on…
no one even knows what these
values mean
(p. much no one knows what these
values mean)
example
netdev_max_backlog
similarish explanations
what does it actually mean
tho?
if
• driver calls netif_receive_skb (likely)
• and RPS is disabled (default)
Then
it doesn't do anything.
literally nothing. it’s not even checked.
if
• driver calls netif_rx (unlikely)
• or RPS is enabled (rare for most ppl)
then data is queued to a backlog whose length is limited by netdev_max_backlog
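the two cases above boil down to one boolean. a small sketch, with made-up helper names: `backlog_limit_applies` encodes the rule from the slides, and `rps_enabled_from_mask` parses the comma-separated hex CPU mask you would read from `/sys/class/net/<dev>/queues/rx-<n>/rps_cpus` (an all-zero mask meaning RPS is off). whether your driver calls netif_rx is something you have to find out from the driver source; the code just takes it as an input.

```python
def rps_enabled_from_mask(rps_cpus: str) -> bool:
    """True if any CPU bit is set in an rps_cpus-style mask such as '00000000,0000000f'."""
    groups = rps_cpus.strip().split(",")
    return any(int(group, 16) != 0 for group in groups)


def backlog_limit_applies(driver_uses_netif_rx: bool, rps_enabled: bool) -> bool:
    """netdev_max_backlog only matters when packets actually go through the backlog queue."""
    return driver_uses_netif_rx or rps_enabled


if __name__ == "__main__":
    # Common case from the slides: netif_receive_skb driver, RPS left at its default.
    mask = "00000000,00000000\n"
    print(backlog_limit_applies(False, rps_enabled_from_mask(mask)))
```

so before copy-pasting a bigger netdev_max_backlog from an answer somewhere, it is worth checking whether the value is consulted at all on your machine.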
coping strategies abound
here’s a coping strategy i think
is fine
curl | sudo bash
you aren’t reading all of the chef/puppet
source so what’s the difference?
(hi, be mad)
too damn hard to understand how a computer works
on that note…
Linux Threading
“threads are slow”
“context switches are expensive/
slow/…”
a 7 year old bugfix for XFree86 broke threads on Linux
Story Time
TLS segment selectors XFree86 Modules
Story Time
mmap MAP_32BIT
June 29, 2001
“This adds a new mmap flag to force mappings into the low 32bit address space. Useful e.g. for XFree86's ELF loader or linuxthreads' thread local data structures.”
Nov 11, 2002
“532. Fixed module loader to map memory in the low 32bit address space on x86-64 (Egbert Eich).”
Story Time
ELF small code model 31bit mapping
Jan 4, 2003
“Make MAP_32BIT for 64bit processes only map in the first 31bit, because it is usually used to map small model code. This fixes the X server crashes. Some cleanups in this area.”
So: MAP_32BIT is actually MAP_31BIT
Mar 4, 2003
/* For Linux/x86-64 we have one extra requirement: the stack must be in the first 4GB. Otherwise the segment register base address is not wide enough. */
glibc
May 9, 2003
/* We prefer to have the stack allocated in the low 4GB since this allows faster context switches. */
glibc
justification for MAP_32BIT in glibc changed
Aug 13, 2008
“Pardo” report
https://lkml.org/lkml/2008/8/12/423
pardo filled the 31bit 1GB space with thread stacks.
subsequent allocations were doing a linear search for a free address on the kernel side.
MAP_STACK is added.
(it does nothing)
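to get a feel for how quickly the space from the Pardo report fills up, here is some back-of-the-envelope arithmetic. the assumptions are mine, not the talk's: the 1GB MAP_32BIT window is taken to run from 0x40000000 to 0x80000000, stacks are assumed to be 8 MiB each (a common default), and guard pages and other mappings in the window are ignored.

```python
# Assumed MAP_32BIT window: 1 GiB sitting inside the first 31 bits of address space.
MAP_32BIT_WINDOW_START = 0x40000000
MAP_32BIT_WINDOW_END = 0x80000000

MIB = 1024 * 1024


def window_size() -> int:
    """Size in bytes of the assumed MAP_32BIT window."""
    return MAP_32BIT_WINDOW_END - MAP_32BIT_WINDOW_START


def max_stacks(stack_size: int) -> int:
    """How many stacks of stack_size bytes fit before the window is full."""
    return window_size() // stack_size


if __name__ == "__main__":
    for size_mib in (1, 2, 8):
        print(f"{size_mib} MiB stacks: {max_stacks(size_mib * MIB)} threads")
```

under those assumptions a process with 8 MiB stacks runs out of low address space after 128 threads, which is not a lot of threads.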
time or w/e
June 29, 2001: MAP_32BIT added to kernel
Nov 11, 2002: XFree86 updated to use MAP_32BIT
Jan 4, 2003: MAP_32BIT updated for ELF small code
Feb 12, 2003: wrmsr slowness reported
Mar 4, 2003: MAP_32BIT added to glibc
May 9, 2003: MAP_32BIT retry added to glibc
Aug 13, 2008: “Pardo” report
Aug 13, 2008: MAP_STACK
Aug 15, 2008: glibc updated
a few questions
how did we get here?
question
legacy code backward compat
an thought
free open source doesn’t exist
an thought
why so much copy-paste
coping?
question
necessary complexity
an thought
lack of time
an thought
an aside:
but, why is there no time?
i don’t know, but could it be that
efficiency gains are captured by
management instead of engineering?
or could it be that…
working software systems aren’t
economically viable for 99% of companies?
hence why no one found that threading
bug for 5 years?
working software given complex requirements is
expensive
how much did you pay for your
an Linux?
?
packagecloud.io
@packagecloudio
Python Packaging
3 types of python packages
1. source distributions (sdists)
2. eggs
3. wheels
some …interesting… behavior with [-_.]
setup(name='hi_automacon', …
setup(name='hi-automacon', …
setup(name='hi.automacon', …
what do you think happens?
“There are only two hard things in Computer
Science: cache invalidation and naming
things.”
(literally unknown)
hi_automacon
setup.py: hi_automacon
metadata: hi-automacon
sdist: hi_automacon-1.0.tar.gz
egg: hi_automacon-1.0-py2.7.egg
wheel: hi_automacon-1.0-py2-none-any.whl
OK SO: wheels and eggs leave “_” in the filename but translate it in the metadata to “-”
…. but not sdists
OK OK OK OK OK OK OK OK
thats fine not a big deal
weekend work and all that
hi-automacon
setup.py: hi-automacon
metadata: hi-automacon
sdist: hi-automacon-1.0.tar.gz
egg: hi_automacon-1.0-py2.7.egg
wheel: hi_automacon-1.0-py2-none-any.whl
OK SO: wheels and eggs translate “-” to “_” in the filename but leave it in the metadata
…. but not sdists
package name    file name     metadata
dash            underscore    dash
underscore      underscore    dash
(wheels and eggs only)
sdists are WYSIWYG affff
hi.automacon
weird
• everything has ‘.’ in it
• file names and metadata for all package types
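the three cases above can be reproduced with a couple of regex one-liners. this is a sketch of the observed behavior only: the function names are made up, and the regex is my guess at the kind of escaping the packaging tools apply, not a copy of their code.

```python
import re


def metadata_name(name: str) -> str:
    """Name as it shows up in metadata: runs of anything but letters, digits and '.' become '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def wheel_egg_file_stem(name: str) -> str:
    """Name as it shows up in wheel and egg file names: dashes become underscores."""
    return metadata_name(name).replace("-", "_")


def sdist_file_stem(name: str) -> str:
    """sdists are WYSIWYG: the file name uses whatever was passed to setup()."""
    return name


if __name__ == "__main__":
    for name in ("hi_automacon", "hi-automacon", "hi.automacon"):
        print(
            f"setup.py: {name}  metadata: {metadata_name(name)}  "
            f"sdist: {sdist_file_stem(name)}-1.0.tar.gz  "
            f"wheel: {wheel_egg_file_stem(name)}-1.0-py2-none-any.whl"
        )
```

the takeaway is that "the package name" is at least three different strings depending on where you look, so anything that matches file names against metadata has to normalize both sides first.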
let’s curl against PyPI….
django-allauth
curl https://pypi.python.org/simple/django-allauth/
HTTP 200
OK OK OK OK OK OK OK OK
curl https://pypi.python.org/simple/django_allauth/
HTTP 302
< Location: /simple/django-allauth
OK OK OK OK OK OK OK OK
curl https://pypi.python.org/simple/django.allauth/
HTTP 200
(hi)
and now what happens if we try mixing the case?
lol maybe next time.