First steps on CentOs7

Namespaces, Cgroups and systemd. First steps on CentOS 7. Marc Cortinas – Production Services – Webops – March 2015


Page 1: First steps on CentOs7

Namespaces, Cgroups and systemd

First steps on CentOS 7

Marc Cortinas – Production Services – Webops – March 2015

Page 2: First steps on CentOs7

Why?

• Motivations:

• 1. Understand why Lennart Poettering comes across as a “little bit” arrogant

• 2. Know the main changes coming to Linux in the next few years

• 3. Learn CentOS 7 in more depth after it became the default distribution at Odigeo

• FOSDEM conference talk: whats_new_in_systemd,_2015_edition

Page 3: First steps on CentOs7

Agenda

Why colors?

----------------------------------------- so far…

Memory Spaces and IPC - dbus

The kernel - udev

Namespaces

Virtualizations

------------------------------------------ closer…

Init systems on Unix

Control groups

• Overview

• Subsystems or resource controllers

• Demo

• Commands

Dbus

AutoFS

------------------------------------------ void main ()

SystemD

• Motivations

• Definition and features

• Overview

• Unit Files, Core components and libraries

• Commands

• Other Components:

1. Udev

2. JournalD

3. NetworkD

4. ConsoleD

5. LoginD

6. TimedateD

7. Systemd-Nspawn

Page 4: First steps on CentOs7

Memory Spaces and Inter-Process Communication

User space – memory space where user processes run

• Only the kernel can access the user space of a process

• The system prevents one process from interfering with another process

Kernel space – memory space where kernel code runs

• A system call is the only way a user process gets access to it

• System call arguments are copied from user space to kernel space

• A user process switches to kernel mode while it executes a system call

Inter-process communication, not yet D-Bus

• Half-duplex UNIX pipes, the sysadmin's best friend

• Named pipes (FIFOs) and UNIX domain sockets (AF_UNIX)

• System V IPC:

– Messages queues

– Semaphores

– Shared memory
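As a minimal sketch of the named pipe item above, the following script passes one line from a background writer to a reader through a FIFO. The file name demo.fifo and the message text are illustrative assumptions, not part of the original slides.

```shell
#!/bin/sh
# Half-duplex communication through a named pipe (FIFO).
fifo_dir=$(mktemp -d)
fifo="$fifo_dir/demo.fifo"   # hypothetical name, created in a scratch directory
mkfifo "$fifo"

# Writer runs in the background: opening a FIFO for writing blocks
# until some process opens it for reading.
printf 'hello from writer\n' > "$fifo" &

# Reader: opens the FIFO and reads a single line.
IFS= read -r message < "$fifo"
wait
echo "reader got: $message"

rm -rf "$fifo_dir"
```

Data flows in one direction only; a second FIFO (or a UNIX domain socket) would be needed for replies.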

Linux Kernel Archs - Amir Hossein

http://www.tldp.org/LDP/lpg/node7.html

Page 5: First steps on CentOs7

The Kernel

Kernel: the modules or subsystems that provide the operating system's functions

Microkernel (µkernel): includes only the code necessary for the system to provide its major functionality:

– IPC

– Memory management

– Process management

– I/O management

Flexible, modular, easy to implement

Monolithic kernel (https://en.wikipedia.org/wiki/Monolithic_kernel):

- the entire operating system works in kernel space and runs alone in supervisor mode

- defines a high-level virtual interface over the computer hardware

- device drivers can be added to the kernel as modules... or not? See udev

Better performance

Hybrid kernel, nanokernel, picokernel, etc.

Reference: Linux Kernel Archs - Amir Hossein

Page 6: First steps on CentOs7

Namespaces

Namespaces – lightweight process virtualization

• Isolation: enable a process (or group of processes) to have a different view of the system than other processes

• Much like Zones in Solaris

• No hypervisor layer

• Only one system call added (setns())

• Implemented over many releases: the first namespace appeared in kernel 2.4.19 and the last one (user) was completed in 3.8

• 6 namespaces:

– Mount namespaces (CLONE_NEWNS, Linux 2.4.19) isolate the set of filesystem mount points seen by a group of processes

– UTS namespaces (CLONE_NEWUTS, Linux 2.6.19) isolate two system identifiers: nodename and domainname

– IPC namespaces (CLONE_NEWIPC, Linux 2.6.19) isolate certain interprocess communication (IPC) resources, namely, System V IPC objects and (since Linux 2.6.30) POSIX message queues

– PID namespaces (CLONE_NEWPID, Linux 2.6.24) isolate the process ID number space

– Network namespaces (CLONE_NEWNET, started in Linux 2.6.24 and largely completed by Linux 2.6.29) provide isolation of the system resources associated with networking

– User namespaces (CLONE_NEWUSER, started in Linux 2.6.23 and completed in Linux 3.8) isolate the user and group ID number spaces

http://lwn.net/Articles/531114/

http://www.haifux.org/lectures/299/netLec7.pdf
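A small sketch of how to see which namespaces a process belongs to, assuming a kernel that exposes /proc/PID/ns (Linux 3.0 or later for most entries). The function name show_namespaces is made up for this example.

```shell
#!/bin/sh
# Print the namespace identifiers of a process (default: this shell).
# Two processes share a namespace when the bracketed inode numbers match.
show_namespaces() {
    pid=${1:-$$}
    for ns in /proc/"$pid"/ns/*; do
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
}

show_namespaces
```

Comparing the output of this function inside and outside a container is a quick way to confirm which namespaces the container actually unshared.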

Page 7: First steps on CentOs7

Virtualization

HW Virtualization

• KVM

• Xen

• VMware

OS Virtualization

• LXC = Namespaces + Cgroups

• Docker

• BSD Jails

• Solaris Zones

Page 8: First steps on CentOs7

Init systems on Unix

LINKS:

http://en.wikipedia.org/wiki/Init

OS               Init system                      Family
Mac OS X         launchd (since 10.4)             BSD
NetBSD           BSD-style init (rc.d scripts)    BSD
OpenBSD          BSD-style init (rc scripts)      BSD
FreeBSD          BSD-style init (rc.d scripts)    BSD
Debian           Upstart/systemd/SysVinit         Linux
Ubuntu           Upstart                          Linux
RHEL6/CentOS6    SysVinit + LSB                   Linux
RHEL7/CentOS7    systemd                          Linux
Solaris          SMF                              Solaris

Page 9: First steps on CentOs7

Cgroups

• The project started at Google in 2006

• It was originally called “process containers”

• Merged into the mainline kernel in release 2.6.24

1) an upstream kernel feature that allows system resources to be partitioned/divided up amongst different processes, or a group of processes.

2) the user-space tools that handle the kernel control groups mechanism

Cgroup - set of tasks with a set of parameters for one or more subsystems

Subsystem - "resource controller" that schedules a resource or applies per-cgroup limits

Hierarchy - a set of cgroups arranged in a tree
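The three terms above map directly onto the lines of /proc/PID/cgroup, which have the form hierarchy-ID:controllers:path. The sketch below splits such a line into its fields; parse_cgroup_line is a hypothetical helper name, and the loop only runs where /proc/self/cgroup is readable.

```shell
#!/bin/sh
# Split one /proc/PID/cgroup line ("hierarchy-ID:controllers:path") into fields.
parse_cgroup_line() {
    line=$1
    hierarchy=${line%%:*}        # text before the first colon
    rest=${line#*:}              # everything after the first colon
    controllers=${rest%%:*}      # subsystem list, may be empty
    path=${rest#*:}              # cgroup path inside that hierarchy
    printf 'hierarchy=%s controllers=%s path=%s\n' \
        "$hierarchy" "${controllers:-none}" "$path"
}

# Show the cgroups of the current shell, one hierarchy per line.
if [ -r /proc/self/cgroup ]; then
    while IFS= read -r line; do
        parse_cgroup_line "$line"
    done < /proc/self/cgroup
fi
```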

LINKS:

https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt

https://www.youtube.com/watch?v=81j1WF5xEZc

http://fedoraproject.org/wiki/Features/ControlGroups

http://docs.fedoraproject.org/en-US/Fedora/16/html-single/Resource_Management_Guide/index.html

Page 10: First steps on CentOs7

Cgroups – Subsystems OR Resource Controllers

• blkio — sets limits on input/output access to and from block devices;

• cpu — uses the CPU scheduler to provide cgroup tasks an access to the CPU. It is mounted together with the cpuacct controller on the same mount;

• cpuacct — creates automatic reports on CPU resources used by tasks in a cgroup. It is mounted together with the cpu controller on the same mount;

• cpuset — assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup;

• devices — allows or denies access to devices for tasks in a cgroup;

• freezer — suspends or resumes tasks in a cgroup;

• memory — sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks;

• net_cls — tags network packets with a class identifier (classid) that allows the Linux traffic controller (the tc command) to identify packets originating from a particular cgroup task;

• perf_event — enables monitoring cgroups with the perf tool;

• hugetlb — allows to use virtual memory pages of large sizes, and to enforce resource limits on these pages.

Tip: run "yum install kernel-doc" as root and read /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups/

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/cgroups
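To check which of the controllers listed above the running kernel actually offers, /proc/cgroups can be filtered on its fourth column (enabled). This is a minimal sketch; the function name list_enabled_controllers is an assumption, and the optional argument exists only so the function can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# List the controllers marked as enabled in /proc/cgroups.
# Columns: subsys_name hierarchy num_cgroups enabled (header starts with '#').
list_enabled_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

if [ -r /proc/cgroups ]; then
    list_enabled_controllers
fi
```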

Page 12: First steps on CentOs7

Cgroups – libcgroup-tools commands

Each entry gives the description first and the command on the indented line below it.

Install the packages that provide the tools to manage the kernel API
    yum install libcgroup libcgroup-tools

Create a persistent file by snapshotting the current runtime hierarchy
    cgsnapshot -f /etc/cgconfig.conf

List all available hierarchies along with their current mount points
    lssubsys -am

Mount the net_prio controller on a virtual file system
    mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio

Unmount a controller from the virtual file system
    umount /sys/fs/cgroup/controller_name

Create transient cgroups in hierarchies, alternative 1
    cgcreate -t uid:gid -a uid:gid -g controllers:path

Create transient cgroups in hierarchies, alternative 2
    mkdir /sys/fs/cgroup/net_prio/lab1/group1

Remove cgroups
    cgdelete [-r] net_prio:/test-subgroup

Set controller parameters at run time
    cgset -r parameter=value path_to_cgroup

Copy the parameters of one cgroup into another
    cgset --copy-from path_to_source_cgroup path_to_target_cgroup

Set controller parameters permanently
    vi /etc/cgconfig.conf ; systemctl stop cgconfig ; systemctl start cgconfig

Move a process into a cgroup
    cgclassify -g controllers:path_to_cgroup pidlist

Launch processes in a manually created cgroup
    cgexec -g controllers:path_to_cgroup command arguments

Find the controllers that are available in your kernel
    cat /proc/cgroups

Find the mount points of a particular subsystem
    lssubsys -m controller_name

List the cgroups
    lscgroup

Restrict the output to a specific hierarchy
    lscgroup cpuset:adminusers

Display the parameters of specific cgroups
    cgget -r parameter list_of_cgroups
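The commands above are usually combined in the order create, set, execute. The following sketch strings three of them together in dry-run mode: it only prints the commands unless APPLY=1 is set, in which case it needs root and libcgroup-tools. The group name lab1, the 256M limit and the helper names run and limit_memory_demo are illustrative assumptions; memory.limit_in_bytes is the cgroup v1 memory controller parameter.

```shell
#!/bin/sh
# Dry-run wrapper: print each command, execute it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    fi
}

# Create a memory cgroup, cap its memory, then start a command inside it.
limit_memory_demo() {
    group=$1
    limit=$2
    shift 2
    run cgcreate -g "memory:$group"
    run cgset -r "memory.limit_in_bytes=$limit" "$group"
    run cgexec -g "memory:$group" "$@"
}

limit_memory_demo lab1 256M sleep 60
```

Running it without APPLY=1 is a safe way to review the exact command lines before touching the live hierarchy.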

Page 13: First steps on CentOs7

Now dbus, next kdbus!

Goal: Improvements for Inter Process Communication

Before D-Bus: pipes, named pipes, message queues, semaphores, shared memory.

D-Bus: method call transactions, signals, properties, OO, broadcasting, introspection, policy, activation, synchronization, type-safe marshalling, security, monitoring, exposes APIs, … A high-level concept!

D-Bus limitation: a full duplex transaction costs 10 copies + 4 complete validations + 4 context switches, so it is suitable for control messages but not for payload.

kdbus improvements: 2 or fewer copies + 2 validations + 2 context switches, and more.

Dbus Arch:

• Libdbus - library that allows two applications to connect to each other and exchange messages

• dbus-daemon - a message-bus daemon executable, built on libdbus

• wrapper libraries based on particular application frameworks

Linux Conf - Lennart Poettering - D-Bus

DBus Freedesktop Project

Linux documentation project (tldp) - IPC

Page 14: First steps on CentOs7

AutoFS

• What’s autofs?

automount is a program that mounts directories automatically on an as-needed basis.

• Why autofs in systemd?

To speed up boot by improving the parallelization of start-up: access requests to an autofs mount point are queued in the kernel until the target has been properly set up.

RHEL 7 - Documentation

GIT repository in kernel code for autofs

Ubuntu help for autoFS

Man page for autofS

Page 15: First steps on CentOs7

Motivations for SystemD

• Reduce boot time compared with SysV init by resolving dependencies (as launchd does)

• Daemons were managed with Bash scripts: a slow language whose behavior can change with environment variables (migrate to C)

• The system needs devices mounted before daemons start (autofs)

• Keep track of processes after their parent dies (cgroups)

• Start in order and resolve dependencies (Requires|Wants)

• Start only the required services, on demand (by default)

PMO systemD: Lennart Poettering

LINKS:

http://0pointer.de/blog/projects/systemd.html

http://0pointer.de/blog/projects/systemd-update.html

http://0pointer.de/blog/projects/systemd-update-2.html

http://0pointer.de/blog/projects/systemd-update-3.html

Page 16: First steps on CentOs7

What’s systemD?

1. Boot system designed to start up the system more efficiently

   1. Parallelizes the start-up process, using sockets (AF_UNIX/AF_INET) and D-Bus.

   2. Suite of programs to manage daemons, avoiding Bash scripts that depend on environment variables.

2. System administration daemon designed exclusively around the Linux kernel API

3. First process started in user space

4. Framework to manage dependencies between services and daemons

5. Runs as a background daemon process, hence the “d” suffix

6. Uses cgroups and fanotify to manage resources

7. Uses autofs so that “fopen” requests are queued instead of failing while a mount is pending

8. Keeps track of processes thanks to cgroups

Page 17: First steps on CentOs7

SystemD Overview

Project LINKS:

Repository: http://cgit.freedesktop.org/systemd/systemd

Documentation: https://fedoraproject.org/wiki/Systemd

Page 18: First steps on CentOs7

Unit Files, Core components and libraries

1. Unit file: configuration file intended to replace the traditional start-up Bash scripts.

Unit types: service, socket, device, mount, automount, swap, target, path, timer, snapshot, slice, scope

• Service: a process or a group of processes described by one configuration file

• Scope (group): a group of externally created processes, registered with systemd

• Slice (skeleton): a group of hierarchically organized units. Slices do not contain processes; they organize a hierarchy in which scopes and services are placed. (Default slices: -.slice; system.slice; user.slice; machine.slice)

2. Components

• systemd is a system and service manager for Linux operating systems.

• systemctl may be used to introspect and control the state of the systemd system and service manager.

• systemd-analyze may be used to determine system boot-up performance statistics and retrieve other state and tracing information from the system and service manager.
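To make the unit file idea concrete, here is a minimal sketch that writes a service unit into a scratch directory instead of a system path. The unit name hello.service, its description and the sleep command are hypothetical placeholders chosen for illustration.

```shell
#!/bin/sh
# Write a minimal service unit to a scratch directory (not /etc/systemd/system).
unit_dir=$(mktemp -d)
unit_file="$unit_dir/hello.service"   # hypothetical unit name

cat > "$unit_file" <<'EOF'
[Unit]
Description=Hypothetical demo service
After=network.target

[Service]
ExecStart=/usr/bin/sleep infinity
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

cat "$unit_file"
```

On a real host the file would typically be copied to /etc/systemd/system/, followed by systemctl daemon-reload and systemctl start hello.service; those steps need root and are left out of the sketch.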

Page 19: First steps on CentOs7

SystemD commands

Each entry gives the SysV command or a description (D:) first and the systemd command on the indented line below it.

init 3
    systemctl isolate multi-user.target

service httpd [command]
    systemctl [command] httpd

ls /etc/rc.d/init.d/
    systemctl list-units --all

chkconfig httpd [on|off] (D: enables/disables the unit at boot by creating/removing symlinks under /etc/systemd/system/ that point to the unit file in /usr/lib/systemd/system/; persistent cgroups)
    systemctl [enable|disable] httpd

D: run the top utility in a service unit in a new slice called test (transient)
    systemd-run --unit=toptest --slice=test top -b

D: stop the unit non-gracefully by sending it a signal
    systemctl kill name.service --kill-who=main|control|all --signal=signal

chkconfig frobozz --add
    systemctl daemon-reload

runlevel
    systemctl list-units --type=target

D: limit the CPU and memory usage of httpd.service
    systemctl set-property httpd.service CPUShares=600 MemoryLimit=500M

D: limit the CPU usage of httpd.service temporarily
    systemctl set-property --runtime httpd.service CPUShares=600

D: recursively show control group contents
    systemd-cgls

D: show the control groups for one resource
    systemd-cgls memory

D: add cgroup info to ps
    alias psc='ps xawf -eo pid,user,cgroup,args'

D: list the dependencies of a target
    systemctl show -p "Wants" multi-user.target

D: analyze system boot-up performance
    systemd-analyze

D: show the top control groups by their resource usage
    systemd-cgtop

D: run programs in transient scope or service units
    systemd-run

D: control the systemd machine manager (LXC or VM)
    machinectl

D: show the cgroups hierarchy attached to a process
    cat /proc/PID/cgroup
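The first rows of the table follow a simple pattern: the service name and the action swap places. The sketch below encodes that pattern for two SysV commands; sysv_to_systemctl is a made-up helper that only prints the equivalent command and never runs it.

```shell
#!/bin/sh
# Translate a SysV-style invocation into its systemctl equivalent.
# Prints the command only; nothing is executed.
sysv_to_systemctl() {
    case "$1" in
        service)
            # service NAME ACTION  ->  systemctl ACTION NAME
            echo "systemctl $3 $2" ;;
        chkconfig)
            # chkconfig NAME on|off  ->  systemctl enable|disable NAME
            case "$3" in
                on)  echo "systemctl enable $2" ;;
                off) echo "systemctl disable $2" ;;
                *)   echo "unsupported chkconfig action: $3" >&2; return 1 ;;
            esac ;;
        *)
            echo "unsupported command: $1" >&2; return 1 ;;
    esac
}

sysv_to_systemctl service httpd restart
sysv_to_systemctl chkconfig httpd on
```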

Page 20: First steps on CentOs7

Other components on systemD

• Udevd: is a device manager for the Linux kernel, which handles the /dev directory and all user space actions when adding/removing devices

• Journald: systemd-journald is a daemon responsible for event logging

• Consoled: systemd-consoled provides a user console daemon, intending to replace the Linux kernel's virtual terminal

• Logind: systemd-logind is a daemon that manages user logins and seats in various ways

• Networkd: systemd-networkd lets systemd perform various networking configurations, including features such as a DHCP server and VXLAN support

• Timedated: systemd-timedated is a daemon that can be used to control time-related settings, such as the system time, system time zone, or selection between UTC and local time zone system clock

• Systemd-nspawn: Spawn a namespace container for debugging, testing and building

Page 21: First steps on CentOs7

Udev – Device Manager

Device manager for the Linux kernel. The project was born in November 2003 as the successor of devfsd, and udev was introduced in Linux 2.5. In April 2012, udev's codebase was merged into the systemd source tree. In October 2012, Linus Torvalds criticized Kay Sievers' approach to udev maintenance and bugs related to firmware loading: “Not because firmware loading cannot be done in user space. But simply because udev maintenance since Greg gave it up has gone downhill.”

Goal: manage device nodes dynamically in the /dev directory, which used to be a static set of files

Udev arch:

• libudev, a library that allows access to device information; it was incorporated into the systemd software bundle

• User-space daemon udevd, which manages the virtual /dev

• Administrative command-line utility udevadm, for diagnostics

Udev features:

• Runs in user space

• Dynamically creates/removes device files

• Provides consistent naming

• Provides a user-space API

• Kernel 2.6 added the sysfs filesystem in /sys, with all information about devices/filesystems

• /etc/udev/rules.d/*.rules files define the actions to run after the kernel detects a device and its information is populated in sysfs

Device Manager Tutorial - udevadm
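A rules file is a list of match keys (==) followed by assignment keys (= or +=). The sketch below writes one such rule into a scratch directory; the file name 99-hypothetical-disk.rules and the symlink prefix demo_disk_ are invented for illustration, while %k is the udev substitution for the kernel device name.

```shell
#!/bin/sh
# Write an example udev rule to a scratch directory (not /etc/udev/rules.d).
rules_dir=$(mktemp -d)
rule_file="$rules_dir/99-hypothetical-disk.rules"   # hypothetical file name

cat > "$rule_file" <<'EOF'
# When the kernel adds a block device named sdX, create an extra symlink /dev/demo_disk_sdX.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", SYMLINK+="demo_disk_%k"
EOF

cat "$rule_file"
```

To try it on a real host, the file would go into /etc/udev/rules.d/ and udevd would need to reload its rules, for example with udevadm control --reload as root.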

Page 22: First steps on CentOs7

Journald before rsyslog

• http://www.freedesktop.org/software/systemd/man/journald.conf.html

• http://0pointer.de/blog/projects/journalctl.html

Journal commands:

journalctl

journalctl -f

journalctl -b

journalctl -b -p err

journalctl --since=yesterday

journalctl --since=2012-10-15 --until="2012-10-16 23:59:59"

journalctl -u httpd --since=00:00 --until=9:30

journalctl /dev/sdc

journalctl /usr/sbin/vpnc /usr/sbin/dhclient

journalctl -o verbose -n

journalctl _UID=70

journalctl _UID=70 _UID=71

journalctl _HOSTNAME=epsilon _COMM=avahi-daemon

journalctl _HOSTNAME=theta _UID=70 + _HOSTNAME=epsilon

journalctl -F _SYSTEMD_UNIT

journalctl _SE<TAB>

journalctl _SELINUX_CONTEXT=<TAB><TAB>

journalctl --output json

/etc/systemd/journald.conf

Page 23: First steps on CentOs7

NetworkD – not in CentOS 7 – added on CoreOS

Added to systemd in v209 (20 February 2014); DHCP server and VXLAN support were added in release v215 (July 2014).

• Main goal: allows systemd to perform various networking configurations

• Cfg path: /etc/systemd/network

• Enable:

– systemctl enable systemd-networkd.service

– systemctl start systemd-networkd.service

• Cfg file types:

– .link files: basic settings for network devices (name of the network interface, MTU, Wake-on-LAN, modified MAC address); these are configuration files for systemd-udev

– .network files: configuration files for systemd-networkd, with the same syntax as .link files, using [Match] and [Network] sections

– .netdev files: create virtual network devices such as bridges, bonded interfaces and VLANs

Tip: learn how Linux assigns predictable network interface names

LINKs:

Linux Magazine - Example configurations

CoreOS documentation - Example configurations

Networkd project - Freedesktop
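As a sketch of the .network file type described on this slide, the script below writes a DHCP configuration into a scratch directory. The file name 50-dhcp.network and the interface name eth0 are assumptions; on a real host the file would live under /etc/systemd/network.

```shell
#!/bin/sh
# Write an example .network file to a scratch directory.
net_dir=$(mktemp -d)
network_file="$net_dir/50-dhcp.network"   # hypothetical file name

cat > "$network_file" <<'EOF'
# [Match] selects the interface, [Network] says how to configure it.
[Match]
Name=eth0

[Network]
DHCP=yes
EOF

cat "$network_file"
```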

Page 24: First steps on CentOs7

ConsoleD 1/2

The current status is…

• Linux console (Linus Torvalds, 1991): the system console internal to the kernel. It is the I/O device for all kernel messages and allows login in single-user mode. There are 2 implementations:

1. Text mode – compatible with PC systems with CGA, EGA, MDA, VGA = LEGACY (2D array display)

2. Framebuffer (fbdev) – a hardware-independent graphics abstraction layer, used by default in modern Linux distributions

• Virtual console: multiplexes the Linux console into several (7) consoles using the VT system, running in kernel space

• Terminal emulator: runs in user space, under graphical environments such as GNOME, KDE, etc.

What systemd-consoled development wants…

• Released inside systemd v217, October 2014 (git commit)

• Main goal: systemd-consoled provides a user console daemon, intended to replace the Linux kernel's virtual terminal, running in user space

• Uses kmscon, a project started in Nov 2011. kmscon = KMS (Kernel Mode Setting, the kernel API that performs mode setting) + DRM (Direct Rendering Manager, the kernel subsystem used to access graphics devices)

LINKs:

Wikipedia - Linux console

Wiki Freedesktop.org – kmscon

20 years of CONFIG_VT, according to linux-kernel VT

Page 25: First steps on CentOs7

ConsoleD 2/2

Page 26: First steps on CentOs7

LoginD

• Logind was merged into systemd in v30, released on 1 August 2011

• What logind was built for:

– Keeping track of users and sessions, their processes and their idle state

– Providing PolicyKit-based access for users to operations such as system shutdown or sleep

– Implementing a shutdown/sleep inhibition logic for applications

– Handling of power/sleep hardware keys

– Multi-seat management

– Session switch management

– Device access management for users

– Automatic spawning of text logins (gettys) on virtual console activation, and user runtime directory management

• User sessions are registered in logind via the pam_systemd(8) PAM module (pam_systemd.so), which:

– creates/destroys /run/user/$USER

– sets $XDG_SESSION_ID (1 ID for each user)

– adds/deletes a systemd scope, copying the skeleton from user.slice

LINKS:

Wiki freedesktop.org – multiseat

Wiki freedesktop.org – logind

manpage freedesktop.org - pam_systemd

Page 27: First steps on CentOs7

TimedateD

• Timedated was merged inside systemd on v30 released in 1 august 2011

• Goal: daemon that can be used to control time-related settings, such as the system time, system time zone, or selection between UTC and local time zone system clock. It is accessible through D-Bus and controls:

– The system time

– The system time zone

– A boolean controlling whether the system RTC is in local time or UTC

– Whether the systemd-timesyncd.service(8) (NTP) service is enabled/started or disabled/stopped. See systemd-timedated.service(8) for more information.

• Wiki freedesktop.org - timedated

Page 28: First steps on CentOs7

systemd-nspawn and machinectl

• Systemd-nspawn is chroot on steroids

• Goal - Spawn a minimal namespace container for debugging, testing and building

# yum --releasever=20 --nogpg --installroot=/srv/mycontainer --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal

# systemd-nspawn -bD /srv/mycontainer/

[root@fedora20 ~]# machinectl

MACHINE CONTAINER SERVICE

mycontainer container nspawn

LINKS

Lennart Poettering, Linux Conf 2013

Wiki - fedoraproject.org – SystemdLightweightContainers

Wiki - freedesktop.org - VirtualizedTesting

Page 29: First steps on CentOs7

What’s new in SystemD? 2015…

Main changes announced at FOSDEM:

• New tool systemd-hwdb for querying the hardware metadata database, decoupled from the old libudev library

• machinectl gained support for two new "copy-from" and "copy-to" commands for copying files between a running container and the host

• machinectl gained support for a new "bind" command to bind mount host directories into local containers

• Routes configured with networkd may now be assigned a scope in .network files

• networkd may now configure IPv6 link-local addressing in addition to IPv4 link-local addressing

• The IPv6 "token" for use in SLAAC may now be configured for each .network interface in networkd.

• When the user presses Ctrl-Alt-Del more than 7x within 2s an immediate reboot is triggered

• networkd gained support for creating "ipvlan", "gretap","ip6gre", "ip6gretap" and "ip6tnl" network devices. Moreover, gained support for collecting LLDP network announcements

• systemd-nspawn's --image= option is now capable of dissecting and booting MBR and GPT disk images, This allows running cloud images from major distributions directly with systemd-nspawn, without modification.

• networkd .network files gained support for configuring per-link IPv4/IPv6 packet forwarding as well as IPv4 masquerading.

• The default TERM variable for units connected to a terminal is now vt220 rather than vt102

• systemd now provides a way to store file descriptors per-service in PID 1.This is useful for daemons to ensure that fds they require are not lost during a daemon restart

• The directory /var/lib/containers/ has been deprecated and been replaced by /var/lib/machines

CONCLUSIONS: They are mainly working on improving systemd-nspawn (with Btrfs) and networkd.

Timeline last code releases

Systemd v218 – 11 dec 2014 - http://cgit.freedesktop.org/systemd/systemd/tag/?id=v218

FOSDEM 2015 – 1 Feb 2015 - https://fosdem.org/2015/schedule/event/whats_new_in_systemd,_2015_edition/

The FOSDEM video may eventually be published at http://video.fosdem.org/2015/devroom-distributions/

Systemd v219 – 16 Feb 2015 - http://cgit.freedesktop.org/systemd/systemd/tag/?id=v219

Linux 4.0-rc1 – 22 Feb 2015 - http://lkml.iu.edu/hypermail/linux/kernel/1502.2/04059.html

Page 30: First steps on CentOs7

• Thanks... Questions?

• Tips:

1. Wiki freedesktop.org TipsAndTricks

2. Trick to find out the systemd version:

[root@fedora20 ~]# /usr/bin/timedatectl --version

systemd 208