This page is part of my virtualization context, i.e. from my point of view, talking about/doing virtualization includes:
Introduction
Why use OpenVZ?
This section is about why virtualization makes sense and, in particular, why I think OpenVZ makes more sense than other solutions out there.
Why use Virtualization at all?
First of all, let us start with a little journey into what problems were ahead of those guys three or so decades ago which finally led to what we are using on a daily basis nowadays...

Old Problems
Robert P. Goldberg describes the then state of things in his 1974 paper titled Survey of Virtual Machines Research. He says: Virtual machine systems were originally developed to correct some of the shortcomings of the typical third generation architectures and multi-programming operating systems - e.g., OS/360. As he points out, such systems had a dual-state hardware organization — a privileged and a non-privileged mode, something that is prevalent today as well. In privileged mode all instructions are available to software, whereas in non-privileged mode they are not. The OS provides a small resident program called the privileged software nucleus (analogous to the kernel). User programs could execute the non-privileged hardware instructions or make supervisory calls - e.g. SVCs (analogous to system calls) - to the privileged software nucleus in order to have privileged functions - e.g. I/O - performed on their behalf. While this works fine for many purposes, there are fundamental shortcomings with this approach. Let us consider a few:
A Loose Definition
We shall shortly enumerate several more reasons for needing virtualization, but before that let us clarify what we mean by the term. Let us define virtualization in as all-encompassing a manner as possible for the purpose of this discussion... Virtualization is a framework or methodology of dividing the resources of a computer into multiple execution environments, by applying one or more concepts or technologies such as hardware and software partitioning, time-sharing, partial or complete machine simulation, emulation, quality of service, and many others. Note that this definition is rather loose and includes concepts such as quality of service which, even though it is a separate field of study, is often used alongside virtualization. Often, such technologies come together in intricate ways to form interesting systems, one of whose properties is virtualization. In other words, the concept of virtualization is related to, or more appropriately synergistic with, various paradigms. Consider the multi-programming paradigm — applications on Unix-like systems (actually almost all modern systems) run within a virtual machine model of some kind.
Even though we defined it as such, the term virtualization is not always used to imply partitioning i.e. breaking something down into multiple entities. Here is an example of its different (intuitively opposite) connotation: we can take n disks and make them appear as one (logical) disk through a virtualization layer (at this point LVM (Logical Volume Manager) comes to mind). Grid computing enables the virtualization (ad hoc provisioning, on-demand deployment, decentralization, etc.) of distributed computing: IT resources such as storage, bandwidth, CPU cycles... PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of Unix and/or Windows computers hooked together by a network to be used as a single large parallel computer. PVM is widely used in distributed computing. Colloquially speaking, virtualization abstracts things away.

Why Virtualization, a List of Reasons
Following are some (possibly overlapping) representative reasons for and benefits of virtualization:
Why OpenVZ in particular?
Features that come with OpenVZ
There are those features common to all container-based virtualization approaches, also known as OS-level virtualization, i.e. Linux-VServer, OpenVZ and so on. Then there are a few features distinct to OpenVZ which are not found with other virtualization solutions:

Hardware Node
HN (Hardware Node) (otherwise known as OpenVZ host system) is a term used in OpenVZ documentation. Basically, it denotes the physical box on which the OpenVZ-enabled Linux kernel is installed. For example, one of my servers is a hardware node located in some datacenter. This HN has an OpenVZ-enabled Linux kernel installed sa@ri7:~$ uname -r 2.6.32-1-openvz-amd64 sa@ri7:~$ and thus becomes an OpenVZ host system, also known as
OpenVZ Host System
With OpenVZ, we have multiple containers/VEs as well as the host system itself, which is otherwise known as the HN.

OpenVZ Container
A VE (Virtual Environment), VPS (Virtual Private Server) or CT (Container), etc. is one of the main concepts of OpenVZ.
Components
This section lists all components needed for a functioning OpenVZ system.

Kernel
The core component of any OpenVZ environment is its kernel — to be more precise: OpenVZ is an operating system-level virtualization technology based on the Linux kernel. The OpenVZ kernel therefore actually is a Linux kernel, modified to add support for OpenVZ containers. The modified kernel provides virtualization, isolation, resource management, and checkpointing.

Management Utilities
OpenVZ needs some user-level tools installed. Those are:
OS Templates
Please go here for detailed information. Generally speaking, nowadays we use vzpkg2 in conjunction with pkg-cacher to create OS templates. However, aside from that there are a few other possibilities to create OS templates or to set up a VE rather quickly:
Installation
This section details how to acquire all the OpenVZ components and install them onto some bare-metal box from scratch. After installing everything needed for a functioning OpenVZ environment, we need to set up (configure) various parts of the OpenVZ components in order to make the whole shebang ready for operations. Once configured, an OpenVZ environment needs to be managed, which is covered in a dedicated section further down. I am not going to talk about non-mainline procedures like, for example, tinkering around with ready-made binary images, using some sort of SCM (Software Configuration Management) system, FAI (Fully Automatic Installation) or, even better, Puppet. Instead I am focusing on the standard procedure i.e.
In short, the prerequisite for installing OpenVZ is an installed Debian system with Internet connectivity.

The Debian Way
Starting with Linux kernel version 2.6.26, Debian provides official
OpenVZ kernels ( sa@wks:~$ acsn linux-image-openvz linux-image-openvz-amd64 - Linux image on AMD64 sa@wks:~$ acsh linux-image-openvz-amd64 Package: linux-image-openvz-amd64 Priority: optional Section: kernel Installed-Size: 8 Maintainer: Debian Kernel Team <[email protected]> Architecture: amd64 Source: linux-latest-2.6 (27) Version: 2.6.32+27 Provides: linux-latest-modules-2.6.32-5-openvz-amd64 Depends: linux-image-2.6.32-5-openvz-amd64 Filename: pool/main/l/linux-latest-2.6/linux-image-openvz-amd64_2.6.32+27_amd64.deb Size: 3048 MD5sum: 70376bd1d10280013a848e5106c1a147 SHA1: 470c8e023b2ab4ef6522e17656e85426f73cf775 SHA256: 968024de8bc4a4212dffda6d2ba4665703fa8af2c9fb2fa76b65b9b45af156d1 Description: Linux for 64-bit PCs (meta-package), OpenVZ support This package depends on the latest Linux kernel and modules for use on PCs with AMD64 or Intel 64 processors. . This kernel includes support for OpenVZ container-based virtualization. . This kernel also runs on a Xen hypervisor. It supports only unprivileged (domU) operation. sa@wks:~$ Excellent! No more need we acquire the OpenVZ Linux kernel patch as
well as the vanilla sources, patch them and finally rebuild the Linux
kernel in order to get an OpenVZ enabled Linux kernel. So, all we need
to do now is to issue sa@wks:~$ apt-cache --recurse depends linux-image-openvz-amd64 | grep vz | grep -B1 Depends linux-image-openvz-amd64 Depends: linux-image-2.6.32-5-openvz-amd64 linux-image-2.6.32-5-openvz-amd64 Depends: vzctl vzctl Depends: vzquota sa@wks:~$ As can be seen below, I have already installed all the
OpenVZ components — issuing sa@wks:~$ type dpl dpl is aliased to `dpkg -l' sa@wks:~$ dpl {linux-image*,vz*} | egrep open\|vz ii linux-image-2.6.32-1-openvz-amd64 2.6.32-1 Linux 2.6.32 image on AMD64, OpenVZ support ii linux-image-openvz-amd64 2.6.32+27 Linux image on AMD64 ii vzctl 3.0.23-16 server virtualization solution - control too ii vzquota 3.0.12-3 server virtualization solution - quota tools sa@wks:~$ date Tue May 25 10:48:10 CEST 2010 sa@wks:~$ uname -a Linux sub 2.6.32-1-openvz-amd64 #1 SMP Wed May 20 13:06:07 UTC 2010 x86_64 GNU/Linux sa@wks:~$ su Password: sub:/home/sa# /etc/init.d/vz status OpenVZ is running... sub:/home/sa# SetupAfter we have installed all the OpenVZ components, we need to set them up and configure the whole shebang. General HintsThis subsection provides a few guidelines which I generally consider best practice for OpenVZ deployment:
Setup Prerequisites
Ok, now that we have installed everything needed, we can start setting things up, e.g. creating VEs, also known as OpenVZ containers. However, before we start firing up one VE after the other, we need to know a few things about how OpenVZ works internally, best practices etc. First of all, there is something called OS (Operating System) templates.

OS Templates
Please go here to see how they are acquired and verified.

Figuring out some Nameservers to use
When configuring VEs further down, we need to provide each VE with
nameserver entries. Note that setting nameservers within a VEs
Those who take a look at So, we want three nameservers, possibly independent ones. I came up with the below in order to get rid of this repetitive task once and for all... sa@wks:~$ dig {yahoo,google,microsoft}.com NS | grep ^ns ns1.yahoo.com. 170667 IN A 66.218.71.63 ns4.yahoo.com. 171427 IN A 68.142.196.63 ns5.yahoo.com. 699 IN A 119.160.247.124 ns6.yahoo.com. 171642 IN A 202.43.223.170 ns8.yahoo.com. 169475 IN A 202.165.104.22 ns1.google.com. 345117 IN A 216.239.32.10 ns2.google.com. 345372 IN A 216.239.34.10 ns3.google.com. 345372 IN A 216.239.36.10 ns4.google.com. 340223 IN A 216.239.38.10 ns1.msft.net. 2342 IN A 207.68.160.190 ns2.msft.net. 154 IN A 65.54.240.126 ns3.msft.net. 822 IN A 213.199.161.77 ns4.msft.net. 2342 IN A 207.46.66.126 ns5.msft.net. 787 IN A 65.55.238.126 sa@wks:~$ More on how to use sa@wks:~$ alias | grep nas alias nas='dig {yahoo,google,microsoft}.com NS | grep ^ns1 | cut -f6 | xargs -I {} echo -n " --nameserver {}"' sa@wks:~$ nas --nameserver 68.180.131.16 --nameserver 216.239.32.10 --nameserver 207.68.160.190sa@wks:~$ sa@wks:~$ This output can then be directly used to set nameservers for some VE sub:/home/sa# vzctl set sub_ve0 --nameserver 68.180.131.16 --nameserver 216.239.32.10 --nameserver 207.68.160.190 --save Saved parameters for VE 101 sub:/home/sa# sysctlPlease go here for common information with regards to sysctl. In conjunction with OpenVZ, the sysctl Linux kernel interface is important for us since we need to set a few kernel parameters needed to run an OpenVZ environment. 1 sub:/home/sa# cat /etc/sysctl.conf 2 ###_ main 3 ###_. misc 4 #kernel.domainname = example.com 5 # Uncomment the following to stop low-level messages on console 6 #kernel.printk = 4 4 1 7 7 ###_. 
functions previously found in netbase 8 # Uncomment the next two lines to enable Spoof protection 9 # (reverse-path filter) Turn on Source Address Verification in all 10 # interfaces to prevent some spoofing attacks 11 #net.ipv4.conf.default.rp_filter=1 12 #net.ipv4.conf.all.rp_filter=1 13 # Uncomment the next line to enable TCP/IP SYN cookies 14 #net.ipv4.tcp_syncookies=1 15 # Uncomment the next line to enable packet forwarding for IPv4 16 #net.ipv4.ip_forward=1 17 # Uncomment the next line to enable packet forwarding for IPv6 18 #net.ipv6.conf.all.forwarding=1 19 ###_. security 20 # Additional settings - these settings can improve the network 21 # security of the host and prevent against some network attacks 22 # including spoofing attacks and man in the middle attacks through 23 # redirection. Some network environments, however, require that these 24 # settings are disabled so review and enable them as needed. 25 ###_ , ICMP broadcasts 26 # Ignore ICMP broadcasts 27 #net.ipv4.icmp_echo_ignore_broadcasts = 1 28 ###_ , ignore ICMP errors 29 # Ignore bogus ICMP errors 30 #net.ipv4.icmp_ignore_bogus_error_responses = 1 31 ###_ , ICMP redirects 32 # Do not accept ICMP redirects (prevent MITM attacks) 33 #net.ipv4.conf.all.accept_redirects = 0 34 #net.ipv6.conf.all.accept_redirects = 0 35 # _or_ 36 # Accept ICMP redirects only for gateways listed in our default 37 # gateway list (enabled by default) 38 # net.ipv4.conf.all.secure_redirects = 1 39 ###_ , send ICMP redirects 40 # Do not send ICMP redirects (we are not a router) 41 #net.ipv4.conf.all.send_redirects = 0 42 ###_ , accept IP source route packets 43 # Do not accept IP source route packets (we are not a router) 44 #net.ipv4.conf.all.accept_source_route = 0 45 #net.ipv6.conf.all.accept_source_route = 0 46 ###_ , log Martian Packets 47 #net.ipv4.conf.all.log_martians = 1 48 ###_ , /proc/<pid>/maps 49 # The contents of /proc/<pid>/maps and smaps files are only visible to 50 # readers that are allowed to ptrace() 
the process 51 # sys.kernel.maps_protect = 1 52 ###_. openvz 53 ###_ , packet forwarding for IPv4/6 54 net.ipv4.ip_forward=1 55 #net.ipv6.conf.all.forwarding=1 56 ###_ , magic-sysrq key 57 kernel.sysrq = 1 58 ###_ , ICMP broadcasts 59 net.ipv4.icmp_echo_ignore_broadcasts=1 60 ###_ , ICMP redirects 61 # Do not send ICMP redirects (we are not a router) 62 net.ipv4.conf.all.send_redirects = 0 63 ###_ , spoof protection 64 # Enable Spoof protection (reverse-path filter) Turn on Source Address 65 # Verification in all interfaces to prevent some spoofing attacks 66 net.ipv4.conf.default.rp_filter=1 67 net.ipv4.conf.all.rp_filter=1 68 ###_ , proxy ARP 69 # Disabling proxy ARP per default 70 net.ipv4.conf.default.proxy_arp=0 71 # Enabling for eth0 72 net.ipv4.conf.eth0.proxy_arp=1 73 ###_ , interfaces redirects 74 # We do not want all our interfaces to send redirects 75 net.ipv4.conf.default.forwarding=1 76 net.ipv4.conf.default.send_redirects = 1 77 ###_ emacs local variables 78 # Local Variables: 79 # mode: conf 80 # allout-layout: (0 : 0) 81 # End: 82 sub:/home/sa# cat /etc/sysctl.conf | grep -v \# 83 net.ipv4.ip_forward=1 84 kernel.sysrq = 1 85 net.ipv4.icmp_echo_ignore_broadcasts=1 86 net.ipv4.conf.all.send_redirects = 0 87 net.ipv4.conf.default.rp_filter=1 88 net.ipv4.conf.all.rp_filter=1 89 net.ipv4.conf.default.proxy_arp=0 90 net.ipv4.conf.eth0.proxy_arp = 1 91 net.ipv4.conf.default.forwarding=1 92 net.ipv4.conf.default.send_redirects = 1 93 wks:/home/sa# sysctl -p 94 net.ipv4.ip_forward = 1 95 kernel.sysrq = 1 96 net.ipv4.icmp_echo_ignore_broadcasts = 1 97 net.ipv4.conf.all.send_redirects = 0 98 net.ipv4.conf.default.rp_filter = 1 99 net.ipv4.conf.all.rp_filter = 1 100 net.ipv4.conf.default.proxy_arp = 0 101 net.ipv4.conf.eth0.proxy_arp = 1 102 net.ipv4.conf.default.forwarding = 1 103 net.ipv4.conf.default.send_redirects = 1 104 sub:/home/sa# Line 93 is important. We do not want to reboot the HN so we reload the
settings from /etc/sysctl.conf.

Setup
At this point we have two possibilities to choose from:
VE with static IPv4 addressThis subsubsection covers how to set up a VE which uses a static IPv4 address. Basic Setup1 wks:/home/sa# vzctl create 101 --config vps.basic --ostemplate debian-5.0-amd64-minimal 2 Creating VE private area (debian-5.0-amd64-minimal) 3 Performing postcreate actions 4 VE private area was created 5 wks:/home/sa# vzctl set 101 --hostname wks-ve1 --name stable --save 6 Name stable assigned 7 Saved parameters for VE 101 8 wks:/home/sa# vzctl set 101 --ipadd 192.168.1.100 --save 9 Saved parameters for VE 101 10 wks:/home/sa# vzctl set 101 --nameserver 68.180.131.16 --nameserver 216.239.32.10 --nameserver 207.68.160.190 --save 11 Saved parameters for VE 101 12 wks:/home/sa# vzctl set stable --userpasswd sa:xxxxxxxxxxxxxxx 13 Starting VE... 14 VE is mounted 15 VE start in progress... 16 Stopping VE... 17 VE was stopped 18 VE is unmounted 19 wks:/home/sa# vzctl set stable --onboot no --save 20 Saved parameters for VE 101 Setting up a VE running Debian is what we do in lines 1 to 20. In line
1 we create the VE and thereby specify what configuration to use for
its initial setup — I opted for vps.basic. The VEID we use is 101. Starting with line 5, we can now start to add the needed
configurations to our VE. In order to do so we need to first specify
the VE in question, which is VE 101 (or, after line 5, its name stable). Line 12 creates a user and sets a password for this particular
user within our VE — in case the user does not exist it
is created. Note that in line 12 we used the VE's name stable rather than its VEID.
Basic Commands/Usage21 wks:/home/sa# vzctl start stable 22 Starting VE... 23 VE is mounted 24 Adding IP address(es): 192.168.1.100 25 Setting CPU units: 1000 26 Configure meminfo: 65536 27 Set hostname: wks-ve1 28 File resolv.conf was modified 29 VE start in progress... 30 wks:/home/sa# vzlist -a 31 VEID NPROC STATUS IP_ADDR HOSTNAME 32 101 7 running 192.168.1.100 wks-ve1 33 wks:/home/sa# vzlist -an 34 VEID NPROC STATUS IP_ADDR NAME 35 101 7 running 192.168.1.100 stable In line 21 it is time to start our VE. When it finished, we list all
currently running VEs in lines 31 to 35, once showing their hostname
as we set it in line 5 and once its alias name, also as we set it in
line 5. If we compare the column 36 wks:/home/sa# vzctl enter stable 37 entered into VE 101 38 root@wks-ve1:~# date -u 39 Wed Mar 4 19:52:26 UTC 2010 40 root@wks-ve1:/# uname -a 41 Linux wks-ve1 2.6.32-1-openvz-amd64 #1 SMP Sat May 10 18:52:53 UTC 2010 x86_64 GNU/Linux 42 root@wks-ve1:/# whoami 43 root 44 root@wks-ve1:/# pwd 45 / 46 root@wks-ve1:~# passwd 47 Enter new UNIX password: 48 Retype new UNIX password: 49 passwd: password updated successfully 50 root@wks-ve1:~# cat /etc/debian_version 51 5.0 Next we enter the VE in line 36 — note how the prompt changes from
52 root@wks-ve1:~# cat /etc/resolv.conf 53 nameserver 68.180.131.16 54 nameserver 216.239.32.10 55 nameserver 207.68.160.190 56 root@wks-ve1:~# cat /etc/apt/sources.list 57 deb http://ftp2.de.debian.org/debian lenny main contrib non-free 58 deb http://ftp2.de.debian.org/debian-security lenny/updates main contrib non-free 59 root@wks-ve1:~# sed -i s/lenny/stable/ /etc/apt/sources.list 60 root@wks-ve1:~# sed -i s/ftp2/ftp/ /etc/apt/sources.list 61 root@wks-ve1:~# sed -i s/http/ftp/ /etc/apt/sources.list 62 root@wks-ve1:~# cat /etc/apt/sources.list 63 deb ftp://ftp.de.debian.org/debian stable main contrib non-free 64 deb ftp://ftp.de.debian.org/debian-security stable/updates main contrib non-free 65 root@wks-ve1:~# ping -c2 debian.org 66 PING debian.org (194.109.137.218) 56(84) bytes of data. 67 64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=1 ttl=49 time=26.7 ms 68 64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=2 ttl=49 time=26.5 ms 69 70 --- debian.org ping statistics --- 71 2 packets transmitted, 2 received, 0% packet loss, time 1006ms 72 rtt min/avg/max/mdev = 26.502/26.603/26.705/0.192 ms 73 root@wks-ve1:~# ping -c2 google.com 74 PING google.com (209.85.171.100) 56(84) bytes of data. 75 64 bytes from cg-in-f100.google.com (209.85.171.100): icmp_seq=1 ttl=233 time=186 ms 76 64 bytes from cg-in-f100.google.com (209.85.171.100): icmp_seq=2 ttl=233 time=181 ms 77 78 --- google.com ping statistics --- 79 2 packets transmitted, 2 received, 0% packet loss, time 1006ms 80 rtt min/avg/max/mdev = 181.314/183.724/186.135/2.448 ms Line 53 to 55 show the nameserver(s) which we set in line 10. Note
that in certain cases, one might need to not set the nameservers as we did in line 10 but to provide the gateway's IPv4 address instead, e.g.
The current nameserver setup, as seen in lines 53 to 55, works perfectly fine for some server within a datacenter where our server has unlimited/unfiltered connectivity to the Internet and can therefore query any nameserver we would like for DNS (Domain Name System) resolution. In lines 56 to 64 I am just tailoring our VE's /etc/apt/sources.list. Lines 65 and 73 are to test if we really have connectivity from inside
our VE 81 root@wks-ve1:~# aptitude update && aptitude full-upgrade 82 Get:1 ftp://ftp.de.debian.org stable Release.gpg [386B] 83 Get:2 ftp://ftp.de.debian.org stable/updates Release.gpg [189B] 84 85 86 [skipping a lot of lines...] 87 88 89 The following packages will be upgraded: 90 apt apt-utils base-passwd debian-archive-keyring dhcp3-client dhcp3-common dpkg gcc-4.2-base iptables libattr1 libcwidget3 libgnutls26 libreadline5 man-db procps readline-common rsyslog 91 The following packages are RECOMMENDED but will NOT be installed: 92 psmisc 93 17 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded. 94 Need to get 7952kB of archives. After unpacking 156kB will be freed. 95 Do you want to continue? [Y/n/?] y 96 Writing extended state information... Done 97 Get:1 ftp://ftp.de.debian.org stable/main dpkg 1.14.25 [2399kB] 98 Get:2 ftp://ftp.de.debian.org stable/main base-passwd 3.5.20 [41.7kB] 99 100 101 [skipping a lot of lines...] 102 103 104 Setting up procps (1:3.2.7-11)... 105 Installing new version of config file /etc/sysctl.conf... 106 Installing new version of config file /etc/init.d/procps... 107 Setting kernel variables (/etc/sysctl.conf)...done. 108 Reading package lists... Done 109 Building dependency tree 110 Reading state information... Done 111 Reading extended state information 112 Initializing package states... Done 113 114 Current status: 0 updates [-17]. 115 root@wks-ve1:/# exit 116 logout 117 exited from VE 101 118 wks:/home/sa# Nothing special in lines 81 to 118. All we do is issue line 81 in order to update our stable Debian release with up-to-date packages and security updates. Last thing to say: we are in business, ladies and gentlemen, ready to rock... From now on this VE can be treated just like a physical machine running Debian stable — no one would notice a difference without explicitly knowing that this is actually an OpenVZ VE (Virtual Environment).
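The basic setup from lines 1 to 20 can be condensed into a small script. This is a sketch, not part of the original walkthrough: the vz wrapper echoes the vzctl commands when DRY_RUN=1 (the default here), so the sequence can be exercised on a box without OpenVZ; the VEID, names and addresses are the ones used above.

```shell
#!/bin/sh
# Condensed version of the basic VE setup (lines 1 to 20 above).
# DRY_RUN=1 (the default here) only prints the vzctl commands.
DRY_RUN=${DRY_RUN:-1}

vz() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "vzctl $*"
    else
        vzctl "$@"
    fi
}

setup_ve() {
    vz create 101 --config vps.basic --ostemplate debian-5.0-amd64-minimal
    vz set 101 --hostname wks-ve1 --name stable --save
    vz set 101 --ipadd 192.168.1.100 --save
    vz set 101 --nameserver 68.180.131.16 --nameserver 216.239.32.10 --nameserver 207.68.160.190 --save
    vz set stable --onboot no --save
}

setup_ve
```

On a real HN one would run this with DRY_RUN=0; the interactive --userpasswd step from line 12 is deliberately left out.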
VE with IPv4 address assigned via DHCP
WRITEME

Nice to Have
This subsubsection covers a few optional nice-to-have settings in addition to the basic setup of a VE. A VE should only have a minimal set of software installed i.e. we start
out with an OS template like for example
In case the machine is a HN, we do some additional stuff:
Security relevant Goodies
Storage
Networking
The OpenVZ network virtualization layer is designed to isolate VEs from each other and from the physical network:
veth versus venet
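Without preempting the comparison, here is how the two flavors are attached to a VE; a hedged sketch (VEID 101 as elsewhere on this page, the interface name eth0 is an assumption), with vzctl echoed so it is safe to run anywhere:

```shell
VZCTL="echo vzctl"   # illustrative dry run; use plain vzctl on a real HN

# venet (the default): the IP is assigned from the HN and the VE has no MAC
# address of its own; fast and well isolated.
$VZCTL set 101 --ipadd 192.168.1.100 --save

# veth: a virtual Ethernet pair with its own MAC address inside the VE,
# needed for things like bridging or running DHCP inside the VE.
$VZCTL set 101 --netif_add eth0 --save
```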
Resource Management
OpenVZ resource management controls the amount of resources available to VEs. The controlled resources include parameters such as CPU power, disk space, a set of memory-related parameters, etc. Resource management allows OpenVZ to:
Resource management is much more important for OpenVZ than for a standalone computer since resource utilization in an OpenVZ-based system is considerably higher than that found with a typical system. As all the VEs use the same kernel, resource management is of paramount importance: each VE must stay within its boundaries and not affect other VEs in any way — and this is what resource management ensures. OpenVZ resource management consists of five main components:
Please note that all those resources can be changed at VE run time, also known as on-line; there is no need to reboot and therefore no downtime. For example, if we want to give some VE less RAM (Random Access Memory), we just change the appropriate parameters on the fly. This is either very hard to do or not possible at all with other virtualization approaches such as VMware, Xen, etc.

Disk Space
The OpenVZ host system administrator can set up per-container disk quotas, in terms of disk blocks and inodes (roughly, the number of files). This is the first level of disk quota. In addition, a VE administrator can employ the usual quota tools inside a particular VE to set standard UNIX per-user and per-group disk quotas. If one wants to give a VE more disk space, one might just increase its disk quota limit. No need to resize disk partitions etc. However, I recommend using LVM (Logical Volume Manager) to set up LVs (Logical Volumes) for VEs i.e. every VE gets its own LV (Logical Volume).

Disk I/O
Similar to the fair CPU scheduler described below, the I/O scheduler in OpenVZ is also two-level, utilizing the CFQ I/O scheduler on its second level. Each VE is assigned an I/O priority, and the I/O scheduler distributes the available I/O bandwidth according to the priorities assigned. Thus no single container can saturate an I/O channel.
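A sketch of turning the two disk-related knobs just described (VEID 101 as above; the 10G:11G barrier:limit pair and I/O priority 4 are made-up example values):

```shell
VZCTL="echo vzctl"   # illustrative dry run; use plain vzctl on a real HN

# First-level disk quota: 10 GB soft barrier, 11 GB hard limit.
$VZCTL set 101 --diskspace 10G:11G --save

# I/O priority for the two-level CFQ scheduling: 0 (lowest) to 7 (highest).
$VZCTL set 101 --ioprio 4 --save
```

Both take effect at run time, in line with the no-reboot point made above.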
CPU
The CPU scheduler in OpenVZ is a two-level implementation of a fair-share scheduling strategy. On the first level the scheduler decides which VE is given the CPU time slice, based on per-VE cpuunits values. On the second level the standard Linux scheduler decides which process to run within that particular VE, using standard Linux process priorities and such. We can set up different cpuunits values for different VEs so that CPU time is given to them proportionally. There is also a way to limit CPU time, e.g. say a VE is limited to 10% of the available CPU time.

Memory
WRITEME
User Beancounters
User beancounters is a set of per-VE counters, limits, and guarantees. There is a set of about 20 parameters which are carefully chosen to cover all aspects of VE operation, so that no single container can abuse any resource which is limited for the whole node and thus do harm to other VEs. The resources accounted and controlled are mainly memory and various
in-kernel objects such as IPC shared memory segments, network buffers
etc. Each resource can be seen from
The meaning of barrier and limit is parameter-dependent i.e. those can
be thought of as a soft limit and hard limit. If any resource hits the
limit, its fail counter is increased, so the VE administrator can see
if something bad is happening by analyzing the output of
WRITEME (those few lines below are just notes for now; examples will follow)
Example of how to set a new barrier/limit (beancounters can be set without downtime):

Disable Resource Limits
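Until the examples above are written, here is a quick way to spot trouble: any non-zero fail counter means a VE ran into a limit. The snippet below works on a shortened, made-up sample in the /proc/user_beancounters column layout; on a real HN one would read that file directly.

```shell
# Made-up sample in the /proc/user_beancounters column layout:
# resource held maxheld barrier limit failcnt
cat > /tmp/ubc.sample <<'EOF'
kmemsize 836919 1005343 2752512 2936012 0
numproc 7 10 96 96 0
tcpsndbuf 0 0 319488 524288 3
EOF

# Print every resource whose fail counter (last column) is non-zero.
awk '$NF > 0 { print $1, "failcnt =", $NF }' /tmp/ubc.sample
```

For the sample above this prints tcpsndbuf as the resource that hit its limit; the fix would be to raise its barrier/limit pair with vzctl set as described.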
Checkpointing and Live Migration
Go here for more information.

Best Practice

Monitoring
Backup a VE
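This section is still a stub; as a placeholder, one common low-tech approach (an assumption on my part, not from the original text) is to stop the VE and archive its private area together with its config file. Paths follow the Debian layout used elsewhere on this page; vzctl is echoed so the function is safe to exercise anywhere.

```shell
# Back up a VE by archiving its private area and its config file.
# VZ_PRIVATE/VZ_CONF/BACKUP default to the Debian layout used on this page.
VZCTL="echo vzctl"   # illustrative dry run; use plain vzctl on a real HN

backup_ve() {
    veid=$1
    private=${VZ_PRIVATE:-/var/lib/vz/private}
    conf=${VZ_CONF:-/etc/vz/conf}
    backup=${BACKUP:-/tmp/ve-backup}

    mkdir -p "$backup"
    $VZCTL stop "$veid"
    tar czf "$backup/ve$veid.tar.gz" -C "$private" "$veid" -C "$conf" "$veid.conf"
    $VZCTL start "$veid"
}

# On a real HN (with VZCTL=vzctl): backup_ve 101
```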
Clone a VE
Assuming we do not want to start from scratch with a new OS template because we have already added some nice-to-have cookies, or maybe we just want to test some new software on some particular VE but not take the risk of suffering from potential failures that might arise from doing so... The right approach to such problems would be branching — an SCM (Software Configuration Management) term — but we are talking OpenVZ and not Git right now. However, with OpenVZ we can do some sort of branching — it is called cloning a VE and is actually a trivial endeavor, as we will see below. What we would like to accomplish is to clone the VE named testing (VEID 102) into a new VE named testing_ssh (VEID 105).
My rationale here is simple: I have a few prototypes which I keep up to date and then use for cloning whenever I need a VE based on some prototype VE. The prototype VEs I have are stable, testing and unstable:
1 wks:/var/lib/vz# vzlist -a 2 VEID NPROC STATUS IP_ADDR HOSTNAME 3 101 7 running 192.168.1.100 wks-ve1 4 102 - stopped 192.168.1.101 wks-ve2 5 103 7 running 192.168.1.102 wks-ve3 6 wks:/var/lib/vz# vzlist -an 7 VEID NPROC STATUS IP_ADDR NAME 8 101 7 running 192.168.1.100 stable 9 102 - stopped 192.168.1.101 testing 10 103 7 running 192.168.1.102 unstable 11 wks:/var/lib/vz# la root/ 12 total 8 13 drwxr-xr-x 5 root root 36 2009-03-04 23:32 . 14 drwxr-xr-x 7 root root 68 2008-10-23 21:37 .. 15 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 101 16 drwxr-xr-x 2 root root 6 2009-03-04 23:28 102 17 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 103 18 wks:/var/lib/vz# mkdir root/105 19 wks:/var/lib/vz# cp -a /etc/vz/conf/{102,105}.conf 20 wks:/var/lib/vz# la /etc/vz/conf 21 total 36 22 drwxr-xr-x 2 root root 4096 2009-03-06 13:04 . 23 drwxr-xr-x 6 root root 66 2008-11-17 12:52 .. 24 -rw-r--r-- 1 root root 246 2008-10-18 15:39 0.conf 25 -rw-r--r-- 1 root root 1768 2009-03-04 23:19 101.conf 26 -rw-r--r-- 1 root root 1862 2008-09-07 20:16 101.conf.destroyed 27 -rw-r--r-- 1 root root 1769 2009-03-04 23:30 102.conf 28 -rw-r--r-- 1 root root 1770 2009-03-04 23:34 103.conf 29 -rw-r--r-- 1 root root 1769 2009-03-04 23:30 105.conf 30 -rw-r--r-- 1 root root 1576 2008-10-18 15:39 ve-light.conf-sample 31 -rw-r--r-- 1 root root 1541 2008-10-18 15:39 ve-vps.basic.conf-sample 32 wks:/var/lib/vz# la root/ 33 total 8 34 drwxr-xr-x 6 root root 46 2009-03-06 12:57 . 35 drwxr-xr-x 7 root root 68 2008-10-23 21:37 .. 36 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 101 37 drwxr-xr-x 2 root root 6 2009-03-04 23:28 102 38 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 103 39 drwxr-xr-x 2 root root 6 2009-03-06 12:57 105 The VE we want to clone can be seen in lines 4 and 9 respectively — it need not necessarily be stopped but it sure is the more secure way if it is. There are paths and one file we need to take care of when cloning a VE. 
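The conf-file adjustments performed during the clone (hostname, name and IP rewritten with sed) can be exercised in isolation on a throw-away copy; the values are the ones from this walkthrough:

```shell
# Build a throw-away 105.conf with the fields the clone inherits from 102.
cat > /tmp/105.conf <<'EOF'
HOSTNAME="wks-ve2"
NAME="testing"
IP_ADDRESS="192.168.1.101"
EOF

# The same substitutions as applied to the real /etc/vz/conf/105.conf.
sed -i 's/wks-ve2/wks-ve5/' /tmp/105.conf
sed -i 's/testing/testing_ssh/' /tmp/105.conf
sed -i 's/192.168.1.101/192.168.1.104/' /tmp/105.conf

cat /tmp/105.conf
```

Afterwards the copy carries the new hostname wks-ve5, the name testing_ssh and the IP 192.168.1.104, i.e. exactly what the clone needs to differ in.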
First we create the VE file system root for our new VE in line 18. Then we create its configuration file in line 19. 40 wks:/var/lib/vz# time cp -a private/10{2,5} 41 42 real 0m1.822s 43 user 0m0.068s 44 sys 0m0.200s 45 wks:/var/lib/vz# du -sh private/10[25] 46 354M private/102 47 354M private/105 48 wks:/var/lib/vz# la private/ 49 total 16 50 drwxr-xr-x 6 root root 46 2009-03-06 13:27 . 51 drwxr-xr-x 7 root root 68 2008-10-23 21:37 .. 52 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 101 53 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 102 54 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 103 55 drwxr-xr-x 20 root root 4096 2009-03-06 12:32 105 While creating the new VE root in line 18 and creating its
configuration file in line 19 were lightweight operations, the real
heavy lifting takes place in line 40 as we actually clone the VE
56 wks:/etc/vz/conf# egrep ^NAME=\|^IP\|^HOST 102.conf 57 HOSTNAME="wks-ve2" 58 NAME="testing" 59 IP_ADDRESS="192.168.1.101" 60 wks:/etc/vz/conf# egrep ^NAME=\|^IP\|^HOST 105.conf 61 HOSTNAME="wks-ve2" 62 NAME="testing" 63 IP_ADDRESS="192.168.1.101" 64 wks:/etc/vz/conf# sed -i s/wks-ve2/wks-ve5/ 105.conf 65 wks:/etc/vz/conf# sed -i s/testing/testing_ssh/ 105.conf 66 wks:/etc/vz/conf# sed -i 's/192.168.1.101/192.168.1.104/' 105.conf 67 wks:/etc/vz/conf# egrep ^NAME=\|^IP\|^HOST 105.conf 68 HOSTNAME="wks-ve5" 69 NAME="testing_ssh" 70 IP_ADDRESS="192.168.1.104" The next thing we need to address is in the tiny differences between
the VEs What needs to be changed in 71 wks:/etc/vz/conf# vzlist -a 72 VEID NPROC STATUS IP_ADDR HOSTNAME 73 101 7 running 192.168.1.100 wks-ve1 74 102 - stopped 192.168.1.101 wks-ve2 75 103 7 running 192.168.1.102 wks-ve3 76 105 - stopped 192.168.1.104 wks-ve5 77 wks:/etc/vz/conf# cd ../names 78 wks:/etc/vz/names# ln -s /etc/vz/conf/105.conf testing_ssh 79 wks:/etc/vz/names# la 80 total 0 81 drwxr-xr-x 2 root root 94 2009-03-06 13:42 . 82 drwxr-xr-x 6 root root 66 2008-11-17 12:52 .. 83 lrwxrwxrwx 1 root root 21 2009-03-04 13:56 stable -> /etc/vz/conf/101.conf 84 lrwxrwxrwx 1 root root 21 2009-03-04 23:29 testing -> /etc/vz/conf/102.conf 85 lrwxrwxrwx 1 root root 21 2009-03-06 13:42 testing_ssh -> /etc/vz/conf/105.conf 86 lrwxrwxrwx 1 root root 21 2009-03-04 23:33 unstable -> /etc/vz/conf/103.conf We have now cloned a VE as can be seen from lines 74 and 76 — both
are stopped right now. Last thing that needs to be done is to actually
create the alias name (see files) for VE 105, which is done in line 78.
87 wks:/var/lib/vz# vzctl start testing 88 Starting VE... 89 VE is mounted 90 Adding IP address(es): 192.168.1.101 91 Setting CPU units: 1000 92 Configure meminfo: 65536 93 Set hostname: wks-ve2 94 File resolv.conf was modified 95 VE start in progress... 96 wks:/var/lib/vz# vzctl start testing_ssh 97 Starting VE... 98 VE is mounted 99 Adding IP address(es): 192.168.1.104 100 Setting CPU units: 1000 101 Configure meminfo: 65536 102 Set hostname: wks-ve5 103 File resolv.conf was modified 104 VE start in progress... 105 wks:/etc/vz/names# vzlist -an 106 VEID NPROC STATUS IP_ADDR NAME 107 101 7 running 192.168.1.100 stable 108 102 7 running 192.168.1.101 testing 109 103 7 running 192.168.1.102 unstable 110 105 7 running 192.168.1.104 testing_ssh 111 wks:/etc/vz/names# vzctl enter testing_ssh 112 entered into VE 105 113 wks-ve5:/# pin 114 wks-ve5:/# ping -c2 debian.org 115 PING debian.org (194.109.137.218) 56(84) bytes of data. 116 64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=1 ttl=49 time=26.1 ms 117 64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=2 ttl=49 time=25.8 ms 118 119 --- debian.org ping statistics --- 120 2 packets transmitted, 2 received, 0% packet loss, time 1005ms 121 rtt min/avg/max/mdev = 25.893/26.035/26.178/0.215 ms 122 wks-ve5:/# cat /etc/debian_version 123 squeeze/sid 124 wks-ve5:/# After starting both VEs, the result can then be seen in lines 108 and
110 respectively. In line 111 we enter the freshly cloned VE.
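The name aliases used above are nothing more than symlinks under /etc/vz/names pointing at the numeric config files under /etc/vz/conf. The following Python sketch mimics that mechanism in a throwaway directory; the paths and the VE ID 105 are illustrative stand-ins, not the real /etc/vz tree:

```python
import os
import tempfile

# Throwaway stand-in for the real /etc/vz directory tree.
root = tempfile.mkdtemp()
conf_dir = os.path.join(root, "conf")
names_dir = os.path.join(root, "names")
os.makedirs(conf_dir)
os.makedirs(names_dir)

def add_name_alias(veid, name):
    """Mimic `ln -s /etc/vz/conf/<veid>.conf <name>` inside names/."""
    target = os.path.join(conf_dir, "%d.conf" % veid)
    open(target, "a").close()  # ensure the per-VE config file exists
    os.symlink(target, os.path.join(names_dir, name))

add_name_alias(105, "testing_ssh")

# Resolving the alias yields the numeric config file, which is how a
# name like "testing_ssh" maps back to VE 105.
print(os.readlink(os.path.join(names_dir, "testing_ssh")))
```

Because the mapping is just a symlink, renaming a VE amounts to replacing one link and never touches the numeric config itself.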
Physical to VE
- http://wiki.openvz.org/Physical_to_container

Checkpointing and Live MigrationA live migration and checkpointing feature was released for OpenVZ in April 2006. It allows migrating a VE from one physical server to another while it stays online i.e. without the need to shut down/restart the VE and therefore without any downtime. The process is known as checkpointing i.e. a VE is frozen and its whole state is saved to a file on disk. This file can then be transferred to another machine, where the VE can be unfrozen (restored). The delay is only a few seconds; it is not downtime, just a delay. Since every piece of the VE's state, including open network connections etc., is saved, from the user's perspective it looks like a delay in response. For example, one database transaction takes a little longer than usual, then everything continues as normal and the user does not notice that his database is already running on the other HN (Hardware Node). That feature makes scenarios possible such as upgrading our server online i.e. without any need to reboot it. If our database needs more memory or CPU resources, we just buy a newer/better server, live migrate the VE to it, then increase its limits and be done. If we want to add more RAM to our server, we migrate all VEs to another one, shut it down, add memory, start it again and migrate all VEs back (although enterprise hardware allows for hot-plugging/swapping such components i.e. there may be no need to migrate VEs around at all). NotesWRITEME
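Conceptually, checkpointing boils down to "serialize the complete state to a file, move the file, deserialize on the other side". The following Python sketch illustrates only that idea, with an ordinary dictionary standing in for a VE's state; it is an analogy, not how the vzcpt/vzrst kernel modules actually work:

```python
import os
import pickle
import tempfile

# Toy stand-in for a VE's complete state (processes, memory pages,
# open sockets, ...). All values here are illustrative.
ve_state = {"veid": 105, "hostname": "wks-ve5", "processes": 7,
            "ip": "192.168.1.104"}

dump = os.path.join(tempfile.mkdtemp(), "ve105.dump")

# "Freeze": serialize the whole state to a file on disk.
with open(dump, "wb") as f:
    pickle.dump(ve_state, f)

# At this point the dump file could be copied to another hardware node.

# "Restore": read the state back and resume where we left off.
with open(dump, "rb") as f:
    restored = pickle.load(f)

print(restored == ve_state)
```

The few seconds of "delay" users perceive correspond to the freeze/dump and load/unfreeze steps around the file transfer.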
MiscellaneousThis section is used to drop anything OpenVZ related that does not warrant a section of its own. The subsections here need not have anything to do with each other, except that OpenVZ may be the only thing they have in common. FilesFor the most part, these files are either part of OpenVZ itself or closely related to it — for example, some of them are used to store setup and configuration information for OpenVZ. Personally, I find it very important to know what lives where in order to produce sane results with OpenVZ. Configuration
What I find especially funky here is that I use etckeeper, which means I keep track of any change that happens to the above files/directories. Doing so means I can look up any change which happened because I, for example, created a VE or changed some setting in for example

Information gatheringThe files below are used by OpenVZ (and thus the Linux kernel) to provide us with information about the current state of operations.
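Such information files are usually plain "key: value" text in the style of /proc, so they are trivial to consume from a script. A small sketch, using a made-up snippet rather than the contents of a real /proc/vz file:

```python
def parse_proc_file(text):
    """Parse simple 'key: value' lines as found in many /proc files."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info

# Illustrative input only; the field names do not correspond to any
# specific OpenVZ file.
sample = "VEID: 105\nNPROC: 7\nSTATUS: running"
print(parse_proc_file(sample)["STATUS"])  # running
```

The same helper works on any of the flat status files, which is handy for quick monitoring scripts on the hardware node.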
Log Data, Utilities Metadata, VE DataThe following are not files but directories, each containing files of a similar type. Debian's OpenVZ root directory is
Kernel ModulesWRITEME

sa@wks:/lib/modules/2.6.32-2-openvz-amd64/kernel/kernel$ ls -lR
.:
total 0
drwxr-xr-x 2 root root 36 2009-04-12 22:26 cpt
drwxr-xr-x 2 root root 52 2009-04-12 22:26 ve

./cpt:
total 372
-rw-r--r-- 1 root root 175176 2009-03-27 08:21 vzcpt.ko
-rw-r--r-- 1 root root 201514 2009-03-27 08:21 vzrst.ko

./ve:
total 92
-rw-r--r-- 1 root root 11781 2009-03-27 08:21 vzdev.ko
-rw-r--r-- 1 root root 61273 2009-03-27 08:21 vzmon.ko
-rw-r--r-- 1 root root 17854 2009-03-27 08:21 vzwdog.ko
sa@wks:/lib/modules/2.6.32-2-openvz-amd64/kernel/kernel$

Tool SetAfter installing and setting up OpenVZ, both it and the software running within its VEs need to be managed to ensure faultless operation. This section will point out how to use OpenVZ's userspace toolset in order to manage all kinds of things related to OpenVZ. Basically, managing the whole OpenVZ shebang can be done with CLI tools. However, some folks, sometimes, for some reason, prefer using GUIs... GUIs / CLI ToolsI am not so fond of GUIs but rather prefer the CLI, therefore I am not making GUIs a high-priority item on this page. However, there are a few URLs (Uniform Resource Locator):
The following is all about CLI (Command Line Interface) tools used to install, set up and manage OpenVZ components. vzctlvzctl is a CLI userspace utility which is used from within The

sa@wks:~$ afl vzctl | grep bin/
vzctl: /usr/sbin/arpsend
vzctl: /usr/sbin/ndsend
vzctl: /usr/sbin/vzcalc
vzctl: /usr/sbin/vzcfgvalidate
vzctl: /usr/sbin/vzcpucheck
vzctl: /usr/sbin/vzctl
vzctl: /usr/sbin/vzlist
vzctl: /usr/sbin/vzmemcheck
vzctl: /usr/sbin/vzmigrate
vzctl: /usr/sbin/vznetaddbr
vzctl: /usr/sbin/vznetcfg
vzctl: /usr/sbin/vzpid
vzctl: /usr/sbin/vzsplit

each one serving a particular purpose:
Notes on vzctlThese notes are mostly taken literally from
vzquotavzquota controls disk quotas for OpenVZ VEs. These are per-VE disk quotas set from the OpenVZ host system i.e. The

sa@wks:~$ afl vzquota | grep bin/
vzquota: /usr/sbin/vzdqcheck
vzquota: /usr/sbin/vzdqdump
vzquota: /usr/sbin/vzdqload
vzquota: /usr/sbin/vzquota
sa@wks:~$

each one serving a particular purpose:
Notes on vzquota
vzpkgOpenVZ OS template management utility i.e. it may be used for semi-automatic OS template creation. More information can be found here. vzpkg2 and pkg-cacherThe successor to vzpkg, which works in conjunction with pkg-cacher, an apt-cache-like tool, redesigned to ease and improve the OS template creation process for OpenVZ. vztmplOpenVZ Templates Third Party Utilitiesvzdump
vzprocpsOS TemplatesAn OS template is basically a set of packages from some Linux distribution (e.g. Debian) used to populate an OpenVZ container. With OpenVZ, different distributions can co-exist on the same OpenVZ host system; therefore multiple OS templates are available with any OpenVZ installation when installing the Debian way. An OS template consists of the system programs, libraries, and scripts needed to boot up and run the system i.e. VE, as well as some very basic applications and utilities. Things like, for example, a compiler, an SQL server or a CMS (e.g. Django CMS) are usually not included with an OS template, simply because they are not considered part of the minimal set of software needed to create a fully functional VE.
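Since an OS template is in essence a gzipped tarball of a root filesystem, populating a new container amounts to unpacking it into the VE's private area. A sketch of that idea follows; the tiny tarball built here is a stand-in for a real template such as debian-5.0-amd64-minimal.tar.gz, and all paths are illustrative:

```python
import os
import tarfile
import tempfile

work = tempfile.mkdtemp()

# Build a tiny stand-in template: a root filesystem with one file.
src = os.path.join(work, "rootfs")
os.makedirs(os.path.join(src, "etc"))
with open(os.path.join(src, "etc", "debian_version"), "w") as f:
    f.write("5.0\n")

template = os.path.join(work, "mini-template.tar.gz")
with tarfile.open(template, "w:gz") as tar:
    tar.add(os.path.join(src, "etc"), arcname="etc")

# "Creating" the VE: unpack the template into its private area,
# analogous to what ends up under /var/lib/vz/private/<veid>.
private = os.path.join(work, "private", "105")
os.makedirs(private)
with tarfile.open(template, "r:gz") as tar:
    tar.extractall(private)

print(open(os.path.join(private, "etc", "debian_version")).read().strip())
```

This also explains why creating a VE takes only seconds: it is essentially a tar extraction plus writing a config file, with no installer run inside the container.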
OS Template MetadataOS template metadata is a set of a few files containing the following information:
Where to get precreated OS templatesThere are a few places where we can get precreated OS templates
Download

1 sa@wks:/var/lib/vz/template/cache$ la
2 total 0
3 drwxr-xr-x 2 root root 6 2008-10-18 15:39 .
4 drwxr-xr-x 3 root root 18 2008-10-23 21:37 ..
5 sa@wks:/var/lib/vz/template/cache$ su
6 Password:
7 wks:/var/lib/vz/template/cache# wget --quiet http://download.openvz.org/template/precreated/contrib/debian-5.0-amd64-minimal.tar.gz{,.asc}
8 wks:/var/lib/vz/template/cache# ls -la
9 total 60024
10 drwxr-xr-x 2 root root 86 2009-03-03 13:13 .
11 drwxr-xr-x 3 root root 18 2008-10-23 21:37 ..
12 -rw-r--r-- 1 root root 61459687 2009-01-13 08:44 debian-5.0-amd64-minimal.tar.gz
13 -rw-r--r-- 1 root root 197 2009-01-13 08:46 debian-5.0-amd64-minimal.tar.gz.asc

The

Verify for Integrity and AuthenticityI can only strongly recommend verifying the downloaded OS template for integrity and authenticity — if this utterly important check is skipped, then every security measure that comes afterwards is pointless. Any author/creator of OS templates should (I would say must but...) sign them.

14 wks:/var/lib/vz/template/cache# gpg --verify debian-5.0-amd64-minimal.tar.gz.asc debian-5.0-amd64-minimal.tar.gz
15 gpg: Signature made Tue 13 Jan 2009 08:46:04 AM CET using DSA key ID 6D9DACBE
16 gpg: Can't check signature: public key not found
17 wks:/var/lib/vz/template/cache# gpg --keyserver hkp://pool.sks-keyservers.net --search-keys systs.org
18 gpg: searching for "systs.org" from hkp server pool.sks-keyservers.net
19 (1) debian.systs.org Archiv Signing Key (2007) <[email protected]>
20 1024 bit DSA key 52A9498A, created: 2006-11-02
21 (2) Thorsten Schifferdecker <[email protected]>
22 1024 bit DSA key EB1522E1, created: 2006-11-02
23 (3) Thorsten Schifferdecker <[email protected]>
24 1024 bit DSA key D66C37F9, created: 2005-01-20
25 Keys 1-3 of 3 for "systs.org".
Enter number(s), N)ext, or Q)uit > Q
26 wks:/var/lib/vz/template/cache# gpg --keyserver hkp://pool.sks-keyservers.net --search-keys openvz
27 gpg: searching for "openvz" from hkp server pool.sks-keyservers.net
28 (1) Virtualization:OpenVZ OBS Project <Virtualization:[email protected]
29 1024 bit DSA key D673DA6C, created: 2008-07-21
30 (2) OpenVZ Project <[email protected]>
31 1024 bit DSA key A7A1D4B6, created: 2005-09-14
32 Keys 1-2 of 2 for "openvz". Enter number(s), N)ext, or Q)uit > Q
33 wks:/var/lib/vz/template/cache# wget -q http://debian.systs.org/debian/dso_archiv_signing_key.asc
34 wks:/var/lib/vz/template/cache# mv dso_archiv_signing_key.asc /tmp/
35 wks:/var/lib/vz/template/cache# gpg --import /tmp/dso_archiv_signing_key.asc
36 gpg: key 6D9DACBE: public key "debian.systs.org (dso) - Automatic Archive Signing Key (2008-2009) <[email protected]>" imported
37 gpg: Total number processed: 1
38 gpg: imported: 1
39 wks:/var/lib/vz/template/cache# gpg --list-key sys
40 pub 1024D/6D9DACBE 2008-03-01 [expires: 2009-03-01]
41 uid debian.systs.org Archiv Signing Key (2008) <[email protected]>
42
43 wks:/var/lib/vz/template/cache# gpg --verify debian-5.0-amd64-minimal.tar.gz.asc debian-5.0-amd64-minimal.tar.gz
44 gpg: Signature made Tue 13 Jan 2009 08:46:04 AM CET using DSA key ID 6D9DACBE
45 gpg: Good signature from "debian.systs.org Archiv Signing Key (2008) <[email protected]>"
46 gpg: WARNING: This key is not certified with a trusted signature!
47 gpg: There is no indication that the signature belongs to the owner.
48 Primary key fingerprint: 8EE1 945F 377B A6E5 8234 72FC C709 A411 6D9D ACBE
49 wks:/var/lib/vz/template/cache#

Now that we have downloaded the signature file (line 13) as well as the OS template (line 12), we try to verify the OS template in line 14. As we can see, it does not work — we need to get the public key the OS template creator used to sign the OS template. Line 17 shows how to acquire a public key using a key server — what we are looking for is a key with the ID shown in line 15. Well, obviously the OS template creator has not uploaded this particular key to a key server but instead provides it directly from his website (line 33). All we do then is import the key into our keyring in line 35 and finally try to verify again in line 43. That worked out just fine — the warning in line 46 should not necessarily concern us although, from my point of view, I would prefer a key which has been signed by others in order to elevate its trust level.

OS Templates CreationSemi-automatic with vzpkgSemi-automatic with vzpkg2

Determining the Page SizeWe can use the following program to determine the page size for our
architecture (if it supports the

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int page_size = getpagesize();
    printf("The page size is %d\n", page_size);
    exit(0);
}

Here's how to compile and run it (assuming we saved it as

# gcc pagesize.c -o pagesize
# ./pagesize
The page size is 4096

Whether

Those who prefer can get the page size using Python. All we need to do is start a Python console and write:

>>> import resource
>>> resource.getpagesize()
4096

Some typical Pagesizes
Use CasesUse cases for OpenVZ, and therefore for virtualization in general, follow from the understanding of why we need virtualization. A VE for each Debian ReleaseSetting up a VE for each Debian release for further cloning, in order to use them for testing software, staging software, packaging software, as a testbed, etc., is quite handy and takes just seconds to do.

wks:/home/sa# vzlist -a
VEID NPROC STATUS IP_ADDR HOSTNAME
101 8 running 192.168.1.100 wks-ve1
102 6 running 192.168.1.101 wks-ve2
103 7 running 192.168.1.102 wks-ve3
wks:/home/sa# vzlist -an
VEID NPROC STATUS IP_ADDR NAME
101 8 running 192.168.1.100 stable
102 6 running 192.168.1.101 testing
103 7 running 192.168.1.102 unstable
wks:/home/sa#

SecuritySince security with regards to OpenVZ can be handled much like security in any non-virtualized environment, I am going to point to a few places which discuss particular security matters with a focus on particular issues: