OpenVZ
Status: Introduction done. See 'Update' below.
Last changed: Saturday 2015-01-10 18:32 UTC
Abstract:

OpenVZ is an operating system-level virtualization technology (same as Linux-VServer) based on the Linux Kernel. OpenVZ allows a physical server to run multiple isolated operating system instances, known as containers, Virtual Private Servers (VPSs), or Virtual Environments (VEs). As compared to virtual machines such as VMware and paravirtualization technologies like Xen, OpenVZ is limited in that it requires both the host and guest OS to be Linux (although Linux distributions can be different in different VEs). However, OpenVZ claims a performance advantage -- according to its website, there is only a 1-3% performance penalty for OpenVZ as compared to a system running on a native Linux Kernel. I have been using three virtualization solutions -- Xen, Linux-VServer and OpenVZ. The latter is what I use exclusively now (2010) -- that might change again depending on my mood or requirements. This page collects all the possibly interesting things I have to say about OpenVZ and the things I do with it.
Table of Contents
Introduction
Why use OpenVZ?
Hardware Node
OpenVZ Host System
OpenVZ Container
Components
Installation
The Debian Way
Setup
General Hints
Setup Prerequisites
Setup
Storage
Networking
Resource Management
Best Practice
Monitoring
Backup a VE
Clone a VE
Physical to VE
Checkpointing and Live Migration
Misc
Miscellaneous
Files
Kernel Modules
Tool Set
OS Templates
Determining the Page Size
Use Cases
Security


This page is part of my virtualization context i.e. from my point of view talking/doing virtualization includes

  • the OS (Operating System) part e.g. OpenVZ, Xen, Linux-VServer, etc.
  • the storage part e.g. LVM (Logical Volume Manager), a world-class solution for doing storage virtualization.



Update: In January 2010 the decision was made to ditch OpenVZ and go with LXC (Linux Containers) because LXC is in mainline. The problem with OpenVZ was that it did not provide recent kernels and it was not clear what its future intentions were. This, we cannot have.
In particular, folks had been waiting for a new OpenVZ patch (>= 2.6.32) while only 2.6.27 was available. Then, in early 2010, because the current udev version does not work with old kernels, it was no longer possible to upgrade Debian (or Debian-based, e.g. Ubuntu) systems. It was also unclear whether there would be support for the upcoming stable release of Debian. We had to act...


Update #2: Just now (2010-03-08) a patch for 2.6.32 showed up on http://git.openvz.org. This is great news and has been awaited for a long time by many people. Thank you guys! We are now continuing to use OpenVZ but also, in parallel, continue with our LXC migrations — maybe we will use both in the future since under the hood and feature-wise they are almost the same already anyway. Who knows, maybe in the future OpenVZ will just be a thin layer atop LXC providing a richer set of features over entirely-in-mainline-already LXC. Best of both worlds, way to go...


Introduction

OpenVZ and Mainline
It is very unlikely that code within mainline will die because no one uses it anymore — this also implies that once code makes it into mainline, chances are high it becomes some sort of standard and is used by the masses. In the long run, code within mainline Linux will get better and better simply because more and more people use it, reveal bugs, fix them and generally improve the code. Being part of mainline Linux also leads to good, novice-appropriate documentation, industry support... in short, lots of things fostering a project — OpenVZ in this case.
It is a declared goal of the OpenVZ project to get as much OpenVZ code as possible into mainline Linux. As of now (August 2008), around 70% of OpenVZ is already part of mainline Linux. A detailed interview on the subject can be read here. The user-land tools are another story — they have nothing to do with the OpenVZ kernel patch.
Virtuozzo
Parallels Virtuozzo Containers is Parallels' virtualization and automation solution built on top of OpenVZ. It provides improvements and additional functionality in areas such as density, management tools and recovery.
Specific benefits of Parallels Virtuozzo Containers/VEs compared to OpenVZ can be found here. Personally, I rather think of Virtuozzo as something comforting for GUI people. For those comfortable with the CLI I recommend sticking with plain OpenVZ since it is 100% FLOSS (Free/Libre Open Source Software) and just as good as Virtuozzo, if not better once we take the vendor lock-in factor into account.

Why use OpenVZ?

This section is about why virtualization makes sense and in particular why I think OpenVZ makes more sense than other solutions out there.

Only one of many reasons...

Why use Virtualization at all?

First of all, let us start with a little journey into the problems that were ahead of those guys three or so decades ago, problems which finally led to what we are using on a daily basis nowadays...

Old Problems

Robert P. Goldberg describes the then state of things in his 1974 paper titled Survey of Virtual Machines Research. He says: Virtual machine systems were originally developed to correct some of the shortcomings of the typical third generation architectures and multi-programming operating systems - e.g., OS/360. As he points out, such systems had a dual-state hardware organization — a privileged and a non-privileged mode, something that is prevalent today as well.

In privileged mode all instructions are available to software, whereas in non-privileged mode they are not. The OS provides a small resident program called the privileged software nucleus (analogous to the kernel). User programs could execute the non-privileged hardware instructions or make supervisory calls - e.g. SVC's (analogous to system calls) to the privileged software nucleus in order to have privileged functions - e.g. I/O performed on their behalf. While this works fine for many purposes, there are fundamental shortcomings with this approach. Let us consider a few:

  • Only one bare machine interface is exposed. Therefore, only one kernel can be run. Anything else, whether it be another kernel (belonging to the same or a different operating system), or an arbitrary program that needs to talk to the bare machine (such as a low-level testing, debugging, or diagnostic program), cannot be run alongside the booted kernel even if the hardware had enough resources left.
  • One cannot perform any activity that would disrupt the running system (for example, upgrade, migration, system debugging, etc.) One also cannot run untrusted applications in a secure manner.
  • One cannot easily provide the illusion of a hardware configuration that one does not have (multiple processors, arbitrary memory and storage configurations, etc.) to some software.
A Loose Definition

We shall shortly enumerate several more reasons for needing virtualization, before which let us clarify what we mean by the term. Let us define virtualization in as all-encompassing a manner as possible for the purpose of this discussion...


Virtualization is a framework or methodology of dividing the resources of a computer into multiple execution environments, by applying one or more concepts or technologies such as hardware and software partitioning, time-sharing, partial or complete machine simulation, emulation, quality of service, and many others.

We shall note that this definition is rather loose, and includes concepts such as quality of service, which, even though it is a separate field of study, is often used alongside virtualization. Often, such technologies come together in intricate ways to form interesting systems, one of whose properties is virtualization. In other words, the concept of virtualization is related to, or more appropriately synergistic with, various paradigms. Let us consider the multi-programming paradigm — applications on Unix-like systems (actually almost all modern systems) run within a virtual machine model of some kind.

Since this section is an informal, non-pedantic overview of virtualization and how it is used, it is more appropriate not to strictly categorize the systems that we discuss.

Even though we defined it as such, the term virtualization is not always used to imply partitioning i.e. breaking something down into multiple entities. Here is an example of its different (intuitively opposite) connotation: We can take n disks, and make them appear as one (logical) disk through a virtualization layer (at this point LVM (Logical Volume Manager) comes to mind).

Grid computing enables the virtualization (ad hoc provisioning, on-demand deployment, decentralized, etc.) of distributed computing: IT resources such as storage, bandwidth, CPU cycles...

PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of Unix and/or Windows computers hooked together by a network to be used as a single large parallel computer. PVM is widely used in distributed computing.


Colloquially speaking, virtualization abstracts out things.

Why Virtualization, a List of Reasons

Following are some (possibly overlapping) representative reasons for and benefits of virtualization:

  • Virtual machines can be used to consolidate the workloads of several under-utilized servers to fewer machines, perhaps a single machine (server consolidation). Related benefits (perceived or real, but often cited by vendors) are savings on hardware, environmental costs, management, and administration of the server infrastructure.
  • The need to run legacy applications is served well by virtual machines. A legacy application might simply not run on newer hardware and/or operating systems. Even if it does, it may under-utilize the server, so as above, it makes sense to consolidate several applications. This may be difficult without virtualization as such applications are usually not written to co-exist within a single execution environment (consider applications with hard-coded System V IPC keys, as a trivial example).
  • Virtual machines can be used to provide secure, isolated sandboxes for running untrusted applications. We could even create such an execution environment dynamically — on the fly — as we download something from the Internet and run it. One can think of creative schemes, such as those involving address obfuscation. Virtualization is an important concept in building secure computing platforms.
  • Virtual machines can be used to create operating systems, or execution environments with resource limits, and given the right schedulers, resource guarantees. Partitioning usually goes hand-in-hand with quality of service in the creation of QoS-enabled operating systems.
  • Virtual machines can provide the illusion of hardware, or hardware configuration that we do not have (such as SCSI devices, multiple processors...) Virtualization can also be used to simulate networks of independent computers.
  • Virtual machines can be used to run multiple operating systems simultaneously i.e. different versions, or even entirely different systems, which can be on hot standby. Some such systems may be hard or impossible to run on newer real hardware.
  • Virtual machines allow for powerful debugging and performance monitoring. We can put such tools in the virtual machine monitor, for example. Operating systems can be debugged without losing productivity, or setting up more complicated debugging scenarios.
  • Virtual machines can isolate what they run, so they provide fault and error containment. We can inject faults proactively into software to study its subsequent behavior.
  • Virtual machines make software easier to migrate, thus aiding application and system mobility.
  • We can treat application suites as appliances by packaging and running each in a virtual machine.
  • Virtual machines are great tools for research and academic experiments. Since they provide isolation, they are safer to work with. They encapsulate the entire state of a running system e.g. we can save the state, examine it, modify it, reload it, and so on. The state also provides an abstraction of the workload being run.
  • Virtualization can enable existing operating systems to run on shared memory multiprocessors.
  • Virtual machines can be used to create arbitrary test scenarios, and can lead to some very imaginative, effective quality assurance.
  • Virtualization can be used to retrofit new features in existing operating systems without too much work.
  • Virtualization can make tasks such as system migration, backup, and recovery easier and more manageable.
  • Virtualization can be an effective means of providing binary compatibility.
  • Virtualization on commodity hardware has been popular in co-located hosting. Many of the above benefits make such hosting secure, cost-effective, and appealing in general.
  • Virtualization is fun.
  • Plenty of other reasons...

Why OpenVZ in particular?

  • OpenVZ is FLOSS (Free/Libre Open Source Software) which cannot be said of many other solutions e.g. VMware.
  • OpenVZ is damn fast and small (very small memory footprint); its overhead compared to running a native Linux kernel is just around 1-3%. Xen has around 7-11% and stuff like for example Qemu strikes at around 24-31% performance overhead. However, this is somewhat like comparing apples and bananas simply because the aforementioned do not all belong to a single group i.e. OpenVZ is OS-level virtualization whereas Xen is paravirtualization and Qemu is emulation.
  • A huge part of OpenVZ is part of mainline already.
  • Debian provides OpenVZ images (no more wasting time in compiling kernels myself).
  • Compared to other operating system level virtualization solutions out there, OpenVZ seems to be the best choice in terms of long-term support and continuous development and efforts to merge its code into mainline
  • Fast and steadily growing userbase
  • Good documentation and IRC as well as ML (Mailing List) support from experts and developers.
  • I have no need to run any OS other than Linux, therefore OpenVZ is a good choice; note, an OpenVZ host system can only run Linux flavors (e.g. Gentoo, Suse, RedHat, CentOS, etc.) within its containers i.e. it is not possible to run Windows, some Apple stuff or for example FreeBSD within a VE (Virtual Environment), also known as container.
  • etc.

Features that come with OpenVZ

There are those features common to all container-based virtualization approaches, also known as OS-level virtualization, i.e. Linux-VServer, OpenVZ and so on. Then there are a few features distinct to OpenVZ, not found with other virtualization solutions:

  • go here, here and here for more details

Hardware Node

HN (Hardware Node) (otherwise known as OpenVZ host system) is a term used in OpenVZ documentation. Basically it denotes the physical box on which the OpenVZ-enabled Linux kernel is installed. For example, one of my servers is a hardware node located in some datacenter. This HN has an OpenVZ-enabled Linux kernel installed

sa@ri7:~$ uname -r
2.6.32-1-openvz-amd64
sa@ri7:~$

and thus becomes an OpenVZ host system, also known as VE0 (Virtual Environment 0), and therefore a fully fledged platform to launch dozens of virtualized Linux OSes, also known as OpenVZ containers. The image below shows the layers which, together, compose a functional OpenVZ setup.

OpenVZ Host System

With OpenVZ, we have multiple containers/VEs as well as the host system itself, which is otherwise known as VE0 — VE0 denotes the abstraction layer sitting atop the HN (Hardware Node) and below one or more OpenVZ VEs.

From VE0, we can use vzctl and other tools to manage containers. Also, from VE0, all the VEs' processes, files, etc. are accessible, plus we can manage the underlying HN, install a new kernel and things like that.

OpenVZ Container

A VE (Virtual Environment), VPS (Virtual Private Server) or CT (Container), etc. is one of the main concepts of OpenVZ. A VE is an isolated entity which performs and executes exactly like a stand-alone Linux system.

VEs can be rebooted independently and have root access, users/groups, IP address(es), memory, processes, files, applications, system libraries and configuration files. OpenVZ allows for multiple VEs (up to several hundred) on a single HN. If we want to manage a VE, we must use its identifier, known as VEID (Virtual Environment IDentifier). A VEID is not always the same as a quota ID (more on that later).

Components

This section lists all components needed for a functioning OpenVZ system.

Kernel

The core component with any OpenVZ environment is its kernel — to be more precise: OpenVZ is an operating system-level virtualization technology based on the Linux kernel. Therefore the OpenVZ kernel actually is a Linux kernel, modified to add support for OpenVZ containers. The modified kernel provides virtualization, isolation, resource management, and checkpointing.

Management Utilities

OpenVZ needs some user-level tools installed. Those are:

  • vzctl, a utility to control OpenVZ containers (create, destroy, start, stop, set parameters etc.) and
  • vzquota, a utility to manage quotas for containers, like for example disk quotas. Although vzpkg2 and pkg-cacher are not management utilities in a strict sense — they are rather tools to create and manage OS templates — mentioning them here seems appropriate.

OS Templates

Please go here for detailed information. Commonly speaking, nowadays, we use vzpkg2 in conjunction with pkg-cacher to create OS templates. However, aside from that there are a few other possibilities to create OS templates or to set up VEs rather quickly:

Binary Images
Some folks use binary images which they created with the particular goal to have a prototype for cloning available.
LVM Snapshots
I am very fond of using LVM snapshots i.e. snapshots of LVs (Logical Volumes) — the usage however is the same: I create particular prototypes, e.g. a mail system which I set up once within a VE, configure as I think fit and then take a snapshot of.
This snapshot is not going to be a production LV but a prototype to clone from and create as many mail systems as needed. Only a little configuration is left to do for any particular mail system VE to adapt it to individual settings like for example domain name, MX record, IP address, etc. — this can be automated as well, e.g. with an interactive shell script.
Working with VEs and LVM snapshots to clone from dramatically speeds up deployment of new systems and decreases the chance of bugs, simply because once a bug is fixed it does not get reintroduced, as so often happens when setting things up manually over and over again (see the sketch after this list).
BTRFS (B-Tree File System)
A new file system, supporting snapshots and having the notion of replication built in. So, compared to ext3 or ext4 etc., this one offers means of deploying VEs that go far beyond any other major file system Linux offers as of now (November 2008).
SCM (Software Configuration Management) with Puppet
More information can be found here.
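
For the LVM snapshot based prototypes mentioned above, the workflow might look roughly like the following — a minimal sketch only, assuming a volume group vg0 and a prototype LV called mailproto that holds the private area of an already configured mail system VE (both names are made up), plus a new VE with VEID 110:

wks:/home/sa# lvcreate --snapshot --size 2G --name mailproto_snap /dev/vg0/mailproto  # freeze the prototype
wks:/home/sa# lvcreate --size 10G --name ve110 vg0                                    # LV for the new VE
wks:/home/sa# mkfs.ext3 /dev/vg0/ve110
wks:/home/sa# mkdir -p /mnt/snap /var/lib/vz/private/110
wks:/home/sa# mount -o ro /dev/vg0/mailproto_snap /mnt/snap
wks:/home/sa# mount /dev/vg0/ve110 /var/lib/vz/private/110
wks:/home/sa# cp -a /mnt/snap/. /var/lib/vz/private/110/                              # populate the new VE from the snapshot
wks:/home/sa# umount /mnt/snap && lvremove -f /dev/vg0/mailproto_snap                 # done with the snapshot

What remains is copying a suitable configuration file to /etc/vz/conf/110.conf and adjusting hostname, IP address and the like, just as shown in the setup section further down.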

At this point, the reader should have an idea about what OpenVZ is and how it works. Before moving on to the more practical part, I strongly recommend reading through the FAQs (Frequently Asked Questions) to avoid misunderstandings and tedious work with subsequent tasks.

Installation

This section details how to acquire all the OpenVZ components and install them onto some bare metal box from scratch. After installing everything needed for a functioning OpenVZ environment, we need to set up (configure) various parts of the OpenVZ components in order to make the whole shebang ready for operations. Once configured, an OpenVZ environment needs to be managed, which is covered in a dedicated section further down.


I am not going to talk about non-mainline procedures like for example tinkering around with ready-made binary images, using some sort of SCM (Software Configuration Management) system, FAI (Fully Automatic Installation) or even better, Puppet. Instead I am focusing on the standard procedure i.e.

  1. Installing a standard Debian system with the Debian installer (e.g via Netinstall images) and then
  2. Using APT (Advanced Packaging Tool) to install the OpenVZ components over the Internet.

In short, the prerequisite for installing OpenVZ is an installed Debian system with Internet connectivity.

The Debian Way

Starting with Linux kernel version 2.6.26, Debian provides official OpenVZ kernels (acsn, acsh, etc. are just aliases in my ~/.bashrc)

sa@wks:~$ acsn linux-image-openvz
linux-image-openvz-amd64 - Linux image on AMD64
sa@wks:~$ acsh linux-image-openvz-amd64
Package: linux-image-openvz-amd64
Priority: optional
Section: kernel
Installed-Size: 8
Maintainer: Debian Kernel Team <[email protected]>
Architecture: amd64
Source: linux-latest-2.6 (27)
Version: 2.6.32+27
Provides: linux-latest-modules-2.6.32-5-openvz-amd64
Depends: linux-image-2.6.32-5-openvz-amd64
Filename: pool/main/l/linux-latest-2.6/linux-image-openvz-amd64_2.6.32+27_amd64.deb
Size: 3048
MD5sum: 70376bd1d10280013a848e5106c1a147
SHA1: 470c8e023b2ab4ef6522e17656e85426f73cf775
SHA256: 968024de8bc4a4212dffda6d2ba4665703fa8af2c9fb2fa76b65b9b45af156d1
Description: Linux for 64-bit PCs (meta-package), OpenVZ support
 This package depends on the latest Linux kernel and modules for use on PCs
 with AMD64 or Intel 64 processors.
 .
 This kernel includes support for OpenVZ container-based virtualization.
 .
 This kernel also runs on a Xen hypervisor.  It supports only unprivileged
 (domU) operation.

sa@wks:~$

Excellent! No longer do we need to acquire the OpenVZ Linux kernel patch as well as the vanilla sources, patch them and finally rebuild the Linux kernel in order to get an OpenVZ-enabled Linux kernel. So, all we need to do now is issue aptitude install linux-image-openvz-amd64 which not just installs the kernel but also the userspace tools because the kernel package lists them as dependencies:

sa@wks:~$ apt-cache --recurse depends linux-image-openvz-amd64 | grep vz | grep -B1 Depends
linux-image-openvz-amd64
  Depends: linux-image-2.6.32-5-openvz-amd64
linux-image-2.6.32-5-openvz-amd64
  Depends: vzctl
vzctl
  Depends: vzquota
sa@wks:~$

As can be seen below, I have already installed all the OpenVZ components — issuing date is just to indicate when this already worked with Debian... if I recall correctly, official OpenVZ kernels have been available since around the end of July 2008.

sa@wks:~$ type dpl
dpl is aliased to `dpkg -l'
sa@wks:~$ dpl {linux-image*,vz*} | egrep open\|vz
ii  linux-image-2.6.32-1-openvz-amd64          2.6.32-1                      Linux 2.6.32 image on AMD64, OpenVZ support
ii  linux-image-openvz-amd64                   2.6.32+27                     Linux image on AMD64
ii  vzctl                                      3.0.23-16                     server virtualization solution - control too
ii  vzquota                                    3.0.12-3                      server virtualization solution - quota tools
sa@wks:~$ date
Tue May 25 10:48:10 CEST 2010
sa@wks:~$ uname -a
Linux sub 2.6.32-1-openvz-amd64 #1 SMP Wed May 20 13:06:07 UTC 2010 x86_64 GNU/Linux
sa@wks:~$ su
Password:
sub:/home/sa# /etc/init.d/vz status
OpenVZ is running...
sub:/home/sa#

Setup

After we have installed all the OpenVZ components, we need to set them up and configure the whole shebang.

General Hints

This subsection provides a few guidelines which I generally consider best practice for OpenVZ deployment:

  • The OpenVZ host system has a minimum of software installed. Although any Linux distribution could be used, I use Debian because of its excellent support for all OpenVZ components. Additional applications are installed as needed during deployment.
  • The OpenVZ host system should be as secure as possible. On the other hand, I want to keep it simple and easy to setup/maintain. So I chose a compromise: I rely only on what can be easily deployed with Debian and do not go for outside-of-the-package-management stuff i.e. I only pick what can be installed/removed via APT (Advanced Packaging Tool).
  • Each service is deployed in a separate VE i.e. an FTP server providing the FTP service does not share a VE with something else like for example an MTA (Mail Transfer Agent). I actually tend to not even put a whole mail system into one VE but split it further into subparts i.e. one VE for the MTA, another VE for the IMAP (Internet Message Access Protocol) server and so forth.
  • An IDS (Intrusion Detection System) for the OpenVZ host system as well as the VEs is deployed on the host system using e.g. OSSEC.
  • Firewalling (iptables) is done on the real server for the most part; the VEs themselves only run their services. However, in some cases — where necessary and/or it makes sense — firewalling may also be done within some VEs in addition to the firewalling already performed on the OpenVZ host system.
  • I use SSH (Secure Shell) to access and maintain the OpenVZ host system with its VEs.

Setup Prerequisites

Ok, now that we have installed everything needed, we can start setting things up e.g. creating VEs, also known as OpenVZ containers. However, before we start firing up one VE after the other, we need to know a few things about how OpenVZ works internally, best practices etc. First of all, there is something called OS (Operating System) templates.

OS Templates

Please go here to see how they are acquired and verified.

Figuring some Nameservers to use

When configuring VEs further down, we need to provide each VE with nameserver entries. Note that setting nameservers within a VE's /etc/resolv.conf is a non-persistent thing i.e. when a VE gets rebooted, those settings vanish and get replaced by settings made in /etc/vz/conf/<CTID>.conf, which is exactly where we should put that kind of information.

Those who take a look at resolv.h or man 5 resolv.conf will notice that, as of now (June 2009), three nameserver entries are supported. Those are tried in order i.e. the resolver tries a nameserver, and if the query times out, it tries the next one until it runs out of nameservers; then it repeats trying all the nameservers until a maximum number of retries is reached.

So, we want three nameservers, possibly independent ones. I came up with the below in order to get rid of this repetitive task once and for all...

sa@wks:~$ dig {yahoo,google,microsoft}.com NS | grep ^ns
ns1.yahoo.com.          170667  IN      A       66.218.71.63
ns4.yahoo.com.          171427  IN      A       68.142.196.63
ns5.yahoo.com.          699     IN      A       119.160.247.124
ns6.yahoo.com.          171642  IN      A       202.43.223.170
ns8.yahoo.com.          169475  IN      A       202.165.104.22
ns1.google.com.         345117  IN      A       216.239.32.10
ns2.google.com.         345372  IN      A       216.239.34.10
ns3.google.com.         345372  IN      A       216.239.36.10
ns4.google.com.         340223  IN      A       216.239.38.10
ns1.msft.net.           2342    IN      A       207.68.160.190
ns2.msft.net.           154     IN      A       65.54.240.126
ns3.msft.net.           822     IN      A       213.199.161.77
ns4.msft.net.           2342    IN      A       207.46.66.126
ns5.msft.net.           787     IN      A       65.55.238.126
sa@wks:~$

More on how to use dig can be found here. With a little more magic we can ease VE creation

sa@wks:~$ alias | grep nas
alias nas='dig {yahoo,google,microsoft}.com NS | grep ^ns1 | cut -f6 | xargs -I {} echo -n " --nameserver {}"'
sa@wks:~$ nas
 --nameserver 68.180.131.16 --nameserver 216.239.32.10 --nameserver 207.68.160.190sa@wks:~$
sa@wks:~$

This output can then be directly used to set nameservers for some VE

sub:/home/sa# vzctl set sub_ve0 --nameserver 68.180.131.16 --nameserver 216.239.32.10 --nameserver 207.68.160.190 --save
Saved parameters for VE 101
sub:/home/sa#

sysctl

Please go here for common information with regards to sysctl. In conjunction with OpenVZ, the sysctl Linux kernel interface is important for us since we need to set a few kernel parameters needed to run an OpenVZ environment.

  1  sub:/home/sa# cat /etc/sysctl.conf
  2  ###_ main
  3  ###_. misc
  4  #kernel.domainname = example.com
  5  # Uncomment the following to stop low-level messages on console
  6  #kernel.printk = 4 4 1 7
  7  ###_. functions previously found in netbase
  8  # Uncomment the next two lines to enable Spoof protection
  9  # (reverse-path filter) Turn on Source Address Verification in all
 10  # interfaces to prevent some spoofing attacks
 11  #net.ipv4.conf.default.rp_filter=1
 12  #net.ipv4.conf.all.rp_filter=1
 13  # Uncomment the next line to enable TCP/IP SYN cookies
 14  #net.ipv4.tcp_syncookies=1
 15  # Uncomment the next line to enable packet forwarding for IPv4
 16  #net.ipv4.ip_forward=1
 17  # Uncomment the next line to enable packet forwarding for IPv6
 18  #net.ipv6.conf.all.forwarding=1
 19  ###_. security
 20  # Additional settings - these settings can improve the network
 21  # security of the host and prevent against some network attacks
 22  # including spoofing attacks and man in the middle attacks through
 23  # redirection. Some network environments, however, require that these
 24  # settings are disabled so review and enable them as needed.
 25  ###_ , ICMP broadcasts
 26  # Ignore ICMP broadcasts
 27  #net.ipv4.icmp_echo_ignore_broadcasts = 1
 28  ###_ , ignore ICMP errors
 29  # Ignore bogus ICMP errors
 30  #net.ipv4.icmp_ignore_bogus_error_responses = 1
 31  ###_ , ICMP redirects
 32  # Do not accept ICMP redirects (prevent MITM attacks)
 33  #net.ipv4.conf.all.accept_redirects = 0
 34  #net.ipv6.conf.all.accept_redirects = 0
 35  # _or_
 36  # Accept ICMP redirects only for gateways listed in our default
 37  # gateway list (enabled by default)
 38  # net.ipv4.conf.all.secure_redirects = 1
 39  ###_ , send ICMP redirects
 40  # Do not send ICMP redirects (we are not a router)
 41  #net.ipv4.conf.all.send_redirects = 0
 42  ###_ , accept IP source route packets
 43  # Do not accept IP source route packets (we are not a router)
 44  #net.ipv4.conf.all.accept_source_route = 0
 45  #net.ipv6.conf.all.accept_source_route = 0
 46  ###_ , log Martian Packets
 47  #net.ipv4.conf.all.log_martians = 1
 48  ###_ , /proc/<pid>/maps
 49  # The contents of /proc/<pid>/maps and smaps files are only visible to
 50  # readers that are allowed to ptrace() the process
 51  # sys.kernel.maps_protect = 1
 52  ###_. openvz
 53  ###_ , packet forwarding for IPv4/6
 54  net.ipv4.ip_forward=1
 55  #net.ipv6.conf.all.forwarding=1
 56  ###_ , magic-sysrq key
 57  kernel.sysrq = 1
 58  ###_ , ICMP broadcasts
 59  net.ipv4.icmp_echo_ignore_broadcasts=1
 60  ###_ , ICMP redirects
 61  # Do not send ICMP redirects (we are not a router)
 62  net.ipv4.conf.all.send_redirects = 0
 63  ###_ , spoof protection
 64  # Enable Spoof protection (reverse-path filter) Turn on Source Address
 65  # Verification in all interfaces to prevent some spoofing attacks
 66  net.ipv4.conf.default.rp_filter=1
 67  net.ipv4.conf.all.rp_filter=1
 68  ###_ , proxy ARP
 69  # Disabling proxy ARP per default
 70  net.ipv4.conf.default.proxy_arp=0
 71  # Enabling for eth0
 72  net.ipv4.conf.eth0.proxy_arp=1
 73  ###_ , interfaces redirects
 74  # We do not want all our interfaces to send redirects
 75  net.ipv4.conf.default.forwarding=1
 76  net.ipv4.conf.default.send_redirects = 1
 77  ###_ emacs local variables
 78  # Local Variables:
 79  # mode: conf
 80  # allout-layout: (0 : 0)
 81  # End:
 82  sub:/home/sa# cat /etc/sysctl.conf | grep -v \#
 83  net.ipv4.ip_forward=1
 84  kernel.sysrq = 1
 85  net.ipv4.icmp_echo_ignore_broadcasts=1
 86  net.ipv4.conf.all.send_redirects = 0
 87  net.ipv4.conf.default.rp_filter=1
 88  net.ipv4.conf.all.rp_filter=1
 89  net.ipv4.conf.default.proxy_arp=0
 90  net.ipv4.conf.eth0.proxy_arp = 1
 91  net.ipv4.conf.default.forwarding=1
 92  net.ipv4.conf.default.send_redirects = 1
 93  wks:/home/sa# sysctl -p
 94  net.ipv4.ip_forward = 1
 95  kernel.sysrq = 1
 96  net.ipv4.icmp_echo_ignore_broadcasts = 1
 97  net.ipv4.conf.all.send_redirects = 0
 98  net.ipv4.conf.default.rp_filter = 1
 99  net.ipv4.conf.all.rp_filter = 1
100  net.ipv4.conf.default.proxy_arp = 0
101  net.ipv4.conf.eth0.proxy_arp = 1
102  net.ipv4.conf.default.forwarding = 1
103  net.ipv4.conf.default.send_redirects = 1
104  sub:/home/sa#

Line 93 is important. We do not want to reboot the HN so we reload the settings from /etc/sysctl.conf online i.e. no downtime for the whole system. Lines 83 to 92 show the essential settings I recommend for running an OpenVZ system — one might deviate from that based on personal preferences and needs. However, certain settings are simply necessary, like for example line 54 or line 72.

Setup

At this point we have two possibilities to choose from:

  1. The first use case is the one where we have a non-moving HN with a bunch of static IP addresses, like for example a server within a datacenter. OpenVZ is pretty straightforward to set up for this use case. Every VE might get its own public, static IP address and we are done. Of course, there is no need to assign a static IP address to every VE — we do not even need to assign a static IP address to VE0, also known as the OpenVZ host system. There might for example be VEs that just contain a DBMS (Database Management System) which serves another VE on the same HN i.e. the VE containing the DBMS would not have a public IP address assigned.
  2. The second use case is where we set up an OpenVZ environment on some moving HN, like for example my subnotebook. This requires us to assign IP addresses to VEs via DHCP (Dynamic Host Configuration Protocol); see the sketch right below this list. Note that this has nothing to do with setting up a dhcpd within some VE.
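
The venet device used throughout this page has no MAC address, so a VE using it cannot obtain a DHCP lease itself; for the DHCP use case the usual route is a veth device bridged to the HN's physical NIC, so the VE talks to the LAN's DHCP server directly. A minimal sketch only — the bridge name vmbr0 is an assumption and the bridge itself (bridge-utils, bridge_ports eth0 in /etc/network/interfaces) is presumed to be in place already:

wks:/home/sa# vzctl set 101 --netif_add eth0 --save     # give the VE a veth pair instead of venet
wks:/home/sa# vzctl start 101
wks:/home/sa# brctl addif vmbr0 veth101.0               # host side of the pair joins the bridge
wks:/home/sa# vzctl exec 101 dhclient eth0              # lease an address from inside the VE

Note that the brctl step has to be repeated (or scripted) whenever the VE is restarted; inside the VE, eth0 can be configured as a regular DHCP interface in its /etc/network/interfaces.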

    Time to set up containers aka VEs...

VE with static IPv4 address

This subsubsection covers how to set up a VE which uses a static IPv4 address.

Basic Setup
 1  wks:/home/sa# vzctl create 101 --config vps.basic --ostemplate debian-5.0-amd64-minimal
 2  Creating VE private area (debian-5.0-amd64-minimal)
 3  Performing postcreate actions
 4  VE private area was created
 5  wks:/home/sa# vzctl set 101 --hostname wks-ve1 --name stable --save
 6  Name stable assigned
 7  Saved parameters for VE 101
 8  wks:/home/sa# vzctl set 101  --ipadd 192.168.1.100 --save
 9  Saved parameters for VE 101
10  wks:/home/sa# vzctl set 101  --nameserver 68.180.131.16 --nameserver 216.239.32.10 --nameserver 207.68.160.190 --save
11  Saved parameters for VE 101
12  wks:/home/sa# vzctl set stable --userpasswd sa:xxxxxxxxxxxxxxx
13  Starting VE...
14  VE is mounted
15  VE start in progress...
16  Stopping VE...
17  VE was stopped
18  VE is unmounted
19  wks:/home/sa# vzctl set stable --onboot no --save
20  Saved parameters for VE 101

Setting up a VE running Debian is what we do in lines 1 to 20. In line 1 we create the VE and thereby specify what configuration to use for its initial setup — I opted for /etc/vz/conf/ve-vps.basic.conf-sample as can be seen.

The VEID we use is 101 which is the first VEID not reserved for internal use by OpenVZ (see here for more details). We also specify the OS template to use, which is the one we downloaded and verified earlier.

Starting with line 5, we can now add the needed configuration to our VE. In order to do so we first need to specify the VE in question, which is 101. Then we set the hostname and an alias name (see files). In line 8 we give the VE with VEID 101, respectively the name stable, its static IPv4 address. The VE also needs nameservers, which we set in line 10.

Line 12 sets a password for a particular user within our VE — in case the user does not exist, it is created. Note that in line 12 we used stable instead of the VEID 101 for the first time.

The attentive reader might have noticed that the --save option is absent from this line — this is because the username and password are not saved in configuration files but applied directly to the VE by modifying its /etc/passwd and /etc/shadow files. In case the VE root is not mounted, it is automatically mounted, all appropriate file changes are applied, and then it is unmounted again, as we can see from lines 13 to 18.

Because as of now (March 2009) the default setting when using vps.basic is to start the VE when the HN boots, I issued line 19 in order to prevent that from happening — I do not need/want it for that particular VE called stable, respectively 101. In case a VE were used to host a website or something like that, we would of course want the VE to boot when the HN boots.

Basic Commands/Usage
21  wks:/home/sa# vzctl start stable
22  Starting VE...
23  VE is mounted
24  Adding IP address(es): 192.168.1.100
25  Setting CPU units: 1000
26  Configure meminfo: 65536
27  Set hostname: wks-ve1
28  File resolv.conf was modified
29  VE start in progress...
30  wks:/home/sa# vzlist -a
31        VEID      NPROC STATUS  IP_ADDR         HOSTNAME
32         101          7 running 192.168.1.100   wks-ve1
33  wks:/home/sa# vzlist -an
34        VEID      NPROC STATUS  IP_ADDR         NAME
35         101          7 running 192.168.1.100   stable

In line 21 it is time to start our VE. Once it has finished, we list all currently running VEs in lines 31 to 35, once showing their hostname as we set it in line 5 and once their alias name, also as we set it in line 5. If we compare the columns VEID and NAME, we can see that they match i.e. stable is an alias for VEID 101.


36  wks:/home/sa# vzctl enter stable
37  entered into VE 101
38  root@wks-ve1:~# date -u
39  Wed Mar  4 19:52:26 UTC 2010
40  root@wks-ve1:/# uname -a
41  Linux wks-ve1 2.6.32-1-openvz-amd64 #1 SMP Sat May 10 18:52:53 UTC 2010 x86_64 GNU/Linux
42  root@wks-ve1:/# whoami
43  root
44  root@wks-ve1:/# pwd
45  /
46  root@wks-ve1:~# passwd
47  Enter new UNIX password:
48  Retype new UNIX password:
49  passwd: password updated successfully
50  root@wks-ve1:~# cat /etc/debian_version
51  5.0

Next we enter the VE in line 36 — note how the prompt changes from wks:/home/sa# to root@wks-ve1:~#. The remainder until line 51 is basically just trying a few basic things and looking around. However, as of now the user root had no password set, which is why we issued line 46 and then set a password for root.


52  root@wks-ve1:~# cat /etc/resolv.conf
53  nameserver 68.180.131.16
54  nameserver 216.239.32.10
55  nameserver 207.68.160.190
56  root@wks-ve1:~# cat /etc/apt/sources.list
57  deb      http://ftp2.de.debian.org/debian lenny main contrib non-free
58  deb      http://ftp2.de.debian.org/debian-security lenny/updates main contrib non-free
59  root@wks-ve1:~# sed -i s/lenny/stable/ /etc/apt/sources.list
60  root@wks-ve1:~# sed -i s/ftp2/ftp/ /etc/apt/sources.list
61  root@wks-ve1:~# sed -i s/http/ftp/ /etc/apt/sources.list
62  root@wks-ve1:~# cat /etc/apt/sources.list
63  deb      ftp://ftp.de.debian.org/debian stable main contrib non-free
64  deb      ftp://ftp.de.debian.org/debian-security stable/updates main contrib non-free
65  root@wks-ve1:~# ping -c2 debian.org
66  PING debian.org (194.109.137.218) 56(84) bytes of data.
67  64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=1 ttl=49 time=26.7 ms
68  64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=2 ttl=49 time=26.5 ms
69
70  --- debian.org ping statistics ---
71  2 packets transmitted, 2 received, 0% packet loss, time 1006ms
72  rtt min/avg/max/mdev = 26.502/26.603/26.705/0.192 ms
73  root@wks-ve1:~# ping -c2 google.com
74  PING google.com (209.85.171.100) 56(84) bytes of data.
75  64 bytes from cg-in-f100.google.com (209.85.171.100): icmp_seq=1 ttl=233 time=186 ms
76  64 bytes from cg-in-f100.google.com (209.85.171.100): icmp_seq=2 ttl=233 time=181 ms
77
78  --- google.com ping statistics ---
79  2 packets transmitted, 2 received, 0% packet loss, time 1006ms
80  rtt min/avg/max/mdev = 181.314/183.724/186.135/2.448 ms

Lines 53 to 55 show the nameservers which we set in line 10. Note that in certain cases one might need to not set the nameservers as we did in line 10 but instead provide the gateway's IPv4 address, e.g. 192.168.1.1 in my case, which is my DSL (Digital Subscriber Line) modem.

The current nameserver setup as it can be seen in lines 53 to 55 works perfectly fine for some server within a datacenter where our server would have unlimited/unfiltered connectivity to the Internet and therefore any nameserver we would like to poll for DNS (Domain Name System) resolution.

In lines 56 to 64 I am just tailoring our VE's /etc/apt/sources.list file to my liking — the lines 57 and 58 are just fine though.

Lines 65 and 73 are to test whether we really have connectivity from inside our VE 101, also known as stable, to the outside world. We do, as can be seen — our settings from lines 8 and 10 were successful in setting up the basic networking.


 81  root@wks-ve1:~# aptitude update && aptitude full-upgrade
 82  Get:1 ftp://ftp.de.debian.org stable Release.gpg [386B]
 83  Get:2 ftp://ftp.de.debian.org stable/updates Release.gpg [189B]
 84
 85
 86  [skipping a lot of lines...]
 87
 88
 89  The following packages will be upgraded:
 90    apt apt-utils base-passwd debian-archive-keyring dhcp3-client dhcp3-common dpkg gcc-4.2-base iptables libattr1 libcwidget3 libgnutls26 libreadline5 man-db procps readline-common rsyslog
 91  The following packages are RECOMMENDED but will NOT be installed:
 92    psmisc
 93  17 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
 94  Need to get 7952kB of archives. After unpacking 156kB will be freed.
 95  Do you want to continue? [Y/n/?] y
 96  Writing extended state information... Done
 97  Get:1 ftp://ftp.de.debian.org stable/main dpkg 1.14.25 [2399kB]
 98  Get:2 ftp://ftp.de.debian.org stable/main base-passwd 3.5.20 [41.7kB]
 99
100
101  [skipping a lot of lines...]
102
103
104  Setting up procps (1:3.2.7-11)...
105  Installing new version of config file /etc/sysctl.conf...
106  Installing new version of config file /etc/init.d/procps...
107  Setting kernel variables (/etc/sysctl.conf)...done.
108  Reading package lists... Done
109  Building dependency tree
110  Reading state information... Done
111  Reading extended state information
112  Initializing package states... Done
113
114  Current status: 0 updates [-17].
115  root@wks-ve1:/# exit
116  logout
117  exited from VE 101
118  wks:/home/sa#

Nothing special in lines 81 to 118. All we do is issue line 81 in order to update our stable Debian release with up-to-date packages and security updates. Last thing to say: we are in business ladies and gentlemen, ready to rock... From now on this VE can be treated just like a physical machine running Debian stable — no one would notice a difference without explicitly knowing that this is actually an OpenVZ VE (Virtual Environment).

VE with IPv4 address assigned via DHCP

WRITEME

Nice to Have

This subsubsection covers a few optional nice-to-have settings in addition to the basic setup of a VE. What I consider in this regard is the following.


A VE should only have a minimal set of software installed i.e. we start out with an OS template like for example debian-5.0-amd64-minimal. We then only add those packages needed by the VE to carry out its particular job — a VE used for instant messaging would for example get ejabberd installed in addition to the minimal set of packages provided by the OS template. In addition to the minimal set plus the purpose-specific packages I mostly install a few additional packages (see the install sketch after the following list):

  • Install the bash-completion package. Enable bash completion in interactive shells.
  • Install locate respectively mlocate.
  • Install tree.
  • Install htop.
  • Install locales. Issue dpkg-reconfigure locales, pick en_US.UTF-8 and do not forget to go on OK and only then hit return i.e. not right after marking en_US.UTF-8.
  • Install and setup apt-list{changes,bugs}.
  • Install apt-file and run apt-file update.
  • Configure /etc/hosts, /etc/network/interfaces and /etc/resolv.conf (note: for VEs this is done in their config file e.g. /etc/vz/conf/3003.conf).
  • Install and setup etckeeper. Put /etc/etckeeper/pre-install.d/50uncommitted-changes and /etc/etckeeper/post-install.d/99git-gc in place.
  • Install and setup a basic SSH (Secure Shell) setup with password authentication. Setup the banner message.
  • For firewalling, put /root/netfilter_ports_to_accept in place and edit as needed. netfilter_ports_to_accept is a file used to specify which ports (for the VE) should be opened on the firewall running on the HN. The script used for packet filtering is packet_filter. It runs on the HN and provides firewalling for both, the VEs and the HN.
  • for my user sa, put my well beloved ~/.bashrc into place (at least parts of it i.e. /home/sa/.bashrc_downsized_version_for_sa_to_deploy_inside_ves)
  • set up a colorized bash prompt
  • custom tailor the bash history settings
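
Most of the list above boils down to one aptitude run inside the freshly created VE — package names as found in Debian at the time of writing, so adjust as needed:

root@wks-ve1:~# aptitude install bash-completion mlocate tree htop locales apt-listchanges apt-listbugs apt-file etckeeper
root@wks-ve1:~# apt-file update
root@wks-ve1:~# dpkg-reconfigure locales

The remaining items (SSH setup, firewall port list, ~/.bashrc, prompt and history tweaks) are configuration rather than package installation and are therefore not covered by this one-liner.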

In case the machine is an HN, we do some additional stuff:

  • Install chrony on the HN (not within a VE)
  • Install and setup Unison.
  • Install vnstat and run vnstat -u -i eth0 to initialize a new database.
  • Install ulogd so netfilter can do extended logging. Also, install lynx, moreutils (which contains ifdata, used to extract mac address for example) and fping which are needed by packet_filter.
  • Prepare /etc/network/interfaces to automatically start the firewall on boot.
Security relevant Goodies
  • further configure the SSH (Secure Shell) setup
    • use TCP wrappers
    • make use of PKA (Public Key Authentication)
  • install pwgen
  • install and configure sxid in order to keep an eye on the suid and sgid bit

Storage

  • http://wiki.openvz.org/Disk_quota
  • works on ext{2,3,4}

Networking

The OpenVZ network virtualization layer is designed to isolate VEs from each other and from the physical network:

  • Each VE has its own IP address; multiple IP addresses per VE are allowed (see the sketch after this list).
  • Network traffic of a VE is isolated from the other VEs. In other words, VEs are protected from each other in a way that makes traffic snooping impossible.
  • Firewalling may be used inside a VE (the user can create rules limiting access to some services using the canonical iptables tool inside a VE). In other words, it is possible to set up firewall rules from inside a VE.
  • Routing table manipulations and advanced routing features are supported for individual VEs. For example, setting different maximum transmission units (MTUs) for different destinations, specifying different source addresses for different destinations, and so on.
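
As noted in the first item of the list above, a VE may hold more than one IP address; adding and removing addresses is done from VE0 on the fly (the address below is just an example):

wks:/home/sa# vzctl set stable --ipadd 192.168.1.150 --save    # add a second address
wks:/home/sa# vzctl set stable --ipdel 192.168.1.150 --save    # and remove it again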

veth versus venet

  • http://wiki.openvz.org/Differences_between_venet_and_veth
  • http://wiki.openvz.org/Virtual_network_device
  • http://wiki.openvz.org/Veth

Resource Management

OpenVZ resource management controls the amount of resources available for VEs. The controlled resources include such parameters as CPU power, diskspace, a set of memory-related parameters, etc. Resource management allows OpenVZ to:

  • Effectively share available host system resources among VEs
  • Guarantee Quality-of-Service (QoS)
  • Provide performance and resource isolation and protect from denial-of-service attacks
  • Collect usage information for system health monitoring

Resource management is much more important for OpenVZ than for a standalone computer since resource utilization in an OpenVZ-based system is considerably higher than that found on a typical system. As all the VEs are using the same kernel, resource management is of paramount importance: each VE has to stay within its boundaries and must not affect other VEs in any way — and ensuring this is exactly what resource management does. OpenVZ resource management consists of five main components:

  • Disk Space: two-level disk quota
  • Disk I/O: two-level disk I/O scheduler
  • CPU: fair CPU scheduler
  • Memory: RAM (Random Access Memory)
  • user beancounters

Please note that all those resources can be changed during VE run time, also known as on-line; there is no need to reboot and therefore no downtime. For example, if we want to give some VE less RAM (Random Access Memory), we just change the appropriate parameters on the fly. This is either very hard to do or not possible at all with other virtualization approaches such as VMware, Xen, etc.

Disk Space

The OpenVZ host system administrator can set up per-container disk quotas, in terms of disk blocks and inodes (roughly, the number of files). This is the first level of disk quota. In addition, a VE administrator can employ the usual quota tools inside a particular VE to set standard UNIX per-user and per-group disk quotas.

If one wants to give a VE more diskspace, one might just increase its disk quota limit. No need to resize disk partitions etc. However, I recommend using LVM (Logical Volume Manager) to set up LVs (Logical Volumes) for VEs i.e. every VE gets its own LV (Logical Volume).
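
A minimal sketch of the first-level quota — barrier and limit for disk space are given in 1 KiB blocks here (roughly 1 GiB soft and 1.1 GiB hard), and the inode values are made up for illustration:

wks:/home/sa# vzctl set stable --diskspace 1000000:1100000 --save    # blocks: barrier:limit
wks:/home/sa# vzctl set stable --diskinodes 200000:220000 --save     # inodes: barrier:limit
wks:/home/sa# vzctl exec stable df -h /                              # the VE sees the new size immediately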

Disk I/O

Similar to the fair CPU scheduler described below, the I/O scheduler in OpenVZ is also two-level, utilizing the CFQ I/O scheduler on its second level.

Each VE is assigned an I/O priority, and the I/O scheduler distributes the available I/O bandwidth according to the priorities assigned. Thus no single container can saturate an I/O channel.
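
Setting the per-VE I/O priority is a one-liner; the value range is 0 (lowest) to 7 (highest), 4 being the default — a sketch, reusing the VE stable from above:

wks:/home/sa# vzctl set stable --ioprio 6 --save    # favour this VE when I/O bandwidth gets scarce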

  • http://wiki.openvz.org/IO_accounting
  • http://wiki.openvz.org/I/O_priorities_for_containers
  • http://bugzilla.openvz.org/show_bug.cgi?id=909

CPU

The CPU scheduler in OpenVZ is a two-level implementation that implements a fair-share scheduling strategy.

On the first level the scheduler decides which VE is given the CPU time slice, based on per-VE cpuunit values. On the second level the standard Linux scheduler decides which process to run within that particular VE, using standard Linux process priorities and such.

We can set up different values of cpuunits for different VEs so the CPU time will be given to those proportionally. Also there is a way to limit CPU time, e.g. say a VE is limited to 10% of CPU time available.
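
Both knobs are plain vzctl parameters; a minimal sketch, values made up for illustration:

wks:/home/sa# vzctl set stable --cpuunits 1000 --save    # relative share, weighed against the other VEs' cpuunits
wks:/home/sa# vzctl set stable --cpulimit 10 --save      # hard cap: at most 10% of CPU time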

Memory

WRITEME

  • http://maxgarrick.com/understanding-openvz-resource-limits/
    • memory is measured in pages, which are 4 kB on x86-64 (see the sketch below)
  • http://forum.openvz.org/index.php?t=msg&goto=32000&
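
Tying the page size note above to an actual parameter: UBC memory parameters such as privvmpages are counted in pages, so e.g. 256 MiB corresponds to 65536 pages of 4 kB. A minimal sketch, barrier/limit values made up for illustration:

wks:/home/sa# echo $(( 256 * 1024 / 4 ))    # 256 MiB expressed in 4 kB pages
65536
wks:/home/sa# vzctl set stable --privvmpages 65536:69632 --save    # barrier 256 MiB, limit 272 MiB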

User Beancounters

User beancounters is a set of per-VE counters, limits, and guarantees. There is a set of about 20 parameters which are carefully chosen to cover all aspects of VE operation, so no single container can abuse any resource which is limited for the whole node and thus do harm to other VEs.

Resources accounted and controlled are mainly memory and various in-kernel objects such as IPC shared memory segments, network buffers etc. Each resource can be seen in /proc/user_beancounters (see files) and has five values associated with it:

  • current usage
  • maximum usage (for the lifetime of a VE)
  • barrier
  • limit and
  • fail counter

The meaning of barrier and limit is parameter-dependent i.e. they can be thought of as a soft limit and a hard limit. If any resource hits the limit, its fail counter is increased, so the VE administrator can see if something bad is happening by analyzing the output of /proc/user_beancounters of this particular VE. More on that subject, also known as monitoring, can be found here.
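
A quick way to spot trouble is to look for non-zero fail counters; failcnt is the last column of /proc/user_beancounters, so a rough check from inside a VE (or via vzctl exec from VE0) could look like this — a sketch, the awk filter simply skips the two header lines and prints rows whose last field is non-zero:

root@wks-ve1:~# awk 'NR > 2 && $NF > 0' /proc/user_beancounters

No output means no resource has hit its limit since the VE was started.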



WRITEME (those few lines below are just notes for now; examples will follow)

  • http://wiki.openvz.org/UBC_parameters
  • http://wiki.openvz.org/UBC_secondary_parameters
  • http://wiki.openvz.org/Setting_UBC_parameters
  • http://wiki.openvz.org/UBC_failcnt_reset
  • http://wiki.openvz.org/BC_proc_entries
    • In new interface only /proc/user_beancounters entry is left for compatibility with old tools. New entries reside in /proc/bc directory.
  • http://wiki.openvz.org/Proc/user_beancounters

Example how to set a new barrier/limit:

Set beancounters without downtime: vzctl set 3007 --dcachesize 27279360:28999680 --save.

This example sets dcachesize for the VE with VEID 3007 to a barrier of 27279360 and a limit of 28999680. It also saves those settings to /etc/vz/conf/3007.conf because of --save; without it, the settings would only be altered at run time and, once the VE is restarted, it would fall back to the old settings from /etc/vz/conf/3007.conf.

Disable Resource Limits

  • http://git.openvz.org/?p=vzctl;a=commit;h=c8dc6e5a60137eafca882c39ffbeb0177e65a064
  • http://git.openvz.org/?p=vzctl;a=tree;f=etc/conf;h=bc4526cd8232fd2642524e5624e9a296a3e925f0;hb=c8dc6e5a60137eafca882c39ffbeb0177e65a064

Checkpointing and Live Migration

Go here for more information.
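
In a nutshell: checkpointing freezes a running VE and dumps its complete state to a file, which can later be restored on the same or on another HN; vzmigrate wraps this up into live migration (it needs passwordless SSH from the source to the destination HN). A minimal sketch, the destination hostname is made up:

wks:/home/sa# vzctl chkpnt 101 --dumpfile /var/tmp/dump.101     # freeze and dump the VE's state
wks:/home/sa# vzctl restore 101 --dumpfile /var/tmp/dump.101    # thaw it again
wks:/home/sa# vzmigrate --online hn2.example.com 101            # live-migrate the VE to another HN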

Best Practice

  • http://www.howtoforge.com/some-tips-on-openvz-deployment

Monitoring

  • http://wiki.openvz.org/Category:Monitoring
  • http://wiki.openvz.org/Monitoring_openvz_resources_using_yabeda
  • http://gforge.opensource-sw.net/projects/nagios_openvz (Robert Nelson's ML post after he released it as 0.9.0; actually he rewrote the existing plug-in... w00t!)
    • This Nagios plug-in monitors the user_beancounters and quotas of a remote OpenVZ system.

Backup a VE

  • http://weblogs.amtex.nl/index.php?blog=2&title=using_vzdump_snapshot_to_backup_without_downtime&more=1&c=1&tb=1&pb=1
  • http://wiki.openvz.org/Backup_of_a_running_container_with_vzdump
  • Perform consistent backups using LVM Snapshots
  • http://www.howtoforge.com/clone-back-up-restore-openvz-vms-with-vzdump
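
A minimal vzdump sketch along the lines of the links above — vzdump is a separate package and its option names may differ between versions, so double-check the man page:

wks:/home/sa# vzdump --suspend --dumpdir /var/backups 101     # brief suspend, consistent dump
wks:/home/sa# vzdump --snapshot --dumpdir /var/backups 101    # LVM snapshot based, (almost) no downtime

The resulting archive can be restored, also to a different VEID, with vzrestore.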

Clone a VE

Assume we do not want to start from scratch with a new OS template because we have already added some nice-to-have goodies, or maybe we just want to test some new software on some particular VE but not take the risk of suffering from potential failures that might arise from doing so...

The right approach to such problems would be branching — an SCM (Software Configuration Management) term — but then, we are talking OpenVZ and not Git right now. However, with OpenVZ we can do some sort of branching — it is called cloning a VE and is actually a trivial endeavor, as we will see below.


What we would like to accomplish is to clone a VE called testing which has already some nice to have stuff added to it.

What we are going to do is to clone testing and add an SSH (Secure Shell) setup, thereby creating a VE which is Debian testing, has a few nice to have things added, plus it provides a ready to use SSH setup. This VE will then be called testing_ssh. testing_ssh is then a perfect prototype VE (Virtual Environment) for further cloning/branching:

  • For example, we might use it as an VE that provides instant messaging simply by cloning from testing_ssh and installing and configuring ejabberd.
  • If we wanted to set up an FTP (File Transfer Protocol) server, we would also clone testing_ssh and then install vsftpd.
  • In case we needed a mail system, we would again clone testing_ssh and install postfix, dovecot and mailscanner.
  • etc.

My rationale here is simple: I have a few prototypes which I keep up to date and then use for cloning whenever I need a VE based on some prototype VE. The prototype VEs I have are

  • stable: Debian stable release with nice to have goodies; minimum set of packages, no special security setup in place, just PKA (Public Key Authentication) SSH setup.
  • testing: Debian testing release with nice to have goodies; minimum set of packages, no special security setup in place, just PKA SSH setup.
  • unstable: Debian unstable release with nice to have goodies; minimum set of packages, no special security setup in place, just PKA SSH setup.
  • stable_sec: same as stable but with Monkeysphere SSH setup. In addition to that, there is also a special security and monitoring setup in place.
  • testing_sec: same as testing but with Monkeysphere SSH setup. In addition to that, there is also a special security and monitoring setup in place.
 1  wks:/var/lib/vz# vzlist -a
 2        VEID      NPROC STATUS  IP_ADDR         HOSTNAME
 3         101          7 running 192.168.1.100   wks-ve1
 4         102          - stopped 192.168.1.101   wks-ve2
 5         103          7 running 192.168.1.102   wks-ve3
 6  wks:/var/lib/vz# vzlist -an
 7        VEID      NPROC STATUS  IP_ADDR         NAME
 8         101          7 running 192.168.1.100   stable
 9         102          - stopped 192.168.1.101   testing
10         103          7 running 192.168.1.102   unstable
11  wks:/var/lib/vz# la root/
12  total 8
13  drwxr-xr-x  5 root root   36 2009-03-04 23:32 .
14  drwxr-xr-x  7 root root   68 2008-10-23 21:37 ..
15  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 101
16  drwxr-xr-x  2 root root    6 2009-03-04 23:28 102
17  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 103
18  wks:/var/lib/vz# mkdir root/105
19  wks:/var/lib/vz# cp -a /etc/vz/conf/{102,105}.conf
20  wks:/var/lib/vz# la /etc/vz/conf
21  total 36
22  drwxr-xr-x 2 root root 4096 2009-03-06 13:04 .
23  drwxr-xr-x 6 root root   66 2008-11-17 12:52 ..
24  -rw-r--r-- 1 root root  246 2008-10-18 15:39 0.conf
25  -rw-r--r-- 1 root root 1768 2009-03-04 23:19 101.conf
26  -rw-r--r-- 1 root root 1862 2008-09-07 20:16 101.conf.destroyed
27  -rw-r--r-- 1 root root 1769 2009-03-04 23:30 102.conf
28  -rw-r--r-- 1 root root 1770 2009-03-04 23:34 103.conf
29  -rw-r--r-- 1 root root 1769 2009-03-04 23:30 105.conf
30  -rw-r--r-- 1 root root 1576 2008-10-18 15:39 ve-light.conf-sample
31  -rw-r--r-- 1 root root 1541 2008-10-18 15:39 ve-vps.basic.conf-sample
32  wks:/var/lib/vz# la root/
33  total 8
34  drwxr-xr-x  6 root root   46 2009-03-06 12:57 .
35  drwxr-xr-x  7 root root   68 2008-10-23 21:37 ..
36  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 101
37  drwxr-xr-x  2 root root    6 2009-03-04 23:28 102
38  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 103
39  drwxr-xr-x  2 root root    6 2009-03-06 12:57 105

The VE we want to clone can be seen in lines 4 and 9 respectively. It does not necessarily have to be stopped, but it is certainly safer if it is.

There are a few paths and one file we need to take care of when cloning a VE. First we create the file system root for our new VE in line 18, then we create its configuration file in line 19.


40  wks:/var/lib/vz# time cp -a private/10{2,5}
41
42  real    0m1.822s
43  user    0m0.068s
44  sys     0m0.200s
45  wks:/var/lib/vz# du -sh private/10[25]
46  354M    private/102
47  354M    private/105
48  wks:/var/lib/vz# la private/
49  total 16
50  drwxr-xr-x  6 root root   46 2009-03-06 13:27 .
51  drwxr-xr-x  7 root root   68 2008-10-23 21:37 ..
52  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 101
53  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 102
54  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 103
55  drwxr-xr-x 20 root root 4096 2009-03-06 12:32 105

While creating the new VE root in line 18 and its configuration file in line 19 were lightweight operations, the real heavy lifting takes place in line 40, where we actually clone the VE testing, i.e. all of its data. Since I do this on my workstation, which has a hardware RAID HBA (Host Bus Adapter) installed that flexes its muscle, cloning the VE is fast, as line 42 shows. At this point in time the cloned VE is 354 MiB in size, as lines 46 and 47 show.


56  wks:/etc/vz/conf# egrep ^NAME=\|^IP\|^HOST 102.conf
57  HOSTNAME="wks-ve2"
58  NAME="testing"
59  IP_ADDRESS="192.168.1.101"
60  wks:/etc/vz/conf# egrep ^NAME=\|^IP\|^HOST 105.conf
61  HOSTNAME="wks-ve2"
62  NAME="testing"
63  IP_ADDRESS="192.168.1.101"
64  wks:/etc/vz/conf# sed -i s/wks-ve2/wks-ve5/ 105.conf
65  wks:/etc/vz/conf# sed -i s/testing/testing_ssh/ 105.conf
66  wks:/etc/vz/conf# sed -i 's/192.168.1.101/192.168.1.104/' 105.conf
67  wks:/etc/vz/conf# egrep ^NAME=\|^IP\|^HOST 105.conf
68  HOSTNAME="wks-ve5"
69  NAME="testing_ssh"
70  IP_ADDRESS="192.168.1.104"

The next thing we need to address are the small differences between the VEs testing and testing_ssh. Those are stored in their configuration files, i.e. the one we duplicated in line 19 above.

What needs to be changed in testing_ssh's setup can be seen in lines 61 to 63. Changing this is trivial as lines 64 to 66 demonstrate. Once done, lines 68 to 70 show the correct settings for the VE testing_ssh as opposed to those of testing in lines 57 to 59.


71  wks:/etc/vz/conf# vzlist -a
72        VEID      NPROC STATUS  IP_ADDR         HOSTNAME
73         101          7 running 192.168.1.100   wks-ve1
74         102          - stopped 192.168.1.101   wks-ve2
75         103          7 running 192.168.1.102   wks-ve3
76         105          - stopped 192.168.1.104   wks-ve5
77  wks:/etc/vz/conf# cd ../names
78  wks:/etc/vz/names# ln -s /etc/vz/conf/105.conf testing_ssh
79  wks:/etc/vz/names# la
80  total 0
81  drwxr-xr-x 2 root root 94 2009-03-06 13:42 .
82  drwxr-xr-x 6 root root 66 2008-11-17 12:52 ..
83  lrwxrwxrwx 1 root root 21 2009-03-04 13:56 stable -> /etc/vz/conf/101.conf
84  lrwxrwxrwx 1 root root 21 2009-03-04 23:29 testing -> /etc/vz/conf/102.conf
85  lrwxrwxrwx 1 root root 21 2009-03-06 13:42 testing_ssh -> /etc/vz/conf/105.conf
86  lrwxrwxrwx 1 root root 21 2009-03-04 23:33 unstable -> /etc/vz/conf/103.conf

We have now cloned a VE, as can be seen from lines 74 and 76; both are stopped right now. The last thing that needs to be done is to create the alias name (see files) for VE 105, which is testing_ssh. Line 78 does just that.


 87  wks:/var/lib/vz# vzctl start testing
 88  Starting VE...
 89  VE is mounted
 90  Adding IP address(es): 192.168.1.101
 91  Setting CPU units: 1000
 92  Configure meminfo: 65536
 93  Set hostname: wks-ve2
 94  File resolv.conf was modified
 95  VE start in progress...
 96  wks:/var/lib/vz# vzctl start testing_ssh
 97  Starting VE...
 98  VE is mounted
 99  Adding IP address(es): 192.168.1.104
100  Setting CPU units: 1000
101  Configure meminfo: 65536
102  Set hostname: wks-ve5
103  File resolv.conf was modified
104  VE start in progress...
105  wks:/etc/vz/names# vzlist -an
106        VEID      NPROC STATUS  IP_ADDR         NAME
107         101          7 running 192.168.1.100   stable
108         102          7 running 192.168.1.101   testing
109         103          7 running 192.168.1.102   unstable
110         105          7 running 192.168.1.104   testing_ssh
111  wks:/etc/vz/names# vzctl enter testing_ssh
112  entered into VE 105
113  wks-ve5:/# pin
114  wks-ve5:/# ping -c2 debian.org
115  PING debian.org (194.109.137.218) 56(84) bytes of data.
116  64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=1 ttl=49 time=26.1 ms
117  64 bytes from klecker.debian.org (194.109.137.218): icmp_seq=2 ttl=49 time=25.8 ms
118
119  --- debian.org ping statistics ---
120  2 packets transmitted, 2 received, 0% packet loss, time 1005ms
121  rtt min/avg/max/mdev = 25.893/26.035/26.178/0.215 ms
122  wks-ve5:/# cat /etc/debian_version
123  squeeze/sid
124  wks-ve5:/#

After starting both VEs, the result can be seen in lines 108 and 110 respectively. In line 111 we enter the freshly cloned VE testing_ssh and take a look around. We have now successfully finished cloning a VE.

Physical to VE

  • http://wiki.openvz.org/Physical_to_container

Checkpointing and Live Migration

A live migration and checkpointing feature was released for OpenVZ in April 2006. It allows migrating a VE from one physical server to another while it keeps running, i.e. without the need to shut down/restart the VE and therefore without any downtime.


The process is known as checkpointing: a VE is frozen and its whole state is saved to a file on disk. This file can then be transferred to another machine and the VE can be unfrozen (restored) there. The delay is a few seconds; it is not downtime, just a delay.

Since every piece of the VE state, including open network connections etc., is saved, from the user's perspective it looks like a delay in response: for example, one database transaction might take a little longer than usual, but then everything continues as normal and the user does not notice that his database is already running on the other HN (Hardware Node).


That feature makes scenarios possible such as upgrading our server online, i.e. without any need to reboot it. If our database needs more memory or CPU resources, we just buy a newer/better server, live migrate the VE to it, increase its limits and are done. If we want to add more RAM to our server, we migrate all VEs to another one, shut it down, add memory, start it again and migrate all VEs back (enterprise hardware allows for hot-plugging/swapping such components, i.e. there is no need to migrate VEs around at all).
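
In practice vzmigrate (see the tool set section below) wraps the whole procedure, but the individual steps can also be done by hand. A rough sketch, assuming VE 105 and a destination node named hn2 that is reachable via SSH public key authentication; node name and paths are placeholders:

# on the source HN: freeze VE 105 and dump its state to a file
vzctl chkpnt 105 --dumpfile /var/lib/vz/dump/Dump.105

# copy the private area, the dump file and the configuration to the destination HN
rsync -aH --numeric-ids /var/lib/vz/private/105/ hn2:/var/lib/vz/private/105/
scp /var/lib/vz/dump/Dump.105 hn2:/var/lib/vz/dump/
scp /etc/vz/conf/105.conf hn2:/etc/vz/conf/

# on the destination HN: restore (unfreeze) the VE from the dump file
vzctl restore 105 --dumpfile /var/lib/vz/dump/Dump.105

# or simply let vzmigrate do all of the above in one go
vzmigrate --online hn2 105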


Notes

WRITEME

  • http://wiki.openvz.org/Checkpointing_and_live_migration
  • http://wiki.openvz.org/Checkpointing_internals
  • http://wiki.openvz.org/Migration_from_one_HN_to_another
  • http://www.howtoforge.com/how-to-do-live-migration-of-openvz-containers
  • simply using rsync sometimes works perfectly fine for us; see the rsync aliases in my ~/.bashrc. If there is a lot of data to synchronize/copy from one VE to another, just run rsync twice so that the second run synchronizes the changes that happened during the initial run. One great thing about rsync is that it can be resumed; a minimal sketch follows this list.
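
A minimal sketch of the two-pass rsync approach, assuming we want to move the private area of VE 102 to another hardware node called hn2 (node name and paths are placeholders):

# first pass while the VE is still running; gets the bulk of the data across
rsync -aH --numeric-ids /var/lib/vz/private/102/ hn2:/var/lib/vz/private/102/

# stop the VE, then do a second, much shorter pass to pick up recent changes
vzctl stop 102
rsync -aH --numeric-ids --delete /var/lib/vz/private/102/ hn2:/var/lib/vz/private/102/

# do not forget the VE's configuration file
scp /etc/vz/conf/102.conf hn2:/etc/vz/conf/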

Misc

  • http://wiki.openvz.org/Processes_scope_and_visibility
  • http://wiki.openvz.org/Modifying_initrd_image
  • What happens if VEs are rebooted from the inside: see /usr/share/vzctl/scripts/vpsreboot

Miscellaneous

This section is used to drop anything OpenVZ related which does not deserve a section of its own. The subsections here do not necessarily have anything to do with one another, except for the fact that OpenVZ may be the only thing they have in common.

Files

For the most part, those files are either part of OpenVZ itself or closely related to it; for example, some of them are used for storing setup and configuration information for OpenVZ.

Personally, I find it very important to know what lives where in order to produce sane results with OpenVZ.

Configuration

  • /etc/vz/dists/*: various OS templates data
    • /etc/vz/dists/scripts/*: scripts used by OS templates
  • /etc/vz/vz.conf: global OpenVZ configuration file.
  • /etc/vz/vps.conf: configuration file for a particular VE.
  • /etc/vz/cron/vz: cronjobs needed for OpenVZ
  • /etc/vz/names: alias names for VEs i.e. symlinks to VEIDs; see --name in man 8 vzctl
  • /etc/vz/conf: various configuration files/examples
    • /etc/vz/conf/0.conf: configuration file for VE0.
    • /etc/vz/conf/ve-light.conf-sample: example configuration for VEs; see man 5 vps.conf for more information
    • /etc/vz/conf/ve-vps.basic.conf-sample: example configuration for VEs
    • /etc/vz/conf/veid.conf: the configuration file for a particular VE, stored as /etc/vz/conf/$VEID.conf where VEID is the ID of the given VE. It is important to note that this file is created/modified by the vzctl command with the set and --save arguments; changes made to this file by any other means can therefore be lost during the next vzctl run (see the sketch after this list).
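
As a quick illustration of the last point, a minimal sketch (using VE 101 from the examples further down; the hostname is a placeholder):

# change the hostname at run time and persist it to /etc/vz/conf/101.conf
vzctl set 101 --hostname wks-ve1.example.org --save

# the change shows up in the VE's configuration file
grep ^HOSTNAME /etc/vz/conf/101.conf

# editing 101.conf by hand instead may be overwritten by the next
# vzctl set ... --save run, so vzctl should be the only writer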

What I find especially funky here is the fact that I use etckeeper, which means I keep track of any change that happens to the above files/directories. Doing so means I can look up any change which happened because, for example, I created a VE, changed some setting in ../vz.conf, destroyed a VE, etc.

Information gathering

The files below are used by OpenVZ (and thus the Linux Kernel) to provide us with information about the current state of operations.

  • /proc/vz/veinfo: the fields are as follows: VEID CLASSID Number of Processes IPADD
  • /proc/vz/vzquota
  • /proc/user_beancounters or, better, /proc/bc/<veid>/resources, since the latter is the new way of checking resources; see here for more information on beancounters
  • /proc/fairsched

Log Data, Utilities Metadata, VE Data

The following are not files but directories, each containing files of a similar type. Debian's OpenVZ root directory is /var/lib/vz in order to be FHS (Filesystem Hierarchy Standard) compliant. We can make a symlink (ln -s /var/lib/vz /vz) from /vz to /var/lib/vz to establish compatibility with OpenVZ as installed on other distributions.

  • /var/log/vzctl.log: log file for vzctl
  • /var/lib/vz/dump/$VEID: Used to store VE dumps made with vzctl chkpnt VEID. This command saves all the state of a running VE to the dump file and stops the VE. If the option --dumpfile is not set, vzctl uses a default path i.e. /vz/dump/Dump.VEID. Go here for more information.
  • /var/lib/vz/lock/$VEID:
  • /var/lib/vz/private/$VEID: this is where all the data for a particular VE lives on the file system e.g. when we upload a movie to a VE, then it can be found somewhere below /var/lib/vz/private/$VEID.
  • /var/lib/vz/root/$VEID: Path to the root directory for a particular VE (essentially a mount point for the VE root).
  • /var/lib/vz/template/$VEID: template parameters
  • /var/lib/vz/template/cache/: home of OS templates

Kernel Modules

WRITEME

sa@wks:/lib/modules/2.6.32-2-openvz-amd64/kernel/kernel$ ls -lR
.:
total 0
drwxr-xr-x 2 root root 36 2009-04-12 22:26 cpt
drwxr-xr-x 2 root root 52 2009-04-12 22:26 ve

./cpt:
total 372
-rw-r--r-- 1 root root 175176 2009-03-27 08:21 vzcpt.ko
-rw-r--r-- 1 root root 201514 2009-03-27 08:21 vzrst.ko

./ve:
total 92
-rw-r--r-- 1 root root 11781 2009-03-27 08:21 vzdev.ko
-rw-r--r-- 1 root root 61273 2009-03-27 08:21 vzmon.ko
-rw-r--r-- 1 root root 17854 2009-03-27 08:21 vzwdog.ko
sa@wks:/lib/modules/2.6.32-2-openvz-amd64/kernel/kernel$

Tool Set

After installing and setting up OpenVZ, both OpenVZ itself and the software running within VEs need to be managed to ensure faultless operation. This section will point out how to use OpenVZ's userspace tool set in order to manage all kinds of things somehow related to OpenVZ. Basically, managing the whole OpenVZ shebang can be done with CLI tools. However, some folks, sometimes, for some reason, prefer using GUIs...

GUIs / CLIs Tools

I am not so fond of GUIs but very much prefer the CLI, therefore I am not making GUIs a high-priority item on this page. However, here are a few URLs (Uniform Resource Locators):

  • http://en.wikipedia.org/wiki/EasyVZ
  • http://webvz.sourceforge.net/index.html
  • http://www.howtoforge.com/managing-openvz-with-webvz-on-debian-etch
  • http://www.howtoforge.com/install-webvz-2.0-on-debian-etch-to-administrate-openvz
  • http://www.howtoforge.com/managing-openvz-with-hypervm-on-centos-5.2

The following is all about CLI (Command Line Interface) tools used to install, set up and manage OpenVZ components.

vzctl

vzctl is a CLI userspace utility which is used from within VE0 to perform direct manipulations on VEs. VEs can be referred to by either numeric VEID or by name (see --name option). VEIDs <= 100 are reserved for OpenVZ internal purposes.


The vzctl package actually contains a bunch of distinct tools,

sa@wks:~$ afl vzctl | grep bin/
vzctl: /usr/sbin/arpsend
vzctl: /usr/sbin/ndsend
vzctl: /usr/sbin/vzcalc
vzctl: /usr/sbin/vzcfgvalidate
vzctl: /usr/sbin/vzcpucheck
vzctl: /usr/sbin/vzctl
vzctl: /usr/sbin/vzlist
vzctl: /usr/sbin/vzmemcheck
vzctl: /usr/sbin/vzmigrate
vzctl: /usr/sbin/vznetaddbr
vzctl: /usr/sbin/vznetcfg
vzctl: /usr/sbin/vzpid
vzctl: /usr/sbin/vzsplit

each one serving a particular purpose:

arpsend
Sends ARP (Address Resolution Protocol) packets on a user-specified interface in order to detect or update neighbours' ARP caches with a given IP. The interface we use has to be arpable and must not be a loopback interface, i.e. /sbin/ip link show <interface> should show neither the NOARP nor the LOOPBACK flag in the interface parameters.
ndsend
The IPv6 counterpart of arpsend: it sends neighbour discovery packets on a given interface in order to update the neighbours' caches.
vzcalc
This utility displays the share of the HN (Hardware Node) resources a particular Virtual Environment is using. If the VE is running, the current usage is displayed. High utilization values (>100%) mean the system is overloaded (or the VE has an invalid configuration).
vzcfgvalidate
This utility checks the validity of the resource control parameters in a VE config file. Configurations where the resources allowed for a Virtual Environment exceed the system capacity are invalid and dangerous from a stability point of view. There are three severity levels in the output: Error, Warning, Recommendation.
vzcpucheck
Outputs information on the CPU power and utilization.
vzctl
See above.
vzlist
This utility is used for listing VEs. By default only running VEs are listed. If one or more VEIDs (Virtual Environment IDentifiers) are specified, only those VEs are displayed. A VEID of -1 is used to output VEIDs only, one per line. --name_filter <pattern> displays only VEs whose hostname matches the pattern.
vzmemcheck
This utility shows node memory parameters: low memory utilization, low memory commitment, RAM utilization, mem+swap utilization, mem+swap commitment, allocmem utilization, allocmem commitment, allocmem limit.
With the -A option it shows absolute values in MiB. With the -v option the output is more verbose: in addition to the header line(s), the numerator and the denominator of the corresponding fractions are shown.
vzmigrate
This utility is used to migrate a VE from one (source) HN to another (destination) HN. The utility can migrate either stopped or running VEs. For a stopped VE, simple VE private area transfer is performed (rsync is used for file transfer). For running VEs, migration may be offline (default) or online.
This program uses SSH (Secure Shell) as its transport layer. We need to set up PKA (Public Key Authentication), i.e. put our SSH public key onto the destination node, so that we can connect to the destination HN without entering an SSH password (see the sketch after this list).
vznetaddbr
Add Veth (Virtual eTHernet) interfaces in a VE to a bridge on the HN.
vznetcfg
vzpid
Displays the VEID a process with the given PID (Process Identifier) belongs to.
vzsplit
The vzsplit utility is used to split the HN into equal parts. It generates a full set of system resource control parameters for the given number of VEs.
The values are calculated from the total physical memory of the Hardware Node the utility runs on, and the number of Virtual Environments the Hardware Node shall be able to run even if the given number of Virtual Environments consume all the resources available.
Without any option given, vzsplit prompts for the desired number of Virtual Environments and outputs the resulting resource control parameters to stdout. If there are not enough system resources to run the specified number of VEs, an appropriate message is shown and the sample configuration file is not generated.
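
To give an impression of how a few of these utilities are used in practice, here is a short sketch; the destination node hn2 and the PID are placeholders, and the migration assumes SSH public key authentication is already in place:

# sanity-check a VE configuration before (re)starting the VE
vzcfgvalidate /etc/vz/conf/105.conf

# to which VE does a given process on the HN belong?
vzpid 4321

# how much CPU power is committed/left on this HN?
vzcpucheck

# migrate VE 105 to another hardware node without stopping it
vzmigrate --online hn2 105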

Notes on vzctl

Those notes are mostly literally taken from man 8 vzctl and selected by myself on a good-to-know basis.

  • create veid [--ostemplate name] [--config name] [--private path] [--root path] [--ipadd addr] [--hostname name] Creates a new VE area. This operation should be done once, before the first start of the VE (a sketch combining several of these options follows this list).
  • If the --config option is specified, values from example configuration file /etc/vz/conf/ve-name.conf-sample are put into the VE configuration file. If this VE configuration file already exists, it will be removed.
  • You can use the --root path option to set the path to the mount point for the VE root directory (default is VE_ROOT specified in the vz(5) file). The argument can contain the string $VEID, which will be substituted with the numeric VEID.
  • You can use the --private path option to set the path to the directory in which all the files and directories specific to this very VE are stored (default is VE_PRIVATE specified in the vz(5) file). The argument can contain the string $VEID, which will be substituted with the numeric VEID.
  • You can use --ipadd addr option to assign an IP address to a VE. Note that this option can be used multiple times.
  • You can use --hostname name option to set a hostname for a VE.
  • --name name Add a name for a VE. The name can later be used in subsequent calls to vzctl in place of veid.
  • Note that VEID <= 100 are reserved for OpenVZ internal purposes.
  • --userpasswd user:password Sets the password for the given user in a VE, creating the user if it does not exist. Note that this option is not saved in the configuration file at all (so the --save flag is useless here); it is applied to the VE by modifying its /etc/passwd and /etc/shadow files. In case the VE root is not mounted, it is automatically mounted, then all appropriate file changes are applied, then it is unmounted. Note that the VE area should be created before using this option.
  • --setmode restart|ignore Whether to restart a VE after applying any parameters requiring that the VE be restarted for those to take effect.
  • --ipadd addr Adds IP address to a given VE. Note that this option is incremental, so addr are added to already existing ones.
  • --hostname name Sets VE hostname. vzctl writes it to the appropriate file inside a VE (distribution-dependent).
  • --netif_add ifname[,mac,host_ifname,host_mac] Adds a virtual ethernet device (veth) to a given VE. Here ifname is the ethernet device name in the VE, mac is its MAC address, host_ifname is the ethernet device name on the host, and host_mac is its MAC address. MAC addresses should be in the format like XX:XX:XX:XX:XX:XX. All parameters except ifname are optional and are automatically generated if not specified.
  • In case of one argument, vzctl sets barrier and limit to the same value. In case of two colon-separated arguments, the first is a barrier, and the second is a limit. Arguments are in items, pages or bytes. Note that page size is architecture-specific, it is 4096 bytes on IA32 platform. You can also specify the literal word unlimited in place of a number. In that case the corresponding value will be set to LONG_MAX, i.e. the maximum possible value.
  • --privvmpages pages[:pages] Allows controlling the amount of memory allocated by the applications. For shared (mapped as MAP_SHARED) pages, each VE really using a memory page is charged for a fraction of the page (depending on the number of others using it). For potentially private pages (mapped as MAP_PRIVATE), a VE is charged either for a fraction of the size or for the full size of the allocated address space. In the latter case, the physical pages associated with the allocated address space may be in memory, in swap or not physically allocated yet. The barrier and the limit of this parameter control the upper boundary of the total size of allocated memory. Note that this upper boundary does not guarantee that the VE will be able to allocate that much memory. The primary mechanism to control memory allocation is the --vmguarpages guarantee. However, --privvmpages is an important resource limit with daemons like, for example, Apache: enough --privvmpages should be provided to VEs running such daemons, otherwise they will not even start or will produce warnings and errors at run time.
  • --vmguarpages pages[:pages] Memory allocation guarantee. This parameter controls how much memory is available to a VE. The barrier is the amount of memory that VE’s applications are guaranteed to be able to allocate. The meaning of the limit is currently unspecified; it should be set to unlimited.
  • --cpuunits num CPU weight for a VE. The argument is a positive non-zero number, which is passed to and used by the kernel fair scheduler. The larger the number, the more CPU time this VE gets. The maximum value is 500000, the minimum is 8. The number is relative to the weights of all the other running VEs. If cpuunits is not specified, the default value of 1000 is used. You can set the CPU weight for VE0 (the hardware node itself) as well (use vzctl set 0 --cpuunits num). Usually, the OpenVZ init script (/etc/init.d/vz) takes care of setting this.
  • --cpulimit num[%] Limit of CPU usage for the VE, in per cent. Note that if the computer has 2 CPUs, it has a total of 200% CPU time. The default CPU limit is 0 (no CPU limit).
  • --cpus num sets the number of CPUs available in the VE.
  • --diskspace num[:num] sets soft and hard disk quotas, in blocks. First parameter is soft quota, second is hard quota. One block is currently equal to 1Kb. Also suffixes G, M, K can be specified (see Resource limits section for more info).
  • --diskinodes num[:num] sets soft and hard disk quotas, in i-nodes. First parameter is soft quota, second is hard quota.
  • --capability capname:on|off Sets a capability inside a VE. Note that setting a capability while the VE is running does not take immediate effect; restart the VE in order for changes to take effect. Note that a VE has a default set of capabilities, thus any operation on capabilities is a logical AND with the default capability mask. You can use the following values for capname: chown, dac_override, dac_read_search, fowner, fsetid, kill, setgid, setuid, setpcap, linux_immutable, net_bind_service, net_broadcast, net_admin, net_raw, ipc_lock, ipc_owner, sys_module, sys_rawio, sys_chroot, sys_ptrace, sys_pacct, sys_admin, sys_boot, sys_nice, sys_resource, sys_time, sys_tty_config, mknod, lease, setveid, ve_admin. WARNING: setting some of those capabilities may have far-reaching security implications, so do not do it unless you know what you are doing. Also note that setting setpcap:on for a VE will most probably lead to the inability to start it.
  • --devnodes device:r|w|rw|none Gives the VE access (r - read, w - write, rw - read/write, none - no access) to a device designated by the special file /dev/device. The device file is created in the VE by vzctl.
  • --ioprio priority Assigns an I/O priority to the VE. The priority range is 0-7. The greater the priority, the more time for I/O activity the VE has. By default each VE has a priority of 4.
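
Putting a few of the options above together, creating a new VE and giving it some resources could look roughly like this; the VEID, IP address, hostname and name are made-up examples, and the OS template is the one downloaded in the OS Templates section below:

# create a VE from a locally cached OS template
vzctl create 110 --ostemplate debian-5.0-amd64-minimal \
      --ipadd 192.168.1.110 --hostname wks-ve10

# give it a human-readable name and persist a few resource limits
vzctl set 110 --name playground --save
vzctl set 110 --diskspace 2G:2200M --save
vzctl set 110 --privvmpages 65536:69632 --save
vzctl set 110 --cpuunits 1000 --save

# start it and get a shell inside
vzctl start playground
vzctl enter playground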

vzquota

vzquota controls disk quotas for OpenVZ VEs. These are per-VE disk quotas set from the OpenVZ host system i.e. VE0.

The quota_id must be a numeric-only identifier; that Quota ID is not the same as the VEID (Virtual Environment IDentifier). One VE can mount several filesystems and each of them can have its own quotas. Hint: there is no quota support for XFS.


The vzquota package contains a bunch of distinct tools,

sa@wks:~$ afl vzquota | grep bin/
vzquota: /usr/sbin/vzdqcheck
vzquota: /usr/sbin/vzdqdump
vzquota: /usr/sbin/vzdqload
vzquota: /usr/sbin/vzquota
sa@wks:~$

each one serving a particular purpose:

vzdqcheck
Summarizes disk usage of all files given a particular file system path.
vzdqdump
vzdqdump dumps user/group quota information obtained either from a quota file or the kernel to stdout.
The quota_id must be a numeric-only identifier. Note that the Quota ID is not the same as the VEID. One VE can mount several filesystems and each of them can have its own quotas.
vzdqload
vzdqload loads user/group quota information provided by vzdqdump from stdin into a quota file. The quota must be stopped while loading.
vzquota
See above.

Notes on vzquota

  • vzquota controls disk quotas for VEs. These are per-VE disk quotas set from the OpenVZ host system also known as VE0.
  • Quota limits and flags: all of these options are required with the init command, and are optionally accepted with the on and setlimit commands.
  • It is impossible to start or stop quota accounting if the directory given by -p option is busy. This is rather a limitation of the kernel part of disk quota implementation.
  • However, for enterprise environments and important/risky setups (e.g. providing VPSs (Virtual Private Servers) to strangers) I strongly recommend the use of LVM (Logical Volume Manager) in conjunction with OpenVZ. LVM makes it possible to provide each VE with its own LV (Logical Volume). This is a super-secure way (e.g. even a process gone mad can only fill up its particular LV, i.e. it cannot cause issues for other VEs on the same OpenVZ host system) and a flexible way (LVs can be resized online, i.e. no downtime) to provide individual storage to each VE; a sketch follows this list.
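
A minimal sketch of that LVM approach, assuming a volume group named vg0 and VE 105; all names and sizes are placeholders, and the LV simply gets mounted where OpenVZ expects the VE's private area:

# create and format a logical volume dedicated to VE 105
lvcreate --size 5G --name ve105 vg0
mkfs.ext3 /dev/vg0/ve105

# mount it at the VE's private area (add an /etc/fstab entry to make it permanent)
mkdir -p /var/lib/vz/private/105
mount /dev/vg0/ve105 /var/lib/vz/private/105

# growing it later works online, i.e. without unmounting
lvextend --size +2G /dev/vg0/ve105
resize2fs /dev/vg0/ve105

With this in place, the LV itself already caps how much disk space the VE can consume, independently of vzquota.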

vzpkg

OpenVZ OS template management utility i.e. it may be used for semi-automatic OS template creation. More information can be found here.

vzpkg2 and pkg-cacher

The successor to vzpkg, which works in conjunction with pkg-cacher, an apt-cache like tool; it was redesigned to ease and improve the OS template creation process for OpenVZ.

  • http://wiki.openvz.org/Install_vzpkg2_and_pkg-cacher

vztmpl

OpenVZ Templates

Third Party Utilities

vzdump
  • See Backup subsection.
vzprocps
  • http://download.openvz.org/contrib/utils/
  • http://wiki.openvz.org/Processes_scope_and_visibility

OS Templates

An OS template is basically a set of packages from some Linux distribution (e.g. Debian) used to populate an OpenVZ container. With OpenVZ, different distributions can co-exist on the same OpenVZ host system; therefore multiple OS templates are available with any OpenVZ installation when installing the Debian way.

An OS template consists of system programs, libraries, and scripts needed to boot up and run the system i.e. VE, as well as some very basic applications and utilities. Stuff like, for example, a compiler, an SQL server or a CMS (e.g. Django CMS) is usually not included with an OS template, simply because it is not considered part of the minimal set of software needed to create a fully functional VE.

OS Template Metadata

OS template metadata is a set of a few files containing the following information:

  • List of packages that form this OS template
  • Locations of package repositories
  • Scripts needed to be executed on various stages of template installation
  • Public GPG (GNU Privacy Guard) key(s) needed to check signatures of packages
  • Additional OpenVZ-specific packages

Where to get precreated OS templates

There are a few places where we can get precreated OS templates:

  • http://download.openvz.org/template/precreated/
  • http://download.openvz.org/template/precreated/contrib/
  • http://debian.systs.org respectively http://forzza.systs.org/ostemplates/
Download
 1  sa@wks:/var/lib/vz/template/cache$ la
 2  total 0
 3  drwxr-xr-x 2 root root  6 2008-10-18 15:39 .
 4  drwxr-xr-x 3 root root 18 2008-10-23 21:37 ..
 5  sa@wks:/var/lib/vz/template/cache$ su
 6  Password:
 7  wks:/var/lib/vz/template/cache# wget --quiet http://download.openvz.org/template/precreated/contrib/debian-5.0-amd64-minimal.tar.gz{,.asc}
 8  wks:/var/lib/vz/template/cache# ls -la
 9  total 60024
10  drwxr-xr-x 2 root root       86 2009-03-03 13:13 .
11  drwxr-xr-x 3 root root       18 2008-10-23 21:37 ..
12  -rw-r--r-- 1 root root 61459687 2009-01-13 08:44 debian-5.0-amd64-minimal.tar.gz
13  -rw-r--r-- 1 root root      197 2009-01-13 08:46 debian-5.0-amd64-minimal.tar.gz.asc

The la in line 1 is just an alias in my ~/.bashrc. Lines 2 to 13 are nothing special... except maybe the type-once, download-two brace expansion in line 7. The directory we downloaded to (/var/lib/vz/template/cache/) is where we keep our OS templates (see files).

Verify for Integrity and Authenticity

I can only strongly recommend verifying the downloaded OS template for integrity and authenticity; if this utterly important check is skipped, then every security measure that comes afterwards is pointless. Any author/creator of OS templates should (I would say must but...) sign them.

14  wks:/var/lib/vz/template/cache# gpg --verify debian-5.0-amd64-minimal.tar.gz.asc debian-5.0-amd64-minimal.tar.gz
15  gpg: Signature made Tue 13 Jan 2009 08:46:04 AM CET using DSA key ID 6D9DACBE
16  gpg: Can't check signature: public key not found
17  wks:/var/lib/vz/template/cache# gpg --keyserver hkp://pool.sks-keyservers.net --search-keys systs.org
18  gpg: searching for "systs.org" from hkp server pool.sks-keyservers.net
19  (1)     debian.systs.org Archiv Signing Key (2007) <[email protected]>
20            1024 bit DSA key 52A9498A, created: 2006-11-02
21  (2)     Thorsten Schifferdecker <[email protected]>
22            1024 bit DSA key EB1522E1, created: 2006-11-02
23  (3)     Thorsten Schifferdecker <[email protected]>
24            1024 bit DSA key D66C37F9, created: 2005-01-20
25  Keys 1-3 of 3 for "systs.org".  Enter number(s), N)ext, or Q)uit > Q
26  wks:/var/lib/vz/template/cache# gpg --keyserver hkp://pool.sks-keyservers.net --search-keys openvz
27  gpg: searching for "openvz" from hkp server pool.sks-keyservers.net
28  (1)     Virtualization:OpenVZ OBS Project <Virtualization:[email protected]
29            1024 bit DSA key D673DA6C, created: 2008-07-21
30  (2)     OpenVZ Project <[email protected]>
31            1024 bit DSA key A7A1D4B6, created: 2005-09-14
32  Keys 1-2 of 2 for "openvz".  Enter number(s), N)ext, or Q)uit > Q
33  wks:/var/lib/vz/template/cache# wget -q http://debian.systs.org/debian/dso_archiv_signing_key.asc
34  wks:/var/lib/vz/template/cache# mv dso_archiv_signing_key.asc /tmp/
35  wks:/var/lib/vz/template/cache# gpg --import /tmp/dso_archiv_signing_key.asc
36  gpg: key 6D9DACBE: public key "debian.systs.org (dso) - Automatic Archive Signing Key (2008-2009) <[email protected]>" imported
37  gpg: Total number processed: 1
38  gpg:               imported: 1
39  wks:/var/lib/vz/template/cache# gpg --list-key sys
40  pub   1024D/6D9DACBE 2008-03-01 [expires: 2009-03-01]
41  uid                  debian.systs.org Archiv Signing Key (2008) <[email protected]>
42
43  wks:/var/lib/vz/template/cache# gpg --verify debian-5.0-amd64-minimal.tar.gz.asc debian-5.0-amd64-minimal.tar.gz
44  gpg: Signature made Tue 13 Jan 2009 08:46:04 AM CET using DSA key ID 6D9DACBE
45  gpg: Good signature from "debian.systs.org Archiv Signing Key (2008) <[email protected]>"
46  gpg: WARNING: This key is not certified with a trusted signature!
47  gpg:          There is no indication that the signature belongs to the owner.
48  Primary key fingerprint: 8EE1 945F 377B A6E5 8234  72FC C709 A411 6D9D ACBE
49  wks:/var/lib/vz/template/cache#

Now that we have downloaded the signature file (line 13) as well as the OS template (line 12), we try to verify the OS template in line 14. As we can see, it does not work; we need to get the OS template creator's public key that matches the signature.

Line 17 shows how to acquire a public key using a key server — what we are looking for is a key with the ID from line 15 i.e. 6D9DACBE. Although we are looking hard in lines 18 to 31, we are unable to find our key. What now?


Well, obviously the OS template creator has not uploaded this particular key to a key server but instead provides it directly from his website (line 33). All we do then is import the key into our keyring in line 35 and finally try to verify again in line 43. That worked out just fine; the warning in line 46 should not necessarily concern us, although, from my point of view, I would prefer a key which has been signed by others in order to elevate its trust level.

OS Templates Creation

  • http://wiki.openvz.org/Debian_template_creation
  • http://wiki.openvz.org/Updating_Debian_template
Semi-automatic with vzpkg
Semi-automatic with vzpkg2

Determining the Page Size

We can use the following program to determine the page size for our architecture (if it supports the getpagesize() function):

#include <stdio.h>      /* printf() */
#include <stdlib.h>     /* exit() */
#include <unistd.h>     /* getpagesize() */

int main(void)
{
        int page_size = getpagesize();
        printf("The page size is %d\n", page_size);
        exit(0);
}

Here's how to compile and run it (assuming we saved it as pagesize.c):

# gcc pagesize.c -o pagesize
# ./pagesize
The page size is 4096

Whether getpagesize() is present as a Linux system call depends on the architecture. If it is, it returns the kernel symbol PAGE_SIZE, which is architecture and machine model dependent. Generally, one uses binaries that are architecture but not machine model dependent, in order to have a single binary distribution per architecture. This means that a user program should not find PAGE_SIZE at compile time from a header file, but use an actual system call, at least for those architectures (like sun4) where this dependency exists. Here libc4, libc5, glibc 2.0 fail because their getpagesize() returns a statically derived value, and does not use a system call. Things are OK in glibc 2.1 and later.


Those who prefer can get the page size using Python. All we need to do is start a Python console and write:

>>> import resource
>>> resource.getpagesize()
4096
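
If we do not feel like compiling anything or firing up an interpreter at all, the getconf utility reports the same value straight from the shell:

$ getconf PAGESIZE
4096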

Some typical Pagesizes

Architecture Pagesize
arm, xtensa, s390, sh64, x86, x86-64, v850 4k
alpha, cris 8k
m68k, sparc 4k, 8k
powerpc 4k, 64k
parisc 4k, 16k, 64k
ia64, mips 4k, 8k, 16k, 64k

Use Cases

Use cases for OpenVZ, and therefore for virtualization in general, arise from understanding why we need virtualization in the first place.

A VE for each Debian Release

Setting up a VE for each Debian release for further cloning (to use them for testing software, staging software, packaging software, as a testbed, etc.) is quite handy and takes just seconds to do; a sketch of how such prototype VEs might be created follows the listing below.

wks:/home/sa# vzlist -a
      VEID      NPROC STATUS  IP_ADDR         HOSTNAME
       101          8 running 192.168.1.100   wks-ve1
       102          6 running 192.168.1.101   wks-ve2
       103          7 running 192.168.1.102   wks-ve3
wks:/home/sa# vzlist -an
      VEID      NPROC STATUS  IP_ADDR         NAME
       101          8 running 192.168.1.100   stable
       102          6 running 192.168.1.101   testing
       103          7 running 192.168.1.102   unstable
wks:/home/sa#
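
The stable prototype is created from the downloaded OS template as sketched in the notes on vzctl above; turning a clone of it into the testing or unstable prototype is then mostly a matter of switching the APT sources inside the VE and dist-upgrading. A rough sketch, assuming the template's sources.list refers to the release by its codename (lenny being stable at the time of writing):

# on the HN: enter the freshly cloned VE, which still runs the stable release
vzctl enter testing

# inside the VE: point APT at the next release and upgrade
sed -i 's/lenny/squeeze/g' /etc/apt/sources.list
apt-get update && apt-get dist-upgrade
cat /etc/debian_version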

Security

Since security with regard to OpenVZ can be handled much like the security considerations of any non-virtualized environment, I am going to point to a few places which discuss particular security matters with a focus on particular issues:

  • SSH (Secure Shell)
  • Firewall
  • Security in general
Creative Commons License
The content of this site is licensed under Creative Commons Attribution-Share Alike 3.0 License.