Firewall
Status: Netfilter related theory done. Rest partly done or still missing.
Last changed: Thursday 2014-07-10 16:45 UTC
Abstract:
The term firewall is misleading. There is no fire. There is not even a wall. All we have are, for example, electrical signals (pulses) on a wire which are encoded/decoded as so-called bits. Many of them put together make up a so-called packet, containing bits, the so-called payload. If we chain several packets together, a queue of packets is what we get. Now, if we put a sender and receiver address onto a packet, an IP packet is what we get. At this point we have three types of information: the one within the packet, the one on the packet, and the information about the queue itself i.e. which packets in what particular order make up the queue. Still, as a matter of fact, in the end, we are talking electrical pulses on the wire, no fire, no wall, no fiction, just simple science. All three types of information can be used to do something with those IP packets e.g. deny/grant access/traversal, mangle, masquerade, etc. So, in the end, talking about the term "firewall" actually means talking about IP packet inspection and manipulation e.g. changing sender/receiver address, payload, sequence order etc. Again, no fire, no wall, no myths and legends... no dumb sales or marketing chatter either... just deterministic, simple to explain technology. This page will take a look at inspecting/manipulating IP packet streams from a practical angle but will, where necessary and/or appropriate, take a look under the hood of the technology that is being used. We will also take a look at technologies that accompany IP packet inspection/manipulation, like for example port knocking and so-called pro-active approaches which change settings on-the-fly, thereby adapting to current situations and threats.
Security in a system is made up of layers, firewalling should be the
last to include, once all services have been hardened.
— Debian Security Manual
I decided to give this page the tech level T->2 because the reader
needs to understand the internet protocol suite, with its two most
important protocols IP (Internet Protocol) and TCP (Transmission
Control Protocol), in order to be able to follow this page.
In the course of learning about the internet protocol suite, one will
also learn about the OSI (Open Systems Interconnection) standard which
is important to know about as well.
Since this page provides a lot of information, those who are looking
for a quickstart should follow the links below right away
Introduction
A firewall (sometimes also known as packet filter; see
types of firewalls) is an integrated collection of security measures
designed to prevent unauthorized electronic access to a networked
computer system.
It is also a device or set of devices configured to permit, deny,
encrypt, decrypt, or proxy all computer traffic between different
security domains based upon a set of rules and other criteria.
Firewalls can be implemented in hardware, in software, or in a
combination of both. Firewalls are frequently used to prevent
unauthorized Internet users from accessing private networks connected
to the Internet, especially intranets.
All messages entering or leaving the intranet pass through the
firewall, which examines each message and blocks those that do not
meet the specified security criteria.
From a purely technical point of view, a firewall is a dedicated
appliance and/or software running on a computer, which inspects
network traffic passing through it, and denies or permits passage
based on a set of rules.
A firewall's basic task is to regulate some of the flow of traffic
between computer networks of different trust levels. Typical examples
are the Internet which is a zone with no trust and an internal network
which is a zone of higher trust.
A zone with an intermediate trust level, situated between the Internet
and a trusted internal network, is often referred to as a perimeter
network or DMZ (Demilitarized Zone).
A firewall's function within a network is similar to that of physical
firewalls with fire doors in building construction. The former is used
to prevent network intrusion into the private network, while the
latter are intended to contain a structural fire and delay it from
spreading to adjacent structures.
Without proper configuration, a firewall can often become worthless.
Standard security practices dictate a default-deny (also known as
whitelisting) firewall ruleset, in which the only network connections
which are allowed are the ones that have been explicitly allowed.
Such configuration requires detailed understanding of the network
applications and endpoints required for the day-to-day operation.
Many people responsible for a firewall lack such understanding, and
therefore implement a default-allow (blacklisting) ruleset, in which
all traffic is allowed unless it has been specifically blocked. This
configuration makes inadvertent network connections and system
compromise much more likely.
Types of Firewalls
Often people actually refer to a packet filter (e.g.
netfilter/iptables) when they talk about a firewall, but a packet
filter is just one of several components of a firewall!
A firewall usually consists of several building blocks, each one
responsible for a particular job. In essence, there are 4 building
blocks a firewall can be made of:
- A packet filter filters packets on the network layer (OSI layer
3).
- Modern packet filters are also capable of stateful filtering which
means they also operate on the transport layer (OSI layer 4).
- A so-called application layer firewall operates on the
application layer (OSI layer 7).
- Last but not least, there are proxies. In terms of firewalling
however, the boundary between proxies and application layer
firewalls is somewhat blurred i.e. the same thing might be referred
to as both, proxy and application firewall.
Firewalling with Linux
Since this page is about Debian GNU/Linux, it will focus on Linux and
its firewalling capability. As of now (April 2011) there is one
predominant firewalling framework called netfilter; others, like for
example HiPAC or nftables, are still under heavy development.
Basically there are two components with netfilter:
- the kernelspace component called netfilter and
- the userspace component called iptables
Alternatives to Netfilter
Before we talk about alternatives to netfilter, let us define the
overall scope of this page.
We are not talking about central connectivity hubs and backbone
technologies as they are found in datacenters with continuous
data rates in the tens of Gibibits if not Tebibits per second. That
kind of stuff is done with Cisco, Juniper or force10 setups, for
example with multi-redundant configurations in mesh topology. Using
netfilter here would be like trying to propel an aircraft carrier with
a car engine...
What we are talking about is much smaller continuous data rates
(maybe a Gibibit per second peak from time to time) and less complex
setups, as they are found at home, at the office and things like that.
Using some fat Juniper setup here would be like trying to maneuver an
aircraft carrier up the River Thames.
-
There are always two parameters which are more or less mutually
exclusive: flexibility (ease of deployment, changes, etc.) and
throughput. This is the reason why different layers in networking
require different techniques and therefore different gear.
Netfilter can do OSI layer 3 and 4 out of the box and OSI layer 7 with
an additional classifier called l7. That fact makes Linux/Netfilter
the perfect choice in order to create a reliable, well-tested,
incredibly flexible and high-speed firewall for small to mid-sized
computer networks.
However, it is only fair to say that big corporate/governmental/etc.
networks are equipped with Juniper/Cisco/force10/etc. at the tier-1
level in order to do the firewalling/routing/switching and that
netfilter may only be used on the smaller, segmented, subnetworks of
those bigger networks.
I am not going into detail here, but the reasons for this are only
partially technical. Things like TCO (Total Cost of Ownership), SLA
(Service Level Agreement) considerations and of course the powerful
marketing machinery of those dominant companies play a major role.
My personal experience and opinion on whether to use netfilter or
non-netfilter solutions is that any company with 3000 or fewer
employees, whose main business is not in IT (Information Technology),
can easily use netfilter and run it on some high-end server(s) with
redundant components like for example redundant power supplies, RAID
(Redundant Array of Independent Disks), etc.
This might cost 15000 euros at maximum, starting at a few hundred
euros in a minimalistic case. The same solution bought from Juniper or
Cisco is mostly 2 to 5 times as expensive, with license costs for
their proprietary software making up a big chunk of it. That is not
true for netfilter because it is FLOSS (Free/Libre Open Source
Software).
However, successfully using netfilter with off-the-shelf hardware
requires that a company employs some Linux expert too. That is
something I can only strongly recommend, if only for the simple fact
that this person can help plan the company's long-term IT strategy
alongside their daily work.
netfilter is way more flexible and powerful than entry-level
Juniper/Cisco/force10/etc. gear but then the main benefit is that
those companies provide all-in-one bundles i.e. starting with initial
consultation to hardware/software and most importantly, SLA with 24/7
coverage if needed — something that can only be topped by employing a
full-time IT expert.
Ultimately, any use case is different and thus requires individual
solutions...
Using Netfilter in Conjunction with other Tools/Software
Because netfilter is FLOSS, there is a bunch of software that
surrounds netfilter or otherwise plays into the realm of firewalling
on Linux:
sa@wks:~$ debtags search *firewall | wc -l
64
sa@wks:~$
Below is a subset of that software I consider most useful/important
for firewalling on Linux. We can classify them into three categories
— there is software that can be used to control/manage netfilter,
- iptables: main administration tool for packet filtering and NAT
with netfilter.
- ipset: an administration tool for kernel IP sets.
- conntrack: a userspace command line program used to view and manage
the in-kernel connection tracking state table.
- iptstate: top-like display of the netfilter/iptables state table.
It is only useful if netfilter CONNTRACK is enabled in the kernel.
- netstat-nat: a tool that displays NAT connections.
Then there is software that works hand in hand with netfilter in order
to get a particular job done
- nufw: used to authenticate user traffic i.e. allows to write
filtering rules based on user identity, in addition to classical
network criteria.
- fwknop: port knocking in modern SPA (Single Packet Authorization)
manner.
- psad: detect port scans and take measures like for example
dynamically altering netfilter rules, email alert, etc.
- fail2ban: dynamically updates netfilter and/or hosts.deny rules in
order to ban IPs that cause multiple authentication errors; also
capable of sending alert email.
- fwsnort: translates Snort rules into equivalent netfilter rules.
and last but not least, there are optional add-ons to netfilter
itself, extending its functionality
- l7: a classifier that identifies packets based on application layer
data. It can classify packets as Kazaa, HTTP, Jabber, Citrix,
Bittorrent, FTP, Gnucleus, eDonkey2000, etc., regardless of port.
It complements existing classifiers that match on IP address, port
numbers and so on.
- ulog, specter: enhanced logging.
Many extensions are included in the base iptables package, such as the
iptables extension which allows querying of the connection state
mentioned above.
Additional extensions are distributed in the xtables-addons-source
package that replaces the older patch-o-matic-ng package. With it,
experimental features get tested and possibly later included into
netfilter and iptables releases.
Packet Filtering with Linux
We are now going to take a closer look at what is called the Linux
packet filtering stack. The image below shows what packet filtering
on Linux looks like as of now (June 2009).
As we can see, there are several layers starting with the Linux kernel
itself at the bottom and ending with userspace tools at the top. We
use the layer at the top (userspace tools) to manage/control packet
filtering with Linux, which is ultimately carried out by the netfilter
layer and the Linux kernel network layer at the bottom.
The intermediate layers, like for example Xtables, are in essence
frameworks that make sense from a technical point of view: they
bring structure, modularity and well-defined boundaries and interfaces
to the whole shebang.
Last but not least, the layers on top of the networking layer abstract
things away, so packet filtering with Linux becomes a doable task even
for someone who is not an expert on networking as described by the OSI
standard and all the other RFCs and networking standards out there.
Components
There is bidirectional (vertical as well as, in some cases,
horizontal) information exchange among the various components in the
packet filtering stack.
A lower layer provides functionality to the layer above it, which in
turn instructs the lower layer to carry out some action on its own
behalf and ultimately on behalf of the user. This way, information
travels several layers down to the netfilter layer or even deeper into
the stack i.e. the Linux networking layer.
Netfilter
There are two things the word netfilter might refer to. Firstly,
netfilter is the name of the project that brings firewalling
capabilities to Linux. These capabilities consist of several
components: kernel modules, libraries and a set of userspace tools.
Secondly, netfilter is the name of the Linux kernel framework that
provides a set of hooks within the Linux kernel for intercepting and
manipulating network packets.
The best-known components on top of netfilter are those used for
firewalling (the kernel modules x_tables and ip_tables, and iptables
the userspace tool), but netfilter and its hooks are also used by
other components in the Linux packet filtering stack which perform NAT
(Network Address Translation), stateful packet tracking and queueing
of packets to userspace.
Connection Tracking
One of the important features built on top of the netfilter framework
is connection tracking — made possible by using the so-called
state machine.
Connection tracking allows the Linux kernel to keep track of all
logical network connections or sessions, and thereby relate all of the
packets which may make up that connection.
For example, NAT relies on this information to translate all related
packets in the same way, and xtables/iptables can use this information
to act as a stateful firewall.
Connection tracking classifies each packet as being in one of a number
of different states:
- NEW: trying to create a new connection
- ESTABLISHED: part of an already-existing connection
- RELATED: packet initiating a new connection that is related to, but
not actually part of, an existing connection
- INVALID: cannot be associated with any known connection
- UNTRACKED: not tracked
A normal example would be that the first packet the conntrack
subsystem sees will be classified NEW, the reply would be classified
ESTABLISHED and an ICMP (Internet Control Message Protocol) error
would be RELATED.
An ICMP error packet which did not match any known connection would be
INVALID. UNTRACKED is a special state that can be assigned by the
administrator to bypass connection tracking for a particular packet.
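Assigning the UNTRACKED state could look roughly like the sketch
below. It is only an illustration under a few assumptions: the
function name notrack_dns and the resolver address 192.0.2.53 are made
up, and the IPT variable is a dry-run helper introduced here so that
the commands are printed instead of executed. It also assumes an
iptables version that ships the CT target; older releases used a
separate NOTRACK target instead.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Run with IPT=iptables (as root) to actually apply the rules.
IPT="${IPT:-echo iptables}"

# notrack_dns RESOLVER_IP
# Exempts DNS-over-UDP traffic of a local resolver from connection tracking.
notrack_dns() {
    resolver="$1"   # hypothetical address of the local DNS server
    # Incoming queries: mark them as untracked in the raw table,
    # which is traversed before connection tracking takes place.
    $IPT -t raw -A PREROUTING -p udp -d "$resolver" --dport 53 -j CT --notrack
    # Outgoing replies are locally generated, so they pass raw/OUTPUT.
    $IPT -t raw -A OUTPUT -p udp -s "$resolver" --sport 53 -j CT --notrack
    # Untracked packets never match ESTABLISHED, so accept them explicitly.
    $IPT -A INPUT -p udp -d "$resolver" --dport 53 -m conntrack --ctstate UNTRACKED -j ACCEPT
}

notrack_dns 192.0.2.53
```

Bypassing connection tracking like this is mainly of interest for high
volume, stateless traffic where the state table would otherwise fill
up for little benefit.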
Note that the connection state is completely independent of any
TCP packet/sequence state. If the host answers with a SYN ACK packet
to acknowledge a new incoming TCP (Transmission Control Protocol)
connection, the TCP connection itself is not yet established but the
tracked connection is i.e. this packet will match the established
state. Also, a tracked connection of a stateless protocol like UDP
nevertheless has a connection state.
Furthermore, through the use of plugin modules, connection tracking
can be given knowledge of application layer protocols and thus
understand that two or more distinct connections are related.
For example, consider FTP (File Transfer Protocol). A
control connection is established, but whenever data is transferred, a
separate connection is established to transfer it. When the
nf_conntrack_ftp module is loaded, the first packet of an FTP data
connection will be classified as related instead of new, as it is
logically part of an existing connection.
Iptables can use the connection tracking information to make packet
filtering rules more powerful and easier to manage. The conntrack
match extension (--ctstate, see man 8 iptables) allows iptables rules
to examine the connection tracking classification for a packet.
For example, one rule might allow NEW packets only from inside the
packet filter to outside, but allow RELATED and ESTABLISHED in either
direction. This allows normal reply packets from the outside
(ESTABLISHED), but does not allow new connections to come from the
outside to the inside. However, if an FTP data connection needs to
come from outside the packet filter to the inside, it will be allowed,
because the packet will be correctly classified as RELATED to the FTP
control connection, rather than a NEW connection.
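The rule set described above could be sketched as follows. This is a
minimal sketch, not a complete firewall: the interface names eth1
(inside) and eth0 (outside) and the function name stateful_forward are
assumptions made for illustration, and the IPT variable is a dry-run
helper that prints the commands unless it is set to iptables.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Run with IPT=iptables (as root) to actually apply the rules.
IPT="${IPT:-echo iptables}"

# stateful_forward INSIDE_IF OUTSIDE_IF
stateful_forward() {
    lan_if="$1"   # hypothetical inside interface
    wan_if="$2"   # hypothetical outside interface
    # Replies and related connections are allowed in either direction.
    $IPT -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # New connections may only be opened from the inside to the outside.
    $IPT -A FORWARD -i "$lan_if" -o "$wan_if" -m conntrack --ctstate NEW -j ACCEPT
    # Everything else that is forwarded gets dropped.
    $IPT -P FORWARD DROP
}

stateful_forward eth1 eth0
```

Whether an FTP data connection is actually classified as RELATED
depends on the nf_conntrack_ftp helper being loaded; depending on the
kernel version, the helper may additionally have to be assigned to the
control connection explicitly.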
xtables
The xtables framework is in essence a collection of C functions that
are only available within the Linux kernel. Currently it is an ongoing
effort to collapse ebtables, arp_tables, ip_tables and ip6_tables into
one collection of functions.
It gives us the possibility to add features to the Linux packet
filtering stack on the fly.
To do so, we write kernel modules that register against this
framework. Also, depending on the feature's category, we write an
iptables userspace module e.g. something involved with logging.
By writing our new extension, we can match, mangle, track and decide
the fate of any given packet or complete flows of interrelated
connections (connection tracking, ergo stateful, that is).
Below is a listing of the kernel modules that depend on x_tables. For
example, the userspace tool iptables sits on top of its main kernel
module ip_tables, which in the end uses x_tables, as can be seen:
sa@wks:~$ lsmod | grep ^x_tables
x_tables 25736 12 xt_state,iptable_nat,xt_tcpudp,xt_length,ipt_ttl,xt_tcpmss,xt_TCPMSS,xt_multiport,xt_limit,xt_dscp,ipt_REJECT,ip_tables
sa@wks:~$
ebtables
Ebtables is an application program used to set up and maintain the
tables containing rules which inspect Ethernet frames.
It is analogous to the iptables userspace tool, but less complicated,
because the Ethernet protocol is simpler than IP.
arptables
arptables is a userspace tool, used to set up and maintain the tables
of ARP (Address Resolution Protocol) rules in the Linux kernel. These
rules inspect the ARP frames as they travel through the kernel.
Like ebtables, arptables is analogous to the iptables userspace
tool, but less complicated than iptables due to the nature of the
address resolution protocol.
iptables
Iptables is commonly used to inclusively refer to the kernel-level
component xtables that does the actual table traversal and provides an
API for kernel-level extensions.
However, more precisely, iptables is actually the name of the
userspace tool used to configure and maintain a set of tables and the
chains and rules stored within those tables. The tables are provided
by the xtables infrastructure/framework, which in turn uses netfilter.
Its main purpose is to create and maintain rules for packet
filtering, both inbound and outbound, as well as to create and
maintain rules for NAT.
Like the other userspace tools, iptables requires elevated
privileges to operate i.e. it must be executed by the user root.
Iptables is installed at /usr/sbin/iptables and documented at man 8
iptables.
ip6tables
Same as iptables but for IPv6.
Terms
We need to know a few basic terms in order to understand this page and
to be able to communicate about packet filtering:
- Connection
-
This generally refers to a series of packets relating to each
other. These packets refer to each other as an established kind of
connection. A connection is, in other words, a series of exchanged
packets.
-
In TCP, this mainly means establishing a connection via the
3-way handshake, and then this is considered a connection until the
release handshake.
- DNAT
-
Destination Network Address Translation. DNAT refers to the technique
of translating the destination IP address of an IP packet or, simply
put, changing it.
-
This is used together with SNAT to allow several hosts to share a
single Internet routable IP address, and to still provide server
services. This is normally done by assigning different ports on an
Internet routable IP address, and then telling the Linux router where
to send the traffic.
- SNAT
-
Source Network Address Translation. Masquerading is a variant of SNAT
for dynamically assigned addresses. SNAT refers to translating the
source IP address of an IP packet. It is used to make it possible for
several hosts to share a single Internet routable IP address, since
there is currently a shortage of available IP addresses in IPv4 (IPv6
will solve this).
- Kernelspace
-
This is more or less the opposite of userspace. This implies the
actions that take place within the Linux kernel itself, and not
outside of the kernel.
- Userspace
-
With this term we refer to everything and anything that takes place
outside the kernel. For example, invoking iptables -h takes place
outside the kernel, while iptables -A FORWARD -p tcp -j ACCEPT takes
place (partially) within the kernel, since a new rule is added to the
ruleset.
- Packet
-
A singular unit sent over a network, containing a header and a
data/payload portion. For example, an IP packet or a TCP packet.
-
In the RFCs (Request for Comments) the term packet is not so
generalized. Instead, IP packets are called datagrams, while TCP
packets are called segments. On this page, pretty much everything is
called a packet for reasons of simplicity.
- Segment
-
A TCP segment is pretty much the same as a packet; it is the formal
term for a TCP packet.
- QoS
-
Quality of Service is a way of specifying how a packet should be
handled and what kind of service quality it should receive while
sending it.
- Stream
-
This term refers to a connection that sends and receives packets that
are related to each other in some fashion. Basically, we use this term
for any kind of connection that sends two or more packets in both
directions.
-
In TCP this may mean a connection where one side sends a SYN and the
other replies with a SYN/ACK, but it may also mean a connection where
one side sends a SYN and gets an ICMP Host unreachable in reply i.e.
we use this term very loosely.
- State
-
This term refers to which state the packet is in, either according to
RFC 793 (TCP), or to the user-side states used in netfilter/iptables.
-
Note that, as mentioned before, the states used internally and
externally do not follow the RFC 793 specification fully. The main
reason is that netfilter has to make several assumptions about the
connections and packets.
- IPSEC
-
Internet Protocol Security is a protocol suite used to encrypt IP
packets and send them securely over the Internet.
- VPN
-
Virtual Private Network is a technique used to create virtually
private networks over non-private (thus insecure) networks, such as
the Internet. IPSEC is one technique used to create VPN connections.
OpenVPN is another.
- Policy
-
There are two kinds of policies that we speak about most of the time
when implementing a firewall.
-
First we have the chain policies, which tell the firewall
implementation the default action to take on a packet if no rule
matched it.
-
The second type of policy is the security policy that we may have
written documentation on, for example for the whole company or for
this specific network segment. Security policies are very good
documents to have thought through properly and to study properly
before starting to actually implement the firewall.
- Accept
-
To accept a packet and to let it through. This is the opposite of the
drop or deny targets, as well as the reject target.
- Drop/Deny
-
When a packet is dropped or denied, it is simply deleted, and no
further actions are taken. No reply to tell the host it was dropped,
nor is the receiving host of the packet notified in any way. The
packet simply disappears.
- Reject
-
This is basically the same as a drop or deny target or policy, except
that we also send a reply to the host sending the packet that was
dropped.
-
The reply may be specified, or automatically calculated to some value.
To this date (2009-04-19), there is unfortunately no iptables
functionality to also send a packet notifying the receiving host of
the rejected packet what happened i.e. , doing the reverse of the
REJECT target. This would be very good in certain circumstances, since
the receiving host has no ability to stop DoS (Denial of Service)
attacks from happening.
- State
-
A specific state of a packet in comparison to a whole stream of
packets. For example, if the packet is the first that the firewall
sees or knows about, it is considered new (the SYN packet in a TCP
connection), or if it is part of an already established connection
that the firewall knows about, it is considered to be established.
States are known through the connection tracking system, which keeps
track of all the sessions.
- Chain
-
A chain contains a set of rules that are applied to packets that
traverse the chain. Each chain has a specific purpose (e.g. which
table it is connected to, which specifies what this chain is able to
do), as well as a specific application area (e.g. only forwarded
packets, or only packets destined for this host).
- Table
-
Each table has a specific purpose, and in iptables there are 4 tables:
the raw, nat, mangle and filter tables. For example, the filter table
is specifically designed to filter packets, while the nat table is
specifically designed to NAT packets.
- Rule
-
A rule is a set of a match or several matches together with a single
target in most implementations of IP filters, including the iptables
implementation. There are some implementations which let us use
several targets/actions per rule.
- Ruleset
-
A ruleset is the complete set of rules that are put into a whole IP
filter implementation.
-
In the case of iptables, this includes all of the rules set in the
filter, nat, raw and mangle tables, and in all of the subsequent
chains. Most of the time, they are written down in a configuration
file of some sort.
- Match
-
This word can have two different meanings when it comes to IP
filtering:
-
The first meaning would be a single match that tells a rule that this
header must contain this and this information. For example, the
--source match tells us that the source address must be a specific
network range or host IP address.
-
The second meaning is if a whole rule is a match. If the packet
matches the whole rule, the jump or target instructions will be
carried out e.g. the packet will be dropped.
- Target
-
There is generally a target set for each rule in a ruleset. If the
rule has matched fully, the target specification tells us what to do
with the packet.
-
For example, if we should drop or accept it, or NAT it, etc. There is
also something called a jump specification — there might not be a
target or jump for each rule, but there may be.
- Jump
-
The jump instruction is closely related to a target. A jump
instruction is written exactly the same as a target in iptables, with
the exception that instead of writing a target name, we write the name
of another chain. If the rule matches, the packet will hence be sent
to this second chain and be processed as usual in that chain.
- Connection tracking
-
A firewall which implements connection tracking is, simply put, able
to track connections/streams. This ability often comes at the cost of
considerable processor and memory usage. This is unfortunately true in
iptables as well, but much work has been done to improve this.
-
However, the good thing is that the firewall will be much more secure
when connection tracking is properly used by the implementer of the
firewall policies. It may also reduce complexity a lot e.g. assuming
we allow outgoing connections, we can then allow all incoming
packets based on whether or not they are related to some already
existing connection (the outgoing one i.e. the one we initiated).
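Several of the terms above (policy, chain, rule, match, target, jump,
DNAT and SNAT) can be seen working together in the following sketch.
All concrete values are assumptions made for illustration: the outside
interface eth0, the public address 203.0.113.1, the internal web
server 192.168.1.10, the chain name web_in and the function name
terms_demo. The IPT variable is a dry-run helper that prints the
commands unless it is set to iptables.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Run with IPT=iptables (as root) to actually apply the rules.
IPT="${IPT:-echo iptables}"

# terms_demo OUTSIDE_IF PUBLIC_IP WEB_SERVER_IP
terms_demo() {
    wan_if="$1"; public_ip="$2"; web_server="$3"
    # Policy: default action of the built-in FORWARD chain (filter table).
    $IPT -P FORWARD DROP
    # Rule: one match (-m conntrack --ctstate ...) plus one target (ACCEPT).
    $IPT -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # Chain and jump: create a user-defined chain and name it after -j.
    $IPT -N web_in
    $IPT -A FORWARD -i "$wan_if" -p tcp -d "$web_server" --dport 80 -j web_in
    $IPT -A web_in -m conntrack --ctstate NEW -j ACCEPT
    # DNAT: rewrite the destination of HTTP traffic hitting the public IP.
    $IPT -t nat -A PREROUTING -i "$wan_if" -p tcp -d "$public_ip" --dport 80 -j DNAT --to-destination "${web_server}:80"
    # SNAT: outgoing traffic leaves with the public IP as its source.
    $IPT -t nat -A POSTROUTING -o "$wan_if" -j SNAT --to-source "$public_ip"
}

terms_demo eth0 203.0.113.1 192.168.1.10
```

Note that the FORWARD rule matches on the web server's internal
address, because DNAT in the PREROUTING chain has already rewritten
the destination by the time the packet reaches the filter table.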
Installation and Setup
Before we can start setting up our packet filter, we need to check on
a few prerequisites and familiarize ourselves with a few things like
for example what kernel modules we need in order to do a particular
job or what files are involved in the process and where they live on
the filesystem.
Planning an IP Filter
One of the first things to think about when planning a firewall is
its placement. This should be a fairly simple step since our
networks should mostly be fairly well segmented anyway.
One of the first places that comes to mind is the gateway between our
local network(s) and the Internet. This is a place where there should
be fairly tight security. Also, in larger networks it may be a good
idea to separate different divisions from each other via firewalls.
For example, why should the development team have access to the human
resources network, or why not protect the finance department from
other networks? Simply put, we do not want an angry employee with a
pink slip tampering with the salary databases.
The above means that we should plan our networks as well as possible,
and plan them to be segregated, especially if the network is
medium-sized to large (50 workstations or more, depending on different
aspects of the network).
There are basically two choices here which can be mixed or used
standalone:
- In between these smaller networks, we try to put firewalls (OSI
layer 3 and higher) that will only allow the kind of traffic that
we would like, and/or
- we could use VLANs (OSI layer 2).
It may also be a good idea to create a DMZ (Demilitarized Zone) in
case we have servers that are reached from the Internet as well as
from the LAN. In essence, a DMZ is a small subnetwork with servers,
which is closed down to the extreme.
This lessens the risk of anyone actually getting into the machines in
the DMZ, and even more important, it lessens the risk of anyone
getting from those machines in the DMZ into our LAN, either by
actively trying to get into the LAN using the machines in the DMZ as
an intermediary layer or by placing backdoors and trojans on the DMZ
machines.
The machines within the DMZ thus are mostly hardened and stuffed with
all kinds of IDS (Intrusion Detection System) magic and other nifty
stuff like that.
There are a couple of ways to set up the policies and default
behaviors in a packet filter, and this section will discuss the
actual theory that we should think about before actually starting to
implement a packet filter.
Before we start, we should understand that most packet filters
and firewalls have a default behavior. For example, if no rule in a
specific chain matches, the packet can be either dropped or accepted
by default. Unfortunately, there is only one policy per chain, but
this is often easy to get around if we want to have different policies
per network interface etc.
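One common way around the one-policy-per-chain limitation is to jump
into a user-defined chain per interface and let the last rule of that
chain act as its default. The sketch below illustrates the idea; the
interface names eth1 and eth0, the chain names in_lan and in_wan and
the function name per_interface_defaults are assumptions, and the IPT
variable is a dry-run helper that prints the commands unless it is set
to iptables.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Run with IPT=iptables (as root) to actually apply the rules.
IPT="${IPT:-echo iptables}"

# per_interface_defaults LAN_IF WAN_IF
per_interface_defaults() {
    lan_if="$1"; wan_if="$2"
    # The built-in chain keeps its single policy as a safety net.
    $IPT -P INPUT DROP
    # One user-defined chain per interface.
    $IPT -N in_lan
    $IPT -N in_wan
    $IPT -A INPUT -i "$lan_if" -j in_lan
    $IPT -A INPUT -i "$wan_if" -j in_wan
    # Interface-specific rules go first ...
    $IPT -A in_wan -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # ... and a catch-all last rule plays the role of the policy,
    # since user-defined chains cannot have a policy of their own.
    $IPT -A in_lan -j ACCEPT
    $IPT -A in_wan -j DROP
}

per_interface_defaults eth1 eth0
```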
There are two basic policies that we normally use. Either we drop
everything except that which we specify (whitelisting), or we accept
everything except that which we specifically drop (blacklisting).
Most of the time, we want the drop policy and then explicitly accept
everything that we want to allow. This means that the firewall is more
secure by default, but it may also mean more work to get our packet
filter to operate properly.
Our first decision is to figure out which type of firewall we should
use, whitelisting or blacklisting. I always go for whitelisting, i.e.
drop anything by default.
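A whitelisting setup of this kind could be sketched with iptables as follows. This is a minimal sketch under stated assumptions, not a complete ruleset: the function name apply_whitelist_policy, the IPT variable and the choice of SSH (TCP port 22) as the only whitelisted service are illustrative.

```shell
#!/bin/sh
# Sketch of a whitelisting (default drop) policy.
# IPT is a hypothetical indirection: set IPT="echo iptables" for a dry run.
IPT=${IPT:-iptables}

apply_whitelist_policy() {
    # Default policy: drop whatever no rule explicitly accepts.
    $IPT -P INPUT DROP
    $IPT -P FORWARD DROP
    # Locally generated traffic is trusted in this sketch.
    $IPT -P OUTPUT ACCEPT
    # Always allow loopback traffic.
    $IPT -A INPUT -i lo -j ACCEPT
    # Whitelist entry (assumed example service): incoming SSH.
    $IPT -A INPUT -p tcp --dport 22 -j ACCEPT
}
```

Setting IPT="echo iptables" before calling the function prints the commands instead of executing them, which allows reviewing the rules before loading them as root. Note that replies to our own outgoing connections are not let back in by this sketch; that requires the connection tracking rules discussed in the State Machine section.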
Next, how big are the security concerns — what are we going to
protect with our packet filter? What kind of applications must be
able to communicate through the firewall?...
Overall, it is considered best practice to apply layered security
measures, i.e. we should use as many independent security measures as
possible/affordable at the same time, and not rely on a single
security concept.
For example, we could use a fully fledged, highly secured and
redundant, Linux packet filter for our main gateway between the
outside world (Internet) and our LANs (maybe including a DMZ) but also
harden each workstation. This way we would already introduce two
independent security layers and thus boost overall IT security.
In addition to hardening each workstation (such things can be done
very effectively using clusterssh, puppet or for example FAI (Fully
Automatic Installation) and the like) we could also apply some
minimalistic packet filter onto each workstation.
If that is not enough yet, we might go even further and set up some
IDS (Intrusion Detection System) like for example OSSEC and, last but
not least, set up a trustworthy SSH infrastructure using Monkeysphere.
However, what is utterly important, and therefore what should always
happen no matter what efforts we make on the technical side, is to
educate our users.
Finally, if we are diligent and consistent, we end up with what is
called a security protocol that describes every possible angle of our
security concept. Preferably this is some sort of document, set up for
collaborative work using some CMS (Content Management System) or SCM
(Software Configuration Management).
Every person who is serious about IT security maintains such a
security protocol. For medium and big corporate structures it is
mandatory anyway, and governments and their military have done so for
decades.
One last thing to note is that it is always a good thing to follow
standards, or rather to use software which adheres to standards.
As probably many of us have already seen with crappy things like for
example Skype or ICQ, if we do not use standardized systems, things
can go terribly wrong — Skype and ICQ use their own,
proprietary, communication protocols; no one exactly knows how they
work.
Instead of Skype folks should use QuteCom or Ekiga and instead of ICQ,
folks should go with XMPP (a standardized protocol) simply by using
Pidgin (or any other client that does XMPP). Please go here for more
information about configuring and using Pidgin.
Prerequisites
There is not really much to do here. All we need is a fairly up to
date Linux kernel and at least the iptables userspace tool —
xtables-addons-source, arptables , ebtables etc. are all optional. dpl
is an alias in my ~/.bashrc by the way.
sa@wks:~$ type dpl
dpl is aliased to `dpkg -l'
sa@wks:~$ dpl *tables* | grep ^ii | egrep -v lib\|dev
ii arptables 0.0.3.3-1 ARP table administration
ii ebtables 2.0.8.2-4 Ethernet bridge frame table administration
ii iptables 1.4.3.2-1 administration tools for packet filtering and
ii xtables-addons- 1.14-1 Source for the xtables-addons driver
sa@wks:~$ uname -r
2.6.30-1-openvz-amd64
sa@wks:~$
sysctl Settings
WRITEME
Files
There are certain files that we need to know about or that we should
at least know about:
/etc/protocols is a list of Internet protocols officially
acknowledged by IANA (Internet Assigned Numbers Authority).
/etc/services , /usr/share/nmap/nmap-services or the links
to Wikipedia as well as to http://www.graffiti.com/services provide
information on port numbers and their usage.
/etc/init.d/<name_of_shell_script_containing_ruleset> is our main
firewalling shell script which we use to store our ruleset. We use
update-rc.d in order to add/remove it to/from the various
runlevels. More information on the matter can be found here.
/etc/iproute2/rt_realms is used in Linux to classify routes into
logical groups of routes.
Kernel Modules
As mentioned above, there are certain modules which must be loaded for
a packet filter to function (e.g. x_tables) and then there are those
kernel modules which are optional, depending on what we are trying to
accomplish with our packet filter.
A nice example is if we take a look at module dependencies — as we
can see, if we wanted to use xt_tcpudp , x_tables gets loaded
automatically because it is listed as a dependency to xt_tcpudp .
sa@wks:/lib/modules/2.6.26-2-openvz-amd64$ grep xt_tcpudp modules.dep
kernel/net/netfilter/xt_tcpudp.ko: kernel/net/netfilter/x_tables.ko
sa@wks:/lib/modules/2.6.26-2-openvz-amd64$
We can get a list of available kernel modules using a one-liner as
shown below.
wks:/etc/init.d# modprobe -l xt_* | xargs -I {} basename {} | head
xt_realm.ko
xt_connlimit.ko
xt_RATEEST.ko
xt_pkttype.ko
xt_sctp.ko
xt_limit.ko
xt_tcpudp.ko
xt_TCPOPTSTRIP.ko
xt_NFLOG.ko
xt_conntrack.ko
wks:/etc/init.d#
However, right now (April 2009) the module name prefixes in use are
nf_, xt_, ipt_, ip6t_, arp_, arpt_ and ebt_, but that might all change
in the future, depending on the naming scheme the netfilter developers
eventually settle on.
Right now, no matter what the prefix is, they all use the xtables
framework anyway.
Another thing to notice is that modules whose names are written in
uppercase represent targets and those written in lowercase represent
matches: xt_RATEEST for example is a target whereas xt_realm is a match.
State Machine
The state machine is a special part within netfilter that should
really not be called the state machine at all, since it is really a
connection tracking machine.
However, most people recognize it under the name state machine — for
us it is only important to know that, in order to do connection
tracking, we need to have a state machine.
Connection tracking is done to let the netfilter framework know the
state of a specific connection. Firewalls that implement this are
generally called stateful firewalls (see types of firewalls). A
stateful firewall is generally much more secure than a non-stateful
firewall since it allows us to write much tighter rulesets.
Within netfilter, packets can be related to tracked connections in
four different so called states. These are known as
- NEW
- ESTABLISHED
- RELATED and
- INVALID
We will discuss each of these in more depth later. With the --state
match we can easily control who or what is allowed to initiate new
sessions.
All of the connection tracking is done by a special framework within
the kernel called conntrack. conntrack may be loaded either as a
kernel module, or as an internal part of the kernel itself. Most of
the time, we need and want more specific connection tracking than the
default conntrack engine can maintain.
Because of this, there are also more specific parts of conntrack that
handle the TCP, UDP or ICMP protocols among others. These modules
grab specific, unique information from the packets, so that they may
keep track of each stream of data.
The information that conntrack gathers is then used to determine which
state the stream is currently in. For example, UDP streams are,
generally, uniquely identified by their destination IP address,
source IP address, destination port and source port.
In previous kernels, we had the possibility to turn defragmentation on
and off. However, since iptables and netfilter were introduced, and
connection tracking in particular, this option has been removed.
The reason for this is that connection tracking can not work properly
without defragmenting packets, and hence defragmenting has been
incorporated into conntrack and is carried out automatically. It can
not be turned off, except by turning off connection tracking itself
i.e. defragmentation is always carried out if connection tracking is
turned on.
All connection tracking is handled in the PREROUTING chain, except
locally generated packets which are handled in the OUTPUT chain. What
this means is that netfilter will do all recalculation of states and
so on within the PREROUTING chain.
If we send the initial packet in a stream, the state gets set to NEW
within the OUTPUT chain, and when we receive a return packet, the
state gets changed in the PREROUTING chain to ESTABLISHED , and so on.
If the first packet does not originate from ourselves, the NEW state is
of course set within the PREROUTING chain. So, all state changes and
calculations are done at the PREROUTING and OUTPUT hooks, after the raw
table has been traversed and before the mangle and nat tables see the
packet.
The conntrack entries
Let us take a brief look at a conntrack entry and how to read it in
/proc/net/ip_conntrack. This file gives a list of all the current
entries in our conntrack database. If we have the conntrack module
loaded (ip_conntrack on older kernels, nf_conntrack on newer ones), we
can check the current connection tracking status:
wks:/home/sa# lsmod | grep conntrack
nf_conntrack_ipv4 24352 0
nf_conntrack 82688 1 nf_conntrack_ipv4
wks:/home/sa# cat /proc/net/ip_conntrack | head -n1
tcp 6 12 SYN_SENT src=192.168.1.4 dst=234.12.87.233 sport=40735 dport=30206 packets=11 bytes=765 [UNREPLIED] src=234.12.87.233 dst=192.168.1.4 sport=30206 dport=40735 packets=8 bytes=667 [ASSURED] mark=0 secmark=0 use=1
wks:/home/sa#
This example contains all the information that the conntrack module
maintains to know which state a specific connection is in.
First of all, we have the protocol, which in this case is tcp. Next
comes the same value as a decimal protocol number (6 for TCP). After
this, we see how long this conntrack entry has to live. This value is
set to 12 seconds right now and is decremented regularly until we see
more traffic.
The value is then reset to the default for the state that the entry is
in at that point in time. Next comes the actual state that this entry
is in at present. In the above case we are looking at a connection
that is in the SYN_SENT state. The internal state of a connection
differs slightly from the states used externally with netfilter.
The value SYN_SENT tells us that we are looking at a connection that
has only seen a TCP SYN packet in one direction. Next, we see the
source IP address, destination IP address, source port and destination
port. At this point we see the [UNREPLIED] keyword, which tells us
that we have seen no return traffic for this connection. Lastly, we
see what we expect of return packets: the source and destination IP
addresses (which are inverted, since the packet is to be directed back
to us) and likewise the source and destination ports of the
connection. These are the values that are of interest to us.
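The field order described above can be made explicit with a small awk filter. This is only a sketch: the function name parse_conntrack and the one-line output format are assumptions for illustration, and the filter handles just the fields discussed here.

```shell
#!/bin/sh
# Summarise conntrack entries read from stdin, one output line per entry.
# Function name and output format are illustrative assumptions.
parse_conntrack() {
    awk '{
        proto = $1; ttl = $3
        # TCP entries carry the internal state as field 4; UDP and ICMP
        # entries continue directly with src=...
        state = ($4 ~ /=/) ? "-" : $4
        src = dst = sport = dport = "-"
        for (i = 4; i <= NF; i++) {
            n = split($i, kv, "=")
            if (n != 2) continue
            # Keep the first occurrence of each key: the original direction.
            if (kv[1] == "src"   && src   == "-") src   = kv[2]
            if (kv[1] == "dst"   && dst   == "-") dst   = kv[2]
            if (kv[1] == "sport" && sport == "-") sport = kv[2]
            if (kv[1] == "dport" && dport == "-") dport = kv[2]
        }
        replied = ($0 ~ /\[UNREPLIED\]/) ? "no" : "yes"
        printf "%s ttl=%s state=%s %s:%s -> %s:%s replied=%s\n",
               proto, ttl, state, src, sport, dst, dport, replied
    }'
}
```

Fed with the entry shown above, for example via parse_conntrack < /proc/net/ip_conntrack as root, the first line would read tcp ttl=12 state=SYN_SENT 192.168.1.4:40735 -> 234.12.87.233:30206 replied=no.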
The connection tracking entries may take on a series of different
values, all specified in the conntrack headers available in
/usr/src/linux/include/net/netfilter/*.h files. These values are
dependent on which sub-protocol of IP we use.
TCP, UDP or ICMP protocols take specific default values as specified
in /usr/src/linux/include/net/netfilter/ip_conntrack.h . Also,
depending on how this state changes, the default value of the time
until the connection is destroyed will also change.
The tcp-window-tracking feature exposes all of the above timeouts as
special sysctl variables, which means that they can be changed on the
fly, while the system is still running. Hence, it is unnecessary to
recompile the kernel every time we want to change the timeouts.
They can be altered through the entries in the
/proc/sys/net/ipv4/netfilter directory. We should in particular look
at the /proc/sys/net/ipv4/netfilter/ip_ct_* variables.
When a connection has seen traffic in both directions, the conntrack
entry drops the [UNREPLIED] flag. That flag, which told us that the
connection had not yet seen traffic in both directions, is replaced by
the [ASSURED] flag, to be found close to the end of the entry.
The [ASSURED] flag tells us that this connection is assured and that
it will not be erased if we reach the maximum number of tracked
connections, contrary to the non-assured connections (those not marked
as [ASSURED]).
How many connections the connection tracking table can hold depends
upon a variable that can be set through the ip-sysctl functions in
recent kernels. The default value varies heavily depending on how much
memory we have. On 128 MB of RAM we will get 8192 possible entries,
and at 256 MB of RAM, we will get 16376 entries. We can read and set
this limit through the /proc/sys/net/ipv4/ip_conntrack_max variable:
sa@wks:~$ cat /proc/sys/net/ipv4/ip_conntrack_max
65536
sa@wks:~$
A different, more efficient way of doing this is to set the hashsize
option of the ip_conntrack module when it is loaded. Under normal
circumstances ip_conntrack_max equals 8 * hashsize.
In other words, setting the hashsize to 4096 will result in
ip_conntrack_max being set to 32768 conntrack entries. An example of
this would be:
wks:/home/sa# modprobe ip_conntrack hashsize=4096
wks:/home/sa# cat /proc/sys/net/ipv4/ip_conntrack_max
32768
wks:/home/sa#
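The relation between hashsize and ip_conntrack_max can be written down as plain shell arithmetic. This is a small sketch assuming the factor of 8 quoted above; both function names are made up for illustration.

```shell
#!/bin/sh
# ip_conntrack_max = 8 * hashsize (the rule of thumb quoted above).
conntrack_max_for_hashsize() {
    echo $(( $1 * 8 ))
}

# Inverse direction: the hashsize needed for a desired table size,
# rounded up to the next whole bucket count.
hashsize_for_conntrack_max() {
    echo $(( ($1 + 7) / 8 ))
}
```

For example, conntrack_max_for_hashsize 4096 prints 32768, matching the modprobe example, and hashsize_for_conntrack_max 65536 prints 8192.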
User-land states
As we have seen, packets may take on several different states within
the kernel itself, depending on what protocol we are talking about.
However, outside the kernel, we only have the 4 states described
previously, plus a special UNTRACKED state for packets that have been
exempted from tracking. These states can mainly be used in conjunction
with the state match, which will then be able to match packets based
on their current connection tracking state.
The valid states are NEW, ESTABLISHED, RELATED, INVALID and
UNTRACKED. The following list will briefly explain each possible state:
- NEW
The NEW state tells us that the packet is the first packet that we
see. This means that the first packet that the conntrack module sees,
within a specific connection, will be matched. For example, if we see
a SYN packet and it is the first packet in a connection that we see,
it will match. However, the packet may as well not be a SYN packet and
still be considered NEW. This may lead to certain problems in some
instances, but it may also be extremely helpful when we need to pick
up lost connections from other firewalls, or when a connection has
already timed out, but in reality is not closed.
- ESTABLISHED
The ESTABLISHED state has seen traffic in both directions and will
then continuously match those packets. ESTABLISHED connections are
fairly easy to understand. The only requirement to get into an
ESTABLISHED state is that one host sends a packet, and that it later
on gets a reply from the other host. The NEW state will, upon receipt
of the reply packet to (or through) the firewall, change to the
ESTABLISHED state. ICMP reply messages can also be considered as
ESTABLISHED, if we created a packet that in turn generated the reply
ICMP message.
- RELATED
The RELATED state is one of the more tricky states. A connection is
considered RELATED when it is related to another already ESTABLISHED
connection. What this means is that for a connection to be considered
as RELATED, we must first have a connection that is considered
ESTABLISHED. The ESTABLISHED connection will then spawn a connection
outside of the main connection. The newly spawned connection will then
be considered RELATED, if the conntrack module is able to understand
that it is RELATED. Some good examples of connections that can be
considered as RELATED are the FTP-data sessions that are considered
RELATED to the FTP control session, and the DCC (Direct
Client-to-Client) connections issued through IRC (Internet Relay
Chat). This could be used to allow ICMP error messages, FTP transfers
and DCCs to work properly through the firewall. Do note that most TCP
protocols and some UDP protocols that rely on this mechanism are quite
complex and send connection information within the payload of the TCP
or UDP data segments, and hence require special helper modules to be
correctly understood.
- INVALID
The INVALID state means that the packet cannot be identified or that
it does not have any state. This may be due to several reasons, such
as the system running out of memory or ICMP error messages that do not
correspond to any known connections. Generally, it is a good idea to
DROP everything in this state.
- UNTRACKED
This is the UNTRACKED state. In brief, if a packet is marked within
the raw table with the NOTRACK target, then that packet will show up
as UNTRACKED in the state machine. This also means that all RELATED
connections will not be seen, so some caution must be taken when
dealing with the UNTRACKED connections since the state machine will
not be able to see related ICMP messages etc.
These states can be used together with the --state match to match
packets based on their connection tracking state. This is what makes
the state machine so incredibly strong and efficient for our packet
filter. Previously, we often had to open up all ports above 1024 to
let all traffic back into our LAN again. With the state machine in
place this is not necessary any longer, since we can now just open
up the packet filter for return traffic and not for all kinds of
other traffic.
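Opening the packet filter for return traffic only, instead of for all high ports, could look roughly like the following. This is a minimal sketch: the function name and the IPT dry-run variable are assumptions, and the rules only cover the INPUT chain.

```shell
#!/bin/sh
# IPT is a hypothetical indirection: set IPT="echo iptables" for a dry run.
IPT=${IPT:-iptables}

allow_return_traffic() {
    # Packets that conntrack cannot classify are dropped first.
    $IPT -A INPUT -m state --state INVALID -j DROP
    # Return traffic of connections we know about, plus related flows
    # such as ICMP errors or helper-tracked data channels.
    $IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
}
```

Combined with a drop policy on the INPUT chain, nothing else needs to be opened for replies to our own connections.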
TCP connections
In this section and the upcoming ones, we will take a closer look at
the states and how they are handled for each of the three basic
protocols TCP, UDP and ICMP.
Also, we will take a closer look at how connections are handled per
default, if they cannot be classified as either of these three
protocols. We have chosen to start out with the TCP protocol since it
is a stateful protocol in itself, and has a lot of interesting details
with regard to the state machine in netfilter.
A TCP connection is always initiated with the 3-way handshake, which
establishes and negotiates the actual connection over which data will
be sent. The whole session is begun with a SYN packet, then a SYN/ACK
packet and finally an ACK packet to acknowledge the whole session
establishment. At this point the connection is established and able to
start sending data. The big problem is, how does connection tracking
hook up into this? Quite simply really.
As far as the user is concerned, connection tracking works basically
the same for all connection types. Have a look at the picture below to
see exactly what state the stream enters during the different stages
of the connection.
As we can see, the connection tracking code does not really follow the
flow of the TCP connection, from the user's viewpoint. Once it has seen
one packet (the SYN), it considers the connection as NEW. Once it sees
the return packet (SYN/ACK), it considers the connection as
ESTABLISHED.
If we think about this a second, we will understand why. With this
particular implementation, we can allow NEW and ESTABLISHED packets to
leave our LAN, only allow ESTABLISHED connections back, and that will
work perfectly.
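On a forwarding gateway, the policy just described could be sketched as follows. The interface names eth1 (LAN side) and eth0 (Internet side) as well as the function name are assumptions for illustration and need to be adapted to the actual setup.

```shell
#!/bin/sh
# IPT is a hypothetical indirection: set IPT="echo iptables" for a dry run.
IPT=${IPT:-iptables}
LAN_IF=${LAN_IF:-eth1}   # assumed name of the LAN-facing interface
WAN_IF=${WAN_IF:-eth0}   # assumed name of the Internet-facing interface

forward_lan_stateful() {
    # NEW and ESTABLISHED packets may leave the LAN.
    $IPT -A FORWARD -i "$LAN_IF" -o "$WAN_IF" -m state --state NEW,ESTABLISHED -j ACCEPT
    # Only packets of ESTABLISHED connections are allowed back in.
    $IPT -A FORWARD -i "$WAN_IF" -o "$LAN_IF" -m state --state ESTABLISHED -j ACCEPT
}
```

Since no rule accepts NEW packets arriving on the Internet-facing interface, connections cannot be initiated from the outside as long as the FORWARD policy is DROP.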
Conversely, if the connection tracking machine were to consider the
whole connection establishment as NEW , we would never really be able
to stop outside connections to our LAN, since we would have to allow
NEW packets back in again.
To make things more complicated, there are a number of other internal
states that are used for TCP connections inside the kernel, but which
are not available for us from userspace. Roughly, they follow the state
standards specified within RFC 793.
As we can see, it is really quite simple, seen from the user's point
of view. However, looking at the whole construction from the kernel's
point of view, it is a little more difficult. Let us look at an
example.
Consider exactly how the connection states change in the
/proc/net/ip_conntrack table. The first state is reported upon receipt
of the first SYN packet in a connection.
tcp 6 117 SYN_SENT src=192.168.1.5 dst=192.168.1.35 sport=1031 dport=23 [UNREPLIED] src=192.168.1.35 dst=192.168.1.5 sport=23 dport=1031 use=1
As we can see from the above entry, we have a precise state in which a
SYN packet has been sent, (the SYN_SENT flag is set), and to which as
yet no reply has been sent (witness the [UNREPLIED] flag). The next
internal state will be reached when we see another packet in the other
direction.
tcp 6 57 SYN_RECV src=192.168.1.5 dst=192.168.1.35 sport=1031 dport=23 src=192.168.1.35 dst=192.168.1.5 sport=23 dport=1031 use=1
Now we have received a corresponding SYN/ACK in return. As soon as
this packet has been received, the state changes once again, this time
to SYN_RECV . SYN_RECV tells us that the original SYN was delivered
correctly and that the SYN/ACK return packet also got through the
firewall properly.
Moreover, this connection tracking entry has now seen traffic in both
directions and is hence considered as having been replied to. This is
not shown explicitly, but rather implied by the absence of the
[UNREPLIED] flag seen above.
The final step will be reached once we have seen the final ACK in the
3-way handshake.
tcp 6 431999 ESTABLISHED src=192.168.1.5 dst=192.168.1.35 sport=1031 dport=23 src=192.168.1.35 dst=192.168.1.5 sport=23 dport=1031 [ASSURED] use=1
In the last example, we have gotten the final ACK in the 3-way
handshake and the connection has entered the ESTABLISHED state, as far
as the internal mechanisms of netfilter are aware. Normally, the
stream will be ASSURED by now.
A connection may also enter the ESTABLISHED state, but not be
[ASSURED]. This happens if we have connection pickup turned on (this
requires tcp-window-tracking, with ip_conntrack_tcp_loose set to 1 or
higher). Without tcp-window-tracking, this behavior is the default and
cannot be changed.
When a TCP connection is closed down, it is done in the following way
and takes the following states.
As we can see, the connection is never really closed until the last
ACK is sent. Do note that this picture only describes how it is closed
down under normal circumstances. A connection may also, for example,
be closed by sending a RST (reset), if the connection were to be
refused. In this case, the connection would be closed down
immediately.
When the TCP connection has been closed down, the connection enters
the TIME_WAIT state, which is per default set to 2 minutes. This is
used so that all packets that have gotten out of order can still get
through our rule-set, even after the connection has already closed.
This is used as a kind of buffer time so that packets that have gotten
stuck in one or another congested router can still get to the
firewall, or to the other end of the connection.
If the connection is reset by a RST packet, the state is changed to
CLOSE . This means that the connection per default has 10 seconds
before the whole connection is definitely closed down.
RST packets are not acknowledged in any sense, and will break the
connection directly. There are also other states than the ones we have
discussed so far.
Below is the complete list of possible states that a TCP stream may
take, and their timeout values (format is <state>: <timeout> ):
NONE : 30 minutes
ESTABLISHED : 5 days
SYN_SENT : 2 minutes
SYN_RECV : 60 seconds
FIN_WAIT : 2 minutes
TIME_WAIT : 2 minutes
CLOSE : 10 seconds
CLOSE_WAIT : 12 hours
LAST_ACK : 30 seconds
LISTEN : 2 minutes
These values are most definitely not absolute. They may change with
kernel revisions, and they may also be changed via the proc
file-system in the /proc/sys/net/ipv4/netfilter/ip_ct_tcp_* variables.
The default values should, however, be fairly well established in
practice. These values are set in seconds.
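Since the proc entries hold plain seconds, a small helper makes them easier to compare with the list above. This is a sketch only: both function names are made up, and the nf_conntrack_* and ip_conntrack_* globs are assumptions about where other kernel versions expose the same values, alongside the ip_ct_tcp_* location named in the text.

```shell
#!/bin/sh
# Convert a number of seconds into the largest unit that divides it evenly.
humanize_seconds() {
    s=$1
    if   [ "$s" -eq 0 ];            then echo "0 s"
    elif [ $((s % 86400)) -eq 0 ];  then echo "$((s / 86400)) d"
    elif [ $((s % 3600)) -eq 0 ];   then echo "$((s / 3600)) h"
    elif [ $((s % 60)) -eq 0 ];     then echo "$((s / 60)) min"
    else                                 echo "$s s"
    fi
}

# Print every TCP conntrack timeout found under the candidate locations.
show_tcp_timeouts() {
    for f in /proc/sys/net/ipv4/netfilter/ip_ct_tcp_* \
             /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_* \
             /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_*; do
        # Skip patterns that matched nothing and unreadable entries.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "${f##*/}" "$(humanize_seconds "$(cat "$f")")"
    done
}
```

For example, humanize_seconds 432000 prints 5 d, which is the ESTABLISHED timeout from the list above.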
Also note that the userspace side of the state machine does not look
at the TCP flags (RST, ACK and SYN are such flags) set in the TCP
packets. This is generally bad, since we may want to allow packets in
the NEW state to get through the firewall, but when we specify the NEW
state, we will in most cases mean SYN packets.
This is not what happens with the current state implementation —
instead, even a packet with no bit set or an ACK flag, will count as
NEW . This can be used for redundant firewalling and so on, but it is
generally extremely bad on our home network, where we only have a
single firewall.
To get around this behavior, we could use the tcp-window-tracking
feature, and set /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_loose
to zero, which will make the packet filter drop all NEW packets with
anything but the SYN flag set.
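As an alternative to the sysctl setting, the same effect can be approximated with an explicit rule. Note that this is a different technique from the one described above, shown here as a minimal sketch; the function name and the IPT dry-run variable are assumptions.

```shell
#!/bin/sh
# IPT is a hypothetical indirection: set IPT="echo iptables" for a dry run.
IPT=${IPT:-iptables}

drop_new_without_syn() {
    # A TCP packet that conntrack classifies as NEW but that is not a
    # plain SYN does not start a legitimate connection: drop it.
    $IPT -A INPUT -p tcp ! --syn -m state --state NEW -j DROP
}
```

The rule has to be placed before any rule that accepts NEW packets. On redundant firewall setups that rely on connection pickup it should be left out.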
UDP connections
UDP connections are in themselves not stateful connections, but rather
stateless. There are several reasons why, mainly because they do not
contain any connection establishment or connection closing — most of
all they lack sequencing.
Receiving two UDP datagrams in a specific order does not say anything
about the order in which they were sent. It is, however, still
possible to set states on the connections within the kernel. Let us
have a look at how a connection can be tracked and how it might look
in conntrack.
As we can see, the connection is brought up almost exactly in the same
way as a TCP connection. That is, from the user point of view.
Internally, conntrack information looks quite a bit different, but
intrinsically the details are the same. First of all, let us have a
look at the entry after the initial UDP packet has been sent:
udp 17 20 src=192.168.1.2 dst=192.168.1.5 sport=137 dport=1025 [UNREPLIED] src=192.168.1.5 dst=192.168.1.2 sport=1025 dport=137 use=1
As we can see from the first and second values, this is a UDP packet.
The first is the protocol name, and the second is the protocol number.
This is just the same as for TCP connections. The third value marks
how many seconds this state entry has to live. After this, we get the
values of the packet that we have seen and the future expectations of
packets over this connection reaching us from the initiating packet
sender.
These are the source, destination, source port and destination port.
At this point, the [UNREPLIED] flag tells us that there has so far
been no response to the packet. Finally, we get a brief list of the
expectations for returning packets. Do note that the latter entries
are in reverse order to the first values. The timeout at this point is
set to 30 seconds, as per default. Once a reply has been seen, the
entry changes:
udp 17 170 src=192.168.1.2 dst=192.168.1.5 sport=137 dport=1025 src=192.168.1.5 dst=192.168.1.2 sport=1025 dport=137 [ASSURED] use=1
At this point the server has seen a reply to the first packet sent out
and the connection is now considered as ESTABLISHED . This is not shown
in the connection tracking, as we can see. The main difference is that
the [UNREPLIED] flag has now gone.
Moreover, the default timeout has changed to 180 seconds. In this
example it has by now been decremented to 170 seconds; in 10 seconds
time, it will be 160 seconds.
There is one thing that is missing, though, and can change a bit, and
that is the [ASSURED] flag described above. For the [ASSURED] flag to
be set on a tracked connection, there must have been a legitimate
reply packet to the NEW packet:
udp 17 175 src=192.168.1.5 dst=195.22.79.2 sport=1025 dport=53 src=195.22.79.2 dst=192.168.1.5 sport=53 dport=1025 [ASSURED] use=1
At this point, the connection has become assured. The connection looks
exactly the same as the previous example. If this connection is not
used for 180 seconds, it times out. 180 seconds is a comparatively low
value, but should be sufficient for most use. This value is reset to
its full value for each packet that matches the same entry and passes
through the packet filter, just the same as for all of the internal
states.
ICMP connections
ICMP packets are far from a stateful stream, since they are only used
for controlling and should never establish any connections.
There are four ICMP types that will generate return packets, however,
and these have 2 different states. These ICMP messages can take the
NEW and ESTABLISHED states. The ICMP types we are talking about are
Echo request and Echo reply, Timestamp request and Timestamp reply,
Information request and Information reply, and finally Address mask
request and Address mask reply.
Out of these, the timestamp request and information request are
obsolete and could most probably just be dropped. However, the Echo
messages are used in several setups such as pinging hosts. Address
mask requests are not used often, but could be useful at times and
worth allowing. To get an idea of how this could look, have a look at
the following image.
As we can see in the above picture, the host sends an echo request to
the target, which is considered as NEW by the firewall. The target
then responds with an echo reply which the firewall considers as state
ESTABLISHED.
When the first echo request has been seen, the following state entry
goes into the ip_conntrack table.
icmp 1 25 src=192.168.1.6 dst=192.168.1.10 type=8 code=0 id=33029 [UNREPLIED] src=192.168.1.10 dst=192.168.1.6 type=0 code=0 id=33029 use=1
This entry looks a little bit different from the standard states for
TCP and UDP as we can see. The protocol is there, and the timeout, as
well as source and destination addresses.
The difference comes after that, however. We now have 3 new fields
called type, code and id. They are not special in any way: the type
field contains the ICMP type and the code field contains the ICMP
code. The final id field contains the ICMP ID.
Each ICMP packet gets an ID set to it when it is sent, and when the
receiver gets the ICMP message, it sets the same ID within the new
ICMP message so that the sender will recognize the reply and will be
able to connect it with the correct ICMP request.
The next field, we once again recognize as the [UNREPLIED] flag, which
we have seen before. Just as before, this flag tells us that we are
currently looking at a connection tracking entry that has seen only
traffic in one direction.
Finally, we see the reply expectation for the reply ICMP packet, which
is the inversion of the original source and destination IP addresses.
As for the type and code, these are changed to the correct values for
the return packet, so an echo request is changed to echo reply and so
on. The ICMP ID is preserved from the request packet.
The reply packet is considered as being ESTABLISHED , as we have
already explained. However, we can know for sure that after the ICMP
reply, there will be absolutely no more legal traffic in the same
connection. For this reason, the connection tracking entry is
destroyed once the reply has traveled all the way through the
netfilter structure.
In each of the above cases, the request is considered as NEW , while
the reply is considered as ESTABLISHED . Let us consider this more
closely. When the firewall sees a request packet, it considers it as
NEW . When the host sends a reply packet to the request it is
considered ESTABLISHED .
Note that this means that the reply packet must match the criterion
given by the connection tracking entry to be considered as
established, just as with all other traffic types.
ICMP requests have a default timeout of 30 seconds, which we can change
in the /proc/sys/net/ipv4/netfilter/ip_ct_icmp_timeout entry. This
should in general be a good timeout value, since it will be able to
catch most packets in transit.
Another hugely important part of ICMP is the fact that it is used to
tell the hosts what happened to specific UDP and TCP connections or
connection attempts.
For this simple reason, ICMP replies will very often be recognized as
RELATED to original connections or connection attempts. A simple
example would be ICMP Host unreachable or ICMP Network unreachable.
These should always be sent back to our host if it attempts a
connection to some other host while the network or host in question
is down; the last router trying to reach the site in question then
replies with an ICMP message telling us about it. In this case, the
ICMP reply is considered as a RELATED packet. The following picture
should explain how it would look.
In the above example, we send out a SYN packet to a specific address.
This is considered as a NEW connection by the packet filter. However,
the network the packet is trying to reach is unreachable, so a router
returns a network unreachable ICMP error to us.
The connection tracking code can recognize this packet as RELATED,
thanks to the already added tracking entry, so the ICMP reply is
correctly sent to the client, which will then hopefully abort.
Meanwhile, the firewall has destroyed the connection tracking entry
since it knows this was an error message.
The same behavior as above is experienced with UDP connections if they
run into any problem like the above. All ICMP messages sent in reply
to UDP connections are considered RELATED. Consider the following
image.
This time a UDP packet is sent to the host. This UDP connection is
considered NEW. However, the network is administratively prohibited
by some firewall or router on the way over.
Hence, our packet filter receives an ICMP Network Prohibited in return.
The packet filter knows that this ICMP error message is related to the
already opened UDP connection and sends it as a RELATED packet to the
client.
At this point, the packet filter destroys the connection tracking
entry, and the client receives the ICMP message and should hopefully
abort.
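The state handling described above could be expressed in rules roughly
as follows. This is only a sketch: the ipt wrapper and the function
name are made up for illustration, and by default the wrapper merely
prints each command instead of applying it.

```shell
# Dry run by default: each command is printed, not applied.
# Run with IPT=iptables (as root) to apply the rules for real.
ipt() { ${IPT:-echo iptables} "$@"; }

icmp_state_rules() {
    # Replies to our own requests (ESTABLISHED) and ICMP errors that
    # belong to tracked TCP/UDP connections (RELATED) are let in.
    ipt -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # Echo requests from outside create a NEW entry; accept them explicitly.
    ipt -A INPUT -p icmp --icmp-type echo-request -m conntrack --ctstate NEW -j ACCEPT
    # ICMP that conntrack could not associate with anything is dropped.
    ipt -A INPUT -p icmp -j DROP
}

icmp_state_rules
```

With these rules in place, a Network unreachable error for one of our
own connection attempts is accepted by the first rule, while unrelated
ICMP errors end up at the last one.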
Default connections
In certain cases, the conntrack machine does not know how to handle a
specific protocol. This happens if it does not know about that
protocol in particular, or does not know how it works.
In these cases, it goes back to a default behavior. The default
behavior is used on, for example, NETBLT, MUX and EGP.
This behavior looks pretty much the same as the UDP connection
tracking. The first packet is considered NEW, and reply traffic and so
forth is considered ESTABLISHED.
When the default behavior is used, all of these packets will attain
the same default timeout value. This can be set via the
/proc/sys/net/ipv4/netfilter/ip_ct_generic_timeout variable.
The default value here is 600 seconds, or 10 minutes. Depending on
what traffic we are trying to send over a link that uses the default
connection tracking behavior, this might need changing, especially if
we are bouncing traffic through satellites and such, which can take a
long time.
Untracked connections and the raw table
UNTRACKED is a rather special keyword when it comes to connection
tracking in Linux. Basically, it is used to match packets that have
been marked in the raw table not to be tracked.
The raw table was created specifically for this reason. In this table,
we set a NOTRACK mark on packets that we do not wish to track in
netfilter.
Notice that we say packets, not connections, since the mark is actually
set for each and every packet that enters. Otherwise, we would still
have to do some kind of tracking of the connection to know that it
should not be tracked.
As we have already stated, conntrack and the state machine are rather
resource hungry. For this reason, it might sometimes be a good idea to
turn off connection tracking and the state machine.
One example would be a heavily trafficked router on which we want to
firewall the traffic to and from the router itself, but not the routed
traffic.
We could then ACCEPT all packets destined for our host in the raw
table, and set the NOTRACK mark on all other traffic.
This would then allow us to have stateful matching on incoming traffic
for the router itself, but at the same time save processing power by
not handling all the crossing traffic.
Another example when NOTRACK can be used is if we have a highly
trafficked web server and want to do stateful tracking, but do not
want to waste processing power on tracking the web traffic.
We could then set up a rule that turns off tracking for port 80 on all
the locally owned IP addresses, or the ones that are actually serving
web traffic.
We could then enjoy stateful tracking on all other services, except
for web traffic, which might save some processing power on an already
overloaded system.
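The web server case could look roughly like the sketch below. The ipt
wrapper and function name are illustrative only, and the wrapper just
prints the commands unless IPT=iptables is set. Newer iptables
releases may expect the target to be written as -j CT --notrack
instead of -j NOTRACK.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

notrack_web() {
    # raw table: exempt web traffic from tracking in both directions.
    ipt -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
    ipt -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK
    # filter table: untracked packets carry no state, so they have to be
    # matched explicitly as UNTRACKED.
    ipt -A INPUT -p tcp --dport 80 -m conntrack --ctstate UNTRACKED -j ACCEPT
    ipt -A OUTPUT -p tcp --sport 80 -m conntrack --ctstate UNTRACKED -j ACCEPT
    # All other services are still tracked and matched statefully.
    ipt -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
}

notrack_web
```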
There are, however, some problems with NOTRACK that we must take into
consideration. If a whole connection is set with NOTRACK, then we will
not be able to track related connections either: conntrack and NAT
helpers will simply not work for untracked connections, nor will
related ICMP errors be recognized, i.e. we will have to open up for
these manually.
When it comes to complex protocols such as FTP and SCTP, this can be
very hard to manage. As long as we are aware of this, however, we
should be able to handle it.
Complex protocols and connection tracking
Certain protocols are more complex than others. What this means when
it comes to connection tracking, is that such protocols may be harder
to track correctly.
Good examples of these are the ICQ, IRC and FTP protocols. Each and
every one of these protocols carries information within the actual
data payload of the packets, and hence requires special connection
tracking helpers to enable it to function correctly.
The complex protocols that have support inside the Linux kernel are
FTP (File Transfer Protocol), IRC (Internet Relay Chat) and TFTP
(Trivial File Transfer Protocol).
Let us take the FTP protocol as the first example. The FTP protocol
first opens up a single connection that is called the FTP
control session.
When we issue commands through this session, other ports are opened to
carry the rest of the data related to that specific command. These
connections can be done in two ways, either actively or passively.
When a connection is done actively, the FTP client sends the server a
port and IP address to connect to. After this, the FTP client opens up
the port and the server connects to that specified port from its own
port 20 (the FTP-data port) and sends the data over it.
The problem here is that the packet filter will not know about these
extra connections, since they were negotiated within the actual
payload of the protocol data.
Because of this, the firewall will be unable to know that it should
let the server connect to the client over these specific ports.
The solution to this problem is to add a special helper to the
connection tracking module which will scan through the data in the
control connection for specific syntaxes and information.
When it runs into the correct information, it will add that specific
information as RELATED, and the server will be able to make the
connection thanks to that RELATED entry. Consider the following
picture to understand the states when the FTP server has made the
connection back to the client.
Passive FTP works the opposite way. The FTP client tells the server
that it wants some specific data, upon which the server replies with
an IP address and port to connect to.
The client will, upon receipt of this data, connect to that specific
port from an unprivileged port of its own (1024 or above), and get
the data in question.
If we have an FTP server behind our packet filter, we will require
this module in addition to our standard netfilter modules to let
clients on the Internet connect to the FTP server properly. The same
goes if we are extremely restrictive with our users, and only want to
let them reach HTTP and FTP servers on the Internet and block all
other ports. Consider the following image and its bearing on Passive
FTP.
Some conntrack helpers are already available within the kernel itself.
More specifically, the FTP and IRC protocols have conntrack helpers as
of writing this. If we can not find the conntrack helpers that we need
within the kernel itself, we should have a look at the xtables-addons
package.
The xtables-addons tree may contain more conntrack helpers, such as
for the ntalk or H.323 protocols. If they are not available in the
xtables-addons tree, we have a number of options.
Either we can look at the CVS source of netfilter, if it has recently
gone into that tree, or we can contact the netfilter-devel mailing
list and ask if it is available.
Conntrack helpers may either be statically compiled into the kernel,
or be available as kernel modules:
wks:/home/sa# modprobe -l *conntrack* | xargs -I {} basename {}
nf_conntrack_proto_sctp.ko
nf_conntrack_netlink.ko
nf_conntrack_h323.ko
nf_conntrack_tftp.ko
nf_conntrack_irc.ko
xt_conntrack.ko
nf_conntrack_ftp.ko
nf_conntrack_netbios_ns.ko
nf_conntrack_amanda.ko
nf_conntrack_sane.ko
nf_conntrack_proto_udplite.ko
nf_conntrack_pptp.ko
nf_conntrack.ko
nf_conntrack_proto_gre.ko
nf_conntrack_sip.ko
nf_conntrack_proto_dccp.ko
nf_conntrack_ipv6.ko
nf_conntrack_ipv4.ko
wks:/home/sa#
If they are compiled as modules, we can load them using modprobe as
shown below:
wks:/home/sa# modprobe nf_conntrack_ftp
wks:/home/sa# lsmod | grep conntrack
nf_conntrack_amanda 8832 0
nf_conntrack_irc 10680 0
nf_conntrack_ftp 12728 0
nf_conntrack 82688 3 nf_conntrack_amanda,nf_conntrack_irc,nf_conntrack_ftp
wks:/home/sa#
Do note that the conntrack helpers only track connections and do not
rewrite addresses themselves, and hence we may require more modules if
we are NAT'ing connections as well.
For example, if we want to NAT and track FTP connections, we would
need the NAT module as well. As of now (April 2009), all NAT helpers
start with nf_nat_ and follow that naming convention, i.e. the FTP NAT
helper would be named nf_nat_ftp and the IRC NAT helper would be named
nf_nat_irc. The conntrack helpers follow the same naming convention,
and hence the IRC conntrack helper would be named nf_conntrack_irc,
while the FTP conntrack helper would be named nf_conntrack_ftp.
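To tie the helper to actual rules, here is a hedged sketch for LAN
clients reaching FTP servers through the packet filter. It assumes
eth1 faces the LAN and eth0 the Internet, and that the FTP helper
modules named above are loaded; the ipt wrapper and function name are
illustrative and only print the commands by default.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

ftp_client_rules() {
    # Control session from the LAN to any FTP server.
    ipt -A FORWARD -i eth1 -o eth0 -p tcp --dport 21 \
        -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
    # Return traffic, plus the active or passive data connections that
    # the FTP helper has flagged as RELATED.
    ipt -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # If the LAN is NATed, nf_nat_ftp must be loaded next to
    # nf_conntrack_ftp so the addresses inside the payload are rewritten.
}

ftp_client_rules
```

Without the helper, the second rule would never see the data
connection as RELATED and the transfer would stall after login. On
newer kernels the helper may additionally have to be assigned to the
control connection explicitly; that is outside this sketch.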
Tables / Chains / Rules
The xtables framework, used by the modules ip_tables, ip6_tables,
arp_tables and ebtables, allows us to define tables containing chains
of rules for the treatment of packets.
The tables we know are filter, nat, mangle and raw. Each table is
associated with a different kind of packet processing. Packets are
processed by traversing the chains, rule by rule. A rule in a chain
can send a packet to another chain, and this can be repeated to
whatever level of nesting is desired.
Every network packet arriving at or leaving from the computer
traverses at least one chain. The source of the packet determines
which chain it traverses initially.
There are three predefined chains (INPUT, OUTPUT and FORWARD) in the
filter table. Predefined chains have a default policy, for example
DROP, which is applied to the packet if it reaches the end of the
chain.
However, we can create as many other custom chains as desired. These
chains have no default policy, i.e. if a packet reaches the end of the
chain it is returned to the chain which called it. A chain may be
empty.
Each rule in a chain contains the specification of which packets it
matches. It may also contain a target. As a packet traverses a chain,
each rule in turn examines it. If a rule does not match the packet,
the packet is passed on to the next rule.
If a rule does match the packet, the rule takes the action indicated
by the target, which may result in the packet being allowed to
continue along the chain or not. The packet continues to traverse the
chain until either
- a rule matches the packet and decides the ultimate fate of the
packet, e.g. by calling a target such as ACCEPT, DROP or QUEUE, or
- a rule calls the RETURN target, in which case processing returns to
the calling chain, or
- the end of the chain is reached.
Packet Flow and Relationship between Tables and Chains
It is now time to take a look at the actual packet flow and how the
pieces (tables and chains) actually fit together.
When packets traverse a packet filter there is a certain order
applied to that process. The image below is going to help us get to
grips with the whole routing/filtering/mangling/NATing shebang that
may happen to an IP packet as it traverses a packet filter.
When an IP packet first enters the packet filter, it hits the hardware
and then gets passed on to the proper device driver in the kernel.
Then the packet starts to travel through a series of steps in the
Linux kernel, before it is either sent to the correct application
(i.e. a local process) running on the local machine, or it is
forwarded to another machine.
Let us now take a look at the three major cases that might happen with
regards to packet filtering/routing:
- inbound i.e. Source: non-local (e.g. Internet), Destination: local
process
- outbound i.e. Source: local process, Destination: outgoing (e.g.
to the Internet)
- forwarding i.e. Source: non-local, Destination: non-local
Inbound: Source = non-local, Destination = Local Process
First, let us have a look at a packet that is destined for our local
machine i.e. a local process. It would pass through the following
steps before it is actually being delivered to our application that
receives it:
- On the wire e.g. Internet.
- Comes in on the interface e.g. eth0.
- table: raw, chain: PREROUTING. This chain is used to handle packets
before the connection tracking takes place. It can be used, for
example, to exempt a specific connection from being handled by the
connection tracking code.
- Next the connection tracking takes place.
- table: mangle, chain: PREROUTING. This chain is normally used for
mangling packets, e.g. changing TOS (Type of Service) and so on.
- table: nat, chain: PREROUTING. This chain is used mainly for DNAT.
We should avoid filtering in this chain since it will be bypassed
in certain cases.
- Next a routing decision takes place, i.e. is the packet destined
for our local machine or to be forwarded. It is also important to
note that part of concluding a routing decision is doing
ingress filtering and/or making QoS (Quality of Service) routing
decisions.
- table: mangle, chain: INPUT. We use this chain to mangle packets
after they have been routed, but before they are actually sent to
the local process on our local machine.
- table: filter, chain: INPUT. This is where we do filtering for all
incoming traffic destined for our local machine. Note that all
incoming packets destined for this machine pass through this chain,
no matter what interface or in which direction they came from.
- Local process receives the IP packet.
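One way to watch this order on a live system is to put a LOG rule at
the top of every chain in the list above. The sketch below does that
for ICMP echo requests; the ipt wrapper and the function name are
illustrative, and the wrapper only prints the commands unless
IPT=iptables is set.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

trace_inbound() {
    # table:chain pairs in the order an inbound packet visits them.
    for tc in raw:PREROUTING mangle:PREROUTING nat:PREROUTING \
              mangle:INPUT filter:INPUT; do
        table=${tc%%:*}
        chain=${tc##*:}
        # Insert as rule 1 so the packet is logged before any other rule
        # in that chain can decide its fate.
        ipt -t "$table" -I "$chain" 1 -p icmp --icmp-type echo-request \
            -j LOG --log-prefix "$table/$chain: "
    done
}

trace_inbound
```

After pinging the machine from another host, the kernel log should
show the prefixes in the listed order. The nat line will appear less
often than the others, because only the first packet of a connection
traverses the nat table.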
Outbound: Source = Local Process, Destination = Outgoing
Now we look at the outgoing packets from our own local host and what
steps they go through.
- Local process creates an IP packet and sends it to the Linux kernel
network stack for processing.
- The IP packet is given a source address, the outgoing interface to
use, and other necessary information that needs to be gathered.
Based upon that information, a routing decision is made.
- table: raw, chain: OUTPUT. This is where we do work before the
connection tracking takes place for locally generated packets; for
example, we can mark connections so that they will not be tracked.
- This is where the connection tracking takes place for locally
generated packets, for example state changes etc.
- table: mangle, chain: OUTPUT. This is where we mangle packets. It
is suggested that we do not filter in this chain since it can have
side effects.
- table: nat, chain: OUTPUT. This chain can be used to NAT outgoing
packets from the firewall itself.
- Routing decision, since the previous mangle and nat changes may
have changed how the packet should be routed.
- table: filter, chain: OUTPUT. This is where we filter packets going
out from the local machine.
- table: mangle, chain: POSTROUTING. The POSTROUTING chain in the
mangle table is mainly used when we want to do mangling on packets
before they leave our machine, but after the actual routing took
place. This chain will be hit by both packets just traversing the
firewall, as well as packets created by the firewall itself.
- table: nat, chain: POSTROUTING. This is where we do SNAT. It is
suggested that we do not do filtering here since it can have side
effects, and certain packets might slip through even though we set
a default policy of DROP.
- The IP packet goes out on some interface e.g. eth0. We might also
do egress filtering at this point, which certainly is a good idea in
case our firewall is the gateway for one or several LANs.
- It is now on the wire e.g. Internet.
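For the outbound case, filtering belongs in the OUTPUT chain of the
filter table. A minimal sketch, assuming the local machine only needs
DNS and web access; the chosen ports, the ipt wrapper and the function
name are illustrative, and the wrapper prints the commands by default.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

output_rules() {
    # Local processes talking to each other over loopback.
    ipt -A OUTPUT -o lo -j ACCEPT
    # Packets that belong to connections we already allowed.
    ipt -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # New connections: DNS lookups and web traffic only.
    ipt -A OUTPUT -p udp --dport 53 -j ACCEPT
    ipt -A OUTPUT -p tcp -m multiport --dports 80,443 -j ACCEPT
    # Set the policy last so an interrupted run does not cut us off.
    ipt -P OUTPUT DROP
}

output_rules
```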
Forwarding: Source = non-local, Destination = non-local
Now we are assuming that the packet is destined for another machine on
another network. The packet goes through the different steps in the
following fashion:
- On the wire e.g. Internet.
- IP packet comes in on the interface e.g. eth0.
- table: raw, chain: PREROUTING. Here we can set a connection to
not be handled by the connection tracking system.
- This is where the non-locally generated connection tracking takes
place.
- table: mangle, chain: PREROUTING. This chain is normally used for
mangling packets e.g. changing TOS and so on.
- table: nat, chain: PREROUTING. This chain is used mainly for DNAT.
SNAT is done further on. We should avoid filtering in this chain
since it will be bypassed in certain cases.
- Routing decision takes place, i.e. is the packet destined for our
local machine or to be forwarded and where. It is also important to
note that part of concluding a routing decision is doing
ingress filtering and/or making QoS (Quality of Service) routing
decisions.
- table: mangle, chain: FORWARD. The packet is then sent on to the
FORWARD chain of the mangle table. This can be used for very
specific needs, where we want to mangle the packets after the
initial routing decision, but before the last routing decision is
made, just before the packet is sent out.
- table: filter, chain: FORWARD. The packet gets routed onto the
FORWARD chain. Only forwarded packets go through here, and here we
do all the filtering. Note that all traffic that is forwarded goes
through here (i.e. not only incoming), so we need to think about it
when writing our rule-set. We should not use the INPUT chain to
filter on in the current forwarding scenario! INPUT is meant solely
for packets to our local machine that do not get routed to any
other destination.
- table: mangle, chain: POSTROUTING. This chain is used for
specific types of packet mangling that we wish to take place
after all kinds of routing decisions have been made, but still on
this machine.
- table: nat, chain: POSTROUTING. This chain should first and
foremost be used for SNAT. Again, we should avoid doing filtering
here, since certain packets might bypass this chain entirely. This
is also where masquerading is done.
- Finally the IP packet goes out on the outgoing interface e.g.
eth1. We might also do egress filtering at this point, which
certainly is a good idea in case our firewall is the gateway for
one or several LANs.
- Out on the wire again e.g. LAN.
As we can see, there are quite a lot of steps to pass through. The
packet can be stopped at any of the iptables chains, or anywhere else
if it is malformed. However, we are mainly interested in the packet
filtering aspect right now.
Do note that there are no specific chains or tables for different
interfaces or anything like that. FORWARD is always passed by all
packets that are forwarded over this firewall/router.
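Since FORWARD sees the forwarded traffic of every interface, direction
has to be expressed with -i and -o in the rules themselves. A sketch,
assuming eth0 faces the Internet and eth1 the LAN; the ipt wrapper and
function name are illustrative, and the wrapper prints the commands by
default.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

forward_rules() {
    ipt -P FORWARD DROP
    # The LAN (eth1) may open connections towards the Internet (eth0).
    ipt -A FORWARD -i eth1 -o eth0 \
        -m conntrack --ctstate NEW,ESTABLISHED,RELATED -j ACCEPT
    # Only reply and related traffic may come back in.
    ipt -A FORWARD -i eth0 -o eth1 \
        -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # Source addresses are rewritten in nat POSTROUTING, after filtering.
    ipt -t nat -A POSTROUTING -o eth0 -j MASQUERADE
}

forward_rules
```

None of this has any effect unless the kernel is allowed to forward at
all, i.e. net.ipv4.ip_forward is set to 1.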
The Routing Tables in Detail
Now is the right time to take a closer look at the four netfilter
tables, why they exist, what they do in particular and what their
relationship is.
raw
The raw table and its chains are used before any other tables in
netfilter. For this table to work, the iptable_raw module must be
loaded. It will be loaded automatically if iptables is run with the -t
raw keyword, and if the module is available.
- raw is mainly used for one thing, and that is to set a mark on
packets so that they are not handled by the connection tracking
system. This is done by using the NOTRACK target on the packet. If a
connection is hit with the NOTRACK target, then conntrack will simply
not track the connection.
- This was impossible to solve without adding a new table, since
none of the other tables are called until after conntrack has actually
been run on the packets, which by then have either been added to the
conntrack tables or matched against an already available connection.
This table is rather new and is only available, if compiled in, with
late 2.6 kernels and later. The raw table contains two chains,
PREROUTING and OUTPUT, which handle IP packets before they hit any of
the other netfilter subsystems.
The PREROUTING chain can be used for all incoming packets to this
machine, or that are forwarded, while the OUTPUT chain can be used to
alter the locally generated packets before they hit any of the other
netfilter subsystems.
mangle
This table is used mainly for mangling packets (by using mangle
targets). Among other things, we can change the contents of different
packets and that of their headers. We are strongly advised not to use
this table for any filtering, nor will any DNAT, SNAT or
masquerading work in this table.
Examples of this would be to change the TTL (Time to Live), TOS or
MARK. Note that the MARK is not really a change to the packet; rather,
a mark value for the packet is set in kernelspace. Other rules or
programs, tc and ip for example, might use this mark further along in
the firewall to filter or do advanced routing on.
The table consists of five built-in chains, namely PREROUTING,
POSTROUTING, OUTPUT, INPUT and FORWARD:
- PREROUTING is used for altering packets just as they enter the packet
filter and before they hit the routing decision.
- POSTROUTING is used to mangle packets just after all routing decisions
have been made.
- OUTPUT is used for altering locally generated packets after they enter
the routing decision.
- INPUT is used to alter packets after they have been routed to the
local computer itself, but before the userspace application actually
sees the data.
- FORWARD is used to mangle packets after they have hit the first
routing decision, but before they actually hit the last routing
decision.
Note that mangle cannot be used for any kind of NAT or masquerading;
the nat table was made for these kinds of operations.
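The three mangle examples mentioned above (MARK, TOS and TTL) might
look like this. The port, mark value and TTL are arbitrary choices for
illustration, as are the ipt wrapper and the function name; the
wrapper prints the commands unless IPT=iptables is set.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

mangle_rules() {
    # Mark incoming SSH packets; the mark exists in the kernel only and
    # is never written into the packet itself.
    ipt -t mangle -A PREROUTING -p tcp --dport 22 -j MARK --set-mark 1
    # Request low delay for locally generated SSH traffic.
    ipt -t mangle -A OUTPUT -p tcp --dport 22 -j TOS --set-tos 0x10
    # Give everything that leaves this machine the same TTL.
    ipt -t mangle -A POSTROUTING -j TTL --ttl-set 64
}

mangle_rules
```

A mark set this way can then be picked up elsewhere, for instance by
ip rule add fwmark 1 table 100 to route marked packets via a separate
routing table (the table number 100 is hypothetical).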
nat
The nat table is used mainly for NAT. NATed packets get their IP
addresses altered, according to our rules. Packets in a stream only
traverse this table once.
We assume that the first packet of a stream is allowed. The rest of
the packets in the same stream are then automatically NATed, and will
be subject to the same actions as the first packet.
These will, in other words, not go through this table again, but will
nevertheless be treated like the first packet in the stream. This is
the main reason why we should not do any filtering in this table.
The PREROUTING chain is used to alter packets as soon as they come
into the packet filter. The OUTPUT chain is used for altering locally
generated packets (i.e. on the packet filter itself) before they get
to the routing decision.
Finally we have the POSTROUTING chain which is used to alter packets
just as they are about to leave the firewall.
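A short sketch of both directions of NAT, reusing the 10.0.3.0/24
network and the 10.0.0.34 address that appear in the rule example
further down; the internal server 10.0.3.10 is hypothetical, as are
the ipt wrapper and the function name. The wrapper prints the commands
by default.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

nat_rules() {
    # DNAT in PREROUTING: web traffic arriving on eth0 is redirected to
    # an internal server before the routing decision is made.
    ipt -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
        -j DNAT --to-destination 10.0.3.10:80
    # SNAT in POSTROUTING: the internal network leaves with one address.
    ipt -t nat -A POSTROUTING -s 10.0.3.0/24 -o eth0 \
        -j SNAT --to-source 10.0.0.34
    # The nat table only rewrites; whether the connection is allowed is
    # still decided in the filter table.
    ipt -A FORWARD -i eth0 -p tcp -d 10.0.3.10 --dport 80 -j ACCEPT
}

nat_rules
```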
filter
The filter table should be used exclusively for filtering packets. For
example, we could DROP, LOG, ACCEPT or REJECT packets without
problems, as we can in the other tables.
With the filter table, we can match packets and filter them in
whatever way we want. This is the place that we actually take action
against packets and look at what they contain and either deny or
permit them, depending on their content.
Almost all targets are usable in this table, and it is the right
place to do our main filtering.
There are three chains built in to this table. The first one is named
FORWARD and is used on all non-locally generated packets that are not
destined for our local machine. INPUT is used on all packets that are
destined for our local machine (the packet filter itself), and OUTPUT
is used for all locally generated packets.
User-specified chains
If an IP packet enters a chain such as the INPUT chain in the filter
table, we can specify a jump rule to a different chain within the same
table.
The new chain must be user-specified; it may not be a built-in chain
such as the INPUT or FORWARD chain. If we consider a pointer pointing
at the rule in the chain to execute, the pointer will go down, rule by
rule, from top to bottom until the chain traversal is either ended by
a target or the main chain (i.e. FORWARD, INPUT, etc.) ends. Once
this happens, the default policy of the built-in chain will be applied.
If one of the rules that matches points to another user-specified chain
in the jump specification, the pointer will jump over to this chain
and then start traversing that chain from top to bottom.
For example, see how the rule execution jumps from rule number 3 to
chain 2 in the above image. The IP packet matched the matches
contained in rule 3, and the jump/target specification was set to send
the packet on for further examination in chain 2.
User-specified chains cannot have a default policy, i.e. -P
<name_of_user-specified_chain> DROP for example will not work. Only
built-in chains (FORWARD, OUTPUT, INPUT, PREROUTING and POSTROUTING)
have default policies.
However, this can be circumvented by appending a single rule at the
end of the user-specified chain, which has no matches, and hence it
will behave as a default policy. Best practice, however, is to really
only use default policies with built-in chains!
If no rule is matched in a user-specified chain, the default behavior
is to jump back to the originating chain as can be seen in the image
above, i.e. the rule execution jumps from chain 2 back to chain 1
rule 4, below the rule that sent the rule execution into chain 2 to
begin with.
Each and every rule in the user-specified chain is traversed until
either one of the rules matches or the end of the chain is reached. If
we have a match before the end of the chain is reached, the target
specifies if the traversing should end or continue.
If the end of the user-specified chain is reached, the packet is sent
back to the invoking chain. The invoking chain can be either a
user-specified chain or a built-in chain i.e. user-specified chains
can be nested.
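The jump, RETURN and catch-all behavior described in this section can
be sketched as follows. The chain name tcp_checks and the 192.0.2.0/24
network are made up for the example, as are the ipt wrapper and the
function name; the wrapper prints the commands unless IPT=iptables is
set.

```shell
# Dry run by default: each command is printed, not applied.
ipt() { ${IPT:-echo iptables} "$@"; }

user_chain() {
    # Create the user-specified chain and jump to it from INPUT.
    ipt -N tcp_checks
    ipt -A INPUT -p tcp -j tcp_checks
    # RETURN hands the packet back to INPUT, at the rule after the jump.
    ipt -A tcp_checks -s 192.0.2.0/24 -j RETURN
    ipt -A tcp_checks -p tcp --dport 22 -j ACCEPT
    # A final rule without matches behaves like a default policy.
    ipt -A tcp_checks -j DROP
    # Reached by non-TCP packets and by TCP packets that were RETURNed.
    ipt -A INPUT -j LOG --log-prefix "after tcp_checks: "
}

user_chain
```

Leaving out the final DROP rule would restore the default behavior:
packets that match nothing in tcp_checks fall off the end of the chain
and continue in INPUT.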
Rules
A rule could be described as the directions the packet filter will
adhere to when blocking or permitting different connections and
packets in a specific chain.
Each line we write that is inserted in a chain should be considered a
rule. We will also discuss the basic matches that are available, and
how to use them, as well as the different targets and how we can
construct new targets of our own i.e. by creating
user-specified chains.
Closer Look at Rules
Each rule is a line that the kernel looks at to find out what to do
with an IP packet. If all the matches are met, the target/jump
instruction is performed.
Normally we would write our rules in a syntax that looks something
like this: iptables [-t table] command [match] [target/jump], for
example
iptables -t nat -A POSTROUTING -s 10.0.3.0/24 -o eth0 -j SNAT --to 10.0.0.34
         ^^^^^^ ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
         table  command        match                  target/jump
There is nothing that says that the target/jump instruction has to be
the last element in the line. However, we would usually adhere to
this convention to get the best readability, and most of the rules we
will see are written in this way. Hence, if we read someone else's
script, we will most likely recognize the syntax and easily understand
the rule.
If we want to use a table other than the standard table, we can insert
the table specification at the point at which [table] is specified.
However, it is not necessary to state explicitly what table to use,
since by default iptables uses the filter table on which to implement
all commands.
Neither do we have to specify the table at just this point in the
rule. It could be set pretty much anywhere along the line. However, it
is more or less convention to put the table specification at the
beginning.
One more thing to keep in mind is that the command should always come
first, or alternatively directly after the table specification. We use
the command to tell iptables what to do, for example to insert (-I) a
rule, to add (-A) a rule to the end of the chain, or to delete (-D) a
rule.
The match is the part of the rule that we send to the kernel that
details the specific character of the packet, i.e. what makes it
different from all other packets. Here we could specify what IP
address the packet comes from, from which network interface, the
intended IP address, port, protocol or whatever. There is a heap of
different matches that we can use.
Finally we have the target/jump of the packet. If all the matches are
met for a packet, we tell the kernel what to do with it. We could, for
example, tell the kernel to send the packet to a user-specified chain
that we have created ourselves, and which is part of this particular
table. We could also tell the kernel to drop the packet and do no
further processing, or we could tell the kernel to send a specified
reply to the sender.
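To make the four parts tangible, the example rule from above can be
assembled from its pieces in the shell. The variable names are
illustrative only, and the snippet just prints the finished command
line rather than running it.

```shell
# The four building blocks of the example rule shown earlier.
table="-t nat"
cmd="-A POSTROUTING"
match="-s 10.0.3.0/24 -o eth0"
target="-j SNAT --to 10.0.0.34"

# Conventional order: table, command, match, target/jump.
rule="iptables $table $cmd $match $target"
echo "$rule"
```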
Example Commands
The command tells iptables what to do with the rest of the rule.
Normally we would want either to add or delete something in some table
or another. The following commands are available to iptables:
-A or --append
- Example: iptables -A INPUT...
- Explanation: This command appends the rule to the end of the
chain. The rule will in other words always be put last in the
rule-set and hence be checked last, unless we append more rules
later on.
-D or --delete
- Example: iptables -D INPUT -p tcp --dport 80 -j DROP or
iptables -D INPUT 1.
- Explanation: This command deletes a rule in a chain. This could be
done in two ways: either by entering the whole rule to match (as
in the first example), or by specifying the number of the rule that
we want to delete. If we use the first method, our entry must match
the entry in the chain exactly. If we use the second method, we
must give the number of the rule we want to delete. The rules
are numbered from the top of each chain, starting with number 1.
-R or --replace
- Example: iptables -R INPUT 1 -s 192.168.0.1 -j DROP.
- Explanation: This command replaces the old entry at the specified
line. It works in the same way as the --delete command, but
instead of totally deleting the entry, it will replace it with a
new entry. The main use for this might be while we are still
experimenting with iptables.
-I or --insert
- Example: iptables -I INPUT 1 -p tcp --dport 80 -j ACCEPT.
- Explanation: Insert a rule somewhere in a chain. The rule is
inserted at the position that we specify. In other words, the
above example would be inserted as rule 1 in the INPUT chain, and
hence from now on it would be the very first rule in the chain.
Hint: see --line-numbers and --list to get a numbered list of
rules.
-L or --list
- Example: iptables -L INPUT.
- Explanation: This command lists all the entries/rules in the
specified chain. In the above case, we would list all the entries
in the INPUT chain. It is also legal to not specify any chain at
all. In that case, the command would list all the chains in
the specified table. The exact output is affected by other options
such as -n and -v.
-S or --list-rules
- Example: iptables -S INPUT.
- Explanation: Print all rules in the selected chain e.g. INPUT. If
no chain is selected, all chains are printed like with
iptables-save. Like every other iptables command, it applies to
the specified table (filter is the default).
-F or --flush
- Example: iptables -F INPUT.
- Explanation: This command flushes all rules from the specified
chain and is equivalent to deleting each rule one by one, but is
quite a bit faster. The command can be used without options, and
will then delete all rules in all chains within the specified
table.
-Z or --zero
- Example:
iptables -Z INPUT .
- Explanation: This command tells the program to zero all counters
in a specific chain, or in all chains. If we have used the
-v
option with the -L command, we have probably seen the packet
counter at the beginning of each line. To zero this packet
counter, we use the -Z option. This option selects chains in the
same way as -L , except that -Z will not list the rules. If -L and
-Z are used together, the chains will first be listed, and then the
packet counters are zeroed.
-N or --new-chain
- Example:
iptables -N allowed .
- Explanation: This command tells the kernel to create a new chain
of the specified name in the specified table. In the above example
we create a chain called allowed. Note that there must not already
be a chain or target of the same name.
-X or --delete-chain
- Example:
iptables -X allowed .
- Explanation: This command deletes the specified chain from the
table. For this command to work, there must be no rules that refer
to the chain that is to be deleted. In other words, we would have
to replace or delete all rules referring to the chain before
actually deleting the chain. If this command is used without a
chain name, all chains but the built-in ones in the specified table
will be deleted.
-P or --policy
- Example:
iptables -P INPUT DROP .
- Explanation: This command tells the kernel to set a specified
default target, or policy, on a chain. All packets that do not
match any rule will then be forced to use the policy of the chain.
Legal targets are
DROP and ACCEPT .
-E or --rename-chain
- Example:
iptables -E <old_name> <new_name> .
- Explanation: The
-E command tells iptables to change the name of
the chain old_name to new_name . Note that this will not affect the
actual way the table will work. It is, in other words, just a
cosmetic change to the table.
We should always enter a complete command line, unless we just want to
list the built-in help for iptables or get the version of the command.
To get the version, -V (upper case) can be used; lower case -v means
verbose output instead. To get the help message, -h is
the way to go.
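The commands above can be tried out together on a scratch chain. The following is a minimal dry-run sketch, not a recommended ruleset: the helper names ipt and chain_lifecycle are hypothetical, and ipt only prints each command, since really running iptables requires root and changes the live ruleset.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command instead of running it.
# To apply the rules for real, replace the body with: iptables "$@"
ipt() {
    echo "iptables $*"
}

# Walk a user-defined chain called "allowed" through its whole life cycle.
chain_lifecycle() {
    ipt -N allowed                                  # create the chain
    ipt -A allowed -p tcp --dport 22 -j ACCEPT      # append: becomes rule 1
    ipt -I allowed 1 -p tcp --dport 80 -j ACCEPT    # insert: new rule 1, old rule moves to 2
    ipt -L allowed -n --line-numbers                # list with rule numbers
    ipt -R allowed 1 -p tcp --dport 443 -j ACCEPT   # replace rule 1
    ipt -D allowed 2                                # delete rule 2 by number
    ipt -F allowed                                  # flush whatever is left
    ipt -X allowed                                  # delete the now-empty chain
}

chain_lifecycle
```

Flushing before -X matters here because a chain can only be deleted once it is empty and no other rule refers to it.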
Matches
A match is something that specifies a special condition within the
packet that must be true (or false). A single rule can contain
several matches of any kind.
For example, we may want to match packets that come from a specific
host located within our LAN, and (logical AND) on top of that only
from specific ports on that host. We could then use matches to tell
the rule to only apply the target (or jump specification) on packets
that have a specific source address, that come in on the interface
that connects to the LAN, and that originate from one of the specified
ports.
If any one of these matches fails (e.g. the source address is not
correct, but everything else is true), the whole rule fails and the
next rule is tested on the packet. If all matches are true, however,
the target specified by the rule is applied.
Roughly speaking, matches can be classified into five different
subcategories:
- First of all we have the generic matches, which can be used in all
rules.
- Then we have the TCP matches which can only be applied to TCP
packets.
- We have UDP matches which can only be applied to UDP packets, and
- ICMP matches which can only be used on ICMP packets.
- Finally we have special matches, such as the
state , owner and
limit matches and so on.
These final matches have in turn been narrowed down to even more
subcategories, even though they might not necessarily be different
matches at all.
Generic Matches
A generic match is the kind of match that is always available,
whatever kind of protocol we are working on, or whatever match
extensions we have loaded. No special parameters at all are needed to
use these matches.
-p or --protocol :
- Example:
iptables -A INPUT -p tcp
- Explanation: This match is used to check for certain protocols.
Examples of protocols are TCP, UDP and ICMP. The protocol may be
given by name, either one of the internally known names such as
tcp, udp or icmp, or any name listed in the
/etc/protocols file; if it
cannot find the protocol there it will reply with an error. The
protocol may also be an integer value: ICMP is 1, TCP is 6 and UDP
is 17. Finally, it may also take the value ALL, which matches every
protocol. The integer value zero (0) is equivalent to ALL, which in
turn is the default behavior if the
--protocol match is not used. This match can also be inverted with
the ! sign, so ! --protocol tcp (older releases also accept
--protocol ! tcp ) would match every protocol except TCP, for
example UDP and ICMP.
-s or --src or --source
- Example:
iptables -A INPUT -s 192.168.1.1
- Explanation: This is the source match, which is used to match
packets, based on their source IP address. The main form can be
used to match single IP addresses, such as
192.168.1.1 . It could
also be used with a netmask in a CIDR (Classless Inter-Domain
Routing) bit form, by specifying the number of ones (1's) on the
left side of the network mask. This means that we could for
example add /24 to use a 255.255.255.0 netmask. We could then
match whole IP ranges, such as our local networks or network
segments behind the firewall. The line would then look something
like 192.168.0.0/24 . This would match all packets in the
192.168.0.x range. Another way is to do it with a regular netmask
in the 255.255.255.255 form, i.e. 192.168.0.0/255.255.255.0 . We
could also invert the match with an ! just as before. If we were
to use a match in the form of --source ! 192.168.0.0/24 , we would
match all packets with a source address not coming from within the
192.168.0.x range. The default is to match all IP addresses if no
particular source IP address or IP address range is specified.
-d or --dst or --destination
- Example:
iptables -A INPUT -d 192.168.1.1
- Explanation: The
--destination match is used to match packets based on
their destination address or addresses. It works pretty much the
same as the --source match and has the same syntax, except that
the match is based on where the packets are going to. To match an
IP range, we can add a netmask either in the exact netmask form,
or in the number of ones (1's) counted from the left side of the
netmask bits. Examples are: 192.168.0.0/255.255.255.0 and
192.168.0.0/24 . Both of these are equivalent. We could also invert
the whole match with an ! sign, just as before i.e. --destination
! 192.168.0.1 would match all packets except those destined to the
192.168.0.1 IP address.
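To make the netmask arithmetic behind --source and --destination concrete, here is a small sketch of how an address is tested against a network in CIDR form. It only illustrates the bit logic and is not how iptables is implemented; the function names ip_to_int and in_subnet are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address such as 192.168.0.1 to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet ADDRESS NETWORK PREFIXLEN
# Succeeds if ADDRESS lies inside NETWORK/PREFIXLEN.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    # PREFIXLEN ones from the left, e.g. 24 -> 255.255.255.0
    # (4294967295 is 255.255.255.255 as an integer)
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_subnet 192.168.0.42 192.168.0.0 24 && echo "192.168.0.42 is matched by 192.168.0.0/24"
in_subnet 192.168.1.1 192.168.0.0 24 || echo "192.168.1.1 is not matched by 192.168.0.0/24"
```

An inverted match simply negates the result of this test.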
-i or --in-interface
- Example:
iptables -A INPUT -i eth0
- Explanation: This match is used for the interface the packet came
in on. Note that this option is only legal in the
INPUT , FORWARD
and PREROUTING chains and will return an error message when used
anywhere else. The default behavior of this match, if no
particular interface is specified, is to assume a string value of
+ (a wildcard). The + value is used to match any string of letters and
numbers. A single + would tell the kernel to match all packets
without considering which interface they came in on. The + string
can also be appended to the type of interface, so eth+ would be
all ethernet devices. We can also invert the meaning of this
option with the help of the ! sign. The line would then have a
syntax looking something like -i ! eth0 , which would match all
incoming interfaces, except eth0 .
-o or --out-interface
- Example:
iptables -A FORWARD -o eth0
- Explanation: The
--out-interface match is used for packets on the
interface from which they are leaving. Note that this match is
only available in the OUTPUT , FORWARD and POSTROUTING chains, the
opposite in fact of the --in-interface match. Other than this, it
works pretty much the same as the --in-interface match. The +
extension is understood as matching all devices of similar type,
so eth+ would match all eth devices and so on. To invert the
meaning of the match, we can use the ! sign in exactly the same
way as for the --in-interface match. If no --out-interface is
specified, the default behavior for this match is to match all
devices, regardless of where the packet is going.
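The + wildcard used by --in-interface and --out-interface amounts to a prefix comparison on the interface name. The sketch below mimics that behavior in plain shell for illustration only; iface_matches is a hypothetical helper, not part of iptables.

```shell
#!/bin/sh
# iface_matches NAME PATTERN
# Succeeds if interface NAME matches PATTERN. A trailing "+" in the
# pattern matches any suffix, so "eth+" covers eth0, eth1 and so on,
# and a lone "+" covers every interface.
iface_matches() {
    case $2 in
        *+)
            prefix=${2%+}              # strip the trailing "+"
            case $1 in
                "$prefix"*) return 0 ;;
                *)          return 1 ;;
            esac
            ;;
        *)
            [ "$1" = "$2" ]            # no wildcard: exact name only
            ;;
    esac
}

iface_matches eth1 eth+ && echo "eth1 matches eth+"
iface_matches wlan0 eth+ || echo "wlan0 does not match eth+"
```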
-f or --fragment
- Example:
iptables -A INPUT -f
- Explanation: This match is used to match the second and further
fragments of a fragmented packet. The reason for this is that in the case of
fragmented packets, there is no way to tell the source or
destination ports of the fragments, nor ICMP types, among other
things. Also, fragmented packets might in rather special cases be
used to compound attacks against other computers. Packet fragments
like this will not be matched by other rules, and hence this match
was created. This option can also be used in conjunction with the
! sign. However, in this case the ! sign must precede the match,
i.e. ! -f . When this match is inverted, we match all header
fragments and/or unfragmented packets. What this means, is that we
match all the first fragments of fragmented packets, and not the
second, third, and so on. We also match all packets that have not
been fragmented during transfer. Note also that there are really
good defragmentation options within the kernel that we can use
instead. As a secondary note, if we use connection tracking we
will not see any fragmented packets, since they are dealt with
before hitting any chain or table.
Implicit matches
This section will describe the matches that are loaded implicitly.
Implicit matches are implied, taken for granted, automatic. They
become available, for example, as soon as we match on --protocol tcp,
without any further criteria. The implicit matches described here
cover four different protocols. These are
- TCP matches
- UDP matches
- ICMP matches
- SCTP matches
The TCP based matches contain a set of unique criteria that are
available only for TCP packets. UDP based matches contain another set
of criteria that are available only for UDP packets. The same
holds for ICMP and SCTP packets.
Explicit matches, on the other hand, are never loaded automatically.
We have to request each of them by name with the -m or --match option,
which is discussed in the Explicit matches section further below.
TCP matches
These matches are protocol specific and are only available when
working with TCP packets and streams. To use these matches, we need to
specify --protocol tcp before trying to use them.
Note that the --protocol tcp match must be to the left of the protocol
specific matches. These matches are loaded implicitly in a sense, just
as the UDP and ICMP matches are loaded implicitly.
--sport or --source-port
- Example:
iptables -A INPUT -p tcp --sport 22
- Explanation: The
--source-port match is used to match packets
based on their source port. Without it, we imply all source ports.
This match can either take a service name or a port number. If we
specify a service name, the service name must be in the
/etc/services file, since iptables uses this file to map service
names to ports. If we specify the port by its number, the rule
will load slightly faster, since iptables does not have to look up
the service name. However, the match might be a little bit harder
to read than if we use the service name. If we are writing a
rule-set consisting of 200 rules or more, we should definitely
use port numbers, since the difference is really noticeable. (On a
slow box, this could make as much as 10 seconds difference, if we
have configured a large rule-set containing 1000 rules or so). We
can also use the --source-port match to match any range of ports,
--source-port 22:80 for example. This example would match all
source ports between 22 and 80. If we omit specifying the first
port, port 0 is assumed (is implicit) i.e. --source-port :80 would
then match port 0 through 80. And if the last port specification
is omitted, port 65535 is assumed i.e. if we were to write
--source-port 22: , we would have specified a match for all ports
from port 22 through port 65535. The lower port should always be
written first: older iptables releases silently swap a reversed
range, so that --source-port 80:22 is interpreted as --source-port
22:80 , but newer releases may reject it as invalid. We can also
invert a match by adding a ! sign. For
example, --source-port ! 22 means that we want to match all ports
but port 22. The inversion could also be used together with a port
range and would then look like --source-port ! 22:80 , which in
turn would mean that we want to match all ports but ports 22
through 80. Note that this match does not handle multiple
separated ports and port ranges. For more information about those,
look at the multiport match extension.
--dport or --destination-port
- Example:
iptables -A INPUT -p tcp --dport 22
- Explanation: This match is used to match TCP packets, according to
their destination port. It uses exactly the same syntax as the
--source-port match. It understands port and port range
specifications, as well as inversions. A reversed port range (high
port first) is handled in the same way as for --source-port . The match will
also assume values of 0 and 65535 if the high or low port is left
out in a port range specification. In other words, exactly the
same as the --source-port syntax. Note that this match does not
handle multiple separated ports and port ranges. For more
information about those, look at the multiport match extension.
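The range rules for --source-port and --destination-port (a missing first port means 0, a missing last port means 65535) can be summarised in a few lines of shell. This is only a sketch of the semantics described above; port_in_range is a hypothetical helper and expects numeric ports with the lower port first.

```shell
#!/bin/sh
# port_in_range PORT SPEC
# Succeeds if PORT falls inside SPEC, where SPEC is one of:
#   "22"     a single port
#   "22:80"  ports 22 through 80
#   ":80"    ports 0 through 80
#   "22:"    ports 22 through 65535
port_in_range() {
    port=$1
    spec=$2
    case $spec in
        *:*) low=${spec%%:*}; high=${spec#*:} ;;
        *)   low=$spec; high=$spec ;;
    esac
    [ -n "$low" ]  || low=0         # omitted first port
    [ -n "$high" ] || high=65535    # omitted last port
    [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]
}

port_in_range 443 22: && echo "443 is matched by 22:"
port_in_range 81 22:80 || echo "81 is not matched by 22:80"
```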
--tcp-flags
- Example:
iptables -A INPUT -p tcp --tcp-flags SYN,FIN,ACK SYN
- Explanation: This match is used to match on the TCP flags in a
packet. First of all, the match takes a list of TCP flags to
compare (a mask) and secondly it takes a list of flags that should
be set (i.e. be 1). Both lists should be comma-delimited. The
match knows about the
SYN , ACK , FIN , RST , URG , PSH flags, and it
also recognizes the words ALL and NONE . ALL and NONE are pretty
much self-explanatory: ALL means to use all flags and NONE means to
use no flags for the option e.g. --tcp-flags ALL NONE would mean
to check all of the TCP flags and match if none of the flags are
set. This option can also be inverted with the ! sign. For
example, if we specify ! --tcp-flags SYN,FIN,ACK SYN , we would get
a match for every packet that does not have exactly that
combination, such as packets that have the ACK and FIN bits set, but
not the SYN bit.
--syn
- Example:
iptables -A INPUT -p tcp --syn
- Explanation: The
--syn match is more or less an old relic from the
ipchains days and is still there for backward compatibility and
to make the transition from one to the other easier. It is used to
match packets if they have the SYN bit set and the ACK, RST and FIN
bits unset. This command would in other words be exactly the same
as the --tcp-flags SYN,RST,ACK,FIN SYN match. Such packets are mainly
used to request new TCP connections from a server. If we block
these packets, we should have effectively blocked all incoming
connection attempts. However, we will not have blocked the
outgoing connections, which a lot of exploits today use (for
example, hacking a legitimate service and then installing a
program or suchlike that initiates a connection from our machine
outward, instead of opening up a new port on it). This
match can also be inverted with the ! sign like this ! --syn . This
would match all packets that are not plain connection requests, in
other words mostly packets in an already established connection.
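The two-list form of --tcp-flags boils down to one bitwise test: take the flags of the packet, keep only the bits named in the first list (the mask), and compare the result with the second list. The following sketch illustrates that logic; tcp_flags_match is a hypothetical helper, and the numeric values are the standard bit positions of the flags in the TCP header.

```shell
#!/bin/sh
# Bit values of the TCP flags as they appear in the TCP header.
FIN=1 SYN=2 RST=4 PSH=8 ACK=16 URG=32

# tcp_flags_match MASK COMP FLAGS
# Succeeds when, among the bits selected by MASK, exactly the bits in
# COMP are set in FLAGS. Bits outside MASK are ignored.
tcp_flags_match() {
    [ $(( $3 & $1 )) -eq "$2" ]
}

# Equivalent of: --tcp-flags SYN,FIN,ACK SYN
mask=$(( SYN | FIN | ACK ))

tcp_flags_match "$mask" "$SYN" "$SYN" && echo "plain SYN: match"
tcp_flags_match "$mask" "$SYN" $(( SYN | ACK )) || echo "SYN+ACK: no match"
tcp_flags_match "$mask" "$SYN" $(( SYN | PSH )) && echo "SYN+PSH: match (PSH is not in the mask)"
```

Inverting the option with ! negates the outcome of this single test, which is why an inverted rule matches many different flag combinations rather than one specific pattern.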
--tcp-option
- Example:
iptables -A INPUT -p tcp --tcp-option 16
- Explanation: This match is used to match packets depending on
their TCP options. A TCP option is a specific part of the header.
Each option consists of 3 different fields. The first one is 8 bits
long and tells us which kind of option this is, the
second one is also 8 bits long and tells us how long the option
is, and the third one holds the option data itself. The reason for
this length field is that TCP options are
optional. To be compliant with the standards, we do not need to
implement all options, but instead we can just look at what kind
of option it is, and if we do not support it, we just look at the
length field and can then jump over this data. This match is used
to match different TCP options depending on their decimal values.
It may also be inverted with the
! flag, so that the match matches
all TCP options but the option given to the match.
UDP matches
This section describes matches that will only work together with UDP
packets. These matches are implicitly loaded when we specify the
--protocol udp match and will be available after this specification.
Note that UDP is not connection oriented, and hence there are
no flags to set in the packet to indicate
what the datagram is supposed to do, such as opening or closing a
connection, or simply sending data.
UDP packets do not require any kind of acknowledgment either. If they
are lost, they are simply lost (not taking ICMP error messaging etc.
into account). This means that there are far fewer matches to
work with on a UDP packet than there are on TCP packets.
Also note that the state machine will work on all kinds of packets
even though UDP and ICMP are counted as connectionless
protocols. The state machine works pretty much the same on UDP packets
as on TCP packets.
--sport or --source-port
- Example:
iptables -A INPUT -p udp --sport 53
- Explanation: This match works exactly the same as its TCP
counterpart. It is used to perform matches on packets based on
their source UDP ports. It has support for port ranges, single
ports and port inversions with the same syntax. To specify a UDP
port range, we could use
22:80 which would match UDP ports 22
through 80. If the first value is omitted, port 0 is assumed. If
the last port is omitted, port 65535 is assumed. A reversed range
(high port first) is handled as described for the TCP --source-port
match. Single UDP port matches look as in the example
above. To invert the port match, add a ! sign like for example
--source-port ! 53 . This would match all ports but port 53. The
match can understand service names, as long as they are available
in the /etc/services file. Note that this match does not handle
multiple separated ports and port ranges. For more information
about this, look at the multiport match extension.
--dport or --destination-port
- Example:
iptables -A INPUT -p udp --dport 53
- Explanation: The same goes for this match as for
--source-port
above. It is exactly the same as for the equivalent TCP match, but
here it applies to UDP packets. It matches packets based on their
UDP destination port. The match handles port ranges, single ports
and inversions. To match a single port we use, for example,
--destination-port 53 , to invert this we would use
--destination-port ! 53 . The first would match all UDP packets
going to port 53 while the second would match all packets but
those going to the destination port 53. To specify a port range,
we would, for example, use --destination-port 9:19 . This example
would match all packets destined for UDP port 9 through 19. If the
first port is omitted, port 0 is assumed. If the second port is
omitted, port 65535 is assumed. A reversed range (high port first)
is handled as described for the TCP --source-port match.
Note that this match does not
handle multiple ports and port ranges. For more information about
this, look at the multiport match extension.
ICMP matches
These are the ICMP matches. These packets are even more ephemeral,
that is to say short lived, than UDP packets, in the sense that they
are connectionless.
The ICMP protocol is mainly used for error reporting and for
connection controlling and the like. ICMP is not a protocol
subordinated to the IP protocol, but more of a protocol that augments
the IP protocol and helps in handling errors.
The headers of ICMP packets are very similar to those of the IP
headers, but differ in a number of ways. The main feature of this
protocol is the type field, which tells us what the packet is for. For
example, if we try to access an inaccessible IP address, we would
normally get an ICMP host unreachable in return. There is only one
ICMP specific match available for ICMP packets, and hopefully this
should suffice.
This match is implicitly loaded when we use the --protocol icmp match
and we get access to it automatically. Note that all the generic
matches can also be used, so that among other things we can match on
the source and destination addresses.
--icmp-type
- Example:
iptables -A INPUT -p icmp --icmp-type 8
- Explanation: This match is used to specify the ICMP type to match.
ICMP types can be specified either by their numeric values or by
their names. Numerical values are specified in RFC 792. To find a
complete listing of the ICMP name values, do an
iptables
--protocol icmp --help . This match can also be inverted with the !
sign e.g. --icmp-type ! 8 . Note that some ICMP types are obsolete,
and others again may be dangerous for an unprotected host since
they may, among other things, redirect packets to the wrong
places. The type and code may also be specified by their typename,
numeric type, and type/code as well. For example --icmp-type
network-redirect , --icmp-type 8 or --icmp-type 8/0 .
Please note that netfilter uses ICMP type 255 to match all ICMP types.
If we try to match this ICMP type, we will wind up with matching all
ICMP types.
SCTP matches
SCTP (Stream Control Transmission Protocol) is a relatively new
occurrence in the networking domain in comparison to the TCP and UDP
protocols. The implicit SCTP matches are loaded through adding the -p
sctp match to the command line of iptables.
The SCTP protocol was developed by some of the larger telecom and
switch/network manufacturers out there, and the protocol is
specifically well suited for large simultaneous transactions with high
reliability and high throughput.
--source-port or --sport
- Example:
iptables -A INPUT -p sctp --source-port 80
- Explanation: The
--source-port match is used to match an SCTP
packet based on the source port in the SCTP packet header. The
port can either be a single port, as in the example above, or a
range of ports specified as --source-port 20:100 , or it can also
be inverted with the ! sign. This looks, for example, like
--source-port ! 25 . The source port is an unsigned 16 bit integer,
so the maximum value is 65535 and the lowest value is 0.
--destination-port or --dport
- Example:
iptables -A INPUT -p sctp --destination-port 80
- Explanation: This match is used for the destination port of the
SCTP packets. All SCTP packets contain a destination port, just as
it does a source port, in the headers. The port can be either
specified as in the example above, or with a port range such as
--destination-port 6660:6670 . The command can also be inverted
with the ! sign, for example, --destination-port ! 80 . This
example would match all packets but those to port 80. The same
applies for destination ports as for source ports, the highest
port is 65535 and the lowest is 0.
--chunk-types
- Example:
iptables -A INPUT -p sctp --chunk-types any INIT,INIT_ACK
- Explanation: This matches the chunk type of the SCTP packet. The
match begins with the
--chunk-types keyword, and then continues
with a flag of either all , any or only . After this, we specify the
SCTP chunk types to match for. Additionally, the flags can take
some chunk flags as well. This is done for example in the form
--chunk-types any DATA:Be . The flags are specific for each SCTP
chunk type and must be valid according to the list below. If an upper
case letter is used, the flag must be set, and if a lower case
letter is used, the flag must be unset to match. The whole match can be
inverted by using an ! sign just after the --chunk-types keyword.
For example, --chunk-types ! any DATA:Be would match anything but
this pattern.
- Chunk types:
DATA , INIT , INIT_ACK , SACK , HEARTBEAT ,
HEARTBEAT_ACK , ABORT , SHUTDOWN , SHUTDOWN_ACK , ERROR , COOKIE_ECHO ,
COOKIE_ACK , ECN_ECNE , ECN_CWR , SHUTDOWN_COMPLETE , ASCONF ,
ASCONF_ACK .
- Flags: The following flags can be used with the
--chunk-types
match as seen above.
- DATA:
U or u for unordered bit, B or b for beginning fragment
bit and E or e for ending fragment bit.
- ABORT:
T or t for TCB destroy flag.
- SHUTDOWN_COMPLETE:
T or t for TCB destroyed flag.
Explicit matches
Explicit matches are those that have to be specifically loaded with
the -m or --match option. State matches, for example, demand the
directive -m state prior to entering the actual match that we want
to use.
Some of these matches may be protocol specific. Some may be
unconnected with any specific protocol — for example connection
states. These might be NEW (the first packet of an as yet
unestablished connection), ESTABLISHED (a connection that is already
registered in the kernel), RELATED (a new connection that was created
by an older, established one) etc.
A few may just have been developed for testing or experimental purposes,
or just to illustrate what iptables is capable of. This in turn means
that not all of these matches may at first sight be of any use.
Nevertheless, it may well be that we personally will find a use for
specific explicit matches. And there are new ones coming along all the
time, with each new iptables release.
Whether we find a use for them or not depends on our imagination and
needs. The difference between implicitly loaded matches and explicitly
loaded ones, is that the implicitly loaded matches will automatically
be loaded when, for example, we match on the properties of TCP
packets, while explicitly loaded matches will never be loaded
automatically — it is up to us to discover and activate explicit
matches.
Addrtype match
The addrtype module matches packets based on the address type. The
address type is used inside the kernel to put different packets into
different categories.
With this match we will be able to match all packets based on their
address type according to the kernel. It should be noted that the
exact meaning of the different address types varies between the OSI
layer 3 protocols. The available types are as follows:
- ANYCAST: This is a one-to-many associative connection type, where
only one of the many receiver hosts actually receives the data.
This is for example implemented with the DNS (Domain Name System).
We have a single address to a root server, but it actually has
several locations and our packet will be directed to the closest
working server. This one is not implemented in Linux IPv4.
- BLACKHOLE: A blackhole address will simply delete the packet and
send no reply. It basically works like a black hole in space. This is
configured in the routing tables of Linux.
- BROADCAST: A broadcast packet is a single packet sent to everyone
in a specific network in a one-to-many relation. This is for
example used in ARP (Address Resolution Protocol) resolution, where
a single packet is sent out requesting information on how to reach
a specific IP, and then the host that is authoritative replies with
the proper MAC (Media Access Control) address of that host.
- LOCAL: An address that is local to the host we are working on e.g.
127.0.0.1 .
- MULTICAST: A multicast packet is sent to several hosts using the
shortest distance, and only one packet is sent to each waypoint,
where it is copied for each host/router subscribing
to the specific multicast address. Commonly used in one-way
streaming media such as video or sound.
- NAT: An address that has been NAT'ed by the kernel.
- PROHIBIT: Same as blackhole except that a prohibited answer will be
generated. In the IPv4 case, this means an ICMP communication
prohibited (type 3, code 13) answer will be generated.
- THROW: Special route in the Linux kernel. If a packet is thrown in
a routing table it will behave as if no route was found in the
table. In normal routing, this means that the packet will behave as
if it had no route. In policy routing, another route might be found
in another routing table.
- UNICAST: A real routable address for a single host. The most
common type of route.
- UNREACHABLE: This signals an unreachable address that we do not
know how to reach. The packets will be discarded and an ICMP Host
unreachable (type 3, code 1) will be generated.
- UNSPEC: An unspecified address that has no real meaning.
- XRESOLVE: This address type is used to send route lookups to
userspace applications which will do the lookup for the kernel. This
might be wanted to send ugly lookups to the outside of the kernel,
or to have an application do lookups for us. However, as of now
(April 2009) this one is not implemented in Linux.
The addrtype match is loaded by using the -m addrtype keyword. When
this is done, the extra match options below will be available for
usage:
--src-type
- Example:
iptables -A INPUT -m addrtype --src-type UNICAST
- Explanation: The
--src-type match option is used to match the
source address type of the packet. It can either take a single
address type or several separated by commas, for example
--src-type BROADCAST,MULTICAST . The match option may also be
inverted by adding an exclamation sign before it, for example !
--src-type BROADCAST,MULTICAST .
--dst-type
- Example:
iptables -A INPUT -m addrtype --dst-type UNICAST
- Explanation: The
--dst-type works exactly the same way as
--src-type and has the same syntax. The only difference is that it
will match packets based on their destination address type.
AH/ESP match
These matches are used for the IPSEC AH (Authentication Header) and
ESP (Encapsulating Security Payload) protocols. IPSEC is used to
create secure tunnels over an insecure Internet connection.
The AH and ESP protocols are used by IPSEC to create these secure
connections. The AH and ESP matches are really two separate matches,
but are both described here since they look very much alike, and both
serve the same purpose.
To use the AH/ESP matches, we need to use -m ah to load the AH
matches, and -m esp to load the ESP matches.
--ahspi
- Example:
iptables -A INPUT -p 51 -m ah --ahspi 500
- Explanation: This matches the AH SPI (Security Parameter Index)
number of the AH packets. Please note that we must specify the
protocol as well, since AH runs on a different protocol than the
standard TCP, UDP or ICMP protocols. The SPI number is used in
conjunction with the source and destination address and the secret
keys to create a SA (Security Association). The SA uniquely
identifies each and every one of the IPSEC tunnels to all hosts.
The SPI is used to uniquely distinguish each IPSEC tunnel
connected between the same two peers. Using the
--ahspi match, we
can match a packet based on the SPI of the packets. This match can
match a whole range of SPI values by using a : sign, such as
500:520 , which will match the whole range of SPI's.
--espspi
- Example:
iptables -A INPUT -p 50 -m esp --espspi 500
- Explanation: The ESP SPI match is used in exactly the same way
as the AH variant. The syntax is identical apart from the
esp/ah difference. Like the AH variant, this match can also match a
whole range of SPI numbers, such as
--espspi 200:250 , which matches every SPI in that range.
Comment match
The comment match is used to add comments inside the iptables ruleset
and the kernel. This can make it much easier to understand our ruleset
and thus ease debugging a lot.
For example, we could add comments documenting which bash function
added specific sets of rules to netfilter, and why. It should be noted
that this is not actually a match. The comment match is loaded using
the -m comment keywords. At this point the following options are
available:
--comment
- Example:
iptables -A INPUT -m comment --comment "A comment"
- Explanation: The
--comment option specifies the comment to
actually add to the rule within the kernel. The comment can be a
maximum of 256 characters.
Connmark match
The connmark match is used very much the same way as the mark match is
in the MARK target and match combination. The connmark match is used
to match marks which have been set on a connection with the CONNMARK
target.
To match the mark on the very packet that creates the
connection marking, we must use the connmark match after the CONNMARK
target has set the mark on that first packet.
--mark
- Example:
iptables -A INPUT -m connmark --mark 12 -j ACCEPT
- Explanation: The mark option is used to match a specific mark
associated with a connection. The mark match must be exact, and if
we want to filter out unwanted flags from the connection mark
before actually matching anything, we can specify a mask that will
be ANDed with the connection mark. For example, if we have a
connection mark set to 33 (100001 in binary) on a connection, and
want to match the lowest bit only, we would be able to run
something like --mark 1/1 . The mask (000001) is bitwise ANDed with
100001, so 100001 & 000001 equals 1, which is then compared against
the 1.
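The masked comparison can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic, not netfilter code; the function name connmark_matches is made up for this example.

```python
def connmark_matches(ctmark: int, value: int, mask: int = 0xFFFFFFFF) -> bool:
    """Model of --mark value[/mask]: AND the connection mark with the
    mask first, then compare the result against the requested value."""
    return (ctmark & mask) == value


# Connection mark 33 checked with --mark 1/1: only the lowest bit counts.
print(connmark_matches(33, 1, 0x1))   # True
# Without a mask the comparison is exact, so 33 does not equal 1.
print(connmark_matches(33, 1))        # False
# Mark 32 has the lowest bit cleared, so --mark 1/1 does not match.
print(connmark_matches(32, 1, 0x1))   # False
```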
Conntrack match
The conntrack match is an extended version of the state match, which
makes it possible to match packets in a much more granular way. It
lets us look at information directly available in the
connection tracking system, without any additional layers, such as in
the state match.
There are a number of different matches put together in the conntrack
match, for several different fields in the connection tracking system.
These are compiled together into the list below. To load these
matches, we need to specify -m conntrack .
--ctstate
- Example:
iptables -A INPUT -p tcp -m conntrack --ctstate RELATED
- Explanation: This match is used to match the state of a packet,
according to the conntrack state. It is used to match pretty much
the same states as in the original state match. The valid entries
for this match are:
INVALID , ESTABLISHED , NEW , RELATED , SNAT and
DNAT . The entries can be used together with each other separated
by a comma. For example, -m conntrack --ctstate
ESTABLISHED,RELATED . It can also be inverted by putting a ! in
front of --ctstate e.g. -m conntrack ! --ctstate
ESTABLISHED,RELATED , which matches all but the ESTABLISHED and
RELATED states.
--ctproto
- Example:
iptables -A INPUT -p tcp -m conntrack --ctproto TCP
- Explanation: This matches the protocol, the same as
--protocol
does. It can take the same types of values, and is inverted using
the ! sign. For example, -m conntrack ! --ctproto TCP matches all
protocols but the TCP protocol.
--ctorigsrc
- Example:
iptables -A INPUT -p tcp -m conntrack --ctorigsrc 192.168.0.0/24
- Explanation:
--ctorigsrc matches based on the original source IP
specification of the conntrack entry that the packet is related
to. The match can be inverted by using a ! between the --ctorigsrc
and IP specification, such as --ctorigsrc ! 192.168.0.1 . It can
also take a netmask of the CIDR (Classless Inter-Domain Routing)
form, such as --ctorigsrc 192.168.0.0/24 .
--ctorigdst
- Example:
iptables -A INPUT -p tcp -m conntrack --ctorigdst 192.168.0.0/24
- Explanation: This match is used exactly as the
--ctorigsrc , except
that it matches on the destination field of the conntrack entry.
It has the same syntax in all other respects.
--ctreplsrc
- Example:
iptables -A INPUT -p tcp -m conntrack --ctreplsrc 192.168.0.0/24
- Explanation: The --ctreplsrc match is used to match based on the
original conntrack reply source of the packet. Basically, this is
the same as the --ctorigsrc , but instead we match the reply source
expected of the upcoming packets. This match can, of course, be
inverted and address a whole range of addresses, just the same as
the previous matches in this class.
--ctrepldst
- Example:
iptables -A INPUT -p tcp -m conntrack --ctrepldst 192.168.0.0/24
- Explanation: The --ctrepldst match is the same as the --ctreplsrc
match, with the exception that it matches the reply destination of
the conntrack entry that matched the packet. It too can be
inverted, and accept ranges, just as the --ctreplsrc match.
--ctstatus
- Example:
iptables -A INPUT -p tcp -m conntrack --ctstatus ASSURED
- Explanation: This matches the status of the connection, as
described in the state machine section. It can match the
following statuses:
NONE , the connection has no status at all.
EXPECTED , the connection is expected and was added by one of the
expectation handlers.
SEEN_REPLY , the connection has seen a reply but is not assured
yet.
ASSURED , the connection is assured and will not be removed until
it times out or the connection is closed by either end.
All the statuses can also be inverted by using the ! sign. For
example, -m conntrack ! --ctstatus ASSURED will match all but the
ASSURED status.
--ctexpire
- Example:
iptables -A INPUT -p tcp -m conntrack --ctexpire 100:150
- Explanation: This match is used to match on packets based on how
long is left on the expiration timer of the conntrack entry,
measured in seconds. It can take either a single value to match
against, or a range such as in the example above. It can also be
inverted by using the ! sign, such as this -m conntrack !
--ctexpire 100 . This will match every expiration time which does
not have exactly 100 seconds left to it.
Dscp match
This match is used to match on packets based on their DSCP
(Differentiated Services Code Point) field. The field is defined in
RFC 2474. The match is explicitly loaded by specifying -m dscp . The
match can take two mutually exclusive options, described below.
--dscp
- Example:
iptables -A INPUT -p tcp -m dscp --dscp 32
- Explanation: This option takes a DSCP value in either decimal or
in hex. If the option value is in decimal, it would be written
like 32 or 16 etc. If written in hex, it should be prefixed with
0x , like this: 0x20 . It can also be inverted by using the !
character, like this: -m dscp ! --dscp 32 .
--dscp-class
- Example:
iptables -A INPUT -p tcp -m dscp --dscp-class BE
- Explanation: The
--dscp-class match is used to match on the
DiffServ class of a packet. The values can be any of the BE , EF ,
AFxx or CSx classes as specified in the various RFC's. This match
can be inverted just the same way as the --dscp option.
Please note that the --dscp and --dscp-class options are mutually
exclusive and can not be used in conjunction with each other.
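To see how the class names relate to the numeric values accepted by --dscp , the following Python sketch converts a class name into its DSCP number. It assumes the usual DiffServ conventions (BE is 0, EF is 46, CSx is 8 times x, AFxy is 8 times x plus 2 times y); the function name is hypothetical and iptables performs this lookup internally.

```python
def dscp_class_to_value(name: str) -> int:
    """Translate a DiffServ class name into its 6-bit DSCP value."""
    name = name.upper()
    if name == "BE":                      # best effort
        return 0
    if name == "EF":                      # expedited forwarding
        return 46
    if name.startswith("CS") and name[2:].isdigit():
        selector = int(name[2:])          # class selector CS0..CS7
        if 0 <= selector <= 7:
            return selector * 8
    if name.startswith("AF") and len(name) == 4 and name[2:].isdigit():
        af_class, drop = int(name[2]), int(name[3])
        if 1 <= af_class <= 4 and 1 <= drop <= 3:
            return af_class * 8 + drop * 2
    raise ValueError(f"unknown DSCP class: {name}")


# --dscp 32, --dscp 0x20 and --dscp-class CS4 all describe the same value.
print(dscp_class_to_value("CS4"), int("0x20", 0))   # 32 32
print(dscp_class_to_value("AF11"))                  # 10
print(dscp_class_to_value("EF"))                    # 46
```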
Ecn match
The ecn match is used to match on the different ECN (Explicit
Congestion Notification) fields in the TCP and IPv4 headers. ECN is
described in detail in the RFC 3168. The match is explicitly loaded by
using -m ecn in the command line. The ecn match takes three different
options as described below.
--ecn-tcp-cwr
- Example:
iptables -A INPUT -p tcp -m ecn --ecn-tcp-cwr
- Explanation: This match is used to match the CWR (Congestion
Window Reduced) bit, if it has been set. The CWR flag is set by an
endpoint to notify the other endpoint of the connection that it has
received an ECE (ECN-Echo), and that it has reacted to it. Per
default this matches if the CWR bit is set, but the match may also
be inverted using an exclamation point.
--ecn-tcp-ece
- Example:
iptables -A INPUT -p tcp -m ecn --ecn-tcp-ece
- Explanation: This match can be used to match the ECE (ECN-Echo)
bit. The ECE is set once one of the endpoints has received a
packet with the CE bit set by a router. The endpoint then sets the
ECE in the returning ACK packet, to notify the other endpoint that
it needs to slow down. The other endpoint then sends a CWR packet
as described in the --ecn-tcp-cwr explanation. This matches per
default if the ECE bit is set, but may be inverted by using an
exclamation point.
--ecn-ip-ect
- Example:
iptables -A INPUT -p tcp -m ecn --ecn-ip-ect 1
- Explanation: The --ecn-ip-ect match is used to match the ECT (ECN
Capable Transport) codepoints. The ECT codepoints have several
types of usage. Mainly, they are used to negotiate if the
connection is ECN capable by setting one of the two bits to 1. The
ECT is also used by routers to indicate that they are experiencing
congestion, by setting both ECT codepoints to 1. The ECT values
are all available in the ECN Field in IP table below. The
match can be inverted using an exclamation point, for example !
--ecn-ip-ect 2 which will match all ECN values but the ECT(0)
codepoint. The valid value range is 0-3 in iptables.
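As a small illustration of the values 0-3, the Python sketch below extracts the two ECN bits from the IPv4 TOS byte and names the codepoint, following the assignment in RFC 3168. The helper names are assumptions made for this example only.

```python
# The two lowest bits of the IPv4 TOS byte carry the ECN field (RFC 3168).
ECN_CODEPOINTS = {0: "Not-ECT", 1: "ECT(1)", 2: "ECT(0)", 3: "CE"}


def ecn_field(tos_byte: int) -> int:
    """Return the ECN field (0-3) of a TOS byte."""
    return tos_byte & 0x03


def ecn_ip_ect_matches(tos_byte: int, value: int, invert: bool = False) -> bool:
    """Model of [!] --ecn-ip-ect value."""
    return (ecn_field(tos_byte) == value) != invert


# TOS 0xB9 is DSCP 46 (EF) in the upper six bits plus ECN field 1.
print(ECN_CODEPOINTS[ecn_field(0xB9)])            # ECT(1)
# ! --ecn-ip-ect 2 matches everything except ECT(0).
print(ecn_ip_ect_matches(0x03, 2, invert=True))   # True
print(ecn_ip_ect_matches(0x02, 2, invert=True))   # False
```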
Hashlimit match
This is a modified version of the limit match. Instead of just setting
up a single token bucket, it sets up a hash table pointing to token
buckets for each destination IP, source IP, destination port and
source port tuple.
For example, we can set it up so that every IP address can receive a
maximum of 1000 packets per second, or we can say that every service
on a specific IP address may receive a maximum of 200 packets per
second. The hashlimit match is loaded by specifying the -m hashlimit
keywords.
Each rule that uses the hashlimit match creates a separate hashtable
which in turn has a specific max size and a maximum number of buckets.
This hash table contains a hash of either a single or multiple values.
The values can be any and/or all of destination IP, source IP,
destination port and source port. Each entry then points to a token
bucket that works as the limit match.
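The idea of one token bucket per hash key can be sketched as follows. This is a simplified Python model under stated assumptions: packets are plain dictionaries, time is passed in explicitly as seconds, and table size limits, expiry and garbage collection are left out. The class names are hypothetical.

```python
class TokenBucket:
    """A single bucket: holds up to `burst` tokens, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int, now: float):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)   # a new bucket starts full
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to the time passed, never above the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class HashLimit:
    """One TokenBucket per key, where the key is built from the chosen mode fields."""

    def __init__(self, rate: float, burst: int, mode: tuple):
        self.rate, self.burst, self.mode = rate, burst, mode
        self.buckets = {}

    def matches(self, packet: dict, now: float) -> bool:
        key = tuple(packet[field] for field in self.mode)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.burst, now)
        return self.buckets[key].allow(now)


# Roughly: --hashlimit 1/sec --hashlimit-burst 2 --hashlimit-mode dstip,dstport
limiter = HashLimit(rate=1, burst=2, mode=("dstip", "dstport"))
web = {"dstip": "192.168.0.3", "dstport": 80}
ssh = {"dstip": "192.168.0.3", "dstport": 22}
print([limiter.matches(web, 0.0) for _ in range(3)])   # [True, True, False]
print(limiter.matches(ssh, 0.0))                       # True, separate bucket
print(limiter.matches(web, 1.0))                       # True, one token refilled
```

Because the key here is the (dstip, dstport) pair, traffic to port 22 is unaffected by the exhausted bucket for port 80 on the same host.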
--hashlimit
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.3 -m hashlimit --hashlimit 1000/sec --hashlimit-mode dstip,dstport --hashlimit-name hosts
- Explanation: The
--hashlimit (this option is mandatory for all
hashlimit matches) specifies the limit of each bucket. In this
example the hashlimit is set to 1000/sec . We have set up the
hashlimit-mode to be dstip,dstport and destination 192.168.0.3 .
Hence, for every port or service on the destination host, it can
receive 1000 packets per second. This is the same setting as the
limit option for the limit match. The limit can take a /sec ,
/minute , /hour or /day postfix. If no postfix is specified, the
default postfix is per second.
--hashlimit-mode
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.0/16 -m hashlimit --hashlimit 1000/sec --hashlimit-mode dstip --hashlimit-name hosts
- Explanation: The
--hashlimit-mode option (this option is mandatory
for all hashlimit matches) specifies which values we should use as
the hash values. In this example, we use only the dstip
(destination IP) as the hashvalue. So, each host in the
192.168.0.0/16 network will be limited to receiving a maximum of
1000 packets per second in this case. The possible values for the
--hashlimit-mode are dstip (Destination IP), srcip (Source IP),
dstport (Destination port) and srcport (Source port). All of these
can also be separated by a comma sign to include more than one
hashvalue, such as for example --hashlimit-mode dstip,dstport .
--hashlimit-name
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.3 -m hashlimit --hashlimit 1000 --hashlimit-mode dstip,dstport --hashlimit-name hosts
- Explanation: This option (this option is mandatory for all
hashlimit matches) specifies the name that this specific hash will
be available as. It can be viewed inside the
/proc/net/ipt_hashlimit directory. The example above would be
viewable inside the /proc/net/ipt_hashlimit/hosts file. Only the
filename should be specified.
--hashlimit-burst
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.3 -m hashlimit --hashlimit 1000 --hashlimit-mode dstip,dstport --hashlimit-name hosts --hashlimit-burst 2000
- Explanation: This match is the same as the
--limit-burst in that
it sets the maximum size of the bucket. Each bucket will have a
burst limit, which is the maximum amount of packets that can be
matched during a single time unit.
--hashlimit-htable-size
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.3 -m hashlimit --hashlimit 1000 --hashlimit-mode dstip,dstport --hashlimit-name hosts --hashlimit-htable-size 500
- Explanation: This sets the maximum available buckets to be used.
In this example, it means that a maximum of 500 ports can be open
and active at the same time.
--hashlimit-htable-max
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.3 -m hashlimit --hashlimit 1000 --hashlimit-mode dstip,dstport --hashlimit-name hosts --hashlimit-htable-max 500
- Explanation: The --hashlimit-htable-max sets the maximum number of
hashtable entries. This means all of the connections, including
the inactive connections that do not require any token buckets
for the moment.
--hashlimit-htable-gcinterval
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.3 -m hashlimit --hashlimit 1000 --hashlimit-mode dstip,dstport --hashlimit-name hosts --hashlimit-htable-gcinterval 1000
- Explanation: This sets how often the garbage collection function
should be run. Generally speaking this value should be lower than
the expire value. The value is measured in milliseconds. If it is
set too low it will take up unnecessary system resources and
processing power, but if it is too high it can leave unused token
buckets lying around for too long, so that no room is left for new
connections. In this example the garbage collector will run every
second.
--hashlimit-htable-expire
- Example:
iptables -A INPUT -p tcp --dst 192.168.0.3 -m hashlimit --hashlimit 1000 --hashlimit-mode dstip,dstport --hashlimit-name hosts --hashlimit-htable-expire 10000
- Explanation: This value sets how long (in milliseconds) an idle
hashtable entry is kept before it expires. If a bucket has been
unused for longer than this, it will be expired and the next garbage
collection run will remove it from the hashtable, as well as all of
the information pertaining to it.
Helper match
This is a rather unorthodox match in comparison to the other matches,
in the sense that it uses a slightly unusual syntax.
The match is used to match packets, based on which conntrack helper
that the packet is related to. For example, let us look at the FTP
session. The control session is opened up, and the ports/connection is
negotiated for the data session within the control session. The
ip_conntrack_ftp helper module will find this information, and create
a related entry in the conntrack table.
Now, when a packet enters, we can see which protocol it was related
to, and we can match the packet in our ruleset based on which helper
was used. The match is loaded by using the -m helper keyword.
--helper
- Example:
iptables -A INPUT -p tcp -m helper --helper ftp-21
- Explanation: The
--helper option is used to specify a string
value, telling the match which conntrack helper to match. In the
basic form, it may look like --helper irc . This is where the
syntax starts to change from the normal syntax. We can also choose
to only match packets based on which port that the original
expectation was caught on. For example, the FTP Control session is
normally transferred over port 21, but it may as well be port 954
or any other port. We may then specify upon which port the
expectation should be caught on, like --helper ftp-954 .
IP range match
The IP range match is loaded by using the -m iprange keyword. It is
used to match IP ranges, just as the --source and --destination
matches are able to do as well.
However, this match adds a different kind of matching in the sense
that it is able to match in the manner of from IP / to IP, which the
--source and --destination matches are unable to. This may be needed
in some specific network setups, and it is somewhat more flexible.
--src-range
- Example:
iptables -A INPUT -p tcp -m iprange --src-range 192.168.1.13-192.168.2.19
- Explanation: This matches a range of source IP addresses. The
range includes every single IP address from the first to the last,
so the example above includes everything from
192.168.1.13 to
192.168.2.19 . The match may also be inverted by adding an ! . The
above example would then look like -m iprange ! --src-range
192.168.1.13-192.168.2.19 , which would match every single IP
address, except the ones specified.
--dst-range
- Example:
iptables -A INPUT -p tcp -m iprange --dst-range 192.168.1.13-192.168.2.19
- Explanation: The
--dst-range works exactly the same as the
--src-range match, except that it matches destination IP's instead
of source IP's.
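The from IP / to IP comparison, including the fact that both end points belong to the range, can be modelled with Python's standard ipaddress module. This is only a sketch of the semantics; the function name in_range is hypothetical.

```python
import ipaddress


def in_range(address: str, ip_range: str, invert: bool = False) -> bool:
    """Model of [!] --src-range / --dst-range with a FIRST-LAST argument."""
    first, last = (ipaddress.IPv4Address(part) for part in ip_range.split("-"))
    inside = first <= ipaddress.IPv4Address(address) <= last
    return inside != invert


RANGE = "192.168.1.13-192.168.2.19"
print(in_range("192.168.1.13", RANGE))    # True, the first address is included
print(in_range("192.168.1.200", RANGE))   # True, the range crosses the /24 border
print(in_range("192.168.2.20", RANGE))    # False, one past the last address
print(in_range("192.168.2.20", RANGE, invert=True))   # True with the ! flag
```

A range like this one cannot be written as a single CIDR block, which is where the iprange match is useful.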
Length match
The length match is used to match packets based on their length. It is
very simple. If we want to limit the packet length for some strange
reason, or want to block ping-of-death-like behavior, we use the
length match.
--length
- Example:
iptables -A INPUT -p tcp -m length --length 1400:1500
- Explanation: The example --length will match all packets with a
length between 1400 and 1500 bytes. The match may also be inverted
using the ! sign, like this: -m length ! --length 1400:1500 . It
may also be used to match only a specific length, removing the :
sign and onwards, like this: -m length --length 1400 . The range
matching is, of course, inclusive, which means that it includes
the boundary values as well as all packet lengths in between.
Limit match
The limit match extension must be loaded explicitly with the -m limit
option. This match can, for example, be used to advantage to give
limited logging of specific rules etc.
For example, we could use this to match all packets that do not exceed
a given value, and after this value has been exceeded, limit logging
of the event in question.
Think of a time limit: We could limit how many times a certain rule
may be matched in a certain time frame, for example to lessen the
effects of DoS syn flood attacks. This is its main usage, but there
are more usages, of course. The limit match may also be inverted by
adding a ! flag in front of the limit match. It would then be
expressed as -m limit ! --limit 5/s . This means that all packets will
be matched after they have broken the limit.
- To further explain the limit match, it is basically a token bucket
filter. Consider having a leaky bucket where the bucket leaks X
packets per time-unit. X is defined depending on how many matching
packets we get, so if we get 3 packets, the bucket leaks 3 packets per
that time-unit.
- The --limit option tells us how many packets to refill the bucket with
per time-unit, while the --limit-burst option tells us how big the
bucket is in the first place. So, setting --limit 3/minute
--limit-burst 5 , and then receiving 5 matches will empty the bucket.
After 20 seconds, the bucket is refilled with another token, and so on
until the --limit-burst is reached again or until they get used.
Consider the example below for further explanation of how this may
look.
- We set a rule with -m limit --limit 5/second --limit-burst 10 .
The limit-burst token bucket is set to 10 initially.
Each packet that matches the rule uses a token.
- We get 10 packets that match, 1-2-3-4-5-6-7-8-9-10, all within a
1/1000 of a second.
- The token bucket is now empty. Once the token bucket is empty, the
packets that qualify for the rule otherwise no longer match the
rule and proceed to the next rule if any, or hit the chain policy
e.g. DROP .
- For each 1/5 second without a matching packet, the token count
goes up by 1, up to a maximum of 10. 1 second after receiving the
10 packets, we will once again have 5 tokens left.
- And of course, the bucket will be emptied by 1 token for each
matching packet it receives.
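The walk-through above can be replayed with a small Python model of the token bucket. It is a sketch under stated assumptions: time is given in whole milliseconds, and the bucket is kept as integer credit where 1000 credit units stand for one token, which avoids rounding problems. The class name LimitMatch is made up for this example.

```python
class LimitMatch:
    """Model of -m limit --limit RATE/second --limit-burst BURST."""

    COST = 1000  # credit units that one matching packet consumes

    def __init__(self, rate_per_second: int, burst: int):
        self.rate = rate_per_second
        self.cap = burst * self.COST
        self.credit = self.cap        # the bucket starts full
        self.last_ms = 0

    def _refill(self, now_ms: int) -> None:
        # Each elapsed millisecond adds `rate` credit units, capped at the burst.
        gained = (now_ms - self.last_ms) * self.rate
        self.credit = min(self.cap, self.credit + gained)
        self.last_ms = now_ms

    def matches(self, now_ms: int) -> bool:
        """Return True and use one token if a token is available."""
        self._refill(now_ms)
        if self.credit >= self.COST:
            self.credit -= self.COST
            return True
        return False

    def tokens(self, now_ms: int) -> int:
        """Number of whole tokens in the bucket at the given time."""
        self._refill(now_ms)
        return self.credit // self.COST


rule = LimitMatch(rate_per_second=5, burst=10)
burst = [rule.matches(now_ms=0) for _ in range(11)]
print(burst.count(True), burst[-1])   # 10 False: the 11th packet does not match
print(rule.tokens(now_ms=200))        # 1 token after 1/5 second
print(rule.tokens(now_ms=1000))       # 5 tokens one second after the burst
```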
--limit
- Example:
iptables -A INPUT -m limit --limit 3/hour
- Explanation: This sets the maximum average match rate for the
limit match. We specify it with a number and an optional time
unit. The following time units are currently recognized:
/second ,
/minute , /hour , and /day . The default value here is 3 per hour, or
3/hour . This tells the limit match how many times to allow the
match to occur per time unit e.g. per minute.
--limit-burst
- Example:
iptables -A INPUT -m limit --limit-burst 5
- Explanation: This is the setting for the burst limit of the limit
match. It tells iptables the maximum number of tokens available in
the bucket when we start, or when the bucket is full. This number
gets decremented by one for every packet that matches, until the
bucket is empty. The bucket is refilled at the rate specified by the
--limit option. The default --limit-burst value is 5.
Mac match
The MAC (Ethernet Media Access Control) match can be used to match
packets based on their MAC source address. Only the source MAC
address can be matched. We explicitly load it with the -m mac option.
--mac-source
- Example:
iptables -A INPUT -m mac --mac-source 00:00:00:00:00:01
- Explanation: This match is used to match packets based on their
MAC source address. The MAC address specified must be in the form
XX:XX:XX:XX:XX:XX , else it will not be legal. The match may be
reversed with an ! sign and would look like --mac-source !
00:00:00:00:00:01 . This would in other words reverse the meaning
of the match, so that all packets except packets from this MAC
address would be matched. Note that since MAC addresses are only
used on Ethernet type networks, this match will only be possible
to use for Ethernet interfaces. The MAC match is only valid in the
PREROUTING , FORWARD and INPUT chains.
Mark match
The mark match extension is used to match packets based on the marks
they have set. A mark is a special field, only maintained within the
kernel, that is associated with the packets as they travel through the
computer.
Marks may be used by different kernel routines for such tasks as
traffic shaping and filtering. As of today, there is only one way of
setting a mark in Linux, namely the MARK target in iptables. This was
previously done with the FWMARK target in ipchains, and this is why
people still refer to FWMARK in advanced routing areas.
The mark field is currently set to an unsigned integer, or 4294967296
possible values on a 32 bit system. In other words, we are probably
not going to run into this limit for quite some time.
--mark
- Example:
iptables -t mangle -A INPUT -m mark --mark 1
- Explanation: This match is used to match packets that have
previously been marked. Marks can be set with the
MARK target. All
packets traveling through netfilter get a special mark field
associated with them. Note that this mark field is not in any way
propagated, within or outside the packet. It stays inside the
computer that made it. If the mark field matches the mark, it is
a match. The mark field is an unsigned integer, hence there can be
a maximum of 4294967296 different marks. We may also use a mask
with the mark. The mark specification would then look like, for
example, --mark 1/1 . If a mask is specified, it is logically ANDed
with the packet's mark before the actual comparison.
Multiport match
The multiport match extension can be used to specify multiple
destination ports and port ranges. Without the possibility this match
gives, we would have to use multiple rules of the same type, just to
match different ports.
We cannot use both standard port matching and multiport matching at
the same time, for example we cannot write: --sport 1024:63353 -m
multiport --dport 21,23,80 . This will simply not work. What in fact
happens, if we do, is that iptables honors the first element in the
rule, and ignores the multiport instruction.
--source-port
- Example:
iptables -A INPUT -p tcp -m multiport --source-port 22,53,80,110
- Explanation: This match matches multiple source ports. A maximum
of 15 separate ports may be specified. The ports must be comma
delimited, as in the above example. The match may only be used in
conjunction with the
-p tcp or -p udp matches. It is mainly an
enhanced version of the normal --source-port match.
--destination-port
- Example:
iptables -A INPUT -p tcp -m multiport --destination-port 22,53,80,110
- Explanation: This match is used to match multiple destination
ports. It works exactly the same way as the above mentioned source
port match, except that it matches destination ports. It too has a
limit of 15 ports and may only be used in conjunction with
-p tcp
and -p udp .
--port
- Example:
iptables -A INPUT -p tcp -m multiport --port 22,53,80,110
- Explanation: This match extension can be used to match packets
based on either their destination port or their source port. It
takes the same kind of port list as the --source-port and
--destination-port matches above. It can take a maximum of 15 ports
and can only be used in conjunction with -p tcp and -p udp . Note
that the --port match succeeds when either the source port or the
destination port of a packet is in the list; the two ports do not
have to be the same.
Owner match
The owner match extension is used to match packets based on the
identity of the process that created them.
The owner can be specified as the ID of the user who issued the
command in question, that of the group, the process or the session,
or as the name of the command itself.
This extension was originally written as an example of what iptables
could be used for. The owner match only works within the OUTPUT chain,
for obvious reasons — it is pretty much impossible to find out any
information about the identity of the instance that sent a packet from
the other end, or where there is an intermediate hop to the real
destination.
Even within the OUTPUT chain it is not very reliable, since certain
packets may not have an owner. Notorious packets of that sort are
(among other things) the different ICMP responses. ICMP responses will
never match.
--cmd-owner
- Example:
iptables -A OUTPUT -m owner --cmd-owner httpd
- Explanation: This is the command owner match, and is used to match
based on the command name of the process that is sending the
packet. In the example,
httpd is matched. This match may also be
inverted by using an exclamation sign, for example -m owner !
--cmd-owner ssh .
--uid-owner
- Example:
iptables -A OUTPUT -m owner --uid-owner 500
- Explanation: This packet match will match if the packet was
created by the given UID (User ID). This could be used to match
outgoing packets based on who created them. One possible use would
be to block any other user than root from opening new connections
outside our packet filter. Another possible use could be to block
everyone but the http user from sending packets from the HTTP
port.
--gid-owner
- Example:
iptables -A OUTPUT -m owner --gid-owner 0
- Explanation: This match is used to match all packets based on
their GID (Group ID). This means that we match all packets based
on what group the user creating the packets is in. This could be
used to block all but the users in the network group from getting
out onto the Internet or, as described above, only to allow
members of the http group to create packets going out from the
HTTP port.
--pid-owner
- Example:
iptables -A OUTPUT -m owner --pid-owner 78
- Explanation: This match is used to match packets based on the PID
(Process Identifier) that was responsible for them. This match is
a bit harder to use, but one example would be only to allow PID 94
to send packets from the HTTP port (if the HTTP process is not
threaded, of course). Alternatively we could write a small script
that grabs the PID from a
ps output for a specific daemon and then
add a rule for it.
--sid-owner
- Example:
iptables -A OUTPUT -m owner --sid-owner 100
- Explanation: This match is used to match packets based on the SID
(Session ID) used by the program in question. The value of the
SID, or SID of a process, is that of the process itself and all
processes resulting from the originating process. These latter
could be threads, or a child of the original process. So, for
example, all of our HTTPD processes should have the same SID as
their parent process (the originating HTTPD process), if our HTTPD
is threaded (most HTTPDs are, Apache and Roxen for instance).
The pid, sid and command matching is broken in SMP kernels since they
use different process lists for each processor. It might be fixed in
the future, however.
Packet type match
The packet type match is used to match packets based on their type
i.e. are they destined to a specific host, to everyone or to a
specific group of machines or users. These three groups are generally
called unicast, broadcast and multicast. The match is loaded by using
-m pkttype .
--pkt-type
- Example:
iptables -A OUTPUT -m pkttype --pkt-type unicast
- Explanation: The
--pkt-type match is used to tell the packet type
match which packet type to match. It can either take unicast,
broadcast or multicast as an argument, as in the example. It can
also be inverted by using a ! like this: -m pkttype --pkt-type !
broadcast , which will match all other packet types.
Realm match
The realm match is used to match packets based on the routing realm
that they are part of.
Routing realms are used in Linux for complex routing scenarios and
setups such as when using BGP (Border Gateway Protocol) etc. The realm
match is loaded by adding the -m realm keyword to the commandline.
A routing realm is used in Linux to classify routes into logical
groups of routes. In most dedicated routers today, the RIB (Routing
Information Base) and the forwarding engine are very close to
each other, inside the kernel for example. Since Linux is not really a
dedicated routing system, it has been forced to separate its RIB and
FIB (Forwarding Information Base).
The RIB lives in userspace and the FIB lives inside kernelspace.
Because of this separation, it becomes quite resource heavy to do quick
searches in the RIB. The routing realm is the Linux solution to this,
and actually makes the system more flexible and richer.
The Linux realms can be used together with BGP and other routing
protocols that deliver huge amounts of routes. The routing daemon can
then sort the routes by their prefix, aspath, or source for example, and
put them in different realms. The realm is numeric, but can also be
named through the /etc/iproute2/rt_realms file.
--realm
- Example:
iptables -A OUTPUT -m realm --realm 4
- Explanation: This option matches the realm number and optionally a
mask. If this is not a number, it will try to resolve the
realm from the /etc/iproute2/rt_realms file. If a named realm
is used, no mask may be used. The match may also be inverted by
setting an exclamation sign, for example --realm ! cosmos .
Recent match
The recent match is a rather large and complex matching system, which
allows us to match packets based on recent events that we have
previously matched.
For example, if we would see an outgoing IRC connection, we could set
the IP addresses into a list of hosts, and have another rule that
allows identd requests back from the IRC server within 15 seconds of
seeing the original packet.
Before we can take a closer look at the match options, let us try and
explain a little bit how it works:
- First of all, we use several different rules to accomplish the use of
the recent match. The recent match uses several different lists of
recent events. The default list being used is the DEFAULT list. We
create a new entry in a list with the set option, so once a rule is
entirely matched (the set option is always a match), we also add an
entry in the recent list specified.
- The list entry contains a timestamp, and the source IP address used in
the packet that triggered the set option. Once this has happened, we
can use a series of different recent options to match on this
information, as well as update the entry's timestamp etc.
- Finally, if we would for some reason want to remove a list entry, we
would do this using the --remove match option from the recent match.
All rules using the recent match, must load the recent module (-m
recent ) as usual. Before we go on with an example of the recent
match, let's take a look at all the options:
--name
- Example:
iptables -A OUTPUT -m recent --name examplelist
- Explanation: The name option gives the name of the list to use.
Per default the
DEFAULT list is used, which is probably not what
we want if we are using more than one list.
--set
- Example:
iptables -A OUTPUT -m recent --set
- Explanation: This creates a new list entry in the named recent
list, which contains a timestamp and the source IP address of the
host that triggered the rule. This match will always return
success, unless it is preceded by a
! sign, in which case it will
return failure.
--rcheck
- Example:
iptables -A OUTPUT -m recent --name examplelist --rcheck
- Explanation: The
--rcheck option will check if the source IP
address of the packet is in the named list. If it is, the match
will return true, otherwise it returns false. The option may be
inverted by using the ! sign. In the latter case, it will return
true if the source IP address is not in the list, and false if it
is in the list.
--update
- Example:
iptables -A OUTPUT -m recent --name examplelist --update
- Explanation: This match is true if the source address is
available in the specified list and it also updates the last-seen
time in the list. This match may also be inverted by placing the
! sign in front of the match. For example, ! --update .
--remove
- Example:
iptables -A INPUT -m recent --name example --remove
- Explanation: This match will try to find the source address of the
packet in the list, and returns true if the address is there. It
will also remove the corresponding list entry from the list. The
match can also be inverted with the ! sign.
--seconds
- Example:
iptables -A INPUT -m recent --name example --rcheck --seconds 60
- Explanation: This match is only valid together with the --rcheck
and --update matches. The --seconds match is used to specify how
long ago the last-seen column may have been updated in the recent list.
If the last-seen column is older than this amount in seconds, the
match returns false. Other than this the recent match works as
normal, so the source address must still be in the list for a true
return of the match.
--hitcount
- Example:
iptables -A INPUT -m recent --name example --rcheck --hitcount 20
- Explanation: The --hitcount match must be used together with the
--rcheck or --update matches and it will limit the match to only
include sources from which at least the hitcount amount of
packets have been seen. If this match is used together with the
--seconds match, it will require the specified hitcount packets to be
seen in the specific timeframe. This match may also be inverted by
adding a ! sign in front of the match. Together with the --seconds
match, this means that a maximum of this amount of packets may have
been seen during the specified timeframe. If both of the matches are
inverted, then a maximum of this amount of packets may have been
seen during the last minimum of seconds.
--rttl
- Example:
iptables -A INPUT -m recent --name example --rcheck --rttl
- Explanation: The --rttl match is used to verify that the TTL value
of the current packet is the same as the original packet that was
used to set the original entry in the recent list. This can be
used to verify that people are not spoofing their source address
to deny others access to our servers by making use of the recent
match.
--rsource
- Example:
iptables -A INPUT -m recent --name example --set --rsource
- Explanation: The --rsource match is used to tell the recent match
to save the source address in the recent list. This is
the default behavior of the recent match.
--rdest
- Example:
iptables -A INPUT -m recent --name example --set --rdest
- Explanation: The --rdest match is the opposite of the --rsource
match in that it tells the recent match to save the destination
address to the recent list.
Below is a small sample script which demonstrates how the recent match can be
used:
#!/bin/bash
iptables -N http-recent
iptables -N http-recent-final
iptables -N http-recent-final1
iptables -N http-recent-final2
iptables -A INPUT -p tcp --dport 80 -j http-recent
# http-recent-final, has this connection been deleted from httplist or not?
#
iptables -A http-recent-final -p tcp -m recent --name httplist --rcheck -j http-recent-final1
iptables -A http-recent-final -p tcp -m recent --name http-recent-final --rcheck -j http-recent-final2
# http-recent-final1, this chain deletes the connection from the httplist
# and adds a new entry to the http-recent-final
#
iptables -A http-recent-final1 -p tcp -m recent --name httplist --tcp-flags SYN,ACK,FIN FIN,ACK --remove -j ACCEPT
iptables -A http-recent-final1 -p tcp -m recent --name http-recent-final --tcp-flags SYN,ACK,FIN FIN,ACK --set -j ACCEPT
# http-recent-final2, this chain allows final traffic from non-closed host
# and listens for the final FIN and FIN,ACK handshake.
#
iptables -A http-recent-final2 -p tcp --tcp-flags SYN,ACK NONE -m recent --name http-recent-final --update -j ACCEPT
iptables -A http-recent-final2 -p tcp --tcp-flags SYN,ACK ACK -m recent --name http-recent-final --update -j ACCEPT
iptables -A http-recent-final2 -p tcp -m recent --name http-recent-final --tcp-flags SYN,ACK,FIN FIN --update -j ACCEPT
iptables -A http-recent-final2 -p tcp -m recent --name http-recent-final --tcp-flags SYN,ACK,FIN FIN,ACK --remove -j ACCEPT
# http-recent chain, our homebrew state tracking system.
#
# Initial stage of the tcp connection SYN/ACK handshake
iptables -A http-recent -p tcp --tcp-flags SYN,ACK,FIN,RST SYN -m recent --name httplist --set -j ACCEPT
iptables -A http-recent -p tcp --tcp-flags SYN,ACK,FIN,RST SYN,ACK -m recent --name httplist --update -j ACCEPT
# Note that at this state in a connection, RST packets are legal (see RFC 793).
iptables -A http-recent -p tcp --tcp-flags SYN,ACK,FIN ACK -m recent --name httplist --update -j ACCEPT
# Middle stage of tcp connection where data transportation takes place.
iptables -A http-recent -p tcp --tcp-flags SYN,ACK NONE -m recent --name httplist --update -j ACCEPT
iptables -A http-recent -p tcp --tcp-flags SYN,ACK ACK -m recent --name httplist --update -j ACCEPT
# Final stage of tcp connection where one of the parties tries to close the
# connection.
iptables -A http-recent -p tcp --tcp-flags SYN,FIN,ACK FIN -m recent --name httplist --update -j ACCEPT
iptables -A http-recent -p tcp --tcp-flags SYN,FIN,ACK FIN,ACK -m recent --name httplist --rcheck -j http-recent-final
# Special case if the connection crashes for some reason. Malicious intent or
# no.
iptables -A http-recent -p tcp --tcp-flags SYN,FIN,ACK,RST RST -m recent --name httplist --remove -j ACCEPT
Briefly, this is a poor replacement for the state engine available in
netfilter. This version was created with an HTTP server in mind, but
will work with any TCP connection. First we have created two main chains
named http-recent and http-recent-final , plus the helper chains
http-recent-final1 and http-recent-final2 .
The http-recent chain is used in the starting stages of the
connection, and for the actual data transmission, while the
http-recent-final chain is used for the last and final FIN/ACK , FIN
handshake.
-
This is a very bad replacement for the built-in state engine and
cannot handle all of the possibilities that the state engine can handle.
-
However, it is a good example of what can be done with the recent
match without being too specific. We should not use this example in
a real world environment. It is slow, handles special cases badly,
and should generally never be used more than as an example.
-
For example, it does not handle closed ports on connection,
asynchronous FIN handshake (where one of the connected parties closes
down, while the other continues to send data), etc.
Let us follow a packet through the example ruleset. First a packet
enters the INPUT chain, and we send it to the http-recent chain.
- The first packet should be a SYN packet, and should not have the
ACK, FIN or RST bits set. Hence it is matched using the --tcp-flags
SYN,ACK,FIN,RST SYN line. At this point we add the connection to
the httplist using -m recent --name httplist --set . Finally
we accept the packet.
- After the first packet we should receive a SYN/ACK packet to
acknowledge that the SYN packet was received. This can be matched
using the --tcp-flags SYN,ACK,FIN,RST SYN,ACK line. FIN and RST
should be illegal at this point as well. At this point we update
the entry in the httplist using -m recent --name httplist --update
and finally we ACCEPT the packet.
- By now we should get a final ACK packet, from the original creator
of the connection, to acknowledge the SYN/ACK sent by the server.
SYN , FIN and RST are illegal at this point of the connection, so
the line should look like --tcp-flags SYN,ACK,FIN,RST ACK . We
update the list in exactly the same way as in the previous step,
and ACCEPT it.
- At this point the data transmission can start. The connection
should never contain any SYN packet now, but it will contain ACK
packets to acknowledge the data packets that are sent. Each time
we see any packet like this, we update the list and ACCEPT the
packets.
- The transmission can be ended in two ways. The simplest is the RST
packet: RST will simply reset the connection and it will die. With
FIN/ACK , the other endpoint answers with a FIN , and this closes
down the connection so that the original source of the FIN/ACK can
no longer send any data. The receiver of the FIN will still be
able to send data, hence we send the connection to a final stage
chain to handle the rest.
- In the http-recent-final chain we check if the packet is still in
the httplist , and if so, we send it to the http-recent-final1
chain. In that chain we remove the connection from the httplist
and add it to the http-recent-final list instead. If the
connection has already been removed and moved over to the
http-recent-final list, we send the packet to the
http-recent-final2 chain.
- In the final http-recent-final2 chain, we wait for the non-closed
side to finish sending its data, and to close the connection from
their side as well. Once this is done, the connection is
completely removed.
As we can see, rulesets built on the recent match can become quite complex, but it will
give us a huge set of possibilities if need be. Still, we should try
and remember not to reinvent the wheel. If the ability we need is
already implemented, we should try to use it instead of creating our
own solution.
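A more typical use of the recent match than the homebrew state tracking above is simple rate limiting. The following is a minimal sketch, assuming an SSH service on port 22; the list name sshlist, the 60 second window and the limit of 4 packets are hypothetical values picked for illustration. The script only prints the rules by default; the IPT variable is a helper introduced here, not part of iptables.

```shell
#!/bin/sh
# Dry run by default: each rule is printed instead of applied.
# Run with IPT=iptables (as root) to install the rules for real.
IPT="${IPT:-echo iptables}"

# Hypothetical values chosen for illustration only.
LIST=sshlist      # name of the recent list
WINDOW=60         # seconds to look back
MAX_HITS=4        # SYN packets allowed per source inside the window

ssh_rate_limit_rules() {
    # Drop the SYN if this source already has MAX_HITS entries that are
    # younger than WINDOW seconds.
    $IPT -A INPUT -p tcp --dport 22 --syn -m recent --name "$LIST" \
        --update --seconds "$WINDOW" --hitcount "$MAX_HITS" -j DROP
    # Otherwise record the source address in the list and let it through.
    $IPT -A INPUT -p tcp --dport 22 --syn -m recent --name "$LIST" \
        --set -j ACCEPT
}

ssh_rate_limit_rules
```

The order matters: the --update rule is checked first, so a source is only added to the list by the --set rule while it is still below the limit.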
State match
The state match extension is used in conjunction with the
connection tracking code in the kernel.
The state match accesses the connection tracking state of the packets
from the state machine. This allows us to know in what state the
connection is, and works for pretty much all protocols, including
stateless protocols such as UDP and ICMP.
In all cases, there will be a default timeout for the connection and
it will then be dropped from the connection tracking database. This
match needs to be loaded explicitly by adding a -m state statement to
the rule. We will then have access to one new match called state . The
concept of state matching is covered more fully in the state machine
section.
--state
- Example:
iptables -A INPUT -m state --state RELATED,ESTABLISHED
- Explanation: This match option tells the state match what states
the packets must be in to be matched. There are currently 4 states
that can be used: INVALID , ESTABLISHED , NEW and RELATED . INVALID
means that the packet is associated with no known stream or
connection and that it may contain faulty data or headers.
ESTABLISHED means that the packet is part of an already
established connection that has seen packets in both directions
and is fully valid. NEW means that the packet has or will start a
new connection, or that it is associated with a connection that
has not seen packets in both directions. Finally, RELATED means
that the packet is starting a new connection and is associated
with an already established connection. This could for example
mean an FTP data transfer, or an ICMP error associated with a TCP
or UDP connection. Note that the NEW state does not look for SYN
bits in TCP packets trying to start a new connection and should,
hence, not be used unmodified in cases where we have only one
packet filter and no load balancing between different packet filters.
However, there may be times where this could be useful.
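The four states are usually combined into a small stateful ruleset. The sketch below is one possible arrangement for a single packet filter, assuming we want to offer SSH on port 22 (a hypothetical choice). It also shows one way to deal with the fact that NEW does not check for the SYN bit. The rules are only printed unless IPT is set to the real iptables binary; IPT is a helper variable introduced for this example.

```shell
#!/bin/sh
# Dry run by default: rules are printed, not applied.
IPT="${IPT:-echo iptables}"

stateful_input_rules() {
    # Packets belonging to, or related to, connections we already track.
    $IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # NEW does not imply SYN, so drop "new" TCP packets without the SYN bit.
    $IPT -A INPUT -p tcp ! --syn -m state --state NEW -j DROP
    # Allow new connections to a hypothetical SSH service on port 22.
    $IPT -A INPUT -p tcp --dport 22 -m state --state NEW -j ACCEPT
    # Packets that connection tracking cannot associate with anything.
    $IPT -A INPUT -m state --state INVALID -j DROP
}

stateful_input_rules
```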
Tcpmss match
The tcpmss match is used to match a packet based on the MSS (Maximum
Segment Size) in TCP. This match is only valid for SYN and SYN/ACK
packets. This match is loaded using -m tcpmss and takes only one
option.
--mss
- Example:
iptables -A INPUT -p tcp --tcp-flags SYN,ACK,RST SYN -m tcpmss --mss 2000:2500
- Explanation: The --mss option tells the tcpmss match which MSS to
match. This can either be a single specific MSS value, or a range
of MSS values separated by a : . The value may also be inverted as
usual using the ! sign, as in the following example: -m tcpmss !
--mss 2000:2500 . This example will match all MSS values, except
for values in the range 2000 through 2500.
Tos match
The TOS (Type of Service) match can be used to match packets based on
their TOS field, which consists of 8 bits and is located in the IP
header.
This match is loaded explicitly by adding -m tos to the rule. TOS is
normally used to inform intermediate hosts of the precedence of the
stream and its content (it does not really, but it informs of any
specific requirements for the stream, such as it having to be sent as
fast as possible, or it needing to be able to send as much payload as
possible).
How different routers and administrators deal with these values
depends. Most do not care at all, while others try their best to do
something good with the packets in question and the data they provide.
--tos
- Example:
iptables -A INPUT -p tcp -m tos --tos 0x10
- Explanation: This match is used as described above. It can match
packets based on their TOS field and their value. This could be
used, among other things together with the iproute2 and advanced
routing functions in Linux, to mark packets for later usage. The
match takes a hex or numeric value as an option, or possibly one
of the names resulting from iptables -m tos -h . At the time of
writing it contained the following named values: Minimize-Delay 16
(0x10) , Maximize-Throughput 8 (0x08) , Maximize-Reliability 4
(0x04) , Minimize-Cost 2 (0x02) , and Normal-Service 0 (0x00) .
Minimize-Delay means to minimize the delay in putting the packets
through — example of standard services that would require this
include telnet, SSH (Secure Shell) and FTP-control.
Maximize-Throughput means to find a path that allows as big a
throughput as possible — a standard protocol would be FTP-data.
Maximize-Reliability means to maximize the reliability of the
connection and to use lines that are as reliable as possible — a
couple of typical examples are BOOTP and TFTP (Trivial File
Transfer Protocol). Minimize-Cost means minimizing the cost of
packets getting through each link to the client or server e.g. for
finding the route that costs the least to travel along. Examples
of normal protocols that would use this would be RTSP (Real Time
Streaming Protocol) and other streaming video/radio
protocols. Finally, Normal-Service would mean any normal protocol
that has no special needs.
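The named values listed above can be kept in a small lookup helper so that rules stay readable. This is only a sketch: tos_value and tos_rule are hypothetical helper functions written for this example, and the rule is printed rather than applied unless IPT points at the real iptables binary.

```shell
#!/bin/sh
# Dry run by default: rules are printed, not applied.
IPT="${IPT:-echo iptables}"

# Map a TOS name from the list above to its hex value.
tos_value() {
    case "$1" in
        Minimize-Delay)       echo 0x10 ;;
        Maximize-Throughput)  echo 0x08 ;;
        Maximize-Reliability) echo 0x04 ;;
        Minimize-Cost)        echo 0x02 ;;
        Normal-Service)       echo 0x00 ;;
        *) echo "unknown TOS name: $1" >&2; return 1 ;;
    esac
}

# Emit a rule matching TCP packets with the given TOS name. The rule has
# no target, so it would only update its packet and byte counters.
tos_rule() {
    value=$(tos_value "$1") || return 1
    $IPT -A INPUT -p tcp -m tos --tos "$value"
}

tos_rule Minimize-Delay
```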
Ttl match
The TTL (Time to Live) match is used to match packets based on their
TTL field residing in the IP headers.
The TTL field contains 8 bits of data and is decremented once every
time it is processed by an intermediate machine between the client and
recipient host.
If the TTL reaches 0, an ICMP type 11 code 0 (TTL equals 0 during
transit) or code 1 (TTL equals 0 during reassembly) is transmitted to
the party sending the packet and informing it of the problem. This
match is only used to match packets based on their TTL, and not to
change anything. The latter, incidentally, applies to all kinds of
matches. To load this match, we need to add an -m ttl to the rule.
--ttl-eq
- Example:
iptables -A OUTPUT -m ttl --ttl-eq 60
- Explanation: This match option is used to specify the TTL value to
match exactly. It takes a numeric value and matches this value
within the packet. There is no inversion and there are no other
specifics to match. It could, for example, be used for debugging
our LAN e.g. LAN hosts that seem to have problems connecting to
hosts on the Internet, or to find possible ingress by Trojans etc.
The usage is relatively limited, however, its usefulness really
depends on our imagination. One example would be to find hosts
with bad default TTL values (could be due to a badly implemented
TCP/IP stack, or simply to misconfiguration).
--ttl-gt
- Example:
iptables -A OUTPUT -m ttl --ttl-gt 64
- Explanation: This match option is used to match any TTL greater
than the specified value. The value can be between 0 and 255 and
the match cannot be inverted. It could, for example, be used for
matching any TTL greater than a specific value and then force them
to a standardized value. This could be used to overcome some
simple forms of spying by ISPs to find out if we are running
multiple machines behind a packet filter, against their policies.
--ttl-lt
- Example:
iptables -A OUTPUT -m ttl --ttl-lt 64
- Explanation: The --ttl-lt match is used to match any TTL smaller
than the specified value. It is pretty much the same as the
--ttl-gt match, but as already stated, it matches smaller TTLs.
It could also be used in the same way as the --ttl-gt match, or to
simply homogenize the packets leaving our network in general.
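Forcing outgoing packets to one TTL value, as suggested above, needs a target in addition to the ttl match. The sketch below assumes the TTL target (mangle table, --ttl-set) is available in our kernel; the interface name eth0 and the value 64 are hypothetical. Note that raising the TTL of packets can be dangerous, since it weakens the protection against routing loops, so this is best tried on a test network first. The rules are only printed unless IPT is set to the real binary.

```shell
#!/bin/sh
# Dry run by default: rules are printed, not applied.
IPT="${IPT:-echo iptables}"

OUT_IF=eth0       # hypothetical Internet-facing interface
TARGET_TTL=64     # hypothetical TTL every outgoing packet should carry

normalize_ttl_rules() {
    # Packets leaving with a TTL above the chosen value are rewritten to it...
    $IPT -t mangle -A POSTROUTING -o "$OUT_IF" -m ttl --ttl-gt "$TARGET_TTL" \
        -j TTL --ttl-set "$TARGET_TTL"
    # ...and so are packets leaving with a TTL below it.
    $IPT -t mangle -A POSTROUTING -o "$OUT_IF" -m ttl --ttl-lt "$TARGET_TTL" \
        -j TTL --ttl-set "$TARGET_TTL"
}

normalize_ttl_rules
```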
Unclean match
The unclean match takes no options and requires no more than
explicitly loading it when we want to use it.
The unclean match tries to match packets that seem malformed or
unusual, such as packets with bad headers or checksums and so on. This
could be used to DROP connections and to check for bad streams, for
example. However, we should be aware that it could possibly break legal
connections; it is regarded as experimental and may not work at all
times, nor will it take care of all unclean packets or problems.
Targets and Jumps
The target/jump tells the rule what to do with a packet that fully
matches the match section of the rule.
There are a couple of basic targets, the ACCEPT and DROP targets,
which we will deal with first. However, before we do that, let us have
a brief look at how a jump is done.
Jump
The jump specification is done in exactly the same way as in the
target definition, except that it requires a chain within the same
table to jump to. To jump to a specific chain, it is of course a
prerequisite that chain exists.
As we have already explained, a user-specified chain is created with
the -N command . For example, let us say we create a chain in the
filter table called tcp_packets , like this iptables -N tcp_packets . We
could then add a jump target to it like this iptables -A INPUT -p tcp
-j tcp_packets .
We would then jump from the INPUT chain to the tcp_packets chain and
start traversing that chain. When/if we reach the end of that chain,
we get dropped back to the INPUT chain and the packet starts
traversing from the rule one step below where it jumped to the other
chain (tcp_packets in this case).
If a packet is ACCEPTed within one of the sub-chains, it will be
ACCEPTed in the superior chain also and it will not traverse any of
the superior chains any further.
However, do note that the packet will traverse all other chains in
the other tables in a normal fashion.
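The jump and fall-through behavior described above can be sketched as a short ruleset. The tcp_packets chain is the one from the text; the ports inside it are hypothetical. The commands are printed instead of executed unless IPT is set to the real iptables binary (IPT is a helper variable introduced for this example).

```shell
#!/bin/sh
# Dry run by default: rules are printed, not applied.
IPT="${IPT:-echo iptables}"

jump_rules() {
    # Create the user-specified chain in the filter table.
    $IPT -N tcp_packets
    # Rules inside the sub-chain; the ports are hypothetical.
    $IPT -A tcp_packets -p tcp --dport 80 -j ACCEPT
    $IPT -A tcp_packets -p tcp --dport 22 -j ACCEPT
    # Jump from INPUT into the sub-chain for every TCP packet.
    $IPT -A INPUT -p tcp -j tcp_packets
    # Only reached by TCP packets that fell off the end of tcp_packets,
    # because traversal resumes one rule below the jump.
    $IPT -A INPUT -p tcp -j DROP
}

jump_rules
```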
Target
Targets on the other hand specify an action to take on the packet in
question. We could for example, DROP or ACCEPT the packet depending on
what we want to do.
There are also a number of other actions we may want to take, which we
will describe further on in this section. Jumping to targets may incur
different results, as it were. Some targets will cause the packet to
stop traversing that specific chain and superior chains as described
above.
Good examples of such rules are DROP and ACCEPT . Packets that are
stopped, will not pass through any of the rules further on in the
chain or in superior chains. Other targets, may take an action on the
packet, after which the packet will continue passing through the rest
of the rules.
Good examples of this are the LOG , ULOG and TOS targets. These
targets can log the packets, mangle them and then pass them on to the
other rules in the same set of chains. We might, for example, want
this so that we in addition can mangle both the TTL (Time to Live) and
the TOS (Type of Service) values of a specific packet/stream.
Some targets will accept extra options (what TOS value to use etc),
while others do not necessarily need any options — but we can include
them if we want to (log prefixes, masquerade-to ports and so on).
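The difference between targets that stop traversal and targets that let the packet continue can be seen in a three-rule sketch. The telnet port 23 and the log prefix are hypothetical choices; the rules are printed rather than applied unless IPT is set to the real iptables binary.

```shell
#!/bin/sh
# Dry run by default: rules are printed, not applied.
IPT="${IPT:-echo iptables}"

log_then_drop_rules() {
    # LOG does not stop traversal: the packet is logged and then moves on
    # to the next rule. --log-prefix is an example of an optional option.
    $IPT -A INPUT -p tcp --dport 23 -j LOG --log-prefix telnet_attempt:
    # DROP stops traversal: matching packets end their journey here.
    $IPT -A INPUT -p tcp --dport 23 -j DROP
    # Never reached by packets to port 23, since DROP already stopped them.
    $IPT -A INPUT -p tcp --dport 23 -j ACCEPT
}

log_then_drop_rules
```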
ACCEPT target
This target needs no further options. As soon as the match
specification for a packet has been fully satisfied, and we specify
ACCEPT as the target, the packet is accepted and will not continue
traversing the current chain or any other ones in the same table.
Note however, that a packet that was accepted in one chain might still
travel through chains within other tables, and could still be dropped
there. There is nothing special about this target whatsoever, and it
does not require, nor have the possibility of, adding options to the
target. To use this target, we simply specify -j ACCEPT .
CLASSIFY target
The CLASSIFY target can be used to classify packets in such a way that
can be used by a couple of different qdiscs (Queue Disciplines). For
example, atm , cbq , dsmark , pfifo_fast , htb and the prio qdiscs. The
CLASSIFY target is only valid in the POSTROUTING chain of the mangle
table.
For more information about qdiscs and traffic controlling, please
visit the Linux Advanced Routing and Traffic Control HOW-TO webpage.
--set-class
- Example:
iptables -t mangle -A POSTROUTING -p tcp --dport 80 -j CLASSIFY --set-class 20:10
- Explanation: The CLASSIFY target only takes one argument, the
--set-class . This tells the target how to class the packet. The
class takes 2 values separated by a colon, like this
MAJOR:MINOR .
CLUSTERIP target
The CLUSTERIP target is used to create simple clusters of nodes
answering to the same IP and MAC address in a round robin fashion.
This is a simple form of clustering where we set up a virtual IP on
all hosts participating in the cluster, and then use the CLUSTERIP
target on each machine that is supposed to answer the requests.
The CLUSTERIP target requires no special load balancing hardware or
machines, it simply does its work on each machine that is part of the
cluster. It is a very simple clustering solution and not suited for
large and complex clusters, neither does it have built-in heartbeat
handling, but that should be easily implemented as a simple script.
All servers in the cluster use a common Multicast MAC for a virtual
IP, and then a special hash algorithm is used within the CLUSTERIP
target to figure out who of the cluster participants should respond to
each connection.
A Multicast MAC is a MAC address starting with 01:00:5e as the first
24 bits — an example of a Multicast MAC would be 01:00:5e:00:00:20 .
The virtual IP can be any IP address, but must be the same on all
hosts as well.
Remember that the CLUSTERIP target might break protocols such as SSH. The
connection will go through properly, but if we later connect again
to the same address, we might be connected to another machine in the
cluster, with a different keyset, and hence our SSH client might
refuse to connect or give us errors.
For this reason, this will not work very well with some protocols, and
it might be a good idea to add separate addresses that can be used for
maintenance and administration. Another solution is to use the same
SSH keys on all hosts participating in the cluster (which I think is a
bad idea however). The cluster can be load-balanced with three kinds
of hashmodes.
- The first one is only source IP (sourceip)
- the second is source IP and source port (sourceip-sourceport) and
- the third one is source IP, source port and destination port
(sourceip-sourceport-destport).
The first one might be a good idea where we need to remember states
between connections, for example a web server with a shopping cart
that keeps state between connections. This load-balancing might become
a little bit uneven (different machines might get a higher load than
others, etc.) since connections from the same source IP will go to the
same server.
The sourceip-sourceport hash might be a good idea where we want to get
the load-balancing a little bit more even, and where state does not
have to be kept between connections on each server.
For example, a large informational webpage with perhaps a simple
search engine might be a good fit here. The third and last hashmode,
sourceip-sourceport-destport, might be a good idea where we have a
machine with several services running that do not require any state to
be preserved between connections.
This might for example be a simple NTP, DNS and WWW server on the same
host. Each connection to each new destination would hence be
renegotiated — actually no negotiation goes on, it is basically just
a round robin system and each machine receives one connection each.
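How a hashmode decides which node answers can be illustrated with a toy model. This is not the hash the CLUSTERIP target uses in the kernel; cksum merely stands in for it, and pick_node is a hypothetical helper. The point is only that the same hash key always selects the same node, and that adding the source port to the key spreads connections from one source IP over several nodes.

```shell
#!/bin/sh
# Toy model only: cksum (a CRC) stands in for the real in-kernel hash.
TOTAL_NODES=2     # same meaning as --total-nodes, hypothetical value

# pick_node KEY: print the node number (1..TOTAL_NODES) for a hash key.
# KEY would be "10.0.0.7" for sourceip, or "10.0.0.7:40000" for
# sourceip-sourceport.
pick_node() {
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $(( $crc % $TOTAL_NODES + 1 ))
}

pick_node "10.0.0.7"          # sourceip key
pick_node "10.0.0.7"          # same key, therefore the same node again
pick_node "10.0.0.7:40000"    # sourceip-sourceport key, may differ
```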
Each CLUSTERIP cluster gets a separate file in the
/proc/net/ipt_CLUSTERIP directory, based on the virtual IP of the
cluster. If the VIP is 192.168.0.5 for example, we could cat
/proc/net/ipt_CLUSTERIP/192.168.0.5 to see which nodes this machine is
answering for.
To make the machine answer for another machine, let's say node 2, we
add it using echo "+2" >> /proc/net/ipt_CLUSTERIP/192.168.0.5 . To
remove it, we run echo "-2" >> /proc/net/ipt_CLUSTERIP/192.168.0.5 .
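The two echo commands are the building blocks for the simple heartbeat script mentioned earlier. A minimal sketch of such helpers follows; take_over and release are hypothetical function names, and how we detect that a peer is down is left out on purpose. CLUSTER_FILE defaults to the path from the text and can be pointed at a scratch file for testing.

```shell
#!/bin/sh
# Path follows the text above; override CLUSTER_FILE to try the functions
# without a real cluster (for example with a scratch file).
CLUSTER_FILE="${CLUSTER_FILE:-/proc/net/ipt_CLUSTERIP/192.168.0.5}"

# take_over N: start answering for node N on this machine.
take_over() {
    echo "+$1" >> "$CLUSTER_FILE"
}

# release N: stop answering for node N on this machine.
release() {
    echo "-$1" >> "$CLUSTER_FILE"
}

# Example: answer for node 2 while it is down, then hand it back.
#   take_over 2
#   release 2
```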
--new
- Example:
iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new...
- Explanation: This creates a new CLUSTERIP entry. It must be set on
the first rule for a virtual IP, and is used to create a new
cluster. If we have several rules connecting to the same CLUSTERIP
we can omit the --new keyword in any secondary references to the
same virtual IP.
--hashmode
- Example:
iptables -A INPUT -p tcp -d 192.168.0.5 --dport 443 -j CLUSTERIP --new --hashmode sourceip...
- Explanation: The --hashmode keyword specifies the kind of hash
that should be created. The hashmode can be any of the following
three: sourceip , sourceip-sourceport and
sourceip-sourceport-destport . Basically, sourceip will give better
performance and simpler states between connections, but not as
good load-balancing between the machines. sourceip-sourceport will
give a slightly slower hashing and not as good to maintain states
between connections, but will give better load-balancing
properties. The last one may create very slow hashing that
consumes a lot of memory, but will on the other hand also create
very good load-balancing properties.
--clustermac
- Example:
iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20...
- Explanation: The MAC address that the cluster is listening to for
new connections. This is a shared multicast MAC address that all
the hosts are listening to.
--total-nodes
- Example:
iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20 --total-nodes 2...
- Explanation: The --total-nodes keyword specifies how many hosts
are participating in the cluster and that will answer to requests.
--local-node
- Example:
iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
- Explanation: This is the number that this machine has in the
cluster. The cluster answers in a round-robin fashion, so once a
new connection is made to the cluster, the next machine answers,
and then the next after that, and so on.
--hash-init
- Example:
iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20 --hash-init 1234
- Explanation: Specifies a random seed for hash initialization.
CONNMARK target
The CONNMARK target is used to set a mark on a whole connection, much
the same way as the MARK target does. It can then be used together
with the connmark match to match the connection in the future.
For example, say we see a specific pattern in a header, and we do not
want to mark just that packet, but the whole connection. The CONNMARK
target is a perfect solution in that case.
The CONNMARK target is available in all chains and all tables, but
remember that the nat table is only traversed by the first packet in a
connection, so the CONNMARK target will have no effect if we try to
use it for subsequent packets after the first one in here.
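A common way to combine the options described below is to restore the connection mark onto each packet first, skip packets that are already marked, and only classify and save the mark for the first packet of a connection. The sketch assumes the mangle table and the mark value 4 for HTTP traffic, both chosen for illustration; the rules are printed rather than applied unless IPT is set to the real iptables binary.

```shell
#!/bin/sh
# Dry run by default: rules are printed, not applied.
IPT="${IPT:-echo iptables}"

connmark_rules() {
    # Copy an existing connection mark (if any) onto the packet.
    $IPT -t mangle -A PREROUTING -j CONNMARK --restore-mark
    # Packets of already marked connections need no further work.
    $IPT -t mangle -A PREROUTING -m mark ! --mark 0 -j ACCEPT
    # First packet of a new HTTP connection: mark the packet...
    $IPT -t mangle -A PREROUTING -p tcp --dport 80 -j MARK --set-mark 4
    # ...and store the packet mark on the whole connection.
    $IPT -t mangle -A PREROUTING -j CONNMARK --save-mark
}

connmark_rules
```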
--set-mark
- Example:
iptables -t nat -A PREROUTING -p tcp --dport 80 -j CONNMARK --set-mark 4
- Explanation: This option sets a mark on the connection. The mark
can be an unsigned long int, which means values between 0 and
4294967295 are valid. Each bit can also be masked by doing
--set-mark 12/8 . This will only allow the bits in the mask to be
set out of all the bits in the mark. In this example, only the 4th
bit will be set, not the 3rd. 12 translates to 1100 in binary, and
8 to 1000, and only the bits set in the mask are allowed to be
set. Hence, only the 4th bit, or 8, is set in the actual mark.
--save-mark
- Example:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j CONNMARK --save-mark
- Explanation: The --save-mark target option is used to save the
packet mark into the connection mark. For example, if we have set
a packet mark with the MARK target, we can then move this mark to
mark the whole connection with the --save-mark option. The mark can
also be masked by using the --mask option described further down.
--restore-mark
- Example:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j CONNMARK --restore-mark
- Explanation: This target option restores the packet mark from the
connection mark as defined by the CONNMARK target. A mask can also be
defined using the --mask option as seen below. If a mask is set,
only the masked bits will be set. Note that this target option
is only valid for use in the mangle table.
--mask
- Example:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j CONNMARK --restore-mark --mask 12
- Explanation: The --mask option must be used in unison with the
--save-mark and --restore-mark options. The --mask option
specifies an and-mask that should be applied to the mark values
that the other two options will give. For example, if the restored
mark from the above example would be 15, it would mean that the
mark was 1111 in binary, while the mask is 1100. 1111 and 1100
equals 1100.
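The and-mask arithmetic from the explanation can be checked directly with shell arithmetic. The helper name and_mask is hypothetical and only illustrates the bitwise AND; it does not call iptables.

```shell
#!/bin/sh
# and_mask MARK MASK: print the bits of MARK that survive the and-mask.
and_mask() {
    echo $(( $1 & $2 ))
}

and_mask 15 12   # 1111 AND 1100 = 1100, prints 12
and_mask 4 12    # 0100 AND 1100 = 0100, prints 4
and_mask 3 12    # 0011 AND 1100 = 0000, prints 0
```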
CONNSECMARK target
The CONNSECMARK target sets a SELinux security context mark to or from
a packet mark. The target is only valid in the mangle table and is
used together with the SECMARK target, where the SECMARK target is
used to set the original mark, and then the CONNSECMARK is used to set
the mark on the whole connection.
SELinux is beyond the scope of this page, but basically it is an
addition of MAC (Mandatory Access Control) to Linux. This is more
fine-grained than the traditional security controls of most Linux and
Unix systems.
Each object can have security attributes, or a security context,
connected to it, and these attributes are then matched to each other
before allowing or denying a specific task to be performed. This
target will allow a security context to be set on a connection.
--save
- Example:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j CONNSECMARK --save
- Explanation: Save the security context mark from the packet to the
connection if the connection has not already been marked.
--restore
- Example:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j CONNSECMARK --restore
- Explanation: If the packet has no security context mark set on it,
the --restore option will set the security context mark associated
with the connection on the packet.
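Taken together with the SECMARK target, the two options are typically used as label, save, restore. The sketch below assumes a kernel with SELinux and the SECMARK target with its --selctx option; the security context string is a hypothetical example and must exist in the loaded policy. The rules are printed rather than applied unless IPT is set to the real iptables binary.

```shell
#!/bin/sh
# Dry run by default: rules are printed, not applied.
IPT="${IPT:-echo iptables}"

# Hypothetical SELinux context, for illustration only.
CTX=system_u:object_r:httpd_packet_t:s0

secmark_rules() {
    # Label the first packet of each new connection to port 80.
    $IPT -t mangle -A INPUT -p tcp --dport 80 -m state --state NEW \
        -j SECMARK --selctx "$CTX"
    # Copy the packet's security mark to the connection.
    $IPT -t mangle -A INPUT -p tcp --dport 80 -m state --state NEW \
        -j CONNSECMARK --save
    # Later packets of the connection get the saved mark restored.
    $IPT -t mangle -A INPUT -m state --state ESTABLISHED,RELATED \
        -j CONNSECMARK --restore
}

secmark_rules
```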
DNAT target
The DNAT (Destination Network Address Translation) target is used to
do DNAT, which means that it is used to rewrite the Destination IP
address of a packet.
If a packet is matched, and this is the target of the rule, the
packet, and all subsequent packets in the same stream will be
translated, and then routed on to the correct device, machine or network.
This target can be extremely useful, for example, when we have a machine
running our web server inside a LAN, but no real IP to give it that
will work on the Internet. We could then tell the packet filter to
forward all packets going to its own HTTP port, on to the real web
server within the LAN.
We may also specify a whole range of destination IP addresses, and the
DNAT mechanism will choose the destination IP address at random for
each stream. Hence, we will be able to deal with a kind of load
balancing by doing this.
Note that the DNAT target is only available within the PREROUTING and
OUTPUT chains in the nat table, and any of the chains called upon from
any of those listed chains. Note that chains containing DNAT targets
may not be used from any other chains, such as the POSTROUTING chain.
--to-destination
- Example:
iptables -t nat -A PREROUTING -p tcp -d 15.45.23.67 --dport 80 -j DNAT --to-destination 192.168.1.1-192.168.1.10
- Explanation: The
--to-destination option tells the DNAT mechanism
which destination IP to set in the IP header, and where to send
packets that are matched. The above example would send on all
packets destined for IP address 15.45.23.67 to a range of LAN
IP's, namely 192.168.1.1 through 192.168.1.10 . Note, as described
previously, that a single stream will always use the same machine,
and that each stream will randomly be given an IP address that it
will always be destined for, within that stream. We could also
have specified only one IP address, in which case we would always
be connected to the same machine. Also note that we may add a port or
port range to which the traffic would be redirected to. This is
done by adding, for example, an :80 statement to the IP addresses
to which we want to DNAT the packets. A rule could then look like
--to-destination 192.168.1.1:80 for example, or like
--to-destination 192.168.1.1:80-100 if we wanted to specify a port
range. As we can see, the syntax is pretty much the same for the
DNAT target, as for the SNAT target even though they do two
totally different things. Do note that port specifications are
only valid for rules that specify the TCP or UDP protocols with
the --protocol option.
Since DNAT requires quite a lot of work to work properly, I have
decided to add a larger explanation on how to work with it. Let us
take a brief example on how things would be done normally. We want to
publish our website via our Internet connection. We only have one IP
address, and the HTTP server is located on our internal network.
Our packet filter has the external IP address $INET_IP , and our HTTP
server has the internal IP address $HTTP_IP and finally the packet
filter has the internal IP address $LAN_IP . The first thing to do is
to add the following simple rule to the PREROUTING chain in the nat
table: iptables -t nat -A PREROUTING --dst $INET_IP -p tcp --dport 80
-j DNAT --to-destination $HTTP_IP .
Now, all packets from the Internet going to port 80 on our packet
filter are redirected (or DNAT'ed) to our internal HTTP server. If we
test this from the Internet, everything should work perfectly.
So, what happens if we try connecting from a machine on the same local
network as the HTTP server? It will simply not work. This is a problem
with routing really. We start out by dissecting what happens in a
normal case. The external box has IP address $EXT_BOX , to maintain
readability.
- The IP packet leaves the connecting machine with destination
$INET_IP
and source $EXT_BOX .
- The IP packet reaches the packet filter.
- Packet Filter DNAT's the packet and runs the packet through all
different chains etc.
- Packet leaves the packet filter and travels to the
$HTTP_IP .
- Packet reaches the HTTP server, and the HTTP box replies back
through the packet filter, if that is the box that the routing
database has entered as the gateway for
$EXT_BOX . Normally, this
would be the default gateway of the HTTP server.
- Packet Filter Un-DNAT's the packet again, so the packet looks as
if it was replied to from the packet filter itself.
- Reply packet travels as usual back to the client
$EXT_BOX .
Now, we will consider what happens if the packet was instead generated
by a client on the same network as the HTTP server itself. The client
has the IP address $LAN_BOX , while the rest of the machines maintain
the same settings.
- The IP packet leaves
$LAN_BOX to $INET_IP .
- The packet reaches the packet filter.
- The packet gets DNAT'ed, and all other required actions are
taken, however, the packet is not SNAT'ed, so the same source IP
address is used on the packet.
- The packet leaves the packet filter and reaches the HTTP server.
- The HTTP server tries to respond to the packet, and sees in the
routing databases that the packet came from a local box on the
same network, and hence tries to send the packet directly to the
original source IP address (which now becomes the destination IP
address).
- The packet reaches the client, and the client gets confused
since the return packet does not come from the machine that it
sent the original request to. Hence, the client drops the reply
packet, and waits for the real reply.
The simple solution to this problem is to SNAT all packets entering
the packet filter and leaving for a machine or IP that we know we
DNAT.
For example, consider the above rule. We SNAT the packets entering our
packet filter that are destined for $HTTP_IP port 80 so that they look as
if they came from $LAN_IP . This will force the HTTP server to send the
packets back to our packet filter, which Un-DNAT's the packets and sends
them on to the client. The rule would look something like this:
iptables -t nat -A POSTROUTING -p tcp --dst $HTTP_IP --dport 80 -j
SNAT --to-source $LAN_IP .
Remember that the POSTROUTING chain is processed last of the chains,
and hence the packet will already be DNAT'ed once it reaches that
specific chain. This is the reason that we match the packets based on
the internal address.
This last rule will seriously harm our logging, so it is really
advisable not to use this method, but the whole example is still a
valid one.
What will happen is this: the IP packet comes from the Internet, gets
SNAT'ed and DNAT'ed, and finally hits the HTTP server (for example).
The HTTP server now only sees the request as if it was coming from the
packet filter, and hence logs all requests from the internet as if
they came from the packet filter.
This can also have even more severe implications. Take an SMTP (Simple
Mail Transfer Protocol) server on the LAN, that allows requests from
the internal network, and we have our packet filter set up to forward
SMTP traffic to it. We have now effectively created an open relay SMTP
server, with horrendously bad logging.
One solution to this problem is to make the SNAT rule more specific in
its match part, so that it only works on packets that come in from our
LAN, i.e. add --src $LAN_IP_RANGE to the command as well. The rule then
only affects streams that come in from the LAN and leaves the source IP
of streams from the Internet untouched, so the logs will look correct,
except for streams coming from our LAN.
We will be better off solving these problems either by setting up a
separate DNS server for our LAN, or by setting up a separate DMZ
(Demilitarized Zone), the latter being preferred if we have the money.
There is one final aspect to this whole scenario. What if the packet
filter itself tries to access the HTTP server, where will it go? As it
looks now, it will unfortunately try to get to its own HTTP server,
and not the server residing on $HTTP_IP . To get around this, we need
to add a DNAT rule in the OUTPUT chain as well. Following the above
example, this should look something like the following: iptables -t
nat -A OUTPUT --dst $INET_IP -p tcp --dport 80 -j DNAT
--to-destination $HTTP_IP .
Adding this final rule should get everything up and running. All
separate networks that do not sit on the same net as the HTTP server
will run smoothly, all machines on the same network as the HTTP server
will be able to connect and finally, the packet filter will be able to
do proper connections as well. Now everything works and no problems
should arise.
Everyone should realize that these rules only affect how the packet is
DNAT'ed and SNAT'ed properly. In addition to these rules, we may also
need extra rules in the filter table (FORWARD chain) to allow the
packets to traverse through those chains as well.
Do not forget that all packets have already gone through the
PREROUTING chain, and should hence have their destination addresses
rewritten already by DNAT.
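The rules from this walkthrough can be collected into one script. The following is a minimal dry-run sketch that only prints the commands instead of applying them. The addresses, the LAN range and the final FORWARD rule are placeholder assumptions for illustration, not values from a real setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the NAT rules from the walkthrough, applies nothing.
# All values below are placeholder assumptions.
INET_IP="203.0.113.10"         # external address of the packet filter
LAN_IP="192.168.1.1"           # internal address of the packet filter
HTTP_IP="192.168.1.80"         # internal HTTP server
LAN_IP_RANGE="192.168.1.0/24"  # LAN network

dnat_rules() {
    # 1. Requests arriving at the packet filter: rewrite the destination.
    echo "iptables -t nat -A PREROUTING --dst $INET_IP -p tcp --dport 80 -j DNAT --to-destination $HTTP_IP"
    # 2. LAN clients only: rewrite the source so replies return via the filter.
    echo "iptables -t nat -A POSTROUTING --src $LAN_IP_RANGE -p tcp --dst $HTTP_IP --dport 80 -j SNAT --to-source $LAN_IP"
    # 3. Connections opened by the packet filter itself.
    echo "iptables -t nat -A OUTPUT --dst $INET_IP -p tcp --dport 80 -j DNAT --to-destination $HTTP_IP"
    # 4. Allow the translated packets through the filter table.
    echo "iptables -A FORWARD -p tcp --dst $HTTP_IP --dport 80 -j ACCEPT"
}

dnat_rules
```

Once the printed rules look right, piping the output to sh as root would apply them.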
DROP target
The DROP target does just what it says, it drops packets dead and will
not carry out any further processing.
A packet that matches a rule perfectly and is then dropped will be
blocked. Note that this action might in certain cases have an unwanted
effect, since it could leave dead sockets around on either machine.
A better solution in cases where this is likely would be to use the
REJECT target, especially when we want to block port scanners from
getting too much information, such as on filtered ports and so on.
Also note that if a packet has the DROP action taken on it in a
subchain, the packet will not be processed in any of the main chains
either in the present or in any other table i.e. the packet simply
vanished.
As we have seen previously, the target will not send any kind of
information in either direction, nor to intermediaries such as
routers.
DSCP target
This is a target (there is also the DSCP match) that changes the DSCP
(Differentiated Services Code Point) marks inside a packet.
The DSCP target is able to set any DSCP value in the IP header of a
packet, which is a way of telling routers the priority of the packet in
question. For more information about DSCP, see RFC 2474.
Basically, DSCP is a way of differentiating different services into
separate categories, and based on this, give them different priority
through the routers. This way, we can give interactive TCP sessions
(such as XMPP, telnet, SSH, POP3) a very fast, low-latency connection
that may not be very suitable for large bulk transfers.
If on the other hand the connection is one of low importance (SMTP, or
whatever we classify as low priority), we could send it over a large
bulky network with worse latency than the other network, that is
cheaper to utilize than the faster and lower latency connections.
--set-dscp
- Example:
iptables -t mangle -A FORWARD -p tcp --dport 80 -j DSCP --set-dscp 1
- Explanation: This sets the DSCP value to the specified value. The
values can be set either via class (see below) or with the
--set-dscp option, which takes either an integer value or a hex value.
--set-dscp-class
- Example:
iptables -t mangle -A FORWARD -p tcp --dport 80 -j DSCP --set-dscp-class EF
- Explanation: This sets the DSCP field according to a predefined
DiffServ class. Some of the possible values are
EF , BE and the
CSxx and AFxx values available. Do note that the --set-dscp-class
and --set-dscp options are mutually exclusive, which means we
cannot use both of them in the same command!
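To see how the class names relate to the numeric values accepted by --set-dscp, here is a small sketch that converts a DiffServ class name into its standard code point (CSx = 8x, AFxy = 8x + 2y, EF = 46, BE = 0). The function name dscp_value is made up for this illustration.

```shell
#!/bin/sh
# Translate a DiffServ class name into its numeric DSCP value.
# dscp_value is a hypothetical helper, not part of iptables.
dscp_value() {
    case "$1" in
        BE) echo 0 ;;
        EF) echo 46 ;;
        CS[0-7])
            x=${1#CS}                    # class selector number
            echo $((8 * $x)) ;;
        AF[1-4][1-3])
            xy=${1#AF}
            x=${xy%?}                    # class digit
            y=${xy#?}                    # drop precedence digit
            echo $((8 * $x + 2 * $y)) ;;
        *)
            echo "unknown class: $1" >&2
            return 1 ;;
    esac
}

for class in BE CS1 AF11 AF41 EF; do
    printf '%s = %s\n' "$class" "$(dscp_value "$class")"
done
```

With these values, --set-dscp-class EF and --set-dscp 46 should describe the same marking.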
ECN target
Simply put, the ECN (Explicit Congestion Notification) target can be
used to reset the ECN bits in the TCP header, or to put it
correctly, reset them to 0 at least (there is also the ECN match).
Since ECN is a relatively new thing on the net, there are problems
with it. For example, it uses 2 bits that are defined in the original
RFC for the TCP protocol to be 0. Some routers and other Internet
appliances will not forward packets that have these bits set to 1.
If we want to make use of at least parts of the ECN functionality from
our machines, we could for example reset the ECN bits to 0 for specific
networks that we know we are having troubles reaching because of ECN.
Please do note that it is not possible to turn ECN on in the middle of
a stream. It is not allowed according to the RFCs, and it is not
possible anyway. Both endpoints of the stream must negotiate ECN. If
we turn it on, then one of the machines is not aware of it, and cannot
respond properly to the ECN notifications.
--ecn-tcp-remove
- Example:
iptables -t mangle -A FORWARD -p tcp --dport 80 -j ECN --ecn-tcp-remove
- Explanation: The ECN target only takes one argument, the
--ecn-tcp-remove argument. This tells the target to remove the ECN
bits inside the TCP headers.
LOG target options
The LOG target is specially designed for logging detailed information
about packets.
These could, for example, be packets that we consider illegal. Or, logging can be
used purely for bug hunting and error finding. The LOG target will
return specific information on packets, such as most of the IP headers
and other information considered interesting. It does this via the
kernel logging facility, normally syslogd .
This information may then be read directly with dmesg , or from the
syslogd logs, or with other programs or applications. This is an
excellent target to use to debug our rule-sets, so that we can see
what packets go where and what rules are applied on what packets.
Note as well that it could be a really great idea to use the LOG
target instead of the DROP target while we are testing a rule we are
not 100% sure about on a production packet filter, since a syntax
error in the rule-sets could otherwise cause severe connectivity
problems for our users.
Also note that the ULOG target may be interesting if we are using
really extensive logging, since the ULOG target has support for direct
logging to MySQL databases and suchlike.
Note that if we get undesired logging direct to consoles, this is not
an iptables or netfilter problem, but rather a problem caused by our
syslogd configuration in /etc/syslog.conf .
We may also need to tweak our dmesg settings. dmesg is the command
that controls which kernel messages are shown on the
console. dmesg -n 1 should prevent all messages from showing up on the
console, except panic messages. The dmesg message levels match
exactly the syslogd levels, and it only works on log messages from the
kernel facility.
The LOG target currently takes five options that could be of interest
if we have specific information needs, or want to set different
options to specific values:
--log-level
- Example:
iptables -A FORWARD -p tcp -j LOG --log-level debug
- Explanation: This is the option to tell netfilter and syslog which
log level to use. For a complete list of log levels take a look at
man 5 syslog.conf . Normally there are the following log levels, or
priorities as they are normally referred to: debug , info , notice ,
warning , warn , err , error , crit , alert , emerg and panic . The
keyword error is the same as err , warn is the same as warning and
panic is the same as emerg . Note that all three of these are
deprecated i.e. we should not use error , warn and panic . The
priority defines the severity of the message being logged. All
messages are logged through the kernel facility i.e. setting
kern.=info /var/log/iptables in /etc/syslog.conf and
then letting all our LOG messages in iptables use log level info,
would make all messages appear in the /var/log/iptables file. Note
that there may be other messages here as well from other parts of
the kernel that use the info priority.
--log-prefix
- Example:
iptables -A INPUT -p tcp -j LOG --log-prefix "INPUT packets"
- Explanation: This option tells iptables to prefix all log messages
with a specific prefix, which can then easily be combined with
grep or other tools to track specific problems and output from
different rules. The prefix may be up to 29 letters long,
including white-spaces and other special symbols.
--log-tcp-sequence
- Example:
iptables -A INPUT -p tcp -j LOG --log-tcp-sequence
- Explanation: This option will log the TCP Sequence numbers,
together with the log message. The TCP Sequence numbers are
special numbers that identify each packet and where it fits into a
TCP sequence, as well as how the stream should be reassembled.
Note that this option constitutes a security risk if the logs are
readable by unauthorized users.
--log-tcp-options
- Example:
iptables -A FORWARD -p tcp -j LOG --log-tcp-options
- Explanation: The
--log-tcp-options option logs the different
options from the TCP packet headers and can be valuable when
trying to debug what could go wrong, or what has actually gone
wrong. This option does not take any variable fields or anything
like that, just as most of the LOG options do not.
--log-ip-options
- Example:
iptables -A FORWARD -p tcp -j LOG --log-ip-options
- Explanation: The
--log-ip-options option will log most of the IP
packet header options. This works exactly the same as the
--log-tcp-options option, but instead works on the IP options.
These logging messages may be valuable when trying to track
specific culprits, as well as for debugging, in just the
same way as the previous option.
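Since the --log-prefix option is limited to 29 characters, it can help to check the prefix before a rule is generated. The following is a minimal dry-run sketch under that assumption; the helper name log_rule and the choice of log level info are only illustrative.

```shell
#!/bin/sh
# Dry-run sketch: refuse prefixes longer than 29 characters,
# otherwise print (not apply) a LOG rule. log_rule is a hypothetical helper.
log_rule() {
    chain=$1
    prefix=$2
    if [ "${#prefix}" -gt 29 ]; then
        echo "prefix too long (${#prefix} > 29): $prefix" >&2
        return 1
    fi
    printf 'iptables -A %s -p tcp -j LOG --log-level info --log-prefix "%s"\n' \
        "$chain" "$prefix"
}

log_rule INPUT "INPUT packets: "
```

Ending the prefix with a separator such as a colon and a space keeps it apart from the rest of the log line, which makes later filtering with grep easier.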
MARK target
The MARK target, next to the mark match, is used to set netfilter mark
values that are associated with specific packets.
This target is only valid in the mangle table and will not work with
any other table. The MARK values may be used in conjunction with the
advanced routing capabilities in Linux to send different packets
through different routes and to tell them to use different queue
disciplines (qdisc), etc.
Note that the mark value is not set within the actual packet, but is a
value that is associated within the kernel with the packet. In other
words, we cannot set a MARK for a packet and then expect the MARK
still to be there on another machine. If this is what we want, we will be
better off with the TOS (Type of Service) target which will mangle
the TOS value in the IP header.
--set-mark
- Example:
iptables -t mangle -A PREROUTING -p tcp --dport 22 -j MARK --set-mark 2
- Explanation: The
--set-mark option is required to set a mark. The
--set-mark option takes an integer value. For example, we may set
mark 2 on a specific stream of packets, or on all packets from a
specific machine and then do advanced routing on that machine, to
decrease or increase the network bandwidth, etc.
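A mark only becomes useful once something else acts on it. As a rough sketch, the mark from the example above could be combined with the iproute2 policy routing commands as follows. The routing table number, gateway address and interface name are assumptions chosen for illustration, and the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: mark SSH packets, then route marked packets via a
# separate routing table. TABLE, GATEWAY and DEV are assumed values.
MARK=2
TABLE=200
GATEWAY="192.0.2.1"
DEV="eth1"

mark_routing() {
    # Set the netfilter mark on matching packets.
    echo "iptables -t mangle -A PREROUTING -p tcp --dport 22 -j MARK --set-mark $MARK"
    # Send packets carrying that mark to the extra routing table.
    echo "ip rule add fwmark $MARK table $TABLE"
    # Give that table its own default route.
    echo "ip route add default via $GATEWAY dev $DEV table $TABLE"
}

mark_routing
```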
MASQUERADE target
The MASQUERADE target is used basically the same as the SNAT target,
but it does not require any --to-source option. The reason for this is
that the MASQUERADE target was made to work with, for example, dial-up
connections, or DHCP (Dynamic Host Configuration Protocol)
connections, which get dynamic IP addresses when connecting to the
network in question.
This means that we should only use the MASQUERADE target with
dynamically assigned IP connections, which we do not know the actual
address of at all times. If we have a static IP connection, we should
instead use the SNAT target.
When we masquerade a connection, the source IP address is not taken
from a --to-source option. Instead, it is automatically grabbed from
the information about the specific network interface that the packet
leaves through.
The MASQUERADE target also has the effect that connections are
forgotten when an interface goes down, which is extremely good if we,
for example, kill a specific interface.
If we had used the SNAT target, we might have been left with a
lot of old connection tracking data, which would be lying around for
days, swallowing up useful connection tracking memory. This is, in
general, the correct behavior when dealing with dial-up lines that are
probably assigned a different IP every time they are brought up. In
case we are assigned a different IP, the connection is lost anyway,
and it is unnecessary to keep the entry around.
It is still possible to use the MASQUERADE target instead of SNAT even
though we do have a static IP, however, it is not favorable since it
will add extra overhead, and there may be inconsistencies in the
future which will thwart our existing scripts and render them
unusable.
Note that the MASQUERADE target is only valid within the POSTROUTING
chain in the nat table, just as the SNAT target. The MASQUERADE target
takes one option specified below, which is optional.
--to-ports
- Example:
iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE --to-ports 1024-31000
- Explanation: The
--to-ports option is used to set the source port
or ports to use on outgoing packets. Either we can specify a
single port like --to-ports 1025 or we may specify a port range as
--to-ports 1024-3000 . In other words, the lower port range
delimiter and the upper port range delimiter separated with a
hyphen. This alters the default SNAT port-selection as described
in the SNAT target section. The --to-ports option is only valid if
the rule match section specifies the TCP or UDP protocols with the
--protocol match.
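The choice between MASQUERADE and SNAT described in this section can be sketched as a small helper that prints one rule or the other, depending on whether a static address is known. The helper name nat_rule and the interface names are assumptions made for this example.

```shell
#!/bin/sh
# Dry-run sketch: print a SNAT rule when a static address is given,
# otherwise fall back to MASQUERADE. nat_rule is a hypothetical helper.
nat_rule() {
    iface=$1
    static_ip=$2   # leave empty for dynamically assigned addresses
    if [ -n "$static_ip" ]; then
        echo "iptables -t nat -A POSTROUTING -o $iface -j SNAT --to-source $static_ip"
    else
        echo "iptables -t nat -A POSTROUTING -o $iface -j MASQUERADE"
    fi
}

nat_rule ppp0 ""                 # dial-up link, address unknown in advance
nat_rule eth0 "194.236.50.155"   # static address
```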
MIRROR target
Be warned, the MIRROR target is dangerous and was only developed as
example code for the new conntrack and NAT code. It can cause dangerous
things to happen, and very serious DoS (Denial of Service) attacks
will be possible if used improperly. It was removed from 2.5 and
2.6 kernels due to its bad security implications!
The MIRROR target is an experimental and demonstration target only,
and we are warned against using it, since it may result in really bad
loops hence, among other things, resulting in serious DoS.
The MIRROR target is used to invert the source and destination fields
in the IP header, and then to retransmit the packet. This can cause
some really funny effects, and I will bet that, thanks to this target,
not just one red-faced cracker has cracked his own box by now.
The effect of using this target is stark, to say the least. Let's say
we set up a MIRROR target for port 80 at computer A. If machine B were to
come from yahoo.com, and try to access the HTTP server at machine A, the
MIRROR target would return the yahoo machine's own web page (since this
is where the request came from).
Note that the MIRROR target is only valid within the INPUT , FORWARD
and PREROUTING chains, and any user-specified chains which are called
from those chains.
Also note that outgoing packets resulting from the MIRROR target are
not seen by any of the normal chains in the filter , nat or mangle
tables, which could give rise to loops and other problems. This could
make the target the cause of unforeseen headaches.
For example, a machine might send a spoofed packet with a TTL (Time
to Live) of 255 to another machine that uses the MIRROR target,
spoofing the source address so that the packet seems to come from a
third machine that also uses the MIRROR target. The packet will then
bounce back and forth incessantly, for the number of hops there are to
be completed.
If there is only 1 hop, the packet will jump back and forth 240-255
times. Not bad for a cracker, in other words, to send 1500 bytes of
data and eat up 380 kbyte of our connection. Note that this is a best
case scenario for the cracker or script kiddie, whatever we want to
call them.
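The figure of roughly 380 kbyte follows from simple multiplication, as this short sketch shows. The function name mirror_traffic is made up for the illustration.

```shell
#!/bin/sh
# Traffic caused by one packet bouncing between two MIRROR hosts.
# mirror_traffic is a hypothetical helper: packet size times bounces.
mirror_traffic() {
    echo $(($1 * $2))
}

low=$(mirror_traffic 1500 240)    # 240 bounces
high=$(mirror_traffic 1500 255)   # 255 bounces
echo "between $low and $high bytes for a single 1500 byte packet"
```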
NETMAP target
NETMAP is a new implementation of the SNAT and DNAT targets where the
host part of the IP address is not changed. It provides a 1:1 NAT
function for whole networks which is not available in the standard
SNAT and DNAT functions.
For example, let's say we have a network containing 254 machines using
private IP addresses (a /24 network), and we just got a new /24
network of public IPs. Instead of walking around and changing the IP
of each and every one of the machines, we would be able to simply use
the NETMAP target like -j NETMAP --to 10.5.6.0/24 and voilà, all the
machines are seen as 10.5.6.x when they leave the packet filter. For
example, 192.168.1.26 would become 10.5.6.26 .
--to
- Example:
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j NETMAP --to 10.5.6.0/24
- Explanation: This is the only option of the NETMAP target. In the
above example, the
192.168.1.x machines will be directly translated
into 10.5.6.x .
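The 1:1 translation itself is easy to picture: the network part is swapped and the host part is kept. The following sketch mimics this for the /24 case only. The helper name netmap24 is an assumption for this example and has nothing to do with how the kernel implements the target.

```shell
#!/bin/sh
# Map an address into another /24 while keeping the host octet.
# netmap24 is a hypothetical helper and only handles /24 networks.
netmap24() {
    addr=$1                  # e.g. 192.168.1.26
    new_net=$2               # e.g. 10.5.6.0/24
    host=${addr##*.}         # last octet, the host part of a /24
    prefix=${new_net%/*}     # strip the mask: 10.5.6.0
    prefix=${prefix%.*}      # strip the last octet: 10.5.6
    echo "$prefix.$host"
}

netmap24 192.168.1.26 10.5.6.0/24
```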
NFQUEUE target
The NFQUEUE target is used much the same way as the QUEUE target, and
is basically an extension of it. The NFQUEUE target allows for sending
packets to separate and specific queues. The queue is identified by a
16-bit ID. This target requires the nfnetlink_queue kernel support to
run.
--queue-num
- Example:
iptables -t nat -A PREROUTING -p tcp --dport 80 -j NFQUEUE --queue-num 30
- Explanation: The
--queue-num option specifies which queue to use
and to send the queued data to. If this option is skipped, the
default queue 0 is used. The queue number is a 16 bit unsigned
integer, which means it can take any value between 0 and 65535.
The default 0 queue is also used by the QUEUE target.
NOTRACK target
This target is used to turn off connection tracking for all packets
matching this rule. The target is only valid inside the raw table.
The target takes no options and is very easy to use. Match the packets
we wish to not track, and then set the NOTRACK target on the rules
matching the packets we do not wish to track.
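As an illustration, a busy DNS server might exclude its own query traffic from connection tracking. This is a dry-run sketch under that assumption; the port and the choice of UDP are examples only, and the commands are printed rather than applied.

```shell
#!/bin/sh
# Dry-run sketch: skip connection tracking for DNS traffic of a local
# DNS server (an assumed scenario). The raw table has PREROUTING and OUTPUT.
notrack_rules() {
    # Incoming queries.
    echo "iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK"
    # Locally generated answers.
    echo "iptables -t raw -A OUTPUT -p udp --sport 53 -j NOTRACK"
}

notrack_rules
```

Untracked packets cannot be matched with the usual NEW or ESTABLISHED states, so the filter table would need explicit rules for this traffic.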
QUEUE target
The QUEUE target is used to queue packets to userspace programs and
applications. It is used in conjunction with programs or utilities
that are extraneous to netfilter and may be used, for example, with
network accounting, or for specific and advanced applications which
proxy or filter packets.
As of kernel 2.6.14 the behavior of netfilter has changed. A new
system for talking to the QUEUE has been devised, called the
nfnetlink_queue . The QUEUE target is basically a pointer to the
NFQUEUE 0 nowadays. For programming questions take a look at the
nfnetlink_queue.ko module.
REDIRECT target
The REDIRECT target is used to redirect packets and streams to the
machine itself.
This means that we could for example REDIRECT all packets destined for
the HTTP ports to an HTTP proxy like squid, on our own machine. Locally
generated packets are mapped to the 127.0.0.1 address.
In other words, this rewrites the destination address to our own machine
for packets that are forwarded, or something alike. The REDIRECT
target is extremely good to use when we want, for example, transparent
proxying, where the LAN machines do not know about the proxy at all.
Note that the REDIRECT target is only valid within the PREROUTING and
OUTPUT chains of the nat table. It is also valid within
user-specified chains that are only called from those chains, and
nowhere else. The REDIRECT target takes only one option, as described
below:
--to-ports
- Example:
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
- Explanation: The
--to-ports option specifies the destination port,
or port range, to use. Without the --to-ports option, the
destination port is never altered. This is specified, as above,
--to-ports 8080 in case we only want to specify one port. If we
would want to specify a port range, we would do it like --to-ports
8080-8090 , which tells the REDIRECT target to redirect the packets
to the ports 8080 through 8090. Note that this option is only
available in rules specifying the TCP or UDP protocol with the
--protocol matcher, since it would not make any sense anywhere
else.
REJECT target
The REJECT target works basically the same as the DROP target, but it
also sends back an error message to the machine sending the packet that
was blocked.
The REJECT target is as of today only valid in the INPUT , FORWARD and
OUTPUT chains or their subchains. After all, these would be the only
chains in which it would make any sense to put this target. Note that
all chains that use the REJECT target may only be called by the INPUT ,
FORWARD , and OUTPUT chains, else they will not work.
There is currently only one option which controls the nature of how
this target works, though this may in turn take a huge set of
variables. Most of them are fairly easy to understand, if we have a
basic knowledge of TCP/IP:
--reject-with
- Example:
iptables -A FORWARD -p TCP --dport 22 -j REJECT --reject-with tcp-reset
- Explanation: This option tells the
REJECT target what response to
send to the machine that sent the packet that we are rejecting. Once
we get a packet that matches a rule in which we have specified
this target, our machine will first of all send the associated reply,
and the packet will then be dropped dead, just as the DROP target
would drop it. The following reject types are currently valid:
icmp-net-unreachable , icmp-host-unreachable ,
icmp-port-unreachable , icmp-proto-unreachable , icmp-net-prohibited
and icmp-host-prohibited . The default error message is to send an
icmp-port-unreachable to the machine. All of the above are ICMP error
messages and may be set as we wish. Finally, there is one more
option called tcp-reset , which may only be used together with the
TCP protocol. The tcp-reset option will tell REJECT to send a TCP
RST packet in reply to the sending machine. TCP RST packets are used
to abort open TCP connections immediately. As stated in the
iptables man page, this is mainly useful for blocking ident probes
which frequently occur when sending mail to broken mail hosts,
that will not otherwise accept our mail.
RETURN target
The RETURN target will cause the current packet to stop traveling
through the chain where it hit the rule.
If it is the subchain of another chain, the packet will continue to
travel through the superior chains as if nothing had happened. If the
chain is the main chain, for example the INPUT chain, the packet will
have the default policy taken on it. The default policy is normally
set to ACCEPT , DROP or similar.
For example, let us say a packet enters the INPUT chain and then hits
a rule that it matches and that tells it to --jump EXAMPLE_CHAIN . The
packet will then start traversing the EXAMPLE_CHAIN , and all of a
sudden it matches a specific rule which has the --jump RETURN target
set.
It will then jump back to the INPUT chain. Another example would be if
the packet hits a --jump RETURN rule in the INPUT chain. It would then
be dropped to the default policy as previously described, and no more
actions would be taken in this chain.
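The first case can be sketched as a small rule set. This is a dry-run example that only prints the commands. The trusted network and the rules around the jump are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch of a subchain that uses RETURN.
# TRUSTED_NET and the surrounding rules are assumed values.
TRUSTED_NET="192.168.1.0/24"

return_demo() {
    echo "iptables -N EXAMPLE_CHAIN"
    # Every TCP packet in INPUT visits the subchain first.
    echo "iptables -A INPUT -p tcp --jump EXAMPLE_CHAIN"
    # Trusted sources leave the subchain at once and continue in INPUT.
    echo "iptables -A EXAMPLE_CHAIN -s $TRUSTED_NET --jump RETURN"
    # Everything else is logged before the subchain ends.
    echo "iptables -A EXAMPLE_CHAIN --jump LOG --log-prefix \"EXAMPLE_CHAIN: \""
    # Traversal of INPUT resumes here, after the jump rule.
    echo "iptables -A INPUT -p tcp --dport 22 --jump ACCEPT"
}

return_demo
```

Packets from the trusted network skip the LOG rule. All other TCP packets are logged and then also fall back into INPUT, because LOG does not end the traversal of the chain.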
SAME target
The SAME target works almost in the same fashion as the SNAT target,
but it still differs. Basically, the SAME target will try to always
use the same outgoing IP address for all connections initiated by a
single machine on our network.
For example, say we have one 192.168.1.0/24 network and 3 IP addresses
10.5.6.7-9 . Now, if 192.168.1.20 went out through the 10.5.6.7 address the
first time, the packet filter will try to keep that machine always going
out through that IP address.
--to
- Example:
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j SAME --to 10.5.6.7-10.5.6.9
- Explanation: As we can see, the
--to argument takes 2 IP addresses
bound together by a - sign. These IP addresses, and all in
between, are the IP addresses that we NAT to using the SAME
algorithm.
--nodst
- Example:
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j SAME --to 10.5.6.7-10.5.6.9 --nodst
- Explanation: Under normal action, the
SAME target is calculating
the followup connections based on both destination and source IP
addresses. Using the --nodst option, it uses only the source IP
address to find out which outgoing IP the NAT function should use
for the specific connection. Without this argument, it uses a
combination of the destination and source IP address.
SECMARK target
The SECMARK target is used to set a security context mark on a single
packet, as defined by SELinux and security systems. The SECMARK target
is only valid in the mangle table.
In brief, SELinux is a new and improved security system to add MAC
(Mandatory Access Control) to Linux, implemented by the NSA as a proof
of concept. SELinux basically sets security attributes for different
objects and then matches them into security contexts. The SECMARK
target is used to set a security context on a packet which can then be
used within the security subsystems to match on.
--selctx
- Example:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j SECMARK --selctx httpcontext
- Explanation: The
--selctx option is used to specify which security
context to set on a packet. The context can then be used for
matching inside the security systems of Linux.
SNAT target
The SNAT (Source Network Address Translation) target is used to do
SNAT, which means that this target will rewrite the source IP address
in the IP header of the IP packet.
This is what we want, for example, when several machines have to share an
Internet connection. We can then turn on ip forwarding in the kernel,
and write an SNAT rule which will translate all packets going out from
our local network to the source IP of our own Internet connection.
Without doing this, the outside world would not know where to send
reply packets, since our local networks mostly use the IANA (Internet
Assigned Numbers Authority) specified IP addresses which are allocated
for LAN networks.
If we forwarded these packets as is, no one on the Internet would know
that they were actually from us. The SNAT target does all the
translation needed to do this kind of work, letting all packets
leaving our LAN look as if they came from a single machine, which would
be our packet filter.
The SNAT target is only valid within the nat table, within the
POSTROUTING chain i.e. this is the only chain in which we may use
SNAT.
Only the first packet of a connection is matched against the SNAT
rule. The translation chosen for it is then applied automatically to
all future packets of the same connection, i.e. the initial rules in
the POSTROUTING chain decide for all the packets in the same stream.
--to-source
- Example:
iptables -t nat -A POSTROUTING -p tcp -o eth0 -j SNAT --to-source 194.236.50.155-194.236.50.160:1024-32000
- Explanation: The
--to-source option is used to specify which
source the IP packet should use. This option, at its simplest,
takes one IP address which we want to use for the source IP
address in the IP header. If we want to balance between several IP
addresses, we can use a range of IP addresses, separated by a
hyphen. The --to-source IP numbers could then, for instance, be
something like in the above example:
194.236.50.155-194.236.50.160 . The source IP for each stream that
we open would then be allocated randomly from these, and a single
stream would always use the same IP address for all packets within
that stream. We can also specify a range of ports to be used by
SNAT. All the source ports would then be confined to the ports
specified. The port bit of the rule would then look like in the
example above, :1024-32000 . This is only valid if -p tcp or -p udp
was specified somewhere in the match of the rule in question.
netfilter will always try to avoid making any port alterations if
possible, but if two machines try to use the same ports, then
netfilter will map one of them to another port. If no port range
is specified, then if they are needed, all source ports below 512
will be mapped to other ports below 512. Those between source
ports 512 and 1023 will be mapped to ports below 1024. All other
ports will be mapped to 1024 or above. As previously stated,
iptables will always try to maintain the source ports used by the
actual workstation making the connection. Note that this has
nothing to do with destination ports, so if a client tries to make
contact with an HTTP server outside the packet filter, it will not be
mapped to the FTP control port.
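The three port classes described above can be sketched as a small shell
function. This is only an illustration of the description in the text;
snat_port_class is a hypothetical helper and not part of netfilter or
iptables.

```sh
# Hypothetical helper: report into which class a source port falls when
# netfilter has to remap it and no port range was given to --to-source.
snat_port_class() {
    port=$1
    if [ "$port" -lt 512 ]; then
        echo "below 512"
    elif [ "$port" -lt 1024 ]; then
        echo "below 1024"
    else
        echo "1024 or above"
    fi
}

snat_port_class 80      # prints: below 512
snat_port_class 600     # prints: below 1024
snat_port_class 40000   # prints: 1024 or above
```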
TCPMSS target
The TCPMSS target (there is also the match) can be used to alter the
MSS (Maximum Segment Size) value of TCP SYN packets that the packet filter
sees.
The MSS value is used to control the maximum size of packets for
specific connections. Under normal circumstances, this means the size
of the MTU (Maximum Transmission Unit) value, minus 40 bytes. The
target is used to work around ISPs and servers that block ICMP
Fragmentation Needed packets, which can result in really weird
problems: everything works perfectly from our packet filter/router,
but our local machines behind the packet filter cannot exchange large
packets.
This could mean such things as mail servers being able to send small
mails, but not large ones, web browsers that connect but then hang
with no data received, and SSH connecting properly, but SCP hangs
after the initial handshake. In other words, everything that uses any
large packets will be unable to work.
The TCPMSS target is able to solve these problems by changing the
MSS announced in the SYN packets of a connection, which limits the
size of the packets sent over it. Please note that we only need to
set the MSS on the SYN packet since the machines take care of the MSS
after that. The target takes two arguments:
--set-mss
- Example:
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o eth0 -j TCPMSS --set-mss 1460
- Explanation: The
--set-mss argument explicitly sets a specific MSS
value on all matching packets. In the example above, we set the
MSS of all SYN packets going out over the eth0 interface to 1460
bytes — normal MTU for ethernet is 1500 bytes, minus 40 bytes is
1460 bytes. MSS only has to be set properly in the SYN packet, and
then the peer machines take care of the MSS automatically.
--clamp-mss-to-pmtu
- Example:
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o eth0 -j TCPMSS --clamp-mss-to-pmtu
- Explanation: The
--clamp-mss-to-pmtu option automatically sets the MSS to
the proper value, hence we do not need to explicitly set it. It is
automatically set to PMTU (Path Maximum Transmission Unit) minus
40 bytes, which should be a reasonable value for most
applications.
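The arithmetic behind both options is the same: MTU (or PMTU) minus 40
bytes. A minimal sketch, where mss_for_mtu is a hypothetical helper and
the 1492 byte MTU is merely an assumed example of a link with a smaller
MTU than plain Ethernet (as is typical for PPPoE):

```sh
# MSS = MTU - 40 bytes, i.e. 20 bytes IPv4 header plus 20 bytes TCP
# header, both without options.
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

mss_for_mtu 1500   # Ethernet, prints 1460
mss_for_mtu 1492   # assumed PPPoE link, prints 1452
```

The result is the value we would pass to --set-mss if we did not want
to rely on --clamp-mss-to-pmtu.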
TOS target
The TOS (Type of Service) target is used to set the type of service
field within the IP header. Note that this target is only valid within
the mangle table.
The TOS field consists of 8 bits which are used to help in routing
packets. This is one of the fields that can be used directly within
iproute2 and its subsystem for routing policies. Worth noting, is that
if we handle several separate packet filters and routers, this is the
only way to propagate routing information within the actual packet
between these routers and packet filters.
As previously noted, the MARK target (which sets a MARK associated
with a specific packet) is only available within the kernel, and
cannot be propagated with the packet. If we feel a need to propagate
routing information for a specific packet or stream, we should
therefore set the TOS field, which was developed for this.
There are currently a lot of routers on the Internet which do a pretty
bad job at this, so as of now it may prove to be a bit useless to
attempt TOS mangling before sending the packets on to the Internet. At
best the routers will not pay any attention to the TOS field. At
worst, they will look at the TOS field and do the wrong thing.
However, as stated above, the TOS field can most definitely be put to
good use if we have a large WAN (Wide Area Network) or LAN (Local Area
Network) with multiple routers. We then in fact have the possibility
of giving packets different routes and preferences, based on their TOS
value — even though this might be confined to our own network.
The TOS target is only capable of setting specific, named values on
packets. These predefined TOS values can be found in the kernel
include files, or more precisely, the ../linux/ip.h file.
There are many reasons for this limitation, and we should actually
never need to set any other values. However, there is a way around
it: we can use the FTOS feature/patch available at the Paksecured
Linux Kernel patches site. We should be cautious with this patch
though, i.e. we should not need to use any other than the default
values, except in extreme cases.
The TOS target only takes one option as described below.
--set-tos
- Example:
iptables -t mangle -A PREROUTING -p TCP --dport 22 -j TOS --set-tos 0x10
- Explanation: The
--set-tos option tells the TOS mangler what TOS
value to set on packets that are matched. The option takes a
numeric value, either in hex or in decimal. As the TOS value
consists of 8 bits, the value may be 0-255 , or in hex 0x00-0xFF .
Note that in the standard TOS target we are limited to using the
named values available (which should be more or less
standardized), as mentioned above. These values
are Minimize-Delay (decimal value 16, hex value 0x10),
Maximize-Throughput (decimal value 8, hex value 0x08),
Maximize-Reliability (decimal value 4, hex value 0x04),
Minimize-Cost (decimal value 2, hex 0x02) or Normal-Service
(decimal value 0, hex value 0x00). The default value on most
packets is Normal-Service , or 0. Note that we can, of course, use
the actual names instead of the actual hex values to set the TOS
value — in fact, this is generally to be recommended, since the
values associated with the names may be changed in future. For a
complete listing of the descriptive values we can do an iptables
-j TOS -h .
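The named values listed above can be written down as a small lookup
table. The following is only a sketch; tos_value and tos_decimal are
hypothetical helper names, the values are the ones given in the text.

```sh
# Map a TOS name to its hex value as listed in the text.
tos_value() {
    case "$1" in
        Minimize-Delay)       echo 0x10 ;;
        Maximize-Throughput)  echo 0x08 ;;
        Maximize-Reliability) echo 0x04 ;;
        Minimize-Cost)        echo 0x02 ;;
        Normal-Service)       echo 0x00 ;;
        *) echo "unknown TOS name: $1" >&2; return 1 ;;
    esac
}

# Same value in decimal, e.g. 16 for Minimize-Delay.
tos_decimal() {
    hex=$(tos_value "$1") || return 1
    printf '%d\n' "$hex"
}

tos_value Minimize-Delay     # prints 0x10
tos_decimal Minimize-Delay   # prints 16
```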
TTL target
The TTL (Time to Live) target is used to modify the time to live field
in the IP header. The TTL target is only valid within the mangle
table.
One useful application of this is to change all TTL values to the same
value on all outgoing packets. One reason for doing this is if we have
a bully ISP which does not allow us to have more than one machine
connected to the same Internet connection, and who actively pursues
this.
Setting all TTL values to the same value will effectively make it a
little bit harder for them to notice that we are doing this. We may
then reset the TTL value for all outgoing packets to a standardized
value, such as 64, which is the default in the Linux kernel. The TTL
target takes three options as of this writing, all of them described
below:
--ttl-set
- Example:
iptables -t mangle -A PREROUTING -i eth0 -j TTL --ttl-set 64
- Explanation: The
--ttl-set option tells the TTL target which TTL
value to set on the packet in question. A good value would be
somewhere around 64. It is neither too long nor too short.
Do not set this value too high, since it may affect our network
and it is a bit immoral to set this value too high, since the
packet may start bouncing back and forth between two badly
configured routers, and the higher the TTL, the more bandwidth
will be eaten unnecessarily in such a case. This target could be
used to limit how far away our clients are. A good case of this
could be DNS servers, where we do not want the clients to be too
far away.
--ttl-dec
- Example:
iptables -t mangle -A PREROUTING -i eth0 -j TTL --ttl-dec 1
- Explanation: The
--ttl-dec option tells the TTL target to
decrement the time to live value by the amount specified after the
--ttl-dec option. In other words, if the TTL for an incoming
packet was 53 and we had set --ttl-dec 3 , the packet would leave
our machine with a TTL value of 49. The reason for this is that the
networking code will automatically decrement the TTL value by 1,
hence the packet will be decremented by 4 steps, from 53 to 49.
This could for example be used when we want to limit how far away
the people using our services are. For example, users should
always use a close-by DNS, and hence we could match all packets
leaving our DNS server and then decrease it by several steps. Of
course, the --ttl-set option may be a better idea for this usage.
--ttl-inc
- Example:
iptables -t mangle -A PREROUTING -i eth0 -j TTL --ttl-inc 1
- Explanation: The
--ttl-inc option tells the TTL target to
increment the time to live value by the amount specified after the
--ttl-inc option. In other words, if we
specified --ttl-inc 4 , a packet entering with a TTL of 53 would
leave the machine with TTL 56. Note that the same thing goes here as
for the previous example of the --ttl-dec option, i.e. the
network code will automatically decrement the TTL value by 1,
which it always does. This may be used to make our packet filter a bit
more stealthy to trace-routes among other things. By setting the
TTL one value higher for all incoming packets, we effectively make
the packet filter hidden from trace-routes. Trace-routes are a loved
and hated thing, since they provide excellent information on
problems with connections and where it happens, but at the same
time, it gives the hacker/cracker some good information about our
upstreams if they have targeted us.
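The TTL arithmetic of the --ttl-dec and --ttl-inc examples can be
checked with a few lines of shell. This is only a sketch of the
calculation described above; ttl_after_forward is a hypothetical helper
name.

```sh
# TTL with which a forwarded packet leaves our machine: the TTL target
# adds or subtracts the given amount, and the network code always
# decrements the TTL by 1 on top of that.
ttl_after_forward() {
    ttl_in=$1
    option=$2
    amount=$3
    case "$option" in
        --ttl-dec) ttl=$(( ttl_in - amount )) ;;
        --ttl-inc) ttl=$(( ttl_in + amount )) ;;
        *) echo "unknown option: $option" >&2; return 1 ;;
    esac
    echo $(( ttl - 1 ))
}

ttl_after_forward 53 --ttl-dec 3   # prints 49
ttl_after_forward 53 --ttl-inc 4   # prints 56
ttl_after_forward 53 --ttl-inc 1   # prints 53
```

The last call shows why --ttl-inc 1 hides the packet filter from
trace-routes: the packet leaves with the same TTL it arrived with.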
ULOG target
The ULOG target is used to provide userspace logging of matching
packets.
If a packet is matched and the ULOG target is set, the packet
information is multicast together with the whole packet through a
netlink socket. One or more userspace processes may then subscribe to
various multicast groups and receive the packet i.e. this is a more
complete and more sophisticated logging facility that is only used by
iptables and netfilter so far, and it contains much better facilities
for logging packets.
This target enables us to log information to MySQL databases, and
other databases, making it much simpler to search for specific
packets, and to group log entries. We can find the ULOGD userland
applications at the ULOGD project page.
--ulog-nlgroup
- Example:
iptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-nlgroup 2
- Explanation: The
--ulog-nlgroup option tells the ULOG target which
netlink group to send the packet to. There are 32 netlink groups,
which are simply specified as 1-32. If we would like to reach
netlink group 5, we would simply write --ulog-nlgroup 5 . The
default netlink group used is 1.
--ulog-prefix
- Example:
iptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-prefix "SSH connection attempt: "
- Explanation: The
--ulog-prefix option works just the same as the
prefix value for the standard LOG target. This option prefixes all
log entries with a user-specified log prefix. It can be up to 32
characters long, and is definitely most useful to distinguish
different log messages and where they came from.
--ulog-cprange
- Example:
iptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-cprange 100
- Explanation: The
--ulog-cprange option tells the ULOG target how
many bytes of the packet to send to the userspace daemon of ULOG .
If we specify 100 as above, we would copy 100 bytes of the whole
packet to userspace, which would hopefully include the whole header
plus some leading data within the actual packet. If we
specify 0, the whole packet will be copied to userspace,
regardless of the packet's size. The default value is 0, so the
whole packet will be copied to userspace.
--ulog-qthreshold
- Example:
iptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-qthreshold 10
- Explanation: The
--ulog-qthreshold option tells the ULOG target
how many packets to queue inside the kernel before actually
sending the data to userspace. For example, if we set the
threshold to 10 as above, the kernel would first accumulate 10
packets, and then transmit them to
userspace as one single netlink multi-part message. The default
value here is 1 for backward compatibility, since the userspace
daemon did not know how to handle multi-part messages previously.
Network Address Translation
NAT (Network Address Translation) is, it seems, one of the biggest
attractions of Linux and netfilter to this day. Instead of using fairly
expensive third party solutions from Juniper/Cisco/etc., a lot of
companies and individuals have chosen to go with netfilter instead.
One of the main reasons is that it is cheap and secure (no black box,
because it is FLOSS (Free/Libre Open Source Software)). All it
requires is a piece of hardware appropriate to the planned use case, a
fairly new Linux kernel which we can download for free from the
Internet, one or two NICs (Network Interface Cards) and cabling.
NAT Use Cases and Terms
Basically, NAT allows a machine or several machines to share the same
IP address. For example, let us say we have a LAN consisting of 5-10
clients. We set their default gateways to point through the NAT
server. Normally the packet would simply be forwarded by the gateway
machine, but in the case of a NAT server it is a little bit
different.
NAT servers translate the source and/or destination addresses of IP
packets to different addresses. The NAT server receives the packet,
rewrites the source and/or destination address and then recalculates
the checksum of the packet.
One of the most common usages of NAT is the SNAT (Source Network
Address Translation) function. Basically, this is used when we have
only one public IP address but several machines (note that those may
as well be virtual machines, or real physical ones, or even a mixture
of both) within our LAN which we want to connect to the Internet,
and we cannot afford or see any real benefit in having a public IP
for each and every one of those machines within our LAN.
In that case, we use one of the private IP address ranges for our LAN
e.g. 192.168.1.0/24 , and then turn on SNAT. The packet filter will then
use SNAT and translate all 192.168.1.0/24 addresses into its own public
IP address i.e. rewrite the source IP address for each outgoing IP
packet to for example 145.115.95.34 . This way, there will be 5-10
clients or many, many more using the same shared IP address.
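For the scenario above, the commands could look roughly like the output
of the following sketch. The function only prints the commands instead
of running them, so it is safe to try without root. snat_commands is a
hypothetical helper and the interface name eth0 is an assumption; the
addresses are the ones used in the text.

```sh
# Print (not run) the commands to turn on IP forwarding and to SNAT a
# LAN to one public IP address.
snat_commands() {
    lan=$1
    public_ip=$2
    out_if=$3
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "iptables -t nat -A POSTROUTING -s $lan -o $out_if -j SNAT --to-source $public_ip"
}

snat_commands 192.168.1.0/24 145.115.95.34 eth0
```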
There is also something called DNAT (Destination Network Address
Translation), which can be extremely helpful when it comes to setting
up servers etc.
First of all, we can help the greater good when it comes to saving IP
space. Second, we can get a more or less totally impenetrable
packet filter in between LAN internal machines and any outside net e.g. the
Internet, and/or simply share one IP among several servers that run
on physically different machines.
For example, we may run a small company server farm containing an httpd
and ftpd on the same physical machine (e.g. by using OpenVZ) while
there is a second physically separated machine containing a couple of
different IM (Instant Messaging) services that the employees working
from home or on the road can use to keep in touch with the employees
that are on-site.
We may then run all of these services on the same IP address from the
outside via DNAT. The above example is also based on separate port
NAT'ing, or often called PNAT (Port Network Address Translation). We
do not refer to this very often, since it is covered by the DNAT and
SNAT functionality in netfilter anyway.
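The port based DNAT setup described above could be sketched as follows.
Again the function only prints the rules. dnat_rule is a hypothetical
helper, and the internal addresses 192.168.1.10 and 192.168.1.20 as
well as port 5222 for the IM service are assumptions made up for this
illustration.

```sh
# Print (not run) a DNAT rule that forwards one TCP port of the public
# IP address to an internal machine.
dnat_rule() {
    public_ip=$1
    port=$2
    target=$3
    echo "iptables -t nat -A PREROUTING -d $public_ip -p tcp --dport $port -j DNAT --to-destination $target"
}

dnat_rule 145.115.95.34 80 192.168.1.10     # httpd
dnat_rule 145.115.95.34 21 192.168.1.10     # ftpd (control connection)
dnat_rule 145.115.95.34 5222 192.168.1.20   # IM service, assumed port
```

From the outside, all three services appear to live on the same IP
address, although they run on two different machines.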
In Linux, there are actually two separate types of NAT that can be
used:
- fast-NAT or
- netfilter-NAT.
Fast-NAT is implemented inside the IP routing code of the Linux
kernel, while netfilter-NAT is also implemented in the Linux kernel,
but inside the netfilter code.
Fast-NAT is generally called by this name since it is much faster than
the netfilter NAT code. It does not keep track of connections, and
this is both its main pro and con.
Connection tracking takes a lot of processor power, and hence it is
slower, which is one of the main reasons that fast-NAT is faster than
netfilter-NAT. As we also said, the bad thing about fast-NAT is that
it does not track connections, which means it will not be able to do
SNAT very well for whole networks, neither will it be able to NAT
complex protocols such as FTP, IRC and other protocols that
netfilter-NAT is able to handle very well. It is possible, but it
will take much, much more work than would be expected from the
netfilter implementation.
Finally, there is the term masquerade, which is basically a synonym
for SNAT. In netfilter, masquerade is pretty much the
same as SNAT with the exception that masquerading will automatically
set the new source IP to the default IP address of the outgoing
network interface.
Caveats using NAT
As we have already explained to some extent, there are quite a lot of
minor caveats with using NAT. The main problem is that certain
protocols and applications may not work at all within some NAT setups.
Hopefully, these applications are not too common and even if they
happen to be present, it should always be possible to segregate them
into some non-NAT environment.
The second and smaller problem is applications and protocols which
will only work partially. These protocols are more common than the
ones that will not work at all, which is quite unfortunate, but there
is not very much we can do about it, as it seems. If complex protocols
continue to be built, this is a problem we will have to continue
living with, especially if the protocols are not standardized, like
for example Skype, ICQ etc.
The third, and largest problem is the fact that the user who sits
behind a NAT server to get out on the internet will not be able to run
his own server.
It could be done, of course, but it takes a lot more time and work to
set this up. In companies, this is probably preferred over having tons
of servers run by different employees that are reachable from the
Internet, without any supervision.
However, when it comes to home users, this should be avoided if at
all possible. As an Internet service provider, we should never NAT
our customers from a private IP range to a public IP. It will cause us
more trouble than it is worth having to deal with, and there will
always be one or another client which will want this or that protocol
to work flawlessly. When it does not, we will be the ones blamed.
As one last note on the caveats of NAT, it should be mentioned that
NAT is actually more or less just a hack. NAT was a solution that was
worked out when the IANA and other organisations noted that the
Internet grew exponentially, and that IP addresses would soon be
in short supply.
NAT was and is a short term solution to the address shortage problem
with IPv4. The long term solution to the IPv4 address shortage is
the IPv6 protocol, which also solves a ton of other problems.
IPv6 addresses are 128 bits long, while IPv4 addresses are only 32
bits long. This is an incredible increase in address
space.
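To put the two address lengths into numbers, the sizes of the address
spaces and their ratio work out as follows (rounded to three
significant digits):

```latex
\[
2^{32} \approx 4.29 \times 10^{9}, \qquad
2^{128} \approx 3.40 \times 10^{38}, \qquad
\frac{2^{128}}{2^{32}} = 2^{96} \approx 7.92 \times 10^{28}
\]
```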
Example NAT machine in theory
This is a small theoretical scenario where we want a NAT server
between 2 different networks and an Internet connection.
What we want to do is to connect 2 networks to each other, and both
networks should have access to each other and the Internet. We will
discuss the hardware questions we should take into consideration, as
well as other theory we should think about before actually starting to
implement the NAT machine.
What is needed to build a NAT Machine
Before we discuss anything further, we should start by looking at what
kind of hardware is needed to build a Linux machine doing NAT.
For most smaller networks, this should be no problem, but if we are
starting to look at larger networks, it can actually become one. The
biggest problem with NAT is that it eats resources quite fast.
For a small private network with possibly 1-10 users, a Pentium with
256MB of RAM (Random Access Memory) will be more than enough. However,
if we are starting to get up around 100 or more users, we should start
considering what kind of hardware we should look at.
Of course, it is also a good idea to consider bandwidth usage, and how
many connections will be open at the same time. Generally, spare
computers will do very well however, and this is one of the big pros
of using a Linux based packet filter. We may use old hardware that we
have left over, and hence the packet filter will be very cheap in
comparison to other packet filters.
- My opinion, however, is that I never go for the cheap option when it
comes to core IT infrastructure components. We should opt for
redundancy in the system in order to increase its overall
availability (i.e. basically increasing the MTTF (Mean Time To
Failure) by decreasing the probability of a hardware-caused system
outage), which can be done by using a redundant power supply,
hardware RAID (Redundant Array of Independent Disks) and the like.
In short, I would strongly recommend buying a decent server to do the
packet filtering and NAT with which, by using virtualization, we can
also do other things, like for example backing up our workstations.
We will also need to consider NICs (Network Interface Cards). How many
separate networks will connect to our NAT/filter machine? Most of the
time it is simply enough to connect one LAN to the Internet.
If we connect to the Internet via Ethernet, we should generally have 2
Ethernet cards, or we use one NIC and set up virtual interfaces like
for example eth0:0 , eth0:1 and so forth. It might also be a good idea
to choose a 1000 Mbit/s network card of a relatively good brand (e.g.
Qlogic) for scalability and reliability, but almost any kind of NIC
will do as long as it has drivers in the Linux kernel.
A note on this matter: we should avoid using or getting NICs that do
not have drivers in the Linux kernel. I have, on several occasions,
found network cards/brands that have separately distributed drivers on
discs that work dismally. They are generally not very well maintained,
and if we get them to work on our kernel of choice to begin with, the
chance that they will actually work on the next major Linux kernel
upgrade is very small. This will most of the time mean that we may
have to get a little bit more costly NIC, but in the end it is worth
it.
Finally, one more thing to consider is how much RAM we put into the
NAT/packet filter machine. It is a good idea to put in at least
512MB of memory if possible, even if it is possible to run it on
256MB of RAM. NAT does not consume extreme amounts of memory, but it
may be wise to add as much as possible just in case we get more
traffic than expected.
As we can see, there is quite a lot to think about when it comes to
hardware. But, to be completely honest, in most cases we do not need
to think about these points at all, unless we are building a NAT
machine for a large network or company — in which case we pick a new
and decent server anyway. Most home users need not think about this,
but may more or less use whatever hardware they have at hand. There
are no complete comparisons and tests on this topic, but we should
fare rather well with just a little bit of common sense.
Placement of NAT Machines
This should look fairly simple, however, it may be harder than we
originally thought in large networks.
In general, the NAT machine should be placed on the perimeter of the
network, just like any packet filtering machine out there. This, most
of the time, means that the NAT and packet filtering machines are the
same machine, of course. Also worth a thought, if we have very large
networks, it may be worth splitting the network into smaller networks
using VLANs (Virtual Local Area Networks) and assign a NAT/filtering
machine for each of these networks. Since NAT takes quite a lot of
processing power, this will definitely help keep RTT (Round Trip Time)
down.
In our example network as described above, i.e. two LANs and an
Internet connection, we should look at how large the two networks are.
If we can consider them to be small (a prefix length of /25 or longer
or so) then, depending on what requirements the clients have, a
couple of hundred clients should be no problem on a decent NAT
machine.
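To get a feeling for how large a network with a given prefix length
can be, the number of usable host addresses can be calculated as shown
below. This is only a sketch; hosts_in_prefix is a hypothetical helper
name.

```sh
# Usable IPv4 host addresses for a given prefix length: all addresses
# of the network minus the network and the broadcast address.
# Meaningful for prefix lengths from /1 to /30.
hosts_in_prefix() {
    echo $(( (1 << (32 - $1)) - 2 ))
}

hosts_in_prefix 25   # prints 126
hosts_in_prefix 24   # prints 254
```

Two /25 networks thus add up to at most 252 clients.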
Otherwise, we could split up the load over several machines by
setting public IPs on smaller NAT machines, each handling their own
LAN segment, and then let the traffic congregate over a specific
routing-only machine.
This of course presupposes that we have enough
public IPs for all of our NAT machines, and that they are routed
through our dedicated routing machine.
How to place Proxies
Unfortunately, proxies are a general problem when it comes to NAT in
most cases, especially transparent proxies.
Normal proxies should not cause too much trouble, but a
transparent proxy is hard to get to work, especially on larger
networks. The first problem is that proxies take quite a lot of
processing power, just the same as NAT does. Putting both of these on
the same machine is not advisable if we are going to handle large
amounts of network traffic.
The second problem is that if we SNAT as well as DNAT, the proxy will
not be able to know what machines to contact, i.e. which server is the
client trying to contact? All that information is lost during the NAT
translation, since the rewritten packets no longer carry the
original addresses.
Locally, this has been solved by adding the information in the
internal data structures that are created for the packets, and hence
proxies such as squid can get the information.
As we can see, the problem is that we do not have much of a choice if
we are going to run a transparent proxy. There are, of course,
possibilities, but they are not advisable really.
- One possibility is to create a proxy outside the packet filter and
create a routing entry that routes all web traffic through that
machine, and then locally on the proxy machine NAT the packets to
the proper ports for the proxy. This way, the information is
preserved all the way to the proxy machine and is still available
on it.
- The second possibility is to simply create a proxy outside the
packet filter, and then block all web traffic except the traffic
going to the proxy. This way, we will force all users to actually
use the proxy. It is a crude way of doing it, but it will work.
The final Stage of our NAT Machine
As a final step, we should bring all of this information together, and
see how we would solve the NAT machine issue.
The NAT/filtering machine has a public IP address, as well as the
router and any other machines that may be available on the Internet.
All of the machines inside the NAT'ed networks will be using private
IPs, hence saving a lot of cash as well as IPv4 address space.
Let us take a look at a picture of the networks and how it looks. We
have decided to put the proxy server at the perimeter of our LAN, just
outside the NAT/filtering machine.
However, the proxy machine is still within a DMZ (Demilitarized Zone)
and thus protected. The DMZ containing the proxy and possibly other
machines is connected to the Internet as well as both of our LANs
through the NAT/filter machine as can be seen below:
All the normal traffic from the NAT'ed networks will be sent through
the DMZ directly to the router (note that this is a dedicated router
i.e. not the instance used for NAT/filtering), which will send the
traffic on out to the Internet. The exception is web traffic, which is
instead marked inside the netfilter part of the NAT machine, and then,
based on the mark, routed to the proxy machine.
Let us take a look at what we are talking about. Say an HTTP packet is
seen by the NAT machine. The mangle table can then be used to mark the
packet with a netfilter mark (also known as nfmark ).
Later, when we route the packets to our router, we will be
able to check for the nfmark within the routing tables, and based on
this mark, we can choose to route the HTTP packets to the proxy
machine. The proxy machine will then do its work.
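The marking and the mark based routing described above could be
sketched with the following three commands. The function only prints
them. proxy_routing_commands is a hypothetical helper, and the mark
value 1, the routing table number 100 and the proxy address
192.168.2.10 are assumptions chosen for this illustration.

```sh
# Print (not run) the commands that mark HTTP packets in the mangle
# table and route marked packets to the proxy via a separate routing
# table.
proxy_routing_commands() {
    mark=$1
    table=$2
    proxy_ip=$3
    echo "iptables -t mangle -A PREROUTING -p tcp --dport 80 -j MARK --set-mark $mark"
    echo "ip rule add fwmark $mark table $table"
    echo "ip route add default via $proxy_ip table $table"
}

proxy_routing_commands 1 100 192.168.2.10
```

The first command sets the nfmark, the second one tells the routing
code to consult table 100 for all packets carrying that mark, and the
third one makes the proxy the default gateway within that table.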
SNAT
- Masquerade is basically a synonym for SNAT. In netfilter,
masquerade is pretty much the same as SNAT with the exception that
masquerading will automatically set the new source IP to the
default IP address of the outgoing network interface.
- netstat-nat
- #netmap_target
Masquerading
- The MASQUERADE target is used basically the same as the SNAT
target, but it does not require any --to-source option. The reason
for this is that the MASQUERADE target was made to work with, for
example, dial-up connections, or DHCP connections, which get
dynamic IP addresses when connecting to the network in question.
This means that we should only use the MASQUERADE target with
dynamically assigned IP connections, where we do not know the actual
address at all times. If we have a static IP connection, we
should instead use the SNAT target.
- The MASQUERADE target is used in exactly the same way as SNAT, but
the MASQUERADE target takes a little bit more overhead to compute.
The reason for this is that each time the MASQUERADE target
gets hit by a packet, it automatically checks for the IP address to
use, instead of doing as the SNAT target does, i.e. just using the
single configured IP address. The MASQUERADE target makes it
possible to work properly with the dynamic IP addresses that our
ISP might provide for our PPP, PPPoE or SLIP connections to the
Internet.
- Note that the MASQUERADE target is only valid within the
POSTROUTING chain in the nat table, just as the SNAT target.
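The choice between the two targets can be sketched as a small shell
function which prints (but does not run) the matching rule.
source_nat_rule is a hypothetical helper, and the interface names eth0
and ppp0 are assumptions.

```sh
# Print a source NAT rule: SNAT if a static public IP address is
# given as second argument, MASQUERADE if it is left out (dynamic IP).
source_nat_rule() {
    out_if=$1
    static_ip=${2:-}
    if [ -n "$static_ip" ]; then
        echo "iptables -t nat -A POSTROUTING -o $out_if -j SNAT --to-source $static_ip"
    else
        echo "iptables -t nat -A POSTROUTING -o $out_if -j MASQUERADE"
    fi
}

source_nat_rule eth0 145.115.95.34   # static IP, SNAT rule
source_nat_rule ppp0                 # dynamic IP, MASQUERADE rule
```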
DNAT
Logging
WRITEME
ULOG
Particular Use Cases
WRITEME
A default setup
Thus, a proper packet filter setup would be one with a default deny
policy, that is:
- For the best security, a packet filter should be applied before the
Internet-facing interface is brought up. If we have a dynamic IP
and need to use it in our ruleset, we should consider loading a simple
deny-all packet filter (remember to allow DHCP) before bringing up
the interface, then switching to the real packet filter after we
get an IP.
- incoming connections are allowed only to local services by allowed
machines.
- outgoing connections are only allowed to services used by our
system (DNS, web browsing, POP, email...).
- the forward rule denies everything (unless we are protecting other
systems, see below).
- all other incoming or outgoing connections are denied.
- set ttl to increase by 1 so the packet filter stays stealthy
- set ttl the same on all outgoing packets to hide internal LAN structure
- The PREROUTING chain of the nat table is only traversed by the
first packet of a connection, which means that all subsequent
packets go totally unchecked in this chain. It is therefore not a
place for filtering rules.
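A default deny setup along the lines of the list above could look roughly like the sketch below. The allowed services (DNS, HTTP/HTTPS outbound, SSH inbound) and the LAN range 10.0.3.0/24 are assumptions chosen for illustration, and the function name default_deny is made up. As before, the commands are only printed unless IPT=iptables is set.

```shell
#!/bin/bash
# Dry run by default; set IPT=iptables (as root) to apply the rules.
IPT="${IPT:-echo iptables}"

default_deny() {
    # Default policy: drop whatever no later rule explicitly accepts.
    $IPT -P INPUT DROP
    $IPT -P OUTPUT DROP
    $IPT -P FORWARD DROP

    # Loopback traffic is always allowed.
    $IPT -A INPUT -i lo -j ACCEPT
    $IPT -A OUTPUT -o lo -j ACCEPT

    # Packets belonging to connections we already accepted.
    $IPT -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    $IPT -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

    # Outgoing: only the services this system uses (example: DNS, web).
    $IPT -A OUTPUT -p udp --dport 53 -m conntrack --ctstate NEW -j ACCEPT
    $IPT -A OUTPUT -p tcp -m multiport --dports 80,443 -m conntrack --ctstate NEW -j ACCEPT

    # Incoming: only local services, only from allowed machines
    # (example: SSH from an assumed LAN range).
    $IPT -A INPUT -p tcp -s 10.0.3.0/24 --dport 22 -m conntrack --ctstate NEW -j ACCEPT
}

default_deny
```

Nothing is added to the FORWARD chain, so with its DROP policy the machine forwards no traffic at all.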
Recent Module
WRITEME
OpenVZ
WRITEME
MARK Target
WRITEME
Apply Rules
It is time now to come up with a practical solution i.e. a rule set
which we are going to feed to netfilter in order to do all kinds of
fancy things like for example SNAT (Source Network Address
Translation) or traffic shaping using the TOS (Type of Service) match
and target.
Last but not least, 9 out of 10 people use netfilter in order to do
packet filtering i.e. protect their infrastructure from malicious
activities or erroneous software.
I have a Bash script (packet_filter) which does not blindly apply a
set of netfilter/iptables rules for SNAT, TOS, packet filtering etc.
Instead, it looks at the system it is run on and applies the rules in
a flexible and dynamic manner.
The rationale is simple. I wanted the whole task of securing a network
or single machine with netfilter/iptables to be flexible enough so I
can use it for my notebook, my workstation, which is a LAN machine,
and also for servers which are either LAN machines or gateway machines
... this script can do it all.
Also, packet_filter has been made with OpenVZ in mind i.e. it allows
securing VEs (Virtual Environments) from the HN (Hardware Node).
packet_filter is run on the HN i.e. even in a virtualized environment,
there is only one central security instance that concerns us with
regards to packet filtering. This makes things easy to maintain and to
adapt to needs that might arise. However, even though the script can
be used for OpenVZ environments, it also works perfectly fine for
non-OpenVZ environments (i.e. the standard Debian box).
packet_filter makes use of generic.sh (another one of my shell
scripts) for generic functions which are used in several of my shell
scripts and not just packet_filter. We need to download generic.sh and
put it in place as well to make things work.
One can find all of my scripts here; packet_filter and generic.sh are
the ones needed for firewalling. How to set things up is detailed
further down.
Debugging / Testing
WRITEME
Start Packet Filter at System Boot
Once we have a packet filter in place and come up with a set of rules
that we can load into it in order to protect our infrastructure, we
want to have those protective measures in place automatically every
time a machine boots.
What we need to do is to make it so that loading the rules into the
packet filter happens automatically every time the machine boots. It
might get rebooted by a manually issued reboot on the CLI (Command
Line Interface), one may schedule reboots via a cron job, or maybe
there is just bad luck and a power outage happens, in which case, at
some point, the machine will boot up again too.
The point is, if we do not take care of the fact that by default, the
packet filter allows traffic without any restrictions, the level of
protection, otherwise possible with netfilter/iptables, equals zero.
/etc/init.d/script_name or /etc/network/interfaces
In the past the number one choice how to accomplish an automatic
loading of our rules has been /etc/init.d/<script_name> . Because of
its unnecessary complexity and security related issues, Debian
switched to the now standard approach of using
/etc/network/interfaces . The switch was made official with releasing
Etch.
One major benefit of using /etc/network/interfaces is that we can
easily and effectively avoid the following:
- there is a time slot in which the interfaces are all up and
functioning already but in which the packet filter is not yet fully
functional
- an attacker could now use this time slot to do harm, even take over
the machine
Because we need/want to avoid this Achilles heel, what we are going to
do is:
- At first, we perform a lockdown of the machine in question: no
single bit can enter or leave the machine.
- Next, we bring up the interfaces. When they are all up and
functional, the machine still remains in lockdown.
- At the end of this whole sequence, we load our rule set into the
packet filter (using our script packet_filter), by which the
machine transitions from a lockdown state to a state where it is
protected by the packet filter: our rule set is applied by the
packet filter to any kind of traffic, be it forwarded, inbound or
outbound.
As can be seen, at no time will the machine be online without either
being in lockdown mode or being protected by the packet filter i.e.
netfilter/iptables.
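What a lockdown step might boil down to is sketched below. This is an assumption about the general technique, not a copy of what packet_filter does: flush the filter table, then set every built-in chain to DROP without adding a single ACCEPT rule. The function name lockdown is made up, and the sketch covers IPv4 only (ip6tables would need the same treatment). The commands are only printed unless IPT=iptables is set.

```shell
#!/bin/bash
# Dry run by default; set IPT=iptables (as root) to really lock down.
IPT="${IPT:-echo iptables}"

lockdown() {
    local chain
    $IPT -F    # flush all rules in the filter table
    $IPT -X    # delete user-defined chains in the filter table
    # With no rules left, the policies decide: drop everything.
    for chain in INPUT OUTPUT FORWARD; do
        $IPT -P "$chain" DROP
    done
}

lockdown
```

Because this runs before the interfaces come up, it is the policies alone that keep the machine closed until the real rule set gets loaded.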
Configuration
We already know about packet_filter and generic.sh. Now we need to put
packet_filter and generic.sh at a place where it can be accessed and
thus do its job of testing for certain parameters as well as system
settings and then load the appropriate rules.
Actually, we want to choose the path so that this can be done
automatically but also manually i.e. without the need for some shell
alias or the need to append something to PATH. Long story short,
we put the scripts into /usr/local/bin:
sa@wks:/usr/local/bin$ type ll; ll
ll is aliased to `ls -lh'
total 40K
-rwxr-xr-x 1 sa sa 3.3K 2009-05-13 19:53 generic.sh
-rwxr-xr-x 1 sa sa 37K 2009-06-15 19:58 packet_filter
sa@wks:/usr/local/bin$
Preparing /etc/network/interfaces
Next we use the pre-up and post-up commands in /etc/network/interfaces
in order to carry out our 3 step sequence from above i.e. lockdown the
machine, bring its interfaces up and finally, load our rule set based
on decisions made by packet_filter as it gathers information about the
machine and its configuration and its intended purpose (notebook, LAN
machine, gateway, etc.).
Because pre-up and post-up, as well as Bash, come with an
environment variable PATH set to
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin , the
location (/usr/local/bin) we chose works fine. Also, FHS (Filesystem
Hierarchy Standard) tells us to put packet_filter and generic.sh into
/usr/local/bin .
Once we are done editing /etc/network/interfaces , below is what it
looks like for my workstation which we know is a LAN machine and gets
its IP assigned via DHCP (Dynamic Host Configuration Protocol):
1 sa@wks:~$ grep -A44 primary /etc/network/interfaces
2 # The primary network interface
3 # post-up /usr/local/bin/packet_filter start needs to run two times
4 # so we can determine the gateway ip and hostname; the first run
5 # opens the firewall which allows the second run to determine the
6 # gateway hostname and ip and use it for further settings
7
8 auto eth0
9 iface eth0 inet dhcp
10 pre-up /usr/local/bin/packet_filter lockdown
11 post-up /usr/local/bin/packet_filter start
12 post-up /usr/local/bin/packet_filter start
13 pre-down /usr/local/bin/packet_filter save
14 sa@wks:~$
If we have several pre-up commands in one stanza, they are executed in
order of appearance — the same is true for post-up , pre-down and
post-down . Lines 8 and 9 were in place before already i.e. they are as
set up during installing the system.
What we added are lines 10 to 13. As mentioned, those lines perform
the sequence we outlined above i.e. like this we make sure there
never is a time slot where our network/machine is unprotected.
Line 10 does the lockdown. Next the interfaces are brought up. Then
line 11 kicks in, transitioning the firewall from total lockdown to
protected state. The reason we issue the same command twice (lines 11
and 12) is simple: packet_filter contains a few commands that need
access to the outside world e.g. when we determine the gateway IP and
hostname.
What happens is simply that after line 11 the firewall already
protects us but, in order to gather more information, we run it again
i.e. after line 12, our network/machine is protected and packet_filter
was able to gather information it could not find locally and thus had
to search for it outside my workstation.
We are done! At this point our network or machine is protected by
netfilter and it all happened fully automatically at system boot. Last
but not least, when the machine is going for a reboot or is about to
be shut down, we save the current rule set in line 13. That
information might be important to have for investigating a bug or odd
behavior should it occur.
It is important to note that whatever script is used for packet
filtering (packet_filter in our case), it needs to support the
commands/parameters given to it above in lines 10 to 13. packet_filter
does so of course as it supports
sa@wks:~$ grep -A6 '# Usage' /usr/local/bin/packet_filter
# Usage: packet_filter start
# packet_filter restart
# packet_filter status
# packet_filter stop
# packet_filter save
# packet_filter panic|lockdown
### END INIT INFO
sa@wks:~$
Therefore, one can manually issue packet_filter status to get a status
report on the current situation, what rules are active, how much
traffic each rule saw, etc. packet_filter save saves the current rule
set to the filesystem. packet_filter panic and packet_filter lockdown
are synonymous; both lock down the system.
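For anyone writing their own script to be called from /etc/network/interfaces, the command interface could be dispatched roughly as sketched below. This is a hypothetical skeleton, not an excerpt from packet_filter: the function name packet_filter_main is made up and each branch is a stub that merely prints what a real implementation would do.

```shell
#!/bin/bash
# Hypothetical command dispatcher; every action is a placeholder.
packet_filter_main() {
    case "${1:-}" in
        start)          echo "loading rule set" ;;
        restart)        echo "reloading rule set" ;;
        status)         echo "listing active rules and counters" ;;
        stop)           echo "flushing rules, accepting all traffic" ;;
        save)           echo "saving current rule set" ;;
        panic|lockdown) echo "locking down: dropping all traffic" ;;
        *)
            echo "Usage: packet_filter {start|restart|status|stop|save|panic|lockdown}" >&2
            return 1
            ;;
    esac
}

packet_filter_main status
```

Returning a non-zero status for unknown commands matters here, since ifup treats a failing pre-up or post-up command as an error.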
About the latter, we should not issue packet_filter panic if we are
logged into a machine remotely via SSH (Secure Shell) or we will have
successfully locked ourselves out!
If we want to test it anyway, we can either use iptables-apply or
have a cron job in place which runs packet_filter stop (disable the
firewall i.e. let traffic pass without restriction) every 5 minutes or
so. When we are done testing, we remove the cron job again.
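The cron safety net could be set up as in the following sketch. The helper name safety_net_line and the file name under /etc/cron.d are assumptions; the helper only prints the cron entry, so writing and removing the file are shown as comments.

```shell
#!/bin/bash
# Print an /etc/cron.d style entry (with user field) that disables
# the packet filter every N minutes while we are testing remotely.
safety_net_line() {
    local minutes="${1:-5}"
    printf '*/%s * * * * root /usr/local/bin/packet_filter stop\n' "$minutes"
}

safety_net_line 5

# As root, before testing (hypothetical file name):
#   safety_net_line 5 > /etc/cron.d/packet_filter_safety_net
# When done testing:
#   rm /etc/cron.d/packet_filter_safety_net
```

With iptables-apply the same safety comes built in: it applies a rules file and rolls back unless we confirm within a timeout, which can be set with its -t option.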
Port Knocking
WRITEME
fwknop
- how do I use GPG authentication for fwknopd in conjunction with
monkeysphere i.e.
- in ~/.ssh/config, use the ssh-proxycommand for fwknop in order to
wrap the fwknop/monkeysphere step into one simple ssh call i.e. not
even a shell alias for the first step (opening the sshd port via
fwknop on the server)
- http://code.google.com/p/ssh-fwknop/
- What types of services can be protected by fwknop? Technically, any
service that can be filtered by a Netfilter policy is a candidate
for protection by fwknop. Having said this however, fwknop is most
commonly used to provide an additional layer of security for
services that typically have long running sessions such as OpenSSH
or OpenVPN.
- Any service protected by fwknop is inaccessible (by using iptables
or ipfw to intercept packets within the kernel) before
authenticating; anyone scanning for the service will not be able to
detect that it is even listening.
- Multiple users are supported by the fwknop server, and each user
can be assigned their own symmetric or asymmetric encryption key
via the /etc/fwknop/access.conf file.
- For iptables firewalls, ACCEPT rules added by fwknop are added and
deleted (after a configurable timeout) from custom iptables chains
so that fwknop does not interfere with any existing iptables
policy. The iptables rule additions are managed with the
IPTables::ChainMgr module originally developed for the psad
project.
- Port randomization is supported for the destination port of SPA
packets as well as the port over which the follow-on connection is
made via the iptables NAT capabilities.
- Supports the execution of shell commands on behalf of valid SPA packets.
- The fwknop server can be configured to place multiple restrictions
on inbound SPA packets beyond those enforced by encryption keys and
replay attack detection. Namely, packet age, source IP address,
remote user, access to requested ports, filtering regular
expressions against commands, and more.
Prerequisites
- up and running SSH (Secure Shell) setup
Install and Configure
sa@wks:~$ ssh website
/ \\ _-'
_/ \\-''- _ /
__-' { \\
/ \\
/ "o. |o }
| \\ ; YOU ARE BEING WATCHED!
',
\\_ __\\
''-_ \\.//
/ '-____'
/
_'
_-'
This computer system is the private property of its owner, whether individual, corporate or government. It is
for authorized use only. Users (authorized or unauthorized) have no explicit or implicit expectation of
privacy.
Any or all uses of this system and all files on this system may be intercepted, monitored, recorded, copied,
audited, inspected, and disclosed to your employer, to authorized site, government, and law enforcement
personnel, as well as authorized officials of government agencies, both domestic and foreign.
By using this system, the user consents to such interception, monitoring, recording, copying, auditing,
inspection, and disclosure at the discretion of such personnel or officials.
UNAUTHORIZED OR IMPROPER USE OF THIS SYSTEM MAY RESULT
IN CIVIL AND CRIMINAL PENALTIES AND ADMINISTRATIVE OR
DISCIPLINARY ACTION, AS APPROPRIATE !!
By continuing to use this system you indicate your awareness of and consent to these terms and conditions of
use. LOG OFF IMMEDIATELY if you do not agree to the conditions stated in this warning. However, if you are
authorized personal with no bad intentions please continue. Have a nice day! :-)
sa@wks-ve10:~$ su
Password:
wks-ve10:/home/sa# netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 0 18892356 1690/apache2
tcp 0 0 0.0.0.0:18689 0.0.0.0:* LISTEN 0 18882867 299/sshd
tcp6 0 0 :::18689 :::* LISTEN 0 18882869 299/sshd
wks-ve10:/home/sa# exit
exit
sa@wks-ve10:~$ exit
logout
Connection to 10.0.3.4 closed.
sa@wks:~$ nmap -p- 10.0.3.4
Starting Nmap 4.68 ( http://nmap.org ) at 2009-06-08 11:09 CEST
Interesting ports on 10.0.3.4:
Not shown: 65533 closed ports
PORT STATE SERVICE
80/tcp open http
18689/tcp open unknown
Nmap done: 1 IP address (1 host up) scanned in 1.410 seconds
sa@wks:~$
- within VEs, we have to use venet0:0 instead of eth0 when
configuring fwknopd
- listening port for fwknop server UDP 62201
- fwknop can be run in debug mode with the --debug command line
option. This will disable daemon mode execution, and print verbose
information to the screen on STDERR as packets are received. Also,
after issuing the first command, port 22 should be open on the
server. I would use nmap to scan the server specifically for port
22 to see if the port is open.
Files
Take a look at /etc/fwknop/fwknop.conf . The config files are
/etc/fwknop/access.conf
/etc/fwknop/fwknop.conf
/etc/fwknop/pf.os
GPG (GNU Privacy Guard)
Port Randomization
Testing
Miscellaneous
fwknop Daemons
knopmd, knoptm and knopwatchd: we consider those helpers to fwknopd
fwknop and Tor
see man 8 fwknop_serv
Authentication
WRITEME
DoS, DDoS
WRITEME
- build something Python-based e.g. use fabric to manage
netfilter/iptables and enable/disable rules based on system
monitoring (the usual OS-level stuff e.g. HTTP GETs, bandwidth,
diskspace, etc.) and higher-level metrics such as number/type DNS
queries, number of credit card requests/declines, number of user
authentications, etc.
- http://en.wikipedia.org/wiki/Denial-of-service_attack
OSI Layer 4
OSI Layer 7
Slowloris
Pro-active Approaches
WRITEME
fail2ban
psad
PSAD is a collection of four lightweight system daemons written in
Perl and in C that is designed to work with Linux firewalling code in
order to detect port scans and act appropriately i.e. change firewalling
rules on the fly, thus adapting the system to the current security
threat.
fwsnort
Miscellaneous
WRITEME
xtables addons
Application Layer
GUIs
fwbuilder
Saving / Restoring Rulesets
Instead of including all of the iptables rules in the
/etc/init.d/<name_of_shell_script_containing_ruleset> script we can use
the iptables-restore program to restore the rules saved using
iptables-save .
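A small wrapper around the two programs might look like the sketch below. The function names and the SAVE_CMD/RESTORE_CMD variables are assumptions; the variables exist only so that the functions can be exercised without root by swapping in harmless commands. The path in the usage comments is a placeholder.

```shell
#!/bin/bash
# The real tools by default; override for testing without root.
SAVE_CMD="${SAVE_CMD:-iptables-save}"
RESTORE_CMD="${RESTORE_CMD:-iptables-restore}"

# save_rules FILE: dump the current rule set to FILE.
save_rules() {
    local file="$1"
    # Write to a temporary file first so that a failed dump never
    # truncates a good saved copy.
    $SAVE_CMD > "$file.tmp" && mv "$file.tmp" "$file"
}

# restore_rules FILE: load a previously saved rule set from FILE.
restore_rules() {
    local file="$1"
    [ -r "$file" ] || { echo "no saved rule set at $file" >&2; return 1; }
    $RESTORE_CMD < "$file"
}

# Usage (as root, placeholder path):
#   save_rules /path/to/ruleset
#   restore_rules /path/to/ruleset
```

iptables-restore loads each table of the file in one go, which is generally faster than running many individual iptables commands.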
In order to do this we need to setup our rules, save the ruleset under
a static location (such as /etc/default/firewall
http://iptables-tutorial.frozentux.net/iptables-tutorial.html#SPEEDCONSIDERATIONS
|