
Extracting a GBK-encoded zip with a Python script

· 1 min read

Encoding problems are a pain.

A zip archive with GBK-encoded filenames extracts to mojibake names on Linux; the script below converts the names while extracting.

#!/usr/bin/env python2
# coding: utf-8

import os
import sys
import zipfile

f = zipfile.ZipFile(sys.argv[1], "r")
for n in f.namelist():
    try:
        u = n.decode("gbk")
    except UnicodeDecodeError:
        u = n
    p = os.path.dirname(u)
    if p and not os.path.exists(p):
        os.makedirs(p)
    # skip directory entries and files that already exist
    if u.endswith("/") or os.path.exists(u):
        continue
    data = f.read(n)
    with open(u, "wb") as o:
        o.write(data)

Adding a Windows entry to the GRUB menu after installing CentOS 8

· 2 min read

On a dual-boot CentOS + Windows machine, a fresh CentOS 8 install has no entry to boot Windows by default; you can add one by hand as follows.

1. Boot into CentOS

Check the disk partition layout with fdisk -l:

# fdisk -l
Disk /dev/sda: 238.5 GiB, 256060514304 bytes, 500118192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x297f5cef

Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 250058751 250056704 119.2G 7 HPFS/NTFS/exFAT
/dev/sda2 250058752 393418751 143360000 68.4G 7 HPFS/NTFS/exFAT
/dev/sda3 393418752 394442751 1024000 500M 83 Linux
/dev/sda4 394442752 500117503 105674752 50.4G 5 Extended
/dev/sda5 394444800 500117503 105672704 50.4G 83 Linux

The fdisk output shows that the Windows partition is the first one, sda1, which corresponds to GRUB disk index (hd0,1). Next, edit the custom GRUB configuration file:

vi /etc/grub.d/40_custom

An example configuration:

#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries. Simply type the
# menu entries you want to add after this comment. Be careful not to change
# the 'exec tail' line above.

menuentry "Windows" {
set root=(hd0,1)
chainloader +1
}

Save, then run the following command to bring the custom entry into effect:

grub2-mkconfig --output=/boot/grub2/grub.cfg

OVER.

shell style guide

· 21 min read

A long, long introduction goes here.

  • Shell coding conventions

Preface

As with any coding standard, the point here is not just whether the code looks pretty; we also discuss conventions and coding norms. This document focuses on the rules we generally follow, and tries to avoid offering opinions on anything that is not explicitly mandatory.

Why have a coding standard

A coding standard matters to programmers for several reasons:

  • 80% of the lifetime cost of a piece of software goes to maintenance
  • Hardly any software is maintained for its whole life by its original authors
  • A coding standard improves readability, letting programmers understand new code quickly and thoroughly
  • If you ship source code as a product, you need to make sure it is well packaged and clear, like any other product you build

Principles

The guidelines in this document aim to maximize the following:

  • Correctness
  • Readability
  • Maintainability
  • Debuggability
  • Consistency
  • Aesthetics

Although this document covers a lot of ground, no coding standard can answer every question; after writing code, developers still need to judge it against the principles above.

Rule levels

  • Optional: for reference; adopt at your own discretion
  • Preferable: should be adopted, but may be skipped in special circumstances
  • Mandatory: must be adopted (except in a few very special cases)

Note: rules without an explicit level default to Mandatory.

References

This document mainly draws on the following documents:

Source files

Basics

When to use shell

Shell is only recommended for relatively simple utilities or wrapper scripts, so a single shell script should not grow too complex.

Follow these principles when deciding whether to use a shell script:

  • If the task mostly calls other tools and handles little data, shell is a fine choice
  • If performance matters, prefer another language over shell
  • If you need relatively complex data structures, prefer another language over shell
  • If the script keeps growing and is likely to keep doing so, rewrite it in another language early

File names

Executables should have no extension; library files must use the .sh extension and should not be executable.

When running a program you should not need to know what language it is written in, and shell scripts do not require an extension, so executables preferably have none.

For a library, knowing the language matters, so the language-specific .sh extension distinguishes it from libraries written in other languages.

File names must be all lowercase and may contain underscores _ or hyphens -; hyphens are suggested for executables and underscores for libraries.

Good:

my-useful-bin
my_useful_libraries.sh
myusefullibraries.sh

Bad:

My_Useful_Bin
myUsefulLibraries.sh

File encoding

Source files are encoded in UTF-8. To avoid differences in how operating systems handle line endings, always use LF.

Line length

Lines should not exceed 120 characters. The fundamental reason for a maximum line length is that overly long lines hurt readability and defeat indentation.

There are two exceptions:

  • import/source statements
  • URLs inside comments

If a string must exceed 120 characters, shorten the line with a here document, embedded newlines, or a similarly suitable method.

Example:

# DO use 'here document's
cat <<END;
I am an exceptionally long
string.
END

# Embedded newlines are ok too
long_string="I am an exceptionally
long string."

Whitespace

Apart from line-ending newlines, the space is the only whitespace character allowed in source files.

  • Use escape sequences for non-space whitespace inside strings
  • Do not indent with tabs; if tabs are used, one tab must be set to 4 spaces
  • Do not leave meaningless trailing whitespace at the end of lines
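For instance, a tab inside a string can be written as an escape sequence rather than as a literal tab character (a minimal illustration):

```shell
# Write non-space whitespace via escapes, not literal characters
printf 'name\tcount\n'
printf 'line one\nline two\n'
```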

Dead code cleanup (Preferable)

Resolutely remove methods, variables, and the like that are never used or are commented out, so leftover junk does not get in the way.

Structure

Use bash

Bash is the only shell permitted for executable scripts.

Executables must start with #!/bin/bash. Use set to configure shell options so that invoking your script as bash <script_name> does not break its behavior.

Restricting all executable shell scripts to bash keeps the shell language consistent across every machine we install on. Good:

#!/bin/bash
set -e

Bad:

#!/bin/sh -e

License or copyright notice (Preferable)

License and copyright information goes at the top of the source file. For example:

#
# Licensed under the BSD 3-Clause License (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
# https://opensource.org/licenses/BSD-3-Clause
#
# Unless required by applicable law or agreed to in writing, software distributed
# under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.
#

Indentation

Block indentation

Each new block adds 4 spaces of indent (never indent with the \t character). When the block ends, the indent returns to the previous level. Indentation applies to code and comments alike.

main() {
    # indented 4 spaces
    say="hello"
    flag=0
    if [[ $flag = 0 ]]; then
        # indented 4 spaces
        echo "$say"
    fi
}

Pipelines

If a whole pipeline fits on one line, write it on one line, with spaces around each pipe.

Otherwise, split it into one segment per line, placing the pipe at the start of the continuation line, indented 4 spaces. This applies to the pipe symbol | as well as the logical operators || and &&. Good:

# single-line pipeline, spaces around the pipe
command1 | command2

# a long pipeline wraps with the pipe at the start of the next command, indented 4 spaces
command1 \
    | command2 \
    | command3 \
    | command4

Bad:

# no spaces around the pipe
command1|command2

# wrapped pipeline with the pipe left at end of line
command1 | \
command2 | \
command3 | \
command4

Loops

Put ; do and ; then on the same line as while, for, and if.

Loops in shell are slightly different, but we follow the same principle as for the braces of function declarations: ; do and ; then go on the same line as while/for/if. else goes on its own line, and the closing statement goes on its own line, vertically aligned with the opening statement.

Good:

for dir in ${dirs_to_cleanup}; do
    if [[ -d "${dir}/${BACKUP_SID}" ]]; then
        log_date "Cleaning up old files in ${dir}/${BACKUP_SID}"
        rm "${dir}/${BACKUP_SID}/"*
        if [[ "$?" -ne 0 ]]; then
            error_message
        fi
    else
        mkdir -p "${dir}/${BACKUP_SID}"
        if [[ "$?" -ne 0 ]]; then
            error_message
        fi
    fi
done

Bad:

function getBatchName()
{
    batch_name="batch"
    if [[ "$input5"x == *$batch_name* ]]
    then
        batch_name=$input5
    else if [[ "$input6"x == *$batch_name* ]]
        then
            batch_name=$input6
        else if [[ "$input7"x == *$batch_name* ]]
            then
                batch_name=$input7
            fi
        fi
    fi
}

case statements

Indent alternatives by 4 spaces. An alternative with multiple commands is split across lines, with the pattern, the actions, and the terminator ;; each on its own line. Patterns are indented one level from case and esac, and multi-line actions one level further. Do not put an opening parenthesis before the pattern. Avoid the ;& and ;;& terminators. Example:

case "${expression}" in
    a)
        variable="..."
        some_command "${variable}" "${other_expr}" ...
        ;;
    absolute)
        actions="relative"
        another_command "${actions}" "${other_expr}" ...
        ;;
    *)
        error "Unexpected expression '${expression}'"
        ;;
esac

Simple one-line commands may share a line with the pattern and ;; as long as the whole expression stays readable. When the action does not fit on one line, use the multi-line form. One-line example:

verbose='false'
aflag=''
bflag=''
files=''
while getopts 'abf:v' flag; do
    case "${flag}" in
        a) aflag='true' ;;
        b) bflag='true' ;;
        f) files="${OPTARG}" ;;
        v) verbose='true' ;;
        *) error "Unexpected option ${flag}" ;;
    esac
done

Function placement

Put all functions in a file together, below the constants. Do not hide executable code between functions.

If you have functions, put them together near the top of the file. Only includes, set statements, and constant definitions may appear before the function declarations. Do not hide executable code between functions; doing so makes the code hard to follow while debugging and produces unexpected results.

The main function

For a script long enough to contain at least one other function, a function named main is recommended. For short, simple scripts a main function is unnecessary.

To make the program's entry point easy to find, put the main program in a function named main, as the bottommost function. This keeps it consistent with the rest of the codebase and lets you declare more variables as local (impossible if the main code is not a function). The last non-comment line in the file should be the call to main:

main "$@"
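Putting these rules together, a minimal skeleton might look like the following (the GREETING constant and greet function are made up for illustration):

```shell
#!/bin/bash
set -e

# Constants first, then functions, then the call to main at the bottom.
readonly GREETING='hello'

greet() {
    local name="$1"
    echo "${GREETING}, ${name}"
}

main() {
    greet "${1:-world}"
}

main "$@"
```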

Comments

Basic principles for code comments:

  • A comment should make the code clearer
  • Avoid over-decorated comments
  • Keep comments simple and direct
  • Start writing comments before you start coding
  • Comments should explain the design intent, not describe what the code does

A comment sits at the same indent level as the code around it, with one space between the # and the text so comments can be told apart from commented-out code.

File header

Every file opens with a description of its contents. Besides the copyright notice, every file must have a top-level comment briefly summarizing what it does.

For example:

#!/bin/bash
#
# Perform hot backups of databases.
Function comments

In the main script, every function except trivially obvious ones must have a comment. In library files, every function must have a comment regardless of its length or complexity.

This lets others learn how to use your program or library functions by reading the comments alone, without reading the code.

Every function comment should cover:

  • A description of the function
  • Global variables used and modified
  • Arguments taken
  • Returned values other than the default exit status of the last command run

For example:

#!/bin/bash
#
# Perform hot backups of databases.

export PATH='/usr/sbin/bin:/usr/bin:/usr/local/bin'

#######################################
# Cleanup files from the backup dir
# Globals:
#   BACKUP_DIR
#   BACKUP_SID
# Arguments:
#   None
# Returns:
#   None
#######################################
cleanup() {
    ...
}
Implementation comments

Comment the tricky, non-obvious, interesting, or important parts of your code.

The basic commenting principles apply here. Do not comment every line; if there is a complex piece of logic that is hard to follow, give it a brief comment.

TODO comments

Use TODO comments for temporary or short-term solutions, or code that is good enough but not perfect.

A TODO comment uses the all-caps string TODO, followed in parentheses by your name, email address, bug ID, or other identifier, plus the issue this TODO relates to. The main purpose is a consistent, searchable TODO format identifying the person who added it (who can also be asked for more details). Adding a TODO does not commit you to fixing it yourself, so TODOs with a name usually carry the name of whoever added them.

This matches the convention in the C++ Style Guide.

For example:

# TODO(mrmonkey): Handle the unlikely edge cases (bug ####)
# TODO(--bug=123456): remove the "Last visitors" feature

Naming

Function names

Use lowercase with underscores separating words. Use a double colon :: to separate package names. Parentheses must follow the function name.

If you are writing a standalone function, name it in lowercase with underscores between words. If you are writing a package, separate package names with ::. There is no space between the function name and the parentheses, and the opening brace goes on the same line as the function name. When () follows the name, the function keyword is redundant; prefer omitting it, but at least stay consistent within a project. Good:

# Single function
my_func() {
    ...
}

# Part of a package
mypackage::my_func() {
    ...
}

Bad:

function my_func
{
    ...
}

Variable names

Same rules as for function names.

A loop variable should be named like the variable it loops over. Example:

for zone in ${zones}; do
    something_with "${zone}"
done

Constant and environment variable names

All caps, underscore-separated, declared at the top of the file.

Constants and anything exported to the environment should be uppercase. Example:

# Constant
readonly PATH_TO_FILES='/some/path'

# Both constant and environment
declare -xr BACKUP_SID='PROD'

Sometimes a constant is only first set at runtime (for example, via getopts). It is fine to set a constant inside getopts or based on a condition, but make it read-only immediately afterwards. Note that declare does not act on global variables from inside a function, so readonly and export are recommended instead. Example:

VERBOSE='false'
while getopts 'v' flag; do
    case "${flag}" in
        v) VERBOSE='true' ;;
    esac
done
readonly VERBOSE

Read-only variables

Use readonly or declare -r to make a variable read-only.

Globals are used everywhere in shell, so catching errors in their use is important. When you declare a variable that is meant to be read-only, say so explicitly. Example:

zip_version="$(dpkg --status zip | grep Version: | cut -d ' ' -f 2)"
if [[ -z "${zip_version}" ]]; then
    error_message
else
    readonly zip_version
fi

Local variables

Declare one variable at a time; do not use combined declarations like a=1 b=2;

Declare function-specific variables with local. Declaration and assignment go on separate lines.

Local variables must be declared with local so they are visible only inside the function and its children. This avoids polluting the global namespace and accidentally setting variables that may matter outside the function.

When assigning from a command substitution, the declaration and the assignment must be separate, because the local builtin does not propagate the command substitution's exit code. Good:

my_func2() {
    local name="$1"
    # assigning from a command substitution: declare and assign on separate lines
    local my_var
    my_var="$(my_func)" || return
    ...
}

Bad:

my_func2() {
    # never do this: $? picks up the exit code of 'local', not my_func
    local my_var="$(my_func)"
    [[ $? -eq 0 ]] || return

    ...
}

Errors and logging

Errors

Signal errors through shell return values, using different values for different error conditions.
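A sketch of the idea, with made-up return codes (2 for a missing file, 3 for an unreadable one):

```shell
# Distinct return values for distinct failure modes (the codes are arbitrary)
check_file() {
    local path="$1"
    [[ -e "${path}" ]] || return 2   # does not exist
    [[ -r "${path}" ]] || return 3   # exists but unreadable
    return 0
}

check_file /etc/hosts && echo "readable"
```

Callers can then branch on $? per failure mode instead of getting a single pass/fail.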

Logging

All error messages should go to STDERR; this makes it much easier to separate normal output from trouble when something goes wrong.

A function along these lines is recommended for printing normal and error output:

err() {
    echo "[$(date +'%FT%T%z')]: $@" >&2
}

if ! do_something; then
    err "Unable to do_something"
    exit "${E_DID_NOTHING}"
fi

Programming practices (continuously sorted and refined)

Variable expansion (Preferable)

In general, prefer braced variables like "${var}" over "$var", though it depends on the situation.

Recommendations, in order of priority:

  • Stay consistent with existing code
  • Single-character variables only need braces in particular cases
  • Quote variables; see the next section: variable quoting

Detailed examples follow. Good:

# positional and special variables need no braces:
echo "Positional: $1" "$5" "$3"
echo "Specials: !=$!, -=$-, _=$_. ?=$?, #=$# *=$* @=$@ \$=$$ ..."

# positional parameters from 10 on require braces:
echo "many parameters: ${10}"

# braces are required where the name would otherwise be ambiguous:
# Output is "a0b0c0"
set -- a b c
echo "${1}0${2}0${3}0"

# braces are required for expansions with default values:
DEFAULT_MEM=${DEFAULT_MEM:-"-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g"}

# recommended handling for other ordinary variables:
echo "PATH=${PATH}, PWD=${PWD}, mine=${some_var}"
while read f; do
    echo "file=${f}"
done < <(ls -l /tmp)

Bad:

# unquoted, unbraced, special and single-character variables
echo a=$avar "b=$bvar" "PID=${$}" "${1}"

# ambiguity without braces: the following parses as "${1}0${2}0${3}0",
# not "${10}${20}${30}"
set -- a b c
echo "$10$20$30"

Variable quoting (Preferable)

Quoting should generally follow these principles:

  • By default, quote strings containing variables, command substitutions, spaces, or shell metacharacters
  • Quotes may be omitted where unquoted expansion is explicitly required
  • Prefer quoting strings that are words, not command options or path names
  • Never quote integers
  • Pay special attention to the quoting rules for pattern matches in [[
  • Prefer $@ over $* unless there is a specific reason not to

Illustrated by example:

# 'single quotes' mean no substitution
# "double quotes" mean substitution is wanted

# Example 1: quote command substitutions
flag="$(some_command and its args "$@" 'quoted separately')"

# Example 2: quote ordinary variables
echo "${flag}"

# Example 3: integers are unquoted
value=32
# Example 4: quote command substitutions even when they output integers
number="$(generate_number)"

# Example 5: words may be quoted, but it is not mandatory
readonly USE_INTEGER='true'

# Example 6: single-quote or escape special symbols in output
echo 'Hello stranger, and well met. Earn lots of $$$'
echo "Process $$: Done making \$\$\$."

# Example 7: command options and path names need no quotes
grep -li Hugo /dev/null "$1"

# Example 8: double-quote ordinary variables; ccs may be empty, a special
# case where quotes can be dropped
git send-email --to "${reviewers}" ${ccs:+"--cc" "${ccs}"}

# Example 9: single-quote regexes; $1 may be empty, a special case where
# quotes can be dropped
grep -cP '([Ss]pecial|\|?characters*)$' ${1:+"$1"}

# Example 10: pass positional parameters with quoted "$@"; to pass all
# arguments as a single string, use quoted "$*"
# content of t.sh
func_t() {
    echo num: $#
    echo args: 1:$1 2:$2 3:$3
}

func_t "$@"
func_t "$*"
# running ./t.sh a b c prints:
num: 3
args: 1:a 2:b 3:c
num: 1
args: 1:a b c 2: 3:

Command substitution

Use $(command) instead of backticks.

Nesting backticks requires escaping the inner backticks with backslashes, whereas the $(command) form nests without escaping and is more readable.

Good:

var="$(command "$(command1)")"

Bad:

var="`command \`command1\``"

Conditional tests

Use [[ ... ]] instead of [, test, and /usr/bin/[

No pathname expansion or word splitting happens between [[ and ]], so [[ ... ]] leaves less room for mistakes; moreover [[ ... ]] supports regular expression matching, which [ ... ] does not. See the following examples:

# Example 1: regex match; note the right-hand side has no quotes
# For the full details see part E14 of http://tiswww.case.edu/php/chet/bash/FAQ
if [[ "filename" =~ ^[[:alnum:]]+name ]]; then
    echo "Match"
fi

# Example 2: exact match against the literal string "f*" (no match here)
if [[ "filename" == "f*" ]]; then
    echo "Match"
fi

# Example 3: inside [ ] an unquoted right-hand side undergoes pathname
# expansion; with several files starting with f in the current directory this
# fails with "[: too many arguments"
if [ "filename" == f* ]; then
    echo "Match"
fi

String tests

Use quoted variables rather than filler characters wherever possible.

Bash handles empty-string tests well; use the empty/non-empty string tests instead of filler characters, which makes the code more readable. Good:

if [[ "${my_var}" = "some_string" ]]; then
    do_something
fi

Bad:

if [[ "${my_var}X" = "some_stringX" ]]; then
    do_something
fi

Good:

# test for an empty string with -z
if [[ -z "${my_var}" ]]; then
    do_something
fi

Bad:

# testing for an empty string with empty quotes works, but is discouraged
if [[ "${my_var}" = "" ]]; then
    do_something
fi

Good:

# test for a non-empty string with -n
if [[ -n "${my_var}" ]]; then
    do_something
fi

Bad:

# the implicit non-empty test works, but is discouraged
if [[ "${my_var}" ]]; then
    do_something
fi

Wildcard expansion of filenames

Use an explicit path when doing wildcard expansion of filenames.

When a directory contains odd filenames, such as files starting with -, a path-qualified wildcard like ./* is much safer than a bare *.

# say the directory contains these 4 files and subdirectories:
# -f -r somedir somefile

# a pathless wildcard hands -r and -f to rm as options, force-deleting files:
psa@bilby$ rm -v *
removed directory: `somedir'
removed `somefile'

# a path-qualified one does not:
psa@bilby$ rm -v ./*
removed `./-f'
removed `./-r'
rm: cannot remove `./somedir': Is a directory
removed `./somefile'

Use eval with care

eval should be avoided.

eval munges its input when used to assign variables, and it sets variables without letting you check what they were. Bad:

# what gets set, and whether it succeeded, is unclear
eval $(set_my_variables)

Avoid piping into while loops

Use process substitution or a for loop instead of piping into a while loop.

In a while loop fed by a pipe, the commands run in a subshell, so modifications to variables do not propagate to the parent shell.

The implicit subshell these pipelines create makes bugs very hard to track down. Bad:

last_line='NULL'
your_command | while read line; do
    last_line="${line}"
done

# this prints 'NULL':
echo "${last_line}"

If you are sure the input contains no spaces or other special characters (usually meaning it is not user input), a for loop works instead. For example:

total=0
# this only behaves as intended when the output contains no spaces or other
# special characters:
for value in $(command); do
    total+="${value}"
done

Process substitution also redirects output, but it puts the command in an explicit subshell instead of putting the while loop in an implicit one. For example:

total=0
last_file=
# note the space between the two <: the first is a redirection, the second,
# <(), is a process substitution
while read count filename; do
    total+="${count}"
    last_file="${filename}"
done < <(your_command | uniq -c)

echo "Total = ${total}"
echo "Last one = ${last_file}"

Check return values

Always check return values, and return informative values.

For unpiped commands, check via $? or directly via an if statement to keep things simple.

For example:

# check the result with an if statement
if ! mv "${file_list}" "${dest_dir}/" ; then
    echo "Unable to move ${file_list} to ${dest_dir}" >&2
    exit "${E_BAD_MOVE}"
fi

# or use $?
mv "${file_list}" "${dest_dir}/"
if [[ $? -ne 0 ]]; then
    echo "Unable to move ${file_list} to ${dest_dir}" >&2
    exit "${E_BAD_MOVE}"
fi

Builtin versus external commands

Where a shell builtin can do the same job, prefer the builtin over calling an external command.

Builtins introduce fewer dependencies than external commands, and calling a builtin is usually faster (an external command costs an extra process).

Good:

# use builtin arithmetic expansion
addition=$((${X} + ${Y}))
# use builtin string substitution
substitution="${string/#foo/bar}"

Bad:

# call an external command for simple arithmetic
addition="$(expr ${X} + ${Y})"
# call an external command for a simple string substitution
substitution="$(echo "${string}" | sed -e 's/^foo/bar/')"

Sourcing files

Prefer source over . for loading external library files; it reads better. Good:

source my_libs.sh

Bad:

. my_libs.sh

Filtering and counting

Unless necessary, use a single command and its arguments for a task rather than a needless combination of commands and pipes. Common discouraged patterns include piping cat into grep to filter strings, cat into wc to count lines, and grep into wc to count matches.

Good:

grep net.ipv4 /etc/sysctl.conf
grep -c net.ipv4 /etc/sysctl.conf
wc -l /etc/sysctl.conf

Bad:

cat /etc/sysctl.conf | grep net.ipv4
grep net.ipv4 /etc/sysctl.conf | wc -l
cat /etc/sysctl.conf | wc -l

Returning versus exiting

Barring special cases, almost no function should use exit to quit the script directly; it should return instead, so later logic can handle the error. Good:

# cleanup still runs after the function returns
my_func() {
    [[ -e /dummy ]] || return 1
}

cleanup() {
    ...
}

my_func
cleanup

Bad:

# when the function exits, cleanup never runs
my_func() {
    [[ -e /dummy ]] || exit 1
}

cleanup() {
    ...
}

my_func
cleanup

Appendix: useful tools

The following tools are recommended to help keep code in shape:

Common ways to merge small files in HIVE

· 1 min read

An introduction to common ways of handling small files in Hive.

How Hive produces files

The impact of too many small files

Why small files arise

How to handle small files

case 1

INSERT OVERWRITE TABLE tb1
SELECT * FROM tb2
ORDER BY 1;
ALTER TABLE tb2 RENAME TO b_tb2;
ALTER TABLE tb1 RENAME TO tb2;

case 2

INSERT INTO TABLE tb1
SELECT c1, c2 FROM (
    SELECT c1, c2
    FROM tb2
    WHERE xxx
    AND xxx
) t
ORDER BY c1, c2;

case 3

SELECT c1
FROM (
xxx
) t
GROUP BY x;

case 4

INSERT OVERWRITE TABLE tb1
SELECT
    xxx
FROM (
    SELECT xxx
    FROM xxx
    WHERE xxx
) t
DISTRIBUTE BY rand();

case 5

INSERT INTO TABLE tb1
SELECT c1,c2
FROM tb2
WHERE xxx
SORT BY c1;

Random numbers and entropy in the system

· 1 min read

An attempt to summarize the ins and outs of randomness in the system and what to watch for operationally [still owed]

1. What is a random number?

A random number is a number that cannot be predicted.

2. What are random numbers for?

Randomness is for security.

3. How do you get random numbers?

/dev/random /dev/urandom /proc/sys/kernel/random/entropy_avail
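The interfaces above can be poked at directly from the shell on Linux (a quick sketch):

```shell
# How much entropy the kernel pool currently reports
cat /proc/sys/kernel/random/entropy_avail

# /dev/urandom never blocks; read 16 random bytes and print them as hex
head -c 16 /dev/urandom | od -An -tx1
```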

4. What problems come up?

1. Random numbers are generated slowly 2. This affects the applications above

haveged and rng-tools

Notes on installing ansible awx on CentOS 7

· 2 min read

Notes from installing and testing awx, with the points to watch out for.

About awx

awx (https://github.com/ansible/awx) is the open-source upstream of Ansible Tower. The project dates back to 2013 and was open-sourced by Red Hat in 2017; it is actively maintained. The official install guide is a bit scattered and not very direct, so standing up even a simple test environment to try the features takes ages; here is a simplified install flow for a quick taste.

Installation

Install packages

yum -y install epel-release
systemctl disable firewalld
systemctl stop firewalld
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
yum -y install git gettext ansible docker nodejs npm gcc-c++ bzip2 python-docker-py

Start services

systemctl start docker
systemctl enable docker

Clone the awx code

git clone https://github.com/ansible/awx.git
cd awx/installer/
# note: change postgres_data_dir to another directory such as /data/pgdocker
vi inventory
ansible-playbook -i inventory install.yml

Check the logs

docker logs -f awx_task

That covers the install. Because my local environment reaches the internet through a proxy, here is how to configure docker to go through the proxy; otherwise pulling images will fail.

mkdir /etc/systemd/system/docker.service.d/
cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<EOF
[Service]
Environment="HTTP_PROXY=proxy.test.dev:8080" "HTTPS_PROXY=proxy.test.dev:8080" "NO_PROXY=localhost,127.0.0.1,172.1.0.2"
EOF

systemctl daemon-reload
systemctl restart docker
systemctl show --property=Environment docker

References:

[1] http://khmel.org/?p=1245

[2] https://docs.docker.com/engine/admin/systemd/#httphttps-proxy

A puzzling case of a service log that would not truncate

· 3 min read

A service process produced a log that simply would not shrink when truncated; here is why.

Symptoms

A business machine alerted on disk usage above threshold. The cause was a service process whose log file had grown huge and was never rotated, so history kept piling into a single file. To head off an outage we truncated the log by hand, imitating logrotate's copytruncate mode since the service has no built-in rotation. But after truncating, something odd happened: ls showed the file size had not shrunk. Gut feeling said the file descriptor was still open with an unchanged offset. Checking how the process was started confirmed it: it was launched via nohup with stdout redirected into the ever-growing log file.

Reproduction

We can reproduce this with a few lines of script:

#!/bin/bash
while true; do
    sleep 1
    head -5000 /dev/urandom
done

Once started, a resident process prints a pile of characters every second to simulate log growth. Start it like this:

nohup ./daemon.sh >out.log 2>&1 < /dev/null &

After waiting a bit, we can see the log has been written:

[root@localhost t]# ll -h out.log ;du -h out.log 
-rw-r--r-- 1 root root 64M Oct 19 17:41 out.log
64M out.log

Now empty the log file and watch the size:

[root@localhost t]# 
[root@localhost t]# truncate -s0 out.log
[root@localhost t]# ll -h out.log ;du -h out.log
-rw-r--r-- 1 root root 93M Oct 19 17:41 out.log
4.0M out.log

Although the file was emptied, the size reported by ls did not shrink (the writer kept going at its old offset), while du shows little actual allocation; the file is now mostly holes (a sparse file).

Fix

Change the nohup redirection from > to >>, i.e. append mode. With append mode, truncating no longer shows this behavior:

 nohup ./daemon.sh >>out.log 2>&1  </dev/null &


[root@localhost t]# ll -h out.log ;du -h out.log
-rw-r--r-- 1 root root 48M Oct 19 19:43 out.log
64M out.log
[root@localhost t]# ll -h out.log ;du -h out.log
-rw-r--r-- 1 root root 77M Oct 19 19:43 out.log
128M out.log
[root@localhost t]# truncate -s0 out.log
[root@localhost t]# ll -h out.log ;du -h out.log
-rw-r--r-- 1 root root 1.3M Oct 19 19:43 out.log
2.0M out.log

A question to take away: why does append mode avoid the problem?

References:

[1] https://www.gnu.org/software/bash/manual/bash.html#Redirections

[2] https://www.gnu.org/software/coreutils/manual/html_node/nohup-invocation.html

HADOOP 3.0.0 erasure coding test

· 5 min read

Notes from installing a hadoop 3.0.0 test cluster and a simple test of its erasure coding.

Environment

HADOOP 3.0.0 adds erasure coding, which improves availability while lowering storage cost; it is currently experimental. Below are the setup steps in a test environment and a simple test run. Only HDFS is tested here, so no other services are deployed.

The environment:

KVM virtual machines, 1 namenode and 6 datanodes, each with 4 cores, 8G mem, 50G disk

hadoop version: 3.0.0-alpha4

java version: 1.8.0_144

Test cluster installation

Base packages

Download hadoop-3.0.0-alpha4, the latest at the time, from an apache mirror; hadoop home is /opt/hadoop. Finish the configuration on the namenode, then copy everything to all datanodes.

tar xf hadoop-3.0.0-alpha4.tar.gz
mv hadoop-3.0.0-alpha4 /opt/hadoop
yum -y install jdk --disablerepo=* --enablerepo=local-custom

Configuration changes

The files to modify are hadoop-env.sh, core-site.xml, and hdfs-site.xml.

Change the following in /opt/hadoop/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_144
export HADOOP_HOME=/opt/hadoop
# note the heap size variable differs from 2.7, which used HADOOP_HEAPSIZE
export HADOOP_HEAPSIZE_MAX=1024
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_PID_DIR=/tmp

Set /opt/hadoop/etc/hadoop/hdfs-site.xml as follows:

<configuration>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/hdfs/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hdfs/namenode</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.namenode.rpc-address</name>
        <value>192.168.199.26:8020</value>
    </property>
</configuration>

Add the environment variables to the current user's bashrc:

# ~/.bashrc
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:/opt/hadoop/bin

Once the configuration is done, sync the whole /opt/hadoop directory to all datanodes.

Starting HDFS

Format the namenode

hdfs namenode -format ns1

Start the namenode and datanodes

# note the difference in 3.0.0; in 2.7 the start scripts were:
hadoop-daemon.sh --config $HADOOP_HOME/etc/hadoop --script hdfs start namenode
hadoop-daemon.sh --config $HADOOP_HOME/etc/hadoop --script hdfs start datanode
# the new release rewrites the admin scripts into the unified hdfs command; start like this:
hdfs --daemon start namenode
hdfs --daemon start datanode

Once started, the cluster's basic state is visible in the namenode web UI. The default web UI port, 50070 in 2.7, has moved to 9870 in 3.0.0.

Testing hdfs

Once up, use the hdfs command to verify the service works:

hdfs dfs -mkdir hdfs://192.168.199.26:8020/t1/
dd if=/dev/urandom of=f1 bs=1M count=5000
hdfs dfs -put f1 hdfs://192.168.199.26:8020/t1/
hdfs dfs -rm -skipTrash hdfs://192.168.199.26:8020/t1/f1

Note that the full hdfs:// URL with namenode:port is needed here; we can change the config to set the default FS so testing is easier. Also, erasure coding is not enabled by default in this release and must be configured by hand. The built-in policies are RS-3-2-64k, RS-6-3-64k, RS-10-4-64k, RS-LEGACY-6-3-64k, and XOR-2-1-64k; with only a few nodes available I briefly test just two of them.

Add the following to hdfs-site.xml:

<property>
    <name>dfs.namenode.ec.policies.enabled</name>
    <value>XOR-2-1-64k,RS-3-2-64k</value>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
</property>
<property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>192.168.199.26:8020</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Add the following to core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
        <final>true</final>
    </property>
</configuration>

After restarting the namenode, erasure coding is available. Policies are set per directory, and different directories can use different policies.

hdfs ec -setPolicy -policy XOR-2-1-64k -path /t1
hdfs ec -setPolicy -policy RS-3-2-64k -path /t2

Three directories were prepared: t1 and t2 with different policies, t3 with none. The same 5G file was uploaded to each; before every upload HDFS was emptied and the VM and host caches were dropped. The elapsed time and space usage came out roughly as follows:

HDFS dir | EC policy          | put time  | disk usage (KB)
t1       | XOR-2-1-64k        | 1m15.559s | 7740364
t2       | RS-3-2-64k         | 1m13.920s | 8600436
t3       | none (3 replicas)  | 2m7.705s  | 15480600

This simple comparison gives a rough idea that, under ideal conditions, erasure coding writes faster than traditional 3x replication, because it consumes less disk IO and bandwidth, while also using less disk space. The measured disk usage closely matches the theoretical value for each policy: the space overhead factor is (parity blocks + data blocks) / data blocks.
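As a sanity check on the table above, take the logical file size implied by the 3-replica row, 15,480,600 / 3 ≈ 5,160,200 KB, and apply the overhead factor of each policy:

```latex
\text{overhead} = \frac{\text{data blocks} + \text{parity blocks}}{\text{data blocks}}
\qquad
\begin{aligned}
\text{XOR-2-1}&: \tfrac{2+1}{2} = 1.5 &&\Rightarrow 5{,}160{,}200 \times 1.5 \approx 7{,}740{,}300 \approx 7{,}740{,}364\ \text{KB (t1)}\\
\text{RS-3-2}&: \tfrac{3+2}{3} \approx 1.667 &&\Rightarrow 5{,}160{,}200 \times \tfrac{5}{3} \approx 8{,}600{,}333 \approx 8{,}600{,}436\ \text{KB (t2)}\\
\text{replication}&: 3 &&\Rightarrow 5{,}160{,}200 \times 3 = 15{,}480{,}600\ \text{KB (t3)}
\end{aligned}
```

So each measured row agrees with its policy's theoretical overhead to within a fraction of a percent.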

Note that this test only aims at a feel for erasure coding in HADOOP 3.0.0; the methodology is loose in many ways. For instance, the put timings ignore all sorts of environmental factors and come from a single, relatively ideal run. Real production is far more complex: CPU, bandwidth, disks, even rack power all have to be weighed, and a reliable performance comparison requires long, stable runs and long-term observation before the numbers can meaningfully guide production.

参考文档:

[1] http://hadoop.apache.org/docs/r3.0.0-alpha4/hadoop-project-dist/hadoop-common/ClusterSetup.html

[2] http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html

Notes on deploying a k8s test environment

· 4 min read

Notes from deploying a local k8s test environment, based on deployment scripts shared by others on github with some small tweaks.

Node types in k8s

master: the scheduling hub that manages the other nodes; a master can have replicas for redundancy

minion: managed by the master, runs the container workloads; one cluster has N minion nodes

Preparation

The deployment follows scripts shared on github; my fork lives at https://github.com/5xops/k8s-deploy

First follow the readme in the repo to download the k8s rpm packages locally for an offline install. I then prepared more than six VM nodes in the local test environment: 3 for the etcd cluster, 2 for k8s masters, and the rest as k8s minions. All VMs run on one physical server, managed with this script (https://github.com/itxx00/vmm).

First create the VM nodes we need:

vmm create etcd1
vmm create etcd2
vmm create etcd3
vmm create kubem1
vmm create kubem2
vmm create node1
vmm create node2

Deploying k8s requires every node to have its hostname configured, but freshly created VMs keep the default hostname, so another script sets the hostname and fills in /etc/hosts. First prepare the init script:

#!/bin/bash
# content of ~/.vmm/init.sh
cd /tmp
hostnm=$(cat hostname)
[[ -n $hostnm ]] || {
    echo "err"
    exit 1
}
echo $hostnm >/etc/hostname
hostname $hostnm
cat /tmp/hosts.tmp >/etc/hosts

Then run the initialization:

vminit etcd1
vminit etcd2
vminit etcd3
vminit kubem1
vminit kubem2
vminit node1
vminit node2

Each node's hostname is set to the VM's name, and the vminit script copies the local ssh public and private keys onto the node, so initialized nodes can ssh to each other without passwords. This is only acceptable for quickly standing up a test environment; never do this in production. Passwordless ssh is needed by the k8s deployment later.

Building the etcd cluster

Before deploying k8s we need a standalone etcd cluster for it to use. I deploy it with an ansible playbook shared on github (https://github.com/itxx00/ansible-etcd); follow the repo's readme to bring up the cluster.

Preparing the package and image repos

Put the downloaded k8s rpms and docker images under /data/k8s-deploy: the rpms directory holds the rpm packages, and images holds the docker images. The original scripts do not cover the k8s dashboard (a web management UI), so to deploy it I modified them to add a dashboard option; the dashboard needs some extra docker images and config files. First top up the docker images:

yum -y install docker
systemctl start docker
docker pull googlecontainer/kubernetes-dashboard-amd64:v1.6.1
docker pull googlecontainer/heapster-influxdb-amd64:v1.1.1
docker pull googlecontainer/heapster-grafana-amd64:v4.0.2
docker pull googlecontainer/heapster-amd64:v1.3.0
cd /data/k8s-deploy/images
docker save googlecontainer/kubernetes-dashboard-amd64 -o kubernetes-dashboard-amd64_v1.6.1.tar
docker save googlecontainer/heapster-influxdb-amd64 -o heapster-influxdb-amd64_v1.1.1.tar
docker save googlecontainer/heapster-grafana-amd64 -o heapster-grafana-amd64_v4.0.2.tar
docker save googlecontainer/heapster-amd64 -o heapster-amd64_v1.3.0.tar

The original scripts install the rpms by downloading them to each node and installing locally; I changed this to install via yum. Prepare the yum repository:

createrepo /data/k8s-deploy/rpms
yum -y install nginx
cat >/etc/nginx/conf.d/k8srepo.conf <<EOF
server {
    listen 8000;
    server_name _;
    root /data/k8s-deploy;
    autoindex on;

    location / {
        autoindex on;
    }
}
EOF
systemctl restart nginx

The yum repo and docker images are now ready.

Deploying the k8s cluster

The deployment follows https://github.com/5xops/k8s-deploy/blob/master/README.md : change the ip address in the repo's k8slocal.repo file to the yum repo's ip, and refer to master.sh, minion.sh, and replica.sh for deploying the master, minions, and replica. Note that extra dashboard config files were added to deploy the dashboard service; see this commit for the additions: https://github.com/5xops/k8s-deploy/commit/1766a675d76edb32f310acd98d5c6ed50a356e5b

With that, the k8s test environment is complete. I will use it to deploy other services and share more later.

nftables: notes on reading the nft man page

· 37 min read

I spent some spare time learning the basics of nftables. The official man page packs in a lot of information, so while reading I put together these annotated notes to help the material stick.

Name

nft -- packet filtering rule management tool

Synopsis

nft [ -n | --numeric ] [ -s | --stateless ] [ [-I | --includepath] directory ] [ [-f | --file] filename | [-i | --interactive] | cmd ]

nft [ -h | --help ] [ -v | --version ]

Description

nftables is the next-generation firewall framework, intended to replace the various earlier firewall tools such as iptables/ebtables, and it also offers tc-like rate limiting. nft is the command-line front end to nftables, the userspace management tool.

Options

Run nft --help for the full help text.

-h, --help

Show help.

-v, --version

Show the version.

-n, --numeric

Show data numerically. May be repeated: once suppresses hostname resolution, twice also port names, three times also protocol names and uid/gid.

-s, --stateless

Omit the state information of rules and stateful objects.

-N

Resolve IP addresses to names; depends on DNS resolution.

-a, --handle

Show rule handles in the output.

-I, --includepath directory

Add a directory to the include file search path.

-f, --file filename

Read input from a file.

-i, --interactive

Read input from an interactive CLI.

File format

Syntax

An over-long line may be continued with \; multiple commands on one line are separated by semicolons ;; comments start with the hash sign #; identifiers begin with a letter and may contain digits, letters, underscores, forward and backward slashes, and dots; double quotes mark a plain string.

Include

include "filename"

Other files can be imported into the current one with include; -I/--includepath sets the directory the imported files live in. If include is given a directory instead of a file, all files in the directory are imported in alphabetical order.

Symbolic variables

define variable = expr

$variable

Symbolic variables are defined using the define statement. Variable references are expressions and can be used to initialize other variables. The scope of a definition is the current block and all blocks contained within it.

Example 1. Using symbolic variables

define int_if1 = eth0
define int_if2 = eth1
define int_ifs = { $int_if1, $int_if2 }

filter input iif $int_ifs accept

Address families

Packets are grouped into address families by the kind of packet being processed. Each address family has its own processing paths and hook points at particular stages in the kernel; when rules exist for a hook, nftables processes packets there. The families are:

ip

IPv4 address family

ip6

IPv6 address family

inet

Internet (IPv4/IPv6) address family

arp

ARP address family

bridge

Bridge address family

netdev

Netdev address family

All nftables objects exist in address-family-specific namespaces; in other words, every identifier carries an address family. If none is specified, the ip family is used by default.

IPv4/IPv6/Inet address families

The IPv4/IPv6/Inet address families handle IPv4 and IPv6 packets and provide a total of 5 hooks at different packet-processing stages of the network stack.

Table 1. IPv4/IPv6/Inet address family hooks

Hook        | Description
prerouting  | All packets entering the system are processed by the prerouting hook. It is invoked before routing and is used for early filtering or for changing packet attributes that influence routing.
input       | Packets destined for the local system are processed by the input hook.
forward     | Packets forwarded to other hosts are processed by the forward hook.
output      | Packets sent by local processes are processed by the output hook.
postrouting | All packets leaving the system are processed by the postrouting hook.

ARP address family

The ARP address family handles ARP packets received and sent by the system. It is commonly used to mangle ARP packets in clustering setups.

Table 2. ARP address family hooks

Hook   | Description
input  | Packets delivered to the local machine pass the input hook.
output | Packets sent from the local machine pass the output hook.

Bridge address family

The bridge address family handles ethernet packets traversing bridge devices.

Netdev address family

The netdev address family handles packets coming in on ingress.

Table 3. Netdev address family hooks

Hook    | Description
ingress | All packets entering the system are processed by the ingress hook. It is invoked before layer 3.

Tables

{add | delete | list | flush} table [family] {table}

Tables are containers for chains, sets, and stateful objects. A table is identified by its address family and name. The family must be one of ip, ip6, arp, bridge, or netdev; inet is a pseudo-family used to create tables covering both IPv4 and IPv6. If no family is specified, ip is used by default.

add

Add a table with the given family and name.

delete

Delete the given table.

list

List all chains and rules of the given table.

flush

Flush all chains and rules of the given table.

Chains

{add} chain [family] {table} {chain} {hook} {priority} {policy} {device}

{add | create | delete | list | flush} chain [family] {table} {chain}

{rename} chain [family] {table} {chain} {newname}

Chains are containers for rules and come in two kinds: base chains and regular chains. A base chain is an entry point for packets from the network stack; a regular chain can be used as a jump target to organize rules better.

add

Add a new chain to the given table. When a hook and priority are specified, the chain is created as a base chain and hooked into the network stack.

create

Like add, but returns an error if the chain already exists.

delete

Delete the given chain. The chain must contain no rules and must not be used as a jump target.

rename

Rename the given chain.

list

List all rules of the given chain.

flush

Flush all rules of the given chain.

Rules

[add | insert] rule [family] {table} {chain} [position position] {statement...}

{delete} rule [family] {table} {chain} {handle handle}

Rules are constructed from two kinds of components according to a set of grammatical rules: expressions and statements.

add

Add a new rule described by the list of statements. The rule is appended to the given chain unless a position is specified, in which case the rule is appended to the rule given by the position.

insert

Similar to the add command, but the rule is prepended to the beginning of the chain or before the rule at the given position.

delete

Delete the specified rule.

Sets

{add} set [family] {table} {set}{ {type} [flags] [timeout] [gc-interval] [elements] [size] [policy]}

{delete | list | flush} set [family] {table} {set}

{add | delete} element [family] {table} {set}{ {elements}}

Sets are element containers of a user-defined data type; they are uniquely identified by a user-defined name and attached to tables.

add

Add a new set in the specified table.

delete

Delete the specified set.

list

Display the elements in the specified set.

flush

Remove all elements from the set.

add element

Comma-separated list of elements to add to the specified set.

delete element

Comma-separated list of elements to delete from the specified set.

Table 4. Set specifications

Keyword     | Description | Type
type        | data type of set elements | string: ipv4_addr, ipv6_addr, ether_addr, inet_proto, inet_service, mark
flags       | set flags | string: constant, interval, timeout
timeout     | time an element stays in the set | string, decimal followed by unit. Units are: d, h, m, s
gc-interval | garbage collection interval, only valid when timeout or the timeout flag is set | string, decimal followed by unit. Units are: d, h, m, s
elements    | elements contained by the set | set data type
size        | maximum number of elements in the set | unsigned integer (64 bit)
policy      | set policy | string: performance [default], memory
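To make the set commands above concrete, here is a small sketch in nft's own configuration syntax (the table name, set name, and drop rule are made up for illustration):

```
table ip filter {
    set blocklist {
        type ipv4_addr
        flags interval
        elements = { 10.0.0.0/8, 192.168.100.0/24 }
    }
    chain input {
        type filter hook input priority 0; policy accept;
        ip saddr @blocklist drop
    }
}
```

The equivalent runtime command form would be, e.g., nft add element ip filter blocklist '{ 203.0.113.7 }'.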

Maps

{add} map [family] {table} {map}{ {type} [flags] [elements] [size] [policy]}

{delete | list | flush} map [family] {table} {map}

{add | delete} element [family] {table} {map}{ {elements}}

Maps store data based on some specific key used as input, they are uniquely identified by an user-defined name and attached to tables.

add

Add a new map in the specified table.

delete

Delete the specified map.

list

Display the elements in the specified map.

flush

Remove all elements from the specified map.

add element

Comma-separated list of elements to add into the specified map.

delete element

Comma-separated list of element keys to delete from the specified map.

Table 5. Map specifications

Keyword  | Description | Type
type     | data type of map elements | string ':' string: ipv4_addr, ipv6_addr, ether_addr, inet_proto, inet_service, mark, counter, quota. Counter and quota can't be used as keys
flags    | map flags | string: constant, interval
elements | elements contained by the map | map data type
size     | maximum number of elements in the map | unsigned integer (64 bit)
policy   | map policy | string: performance [default], memory

Stateful objects

{add | delete | list | reset} type [family] {table} {object}

Stateful objects are attached to tables and are identified by an unique name. They group stateful information from rules, to reference them in rules the keywords "type name" are used e.g. "counter name".

add

Add a new stateful object in the specified table.

delete

Delete the specified object.

list

Display stateful information the object holds.

reset

List-and-reset stateful object.
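For instance, with a named counter object (a sketch; the table `inet filter`, its input chain and the object name `http_hits` are assumptions):

```
nft add counter inet filter http_hits
nft add rule inet filter input tcp dport 80 counter name "http_hits"

# display, or display-and-zero, the accumulated values
nft list counter inet filter http_hits
nft reset counter inet filter http_hits
```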

Ct

ct {helper} {type} {type} {protocol} {protocol} [l3proto] [family]

Ct helper is used to define connection tracking helpers that can then be used in combination with the "ct helper set" statement. type and protocol are mandatory, l3proto is derived from the table family by default, i.e. in the inet table the kernel will try to load both the ipv4 and ipv6 helper backends, if they are supported by the kernel.

Table 6. conntrack helper specifications

| Keyword | Description | Type |
|---|---|---|
| type | name of helper type | quoted string (e.g. "ftp") |
| protocol | layer 4 protocol of the helper | string (e.g. tcp) |
| l3proto | layer 3 protocol of the helper | address family (e.g. ip) |

Example 2. defining and assigning ftp helper

Unlike iptables, helper assignment needs to be performed after the conntrack lookup has completed, for example with the default 0 hook priority.

table inet myhelpers {
ct helper ftp-standard {
type "ftp" protocol tcp
}
chain prerouting {
type filter hook prerouting priority 0;
tcp dport 21 ct helper set "ftp-standard"
}
}

Counter

counter [packets bytes]

Table 7. Counter specifications

| Keyword | Description | Type |
|---|---|---|
| packets | initial count of packets | unsigned integer (64 bit) |
| bytes | initial count of bytes | unsigned integer (64 bit) |

Quota

quota [over | until] [used]

Table 8. Quota specifications

| Keyword | Description | Type |
|---|---|---|
| quota | quota limit, used as the quota name | Two arguments, unsigned integer (64 bit) and string: bytes, kbytes, mbytes. "over" and "until" go before these arguments |
| used | initial value of used quota | Two arguments, unsigned integer (64 bit) and string: bytes, kbytes, mbytes |
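A named quota can then be referenced from a rule with `quota name` (a sketch; the table and the object name `downloads` are illustrative):

```
nft add quota inet filter downloads '{ over 500 mbytes }'
nft add rule inet filter input quota name "downloads" drop
```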

Expressions

Expressions represent values, either constants like network addresses, port numbers etc. or data gathered from the packet during ruleset evaluation. Expressions can be combined using binary, logical, relational and other types of expressions to form complex or relational (match) expressions. They are also used as arguments to certain types of operations, like NAT, packet marking etc.

Each expression has a data type, which determines the size, parsing and representation of symbolic values and type compatibility with other expressions.

describe command

describe {expression}

The describe command shows information about the type of an expression and its data type.

Example 3. The describe command

$ nft describe tcp flags
payload expression, datatype tcp_flag (TCP flag) (basetype bitmask, integer), 8 bits

pre-defined symbolic constants:
fin 0x01
syn 0x02
rst 0x04
psh 0x08
ack 0x10
urg 0x20
ecn 0x40
cwr 0x80

Data types

Data types determine the size, parsing and representation of symbolic values and type compatibility of expressions. A number of global data types exist, in addition some expression types define further data types specific to the expression type. Most data types have a fixed size, some however may have a dynamic size, f.i. the string type.

Types may be derived from lower order types, f.i. the IPv4 address type is derived from the integer type, meaning an IPv4 address can also be specified as an integer value.

In certain contexts (set and map definitions) it is necessary to explicitly specify a data type. Each type has a name which is used for this.

Integer type

Table 9.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| Integer | integer | variable | - |

The integer type is used for numeric values. It may be specified as decimal, hexadecimal or octal number. The integer type doesn't have a fixed size, its size is determined by the expression for which it is used.

Bitmask type

Table 10.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| Bitmask | bitmask | variable | integer |

The bitmask type (bitmask) is used for bitmasks.

String type

Table 11.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| String | string | variable | - |

The string type is used for character strings. A string begins with an alphabetic character (a-zA-Z) followed by zero or more alphanumeric characters or the characters /, -, _ and .. In addition anything enclosed in double quotes (") is recognized as a string.

Example 4. String specification


# Interface name
filter input iifname eth0

# Weird interface name
filter input iifname "(eth0)"

Link layer address type

Table 12.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| Link layer address | lladdr | variable | integer |

The link layer address type is used for link layer addresses. Link layer addresses are specified as a variable amount of groups of two hexadecimal digits separated using colons (:).

Example 5. Link layer address specification


# Ethernet destination MAC address
filter input ether daddr 20:c9:d0:43:12:d9

IPv4 address type

Table 13.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| IPv4 address | ipv4_addr | 32 bit | integer |

The IPv4 address type is used for IPv4 addresses. Addresses are specified in either dotted decimal, dotted hexadecimal, dotted octal, decimal, hexadecimal, octal notation or as a host name. A host name will be resolved using the standard system resolver.

Example 6. IPv4 address specification

# dotted decimal notation
filter output ip daddr 127.0.0.1

# host name
filter output ip daddr localhost

IPv6 address type

Table 14.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| IPv6 address | ipv6_addr | 128 bit | integer |

The IPv6 address type is used for IPv6 addresses. FIXME

Example 7. IPv6 address specification

# abbreviated loopback address
filter output ip6 daddr ::1

Boolean type

Table 15.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| Boolean | boolean | 1 bit | integer |

The boolean type is a syntactical helper type in user space. Its use is in the right-hand side of a (typically implicit) relational expression to change the expression on the left-hand side into a boolean check (usually for existence).

The following keywords will automatically resolve into a boolean type with given value:

Table 16.

| Keyword | Value |
|---|---|
| exists | 1 |
| missing | 0 |

Example 8. Boolean specification

The following expressions support a boolean comparison:

Table 17.

| Expression | Behaviour |
|---|---|
| fib | Check route existence. |
| exthdr | Check IPv6 extension header existence. |
| tcp option | Check TCP option header existence. |
# match if route exists
filter input fib daddr . iif oif exists

# match only non-fragmented packets in IPv6 traffic
filter input exthdr frag missing

# match if TCP timestamp option is present
filter input tcp option timestamp exists

ICMP Type type

Table 18.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| ICMP Type | icmp_type | 8 bit | integer |

The ICMP Type type is used to conveniently specify the ICMP header's type field.

The following keywords may be used when specifying the ICMP type:

Table 19.

| Keyword | Value |
|---|---|
| echo-reply | 0 |
| destination-unreachable | 3 |
| source-quench | 4 |
| redirect | 5 |
| echo-request | 8 |
| router-advertisement | 9 |
| router-solicitation | 10 |
| time-exceeded | 11 |
| parameter-problem | 12 |
| timestamp-request | 13 |
| timestamp-reply | 14 |
| info-request | 15 |
| info-reply | 16 |
| address-mask-request | 17 |
| address-mask-reply | 18 |

Example 9. ICMP Type specification


# match ping packets
filter output icmp type { echo-request, echo-reply }

ICMPv6 Type type

Table 20.

| Name | Keyword | Size | Base type |
|---|---|---|---|
| ICMPv6 Type | icmpv6_type | 8 bit | integer |

The ICMPv6 Type type is used to conveniently specify the ICMPv6 header's type field.

The following keywords may be used when specifying the ICMPv6 type:

Table 21.

| Keyword | Value |
|---|---|
| destination-unreachable | 1 |
| packet-too-big | 2 |
| time-exceeded | 3 |
| parameter-problem | 4 |
| echo-request | 128 |
| echo-reply | 129 |
| mld-listener-query | 130 |
| mld-listener-report | 131 |
| mld-listener-done | 132 |
| mld-listener-reduction | 132 |
| nd-router-solicit | 133 |
| nd-router-advert | 134 |
| nd-neighbor-solicit | 135 |
| nd-neighbor-advert | 136 |
| nd-redirect | 137 |
| router-renumbering | 138 |
| ind-neighbor-solicit | 141 |
| ind-neighbor-advert | 142 |
| mld2-listener-report | 143 |

Example 10. ICMPv6 Type specification


# match ICMPv6 ping packets
filter output icmpv6 type { echo-request, echo-reply }

Primary expressions

The lowest order expression is a primary expression, representing either a constant or a single datum from a packet's payload, meta data or a stateful module.

Meta expressions

meta {length | nfproto | l4proto | protocol | priority}

[meta] {mark | iif | iifname | iiftype | oif | oifname | oiftype}

[meta] {skuid | skgid | nftrace | rtclassid | ibriport | obriport | pkttype | cpu | iifgroup | oifgroup | cgroup | random}

A meta expression refers to meta data associated with a packet.

There are two types of meta expressions: unqualified and qualified meta expressions. Qualified meta expressions require the meta keyword before the meta key, unqualified meta expressions can be specified by using the meta key directly or as qualified meta expressions.

Table 22. Meta expression types

| Keyword | Description | Type |
|---|---|---|
| length | Length of the packet in bytes | integer (32 bit) |
| protocol | Ethertype protocol value | ether_type |
| priority | TC packet priority | tc_handle |
| mark | Packet mark | mark |
| iif | Input interface index | iface_index |
| iifname | Input interface name | string |
| iiftype | Input interface type | iface_type |
| oif | Output interface index | iface_index |
| oifname | Output interface name | string |
| oiftype | Output interface hardware type | iface_type |
| skuid | UID associated with originating socket | uid |
| skgid | GID associated with originating socket | gid |
| rtclassid | Routing realm | realm |
| ibriport | Input bridge interface name | string |
| obriport | Output bridge interface name | string |
| pkttype | packet type | pkt_type |
| cpu | cpu number processing the packet | integer (32 bits) |
| iifgroup | incoming device group | devgroup |
| oifgroup | outgoing device group | devgroup |
| cgroup | control group id | integer (32 bits) |
| random | pseudo-random number | integer (32 bits) |

Table 23. Meta expression specific types

| Type | Description |
|---|---|
| iface_index | Interface index (32 bit number). Can be specified numerically or as name of an existing interface. |
| ifname | Interface name (16 byte string). Does not have to exist. |
| iface_type | Interface type (16 bit number). |
| uid | User ID (32 bit number). Can be specified numerically or as user name. |
| gid | Group ID (32 bit number). Can be specified numerically or as group name. |
| realm | Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms. |
| devgroup_type | Device group (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/group. |
| pkt_type | Packet type: Unicast (addressed to local host), Broadcast (to all), Multicast (to group). |

Example 11. Using meta expressions


# qualified meta expression
filter output meta oif eth0

# unqualified meta expression
filter output oif eth0

fib expressions

fib {saddr | daddr [mark | iif | oif]} {oif | oifname | type}

A fib expression queries the fib (forwarding information base) to obtain information such as the output interface index a particular address would use. The input is a tuple of elements that is used as input to the fib lookup functions.

Table 24. fib expression specific types

| Keyword | Description | Type |
|---|---|---|
| oif | Output interface index | integer (32 bit) |
| oifname | Output interface name | string |
| type | Address type | fib_addrtype |

Example 12. Using fib expressions


# drop packets without a reverse path
filter prerouting fib saddr . iif oif missing drop

# drop packets to address not configured on interface
filter prerouting fib daddr . iif type != { local, broadcast, multicast } drop

# perform lookup in a specific 'blackhole' table (0xdead, needs appropriate ip rule)
filter prerouting meta mark set 0xdead fib daddr . mark type vmap { blackhole : drop, prohibit : jump prohibited, unreachable : drop }

Routing expressions

rt {classid | nexthop}

A routing expression refers to routing data associated with a packet.

Table 25. Routing expression types

| Keyword | Description | Type |
|---|---|---|
| classid | Routing realm | realm |
| nexthop | Routing nexthop | ipv4_addr/ipv6_addr |

Table 26. Routing expression specific types

| Type | Description |
|---|---|
| realm | Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms. |

Example 13. Using routing expressions


# IP family independent rt expression
filter output rt classid 10

# IP family dependent rt expressions
ip filter output rt nexthop 192.168.0.1
ip6 filter output rt nexthop fd00::1
inet filter meta nfproto ipv4 output rt nexthop 192.168.0.1
inet filter meta nfproto ipv6 output rt nexthop fd00::1

Payload expressions

Payload expressions refer to data from the packet's payload.

Ethernet header expression

ether [ethernet header field]

Table 27. Ethernet header expression types

| Keyword | Description | Type |
|---|---|---|
| daddr | Destination MAC address | ether_addr |
| saddr | Source MAC address | ether_addr |
| type | EtherType | ether_type |

VLAN header expression

vlan [VLAN header field]

Table 28. VLAN header expression

| Keyword | Description | Type |
|---|---|---|
| id | VLAN ID (VID) | integer (12 bit) |
| cfi | Canonical Format Indicator | integer (1 bit) |
| pcp | Priority code point | integer (3 bit) |
| type | EtherType | ether_type |

ARP header expression

arp [ARP header field]

Table 29. ARP header expression

| Keyword | Description | Type |
|---|---|---|
| htype | ARP hardware type | integer (16 bit) |
| ptype | EtherType | ether_type |
| hlen | Hardware address len | integer (8 bit) |
| plen | Protocol address len | integer (8 bit) |
| operation | Operation | arp_op |

IPv4 header expression

ip [IPv4 header field]

Table 30. IPv4 header expression

| Keyword | Description | Type |
|---|---|---|
| version | IP header version (4) | integer (4 bit) |
| hdrlength | IP header length including options | integer (4 bit) FIXME scaling |
| dscp | Differentiated Services Code Point | dscp |
| ecn | Explicit Congestion Notification | ecn |
| length | Total packet length | integer (16 bit) |
| id | IP ID | integer (16 bit) |
| frag-off | Fragment offset | integer (16 bit) |
| ttl | Time to live | integer (8 bit) |
| protocol | Upper layer protocol | inet_proto |
| checksum | IP header checksum | integer (16 bit) |
| saddr | Source address | ipv4_addr |
| daddr | Destination address | ipv4_addr |

ICMP header expression

icmp [ICMP header field]

Table 31. ICMP header expression

| Keyword | Description | Type |
|---|---|---|
| type | ICMP type field | icmp_type |
| code | ICMP code field | integer (8 bit) |
| checksum | ICMP checksum field | integer (16 bit) |
| id | ID of echo request/response | integer (16 bit) |
| sequence | sequence number of echo request/response | integer (16 bit) |
| gateway | gateway of redirects | integer (32 bit) |
| mtu | MTU of path MTU discovery | integer (16 bit) |

IPv6 header expression

ip6 [IPv6 header field]

Table 32. IPv6 header expression

| Keyword | Description | Type |
|---|---|---|
| version | IP header version (6) | integer (4 bit) |
| dscp | Differentiated Services Code Point | dscp |
| ecn | Explicit Congestion Notification | ecn |
| flowlabel | Flow label | integer (20 bit) |
| length | Payload length | integer (16 bit) |
| nexthdr | Nexthdr protocol | inet_proto |
| hoplimit | Hop limit | integer (8 bit) |
| saddr | Source address | ipv6_addr |
| daddr | Destination address | ipv6_addr |

ICMPv6 header expression

icmpv6 [ICMPv6 header field]

Table 33. ICMPv6 header expression

| Keyword | Description | Type |
|---|---|---|
| type | ICMPv6 type field | icmpv6_type |
| code | ICMPv6 code field | integer (8 bit) |
| checksum | ICMPv6 checksum field | integer (16 bit) |
| parameter-problem | pointer to problem | integer (32 bit) |
| packet-too-big | oversized MTU | integer (32 bit) |
| id | ID of echo request/response | integer (16 bit) |
| sequence | sequence number of echo request/response | integer (16 bit) |
| max-delay | maximum response delay of MLD queries | integer (16 bit) |

TCP header expression

tcp [TCP header field]

Table 34. TCP header expression

| Keyword | Description | Type |
|---|---|---|
| sport | Source port | inet_service |
| dport | Destination port | inet_service |
| sequence | Sequence number | integer (32 bit) |
| ackseq | Acknowledgement number | integer (32 bit) |
| doff | Data offset | integer (4 bit) FIXME scaling |
| reserved | Reserved area | integer (4 bit) |
| flags | TCP flags | tcp_flag |
| window | Window | integer (16 bit) |
| checksum | Checksum | integer (16 bit) |
| urgptr | Urgent pointer | integer (16 bit) |

UDP header expression

udp [UDP header field]

Table 35. UDP header expression

| Keyword | Description | Type |
|---|---|---|
| sport | Source port | inet_service |
| dport | Destination port | inet_service |
| length | Total packet length | integer (16 bit) |
| checksum | Checksum | integer (16 bit) |

UDP-Lite header expression

udplite [UDP-Lite header field]

Table 36. UDP-Lite header expression

| Keyword | Description | Type |
|---|---|---|
| sport | Source port | inet_service |
| dport | Destination port | inet_service |
| checksum | Checksum | integer (16 bit) |

SCTP header expression

sctp [SCTP header field]

Table 37. SCTP header expression

| Keyword | Description | Type |
|---|---|---|
| sport | Source port | inet_service |
| dport | Destination port | inet_service |
| vtag | Verification Tag | integer (32 bit) |
| checksum | Checksum | integer (32 bit) |

DCCP header expression

dccp [DCCP header field]

Table 38. DCCP header expression

| Keyword | Description | Type |
|---|---|---|
| sport | Source port | inet_service |
| dport | Destination port | inet_service |

Authentication header expression

ah [AH header field]

Table 39. AH header expression

| Keyword | Description | Type |
|---|---|---|
| nexthdr | Next header protocol | inet_proto |
| hdrlength | AH Header length | integer (8 bit) |
| reserved | Reserved area | integer (16 bit) |
| spi | Security Parameter Index | integer (32 bit) |
| sequence | Sequence number | integer (32 bit) |

Encrypted security payload header expression

esp [ESP header field]

Table 40. ESP header expression

| Keyword | Description | Type |
|---|---|---|
| spi | Security Parameter Index | integer (32 bit) |
| sequence | Sequence number | integer (32 bit) |

IPcomp header expression

comp [IPComp header field]

Table 41. IPComp header expression

| Keyword | Description | Type |
|---|---|---|
| nexthdr | Next header protocol | inet_proto |
| flags | Flags | bitmask |
| cpi | Compression Parameter Index | integer (16 bit) |

Extension header expressions

Extension header expressions refer to data from variable-sized protocol headers, such as IPv6 extension headers and TCP options.

nftables currently supports matching (finding) a given ipv6 extension header or TCP option.

hbh {nexthdr | hdrlength}

frag {nexthdr | frag-off | more-fragments | id}

rt {nexthdr | hdrlength | type | seg-left}

dst {nexthdr | hdrlength}

mh {nexthdr | hdrlength | checksum | type}

tcp option {eol | noop | maxseg | window | sack-permitted | sack | sack0 | sack1 | sack2 | sack3 | timestamp} [_tcp_option_field_]

The following syntaxes are valid only in a relational expression with boolean type on right-hand side for checking header existence only:

exthdr {hbh | frag | rt | dst | mh}

tcp option {eol | noop | maxseg | window | sack-permitted | sack | sack0 | sack1 | sack2 | sack3 | timestamp}

Table 42. IPv6 extension headers

| Keyword | Description |
|---|---|
| hbh | Hop by Hop |
| rt | Routing Header |
| frag | Fragmentation header |
| dst | dst options |
| mh | Mobility Header |

Table 43. TCP Options

| Keyword | Description | TCP option fields |
|---|---|---|
| eol | End of option list | kind |
| noop | 1 Byte TCP No-op option | kind |
| maxseg | TCP Maximum Segment Size | kind, length, size |
| window | TCP Window Scaling | kind, length, count |
| sack-permitted | TCP SACK permitted | kind, length |
| sack | TCP Selective Acknowledgement (alias of block 0) | kind, length, left, right |
| sack0 | TCP Selective Acknowledgement (block 0) | kind, length, left, right |
| sack1 | TCP Selective Acknowledgement (block 1) | kind, length, left, right |
| sack2 | TCP Selective Acknowledgement (block 2) | kind, length, left, right |
| sack3 | TCP Selective Acknowledgement (block 3) | kind, length, left, right |
| timestamp | TCP Timestamps | kind, length, tsval, tsecr |

Example 14. finding TCP options

filter input tcp option sack-permitted kind 1 counter

Example 15. matching IPv6 exthdr

ip6 filter input frag more-fragments 1 counter

Conntrack expressions

Conntrack expressions refer to meta data of the connection tracking entry associated with a packet.

There are three types of conntrack expressions. Some conntrack expressions require the flow direction before the conntrack key, others must be used directly because they are direction agnostic. The packets, bytes and avgpkt keywords can be used with or without a direction. If the direction is omitted, the sum of the original and the reply direction is returned. The same is true for the zone, if a direction is given, the zone is only matched if the zone id is tied to the given direction.

ct {state | direction | status | mark | expiration | helper | label | l3proto | protocol | bytes | packets | avgpkt | zone}

ct {original | reply} {l3proto | protocol | saddr | daddr | proto-src | proto-dst | bytes | packets | avgpkt | zone}

Table 44. Conntrack expressions

| Keyword | Description | Type |
|---|---|---|
| state | State of the connection | ct_state |
| direction | Direction of the packet relative to the connection | ct_dir |
| status | Status of the connection | ct_status |
| mark | Connection mark | mark |
| expiration | Connection expiration time | time |
| helper | Helper associated with the connection | string |
| label | Connection tracking label bit or symbolic name defined in connlabel.conf in the nftables include path | ct_label |
| l3proto | Layer 3 protocol of the connection | nf_proto |
| saddr | Source address of the connection for the given direction | ipv4_addr/ipv6_addr |
| daddr | Destination address of the connection for the given direction | ipv4_addr/ipv6_addr |
| protocol | Layer 4 protocol of the connection for the given direction | inet_proto |
| proto-src | Layer 4 protocol source for the given direction | integer (16 bit) |
| proto-dst | Layer 4 protocol destination for the given direction | integer (16 bit) |
| packets | packet count seen in the given direction or sum of original and reply | integer (64 bit) |
| bytes | byte count seen, see description for packets keyword | integer (64 bit) |
| avgpkt | average bytes per packet, see description for packets keyword | integer (64 bit) |
| zone | conntrack zone | integer (16 bit) |
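The most common use of ct expressions is classic stateful filtering, for example:

```
# allow packets belonging or related to an established connection
filter input ct state established,related accept

# drop packets conntrack cannot classify
filter input ct state invalid drop
```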

Statements

Statements represent actions to be performed. They can alter control flow (return, jump to a different chain, accept or drop the packet) or can perform actions, such as logging, rejecting a packet, etc.

Statements exist in two kinds. Terminal statements unconditionally terminate evaluation of the current rule, non-terminal statements either only conditionally or never terminate evaluation of the current rule, in other words, they are passive from the ruleset evaluation perspective. There can be an arbitrary amount of non-terminal statements in a rule, but only a single terminal statement as the final statement.

Verdict statement

The verdict statement alters control flow in the ruleset and issues policy decisions for packets.

{accept | drop | queue | continue | return}

{jump | goto} {chain}

accept

Terminate ruleset evaluation and accept the packet.

drop

Terminate ruleset evaluation and drop the packet.

queue

Terminate ruleset evaluation and queue the packet to userspace.

continue

Continue ruleset evaluation with the next rule. FIXME

return

Return from the current chain and continue evaluation at the next rule in the last chain. If issued in a base chain, it is equivalent to accept.

jump chain

Continue evaluation at the first rule in chain. The current position in the ruleset is pushed to a call stack and evaluation will continue there when the new chain is entirely evaluated or a return verdict is issued.

goto chain

Similar to jump, but the current position is not pushed to the call stack, meaning that after the new chain evaluation will continue at the last chain instead of the one containing the goto statement.

Example 16. Verdict statements

# process packets from eth0 and the internal network in from_lan
# chain, drop all packets from eth0 with different source addresses.

filter input iif eth0 ip saddr 192.168.0.0/24 jump from_lan
filter input iif eth0 drop

Payload statement

The payload statement alters packet content. It can be used for example to set the IPv4 DSCP (diffserv) header field or IPv6 flow labels.

Example 17. route some packets instead of bridging

# redirect tcp:http from 192.168.0.0/16 to local machine for routing instead of bridging
# assumes 00:11:22:33:44:55 is local MAC address.
bridge input meta iif eth0 ip saddr 192.168.0.0/16 tcp dport 80 meta pkttype set unicast ether daddr set 00:11:22:33:44:55

Example 18. Set IPv4 DSCP header field

ip forward ip dscp set 42

Log statement

log [prefix _quoted_string_] [level _syslog-level_] [flags log-flags]

log [group _nflog_group_] [prefix _quoted_string_] [queue-threshold value] [snaplen size]

The log statement enables logging of matching packets. When this statement is used from a rule, the Linux kernel will print some information on all matching packets, such as header fields, via the kernel log (where it can be read with dmesg(1) or read in the syslog). If the group number is specified, the Linux kernel will pass the packet to nfnetlink_log which will multicast the packet through a netlink socket to the specified multicast group. One or more userspace processes may subscribe to the group to receive the packets, see libnetfilter_log documentation for details. This is a non-terminating statement, so the rule evaluation continues after the packet is logged.

Table 45. log statement options

| Keyword | Description | Type |
|---|---|---|
| prefix | Log message prefix | quoted string |
| syslog-level | Syslog level of logging | string: emerg, alert, crit, err, warn [default], notice, info, debug |
| group | NFLOG group to send messages to | unsigned integer (16 bit) |
| snaplen | Length of packet payload to include in netlink message | unsigned integer (32 bit) |
| queue-threshold | Number of packets to queue inside the kernel before sending them to userspace | unsigned integer (32 bit) |

Table 46. log-flags

| Flag | Description |
|---|---|
| tcp sequence | Log TCP sequence numbers. |
| tcp options | Log options from the TCP packet header. |
| ip options | Log options from the IP/IPv6 packet header. |
| skuid | Log the userid of the process which generated the packet. |
| ether | Decode MAC addresses and protocol. |
| all | Enable all log flags listed above. |

Example 19. Using log statement

# log the UID which generated the packet and ip options
ip filter output log flags skuid flags ip options

# log the tcp sequence numbers and tcp options from the TCP packet
ip filter output log flags tcp sequence,options

# enable all supported log flags
ip6 filter output log flags all

Reject statement

reject [with] {icmp | icmp6 | icmpx} [type] {icmp_type | icmp6_type | icmpx_type}

reject [with] {tcp} {reset}

A reject statement is used to send back an error packet in response to the matched packet; otherwise it is equivalent to drop, so it is a terminating statement, ending rule traversal. This statement is only valid in the input, forward and output chains, and in user-defined chains which are only called from those chains.

Table 47. reject statement type (ip)

| Value | Description | Type |
|---|---|---|
| icmp_type | ICMP type response to be sent to the host | net-unreachable, host-unreachable, prot-unreachable, port-unreachable [default], net-prohibited, host-prohibited, admin-prohibited |

Table 48. reject statement type (ip6)

| Value | Description | Type |
|---|---|---|
| icmp6_type | ICMPv6 type response to be sent to the host | no-route, admin-prohibited, addr-unreachable, port-unreachable [default], policy-fail, reject-route |

Table 49. reject statement type (inet)

| Value | Description | Type |
|---|---|---|
| icmpx_type | ICMPvX type abstraction response to be sent to the host; this is a set of types that overlap in IPv4 and IPv6, to be used from the inet family. | port-unreachable [default], admin-prohibited, no-route, host-unreachable |
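For example, in an inet family table the icmpx abstraction keeps one rule working for both IPv4 and IPv6 (a sketch; chain names are assumptions):

```
# answer ident probes quickly with a TCP reset
inet filter input tcp dport 113 reject with tcp reset

# family-neutral ICMP rejection for everything else
inet filter input reject with icmpx type admin-prohibited
```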

Counter statement

A counter statement sets the hit count of packets along with the number of bytes.

counter {packets _number_ } {bytes _number_ }
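An anonymous counter is simply written into the rule; the optional packets/bytes arguments preset its values (a sketch; the inet filter table is an assumption):

```
# count matching packets, starting from zero
add rule inet filter input tcp dport 443 counter accept

# preset the counters explicitly
add rule inet filter input iif lo counter packets 0 bytes 0 accept
```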

Conntrack statement

The conntrack statement can be used to set the conntrack mark and conntrack labels.

ct {mark | eventmask | label | zone} [set] value

The ct statement sets meta data associated with a connection. The zone id has to be assigned before a conntrack lookup takes place, i.e. this has to be done in prerouting and possibly output (if locally generated packets need to be placed in a distinct zone), with a hook priority of -300.

Table 50. Conntrack statement types

| Keyword | Description | Value |
|---|---|---|
| eventmask | conntrack event bits | bitmask, integer (32 bit) |
| helper | name of ct helper object to assign to the connection | quoted string |
| mark | Connection tracking mark | mark |
| label | Connection tracking label | label |
| zone | conntrack zone | integer (16 bit) |

Example 20. save packet nfmark in conntrack

ct mark set meta mark

Example 21. set zone mapped via interface

table inet raw {
chain prerouting {
type filter hook prerouting priority -300;
ct zone set iif map { "eth1" : 1, "veth1" : 2 }
}
chain output {
type filter hook output priority -300;
ct zone set oif map { "eth1" : 1, "veth1" : 2 }
}
}

Example 22. restrict events reported by ctnetlink

ct eventmask set new or related or destroy

Meta statement

A meta statement sets the value of a meta expression. The existing meta fields are: priority, mark, pkttype, nftrace.

meta {mark | priority | pkttype | nftrace} [set] value

A meta statement sets meta data associated with a packet.

Table 51. Meta statement types

| Keyword | Description | Value |
|---|---|---|
| priority | TC packet priority | tc_handle |
| mark | Packet mark | mark |
| pkttype | packet type | pkt_type |
| nftrace | ruleset packet tracing on/off. Use monitor trace command to watch traces | 0, 1 |
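For example, nftrace can be enabled for selected packets and the resulting trace watched with `nft monitor trace` (a sketch; the chain layout is an assumption):

```
add rule inet filter input tcp dport 22 meta nftrace set 1
```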

Limit statement

limit [rate] [over] _packet_number_ [/] {second | minute | hour | day} [burst _packet_number_ packets]

limit [rate] [over] _byte_number_ {bytes | kbytes | mbytes} [/] {second | minute | hour | day | week} [burst _byte_number_ bytes]

A limit statement matches at a limited rate using a token bucket filter. A rule using this statement will match until this limit is reached. It can be used in combination with the log statement to give limited logging. The over keyword, that is optional, makes it match over the specified rate.

Table 52. limit statement values

| Value | Description | Type |
|---|---|---|
| packet_number | Number of packets | unsigned integer (32 bit) |
| byte_number | Number of bytes | unsigned integer (32 bit) |
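Combined with other statements this yields rate limiting, e.g. limited logging or flood protection (a sketch; chain names are assumptions):

```
# log at most 3 new SSH connections per minute
filter input tcp dport 22 ct state new limit rate 3/minute log prefix "ssh: "

# drop pings arriving faster than 10 per second
filter input icmp type echo-request limit rate over 10/second drop
```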

NAT statements

snat [to _address_ [:port]] [persistent, random, fully-random]

snat [to _address_ - _address_ [:_port_ - _port_]] [persistent, random, fully-random]

dnat [to _address_ [:_port_]] [persistent, random, fully-random]

dnat [to _address_ [:_port_ - _port_]] [persistent, random, fully-random]

masquerade [to [:_port_]] [persistent, random, fully-random]

masquerade [to [:_port_ - _port_]] [persistent, random, fully-random]

redirect [to [:_port_]] [persistent, random, fully-random]

redirect [to [:_port_ - _port_]] [persistent, random, fully-random]

The nat statements are only valid from nat chain types.

The snat and masquerade statements specify that the source address of the packet should be modified. While snat is only valid in the postrouting and input chains, masquerade makes sense only in postrouting. The dnat and redirect statements are only valid in the prerouting and output chains, they specify that the destination address of the packet should be modified. You can use non-base chains which are called from base chains of nat chain type too. All future packets in this connection will also be mangled, and rules should cease being examined.

The masquerade statement is a special form of snat which always uses the outgoing interface's IP address to translate to. It is particularly useful on gateways with dynamic (public) IP addresses.

The redirect statement is a special form of dnat which always translates the destination address to the local host's one. It comes in handy if one only wants to alter the destination port of incoming traffic on different interfaces.

Note that all nat statements require both prerouting and postrouting base chains to be present since otherwise packets on the return path won't be seen by netfilter and therefore no reverse translation will take place.

Table 53. NAT statement values

| Expression | Description | Type |
|---|---|---|
| address | Specifies that the source/destination address of the packet should be modified. You may specify a mapping to relate a list of tuples composed of arbitrary expression key with address value. | ipv4_addr, ipv6_addr, e.g. abcd::1234, or you can use a mapping, e.g. meta mark map { 10 : 192.168.1.2, 20 : 192.168.1.3 } |
| port | Specifies that the source/destination port of the packet should be modified. | port number (16 bits) |

Table 54. NAT statement flags

| Flag | Description |
|---|---|
| persistent | Gives a client the same source-/destination-address for each connection. |
| random | If used then port mapping will be randomized using a random seeded MD5 hash mix using source and destination address and destination port. |
| fully-random | If used then port mapping is generated based on a 32-bit pseudo-random algorithm. |

Example 23. Using NAT statements

# create a suitable table/chain setup for all further examples
add table nat
add chain nat prerouting { type nat hook prerouting priority 0; }
add chain nat postrouting { type nat hook postrouting priority 100; }

# translate source addresses of all packets leaving via eth0 to address 1.2.3.4
add rule nat postrouting oif eth0 snat to 1.2.3.4

# redirect all traffic entering via eth0 to destination address 192.168.1.120
add rule nat prerouting iif eth0 dnat to 192.168.1.120

# translate source addresses of all packets leaving via eth0 to whatever
# locally generated packets would use as source to reach the same destination
add rule nat postrouting oif eth0 masquerade

# redirect incoming TCP traffic for port 22 to port 2222
add rule nat prerouting tcp dport 22 redirect to :2222

Queue statement

This statement passes the packet to userspace using the nfnetlink_queue handler. The packet is put into the queue identified by its 16-bit queue number. Userspace can inspect and modify the packet if desired. Userspace must then drop or reinject the packet into the kernel. See libnetfilter_queue documentation for details.

queue [num queue_number] [bypass]

queue [num queue_number_from - queue_number_to] [bypass,fanout]

Table 55. queue statement values

| Value | Description | Type |
| --- | --- | --- |
| queue_number | Sets the queue number; default is 0. | unsigned integer (16 bit) |
| queue_number_from | Sets the initial queue in the range, if fanout is used. | unsigned integer (16 bit) |
| queue_number_to | Sets the closing queue in the range, if fanout is used. | unsigned integer (16 bit) |

Table 56. queue statement flags

| Flag | Description |
| --- | --- |
| bypass | Let packets go through if the userspace application cannot back off. Before using this flag, read the libnetfilter_queue documentation for performance tuning recommendations. |
| fanout | Distribute packets between several queues. |
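Putting the values and flags together, a hypothetical sketch (it assumes a filter table with an input chain already exists; queue numbers are arbitrary):

```nft
# send incoming TCP port 80 packets to queue 0; if no userspace
# program is listening on the queue, let the packets pass (bypass)
add rule filter input tcp dport 80 queue num 0 bypass

# distribute incoming UDP packets across queues 0 through 3
add rule filter input ip protocol udp queue num 0-3 fanout
```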

Additional commands

These are some additional commands included in nft.

export

Export your current ruleset in XML or JSON format to stdout.

Examples:

[...]
% nft export json
[...]

monitor

The monitor command allows you to listen to Netlink events produced by the nf_tables subsystem, related to the creation and deletion of objects. When they occur, nft will print the monitored events to stdout in either XML, JSON or native nft format.

To filter events related to a concrete object, use one of the keywords 'tables', 'chains', 'sets', 'rules', 'elements'.

To filter events related to a concrete action, use keyword 'new' or 'destroy'.

Hit ^C to finish the monitor operation.

Example 24. Listen to all events, report in native nft format

% nft monitor

Example 25. Listen to added tables, report in XML format

% nft monitor new tables xml

Example 26. Listen to deleted rules, report in JSON format

% nft monitor destroy rules json

Example 27. Listen to both new and destroyed chains, in native nft format

% nft monitor chains

Error reporting

When an error is detected, nft shows the line(s) containing the error, the position of the erroneous parts in the input stream, and marks up the erroneous parts using carets (^). If the error results from the combination of two expressions or statements, the part imposing the constraints which are violated is marked using tildes (~).

For errors returned by the kernel, nft can't detect which parts of the input caused the error and the entire command is marked.

Example 28. Error caused by a single incorrect expression

<cmdline>:1:19-22: Error: Interface does not exist
filter output oif eth0
^^^^

Example 29. Error caused by an invalid combination of two expressions

<cmdline>:1:28-36: Error: Right hand side of relational expression (==) must be constant
filter output tcp dport == tcp dport
~~ ^^^^^^^^^

Example 30. Error returned by the kernel

<cmdline>:0:0-23: Error: Could not process rule: Operation not permitted
filter output oif wlan0
^^^^^^^^^^^^^^^^^^^^^^^

Exit status

On success, nft exits with a status of 0. Unspecified errors cause it to exit with a status of 1, memory allocation errors with a status of 2, and inability to open the Netlink socket with a status of 3.
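In scripts, these codes can be inspected via `$?` after running nft. A minimal sketch (the `nft_status_msg` helper and the config path are hypothetical, not part of nft itself):

```shell
# Map nft exit codes (from the section above) to readable messages.
nft_status_msg() {
    case "$1" in
        0) echo "success" ;;
        1) echo "unspecified error" ;;
        2) echo "memory allocation error" ;;
        3) echo "unable to open Netlink socket" ;;
        *) echo "unknown exit status: $1" ;;
    esac
}

# Typical use after loading a ruleset (path is an example):
#   nft -f /etc/nftables.conf
#   nft_status_msg "$?"
nft_status_msg 0   # prints "success"
```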

See Also

iptables(8), ip6tables(8), arptables(8), ebtables(8), ip(8), tc(8)

There is an official wiki at: wiki.nftables.org

Authors

nftables was written by Patrick McHardy and Pablo Neira Ayuso, among many other contributors from the Netfilter community.

nftables is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

This documentation is licensed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license, CC BY-SA 4.0.