thrift compact protocol
Thrift compact protocol encoding
Note that fbthrift is no longer fully compatible with the original Apache Thrift encoding; it has added some extensions.
- Integer
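An integer is first zigzag-transformed and then encoded as a varint (unsigned LEB128). To walk through the encoding, take the value 3600 * 24 * 1000 = 86400000 as an example.
- Zigzag transform
The zigzag transform maps both positive and negative signed integers into the unsigned integer space. The point is to avoid the poor efficiency of varint encoding for negative numbers: encoding the int32 value -1 directly as a varint yields ff ff ff ff 0f, which is wasteful for a value as common as -1. Zigzag therefore maps the signed values 0, -1, 1, -2, 2... to the unsigned values 0, 1, 2, 3, 4... in that order.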
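// Signed -> unsigned: the sign bit moves to the lowest bit.
constexpr inline uint32_t i32ToZigzag(const int32_t n) {
  return (static_cast<uint32_t>(n) << 1) ^ static_cast<uint32_t>(n >> 31);
}
// Unsigned -> signed: an odd input decodes to a negative value.
int32_t zigzagToI32(uint32_t n) {
  return (n & 1) ? ~(n >> 1) : (n >> 1);
}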
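- Then varint-encode the zigzag result
- Write the value as an unsigned binary number, most significant bit first
- Split it into groups of 7 bits; if the most significant group has fewer than 7 bits, pad it with leading zeros
- Prefix each 7-bit group with one extra bit: 0 for the most significant group, 1 for all the others (this is what the varint format requires)
- Reverse the byte order so the least significant group comes first (also required by the varint format); the result is the varint encoding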
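50399 = 1100010011011111 (binary, most significant bit first)
= 11 0001001 1011111 (split into 7-bit groups)
= 0000011 0001001 1011111 (the first group is shorter than 7 bits, so pad it with zeros)
= 00000011 10001001 11011111 (prefix the first group with 0 and the others with 1)
= 11011111 10001001 00000011 (reverse the byte order)
= 0xDF 0x89 0x03
172800000 = 1010010011001011100000000000 (this is the zigzag form of 86400000)
= 1010010 0110010 1110000 0000000 (split into 7-bit groups; the length is already a multiple of 7, so no padding is needed)
= 01010010 10110010 11110000 10000000 (prefix the first group with 0 and the others with 1)
= 10000000 11110000 10110010 01010010 (reverse the byte order)
= 0x80 0xF0 0xB2 0x52 (final result)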
For details, see LEB128 - Wikipedia.
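The steps above can be sketched in a few lines of C++. This is only a minimal illustration, not fbthrift's actual writer: the names `zigzag32` and `encodeVarint32` are made up for this post, and the loop emits the low 7 bits first, which has the same effect as the "split, prefix, reverse" procedure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Zigzag-encode a signed 32-bit value (same formula as i32ToZigzag above).
inline uint32_t zigzag32(int32_t n) {
  return (static_cast<uint32_t>(n) << 1) ^ static_cast<uint32_t>(n >> 31);
}

// Encode an unsigned 32-bit value as an unsigned LEB128 varint.
// The low 7 bits are written first; every byte except the last one
// has its top (continuation) bit set.
inline std::vector<uint8_t> encodeVarint32(uint32_t value) {
  std::vector<uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
  assert(out.size() <= 5);  // a 32-bit value never needs more than 5 bytes
  return out;
}
```

With these helpers, `encodeVarint32(50399)` yields DF 89 03 and `encodeVarint32(zigzag32(86400000))` yields 80 F0 B2 52, matching the two worked examples.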
- Enum
- Binary
Binary protocol, binary data, 1+ bytes:
| byte length | bytes |
- `byte length` is the length of the byte array, encoded as a var int (must be >= 0).
- String
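A string is first converted to UTF-8 and then encoded as Binary, without a null terminator (which is why our code discourages using string: the UTF-8 conversion is unnecessary).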
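A minimal sketch of the Binary layout (varint length followed by the raw bytes). The function name `encodeBinary` is hypothetical, and the input is assumed to already hold the bytes to send, so no UTF-8 conversion happens here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode a byte string as varint(length) followed by the raw bytes.
// No null terminator is written.
inline std::vector<uint8_t> encodeBinary(const std::string& data) {
  assert(data.size() <= 0x7FFFFFFF);  // the length must fit a non-negative int32
  std::vector<uint8_t> out;
  uint32_t len = static_cast<uint32_t>(data.size());
  while (len >= 0x80) {
    out.push_back(static_cast<uint8_t>((len & 0x7F) | 0x80));
    len >>= 7;
  }
  out.push_back(static_cast<uint8_t>(len));
  out.insert(out.end(), data.begin(), data.end());
  return out;
}
```

For example, `encodeBinary("doodle")` produces 06 64 6f 6f 64 6c 65: one length byte and six payload bytes.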
- Double
Values of type double are first converted to an int64 according to the IEEE 754 floating-point “double format” bit layout.
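Quoted from thrift/ at master · apache/thrift. What I find puzzling is why the value is first converted to an int64 and only then encoded as a floating-point number.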
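One way to read the quoted sentence is that the "conversion" is not a numeric cast but a reinterpretation of the 8 bytes of the double as a 64-bit integer, which is then written out as fixed-width bytes. A minimal sketch under that reading follows. The names `doubleToBits` and `encodeDoubleBE` are hypothetical, and the big-endian byte order is an assumption of this sketch: the byte order of doubles is a point where implementations may differ, so check the implementation you actually target.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the IEEE 754 bit pattern of a double as a 64-bit integer.
// This copies bits; it does not perform a numeric conversion.
inline uint64_t doubleToBits(double d) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Serialize the bit pattern as 8 fixed bytes, most significant byte first.
// ASSUMPTION: big-endian order is chosen for illustration only.
inline std::array<uint8_t, 8> encodeDoubleBE(double d) {
  const uint64_t bits = doubleToBits(d);
  std::array<uint8_t, 8> out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>(bits >> (56 - 8 * i));
  }
  return out;
}
```

For instance, 1.0 has the bit pattern 0x3FF0000000000000, so the sketch emits 3F F0 00 00 00 00 00 00.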
- Message
Compact protocol Message (4+ bytes):
|pppppppp|mmmvvvvv| seq id | name length | name |
- `pppppppp` is the protocol id, fixed to `1000 0010`, 0x82.
- `mmm` is the message type, an unsigned 3 bit integer.
- `vvvvv` is the version, an unsigned 5 bit integer, fixed to `00001`.
- `seq id` is the sequence id, a signed 32 bit integer encoded as a var int.
- `name length` is the byte length of the name field, a signed 32 bit integer encoded as a var int (must be >= 0).
- `name` is the method name to invoke, a UTF-8 encoded string.
Message types are encoded with the following values:
- Call: 1
- Reply: 2
- Exception: 3
- Oneway: 4
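The message header layout above can be sketched as follows. `MessageType`, `appendVarint` and `encodeMessageHeader` are names invented for this illustration; the sequence id is written as a plain varint (no zigzag), following the field description above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class MessageType : uint8_t { Call = 1, Reply = 2, Exception = 3, Oneway = 4 };

// Append an unsigned LEB128 varint to `out`.
inline void appendVarint(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Build |pppppppp|mmmvvvvv| seq id | name length | name |.
inline std::vector<uint8_t> encodeMessageHeader(
    MessageType type, int32_t seqId, const std::string& name) {
  constexpr uint8_t kProtocolId = 0x82;  // pppppppp
  constexpr uint8_t kVersion = 0x01;     // vvvvv
  assert(name.size() <= 0x7FFFFFFF);     // name length must be a non-negative int32
  std::vector<uint8_t> out;
  out.push_back(kProtocolId);
  // mmm occupies the top 3 bits, the version the low 5 bits.
  out.push_back(static_cast<uint8_t>((static_cast<uint8_t>(type) << 5) | kVersion));
  appendVarint(out, static_cast<uint32_t>(seqId));
  appendVarint(out, static_cast<uint32_t>(name.size()));
  out.insert(out.end(), name.begin(), name.end());
  return out;
}
```

A Call with sequence id 1 and method name "ping" comes out as 82 21 01 04 70 69 6e 67.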
- Struct
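A struct contains zero or more fields, and its end is marked by a terminator (the stop-field, encoded as 0x00). Each field uses the encoding of its own type. Unions and exceptions are encoded the same way as structs. The BNF for a struct is: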
struct ::= ( field-header field-value )* stop-field
field-header ::= field-type field-id
struct TestStruct {
  1: i64 xxx,
  2: binary yyy,
  3: optional list<bool> zzz,
}
Compact protocol field header (short form) and field value:
|ddddtttt| field value |
Compact protocol field header (1 to 3 bytes, long form) and field value:
|0000tttt| field id | field value |
Compact protocol stop field:
|00000000|
- `dddd` is the field id delta, an unsigned 4 bits integer, strictly positive.
- `tttt` is field-type id, an unsigned 4 bit integer.
- `field id` the field id, a signed 16 bit integer encoded as zigzag int.
- `field-value` the encoded field value.
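Each field can be encoded in one of two forms. The main reason is that fields are not required to be written in FieldId order (generated code may well write them in FieldId order, but that is not mandatory), so two forms exist to support fields written out of order. If the current FieldId is greater than the previously written FieldId and the difference is at most 15, the short form can be used; this is why the short form delta-encodes the FieldId. Otherwise the long form is used.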
// BOOLEAN_TRUE, encoded as 1
// BOOLEAN_FALSE, encoded as 2
// I8, encoded as 3
// I16, encoded as 4
// I32, encoded as 5
// I64, encoded as 6
// DOUBLE, encoded as 7
// BINARY, used for binary and string fields, encoded as 8
// LIST, encoded as 9
// SET, encoded as 10
// MAP, encoded as 11
// STRUCT, used for both structs and union fields, encoded as 12
// FLOAT, encoded as 13 (supported only by fbthrift)
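To make the two header forms concrete from the reader's side, here is a minimal decoding sketch. `FieldHeader` and `readFieldHeader` are hypothetical names, and the function only handles the header itself, not the field value that follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FieldHeader {
  uint8_t type;      // compact type id; 0 means stop field
  int16_t id;        // absolute field id
  std::size_t size;  // number of header bytes consumed
};

// Parse the field header at the start of `bytes`, given the previous field id.
inline FieldHeader readFieldHeader(const std::vector<uint8_t>& bytes, int16_t previousId) {
  assert(!bytes.empty());
  const uint8_t first = bytes[0];
  if (first == 0x00) {
    return {0, 0, 1};  // stop field
  }
  const uint8_t type = first & 0x0F;  // tttt
  const uint8_t delta = first >> 4;   // dddd
  if (delta != 0) {
    // Short form: the id is the previous id plus the delta.
    return {type, static_cast<int16_t>(previousId + delta), 1};
  }
  // Long form: the field id follows as a zigzag varint.
  uint32_t zz = 0;
  std::size_t pos = 1;
  for (int shift = 0; shift < 21; shift += 7) {
    assert(pos < bytes.size());  // truncated input is a caller error in this sketch
    const uint8_t b = bytes[pos++];
    zz |= static_cast<uint32_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) break;
  }
  const int32_t id = static_cast<int32_t>(zz >> 1) ^ -static_cast<int32_t>(zz & 1);
  return {type, static_cast<int16_t>(id), pos};
}
```

For example, the byte 0x25 read after field 3 decodes to field id 5 with type 5 (I32), while the two bytes 05 28 decode to field id 20 in long form.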
- List and Set
Compact protocol list header (1 byte, short form) and elements:
|sssstttt| elements |
Compact protocol list header (2+ bytes, long form) and elements:
|1111tttt| size | elements |
- `ssss` is the size, 4 bit unsigned int, values `0` - `14` (only sizes in [0, 14] fit here; 15 is taken by the long form)
- `tttt` is the element-type, a 4 bit unsigned int (basically the same ids as the field types above, except that BOOL is always encoded as 2)
- `size` is the size, a var int (int32), positive values `15` or higher
- `elements` are the encoded elements
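The choice between the short and long list header can be sketched like this. `encodeListHeader` is a hypothetical name, and only the header is produced, not the elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a list/set header: |sssstttt| for sizes 0-14,
// otherwise |1111tttt| followed by the size as a varint.
inline std::vector<uint8_t> encodeListHeader(uint8_t elemType, uint32_t size) {
  assert(elemType <= 0x0F);  // the element type must fit in 4 bits
  std::vector<uint8_t> out;
  if (size <= 14) {
    out.push_back(static_cast<uint8_t>((size << 4) | elemType));
  } else {
    out.push_back(static_cast<uint8_t>(0xF0 | elemType));
    while (size >= 0x80) {
      out.push_back(static_cast<uint8_t>((size & 0x7F) | 0x80));
      size >>= 7;
    }
    out.push_back(static_cast<uint8_t>(size));
  }
  return out;
}
```

A list of 3 I32 elements gets the single header byte 0x35, while a list of 20 binary elements gets F8 14.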
- Map
Compact protocol map header (1 byte, empty map):
|00000000|
Compact protocol map header (2+ bytes, non empty map) and key value pairs:
| size |kkkkvvvv| key value pairs |
- `size` is the size, a var int (int32), strictly positive values
- `kkkk` is the key element-type, a 4 bit unsigned int
- `vvvv` is the value element-type, a 4 bit unsigned int
- `key value pairs` are the encoded keys and values
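A minimal sketch of the map header follows. `encodeMapHeader` is a hypothetical name; the one special case is the empty map, which collapses to a single zero byte with no type byte at all.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a map header: 0x00 for an empty map, otherwise
// varint(size) followed by |kkkkvvvv|.
inline std::vector<uint8_t> encodeMapHeader(uint8_t keyType, uint8_t valueType, uint32_t size) {
  assert(keyType <= 0x0F && valueType <= 0x0F);  // both types must fit in 4 bits
  std::vector<uint8_t> out;
  if (size == 0) {
    out.push_back(0x00);
    return out;
  }
  while (size >= 0x80) {
    out.push_back(static_cast<uint8_t>((size & 0x7F) | 0x80));
    size >>= 7;
  }
  out.push_back(static_cast<uint8_t>(size));
  out.push_back(static_cast<uint8_t>((keyType << 4) | valueType));
  return out;
}
```

A map<string, i32> with two entries therefore starts with 02 85, and the same map when empty is just 00.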
Show me the code
service TestService {
  string sendResponse(1: string str);
}

TEST_F(RocketClientChannelTest, SyncThread) {
  folly::EventBase evb;
  auto client = makeClient(evb);
  std::string response;
  client.sync_sendResponse(response, "doodle");
  EXPECT_EQ("doodle", response);
}
00000000 00 00 65 00 00 00 00 05 00 00 01 00 00 7f ff ff |..e.............|
00000010 ff 7f ff ff ff 0a 74 65 78 74 2f 70 6c 61 69 6e |......text/plain|
00000020 0a 74 65 78 74 2f 70 6c 61 69 6e 00 00 3a f0 9f |.text/plain..:..|
00000030 9a 80 35 0c 15 10 5c 18 17 52 6f 63 6b 65 74 43 |..5...\..RocketC|
00000040 6c 69 65 6e 74 43 68 61 6e 6e 65 6c 2e 63 70 70 |lientChannel.cpp|
00000050 18 14 76 65 73 6f 66 74 2d 31 39 32 2d 31 36 38 |..vesoft-192-168|
00000060 2d 38 2d 32 31 31 00 00 00 00 2a 00 00 00 01 11 |-8-211....*.....|
00000070 00 00 00 18 15 04 18 0c 73 65 6e 64 52 65 73 70 |........sendResp|
00000080 6f 6e 73 65 15 00 25 80 f0 b2 52 00 18 06 64 6f |onse..%...R...do|
00000090 6f 64 6c 65 00 00 00 00 00 00 00 00 00 00 00 00 |odle............|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
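It is easy to see that the dump contains something resembling an HTTP header; that part belongs to a SetupFrame. The bytes relevant to the request itself are: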
00000070 00 00 00 18 15 04 18 0c 73 65 6e 64 52 65 73 70 |........sendResp|
00000080 6f 6e 73 65 15 00 25 80 f0 b2 52 00 18 06 64 6f |onse..%...R...do|
00000090 6f 64 6c 65 00 00 00 00 00 00 00 00 00 00 00 00 |odle............|
-> RequestChannel::sendRequestAsync
-> RocketClientChannel::sendRequestResponse
-> RocketClientChannel::sendThriftRequest
-> apache::thrift::detail::makeRequestRpcMetadata (prepares the metadata)
-> RocketClientChannel::sendSingleRequestSingleResponse (prepares the RPC request)
-> apache::thrift::rocket::pack (code in thrift/lib/cpp2/transport/rocket/PayloadUtils.h)
-> apache::thrift::rocket::makePayload (does the actual packing)
-> RequestRpcMetadata::write(CompactProtocolWriter*) (finally calls the ProtocolWriter to write the metadata)
| Field | Purpose | Type | isSet |
|---|---|---|---|
| __fbthrift_field_protocol | Encoding protocol; RocketClientChannel uses apache::thrift::ProtocolId::COMPACT by default | int32 | Y |
| __fbthrift_field_name | RPC method name | string (size + data) | Y |
| __fbthrift_field_kind | RPC kind; apache::thrift::RpcKind::SINGLE_REQUEST_SINGLE_RESPONSE in the code above | int32 | Y |
| __fbthrift_field_clientTimeoutMs | Timeout set by the client, 60s by default | int32 | Y |
| __fbthrift_field_queueTimeoutMs | Queue timeout, 0 by default (no timeout) | | N |
| __fbthrift_field_priority | RPC priority (can be defined in the Thrift interface) | | N |
| __fbthrift_field_otherMetadata | Remaining metadata, stored as map<string, string> | | N |
| __fbthrift_field_crc32c | CRC32, not filled in by default | | N |
| __fbthrift_field_loadMetric | Purpose not investigated | | N |
| __fbthrift_field_compression | Compression algorithm (NONE) | | N |
| __fbthrift_field_compressionConfig | Compression configuration | | N |
| __fbthrift_field_interactionId | Purpose not investigated | | N |
| __fbthrift_field_interactionCreate | Purpose not investigated | | N |
| __fbthrift_field_serviceTraceMeta | Purpose not investigated | | N |
| frameworkMetadata | Framework-level metadata | | N |
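Only the four fields marked Y are set. Following the short-form struct encoding described above, let's look in detail at how the metadata (which is itself a struct) is encoded: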
Compact protocol field header (short form) and field value:
|ddddtttt| field value |
- `dddd` is the field id delta, an unsigned 4 bits integer, strictly positive.
- `tttt` is field-type id, an unsigned 4 bit integer.
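The FieldIds of these four fields are 1, 2, 3 and 5 (I don't know why 4 is skipped; that is simply how the generated code has it), and their types are Int32, Binary, Int32 and Int32.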
// First field: __fbthrift_field_protocol
// fieldId is 1 and the type is I32 (5), so the header is (1 << 4) | 5 = 0x15
// The value is COMPACT (2); integer encoding gives writeVarint(i32ToZigzag(2)) = 0x04
0x15, 0x04,
// Second field: __fbthrift_field_name
// fieldId is 2, previous fieldId is 1, type is Binary (8), so the header is
// (2 - 1) << 4 | 8 = 0x18
// Because the type is binary, its length comes next: 0x0c
// The remaining 12 bytes starting at 0x73 are the ASCII values of the method name sendResponse
0x18, 0x0c, 0x73, 0x65, 0x6e, 0x64, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65,
// Third field: __fbthrift_field_kind
// fieldId is 3, previous fieldId is 2, type is I32 (5), so the header is
// (3 - 2) << 4 | 5 = 0x15
// The value is 0, which also encodes to 0
0x15, 0x00,
// Fourth field: __fbthrift_field_clientTimeoutMs
// In my test I set this field to 3600 * 24 * 1000 = 86400000, i.e. one day in milliseconds
// fieldId is 5, previous fieldId is 3, type is I32 (5), so the header is
// (5 - 3) << 4 | 5 = 0x25
// The value 86400000 is encoded as writeVarint(i32ToZigzag(86400000)) = 0x80, 0xf0, 0xb2, 0x52
// See the Integer encoding section at the top
0x25, 0x80, 0xf0, 0xb2, 0x52,
// Stop field that terminates the metadata struct
0x00,
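As a cross-check of the walkthrough, the same 24 bytes can be rebuilt with a tiny stand-alone writer. This is a sketch written for this post, not fbthrift code: `MiniCompactWriter` and `encodeExampleMetadata` are hypothetical names, and only short-form headers, I32 and binary fields are supported.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Minimal writer covering only what the walkthrough needs.
class MiniCompactWriter {
 public:
  void writeI32Field(int16_t id, int32_t value) {
    writeHeader(id, 5);  // 5 = I32
    writeVarint((static_cast<uint32_t>(value) << 1) ^ static_cast<uint32_t>(value >> 31));
  }
  void writeBinaryField(int16_t id, const std::string& value) {
    writeHeader(id, 8);  // 8 = BINARY
    writeVarint(static_cast<uint32_t>(value.size()));
    bytes_.insert(bytes_.end(), value.begin(), value.end());
  }
  void writeStop() { bytes_.push_back(0x00); }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  void writeHeader(int16_t id, uint8_t type) {
    const int delta = id - lastId_;
    assert(delta > 0 && delta <= 15);  // long form is omitted in this sketch
    bytes_.push_back(static_cast<uint8_t>((delta << 4) | type));
    lastId_ = id;
  }
  void writeVarint(uint32_t v) {
    while (v >= 0x80) {
      bytes_.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
      v >>= 7;
    }
    bytes_.push_back(static_cast<uint8_t>(v));
  }
  std::vector<uint8_t> bytes_;
  int16_t lastId_ = 0;
};

// Rebuild the four set metadata fields from the walkthrough.
inline std::vector<uint8_t> encodeExampleMetadata() {
  MiniCompactWriter w;
  w.writeI32Field(1, 2);                  // protocol = COMPACT (2)
  w.writeBinaryField(2, "sendResponse");  // RPC method name
  w.writeI32Field(3, 0);                  // kind = 0
  w.writeI32Field(5, 86400000);           // clientTimeoutMs = one day
  w.writeStop();
  return w.bytes();
}
```

The result is 24 bytes long, which matches the `00 00 18` that immediately precedes `15 04` in the hex dump.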
uint32_t CompactProtocolWriter::writeFieldBeginInternal(
    const char* /*name*/,
    const TType fieldType,
    const int16_t fieldId,
    int8_t typeOverride,
    int16_t previousId) {
  DCHECK_EQ(previousId, lastFieldId_);
  uint32_t wsize = 0;
  // if there's a type override, use that.
  int8_t typeToWrite =
      (typeOverride == -1
           ? apache::thrift::detail::compact::TTypeToCType[fieldType]
           : typeOverride);
  // check if we can use delta encoding for the field id
  if (fieldId > previousId && fieldId - previousId <= 15) {
    // write them together
    wsize += writeByte(
        static_cast<int8_t>((fieldId - previousId) << 4) | typeToWrite);
  } else {
    // write them separate
    wsize += writeByte(typeToWrite);
    wsize += writeI16(fieldId);
  }
  lastFieldId_ = fieldId;
  return wsize;
}
18 06 64 6f 6f 64 6c 65
thrift/ at master · apache/thrift
Variable-length quantity - Wikipedia
facebook/fbthrift: Facebook’s branch of Apache Thrift, including a new C++ server.