Algebraicfile Format Specification
Overview
The algebraicfile file format is a container file format for encrypted data. The encrypted data in an algebraicfile may correspond to data from any arbitrary source—in-memory data, data from a Unix domain socket, data from a regular file, and so on. Typically, however, an algebraicfile contains encrypted file metadata and encrypted file data corresponding to a single file on a file system.
The file format design allows an algebraicfile to be read and written as a stream, with constant in-memory overhead. The recommended filename extension is .algebraic. The Uniform Type Identifier is org.littleroot.algebraicfile.
This document describes version 4 of the format, which is the current version. Programs and packages that read algebraicfile-formatted files should support the latest and all previous versions. Readers must return an error if a version is unknown or unsupported. Writers may support only the latest version.
Each algebraicfile has 6 sections:
| # | Section name | Length | Encryption |
|---|---|---|---|
| 1 | Identifier | 6 byte | none |
| 2 | Primary header | 81 byte | none |
| 3 | Secondary header | variable | XChaCha20-Poly1305 |
| 4 | Filler | variable, possibly zero | unspecified |
| 5 | Data | variable, possibly zero | XChaCha20 |
| 6 | Checksum | 32 byte | none |

Example Algebraicfile
The example below corresponds to an original source file named hello.txt with the contents “hello, world\n”.
% cat hello.txt
hello, world
% xxd hello.txt
00000000: 6865 6c6c 6f2c 2077 6f72 6c64 0a hello, world.
%
In this example, the file hello.txt was encrypted (with the option turned on to obfuscate the true length of the original data) into an algebraicfile named hello.txt.algebraic. Each section of the example algebraicfile is discussed below.
% xxd hello.txt.algebraic
00000000: 0c75 0d05 0e04 3053 b25a 9b2c e75a e47d .u....0S.Z.,.Z.}
00000010: 2820 2cec cb7d 0000 0001 0040 0000 08c0 ( ,..}.....@....
00000020: a81c 5dca d766 80ba 6bee 70ae 40c1 a6b3 ..]..f..k.p.@...
00000030: d1ab 6ca2 8a95 ad8e 759e 7bc0 0177 8d0f ..l.....u.{..w..
00000040: 933b 53c6 d813 bf42 f955 3351 3f62 e000 .;S....B.U3Q?b..
00000050: 0000 0000 0001 2a8f 6602 3c5c d34a 28d2 ......*.f.<\.J(.
00000060: 4c58 7b49 9575 b0b5 438b 928c 9f73 155a LX{I.u..C....s.Z
00000070: 70e2 22a4 b71e 66fc 01a2 5f60 b9c9 ea3d p."...f..._`...=
00000080: a87e d1f8 68e8 4586 7ec4 e136 21a1 213b .~..h.E.~..6!.!;
00000090: 783d 40ae 1b11 9819 f25f c3c8 5bee c1ca x=@......_..[...
000000a0: 9db6 ed04 f03e 6691 a117 d7b3 7853 b449 .....>f.....xS.I
000000b0: 0291 3f9c 8c30 bf9a 36d8 3829 9eca c570 ..?..0..6.8)...p
000000c0: aac0 5c5d aab1 e46e c282 cda1 21a6 e131 ..\]...n....!..1
000000d0: 0f48 4615 f363 7eaf 6855 89b4 a67e 2884 .HF..c~.hU...~(.
000000e0: 2e05 5078 fe38 a96e 35b4 1c10 6148 e4ba ..Px.8.n5...aH..
000000f0: 7240 a79f 33b8 9c6b 7d30 3171 700a 7073 r@..3..k}01qp.ps
00000100: 4bad ec34 3110 8935 5c1d cc44 9867 ae7f K..41..5\..D.g..
00000110: 53c9 c33c 1abb 24d0 a5b4 e076 a0dc b316 S..<..$....v....
00000120: 928e 7594 7ac3 7df8 59e8 26cf f649 9e68 ..u.z.}.Y.&..I.h
00000130: e127 11e5 009f 0ed3 3246 fa5a 5434 e49a .'......2F.ZT4..
00000140: 2449 d70b e5b6 28f7 b1c6 bdb1 0845 5434 $I....(......ET4
00000150: 0b85 3bf2 e1ab 0228 f317 df54 1f4a 9e66 ..;....(...T.J.f
00000160: f08a 70c8 024d b13d 0ea6 89b8 2f5a 9ef2 ..p..M.=..../Z..
00000170: 6c06 673b 8478 84d4 4fea a23b 93ff ce6e l.g;.x..O..;...n
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40 xm.k....H4.&nz.@
00000190: b95e 6faa bc15 a154 1ff0 6c78 4d35 3fbe .^o....T..lxM5?.
000001a0: 3e18 124b b75a 6138 e6c1 382a f423 b3 >..K.Za8..8*.#.
%
File Structure
1. Identifier section
The identifier section is binary-encoded in big-endian order. It consists of:
magic 5 byte
version 1 byte
The magic value is 0x0c 0x75 0x0d 0x05 0x0e. The version field is the algebraicfile format version as an integer; for example, for version 4 of the format the value is 0x04. Programs that read an algebraicfile should read the identifier section, and based on the version number adjust their parsing behavior for the remaining sections.
In the example algebraicfile xxd output from earlier, the identifier section is the first 6 bytes (0c75 0d05 0e04) of this row:
00000000: 0c75 0d05 0e04 3053 b25a 9b2c e75a e47d .u....0S.Z.,.Z.}
- SNIP -
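As an illustration of the identifier check described above, here is a minimal Python sketch. The function name parse_identifier and the SUPPORTED_VERSIONS set are assumptions made for this example, not part of the format; a full reader would also accept earlier versions.

```python
# Magic value from the identifier section description.
MAGIC = bytes([0x0C, 0x75, 0x0D, 0x05, 0x0E])

# Assumption for this sketch: only version 4 is handled.
SUPPORTED_VERSIONS = {4}


def parse_identifier(section: bytes) -> int:
    """Validate the 6-byte identifier section and return the format version."""
    if len(section) != 6:
        raise ValueError("identifier section must be exactly 6 bytes")
    if section[:5] != MAGIC:
        raise ValueError("bad magic value")
    version = section[5]  # single byte, so byte order does not matter here
    if version not in SUPPORTED_VERSIONS:
        # Readers must return an error for unknown or unsupported versions.
        raise ValueError(f"unknown or unsupported version: {version}")
    return version


# First six bytes of the example file.
print(parse_identifier(bytes.fromhex("0c750d050e04")))  # prints 4
```

A reader would typically dispatch on the returned version before parsing the remaining sections.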
2. Primary header section
The primary header section is binary-encoded in big-endian order. It consists of:
salt 16 byte
time 4 byte
mem 4 byte
threads 1 byte
aead-nonce 24 byte
stream-nonce 24 byte
nextlen 8 byte
The salt, time, mem, and threads fields are parameters for Argon2id key derivation. The mem value is expressed in kibibytes (KiB). The aead-nonce field is the nonce to use with XChaCha20-Poly1305. The stream-nonce field is the nonce to use with XChaCha20. The nextlen field represents the length in bytes of the variable-length secondary header section that follows this section.
In the example algebraicfile xxd output from earlier, the primary header section is the 81 bytes at offsets 0x06 through 0x56 in these rows:
00000000: 0c75 0d05 0e04 3053 b25a 9b2c e75a e47d .u....0S.Z.,.Z.}
00000010: 2820 2cec cb7d 0000 0001 0040 0000 08c0 ( ,..}.....@....
00000020: a81c 5dca d766 80ba 6bee 70ae 40c1 a6b3 ..]..f..k.p.@...
00000030: d1ab 6ca2 8a95 ad8e 759e 7bc0 0177 8d0f ..l.....u.{..w..
00000040: 933b 53c6 d813 bf42 f955 3351 3f62 e000 .;S....B.U3Q?b..
00000050: 0000 0000 0001 2a8f 6602 3c5c d34a 28d2 ......*.f.<\.J(.
- SNIP -
which breaks down to the hex field values:
salt 3053b25a9b2ce75ae47d28202ceccb7d
time 00000001
mem 00400000
threads 08
aead-nonce c0a81c5dcad76680ba6bee70ae40c1a6b3d1ab6ca28a95ad
stream-nonce 8e759e7bc001778d0f933b53c6d813bf42f95533513f62e0
nextlen 000000000000012a
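The field layout above maps directly onto a fixed-size big-endian record. The following Python sketch unpacks the example's primary header with the standard struct module. The names PrimaryHeader and parse_primary_header are hypothetical, and treating the integer fields as unsigned is an assumption, since the signedness is not stated in this document.

```python
import struct
from typing import NamedTuple

# ">" selects big-endian with no padding: 16 + 4 + 4 + 1 + 24 + 24 + 8 = 81 bytes.
# Assumption: time, mem, threads, and nextlen are decoded as unsigned integers.
PRIMARY_HEADER = struct.Struct(">16sIIB24s24sQ")


class PrimaryHeader(NamedTuple):
    salt: bytes
    time: int
    mem: int  # kibibytes
    threads: int
    aead_nonce: bytes
    stream_nonce: bytes
    nextlen: int  # length in bytes of the secondary header section


def parse_primary_header(section: bytes) -> PrimaryHeader:
    """Decode the 81-byte primary header section."""
    if len(section) != PRIMARY_HEADER.size:
        raise ValueError("primary header section must be exactly 81 bytes")
    return PrimaryHeader(*PRIMARY_HEADER.unpack(section))


# The field values from the example breakdown, concatenated in order.
EXAMPLE = bytes.fromhex(
    "3053b25a9b2ce75ae47d28202ceccb7d"                  # salt
    "00000001"                                          # time
    "00400000"                                          # mem
    "08"                                                # threads
    "c0a81c5dcad76680ba6bee70ae40c1a6b3d1ab6ca28a95ad"  # aead-nonce
    "8e759e7bc001778d0f933b53c6d813bf42f95533513f62e0"  # stream-nonce
    "000000000000012a"                                  # nextlen
)

header = parse_primary_header(EXAMPLE)
print(header.time, header.mem, header.threads, header.nextlen)
# prints: 1 4194304 8 298
```

In other words, the example was written with one Argon2id pass, 4194304 KiB of memory, 8 threads, and a 298-byte secondary header.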
3. Secondary header section
The secondary header section consists of a JSON-encoded object, encrypted with XChaCha20-Poly1305. The section's length in bytes includes the Poly1305 authentication tag (in other words, the AEAD overhead). The nonce for the encryption is the aead-nonce value in the primary header section. The encryption key is derived by hashing a user-supplied password with Argon2id; the parameters for Argon2id must match the values in the primary header section.
The section largely consists of metadata of the original file. The structure of the JSON-encoded object is:
{
    cp: string // packed copyfile(3) data, base64-encoded.
    fl: number // length of "Filler" section, int64.
    m: number // file mode bits, uint32; see Go type fs.FileMode for format.
    n: string // filename, final path element only, base64-encoded.
    l: string // linkname, present iff original file is a symbolic link, base64-encoded.
    u: number // file uid, int64.
    g: number // file gid, int64.
    mt: number // file modification time, int64.
    at: number // file access time, int64.
    ct: number // file change time, int64.
    bt: number // file birth time, int64.
}
First, basic rules that apply to all fields: All fields are optional in the encoded JSON. For example, if an algebraicfile represents encrypted in-memory data, then fields such as the original file’s name, its file mode bits, and its modification time are not applicable and hence will be absent.
If a field’s value is unavailable or invalid, writers must omit the property in its entirety in the encoded JSON. Readers must use “nil”, “empty”, or “zero” values for missing fields when decoding JSON. Readers must take into account integer precision and sign requirements when decoding numbers from JSON. Readers must skip without error unknown properties present in the encoded JSON.
Details on specific fields: The fl field represents the length in bytes of the variable-length filler section that follows this section. Note that if the property does not exist in the encoded JSON, readers must consider the value to be zero.
The l field represents the target name for a symbolic link. It must be present if and only if the original file corresponding to an algebraicfile is a symbolic link.
The cp field consists of metadata about the original file. The value is the base64-encoded result of copyfile(3) called with flags COPYFILE_ACL | COPYFILE_XATTR | COPYFILE_PACK. Writers should omit the field if the value cannot be constructed (e.g. because copyfile(3) isn’t available).
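The reader rules above (zero or empty values for missing fields, integer range checks, and silently skipping unknown properties) can be sketched in Python as follows. The sketch operates on the already decrypted JSON plaintext; decryption is out of scope here. The names SecondaryHeader and decode_secondary_header are hypothetical, the standard base64 alphabet is an assumption, and the sample JSON is invented for illustration rather than taken from the example file.

```python
import base64
import json
from dataclasses import dataclass

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT32_MAX = 2**32 - 1


@dataclass
class SecondaryHeader:
    # Defaults are the "empty"/"zero" values used when a property is absent.
    cp: bytes = b""
    fl: int = 0
    m: int = 0
    n: bytes = b""
    l: bytes = b""
    u: int = 0
    g: int = 0
    mt: int = 0
    at: int = 0
    ct: int = 0
    bt: int = 0


def _check_int(value: object, lo: int, hi: int) -> int:
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("expected a JSON integer")
    if not lo <= value <= hi:
        raise ValueError("integer out of range for its field")
    return value


def decode_secondary_header(plaintext: bytes) -> SecondaryHeader:
    """Decode decrypted secondary header JSON; unknown properties are ignored."""
    obj = json.loads(plaintext)
    if not isinstance(obj, dict):
        raise ValueError("secondary header must be a JSON object")
    header = SecondaryHeader()
    for name in ("cp", "n", "l"):
        if name in obj:
            # Assumption: standard (not URL-safe) base64 alphabet.
            setattr(header, name, base64.b64decode(obj[name], validate=True))
    for name in ("fl", "u", "g", "mt", "at", "ct", "bt"):
        if name in obj:
            setattr(header, name, _check_int(obj[name], INT64_MIN, INT64_MAX))
    if "m" in obj:
        header.m = _check_int(obj["m"], 0, UINT32_MAX)
    return header


# Hypothetical plaintext: "aGVsbG8udHh0" is base64 for "hello.txt".
sample = b'{"n": "aGVsbG8udHh0", "fl": 1, "m": 420, "future-field": true}'
header = decode_secondary_header(sample)
print(header.n, header.fl, header.l)
```

Python's json module parses integers with arbitrary precision, so the explicit range checks are what enforce the int64 and uint32 limits; in languages where JSON numbers decode to doubles, extra care is needed to avoid losing precision.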
In the example algebraicfile xxd output from earlier, the secondary header section is the following 298 encrypted bytes, at offsets 0x57 through 0x180. The length is specified by the nextlen field in the primary header section (0x12a = 298). The first and last rows shown also contain bytes from the adjacent sections.
- SNIP -
00000050: 0000 0000 0001 2a8f 6602 3c5c d34a 28d2 ......*.f.<\.J(.
00000060: 4c58 7b49 9575 b0b5 438b 928c 9f73 155a LX{I.u..C....s.Z
00000070: 70e2 22a4 b71e 66fc 01a2 5f60 b9c9 ea3d p."...f..._`...=
00000080: a87e d1f8 68e8 4586 7ec4 e136 21a1 213b .~..h.E.~..6!.!;
00000090: 783d 40ae 1b11 9819 f25f c3c8 5bee c1ca x=@......_..[...
000000a0: 9db6 ed04 f03e 6691 a117 d7b3 7853 b449 .....>f.....xS.I
000000b0: 0291 3f9c 8c30 bf9a 36d8 3829 9eca c570 ..?..0..6.8)...p
000000c0: aac0 5c5d aab1 e46e c282 cda1 21a6 e131 ..\]...n....!..1
000000d0: 0f48 4615 f363 7eaf 6855 89b4 a67e 2884 .HF..c~.hU...~(.
000000e0: 2e05 5078 fe38 a96e 35b4 1c10 6148 e4ba ..Px.8.n5...aH..
000000f0: 7240 a79f 33b8 9c6b 7d30 3171 700a 7073 r@..3..k}01qp.ps
00000100: 4bad ec34 3110 8935 5c1d cc44 9867 ae7f K..41..5\..D.g..
00000110: 53c9 c33c 1abb 24d0 a5b4 e076 a0dc b316 S..<..$....v....
00000120: 928e 7594 7ac3 7df8 59e8 26cf f649 9e68 ..u.z.}.Y.&..I.h
00000130: e127 11e5 009f 0ed3 3246 fa5a 5434 e49a .'......2F.ZT4..
00000140: 2449 d70b e5b6 28f7 b1c6 bdb1 0845 5434 $I....(......ET4
00000150: 0b85 3bf2 e1ab 0228 f317 df54 1f4a 9e66 ..;....(...T.J.f
00000160: f08a 70c8 024d b13d 0ea6 89b8 2f5a 9ef2 ..p..M.=..../Z..
00000170: 6c06 673b 8478 84d4 4fea a23b 93ff ce6e l.g;.x..O..;...n
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40 xm.k....H4.&nz.@
- SNIP -
4. Filler section
The filler section may be used to increase the size of an algebraicfile, in order to obfuscate the true length of the data. The number of bytes in the section must match the fl field in the secondary header section. The bytes must be indistinguishable from any actual encrypted data.
Readers may ignore the filler section, by discarding or seeking past fl bytes after the secondary header.
The filler section can have zero length.
In the example algebraicfile xxd output from earlier, the filler section is exactly 1 byte: the byte 0x6d at offset 0x181. The length would have been indicated by the encrypted fl field in the secondary header.
- SNIP -
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40 xm.k....H4.&nz.@
- SNIP -
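The two reader options mentioned above, seeking past or discarding fl bytes, might look like the following Python sketch. The function name skip_filler and the chunk size are assumptions for illustration; the stream is assumed to be positioned immediately after the secondary header.

```python
import io
from typing import BinaryIO


def skip_filler(stream: BinaryIO, fl: int, chunk_size: int = 64 * 1024) -> None:
    """Advance the stream past fl filler bytes."""
    if fl < 0:
        raise ValueError("fl must be non-negative")
    if stream.seekable():
        # Note: seeking does not detect a truncated file; a later read will.
        stream.seek(fl, io.SEEK_CUR)
        return
    # Non-seekable input (pipe, socket): read and discard in bounded chunks,
    # so memory use stays constant regardless of the filler length.
    remaining = fl
    while remaining > 0:
        chunk = stream.read(min(remaining, chunk_size))
        if not chunk:
            raise EOFError("stream ended inside the filler section")
        remaining -= len(chunk)


stream = io.BytesIO(b"\x6d" + b"data section bytes")
skip_filler(stream, 1)
print(stream.read())  # prints b'data section bytes'
```

The discard path keeps the reader compatible with streamed input, at the cost of reading bytes that are never used.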
5. Data section
The data section is all bytes after the filler section but before the final, fixed-length checksum section. The section consists of the original source file’s data, encrypted with XChaCha20. The nonce for the encryption is the value of the stream-nonce field in the primary header section. The encryption key is the same one used in the secondary header section.
The data section can have zero length. In practice this can happen, for example, if the original file was a zero length regular file or the original file was a symbolic link.
In the example algebraicfile xxd output from earlier, the data section is the 13 encrypted bytes at offsets 0x182 through 0x18e (836b a9f3 89f1 4834 f626 6e7a ee), which is the same length as the original plain text input:
- SNIP -
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40 xm.k....H4.&nz.@
- SNIP -
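Because the data section has no length field of its own, its extent follows from the other sections. The Python sketch below works through that arithmetic for the example file, whose total size is 431 bytes according to the xxd output. The function name section_ranges is hypothetical, and the sketch assumes the total length is known up front, which is not the case for a purely streamed input.

```python
IDENTIFIER_LEN = 6
PRIMARY_HEADER_LEN = 81
CHECKSUM_LEN = 32


def section_ranges(total_len: int, nextlen: int, fl: int) -> dict[str, tuple[int, int]]:
    """Return half-open (start, end) byte ranges for the six sections."""
    if nextlen < 0 or fl < 0:
        raise ValueError("section lengths must be non-negative")
    ident_end = IDENTIFIER_LEN
    primary_end = ident_end + PRIMARY_HEADER_LEN
    secondary_end = primary_end + nextlen
    filler_end = secondary_end + fl
    # The data section is whatever lies between the filler and the checksum.
    data_end = total_len - CHECKSUM_LEN
    if data_end < filler_end:
        raise ValueError("file is too short for the declared section lengths")
    return {
        "identifier": (0, ident_end),
        "primary header": (ident_end, primary_end),
        "secondary header": (primary_end, secondary_end),
        "filler": (secondary_end, filler_end),
        "data": (filler_end, data_end),
        "checksum": (data_end, total_len),
    }


# Example file: 431 bytes total, nextlen = 298, fl = 1.
ranges = section_ranges(total_len=431, nextlen=298, fl=1)
for name, (start, end) in ranges.items():
    print(f"{name:16} 0x{start:03x}..0x{end:03x} ({end - start} bytes)")
```

For the example this yields a 13-byte data section starting at offset 0x182, matching the length of the original "hello, world\n" input.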
6. Checksum section
The 32-byte checksum is the SHA-256 sum of all the bytes in the preceding sections. Readers may forgo checksum verification.
In the example algebraicfile xxd output from earlier, the checksum section is these final 32 bytes:
- SNIP -
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40 xm.k....H4.&nz.@
00000190: b95e 6faa bc15 a154 1ff0 6c78 4d35 3fbe .^o....T..lxM5?.
000001a0: 3e18 124b b75a 6138 e6c1 382a f423 b3 >..K.Za8..8*.#.
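For readers that do verify the checksum, the check can be done in a single streaming pass with constant memory by holding back the most recent 32 bytes until the end of input. The following Python sketch shows one way to do this; the function name verify_checksum and the chunk size are assumptions, and the sample input is constructed for illustration rather than taken from the example file.

```python
import hashlib
import io
from typing import BinaryIO

CHECKSUM_LEN = 32


def verify_checksum(stream: BinaryIO, chunk_size: int = 64 * 1024) -> bool:
    """Return True if the trailing 32 bytes equal SHA-256 of all preceding bytes."""
    digest = hashlib.sha256()
    tail = b""  # the last (up to) 32 bytes seen so far; possibly the checksum
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        # Only the final 32 bytes can be the checksum, so everything before
        # them is safe to hash now. For short buffers this slice is empty.
        digest.update(buf[:-CHECKSUM_LEN])
        tail = buf[-CHECKSUM_LEN:]
    if len(tail) != CHECKSUM_LEN:
        raise ValueError("input is shorter than the checksum section")
    return digest.digest() == tail


# Constructed sample: arbitrary body followed by its SHA-256 sum.
body = b"bytes of sections one through five"
sample = body + hashlib.sha256(body).digest()
print(verify_checksum(io.BytesIO(sample), chunk_size=8))  # prints True
```

The buffer never exceeds chunk_size plus 32 bytes, so the same approach can run alongside decryption in a streaming reader.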