Writing a 9P server from scratch, pt 2: Protocol parsing
In the previous post, we began work on the protocol parser, getting to the point where we could read 9P messages from a network connection. With this post we will pick up the pace, writing a full decoder and encoder for the 9P protocol. You will see that even with a protocol as simple as 9P, there is a wide range of choices to make when designing the API and its implementation.
The code for this post can be found in the styxproto package. It is always changing, so do not be surprised if there are discrepancies.
9P message format
9P protocol data comes as a series of messages. Each message consists of a size field, an unsigned 32-bit little-endian integer, followed by size-4 bytes. There are no sentinel values or delimiters.
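For illustration, reading one frame takes nothing more than honoring the size field. This is a sketch, not styxproto's actual code; readMsg and maxSize are hypothetical, and the real package reuses its buffers rather than allocating per message:

import (
	"encoding/binary"
	"errors"
	"io"
)

// readMsg reads a single 9P message, returning its payload
// (everything after the size field). maxSize is an assumed
// negotiated limit on message size.
func readMsg(r io.Reader, maxSize uint32) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	size := binary.LittleEndian.Uint32(hdr[:])
	if size < 7 || size > maxSize { // 7 = size[4] + type[1] + tag[2]
		return nil, errors.New("bad message size")
	}
	payload := make([]byte, size-4)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}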
Within the payload of a message are several fixed-sized and variable-length fields. The first byte of the payload determines the message type.
Variable-length fields, such as the version field of a Tversion message, always consist of a 16-bit little-endian integer n, followed by n bytes of UTF-8 text. The Twalk message is the most complex message to parse; its body contains a count, nwname, followed by nwname variable-length fields of UTF-8 text.
That's it! That's the most complex message we'll have to deal with!
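To see how little machinery that takes, here is a sketch of decoding a single variable-length field; gstring is a hypothetical helper, not styxproto's actual API. Parsing Twalk's name list is just this in a loop, nwname times:

// gstring decodes the variable-length field at buf[off:],
// returning the text and the offset just past it.
func gstring(buf []byte, off int) (s string, next int, err error) {
	if len(buf)-off < 2 {
		return "", 0, errors.New("buffer too short")
	}
	n := int(binary.LittleEndian.Uint16(buf[off:]))
	off += 2
	if len(buf)-off < n {
		return "", 0, errors.New("buffer too short")
	}
	return string(buf[off : off+n]), off + n, nil
}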
The messages of particular note are those used for I/O, specifically the Twrite and Rread messages. These write data to and return data from a file on a server, respectively. The headers of these messages are not unlike those of the other messages.
The final count field is a 32-bit integer specifying the size of the message body. This means that Twrite and Rread messages can be very large, close to 4 gigabytes in size. You would be hard-pressed, however, to find a server in the wild that did not negotiate a maximum message size much lower than that. Regardless, how to handle these two message types is one of the more important design choices we will face.
Message representation
As you can see above, the 9P message format is quite simple. Because most of the fields are fixed-width, each field can be accessed quickly. In fact, the format is so trivial that it almost seems a waste to copy each message out into a struct. I will be taking the same approach used by packages like capnproto and flatbuffers: the protocol representation of a message will be used directly, rather than being unmarshalled into an intermediate type. I am not going to make predictions about cache hits, because honestly I wouldn't know what I was talking about. Instead I'll talk about the benefits I do know about:
- The connection buffers themselves can be used to store the messages while they are being processed. This approach creates less work for the garbage collector, and no time is spent copying the message into an internal format.
- We do not need to allocate additional memory for every message. This helps with our goal of predictable per-connection resource usage. At the very minimum, each connection needs only enough memory to hold a single message in its connection buffer.
- We do not waste space on struct fields that do not apply to a given message type; Go does not have union types like in C.
Here is our representation of a Tread message using this strategy. It is almost the same number of lines of code as a struct definition, but without the need to unmarshal a message (only validation is necessary):
type Tread []byte

// Tag is the client-chosen identifier for this transaction.
func (m Tread) Tag() uint16 { return guint16(m[5:7]) }

// Len returns the total length of the message, in bytes.
func (m Tread) Len() int64 { return int64(guint32(m[:4])) }

// Fid is the handle of the file to read from.
func (m Tread) Fid() uint32 { return guint32(m[7:11]) }

// Offset is the starting point in the file from which to begin
// returning data.
func (m Tread) Offset() int64 { return int64(guint64(m[11:19])) }

// Count is the number of bytes to read from the file. Count
// cannot be more than the maximum value of a 32-bit unsigned
// integer.
func (m Tread) Count() int64 { return int64(guint32(m[19:23])) }
For completeness, here are the definitions for the integer parsing functions:
// Shorthand for parsing numbers
var (
	guint16 = binary.LittleEndian.Uint16
	guint32 = binary.LittleEndian.Uint32
	guint64 = binary.LittleEndian.Uint64
)
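As for the validation mentioned above, there is very little of it. Something like the following is all a Tread needs before its accessors are safe to call; validTread is a hypothetical helper, simpler than the package's real checks:

// validTread reports whether m is a well-formed Tread. 116 is
// the Tread message type, and 23 bytes is the entire layout:
// size[4] type[1] tag[2] fid[4] offset[8] count[4].
func validTread(m Tread) bool {
	return len(m) == 23 && m[4] == 116
}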
Dealing with large messages
As I mentioned before, certain 9P messages can be quite large. A server can negotiate a maximum message size below the protocol limit, but there are some benefits to allowing large messages. If, for instance, you are persisting large files to disk, it can be more efficient to make one large write instead of 100 small writes. And if your server implements some sort of transactional semantics, they are easier to provide if you can accept an entire request in a single message. For these reasons, the styxproto package should support messages close to the maximum allowed by the protocol.
Fixed-size buffer
Supporting arbitrarily large (up to 4GB) messages with fixed memory usage can be somewhat tricky, and it influences our API design. Here are all the T-messages (client → server) in the 9P2000 protocol, as a reminder:

[Figure: message layout diagrams for the 9P2000 T-messages.]
The pale yellow fields are variable length. By imposing reasonable maximum sizes on file names, user names, and the other variable-length fields, we can come up with a minimum buffer size. See the limits.go file for the limits imposed on various fields. With the limits chosen, the largest message (excluding Twrite and Rread) is Twalk, because it can include up to 16 file names of variable width. Thus, at the very minimum, our buffer must be at least
const MinBufSize = MaxWElem*(MaxFilenameLen+2) + 13 + 4
MinBufSize bytes long. We add 2 to MaxFilenameLen to account for the 2-byte length field that precedes each variable-length string. We add 13 to the product to account for the preceding fixed-width fields (type, tag, fid, newfid, and nwname). And we add 4 to that to account for the size field.
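To make the arithmetic concrete, here is the same formula with values plugged in. MaxWElem mirrors the protocol's MAXWELEM of 16; MaxFilenameLen is a hypothetical stand-in for the value in limits.go:

const (
	MaxWElem       = 16  // the protocol's MAXWELEM
	MaxFilenameLen = 255 // hypothetical; see limits.go
)

// 16*(255+2) + 13 + 4 = 4129 bytes
const MinBufSize = MaxWElem*(MaxFilenameLen+2) + 13 + 4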
In order to randomly address any field in a message, we must be able to hold all of it in memory. That presents a problem when a message is extremely large. For this reason, we will treat Twrite and Rread messages specially; their data fields will not be randomly accessible. They will implement the io.Reader interface, and must be accessed as a stream of bytes.
A Streaming API for message decoding
A Twrite message looks like this:
type Twrite struct {
	r   io.Reader
	msg msg // headers plus any extra buffered data
}
In the msg member are the bytes containing the fixed-length fields of the message. These fields are accessed just like those of the other messages:
func (m Twrite) Fid() uint32 { return guint32(m.msg[7:11]) }
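The remaining fixed fields follow the same pattern. As a sketch mirroring the Tread accessors above (the offsets follow the Twrite layout: size[4] type[1] tag[2] fid[4] offset[8] count[4] data[count]):

func (m Twrite) Tag() uint16   { return guint16(m.msg[5:7]) }
func (m Twrite) Offset() int64 { return int64(guint64(m.msg[11:19])) }
func (m Twrite) Count() int64  { return int64(guint32(m.msg[19:23])) }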
The r member provides an interface for accessing the rest of the message, some of which may already be buffered in the msg member, and some of which may still be waiting to be read from the underlying connection.
func (m Twrite) Read(p []byte) (int, error) {
	return m.r.Read(p)
}
The Twrite type implements the io.Reader interface, so a Twrite value can be used anywhere a byte stream can be used. For example:
// Assuming msg is a Twrite value, write its data to realFile
n, err := io.Copy(realFile, msg)
if err != nil {
	// send an Rerror message
} else {
	// send an Rwrite message with count n
}
Using a streaming API for potentially large data fields gives us the freedom to create an implementation that does not incur unexpected resource usage.
The Decoder
Following the example set by packages like encoding/xml and encoding/json, we define a Decoder, which wraps a connection value. Decoding 9P messages is done through methods on the Decoder, which fill and flush its connection buffers.
type Decoder struct {
	MaxSize    int64
	r          io.Reader
	br         *bufio.Reader
	start, pos int
	msg        []Msg
	err        error
}
You'll see that I keep the raw connection in the r member in addition to the buffered reader in the br member. This will be important later. The start and pos members are used during parsing. The msg and err members implement an API similar to that of a bufio.Scanner.
A Decoder is intended to be used in a loop, similar to a bufio.Scanner:
d := NewDecoder(conn)
for d.Next() {
	for _, msg := range d.Messages() {
		switch m := msg.(type) {
		case Twrite:
			// handle Twrite
		case Tauth:
			// handle Tauth
		default:
			// unexpected message
		}
	}
}
if d.Err() != nil {
	log.Fatal(d.Err())
}
Note how error handling is done later, to reduce clutter in the main loop. The Next method fills a Decoder's buffer with data from the underlying connection, overwriting any previously buffered messages. If an error occurs while reading or parsing data, Next returns false, breaking the loop. Within the loop, the Messages method returns a slice of one or more messages read during the last call to Next.
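The accessors themselves are trivial. Something like the following satisfies the contract just described (an assumed shape, not necessarily the package's exact code):

// Messages returns the messages parsed by the last call to Next.
func (d *Decoder) Messages() []Msg { return d.msg }

// Err returns the first error encountered while decoding.
// A clean end of stream is not reported as an error.
func (d *Decoder) Err() error {
	if d.err == io.EOF {
		return nil
	}
	return d.err
}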
The parsing code can be found here. Even with a message format as simple as 9P's, parsing can be difficult and dangerous, because you are dealing with untrusted data. I won't cover the parsing too closely, as it is likely to change. However, my approach is to establish as many invariants as I can about the state of the Decoder and the structure of the incoming message; for instance, asserting early on that enough data has been read to decode universal fields like size, tag, and type. This is one of the areas I will come back to again and again. I have implemented fuzz testing against the parser using go-fuzz, but this is one area that would benefit from more rigor.
The construction of Twrite and Rread messages during parsing is worth repeating here:
// dot contains the currently read bytes for
// this message, up to size
m := Twrite{msg: dot}

// even though we may not have read the
// whole message, because of the
// representation we've chosen, we can
// still call methods on it.
count := m.Count()

buffered := dot[23:]
m.r = bytes.NewReader(buffered)
if int64(len(buffered)) < count {
	m.r = io.MultiReader(
		m.r,
		io.LimitReader(r, count-int64(len(buffered))))
}
return m, nil
By using an io.MultiReader, we can hide the fact that a message's data is only partially buffered. To calling code, whether the data is all in memory or coming from a connection (or even a file), it is always accessed via the Read method, like any other byte stream.
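The trick is easy to see in miniature; the consumer of this reader cannot tell where one source ends and the next begins:

// Toy example: the first source stands in for bytes already
// buffered, the second for bytes still on the wire.
r := io.MultiReader(
	strings.NewReader("buffered "),
	strings.NewReader("not yet buffered"),
)
b, _ := io.ReadAll(r)
fmt.Println(string(b)) // buffered not yet buffered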
Encoders
Given the representation we've chosen for 9P messages, there is not much to encode; an encoder simply needs to copy the bytes of a message to the connection. However, an Encoder is an appropriate place to put methods that create messages from higher-level parameters.
type Encoder struct {
	w  *wire.TxWriter
	ew *util.ErrWriter
}
The wire.TxWriter isolates concurrent writes to an io.Writer. The util.ErrWriter captures errors that occur during writing (see errors are values). Here is an example method that writes an Rauth message to an underlying connection (some functions elided):
func pheader(w io.Writer, size uint32, mtype uint8, tag uint16, extra ...uint32) {
	puint32(w, size)
	puint8(w, mtype)
	puint16(w, tag)
	puint32(w, extra...)
}

func (enc *Encoder) Rauth(tag uint16, qid Qid) error {
	size := uint32(maxSizeLUT[msgRauth])
	tx := enc.w.Tx()
	defer tx.Close()
	pheader(tx, size, msgRauth, tag)
	pqid(tx, qid)
	return enc.Err()
}
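The elided helpers are just little-endian writers. Here is one plausible shape for them, as a sketch rather than the package's actual code (write errors are assumed to be captured by the util.ErrWriter wrapping the connection):

func puint8(w io.Writer, v uint8) {
	w.Write([]byte{v})
}

func puint16(w io.Writer, v uint16) {
	var buf [2]byte
	binary.LittleEndian.PutUint16(buf[:], v)
	w.Write(buf[:])
}

func puint32(w io.Writer, v ...uint32) {
	var buf [4]byte
	for _, vv := range v {
		binary.LittleEndian.PutUint32(buf[:], vv)
		w.Write(buf[:])
	}
}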
And here is an Rread method:
func (enc *Encoder) Rread(tag uint16, data []byte) error {
	if math.MaxUint32-minSizeLUT[msgRread] < len(data) {
		return errTooBig
	}
	size := uint32(minSizeLUT[msgRread]) + uint32(len(data))
	tx := enc.w.Tx()
	defer tx.Close()
	pheader(tx, size, msgRread, tag, uint32(len(data)))
	tx.Write(data)
	return enc.Err()
}
Note that on the encoder side, we take a []byte parameter instead of an io.Reader. The reasoning behind this is that it makes the calculation of the size and count fields simpler, and if a program wants to send more data than it wants to hold in memory, it can send multiple Rread messages.
Next steps
See the styxproto package for the full implementation of the low-level encoder and decoder for the 9P2000 protocol. It has moderate test coverage, including fuzz testing (which uncovered a couple of bugs). With this package, we can read and write 9P messages from the network. However, it is not enough to just speak the protocol. In the next post, I will cover what it takes to implement a 9P server, including (but not limited to) sessions, fid management, authentication, and protocol negotiation. We will implement the higher-level styx package that will interface with user-written servers.