Using Protocol Buffers

I’ve written once before about using protocol buffers, using the protobuf-net library, but I didn’t go into any depth regarding the .proto file which is used for the IDL. Let’s rectify this.

Introduction

Protocol buffers are simply a “language-neutral, platform-neutral, extensible mechanism for serializing structured data”. What this really means is that this is a specification (and tooling) for creating binary data. This data might exist as files, or be streamed over HTTP or any other type of stream. Think of Protocol Buffers as being like CSV, XML or the like but, being binary, the resultant streams would generally be more compact than these other formats.

Proto file format

I’m not going to cover the .proto syntax in full as it’s already available at Language Guide (proto3), but as I build up an example .proto file I will cover the pieces that I add to the file as I go.

We’re going to want to create a .proto file which will be used to declare our messages/data. Currently the latest syntax supported is “proto3” and we declare the version we support in our .proto file. If you do not specify the syntax, protoc currently defaults to proto2.

So first off create a file with the .proto extension – I’m doing this within Visual Studio which supports Protocol Buffer syntax highlighting etc.

To declare the supported syntax we start off by adding the following line. I’m going to use proto3 syntax in this post; there are several differences between proto3 and proto2.

syntax = "proto3";

Packages/Namespaces

The next thing we’ll add is a package which, whilst optional, is useful for code generation in your preferred language. For example in Java this maps directly to the Java package name and in C# and C++ this maps to the namespace of the code.

We can actually override the package/namespace name for Java and C# by using option java_package and/or option csharp_namespace instead of, or as well as, the package line. We might wish to have all three in our .proto file so that the file can be used to generate code for Ruby, Go, C++ etc. as well as having explicit definitions for Java and C#.

So let’s add a package

package music;

option java_package = "com.putridparrot.music";
option csharp_namespace = "PutridParrot.Music";

Types

Scalar types such as double, float, int32, int64 and bool are supported, along with the string and bytes types.

Enums are also supported, so let’s add an enum to our file

/*
 Note definitions, where two letters are used,
 the first denotes the # (sharp) and the second 
 the b (flat)
*/
enum Note {
   C = 0;
   CD = 1;
   D = 2;
   DE = 3;
   E = 4;
   F = 5;
   FG = 6;
   G = 7;
   GA = 8;
   A = 9;
   AB = 10;
   B = 11;
}

We need to define the possible values for the enum and, in proto3, the first value must be zero. This gives us a default value, so the zero entry should be whatever you want your enum default to be.

We’ve also added a multi-line comment using the /* */ syntax; single line comments using // are also supported.

A message type can be viewed as a composite type, such as structs, i.e. we can combine types, so let’s create a request and response type (the response will be used in my next post on gRPC)

message NotesRequest {
   Note key = 1;
   string name = 2;
}

message NotesResponse {
   Note key = 1;
   string name = 2;
   repeated Note notes = 3;
}

Notice the use of = 1 etc. These are field numbers and each field within a message must have a unique field number.

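As per the Google documentation, field numbers in the range 1 through 15 take one byte to encode (the field number together with its wire type), while field numbers 16 through 2047 take two bytes, so it’s worth keeping the lowest numbers for the most frequently used fields. This isn’t a limit on the number of fields: field numbers can go up to 536,870,911 (2^29 - 1), although 19000 through 19999 are reserved for the Protocol Buffers implementation.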
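The one byte/two byte boundary falls out of how a field number is packed, together with a three-bit wire type, into a variable-length integer (varint). Here’s a minimal sketch of that arithmetic. The FieldTag class and its Make and VarintSize methods are names I’ve made up for illustration; they are not part of Google.Protobuf.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Google.Protobuf.
public static class FieldTag
{
    // Wire types occupy the low three bits of a tag.
    public const int WireTypeVarint = 0;          // int32, int64, bool, enum...
    public const int WireTypeLengthDelimited = 2; // string, bytes, nested messages...

    // The tag written before each field value is (fieldNumber << 3) | wireType.
    public static uint Make(int fieldNumber, int wireType)
    {
        return ((uint)fieldNumber << 3) | (uint)wireType;
    }

    // A varint stores seven bits of payload per byte, so count the 7-bit groups.
    public static int VarintSize(uint value)
    {
        var size = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            size++;
        }
        return size;
    }

    public static void Main()
    {
        foreach (var fieldNumber in new[] { 1, 15, 16, 2047, 2048 })
        {
            var tag = Make(fieldNumber, WireTypeVarint);
            Console.WriteLine("field {0}: tag takes {1} byte(s)", fieldNumber, VarintSize(tag));
        }
        // field 1: tag takes 1 byte(s)
        // field 15: tag takes 1 byte(s)
        // field 16: tag takes 2 byte(s)
        // field 2047: tag takes 2 byte(s)
        // field 2048: tag takes 3 byte(s)
    }
}
```

With three bits taken by the wire type, a single byte leaves four bits for the field number (1 to 15) and two bytes leave eleven bits (up to 2047).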

Notice in the NotesResponse message we use the repeated keyword, which denotes that this field can be repeated. Think of this like an array (or list) field.
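Looking ahead to the C# that protoc generates from these messages (covered in the Code Generation section), here’s a rough sketch of how the zero enum default and the repeated field surface in code. It assumes the generated NotesResponse and Note types in the PutridParrot.Music namespace plus the Google.Protobuf package, so it won’t compile on its own. The RepeatedDemo class and BuildCMajor method are just names I’ve picked for the example.

```csharp
using System;
using PutridParrot.Music; // assumed: the namespace generated from music.proto

// Hypothetical demo class; relies on the protoc-generated types.
public static class RepeatedDemo
{
    public static NotesResponse BuildCMajor()
    {
        var response = new NotesResponse { Key = Note.C, Name = "Major" };

        // The repeated field is exposed as a get-only collection,
        // so we add to it rather than assigning a new list.
        response.Notes.Add(Note.C);
        response.Notes.AddRange(new[] { Note.D, Note.E, Note.F, Note.G, Note.A, Note.B });
        return response;
    }

    public static void Main()
    {
        var empty = new NotesResponse();
        Console.WriteLine(empty.Key);          // the zero enum value, C
        Console.WriteLine(empty.Name.Length);  // 0, strings default to empty rather than null
        Console.WriteLine(empty.Notes.Count);  // 0, repeated fields start out empty

        Console.WriteLine(BuildCMajor().Notes.Count); // 7
    }
}
```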

Code Generation

One of the key things XML gave developers was a specification which allowed them to write tools for generating code from the data specifications. Protocol Buffers is no different and, of course, this makes such a specification more usable to the developer.

The tool we use is protoc.exe. If you’re using Visual Studio/nuget you can install Google.Protobuf.Tools via nuget. This will then be installed to $(SolutionDir)packages\Google.Protobuf.Tools.3.6.0\tools\windows_x86 (or the equivalent folder for whichever OS you’re supporting).

Now we can run this tool from NAnt or other build tools, or as a pre-build event, i.e. by selecting your project in Visual Studio, right mouse clicking, selecting Properties then Build Events.

Here’s an example command (formatted to make it readable)

$(SolutionDir)packages\Google.Protobuf.Tools.3.6.0\tools\windows_x86\protoc.exe 
$(ProjectDir)Proto\music.proto 
-I=$(ProjectDir)Proto 
--csharp_out=$(ProjectDir)

The first line is obviously the location of the installed protoc.exe. Next up we declare where the proto file(s) is/are. We can use a wildcard, i.e. *.proto, but if we have several different locations for the files we will probably need to run the command multiple times.

The -I= allows us to define import directories. This isn’t really needed in the example here as we’re not importing anything. Finally we declare that we want to generate C# code into the project folder.

Note: If you want to generate the code into another folder you’ll need to ensure it already exists, as protoc will not create it for you.

Once run, this command will create a C# file which will include the types/messages as well as serialization/deserialization code.

If you’re using Visual Studio to create an application which uses Protocol Buffers, then you’ll need to install the nuget package Google.Protobuf to install the library that the generated source references.

Serializing/Deserializing

Let’s create a Visual Studio Console application and, in the project folder, add a Proto folder (which will contain our *.proto files). Now add the two nuget packages previously mentioned, Google.Protobuf.Tools and Google.Protobuf.

Next, create a file in the Proto folder named music.proto which should look like this

syntax = "proto3";

package music;

option java_package = "com.putridparrot.music";
option csharp_namespace = "PutridParrot.Music";

/*
 Note definitions, where two letters are used,
 the first denotes the # (sharp) and the second 
 the b (flat)
*/
enum Note {
   C = 0;
   CD = 1;
   D = 2;
   DE = 3;
   E = 4;
   F = 5;
   FG = 6;
   G = 7;
   GA = 8;
   A = 9;
   AB = 10;
   B = 11;
}

message NotesRequest {
   Note key = 1;
   string name = 2;
}

message NotesResponse {
   Note key = 1;
   string name = 2;
   repeated Note notes = 3;
}

Next, add the command line (listed previously) to the project’s Pre-Build event to generate the C# from the .proto file.

Lastly, let’s just add the following using directives to Program.cs (System.IO is needed for the File class)

using System.IO;
using PutridParrot.Music;
using Google.Protobuf;

and here’s the code to place in Main

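// Create a request using the type generated from music.proto
var request = new NotesRequest
{
   Key = Note.C,
   Name = "Major"
};

// Serialize the request to a file
using (var w = File.Create(@"C:\Data\request.dat"))
{
   request.WriteTo(w);
}

// Deserialize the file into a new instance using the generated parser
NotesRequest request2;
using (var r = File.OpenRead(@"C:\Data\request.dat"))
{
   request2 = NotesRequest.Parser.ParseFrom(r);
}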

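This will create a file request.dat containing the request instance data and then, if all goes well, load the contents of the file into the request2 variable, and that’s all there is to it. Note that File.Create will not create the C:\Data folder, so it needs to exist before you run the code.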
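To get a feel for how compact the output is, we can reproduce by hand the bytes that should end up in request.dat. The following is only a sketch of the proto3 wire format for this one message, not how Google.Protobuf is implemented. The WireSketch class and its methods are hypothetical names, and it assumes enum values are never negative (true for Note).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical hand-rolled encoder for illustration; use the generated code in real projects.
public static class WireSketch
{
    // Encodes the two NotesRequest fields: key (field 1, enum) and name (field 2, string).
    public static byte[] EncodeNotesRequest(int key, string name)
    {
        var bytes = new List<byte>();

        // Enums are written as varints (wire type 0); proto3 omits the default (zero) value.
        if (key != 0)
        {
            WriteVarint(bytes, (1u << 3) | 0u);
            WriteVarint(bytes, (uint)key);
        }

        // Strings are length-delimited (wire type 2) UTF-8; empty strings are omitted.
        if (!string.IsNullOrEmpty(name))
        {
            var utf8 = Encoding.UTF8.GetBytes(name);
            WriteVarint(bytes, (2u << 3) | 2u);
            WriteVarint(bytes, (uint)utf8.Length);
            bytes.AddRange(utf8);
        }

        return bytes.ToArray();
    }

    // Writes seven bits per byte, setting the high bit on every byte except the last.
    private static void WriteVarint(List<byte> output, uint value)
    {
        while (value >= 0x80)
        {
            output.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.Add((byte)value);
    }

    public static void Main()
    {
        // Note.C is 0, so only the name field is written.
        var encoded = EncodeNotesRequest(0, "Major");
        Console.WriteLine(BitConverter.ToString(encoded)); // 12-05-4D-61-6A-6F-72
    }
}
```

That’s seven bytes in total: one for the field tag, one for the string length and five for the text itself. The key isn’t written at all because Note.C is the default value.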

We can stream the object using the WriteTo and ParseFrom methods, but Protocol Buffers are also used as the message format for gRPC, which we’ll look at in the next post.

Code available here https://github.com/putridparrot/blog-projects/tree/master/ProtocolBuffers/CSharp