There exist various articles on benchmarks of serializers already. Even without articles attached to them, various GitHub projects exist. This goes for every language, a testament to how much performance matters to some people. However, it's incredibly hard to create reproducible, consistent benchmarks. That's not so surprising; as Hanselman points out, there are many factors to control for:

- CPU affinity/process priority
- Other running processes on your test machine
- If present, handling garbage collection without skewing results
- Using the correct time measuring calls
- Using the frameworks/libraries under test properly
- Taking result outliers into account
- Displaying the results in an easy to interpret way

And probably some other factors that I've missed. It's no surprise, then, that many existing benchmarks can be criticised for missing one or more of these points. Let's take a look at some.

GLD.SerializerBenchmark

Even though it's been superseded by serbench, I wanted to give it some attention, because it illustrates a lot of oversights even though at first glance it seems to be a valuable source of information.

The first thing you notice with GLD.SerializerBenchmark is a build error: it apparently needs a manual DLL download, as described by a comment in the code. After fixing that, don't forget to pass "100" as a program argument, as stated in the article. This produces the following output (full output here).

As you can see, not all tests run successfully. There are plenty of exceptions and failing checks, making comparison with other benchmarks, or even within its own results, harder. Not to mention that the article doesn't give the full output to compare against.

Aside from problems running it, there are a couple of measuring problems I noticed:

- It sets neither CPU affinity nor process priority.
- In the Jil serializer, the options that can be passed in could be given a default, as done here.
- Also in the Jil serializer, constructing a StreamWriter is not necessary; it's only supported in case you need it for something like WebAPI.
- Even though the parent class contains an overridable Initialize member, it's never actually overridden, and sometimes it's even used during the serialize call. Serializers such as Jil use heavy reflection, but only on the first call, making the Initialize function crucial to get proper results (see the sketch after this list).
- The NetJSON serializer contains an unnecessary extra StringWriter.
- The garbage collector isn't run between every test run, leading to possible garbage collections during crucial code execution.
- All results are averaged, making it impossible to see what the minimum and maximum values were; nor is it obvious how many tests fell within a certain range (i.e. how much jitter there is).

There has been a lot of work to add a lot, though not all, of the serializers used in the .NET environment. But I wouldn't trust this benchmark to output very consistent measurements.
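To illustrate why that first call matters: Jil generates and caches a serializer for each type via reflection and code generation the first time it sees that type. The sketch below is my own illustration (the Person class is a made-up stand-in, not GLD's data object) of how different a cold call and a warm call can be, and why warm-up belongs outside the measured region:

```csharp
using System;
using System.Diagnostics;
using Jil;

public class Person
{
    public string Name { get; set; }
    public DateTime BirthDate { get; set; }
}

public static class WarmupDemo
{
    public static void Main()
    {
        var person = new Person { Name = "Ada", BirthDate = DateTime.UtcNow };
        var stopwatch = Stopwatch.StartNew();

        // First call: Jil builds and caches the serializer for Person,
        // so this measurement includes the one-time reflection/codegen cost.
        JSON.Serialize(person);
        Console.WriteLine($"cold: {stopwatch.Elapsed.TotalMilliseconds} ms");

        stopwatch.Restart();
        JSON.Serialize(person); // Subsequent calls hit the cached serializer.
        Console.WriteLine($"warm: {stopwatch.Elapsed.TotalMilliseconds} ms");
    }
}
```

Any benchmark that folds that first call into its average is partly measuring code generation rather than serialization.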
Serbench

GLD.SerializerBenchmark's README points to this project. However, at the time of writing, some of the shortcomings of the GLD project also exist in Serbench: the Jil serializer contains the same unnecessary StringReader/StringWriter, Jil is still not initialized properly, and CPU affinity and process priority are not set.

It does seem to collect the garbage before every run, as well as having removed the duplicate StringWriter, given Jil a new Options object every run, and added the Apolyton.FastJson reference directly to the repository. Despite that, there is no mention of a memory profile having been done to ascertain whether collection happens exclusively when induced, or whether it also happens during test runs. Also, serbench creates an extra thread upon which to run all the tests, which strikes me as odd.

However, I couldn't get serbench to work. Apparently it requires something called NFX, which is capable of writing the results into a variety of outputs, such as an RDBMS. It seems they're aiming for a benchmark that emulates running serializers in parallel, supposedly making it more real-life than the synthetic benchmarks usually shown. While chain-benchmarking would certainly be interesting, I'm quite skeptical as to what they're trying to prove. Isolating serializers in synthetic benchmarks makes them easily comparable, but once you introduce a whole chain of software stacks, you're bound to run into the issue that everyone has their own combination. Soon, you'll have to create an all-encompassing benchmark suite that runs on multiple platforms and has pluggable frameworks and pluggable serializers.

Admittedly, I did not spend too much time getting it to work, but I do expect to be able to open the project, press start, and have it work. I hope they'll make it easier to use in the future.

SimpleSpeedTest

While not a complete benchmark for a vast array of serializers, it provides a relatively easy setup to create one. One of the examples shows how to use it, and in it you can see that it doesn't have the smaller oversights found in the GLD and serbench projects. It only uses streams where necessary, though it doesn't compare the same library with and without streams. It also does garbage collection, but like GLD, it averages all the results into one number.

While the results will be reasonably trustworthy, here too I am missing CPU affinity and process priority. I can see this working for a quick one-on-one comparison, but not for a full benchmark suite.

A Better Comparison?

I'm sure I haven't looked at all possible benchmark software for serializers in the .NET environment; my time is limited, after all. However, I did create a project which aims to account for as many factors as possible, for a wide variety of serializers, in an isolated, synthetic benchmark style.

Hardware & Software

As is usual in proper benchmarks, to minimize variance and increase reproducibility, one states the hardware and software used. As you can see in these screenshots, I run all tests on a non-overclocked i5-4570 with 8 GB of RAM at the following clock/timings.

As for the software, I'm running 64-bit Windows 10 Professional with the latest updates, all applications except File Explorer closed, and automatic updates and all privacy-sensitive settings disabled. I noticed that automatic updates easily take up all your disk I/O and one whole CPU core while downloading and installing updates in the background; that would not be viable for a benchmark.

I've used Visual Studio 2017 (not an update version) with .NET Framework 4.6.2 to compile the C# solution. For the libraries used in C#, see the packages.config. For NodeJS I used version 7.7.4. For C++ I used Visual Studio 2017 (not an update version) with Cereal 1.2.2 (which uses rapidjson and rapidxml) and protobuf 3.2.0 (the static library can be found on the repository).

Methodology

What most other benchmarks do is create a couple of objects to serialize and deserialize, run those a number of times in a row, and calculate the average. While this gives you a representation of the total time required, it does lose some valuable data. For this project, I want to create a moderately large object, measure serialization and deserialization separately, and store each single run in a list of measurements. This way I can create an OHLC graph. However, I'm going to alter the definition of the various points: high and low will be the highest and lowest time measured respectively, but the open point will be the 20/100th measurement and the close point will be the 80/100th measurement.

On the left you can see an example with 250 repetitions. The fastest sample (L for low) is 2117 µs, the slowest sample (H for high) is 2888 µs. All measurements are sorted from lowest to highest; the 20/100th measurement (O) is measurement #50, which is 2117 µs, and the 80/100th measurement (C) is measurement #200, which is 2302 µs. You can see that most of the samples (60% of them) are in the 2117–2302 µs range, with only a couple of outliers below that but more outliers above it. This way, you have information about how jittery/consistent the library is, as well as a general idea of how the library will perform.
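To make that definition concrete, here is a small sketch (a hypothetical helper, not the project's actual code) of how the four OHLC values can be derived from the list of per-run timings:

```csharp
using System.Collections.Generic;
using System.Linq;

static class OhlcStats
{
    // Low  = fastest run, High = slowest run,
    // Open = the 20/100th measurement, Close = the 80/100th measurement.
    // With 250 samples, Open is the sample at index 50 and Close at index 200.
    public static (double Open, double High, double Low, double Close)
        FromTimings(IEnumerable<double> microseconds)
    {
        double[] sorted = microseconds.OrderBy(t => t).ToArray();
        return (Open:  sorted[sorted.Length * 20 / 100],
                High:  sorted[sorted.Length - 1],
                Low:   sorted[0],
                Close: sorted[sorted.Length * 80 / 100]);
    }
}
```

Plotting these four values per library gives the candlestick-style charts used in the results below.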
Further, all benchmark processes will be run on CPU #0 (so CPU affinity is set) with process priority High, and will be run as administrator, so that the process itself is allowed to apply the previous two settings. As a last step, I'll profile the memory to see if garbage collections strictly occur when I want them to and not during a test.

My first test will be Jil in various setups (normal, streaming, with and without attributes on the data object class, with and without options) on x86 and x64. My second test will be a few JSON serializers in C# on x86 and x64. My third test will be a few binary serializers in C# on x86 and x64. My fourth test will be comparing a couple of serializers in C# to ones in C++ and NodeJS.

Code

Just to make sure I've done it correctly, I'd like to run you through some of my code. The first thing the program does is set affinity and priority (full code here).

Then there is the code used to run a single measurement. .NET 4.6 introduces the GC.TryStartNoGCRegion function, which allows you to tell the garbage collector to pre-allocate memory and to try not to run garbage collection until you end the region. I try to allocate 1 MB before calling the action (full code here).

Each test is warmed up, so we don't measure cold startup time; technically, this warm-up is the only "measurement" that is thrown out for all tests. Then the tests are run for 250 repetitions, which is hard-coded (full code here).

For all tests, I create an object with 1000 documents. I tried to make it a representative object, containing datetimes, UTF-8 strings and an integer. I realise that many more combinations are possible, but I'm not sure they would add much. Of course, there are some variations of the specific action, such as when streams are required, when a specific file with preloaded json/xml/binary contents is read into memory before running the test, or when the library requires a different type of Person/Document, but this is the basic structure for all types.
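Since the linked code isn't inlined in this article, here is a condensed sketch of the shape of that harness, using the .NET APIs mentioned above (Process.ProcessorAffinity, ProcessPriorityClass, GC.TryStartNoGCRegion); the names Harness, PinProcess and RunTest are illustrative, not the project's actual identifiers:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Harness
{
    static void Main()
    {
        PinProcess();
        // Example: time a no-op action 250 times.
        List<double> timings = RunTest(() => { });
        Console.WriteLine($"fastest run: {timings.Min()} µs");
    }

    // Pin the benchmark to CPU #0 and raise priority. The priority
    // change is the reason the benchmark runs as administrator.
    static void PinProcess()
    {
        var process = Process.GetCurrentProcess();
        process.ProcessorAffinity = (IntPtr)0x1;           // CPU #0 only
        process.PriorityClass = ProcessPriorityClass.High;
    }

    // Times `action` `repetitions` times and returns per-run microseconds.
    static List<double> RunTest(Action action, int repetitions = 250)
    {
        action(); // Warm-up run: pays the one-time reflection cost, discarded.

        var results = new List<double>(repetitions);
        var stopwatch = new Stopwatch();
        for (int i = 0; i < repetitions; i++)
        {
            GC.Collect();                  // Collect *between* runs, not during.
            GC.WaitForPendingFinalizers();
            bool noGc = GC.TryStartNoGCRegion(1024 * 1024); // Pre-allocate ~1 MB.

            stopwatch.Restart();
            action();
            stopwatch.Stop();

            if (noGc) GC.EndNoGCRegion();  // Throws if a GC already ended the region.
            results.Add(stopwatch.Elapsed.TotalMilliseconds * 1000.0);
        }
        return results;
    }
}
```

If the action allocates more than the pre-allocated budget, the garbage collector ends the no-GC region on its own, which is exactly what the memory profile at the end is meant to rule out.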
Results — Test #1

Click for bigger view. Ser = serialization test, Des = deserialization test, StrSer = stream serialization, StrDes = stream deserialization. x86 run; be aware of the Y-axis not starting at 0.

Click for bigger view. Same legend; x64 run; be aware of the Y-axis not starting at 0.

I apologise for not letting the Y-axis start at 0; I am using Live Charts and I have yet to find out how to change that. If you know how, let me know!

The first thing you see in these graphs is that Jil is fast and consistent, especially for a JSON serializer: most of the calls are under 1 ms. The consistency is probably due to it being fast, as we'll see in the next tests. Second, stream serialization slows it down, but a StringWriter apparently speeds it up. Third, "with attributes" means that the data object was created with DataContract and Serializable attributes on the class. But what actually happens is that Jil is unable to recognize some of the DataContract attributes, which leads to datetimes not being serialized/deserialized; the speedup you see in the graph is completely attributable to that. Lastly, deserialization with streams is a lot slower than without, which is a shame, because when using Jil with WebAPI, the stream API is used instead of the direct version.

The profile will be at the end, since all of the results are from one run of the benchmark.

Results — Test #2

Click for bigger view. Same legend; x86 run; be aware of the Y-axis not starting at 0.

Click for bigger view. Same legend; x64 run; be aware of the Y-axis not starting at 0.

This is where it starts to get interesting! One of the findings of GLD.SerializerBenchmark was that NetJSON was faster than Jil, but not in this benchmark! Except for x64 serialization, Jil is faster by a pretty noticeable margin. And even in the x64 serialization case, NetJSON is only ~75 µs faster, whereas with x64 deserialization NetJSON is 374 µs slower per call. Newtonsoft.Json is 2–3x as slow as either Jil or NetJSON, but the real slowpoke is the DataContractJsonSerializer deserializer, which comes with the .NET Framework. It is about 6–8x slower than Jil and NetJSON, as well as being the least consistent of all the frameworks. I have no explanation as to why, though.

Results — Test #3

Click for bigger view. Same legend; x86 run; be aware of the Y-axis not starting at 0.

Click for bigger view. Same legend; x64 run; be aware of the Y-axis not starting at 0.

As I had expected, binary serializers can be even faster than JSON serializers; they don't have to parse text, after all. But what really surprised me was how incredibly fast ZeroFormatter is. Its bold claims on GitHub are indeed no lie: it's faster than Hyperion (the successor to Wire) and protobuf. If you compare Hyperion or protobuf to Jil or NetJSON, you won't find much speed difference, in C# at least. MsgPack is a bit slower, but you really want to steer clear of BinaryFormatter.
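For context on what a binary test action looks like, here is a hedged sketch of ZeroFormatter usage; the Person class below is a made-up stand-in, not the benchmark's actual data object. ZeroFormatter requires classes to be marked [ZeroFormattable] with virtual, explicitly indexed properties, which is part of how it can defer deserialization work:

```csharp
using System;
using ZeroFormatter;

// Virtual properties with explicit indexes let ZeroFormatter generate
// lazy accessors over the raw byte[] instead of eagerly parsing it.
[ZeroFormattable]
public class Person
{
    [Index(0)] public virtual string Name { get; set; }
    [Index(1)] public virtual DateTime BirthDate { get; set; }
    [Index(2)] public virtual int Age { get; set; }
}

public static class Example
{
    public static void Main()
    {
        var person = new Person { Name = "Ada", BirthDate = DateTime.UtcNow, Age = 36 };

        // The serialize/deserialize pair a test action would time.
        byte[] bytes = ZeroFormatterSerializer.Serialize(person);
        Person roundTripped = ZeroFormatterSerializer.Deserialize<Person>(bytes);
    }
}
```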
Results — Test #4

Click for bigger view. Ser = serialization test, Des = deserialization test, StrSer = stream serialization, StrDes = stream deserialization. C++ x64 run; be aware of the Y-axis not starting at 0.

Click for bigger view. Same legend; C++ x64 run.

Click for bigger view. Same legend; NodeJS x64 run; be aware of the Y-axis not starting at 0.

These last three graphs are about serializers in C++, C++ and Node.JS respectively. What's interesting about the first graph is that I had expected JSON and XML serialization in C++ to be faster than in C#. But somehow, cereal's JSON serialization is as slow as DataContractJsonSerializer's deserialization. The deserialization is incredibly fast, though. I'm not sure if I found a performance bug or if I did something wrong.

That's why, in the second graph, I removed JSON and XML, and there you can see that protobuf is indeed faster in C++. More importantly, since C++ has no garbage collector to interfere, the results are incredibly consistent. The second graph might make it look equally jittery as C#, but it's actually all within a few hundred µs.

The third and last graph shows the results of a couple of Node.JS serializers: the default one, of course, and a couple that claim to be faster. The thing is, though, I think I'm dealing with some problems getting an accurate time in JS. Node.JS supports hrtime, which should be accurate, but still my results are all over the place. I'm not sure I can call the Node.JS benchmark accurate.

Results — Profiling

To be sure that my C# benchmarks were not hindered by garbage collection, I did a memory profile.

Garbage Collection

As you can see, garbage collection only happens when GC.Collect() is called.

CPU Profile

In the CPU profile, most of the time is spent in P/Invoke calls. I'd hazard a guess that it's the garbage collector work I'm inducing, since a GC.Collect every 4 milliseconds does add up. Otherwise, the CPU time goes to the serialization libraries.

Raw measurements can be found here: C# x86, C# x64, C++ and Node.JS.

Conclusion

In this (rather long) article, I have shown that most benchmarks miss some steps when doing measurements, that doing measurements correctly is hard, and that binary serialization definitely is faster than JSON serialization. While I have undoubtedly not tested with a data object that resembles whatever you're going to use in your software, I think it's safe to say that Jil and NetJSON are the fastest JSON serializers I tested for C#, and ZeroFormatter definitely is the fastest binary serializer I tested. But if you really, REALLY want every ounce of performance, you still have to use C++ or another statically compiled language.

If you've stuck with me this far: thank you for reading! I hope you had as much fun reading this as I had making it. If you have any questions, don't hesitate to ask.