Hi everyone! Today I want to share some .NET 5 performance tips, each backed by a benchmark. My system:

```
BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.985 (20H2/October2020Update)
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.104
```

I report benchmark results as percentages, where 100% is the fastest result.

1. StringBuilder for concatenation

As you probably know, strings are immutable. So whenever you concatenate strings, a new string object is allocated, populated with content, and eventually garbage collected. All of that is expensive, which is why StringBuilder performs better when a string is built from many pieces, as the numbers below show.

Benchmark example:

```csharp
using System.Text;
using BenchmarkDotNet.Attributes;

public class ConcatBenchmarks // illustrative class name
{
    private static StringBuilder sb = new();

    [Benchmark] public void Concat3() => ExecuteConcat(3);
    [Benchmark] public void Concat5() => ExecuteConcat(5);
    [Benchmark] public void Concat10() => ExecuteConcat(10);
    [Benchmark] public void Concat100() => ExecuteConcat(100);
    [Benchmark] public void Concat1000() => ExecuteConcat(1000);

    [Benchmark] public void Builder3() => ExecuteBuilder(3);
    [Benchmark] public void Builder5() => ExecuteBuilder(5);
    [Benchmark] public void Builder10() => ExecuteBuilder(10);
    [Benchmark] public void Builder100() => ExecuteBuilder(100);
    [Benchmark] public void Builder1000() => ExecuteBuilder(1000);

    // Each += allocates a brand-new string and copies the previous content into it.
    public void ExecuteConcat(int size)
    {
        string s = "";
        for (int i = 0; i < size; i++)
        {
            s += "a";
        }
    }

    // Clear() resets the length without discarding the builder's buffer,
    // so repeated runs can reuse memory that was already allocated.
    public void ExecuteBuilder(int size)
    {
        sb.Clear();
        for (int i = 0; i < size; i++)
        {
            sb.Append("a");
        }
    }
}
```

Results:

| Concatenations | string | StringBuilder |
|---|---|---|
| 3 | 218% (35.21 ns) | 100% (16.09 ns) |
| 5 | 277% (66.99 ns) | 100% (24.16 ns) |
| 10 | 379% (160.69 ns) | 100% (42.37 ns) |
| 100 | 711% (2,796.63 ns) | 100% (393.12 ns) |
| 1000 | 3800% (144,100.46 ns) | 100% (3,812.22 ns) |

2. Initial size for dynamic collections

.NET provides many collections, like List<T>, Dictionary<TKey, TValue>, and HashSet<T>. All of them have a dynamic capacity: they automatically grow as you add more items. When a collection reaches its capacity limit, it allocates a new, larger memory buffer (usually an array roughly twice the size) and copies the existing items into it. That means an additional allocation, a copy, and eventually the deallocation of the old buffer.

Benchmark example:

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class CapacityBenchmarks // illustrative class name
{
    private const int Size = 1000; // number of items added in each run

    [Benchmark]
    public void ListDynamicCapacity()
    {
        // Starts with the default capacity and grows as items are added.
        List<int> list = new List<int>();
        for (int i = 0; i < Size; i++)
        {
            list.Add(i);
        }
    }

    [Benchmark]
    public void ListPlannedCapacity()
    {
        // Capacity is reserved up front, so the list never has to resize.
        List<int> list = new List<int>(Size);
        for (int i = 0; i < Size; i++)
        {
            list.Add(i);
        }
    }
}
```

In the first benchmark, the list starts with the default capacity and grows as needed. In the second, the initial capacity is set to the number of items it's going to hold. For 1000 items, the results are:

- List, dynamic capacity: 140% (2.490 us)
- List, planned capacity: 100% (1.774 us)

Benchmarks for Dictionary and HashSet:

- Dictionary, dynamic capacity: 233% (20.314 us)
- Dictionary, planned capacity: 100% (8.702 us)
- HashSet, dynamic capacity: 223% (17.004 us)
- HashSet, planned capacity: 100% (7.624 us)

3. ArrayPool for short-lived large arrays

Allocating arrays, and the inevitable de-allocation, can be quite costly. Performing these allocations at high frequency causes GC pressure and hurts performance. An elegant solution is the ArrayPool<T> class in the System.Buffers namespace. It is built into .NET Core and .NET 5; older targets such as .NET Framework get it from the System.Buffers NuGet package.

The idea is similar to the ThreadPool: a shared pool of arrays is maintained, and you reuse those arrays instead of allocating and de-allocating memory each time. The basic usage is calling ArrayPool<T>.Shared.Rent(size). This returns a regular array, which you can use any way you please.
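One detail matters when using a rented array: Rent only guarantees a minimum length, so the array you get back may be longer than requested, and it is not cleared, so it may still hold data from a previous renter. The sketch below assumes a hypothetical helper, `SumOfSquares` (not part of the original benchmarks), that only touches the first `count` slots and returns the buffer in a `finally` block so it goes back to the pool even if an exception is thrown.

```csharp
using System;
using System.Buffers;

public static class ArrayPoolSketch // hypothetical class, for illustration only
{
    // Hypothetical helper: sums i*i for i in [0, count) using a pooled buffer.
    public static long SumOfSquares(int count)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(count);
        try
        {
            // buffer.Length >= count. Only the first `count` slots are used,
            // and they are overwritten because they may hold stale values.
            for (int i = 0; i < count; i++)
            {
                buffer[i] = i * i;
            }

            long sum = 0;
            for (int i = 0; i < count; i++)
            {
                sum += buffer[i];
            }
            return sum;
        }
        finally
        {
            // Hand the buffer back to the shared pool.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        int[] rented = ArrayPool<int>.Shared.Rent(1000);
        Console.WriteLine(rented.Length >= 1000); // True
        ArrayPool<int>.Shared.Return(rented);

        Console.WriteLine(SumOfSquares(4)); // 0 + 1 + 4 + 9 = 14
    }
}
```

If the buffer held sensitive data, `Return` also accepts a `clearArray` argument, e.g. `Return(buffer, clearArray: true)`, which clears the array before it goes back to the pool.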
When finished, call ArrayPool<T>.Shared.Return(array) to return the buffer to the shared pool.

Benchmark example:

```csharp
using System.Buffers;
using BenchmarkDotNet.Attributes;

public class ArrayPoolBenchmarks // illustrative class name
{
    private const int ArraySize = 1000;

    [Benchmark]
    public void RegularArray()
    {
        // A fresh array is allocated on every call and later collected by the GC.
        int[] array = new int[ArraySize];
    }

    [Benchmark]
    public void SharedArrayPool()
    {
        // The array is borrowed from the shared pool and handed straight back.
        var pool = ArrayPool<int>.Shared;
        int[] array = pool.Rent(ArraySize);
        pool.Return(array);
    }
}
```

Result for ArraySize = 1000:

- Regular Array: 2270% (440.41 ns)
- Shared ArrayPool: 100% (19.40 ns)

4. Structs instead of classes

Structs have several benefits when it comes to deallocation:

- When structs are local variables rather than part of a class, they are typically allocated on the stack and don't require garbage collection at all.
- When structs are part of a class (or any reference type), they are stored on the heap. In that case, they are stored inline and are deallocated when the containing object is deallocated. Inline means the struct's data is stored as-is, as opposed to a reference type, where a pointer to another heap location holding the actual data is stored. This is especially meaningful in collections: a collection of structs is much cheaper to de-allocate because it's just one buffer of memory.
- Structs take less memory than a reference type because they don't have an object header and a MethodTable pointer.

Decide whether to use a struct based on the .NET guidelines for choosing between a class and a struct.

Benchmark example:

```csharp
using BenchmarkDotNet.Attributes;

class VectorClass
{
    public int X { get; set; }
    public int Y { get; set; }
}

struct VectorStruct
{
    public int X { get; set; }
    public int Y { get; set; }
}

public class StructBenchmarks // illustrative class name
{
    private const int ITEMS = 10000;

    [Benchmark]
    public void WithClass()
    {
        // The array holds references; every element is a separate heap object.
        VectorClass[] vectors = new VectorClass[ITEMS];
        for (int i = 0; i < ITEMS; i++)
        {
            vectors[i] = new VectorClass();
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public void WithStruct()
    {
        VectorStruct[] vectors = new VectorStruct[ITEMS];
        // At this point all the vector instances are already allocated with default values
        for (int i = 0; i < ITEMS; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }
}
```

Results:

- With Class: 742% (88.83 us)
- With Struct: 100% (11.97 us)

5. stackalloc for short-lived array allocations

The stackalloc keyword in C# allocates a block of memory on the stack, which makes both allocation and deallocation very fast: the memory is released automatically when the method returns, without any GC involvement. The element type must be an unmanaged type, so primitives and structs without reference-type fields work, but classes don't. Stack space is limited, so keep these allocations small.

Benchmark example:

```csharp
using System;
using BenchmarkDotNet.Attributes;

struct VectorStruct
{
    public int X { get; set; }
    public int Y { get; set; }
}

public class StackAllocBenchmarks // illustrative class name
{
    [Benchmark]
    public void WithNew()
    {
        // Heap-allocated array, collected by the GC later.
        VectorStruct[] vectors = new VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public unsafe void WithStackAlloc() // Note that an unsafe context is required (AllowUnsafeBlocks)
    {
        VectorStruct* vectors = stackalloc VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public void WithStackAllocSpan() // When using Span, no unsafe context is needed
    {
        Span<VectorStruct> vectors = stackalloc VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }
}
```

Results:

- With New: 303% (10.870 ns)
- With StackAlloc: 102% (3.643 ns)
- With StackAllocSpan: 100% (3.580 ns)

6. ConcurrentQueue<T> instead of ConcurrentBag<T>

Never use ConcurrentBag<T> without benchmarking.
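Benchmarking both options is easier when the code that fills and drains the collection doesn't depend on the concrete type. ConcurrentBag<T> and ConcurrentQueue<T> both implement `IProducerConsumerCollection<T>` from System.Collections.Concurrent, whose `TryAdd`/`TryTake` map to `Add`/`TryTake` on the bag and `Enqueue`/`TryDequeue` on the queue. A minimal sketch, assuming hypothetical `Fill` and `Drain` helpers (not from the original benchmarks) written once against the interface:

```csharp
using System;
using System.Collections.Concurrent;

public static class CollectionSwapSketch // hypothetical class, for illustration only
{
    // Hypothetical helper: adds 0..size-1 and returns the resulting count.
    public static int Fill(IProducerConsumerCollection<int> collection, int size)
    {
        for (int i = 0; i < size; i++)
        {
            // For ConcurrentBag<T> and ConcurrentQueue<T>, TryAdd always succeeds.
            collection.TryAdd(i);
        }
        return collection.Count;
    }

    // Hypothetical helper: removes every item and returns their sum.
    public static long Drain(IProducerConsumerCollection<int> collection)
    {
        long sum = 0;
        while (collection.TryTake(out int item))
        {
            sum += item;
        }
        return sum;
    }

    public static void Main()
    {
        var bag = new ConcurrentBag<int>();
        var queue = new ConcurrentQueue<int>();

        Console.WriteLine(Fill(bag, 1000));   // 1000
        Console.WriteLine(Drain(bag));        // 499500
        Console.WriteLine(Fill(queue, 1000)); // 1000
        Console.WriteLine(Drain(queue));      // 499500
    }
}
```

With both variants behind one code path, it is straightforward to check whether ConcurrentBag<T> actually pays off for a given workload.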
ConcurrentBag<T> has been designed for very specific use cases (when, most of the time, an item is taken out by the same thread that added it) and suffers from significant performance issues when used otherwise. If you need a concurrent collection, prefer ConcurrentQueue<T>.

Benchmark example (both collections are filled from a single thread):

```csharp
using System.Collections.Concurrent;
using BenchmarkDotNet.Attributes;

public class ConcurrentCollectionBenchmarks // illustrative class name
{
    private static int Size = 1000;

    [Benchmark]
    public void Bag()
    {
        ConcurrentBag<int> bag = new();
        for (int i = 0; i < Size; i++)
        {
            bag.Add(i);
        }
    }

    [Benchmark]
    public void Queue()
    {
        ConcurrentQueue<int> queue = new();
        for (int i = 0; i < Size; i++)
        {
            queue.Enqueue(i);
        }
    }
}
```

Results:

- ConcurrentBag: 165% (24.21 us)
- ConcurrentQueue: 100% (14.64 us)

P.S. Thanks for reading! More benchmarks coming soon! Special thanks to Michael's Coding Spot for the ideas.