Aiming for the Fastest Generic Specialization

NOTE: This is a backup of an article I previously posted on Qiita.


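I've been doing more generic programming lately, and I often find myself wanting to write an optimal implementation per type, the way C++ template specialization allows. C# generics can also implement type-specific handling if you lean on type checks, but there are so many ways to do it that I have never really known which one is the sound choice.

If you don't know, the only option is to measure, so I tried a number of things.

What we're testing

For now, let's specialize addition. Consider the following class, which holds a single value of type T.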

public partial class Container<T>
{
    public T Value { get; }

    public Container(T value) => Value = value;
}

When T is an addable type, we want addition to be available for two instances of this class as well.

partial class Container<T>
{
    public static Container<T> Add(Container<T> lhs, Container<T> rhs)
        => throw new NotImplementedException();
}

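In this article we look for the approach that makes this Add method run fastest. Strictly speaking, lhs and rhs need non-null guards, but since this is a benchmark I leave them out.

As for T, we consider primitive types, structs, and sealed, non-null classes. Classes that are inheritable or nullable make type checks a real pain to handle, so they are out of scope this time.

For primitives, we'll try int and double for now. For structs and classes we use the types below. All of them are simple wrappers around a primitive.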

public struct IntStruct
{
    public readonly int Value;
    
    public IntStruct(int value) => Value = value;

    public static IntStruct operator+(IntStruct lhs, IntStruct rhs)
        => new IntStruct(lhs.Value + rhs.Value);
}

public struct DoubleStruct
{
    public readonly double Value;

    public DoubleStruct(double value) => Value = value;

    public static DoubleStruct operator +(DoubleStruct lhs, DoubleStruct rhs)
        => new DoubleStruct(lhs.Value + rhs.Value);
}


public sealed class IntClass
{
    public readonly int Value;

    public IntClass(int value) => Value = value;

    public static IntClass operator +(IntClass lhs, IntClass rhs)
        => new IntClass(lhs.Value + rhs.Value);
}


public sealed class DoubleClass
{
    public readonly double Value;

    public DoubleClass(double value) => Value = value;

    public static DoubleClass operator +(DoubleClass lhs, DoubleClass rhs)
        => new DoubleClass(lhs.Value + rhs.Value);
}

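We benchmark these six types of T, grouped into three cases (primitive, struct, class).

The specialization techniques

1. Generic static Strategy

In C#, closed types with different type arguments are treated as entirely distinct types [1]. In other words, the static members of a generic class get separate storage for each distinct type argument.

That makes specialization easy: keep a Strategy pattern implementation in a static field. The .NET standard library uses the same approach in familiar places such as Comparer<T>.Default.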
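To make the per-closed-type storage concrete, here is a minimal sketch. Counter<T> and ClosedTypeStaticsDemo are hypothetical names that do not appear in the benchmark; the sketch relies only on standard C# semantics.

```csharp
using System;

// Hypothetical helper: every closed type Counter<T> owns its own static field.
public static class Counter<T>
{
    public static int Count;
}

public static class ClosedTypeStaticsDemo
{
    public static void Main()
    {
        Counter<int>.Count += 1;
        Counter<int>.Count += 1;
        Counter<double>.Count += 1;

        // Counter<int> and Counter<double> are distinct types with distinct statics.
        Console.WriteLine(Counter<int>.Count);    // 2
        Console.WriteLine(Counter<double>.Count); // 1
        Console.WriteLine(Counter<string>.Count); // 0 (never touched)
    }
}
```

The static Strategy approach stores an implementation object in such a per-type slot instead of a counter.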

First, prepare a Strategy like the one below. For each type to be specialized, Default is properly initialized.

public interface IArithmetic<T>
{
    T Add(T lhs, T rhs);
}

public class Arithmetic
    : IArithmetic<int>
    , IArithmetic<double>
    , IArithmetic<IntStruct>
    , IArithmetic<DoubleStruct>
    , IArithmetic<IntClass>
    , IArithmetic<DoubleClass>
{
    static Arithmetic()
    {
        var instance = new Arithmetic();
        Arithmetic<int>         .Default = instance;
        Arithmetic<double>      .Default = instance;
        Arithmetic<IntStruct>   .Default = instance;
        Arithmetic<DoubleStruct>.Default = instance;
        Arithmetic<IntClass>    .Default = instance;
        Arithmetic<DoubleClass> .Default = instance;
    }

    internal static void Initialize() {}

    public int          Add(int          lhs, int          rhs) => lhs + rhs;
    public double       Add(double       lhs, double       rhs) => lhs + rhs;
    public IntStruct    Add(IntStruct    lhs, IntStruct    rhs) => lhs + rhs;
    public DoubleStruct Add(DoubleStruct lhs, DoubleStruct rhs) => lhs + rhs;
    public IntClass     Add(IntClass     lhs, IntClass     rhs) => lhs + rhs;
    public DoubleClass  Add(DoubleClass  lhs, DoubleClass  rhs) => lhs + rhs;
}

public static class Arithmetic<T>
{
    public static IArithmetic<T> Default { get; internal set; }

    static Arithmetic() => Arithmetic.Initialize();
}

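It relies on a slightly sneaky trick, a dummy member plus static constructors, to make sure initialization happens exactly once, but the intent should mostly come across.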
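The initialization trick can be isolated in a small sketch. Registry and Slot<T> are hypothetical stand-ins for Arithmetic and Arithmetic<T>, and InitCount exists only to observe how often the non-generic static constructor runs.

```csharp
using System;

// Hypothetical stand-in for the non-generic Arithmetic class.
public static class Registry
{
    public static int InitCount;

    // Runs once, the first time any member of Registry is touched.
    static Registry()
    {
        InitCount++;
        Slot<int>.Name = "int";
        Slot<double>.Name = "double";
    }

    // Dummy member: calling it does nothing except force the static constructor.
    internal static void Initialize() { }
}

// Hypothetical stand-in for Arithmetic<T>.
public static class Slot<T>
{
    public static string Name { get; internal set; }

    // Touching any Slot<T> guarantees that Registry has filled the known slots.
    static Slot() => Registry.Initialize();
}

public static class StaticInitDemo
{
    public static void Main()
    {
        Console.WriteLine(Slot<int>.Name);            // int
        Console.WriteLine(Slot<double>.Name);         // double
        Console.WriteLine(Slot<string>.Name == null); // True (not registered)
        Console.WriteLine(Registry.InitCount);        // 1
    }
}
```

Unregistered types simply keep a null slot, which is why a caller that might see an unsupported T would need to check for null before using it.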

Container<T> simply uses Arithmetic<T>.Default. In terms of design-pattern elegance, this is a pretty good approach.

public partial class Container<T>
{
    public static Container<T> AddByStaticStrategy(Container<T> lhs, Container<T> rhs)
        => new Container<T>(Arithmetic<T>.Default.Add(lhs.Value, rhs.Value));
}

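2. Type switch on the whole container

Type switches (pattern matching in switch) arrived in C# 7, which made branching on types easy to write. So the following naive type-branching implementation comes to mind.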

public partial class Container<T>
{
    public static Container<T> AddByContainerTypeSwitch(Container<T> lhs, Container<T> rhs)
    {
        switch(lhs)
        {
        case Container<int> intL:
            {
                var r = rhs as Container<int>;
                return new Container<int>(intL.Value + r.Value) as Container<T>;
            }

        // ......
        }
        throw new Exception();
    }
}

The full implementation, which is needlessly long:

public partial class Container<T>
{
    public static Container<T> AddByContainerTypeSwitch(Container<T> lhs, Container<T> rhs)
    {
        switch(lhs)
        {
        case Container<int> intL:
            {
                var r = rhs as Container<int>;
                return new Container<int>(intL.Value + r.Value) as Container<T>;
            }
        case Container<double> doubleL:
            {
                var r = rhs as Container<double>;
                return new Container<double>(doubleL.Value + r.Value) as Container<T>;
            }
        case Container<IntStruct> intStructL:
            {
                var r = rhs as Container<IntStruct>;
                return new Container<IntStruct>(intStructL.Value + r.Value) as Container<T>;
            }
        case Container<DoubleStruct> doubleStructL:
            {
                var r = rhs as Container<DoubleStruct>;
                return new Container<DoubleStruct>(doubleStructL.Value + r.Value) as Container<T>;
            }
        case Container<IntClass> intClassL:
            {
                var r = rhs as Container<IntClass>;
                return new Container<IntClass>(intClassL.Value + r.Value) as Container<T>;
            }
        case Container<DoubleClass> doubleClassL:
            {
                var r = rhs as Container<DoubleClass>;
                return new Container<DoubleClass>(doubleClassL.Value + r.Value) as Container<T>;
            }
        }
        throw new Exception();
    }
}

Note that this approach can no longer be written straightforwardly if T is allowed to be an inheritable class.
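The reason is that generic classes are invariant: a container of a derived type is not a container of its base type, so a case written for the base never matches. A minimal sketch, where Animal, Dog, and Box<T> are hypothetical types used only for illustration:

```csharp
using System;

public class Animal { }
public class Dog : Animal { }

// Hypothetical stand-in for Container<T>.
public class Box<T>
{
    public T Value { get; }
    public Box(T value) => Value = value;
}

public static class InvarianceDemo
{
    // Same shape as `case Container<int> intL:`, but with an inheritable class.
    public static bool IsBoxOfAnimal(object box) => box is Box<Animal>;

    public static void Main()
    {
        Console.WriteLine(IsBoxOfAnimal(new Box<Animal>(new Animal()))); // True
        Console.WriteLine(IsBoxOfAnimal(new Box<Dog>(new Dog())));       // False
    }
}
```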

3. Type switch on the value

Alternatively, we can apply the type switch to just the contents rather than the whole container.

public partial class Container<T>
{
    public static Container<T> AddByValueTypeSwitch(Container<T> lhs, Container<T> rhs)
    {
        switch(lhs.Value)
        {
        case int intL:
            {
                if(rhs.Value is int r)
                    return new Container<int>(intL + r) as Container<T>;
                break;
            }

        // ......
        }
        throw new Exception();
    }
}

Full implementation (again needlessly long):

public partial class Container<T>
{
    public static Container<T> AddByValueTypeSwitch(Container<T> lhs, Container<T> rhs)
    {
        switch(lhs.Value)
        {
        case int intL:
            {
                if(rhs.Value is int r)
                    return new Container<int>(intL + r) as Container<T>;
                break;
            }
        case double doubleL:
            {
                if(rhs.Value is double r)
                    return new Container<double>(doubleL + r) as Container<T>;
                break;
            }
        case IntStruct intStructL:
            {
                if(rhs.Value is IntStruct r)
                    return new Container<IntStruct>(intStructL + r) as Container<T>;
                break;
            }
        case DoubleStruct doubleStructL:
            {
                if(rhs.Value is DoubleStruct r)
                    return new Container<DoubleStruct>(doubleStructL + r) as Container<T>;
                break;
            }
        case IntClass intClassL:
            {
                if(rhs.Value is IntClass r)
                    return new Container<IntClass>(intClassL + r) as Container<T>;
                break;
            }
        case DoubleClass doubleClassL:
            {
                if(rhs.Value is DoubleClass r)
                    return new Container<DoubleClass>(doubleClassL + r) as Container<T>;
                break;
            }
        }
        throw new Exception();
    }
}

Unlike approach 2, this one can cope with types derived from T, but in exchange it does not work correctly when null comes in.
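The null problem comes from how type patterns work: a type pattern never matches a null reference, even when the static type of the variable fits. A minimal sketch, where Classify is a hypothetical helper with the same shape as the switch above:

```csharp
using System;

public static class NullPatternDemo
{
    // Hypothetical helper: same shape as `switch (lhs.Value) { case IntClass intClassL: ... }`.
    public static string Classify(object value)
    {
        switch (value)
        {
            case string s:
                return "string of length " + s.Length;
            case int i:
                return "int " + i;
            default:
                return "no type pattern matched";
        }
    }

    public static void Main()
    {
        string text = null;
        Console.WriteLine(Classify("abc")); // string of length 3
        Console.WriteLine(Classify(42));    // int 42
        Console.WriteLine(Classify(text));  // no type pattern matched
    }
}
```

In AddByValueTypeSwitch, a null Value therefore falls through every case and ends at the final throw.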

4. Plain comparison with typeof

In C#, obtaining metadata via reflection is very easy [2]. Naturally, comparing types with typeof is also an option.

public partial class Container<T>
{
    public static Container<T> AddByTypeOf(Container<T> lhs, Container<T> rhs)
    {
        if(typeof(T) == typeof(int))
        {
            var l = lhs as Container<int>;
            var r = rhs as Container<int>;
            return new Container<int>(l.Value + r.Value) as Container<T>;
        }

        // ......

        throw new Exception();
    }
}

Full implementation:

public partial class Container<T>
{
    public static Container<T> AddByTypeOf(Container<T> lhs, Container<T> rhs)
    {
        if(typeof(T) == typeof(int))
        {
            var l = lhs as Container<int>;
            var r = rhs as Container<int>;
            return new Container<int>(l.Value + r.Value) as Container<T>;
        }

        if(typeof(T) == typeof(double))
        {
            var l = lhs as Container<double>;
            var r = rhs as Container<double>;
            return new Container<double>(l.Value + r.Value) as Container<T>;
        }

        if(typeof(T) == typeof(IntStruct))
        {
            var l = lhs as Container<IntStruct>;
            var r = rhs as Container<IntStruct>;
            return new Container<IntStruct>(l.Value + r.Value) as Container<T>;
        }

        if(typeof(T) == typeof(DoubleStruct))
        {
            var l = lhs as Container<DoubleStruct>;
            var r = rhs as Container<DoubleStruct>;
            return new Container<DoubleStruct>(l.Value + r.Value) as Container<T>;
        }

        if(typeof(T) == typeof(IntClass))
        {
            var l = lhs as Container<IntClass>;
            var r = rhs as Container<IntClass>;
            return new Container<IntClass>(l.Value + r.Value) as Container<T>;
        }

        if(typeof(T) == typeof(DoubleClass))
        {
            var l = lhs as Container<DoubleClass>;
            var r = rhs as Container<DoubleClass>;
            return new Container<DoubleClass>(l.Value + r.Value) as Container<T>;
        }

        throw new Exception();
    }
}

This one also has a hard time supporting types derived from T. It can be done, but I suspect it would hurt performance badly.
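One way the derived-type variant could look is to replace the equality check with Type.IsAssignableFrom. This is only a sketch of that idea, not something measured in this article; Shape, Circle, and TypeCheck<T> are hypothetical names.

```csharp
using System;

public class Shape { }
public sealed class Circle : Shape { }

public static class TypeCheck<T>
{
    // Exact match: the kind of check AddByTypeOf performs.
    public static bool IsExactly<U>() => typeof(T) == typeof(U);

    // Hypothetical relaxed check that also accepts types derived from U.
    public static bool IsOrDerivesFrom<U>() => typeof(U).IsAssignableFrom(typeof(T));
}

public static class TypeOfDemo
{
    public static void Main()
    {
        Console.WriteLine(TypeCheck<int>.IsExactly<int>());            // True
        Console.WriteLine(TypeCheck<Circle>.IsExactly<Shape>());       // False
        Console.WriteLine(TypeCheck<Circle>.IsOrDerivesFrom<Shape>()); // True
    }
}
```

Even with the relaxed check, the `lhs as Container<Shape>` cast would still fail for a Container<Circle>, so the method body would need a different way to reach the value.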

5. Ldftn + Calli

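Someone has put this into practice in a separate article: reportedly, invoking a function pointer directly can buy you something.

At the time of writing this cannot be expressed in C#, so I write it out in MSIL.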

.assembly extern mscorlib
{}

.assembly extern GenericSpecializationBenchmark.Core
{}

.assembly GenericSpecializationBenchmark.Unsafe
{}

.module GenericSpecializationBenchmark.Unsafe.dll


.class private auto ansi abstract sealed FastArithmetic
    extends [mscorlib]System.Object
{
    .method private hidebysig specialname rtspecialname static 
        void .cctor () cil managed 
    {
        .maxstack 8

        ldftn int32 FastArithmetic::Add(int32, int32)
        stsfld void* class FastArithmetic`1<int32>::_fptrAdd

        ldftn float64 FastArithmetic::Add(float64, float64)
        stsfld void* class FastArithmetic`1<float64>::_fptrAdd

        ldftn valuetype [GenericSpecializationBenchmark.Core]IntStruct FastArithmetic::Add(valuetype [GenericSpecializationBenchmark.Core]IntStruct, valuetype [GenericSpecializationBenchmark.Core]IntStruct)
        stsfld void* class FastArithmetic`1<valuetype [GenericSpecializationBenchmark.Core]IntStruct>::_fptrAdd

        ldftn valuetype [GenericSpecializationBenchmark.Core]DoubleStruct FastArithmetic::Add(valuetype [GenericSpecializationBenchmark.Core]DoubleStruct, valuetype [GenericSpecializationBenchmark.Core]DoubleStruct)
        stsfld void* class FastArithmetic`1<valuetype [GenericSpecializationBenchmark.Core]DoubleStruct>::_fptrAdd

        ldftn class [GenericSpecializationBenchmark.Core]IntClass FastArithmetic::Add(class [GenericSpecializationBenchmark.Core]IntClass, class [GenericSpecializationBenchmark.Core]IntClass)
        stsfld void* class FastArithmetic`1<class [GenericSpecializationBenchmark.Core]IntClass>::_fptrAdd

        ldftn class [GenericSpecializationBenchmark.Core]DoubleClass FastArithmetic::Add(class [GenericSpecializationBenchmark.Core]DoubleClass, class [GenericSpecializationBenchmark.Core]DoubleClass)
        stsfld void* class FastArithmetic`1<class [GenericSpecializationBenchmark.Core]DoubleClass>::_fptrAdd

        ret
    }


    .method assembly hidebysig static 
        void Initialize () cil managed 
    {
        .maxstack 8

        ret
    }


    .method public hidebysig static 
        int32 Add (int32 lhs, int32 rhs) cil managed 
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        add
        ret
    }


    .method public hidebysig static 
        int64 Add (int64 lhs, int64 rhs) cil managed 
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        add
        ret
    }


    .method public hidebysig static 
        float32 Add (float32 lhs, float32 rhs) cil managed 
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        add
        ret
    }


    .method public hidebysig static 
        float64 Add (float64 lhs, float64 rhs) cil managed 
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        add
        ret
    }


    .method public hidebysig static
        valuetype [GenericSpecializationBenchmark.Core]IntStruct Add (
            valuetype [GenericSpecializationBenchmark.Core]IntStruct lhs,
            valuetype [GenericSpecializationBenchmark.Core]IntStruct rhs
        )
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        call valuetype [GenericSpecializationBenchmark.Core]IntStruct [GenericSpecializationBenchmark.Core]IntStruct::op_Addition(valuetype [GenericSpecializationBenchmark.Core]IntStruct, valuetype [GenericSpecializationBenchmark.Core]IntStruct)
        ret
    }


    .method public hidebysig static
        valuetype [GenericSpecializationBenchmark.Core]DoubleStruct Add (
            valuetype [GenericSpecializationBenchmark.Core]DoubleStruct lhs,
            valuetype [GenericSpecializationBenchmark.Core]DoubleStruct rhs
        )
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        call valuetype [GenericSpecializationBenchmark.Core]DoubleStruct [GenericSpecializationBenchmark.Core]DoubleStruct::op_Addition(valuetype [GenericSpecializationBenchmark.Core]DoubleStruct, valuetype [GenericSpecializationBenchmark.Core]DoubleStruct)
        ret
    }


    .method public hidebysig static
        class [GenericSpecializationBenchmark.Core]IntClass Add (
            class [GenericSpecializationBenchmark.Core]IntClass lhs,
            class [GenericSpecializationBenchmark.Core]IntClass rhs
        )
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        call class [GenericSpecializationBenchmark.Core]IntClass [GenericSpecializationBenchmark.Core]IntClass::op_Addition(class [GenericSpecializationBenchmark.Core]IntClass, class [GenericSpecializationBenchmark.Core]IntClass)
        ret
    }


    .method public hidebysig static
        class [GenericSpecializationBenchmark.Core]DoubleClass Add (
            class [GenericSpecializationBenchmark.Core]DoubleClass lhs,
            class [GenericSpecializationBenchmark.Core]DoubleClass rhs
        )
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        call class [GenericSpecializationBenchmark.Core]DoubleClass [GenericSpecializationBenchmark.Core]DoubleClass::op_Addition(class [GenericSpecializationBenchmark.Core]DoubleClass, class [GenericSpecializationBenchmark.Core]DoubleClass)
        ret
    }
}



.class public auto ansi abstract sealed beforefieldinit FastArithmetic`1<T>
    extends [mscorlib]System.Object
{
    .field assembly static void* _fptrAdd


    .property bool IsSupported()
    {
        .get bool FastArithmetic`1::get_IsSupported()
    }


    .method public hidebysig specialname static 
        bool get_IsSupported () cil managed aggressiveinlining
    {
        .maxstack 8

        ldsfld void* class FastArithmetic`1<!T>::_fptrAdd
        ldc.i4.0
        conv.u
        ceq
        ldc.i4.0
        conv.u
        ceq
        ret
    }


    .method private hidebysig specialname rtspecialname static 
        void .cctor () cil managed
    {
        .maxstack 8
        call void class FastArithmetic::Initialize()
        ret
    }


    .method public hidebysig static 
        !T Add (!T lhs, !T rhs) cil managed aggressiveinlining
    {
        .maxstack 8

        ldarg.0
        ldarg.1
        ldsfld void* class FastArithmetic`1<!T>::_fptrAdd
        calli !T(!T, !T)
        ret
    }
}

The calling side looks like this. It feels similar to the static Strategy.

public partial class Container<T>
{
    public static Container<T> AddByLdftnAndCalli(Container<T> lhs, Container<T> rhs)
    {
        if(FastArithmetic<T>.IsSupported)
            return new Container<T>(FastArithmetic<T>.Add(lhs.Value, rhs.Value));

        throw new Exception();
    }
}

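Plenty of people can live with unsafe but have no desire to maintain IL, so this is not a realistic option for now. That said, function pointers have been raised as a proposal in csharplang, so they may eventually become practical. For this article, treat the result as a reference measurement only.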
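The function pointer proposal mentioned above later shipped in C# 9 as the delegate* syntax. The following is a rough sketch of how the same idea might be written with it, assuming C# 9 or later and a project compiled with unsafe code enabled. FastAdd and FastAdd<T> are hypothetical names, and this version was not part of the benchmark.

```csharp
using System;

public static unsafe class FastAdd<T>
{
    // Managed function pointer, the C# counterpart of the `void* _fptrAdd` field.
    internal static delegate*<T, T, T> Pointer;

    public static bool IsSupported => Pointer != null;

    // Invoking a function pointer compiles to a calli instruction.
    public static T Add(T lhs, T rhs) => Pointer(lhs, rhs);
}

public static unsafe class FastAdd
{
    static int AddInt32(int lhs, int rhs) => lhs + rhs;
    static double AddDouble(double lhs, double rhs) => lhs + rhs;

    // Explicit registration keeps the sketch short; the IL version above
    // does this from a static constructor instead.
    public static void Register()
    {
        // `&Method` on a static method compiles to ldftn.
        FastAdd<int>.Pointer = &AddInt32;
        FastAdd<double>.Pointer = &AddDouble;
    }
}

public static class FunctionPointerDemo
{
    public static void Main()
    {
        FastAdd.Register();
        Console.WriteLine(FastAdd<int>.Add(2, 3));        // 5
        Console.WriteLine(FastAdd<string>.IsSupported);   // False
    }
}
```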

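6. Overloading via extension methods

Once the method is moved out into a static class (adding `this` to the first parameter would turn each one into an extension method), it can be overloaded under the same name for each closed type.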

public static class Container
{
    public static Container<int> AddByOverload(Container<int> lhs, Container<int> rhs)
        => new Container<int>(lhs.Value + rhs.Value);

    // ......
}

Full implementation:

public static class Container
{
    public static Container<int> AddByOverload(Container<int> lhs, Container<int> rhs)
        => new Container<int>(lhs.Value + rhs.Value);


    public static Container<double> AddByOverload(Container<double> lhs, Container<double> rhs)
        => new Container<double>(lhs.Value + rhs.Value);


    public static Container<IntStruct> AddByOverload(Container<IntStruct> lhs, Container<IntStruct> rhs)
        => new Container<IntStruct>(lhs.Value + rhs.Value);


    public static Container<DoubleStruct> AddByOverload(Container<DoubleStruct> lhs, Container<DoubleStruct> rhs)
        => new Container<DoubleStruct>(lhs.Value + rhs.Value);


    public static Container<IntClass> AddByOverload(Container<IntClass> lhs, Container<IntClass> rhs)
        => new Container<IntClass>(lhs.Value + rhs.Value);


    public static Container<DoubleClass> AddByOverload(Container<DoubleClass> lhs, Container<DoubleClass> rhs)
        => new Container<DoubleClass>(lhs.Value + rhs.Value);
}

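These are already completely separate method calls at C# compile time, and the IL instruction is call rather than callvirt, so on performance alone this should be the fastest. However, overload selection only works when the call site already has a closed type, non-public members are inaccessible, and it cannot be used for operator overloads, so the constraints are much tighter than with the other approaches. I don't think it can serve as a complete substitute. This one is also a reference measurement only.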
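The closed-type restriction can be seen in a small sketch. Because overloads are chosen at compile time, a call made from generic code never reaches the type-specific overloads. Box<T> and BoxOps are hypothetical names, and the generic Describe<T> fallback is added here only so that the generic call site compiles.

```csharp
using System;

// Hypothetical stand-in for Container<T>.
public class Box<T>
{
    public T Value { get; }
    public Box(T value) => Value = value;
}

public static class BoxOps
{
    public static string Describe(Box<int> box) => "int overload";
    public static string Describe(Box<double> box) => "double overload";

    // Hypothetical generic fallback so the generic call site below compiles.
    public static string Describe<T>(Box<T> box) => "generic fallback";

    // T is still open here, so the compiler can only bind to the generic fallback.
    public static string DescribeFromGeneric<T>(Box<T> box) => Describe(box);
}

public static class OverloadDemo
{
    public static void Main()
    {
        var box = new Box<int>(1);
        Console.WriteLine(BoxOps.Describe(box));            // int overload
        Console.WriteLine(BoxOps.DescribeFromGeneric(box)); // generic fallback
    }
}
```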

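On to the benchmark

All the methods are in place, so let's run the benchmark.

Benchmark code

First, define the benchmark methods themselves. The static constructor invokes every method once and checks that all of them return the same result.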

using System;
using System.Linq;
using System.Reflection;

public static class GenericSpecializationBenchmarkCore
{
    public const int Iteration = 10000;


    static GenericSpecializationBenchmarkCore()
    {
        var results = typeof(GenericSpecializationBenchmarkCore)
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Select(mi => (double)mi.Invoke(null, null))
            .ToList();
        foreach(var res in results)
            if(results[0] != res)
                throw new Exception("Invalid add method impl");
    }

    // Define a method like this one for each of Primitive/Struct/Class and for each specialization technique
    public static double AddByStaticStrategy_Primitive()
    {
        var result = 0.0;
        {
            var x = new Container<int>(1);
            var y = new Container<int>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<int>.AddByStaticStrategy(x, y);
            result += x.Value;
        }
        {
            var x = new Container<double>(1);
            var y = new Container<double>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<double>.AddByStaticStrategy(x, y);
            result += x.Value;
        }
        return result;
    }
}

Full implementation:

using System;
using System.Linq;
using System.Reflection;


public static class GenericSpecializationBenchmarkCore
{
    public const int Iteration = 10000;


    static GenericSpecializationBenchmarkCore()
    {
        var results = typeof(GenericSpecializationBenchmarkCore)
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Select(mi => (double)mi.Invoke(null, null))
            .ToList();
        foreach(var res in results)
            if(results[0] != res)
                throw new Exception("Invalid add method impl");
    }


    public static double AddByStaticStrategy_Primitive()
    {
        var result = 0.0;
        {
            var x = new Container<int>(1);
            var y = new Container<int>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<int>.AddByStaticStrategy(x, y);
            result += x.Value;
        }
        {
            var x = new Container<double>(1);
            var y = new Container<double>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<double>.AddByStaticStrategy(x, y);
            result += x.Value;
        }
        return result;
    }


    public static double AddByContainerTypeSwitch_Primitive()
    {
        var result = 0.0;
        {
            var x = new Container<int>(1);
            var y = new Container<int>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<int>.AddByContainerTypeSwitch(x, y);
            result += x.Value;
        }
        {
            var x = new Container<double>(1);
            var y = new Container<double>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<double>.AddByContainerTypeSwitch(x, y);
            result += x.Value;
        }
        return result;
    }


    public static double AddByValueTypeSwitch_Primitive()
    {
        var result = 0.0;
        {
            var x = new Container<int>(1);
            var y = new Container<int>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<int>.AddByValueTypeSwitch(x, y);
            result += x.Value;
        }
        {
            var x = new Container<double>(1);
            var y = new Container<double>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<double>.AddByValueTypeSwitch(x, y);
            result += x.Value;
        }
        return result;
    }


    public static double AddByTypeOf_Primitive()
    {
        var result = 0.0;
        {
            var x = new Container<int>(1);
            var y = new Container<int>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<int>.AddByTypeOf(x, y);
            result += x.Value;
        }
        {
            var x = new Container<double>(1);
            var y = new Container<double>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<double>.AddByTypeOf(x, y);
            result += x.Value;
        }
        return result;
    }


    public static double AddByLdftnAndCalli_Primitive()
    {
        var result = 0.0;
        {
            var x = new Container<int>(1);
            var y = new Container<int>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<int>.AddByLdftnAndCalli(x, y);
            result += x.Value;
        }
        {
            var x = new Container<double>(1);
            var y = new Container<double>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container<double>.AddByLdftnAndCalli(x, y);
            result += x.Value;
        }
        return result;
    }


    public static double AddByOverload_Primitive()
    {
        var result = 0.0;
        {
            var x = new Container<int>(1);
            var y = new Container<int>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container.AddByOverload(x, y);
            result += x.Value;
        }
        {
            var x = new Container<double>(1);
            var y = new Container<double>(1);
            for(var i = 0; i < Iteration; ++i)
                x = Container.AddByOverload(x, y);
            result += x.Value;
        }
        return result;
    }


    public static double AddByStaticStrategy_Struct()
    {
        var result = 0.0;
        {
            var x = new Container<IntStruct>(new IntStruct(1));
            var y = new Container<IntStruct>(new IntStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntStruct>.AddByStaticStrategy(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleStruct>(new DoubleStruct(1));
            var y = new Container<DoubleStruct>(new DoubleStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleStruct>.AddByStaticStrategy(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByContainerTypeSwitch_Struct()
    {
        var result = 0.0;
        {
            var x = new Container<IntStruct>(new IntStruct(1));
            var y = new Container<IntStruct>(new IntStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntStruct>.AddByContainerTypeSwitch(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleStruct>(new DoubleStruct(1));
            var y = new Container<DoubleStruct>(new DoubleStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleStruct>.AddByContainerTypeSwitch(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByValueTypeSwitch_Struct()
    {
        var result = 0.0;
        {
            var x = new Container<IntStruct>(new IntStruct(1));
            var y = new Container<IntStruct>(new IntStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntStruct>.AddByValueTypeSwitch(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleStruct>(new DoubleStruct(1));
            var y = new Container<DoubleStruct>(new DoubleStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleStruct>.AddByValueTypeSwitch(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByTypeOf_Struct()
    {
        var result = 0.0;
        {
            var x = new Container<IntStruct>(new IntStruct(1));
            var y = new Container<IntStruct>(new IntStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntStruct>.AddByTypeOf(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleStruct>(new DoubleStruct(1));
            var y = new Container<DoubleStruct>(new DoubleStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleStruct>.AddByTypeOf(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByLdftnAndCalli_Struct()
    {
        var result = 0.0;
        {
            var x = new Container<IntStruct>(new IntStruct(1));
            var y = new Container<IntStruct>(new IntStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntStruct>.AddByLdftnAndCalli(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleStruct>(new DoubleStruct(1));
            var y = new Container<DoubleStruct>(new DoubleStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleStruct>.AddByLdftnAndCalli(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByOverload_Struct()
    {
        var result = 0.0;
        {
            var x = new Container<IntStruct>(new IntStruct(1));
            var y = new Container<IntStruct>(new IntStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container.AddByOverload(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleStruct>(new DoubleStruct(1));
            var y = new Container<DoubleStruct>(new DoubleStruct(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container.AddByOverload(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByStaticStrategy_Class()
    {
        var result = 0.0;
        {
            var x = new Container<IntClass>(new IntClass(1)    );
            var y = new Container<IntClass>(new IntClass(1)    );
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntClass>.AddByStaticStrategy(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleClass>(new DoubleClass(1));
            var y = new Container<DoubleClass>(new DoubleClass(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleClass>.AddByStaticStrategy(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByContainerTypeSwitch_Class()
    {
        var result = 0.0;
        {
            var x = new Container<IntClass>(new IntClass(1)    );
            var y = new Container<IntClass>(new IntClass(1)    );
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntClass>.AddByContainerTypeSwitch(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleClass>(new DoubleClass(1));
            var y = new Container<DoubleClass>(new DoubleClass(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleClass>.AddByContainerTypeSwitch(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByValueTypeSwitch_Class()
    {
        var result = 0.0;
        {
            var x = new Container<IntClass>(new IntClass(1)    );
            var y = new Container<IntClass>(new IntClass(1)    );
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntClass>.AddByValueTypeSwitch(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleClass>(new DoubleClass(1));
            var y = new Container<DoubleClass>(new DoubleClass(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleClass>.AddByValueTypeSwitch(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByTypeOf_Class()
    {
        var result = 0.0;
        {
            var x = new Container<IntClass>(new IntClass(1)    );
            var y = new Container<IntClass>(new IntClass(1)    );
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntClass>.AddByTypeOf(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleClass>(new DoubleClass(1));
            var y = new Container<DoubleClass>(new DoubleClass(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleClass>.AddByTypeOf(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByLdftnAndCalli_Class()
    {
        var result = 0.0;
        {
            var x = new Container<IntClass>(new IntClass(1)    );
            var y = new Container<IntClass>(new IntClass(1)    );
            for(var i = 0; i < Iteration; ++i)
                x = Container<IntClass>.AddByLdftnAndCalli(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleClass>(new DoubleClass(1));
            var y = new Container<DoubleClass>(new DoubleClass(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container<DoubleClass>.AddByLdftnAndCalli(x, y);
            result += x.Value.Value;
        }
        return result;
    }


    public static double AddByOverload_Class()
    {
        var result = 0.0;
        {
            var x = new Container<IntClass>(new IntClass(1));
            var y = new Container<IntClass>(new IntClass(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container.AddByOverload(x, y);
            result += x.Value.Value;
        }
        {
            var x = new Container<DoubleClass>(new DoubleClass(1));
            var y = new Container<DoubleClass>(new DoubleClass(1));
            for(var i = 0; i < Iteration; ++i)
                x = Container.AddByOverload(x, y);
            result += x.Value.Value;
        }
        return result;
    }
}

BenchmarkDotNet works on both .NET Core and .NET Framework, so for those two I wrap the core methods above in a benchmark class.

using System;
using BenchmarkDotNet.Attributes;


[CoreJob, ClrJob]
public class GenericSpecializationBenchmark
{
    [Benchmark]
    public double AddByStaticStrategy_Primitive()
        => GenericSpecializationBenchmarkCore.AddByStaticStrategy_Primitive();

    [Benchmark]
    public double AddByContainerTypeSwitch_Primitive()
        => GenericSpecializationBenchmarkCore.AddByContainerTypeSwitch_Primitive();

    [Benchmark]
    public double AddByValueTypeSwitch_Primitive()
        => GenericSpecializationBenchmarkCore.AddByValueTypeSwitch_Primitive();

    [Benchmark]
    public double AddByTypeOf_Primitive()
        => GenericSpecializationBenchmarkCore.AddByTypeOf_Primitive();

    [Benchmark]
    public double AddByLdftnAndCalli_Primitive()
        => GenericSpecializationBenchmarkCore.AddByLdftnAndCalli_Primitive();

    [Benchmark]
    public double AddByOverload_Primitive()
        => GenericSpecializationBenchmarkCore.AddByOverload_Primitive();

    [Benchmark]
    public double AddByStaticStrategy_Struct()
        => GenericSpecializationBenchmarkCore.AddByStaticStrategy_Struct();

    [Benchmark]
    public double AddByContainerTypeSwitch_Struct()
        => GenericSpecializationBenchmarkCore.AddByContainerTypeSwitch_Struct();

    [Benchmark]
    public double AddByValueTypeSwitch_Struct()
        => GenericSpecializationBenchmarkCore.AddByValueTypeSwitch_Struct();

    [Benchmark]
    public double AddByTypeOf_Struct()
        => GenericSpecializationBenchmarkCore.AddByTypeOf_Struct();

    [Benchmark]
    public double AddByLdftnAndCalli_Struct()
        => GenericSpecializationBenchmarkCore.AddByLdftnAndCalli_Struct();

    [Benchmark]
    public double AddByOverload_Struct()
        => GenericSpecializationBenchmarkCore.AddByOverload_Struct();

    [Benchmark]
    public double AddByStaticStrategy_Class()
        => GenericSpecializationBenchmarkCore.AddByStaticStrategy_Class();

    [Benchmark]
    public double AddByContainerTypeSwitch_Class()
        => GenericSpecializationBenchmarkCore.AddByContainerTypeSwitch_Class();

    [Benchmark]
    public double AddByValueTypeSwitch_Class()
        => GenericSpecializationBenchmarkCore.AddByValueTypeSwitch_Class();

    [Benchmark]
    public double AddByTypeOf_Class()
        => GenericSpecializationBenchmarkCore.AddByTypeOf_Class();

    [Benchmark]
    public double AddByLdftnAndCalli_Class()
        => GenericSpecializationBenchmarkCore.AddByLdftnAndCalli_Class();

    [Benchmark]
    public double AddByOverload_Class()
        => GenericSpecializationBenchmarkCore.AddByOverload_Class();
}

Unity is the other big C# platform, but unfortunately BenchmarkDotNet does not run on it. I found something called the Performance Testing Extension for Unity Test Runner instead, so that is what I use here.

using System;
using UnityEngine;
using Unity.PerformanceTesting;


public class GenericSpecializationBenchmark : MonoBehaviour
{
    [PerformanceTest]
    public void AddByStaticStrategy_Primitive()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByStaticStrategy_Primitive())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByContainerTypeSwitch_Primitive()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByContainerTypeSwitch_Primitive())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByValueTypeSwitch_Primitive()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByValueTypeSwitch_Primitive())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByTypeOf_Primitive()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByTypeOf_Primitive())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByLdftnAndCalli_Primitive()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByLdftnAndCalli_Primitive())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByOverload_Primitive()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByOverload_Primitive())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByStaticStrategy_Struct()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByStaticStrategy_Struct())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByContainerTypeSwitch_Struct()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByContainerTypeSwitch_Struct())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByValueTypeSwitch_Struct()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByValueTypeSwitch_Struct())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByTypeOf_Struct()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByTypeOf_Struct())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByLdftnAndCalli_Struct()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByLdftnAndCalli_Struct())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByOverload_Struct()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByOverload_Struct())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByStaticStrategy_Class()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByStaticStrategy_Class())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByContainerTypeSwitch_Class()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByContainerTypeSwitch_Class())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByValueTypeSwitch_Class()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByValueTypeSwitch_Class())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByTypeOf_Class()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByTypeOf_Class())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByLdftnAndCalli_Class()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByLdftnAndCalli_Class())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

    [PerformanceTest]
    public void AddByOverload_Class()
    {
        Measure.Method(() => GenericSpecializationBenchmarkCore.AddByOverload_Class())
            .WarmupCount(16)
            .MeasurementCount(128)
            .IterationsPerMeasurement(16)
            .Run();
    }

}
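
For readers who have not used the package, here is a rough sketch of how I read the three settings used above (warm-up count, measurement count, iterations per measurement), written with a plain Stopwatch. This is not how the Unity package is implemented; `SimpleMeasure` and its parameter names are made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Unity.PerformanceTesting or BenchmarkDotNet.
public static class SimpleMeasure
{
    // Returns one elapsed-time sample (in milliseconds) per measurement.
    public static double[] Run(Action method, int warmupCount, int measurementCount, int iterationsPerMeasurement)
    {
        // Warm-up calls are executed but not timed, so JIT compilation
        // and cold caches do not leak into the samples.
        for (var i = 0; i < warmupCount; ++i)
            method();

        var samples = new double[measurementCount];
        var stopwatch = new Stopwatch();
        for (var m = 0; m < measurementCount; ++m)
        {
            stopwatch.Restart();
            // Each sample covers several calls to reduce timer noise.
            for (var i = 0; i < iterationsPerMeasurement; ++i)
                method();
            stopwatch.Stop();
            samples[m] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return samples;
    }
}
```

In this sketch each sample is the total time for `iterationsPerMeasurement` calls, so it has to be divided by that count to get a per-call figure.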

Results

Here are the results. The test environment is listed below. Unity was run on the same machine, using version 2018.3.5f1.

BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=3914060 Hz, Resolution=255.4892 ns, Timer=TSC
.NET Core SDK=2.2.103
  [Host] : .NET Core 2.2.1 (CoreCLR 4.6.27207.03, CoreFX 4.6.27207.03), 64bit RyuJIT
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3324.0
  Core   : .NET Core 2.2.1 (CoreCLR 4.6.27207.03, CoreFX 4.6.27207.03), 64bit RyuJIT

.NET Framework

| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---|---|---|
| AddByStaticStrategy_Primitive | Clr | Clr | 122.10 us | 0.2588 us | 0.2021 us |
| AddByContainerTypeSwitch_Primitive | Clr | Clr | 178.35 us | 0.8562 us | 0.8009 us |
| AddByValueTypeSwitch_Primitive | Clr | Clr | 403.07 us | 3.7490 us | 3.5068 us |
| AddByTypeOf_Primitive | Clr | Clr | 150.96 us | 1.0237 us | 0.9576 us |
| AddByLdftnAndCalli_Primitive | Clr | Clr | 113.50 us | 1.0959 us | 1.0251 us |
| AddByOverload_Primitive | Clr | Clr | 81.95 us | 0.6142 us | 0.5745 us |

| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---|---|---|
| AddByStaticStrategy_Struct | Clr | Clr | 144.91 us | 1.5194 us | 1.2688 us |
| AddByContainerTypeSwitch_Struct | Clr | Clr | 274.19 us | 0.6559 us | 0.5477 us |
| AddByValueTypeSwitch_Struct | Clr | Clr | 525.21 us | 1.3129 us | 1.1639 us |
| AddByTypeOf_Struct | Clr | Clr | 156.33 us | 0.9623 us | 0.9002 us |
| AddByLdftnAndCalli_Struct | Clr | Clr | 158.57 us | 1.0668 us | 0.9979 us |
| AddByOverload_Struct | Clr | Clr | 124.76 us | 0.2196 us | 0.1833 us |

| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---|---|---|
| AddByStaticStrategy_Class | Clr | Clr | 442.95 us | 1.5895 us | 1.4869 us |
| AddByContainerTypeSwitch_Class | Clr | Clr | 425.81 us | 1.1064 us | 1.0350 us |
| AddByValueTypeSwitch_Class | Clr | Clr | 407.23 us | 3.9100 us | 3.6574 us |
| AddByTypeOf_Class | Clr | Clr | 284.35 us | 1.6016 us | 1.4198 us |
| AddByLdftnAndCalli_Class | Clr | Clr | 347.96 us | 2.9368 us | 2.7471 us |
| AddByOverload_Class | Clr | Clr | 156.59 us | 0.7971 us | 0.6656 us |

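On .NET Framework, leaving aside AddByOverload (which is fastest in all three tables), the static Strategy was fast for value types and typeof was fast for classes. Function pointers (ldftn + calli) are fast against primitives, which is not bad, but the gain is small compared to the effort, and against structs and classes they are if anything slower, so they are hard to justify.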

.NET Core

| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---|---|---|
| AddByStaticStrategy_Primitive | Core | Core | 150.60 us | 0.3208 us | 0.2843 us |
| AddByContainerTypeSwitch_Primitive | Core | Core | 110.63 us | 0.1589 us | 0.1487 us |
| AddByValueTypeSwitch_Primitive | Core | Core | 85.13 us | 0.0894 us | 0.0836 us |
| AddByTypeOf_Primitive | Core | Core | 91.29 us | 0.1205 us | 0.0941 us |
| AddByLdftnAndCalli_Primitive | Core | Core | 148.50 us | 0.1943 us | 0.1818 us |
| AddByOverload_Primitive | Core | Core | 87.54 us | 0.4548 us | 0.4255 us |

| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---|---|---|
| AddByStaticStrategy_Struct | Core | Core | 156.91 us | 0.2384 us | 0.2114 us |
| AddByContainerTypeSwitch_Struct | Core | Core | 207.03 us | 0.4905 us | 0.4589 us |
| AddByValueTypeSwitch_Struct | Core | Core | 175.25 us | 0.3431 us | 0.3210 us |
| AddByTypeOf_Struct | Core | Core | 129.96 us | 0.5253 us | 0.4656 us |
| AddByLdftnAndCalli_Struct | Core | Core | 161.22 us | 1.1491 us | 0.9596 us |
| AddByOverload_Struct | Core | Core | 132.88 us | 0.5114 us | 0.4534 us |

| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---|---|---|
| AddByStaticStrategy_Class | Core | Core | 388.40 us | 0.6757 us | 0.5643 us |
| AddByContainerTypeSwitch_Class | Core | Core | 416.41 us | 0.6913 us | 0.6128 us |
| AddByValueTypeSwitch_Class | Core | Core | 415.41 us | 1.1371 us | 1.0080 us |
| AddByTypeOf_Class | Core | Core | 256.51 us | 0.9296 us | 0.8241 us |
| AddByLdftnAndCalli_Class | Core | Core | 335.43 us | 1.1102 us | 1.0385 us |
| AddByOverload_Class | Core | Core | 167.72 us | 0.3065 us | 0.2867 us |

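On .NET Core, typeof is fast across the board. Type-switching on the value is fast for primitives, but for structs and classes it is if anything slow. Maybe the JIT optimizes really aggressively when it is dealing with primitive types?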

Unity

| Method | Median | Min | Max | Avg | Std |
|---|---|---|---|---|---|
| AddByStaticStrategy_Primitive | 2.78 ms | 2.70 ms | 3.33 ms | 2.85 ms | 0.14 ms |
| AddByContainerTypeSwitch_Primitive | 2.73 ms | 2.66 ms | 3.19 ms | 2.80 ms | 0.12 ms |
| AddByValueTypeSwitch_Primitive | 13.69 ms | 13.39 ms | 16.06 ms | 13.73 ms | 0.24 ms |
| AddByTypeOf_Primitive | 2.72 ms | 2.68 ms | 3.20 ms | 2.80 ms | 0.12 ms |
| AddByLdftnAndCalli_Primitive | 6.95 ms | 6.88 ms | 7.44 ms | 7.03 ms | 0.13 ms |
| AddByOverload_Primitive | 2.60 ms | 2.55 ms | 2.85 ms | 2.67 ms | 0.11 ms |

| Method | Median | Min | Max | Avg | Std |
|---|---|---|---|---|---|
| AddByStaticStrategy_Struct | 3.04 ms | 2.99 ms | 3.39 ms | 3.11 ms | 0.12 ms |
| AddByContainerTypeSwitch_Struct | 3.03 ms | 2.98 ms | 3.69 ms | 3.11 ms | 0.14 ms |
| AddByValueTypeSwitch_Struct | 19.12 ms | 18.82 ms | 21.28 ms | 19.14 ms | 0.31 ms |
| AddByTypeOf_Struct | 3.02 ms | 2.97 ms | 4.05 ms | 3.11 ms | 0.15 ms |
| AddByLdftnAndCalli_Struct | 7.31 ms | 7.19 ms | 9.42 ms | 7.38 ms | 0.22 ms |
| AddByOverload_Struct | 2.84 ms | 2.80 ms | 3.19 ms | 2.92 ms | 0.12 ms |

| Method | Median | Min | Max | Avg | Std |
|---|---|---|---|---|---|
| AddByStaticStrategy_Class | 5.67 ms | 5.39 ms | 7.02 ms | 5.66 ms | 0.23 ms |
| AddByContainerTypeSwitch_Class | 5.52 ms | 5.18 ms | 7.13 ms | 5.50 ms | 0.24 ms |
| AddByValueTypeSwitch_Class | 5.56 ms | 5.31 ms | 5.85 ms | 5.52 ms | 0.12 ms |
| AddByTypeOf_Class | 5.71 ms | 5.43 ms | 6.73 ms | 5.69 ms | 0.18 ms |
| AddByLdftnAndCalli_Class | 9.83 ms | 9.53 ms | 11.76 ms | 9.81 ms | 0.22 ms |
| AddByOverload_Class | 5.26 ms | 5.00 ms | 5.83 ms | 5.21 ms | 0.13 ms |

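On Unity there is not much difference overall... The main thing that stands out is that type-switching on the value gets dramatically slower for primitives and structs. AddByLdftnAndCalli is also noticeably slower than the rest in all three tables.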

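Discussion

The best way to find out why the fast approaches are fast is to look at the JIT output, so let's try that. Below is the effective assembly when T is int, IntStruct, and IntClass. Note that for these listings each Add method of Container<T> was marked with the MethodImpl(MethodImplOptions.NoInlining) attribute, because if the whole method got inlined I would have no idea where to look. I also list instruction counts, but I could not fully trace how many instructions run inside the call targets along the way, so treat them as rough reference values only.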

For now this was checked only on .NET Core 2.2.1. If anyone feels like examining other environments, please do.

Primitive types
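
As a reminder of what that attribute looks like in practice, here is a minimal sketch. `InliningDemo` and its members are hypothetical names for illustration and are not part of the benchmark code.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical demo type; not part of the article's benchmark code.
public static class InliningDemo
{
    // The JIT is asked never to inline this method, so it stays a separate
    // function with its own prologue/epilogue in a disassembly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int AddNoInline(int lhs, int rhs) => lhs + rhs;

    // Without the attribute, a method this small is a likely inlining candidate.
    public static int AddMayInline(int lhs, int rhs) => lhs + rhs;

    // Reports whether a public method of this class carries the NoInlining flag.
    public static bool IsNoInlining(string methodName)
    {
        MethodInfo method = typeof(InliningDemo).GetMethod(methodName);
        return (method.MethodImplementationFlags & MethodImplAttributes.NoInlining) != 0;
    }
}
```

The flag changes only how the JIT treats call sites, not what the method computes, so it mainly makes the listings easier to read.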

AddByOverload_Primitive

   174:         => new Container<int>(lhs.Value + rhs.Value);
00007FFC87257260  push        rdi  
00007FFC87257261  push        rsi  
00007FFC87257262  sub         rsp,28h  
00007FFC87257266  mov         rsi,rdx  
00007FFC87257269  mov         edi,dword ptr [rcx+8]  
00007FFC8725726C  mov         rcx,7FFC8730A778h  
00007FFC87257276  call        00007FFCE6D5B3B0  
00007FFC8725727B  mov         edx,edi  
00007FFC8725727D  add         edx,dword ptr [rsi+8]  
00007FFC87257280  mov         dword ptr [rax+8],edx  
00007FFC87257283  add         rsp,28h  
00007FFC87257287  pop         rsi  
00007FFC87257288  pop         rdi  
00007FFC87257289  ret  

AddByStaticStrategy_Primitive

    22:         => new Container<T>(Arithmetic<T>.Default.Add(lhs.Value, rhs.Value));
00007FFC87265D90  push        rdi  
00007FFC87265D91  push        rsi  
00007FFC87265D92  push        rbp  
00007FFC87265D93  push        rbx  
00007FFC87265D94  sub         rsp,28h  
00007FFC87265D98  mov         rsi,rcx  
00007FFC87265D9B  mov         rdi,rdx  
00007FFC87265D9E  mov         rcx,7FFC87349D80h  
00007FFC87265DA8  xor         edx,edx  
00007FFC87265DAA  call        00007FFCE6D32120  
00007FFC87265DAF  mov         rcx,1A944672AD0h  
00007FFC87265DB9  mov         rbx,qword ptr [rcx]  
00007FFC87265DBC  mov         esi,dword ptr [rsi+8]  
00007FFC87265DBF  mov         rcx,7FFC8731A778h  
00007FFC87265DC9  call        00007FFCE6D5B3B0  
00007FFC87265DCE  mov         rbp,rax  
00007FFC87265DD1  mov         r8d,dword ptr [rdi+8]  
00007FFC87265DD5  mov         rcx,rbx  
00007FFC87265DD8  mov         edx,esi  
00007FFC87265DDA  mov         r11,7FFC87150028h  
00007FFC87265DE4  cmp         dword ptr [rcx],ecx  
00007FFC87265DE6  call        qword ptr [7FFC87150028h]  

    00007FFC872661C0  lea         eax,[rdx+r8]  
    00007FFC872661C4  ret  

00007FFC87265DEC  mov         dword ptr [rbp+8],eax  
00007FFC87265DEF  mov         rax,rbp  
00007FFC87265DF2  add         rsp,28h  
00007FFC87265DF6  pop         rbx  
00007FFC87265DF7  pop         rbp  
00007FFC87265DF8  pop         rsi  
00007FFC87265DF9  pop         rdi  
00007FFC87265DFA  ret  

AddByContainerTypeSwitch_Primitive

    28:         switch(lhs)
00007FFC872567D0  push        rdi  
00007FFC872567D1  push        rsi  
00007FFC872567D2  sub         rsp,0F8h  
00007FFC872567D9  mov         rsi,rdx  
00007FFC872567DC  test        rcx,rcx  
00007FFC872567DF  je          00007FFC87256805  
00007FFC872567E1  mov         edi,dword ptr [rcx+8]  
00007FFC872567E4  mov         rcx,7FFC8730A778h  
00007FFC872567EE  call        00007FFCE6D5B3B0  
00007FFC872567F3  mov         ecx,edi  
00007FFC872567F5  add         ecx,dword ptr [rsi+8]  
00007FFC872567F8  mov         dword ptr [rax+8],ecx  
00007FFC872567FB  add         rsp,0F8h  
00007FFC87256802  pop         rsi  
00007FFC87256803  pop         rdi  
00007FFC87256804  ret  

AddByValueTypeSwitch_Primitive

    68:         switch(lhs.Value)
00007FFC87256A90  push        rdi  
00007FFC87256A91  push        rsi  
00007FFC87256A92  sub         rsp,28h  
00007FFC87256A96  mov         esi,dword ptr [rcx+8]  
00007FFC87256A99  mov         ecx,dword ptr [rdx+8]  
00007FFC87256A9C  mov         edi,ecx  
    73:                     return new Container<int>(intL + r) as Container<T>;
00007FFC87256A9E  mov         rcx,7FFC8730A778h  
00007FFC87256AA8  call        00007FFCE6D5B3B0  
00007FFC87256AAD  add         esi,edi  
00007FFC87256AAF  mov         dword ptr [rax+8],esi  
00007FFC87256AB2  add         rsp,28h  
00007FFC87256AB6  pop         rsi  
00007FFC87256AB7  pop         rdi  
00007FFC87256AB8  ret  

AddByTypeof_Primitive

   114:         if(typeof(T) == typeof(int))
00007FFC87256C90  push        rdi  
00007FFC87256C91  push        rsi  
00007FFC87256C92  sub         rsp,28h  
00007FFC87256C96  mov         rsi,rdx  
00007FFC87256C99  mov         edi,dword ptr [rcx+8]  
00007FFC87256C9C  mov         rcx,7FFC8730A778h  
00007FFC87256CA6  call        00007FFCE6D5B3B0  
00007FFC87256CAB  mov         edx,edi  
00007FFC87256CAD  add         edx,dword ptr [rsi+8]  
00007FFC87256CB0  mov         dword ptr [rax+8],edx  
00007FFC87256CB3  add         rsp,28h  
00007FFC87256CB7  pop         rsi  
00007FFC87256CB8  pop         rdi  
00007FFC87256CB9  ret  

| Method | Instruction Count | Mean |
|---|---|---|
| AddByOverload_Primitive | 14 | 87.54 us |
| AddByStaticStrategy_Primitive | 30 | 150.60 us |
| AddByContainerTypeSwitch_Primitive | 16 | 110.63 us |
| AddByValueTypeSwitch_Primitive | 14 | 85.13 us |
| AddByTypeof_Primitive | 14 | 91.29 us |

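AddByValueTypeSwitch and AddByTypeof are optimized down to the same instruction count as AddByOverload. AddByTypeof in fact matches it instruction for instruction (only the addresses differ). In other words, after JIT compilation AddByOverload and AddByTypeof are exactly the same.

Structs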
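
To make the typeof result a bit more concrete, here is a minimal sketch of the pattern. `TypeOfBox<T>` is a hypothetical stand-in for Container<T>, and the casts through object are my own choice for the sketch, so do not expect it to produce exactly the listing above.

```csharp
using System;

// Hypothetical stand-in for Container<T>; names are illustrative only.
public sealed class TypeOfBox<T>
{
    public T Value { get; }

    public TypeOfBox(T value) => Value = value;

    public static TypeOfBox<T> Add(TypeOfBox<T> lhs, TypeOfBox<T> rhs)
    {
        // For a value-type T the runtime compiles a dedicated method body,
        // so each typeof comparison is a constant there and the JIT can
        // drop the branches that do not apply.
        if (typeof(T) == typeof(int))
        {
            var l = (TypeOfBox<int>)(object)lhs;
            var r = (TypeOfBox<int>)(object)rhs;
            return (TypeOfBox<T>)(object)new TypeOfBox<int>(l.Value + r.Value);
        }
        if (typeof(T) == typeof(double))
        {
            var l = (TypeOfBox<double>)(object)lhs;
            var r = (TypeOfBox<double>)(object)rhs;
            return (TypeOfBox<T>)(object)new TypeOfBox<double>(l.Value + r.Value);
        }
        throw new NotSupportedException($"Add is not defined for {typeof(T)}.");
    }
}
```

With T = int only the first branch can ever be taken, which fits with the JIT output above containing nothing but the int addition.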

AddByOverload_Struct

   184:         => new Container<IntStruct>(lhs.Value + rhs.Value);
00007FFC87258530  push        rsi  
00007FFC87258531  sub         rsp,20h  
00007FFC87258535  mov         ecx,dword ptr [rcx+8]  
00007FFC87258538  mov         eax,dword ptr [rdx+8]  
00007FFC8725853B  lea         esi,[rcx+rax]  
00007FFC8725853E  mov         rcx,7FFC8733B288h  
00007FFC87258548  call        00007FFCE6D5B3B0  
00007FFC8725854D  mov         dword ptr [rax+8],esi  
00007FFC87258550  add         rsp,20h  
00007FFC87258554  pop         rsi  
00007FFC87258555  ret  

AddByStaticStrategy_Struct

    22:         => new Container<T>(Arithmetic<T>.Default.Add(lhs.Value, rhs.Value));
00007FFC87257490  push        rdi  
00007FFC87257491  push        rsi  
00007FFC87257492  push        rbp  
00007FFC87257493  push        rbx  
00007FFC87257494  sub         rsp,28h  
00007FFC87257498  mov         rax,23910002AE0h  
00007FFC872574A2  mov         rsi,qword ptr [rax]  
00007FFC872574A5  mov         edi,dword ptr [rcx+8]  
00007FFC872574A8  mov         ebx,dword ptr [rdx+8]  
00007FFC872574AB  mov         rcx,7FFC8733B288h  
00007FFC872574B5  call        00007FFCE6D5B3B0  
00007FFC872574BA  mov         rbp,rax  
00007FFC872574BD  mov         rcx,rsi  
00007FFC872574C0  mov         r8d,ebx  
00007FFC872574C3  mov         edx,edi  
00007FFC872574C5  mov         r11,7FFC87140038h  
00007FFC872574CF  cmp         dword ptr [rcx],ecx  
00007FFC872574D1  call        qword ptr [7FFC87140038h]  

    00007FFC87257510  lea         eax,[rdx+r8]  
    00007FFC87257514  ret  

00007FFC872574D7  mov         dword ptr [rbp+8],eax  
00007FFC872574DA  mov         rax,rbp  
00007FFC872574DD  add         rsp,28h  
00007FFC872574E1  pop         rbx  
00007FFC872574E2  pop         rbp  
00007FFC872574E3  pop         rsi  
00007FFC872574E4  pop         rdi  
00007FFC872574E5  ret  

AddByContainerTypeSwitch_Struct

    28:         switch(lhs)
00007FFC872677A0  push        rdi  
00007FFC872677A1  push        rsi  
00007FFC872677A2  push        rbp  
00007FFC872677A3  push        rbx  
00007FFC872677A4  sub         rsp,0D8h  
00007FFC872677AB  vzeroupper  
00007FFC872677AE  vmovaps     xmmword ptr [rsp+0C0h],xmm6  
00007FFC872677B8  mov         rsi,rdx  
00007FFC872677BB  mov         rdi,rcx  
00007FFC872677BE  test        rdi,rdi  
00007FFC872677C1  je          00007FFC872678B6  
00007FFC872677C7  mov         rdx,rdi  
00007FFC872677CA  mov         rcx,7FFC8731A778h  
00007FFC872677D4  call        00007FFCE6D59C70  
00007FFC872677D9  mov         rbx,rax  
00007FFC872677DC  test        rbx,rbx  
00007FFC872677DF  jne         00007FFC872677FD  
00007FFC872677E1  mov         rdx,rdi  
00007FFC872677E4  mov         rcx,7FFC8731A978h  
00007FFC872677EE  call        00007FFCE6D59C70  
00007FFC872677F3  mov         rbp,rax  
00007FFC872677F6  test        rbp,rbp  
00007FFC872677F9  jne         00007FFC8726782E  
00007FFC872677FB  jmp         00007FFC8726786B  

00007FFC8726786B  mov         ecx,dword ptr [rdi+8]  
00007FFC8726786E  mov         eax,dword ptr [rsi+8]  
00007FFC87267871  lea         esi,[rcx+rax]  
00007FFC87267874  mov         rcx,7FFC8734B288h  
00007FFC8726787E  call        00007FFCE6D5B3B0  
00007FFC87267883  mov         dword ptr [rax+8],esi  
    43:                 return new Container<IntStruct>(intStructL.Value + r.Value) as Container<T>;
00007FFC87267886  jmp         00007FFC872678A0  

00007FFC872678A0  vmovaps     xmm6,xmmword ptr [rsp+0C0h]  
00007FFC872678AA  add         rsp,0D8h  
00007FFC872678B1  pop         rbx  
00007FFC872678B2  pop         rbp  
00007FFC872678B3  pop         rsi  
00007FFC872678B4  pop         rdi  
00007FFC872678B5  ret  

AddByValueTypeSwitch_Struct

    68:         switch(lhs.Value)
00007FFC87257CC0  push        rsi  
00007FFC87257CC1  sub         rsp,20h  
00007FFC87257CC5  mov         ecx,dword ptr [rcx+8]  
00007FFC87257CC8  mov         eax,dword ptr [rdx+8]  
    85:                     return new Container<IntStruct>(intStructL + r) as Container<T>;
00007FFC87257CCB  lea         esi,[rcx+rax]  
00007FFC87257CCE  mov         rcx,7FFC8733B288h  
00007FFC87257CD8  call        00007FFCE6D5B3B0  
00007FFC87257CDD  mov         dword ptr [rax+8],esi  
00007FFC87257CE0  add         rsp,20h  
00007FFC87257CE4  pop         rsi  
00007FFC87257CE5  ret  

AddByTypeof_Struct

   114:         if(typeof(T) == typeof(int))
00007FFC87257F80  push        rsi  
00007FFC87257F81  sub         rsp,20h  
00007FFC87257F85  mov         ecx,dword ptr [rcx+8]  
00007FFC87257F88  mov         eax,dword ptr [rdx+8]  
00007FFC87257F8B  lea         esi,[rcx+rax]  
00007FFC87257F8E  mov         rcx,7FFC8733B288h  
00007FFC87257F98  call        00007FFCE6D5B3B0  
00007FFC87257F9D  mov         dword ptr [rax+8],esi  
00007FFC87257FA0  add         rsp,20h  
00007FFC87257FA4  pop         rsi  
00007FFC87257FA5  ret  

| Method | Instruction Count | Mean |
|---|---|---|
| AddByOverload_Struct | 11 | 132.88 us |
| AddByStaticStrategy_Struct | 28 | 156.91 us |
| AddByContainerTypeSwitch_Struct | 38 | 207.03 us |
| AddByValueTypeSwitch_Struct | 11 | 175.25 us |
| AddByTypeof_Struct | 11 | 129.96 us |

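In terms of generated code, AddByValueTypeSwitch and AddByTypeof continue to do well, and as with primitives AddByTypeof is equivalent to AddByOverload. That said, the measured mean of AddByValueTypeSwitch (175.25 us) is higher than its 11 instructions would suggest; the listing only covers IntStruct, so it does not tell the whole story. AddByStaticStrategy, which could not keep up with those two approaches for primitives, performs reasonably well on structs, perhaps because its instruction count hardly grows.

AddByContainerTypeSwitch, on the other hand, got dramatically worse. The jne/jmp instructions left in the middle suggest that the optimizer did not eliminate the type checks.

Classes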
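
For reference, here is a minimal sketch of the kind of type switch on the container that is being discussed. `SwitchBox<T>` is a hypothetical stand-in for Container<T>, and the case list is shortened to two types; this is an illustration under those assumptions, not the exact benchmark code.

```csharp
using System;

// Hypothetical stand-in for Container<T>; names are illustrative only.
public sealed class SwitchBox<T>
{
    public T Value { get; }

    public SwitchBox(T value) => Value = value;

    public static SwitchBox<T> Add(SwitchBox<T> lhs, SwitchBox<T> rhs)
    {
        // Each case is a runtime type test on the object lhs refers to,
        // not a compile-time question about T, so the checks survive
        // unless the JIT can prove their outcome.
        switch (lhs)
        {
            case SwitchBox<int> intL when rhs is SwitchBox<int> r:
                return new SwitchBox<int>(intL.Value + r.Value) as SwitchBox<T>;
            case SwitchBox<double> doubleL when rhs is SwitchBox<double> r:
                return new SwitchBox<double>(doubleL.Value + r.Value) as SwitchBox<T>;
            default:
                throw new NotSupportedException($"Add is not defined for {typeof(T)}.");
        }
    }
}
```

In the struct listing above, the helper calls followed by test/jne look like exactly these checks being executed one after another until the matching case is found.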

AddByOverload_Class

   194:         => new Container<IntClass>(lhs.Value + rhs.Value);
00007FFC87299EC0  push        rdi  
00007FFC87299EC1  push        rsi  
00007FFC87299EC2  push        rbx  
00007FFC87299EC3  sub         rsp,20h  
00007FFC87299EC7  mov         rsi,qword ptr [rcx+8]  
00007FFC87299ECB  mov         rdi,qword ptr [rdx+8]  
00007FFC87299ECF  mov         rcx,7FFC8737A988h  
00007FFC87299ED9  call        00007FFCE6D5B3B0  
00007FFC87299EDE  mov         rbx,rax  
00007FFC87299EE1  mov         ecx,dword ptr [rsi+8]  
00007FFC87299EE4  add         ecx,dword ptr [rdi+8]  
00007FFC87299EE7  mov         dword ptr [rbx+8],ecx  
00007FFC87299EEA  mov         rcx,7FFC8737B7C8h  
00007FFC87299EF4  call        00007FFCE6D5B3B0  
00007FFC87299EF9  mov         rsi,rax  
00007FFC87299EFC  lea         rcx,[rsi+8]  
00007FFC87299F00  mov         rdx,rbx  
00007FFC87299F03  call        00007FFCE6D59F10  
00007FFC87299F08  mov         rax,rsi  
00007FFC87299F0B  add         rsp,20h  
00007FFC87299F0F  pop         rbx  
00007FFC87299F10  pop         rsi  
00007FFC87299F11  pop         rdi  
00007FFC87299F12  ret  

AddByStaticStrategy_Class

    22:         => new Container<T>(Arithmetic<T>.Default.Add(lhs.Value, rhs.Value));
00007FFC872687D0  push        r14  
00007FFC872687D2  push        rdi  
00007FFC872687D3  push        rsi  
00007FFC872687D4  push        rbp  
00007FFC872687D5  push        rbx  
00007FFC872687D6  sub         rsp,30h  
00007FFC872687DA  mov         qword ptr [rsp+28h],rcx  
00007FFC872687DF  mov         rsi,rcx  
00007FFC872687E2  mov         rdi,rdx  
00007FFC872687E5  mov         rbx,r8  
00007FFC872687E8  mov         rcx,qword ptr [rsi+30h]  
00007FFC872687EC  mov         rbp,qword ptr [rcx]  
00007FFC872687EF  mov         rcx,qword ptr [rbp+8]  
00007FFC872687F3  test        rcx,rcx  
00007FFC872687F6  jne         00007FFC8726880D  

00007FFC8726880D  call        00007FFC872659F8  

    00007FFC87268890  push        rsi  
    00007FFC87268891  sub         rsp,30h  
    00007FFC87268895  mov         qword ptr [rsp+28h],rcx  
    00007FFC8726889A  mov         rsi,rcx  
    00007FFC8726889D  mov         rcx,rsi  
    00007FFC872688A0  call        00007FFCE6EBE2E0  
    00007FFC872688A5  mov         rcx,rsi  
    00007FFC872688A8  call        00007FFCE6CED420  
    00007FFC872688AD  mov         rax,qword ptr [rax]  
    00007FFC872688B0  add         rsp,30h  
    00007FFC872688B4  pop         rsi  
    00007FFC872688B5  ret  

00007FFC87268812  mov         r14,rax  
00007FFC87268815  mov         rdi,qword ptr [rdi+8]  
00007FFC87268819  mov         rbx,qword ptr [rbx+8]  
00007FFC8726881D  mov         rbp,qword ptr [rbp+10h]  
00007FFC87268821  test        rbp,rbp  
00007FFC87268824  jne         00007FFC8726883B  

00007FFC8726883B  mov         rcx,rsi  
00007FFC8726883E  call        00007FFCE6D5B3B0  
00007FFC87268843  mov         rsi,rax  
00007FFC87268846  mov         rcx,r14  
00007FFC87268849  mov         r11,rbp  
00007FFC8726884C  mov         rdx,rdi  
00007FFC8726884F  mov         r8,rbx  
00007FFC87268852  cmp         dword ptr [rcx],ecx  
00007FFC87268854  call        qword ptr [rbp]  

    00007FFC872688D0  push        rdi  
    00007FFC872688D1  push        rsi  
    00007FFC872688D2  sub         rsp,28h  
    00007FFC872688D6  mov         rsi,rdx  
    00007FFC872688D9  mov         rdi,r8  
    00007FFC872688DC  mov         rcx,7FFC8734A988h  
    00007FFC872688E6  call        00007FFCE6D5B3B0  
    00007FFC872688EB  mov         edx,dword ptr [rsi+8]  
    00007FFC872688EE  add         edx,dword ptr [rdi+8]  
    00007FFC872688F1  mov         dword ptr [rax+8],edx  
    00007FFC872688F4  add         rsp,28h  
    00007FFC872688F8  pop         rsi  
    00007FFC872688F9  pop         rdi  
    00007FFC872688FA  ret  

00007FFC87268857  lea         rcx,[rsi+8]  
00007FFC8726885B  mov         rdx,rax  
00007FFC8726885E  call        00007FFCE6D59F10  
00007FFC87268863  mov         rax,rsi  
00007FFC87268866  add         rsp,30h  
00007FFC8726886A  pop         rbx  
00007FFC8726886B  pop         rbp  
00007FFC8726886C  pop         rsi  
00007FFC8726886D  pop         rdi  
00007FFC8726886E  pop         r14  
00007FFC87268870  ret  

AddByContainerTypeSwitch_Class

    28:         switch(lhs)
00007FFC87288B40  push        r15  
00007FFC87288B42  push        r14  
00007FFC87288B44  push        r13  
00007FFC87288B46  push        r12  
00007FFC87288B48  push        rdi  
00007FFC87288B49  push        rsi  
00007FFC87288B4A  push        rbp  
00007FFC87288B4B  push        rbx  
00007FFC87288B4C  sub         rsp,78h  
00007FFC87288B50  vzeroupper  
00007FFC87288B53  vmovaps     xmmword ptr [rsp+60h],xmm6  
00007FFC87288B5A  mov         qword ptr [rsp+58h],rcx  
00007FFC87288B5F  mov         rdi,rcx  
00007FFC87288B62  mov         rsi,r8  
00007FFC87288B65  mov         rbx,rdx  
00007FFC87288B68  test        rbx,rbx  
00007FFC87288B6B  je          00007FFC87288E64  
00007FFC87288B71  mov         rdx,rbx  
00007FFC87288B74  mov         rcx,7FFC8733A778h  
00007FFC87288B7E  call        00007FFCE6D59C70  
00007FFC87288B83  mov         rbp,rax  
00007FFC87288B86  test        rbp,rbp  
00007FFC87288B89  jne         00007FFC87288C2A  
00007FFC87288B8F  mov         rdx,rbx  
00007FFC87288B92  mov         rcx,7FFC8733A978h  
00007FFC87288B9C  call        00007FFCE6D59C70  
00007FFC87288BA1  mov         r14,rax  
00007FFC87288BA4  test        r14,r14  
00007FFC87288BA7  jne         00007FFC87288C6B  
00007FFC87288BAD  mov         rdx,rbx  
00007FFC87288BB0  mov         rcx,7FFC8736B288h  
00007FFC87288BBA  call        00007FFCE6D59C70  
00007FFC87288BBF  mov         r15,rax  
00007FFC87288BC2  test        r15,r15  
00007FFC87288BC5  jne         00007FFC87288CB6  
00007FFC87288BCB  mov         rdx,rbx  
00007FFC87288BCE  mov         rcx,7FFC8736B488h  
00007FFC87288BD8  call        00007FFCE6D59C70  
00007FFC87288BDD  mov         r12,rax  
00007FFC87288BE0  test        r12,r12  
00007FFC87288BE3  jne         00007FFC87288CFA  
00007FFC87288BE9  mov         rdx,rbx  
00007FFC87288BEC  mov         rcx,7FFC8736B7C8h  
00007FFC87288BF6  call        00007FFCE6D59C70  
00007FFC87288BFB  mov         r13,rax  
00007FFC87288BFE  test        r13,r13  
00007FFC87288C01  jne         00007FFC87288D84  

00007FFC87288D84  mov         rdx,rsi  
00007FFC87288D87  mov         rcx,7FFC8736B7C8h  
00007FFC87288D91  call        00007FFCE6D59C70  
00007FFC87288D96  mov         rsi,qword ptr [r13+8]  
00007FFC87288D9A  mov         rbx,qword ptr [rax+8]  
00007FFC87288D9E  mov         rcx,7FFC8736A988h  
00007FFC87288DA8  call        00007FFCE6D5B3B0  
00007FFC87288DAD  mov         rbp,rax  
00007FFC87288DB0  mov         ecx,dword ptr [rsi+8]  
00007FFC87288DB3  add         ecx,dword ptr [rbx+8]  
00007FFC87288DB6  mov         dword ptr [rbp+8],ecx  
00007FFC87288DB9  mov         rcx,7FFC8736B7C8h  
00007FFC87288DC3  call        00007FFCE6D5B3B0  
00007FFC87288DC8  mov         rsi,rax  
00007FFC87288DCB  lea         rcx,[rsi+8]  
00007FFC87288DCF  mov         rdx,rbp  
00007FFC87288DD2  call        00007FFCE6D59F10  
    53:                 return new Container<IntClass>(intClassL.Value + r.Value) as Container<T>;
00007FFC87288DD7  mov         rcx,rdi  
00007FFC87288DDA  mov         rdx,rsi  
00007FFC87288DDD  call        00007FFCE6D59C70  
00007FFC87288DE2  jmp         00007FFC87288E4B  

00007FFC87288E4B  nop  
00007FFC87288E4C  vmovaps     xmm6,xmmword ptr [rsp+60h]  
00007FFC87288E53  add         rsp,78h  
00007FFC87288E57  pop         rbx  
00007FFC87288E58  pop         rbp  
00007FFC87288E59  pop         rsi  
00007FFC87288E5A  pop         rdi  
00007FFC87288E5B  pop         r12  
00007FFC87288E5D  pop         r13  
00007FFC87288E5F  pop         r14  
00007FFC87288E61  pop         r15  
00007FFC87288E63  ret  

AddByValueTypeSwitch_Class

    68:         switch(lhs.Value)
00007FFC87269080  push        r15  
00007FFC87269082  push        r14  
00007FFC87269084  push        r12  
00007FFC87269086  push        rdi  
00007FFC87269087  push        rsi  
00007FFC87269088  push        rbp  
00007FFC87269089  push        rbx  
00007FFC8726908A  sub         rsp,90h  
00007FFC87269091  vzeroupper  
00007FFC87269094  vmovaps     xmmword ptr [rsp+80h],xmm6  
00007FFC8726909E  vmovaps     xmmword ptr [rsp+70h],xmm7  
00007FFC872690A5  mov         rsi,rcx  
00007FFC872690A8  lea         rdi,[rsp+50h]  
00007FFC872690AD  mov         ecx,6  
00007FFC872690B2  xor         eax,eax  
00007FFC872690B4  rep stos    dword ptr [rdi]  
00007FFC872690B6  mov         rcx,rsi  
00007FFC872690B9  mov         qword ptr [rsp+68h],rcx  
00007FFC872690BE  mov         rdi,rcx  
00007FFC872690C1  mov         rsi,r8  
00007FFC872690C4  mov         rbx,qword ptr [rdx+8]  
00007FFC872690C8  test        rbx,rbx  
00007FFC872690CB  je          00007FFC87269557  
00007FFC872690D1  mov         rbp,rbx  
00007FFC872690D4  mov         rdx,rbp  
00007FFC872690D7  mov         rcx,7FFCE69B6930h  
00007FFC872690E1  cmp         qword ptr [rbp],rcx  
00007FFC872690E5  je          00007FFC872690E9  
00007FFC872690E7  xor         edx,edx  
00007FFC872690E9  test        rdx,rdx  
00007FFC872690EC  je          00007FFC87269119  

00007FFC87269119  mov         rbp,rbx  
00007FFC8726911C  mov         rdx,rbp  
00007FFC8726911F  mov         rcx,7FFCE69B6768h  
00007FFC87269129  cmp         qword ptr [rbp],rcx  
00007FFC8726912D  je          00007FFC87269131  
00007FFC8726912F  xor         edx,edx  
00007FFC87269131  test        rdx,rdx  
00007FFC87269134  je          00007FFC87269163  

00007FFC87269163  mov         rbp,rbx  
00007FFC87269166  mov         rdx,rbp  
00007FFC87269169  mov         rcx,7FFC8734A6B8h  
00007FFC87269173  cmp         qword ptr [rdx],rcx  
00007FFC87269176  je          00007FFC8726917A  
00007FFC87269178  xor         edx,edx  
00007FFC8726917A  test        rdx,rdx  
00007FFC8726917D  je          00007FFC872691AA  

00007FFC872691AA  mov         rbp,rbx  
00007FFC872691AD  mov         rdx,rbp  
00007FFC872691B0  mov         rcx,7FFC8734A820h  
00007FFC872691BA  cmp         qword ptr [rbp],rcx  
00007FFC872691BE  je          00007FFC872691C2  
00007FFC872691C0  xor         edx,edx  
00007FFC872691C2  test        rdx,rdx  
00007FFC872691C5  je          00007FFC872691F7  

00007FFC872691F7  mov         r12,rbx  
00007FFC872691FA  mov         rdx,7FFC8734A988h  
    68:         switch(lhs.Value)
00007FFC87269204  cmp         qword ptr [r12],rdx  
00007FFC87269208  je          00007FFC8726920D  

00007FFC8726920D  test        r12,r12  
00007FFC87269210  jne         00007FFC87269456  

00007FFC87269456  mov         rcx,qword ptr [rsi+8]  
00007FFC8726945A  test        rcx,rcx  
00007FFC8726945D  je          00007FFC87269470  
00007FFC8726945F  mov         rax,7FFC8734A988h  
00007FFC87269469  cmp         qword ptr [rcx],rax  
00007FFC8726946C  je          00007FFC87269470  
00007FFC8726946E  xor         ecx,ecx  
00007FFC87269470  mov         rbx,rcx  
00007FFC87269473  test        rbx,rbx  
00007FFC87269476  je          00007FFC87269557  
    97:                     return new Container<IntClass>(intClassL + r) as Container<T>;
00007FFC8726947C  mov         rcx,7FFC8734A988h  
00007FFC87269486  call        00007FFCE6D5B3B0  
00007FFC8726948B  mov         rsi,rax  
00007FFC8726948E  mov         ecx,dword ptr [r12+8]  
00007FFC87269493  add         ecx,dword ptr [rbx+8]  
00007FFC87269496  mov         dword ptr [rsi+8],ecx  
00007FFC87269499  mov         rcx,7FFC8734B7C8h  
00007FFC872694A3  call        00007FFCE6D5B3B0  
00007FFC872694A8  mov         rbx,rax  
00007FFC872694AB  lea         rcx,[rbx+8]  
00007FFC872694AF  mov         rdx,rsi  
00007FFC872694B2  call        00007FFCE6D59F10  
00007FFC872694B7  mov         rcx,rdi  
00007FFC872694BA  mov         rdx,rbx  
00007FFC872694BD  call        00007FFCE6D59C70  
00007FFC872694C2  jmp         00007FFC87269533  

00007FFC87269533  nop  
00007FFC87269534  vmovaps     xmm6,xmmword ptr [rsp+80h]  
00007FFC8726953E  vmovaps     xmm7,xmmword ptr [rsp+70h]  
00007FFC87269545  add         rsp,90h  
00007FFC8726954C  pop         rbx  
00007FFC8726954D  pop         rbp  
00007FFC8726954E  pop         rsi  
00007FFC8726954F  pop         rdi  
00007FFC87269550  pop         r12  
00007FFC87269552  pop         r14  
00007FFC87269554  pop         r15  
00007FFC87269556  ret  

AddByTypeof_Class

   114:         if(typeof(T) == typeof(int))
00007FFC87269780  push        rdi  
00007FFC87269781  push        rsi  
00007FFC87269782  push        rbp  
00007FFC87269783  push        rbx  
00007FFC87269784  sub         rsp,48h  
00007FFC87269788  vzeroupper  
00007FFC8726978B  mov         qword ptr [rsp+40h],rcx  
00007FFC87269790  mov         rsi,rcx  
00007FFC87269793  mov         rdi,r8  
00007FFC87269796  mov         rcx,qword ptr [rsi+30h]  
00007FFC8726979A  mov         rcx,qword ptr [rcx]  
00007FFC8726979D  mov         rbx,qword ptr [rcx]  
00007FFC872697A0  mov         rcx,rbx  
00007FFC872697A3  mov         ebp,ecx  
00007FFC872697A5  and         ebp,1  
   140:         }
   141: 
   142:         if(typeof(T) == typeof(IntClass))
00007FFC872697A8  mov         rcx,rbx  
00007FFC872697AB  test        ebp,ebp  
00007FFC872697AD  je          00007FFC872697B3  

00007FFC872697B3  mov         rax,7FFC8734A988h  
00007FFC872697BD  cmp         rcx,rax  
00007FFC872697C0  jne         00007FFC8726983C  
   143:         {
   144:             var l = lhs as Container<IntClass>;
00007FFC872697C2  mov         rcx,7FFC8734B7C8h  
00007FFC872697CC  call        00007FFCE6D59C70  
00007FFC872697D1  mov         rbx,rax  
00007FFC872697D4  mov         rdx,rdi  
00007FFC872697D7  mov         rcx,7FFC8734B7C8h  
00007FFC872697E1  call        00007FFCE6D59C70  
00007FFC872697E6  mov         rbp,qword ptr [rbx+8]  
00007FFC872697EA  mov         rdi,qword ptr [rax+8]  
00007FFC872697EE  mov         rcx,7FFC8734A988h  
00007FFC872697F8  call        00007FFCE6D5B3B0  
00007FFC872697FD  mov         rbx,rax  
00007FFC87269800  mov         ecx,dword ptr [rbp+8]  
00007FFC87269803  add         ecx,dword ptr [rdi+8]  
00007FFC87269806  mov         dword ptr [rbx+8],ecx  
00007FFC87269809  mov         rcx,7FFC8734B7C8h  
00007FFC87269813  call        00007FFCE6D5B3B0  
00007FFC87269818  mov         rdi,rax  
00007FFC8726981B  lea         rcx,[rdi+8]  
00007FFC8726981F  mov         rdx,rbx  
00007FFC87269822  call        00007FFCE6D59F10  
   146:             return new Container<IntClass>(l.Value + r.Value) as Container<T>;
00007FFC87269827  mov         rcx,rsi  
00007FFC8726982A  mov         rdx,rdi  
00007FFC8726982D  call        00007FFCE6D59C70  
00007FFC87269832  nop  
00007FFC87269833  add         rsp,48h  
00007FFC87269837  pop         rbx  
00007FFC87269838  pop         rbp  
00007FFC87269839  pop         rsi  
00007FFC8726983A  pop         rdi  
00007FFC8726983B  ret  

| Method | Instruction Count | Mean |
|---|---|---|
| AddByOverload_Class | 24 | 167.72 us |
| AddByStaticStrategy_Class | 69 | 388.40 us |
| AddByContainerTypeSwitch_Class | 80 | 416.41 us |
| AddByValueTypeSwitch_Class | 99 | 415.41 us |
| AddByTypeof_Class | 51 | 256.51 us |

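Unlike the primitive and struct cases, the JIT does not seem to eliminate the conditional branches here. Even so, AddByTypeof does well, with comparatively few instructions (51) and a good measured time (256.51 us). StaticStrategy, for all its design-pattern elegance, does not look that bad either.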

Conclusion

Making full use of typeof appears to be the fastest approach

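On .NET Core, when T is a value type, typeof-based specialization performs on par with dispatching through ordinary non-generic overloads. On Unity, however, perhaps because the JIT optimizes less aggressively, writing the specialization with typeof does not appear to bring a large performance gain.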

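Optimization quality ranks roughly as primitive types > structs >>> classes, and the gap between value types and reference types in particular seems to be very large. The long-standing advice is to use classes unless you have a very good reason not to, but with this much to gain from optimization, structs become tempting. For an immutable data class, writing a wrapper struct is also an option.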

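Honestly, even granting that IL has a dedicated instruction for it (see footnote 3), I did not expect typeof comparisons to be this fast. With optimization of this quality at work, it seems reasonable to use them actively.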
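As a reminder of what this conclusion refers to, typeof-based specialization has roughly the following shape. This is a minimal sketch rather than the benchmarked code: `TypeofSpecialization` and its `Add` method are hypothetical names introduced for illustration, and only int and double are handled. The casts through object look expensive, but when T is a value type the JIT compiles a separate body per T, so each `typeof(T) == typeof(...)` test can become a constant and the untaken branches can be dropped. How far that goes depends on the runtime, as the Unity results suggest.

```csharp
using System;

public static class TypeofSpecialization
{
    // Hypothetical helper for illustration only; not the method measured above.
    public static T Add<T>(T lhs, T rhs)
    {
        if (typeof(T) == typeof(int))
        {
            // T is known to be int in this branch, so the round trip
            // through object is a compile-time formality.
            int sum = (int)(object)lhs + (int)(object)rhs;
            return (T)(object)sum;
        }

        if (typeof(T) == typeof(double))
        {
            double sum = (double)(object)lhs + (double)(object)rhs;
            return (T)(object)sum;
        }

        // Any type without a specialization ends up here.
        throw new NotSupportedException($"Add is not specialized for {typeof(T)}.");
    }
}
```

Calling `TypeofSpecialization.Add(1, 2)` takes the int branch, and `TypeofSpecialization.Add("a", "b")` throws because no branch matches string.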

Untested topics

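  • What happens with large methods that are unlikely to be inlined, as opposed to small code like addition that fits in a single IL instruction?

  • if statements and pattern-matching switch statements are inherently order-dependent. Does that affect specialization?

    • Judging from the JIT output, the conditional branches are optimized away entirely, so probably not? Confirming this would require a benchmark.
  • Once specialization has selected a branch, T is known, so is there a more efficient way to cast?

    • Perhaps System.Runtime.CompilerServices.Unsafe.As<TFrom, TTo>?
  • JIT verification on .NET Framework, Unity, and other versions of .NET Core

    • Is there even a way to inspect JIT output in Unity?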
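The Unsafe.As idea from the list above could look like the following. This is an unbenchmarked sketch under a few assumptions: `UnsafeCastSketch` and `Add` are hypothetical names, only int is handled, and `System.Runtime.CompilerServices.Unsafe` is assumed to be available (it is built into recent .NET versions and shipped as a separate NuGet package for older targets). `Unsafe.As<TFrom, TTo>(ref TFrom)` reinterprets a reference without any runtime check, so it is only safe inside a branch where the typeof test has already established what T is.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class UnsafeCastSketch
{
    // Hypothetical helper for illustration only; not measured in this article.
    public static T Add<T>(T lhs, T rhs)
    {
        if (typeof(T) == typeof(int))
        {
            // Reinterpret the storage of lhs and rhs as int without boxing.
            int sum = Unsafe.As<T, int>(ref lhs) + Unsafe.As<T, int>(ref rhs);

            // Reinterpret the int result back as T (T is int in this branch).
            return Unsafe.As<int, T>(ref sum);
        }

        throw new NotSupportedException($"Add is not specialized for {typeof(T)}.");
    }
}
```

Whether this actually beats the cast through object is exactly the open question in the list, so it would need its own benchmark.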

Edit history

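  • 2019/03/06

    • Corrected misinformation about the definition of reflection
    • Corrected the false claim that typeof compiles to a dedicated instruction
    • Added the Unity version
    • Fixed a markdown bug
    • Fixed typos
  • 2019/03/13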


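  1. A type whose type arguments have all been fixed to concrete types. The opposite is an open type: T and List<T> are open types, while int and List<int> are closed types.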

  2. typeof turned out to be a separate concept from reflection. It is true that C# has a rich reflection API, but that is a different topic from this one.

  3. That was flat-out wrong. typeof is actually expressed as ldtoken + call Type.GetTypeFromHandle(RuntimeTypeHandle).
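The second half of the sequence in footnote 3 can be reproduced by hand from C#. The sketch below is only an illustration: `TypeofLowering` and `ViaHandle` are hypothetical names, and because C# has no syntax that emits ldtoken on its own, the handle is read from the `Type.TypeHandle` property instead.

```csharp
using System;

public static class TypeofLowering
{
    // Hypothetical helper: resolves a Type from its runtime handle,
    // mirroring the Type.GetTypeFromHandle call that typeof(T) compiles to.
    public static Type ViaHandle<T>()
    {
        // ldtoken would push this handle directly; here it is read from the Type.
        RuntimeTypeHandle handle = typeof(T).TypeHandle;
        return Type.GetTypeFromHandle(handle);
    }
}
```

`TypeofLowering.ViaHandle<int>() == typeof(int)` evaluates to true, since both sides resolve to the same runtime type.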