Things you probably shouldn't do: Bending IL2CPP to your will
Disclaimer: These blog posts will be discussing some... unorthodox approaches. If you think pointer math and gotos in C# are sacrilege then these posts might not be for you.
Hello and welcome to this blog! I will be writing a series of blog posts about my experience with different Unity topics. Developing CodeFiCS requires a low-level approach and takes me places a lot of developers usually don't go. It is my hope that these posts will help you understand some inner workings of Unity and allow you to write better code and create more ambitious projects.
Today I'll be talking about some lesser known features of IL2CPP builds. IL2CPP does a lot more than just make your code run faster, but to make sure we're on the same page let's start with some basics.
What is IL2CPP?
You may know IL2CPP simply as a tool for converting C# code into C++, but it is so much more than that! C# isn't compiled into native code (instructions that CPUs execute) - it is compiled into Intermediate Language (the IL in IL2CPP). IL resembles assembly, but has a number of abstractions (e.g. using field tokens instead of memory offsets when loading/storing a value). You cannot run a C# executable without an additional layer - the Common Language Runtime (CLR). It plays two roles. First and foremost, it compiles IL into native code suited to the current CPU architecture. This process is known as Just-In-Time (JIT) compilation. It determines the best way to lay out type fields and generate native code based on CPU capabilities and other parameters. This allows C# to run pretty efficiently on any machine with a CLR, but it can have a significant performance impact, especially when a bunch of methods are called for the first time and need to be compiled.
The second role of CLR is to provide a number of features to allow generated code to run. This includes things like garbage collection, platform-specific APIs, metadata, implementations of a bunch of mscorlib methods and much more. Without it IL is nothing more than a fancy low-level pseudocode.
So how does IL2CPP factor into this? Well, if you simply convert IL code into C++ it will be missing all of those features that CLR provides. Accessing DateTime.Now will fail since there will be no layer for C++ code to access the OS's time-related API. So how does IL2CPP solve this? By providing its own CLR!
IL2CPP CLR is based on Mono. You can find its source code in the %UnityEditorFolder%/Data/il2cpp/libil2cpp folder or in IL2CPP builds you make if you use the Create Visual Studio Solution option. IL2CPP has some commonalities with Mono, but differs quite significantly in numerous implementations. IL2CPP builds need to be compiled Ahead-Of-Time (AOT, the opposite of JIT) to allow execution on platforms that prohibit runtime code generation, so you will quickly notice that the whole JIT compiler is missing. With it goes the metadata that was designed to support it or that is its byproduct. A lot of core metadata types (MonoClass - CLR representation of System.Type, MonoMethod - CLR representation of System.Reflection.MethodBase, etc.) have been redesigned from the ground up to simplify access to data. IL2CPP started out as a branch of Mono, but has grown into its own CLR with a lot of interesting things to uncover.
So how does Unity factor into this?
When Unity calls IL2CPP it passes all assemblies that need to be converted as a parameter. This naturally includes all user assemblies and also the managed Unity ones (UnityEngine.dll, UnityEngine.PhysicsModule.dll, etc.). One thing it does NOT include is Unity's internal unmanaged assembly (UnityPlayer.dll). It is at the core of Unity and its source code is held under a lot of secrecy, so naturally it cannot be included as source code in builds. It does, however, need to interact with IL2CPP-generated code, so how exactly does that happen? If you build a Unity project into a Visual Studio solution, you will notice that the solution is split into 4 projects.
The first project - Il2CppOutputProject - is where all the C#-converted code resides. It also contains IL2CPP CLR and a few other things.
UnityData contains some defines and resources and can be ignored in most situations.
UnityPlayerStub is at the core of the mystery and I'll get back to it in a bit.
And last but not least, a project whose name matches the solution name (and the name of your Unity project). There's not much here aside from a few defines, but it has the most important piece of the whole application - the Main method (well actually it's wWinMain for Windows Standalone builds, but it serves the same purpose and I'll be referring to it as Main for brevity).
This method quickly transfers control to a UnityMain method, but if you look up its definition you will be disappointed as all it does is return 0; How come? How is your game able to run if there's nothing happening in the Main method?
UnityMain is defined in the Exports.h file of the UnityPlayerStub project. It is the only piece of code in that project, and it seemingly does nothing, so why is it there? Take a moment to consider all the pieces at play: you have your Main method in one project that needs to call UnityPlayer.dll, which in turn needs to call methods from Il2CppOutputProject, all without revealing UnityPlayer's inner workings and while compiling for different architectures (unrelated to this post, but an important piece). UnityPlayerStub is actually compiled into UnityPlayer.dll, but it is not the one that gets included in final builds - this one is ~36 KB, while the one containing Unity's guts is over 40 MB. This effectively illustrates a neat little trick - the Visual Studio solution includes a 'placeholder' project that stands in for Unity internals and, once compiled, is replaced with the actual thing (technically the actual implementation is copied into the build folder when Unity creates the Visual Studio solution, but that's semantics). And all that is possible thanks to two magical words: __declspec(dllexport).
So what does it mean?
Internals of dllexport (and dllimport) are outside of the scope of this post, but in (extremely) oversimplified terms it is the C++ equivalent of P/Invoke. It allows DLLs to store information about the methods they declare so that other DLLs/executables can link to those methods at runtime. This is why the UnityPlayer.dll trick works - your project isn't compiled referencing the UnityMain method from the UnityPlayerStub project - it is compiled to reference that method in UnityPlayer.dll and it doesn't care where that file comes from. Similarly, UnityPlayer references a number of methods from Il2CppOutputProject.
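Since P/Invoke is the analogy used above, here is a minimal C# sketch of the same "bind by name, don't care where the file comes from" idea. It assumes a Windows machine for the native path and uses the real kernel32 export GetTickCount64; the TickCounter class and its fallback branch are made up for illustration and are not part of Unity or IL2CPP.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class, for illustration only.
public static class TickCounter
{
    // The declaration records only a library name and an entry point.
    // Whichever kernel32.dll the loader finds at runtime is the one that gets called.
    [DllImport("kernel32.dll", EntryPoint = "GetTickCount64")]
    private static extern ulong GetTickCount64();

    public static ulong MillisecondsSinceBoot()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return GetTickCount64();

        // Assumption: off Windows there is no kernel32.dll, so fall back to the
        // managed tick counter (a 32-bit value that eventually wraps around).
        return unchecked((uint)Environment.TickCount);
    }
}
```

Swapping in a different DLL that exports the same name would change what runs without recompiling the caller, which is the property the UnityPlayerStub trick relies on.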
Why is this useful?
UnityPlayer is a beast and needs access to a lot of methods and data to function. Doing something as simple as calling Update methods on active MonoBehaviours requires extensive knowledge of your game's functionality. So in order to achieve that (and much more) Il2CppOutputProject dllexports numerous methods and UnityPlayer binds to them at runtime. This is a neat little trick that isn't very useful to us until you consider that Unity has a special meaning for DllImport("__Internal").
You may have come across this if you ever tried using native plugins on iOS or had C++ code in your Unity project. I will be talking more about the mechanics of this trick in another post, but DllImport("__Internal") allows you to call dllexported methods from within Il2CppOutputProject. Usually those are methods you define yourself, but, as I have found out, it also works with the IL2CPP API. Any method that is dllexported from Il2CppOutputProject can be DllImported in C#, and IL2CPP CLR has a bunch of them!
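For the "methods you define yourself" case, a minimal sketch might look like the following. AddNumbers is a hypothetical native function (not something Unity ships), assumed to sit in a .cpp file inside the project so that it ends up compiled alongside the generated code. ENABLE_IL2CPP is Unity's scripting define for IL2CPP builds; the managed fallback is only there so the class still works elsewhere.

```csharp
using System.Runtime.InteropServices;

// Hypothetical wrapper class, for illustration only.
public static class NativeMath
{
#if ENABLE_IL2CPP
    // Assumed native side (hypothetical, Windows-style export as described above):
    //   extern "C" __declspec(dllexport) int AddNumbers(int a, int b) { return a + b; }
    [DllImport("__Internal", CallingConvention = CallingConvention.Cdecl, EntryPoint = "AddNumbers")]
    public static extern int Add(int a, int b);
#else
    // Outside IL2CPP there is no "__Internal" module to bind to, so do the work in C#.
    public static int Add(int a, int b)
    {
        return a + b;
    }
#endif
}
```

Callers just use NativeMath.Add(2, 3) and never see which branch was compiled in.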
There are over 200 dllexported methods you can access, and most of them are in \Il2CppOutputProject\IL2CPP\libil2cpp\il2cpp-api-functions.h. These methods can help you tweak performance of your application, profile data layout or even do things you may have thought impossible in C#. There is no point in listing them all (especially since they're easy to find), but here are a few to pique your curiosity.
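Using these methods is as simple as:
using System.Runtime.InteropServices;
// Place inside any class; binds to the il2cpp_object_get_size export.
[DllImport("__Internal", CallingConvention = CallingConvention.Cdecl, EntryPoint = "il2cpp_object_get_size")]
public static extern uint GetObjectSize(object obj);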
It goes without saying, but this trick only works in IL2CPP builds. There are ways to do similar things in Mono, but that's a topic for another post. Naturally, extreme caution and rigorous testing should be exercised when doing this as some methods can have a huge impact on your apps (e.g. turning off GC is an easy way to run into an endangered species - OutOfMemoryException). The API can also change between major Unity releases, but in my experience methods in il2cpp-api-functions.h are fairly consistent.
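To make the GC warning concrete, here is a hedged sketch of a wrapper that pauses collection only for the duration of a callback and always turns it back on. It assumes il2cpp_gc_disable, il2cpp_gc_enable and il2cpp_gc_get_used_size are exported with the signatures noted in the comments; check them against il2cpp-api-functions.h in your Unity version before relying on this. The Il2CppGc class and RunWithoutGc method are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper class, for illustration only.
public static class Il2CppGc
{
#if ENABLE_IL2CPP
    // Assumed native signatures (verify in il2cpp-api-functions.h):
    //   void il2cpp_gc_disable(); void il2cpp_gc_enable(); int64_t il2cpp_gc_get_used_size();
    [DllImport("__Internal", CallingConvention = CallingConvention.Cdecl, EntryPoint = "il2cpp_gc_disable")]
    private static extern void Disable();

    [DllImport("__Internal", CallingConvention = CallingConvention.Cdecl, EntryPoint = "il2cpp_gc_enable")]
    private static extern void Enable();

    [DllImport("__Internal", CallingConvention = CallingConvention.Cdecl, EntryPoint = "il2cpp_gc_get_used_size")]
    private static extern long GetUsedSize();
#else
    // Outside IL2CPP the exports do not exist, so these fallbacks leave the GC alone.
    private static void Disable() { }
    private static void Enable() { }
    private static long GetUsedSize() { return GC.GetTotalMemory(false); }
#endif

    // Runs 'work' with collection paused and returns the change in used heap bytes.
    public static long RunWithoutGc(Action work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        long before = GetUsedSize();
        Disable();
        try
        {
            work();
        }
        finally
        {
            Enable(); // always re-enable, even if 'work' throws
        }
        return GetUsedSize() - before;
    }
}
```

Keeping the disable/enable pair inside one try/finally means a forgotten re-enable can't quietly turn into the OutOfMemoryException mentioned above.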