LLVM Structure Code Generation
Continuing in the vein of LLVM frustrations, I’ve been working on generating LLVM IR for structures. Creating a structure is generally pretty straightforward, but accessing its members is not very well documented. This post should help you use the LLVM C++ API to generate LLVM IR for structs.
Creating a structure
This was the most straightforward part. Creating a structure type is as easy as loading up an ArrayRef<Type *> with the Type of each member and then passing that (called “members_array_ref” in the example below) to:
llvm::StructType::create(llvm::getGlobalContext(), members_array_ref, struct_name, false)
Once this StructType is created, it can be passed to the AllocaInst constructor to allocate specific instances of the structure type:
new llvm::AllocaInst(struct_type, var_name, block)
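To make the result of these two calls easier to picture, here is a small stand-alone sketch that renders the textual IR they roughly correspond to. It does not call LLVM at all: render_struct_type and render_alloca are hypothetical helpers invented for illustration, the member types are passed as already-printed strings, and details LLVM would normally print (such as alignment) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): renders the textual IR definition of
// a named struct type from the printed names of its member types.
std::string render_struct_type(const std::string &struct_name,
                               const std::vector<std::string> &member_types) {
    assert(!struct_name.empty());
    if (member_types.empty()) {
        return "%" + struct_name + " = type {}";
    }
    std::string out = "%" + struct_name + " = type { ";
    for (std::size_t i = 0; i < member_types.size(); ++i) {
        if (i != 0) out += ", ";
        out += member_types[i];
    }
    return out + " }";
}

// Hypothetical helper: renders the alloca that reserves stack space for one
// instance of the struct. The value it names is a pointer to that instance.
std::string render_alloca(const std::string &var_name,
                          const std::string &struct_name) {
    assert(!var_name.empty() && !struct_name.empty());
    return "%" + var_name + " = alloca %" + struct_name;
}
```

For a struct named Point with an i32 and a double member, this yields "%Point = type { i32, double }" and "%p = alloca %Point". The point to take away is that the alloca gives you a pointer to the struct, which is what the field access described next starts from.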
The Often Misunderstood GEP Instruction
To access an individual field of the structure (whether through a StoreInst or a LoadInst), you must use a very poorly documented instruction: GetElementPtrInst.
This instruction is so confusing that it has its own page on the main LLVM site, which provides a helpful overview of what the IR instruction does but not how to create one through the C++ API. There is a decent start to a solution in this Stack Overflow question; however, it fails to explain the generated code.
From the official Doxygen page for the class, you create a GEP instruction using the Create method:
static GetElementPtrInst * Create (Value *Ptr, ArrayRef< Value * > IdxList, const Twine &NameStr, BasicBlock *InsertAtEnd)
The most confusing of these arguments is IdxList. From the Stack Overflow question, I was able to deduce that you need to create an ArrayRef<Value *> of the indices (0, then the index into the struct).
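One way to read the two indices is through a plain C++ analogy. GEP treats the pointer as if it pointed at an array of structs, so the first index picks which struct (0 means the one being pointed at) and the second picks the field inside it. It only computes an address and never reads memory. The sketch below models that arithmetic without using LLVM; the Point struct, the offset table, and gep_model are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct standing in for the LLVM StructType.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// Byte offset of each field, indexed by field number.
constexpr std::size_t kPointFieldOffsets[] = {offsetof(Point, x),
                                              offsetof(Point, y)};

// Models the address a GEP with indices (first, second) would compute:
// base + first * sizeof(Point) + offset of field number `second`.
const void *gep_model(const Point *base, std::uint32_t first,
                      std::uint32_t second) {
    assert(second < 2 && "Point only has two fields");
    const char *bytes = reinterpret_cast<const char *>(base);
    return bytes + first * sizeof(Point) + kPointFieldOffsets[second];
}
```

With a single Point p, gep_model(&p, 0, 1) gives the same address as &p.y, which in C++ terms is &(&p)[0].y. A first index of 1 would step over a whole Point, which is only meaningful when the pointer really refers to an array of them.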
For example, the first element of the struct has the indices (0, 0), the second element has (0, 1), and so forth. To calculate the second index, I looped through the members to determine the position of the one being accessed. Once I had the indices, I needed to transform them into Value * objects, which I first attempted by calling:
llvm::ConstantInt::get(llvm::getGlobalContext(), llvm::APInt(64, index, false));
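This caused a segfault in the Create call! After banging my head against this for a few hours, I tried changing the indices from i64s to i32s (that is, building the APInt with a width of 32 instead of 64), which stopped the segfaulting! This matches the LLVM language reference, which only allows i32 constants as indices into a structure.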
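The member lookup described above can be sketched in plain C++ as follows. This is a minimal illustration and not the code from my compiler: gep_indices_for is a hypothetical helper, the members are assumed to be identified by name, and std::uint32_t is used to mirror the i32 width that the struct index needs. Each value in the returned pair would still have to be wrapped in a 32-bit ConstantInt before being handed to Create.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the (0, field) index pair for `member`.
// Both values are 32 bits wide to mirror the i32 constants used for
// struct indices.
std::pair<std::uint32_t, std::uint32_t>
gep_indices_for(const std::vector<std::string> &member_names,
                const std::string &member) {
    // The field number has to fit in 32 bits.
    assert(member_names.size() <= std::numeric_limits<std::uint32_t>::max());
    for (std::uint32_t i = 0; i < member_names.size(); ++i) {
        if (member_names[i] == member) {
            return {0u, i};
        }
    }
    throw std::out_of_range("unknown struct member: " + member);
}
```

For members named x, y and z, looking up y gives the pair (0, 1), and asking for a name that is not in the list throws instead of silently producing an index.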
Now that I could create the GEP instruction (the Ptr argument is set to the AllocaInst created for the StructType), I could pass it to either my StoreInst or LoadInst and access the structure members!
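Since the generated code is the part that is rarely explained, here is a stand-alone sketch that renders the textual form of such a GEP. It does not use LLVM: render_struct_gep is a hypothetical helper, and the syntax assumes a recent LLVM release with opaque pointers (older releases print a typed pointer such as %Point* %p instead of ptr %p).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an LLVM API): renders the textual IR of a GEP
// that yields the address of field number `field` of the struct at `ptr`.
std::string render_struct_gep(const std::string &result,
                              const std::string &struct_name,
                              const std::string &ptr, std::uint32_t field) {
    assert(!result.empty() && !struct_name.empty() && !ptr.empty());
    // The leading "i32 0" selects the struct that ptr points at,
    // the second index selects the field inside it.
    return "%" + result + " = getelementptr %" + struct_name + ", ptr %" +
           ptr + ", i32 0, i32 " + std::to_string(field);
}
```

For the second field of a Point allocated as %p, this renders "%p.y = getelementptr %Point, ptr %p, i32 0, i32 1". The resulting %p.y is then used as the address operand of a store (store i32 42, ptr %p.y) or a load (%v = load i32, ptr %p.y).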