Assembly functions
Assembly functions (or asm functions for short) are module-level functions that allow you to write Tact assembly. Unlike all other functions, their bodies consist only of TVM instructions and some other primitives, and don’t use any Tact statements or expressions.
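As a minimal sketch of the shape described above (the function name answer is only illustrative), an assembly function is declared with the asm keyword, and its body holds nothing but TVM instructions and their primitives:

```tact
// All assembly functions start with the asm keyword.
// The body pushes 42 onto the stack, and the Int return type captures it.
asm fun answer(): Int { 42 INT }
```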
TVM instructions
In Tact, the term TVM instruction refers to a command that is executed by the TVM during its run-time, i.e. the compute phase. Where possible, Tact will try to optimize their use for you, but it won’t define new ones or introduce extraneous syntax for their pre-processing. Instead, it is recommended to combine the best of Tact and TVM instructions, as shown in the onchainSha256() example near the end of this page.
Each TVM instruction, when converted to its binary representation, is an opcode (operation code) to be executed by the TVM plus some optional arguments to it written immediately after. However, when writing instructions in asm functions, the arguments, if any, are written before the instruction and are separated by spaces. This reverse Polish notation (RPN) syntax is intended to show the stack-based nature of TVM.
For example, the DROP2 instruction and its alias 2DROP, which drop (discard) the two top values from the stack, have the same opcode prefix: 0x5B, or 1011011 in binary.
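A short sketch of DROP2 in use, assuming a hypothetical function name discardTwo: both arguments are pushed onto the stack and then immediately discarded, and since no return type is declared, nothing is captured afterward.

```tact
// Pushes a and b onto the stack, then drops both of them
asm fun discardTwo(a: Int, b: Int) { DROP2 }
```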
The arguments to TVM instructions in Tact are called primitives — they don’t manipulate the stack themselves and aren’t pushed on it by themselves. Attempting to specify a primitive without the instruction that immediately consumes it will result in compilation errors.
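To illustrate the rule about primitives, here is a sketch with hypothetical function names. The failing variant is left commented out so that the rest of the snippet still compiles:

```tact
// COMPILATION ERROR if uncommented: the primitive 43 is not followed
// by any instruction that would consume it
// asm fun bad(): Int { 43 }

// Here 43 is consumed by PUSHINT, which pushes it onto the stack
asm fun good(): Int { 43 PUSHINT }
```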
For some instructions, the resulting opcode depends on the specified primitive. For example, PUSHINT and its shorter alias INT have the same opcode 0x7 if the specified number argument is in the inclusive range from -5 to 10. However, if the number falls outside that range, the opcode changes accordingly: 0x80 for arguments in the inclusive range from -128 to 127, 0x81 for arguments in the inclusive range from -2^15 to 2^15 - 1, and so on. For your convenience, all these variations of opcodes are described using the same instruction name, in this case PUSHINT.
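The sketch below (function names are illustrative) uses the same instruction name with arguments of different magnitudes. Under the TVM encoding of PUSHINT, 10 fits the shortest 0x7 form, 100 needs the 0x80 form, and 1000 needs the 0x81 form, yet the source code looks the same in all three cases:

```tact
// Small argument: fits the shortest encoding
asm fun pushTen(): Int { 10 PUSHINT }

// Needs one extra byte for the argument; INT is an alias of PUSHINT
asm fun pushHundred(): Int { 100 INT }

// Needs two extra bytes for the argument
asm fun pushThousand(): Int { 1000 PUSHINT }
```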
Stack calling conventions
The syntax for parameters and returns is the same as for other function kinds, but there is one caveat — argument values are pushed to the stack before the function body is executed, and return type is what’s captured from the stack afterward.
Parameters
The first parameter is pushed to the stack first, the second one second, and so on, so that the first parameter is at the bottom of the stack and the last one at the top.
Since the bodies of asm functions do not contain Tact statements, any direct references to parameters in function bodies will be recognized as TVM instructions, which can easily lead to very obscure error messages.
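A small sketch of the parameter order, with hypothetical function names. SUB takes x and y with y on top and produces x - y, so the order in which parameters are pushed decides the result:

```tact
// a is pushed first (bottom of the stack), b is pushed last (top),
// so SUB computes a - b: asmSub(5, 3) gives 2
asm fun asmSub(a: Int, b: Int): Int { SUB }

// Returns x as is: the argument is already on the stack when the body starts
asm fun identity(x: Int): Int { }

// COMPILATION ERROR if uncommented: x in the body is not a reference
// to the parameter, it is treated as an unknown TVM instruction
// asm fun broken(x: Int): Int { x }
```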
The parameters of arbitrary Struct types are distributed over their fields, recursively flattened as the arguments are pushed onto the stack. In particular, the value of the first field of the Struct is pushed first, the second is pushed second, and so on, so that the value of the first field is at the bottom of the stack and the value of the last is at the top. If there are nested structures inside those Structs, they’re flattened in the same manner.
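The flattening can be sketched as follows, assuming hypothetical Structs AB and Nested. The fields reach the stack in declaration order, so the function body sees them exactly as if they were separate parameters:

```tact
struct AB { a: Int; b: Int }
struct Nested { first: AB; second: AB }

// Stack before the body: a (bottom), b (top), so SUB computes a - b
asm fun subAB(two: AB): Int { SUB }

// Stack before the body: first.a, first.b, second.a, second.b (top).
// Three ADD instructions sum all four flattened fields.
asm fun sumNested(n: Nested): Int { ADD ADD ADD }

fun showcase() {
    let diff = subAB(AB{ a: 5, b: 3 }); // 2
    let total = sumNested(Nested{
        first: AB{ a: 1, b: 2 },
        second: AB{ a: 3, b: 4 },
    }); // 10
}
```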
Returns
When present, an assembly function’s return type attempts to grab relevant values from the resulting stack after the function execution and any result arrangements. If the return type is not present, however, the assembly function does not take any values from the stack.
Specifying a primitive type, such as an Int or a Cell, will make the assembly function capture the top value from the stack. If the run-time type of the taken value doesn’t match the specified return type, an exception with exit code 7 will be thrown: Type check error.
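A sketch of both outcomes, with hypothetical function names. The second function compiles, but according to the rule above it is expected to fail with exit code 7 at run-time, because the value on top of the stack is an Int rather than a Cell:

```tact
// The Int return type captures the 42 left on top of the stack
asm fun pushInt(): Int { 42 PUSHINT }

// The declared return type is Cell, but the body leaves an Int on the stack:
// expected to end with exit code 7 (Type check error) when executed
asm fun mismatchedReturn(): Cell { 42 PUSHINT }
```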
Just like in parameters, arbitrary Struct return types are distributed across their fields and recursively flattened in exactly the same order. The only differences are that they now capture values from the stack and do so in a right-to-left fashion — the last field of the Struct grabs the topmost value from the stack, the second-to-last grabs the second to the top, and so on, so that the last field contains the value from the top of the stack and the first field contains the value from the bottom.
If the run-time type of some captured value doesn’t match some specified field type of the Struct or the nested Structs, if any, an exception with exit code 7 will be thrown: Type check error. Moreover, attempts to capture more values than there were on the stack throw an exception with exit code 2: Stack underflow.
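The right-to-left capture can be sketched with a hypothetical Struct AB. After SWAP, the former a sits on top of the stack, so it is captured by the last field b, while the former b ends up in the field a:

```tact
struct AB { a: Int; b: Int }

// Stack before the body: a, b (top). After SWAP: b, a (top).
// The last field of the result grabs the top value, the first field
// grabs the value below it, so the two fields trade places.
asm fun swapAB(two: AB): AB { SWAP }

fun showcase() {
    let res = swapAB(AB{ a: 1, b: 2 });
    // res.a is 2, res.b is 1
}
```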
As parameters and return values of assembly functions, Structs can only have up to 16 fields. Each of these fields can in turn be declared as another Struct, where each of these nested structures can also only have up to 16 fields. This process can be repeated until there is a total of 256 fields of primitive types, due to the assembly function limitations. This restriction also applies to the parameter list of assembly functions: you can only declare up to 16 parameters.
Stack registers
The so-called stack registers are a way of referring to the values at the top of the stack. In total, there are 256 stack registers, i.e. values held on the stack at any given time. You can specify any of them using s0, s1, …, s255, but only if a given TVM instruction expects a stack register as an argument. Otherwise, the concept is meant for succinct descriptions of the effects of a particular TVM instruction in text or in comments to the code, not in the code itself.
Register s0 is the value at the top of the stack, register s1 is the value immediately after it, and so on, until we reach the bottom of the stack, represented by s255, i.e. the 256th stack register. When a value x is pushed onto the stack, it becomes the new s0. At the same time, the old s0 becomes the new s1, the old s1 becomes the new s2, and so on.
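As a sketch of stack registers used as instruction arguments (function names are illustrative), the PUSH instruction takes a register and pushes a copy of its value onto the stack:

```tact
// Stack: x. After s0 PUSH: x, x. After ADD: 2x.
asm fun double(x: Int): Int { s0 PUSH ADD }

// Stack: a (s1), b (s0). After s1 PUSH: a, b, a.
// The two ADD instructions then fold everything into 2a + b.
asm fun addFirstTwice(a: Int, b: Int): Int { s1 PUSH ADD ADD }
```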
Arrangements
Often it’s useful to change the order of arguments pushed to the stack or the order of return values without referring to stack registers in the body. You can do this with asm arrangements. With them, the evaluation flow of the assembly function can be thought of in these steps:
- Function takes arguments in the order specified by the parameters.
- If an argument arrangement is present, arguments are reordered before being pushed to the stack.
- Function body, consisting of TVM instructions and primitives, is executed.
- If a result arrangement is present, resulting values are reordered on the stack.
- The resulting values are captured (partially or fully) by the return type of the function.
The argument arrangement has the syntax asm(arg2 arg1), where arg1 and arg2 are some arguments of the function, listed in the order we want to push them onto the stack: arg2 will be pushed first and end up at the bottom of the stack, while arg1 will be pushed last and end up on top of the stack. Arrangements are not limited to two arguments and operate on all parameters of the function. If there are any parameters of arbitrary Struct types, their arrangement is done prior to their flattening.
The return arrangement has the syntax asm(-> 1 0), where 1 and 0 are a left-to-right reordering of stack registers s1 and s0 correspondingly: the contents of s1 will be at the top of the stack, followed by the contents of s0. Arrangements are not limited to two return values and operate on captured values. If an arbitrary Struct is specified as the return type, the arrangement is done with respect to its fields, mapping values on the stack to the recursively flattened Struct.
Both argument and return arrangements can be combined and written as follows: asm(arg2 arg1 -> 1 0).
Using all those re-arranged functions together we get:
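The following is a sketch rather than a definitive listing: the Struct SliceInt and the function names asmStoreDict, asmLoadCoins, asmLoadInt and showcase are assumptions made for illustration. It defines one function per kind of arrangement and then uses them together:

```tact
// Hypothetical Struct for returning the rest of a Slice plus a loaded Int
struct SliceInt { s: Slice; val: Int }

// Argument arrangement: STDICT expects the dictionary below the Builder,
// so c is pushed first and self is pushed last
asm(c self) extends fun asmStoreDict(self: Builder, c: Cell?): Builder { STDICT }

// Return arrangement: LDVARUINT16 leaves the Int below the Slice,
// so the two results are reordered to match the fields of SliceInt
asm(-> 1 0) extends fun asmLoadCoins(self: Slice): SliceInt { LDVARUINT16 }

// Both arrangements at once: arguments keep their declared order,
// results are reordered as in the previous function
asm(self len -> 1 0) extends fun asmLoadInt(self: Slice, len: Int): SliceInt { LDIX }

fun showcase() {
    let s: Slice = beginCell()
        .storeCoins(10)
        .storeInt(42, 7)
        .asmStoreDict(null)
        .asSlice();

    let coins: SliceInt = s.asmLoadCoins();     // coins.val is 10
    let num: SliceInt = coins.s.asmLoadInt(7);  // num.val is 42
}
```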
Note that arrangements do not drop or discard any values. They only manipulate the order of arguments and return values as those are declared. This means, for example, that an arrangement cannot access values from the stack that are not captured by the return type of the assembly function.
That said, there’s a caveat to the mutates attribute and asm arrangements.
Limitations
Attempts to drop the number of stack values below 0 throw an exception with exit code 2: Stack underflow.
The TVM stack itself has no limit on the total number of values, so you can theoretically push new values there until you run out of gas. However, various continuations may have a maximum number of values defined for their inner stacks, going over which will throw an exception with exit code 3: Stack overflow.
Although there are only 256 stack registers, the stack itself can have more than 256 values on it in total. The deeper values won’t be immediately accessible by any TVM instructions, but they would be on the stack nonetheless.
Caveats
Case sensitivity
TVM instructions are case-sensitive and are always written in upper case (capital letters).
No double quotes needed
It is not necessary to enclose TVM instructions in double quotes. On the contrary, they are then interpreted as strings, which is probably not what you want:
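A sketch of the difference, using hypothetical function names. MYCODE is the TVM instruction that pushes the code of the current contract as a Cell:

```tact
// The quoted "MYCODE" is treated as a string, not as an instruction,
// so nothing useful happens during the compute phase
asm fun wrongMyCode() { "MYCODE" }

// Without quotes, MYCODE is executed as a TVM instruction
// and the resulting Cell is captured by the return type
asm fun asmMyCode(): Cell { MYCODE }
```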
mutates consumes an extra value
Specifying a mutates attribute, i.e. defining a mutation function, makes the assembly function consume one more value deeper into the stack than the declared return values. Consider the following example:
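A sketch of such a function, assuming the hypothetical name asmLoadRef (chosen to avoid clashing with the standard loadRef). It combines the LDREF instruction, the -> 1 0 return arrangement and the mutates attribute:

```tact
// LDREF turns a Slice into a Cell and the remaining Slice (on top).
// The arrangement puts the Cell on top, and mutates assigns
// the value below it, the remaining Slice, back to self.
asm(-> 1 0) extends mutates fun asmLoadRef(self: Slice): Cell { LDREF }

fun showcase() {
    let s: Slice = beginCell().storeRef(emptyCell()).asSlice();
    let c: Cell = s.asmLoadRef(); // s no longer contains the reference
}
```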
There, the LDREF instruction produces two stack entries: a Cell and a modified Slice, in that order, with the Slice pushed on top of the stack. Then, the arrangement -> 1 0 swaps those values, making the Cell sit on top of the stack.
Finally, the mutates attribute makes the function consume the deepest value on the stack, i.e. the Slice, and assign it to self, while returning the Cell value to the caller.
Overall, the mutates attribute can be useful in some cases, but you must stay vigilant when using it with assembly functions.
Don’t rely on initial stack values
The TVM places a couple of values onto its stack upon initialization, and those values are based on the event that caused the transaction. In other languages you might’ve had to rely on their order and types, while in Tact the parsing is done for you. Thus, in Tact these initial stack values are different from what’s described in TON Docs.
Therefore, to access details such as the amount of nanoToncoins in a message or the Address of the sender, it’s strongly recommended to call the context() or sender() functions instead of attempting to look for those values on the stack.
Debugging
The number of values the stack has at any given time is called the depth, and it’s accessible via the DEPTH instruction. It’s quite handy for seeing the number of values before and after calling the assembly functions you’re debugging, and can be used within asm logic.
To see both the stack depth and the values on it, there’s a function in the Core library of Tact: dumpStack(). It’s great for keeping track of the stack while debugging, although it’s computationally expensive and only prints values rather than returning them, so use it sparingly and only when testing.
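Both tools can be sketched together. The names stackDepth and debugShowcase are hypothetical, while dump() and dumpStack() come from the Core library:

```tact
// Pushes the current number of values on the stack and returns it
asm fun stackDepth(): Int { DEPTH }

fun debugShowcase() {
    dump(stackDepth()); // prints the depth only
    dumpStack();        // prints the depth and the values themselves
}
```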
Read more about debugging Tact contracts on the dedicated page: Debugging.
Attributes
The following attributes can be specified:
- inline: does nothing, since assembly functions are always inlined.
- extends: makes it an extension function.
- mutates (along with extends): makes it an extension mutation function.
The following attributes cannot be specified:
- abstract: assembly functions must have a body defined.
- virtual and override: assembly functions cannot be defined within a contract or a trait.
- get: assembly functions cannot be getters.
Interesting examples
On the TVM instructions page, you may have noticed that the “signatures” of instructions are written in a special form called stack notation, which describes the state of the stack before and after the given instruction is executed.
For example, x y - z describes an instruction that grabs two values x and y from the stack, with y at the top of the stack and x second to the top, and then pushes the result z onto the stack. Notice that other values deeper down the stack are not accessed.
That notation omits the type info and only implicitly describes the state of stack registers, so for the following examples we’ll use a different one, combining the notions of parameters and return values with the stack notation like this:
When there are literals involved, they’ll be shown as is. Additionally, when values on the stack do not represent the parameters or Struct fields of the return type, only their type is given.
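A sketch of how such combined notation might look in comments, under the assumption that each comment shows the stack before and after the instruction that follows it. The function addPlusOne is hypothetical:

```tact
asm fun addPlusOne(a: Int, b: Int): Int {
    // a:Int, b:Int → a:Int, b:Int, 1
    ONE
    // a:Int, b:Int, 1 → a:Int, Int
    ADD
    // a:Int, Int → Int
    ADD
}
```

Here a and b are named because they are parameters, the literal 1 is shown as is, and intermediate results are listed by type only.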
keccak256
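A minimal sketch, assuming the hypothetical name asmKeccak256. HASHEXT_KECCAK256 expects the values to hash followed by their count on top of the stack, so the body pushes 1 before invoking it:

```tact
// Computes the Keccak-256 hash of the data in the given Slice
// and returns it as an Int
asm fun asmKeccak256(s: Slice): Int {
    // s:Slice → s:Slice, 1
    ONE
    // s:Slice, 1 → Int
    HASHEXT_KECCAK256
}
```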
The HASHEXT_SHA256 and HASHEXT_BLAKE2B instructions can be used in a similar manner, taking into account the different number of return values. In addition, all of those can also work with values of type Builder.
The HASHEXT_KECCAK512 and HASHEXT_SHA512 instructions, however, put a tuple of two integers on the stack instead of putting two separate integers there. Because of that, you’d also need to add the UNPAIR instruction right after them.
isUint8
Mapping onto a single instruction by itself is inefficient if the values that instruction places onto the stack can vary depending on some conditions. That’s because one cannot map them to Tact types directly and often needs to do some additional stack manipulations before or after its execution.
Since this is often the case for the “quiet” versions of instructions, the recommendation is to prefer their non-quiet alternatives. Usually, non-quiet versions throw exceptions and are consistent in their return values, while quiet ones push -1 or other values onto the stack, thus varying the number or the type of their result values.
For the simpler cases such as this example, it’s convenient to do all the stack manipulations within the same function.
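A sketch of such a function: the quiet QUFITS instruction leaves either the original value or NaN on the stack, and the remaining instructions turn that into a consistent Bool within the same body.

```tact
// Checks whether val fits into an 8-bit unsigned integer
asm fun isUint8(val: Int): Bool {
    // val:Int → Int or NaN
    8 QUFITS
    // Int or NaN → Bool (true if NaN)
    ISNAN
    // Bool → Bool (inverted: true if val fits)
    NOT
}
```

With this definition, isUint8(255) is expected to be true and isUint8(256) false.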
ecrecover
This example shows one possible way to work with partially captured results from the stack, getting the omitted ones later.
onchainSha256
This example extends the ecrecover() one and adds more complex stack management and interaction with Tact statements such as loops.