Golang on the PlayStation 2 (part 3)

By Ricardo

September 6, 2025

Holy shit (again!)

I’m really glad to see this topic is interesting to a lot more people than I anticipated. I’ve been working on this in my spare time just for fun, without any actual goal - and yet people seem to dig it.

As a matter of fact, I’m giving two talks this year about this topic - exactly, two opportunities for everyone to join me in this crazy saga! I’ll be talking and explaining this project (possibly with live demos!) at both GoLab (Oct 5-7) and GambiConf (Nov 29-30) this year. So grab your tickets and join me on this rollercoaster of quirks, bugs, Sony weirdness and a bunch of hacks!

Fun fact: for GoLab you can use my SP20FR coupon for 20% off your conference ticket!

Ok, with all that said, let’s dive into part 3!

Disclaimer

Once again, a lot of what you’ll see here is based on my crazy weekends trying to hack things together. I am not, by any chance in the world, an expert in any of the subjects discussed in this post and/or series. The information provided here is as-is and may be inaccurate and/or wrong, and I am not responsible for any harm to your consoles or sanity.

You have been warned.

Our plan

There are multiple things I’d like to do at this point in the project. The first one, however, is to reduce the number of hacks in TinyGo itself, especially as they are very platform-specific. If I had to choose, I’d prefer to have them in the LLVM codebase, leaving TinyGo to generate normal, bog-standard IR code. For that, I need to dive deeper into that world.

The general process for compiling things on TinyGo works like this:

Right now we’re hacking things in the TinyGo section to generate valid LLVM IR code for our platform. We want to move our hacks into the LLVM part instead, so that TinyGo has as few modifications as possible and LLVM has to deal with our weird, quirky CPU. Easy, right?

To make our lives easier from now on, I’ve created a small test application that goes through a series of tests, validating each relevant type, some of its variants, and even running through some standard code to see if anything breaks (spoiler: it will!). At the time of writing, this is what I was testing:

8/16/32/64-bit integers (signed and unsigned) for addition, subtraction, multiplication, division and remainder operations
32/64-bit floats for addition, subtraction, multiplication, division and remainder operations
Some basic string operations and formatting
String formatting for all the integer types from above
String formatting for all the float types from above

Here’s a sneak peek of how some of its tests are defined:

type numberTest[T comparable] struct {
  name   string
  left   T
  right  T
  result T
  fn     func(T, T) T
}

var (
  int64Tests = []numberTest[int64]{
    {"add", 1234567890, 9876543210, 11111111100, func(a, b int64) int64 { return a + b }},
    {"sub", 9876543210, 1234567890, 8641975320, func(a, b int64) int64 { return a - b }},
    {"mul", 123456, 789012, 97408265472, func(a, b int64) int64 { return a * b }},
    {"div", 11111111100, 3, 3703703700, func(a, b int64) int64 { return a / b }},
    {"mod", 11111111100, 1000, 100, func(a, b int64) int64 { return a % b }},
  }
  uint64Tests = []numberTest[uint64]{
    {"add", 1234567890, 9876543210, 11111111100, func(a, b uint64) uint64 { return a + b }},
    {"sub", 10000000000, 1234567890, 8765432110, func(a, b uint64) uint64 { return a - b }},
    {"mul", 123456, 789012, 97408265472, func(a, b uint64) uint64 { return a * b }},
    {"div", 11111111100, 3, 3703703700, func(a, b uint64) uint64 { return a / b }},
    {"mod", 11111111100, 1000, 100, func(a, b uint64) uint64 { return a % b }},
  }
)

A lot more tests are available and this is definitely not testing everything!
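To give an idea of how a table like this can be consumed, here’s a minimal sketch of a generic runner. Keep in mind that runNumberTests and its label parameter are made up for this example and are not the code from my test application; the struct is copied from above so the snippet compiles on its own.

```go
package main

import "fmt"

// numberTest mirrors the struct shown above.
type numberTest[T comparable] struct {
	name   string
	left   T
	right  T
	result T
	fn     func(T, T) T
}

// runNumberTests is a hypothetical runner (not the one from the test app).
// It executes every case and returns one message per mismatch.
func runNumberTests[T comparable](label string, tests []numberTest[T]) []string {
	var failures []string
	for _, tc := range tests {
		if got := tc.fn(tc.left, tc.right); got != tc.result {
			failures = append(failures,
				fmt.Sprintf("%s/%s: got %v, want %v", label, tc.name, got, tc.result))
		}
	}
	return failures
}

func main() {
	tests := []numberTest[int64]{
		{"add", 1234567890, 9876543210, 11111111100, func(a, b int64) int64 { return a + b }},
		{"mul", 123456, 789012, 97408265472, func(a, b int64) int64 { return a * b }},
	}
	for _, f := range runNumberTests("int64", tests) {
		fmt.Println("FAIL", f)
	}
	fmt.Println("done")
}
```

Returning the failures instead of printing them right away makes it easy to send them to whatever output is available, be it the emulator log or the console screen.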

With this, we can build our TinyGo test program and see if the changes in the LLVM fixed the problems or not. And I can also use this for other tests in the future. Plus, since we saw in part 2 that the emulator and the actual hardware deal with things differently, I can actually use this to validate all the code changes against the real thing.

Disclaimer: I need to reiterate that I am not an expert in any of this. This post may give inaccurate and/or plain wrong information. There was a lot of ChatGPT and just hacking through code from now on, so fasten your seatbelts because this will be a wild ride!

Faking it: the int64 solution

The first hack I’d like to remove is the MUL (*) and QUO (/) operations I hacked in part 1. This is easy enough: for the relevant operations in the LLVM IR code, I want to force it to use a library call instead (i.e. do the operation in software).

You see, when the LLVM is trying to convert its IR code into actual machine code, it goes through a few steps, and one of them is called “lowering”. In this step, as far as I understand, it needs to know what is legal and what is not on a CPU. This is important as not all CPUs are equal: a Pentium 3 doesn’t support all the features of a Pentium 4, for example. The LLVM is responsible for knowing what is allowed and what is not, and translating it into the appropriate valid instructions if possible, or doing the entire thing in software.

The LLVM has a few possible actions for legalizing instructions on a target:

  /// This enum indicates whether operations are valid for a target, and if not,
  /// what action should be used to make them valid.
  enum LegalizeAction : uint8_t {
    Legal,      // The target natively supports this operation.
    Promote,    // This operation should be executed in a larger type.
    Expand,     // Try to expand this to other ops, otherwise use a libcall.
    LibCall,    // Don't try to expand this to other ops, always use a libcall.
    Custom      // Use the LowerOperation hook to implement custom lowering.
  };

I won’t go into the details of this, but we want to target LibCall in our scenarios. This basically tells the LLVM that we want to fake the instruction: it’s not supported on our target, so emit a call to a function that performs the operation for us instead.

I think we could probably get this solved by either using Expand (to expand this into multiple supported instructions) or Custom (and manually implement such instructions), but I’m lazy as hell and want this to get done in software. Plus, we were doing it this way anyway, just in a slightly worse manner!

For that to happen in our case, we need to look at the MipsSETargetLowering class. Its constructor defines what is supported and what is not based on the target and subtarget machines. At the end of it, we want to tell LLVM that any multiplication/division/similar operation with 64-bit integers, signed or not, is not allowed in our target and must be performed in software - a library call (LibCall). Since the simple MUL and QUO tokens can be lowered into a bunch of instructions depending on the use case (also known as: I have no clue why just a few didn’t work), I’ve opted to force all of the int64-related things into a library call. This is what it looks like:

MipsSETargetLowering::MipsSETargetLowering(const MipsTargetMachine &TM,
                                           const MipsSubtarget &STI)
    : MipsTargetLowering(TM, STI) {
    // (...)

  setOperationAction(ISD::MUL, MVT::i64, LibCall);
  setOperationAction(ISD::SDIV, MVT::i64, LibCall);
  setOperationAction(ISD::UDIV, MVT::i64, LibCall);
  setOperationAction(ISD::SREM, MVT::i64, LibCall);
  setOperationAction(ISD::UREM, MVT::i64, LibCall);
  setOperationAction(ISD::SDIVREM, MVT::i64, LibCall);
  setOperationAction(ISD::UDIVREM, MVT::i64, LibCall);
  setOperationAction(ISD::SMUL_LOHI, MVT::i64, LibCall);
  setOperationAction(ISD::UMUL_LOHI, MVT::i64, LibCall);
  setOperationAction(ISD::MULHS, MVT::i64, LibCall);
  setOperationAction(ISD::MULHU, MVT::i64, LibCall);

  // (...)
}

Fun fact: the first time I tried this, I took the `Custom` approach instead and implemented the library call myself. Thankfully I saw this `LibCall` thing and was able to delete a few dozen lines of unnecessary, badly written C++ code!

Mind you that, since we’re doing this the hacky way, I didn’t even bother adding a flag, but we might want to do so in the future so that we can keep other parts of this LLVM functional.

Anyway, with these calls, we’re essentially telling the LLVM that IR instructions such as MUL, SDIV and UDIV for 64-bit integers are not allowed, and must be performed in software. This way, even if TinyGo outputs something like %47 = mul i64 2, %40, !dbg !25370, which is a 64-bit integer multiplication in LLVM IR, our compiler will not try to write that as a MULT or DMULT instruction, but as a call to __muldi3 instead.
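To give a feel for what “performed in software” means, here’s a minimal Go sketch of a 64-bit multiplication built only from shifts and additions. To be clear, this is an illustration and not how __muldi3 is actually implemented in the runtime library; softMul64 and softMulInt64 are names I made up for the example.

```go
package main

import "fmt"

// softMul64 multiplies two 64-bit values using only shifts and adds
// (classic shift-and-add). The result wraps around on overflow, just
// like a regular uint64 multiplication would.
func softMul64(a, b uint64) uint64 {
	var result uint64
	for b != 0 {
		if b&1 == 1 {
			result += a // this bit of b is set: add the shifted a
		}
		a <<= 1
		b >>= 1
	}
	return result
}

// softMulInt64 reuses the unsigned routine for signed values, relying
// on two's complement wraparound.
func softMulInt64(a, b int64) int64 {
	return int64(softMul64(uint64(a), uint64(b)))
}

func main() {
	fmt.Println(softMul64(123456, 789012)) // same operands as the "mul" test
	fmt.Println(softMulInt64(-3, 7))
}
```

It’s slower than a single hardware instruction, but it only relies on operations the CPU handles correctly, which is the whole point of the library call.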

And sure enough, after fixing all the multiplication and division instructions for 64-bit integers, our arithmetic problems seem to be fixed:

Note that we didn’t change any addition or subtraction: those instructions are supported by the PS2. Only multiplication and division are not. Weird. Anyway, now we can restore TinyGo’s original code for handling these operations to its former glory and not have to worry about this anymore (hopefully).

The FPU is single

Quick thing to explain before we dive into this: floating-point operations on the PS2 are done inside a coprocessor called COP1. It handles only single-precision floating-point numbers (i.e. float and not double).
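In Go terms, float corresponds to float32 and double to float64. One detail worth keeping in mind for the Go code in this section: an untyped floating-point constant defaults to float64. The small sketch below shows this; typeName is a helper made up for the example.

```go
package main

import "fmt"

// typeName reports the dynamic type of v, e.g. "float64".
func typeName(v any) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	d := 0.12345          // untyped constant, so d defaults to float64 ("double")
	s := float32(0.12345) // explicitly single precision ("float")

	fmt.Println(typeName(d), typeName(s))
}
```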

During my (definitely not exhaustive) testing, I’ve noticed that floating point numbers are not behaving correctly - especially here:

func main() {
  debug.Init()
  debug.Printf("Start\n\n")

  for i := 0.12345; i <= 12345; i = i * 10 {
    debug.Printf("%.02f\n", i)
  }

  debug.Printf("\n\nEnd")
  for {}
}

This is what it prints:

Ok, this is definitely not normal. If we disassemble the main.main function, we get this:

00054d50 <main.main>:
   54d50:  27bdffd0   addiu  sp,sp,-48
   54d54:  ffbf0028   sd  ra,40(sp)
   54d58:  0c000000   jal  0 <(internal/gclayout.Layout).AsPtr>
   54d5c:  00000000   nop
   54d60:  3c010000   lui  at,0x0
   54d64:  24210000   addiu  at,at,0
   54d68:  00202025   move  a0,at
   54d6c:  64050007   daddiu  a1,zero,7
   54d70:  64080000   daddiu  a4,zero,0
   54d74:  01003025   move  a2,a4
   54d78:  01003825   move  a3,a4
   54d7c:  0c000000   jal  0 <(internal/gclayout.Layout).AsPtr>
   54d80:  00000000   nop
   54d84:  3c010000   lui  at,0x0
   54d88:  d4200000   ldc1  $f0,0(at)
   54d8c:  f7a00020   sdc1  $f0,32(sp)
   54d90:  08000000   j  0 <(internal/gclayout.Layout).AsPtr>
   54d94:  00000000   nop
   54d98:  d7a00020   ldc1  $f0,32(sp)
   54d9c:  f7a00018   sdc1  $f0,24(sp)
   54da0:  3c010000   lui  at,0x0
   54da4:  d4210000   ldc1  $f1,0(at)
   54da8:  46210036   c.ole.d  $f0,$f1
   54dac:  00000000   nop
   54db0:  45000026   bc1f  54e4c <main.main+0xfc>
   54db4:  00000000   nop
   54db8:  08000000   j  0 <(internal/gclayout.Layout).AsPtr>
   54dbc:  00000000   nop
   54dc0:  64040008   daddiu  a0,zero,8
   54dc4:  ffa40008   sd  a0,8(sp)
   54dc8:  640500c5   daddiu  a1,zero,197
   54dcc:  0c000000   jal  0 <(internal/gclayout.Layout).AsPtr>
   54dd0:  00000000   nop
   54dd4:  dfa40008   ld  a0,8(sp)
   54dd8:  afa20014   sw  v0,20(sp)
   54ddc:  64050000   daddiu  a1,zero,0
   54de0:  0c000000   jal  0 <(internal/gclayout.Layout).AsPtr>
   54de4:  00000000   nop
   54de8:  d7a00018   ldc1  $f0,24(sp)
   54dec:  00401825   move  v1,v0
   54df0:  8fa20014   lw  v0,20(sp)
   54df4:  f4600000   sdc1  $f0,0(v1)
   54df8:  3c010000   lui  at,0x0
   54dfc:  24210000   addiu  at,at,0
   54e00:  ac430004   sw  v1,4(v0)
   54e04:  ac410000   sw  at,0(v0)
   54e08:  00403025   move  a2,v0
   54e0c:  3c010000   lui  at,0x0
   54e10:  24210000   addiu  at,at,0
   54e14:  00202025   move  a0,at
   54e18:  64050006   daddiu  a1,zero,6
   54e1c:  64080001   daddiu  a4,zero,1
   54e20:  01003825   move  a3,a4
   54e24:  0c000000   jal  0 <(internal/gclayout.Layout).AsPtr>
   54e28:  00000000   nop
   54e2c:  d7ac0018   ldc1  $f12,24(sp)
   54e30:  3c010000   lui  at,0x0
   54e34:  d42d0000   ldc1  $f13,0(at)
   54e38:  0c000000   jal  0 <(internal/gclayout.Layout).AsPtr>
   54e3c:  00000000   nop
   54e40:  f7a00020   sdc1  $f0,32(sp)
   54e44:  08000000   j  0 <(internal/gclayout.Layout).AsPtr>
   54e48:  00000000   nop
   54e4c:  3c010000   lui  at,0x0
   54e50:  24210000   addiu  at,at,0
   54e54:  00202025   move  a0,at
   54e58:  64050005   daddiu  a1,zero,5
   54e5c:  64080000   daddiu  a4,zero,0
   54e60:  01003025   move  a2,a4
   54e64:  01003825   move  a3,a4
   54e68:  0c000000   jal  0 <(internal/gclayout.Layout).AsPtr>
   54e6c:  00000000   nop
   54e70:  08000000   j  0 <(internal/gclayout.Layout).AsPtr>
   54e74:  00000000   nop
   54e78:  08000000   j  0 <(internal/gclayout.Layout).AsPtr>
   54e7c:  00000000   nop

At first glance things look normal… until you look into some of those instructions:

ldc1  $f0,0(at)
sdc1  $f0,32(sp)

You see, LDC1 and SDC1 are the instructions for loading and storing doublewords to and from coprocessor 1 (the FPU). However, Sony didn’t implement these instructions in the PS2 CPU, as seen in their own manual (EE Core User’s Manual, page 50):

To fix this, we can do the same LLVM hack as before: for the floating point instructions, force anything that is using float64 to use software instead of hardware operations:

  setOperationAction(ISD::FABS, MVT::f64, LibCall);
  setOperationAction(ISD::FADD, MVT::f64, LibCall);
  setOperationAction(ISD::FSUB, MVT::f64, LibCall);
  setOperationAction(ISD::FMUL, MVT::f64, LibCall);
  setOperationAction(ISD::FDIV, MVT::f64, LibCall);
  setOperationAction(ISD::FREM, MVT::f64, LibCall);
  setOperationAction(ISD::FP_ROUND, MVT::f64, LibCall);
  setOperationAction(ISD::BITCAST, MVT::f64, LibCall);
  setOperationAction(ISD::LOAD, MVT::f64, LibCall);
  setOperationAction(ISD::STORE, MVT::f64, LibCall);

With all of that, we finally get working float64 operations:

… or do we?

Why can’t we have nice things?

Great! Now that we’ve sorted out the core of the arithmetic functions, we need to make sure some other stuff is functional. Do you wanna know a really cool way to test this? Formatting strings.

I’m not sure why exactly, but formatting strings has been a freaking nightmare in this project. It always breaks if something isn’t just right. This is ok though, as this also becomes a cool way of testing our code!

Let’s do some tests:

type genericTest[T comparable] struct {
  name     string
  fn       func() T
  expected T
}

var (
  // Some very simple tests
  stringTests = []genericTest[string]{
    {"emp", func() string { return "" }, ""},
    {"cst", func() string { return "Hello, World!" }, "Hello, World!"},
    {"app", func() string { return "Hello, " + "World!" }, "Hello, World!"},
  }

  // Formatting strings
  formatStringTests = []genericTest[string]{
    {" %%s", func() string { return fmt.Sprintf("%s!", "abc") }, "abc!"},
    {" %%v", func() string { return fmt.Sprintf("%v!", "abc") }, "abc!"},
  }

  // Formatting integers
  formatIntegerTests = []genericTest[string]{
    {" s8", func() string { return fmt.Sprintf("%d", int8(42)) }, "42"},
    {" u8", func() string { return fmt.Sprintf("%d", uint8(42)) }, "42"},
    {"s16", func() string { return fmt.Sprintf("%d", int16(1234)) }, "1234"},
    {"u16", func() string { return fmt.Sprintf("%d", uint16(1234)) }, "1234"},
    {"s32", func() string { return fmt.Sprintf("%d", int32(123456)) }, "123456"},
    {"u32", func() string { return fmt.Sprintf("%d", uint32(123456)) }, "123456"},
    {"s64", func() string { return fmt.Sprintf("%d", int64(1234567890)) }, "1234567890"},
    {"u64", func() string { return fmt.Sprintf("%d", uint64(1234567890)) }, "1234567890"},
  }

  // Formatting floats
  formatFloatTests = []genericTest[string]{
    {"32f", func() string { return fmt.Sprintf("%.5f", float32(1.234)) }, "1.23400"},
    {"32v", func() string { return fmt.Sprintf("%v", float32(1.234)) }, "1.234"},
    {"64f", func() string { return fmt.Sprintf("%.5f", float64(123456789.123456789)) }, "123456789.12346"},
    {"64v", func() string { return fmt.Sprintf("%v", float64(123456789.123456789)) }, "1.2345678912345679e+08"},
  }
)

And this is where our new nightmare begins:

Ok, it can’t format even float32 now. Great. Let’s look at the instructions it is generating for float32 and float64:

0005af28 <main.validateAllNumberTests[float32]>:
   // ...
   5b084:   c4800008    lwc1    $f0,8(a0)
   5b088:   c481000c    lwc1    $f1,12(a0)
   5b08c:   c4820010    lwc1    $f2,16(a0)
  // ...
   5b0a0:   e7a200d8    swc1    $f2,216(sp)
   5b0a4:   e7a100d4    swc1    $f1,212(sp)
   5b0a8:   e7a000d0    swc1    $f0,208(sp)
   // ...

0005b650 <main.validateAllNumberTests[float64]>:
   // ...
   5b7ac:   d4800008    ldc1  $f0,8(a0)
   5b7b0:   d4810010    ldc1  $f1,16(a0)
   5b7b4:   d4820018    ldc1  $f2,24(a0)
   // ...
   5b7c8:   f7a20108    sdc1  $f2,264(sp)
   5b7cc:   f7a10100    sdc1  $f1,256(sp)
   5b7d0:   f7a000f8    sdc1  $f0,248(sp)
   // ...

So, for float32, it is generating the correct instructions: LWC1 and SWC1. However, for float64 it is still using LDC1 and SDC1, even though we told it that it had to use a LibCall for that. Interesting, and quite odd, as the operations we tried before seem to be working just fine? This might be because the numbers I’m using are lower precision, but that’s something we can fix later.
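If you’d rather not eyeball the mnemonics, the same check can be done on the raw instruction words from the disassembly. Here’s a rough sketch, assuming the standard MIPS encoding where the top 6 bits of a word hold the opcode; fpuMemOp and isDoubleFPUMem are hypothetical helpers and not part of any toolchain.

```go
package main

import "fmt"

// Standard MIPS opcodes (bits 31..26) for the COP1 load/store instructions.
const (
	opLWC1 = 0x31 // load word to FPU
	opLDC1 = 0x35 // load doubleword to FPU
	opSWC1 = 0x39 // store word from FPU
	opSDC1 = 0x3d // store doubleword from FPU
)

// fpuMemOp returns the mnemonic of a COP1 load/store instruction word,
// or an empty string for anything else.
func fpuMemOp(word uint32) string {
	switch word >> 26 {
	case opLWC1:
		return "lwc1"
	case opLDC1:
		return "ldc1"
	case opSWC1:
		return "swc1"
	case opSDC1:
		return "sdc1"
	}
	return ""
}

// isDoubleFPUMem reports whether the word is one of the doubleword
// variants that we want to get rid of.
func isDoubleFPUMem(word uint32) bool {
	op := fpuMemOp(word)
	return op == "ldc1" || op == "sdc1"
}

func main() {
	// Instruction words copied from the two listings above.
	words := []uint32{0xc4800008, 0xe7a200d8, 0xd4800008, 0xf7a20108}
	for _, w := range words {
		fmt.Printf("%08x %-4s double=%v\n", w, fpuMemOp(w), isDoubleFPUMem(w))
	}
}
```

The first two words come from the float32 listing and the last two from the float64 one, so only the last two should be flagged.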

Fun fact: while investigating this, I noticed that I already tell the compiler to not use LDC1 and SDC1 through the -mno-ldc1-sdc1 flag, but it seems to ignore it. My theory is that the way TinyGo calls the LLVM is a bit different and that flag gets lost, or something rewrites it along the way. Anyway.

After careful consideration (aka asking ChatGPT a few things), I’ve noticed that if I try to force single float through a flag in our ps2.json configuration file, it breaks the LLVM:

Stack dump:
0. Program arguments: /Users/ricardo/dev/tinygo/llvm-build/bin/clang -fno-pic -c --target=mips64el -mcpu=mips3 -fno-inline-functions -mabi=n32 -mhard-float -mxgot -mlittle-endian -v -o build/test.o build/test.ll
1. Code generation
2. Running pass 'Function Pass Manager' on module 'build/test.ll'.
3. Running pass 'MIPS DAG->DAG Pattern Instruction Selection' on function '@"(*sync.Once).Do"'

Going back and forth with it, it eventually told me about some constraints being added by the defer code generated by TinyGo. This is how it handles a defer checkpoint on mips:

  case "mips":
    // $4 flag (zero or non-zero)
    // $5 defer frame
    asmString = `
.set noat
move $$4, $$zero
jal 1f
1:
addiu $$ra, 8
sw $$ra, 4($$5)
.set at`
    constraints = "={$4},{$5},~{$1},~{$2},~{$3},~{$5},~{$6},~{$7},~{$8},~{$9},~{$10},~{$11},~{$12},~{$13},~{$14},~{$15},~{$16},~{$17},~{$18},~{$19},~{$20},~{$21},~{$22},~{$23},~{$24},~{$25},~{$26},~{$27},~{$28},~{$29},~{$30},~{$31},~{memory}"
    if !strings.Contains(b.Features, "+soft-float") {
      // Using floating point registers together with GOMIPS=softfloat
      // results in a crash: "This value type is not natively supported!"
      // So only add them when using hardfloat.
      constraints += ",~{$f0},~{$f1},~{$f2},~{$f3},~{$f4},~{$f5},~{$f6},~{$f7},~{$f8},~{$f9},~{$f10},~{$f11},~{$f12},~{$f13},~{$f14},~{$f15},~{$f16},~{$f17},~{$f18},~{$f19},~{$f20},~{$f21},~{$f22},~{$f23},~{$f24},~{$f25},~{$f26},~{$f27},~{$f28},~{$f29},~{$f30},~{$f31}"
    }

Funny enough, disabling the constraints for the FPU registers on single float mode makes it… work? Do I understand that? Not at all. Do I care? Also not at all!

Yes, I know, I wanted to make as few code changes to TinyGo as I can, but unfortunately it is what it is. 🤷‍♂️

// ...
if !strings.Contains(b.Features, "+soft-float") && !strings.Contains(b.Features, "+single-float") {
// ...
}

And sure enough, disassembling the code I no longer see the invalid instructions:

$ mips64r5900el-ps2-elf-objdump -d test.o | grep -i ldc1 | wc -l
0
$ mips64r5900el-ps2-elf-objdump -d test.o | grep -i sdc1 | wc -l
0
$ mips64r5900el-ps2-elf-objdump -d test.o | grep -i lwc1 | wc -l
82
$ mips64r5900el-ps2-elf-objdump -d test.o | grep -i swc1 | wc -l
56

Disclaimer: this will definitely bite me back in the future, but I’m ok with that for now. Let’s tackle one problem at a time. Plus, we haven’t used defer on live hardware exactly because it breaks things.

Fun fact, with these changes, we no longer need to modify the LLVM to use LibCall for the f64 instructions. As such, I’ve disabled that change for now.

The final piece of the puzzle is that the floating point values do not match exactly. This is fine, we can implement some code around it in our tests to consider “close enough” numbers to be equal:

    switch any(got).(type) {
    case float32:
      const eps = 1e-5
      equal = math.Abs(float64(any(got).(float32))-float64(any(expected).(float32))) <= eps
    case float64:
      const eps = 1e-9
      equal = math.Abs(any(got).(float64)-any(expected).(float64)) <= eps
    default:
      equal = got == expected
    }

Hacky, but it works. Thanks ChatGPT!
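One caveat with a fixed epsilon: for large values it becomes stricter than the type can deliver. Around 1.2e8, neighbouring float64 values are roughly 1.5e-8 apart, so an absolute 1e-9 effectively demands an exact match. A possible alternative, sketched below, scales the tolerance by the magnitude of the operands. This is not what my test code does; closeEnough and nearlyEqual are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// nearlyEqual compares a and b with a tolerance that grows with their
// magnitude (with a floor of 1, so small values still get eps itself).
func nearlyEqual(a, b, eps float64) bool {
	scale := math.Max(1, math.Max(math.Abs(a), math.Abs(b)))
	return math.Abs(a-b) <= eps*scale
}

// closeEnough treats floats as equal when they are within a relative
// tolerance and falls back to == for every other comparable type.
func closeEnough[T comparable](got, expected T) bool {
	switch g := any(got).(type) {
	case float32:
		return nearlyEqual(float64(g), float64(any(expected).(float32)), 1e-5)
	case float64:
		return nearlyEqual(g, any(expected).(float64), 1e-9)
	default:
		return got == expected
	}
}

func main() {
	a, b := float32(0.1), float32(0.2)
	fmt.Println(closeEnough(a+b, float32(0.3)))
	fmt.Println(closeEnough(1.0, 1.1))
	fmt.Println(closeEnough("abc", "abc"))
}
```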

And, finally:

Plus, the best part in my opinion is this - it works on real hardware:

This time captured over HDMI. Loading through ps2link for simplicity.

Finally! Ok, now we’re done!

The Source Code

This project has arrived at a point where I’m comfortable enough to release its source code. The whole thing consists of 3 repositories.

The first repository is my code with all the demos, test applications, etc. It’s all very much hacked together and suffers from terrible coding standards as this is all for weekend fun.

The second and third repositories are the TinyGo and LLVM forks, used to maintain all the code hacks and changes I’ve done to make things work together.

Since this is all a bunch of hacks that I managed to make work together, I do not intend to send pull requests upstream (to TinyGo’s repository) at this moment. If things get more stable and cleaner in the future, this might change. As such, for now, you’ll have to rely on my repositories to get things to work.

That said, as usual, things are provided AS IS, meaning that no support is provided and I’m not responsible if this breaks your machine and/or console. So… use it at your own discretion!

Have fun!