Cortex Work

Ayo, it’s been a long time since I have posted here. Busy times. Boukou okipe. I have started working again at the Charity Centre, taking care of the English books. Apart from doing the sorting and testing of all electronics, which I had been doing all year, I also helped to open the shop in the morning. So I realised that all the work I had put into the Book Corner in 2022, before I had stopped this work, was slowly deteriorating. Entropy creeping in. A library needs continuous work, maybe not much at a time, but all the time. Else disorder ensues. I did not want to let this happen anymore! I am there on open days, and leave after two or three hours. No stress. I know I can leave anytime. So all good. No friction.

Ah yes, there’s air conditioning in the Book Corner. Nice. It’s hot here now. Summer in full swing.

Challenging My Cortex with Cortex

I am a firm believer that challenging your mind is healthy. Especially when getting old. OK, let’s say, older. I enjoy digging into technical challenges, to the extent of sleepless nights. I got into a deep one in the past weeks.

ARM has a series of microprocessor designs dubbed “Cortex”, such as “Cortex-M0” or “Cortex-M3”, with the Mx moniker determining the architecture they implement. Cortex-M0 implements ARMv6-M, for example, and Cortex-M3 implements ARMv7-M. The higher the M-number, the more complex and capable the processor is. Cortex-M4 can include a floating-point unit, Cortex-M0 cannot. You get the picture.

After working mostly on the FPGA-based RISC5 hardware and software in the past year, I went back to dabble with the Cortex processors. The M-series. Which of course are fixed in silicon, no hardware development for the FPGA needed, or even possible. Pros and cons. It’s amazing to be able to tune the processor hardware to exactly what the overall functionality requires in general, and the software in particular. Software can become way simpler if you can tune the hardware. With processors fixed in silicon (that is, every commercially available one) you have a bajillion configuration options that your software must set and control. With an FPGA, you just create the hardware that fits your software. Or remove the need to even have software for that part. With a fixed-silicon system-on-a-chip, just getting it up and running requires a plethora of bits to be set in the right sequence, for different elements, such as the oscillators and the PLLs, and whatnot.

RP2040


Raspberry Pi developed a microcontroller a few years back, the RP2040. It sports a Cortex-M0+ core, ARMv6-M architecture. Yes, the “+” denotes an improved M0. Unlike the other Raspberry Pi products, the RP2040 chip, and the corresponding board, the Pico, are “bare metal” products. No operating system. My cup of tea. No, I don’t drink tea, but anyway. The processor fetches the initial stack pointer from address zero, the first entry of the vector table, and the address of the first code instruction from address four, and that’s it. Se sa. OK, it’s a bit more involved in reality, since there’s also a boot ROM, but you get the picture. Your code runs, nothing else.
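To make those first two vector table entries concrete, here is a minimal Python sketch that reads them from a binary image. The function name `read_vector_head` and the sample values are made up for illustration. The only facts it relies on are that both entries are 32-bit little-endian words at offsets 0 and 4.

```python
import struct

def read_vector_head(image: bytes) -> tuple[int, int]:
    """Return (initial stack pointer, reset address) from a Cortex-M image."""
    if len(image) < 8:
        raise ValueError("image too short to hold the first two vector table entries")
    # Two little-endian 32-bit words: offset 0 = initial SP, offset 4 = reset address.
    initial_sp, reset_address = struct.unpack_from("<II", image, 0)
    return initial_sp, reset_address

# Hypothetical sample image: stack at the top of RAM, code entry in flash.
sample = struct.pack("<II", 0x20042000, 0x10000201) + bytes(8)
sp, entry = read_vector_head(sample)
print(f"initial SP  = 0x{sp:08X}")
print(f"reset entry = 0x{entry:08X}")  # bit 0 is set because the code is Thumb code
```

Running something like this against a freshly linked binary is a quick sanity check that the two numbers at the very start of the file look like a RAM address and a code address.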

I have been using Astrobe for the Cortex processors for some time. It’s an IDE, which cross-compiles for the microprocessor — on Windows. Yes, I know. Se lavi-la. There’s a variant for the M0 processor. But there’s a twist. Up to now, only processors with one core were considered and supported. The RP2040 contains two M0+ cores. Challenge accepted!

The datasheet for the RP2040 is some 600+ pages. If this sounds like a lot, consider that the docs for M0 processors from STM are 1,000+ pages. But they also provide a wider range of peripherals on the chip (SoC). The RP2040 is simpler. I like simple processors. The RP2040 is very well designed, very systematic, highly structured, very logical. I like structured and logical. I had to have the RP2040 manual printed, in order to be able to jump back and forth, to keep the thumb at one location while looking up something at another. A PDF is great for searching, but for browsing, from getting an overview to reaching a deep understanding, nothing beats a printed document. Or maybe it’s just my age.

Tools for the Job

Astrobe is a good toolset. Ultrafast turn-around times, from source code to executables loaded onto the microprocessor. But the binaries created cannot be used directly for the RP2040. Alas, it’s not just a question of the file format, but also of some of the contents. The RP2040 requires part of the bootloader in the binary, unlike all other Cortex processors I had worked with. Head-scratcher. Anyway, I have figured it out. I have written a binary file translator, which also inserts the boot code at the beginning. Now, sit down. I have written that tool in Python. The first Python program of my life. I am not a big fan of that language, but had a similar tool at hand I could use as a starting point. How difficult can it be to quickly learn and use yet another programming language, right? Turns out, Python is actually pretty useful for this kind of work, or tool. I might even use it for other purposes going forward, in lieu of, say, a bash script. I still don’t like the indentation-based structuring, but hey.

So, on Christmas morning, I had my first program running on the RP2040, compiled and linked with Astrobe, then transmogrified with my Python thingie, and copied over to the RP2040, which presents itself as a drive on Windows if set up accordingly. An utterly simple program, but I had a first version of a tool-chain. Yay. As of today, my Python tool does all the work, transmogrification and copying.

Bare Metal

Now, go a few steps back. Nothing works yet. Put yourself into the scene. I have a chip I have never worked with. Over 600 pages of cryptic information. A compiler and linker that are not made for this chip. Even a relevant compiler defect, impacting the correctness of my code. No run-time libraries. An essential tool I had made, or was making, in a programming language I had never used before. If things don’t work, where do you even start?! It could be anything. Ayo. You can’t even print debug information to a terminal. No printing yet. No anything yet, in fact. You cannot connect an oscilloscope either. Those multi-layer boards with their tiny surface-mounted elements don’t really allow you to measure anything on them with a hand-held probe.

As outlined above, maybe the processor does not work because I have not configured the clock circuits correctly. Lots of bits in specific hardware registers to set right, in the correct order. There’s a C/C++ SDK provided by Raspberry Pi, which does all that for you, but who wants to program in C/C++? It gives me a headache. It’s complex and bloated. In the end, I found a YouTube video of a guy who does it in assembly language without the SDK, so I could at least check if my basic approach was correct. OK, done, I was more confident that at least the basic clock circuitry was up and running. No clock, no fun.

To get the Python tool right, I used a — wait for it — binary file reader, and a tool to display differences in binary files. You know, comparing hex numbers to binary assembly code in ARM’s architecture documentation. Disassembly in my head. In my cortex. Pun intended.
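For the curious, the comparison step can be sketched in a few lines of Python. This is not the tool I used, just an illustration of the idea. The name `diff_bytes` and the output format are invented for this example.

```python
def diff_bytes(a: bytes, b: bytes, limit: int = 16) -> list[str]:
    """List the offsets where two binaries differ, as 'offset: left right' in hex."""
    lines = []
    for offset in range(max(len(a), len(b))):
        x = a[offset] if offset < len(a) else None
        y = b[offset] if offset < len(b) else None
        if x != y:
            # "--" marks a position past the end of the shorter file.
            left = "--" if x is None else f"{x:02X}"
            right = "--" if y is None else f"{y:02X}"
            lines.append(f"{offset:08X}: {left} {right}")
            if len(lines) >= limit:
                break  # stop early so a badly shifted file doesn't flood the screen
    return lines

for line in diff_bytes(b"\x00\x01\x02", b"\x00\xFF\x02\x03"):
    print(line)
```

The hex pairs it prints are what you then look up, by hand, in the instruction encodings.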

The next breakthrough came when I realised that the Astrobe binary files, which contain the initial stack pointer value and the code entry point as their first values, would actually work “as usual” with the right bootloader code. I skip over setting addresses right in the compiler and linker, and all that. And actually creating the bootloader. You know, assembly programming. My Python tool now adds the bootloader (more precisely, the code for boot phase 2) as a compiled binary block right in front of Astrobe’s binary output. The result needs to be translated into a so-called UF2 file, but that was the easy part. Just chopping up the binary file into 256-byte blocks, and adding a few headers for each.
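The chopping-up can be sketched roughly like this. It is a minimal illustration, not my actual tool: the function name `to_uf2` is invented, the boot phase 2 block is assumed to be exactly 256 bytes and is taken as given, and the image is assumed to start at the beginning of flash. The magic numbers and the 512-byte block layout come from the UF2 format itself.

```python
import struct

UF2_MAGIC_START0 = 0x0A324655    # the bytes "UF2\n"
UF2_MAGIC_START1 = 0x9E5D5157
UF2_MAGIC_END = 0x0AB16F30
UF2_FLAG_FAMILY_ID = 0x00002000  # says the eighth header word holds a family ID
RP2040_FAMILY_ID = 0xE48BFF56
FLASH_BASE = 0x10000000          # assumed load address: start of the flash window
PAYLOAD = 256                    # bytes of real data per UF2 block

def to_uf2(boot2: bytes, program: bytes, base: int = FLASH_BASE) -> bytes:
    """Put the boot phase 2 block in front of the program, then wrap it in UF2 blocks."""
    if len(boot2) != PAYLOAD:
        raise ValueError("boot phase 2 block is assumed to be exactly 256 bytes")
    image = boot2 + program
    image += bytes(-len(image) % PAYLOAD)  # zero-pad to a whole number of chunks
    num_blocks = len(image) // PAYLOAD
    out = bytearray()
    for block_no in range(num_blocks):
        chunk = image[block_no * PAYLOAD:(block_no + 1) * PAYLOAD]
        header = struct.pack(
            "<8I",
            UF2_MAGIC_START0, UF2_MAGIC_START1, UF2_FLAG_FAMILY_ID,
            base + block_no * PAYLOAD,     # target address of this chunk
            PAYLOAD, block_no, num_blocks, RP2040_FAMILY_ID,
        )
        # 32-byte header + 476-byte data area (256 used) + 4-byte end marker = 512 bytes.
        out += header + chunk.ljust(476, b"\x00") + struct.pack("<I", UF2_MAGIC_END)
    return bytes(out)

# Dummy input: an all-zero boot block and a 300-byte "program".
uf2 = to_uf2(bytes(256), bytes(300))
print(len(uf2) // 512, "blocks")
```

Each 512-byte block carries its own target address, which is why the file can simply be copied onto the drive the board presents.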

Sometimes one has to go back to basics to realise again that on the lowest levels, all is just numbers in the right places, both in files and in memory. The processor does not care about any fancy software concepts, it just diligently loads instructions (i.e. numbers), and executes them. Yes, I simplify. Especially if you think about the processors that power our laptop computers, with their crazy parallelism and whatnot right at the execution level. When I read about, and try to understand, these powerful processors, it blows my mind. Luckily, the RP2040 and its ilk are a bit simpler. There are also pipelines and caches, but nothing as crazy as you find in, say, an Apple M3 chip.

First Success

Seeing the LED light up on the Pico board after transmogrifying and transferring my simple test program with my Python tool was reason enough to have a drink. At times, the simplest things are enough to feel good.

I was set to go. Remember, there are two cores waiting to be played with. More about this in the next instalment. Stay tuned. Maybe it will not take another two months.