Wednesday, February 23, 2011

Making of Hyperthreads

I have been working on a fun project where I am trying to virtualize a MIPS machine. For a QEMU-emulated NE2000 network device and a couple of system daemons, I needed a light threading framework. In the earlier design, anything in hypervisor context ran directly on the CPU and was thus completely serialized. So we decided to do something different here.

We created a virtual CPU and named it hypercore. This virtual CPU is supposed to run anything in the context of the hypervisor. This virtual CPU, whenever scheduled, always works in kernel mode.

The idea was simple: anything of a _maintenance_ sort was scheduled later on that hypercore. I then went on to implement the threading mechanism. It is very similar to what is known as kernel threads in the Linux(R) world. There is one function which the thread is supposed to execute. Threads have a lifetime: they are created, run, paused, run again and then finally destroyed.
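
The lifecycle above can be sketched roughly as follows. This is an illustrative sketch, not the actual hypervisor API: the type names, state names and functions are my assumptions here; the real code would also set up the per-thread stack and scheduler bookkeeping.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical thread states mirroring the lifecycle described above:
 * created, running, paused, run again, then destroyed. */
typedef enum {
    HTHREAD_STATE_CREATED,
    HTHREAD_STATE_RUNNING,
    HTHREAD_STATE_PAUSED,
} hthread_state_t;

/* Each hyperthread wraps the one function it is supposed to execute. */
typedef struct hthread {
    char name[32];
    hthread_state_t state;
    void (*entry)(void *arg);   /* the one function the thread runs */
    void *arg;
} hthread_t;

hthread_t *hthread_create(const char *name, void (*entry)(void *), void *arg)
{
    hthread_t *t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    strncpy(t->name, name, sizeof(t->name) - 1);
    t->entry = entry;
    t->arg = arg;
    t->state = HTHREAD_STATE_CREATED;
    return t;
}

void hthread_run(hthread_t *t)   { t->state = HTHREAD_STATE_RUNNING; }
void hthread_pause(hthread_t *t) { t->state = HTHREAD_STATE_PAUSED; }
void hthread_destroy(hthread_t *t) { free(t); }
```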

The initial implementation was non-preemptible. This means that threads had to behave themselves and _yield_ the VCPU if they didn't have much to do. In this implementation, when a thread wanted to yield, it called schedule. The hypercore scheduler then found the next runnable thread and switched context. As in any context-switching code, this halted the current thread and loaded the new one. The next time the older thread was loaded, it returned back into the same "schedule" function and continued its task. Initially I had a little hard time implementing the schedule code, but it became easy once I understood this fact:

"Current running thread enters the schedule but when schedule returns it returns in next thread context and this continues."

One other thing that helps in context switching is an independent stack for each thread. We allocate 4KiB of memory for each thread. The top of it works as the stack and at the base lies the thread information, similar to what we have in Linux(R). This makes the information very easily accessible: just mask off the low 3 nibbles of the stack pointer and you have the current thread's information. Also, when context switching happens, much of the information about where schedule was called from, and things like that, is saved by compiler-generated code on the thread's stack. After the thread is switched out, this information stays intact. When the thread switches back in, the code finds the same important pieces of information at the same place!
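
The masking trick looks like this in a minimal sketch, assuming the 4KiB block is allocated with 4KiB alignment (the names are illustrative, in the spirit of current_thread_info() in Linux):

```c
#include <stdint.h>

/* Each thread gets one 4KiB block: thread info at the base, stack
 * growing down from the top. Masking off the low 12 bits (3 nibbles)
 * of any stack address inside the block recovers the info pointer. */
#define HTHREAD_BLOCK_SIZE 4096

struct hthread_info {
    int id;    /* illustrative field */
};

static inline struct hthread_info *hthread_info_from_sp(uintptr_t sp)
{
    return (struct hthread_info *)(sp & ~(uintptr_t)(HTHREAD_BLOCK_SIZE - 1));
}
```

This only works because the block is both sized and aligned to 4KiB, so every address inside it shares the same upper bits as the base.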

But very soon after this non-preemptible implementation, it became quite important to have preemptible scheduling. For preemptible scheduling, we gave a time slice to each thread, and on every timer interrupt the currently running thread's tick count was incremented. Once it reached the maximum, the thread was scheduled out and a new thread was pulled in. This is how it happens:

If thread A is running and an interrupt happens, A's complete context is saved on a temporary stack. It is the same stack every time, since we don't support nested interrupts right now. The tick count of the current thread is incremented. If it has reached the maximum, the next thread to run, say B, is picked from the scheduler's queue. A's context is saved in A's thread info and B's context is loaded onto the same interrupt stack. Once the function that does all this returns, the interrupt handler finds EPC, RA and every other register updated with B's context. When eret is executed, B's EPC is loaded into the PC and the system actually starts executing B and not A.
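
The per-interrupt bookkeeping that drives this can be sketched as below. The field and constant names are my assumptions; the interesting part is only the decision of whether the handler should swap contexts before eret.

```c
#include <stdbool.h>

/* Maximum ticks in one time slice; the real value is a design choice. */
#define HTHREAD_MAX_TICKS 10

struct hthread_ticks {
    unsigned int ticks;
};

/* Called from the timer interrupt path for the running thread.
 * Returns true when its slice has expired and the handler should
 * save this thread's context and load the next thread's. */
bool hthread_tick(struct hthread_ticks *t)
{
    if (++t->ticks >= HTHREAD_MAX_TICKS) {
        t->ticks = 0;   /* fresh slice for the next time it runs */
        return true;
    }
    return false;
}
```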

Scheduling of VCPUs, TLB miss handling and the other core virtualization parts are still handled outside this VCPU. This gives some benefits: for non-core stuff, all threads have to share the time slice given to the hypercore, so we _don't_ penalize the guest. No matter how many threads are on the system, the guest gets its guaranteed time.