Minimal ANSI Renderer – Aaron Savage's Blog

Fractal noise demo running in kiTTY.

Inspired by:

everything you ever wanted to know about terminals

As someone who loves the terminal, I have often suspected that I could re-create the parts of NCurses functionality that I use in a few hundred lines of code. I’m not interested in developing for vintage computers. (I love them, but that’s a hobby that will have to wait until I move to a bigger apartment.) I haven’t found myself using the “window” functionality in NCurses. I always wrap NCurses “color pairs” in a function that allows me to arbitrarily paint any combination of colors in any cell. All I really want is to be able to “paint” my terminal, redraw a full screen of colored text (preferably at 60fps) and accept non-blocking keyboard input. It turns out that it really doesn’t take much code to do all this from scratch.

Now, I don’t have any practical reason for doing this. NCurses is actually very fast. It took a lot of experimentation and research to get full-screen updates as fast as those in libcaca. I also can’t call NCurses “bloat” in this day and age. It’s available everywhere and we constantly see it used in TUI utilities. There’s no reason not to use it. I just find it satisfying being able to do something like this from scratch, and it also led me to learn a number of things about the terminal, and about *nix and text in general.

I had three parts to figure out: Setup, data representation and rendering, and optimization.

Setup:

This requires some esoteric ANSI escapes and system calls with appropriate flags:

int start_term(){
    // Turn off stdout buffering. This has to happen before any output
    // is written to the stream.
    setvbuf(stdout, NULL, _IONBF, 0);
    // Enter the alternate buffer (lowercase 'h' sets the mode).
    printf("\x1b[?1049h");
    // Clear the screen.
    printf("\x1b[2J");
    // Hide the cursor.
    printf("\x1b[?25l");
    // Save the initial term settings.
    tcgetattr(STDIN_FILENO, &backup);
    t = backup;
    // Turn off echo and canonical mode.
    t.c_lflag &= ~(ECHO | ICANON);
    // Send the new settings to the term.
    tcsetattr(STDIN_FILENO, TCSANOW, &t);
    // Track window size changes, and clean up when killed.
    signal(SIGWINCH, resize);
    resize(0);
    signal(SIGTERM, die);
    signal(SIGINT, die);
    // Make input non-blocking.
    fcntl(STDIN_FILENO, F_SETFL, fcntl(STDIN_FILENO, F_GETFL) | O_NONBLOCK);
    return 0;
}
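
That last fcntl() call is what makes it possible to poll the keyboard once per frame without stalling the render loop. A minimal sketch of what that polling might look like follows. The names set_nonblocking and poll_key are hypothetical helpers of mine for illustration, not functions from the renderer itself:

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: add O_NONBLOCK to a descriptor's existing flags.
// Returns -1 on failure.
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// Hypothetical helper: return the next pending byte on fd, or -1 if
// nothing is waiting. Assumes fd is already non-blocking.
int poll_key(int fd) {
    unsigned char ch;
    ssize_t n = read(fd, &ch, 1);
    if (n == 1) {
        return ch;
    }
    // n == 0 means end-of-file; n == -1 with errno set to EAGAIN just
    // means nothing has been typed since the last poll.
    return -1;
}
```

A main loop would call something like poll_key(STDIN_FILENO) once per frame and carry on drawing when it returns -1. Escape sequences from arrow keys arrive as several bytes, so a real input routine would need to read more than one byte at a time.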

There’s a similarly arcane cleanup routine:

void endTerm(Window* window){
    // Reset colors.
    printf("\x1b[0m");
    // Clear the alternate buffer.
    printf("\x1b[2J");
    // Return to the standard buffer (lowercase 'l' resets the mode).
    printf("\x1b[?1049l");
    // Show the cursor.
    printf("\x1b[?25h");
    // Make input blocking again.
    fcntl(STDIN_FILENO, F_SETFL, fcntl(STDIN_FILENO, F_GETFL) & ~O_NONBLOCK);
    restore();
    // Move the cursor to the top-left corner.
    fputs("\x1b[1;1H", stdout);
    freeWindow(window);
}

I went through several versions of this before I was able to cleanly enter and exit the alternate buffer. There are also a few routines for handling resizing and killing the terminal. This part was all a matter of figuring out what sequence of very specific function calls I needed. Easy enough.
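
For the resize routine, one common approach is to ask the kernel for the window size with the TIOCGWINSZ ioctl. The sketch below assumes that approach and assumes the size lives in two global ints named like the ones the rendering loop reads. The 80x24 defaults are placeholders of mine:

```c
#include <sys/ioctl.h>
#include <unistd.h>

// Assumed globals, named after the ones the rendering loop uses.
// The initial values are arbitrary placeholders.
int term_width = 80;
int term_height = 24;

// Shaped like a signal handler so it can be installed for SIGWINCH;
// the signal number itself is ignored.
void resize(int sig) {
    (void)sig;
    struct winsize ws;
    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0
            && ws.ws_col > 0 && ws.ws_row > 0) {
        term_width = ws.ws_col;
        term_height = ws.ws_row;
    }
    // On failure (for example, output is not a terminal) the previous
    // size is kept.
}
```

Updating plain globals from a signal handler is a simplification. A more careful version would only set a volatile sig_atomic_t flag in the handler and query the size from the main loop.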

Then I needed to decide how to organize my data, and how to transform it. After some experimentation and deliberation, I decided to stay with plain single-byte characters and ANSI color escapes for now. Unicode not only requires more complex logic, it also introduces possible compatibility problems that depend on the user’s choice of terminal emulator and font, and that the application itself can’t address directly. So, for now, I’m sticking with something simple and foolproof.

No need for anything fancy here. I want each cell on my terminal to have its own character, foreground color and background color:

typedef struct Cell{
    int color;
    int bg_color;
    char character;
} Cell;

typedef struct Canvas{
    int width, height;
    Cell* cells;
} Canvas;

Canvas* createCanvas(int width, int height);
void freeCanvas(Canvas* canvas);

int getCanvasBgColor(Canvas*, int x, int y);
int getCanvasFgColor(Canvas*, int x, int y);
char getCanvasCharacter(Canvas*, int x, int y);
void setCanvasCharacter(Canvas*, int x, int y, char character);
void setCanvasFgColor(Canvas*, int x, int y, int color);
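
A minimal sketch of how a few of these declarations could be implemented is below. It makes some assumptions that are mine rather than the renderer's: cells are stored row-major with a stride of canvas->width, colors are stored as SGR codes (37 and 40 as the blank defaults), and out-of-bounds access is silently ignored. The cellAt helper is hypothetical.

```c
#include <stdlib.h>

typedef struct Cell {
    int color;
    int bg_color;
    char character;
} Cell;

typedef struct Canvas {
    int width, height;
    Cell* cells;
} Canvas;

// Allocate a canvas filled with blank cells. Returns NULL on a bad
// size or on allocation failure.
Canvas* createCanvas(int width, int height) {
    if (width <= 0 || height <= 0) return NULL;
    Canvas* canvas = malloc(sizeof *canvas);
    if (!canvas) return NULL;
    canvas->cells = malloc((size_t)width * (size_t)height * sizeof(Cell));
    if (!canvas->cells) {
        free(canvas);
        return NULL;
    }
    canvas->width = width;
    canvas->height = height;
    for (int i = 0; i < width * height; i++) {
        // Assumed defaults: white on black, space character.
        canvas->cells[i] = (Cell){ .color = 37, .bg_color = 40, .character = ' ' };
    }
    return canvas;
}

void freeCanvas(Canvas* canvas) {
    if (!canvas) return;
    free(canvas->cells);
    free(canvas);
}

// Hypothetical helper: row-major lookup, NULL when out of bounds.
static Cell* cellAt(Canvas* canvas, int x, int y) {
    if (!canvas || x < 0 || y < 0 || x >= canvas->width || y >= canvas->height) {
        return NULL;
    }
    return &canvas->cells[x + y * canvas->width];
}

char getCanvasCharacter(Canvas* canvas, int x, int y) {
    Cell* cell = cellAt(canvas, x, y);
    return cell ? cell->character : ' ';
}

void setCanvasCharacter(Canvas* canvas, int x, int y, char character) {
    Cell* cell = cellAt(canvas, x, y);
    if (cell) cell->character = character;
}

int getCanvasFgColor(Canvas* canvas, int x, int y) {
    Cell* cell = cellAt(canvas, x, y);
    return cell ? cell->color : -1;
}

void setCanvasFgColor(Canvas* canvas, int x, int y, int color) {
    Cell* cell = cellAt(canvas, x, y);
    if (cell) cell->color = color;
}
```

Keeping every access behind cellAt costs a bounds check per cell, which is a trade of a little speed for not scribbling over memory when a drawing routine runs off the edge.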

Where it starts to get complicated is sending all this data to the screen in the form of one giant string. Theoretically, it might be necessary to change the foreground color and background color for every cell. If we have a letter ‘E’ with a cyan foreground and a green background, that requires the sequence:

\x1b[36;42mE

Escape, left bracket, two digits for foreground, semicolon, two digits for background, ‘m’ to mark this as a formatting ANSI escape sequence, then the actual letter ‘E’: nine bytes in all. That means my output buffer is potentially pushing width * height * 9 chars to stdout every frame. This would be no problem in OpenGL or SDL, but terminals and terminal emulators aren’t really designed for this. A naive implementation resulted in severe tearing on all my devices: depending on the complexity of the image, the frame would get updated about 3/4 of the way before the rendering loop would need to restart. After struggling with this for some time, and being completely humbled by the perfect fluidity of the libcaca animations and of stunning demos such as BB, I gave up for a long time and converted my personal projects to NCurses.
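
To put a number on that worst case, here is a small sketch that measures the per-cell cost with snprintf() and multiplies it out. The function names cellBytes and worstCaseFrameBytes are made up for this illustration:

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: bytes needed to emit one cell when both of its
// colors have to change. snprintf with a NULL buffer only measures.
size_t cellBytes(int fg, int bg, char ch) {
    int n = snprintf(NULL, 0, "\x1b[%d;%dm%c", fg, bg, ch);
    return n < 0 ? 0 : (size_t)n;
}

// Hypothetical helper: worst-case bytes per frame, assuming every cell
// needs a fresh two-digit foreground and background color.
size_t worstCaseFrameBytes(int width, int height) {
    return (size_t)width * (size_t)height * cellBytes(36, 42, 'E');
}
```

For a classic 80x24 terminal that comes to 17,280 bytes per frame, or roughly 1 MB per second at 60fps. Bright background colors (100 to 107) use three digits, so a palette that includes them would push the per-cell cost to ten bytes.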

Two changes got me the performance I was looking for:

1. Manual buffer management.

2. Skipping stdout and writing directly to /dev/tty*

Manual buffer management:

void termRefresh(Window* window){
  char* pointer = window->buffer;
  cursorReturn(&pointer);
  int currentFgColor;
  int currentBgColor;

  for(int y = 0; y < term_height; y++){
    // Reset the tracked colors at the start of each row.
    currentFgColor = -1;
    currentBgColor = -1;
    for(int x = 0; x < term_width; x++){
      // Rows are stored MAX_VIEW_WIDTH cells apart.
      int offset = x + y * MAX_VIEW_WIDTH;
      int nextFgColor = window->canvas->cells[offset].color;
      int nextBgColor = window->canvas->cells[offset].bg_color;
      updateColor(&pointer, &currentFgColor, &currentBgColor,
        nextFgColor, nextBgColor);
      char next_char = window->canvas->cells[offset].character;
      addChar(&pointer, next_char);
    }
    addChar(&pointer, '\n');
  }
  // Cut off that last newline so the screen doesn't scroll
  pointer--;
  // Send the whole frame with a single syscall.
  write(window->tty, window->buffer, pointer - window->buffer);
}

Even with unbuffered stdout going directly to the screen, I was still getting severe tearing. I figured out fairly quickly that one of my main bottlenecks was repeated syscalls to send text to the terminal; with buffering turned off, every putc() call becomes its own write(). A screen full of individual putc() calls will work fine for many applications, but as I described above, the sheer volume of text needed for each fullscreen update became prohibitively large. I’m actually not even using a null-terminated C-string here, just an array of char. The addChar() function inserts the next character and increments the insertion pointer. When we get to the end, we can do a single write() call for the whole screen, using simple pointer arithmetic to determine that we want to write ([insertion pointer] minus [buffer start]) bytes. I went through several versions of this where I got much cuter with the pointer math, and I eliminated as much function-call overhead as possible, but the performance gain really came from the reduction of syscalls.
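
The two helpers called from termRefresh() are not shown in this post, so here is a guess at their shape, matching the way they are called. It assumes colors are stored as SGR codes and that the caller has sized the buffer for the worst case plus one spare byte:

```c
#include <stdio.h>

// Append one byte and advance the insertion pointer.
void addChar(char** pointer, char ch) {
    **pointer = ch;
    (*pointer)++;
}

// Emit a color escape only when the requested colors differ from the
// ones already in effect, then remember the new state.
void updateColor(char** pointer, int* currentFg, int* currentBg,
                 int nextFg, int nextBg) {
    if (*currentFg == nextFg && *currentBg == nextBg) {
        return;
    }
    // sprintf also writes a trailing NUL, but the pointer stops just
    // before it, so the next append overwrites it.
    *pointer += sprintf(*pointer, "\x1b[%d;%dm", nextFg, nextBg);
    *currentFg = nextFg;
    *currentBg = nextBg;
}
```

Skipping the escape when the colors have not changed is what keeps the average frame well under the nine-bytes-per-cell worst case: a run of same-colored cells costs one byte each after the first.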

As I’m writing this, I decided to take a quick look at GCC’s assembly output because I assumed it was automatically inlining all my short function calls to addChar/increment pointer, but it turns out that’s not the case at all. Way way down at the end of my nested Y/X loop:

call	updateColor # I didn't expect this one to get inlined
movq	-56(%rbp), %rax
movq	8(%rax), %rax
movq	8(%rax), %rcx
movl	-4(%rbp), %eax
sall	$8, %eax
movl	%eax, %edx
movl	-8(%rbp), %eax
addl	%edx, %eax
movslq	%eax, %rdx
movq	%rdx, %rax
addq	%rax, %rax
addq	%rdx, %rax
salq	$2, %rax
addq	%rcx, %rax
movl	8(%rax), %eax
movb	%al, -29(%rbp)
movsbl	-29(%rbp), %edx
leaq	-40(%rbp), %rax
movl	%edx, %esi
movq	%rax, %rdi
call	addChar # But I thought surely this one would be!
movl	$0, -16(%rbp)
addl	$1, -8(%rbp)

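No big deal I guess, but I’m glad I investigated. (Worth noting: the constant round trips through the stack in this listing are what an unoptimized build looks like, and GCC does not inline ordinary functions at -O0, so an -O2 build would be the fairer test.)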

Having a single perfectly-sized write() call for every frame, instead of using something like putc() or printf() calls, was my biggest design success in this project so far.

Skipping stdout and writing directly to /dev/tty*:

The details of this write() call led me to my other optimization breakthrough: for whatever reason, skipping stdout and writing my string of bytes directly to /dev/tty* gave me a huge performance boost. (I’m guessing stdout is doing work that I don’t need; maybe I’ll find myself looking into the implementation details at some point.)

This line:

open(ttyname(STDIN_FILENO), O_RDWR | O_SYNC);

gives me a file descriptor (a plain integer) for the current TTY session (in passing I discovered how I could write code that would access other TTY sessions in progress). Those flags give me read/write access, and O_SYNC is needed because otherwise the write call won’t be finished at the exact moment I exit my program, and I’ll end up with garbage on the screen after quitting. This entailed some fun times reading the Linux manpages:

O_SYNC       Write operations on the file will complete according to
             the requirements of synchronized I/O file integrity
             completion (by contrast with the synchronized I/O data
             integrity completion provided by O_DSYNC.)

             By the time write(2) (or similar) returns, the output data
             and associated file metadata have been transferred to the
             underlying hardware (i.e., as though each write(2) was
             followed by a call to fsync(2)).  See NOTES below.

I also spent quite a bit of effort trying to implement double-buffering logic that would use one of the cursor-movement ANSI escapes to skip forward when the backbuffer already contains the correct combination of character and colors at the next cell. However, this logic got fairly complicated and the above optimizations got me to a solid 60fps with no tearing, so I have removed this code for the time being. (Though I suspect I will lie awake at night haunted by the overhead of those redundant chars.)
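
For the curious, the idea can be sketched roughly as follows. This is not the code I removed: it is a simplified illustration that uses absolute cursor positioning (ESC [ row ; col H) rather than relative skips, and the names diffFrame and sameCell are hypothetical. It assumes `front` holds what is currently on screen, `back` holds the desired frame, and `out` is large enough for the worst case.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct Cell {
    int color;
    int bg_color;
    char character;
} Cell;

static int sameCell(const Cell* a, const Cell* b) {
    return a->color == b->color && a->bg_color == b->bg_color
        && a->character == b->character;
}

// Write only the cells that differ between front and back into out,
// updating front as it goes. Returns the number of bytes produced.
size_t diffFrame(char* out, Cell* front, const Cell* back,
                 int width, int height) {
    char* p = out;
    int fg = -1, bg = -1;            // colors currently in effect
    int cursorX = -1, cursorY = -1;  // unknown until the first move
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int i = x + y * width;
            if (sameCell(&front[i], &back[i])) continue;
            if (x != cursorX || y != cursorY) {
                // Cursor positions are 1-based: row first, then column.
                p += sprintf(p, "\x1b[%d;%dH", y + 1, x + 1);
            }
            if (back[i].color != fg || back[i].bg_color != bg) {
                fg = back[i].color;
                bg = back[i].bg_color;
                p += sprintf(p, "\x1b[%d;%dm", fg, bg);
            }
            *p++ = back[i].character;
            front[i] = back[i];
            cursorX = x + 1;  // printing advances the cursor one column
            cursorY = y;
        }
    }
    return (size_t)(p - out);
}
```

The result would still go out in a single write() call. The catch is that a frame where every other cell changes costs more than a full redraw, because each isolated cell pays for its own cursor move.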

Another thought — is it possible to access the TTY using mmap() to make this even faster? I’ve never written anything so low-level before, so until I learn some more about how the Linux “file metaphor” works, I’m going to give up on this strategy for the time being.

I’m currently working on a more complex version of this renderer that supports Unicode, because I want to develop a system of dithering using the 8+8 colors and the Unicode Block Elements glyphs (similar to classic ANSI art). I’d also like some standard functions for drawing shapes and “blitting” text “canvases”. But as I saw the scope of the project creeping, I decided to dial it back and have a simpler finished product for now.
