User Input

In a previous section, we wrote text out to the user, but so far we have had no way to get input back. In this section, we will introduce a new system call that allows us to read a line of text from the console.

sys_read is the opposite of sys_write. While sys_write writes data from memory to the console, sys_read reads data from the console and saves that data into memory for later use by the program. Making a sys_read system call is very similar to using sys_write: all we have to do is set the registers to the appropriate values and tell the operating system when we're ready.

To make a sys_write call:

  • rax must be set to 1, indicating sys_write.
  • rdi must be set to 1, indicating stdout (console output).
  • rsi must be set to an address in memory where the string to be printed can be found.
  • rdx must be set to the number of characters to write from memory to the console.

Compare the above to sys_read, which is pretty similar:

  • rax must be set to 0, indicating sys_read.
  • rdi must be set to 0, indicating stdin (console input).
  • rsi must be set to an address in memory where the input string can be saved.
  • rdx must be set to the maximum number of characters to accept from the user.

The first program we'll make that uses sys_read will be very simple. It will accept input from the user and then print that same string right back out. Create a new file called "repeat.asm" and type the following program into it:

%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

section .bss

    %define buffer_len 64
    buffer: resb buffer_len

section .text

global _start
_start:

; Read input from the user
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, buffer
    mov rdx, buffer_len
    syscall

; Write whatever the user entered back out
    mov rdx, rax
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, buffer
    syscall

; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall

There are three high-level operations here:

  1. Read a line of input from the user and save that input into memory.
  2. Write the input string from memory back out to the console.
  3. Exit the program.

Let's go through the source file in detail:

%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

These are the constants we'll be using to make system calls. sys_exit is 60, stdin is 0, etc. This is just like previous programs, but we've added some new definitions because of the new system call being made.

section .bss

    %define buffer_len 64
    buffer: resb buffer_len

This is a new type of section, called bss. Previously, we worked with the text and data sections. Take a look at how these three sections compare:

  • The text section is where code (instructions) goes.
  • The data section is for initialized data. This is memory for which we have an initial value when the program starts. In the "Hello, world!" section, we printed a string out to the user whose value we knew ahead of time.
  • The bss section is for uninitialized data. This is memory which will be set dynamically by the program as it runs. Since the value of this memory will be set to whatever the user enters, we don't know what it will be ahead of time.

We could use the data section for this if we really wanted to, by giving buffer some garbage initial value that we expect to be overwritten, but it's wasteful to include that garbage data in the executable file. The bss section allows us to say we need a region of memory reserved, without actually taking up that number of bytes on disk. The operating system will reserve the requested number of bytes in memory each time the program runs.

So the purpose of this section is to make a region of memory which the user's input can be written to. Let's break it up into pieces and check out each line individually:

section .bss

This defines the beginning of the bss section, where any uninitialized memory is declared.

    %define buffer_len 64

This creates a constant called buffer_len, which will be the total number of bytes of memory reserved for storing user input. In this case, anywhere we use the text "buffer_len" in the code, it will be replaced with the number 64. This value can be basically whatever you want, but 64 is a reasonable number in this case.

Note: this is not actually part of the bss section. %define is a preprocessor directive: the assembler processes it before assembling the program, and it does not translate to machine code or reserve any memory. It's a convenience offered by the assembler which allows us to define the size of the buffer once and then refer to it elsewhere, so if we ever want to change the size of the buffer, we only have to change it in this one place.

    buffer: resb buffer_len

This is where the magic happens. This line declares the area in memory where the user's input will be stored. The line has 3 parts:

  • buffer is the name of the area in memory we're declaring. Anywhere in the code that we use the name "buffer", it will be replaced with the memory address of the beginning of this region.
  • resb stands for "reserve bytes". This tells the assembler we're reserving some number of bytes of memory.
  • buffer_len gives the number of bytes we want to reserve. In this case we're using the constant buffer_len, which is 64. We could alternatively just type the number 64 here.

All together, this reserves a 64-byte area in memory which we can refer to by the name "buffer". When the program runs, this memory will be reserved for the program and we'll be able to read from it and write to it.

section .text

Now we're getting into more familiar territory. This is where the code begins.

global _start
_start:

This is the entry-point of the program, marking the first instruction that will be executed when the program is run.

; Read input from the user
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, buffer
    mov rdx, buffer_len
    syscall

The first thing the program does is read input from the user by making a sys_read system call. Like other system calls, we set up the registers with the details of the operation we want carried out and then issue the syscall instruction, which notifies the operating system to do our bidding.

In this case, we're telling the operating system to read characters from the console and store them in memory at the location given by buffer. The operating system will let the user type until they hit the enter key, and then up to 64 bytes of that input will be saved to memory. After the syscall instruction executes, the number of bytes actually stored will be available to us in the rax register. Whatever text the user entered will be in memory, and we'll be able to access it through the label buffer.

; Write whatever the user entered back out
    mov rdx, rax
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, buffer
    syscall

Now that the user's input is stored in memory and we can reference that region of memory with the name buffer, we just print whatever the user typed right back out to them.

This is very similar to previous sys_write calls, with one major difference. Previously, we printed a static string "Hello, world!", meaning that we knew what the string would be ahead of time, as well as how many characters it would be. This time around, we don't actually know how many characters the user may have entered. We know they couldn't have entered more than 64 characters, but other than that, we have no idea. Luckily, sys_read returns the number of characters the user entered in the register rax. sys_write expects the number of characters to write to be in the register rdx. So we copy the value left by sys_read in rax to rdx, where sys_write expects it.

Altogether, this system call tells the operating system to write bytes from memory out to the console, starting at the address buffer and continuing for as many bytes as were previously read.
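
The program above trusts that sys_read succeeded. As a minimal sketch (not part of the original program), the variant below looks at the value left in rax before writing anything. It assumes the Linux convention that a failed system call leaves a negative number in rax and that end of input leaves 0. The failure constant and the labels read_failed and finished are names made up for this example, and cmp, jl, and je are comparison and jump instructions that have not been introduced yet.

```nasm
%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0
%define failure 1          ; hypothetical status code meaning "the read failed"

section .bss

    %define buffer_len 64
    buffer: resb buffer_len

section .text

global _start
_start:

; Read input from the user
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, buffer
    mov rdx, buffer_len
    syscall

; Compare the result with 0:
; negative means an error, 0 means end of input
    cmp rax, 0
    jl read_failed          ; jump if rax is less than 0
    je finished             ; jump if rax is equal to 0

; Write back exactly as many bytes as were read
    mov rdx, rax
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, buffer
    syscall

finished:
    mov rax, sys_exit
    mov rdi, success
    syscall

read_failed:
    mov rax, sys_exit
    mov rdi, failure
    syscall
```

For ordinary typed input this should behave just like repeat.asm. The only difference is that nothing is written when there was nothing to read.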

; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall

Finally, we make a third system call to exit the program successfully.

Make sure the program is typed correctly as listed above, save it as "repeat.asm", and run it using the "run" script from previous sections:

./run repeat

The program should appear to pause and do nothing, waiting for input from you. Type some text (like "Greetings!") and press enter. The program should repeat whatever you typed and exit. The total output should look something like this:

Greetings!
Greetings!
0

Remember that 0 is the program status code, indicating that the program exited successfully.

Prompting

The "repeat.asm" program doesn't tell the user what to do: it just hangs until the user presses enter. We can mix and match sys_write and sys_read calls to provide some instructions to the user and some formatting to the output.

For this next program, we're going to ask the user for their name and then greet them. This can be broken down into the following system calls:

  1. sys_write - print "Please enter your name: "
  2. sys_read - input the user's name
  3. sys_write - print "Hello, "
  4. sys_write - print the user's name
  5. sys_write - print "!"

This will produce final output that looks a bit like this (depending on what you enter):

Please enter your name: Brian
Hello, Brian!

To get started, create a new file called "helloname.asm" and type the following program in:

%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

%define newline 10

section .bss

    %define name_max_len 64
    name: resb name_max_len
    name_len: resq 1

section .data

    prompt: db "Please enter your name: "
    prompt_len: equ $-prompt

    response_start: db "Hello, "
    response_start_len: equ $-response_start

    response_end: db "!", newline
    response_end_len: equ $-response_end

section .text

global _start
_start:

; Write the prompt out to the console
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, prompt
    mov rdx, prompt_len
    syscall

; Read the user's name from the console
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, name
    mov rdx, name_max_len
    syscall

; Store the number of characters entered by the user
    mov [name_len], rax

; Write the start of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_start
    mov rdx, response_start_len
    syscall

; Write the user's name
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, name
    mov rdx, [name_len]
    syscall

; Write the end of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_end
    mov rdx, response_end_len
    syscall

; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall

This is a much longer program than the previous one, but it mostly just reuses the same concepts. There are only a couple of new things here. Let's step through it in detail:

%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

%define newline 10

These are the same constants we defined before. The only difference is the inclusion of newline, with a value of 10. This is the newline character (produced when you press enter). We'll use this for formatting purposes.

section .bss

    %define name_max_len 64
    name: resb name_max_len
    name_len: resq 1

Here we declare our uninitialized data. Like before, we reserve a 64-byte area in memory for user input. This time we call it name since this is where the user's name will be stored.

We also declare a new value called name_len. This is where we'll store the number of characters the user inputs (the length of name), so we can use it later. The declaration follows the same structure as the name declaration:

  • name_len names the memory we're reserving so we can refer to it in the code.
  • resq means to reserve a quad-word. This is 8 bytes, or 64 bits. On a 64-bit processor, the registers are 64 bits each. This makes 64 bits a natural size for an integer, since it requires no conversion to move it around between registers and memory.
  • 1 means we only need one quad-word reserved. This is not a series of bytes like the string, it's only one piece of data: the number of characters typed by the user.
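
As a rough illustration of how these reservations add up, here is a small sketch that reserves a few areas and reports their total size as the program status code. resw (2 bytes each) and resd (4 bytes each) are NASM siblings of resb and resq that this section has not used, and the labels area_b, area_w, area_d, area_q, and area_end are made up for this example.

```nasm
%define sys_exit 60

section .bss

    area_b: resb 64        ; 64 x 1 byte  = 64 bytes
    area_w: resw 4         ;  4 x 2 bytes =  8 bytes
    area_d: resd 2         ;  2 x 4 bytes =  8 bytes
    area_q: resq 1         ;  1 x 8 bytes =  8 bytes
    area_end:              ; marks the first address after area_q

section .text

global _start
_start:

; The distance between two labels in the same section is a constant
; the assembler can compute: here 64 + 8 + 8 + 8 = 88
    mov rax, sys_exit
    mov rdi, area_end - area_b
    syscall
```

Assuming the "run" script reports the status code as before, this should show 88.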

Altogether, the bss section defines two regions of memory:

  • name, which is 64 bytes and will be used to store up to 64 characters entered by the user.
  • name_len, which is 8 bytes and will be used to store a single integer indicating the total number of characters entered by the user.

section .data

    prompt: db "Please enter your name: "
    prompt_len: equ $-prompt

    response_start: db "Hello, "
    response_start_len: equ $-response_start

    response_end: db "!", newline
    response_end_len: equ $-response_end

Here is the data section, where we declare some initialized data. This is memory for which we have values ahead of time. We're declaring 3 static strings, plus a length count for each:

  • prompt will be shown to the user first, telling them what to do.
  • response_start will be printed before the user's name is repeated back to them.
  • response_end will be printed after the user's name, giving punctuation and formatting to the response: an exclamation point and a newline character.

Each of these also has an accompanying _len value so we know how many characters each string contains.
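
If you want to check what one of these _len values works out to, a small sketch like the one below reports prompt_len as the program status code. It is only an illustration, not part of helloname.asm. It relies on $ standing for the current position in the section, so $-prompt is the number of bytes declared since the prompt label.

```nasm
%define sys_exit 60

section .data

    prompt: db "Please enter your name: "
    prompt_len: equ $-prompt    ; current position minus the start of prompt

section .text

global _start
_start:

; Exit, using the computed length as the status code
    mov rax, sys_exit
    mov rdi, prompt_len
    syscall
```

The prompt is 24 characters long including the trailing space, so the status code should be 24.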

section .text

global _start
_start:

Now we get to the code!

; Write the prompt out to the console
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, prompt
    mov rdx, prompt_len
    syscall

The first thing we do is make a sys_write call to print out "Please enter your name: " when the program starts.

; Read the user's name from the console
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, name
    mov rdx, name_max_len
    syscall

Next up, we read some input from the user. Whatever they type is stored in memory starting at the address indicated by name.

; Store the number of characters entered by the user
    mov [name_len], rax

After the sys_read call returns, the number of characters entered by the user will be provided in the rax register. We're going to need this later, but unlike in the previous program, we won't be using it immediately. We're going to print the static string "Hello, " first, which will involve overwriting both rax and rdx. By the time we get around to writing the user's name back out, the information we need (the number of characters in the user's name) will be lost.

In order to get around this, we need a place to temporarily save the number of characters in the user's name.

The instruction above copies the value from rax into memory at the address indicated by name_len. Notice the phrasing there. name_len is a memory address: information about where we can store this data. This is unlike dealing with registers, which are storage locations themselves. You can copy a value directly to a register, but when dealing with a memory address you have to clarify that you want to copy the value to memory at the given address.

This is where the square brackets come in. They're necessary because name_len refers to an address in memory where data can be stored. The actual value of name_len might be something like 0x6001b4, or wherever the linker and operating system end up placing it. We want the value of rax to be copied into memory at that location.

You may be wondering why the square brackets aren't always required. For example, when we read the user's input into memory, the instruction has no square brackets:

    mov rsi, name

In the code above, name is a memory address just like name_len. The difference is that the sys_read system call expects an address: rsi must contain the address in memory where it can write the input data. If we put name in square brackets, that would copy the contents of memory at that address into rsi instead of the address itself. When the sys_read call then tried to write to the location given by rsi, it would end up in the wrong place.

Let's take a short digression to explain this better. Here is a table showing some (made up) locations in memory:

Label    Address    Value   ASCII
string   0x6001b0    71     'G'
         0x6001b1   114     'r'
         0x6001b2   101     'e'
         0x6001b3   101     'e'
         0x6001b4   116     't'
         0x6001b5   105     'i'
         0x6001b6   110     'n'
         0x6001b7   103     'g'
         0x6001b8   115     's'

The table above shows 9 bytes in memory, containing the string "Greetings". Each byte has its own unique address ranging from 0x6001b0 to 0x6001b8. The first byte has a label: string.

If we refer to string directly, we're talking about the memory address. For example:

    mov rax, string

The above instruction would set rax to the value 0x6001b0, which is the address of the beginning of the string.

However, if we refer to string with square brackets, we're referring to the value stored in memory at the address 0x6001b0:

    mov rax, [string]

This instruction would set rax to the first 8 characters in the string, "Greeting", one character per byte of the register (with "G" in the lowest byte). We can also refer to individual characters:

    mov al, [string]
    mov bl, [string + 4]

These instructions would load the character "G" into the register al and the character "t" into the register bl. Because al and bl are one-byte registers, the assembler knows to copy a single byte.

Data labels like name and name_len are just addresses which point to locations in memory which contain data. Adding square brackets indicates that you're interested in the data at that location in memory, not the address itself.
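
To see the square brackets in action, here is a small sketch that loads one character of the string from the table and returns it as the program status code. It is an illustration only: the addresses in the table were made up, but the label works the same way wherever the string really ends up. movzx is an instruction we haven't covered. It copies a small value (here one byte) into a larger register and fills the remaining bits with zeros.

```nasm
%define sys_exit 60

section .data

    string: db "Greetings"

section .text

global _start
_start:

; Load the byte stored 4 bytes past the start of the string ('t')
; into rdi, clearing the rest of the register
    movzx rdi, byte [string + 4]

; Exit, using that character's value as the status code
    mov rax, sys_exit
    syscall
```

The character "t" has the value 116, so the status code should be 116. Changing the offset to 0 should give 71, the value of "G".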

; Write the start of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_start
    mov rdx, response_start_len
    syscall

Now that the user has entered their name, we begin to respond. This system call prints out the string response_start, which is "Hello, ".

; Write the user's name
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, name
    mov rdx, [name_len]
    syscall

Next, we print the name the user entered. Again, notice the square brackets: [name_len].

name_len is an address in memory. It might be something like 0x6001b4 (or wherever the linker and operating system ended up placing it). We don't want to print 0x6001b4 bytes to the console, since there aren't nearly that many available. Instead, we want to look up the value stored at the address 0x6001b4 and print that number of characters. This should be a more reasonable number like 5 or 8, depending on the length of the user's name. So we use the square brackets to indicate this.

The total output so far will look something like this (if your name happens to be Brian):

Hello, Brian

Now we finish up the output:

; Write the end of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_end
    mov rdx, response_end_len
    syscall

To finish off the sentence and apply some formatting, we write the string response_end: "!\n" to the console. The exclamation point is added to the end of the name and the newline character \n is for formatting purposes.

; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall

Finally, we end the program here. Type it all into a file called "helloname.asm" and run it with the "run" script:

./run helloname

Enter your name when it prompts you, and you should see something like the following:

Please enter your name: Brian
Hello, Brian
!
0

Okay, not quite what we were going for. Why is the exclamation point on its own line? To troubleshoot the problem, try returning the number of characters entered by the user as the program status code to see how many characters the OS thinks we entered. Change the following:

    mov rdi, success

To this:

    mov rdi, [name_len]

This will report the number of characters we enter as the program status code so we can get some feedback. Make the change, save the file, and rerun it. You should see something more like this:

Please enter your name: Brian
Hello, Brian
!
6

6?! I only typed 5 letters! The thing is, the operating system includes the newline produced by the enter key pressed after typing the name. So for the name "Brian", the actual string we get back is "Brian\n". That extra newline is garbage: it's not part of the data, just formatting. We can prevent the newline from being written by subtracting 1 from the value we store in name_len. Even though the string will still have a newline after it (we can't stop the operating system from including it), we can ignore it by only paying attention to the first 5 characters.

Change the following section:

; Store the number of characters entered by the user
    mov [name_len], rax

To this:

; Store the number of characters entered by the user
    dec rax
    mov [name_len], rax

rax contains the number of characters entered by the user. Before saving that value to name_len for later use, we now decrement that value. This means to subtract 1 from it. The instruction dec rax subtracts 1 from whatever value happens to be in rax. If you entered 6 characters including the enter key, this will change it to 5. If you entered 8, this will change it to 7.

By subtracting 1 from the number of characters we write out, we effectively ignore the last character in the string by printing only the part of the string we care about.

Make the change, save, and rerun. You should now get something like this:

Please enter your name: Brian
Hello, Brian!
5

The formatting is no longer messed up. We're ignoring the last character in the string by printing one fewer than the number of characters the operating system returned. The trailing newline is not printed, so our exclamation point appears on the same line as the name.
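
One caveat: dec rax assumes the last byte read really is a newline. That may not hold if the input ends without one, for example when the user types 64 or more characters or when the input is cut off early. As a hedged sketch (not part of helloname.asm), the program below only drops the last byte when it is a newline, then reports the resulting length as the status code. The labels check_newline and done are made up for this example, and cmp, jg, jne, and jmp are comparison and jump instructions that have not been introduced yet.

```nasm
%define sys_exit 60
%define sys_read 0

%define stdin 0

%define newline 10

section .bss

    %define name_max_len 64
    name: resb name_max_len

section .text

global _start
_start:

; Read the user's name from the console
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, name
    mov rdx, name_max_len
    syscall

; If nothing was read (end of input or an error), use a length of 0
    cmp rax, 0
    jg check_newline        ; jump if rax is greater than 0
    mov rax, 0
    jmp done

check_newline:
; Look at the last byte read and drop it only if it is a newline
    cmp byte [name + rax - 1], newline
    jne done                ; jump if the byte is not a newline
    dec rax

done:
; Exit, reporting the length as the status code
    mov rdi, rax
    mov rax, sys_exit
    syscall
```

Entering "Brian" and pressing enter should report 5, the same result the dec rax version gives.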

Next section: Basic math