Command Injection
Forcing commands to run
Paul Krzyzanowski
October 1, 2020
We looked at buffer overflow and printf format string attacks that enable the modification of memory contents to change the flow of control in the program and, in the case of buffer overflows, inject executable binary code (machine instructions). Other injection attacks enable you to modify inputs used by command processors, such as interpreted languages or databases. We will now look at these attacks.
SQL Injection
It is common practice to take user input and make it part of a database query. This is particularly popular with web services, which are often front ends for databases. For example, we might ask the user for a login name and password and then create a SQL query:
sprintf(buf,
    "SELECT * from logininfo WHERE username = '%s' AND password = '%s';",
    uname, passwd);
Suppose that the user entered this for a password:
' OR 1=1 --
We end up creating this query string[1]:
SELECT * from logininfo WHERE username = 'paul' AND password = '' OR 1=1 -- ';
The “--” after “1=1” begins a SQL comment, telling the database to ignore everything else on the line. In SQL, AND has higher precedence than OR, so the query is evaluated as (username = 'paul' AND password = '') OR 1=1. The first part checks for an empty password (which the user probably does not have) but the second part, 1=1, is always true. In essence, the user’s “password” turned the query into one that ignores the user’s password and unconditionally validates the user.
Statements such as this can be even more destructive as the user can use semicolons to add multiple statements and perform operations such as dropping (deleting) tables or changing values in the database.
This attack can take place because the programmer blindly allowed user input to become part of the SQL command without validating that the user data does not change the quoting or tokenization of the query. A programmer can avoid the problem by carefully checking the input. Unfortunately, this can be difficult. SQL contains too many words and symbols that may be legitimate in other contexts (such as passwords), and escaping special characters, whether by prepending backslashes or by doubling single quotes, is error-prone because the escaping rules differ among database vendors. The safest defense is to use parameterized queries, where user input never becomes part of the query but is brought in as parameters to it. For example, we can write the previous query as:
uname = getResourceString("username");
passwd = getResourceString("password");
query = "SELECT * FROM logininfo WHERE username = @0 AND password = @1";
db.Execute(query, uname, passwd);
A related safe alternative is to use stored procedures. They have the same property that the query statement is not generated from user input and parameters are clearly identified.
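To make the difference between string-built and parameterized queries concrete, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the sample account, and the function names are assumptions made for illustration; they are not part of the original example, and a real system would not store plaintext passwords.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory database holding one made-up account."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE logininfo (username TEXT, password TEXT)")
    db.execute("INSERT INTO logininfo VALUES ('paul', 'correct-horse')")
    return db

def login_unsafe(db, uname, passwd):
    # Vulnerable: user input is pasted into the SQL text itself.
    query = ("SELECT * FROM logininfo WHERE username = '%s' AND password = '%s';"
             % (uname, passwd))
    return db.execute(query).fetchone() is not None

def login_parameterized(db, uname, passwd):
    # The ? placeholders keep user input out of the SQL text entirely.
    query = "SELECT * FROM logininfo WHERE username = ? AND password = ?"
    return db.execute(query, (uname, passwd)).fetchone() is not None
```

With the password ' OR 1=1 -- the first function reports a successful login while the second does not, because the database treats the whole string as a literal password value.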
While SQL injection is the most common code injection attack, databases are not the only target. Creating executable statements built with user input is common in interpreted languages, such as Shell, Perl, PHP, and Python. Before making user input part of any invocable command, the programmer must be fully aware of parsing rules for that command interpreter.
Shell attacks
The various POSIX[2] shells (sh, csh, ksh, bash, tcsh, zsh) are commonly used as scripting tools for software installation, start-up scripts, and tying together workflows that involve processing data through multiple commands. A few aspects of how many of the shells work and the underlying program execution environment can create attack vectors.
system() and popen() functions
The system function is part of the Standard C Library and popen is defined by POSIX; both are common functions that C programmers use to execute shell commands. The system function runs a shell command while the popen function also runs the shell command but allows the programmer to either capture its output or send it input via the returned FILE pointer.
Here we again have the danger of turning improperly-validated data into a command. For example, a program might use a function such as this to send an email alert:
char command[BUFSIZE];
snprintf(command, BUFSIZE, "/usr/bin/mail -s \"system alert\" %s", user);
FILE *fp = popen(command, "w");
In this example, the programmer uses snprintf to build the complete command, including the desired user name, in a buffer. This opens the door to an injection attack if the user name is not carefully validated. If the attacker had the option to set the user name, she could enter a string such as:
nobody; rm -fr /home/*
which will result in popen running the following command:
sh -c "/usr/bin/mail -s \"system alert\" nobody; rm -fr /home/*"
which is a sequence of commands, the latter of which deletes all user directories.
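The same problem, and one common way around it, can be sketched in Python. This is an illustration under stated assumptions: the harmless echo command stands in for /usr/bin/mail, and both function names are made up. The point is that passing an argument list without a shell means the user-supplied string is never re-parsed as shell syntax.

```python
import subprocess

def alert_via_shell(user):
    # Vulnerable: the shell re-parses the whole string, so ';' starts a new command.
    result = subprocess.run("echo alert for " + user, shell=True,
                            capture_output=True, text=True)
    return result.stdout

def alert_via_argv(user):
    # No shell involved: each list element reaches the program as one argument.
    result = subprocess.run(["echo", "alert for", user],
                            capture_output=True, text=True)
    return result.stdout
```

Given the user name "nobody; echo INJECTED", the first function runs two commands while the second simply prints the odd-looking name. Avoiding the shell does not replace validation, since the invoked program may still interpret its arguments in surprising ways (for example, a name that begins with a dash).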
Other environment variables
The shell PATH environment variable controls how the shell searches for commands. For instance, suppose
PATH=/home/paul/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games
and the user runs the ls command. The shell will search through the PATH sequentially to find an executable file named ls:
/home/paul/bin/ls
/usr/local/bin/ls
/usr/sbin/ls
/usr/bin/ls
/sbin/ls
/bin/ls
/usr/local/games/ls
If an attacker can change a user’s PATH environment variable, or if one of the directories in it is publicly writable and appears before the “safe” system directories, then he can add a booby-trapped command in one of those directories. For example, if the user runs the ls command, the shell may pick up a booby-trapped version in the /usr/local/bin directory. Even if a user has trusted locations, such as /bin and /usr/bin, first in the PATH, an intruder may place a misspelled version of a common command into another directory in the path. The safest remedy is to make sure there are no untrusted directories in PATH.
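One way to audit a PATH for the weaknesses described above is sketched below. The function name and the policy (flag anything relative, empty, or writable by group or others) are assumptions chosen for illustration; a real audit would also consider who owns each directory and its parents.

```python
import os
import stat

def untrusted_path_entries(path_value):
    """Return PATH entries that are relative, empty, or writable by group/others."""
    suspicious = []
    for entry in path_value.split(os.pathsep):
        if not os.path.isabs(entry):
            # Empty or relative entries are resolved against the current directory.
            suspicious.append(entry)
            continue
        try:
            mode = os.stat(entry).st_mode
        except OSError:
            continue  # the directory cannot be examined (for example, it is missing)
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            suspicious.append(entry)
    return suspicious
```

A caller might run untrusted_path_entries(os.environ.get("PATH", "")) at startup and refuse to continue if the result is not empty.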
Some shells allow a user to set an ENV or BASH_ENV variable that contains the name of a file that will be executed as a script whenever a non-interactive shell is started (when a shell script is run, for example). If an attacker can change this variable then arbitrary commands may be added to the start of every shell script.
Shared library environment variables
In the distant past, programs used to be fully linked, meaning that all the code needed to run the program, aside from interactions with the operating system, was part of the executable program. Since so many programs use common libraries, such as the Standard C Library, these libraries are no longer compiled into the code of an executable but instead are dynamically loaded when needed.
Similar to PATH, LD_LIBRARY_PATH is an environment variable used by the operating system’s program loader that contains a colon-separated list of directories that should be searched for libraries. If an attacker can change a user’s LD_LIBRARY_PATH, common library functions can be overridden with custom versions. The LD_PRELOAD environment variable allows one to explicitly specify shared libraries that contain functions that override standard library functions.
LD_LIBRARY_PATH and LD_PRELOAD will not give an attacker root access, since the loader ignores or restricts them for setuid programs, but they can be used to change the behavior of a program or to log library interactions. For example, by overriding standard functions, one may change how a program generates encryption keys, uses random numbers, sets delays in games, reads input, and writes output.
As an example, let’s suppose we have a trial program that checks the current time against a hard-coded expiration time:
#include <time.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    /* hard-coded expiration time, in seconds since the epoch */
    time_t expiration = 1483228800;
    time_t now;

    /* check software expiration */
    now = time(NULL);
    if (now > expiration) {
        fprintf(stderr, "This software expired on %s", ctime(&expiration));
        fprintf(stderr, "This time is now %s", ctime(&now));
    }
    else
        fprintf(stderr, "You're good to go: %lu days left in your trial.\n",
            (unsigned long)(expiration-now)/(60*60*24));
    return 0;
}
When run, we may get output such as:
$ ./testdate
This software expired on Sat Dec 31 19:00:00 2016
This time is now Sun Feb 18 15:50:44 2018
Let us write a replacement time function that always returns a fixed value that is less than the one we test for. We’ll put it in a file called time.c:
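#include <time.h>
time_t time(time_t *t) {
    if (t) *t = (time_t) 1483000000;   /* honor the optional out-parameter */
    return (time_t) 1483000000;        /* always report a time before expiration */
}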
We compile it into a shared library:
gcc -shared -fPIC time.c -o newtime.so
Now we set LD_PRELOAD and run the program:
$ export LD_PRELOAD=$PWD/newtime.so
$ ./testdate
You're good to go: 2 days left in your trial.
Note that our program now behaves differently and we never had to recompile it or feed it different data!
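A program that launches other programs can reduce its exposure to these variables by handing the child a cleaned-up environment. The following is a minimal sketch; the list of dropped variables and the trusted PATH value are illustrative assumptions, not a complete inventory of dangerous settings.

```python
import os

# Illustrative choices, not a complete list of dangerous variables.
TRUSTED_PATH = "/usr/bin:/bin"
DROPPED_VARIABLES = ("LD_PRELOAD", "LD_LIBRARY_PATH", "ENV", "BASH_ENV")

def sanitized_environment(environ=None):
    """Copy the environment, dropping loader and shell hooks and pinning PATH."""
    source = os.environ if environ is None else environ
    clean = {k: v for k, v in source.items() if k not in DROPPED_VARIABLES}
    clean["PATH"] = TRUSTED_PATH
    return clean
```

The result can be passed as the env argument of subprocess.run. An allow-list that copies only the variables the child is known to need would be stricter than this deny-list approach.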
Input sanitization
The important lesson in writing code that uses any user input in forming commands is that of input sanitization. Input must be carefully validated to make sure it conforms to the requirements of the application that uses it and does not try to execute additional commands, escape to a shell, set malicious environment variables, or specify out-of-bounds directories or devices.
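As a small illustration of validation by allow-list, the sketch below accepts a user name only if it matches a deliberately narrow pattern. The pattern itself is an assumed policy chosen for the example; each application has to define what a legitimate value looks like for its own inputs.

```python
import re

# Assumed policy: a lowercase letter or underscore first, followed by up to
# 31 lowercase letters, digits, underscores, or hyphens.
USERNAME_PATTERN = re.compile(r"[a-z_][a-z0-9_-]{0,31}")

def is_valid_username(name):
    # fullmatch requires the entire string to match, so a trailing newline
    # or any appended shell syntax causes rejection.
    return USERNAME_PATTERN.fullmatch(name) is not None
```

Describing what is allowed tends to be more robust than enumerating what is forbidden, since the list of dangerous characters differs for every command interpreter.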
File descriptors
POSIX systems have a convention that programs expect to receive three open file descriptors when they start up:
file descriptor 0: standard input
file descriptor 1: standard output
file descriptor 2: standard error
Functions such as printf, scanf, puts, getc and others expect these file descriptors to be available for input and output. When a program opens a new file, the operating system searches through the file descriptor table and allocates the first available unused file descriptor. Typically this will be file descriptor 3. However, if any of the three standard file descriptors are closed, the operating system will use one of those as an available, unused file descriptor.
The vulnerability lies in the fact that we may have a program running with elevated privileges (e.g., setuid root) that modifies a file that is not accessible to regular users. If that program also happens to write to the user via, say, printf, there is an opportunity to corrupt that file. The attacker simply needs to close the standard output (file descriptor 1) and run the program. When the program opens its secret file, it will be given file descriptor 1 and will be able to do its read and write operations on the file. However, whenever the program prints a message to the user, the output will not be seen by the user because it will be directed to what printf assumes is the standard output: file descriptor 1. The printf output will be written onto the secret file, thereby corrupting it.
The shell command (bash, sh, or ksh) for closing the standard output file is an obscure-looking >&-. For example:
./testfile >&-
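The lowest-available-descriptor rule, and one common defense against this attack, can be sketched as follows. The function names are made up for this example. The defense shown is to open a placeholder (here /dev/null) until descriptors 0, 1, and 2 are all occupied before any sensitive file is opened.

```python
import os

def lowest_descriptor_is_reused():
    """Show that a newly opened file receives the lowest free descriptor number."""
    r, w = os.pipe()
    os.close(r)                              # free the lower of the two numbers
    fd = os.open(os.devnull, os.O_RDONLY)    # the kernel hands back the freed slot
    reused = (fd == r)
    os.close(fd)
    os.close(w)
    return reused

def ensure_standard_descriptors():
    """Occupy any closed descriptor among 0, 1, 2 with /dev/null."""
    while True:
        fd = os.open(os.devnull, os.O_RDWR)
        if fd > 2:
            os.close(fd)   # 0, 1, and 2 were already taken; nothing more to do
            return
        # fd is 0, 1, or 2: leave it open as a harmless placeholder
```

A privileged program would call ensure_standard_descriptors() first thing at startup, so that a later open of a protected file can never land on descriptor 0, 1, or 2.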
Comprehension Errors
The overwhelming majority of security problems are caused by bugs or misconfigurations. Both often stem from comprehension errors. These are mistakes created when someone, usually the programmer or administrator, does not understand the details and every nuance of what they are doing. Some examples include:
Not knowing all possible special characters that need escaping in SQL commands.
Not realizing that the standard input, output, or error file descriptors may be closed.
Not understanding how access control lists work or how to configure mandatory access control mechanisms such as type enforcement correctly.
If we consider the Windows CreateProcess function, we see it is defined as:
BOOL WINAPI CreateProcess(
    _In_opt_    LPCTSTR               lpApplicationName,
    _Inout_opt_ LPTSTR                lpCommandLine,
    _In_opt_    LPSECURITY_ATTRIBUTES lpProcessAttributes,
    _In_opt_    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    _In_        BOOL                  bInheritHandles,
    _In_        DWORD                 dwCreationFlags,
    _In_opt_    LPVOID                lpEnvironment,
    _In_opt_    LPCTSTR               lpCurrentDirectory,
    _In_        LPSTARTUPINFO         lpStartupInfo,
    _Out_       LPPROCESS_INFORMATION lpProcessInformation);
We have to wonder whether a programmer who does not use this frequently will take the time to understand the ramifications of correctly setting process and thread security attributes, the current directory, environment, handle inheritance, and so on. There’s a good chance that the programmer will just look up an example on sites such as github.com or stackoverflow.com and copy something that seems to work, unaware that there may be obscure side effects that compromise security.
As we will see in the following sections, comprehension errors also apply to the proper understanding of things as basic as various ways to express characters.
Directory parsing
Some applications, notably web servers, accept hierarchical filenames from a user but need to ensure that they restrict access only to files below a specific point in the directory tree. For example, a web server may need to ensure that no page requests go outside of /home/httpd/html.
An attacker may try to gain access by using paths that include .. (dot-dot), which is a link to the parent directory. For example, an attacker may try to download a password file by requesting
http://poopybrain.com/../../../etc/passwd
The hope is that the programmer did not implement parsing correctly and simply appended the user-requested path to a base directory:
"/home/httpd/html/" + "../../../etc/passwd"
to form
/home/httpd/html/../../../etc/passwd
which will retrieve the password file, /etc/passwd.
A programmer may anticipate this and check for dot-dot but has to realize that dot-dot directories can be anywhere in the path. This is also a valid pathname but one that should be rejected for trying to escape to the parent:
http://poopybrain.com/419/notes/../../416/../../../../etc/passwd
Moreover, the programmer cannot just search for .. because that can be a valid part of a filename. All three of these should be accepted:
http://poopybrain.com/419/notes/some..other..stuff/
http://poopybrain.com/419/notes/whatever../
http://poopybrain.com/419/notes/..more.stuff/
Also, extra slashes are perfectly fine in a filename, so this is acceptable:
http://poopybrain.com/419////notes///////..more.stuff/
The programmer should also track where the request is in the hierarchy. If dot-dot doesn’t escape above the base directory, it should most likely be accepted:
http://poopybrain.com/419/notes/../exams/
These are not insurmountable problems but they illustrate that a quick-and-dirty attempt at filename processing may be riddled with bugs.
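The rules above (dot-dot anywhere in the path, dots inside names, repeated slashes, and dot-dot that stays inside the tree) can be handled by normalizing the full path first and only then comparing it against the base directory. Here is a minimal sketch using Python's posixpath module; the function name is an assumption. The normalization is purely lexical, so it does not account for symbolic links, and it assumes the request has already been fully decoded.

```python
import posixpath

BASE = "/home/httpd/html"

def resolve_request(requested, base=BASE):
    """Return the normalized path if it stays under base, otherwise None."""
    # Strip leading slashes so join() is never handed an absolute path.
    candidate = posixpath.join(base, requested.lstrip("/"))
    # normpath collapses repeated slashes and resolves "." and ".." lexically.
    resolved = posixpath.normpath(candidate)
    if resolved == base or resolved.startswith(base + "/"):
        return resolved
    return None
```

Comparing against base + "/" rather than base alone avoids accepting a sibling directory such as /home/httpd/html-private.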
Unicode parsing
If we continue on the example of parsing pathnames in a web server, let us consider a bug in early releases of Microsoft’s IIS (Internet Information Services, their web server). IIS had proper pathname checking to ensure that attempts to get to a parent are blocked:
http://www.poopybrain.com/scripts/../../winnt/system32/cmd.exe
Once the pathname was validated, it was passed to a decode function that decoded any embedded Unicode characters and then processed the request.
The problem with this technique was that non-international characters (traditional ASCII) could also be written as Unicode characters. A “/” could also be written in a URL as its hexadecimal value, %2f (decimal 47). It could also be represented as the two-byte Unicode sequence %c0%af.
The reason for this stems from the way Unicode was designed to support compatibility with one-byte ASCII characters. This encoding is called UTF-8. If the first bit of a character is a 0, then we have a one-byte ASCII character (in the range 0..127). However, if the first bit is a 1, we have a multi-byte character. The number of leading 1s determines the number of bytes that the character takes up. If a character starts with 110, we have a two-byte Unicode character.
With a two-byte character, the UTF-8 standard defines a bit pattern of
110a bcde 10fg hijk
The values a-k above represent 11 bits that give us a value in the range 0..2047.
The “/” character, 0x2f, is 47 in decimal and 0010 1111 in binary. The value represents offset 47 into the character table (called a codepoint in Unicode parlance).
Hence we can represent the “/” as 0x2f or as the two-byte Unicode sequence:
1100 0000 1010 1111
which is the hexadecimal sequence %c0%af. Technically, this is disallowed. The standard states that codepoints less than 128 must be represented as one byte, but lenient Unicode parsers accept the two-byte sequence anyway. We can construct an equivalent three-byte sequence as well.
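The bit manipulation behind the two-byte form can be checked with a few lines of Python. The function names below are made up for this sketch. The lenient decoder models a parser that looks only at the bit pattern, while Python's built-in UTF-8 decoder is used as an example of a strict one.

```python
def overlong_two_byte(ch):
    """Encode an ASCII character in the disallowed two-byte form 110abcde 10fghijk."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("expected an ASCII character")
    # High 5 bits go after the 110 prefix, low 6 bits after the 10 prefix.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

def lenient_two_byte_decode(data):
    """Decode two bytes by bit pattern alone, without rejecting overlong forms."""
    first, second = data
    if first & 0xE0 != 0xC0 or second & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 pattern")
    return chr(((first & 0x1F) << 6) | (second & 0x3F))

def strict_decoder_rejects(data):
    """Report whether Python's built-in UTF-8 decoder refuses the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

For "/", overlong_two_byte produces the bytes c0 af, the lenient decoder maps them back to "/", and the strict decoder refuses them.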
Microsoft’s bug was that the pathname check did not treat %c0%af as equivalent to a / because that sequence should never have been used to represent the character. However, the Unicode parser was happy to translate it, and attackers were able to use this to access any file on a server running IIS. This bug also gave attackers the ability to invoke cmd.exe, the command interpreter, and execute any commands on the server.
After Microsoft fixed the multi-byte Unicode bug, another problem came up. The parsing of escaped characters was recursive, so if the resultant string looked like a Unicode hexadecimal sequence, it would be re-parsed.
As an example of this, let’s consider the backslash (\), which Microsoft treats as equivalent to a slash (/) in URLs since their native pathname separator is a backslash[3].
The backslash can be written in a URL in hexadecimal format as %5c.
The “%” character can be expressed as %25.
The “5” character can be expressed as %35.
The “c” character can be expressed as %63.
Hence, if the URL parser sees the string %%35c, it would expand the %35 to the character “5”, which would result in %5c, which would then be converted to a \.
If the parser sees %25%35%63, it would expand each of the %nn components to get the string %5c, which would then be converted to a \.
As a final example, if the parser comes across %255c, it will expand %25 to % to get the string %5c, which would then be converted to a \.
It is not trivial to know what a name relates to but it is clear that all conversions have to be done before the validity of the pathname is checked. As for checking the validity of the pathname in an application, it is error-prone. The operating system itself parses a pathname a component at a time, traversing the directory tree and checking access rights as it goes along. The application is trying to recreate a similar action without actually traversing the file system but rather by just parsing the name and mapping it to a subtree of the file system namespace.
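The effect of repeated decoding can be reproduced with Python's urllib.parse.unquote. In this sketch, decode_recursively is a simplified model of a parser that keeps decoding while the string changes; it is not the actual IIS code, and both helper names are made up.

```python
from urllib.parse import unquote

def decode_recursively(text, max_rounds=10):
    """Model of the flawed behavior: decode again whenever the result changed."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text

def contains_separator(text):
    """A stand-in for the pathname check: look for either path separator."""
    return "/" in text or "\\" in text
```

A single decoding pass turns ..%255c.. into ..%5c.., which passes the separator check, but a second pass yields ..\.. and the check is bypassed. Whatever decoding the server will ultimately apply has to be completed before the check runs, and nothing should be decoded again afterward.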
TOCTTOU attacks
TOCTTOU stands for Time of Check to Time of Use. If we have code of the form:
if I am allowed to do something
then do it
we may be exposing ourselves to a race condition. There is a window of time between the test and the action. If an attacker can change the condition after the check then the action may take place even if the check should have failed.
One example of this is the print spooling program, lpr. It runs as a setuid program with root privileges so that it can copy a file from a user’s directory into a privileged spool directory that serves as a queue of files for printing. Because it runs as root, it can open any file, regardless of permissions. To keep the user honest, it will check access permissions on the file that the user wants to print and then, only if the user has legitimate read access to the file, it will copy it over to the spool directory for printing. An attacker can create a link to a readable file and then run lpr in the background. At the same time, he can change the link to point to a file for which he does not have read access. If the timing is just perfect, the lpr program will check access rights before the file is re-linked but will then copy the file for which the user has no read access.
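One general way to shrink this window is to open the file first and then check the opened file itself through its descriptor, rather than checking the name and opening it afterward. The sketch below illustrates the idea and is not how lpr is implemented. The function name and the ownership-based policy are assumptions, and O_NOFOLLOW is a POSIX flag that may not exist on every platform.

```python
import os
import stat

def open_if_owned_by(path, uid):
    """Open path, then check the opened file itself; return a descriptor or None."""
    try:
        # O_NOFOLLOW refuses to open a symbolic link in the final path component.
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError:
        return None
    info = os.fstat(fd)  # describes the open file, not whatever the name refers to now
    if stat.S_ISREG(info.st_mode) and info.st_uid == uid:
        return fd
    os.close(fd)
    return None
```

Because the check is made on the descriptor, re-linking the name after the open has no effect on which file is read.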
Another example of the TOCTTOU race condition is the set of temporary filename creation functions (tmpnam, tempnam, mktemp, GetTempFileName, etc.). These functions create a unique filename when they are called but there is no guarantee that an attacker doesn’t create a file with the same name before that filename is used. If the attacker creates and opens a file with the same name, she will have access to that file for as long as it is open, even if the user’s program changes access permissions for the file later on.
The best defense for the temporary file race condition is to use the mkstemp function, which creates a file based on a template name and opens it as well, avoiding the race condition between checking the uniqueness of the name and opening the file.
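Python exposes the same idea through tempfile.mkstemp, which picks a name and creates the file in a single step. The wrapper function and the prefix below are assumptions made for this sketch.

```python
import os
import stat
import tempfile

def create_private_temp_file(prefix="spool-"):
    """Create and open a uniquely named temporary file in one step."""
    # mkstemp creates the file exclusively, so it never reuses a file that
    # already exists, and it returns an open descriptor along with the name.
    fd, path = tempfile.mkstemp(prefix=prefix)
    return fd, path
```

The caller writes through the returned descriptor, for example with os.fdopen(fd, "w"), instead of reopening the file by name, and removes the file when done.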
[1] Note that sprintf is vulnerable to buffer overflow. We should use snprintf, which allows one to specify the maximum size of the buffer.
[2] Unix, Linux, macOS, FreeBSD, NetBSD, OpenBSD, Android, etc.
[3] The official Unicode names for the slash and backslash characters are solidus and reverse solidus, respectively.