From a penetration tester’s perspective, SQL Injection (SQLi) remains one of the most critical attack vectors against web applications. Among these, “UNION-based SQL injection”—which exploits the UNION operator—is a powerful technique that allows attackers to directly extract arbitrary information from within the database.
In this article, we’ll explain how attackers subvert the specification of UNION to compromise systems, and discuss robust countermeasures against it from a design level.
The Basics of UNION and Attack Prerequisites
Fundamentally, the UNION operator is used to combine the results of two or more SELECT statements into a single result set. For example, it is used when you want to list both customer and employee data together.
SELECT first_name, city FROM customers
UNION
SELECT first_name, city FROM employees;
For an attacker to exploit this UNION to exfiltrate data from another table (e.g., a user information table), they must overcome two strict prerequisites:
- Matching Column Count: The number of columns returned by the original query must exactly match the number of columns returned by the injected
UNIONquery. - Data Type Compatibility: The data types of each corresponding column must be compatible.
As a foothold to infiltrate the system, attackers typically begin by identifying (enumerating) the column count and data types of the backend query. They achieve this using techniques like iteratively increasing the ORDER BY clause or injecting NULL values (e.g., UNION SELECT NULL, NULL--).
Real-World Attack Scenario: Data Exfiltration
Consider a vulnerable scenario where an application dynamically constructs an SQL query without properly handling user input.
Vulnerable Backend Query:
SELECT * FROM users WHERE username = '$user_input' AND password = '$password_input'
Suppose the attacker submits the following payload into the username input field:
' UNION SELECT username, password FROM administrators --
If the application expands this input directly via string concatenation, the query is transformed as follows:
SELECT * FROM users WHERE username = '' UNION SELECT username, password FROM administrators -- ' AND password = '$password_input'
Anatomy of the Payload (What is happening?)
username = ''typically refers to a non-existent user, so the firstSELECTreturns an empty result (the goal is to exclude the original query’s results and display only the attacker’s desired data).- The second
SELECTfollowing theUNIONjoins the usernames and passwords from theadministratorstable to the query result, outputting them directly to the frontend. - The
--(or#) at the end of the line is the SQL comment syntax, which nullifies all subsequent query logic (such as the password validation) by treating it as a comment.
In advanced attacks, the attacker then exploits this further by querying information_schema (the table holding database metadata) to enumerate all system structures, table names, and column names across the database, aiming for total system compromise.
Countermeasures: A Defense in Depth Approach
From an attacker’s viewpoint, ad-hoc countermeasures like “sanitizing strings with a blacklist” can easily be bypassed using WAF evasion techniques or encoding abuse. A fundamental solution requires architectural remediation.
1. Mandatory Use of Prepared Statements (Parameterized Queries)
This is the most critical and reliable defense. It completely separates the structure of the SQL query (logic) from the user input (data). By doing so, it physically eliminates the possibility of any malicious input data being interpreted as an executable SQL command. Ensure the proper use of PDO (in PHP) or the ORM provided by your language/framework.
2. Whitelist-Based Input Validation
Verify on the server side that user input strictly matches expected data types, lengths, and formats (e.g., numbers only, alphanumeric only, etc.).
3. Application of the Principle of Least Privilege (PoLP)
Restrict the database account used by the application to connect to the database so that it holds only the minimum necessary privileges.
- If the feature only requires reading data, do not grant
UPDATEorDELETEpermissions. - Restrict access to system tables (like
information_schema) or other databases. This localizes the damage in the unlikely event that an SQLi attack is successful.
4. Implementation of a Web Application Firewall (WAF)
This functions as an “added layer of defense” to mitigate attacks. While it can detect and block signatures specific to SQL injection, it remains a mitigation measure and cannot replace secure coding practices.
Conclusion: Utilizing Hacker Mindset for Defense
Although the principle behind UNION-based SQL injection is simple, it is a devastating attack technique that leads directly to the complete exposure of confidential information. Understanding the penetration tester’s perspective—“how an attacker attempts to compromise the system”—is extremely valuable for developers implementing defenses. Enforce secure design and coding best practices to keep your systems robust.
