SRLabs research found a significant patch gap in the Android patch ecosystem, which has since shrunk. In our hunt for more missing patches, our SnoopSnitch app now detects significantly more potential vulnerabilities by analyzing Java bytecode.
The Android patch gap, which SRLabs found during previous research, is shrinking. But it is not closed. To keep track of the patch gap SRLabs created the Android app SnoopSnitch, which enables users to test whether their phone is missing security patches. In earlier versions of SnoopSnitch, patch testing was limited to Android components written in C and C++. The recent SnoopSnitch version now also features support for detecting missing patches in Android Java components. Integrating these Java patch tests doubled SnoopSnitch’s coverage since more and more vulnerabilities are found in Android components written in Java.
This blog post discusses in detail how missing patches in Java code can be identified. The key is to dive deep into the bytecode to create unique signatures of Java code/methods and then classify these signatures as patched/unpatched. These fingerprints enable SnoopSnitch to verify automatically whether a phone is patched or unpatched. The potential of these heuristics is not limited to Android. While there are differences between Dalvik bytecode and other Java bytecode, our results can be a starting point for exploring how stable signatures can be created for Java bytecode outside of Android.
Android requires regular patching. Android is the most popular mobile OS with over 2.8 billion users worldwide and a market share of 75%. To keep the system secure, regular patching is necessary and new patches are released every month. The monthly Android Security bulletin contains information on current vulnerabilities (CVEs) and links to available patches. Patches are provided by Google and other vendors and then distributed to the affected phones.
The Android security patch process is complex since the Android patch ecosystem is fragmented and patch responsibility lies in the hands of the individual vendors. We looked into the Android patch gap during previous research, and even though the situation has significantly improved, the patch gap will not go away. There are two main reasons. First, different Android versions are in use simultaneously. Second, Android is an open-source operating system that different vendors modify, so vendors often need to adapt patches for their modified versions.
Keep in mind that a missing Android patch does not automatically equal an exploitable vulnerability. The lion’s share of Android exploitation is based on social engineering and malware. Nevertheless, to prevent known vulnerabilities from being used to exploit Android devices, users should ensure that their devices are fully patched.
This is where SnoopSnitch helps: it compares the actual patch situation to the claimed patch level of a device and flags any hidden patch gap.
Android is mainly written in C/C++ and Java. For the former, we already came up with a patch test solution and presented it at HITB 2018.
However, in the past couple of years many of the vulnerabilities fixed in the monthly Android security patches have been in Android Java components, and our data shows that more of the missing patches affect Java code than C/C++ code.
Patch test heuristics depend on the compiled language, so the patch test method differs between, for example, C/C++ and Java. We developed techniques to test Java patches.
Java code is compiled to bytecode, an intermediate binary form that has to be either further compiled or interpreted before it can be executed. On Android, the bytecode is packaged into a .dex file, an Android-specific file format.
For a manual analysis it is possible to decompile the bytecode for a given DEX file and check whether the patch has been applied. However, this method is not suitable for running automated tests for a variety of reasons:
Our solution, based on creating custom signatures of patched code, creates a robust way of testing whether a patch has been applied. We draw on what we have done for detecting patches in C and C++ compiled code in the past. Both approaches are based on the same principle: If a signature has been classified as patched/unpatched and we find the same signature on the device being tested, we can directly conclude that the device is also patched/unpatched.
1/ Identify changes. We start by identifying unique changes in published patches. This could be a newly introduced function name in an already existing class. If that function name is at that specific (compiled) part of the component, the patch must have been applied.
2/ Parse DEX. After identifying unique changes in patches, we can look for them inside the bytecode. The first step to be able to do so is to understand the DEX file format, which the bytecode is part of. Every file format follows a well-defined structure that allows finding specific information in specific locations.
Let us say we want to find the code of the function establishConnection, which is part of the class ConnectionManager. Figure 1 illustrates the process of traversing the file with the goal of first identifying the class, then the method, and lastly the code inside the method so we can create a signature.
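As a starting point for such a traversal, the sketch below reads the fixed-size DEX header, which records where the string, type, method, and class tables begin. The field order follows the `header_item` layout in the public DEX format documentation; the function name `parse_dex_header` is hypothetical and this is not SnoopSnitch's actual parser.

```python
import struct

# header_item: 8-byte magic, 4-byte checksum, 20-byte SHA-1 signature,
# followed by twenty little-endian 32-bit fields (112 bytes in total).
_HEADER = struct.Struct("<8sI20s20I")

_UINT_FIELDS = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off",
    "map_off", "string_ids_size", "string_ids_off", "type_ids_size",
    "type_ids_off", "proto_ids_size", "proto_ids_off", "field_ids_size",
    "field_ids_off", "method_ids_size", "method_ids_off", "class_defs_size",
    "class_defs_off", "data_size", "data_off",
)


def parse_dex_header(data: bytes) -> dict:
    """Return the DEX header fields as a dict (hypothetical helper)."""
    if len(data) < _HEADER.size:
        raise ValueError("input is shorter than a DEX header")
    magic, checksum, _signature, *uints = _HEADER.unpack_from(data, 0)
    if not magic.startswith(b"dex\n"):
        raise ValueError("not a DEX file")
    header = dict(zip(_UINT_FIELDS, uints))
    header["magic"] = magic
    header["checksum"] = checksum
    return header
```

With `class_defs_off` and `class_defs_size` in hand, a parser can walk the class definitions to find ConnectionManager, and from there follow the offsets to the method's code.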
3/ Create signatures. After locating the bytecode in the file, the next step is to create a unique signature of it. A simple hash of the bytecode will not do the job, since many instructions (e.g., calls to other methods) refer to strings, methods, or classes elsewhere in the DEX file by numeric index. These indices vary between firmware images due to minimal changes somewhere else in the codebase.
Instead, we need to include in the signature only the relevant instructions, while leaving out volatile values. This results in robust signatures that handle volatile parts appropriately. The relevant instructions are documented in the Android Open Source Project: Dalvik bytecode and Dalvik Executable instruction formats. Both pages must be used in conjunction since information is spread across them, e.g., the instructions are defined in the first and the instruction formats in the second (see Figure 2).
We concluded that we need to handle bytecode instructions and therefore need to understand their structure. Let us look into one specific example: Loading a string to a register using the const-string instruction.
Each bytecode instruction is identified by a unique number, the opcode. Let us take the instruction with opcode 0x1a as an example (Figure 3).
Figure 2 shows that the respective instruction format is 21c. Using that information, we know that this instruction has two operands: an 8-bit destination register and a 16-bit string index, which is an index into the string_ids section of the DEX file (see Figure 1). This means the instruction does not contain the string itself; you use the numeric string index to look up the actual string value via the string_ids section.
If another string in the same DEX file is added or removed, for example through other patches, the numeric string index will likely change. Therefore, it is not possible to include this string index in the signature. However, it is possible to extract the string value from the DEX file and add this value to the signature (instead of the numeric string index). This will only change if the actual value is changed and therefore can be considered a valid part of the signature.
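The following sketch illustrates this substitution for a single const-string instruction. It assumes the string table has already been decoded into a plain Python list (the `strings` parameter), and the helper names are hypothetical; keeping the register number in the token is also an assumption of this sketch.

```python
import struct

CONST_STRING = 0x1A  # const-string vAA, string@BBBB (format 21c)


def decode_21c(code: bytes, offset: int = 0):
    """Decode one format-21c instruction: two little-endian 16-bit code units."""
    unit0, index = struct.unpack_from("<HH", code, offset)
    opcode = unit0 & 0xFF   # low byte of the first code unit
    register = unit0 >> 8   # high byte: 8-bit destination register
    return opcode, register, index


def const_string_token(code: bytes, offset: int, strings: list):
    """Return a signature token that holds the string value, not its index."""
    opcode, register, index = decode_21c(code, offset)
    if opcode != CONST_STRING:
        raise ValueError("not a const-string instruction")
    return ("const-string", register, strings[index])
```

Two firmware images that load the same string from different positions in their string tables then produce the same token, even though the raw instruction bytes differ.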
A similar logic is needed for a variety of other opcodes. For example, for creating a new instance of an object (opcode 0x22: new-instance instruction), the class name is hashed instead of the numeric index pointing to the class.
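Put together, a signature over a whole method body could look roughly like the sketch below. It handles only three opcodes (return-void, const-string, new-instance) and uses SHA-256 over a simple token encoding; the opcode table, the token format, and the choice of hash are illustrative assumptions, not SnoopSnitch's actual scheme.

```python
import hashlib
import struct

# opcode -> (mnemonic, length in 16-bit code units, referenced pool or None)
# Only a tiny subset of the Dalvik instruction set, for illustration.
OPCODES = {
    0x0E: ("return-void", 1, None),
    0x1A: ("const-string", 2, "string"),
    0x22: ("new-instance", 2, "type"),
}


def method_signature(code: bytes, strings: list, types: list) -> str:
    """Hash a method body with pool indices replaced by the values they name."""
    digest = hashlib.sha256()
    offset = 0
    while offset < len(code):
        (unit0,) = struct.unpack_from("<H", code, offset)
        opcode, high = unit0 & 0xFF, unit0 >> 8
        if opcode not in OPCODES:
            raise NotImplementedError(f"opcode {opcode:#04x} not handled in this sketch")
        name, units, pool = OPCODES[opcode]
        token = [name, str(high)]
        if pool is not None:
            # Resolve the volatile 16-bit index to the stable value behind it.
            (index,) = struct.unpack_from("<H", code, offset + 2)
            table = strings if pool == "string" else types
            token.append(table[index])
        digest.update("\x00".join(token).encode("utf-8") + b"\n")
        offset += units * 2
    return digest.hexdigest()
```

A real implementation would need an entry for every opcode that can occur, since the instruction length is required to step through the method correctly.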
Creating signatures by removing volatile indices and then hashing the bytecode is a big step towards stable signatures. But that is not yet a complete solution due to another source of volatility: Android resource identifiers.
Resource identifiers are a convenient way for managing strings in Android applications. The identifiers are separated from the application logic code and placed in an XML file. Developers can assign each string a variable, which allows referencing a string using something like getString(R.string.hello_world). The advantage is that the code is more readable, and translations to different languages will automatically be selected depending on the system language.
All resources (strings are just one example) are assigned a numeric resource identifier by the build system at compile time. These identifiers are volatile by nature. To mitigate this, resource identifiers need to be excluded from the signature. However, there is no special opcode to load a resource identifier. Loading a resource identifier will almost always use the opcode 0x14 (CONST), but this opcode is not reserved for resource identifiers and is also used for other purposes.
We could just exclude all integer constants loaded with 0x14 (CONST), but that could lead to incorrect matches of signatures when loading other numeric values. Additionally, in some cases a patch will only change some numeric values such as flags and if you exclude all integer constants you will not be able to detect these patches.
The solution lies in being able to recognize resource identifiers. We create and use heuristics. The resource identifier is generated automatically by the Android build system, and it is of the form 0xPPTTNNNN with certain constraints:
Combining these restrictions allows creating a heuristic, which detects practically all valid resource identifiers while only having a relatively low number of false positives (i.e., matches for other 32-bit integers loaded with the same CONST opcode).
If the heuristics detect that a loaded number is likely a resource identifier, the number will be removed from the signature. This assures that different binaries compiled from the same source code (but with different numeric resource identifiers assigned by the build system) will match the same signature.
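A heuristic of this kind might be sketched as follows. The concrete constraints are assumptions based on general Android conventions (package byte 0x01 for framework resources and 0x7f for app resources, a non-zero type byte); the upper bound on the type byte is an arbitrary illustrative cutoff, and none of the names or thresholds are taken from SnoopSnitch.

```python
# Assumed package bytes (PP): 0x01 = Android framework, 0x7f = application.
ASSUMED_PACKAGE_IDS = (0x01, 0x7F)
# Assumed, arbitrary upper bound for the type byte (TT); type IDs start at 1.
ASSUMED_MAX_TYPE_ID = 0x1F


def looks_like_resource_id(value: int) -> bool:
    """Guess whether a 32-bit constant has the 0xPPTTNNNN resource ID shape."""
    value &= 0xFFFFFFFF  # treat the signed const operand as unsigned
    package = (value >> 24) & 0xFF
    type_id = (value >> 16) & 0xFF
    return package in ASSUMED_PACKAGE_IDS and 1 <= type_id <= ASSUMED_MAX_TYPE_ID


def const_token(register: int, value: int):
    """Build a signature token for a const instruction (opcode 0x14)."""
    if looks_like_resource_id(value):
        # Replace the volatile identifier with a fixed placeholder.
        return ("const", register, "<resource-id>")
    return ("const", register, value & 0xFFFFFFFF)
```

Using a placeholder rather than dropping the operand entirely is a choice made for this sketch; it keeps the token shape uniform while still ignoring the volatile number. Ordinary constants such as small flags keep their value, so patches that only change such numbers remain detectable.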
SRLabs devised an approach to create and detect signatures for Java bytecode in order to analyze the potential security patch gap in Android's Java components. We looked into the details of the DEX file format and the bytecode instruction set. Since some parts of the bytecode are volatile, we had to exclude them from the signature. Heuristics can help to navigate this problem. The Java patch tests that we create based on this logic are now part of our app SnoopSnitch and doubled the coverage of the patch level analysis.
Despite the differences between Dalvik bytecode and other Java bytecode, further research should investigate whether these heuristics can be transferred and adapted to the latter. Unique patch signatures could, for example, be used to verify that Log4j is patched in popular Java libraries.
The latest SnoopSnitch version is available on the Play Store. Alternatively, you can download it from our project site. You can report issues on our GitHub to help us to improve our latest version.
Disclaimer: Publicly available firmware images form the basis for our test creation. Our Java tests can be limited by vendor customizations of Android. As of May 2022, SnoopSnitch does not yet include patch tests applicable for Android 12L, since it has just been released. We add tests regularly.
Editing by: Maria Bühner