Reverse Engineering Yaesu FT-70D Firmware Encryption

This article walks through my complete methodology for reverse engineering the tools mentioned in it. It’s a bit long, but it’s intended to be accessible to folks who aren’t necessarily advanced reverse engineers.


# Background

Ham radios are a fun way to learn how the radio spectrum works, and more importantly: they’re embedded devices that may run weird chips/firmware! I was curious how easy it would be to hack my Yaesu FT-70D, so I started doing some research. The only existing resource I could find for Yaesu radios was a Reddit post about custom firmware for the poster’s Yaesu FT1DR.

The Reddit poster mentions that if you go through the firmware update process via USB, the radio exposes its Renesas H8SX microcontroller and its flash can be modified using the Renesas SDK. It was a great start and looked promising, but the SDK wasn’t easy to configure and I wasn’t sure it could even dump firmware… so I didn’t use it for very long.

# Other ways

Yaesu offers a Windows application on its website that can be used to update the radio’s firmware over USB:

[Image: firmware page]

The zip contains the following files:

1.2 MB  Wed Nov  8 14:34:38 2017  FT-70D_ver111(USA).exe
682 KB  Tue Nov 14 00:00:00 2017  FT-70DR_DE_Firmware_Update_Information_ENG_1711-B.pdf
8 MB  Mon Apr 23 00:00:00 2018  FT-70DR_DE_MAIN_Firmware_Ver_Up_Manual_ENG_1804-B.pdf
3.2 MB  Fri Jan  6 17:54:44 2012  HMSEUSBDRIVER.exe
160 KB  Sat Sep 17 15:14:16 2011  RComms.dll
61 KB  Tue Oct 23 17:02:08 2012  RFP_USB_VB.dll
1.7 MB  Fri Mar 29 11:54:02 2013  vcredist_x86.exe

I’m assuming the file specific to the FT-70D, “FT-70D_ver111(USA).exe”, will likely contain our firmware image. A PE file (.exe) can contain binary resources in its .rsrc section. Let’s use XPEViewer to see what’s in this file:

Resources fit into one of several different resource types, but a firmware image will likely be placed in a custom type. What is this last entry, “23”? Expanding that node we have some interesting things:

[Image: start update]

RES_START_DIALOG is a custom string that the updater shows when preparing an update, so we are in the right area!

[Image: res update info]

RES_UPDATE_INFO looks like it’s just binary data. Maybe it’s our firmware image? Unfortunately, neither the “Strings” tab in XPEViewer nor the strings utility yields anything legible from this data. The firmware image is probably encrypted.
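To make the “nothing legible” check a little more concrete, here is a minimal sketch in C of what a strings-style scan does: it counts runs of consecutive printable ASCII bytes. The helper name `count_printable_runs` and the printable range used are illustrative assumptions, not something taken from XPEViewer or the strings utility.

```c
#include <stddef.h>
#include <stdint.h>

/* Count runs of at least `min_len` consecutive printable ASCII bytes.
 * Hypothetical helper for illustration; `min_len` should be >= 1. */
size_t count_printable_runs(const uint8_t *buf, size_t len, size_t min_len)
{
    size_t runs = 0;
    size_t current = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0x20 && buf[i] <= 0x7e) {
            current++;
            /* Count each run exactly once, when it first reaches min_len. */
            if (current == min_len)
                runs++;
        } else {
            current = 0;
        }
    }
    return runs;
}
```

A resource that yields almost no runs of four or more characters is consistent with compressed or encrypted content, although a low count alone does not prove either.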

# Reverse engineering the binary

Let’s load the update utility into our disassembler of choice to figure out how the data is encrypted. I’ll be using IDA Pro, but Ghidra (free!), radare2 (free!), or Binary Ninja are all great alternatives. Where possible in this article I’ll try to show my rewritten code in C, since it will be closer to both the decompiler and machine code output.

A good starting point is the string we saw above, RES_UPDATE_INFO. Windows applications load resources by calling one of the FindResource* APIs. FindResourceA has the following parameters:

  1. HMODULE, a handle to the module to look for the resource in.
  2. lpName, the resource name.
  3. lpType, the resource type.

In our disassembler we can find references to the RES_UPDATE_INFO string and look for calls to FindResourceA with this string as the argument in the lpName position.

We find a match in a function that happens to find/load all of the custom resources under type 23:

[Image: load resource decompiler output]

We know where the data is loaded by the application, so now we need to see how it’s used. Doing static analysis from this point forward may be more work than it’s worth if the data isn’t operated on immediately. To speed things up I’m going to use a debugger’s assistance. I used WinDbg’s Time Travel Debugging (TTD) to record an execution trace of the updater while it updated my radio. TTD is an invaluable tool and I’d highly recommend using it whenever possible. rr is an alternative for non-Windows platforms.

The decompiler output shows that this function copies the RES_UPDATE_INFO resource into a dynamically allocated buffer. The qmemcpy() is inlined and represented by a rep movsd instruction in the disassembly, so we need to break at this instruction and examine the value of the edi register (the destination address). I set a breakpoint by typing bp 0x406968 in the command window and let the application run; when it breaks, the edi register value is 0x2be5020. Now we can set a memory access breakpoint at this address using ba r4 0x2be5020 to break whenever this data is read.

Our breakpoint is hit at 0x4047DC, so back to the disassembler. In IDA you can press G and enter this address to jump to it. We end up in what looks like the data processing function:

[Image: deobfuscate function]

We broke while dereferencing v2, and IDA has automatically named the variable it’s being assigned to Time. The Time variable is passed to another function which formats it as a string using the format %Y%m%d%H%M%S. Let’s clean up the variables to reflect what we know:

    bool __thiscall sub_4047B0(char *this)
    // [remainder of the listing is missing]

The timestamp string is passed to sub_4082c0 on line 20 and the remainder of the update image is passed to sub_408350 on line 21. I’m going to focus on sub_408350 since I only care about the firmware data right now, and based on how this function is called, I’d wager its signature is something like this:

status_t sub_408350(uint8_t *input, size_t input_len, uint8_t *output, size_t output_len, size_t *out_data_processed);

Let’s see what it does:

    int __stdcall sub_408350(char *a1, int a2, int a3, int a4, _DWORD *a5)
    {
      int v5; // edx
      int v7; // ebp
      int v8; // esi
      unsigned int i; // ecx
      char v10; // al
      char *v11; // eax
      int v13; // [esp+10h] [ebp-54h]
      char v14[64]; // [esp+20h] [ebp-44h] BYREF

      v5 = a2;
      v7 = 0;
      memset(v14, 0, sizeof(v14));
      if ( a2 <= 0 )
      // [body of the listing is missing here; the same routine appears with named variables under "Data decryption"]
    }

I think we’ve found our function that starts decrypting the firmware! To confirm, we want to see what the output parameter’s data looks like before and after this function is called. I set a breakpoint in the debugger at the address where it’s called (bp 0x404842) and put the value of the edi register (0x2d7507c) in WinDbg’s memory window.

Here’s the data before:

[Image: data before]

After stepping over the function call:

[Image: data after]

We can dump this data into a file using the following command:

.writemem C:\users\lander\documents\maybe_deobfuscated.bin 0x2d7507c L100000

010 Editor has a built-in strings utility (Search > Find Strings…), and if we scroll down a bit in the results, we find actual strings that appear on my radio!

At this point if we are only interested in getting the plaintext firmware we can stop messing with the binary and load the firmware into IDA Pro… but I would like to know how this encryption works.

# Encryption details

Just to recap from the last section:

  • We’ve identified our data processing routine (let’s call this function decrypt_update_info).
  • We know that the first 4 bytes of the update data are a Unix timestamp that’s formatted as a string and used for an unknown purpose.
  • We know which function starts decrypting our firmware image.
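As a small illustration of the second point, the sketch below reads a 4-byte timestamp and formats it with the same `%Y%m%d%H%M%S` format string. The little-endian byte order and the use of UTC (`gmtime`) are assumptions made for the example; the article does not confirm which time zone conversion the updater uses, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read a 32-bit little-endian value (byte order is an assumption). */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Format the leading timestamp of the update data as %Y%m%d%H%M%S.
 * Returns the number of characters written (14), or 0 on failure. */
size_t format_update_timestamp(const uint8_t *update_data, char *out, size_t out_len)
{
    time_t t = (time_t)read_le32(update_data);
    struct tm *parts = gmtime(&t); /* UTC is an assumption, see above */

    if (parts == NULL)
        return 0;
    return strftime(out, out_len, "%Y%m%d%H%M%S", parts);
}
```

The resulting string is always 14 characters long (4 + 2 + 2 + 2 + 2 + 2), which matters later when the string is consumed 8 characters at a time.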

# Data decryption

Let’s look at the firmware image decryption routine with some named variables:

    int __thiscall decrypt_data(
            void *this,
            char *encrypted_data,
            int encrypted_data_len,
            char *output_data,
            int output_data_len,
            _DWORD *bytes_written)
    {
      int data_len; // edx
      int output_index; // ebp
      int block_size; // esi
      unsigned int i; // ecx
      char encrypted_byte; // al
      char *idata; // eax
      int remaining_data; // [esp+10h] [ebp-54h]
      char inflated_data[64]; // [esp+20h] [ebp-44h] BYREF

      data_len = encrypted_data_len;
      output_index = 0;
      memset(inflated_data, 0, sizeof(inflated_data));
      if ( encrypted_data_len <= 0 )
      {
    LABEL_13:
        *bytes_written = output_index;
        return 0;
      }
      else
      {
        while ( 1 )
        {
          block_size = data_len;
          if ( data_len >= 8 )
            block_size = 8;
          remaining_data = data_len - block_size;

          // inflate 1 byte of input data to 8 bytes of its bit representation
          for ( i = 0; i < 0x40; i += 8 )
          {
            encrypted_byte = *encrypted_data;
            inflated_data[i] = (unsigned __int8)*encrypted_data >> 7;
            inflated_data[i + 1] = (encrypted_byte & 0x40) != 0;
            inflated_data[i + 2] = (encrypted_byte & 0x20) != 0;
            inflated_data[i + 3] = (encrypted_byte & 0x10) != 0;
            inflated_data[i + 4] = (encrypted_byte & 8) != 0;
            inflated_data[i + 5] = (encrypted_byte & 4) != 0;
            inflated_data[i + 6] = (encrypted_byte & 2) != 0;
            inflated_data[i + 7] = encrypted_byte & 1;
            ++encrypted_data;
          }
          // do something with the inflated data
          sub_407980(this, inflated_data, 0);
          if ( block_size )
            break;
    LABEL_12:
          if ( remaining_data <= 0 )
            goto LABEL_13;
          data_len = remaining_data;
        }
        // deflate the data back to bytes
        idata = &inflated_data[1];
        while ( 1 )
        {
          --block_size;
          if ( output_index >= output_data_len )
            return -101;
          output_data[output_index++] = idata[6] | (2
                                      * (idata[5] | (2
                                      * (idata[4] | (2
                                      * (idata[3] | (2
                                      * (idata[2] | (2
                                      * (idata[1] | (2 * (*idata | (2 * *(idata - 1))))))))))))));
          idata += 8;
          if ( !block_size )
            goto LABEL_12;
        }
      }
    }

This routine at a high level:

  1. Allocates a 64-byte scratch buffer.
  2. Checks if there is any data to process. If not, sets the output variable out_data_processed to the number of bytes processed and returns 0x0 (STATUS_SUCCESS).
  3. Loops over the input data in 8-byte chunks and inflates each byte to its bit representation.
  4. After the 8-byte chunk is inflated, calls sub_407980 with the scratch buffer and 0 as arguments.
  5. Loops over the scratch buffer and reassembles each run of 8 sequential bits into 1 byte, then writes that byte to the appropriate index in the output buffer.
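The five steps above can be sketched in C roughly as follows. This is a simplified rewrite under a few assumptions rather than a faithful copy of the updater: the per-block transform is passed in as a function pointer (`invert_bits` is only a placeholder standing in for sub_407980), a short final block is zero-padded, and all names are hypothetical. The -101 return value for a too-small output buffer is taken from the decompiler output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 8
#define BLOCK_BITS 64
#define ERR_OUTPUT_TOO_SMALL (-101)

typedef void (*block_transform_fn)(uint8_t bits[BLOCK_BITS]);

/* Placeholder transform: flips every bit. Stands in for sub_407980. */
void invert_bits(uint8_t bits[BLOCK_BITS])
{
    for (size_t i = 0; i < BLOCK_BITS; i++)
        bits[i] ^= 1;
}

int process_blocks(const uint8_t *in, size_t in_len,
                   uint8_t *out, size_t out_len,
                   size_t *bytes_written, block_transform_fn transform)
{
    size_t written = 0;

    while (in_len > 0) {
        size_t block = in_len < BLOCK_BYTES ? in_len : BLOCK_BYTES;
        uint8_t bits[BLOCK_BITS];

        /* Inflate: one input byte becomes 8 bytes holding 0 or 1, MSB first. */
        memset(bits, 0, sizeof bits);
        for (size_t i = 0; i < block; i++)
            for (int b = 0; b < 8; b++)
                bits[i * 8 + b] = (in[i] >> (7 - b)) & 1;

        transform(bits);

        /* Deflate: pack each group of 8 bits back into one output byte. */
        for (size_t i = 0; i < block; i++) {
            uint8_t value = 0;
            if (written >= out_len)
                return ERR_OUTPUT_TOO_SMALL;
            for (int b = 0; b < 8; b++)
                value = (uint8_t)((value << 1) | bits[i * 8 + b]);
            out[written++] = value;
        }

        in += block;
        in_len -= block;
    }

    *bytes_written = written;
    return 0;
}
```

Passing the transform as a parameter keeps the chunking logic separate from the cipher itself, which is convenient while the inner function is still not understood.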

There’s a lot going on here, but let’s take a look at step #3. If we take the bytes 0xAA and 0x77, which have the bit representations 0b1010_1010 and 0b0111_0111 respectively, and use the algorithm above to inflate them into a 16-byte array, we end up with:

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |    | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|----|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |    | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |

This routine does this process over 8 bytes at a time and completely fills the 64-byte scratch buffer with 1’s and 0’s like the table above.
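The inflation shown in the table, and its inverse, can be written as a pair of small C functions. The names `inflate_bytes` and `deflate_bits` are hypothetical; the bit order (most significant bit first) follows the decompiler output above.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand each input byte into 8 bytes holding 0 or 1, MSB first.
 * `bits` must have room for len * 8 entries. */
void inflate_bytes(const uint8_t *in, size_t len, uint8_t *bits)
{
    for (size_t i = 0; i < len; i++)
        for (int bit = 0; bit < 8; bit++)
            bits[i * 8 + bit] = (in[i] >> (7 - bit)) & 1;
}

/* Pack groups of 8 bit-bytes back into `len` output bytes. */
void deflate_bits(const uint8_t *bits, size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t value = 0;
        for (int bit = 0; bit < 8; bit++)
            value = (uint8_t)((value << 1) | (bits[i * 8 + bit] & 1));
        out[i] = value;
    }
}
```

Inflating `{0xAA, 0x77}` with this sketch gives the same 16 values as the table, and deflating those values returns the original two bytes.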

Now let’s look at step #4 and see what’s happening in sub_407980:

1_BYTE *__thiscall sub_407980(void *this, _BYTE *a2, int a3)
2{
3 // long list of stack vars removed for clarity
4
5 v3 = (int)this;
6 v4 = 15;
7 v5 = a3;
8 v32[0] = (int)this;
9 v28 = 0;
10 v31 = 15;
11 do
12 {
13 for ( i = 0; i < 48; *((_BYTE *)&v33 + i + 3) = v18 )
14 {
15 v7 = v28;
16 if ( !v5 )
17 v7 = v4;
18 v8 = *(_BYTE *)(i + 48 * v7 + v3 + 4) ^ a2[(unsigned __int8)byte_424E50[i] + 31];
19 v9 = v28;
20 *(&v34 + i) = v8;
21 if ( !v5 )
22 v9 = v4;
23 v10 = *(_BYTE *)(i + 48 * v9 + v3 + 5) ^ a2[(unsigned __int8)byte_424E51[i] + 31];
24 v11 = v28;
25 *(&v35 + i) = v10;
26 if ( !v5 )
27 v11 = v4;
28 v12 = *(_BYTE *)(i + 48 * v11 + v3 + 6) ^ a2[(unsigned __int8)byte_424E52[i] + 31];
29 v13 = v28;
30 *(&v36 + i) = v12;
31 if ( !v5 )
32 v13 = v4;
33 v14 = *(_BYTE *)(i + 48 * v13 + v3 + 7) ^ a2[(unsigned __int8)byte_424E53[i] + 31];
34 v15 = v28;
35 v38[i - 1] = v14;
36 if ( !v5 )
37 v15 = v4;
38 v16 = *(_BYTE *)(i + 48 * v15 + v3 + 8) ^ a2[(unsigned __int8)byte_424E54[i] + 31];
39 v17 = v28;
40 v38[i] = v16;
41 if ( !v5 )
42 v17 = v4;
43 v18 = *(_BYTE *)(i + 48 * v17 + v3 + 9) ^ a2[(unsigned __int8)byte_424E55[i] + 31];
44 i += 6;
45 }
46 v32[1] = *(int *)((char *)&dword_424E80
47 + (((unsigned __int8)v38[0] + 2) | (32 * v34 + 2) | (16 * (unsigned __int8)v38[1] + 2) | (8 * v35 + 2) | (4 * v36 + 2) | (2 * v37 + 2)));
48 v32[2] = *(int *)((char *)&dword_424F80
49 + (((unsigned __int8)v38[6] + 2) | (32 * (unsigned __int8)v38[2] + 2) | (16
50 * (unsigned __int8)v38[7]
51 + 2) | (8
52 * (unsigned __int8)v38[3]
53 + 2) | (4 * (unsigned __int8)v38[4] + 2) | (2 * (unsigned __int8)v38[5] + 2)));
54 v32[3] = *(int *)((char *)&dword_425080
55 + (((unsigned __int8)v38[12] + 2) | (32 * (unsigned __int8)v38[8] + 2) | (16
56 * (unsigned __int8)v38[13]
57 + 2) | (8 * (unsigned __int8)v38[9]
58 + 2) | (4 * (unsigned __int8)v38[10] + 2) | (2 * (unsigned __int8)v38[11] + 2)));
59 v32[4] = *(int *)((char *)&dword_425180
60 + (((unsigned __int8)v38[18] + 2) | (32 * (unsigned __int8)v38[14] + 2) | (16
61 * (unsigned __int8)v38[19]
62 + 2) | (8 * (unsigned __int8)v38[15] + 2) | (4 * (unsigned __int8)v38[16] + 2) | (2 * (unsigned __int8)v38[17] + 2)));
63 v32[5] = *(int *)((char *)&dword_425280
64 + (((unsigned __int8)v38[24] + 2) | (32 * (unsigned __int8)v38[20] + 2) | (16
65 * (unsigned __int8)v38[25]
66 + 2) | (8 * (unsigned __int8)v38[21] + 2) | (4 * (unsigned __int8)v38[22] + 2) | (2 * (unsigned __int8)v38[23] + 2)));
67 v32[6] = *(int *)((char *)&dword_425380
68 + (((unsigned __int8)v38[30] + 2) | (32 * (unsigned __int8)v38[26] + 2) | (16
69 * (unsigned __int8)v38[31]
70 + 2) | (8 * (unsigned __int8)v38[27] + 2) | (4 * (unsigned __int8)v38[28] + 2) | (2 * (unsigned __int8)v38[29] + 2)));
71 v32[7] = *(int *)((char *)&dword_425480
72 + (((unsigned __int8)v38[36] + 2) | (32 * (unsigned __int8)v38[32] + 2) | (16
73 * (unsigned __int8)v38[37]
74 + 2) | (8 * (unsigned __int8)v38[33] + 2) | (4 * (unsigned __int8)v38[34] + 2) | (2 * (unsigned __int8)v38[35] + 2)));
75 v19 = (char *)(&unk_425681 - (_UNKNOWN *)a2);
76 v20 = &unk_425680 - (_UNKNOWN *)a2;
77 v33 = *(int *)((char *)&dword_425580
78 + (((unsigned __int8)v38[42] + 2) | (32 * (unsigned __int8)v38[38] + 2) | (16
79 * (unsigned __int8)v38[43]
80 + 2) | (8
81 * (unsigned __int8)v38[39]
82 + 2) | (4 * (unsigned __int8)v38[40] + 2) | (2 * (unsigned __int8)v38[41] + 2)));
83 result = a2;
84 if ( v4 <= 0 )
85 {
86 v30 = 8;
87 do
88 {
89 *result ^= *((_BYTE *)v32 + (unsigned __int8)result[v20] + 3);
90 result[1] ^= *((_BYTE *)v32 + (unsigned __int8)v19[(_DWORD)result] + 3);
91 result[2] ^= *((_BYTE *)v32 + (unsigned __int8)result[&unk_425682 - (_UNKNOWN *)a2] + 3);
92 result[3] ^= *((_BYTE *)v32 + (unsigned __int8)result[byte_425683 - a2] + 3);
93 result += 4;
94 --v30;
95 }
96 while ( v30 );
97 }
98 else
99 {
100 v29 = 8;
101 do
102 {
103 v24 = result[32];
104 v22 = *result ^ *((_BYTE *)v32 + (unsigned __int8)result[v20] + 3);
105 result += 4;
106 result[28] = v22;
107 *(result - 4) = v24;
108 v25 = result[29];
109 result[29] = *(result - 3) ^ *((_BYTE *)v32 + (unsigned __int8)result[(_DWORD)v19 - 4] + 3);
110 *(result - 3) = v25;
111 v26 = result[30];
112 result[30] = *(result - 2) ^ *((_BYTE *)v32 + (unsigned __int8)result[&unk_425682 - (_UNKNOWN *)a2 - 4] + 3);
113 *(result - 2) = v26;
114 v27 = result[31];
115 result[31] = *(result - 1) ^ *((_BYTE *)v32 + (unsigned __int8)result[byte_425683 - a2 - 4] + 3);
116 *(result - 1) = v27;
117 --v29;
118 }
119 while ( v29 );
120 }
121 v5 = a3;
122 v3 = v32[0];
123 v4 = v31 - 1;
124 v23 = v31 - 1 <= -1;
125 ++v28;
126 --v31;
127 }
128 while ( !v23 );
129 return result;
130}

Oof. This is substantially more complex, but it looks like the meat of the decryption algorithm. We’ll refer to this function, sub_407980, as decrypt_data from here on out. We can see what may be an immediate roadblock: this function takes a C++ this pointer (line 5) and performs bitwise operations on one of its members (lines 18, 23, etc.). For now let’s call this class member key and come back to it later.

This function is the perfect example of decompilers emitting less than ideal code as a result of compiler optimization/code reordering. For me, TTD was essential to follow how data flows through this function. It took a few hours of banging my head on IDA and WinDbg to figure this out, but this function can be broken down into 3 high-level steps:

  1. Build a 48-byte buffer containing our key material XOR’d with input data selected through a static table.

      int v33;
      unsigned __int8 v34; // [esp+44h] [ebp-34h]
      unsigned __int8 v35; // [esp+45h] [ebp-33h]
      unsigned __int8 v36; // [esp+46h] [ebp-32h]
      unsigned __int8 v37; // [esp+47h] [ebp-31h]
      char v38[44]; // [esp+48h] [ebp-30h]

      v3 = (int)this;
      v4 = 15;
      v5 = a3;
      v32[0] = (int)this;
      v28 = 0;
      v31 = 15;
      do
      {
        // The end statement of this loop is strange -- it's writing a byte somewhere? come back
        // to this later
        for ( i = 0; i < 48; *((_BYTE *)&v33 + i + 3) = v18 )
        {
          // v28 starts at 0 but is incremented by 1 during each iteration of the outer `while` loop
          v7 = v28;
          // v5 is our last argument which was 0
          if ( !v5 )
            // overwrite v7 with v4, which begins at 15 but is decremented by 1 during each iteration
            // of the outer `while` loop
            v7 = v4;
          // left-hand side of the xor, *(_BYTE *)(i + 48 * v7 + v3 + 4)
          // v3 in this context is our `this` pointer, and the + 4 offset looks like the key member,
          // giving us *(_BYTE *)(i + (48 * v7) + this->maybe_key)
          // so the left-hand side of the xor is likely indexing into our key material:
          // this->maybe_key[i + 48 * loop_multiplier]
          //
          // right-hand side of the xor, a2[(unsigned __int8)byte_424E50[i] + 31]
          // a2 is our input encrypted data, and byte_424E50 is some static data
          //
          // this full statement can be rewritten as:
          // v8 = this->maybe_key[i + 48 * loop_multiplier] ^ encrypted_data[byte_424E50[i] + 31]
          v8 = *(_BYTE *)(i + 48 * v7 + v3 + 4) ^ a2[(unsigned __int8)byte_424E50[i] + 31];

          v9 = v28;

          // write the result of `key_data ^ input_data` to a scratch buffer (v34)
          // v34 looks to be declared as the wrong type. v33 is actually a 52-byte buffer
          *(&v34 + i) = v8;

          // repeat the above 5 more times
          if ( !v5 )
            v9 = v4;
          v10 = *(_BYTE *)(i + 48 * v9 + v3 + 5) ^ a2[(unsigned __int8)byte_424E51[i] + 31];
          v11 = v28;
          *(&v35 + i) = v10;

          // snip

          // v18 gets written to the scratch buffer at the end of the loop...
          v18 = *(_BYTE *)(i + 48 * v17 + v3 + 9) ^ a2[(unsigned __int8)byte_424E55[i] + 31];

          // this was probably the *real* last statement of the for-loop
          // i.e. for (int i = 0; i < 48; i += 6)
          i += 6;
        }
  2. Build a 32-byte buffer containing data from a 0x800-byte static table, with the indexes into this table built from the buffer in step #1. Combine this 32-byte buffer with the 48-byte buffer from step #1.

      // dword_424E80 -- some static data
      // (unsigned __int8)v38[0] + 2) -- the original decompiler output has this wrong.
      // v33 should be a 52-byte buffer which consumes v38, so v38 is actually data set up in
      // the loop above.
      // (32 * v34 + 2) -- v34 should be some data from the above loop as well. This looks like
      // a binary shift optimization
      // repeat with different multipliers...
      //
      // This can be simplified as:
      // size_t index = ((v34 << 5) + 2)
      //     | ((v38[1] << 4) + 2)
      //     | ((v35 << 3) + 2)
      //     | ((v36 << 2) + 2)
      //     | ((v37 << 1) + 2)
      //     | v38[0]
      // v32[1] = *(int*)(((char*)&dword_424e80)[index])
      v32[1] = *(int *)((char *)&dword_424E80
             + (((unsigned __int8)v38[0] + 2) | (32 * v34 + 2) | (16 * (unsigned __int8)v38[1] + 2) | (8 * v35 + 2) | (4 * v36 + 2) | (2 * v37 + 2)));
      // repeat 7 times. each time the reference to dword_424e80 is shifted forward by 0x100.
      // note: if you do the math, the next line uses dword_424e80[64]. Shifting by 0x100 instead of
      // 64 is misleading because dword_424e80 is declared as an int array -- not a char array.
  3. Iterate over the output buffer in 4-byte steps (8 iterations). For each byte index of the output buffer, index into yet another static 32-byte buffer and use that value as the index into the data from step #2. XOR this value with the value at the current index of the output buffer.

      // Not really sure why this calculation works like this. It ends up just being `unk_425681`'s address
      // when it's used.
      v19 = (char *)(&unk_425681 - (_UNKNOWN *)a2);
      v20 = &unk_425680 - (_UNKNOWN *)a2;

      // v4 is a number that's decremented on every iteration -- possibly bytes remaining?
      if ( v4 <= 0 )
      {
        // Loop 8 times, handling 4 bytes per iteration
        v30 = 8;
        do
        {
          // Start XORing the output bytes with some of the data generated in step 2.
          //
          // Cheating here and doing the "draw the rest of the owl", but if you observe that
          // we use `unk_425680` (v20), `unk_425681` (v19), `unk_425682`, and byte_425683, the
          // decompiler generated suboptimal code. We can simplify to be relative to just
          // `unk_425680`
          //
          // *result ^= step2_bytes[unk_425680[output_index] - 1]
          *result ^= *((_BYTE *)v32 + (unsigned __int8)result[v20] + 3);

          // result[1] ^= step2_bytes[unk_425680[output_index + 1] - 1]
          result[1] ^= *((_BYTE *)v32 + (unsigned __int8)v19[(_DWORD)result] + 3);

          // result[2] ^= step2_bytes[unk_425680[output_index + 2] - 1]
          result[2] ^= *((_BYTE *)v32 + (unsigned __int8)result[&unk_425682 - (_UNKNOWN *)a2] + 3);

          // result[3] ^= step2_bytes[unk_425680[output_index + 3] - 1]
          result[3] ^= *((_BYTE *)v32 + (unsigned __int8)result[byte_425683 - a2] + 3);
          // Move our pointer to the output buffer forward by 4 bytes
          result += 4;
          --v30;
        }
        while ( v30 );
      }
      else
      {
        // Loop 8 times, handling 4 bytes per iteration
        v29 = 8;
        do
        {
          // grab the byte at 0x20, we're swapping this later
          v24 = result[32];

          // v22 = *result ^ step2_bytes[unk_425680[output_index] - 1]
          v22 = *result ^ *((_BYTE *)v32 + (unsigned __int8)result[v20] + 3);

          // I'm not sure why the output buffer pointer is incremented here, but
          // this really makes the code ugly
          result += 4;

          // Write the byte generated above to offset 0x1c
          result[28] = v22;
          // Write the byte at 0x20 to offset 0
          *(result - 4) = v24;

          // rinse, repeat with slightly different offsets each time...
          v25 = result[29];
          result[29] = *(result - 3) ^ *((_BYTE *)v32 + (unsigned __int8)result[(_DWORD)v19 - 4] + 3);
          *(result - 3) = v25;
          v26 = result[30];
          result[30] = *(result - 2) ^ *((_BYTE *)v32 + (unsigned __int8)result[&unk_425682 - (_UNKNOWN *)a2 - 4] + 3);
          *(result - 2) = v26;
          v27 = result[31];
          result[31] = *(result - 1) ^ *((_BYTE *)v32 + (unsigned __int8)result[byte_425683 - a2 - 4] + 3);
          *(result - 1) = v27;
          --v29;
        }
        while ( v29 );
      }

The inner loop in the else branch above is, I think, pretty nasty, so here it is reimplemented in Rust:

    for _ in 0..8 {
        // we swap the `first` index with the `second`
        for (first, second) in (0x1c..=0x1f).zip(0..4) {
            let original_byte_idx = first + output_offset + 4;

            let original_byte = outbuf[original_byte_idx];

            let constant = unk_425680[output_offset + second] as usize;

            let new_byte = outbuf[output_offset + second] ^ generated_bytes_from_step2[constant - 1];

            let new_idx = original_byte_idx;
            outbuf[new_idx] = new_byte;
            outbuf[output_offset + second] = original_byte;
        }

        output_offset += 4;
    }
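Putting the three steps together, one iteration of the outer loop can be sketched in C as below. This is an interpretation of the decompiler output, not the updater's actual source: the static tables are passed in as parameters with hypothetical names (`expand` for byte_424E50, `lookup` for the 0x800-byte table starting at dword_424E80, `permute` for unk_425680), the table values are assumed to be 1-based indexes, and each byte of a lookup entry is assumed to hold a single 0/1 value.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BITS 64
#define HALF_BITS 32
#define SUBKEY_BITS 48

/* One round over a 64-entry bit buffer.
 * `lookup` is treated as 8 tables of 0x100 bytes (64 entries of 4 bytes). */
void transform_round(uint8_t bits[BLOCK_BITS],
                     const uint8_t round_key[SUBKEY_BITS],
                     const uint8_t expand[SUBKEY_BITS],
                     const uint8_t *lookup,
                     const uint8_t permute[HALF_BITS],
                     int last_round)
{
    uint8_t mixed[SUBKEY_BITS];
    uint8_t looked_up[HALF_BITS];

    /* Step 1: key material XOR'd with bits picked from the second half. */
    for (size_t i = 0; i < SUBKEY_BITS; i++)
        mixed[i] = round_key[i] ^ bits[expand[i] + 31];

    /* Step 2: every group of 6 values selects one 4-byte table entry. */
    for (size_t box = 0; box < 8; box++) {
        const uint8_t *g = &mixed[box * 6];
        size_t index = (size_t)((g[0] << 5) | (g[5] << 4) | (g[1] << 3) |
                                (g[2] << 2) | (g[3] << 1) | g[4]);
        for (size_t k = 0; k < 4; k++)
            looked_up[box * 4 + k] = lookup[box * 0x100 + index * 4 + k];
    }

    /* Step 3: XOR into the first half; swap the halves unless this is
     * the final round (the `v4 <= 0` branch in the decompiler output). */
    for (size_t j = 0; j < HALF_BITS; j++) {
        uint8_t mask = looked_up[permute[j] - 1];
        if (last_round) {
            bits[j] ^= mask;
        } else {
            uint8_t second_half = bits[HALF_BITS + j];
            bits[HALF_BITS + j] = bits[j] ^ mask;
            bits[j] = second_half;
        }
    }
}
```

Indexing the lookup data as `box * 0x100 + index * 4` sidesteps the confusing `+ 2` terms in the decompiler output, and it matches the observation that each table reference moves forward by 0x100 bytes.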

# Key setup

We now need to figure out how our key is set up for use in the decrypt_data function above. My approach here is to set a breakpoint at the first instruction that uses the key data in decrypt_data, which happens to be xor bl, [ecx + esi + 4] at 0x4079d3. I know this is where we should break because in the decompiler output the key material is on the left-hand side of the XOR operation, which is the second operand of the xor instruction. As a reminder, the decompiler shows the XOR as:

v8 = *(_BYTE *)(i + 48 * v7 + v3 + 4) ^ a2[(unsigned __int8)byte_424E50[i] + 31];

The breakpoint is hit and the address we’re loading from is 0x19f5c4. Now we can lean on TTD to help us figure out where this data was last written. Set a 1-byte memory write breakpoint at this address using ba w1 0x19f5c4 and press the Go Back button. If you’ve never used TTD before, this works exactly the same way as Go, except it runs backwards in the program’s trace. In this case it will execute backwards until a breakpoint is hit, an interrupt occurs, or we reach the beginning of the program.

Our memory write breakpoint is triggered at 0x4078fb, in a function we haven’t seen before. The callstack shows that it’s called not too far from our decrypt_update_info routine!

  • set_key (we are here; the function is originally named sub_407850)
  • sub_4082c0
  • decrypt_update_info

What’s sub_4082c0?

[Image: timestamp inflation]

Not much to see here except the same function being called 4 times: initially with the timestamp string as the argument in position 0 and a 64-byte buffer, with the remaining calls using the return value of the previous call as their input. The function our debugger just broke into takes only 1 argument, which is the 64-byte buffer used across all of these function calls. So what’s going on in sub_407e80?

[Image: inflate timestamp]

Bitwise operations that look similar to the byte to bit inflation we saw above with firmware data. After renaming things and unrolling some loops, things look like this:

    // sub_407e80
    char *inflate_timestamp(void *this, char *timestamp_str, uint8_t *output, uint8_t *key) {
        for (size_t output_idx = 0; output_idx < 8; output_idx++) {
            uint8_t ts_byte = *timestamp_str;
            // only advance while we have not reached the string's NUL terminator
            if (ts_byte) {
                timestamp_str += 1;
            }

            // XOR the bits of this character (MSB first) into the output buffer
            for (int bit_idx = 0; bit_idx < 8; bit_idx++) {
                uint8_t bit_value = (ts_byte >> (7 - bit_idx)) & 1;
                output[(output_idx * 8) + bit_idx] ^= bit_value;
            }
        }

        set_key(this, key);
        decrypt_data(this, output, 1);

        return timestamp_str;
    }

    // sub_4082c0
    void set_key_to_timestamp(void *this, char *timestamp_str) {
        uint8_t key_buf[64];
        memset(key_buf, 0, sizeof(key_buf));

        char *str_ptr = inflate_timestamp(this, timestamp_str, key_buf, &static_key_1);
        str_ptr = inflate_timestamp(this, str_ptr, key_buf, &static_key_2);
        str_ptr = inflate_timestamp(this, str_ptr, key_buf, &static_key_3);
        inflate_timestamp(this, str_ptr, key_buf, &static_key_4);

        set_key(this, key_buf);
    }
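To see how the timestamp string is consumed across the four calls, the string-handling part of inflate_timestamp can be isolated into a runnable sketch. The helper name `xor_string_chunk` is hypothetical, the example string is made up, and the set_key/decrypt_data calls that happen between chunks in the real routine are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR up to 8 characters of `str` into a 64-entry bit buffer, MSB first.
 * Once the NUL terminator is reached the pointer stops advancing and the
 * remaining positions are XOR'd with zero bits. Returns the new position. */
const char *xor_string_chunk(const char *str, uint8_t bits[64])
{
    for (size_t byte_idx = 0; byte_idx < 8; byte_idx++) {
        uint8_t ch = (uint8_t)*str;
        if (ch != 0)
            str++;
        for (int bit = 0; bit < 8; bit++)
            bits[byte_idx * 8 + bit] ^= (ch >> (7 - bit)) & 1;
    }
    return str;
}
```

Because a `%Y%m%d%H%M%S` string is 14 characters long, the first call consumes 8 characters, the second consumes the remaining 6, and the third and fourth calls only XOR in zeros.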

The only mystery now is the set_key routine:

    int __thiscall set_key(char *this, const void *a2)
    {
      _DWORD *v2; // ebp
      char *v3; // edx
      char v4; // al
      char v5; // al
      char v6; // al
      char v7; // al
      int result; // eax
      char v10[56]; // [esp+Ch] [ebp-3Ch] BYREF

      qmemcpy(v10, a2, sizeof(v10));
      v2 = &unk_424DE0;
      v3 = this + 5;
      do
      {
        v4 = v10[0];
        qmemcpy(v10, &v10[1], 0x1Bu);
        v10[27] = v4;
        v5 = v10[28];
        qmemcpy(&v10[28], &v10[29], 0x1Bu);
        v10[55] = v5;
        if ( *v2 == 2 )
        {
          v6 = v10[0];
          qmemcpy(v10, &v10[1], 0x1Bu);
          v10[27] = v6;
          v7 = v10[28];
          qmemcpy(&v10[28], &v10[29], 0x1Bu);
          v10[55] = v7;
        }
        for ( result = 0; result < 48; result += 6 )
        {
          v3[result - 1] = v10[(unsigned __int8)byte_424E20[result] - 1];
          v3[result] = v10[(unsigned __int8)byte_424E21[result] - 1];
          v3[result + 1] = v10[(unsigned __int8)byte_424E22[result] - 1];
          v3[result + 2] = v10[(unsigned __int8)byte_424E23[result] - 1];
          v3[result + 3] = v10[(unsigned __int8)byte_424E24[result] - 1];
          v3[result + 4] = v10[(unsigned __int8)byte_424E25[result] - 1];
        }
        ++v2;
        v3 += 48;
      }
      while ( (int)v2 < (int)byte_424E20 );
      return result;
    }

This function is a bit more straightforward to reimplement:

    void set_key(void *this, uint8_t *key) {
        uint8_t scrambled_key[56];
        memcpy(scrambled_key, key, sizeof(scrambled_key));

        for (size_t i = 0; i < 16; i++) {
            size_t swap_rounds = 1;
            if (((uint32_t*)GLOBAL_KEY_ROUNDS_CONFIG)[i] == 2) {
                swap_rounds = 2;
            }

            // rotate each 28-byte half of the key left by one position per round;
            // memmove is used because the source and destination overlap
            for (size_t round = 0; round < swap_rounds; round++) {
                uint8_t temp = scrambled_key[0];
                memmove(&scrambled_key[0], &scrambled_key[1], 27);
                scrambled_key[27] = temp;

                temp = scrambled_key[28];
                memmove(&scrambled_key[28], &scrambled_key[29], 27);
                scrambled_key[55] = temp;
            }

            // pick 48 of the 56 key bytes for this iteration using the static table
            for (size_t swap_idx = 0; swap_idx < 48; swap_idx++) {
                size_t scrambled_key_idx = GLOBAL_KEY_SWAP_TABLE[swap_idx] - 1;

                size_t persistent_key_idx = swap_idx + (i * 48);
                this->key[persistent_key_idx] = scrambled_key[scrambled_key_idx];
            }
        }
    }
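Since the reimplementation above leans on `this->key` and two globals, here is a self-contained variant that can be compiled and experimented with on its own. The function name `expand_key` and the parameter names are hypothetical, and the two static tables (unk_424DE0 and byte_424E20 in the binary) are passed in as arguments because their contents are not reproduced in this article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_BITS 56
#define HALF_BITS 28
#define ROUNDS 16
#define SUBKEY_BITS 48

/* Rotate one 28-entry half left by a single position. */
static void rotate_half_left(uint8_t *half)
{
    uint8_t first = half[0];
    memmove(half, half + 1, HALF_BITS - 1);
    half[HALF_BITS - 1] = first;
}

/* Derive 16 subkeys of 48 entries each from a 56-entry key.
 * `rotations[r]` of 2 means rotate twice in round r, anything else once.
 * `select` holds 1-based indexes into the rotated key. */
void expand_key(const uint8_t key_bits[KEY_BITS],
                const uint32_t rotations[ROUNDS],
                const uint8_t select[SUBKEY_BITS],
                uint8_t subkeys[ROUNDS][SUBKEY_BITS])
{
    uint8_t work[KEY_BITS];
    memcpy(work, key_bits, KEY_BITS);

    for (size_t round = 0; round < ROUNDS; round++) {
        unsigned count = (rotations[round] == 2) ? 2 : 1;

        for (unsigned r = 0; r < count; r++) {
            rotate_half_left(work);
            rotate_half_left(work + HALF_BITS);
        }

        for (size_t i = 0; i < SUBKEY_BITS; i++)
            subkeys[round][i] = work[select[i] - 1];
    }
}
```

The `subkeys[round]` rows correspond to the 48-byte slices that decrypt_data indexes with `i + 48 * v7`.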

# Putting everything together

  1. Update data is read from resources.
  2. The first 4 bytes of the update data are a Unix timestamp.
  3. The timestamp is formatted as a string, has each byte inflated to its bit representation, and is decrypted using some static key material as the key. This is repeated 4 times, with the output of the previous run used as input for the next run.
  4. The resulting data from step 3 is used as the key for decrypting the data.
  5. The remainder of the firmware update image is inflated to its bit representation 8 bytes at a time, and the dynamic key plus 3 other unique static lookup tables are used to transform the inflated input data.
  6. The result from step 5 is deflated back into its byte representation.

My decryption utility that completely re-implements this magic in Rust can be found at https://github.com/landaire/porkchop.

# Loading firmware into IDA Pro

IDA thankfully supports disassembling the Hitachi/Renesas H8SX architecture. If we load our firmware into IDA and select the “Hitachi H8SX Advanced” processor type, use the default options for the “Disassembly Memory Organization” dialog, then finally select “H8S/2215R” in the “Select Device Name” dialog…:

[Image: rom initial load]

We don’t have shit. I’m not an embedded systems expert, but my friend suggested that the first few DWORDs look like they might belong to a vector table. If we right-click on address 0 and select “Double Word 0x142A”, we can then click on the new variable unk_142A to go to its location. Press C to define the data at this location as code, then press P to create a function at this address:

[Image: firmware analyzed]
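The same vector-table guess can be checked outside of IDA by dumping the first few DWORDs of the decrypted image. The sketch below assumes the table starts at offset 0 and holds 32-bit big-endian addresses (the H8 family is big-endian); the function names are hypothetical, and how many entries are actually valid vectors is not something this article establishes.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Copy up to `max_entries` candidate vectors from the start of the image.
 * Returns the number of complete entries that were read. */
size_t read_vector_table(const uint8_t *image, size_t image_len,
                         uint32_t *entries, size_t max_entries)
{
    size_t count = 0;

    while (count < max_entries && (count + 1) * 4 <= image_len) {
        entries[count] = read_be32(image + count * 4);
        count++;
    }
    return count;
}
```

If the first entry comes out as 0x142A, that matches the address IDA offered above, and the other entries are reasonable candidates to mark as code in the same way.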

Now we can reverse engineer our firmware 🙂


