How to Tell if TCP Payloads Are Identical

I was working on a problem today in which vendor tech support was suggesting that a firewall was subtly modifying TCP data payloads. I couldn't find any suggestion of this in the firewall logs, but seeing as how I've seen that vendor's firewall logs lie egregiously in the past, I wanted to verify it independently.

I took a packet capture from both hosts involved in the conversation and started thinking about how to see if the data sent by the server was the same as the data received by the client. I couldn't just compare the capture files themselves, because elements like timestamps, TTLs, and IP checksums would be different.

After a bunch of fiddling around, I came up with the idea of using tshark to extract the TCP payloads for each stream in the capture file and hash the results. If the hashes matched, the TCP payloads were being transferred unmodified. Here are the shell commands to do this:

tshark -r server.pcap -T fields -e tcp.stream | sort -u | sed 's/\r//' | xargs -i tshark -r server.pcap -q -z follow,tcp,raw,{} | md5sum
2cfe2dbb5f6220f29ff8aff82f7f68f5 *-

You then run exactly the same commands on the "client.pcap" file and compare the resulting hashes. Let's break this down a bit more:

tshark -r server.pcap -T fields -e tcp.stream

This invokes tshark to read the "server.pcap" file and output the TCP stream indexes of each packet. This is just a long series of integers:

0
0
1
2
1
etc.

The next command, sort -u, produces a logical set of the unique (hence the "-u") stream indexes. In other words, it removes duplicates from the previous list. Not all Unix-like operating systems have the "sort -u" option; if yours is missing it, you can use "| sort | uniq" instead.

Next, sed 's/\r//' removes the line break from the end of the resulting stream indexes. If you don't do this, you'll get an error from the next command.

The next one's a bit of a doozy: xargs -i takes each stream index (remember, these are just integers) and executes the tshark -r server.pcap -q -z follow,tcp,raw,{}command once for each stream index, substituting the input stream index for the {} characters.

The tshark -r server.pcap -q -z follow,tcp,raw,{} command itself reads the capture file a second time, running the familiar "Follow TCP Stream in Raw Format" command from Wireshark on the specified TCP stream index that replaces the {} characters. If you're rusty on Wireshark, "Follow TCP Stream" just dumps the TCP payload data in one of a variety of formats, such as "raw" or ASCII. If you've never used this option in Wireshark, make sure you try it today!

The final command, md5sum, runs a MD5 hash on the preceding input.

To summarize, we've done this: taken a file, extracted all the raw TCP data payloads from its packets (without headers), and hashed the data with MD5. If we do this on two files and the hashes are the same, we know they contain exactly the same TCP data (barring the infinitesimally small probability of a MD5 hash collision).

In my case, both capture files produced the same hash, proving that the firewall was (for once) playing nice.

Published: November 02 2013

  • category:
  • tags: