Mozilla runs a crash-stats service, which accepts crash reports from clients (mobile/desktop browsers, B2G, etc) and provides a reporting interface.
Recently, a change landed on the client side to enable multiple minidumps to be attached to an incoming crash, and we want to add support to the server to accept these as soon as possible.
Our usual test procedure is to pull an existing crash from production and submit it as a new crash to our dev and staging instances. Unfortunately, we had no easy way to test this particular scenario, since the current crash collector only stores a single minidump, and discards any others. We really want real data in this case - we of course have unit tests and synthetic data, but the crash collector is a critical service so we want to get it right the first time when we push updates.
We decided that the most expedient way to get real data would be to capture from production using tcpdump, then replay this to the dev/staging servers.
There are tools readily available to do this. The major concern is that we're capturing a large amount of traffic, so we want to filter out as much as possible at capture time. Helpfully, tcpdump has a built-in mechanism for rolling and gzipping capture files (either every n seconds with -G, or when the file grows past a given size with -C).
First, run tcpdump on the target (production) server:
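tcpdump -i eth0 -C 100 -z gzip -w output.pcap dst port 81
-i eth0 selects the interface we're interested in, and the "dst port 81" filter limits the capture to traffic destined for port 81, i.e. incoming requests only. The filter expression goes last so that option parsing also works on systems that stop at the first non-option argument. The -C and -z options cause tcpdump to roll the output.pcap file roughly every 100 megabytes (-C counts in units of 1,000,000 bytes) and to run gzip on each file once it has been closed.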
This ends up producing a (potentially large) number of files:
output.pcap output.pcap1.gz output.pcap2.gz
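When you feel you've captured enough data, stop the tcpdump process. tcpslice reads plain pcap files, not gzipped ones, so uncompress the rolled files first and then use tcpslice to rebuild a single capture file:
gunzip output.pcap*.gz
tcpslice -w full.pcap output.pcap*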
Then use tcptrace to reassemble the packets into complete sessions (this is necessary since TCP packets may be received out of order). The -e option extracts the contents of each TCP connection into its own file, so you end up with one file per HTTP session:
tcptrace -e full.pcap
Now we have a set of files with names like aaju2aajv_contents.dat. If you take a look inside them, you will see they are full HTTP sessions. They can be replayed against a dev/stage server using netcat like so:
cat aaju2aajv_contents.dat | nc devserver 80
You may need to modify the files first, to change the Host header for example. This is easy to do on the fly with sed, without touching the files on disk:
cat aaju2aajv_contents.dat | sed 's/Host: prodserver/Host: devserver/' | nc devserver 80
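Replaying one file at a time gets tedious once tcptrace has produced hundreds of sessions. A minimal sketch of a loop over all of them follows. The function names rewrite_host and replay_all are made up for illustration, prodserver and devserver are the placeholder host names used above, and the host names are assumed not to contain characters that are special to sed (such as "/").

```shell
#!/bin/sh
# Sketch: replay every reassembled session against a dev server.
# rewrite_host and replay_all are hypothetical helper names.

# Rewrite the Host header of an HTTP stream read from stdin.
# $1 = production host name, $2 = dev host name.
rewrite_host() {
    # LC_ALL=C makes sed treat the input as plain bytes, which matters
    # because the POST bodies carry binary minidumps.
    LC_ALL=C sed "s/^Host: $1/Host: $2/"
}

# Replay each *_contents.dat file found in directory $1
# against host $2 on port $3.
replay_all() {
    for f in "$1"/*_contents.dat; do
        [ -e "$f" ] || continue      # the glob matched nothing
        echo "replaying $f" >&2
        rewrite_host prodserver "$2" < "$f" | nc "$2" "$3"
    done
}
```

With the files in the current directory, this would be invoked as: replay_all . devserver 80. Depending on the netcat variant, nc may wait for the server to close the connection, so adding a timeout option could be worthwhile.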
NOTE - this technique potentially uses a ton of disk space. I did this in many stages so I could backtrack in case I made any mistakes. If disk space (and overall time) are at a premium, for example if you are setting up a continuous pipeline, I'd investigate using named pipes instead of creating actual files for uncompressing and running tcpslice + tcptrace.
Also, if you are doing this in a one-off manner then tcpflow or wireshark (wireshark has a terminal version, tshark) are easier to work with. I wanted to do the capture on a locked-down server which had tcpdump available, and wanted to take advantage of the log rolling + compression feature.
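To illustrate the named pipe idea, here is a rough sketch that decompresses a rolled capture into a FIFO and hands the FIFO path to a consumer command, so the uncompressed data never lands on disk. with_fifo is a hypothetical helper name. Whether tcpslice and tcptrace are happy reading from a FIFO (they may want to seek) is an assumption that would need checking before building a pipeline on this.

```shell
#!/bin/sh
# Sketch: feed a gzipped capture to a command through a named pipe.
# with_fifo is a hypothetical helper.
# $1 = gzipped file; remaining arguments = consumer command, which
# receives the FIFO path as its last argument.
with_fifo() {
    gz=$1; shift
    dir=$(mktemp -d) || return 1
    fifo=$dir/stream
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }
    gunzip -c "$gz" > "$fifo" &      # blocks until the consumer opens the FIFO
    writer=$!
    "$@" "$fifo"                     # consumer reads the uncompressed bytes
    status=$?
    kill "$writer" 2>/dev/null       # in case the consumer never opened the FIFO
    wait "$writer" 2>/dev/null
    rm -rf "$dir"
    return $status
}
```

For example, with_fifo output.pcap1.gz tcpdump -nn -r would print the packets of one rolled file without first writing an uncompressed copy.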