Here's how a general algorithm might operate if you were absolutely determined to make this approach work:
- download the first kilobyte of the MP4 file
- make sure the moov atom is up front (i.e., it comes before the mdat atom); read the moov atom's length from its header and make another request to fetch the rest of the atom
- dig through the moov atom to find the video trak atom (the one whose mdia/hdlr handler type is vide); dig into that atom, down through mdia, minf and stbl, to find the following atoms: stsd, stss, stsc, stco/co64, and stsz
- the stsd atom will give you the initialization information required to configure the H.264 video decoder (the SPS and PPS stored in the avcC box of the avc1 sample entry)
- the stss atom gives you a list of all the sync samples (keyframes); these can be decoded independently and would be ideal for your thumbnailing purposes (if a trak has no stss atom, every sample is a sync sample)
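- when you know which frames are keyframes, courtesy of the stss atom, you can cross-reference them with the stco or co64 atom (a trak will have one or the other) to find their absolute file locations, and with the stsz atom, which tells you exactly how many bytes are in each frame. Note that stco/co64 list chunk offsets, not sample offsets: a chunk can hold several samples, so use the stsc (sample-to-chunk) atom to work out which chunk a sample lives in, then add the stsz sizes of the samples that precede it in that chunk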
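Here is a minimal sketch of the atom-walking steps above, using only the Python standard library. It assumes the first bytes of the file and then the whole moov atom have already been downloaded into memory; the function names (`iter_boxes`, `find_moov`, `find_video_stbl`, `parse_sample_tables`) are hypothetical, and it does not parse stsd/avcC.

```python
import struct
from typing import Dict, Iterator, Optional, Tuple

Span = Tuple[int, int]  # (payload_start, box_end) within a bytes buffer


def iter_boxes(data: bytes, start: int = 0, end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int, int]]:
    """Yield (type, box_start, payload_start, box_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type field
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing space
            size = end - pos
        if size < header:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type, pos, pos + header, pos + size
        pos += size


def find_moov(head: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, size) of moov if it precedes mdat in the first bytes of a file.

    The moov box may extend past len(head); its size is all we need for the next request.
    """
    for box_type, start, _, box_end in iter_boxes(head):
        if box_type == b"moov":
            return start, box_end - start
        if box_type == b"mdat":
            return None  # media data comes first: moov is at the end of the file
    return None  # not found in these bytes (inconclusive)


def child(data: bytes, span: Span, wanted: bytes) -> Optional[Span]:
    """Return the span of the first child box of the given type."""
    for box_type, _, payload, box_end in iter_boxes(data, *span):
        if box_type == wanted:
            return payload, box_end
    return None


def find_video_stbl(moov: bytes) -> Optional[Span]:
    """Locate stbl inside the first trak whose hdlr handler type is 'vide'."""
    top = child(moov, (0, len(moov)), b"moov")
    if top is None:
        return None
    for box_type, _, payload, box_end in iter_boxes(moov, *top):
        if box_type != b"trak":
            continue
        mdia = child(moov, (payload, box_end), b"mdia")
        hdlr = mdia and child(moov, mdia, b"hdlr")
        # hdlr payload: version/flags (4), pre_defined (4), handler_type (4)
        if not hdlr or moov[hdlr[0] + 8:hdlr[0] + 12] != b"vide":
            continue
        minf = child(moov, mdia, b"minf")
        stbl = minf and child(moov, minf, b"stbl")
        if stbl:
            return stbl
    return None


def parse_sample_tables(moov: bytes, stbl: Span) -> Dict[str, list]:
    """Parse stss, stsz, stsc and stco/co64; each starts with a 4-byte version/flags field."""
    tables: Dict[str, list] = {}
    for box_type, _, p, _ in iter_boxes(moov, *stbl):
        if box_type == b"stss":  # 1-based sync sample numbers
            n = struct.unpack_from(">I", moov, p + 4)[0]
            tables["stss"] = list(struct.unpack_from(f">{n}I", moov, p + 8))
        elif box_type == b"stsz":  # a non-zero sample_size means every sample has that size
            sample_size, n = struct.unpack_from(">II", moov, p + 4)
            tables["stsz"] = ([sample_size] * n if sample_size
                              else list(struct.unpack_from(f">{n}I", moov, p + 12)))
        elif box_type == b"stsc":  # (first_chunk, samples_per_chunk, sample_description_index)
            n = struct.unpack_from(">I", moov, p + 4)[0]
            flat = struct.unpack_from(f">{3 * n}I", moov, p + 8)
            tables["stsc"] = [flat[i:i + 3] for i in range(0, len(flat), 3)]
        elif box_type in (b"stco", b"co64"):  # chunk offsets, 32- or 64-bit
            n = struct.unpack_from(">I", moov, p + 4)[0]
            fmt = "I" if box_type == b"stco" else "Q"
            tables["chunk_offsets"] = list(struct.unpack_from(f">{n}{fmt}", moov, p + 8))
    # No stss at all means every sample is a sync sample.
    if "stss" not in tables and "stsz" in tables:
        tables["stss"] = list(range(1, len(tables["stsz"]) + 1))
    return tables
```

In practice you would call `find_moov` on the first kilobyte, issue a second request for `size` bytes starting at `offset`, and pass that buffer to `find_video_stbl` and `parse_sample_tables`.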
With all of this information combined, you should be able to download and decode (and thus resize and re-compress for thumbnailing) just the keyframes of an MP4 video.
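Combining the tables is the fiddly part, because stco/co64 give chunk offsets rather than sample offsets. A minimal sketch, assuming the tables have already been parsed into plain Python lists (stsc as `(first_chunk, samples_per_chunk, sample_description_index)` tuples); the function names are hypothetical:

```python
from typing import List, Sequence, Tuple


def sample_offsets(stsz: Sequence[int], chunk_offsets: Sequence[int],
                   stsc: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Absolute file offset of every sample; index 0 is sample 1."""
    offsets: List[int] = []
    sample = 0
    for i, (first_chunk, per_chunk, _) in enumerate(stsc):
        # Each stsc run covers chunks first_chunk .. (next run's first_chunk - 1);
        # the last run extends to the final chunk. Chunk numbers are 1-based.
        last_chunk = stsc[i + 1][0] - 1 if i + 1 < len(stsc) else len(chunk_offsets)
        for chunk in range(first_chunk, last_chunk + 1):
            pos = chunk_offsets[chunk - 1]
            for _ in range(per_chunk):
                if sample == len(stsz):
                    return offsets  # stop once the sample table is exhausted
                offsets.append(pos)
                pos += stsz[sample]  # samples are stored back to back within a chunk
                sample += 1
    return offsets


def keyframe_ranges(stss: Sequence[int], stsz: Sequence[int], chunk_offsets: Sequence[int],
                    stsc: Sequence[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """(offset, size) of each sync sample; stss sample numbers are 1-based."""
    offsets = sample_offsets(stsz, chunk_offsets, stsc)
    return [(offsets[n - 1], stsz[n - 1]) for n in stss]


def range_header(offset: int, size: int) -> str:
    """Value for an HTTP Range header; the end byte is inclusive."""
    return f"bytes={offset}-{offset + size - 1}"


# Example: 5 samples in 2 chunks (3 + 2); the keyframes are samples 1 and 4.
ranges = keyframe_ranges([1, 4], [100, 20, 30, 110, 40], [5000, 9000], [(1, 3, 1), (2, 2, 1)])
print(ranges)  # [(5000, 100), (9000, 110)]
print([range_header(o, s) for o, s in ranges])  # ['bytes=5000-5099', 'bytes=9000-9109']
```

Each range can then be fetched with its own request, for example via `urllib.request.Request(url, headers={"Range": range_header(offset, size)})`. The bytes you get back are H.264 samples in MP4's length-prefixed NAL unit form, so they still need the avcC configuration from stsd before a decoder will accept them.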