==========================
Python Tar Patch (Proof)
==========================

:Author: Andrew Ellerton
:Date: 20 March 2005

I've had problems using the Python tarfile module with symlinks.

I started by comparing hexdumps of a tarfile produced by the tarfile.py
module with one produced by GNU tar, but other differences between the
files (like magic numbers) confused the issue.

So I wrote a little hand-rolled tarfile reader and found that the data
for symlink entries isn't quite right. In particular, the size field is
set to the length of the link target name.

The size field in a tar header is supposed to be the length in bytes of
the data area in the .tar file. A symlink entry has no data area (the
target name is stored in the header itself), so its size field must be
zero. However, with the existing tarfile.py, if you include a link like:

    linkto -> therealfile

the size will be entered as 11 == strlen("therealfile").

If this happens, then on extraction tar tries to unpack the "data"
(because size > 0), but what it actually starts reading is the next tar
entry.

To reproduce, you can use my example files and do:

    make

This produces several archives, with both the Python tarfile module and
GNU tar, and then diffs them. The last test is the most important, so
you can run just that one:

    make test4

(The other tests may be useful, but test4 was the main one I worked with.)

The other way is to build a tarfile with a symlink followed by a file,
then do:

    tar tvf thefile.tar

GNU tar will report something like this::

    tar tvf archives/test1.my.tar
    lrwxrwxrwx andrew/dev 3 2005-03-20 16:00:35 barlink -> bar
    tar: Skipping to next header
    tar: Error exit delayed from previous errors

If you apply the patch tarfile.patch to tarfile.py, the error will go away.

The solution is to set size=0 if the file type == SYMLINK.

CAUTION: I *ASSUME* this is OK on other platforms, but I have only
tried it on Linux.

Andrew Ellerton
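
The size-field check described above can be sketched as a small header reader. This is not the hand-rolled reader mentioned in the text, only a minimal illustration written for Python 3, assuming the standard ustar header layout (name at bytes 0-99, size as octal text at bytes 124-135, type flag at byte 156, link name at bytes 157-256). The function names ``read_headers``, ``build_sample`` and ``bad_symlinks`` are hypothetical, and the sample archive is built with an explicit ``TarInfo`` whose size is left at its default of 0.

```python
import io
import tarfile

BLOCK = 512  # tar archives are organised in 512-byte blocks


def read_headers(data):
    """Yield (name, typeflag, size, linkname) for each header in raw tar bytes.

    Assumes plain ustar headers; checksums and extended headers are ignored.
    """
    offset = 0
    while offset + BLOCK <= len(data):
        header = data[offset:offset + BLOCK]
        if header == b"\0" * BLOCK:  # an all-zero block marks the end
            break
        name = header[0:100].rstrip(b"\0").decode("utf-8", "replace")
        size_field = header[124:136].strip(b"\0 ")
        size = int(size_field.decode("ascii"), 8) if size_field else 0
        typeflag = header[156:157]
        linkname = header[157:257].rstrip(b"\0").decode("utf-8", "replace")
        yield name, typeflag, size, linkname
        # Skip the header plus the data area, rounded up to whole blocks.
        # A non-zero size on a symlink makes this skip the next real header.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK


def build_sample():
    """Build an in-memory archive: a symlink followed by a regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        link = tarfile.TarInfo("linkto")
        link.type = tarfile.SYMTYPE
        link.linkname = "therealfile"  # size stays 0: no data area
        tar.addfile(link)
        payload = b"hello\n"
        real = tarfile.TarInfo("therealfile")
        real.size = len(payload)
        tar.addfile(real, io.BytesIO(payload))
    return buf.getvalue()


def bad_symlinks(data):
    """Return names of symlink entries (type flag '2') with a non-zero size."""
    return [name for name, typeflag, size, _ in read_headers(data)
            if typeflag == b"2" and size != 0]


if __name__ == "__main__":
    for entry in read_headers(build_sample()):
        print(entry)
```

To check an archive on disk, the raw bytes can be passed in directly, for example ``bad_symlinks(open("archives/test1.my.tar", "rb").read())``. An empty list means every symlink header carries a size of zero.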