
I set about packaging several large databases as basic Debian packages to make life easier at my job. However, I'm now running into problems: most of the databases install happily, but the three largest fail.

find . -name "*.deb" -exec du -h '{}' \; | sort -h
# These install fine
4.0K    ./hh-suite-data_1.0_all.deb
422M    ./hh-suite-data-env70/package/hh-suite-data-env70_1.0.0_amd64.deb
660M    ./hh-suite-data-env90/package/hh-suite-data-env90_1.0.0_amd64.deb
795M    ./hh-suite-data-env/package/hh-suite-data-env_1.0.0_amd64.deb
1.6G    ./hh-suite-data-scop70/package/hh-suite-data-scop70_1.0.0_amd64.deb
2.6G    ./hh-suite-data-nr70/package/hh-suite-data-nr70_1.0.0_amd64.deb
2.8G    ./hh-suite-data-pfamA/package/hh-suite-data-pfama_1.0.0_amd64.deb
3.2G    ./hh-suite-data-nr90/package/hh-suite-data-nr90_1.0.0_amd64.deb
# These fail to install
4.3G    ./hh-suite-data-nr20/package/hh-suite-data-nr20_1.0.0_amd64.deb
6.2G    ./hh-suite-data-pdb70/package/hh-suite-data-pdb70_1.0.0_amd64.deb
7.4G    ./hh-suite-data-nr/package/hh-suite-data-nr_1.0.0_amd64.deb

The failures look like this:

sudo dpkg -i package/hh-suite-data-nr20_1.0.0_amd64.deb 
[sudo] password for esr: 
(Reading database ... 276172 files and directories currently installed.)
Unpacking hh-suite-data-nr20 (from .../hh-suite-data-nr20_1.0.0_amd64.deb) ...
dpkg: error processing package/hh-suite-data-nr20_1.0.0_amd64.deb (--install):
 corrupted filesystem tarfile - corrupted package archive
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
Errors were encountered while processing:
 package/hh-suite-data-nr20_1.0.0_amd64.deb

I'm fairly convinced it's the size of the archives; the breaking point is somewhere between 3.2G and 4.3G.

Does anyone have any experience with very large packages and their failure modes? Any idea why this is happening? I have no reason to believe the tar archives are corrupted; I've built the package many times and still see this error on installation.
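One way to sanity-check the archive outside of dpkg is to pull the data member out with ar and list it with tar directly. This is a sketch; the member may be named data.tar.gz or data.tar.xz depending on how the package was built, and the package filename here is just the one from my listing above:

```shell
# A .deb is an ar archive containing debian-binary, control.tar.gz,
# and data.tar.gz. List the members, then stream the data member
# through tar's listing mode; if tar finishes cleanly, the payload
# itself is intact independent of dpkg.
ar t hh-suite-data-nr20_1.0.0_amd64.deb
ar p hh-suite-data-nr20_1.0.0_amd64.deb data.tar.gz | tar -tz > /dev/null \
    && echo "data.tar.gz lists cleanly"
```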

In the meantime, I'm rewriting my packages to wget the files from a mirror instead of shipping the databases themselves, which sidesteps the tar problem.

Running with dpkg -D10:

# This file unpacks fine
D000010: tarobject ti->name='./usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.index' mode=100644 owner=0.0 type=48(-) ti->linkname='' namenode='/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.index' flags=2 instead='<none>'
D000010: ensure_pathname_nonexisting `/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.index.dpkg-tmp'
D000010: ensure_pathname_nonexisting `/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.index.dpkg-new'
# This is a 16G file and fails IMMEDIATELY.
D000010: tarobject ti->name='./usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db' mode=100644 owner=0.0 type=48(-) ti->linkname='' namenode='/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db' flags=2 instead='<none>'
D000010: ensure_pathname_nonexisting `/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.dpkg-tmp'
D000010: ensure_pathname_nonexisting `/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.dpkg-new'
dpkg: error processing hh-suite-data-pdb70_1.0.0_amd64.deb (--install):
 corrupted filesystem tarfile - corrupted package archive

Running with dpkg -D100:

This portion shows two entries, the good one and the bad one, plus some output after the failure. What worries me is the "tarobject file open size=0" bit.

D000100: setupvnamevbs main=`/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.index' tmp=`/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.index.dpkg-tmp' new=`/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.index.dpkg-new'
D000100: tarobject already exists
D000100: tarobject file open size=900749
D000100: tarobject nondirectory, `link' backup
D000100: tarobject done and installation deferred
D000100: setupvnamevbs main=`/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db' tmp=`/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.dpkg-tmp' new=`/usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.dpkg-new'
D000100: tarobject already exists
D000100: tarobject file open size=0
D000100: tarobject nondirectory, `link' backup
D000100: tarobject done and installation deferred
dpkg: error processing hh-suite-data-pdb70_1.0.0_amd64.deb (--install):
 corrupted filesystem tarfile - corrupted package archive
D000100: setupvnamevbs main=`//usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db' tmp=`//usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.dpkg-tmp' new=`//usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.dpkg-new'
D000100: cu_installnew restoring atomic
D000100: secure_remove '//usr/share/hh-suite-data/pdb70/pdb70_19Oct13_a3m_db.dpkg-new' unlink OK
EricR
    I don't really know anything about deb packaging but the error sounds like you are hitting the limits of some buffer or other. Try running dpkg with the -D (debug) option to see if that gives anything useful. Something like dpkg -D10 -i package.deb. – terdon Nov 12 '13 at 19:10
    The breaking point is 4.3GB, that's just over the 32-bit maximum size of 4294967296 bytes. So I'm almost certain you're hitting an addressing issue. Both GNU tar and the kernel in Ubuntu should be able to accommodate 64-bit addressing. What are you using to build the packages, and is the packaging system 64-bit as well? – bahamat Nov 12 '13 at 19:15
  • Unrelated to your issue at hand, if these are in fact data files you should specify the arch as all instead of amd64 since they are not architecture dependent. – bahamat Nov 12 '13 at 19:16
  • Very interesting...it's definitely the bigger files. The debug log (pasted above) shows that it is unpacking some of the smaller files but when it hits a big one it dies. – EricR Nov 12 '13 at 19:16
    @bahamat, Building packages on a x64 system, but I would bet you're right. Somewhere in this is probably a 32 bit issue. And yes, you're right, they should be marked all, I had been reusing a set of scripts to generate the package framework which specified amd64. – EricR Nov 12 '13 at 19:18
  • Additionally, running tar manually on the generated package_version.tar.gz file produced during building (dpkg-buildpackages) extracts fine, without any errors. – EricR Nov 12 '13 at 19:27
  • @EricR: It's a very intriguing problem. I don't know enough about the internals of dpkg to recommend a good solution. You could try breaking up the files? I sent Raphael Hertzog a link to this on Twitter, so maybe he'll comment. – bahamat Nov 12 '13 at 19:27
  • Maybe if you play around with different debug values, you might find the exact command that is failing. Have a look at man dpkg and read the D option. For example: 2000 Insane amounts of drivel. – terdon Nov 12 '13 at 19:35
  • Your issue would appear to be old: http://ubuntuforums.org/showthread.php?t=986549 – slm Nov 13 '13 at 01:49
  • Are you on ubuntu or debian? The tags say ubuntu, the title says debian, which is it? – slm Nov 13 '13 at 02:46
  • @slm, ubuntu. However, the packages are still referred to as "debian packages" pretty much everywhere. Additionally, thank you for the ubuntuforums link, that appears to be the same issue I'm experiencing. I guess that means my resolution is split + postinst cat. – EricR Nov 13 '13 at 17:36
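The "split + postinst cat" resolution the comments converge on can be sketched as follows. The file names and chunk size are illustrative, and the demo runs on a tiny file so the mechanics are visible:

```shell
# Sketch of the "split + postinst cat" workaround (names illustrative).
# At build time, break any file near the ~4G danger zone into chunks:
#   split --bytes=2G nr20_db nr20_db.part.
# Then, in the package's postinst, reassemble and clean up:
#   cat nr20_db.part.* > nr20_db && rm nr20_db.part.*
# Demonstrated here on a small file so it can actually run:
printf 'example database payload' > nr20_db
split --bytes=8 nr20_db nr20_db.part.      # build-time step
rm nr20_db                                 # pretend only chunks were shipped
cat nr20_db.part.* > nr20_db               # postinst step
printf 'example database payload' | cmp - nr20_db && echo "reassembled OK"
rm nr20_db nr20_db.part.*
```

This works because split's default alphabetical suffixes (.aa, .ab, ...) sort in the same order the chunks were cut, so the shell glob reassembles them correctly.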

1 Answer


You might want to peruse the output from this command:

$ apt-config dump | less

Specifically these options:

$ apt-config dump|grep bzip
APT::Compressor::bzip2 "";
APT::Compressor::bzip2::Name "bzip2";
APT::Compressor::bzip2::Extension ".bz2";
APT::Compressor::bzip2::Binary "bzip2";
APT::Compressor::bzip2::Cost "3";
APT::Compressor::bzip2::CompressArg "";
APT::Compressor::bzip2::CompressArg:: "-9";
APT::Compressor::bzip2::UncompressArg "";
APT::Compressor::bzip2::UncompressArg:: "-d";
Dir::Bin::bzip2 "/bin/bzip2";

There are a lot of other compression-related settings in this output. I'd make sure that the tools these point to can handle files larger than the 4+ GB threshold you seem to be hitting, and that they're all 64-bit varieties.
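As a rough sketch of that check (assuming the binaries live where the Dir::Bin entries say they do), you could walk the configured helper binaries and inspect each one:

```shell
# List the helper binaries apt is configured to call (Dir::Bin::* entries)
# and show what kind of executable each one is. The parsing assumes the
# quoted-value format shown in the apt-config dump above.
apt-config dump | awk -F'"' '/^Dir::Bin::/ { print $2 }' | while read -r bin; do
    [ -x "$bin" ] && file "$bin"
done
```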

32-bit vs. 64-bit

I'm almost positive this is the root cause. See my answer to the question titled 32-bit, 64-bit CPU op-mode on Linux for examples of how to determine your system's CPU bit width and your OS's compiled architecture.

OS

$ getconf LONG_BIT
64

CPU

$ hwinfo --cpu | grep Arch | tail -1
  Arch: X86-64
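To rule out a stray 32-bit build somewhere in the toolchain (the tool list here is just my guess at what's involved), you can also inspect the packaging binaries themselves:

```shell
# Print the ELF class of each packaging tool found on PATH;
# a healthy amd64 system should report "ELF 64-bit" for all of them.
for tool in dpkg dpkg-deb tar gzip; do
    bin=$(command -v "$tool") && file "$bin"
done
```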
slm
  • Didn't know about apt-config, nifty. My output for apt-config piped to a grep for bzip looks identical to yours. Additionally, it is a 64 bit OS, without a doubt. Both getconf and hwinfo report correctly as 64 bit. – EricR Nov 13 '13 at 17:38
  • @EricR - I would then make sure that the tools you're using are all 64-bit variety. – slm Nov 13 '13 at 17:58
  • dpkg -x file.deb test works fine, extracting everything perfectly. Additionally, archiving/unarchiving the original .tar.gz that contained the databases I'm trying to distribute has no issues. – EricR Nov 13 '13 at 18:14
  • Also, this is a bog standard Ubuntu 12.04 LTS x64 installation on a commodity server. This is the first time I'm seeing anything that looks like a bitness issue. I've tar'd and untar'd much larger files on this server without a problem. I'm convinced it's in the way dpkg handles archives. – EricR Nov 13 '13 at 18:15
  • @EricR - yeah that's kind of my point, in looking at how apt works wrt the Compression settings. I asked on the Debian IRC last night and no one knew of any size limitations wrt dpkg. When they saw your Q and that it was on Ubuntu they said they couldn't help, that it was an Ubuntu issue. – slm Nov 13 '13 at 18:27
  • ah, thanks for following up with them! I wouldn't have thought that dpkg was something that ubuntu modified for their use...perhaps I'll boot up a pure debian VM later tonight and see if the issue applies to debian as well. – EricR Nov 13 '13 at 18:38
  • @EricR - OK, good idea. I'll try and follow-up with the Ubuntu folks in an IRC room to see if anyone knows anything about this, but I'm as perplexed as you. – slm Nov 13 '13 at 18:51