Here is a tiny app that uses deep sampling to find tumors in any disk or directory. It walks the directory tree twice: once to measure its total size, and a second time to print the paths of the files that contain 20 evenly spaced ("random") bytes under the directory.
    // n    = bytes seen so far in this pass
    // n1   = byte offset of the next sample point
    // step = distance in bytes between sample points
    void walk(string sDir, int iPass, int64& n, int64& n1, int64 step){
        foreach(string sSubDir in sDir){
            walk(sDir + "/" + sSubDir, iPass, n, n1, step);
        }
        foreach(string sFile in sDir){
            string sPath = sDir + "/" + sFile;
            int64 len = File.Size(sPath);
            if (iPass == 2){
                // print this file once for every sample point that lands inside it
                while(n1 < n+len){
                    print sPath;
                    n1 += step;
                }
            }
            n += len;
        }
    }
    void dscan(){
        int64 n = 0, n1 = 0, step = 0;
        // pass 1, measure
        walk(".", 1, n, n1, step);
        print n;
        // pass 2, print
        step = n/20; n1 = step/2; n = 0;
        walk(".", 2, n, n1, step);
        print n;
    }
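For readers who want something runnable, here is a minimal Python sketch of the same two-pass idea. It is an adaptation, not the original program: the names `iter_files` and `deep_sample` are made up for illustration, it sorts directory entries so both passes visit files in the same order, it skips symlinks and unreadable files, and it forces the step to be at least 1 byte so a nearly empty tree cannot stall the loop.

```python
import os


def iter_files(root):
    """Yield (path, size) for each regular file under root, in a stable order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk honors in-place changes, so both passes match
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # assumption: count only real files, not links
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            yield path, size


def deep_sample(root, samples=20):
    """Return (total_bytes, picks): one path per evenly spaced byte offset."""
    # Pass 1: measure the tree.
    total = sum(size for _, size in iter_files(root))
    picks = []
    if total == 0:
        return total, picks

    step = max(1, total // samples)  # distance between sample offsets
    next_offset = step // 2          # start half a step in, as in dscan()
    seen = 0                         # bytes covered so far in pass 2

    # Pass 2: report each file once per sample offset that falls inside it.
    for path, size in iter_files(root):
        while next_offset < seen + size:
            picks.append(path)
            next_offset += step
        seen += size
    return total, picks


if __name__ == "__main__":
    total, picks = deep_sample(".")
    print(f"{total:,}")
    for p in picks:
        print(p)
```

A file that holds several sample offsets is printed several times, which is the point: big files and big subtrees show up in proportion to the bytes they occupy.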
The output looks like this for my Program Files directory:
7,908,634,694
.\ArcSoft\PhotoStudio 2000\Samples\3.jpg
.\Common Files\Java\Update\Base Images\j2re1.4.2-b28\core1.zip
.\Common Files\Wise Installation Wizard\WISDED53B0BB67C4244AE6AD6FD3C28D1EF_7_0_2_7.MSI
.\Insightful\splus62\java\jre\lib\jaws.jar
.\Intel\Compiler\Fortran\9.1\em64t\bin\tselect.exe
.\Intel\Download\IntelFortranProCompiler91\Compiler\Itanium\Data1.cab
.\Intel\MKL\8.0.1\em64t\bin\mkl_lapack32.dll
.\Java\jre1.6.0\bin\client\classes.jsa
.\Microsoft SQL Server\90\Setup Bootstrap\sqlsval.dll
.\Microsoft Visual Studio\DF98\DOC\TAPI.CHM
.\Microsoft Visual Studio .NET 2003\CompactFrameworkSDK\v1.0.5000\Windows CE\sqlce20sql2ksp1.exe
.\Microsoft Visual Studio .NET 2003\SDK\v1.1\Tool Developers Guide\docs\Partition II Metadata.doc
.\Microsoft Visual Studio .NET 2003\Visual Studio .NET Enterprise Architect 2003 - English\Logs\VSMsiLog0A34.txt
.\Microsoft Visual Studio 8\Microsoft Visual Studio 2005 Professional Edition - ENU\Logs\VSMsiLog1A9E.txt
.\Microsoft Visual Studio 8\SmartDevices\SDK\CompactFramework\2.0\v2.0\WindowsCE\wce500\mipsiv\NETCFv2.wce5.mipsiv.cab
.\Microsoft Visual Studio 8\VC\ce\atlmfc\lib\armv4i\UafxcW.lib
.\Microsoft Visual Studio 8\VC\ce\Dll\mipsii\mfc80ud.pdb
.\Movie Maker\MUI\0409\moviemk.chm
.\TheCompany\TheProduct\docs\TheProduct User's Guide.pdf
.\VNI\CTT6.0\help\StatV1.pdf
7,908,634,694
It tells me that the directory is 7.9 GB, of which:
- ~15% goes to the Intel Fortran compiler
- ~15% goes to VS .NET 2003
- ~20% goes to VS 8
It is simple enough to ask if any of these can be unloaded.
It also tells me about file types that are scattered across the file system but, taken together, represent an opportunity to save space:
- ~15% goes to .cab and .MSI files
- ~10% goes to logging text files
It also shows plenty of other things in there that I could probably do without, like "SmartDevices" and "ce" support (~15%).
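The percentages can be read off by counting lines: with 20 samples, each printed path stands for roughly 5% of the bytes. Below is a small Python sketch of that tally, using the 20 paths from the listing above. The helper names (`top_level`, `extension`, `shares`) are hypothetical, and `pathlib.PureWindowsPath` is used only so the backslash paths parse the same way on any OS.

```python
from collections import Counter
from pathlib import PureWindowsPath

# The 20 sampled paths, copied from the output listing.
SAMPLES = [
    r".\ArcSoft\PhotoStudio 2000\Samples\3.jpg",
    r".\Common Files\Java\Update\Base Images\j2re1.4.2-b28\core1.zip",
    r".\Common Files\Wise Installation Wizard\WISDED53B0BB67C4244AE6AD6FD3C28D1EF_7_0_2_7.MSI",
    r".\Insightful\splus62\java\jre\lib\jaws.jar",
    r".\Intel\Compiler\Fortran\9.1\em64t\bin\tselect.exe",
    r".\Intel\Download\IntelFortranProCompiler91\Compiler\Itanium\Data1.cab",
    r".\Intel\MKL\8.0.1\em64t\bin\mkl_lapack32.dll",
    r".\Java\jre1.6.0\bin\client\classes.jsa",
    r".\Microsoft SQL Server\90\Setup Bootstrap\sqlsval.dll",
    r".\Microsoft Visual Studio\DF98\DOC\TAPI.CHM",
    r".\Microsoft Visual Studio .NET 2003\CompactFrameworkSDK\v1.0.5000\Windows CE\sqlce20sql2ksp1.exe",
    r".\Microsoft Visual Studio .NET 2003\SDK\v1.1\Tool Developers Guide\docs\Partition II Metadata.doc",
    r".\Microsoft Visual Studio .NET 2003\Visual Studio .NET Enterprise Architect 2003 - English\Logs\VSMsiLog0A34.txt",
    r".\Microsoft Visual Studio 8\Microsoft Visual Studio 2005 Professional Edition - ENU\Logs\VSMsiLog1A9E.txt",
    r".\Microsoft Visual Studio 8\SmartDevices\SDK\CompactFramework\2.0\v2.0\WindowsCE\wce500\mipsiv\NETCFv2.wce5.mipsiv.cab",
    r".\Microsoft Visual Studio 8\VC\ce\atlmfc\lib\armv4i\UafxcW.lib",
    r".\Microsoft Visual Studio 8\VC\ce\Dll\mipsii\mfc80ud.pdb",
    r".\Movie Maker\MUI\0409\moviemk.chm",
    r".\TheCompany\TheProduct\docs\TheProduct User's Guide.pdf",
    r".\VNI\CTT6.0\help\StatV1.pdf",
]


def top_level(path):
    """First directory component below the scanned root."""
    parts = [p for p in PureWindowsPath(path).parts if p != "."]
    return parts[0]


def extension(path):
    """Lower-cased file extension, e.g. '.cab'."""
    return PureWindowsPath(path).suffix.lower()


def shares(samples, key):
    """Percent of samples per group; each sample is worth 100/len(samples)."""
    counts = Counter(key(p) for p in samples)
    return {k: 100 * c / len(samples) for k, c in counts.most_common()}


if __name__ == "__main__":
    for name, pct in shares(SAMPLES, top_level).items():
        print(f"{pct:5.1f}%  {name}")
    for ext, pct in shares(SAMPLES, extension).items():
        print(f"{pct:5.1f}%  {ext}")
```

With only 20 samples each estimate is coarse (steps of 5%), so this is good for spotting large consumers, not for exact accounting.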
It does take time linear in the number of files, but it doesn't have to be done often.
Examples of things it has found:
- backup copies of DLLs in many saved code repositories, that don't really need to be saved
- a backup copy of someone's hard drive on the server, under an obscure directory
- voluminous temporary internet files
- ancient doc and help files long past being needed
sudo apt install ncdu on ubuntu gets it easily. It's great – Orion Edwards Jul 19 '17 at 22:30
ncdu -x to only count files and directories on the same filesystem as the directory being scanned. – Luke Cousins Jul 21 '17 at 11:51
sudo ncdu -rx / should give a clean read on biggest dirs/files ONLY on root area drive. (-r = read-only, -x = stay on same filesystem (meaning: do not traverse other filesystem mounts)) – B. Shea Sep 21 '17 at 15:52
wget -qO- https://dev.yorhel.nl/download/ncdu-linux-x86_64-1.16.tar.gz | tar xvz && ncdu -x (official builds) – jan-glx Sep 27 '21 at 07:53
sudo mkdir /ncdu && sudo mount -t tmpfs -o size=500m tmpfs /ncdu && wget -qO- https://dev.yorhel.nl/download/ncdu-linux-x86_64-1.16.tar.gz | tar xvz --directory /ncdu && /ncdu/ncdu -x – jan-glx Sep 27 '21 at 07:58
wget -qO- https://dev.yorhel.nl/download/ncdu-linux-x86_64-1.16.tar.gz | tar xvz --directory /dev/shm && /dev/shm/ncdu -x ... urls might change, newer version might be available see here: https://dev.yorhel.nl/ncdu – jan-glx Sep 27 '21 at 08:05
brew install ncdu – Jgonzalez731 Mar 16 '23 at 21:21