How to split each page of a djvu file?

Question

I have a DjVu file in which each DjVu page contains two book pages. I would like to split it so that there is one book page per DjVu page. For example: [image]

Can this be done with some software, preferably command-line utilities? Thanks and regards!

PS: This is a file that can be used for testing.

score 4 · Answer 1 · answered Dec 06 '11 at 21:28

The following is untested, but in principle it should work (I will test it if I have more time).

You could convert the DjVu file to, for example, JPEG files like this:

#!/bin/sh
# djvu -> jpgs converter

i=1

# number of pages (392); -le so that the last page is converted too
while [ $i -le 392 ]
do
    ddjvu -page=$i -format=pnm 1.djvu $i.pnm
    pnmtojpeg $i.pnm > $i.jpg
    rm -f $i.pnm
    echo "page $i done"
    i=$((i + 1))
done

(from http://caree.livejournal.com/74639.html)

Then you could use scantailor to split the pages and produce new output (consisting of TIFF files).

As a third step, run djvubind on that folder to get your desired DjVu file.
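The first step hard-codes the page count (392). A minimal sketch of the same conversion that asks djvused for the page count instead, assuming ddjvu, djvused and pnmtojpeg are installed; djvu_to_jpgs is a hypothetical helper name, not part of any package:

```shell
#!/bin/sh
# Sketch: convert every page of a DjVu file to JPEG without hard-coding
# the number of pages. djvu_to_jpgs is a hypothetical helper name.
djvu_to_jpgs() {
    input=$1
    # "djvused FILE -e n" prints the number of pages in the document.
    pages=$(djvused "$input" -e n) || return 1
    i=1
    while [ "$i" -le "$pages" ]; do
        # Decode one page to PNM, convert it to JPEG, drop the PNM.
        ddjvu -page="$i" -format=pnm "$input" "$i.pnm" || return 1
        pnmtojpeg "$i.pnm" > "$i.jpg" || return 1
        rm -f "$i.pnm"
        i=$((i + 1))
    done
}
```

It would be called as djvu_to_jpgs 1.djvu and writes 1.jpg, 2.jpg, and so on into the current directory.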

Thanks! I wonder if it is possible to operate on djvu files directly, without converting to other formats. I know that the job can be done by converting the djvu files to pdf files, operating on the pdf files, and finally converting back to djvu files. — Tim, Dec 06 '11 at 21:38

score 3 · Answer 2 · answered Feb 04 '14 at 02:05

Here is my complete working script. It needs the djvulibre-bin package for ddjvu (decoding to TIFF), cjb2 (encoding PBM to DjVu, because I don't know how to convert TIFF to DjVu directly) and djvm (inserting and deleting pages in a DjVu file), plus the imagemagick package for convert (splitting the page and converting to the mystical PBM file format). Both packages are available through apt-get.

Scheme:

file.djvu --ddjvu--> bifold tiff --convert--> single page tiff --convert--> pbm --cjb2--> single page djvu --djvm--> out.djvu

Example: djvusplit 3 10 file.djvu splits pages 3 through 10.

Complete code:

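#!/bin/bash
# Usage: djvusplit PAGEFROM PAGETO FILE.DJVU
# FILE.DJVU must be in the current directory.
if [ $# -ne 3 ]; then
    echo "Usage: djvusplit PAGEFROM PAGETO FILE.DJVU" >&2
    exit 1
fi

# make temp folder and work on a copy
mkdir ./tmp
cp "$3" ./tmp/
cd tmp || exit 1

# Descending, so inserted pages do not shift the pages still to be split
for i in $(seq -w "$2" -1 "$1")
do
    ddjvu -format=tiff -page=$i "$3" t$i.tiff   # decode the 2-up page
    convert -crop 2x1@ t$i.tiff t$i-%d.tiff     # split into two halves
    convert t$i-0.tiff t$i-0.pbm
    convert t$i-1.tiff t$i-1.pbm
    cjb2 t$i-0.pbm t$i-0.djvu
    cjb2 t$i-1.pbm t$i-1.djvu
    djvm -i t$i-0.djvu t$i-1.djvu 2             # join the halves into a 2-page file
    djvm -d "$3" $i                             # drop the original 2-up page
    djvm -i "$3" t$i-0.djvu $i                  # insert the two halves in its place
    rm -f t$i.tiff t$i-*                        # remove this page's temp files only
done

# total clean
mv "$3" ../out.djvu
cd ..
rm -r ./tmp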

Unfortunately:

It is slow, because it generates many tiff/pbm/djvu files.
The split pages lose any OCR text.
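To see whether a given file still has a hidden text layer before or after splitting, something like the following could be used. This is only a sketch: has_hidden_text is a hypothetical helper name, and it assumes that djvused's print-pure-txt command dumps the text of all pages when no page has been selected.

```shell
#!/bin/sh
# Sketch: report whether a DjVu file carries any hidden (OCR) text.
# has_hidden_text is a hypothetical helper name.
has_hidden_text() {
    # print-pure-txt prints the hidden text layer; strip all whitespace
    # (including form feeds between pages) and check if anything is left.
    text=$(djvused "$1" -e print-pure-txt | tr -d '[:space:]')
    [ -n "$text" ]
}
```

For example: has_hidden_text out.djvu || echo "out.djvu has no OCR text".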

I use this script together with some others to maintain my electronic library. They are located here on GitHub.

score 2 · Answer 3 · edited Apr 13 '17 at 12:36

There aren't so many tools that can operate directly on DjVu files, compared with other more common formats such as PDF or JPEG. With image manipulation programs, there's the added hurdle that most of these operate on a single image at a time, but the DjVu file contains multiple pages.

One possibility is to go via PDF, using ddjvu from DjVuLibre, a PDF un2up filter, and pdf2djvu:

ddjvu -format=pdf 2up.djvu 2up.pdf
un2up <2up.pdf | pdf2djvu /dev/stdin >1up.djvu

You might be able to cobble up an un2up for djvu inspired by my pdf version using python-djvulibre. I haven't checked how hard that API is to get into.

score 1 · Answer 4 · answered Apr 27 '22 at 01:42

Method 1. Use a djvused(1) script to save each page to a standalone file

You probably want this method.

In bash, you would do something like this:

input_file=inputfile.djvu
output_dir=outdir
mkdir "$output_dir"
page_count=$( djvused "$input_file" -e n )
for page_num in $( seq 1 "$page_count" ); do
    printf 'select %d; save-page-with %s/page-%04d.djvu\n' "$page_num" "$output_dir" "$page_num"
done | djvused "$input_file"

That would generate a djvused script like this:

select 1; save-page-with outdir/page-0001.djvu
select 2; save-page-with outdir/page-0002.djvu
select 3; save-page-with outdir/page-0003.djvu
select 4; save-page-with outdir/page-0004.djvu
select 5; save-page-with outdir/page-0005.djvu
select 6; save-page-with outdir/page-0006.djvu

which would produce a directory of files, like this:

$ ls -l outdir/
total 400
-rw-r--r-- 1 dwon dwon 59145 Apr 26 17:30 page-0001.djvu
-rw-r--r-- 1 dwon dwon 69848 Apr 26 17:30 page-0002.djvu
-rw-r--r-- 1 dwon dwon 60037 Apr 26 17:30 page-0003.djvu
-rw-r--r-- 1 dwon dwon 68312 Apr 26 17:30 page-0004.djvu
-rw-r--r-- 1 dwon dwon 71849 Apr 26 17:30 page-0005.djvu
-rw-r--r-- 1 dwon dwon 68368 Apr 26 17:30 page-0006.djvu
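If a single document is wanted again afterwards (for example once the per-page files have been processed individually), the files can be bundled back together with djvm -c. A minimal sketch, assuming the page-NNNN.djvu naming used above; join_pages is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: bundle per-page DjVu files back into one document.
# join_pages is a hypothetical helper name.
join_pages() {
    output_file=$1
    pages_dir=$2
    # The zero-padded names (page-0001.djvu, ...) make the glob expand
    # in page order.
    set -- "$pages_dir"/page-*.djvu
    [ -e "$1" ] || { echo "no page files in $pages_dir" >&2; return 1; }
    # "djvm -c OUT IN..." creates a bundled multi-page document.
    djvm -c "$output_file" "$@"
}
```

Usage would be join_pages rejoined.djvu outdir.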

Method 2. Convert the document to "indirect" storage format

The DjVu format allows documents to be stored as multiple files, with references between them. This is what is known as "indirect" format (as opposed to the usual "bundled" format, which stores everything together). It's possible to convert a document between bundled and indirect format.

The benefit of this method is that any complex document structure (e.g. shared dictionaries for JB2 compression, page repetition, metadata covering the whole file, etc) will be preserved. This makes it handy for adding information & metadata (e.g. OCR text) since it preserves the overall content structure.

The drawbacks are that you don't get any control over filenames, and some software might not be able to handle it. (e.g. Okular 21.12.3 chokes on this sample file when converted to indirect format, even though DjView4 works fine.)

The conversion can be done using the djvmcvt command, or by using the save-indirect subcommand of djvused.

Using djvmcvt(1):

mkdir output_directory
djvmcvt -i doc_in.djvu output_directory index_filename.djvu

Using djvused(1), subcommand save-indirect:

mkdir output_directory
djvused doc_in.djvu -e "save-indirect output_directory/index_filename.djvu"

These commands produce output like this:

$ mkdir outdir
$ djvmcvt -i sample2.djvu outdir index.djvu
$ ls -l outdir/
total 56
-rw-r--r-- 1 dwon dwon 22173 Apr 26 18:32 dict0084.iff
-rw-r--r-- 1 dwon dwon    71 Apr 26 18:32 index.djvu
-rw-r--r-- 1 dwon dwon 22653 Apr 26 18:32 p0001.djvu
-rw-r--r-- 1 dwon dwon  3688 Apr 26 18:32 p0002.djvu
$ djvudump outdir/index.djvu
  FORM:DJVM [59]
    DIRM [47]         Document directory (indirect, 3 files 2 pages)
      dict0084.iff -> dict0084.iff
      p0001.djvu -> p0001.djvu
      p0002.djvu -> p0002.djvu
$ djvudump outdir/dict0084.iff
  FORM:DJVI [22161]
    Djbz [22149]      JB2 shared dictionary
$ djvudump outdir/p0001.djvu
  FORM:DJVU [22641]
    INFO [10]         DjVu 3300x2550, v21, 300 dpi, gamma=2.2
    INCL [12]         Indirection chunk --> {dict0084.iff}
    Sjbz [3054]       JB2 bilevel data
    FGbz [22]         JB2 colors data, v0, 2 colors
    BG44 [3774]       IW4 data #1, 72 slices, v1.2 (color), 1100x850
    BG44 [3400]       IW4 data #2, 11 slices
    BG44 [5270]       IW4 data #3, 10 slices
    BG44 [6635]       IW4 data #4, 6 slices
    TXTz [387]        Hidden text (text, etc.)
$ djvudump outdir/p0002.djvu
  FORM:DJVU [3676]
    INFO [10]         DjVu 3300x2550, v21, 300 dpi, gamma=2.2
    INCL [12]         Indirection chunk --> {dict0084.iff}
    Sjbz [2792]       JB2 bilevel data
    FGbz [404]        JB2 colors data, v0, 43 colors
    BG44 [87]         IW4 data #1, 97 slices, v1.2 (b&w), 275x213
    TXTz [318]        Hidden text (text, etc.)
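Going the other way, from indirect back to bundled, can be sketched with the -b mode of djvmcvt, which takes the index file and the name of the bundled output. bundle_indirect is a hypothetical wrapper name used only for illustration:

```shell
#!/bin/sh
# Sketch: convert an indirect document back into one bundled file.
# bundle_indirect is a hypothetical helper name.
bundle_indirect() {
    index_file=$1
    output_file=$2
    [ -f "$index_file" ] || { echo "missing index: $index_file" >&2; return 1; }
    # "djvmcvt -b IN OUT" writes the bundled multi-page document OUT.
    djvmcvt -b "$index_file" "$output_file"
}
```

For the example above this would be bundle_indirect outdir/index.djvu rebundled.djvu.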

I posted a script that implements the first method: https://gist.github.com/dlitz/273b228cf47109a5805cff7e5bacd8b0 — dlitz, Apr 27 '22 at 02:05

user66759 · Answer 5 · 2014-05-03T05:23:57.570

Taking bot2417's script as a base, here is my own:

#!/bin/bash
echo "################################################"
echo "Usage: djvusplit2 LASTPAGE FILE.DJVU"
echo "################################################"

if [ ! -f "$2" ]; then
    echo "file $2 does not exist!"
    exit 1
fi

start=1
mkdir ./tmp

for i in $(seq $start +1 "$1")
do
    # 2-up page i becomes output pages j and k
    j=$(($i*2-1))
    k=$(($i*2))

    # extract the page as a bitonal PBM image
    ddjvu -format=pbm -page=$i "$2" ./tmp/$i.pbm

    # split the page into two TIFF halves
    convert -crop 2x1@ ./tmp/$i.pbm ./tmp/$i-%d.tiff

    # delete extracted image
    #rm ./tmp/$i.pbm

    # convert tiff to djvu pages
    cjb2 ./tmp/$i-0.tiff ./tmp/$j.djvu
    cjb2 ./tmp/$i-1.tiff ./tmp/$k.djvu

    # delete split tiffs
    #rm ./tmp/$i-0.tiff
    #rm ./tmp/$i-1.tiff

    # create djvu file and add pages
    if [ $i -eq 1 ]; then
        djvm -c "(new) $2" ./tmp/$j.djvu
        echo "create new $2 OK"
        djvm -i "(new) $2" ./tmp/$k.djvu
        echo "insert page $k OK"
    else
        djvm -i "(new) $2" ./tmp/$j.djvu
        echo "insert page $j OK"
        djvm -i "(new) $2" ./tmp/$k.djvu
        echo "insert page $k OK"
    fi

    # delete djvu pages
    #rm ./tmp/$j.djvu
    #rm ./tmp/$k.djvu

done

echo
echo "file (new) $2 created!!!"
echo

# cleanup temp dir
rm -r ./tmp
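The page numbering in the loop above maps 2-up page i to output pages 2i-1 (left half) and 2i (right half). A small sketch of just that arithmetic; split_page_numbers is a hypothetical helper name, not part of the script:

```shell
#!/bin/sh
# Sketch: output page numbers produced by splitting 2-up page i.
# split_page_numbers is a hypothetical helper name.
split_page_numbers() {
    i=$1
    # Left half becomes page 2i-1, right half becomes page 2i.
    echo "$((i * 2 - 1)) $((i * 2))"
}
```

So split_page_numbers 5 prints "9 10": the fifth 2-up page ends up as pages 9 and 10 of the new file.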

Do you have one or two lines of explanation of what has changed or is better? For those of us too lazy to analyse the script. — Anthon, May 03 '14 at 05:40

score 0 · Answer 6 · answered Dec 06 '11 at 19:27

http://en.wikisource.org/wiki/Help:DjVu_files#Splitting_DjVu_files

Hope you find your answer here.

Nikhil Mulley

Thanks, but the link is to extract some pages from a djvu file into a new djvu file. I would like to split each djvu page into two djvu pages in a djvu file. – Tim Dec 06 '11 at 19:58
Yeah, the idea is to extract all the pages from the djvu file as individual files, then merge each pair of files into a single djvu file, repeating this for every 2 files. Finally, merge all the double-paged djvu files into a single djvu file. – Nikhil Mulley Dec 06 '11 at 20:12
I don't quite understand your comment. How do you solve the question in my post? – Tim Dec 06 '11 at 20:31
I need to get djvu tools installed on my system and will try to give you a work around. – Nikhil Mulley Dec 06 '11 at 20:39