Sunday, July 26, 2009

Joining multiple PDF documents with Ghostscript.

This really works.

I downloaded all the chapters of Beginning Perl, as published here: http://www.perl.org/books/beginning-perl/

To do it painlessly, I first used curl to grab the HTML.
curl http://www.perl.org/books/beginning-perl/ -o bp.txt

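Then I used grep to grab the hyperlinks to PDF files and redirected the output to another file. The pattern stops at the first closing quote, so two links on the same line don't get swallowed into one match.
grep -o '<a href="[^"]*\.pdf' bp.txt > bp2.txt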

Then I used sed to clean up a little bit.
sed 's/^<a href="//' bp2.txt > bp3.txt
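
The grep and sed steps can also be folded into one small function, which makes it easy to try on a scrap of HTML before pointing it at the real page. This is only a sketch: the name extract_pdf_links is mine, and it assumes every link is written as <a href="...pdf"> with href as the first attribute.

```shell
#!/bin/sh
# Sketch: print the target of every <a href="...pdf"> link read from stdin.
# extract_pdf_links is a made-up name; it assumes href is the first attribute.
extract_pdf_links() {
    # grep -o prints only the matched text, one match per line;
    # sed then strips the leading <a href=" from each match.
    grep -o '<a href="[^"]*\.pdf' | sed 's/^<a href="//'
}
```

Something like `curl -s http://www.perl.org/books/beginning-perl/ | extract_pdf_links > bp3.txt` should then give the same list in one go.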

Then I used wget to download each PDF, feeding it the list of links with -i.
wget -i bp3.txt

So now I had a whole bunch of PDF files that together make up one book. A quick hunt on Google turned up this article:
http://www.linux.com/news/software/applications/8229-putting-together-pdf-files

And I was away laughing.

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=beginning-perl.pdf 3145_Intro.pdf 3145_Chap*.pdf 3145_App*.pdf 3145_Index.pdf
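
One thing to watch: the shell expands 3145_Chap*.pdf in alphabetical order, so if the chapter numbers aren't zero-padded, Chap10 lands before Chap2 in the joined book. Here is a sketch of one way round that, assuming GNU sort (for -V) and file names without spaces. The names natural_order and join_pdfs are mine, not part of Ghostscript.

```shell
#!/bin/sh
# Sketch: print the given file names in natural (version) order,
# so Chap2 comes before Chap10. Needs GNU sort for -V.
natural_order() {
    # "$@" is the already-expanded glob, one file name per argument
    printf '%s\n' "$@" | sort -V
}

# Hypothetical wrapper: join intro, chapters, appendices and index into $1.
# Relies on word splitting, so file names must not contain spaces.
join_pdfs() {
    out=$1
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$out" \
        3145_Intro.pdf \
        $(natural_order 3145_Chap*.pdf) \
        $(natural_order 3145_App*.pdf) \
        3145_Index.pdf
}
```

Calling `join_pdfs beginning-perl.pdf` in the download directory would then run the same gs command with the chapters in numeric order.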