Wednesday 7 August 2013

The easiest way to split and merge pdf files in linux

I scanned some documents to .pdf format but, I wanted to email them all in one .pdf file.

The easiest way to split, merge or edit pdf files in linux is to use pdftk.

sudo apt-get install pdftk
or
sudo aptitude install pdftk
or
sudo yum install pdftk
depending on what flavour of linux you have.

pdftk is quite powerful but, here are a few examples.

Split a multi-page pdf into separate files:

pdftk largepdfile.pdf burst
results in a number files like 0001.pdf, 0002.pdf, etc.

Merge files into one PDF file:

pdftk *.pdf cat output largepdfile.pdf
or
pdftk file1.pdf file2.pdf file3.pdf cat output largepdfile.pdf

Saturday 12 January 2013

Converting date formats in a flat-file using Regular expressions

I use two free text editors for preparing flat-files (.csv, etc.) prior to importing into MySQL.

I use Notepad++ in M$ Windows and Kate in Linux.  Both can do Regular expression Search & Replace upon text.

I find regular expressions very useful for preparing date fields prior to import into a database table.  So, converting the UK date format
dd/mm/yyyy (12/01/2013) into a MySQL DATE, DATETIME or, TIMESTAMP datatype (2013-01-12) is easy.

Search:
([0-9]+)/+([0-9]+)/+([0-9]+)


Replace (converts UK date format):
\3-\2-\1

Replace (converts US date format):
\3-\1-\2


So the above UK example searches for dates in the UK date format dd/mm/yyyy and converts them to yyyy-mm-dd.

Before:
Click image to enlarge
After:
Click image to enlarge
Datatypes:
Unlike M$ Excel, these text-editors can cope with large files and do this relatively quickly.

Nice tutorials on Regular expressions at http://www.regular-expressions.info/