Wednesday 22 August 2012

Convert a Tab separated to Comma separated file

If you want to replace some text within a file, that is too large to open in a GUI application then, this is when you fall back on the command line tools.

Sometimes I get a Tab separated (.tsv/.txt) file and I need to convert it to a Comma separated (.csv) file before I can import it into a database table.

On a Linux server:
Code:   
tr -s '\t' <input | tr '\t' ',' >output
   
Substitute 'input' for the original filename and 'output' for the desired output filename.


Tuesday 14 August 2012

MySQL distinct count of data across multiple tables

You want to do a unique count of data across multiple tables.

Below is an example of getting a count of unique email addresses across two tables. Actually, I am looking for Females, aged 25 to 45 in the East Midlands, UK. I don't want duplicates because, I would not want to email the same person twice!

Code:   
SELECT COUNT(DISTINCT email) As East_Midlands
FROM (
SELECT email
FROM table_a
WHERE (gender = 'F' OR LOWER(title IN ('ms', 'miss', 'mrs')))
AND str_to_date(date, '%d/%m/%Y') BETWEEN '1967-01-01' AND '1987-08-13'
AND (SUBSTRING(postcode,1,2) IN ('CB','CO','DE','DN','IP','LE','LN','NG','NR','PE') OR postcode LIKE 'S[1-9]%')
UNION
SELECT email
FROM table_b
WHERE (gender = 'F' OR LOWER(title IN ('ms', 'miss', 'mrs')))
AND str_to_date(DOB, '%Y%m%d') BETWEEN '1967-01-01' AND '1987-08-13'
AND (SUBSTRING(postcode,1,2) IN ('CB','CO','DE','DN','IP','LE','LN','NG','NR','PE') OR postcode LIKE 'S[1-9]%')
) As SubQueryAlias;
   
I hope that makes sense. So, you have multiple selects, joined by the 'UNION', wrapped by the outer query which counts the distinct email addresses of the tables it embodies (as many as you want).

The 'As SubQueryAlias' at the end is needed to pull it together or, SQL will throw an error.