Wednesday, 14 September 2011

MySQL count of Geographical postcode selections, grouped

Here is an example for a query to give you a count of members/customers/recipients within a postcode2 area, listed (group by) the postcode2 area.

What do I mean? I wanted to know how many club members were in different England postcode areas thus:

Quote:   
POSTCODE2 COUNT
CA 34
CB 12
...
HP 22
..
MK 62


use this

Code:   
-- England count grouped by postcode2
SELECT SUBSTRING(postcode,1,2) AS POSTCODE2, COUNT(*) AS COUNT
FROM members
WHERE (SUBSTRING(postcode,1,2) IN
('BR','CR','DA','EC','EN','HA','IG','KT','NW','RM','SE','SM','SW','TW','UB','WC','WD','BN','CM','CT','GU','HP','LU','ME','MK',
'PO','RG','RH','SG','SL','SO','SS','TN','CB','CO','DE','DN','IP','LE','LN','NG','NR','PE','CV','DY','HR','NN','ST','TF','WR','WS',
'WV','DH','DL','HG','HU','LS','NE','SR','TS','WF','YO','BB','BD','BL','CA','CH','CW','FY','HD','HX','LA','OL','PR','SK','WA',
'WN','BA','BH','BS','DT','EX','GL','AL','PL','SN','SP','TA','TQ','TR','OX')
OR postcode REGEXP ('^E[1-9]')
OR postcode REGEXP ('^N[1-9]')
OR postcode REGEXP ('^W[1-9]')
OR postcode REGEXP ('^S[1-9]')
OR postcode REGEXP ('^B[1-9]')
OR postcode REGEXP ('^L[1-9]')
OR postcode REGEXP ('^M[1-9]'))
GROUP BY SUBSTRING(postcode,1,2);
   

MYSQL dedupe data

If you want to select data from one table, ensuring that it does not (de-duplicate) match another table, I find the following useful.

Code:   
SELECT * FROM data_2
WHERE data_2.email
NOT IN (
SELECT data_1.email
FROM data_1
WHERE data_1.email=data_2.email
);
   
If you have a flat-file, .csv for example and you want to cleanse of duplicate records, you can upload it to a temporary table, setting the field you want to ignore duplicates from, as a UNIQUE data field.

Code:
CREATE TABLE 'example' (
'id' BIGINT(20) NOT NULL AUTO_INCREMENT,
'email' VARCHAR(255) NOT NULL,
'forename' VARCHAR(30) NULL DEFAULT NULL,
'surname' VARCHAR(30) NULL DEFAULT NULL,
PRIMARY KEY ('id'),
INDEX 'Index 2' ('email'),
UNIQUE INDEX 'Index 3' ('email')
)
COLLATE='utf8_general_ci'
ENGINE='InnoDB'
AUTO_INCREMENT=1;


The above example will check and remove any duplicate email addresses it finds during import of the file, thus de-duping the data.


MYSQL select random records

If you need to select <n> number of sample records at random, you could try the following. Though, it can be resource intensive.

Code:   
SELECT *
FROM members
WHERE (email != '' AND email NOT NULL)
ORDER BY RAND() LIMIT 0,100;
   
This example will select 100 records at random.



MYSQL count unique data

Here I had to select unique (none duplicated) email addresses from a table. You would do this if say, you were mailing club members but there are multiple family members with the same email address but, you only want to send an email to them once. Hey, no one likes their Inbox filling up with the same email!

Code:   
SELECT COUNT(DISTINCT email)
FROM members
WHERE (email != "" OR email IS NOT NULL);
   



MYSQL count email domains

In a previous job where clients supplied their own data, I had to be careful how of their email campaign went to a particular ESP/FQDN in one go.

This count gives an idea of how much of the email data is for what domains.

Code:   
SELECT SUBSTRING_INDEX(email, '@', -1) as Domain, count(email) as Total
FROM members
WHERE (email != '' AND email NOT NULL)
GROUP BY Domain
ORDER BY Total DESC
LIMIT 100;
   


MySQL query for geographical UK postcode selections

First, the basics.
Code:   
SELECT * FROM contacts WHERE postcode = 'MK3 6ZZ';
   
will match any records with a single postcode in Milton Keynes.

Code:   
SELECT * FROM contacts WHERE postcode LIKE 'MK3 %';
   
will match records with the postcode area MK3 in Milton Keynes. Care needs to be taken when using LIKE. Notice the space after the '3'.

Code:   
SELECT * FROM contacts WHERE postcode LIKE 'MK3%';
   
is not the same as the above. Notice that there is no space after the '3'. The would return records, not only in the MK3 postcode area but also, MK31 to MK39 (if they existed).

Code:   
SELECT * FROM contacts WHERE postcode REGEXP ('^E[1-9]');
   
will match any postcodes in the E1 to E9 (London) postcode areas.

As I will show, you can mix these examples to search for more than a single postcode area in one query. Useful if you want the query to return a larger geographical area.

If you want to target members/customers/recipients within a geographical location then here are a few snippets I have gleaned over the years.

Code:   
-- London
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('BR','CR','DA','EC','EN','HA','IG','KT','NW','RM','SE','SM','SW','TW','UB','WC','WD')
OR postcode REGEXP ('^E[1-9]')
OR postcode REGEXP ('^N[1-9]')
OR postcode REGEXP ('^W[1-9]'));

-- Scotland
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('KW','IV','AB','PH','DD','PA','FK','KY','ML','KA','EH','DG','TD')
OR postcode REGEXP ('^G[1-9]'));

-- England
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN
('BR','CR','DA','EC','EN','HA','IG','KT','NW','RM','SE','SM','SW','TW','UB','WC','WD','BN','CM','CT','GU','HP','LU','ME','MK',
'PO','RG','RH','SG','SL','SO','SS','TN','CB','CO','DE','DN','IP','LE','LN','NG','NR','PE','CV','DY','HR','NN','ST','TF','WR','WS',
'WV','DH','DL','HG','HU','LS','NE','SR','TS','WF','YO','BB','BD','BL','CA','CH','CW','FY','HD','HX','LA','OL','PR','SK','WA',
'WN','BA','BH','BS','DT','EX','GL','AL','PL','SN','SP','TA','TQ','TR','OX')
OR postcode REGEXP ('^E[1-9]')
OR postcode REGEXP ('^N[1-9]')
OR postcode REGEXP ('^W[1-9]')
OR postcode REGEXP ('^S[1-9]')
OR postcode REGEXP ('^B[1-9]')
OR postcode REGEXP ('^L[1-9]')
OR postcode REGEXP ('^M[1-9]'));

-- Wales
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('LL','SY','SA','LD','SA','CF','NP'));

-- Northern Ireland
SELECT COUNT(*)
FROM members
WHERE postcode LIKE 'BT%';

-- South East
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('BN','CM','CT','GU','HP','LU','ME','MK','PO','RG','RH','SG','SL','SO','SS','TN'));

-- East Midlands
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('CB','CO','DE','DN','IP','LE','LN','NG','NR','PE')
OR postcode REGEXP ('^S[1-9]'));

-- West Midlands
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('CV','DY','HR','NN','ST','TF','WR','WS','WV')
OR postcode REGEXP ('^B[1-9]'));

-- North East
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('DH','DL','HG','HU','LS','NE','SR','TS','WF','YO'));

-- North West
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('BB','BD','BL','CA','CH','CW','FY','HD','HX','LA','OL','PR','SK','WA','WN')
OR postcode REGEXP ('^L[1-9]%'
OR postcode REGEXP ('^M[1-9]%'));

-- South West
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,2) IN ('BA','BH','BS','DT','EX','GL','AL','PL','SN','SP','TA','TQ','TR','OX'));

I do use LIKE for small queries but REGEXP (Regular Expression) is better to avoid any ambiguity. LIKE 'MK1%' would be a mistake because it would select not only MK1 but, MK11, MK12, etc.

Say that you want to target UK members/customers/recipients within a radius of a postcode, how do you work it out? Well, there are websites that can help. Do a search for something like, 'Find UK Postcodes Inside a Radius' and you will find a few. Type in a postcode for the center of the search radio and then the mile/kilometers radius and then you can feed the results into an SQL query.

Example:
Code:   
SELECT COUNT(*)
FROM members
WHERE (SUBSTRING(postcode,1,4) IN ('MK1 ','MK17','MK2 ','MK3 ','MK4 ','MK5 ');