Code:
SELECT * FROM data_2
WHERE data_2.email
NOT IN (
SELECT data_1.email
FROM data_1
WHERE data_1.email=data_2.email
);
WHERE data_2.email
NOT IN (
SELECT data_1.email
FROM data_1
WHERE data_1.email=data_2.email
);
If you have a flat-file, .csv for example and you want to cleanse of duplicate records, you can upload it to a temporary table, setting the field you want to ignore duplicates from, as a UNIQUE data field.
Code:
CREATE TABLE 'example' (
'id' BIGINT(20) NOT NULL AUTO_INCREMENT,
'email' VARCHAR(255) NOT NULL,
'forename' VARCHAR(30) NULL DEFAULT NULL,
'surname' VARCHAR(30) NULL DEFAULT NULL,
PRIMARY KEY ('id'),
INDEX 'Index 2' ('email'),
UNIQUE INDEX 'Index 3' ('email')
)
COLLATE='utf8_general_ci'
ENGINE='InnoDB'
AUTO_INCREMENT=1;
'id' BIGINT(20) NOT NULL AUTO_INCREMENT,
'email' VARCHAR(255) NOT NULL,
'forename' VARCHAR(30) NULL DEFAULT NULL,
'surname' VARCHAR(30) NULL DEFAULT NULL,
PRIMARY KEY ('id'),
INDEX 'Index 2' ('email'),
UNIQUE INDEX 'Index 3' ('email')
)
COLLATE='utf8_general_ci'
ENGINE='InnoDB'
AUTO_INCREMENT=1;
The above example will check and remove any duplicate email addresses it finds during import of the file, thus de-duping the data.
No comments:
Post a Comment