Saturday, 25 June 2016

Removing columns in a csv file

Sometimes I get supplied a large comma separated (.CSV) file from a client with erroneous data columns (fields) in it. Most people would say, open it in M$ Excel, highlight the columns you don't want and delete. What if the file is too large to open in M$ Excel? Well, Linux offers the awk command as the fastest solution.

For example, let's assume you have a comma delimited CSV file in your Home directory called myfile.csv and you want to display the first and third column

awk -F, '{OFS=",";print $1,$3}' < ~/myfile.csv

To actually strip out the first and third column from this file, to another file, you pipe the result.

awk -F, '{OFS=",";print $1,$3}' < ~/myfile.csv > ~/mynewfile.csv


You can select as many columns you would like to keep by, adding its column number, pre-pended with a dollar sign.

awk -F, '{OFS=",";print $1,$3,$4,$6}' < ~/myfile.csv > ~/mynewfile.csv


No comments: