Catalogue
Formatting Specific CSV Columns with awk — Treating Numbers as Strings

Formatting Specific CSV Columns with awk — Treating Numbers as Strings

🌐 日本語で読む

Overview

When you want to extract some customer data and report on it,
if a CSV file contains a mobile phone number such as “090”, opening the CSV file in Excel
may turn it into “90”.

For such cases, here is a summary of how to treat the values in a specific column as strings.

How to treat numbers as strings in Excel

Given

1
"090"

formatting it like

1
="090"

makes it be treated as a string.

Example)

1
"デミスハサビス",="09099999999","DeepMind"

Let’s format it

  • Example) Suppose you have a tmp.csv like the following.
1
2
"デミスハサビス","09099999999","DeepMind"
"いとうせいこう","08088888888","エムパイヤ・スネーク・ビルディング"
  • Add = to the left side of only the second column.
    ※ Add an “=” to the left side of the value enclosed in “ (double quotes).
    ※ For clarity, I deliberately specified the fields $1, $2, $3 explicitly.
1
2
3
$ awk -F ',' '{print $1 ",="$2","$3}' tmp.csv
"デミスハサビス",="09099999999","DeepMind"
"いとうせいこう",="08088888888","エムパイヤ・スネーク・ビルディング"

As shown above, “=” has been added.

  • Specify the output destination
1
2
3
4
$ awk -F ',' '{print $1 ",="$2","$3}' tmp.csv > output.csv
$ cat output.csv
"デミスハサビス",="09099999999","DeepMind"
"いとうせいこう",="08088888888","エムパイヤ・スネーク・ビルディング"

Let’s open output.csv in Excel.

Done ♪

Bonus

When opening with Excel, the default encoding on Windows is SJIS, so if the character encoding is UTF8, the text will be garbled.
Below, we convert the character encoding of the CSV file to SJIS.

1
$ nkf -sLw output.csv > output_sjis.csv

That’s all.

kenzo0107

kenzo0107