A couple of times in the past I’ve found that I have the following problem:
I’ve got a bunch of records of the form
key1: value1
key1: value2
key2: value3
...
And I want to transform them into the form
key1: value1, value2
key2: value3
...
that is, combining consecutive lines with the same initial key and joining their values together on one line.
It’s not a terribly hard problem, but doing it quickly and in a bounded amount of memory for very large volumes of data is at least non-trivial enough to require a bit of care.
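To illustrate the streaming idea (this is just a sketch, not the actual implementation of squish; the ": " key separator and ", " join string here are assumptions standing in for whatever the real tool's options configure): the trick to keeping memory bounded by the size of a single key is to write each value out as soon as it's read, rather than accumulating all of a key's values before emitting the merged line.

```python
import sys

def squish(lines, out, key_sep=": ", join_sep=", "):
    """Merge consecutive lines sharing a key, streaming.

    Values are emitted as soon as they are read, so we only ever hold
    the current key and one value in memory -- never a whole merged
    record.
    """
    current_key = None
    for line in lines:
        key, _, value = line.rstrip("\n").partition(key_sep)
        if key == current_key:
            # Same key as the previous line: append the value in place.
            out.write(join_sep + value)
        else:
            # New key: finish the previous record and start a new one.
            if current_key is not None:
                out.write("\n")
            out.write(key + key_sep + value)
            current_key = key
    if current_key is not None:
        out.write("\n")

if __name__ == "__main__":
    squish(sys.stdin, sys.stdout)
```

Run as a filter, this turns the first form above into the second in a single pass.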
In the past I generally solved this in some application-specific way, but recently I decided to do it properly and extracted a nice little command line utility from it. It accepts a sensible range of options for configuring its behaviour, is reasonably fast, and runs in memory that will never grow past O(size of largest key). I haven’t benchmarked it very extensively, but on all the data I’ve tried it on it’s about an order of magnitude faster than sorting the data (which I tend to want to do before feeding it into squish anyway) and about half the speed of uniq.
I didn’t have a good name for it, so I picked a bad one. It’s called “squish” and is available from my data-tools repository (aka “Where I put random crap”). If you have a better name for it I’m all ears.
It’s possible that this duplicates functionality of something that already exists. Anyone know if it does?