HOWTO: Remove Byte-order Mark with Ruby and Iconv

October 19th, 2009

I’m working on a small project that involves loading a UTF-16LE (16-bit Unicode, Little Endian) CSV file, converting it to UTF-8 (normal Unicode, as it were) with iconv, then parsing the values with FasterCSV. Everything was working fine except for looking up the first column of data by its column header. For example, given data:

First Name    Last Name    Email
Jimbo         Jones        jimbo.jones@example.com

I could access column 2 (Last Name) as either row.field("Last Name") or row.field(1). However, if I tried to access the first column using row.field("First Name"), it would return nil. row.field(0), on the other hand, would return the proper value.

Hmmmm.

After some sleuthing, I examined the raw content of the string:

(rdb:1) p row.headers.first.unpack('C*')
[239, 187, 191, 70, 105, 114, 115, 116, 32, 78, 97, 109, 101]

Ah, ha! The first three bytes (239, 187, 191, or EF BB BF in hex) are the UTF-8 encoding of the byte-order mark, or BOM. Ruby, for whatever reason, does not strip it when reading a file as input, so it’s passed along in the input stream. When FasterCSV loads the file, it keeps those bytes in the key name, causing lookups by the first column’s key name to return nil.
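To see why the lookup fails, here is a minimal sketch that rebuilds the header from the bytes printed above and compares it, byte for byte, with the name being looked up. The variable names are made up for illustration, and this only compares raw bytes; it does not involve FasterCSV at all.

```ruby
# Rebuild the first header from the bytes shown in the debugger output.
bytes  = [239, 187, 191, 70, 105, 114, 115, 116, 32, 78, 97, 109, 101]
header = bytes.pack('C*')

# The name we were trying to look up, as raw bytes.
wanted = "First Name".unpack('C*').pack('C*')

# Hypothetical variable names, for illustration only.
bom_present = (header[0, 3] == [239, 187, 191].pack('C*'))
same_key    = (header == wanted)
stripped    = header[3..-1]   # drop the three BOM bytes

puts bom_present          # true
puts same_key             # false
puts stripped == wanted   # true
```

The header looks like "First Name" when printed, but it is three bytes longer than the key, so the comparison fails.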

I modified my file conversion code as follows:

  require 'iconv'

  def convert_to_utf8
    # Data files are exported as Little Endian UTF-16. We need to parse as UTF-8.
    contents = File.open(@file_name, 'rb') { |f| f.read }
    begin
      # Iconv.iconv returns an array of converted strings, so join them into one
      converted = Iconv.iconv('UTF-8', 'UTF-16LE', contents).join
      # strip the BOM (byte order mark), but only from the very start of the input
      converted.sub!(/\A\xEF\xBB\xBF/, '')
      # the block form closes the file once the write is done
      File.open(@file_name, 'wb') { |f| f.write(converted) }
    rescue Iconv::Failure
      puts $!.inspect
    end
  end
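The code above relies on Iconv, which was removed from Ruby’s standard library in 2.0. As a rough sketch of the same idea on a newer Ruby, and not the code this post actually used, the built-in String#encode can do the conversion. The helper name utf16le_to_utf8 and the sample data are assumptions for illustration.

```ruby
# Hypothetical helper: convert raw UTF-16LE bytes to UTF-8 and drop a leading BOM.
def utf16le_to_utf8(raw)
  # Tag the raw bytes as UTF-16LE, then transcode to UTF-8.
  text = raw.dup.force_encoding('UTF-16LE').encode('UTF-8')
  # U+FEFF is the BOM; remove it only at the very start of the text.
  text.sub(/\A\uFEFF/, '')
end

# Sample input: a BOM followed by a header row, as raw UTF-16LE bytes.
sample = "\uFEFF" + "First Name,Last Name\n"
raw    = sample.encode('UTF-16LE').force_encoding('BINARY')

result = utf16le_to_utf8(raw)
puts result.inspect   # "First Name,Last Name\n"
```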

And all is well in the world.

An incomplete list

September 11th, 2009

A list of names I have, at one point or another, used to refer to my girlfriend’s dog:

  • Ruf
  • Rufito
  • Rufee
  • Stinky
  • Poops
  • Poops McGee
  • Poop Machine
  • Lil’ Pooper
  • Shit for brains (learned that one at the Sam Mazzara School of Driving, another story, for another time)
  • Stink Machine
  • (the) Nugg
  • Nugget
  • El Nuggo
  • Big Dummy
  • The BEAST (ALL CAPS)
  • (Wee) Lil’ Beastie
  • The Mayor of Cullerton

And for the record, his name is Rufus.

Superior strength

August 20th, 2009

rsync

August 13th, 2009

Note that doubling a single-quote inside a single-quoted string gives you a single-quote; likewise for double-quotes (though you need to pay attention to which quotes your shell is parsing and which quotes rsync is parsing).

rsync man page

Ow, my head hurts.

Directory tree

August 10th, 2009

A handy Bash script to display a tree view of a directory, adapted from http://www.centerkey.com/tree. This version omits .svn and .git directories, and uses the find utility.

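#!/bin/bash
echo
# If a parameter is given, use it as the base folder
if [ "$1" != "" ]
   then cd "$1" || exit 1
   fi
pwd
# List directories, skipping .svn and .git, then turn each path into an indented branch
find . \! \( -path "*.svn*" -or -path "*.git*" \) -type d | \
   sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
# Report when the base folder has no sub-directories
if [ `ls -F -1 | grep "/" | wc -l` = 0 ]
   then echo "   -> no sub-directories"
   fi
echo
exit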

Example use:

[cgansen@Crystal-Frontier ~]$ tree projects/self.d-struct.org/wp-admin/
 
/Users/cgansen/projects/self.d-struct.org/wp-admin
   .
   |-css
   |-images
   |-import
   |-includes
   |-js
 
[cgansen@Crystal-Frontier ~]$
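To trace what the sed substitutions in the script do, here is a rough Ruby sketch that applies the same four substitutions to a few paths in the form find prints them. The format_path helper and the sample paths are made up for illustration.

```ruby
# Hypothetical helper mirroring the four sed substitutions in the script.
def format_path(path)
  line = path.sub(/:$/, '')                # strip a trailing colon (find output has none)
  line = line.gsub(/[^-][^\/]*\//, '--')   # each "parent/" component becomes "--"
  line = "   " + line                      # indent every line by three spaces
  line.sub(/-/, '|')                       # the first dash becomes the branch marker
end

paths = ['.', './css', './css/colors']
lines = paths.map { |p| format_path(p) }
puts lines
# Prints:
#    .
#    |-css
#    |---colors
```

Each extra level of nesting adds two more dashes, which is what gives the output its tree shape.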

Assembling Coil

July 6th, 2009

A few weeks ago, John and I dropped by to see our friend Craig exhibiting at the Guerrilla Truck Show in Chicago’s West Loop area. In spite of the pouring rain, we had a great time seeing the exhibits, and I walked away the proud owner of one of Craig’s recent designs, the Coil Lamp. After a few weeks of ignoring it, as I was busy moving into a (totally awesome) new apartment in Pilsen, I finally found a few free minutes to assemble it.

Raw materials

Starting off, I gather the necessary items: lamp with signed dedication, 100′ extension cord, CFL bulb OMG DO NOT USE AN INCANDESCENT BULB (the instructions clarified that no fewer than three times), and a beer. My suggested pairing for assembling Coil: Goose Island Summertime. The crisp finish accentuates the sharp, precise laser cut of the clear acrylic form of Coil, and they’re both made here in Chicago.

Instruction manual

The instructions are detailed and nicely illustrated.

The raw form of Coil

Unravelled cord

My only mistake was not taking enough time to straighten out the coiled cord, which was pretty gnarled up. That gave me a bit of trouble getting a really smooth, even “coil” on the form.

Beginning to take form

First step

First few spins

Flipped per instructions

Notice the beer slowly draining, the sign of real progress on any project.

Ta-da! Completed Coil

In situ

There you have it, an assembled Coil Lamp. At this moment, it is sitting in the window, casting a cool, fluorescent glow on the streets of Pilsen. More photos of the experience are online over at Flickr.

Shop local is dead.

June 1st, 2009

[Screenshot: All Mailboxes, found 62 matches for search]

Amazon Prime is my crack.

Happy as an elephant in mud

May 23rd, 2009

Slowly sorting through all the video I shot while in Tanzania. This is one of my favorites, two elephants playing in the mud at Tarangire National Park.

This is old-time hockey

May 19th, 2009

As a Detroit-area native transplanted to Chicago, I’m often asked who I’m rooting for in the Red Wings vs. Blackhawks playoff matchup. I don’t really care about pro sports, but there’s no way I could root for anyone other than the Red Wings.

Jobs4Recovery 2.0

May 4th, 2009

I’m pleased to announce the relaunch of Jobs4Recovery.com. I programmed the first version of this site in September 2005, in the aftermath of the Katrina-Rita one-two hurricane punch. After the initial wave of activity, the site fell by the wayside. The US Chamber of Commerce, in partnership with IBM, is resurrecting the site to deal with both economic events and natural disasters. Over the past few weeks I’ve worked on upgrading and refreshing the site. I’m pleased with how it turned out, and I hope it’ll help folks when they need it the most.

On a technical note, sometimes it’s fun to switch your whole working environment for a while. I’ve been hacking Ruby on Rails for the past few years, but went back to PHP for this project. While I’ve fallen in love with the Ruby language and the Rails framework, the immediacy of PHP is refreshing. To upgrade this site, almost all of the effort was in modifying the JavaScript calls to reflect changes in the Google Maps API, or tweaking layout issues in Internet Explorer. The same core PHP code from 3.5 years ago worked flawlessly, without changing a single line.