04 September 2013

Reader Mailbag

Once in a while I like to dip into the vast amounts of mail I receive (mostly fanmail addressed to my luxurious beard) and see if I can’t help some random lost soul who is dealing with a programmatic problem, trapped in a metaphorical bog like that horse from “The NeverEnding Story” in what had to be one of the most traumatic movie scenes from my childhood.

Aside: The name of the horse in that movie was "Artax", not to be confused with the Tolkienian "Lord of All Horses", Shadowfax. I don't want to brag, but I met the horse that played "Shadowfax". I asked if he was going to reprise his role in future Tolkien-inspired movies; he replied in the negative.

Today’s question is from Elizabeth:

I need to pull some data from an Excel spreadsheet programatically. Could you please help me understand a good way to do this? What language should I choose? Are there frameworks out there to help me?

I found Elizabeth on Jabber and we had the following dialog:

[M] Hello, Elizabeth. Can I call you Liz?

[E] No, you certainly may not.

[M] That's great. So, Liz, I hear you have an issue with reading Excel data. Would you care to elaborate?

[L] Well, I guess that I just... wait, is that an 'L'? My name is Elizabeth, not "Liz".

[M] Ignore that. Let's talk about Excel!

[L] Um, I have an Excel document and I want to programmatically skim data out of it and put it in a database or something so I can look at historical trends, maybe even write up a web-based application.

[M] That's a terrible idea. You should totally just write some VBA and put a graph in your spreadsheet.

[L] I'm not sure the Excel spreadsheet will be able to manage the amount of historical data I want to deal with.

[M] Sure it can. It's Excel. Now, open up the macro editor...

[L] How about I just Google "C# parse excel"? Look, the second result talks about using the spreadsheet file as an OLE DB data source. Can't I just use that?

[M] Yeah, if you're an idiot. If you want managed code and not VBA I highly recommend Apache's POI project, which parses MS Office files in the relative comfort of a JVM.

[L] Java? But look, right here there's sample code:

var provider = "Microsoft.ACE.OLEDB.12.0";
var filename = "MySpreadsheet.xls";
var properties = "Excel 12.0";
var connectionString = string.Format("Provider={0};Data Source={1};Extended Properties={2};",
    provider,
    filename,
    properties);

using (var adapter = new OleDbDataAdapter("SELECT * FROM [Page1$]", connectionString))
{
   var ds = new DataSet();

   adapter.Fill(ds, "myTable");

   DataTable data = ds.Tables["myTable"];

   foreach (DataRow row in data.Rows)
   {
      var cellData = row[0].ToString();
      
      /**
       * Do something with the cell data
       **/
   }
}      

[M] Huh? That's not Java at all!

[L] And all I have to do is add a reference to System.Data.DataSetExtensions, which I installed via a download from Microsoft's site!

[M] I don't think you're going to want to mix VBA, Java, and C#. Your cause might be hopeless.

[L] I uh, I think I'm gonna go now?

[M] Well, Laurie, good luck. I hope you fix your MS Access problem.


Results

Poor programmer. Unfortunately some of these younger programmers get a bad idea in their heads and there’s nothing you can do about it. I’m sure she’ll be back after creating some abomination in Java while rewriting half of POI even though it’s right there and licensed pretty freely.

Tune in next time on Reader Mailbag when we start taking our ADHD medication.