Windows Phone: performance of parsing different file types
Mood: chipper
Posted on 2014-01-09 22:26:00
Tags: windowsphone projects wpdev
Words: 1033

When I started to work on Baseball Odds I knew I was going to have to worry about performance - the data set I have for the win probability has right around 15000 records. So I thought it would be neat to compare different file formats and how long it took to read their data in. Each record has the inning number (with top or bottom), the number of outs, which runners are on base, the score difference, the number of times that situation occurred, and the number of times the current team went on to win. Here's a brief description of each format and some sample code:


Text:
This was actually the format I already had the data in, as it matched Phil Birnbaum's data file format. A sample line looks like this:

"H",1,0,1,0,81186,47975
and there are 15000 lines in the file. The code to parse this looks something like this:

const bool USE_AWAIT = false;
const bool CONFIGURE_AWAIT = false;
var resource = System.Windows.Application.GetResourceStream(
    new Uri(@"Data\winProbs.txt", UriKind.Relative));
using (StreamReader sr = new StreamReader(resource.Stream))
{
    string line;
    // Read the first line using whichever strategy is being timed.
    if (USE_AWAIT)
    {
        if (CONFIGURE_AWAIT)
        {
            line = await sr.ReadLineAsync().ConfigureAwait(false);
        }
        else
        {
            line = await sr.ReadLineAsync();
        }
    }
    else
    {
        line = sr.ReadLine();
    }
    while (line != null)
    {
        // Fields: "H"/"V", inning, outs, baserunners, score diff, situations, wins
        var parts = line.Split(',');
        bool isHome = (parts[0] == "\"H\"");
        _fullData.Add(new Tuple<bool, byte, byte, byte, sbyte>(
            isHome, byte.Parse(parts[1]), byte.Parse(parts[2]), byte.Parse(parts[3]),
            sbyte.Parse(parts[4])),
            new Tuple<UInt32, UInt32>(UInt32.Parse(parts[5]), UInt32.Parse(parts[6])));

        // Read the next line the same way as the first.
        if (USE_AWAIT)
        {
            if (CONFIGURE_AWAIT)
            {
                line = await sr.ReadLineAsync().ConfigureAwait(false);
            }
            else
            {
                line = await sr.ReadLineAsync();
            }
        }
        else
        {
            line = sr.ReadLine();
        }
    }
}


(What are USE_AWAIT and CONFIGURE_AWAIT all about? See the results below...)
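
The per-line work in the loop above can also be pulled out into a small helper, which makes it easy to try on the sample line by itself. This is only a sketch: WinProbLine.Parse is a name made up for illustration (it isn't in the app), and it assumes every line has exactly seven comma-separated fields with no embedded commas.

```csharp
using System;
using System.Globalization;

public static class WinProbLine
{
    // Hypothetical helper (not from the original app): turns one line such as
    //   "H",1,0,1,0,81186,47975
    // into the same key/value tuples that the loop above adds to _fullData.
    public static Tuple<Tuple<bool, byte, byte, byte, sbyte>, Tuple<uint, uint>> Parse(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length != 7)
        {
            throw new FormatException("Expected 7 fields, got " + parts.Length);
        }

        // Parse with the invariant culture so the phone's locale can't change the result.
        CultureInfo inv = CultureInfo.InvariantCulture;
        bool isHome = parts[0] == "\"H\"";
        var key = Tuple.Create(
            isHome,
            byte.Parse(parts[1], inv),    // inning
            byte.Parse(parts[2], inv),    // outs
            byte.Parse(parts[3], inv),    // baserunners
            sbyte.Parse(parts[4], inv));  // score difference
        var value = Tuple.Create(
            uint.Parse(parts[5], inv),    // number of situations
            uint.Parse(parts[6], inv));   // number of wins
        return Tuple.Create(key, value);
    }
}
```

With something like this, the body of the while loop would shrink to one call to WinProbLine.Parse(line) followed by _fullData.Add(result.Item1, result.Item2).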


JSON:

To avoid having to write my own parsing code, I decided to write the data in a JSON format and use Json.NET to parse it. One line of the data file looks like this:
{isHome:1,inning:1,outs:0,baserunners:1,runDiff:0,numSituations:81186,numWins:47975}

This is admittedly a bit verbose, and it makes the file over a megabyte. The parsing code is simple, though:

var resource = System.Windows.Application.GetResourceStream(
    new Uri(@"Data\winProbs.json", UriKind.Relative));
using (StreamReader sr = new StreamReader(resource.Stream))
{
    // Read the whole file, parse it into a JArray, then convert one record at a time.
    string allDataString = await sr.ReadToEndAsync();
    JArray allDataArray = JArray.Parse(allDataString);
    for (int i = 0; i < allDataArray.Count; ++i)
    {
        JObject dataObj = (JObject)(allDataArray[i]);
        _fullData.Add(new Tuple<bool, byte, byte, byte, sbyte>(
            (int)dataObj["isHome"] == 1, (byte)dataObj["inning"],
            (byte)dataObj["outs"], (byte)dataObj["baserunners"], (sbyte)dataObj["runDiff"]),
            new Tuple<UInt32, UInt32>((UInt32)dataObj["numSituations"],
            (UInt32)dataObj["numWins"]));
    }
}


After I posted this, Martin Suchan pointed out that using JsonConvert might be faster, and even wrote some code to try it out.
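
Martin's code isn't reproduced here, but as a rough sketch of the idea: Json.NET can deserialize the whole array straight into typed objects with JsonConvert.DeserializeObject, which skips building a JObject for every record. The WinProbRecord class and WinProbJson.Load below are hypothetical names for illustration. The property names come from the sample line above, and isHome is kept as an int because the file stores it as 0 or 1.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical record type; property names match the keys in the JSON sample.
public class WinProbRecord
{
    public int isHome { get; set; }
    public byte inning { get; set; }
    public byte outs { get; set; }
    public byte baserunners { get; set; }
    public sbyte runDiff { get; set; }
    public uint numSituations { get; set; }
    public uint numWins { get; set; }
}

public static class WinProbJson
{
    // Deserializes the entire JSON array in one call instead of walking a JArray.
    public static List<WinProbRecord> Load(string allDataString)
    {
        return JsonConvert.DeserializeObject<List<WinProbRecord>>(allDataString);
    }
}
```

Each WinProbRecord would still need to be copied into _fullData afterwards, so this only changes the parsing step, not the data structure the app ends up with.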

Binary:

To try to get the file to be as small as possible (which I suspected correlated with parsing time), I converted the file to a custom binary format. Here's my textual description of the format:
UInt32 = total num records
UInt32 = num of records that have UInt32 for num situations
         (these come first)
each record is:
    UInt8 = high bit = visitor=0, home=1
            rest is inning (1-26)
    UInt8 = high 2 bits = num outs (0-2)
            rest is baserunners (1-8)
    Int8 = score diff (-26 to 27)
    UInt32/UInt16 = num situations
    UInt16 = num of wins

To format the file this way, I had to write a Windows 8 app that read in the text file and wrote out the binary version using a BinaryWriter with the Write(Byte), etc. methods. Here's the parsing code:

var resource = System.Windows.Application.GetResourceStream(
    new Uri(@"Data\winProbs.bin", UriKind.Relative));
using (var br = new System.IO.BinaryReader(resource.Stream))
{
    UInt32 totalRecords = br.ReadUInt32();
    UInt32 recordsWithUInt32 = br.ReadUInt32();
    for (UInt32 i = 0; i < totalRecords; ++i)
    {
        // The first two bytes stay packed (home flag + inning, outs + baserunners).
        byte inning = br.ReadByte();
        byte outsRunners = br.ReadByte();
        sbyte scoreDiff = br.ReadSByte();
        // The first recordsWithUInt32 records store num situations as a UInt32; the rest use a UInt16.
        UInt32 numSituations = (i < recordsWithUInt32) ? br.ReadUInt32() : br.ReadUInt16();
        UInt16 numWins = br.ReadUInt16();
        _compressedData.Add(new Tuple<byte, byte, sbyte>(inning, outsRunners, scoreDiff),
            new Tuple<uint, ushort>(numSituations, numWins));
    }
}
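
The two packed bytes in each record can be built and taken apart with a few shifts and masks. The sketch below follows the format description above (home flag in the high bit of the first byte, outs in the high two bits of the second). PackedRecord and its method names are made up for illustration; this is not the converter app's actual code.

```csharp
using System;
using System.IO;

public static class PackedRecord
{
    // First byte: high bit is the home flag, low 7 bits are the inning.
    public static byte PackInning(bool isHome, byte inning)
    {
        return (byte)((isHome ? 0x80 : 0x00) | (inning & 0x7F));
    }

    // Second byte: high 2 bits are the outs, low 6 bits are the baserunners code.
    public static byte PackOutsRunners(byte outs, byte baserunners)
    {
        return (byte)(((outs & 0x03) << 6) | (baserunners & 0x3F));
    }

    public static bool IsHome(byte packedInning) { return (packedInning & 0x80) != 0; }
    public static byte Inning(byte packedInning) { return (byte)(packedInning & 0x7F); }
    public static byte Outs(byte packedOutsRunners) { return (byte)(packedOutsRunners >> 6); }
    public static byte Baserunners(byte packedOutsRunners) { return (byte)(packedOutsRunners & 0x3F); }

    // Writes one record. wideSituations chooses UInt32 vs UInt16 for numSituations;
    // the caller is assumed to have sorted the wide records to the front of the file.
    public static void Write(BinaryWriter bw, bool isHome, byte inning, byte outs,
        byte baserunners, sbyte scoreDiff, uint numSituations, ushort numWins,
        bool wideSituations)
    {
        bw.Write(PackInning(isHome, inning));
        bw.Write(PackOutsRunners(outs, baserunners));
        bw.Write(scoreDiff);
        if (wideSituations)
        {
            bw.Write(numSituations);
        }
        else
        {
            bw.Write((ushort)numSituations);
        }
        bw.Write(numWins);
    }
}
```

The sample record from the text format (81186 situations) is too big for a UInt16, so it would be written with wideSituations set to true and take 9 bytes instead of 7.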



Results:

Without further ado, here are the file sizes and how long the files took to read and parse (running on my Lumia 1020):

Type                                              File size   Time to parse
Text (USE_AWAIT=true, CONFIGURE_AWAIT=false)      278 KB      4.8 secs
Text (USE_AWAIT=true, CONFIGURE_AWAIT=true)       278 KB      0.4 secs
Text (USE_AWAIT=false)                            278 KB      0.4 secs
JSON (parsing one at a time)                      1200 KB     3.2 secs
JSON (using JsonConvert)                          1200 KB     1.3 secs
Binary                                            103 KB      0.15 secs
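
The post doesn't show how the timings were taken. One simple way to get numbers like these is to wrap each parse routine in a System.Diagnostics.Stopwatch. The ParseTimer class below is a hypothetical helper for illustration, with a second method for the awaitable variants.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ParseTimer
{
    // Runs a synchronous parse routine once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action parse)
    {
        Stopwatch sw = Stopwatch.StartNew();
        parse();
        sw.Stop();
        return sw.Elapsed;
    }

    // Same idea for async parse routines: the clock stops when the returned task completes.
    public static async Task<TimeSpan> TimeAsync(Func<Task> parse)
    {
        Stopwatch sw = Stopwatch.StartNew();
        await parse();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A single run on a phone can be noisy, so averaging a few runs would give steadier numbers.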


A few observations:

So since I had already done all the work I went with the binary format, and Baseball Odds starts up lickety-split!

--

See all my Windows Phone development posts.

I'm planning on writing more posts about Windows Phone development - what would you like to hear about? Reply here, on twitter at @gregstoll, or by email at ext-greg.stoll@nokia.com.

--

Interested in developing for Windows Phone? I'm the Nokia Developer Ambassador for Austin - drop me a line at ext-greg.stoll@nokia.com!

