Wednesday, April 15, 2009

VAERS Update...this time with feeling

The FDA (or is it the CDC? Or is it Health & Human Services? Can't really tell...) just updated its Vaccine Adverse Event Reporting System (VAERS) dataset on April 13, 2009.

What they haven't bothered to note is that they've also changed the structure of their dataset.

VAERS data used to come packaged as 2 CSV files: one called VAERSData.csv and one called VAERSVaccine.csv.

VAERSData.csv used to contain 20 slots per row for symptoms associated with each vaccine adverse event.

In the database world, one says that there is a 1-to-many relationship between a VAERS report and the Symptoms associated with that report.


And day 1 of any remedial database management course will teach you that, because of this 1-to-many relationship, you should store the Symptoms records separately from the VAERS record and link each symptom back to its report via a foreign key.

The reason for doing this is that 20 slots for symptoms *may* seem like more than enough slots for any case that you'll ever come across, but at some point, some hypochondriac is going to slip in 21 symptoms and totally screw with your file.
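For the curious, here is a minimal sketch of that split in Python. The column names (VAERS_ID, SYMPTOM1 and so on) are my own assumptions for illustration, not the documented VAERS layout, and the sample shows only 3 slots instead of 20 to keep it short.

```python
import csv
import io

# Hypothetical wide layout: one report per row, with numbered symptom slots.
# The column names and the sample values are assumptions for illustration.
WIDE_SAMPLE = """VAERS_ID,SYMPTOM1,SYMPTOM2,SYMPTOM3
100001,Pyrexia,Rash,
100002,Headache,,
"""


def split_symptoms(wide_rows, id_column="VAERS_ID", slot_prefix="SYMPTOM"):
    """Turn every filled symptom slot into its own (report id, symptom) row."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            # Skip the id column and any slot that was left empty.
            if column.startswith(slot_prefix) and value and value.strip():
                long_rows.append(
                    {id_column: row[id_column], "SYMPTOM": value.strip()}
                )
    return long_rows


wide_rows = list(csv.DictReader(io.StringIO(WIDE_SAMPLE)))
symptom_rows = split_symptoms(wide_rows)

for row in symptom_rows:
    print(row["VAERS_ID"], row["SYMPTOM"])
```

Each output row carries the report's id as its foreign key, so a report with 21 symptoms just becomes 21 rows rather than a problem.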

Well, 19 years later, the VAERS folks have finally given the Symptom data its own CSV file, appropriately titled VAERSSymptoms.csv. They didn't bother to tell anyone that they did this, and I had to tinker with my parsing algorithm last night to adjust for the changes. Still, the discerning pharmacovigilantes among us were able to figure it out when we asked ourselves:

"Why are there 3 CSV files per year now when there used to be 2? And why haven't they updated their file explaining the structure of the VAERS files?"

As of this writing, the most recent revision of the VAERS explanation is dated June 2007.
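The parsing adjustment boils down to reading the symptoms file separately and attaching its rows back to each report. This is a rough sketch of the idea, not my actual parser. It assumes both files share a VAERS_ID column and that the symptoms file holds one symptom per row in a SYMPTOM column. Neither assumption is confirmed by any official documentation, and RECVDATE is just a placeholder second column.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-ins for VAERSData.csv and VAERSSymptoms.csv.
# Column names and values are assumptions for illustration only.
DATA_SAMPLE = """VAERS_ID,RECVDATE
100001,01/05/2009
100002,01/07/2009
100003,01/09/2009
"""

SYMPTOMS_SAMPLE = """VAERS_ID,SYMPTOM
100001,Pyrexia
100001,Rash
100002,Headache
"""


def attach_symptoms(data_file, symptoms_file, id_column="VAERS_ID"):
    """Return one dict per report, with its symptoms gathered into a list."""
    # Group the "many" side by the foreign key first.
    symptoms_by_report = defaultdict(list)
    for row in csv.DictReader(symptoms_file):
        symptoms_by_report[row[id_column]].append(row["SYMPTOM"])

    # Then walk the "one" side and look up each report's symptoms.
    reports = []
    for row in csv.DictReader(data_file):
        report = dict(row)
        report["SYMPTOMS"] = symptoms_by_report.get(row[id_column], [])
        reports.append(report)
    return reports


reports = attach_symptoms(io.StringIO(DATA_SAMPLE), io.StringIO(SYMPTOMS_SAMPLE))

for report in reports:
    print(report["VAERS_ID"], report["SYMPTOMS"])
```

With real files you would pass handles opened with `open(path, newline="")` instead of the `io.StringIO` samples. A report with no matching symptom rows simply gets an empty list.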

Ah yes...lest I forget...as of this writing, the ZIP file containing all of 2008's VAERS data is contaminated with 2009 data. I think that's just to keep everyone on their toes.
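If you want to check a year's file for strays yourself, something along these lines should do. It's a sketch under assumptions: I'm pretending each report has a RECVDATE column in MM/DD/YYYY format and that this is the date that decides which year a report belongs to. Adjust both to whatever the real files use.

```python
import csv
import io
from datetime import datetime

# Stand-in for a 2008 data file with one 2009 row mixed in.
# Column names, date format, and values are assumptions for illustration.
YEAR_SAMPLE = """VAERS_ID,RECVDATE
300001,12/30/2008
300002,01/02/2009
300003,06/15/2008
"""


def rows_outside_year(data_file, expected_year,
                      date_column="RECVDATE", date_format="%m/%d/%Y"):
    """Return the rows whose date does not fall in expected_year."""
    strays = []
    for row in csv.DictReader(data_file):
        received = datetime.strptime(row[date_column], date_format)
        if received.year != expected_year:
            strays.append(row)
    return strays


strays = rows_outside_year(io.StringIO(YEAR_SAMPLE), 2008)

for row in strays:
    print(row["VAERS_ID"], row["RECVDATE"])
```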