Wednesday, May 13, 2009

FDA AERS Data & Quality Control

FDA just released its Q4 2008 AERS data (yes, it’s May 13, but that’s another story altogether).

This post is about Quality Control of AERS data. It seems that every quarter, there’s some type of SNAFU with the data release (last year they released AERS data partially contaminated with a previous quarter's data).

This quarter, we have the classic newline-characters-where-they-don’t-belong error that's screwing up my AERS parser.

A little background:
FDA releases its data in 2 forms (ASCII and SGML). ASCII is the one that I use, and each ASCII file consists of row after row after row of $-delimited Adverse Event Records.

2 sample rows might look something like this:
$12345$abcdef$somestuff here$blah$more blah$blah
$34321$blahblah$doscum$etcetc$vixerunt$gaius$cicero


Each row should represent one particular database record and my parser dutifully goes through each row extracting all the little bits of information between the dollar $ign$.
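To make the field extraction concrete, here is a minimal Python sketch of splitting one row on the dollar sign. The function name `split_row` is made up for illustration, and it assumes a row may start with a delimiter, as the sample rows above do. It is not the actual parser.

```python
def split_row(line: str, delimiter: str = "$") -> list:
    """Split one row into fields, dropping the empty field before a leading delimiter."""
    fields = line.rstrip("\r\n").split(delimiter)
    if fields and fields[0] == "":
        fields = fields[1:]
    return fields


row = "$12345$abcdef$somestuff here$blah$more blah$blah"
print(split_row(row))
```

Each element of the returned list is one slot that would go into the database, which is why a row with the wrong number of fields is a problem.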

But in this latest quarterly release, FDA's Drug data file (aka DRUG08Q4.txt) has 4 significant quality control errors (see sample screenshot below).

[For those who want gruesome details, the following lines in the DRUG08Q4.txt file contain errors: 537-538, 258909-258910, 281285-281286, 408948]

The gist of the issue is that whoever entered the data for these 4 drug-records forgot to remove the newline characters (“carriage returns”) and so the record is actually split across 2 or more lines.

While this doesn’t seem like a big deal, if your parser isn’t “smart” it could inadvertently stuff the wrong data into the wrong slots in your database.

And so, you have to design your parser to look for these types of errors--and then you have to have a human look at the problem just to assure yourself that there wasn’t a bigger error. This wastes time...especially when the file you’re looking at has 416,000 records.
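One way to look for this kind of error, sketched in Python. This is an illustration under stated assumptions, not the actual parser: the name `merge_split_records` is hypothetical, the caller is assumed to know how many delimiters a healthy row contains, and replacing the stray newline with a space is a guess about what the data entry person meant.

```python
from typing import Iterable, Iterator, Tuple


def merge_split_records(
    lines: Iterable[str], expected_delims: int, delimiter: str = "$"
) -> Iterator[Tuple[int, str, bool]]:
    """Yield (first_line_number, record, needs_review) for each logical record.

    Physical lines are glued together until the record holds at least
    expected_delims delimiters. needs_review is True whenever gluing was
    needed or the final delimiter count is not exactly the expected one.
    """
    buffer, start, pieces = "", 0, 0
    for line_no, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if pieces == 0:
            buffer, start = text, line_no
        else:
            buffer += " " + text  # assumption: the stray newline stood in for a space
        pieces += 1
        if buffer.count(delimiter) >= expected_delims:
            suspicious = pieces > 1 or buffer.count(delimiter) != expected_delims
            yield start, buffer, suspicious
            buffer, pieces = "", 0
    if pieces:  # leftover fragment at the end of the file
        yield start, buffer, True


# Toy data: the second record is split across physical lines 2 and 3.
demo = ["1$a$b$c\n", "2$d\n", "e$f$g\n", "3$h$i$j\n"]
for start, record, needs_review in merge_split_records(demo, expected_delims=3):
    print(start, repr(record), "REVIEW" if needs_review else "ok")
```

Anything flagged for review still needs human eyes. In particular, a newline inside the last field of a row leaves a fragment that this sketch glues onto the following record, so a record with too many delimiters is a sign that something bigger went wrong.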

’Twould be nicer if FDA did more quality control on their data releases.

Tuesday, May 12, 2009

FAERS - AERS, but Fancier

Someone at FDA sent this to me recently (see below). It's an email sent around from within the FDA announcing a new Adverse Event Reporting system (sort of).

Emphases are mine. Editorial comments are mine, too.

Spoiler alert: They forked over $$$$$ and went with Oracle.

Bioinformatics Board (BiB) News

Post Market Safety BRB Announces Selection of Vendor [We've decided to shop at Barney's. Can't tell you what we're buying just yet, but it's gonna be FAB!] for New Enterprise Adverse Event System

The Bioinformatics Board (BIB) News provides FDA employees with an update on the activities and progress of the BiB. This month’s news focuses on the Post Market Safety BRB and the selection of the Adverse Event System Vendor. For more information about the BiB or the contents of this message, please contact your BiB representative or email the Bioinformatics Board, bioinformatics@fda.hhs.gov.

Post Market Safety Business Review Board Announces Selection of Vendor for New Enterprise Adverse Event System

FDA’s Bioinformatics Board and the Post Market Safety Business Review Board are pleased to announce a significant milestone for the Agency. After an extensive analysis of FDA scientific needs for the management and analysis of post market product safety reports, and an evaluation of leading industry tools, the FDA Adverse Event Reporting System (FAERS) Program has selected Oracle AERS [$$$] (Adverse Event Reporting System) as the new tool for FDA staff [guess who's not getting access to the data juuuust yet...?] to efficiently [you wouldn't believe what we were doing before] analyze post market safety reports in order to identify potential product safety problems.

Selection of the new FAERS tool represents a significant landmark for the Agency. With the selection of an advanced tool that all centers and offices can leverage, the Agency can now focus on implementing processes for sharing post market safety data across our product centers as well as advancing the science of post market report analysis. Achievement of this milestone represents the efforts of numerous individuals from across the Agency over several years [wow. just...wow]. While in many ways this is just a beginning, it is important to recognize our progress towards meeting the goal of providing modern tools necessary to address many post-market surveillance needs. Selection of Oracle AERS [What? no love for MySQL? They're about to be subsumed by Oracle, too, ya know.] positions the Agency for success in this complex and dynamic arena.

We expect the initial users in CDER and CBER to receive training and begin using the new system as early as fall 2009 [but don't quote us on that] with additional users being trained and brought on over time. CDRH users are expected to begin using the tool in 2010, and additional centers and offices expected shortly thereafter.


Tuesdays are a busy day @ FDA


3 non-collinear points define a plane...and 3 Tuesdays in a row are enough for me to jump to 2 conclusions. I think the first conclusion is correct, and the 2nd has a good chance of being right, too.

Conclusion 1. The FDA updates its warning letters database by hand every Tuesday morning.

Why do I think so?
I have an automated ‘bot’ that fetches new FDA warning letters from FDA.gov daily, but recently (when I bothered paying attention to it) I noticed that most days it doesn’t retrieve any new warning letters.

Then I started paying attention, and for the past 3 weeks, my bot has only fetched new letters on Tuesday mornings (Eastern time).

Not Monday night (I checked).

Not Tuesday at 6 AM (I checked).

Only Tuesday mornings between ~9 and ~11AM.

So, this smells like a human who has a Tuesday morning to-do list. Task # 1? Push out last week’s warning letters.

Why can’t this be automated on a daily basis? Beats me.
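For anyone who wants to check a pattern like this against their own bot's logs, here is a rough Python sketch that tallies sightings by weekday and hour. The function name `tally_by_weekday_hour` is hypothetical and the timestamps are made up for illustration; they are not my bot's actual log.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def tally_by_weekday_hour(timestamps):
    """Count sightings of new letters per (weekday name, hour of day)."""
    return Counter((DAYS[ts.weekday()], ts.hour) for ts in timestamps)


# Made-up local times at which a bot first saw new letters.
sightings = [
    datetime(2009, 4, 28, 9, 40),
    datetime(2009, 5, 5, 10, 15),
    datetime(2009, 5, 12, 10, 50),
]

for (day, hour), count in sorted(tally_by_weekday_hour(sightings).items()):
    print(f"{day} {hour:02d}:00  {count}")
```

If every bucket lands on the same weekday within a narrow window of hours, a human with a weekly to-do list is a plausible explanation. Three data points are still only three data points.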


Conclusion 2. Updating of clinical trial data @ clinicaltrials.gov is done by the same person/entity/thing at FDA. This conclusion is far more tenuous, but hear me out.

(admittedly weak) Reasoning?
I have another bot that fetches new clinical trials data every day.

Every day, the bot finds updated data around 10AM--except on Tuesdays, when the new data show up around 11:30AM.

Tuesday’s task # 2? Push out new clinical trial data.