Initial upload into FDAble looks like everything was okay.
Interestingly, there are ~111,000 reports in this release compared to ~121,000 for the previous quarter. Not sure how significant this decrease is (is it seasonal? is it just noise? is FDA weeding out duplicates?). Will take a closer look later...
Got this from "Webmail (L)" today. Love the personal touch. If I had to guess, some contractor who is now long gone implemented the actual warning letters search engine and now they have to either get that person back there to fix it or try and untangle someone else's spaghetti code themselves. Just a guess, but probably not fun for them either way.
Mr. Danese,
Thank you for your feedback. Our technical team is working hard to resolve the remaining issues. Thank you for sending us emails about the problems you encountered. We expect them to be resolved very soon. Please don't hesitate to contact us when you have a question, suggestion or any issues with our site. We are constantly working to improve the site and appreciate your feedback.
WASHINGTON -- The Food and Drug Administration isn't able to reliably determine how much money it needs to regulate medical products because, among other things, its staff can't track all the adverse-event reports it handles, according to the Government Accountability Office.
I just emailed the FDA asking them for an update regarding their warning letters search engine.
From what I can determine, they have fixed the issue of certain missing warning letters. However, at least 2 other significant issues remain: 1. the date filter is still malfunctioning (see previous post here), and 2. the Excel document dump is still outputting HTML (see previous post here).
Our correspondence to Nature Biotech regarding AERS data came out yesterday. I can't post the article due to copyright restrictions, but I'm sure you can pick up a copy at your local newsstand.
FDA has fixed the "beef northwest" issue described in yesterday's post (i.e. if you search for warning letters for "beef northwest", the search engine now returns 1 result; click here for the same link as yesterday, but now with the correct result).
I don't yet know whether all of the missing warning letters have been restored, but it's a start.
I’ve written a couple of posts on FDA Warning Letters (here and here), but today’s post seems particularly important.
The FDA’s Warning Letter Search Engine is Seriously Flawed. There are at least 2 things that are wrong (in addition to the flaws I outlined earlier).
Certain warning letters that were in the old database have vanished.
The warning letters that are returned when searching by date are often inaccurate.
Allow me to elaborate.
Certain warning letters that were in the old database have vanished
If you use the FDA’s Warning Letter Search engine to search for “beef northwest”, you get 0 (zero) results [update: fda has fixed this error--see here]. But there is a warning letter issued to Beef Northwest Feeders LLC on August 21, 2007 (see here for the letter). You can also search the FDA’s Warning Letters by Company Name, and the record does not show up. [update: fda has fixed this error--see here]
By my estimate, there are almost 2,000 missing warning letters. I wrote a small bot that systematically went through the current FDA search engine and recorded the warning letters issued on every day from January 1, 1996 to the present day; it returned ~7,700 warning letters, whereas the FDAble Warning Letter Database, built from the FDA’s old search engine, contains ~9,500 letters.
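I haven't posted the bot itself, but the idea is simple enough to sketch. This is a minimal Python version, not the actual code: `search_by_date` is a hypothetical stand-in for whatever submits the FDA search form for one day and scrapes the letter IDs from the result page (the form's real parameters aren't shown here). Asking for one issue date at a time also keeps every query far below the search engine's 1,000-result cap.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List


def daterange(start: date, end: date) -> Iterator[date]:
    """Yield every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def crawl(search_by_date: Callable[[date], List[str]],
          start: date, end: date) -> Dict[str, date]:
    """Query one issue date at a time; map each distinct letter ID to its date.

    search_by_date is hypothetical: it stands in for the code that submits the
    FDA search form for a single day and scrapes the letter IDs it returns.
    """
    found: Dict[str, date] = {}
    for day in daterange(start, end):
        for letter_id in search_by_date(day):
            found.setdefault(letter_id, day)  # keep the first date a letter was seen
    return found


# Example with a fake search function (one made-up letter per weekday):
fake = lambda d: [f"wl-{d.isoformat()}"] if d.weekday() < 5 else []
print(len(crawl(fake, date(1996, 1, 1), date(1996, 1, 7))))  # 5: any 7-day span has 5 weekdays
```

Comparing the size of the resulting dictionary with the count in an older database is then a one-liner.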
Warning letters returned when searching by date are often inaccurate
If you use the FDA warning letter search engine to search for all warning letters issued from 11/1/1996 to 10/31/1997, you get 152 results.
If you expand your search by 1 day (11/1/96 - 11/1/97), you get 876 results. Here’s a hint: the FDA did not issue 724 warning letters on 11/1/97; it was a Saturday.
I don’t know why the first search yields only 152 results, but it’s clearly wrong. To be honest, given the errors throughout, I’m not confident in the 876 results returned by the 2nd search either: the FDAble database says there were 1,008 warning letters issued from 11/1/96 to 11/1/97.
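Both claims are easy to double-check with Python's standard `datetime` module; the two counts below are just the result totals quoted above.

```python
from datetime import date

narrow, wider = 152, 876        # results for 11/1/96-10/31/97 and for 11/1/96-11/1/97
extra_day = date(1997, 11, 1)   # the single day added to the search

print(extra_day.strftime("%A"))  # Saturday
print(wider - narrow)            # 724 letters attributed to that one day
```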
To summarize all 3 posts, we have the following problems with the FDA Warning Letters Search Engine:
Certain warning letters that were in the old database have vanished.
The # of warning letters returned when searching by date is often inaccurate.
Warning Letter Responses also appear to be missing from the new database.
Downloadable Results are presented as Excel files, but are actually HTML.
Warning Letter results return a maximum of 1,000 records, but this limit is not explicitly noted on the web-site.
I have an email in to FDA asking them to repair these problems and to notify others who may have been led astray.
This is only tangentially related to health informatics...unless you feel that public display of your credit card # is dangerous to your financial health.
Today, I used the FAX machine at my local public library. The FAX machine is run by a company called FAX24, and the instructions are pretty standard.
Pick up the phone on the FAX machine
Dial *3
Listen to the instructions
Enter your credit card number on the keypad
Enter your credit card expiration date on the keypad
Enter the destination FAX #
Add your sheets and press START
Works like a champ, and at the end the machine releases a small confirmation printout to tell you whether your transaction was OK or whether it failed.
Today was the first time I really looked at the printout.
There's my Credit Card # and expiration date prominently displayed.
I wonder how many people toss this confirmation printout into the trash on their way out of the library.
Is it just me, or does everyone think this is a major no-no?
But what happens when 2 people request results within the same second? This will probably never happen, but it's a bad idea to dynamically name files like this.
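The naming code in question isn't shown here, so assume (hypothetically) that each results file is named from the current time truncated to the second. Two requests landing in the same second then get the same file name, and one overwrites the other. A minimal Python sketch of the problem, plus one common fix: a random UUID from the standard `uuid` module.

```python
import uuid
from datetime import datetime


def timestamp_name(now: datetime) -> str:
    """Hypothetical naming scheme: one file name per second."""
    return now.strftime("results_%Y%m%d_%H%M%S.xls")


def unique_name() -> str:
    """Collision-resistant alternative: embed a random UUID4."""
    return f"results_{uuid.uuid4().hex}.xls"


# Two requests that arrive within the same second:
a = timestamp_name(datetime(2009, 5, 13, 10, 15, 0, 1))
b = timestamp_name(datetime(2009, 5, 13, 10, 15, 0, 999_999))
print(a == b)                           # True: the second request clobbers the first
print(unique_name() == unique_name())   # False (with overwhelming probability)
```

Another option is to let the operating system pick the name, e.g. with `tempfile.mkstemp`, which creates a new file whose name doesn't already exist.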
I posted earlier about creating a plugin for Mozilla's Firefox that allowed users to search the FDA and CDC websites and FDAble's search engines by right-clicking on a highlighted word.
Turns out that you can submit a plugin, but it's considered experimental until you've received reviews.
You also have to write a short justification of why your plugin is worthy of release to the public at large.
To use it:
1. Download the plugin.
2. If you're on a web page that contains a drug name or other health-related term, highlight the term.
3. Right-click on the highlighted term and choose whether you'd like to use it to search the FDA, CDC, or FDAble search engines.
FDA used to have a collection of web-pages that allowed you to search the warning letters and responses that it had issued to various food & drug scofflaws and ne’er-do-wells all the way back to 1996.
There were certainly some strange choices made in the old system. For one thing, they separated the “old” warning letters (those > 1 year old) from the new ones (<= 1 year old), and you had to use a separate search engine for each collection of reports.
With the new search engine, they’ve combined old and new so that both can be searched from one form. However, this appears to be the only thing that they got right with the upgrade.
Another peculiarity was that if you used the old system to download an Excel table of warning letters filtered by date, you got a CSV file that was mistakenly tagged with an .xls extension. This transgression is no big deal, as Excel will read a CSV easily even if it’s mis-tagged, but whoever built the new version seems to have taken the mislabeling one step further (see below). If you dig deeper into the web pages, you find all sorts of weirdness.
First, your searches are capped at 1,000 results no matter how many records actually match. The search form doesn’t say that it will only return the first 1,000 results, but it does. This initially led to confusion on my part, because I was trying to see if the system would retrieve all 9,000+ warning letters that should be in the system. It only returned 1,000.
This is a bit dangerous b/c if a user searches for all warning letters from 1996 to 2009, s/he may mistakenly conclude that only about 1,000 letters have been issued. What’s the deal? I seriously doubt they’re low on computing power.
The same holds true if you try to download an Excel table of the warning letters (you only get 1,000 results) no matter what you try.
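If you're stuck with the cap, one workaround is to treat any search that comes back with exactly 1,000 hits as possibly truncated and split the date range in half until every piece is under the limit. A sketch, assuming a hypothetical `search(start, end)` callable that queries an inclusive date range and returns at most `cap` letter IDs:

```python
from datetime import date, timedelta
from typing import Callable, List

CAP = 1000  # the undocumented maximum observed in the FDA search results


def fetch_all(search: Callable[[date, date], List[str]],
              start: date, end: date, cap: int = CAP) -> List[str]:
    """Return every letter in [start, end], bisecting any range that hits the cap.

    search is hypothetical: it stands in for one query of the FDA form.
    """
    results = search(start, end)
    if len(results) < cap:
        return results  # under the cap, so presumably complete
    if start == end:
        raise RuntimeError(f"{start}: a single day still hits the {cap}-result cap")
    mid = start + (end - start) // 2
    return (fetch_all(search, start, mid, cap)
            + fetch_all(search, mid + timedelta(days=1), end, cap))
```

A range that genuinely holds exactly 1,000 letters gets split too; that costs a couple of extra queries but never loses results.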
And here’s the really strange bit. Remember how I said that the old system delivered a CSV file that was mis-labeled as an xls file?
Well, the *new* system again lets you download what is ostensibly an Excel file, but it’s not an xls file. It’s also not a CSV file like the old system. And no, it’s not one of those new-fangled Microsoft Office 2007 xlsx files. It’s a file marked with an xls extension, but if you open it up with Notepad, you’ll find that it’s HTML!
Specifically, they’ve packaged the HTML table that is returned when a user searches their web interface for warning letters and passed it off as Excel. Why? I have no explanation, except sheer laziness.
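If you want to know what you actually downloaded without opening it in Notepad, the first few bytes usually tell you: legacy binary .xls files live inside an OLE2 compound document (which starts with the bytes D0 CF 11 E0 A1 B1 1A E1), and .xlsx files are ZIP archives (which start with "PK\x03\x04"). A rough Python sniffing sketch; the categories are heuristics, not a full format detector.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx and other ZIP-based formats


def sniff_spreadsheet(data: bytes) -> str:
    """Guess what a downloaded '.xls' file really contains from its first bytes."""
    if data.startswith(OLE2_MAGIC):
        return "xls (binary Excel)"
    if data.startswith(ZIP_MAGIC):
        return "xlsx (or another ZIP-based format)"
    head = data[:512].lstrip().lower()  # skip leading whitespace before any markup
    if head.startswith((b"<!doctype html", b"<html", b"<table", b"<?xml")):
        return "html/xml markup"
    return "plain text (possibly CSV)"
```

For example, `sniff_spreadsheet(open("warning_letters.xls", "rb").read())` (the file name is hypothetical) would be expected to report markup for a download like the one described above.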
Finally, this section of the FDA’s website is titled “Warning Letters and Responses” [emphasis mine], and there used to be a way to search the responses to the warning letters…and the downloadable 'csv' file would list the location of letters received by the FDA in response to its warning salvos. Neither appears to be available in the new system.
Also, they moved the URLs for all of the HTML versions of their warning letters, thereby breaking all of the FDAble warning letter links. It’s not like the FDA is legally obligated to inform me of these changes, but when they do stuff like this they end up breaking the links for anyone/everyone who has ever bothered to link to their warning letter data. (time for me to get back to work…).
The comment about Bristol still stands. The rest is now vanilla.
See for yourselves.
#Added for Bristol-Myers on Sept 2005
User-agent: vspider
Disallow: /
#For all other crawlers
User-agent: *
Disallow: /Management/ # don't crawl healthcheck
Hit-rate: 30 # wait 30 seconds before starting a new URL request default=30
Visiting-hours: 23:00EDT-05:00EDT #index this site between 11PM - 5AM EDT
Concurrent-hits: 2 # limit concurrent active URLS to 2 for each index server
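To see what a standards-following crawler makes of the new file, here's a quick check with Python's built-in `urllib.robotparser` (the user-agent names and URLs are just examples). The parser only understands the common directives; it doesn't recognize `Hit-rate`, `Visiting-hours` or `Concurrent-hits`, so it silently skips them.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
#Added for Bristol-Myers on Sept 2005
User-agent: vspider
Disallow: /
#For all other crawlers
User-agent: *
Disallow: /Management/ # don't crawl healthcheck
Hit-rate: 30 # wait 30 seconds before starting a new URL request default=30
Visiting-hours: 23:00EDT-05:00EDT #index this site between 11PM - 5AM EDT
Concurrent-hits: 2 # limit concurrent active URLS to 2 for each index server
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("vspider", "http://www.fda.gov/Drugs/"))           # False: vspider is banned site-wide
print(rp.can_fetch("Googlebot", "http://www.fda.gov/Drugs/"))         # True
print(rp.can_fetch("Googlebot", "http://www.fda.gov/Management/x"))   # False
print(rp.crawl_delay("Googlebot"))  # None: Hit-rate is not the Crawl-delay directive
```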
5 of the 9 feeds are actually useful (Consumer Health Info, MedWatch Safety Alerts, News Releases, Recalls, What's New @ CDRH).
The other 4 are borderline pointless. It's not that the topics are pointless. Hell, I'd be interested in actually getting a glimpse as to what's new @ CDER & CBER. But take a look at the Feeds. Enforcement Reports is simply a running list of when the FDA releases its weekly enforcement report, but there's no summary information provided in the feed itself. The content of each weekly report should be a feed itself.
FDA has a database of postmarket requirements and study commitments (translation: "we're going to approve your drugs, but you have to promise to run these additional trials after we approve").
This database is updated quarterly and FDA even provides its own search engine to mine this database.
I've only been indexing their collection of data for a couple of quarters, but with the most recent update (April 30), I noticed something that I hadn't realized before.
Specifically, each new quarter's data *wipes out* the previous quarter's data. For example, the January 31 update contained 3381 postmarket requirements that could be searched via FDA.
In contrast, the April 30 data release contains only 1910 requirements. So, it looks like the FDA removes commitments that have been satisfied (I'm not 100% sure about the precise commitments that are removed each quarter, but this seems like the most reasonable explanation).
In practice, this means that you can't use FDA's search-engine to get a historical look at the types of commitments/requirements that were mandated by the FDA.
I imagine that this historical information would be useful to many people (clinicians, persons performing competitive intelligence & strategy regarding drug development, etc.).
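Since each release replaces the previous one, the only way to get that history is to archive every quarterly file yourself and diff consecutive snapshots. A minimal sketch, assuming each requirement/commitment can be reduced to a stable identifier; the IDs below are invented for illustration, and I haven't confirmed which field in FDA's files would play that role.

```python
from typing import Dict, Iterable, Set


def diff_snapshots(previous: Iterable[str], current: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare the record IDs of two quarterly releases."""
    prev, curr = set(previous), set(current)
    return {
        "removed": prev - curr,  # present last quarter, gone now (satisfied? released?)
        "added": curr - prev,    # new this quarter
        "kept": prev & curr,
    }


# Quarter label -> set of IDs; in practice, persist every snapshot somewhere durable.
history = {
    "2009-01-31": {"PMR-001", "PMR-002", "PMR-003"},  # hypothetical IDs
    "2009-04-30": {"PMR-002", "PMR-004"},
}

changes = diff_snapshots(history["2009-01-31"], history["2009-04-30"])
print(sorted(changes["removed"]))  # ['PMR-001', 'PMR-003']
print(sorted(changes["added"]))    # ['PMR-004']
```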
Can you guys give me a heads-up next time you do something like this (;-))? It's like when my mother-in-law comes over and moves all the drinking glasses to a new cabinet.
At first glance the change appears to be largely cosmetic, but it's a nice touch--what with the drop-shadows and all.
Well, if I can critique the FDA, I sure as hell better be able to critique myself--and I blew it big time.
Cardinal Sin? Hard-wiring potentially variable data into an application (especially when you have no control over when the data is changed).
As my assiduous readers know (all 3 of them), I developed an iPhone app called FDA Mobile News. It's a glorified RSS reader for FDA news feeds.
The app contains the URLs for each RSS feed hard-wired into the application itself, meaning that if the FDA ever bothered to change those URLs, my app would no longer work.
Well, by my calculation, FDA had not changed its RSS URLs in over 1.5 years, but guess what happened 1 day after my app was approved and put on the app store? Yep, FDA changed the URLs for most of those feeds and got rid of some of them altogether.
I woke up on Saturday morning to find my app throwing more errors than a drunken Derek Jeter. Now, it will take a week (hopefully less) to get the new version out the door.
I knew I shouldn't have hard-wired those addresses into the app, but I figured "hey, what's the likelihood that the URLs will change before I get around to making each app 'phone-home' to a central repository where I can change the addresses at will?"
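The 'phone-home' approach doesn't have to be elaborate. Here's a rough sketch of it, written in Python rather than the app's own language for brevity: the app ships with default feed URLs, tries to download a small JSON file from a server you control, and lets that file override moved feeds or retire dead ones. `fetch_config` and the example.com URLs are placeholders, not real endpoints.

```python
import json
from typing import Callable, Dict

# Bundled defaults compiled into the app. Placeholder URLs, not FDA's real feed addresses.
DEFAULT_FEEDS: Dict[str, str] = {
    "recalls": "http://example.com/feeds/recalls.xml",
    "medwatch": "http://example.com/feeds/medwatch.xml",
}


def load_feeds(fetch_config: Callable[[], str],
               defaults: Dict[str, str] = DEFAULT_FEEDS) -> Dict[str, str]:
    """Overlay a remotely hosted JSON feed list on the bundled defaults.

    fetch_config is a hypothetical callable that downloads the JSON text from a
    server you control. A URL string moves a feed, null retires it, and any
    network or parse failure falls back to the defaults.
    """
    feeds = dict(defaults)
    try:
        remote = json.loads(fetch_config())
    except (OSError, ValueError):  # URLError subclasses OSError; JSONDecodeError subclasses ValueError
        return feeds
    if not isinstance(remote, dict):
        return feeds
    for name, url in remote.items():
        if url is None:
            feeds.pop(name, None)  # feed retired upstream
        elif isinstance(url, str):
            feeds[name] = url      # feed moved (or added)
    return feeds
```

When the FDA moves things around, only the JSON file on the server changes; no new app submission is needed.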
"How long would I wait to do it the 'right way'?", you ask? As my wife pointed out...I was probably going to get caught up in other work and let it slide until it *did* become a problem--which in my case was all of 36 hours.
Like everyone else on the planet looking to score 99 cents (70 cents after commission), I developed an iPhone app: FDA Mobile News.
Basically it's a glorified RSS reader devoted to FDA's RSS feeds and I charge a whopping $0.00 for it (I get 70% of that--so who's the fool, now?).
It still has some problems that need to be worked out (as my only reviewer pointed out--someone who goes by the name of "Saddam Hussein's Lost Car Keys" [WTF?] it isn't particularly clear that the globe icon that I use is actually a hyperlink to the full RSS article).
Well, the way the App Store is these days, I wouldn't be surprised if my parents were toiling away in their basement crafting some "newfangled iAARP contraption for that iTelephone store where everybody gets rich"--70 cents at a time.
Sometimes my conscience gets to me. And lately I feel like I've been bashing the FDA data releases too much.
So today, I reverse course and list 2 IT-related thingies pushed out by the FDA that I like (sort of).
I'll go back to bashing the FDA later this week.
Thing # 1. The Peanut Product Recall Widget. This is a small Flash box (widget) that anyone can place onto a web-site. You just place a small snippet of code in your HTML and you get access to what looks like *all* FDA-related product recalls (not just the peanuts).
Why then, do they call it the Peanut Product Recall Widget? I have no idea, but bad branding isn't a felony.
The styling of the widget lies somewhere between "design by Stevie Wonder" and "absolutely horrendous", but as mentioned above--not...a...felony.
Thing #2. The MedWatch widget. Same principle applies as with the Product Recall Widget. Small bit of code and you get 4 links to MedWatch information. The utility of this widget is a tad suspect, as all it does is provide a collection of 4 hyperlinks. The Product Recall Widget has a search function, thereby vaulting it to a higher echelon in the pantheon of widgets.
But there you have it. 2 examples of the FDA reaching out on the informatics front and doing a good job.
A little bit of navel gazing today: I decided to plot the # of reports submitted to the AERS and VAERS systems vs. time. There are small bits of information in the graphs shown below that might warrant further examination.
e.g., The # of reports submitted to AERS has almost tripled from 1998 to 2008 (11 years).
Is this a big increase?
I can't really say with any certainty, but it feels like an inadequate increase given the concomitant increases in the use of information technology over that period (e.g., the interwebs).
[Aside: Wolfram Alpha is of no use in trying to find the change in internet usage over the years to serve as a comparison--okay, Wikipedia is telling me that internet usage in the developed world went up almost 6-fold from 1997 to 2007...so I think my feeling is probably correct.]
Also...what's going on with those dips in the # of reports in the early 2000s?
Note that 1997 only includes data for Q4 of that year so it's artificially low.
VAERS
As far as VAERS is concerned, it looks mostly like a flat line from 1991 to 2006 (okay, it doubled, but that's over 16 years), and then there's a HUUUUGE spike over the past 2 years.
Is VAERS being "marketed" better all of a sudden, or is it The Jenny McCarthy/ Autism effect? Or both?
FDA just released its Q4 2008 AERS data (yes, it’s May 13, but that’s another story altogether).
This post is about Quality Control of AERS data. It seems that every quarter, there’s some type of SNAFU with the data release (last year they released AERS data partially contaminated with a previous quarter's data).
This quarter, we have the classic newline-characters-where-they-don’t-belong error that's screwing up my AERS parser.
A little Background: FDA releases its data in 2 forms (ASCII and SGML). ASCII is the one that I use and each ASCII file consists of row after row after row of $-delimited Adverse Event Records.
Each row should represent one particular database record and my parser dutifully goes through each row extracting all the little bits of information between the dollar $ign$.
But with this latest quarterly release FDA released its Drug data file (aka DRUG08Q4.txt) with 4 significant quality control errors (see sample screenshot below). [For those who want gruesome details, the following lines in the DRUG.txt file contain errors: 537-538, 258909-258910, 281285-281286, 408948]
The gist of the issue is that whoever entered the data for these 4 drug-records forgot to remove the newline characters (“carriage returns”) and so the record is actually split across 2 or more lines.
While this doesn’t seem like a big deal, if your parser isn’t “smart” it could inadvertently stuff the wrong data into the wrong slots in your database.
And so, you have to design your parser to look for these types of errors--and then you have to have a human look at the problem just to assure yourself that there wasn’t a bigger error. This wastes time...especially when the file you’re looking at has 416,000 records.
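A minimal sketch of a "smarter" reader, assuming every well-formed row has the same number of $-separated fields (which you could take from the file's header row); the column names in the example are made up. It glues continuation lines back together until a row has the expected field count, and flags every repaired record for the human review described above.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[int, List[str], bool]  # (first line number, fields, was_split)


def iter_records(lines: Iterable[str], n_fields: int) -> Iterator[Record]:
    """Yield $-delimited records, rejoining rows broken by stray newlines."""
    buffer: Optional[str] = None
    start = 0
    pieces = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if buffer is None:
            buffer, start, pieces = line, lineno, 1
        else:
            buffer += " " + line  # the stray newline becomes a space
            pieces += 1
        fields = buffer.split("$")
        if len(fields) == n_fields:
            yield start, fields, pieces > 1  # pieces > 1 means "have a human check this one"
            buffer = None
        elif len(fields) > n_fields:
            raise ValueError(f"line {start}: {len(fields)} fields, expected {n_fields}")
    if buffer is not None:
        raise ValueError(f"line {start}: file ends in the middle of a record")


rows = ["ISR$DRUGNAME$ROLE\n", "1$aspirin$PS\n", "2$ibu\n", "profen$SS\n", "3$x$y\n"]
for first_line, fields, was_split in iter_records(rows, n_fields=3):
    print(first_line, fields, "CHECK ME" if was_split else "")
```

A break inside the *last* field can't be repaired reliably this way (the first fragment already looks complete), but it surfaces as a flagged record or a `ValueError` rather than silently shifting data into the wrong columns, which is one more reason to keep a human in the loop.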
‘t would be nicer if FDA did more quality control on their data releases.
Someone at FDA sent this to me recently (see below). It's an email sent around from within the FDA announcing a new Adverse Event Reporting system (sort of).
Emphases are mine. Editorial comments are mine, too. Spoiler alert: They forked over $$$$$ and went with Oracle.
Bioinformatics Board (BiB) News
Post Market Safety BRB Announces Selection of Vendor[We've decided to shop at Barney's. Can't tell you what we're buying just yet, but it's gonna be FAB!] for New Enterprise Adverse Event System
The Bioinformatics Board (BIB) News provides FDA employees with an update on the activities and progress of the BiB. This month’s news focuses on the Post Market Safety BRB and the selection of the Adverse Event System Vendor. For more information about the BiB or the contents of this message, please contact your BiB representative or email the Bioinformatics Board, bioinformatics@fda.hhs.gov.
Post Market Safety Business Review Board Announces Selection of Vendor for New Enterprise Adverse Event System FDA’s Bioinformatics Board and the Post Market Safety Business Review Board are pleased to announce a significant milestone for the Agency. After an extensive analysis of FDA scientific needs for the management and analysis of post market product safety reports, and an evaluation of leading industry tools, the FDA Adverse Event Reporting System (FAERS) Program has selected Oracle AERS[$$$] (Adverse Event Reporting System) as the new tool for FDA staff[guess who's not getting access to the data juuuust yet...?] to efficiently[you wouldn't believe what we were doing before] analyze post market safety reports in order to identify potential product safety problems.
Selection of the new FAERS tool represents a significant landmark for the Agency. With the selection of an advanced tool that all centers and offices can leverage, the Agency can now focus on implementing processes for sharing post market safety data across our product centers as well as advancing the science of post market report analysis. Achievement of this milestone represents the efforts of numerous individuals from across the Agency over several years[wow. just...wow]. While in many ways this is just a beginning, it is important to recognize our progress towards meeting the goal of providing modern tools necessary to address many post-market surveillance needs. Selection of Oracle AERS[What? no love for MySQL? They're about to be subsumed by Oracle, too, ya know.] positions the Agency for success in this complex and dynamic arena.
We expect the initial users in CDER and CBER to receive training and begin using the new system as early as fall 2009[but don't quote us on that] with additional users being trained and brought on over time. CDRH users are expected to begin using the tool in 2010, and additional centers and offices expected shortly thereafter.
3 non-collinear points define a plane...and 3 Tuesdays in a row is enough for me to jump to 2 conclusions, but I think the first conclusion is correct and the 2nd conclusion has a good chance of being right, too.
Conclusion 1. The FDA updates its warning letters database by hand every Tuesday morning.
Why do I think so? I have an automated ‘bot’ that fetches new FDA warning letters from FDA.gov daily, but recently (when I bothered paying attention to it) I noticed that most days it doesn’t retrieve any new warning letters.
Then I started paying attention, and for the past 3 weeks, my bot only fetches new letters on Tuesday mornings (EST).
Not Monday night (I checked).
Not Tuesday at 6 AM (I checked).
Only Tuesday mornings between ~9 and ~11AM.
So, this smells like a human who has a Tuesday morning to-do list. Task # 1? Push out last week’s warning letters.
Why can’t this be automated on a daily basis? Beats me.
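The "paying attention" part is easy to automate, too. A sketch that tallies when the bot first sees new letters, by weekday and hour; the three timestamps are made-up examples, not my actual log.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def arrival_pattern(first_seen: Iterable[datetime]) -> Counter:
    """Count the (weekday, hour) slots in which the bot first saw new warning letters."""
    return Counter((DAY_NAMES[ts.weekday()], ts.hour) for ts in first_seen)


# Made-up observations for illustration only: three consecutive Tuesdays.
log = [datetime(2009, 6, 2, 9, 40), datetime(2009, 6, 9, 10, 5), datetime(2009, 6, 16, 10, 50)]
print(arrival_pattern(log).most_common())  # [(('Tuesday', 10), 2), (('Tuesday', 9), 1)]
```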
Conclusion 2. Updating of clinical trial data @ clinicaltrials.gov is done by the same person/entity/thing at FDA. This conclusion is far more tenuous, but hear me out.
(admittedly weak) Reasoning? Every day, I have another bot that fetches new clinical trials data.
Every day, the bot finds updated data around 10AM--Except on Tuesdays when the new data show up around 11:30AM.
Tuesday’s task # 2? Push out new clinical trial data.
Here's something you won't see every day...it's the FDA's robots.txt file.
For the uninitiated, robots.txt is a small file placed on a web-site to indicate which pages on your website can be crawled by indexing robots (Googlebot, Yahoo! Slurp, etc.). It basically says "Hey Googlebot, you can index these pages, but stay away from those over there."
Here's the FDA's robots.txt file in its entirety--I've colored the parts that intrigue me.
#robots.txt file for http://www.fda.gov
#Added for Bristol-Myers on Sept 2005
User-agent: vspider
Disallow: /
#For all other crawlers
User-agent: *
Disallow: /scripts/
Disallow: /data/
Disallow: /binn/
Disallow: /cder/test/
Disallow: /opacom/area51/
Disallow: /oashi/aids/listserv/
Disallow: /cdrh/ftparea/cdrh/MDR/coll/mdr/mdrcoll/
Disallow: /foi/warning_letters/d1371b.pdf
Disallow: /foi/warning_letters/archive/
Hit-rate: 30 # wait 30 seconds before starting a new URL request default=30
Visiting-hours: 23:00EDT-05:00EDT #index this site between 11PM - 5AM EDT
Concurrent-hits: 2 # limit concurrent active URLS to 2 for each index server
1. What's the deal with Bristol Myers' request to ban vspider? And why did the FDA comply with the request? From what I can tell, vspider is a personal indexing robot that can be used by anyone to index a site. Curious in CT.
2. What's going on in area51 and why can't it be indexed? I tried to look at the contents and got a "denied" error...so perhaps it holds the medical records for the little green men in Nevada.
3. Why block indexing of one specific warning letter (d1371b.pdf)? If you try to go to fda.gov/foi/warning_letters/d1371b.pdf, you get a 404 (not found) error, but I have a copy from my own search engine. It's a pretty vanilla warning letter from 1998 sent to Trinity Chemical Corporation. Again, I'd love to hear the rationale behind this decision.
4. Why block indexing of the archived warning letters?
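For the curious, Disallow rules are simple path-prefix matches, which is why one line can hide a single PDF and another can hide an entire directory. A quick check with Python's built-in `urllib.robotparser`, using just the relevant rules from the file above (`g1234.htm` is a made-up letter name):

```python
from urllib.robotparser import RobotFileParser

# The relevant rules from the "all other crawlers" record above.
rules = """\
User-agent: *
Disallow: /opacom/area51/
Disallow: /foi/warning_letters/d1371b.pdf
Disallow: /foi/warning_letters/archive/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "http://www.fda.gov"
for path in ["/foi/warning_letters/d1371b.pdf",       # the one blocked letter
             "/foi/warning_letters/archive/any.htm",  # everything under archive/ (prefix match)
             "/opacom/area51/",
             "/foi/warning_letters/g1234.htm"]:       # hypothetical ordinary letter
    print(path, rp.can_fetch("Googlebot", base + path))
```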
Just tried to publish my Firefox add-on, titled FDA Search 0.1.0. What I didn't realize is that Firefox keeps your add-on in purgatory until some FF regulars actually review it, and only then can you nominate your add-on for inclusion in the public repository. Thanks to James Cook and his add-on, which I used as a template for mine.
The FDA (or is it the CDC? Or is it Health & Human Services? Can't really tell....) just updated its Vaccine Adverse Event Reporting System (VAERS) dataset (April 13, 2009).
What they haven't bothered to note is that they've also changed the structure of their dataset.
VAERS data used to come packaged as 2 CSV files--one called VAERSData.csv and one called VAERSVaccine.csv.
VAERSData.csv used to contain 20 slots per row for symptoms associated with each vaccine adverse event.
In the database world, one says that there is a 1-to-many relationship between a VAERS report and the Symptoms associated with that report.
And day 1 of any remedial Database management course will inform you that because of the 1-to-many relationship, you should separate the VAERS record from the Symptoms records and link each symptom back to its report via a foreign key.
The reason for doing this is that 20 slots for symptoms *may* seem like more than enough slots for any case that you'll ever come across, but at some point, some hypochondriac is going to slip in 21 symptoms and totally screw with your file.
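In practice, the 1-to-many design looks like this: one table for reports, one for symptoms, and a foreign key linking them. A minimal sketch using Python's built-in `sqlite3`; the table and column names are illustrative, not VAERS's actual file layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE report (vaers_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE symptom (
        vaers_id INTEGER NOT NULL REFERENCES report(vaers_id),
        symptom  TEXT    NOT NULL
    )""")


def add_report(vaers_id, symptoms):
    """Insert one report plus however many symptoms it has: 2, 20, or 21."""
    with conn:  # commit both inserts together, or roll back on error
        conn.execute("INSERT INTO report (vaers_id) VALUES (?)", (vaers_id,))
        conn.executemany("INSERT INTO symptom (vaers_id, symptom) VALUES (?, ?)",
                         [(vaers_id, s) for s in symptoms])


add_report(1, ["Headache", "Pyrexia"])
add_report(2, [f"Symptom {i}" for i in range(1, 22)])  # 21 symptoms, no schema change needed

for vaers_id, n in conn.execute(
        "SELECT vaers_id, COUNT(*) FROM symptom GROUP BY vaers_id ORDER BY vaers_id"):
    print(vaers_id, n)
```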
Well, 19 years later, the VAERS folks have finally given the Symptom data its own csv file, appropriately titled VAERSSymptoms.csv. They didn't bother to tell anyone that they did this...and I had to tinker with my parsing algorithm last night to adjust for the changes, but the discerning pharmacovigilantes among us were able to figure it out when we asked ourselves
Ah yes...lest I forget...as of this writing, the Zip file containing all of 2008's VAERS data is contaminated with 2009 data. I think that's just to keep everyone on their toes.