Medicine has a long history of bad ideas. Prior to the advent of antibiotics in the early 20th century, your town doctor was probably no better than your grandmother at medicine. Just about every ailment was treated with morphine. We had bloodletting. There were mysterious elixirs with cocaine in them. When you were short of breath, your doctor would hand you a cigarette to help you relax. Your local barber moonlighted as the town surgeon. It was a mess. We’ve come a long way.
If you were born between 1945 and 1965, there was an 80% chance you had a tonsillectomy. Surgeons and ENTs around the country had a steady revenue stream protecting children all across the US from recurrent pharyngitis. Or so they thought. Turns out it really didn’t help those kids. They continued to get infections. The surgery was unnecessary in most cases. So, it lost its mainstream appeal.
Two years ago, I came across a small study out of Australia that assigned patients with acute appendicitis to either surgery or conservative management with antibiotics. The group assigned to antibiotics was very ill due to other medical conditions. They were high risk for surgery. The idea was to avoid unnecessary surgery since it was dangerous in this group of patients. The study showed that antibiotics in the high-risk group worked fairly well.
However, it was a small study with very strict inclusion and exclusion criteria. It was not really applicable to a broad population. Well, some researchers decided to take it a step further and see if this strategy would work in a broad population of patients with uncomplicated acute appendicitis.
Here’s how they set it up.
This was a randomized, open-label, non-inferiority study. Compared to surgery, antibiotics are potentially safer, less burdensome, and less expensive for patients; therefore, the question suits a non-inferiority design. It is open label because it’s very difficult to conceal surgery. Sham surgical procedures can be harmful, and they are ineffective at dispelling the placebo effect. So, in the study everyone was aware which group the participants were in.
The primary endpoint was the change on a previously validated quality-of-life questionnaire, assessed on days 7, 14, and 30, with the 30-day score as the primary outcome. A subgroup analysis was performed on patients with an appendicolith. An appendicolith is a large calcium deposit within the appendix. Patients with appendicoliths have a higher complication rate from acute appendicitis.
Patients were eligible if they were adults, English or Spanish speaking, with evidence of acute appendicitis on CT scan.
There was a long list of exclusion criteria: 1) septic shock, 2) diffuse peritonitis, 3) recurrent appendicitis, 4) severe phlegmon, 5) walled off abscess, 6) if a more extensive surgery than appendectomy was recommended, 7) free air, 8) free fluid, and 9) evidence suggestive of neoplasm.
The non-inferiority margin was +/- 0.09 on the EQ-5D health status score. The study was powered to detect a 0.05 difference in score. The EQ-5D looks like a thermometer on a page. A score of 1 is perfect health. A score of 0 would be equivalent to dead. The patient rates their current health status with a number between 0 and 1. You can think of it like a percentage.
Here’s what they found.
A total of 776 patients participated in each group of the study. An appendicolith was found in 27% of participants.
96% of the appendectomies were performed laparoscopically.
In the intention-to-treat analysis, the mean difference in the 30-day score was 0.01 (CI -0.001 to 0.03).
In the per protocol analysis the difference was the same.
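To make the non-inferiority logic concrete, here is a minimal sketch of the check using the margin and interval quoted above. It assumes the difference is calculated as antibiotics minus appendectomy and that a higher score means better health; the function name is my own and is not taken from the study.

```python
def is_non_inferior(ci_lower: float, margin: float) -> bool:
    """Return True when the lower confidence bound stays above -margin.

    Assumption: the difference is (antibiotics - appendectomy) and
    higher scores mean better health.
    """
    return ci_lower > -margin


MARGIN = 0.09      # non-inferiority margin quoted in this post
CI_LOWER = -0.001  # lower bound of the interval for the 30-day difference
CI_UPPER = 0.03    # upper bound of the same interval

# Non-inferior: the worst plausible difference is still inside the margin.
non_inferior = is_non_inferior(CI_LOWER, MARGIN)

# Superior: the whole interval would have to sit above zero.
superior = CI_LOWER > 0

print("non-inferior:", non_inferior)
print("superior:", superior)
```

Under these assumptions the result clears the non-inferiority bar but does not show superiority, because the interval still includes zero.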
(In the intention-to-treat analysis, patients are analyzed in the group they were assigned to by randomization. In the per-protocol analysis, patients are analyzed based on the treatment they eventually received, in this case an appendectomy. Intention to treat is important because it preserves the randomization of the groups. Randomization equalizes prognosis at the beginning of the study, which is essential for a fair comparison between the groups. When participants are analyzed per protocol, randomization no longer holds, which introduces bias. The authors ran the per-protocol analysis because so many participants eventually got an appendectomy.)
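A toy example may help show how the two groupings can disagree. The six patients and their scores below are invented for illustration and are not study data; the second grouping follows the definition given above (patients grouped by the treatment they eventually received).

```python
from statistics import mean

# Hypothetical patients: (assigned arm, treatment received, 30-day score).
patients = [
    ("antibiotics", "antibiotics", 0.95),
    ("antibiotics", "antibiotics", 0.90),
    ("antibiotics", "appendectomy", 0.80),  # crossed over to surgery
    ("appendectomy", "appendectomy", 0.90),
    ("appendectomy", "appendectomy", 0.92),
    ("appendectomy", "appendectomy", 0.88),
]

ASSIGNED, RECEIVED, SCORE = 0, 1, 2


def mean_score(rows, column, arm):
    """Mean score of the rows whose `column` value equals `arm`."""
    return mean(row[SCORE] for row in rows if row[column] == arm)


def difference(rows, column):
    """Antibiotics minus appendectomy, grouping patients by `column`."""
    return (mean_score(rows, column, "antibiotics")
            - mean_score(rows, column, "appendectomy"))


# Grouped by randomized assignment (intention to treat).
itt_diff = difference(patients, ASSIGNED)

# Grouped by the treatment each patient actually received.
received_diff = difference(patients, RECEIVED)

print("intention to treat:", round(itt_diff, 3))
print("by treatment received:", round(received_diff, 3))
```

In this made-up data the one crossover patient is enough to flip the sign of the difference: the low score counts against antibiotics in the first grouping and against surgery in the second.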
29% of participants in the antibiotics group ended up having an appendectomy within 30 days of enrollment.
The 90-day incidence of appendectomy was 41% in the appendicolith subgroup and 25% in the non-appendicolith subgroup.
24% of patients in the antibiotic group required rehospitalization after initial discharge.
There were no deaths. However, serious adverse events were more common in the antibiotic group and closely correlated with the presence of an appendicolith.
Conclusions
I found this study very difficult to appraise for a number of reasons. The authors concluded that antibiotics were non-inferior to appendectomy based on the population they chose to investigate. Based on their results this is a true statement, but is it “really” true? Does it tell the full story? Or is it a product of study design rather than reality?
Even though all the participants ended up in the same spot on the quality-of-life scale, they took different routes to get there. This was evident from the 29% of those assigned to the antibiotics group who eventually needed an appendectomy, and the 24% who required repeat hospitalization after discharge. The illustration below demonstrates my point. The result captures the end, but the experience is quite different.
Appendicitis rarely results in a chronic disease state. Thirty days after the episode, most patients are the same as they were the day before the infection. This explains the similar results in the intention-to-treat and per-protocol analyses. Thirty days is too long. The storm has passed in both groups.
The real question is…what is the appropriate statistical approach for those participants in the antibiotic group that eventually underwent appendectomy? In this study, they simply imputed the final questionnaire score as the result. This reflects only the end, not the path. In my opinion, the need for an appendectomy is a failure, and it should be entered on the spreadsheet as such: a big fat 0. The treatment failed. That would be more reflective of the path. It is unlikely their result would have been above the non-inferiority threshold in this scenario.
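Here is a rough back-of-the-envelope version of that argument. The 29% failure rate and the 0.09 margin come from the numbers above; the two group means are hypothetical placeholders I picked to match the reported 0.01 difference, and I assume the patients who went on to surgery had the same average observed score as the rest of their group.

```python
MARGIN = 0.09            # non-inferiority margin quoted in this post
FAILURE_RATE = 0.29      # antibiotics patients who had an appendectomy by day 30

# Hypothetical 30-day means, chosen only to match the 0.01 difference.
SURGERY_MEAN = 0.90
ANTIBIOTICS_MEAN = 0.91


def mean_with_failures_as_zero(observed_mean: float, failure_rate: float) -> float:
    """Group mean after re-scoring every treatment failure as 0.

    Assumption: failures and non-failures had the same average observed
    score, so the remaining patients keep `observed_mean`.
    """
    return (1 - failure_rate) * observed_mean


adjusted_mean = mean_with_failures_as_zero(ANTIBIOTICS_MEAN, FAILURE_RATE)
adjusted_diff = adjusted_mean - SURGERY_MEAN

# A full analysis would compare the lower confidence bound to the margin;
# here the point estimate alone is already well past it.
still_non_inferior = adjusted_diff > -MARGIN

print("adjusted antibiotics mean:", round(adjusted_mean, 4))
print("adjusted difference:", round(adjusted_diff, 4))
print("within margin:", still_non_inferior)
```

With these placeholder means the adjusted difference lands far below the margin, and the same holds for any plausible pair of starting means, since zeroing out 29% of a group drags its average down by roughly 29%.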
Another option would be to take the lowest number on the scale recorded at 7, 14, and 30 days, or the lowest daily number between days 7 and 30. This would better reflect the downside of each treatment. Non-inferiority is an attempt to reduce the downside, not increase the upside. Superiority is for the upside. If the downsides are the same, but 70% of patients avoid surgery (reduced burden, side effects, and cost), then there may be some validity to this approach. But we don’t know that based on this study. In fact, the downside seems worse for those given antibiotics because their disease state may have been unnecessarily prolonged.
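As a sketch of the lowest-score idea, the snippet below takes each patient's worst recorded score across the three visits. The two patients and their scores are invented for illustration.

```python
# Hypothetical scores keyed by study day.
scores = {
    "patient_a": {7: 0.60, 14: 0.85, 30: 0.95},  # rough first week, then recovery
    "patient_b": {7: 0.80, 14: 0.40, 30: 0.95},  # setback around day 14
}


def worst_score(visits: dict) -> float:
    """Lowest score recorded across all visits for one patient."""
    return min(visits.values())


# The day-30 value is what the primary outcome captures.
final_scores = {patient: visits[30] for patient, visits in scores.items()}

# The lowest value captures the downside along the way.
nadirs = {patient: worst_score(visits) for patient, visits in scores.items()}

print("day-30 scores:", final_scores)
print("lowest scores:", nadirs)
```

Both made-up patients finish at the same day-30 score, yet their lowest scores differ, which is exactly the information a 30-day endpoint hides.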
The other issue has to do with the long list of exclusion criteria. It’s difficult to remember a few exclusion criteria in a clinical setting. Increase that number to 9? “Forget about it!” The antibiotics-only approach does not appear to be safe enough to be the default option.
This is my final take. The authors’ statement is true. But it doesn’t tell the whole story. It’s likely the statement would no longer be true if you consider the need for appendectomy after antibiotics as a treatment failure (which it is). The entire goal was to avoid the need for appendectomies in uncomplicated cases. The result, in my opinion, is due to favorable data imputation and a primary endpoint date beyond the typical timeframe for capturing patient burden and complications. For now, it looks like the appendectomy is here to stay.