(Image courtesy of CalTech)
In the final chapter of the book Dr. Feynman describes an experiment by a psychologist in 1937 attempting to teach rats to go to a certain door within a maze. It didn't matter where he set them down in the maze, they always ran to the door that previously had food behind it. He thought it was the lighting that gave the previous door away, so he covered the habitat. No change. He thought it may be the color of the door, so he painted all the doors the same color. No change. He thought the rats could smell the food, so he covered the food in chemicals to abate the smell. No change. Every one of his assumptions was wrong.
Eventually he learned that the rats could tell where the previous door was by the sound the floor made when they ran over it. The experimenter covered the maze in sand. The rats were confused and no longer ran to the door that previously held the food.
This was a rigorous attempt to explain rat behavior. This simple experiment generated four different hypotheses. Many of his assumptions were proven wrong. The conditions had to be changed multiple times to get to the answer.
Dr. Feynman comments in the book, “We don’t teach this degree of rigor anymore. We just expect researchers to learn it by osmosis.”
Psychologists run all kinds of experiments on rats as a surrogate to explain human behavior. B.F. Skinner’s work on operant conditioning is probably the most famous. Dr. Feynman mentions in his book how rarely the work above is cited. The reason: it essentially undermines most evidence in experiments attempting to understand animal behavior.
You may be wondering why I’m talking about this and why Dr. Feynman concluded his book with this anecdote. The answer is primarily a concern that modern science is drifting away from this type of rigor. There is a growing tendency to assume a theory is right instead of working to prove it's wrong. Experiments should be crafted to disprove a working hypothesis, not to confirm it.
In general, it's more likely that you misjudged the natural world around you than discovered some truth about it.
Experiments that interrogate massive databases and utilize varying endpoints, surrogates, statistical correction for confounders, select populations, subgroup analyses, etc. are becoming more commonplace. They may reveal a result that is statistically significant, but given their size and spin the result may be the equivalent of a rounding error. Endpoints like “events per million” or “events per 1,000 person-years” are difficult to translate to the world around us. Yet these are regular occurrences in research papers. Despite this difficulty in translation, the interventions they support spread quickly in the real world.
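To make the "rounding error" point concrete, here is a minimal sketch in Python. Every number in it is hypothetical and invented purely for illustration; none comes from a real study. It runs a standard pooled two-proportion z-test on two made-up groups of one million people each, then restates the same result as an absolute risk difference and a number needed to treat.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: events per million in each arm (illustration only).
n = 1_000_000
control_events = 1_000
treated_events = 850

z, p = two_proportion_z_test(control_events, n, treated_events, n)

relative_reduction = 1 - treated_events / control_events
risk_diff = control_events / n - treated_events / n  # absolute difference
nnt = 1 / risk_diff  # people treated to prevent one event

print(f"z = {z:.2f}, p = {p:.5f}")
print(f"relative risk reduction: {relative_reduction:.0%}")
print(f"absolute risk difference: {risk_diff * 100:.3f} percentage points")
print(f"number needed to treat: {nnt:.0f}")
```

With these made-up inputs the p-value lands below 0.001 and the relative reduction is 15%, which would make a fine headline. The absolute difference is 0.015 percentage points, or roughly 6,667 people treated to prevent one event. The result is the same either way; how impressive it sounds depends on which number gets reported.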
But a good appraiser of evidence understands that few things are certain. Most results come with varying degrees of certainty and relevance. This is as true in medicine as in any other field. Statistics allow us to place the certainty of a result somewhere underneath a bell curve. We use p-values and confidence intervals to statistically express our degree of confidence, and most times it's just slightly better than breaking even. Therefore, much of high-quality medicine is the blending of data interpretation and experience. However, the information age is making the profession far more analytical. The data we rely on is proliferating at an incredible rate and losing some of its rigor.
As an example, I read an observational study the other day concerning the maternal effects of COVID-19 on unwanted pregnancy outcomes. The study concluded that there was a strong association between COVID-19 infection and unwanted pregnancy outcomes like cesarean section delivery, pre-term birth, and maternal death.
If you are like most doctors, you stop there. You really don’t go beyond the abstract. The fact is this study was terribly flawed. The data was extracted from a large database. Most positive cases were Hispanic, overweight, hypertensive, and of lower socioeconomic status (all bad when it comes to COVID). The authors make very little comment on these disparities, other than “correction for individual confounders did not significantly change the result.” (Whatever their definition of significant is.) Since there is no great way to determine in the aggregate if these disparities played a role, I guess we should just take their word for it. Also, the data was extracted before the Delta and Omicron variants emerged. The authors' statement on that: “We do not expect these variants to have a significant impact on our result.” Are you serious?
You’re probably thinking this was published in some obscure OB/Gyn journal. Nope, it was published in JAMA. JAMA!
It even made it to NEJM Journal Watch.
Still waiting on osmosis, I guess.
My daughter recently participated in a science fair at school. It was a requirement for her 4th grade class. I was excited for this project. She and I were talking about it one day in the kitchen. I was emptying the dishwasher and there were all these different water bottles. I asked her, “Nora, how do you know which water bottle is the best at keeping your water hot or cold?” She didn’t know, so we designed an experiment.
We discussed topics like variables, measurements, controls, data collection, graphic display of data, and limitations of our experiment. We ran the experiment, wrote down our results and conclusions, and placed them on a cardboard display for the science fair. On the day of the science fair I asked her if she liked some of the other experiments. Come to find out there really weren’t any. Most of the presentations were constructions of some kind. Someone made a “water clock.” One kid made a catapult. I was quite surprised.
Apparently, there was a packet emailed to the parents about the different categories for the fair. I should have read it. Had I read the packet, I would’ve discovered there was little focus on experimental design. Most of the categories were around inventing or building something. There were awards for the environment, social inequality, teaching, mental health, etc. There was one award for scientific design. Which we won! Probably because we were the only team that designed an experiment.
Again, I guess we assume scientific rigor will be learned by osmosis.
Experimental trial and error is how discovery happens. It's where knowledge comes from. In my humble opinion, it’s probably a good thing to learn that at an early age. It sure is better than assuming you’re right all the time. The skills learned in the practice of the scientific method extend far beyond science. Attorneys, mathematicians, engineers, economists, architects, and politicians all have to test and retest their theories until they are impenetrable. Teach them to develop arguments made from steel, not straw, and the world will be better off.
Finally, I leave you with the final passage of the book, which I found to be an aspirational sendoff:
“So I have just one wish for you – the good luck to be somewhere where you are free to maintain the kind of integrity I have described, and where you do not feel forced by a need to maintain your position in the organization, or financial support, or so on, to lose your integrity. May you have that freedom.”