Research has found that up to 90 percent of all spreadsheets have errors that affect their results. I’ve written about the Harvard professors whose basic Excel mistakes led to untold misery for millions of people.

If you search for spreadsheet errors, you’ll get lots of stories, other than the Harvard fiasco.

There are many more, but you get the picture. Errors are so common that there’s a 15-year-old spreadsheet risks interest group.

The dangers

But a recent paper, A Pilot Study Exploring Spreadsheet Risk in Scientific Research, focuses on the risks in neuroscience:

. . . research shows that almost all spreadsheet modellers have no formal training. . . . [M]ost errors do not arise from mistakes in programming the spreadsheet, rather they arise from the misapplication of programming logic. . . .[O]nce committed, logic errors are difficult to find and correct.

The pilot study has a small sample size — it’s a pilot, after all — and looks at neuroscience research at one university, yet the conditions it found are common.

Guardrails, not straightjackets

Even software produced by professionals is buggy. So no surprise when spreadsheet logic created by untrained amateurs is worse.

The question is: What can be done so that casual users — and Harvard professors — will make fewer mistakes?

The problem resembles grammar checking. Language has few hard rules and usage varies widely, so grammar checkers are imprecise, offering suggestions, not answers. Spell checkers are much more precise because spelling is well-defined.

No word from Microsoft

I reached out to Microsoft to get their thoughts on making Excel safer. After an initial acknowledgement and some emails back and forth, nothing.

I’m sure the Excel product manager knows about these problems. If I were her, I’d go to Microsoft Research and try to shake loose some ideas.

Until that happy day, here’s some ideas I shook loose myself.

Data tagging would make a spreadsheet logic checker easier. Just as we specify different data types — dollars, dates, percents — the spreadsheet could optionally ask us to tag data sets, whatever their type, so that dataset range checking could be smarter. If data is important enough to type, it is already tagged, and important enough to remind users to include in formulas.

Dataset range checking seems like a feasible help. A grammar-check like flag — “It appears that some data points were omitted in this calculation” — would have helped.

Bounds checking is an error detection technique used in programming to ensure, for example, that an element in the array is present. In the Excel context, I’d use it in the programming sense, and the reverse, as a check to ensure that all tagged/typed data is included, and, if not, a friendly warning is issued.

Logic flowcharting is another programming tool that Excel could at least partially automate. Users would see a graphical representation of their spreadsheet logic.

Modeling languages typically enforce a separation between logic and data. Users create logic and then designate the data sets it operates on. This simplifies model checking. Excel could implement a form of this, suggesting separating logic and data once a spreadsheet crosses a complexity threshold.

The Storage Bits take

Excel dominates the spreadsheet market, so Microsoft’s attitude is understandable: Why spend money making it safer for users and the rest of society? It won’t expand the market.

When you’ve made a product that even Harvard professors can easily misuse — with disastrous results for millions of people — is maximizing profit your only responsibility?

Of course, Microsoft might reply, if spreadsheet errors are such a big problem, why don’t we hear about more of them? One reason is that since few people look for them, few are found. It takes a truly egregious error, such as the Harvard fiasco — which a smart grad student at a state school found, not anyone at Harvard — to get publicity.

The important errors are buried deep inside research papers, lab results, and economic models. These form the basis of our numerical knowledge of the real world. When that knowledge gives us unexpected results, who goes back to the spreadsheets to check the data?

Yes, users should get trained on whatever spreadsheet they use. But with Excel’s market dominance, Microsoft has a special responsibility to improve Excel’s usability and safety for everyday users.

Courteous comments welcome, of course.



Source link