Thoughts on dark software

I wrote a two-page white paper for a DOE workshop on software productivity for extreme-scale science. In this paper, I coin a new term (at least I think it is new!): dark software. I explain this concept below:

Scientific discovery is the result not of individual simulations but of complex end-to-end research processes. These processes frequently involve, for example, the ingest and analysis of simulation, experimental, and observational data; the invocation of simulations within larger design optimization and uncertainty quantification activities; validation through comparison of experimental and simulation data; and the dissemination of output data to communities for analysis. The software created and used by extreme-scale scientists must address all such tasks, and the productivity of those scientists will be determined by the sum of the times taken for all tasks.

But while the software used to perform simulations on extreme-scale computers is often carefully engineered, the software used for other tasks in the end-to-end workflow is typically not. Indeed, it often involves ad hoc scripts, one-off programs, and other non-scalable, non-shareable, and error-prone components. Scientists may not even think of this code as software, even though it consumes much time and energy. Thus, by analogy with dark matter in physics (the stuff that, while invisible, is hypothesized to account for a large part of the total mass in the universe), I term this code dark software. I believe that dark software accounts for a substantial fraction of the total “mass” of an extreme-scale project, as measured both in lines of code developed by individual scientists and in the time spent with that code during a project’s lifetime. I suggest that a program to improve software productivity for extreme-scale science must address dark software if it is to be impactful. I discuss where dark software arises in research and propose a research program to address associated challenges.

Let me know what you think.

Details on the publication:

Foster, Ian (2014): Dark software: Addressing the productivity challenges of extreme-scale science on-ramps and off-ramps. figshare.


2 comments on “Thoughts on dark software”
  1. Greg Wilson says:

    Scott Hanselman (Microsoft) wrote about “dark matter developers”: the ones who build code but don’t have blogs, post to Stack Overflow, etc. I wonder if there’s a correlation between that demographic and this kind of “invisible” software?

  2. James Whistler says:

    From the review of “The Woman Who Died a Lot”:

    “And then there’s ‘Dark Reading Matter,’ the focus of Thursday’s next adventure, the vast invisible world populated by the imaginary friends of the dead and containing the texts of unpublished and lost stories (Melville’s ‘The Isle of the Cross’ among them).”

    “As always, Fforde makes this wacky world perfectly plausible, elucidating Ffordian physics with just the right ratio of pseudoscientific jargon to punch lines. It’s a dazzling, heady brew of high concept and low humor, absurd antics with a tea-and-toast sensibility that will appeal to fans of Douglas Adams and P. G. Wodehouse alike. Fforde is ffantastic!”
    –Booklist (starred review)
