OpenSPL, the Open Spatial Programming Language

By Oskar Mencer, Michael J. Flynn and John P. Shen | March 12, 2014

Behind OpenSPL is the concern that the temporal computation model of multicore processors is reaching its limits in scalability, efficiency, and complexity, which calls for radical innovation in the design of systems meant to scale up. What does “spatial” mean here? How does OpenSPL work, and what is the future of this new open programming language? Let’s take a closer look.

Oskar Mencer – CEO, Maxeler Technologies / Imperial College London.
Michael J. Flynn – Professor, Stanford University, and Chairman Maxeler Technologies.
John P. Shen – Head of Nokia Research Center North America Lab.

OpenSPL is a novel initiative by CME Group, Chevron, Juniper and Maxeler Technologies to increase awareness and acceptance of computing in space rather than in time sequence. While the initiative is new, the concepts and ideas behind computing in space have been used in practice for a long time. In essence, designing an Application Specific Integrated Circuit (ASIC) is a form of computing in space, but one where writing the “program” takes many hundreds of engineers and several years of effort, even before considering the enormous costs. As a result, a single ASIC is now so expensive that it has to be able to execute all possible programs. Computing in space, on the other hand, brings the silicon substrate within reach of the trial-and-error programmer and allows us to adapt the spatial structure to the problem the substrate is solving.

Computing in space could be considered a generalization of decades of research (including the authors’ contributions) on systolic arrays, vector supercomputers, and a wide range of research projects such as Alan Huang’s MIT thesis on Computational Origami [1]. Going further back, to the late 1950s, IBM revolutionized hardware design with its Standard Modular System (SMS), which used standardized circuit cards that could be manufactured more quickly and more reliably than with the older custom approach. A key piece of SMS was the Automated Logic Design (ALD) sheet. The designer used a sheet with a 2D grid of blocks; each block specified a logic function (card) and its interconnection. Each interconnected block was entered on a punched card, enabling the design to be managed by computer. The “compiled” design checked the logic, updated signal information, and produced the wire routing for manufacturing. This automated design and manufacturing process led to the market dominance of the IBM 1401 and 7090, among other machines. SMS was a precursor to the SLT design system used in the System/360 and the age of the mainframe. OpenSPL-based systems gain a similar advantage by applying the lessons of manufacturing (and of early IBM systems), using assembly lines to build the results of computation.

Returning to 2014, what does OpenSPL mean in practice? Rather than describing a thread of instructions and duplicating that thread onto multiple processors to handle multiple streams of data (SIMD), OpenSPL requires the computation to be split into control flow and dataflow components. The dataflow component forms a network of arithmetic units (Fig. 1) through which data flows, just as materials flow through a network of assembly lines in a factory. Each arithmetic unit forwards the result of its operation directly to the next arithmetic unit, eliminating the need for most register and memory accesses.
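To make the assembly-line picture concrete, here is a minimal Python sketch that simulates a chain of arithmetic units clocked in lockstep. It is a software model under our own assumptions, not OpenSPL syntax: each unit is a plain function with a single output register, and the names `run_pipeline` and `Stage` are hypothetical. Once the line is full, one finished result leaves the last unit on every tick, and intermediate values pass from register to register rather than through memory.

```python
from typing import Callable, List, Optional

# Hypothetical model: an "arithmetic unit" is a function plus one output register.
Stage = Callable[[float], float]


def run_pipeline(stages: List[Stage], inputs: List[float]) -> List[float]:
    """Clock a chain of units in lockstep; values advance one unit per tick."""
    regs: List[Optional[float]] = [None] * len(stages)  # None marks an empty slot
    outputs: List[float] = []
    # Extra empty ticks at the end drain the values still inside the line.
    feed: List[Optional[float]] = list(inputs) + [None] * len(stages)
    for x in feed:
        # The last register holds a result finished on the previous tick.
        if regs[-1] is not None:
            outputs.append(regs[-1])
        # Update from the last unit backwards so each unit reads the value its
        # upstream neighbour produced on the previous tick (all units fire at once).
        for i in range(len(stages) - 1, 0, -1):
            upstream = regs[i - 1]
            regs[i] = stages[i](upstream) if upstream is not None else None
        regs[0] = stages[0](x) if x is not None else None
    return outputs


# Three units: double, add one, square.
units: List[Stage] = [lambda v: v * 2, lambda v: v + 1, lambda v: v * v]
print(run_pipeline(units, [1.0, 2.0, 3.0]))  # [9.0, 25.0, 49.0]
```

With three units, each input emerges three ticks after it enters, yet the line still accepts a new input on every tick; that overlap, not a faster individual operation, is where the throughput comes from.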

How does a spatial program work? Easy: describe a directed graph with sources and sinks, and connect the sources and sinks to memory and other I/O channels. The graph can be described in the syntax of any programming language, as illustrated in Listing 1. We could perhaps have called OpenSPL “computing with directed graphs”, but that seemed less elegant.
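The directed-graph view can be sketched in a few lines of Python. This is a hypothetical, sequential model rather than the OpenSPL code of Listing 1: the `Graph` class and its `source`, `op`, `sink`, and `run` methods are invented for illustration, and Python lists stand in for memory. A real substrate would stream all elements through the graph concurrently; here the graph is simply evaluated once per stream position to show what it computes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One node: (name, operation or None for a source, names of its input nodes).
Node = Tuple[str, Optional[Callable[..., float]], List[str]]


class Graph:
    """Hypothetical directed dataflow graph: sources -> operators -> sinks."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.sinks: Dict[str, str] = {}  # sink name -> node that feeds it

    def source(self, name: str) -> str:
        self.nodes.append((name, None, []))
        return name

    def op(self, name: str, fn: Callable[..., float], *inputs: str) -> str:
        self.nodes.append((name, fn, list(inputs)))
        return name

    def sink(self, name: str, node: str) -> None:
        self.sinks[name] = node

    def run(self, memory: Dict[str, List[float]]) -> Dict[str, List[float]]:
        """Bind sources to input arrays, push each stream position through the
        graph, and collect sink values into output arrays."""
        length = len(next(iter(memory.values())))
        out: Dict[str, List[float]] = {s: [] for s in self.sinks}
        for t in range(length):
            vals: Dict[str, float] = {}
            # Assumes each node was added after its inputs, so insertion
            # order is a valid topological order.
            for name, fn, ins in self.nodes:
                vals[name] = memory[name][t] if fn is None else fn(*(vals[i] for i in ins))
            for s, node in self.sinks.items():
                out[s].append(vals[node])
        return out


g = Graph()
x = g.source("x")
y = g.source("y")
prod = g.op("prod", lambda a, b: a * b, x, y)
total = g.op("total", lambda a, b: a + b, prod, x)
g.sink("result", total)
print(g.run({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]}))  # {'result': [5.0, 12.0, 21.0]}
```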

Here, SCS stands for Spatial Computing Substrate, which hints at the purpose of the OpenSPL initiative. Rather than pretend that a single program could compile optimally to different architectures and substrates, we define OpenSPL as a baseline from which members of the initiative can construct substrate-specific compilers and runtime implementations.

Listing 2 gives another kernel example, in which both candidates (x+1 and x-1) are computed simultaneously in space and only the selected value flows out via the SCSVar result. Another example that demonstrates OpenSPL’s expressiveness is the moving average kernel shown in Listing 3. Here, stream offsets give access to values in the stream relative to the current position, and the arithmetic uses application-specific floating-point numbers with a 7-bit exponent and a 17-bit mantissa.
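The behavior of the two kernels described above can be modeled in plain Python. This is a sketch under stated assumptions, not the code of Listings 2 and 3: the selection condition `x > 10`, the three-point window, and the edge handling (averaging only the neighbors that exist) are our own choices, the helper names `select_kernel`, `offset`, and `moving_average` are hypothetical, and Python’s double-precision floats stand in for the custom 7-bit-exponent, 17-bit-mantissa format.

```python
from typing import List, Optional


def select_kernel(xs: List[int]) -> List[int]:
    """Both candidates exist in space; a multiplexer picks one per element.
    The condition x > 10 is an assumption for illustration."""
    out = []
    for x in xs:
        plus_one, minus_one = x + 1, x - 1  # both computed every time
        out.append(plus_one if x > 10 else minus_one)  # only one flows out
    return out


def offset(stream: List[float], t: int, k: int) -> Optional[float]:
    """Hypothetical stream offset: the value k positions from t, or None off the ends."""
    i = t + k
    return stream[i] if 0 <= i < len(stream) else None


def moving_average(xs: List[float]) -> List[float]:
    """Three-point moving average; at the edges, average the values that exist."""
    out = []
    for t in range(len(xs)):
        window = [v for v in (offset(xs, t, -1), xs[t], offset(xs, t, 1)) if v is not None]
        out.append(sum(window) / len(window))
    return out


print(select_kernel([5, 20]))                # [4, 21]
print(moving_average([1.0, 2.0, 3.0, 4.0]))  # [1.5, 2.0, 3.0, 3.5]
```

On a spatial substrate the offsets would be realized as short delay buffers on the stream, so the kernel never indexes an array in memory; the list indexing here only imitates that relative access.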

The most frequent question at this point is: what about conditionals? The answer is that conditionals, or “if” statements, are not all created equal. There are really three types of conditionals that any programmer can distinguish but that are hard for machines or compilers to reverse engineer from the code: (1) conditional assignment, (2) data forks, and (3) separate global paths through a program. Computing in Space has a separate, natural mechanism for each of the three: (1) is a simple multiplexor that lets the programmer select between two data producers; (2) is a fork within one graph, or a fork driving data to two different graphs at the same time; and (3) requires a bit of work to separate the different global paths into separate code and implementations. The code reorganization required by (3) is the most unpleasant part for the modern programmer, especially one trained in the art of C++. Luckily, global paths only have to be untangled in the parts of the program that take most of the execution time, which are typically small.
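The three kinds of conditionals map onto three different structures, which the following Python sketch models with hypothetical helpers: `mux`, `fork`, `host_run`, the example kernels `scale` and `square`, and the path names are illustrative, not OpenSPL names. Streams are plain lists, and for type (3) the choice between global paths is made once by host-side control flow, outside any dataflow graph, which is one plausible reading of separating global paths into separate code and implementations.

```python
from typing import Callable, Dict, List

Stream = List[float]
Kernel = Callable[[Stream], Stream]


def mux(select: List[bool], a: Stream, b: Stream) -> Stream:
    """(1) Conditional assignment: pick between two producers element by element."""
    return [ai if s else bi for s, ai, bi in zip(select, a, b)]


def fork(xs: Stream, *graphs: Kernel) -> List[Stream]:
    """(2) Data fork: the same stream feeds several graphs (concurrently on a
    substrate; one after another in this model)."""
    return [g(xs) for g in graphs]


def scale(xs: Stream) -> Stream:
    return [2 * x for x in xs]


def square(xs: Stream) -> Stream:
    return [x * x for x in xs]


# (3) Separate global paths: each path becomes its own kernel, and host-side
# control flow decides once which kernel to run (hypothetical path names).
KERNELS: Dict[str, Kernel] = {"path_a": scale, "path_b": square}


def host_run(path: str, xs: Stream) -> Stream:
    return KERNELS[path](xs)


print(mux([True, False], [1.0, 2.0], [10.0, 20.0]))  # [1.0, 20.0]
print(fork([1.0, 2.0], scale, square))               # [[2.0, 4.0], [1.0, 4.0]]
print(host_run("path_b", [3.0]))                     # [9.0]
```

The design point is where each decision lives: (1) and (2) stay inside the dataflow graph and cost only a multiplexor or extra wiring, while (3) moves the branch out of the graph so that each kernel remains a straight assembly line.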

Now that we can program in 2D space, we need actual computers to run these programs on. The standard describes the concept of spatial computing substrates: architectures that support the spatial computing paradigm. While the standard is in its infancy and is still being refined, a first commercially available substrate already exists: Multiscale Dataflow Computing by Maxeler Technologies. With a range of server, networking and, soon, storage products for high-end enterprise computing, this substrate can be applied to a wide range of application domains and computing activities. Multiscale Dataflow Computing already comes with simulators, debuggers, and a university program that has brought the technology to over 100 universities worldwide. As the standard develops, many new substrates for Computing in Space could appear. Such substrates could also bring an order-of-magnitude reduction in power consumption and physical device size to a wide range of additional domains, such as wearable, embedded, mobile, and smart dust computing.

One promising domain is Mobile Supercomputing. Leveraging the OpenSPL standard and a new substrate for real-time sensing and processing, mobile supercomputing systems with an order-of-magnitude reduction in both power consumption and physical system size can become feasible. Such systems will become essential in the emerging Mobile Computing Universe.

The emerging Mobile Computing Universe consists of the cloud infrastructure, personal mobile devices, and embedded environmental sensors (the “Internet of Things”). Exascale supercomputing infrastructure will be needed to process massive amounts of mobile data in real time and to perform the deep analysis and inference that extract value from such big data. However, given the massive, global scale of this universe, centralized cloud infrastructure is not the most efficient way to support such real-time supercomputing. Distributing Petascale mobile supercomputers to the edge of the cloud is much more efficient and provides much better service.

The OpenSPL standard, based on the dataflow computing model and running on an appropriate substrate supported by powerful software tools, offers the potential for a two-orders-of-magnitude improvement in performance per watt, and the opportunity to build mobile supercomputer systems that can be widely deployed without special machine rooms and their supporting infrastructure. This creates an opportunity to bring power- and energy-efficient supercomputing to the mobile mass market.

[References]

[1] “The folding of circuits and systems,” Applied Optics, Vol. 31, Issue 26, pp. 5419-5422, 1992.