Project: Tools & libraries for embedded Video content Analysis and understanding on Manycore and heterogeneous Multicore Platforms.

For most of the history of silicon-based computing, the relentless scaling of silicon technology has led to ever faster sequential computing machines, with higher clock rates and more sophisticated internal architectures exploiting the improvements in silicon integration density. Backward-compatible processor designs ensured that SW remained portable to new machines, enjoying scaling-induced performance benefits.
In recent years this has ceased to be the case. In spite of continued scaling of silicon technology, power dissipation and power density have become strong limiters. Rather than building more complex processors, manufacturers have used the transistor budget to build more processors and diverse HW accelerators onto a single chip, and heterogeneous multi-core machines are now the most effective solution in terms of computation capabilities per dissipated power in a wide range of computing applications. As a result, the performance gains of modern computing systems are primarily due to an increase in the available parallelism. In this general context, the project will focus on the STM STHORM SoC (formerly P2012), a scalable many-core fabric targeting extreme energy efficiency in aggressively scaled CMOS technologies. The first STHORM SoC prototype in 28 nm CMOS will sample in Q4 2012, featuring four clusters of 16 processors and delivering 80 GOPS in 15.2 mm² with 2 W power consumption.
A many-core platform like STHORM poses a challenge to SW engineering and implementation methodology in general. Whereas sequential SW could be kept almost as it was and would automatically execute faster on a faster processor, meeting the performance requirements of an application on a new platform that provides more parallelism depends on the ability to exploit that parallelism effectively, i.e. to parallelize the application and match it efficiently to the respective computing substrate.
For well-behaved application SW to scale and map efficiently (i.e. according to application requirements) onto massively parallel computing platforms, the application SW must be developed so that it:
- exposes as much of the application's parallelism as practical,
- provides simple and natural abstractions that help manage the high degree of parallelism and permit principled composition of, and interaction between, modules in order to explore and find efficient matches with the platform,
- makes minimal assumptions about the low-level architecture details of the computing machine it is implemented on,
- is efficiently implementable on a wide range of architectures, from sequential processors to shared-memory multicores, manycores, and programmable logic devices, as well as combinations thereof.
This is not a trivial proposition, because in many respects it is the antithesis of the current approach to mapping SW onto parallel platforms. It implies that the current body of software will not by itself be efficiently portable, but has to be rewritten to take advantage of the parallel performance of new architectures. The current practice of parallelizing sequential code, manually or automatically, in an attempt to adapt it to a parallel implementation target therefore needs to evolve towards more appropriate and effective approaches.
For this purpose, the STHORM family provides an actors dataflow based Native Programming Model that makes it possible to adopt such a more appropriate approach.
In addition, it not only enables "HW native", and thus very efficient, dataflow support, but also makes application SW portable across the whole range of the flexible implementation fabric of the STHORM family, which spans from fully programmable homogeneous many-cores to heterogeneous many-core clusters integrating specific accelerators.
The VAMPA project intends to support the changes in SW methodology needed to program the STHORM family efficiently by providing customers with a complete high-level actors dataflow programming environment composed of development tools and libraries. Actors dataflow is chosen not only for the motivations and attractive features discussed above, but also because it naturally matches the NoC communication primitives, which yield efficient actors dataflow communication implementations, the key to efficient parallel processing.
The application market addressed is embedded video analysis and processing. The success of the STHORM processing platform on the marketplace depends not only on its theoretical processing power, extremely low power consumption, or low cost (intrinsic features), but equally or even more on the ecosystem of tool and library support that lets users exploit these intrinsic platform features in their applications.
To achieve these objectives, VAMPA will develop real-sized application examples in the video analysis and processing fields.
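The actors dataflow idea referred to above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the STHORM Native Programming Model, RVC-CAL, or Orcc output; the names `Actor`, `run`, `scale`, and `offset` are hypothetical and chosen only for this illustration. Each actor owns no shared state, reads tokens from its input FIFOs, and writes results to an output FIFO, so the only coupling between modules is the FIFO connections.

```python
from collections import deque


class Actor:
    """Illustrative actor: fires only when every input FIFO holds a token."""

    def __init__(self, name, func, inputs, output):
        self.name = name
        self.func = func
        self.inputs = inputs    # list of input FIFOs (deques)
        self.output = output    # single output FIFO (deque)

    def can_fire(self):
        # An empty deque is falsy, so this checks that no input is empty.
        return all(fifo for fifo in self.inputs)

    def fire(self):
        # Consume one token per input and produce one output token.
        tokens = [fifo.popleft() for fifo in self.inputs]
        self.output.append(self.func(*tokens))


def run(actors):
    """Naive sequential scheduler: fire ready actors until none is ready."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                fired = True


# Hypothetical two-stage pipeline: scale each sample, then add an offset.
source = deque([1, 2, 3])
middle = deque()
sink = deque()

pipeline = [
    Actor("scale", lambda x: x * 2, [source], middle),
    Actor("offset", lambda x: x + 1, [middle], sink),
]
run(pipeline)
result = list(sink)
```

Because the actors interact only through FIFOs, the sequential scheduler used here could in principle be replaced by one that places each actor on a different core without changing the actor code, which is the portability property the text attributes to the dataflow approach.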

Acronym VAMPA (Reference Number: 7678)
Duration 15/04/2013 - 30/06/2015
Project Topic The key for the success of manycore SoC and heterogeneous multicore platforms is the availability of tools and libraries that unlock their full potential, while enabling highly productive application development. VAMPA addresses this objective for the video analysis and processing market.
Project Results
(after finalisation)
Provide a full profiling tool-chain (source instrumentation, actors mapping) in the Orcc dataflow compiler.
Provide a test platform in Orcc for our HEVC decoder (including full simulation, actor-by-actor simulation, non-regression tests, performance tests).
Contribution to the RVC-CAL HEVC implementation: optimizations such as use of SSE instructions, alternative designs (YUV, frame-based, etc.), cache-efficient FIFOs, vectorization, etc.
Development of an innovative actor-based dynamic mapping for dataflow applications.
Creation of a simulator for RVC-HEVC execution on TTA embedded platforms.
Integration of the HEVC decoder in a final demonstrator that was assembled and presented during the Final PCC meeting.
Network Eurostars
Call Eurostars Cut-Off 9

Project partner

Number | Name | Role | Country
9 | Active Technologies S.R.L. | Partner | Italy
9 | AKAtech SA | Coordinator | Switzerland
9 | Alma Mater Studiorum - Università di Bologna, Dipartimento di Elettronica, Informatica e Sistemistica | Partner | Italy
9 | Ecole Polytechnique Federale de Lausanne | Partner | Switzerland
9 | INSA of RENNES, IETR laboratory | Partner | France
9 | NuLink SA | Partner | Switzerland
9 | RESONATE-MP4 SARL | Partner | France
9 | SELEA SRL | Partner | Italy
9 | STMicroelectronics Grenoble 2 SAS | Observer | France