We can significantly reduce the time required to realize designs if it is possible to find limits to the performance of an embedded system, solely based on high-level system specifications. For that purpose, we present in this paper the cprof profiler, which determines the number
...
We can significantly reduce the time required to realize designs if it is possible to find limits to the performance of an embedded system, solely based on high-level system specifications. For that purpose, we present in this paper the cprof profiler, which determines the number of clock cycles needed to execute a C-program in hardware. The cprof tool is based on the Clang compiler front-end to parse C-programs and to produce instrumented source code for the profiling. Using cprof, we determine a lower and upper bound limit for all 29 cases of the PolyBench/C benchmark suite. The lower and upper bound are determined using the absolute performance estimations assuming all statement are mapped onto the same processing resource and unbounded performance estimations assuming unlimited resources. We also compared the clock cycles found by cprof with RTL implementations for all 29 Polybench/C cases and found that cprof determines with 1.2% accuracy the correct number of clock cycles. It does this in a fraction of the time compared to the time needed to do a full RTL simulation.@en