ARM introduces a powerful new CPU core: Cortex-A77

ARM has presented its latest processor design, the Cortex-A77. Like last year's Cortex-A76, this core is designed for demanding workloads in smartphones and a wide range of other devices. With it, ARM aims to increase the number of instructions executed per clock cycle (IPC). Clock frequency and power consumption remain approximately at the level of the Cortex-A76.

ARM is currently pushing to raise the performance of its cores quickly. According to its roadmap, from the Cortex-A73 in 2016 to the Hercules design in 2020, the company intends to increase CPU performance 2.5-fold. The transitions from 16 nm to 10 nm and then to 7 nm have already allowed higher clock speeds, and combined with the Cortex-A75 and then Cortex-A76 architectures they have, by ARM's estimate, delivered a 1.8-fold performance increase to date. The Cortex-A77 core now adds about another 20 % at the same clock speed through higher IPC. A 2.5-fold increase by 2020 therefore looks quite realistic.
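As a rough check of these figures, the sketch below simply multiplies the quoted gains. Treating the gains as multiplicative is an assumption made here for illustration, and the variable names are hypothetical; only the numbers 1.8, 20 % and 2.5 come from ARM's estimates as quoted above.

```python
# Figures quoted in the article; combining them by multiplication is an assumption.
gain_so_far = 1.8     # Cortex-A73 -> Cortex-A76, per ARM's estimate
a77_ipc_gain = 1.20   # +20 % IPC at the same clock speed
target = 2.5          # ARM's stated goal for 2020

# Cumulative gain once the Cortex-A77 is included.
after_a77 = gain_so_far * a77_ipc_gain

# Factor a later design (e.g. Hercules) would still have to contribute.
remaining = target / after_a77

print(f"cumulative gain with A77: {after_a77:.2f}x")
print(f"still needed to reach {target}x: {remaining:.2f}x")
```

Under this simple model the A77 brings the cumulative gain to about 2.16x, so the following generation would need to add roughly 16 % from architecture, clock speed, or process improvements to reach the 2.5x target.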

Despite the 20 % IPC increase, ARM estimates that the A77's power consumption has not gone up. The trade-off is that the A77 core occupies about 17 % more die area than the A76 on the same process node, so the cost of a single core will rise slightly. For comparison with the industry leaders: AMD's Zen 2 achieved a 15 % IPC gain over Zen+, while IPC in Intel's cores has stayed at roughly the same level for years.
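The trade-off can be put into numbers with a short sketch. It assumes that performance scales directly with IPC at a fixed clock and that "not increased" means exactly equal power; both are simplifications, and the variable names are illustrative only.

```python
# Ratios relative to Cortex-A76, taken from the figures quoted above.
ipc_gain = 1.20      # +20 % IPC at the same clock speed
area_growth = 1.17   # about 17 % more die area on the same process
power_growth = 1.00  # assumption: "not increased" is read as exactly equal power

# Efficiency ratios relative to the A76 under these assumptions.
perf_per_area = ipc_gain / area_growth
perf_per_watt = ipc_gain / power_growth

print(f"performance per unit area: {perf_per_area:.3f}x")
print(f"performance per watt:      {perf_per_watt:.2f}x")
```

Read this way, almost all of the extra performance is paid for in silicon area: performance per unit area improves by only about 2.6 %, while performance per watt improves by the full 20 %.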

The out-of-order execution window has been enlarged by 25 %, to 160 entries, which lets the core extract more instruction-level parallelism. The Cortex-A76 already had a large branch target buffer (BTB), and in the Cortex-A77 it has grown by another 33 %, to 8K entries, which helps the branch prediction unit keep up with the larger number of instructions in flight.

An even more interesting innovation is an entirely new 1.5K-entry cache that stores macro-operations (MOPs) coming out of the decode stage. ARM processors decode the instructions of an application into macro-operations and then split these into micro-operations, which are passed to the execution core. The MOP cache reduces the cost of branch mispredictions and pipeline flushes, because macro-operations are kept in a separate structure and do not have to be decoded again, which increases the core's overall throughput. In some workloads the new block is an extremely useful complement to the standard instruction cache.
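The idea behind a decoded-operation cache can be illustrated with a toy model. This is only a sketch of the general caching principle, not a description of how the Cortex-A77 implements it: the `MopCache` class, the LRU replacement policy, the capacity, the addresses, and the `decode` stand-in are all hypothetical.

```python
from collections import OrderedDict


class MopCache:
    """Toy LRU cache mapping an instruction address to its decoded macro-ops."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, address, decode):
        # Hit: reuse the stored macro-ops and skip the decoder entirely.
        if address in self.entries:
            self.entries.move_to_end(address)
            self.hits += 1
            return self.entries[address]
        # Miss: run the decoder and remember the result for next time.
        self.misses += 1
        mops = decode(address)
        self.entries[address] = mops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return mops


def decode(address):
    # Hypothetical stand-in for the real decoder: one macro-op per instruction.
    return (f"mop@{address:#x}",)


cache = MopCache(capacity=8)
loop_body = [0x1000, 0x1004, 0x1008, 0x100C]  # a four-instruction loop

# Execute the loop ten times; only the first pass needs the decoder.
for _ in range(10):
    for pc in loop_body:
        cache.fetch(pc, decode)

print(f"hits={cache.hits} misses={cache.misses}")
```

In this model a small loop that fits in the cache is decoded once, on the first iteration, and every later iteration is served from the cache: 4 misses and 36 hits over 40 fetches. Hot loops and code re-fetched after a pipeline flush are the kinds of workloads where skipping the decoder pays off.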