(Missed this earlier) It would have made an excellent April 1st video, with Ive poring over every nuance of the new cheese-grater design ;-)
Speaking of Nuance - that is the basic tech that underpins Siri. It is (IMO) a brute-force CPU algorithm that needs hefty hand-tuning to increase phoneme separation.
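Nuance's actual internals aren't public, so take this as a generic illustration only: classic recognisers did brute-force template matching on the CPU, e.g. dynamic time warping of an utterance against stored phoneme templates. A minimal sketch with made-up feature vectors (the templates are the hand-tuned part):

```python
import numpy as np

def dtw_distance(a, b):
    """Brute-force dynamic time warping between two feature sequences.

    a, b: 2-D arrays of shape (frames, features), e.g. MFCC vectors.
    Classic engines compared an utterance against every stored template
    this way - O(len(a) * len(b)) per template, all on the CPU.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical phoneme templates - in a real engine these are what gets
# hand-tweaked until similar-sounding phonemes separate cleanly.
templates = {"ah": np.random.randn(30, 13), "eh": np.random.randn(32, 13)}
utterance = np.random.randn(31, 13)
best = min(templates, key=lambda p: dtw_distance(utterance, templates[p]))
print("closest phoneme:", best)
```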
Doing more correlation and covariance work, without deterministic human intervention in the algorithm, requires considerably more real-time data manipulation than state-of-the-art desktop GPUs can currently muster. This summary view comes from the white papers that Google have made available on the subject. These souls admit they are only at the starting point in guessing how such a system should work - but even with TPU chips optimised to perform 65536 MACs (multiply-and-accumulates) per cycle, built on a six-year-old fab process, their chips outperform GPUs by one to two orders of magnitude. Google's white papers on reducing speech error rates are also available.
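For context, 65536 = 256 x 256: per Google's paper, the TPU v1's MAC units form a 256x256 systolic grid clocked at 700 MHz, with every cell doing one multiply-accumulate per cycle. A back-of-envelope sketch of what one MAC unit does and how the headline throughput falls out:

```python
# One MAC unit: acc <- acc + a * b. This is the only operation each of
# the 65536 cells in the 256x256 systolic array performs every cycle.
def mac(acc, a, b):
    return acc + a * b

cells = 256 * 256                      # 65536 MAC units
ops_per_mac = 2                        # one multiply + one add
clock_hz = 700e6                       # TPU v1 clock (from Google's paper)
print(cells * ops_per_mac * clock_hz)  # ~9.2e13 -> the ~92 TOPS peak figure
```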
Wccftech article: not an insightful piece, IMO.
NextPlatform piece: excellent and informative. It will be fascinating to see whether China moves on from the 256-core mini-monster chips in their prototype exascale machine, or waits until they get access to a modern fab. In terms of a 'real' super (i.e. one that scales), that will most likely pop up in Japan. As with Google's view on TPU efficacy, bandwidth looks to be a key limit ... something the Fujitsu engineers are no doubt chewing on (i.e. Tofu!).
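On the bandwidth point, the usual sanity check is the roofline model: a chip is memory-bound whenever its compute-to-bandwidth ratio exceeds the workload's arithmetic intensity. A sketch with illustrative numbers - the TPU v1 figures are from Google's paper, the workload intensity is an assumption:

```python
# Roofline back-of-envelope: is the chip compute- or bandwidth-bound?
peak_ops = 92e12           # TPU v1 peak: ~92 TOPS (Google's paper)
mem_bw   = 34e9            # TPU v1 DDR3 bandwidth: ~34 GB/s (Google's paper)

ridge = peak_ops / mem_bw  # ops per byte needed to keep all MACs busy
print(f"ridge point: {ridge:.0f} ops/byte")  # ~2700 ops/byte

# Assumed workload: a layer doing ~100 ops per byte fetched from memory.
# That is far below the ridge, so the MAC array idles waiting on DRAM.
intensity = 100
attainable = min(peak_ops, intensity * mem_bw)
print(f"attainable: {attainable/1e12:.1f} TOPS of {peak_ops/1e12:.0f} peak")
```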
AJ
Edit: Just seen the following, which helps illustrate the efficacy of the TPU for 'that type' of load.
[image: TPU efficacy for this type of load]