Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
by quantisan
Hi, I’m Jochen, one of the authors.
We recently did a Show HN (https://news.ycombinator.com/item?id=41463916) which did not get much traction, so I’m posting this again here:
We just released Mycelium, the library that powers Talaria’s graph viewer. You can check it out and play around with it here: https://apple.github.io/ml-mycelium
I’m happy to answer any questions about Talaria or Mycelium!
Are inference metrics like latency and power measured live from device? To which devices can Talaria be applied?
How does this compare to TVM?
Disclaimer upfront: I have no direct experience with TVM.
I would imagine that the model compilation works quite similarly, but I'm not sure if TVM supports palettization.
What I believe is unique to Talaria is that it can make recommendations for optimizations to the user for each layer in the network.
The system allows the user to quickly identify "problematic" layers, either through the table view or the graph viewer. This works based on simulated metrics (energy consumption, latency, ...) that are collected for each layer. It then offers optimization choices for each layer, together with the implied changes to the overall (total) metrics. I'm not sure if TVM collects / exposes similar metrics.
So a large part of the system focuses on the user-in-the-loop aspect of optimizing a network for inference, which is also why this paper was presented at a conference on human-computer interaction (SIGCHI).
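To make the idea concrete, here is a minimal sketch of that workflow: rank layers by a simulated per-layer cost, then compute the implied change to the total metric if one layer is optimized. All names, numbers, and the uniform-speedup assumption are illustrative, not Talaria's actual API or data.

```python
# Hypothetical per-layer simulated metrics (illustrative values only).
layers = [
    {"name": "conv1", "latency_ms": 1.2, "energy_mj": 0.8},
    {"name": "attention", "latency_ms": 6.5, "energy_mj": 4.1},
    {"name": "fc_out", "latency_ms": 2.3, "energy_mj": 1.5},
]

def worst_layers(layers, metric, top_k=1):
    """Identify 'problematic' layers: highest simulated cost first."""
    return sorted(layers, key=lambda l: l[metric], reverse=True)[:top_k]

def implied_total(layers, layer_name, metric, speedup):
    """Total metric if one layer's cost were scaled by 1/speedup
    (a stand-in for an optimization choice such as palettization)."""
    return sum(
        l[metric] / speedup if l["name"] == layer_name else l[metric]
        for l in layers
    )

baseline = sum(l["latency_ms"] for l in layers)   # 10.0 ms total
hot = worst_layers(layers, "latency_ms")[0]       # the "attention" layer
after = implied_total(layers, hot["name"], "latency_ms", speedup=2.0)
print(f"optimize {hot['name']}: {baseline:.2f} ms -> {after:.2f} ms")
```

The interactive part that the paper emphasizes is exactly this loop: the user, not an automated search, inspects the ranked layers and picks among the offered choices while watching the implied totals update.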
Ok, thanks :)
Could you give us a tl;dr on this project? and how could I use something like this work for on-device applications, think "smart home" style applications?