Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
by quantisan
Hi, I’m Jochen, one of the authors.
We recently did a Show HN (https://news.ycombinator.com/item?id=41463916) which did not get much traction, so I’m posting this again here:
We just released Mycelium, the library that powers Talaria’s graph viewer. You can check it out and play around with it here: https://apple.github.io/ml-mycelium
I’m happy to answer any questions about Talaria or Mycelium!
Are inference metrics like latency and power measured live from device? To which devices can Talaria be applied?
How does this compare to TVM?
Disclaimer upfront: I have no direct experience with TVM.
I would imagine that the model compilation works quite similarly, but I'm not sure if TVM supports palettization.
What I believe is unique to Talaria is that it can make recommendations for optimizations to the user for each layer in the network.
The system allows the user to quickly identify "problematic" layers, either through the table view or the graph viewer. This works based on simulated metrics (energy consumption, latency, ...) that are collected for each layer. It then offers optimization choices for each layer, together with the implied changes to the overall (total) metrics. I'm not sure if TVM collects / exposes similar metrics.
So a large part of the system focuses on the user-in-the-loop aspect of optimizing a network for inference, which is also why this paper was presented at a conference on human-computer interaction (SIGCHI).
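To make the idea concrete, here is a minimal sketch of that workflow: rank layers by a simulated per-layer cost, then compute the implied change to the total metric if one layer is optimized. All names, numbers, and the uniform-speedup assumption are illustrative, not Talaria's actual API or data.

```python
# Hypothetical per-layer simulated metrics (illustrative values only).
layers = [
    {"name": "conv1", "latency_ms": 1.2, "energy_mj": 0.8},
    {"name": "attention", "latency_ms": 6.5, "energy_mj": 4.1},
    {"name": "fc_out", "latency_ms": 2.3, "energy_mj": 1.5},
]

def worst_layers(layers, metric, top_k=1):
    """Identify 'problematic' layers: highest simulated cost first."""
    return sorted(layers, key=lambda l: l[metric], reverse=True)[:top_k]

def implied_total(layers, layer_name, metric, speedup):
    """Total metric if one layer's cost were scaled by 1/speedup
    (a stand-in for an optimization choice such as palettization)."""
    return sum(
        l[metric] / speedup if l["name"] == layer_name else l[metric]
        for l in layers
    )

baseline = sum(l["latency_ms"] for l in layers)   # 10.0 ms total
hot = worst_layers(layers, "latency_ms")[0]       # the "attention" layer
after = implied_total(layers, hot["name"], "latency_ms", speedup=2.0)
print(f"optimize {hot['name']}: {baseline:.2f} ms -> {after:.2f} ms")
```

The interactive part that the paper emphasizes is exactly this loop: the user, not an automated search, inspects the ranked layers and picks among the offered choices while watching the implied totals update.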
Ok, thanks :)
Could you give us a tl;dr on this project? and how could I use something like this work for on-device applications, think "smart home" style applications?