
Show HN: Data Studio – Open-Source Data Notebooks

by alx-net

Hey HN, I am Alex. I am open sourcing Data Studio, a lightweight data exploration IDE in your browser that runs locally.

Try it: https://local.dataspren.com (no account needed, runs locally)

More information: https://github.com/dataspren-analytics/data-studio

I love working with data (Postgres, SQL, DuckDB, DBT, Iceberg, ...). I always wanted a data exploration tool that runs in my browser and just works, without any infra or privacy concerns (DuckDB UI came quite close).

Features:

  - Data Notebooks
    - SQL cells work like DBT models (they materialize to views)
    - Use Python functions inside of SQL queries
    - Use DB views directly in Python as dataframes
  - Transform Excel files with SQL
  - Open .parquet, .csv, .xlsx, and .json files, nicely formatted
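
To illustrate the "SQL cells materialize to views" idea, here is a rough sketch of how a cell could be turned into a view statement. This is an assumption about the general approach, not Data Studio's actual code: `quoteIdentifier` and `cellToViewStatement` are hypothetical names, and the only SQL relied on is the standard `CREATE OR REPLACE VIEW ... AS ...` form that DuckDB accepts.

```typescript
// Hypothetical helper: quote a cell name so it is safe to use as a SQL identifier.
function quoteIdentifier(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Hypothetical helper: wrap a cell's SELECT in a view definition,
// so later cells can query it by name (similar in spirit to a dbt model).
function cellToViewStatement(cellName: string, cellSql: string): string {
  // Drop trailing semicolons so the query can be embedded after "AS".
  const body = cellSql.trim().replace(/;+\s*$/, "");
  if (body === "") {
    throw new Error("cell is empty");
  }
  return "CREATE OR REPLACE VIEW " + quoteIdentifier(cellName) + " AS " + body + ";";
}

console.log(cellToViewStatement("daily_sales", "SELECT 1 AS x; "));
```

A later cell could then run `SELECT * FROM daily_sales` against the resulting view.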

If you like what you see, you can support me with a star on GitHub.

Happy to hear your feedback <3

cxr

51m

Neat. Some things I noticed:

1. Dragging and dropping a CSV (from the system file manager) onto the "Upload Data" button doesn't do anything. This is something that a vanilla <input type="file"> does for free. Ideally, someone should be able to drag and drop a file anywhere in the window to "upload" it. PS: Don't use the word "upload" since one of your selling points is that this is _not_ using cloud storage.

2. Let people use cloud storage if they want. Please, please use remoteStorage if/when you do. (See <http://remotestorage.org/>)

3. If I try to open a second tab, I get a message "DataStudio uses the Origin Private File System for local data processing, which only supports a single active session. Please close the other tab". You should do whatever you can to mitigate this, incl. not using those APIs unless you absolutely have to. (In this case, where all I've done is visit the landing page and open a CSV to see how it looks, you don't have to.)

4. Compiling to WASM is cool and all, but in the aforementioned case where I open a CSV and then click it, what ends up happening is I get a "Loading runtime..." message and a spinner for a really long time (tens of seconds) before the data appears. Again: you should do whatever you can to mitigate this (incl. not "loading the runtime" unless you absolutely have to—and, again, this is not a case where you absolutely have to).

5. There's a "Reset runtime" button in the top right. This suggests a fundamental problem somewhere fairly deep down; this button shouldn't exist.

6. When I open a three-column CSV in a half-width window and then resize my browser to take up the full screen, the data is still displayed in auto-sized column widths that are the same as they were before, but the data table itself expands to fill the extra space, and the column headers do, too, to distribute the extra space. So now I have column data appearing beneath an unrelated column header.

7. There's evidently no splitter to grab to resize the columns manually. This is strange and unexpected.

8. Switching back and forth to a three-column CSV with fewer than 3,000 rows takes a noticeable amount of time for the data to show up. (This is after the whole "Loading runtime" step.) A spinner shows up and stays there for about a second. And then scrolling all the way to the bottom reveals that (a) this is a virtualized list and there are no more than 26 rows on my screen at a time, and (b) even then the table is limited to "Showing 500 of 2,xxx rows". Web browsers are fast. It shouldn't take anywhere near this long even to display the whole table, let alone one with virtualized rows. (Hint: All the tree-/listviews in the Firefox (and Thunderbird) UI are implemented in HTML as Web Components; steal that code.)
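
On point 1, accepting a drop anywhere in the window takes two listeners on `window`. A minimal sketch, assuming the app has some callback for opening files: `enableWindowDrop`, `isSupportedFile`, and the `onFiles` callback are hypothetical names, and the extension list is taken from the feature list in the post.

```typescript
// Extensions the post says Data Studio can open.
const SUPPORTED_EXTENSIONS = [".parquet", ".csv", ".xlsx", ".json"];

// Pure helper: does this file name end in an extension we can open?
function isSupportedFile(name: string): boolean {
  const dot = name.lastIndexOf(".");
  if (dot < 0) {
    return false;
  }
  const ext = name.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(ext) !== -1;
}

// Hypothetical hook: the app passes in its own "open these files" callback.
function enableWindowDrop(onFiles: (files: File[]) => void): void {
  // Without preventDefault on dragover, the browser never delivers "drop" to us.
  window.addEventListener("dragover", (event) => event.preventDefault());
  window.addEventListener("drop", (event) => {
    // Stop the browser from navigating to the dropped file.
    event.preventDefault();
    const dropped = event.dataTransfer ? event.dataTransfer.files : null;
    if (!dropped) {
      return;
    }
    const accepted: File[] = [];
    for (let i = 0; i < dropped.length; i++) {
      if (isSupportedFile(dropped[i].name)) {
        accepted.push(dropped[i]);
      }
    }
    if (accepted.length > 0) {
      onFiles(accepted);
    }
  });
}
```

In the page this would be wired up once at startup, e.g. `enableWindowDrop((files) => console.log(files.length))`. Nothing leaves the machine, which is also an argument for labelling the button "Open" rather than "Upload".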
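
On point 4, a first look at a CSV does not strictly need a WASM runtime. A rough sketch of a plain-TypeScript preview that could render while the runtime loads in the background: `previewCsv` is a hypothetical helper, it assumes comma-separated input with double-quoted fields, and it is not a full parser for every CSV dialect (a blank line, for instance, comes back as a row with one empty field).

```typescript
// Parse at most `maxRows` rows from CSV text, handling quoted fields,
// doubled quotes inside quoted fields, and both \n and \r\n line endings.
function previewCsv(text: string, maxRows: number): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length && rows.length < maxRows; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // A doubled quote is a literal quote character.
        i++;
      } else if (ch === '"') {
        inQuotes = false; // Closing quote.
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") {
        i++; // Treat \r\n as a single line break.
      }
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }

  // Flush a final line that has no trailing newline.
  if (rows.length < maxRows && (field !== "" || row.length > 0)) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

console.log(previewCsv('name,note\nalice,"hi, ""there"""\nbob,ok\n', 2));
```

Because the loop stops as soon as `maxRows` rows are collected, the cost is bounded by the size of the preview rather than the size of the file.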
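
On point 8, the arithmetic behind a virtualized list is constant-time per scroll event, so the row count alone should not force a 500-row cap. A minimal sketch, assuming fixed-height rows: `visibleRange`, `overscan`, and the pixel values in the usage note are illustrative, not taken from Data Studio or from the Firefox code.

```typescript
interface VisibleRange {
  start: number; // index of the first row to render (inclusive)
  end: number;   // index one past the last row to render (exclusive)
}

// Compute which rows need DOM nodes for the current scroll position.
// `overscan` adds a few rows above and below to hide flicker while scrolling.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number = 5
): VisibleRange {
  if (rowCount <= 0 || rowHeight <= 0) {
    return { start: 0, end: 0 };
  }
  // First row that is at least partly visible.
  const first = Math.floor(scrollTop / rowHeight);
  // One past the last row that is at least partly visible.
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  const end = Math.min(rowCount, last + overscan);
  const start = Math.min(end, Math.max(0, first - overscan));
  return { start, end };
}

console.log(visibleRange(1000, 520, 20, 2500));
```

With a 520px viewport and 20px rows, `visibleRange(0, 520, 20, 2500, 0)` covers rows 0 up to (not including) 26, however many rows the file has; only that slice needs to be rendered.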
