I made the first version of this back in 2010, when Pearl's work on causal inference started impacting Epidemiology. A friend was an Epidemiologist and she told me about an MS-DOS program she was using to do something with graphs (https://pubmed.ncbi.nlm.nih.gov/20010223/); she found it painfully slow and wondered if I could "make it more user-friendly".
I was doing my PhD in algorithms at the time and was intrigued when I started reading Greenland, Pearl, and Robins (https://pubmed.ncbi.nlm.nih.gov/9888278/) and then Pearl's "Causality". I soon found that it was not at all obvious how you could speed up that MS-DOS program, and it led to a paper at UAI in 2011 (https://arxiv.org/abs/1202.3764). I made dagitty as a demonstration that you could actually use the algorithms we developed in that paper, and it took off from there -- it started with 10 users per day, growing to hundreds and then thousands as causal inference became more popular.
It's now a bit dated, and I don't have as much time anymore to keep it "fresh" as I would like. But I am still grateful and amazed at how many people I got to know due to this. Highlights included collaborating with Pearl himself on a solution manual for his book "Causal Inference: A Primer" when it first came out, and so many e-mails I got out of the blue from users all over the world. Just last summer I stayed at the house of the author of one of the built-in examples in dagitty.
Now that these 14 years have flown by, I am happy to play a small part in supporting the next generation of causal inference software -- if you're interested in causal inference, be sure to check out pgmpy.org, a Python library for Bayesian networks that includes several causal inference functions (https://arxiv.org/abs/2304.08639). Ankur, the author, did his PhD with me and will soon defend his thesis!
Also, R users, be sure to check out ggdag, a great package by Malcolm Barrett that wraps dagitty functionality in a much nicer and tidyverse-compatible way.
Nice to see this still going! We used dagitty in a grad school stats class back in 2013. To the instructor's credit, we spent the first few weeks thinking about causal models before we got into any actual stats. (Put differently, a DAG is a nonparametric structural equation model [0], and the rest of the stats class was about different ways to parametrize those models.)
[0] Pearl 2021: https://ftp.cs.ucla.edu/pub/stat_ser/r370.pdf
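To make the "nonparametric structural equation model" reading concrete, here is a minimal sketch, not anything taken from dagitty or the linked paper. The three-variable DAG, the Gaussian noise, and both sets of functions are made up for illustration. The DAG only fixes which parents each function may read; the two dictionaries are two different parametrizations of that same structure.

```python
import random

# Hypothetical DAG: Z -> X, Z -> Y, X -> Y, stored as node -> list of parents.
PARENTS = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}
ORDER = ["Z", "X", "Y"]  # a topological order: parents come before children


def sample(functions, rng):
    """Draw one joint sample: each node is f(parent values, its own noise)."""
    values = {}
    for node in ORDER:
        parent_values = [values[p] for p in PARENTS[node]]
        noise = rng.gauss(0.0, 1.0)  # assumed noise distribution
        values[node] = functions[node](parent_values, noise)
    return values


# Two made-up parametrizations of the same DAG.
linear = {
    "Z": lambda pa, u: u,
    "X": lambda pa, u: 0.8 * pa[0] + u,
    "Y": lambda pa, u: 0.5 * pa[0] - 0.3 * pa[1] + u,
}
threshold = {
    "Z": lambda pa, u: u,
    "X": lambda pa, u: float(pa[0] + u > 0),
    "Y": lambda pa, u: float(pa[0] + pa[1] + u > 1),
}

rng = random.Random(0)
print(sample(linear, rng))
print(sample(threshold, rng))
```

Both models share the same graph, so any conclusion drawn from the graph alone applies to either one.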
I work on a graph-based library and regularly generate DAGs for analysis and debugging. I have been using graphviz/dot but it's just so damn frustrating. You have to jump through hoops to get the layout right. It would be nice if something as ubiquitous as graphviz had a dedicated rendering engine for DAGs which did moderately sane things like place root and tail nodes on the same rank without requiring me to figure out which nodes those are and manually position them.
https://github.com/vizdom-dev/vizdom/ seems like a nice library for rendering directed graphs into SVG, and apparently has automatic layout capabilities.
Hey that's me! Thanks for mentioning it.
So far it has an automatic layout that positions and renders hierarchical graphs, even if your data isn't hierarchical in nature. That is how the algorithm works internally, since it mirrors Dagrejs.
I plan to support more layout types in the future.
As for placement of 'root' nodes (aka source and sink nodes) - in reference to the parent comment - my hope is to expose graph theory methods that you can run before layout/render which would help facilitate this automatically, but it's non-trivial to codify as it requires a substantial refactor to the layout algorithm(s).
Please note that while Vizdom is freely available for use under the Apache License 2.0, the Rust WebAssembly binary included in this library is closed-source. You are free to use the library, but the source code for the Rust WebAssembly binary is not publicly available.
What do you mean by root and tail nodes?
Roots: Nodes with no dependencies.
Tails: Nodes with no dependents.
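In code, those two definitions come down to set differences over the edge list. A minimal sketch, assuming the graph is given as (source, target) pairs where the target depends on the source; the function name and the example edges are made up:

```python
def roots_and_tails(edges):
    """Return (roots, tails) for a directed graph given as (source, target) pairs."""
    nodes = set()
    has_incoming = set()
    has_outgoing = set()
    for source, target in edges:
        nodes.update((source, target))
        has_outgoing.add(source)
        has_incoming.add(target)
    roots = sorted(nodes - has_incoming)  # no dependencies
    tails = sorted(nodes - has_outgoing)  # no dependents
    return roots, tails


# Hypothetical example: an edge (a, b) means "b depends on a".
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
print(roots_and_tails(edges))  # (['a', 'b'], ['d', 'e'])
```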
Ok, maybe I am misunderstanding, but why would you want to have them at the same level?
I'd guess the GGP meant all of the root nodes together on one level and all of the tail nodes together on another single level. At least that's what makes sense to me.
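If that reading is right, one way to get it from dot today is to emit the rank constraints yourself. This is a rough sketch rather than a built-in Graphviz feature: it finds the nodes with no incoming or no outgoing edges and wraps them in subgraphs using Graphviz's `rank=source` and `rank=sink` attributes. The function name is made up, and node names are assumed not to contain double quotes.

```python
def dot_with_ranked_ends(edges):
    """Build DOT source that pins roots to the first rank and tails to the last."""
    nodes, has_incoming, has_outgoing = set(), set(), set()
    for source, target in edges:
        nodes.update((source, target))
        has_outgoing.add(source)
        has_incoming.add(target)
    roots = sorted(nodes - has_incoming)  # nodes with no dependencies
    tails = sorted(nodes - has_outgoing)  # nodes with no dependents

    def quoted(names):
        # Assumes names contain no double quotes (no escaping is done).
        return " ".join(f'"{name}";' for name in names)

    lines = ["digraph G {"]
    if roots:
        lines.append(f"  {{ rank=source; {quoted(roots)} }}")
    if tails:
        lines.append(f"  {{ rank=sink; {quoted(tails)} }}")
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: an edge (a, b) means "b depends on a".
print(dot_with_ranked_ends([("a", "c"), ("b", "c"), ("c", "d")]))
```

The output can be piped into `dot -Tsvg`. It does not address the complaint that this should not need a preprocessing step at all.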
This seems like it could be done in a little script in the graphviz gvpr processor.
Someone could implement this as a feature in graphviz. It might take a day to implement.
There are probably other "solutions" that already handle this more gracefully, but otherwise don't have a very complete set of features relative to graphviz.
It's reasonable (and constructive) for you to complain, but your "issue" is representative of a wide set of "issues" we've seen, and the real problem is we're just a little unimportant niche service and all this work is being done by volunteers and they mainly work on the things they know how to do.
I was thinking perhaps they want them both aligned to one side of the drawing. So for example if the direction goes from left to right, then the start and end nodes might need to be at the top.
I hate to ask this question... but I've moved to a Python shop after working in the tidyverse for years, and am unimpressed with the DAG visualization capabilities. Does anyone have any recommendations for DAGs with 1,000-plus nodes?
I still miss R and tidy quite a bit, but polars at least gets closer.
There are some tools for larger renderings. I've had success with Graphics, but have you tried Gephi? https://gephi.org/
I can confirm Gephi handles 1000+ nodes just fine: I used it to solve an Advent of Code problem.
I'm working on a Python library for Vizdom to be released later this year, but in the meantime, you can use this Python library which uses Graphviz under the hood.
I will give that a try. Thank you very much :)
Any of the python network science libraries can handle a 1000 node directed graph no problem.
Networkx visualizations are ugly out of the box but you can make the network look however you want. The best out of the box visualizations I think are a matter of taste and use case. Same with the layouts.
In a more abstract sense, I think it is hard to keep a 1000-node network visualization from being a useless hairball unless the network is quite sparse.
If you mean with do-calculus though I really have no idea.
This is very cool -- well done!
I would find a python port useful, as R is more of a special use case in my own workflows, but my use case shouldn't deter the authors.
Very cool to see this here. Johannes Textor was my professor for Bayesian Networks and Causal Inference when I studied at Radboud University in Nijmegen. He is an awesome and down-to-earth guy, and he was very happy about and open to feedback.
If you’re using causal diagrams professionally or privately: What are your use cases?
I’m considering using them for Change Management impact analysis
Do good ol' structural equation models count? Because I know quite a few colleagues doing research on patient experiences in healthcare, who do psychometric studies on patient-reported surveys of their experiences (patient-reported outcome measures).
The connection between SEMs and DAGs is really interesting. The underlying models are very similar but developed independently -- SEMs coming from the psychometrics tradition and popularized by Jöreskog, DAGs coming from Bayesian Networks and popularized by Pearl. There are deep connections between them -- we have done some work on that (e.g. https://osf.io/preprints/psyarxiv/2kqxr and https://arxiv.org/abs/2302.13220).
Very simple and well done, I'm surprised how simple the code to generate the DAG is.
I will use it when I have a chance.
I really like the "How to" menu. I'd recommend making it a little more prominent on first use, or showing me once that it's there.
Congrats!
Suggestion: disable pull-down-to-reload on mobile; it will improve the UX.
The JS part could get some TS love for sure.
Julia port would be much appreciated
You can use Dagitty.jl: https://docs.juliahub.com/Dagitty/kxRMH/0.0.1/
You made my day.