Show HN: Graphiti – LLM-Powered Temporal Knowledge Graphs
by roseway4
Hey HN! We're Paul, Preston, and Daniel from Zep. We've just open-sourced Graphiti, a Python library for building temporal Knowledge Graphs using LLMs.
Graphiti helps you create and query graphs that evolve over time. Knowledge Graphs have been explored extensively for information retrieval. What makes Graphiti unique is its ability to build a knowledge graph while handling changing relationships and maintaining historical context.
At Zep, we build a memory layer for LLM applications. Developers use Zep to recall relevant user information from past conversations without including the entire chat history in a prompt. Accurate context is crucial for LLM applications. If an AI agent doesn't remember that you've changed jobs or confuses the chronology of events, its responses can be jarring or irrelevant, or worse, inaccurate.
Before Graphiti, our approach to storing and retrieving user “memory” was, in effect, a specialized RAG pipeline. An LLM extracted “facts” from a user’s chat history. Semantic search, reranking, and other techniques then surfaced facts relevant to the current conversation back to a developer for inclusion in their prompt.
We attempted to reconcile how new information may change our understanding of existing facts:
Fact: “Kendra loves Adidas shoes”
User message: “I’m so angry! My favorite Adidas shoes fell apart! Puma’s are my new favorite shoes!”
Facts:
- “Kendra used to love Adidas shoes but now prefers Puma.”
- “Kendra’s Adidas shoes fell apart.”
Unfortunately, this approach became problematic. Reconciling facts from increasingly complex conversations challenged even frontier LLMs such as gpt-4o. We saw incomplete facts, poor recall, and hallucinations. Our RAG search also failed at times to capture the nuanced relationships between facts, leading to irrelevant or contradictory information being retrieved.
We tried fixing these issues with prompt optimization but saw diminishing returns on effort. We realized that a graph would help model a user’s complex world, potentially addressing these challenges.
We were intrigued by Microsoft’s GraphRAG, which expanded on RAG text chunking with a graph to better model a document corpus. However, it didn't solve our core problem: GraphRAG is designed for static documents and doesn't natively handle temporality.
So, we built Graphiti, which is designed from the ground up to handle constantly changing information, hybrid semantic and graph search, and scale:
- Temporal Awareness: Tracks changes in facts and relationships over time. Graph edges include temporal metadata to record relationship lifecycles.
- Episodic Processing: Ingests data as discrete episodes, maintaining data provenance and enabling incremental processing.
- Hybrid Search: Semantic and BM25 full-text search, with the ability to rerank results by distance from a central node.
- Scalable: Designed for large datasets, parallelizing LLM calls for batch processing while preserving event chronology.
- Varied Sources: Ingests both unstructured text and structured data.
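To make the "temporal metadata on edges" idea concrete, here is a minimal sketch in plain Python. It is not Graphiti's code or API: the class names, the `valid_at`/`invalid_at` fields, the `FAVORITE_SHOE_BRAND` relation, and the dates are all assumptions made up for illustration. It also assumes facts arrive in chronological order and that each (source, relation) pair has a single current target.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TemporalEdge:
    """A fact between two entities plus the interval in which it held."""
    source: str
    relation: str
    target: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means the fact still holds

    def holds_at(self, when: datetime) -> bool:
        ended = self.invalid_at is not None and when >= self.invalid_at
        return self.valid_at <= when and not ended


class TemporalGraph:
    def __init__(self) -> None:
        self.edges: list[TemporalEdge] = []

    def add_fact(self, source: str, relation: str, target: str, valid_at: datetime) -> None:
        """Add a fact; assumes chronological input and one current target
        per (source, relation) pair."""
        for edge in self.edges:
            if edge.source == source and edge.relation == relation and edge.invalid_at is None:
                if edge.target == target:
                    return  # already recorded and still current
                edge.invalid_at = valid_at  # superseded, but kept as history
        self.edges.append(TemporalEdge(source, relation, target, valid_at))

    def facts_at(self, when: datetime) -> list[TemporalEdge]:
        return [edge for edge in self.edges if edge.holds_at(when)]


# Illustrative data only: the dates are invented.
graph = TemporalGraph()
graph.add_fact("Kendra", "FAVORITE_SHOE_BRAND", "Adidas", datetime(2024, 1, 1))
graph.add_fact("Kendra", "FAVORITE_SHOE_BRAND", "Puma", datetime(2024, 6, 1))
```

The older edge is closed rather than deleted, so a query for a past date still returns the Adidas fact while a query for today returns the Puma one.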
Graphiti has significantly improved our ability to maintain accurate user context. It does a far better job of fact reconciliation over long, complex conversations. Node distance reranking, which places a user at the center of the graph, has also been a valuable tool. Quantitative data evaluation results may be a future ShowHN.
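Node distance reranking can be sketched roughly as follows. This is a simplified illustration, not Graphiti's implementation: it reranks node names rather than facts, treats the graph as an undirected adjacency map, and measures distance in hops with a breadth-first search. The function names and sample data are hypothetical.

```python
from collections import deque


def hop_distances(adjacency: dict[str, set[str]], center: str) -> dict[str, int]:
    """Breadth-first search: number of hops from `center` to each reachable node."""
    distances = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def rerank_by_distance(results: list[str], adjacency: dict[str, set[str]], center: str) -> list[str]:
    """Reorder search results so nodes closer to `center` come first.

    sorted() is stable, so results at the same distance keep their original
    (relevance) order. Unreachable nodes sort last.
    """
    distances = hop_distances(adjacency, center)
    return sorted(results, key=lambda node: distances.get(node, float("inf")))


adjacency = {
    "Kendra": {"Puma", "Adidas"},
    "Puma": {"Kendra", "Running"},
    "Adidas": {"Kendra"},
    "Running": {"Puma"},
    "Nike": set(),
}
search_results = ["Running", "Nike", "Adidas", "Puma"]  # pretend relevance order
reranked = rerank_by_distance(search_results, adjacency, center="Kendra")
```

With the user as the center node, facts directly attached to that user outrank facts that are only loosely connected, even when the text search scored them similarly.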
Work is ongoing, including:
1. Improving support for faster and cheaper small language models.
2. Exploring fine-tuning to improve accuracy and reduce latency.
3. Adding new querying capabilities, including search over neighborhood (sub-graph) summaries.
## Getting Started
Graphiti is open source and available on GitHub: https://github.com/getzep/graphiti.
We'd love to hear your thoughts. Please also consider contributing!
Let’s say I want to ingest information from a series of interviews with multiple interviewees (multiple interviews per interviewee). It’s possible their opinions/facts change between interviews; but also each interviewee is going to have different opinions/facts.
Would it make most sense to capture this with multiple Graphiti graphs? Or would it be possible to do this in one graph?
At the end of the day the analysis would be finding insights across all interviewees and you want the cumulative knowledge…
Hey, I'm one of the developers on the Graphiti project.
Adding to Daniel's reply: ingesting a series of interviews is definitely doable with one graph. Just make sure to add the episodes from the interviews in chronological order.
After all the episodes are processed by graphiti, you will be able to retrieve the "complete picture" for every participant in the interviews that reflects the possible change in their views/opinions.
You could achieve this with a single graph. Graphiti has a "message" EpisodeType that expects transcripts in a "<user>: <content>" format. When using this EpisodeType, Graphiti pays careful attention to "users," creating nodes for them and maintaining "fact" context for each user subgraph.
"Facts" shared across all users will also be updated universally. Alongside Graphiti's search, you'd be able to use cypher to query Neo4j to, for example, find hub nodes (aka highly-connected nodes), identifying common beliefs.
More here: https://help.getzep.com/graphiti/graphiti/adding-episodes
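For the interview case, the preparation step before ingestion might look something like this sketch. It only shows splitting a "<user>: <content>" transcript into turns and ordering episodes by time. It does not call Graphiti, and the helper names are hypothetical. It assumes speaker names contain no colon and that a line without a colon continues the previous turn.

```python
from datetime import datetime


def parse_transcript(transcript: str) -> list[tuple[str, str]]:
    """Split a '<user>: <content>' transcript into (speaker, content) turns."""
    turns: list[tuple[str, str]] = []
    for raw_line in transcript.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        speaker, separator, content = line.partition(":")
        if separator:
            turns.append((speaker.strip(), content.strip()))
        elif turns:
            # No speaker prefix: treat the line as a continuation of the previous turn.
            previous_speaker, previous_content = turns[-1]
            turns[-1] = (previous_speaker, f"{previous_content} {line}")
    return turns


def order_episodes(episodes: list[tuple[datetime, str]]) -> list[tuple[datetime, list[tuple[str, str]]]]:
    """Sort (timestamp, transcript) pairs oldest first and parse each one."""
    ordered = sorted(episodes, key=lambda episode: episode[0])
    return [(when, parse_transcript(text)) for when, text in ordered]


# Illustrative interviews, deliberately listed out of order.
interviews = [
    (datetime(2024, 5, 2), "Alice: I now prefer remote work."),
    (datetime(2024, 1, 10), "Alice: I like the office.\nBob: Me too."),
]
ordered = order_episodes(interviews)
```

Each ordered entry would then be added to the graph as one episode, oldest first, so that later statements can supersede earlier ones.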
I see that you mention Microsoft’s GraphRAG. My understanding is that a key part of their approach is hierarchical agglomeration of graph clusters to be able to answer wide questions from the graph. Is that in the works?
Yes, that is in the works and is a high priority for us. The major discussion point internally around implementing this feature has been on the retrieval portion. In general we want to provide many flexible search strategies that return a variety of different information. We want to organize search in such a way that it is flexible enough to meet a variety of demands, while also being ergonomic enough to be usable and understandable. We want to make sure that we update our retrieval approach at the same time as adding the community summaries so that it is easy to make use of this additional information.
Our implementation will likely involve us adding community nodes that will contain a summary of the nodes in that community. Did you have any perspective or opinions on best ways to implement the graphRAG style summarizations?
Oh that’s excellent! Thank you
Looks cool, would love support for RDF Graphs. The reason I prefer those is because the ontology is already well defined in a lot of cases which is 80% of the battle with Knowledge Graphs in my experience. Without a well defined Ontology I think LLM <> KG integration will not live up to its potential. LLMs have to know what nodes and edges really mean across diverse datasets
Hey, thanks for the feedback! I'm one of the devs on graphiti and adding support for custom schema is high on our to-do list. I agree that this is an important step in helping to bridge the gap between structured and unstructured data, as well as for refining the graph on specific use cases.
Currently, we do have some ways of helping the graph to understand what nodes and edges "really mean." In addition to the name of the relationship our edges also store a "hydrated" version of the fact triple. For example, if Alice and Bob are siblings you might see an edge with the name IS_SIBLING_OF between the two. In addition to this, the edge also stores the fact: "Alice is the sibling of Bob". This way we are storing much of the semantic context on the nodes and edges themselves in addition to the graph structure.
We also support ingesting structured JSON, and in those cases the edges will be exactly the properties in the JSON doc.
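As a rough illustration of an edge that carries both a relationship name and a "hydrated" fact, consider the sketch below. The `FactEdge` class and `search_facts` helper are hypothetical, and the plain substring match is only a stand-in for real semantic or BM25 search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactEdge:
    source: str
    name: str    # short relationship label, e.g. IS_SIBLING_OF
    target: str
    fact: str    # natural-language version of the triple


def search_facts(edges: list[FactEdge], query: str) -> list[FactEdge]:
    """Return edges whose fact text contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [edge for edge in edges if all(term in edge.fact.lower() for term in terms)]


edges = [
    FactEdge("Alice", "IS_SIBLING_OF", "Bob", "Alice is the sibling of Bob"),
    FactEdge("Carol", "SISTER", "Dave", "Carol is the sibling of Dave, his younger sister"),
    FactEdge("Alice", "WORKS_AT", "Acme", "Alice works at Acme"),
]
siblings = search_facts(edges, "sibling")
```

Because the search runs over the fact text, both sibling edges are found even though their relationship names differ.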
The reason I bring up RDF is that I use ontologies that have been defined by experts and cover a ton of edge cases. If a group of genealogists define a `fam:` RDF Ontology and publish it, then I want every family relationship in my graph to use their Ontology.
I'm looking for something like graphiti that can take in a text block and when creating the relationships, automatically know to use the `fam:` ontology when creating familial relationships. The vast majority of people don't feel like defining schemas for every little thing and they're basically the same across all systems except for custom proprietary ones you define as your IP.
Their ontology would have OWL rules like `fam:isChildOf` `owl:inverseOf` `fam:isParentOf` so running an OWL Reasoner over the graph would generate the inverse triples as well
So if I had the text `Joe is Bob's dad`, input it into graphiti, then get the triples
person:Joe fam:isParentOf person:Bob
person:Bob fam:isChildOf person:Joe
and the edge would be in a shared definition amongst all graphiti users. The LLM can be fine-tuned to recognize exactly what fam:isParentOf means so there is no ambiguity. Right now I'm guessing graphiti could spit out edges `IS_SIBLING_OF`, `SIBLING`, `SISTER`, `BROTHER`, etc. It's not standardized, which makes it difficult to interact with computationally if, say, I wanted to input a bunch of random text and then run pre-trained graph models of family networks.
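The inverse-triple step described above can be sketched in a few lines of plain Python. This covers only the `owl:inverseOf` rule, whereas a real OWL reasoner does far more. The `fam:` predicates are the hypothetical ontology from this comment, and `with_inverses` is an illustrative helper, not part of any library.

```python
Triple = tuple[str, str, str]

# Declared once, e.g. fam:isParentOf owl:inverseOf fam:isChildOf
INVERSE_OF = {"fam:isParentOf": "fam:isChildOf"}


def with_inverses(triples: set[Triple], inverse_of: dict[str, str]) -> set[Triple]:
    """Return the input triples plus the triples implied by inverse predicates."""
    # owl:inverseOf is symmetric, so look predicates up in both directions.
    both_ways = dict(inverse_of)
    both_ways.update({inverse: predicate for predicate, inverse in inverse_of.items()})

    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in both_ways:
            inferred.add((obj, both_ways[predicate], subject))
    return inferred


extracted = {("person:Joe", "fam:isParentOf", "person:Bob")}
closed = with_inverses(extracted, INVERSE_OF)
```

Here `closed` contains both the extracted triple and its `fam:isChildOf` inverse, so downstream queries can use either direction.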
Thanks for the follow-up and the in depth example and explanation. Like you said, supporting ontologies is definitely a core use-case of KG's and there are also many standard preexisting taxonomies for different things (Google and Amazon both famously have taxonomies that try to cover everything, and there are many other specialized ones as well).
I don't think I was clear enough when I mentioned our plans to add custom schema. The way we are thinking of implementing this idea is by allowing end users to provide specific node types and edge types between those nodes. Then we can pass that information on to the LLM and instruct it to extract only nodes and edges that conform to the provided schema. We would also have methods to verify the output before adding it to the graph.
So in this scenario you could input something like: { NodeType: Person, EdgeTypes: [IS_PARENT_OF, IS_CHILD_OF] }
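A verification step of that kind might look roughly like the sketch below. Custom schema support is described here as planned work, so none of this is Graphiti's API: the `Schema` and `ExtractedEdge` classes and the `split_by_schema` helper are assumptions used only to illustrate checking LLM output against user-provided node and edge types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    node_types: frozenset[str]
    edge_types: frozenset[str]


@dataclass(frozen=True)
class ExtractedEdge:
    source_type: str
    name: str
    target_type: str


def conforms(edge: ExtractedEdge, schema: Schema) -> bool:
    """True if both endpoint types and the edge name are allowed by the schema."""
    return (
        edge.source_type in schema.node_types
        and edge.target_type in schema.node_types
        and edge.name in schema.edge_types
    )


def split_by_schema(edges: list[ExtractedEdge], schema: Schema) -> tuple[list[ExtractedEdge], list[ExtractedEdge]]:
    """Separate extracted edges into (accepted, rejected) before writing to the graph."""
    accepted = [edge for edge in edges if conforms(edge, schema)]
    rejected = [edge for edge in edges if not conforms(edge, schema)]
    return accepted, rejected


schema = Schema(
    node_types=frozenset({"Person"}),
    edge_types=frozenset({"IS_PARENT_OF", "IS_CHILD_OF"}),
)
llm_output = [
    ExtractedEdge("Person", "IS_PARENT_OF", "Person"),
    ExtractedEdge("Person", "DAD", "Person"),          # name not in the schema
    ExtractedEdge("Person", "IS_CHILD_OF", "Company"),  # node type not in the schema
]
accepted, rejected = split_by_schema(llm_output, schema)
```

Rejected edges could be dropped, logged, or sent back to the LLM for another attempt. Which of those is right depends on the use case.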
Always creating inverse relationships as well isn't something we've discussed yet, but I think it's a great idea. Happy to hear any other thoughts you have, or whether you see a flaw in our approach of using a custom schema to begin solving the issue you've raised.
Edit: I think part of what you are saying just clicked for me. I think you're suggesting that the graphiti team chooses some open source taxonomy (like Google or Amazon) that we determine as our core taxonomy, and then fine tune an LLM on that data and open-source it? Then users can choose to use that fine-tuned LLM and get consistent schema relationships across the board? I think that is a really cool idea, but probably not something we would be able to do in the foreseeable future. We want the graphiti open source project to not be that opinionated, and we want to allow users to choose or fine tune their own LLMs for their specific use cases.
Yeah, but you’re kinda missing the point: there is an existing ecosystem of ontologies and technologies using RDF, with no need to reinvent something that is likely not as well thought out.
I'm not quite sure I follow. Today, graphiti extracts entities as nodes and facts between those nodes as edges. The nodes and edges store semantic data, like summaries of entities and facts representing the relationships between them (in addition to other metadata). Our searches are also based on this semantic data, and we aren't intending the extracted edge names to be used as filters as we are not doing any taxonomical classifications of nodes and edges.
In the near future, we intend to allow users of graphiti to input a custom schema (ontology), and we would use that to enforce a classification of the extracted nodes and edges. In this case we are un-opinionated on what custom schema is being provided. You would be able to use an ontology that is made in-house or one of the many open source ones that exist in whatever field you are working in.
In neither case are we trying to recreate our own custom ontology or reinventing the wheel on how things are being classified.
> So in this scenario you could input something like: { NodeType: Person, EdgeTypes: [IS_PARENT_OF, IS_CHILD_OF] }
RDF, OWL are existing formats for defining a schema
Have you seen TerminusDB? [0] They’ve got a nice solution to versioned RDF graphs, originally pitched as “Git for data” but focused on knowledge graphs.
I’m not affiliated (in fact they launched around the same time that my co-founder and I launched Splitgraph with the same “Git for data” pitch), but I find their technology very intriguing.
Knowledge graphs are on the cusp of revival after being in stasis for 20 years. They’re a perfect match for LLMs and I’m excited to see how the field adopts them.
Hi :) Cool project! Just FYI, there is already a fairly well-established project with that name.
Looks very interesting, will check it out, also it would likely be much more adoptable if standards based.
Any tips for someone who’d like to try implementing something like this in TypeScript?
Hey! Do you mean tips for implementing a similar project in TS? Or do you want to make a project in TS using graphiti?
If you want to use graphiti in a TS project you can wrap the graphiti library in a Python server (using fastAPI, Flask or Django) and then in TS you can call the Python endpoints with something like express. We will have an example of a fastAPI implementation in our repo soon.
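The shape of such a wrapper can be sketched with only the Python standard library. This uses `http.server` as a stand-in for FastAPI, Flask, or Django so that it runs without extra dependencies. The `/search` endpoint, the `q` parameter, and `search_graph` (a fake in-memory lookup standing in for a call into the graph library) are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def search_graph(query: str) -> list[str]:
    """Hypothetical stand-in for a call into the graph library."""
    facts = ["Kendra prefers Puma shoes", "Kendra's Adidas shoes fell apart"]
    return [fact for fact in facts if query.lower() in fact.lower()]


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404, "unknown endpoint")
            return
        query = parse_qs(url.query).get("q", [""])[0]
        body = json.dumps({"results": search_graph(query)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), SearchHandler)


if __name__ == "__main__":
    make_server(8000).serve_forever()
```

A TypeScript client would then issue a plain HTTP GET to `/search?q=...` and parse the JSON response. A production service would more likely use one of the frameworks named above.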
If instead you want to build a similar project in TS, I would say it is certainly very doable. We will be releasing blog posts on some of the challenges we ran into when building the project, but the main thing I would say is to focus on having a robust data structure that handles all of your use cases before you start. Fundamentally changing the basic building blocks of your library halfway through will be really tough.
Thank you for open sourcing this!
You are definitely onto something here.
Pleasure! We'd love feedback + suggestions should you try it out.