Projects and Datasets
Basic concept
One key component of GuardOps is storing your interactions with LLMs. Stored interactions can be organized in two ways: in projects and in datasets.
In the following, the term trace refers to one interaction with an LLM, either a simple question-answer pair or a full chat.
Projects
Every stored trace must be linked to exactly one project. A project might hold all traces for one specific agent, chatbot, use case or, hence the name, project.
Projects have the usual metadata, such as a name, a description and optional tags. In addition, each project has a retention setting, which defines whether its traces are stored indefinitely or deleted automatically after a set period of time.
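The retention rule can be illustrated with a minimal Python sketch. It assumes that "indefinitely" is represented as no retention period and that a period is measured in days from the trace's creation time; the class, field and function names (`Project`, `retention_days`, `is_expired`) are hypothetical and do not describe GuardOps' actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Project:
    """Hypothetical project record; field names are illustrative only."""
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)
    # None = keep traces indefinitely; otherwise the retention period in days.
    retention_days: Optional[int] = None


def is_expired(project: Project, trace_created_at: datetime, now: datetime) -> bool:
    """Return True if a trace is older than the project's retention period."""
    if project.retention_days is None:
        return False  # indefinite retention: traces never expire
    return now - trace_created_at > timedelta(days=project.retention_days)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
support_bot = Project("support-bot", retention_days=30)
print(is_expired(support_bot, now - timedelta(days=31), now))           # True
print(is_expired(support_bot, now - timedelta(days=5), now))            # False
print(is_expired(Project("archive"), now - timedelta(days=3650), now))  # False
```

A real system would typically run such a check periodically and delete the expired traces; how GuardOps schedules this is not described here.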
When you delete a project that still contains traces, you have to decide what happens to them: you can either delete the traces along with the project or move all of them to a different project.
A trace is always linked to exactly one project, never more. When a project is deleted, its traces must either be moved to a different project or be deleted along with it.
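The one-project rule and the delete-or-move choice can be sketched as follows. This is a minimal in-memory model under stated assumptions: `Store`, `Trace`, `delete_project` and its `move_to` parameter are hypothetical names chosen for illustration, not part of GuardOps.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trace:
    """Hypothetical trace: references exactly one project by id."""
    trace_id: str
    project_id: str


@dataclass
class Store:
    """Hypothetical in-memory store of projects and traces."""
    projects: set[str] = field(default_factory=set)
    traces: dict[str, Trace] = field(default_factory=dict)

    def delete_project(self, project_id: str, move_to: Optional[str] = None) -> None:
        """Delete a project; move its traces to `move_to` if given, else delete them."""
        if project_id not in self.projects:
            raise KeyError(f"unknown project {project_id!r}")
        if move_to is not None and (move_to == project_id or move_to not in self.projects):
            raise ValueError("move_to must be a different, existing project")
        for trace_id, trace in list(self.traces.items()):  # copy: we may delete while iterating
            if trace.project_id != project_id:
                continue
            if move_to is None:
                del self.traces[trace_id]   # trace is deleted with the project
            else:
                trace.project_id = move_to  # still linked to exactly one project
        self.projects.remove(project_id)


store = Store(projects={"a", "b"}, traces={"t1": Trace("t1", "a"), "t2": Trace("t2", "b")})
store.delete_project("a", move_to="b")
print(store.traces["t1"].project_id)  # b
store.delete_project("b")             # no target: traces are deleted too
print(store.traces)                   # {}
```

Because a trace holds a single `project_id` rather than a list, the model cannot express a trace that belongs to zero or several projects.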
Datasets
Datasets are meant to consolidate traces across projects for a given purpose, such as creating a dataset for fine-tuning an LLM or building a new evaluation dataset to measure your chatbot's performance.
A trace can be linked to as many datasets as you like, and traces from different projects can be linked to the same dataset. This makes datasets the ideal way to consolidate traces that share the same purpose or topic.
Deleting a dataset does not delete its traces; only their reference to that dataset is removed. If you want the traces to end up in a different dataset, you have to move them manually before deleting: open the dataset, select all traces and move them to the target dataset.
Datasets aggregate traces for a shared use case, e.g. creating a dataset for fine-tuning an LLM or building an evaluation dataset. Unlike projects, they place no restriction on how many datasets a trace can be linked to. Deleting a dataset only removes the reference from its traces and never deletes the traces themselves; traces can only be deleted within their projects.
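The difference from projects can be sketched by giving each trace a set of dataset references next to its single project reference. This is a minimal illustration assuming a simple in-memory model; `Trace`, `link`, `delete_dataset` and `move_all` are hypothetical names, not GuardOps functions.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical trace with one project and any number of datasets."""
    trace_id: str
    project_id: str                                     # exactly one project
    dataset_ids: set[str] = field(default_factory=set)  # zero or more datasets


def link(trace: Trace, dataset_id: str) -> None:
    """Link a trace to a dataset; linking it again has no effect."""
    trace.dataset_ids.add(dataset_id)


def delete_dataset(traces: list[Trace], dataset_id: str) -> None:
    """Remove every reference to the dataset; the traces themselves are kept."""
    for trace in traces:
        trace.dataset_ids.discard(dataset_id)


def move_all(traces: list[Trace], source: str, target: str) -> None:
    """Manual move of all traces from one dataset to another before deletion."""
    for trace in traces:
        if source in trace.dataset_ids:
            trace.dataset_ids.discard(source)
            trace.dataset_ids.add(target)


traces = [Trace("t1", "support-bot"), Trace("t2", "sales-bot")]
for t in traces:
    link(t, "eval-set")        # traces from two projects share one dataset
link(traces[0], "finetune-set")
delete_dataset(traces, "eval-set")
print(len(traces))             # 2
print(traces[0].dataset_ids)   # {'finetune-set'}
```

Note that `delete_dataset` never touches `project_id`, which mirrors the rule that traces can only be deleted within their projects.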
Sharing and collaboration
Every project and dataset can be shared, which lets multiple users collaborate on the same project or dataset. If you share a dataset and the users who imported it add traces to it, you will see those traces too, even if you do not have access to the project they come from, and vice versa.
When a project or dataset is shared, there is a hierarchy of who controls the data. Deleting a shared project or dataset behaves differently depending on whether you are its creator or an importer.
1. Creator
When the creator deletes a dataset, the traces' references to it are removed, leaving importers with an empty dataset. When the creator deletes a project, its traces are either deleted or moved to a different project, depending on the choice made during deletion, leaving importers with an empty project.
2. Importer
When an importer deletes a project or dataset, the original (root) project or dataset is not affected and remains usable for the creator and all other importers. The deletion only takes effect for the importer who performed it.
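The creator/importer distinction can be sketched for the dataset case with a small Python model. It is a simplification under stated assumptions: `SharedDataset` and its methods are hypothetical, importer deletion is modeled as removing that importer's access, and creator deletion as clearing all trace references so that importers keep an empty dataset. The project case would behave analogously, with the delete-or-move choice applied to the traces.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDataset:
    """Hypothetical shared dataset: one creator, any number of importers."""
    creator: str
    importers: set[str] = field(default_factory=set)
    trace_ids: set[str] = field(default_factory=set)

    def has_access(self, user: str) -> bool:
        return user == self.creator or user in self.importers

    def add_trace(self, user: str, trace_id: str) -> None:
        """Creator and importers can add traces; everyone with access sees them."""
        if not self.has_access(user):
            raise PermissionError(f"{user!r} has no access to this dataset")
        self.trace_ids.add(trace_id)

    def visible_traces(self, user: str) -> set[str]:
        return set(self.trace_ids) if self.has_access(user) else set()

    def delete(self, user: str) -> None:
        if user == self.creator:
            # Creator deletion removes all trace references,
            # so importers are left with an empty dataset.
            self.trace_ids.clear()
        elif user in self.importers:
            # Importer deletion only affects this importer; the root
            # dataset stays intact for the creator and other importers.
            self.importers.discard(user)
        else:
            raise PermissionError(f"{user!r} has no access to this dataset")


ds = SharedDataset(creator="alice", importers={"bob", "carol"})
ds.add_trace("bob", "t1")           # bob adds a trace...
print(ds.visible_traces("alice"))   # {'t1'}  ...and alice sees it too
ds.delete("bob")                    # importer deletion: only bob loses access
print(ds.visible_traces("carol"))   # {'t1'}
ds.delete("alice")                  # creator deletion: references removed
print(ds.visible_traces("carol"))   # set()
```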