Published February 6, 2026

SyGra Studio: A Tool for Generating Synthetic Data Based on Knowledge Graphs

ServiceNow has opened access to a platform that enables the creation of high-quality datasets based on structured graphs – ranging from simple examples to complex logical scenarios.

Source: Hugging Face

When it comes to training language models, data quality often proves more important than quantity. But where can you find representative examples for niche tasks – for instance, for analyzing medical records or technical documentation? You could label them manually, but that is time-consuming and expensive. You could ask GPT-4 to generate them, but the result would be unpredictable. Or, you could structure the data so that it is correct by design.

That is exactly what SyGra Studio – a platform from the ServiceNow AI research group – is designed for. It allows you to create synthetic training data using knowledge graphs as a foundation. In a nutshell: you describe the structure of what you want to get, and the system generates text examples based on it.

Why Use Graphs for Text Generation

Usually, synthetic data is created like this: take a large model, give it a prompt like "generate 1000 examples of medical questions", and hope for the best. The problem is that the model can repeat itself, drift off-topic, or simply hallucinate facts.

SyGra Studio offers a different path. Instead of relying on the neural network's creativity, you first create a knowledge graph – a formal structure where entities (for example, "patient", "diagnosis", "medication") and the relationships between them ("prescribed", "contraindicated") are fixed. It is similar to a database schema, but for semantic relationships.

Then the platform uses this graph as a skeleton: it "understands" which combinations are valid and which are not, and generates examples that fit within the specified logic. The result is a kind of controlled randomness: diversity is preserved, but factual errors are eliminated.
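The "graph as skeleton" idea can be sketched in a few lines of Python: the schema fixes which relations may connect which entity types, and a generator only enumerates combinations the schema permits. All entity and relation names below are invented for illustration – this is not SyGra Studio's actual configuration format or API.

```python
# A minimal sketch of schema-constrained generation.
# The schema lists which (head type, relation, tail type) edges are allowed.
SCHEMA = {
    ("patient", "diagnosed_with", "diagnosis"),
    ("patient", "prescribed", "medication"),
    ("medication", "contraindicated_for", "diagnosis"),
}

# Concrete instances for each entity type (illustrative values).
ENTITIES = {
    "patient": ["Anna K.", "Boris M."],
    "diagnosis": ["hypertension", "asthma"],
    "medication": ["ibuprofen", "lisinopril"],
}

def is_valid(head_type: str, relation: str, tail_type: str) -> bool:
    """Reject any combination the schema does not allow."""
    return (head_type, relation, tail_type) in SCHEMA

def valid_triples():
    """Enumerate every concrete (head, relation, tail) the schema permits."""
    for head_type, relation, tail_type in sorted(SCHEMA):
        for head in ENTITIES[head_type]:
            for tail in ENTITIES[tail_type]:
                yield head, relation, tail

print(len(list(valid_triples())))              # 3 schema edges x 2 heads x 2 tails = 12
print(is_valid("diagnosis", "prescribed", "patient"))  # False: not in the schema
```

The key point is that invalid combinations (a diagnosis "prescribing" a patient) simply cannot be enumerated, which is the sense in which correctness is built into the structure rather than hoped for from the model.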

How It Works in Practice

SyGra Studio consists of several components. The first is a graph editor, where you can visually build a data structure or upload an existing one. The second is a generator that turns the graph into text examples using a language model. The third is a set of tools for validation and filtering: they allow you to assess the diversity of the generated data and ensure the absence of repetitions or logical inconsistencies.
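As a rough sketch of that three-stage flow, here is a toy pipeline where a fixed template stands in for the LLM-based generator and a simple exact-match filter plays the role of the validation tools. The function names and templates are assumptions for illustration, not the platform's API.

```python
# Toy pipeline: graph triples in, deduplicated text examples out.
TEMPLATES = {"prescribed": "{h} was prescribed {t}."}

def generate(triples):
    """Stage 2: turn graph triples into candidate training sentences."""
    for h, r, t in triples:
        yield TEMPLATES[r].format(h=h, t=t)

def dedupe(examples):
    """Stage 3 (validation): drop exact repeats, case-insensitively."""
    seen, kept = set(), []
    for ex in examples:
        key = ex.strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(ex)
    return kept

triples = [
    ("Anna", "prescribed", "lisinopril"),
    ("Anna", "prescribed", "lisinopril"),  # duplicate, should be filtered out
    ("Boris", "prescribed", "ibuprofen"),
]
print(dedupe(list(generate(triples))))  # two unique sentences remain
```

A real validation stage would also score semantic similarity and check logical consistency against the graph, but the shape of the pipeline is the same.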

The platform supports various task formats. You can generate "question-answer" pairs for fine-tuning, examples for classification, or data for entity extraction from text. All of this is configured through the interface – writing code is not required, though experienced users can plug in their own scripts.
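To make the format flexibility concrete, here is the same graph triple rendered into each of the three task formats mentioned above. The field names and label strings are assumptions, not SyGra Studio's export schema.

```python
# One graph triple, three task formats (illustrative field names).

def to_qa(h: str, r: str, t: str) -> dict:
    """Question-answer pair for fine-tuning."""
    return {"question": f"What was {h} prescribed?", "answer": t}

def to_classification(h: str, r: str, t: str) -> dict:
    """Text with the relation as its class label."""
    return {"text": f"{h} takes {t}.", "label": r}

def to_ner(h: str, r: str, t: str) -> dict:
    """Entity extraction: character spans for each entity."""
    text = f"{h} was prescribed {t}."
    return {
        "text": text,
        "entities": [
            (text.index(h), len(h), "PATIENT"),
            (text.index(t), len(t), "MEDICATION"),
        ],
    }

sample = ("Anna", "prescribed", "lisinopril")
print(to_qa(*sample))
print(to_classification(*sample))
print(to_ner(*sample))
```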

An important nuance: SyGra Studio is not tied to a specific model. You can use different LLMs for generation – from open-source to proprietary. The graph sets the structure, and the model handles the linguistic phrasing.
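The model-agnostic design can be sketched as a minimal backend interface: the graph logic only ever calls a single completion method, so any LLM client – or a stub, as here – can be swapped in. This interface is an assumption for illustration, not SyGra Studio's actual plugin API.

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Anything with a complete() method can serve as the generator."""
    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    """Stand-in backend for testing the pipeline without API calls."""
    def complete(self, prompt: str) -> str:
        return f"[stub] {prompt}"

def verbalize(triple: tuple, backend: LLMBackend) -> str:
    """The graph supplies the facts; the backend supplies the wording."""
    h, r, t = triple
    prompt = f"Write one natural sentence stating: {h} {r} {t}"
    return backend.complete(prompt)

print(verbalize(("Anna", "prescribed", "lisinopril"), EchoBackend()))
```

Swapping the stub for an open-source or proprietary model changes only the phrasing quality, never the underlying facts – exactly the division of labor described above.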

Who Can Benefit

The first obvious audience is developers training models for highly specialized tasks. Suppose you are building a tech support chatbot. You have a knowledge base of products, but you do not have thousands of examples of exactly how people phrase their questions. You can build a "product → feature → problem → solution" graph and generate training dialogues based on it.
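The chatbot scenario can be sketched as a small chain of that shape, walked to produce dialogue turns. The product, feature, problem, and solution below are all made up for illustration.

```python
# Hypothetical product -> feature -> problem -> solution chain.
GRAPH = {
    "RouterX": {
        "Wi-Fi": {"drops the connection": "restart it in bridge mode"},
    },
}

def dialogues():
    """Walk every path through the graph and emit a training dialogue."""
    for product, features in GRAPH.items():
        for feature, problems in features.items():
            for problem, solution in problems.items():
                user = f"My {product} {feature} {problem}, what can I do?"
                agent = f"Try this: {solution}."
                yield {"user": user, "agent": agent}

for d in dialogues():
    print(d["user"], "->", d["agent"])
```

In practice an LLM would paraphrase the user turn many ways, but each generated dialogue still traces a real path through the knowledge base, so the advice never contradicts it.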

The second scenario is research. When you need to test a hypothesis about a model's behavior on a specific type of data, but real examples are scarce or hard to collect. The graph allows you to control exactly which patterns enter the dataset and analyze the model's reaction to them.

Third is data augmentation. If you already have a labeled dataset but its volume is insufficient, SyGra Studio can help expand it while preserving the original relationship structure.
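A minimal sketch of structure-preserving augmentation: the relation extracted from a labeled example is re-rendered with alternative phrasings, so the surface form varies while the label stays intact. The paraphrase templates are illustrative assumptions.

```python
# Alternative phrasings per relation; the relation (the label) never changes.
PARAPHRASES = {
    "prescribed": [
        "{h} was prescribed {t}.",
        "The doctor put {h} on {t}.",
        "{t} was prescribed to {h}.",
    ],
}

def augment(h: str, r: str, t: str) -> list:
    """Expand one labeled fact into several surface variants."""
    return [tpl.format(h=h, t=t) for tpl in PARAPHRASES[r]]

print(augment("Anna", "prescribed", "lisinopril"))  # 3 variants, same relation
```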

Limitations and Considerations

As with any tool, there are limitations here. First is the construction of the graph itself. If you work in a field where connections between concepts are non-obvious or controversial, creating a correct structure can be difficult. A graph is a simplification of reality, and it is important to be aware of what exactly you are simplifying.

Second, generation quality still depends on the language model. The graph guarantees logical accuracy, but not stylistic diversity or the naturalness of the wording. If the model is prone to clichéd phrases, this will be reflected in the result.

Third is scalability. For local tasks, the platform works great, but if millions of examples with high variability are required, the process can become resource-intensive – both in terms of generation time and API call costs.

Availability and Usage

SyGra Studio has been released to the public. It can be tested via a web interface on Hugging Face Spaces or deployed locally – the code is published on GitHub. The documentation includes examples for various domains: from medicine to finance.

The platform is under active development, so the interface and functionality may change. However, the core idea – using structure to control generation – is already viable and open for experimentation.

If you need synthetic data with predictable logic, this is one of the most effective ways to get it. The tool is not universal, but for specific tasks, it fits perfectly.

Original Title: Introducing SyGra Studio

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item was selected as an event important for understanding AI development. Then a processing plan was defined: what needed clarification, what context to add, and where to place emphasis. This allowed a single announcement to be turned into a coherent, meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text – Claude Sonnet 4.5 (Anthropic). The neural network studies the original material and generates a coherent text.

2. Translation into English – Gemini 3 Pro (Google DeepMind).

3. Text Review and Editing – Gemini 3 Flash Preview (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.

4. Preparing the Illustration Description – DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.

5. Creating the Illustration – FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.
