Published on February 6, 2026

SyGra Studio: A Tool for Generating Synthetic Data Based on Knowledge Graphs

ServiceNow has opened access to a platform that enables the creation of high-quality datasets based on structured graphs – ranging from simple examples to complex logical scenarios.

Products 4 – 6 minutes min read

Event Source: Hugging Face 4 – 6 minutes min read

When it comes to training language models, data quality often proves more important than quantity. But where can you find representative examples for niche tasks – for instance, for analyzing medical records or technical documentation? You could label them manually, but that is time-consuming and expensive. You could ask GPT-4 to generate them, but the result would be unpredictable. Or, you could structure the data so that it is correct by design.

That is exactly what SyGra Studio – a platform from the ServiceNow AI research group – is designed for. It allows you to create synthetic training data using knowledge graphs as a foundation. In a nutshell: you describe the structure of what you want to get, and the system generates text examples based on it.

Benefits of Knowledge Graphs for Synthetic Data Generation

Why Use Graphs for Text Generation

Usually, synthetic data is created like this: take a large model, give it a prompt like «generate 1000 examples of medical questions» , and hope for the best. The problem is that the model can repeat itself, drift off-topic, or simply hallucinate facts.

SyGra Studio offers a different path. Instead of relying on the neural network's creativity, you first create a knowledge graph – a formal structure where entities (for example, «patient» , «diagnosis» , «medication» ) and the relationships between them («prescribed» , «contraindicated» ) are fixed. It is similar to a database schema, but for semantic relationships.

Then the platform uses this graph as a skeleton: it «understands» which combinations are valid and which are not, and generates examples that fit within the specified logic. The result is a kind of controlled randomness: diversity is preserved, but factual errors are eliminated.

How SyGra Studio Works

How It Works in Practice

SyGra Studio consists of several components. The first is a graph editor, where you can visually build a data structure or upload an existing one. The second is a generator that turns the graph into text examples using a language model. The third is a set of tools for validation and filtering: they allow you to assess the diversity of the generated data and ensure the absence of repetitions or logical inconsistencies.

The platform supports various task formats. You can generate «question-answer» pairs for fine-tuning, examples for classification, or data for entity extraction from text. All of this is configured through the interface – writing code is not required, though experienced users can plug in their own scripts.

An important nuance: SyGra Studio is not tied to a specific model. You can use different LLMs for generation – from open-source to proprietary. The graph sets the structure, and the model handles the linguistic phrasing.

Use Cases for SyGra Studio

Who Can Benefit

The first obvious audience is developers training models for highly specialized tasks. Suppose you are building a tech support chatbot. You have a knowledge base of products, but you do not have thousands of examples of exactly how people phrase their questions. You can build a «product → feature → problem → solution» graph and generate training dialogues based on it.

The second scenario is research. When you need to test a hypothesis about a model's behavior on a specific type of data, but real examples are scarce or hard to collect. The graph allows you to control exactly which patterns enter the dataset and analyze the model's reaction to them.

Third is data augmentation. If you already have a labeled dataset but its volume is insufficient, SyGra Studio can help expand it while preserving the original relationship structure.

The Limitations and Considerations

As with any tool, there are limitations here. First is the construction of the graph itself. If you work in a field where connections between concepts are non-obvious or controversial, creating a correct structure can be difficult. A graph is a simplification of reality, and it is important to be aware of what exactly you are simplifying.

Second, generation quality still depends on the language model. The graph guarantees logical accuracy, but not stylistic diversity or the naturalness of the wording. If the model is prone to clichéd phrases, this will be reflected in the result.

Third is scalability. For local tasks, the platform works great, but if millions of examples with high variability are required, the process can become resource-intensive – both in terms of generation time and API call costs.

How to Access and Use SyGra Studio

Availability and Usage 🔧

SyGra Studio has been released to the public. It can be tested via a web interface on Hugging Face Spaces or deployed locally – the code is published on GitHub. The documentation includes examples for various domains: from medicine to finance.

The platform is under active development, so the interface and functionality may change. However, the core idea – using structure to control generation – is already viable and open for experimentation.

If you need synthetic data with predictable logic, this is one of the most effective ways to get it. The tool is not universal, but for specific tasks, it fits perfectly.

#applied analysis #technical context #machine learning #engineering #data #development tools #generative models #model optimization #synthetic data

Link to Original: https://huggingface.co/blog/ServiceNow-AI/sygra-studio

Original Title: Introducing SyGra Studio

Publication Date: Feb 6, 2026

Hugging Face huggingface.co A U.S.-based open platform and company for hosting, training, and sharing AI models.

Previous Article Cursor Unveils Prototype for Autonomous Codebase Editing Next Article BrowseSafe: How to Protect Browser AI Agents from Hidden Attacks

SyGra Studio: A Tool for Generating Synthetic Data Based on Knowledge Graphs

Benefits of Knowledge Graphs for Synthetic Data Generation

How SyGra Studio Works

Use Cases for SyGra Studio

The Limitations and Considerations

How to Access and Use SyGra Studio

Related Publications

How AI Is Learning to Invent New Molecules: A Deep Dive into GP-MoLFormer

Nitro-AR: A Compact Transformer for Image Generation

How JSON Helps Deploy and Test AI Models Faster

From Source to Analysis

Neural Networks Involved in the Process

1. Analyzing the Original Publication and Writing the Text

2. step.translate-en.title

3. Text Review and Editing

4. Preparing the Illustration Description

5. Creating the Illustration