Built for data scientists, MLOps engineers, and human experts.

The open-source tool for data-centric NLP

Supercharge your NLP data labeling and management.

Loved by the data science community

Close the data loop: collect, label, and monitor.

A new way to iterate on NLP data. Enhance your projects with weak supervision and human-in-the-loop workflows.

Are you eager to try it out?

Check out Rubrix’s installation guide. If you already have an Elasticsearch instance running, getting started is as simple as:

              
  pip install "rubrix[server]"
  python -m rubrix
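Because the server expects a running Elasticsearch instance, it can help to check that one is reachable before running `python -m rubrix`. The sketch below assumes Elasticsearch listens on its default local address, `http://localhost:9200`; change the URL if yours runs elsewhere. It uses only the Python standard library and is a generic HTTP check, not part of Rubrix.

```python
import urllib.error
import urllib.request


def elasticsearch_is_reachable(url: str = "http://localhost:9200", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`.

    Generic HTTP check, not a Rubrix or Elasticsearch client API.
    The default URL assumes Elasticsearch's standard local endpoint.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 401 on a secured
        # cluster), so something is listening at this address.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return False
```

If `elasticsearch_is_reachable()` returns `False`, start Elasticsearch first; a secured cluster that answers with an authentication error still counts as reachable here.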
              
            

Use Rubrix on scalable cloud infrastructure without installing the server.

Join the waiting list

Use the libraries you love

Integrate with any model or library, from Hugging Face transformers and spaCy to pandas, and combine your favourite libraries into new workflows.

                          
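from transformers import pipeline
from datasets import load_dataset
import rubrix as rb

# Zero-shot classifier (uses the pipeline's default model)
model = pipeline("zero-shot-classification")
# First 100 examples of the AG News test split
dataset = load_dataset("ag_news", split="test[0:100]")

# Class names: "World", "Sports", "Business", "Sci/Tech"
labels = dataset.features["label"].names

records = []
for record in dataset:
    # Score the text against every AG News class name
    prediction = model(record["text"], labels)

    records.append(
        rb.TextClassificationRecord(
            inputs=record["text"],
            # (label, score) pairs, as returned by the pipeline
            prediction=list(zip(prediction["labels"], prediction["scores"])),
            # Gold label from the dataset, to compare with the prediction
            annotation=labels[record["label"]],
        )
    )

# Log all records in a single call
rb.log(records=records, name="agnews_zeroshot")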
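The zero-shot pipeline returns a dictionary whose `labels` and `scores` lists are sorted from most to least likely, which is why zipping them yields the `(label, score)` pairs used as `prediction`. Since each record also carries the gold `annotation`, you can measure how often the top prediction agrees with it. A minimal sketch in plain Python, assuming results shaped like the pipeline output; the helpers `to_prediction` and `top1_accuracy` and the scores below are hypothetical, not part of Rubrix or transformers.

```python
from typing import Dict, List, Sequence, Tuple


def to_prediction(result: Dict[str, Sequence]) -> List[Tuple[str, float]]:
    """Turn a zero-shot style result into (label, score) pairs, best first."""
    pairs = list(zip(result["labels"], result["scores"]))
    # Sort defensively so the first pair is always the top-scoring label.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def top1_accuracy(results: Sequence[Dict[str, Sequence]], gold: Sequence[str]) -> float:
    """Fraction of examples whose highest-scoring label equals the gold label."""
    if not results:
        return 0.0
    hits = sum(to_prediction(r)[0][0] == g for r, g in zip(results, gold))
    return hits / len(results)


# Hypothetical outputs for two AG News texts, for illustration only.
results = [
    {"labels": ["Sports", "World", "Business", "Sci/Tech"], "scores": [0.7, 0.2, 0.06, 0.04]},
    {"labels": ["Business", "Sci/Tech", "World", "Sports"], "scores": [0.5, 0.3, 0.15, 0.05]},
]
print(top1_accuracy(results, gold=["Sports", "Sci/Tech"]))  # 0.5
```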
                          
                        
                          
import spacy
import rubrix as rb

text = "Paris a un enfant et la forêt a un oiseau."

# Small French pipeline; install it first with: python -m spacy download fr_core_news_sm
nlp = spacy.load("fr_core_news_sm")

doc = nlp(text)

# Entities as (label, start_char, end_char) tuples
prediction = [
  (ent.label_, ent.start_char, ent.end_char)
  for ent in doc.ents
]

record = rb.TokenClassificationRecord(
  text=text,
  tokens=[token.text for token in doc],
  prediction=prediction,
  prediction_agent="spaCy.fr_core_news_sm",
)

rb.log(records=record, name="lesmiserables-ner")
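In this example the token classification prediction is a list of `(label, start_char, end_char)` tuples: character offsets into `text` with an exclusive end, exactly what spaCy's `ent.start_char` and `ent.end_char` provide. The plain-Python sketch below shows how such offsets map back to substrings and how they could be sanity-checked before logging. The helpers are hypothetical, and the `LOC` label is chosen for illustration, not taken from actual `fr_core_news_sm` output.

```python
from typing import List, Tuple

Entity = Tuple[str, int, int]  # (label, start_char, end_char), end exclusive


def spans_from_strings(text: str, entities: List[Tuple[str, str]]) -> List[Entity]:
    """Locate each (label, surface string) in `text` and return character spans.

    Hypothetical helper: each string is searched after the previous match.
    """
    spans: List[Entity] = []
    cursor = 0
    for label, surface in entities:
        start = text.find(surface, cursor)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        end = start + len(surface)
        spans.append((label, start, end))
        cursor = end
    return spans


def check_spans(text: str, spans: List[Entity]) -> None:
    """Raise ValueError if any span is empty or falls outside `text`."""
    for label, start, end in spans:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid span for {label}: ({start}, {end})")


text = "Paris a un enfant et la forêt a un oiseau."
spans = spans_from_strings(text, [("LOC", "Paris")])  # label is hypothetical
check_spans(text, spans)
print(spans)  # [('LOC', 0, 5)]
print([text[s:e] for _, s, e in spans])  # ['Paris']
```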
                          
                        
             
import pandas as pd
import rubrix as rb

df = pd.read_csv("user_requests.csv")

records = []
for _, r in df.iterrows():
  records.append(rb.TextClassificationRecord(
    # Several text fields go into `inputs` as a dictionary
    inputs={
      "message": r.text,
      "subject": r.subject
    },
    # Extra columns are kept as metadata
    metadata={
      "department": r.department,
      "source": r.source
    },
  ))

# Log all records in a single call
rb.log(records, name="user_requests")
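Each CSV row becomes one record: the `text` and `subject` columns go into `inputs`, a dictionary keyed by field name, while `department` and `source` go into `metadata`. The sketch below reproduces only that row-to-dictionary mapping with the standard `csv` module, so it runs without pandas or Rubrix. The column names come from the example above; the helper `rows_to_fields` and the sample row are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

INPUT_COLUMNS = {"message": "text", "subject": "subject"}  # record field -> CSV column
METADATA_COLUMNS = ("department", "source")


def rows_to_fields(csv_file) -> Iterator[Tuple[Dict[str, str], Dict[str, str]]]:
    """Yield (inputs, metadata) dictionaries for each CSV row."""
    for row in csv.DictReader(csv_file):
        inputs = {field: row[column] for field, column in INPUT_COLUMNS.items()}
        metadata = {column: row[column] for column in METADATA_COLUMNS}
        yield inputs, metadata


# Made-up sample data with the same columns as user_requests.csv.
sample = io.StringIO(
    "text,subject,department,source\n"
    "My invoice is wrong,Billing issue,finance,email\n"
)
for inputs, metadata in rows_to_fields(sample):
    print(inputs)    # {'message': 'My invoice is wrong', 'subject': 'Billing issue'}
    print(metadata)  # {'department': 'finance', 'source': 'email'}
```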
            
            
Check out the Hugging Face tutorial.

A new user and developer experience.