November 28, 2024

Instant concept annotation with ConNER

Introducing ConNER, a neural sequence labeling model that identifies concepts in text using BIO tagging. This release includes the code and a pre-trained model snapshot.

Poster image

Introduction

We’re announcing the release of ConNER (Concept Named Entity Recognition), a lightweight, efficient model for extracting concepts from text. At just a few megabytes, it is small enough to run directly on laptops, tablets, or phones without significant computational resources.

Motivation

Extracting concepts from text often relies on large language models that demand substantial computational power and can be slow. ConNER offers a lightweight alternative, specifically trained for concept annotation. It achieves over 90% accuracy on our validation set and is compact enough to run on edge devices, providing real-time predictions.

Use Cases

ConNER can be used as a building block for a variety of applications.

Examples

  1. Input: Microeconomics focuses on individual markets and consumer behavior.
    Output: ["Microeconomics"]

  2. Input: Understanding mental health and brain chemistry requires studying psychology.
    Output: ["mental health", "brain chemistry"]

  3. Input: Machine learning is a subset of artificial intelligence that enables systems to learn from data.
    Output: ["Machine learning"]

  4. Input: The human brain is the most complex organ in the body.
    Output: ["human brain"]
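Under BIO tagging, each token is labeled B (beginning of a concept), I (inside a concept), or O (outside any concept), and the concept strings in the examples above fall out of merging each B/I run. A minimal sketch of that decoding step (the function name is illustrative, not from the ConNER codebase):

```python
def decode_bio(tokens, tags):
    """Merge B/I tag runs into concept strings; O tokens close any open run."""
    concepts, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                 # a new concept starts here
            if current:
                concepts.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:   # continue the open concept
            current.append(token)
        else:                          # O tag: flush any open concept
            if current:
                concepts.append(" ".join(current))
            current = []
    if current:                        # flush a concept that ends the sentence
        concepts.append(" ".join(current))
    return concepts

tokens = ("Understanding mental health and brain chemistry "
          "requires studying psychology .").split()
tags = ["O", "B", "I", "O", "B", "I", "O", "O", "O", "O"]
print(decode_bio(tokens, tags))  # → ['mental health', 'brain chemistry']
```

Running this on example 2 above reproduces its output exactly.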

Technical Details

Architecture

ConNER is built on prajjwal1/bert-tiny with a classification layer for BIO (Beginning, Inside, Outside) tagging. The model:
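ConNER's own code isn't reproduced here, but the shape arithmetic of a token-classification head is easy to sketch. Assuming the standard setup (a linear layer applied to each token's hidden state; bert-tiny's hidden size is 128, and BIO tagging needs three labels), a numpy stand-in looks like this:

```python
import numpy as np

# bert-tiny emits 128-dimensional hidden states; BIO tagging needs 3 labels (B, I, O).
hidden_size, num_labels = 128, 3
rng = np.random.default_rng(0)

# Stand-in classification head: one weight matrix and bias shared across tokens.
W = rng.normal(size=(hidden_size, num_labels))
b = np.zeros(num_labels)

# Fake encoder output for one sentence of 10 tokens.
hidden_states = rng.normal(size=(10, hidden_size))

logits = hidden_states @ W + b      # shape (10, 3): one B/I/O score triple per token
tags = logits.argmax(axis=-1)       # predicted tag id per token, shape (10,)
```

The per-token argmax over three logits is what makes the model so small and fast: the head adds only `128 * 3 + 3` parameters on top of the tiny encoder.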

Training Data

The model was trained on a proprietary dataset of academic content and notes, including:

Training Configuration

Model Card

Key Information

Limitations

Example Limitations

Downloads

Resources

Author(s): Ilia Breitburg