Hint

This page is only partially interactive. Since this is a static HTML page, only front-end interactivity works. This means you can click buttons and highlight text, but the relevant python-level responses to those actions won’t occur.

Text entity annotation

A common task in natural language processing is to extract entities of interest from some text. This may be as simple as extracting the main content of interest from some text that often comes with boilerplate, or involve identifying e.g. place names or personal names.

To do this, ipyannotations has a widget called ipyannotations.text.TextTagger, which allows you to highlight words, phrases or sentences and assign a class to them.

The widget will display any string, including Markdown-formatted text.

import ipyannotations.text
from ipyannotations._doc_utils import recursively_remove_from_dom


widget = ipyannotations.text.TextTagger()
widget.display("This is an *example sentence*. Try highlighting a word.")
widget

The default entity types are PER (person), ORG (organisation), LOC (location), and MISC (miscellaneous). These are chosen because they are relatively standard in the Named Entity Recognition research community.

You can choose which entity type you are tagging at any point by toggling its button, or using the hotkeys 1 – 0, mapped in order.

To set the classes you are interested in, you can pass them to the widget using the classes argument:

import ipyannotations.text

widget = ipyannotations.text.TextTagger(classes=["Insult", "Compliment"])
widget.display("You are annoying, but I like you.")
widget

The widget will snap to word boundaries by default. This means you can double-click on a word to tag it, hopefully making tagging faster. If you need to label entities at the character level, you can set snap_to_word_boundary to False:

import ipyannotations.text

widget = ipyannotations.text.TextTagger(
    classes=["Insult", "Compliment"],
    snap_to_word_boundary=False
)
widget.display("You are annoying, but I like you.")
widget

The format for the annotations takes the form of a three-tuple with types (int, int, str). The integers indicate the starting and ending character of the selected span, and the string indicates the class name.

widget.data
[(8, 16, 'Insult'), (22, 32, 'Compliment')]