here for an illustrated version of some of these guidelines.
- User interface guidelines
- Types of queries
- Single words
- Approximate and wildcard spelling
- Compound queries: OR
- Compound queries: AND
- Compound queries: proximity AND
- Compound queries: NOT
- Compound queries: word sequences
- Compound queries: macros
- Mixed compound queries
This search interface provides textual access to the iconic
"Trésor des Chartes" (or "Chancery") collection of manuscripts,
which encompasses 83,000 page images reproducing the medieval
registers produced by the French royal chancery.
The registers date from 1302 to 1483 and are handwritten
mainly in Latin, French and Spanish.
While automatic transcription of medieval sources is notoriously
difficult because of the greatly variable handwriting styles,
this corpus is even more challenging because of its multilingual
content and the large number of archaic and abbreviated words.
In the HIMANIS project (2015-2017), this large image collection was
made searchable through plain-text queries using Probabilistic
Indexing (PrIx) technology. The present search interface is
a new and enhanced version of a prototype interface developed
in HIMANIS for demonstration purposes.
If you are using this resource and find it useful, please include
in your publications a reference to:
Théodore Bluche, Sébastien Hamel, Christopher Kermorvant,
Joan Puigcerver, Dominique Stutzmann, Alejandro H. Toselli,
and Enrique Vidal: «Preparatory KWS Experiments for Large-Scale
Indexing of a Vast Medieval Manuscript Collection in the HIMANIS
Project». In 14th IAPR International Conference on Document Analysis
and Recognition. ICDAR 2017 proc., pp.312-17, 2017.
User interface guidelines
Queries are typed in the query box,
to the left of the "Search" button.
Text typed in the query box is automatically
folded for case and diacritics. For example,
"Paris" becomes "paris", "Sébastien" becomes "sebastien"
and "Août" becomes "aout" (without the quotes).
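The folding described above can be approximated as follows. This is a minimal sketch that assumes folding amounts to Unicode decomposition, removal of combining marks and lower-casing; the function name fold is hypothetical and the actual folding rules of the system may differ.

```python
import unicodedata

def fold(text: str) -> str:
    """Lower-case the text and strip combining diacritical marks.

    Assumption: this mimics the interface's folding; the real rules
    are not documented here and may handle more cases.
    """
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

for word in ("Paris", "Sébastien", "Août"):
    print(word, "->", fold(word))
```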
A confidence box and slider are provided to set a
confidence threshold, which determines the desired
precision-recall tradeoff.
A "Max. results" box is provided to indicate the
maximum number of spots to be retrieved and shown. For example,
if this number is set to 20 and the confidence is set to 40%,
the system shows at most 20 spots with confidence higher than 40%;
and if no (or 0) confidence is set, the system retrieves at most
20 spots with the highest confidence.
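The interplay of the two settings can be sketched as below. The Spot tuple layout and the function name select_spots are assumptions made for illustration only; confidences are expressed here as fractions between 0 and 1.

```python
from typing import List, Tuple

# Hypothetical spot record: (spot identifier, confidence in [0, 1]).
Spot = Tuple[str, float]

def select_spots(spots: List[Spot], max_results: int,
                 confidence: float = 0.0) -> List[Spot]:
    """Keep spots strictly above the threshold, best first, capped at max_results."""
    kept = [spot for spot in spots if spot[1] > confidence]
    kept.sort(key=lambda spot: spot[1], reverse=True)
    return kept[:max_results]

spots = [("a", 0.9), ("b", 0.3), ("c", 0.5), ("d", 0.41)]
print(select_spots(spots, max_results=20, confidence=0.4))
print(select_spots(spots, max_results=3))  # no threshold: top 3 by confidence
```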
Text images are organized according to this hierarchy:
HOME, Collection, Volume and Page
Click on the front page (HOME) "Chancery" banner with no query
to enter the full Chancery collection. Then you can see
the 199 Volumes indexed, each showing the number of images it
contains (ranging from fewer than 100 to more than 1000).
At each level of the hierarchy (except the lowest, page level),
the elements are shown by means of thumbnail images or
miniatures, along with some information; namely,
a) an identifier of the element,
b) a confidence bar representing the system's
confidence that the query appears somewhere in this
element and c) the number of elements in the lower level
where the query may appear with a confidence above the threshold
specified. By hovering the mouse over the confidence bar,
the actual confidence value is shown.
Queries can be formulated at any level: HOME, full Collection
(chancery), specific Volume (JJ...), or individual Page.
Pan and Zoom. When a (part of a) page image is shown,
it can be explored by moving the mouse while holding its right
button. In addition, the mouse wheel can be used to zoom in and out.
Double-click. At any point of an image, the left mouse
button can be double-clicked to show all the words or character
sequences which are indexed at this point in the image PrIx.
Tablets and smart phones. With minor limitations,
this user interface is also operational for touch-screen
tablets or smart phones.
Types of queries
Abbreviations are not queried as they appear in the images;
instead the corresponding expanded forms are used. For example,
use "paris" to find "Paris" and "par.", or "johannes" to find
"Johannes", "Joh^s", "Johẽs", etc.
Approximate spelling can be used for any query word.
The special symbol '~' can be appended to any word
to allow searching for words which differ from the given one
in at most one character. Larger dissimilarities can
be specified by appending a number after the '~' symbol.
For example, use 'Theodore~' to find 'Theodore', 'Teodore',
'Theodori', 'Theodoro' and 'Theodores', or 'Theodore~2' to
additionally find 'Theodorum', 'Theodorici', 'Teodoro', etc.
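One way to picture the '~' operator is as a bound on the edit distance between the query word and an indexed word. The sketch below assumes plain Levenshtein distance (insertions, deletions and substitutions each count as one difference); the measure actually used by the system is not specified here, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def approx_match(query: str, word: str) -> bool:
    """Interpret 'word~' as distance <= 1 and 'word~N' as distance <= N."""
    base, tilde, number = query.partition("~")
    if not tilde:
        return base.lower() == word.lower()
    max_distance = int(number) if number else 1
    return edit_distance(base.lower(), word.lower()) <= max_distance

print(approx_match("Theodore~", "Teodore"))     # one character removed
print(approx_match("Theodore~", "Theodorum"))   # two differences
print(approx_match("Theodore~2", "Theodorum"))
```

Under plain Levenshtein distance 'Theodorici' is three edits away from 'Theodore', so the dissimilarity used by the system presumably differs somewhat from this sketch.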
Wildcard spelling can be used for any query word.
The symbol '*' is used as a wildcard representing any
character string. For example, use 'Theodor*' to search for
'Theodoro', 'Theodore', 'Theodori', 'Theodorici', 'Theodorum',
etc. Similarly, the symbol '?' can be used as a wildcard
representing any single character. For example, use 'Johann?s'
to search for 'Johannas', 'Johannes', or 'Johannis'.
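The two wildcard symbols behave much like a simple pattern language. As an illustration only, the sketch below translates a wildcard query into a regular expression, assuming that '*' may also match the empty string and that matching is done on case-folded words; both function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '*' (any string) and '?' (any single character) into a regex."""
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def wildcard_match(pattern: str, word: str) -> bool:
    """True when the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word.lower()) is not None

print(wildcard_match("Theodor*", "Theodorici"))
print(wildcard_match("Johann?s", "Johannis"))
print(wildcard_match("Johann?s", "Johanns"))   # '?' needs exactly one character
```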
A large dissimilarity degree and/or a small number of real
characters in approximate and wildcard spelling typically
lead to huge, useless numbers of retrieved spots, more so if
the query is issued at a high hierarchy level (e.g., at the
root or HOME).
To avoid these useless queries, the maximum dissimilarity
and the minimum number of real characters are limited
depending on the hierarchy level where the query is issued.
As a general recommendation, approximate and wildcard spelling
should be used at higher hierarchy levels mainly for long words,
and using only the smallest spelling dissimilarity (1), or a
minimum number of wildcard symbols.
Both approximate spelling and wildcards can entail large
server computing demands and should be used with care!
Compound queries. Individual words can be combined into
"compound queries" in three ways: a) boolean (AND, OR, NOT)
queries, b) sequence queries and c) macros.
Boolean queries. AND, OR, NOT relations are interpreted
at the full page-image level and the number of matches is the total
number of word occurrences matching the query. The operators are:
AND: "&&" (or just blank space), OR: "||", NOT: "-" (before the
words which have to be negated). Parentheses "(" and ")" can be
used for grouping. Examples: "paris && (france || flandres)":
pages with at least one instance of "paris" AND at least one
instance of either "france" OR "flandres", or both;
"france - flandres": pages with at least one instance of
"france" but with no instance of "flandres".
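Page-level boolean semantics can be illustrated with set operations. The sketch below ignores confidences altogether and treats each page as a plain set of words, which is a simplification; the page identifiers and their contents are made up for the example.

```python
from typing import Dict, Set

# Hypothetical toy index: page identifier -> folded words found on that page.
pages: Dict[str, Set[str]] = {
    "page_1": {"paris", "france"},
    "page_2": {"paris", "flandres", "france"},
    "page_3": {"paris"},
    "page_4": {"france"},
}

def pages_with(word: str) -> Set[str]:
    """Pages containing at least one instance of the word."""
    return {page_id for page_id, words in pages.items() if word in words}

# "paris && (france || flandres)"
q1 = pages_with("paris") & (pages_with("france") | pages_with("flandres"))

# "france - flandres"
q2 = pages_with("france") - pages_with("flandres")

print(sorted(q1))
print(sorted(q2))
```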
Proximity AND queries. These are AND queries with a
number between the two "&" symbols to specify
how far apart the AND components are allowed to be.
The number is a percentage of the whole image size.
For example, 'Angleterre &10& France' retrieves images
where 'Angleterre' lies within 10% of the page image size
of 'France'.
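A proximity test of this kind can be sketched as follows. How the system measures "image size" and word distance is not stated here; this sketch assumes word positions are given as pixel coordinates of their centres and that the distance is compared against a percentage of the image diagonal. The function name and all values are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def within_proximity(pos_a: Point, pos_b: Point,
                     image_size: Tuple[float, float], percent: float) -> bool:
    """True when the two positions are at most `percent` % of the diagonal apart.

    Assumption: 'image size' is taken to be the image diagonal.
    """
    width, height = image_size
    diagonal = math.hypot(width, height)
    distance = math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    return distance <= diagonal * percent / 100.0

image = (3000, 4000)          # diagonal is 5000 pixels
angleterre = (100, 100)
france = (340, 420)           # 400 pixels away from 'angleterre'
print(within_proximity(angleterre, france, image, 10))
print(within_proximity(angleterre, france, image, 5))
```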
Sequence queries. These are sequences of words, expressed
using the symbols "[" and "]". Sequences are not exact strings;
they allow extra words to appear among the stated words.
Examples: "[ludovico francorum]" (where "francorum" is an
expanded abbreviation) would retrieve pages with strings
such as "ludovico franc.", "Ludovico rege franc.",
"ludovico dei gra. francorum", etc. The number of matches
is computed at page-image level and one match corresponds to
the complete sequence. Sequences can be seamlessly
mixed with boolean operators.
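The "extra words allowed" behaviour corresponds to an ordered subsequence test. The sketch below assumes the text is available as a list of expanded, folded words and places no limit on how many extra words may intervene; whether the system imposes such a limit is not stated here. The function name is hypothetical.

```python
from typing import Iterable, Sequence

def matches_sequence(query_words: Sequence[str],
                     text_words: Iterable[str]) -> bool:
    """True when query_words occur in text_words in the same order,
    possibly with other words in between."""
    remaining = iter(text_words)
    # `word in remaining` consumes the iterator up to the first match,
    # so each query word must be found after the previous one.
    return all(word in remaining for word in query_words)

query = ["ludovico", "francorum"]
print(matches_sequence(query, ["ludovico", "dei", "gratia", "francorum"]))
print(matches_sequence(query, ["francorum", "rex", "ludovico"]))  # wrong order
```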
Macros are predefined complex queries generally used
to retrieve information which is semantically homogeneous.
In the current system version, macros can only be defined by
the system administrator. Macros can be used by means of their
predefined names preceded by the "@" symbol. The default
system configuration provides six examples of macros:
"@mois" which retrieves the names of the months of the year
in French, "@mensis" for the month names in Latin, "@moismensis",
for all the month names in French and Latin, and similarly
for weekday names: "@jour", "@dies" and "@jourdies".
Macros can be seamlessly mixed with word sequences and boolean
operators. The concrete definitions are as follows:
@mois="(janvier || fevrier || mars || avril || mai || may
|| juin || juillet || aout || septembre || octobre
|| novembre || decembre)"
@mensis="(januari* || februari* || marti* || aprilis
|| maio || maii || junio || junii || julio || julii
|| august* || septemb* || octob* || novemb* || decemb*)"
@moismensis="(@mois || @mensis)"
@jour="(lundi || mardi || mercredi || jeudi || vendredi
|| samedi || dimanche)"
@dies="(lune || martis || mercuri || mercurii || iovis
|| veneris || sabbat* || [(die||diem||dies) dominic*])"
@jourdies="(@jour || @dies)"
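Macro use can be thought of as textual substitution applied before the query is evaluated. The sketch below copies the weekday definitions given above and expands them recursively; the expand function is hypothetical, it does not guard against circular definitions, and the way the system actually processes macros may differ.

```python
import re

# Weekday macro definitions as listed in this guide.
MACROS = {
    "@jour": "(lundi || mardi || mercredi || jeudi || vendredi"
             " || samedi || dimanche)",
    "@dies": "(lune || martis || mercuri || mercurii || iovis"
             " || veneris || sabbat* || [(die||diem||dies) dominic*])",
    "@jourdies": "(@jour || @dies)",
}

# A macro name is '@' followed by word characters, so '@jourdies'
# is matched as a whole and never mistaken for '@jour'.
MACRO_NAME = re.compile(r"@\w+")

def expand(query: str) -> str:
    """Replace every macro name by its (recursively expanded) definition."""
    def substitute(match: re.Match) -> str:
        name = match.group(0)
        if name not in MACROS:
            raise KeyError(f"undefined macro: {name}")
        return expand(MACROS[name])
    return MACRO_NAME.sub(substitute, query)

print(expand("@jour &5& paris"))
print(expand("[ premier jour @jourdies ]"))
```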
The approximate locations of spotted queries in each page
are marked with rectangles (called "bounding boxes") surrounding the
corresponding words. The color of a rectangle expresses the degree
of confidence the system has in the corresponding spot. Exact
confidence values can be seen by hovering the mouse over the
corresponding rectangle.
If a word appears more than once in a text line, only the instance
with greatest confidence is shown.
The precision-recall tradeoff desired for each query is
specified by means of the Confidence threshold.
The default value is 50%. A higher value results in more
precision (few or no wrong spots) and less recall
(some or many existing instances of the query may not be
retrieved); a lower value results in less precision (some
or many wrong spots) and more recall (all or most of the existing
instances can be retrieved).
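The effect of the threshold can be seen on a small made-up example. The spot list and the correctness labels below are invented for illustration; precision is the fraction of retrieved spots that are correct, and recall is the fraction of existing instances that are retrieved.

```python
from typing import List, Tuple

# Hypothetical spots: (confidence, whether the spot is really the query word).
spots: List[Tuple[float, bool]] = [
    (0.95, True), (0.80, True), (0.60, True),
    (0.45, False), (0.30, True), (0.20, False),
]
N_RELEVANT = 5  # assumed true instances; one of them is never spotted

def precision_recall(threshold: float) -> Tuple[float, float]:
    """Precision and recall of the spots strictly above the threshold."""
    retrieved = [correct for conf, correct in spots if conf > threshold]
    hits = sum(retrieved)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / N_RELEVANT
    return precision, recall

for threshold in (0.7, 0.5, 0.1):
    p, r = precision_recall(threshold)
    print(f"threshold {threshold:.0%}: precision {p:.2f}, recall {r:.2f}")
```

With these invented values, lowering the threshold from 50% to 10% raises recall while precision drops, which is the tradeoff described above.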
Queries are interpreted, and results are shown,
hierarchically. If the query is issued at
the top level (HOME), the system first shows the Volumes
where the query may appear with a confidence above the
threshold specified. Then if the same query is issued
for a specific Volume, the system shows all the Pages where
the query may appear with a confidence above the threshold.
Finally if it is issued for a specific Page, the system shows
locations where the words involved in the query may appear
with a confidence above the threshold.
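The drill-down from Volumes to Pages can be sketched with a toy index. The volume and page identifiers and the confidences are invented, and the sketch assumes a Volume is listed as soon as at least one of its Pages exceeds the threshold; how the system actually derives Volume-level confidence is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical page-level confidences for one query: (volume, page) -> confidence.
page_confidence: Dict[Tuple[str, str], float] = {
    ("JJ035", "p012"): 0.82,
    ("JJ035", "p013"): 0.35,
    ("JJ041", "p101"): 0.67,
    ("JJ050", "p007"): 0.20,
}

def volumes_above(threshold: float) -> Dict[str, int]:
    """Volumes with at least one page above the threshold, with page counts."""
    counts: Dict[str, int] = defaultdict(int)
    for (volume, _page), confidence in page_confidence.items():
        if confidence > threshold:
            counts[volume] += 1
    return dict(counts)

def pages_above(volume: str, threshold: float) -> List[str]:
    """Pages of one volume where the query exceeds the threshold."""
    return sorted(page for (vol, page), confidence in page_confidence.items()
                  if vol == volume and confidence > threshold)

print(volumes_above(0.5))          # query issued at HOME or Collection level
print(pages_above("JJ035", 0.3))   # same query issued inside one Volume
```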
Alexandro Henri Basilio Johannis Constantino Francisco
Andreu Andres Alfons Alfonso Isabel Borbon Ferran Ricardus Richard
Marco Marcus Roberto Robert Jorge George Loys Ludovico Karolo
France Angleterre Normandie Bretagne Gasconia Flandres Alemaigne
Espaigne Portugal Navarre Catalonia Castille Aragon Austria Londres
Rome Milan Bruges Carcassonne Toulouse Vienna Valencia Valence
abbatia acousutme admorti amende bourgeois chambellant chambre
decembre eglise enqueste monnoie monseigneur justice notaire civile
champaigne chancelier especial estable fevrier octembre prevost
relacion remettons septembre sergent subreptice vermendois sapiens
admortisatio anno appelatio aprilis augusti ballivi burgenses englie
februrarii financia firmum gracia gratia honor inquisitio presentibus
remissio septembris servientes speciali thesorarius universis vidimus
viromendensis emanatus scriptor ballivi burgenses cambellanus
Approximate and wildcard spelling
Compound queries: OR
Aquitania || Aquitaine
guerre || paix
hostis || hosti || hoste || hostes || hostum || hostibus
Theodore || Theodoro || Theodori || Theodorici
Isabel || Isabelle
Johannis || Johannes
Constantino || Constantini
Francisco || Franciscus
Andreu || Andres
Compound queries: AND
guerre && paix
Angleterre && France
Henri && Henry
Lambert Lamberto Lamberti Lambertum
Compound queries: proximity AND
Angleterre &5& France
Johannis &10& Ludovico &20& rex
Compound queries: NOT
Gasconia - Johannes
Vienna - France - Paris
Compound queries: word sequences
[ par le roy ]
[ que nobis ex ]
[ philippus dei gratia francorum rex ]
[ Karolus dei gratia francorum et Navarre rex ]
[ karolus dei gratia francorum rex ]
[ Philippes par la grace de Dieu roys de France ]
[ nostro seynor el Rey ]
[ Regno de Navarra ]
[ duc de borbon ]
[ celi terre ]
[ mont saint michiel ]
Compound queries: macros
Mixed compound queries
(guerre || paix) && Espaigne
paix && (Angleterre || France)
(aquitania || aquitaine) Philippus
Gasconia - (johannes && germani)
Ysabelle~ - Ysabel* - sabelle
Vienna - (Johannes || Karolus || Philippes)
fueros~ &20& privilegios~ &20& libertades~
[ mont saint michiel~ ]
[ corporalle~ criminelle~ et civile~ ]
[ assemblee~ (genz||gens||gent) darmes~ ]
[ premier jour @mois ]
[ @jour apres feste nostre dame ]
[ Saint Germain ] &20& (@jour || @mois)
@jour &5& @mois