Crowdsourcing & Citizen Science
1. FROM CROWD KNOWLEDGE TO MACHINE KNOWLEDGE
use cases with semantics & user interaction in cultural heritage collections
Lora Aroyo
VU University Amsterdam
@laroyo
2. OBSERVATION I: VIDEO ANNOTATIONS
Video: rich in meaning, expressed with objects, people, events & symbols
Annotation: tedious, time-consuming, incomplete
Professional annotations: coarse-grained & refer to entire videos and topics
3. OBSERVATION II: SEARCH BEHAVIOR
People predominantly request video fragments: broadcasts (33%), stories (17%), fragments (49%)
Finding fragments takes much longer than finding broadcasts: stories (2x), fragments (3x)
4. OBSERVATION III: VOCABULARY MISMATCH
experts use a specific vocabulary that is unknown to general audiences
35% of clicked results are not found by title or thesaurus term
5. OBSERVATION IV: LIMITED ANNOTATION RESOURCES
the professionals can no longer cope with the demand for manual annotation of the ever-growing multimedia content in their collections
6. SO WE WANT TO ...
Make a massive set of videos accessible to end users
Improve video search for end users
Maintain a growing community of engaged users
Support professional annotators
7. TO IMPROVE VIDEO SEARCH:
fragment retrieval
within-video navigation
8. REQUIRES CHANGES IN ANNOTATIONS:
Including time-based annotations
Bridging the vocabulary gap between searcher & cataloguer
19. TWO PILOTS
Waisda? 1.0 (8 months)
focus group with lay users on motivation aspects
cognitive walkthrough with professionals
usability testing with lay users and professionals
tag analysis - primarily WHAT & WHO
Waisda? 2.0 (starting this September)
currently running
initial user & tag analysis
21. Waisda? 1.0 (8 months)
Participation
44,000 pageviews
2,000 different players
500 registered players
thousands of anonymous players
Time-based annotations
612 videos tagged
420,000 tags added
46,000 unique tags
Vocabulary gap
User tags added the user perspective on the collection
Community consensus as a measure of ‘validity’
Players score points when a tag exactly matches a tag entered by another player within 10 seconds
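Below is a minimal sketch of this consensus rule, assuming a hypothetical list of tag entries (player, tag, time); the actual Waisda? implementation may differ in details such as normalization.

```python
from collections import defaultdict

# Minimal sketch of the Waisda?-style consensus rule (hypothetical data model):
# a tag entry is "verified" when another player entered the exact same tag for
# the same video within a 10-second window.

MATCH_WINDOW = 10.0  # seconds

def score_entries(entries):
    """entries: list of dicts with 'player', 'tag', 'time' (seconds into the video).
    Returns the set of entry indices that are verified by consensus."""
    by_tag = defaultdict(list)
    for i, e in enumerate(entries):
        by_tag[e["tag"].strip().lower()].append((i, e))
    verified = set()
    for same_tag in by_tag.values():
        for i, a in same_tag:
            for j, b in same_tag:
                if i != j and a["player"] != b["player"] \
                        and abs(a["time"] - b["time"]) <= MATCH_WINDOW:
                    verified.add(i)
    return verified

entries = [
    {"player": "p1", "tag": "dog", "time": 12.0},
    {"player": "p2", "tag": "Dog", "time": 15.5},   # matches p1 within 10 s
    {"player": "p1", "tag": "beach", "time": 40.0}, # no consensus
]
print(score_entries(entries))  # {0, 1}
```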
23. Waisda? 2.0 (4 weeks, not public)
Participation
~1,500 users: 81 registered, 1,435 anonymous
2,344 games
Time-based annotations
11,109 videos in the game (322 were played with)
32,200 tags --> 25,600 first-time tags
19% are one-time tags (81% appear more than once)
12,000 validated tags (36%), 4,000 of them first-time tags
1,900 match in vocabularies: 257 GTAA people (83 validated), 1,661 GTAA geo (666 validated)
9,796 validated, but no match in GTAA
30. WHAT ARE THE RELATIONS BETWEEN TAGS & PROFESSIONAL ANNOTATIONS?
in terms of the vocabulary used?
in terms of the topics they describe?
Initially: manual analysis
Output goal: to be expressed as a quality metric
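As an illustration of what such a quality metric could look like, here is a toy scoring function that combines signals discussed in this deck (consensus verification, vocabulary match, tag frequency). The weights and features are placeholders, not the project's actual metric.

```python
# Illustrative tag-quality score (not the project's actual metric):
# combine consensus verification, a match against a controlled vocabulary
# (e.g. GTAA), and how often the tag occurs.

def tag_quality(verified: bool, in_vocabulary: bool, frequency: int,
                w_verified=0.5, w_vocab=0.3, w_freq=0.2) -> float:
    """Return a score in [0, 1]; the weights are arbitrary placeholders."""
    freq_signal = min(frequency, 10) / 10.0   # saturate at 10 occurrences
    return (w_verified * verified) + (w_vocab * in_vocabulary) + (w_freq * freq_signal)

print(round(tag_quality(verified=True, in_vocabulary=False, frequency=3), 2))  # 0.56
```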
31. WAISDA? TAG ANALYSIS
Study 1
Quantitative analysis of all 46,762 Waisda? tags
Tag coverage w.r.t. different vocabularies
Study 2
Qualitative analysis of selected video fragments
Manual classification of 1,343 “verified” tags
32. DURING FIRST 9 MONTHS
> 600 videos, 13 TV series
> 420,000 tags collected, 46,792 unique tags
each of the top 5 most-tagged videos has > 23,000 tags
avg. tag density > 8 tags/sec
[Charts: number of tags per day and number of new arrivals per day over the first ~230 days]
33. (1) MAPPING TAGS TO VOCABULARIES
GTAA, the thesaurus of S&V: 160,000 terms in six disjoint facets: keyword, location, maker, genre, person, named entities
Tags (the collective vocabulary of internet users) are mapped to the GTAA and to the Dutch lexical database (Cornetto)
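A minimal sketch of such a mapping as a normalized string lookup against a vocabulary term list follows; the real GTAA/Cornetto mapping is more involved (lemmatization, synonyms, facet handling), so treat this as illustrative only.

```python
import unicodedata

# Hypothetical, simplified tag-to-vocabulary mapping: exact match on a
# normalized form. Real mappings (GTAA, Cornetto) would also handle
# lemmas, synonyms and facet-specific matching.

def normalize(term: str) -> str:
    term = unicodedata.normalize("NFKD", term)
    return "".join(c for c in term if not unicodedata.combining(c)).casefold().strip()

def build_index(vocabulary_terms):
    return {normalize(t): t for t in vocabulary_terms}

def map_tags(tags, index):
    return {tag: index.get(normalize(tag)) for tag in tags}

gtaa_like = ["Amsterdam", "Juliana", "voetbal"]        # toy stand-in for GTAA terms
print(map_tags(["amsterdam", "voetbal", "hondje"], build_index(gtaa_like)))
# {'amsterdam': 'Amsterdam', 'voetbal': 'voetbal', 'hondje': None}
```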
43. (2) TAG CLASSIFICATION
What video aspects are described by user tags?
Manual classification of a tag sample
5 videos
non-visual (0), perceptual (11), the rest conceptual
Object tags (most of the tags) vs. scene tags (only 30)
3 levels and 4 facets:
abstract, general & specific
who, what, when, where
Tag sample: all verified tags from random fragments, 1,343 tags (182 non-descriptive tags removed)
44. (2) RESULTS
Only 30 scene-level tags
Panofsky-Shatford matrix for object-level tags:

         Abstract  General  Specific  Total
Who           10      166       177    31%
What          73      563        12    57%
Where          0       68         8     7%
When           4       31         6     5%
Total         7%      74%        9%

195 tags (typically adverbs & adjectives) could not be classified into any of the facets
[1] Classification of user image descriptions. L. Hollink, G. Schreiber, B. Wielinga & M. Worring
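For illustration, this is how such a facet-by-level matrix can be tallied from per-tag classifications; the example labels below are invented, not the study's data.

```python
from collections import Counter

# Tally a Panofsky-Shatford style matrix from per-tag (facet, level) labels.
# The classifications below are made up for illustration.

classified = [("who", "general"), ("who", "specific"), ("what", "general"),
              ("what", "abstract"), ("where", "general"), ("what", "general")]

counts = Counter(classified)
facets = ["who", "what", "where", "when"]
levels = ["abstract", "general", "specific"]
total = len(classified)

for facet in facets:
    row = [counts[(facet, level)] for level in levels]
    share = 100.0 * sum(row) / total
    print(f"{facet:>6}: {row}  {share:.0f}%")
```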
48. EXPERIMENTAL SET (5 videos)

Episode          Tagging level  All tags  Verified  Genre
Reality show     well tagged      25,965     5,837  amusement
Reality show     well tagged      22,792     6,153  amusement
Missing people   medium            1,007       274  informative
Reporter         low tagged          403        73  informative
The Walk         low tagged          257        45  religious
49. USEFULNESS according to professionals
genre might influence the usefulness of the user tags
users focus on what can be seen or heard (directly perceivable objects), because of the fast pace of the game
50. SUBTITLES USEFUL?
on average, 26% of all tags (35% of verified tags) are matched in subtitles
used to run bot-based games
otherwise, they don't really bring much new to the annotations
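A toy sketch of checking which game tags also occur in the subtitle text is shown below; a real comparison would also align time codes and normalize Dutch morphology, which is omitted here.

```python
import re

# Toy check of how many tags also occur in the subtitle text.
# Real analysis would align times and normalize morphology (Dutch stemming).

def tokens(text):
    return set(re.findall(r"\w+", text.lower(), flags=re.UNICODE))

def subtitle_overlap(tags, subtitle_text):
    words = tokens(subtitle_text)
    matched = [t for t in tags if t.lower() in words]
    return matched, (len(matched) / len(tags) if tags else 0.0)

tags = ["politie", "amsterdam", "hond", "regen"]
subs = "De politie in Amsterdam zoekt getuigen."
print(subtitle_overlap(tags, subs))   # (['politie', 'amsterdam'], 0.5)
```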
51. OBSERVATIONS FROM ANALYSIS:
taggers & professionals use different vocabularies
user tags complement professional annotations
user tags could bridge the searcher vs. cataloguer gap
taggers tend to use general concepts more
taggers focus mainly on the What and Who
taggers describe predominantly objects & rarely scenes
52. OBSERVATIONS FROM ANALYSIS:
need for validation of quality & correctness
crowdsourcing users need continuous support & motivation to supply more and better contributions
53. THE QUESTION IS:
Can user tags be used for fragment search?
How to derive user tag quality metrics?
56. Tag gardening
annotations for videos, fragments & scenes
adding types & roles to user tags
mapping tags to concepts, disambiguation
spelling corrections
single & batch editing of tags/videos
70. INITIAL (SMALL) EXPERIMENT
• 1 video, 36 tags, 4 validators
• validators select the most appropriate concept for each tag (if present)
• frames within the video available, linked to the tags
• reconciliation against GTAA, Cornetto, Freebase
• (1) select a source, (2) start reconciliation, (3) choose from suggested concepts
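The experiment reconciles tags against GTAA, Cornetto and Freebase via publicly available reconciliation APIs. Here is a sketch of querying an OpenRefine/Freebase-style reconciliation endpoint; the endpoint URL is a placeholder and the exact response fields can differ per service.

```python
import json
import urllib.parse
import urllib.request

# Sketch of querying an OpenRefine/Freebase-style reconciliation service.
# The endpoint URL below is a placeholder; response fields may vary per service.

RECON_ENDPOINT = "https://example.org/reconcile"   # placeholder endpoint

def reconcile(tags, limit=3):
    queries = {f"q{i}": {"query": tag, "limit": limit} for i, tag in enumerate(tags)}
    data = urllib.parse.urlencode({"queries": json.dumps(queries)}).encode()
    with urllib.request.urlopen(urllib.request.Request(RECON_ENDPOINT, data=data)) as resp:
        results = json.load(resp)
    # Each answer typically carries candidate concepts with an id, name and score.
    return {tags[int(k[1:])]: v.get("result", []) for k, v in results.items()}

# Example (requires a live reconciliation endpoint):
# print(reconcile(["Juliana", "Amsterdam"]))
```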
72. Coverage: 33 out of 36 tags; Krippendorff reliability coefficient 0.91
Number of tags reconciled by the four participants
Ranks of the selected concepts: more than 50% of the concepts were ranked first
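For reference, a self-contained sketch of nominal Krippendorff's alpha over a tags-by-validators table is given below; in practice a vetted library (e.g. the Python 'krippendorff' package) is preferable, and the toy data here is invented.

```python
from collections import Counter, defaultdict

# Nominal Krippendorff's alpha from the standard coincidence-matrix formulation.
# ratings_by_unit: one inner list of labels per tag/unit (None = missing rating).

def krippendorff_alpha_nominal(ratings_by_unit):
    coincidence = defaultdict(float)
    for unit in ratings_by_unit:
        labels = [r for r in unit if r is not None]
        m = len(labels)
        if m < 2:
            continue
        for i, a in enumerate(labels):
            for j, b in enumerate(labels):
                if i != j:
                    coincidence[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _b), v in coincidence.items():
        totals[a] += v
    n = sum(totals.values())
    observed = sum(v for (a, b), v in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    return 1.0 - observed / expected

# Toy example: 4 validators picking a concept for each of 5 tags.
data = [["c1", "c1", "c1", "c1"],
        ["c2", "c2", "c2", "c1"],
        ["c3", "c3", "c3", "c3"],
        ["c1", "c1", "c2", "c1"],
        ["c2", "c2", "c2", "c2"]]
print(round(krippendorff_alpha_nominal(data), 3))  # ≈ 0.70
```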
73. Lessons
• Cornetto is good for subject terms
• Selecting the most suitable concept is time-consuming, as the differences are sometimes difficult to understand
• For all named entities, users could select a concept from Freebase & GTAA
• Vocabularies like GTAA carry less additional information, so it's more difficult to select the right concept
• Freebase works well for English (Dutch might be a problem)
• Full coverage is prevented by spelling errors and variations
• Users adapt quickly, e.g. Freebase is good for people & locations
74. Recommendations
• reconcile against multiple data sources at the same time for best coverage
• merge duplicates into a single suggestion
• organize results in categories, e.g. people, locations and subjects (to avoid suggestions of the wrong type)
• pre-process to merge similar concepts (to deal with spelling errors and variations)
• rank the suggested concepts by precision, then by recall (as an alternative)
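A small sketch of the first three recommendations (merging duplicate candidates from multiple sources and grouping them by type) follows; the candidate records and type names are illustrative, not the project's data model.

```python
from collections import defaultdict

# Merge duplicate reconciliation candidates from different sources and group
# them by type, keeping the best score per merged concept.

def merge_and_group(candidates):
    """candidates: list of dicts with 'id', 'label', 'type', 'score', 'source'."""
    merged = {}
    for c in candidates:
        key = (c["label"].casefold(), c["type"])
        entry = merged.setdefault(key, {**c, "sources": set()})
        entry["sources"].add(c["source"])
        if c["score"] > entry["score"]:
            entry.update(id=c["id"], label=c["label"], score=c["score"])
    grouped = defaultdict(list)
    for entry in merged.values():
        grouped[entry["type"]].append(entry)
    for group in grouped.values():
        group.sort(key=lambda e: e["score"], reverse=True)
    return dict(grouped)

candidates = [
    {"id": "gtaa:123", "label": "Juliana", "type": "person", "score": 0.9, "source": "GTAA"},
    {"id": "fb:/m/0abc", "label": "juliana", "type": "person", "score": 0.8, "source": "Freebase"},
    {"id": "gtaa:456", "label": "Amsterdam", "type": "location", "score": 0.95, "source": "GTAA"},
]
print(merge_and_group(candidates))
```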
From the different research topics I am involved in, today I address some results from projects in which we use explicit semantics and user interaction to make crowd knowledge effectively processable by machines.

*****

From Crowd Knowledge to Machine Knowledge: Use cases with semantics and user interaction in Dutch cultural heritage collections

In this talk I will discuss several projects, for example with the Dutch national archive for sound and vision or with the Rijksmuseum Amsterdam, where we have experimented with semantics-based technologies and user interaction paradigms to provide systems with additional support for users to turn their lay knowledge into machine-processable knowledge. Turning this crowd knowledge into machine knowledge makes such systems more intelligent and lets them capitalize on the knowledge assets in the crowds of users.

The problem our systems address is making the massive set of interesting multimedia content available in these cultural heritage institutions accessible to a large community of users. On the one hand, this content is difficult to find for the average user, as it has been indexed by experts (curators, art historians, etc.) who use a very specific vocabulary that is unknown to the general audience. On the other hand, these professionals can no longer cope with the demand for annotation of the ever-growing multimedia content in these collections.

Our solution to both these problems is to exploit crowdsourcing on the web, where there is a lot of specific domain knowledge and processing power available that such archives and museums would gladly incorporate. However, turning this knowledge from the mass of lay users into additional intelligence in content management systems is not a simple challenge, for several reasons: (1) lay users use different vocabularies and are also interested in different aspects of the collection items; (2) the collected user metadata needs to be validated for quality and correctness; (3) crowdsourcing users need to be continuously supported and motivated in order to supply more and better knowledge to such systems.
I will present our data analysis of the crowdsourced content, our techniques for quality control and incorporating domain semantics, and the results of our attempts to employ different interaction strategies and user interfaces (for example games and simplified interfaces for interactive annotation) to engage and stimulate the users.

Demos:
-----------
Video-tagging game: http://woordentikkertje.manbijthond.nl/
Art recommender and personalized museum tours: http://www.chip-project.org/demo/ (3rd prize winner of the ISWC Semantic Web Challenge)
Historical events in museum collections: http://agora.cs.vu.nl/demo/

Relevant Papers:
------------------------
- On the role of user-generated metadata in audio visual collections, at K-CAP 2011
http://portal.acm.org/citation.cfm?id=1999702
- Digital Hermeneutics: Agora and the Online Understanding of Cultural Heritage, at WebSci 2011 (Best paper nominee)
http://www.websci11.org/fileadmin/websci/Papers/116_paper.pdf
- The effects of transparency on trust in and acceptance of a content-based art recommender, UMUAI journal (Best journal paper for 2008)
http://www.springerlink.com/content/81q34u73mpp58u75/
- Recommendations based on semantically enriched museum collections, Journal of Web Semantics
http://www.sciencedirect.com/science/article/pii/S1570826808000681
- Enhancing Content-Based Recommendation with the Task Model of Classification, at EKAW 2010
http://www.springerlink.com/content/p78hl5r283x79r13/

Short bio:
-----------------
Lora Aroyo is an associate professor in the Web and Media group at the Department of Computer Science, VU University Amsterdam, The Netherlands. Her research interests are in using semantic web technologies for modeling user interests and context, recommendation systems and personalized access in Web-based applications. Typical example domains are cultural heritage collections, multimedia archives and interactive TV. She has coordinated the research work in the CHIP project on Cultural Heritage Information Personalization (http://chip-project.org). Currently she is a scientific coordinator of the EU Integrated Project NoTube, dealing with the integration of Web and TV data with the help of semantics (http://notube.tv), a project leader of the VU INTERTAIN Experimental Research Lab initiative (http://www.cs.vu.nl/intertain), and involved in research on motivational user interaction for video-tagging games in the PrestoPrime project (http://www.prestoprime.org/) and on modeling historic events in the Agora project (http://agora.cs.vu.nl/). She has organized numerous workshops in the areas of personalized access to cultural heritage, e-learning, interactive television, as well as on visual interfaces to the social and semantic web (PATCH, FutureTV, PersWeb, VISSW and DeRIVE). Lora has been actively involved in both the Semantic Web community (PC co-chair and conference chair for ESWC 2009 and ESWC 2010, and PC co-chair for ISWC 2011) and the Personalization and User Modeling community (editorial board of the UMUAI journal and steering committee of the UMAP conference).

More information can be found at:
-----------------------------------------------
Webpage: http://www.cs.vu.nl/~laroyo
Slideshare: http://www.slideshare.net/laroyo
Twitter: @laroyo
Nowadays AV collections are undergoing a process of transformation from archives of analog material to large digital (online) data stores, as videos are very much wanted by different types of end users.

For example, the Netherlands Institute for Sound and Vision archives all radio and TV material broadcast in the Netherlands (it has approx. 700,000 hours of radio and television programs available online).

Facilitating successful access to AV collection items demands quality metadata associated with them.

Traditionally, in AV archives it is the task of professional catalogers to manually describe the videos. Usually, in the process they follow well-defined, well-established guidelines and rules. They may also make use of auxiliary materials like controlled vocabularies, thesauri, and the like.

However, as we all know, video is a medium that is extremely rich in meaning. Directors and screenwriters create entire universes with a complex interplay between characters, objects and events. Sometimes they employ a rich and complex abstract symbolic language. This makes the task of describing the meaning of a video as complicated as describing the real world, which is no trivial matter.

As a result, the process of annotation is tedious, time-consuming and inevitably incomplete. According to some research, it takes approximately 5 times the duration of the material to annotate it completely. So, for example, for a documentary that lasts one hour, it will take approximately 5 hours for a cataloger to fully describe it.

Consequently, professional annotations are coarse-grained in the sense that they refer to the entire video and describe its prevalent topics. Catalogers may provide more fine-grained, shot-level descriptions for a video, but this is the exception to the rule and is reserved for the most important pieces of the AV collection.
* Search Behavior of Media Professionals at an Audiovisual Archive: A Transaction Log Analysis. B. Huurnink, L. Hollink, W. van den Heuvel, and M. de Rijke.
Recommendation of video fragments
Links between video fragments
Links to background information
integrate the Goal with the general --> merge the two slides --> one slide for RMA and Europeana
understanding the user-generated data
contextualize the user-generated metadata
supplement video data with background knowledge
introduce the context, Rijksmuseum
the general picture, examples, CH landscape
We looked at a wide range of examples within the sector and clustered them.

Criterion: a prominent/leading role for heritage institutions. Of course there are also many grassroots initiatives.

One of my favourite examples is the International Amateur Scanning League: digitizing historical videos in the public domain.

We then arrive at a classification into six categories.

We have already seen good examples today - I will mainly mention examples from categories that do not come up in the other presentations.
Wikipedia as a compelling example --> calculating that creating Wikipedia as it stands today has taken one hundred million hours of cumulative thought, he juxtaposes this with the astounding 200 billion hours people watch TV in the US alone. 200 billion hours would amount to two thousand Wikipedia projects' worth of free time, annually.
three types of users
collect: gamers
manage: domain experts
integrate: collection maintainer
explain games with a purpose

play on words
multi-player game
4 channels, each continuously streaming videos from a predefined category
a player starts a game by selecting one of the channels
and plays against all the players who selected the same channel
multi-player game
users score points by entering tags (based on the principle of Luis von Ahn's ESP game)
points are scored when two or more players enter the same tag
as in the Yahoo! video tag game --> this is done within a period of 10 sec
each tag is shown together with the score it has earned

The relevance of the tags is ensured by two factors: independence of players and consensus among players. The players don't know each other, they are grouped randomly, and the only thing they share at the moment when they are playing the game is the video. So if they enter the same tag within a time frame, chances are the tag is relevant for the video.
the changes:
- shorter videos (~3 mins), independent fragments vs. fragmented videos
- improved scoring (including matches in Cornetto and GTAA; first-time tags)
- game recap with continuous scoring
- profile page

bots again
- different handling of users to start playing the game
total # tags: 32,248 (25,685 first-time tags)
19% are one-time tags (81% appear more than once)

257 person names in GTAA, 83 also validated
1,661 geo names in GTAA, 666 also validated
9,796 validated, but no match in GTAA
First step: manual

Goal is to automate it and make it an integral part of Waisda?

Eventually the output can be expressed as a quality metric
How/why did you select those fragments?
How did you do the classification?
OK, some statistics. These stats are computed on the dataset that was accumulated by Waisda? in the first six months after it went live.
The goal of our first data analysis is to investigate user terminology and compare it to the terminology used by professionals. We de
Google search - a tag is meaningful if the number of pages returned by Google is positive
For about 89% of the tags - not found in vocabularies or not verified --> Google returned a positive # of hits
200-tag sample of the 'no-hits' (zero sample)
200-tag sample of the 'hits' (positive sample)

Things heard or things seen on the screen --> subtitles (for heard)

Comparable with Steve.museum
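A sketch of this "has hits" heuristic using the Google Custom Search JSON API is below; the API key and search-engine id are placeholders, and the original study may have used a different search interface.

```python
import json
import urllib.parse
import urllib.request

# Sketch of the 'meaningful if a web search returns hits' heuristic, using the
# Google Custom Search JSON API. Credentials below are placeholders.

API_KEY, CX = "YOUR_API_KEY", "YOUR_SEARCH_ENGINE_ID"

def hit_count(term: str) -> int:
    params = urllib.parse.urlencode({"key": API_KEY, "cx": CX, "q": term})
    url = f"https://www.googleapis.com/customsearch/v1?{params}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return int(data.get("searchInformation", {}).get("totalResults", 0))

def is_meaningful(term: str) -> bool:
    return hit_count(term) > 0

# Example (requires valid credentials):
# print(is_meaningful("hondje"))
```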
non-visual --> 0
perceptual --> 11 tags (referring to color)
conceptual --> the rest (describing objects, 30 about scenes only)
30% of scenes are conceptual (by Laura Hollink)
genre is influencing the specificity and usefulness
users focus on what can be seen or heard
users focus on directly perceivable objects - because of the pace
Can user tags be used to derive topical descriptions for scenes?
Bridge communities
  crowd
  professionals
Combine perspectives
  (subjective) user perspective
  (objective) professional perspective
Link tags to concepts
Annotations of video scenes
Quality assurance by moderation
frames per tag!
view all frames for a tag
sorted by frequency
Frequently occurring tags for scene descriptions
Selecting a role
Link tags to concepts
not always easy:
WordNet: polysemy, interpretations are close
some vocabularies have no description

how do tags relate to a shot?
- the frame is often a bit later
publicly available reconciliation APIs: e.g. Freebase, Europeana, Talis (kasabi.com)
Krippendorff's alpha

We conclude that for this small experiment, with a coverage of 33 out of 36 tags and a total reliability coefficient of 0.91, users can effectively reconcile tags using the selected services.

More than half of the selected concepts were ranked first (32), while 23 concepts were ranked second or lower. Several (6) selected concepts were even ranked sixth or lower.
Thus, the first concept could often be selected automatically, but not always.
Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation.
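For concreteness, a minimal sketch of an OAI-ORE aggregation expressed in RDF with rdflib (a third-party library) follows; the resource URIs are placeholders, not actual archive identifiers.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

# Minimal OAI-ORE aggregation: a resource map describing an aggregation of
# video fragments. URIs below are placeholders.

ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("ore", ORE)

rem = URIRef("http://example.org/rem/video-123")          # the resource map
agg = URIRef("http://example.org/aggregation/video-123")  # the aggregation
parts = [URIRef("http://example.org/video/123#fragment-1"),
         URIRef("http://example.org/video/123#fragment-2")]

g.add((rem, RDF.type, ORE.ResourceMap))
g.add((rem, ORE.describes, agg))
g.add((agg, RDF.type, ORE.Aggregation))
for p in parts:
    g.add((agg, ORE.aggregates, p))

print(g.serialize(format="turtle"))
```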
- group by type (faceted)
- concept is uniquely identified (see full name for Juliana and Bernhard)
- background information