Person Recognition in Personal Photo Collections

Seong Joon Oh, Rodrigo Benenson, Mario Fritz, and Bernt Schiele

Abstract — Recognising persons in everyday photos presents major challenges (occluded faces, different clothing, locations, etc.) for machine vision. We propose a convnet based person recognition system on which we provide an in-depth analysis of informativeness of different body cues, impact of training data, and the common failure modes of the system. In addition, we discuss the limitations of existing benchmarks and propose more challenging ones. Our method is simple and is built on open source and open data, yet it improves the state of the art results on a large dataset of social media photos (PIPA).

This paper has been accepted to ICCV 2015.

@INPROCEEDINGS{oh2015person,
  title={Person Recognition in Personal Photo Collections},
  author={Oh, Seong Joon and Benenson, Rodrigo and Fritz, Mario and Schiele, Bernt},
  booktitle = {ICCV},
  year={2015} }

Data & Downloads

Validation / Test set splits

We provide additional evaluation protocols (splits) on the People In Photo Albums (PIPA) dataset published in [2]. These broaden the scope of person recognition scenarios considered.

Splits can be downloaded here: pipa-splits.tar.gz.

Models

All models are based on Caffe and use the same network architecture (AlexNet). The network prototxt is given here: alexnet_extraction.prototxt

By default, models are pretrained on ImageNet and finetuned on PIPA for the person recognition task. However, some models are finetuned either on a different database (e.g. CASIA heads) or on different tasks (e.g. gender prediction).

Face (f)
Head (h)
Upper body (u)
Full body (b)
Scene (s)
Finetuning with CACD on head (h_cacd)
Finetuning with CASIA on head (h_casia)
Head finetuned with attribute prediction task
Upper body finetuned with attribute prediction task

Results

We also release the naeil (final system in the paper) scores in four different settings (original, album, time, day) and the evaluation code for regenerating the naeil results in the paper: naeil-evaluation.tar.gz

Attribute annotations

Long term attributes are the attributes that are fixed for a given identity. We have annotated five long term attributes (age, gender, glasses, hair colour, hair length) per identity based on the PIPA heads. The attributes are determined by manually observing multiple instances of each identity.

Long term attribute signals give a coarser supervision than the identity signal. Nonetheless, we find that the long term attribute and the identity supervisions are complementary [1].

PIPA attribute annotations can be downloaded here: attribute-annotations.tar.gz.

Attribute	Classes	Criteria
Age	Infant	Not yet walking (due to young age) in many pictures.
	Child	Body size is not fully grown.
	Young Adult	Body size is fully grown & Age < 45.
	Middle Age	45 <= Age < 60
	Senior	Age >= 60
	Unknown / changing	Little visual evidence to determine. Not included in the finetuning of h_age.

Gender	Female	Female-looking persons.
	Male	Male-looking persons.
	Unknown / changing	Little visual evidence to determine. Not included in the finetuning of h_gender.

Glasses	None	No eyewear.
	Glasses	Glasses without major eye occlusion.
	Sunglasses	Glasses with major eye occlusion.
	Unknown / changing	Little visual evidence to determine. Not included in the finetuning of h_glasses.

Hair colour	Black	Completely black hair.
	White	Any hint of whiteness.
	Others	Neither of the above.
	Unknown / changing	Little visual evidence to determine. Not included in the finetuning of h_haircolour.

Hair length	No hair	No hair on the scalp.
	Less hair	Hairless for > 1/2 of the upper scalp.
	Short hair	Hair length < 10 cm (when straightened).
	Med hair	Hair does not extend below chin (when straightened).
	Long hair	Hair extends below chin (when straightened).
	Unknown / changing	Little visual evidence to determine. Not included in the finetuning of h_hairlength.
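
Every attribute has an "Unknown / changing" class that is left out of finetuning. The following is a minimal sketch of that filtering step, not the code used for the paper. It assumes the annotations have already been loaded into a dict mapping an identity to a class name; the layout of the released annotation files is not described on this page, so that dict, the attribute keys and the function name are all hypothetical.

```python
UNKNOWN = "Unknown / changing"

# Classes per attribute, copied from the table above.
# The dictionary keys are illustrative names, not file or field names from the release.
ATTRIBUTE_CLASSES = {
    "age": ["Infant", "Child", "Young Adult", "Middle Age", "Senior", UNKNOWN],
    "gender": ["Female", "Male", UNKNOWN],
    "glasses": ["None", "Glasses", "Sunglasses", UNKNOWN],
    "haircolour": ["Black", "White", "Others", UNKNOWN],
    "hairlength": ["No hair", "Less hair", "Short hair", "Med hair", "Long hair", UNKNOWN],
}


def finetuning_targets(attribute, annotations):
    """Map identity -> class index, dropping "Unknown / changing" identities.

    `annotations` is a hypothetical dict {identity_id: class_name}.
    """
    classes = [c for c in ATTRIBUTE_CLASSES[attribute] if c != UNKNOWN]
    index = {name: i for i, name in enumerate(classes)}
    targets = {}
    for identity, label in annotations.items():
        if label == UNKNOWN:
            continue  # excluded from finetuning, as stated in the table
        if label not in index:
            raise ValueError(f"unexpected {attribute} class: {label!r}")
        targets[identity] = index[label]
    return targets
```

Class indices follow the order of the table rows; any other fixed ordering would work equally well as long as it is used consistently.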

For the upper body attributes, as described in the paper, we finetune on the PETA database of pedestrians [3] with five long term attributes:

Age1: personalLess30
Age2: personalLess45
Gender: personalMale
Short hair: hairShort
Black hair: hair(multiclass) Black
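
As an illustration, the five attributes above could be turned into a target vector per pedestrian image. This sketch assumes each attribute is treated as an independent binary label and that the labels of an image are available as a collection of strings; both are assumptions made here, not details confirmed by this page. The label strings are copied verbatim from the list above, including "hair(multiclass) Black", whose exact spelling in PETA is not given here. The function name is hypothetical.

```python
# (short name, label string) pairs copied from the list above.
UPPER_BODY_ATTRIBUTES = [
    ("Age1", "personalLess30"),
    ("Age2", "personalLess45"),
    ("Gender", "personalMale"),
    ("Short hair", "hairShort"),
    ("Black hair", "hair(multiclass) Black"),  # spelling as written above, not verified against PETA
]


def binary_targets(image_labels):
    """Return a 0/1 vector with one entry per attribute, in the order listed above."""
    present = set(image_labels)
    return [1 if label in present else 0 for _, label in UPPER_BODY_ATTRIBUTES]
```

Labels that are not among the five attributes are simply ignored.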

Metadata

We share the "photo-taken-date" metadata used for generating the "Time split": data_timestamp.mat. The times are in the format YYYY mm DD HH MM SS, in the instance order given by: index.txt. The data were collected using the Flickr API.
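
A minimal sketch of pairing the timestamps with instances, assuming each timestamp is available as six numeric fields (year, month, day, hour, minute, second) and each non-empty line of index.txt names one instance. The variable layout inside data_timestamp.mat is not documented on this page, so reading the file itself is left out, and both function names are hypothetical.

```python
from datetime import datetime


def to_datetime(fields):
    """Convert one YYYY mm DD HH MM SS row (six numbers) into a datetime."""
    year, month, day, hour, minute, second = (int(v) for v in fields)
    return datetime(year, month, day, hour, minute, second)


def pair_with_index(index_lines, timestamp_rows):
    """Pair each instance name with its timestamp, relying on the shared order."""
    names = [line.strip() for line in index_lines if line.strip()]
    rows = list(timestamp_rows)
    if len(names) != len(rows):
        raise ValueError(f"{len(names)} instances but {len(rows)} timestamps")
    return [(name, to_datetime(row)) for name, row in zip(names, rows)]
```

Sorting the returned pairs by their second element gives a chronological ordering of instances.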

For further information or data, please contact Seong Joon Oh <joon at mpi-inf.mpg.de>.

References

[1] Person Recognition in Personal Photo Collections, S. Oh, R. Benenson, M. Fritz and B. Schiele, IEEE International Conference on Computer Vision (ICCV), 2015 (to appear).

[2] Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues, N. Zhang, M. Paluri, Y. Taigman, R. Fergus and L. Bourdev, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[3] Pedestrian Attribute Recognition at Far Distance, Y. Deng, P. Luo, C. C. Loy and X. Tang, ACM Multimedia (ACM MM), 2014.