Person Recognition in Personal Photo Collections
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, and Bernt Schiele
Abstract — Recognising persons in everyday photos presents major challenges (occluded faces, different clothing, locations, etc.) for machine vision. We propose a convnet based person recognition system on which we provide an in-depth analysis of informativeness of different body cues, impact of training data, and the common failure modes of the system. In addition, we discuss the limitations of existing benchmarks and propose more challenging ones. Our method is simple and is built on open source and open data, yet it improves the state of the art results on a large dataset of social media photos (PIPA).
This paper has been accepted to ICCV 2015.
@INPROCEEDINGS{oh2015person, title={Person Recognition in Personal Photo Collections}, author={Oh, Seong Joon and Benenson, Rodrigo and Fritz, Mario and Schiele, Bernt}, booktitle = {ICCV}, year={2015} }
Data & Downloads
Validation / Test set splits
We provide additional evaluation protocols (splits) on the People In Photo Albums (PIPA) dataset published in [2]. These broaden the scope of person recognition scenarios considered.
Splits can be downloaded here: pipa-splits.tar.gz.
Models
All the models are based on Caffe and share the same network architecture (AlexNet). The network prototxt is given here: alexnet_extraction.prototxt
By default, models are pretrained on ImageNet and finetuned on PIPA for the person recognition task. However, some models are finetuned either on a different database (e.g. CASIA heads) or on a different task (e.g. gender prediction).
- Face (f)
- Head (h)
- Upper body (u)
- Full body (b)
- Scene (s)
- Finetuning with CACD on head (h_cacd)
- Finetuning with CASIA on head (h_casia)
- Head finetuned with attribute prediction task
- Upper body finetuned with attribute prediction task
Results
We also release the naeil (final system in the paper) scores in four different settings (original, album, time, day) and the evaluation code for regenerating the naeil results in the paper: naeil-evaluation.tar.gz
Attribute annotations
Long term attributes are the attributes that are fixed for a given identity. We have annotated five long term attributes (age, gender, glasses, hair colour, hair length) per identity based on the PIPA heads. The attributes are determined by manually observing multiple instances of each identity.
Long term attribute signals give a coarser supervision than the identity signal. Nonetheless, we find that the long term attribute and the identity supervisions are complementary [1].
PIPA attributes annotations can be downloaded here: attribute-annotations.tar.gz.
Attribute | Classes | Criteria |
---|---|---|
Age | Infant | Not walking due to young age, in many pictures. |
Age | Child | Body size is not fully grown. |
Age | Young Adult | Body size is fully grown & Age < 45. |
Age | Middle Age | 45 <= Age < 60 |
Age | Senior | Age >= 60 |
Age | Unknown / changing | Little visual evidence to determine. Not included in the finetuning of h_age. |
Gender | Female | Female looking persons. |
Gender | Male | Male looking persons. |
Gender | Unknown / changing | Little visual evidence to determine. Not included in the finetuning of h_gender. |
Glasses | None | No eyewear. |
Glasses | Glasses | Glasses without major eye occlusion. |
Glasses | Sunglasses | Glasses with major eye occlusion. |
Glasses | Unknown / changing | Little visual evidence to determine. Not included in the finetuning of h_glasses. |
Hair colour | Black | Completely black hair. |
Hair colour | White | Any hint of whiteness. |
Hair colour | Others | Neither of the above. |
Hair colour | Unknown / changing | Little visual evidence to determine. Not included in the finetuning of h_haircolour. |
Hair length | No hair | No hair on the scalp. |
Hair length | Less hair | Hairless for > 1/2 of the upper scalp. |
Hair length | Short hair | Hair length < 10 cm (when straightened). |
Hair length | Med hair | Hair does not extend below chin (when straightened). |
Hair length | Long hair | Hair extends below chin (when straightened). |
Hair length | Unknown / changing | Little visual evidence to determine. Not included in the finetuning of h_hairlength. |
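The numeric age thresholds in the table can be read as half-open intervals. The following is a minimal sketch of that reading for fully grown persons only; the function name `adult_age_class` is hypothetical and not part of the released annotations. The annotations themselves were made by visual inspection rather than from known ages, and the Infant and Child classes are defined by non-numeric criteria, so they are left out here.

```python
def adult_age_class(age_years):
    """Map an (assumed known) age of a fully grown person to an age class.

    Thresholds follow the table: Young Adult is Age < 45,
    Middle Age is 45 <= Age < 60, Senior is Age >= 60.
    """
    if age_years < 45:
        return "Young Adult"
    if age_years < 60:
        return "Middle Age"
    return "Senior"


# Boundary ages fall into the higher class.
for age in (44, 45, 59, 60):
    print(age, adult_age_class(age))
```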
For the upper body attributes, as described in the paper, we finetune on the PETA database of pedestrians [3] with five long term attributes:
- Age1: personalLess30
- Age2: personalLess45
- Gender: personalMale
- Short hair: hairShort
- Black hair: hair(multiclass) Black
Metadata
We share the "photo-taken-date" metadata used for generating the "Time split": data_timestamp.mat. The times are in the format YYYY mm DD HH MM SS, listed in the same instance order as index.txt. The data were collected using the Flickr API.
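As a rough illustration of how the timestamps could be paired with instances, the sketch below assumes each timestamp is available as six numeric fields (YYYY mm DD HH MM SS) and that the lines of index.txt come in the same order. The function names `parse_timestamp` and `pair_with_index` are hypothetical, and the example values are made up. Reading the .mat file itself (for instance with `scipy.io.loadmat`) is left out because the variable names inside the file are not documented here.

```python
from datetime import datetime


def parse_timestamp(fields):
    """Convert six values (YYYY mm DD HH MM SS) into a datetime.

    Values are cast to int, since MATLAB typically stores numbers as doubles.
    """
    year, month, day, hour, minute, second = (int(v) for v in fields)
    return datetime(year, month, day, hour, minute, second)


def pair_with_index(index_lines, timestamp_rows):
    """Pair instance identifiers with parsed timestamps, row by row."""
    if len(index_lines) != len(timestamp_rows):
        raise ValueError("index and timestamp lists differ in length")
    return [(line.strip(), parse_timestamp(row))
            for line, row in zip(index_lines, timestamp_rows)]


# Made-up example values, not taken from the released files.
example = pair_with_index(
    ["instance_a\n", "instance_b\n"],
    [[2009, 7, 4, 15, 30, 0], [2010.0, 1.0, 2.0, 3.0, 4.0, 5.0]],
)

# Sorting by the parsed datetime gives a chronological ordering of instances.
chronological = sorted(example, key=lambda pair: pair[1])
```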
For further information or data, please contact Seong Joon Oh <joon at mpi-inf.mpg.de>.
References
[1] Person Recognition in Personal Photo Collections, S. Oh, R. Benenson, M. Fritz and B. Schiele, IEEE International Conference on Computer Vision (ICCV), 2015 (to appear).
[2] Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues, N. Zhang, M. Paluri, Y. Taigman, R. Fergus and L. Bourdev, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[3] Pedestrian Attribute Recognition at Far Distance, Y. Deng, P. Luo, C. C. Loy, X. Tang, In Proceedings of ACM Multimedia (ACM MM), 2014.