File size: 4,374 Bytes
7885a28
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
.. _labeled_faces_in_the_wild_dataset:

The Labeled Faces in the Wild face recognition dataset
------------------------------------------------------

This dataset is a collection of JPEG pictures of famous people collected
over the internet, all details are available on the official website:

http://vis-www.cs.umass.edu/lfw/

Each picture is centered on a single face. The typical task is called
Face Verification: given a pair of two pictures, a binary classifier
must predict whether the two images are from the same person.

An alternative task, Face Recognition or Face Identification is:
given the picture of the face of an unknown person, identify the name
of the person by referring to a gallery of previously seen pictures of
identified persons.

Both Face Verification and Face Recognition are tasks that are typically
performed on the output of a model trained to perform Face Detection. The
most popular model for Face Detection is called Viola-Jones and is
implemented in the OpenCV library. The LFW faces were extracted by this
face detector from various online websites.

**Data Set Characteristics:**

=================   =======================
Classes                                5749
Samples total                         13233
Dimensionality                         5828
Features            real, between 0 and 255
=================   =======================

.. dropdown:: Usage

  ``scikit-learn`` provides two loaders that will automatically download,
  cache, parse the metadata files, decode the jpeg and convert the
  interesting slices into memmapped numpy arrays. This dataset size is more
  than 200 MB. The first load typically takes more than a couple of minutes
  to fully decode the relevant part of the JPEG files into numpy arrays. If
  the dataset has  been loaded once, the following times the loading times
  less than 200ms by using a memmapped version memoized on the disk in the
  ``~/scikit_learn_data/lfw_home/`` folder using ``joblib``.

  The first loader is used for the Face Identification task: a multi-class
  classification task (hence supervised learning)::

    >>> from sklearn.datasets import fetch_lfw_people
    >>> lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

    >>> for name in lfw_people.target_names:
    ...     print(name)
    ...
    Ariel Sharon
    Colin Powell
    Donald Rumsfeld
    George W Bush
    Gerhard Schroeder
    Hugo Chavez
    Tony Blair

  The default slice is a rectangular shape around the face, removing
  most of the background::

    >>> lfw_people.data.dtype
    dtype('float32')

    >>> lfw_people.data.shape
    (1288, 1850)

    >>> lfw_people.images.shape
    (1288, 50, 37)

  Each of the ``1140`` faces is assigned to a single person id in the ``target``
  array::

    >>> lfw_people.target.shape
    (1288,)

    >>> list(lfw_people.target[:10])
    [5, 6, 3, 1, 0, 1, 3, 4, 3, 0]

  The second loader is typically used for the face verification task: each sample
  is a pair of two picture belonging or not to the same person::

    >>> from sklearn.datasets import fetch_lfw_pairs
    >>> lfw_pairs_train = fetch_lfw_pairs(subset='train')

    >>> list(lfw_pairs_train.target_names)
    ['Different persons', 'Same person']

    >>> lfw_pairs_train.pairs.shape
    (2200, 2, 62, 47)

    >>> lfw_pairs_train.data.shape
    (2200, 5828)

    >>> lfw_pairs_train.target.shape
    (2200,)

  Both for the :func:`sklearn.datasets.fetch_lfw_people` and
  :func:`sklearn.datasets.fetch_lfw_pairs` function it is
  possible to get an additional dimension with the RGB color channels by
  passing ``color=True``, in that case the shape will be
  ``(2200, 2, 62, 47, 3)``.

  The :func:`sklearn.datasets.fetch_lfw_pairs` datasets is subdivided into
  3 subsets: the development ``train`` set, the development ``test`` set and
  an evaluation ``10_folds`` set meant to compute performance metrics using a
  10-folds cross validation scheme.

.. rubric:: References

* `Labeled Faces in the Wild: A Database for Studying Face Recognition
  in Unconstrained Environments.
  <http://vis-www.cs.umass.edu/lfw/lfw.pdf>`_
  Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller.
  University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.


.. rubric:: Examples

* :ref:`sphx_glr_auto_examples_applications_plot_face_recognition.py`