10 Feb 2014 #tutorial #opencv #python

Face Detection and Recognition in Python with OpenCV

Learn how to use OpenCV's Python bindings to build a program that detects faces and then recognizes them.

You can find the code from this tutorial on GitHub.

Note: This post won’t go into depth about what OpenCV is doing behind the scenes. The aim is to end up with working Python code. See the bottom of the post for links to more in-depth explanations (in C++).

You will need both Python (2.7.6) and OpenCV, as well as a webcam. If you already have these installed, skip to the next part.

Installing Python and OpenCV

I won’t write up how to install OpenCV here. Instead, I’ll point you to other guides dedicated to the task:

  • OSX: brew install homebrew/science/opencv3 --with-contrib --with-python3
    • Once brew finishes, it will provide three commands that need to be run to set up the Python bindings
  • Ubuntu: Ubuntu 16.04: How to install OpenCV
    • OpenCV is available via apt, but it currently installs OpenCV 2 instead of 3. Some changes to the code below may be necessary.
      • sudo apt-get install python-numpy python-opencv
  • Windows: Building OpenCV 3.0 on Windows

Getting started with OpenCV

The first part of the program is getting the input from your webcam.

# facerec.py
import cv2

# Use camera 0 (you might need to change this if you have multiple cameras)
webcam = cv2.VideoCapture(0)

while True:

    # Loop until the camera is working
    rval = False
    while(not rval):
        # Put the image from the webcam into 'frame'
        (rval, frame) = webcam.read()
        if(not rval):
            print("Failed to open webcam. Trying again...")

    # Flip the image (optional)
    frame = cv2.flip(frame, 1, 0)

    # Show the image in a window with title "OpenCV"
    cv2.imshow("OpenCV", frame)

    # Wait 10 ms for a key press
    key = cv2.waitKey(10)
    if key == 27: #The Esc Key
        break

Read through the comments before running the code to make sure you understand it.

Save this as facerec.py and run it from a command line using $ python path/to/file/facerec.py, where path/to/file/ is the path to the folder containing the file. If you aren’t sure what it is, you can also drag the file into the command line.
The Python program simply loops, capturing and showing an image. To exit, press either Esc or Ctrl-C. If you have multiple cameras, you might have to change the ‘0’ in cv2.VideoCapture(0) to ‘1’ or ‘2’ (and so on).
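If you aren’t sure which camera index works, you can probe a few of them. This is a sketch of mine, not part of the tutorial’s code; the `opener` parameter is a hypothetical hook that defaults to `cv2.VideoCapture` and exists mainly to make the function easy to test without hardware:

```python
def find_cameras(max_index=4, opener=None):
    """Return the indices in [0, max_index) whose camera opens successfully."""
    if opener is None:
        import cv2
        opener = cv2.VideoCapture
    found = []
    for i in range(max_index):
        cap = opener(i)
        if cap.isOpened():
            found.append(i)
        cap.release()
    return found
```

With a single built-in webcam this would typically return [0].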

Detecting faces

At this point, the program isn’t doing much. It’s just receiving a video stream from the webcam and displaying it. The next step is getting it to detect the faces we want to identify.

For this part, you are going to need a cascade file for detecting faces. Download haarcascade_frontalface_default.xml (Right-Click > Save Link As) and place it in the same directory as facerec.py. Make sure you don’t change the file’s name, or the program won’t be able to find it.
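A common stumbling block is the cascade file being missing or misnamed, in which case `CascadeClassifier` fails silently and no faces are ever detected. A small guard (my own hypothetical helper, not part of the tutorial code) can fail loudly instead:

```python
import os
import sys

def require_file(path):
    """Exit with a clear message if `path` does not exist; otherwise return it."""
    if not os.path.isfile(path):
        sys.exit("Cannot find '%s' - download it into the same folder as facerec.py" % path)
    return path

# Usage sketch:
# classifier = cv2.CascadeClassifier(require_file('haarcascade_frontalface_default.xml'))
```

After loading, you can also check `classifier.empty()`, which returns True when the cascade failed to load.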

# facerec.py
import cv2
size = 1
webcam = cv2.VideoCapture(0) #Use camera 0

# We load the xml file
classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

while True:

    # Loop until the camera is working
    rval = False
    while(not rval):
        # Put the image from the webcam into 'frame'
        (rval, frame) = webcam.read()
        if(not rval):
            print("Failed to open webcam. Trying again...")

    frame = cv2.flip(frame, 1, 0) # Flip to act as a mirror

    # Resize the image to speed up detection
    mini = cv2.resize(frame, (int(frame.shape[1] / size), int(frame.shape[0] / size)))

    # We let OpenCV do its thing
    faces = classifier.detectMultiScale(mini)

    # Draw rectangles around each face
    for f in faces:
        (x, y, w, h) = [v * size for v in f] # Scale the coordinates back up
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), thickness=4)

    # Show the image and watch for the Escape key
    cv2.imshow('OpenCV', frame)
    key = cv2.waitKey(10)
    if key == 27: #The Esc key
        break

Again, save it as facerec.py and run it. If all goes well, it should be displaying the feed from the webcam, while drawing a rectangle around each face it sees.

If the program feels slow when you run it, you can increase the size = 1 value (e.g. to size = 2). This down-scales the image before detection, which speeds it up. Increase the value by one at a time until the program is responsive enough.

Recognizing faces

Now, we want the program to be able to recognize the faces it sees. To do this, we first need some training data. I will later cover how to produce your own training set, but for now I’ll be using a face database compiled by AT&T. You can get it from here. Decompress the .zip file into a folder named att_faces (in the same directory as facerec.py). Your working directory should now look like this:
.
|—- facerec.py
|—- haarcascade_frontalface_default.xml
|—- att_faces
|    |—- s1
|    |    |—- 1.pgm
|    |    |—- 2.pgm
|    |    |—- 3.pgm
|    |    |—- 4.pgm
|    |    |—- 5.pgm
|    |    |—- 6.pgm
|    |    |—- 7.pgm
|    |    |—- 8.pgm
|    |    |—- 9.pgm
|    |    |—- 10.pgm
|    |—- s2
|    |    |—- 1.pgm
|    |    |—- …
|    |—- …
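To double-check that the dataset unpacked correctly, you can count the images per subject folder. This helper is my own sketch, not part of the tutorial’s code:

```python
import os

IMAGE_EXTS = ('.png', '.jpg', '.jpeg', '.gif', '.pgm')

def dataset_summary(root='att_faces'):
    """Map each subject folder under `root` to its number of training images."""
    counts = {}
    for subdir in sorted(os.listdir(root)):
        subjectpath = os.path.join(root, subdir)
        if not os.path.isdir(subjectpath):
            continue
        counts[subdir] = sum(1 for f in os.listdir(subjectpath)
                             if f.lower().endswith(IMAGE_EXTS))
    return counts
```

For the AT&T database you should see 40 subjects (s1 through s40) with 10 images each.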

The next bit of code trains a Fisherface model on those images, then uses it to recognize faces in the webcam stream:

# facerec.py
import cv2, sys, numpy, os
size = 1
fn_haar = 'haarcascade_frontalface_default.xml'
fn_dir = 'att_faces'

# Part 1: Create fisherRecognizer
print('Training...')

# Create a list of images and a list of corresponding names
(images, labels, names, id) = ([], [], {}, 0)

# Get the folders containing the training data
for (subdirs, dirs, files) in os.walk(fn_dir):

    # Loop through each folder named after the subject in the photos
    for subdir in dirs:
        names[id] = subdir
        subjectpath = os.path.join(fn_dir, subdir)

        # Loop through each photo in the folder
        for filename in os.listdir(subjectpath):

            # Skip non-image formats
            f_name, f_extension = os.path.splitext(filename)
            if(f_extension.lower() not in
                    ['.png','.jpg','.jpeg','.gif','.pgm']):
                print("Skipping "+filename+", wrong file type")
                continue
            path = subjectpath + '/' + filename
            label = id

            # Add to training data
            images.append(cv2.imread(path, 0))
            labels.append(int(label))
        id += 1
(im_width, im_height) = (112, 92)

# Create a Numpy array from the two lists above
(images, labels) = [numpy.array(lis) for lis in [images, labels]]

# OpenCV trains a model from the images
# NOTE FOR OpenCV2: remove '.face'
# NOTE FOR OpenCV 3.3+: use cv2.face.FisherFaceRecognizer_create()
model = cv2.face.createFisherFaceRecognizer()
model.train(images, labels)




# Part 2: Use fisherRecognizer on camera stream
haar_cascade = cv2.CascadeClassifier(fn_haar)
webcam = cv2.VideoCapture(0)
while True:

    # Loop until the camera is working
    rval = False
    while(not rval):
        # Put the image from the webcam into 'frame'
        (rval, frame) = webcam.read()
        if(not rval):
            print("Failed to open webcam. Trying again...")

    # Flip the image (optional)
    frame = cv2.flip(frame, 1, 0)

    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Resize to speed up detection (optional; change `size` above)
    mini = cv2.resize(gray, (int(gray.shape[1] / size), int(gray.shape[0] / size)))

    # Detect faces and loop through each one
    faces = haar_cascade.detectMultiScale(mini)
    for i in range(len(faces)):
        face_i = faces[i]

        # Coordinates of face after scaling back by `size`
        (x, y, w, h) = [v * size for v in face_i]
        face = gray[y:y + h, x:x + w]
        face_resize = cv2.resize(face, (im_width, im_height))

        # Try to recognize the face
        prediction = model.predict(face_resize)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 3)

        # [1]
        # Write the name of recognized face
        cv2.putText(frame,
           '%s - %.0f' % (names[prediction[0]],prediction[1]),
           (x-10, y-10), cv2.FONT_HERSHEY_PLAIN,1,(0, 255, 0))

    # Show the image and check for ESC being pressed
    cv2.imshow('OpenCV', frame)
    key = cv2.waitKey(10)
    if key == 27:
        break

If you are using OpenCV2 (e.g. in Ubuntu instructions above), make the modification as noted.
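The recognizer constructor has moved between OpenCV releases: OpenCV 2.4 exposes cv2.createFisherFaceRecognizer(), OpenCV 3.0–3.2 cv2.face.createFisherFaceRecognizer(), and OpenCV 3.3+ cv2.face.FisherFaceRecognizer_create(). A small factory can paper over the differences; this is my own sketch, and the `cv2_module` parameter exists only to make it testable:

```python
def create_fisher_recognizer(cv2_module=None):
    """Create a FisherFace recognizer across OpenCV 2.4 / 3.x API changes."""
    if cv2_module is None:
        import cv2 as cv2_module
    face = getattr(cv2_module, 'face', None)
    if face is not None:
        if hasattr(face, 'FisherFaceRecognizer_create'):  # OpenCV 3.3+
            return face.FisherFaceRecognizer_create()
        return face.createFisherFaceRecognizer()          # OpenCV 3.0-3.2
    return cv2_module.createFisherFaceRecognizer()        # OpenCV 2.4
```

You could then write model = create_fisher_recognizer() and the same script would run on either major version.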

Save the code as facerec.py and run it. Because we are using faces from a pre-made dataset, it won’t recognize you just yet. It will display its best guess, which at this point should be of the form s*. If you don’t want the program to show a name when the certainty is too low, you can add an if-statement checking prediction[1]:

if prediction[1]<500:
  cv2.putText(frame,
      '%s - %.0f' % (names[prediction[0]],prediction[1]),
      (x-10, y-10), cv2.FONT_HERSHEY_PLAIN,1,(0, 255, 0))
else:
  cv2.putText(frame,
      'Unknown',
      (x-10, y-10), cv2.FONT_HERSHEY_PLAIN,1,(0, 255, 0))

You can change 500 to whatever value works best for you. This code would replace the three lines below the comment ‘# [1]’.
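The same threshold logic can be factored into a small function. This is a sketch of mine rather than the tutorial’s code; note that prediction[1] is a Fisherface distance, so lower values mean higher confidence:

```python
def label_for(prediction, names, threshold=500):
    """Return the display name for a (label, distance) prediction,
    or 'Unknown' when the distance is at or above `threshold`."""
    label, distance = prediction
    return names[label] if distance < threshold else 'Unknown'
```

The loop body would then call cv2.putText with label_for(prediction, names) instead of building the string inline.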

Training

Hopefully, the program is recognizing faces and trying to label them, but it still doesn’t know your name. To fix that, we want to generate our own training data to replace the att_faces database. If you want, you can delete the folders labelled s?, but keep the folder att_faces.
The second Python file, train.py, will add to the training set using images from the webcam. Make sure there is only one face in the shot.

# train.py
import cv2, sys, numpy, os
size = 1
fn_haar = 'haarcascade_frontalface_default.xml'
fn_dir = 'att_faces'
try:
    fn_name = sys.argv[1]
except:
    print("You must provide a name")
    sys.exit(0)
path = os.path.join(fn_dir, fn_name)
if not os.path.isdir(path):
    os.mkdir(path)
(im_width, im_height) = (112, 92)
haar_cascade = cv2.CascadeClassifier(fn_haar)
webcam = cv2.VideoCapture(0)

# Generate name for image file
pin=sorted([int(n[:n.find('.')]) for n in os.listdir(path)
     if n[0]!='.' ]+[0])[-1] + 1

# Beginning message
print("\n\033[94mThe program will save 20 samples. \
Move your head around to add variety while it runs.\033[0m\n")

# The program loops until it has 20 images of the face.
count = 0
pause = 0
count_max = 20
while count < count_max:

    # Loop until the camera is working
    rval = False
    while(not rval):
        # Put the image from the webcam into 'frame'
        (rval, frame) = webcam.read()
        if(not rval):
            print("Failed to open webcam. Trying again...")

    # Get image size
    height, width, channels = frame.shape

    # Flip frame
    frame = cv2.flip(frame, 1, 0)

    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Scale down for speed
    mini = cv2.resize(gray, (int(gray.shape[1] / size), int(gray.shape[0] / size)))

    # Detect faces
    faces = haar_cascade.detectMultiScale(mini)

    # We only consider the largest face
    faces = sorted(faces, key=lambda x: x[3], reverse=True)
    if faces:
        face_i = faces[0]
        (x, y, w, h) = [v * size for v in face_i]

        face = gray[y:y + h, x:x + w]
        face_resize = cv2.resize(face, (im_width, im_height))

        # Draw rectangle and write name
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 3)
        cv2.putText(frame, fn_name, (x - 10, y - 10), cv2.FONT_HERSHEY_PLAIN,
            1,(0, 255, 0))

        # Remove false positives
        if(w * 6 < width or h * 6 < height):
            print("Face too small")
        else:

            # To create diversity, only save roughly every fifth detected frame
            if(pause == 0):

                print("Saving training sample "+str(count+1)+"/"+str(count_max))

                # Save image file
                cv2.imwrite('%s/%s.png' % (path, pin), face_resize)

                pin += 1
                count += 1

                pause = 1

    if(pause > 0):
        pause = (pause + 1) % 5
    cv2.imshow('OpenCV', frame)
    key = cv2.waitKey(10)
    if key == 27:
        break

Save this as train.py and run it using $ python path/to/file/train.py Name, where path/to/file/ is the path to the folder containing the file and Name is the name of the subject. train.py reuses the face detection code from above. It will store 20 images of the face it detects in att_faces under the name you gave it. Note: If you ever put this code online (e.g. on GitHub), make sure you don’t also upload the images stored in att_faces.
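The one-line pin = sorted(...) expression in train.py, which picks the next free image number, is fairly dense. The same logic written out as a standalone (hypothetical) helper looks like this:

```python
def next_image_number(filenames):
    """Return the next numeric image name, e.g. 3 for ['1.png', '2.png'].
    Hidden files (starting with '.') are ignored; an empty folder yields 1."""
    numbers = [int(n[:n.find('.')]) for n in filenames if n and n[0] != '.']
    return (max(numbers) if numbers else 0) + 1
```

Filtering out names that start with ‘.’ matters on macOS, where Finder drops .DS_Store files into the folder.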

You need a minimum of two different people in the training data. Starting the program can also get slow once you have collected a lot of training data, because the model is retrained on every launch; this won’t affect the performance of the recognition itself.

Further Reading

Find out more information from the official OpenCV docs.
