Boffins scrape together a dataset to aid in the fight against modern day slavery
By Katyanna Quach 5 Feb 2019 at 08:02
AI is the latest recruit in the ongoing efforts to stamp out the scourge of human trafficking – by helping police figure out in which hotels victims are being held.
Hundreds of thousands of people are shuttled across borders every year against their will and exploited, most of them young women coerced into prostitution. Traffickers often take photos of their victims in hotel rooms to use in online escort ads. Now, boffins are trying to use machine-learning software to help cops and non-profits identify where these victims are being held based on patterns discerned from the ad images.
A group of researchers from George Washington University, Temple University, and Adobe in the US have built a large dataset containing over a million images from 50,000 hotels across different countries. They hope their public Hotels-50K dataset will help developers train neural networks that can spot where a victim may be in seconds, judging from the background of their online ad.
A room’s decor may indicate its general vicinity, based on the hotel it is likely to be in. Curtains, wallpaper, bedspreads, and so on, can be analyzed to narrow down victims to particular chains and locations.
“First, we propose and formulate the problem of hotel instance recognition,” they wrote in a paper emitted at the end of last month via arXiv.
“Second, we curate and share a data set and evaluation protocol for this problem at a scale that is relevant to international efforts to address trafficking. Third, we describe and test algorithms that include the data augmentation steps necessary to attack this problem as a reasonable baseline for comparisons.”
The research was presented at the AAAI conference last week in Hawaii.
Trying to identify hotels from pictures alone is, not surprisingly, very difficult. Seeing as most hotel rooms look similar, numerous pictures from as many different hotels as possible are needed to teach a neural network the differentiating identifiers for various hotel chains.
“Our approach to determining the hotel in images of hotel rooms is to train a deep convolutional neural network to produce a short code for each image it sees, where images from the same hotel have very similar codes, and images from different hotels have very different codes,” Abby Stylianou, first author of the paper and a postdoc at George Washington University, explained to The Register. “We then infer the hotel identity from the images with the most similar codes.”
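Stylianou's description boils down to embedding each photo as a short code and then running a nearest-neighbour lookup against a gallery of known hotel images. Here is a minimal sketch of that lookup, with random placeholder vectors standing in for the network's codes; the code length, gallery size and the `identify_hotel` helper are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Hypothetical gallery: one short code (embedding) per known hotel-room image,
# plus the hotel each image came from. In the real system these codes come from
# a trained convolutional network; random vectors stand in for them here.
NUM_GALLERY_IMAGES, CODE_DIM = 10_000, 256
gallery_codes = np.random.randn(NUM_GALLERY_IMAGES, CODE_DIM).astype(np.float32)
gallery_hotel_ids = np.random.randint(0, 50_000, size=NUM_GALLERY_IMAGES)

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def identify_hotel(query_code, k=100):
    """Return hotel IDs of the k gallery images whose codes are most similar
    to the query code (cosine similarity on L2-normalised vectors)."""
    sims = l2_normalize(gallery_codes) @ l2_normalize(query_code)
    top_k = np.argsort(-sims)[:k]          # indices of the k closest gallery images
    return gallery_hotel_ids[top_k]

# A query photo is run through the same network to get its code, then matched.
query_code = np.random.randn(CODE_DIM).astype(np.float32)
candidate_hotels = identify_hotel(query_code, k=100)
```

In the real system the gallery would hold codes for all million-plus dataset images, and the query code would come from the same trained network rather than a random vector.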
All the images have been annotated with the name of the hotel, its location, and whether it is part of a hotel chain. Some pictures are taken from travel websites like Expedia and show clean, well-lit bedrooms, while others are scraped from TraffickCam, which contains amateur snaps sent in by those keen to help the victims of trafficking and abuse.
Out of the 50,000 different hotel classes in the dataset, 13,900 have corresponding TraffickCam images. These amateur images are valuable during training because they are more similar to the grainy pictures taken by traffickers for online ads, as opposed to the glossy professional shots hotel chains use to advertise their rooms.
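Per image, those annotations amount to a small record along the following lines; the field names and types here are illustrative guesses based on the description above, not the dataset's actual schema:

```python
from dataclasses import dataclass

# Illustrative record for one image; field names and types are assumptions,
# not the published dataset's real file format.
@dataclass
class HotelImage:
    image_path: str      # path or URL of the photo
    hotel_id: int        # one of the ~50,000 hotel classes
    chain_id: int        # the hotel's chain, or a sentinel value if independent
    latitude: float      # hotel location
    longitude: float
    source: str          # "travel_website" (e.g. Expedia) or "traffickcam"

example = HotelImage("imgs/12345/0001.jpg", 12345, 7, 38.8977, -77.0365, "traffickcam")
```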
A neural network trained on the Hotels-50K dataset thus has to learn to map the dodgy TraffickCam images to the pristine travel website images to work out the correct hotel. Some 17,954 TraffickCam images are used to test the AI schooled using the dataset.
Training vs testing
The testing dataset is similar to the training dataset, though it contains silhouettes shaped like people to mimic real photos used in human trafficking cases, where victims are redacted using a black overlay. In other words, the test set mimics photographs shown to the public by the cops to help trace victims based on their whereabouts, where the people in the snaps are blanked out for privacy reasons.
Ideally, a system trained on the Hotels-50K dataset will be able to identify the hotel based on the room’s decor, whether or not a person or people are also in the picture.
“We aren’t able to train with [authentic human trafficking] images, since we don’t have access to many of them, and don’t typically know where they were taken,” said Stylianou.
“But we need to be able to identify the hotel in these images at search time. In order to do this well, we assume that an investigator will always erase the victim from the photo, leaving a ‘blanked out’ region in the image.
“We then artificially erase similar looking regions from a number of the images that our network is trained on, using people-shaped masks from the Microsoft Common Objects in Context dataset. This encourages the network to learn to produce similar codes for images from the same hotel even if there is a large blanked out shape in the image.”
The researchers used two pre-trained neural networks (ResNet-50 and VGG) to test their dataset. Both could correctly identify common hotel chains from images with nearly 80 per cent accuracy.
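That erase-and-train step can be sketched in a few lines, assuming a person silhouette has already been pulled from a COCO segmentation mask and resized to match the training photo; the function below is an illustration under those assumptions, not the team's implementation:

```python
import numpy as np

def erase_with_person_mask(image, person_mask):
    """Black out a person-shaped region of a training photo.

    image:       H x W x 3 uint8 array (a hotel-room picture).
    person_mask: H x W boolean array, True where the silhouette (e.g. taken
                 from a COCO person segmentation mask and resized to H x W)
                 should cover the photo.
    """
    augmented = image.copy()
    augmented[person_mask] = 0      # paint the silhouette region black
    return augmented

# During training, a randomly chosen silhouette would be applied to a fraction
# of the images, so the network learns to ignore large blanked-out regions.
```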
“Nearly all of the top images retrieved by our model are from the correct hotel chain, but not necessarily the correct hotel,” according to the paper.
Given a test image, the system retrieves the thousand most similar gallery images and uses them to guess the hotel's chain. Identifying the individual hotel is much harder, though: Stylianou admitted their approach placed the correct hotel within the first 100 retrieved images only 24 per cent of the time. That may sound low; however, she told us:
Recognizing the correct hotel that often is actually a pretty remarkable achievement – these are extraordinarily difficult query images with huge occlusions, and the random chance of selecting the correct hotel in the top 100 examples out of 50,000 possible hotels would be 0.002 per cent. There’s definitely room for improvement, but 24 per cent is a big deal.
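The figures she quotes correspond to a standard top-k retrieval check: does the correct hotel (or chain) appear among the labels of the k most similar gallery images? A rough sketch of that metric, with the array layout assumed for illustration:

```python
import numpy as np

def top_k_accuracy(retrieved_ids, true_ids, k=100):
    """Fraction of queries whose correct label appears among the labels of
    the k most similar retrieved images.

    retrieved_ids: (num_queries, >=k) array of hotel (or chain) IDs for each
                   query, ordered from most to least similar.
    true_ids:      (num_queries,) array of ground-truth IDs.
    """
    hits = (retrieved_ids[:, :k] == true_ids[:, None]).any(axis=1)
    return float(hits.mean())

# Hotel-level accuracy at k=100 is the 24 per cent figure quoted above;
# chain-level accuracy is the roughly 80 per cent figure. (Illustrative
# usage only; this is not the paper's evaluation script.)
```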
Since the researchers haven’t revealed any results about how well a system trained on their dataset works for real human trafficking photos, it’s likely the results aren’t good yet.
Nevertheless, the researchers said their search system is being used by organizations such as America’s National Center for Missing and Exploited Children (NCMEC).
“Our goal is for this dataset to be used by computer vision and machine learning researchers who are interested in a challenging visual recognition problem with a very large number of classes, and who might be motivated by the application of their work to help do good in the world,” said Stylianou.