by Diego Graglia
From Univision, published April 5, 2016 (translated with assistance from translate.google.com). To read the article in Spanish, visit http://www.univision.com/noticias/tecnologia/con-inteligencia-artificial....
Facebook is blocked in the office. But Angel Adorno, 51, opens it on his phone when he has nothing to do.
“I check to see what is happening,” he says, “if someone wants to be my friend, or if they send me messages.”
When he gets home to Queens, he sometimes logs on and listens as the screen reader recites messages from his friends, interspersed with the barking of his dogs. “I have a lot of friends ...” he says, “bah, I think we are friends.”
He writes occasionally. He prefers Facebook to Twitter, because “140 letters are not many words.” But he does not post anything very private to social networks; family matters he prefers to discuss by phone.
It may seem obvious, but when Angel uses social networks, he never stops at the photos; they almost never interest him. It is not just that he cannot see them, but also that the people around him usually do not take the time to describe what is in them.
“I do not look at them,” he says, “because, why?”
That gap between Angel and his contacts may shrink a little with new features Facebook and Twitter have just rolled out to let people who are blind or have low vision know what is in the pictures.
On Tuesday, Facebook launched an image-recognition service based on artificial intelligence that can automatically describe a photo to a blind user. For now the descriptions are fairly general, but it is a first step on a path that could be revolutionary.
“Today, visual content is beyond the reach of people who cannot see,” says Matt King, a Facebook engineer who is blind and who worked on developing the feature. “This type of technology is new and extremely exciting.”
Twitter, meanwhile, has just made it possible for users to add a detailed description to each image they publish, so that screen readers like the ones Angel Adorno uses can convey that information to him.
And where’s the button?
“While browsing the Internet is a simple, everyday task for billions of people, it is a constant challenge if you do not see. People who are blind go on the web looking for information or trying to buy something and run into parts of websites that are not accessible,” says Eric Bridges, executive director of the American Council of the Blind. “For example, a site with no accessible ‘Buy’ button: you go through the whole experience of making a purchase and then cannot find the ‘Buy’ button. Those things still happen today. It is a very uneven and sometimes frustrating experience.”
More than one million people are blind in the United States, and 12 million suffer some degree of vision loss. Hispanics and African-Americans are at higher risk of visual impairment because they are more likely to have diseases like diabetes and glaucoma.
Worldwide, there are 39 million blind people and 246 million people with low vision.
Guessing the menu
That uneven experience, Bridges says, happens when the people designing a website do not keep the technical needs of users with vision problems in mind. Digital content should carry additional information that lets screen-reading programs tell a blind user what is on the screen: Is it a button? A drop-down menu? An image? Does the image have a description, and what does it say?
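As a concrete, simplified illustration of what that “additional information” looks like, here is a minimal TypeScript sketch using standard browser DOM APIs; the file path, description strings, and the button itself are invented for the example:

```typescript
// Minimal sketch of the hints screen readers rely on.
// The path and strings below are invented for illustration.

// An image without alt text is announced only as "image";
// setting `alt` gives the screen reader something to say.
const photo = document.createElement("img");
photo.src = "/photos/dinner.jpg"; // hypothetical path
photo.alt = "A plate of sushi on a wooden table";

// A button drawn as a bare icon has no accessible name unless
// one is supplied, for example with an ARIA label.
const buyButton = document.createElement("button");
buyButton.setAttribute("aria-label", "Buy");

document.body.append(photo, buyButton);
```

Without that `alt` attribute, a screen reader has nothing to announce beyond the word “image,” which is exactly the experience Bridges describes below.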
The same applies to social networks.
“People post photos and do not describe them or put in additional text,” says Bridges, who is blind and uses Facebook often. “All you hear is ‘image,’ ‘image,’ ‘image,’ and you keep asking yourself ... you don’t know what they are. I have friends who are foodies and post photos ... I have to guess what they are going to eat, or see if someone says in the comments. Or you’re left being the annoying guy asking in the comments: ‘What’s in the picture?’”
Better to depend on the machines
The technology Facebook launched Tuesday, which for now works only in English and only for users of Apple’s mobile devices, is a step toward solving this problem without relying on users themselves to say what they are eating.
“If we had to rely on humans to do this, it would be very difficult,” says Jeff Wieland, Facebook’s head of accessibility, who proposed creating the team five years ago.
“It’s pretty exciting to be involved in this field,” he says now, “because there are technologies that can really change things.”
The new feature is called “automatic alternative text,” and it works on iPhones and iPads that have the VoiceOver feature enabled. Users listening in English will hear phrases like: “Image may contain: one or more people, smiling, sunglasses, outdoors, sky, water.”
How can Facebook know what’s in a photo?
The system uses a neural network, a computer system built with a structure loosely modeled on the human brain. The company’s artificial intelligence engineers, whose apps receive 2 billion images every day, feed it millions of example photos so it learns to recognize objects. Still in its first stage, the service is fairly basic: it uses only 80 to 100 concepts to describe photos.
Some of them: car, airplane, bicycle, train, outdoors, mountain, tree, snow, sky, ocean, beach, tennis, swimming, stadium, basketball, ice cream, sushi, pizza, dessert, coffee, glasses, baby, beard, shoes. And, of course, “selfie” could not be left out.
Is it a baby or a chimpanzee?
The technology is not yet ready to differentiate, for example, between two varieties of pizza, or to say with certainty that a person whose face is not visible is a human being. And Facebook’s developers prefer to be conservative to avoid undermining users’ trust: the system will only describe objects and concepts for which it is at least 80% certain.
“To say that a baby is a chimpanzee or a chimpanzee is a baby is a mistake we do not want to commit,” says King.
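That caution amounts to a simple thresholding step on top of the classifier’s output. Here is a hedged TypeScript sketch: the 80% cutoff and the “Image may contain” phrasing come from the article, while the data shapes, function name, and example scores are invented for illustration:

```typescript
// Hedged sketch of the thresholding the article describes: keep only
// concepts the model scores with at least 80% certainty, then compose
// the sentence a screen reader like VoiceOver would read aloud.

interface ConceptScore {
  concept: string;
  confidence: number; // model's certainty, from 0 to 1
}

const MIN_CONFIDENCE = 0.8; // the 80% cutoff the article mentions

function describePhoto(scores: ConceptScore[]): string {
  const kept = scores
    .filter((s) => s.confidence >= MIN_CONFIDENCE)
    .sort((a, b) => b.confidence - a.confidence)
    .map((s) => s.concept);
  return kept.length > 0
    ? `Image may contain: ${kept.join(", ")}`
    : "Image"; // nothing certain enough: say no more than "image"
}

// Example with invented scores:
console.log(
  describePhoto([
    { concept: "one or more people", confidence: 0.97 },
    { concept: "smiling", confidence: 0.91 },
    { concept: "pizza", confidence: 0.55 }, // below the cutoff, dropped
  ]),
); // -> "Image may contain: one or more people, smiling"
```

Dropping the uncertain “pizza” guess rather than risking a wrong announcement is precisely the conservative trade-off King describes.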
The descriptions seem limited at this early stage. But King, who lost his sight as a college student, recalls that in the ’80s and ’90s he had to hack computer programs himself to be able to use them without sight.
“It always starts like this, slowly,” he says. “That is how it was with text [recognition]. But it will grow and expand.”
Also on Twitter
Twitter has also taken what it calls “a first step” toward improving the accessibility of the images its users post. Last week the social network launched a feature that lets users manually add “alternative text” to their photos.
“We believe it is an important first step,” says Todd Kloots, the company’s head of accessibility. “We will continue to make improvements in the future.”
Twitter’s solution, of course, runs into the great problem of web accessibility: it depends on each person who posts an image being aware of how important this additional text is for blind users. That is a challenge, because not even all software engineers have been trained to understand the importance of including those texts. “True,” Kloots says, “it is not an area all universities concentrate on. But it is something they learn over the course of their careers.”
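For developers who post images through Twitter’s API, the same capability was exposed through a media-metadata endpoint. The TypeScript sketch below is a hedged illustration, not a definitive client: the media ID and description are invented, and the OAuth signing a real request requires is reduced to a placeholder header:

```typescript
// Hedged sketch: attach alternative text to an already-uploaded image
// via Twitter's media/metadata/create endpoint. The media_id and the
// description are invented; a real request must be OAuth-signed, which
// is reduced here to a placeholder Authorization header.
async function setAltText(mediaId: string, description: string): Promise<void> {
  const res = await fetch(
    "https://upload.twitter.com/1.1/media/metadata/create.json",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "OAuth <signed credentials here>", // placeholder
      },
      body: JSON.stringify({
        media_id: mediaId,
        alt_text: { text: description },
      }),
    },
  );
  if (!res.ok) {
    throw new Error(`metadata/create failed: HTTP ${res.status}`);
  }
}

// Usage, inside an async context, with an invented media ID:
// await setAltText("710511363345354753", "A plate of sushi on a wooden table");
```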
Hearing the description of a photo is a breakthrough for a user with a visual disability, says Bridges of the American Council of the Blind. “We now have the capability to understand what is in the pictures. It is an excellent step forward,” he says.
In the future, he imagines, the descriptions will become increasingly detailed. “They could say, ‘A light brown-skinned man holding a bottle of Sam Adams,’ or, even more detailed, ‘... holding a cold bottle of Sam Adams with chunks of ice falling off it,’” he says. “Having that information is useful. The sighted take it for granted because they can see the images, but we are left asking ourselves, inventing the stories in our heads.”
A word is worth a thousand pictures
In Queens, Angel Adorno says that “something good can come out” of Facebook’s new artificial-intelligence feature when he hears it described.
What would really improve his experience on social networks, though, is if more people wrote well. “For a blind person, the word is better,” he says. “If someone writes something well, there’s the picture. If you can write well, talk about things, the picture forms right there.”