Data Annotation: By Typing Captcha, You are Actually Helping AI Model Training

ByteBridge
Nerd For Tech
Published in
3 min readMar 1, 2021

--

Living in the Internet age, how occasionally have you come across the tricky CAPTCHA tests while entering a password or filling a form to prove that you’re fully human? For example, typing the letters and numbers of a warped image, rotating objects to certain angles, or moving puzzle pieces into position.

What is CAPTCHA and How Does It Work?

CAPTCHA is also known as the Completely Automated Public Turing Test to filter out the overwhelming armies of spambots. Researchers at Carnegie Mellon University developed CAPTCHA in the early 2000s. Initially, the program displayed some garbled, warped, or distorted text that a computer could not read, only humans can. Users were requested to type the text in a box before having access to the websites.

The program has achieved wild success. CAPTCHA has grown into a common part of the internet user experience. Websites need CAPTCHAs to prevent the “bots” of spammers and other computer underworld types. “Anybody can write a program to sign up for millions of accounts, and the idea was to prevent that,” said Luis von Ahn, a pioneer of the early CAPTCHA team and founder of Google’s reCAPTCHA, one of the biggest CAPTCHA services. The little puzzles run on because computers are not as good as humans at reading distorted text. Google says that people are solving 200 million CAPTCHAs a day.

Over the past years, Google’s reCAPTCHA button saying “I’m not a robot” was up in more complicated scenarios, such as selecting all the traffic lights, crosswalks, and buses in an image grid.

CAPTCHA’s Potential Influence on AI

While used mostly for security reasons, CAPTCHAs also serve as a benchmark task for artificial intelligence technologies. According to CAPTCHA: using hard AI problems for security by Ahn, Blum, and Langford, “any program that has high success over a captcha can be used to solve a hard, unsolved Artificial Intelligence (AI) problem. CAPTCHAs can be used in many places.”

ReCAPTCHA is a CAPTCHA system developed by Google, which is a system that allows web hosts to distinguish human beings from machines. The original version asked users to decipher hard to read text or match images.

Since 2011, reCAPTCHA has digitized the entire Google Books archive and 13 million articles from New York Times catalog, dating back to 1851. Thereafter, reCAPTCHA started to select snippets from Google Street View in 2012. The company made users recognize door numbers, signs, and symbols.

The warped characters that users identify and fill in for reCaptcha are for a bigger purpose, although users have unknowingly transcribed texts for Google. reCAPTCHA distributes the same content to dozen users across the world and automatically verifies if it has been transcribed correctly by comparing the results.

Clicks on the blurry images can also help identify objects that computing systems fail to manage, and users are actually sorting and clarifying images to train Google’s AI engine. In 2014, the system started training the Artificial Intelligence (AI) engines.

Through such mechanisms, Google has been able to get users involved in recognizing the image, in order to give better Google search and Google Maps results.

End

Outsource your data labeling tasks to ByteBridge, you can get the high-quality ML training datasets cheaper and faster!

  • Free Trial Without Credit Card: you can get your sample result in a fast turnaround, check the output, and give feedback directly to our project manager.
  • 100% Human Validated
  • Transparent & Standard Pricing: clear pricing is available(labor cost included)

Why not have a try?

Relevant Articles:

1 What are the Most Impressive AI Products?

2 Importance of High-Quality Training Data in Different AI Algorithm Stage

3 Data Annotation Companies for Machine Learning in 2021

4 Data Annotation Service — From the Backstage to the Front Stage

5 Customer Needs and Wants in Data Annotation Services

--

--

ByteBridge
Nerd For Tech

Data labeling outsourced service: get your ML training datasets cheaper and faster!— https://bytebridge.io/#/