Amazon Mechanical Turk Tutorial

Setting up an MTurk evaluation task with a qualification test from scratch 🚀

Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace that can be used to collect human annotations by hiring workers to perform Human Intelligence Tasks (HITs). In this tutorial, we are going to cover the basics of setting up an MTurk evaluation task with quality control from scratch. To this end, we will use an example HIT throughout the tutorial: Workers are asked to rate how close in meaning the two sides of a French-English bitext are, though all steps described below can easily be adapted to any other task.


Overview

This tutorial is split into the following five parts, assuming the reader has no prior knowledge of MTurk. You can treat the list below as a table of contents:

  • Setting up accounts

  • Concepts and Terminology

  • Qualification Type

  • Creating an MTurk project

  • Publishing batches


Setting up accounts

To get started with MTurk we need four accounts: an AWS account and an account on the MTurk Requester site (these two are needed to use MTurk when we are ready to go live and publish our tasks), plus one account on the Requester Sandbox and one on the Worker Sandbox (these two are needed for testing our task in an isolated environment that looks like the real MTurk website before we go live and publish our tasks on the real website).

1️⃣ AWS account

AWS (Amazon Web Services) is a cloud platform offering reliable, scalable, and inexpensive cloud computing services, and MTurk is one of those services. For now, what you need to know is that your billing information is stored with your AWS account rather than your MTurk Requester account; in other words, your MTurk account has no direct link to your credit card.

To sign up for an AWS account you will need: a) an email account, b) a valid credit card (you will not be charged as there is a free tier), and c) a phone number (you will receive an automated phone call to verify your identity).

Once you have created an AWS account, you can create an IAM user to securely control access to your AWS resources. An IAM user can be granted permission to administer and use resources in your AWS account (such as accessing the MTurk API) without you having to share your root credentials. To add an IAM user, click on the My Security Credentials tab and then Add user under the Users tab of the Dashboard appearing on the left of the page.
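Once the IAM user exists and you have generated its access keys, you can quickly verify that the keys work. The snippet below is an optional sketch, not a required step: it uses boto3 (which we install later in this tutorial), and the key values are placeholders.

	import boto3

	# Ask AWS who these credentials belong to; the call fails if the keys are invalid.
	sts = boto3.client('sts',
	                   aws_access_key_id='YOUR_ACCESS_KEY_ID',          # placeholder
	                   aws_secret_access_key='YOUR_SECRET_ACCESS_KEY')  # placeholder
	print(sts.get_caller_identity()['Arn'])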

2️⃣ MTurk Requester account

Next, you will need to create and register an MTurk Requester account.

After completing the two steps above, you will have to link your AWS account to your MTurk Requester account using your AWS root user credentials.

3️⃣ Requester SandBox account

We are now going to create a Requester SandBox account in the Amazon Mechanical Turk Sandbox testing environment. This website looks exactly like the real MTurk website, and we are going to use it to test our tasks and qualifications before we launch them for real.

At this point, you will also need to link your AWS account to your Requester SandBox account as per Link Your AWS account to your MTurk Requester account.

4️⃣ Worker SandBox account

Finally, to test how our task will be presented to workers we will create a Worker SandBox account.


Concepts and Terminology

Below, we briefly describe the basic concepts and terminology you should be aware of to effectively use MTurk.

  • Requester: a company, organization, or person that creates and submits tasks (HITs) to MTurk for Workers to perform. In our case, we are the Requester.

  • Human Intelligence Task (HIT): a task that a Requester submits to MTurk for Workers to perform. A HIT represents a single, self-contained task, for example, “Describe what emotion is conveyed in the following text.” In our case, a HIT is one example we want to get annotations for.

  • Worker: a person who performs the tasks specified by a Requester in a HIT.

  • Assignment: specifies how many Workers can submit completed work for your HIT. (Hint: the number of assignments equals the number of Workers who can work on a single HIT.)

  • Reward: the money a Requester pays Workers for satisfactory work they do on their HITs.


Qualification Type

Amazon Mechanical Turk gives us the ability to add qualification types when creating or processing our HITs for better quality control. Once we attach a qualification type to a HIT, a Worker can only perform the task if they hold this qualification. Apart from predefined qualifications, MTurk gives us the flexibility to create our own qualification type to represent a Worker’s skill or ability to perform the task at hand.

For our purposes, we are now going to create a custom qualification test consisting of multiple-choice questions using the MTurk API. Once the qualification type has been attached to our HIT, we can find it under the Qualification Types you have created tab in the Worker requirements section.

In this tutorial we are going to access the MTurk API using Boto3, the Amazon Web Services SDK for Python. First, we need to install the latest boto3 release via pip:

> pip install boto3

Once installation is complete, we are ready to create, update, delete, or assign a qualification type to a Worker or a HIT on Amazon Mechanical Turk!

Import the required libraries:

   import argparse
   import logging
   import os

   import boto3

To make our code flexible we pass the MTurk parameters as arguments to the main script. Important note: even if you prefer to hard-code some of those parameters, it is highly recommended to at least pass the IAM credentials as arguments!
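As an aside (not part of the script below), boto3 can also pick up credentials from the standard AWS environment variables or the ~/.aws/credentials file, which keeps keys out of your shell history entirely. A minimal sketch:

	# Sketch: rely on boto3's default credential lookup instead of CLI arguments.
	# Set these in your shell before running the script (values are placeholders):
	#   export AWS_ACCESS_KEY_ID=<your access key id>
	#   export AWS_SECRET_ACCESS_KEY=<your secret access key>
	import boto3

	# With no explicit keys, boto3 falls back to environment variables
	# (or the ~/.aws/credentials file).
	mturk = boto3.client('mturk', region_name='us-east-1',
	                     endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com')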

def main():

	"""
	Create, update, or delete a qualification type at Amazon Mechanical Turk.
	Important note: do not hard-code the key and secret_key arguments!
	"""

	parser = argparse.ArgumentParser(description='Create qualification type for English-French bilingual speakers')
	parser.add_argument('--aws_access_key_id', help='aws_access_key_id -- DO NOT HARDCODE IT')
	parser.add_argument('--aws_secret_access_key', help='aws_secret_access_key -- DO NOT HARDCODE IT')
	parser.add_argument('--questions', help='qualification questions (XML file)')
	parser.add_argument('--answers', help='answers to qualification questions (XML file)')
	parser.add_argument('--worker_id', default=None,
		help='worker id; if given, we grant the worker access to the qualification type')
	parser.add_argument('--Name', default='English French qualification test for bilingual speakers.',
		help='name of qualification test')
	parser.add_argument('--Keywords', default='test, qualification, english, french, same meaning, same, meaning, bilingual',
		help='keywords that help workers find your test')
	parser.add_argument('--Description', default='This is a qualification test for bilingual English-French speakers',
		help='description of qualification test')
	parser.add_argument('--TestDurationInSeconds', type=int, default=5400,
		help='time workers have to complete the test')
	parser.add_argument('--RetryDelayInSeconds', type=int, default=1,
		help='time workers should wait before they can retake the test')
	parser.add_argument('--update', action='store_true', help='update an existing qualification type')
	parser.add_argument('--verbose', default=True, help='increase output verbosity')
	parser.add_argument('--delete', action='store_true', help='delete the qualification type')
	args = parser.parse_args()

	if args.verbose:
		logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

The first step in creating our qualification test is to read our question and answer files (XML files). The answer file contains the gold-standard answers to the questions in the question file and will later be used to automatically assign scores to Workers taking the test. Below, we include example QuestionForm and AnswerKey files in XML format.

	with open(args.questions, mode='r') as f:
		questions = f.read()
	with open(args.answers, mode='r') as f:
		answers = f.read()
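Since malformed XML will be rejected by the API, a quick optional sanity check (a sketch, not part of the original script) is to confirm both files parse before uploading them:

	import xml.etree.ElementTree as ET  # standard library

	# Raises ET.ParseError if either file is not well-formed XML
	ET.fromstring(questions)
	ET.fromstring(answers)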

Next, we create a low-level service client using boto3.

	mturk = boto3.client('mturk',
	                     aws_access_key_id=args.aws_access_key_id,
	                     aws_secret_access_key=args.aws_secret_access_key,
	                     region_name='us-east-1',
	                     endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com')

The endpoint_url argument should be set to 'https://mturk-requester-sandbox.us-east-1.amazonaws.com' while developing in the sandbox. When you are ready to go live on MTurk, replace it with 'https://mturk-requester.us-east-1.amazonaws.com'.
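To check that the client is wired up correctly, you can query the account balance; the sandbox reports a fixed make-believe balance ($10,000.00 at the time of writing). This check is optional and not part of the original script:

	# Sanity check: in the sandbox this always logs 10000.00
	logging.info(' Account balance: %s' % mturk.get_account_balance()['AvailableBalance'])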

Let’s now create our first qualification type! Note that each qualification type is associated with a unique ID that can be used to update, delete, or assign it to a Worker through boto3. It is therefore important to save this ID so that we can refer to our qualification type later.

	if not args.update:

		# Create a qualification type from the questions and answers provided, and save its ID
		try:
			qual_response = mturk.create_qualification_type(
				Name=args.Name,
				Keywords=args.Keywords,
				Description=args.Description,
				QualificationTypeStatus='Active',
				Test=questions,
				AnswerKey=answers,
				RetryDelayInSeconds=args.RetryDelayInSeconds,
				TestDurationInSeconds=args.TestDurationInSeconds)

			qualification_type_id = qual_response['QualificationType']['QualificationTypeId']

			logging.info(' Congrats! You have created a new qualification type')
			logging.info(' You can refer to it using the following id: %s' % qualification_type_id)
			logging.warning(' The qualification_type_id is saved under the qualification_type_id file.')
			logging.warning(' This is the id you will use to refer to your qualification test when creating your HIT!')

			with open('qualification_type_id', 'w') as q_id:
				q_id.write(qualification_type_id)

		# If the qualification type has already been created, try to read its ID from file
		except Exception:
			logging.warning(' You have already created your qualification type. Reading it from the qualification_type_id file...')
			try:
				with open('qualification_type_id', 'r') as q_id:
					qualification_type_id = q_id.readline()
			except FileNotFoundError:
				logging.error(' You have probably deleted the qualification_type_id file')

If we want to update an already created qualification type, we can simply access it through its unique ID.

	# Update an already created qualification type
	else:
		logging.warning(' You have already created your qualification type. Reading it from the qualification_type_id file...')
		try:
			with open('qualification_type_id', 'r') as q_id:
				qualification_type_id = q_id.readline()
			mturk.update_qualification_type(
				QualificationTypeId=qualification_type_id,
				Description=args.Description,
				Test=questions,
				AnswerKey=answers,
				RetryDelayInSeconds=args.RetryDelayInSeconds,
				TestDurationInSeconds=args.TestDurationInSeconds)

		except FileNotFoundError:
			logging.error(' You have probably deleted the qualification_type_id file')

Now that we have learned how to create and update a qualification type, we are ready to assign it to a Worker. To do so, we need the Worker’s ID. This is especially useful during testing, when you may wish to grant the qualification to your own Worker SandBox account and take the test yourself. When you are ready to move from the SandBox to the real platform, you can easily link the qualification type to your HIT or to a specific Worker through the MTurk website.

	# If a worker id is provided, link the qualification type to that worker
	if args.worker_id:
		mturk.associate_qualification_with_worker(
			QualificationTypeId=qualification_type_id,
			WorkerId=args.worker_id,
			IntegerValue=0,
			SendNotification=True)

		response = mturk.list_workers_with_qualification_type(
			QualificationTypeId=qualification_type_id)

		logging.info(' You have associated your qualification type with the worker with id: %s' % args.worker_id)
		logging.info(' Workers currently holding this qualification: %s' % str(response['Qualifications']))
	else:
		logging.info(' You may want to associate your qualification type with a worker or attach it to a HIT!')
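Once the Worker has actually taken the test, you can also retrieve the score they obtained. The snippet below is an optional sketch using boto3’s get_qualification_score call, not part of the original script:

	# Optional: look up the score a Worker obtained on the qualification test
	score = mturk.get_qualification_score(
		QualificationTypeId=qualification_type_id,
		WorkerId=args.worker_id)
	logging.info(' Worker score: %s' % score['Qualification']['IntegerValue'])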

Finally, you can delete the qualification type, again using its ID.

	# Delete the qualification type
	if args.delete:
		try:
			with open('qualification_type_id', 'r') as q_id:
				qualification_type_id = q_id.readline()
			mturk.delete_qualification_type(QualificationTypeId=qualification_type_id)
			os.remove('qualification_type_id')
			logging.warning(' You have deleted your qualification type and removed the qualification_type_id file.')
		except FileNotFoundError:
			logging.error(' You have probably deleted the qualification_type_id file')


if __name__ == '__main__':
	main()

Done! We have now created our own qualification test. Note that this is just one way to ensure high-quality annotations on MTurk; there are plenty of other quality-control tips for crowdsourcing.
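If you prefer to stay in the API rather than the website, a qualification type can also be attached to a HIT programmatically when the HIT is created. The sketch below is illustrative only: the title, description, reward, and question file name are placeholder values, not part of this tutorial’s project.

	# Sketch: create a HIT that only qualified Workers can accept.
	# All HIT parameters below are illustrative placeholders.
	hit = mturk.create_hit(
		Title='Rate how close in meaning an English-French sentence pair is',
		Description='Read an English-French sentence pair and rate how close their meanings are',
		Reward='0.10',
		MaxAssignments=3,
		LifetimeInSeconds=86400,
		AssignmentDurationInSeconds=600,
		Question=open('hit_question.xml').read(),  # a QuestionForm XML file (placeholder name)
		QualificationRequirements=[{
			'QualificationTypeId': qualification_type_id,
			'Comparator': 'GreaterThanOrEqualTo',
			'IntegerValues': [1]}])  # require at least the score of the correct answer
	logging.info(' Created HIT: %s' % hit['HIT']['HITId'])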

Example QuestionForm file

<?xml version="1.0" encoding="UTF-8"?>
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
	<Question>
      		<QuestionIdentifier>en_fr_qual_test_0</QuestionIdentifier>
      		<DisplayName>Q0</DisplayName>
      		<IsRequired>true</IsRequired>
      		<QuestionContent>
			<Text>Which statement best describes the relationship between the English and the French sentence?</Text>
         		<Text>English and French texts:</Text>
         		<EmbeddedBinary>
            			<EmbeddedMimeType>
               				<Type>image</Type>
               				<SubType>png</SubType>
            			</EmbeddedMimeType>
            			<DataURL>https://path_to_bucket.s3.amazonaws.com/0.png</DataURL>
            			<AltText>english-french sentence-pair</AltText>
            			<Width>700</Width>
            			<Height>200</Height>
         		</EmbeddedBinary>
		</QuestionContent>
		<AnswerSpecification>
         		<SelectionAnswer>
            		<StyleSuggestion>radiobutton</StyleSuggestion>
            		<Selections>
               			<Selection>
                  			<SelectionIdentifier>1</SelectionIdentifier>
                  			<Text>are completely unrelated.</Text>
               			</Selection>
               			<Selection>
                  			<SelectionIdentifier>2</SelectionIdentifier>
                  			<Text>have a few words in common but convey unrelated information about them.</Text>
               			</Selection>
               			<Selection>
                  			<SelectionIdentifier>3</SelectionIdentifier>
                  			<Text>convey mostly the same information, but some information is added and/or missing on either or both sides.</Text>
               			</Selection>
               			<Selection>
                  			<SelectionIdentifier>4</SelectionIdentifier>
                  			<Text>have the exact same meaning.</Text>
               			</Selection>
            		</Selections>
         		</SelectionAnswer>
      		</AnswerSpecification>
	</Question>
</QuestionForm>

Example AnswerKey file

<?xml version="1.0" encoding="UTF-8"?>
<AnswerKey xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/AnswerKey.xsd">
	<Question>
		<QuestionIdentifier>en_fr_qual_test_0</QuestionIdentifier>
		<AnswerOption>
			<SelectionIdentifier>1</SelectionIdentifier>
			<AnswerScore>-1</AnswerScore>
		</AnswerOption>
		<AnswerOption>
         		<SelectionIdentifier>2</SelectionIdentifier>
         		<AnswerScore>0</AnswerScore>
      		</AnswerOption>
      		<AnswerOption>
         		<SelectionIdentifier>3</SelectionIdentifier>
         		<AnswerScore>1</AnswerScore>
      		</AnswerOption>
      		<AnswerOption>
         		<SelectionIdentifier>4</SelectionIdentifier>
         		<AnswerScore>0</AnswerScore>
      		</AnswerOption>
	</Question>
</AnswerKey>

Tip: You can assign negative scores to specific answers, thus establishing your own weighting scheme. In the AnswerKey above, for example, selecting option 3 earns one point while selecting option 1 subtracts one.

Creating an MTurk project

Now that we have created our qualification test we are ready to create our MTurk project using one of the customizable templates. First, log in to the MTurk Sandbox and click on the New Project link in the Create tab. Choose the most suitable template for your task and then click on Create Project. For our tutorial, we will choose the Emotion Detection template, and customize the Design Layout section as shown below:

<!-- You must include this JavaScript file -->
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<!-- For the full list of available Crowd HTML Elements and their input/output documentation,
   please refer to https://docs.aws.amazon.com/sagemaker/latest/dg/sms-ui-template-reference.html -->
<!-- You must include crowd-form so that your task submits answers to MTurk -->
<crowd-form answer-format="flatten-objects">
	<!-- Your image file URLs will be substituted for the "image_url" variable below
	when you publish a batch with a CSV input file containing multiple image file URLs.
	To preview the element with an example image, try setting the src attribute to
	"https://s3.amazonaws.com/cv-demo-images/basketball-outdoor.jpg" -->
	<crowd-classifier
		header="Choose the option that best describes the relation between the English and French sentences."
      	 	name="divergent"
      	 	categories="['completely unrelated',
      	 	'a few words in common but convey unrelated information about them',
      	 	'mostly the same meaning, except for some details',
      	 	'exact same meaning']"
      	>
   	<classification-target>
      		<p><img src="${image_url}" style="max-width: 100%; max-height: 250px" /></p>
   	</classification-target>
   	<full-instructions header="Guidelines for comparing English and French text">
      		You are asked to rate how <strong>close the meaning</strong> of the French and English text are, on a scale from 1 to 4.
      		<p>
         		<strong>1:</strong> English and French texts are <strong>completely unrelated</strong> <br><br>
         		<i><u>Example</u></i> <br>
         		<font color="blue"> <i> English: The girl remained missing as of April 2016. </i></font> <br>
         		<font color="deeppink"> <i> French: L'Union athlétique libournaise a disparu en 2016.</i></font> <br>
      		</p>
   	</full-instructions>
   	<short-instructions>
      		You are asked to rate how close the meaning of the French and English text are, on a scale from 1 to 4.
	</short-instructions>
</crowd-form>

Once we are satisfied with the result, we can preview the task and finish. Note that ${image_url} is a template variable that will be substituted with the actual image URL from each row of our CSV file (more on this in the following section).


Publishing batches

Now that the project is set up, we can go ahead and publish batches of tasks. This is simply done by clicking the Publish Batch button on the new project and uploading a CSV file containing our HIT data. Note that we must use the name of the template variable (e.g. image_url) as the header of our CSV file. Once the CSV file is uploaded, MTurk will create an individual HIT for each row in the file.
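For our template, the CSV file therefore needs a single image_url column with one row per sentence-pair image. A minimal example (reusing the bucket placeholder from the QuestionForm above) might look like:

	image_url
	https://path_to_bucket.s3.amazonaws.com/1.png
	https://path_to_bucket.s3.amazonaws.com/2.png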

Enjoy! ☕️

