How to Distribute a Large Dataset for Annotation

Taskflow lets AI researchers manage and distribute large datasets for annotation within a single study on Prolific. This article explains how to set up your study with multiple URLs, or with URLs that carry unique identifiers, so that participants are allocated to different tasks and Prolific taskers can complete them in parallel.

Step 1: Planning Your Study Conditions

Before creating your study on the Prolific platform, plan how you will divide your large dataset into manageable tasks. Each task needs its own URL where the annotation work for that part of the dataset is hosted.
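As a rough illustration of this planning step, the sketch below splits a dataset into fixed-size tasks. The function name split_into_tasks, the item names, and the chunk size are all hypothetical; how you divide your own dataset depends on your annotation tool and is not prescribed by Taskflow.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_tasks(items: Sequence[T], items_per_task: int) -> List[List[T]]:
    """Split a dataset into consecutive chunks, one chunk per task.

    The last chunk may be smaller if the dataset does not divide evenly.
    """
    if items_per_task <= 0:
        raise ValueError("items_per_task must be positive")
    return [
        list(items[start:start + items_per_task])
        for start in range(0, len(items), items_per_task)
    ]


# Hypothetical dataset of 10 items, 4 items per task.
dataset = [f"item_{n}" for n in range(1, 11)]
tasks = split_into_tasks(dataset, items_per_task=4)
```

Each resulting chunk would then be hosted behind its own URL in your annotation tool.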

Publishing a Taskflow study with Prolific's API

For API users, please see the two sections of our API documentation below.

Step 2: Creating Your Study on Prolific

  1. Log in to your Prolific researcher account.
  2. Navigate to 'New Task' and enter your task details.
  3. Under the 'Audience' section, define the demographic characteristics of the participants you want to target.

Step 3: Using Taskflow to Distribute Multiple URLs

  1. In the 'Data Collection' section, select 'Taskflow'.
  2. Upload a CSV file with your URLs in column A. While Taskflow is in beta, the upload is only accepted if all other columns in your CSV are blank.
  3. Taskflow automatically distributes participants evenly across the different tasks. You can, however, manually amend the number of participants assigned to each task.
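The following is a minimal sketch of preparing such a file, assuming one URL per row in the first column and no header row (the article does not say whether a header is expected, so check this against the upload screen). The helper names write_url_csv and even_allocation are hypothetical, and even_allocation only illustrates what an even split looks like; it is not Taskflow's own logic.

```python
import csv
import io
from typing import Iterable, List, TextIO


def write_url_csv(urls: Iterable[str], handle: TextIO) -> None:
    """Write one URL per row, in the first column only (column A)."""
    writer = csv.writer(handle)
    for url in urls:
        writer.writerow([url])  # single-cell row keeps other columns blank


def even_allocation(total_participants: int, n_tasks: int) -> List[int]:
    """Illustrative even split; earlier tasks absorb any remainder."""
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    base, extra = divmod(total_participants, n_tasks)
    return [base + (1 if i < extra else 0) for i in range(n_tasks)]


urls = [f"https://yourannotationplatform.com?task={n}" for n in range(1, 4)]

# Written to an in-memory buffer here; for a real file, use
# open("taskflow_urls.csv", "w", newline="") instead.
buffer = io.StringIO()
write_url_csv(urls, buffer)
csv_text = buffer.getvalue()

places = even_allocation(100, len(urls))
```

With 100 participants and three URLs, this example split gives the tasks 34, 33, and 33 places. The per-task numbers can still be adjusted by hand in Taskflow.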

Step 4: Appending URLs with Custom Identifiers

Some researchers may wish to add custom identifiers to their URLs that a data collection system can detect and use to allocate participants to specific conditions. This identifier could be a number or any custom text that helps the end system assign participants to the appropriate condition.

  1. Structure Your URLs: Ensure each URL is structured to accept parameters (e.g., https://yourannotationplatform.com?task=).
  2. Add Custom Identifiers: Append custom identifiers to each URL that correspond to different subsets of your dataset (e.g., https://yourannotationplatform.com?task=1, https://yourannotationplatform.com?task=A, etc.).
  3. Configure in Taskflow: In Taskflow, input each custom URL. This allows you to control which taskers receive which URL and, consequently, which subset of the dataset they are assigned to annotate.
  • Example: If you have five different conditions, you would set up URLs like:
    • https://yourannotationplatform.com?task=1
    • https://yourannotationplatform.com?task=2
    • https://yourannotationplatform.com?task=3
    • https://yourannotationplatform.com?task=4
    • https://yourannotationplatform.com?task=5
  4. Implement in Your Annotation Tool: Ensure that your annotation tool is set up to recognize these custom identifiers and allocate participants based on the URL parameter.
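The URL structure described above can be generated programmatically. The sketch below uses Python's standard urllib.parse module. The helper name with_task_id is hypothetical, and the parameter name task simply follows the example URLs in this article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_task_id(base_url: str, task_id: object, param: str = "task") -> str:
    """Return base_url with ?<param>=<task_id> set, keeping other parameters."""
    parts = urlsplit(base_url)
    # Keep any existing query parameters except a previous value of `param`.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(task_id)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://yourannotationplatform.com"
task_urls = [with_task_id(base, n) for n in range(1, 6)]
lettered_url = with_task_id(base, "A")
```

Building the URLs this way avoids hand-typed duplicates and keeps any query parameters your annotation tool already requires.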

Example for an Annotation Platform:

  • In your annotation platform, use the URL parameter feature to detect the task parameter in the URL and set up logic to direct participants to the appropriate annotation task based on this value.
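On the receiving side, the logic might look like the sketch below, assuming your annotation tool gives you access to the full incoming URL. The names SUBSETS, task_id_from_url, and subset_for_url are hypothetical, and the subset contents are placeholders. Many annotation platforms offer a built-in URL parameter feature that replaces this code entirely.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from task identifier to the items in that subset.
SUBSETS: Dict[str, List[str]] = {
    "1": ["item_1", "item_2"],
    "2": ["item_3", "item_4"],
    "A": ["item_5", "item_6"],
}


def task_id_from_url(url: str, param: str = "task") -> Optional[str]:
    """Read the task identifier from the URL, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


def subset_for_url(url: str, subsets: Dict[str, List[str]]) -> List[str]:
    """Look up the dataset subset a participant should annotate."""
    task_id = task_id_from_url(url)
    if task_id is None or task_id not in subsets:
        raise LookupError(f"No subset configured for task id {task_id!r}")
    return subsets[task_id]


items = subset_for_url("https://yourannotationplatform.com?task=2", SUBSETS)
```

Failing loudly on a missing or unknown identifier makes a misconfigured URL visible during testing instead of silently sending participants to the wrong subset.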

Launching and Monitoring Your Study

  1. Once all tasks are set and URLs are properly configured, review your task setup.
  2. Click 'Publish Task' to start distributing your conditions to participants.
  3. Once published, navigate to the page for your live study and select 'See status of URLs'.
  4. Here, Taskflow shows real-time data on tasker completion rates across your different annotation tasks.

Handling Returns

Taskflow ensures that if a participant returns a task, another participant is allocated to that same task. This guarantees that the required number of participants complete each task, ensuring that your entire dataset is annotated as required.

  1. Automatic Reallocation: If a participant returns their task, Taskflow automatically re-allocates that task to another available participant.
  2. Continuous Monitoring: Taskflow continuously monitors task completion rates and re-allocates tasks as needed to maintain the desired number of annotations for each subset of your dataset.
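To make the reallocation behaviour concrete, here is a toy model of it. This is not Taskflow's implementation, which is not described in this article; the class SlotTracker and its methods are hypothetical and only illustrate the idea that a returned place becomes available again for the same URL.

```python
from collections import Counter
from typing import Dict, Optional


class SlotTracker:
    """Toy model: each URL has a required number of completions."""

    def __init__(self, required: Dict[str, int]) -> None:
        self.required = dict(required)
        self.active: Counter = Counter()     # started but not yet finished
        self.completed: Counter = Counter()  # finished submissions

    def open_slots(self, url: str) -> int:
        return self.required[url] - self.active[url] - self.completed[url]

    def assign(self) -> Optional[str]:
        """Give the next participant the URL with the most open places."""
        url = max(self.required, key=self.open_slots)
        if self.open_slots(url) <= 0:
            return None  # every place is taken or completed
        self.active[url] += 1
        return url

    def complete(self, url: str) -> None:
        self.active[url] -= 1
        self.completed[url] += 1

    def return_task(self, url: str) -> None:
        """A return frees the place so it can be assigned again."""
        self.active[url] -= 1


tracker = SlotTracker({"task=1": 1, "task=2": 1})
first = tracker.assign()
second = tracker.assign()
nothing_left = tracker.assign()
tracker.return_task(first)   # the first participant returns their task
reassigned = tracker.assign()
```

In this model, the place freed by the return goes to the next participant, so each URL still reaches its required number of completions.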
