How to Distribute a Large Dataset for Annotation
Taskflow lets you distribute large datasets across multiple annotation tasks within a single Prolific study. Participants are automatically allocated to different URLs, so your dataset can be annotated in parallel.
Before you start: plan your tasks
Divide your dataset into manageable subsets before creating your study. Each subset needs a hosted URL where participants will complete their annotation work.
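As a rough illustration of this planning step, the sketch below splits a list of item IDs into fixed-size subsets and derives one URL per subset. The helper name `split_into_subsets`, the subset size, and the `?task=N` URL scheme are assumptions for illustration only; your hosting setup will determine the real URLs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_subsets(items: Sequence[T], subset_size: int) -> List[List[T]]:
    """Split items into consecutive subsets of at most subset_size items."""
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    return [list(items[i:i + subset_size]) for i in range(0, len(items), subset_size)]


# Hypothetical dataset of 10 item IDs, split into subsets of up to 4 items.
subsets = split_into_subsets(list(range(10)), 4)

# Hypothetical URL scheme: one hosted page per subset, identified by a task number.
task_urls = [
    f"https://yourannotationplatform.com?task={n}"
    for n in range(1, len(subsets) + 1)
]
```

Each entry in `task_urls` would then correspond to one annotation task in the study.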
If you're using the API to create or manage your Taskflow study, see our API documentation.
Setting up Taskflow
Via the web app
Log in to your Prolific researcher account
Go to Projects and select Add new study
In Data collection type, select Taskflow
Upload a CSV file structured as follows:
Column A: one URL per row
Column B: the number of participants to allocate to that URL (optional, defaults to 1 if left blank)
Taskflow will distribute participants according to your chosen allocation strategy. You can manually adjust per-URL allocations after upload.
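The two-column layout described above could be generated with a short script like the following sketch. It assumes the file has no header row, which this article does not specify, and the helper name `build_taskflow_csv` is hypothetical. Passing `None` leaves column B blank so the default of 1 applies.

```python
import csv
import io


def build_taskflow_csv(rows):
    """Return CSV text with column A = URL and column B = participants.

    rows is an iterable of (url, participants) pairs; participants may be
    None to leave column B blank. Assumption: no header row is written.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for url, participants in rows:
        if participants is not None and participants < 1:
            raise ValueError(f"participants must be at least 1 for {url}")
        writer.writerow([url, "" if participants is None else participants])
    return buffer.getvalue()


csv_text = build_taskflow_csv([
    ("https://yourannotationplatform.com?task=1", 5),
    ("https://yourannotationplatform.com?task=2", None),  # blank column B
])
```

Writing `csv_text` to a `.csv` file gives you something to upload in the web app; check the upload preview to confirm the rows were read as expected.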
Via the API
Set the access_details attribute on your study creation request as an array of objects, each with:
external_url: the task URL
total_allocation: the number of participants to allocate to that URL
For full request body details, see Create study and Get Taskflow progress in the API documentation.
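A minimal sketch of how the access_details array might be assembled in Python is shown below. Only the two attributes named above are included; every other field a study creation request needs is omitted, and the helper name `build_access_details` is hypothetical.

```python
import json


def build_access_details(allocations):
    """Turn a {url: participant_count} mapping into access_details objects."""
    return [
        {"external_url": url, "total_allocation": count}
        for url, count in allocations.items()
    ]


# Fragment only: a real study creation request needs additional fields
# that are documented in the API reference and not shown here.
payload_fragment = {
    "access_details": build_access_details({
        "https://yourannotationplatform.com?task=1": 3,
        "https://yourannotationplatform.com?task=2": 3,
    })
}

print(json.dumps(payload_fragment, indent=2))
```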
Allocation strategies
Taskflow offers three allocation strategies, configurable via the API using the data_collection_metadata.allocation_strategy field.
Strategy | How it works | Best for |
Round robin (default) | Distributes participants evenly across URLs, prioritizing URLs that haven't been allocated yet | Most annotation studies |
Random | Selects randomly from any URL that hasn't reached capacity | Studies where even distribution isn't required |
Deallocated first round robin | Like round robin, but prioritizes refilling slots vacated by returns, rejections, or timeouts before moving to unallocated URLs | Studies where full coverage across all URLs is critical |
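To build intuition for how the three strategies differ, here is a toy model in Python. It is not Prolific's implementation: the strategy labels, the tie-breaking rule, and the `vacated` set are all assumptions made for illustration, and the labels are not the values accepted by the allocation_strategy field.

```python
import random
from collections import Counter


def pick_url(capacity, allocated, strategy, vacated=(), rng=random):
    """Toy chooser: return the next URL to allocate, or None if all are full.

    capacity and allocated map URL -> int. vacated holds URLs with a slot
    released by a return, rejection, or timeout (hypothetical bookkeeping).
    """
    open_urls = [u for u in capacity if allocated[u] < capacity[u]]
    if not open_urls:
        return None
    if strategy == "random":
        return rng.choice(open_urls)
    if strategy == "deallocated_first":
        refill = [u for u in open_urls if u in vacated]
        if refill:
            return min(refill, key=lambda u: allocated[u])
    # Round robin: prefer the URL with the fewest allocations so far.
    return min(open_urls, key=lambda u: allocated[u])


capacity = {"task=1": 2, "task=2": 2, "task=3": 2}
allocated = Counter()
order = []
for _ in range(6):
    url = pick_url(capacity, allocated, "round_robin")
    allocated[url] += 1
    order.append(url)
```

In this model, round robin visits every URL once before returning to the first, while the deallocated-first variant chooses a URL listed in `vacated` ahead of one that has never been allocated.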
Adding participant tracking to your URLs
You can embed Prolific's built-in placeholders directly into your CSV URLs. These are automatically substituted with real values when a participant is allocated:
{{%PROLIFIC_PID%}}: the participant's Prolific ID
{{%STUDY_ID%}}: your study ID
{{%SESSION_ID%}}: the session ID
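Example (illustrative only; the host and query parameter names are placeholders for your own):
https://yourannotationplatform.com?task=1&participant={{%PROLIFIC_PID%}}&study={{%STUDY_ID%}}&session={{%SESSION_ID%}}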
This lets you track which participant completed which task without any additional configuration. These placeholders work whether you're uploading a CSV via the web app or setting URLs via the API.
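The sketch below shows one way to append the placeholders to a task URL and, on the receiving side, read the substituted values back out of the query string. The query parameter names (`participant`, `study`, `session`), the helper names, and the sample IDs are assumptions; `simulate_substitution` only mimics locally what the platform does at allocation time.

```python
from urllib.parse import parse_qs, urlsplit


def add_tracking(base_url: str) -> str:
    """Append the three Prolific placeholders as query parameters."""
    separator = "&" if "?" in base_url else "?"
    return (
        base_url + separator
        + "participant={{%PROLIFIC_PID%}}"
        + "&study={{%STUDY_ID%}}"
        + "&session={{%SESSION_ID%}}"
    )


def simulate_substitution(url: str, values: dict) -> str:
    """Local stand-in for the substitution performed at allocation time."""
    for name, value in values.items():
        url = url.replace("{{%" + name + "%}}", value)
    return url


tracked = add_tracking("https://yourannotationplatform.com?task=1")
filled = simulate_substitution(tracked, {
    "PROLIFIC_PID": "pid-abc",      # hypothetical IDs
    "STUDY_ID": "study-123",
    "SESSION_ID": "session-xyz",
})

# What an annotation platform might do when the participant arrives.
params = parse_qs(urlsplit(filled).query)
```

Here `params["participant"][0]` holds the participant ID that your platform would store alongside the annotations for that task.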
How Taskflow handles incomplete submissions
Taskflow automatically releases a URL slot back into the available pool when a participant returns their submission, times out, or is rejected. Released slots are reallocated to other available participants, ensuring your completion targets are met. The order in which slots are refilled depends on your chosen allocation strategy.
A participant will never be allocated the same URL twice, even if they previously returned that submission. This is guaranteed platform behavior and requires no configuration.
Screen-outs
When a participant is screened out of a task, Taskflow automatically adds 1 to the capacity for that URL, preserving your intended completion targets.
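The release, no-repeat, and screen-out rules described above can be pictured with a small toy model. This is a sketch for intuition only, not how Taskflow is implemented; the class and method names are hypothetical, and it always tries URLs in insertion order rather than applying an allocation strategy.

```python
class ToySlotPool:
    """Toy model of per-URL slots with release and screen-out handling."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)                # URL -> number of slots
        self.holders = {url: set() for url in capacity}  # URL -> participants holding a slot
        self.seen = {}                                # participant -> URLs ever allocated

    def allocate(self, participant):
        """Give the participant the first open URL they have not seen before."""
        already_seen = self.seen.setdefault(participant, set())
        for url, slots in self.capacity.items():
            if len(self.holders[url]) < slots and url not in already_seen:
                self.holders[url].add(participant)
                already_seen.add(url)
                return url
        return None

    def release(self, participant, url):
        """Free the slot after a return, timeout, or rejection."""
        self.holders[url].discard(participant)

    def screen_out(self, url):
        """Add one slot to the URL so the completion target is preserved."""
        self.capacity[url] += 1


pool = ToySlotPool({"task=1": 1, "task=2": 1})
first = pool.allocate("p1")       # p1 takes task=1
pool.release("p1", "task=1")      # p1 returns the submission
second = pool.allocate("p1")      # p1 is not offered task=1 again
third = pool.allocate("p2")       # the released slot goes to p2
pool.screen_out("task=1")         # p2 is screened out, capacity grows by 1
fourth = pool.allocate("p3")      # p3 can now take task=1
```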
Increasing places on an active study
The total number of available places in a Taskflow study is calculated automatically from the sum of all per-URL total_allocation values.
Via the web app
Go to your study's submissions page
Select Increase places from the Actions menu
Specify the total number of places you want to add
Distribute the new places across your existing URLs
Select Confirm to apply the update
Important: you can't add new URLs to a published Taskflow study via the web app.
Via the API
Update the access_details for the relevant URLs by increasing their total_allocation value. Note that you can't set a total_allocation lower than the number of slots already allocated for that URL.
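Before sending an update, it can help to check the numbers locally. The sketch below sums total_allocation values to get the study's total places and rejects a new value that falls below the slots already allocated. The function names are hypothetical, and the count of already-allocated slots is something you would need to obtain yourself, for example from the Taskflow progress data.

```python
def total_places(access_details):
    """Total places = sum of all per-URL total_allocation values."""
    return sum(entry["total_allocation"] for entry in access_details)


def check_new_allocation(new_total_allocation, slots_already_allocated):
    """Return the new value, or raise if it is below what is already allocated."""
    if new_total_allocation < slots_already_allocated:
        raise ValueError(
            f"total_allocation {new_total_allocation} is lower than the "
            f"{slots_already_allocated} slots already allocated"
        )
    return new_total_allocation


access_details = [
    {"external_url": "https://yourannotationplatform.com?task=1", "total_allocation": 100},
    {"external_url": "https://yourannotationplatform.com?task=2", "total_allocation": 50},
]
total = total_places(access_details)

# Raising task=1 from 100 to 120 is fine when 100 slots are already allocated.
access_details[0]["total_allocation"] = check_new_allocation(120, 100)
new_total = total_places(access_details)
```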
You can also increase total places by adding new URLs to a published Taskflow study. To do so, send a PATCH request to https://api.prolific.com/api/v1/studies/{study_id}/ with only the URLs you're adding or updating. For example, to add a new URL with 100 places:
{
  "access_details": [
    {
      "external_url": "https://yourannotationplatform.com?task=6",
      "total_allocation": 100
    }
  ]
}
You only need to include the URLs being added or updated, not your full list of existing URLs.
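As a sketch, the same PATCH request could be prepared in Python with the standard library. The `Authorization: Token ...` header format is an assumption here, so confirm it against the API documentation, and the study ID and token values are placeholders. The request is built but deliberately not sent.

```python
import json
import urllib.request


def build_patch_request(study_id, api_token, access_details):
    """Prepare (but do not send) a PATCH request that adds or updates URLs."""
    body = json.dumps({"access_details": access_details}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.prolific.com/api/v1/studies/{study_id}/",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Token {api_token}",  # assumed header format
            "Content-Type": "application/json",
        },
    )


request = build_patch_request(
    "YOUR_STUDY_ID",    # placeholder
    "YOUR_API_TOKEN",   # placeholder
    [{
        "external_url": "https://yourannotationplatform.com?task=6",
        "total_allocation": 100,
    }],
)

# To send it for real, you would call urllib.request.urlopen(request)
# and handle any HTTP errors it raises.
```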
Finding which URL a participant was allocated
The URL allocated to each submission is recorded in the demographic export under the URL column. Use this to match completed submissions back to specific tasks or dataset subsets.
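For example, a short script could count submissions per URL from the export. This sketch relies only on the URL column named above; the "Participant id" column and the sample rows are invented for illustration, and a real export contains many more columns.

```python
import csv
import io
from collections import Counter


def submissions_per_url(export_file):
    """Count rows per value of the URL column in a demographic export."""
    reader = csv.DictReader(export_file)
    return Counter(row["URL"] for row in reader if row.get("URL"))


# Invented sample standing in for a downloaded export file.
sample_export = io.StringIO(
    "Participant id,URL\n"
    "p1,https://yourannotationplatform.com?task=1\n"
    "p2,https://yourannotationplatform.com?task=2\n"
    "p3,https://yourannotationplatform.com?task=1\n"
)
counts = submissions_per_url(sample_export)
```

With a real file, you would pass `open("export.csv", newline="")` instead of the in-memory sample.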
How many URLs can I add?
Taskflow collections support up to 20,000 URLs. If your dataset requires more, contact Prolific via the Support icon at the bottom right of the screen, or reach out to your Prolific contact.
Which study types does Taskflow support?
Taskflow is compatible with representative sample and quota studies.
Via the web app, select your sample type in the Study distribution and screener selection section as you normally would.
Via the API, Taskflow automatically applies your study's sample type across all sub-studies when you publish.
If you need to screen participants using custom criteria, we recommend the two-study screening approach before launching your Taskflow study.
