Snag HITs. mturk.
worker sub-domain compatible
To use the script once installed, append hit_scraper, hit-scraper, or hitscraper to the path of any mturk URL. Any of the following URLs will work and are equally valid for the purposes of initializing the script:
https://worker.mturk.com/hitScraper
https://worker.mturk.com/hit_scraper
https://www.mturk.com/hit-scraper
https://www.mturk.com/mturk/findhits?match=true&hit_scraper
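The keyword matching described above can be sketched roughly as follows. This is only an illustration in Python, not the script's actual code: the function name `should_start`, the regular expression, and the case-insensitive matching are assumptions inferred from the example URLs (one of which uses `hitScraper`, and one of which carries the keyword in the query string rather than the path).

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: "hit" and "scraper" joined by "_", "-", or nothing.
# Case is ignored because the example URLs include "hitScraper".
TRIGGER = re.compile(r"hit[-_]?scraper", re.IGNORECASE)


def should_start(url: str) -> bool:
    """Return True if an mturk URL carries one of the trigger keywords."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Accept mturk.com itself and any sub-domain (www, worker, ...).
    if host != "mturk.com" and not host.endswith(".mturk.com"):
        return False
    # The keyword may sit in the path or, as in the findhits example,
    # in the query string.
    return bool(TRIGGER.search(parts.path) or TRIGGER.search(parts.query))
```

Under these assumptions, all four example URLs return True, while an ordinary page such as https://worker.mturk.com/projects returns False.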
Understanding the Interface
The top section with all the various search settings and options is internally called the Control Panel. This is filled with options that users may want to change more frequently (on a per search/scrape basis) than the items in the Settings Panel (accessed through the Settings button).
Control Panel options
Auto-refresh delay | This controls how often (in seconds) a scrape will automatically be run. Setting this to 0 will force the scraper into manual mode, turning off automatic scraping. |
Pages to scrape | Sets the minimum threshold for number of pages to retrieve. |
Correct for skips | If more than 66% of HITs are blocked by the blocklist, an additional page will be added until the number of blocked HITs is less than 66% of the total results. |
Results per page | Controls the number of results retrieved per page. It has a maximum of 100. It is typically better to increase the number of results per page rather than increasing the number of pages to scrape. |
Minimum reward | Sets a minimum pay threshold. |
Qualified | Limits results to only HITs for which you are qualified. |
Masters Only | Limits results to only HITs that require the Masters qualification. |
Hide Masters | Filters out HITs that require the Masters qualification while keeping all other HITs for which you may not be qualified. This setting is mutually exclusive with the Qualified setting. If both are selected, the Qualified setting will take precedence. |
Hide Infeasible | Filters out HITs with qualifications you can neither request nor take a test to obtain. Useful for filtering out location-based qualifications. |
Minimum batch size | Sets a threshold for number of HITs per HIT group. All HIT groups which contain fewer HITs than specified will be filtered out. This setting only applies when the Search by option is set to Most Available. |
| Forces the Minimum batch size value to apply to all search options, not only Most Available. |
New HIT highlighting | Sets the amount of time (in seconds) for which new HITs will be highlighted. Highlighted HITs will be emboldened and appear in larger font. Their cells in the results table will also be outlined in a white, dotted line which is more prominent on some themes than others. |
Sound on new HIT | When new HITs are found, play an audio alert. There are two options--Ding and Squee. |
Disable TO | Skip directly to displaying the scrape results without retrieving Turkopticon data. |
Search by | Controls the method by which to query HITs from mturk. |
| Inverts the ordering of the above selection. |
Min pay TO | Sets a threshold on requesters' Turkopticon pay rating and hides all results with requesters below the specified value. Their visibility can be toggled via the Toggle Ignored HITs button. Note: Requesters that have not been rated will not be affected by this setting. |
Hide no TO | Hides all results from requesters that have no reviews on Turkopticon. Their visibility can be toggled via the Toggle Ignored HITs button. |
Sort by TO pay | Sorts the results by Turkopticon pay rating. |
Sort by overall TO | Sorts the results by overall Turkopticon ratings. |
Search Terms | Terms used to search for specific HITs or requesters. |
Hide blocklisted | Hides all results which trigger a match against the blocklist. |
Restrict to includelist | Hides all results that do not trigger a match against the includelist. If the includelist is empty, all results will be hidden. |
Highlight includelist | Results which trigger a match against the includelist will be enclosed in a thick, green, dashed outline. |
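A few of the options above interact in ways that are easier to see in code. The Python sketch below is a rough illustration of three documented rules: Correct for skips adds pages while more than 66% of collected HITs are blocked, Qualified takes precedence over Hide Masters, and Min pay TO leaves unrated requesters alone. It is not the script's implementation. The `Hit` record, the `scrape` and `is_shown` functions, the `fetch_page` callback, and the `max_pages` safety cap are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hit:
    # Hypothetical record; the real script's data model is not shown here.
    title: str
    blocked: bool = False            # matches the blocklist
    qualified: bool = True           # worker meets all qualifications
    requires_masters: bool = False
    to_pay: Optional[float] = None   # None = requester has no TO pay rating


def scrape(fetch_page: Callable[[int], List[Hit]], min_pages: int,
           correct_for_skips: bool = True, skip_ratio: float = 0.66,
           max_pages: int = 20) -> List[Hit]:
    """Fetch at least min_pages pages, then keep adding pages while more
    than skip_ratio of the collected HITs are blocked."""
    hits: List[Hit] = []
    for page in range(1, max_pages + 1):   # max_pages is an assumed cap
        batch = fetch_page(page)
        if not batch:
            break                          # nothing left to retrieve
        hits.extend(batch)
        if page < min_pages:
            continue                       # minimum page count not yet met
        blocked = sum(h.blocked for h in hits)
        if not correct_for_skips or blocked <= skip_ratio * len(hits):
            break
    return hits


def is_shown(hit: Hit, qualified_only: bool, hide_masters: bool,
             min_pay_to: Optional[float]) -> bool:
    """Apply the Qualified / Hide Masters / Min pay TO rules to one HIT."""
    if qualified_only:
        # Qualified takes precedence: Hide Masters is ignored entirely.
        if not hit.qualified:
            return False
    elif hide_masters and hit.requires_masters:
        return False
    # Unrated requesters (to_pay is None) are not affected by Min pay TO.
    if min_pay_to is not None and hit.to_pay is not None \
            and hit.to_pay < min_pay_to:
        return False
    return True
```

In this sketch, a first page of ten HITs with eight blocked triggers one extra page; if that second page is mostly unblocked, the blocked share drops below the threshold and scraping stops.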
Results Table
Additional Settings
Settings Panel options are already pretty well explained. This section is probably not necessary.