Managing the Data
There are multiple places where you can record data in an Empirica experiment.
One way to record players' responses is to set them on the player object itself. You can do so with player.set("key", value).
You can retrieve what you have set as a specific property/answer for the player with player.get("key").
This will be accessible in the player collection of your database.
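As a minimal sketch of the set/get pattern on the player: the makeScopedStore helper below is a hypothetical in-memory stand-in for the player object that Empirica hands to your code, included only so the snippet runs on its own. The key name "answer" and the value are assumptions for illustration.

```javascript
// Hypothetical stand-in for an Empirica player object, so this runs outside Empirica.
// In a real experiment you would use the player object that Empirica gives you.
function makeScopedStore() {
  const data = new Map();
  return {
    set(key, value) { data.set(key, value); },
    get(key) { return data.get(key); },
  };
}

const player = makeScopedStore();

// Record the player's response under a key of your choosing ("answer" is illustrative).
player.set("answer", 42);

// Read it back later, for example when computing a score.
const answer = player.get("answer");
console.log(answer);
```

A key that was never set simply comes back as undefined in this sketch, so it is worth checking for that before using the value.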
Another way is to set them on player.stage, which ties a piece of data to a particular player in a particular stage. You can do so with player.stage.set("key", value).
You can retrieve what you have set as a specific property/answer with player.stage.get("key").
This will be accessible in the player-stages collection of your database.
Similarly, you can set them on player.round, which ties a piece of data to a particular player in a particular round. You can do so with player.round.set("key", value).
You can retrieve what you have set as a specific property/answer with player.round.get("key").
This will be accessible in the player-rounds collection of your database.
If you want to save general data (not specific to one player), you can save it to the round with round.set("key", value).
You can retrieve what you have set as a specific property/answer with round.get("key").
This will be accessible in the round collection of your database.
If you want to save general data (not specific to one player) for the whole game, you can save it to the game with game.set("key", value).
You can retrieve what you have set as a specific property/answer with game.get("key").
This will be accessible in the game collection of your database.
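The scopes described above (player, player.stage, player.round, round, game) can be compared side by side in a small sketch. The makeStore helper is a hypothetical in-memory stand-in for the objects Empirica provides, and all key names and values are assumptions for illustration; only the set/get pattern and the names of the scopes come from the text.

```javascript
// Hypothetical in-memory stand-in for Empirica objects; only set/get mirror the text above.
function makeStore() {
  const data = new Map();
  return {
    set: (key, value) => { data.set(key, value); },
    get: (key) => data.get(key),
  };
}

// One shared game and round, and one player with its own per-round and per-stage stores.
const game = makeStore();
const round = makeStore();
const player = Object.assign(makeStore(), {
  round: makeStore(),
  stage: makeStore(),
});

player.set("age", 31);                // about the player, for the whole experiment
player.round.set("guess", 7);         // this player's data for this round
player.stage.set("confidence", 0.8);  // this player's data for this stage
round.set("correctAnswer", 9);        // shared by everyone in the round
game.set("topic", "estimation");      // shared by everyone in the game

// Each scope keeps its own keys: the round-level store does not see player.round data.
console.log(player.round.get("guess"), round.get("guess"));
```

Choosing the narrowest scope that fits makes the later analysis easier, since each scope ends up in its own collection.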
The data is stored in your database across multiple collections (e.g., batches, games, players, etc.).
The collections you will be most interested in, because they hold most of the data, are:
batches
games
player-inputs
players
You can then wrangle these together according to game, player, batch, and other _ids.
In the Admin Panel, in the configuration tab, you can click on Export to download a zip file with CSVs or JSONs for each of these collections. The advantage of downloading from the panel is that all the data objects will have been unnested to some extent. For example, every attribute of a Player will be a column in players.csv and every Player will be a row.
Empirica allows you to collect the user agents and the IP addresses of players, but this has to be activated to avoid running into privacy issues unknowingly.
To do so, in the "public" part of the settings file you need to add:
When you download files from the control panel, you need to tick a specific box if you want to download the player id (the id they enter themselves or that is set as a URL parameter), the user agents, and the IP addresses of players (if you have decided to collect them). The box is labeled "Add Personal Identity Information (includes the player ID, URL parameters, and IP addresses)".
Leaving the box unticked lets you download a version of the data without any identifying information.
Once you have the data, in the player_stages part of the data, you can see, for each playerId and stageId combination, when it was started (createdAt) and when the player submitted (submittedAt). The difference between the two is the time it took the player to submit that stage.
Once you have the data, in the game part of the data, you can see, for each game, when it was created (createdAt) and when it finished (finishedAt). The difference between the two is how long that game lasted.
Note: this does not count the intro and exit steps, only the game part.
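The two comparisons above can be sketched as follows. This is a minimal illustration, assuming the timestamps are available as ISO 8601 strings; the rows, ids, and timestamps are made up, the helper name elapsedMs is hypothetical, and only the field names (playerId, stageId, createdAt, submittedAt, finishedAt) come from the text.

```javascript
// Milliseconds between two timestamps; returns null if either one is missing
// (for example, a stage the player never submitted).
function elapsedMs(start, end) {
  if (!start || !end) return null;
  return new Date(end).getTime() - new Date(start).getTime();
}

// Illustrative rows only; the timestamp format is an assumption.
const playerStages = [
  { playerId: "p1", stageId: "s1", createdAt: "2020-01-01T10:00:00Z", submittedAt: "2020-01-01T10:00:45Z" },
  { playerId: "p2", stageId: "s1", createdAt: "2020-01-01T10:00:00Z", submittedAt: null },
];
const games = [
  { _id: "g1", createdAt: "2020-01-01T10:00:00Z", finishedAt: "2020-01-01T10:12:30Z" },
];

// Time each player took to submit each stage, in seconds.
const stageTimes = playerStages.map((row) => {
  const ms = elapsedMs(row.createdAt, row.submittedAt);
  return {
    playerId: row.playerId,
    stageId: row.stageId,
    secondsToSubmit: ms === null ? null : ms / 1000,
  };
});

// How long each game lasted, in minutes (intro and exit steps are not included).
const gameDurations = games.map((g) => {
  const ms = elapsedMs(g.createdAt, g.finishedAt);
  return { gameId: g._id, minutes: ms === null ? null : ms / 60000 };
});

console.log(stageTimes, gameDurations);
```

Returning null for a missing timestamp keeps unsubmitted stages visible in the result instead of silently producing a misleading number.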
Instead of manually downloading the data, you can directly access the data in your database with the R programming language (similar things can be done with Python and other languages).
You will need the mongolite library and the tidyverse (or at least dplyr) library.
To get data from a collection and then unnest it, use the mongo and find functions from the mongolite package, the pipe from dplyr, and the special unnesting function:
The first argument of the mongo function is the string of the collection you want to download. You can do this for players, player-stages, player-rounds, rounds, stages, games, batches, factors, treatments.
The fields argument of the find function can be tricky. If you set it to '{}', it will grab every field in the database. However, you might then find that there is an issue with your dataframes because of a mismatch between the names in the dataframe and the names actually available. This is usually because Empirica potentially creates a data field, the field where all the data that you .set() goes (see above on recording the data). So if you haven't set any data on this part of the experiment and you are getting errors when trying to view the dataframes, set it to '{"data":false}' instead.
Once you have accessed all the collections you need, you can use merge/join functions to bring everything together into one dataframe.
The games, the players, the rounds, etc. will have certain variables with the same names that you might want to rename: _id (the unique id from the database), index, createdAt.
The id variable from one dataframe might be referred to in another, so you can use this to join dataframes together. For example, players has _id, which corresponds to playerId in player_stages.
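The join logic described above can be sketched independently of R. The snippet below is a minimal illustration in JavaScript rather than R, assuming each collection has been loaded as an array of rows; the rows are made up, the helper withPrefix is hypothetical, and only the field names _id, playerId, index, and createdAt come from the text. It prefixes the clashing column names and then attaches each player to its player_stages rows.

```javascript
// Illustrative rows; ids and timestamps are made up.
const players = [
  { _id: "p1", index: 0, createdAt: "2020-01-01T09:59:00Z" },
  { _id: "p2", index: 1, createdAt: "2020-01-01T09:59:30Z" },
];
const playerStages = [
  { _id: "ps1", playerId: "p1", stageId: "s1", createdAt: "2020-01-01T10:00:00Z" },
  { _id: "ps2", playerId: "p2", stageId: "s1", createdAt: "2020-01-01T10:00:00Z" },
];

// Prefix every key so columns that share a name (_id, index, createdAt) do not collide.
function withPrefix(row, prefix) {
  return Object.fromEntries(
    Object.entries(row).map(([key, value]) => [`${prefix}.${key}`, value])
  );
}

// Left join: keep every player-stage row and attach the matching player, if any.
const playersById = new Map(players.map((p) => [p._id, p]));
const joined = playerStages.map((ps) => ({
  ...withPrefix(ps, "playerStage"),
  ...withPrefix(playersById.get(ps.playerId) ?? {}, "player"),
}));

console.log(joined);
```

In R, the same two steps would typically be a rename followed by a join on the matching id columns.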
You will need . The top of your script should look like this:
Then you need to save your database connection string to a variable (without the ?retryWrites=true&w=majority suffix):