Data preparation

In the previous step you explored the three input files pamflow needs. Now you will configure pamflow to find them and run the first pipeline to extract and standardize metadata from your recordings.

Configure audio path and timezone

Open conf/local/parameters.yml with any text editor (e.g. Notepad on Windows, TextEdit on macOS). Set audio_root_directory to the path of your audio folder and timezone to the timezone in which your recordings were made. For The Guaviare Project, the recordings were made in Guaviare, Colombia, so the timezone is America/Bogota:

audio_root_directory: "/media/pamResearcher/guaviare_project_external_disk/pam_data_guaviare"
timezone: "America/Bogota"
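
Before running the pipeline, it can help to confirm that both values are usable. The following is a minimal Python sketch and is not part of pamflow: check_settings is a hypothetical helper that tests whether the audio folder exists and whether the timezone name is recognized by Python's standard zoneinfo module. It takes the two values as arguments rather than reading parameters.yml, so it assumes nothing about how pamflow loads its configuration.

```python
from pathlib import Path
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def check_settings(audio_root_directory: str, timezone: str) -> list[str]:
    """Return a list of problems found; an empty list means both values look usable.

    Hypothetical helper for illustration only, not a pamflow function.
    """
    problems = []

    # The audio folder must exist and be a directory.
    if not Path(audio_root_directory).is_dir():
        problems.append(f"audio folder not found: {audio_root_directory}")

    # The timezone must be a name that the time zone database knows.
    try:
        ZoneInfo(timezone)
    except (ZoneInfoNotFoundError, ValueError):
        problems.append(f"unknown timezone: {timezone}")

    return problems


if __name__ == "__main__":
    # Values copied from the example configuration above.
    for problem in check_settings(
        "/media/pamResearcher/guaviare_project_external_disk/pam_data_guaviare",
        "America/Bogota",
    ):
        print(problem)
```

On Windows, zoneinfo may need the tzdata package installed before it can resolve any timezone name.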

Move input files to the pamflow folder

Copy the field_deployments_sheet.xlsx and target_species.csv files to their respective locations inside the pamflow folder:

  • field_deployments_sheet.xlsx → data/input/field_deployments/

  • target_species.csv → data/input/target_species/
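
If you prefer to script this step, the copy can be sketched in Python with the standard library. The function name place_input_files and its two arguments are illustrative assumptions; only the file names and destination folders come from the list above.

```python
import shutil
from pathlib import Path

# File name -> destination folder, relative to the pamflow folder.
TARGETS = {
    "field_deployments_sheet.xlsx": "data/input/field_deployments",
    "target_species.csv": "data/input/target_species",
}


def place_input_files(source_dir: str, pamflow_dir: str) -> list[Path]:
    """Copy the two input files from source_dir into the pamflow folder.

    Hypothetical helper for illustration only, not a pamflow function.
    """
    copied = []
    for file_name, relative_dir in TARGETS.items():
        destination_dir = Path(pamflow_dir) / relative_dir
        # Create the destination folder if it is missing; no effect otherwise.
        destination_dir.mkdir(parents=True, exist_ok=True)
        # copy2 preserves file metadata and returns the path of the new file.
        copied.append(Path(shutil.copy2(Path(source_dir) / file_name, destination_dir)))
    return copied
```

Calling place_input_files with the folder that holds your two files and the path to your pamflow folder returns the paths of the copies. The function raises FileNotFoundError if either source file is missing.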

Run the data preparation pipeline

Now everything is ready to run pamflow’s first pipeline:

kedro run --pipeline data_preparation

This pipeline generates two standardized tables stored in data/output/data_preparation/.

media.csv contains one row per audio file:

mediaID                     deploymentID  timestamp            filePath                             sampleRate  bitDepth  fileLength
MC-013_20240302_070000.WAV  MC-013        2024-03-02T07:00:00  …/MC-013/MC-013_20240302_070000.WAV  24000       16        30.0
MC-013_20240229_063000.WAV  MC-013        2024-02-29T06:30:00  …/MC-013/MC-013_20240229_063000.WAV  24000       16        30.0
MC-013_20240304_053000.WAV  MC-013        2024-03-04T05:30:00  …/MC-013/MC-013_20240304_053000.WAV  24000       16        30.0
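
A quick way to inspect media.csv is to count how many recordings each deployment contributed. The sketch below uses only Python's standard csv module. It assumes the file is comma-separated with a header row containing the deploymentID column shown above; recordings_per_deployment is a hypothetical helper, not a pamflow function.

```python
import csv
from collections import Counter


def recordings_per_deployment(media_csv: str) -> Counter:
    """Count the rows of media.csv (one row per audio file) by deploymentID."""
    with open(media_csv, newline="", encoding="utf-8") as fh:
        return Counter(row["deploymentID"] for row in csv.DictReader(fh))


if __name__ == "__main__":
    counts = recordings_per_deployment("data/output/data_preparation/media.csv")
    for deployment_id, n_files in sorted(counts.items()):
        print(deployment_id, n_files)
```

A deployment with far fewer files than the others may be worth a closer look before you continue.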

deployments.csv contains one row per deployment:

deploymentID  locationID  latitude  longitude   deploymentStart      deploymentEnd        recorderModel      habitat
MC-002        EL REBALSE  2.117463  -72.779575  2024-02-15T15:04:45  2024-03-06T15:04:45  AudioMoth v 1.2.0  Pastos limpios
MC-007        SAN MIGUEL  2.059644  -72.920236  2024-02-15T15:32:00  2024-03-06T15:32:00  AudioMoth v 1.2.0  Pastos limpios
MC-009        LA TORTUGA  2.183335  -72.987016  2024-02-16T20:48:06  2024-03-07T20:48:06  AudioMoth v 1.2.0  Pastos limpios
MC-013        LA TORTUGA  2.183335  -72.987016  2024-02-16T20:48:06  2024-03-07T20:48:06  AudioMoth v 1.2.0  Pastos limpios
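
Because both tables share the deploymentID column, a simple consistency check is to compare the identifiers they contain. The following sketch is a hypothetical helper, not a pamflow function. It assumes both files are comma-separated with a header row, and it reports identifiers that appear in only one of the two tables.

```python
import csv


def _read_ids(csv_path: str, column: str = "deploymentID") -> set[str]:
    """Collect the distinct values of one column from a CSV file with a header row."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return {row[column] for row in csv.DictReader(fh)}


def compare_deployment_ids(media_csv: str, deployments_csv: str) -> tuple[list[str], list[str]]:
    """Return (ids only in media.csv, ids only in deployments.csv), each sorted."""
    media_ids = _read_ids(media_csv)
    deployment_ids = _read_ids(deployments_csv)
    return sorted(media_ids - deployment_ids), sorted(deployment_ids - media_ids)


if __name__ == "__main__":
    only_media, only_deployments = compare_deployment_ids(
        "data/output/data_preparation/media.csv",
        "data/output/data_preparation/deployments.csv",
    )
    print("Recordings without a deployment record:", only_media)
    print("Deployments without recordings:", only_deployments)
```

An identifier found only in media.csv suggests a folder missing from the field deployments sheet. An identifier found only in deployments.csv suggests a deployment whose recordings were not found under the audio folder.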

See also

The structure and full schema of media.csv and deployments.csv are described in detail in the Data Exchange Format section.

In the next section you will learn how to check recorder behavior and performance.