Prepare data

– How the data is collected

The data was provided by Cyclistic at this URL: https://divvy-tripdata.s3.amazonaws.com/index.html and it would be reasonable to expect the data to be Reliable, Original, Comprehensive, Current, and Cited (ROCCC). However, after reviewing the data, it is easy to see that there are a few issues.

There is not much information on how the data is collected. It is mentioned that the bikes are geo-tracked, but there is no mention of an app, so I can’t know how the data on trip times or membership status was acquired.


– Identify data (formats and types)

The data is divided into CSV files, one for each month: 12 for a single year, though I compiled 19 CSV files covering 1 year and 8 months of data. These were the files that I originally used until the Analysis phase.
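As a rough illustration of how monthly files like these could be compiled into one table, here is a minimal sketch using only the Python standard library. The function name `combine_monthly_csvs` and the `*.csv` pattern are hypothetical, and the sketch assumes every file shares the same header row; it is not necessarily how the files were compiled for this project.

```python
import csv
from pathlib import Path


def combine_monthly_csvs(folder, pattern="*.csv"):
    """Read every CSV matching `pattern` in `folder` and return (header, rows).

    Assumption: all files share the same header; a mismatch raises ValueError.
    """
    header = None
    rows = []
    # Sorting by file name keeps the months in a stable order.
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name} has a different header")
            rows.extend(reader)
    return header, rows
```

Failing loudly on a header mismatch is a deliberate choice here: it surfaces a file whose structure differs instead of silently merging it.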

Each CSV contains 13 columns:

  1. ride_id (string data type)

  2. rideable_type (string data type): classic_bike, electric_bike or docked_bike

  3. started_at (string data type): year-month-day hour:minute:second

  4. ended_at (string data type): year-month-day hour:minute:second

  5. start_station_name (string data type)

  6. start_station_id (string data type), except for the month of November (integer data type)

  7. end_station_name (string data type)

  8. end_station_id (string data type), except for the month of November (integer data type)

  9. start_lat (floating-point data type)

  10. start_lng (floating-point data type)

  11. end_lat (floating-point data type)

  12. end_lng (floating-point data type)

  13. member_casual (string data type): member or casual
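The column list above can be turned into a small check. The sketch below, using only the Python standard library, verifies that a file has the 13 expected columns, parses the two timestamps, and converts the coordinates to floats. The names `EXPECTED_COLUMNS`, `TIMESTAMP_FORMAT` and `load_rides` are hypothetical, and treating empty coordinate cells as missing values is an assumption. Because the `csv` module reads every cell as text, station IDs come out as strings for every month, including the November files where they were typed as integers.

```python
import csv
from datetime import datetime

# Column names as listed above, in order.
EXPECTED_COLUMNS = [
    "ride_id", "rideable_type", "started_at", "ended_at",
    "start_station_name", "start_station_id",
    "end_station_name", "end_station_id",
    "start_lat", "start_lng", "end_lat", "end_lng",
    "member_casual",
]
# year-month-day hour:minute:second
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def _to_float(value):
    """Convert a coordinate cell to float; empty cells become None (assumption)."""
    return float(value) if value else None


def load_rides(text_stream):
    """Parse one CSV (given as an open text stream) into a list of dicts."""
    reader = csv.DictReader(text_stream)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rides = []
    for row in reader:
        row["started_at"] = datetime.strptime(row["started_at"], TIMESTAMP_FORMAT)
        row["ended_at"] = datetime.strptime(row["ended_at"], TIMESTAMP_FORMAT)
        for key in ("start_lat", "start_lng", "end_lat", "end_lng"):
            row[key] = _to_float(row[key])
        # Station IDs stay as strings, whatever type the source month used.
        rides.append(row)
    return rides
```

With the timestamps parsed, a trip's duration is simply `ride["ended_at"] - ride["started_at"]`.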


– Verify data credibility

This part is rather long, so the rest of this step, Prepare Data, is in the document attached below.

Road path vector created by jcomp for Freepik