Prepare data

– How the data is collected

The data was provided by Cyclistic at this URL: https://divvy-tripdata.s3.amazonaws.com/index.html, so one would reasonably expect it to be Reliable, Original, Comprehensive, Current, and Cited (ROCCC). However, after reviewing the data, a few issues are easy to spot.

There is little information on how the data was collected. The bikes are said to be geo-tracked, but there is no mention of an app, so I cannot tell how the trip times or the membership status were recorded.

– Identify data (formats and types)

The data is divided into 12 CSV files, one for each month, although 19 CSV files were compiled, covering 1 year and 8 months of data. Those 19 files are the ones I originally worked with up to the Analysis phase.
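To work with all the months at once, the monthly files first have to be stacked into a single table. The sketch below shows one way to do that with Python's standard library, assuming the monthly CSV files sit together in one folder, share the same header, and have names that sort chronologically. The folder argument and the function names `append_rows` and `combine_monthly_csvs` are hypothetical, not part of the original workflow.

```python
import csv
import glob
import os
from typing import Dict, Iterable, List, Optional


def append_rows(
    lines: Iterable[str],
    rows: List[Dict[str, str]],
    expected_header: Optional[List[str]],
    source: str = "<input>",
) -> List[str]:
    """Parse one monthly CSV (any iterable of text lines) and append its rows.

    Returns this file's header so the next file can be checked against it.
    Raises ValueError when the columns differ from the expected header.
    """
    reader = csv.DictReader(lines)
    header = list(reader.fieldnames or [])
    if expected_header is not None and header != expected_header:
        raise ValueError(f"{source}: columns {header} differ from {expected_header}")
    rows.extend(reader)  # each row is a dict: column name -> text value
    return header


def combine_monthly_csvs(folder: str) -> List[Dict[str, str]]:
    """Stack every *.csv file in `folder` (assumed: one per month) into one list."""
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    # Sorting by file name is an assumption that names encode the month in order.
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            header = append_rows(f, rows, header, source=path)
    return rows
```

Because the `csv` module reads every field as text, the integer station IDs in one month do not clash with the string IDs in the others at this stage; the type difference only shows up in tools that infer column types.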

Each CSV file contains 13 columns:

ride_id-(string data type)
rideable_type-(string data type), (classic_bike, electric_bike and docked_bike)
started_at-(string data type), year-month-day hour:minute:second
ended_at-(string data type), year-month-day hour:minute:second
start_station_name-(string data type)
start_station_id-(string data type) except for the month of November (integer data type)
end_station_name-(string data type)
end_station_id-(string data type) except for the month of November (integer data type)
start_lat-(floating-point data type)
start_lng-(floating-point data type)
end_lat-(floating-point data type)
end_lng-(floating-point data type)
member_casual-(string data type), (member, casual)
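The column list above maps naturally onto a typed record. This is a minimal sketch, assuming each raw row arrives as a mapping from column name to value (as `csv.DictReader` produces). `Trip`, `parse_trip` and the derived `ride_length` property are hypothetical names used for illustration. It parses `started_at` and `ended_at` with the year-month-day hour:minute:second layout described above, and stores every station ID as text so that November's integer IDs line up with the string IDs of the other months.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Mapping, Optional

# started_at / ended_at layout: year-month-day hour:minute:second
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class Trip:
    ride_id: str
    rideable_type: str          # classic_bike, electric_bike or docked_bike
    started_at: datetime
    ended_at: datetime
    start_station_name: str
    start_station_id: str
    end_station_name: str
    end_station_id: str
    start_lat: Optional[float]
    start_lng: Optional[float]
    end_lat: Optional[float]
    end_lng: Optional[float]
    member_casual: str          # member or casual

    @property
    def ride_length(self) -> timedelta:
        """Hypothetical derived column: time between trip start and end."""
        return self.ended_at - self.started_at


def _text(value: Any) -> str:
    # Store IDs as text so an integer ID (e.g. 13022) and the string "13022"
    # from another month compare equal after parsing.
    return "" if value is None else str(value).strip()


def _coordinate(value: Any) -> Optional[float]:
    # A blank coordinate becomes None instead of raising ValueError.
    text = _text(value)
    return float(text) if text else None


def parse_trip(row: Mapping[str, Any]) -> Trip:
    """Convert one raw CSV row (column name -> value) into a typed Trip."""
    return Trip(
        ride_id=_text(row["ride_id"]),
        rideable_type=_text(row["rideable_type"]),
        started_at=datetime.strptime(_text(row["started_at"]), TIME_FORMAT),
        ended_at=datetime.strptime(_text(row["ended_at"]), TIME_FORMAT),
        start_station_name=_text(row["start_station_name"]),
        start_station_id=_text(row["start_station_id"]),
        end_station_name=_text(row["end_station_name"]),
        end_station_id=_text(row["end_station_id"]),
        start_lat=_coordinate(row["start_lat"]),
        start_lng=_coordinate(row["start_lng"]),
        end_lat=_coordinate(row["end_lat"]),
        end_lng=_coordinate(row["end_lng"]),
        member_casual=_text(row["member_casual"]),
    )
```

Converting station IDs to text rather than to numbers is the safer direction here: the other eleven months already store them as strings, and a numeric conversion would fail on any ID that is not purely digits.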

– Verify data credibility

This step is too long to include in full here, so the rest of the Prepare Data step, including the credibility check, is in the document attached below.
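The full credibility check is in the attached document and is not reproduced here. Purely as an illustration, a few quick integrity checks on the columns listed earlier might look like the sketch below. The function name `basic_quality_report` and the choice of checks (blank values, trips that end at or before they start, repeated `ride_id` values) are my own assumptions, not the steps from that document.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable, Mapping

# started_at / ended_at layout: year-month-day hour:minute:second
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def basic_quality_report(rows: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Count a few simple integrity problems in raw trip rows (illustrative checks)."""
    blank_counts: Counter = Counter()   # column -> number of blank values
    id_counts: Counter = Counter()      # ride_id -> number of occurrences
    non_positive = 0                    # trips that end at or before they start
    total = 0
    for row in rows:
        total += 1
        id_counts[row["ride_id"]] += 1
        for column, value in row.items():
            if value is None or not str(value).strip():
                blank_counts[column] += 1
        start, end = row["started_at"], row["ended_at"]
        # Only compare timestamps when both are present.
        if start and end:
            if datetime.strptime(end, TIME_FORMAT) <= datetime.strptime(start, TIME_FORMAT):
                non_positive += 1
    return {
        "rows": total,
        "blank_values_per_column": dict(blank_counts),
        "non_positive_durations": non_positive,
        "duplicated_ride_ids": sum(1 for n in id_counts.values() if n > 1),
    }
```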


Road path vector created by jcomp for Freepik