Prepare data

– How the data is collected
The data was provided by Cyclistic at this URL: https://divvy-tripdata.s3.amazonaws.com/index.html, so it would be reasonable to expect it to be Reliable, Original, Comprehensive, Current, and Cited (ROCCC). However, after reviewing the data, it is easy to see that there are a few issues.
There is not much information on how the data is collected. It is mentioned that the bikes are geo-tracked, but there is no mention of an app, so I can't know how the data on trip times or membership status was acquired.
– Identify data (formats and types)
The data is divided into 12 CSV files, one for each month, though there were 19 CSV files compiled over 1 year and 8 months of data. These were the files that I originally used until the Analysis phase.
Each CSV file contains 13 columns:
- ride_id (string data type)
- rideable_type (string data type): classic_bike, electric_bike and docked_bike
- started_at (string data type): year-month-day hour:minute:second
- ended_at (string data type): year-month-day hour:minute:second
- start_station_name (string data type)
- start_station_id (string data type), except for the month of November (integer data type)
- end_station_name (string data type)
- end_station_id (string data type), except for the month of November (integer data type)
- start_lat (floating-point data type)
- start_lng (floating-point data type)
- end_lat (floating-point data type)
- end_lng (floating-point data type)
- member_casual (string data type): member, casual
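The column list above can be turned into a small schema check. What follows is only a minimal Python sketch, not the method used in this case study. It assumes each monthly file has exactly these 13 columns in this order, and that a blank coordinate should be kept as a missing value. The names read_trips, EXPECTED_COLUMNS, FLOAT_COLUMNS, TIME_COLUMNS and TIME_FORMAT are hypothetical, and the two sample rows are made up for illustration. Station IDs are deliberately left as text, so the all-numeric IDs from November end up with the same type as the IDs from the other months.

```python
import csv
import io
from datetime import datetime

# Column names taken from the list above; the order is an assumption.
EXPECTED_COLUMNS = [
    "ride_id", "rideable_type", "started_at", "ended_at",
    "start_station_name", "start_station_id",
    "end_station_name", "end_station_id",
    "start_lat", "start_lng", "end_lat", "end_lng",
    "member_casual",
]
FLOAT_COLUMNS = ("start_lat", "start_lng", "end_lat", "end_lng")
TIME_COLUMNS = ("started_at", "ended_at")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # year-month-day hour:minute:second


def read_trips(handle):
    """Read one monthly CSV from an open text file and normalise column types."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    trips = []
    for row in reader:
        for name in TIME_COLUMNS:
            row[name] = datetime.strptime(row[name], TIME_FORMAT)
        for name in FLOAT_COLUMNS:
            # Assumption: a blank coordinate is kept as None.
            row[name] = float(row[name]) if row[name] else None
        # Station IDs are not converted: they stay as strings in every month.
        trips.append(row)
    return trips


# Made-up sample rows, for illustration only (not real Cyclistic data).
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "A1,classic_bike,2021-03-01 08:00:00,2021-03-01 08:15:30,"
    "Station A,TA123,Station B,13042,41.88,-87.63,41.89,-87.62,member\n"
    "B2,docked_bike,2020-11-05 17:10:00,2020-11-05 17:42:00,"
    "Station C,110,Station D,85,41.90,-87.65,,,casual\n"
)
trips = read_trips(sample)
```

With a real file, the same function could be called as read_trips(open(path, newline="")), where path points to one of the downloaded monthly CSV files.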
– Verify data credibility
This part is a bit too long to include here, so the rest of this step (Prepare Data) is in the document attached below.
Road path vector created by jcomp for Freepik