Data from Public Bicycle Hire Systems

sckott · October 17, 2017, 1:05am

A new rOpenSci package provides access to data to which users may already have directly contributed, and for which contribution is fun, keeps you fit, and helps make the world a better place. The data come from public bicycle hire schemes, and the package is called bikedata. Public bicycle hire systems operate in many cities throughout the world, and most systems collect (generally anonymous) data, minimally consisting of the times and locations at which every single bicycle trip starts and ends. The bikedata package provides access to data from all cities which openly publish these data, currently including London, U.K., and in the U.S.A., New York, Los Angeles, Philadelphia, Chicago, Boston, and Washington DC. The package will expand as more cities openly publish their data (with the recently and greatly expanded San Francisco system next on the list).

…

Read the rest at https://ropensci.org/blog/blog/2017/10/17/bikedata

rensa · October 19, 2017, 12:18am

This looks like a really cool dataset, but given it’s individuals’ spatial trip data, I’d like to know whether the package maintainers or the constituent dataset providers do much to deidentify the trips beyond just taking obvious identifiers like name and DOB off the records. Are the trip start and end points fuzzed at all?

rensa · October 19, 2017, 12:20am

Oops! Should’ve read the package README a bit more closely—it just exposes aggregate numbers of trips between stations. Sick!

mpadge · October 19, 2017, 10:20am

No anonymization is done because there are no names or DOBs. The only personal details come from those systems which record user-provided year of birth and gender, which users may supply or not, as they wish.

Note that the individual trips remain stored in an SQLite database throughout, and may be accessed with standard database calls:

> db <- dplyr::src_sqlite(bikedb, create = FALSE)
> dplyr::collect (dplyr::tbl (db, "trips"))
# A tibble: 1,175,305 x 11
      id  city trip_duration          start_time           stop_time start_station_id end_station_id bike_id user_type birth_year gender
   <int> <chr>         <int>               <chr>               <chr>            <chr>          <chr>   <chr>     <chr>      <chr>  <chr>
 1     1    ph           660 2017-01-01 00:05:00 2017-01-01 00:16:00           ph3046         ph3041                 1       <NA>   <NA>
 2     2    ph          2160 2017-01-01 00:21:00 2017-01-01 00:57:00           ph3110         ph3054                 0       <NA>   <NA>
 3     3    ph          2100 2017-01-01 00:22:00 2017-01-01 00:57:00           ph3110         ph3054                 0       <NA>   <NA>
 4     4    ph           720 2017-01-01 00:27:00 2017-01-01 00:39:00           ph3041         ph3005                 1       <NA>   <NA>
 5     5    ph           480 2017-01-01 00:28:00 2017-01-01 00:36:00           ph3047         ph3124                 0       <NA>   <NA>
 6     6    ph           420 2017-01-01 00:29:00 2017-01-01 00:36:00           ph3047         ph3124                 0       <NA>   <NA>
 7     7    ph           540 2017-01-01 00:31:00 2017-01-01 00:40:00           ph3072         ph3068                 1       <NA>   <NA>
 8     8    ph           960 2017-01-01 00:34:00 2017-01-01 00:50:00           ph3033         ph3114                 1       <NA>   <NA>
 9     9    ph          1140 2017-01-01 00:38:00 2017-01-01 00:57:00           ph3013         ph3028                 0       <NA>   <NA>
10    10    ph          1020 2017-01-01 00:40:00 2017-01-01 00:57:00           ph3013         ph3028                 0       <NA>   <NA>
# ... with 1,175,295 more rows

(for example). In this case, the only individual-level data are the membership categories in user_type, 0 (not a system member) and 1 (system member), but other cities will have non-NA values for the other variables too.
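
As a rough sketch of what "standard database calls" can look like beyond dplyr, the same kind of table can also be queried with plain SQL through DBI. The snippet below is not taken from bikedata itself: it builds a small in-memory SQLite table (an assumption, made only so the example is self-contained) from six of the rows printed above, keeping a subset of the columns, and counts trips per station pair. Against a real database one would presumably connect to the file referred to by bikedb instead of ":memory:".

```r
library(DBI)

# Toy stand-in for the real database: an in-memory SQLite database holding
# six rows copied from the "trips" output above (subset of columns only).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
trips <- data.frame(
  id = 1:6,
  city = "ph",
  trip_duration = c(660L, 2160L, 2100L, 720L, 480L, 420L),
  start_station_id = c("ph3046", "ph3110", "ph3110", "ph3041", "ph3047", "ph3047"),
  end_station_id = c("ph3041", "ph3054", "ph3054", "ph3005", "ph3124", "ph3124"),
  user_type = c("1", "0", "0", "1", "0", "0"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "trips", trips)

# Aggregate individual trips into counts per (start, end) station pair.
pair_counts <- dbGetQuery(con, "
  SELECT start_station_id, end_station_id, COUNT(*) AS num_trips
  FROM trips
  GROUP BY start_station_id, end_station_id
  ORDER BY num_trips DESC, start_station_id
")
print(pair_counts)

dbDisconnect(con)
```

The GROUP BY collapses the individual records into one row per station pair, which is the sort of between-station aggregate discussed earlier in the thread.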

gueyenono · November 10, 2017, 11:09pm

I apologize for the completely unrelated question, but… what font are you using? I can see the font supports ligatures!
