Papernotes: Understanding tourist behavior using large-scale mobile sensing approach: A case study of mobile phone users in Japan

Notes from reading the paper named in the title:

1. Analyzed GPS location traces of 130,861 mobile phone users in Japan collected for one year.

2. To reduce battery consumption, an accelerometer was used to detect periods of relative stasis during which power-consuming GPS acquisition functions can be suspended.

3. We selected the 130,861 subjects whose GPS locations were observed on at least 350 of the 365 days in 2012 (95%).

4. The first step was to identify stops, where a stop is a collection of recorded GPS locations in close proximity.

If X_u = {x_{t_1}, x_{t_2}, …, x_{t_i}, …} denotes the set of GPS locations of user u, where x_{t_i} is the location at time t_i, then our experimental results suggested that we group x_{t_i}, x_{t_{i+1}}, x_{t_{i+2}}, …, x_{t_m} that are within 196 m and t_m − t_i ≤ 14 min as a stop.
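A minimal sketch of this grouping rule, read literally from the note above. The 196 m and 14 min thresholds come from the notes; measuring distance and elapsed time from the first fix of each group, requiring at least two fixes per stop, and the names `haversine_m` and `group_stops` are assumptions made for illustration, not the paper's implementation.

```python
from math import radians, sin, cos, asin, sqrt

MAX_DIST_M = 196        # distance threshold from the notes
MAX_SPAN_S = 14 * 60    # time threshold from the notes, in seconds

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def group_stops(fixes):
    """fixes: time-sorted list of (t_seconds, lat, lon).
    Groups consecutive fixes that satisfy both thresholds, measured
    from the first fix of each group (an assumption)."""
    stops, i = [], 0
    while i < len(fixes):
        j = i + 1
        while (j < len(fixes)
               and haversine_m(fixes[i][1:], fixes[j][1:]) <= MAX_DIST_M
               and fixes[j][0] - fixes[i][0] <= MAX_SPAN_S):
            j += 1
        if j - i >= 2:          # assumption: a stop needs at least two fixes
            stops.append(fixes[i:j])
        i = j
    return stops

# Toy trace: three nearby fixes within 10 minutes, then one far away.
trace = [(0, 35.0, 139.0), (300, 35.0005, 139.0),
         (600, 35.0003, 139.0002), (2000, 35.1, 139.1)]
stops = group_stops(trace)
```

With the toy trace, the first three fixes form one group and the distant fourth fix is left out.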

5. The second step was the spatial clustering of stops. The centroid of the cluster was considered as a significant place (e.g., home, workplace, other). DBSCAN (Density-based spatial clustering of applications with noise) had the best performance.
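To make the clustering step concrete, here is a small pure-Python DBSCAN over stop coordinates, followed by the centroid of each cluster. It is only a sketch: it assumes stops are already projected to planar coordinates in metres, and the `eps` and `min_pts` values are hypothetical, since the notes do not give the parameters used in the paper.

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # noise for now; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:      # j is a core point, so expand from it
                queue.extend(nb)
    return labels

def centroids(points, labels):
    """Mean position of each cluster; these play the role of significant places."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == -1:
            continue
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

# Toy stops: two tight groups and one isolated point.
pts = [(0, 0), (10, 0), (0, 10), (1000, 1000), (1010, 1000), (1000, 1010), (5000, 5000)]
labels = dbscan(pts, eps=20, min_pts=3)
places = centroids(pts, labels)
```

The isolated point is labelled as noise, which is the property that makes a density-based method a natural fit for one-off stops.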

6. For validation, we developed a tool that allowed the tool user to label significant places after observing clusters of stops. With this tool, we annotated our data with the home and workplace locations of 400 subjects, and used this as ground truth in our validation.

7. The last step was to classify significant places as home or workplace. The hand-labeled ground truth was used for this task, and we found that Random Forest had the best performance when compared with k-nearest neighbors and a naïve Bayes classifier (10-fold cross-validation was used), using the following 10 features:

– Cluster ranking: top ranked clusters can be indicative of home and workplace locations.
– Portion of stops in cluster: to some extent, this suggests the importance of places, because people tend to visit important places such as home and work more frequently than others.
– Hours of stops: the portion of the hours of the day in which clustered stops appeared. For example, if stops were observed from 9 am to 4 pm (throughout the year), this feature would be 8/24.
– Days of stops: the number of days on which clustered stops were observed.
– Inactive hours: for each subject, an inactive period was defined as the hours in which the number of GPS locations is less than the average for at least three consecutive hours. The inactive-hours feature is the portion of clustered stops that fall into the inactive period.
– Day-hour stops: the portion of day hours (9 am–6 pm) in which stops were observed.
– Night-hour stops: the portion of night hours (10 pm–6 am) in which stops were observed.
– Max stop duration: maximum value of stop duration.
– Min stop duration: minimum value of stop duration.
– Avg. stop duration: average value of stop duration.
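
As a worked example of the hour-based features in the list above, the sketch below computes three of them from the hours of day in which a cluster's stops were seen. Representing observations as integer hour bins (0 to 23), and reading "9 am–6 pm" as bins 9 to 17 and "10 pm–6 am" as bins 22 to 5, are assumptions; the function name `hour_features` is hypothetical.

```python
def hour_features(stop_hours):
    """stop_hours: hours of day (0-23) in which a cluster's stops were observed."""
    seen = set(stop_hours)
    day = set(range(9, 18))                          # 9 am-6 pm, assumed bins 9..17
    night = set(range(22, 24)) | set(range(0, 6))    # 10 pm-6 am, assumed bins 22..23, 0..5
    return {
        "hours_of_stops": len(seen) / 24,
        "day_hour_stops": len(seen & day) / len(day),
        "night_hour_stops": len(seen & night) / len(night),
    }

# Stops seen in the hour bins 9 am through 4 pm (eight bins), as in the note's example.
features = hour_features(range(9, 17))
```

For this input, "hours of stops" is 8/24, matching the example given for that feature.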

8. To further validate our home location estimation, we compared our results against the census data and observed that the estimated population density based on our home location estimation was comparable (R² = 0.966) with the city population density information obtained from the 2006 Japanese Census.
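
The notes do not say how R² was computed. One common reading is the squared Pearson correlation between estimated and census densities, which the sketch below assumes; the function name `r_squared` and the input values are toy numbers for illustration only, not data from the paper.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Toy values only: per-city estimated density vs. census density.
estimated = [1.0, 2.0, 3.0, 4.0]
census = [2.0, 4.0, 6.0, 8.0]
score = r_squared(estimated, census)   # perfectly linear toy data
```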

9. After obtaining a firm estimation of home and workplace locations, we were able to identify trips that were commuting (between home and workplace) as well as non-commuting. A commuting trip is defined as a trip where at least one stop appears at a workplace. A non-commuting trip is defined as a trip where none of the stops appear at a workplace.
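
The commuting rule reduces to a single membership check. In this sketch a trip is a list of place labels, one per stop; that representation and the label strings are assumptions for illustration.

```python
def classify_trip(stop_places, workplace="workplace"):
    """A trip is commuting if at least one of its stops is at the workplace."""
    return "commuting" if workplace in stop_places else "non-commuting"

weekday_trip = classify_trip(["home", "other", "workplace"])
weekend_trip = classify_trip(["home", "other", "other"])
```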

10. We defined a touristic stop as a stop that is either within 200 m from a touristic destination location or within the polygon area covered by a touristic destination. We used the touristic destinations information provided by the Ministry of Land, Infrastructure and Transport of Japan (MLIT).
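
The two-part test for a touristic stop can be sketched as a distance check plus a point-in-polygon check. The 200 m radius is from the notes; planar coordinates in metres, the ray-casting polygon test, and the function names are assumptions, since the notes do not describe how the paper implemented the check.

```python
from math import hypot

def in_polygon(pt, poly):
    """Ray-casting test: is pt inside the polygon given as a list of (x, y) vertices?"""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):                      # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def is_touristic_stop(stop, dest_points, dest_polygons, radius_m=200):
    """True if the stop is within radius_m of a destination point
    or inside any destination polygon."""
    near = any(hypot(stop[0] - px, stop[1] - py) <= radius_m for px, py in dest_points)
    return near or any(in_polygon(stop, poly) for poly in dest_polygons)

# Toy destinations: one point at the origin and one square area.
points = [(0, 0)]
polygons = [[(1000, 1000), (2000, 1000), (2000, 2000), (1000, 2000)]]
near_point = is_touristic_stop((150, 0), points, polygons)
in_area = is_touristic_stop((1500, 1500), points, polygons)
elsewhere = is_touristic_stop((500, 500), points, polygons)
```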

11. We were interested in trip flows (the number of touristic trips made to and from different prefectures in Japan), time spent at destination, modes of transportation used by the tourists, and correlations between personal mobility and touristic travel behavior.

12. We calculated the time spent at destination simply as the total time from the arrival at the first destination until the departure from the last destination of a trip.
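
In code, this definition is a single subtraction. The sketch assumes each trip is a list of (arrival, departure) datetime pairs, one per touristic destination; the function name and the sample times are hypothetical.

```python
from datetime import datetime

def time_at_destination(visits):
    """visits: list of (arrival, departure) datetimes for one trip's destinations.
    Returns the span from the first arrival to the last departure."""
    visits = sorted(visits)
    return visits[-1][1] - visits[0][0]

trip = [
    (datetime(2012, 5, 3, 10, 0), datetime(2012, 5, 3, 11, 30)),
    (datetime(2012, 5, 3, 13, 0), datetime(2012, 5, 3, 15, 0)),
]
spent = time_at_destination(trip)
```

Note that under this definition the travel time between destinations counts as time spent, so the toy trip yields 5 hours rather than 3.5.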

13. We used a framework for identifying modes of transportation used by mobile phone users based on their GPS locations. Our framework looked at the GPS traces along with detected stops, and defined a segment as a series of GPS locations between adjacent stops. These segments were then classified into walking and non-walking segments based on the rate of change in velocity and train line proximity.
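
The segment definition can be sketched as slicing the trace between consecutive stops. The sketch assumes fixes are (t, lat, lon) tuples and stops are (t_start, t_end) intervals, both sorted by time; it covers only the segmentation, because the notes give no thresholds for the walking versus non-walking classification.

```python
def segments_between_stops(fixes, stops):
    """fixes: time-sorted list of (t, lat, lon).
    stops: time-sorted list of (t_start, t_end).
    Returns, for each pair of adjacent stops, the fixes recorded strictly between them."""
    segments = []
    for (_, end_a), (start_b, _) in zip(stops, stops[1:]):
        segments.append([f for f in fixes if end_a < f[0] < start_b])
    return segments

# Toy trace with one fix per time unit and three stops.
fixes = [(t, 35.0, 139.0) for t in range(10)]
stops = [(0, 2), (5, 6), (9, 9)]
segments = segments_between_stops(fixes, stops)
```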

14. The non-walking segments were then classified into two modes of transportation, car and train, using a Random Forest classifier.

15. The paper also looks at the relationship between personal mobility and tourist travel behavior, and at similarity in travel behavior.
