April 3, 2020

Fun with GTFS & Clojure

Hi people of the internet!

Today I’ll be talking about using Clojure to parse and extract data from GTFS files, wrapping all that in a simple REST API (in Clojure again), and then exposing this API on the net so that my iPhone can consume it and tell me when my next buses are.

So much fun to come. I’ve got a bus next Monday morning (just kidding, with the quarantine I’m not sure I’ll take a bus anytime soon), which leaves just enough time to write some code and get this done!

If this were to become a mobile app, my main goal would be to let users pick their favorite bus stops on specific bus routes and display the next 3 departures, so that they never miss a bus again.

What is GTFS

GTFS means General Transit Feed Specification. It is a common format for public transportation schedules and associated geographic information.

For more information, you can take a look at the following page: https://developers.google.com/transit/gtfs/

Sample GTFS archive.

I live in Metz, France. Fortunately, we can grab a GTFS archive with the data for all the buses running in my little town. It can be found here: gtfs_current.zip

Tinkering from the REPL

Create a deps.edn file with the following content:

{:deps
  {com.rpl/specter {:mvn/version "1.1.3"}
   nrepl/nrepl {:mvn/version "0.5.3"}}}

Then just run nREPL:

$ clj -m nrepl.cmdline
nREPL server started on port 63456 on host localhost - nrepl://localhost:63456

Now connect your IDE to this running REPL. Or just create one from your IDE directly, it doesn’t matter.

I’m currently on Windows: I run nREPL from the Windows Subsystem for Linux and attach my IDE, running in Windows, to it.

Load the GTFS files

Now that we have the project ready, unzip the GTFS archive and place it somewhere, let’s say in /tmp/gtfs.

We’ve got multiple files extracted:

  • agency.txt: contains the various bus operators (here LE MET’ and PROXIS)
  • calendar.txt: contains the different services and the days on which they operate (here we have 4 of them: mon-fri, wed, sat, sun)
  • calendar_dates.txt: contains service exceptions; for example, service id 1 does not operate on 2015-11-11.
  • routes.txt: contains the different routes that a bus could take (35 here).
  • stop_times.txt: contains the different stops for each bus trip, indicating hour of arrival and departure.
  • stops.txt: contains all the bus stops (here 1493).
  • trips.txt: contains all the different bus trips linking stops together (here 4953)

OK, now that we know what each file contains, let’s load them up as records (plain Clojure maps) and see what we can do.

What we’ll need to do is essentially:

  1. Load a file (they are all CSV)
  2. Skip the first line (the header)
  3. Create a record (a plain map) for each line
  4. Use that list of records to do something useful

We will first define the various map shapes for all these files:

; an agency is composed of a name, a URL, a time zone, a lang and a phone
(def ->agency [:name :url :tz :lang :phone])

; etc...
(def ->calendar [:service-id :mon :tue :wed :thu :fri :sat :sun :start-date :end-date])

(def ->route [:route-id :short-name :long-name :desc :type :url])

(def ->stop-time [:trip-id :arrival-time :departure-time :stop-id :stop-sequence :pickup-type :drop-off-type :shape-dist-traveled])

(def ->stop [:stop-id :code :name :desc :lat :lon :id :zone-id :url :location-type :parent-station-id])

(def ->trip [:route-id :service-id :trip-id :headsign :direction-id :block-id :shape-id :block-id-alt])

Everything OK? Just try to load the agencies:

(require '[clojure.java.io :as io])
(require '[clojure.string :as str])

(defn make-record
      "Make a record out of a line where fields are separated by a comma."
      [record line]
      (->> (str/split line #"," -1)       ; split on commas
           (map str/trim)                 ; remove un-needed spaces
           (map #(str/replace % "\"" "")) ; remove un-needed double quotes
           (zipmap record)))              ; create a map of the separated line

(defn load-records
      "Load the given file into a list of maps, discarding the header."
      [f mapper]
      (with-open [r (io/reader f)]
                 (->> (doall (line-seq r))
                      rest ; skip the header
                      (map (partial make-record mapper)))))

(load-records "/tmp/gtfs/agency.txt" ->agency)
; => ({
;     :name "SAEML TAMM (Le Met')", 
;     :url "https://lemet.fr", 
;     :tz "Europe/Paris", 
;     :lang "", 
;     :phone ""})

It works! We have a sequence of one Agency record. Pretty neat!
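
One caveat with make-record: splitting on every comma breaks any field that itself contains a comma inside double quotes, which CSV allows. Here is a minimal sketch of a quote-aware variant, assuming fields never contain escaped quotes; the names quote-aware-comma and split-csv-line are mine and not part of the code above:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: a comma only counts as a separator when it is
;; followed by an even number of double quotes, i.e. when it sits outside
;; a quoted field. Escaped quotes ("") inside a field are NOT handled.
(def quote-aware-comma #",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")

(defn split-csv-line
  "Split a CSV line on commas, ignoring commas inside double quotes."
  [line]
  (->> (str/split line quote-aware-comma -1) ; -1 keeps trailing empty fields
       (map str/trim)
       (mapv #(str/replace % "\"" ""))))

(split-csv-line "\"TAMM, Metz\",https://lemet.fr,Europe/Paris,,")
;; => ["TAMM, Metz" "https://lemet.fr" "Europe/Paris" "" ""]
```

For a real project, a dedicated CSV library would probably be a safer choice than a regular expression.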

Extract the transit data

Great, now let’s dive deeper into the project. We really need to link routes and bus stops.

To do so we need to load stops, trips, routes and stop-times.

; load them all
(def routes (load-records "/tmp/gtfs/routes.txt" ->route))

(def stops (load-records "/tmp/gtfs/stops.txt" ->stop))

(def stop-times (load-records "/tmp/gtfs/stop_times.txt" ->stop-time))

(def trips (load-records "/tmp/gtfs/trips.txt" ->trip))

(def calendars (load-records "/tmp/gtfs/calendar.txt" ->calendar))

Let’s check that I can find the stop near my apartment:

(count (filter #(= (:name %) "PIERNE") stops))
; => 2

There are 2 of them, one for each direction.

We can just check to be sure:

(filter #(= (:name %) "PIERNE") stops)
;=>
;    ({:location-type "",
;      :desc "",
;      :name "PIERNE",
;      :id "",
;      :stop-id "PIERNE01",
;      :lon "6.179863",
;      :url "0",
;      :code "257",
;      :zone-id "http://lemet.fr/screen/index2.php?stop=257",
;      :lat "  49.102273"}
;     {:location-type "",
;      :desc "",
;      :name "PIERNE",
;      :id "",
;      :stop-id "PIERNE02",
;      :lon "6.179825",
;      :url "0",
;      :code "234",
;      :zone-id "http://lemet.fr/screen/index2.php?stop=234",
;      :lat "  49.102558"})

Discover the data

Let’s define some utility functions to retrieve stops and routes easily, using specter, which is well suited to traversing data structures during this kind of exploration.

(require '[com.rpl.specter :as s])

(defn record-with-property
  "Find all records having a specific property matching a given value."
  [prop value records]
  (s/select* [s/ALL (s/selected? prop #(= % value))] records))

; Take a look at the use of s/select-one* instead of s/select*.
(defn stop-with-id
  "Find a stop having a given id."
  [stop-id]
  (s/select-one* [s/ALL (s/selected? :stop-id #(= % stop-id))] stops))

; define some additional methods

(defn stops-with-name 
  "Find all stops with a given name."
  [name]
  (record-with-property :name name stops))

(defn routes-with-short-name 
  "Find all routes with a given short name."
  [name]
  (record-with-property :short-name name routes))

(defn routes-with-long-name
  "Find all routes with a given long name."
  [name]
  ;; the key defined in ->route is :long-name
  (record-with-property :long-name name routes))

(routes-with-short-name "5")
; => [{
;    :route-id "5-77", 
;    :short-name "5", 
;    :long-name "Ligne 5", 
;    :desc "", 
;    :type "3", 
;    :url ""}]

(count (stops-with-name "PIERNE"))
; => 2

; CASINO is where I lived when I was young
; https://www.google.fr/maps/place/49%C2%B006'59.5%22N+6%C2%B008'28.5%22E/@49.116532,6.1407078,19z
(count (stops-with-name "CASINO"))
; => 2

; how many bus stops do we have in Metz?
(count (distinct (map :name stops)))
; => 491
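
For readers who would rather not pull in specter, here is a sketch of the same lookup using only clojure.core. The sample data is made up, and record-with-property-plain is a hypothetical name of mine; it is meant to return the same vector as the specter version for flat records like ours:

```clojure
;; Made-up sample records, shaped like the maps produced by load-records.
(def sample-stops
  [{:stop-id "PIERNE01" :name "PIERNE"}
   {:stop-id "PIERNE02" :name "PIERNE"}
   {:stop-id "CASINO01" :name "CASINO"}])

(defn record-with-property-plain
  "Same contract as record-with-property, using only clojure.core."
  [prop value records]
  (filterv #(= (get % prop) value) records))

(map :stop-id (record-with-property-plain :name "PIERNE" sample-stops))
;; => ("PIERNE01" "PIERNE02")
```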

So far it seems to be retrieving what we want. Next, let me summarize what I need to do to retrieve the next 3 departures for a specific route and bus stop:

  1. Retrieve the route by its short name (usually the bus line number)
  2. Retrieve the trips for that route (a trip is one run of the route in a given direction, A to B or B to A)
  3. Retrieve the stops on that route (for that we need both the trips and the time table)
  4. Retrieve the next times at the stop from the time table (stop_times.txt)

; get the route id for the route
(map :route-id (routes-with-short-name "5"))
; => ("5-77")

(defn trips-for-route 
  "Find all trips for a given route-id."
  [route-id]
  (record-with-property :route-id route-id trips))

(distinct (map :headsign (trips-for-route "5-77")))
; =>
; ("L5a - MAISON NEUVE"
;  "L5f - MAGNY PAR RUE DE POUILLY"
;  "L5e - MAGNY PAR RUE AU BOIS"
;  "L5 - FORT MOSELLE"
;  "L5 - REPUBLIQUE")

(count (trips-for-route "5-77"))
; => 326

(defn trips-with-headsign 
  "Find all the trips given a headsign."
  [headsign]
  (record-with-property :headsign headsign trips))

(count (trips-with-headsign "L5a - MAISON NEUVE"))   
; => 159                                       
; it means that there are 159 trips on line 5 heading to the final stop MAISON NEUVE.
; let's check what the first trip is

(first (trips-with-headsign "L5a - MAISON NEUVE"))
; =>
; {:route-id "5-77",
;  :service-id "HIV1920-DIM_HIV-Dimanche-00",
;  :trip-id "269026-HIV1920-DIM_HIV-Dimanche-00",
;  :headsign "L5a - MAISON NEUVE",
;  :direction-id "1",
;  :block-id "39965",
;  :shape-id "50048",
;  :block-id-alt "05 - 01"}

We want all the different stops on a trip:

(defn stops-on-trip 
  "Find all the stops on a given trip."
  [trip-id]
  (record-with-property :trip-id trip-id stop-times))

(count (stops-on-trip "269026-HIV1920-DIM_HIV-Dimanche-00"))
; => 40

(map (comp :name stop-with-id :stop-id) 
     (stops-on-trip "269026-HIV1920-DIM_HIV-Dimanche-00"))
; =>
;("AUBEPINE"
; "HAUTS-DE-MAGNY"
; "BEAUSOLEIL"
; "ARMOISIERES"
; "OBELLIANE"
; "MAGNY-AU-BOIS"
; "ROOSEVELT"
; "LA PLAINE"
; "APREMONT"
; "PLATEAU"
; "FAUBOURG"
; "FRECOT"
; "BOUCHOTTE"
; "VANDERNOOT"
; "LOTHAIRE"
; "PIERNE"   <----- cool
; "LEMUD"
; "MUSE"
; "CENTRE POMPIDOU METZ"
; "GARE"
; "ROI GEORGE"
; "REPUBLIQUE"
; "SQUARE DU LUXEMBOURG"
; "FORT MOSELLE"
; "TIGNOMONT"
; "ST-MARTIN"
; "FOCH"
; "PONT DE VERDUN"
; "CASINO"    <----- also cool 
; "MIGETTE"
; "LONGEVILLE"
; "LECLERC"
; "EN PRILLE"
; "SCY BAS"
; "LIBERTE"
; "MOULINS"
; "ST-JEAN"
; "SERRET"
; "HAIE BRULEE"
; "MAISON NEUVE")

Let’s say it is 17:41 and I want the next 3 bus trips after 17:41 that call at my stop, heading to L5a - MAISON NEUVE:

(defn times-for-stop 
  "Find the times for a given stop."
  [stop-id]
  (record-with-property :stop-id stop-id stop-times))

(count (times-for-stop "PIERNE01"))
; => 194

(take 3 
   (sort-by :arrival-time
      (s/select* [s/ALL
         (s/selected? :stop-id #(= % "PIERNE01"))
         (s/selected? :arrival-time #(pos? (compare % "17:41:00")))] ; compare can return more than 1, so test with pos?
      (times-for-stop "PIERNE01"))))
; =>
; ({:trip-id "269041-HIV1920-DIM_HIV-Dimanche-00",
;   :arrival-time "17:54:58",
;   :departure-time "17:54:58",
;   :stop-id "PIERNE01",
;   :stop-sequence "16",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "5208.0"}
;  {:trip-id "281460-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "17:58:25",
;   :departure-time "17:58:25",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"}
;  {:trip-id "268643-HIV1920-H1920SAM-Samedi-00",
;   :arrival-time "17:59:58",
;   :departure-time "17:59:58",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"})
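
Comparing the times as strings works here because every time in this feed is zero-padded. GTFS also accepts H:MM:SS and hours beyond 24 for trips running past midnight, so a more defensive sketch could compare seconds instead. The helpers time->seconds and after? are hypothetical names of mine:

```clojure
(require '[clojure.string :as str])

(defn time->seconds
  "Convert a GTFS time string (H:MM:SS or HH:MM:SS) to seconds since
  the start of the service day."
  [t]
  (let [[h m s] (map #(Long/parseLong %) (str/split (str/trim t) #":"))]
    (+ (* 3600 h) (* 60 m) s)))

(defn after?
  "Is time t strictly after the reference time?"
  [t reference]
  (> (time->seconds t) (time->seconds reference)))

(time->seconds "17:41:00")               ;; => 63660
(after? "19:00:00" "17:41:00")           ;; => true
(after? "9:05:00" "17:41:00")            ;; => false
(pos? (compare "9:05:00" "17:41:00"))    ;; => true, the string comparison gets this one wrong
```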

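That’s cool and all, but it may return times for every route serving the stop. I only want those for the headsign L5a - MAISON NEUVE:

(defn trip-in-route?
  "Does the given trip-id belong to a trip with the given headsign?"
  [trip-id headsign]
  ;; the set is rebuilt on every call, which is fine for exploration
  (contains? (set (map :trip-id (trips-with-headsign headsign))) trip-id))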

(take 3 
   (sort-by :arrival-time
      (s/select* [s/ALL
         (s/selected? :stop-id #(= % "PIERNE01"))
         (s/selected? :arrival-time #(pos? (compare % "17:41:00")))
         (s/selected? :trip-id #(trip-in-route? % "L5a - MAISON NEUVE"))]
      (times-for-stop "PIERNE01"))))
; =>
; ({:trip-id "269041-HIV1920-DIM_HIV-Dimanche-00",
;   :arrival-time "17:54:58",
;   :departure-time "17:54:58",
;   :stop-id "PIERNE01",
;   :stop-sequence "16",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "5208.0"}
;  {:trip-id "281460-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "17:58:25",
;   :departure-time "17:58:25",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"}
;  {:trip-id "268643-HIV1920-H1920SAM-Samedi-00",
;   :arrival-time "17:59:58",
;   :departure-time "17:59:58",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"})                        

It gives the exact same result, and for a good reason: the PIERNE bus stop (with id PIERNE01) is only served by the bus line Ligne 5.
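
Filtering trip ids by headsign gets cheaper if the set of matching ids is built once and then reused as a predicate. A small sketch on made-up data; trip-ids-with-headsign and l5a-trip? are hypothetical names:

```clojure
;; Made-up sample data with the same keys as the maps used so far.
(def sample-trips
  [{:trip-id "t1" :headsign "L5a - MAISON NEUVE"}
   {:trip-id "t2" :headsign "L5 - REPUBLIQUE"}
   {:trip-id "t3" :headsign "L5a - MAISON NEUVE"}])

(def sample-stop-times
  [{:trip-id "t1" :arrival-time "17:54:58"}
   {:trip-id "t2" :arrival-time "17:56:00"}
   {:trip-id "t3" :arrival-time "17:58:25"}])

(defn trip-ids-with-headsign
  "Build, once, the set of trip ids whose headsign matches."
  [headsign trips]
  (into #{}
        (comp (filter #(= (:headsign %) headsign))
              (map :trip-id))
        trips))

;; A set is itself a function: (s x) returns x when x is in s, else nil.
(def l5a-trip? (trip-ids-with-headsign "L5a - MAISON NEUVE" sample-trips))

(map :arrival-time (filter (comp l5a-trip? :trip-id) sample-stop-times))
;; => ("17:54:58" "17:58:25")
```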

Let’s check that these 3 buses have the same destination:

(defn trip-with-id 
  "Find a trip given its id."  
  [trip-id]
  (s/select-one* [s/ALL (s/selected? :trip-id #(= % trip-id))] trips))

(map (comp :headsign trip-with-id)
     ["269041-HIV1920-DIM_HIV-Dimanche-00" "281460-HIV1920-SEM_GTFS-Semaine-00" "268643-HIV1920-H1920SAM-Samedi-00"])
; => ("L5a - MAISON NEUVE" "L5a - MAISON NEUVE" "L5a - MAISON NEUVE")

Just one last tweak to do before we go to the next step.

Currently it returns trips for every day of the week, but in Metz (as elsewhere) the schedule differs between weekdays and the weekend. Let’s say it’s Tuesday.

; which services are valid for tuesday?
(map :service-id (record-with-property :tue "1" calendars))
; => ("HIV1920-SEM_GTFS-Semaine-00")

; let's deal with it
(defn trip-for-day?
  "Does the given trip run on the given day (:mon, :tue, ...)?"
  [trip-id day]
  (let [service-ids (into #{} (map :service-id (record-with-property day "1" calendars)))
        trip-service-id (:service-id (first (record-with-property :trip-id trip-id trips)))]
    (contains? service-ids trip-service-id)))

(take 3
   (sort-by :arrival-time
      (s/select* [s/ALL
         (s/selected? :stop-id #(= % "PIERNE01"))  ; for the stop PIERNE01
         (s/selected? :arrival-time #(pos? (compare % "17:41:00"))) ; after 17:41:00
         (s/selected? :trip-id #(trip-in-route? % "L5a - MAISON NEUVE")) ; following the headsign L5a - MAISON NEUVE
         (s/selected? :trip-id #(trip-for-day? % :tue))] ; running on tuesday
      (times-for-stop "PIERNE01"))))
; =>
; ({:trip-id "281460-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "17:58:25",
;   :departure-time "17:58:25",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"}
;  {:trip-id "281426-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "18:08:25",
;   :departure-time "18:08:25",
;   :stop-id "PIERNE01",
;   :stop-sequence "16",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "5208.0"}
;  {:trip-id "281461-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "18:18:58",
;   :departure-time "18:18:58",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"})

; Make a function of it
(defn next-times
  "Find the next `number` times at stop-id after arrival-time, for a headsign and a day."
  [number stop-id arrival-time headsign day]
  (take number
        (sort-by :arrival-time
                 (s/select* [s/ALL
                             (s/selected? :stop-id #(= % stop-id))
                             (s/selected? :arrival-time #(pos? (compare % arrival-time)))
                             (s/selected? :trip-id #(trip-in-route? % headsign))
                             (s/selected? :trip-id #(trip-for-day? % day))]
                            (times-for-stop stop-id)))))

; check it works 
(next-times 3 "PIERNE01" "17:41:00" "L5a - MAISON NEUVE" :tue)
; =>
; ({:trip-id "281460-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "17:58:25",
;   :departure-time "17:58:25",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"}
;  {:trip-id "281426-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "18:08:25",
;   :departure-time "18:08:25",
;   :stop-id "PIERNE01",
;   :stop-sequence "16",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "5208.0"}
;  {:trip-id "281461-HIV1920-SEM_GTFS-Semaine-00",
;   :arrival-time "18:18:58",
;   :departure-time "18:18:58",
;   :stop-id "PIERNE01",
;   :stop-sequence "12",
;   :pickup-type "0",
;   :drop-off-type "0",
;   :shape-dist-traveled "4162.0"})

Seems about right 🤠 !
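
In a running application the day and the time would not be hard-coded. Here is a sketch of how they could be derived with java.time; day-key and gtfs-time are hypothetical helpers, and the sketch ignores calendar_dates.txt exceptions as well as the start and end dates of each service:

```clojure
(import '(java.time DayOfWeek LocalDate LocalTime)
        '(java.time.format DateTimeFormatter))

;; Map java.time days to the day keys used in ->calendar.
(def day-keys
  {DayOfWeek/MONDAY    :mon
   DayOfWeek/TUESDAY   :tue
   DayOfWeek/WEDNESDAY :wed
   DayOfWeek/THURSDAY  :thu
   DayOfWeek/FRIDAY    :fri
   DayOfWeek/SATURDAY  :sat
   DayOfWeek/SUNDAY    :sun})

(defn day-key
  "Calendar key (:mon ... :sun) for a LocalDate."
  [^LocalDate date]
  (get day-keys (.getDayOfWeek date)))

(defn gtfs-time
  "Format a LocalTime as a zero-padded HH:MM:SS string."
  [^LocalTime t]
  (.format t (DateTimeFormatter/ofPattern "HH:mm:ss")))

(day-key (LocalDate/of 2020 4 7))   ;; => :tue
(gtfs-time (LocalTime/of 17 41))    ;; => "17:41:00"

;; Possible usage with the function defined earlier:
;; (next-times 3 "PIERNE01" (gtfs-time (LocalTime/now))
;;             "L5a - MAISON NEUVE" (day-key (LocalDate/now)))
```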

So far, so good: we now know how to list the routes, retrieve the stops of a route in a specific direction (headsign), and retrieve the next 3 departures at a bus stop.

The problem is that people using the application will also need to go the other way, from a stop to its routes and trips, because they usually already know the name of their favorite bus stops.

Let’s do that with a bus stop that has many bus lines passing by, named REPUBLIQUE:

(count (stops-with-name "REPUBLIQUE"))
; => 9

(def republique-stops
  (set (map :stop-id (stops-with-name "REPUBLIQUE"))))
; => #'user/republique-stops

republique-stops
; => #{"REPU5052" "REPUB123" "place_REPUB" "REP01" "REPUBL88" "REPUBL01" "REPU5051" "REP02" "REPUB436"}

(def republique-stop-times-trip-ids
    (set (map :trip-id (s/select* [s/ALL
                (s/selected? :stop-id #(contains? republique-stops %))] stop-times))))
; => #'user/republique-stop-times-trip-ids

(count republique-stop-times-trip-ids)                
; => 2979

(distinct (map :route-id (s/select* [s/ALL
                                     (s/selected? :trip-id #(contains? republique-stop-times-trip-ids %))] trips)))
; => ("12-101" "11-9" "5-77" "4-7" "3-81" "2-5" "1-100" "B-99" "A-98" "83-71")

(defn route-with-id [route-id]
  (s/select-one* [s/ALL (s/selected? :route-id #(= % route-id))] routes))

(map :long-name (map route-with-id '("12-101" "11-9" "5-77" "4-7" "3-81" "2-5" "1-100" "B-99" "A-98" "83-71")))
; => (
; "Citeis 12" 
; "Citeis 11" 
; "Ligne 5" 
; "Ligne 4" 
; "Ligne 3" 
; "Ligne 2" 
; "Ligne 1" 
; "Mettis B" 
; "Mettis A" 
; "Navette CITY")   

We can conclude that 10 bus lines go through the REPUBLIQUE bus stop. With that, we need to offer a choice of direction (headsign) for the chosen route:

; route id "5-77" is L5: MAGNY - MAISON NEUVE the one I used to take when I was going to school.
(distinct
  (map :headsign (filter #(= (:route-id %) "5-77") trips)))
; => ("L5a - MAISON NEUVE"
;     "L5f - MAGNY PAR RUE DE POUILLY"
;     "L5e - MAGNY PAR RUE AU BOIS"
;     "L5 - FORT MOSELLE"
;     "L5 - REPUBLIQUE")
 
(distinct
  (map :headsign (filter #(= (:route-id %) "B-99") trips)))
; => ("MB - HOPITAL MERCY" "MB - UNIVERSITE SAULCY")

There are 5 different final destinations for the bus line Ligne 5 and 2 for the bus line Mettis B.

Finally, one last problem: a trip with a given final destination (headsign) runs in a fixed direction, from A to B or the reverse, and we should really group these trips by direction:

; Clojure 1.10 has no built-in function to map over the values of a hash-map (1.11 later added update-vals)
(defn map-values [m f & args]
   (reduce (fn [a [k v]] (assoc a k (apply f v args))) {} m))

(def trips-by-direction
    (group-by :direction-id (distinct (filter #(= (:route-id %) "5-77") trips))))
; =>  #'user/trips-by-direction

(map-values trips-by-direction (fn [m] (distinct (map :headsign m))))
; => 
;  {"1" ("L5a - MAISON NEUVE" "L5 - FORT MOSELLE"),
;   "0" ("L5f - MAGNY PAR RUE DE POUILLY" "L5e - MAGNY PAR RUE AU BOIS" "L5 - REPUBLIQUE")}

Well I think at this point we pretty much covered what we need for our REST API, so let’s do that!

Performance

For a dataset the size of Metz it’s easy, but with a huge dataset like Paris I think these functions would take too much time filtering on each HTTP request. I’m just trying to find excuses to do some more because I’m bored :)

So let’s create a big store containing all the information already tailored for the API, so that all we have to do is a get-in on that store to get what we need.

A structure like this should do nicely:

{
    "agencies": [],
    "routes": [],
    "stops": [],
    "trips": [],
    "store": {
        "SEILLE": {
            "Mettis B": {  
                "MB - HOPITAL MERCY": [
                    "Here, some time-table please"
                ],
                "MB - UNIVERSITE SAULCY": [
                    "Here, some time-table please"
                ]
            }
        }
    }
}

We already have the agencies, routes, stops and trips. We also already know how to retrieve the following:

  1. The different distinct stop names aka store.SEILLE
  2. The routes passing by a stop aka store.SEILLE.'Mettis B'
  3. The destination of a specific route passing by a stop aka store.SEILLE.'Mettis B'.'MB - UNIVERSITE SAULCY'
  4. The trip ids of a specific route + stop + destination.

So it’s all just about mapping functions we already have to build a giant structure so that it will be very easy for us to write our REST API.
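
To make the target concrete, here is a hand-written miniature of the nested part of that structure and the kind of lookups the API would perform on it. The times are invented and sample-data is a hypothetical name:

```clojure
;; Stop name -> route name -> headsign -> time table (invented values).
(def sample-data
  {"SEILLE" {"Mettis B" {"MB - HOPITAL MERCY"     [{:arrival-time "17:45:00"}
                                                   {:arrival-time "17:55:00"}]
                         "MB - UNIVERSITE SAULCY" [{:arrival-time "17:48:00"}]}}})

;; 1. the distinct stop names
(keys sample-data)
;; => ("SEILLE")

;; 2. the routes passing by a stop
(keys (get sample-data "SEILLE"))
;; => ("Mettis B")

;; 3. the destinations of a route passing by a stop
(sort (keys (get-in sample-data ["SEILLE" "Mettis B"])))
;; => ("MB - HOPITAL MERCY" "MB - UNIVERSITE SAULCY")

;; 4. the time table for a stop + route + destination
(map :arrival-time
     (get-in sample-data ["SEILLE" "Mettis B" "MB - HOPITAL MERCY"]))
;; => ("17:45:00" "17:55:00")

;; an unknown path simply yields nil
(get-in sample-data ["SEILLE" "Ligne 5"])
;; => nil
```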

Building the index

We’re going to build an intermediate index with what we need:

  • agencies
  • routes + routes indexed by their route-id (routes-map, for performance)
  • stops
  • trips + trips indexed by their trip-id (trips-map, for performance)
  • trips indexed by route-id (trips-by-route, for performance)
  • stop times
  • stop times indexed by stop id (stop-times-by-stop, again for performance)

This is specific to what I have in mind. It’s not necessary to understand how it’s built; the key point is that we end up with the structure explained above, prebuilt on startup, so that the API just navigates in it.
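
The two kinds of index listed above (one record per id, many records per key) can be sketched in a couple of lines on made-up data. This is only an illustration of the idea, not necessarily how the store is built:

```clojure
;; Made-up trips with the same keys as the real ones.
(def sample-trips
  [{:trip-id "t1" :route-id "5-77"}
   {:trip-id "t2" :route-id "5-77"}
   {:trip-id "t3" :route-id "B-99"}])

;; One record per id: a map from trip-id to the trip itself.
(def sample-trips-map
  (into {} (map (juxt :trip-id identity)) sample-trips))

;; Many records per key: a map from route-id to a vector of trips.
(def sample-trips-by-route
  (group-by :route-id sample-trips))

(get sample-trips-map "t2")
;; => {:trip-id "t2", :route-id "5-77"}

(map :trip-id (get sample-trips-by-route "5-77"))
;; => ("t1" "t2")
```

Both lookups are now hash-map accesses instead of a scan over every trip.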

The different distinct stop names aka store.SEILLE:

(defn distinct-stop-names-with-ids
  "Find the distinct stop names with their associated ids."
  [{:keys [stops]}]
  ;; search the stops passed in, not the global stops var
  (map #(vector % (map :stop-id (record-with-property :name % stops)))
       (sort (distinct (map :name stops)))))

(distinct-stop-names-with-ids {:stops stops})
; =>
; (["11ème D'AVIATION" ("11AVIAT1" "11AVIAT2")]
;  ["19 NOVEMBRE" ("19NOV01" "19NOV02")]
;  ["8 MAI 45" ("08MAI01" "08MAI02")]
;  ["ABBE BAUZIN" ("ABBEBAU1" "ABBEBAU2")]
;  ["ACACIAS" ("ACACIAS1" "ACACIAS2")]
;  ["ACTISUD DUNIL" ("ACTISUD1" "ACTISUD2")]
;  ["ALGER" ("ALGER1" "ALGER2")]
;  ["ALSACE" ("ALSACE1" "ALSACE2")]
;  ...         

The routes passing by a stop aka store.SEILLE.'Mettis B':

(defn trip-ids-for-stops
  "Find the trip ids serving the given stops."
  [stops stop-times]
  (set (map :trip-id (flatten (map #(get stop-times %) stops)))))

(defn distinct-route-ids-for-trips
  "Find the distinct route ids for the given trips."
  [trip-ids trips]
  (distinct (map :route-id (map #(get trips %) trip-ids))))

(defn routes-passing-by-stop-ids
  "Find the routes passing by the given stops."
  [stop-ids {:keys [routes-map stop-times-by-stop trips-map]}]
  (filter some? (map #(get routes-map %)
                     (distinct-route-ids-for-trips (trip-ids-for-stops stop-ids stop-times-by-stop)
                                                   trips-map))))

; test it works (store is built by load-gtfs, shown further down)

(map :long-name (routes-passing-by-stop-ids #{"PIERNE01"} store))
; => ("Proxis 113" "Ligne 5")

(map :long-name (routes-passing-by-stop-ids #{"REPUBL01"} store))
; => ("Ligne 3" "Ligne 5")

(map :long-name (routes-passing-by-stop-ids #{"REPUBL88"} store))
; => ("Ligne 1" "Ligne 4" "Citeis 11" "Navette CITY")

The destination of a specific route passing by a stop aka store.SEILLE.'Mettis B'.'MB - UNIVERSITE SAULCY':

(defn times-by-headsign-at-stop
  "Find the times a bus stops at a specific stop grouped by headsign."
  [stop route-id store]
  (group-by #(:headsign (get (:trips-map store) (:trip-id %)))   ; group times by trip headsign
            (sort-by (juxt :arrival-time :departure-time)         ; sort by arrival-time then departure-time
                     (filter #(contains? (set (get (:trips-by-route store) route-id)) ; keep only the trips
                                         (get (:trips-map store) (:trip-id %)))       ; that are on the route
                             (get (:stop-times-by-stop store) stop)))))  ; get the times at specified stop

(keys (times-by-headsign-at-stop "PIERNE01" "5-77" store))
; => ("L5a - MAISON NEUVE")   

(count (get (times-by-headsign-at-stop "PIERNE01" "5-77" store) "L5a - MAISON NEUVE"))
; => 194                        

Having written all that, we just need to use them all to create our desired structure:

(defn make-api-struct
  "Construct the struct that will be searched by the REST API."
  [store]
  (->> (distinct-stop-names-with-ids store)
       (map #(hash-map (first %)                 ; create hash-map with stop name as key
                       (apply merge-with into    ; and as value:
                              (map (fn [r] (hash-map (:long-name r) ; a hash-map with route name as key
                                                     (map (fn [stop] (times-by-headsign-at-stop stop (:route-id r) store)) ; and as value the times grouped by destination
                                                          (second %))))
                                   (routes-passing-by-stop-ids (second %) store)))))
       (apply merge-with into)))
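
The apply merge-with into idiom used above may deserve a tiny illustration: it merges a sequence of maps, concatenating the values whenever a key appears more than once. A sketch on made-up data, which also shows that for single-entry maps holding vectors it gives the same result as group-by:

```clojure
;; Made-up trips with the same keys as the real ones.
(def sample-trips
  [{:trip-id "t1" :route-id "5-77"}
   {:trip-id "t2" :route-id "5-77"}
   {:trip-id "t3" :route-id "B-99"}])

;; Each trip becomes a one-entry map {route-id [trip]};
;; merge-with into then concatenates the vectors sharing a route-id.
(def by-route-merge
  (apply merge-with into
         (map #(hash-map (:route-id %) [%]) sample-trips)))

(map :trip-id (get by-route-merge "5-77"))
;; => ("t1" "t2")

;; group-by builds the same index in one call.
(= by-route-merge (group-by :route-id sample-trips))
;; => true

;; With no maps at all, the result is nil rather than {}.
(apply merge-with into [])
;; => nil
```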

Then create our load-gtfs function to build it once the GTFS files have been parsed:

(defn load-gtfs
  "Load the GTFS data, and make a store of it."
  [path]
  ;; uses the mappers (->route, ->trip, ...) and the kebab-case keys defined earlier
  (let [routes (load-records (str path "/routes.txt") ->route)
        trips (load-records (str path "/trips.txt") ->trip)
        stop-times (load-records (str path "/stop_times.txt") ->stop-time)
        store {:agencies           (load-records (str path "/agency.txt") ->agency)
               :routes             routes
               :routes-map         (into {} (map #(vector (:route-id %) %) routes))
               :stops              (load-records (str path "/stops.txt") ->stop)
               :trips              trips
               :trips-map          (into {} (map #(vector (:trip-id %) %) trips))
               :trips-by-route     (apply merge-with into (map #(hash-map (:route-id %) [%]) trips))
               :stop-times         stop-times
               :stop-times-by-stop (apply merge-with into (map #(hash-map (:stop-id %) [%]) stop-times))}]
    (assoc store :data (make-api-struct store)))) ; <-- here!

(time (def store (load-gtfs "/tmp/gtfs")))
; Loading /tmp/gtfs/routes.txt...
; Loading /tmp/gtfs/trips.txt...
; Loading /tmp/gtfs/stop_times.txt...
; Loading /tmp/gtfs/stops.txt...
; Loading /tmp/gtfs/agency.txt...
; "Elapsed time: 540.088 msecs"
; => #'user/store

(keys store)
; => (:trips-by-route :routes :stops :stop-times-by-stop :trips-map :stop-times :trips :routes-map :agencies :data)

(keys (:data store))
; => 
;   ("PREFECTURE"
;    "SERRET"
;    "LECLERC"
;    "P+R ROCHAMBEAU"
;    "PUYMAIGRE"
;    "GENDARMERIE"
;    "LA MAXE"
;    ...

(count (keys (:data store)))
; => 491

(count (distinct (map :name stops)))
; => 491

Awesome, and it seems OK performance-wise.

Now, how do we use that structure, I can hear you say? It’s just a matter of calling get-in on it.

We’ll get to that in the next section.

REST API

We’re going to build a simple API with compojure, I’m going to name it monmet (my Met’, Le Met’ being the name of the bus network in Metz).

Init the project

Stop your REPL, modify the deps.edn file to put this content in it.

{:paths ["resources" "src"]
 :deps {org.clojure/clojure {:mvn/version "1.10.1"}
        ring/ring {:mvn/version "1.7.1"}
        ring/ring-core {:mvn/version "1.7.1"}
        ring/ring-json {:mvn/version "0.5.0"}
        ring/ring-defaults {:mvn/version "0.3.2"}
        ring/ring-jetty-adapter {:mvn/version "1.7.1"}
        com.rpl/specter {:mvn/version "0.11.2"}
        mount/mount {:mvn/version "0.1.10"}
        compojure/compojure {:mvn/version "1.6.1"}}}

And create the needed directories:

$ mkdir -p src/monmet
$ cd src/monmet
$ touch api.clj data.clj gtfs.clj handler.clj main.clj

Loading

To load our datastore (all the GTFS files) only once, we’re going to use mount. It’s a very simple library, an alternative to component.

Let’s do so by modifying the data.clj file with the following code:

(ns monmet.data
  (:require [monmet.gtfs :as gtfs]
            [mount.core :as mnt]))

(mnt/defstate store :start (gtfs/load-gtfs "/tmp/gtfs"))

(defn store-has-entry?
  "Check if the given entry is a valid one in our datastore."
  [entry]
  (contains? #{:agencies :routes :stops :trips :stop-times} entry))

Here we have defined a state named store. On a call to (mnt/start), it invokes (gtfs/load-gtfs "/tmp/gtfs"), which returns a map holding the content of the various GTFS files. This is our datastore: it is available from every namespace of our own, and it is loaded only once, which matters since loading is an expensive operation.
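
To illustrate the load-once behaviour without any dependency, here is a sketch using a plain Clojure delay instead of mount. This is a different technique, shown only to make the idea tangible; expensive-load is a made-up stand-in for gtfs/load-gtfs:

```clojure
;; Counts how many times the loader actually runs.
(def load-count (atom 0))

(defn expensive-load
  "Stand-in for gtfs/load-gtfs: records each invocation."
  [path]
  (swap! load-count inc)
  {:path path :agencies []})

;; The body of a delay runs on the first deref only; the result is cached.
(def lazy-store (delay (expensive-load "/tmp/gtfs")))

(:path @lazy-store) ;; first deref runs the loader  => "/tmp/gtfs"
(:path @lazy-store) ;; second deref reuses the value => "/tmp/gtfs"
@load-count         ;; => 1
```

Unlike a delay, a mount state can also be stopped and restarted, which is handy at the REPL.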

Routes

We will be using ring and compojure to create the simple REST API we need.

The skeleton looks like this:

(ns monmet.handler
  (:require [compojure.core :refer :all]
            [compojure.handler :as handler]
            [ring.middleware.json :as middleware]
            [ring.util.response :refer [response]]
            [compojure.route :as route]
            [mount.core :as mnt]))

(defroutes app-routes
   (GET "/hello" [] (response {:foo "bar"}))
   (route/not-found "Not Found"))

(def app
  (do
    (mnt/start) ; mount up the datastore
    (-> (handler/site app-routes)
        (middleware/wrap-json-body {:keywords? true}) ; transform fields to clojure keywords
        middleware/wrap-json-response)))

GET /gtfs/:entry

On a GET to /gtfs/:entry, where :entry is one of agencies, routes, stops, trips or stop-times, we return the content of the corresponding GTFS file directly; we’re just serving the CSV files as JSON.

We’ll need two things from our data namespace: store, our datastore holding the precious data, and store-has-entry?, a little helper telling us whether the requested entry really exists.

When we’re done it looks like this:

(ns monmet.handler
  (:require [compojure.core :refer :all]
            [compojure.handler :as handler]
            [ring.middleware.json :as middleware]
            [ring.util.response :refer [response]]
            [compojure.route :as route]
            [mount.core :as mnt]
            [monmet.data :refer [store store-has-entry?]]))

(defroutes app-routes
   ; Expose GTFS files directly, just for fun :)
   (GET "/gtfs/:file" [file]
     (let [entry (keyword file)]      ; transform URL path to keyword
       (if (store-has-entry? entry)   
         (response (entry store))         ; if the entry exists, just serve what's in the datastore
         (route/not-found "Not Found")))) ; otherwise, 404
  (route/not-found "Not Found"))

(def app
  (do
    (mnt/start) ; mount up the datastore
    (-> (handler/site app-routes)
        (middleware/wrap-json-body {:keywords? true})
        middleware/wrap-json-response)))
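
The lookup logic of that route can be tried at the REPL without ring or compojure. Below is a sketch of it as a pure function returning a ring-style response map; gtfs-entry-response, allowed-entries and sample-store are hypothetical names, and the store is a made-up miniature:

```clojure
;; Only these keys of the store may be served; the indexes and :data stay private.
(def allowed-entries #{:agencies :routes :stops :trips :stop-times})

(defn gtfs-entry-response
  "Return a ring-style response map for the requested entry."
  [store file]
  (let [entry (keyword file)] ; URL path segment -> keyword
    (if (contains? allowed-entries entry)
      {:status 200 :body (get store entry)}
      {:status 404 :body "Not Found"})))

;; Made-up miniature store.
(def sample-store
  {:agencies [{:name "SAEML TAMM (Le Met')"}]
   :data     {"SEILLE" {}}})

(gtfs-entry-response sample-store "agencies")
;; => {:status 200, :body [{:name "SAEML TAMM (Le Met')"}]}

(:status (gtfs-entry-response sample-store "data"))
;; => 404, even though :data is a key of the store
```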

Test it works:

Run the program with clj -m monmet.main and use curl

$ curl http://localhost:3000/gtfs/agencies
[
    {
        "name": "SAEML TAMM (Le Met')",
        "url": "https://lemet.fr",
        "tz": "Europe/Paris",
        "lang": "",
        "phone": ""
    }
]

GET /api/...

Bus routes

To make things clearer we’ll create another namespace called api, which is where we really need our gtfs namespace to do some work.

Let’s start with routes. I’d like to return the content of routes.txt enriched with the agency matching each route’s :agency_id field; the first version below simply returns the routes as they are.

(ns monmet.api
  (:require [monmet.gtfs :as gtfs]))

(defn get-routes
  "Retrieve routes."
  [store]
  (:routes store))          

We just need to expose api/get-routes in our compojure defroutes (after adding [monmet.api :as api] to the :require of the handler namespace):

(defroutes app-routes
   ; ...
   (GET "/api/routes" [] (response (api/get-routes store)))
   ; ...
  (route/not-found "Not Found"))

That’s it, test it using curl (or even your browser):

$ curl -X GET http://localhost:3000/api/routes
[
  {
    "route_id": "1-100",
    "route_short_name": "1",
    "route_long_name": "Ligne 1",
    "route_desc": "",
    "route_type": "3",
    "route_url": ""
  },
  {
    "route_id": "2-5",
    "route_short_name": "2",
    "route_long_name": "Ligne 2",
    "route_desc": "",
    "route_type": "3",
    "route_url": ""
  },
   ...
]

Bus stops

Now that our bus routes are returned, we really need some bus stops or we won’t be able to enter the damn bus!

Get back to the api namespace to create two functions: one to retrieve the distinct bus stops sorted by name, and one to retrieve all the bus stops matching a name.

(ns monmet.api
  (:require [monmet.gtfs :as gtfs]))

; ...

(defn get-stops
  "Retrieve the distinct stop names."
  [store]
  (gtfs/distinct-stop-names-with-ids store))

(defn get-stop-with-name
  "Retrieve the stops matching (strict equality) a name."
  [stop-name store]
  (gtfs/stops-with-name stop-name (:stops store)))
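
As a rough illustration of the shapes involved, here is a sketch with hypothetical stand-ins for the two gtfs functions (the starred names and the sample data are made up; the real implementations live in the gtfs namespace and may differ):

```clojure
(defn stops-with-name*
  "Hypothetical stand-in: keep the stops whose :stop_name equals stop-name."
  [stop-name stops]
  (filter #(= stop-name (:stop_name %)) stops))

(defn distinct-stop-names-with-ids*
  "Hypothetical stand-in: [name [ids...]] pairs, sorted by name."
  [store]
  (->> (:stops store)
       (group-by :stop_name)
       (map (fn [[stop-name stops]] [stop-name (mapv :stop_id stops)]))
       (sort-by first)))

;; Hand-made sample, not the real feed
(def sample-stops
  {:stops [{:stop_id "PIERNE01" :stop_name "PIERNE"}
           {:stop_id "19NOV01"  :stop_name "19 NOVEMBRE"}
           {:stop_id "PIERNE02" :stop_name "PIERNE"}]})

(distinct-stop-names-with-ids* sample-stops)
;; => (["19 NOVEMBRE" ["19NOV01"]] ["PIERNE" ["PIERNE01" "PIERNE02"]])
```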

Expose them in the API:

(GET "/api/stops" [] (response (api/get-stops store)))
(GET "/api/stops/:name" [name] (response (api/get-stop-with-name name store)))

Test it:

$ curl -X GET http://localhost:8080/api/stops | jq '.'
[
  [
    "11ème D'AVIATION",
    [
      "11AVIAT1",
      "11AVIAT2"
    ]
  ],
  [
    "19 NOVEMBRE",
    [
      "19NOV01",
      "19NOV02"
    ]
  ],
  ...
]

$ curl -X GET http://localhost:8080/api/stops/PIERNE
[
  {
    "stop_id": "PIERNE01",
    "stop_code": "257",
    "stop_name": "PIERNE",
    "stop_desc": "",
    "stop_lat": "49.102273",
    "stop_lon": "6.179863",
    "location_type": "",
    "parent_station": "http://lemet.fr/screen/index2.php?stop=257"
  },
  {
    "stop_id": "PIERNE02",
    "stop_code": "234",
    "stop_name": "PIERNE",
    "stop_desc": "",
    "stop_lat": "49.102558",
    "stop_lon": "6.179825",
    "location_type": "",
    "parent_station": "http://lemet.fr/screen/index2.php?stop=234"
  }
]

It worked on the first try. It’s not even funny!

Routes passing by a bus stop

I said earlier that I’d like the user to first select the stop, then the route, so we need an endpoint for that; it will be bound to /api/stops/:name/routes.

This endpoint needs to return the routes passing by the specified stop and, for each of these routes, the possible headsigns.

(ns monmet.api
  (:require [monmet.gtfs :as gtfs]))

; ...

(defn direction-names-for-stop-and-route [stop route store]
  (sort (filter not-empty
                (flatten (map keys (get-in store [:data stop route]))))))

(defn route-names-and-directions-for-stop [stop store]
  (map #(hash-map % (direction-names-for-stop-and-route stop % store))
       (route-names-for-stop stop store)))
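
To make the lookup easier to follow, here is a sketch of the nested shape these functions appear to rely on, inferred from the get-in path and the use of keys: stop name, then route name, then a collection of maps keyed by headsign. The sample data and the directions-for name are made up for illustration.

```clojure
;; Assumed layout: {:data {stop {route [{headsign [times...]} ...]}}}
(def sample-data-store
  {:data {"REPUBLIQUE"
          {"Ligne 5" [{"L5a - MAISON NEUVE" [{:arrival_time "05:19:05"}]}
                      {"L5 - FORT MOSELLE"  [{:arrival_time "05:25:00"}]}
                      {""                   []}]}}})

(defn directions-for
  "Same idea as direction-names-for-stop-and-route, using mapcat instead of flatten."
  [stop route store]
  (->> (get-in store [:data stop route])
       (mapcat keys)        ; collect every headsign key
       (filter not-empty)   ; drop blank headsigns
       sort))

(directions-for "REPUBLIQUE" "Ligne 5" sample-data-store)
;; => ("L5 - FORT MOSELLE" "L5a - MAISON NEUVE")
```

An unknown stop or route yields an empty sequence rather than an error, since get-in returns nil for a missing path.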

; ...

Expose this in the REST API:

(GET "/api/stops/:name/routes" [name]
  (response (api/route-names-and-directions-for-stop name store)))

Test it’s working:

$ curl -X GET http://localhost:8080/api/stops/REPUBLIQUE/routes | jq '.'
[
   {
      "name": "Ligne 5",
      "directions": [
         "L5 - FORT MOSELLE",
         "L5a - MAISON NEUVE",
         "L5e - MAGNY PAR RUE AU BOIS",
         "L5f - MAGNY PAR RUE DE POUILLY"
      ],
      "route": {
         "route_id": "5-77",
         "route_short_name": "5",
         "route_long_name": "Ligne 5",
         "route_desc": "",
         "route_type": "3",
         "route_url": ""
      }
   },
   {
      "name": "Proxis 113",
      "directions": [
         "P113 - POLE MULTIMODAL",
         "P113 - POUILLY"
      ],
      "route": {
         "route_id": "113-91",
         "route_short_name": "113",
         "route_long_name": "Proxis 113",
         "route_desc": "",
         "route_type": "3",
         "route_url": ""
      }
   }
]

Working as expected :)

We’re still missing one piece of the puzzle: the timetable.

Times for a trip at a bus stop

Final piece of our API: we need to retrieve the times of day when a bus stops at our favorite bus stop. We also want to query several route+direction pairs at the same stop, because sometimes several headsigns go where you want to go and you don’t care which bus you take.

Open the api namespace and write some code:

(defn times-for-stop-route-and-direction [stop route direction store]
  (map #(assoc % :headsign direction)
       (get (first (filter #(contains? % direction)
                           (get-in store [:data stop route]))) direction)))

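; Requires (:import [java.time LocalDate LocalDateTime]
;                   [java.time.format DateTimeFormatter TextStyle]
;                   [java.util Locale]) in the ns form.
(defn today 
  "Return the keyword for today: Monday -> :mon, and so on."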
  []
  (-> (LocalDate/now) 
      .getDayOfWeek 
      (.getDisplayName TextStyle/SHORT Locale/ENGLISH) 
      .toLowerCase 
      keyword))

(defn now 
  "Return the time now."
  [] 
  (-> (LocalDateTime/now)
      (.format DateTimeFormatter/ISO_LOCAL_TIME)
      (subs 0 8)))

(defn times-for-stop-and-multiple-route-direction [stop routes-directions store]
  (->> (parse-routes-directions routes-directions)
       (map #(times-for-stop-route-and-direction stop (first %) (nth % 1) store))
       (flatten)
       (filter #(trip-for-day? (:trip-id %) (today) (:calendar store) (:trips store)))
       (filter #(pos? (compare (:arrival-time %) (now)))) ; keep times later than now
       (sort gtfs/by-arrival-time)))
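
Comparing HH:MM:SS strings works because zero-padded times sort lexicographically. One thing to keep in mind: on strings, compare returns the difference between the first differing characters, not just -1, 0 or 1, so testing the result with pos? is safer than testing for equality with 1. A small sketch with made-up times (after? and upcoming are hypothetical helper names, and the key is passed in because its exact name is an assumption):

```clojure
(defn after?
  "True when time string a is strictly later than b (both zero-padded HH:MM:SS)."
  [a b]
  (pos? (compare a b)))

(compare "07:00:00" "05:00:00") ; => 2, not 1
(after? "07:00:00" "05:00:00")  ; => true

(defn upcoming
  "Keep the entries whose time is after now-str, earliest first."
  [time-key now-str entries]
  (->> entries
       (filter #(after? (time-key %) now-str))
       (sort-by time-key)))

(upcoming :arrival_time "06:00:00"
          [{:arrival_time "06:22:32"}
           {:arrival_time "05:19:05"}
           {:arrival_time "06:05:32"}])
;; => ({:arrival_time "06:05:32"} {:arrival_time "06:22:32"})
```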

Expose this in the REST API:

(GET "/api/stops/:name/routes/:routes" [name routes]
  (response (api/times-for-stop-and-multiple-route-direction name routes store)))

The format I decided on is Route:::Headsign; multiple route/headsign pairs are separated by ---.
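
parse-routes-directions is not shown in this post. A minimal sketch of what it might look like, assuming pairs are separated by --- and that route and headsign are separated by :::, as in the curl call of this section (the -sketch suffix marks it as hypothetical):

```clojure
(require '[clojure.string :as str])

(defn parse-routes-directions-sketch
  "Split \"Route:::Headsign---Route:::Headsign\" into [route headsign] pairs."
  [s]
  (map #(str/split % #":::" 2) (str/split s #"---")))

(parse-routes-directions-sketch
 "Ligne 5:::L5a - MAISON NEUVE---Ligne 5:::L5 - FORT MOSELLE")
;; => (["Ligne 5" "L5a - MAISON NEUVE"] ["Ligne 5" "L5 - FORT MOSELLE"])
```

Each pair is a two-element vector, which fits the (first %) and (nth % 1) accessors used when fetching the times.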

Let’s URL-encode the route and headsign names using this handy encoder that I just found on Google: http://meyerweb.com/eric/tools/dencoder/.

  • Ligne 5 = Ligne%205
  • L5a - MAISON NEUVE = L5a%20-%20MAISON%20NEUVE
  • L5 - FORT MOSELLE = L5%20-%20FORT%20MOSELLE
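
The same encoding can also be done from the REPL. A small sketch using java.net.URLEncoder, which encodes spaces as + (form encoding), hence the extra replacement; encode-segment is a made-up helper name:

```clojure
(require '[clojure.string :as str])
(import 'java.net.URLEncoder)

(defn encode-segment
  "Percent-encode a string for use in a URL path segment.
  URLEncoder turns spaces into +, so they are rewritten to %20."
  [s]
  (str/replace (URLEncoder/encode s "UTF-8") "+" "%20"))

(encode-segment "Ligne 5")            ; => "Ligne%205"
(encode-segment "L5a - MAISON NEUVE") ; => "L5a%20-%20MAISON%20NEUVE"
```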

The URL to call is: http://localhost:8080/api/stops/PIERNE/routes/Ligne%205:::L5a%20-%20MAISON%20NEUVE---Ligne%205:::L5%20-%20FORT%20MOSELLE

$ curl -X GET 'http://localhost:8080/api/stops/PIERNE/routes/Ligne%205:::L5a%20-%20MAISON%20NEUVE---Ligne%205:::L5%20-%20FORT%20MOSELLE' \
    | jq 'map(.arrival_time + " " + .headsign)' | head -5
[
  "05:19:05 L5a - MAISON NEUVE",
  "05:43:05 L5a - MAISON NEUVE",
  "06:05:32 L5a - MAISON NEUVE",
  "06:22:32 L5a - MAISON NEUVE",

Exciting, isn’t it? These are all the buses I can take every morning to get to the train station :).

Can you spell it?

At first I wanted to create an iOS app to display it, but the iPhone’s Shortcuts app can just fetch the URL and read the result aloud.

Just duplicate the code for the last route we created, add /spell to the URL, and generate a French sentence for it.

(ns monmet.api
  ...)

; ...

(defn make-sentence [{:keys [headsign arrival_time]}]
  (str headsign ", arrivant a " (str/replace (subs arrival_time 0 5) #":" " heures ") "."))

(defn times-for-stop-and-multiple-route-direction-spelled [stop routes-direction store]
  (let [result (times-for-stop-and-multiple-route-direction stop routes-direction store)
        next-3 (map make-sentence (take 3 result))]
    (str "Les 3 prochains bus a l'arret \"" stop "\" sont les suivants.\n\n- " (str/join "\n- " next-3))))

; ...

And the route:

 (GET "/api/stops/:name/routes/:routes/spell" [name routes]
       (response (api/times-for-stop-and-multiple-route-direction-spelled name routes store)))

Then run the server again, and launch ngrok:

$ clj -m monmet.main &
$ ngrok http 8080
ngrok by @inconshreveable                                                                               (Ctrl+C to quit)                                              
Session Status                online                
Session Expires               7 hours, 51 minutes   
Version                       2.3.35                
Region                        United States (us)    
Web Interface                 http://127.0.0.1:4040 
Forwarding                    http://2769085b.ngrok.io -> http://localhost:8080  
Forwarding                    https://2769085b.ngrok.io -> http://localhost:8080

Then just create a shortcut on your iPhone like this:

And finally, ask Siri for it:

BAM.

Done.

I now have three different shortcuts: for when I need to go to the train station or the city center, and for when I’m in the city and want to get back home.

Works from anywhere, every day. I just need to download the GTFS files from time to time (twice a year).

All this in under 250 lines of badly written, non-optimized Clojure 🤠.

Until next time 🤘 !

Alexandre Grison - //grison.me - @algrison