Kotlin script to download NYC Yellow Taxi Data

I am trying to do two things:

- Use more of the Java ecosystem
- Use less Python (get out of the goldilocks zone)

So, the natural answer is to use Kotlin ;)

—

I wrote a “throwaway” script to download NYC Yellow Taxi data from here

```kotlin
import java.net.URI
import java.nio.file.Files
import java.nio.file.Paths

fun main() {
    for (year in 2009..2022) {
        for (month in 1..12) {
            // they only have data up to 2022-06
            if (year == 2022 && month > 6) {
                continue
            }
            val uri = "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_${year}-${month.toString().padStart(2, '0')}.parquet"
            val fileName = uri.split("/").last()
            println("${uri} -> ${fileName}")
            // yes, this does not handle exceptions
            // it's a script, YOLO
            // (Files.copy also refuses to overwrite, so delete old files before rerunning)
            URI(uri).toURL().openStream().use { Files.copy(it, Paths.get(fileName)) }
        }
    }
}
```
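If you want the script to survive reruns and the odd missing month, here is a slightly less YOLO sketch. It builds the list of months with `java.time.YearMonth` (whose `toString()` already gives the zero-padded `yyyy-MM` form), skips files that are already on disk, and catches `IOException` per file so one failure doesn't abort the whole loop. The helper names `monthlyFiles` and `downloadAll`, and passing the 2022-06 cutoff in as a parameter, are my own assumptions, not part of the script above.

```kotlin
import java.io.IOException
import java.net.URI
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.time.YearMonth

// Base URL used by the original script.
val BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"

// Hypothetical helper: every (url, fileName) pair from `first` to `last`, inclusive.
// YearMonth.toString() prints ISO "yyyy-MM" (e.g. "2009-01"), so no manual padding is needed.
fun monthlyFiles(first: YearMonth, last: YearMonth): List<Pair<String, String>> =
    generateSequence(first) { it.plusMonths(1) }
        .takeWhile { !it.isAfter(last) }
        .map { ym ->
            val fileName = "yellow_tripdata_${ym}.parquet"
            "$BASE_URL/$fileName" to fileName
        }
        .toList()

// Hypothetical helper: download each file into `dir`, skipping ones that already exist.
// Errors are printed and the loop moves on to the next month.
fun downloadAll(files: List<Pair<String, String>>, dir: Path = Paths.get(".")) {
    for ((url, fileName) in files) {
        val target = dir.resolve(fileName)
        if (Files.exists(target)) {
            println("skip $fileName (already downloaded)")
            continue
        }
        println("$url -> $target")
        try {
            URI(url).toURL().openStream().use { Files.copy(it, target) }
        } catch (e: IOException) {
            // A failed copy can leave a partial file behind; remove it so a rerun retries.
            Files.deleteIfExists(target)
            println("failed $fileName: ${e.message}")
        }
    }
}
```

Calling `downloadAll(monthlyFiles(YearMonth.of(2009, 1), YearMonth.of(2022, 6)))` from `main` covers the same 162 months as the nested loops, without the `continue` special case.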

Some observations about this code:

- I didn’t have to pull in a third-party library like requests, which is what I usually reach for in Python.
- It isn’t much longer than an equivalent Python script, curly braces aside.
- String interpolation (${uri}) is a must-have. I don’t know why Python went so long without f-strings.
- No semicolons is a nice touch.
- Integrated IDE support out of the box? Chef’s kiss.
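On the "no requests" point: since Java 11 the JDK also ships `java.net.http.HttpClient`, which feels closer to requests than `URL.openStream()` and hands you the HTTP status code. A minimal sketch, assuming Java 11+; `buildRequest` and `download` are hypothetical names. One gotcha: `BodyHandlers.ofFile` writes whatever body comes back, error pages included, so the sketch deletes the file on a non-200 response.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption
import java.time.Duration

// One shared client; follows redirects and gives up connecting after 30 seconds.
val client: HttpClient = HttpClient.newBuilder()
    .followRedirects(HttpClient.Redirect.NORMAL)
    .connectTimeout(Duration.ofSeconds(30))
    .build()

// Hypothetical helper: a plain GET request for `url`.
fun buildRequest(url: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(url)).GET().build()

// Hypothetical helper: stream the response body straight to `target`.
// Returns true on HTTP 200; otherwise deletes the error body that was written.
fun download(url: String, target: Path): Boolean {
    val handler = HttpResponse.BodyHandlers.ofFile(
        target,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE,
    )
    val response = client.send(buildRequest(url), handler)
    if (response.statusCode() != 200) {
        Files.deleteIfExists(target)
        return false
    }
    return true
}
```

For a throwaway script the original `openStream()` is perfectly fine; the main gain here is the explicit status check.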

—

Follow-up plans:

- Take DuckDB for a spin using these parquet data files.
- Play with Tantivy and “search indexes”, and see whether Tantivy et al. can replace Solr for certain use cases.
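For the DuckDB follow-up, DuckDB can query the downloaded parquet files in place, and its JDBC driver keeps things on the JVM. A rough sketch, assuming the `org.duckdb:duckdb_jdbc` artifact is on the classpath and the files sit in the working directory; `tripsPerFileSql` and `printTripsPerFile` are hypothetical names. `union_by_name = true` is there as a precaution, since I haven't checked that every monthly file has an identical schema.

```kotlin
import java.sql.DriverManager

// Hypothetical helper: SQL that counts trips per downloaded file.
// read_parquet accepts a glob; filename = true adds the source file as a column,
// and union_by_name = true lines up columns by name if schemas differ between files.
fun tripsPerFileSql(glob: String = "yellow_tripdata_*.parquet"): String = """
    SELECT filename, COUNT(*) AS trips
    FROM read_parquet('$glob', filename = true, union_by_name = true)
    GROUP BY filename
    ORDER BY filename
""".trimIndent()

// Needs the DuckDB JDBC driver at runtime; "jdbc:duckdb:" opens an in-memory database.
fun printTripsPerFile() {
    DriverManager.getConnection("jdbc:duckdb:").use { conn ->
        conn.createStatement().use { stmt ->
            stmt.executeQuery(tripsPerFileSql()).use { rs ->
                while (rs.next()) {
                    println("${rs.getString("filename")}: ${rs.getLong("trips")}")
                }
            }
        }
    }
}
```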

btbytes.com
