YAML config

Tables

You can configure multiple table sources in a YAML config file, which also supports more advanced, format-specific table options. For example:

addr:
  # binding address for TCP port that speaks HTTP protocol
  http: 0.0.0.0:8080
  # binding address for TCP port that speaks Postgres wire protocol
  postgres: 0.0.0.0:5432
tables:
  - name: "blogs"
    uri: "test_data/blogs.parquet"

  - name: "ubuntu_ami"
    uri: "test_data/ubuntu-ami.json"
    option:
      format: "json"
      pointer: "/aaData"
      array_encoded: true
    schema:
      columns:
        - name: "zone"
          data_type: "Utf8"
        - name: "name"
          data_type: "Utf8"
        - name: "version"
          data_type: "Utf8"
        - name: "arch"
          data_type: "Utf8"
        - name: "instance_type"
          data_type: "Utf8"
        - name: "release"
          data_type: "Utf8"
        - name: "ami_id"
          data_type: "Utf8"
        - name: "aki_id"
          data_type: "Utf8"

  - name: "spacex_launches"
    uri: "https://api.spacexdata.com/v4/launches"
    option:
      format: "json"

  - name: "github_jobs"
    uri: "https://web.archive.org/web/20210507025928if_/https://jobs.github.com/positions.json"
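
To make the pointer and array_encoded options in the ubuntu_ami table more concrete, here is a minimal Python sketch of the decoding they describe. It assumes that pointer is a JSON Pointer selecting the array of rows, and that array_encoded means each row is a positional array whose values map onto the schema columns in order. The sample document and the helper names (resolve_pointer, decode_rows) are hypothetical; this illustrates the idea and is not ROAPI's implementation.

```python
import json

# Hypothetical sample shaped like the ubuntu_ami source: the rows live under
# the "aaData" key and each row is a positional array, not a JSON object.
SAMPLE = json.loads("""
{
  "aaData": [
    ["us-east-1", "focal", "20.04", "amd64", "hvm:ebs-ssd", "20210401", "ami-0001", "hvm"],
    ["eu-west-1", "bionic", "18.04", "arm64", "hvm:ebs-ssd", "20210402", "ami-0002", "hvm"]
  ]
}
""")

# Column names in the same order as the schema section of the config.
COLUMNS = ["zone", "name", "version", "arch",
           "instance_type", "release", "ami_id", "aki_id"]


def resolve_pointer(doc, pointer):
    """Follow a JSON Pointer such as "/aaData" into a parsed document."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Undo the two JSON Pointer escape sequences, "~1" first.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def decode_rows(doc, pointer, columns):
    """Pair each positional row with the column names, in order."""
    return [dict(zip(columns, row)) for row in resolve_pointer(doc, pointer)]


rows = decode_rows(SAMPLE, "/aaData", COLUMNS)
print(rows[0]["zone"], rows[0]["ami_id"])
```

Because array-encoded rows carry no field names, the explicit schema in the config is what supplies the column names and their order.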

Key value stores

A table source can be loaded into an in-memory key value store if you specify in the config which two columns supply the keys and the values:

kvstores:
  - name: "spacex_launch_name"
    uri: "test_data/spacex_launches.json"
    key: id
    value: name

The above config creates a key value store named spacex_launch_name that lets you look up SpaceX launch names by launch ID.
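
As a rough illustration of such a lookup, the Python sketch below builds the request URL and fetches the value. It assumes ROAPI serves key value stores over HTTP at /api/kv/{store}/{key}, which this section does not state, and that the server listens on port 8080 as in the addr example above. The helper names and the launch ID are placeholders.

```python
from urllib.parse import quote
from urllib.request import urlopen


def kv_lookup_url(base_url, store, key):
    """Build the lookup URL, percent-encoding the store name and the key."""
    # Assumption: the key value route is /api/kv/{store}/{key}.
    return "{}/api/kv/{}/{}".format(
        base_url.rstrip("/"), quote(store, safe=""), quote(key, safe="")
    )


def kv_lookup(base_url, store, key, timeout=5.0):
    """Fetch the value for a key; requires a running ROAPI instance."""
    with urlopen(kv_lookup_url(base_url, store, key), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# "some-launch-id" is a placeholder, not a real SpaceX launch ID.
url = kv_lookup_url("http://localhost:8080", "spacex_launch_name", "some-launch-id")
print(url)
```

Calling kv_lookup with the same arguments against a running server would return the launch name stored for that ID, provided the route assumption holds.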

DataFusion configuration

You can override DataFusion configuration settings by specifying them in the datafusion section of your config file. This allows you to tune the query engine's behavior for your specific use case:

datafusion:
  "execution.collect_statistics": "true"
  "execution.batch_size": "8192"
  "sql_parser.enable_ident_normalization": "true"

The datafusion field accepts a map of configuration key-value pairs in which both keys and values are strings. See the DataFusion configuration documentation for a complete list of available options.
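
Since every value must be a string, numbers and booleans have to be quoted in the YAML. If you generate the config from a script, a small helper along these lines can take care of that. This is a hypothetical sketch: render_datafusion_section is not part of ROAPI, and it relies on JSON string quoting being valid YAML for simple keys and values like these.

```python
import json


def render_datafusion_section(settings):
    """Render settings as a datafusion block with every value quoted as a string."""
    lines = ["datafusion:"]
    for key, value in settings.items():
        if isinstance(value, bool):
            # Write booleans in lowercase, matching the example above.
            text = "true" if value else "false"
        else:
            text = str(value)
        # json.dumps adds the double quotes around both key and value.
        lines.append("  {}: {}".format(json.dumps(key), json.dumps(text)))
    return "\n".join(lines)


section = render_datafusion_section({
    "execution.collect_statistics": True,
    "execution.batch_size": 8192,
})
print(section)
```

The printed block matches the first two lines of the datafusion example above, with 8192 and true emitted as quoted strings rather than as a YAML number and boolean.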

Specify a config file on startup

Use the -c argument to run ROAPI with a specific config file:

roapi -c ./roapi.yml