Quickstart
This quickstart guides you through trying Optimus out quickly, without getting into too much detail. It walks you through starting the Optimus server, connecting the Optimus client to the server, creating a BigQuery resource through Optimus, creating a BigQuery-to-BigQuery job, and deploying it.
Prerequisites
- Docker or a local installation of Optimus.
- Postgres database (a Docker-based example is shown below this list).
- BigQuery project
- Airflow
- This is not mandatory to complete the quick start, but needed for scheduling jobs.
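If you do not have a Postgres instance handy, a throwaway one can be spun up with Docker; the container name, credentials, and database name below are placeholders that simply match the DSN used later in this guide:
$ docker run -d --name optimus-db \
    -e POSTGRES_USER=dbuser \
    -e POSTGRES_PASSWORD=dbpassword \
    -e POSTGRES_DB=dbname \
    -p 5432:5432 postgres:14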
Step 1: Start Server
Start the server with GOTO’s BigQuery-to-BigQuery plugin.
Create a config.yaml file:
version: 1
log:
  level: debug
serve:
  port: 9100
  host: localhost
  ingress_host: localhost:9100
  app_key:
  db:
    dsn: postgres://<dbuser:dbpassword>@localhost:5432/dbname?sslmode=disable
plugin:
  artifacts:
    - https://github.com/goto/transformers/releases/download/v0.3.15/transformers_0.3.15_macos_x86_64.tar.gz
Note: make sure you use the artifact link that suits your system.
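For instance, if your server runs on Linux instead of macOS, pick the corresponding archive from the same release page. The asset name below is only illustrative, so check the release assets for the exact file name:
plugin:
  artifacts:
    - https://github.com/goto/transformers/releases/download/v0.3.15/transformers_0.3.15_linux_x86_64.tar.gz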
Start Server
With the config.yaml available in the same working directory, you can start the server by running:
$ optimus serve --install-plugins
This will automatically install the plugins as specified in your server configuration.
Step 2: Connect Client With Server
Go to the directory where you want to keep your Optimus specifications. Create the client configuration by using the optimus init command. An interactive questionnaire will be presented, such as the one below:
$ optimus init
? What is the Optimus service host? localhost:9100
? What is the Optimus project name? sample_project
? What is the namespace name? sample_namespace
? What is the type of data store for this namespace? bigquery
? Do you want to add another namespace? No
Client config is initialized successfully
After running the init command, the Optimus client config is created. Along with it, directories for the chosen namespaces, including subdirectories for jobs and resources, are created with the following structure:
sample_project
├── sample_namespace
│   ├── jobs
│   └── resources
└── optimus.yaml
Below is the client configuration that has been generated:
version: 1
log:
  level: INFO
  format: ""
host: localhost:9100
project:
  name: sample_project
  config: {}
namespaces:
- name: sample_namespace
  config: {}
  job:
    path: sample_namespace/jobs
  datastore:
  - type: bigquery
    path: sample_namespace/resources
    backup: {}
Let’s add the storage_path project configuration, which is needed to store the result of job compilation, and scheduler_host, which is needed for compilation.
project:
  name: sample_project
  config:
    storage_path: file:///Users/sample_user/optimus/sample_project/compiled
    scheduler_host: http://sample-host
Note: the storage path is the location from which Airflow reads its DAGs.
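If your Airflow instance reads DAGs from an object storage bucket rather than a local directory, storage_path can point to the bucket instead; the bucket name below is illustrative, and the supported schemes are described in the Optimus documentation:
config:
  storage_path: gs://sample-bucket/optimus/dags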
Now, let's register sample_project and sample_namespace to your Optimus server.
$ optimus project register --with-namespaces
You can verify if the project has been registered successfully by running this command:
$ optimus project describe
Step 3: Create BigQuery resource
Before creating BigQuery resources, make sure your Optimus server has access to your BQ project by adding a BQ_SERVICE_ACCOUNT secret.
Assuming you have your service account JSON file in the same directory (the project directory), create the secret using the following command. Make sure the service account you are using is authorized to create tables.
$ optimus secret set BQ_SERVICE_ACCOUNT --file service_account.json
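(If you do not have a key file yet, one way to create one is with the gcloud CLI; the service account email below is illustrative.)
$ gcloud iam service-accounts keys create service_account.json \
    --iam-account=optimus@sample-project.iam.gserviceaccount.com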
Check whether the secret has been registered successfully by running this command.
$ optimus secret list
Now, let’s create a resource using the following interactive command.
$ optimus resource create
? Please choose the namespace: sample_namespace
? What is the resource name? sample-project.sample_namespace.table1
? What is the resource type? table
? Provide new directory name to create for this spec? [sample_namespace/resources] sample-project.sample_namespace.table1
Resource spec [sample-project.sample_namespace.table1] is created successfully
Note: the resource name should be unique within the project. Take a look at the complete guide on how to create a resource here if needed.
After running the command, the resource specification file will be automatically created in the following directory:
sample_project
├── sample_namespace
│   ├── jobs
│   └── resources
│       └── sample-project.sample_namespace.table1
│           └── resource.yaml
└── optimus.yaml
Let’s open the resource.yaml file and add additional spec details as follows:
version: 1
name: sample-project.sample_namespace.table1
type: table
labels: {}
spec:
  description: "sample optimus quick start table"
  schema:
    - name: sample_day
      type: STRING
      mode: NULLABLE
    - name: sample_timestamp
      type: TIMESTAMP
      mode: NULLABLE
Now that the resource specification is complete, let’s deploy it to the Optimus server, which will create the resource in BigQuery.
$ optimus resource upload-all --verbose
> Validating namespaces
namespace validation finished!
> Uploading all resources for namespaces [sample_namespace]
> Deploying bigquery resources for namespace [sample_namespace]
> Receiving responses:
[success] sample-project.sample_namespace.table1
resources with namespace [sample_namespace] are deployed successfully
finished uploading resource specifications to server!
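If you have the bq command-line tool installed and authenticated against the same project, you can optionally double-check that the table now exists:
$ bq show --schema --format=prettyjson sample-project:sample_namespace.table1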
Step 4: Create & Deploy Job
Sync plugins to your local machine so that Optimus can provide an interactive UI for adding jobs. This is a prerequisite before creating any jobs.
$ optimus plugin sync
Let’s verify that the plugin has been synced properly by running the command below.
$ optimus version
You should find the bq2bq plugin in the list of discovered plugins.
To create a job, we need to provide a job specification. Let’s create one using the interactive optimus job create command.
$ optimus job create
? Please choose the namespace: sample_namespace
? Provide new directory name to create for this spec? [.] sample-project.sample_namespace.table1
? What is the job name? sample-project.sample_namespace.table1
? Who is the owner of this job? sample_owner
? Select task to run? bq2bq
? Specify the schedule start date 2023-01-26
? Specify the schedule interval (in crontab notation) 0 2 * * *
? Window truncate to: d
? Window offset: 0
? Window size: 24h
? Project ID sample-project
? Dataset Name sample_namespace
? Table ID table1
? Load method to use on destination REPLACE
Job successfully created at sample-project.sample_namespace.table1
Note: take a look at the details of job creation here.
After running the job create command, the job specification file and assets directory are created with the following structure.
├── sample_namespace
│   ├── jobs
│   │   └── sample-project.sample_namespace.table1
│   │       ├── assets
│   │       │   └── query.sql
│   │       └── job.yaml
│   └── resources
│       └── sample-project.sample_namespace.table1
│           └── resource.yaml
└── optimus.yaml
For a BQ2BQ job, the core transformation logic lies in assets/query.sql. Let’s modify the query to the following script:
SELECT
  FORMAT_TIMESTAMP('%A', CAST("{{ .DSTART }}" AS TIMESTAMP)) AS `sample_day`,
  CAST("{{ .DSTART }}" AS TIMESTAMP) AS `sample_timestamp`;
Note: take a look at Optimus’ supported macros here.
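To make the macro concrete: with the window configured earlier (size 24h, offset 0, truncated to day), a run scheduled for 2023-01-27 02:00 UTC covers the previous day, so {{ .DSTART }} would resolve to the start of 2023-01-26. The rendered query would then look roughly like the sketch below; the exact timestamp format is illustrative:
SELECT
  FORMAT_TIMESTAMP('%A', CAST("2023-01-26T00:00:00Z" AS TIMESTAMP)) AS `sample_day`,
  CAST("2023-01-26T00:00:00Z" AS TIMESTAMP) AS `sample_timestamp`;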
Let’s also verify the generated job.yaml file.
version: 1
name: sample-project.sample_namespace.table1
owner: sample_owner
schedule:
  start_date: "2023-01-26"
  interval: 0 2 * * *
behavior:
  depends_on_past: false
task:
  name: bq2bq
  config:
    DATASET: sample_namespace
    LOAD_METHOD: REPLACE
    PROJECT: sample-project
    SQL_TYPE: STANDARD
    TABLE: table1
  window:
    size: 24h
    offset: "0"
    truncate_to: d
labels:
  orchestrator: optimus
hooks: []
dependencies: []
For this quick start, we are not adding any hooks, dependencies, or alert configurations. Take a look at the details of job specification and the possible options here.
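For illustration only, a static upstream dependency would be declared under dependencies, roughly like the sketch below; the upstream job name is hypothetical, and the exact schema is covered in the job specification guide:
dependencies:
  - job: sample-project.sample_namespace.another_table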
Before proceeding, let’s add the BQ_SERVICE_ACCOUNT secret to the task configuration.
task:
  name: bq2bq
  config:
    BQ_SERVICE_ACCOUNT: "{{.secret.BQ_SERVICE_ACCOUNT}}"
    DATASET: sample_namespace
    ...
Later, you can avoid specifying the secret in every single job specification by adding it to the parent yaml specification instead. For more details, you can take a look here.
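As a rough sketch, assuming the parent specification is the this.yaml file that Optimus merges into the job specifications beneath it, the shared task config could look like this:
task:
  config:
    BQ_SERVICE_ACCOUNT: "{{.secret.BQ_SERVICE_ACCOUNT}}"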
Now that the job specification has been prepared, let’s add it to the server by running this command:
$ optimus job replace-all --verbose
> Validating namespaces
validation finished!
> Replacing all jobs for namespaces [sample_namespace]
> Receiving responses:
[sample_namespace] received 1 job specs
[sample_namespace] found 1 new, 0 modified, and 0 deleted job specs
[sample_namespace] processing job sample-project.sample_namespace.table1
[sample_namespace] successfully added 1 jobs
replace all job specifications finished!
The above command will try to add or modify all job specifications found in your project. Registering a single job through the Optimus CLI is not supported, but it is possible to do so using the API.
Now that the job has been registered to Optimus, let’s compile and upload it to the scheduler using the following command.
$ optimus scheduler upload-all
The command will compile your job specifications into DAG files. The result is stored in the storage_path location you specified when configuring the optimus.yaml file.
Later, once you have Airflow ready and want to try it out, this directory can be used as the source of DAGs to be scheduled by Airflow.
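For a local Airflow setup, one simple option (assuming the storage_path configured earlier) is to point Airflow’s DAGs folder at the compiled directory, either via dags_folder in airflow.cfg or through the equivalent environment variable:
$ export AIRFLOW__CORE__DAGS_FOLDER=/Users/sample_user/optimus/sample_project/compiled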