Update your raw data

The qTest Data Export API returns your daily delta data as a set of CRUD operations in flat parquet files, which you can use to update the historical data in your schema. To update your data, sort your downloaded parquet files and apply the CRUD operations to the historical data in your storage database. This creates a current state for your data that you can use with your BI tools.

We suggest that you have one of your data engineers create a program to automatically download and update the parquet files, as this operation needs to be performed for each daily delta update you receive through the API.

The code examples below are in Python 3.9.7. You can use these samples as reference material to understand how you can update your data, but the details of your infrastructure will likely look different.

Before you start

Download your parquet files through the Data Export API.

Have your database administrator set up a database. You can do this with our sample data, or you can do it once you receive your historical data through the API. You can find the sample data after you log into the Support Hub.

Sort your parquet files

The API sends you tables in multiple parts. To begin converting your data into a usable state, sort the parquet files by file name.

To sort your files, follow these steps:

  1. Import all files for the day, including tables with multiple parts.

    You can download 41 tables through the API.

  2. Sort the parquet files for each table by date, major chunk ID, then minor chunk ID in ascending order. For example, 0001_part_01, 0001_part_02, 0002_part_01, etc.

    Note that the API returns files in the following name format: YYYY-MM-DD/<table_name>/<MajorChunkId>_part_<MinorChunkId>.parquet.

    Tables are split into parts of 10 MB each. For example, 1 GB of data is 1,000 MB, or 100 parts.

    import os

    all_files = os.listdir(path)  # path is the folder that holds one table's parquet parts
    parquet_files = [x for x in all_files if x.endswith('.parquet')]
    # Sort by the digits in each file name: major chunk ID, then minor chunk ID
    parquet_files.sort(key=lambda f: int("".join(filter(str.isdigit, f))))

    Output example:
    ['0000_part_00.parquet',
     '0001_part_00.parquet',
     '0002_part_00.parquet',
     '0011_part_00.parquet']
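
After sorting, you can stack a table's parts into a single DataFrame before applying operations. The sketch below shows one way to do this with pandas; it assumes a parquet engine such as pyarrow is installed and uses a placeholder id column as the table's primary key, so adjust it to your own tables.

    import os
    import pandas as pd

    # Read the sorted parts for one table and stack them into a single DataFrame.
    frames = [pd.read_parquet(os.path.join(path, f)) for f in parquet_files]
    df = pd.concat(frames, ignore_index=True)

    # Index the table by its primary key so upserts and deletes can be matched later.
    # 'id' is a placeholder; use the table's actual primary key column.
    df = df.set_index('id')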

Apply operations to create a current state

The first data delivery from the API is your historical data, which is a snapshot of all of your qTest data up to the present day. Every parquet file after that contains your delta data: CRUD operations that you apply to update your historical data.

After you create a database and sort your parquet files, apply the CRUD operations in each day's delta parquet files to upsert or delete records in your database.

In the parquet files, upsert operations are denoted by u. An upsert is an either/or operation that does one of the following:

  • It inserts the data in your schema if the primary key does not exist.

  • It updates the data in your schema if the primary key does exist.

Delete operations are denoted by d. This operation deletes the record with the primary key.
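
How you apply an upsert depends on your storage layer. As a rough illustration only, the sketch below applies one upsert and one delete to a SQLite table using Python's built-in sqlite3 module; the database file, table name, columns, and primary key are placeholders, and the ON CONFLICT clause requires SQLite 3.24 or later.

    import sqlite3

    conn = sqlite3.connect("qtest_export.db")  # placeholder database
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_runs (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
    )

    # Upsert: insert the record, or update it when the primary key already exists.
    conn.execute(
        """INSERT INTO test_runs (id, name, status) VALUES (:id, :name, :status)
           ON CONFLICT(id) DO UPDATE SET name = excluded.name, status = excluded.status""",
        {"id": 1, "name": "Smoke test", "status": "Passed"},
    )

    # Delete: remove the record with the matching primary key.
    conn.execute("DELETE FROM test_runs WHERE id = ?", (42,))
    conn.commit()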

For specific examples on how to handle upsert statements in common database structures, check out these resources:

To apply operations, follow these steps:

  1. In the parquet file, identify which records are labeled upsert and which are labeled delete.

    # df is one delta parquet file loaded into a DataFrame, typically indexed by the table's primary key
    upsert_df = df[df.operation == 'u']
    delete_df = df[df.operation == 'd']
  2. Out of the upsert records, identify which records do not exist in your database and insert those records into the database.

    # output_df holds the current state of the table (your historical data)
    insert_df = upsert_df[~upsert_df.index.isin(output_df.index)]
    output_df = pd.concat([output_df, insert_df])
  3. Identify the records that already exist in your database and update those records.

    update_df = upsert_df[upsert_df.index.isin(output_df.index)]
    output_df.update(update_df, join='left', overwrite=True)
  4. Remove from your database all rows that the parquet file marks as delete.

    output_df = output_df[~output_df.index.isin(delete_df.index)]
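
Putting the four steps together, one way to run a daily update is sketched below. It assumes your current state lives in a pandas DataFrame (output_df) indexed by the table's primary key, and that id is a placeholder for that key; adapt the logic to whatever storage your database administrator set up.

    import pandas as pd

    def apply_delta(output_df, delta_path, key='id'):
        """Apply one day's delta parquet file to the current state (key is a placeholder)."""
        df = pd.read_parquet(delta_path).set_index(key)
        upsert_df = df[df.operation == 'u'].drop(columns=['operation'])
        delete_df = df[df.operation == 'd']

        # Insert records whose primary key is not yet in the current state.
        output_df = pd.concat([output_df, upsert_df[~upsert_df.index.isin(output_df.index)]])

        # Update records whose primary key already exists.
        output_df.update(upsert_df[upsert_df.index.isin(output_df.index)], overwrite=True)

        # Remove records marked for deletion.
        return output_df[~output_df.index.isin(delete_df.index)]

Run this once per table for each daily delta you download, then write the result back to your database so your BI tools always see the current state.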

After you apply CRUD operations, your BI engineer can use your data by joining it to external data, creating views, or importing it into your BI tool.

What's next

Now that you've set up your tables to update, you can import your data into an external viewing tool, such as Power BI, or you can set up your own custom program.

If you still have questions, make sure you take a look at the FAQ. Or, to learn more about the raw data offered by each field, take a look at the Data Export table schemas.