Customers with the cache add-on have the option of adding their data to the Periscope Cache.
When Periscope Data caches a table, all of the table's rows are copied into the cache. Three different cache population strategies are used to keep the cache's version of a database table in sync with it's database counterpart. Many tables use multiple caching strategies in parallel on different schedules to maximize freshness and minimize differences.
This is the most basic and heavyweight option. A complete copy of the table is downloaded on a schedule and replaces the version in the cache. Every table starts with this strategy because at first the cache is empty and this is the most efficient way of transferring all of the rows. Full updates have high latency, but are necessary for tables that change throughout or for warehouses that get completely rebuilt as part of nightly processes.
For append-only tables, incremental updates are the obvious choice. As long as the table has a column with a monotonically increasing value (e.g. a numeric
created_at timestamp) it's easy to grab the newest rows every few minutes.
The Incremental Updates strategy is frequently paired with Full Updates for tables that are growing and changing at the same time. On a typical
users table, this combination of caching strategies gets new users into the cache very quickly, and occasionally updates all of the historical users in case their rows changed.
We grab the newest rows by comparing the maximum value of the import column between the cached and origin tables:
-- max value in the cache is 123456 select * from users where id > 123456;
The third strategy is for large tables that only update recent rows. This is especially useful when the table is too large to regularly do a Full Update, and is not truly append-only.
We see this most often with tables that hold objects soon to be changed by a background process, like email and transaction logs tables. These rows get marked as successful or failed a few minutes after they are created.
Note: Bounded Updates are only supported on tables with a timestamp import column and the bound must be in terms of the import column.
Every time it's scheduled, the Bounded Updates strategy replaces the most recent N days of rows. The Periscope Cache rows are replaced by rows in the customer database that have an import column value after the start of the bound. With the Bounded Update strategy, there is a short period of time where the rows have been removed from the Periscope Cache but have not been replaced with the updated rows.
For example, a configuration that fetches and replaces the last 3 days of rows every day using a
created_at import column. This strategy will first remove all rows in the periscope cache where
created_at is in the last 3 days. It will then import rows from the customer database where
created_at is greater than the largest value that exist on the cache.
With almost all caching strategies, Periscope Data grabs data in batches of 25,000 rows and re-queries the table until all rows have been acquired. To do this, we use an integer or timestamp import column that is unique, non-null, and monotonically increasing value.
Common columns that work well as import columns include
id columns that are auto-incrementing with new rows, and
created_at timestamps that are unique and indexed.
The Periscope Cache population strategies rely on these import columns to find new rows, and can miss rows if the expectations are not met.
New rows are expected to have larger values than old rows. Non-integer IDs and UUIDs usually do not meet this requirement. Thus, import columns must be:
Admin users can add new tables to the Periscope cache, as well as edit current cache settings and remove tables from the cache.
To start caching tables, navigate to the cache menu via the settings icon in the bottom left corner of the screen. If the cache option is not visible, Periscope Support may need to configure the database for the Cache.
Open the caching menu for a table by clicking the row that corresponds to that table. Select the Caching Strategy from the dropdown that best fits the table as described in the strategies above.
After selecting a caching strategy, choose an import column from the dropdown. Click the question mark to find out more information on selecting the appropriate import column.
The Full Update of the cache can be set to run at an interval.
Alternatively, the Full Update can be scheduled to run at specific time in UTC every day.
If an "Incremental" or "Full and Incremental" Caching Strategy has been selected, the Incremental Update interval can be set.
Once a table has been configured for the Cache, click "Save" to start the caching process.
If a "Bounded Updates" Caching Strategy is selected, both the update interval and update window will need to be selected. The interval is the frequency at which the updates will occur and the window is how far back from the current time the rows will be refreshed.
In the above screenshot, the annotations table has been set to perform bounded updates every 24 hours. When the bounded update runs, it will look back over the update window of 4 days from the current timestamp and reload that data into the cache.
In the scenario that a full update of the table that is currently cached needs to be restarted, the "Reload" icon in the cache menu can be clicked.
If a table is no longer needed on the Cache, it can be removed using the "Delete" button in the Cache menu.
To temporarily stop caching a table, the "Cache Enabled" can be turned off switch. This will stop any future caches of the table but keep the current cached version of the table stored. The cached version will remain available for querying.