Customers on the Business Plan and above can opt to have their data cached in the Periscope Cache.
When Periscope Data caches a table, all of the table's rows are copied into the cache. Three different cache population strategies keep the cache's version of your data in sync with your databases. Many tables use multiple caching strategies in parallel on different schedules to maximize freshness and minimize drift between the cache and the source database.
This is the most basic and heavyweight strategy. A complete copy of the table is downloaded on a schedule and replaces the version in the cache. Every table starts with this strategy because the cache is initially empty, and a full copy is the most efficient way to transfer all of the rows. Full Updates have high latency, but they are necessary for tables whose existing rows change, and for warehouses that get completely rebuilt as part of nightly processes.
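Conceptually, a Full Update is a complete re-fetch of the table; a minimal sketch, assuming a hypothetical users table (Periscope's internal queries may differ):

```sql
-- Full Update: fetch a complete copy of the table from the origin database
select * from users;
-- once the copy finishes, it replaces the old cached version wholesale
```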
For append-only tables, incremental updates are the obvious choice. As long as the table has a column with a monotonically increasing value (e.g. a numeric created_at timestamp), it's easy to grab the newest rows every few minutes.
The Incremental Updates strategy is frequently paired with Full Updates for tables that are growing and changing at the same time. On a typical users table, this combination of caching strategies gets new users into the cache very quickly, and occasionally updates all of the historical users in case their rows changed.
We grab the newest rows by comparing the maximum value of the import column between the cached and origin tables:
```sql
-- max value in the cache is 123456
select * from users where id > 123456;
```
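The same pattern works with a timestamp import column; a hedged sketch, assuming the cache's newest created_at happens to be midnight on June 1:

```sql
-- max created_at in the cache is '2019-06-01 00:00:00'
select * from users where created_at > '2019-06-01 00:00:00';
```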
The third strategy is for large tables that only update recent rows. This is especially useful when the table is too large to regularly do a Full Update, and is not truly append-only.
We see this most often with tables that hold objects soon to be changed by a background process, like email and transaction logs tables. These rows get marked as successful or failed a few minutes after they are created.
Note: Bounded Updates are only supported on tables with a timestamp import column and the bound must be in terms of the import column.
Every time it's scheduled, the Bounded Updates strategy replaces the most recent N days of rows. The Periscope Cache rows are replaced by rows in the customer database that have an import column value after the start of the bound. With the Bounded Updates strategy, there is a short period of time where the rows have been removed from the Periscope Cache but have not yet been replaced with the updated rows.
For example, consider a configuration that fetches and replaces the last 3 days of rows every day using a created_at import column. This strategy will first remove all rows in the Periscope Cache where created_at is in the last 3 days. It will then import rows from the customer database where created_at is greater than the largest value that remains in the cache.
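The two steps of that example can be sketched as queries. This is an illustrative sketch only: the table names and interval syntax are assumptions, and Periscope's internal queries may differ.

```sql
-- Step 1: drop the bounded window from the cached copy
delete from cached_transactions
where created_at > now() - interval '3 days';

-- Step 2: re-import that window from the customer database
select * from transactions
where created_at > now() - interval '3 days';
```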
With almost all caching strategies, Periscope Data grabs data in batches of 25,000 rows and re-queries the table until all rows have been acquired. To do this, we use an integer or timestamp import column that is unique, non-null, and monotonically increasing.
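Batching over an import column can be sketched as keyset pagination. The batch size comes from the text above; the exact queries and the users/id names are illustrative assumptions:

```sql
-- first batch of 25,000 rows, ordered by the import column
select * from users order by id limit 25000;

-- each subsequent batch resumes after the last id fetched (e.g. 25000)
select * from users where id > 25000 order by id limit 25000;
```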
Common columns that work well as import columns include id columns that auto-increment with new rows, and created_at timestamps that are unique and indexed.
The Periscope Cache population strategies rely on these import columns to find new rows, and can miss rows if the expectations are not met.
New rows are expected to have larger values than old rows. Non-integer IDs and UUIDs usually do not meet this requirement. Thus, import columns must be unique, non-null, and monotonically increasing.
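Before choosing an import column, you can sanity-check it for duplicates and nulls yourself; a minimal sketch, assuming a hypothetical users table with a candidate id column:

```sql
-- both counts should be zero for a usable import column
select count(*) - count(distinct id) as duplicate_ids,
       sum(case when id is null then 1 else 0 end) as null_ids
from users;
```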
Admin users can add new tables to the Periscope Cache, as well as edit current cache settings and remove tables from the cache.
To start caching your tables, navigate to the database menu via the data-burger icon, and then switch to the Cache tab. If you do not see this tab, you may need to reach out to Periscope Support to configure your database for the Cache.
Open the caching menu for a table by clicking the row that corresponds to that table. From the dropdown, select the Caching Strategy that best fits the table, as described in the strategies above.
After selecting a caching strategy, choose an import column from the dropdown. Click the question mark to find out more information on selecting the appropriate import column.
The Full Update of the cache can be set to run at an interval.
Alternatively, the Full Update can be scheduled to run at a specific time (in UTC) every day.
If you have selected an "Incremental" or "Full and Incremental" Caching Strategy, you can set the Incremental Update interval.
Once a table has been configured for the Cache, click "Save" to start the caching process.
If you have selected the "Bounded Updates" Caching Strategy, you will need to select both the update interval and update window. The interval is the frequency at which you want the updates to occur and the window is how far back from the current time you want the rows refreshed.
In the above screenshot, we have set the annotations table to perform bounded updates every 24 hours. When the bounded update runs, it will look back over the update window of 4 days from the current timestamp and reload that data into the cache.
If you need to restart a full update of a table that is currently cached, click the "Reload" button in the Cache menu.
If you no longer want a table on the Cache, you can remove it using the "Delete" button in the Cache menu.
If you would like to temporarily stop caching a table, you can check the "Disabled" box. This will stop any future caches of the table but keep the current cached version of the table stored. The cached version will remain available for querying.