Database
- Overview
- Storing long-term statistics of an entity
- MariaDB
- Managing database size
- Accessing history not shown in the frontend
- References
Overview
Home Assistant stores history in four different ways: states, events, short-term statistics, and long-term statistics.
States
The real-time state[1] (every reported state change) is stored in the database, in three separate tables[4]:
- states_meta: Stores the entity_id of an entity.
- state_attributes: Stores the attributes of a state.
- states: Stores the state itself, with FOREIGN KEY references to state_attributes on attributes_id and to states_meta on metadata_id.
In practice, every reported state change inserts a new row into the states table. If the attributes are unchanged, the new row simply references the existing state_attributes row via attributes_id; if the attributes changed as well, a new state_attributes row is inserted too. The states_meta table only gains a row the first time a new entity_id is recorded.
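To see how the three tables fit together, a quick query joining them for the most recent rows can help; a sketch (the LIMIT is arbitrary):
-- Show recent states together with their entity_id and attribute JSON,
-- following the metadata_id and attributes_id foreign keys.
SELECT states_meta.entity_id,
       states.state,
       state_attributes.shared_attrs,
       from_unixtime(states.last_updated_ts) AS last_updated
FROM states
INNER JOIN states_meta
  ON states.metadata_id = states_meta.metadata_id
LEFT JOIN state_attributes
  ON states.attributes_id = state_attributes.attributes_id
ORDER BY states.state_id DESC
LIMIT 10;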
Long term statistics
Home Assistant keeps both long-term and short-term statistics[2]; the short-term statistics are downsampled to 5-minute intervals, while the long-term statistics are downsampled to hourly aggregates.
The long-term statistics are stored in three different tables[4]:
- statistics_meta: Data source metadata, such as unit_of_measurement and name.
- statistics_short_term: The 5-minute aggregates of the data in the states table.
- statistics: Hourly aggregates of the data in the statistics_short_term table.
Short-term statistics are purged along with states, but long-term statistics are never purged
from the database. Since they only store hourly aggregates, there are only 24 rows per day for each entity.
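To peek at the long-term statistics for a single entity, something like the following can be used; a sketch, assuming the current schema where statistics.metadata_id references statistics_meta.id and the hour start is stored as start_ts (the statistic_id is an example):
SELECT from_unixtime(statistics.start_ts) AS hour,
       statistics.mean,
       statistics.min,
       statistics.max
FROM statistics
INNER JOIN statistics_meta
  ON statistics.metadata_id = statistics_meta.id
WHERE statistics_meta.statistic_id = 'sensor.outdoor_temperature'
ORDER BY statistics.start_ts DESC
LIMIT 24;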
Storing long-term statistics of an entity
For Home Assistant to store the long-term statistics of an entity_id, the entity must
have state_class set to one of the valid state classes[5]:
- measurement: Measurements in current time, for example current energy consumption.
- total: The total amount, which can both increase and decrease.
- total_increasing: A monotonically increasing value that periodically restarts from 0, for example monthly energy consumption.
If state_class is set to measurement, then device_class must not be energy, gas
or monetary.
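As an example, a template sensor only gets long-term statistics if it declares a state_class; a minimal sketch (the sensor name and source entity are made up):
template:
  - sensor:
      - name: "Energy meter total"
        unit_of_measurement: "kWh"
        device_class: energy
        # Monotonically increasing total, so long-term statistics are kept
        state_class: total_increasing
        state: "{{ states('sensor.energy_meter_reading') }}"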
MariaDB
I'm using MariaDB for my database instead of the default SQLite file. The connection is
configured in the recorder integration[6] as:
recorder:
  db_url: |
    mysql://${db_user}:@localhost/${db_name}?charset=utf8mb4&unix_socket=/run/mysqld/mysqld.sock
  auto_purge: true
  auto_repack: true
This connects as ${db_user} to the Unix socket file in the standard location
/run/mysqld/mysqld.sock, using the 4-byte UTF-8 character set (utf8mb4).
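Since the URL embeds the user and database name, it can also be kept in secrets.yaml and referenced with !secret; a sketch, with the secret name and values made up:
# secrets.yaml
mariadb_url: "mysql://hass:@localhost/hass?charset=utf8mb4&unix_socket=/run/mysqld/mysqld.sock"

# configuration.yaml
recorder:
  db_url: !secret mariadb_url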
Managing database size
To get the total size of the database (in MiB):
SELECT table_schema AS 'DB',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS 'Size in MiB'
FROM information_schema.tables
WHERE table_schema='${db_name}';
The result is the size of data in the database schema ${db_name}, including
the size used for indices.
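A variation of the same query breaks the size down per table, which quickly shows whether states, state_attributes or events is the biggest consumer:
SELECT table_name AS 'Table',
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS 'Size in MiB'
FROM information_schema.tables
WHERE table_schema='${db_name}'
ORDER BY (data_length + index_length) DESC;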
It's useful to define the total-size query as a sensor with the sql sensor
platform[7], to get the database size as a sensor
entity in Home Assistant:
sql:
  - name: mariadb_database_size
    query: |
      SELECT table_schema AS 'DB',
             ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS 'value'
      FROM information_schema.tables
      WHERE table_schema='${db_name}';
    # Column name from query result used as value for the sensor
    column: "value"
    device_class: data_size
    unit_of_measurement: MiB
The column configuration value needs to match the name that the
returned column is SELECTed AS in the SQL statement.
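With the size available as a sensor, it's easy to get a warning when the database grows past a threshold; a sketch of such an automation (the 2000 MiB limit is arbitrary):
automation:
  - alias: "Warn when the recorder database grows too large"
    trigger:
      - platform: numeric_state
        entity_id: sensor.mariadb_database_size
        above: 2000
    action:
      - service: persistent_notification.create
        data:
          title: "Database size"
          message: "The recorder database is now above 2000 MiB."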
Analyze usage
Get an overview of event_types stored in your database:
SELECT
COUNT(*) as cnt,
COUNT(*) * 100 / (SELECT COUNT(*) FROM events) AS cnt_pct,
event_types.event_type
FROM events
INNER JOIN event_types
ON events.event_type_id = event_types.event_type_id
GROUP BY event_types.event_type
ORDER BY cnt DESC;
Since events typically don't store as much data as states, these entries likely don't consume as much disk space, but the counts still give you insight into what is filling your database.
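To check that assumption, the attached event data can be summed as well; a sketch, assuming a schema where events reference an event_data table via data_id:
SELECT event_types.event_type,
       COUNT(*) AS cnt,
       ROUND(SUM(LENGTH(event_data.shared_data)) / 1024 / 1024, 1) AS mbytes
FROM events
INNER JOIN event_types
  ON events.event_type_id = event_types.event_type_id
LEFT JOIN event_data
  ON events.data_id = event_data.data_id
GROUP BY event_types.event_type
ORDER BY mbytes DESC
LIMIT 10;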
Select entity_ids which have a large number of state changes recorded:
SELECT COUNT(*) AS count,
COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS 'count_pct',
states_meta.entity_id
FROM states
INNER JOIN states_meta
ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY count DESC
LIMIT 10;
That will return how many rows each entity_id is using in the states table
as count, representing state changes. The attributes of an
entity are also recorded on each state change, meaning that entities which
record a lot of data in attributes can bloat the state_attributes table.
To inspect how many bytes are used to store state changes per entity_id:
SELECT COUNT(state_id) AS cnt,
ROUND(SUM(LENGTH(state_attributes.shared_attrs)) / 1024 / 1024, 1) AS mbytes,
states_meta.entity_id
FROM states
LEFT JOIN state_attributes
ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta
ON (states.metadata_id=states_meta.metadata_id)
GROUP BY states.metadata_id
ORDER BY mbytes DESC
LIMIT 10;
This will return how much space in MiB is used for each entity_id to store
its state and attributes, as mbytes.
The JOINs in these statements aren't very efficient, so it will likely take some
time for the queries to return a result. Adjust the LIMIT if you want
a longer or shorter list.
Configuring recorder
Then you can use these results to configure the recorder integration[6]
to exclude entity_ids and events that are bloating your database.
recorder:
  exclude:
    # Exclude a whole domain
    domains: []
    # Exclude specific entity_ids
    entities:
      - sensor.time
      - sensor.last_boot
    # Use shell-like globs as a wildcard to exclude many
    entity_globs:
      - sensor.uptime_*
      - sensor.esphome_radar_*_still_energy
      - sensor.*_rx
      - sensor.*_tx
    # Exclude an event_type
    event_types:
      - call_service
Note that if you exclude an entity_id from being stored, its history will
not be recorded, so you won't have any graphs for that entity_id. If you
exclude an event_type, you won't have a history of those events.
Home Assistant will by default keep 10 days of short-term history in the
database, unless you disable auto_purge.
recorder:
  auto_purge: true
  purge_keep_days: 10
Anything older than purge_keep_days will still be available as downsampled
long-term statistics[2], and is still visible in
the usual charts (just as a darker line). This does not apply to excluded
entities though, as those are not recorded at all.[8]
Note that it is no longer recommended to change purge_keep_days, as per
the release notes for 2023.12.0[9],
where the improved history charts (that include the long-term data) were introduced.
Though it is not clear why that would change the recommended short-term retention period.
By default auto_repack is enabled, and it is recommended not to disable it. It runs
every second Sunday, after the auto_purge operation.
recorder:
  auto_repack: true
With MariaDB this will optimize or recreate the events and states tables, and
can often greatly help keep their size under control. Note that this is a heavy
operation, but a safe one, because MariaDB won't actually commit the changes until it
is finished. Instead it writes to temporary tables and swaps them in for
the old tables when it's done.
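For reference, the same kind of rebuild can be triggered manually with OPTIMIZE TABLE; a sketch, and the table list here is illustrative rather than exactly what the recorder repacks:
-- InnoDB rebuilds these tables via a temporary copy and swaps them in,
-- reclaiming free space; expect it to take a while on large tables.
OPTIMIZE TABLE states, state_attributes, events, event_data;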
Using recorder services to trim the database
The recorder.purge_entities service can be used to prune entities.
service: recorder.purge_entities
data:
  domains: []
  entity_id:
    - sensor.last_boot
  entity_globs:
    - sensor.esphome_radar_*_still_energy
  keep_days: 1
You can set keep_days to keep the short-term data in the database for some
days if you don't want to remove it completely. This can be very useful in a
script or automation if you want to keep a few days of short-term history
for some entities instead of excluding them completely.
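A sketch of such an automation, run weekly (the schedule and entity globs are examples):
automation:
  - alias: "Weekly purge of noisy sensors"
    trigger:
      - platform: time
        at: "04:00:00"
    condition:
      - condition: time
        weekday:
          - sun
    action:
      - service: recorder.purge_entities
        data:
          entity_globs:
            - sensor.esphome_radar_*_still_energy
          keep_days: 2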
Similarly, the recorder.purge service can execute a repack operation:
service: recorder.purge
data:
  repack: true
If you have just pruned a large number of entity_ids from your database, a
repack operation can help free up some disk space. Keep in mind that this is
a heavy operation.
Delete events from database
The recorder.purge_entities service does not delete entries from the events
table and there is no service to delete events, so we need to use SQL.
DELETE events
FROM events
INNER JOIN event_types
ON events.event_type_id = event_types.event_type_id
WHERE event_types.event_type = @event_type;
Removing multiple event_types is probably easiest to do with a simple shell
script:
#!/bin/bash
set -e

events="
call_service
ios.entered_background
ios.became_active
imap_content
"

for item in $events; do
  echo "Deleting event_type: '${item}'"
  mariadb --verbose hass -e "
    DELETE events
    FROM events
    INNER JOIN event_types
      ON events.event_type_id = event_types.event_type_id
    WHERE event_types.event_type = '${item}';
    COMMIT;
  "
done
This can easily be run out of cron as needed.
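For example, a crontab entry along these lines (the script path and schedule are made up):
# m  h  dom mon dow  command
30   3  *   *   0    /usr/local/bin/prune-hass-events.sh >> /var/log/prune-hass-events.log 2>&1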
Accessing history not shown in the frontend
There is a lot of data in Home Assistant that can't be viewed in the frontend.
Much of that is data stored in attributes, since it's not possible to
view their history or graph them. It's also only possible to see the current
location of a device_tracker or person entity, but not a map of location
history. The location history is stored in the database though, since the coordinates
are stored as attributes.
SELECT states_meta.entity_id,
       state_attributes.shared_attrs,
       from_unixtime(states.last_updated_ts) AS time
FROM states
LEFT JOIN states_meta
  ON (states.metadata_id=states_meta.metadata_id)
LEFT JOIN state_attributes
  ON (states.attributes_id=state_attributes.attributes_id)
WHERE states.last_changed_ts IS NULL
  AND states_meta.entity_id = '${entity_id}'
ORDER BY states.last_updated_ts;
This will return the attributes for ${entity_id}, sorted by time.
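Since shared_attrs is JSON, MariaDB's JSON functions can pull out just the coordinates; a sketch using JSON_VALUE (available in MariaDB 10.2.3 and later), assuming the attributes contain latitude and longitude keys:
SELECT from_unixtime(states.last_updated_ts) AS time,
       JSON_VALUE(state_attributes.shared_attrs, '$.latitude') AS latitude,
       JSON_VALUE(state_attributes.shared_attrs, '$.longitude') AS longitude
FROM states
LEFT JOIN states_meta
  ON (states.metadata_id=states_meta.metadata_id)
LEFT JOIN state_attributes
  ON (states.attributes_id=state_attributes.attributes_id)
WHERE states_meta.entity_id = '${entity_id}'
ORDER BY states.last_updated_ts;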
References
Database - Home Assistant configuration, includes an incomplete schema definition.