Database
- Overview
- Storing long-term statistics of an entity
- MariaDB
- Managing database size
- Accessing history not shown in the frontend
- References
Overview
Home Assistant stores history in four different ways:
States
The real time state1 (every reported state change) is stored in the database, in three separate tables4:
state_metadata
: Stores theentity_id
of an entity.state_attributes
: Stores the attributes of a state.states
: Stores the state itself, andFOREIGN KEY
references tostate_attributes
onattributes_id
and tostate_metadata
onmetadata_id
.
This means that a state change that only changes the state
of an entity will only update
the states
table, and a state change that only changes the attributes
of an entity will
only update the state_attributes
table. A state change that changes both the attributes
and state
of an entity will update all three tables.
Long term statistics
Home Assistant keeps both long-term and short-term statistics2, with the short-term statistics being downsampled to 5 minute intervals, while the long-term statistics are downsampled to hourly aggregates.
The long-term statistics are stored in three different tables4:
statistics_meta
: Data source metadata, such asunit_of_measurement
andname
.statistics_short_term
: The 5-minute aggregates of data in thestates
table.statistics
: Hourly aggregates of the data instates_short_term
table.
Short-term statistics are purged along with states
, but long-term statistics are never purged
from the database. Since they only store hourly aggregates, only there are only 24 rows per day.
Storing long-term statistics of an entity
For Home Assistant to store the long-term statistics of an entity_id
, the entity must
have state_class
set to one of the valid state classes5:
measurement
: Measurements in current time, for example current energy consumption.total
: The total amount that can both increase and decrease.total_increasing
: Monotonically increasing value, that periodically restarts from 0, for example monthly energy consumption.
If state_class
is set to measurement
, then device_class
must not be energy
, gas
or monetary
.
MariaDB
I'm using MariaDB for my database instead of the default SQLite file, which is
configured in the recorder
integration6 as:
recorder:
db_url: |
mysql://${db_user}:@localhost/${db_name}?charset=utf8mb4&unix_socket=/run/mysqld/mysqld.sock
auto_purge: true
auto_repack: true
Connects as ${db_user}
to the unix socket file in the standard location
/run/mysqld/mysqld.sock
, using 4-byte UTF-8 Unicode encoding.
Managing database size
To get the total size of the database (in MiB
):
SELECT table_schema AS 'DB',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS 'Size in MiB'
FROM information_schema.tables
WHERE table_schema='${db_name}';
The result is the size of data in the database schema ${db_name}
, including
the size used for indices.
It's useful to define as a sensor in Home Assistant with the sql
sensor
platform7, to get the database size as a sensor
entity in Home Assistant:
sql:
- name: mariadb_database_size
query: |
SELECT table_schema AS 'DB',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS 'value'
FROM information_schema.tables
WHERE table_schema='${db_name}';
# Column name from query result used as value for the sensor
column: "value"
device_class: data_size
unit_of_measurement: MiB
The name of column
configuration value needs to match the name that the
returned column is SELECT
'ed AS
in the SQL statement.
Analyze usage
Get an overview of event_types
stored in your database:
SELECT
COUNT(*) as cnt,
COUNT(*) * 100 / (SELECT COUNT(*) FROM events) AS cnt_pct,
event_types.event_type
FROM events
INNER JOIN event_types
ON events.event_type_id = event_types.event_type_id
GROUP BY event_types.event_type
ORDER BY cnt DESC;
Since events typically don't store as much data as states, these entries likely don't consume as much disk space. But they will give you insight into the size of your database.
Select entity_id
s which have a large number of state changes recorded:
SELECT COUNT(*) AS count,
COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS 'count_pct',
states_meta.entity_id
FROM states
INNER JOIN states_meta
ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY count DESC
LIMIT 10;
That will return how many rows an entity_id
s is using in the states
table
in your database as cnt
, representing state changes. The attributes of an
entity are also recorded on each state change, meaning that entities which
record a lot of data using attributes can bloat the states
table.
To inspect how many bytes are used to store state changes per entity_id
:
SELECT COUNT(state_id) AS cnt,
ROUND(SUM(LENGTH(state_attributes.shared_attrs)) / 1024 / 1024, 1) AS mbytes,
states_meta.entity_id
FROM states
LEFT JOIN state_attributes
ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta
ON (states.metadata_id=states_meta.metadata_id)
GROUP BY states.metadata_id
ORDER BY mbytes DESC
LIMIT 10;
This will return how much space in MiB
is used for each entity_id
to store
it's state and attributes as mbytes
.
The JOIN
in the statements aren't very efficient, so it will likely take some
time for the queries to return the result. Ajust the LIMIT
if you want
a longer/shorter list.
Configuring recorder
Then you can use these results to configure the recorder
integration6
to exclude entity_id
s and events that are bloating your database.
recorder:
exclude:
# Exclude a whole domain
domains: []
# Exclude specific entity_ids
entities:
- sensor.time
- sensor.last_boot
# Use shell-like globs as a wildcard to exclude many
entity_globs:
- sensor.uptime_*
- sensor.esphome_radar_*_still_energy
- sensor.*_rx
- sensor.*_tx
# Exclude an event_type
event_types:
- call_service
Note that if you exclude an entity_id
from being stored, it's history will
not be recorded so you won't have any graphs for that entity_id
. If you
exclude an event_type
, you won't have a history of those events.
Home Assistant will by default keep 10 days of short-term history in the
database, unless you disable auto_purge
.
recorder:
auto_purge: true
purge_keep_days: 10
Anything older than purge_keep_days
will still be recorded as downsampled
long-term statistics2, and are still visible in
the usual charts (just as a darker line). This does not apply for any excluded
entities though, as those are not recoreded at all.8
Note that it is no longer recommended to change purge_keep_days
, as per
the release notes for 2023.12.09
where the improved history charts (that include the long-term data) were introduced.
Though it is not clear why that means the short-term retention period should be
changed.
By default auto_repack
is enabled, and it is recommended to not disable it. It runs
every second sunday, after the auto_purge
operation.
recorder:
auto_repack: true
Using MariaDB this will optimize or recreate the events
and states
tables and
can often greatly help keep their size under control. Note that this is a heavy
operation, but safe because MariaDB won't actually commit the changes until it
is finished. Instead it writes to temporary tables, and "swaps" them out for
the old tables when it's done.
Using recorder
services to trim the database
The recoder.purge_entities
service can be used to prune entities.
service: recoder.purge_entities
data:
domains: []
entity_id:
- sensor.last_boot
entity_globs:
- sensor.esphome_radar_*_still_energy
keep_days: 1
You can set keep_days
to keep short-term statistic in the database for some
days if you don't want to completely remove it. This can be very useful in a
script or automation if you want to keep a few days of short-term history
for some entities instead of excluding them completely.
Similarly the recorder.purge
service can execute a repacking-operation:
service: recorder.purge
data:
repack: true
If you have just pruned a large amount of entity_id
s from your databse, a
repack operation can help free up some disk space. Keep in mind that this is
a heavy operation
Delete events
from database
The recorder.purge_entities
service does not delete entries from the events
table and there is no service to delete events
, so we need to use SQL.
DELETE events
FROM events
INNER JOIN event_types
ON events.event_type_id = event_types.event_type_id
WHERE event_types.event_type = @event_type;
Removing multiple event_types
is probably easiest to do with a simple shell
script:
#!/bin/bash
set -e
events="
call_service
ios.entered_background
ios.became_active
ios.became_active
imap_content
"
for item in $events; do
echo "Deleting event_type: '${item}'"
mariadb --verbose hass -e "
DELETE events
FROM events
INNER JOIN event_types
ON events.event_type_id = event_types.event_type_id
WHERE event_types.event_type = '${item}';
commit;
"
done
This can easily be run out of cron as needed.
Accessing history not shown in the frontend
There is a lot of data in Home Assistant that isn't possible to view with the frontend.
A lot of that is data that is stored in attributes, since it's not possible to
view the history or graphs of them. It's also only possible to see the current
location of a device_tracker
or person_entity
, but not a map of location
history. But the location history is stored in the database, since the coordinates
are stored as attributes.
SELECT states_meta.entity_id, state_attributes.shared_attrs, from_unixtime(states.last_updated_ts)
FROM states
LEFT JOIN states_meta
ON (states.metadata_id=states_meta.metadata_id)
LEFT JOIN state_attributes
ON (states.attributes_id=state_attributes.attributes_id)
WHERE last_changed_ts is NULL AND states_meta.entity_id = '${entityid}';
This willl sort and return attributes for $entity_id
sorted by time.
References
Database - Home Assistant configuration, includes an incomplete schema definition.