Channel: JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index) - Database Administrators Stack Exchange

JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)


I have a table with a receiver column that indicates which account each row relates to. This has led to a lot of duplicated data: one set of data may produce 3 separate rows that are identical except for the receiver column. While redesigning the database, I am considering storing the receivers as an array with a GIN index instead of keeping the current B-Tree index on receiver.

Current table definition:

CREATE TABLE public.actions (
    global_sequence bigint NOT NULL DEFAULT nextval('actions_global_sequence_seq'::regclass),
    time timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    receiver text NOT NULL,
    tx_id text NOT NULL,
    block_num integer NOT NULL,
    contract text NOT NULL,
    action text NOT NULL,
    data jsonb NOT NULL
);

Indexes:

  • "actions_pkey" PRIMARY KEY, btree (global_sequence, time)
  • "actions_time_idx" btree (time DESC)
  • "receiver_idx" btree (receiver)
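
For reference, the indexes listed above would correspond roughly to the statements below. This is a reconstruction from the index listing, not the original DDL, so any options that the listing does not show (tablespace, fillfactor, and so on) are unknown.

```sql
-- Reconstructed from the index listing above; creation options are not known.
ALTER TABLE public.actions
    ADD CONSTRAINT actions_pkey PRIMARY KEY (global_sequence, time);

-- Supports ORDER BY time DESC scans.
CREATE INDEX actions_time_idx ON public.actions USING btree (time DESC);

-- Supports equality lookups on receiver.
CREATE INDEX receiver_idx ON public.actions USING btree (receiver);
```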

Field details:

  • Global sequence is a serially incrementing ID
  • Block number and time are not unique, but are also increasing
  • Global sequence and time together form the primary key, because the data is internally partitioned by time
  • Some receivers have over 1 billion associated actions (each with a unique global_sequence)
  • Average text lengths:
    • Receiver: 12
    • tx_id: 52
    • contract: 12
    • action: 6
    • data: small-medium sized JSONB with action metadata

Cardinality of 3 schema options:

  • Current: sitting at 4.2 billion rows in this table
  • Receiver as array: Would be at approximately 1.8 billion rows
  • Normalized: There would be 3 tables:
    • Actions: 1.8 billion rows
    • Actions_Accounts: 4.2 billion rows
    • Accounts: 500 000 rows
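
To make the two alternative layouts concrete, here is a minimal sketch of how they might be declared. The table names (actions_arr, actions_norm, actions_accounts, accounts), the receivers column, and the account_id surrogate key are all assumptions made for illustration; none of them come from the existing schema.

```sql
-- Option "receiver as array": one row per action, receivers held in a JSONB array.
-- Table and column names here are hypothetical.
CREATE TABLE actions_arr (
    global_sequence bigint      NOT NULL,
    time            timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    receivers       jsonb       NOT NULL,  -- e.g. '["Alpha", "Beta", "Gamma"]'
    tx_id           text        NOT NULL,
    block_num       integer     NOT NULL,
    contract        text        NOT NULL,
    action          text        NOT NULL,
    data            jsonb       NOT NULL,
    PRIMARY KEY (global_sequence, time)
);

-- GIN index so that containment tests on the array can use an index.
CREATE INDEX actions_arr_receivers_idx ON actions_arr USING gin (receivers);

-- Option "normalized": accounts, actions, and a link table between them.
CREATE TABLE accounts (
    account_id integer PRIMARY KEY,        -- hypothetical surrogate key
    name       text    NOT NULL UNIQUE
);

CREATE TABLE actions_norm (
    global_sequence bigint      NOT NULL,
    time            timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    tx_id           text        NOT NULL,
    block_num       integer     NOT NULL,
    contract        text        NOT NULL,
    action          text        NOT NULL,
    data            jsonb       NOT NULL,
    PRIMARY KEY (global_sequence, time)
);

-- One row per (account, action) pair; the composite primary key is a B-Tree index.
CREATE TABLE actions_accounts (
    account_id      integer NOT NULL REFERENCES accounts (account_id),
    global_sequence bigint  NOT NULL,
    PRIMARY KEY (account_id, global_sequence)
);
```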

Common Query:

  • SELECT * FROM actions WHERE receiver = 'Alpha' ORDER BY time DESC LIMIT 100

All columns are required in the query. NULL values do not occur. I believe the joins in the normalized schema would slow the query down, and query speed is the #1 priority.
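
For comparison, the common query might look roughly like this under the two alternative layouts. This is only a sketch: it assumes a hypothetical table actions_arr with a jsonb column receivers holding an array of strings, and hypothetical tables accounts (account_id, name), actions_accounts (account_id, global_sequence) and actions_norm (the action columns without receiver). None of these names exist in the current schema.

```sql
-- Receiver-as-array variant: containment test that a GIN index on receivers can serve.
SELECT *
FROM actions_arr
WHERE receivers @> '["Alpha"]'::jsonb
ORDER BY time DESC
LIMIT 100;

-- Normalized variant: resolve the account, then join through the link table.
SELECT a.*
FROM accounts AS acc
JOIN actions_accounts AS aa ON aa.account_id = acc.account_id
JOIN actions_norm AS a ON a.global_sequence = aa.global_sequence
WHERE acc.name = 'Alpha'
ORDER BY a.time DESC
LIMIT 100;
```

One thing to keep in mind when comparing these: a GIN index does not return rows in sorted order, so in the array variant the ORDER BY time DESC has to be satisfied separately from the index lookup on receivers.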

