This website was generated with AI. Questions, or spot something off? Reach out to Leonard on the TTPR Slack.

Data Guide

A field guide to the data

You'll work with two separate datasets. Most of this guide covers the honeypot attack logs (one row per attack). The final section covers a completely separate dataset, the ransomware victim list. For each dataset, the guide explains what every column means and where it comes from.

2 Datasets
~90 Honeypot columns
640+ Raw fields in JSONB
20+ Honeypot sensors
Dataset 01

Honeypot attack logs

Every attack on our decoy servers, one per row · table events · this is the bulk of the guide, the next 13 sections.

Start here: four things to know

Read this before you write your first query. It will save you a lot of confusion about empty columns.

  1. The table is sparse on purpose. Each row comes from one honeypot, and every honeypot reports different things. A Cowrie SSH row has a username and password but no HTTP fields; a web row is the opposite. Most columns are NULL for any given row — that's normal, not missing data.
  2. Always start from honeypot_type. It tells you which honeypot made the event, and therefore which other columns will have values. Filter or group by it first.
  3. Nothing is ever lost. The full original record — all 640+ possible fields — is kept in the raw_event JSONB column. If a detail isn't in a typed column, reach into the raw with raw_event->>'fieldname'.
  4. Some fields describe our honeypot, not the attacker. Don't mistake them for attack data — see the box below.
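The first three points can be sketched in a few lines of Python. This is a minimal illustration, not part of the pipeline: it assumes rows have already been fetched from events as dictionaries, and the helper name `field`, the sample rows, and the raw field names `version` and `path` are all hypothetical.

```python
import json
from typing import Any, Optional


def field(row: dict, name: str) -> Optional[Any]:
    """Return the typed column if it is set, otherwise look in raw_event."""
    value = row.get(name)
    if value is not None:
        return value
    raw = row.get("raw_event") or {}
    if isinstance(raw, str):
        # Depending on the database driver, JSONB may arrive as a JSON string.
        raw = json.loads(raw)
    return raw.get(name)


# Hypothetical rows shaped like the events table (most columns are None).
rows = [
    {"honeypot_type": "cowrie", "username": "root", "method": None,
     "raw_event": {"username": "root", "version": "SSH-2.0-Go"}},
    {"honeypot_type": "tanner", "username": None, "method": "GET",
     "raw_event": '{"method": "GET", "path": "/wp-login.php"}'},
]

# Point 2: filter on honeypot_type first, then read the columns it fills.
ssh_rows = [r for r in rows if r["honeypot_type"] == "cowrie"]

# Point 3: a detail with no typed column is still reachable through raw_event.
print(field(ssh_rows[0], "version"))
print(field(rows[1], "path"))
```

A `None` result from `field` means the detail is absent from both the typed column and the raw record, which is expected for columns that belong to a different honeypot.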

Ignore these — they're us, not the attacker

Every event is an attacker connecting to our honeypot, so a handful of fields just describe our own server and the logging pipeline. They're the same on (almost) every row and carry no attacker information, so skip them in analysis. The big trap is geoip_ext: it has the same shape as the GEO columns but holds the location of our server. The attacker's location is always in the plain geoip / GEO columns.

dest_ip · geoip_ext.* · sensor · host · t-pot_hostname · t-pot_ip_ext · t-pot_ip_int · @version
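If you work with raw_event directly, one way to avoid the trap is to drop these fields up front. The sketch below assumes raw_event has been loaded as a Python dict; the function name and sample values are hypothetical, and it handles geoip_ext both as a nested object and as flattened `geoip_ext.`-prefixed keys because the guide does not say which form your export uses.

```python
# Field names taken from the list above; they describe our server, not the attacker.
OUR_SIDE_FIELDS = {
    "dest_ip", "sensor", "host",
    "t-pot_hostname", "t-pot_ip_ext", "t-pot_ip_int", "@version",
}


def attacker_fields(raw_event: dict) -> dict:
    """Return a copy of raw_event without the fields that describe our side."""
    return {
        key: value
        for key, value in raw_event.items()
        if key not in OUR_SIDE_FIELDS
        and key != "geoip_ext"
        and not key.startswith("geoip_ext.")
    }


# Hypothetical raw record using documentation-range IP addresses.
sample = {
    "src_ip": "203.0.113.7",
    "dest_ip": "192.0.2.10",
    "geoip": {"country_name": "Testland"},
    "geoip_ext": {"country_name": "Ourland"},
    "t-pot_hostname": "box-1",
    "@version": "1",
    "username": "admin",
}
cleaned = attacker_fields(sample)
print(sorted(cleaned))
```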
field_name
TYPE | SENSOR

How to read an entry: the first line is the exact column name. The line below it gives the SQL type, followed, where one applies, by the honeypot that populates the column. A SENSITIVE label flags real attacker data to handle with care.

The sensors

T-Pot bundles many honeypots, each emulating a different service. The value in honeypot_type tells you which one caught the event. Here are the ones you'll meet most, and the field groups they feed.

cowrie

Fake SSH & Telnet server. Records login attempts and the commands attackers type after "getting in."

AUTH · DATA

dionaea

Lures malware over SMB, FTP, MSSQL and more, then captures the samples dropped on it.

CONN · MAL

suricata

A network IDS watching all traffic to the honeypot. Emits signature-based alerts.

IDS

tanner / snare

Web-application honeypot that answers HTTP requests and logs what attackers probe for.

WEB

h0neytr4p

HTTP/HTTPS trap focused on exploit and scanner traffic.

WEB

heralding

Credential catcher spanning many protocols (FTP, SMTP, IMAP, …).

AUTH

honeytrap

Generic TCP/UDP trap that grabs payloads and downloads for unknown services.

MAL

sentrypeer

VoIP / SIP honeypot — catches phone-system fraud and recon.

VOIP

conpot

Industrial control systems (SCADA) — emulates PLCs and protocols like Modbus.

ICS

adbhoney

Android Debug Bridge trap — catches IoT/Android malware over ADB.

MISC

p0f · fatt

Passive fingerprinting. p0f guesses the attacker's OS; Fatt extracts TLS/SSH fingerprints.

FP

mailoney · ciscoasa

SMTP email trap and a Cisco ASA appliance trap, respectively.

DATA · WEB

CORE

Identity & timing

Present on every row, whatever the honeypot. This is your backbone for counting, de-duplicating, and plotting over time.

id
bigserial

An auto-incrementing row number Postgres assigns. Unique per row, but just a handle — it has no security meaning.

es_id
text

The original Elasticsearch document ID from T-Pot. Used as the natural key so re-syncing never creates duplicates.

es_index
text

The daily index the event came from, e.g. logstash-2026.06.21. Effectively the calendar day of capture.
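If you need the capture day as a real date, the index name can be parsed directly. This sketch assumes every index follows the `logstash-YYYY.MM.DD` pattern shown in the example above; the function name is hypothetical.

```python
from datetime import date, datetime


def index_day(es_index: str) -> date:
    """Parse the calendar day out of an index name like logstash-2026.06.21."""
    day_part = es_index.rsplit("-", 1)[-1]
    return datetime.strptime(day_part, "%Y.%m.%d").date()


print(index_day("logstash-2026.06.21"))  # prints 2026-06-21
```

An index name that does not match the pattern raises `ValueError`, which is usually what you want when a naming assumption breaks.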

timestamp
timestamptz

When the event happened, parsed to a real UTC timestamp. Use this as your time axis for every chart.

timestamp_raw
text

The original timestamp string before parsing. Kept for reference; prefer timestamp for queries.
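A typical first chart is events per day. The sketch below assumes rows have been fetched with timestamp as a timezone-aware `datetime` (which is how Python drivers commonly return timestamptz); the function name and sample rows are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def events_per_day(rows: list) -> dict:
    """Count events per UTC calendar day, returned in chronological order."""
    counts = Counter()
    for row in rows:
        # Normalize to UTC first so events near midnight land on the right day.
        counts[row["timestamp"].astimezone(timezone.utc).date()] += 1
    return dict(sorted(counts.items()))


# Hypothetical rows; the second one carries a UTC-2 offset.
sample_rows = [
    {"timestamp": datetime(2026, 6, 21, 10, 0, tzinfo=timezone.utc)},
    {"timestamp": datetime(2026, 6, 21, 23, 30,
                           tzinfo=timezone(timedelta(hours=-2)))},
    {"timestamp": datetime(2026, 6, 22, 0, 5, tzinfo=timezone.utc)},
]
daily = events_per_day(sample_rows)
for day, count in daily.items():
    print(day, count)
```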

honeypot_type
text

Which honeypot generated the event (cowrie, dionaea, suricata, …). The single most important field — it decides which other columns are filled in.

sensor
text

Identifier of the honeypot host that captured the event. Useful only if you run more than one box.

NET

Network

Who connected, to which service, over what protocol. The "where from / to what" of every attack.

src_ip
text

The attacker's source IP address. Your primary "who" field — joins to all the GEO columns.

dest_ip
text | our server

The honeypot's own IP that received the connection. This is us, not the attacker — same on every row from a given box. Ignore it for attacker analysis.

src_port
integer

The attacker's source port — usually random/ephemeral.

dest_port
integer

The port hit on the honeypot, e.g. 22 (SSH), 23 (Telnet), 445 (SMB). Tells you which service was targeted.
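To see which services are targeted most, count rows by dest_port. A minimal sketch, assuming rows are dicts fetched from events; the label table contains only the three ports named above, and the function name and sample data are hypothetical.

```python
from collections import Counter

# Only the ports named in this guide; extend as needed.
PORT_LABELS = {22: "SSH", 23: "Telnet", 445: "SMB"}


def targeted_services(rows: list, top: int = 5) -> list:
    """Return the most-hit destination ports as (label, count) pairs."""
    counts = Counter(
        row["dest_port"] for row in rows if row.get("dest_port") is not None
    )
    return [
        (PORT_LABELS.get(port, f"port {port}"), hits)
        for port, hits in counts.most_common(top)
    ]


sample_rows = [{"dest_port": p} for p in (22, 22, 23, None, 8080, 22, 23)]
print(targeted_services(sample_rows))
```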

protocol
text

Application/network protocol label as reported by the honeypot.

proto
text

Lower-level transport (tcp/udp) as seen by some honeypots. Overlaps with protocol; check both.

ip_rep
text

Reputation label for the source IP (e.g. "known attacker", "mass scanner") from T-Pot's enrichment, when available.

GEO

Attacker location (GeoIP)

All derived from the attacker's src_ip (the raw geoip object) using MaxMind GeoIP. Treat as approximate — VPNs, proxies and cloud hosts distort the picture. Don't confuse these with geoip_ext in raw_event, which locates our server.

country_name
text

Full country name guessed from the IP. The most-used grouping field for "where are attacks from."

country_code2 · country_code3
text

ISO country codes — two-letter (US) and three-letter (USA). Handy for map visualizations.

city_name · region_name · region_code
text

City and state/province of the source IP. Often null for cloud/hosting IPs.

continent_code
text

Two-letter continent code (EU, AS, NA…).

latitude · longitude
double

Approximate coordinates of the source IP. Plot these for an attack map.

postal_code · timezone
text

Postal code and IANA timezone of the location.

asn
bigint

Autonomous System Number — the ID of the network the attacker's IP belongs to.

as_org
text

Owner of that network (e.g. "DigitalOcean", "China Telecom"). Excellent for spotting hosting-provider scanners.
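Raw event counts are dominated by a few noisy scanners, so it often helps to count distinct source IPs per group instead. The sketch below assumes rows are dicts fetched from events; the function name and sample rows are hypothetical, and the same function works for country_name or asn.

```python
from collections import defaultdict


def distinct_ips_by(rows: list, key: str) -> list:
    """Count distinct src_ip values per group, largest group first."""
    groups = defaultdict(set)
    for row in rows:
        if row.get(key) is not None and row.get("src_ip"):
            groups[row[key]].add(row["src_ip"])
    # Sort by descending count, then by group name for a stable order.
    return sorted(
        ((name, len(ips)) for name, ips in groups.items()),
        key=lambda item: (-item[1], str(item[0])),
    )


# Hypothetical rows using documentation-range IP addresses.
sample_rows = [
    {"src_ip": "203.0.113.1", "as_org": "DigitalOcean"},
    {"src_ip": "203.0.113.2", "as_org": "DigitalOcean"},
    {"src_ip": "203.0.113.1", "as_org": "DigitalOcean"},
    {"src_ip": "198.51.100.9", "as_org": "China Telecom"},
    {"src_ip": "192.0.2.5", "as_org": None},
]
print(distinct_ips_by(sample_rows, "as_org"))
```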

AUTH

Credentials & sessions

Mostly from Cowrie and Heralding. This is where credential-stuffing patterns live.

username
text | cowrie · heralding

The username the attacker tried to log in with.

password
text | cowrie · heralding | SENSITIVE

The password the attacker tried. These are real credentials seen in the wild — analyze them, but never reuse them anywhere.
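Credential-stuffing patterns show up when you count username and password pairs. A minimal sketch, assuming rows are dicts fetched from events and restricted to the two honeypots named above; the function name and the placeholder credentials are hypothetical.

```python
from collections import Counter

AUTH_HONEYPOTS = {"cowrie", "heralding"}


def top_credentials(rows: list, n: int = 3) -> list:
    """Return the n most frequent (username, password) pairs."""
    pairs = Counter(
        (row["username"], row.get("password"))
        for row in rows
        if row.get("honeypot_type") in AUTH_HONEYPOTS
        and row.get("username") is not None
    )
    return pairs.most_common(n)


# Hypothetical rows with placeholder credentials.
sample_rows = [
    {"honeypot_type": "cowrie", "username": "root", "password": "123456"},
    {"honeypot_type": "heralding", "username": "admin", "password": "admin"},
    {"honeypot_type": "cowrie", "username": "root", "password": "123456"},
    {"honeypot_type": "tanner", "username": None, "password": None},
]
print(top_credentials(sample_rows, n=2))
```

Keep the output inside your analysis environment: the pairs are real attacker-supplied credentials.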

event_id
text

The honeypot's event code, e.g. cowrie.login.failed. Sub-classifies what happened within a session.

session · session_id
text

Identifiers that group all events from one connection. Different honeypots use one or the other.
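Because the identifier lives in one of two columns, grouping by connection needs a fallback. The sketch below assumes rows are dicts fetched from events; it pairs the identifier with honeypot_type as a precaution against two honeypots reusing the same value, which is an assumption rather than something the data guarantees. Function names and sample rows are hypothetical.

```python
from collections import defaultdict


def session_key(row: dict):
    """Return (honeypot_type, session identifier), or None if neither column is set."""
    session = row.get("session") or row.get("session_id")
    if session is None:
        return None
    return (row.get("honeypot_type"), session)


def group_by_session(rows: list) -> dict:
    """Group events that share a connection, skipping rows with no identifier."""
    groups = defaultdict(list)
    for row in rows:
        key = session_key(row)
        if key is not None:
            groups[key].append(row)
    return dict(groups)


sample_rows = [
    {"honeypot_type": "cowrie", "session": "a1", "event_id": "cowrie.login.failed"},
    {"honeypot_type": "cowrie", "session": "a1", "event_id": "cowrie.login.failed"},
    {"honeypot_type": "heralding", "session_id": "h9"},
    {"honeypot_type": "suricata"},
]
sessions = group_by_session(sample_rows)
print({key: len(events) for key, events in sessions.items()})
```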

uuid · auth_id
text

Unique event/session and authentication-attempt IDs, where the honeypot provides them.

DATA

Messages & payloads

The human-readable summary and the raw bytes an attacker sent.

message
text

A human-readable log line for the event. Often the fastest way to see "what happened" at a glance.

data
text

Raw payload or command data the attacker sent — shell commands, SMTP bodies, exploit strings. Can be long.

IDS

Intrusion detection (Suricata)

Suricata watches the network and fires an alert when traffic matches a known-attack signature. These columns describe that alert.

event_type
text | suricata

The Suricata record type, for example alert, flow, dns, http, or tls.

flow_id
bigint

Links all Suricata records that belong to the same network flow.

alert_signature
text

The readable name of the rule that fired (e.g. "ET SCAN Nmap Scripting Engine").

alert_signature_id
bigint

The numeric rule ID (SID) of that signature.

alert_category
text

The rule's category (e.g. "Attempted Information Leak").

alert_severity
integer | 1 = worst

Severity ranking where 1 is most severe. Sort or filter on this to triage.
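A simple triage pass keeps only alert records and sorts them so severity 1 comes first. This is a sketch under the assumption that rows are dicts fetched from events; the function name and the placeholder signature names ("sig B", "sig C") are hypothetical.

```python
def triage(rows: list) -> list:
    """Return Suricata alerts ordered from most severe (1) to least severe."""
    alerts = [
        row for row in rows
        if row.get("honeypot_type") == "suricata"
        and row.get("event_type") == "alert"
        and row.get("alert_severity") is not None
    ]
    # Lower number means more severe; signature name breaks ties.
    return sorted(
        alerts,
        key=lambda row: (row["alert_severity"], row.get("alert_signature") or ""),
    )


sample_rows = [
    {"honeypot_type": "suricata", "event_type": "alert",
     "alert_severity": 3, "alert_signature": "sig C"},
    {"honeypot_type": "suricata", "event_type": "flow", "alert_severity": None},
    {"honeypot_type": "suricata", "event_type": "alert",
     "alert_severity": 1, "alert_signature": "ET SCAN Nmap Scripting Engine"},
    {"honeypot_type": "suricata", "event_type": "alert",
     "alert_severity": 2, "alert_signature": "sig B"},
]
ordered = triage(sample_rows)
print([row["alert_severity"] for row in ordered])
```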

alert_action
text

What the IDS would do — allowed or blocked.

alert_gid · alert_rev
integer

Rule generator ID and revision number. Rule bookkeeping — rarely needed for analysis.

WEB

HTTP & web

From the web honeypots (Tanner, H0neytr4p, NGINX, Ciscoasa). Some fields appear twice under different names because honeypots disagree on naming — check both.

method · request_method
text

The HTTP method — GET, POST, etc.

url · request_uri
text

The path the attacker requested. Watch for exploit paths like /wp-login.php or /../../etc/passwd.

status
integer

The HTTP status code the honeypot returned.

http_user_agent · user_agent
text

The client's User-Agent string. Great for identifying bots and scanner tools.
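Since the same information sits under two names, it is convenient to fold each pair into one value before analysis. A minimal sketch, assuming rows are dicts fetched from events; the output names (`method`, `path`, `user_agent`), the function name, and the sample rows are hypothetical, and which honeypot uses which name is not assumed.

```python
# Output name -> candidate columns, checked in order.
WEB_ALIASES = {
    "method": ("method", "request_method"),
    "path": ("url", "request_uri"),
    "user_agent": ("http_user_agent", "user_agent"),
}


def normalize_web(row: dict) -> dict:
    """Pick the first non-null column from each pair of duplicate names."""
    return {
        name: next(
            (row[col] for col in candidates if row.get(col) is not None), None
        )
        for name, candidates in WEB_ALIASES.items()
    }


sample_rows = [
    {"method": "GET", "url": "/wp-login.php", "http_user_agent": "curl/8.0"},
    {"request_method": "POST", "request_uri": "/login",
     "user_agent": "python-requests/2.31"},
]
normalized = [normalize_web(row) for row in sample_rows]
for item in normalized:
    print(item)
```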

http_referer · http_host
text

The Referer header and the Host the attacker asked for.

scheme · server_protocol
text

http/https and the protocol version (e.g. HTTP/1.1).

bytes_sent · body_bytes_sent
integer

Size of the response, total and body-only.

request_time
double

How long the request took to serve, in seconds.

ssl_protocol · ssl_cipher
text

The TLS version and cipher negotiated, for HTTPS requests.

request_id
text

A per-request identifier.

VOIP

VoIP / SIP (Sentrypeer)

Telephone-system attacks — toll fraud and SIP reconnaissance.

sip_method
text | sentrypeer

The SIP request type, such as REGISTER, INVITE, or OPTIONS.

called_number
text

The phone number the attacker tried to dial. Premium-rate numbers here signal toll fraud.

transport_type · sip_user_agent
text

SIP transport (UDP/TCP/TLS) and the calling client's identifier.

collected_method · event_uuid
text

How the event was captured, and its unique SIP event ID.

CONN

Connections (Dionaea)

How Dionaea's emulated services handled an inbound connection.

connection_type
text | dionaea

What happened to the connection — accept, connect, reject.

connection_transport
text

Transport used — tcp, udp, tls.

connection_protocol
text

The service Dionaea pretended to be, e.g. smb, http, mssqld, or ftp.

src_hostname
text

Reverse-DNS hostname of the source, if it resolved.

MAL

Malware capture (Honeytrap)

Honeytrap accepts connections to unknown services and tries to capture whatever the attacker uploads.

operation_mode
integer | honeytrap

How Honeytrap handled the connection (its internal mode).

is_virtual
boolean

Whether a virtual/emulated service handled it rather than a real proxy.

download_count · download_tries
integer

How many payloads were captured, and how many download attempts were made. A non-zero download_count means at least one payload was captured.

FP

Fingerprinting (p0f & Fatt)

Passive identification of the attacker's system and tooling, without sending anything back.

fatt_protocol
text | fatt

The protocol that Fatt fingerprinted. Fatt extracts JA3 (TLS) and SSH fingerprints that identify client software.

mod
text | p0f

The p0f module/event, e.g. syn or http request.

subject
text

What was fingerprinted — the client or the server side.

link · raw_mtu
text

p0f's guessed link type and observed MTU — clues to the attacker's network path and OS.

ICS

Industrial control (ConPot)

ConPot emulates industrial / SCADA equipment such as PLCs.

sensorid
text | conpot

The ConPot sensor identifier.

data_type
text

The kind of ICS/SCADA data or protocol touched — e.g. modbus, s7comm.

MISC

Everything else

A few cross-cutting fields — including the one that holds all the data.

duration
double | adbhoney

How long a session or command lasted, in seconds.

tags
text[]

An array of enrichment labels T-Pot attached to the event. It's a Postgres array — query with 'label' = ANY(tags).
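Once rows are in Python, the same membership test is a plain `in` check. The sketch assumes the driver returns the array as a Python list and a NULL array as `None`; the function name and the tag values are hypothetical.

```python
def has_tag(row: dict, label: str) -> bool:
    """Python counterpart of 'label' = ANY(tags); a NULL array counts as no match."""
    return label in (row.get("tags") or [])


sample_rows = [
    {"id": 1, "tags": ["example_tag", "other_tag"]},
    {"id": 2, "tags": None},
    {"id": 3, "tags": []},
]
tagged_ids = [row["id"] for row in sample_rows if has_tag(row, "example_tag")]
print(tagged_ids)
```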

raw_event
jsonb | all sensors

The complete original record — every one of the 640+ possible fields. When a typed column is null but you need the detail, it's in here: raw_event->>'fieldname'.

synced_at
timestamptz

When this row was loaded into Postgres. Housekeeping — not attack data. Don't confuse with timestamp.

Dataset 02

Ransomware victims

A completely separate dataset — nothing to do with the honeypot. Victims that ransomware gangs publicly name on their leak sites · table ransomlook_posts · one row per victim post.

TABLE

The ransomware table — ransomlook_posts

Victims posted on ransomware leak sites, collected from the public RansomLook feed. One row per victim post. This data does not come from our honeypots — it's an independent feed, included so you can study the ransomware ecosystem alongside the attack data.

group_name
text

The ransomware group claiming the victim (e.g. lockbit, akira, qilin). Your main grouping field.

post_title
text

The victim organisation as named on the leak site.

discovered
timestamptz

When RansomLook first saw the post. Use as your time axis here.

description
text

Free text from the post — often empty.

link · screen · magnet
text

Link to the post, path to a screenshot, and a torrent magnet for leaked data (when provided). link is often null or duplicated — don't use it as a key.

id
text

A stable hash of group + title + discovered. The unique key that makes re-syncing idempotent.
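To see why a content hash makes re-syncing idempotent, here is a small sketch. The guide does not say which hash function or field separator the pipeline uses, so SHA-256 and the unit-separator character below are assumptions for illustration only, and `post_id` and the sample victim are hypothetical.

```python
import hashlib


def post_id(group_name: str, post_title: str, discovered: str) -> str:
    """Derive a deterministic key from the three identifying fields.

    Assumption: SHA-256 over the fields joined by a separator character.
    The real pipeline may hash differently.
    """
    key = "\x1f".join([group_name, post_title, discovered])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Loading the same post twice lands on the same key, so nothing is duplicated.
table = {}
post = ("lockbit", "Example Corp", "2026-06-21T00:00:00Z")
for _ in range(2):
    table[post_id(*post)] = {"group_name": post[0], "post_title": post[1]}
print(len(table))  # prints 1
```

The separator matters: without it, ("ab", "c") and ("a", "bc") would produce the same input string and therefore the same key.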

TABLE

Sync bookkeeping — sync_state

Not attack data. One row per daily index, tracking what's been pulled so a sync can stop and resume safely. You'll rarely query it: columns are es_index, total_docs, synced_docs, started_at, completed_at, and status (in_progress / complete / error).
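As a rough picture of how resuming works, a sync only needs to revisit the indexes that are not yet complete. This is a guess at the logic from the column list above, not the actual sync code; the function name and sample rows are hypothetical.

```python
def indexes_to_sync(state_rows: list) -> list:
    """Return daily indexes whose sync is unfinished or failed, oldest first."""
    return sorted(
        row["es_index"] for row in state_rows if row["status"] != "complete"
    )


# Hypothetical sync_state rows.
sample_state = [
    {"es_index": "logstash-2026.06.21", "total_docs": 100,
     "synced_docs": 100, "status": "complete"},
    {"es_index": "logstash-2026.06.23", "total_docs": 80,
     "synced_docs": 0, "status": "error"},
    {"es_index": "logstash-2026.06.22", "total_docs": 120,
     "synced_docs": 45, "status": "in_progress"},
]
pending = indexes_to_sync(sample_state)
print(pending)
```

Sorting by name works here because the zero-padded date in the index name sorts chronologically.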