An auto-incrementing row number Postgres assigns. Unique per row, but just a handle — it has no security meaning.
Data Guide
A field guide to the data
You'll work with two separate datasets. Most of this guide covers the honeypot attack logs, with one row per attack. The final section covers a completely separate dataset, the ransomware victim list. For each dataset, the guide explains what every column means and where it comes from.
Honeypot attack logs
Start here — three things to know
Read this before you write your first query. It will save you a lot of confusion about empty columns.
- The table is sparse on purpose. Each row comes from one honeypot, and every honeypot reports different things. A Cowrie SSH row has a username and password but no HTTP fields; a web row is the opposite. Most columns are NULL for any given row. That's normal, not missing data.
- Always start from honeypot_type. It tells you which honeypot made the event, and therefore which other columns will have values. Filter or group by it first.
- Nothing is ever lost. The full original record, with all 640+ possible fields, is kept in the raw_event JSONB column. If a detail isn't in a typed column, reach into the raw with raw_event->>'fieldname'.
- Some fields describe our honeypot, not the attacker. Don't mistake them for attack data; see the box below.
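A first query following the advice above might look like this sketch. The table name honeypot_events is an assumption, since this guide does not name the attack-log table; substitute the real one. Only honeypot_type is taken from the guide.

```sql
-- Assumption: the attack-log table is called honeypot_events (hypothetical name).
-- Count events per honeypot to see which sensors dominate the data
-- before deciding which columns are worth looking at.
SELECT honeypot_type,
       count(*) AS events
FROM honeypot_events
GROUP BY honeypot_type
ORDER BY events DESC;
```

Once you know which honeypot types are present, add a WHERE honeypot_type = ... filter to later queries so the NULL columns from other sensors drop out of view.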
Ignore these — they're us, not the attacker
Every event is the attacker connecting to our honeypot, so a handful of fields just describe our own server and the logging pipeline. They're the same on (almost) every row and carry no attacker information — skip them in analysis. The big trap is geoip_ext: it's the location of our server, a mirror of the GEO columns. The attacker's location is always the plain geoip / GEO columns.
How to read an entry: the monospace name is the exact column name, the grey chip is its SQL type, and a teal chip names the honeypot that populates it. A SENSITIVE chip flags real attacker data to handle with care.
The sensors
T-Pot bundles many honeypots, each emulating a different service. The value in honeypot_type tells you which one caught the event. Here are the ones you'll meet most, and the field groups they feed.
cowrie
Fake SSH & Telnet server. Records login attempts and the commands attackers type after "getting in."
dionaea
Lures malware over SMB, FTP, MSSQL and more, then captures the samples dropped on it.
suricata
A network IDS watching all traffic to the honeypot. Emits signature-based alerts.
tanner / snare
Web-application honeypot that answers HTTP requests and logs what attackers probe for.
h0neytr4p
HTTP/HTTPS trap focused on exploit and scanner traffic.
heralding
Credential catcher spanning many protocols (FTP, SMTP, IMAP, …).
honeytrap
Generic TCP/UDP trap that grabs payloads and downloads for unknown services.
sentrypeer
VoIP / SIP honeypot — catches phone-system fraud and recon.
conpot
Industrial control systems (SCADA) — emulates PLCs and protocols like Modbus.
adbhoney
Android Debug Bridge trap — catches IoT/Android malware over ADB.
p0f · fatt
Passive fingerprinting. p0f guesses the attacker's OS; Fatt extracts TLS/SSH fingerprints.
mailoney · ciscoasa
SMTP email trap and a Cisco ASA appliance trap, respectively.
Identity & timing
Present on every row, whatever the honeypot. This is your backbone for counting, de-duplicating, and plotting over time.
The original Elasticsearch document ID from T-Pot. Used as the natural key so re-syncing never creates duplicates.
The daily index the event came from, e.g. logstash-2026.06.21. Effectively the calendar day of capture.
When the event happened, parsed to a real UTC timestamp. Use this as your time axis for every chart.
The original timestamp string before parsing. Kept for reference; prefer timestamp for queries.
Which honeypot generated the event (cowrie, dionaea, suricata, …). The single most important field — it decides which other columns are filled in.
Identifier of the honeypot host that captured the event. Useful only if you run more than one box.
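To use these fields as a time axis, one possible sketch is a daily count per honeypot. The table name honeypot_events is hypothetical; timestamp and honeypot_type are the columns described in this guide.

```sql
-- Assumption: honeypot_events is a hypothetical table name.
-- "timestamp" is quoted because it is also the name of a SQL type.
-- If the column is timestamptz, day boundaries follow the session time zone.
SELECT date_trunc('day', "timestamp") AS day,
       honeypot_type,
       count(*)                       AS events
FROM honeypot_events
GROUP BY 1, 2
ORDER BY 1, 2;
```

Grouping by position (1, 2) avoids any clash between the output alias and an existing column of the same name.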
Network
Who connected, to which service, over what protocol. The "where from / to what" of every attack.
The attacker's source IP address. Your primary "who" field — joins to all the GEO columns.
The honeypot's own IP that received the connection. This is us, not the attacker — same on every row from a given box. Ignore it for attacker analysis.
The attacker's source port — usually random/ephemeral.
The port hit on the honeypot, e.g. 22 (SSH), 23 (Telnet), 445 (SMB). Tells you which service was targeted.
Application/network protocol label as reported by the honeypot.
Lower-level transport (tcp/udp) as seen by some honeypots. Overlaps with protocol; check both.
Reputation label for the source IP (e.g. "known attacker", "mass scanner") from T-Pot's enrichment, when available.
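As an illustration of using src_ip as the primary "who" field, the sketch below ranks the busiest sources. The table name honeypot_events is an assumption; src_ip and honeypot_type come from this guide.

```sql
-- Assumption: honeypot_events is a hypothetical table name.
-- Rank source IPs by event volume and show how many different
-- honeypots each one touched.
SELECT src_ip,
       count(*)                      AS events,
       count(DISTINCT honeypot_type) AS honeypots_hit
FROM honeypot_events
GROUP BY src_ip
ORDER BY events DESC
LIMIT 20;
```

A source that hits many honeypot types is more likely a broad scanner than a targeted attacker, though that is a heuristic rather than a rule.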
Attacker location (GeoIP)
All derived from the attacker's src_ip (the raw geoip object) using MaxMind GeoIP. Treat as approximate — VPNs, proxies and cloud hosts distort the picture. Don't confuse these with geoip_ext in raw_event, which locates our server.
Full country name guessed from the IP. The most-used grouping field for "where are attacks from."
ISO country codes — two-letter (US) and three-letter (USA). Handy for map visualizations.
City and state/province of the source IP. Often null for cloud/hosting IPs.
Two-letter continent (EU, AS, NA…).
Approximate coordinates of the source IP. Plot these for an attack map.
Postal code and IANA timezone of the location.
Autonomous System Number — the ID of the network the attacker's IP belongs to.
Owner of that network (e.g. "DigitalOcean", "China Telecom"). Excellent for spotting hosting-provider scanners.
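The difference between the attacker's location and our own can be checked directly in the raw record. This is a sketch under several assumptions: the table name honeypot_events is hypothetical, and the country_name key inside the geoip and geoip_ext objects is assumed from typical Logstash GeoIP output, not confirmed by this guide.

```sql
-- Assumptions: honeypot_events is a hypothetical table name, and the raw
-- geoip / geoip_ext objects carry a country_name key. Adjust both to
-- match your schema.
SELECT raw_event->'geoip'->>'country_name'     AS attacker_country,
       raw_event->'geoip_ext'->>'country_name' AS our_server_country,
       count(*)                                AS events
FROM honeypot_events
GROUP BY 1, 2
ORDER BY events DESC
LIMIT 20;
```

If the assumptions hold, the second column should show the same value on nearly every row, which is the sign that it describes our server and not the attacker.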
Credentials & sessions
Mostly from Cowrie and Heralding. This is where credential-stuffing patterns live.
The username the attacker tried to log in with.
The password the attacker tried. These are real credentials seen in the wild — analyze them, but never reuse them anywhere.
The honeypot's event code, e.g. cowrie.login.failed. Sub-classifies what happened within a session.
Identifiers that group all events from one connection. Different honeypots use one or the other.
Unique event/session and authentication-attempt IDs, where the honeypot provides them.
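A simple way to surface credential-stuffing patterns is to count the most frequently tried username and password pairs. In this sketch the table name honeypot_events is an assumption; username, password, and honeypot_type are the columns described in this guide.

```sql
-- Assumption: honeypot_events is a hypothetical table name.
-- lower() guards against the honeypot name being stored capitalised.
SELECT username,
       password,
       count(*) AS attempts
FROM honeypot_events
WHERE lower(honeypot_type) = 'cowrie'
  AND username IS NOT NULL
GROUP BY username, password
ORDER BY attempts DESC
LIMIT 25;
```

Swap 'cowrie' for 'heralding' to see credentials tried against the other protocols.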
Messages & payloads
The human-readable summary and the raw bytes an attacker sent.
A human-readable log line for the event. Often the fastest way to see "what happened" at a glance.
Raw payload or command data the attacker sent — shell commands, SMTP bodies, exploit strings. Can be long.
Intrusion detection (Suricata)
Suricata watches the network and fires an alert when traffic matches a known-attack signature. These columns describe that alert.
The Suricata record type: alert, flow, dns, http, tls…
Links all packets belonging to the same network flow.
The readable name of the rule that fired (e.g. "ET SCAN Nmap Scripting Engine").
The numeric rule ID (SID) of that signature.
The rule's category (e.g. "Attempted Information Leak").
Severity ranking where 1 is most severe. Sort or filter on this to triage.
What the IDS would do — allowed or blocked.
Rule generator ID and revision number. Rule bookkeeping — rarely needed for analysis.
HTTP & web
From the web honeypots (Tanner, H0neytr4p, NGINX, Ciscoasa). Some fields appear twice under different names because honeypots disagree on naming — check both.
The HTTP method — GET, POST, etc.
The path the attacker requested. Watch for exploit paths like /wp-login.php or /../../etc/passwd.
The HTTP status code the honeypot returned.
The client's User-Agent string. Great for identifying bots and scanner tools.
The Referer header and the Host the attacker asked for.
http/https and the protocol version (e.g. HTTP/1.1).
Size of the response, total and body-only.
How long the request took to serve, in seconds.
The TLS version and cipher negotiated, for HTTPS requests.
A per-request identifier.
VoIP / SIP (Sentrypeer)
Telephone-system attacks — toll fraud and SIP reconnaissance.
The SIP request type — REGISTER, INVITE, OPTIONS…
The phone number the attacker tried to dial. Premium-rate numbers here signal toll fraud.
SIP transport (UDP/TCP/TLS) and the calling client's identifier.
How the event was captured, and its unique SIP event ID.
Connections (Dionaea)
How Dionaea's emulated services handled an inbound connection.
What happened to the connection — accept, connect, reject.
Transport used — tcp, udp, tls.
The service Dionaea pretended to be — smb, http, mssqld, ftp…
Reverse-DNS hostname of the source, if it resolved.
Malware capture (Honeytrap)
Honeytrap accepts connections to unknown services and tries to capture whatever the attacker uploads.
How Honeytrap handled the connection (its internal mode).
Whether a virtual/emulated service handled it rather than a real proxy.
How many payloads were captured, and how many download attempts were made. Non-zero counts mean a malware sample was pulled.
Fingerprinting (p0f & Fatt)
Passive identification of the attacker's system and tooling, without sending anything back.
The protocol Fatt fingerprinted. Fatt extracts JA3 (TLS) and SSH fingerprints that identify client software.
The p0f module/event, e.g. syn or http request.
What was fingerprinted — the client or the server side.
p0f's guessed link type and observed MTU — clues to the attacker's network path and OS.
Industrial control (ConPot)
ConPot emulates industrial / SCADA equipment such as PLCs.
The ConPot sensor identifier.
The kind of ICS/SCADA data or protocol touched — e.g. modbus, s7comm.
Everything else
A few cross-cutting fields — including the one that holds all the data.
How long a session or command lasted, in seconds.
An array of enrichment labels T-Pot attached to the event. It's a Postgres array — query with 'label' = ANY(tags).
The complete original record — every one of the 640+ possible fields. When a typed column is null but you need the detail, it's in here: raw_event->>'fieldname'.
When this row was loaded into Postgres. Housekeeping — not attack data. Don't confuse with timestamp.
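The two access patterns mentioned above, 'label' = ANY(tags) and raw_event->>'fieldname', can be combined as in the sketch below. The table name honeypot_events is hypothetical, and 'label' and 'fieldname' are placeholders to replace with a real tag and a real raw-record key.

```sql
-- Assumptions: honeypot_events is a hypothetical table name; 'label' and
-- 'fieldname' are placeholders, not real values.

-- 1) Discover which tags exist and how common each one is.
SELECT t.tag,
       count(*) AS events
FROM honeypot_events AS e
CROSS JOIN LATERAL unnest(e.tags) AS t(tag)
GROUP BY t.tag
ORDER BY events DESC;

-- 2) Fetch rows carrying one tag, pulling an untyped detail from the raw record.
--    The ? operator keeps only rows whose raw record has that top-level key.
SELECT honeypot_type,
       src_ip,
       raw_event->>'fieldname' AS detail
FROM honeypot_events
WHERE 'label' = ANY(tags)
  AND raw_event ? 'fieldname'
LIMIT 50;
```

Note that ->> always returns text, so cast the result (for example with ::int) before doing arithmetic or numeric comparisons on it.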
Ransomware victims
The ransomware table — ransomlook_posts
Victims posted on ransomware leak sites, collected from the public RansomLook feed. One row per victim post. This data does not come from our honeypots — it's an independent feed, included so you can study the ransomware ecosystem alongside the attack data.
The ransomware group claiming the victim (e.g. lockbit, akira, qilin). Your main grouping field.
The victim organisation as named on the leak site.
When RansomLook first saw the post. Use as your time axis here.
Free text from the post — often empty.
Link to the post, path to a screenshot, and a torrent magnet for leaked data (when provided). link is often null or duplicated — don't use it as a key.
A stable hash of group + title + discovered. The unique key that makes re-syncing idempotent.
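A monthly victim count per group is a natural first query on this table. The sketch below assumes column names that this guide does not spell out: group_name for the ransomware group and discovered for the first-seen time, with discovered stored as a timestamp type. Check the real names and types (for example with \d ransomlook_posts in psql) before running it.

```sql
-- Assumptions: the columns are named group_name and discovered (hypothetical),
-- and discovered is a timestamp or timestamptz column.
SELECT group_name,
       date_trunc('month', discovered) AS month,
       count(*)                        AS victims
FROM ransomlook_posts
GROUP BY 1, 2
ORDER BY 2, 3 DESC;
```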
Sync bookkeeping — sync_state
Not attack data. One row per daily index, tracking what's been pulled so a sync can stop and resume safely. You'll rarely query it: columns are es_index, total_docs, synced_docs, started_at, completed_at, and status (in_progress / complete / error).
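If you do need to check sync health, a query along these lines lists the daily indexes that are not fully loaded. It uses only the sync_state columns and status values named above.

```sql
-- List daily indexes whose sync is unfinished, failed, or short of documents.
SELECT es_index,
       total_docs,
       synced_docs,
       status,
       started_at,
       completed_at
FROM sync_state
WHERE status <> 'complete'
   OR synced_docs < total_docs
ORDER BY es_index;
```

An empty result suggests every index has been pulled in full; any rows returned mark days whose attack counts may be incomplete.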