Data Guide

A field guide to the data

You'll work with two separate datasets. Most of this guide covers the honeypot attack logs, which have one row per attack. The final section covers a completely separate dataset, the ransomware victim list. For each dataset, the guide explains what every column means and where it comes from.

2 Datasets

~90 Honeypot columns

640+ Raw fields in JSONB

20+ Honeypot sensors

Dataset 01

Honeypot attack logs

Every attack on our decoy servers, one per row · table events · this is the bulk of the guide, the next 13 sections.

Start here: four things to know

Read this before you write your first query. It will save you a lot of confusion about empty columns.

The table is sparse on purpose. Each row comes from one honeypot, and every honeypot reports different things. A Cowrie SSH row has a username and password but no HTTP fields; a web row is the opposite. Most columns are NULL for any given row — that's normal, not missing data.
Always start from honeypot_type. It tells you which honeypot made the event, and therefore which other columns will have values. Filter or group by it first.
Nothing is ever lost. The full original record — all 640+ possible fields — is kept in the raw_event JSONB column. If a detail isn't in a typed column, reach into the raw with raw_event->>'fieldname'.
Some fields describe our honeypot, not the attacker. Don't mistake them for attack data — see the box below.
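As a quick illustration of the "start from honeypot_type" advice, here is a minimal sketch that counts events per honeypot and shows how many rows actually fill two of the typed columns. It assumes an in-memory SQLite table as a stand-in for the real Postgres `events` table, and the three rows are invented. The SQL in `QUERY` uses only standard constructs, so it should carry over to Postgres unchanged.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption).
# The rows are invented toy data, not real captures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (honeypot_type TEXT, username TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("cowrie", "root", None),
        ("cowrie", "admin", None),
        ("tanner", None, "/wp-login.php"),
    ],
)

# COUNT(column) skips NULLs, so each figure is the number of rows per
# honeypot where that column carries a value.
QUERY = """
SELECT honeypot_type,
       COUNT(*)        AS n_events,
       COUNT(username) AS with_username,
       COUNT(url)      AS with_url
FROM events
GROUP BY honeypot_type
ORDER BY n_events DESC, honeypot_type
"""

coverage = conn.execute(QUERY).fetchall()
for row in coverage:
    print(row)
```

A zero in `with_username` for a web honeypot is the expected sparseness described above, not a data-quality problem.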

Ignore these — they're us, not the attacker

Every event is the attacker connecting to our honeypot, so a handful of fields just describe our own server and the logging pipeline. They're the same on (almost) every row and carry no attacker information — skip them in analysis. The big trap is geoip_ext: it's the location of our server, a mirror of the GEO columns. The attacker's location is always the plain geoip / GEO columns.

dest_ip geoip_ext.* sensor host t-pot_hostname t-pot_ip_ext t-pot_ip_int @version
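If you pull a raw record into Python, one way to apply the list above is to drop those keys before analysis. This is a sketch under assumptions: the helper name `strip_own_fields` is hypothetical, the sample record is invented, and because the guide does not say whether `geoip_ext` arrives as a nested object or as flattened `geoip_ext.*` keys, the helper handles both.

```python
# Field names copied from the list above. "geoip_ext.*" is treated as a prefix.
OWN_FIELDS = {
    "dest_ip", "sensor", "host",
    "t-pot_hostname", "t-pot_ip_ext", "t-pot_ip_int", "@version",
}
OWN_PREFIXES = ("geoip_ext",)


def strip_own_fields(raw_event: dict) -> dict:
    """Return a copy of raw_event without fields that describe our server."""
    def is_own(key: str) -> bool:
        if key in OWN_FIELDS:
            return True
        # Matches both a nested "geoip_ext" object and flattened "geoip_ext.x" keys.
        return any(key == p or key.startswith(p + ".") for p in OWN_PREFIXES)

    return {key: value for key, value in raw_event.items() if not is_own(key)}


# Invented sample record using documentation-range IP addresses.
sample = {
    "src_ip": "203.0.113.5",
    "dest_ip": "192.0.2.10",
    "geoip": {"country_name": "Exampleland"},
    "geoip_ext": {"country_name": "Homeland"},
    "geoip_ext.ip": "192.0.2.10",
    "t-pot_hostname": "example-sensor",
    "@version": "1",
}

cleaned = strip_own_fields(sample)
print(cleaned)
```

Only `src_ip` and the plain `geoip` object survive, which matches the rule that the attacker's location lives in the plain geoip / GEO columns.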

field_name

TYPE · SENSOR

How to read an entry: the monospace name is the exact column name, the grey chip is its SQL type, and a teal chip names the honeypot that populates it. A SENSITIVE chip flags real attacker data to handle with care.

The sensors

T-Pot bundles many honeypots, each emulating a different service. The value in honeypot_type tells you which one caught the event. Here are the ones you'll meet most, and the field groups they feed.

cowrie

Fake SSH & Telnet server. Records login attempts and the commands attackers type after "getting in."

AUTH · DATA

dionaea

Lures malware over SMB, FTP, MSSQL and more, then captures the samples dropped on it.

CONN · MAL

suricata

A network IDS watching all traffic to the honeypot. Emits signature-based alerts.

IDS

tanner / snare

Web-application honeypot that answers HTTP requests and logs what attackers probe for.

WEB

h0neytr4p

HTTP/HTTPS trap focused on exploit and scanner traffic.

WEB

heralding

Credential catcher spanning many protocols (FTP, SMTP, IMAP, …).

AUTH

honeytrap

Generic TCP/UDP trap that grabs payloads and downloads for unknown services.

MAL

sentrypeer

VoIP / SIP honeypot — catches phone-system fraud and recon.

VOIP

conpot

Industrial control systems (SCADA) — emulates PLCs and protocols like Modbus.

ICS

adbhoney

Android Debug Bridge trap — catches IoT/Android malware over ADB.

MISC

p0f · fatt

Passive fingerprinting. p0f guesses the attacker's OS; Fatt extracts TLS/SSH fingerprints.

mailoney · ciscoasa

SMTP email trap and a Cisco ASA appliance trap, respectively.

DATA · WEB

CORE

Identity & timing

Present on every row, whatever the honeypot. This is your backbone for counting, de-duplicating, and plotting over time.

bigserial

An auto-incrementing row number Postgres assigns. Unique per row, but just a handle — it has no security meaning.

es_id

text

The original Elasticsearch document ID from T-Pot. Used as the natural key so re-syncing never creates duplicates.

es_index

text

The daily index the event came from, e.g. logstash-2026.06.21. Effectively the calendar day of capture.

timestamp

timestamptz

When the event happened, parsed to a real UTC timestamp. Use this as your time axis for every chart.

timestamp_raw

text

The original timestamp string before parsing. Kept for reference; prefer timestamp for queries.

honeypot_type

text

Which honeypot generated the event (cowrie, dionaea, suricata, …). The single most important field — it decides which other columns are filled in.

sensor

text

Identifier of the honeypot host that captured the event. Useful only if you run more than one box.
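To show how these backbone columns combine, the sketch below counts events per calendar day using `timestamp` as the time axis. It assumes an in-memory SQLite table in place of the real `events` table, with invented rows and timestamps stored as UTC text. On Postgres the same `DATE("timestamp")` call should also work, but the day boundary would follow the session time zone, so treat that detail as an assumption to check.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE events (es_id TEXT PRIMARY KEY, "timestamp" TEXT, honeypot_type TEXT)'
)
# Invented toy rows; timestamps are UTC strings.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a1", "2026-06-21 03:15:00", "cowrie"),
        ("a2", "2026-06-21 18:40:00", "suricata"),
        ("a3", "2026-06-22 00:05:00", "cowrie"),
    ],
)

# One row per calendar day. es_id is unique, so COUNT(*) needs no de-duplication.
QUERY = """
SELECT DATE("timestamp") AS day,
       COUNT(*)          AS n_events
FROM events
GROUP BY day
ORDER BY day
"""

daily = conn.execute(QUERY).fetchall()
for row in daily:
    print(row)
```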

NET

Network

Who connected, to which service, over what protocol. The "where from / to what" of every attack.

src_ip

text

The attacker's source IP address. Your primary "who" field — joins to all the GEO columns.

dest_ip

text · our server

The honeypot's own IP that received the connection. This is us, not the attacker — same on every row from a given box. Ignore it for attacker analysis.

src_port

integer

The attacker's source port — usually random/ephemeral.

dest_port

integer

The port hit on the honeypot, e.g. 22 (SSH), 23 (Telnet), 445 (SMB). Tells you which service was targeted.

protocol

text

Application/network protocol label as reported by the honeypot.

proto

text

Lower-level transport (tcp/udp) as seen by some honeypots. Overlaps with protocol; check both.

ip_rep

text

Reputation label for the source IP (e.g. "known attacker", "mass scanner") from T-Pot's enrichment, when available.
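A common first question for the network columns is which services attract the most traffic. The sketch below ranks `dest_port` by hits and by distinct `src_ip`. It runs against an in-memory SQLite stand-in with invented rows (documentation-range IPs), which is an assumption of this example; the query itself uses only standard SQL.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (src_ip TEXT, dest_port INTEGER)")
# Invented toy rows.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("203.0.113.5", 22),
        ("203.0.113.5", 22),
        ("198.51.100.7", 22),
        ("198.51.100.7", 445),
        ("192.0.2.44", None),  # some honeypots report no port
    ],
)

# Hits show volume; distinct sources show how many attackers are behind it.
QUERY = """
SELECT dest_port,
       COUNT(*)               AS hits,
       COUNT(DISTINCT src_ip) AS unique_sources
FROM events
WHERE dest_port IS NOT NULL
GROUP BY dest_port
ORDER BY hits DESC, dest_port
"""

top_ports = conn.execute(QUERY).fetchall()
for row in top_ports:
    print(row)
```

Comparing the two counts helps separate one noisy scanner from a port that many different sources are probing.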

GEO

Attacker location (GeoIP)

All derived from the attacker's src_ip (the raw geoip object) using MaxMind GeoIP. Treat as approximate — VPNs, proxies and cloud hosts distort the picture. Don't confuse these with geoip_ext in raw_event, which locates our server.

country_name

text

Full country name guessed from the IP. The most-used grouping field for "where are attacks from."

country_code2 · country_code3

text

ISO country codes — two-letter (US) and three-letter (USA). Handy for map visualizations.

city_name · region_name · region_code

text

City and state/province of the source IP. Often null for cloud/hosting IPs.

continent_code

text

Two-letter continent (EU, AS, NA…).

latitude · longitude

double

Approximate coordinates of the source IP. Plot these for an attack map.

postal_code · timezone

text

Postal code and IANA timezone of the location.

asn

bigint

Autonomous System Number — the ID of the network the attacker's IP belongs to.

as_org

text

Owner of that network (e.g. "DigitalOcean", "China Telecom"). Excellent for spotting hosting-provider scanners.

AUTH

Credentials & sessions

Mostly from Cowrie and Heralding. This is where credential-stuffing patterns live.

username

text · cowrie · heralding

The username the attacker tried to log in with.

password

text · cowrie · heralding · sensitive

The password the attacker tried. These are real credentials seen in the wild — analyze them, but never reuse them anywhere.

event_id

text

The honeypot's event code, e.g. cowrie.login.failed. Sub-classifies what happened within a session.

session · session_id

text

Identifiers that group all events from one connection. Different honeypots use one or the other.

uuid · auth_id

text

Unique event/session and authentication-attempt IDs, where the honeypot provides them.
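Credential-stuffing patterns usually show up as the same username and password pair repeated many times. A minimal sketch of that count follows, assuming an in-memory SQLite stand-in and invented rows with generic placeholder credentials. `LOWER(honeypot_type)` is a defensive assumption in case the stored values are capitalized differently from the examples in this guide.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (honeypot_type TEXT, username TEXT, password TEXT)")
# Invented toy rows with generic placeholder credentials.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("cowrie", "root", "123456"),
        ("cowrie", "root", "123456"),
        ("heralding", "root", "123456"),
        ("heralding", "admin", "admin"),
        ("tanner", None, None),  # web row: no credentials, as expected
    ],
)

# Most frequently tried username/password pairs across both credential honeypots.
QUERY = """
SELECT username,
       password,
       COUNT(*) AS attempts
FROM events
WHERE LOWER(honeypot_type) IN ('cowrie', 'heralding')
  AND username IS NOT NULL
GROUP BY username, password
ORDER BY attempts DESC, username
LIMIT 10
"""

top_pairs = conn.execute(QUERY).fetchall()
for row in top_pairs:
    print(row)
```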

DATA

Messages & payloads

The human-readable summary and the raw bytes an attacker sent.

message

text

A human-readable log line for the event. Often the fastest way to see "what happened" at a glance.

data

text

Raw payload or command data the attacker sent — shell commands, SMTP bodies, exploit strings. Can be long.

IDS

Intrusion detection (Suricata)

Suricata watches the network and fires an alert when traffic matches a known-attack signature. These columns describe that alert.

event_type

text · suricata

The Suricata record type: alert, flow, dns, http, tls…

flow_id

bigint

Links all packets belonging to the same network flow.

alert_signature

text

The readable name of the rule that fired (e.g. "ET SCAN Nmap Scripting Engine").

alert_signature_id

bigint

The numeric rule ID (SID) of that signature.

alert_category

text

The rule's category (e.g. "Attempted Information Leak").

alert_severity

integer · 1 = worst

Severity ranking where 1 is most severe. Sort or filter on this to triage.

alert_action

text

What the IDS would do — allowed or blocked.

alert_gid · alert_rev

integer

Rule generator ID and revision number. Rule bookkeeping — rarely needed for analysis.
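For triage, the alert columns can be summarized per signature and sorted with the most severe first. The sketch below assumes an in-memory SQLite stand-in; the signature names and severities are invented placeholders, not real rule data.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_type TEXT, alert_signature TEXT, alert_severity INTEGER)"
)
# Invented toy rows; signature names are placeholders.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("alert", "EXAMPLE exploit attempt", 1),
        ("alert", "EXAMPLE exploit attempt", 1),
        ("alert", "EXAMPLE port scan", 3),
        ("flow", None, None),  # non-alert Suricata record
    ],
)

# Keep only alert records, then list signatures with severity 1 (worst) first.
QUERY = """
SELECT alert_signature,
       alert_severity,
       COUNT(*) AS hits
FROM events
WHERE event_type = 'alert'
GROUP BY alert_signature, alert_severity
ORDER BY alert_severity ASC, hits DESC
"""

triage = conn.execute(QUERY).fetchall()
for row in triage:
    print(row)
```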

WEB

HTTP & web

From the web honeypots (Tanner, H0neytr4p, NGINX, Ciscoasa). Some fields appear twice under different names because honeypots disagree on naming — check both.

method · request_method

text

The HTTP method — GET, POST, etc.

url · request_uri

text

The path the attacker requested. Watch for exploit paths like /wp-login.php or /../../etc/passwd.

status

integer

The HTTP status code the honeypot returned.

http_user_agent · user_agent

text

The client's User-Agent string. Great for identifying bots and scanner tools.

http_referer · http_host

text

The Referer header and the Host the attacker asked for.

scheme · server_protocol

text

http/https and the protocol version (e.g. HTTP/1.1).

bytes_sent · body_bytes_sent

integer

Size of the response, total and body-only.

request_time

double

How long the request took to serve, in seconds.

ssl_protocol · ssl_cipher

text

The TLS version and cipher negotiated, for HTTPS requests.

request_id

text

A per-request identifier.
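Because the same information can sit under two column names, `COALESCE` is a convenient way to "check both" in one query. This sketch assumes an in-memory SQLite stand-in, and the assignment of naming styles to particular honeypots in the toy rows is invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "honeypot_type TEXT, method TEXT, request_method TEXT, url TEXT, request_uri TEXT)"
)
# Invented toy rows: which honeypot fills which column pair is an assumption.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        ("tanner", "GET", None, "/wp-login.php", None),
        ("h0neytr4p", None, "GET", None, "/wp-login.php"),
        ("h0neytr4p", None, "POST", None, "/login"),
    ],
)

# COALESCE returns the first non-NULL argument, merging the duplicate columns.
QUERY = """
SELECT COALESCE(method, request_method) AS http_method,
       COALESCE(url, request_uri)       AS path,
       COUNT(*)                         AS hits
FROM events
WHERE COALESCE(url, request_uri) IS NOT NULL
GROUP BY http_method, path
ORDER BY hits DESC, path
"""

top_paths = conn.execute(QUERY).fetchall()
for row in top_paths:
    print(row)
```

The same pattern applies to `http_user_agent` and `user_agent`.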

VOIP

VoIP / SIP (Sentrypeer)

Telephone-system attacks — toll fraud and SIP reconnaissance.

sip_method

text · sentrypeer

The SIP request type — REGISTER, INVITE, OPTIONS…

called_number

text

The phone number the attacker tried to dial. Premium-rate numbers here signal toll fraud.

transport_type · sip_user_agent

text

SIP transport (UDP/TCP/TLS) and the calling client's identifier.

collected_method · event_uuid

text

How the event was captured, and its unique SIP event ID.

CONN

Connections (Dionaea)

How Dionaea's emulated services handled an inbound connection.

connection_type

text · dionaea

What happened to the connection — accept, connect, reject.

connection_transport

text

Transport used — tcp, udp, tls.

connection_protocol

text

The service Dionaea pretended to be — smb, http, mssqld, ftp…

src_hostname

text

Reverse-DNS hostname of the source, if it resolved.

MAL

Malware capture (Honeytrap)

Honeytrap accepts connections to unknown services and tries to capture whatever the attacker uploads.

operation_mode

integer · honeytrap

How Honeytrap handled the connection (its internal mode).

is_virtual

boolean

Whether a virtual/emulated service handled it rather than a real proxy.

download_count · download_tries

integer

How many payloads were captured, and how many download attempts were made. Non-zero counts mean a malware sample was pulled.

Fingerprinting (p0f & Fatt)

Passive identification of the attacker's system and tooling, without sending anything back.

fatt_protocol

text · fatt

Protocol fingerprinted by Fatt — extracts JA3 (TLS) and SSH fingerprints that identify client software.

mod

text · p0f

The p0f module/event, e.g. syn or http request.

subject

text

What was fingerprinted — the client or the server side.

link · raw_mtu

text

p0f's guessed link type and observed MTU — clues to the attacker's network path and OS.

ICS

Industrial control (ConPot)

ConPot emulates industrial / SCADA equipment such as PLCs.

sensorid

text · conpot

The ConPot sensor identifier.

data_type

text

The kind of ICS/SCADA data or protocol touched — e.g. modbus, s7comm.

MISC

Everything else

A few cross-cutting fields — including the one that holds all the data.

duration

double · adbhoney

How long a session or command lasted, in seconds.

Ransomware victims

A completely separate dataset — nothing to do with the honeypot. Victims that ransomware gangs publicly name on their leak sites · table ransomlook_posts · one row per victim post.

TABLE

The ransomware table — `ransomlook_posts`

Victims posted on ransomware leak sites, collected from the public RansomLook feed. One row per victim post. This data does not come from our honeypots — it's an independent feed, included so you can study the ransomware ecosystem alongside the attack data.

group_name

text

The ransomware group claiming the victim (e.g. lockbit, akira, qilin). Your main grouping field.

post_title

text

The victim organisation as named on the leak site.

discovered

timestamptz

When RansomLook first saw the post. Use as your time axis here.

description

text

Free text from the post — often empty.

link · screen · magnet

text

Link to the post, path to a screenshot, and a torrent magnet for leaked data (when provided). link is often null or duplicated — don't use it as a key.

text

A stable hash of group + title + discovered. The unique key that makes re-syncing idempotent.
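A typical starting query for this table groups by `group_name` and uses `discovered` for the time range. The sketch below assumes an in-memory SQLite stand-in for `ransomlook_posts`; the group names are the examples given above, while the victim titles and dates are invented.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ransomlook_posts (group_name TEXT, post_title TEXT, discovered TEXT)"
)
# Invented toy rows; victim names and dates are placeholders.
conn.executemany(
    "INSERT INTO ransomlook_posts VALUES (?, ?, ?)",
    [
        ("lockbit", "Example Org A", "2026-01-03 10:00:00"),
        ("lockbit", "Example Org B", "2026-02-11 09:30:00"),
        ("akira", "Example Org C", "2026-01-20 12:00:00"),
    ],
)

# Victim posts per group, with the first and last time a post was seen.
QUERY = """
SELECT group_name,
       COUNT(*)        AS victims,
       MIN(discovered) AS first_seen,
       MAX(discovered) AS last_seen
FROM ransomlook_posts
GROUP BY group_name
ORDER BY victims DESC, group_name
"""

by_group = conn.execute(QUERY).fetchall()
for row in by_group:
    print(row)
```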

TABLE

Sync bookkeeping — `sync_state`

Not attack data. One row per daily index, tracking what's been pulled so a sync can stop and resume safely. You'll rarely query it: columns are es_index, total_docs, synced_docs, started_at, completed_at, and status (in_progress / complete / error).
