Inspiration
Don't know where to start? Pick one thread and pull.
A strong project answers one specific, data-grounded question well — and the best ones combine two angles to find a why, not just a what. Each starting point maps to fields in the Data Guide, suggests ways to go deeper, and comes with a query to get you moving. They're grouped below by which dataset they use.
Tools: Metabase is already set up and pointed at the database. You can also export results to CSV and work in Python, Excel, or Tableau — whatever your team is comfortable with.
Honeypot attack data
table: eventsGEO × TIME
Follow the sun
Do attackers from different regions strike at different hours? Combine where an attack comes from with when it lands. Round-the-clock activity points to botnets; a 9-to-5 rhythm hints at humans in a specific time zone.
Take it further
- Map active hours per country — which sources look automated vs. hands-on-keyboard?
- Does a country's rhythm shift over the weeks, or stay constant?
- Do the busiest sources also run 24/7, or just the long-tail ones?
country_nametimestampsrc_ip
-- Attack volume by country and hour of day
SELECT country_name,
date_part('hour', timestamp) AS hour,
count(*) AS events
FROM events
WHERE country_name IS NOT NULL
GROUP BY 1, 2
ORDER BY 1, 2;
AUTH
The password playbook
What's the strategy behind the login attempts? Top passwords are just the start — the interesting part is the pattern: pairings, reuse, and what they reveal about the attacker's source list.
Take it further
- Which username + password pairs repeat (admin/admin, root/123456)?
- Per source IP: many unique passwords (spraying) or the same one everywhere (a shared botnet list)?
- Do the credentials look like vendor defaults for specific devices — cameras, routers, databases?
usernamepasswordsrc_ip
-- The most-used username + password combinations
SELECT username, password, count(*) AS tries
FROM events
WHERE password IS NOT NULL
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 30;
SESSION
What happens after they ‘get in’?
Once an attacker thinks they've logged in, what do they actually do? Some honeypots let attackers ‘succeed’ and record the commands they type. The data and message fields hold the play-by-play.
Take it further
- What's the very first command most attackers run?
- How many immediately try to download a payload (look for
wget, curl)? - Follow a single
session from start to finish and narrate the attack as a story.
datamessagesessionduration
-- Sessions where the attacker tried to download something
SELECT timestamp, src_ip, data
FROM events
WHERE data ILIKE '%wget%' OR data ILIKE '%curl%'
ORDER BY timestamp DESC
LIMIT 50;
NET × IDS
Scanner or sniper?
Can you tell a mass scanner apart from a targeted attacker? Scanners spray many ports shallowly; focused attackers hammer one or two. Cross-reference how many ports each IP touches with how severe its IDS alerts are.
Take it further
- Rank IPs by distinct ports probed — the spray-and-pray crowd.
- Find IPs with few ports but high-severity alerts — the ones actually worth worrying about.
- Do certain countries or networks specialize in one style over the other?
src_ipdest_portalert_severity
-- Per attacker: ports touched, volume, and worst alert
SELECT src_ip,
count(DISTINCT dest_port) AS ports_probed,
count(*) AS events,
min(alert_severity) AS worst_severity
FROM events
GROUP BY 1
ORDER BY ports_probed DESC
LIMIT 30;
TIME
Spikes and storms
What happened on the busiest day? Plot daily volume, find the surge, then explain it. Anomalies are where the stories hide.
Take it further
- When a spike hits, what changed — a new country, a new signature, a new port?
- Do spikes line up with newly disclosed vulnerabilities (CVEs) in the news?
- Is there a weekly rhythm — quieter weekends, busier weekdays?
timestampalert_signaturecountry_namedest_port
-- Daily attack volume -- find the spikes, then drill into one day
SELECT date_trunc('day', timestamp) AS day, count(*) AS events
FROM events
GROUP BY 1
ORDER BY 1;
-- then: add WHERE timestamp::date = 'YYYY-MM-DD' and GROUP BY alert_signature
FP
Fingerprint the toolkit
Which tools are doing the scanning? User-agents and fingerprints give the software away. Some are legitimate research scanners (Shodan, Censys); others are botnet tooling.
Take it further
- What share is known research scanners vs. unknown probes?
- Do specific tools favor specific ports or services?
- Can you separate benign crawlers from genuinely malicious clients?
http_user_agentfatt_protocoldest_port
-- Group scanner traffic into tool families
SELECT CASE
WHEN http_user_agent ILIKE '%zgrab%' THEN 'zgrab'
WHEN http_user_agent ILIKE '%masscan%' THEN 'masscan'
WHEN http_user_agent ILIKE '%censys%' THEN 'censys'
WHEN http_user_agent ILIKE '%shodan%' THEN 'shodan'
ELSE 'other / browser'
END AS tool,
count(*) AS hits
FROM events
WHERE http_user_agent IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC;
Ransomware victim data
table: ransomlook_postsRANSOM × TIME
Rise and fall of the gangs
How do ransomware groups grow, peak, and fade? Each row is a named victim. Tracked across months, the data shows the lifecycle of the whole ecosystem.
Take it further
- Which groups are growing vs. shrinking month over month?
- Spot newcomers that first appeared in the last 60 days.
- Is there seasonality — quieter holidays, end-of-quarter pushes?
group_namediscovered
-- Month-over-month victim counts for each group
SELECT group_name,
date_trunc('month', discovered) AS month,
count(*) AS victims
FROM ransomlook_posts
GROUP BY 1, 2
ORDER BY 2 DESC, 3 DESC;
RANSOM
Who gets targeted?
Which kinds of organizations end up on the leak sites? Victim names hint at sector and geography. A little keyword digging turns raw titles into industries.
Take it further
- Search titles for sectors — health, schools, law, manufacturing.
- Do specific groups specialize in specific industries?
- Any geographic skew you can infer from names or domain endings?
post_titlegroup_name
-- A rough sector read from victim names (healthcare example)
SELECT group_name, count(*) AS victims
FROM ransomlook_posts
WHERE post_title ILIKE '%health%'
OR post_title ILIKE '%hospital%'
OR post_title ILIKE '%clinic%'
GROUP BY 1
ORDER BY 2 DESC;
RANSOM
Profile a single ransomware gang
Pick one group and tell its whole story. Choose an active gang — LockBit, Akira, Qilin, Play, RansomHub — and build a complete profile from the data, then back it with open-source research (news, security blogs, advisories) into a publishable report on that group.
Take it further
- Their activity over time — when did they appear, peak, or go quiet?
- Their targets — which industries, regions, and sizes of organization show up in the victim names?
- Their frequency & cadence — victims per week or month; a steady drip or sudden bursts?
- Pair the numbers with outside research to explain the why, and write it up as a group profile.
group_namediscoveredpost_titledescription
-- First, find exact group names: SELECT DISTINCT group_name FROM ransomlook_posts ORDER BY 1;
-- Then profile one gang's cadence, month by month
SELECT date_trunc('month', discovered) AS month, count(*) AS victims
FROM ransomlook_posts
WHERE group_name = 'akira'
GROUP BY 1
ORDER BY 1;
Combining both datasets
events + ransomlook_postsDEEP
Two windows on one threat
Do the honeypot and the ransomware data tell the same story? The most ambitious projects connect the datasets. Put both on one timeline and look for relationships — this is where original findings come from.
Take it further
- Do attack volumes from a country track that country's ransomware victim counts?
- Do exploit spikes in the honeypot precede ransomware waves?
- Build a combined timeline and let the overlaps raise the next question.
eventsransomlook_poststimestampdiscovered
-- Two series on one daily timeline: attacks vs ransomware posts
SELECT date_trunc('day', timestamp) AS day, 'attacks' AS series, count(*) AS n
FROM events GROUP BY 1
UNION ALL
SELECT date_trunc('day', discovered), 'ransomware', count(*)
FROM ransomlook_posts GROUP BY 1
ORDER BY day;