In my work with cybersecurity logs, such as Microsoft Sentinel Signin Logs or Email Events, I often faced the challenge of sharing data for analysis, training, or troubleshooting without compromising sensitive information. Logs can contain personally identifiable information (PII), proprietary data, or even sensitive patterns that need to be anonymized before use outside a secure environment.
Existing solutions for log anonymization were too limited, too complex, or not flexible enough for the diverse needs of my workflows. I envisioned a tool that would let users easily anonymize logs and even generate new ones to simulate realistic scenarios. With such a tool, analysts could test detections, train teams, or evaluate tools in controlled environments.
Having some experience coding my own tools, I decided to tackle this problem with a custom solution. My primary goal was to create a tool that lets users upload a .csv file containing exported logs, select the columns they want to anonymize, and generate an anonymized .csv file as the output, with all processing done locally for security.
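To make the upload, select, and export flow concrete, here is a minimal sketch in Python using only the standard library. It is not the tool's actual code: the function names `pseudonymize` and `anonymize_csv`, the salted SHA-256 approach, and the `anon_` prefix are all assumptions made for illustration, and the sample rows are made up.

```python
import csv
import hashlib
import io


def pseudonymize(value: str, salt: str) -> str:
    """Map a value to a stable token (hypothetical scheme: salted SHA-256)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon_" + digest[:12]


def anonymize_csv(src, dst, columns, salt):
    """Copy CSV rows from src to dst, replacing values in the chosen columns."""
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames or []
    missing = [c for c in columns if c not in fieldnames]
    if missing:
        raise ValueError(f"columns not found in CSV header: {missing}")

    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        for col in columns:
            if row[col]:  # leave empty cells empty
                row[col] = pseudonymize(row[col], salt)
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up sample data; with real files, open them with newline="".
    sample = io.StringIO(
        "UserPrincipalName,IPAddress,ResultType\n"
        "alice@contoso.com,10.0.0.1,0\n"
        "alice@contoso.com,10.0.0.2,50126\n"
    )
    out = io.StringIO()
    anonymize_csv(sample, out, ["UserPrincipalName"], salt="local-secret")
    print(out.getvalue())
```

Because the same input always maps to the same token, relationships in the data (for example, repeated sign-ins by one user) survive anonymization. The trade-off is that this is pseudonymization rather than true anonymization: anyone who learns the salt can confirm guesses, so the salt should stay on the local machine.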
As a secondary goal, I wanted the tool to enable users to generate logs from scratch. By simply defining the row count, specifying column names, and selecting data types, users could create a realistic .csv file for various purposes like testing or training.
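The generation idea can be sketched the same way. The example below is a rough illustration under stated assumptions, not the tool's implementation: the function `generate_csv`, the data type names (`timestamp`, `ipv4`, `email`, `int`, `choice`), and the value ranges are all hypothetical.

```python
import csv
import io
import random
from datetime import datetime, timedelta, timezone


def make_generators(rng):
    """Return one value generator per (hypothetical) data type name."""
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return {
        "timestamp": lambda: (
            start + timedelta(seconds=rng.randrange(30 * 86400))
        ).isoformat(),
        "ipv4": lambda: ".".join(str(rng.randrange(1, 255)) for _ in range(4)),
        "email": lambda: f"user{rng.randrange(1000)}@example.com",
        "int": lambda: str(rng.randrange(1000)),
        "choice": lambda: rng.choice(["Success", "Failure"]),
    }


def generate_csv(dst, row_count, schema, seed=None):
    """Write row_count synthetic rows; schema maps column name -> data type."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    generators = make_generators(rng)
    unknown = [t for t in schema.values() if t not in generators]
    if unknown:
        raise ValueError(f"unsupported data types: {unknown}")

    writer = csv.writer(dst)
    writer.writerow(list(schema))  # header row from the column names
    for _ in range(row_count):
        writer.writerow([generators[t]() for t in schema.values()])


if __name__ == "__main__":
    out = io.StringIO()
    schema = {"TimeGenerated": "timestamp", "IPAddress": "ipv4", "Status": "choice"}
    generate_csv(out, row_count=3, schema=schema, seed=42)
    print(out.getvalue())
```

Keeping the generators in a dictionary keyed by type name means a new data type is a one-line addition, and passing a seed lets a training or test dataset be regenerated exactly.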