Only a few years ago, processing events at the rate of 10K – 50K events per second (EPS) had seemed satisfactory. Enterprises are now routinely demanding 100K EPS and more, with 1 million EPS now within reach with New-Scale SIEM™.

Second, the pipeline must be intelligent, transforming raw events into semantically meaningful fields before they can be utilized by the risk engine. Today, event parsing and normalization rely heavily on humans to define rules and conditions for specific vendor log formats. This seemingly mundane task is made difficult because vendors' security logs are always being created and changed. Keeping up with the myriad of changing log formats requires a small army of security log experts, no small feat.

Third, the pipeline must perform additional event enrichments in a timely manner. Example event enrichments include the IP-geolocation lookup per event, which requires joining real-time streams of log and IP-geolocation feeds. Similarly, filling in missing IP address or host information in an event requires maintaining stateful IP-to-host (and host-to-IP) tracking, which requires event correlation in real time.

Should UEBA be a completely data-driven system? Or can it incorporate human knowledge? The answer is yes to both. The best UEBA systems take a hybrid approach, combining machine learning with domain knowledge in security. Modern UEBA systems have hundreds to thousands of risk indicators. Humans design these risk indicators, specifying what hints or clues to look for, for example:

- Whether the email size is unusually large.
- Whether this is the user's first time accessing this asset.
- Whether the user's activity is performed at an unusual time.

Crafting these risk indicators is the work of security experts. Then, machine learning follows up to convert the states of these indicators, commonly termed machine learning features, of an event into a risk score. Security knowledge plays a critical role in providing input to the machine.

It is important to know the distinction between typical UEBA risk indicators and conventional correlation rules in SIEM. UEBA risk indicators do more than correlation rules; products running on correlation rules alone cannot masquerade as UEBA systems. A correlation rule says, "if the number of bytes sent is more than 10 MB, then trigger." This rule is deterministic and does not take user context into account. On the other hand, a UEBA anomaly risk indicator would say, "if the number of bytes sent is unusual for this user, then trigger." A user may regularly send 10 MB, triggering the correlation rule every time. But the user will not trigger a UEBA alert if sending 10 MB is historically normal behavior, unless a substantially higher number of bytes (i.e., an order of magnitude difference) is now sent out. Correlation rules are therefore much more prone to triggering noisy alerts than anomaly-based risk indicators.

Algorithms

Algorithms in UEBA are ways of deriving new information or insights from data. Different algorithm classes apply to different data types. For endpoint logs, in which events have explicit hierarchical or parent-child relationships, graph algorithms that mine the directional links among events are favorable. For network logs with independent events from activities of authentication, authorization, and access, security vendors' alerts, etc., algorithms at varying levels of sophistication are at our disposal for UEBA. Events are examined against a collection of risk indicators.

Below I detail the steps of processing events to generate risk indicators, scoring alerts or events, and ultimately grouping them into presentable units for prioritization. Some risk indicators are computed with simple statistical analysis using a p-value-based metric for hypothesis testing, for example, whether the current VPN login country meets a p-value threshold against a frequency distribution tracking historically visited countries. Other risk indicators require machine-learning-estimated context. For example, the user's peer group label is a context required for peer-based risk indicators; similarly, knowing whether the external email address “ ” belongs to the employee John Doe is a required context for a risk indicator that monitors employees' email usage. Determining contextual information is typically performed through a periodic offline machine learning process. Lastly, other risk indicators need standalone machine-learning-based processes running in real time; for example, whether the user is connecting to a domain generation algorithm (DGA) domain can be flagged via a deep-learning-based method. Risk indicators that are triggered are notable alerts.
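The stateful IP-to-host tracking described above can be sketched as an in-memory map that learns mappings from events carrying both fields (e.g., DHCP leases or authentication logs) and fills in whichever side a later event is missing. This is a minimal illustration under assumed names; the class, the dict-shaped events, and the field names are hypothetical, and a production tracker would also need lease expiry and time-bounded lookups.

```python
class IpHostTracker:
    """Stateful IP<->host mapping, updated from events that carry both fields.
    (Hypothetical sketch; not any specific product's implementation.)"""

    def __init__(self):
        self.ip_to_host = {}
        self.host_to_ip = {}

    def observe(self, ip, host):
        # Learn (or refresh) the mapping in both directions.
        self.ip_to_host[ip] = host
        self.host_to_ip[host] = ip

    def enrich(self, event):
        # Fill in whichever field is missing, if a mapping has been seen.
        if "host" not in event and event.get("ip") in self.ip_to_host:
            event["host"] = self.ip_to_host[event["ip"]]
        if "ip" not in event and event.get("host") in self.host_to_ip:
            event["ip"] = self.host_to_ip[event["host"]]
        return event


tracker = IpHostTracker()
tracker.observe("10.0.0.5", "laptop-jdoe")      # e.g., from a DHCP lease event
tracker.enrich({"ip": "10.0.0.5"})              # host field gets filled in
```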
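The contrast between a static correlation rule and a per-user anomaly indicator can be made concrete in a few lines. This is an illustrative sketch only: the function names are invented, and the "ten times the user's historical median" baseline is an assumed stand-in for the order-of-magnitude test, not the article's actual model.

```python
import statistics

MB = 1024 * 1024

def correlation_rule(bytes_sent, threshold=10 * MB):
    # Static rule: the same threshold for every user, no context.
    return bytes_sent > threshold

def anomaly_indicator(bytes_sent, user_history, factor=10):
    # Per-user baseline: trigger only on an order-of-magnitude jump
    # over this user's own historical median.
    baseline = statistics.median(user_history)
    return bytes_sent > factor * baseline

history = [10 * MB] * 30                  # this user routinely sends ~10 MB
correlation_rule(11 * MB)                 # True: fires on every such upload
anomaly_indicator(11 * MB, history)       # False: normal for this user
anomaly_indicator(200 * MB, history)      # True: ~20x the user's baseline
```

The same event stream thus produces daily noise under the static rule but only a single, meaningful alert under the behavioral indicator.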
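The p-value check against a frequency distribution of historically visited VPN login countries can be approximated with a simple rarity measure: the probability mass of all countries seen no more often than the current one. This is a sketch of the idea under assumptions, not the article's exact metric; a real system would smooth counts, age out old history, and tune the threshold.

```python
from collections import Counter

def country_p_value(history, current):
    """Rarity-based p-value over a non-empty history of login countries:
    the fraction of past logins from countries as rare as (or rarer than)
    the current one. (Illustrative sketch; function name is hypothetical.)"""
    counts = Counter(history)
    total = sum(counts.values())
    cur = counts.get(current, 0)
    return sum(c for c in counts.values() if c <= cur) / total

history = ["US"] * 95 + ["UK"] * 4 + ["BR"]
country_p_value(history, "US")   # 1.0 : common country, no flag
country_p_value(history, "BR")   # 0.01: rare country, below a 0.05 threshold
country_p_value(history, "FR")   # 0.0 : never-seen country, strongest flag
```

The risk indicator would then trigger whenever this value falls below a chosen significance threshold such as 0.05.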