Flow Search

Searching

Each row on the Network Flows page (Data Lake -> Network Flows) represents a group of flows that share common attributes; in short, a search.

The search is written in a Lucene-like syntax. This is a powerful way to search data, and more intuitive than SQL. The syntax is fairly straightforward. Flow data appears in JavaScript Object Notation (JSON) format.

As shown in the picture below, click a row's expand icon (top left corner, under the text "expand") to show the JSON record in table format.

Clicking the code icon < > on the right opens the row and shows the JSON.

In its entirety, this particular JSON looks like:

{
  "dur": 0,
  "d_isp": "Quad9",
  "@type": "metaflow",
  "d_org": "Quad9",
  "dip": "9.9.9.9",
  "rxP": 0,
  "d_city": "Zurich",
  "dp": 53,
  "txP": 0,
  "s_country": "US",
  "s_city": "North Bethesda",
  "prot": 17,
  "partition": "default",
  "d_country": "CH",
  "s_org": "MCI Communications Services, Inc. d/b/a Verizon Business",
  "sip": "71.178.173.2",
  "s_isp": "Verizon Communications",
  "sp": 12235,
  "rxB": 0,
  "start_ms": 1706198623000,
  "txB": 0
}

We can see how much data there really is in a simple flow. Fluency keeps records in two formats: flow and event. Events are the uncorrelated direct logs, while flow records are records of flows with all related events contained in them.

The basis of a flow is called the tuple. A TCP/IP tuple is composed of the source and destination addresses, the source and destination ports, and the protocol used, such as TCP or UDP. Every tuple is unique at any given time. Fluency uses this characteristic of tuples to fuse the data into a single document (document being the big data word for record).
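To make the idea of a tuple concrete, here is a minimal Python sketch that pulls the five tuple fields out of a flow record like the one shown above. The FlowTuple type and the flow_tuple helper are hypothetical names used only for illustration; they are not part of Fluency.

```python
from typing import NamedTuple


class FlowTuple(NamedTuple):
    """Hypothetical container for the five tuple fields of a flow record."""
    sip: str   # source IP address
    sp: int    # source port
    dip: str   # destination IP address
    dp: int    # destination port
    prot: int  # protocol number (17 is UDP, 6 is TCP)


def flow_tuple(record: dict) -> FlowTuple:
    """Extract the tuple fields from a flow record given as a dict."""
    return FlowTuple(
        sip=record["sip"],
        sp=record["sp"],
        dip=record["dip"],
        dp=record["dp"],
        prot=record["prot"],
    )


# Tuple fields copied from the JSON record shown earlier; other fields omitted.
record = {
    "sip": "71.178.173.2",
    "sp": 12235,
    "dip": "9.9.9.9",
    "dp": 53,
    "prot": 17,
}

key = flow_tuple(record)
print(key)
```

Because a named tuple is hashable, a value like key could serve as a dictionary key when grouping records that belong to the same session, which is one way to picture the fusing described above.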

There is no communication data in a Fluency flow record unless provided by a reporting system. All the data being displayed, such as how long the communication lasted and the HTTP negotiation, is called metadata. This type of data is used most commonly for behavioral analysis and analytics.

Field-Value Pairings

The most common search is a Field-Value Pairing search. This means we are looking to match a field with a particular value.

Let’s say we want to find all the flow records going to IP address 10.0.55.91. This can be done by writing a search string in which the field and the value we are looking for are separated by a colon:

dip:10.0.55.91

Here, "dip" stands for destination IP address.
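The pairing can be sketched in Python as follows. Both helpers are hypothetical and written only for illustration: field_value builds the search string, and matches imitates a field-value match locally on a record held as a dict. Neither is a Fluency API, and the sample records are made up.

```python
def field_value(field: str, value) -> str:
    """Build a field-value search string such as 'dip:10.0.55.91'."""
    return f"{field}:{value}"


def matches(record: dict, field: str, value) -> bool:
    """Local stand-in for a field-value match: compare values as strings."""
    return field in record and str(record[field]) == str(value)


# Made-up records for illustration only.
records = [
    {"sip": "10.0.1.7", "dip": "10.0.55.91", "dp": 443},
    {"sip": "10.0.1.7", "dip": "9.9.9.9", "dp": 53},
]

query = field_value("dip", "10.0.55.91")
hits = [r for r in records if matches(r, "dip", "10.0.55.91")]

print(query)
print(len(hits))
```

The same helper works for any other field, for example field_value("dp", 53) for destination port 53.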

Some other netflow tuple data fields are:

sip: source IP address

dip: destination IP address

sp: source port

dp: destination port

txB: transmitted total bytes

rxB: received total bytes

txP: transmitted number of packets

rxP: received number of packets

rf: a joined value of received flags

tf: a joined value of transmitted flags

start_ms: the starting time of the tuple session in UNIX millisecond time

prot: the protocol number

totalB: the total number of bytes sent and received

dur: length in milliseconds that the tuple session lasted
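As a rough illustration of how these fields relate, the Python sketch below derives a readable start time, a duration in seconds, and a byte total from a record. The helper names are hypothetical, and computing the total as txB plus rxB is an assumption based on the description of totalB above.

```python
from datetime import datetime, timezone

# Well-known IANA protocol numbers; only a few are listed here.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def start_time(record: dict) -> datetime:
    """Convert start_ms (UNIX time in milliseconds) to a UTC datetime."""
    return datetime.fromtimestamp(record["start_ms"] / 1000, tz=timezone.utc)


def duration_seconds(record: dict) -> float:
    """Convert dur (milliseconds) to seconds."""
    return record["dur"] / 1000


def total_bytes(record: dict) -> int:
    """Assumed definition of totalB: transmitted plus received bytes."""
    return record["txB"] + record["rxB"]


# Values copied from the JSON record shown earlier.
record = {"start_ms": 1706198623000, "dur": 0, "txB": 0, "rxB": 0, "prot": 17}

print(start_time(record).isoformat())
print(duration_seconds(record), total_bytes(record))
print(PROTOCOL_NAMES.get(record["prot"], "unknown"))
```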

Fluency only relies on the tuple information for correlation, but all these fields are searchable. So, where is the magic if it’s so straightforward?

The magic is that outside of the tuple fields for the Network Flows page, none of the other fields are defined, never mind required. Fluency is a schemaless database. This means that when data is parsed, the field names are defined completely by the parser.

This allows Fluency to be forward compatible with new data content that may appear in logs. Only Fluency and Elastic have this capability. It allows for searches to be available as soon as the new data is parsed.

Referring to Nested Fields

There are two reasons why the code button (<>) is important. The first is that it lets you see all the fields that are available. The other is that it lets you see the structure of each field. Fluency stores its records in JSON format. JSON allows other JSON objects to be part of its data. For example, an http JSON object in a record contains more fields. When one object is inside another, it’s referred to as nested.

To use field-value pairing to find the flows going to terplab.cloud.fluencysecurity.com, we need to list every parent field.
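The idea of listing every parent field can be sketched in Python by flattening a nested record into full field paths. Everything specific here is an assumption made for illustration: the text does not give the actual nested field names or the separator Fluency uses, so the sample record with http and host fields is hypothetical, and the dot separator is assumed because it is common in Lucene-like syntaxes.

```python
def flatten(record: dict, parent: str = "") -> dict:
    """Map each leaf value to a path that lists every parent field.

    The '.' separator is an assumption, not confirmed Fluency syntax.
    """
    flat = {}
    for key, value in record.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into the nested object, carrying the parent path along.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical nested record; the field names are illustrative only.
record = {
    "dp": 443,
    "http": {"host": "terplab.cloud.fluencysecurity.com"},
}

for path, value in flatten(record).items():
    print(f"{path}:{value}")
```

Looking at the flattened paths makes it easy to see which parent fields a search string would need to name for a given nested value.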

Using the Facet

Facet is a technical term used for groups of attributes; you can think of them as filters. The Facet Section covers this topic in more detail. Facets are an easy way to focus on data and see the most common responses by field.

Page last updated: 2023 Aug 10